Open Source | Windows support for Apache Arrow

Max Hora

Automation Engineer

Researcher

Software Engineer

C++

Conda

Python

Software

Apache Arrow is software created by and for the developer community. Apache Arrow defines a language-independent columnar memory format for flat and nested data, organized for efficient analytic operations on modern hardware like CPUs and GPUs. The Arrow memory format also supports zero-copy reads for lightning-fast data access without serialization overhead.
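To make the columnar layout and zero-copy idea concrete, here is a minimal sketch that uses only the Python standard library. It does not use Arrow's API; the sample records and the names ids, prices, view and tail are made up for illustration. It stores each field in its own contiguous typed buffer and shows that a slice of that buffer shares memory with the original instead of copying it.

```python
from array import array

# Row-oriented layout: one tuple per record (illustrative sample data).
rows = [(1, 10.5), (2, 20.25), (3, 30.0), (4, 40.75)]

# Column-oriented layout: one contiguous, typed buffer per field.
ids = array("q", (r[0] for r in rows))     # 64-bit signed integers
prices = array("d", (r[1] for r in rows))  # 64-bit floats

# A memoryview exposes the underlying buffer without copying it.
view = memoryview(prices)
tail = view[2:]  # zero-copy slice: shares memory with `prices`

# Mutating the original buffer is visible through the slice,
# which demonstrates that no copy was made.
prices[3] = 99.0
assert tail[1] == 99.0

# Scanning one column touches a single contiguous buffer.
total = sum(view)
```

Arrow applies the same principle across languages and processes, with a standardized buffer layout that also covers nested types and null values.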

Project Scope

The ambitious goal of the project was to port Apache Arrow and its numerous dependencies to Windows, which had never been done before. Windows build support was to be added through the conda-forge recipe system, so that the Apache Arrow C++ and Python libraries could be built, published, and made available through the Anaconda package manager.

Key Contributions

Windows support for Apache Arrow dependencies: Numerous Apache Arrow dependencies (GTest, Brotli, GFlags, Flatbuffers, Thrift-CPP, TurboDBC, Snappy, ZSTD, LZ4) were updated to support the Windows platform within the Anaconda package manager.
Windows support for Apache Arrow codebase: CMake scripts and source code were updated to support the Windows platform.
Bug Fixes: Identified and resolved various bugs and issues, improving the stability and performance of the Apache Arrow system.
Documentation: Contributed to the Apache Arrow documentation, providing clear and comprehensive guides for users to understand and utilize the system effectively.
Community Support: Actively participated in the Apache Arrow community, providing support and guidance to other contributors and users.

Technologies Used

CMake: Automated building and configuration of the Apache Arrow codebase and its dependencies.
C++: Utilized C++ for developing new features and fixing bugs in the Apache Arrow codebase.
Python: Used Python for scripting and automation tasks related to the Apache Arrow project.
Git: Used Git for version control and collaboration with other contributors.
Markdown: Created and updated documentation using Markdown.
Continuous Integration: Implemented continuous integration practices to ensure the quality and reliability of the Apache Arrow codebase.

Challenges and Learnings

One of the main challenges was adding Windows support for the numerous Apache Arrow dependencies and ensuring a seamless automated build process for them within the conda-forge build system. Once I had a local build of the Apache Arrow library working on a Windows machine with its tests passing, the work of updating the conda-forge recipes for the dependencies began. Completing that work required cooperation with dozens of conda-forge teams responsible for maintaining the various Apache Arrow dependencies.
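Because each recipe can only be built once the packages it depends on are available, the order in which recipes are updated matters. The sketch below shows one way to derive such an order with a topological sort from the Python standard library (graphlib, Python 3.9+). The dependency map is a simplified, hypothetical example and is not taken from the actual conda-forge recipes; build_order is an illustrative helper name.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: package -> packages that must be built first.
# Illustrative only; not the real conda-forge recipe metadata.
deps = {
    "pyarrow": {"arrow-cpp"},
    "parquet-cpp": {"arrow-cpp", "thrift-cpp"},
    "arrow-cpp": {"flatbuffers", "snappy", "brotli", "zstd", "lz4", "gtest"},
}

def build_order(dependencies):
    """Return the packages in an order where every dependency comes first."""
    return list(TopologicalSorter(dependencies).static_order())

order = build_order(deps)
for position, package in enumerate(order, start=1):
    print(position, package)
```

The leaf packages (compression libraries, test framework) come out first, followed by the C++ libraries and finally the Python bindings. The relative order of independent packages is not significant.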
Feedback about the work done on Apache Arrow and Parquet projects

Sample Contributions

Here are some of my notable contributions to the Apache Arrow project:
Windows compilation support: The codebase was updated to compile successfully on Windows.
Automated Windows builds with AppVeyor: CMake, bat, and yml scripts were updated to support automated Windows builds.
Porting codebase to Windows: Various changes were made to support Windows and the MSVC compiler.
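As a rough illustration of what platform-aware build automation involves, the following sketch assembles CMake configure and build commands depending on the operating system. It is not the project's actual build script: cmake_commands is a hypothetical helper, and the choice of the "Visual Studio 14 2015 Win64" generator is an assumption made for the example. The commands are only constructed, not executed.

```python
import platform

def cmake_commands(source_dir, build_type="Release", system=None):
    """Return (configure, build) command lists for the given platform.

    Hypothetical helper for illustration; generator names are assumptions.
    """
    system = system or platform.system()
    if system == "Windows":
        # Visual Studio generators are multi-config, so the build type
        # is selected at build time with --config.
        configure = ["cmake", "-G", "Visual Studio 14 2015 Win64", source_dir]
        build = ["cmake", "--build", ".", "--config", build_type]
    else:
        # Makefile generators are single-config, so the build type
        # is fixed at configure time.
        configure = ["cmake", "-G", "Unix Makefiles",
                     f"-DCMAKE_BUILD_TYPE={build_type}", source_dir]
        build = ["cmake", "--build", "."]
    return configure, build

configure_cmd, build_cmd = cmake_commands("..", system="Windows")
print(" ".join(configure_cmd))
print(" ".join(build_cmd))
```

A CI service can run the same script on every platform and pass the resulting lists to subprocess.run, which keeps the platform differences in one place.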

Posted Mar 14, 2025

Numerous Apache Arrow dependencies, and the library itself, successfully gained Windows platform support.


Timeline

May 8, 2017 - Jan 8, 2018

Clients

Conda Forge

