Open Source | Windows support for Apache Parquet

Max Hora

Automation Engineer

Researcher

Software Engineer

C++

Make

Python

Software

Apache Parquet is a columnar storage file format optimized for use with big data processing frameworks. It provides efficient data compression and encoding schemes, making it a popular choice for data storage and retrieval.
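To make "columnar" and "encoding schemes" concrete, here is a toy sketch in plain Python. It is not Parquet's actual on-disk format or API. The helper names `rows_to_columns` and `run_length_encode` and the sample data are made up for illustration. It only shows why grouping values by column tends to help run-length style encodings, one of the ideas columnar formats rely on.

```python
from itertools import groupby


def rows_to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists (hypothetical helper)."""
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs (hypothetical helper)."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Illustrative sample data, not taken from any real dataset.
rows = [
    {"city": "Oslo", "year": 2017},
    {"city": "Oslo", "year": 2017},
    {"city": "Oslo", "year": 2018},
    {"city": "Kyiv", "year": 2018},
]

columns = rows_to_columns(rows)
encoded_city = run_length_encode(columns["city"])

print(columns)
print(encoded_city)
```

Stored row by row, the repeated city names are interleaved with years. Stored column by column, they sit next to each other and collapse into a few (value, count) pairs.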

Project Scope

Apache Parquet is designed to bring efficiency and performance improvements to data storage and retrieval. This project extended the Apache Parquet codebase and its dependencies to support the Windows platform.

Key Contributions

- Windows support for Apache Parquet dependencies: Ensured that the Apache Arrow dependency, previously ported to Windows, is built and integrated correctly with Apache Parquet, and updated the other Apache Parquet dependencies to support the Windows platform.
- Windows support for Apache Parquet codebase: Updated CMake scripts and source code to support the Windows platform.
- Bug Fixes: Identified and resolved various bugs and issues, improving the stability and performance of the Apache Parquet system.
- Community Support: Actively participated in the Apache Parquet community, providing support and guidance to other contributors and users.

Technologies Used

- CMake: Automated building and configuration of the Apache Parquet codebase and its dependencies.
- C++: Utilized C++ for developing new features and fixing bugs in the Apache Parquet codebase.
- Python: Used Python for scripting and automation tasks related to the Apache Parquet project.
- Git: Used Git for version control and collaboration with other contributors.
- Markdown: Created and updated documentation using Markdown.
- Continuous Integration: Implemented continuous integration practices to ensure the quality and reliability of the Apache Parquet codebase.
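As a rough illustration of how Python scripting, CMake and a Windows build can fit together, the sketch below assembles platform-dependent CMake command lines. It is not the project's actual build script. The helper name `cmake_commands`, the directory names and the choice of generators are assumptions made for this example.

```python
import platform


def cmake_commands(source_dir, build_dir, system=None):
    """Return (configure, build) command lists for the given platform.

    Hypothetical helper: the generator choices are assumptions, and the
    -S/-B options require CMake 3.13 or newer.
    """
    system = system or platform.system()
    configure = ["cmake", "-S", source_dir, "-B", build_dir]
    if system == "Windows":
        # Visual Studio generators are multi-config, so the build type
        # is selected at build time via --config.
        configure += ["-G", "Visual Studio 15 2017 Win64"]
    else:
        # Single-config generators take the build type at configure time.
        configure += ["-G", "Unix Makefiles", "-DCMAKE_BUILD_TYPE=Release"]
    build = ["cmake", "--build", build_dir, "--config", "Release"]
    return configure, build


# Only print the commands; running them would need CMake and a toolchain.
configure, build = cmake_commands("parquet-src", "parquet-build", system="Windows")
print(" ".join(configure))
print(" ".join(build))
```

Passing `system` explicitly keeps the helper easy to check on any machine. A CI job would normally let it fall back to `platform.system()`.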

Challenges and Learnings

Apache Parquet on Windows became the first widely used library to build against the Windows port of Apache Arrow, which I had completed earlier.
Because the build was seamless and all automated tests passed, the first attempt already produced a working Windows version. The ongoing work on the dependencies, as with the Apache Arrow project, required cooperation with the conda-forge maintainers of those dependencies on GitHub.
Development contract

Sample Contributions

Here are some of my notable contributions to the Apache Parquet project:

- Local Windows build and AppVeyor support: Updated the codebase so that it compiles successfully on Windows.
- Resolve unit tests issues on Windows: Fixed unit test problems by adjusting the source code to work correctly on the Windows platform.
- Resolve Windows build issues with 3rd party libs: Made further adjustments and enhancements to the CMake build script.

Posted Mar 14, 2025

The Apache Parquet project and its dependencies successfully gained support for the Windows platform.


Timeline

Nov 15, 2017 - May 4, 2018

Clients

Conda Forge

