This repository has been archived by the owner on Jul 22, 2024. It is now read-only.


parquet-getting-started.md

Getting started with Parquet development

If you are completely new to Parquet, we recommend starting with these documents:

Encodings and types

If you are looking for a description of Parquet encodings, please follow this link.

To understand how Parquet represents rich logical types, read this.

Reference implementations

There are already working implementations in other languages. We find them useful for checking that we are doing things right, or when we are stuck understanding how a particular feature is supposed to work.

parquet-format is the official specification repository; it contains the Thrift definitions for the data structures within a Parquet file. Any library that implements Parquet references this spec.
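To make the role of the Thrift-defined metadata more concrete, here is a minimal sketch of how the footer that holds it can be located. It assumes the standard unencrypted layout from the specification: the file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes before the trailing magic hold the little-endian length of the Thrift-encoded footer. The function names `read_footer_length` and `footer_bytes` are hypothetical, and the sample bytes are a made-up stand-in, not a real Parquet file.

```python
import struct

MAGIC = b"PAR1"  # 4-byte marker at both ends of an unencrypted Parquet file


def read_footer_length(data: bytes) -> int:
    """Return the byte length of the Thrift-encoded footer (hypothetical helper)."""
    # Smallest possible layout: leading magic + footer length + trailing magic.
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Parquet file: missing PAR1 magic bytes")
    # The 4 bytes before the trailing magic store the footer length, little-endian.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len > len(data) - 12:
        raise ValueError("footer length exceeds file size")
    return footer_len


def footer_bytes(data: bytes) -> bytes:
    """Slice out the raw footer bytes, still Thrift-encoded (hypothetical helper)."""
    n = read_footer_length(data)
    return data[len(data) - 8 - n : len(data) - 8]


# Made-up stand-in for a file: the "footer" here is NOT real Thrift data.
fake = MAGIC + b"column-data" + b"FOOTER" + struct.pack("<I", 6) + MAGIC
print(read_footer_length(fake))  # 6
print(footer_bytes(fake))        # b'FOOTER'
```

Decoding the returned bytes into actual metadata still requires the Thrift definitions from the specification; this sketch only shows where those bytes live.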

fastparquet is probably the best implementation for Python, and it's extremely easy to follow. It is also our library of choice for working with the Parquet format (before parquet-dotnet was created, of course :) ).

parquet-mr is the official Java implementation: somewhat overengineered, but the most stable.

parquet-cpp is an awful implementation in C++, struggling with both code quality and compatibility; I wouldn't recommend looking at it if you're new to Parquet.

3rd Party Libraries

Snappy Sharp is used to compress and decompress data with the Snappy algorithm.