miniparquet

miniparquet is a reader for a common subset of Parquet files. miniparquet only supports rectangular-shaped data structures (no nested tables) and only the Snappy compression scheme. miniparquet has no (zero, none, 0) external dependencies and is very lightweight. It compiles in seconds to a binary size of under 1 MB.

Installation

miniparquet comes as a C++ library, a Python package and an R package. Install the R package like so:

devtools::install_github("hannesmuehleisen/miniparquet")

The C++ library can be built by running make in the repository root.

The Python package is installed using:

python setup.py install

Usage

Use the R package like so:

df <- miniparquet::parquet_read("example.parquet")

Folders of Parquet files that share the same schema (e.g. produced by Spark) can be read like this:

df <- data.table::rbindlist(lapply(Sys.glob("some-folder/part-*.parquet"), miniparquet::parquet_read))

If you find a file that should be supported but isn't, please open an issue in this repository with a link to the file.
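Before opening an issue, it can help to rule out files that are not valid Parquet at all. Per the Parquet format specification, every file starts and ends with the 4-byte magic PAR1, and the trailing magic is preceded by a 4-byte little-endian length of the file metadata (the footer). The following is a minimal Python sketch of that check; the function name parquet_footer_length is hypothetical and not part of miniparquet. It only validates the file framing and says nothing about whether miniparquet supports the file's compression codec or nesting.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"  # 4-byte magic at the start and end of every Parquet file


def parquet_footer_length(f):
    """Hypothetical helper: check the Parquet magic bytes of an open binary
    file and return the footer (file metadata) length. Raises ValueError
    if the framing does not look like Parquet."""
    f.seek(0)
    head = f.read(4)
    size = f.seek(0, os.SEEK_END)  # seek() returns the new absolute position
    if size < 12:  # leading magic + 4-byte footer length + trailing magic
        raise ValueError("file is too small to be a Parquet file")
    f.seek(size - 8)
    tail = f.read(8)  # 4-byte little-endian footer length, then trailing magic
    if head != PARQUET_MAGIC or tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len > size - 12:
        raise ValueError("footer length exceeds file size")
    return footer_len
```

For example, with open("example.parquet", "rb") as f: print(parquet_footer_length(f)). A ValueError here suggests the file is truncated or not Parquet, rather than an unsupported feature.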

Use the Python package like so:

miniparquet.read('example.parquet')

You can convert the result to a pandas DataFrame like so:

pandas.DataFrame.from_dict(miniparquet.read('example.parquet'))
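The R folder pattern (rbindlist over Sys.glob) can be approximated in Python. This is a sketch under an assumption: that miniparquet.read returns a dict mapping column names to column sequences, which the pandas.DataFrame.from_dict usage suggests but which is not spelled out here. The helper names concat_column_dicts and read_parquet_folder are hypothetical; the reader is passed in as a parameter so any function with that return shape works.

```python
import glob


def concat_column_dicts(tables):
    """Hypothetical helper: row-bind column dicts that share the same
    column names in the same order (like rbindlist by position)."""
    tables = list(tables)
    if not tables:
        return {}
    columns = list(tables[0].keys())
    result = {name: [] for name in columns}
    for table in tables:
        if list(table.keys()) != columns:
            raise ValueError("all files must have the same columns in the same order")
        for name in columns:
            result[name].extend(table[name])  # accepts lists or other iterables
    return result


def read_parquet_folder(pattern, reader):
    """Hypothetical helper: read every file matching `pattern` with `reader`
    (assumed to return a column dict) and row-bind the results. Paths are
    sorted so part-00000, part-00001, ... are appended in order."""
    return concat_column_dicts(reader(path) for path in sorted(glob.glob(pattern)))
```

Assuming miniparquet and pandas are installed, usage would look like pandas.DataFrame.from_dict(read_parquet_folder("some-folder/part-*.parquet", miniparquet.read)).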

Performance

miniparquet is quite fast: on my laptop (Intel Core i7-4578U) it reads compressed Parquet files at over 200 MB/s using a single thread. Previously there was a comparison with the arrow package here, but its results turned out to be caused by a bug that has since been fixed.
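To check a throughput figure like this on your own hardware and files, a simple single-threaded timing loop is enough. The sketch below is hypothetical (read_throughput_mb_s is not part of miniparquet); it assumes reader is any callable taking a path, such as miniparquet.read, measures against the compressed on-disk file size, treats MB as 10^6 bytes, and keeps the best of several runs to reduce warm-up and OS cache noise.

```python
import os
import time


def read_throughput_mb_s(path, reader, repeats=3):
    """Hypothetical helper: time reader(path) `repeats` times and return the
    best observed throughput in MB/s (MB = 10**6 bytes of on-disk file)."""
    size_mb = os.path.getsize(path) / 1e6
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        reader(path)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero duration on coarse clocks with trivial readers.
    return size_mb / best if best > 0 else float("inf")
```

For example, read_throughput_mb_s("example.parquet", miniparquet.read). Note that the first run may include cold disk reads, so the best-of figure reflects warm-cache decoding speed.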
