More colour on PyArrow
miohtama committed Mar 15, 2024
1 parent b9305ee commit 888b4a7
Changed file: source/glossary.rst (34 additions, 2 deletions)

Parquet is a columnar storage format for big data processing and analysis, commonly used in the Apache Hadoop and Apache Spark ecosystems. It is optimised for fast querying and efficient storage of large, complex datasets, and supports nested data types and several compression codecs, such as Snappy, gzip and Zstandard. By organising data into columns rather than rows, Parquet stores similar values next to each other, which enables more efficient compression and encoding; queries can also read only the columns they need. This makes it a popular choice for data warehousing and analytics applications. `More information <https://parquet.apache.org/>`__.
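
The following toy sketch illustrates the columnar idea using only the Python standard library. It is not the Parquet file format: it only mimics two techniques Parquet relies on, storing values column by column and run-length encoding (RLE) repeated values within a column. The sample rows and the helper names ``to_columns`` and ``run_length_encode`` are hypothetical.

```python
# Toy illustration only: NOT the Parquet file format.
from itertools import groupby

# Hypothetical row-oriented sample records, e.g. from a price feed
rows = [
    {"pair": "ETH-USDC", "exchange": "uniswap_v3", "close": 3801.2},
    {"pair": "ETH-USDC", "exchange": "uniswap_v3", "close": 3805.9},
    {"pair": "ETH-USDC", "exchange": "uniswap_v3", "close": 3799.4},
    {"pair": "BTC-USDC", "exchange": "uniswap_v3", "close": 68010.0},
    {"pair": "BTC-USDC", "exchange": "uniswap_v3", "close": 68125.5},
]


def to_columns(records: list[dict]) -> dict[str, list]:
    """Hypothetical helper: pivot row records into one list per column."""
    columns: dict[str, list] = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


def run_length_encode(values: list) -> list[tuple[object, int]]:
    """Hypothetical helper: collapse consecutive equal values into (value, count) runs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# Low-cardinality columns shrink to a handful of runs
encoded_pair = run_length_encode(columns["pair"])
encoded_exchange = run_length_encode(columns["exchange"])

# A query that needs only close prices touches a single column
closes = columns["close"]

print(encoded_pair)
print(encoded_exchange)
```

Real Parquet writers combine encodings such as RLE and dictionary encoding with a general-purpose compression codec applied per column chunk.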

Parquet is closely associated with the Apache :term:`Arrow` ecosystem: Arrow libraries, such as :term:`PyArrow`, can read and write Parquet files.

See also

- :term:`PyArrow` - Python library

- :term:`Arrow` - native library

- :term:`OHLCV` - market and price data type

PyArrow

PyArrow is the Python API for the Apache :term:`Arrow` library.

PyArrow is an open-source Python library for processing and analysing large :term:`datasets <dataset>` efficiently, especially data in the Apache Arrow format. It handles columnar and chunked data in memory, reads and writes data from and to disk, and supports interprocess communication through the Arrow IPC format. PyArrow provides data structures such as arrays, record batches and tables, a library of vectorised compute functions (``pyarrow.compute``), and conversion to and from pandas DataFrames. It can read and write several file formats, including :term:`Parquet`, ORC, CSV and Feather, and can read line-delimited JSON. Because its core is implemented in C++, it performs well in data science, machine learning and data engineering workloads.

`More information <https://arrow.apache.org/docs/python/>`__.

See also

- :term:`Arrow` - native library

- :term:`Parquet` - file format

- :term:`Python` - programming language

- :term:`Jupyter Notebook` - data research tool

- :term:`Trading Strategy Framework` - build automated :term:`trading strategies <trading strategy>` with Python

Arrow


`More information <https://arrow.apache.org/docs/index.html>`__.

See also

- :term:`PyArrow` - Python library

- :term:`Parquet` - file format

Dataclass

A dataclass is a Python class decorated with ``@dataclass`` from the standard library ``dataclasses`` module (available since Python 3.7). It is used to define data structures that mainly hold values, such as records or rows of a database table. From the class's type-annotated fields, the decorator automatically generates special methods such as ``__init__``, ``__repr__`` and ``__eq__``, which would otherwise have to be written by hand.
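
A short sketch of the generated methods, using a hypothetical ``Candle`` class for one :term:`OHLCV` data point. The class and field names are illustrative only and not part of any Trading Strategy API.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    """Hypothetical OHLCV candle used only to illustrate dataclasses."""

    open: float
    high: float
    low: float
    close: float
    volume: float = 0.0  # field with a default value


# The generated __init__ accepts keyword and positional arguments
c1 = Candle(open=100.0, high=110.0, low=95.0, close=105.0, volume=12.5)
c2 = Candle(100.0, 110.0, 95.0, 105.0, 12.5)

# The generated __eq__ compares instances field by field
print(c1 == c2)  # True

# The generated __repr__ lists every field
print(c1)  # Candle(open=100.0, high=110.0, low=95.0, close=105.0, volume=12.5)
```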
