From 888b4a74e4505a76de96e6195e58aa163a343b28 Mon Sep 17 00:00:00 2001 From: Mikko Ohtamaa Date: Fri, 15 Mar 2024 21:41:38 +0100 Subject: [PATCH] More colour on PyArrow --- source/glossary.rst | 36 ++++++++++++++++++++++++++++++++++-- 1 file changed, 34 insertions(+), 2 deletions(-) diff --git a/source/glossary.rst b/source/glossary.rst index 2e5a3f9..3506c2b 100644 --- a/source/glossary.rst +++ b/source/glossary.rst @@ -429,9 +429,35 @@ and algorithmic trading. Parquet is a columnar storage format for big data processing and analysis, commonly used in the Apache Hadoop and Apache Spark ecosystems. It is optimised for fast querying and efficient storage of large, complex data sets, and supports a wide range of data formats and compression options. By organising data into columns rather than rows, Parquet enables more efficient compression and encoding, as well as improved query performance, making it a popular choice for data warehousing and analytics applications. `More information `__. - Pyarrow + Parquent comes from Apache :term:`Arrow` ecosystem. - Python API for :term:`Arrow` library. `More information `__. PyArrow is an open-source Python library that provides a fast, efficient way to process and analyse large datasets, especially those in Apache Arrow format. It is used for handling columnar and/or chunked data in memory, including reading and writing data from/to disk and interprocess communication. PyArrow also provides a rich set of data structures and algorithms for working with arrays, tables, and data frames, as well as support for various data formats such as Parquet, Avro, ORC, and others. The library is designed to be highly performant and can be used in a variety of applications, including data science, machine learning, and data engineering. + See also + + - :term:`PyArrow` - Python library + + - :term:`Parquet` - file format + + - :term:`OHLCV` - market and price data type + + PyArrow + + PyArrow is Python API for :term:`Arrow` library. + + PyArrow is an open-source Python library that provides a fast, efficient way to process and analyse large :term:`datasets `, especially those in Apache Arrow format. It is used for handling columnar and/or chunked data in memory, including reading and writing data from/to disk and interprocess communication. PyArrow also provides a rich set of data structures and algorithms for working with arrays, tables, and data frames, as well as support for various data formats such as :term`Parquet`, Avro, ORC, and others. The library is designed to be highly performant and can be used in a variety of applications, including data science, machine learning, and data engineering. + + `More information `__. + + See also + + - :term:`Arrow` - native library + + - :term:`Parquet` - file format + + - :term:`Python` - programming language + + - :term:`Jupyter Notebook` - data research tool + + - :term:`Trading Strategy Framework` - build automated :term:`trading strategies ` with Python Arrow @@ -443,6 +469,12 @@ and algorithmic trading. `More information `__. + See also + + - :term:`PyArrow` - Python library + + - :term:`Parquet` - file format + Dataclass A dataclass is a type of class in the programming language Python that is used to define data structures. It provides a convenient and efficient way of representing structured data, such as records, tuples, or database tables. Dataclasses allow for the creation of classes with automatically generated special methods, such as the `__init__`, `__repr__`, and `__eq__` methods, which are commonly used for defining classes that represent data.