More colour on PyArrow

tradingstrategy-ai · Mar 15, 2024 · 888b4a7
1 parent b9305ee
Showing 1 changed file with 34 additions and 2 deletions.
diff --git a/source/glossary.rst b/source/glossary.rst
@@ -429,9 +429,35 @@ and algorithmic trading.
 
         Parquet is a columnar storage format for big data processing and analysis, commonly used in the Apache Hadoop and Apache Spark ecosystems. It is optimised for fast querying and efficient storage of large, complex data sets, and supports nested data types as well as a range of encoding and compression options. By organising data into columns rather than rows, Parquet enables more efficient compression and encoding, and lets queries read only the columns they need, making it a popular choice for data warehousing and analytics applications. `More information <https://parquet.apache.org/>`__.
 
-    Pyarrow
+        Parquet is closely integrated with the Apache :term:`Arrow` ecosystem.
 
-        Python API for :term:`Arrow` library. `More information <https://arrow.apache.org/docs/python/>`__. PyArrow is an open-source Python library that provides a fast, efficient way to process and analyse large datasets, especially those in Apache Arrow format. It is used for handling columnar and/or chunked data in memory, including reading and writing data from/to disk and interprocess communication. PyArrow also provides a rich set of data structures and algorithms for working with arrays, tables, and data frames, as well as support for various data formats such as Parquet, Avro, ORC, and others. The library is designed to be highly performant and can be used in a variety of applications, including data science, machine learning, and data engineering.
+        See also
+
+        - :term:`PyArrow` - Python library
+
+        - :term:`Arrow` - native library
+
+        - :term:`OHLCV` - market and price data type
+
+    PyArrow
+
+        PyArrow is the Python API for the :term:`Arrow` library.
+
+        PyArrow is an open-source Python library for processing and analysing large :term:`datasets <dataset>` quickly and efficiently, especially data in the Apache Arrow format. It handles columnar and chunked data in memory, including reading and writing data from and to disk and passing data between processes. PyArrow provides data structures and algorithms for working with arrays, record batches, and tables, can convert to and from pandas data frames, and supports file formats such as :term:`Parquet`, ORC, CSV, JSON, and Feather. The library is designed for high performance and is used in data science, machine learning, and data engineering.
+
+        `More information <https://arrow.apache.org/docs/python/>`__.
+
+        See also
+
+        - :term:`Arrow` - native library
+
+        - :term:`Parquet` - file format
+
+        - :term:`Python` - programming language
+
+        - :term:`Jupyter Notebook` - data research tool
+
+        - :term:`Trading Strategy Framework` - build automated :term:`trading strategies <trading strategy>` with Python
 
     Arrow
 
@@ -443,6 +469,12 @@ and algorithmic trading.
 
         `More information <https://arrow.apache.org/docs/index.html>`__.
 
+        See also
+
+        - :term:`PyArrow` - Python library
+
+        - :term:`Parquet` - file format
+
     Dataclass
 
         A dataclass is a Python class declared with the ``@dataclass`` decorator and used to define data structures. It provides a convenient and efficient way of representing structured data, such as records, tuples, or database tables. The decorator automatically generates special methods such as ``__init__``, ``__repr__``, and ``__eq__``, so classes that mainly hold data do not need this boilerplate written by hand.
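For example, a minimal sketch with a hypothetical ``Candle`` class (not part of any library) shows the methods the decorator generates:

```python
from dataclasses import dataclass


@dataclass
class Candle:
    """Hypothetical OHLCV candle, used only to illustrate dataclasses."""
    timestamp: int
    open: float
    high: float
    low: float
    close: float
    volume: float


# __init__ is generated from the annotated fields, in declaration order
c1 = Candle(1_700_000_000, 1.00, 1.05, 0.99, 1.02, 1200.0)
c2 = Candle(1_700_000_000, 1.00, 1.05, 0.99, 1.02, 1200.0)

# __repr__ lists every field by name
print(c1)  # Candle(timestamp=1700000000, open=1.0, high=1.05, low=0.99, close=1.02, volume=1200.0)

# __eq__ compares field values, so two separate instances can be equal
print(c1 == c2)  # True
```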