Commit

FEAT-#6767: Provide the ability to use experimental functionality when experimental mode is not enabled globally via an environment variable (#6764)

Signed-off-by: Anatoly Myachev <[email protected]>
anmyachev authored Dec 8, 2023
1 parent b61b40d commit c3a4f78
Showing 32 changed files with 287 additions and 695 deletions.
1 change: 0 additions & 1 deletion .github/workflows/ci-required.yml
@@ -66,7 +66,6 @@ jobs:
 asv_bench/benchmarks/__init__.py asv_bench/benchmarks/io/__init__.py \
 asv_bench/benchmarks/scalability/__init__.py \
 modin/core/io \
-modin/experimental/core/execution/ray/implementations/pandas_on_ray \
 modin/experimental/core/execution/ray/implementations/pyarrow_on_ray \
 modin/pandas/series.py \
 modin/core/execution/python \
28 changes: 3 additions & 25 deletions docs/development/architecture.rst
@@ -224,18 +224,6 @@ documentation page on :doc:`contributing </development/contributing>`.
 - Uses native python execution - mainly used for debugging.
 - The storage format is `pandas` and the in-memory partition type is a pandas DataFrame.
 - For more information on the execution path, see the :doc:`pandas on Python </flow/modin/core/execution/python/implementations/pandas_on_python/index>` page.
-- pandas on Ray (experimental)
-- Uses the Ray_ execution framework.
-- The storage format is `pandas` and the in-memory partition type is a pandas DataFrame.
-- For more information on the execution path, see the :doc:`experimental pandas on Ray </flow/modin/experimental/core/execution/ray/implementations/pandas_on_ray/index>` page.
-- pandas on MPI (experimental)
-- Uses MPI_ through the Unidist_ execution framework.
-- The storage format is `pandas` and the in-memory partition type is a pandas DataFrame.
-- For more information on the execution path, see the :doc:`experimental pandas on Unidist </flow/modin/experimental/core/execution/unidist/implementations/pandas_on_unidist/index>` page.
-- pandas on Dask (experimental)
-- Uses the Dask_ execution framework.
-- The storage format is `pandas` and the in-memory partition type is a pandas DataFrame.
-- For more information on the execution path, see the :doc:`experimental pandas on Dask </flow/modin/experimental/core/execution/dask/implementations/pandas_on_dask/index>` page.
 - :doc:`HDK on Native </development/using_hdk>` (experimental)
 - Uses HDK as an engine.
 - The storage format is `hdk` and the in-memory partition type is a pyarrow Table. When defaulting to pandas, the pandas DataFrame is used.
@@ -341,19 +329,9 @@ details. The documentation covers most modules, with more docs being added every
 │ ├─── :doc:`experimental </flow/modin/experimental/index>`
 │ │ ├───core
 │ │ │ ├───execution
-│ │ │ │ ├───native
-│ │ │ │ │ └───implementations
-│ │ │ │ │ └─── :doc:`hdk_on_native </flow/modin/experimental/core/execution/native/implementations/hdk_on_native/index>`
-│ │ │ │ ├───ray
-│ │ │ │ │ └───implementations
-│ │ │ │ │ ├─── :doc:`pandas_on_ray </flow/modin/experimental/core/execution/ray/implementations/pandas_on_ray/index>`
-│ │ │ │ │ └─── :doc:`pyarrow_on_ray </flow/modin/experimental/core/execution/ray/implementations/pyarrow_on_ray>`
-│ │ │ │ ├───unidist
-│ │ │ │ | └───implementations
-│ │ │ │ | └─── :doc:`pandas_on_unidist </flow/modin/experimental/core/execution/unidist/implementations/pandas_on_unidist/index>`
-| │ | | └───dask
-| | | | └───implementations
-│ │ │ │ └─── :doc:`pandas_on_dask </flow/modin/experimental/core/execution/dask/implementations/pandas_on_dask/index>`
+│ │ │ │ └───native
+│ │ │ │ └───implementations
+│ │ │ │ └─── :doc:`hdk_on_native </flow/modin/experimental/core/execution/native/implementations/hdk_on_native/index>`
 │ │ │ ├─── :doc:`storage_formats </flow/modin/experimental/core/storage_formats/index>`
 | │ │ | ├─── :doc:`hdk </flow/modin/experimental/core/storage_formats/hdk/index>`
 │ │ │ | └─── :doc:`pyarrow </flow/modin/experimental/core/storage_formats/pyarrow/index>`

6 files were deleted.

2 changes: 1 addition & 1 deletion docs/flow/modin/experimental/pandas.rst
@@ -13,4 +13,4 @@ Experimental API Reference
 .. autofunction:: read_csv_glob
 .. autofunction:: read_custom_text
 .. autofunction:: read_pickle_distributed
-.. automethod:: modin.experimental.pandas.DataFrame.to_pickle_distributed
+.. automethod:: modin.pandas.DataFrame.modin::to_pickle_distributed
3 changes: 2 additions & 1 deletion docs/supported_apis/dataframe_supported.rst
@@ -424,7 +424,8 @@ default to pandas.
 +----------------------------+---------------------------+------------------------+----------------------------------------------------+
 | ``to_period`` | `to_period`_ | D | |
 +----------------------------+---------------------------+------------------------+----------------------------------------------------+
-| ``to_pickle`` | `to_pickle`_ | D | Experimental implementation: to_pickle_distributed |
+| ``to_pickle`` | `to_pickle`_ | D | Experimental implementation: |
+| | | | DataFrame.modin.to_pickle_distributed |
 +----------------------------+---------------------------+------------------------+----------------------------------------------------+
 | ``to_records`` | `to_records`_ | D | |
 +----------------------------+---------------------------+------------------------+----------------------------------------------------+
2 changes: 1 addition & 1 deletion docs/usage_guide/advanced_usage/index.rst
@@ -31,7 +31,7 @@ Modin also supports these experimental APIs on top of pandas that are under acti
 - :py:func:`~modin.experimental.pandas.read_sql` -- add optional parameters for the database connection
 - :py:func:`~modin.experimental.pandas.read_custom_text` -- read custom text data from file
 - :py:func:`~modin.experimental.pandas.read_pickle_distributed` -- read multiple files in a directory
-- :py:meth:`~modin.experimental.pandas.DataFrame.to_pickle_distributed` -- write to multiple files in a directory
+- :py:meth:`~modin.pandas.DataFrame.modin.to_pickle_distributed` -- write to multiple files in a directory

 DataFrame partitioning API
 --------------------------
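
Note on the hunk above: `to_pickle_distributed` is now referenced as `modin.pandas.DataFrame.modin.to_pickle_distributed`, that is, through a `modin` attribute on the regular DataFrame rather than on a separate experimental DataFrame class. The sketch below shows one way such an accessor namespace can be wired up in plain Python. It is not Modin's implementation: `Accessor`, `ExperimentalNamespace`, `Frame`, and the `*` filename pattern are hypothetical names chosen only for illustration.

```python
class Accessor:
    """Descriptor exposing a namespace object as an attribute (hypothetical)."""

    def __init__(self, namespace_cls):
        self._namespace_cls = namespace_cls

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class itself: hand back the namespace class.
            return self._namespace_cls
        # Accessed on an instance: bind the namespace to that instance.
        return self._namespace_cls(obj)


class ExperimentalNamespace:
    """Groups experimental methods under a single attribute (hypothetical)."""

    def __init__(self, frame):
        self._frame = frame

    def to_pickle_distributed(self, pattern):
        # Stand-in for a real writer: produce one file name per partition
        # instead of actually writing anything to disk.
        count = len(self._frame.partitions)
        return [pattern.replace("*", str(i)) for i in range(count)]


class Frame:
    """Toy stand-in for a DataFrame split into partitions."""

    modin = Accessor(ExperimentalNamespace)

    def __init__(self, partitions):
        self.partitions = partitions


frame = Frame(partitions=[[1, 2], [3, 4]])
names = frame.modin.to_pickle_distributed("part_*.pkl")
```

With this shape, the experimental method is reachable from any instance without a global switch, while staying visibly separate from the pandas-compatible methods on the class.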
22 changes: 22 additions & 0 deletions modin/core/execution/dask/implementations/pandas_on_dask/io/io.py
@@ -31,6 +31,9 @@
     SQLDispatcher,
 )
 from modin.core.storage_formats.pandas.parsers import (
+    ExperimentalCustomTextParser,
+    ExperimentalPandasPickleParser,
+    PandasCSVGlobParser,
     PandasCSVParser,
     PandasExcelParser,
     PandasFeatherParser,
@@ -40,6 +43,12 @@
     PandasSQLParser,
 )
 from modin.core.storage_formats.pandas.query_compiler import PandasQueryCompiler
+from modin.experimental.core.io import (
+    ExperimentalCSVGlobDispatcher,
+    ExperimentalCustomTextDispatcher,
+    ExperimentalPickleDispatcher,
+    ExperimentalSQLDispatcher,
+)


 class PandasOnDaskIO(BaseIO):
@@ -74,5 +83,18 @@ def __make_write(*classes, build_args=build_args):
     to_sql = __make_write(SQLDispatcher)
     read_excel = __make_read(PandasExcelParser, ExcelDispatcher)

+    # experimental methods that don't exist in pandas
+    read_csv_glob = __make_read(PandasCSVGlobParser, ExperimentalCSVGlobDispatcher)
+    read_pickle_distributed = __make_read(
+        ExperimentalPandasPickleParser, ExperimentalPickleDispatcher
+    )
+    to_pickle_distributed = __make_write(ExperimentalPickleDispatcher)
+    read_custom_text = __make_read(
+        ExperimentalCustomTextParser, ExperimentalCustomTextDispatcher
+    )
+    read_sql_distributed = __make_read(
+        ExperimentalSQLDispatcher, build_args={**build_args, "base_read": read_sql}
+    )
+
     del __make_read  # to not pollute class namespace
     del __make_write  # to not pollute class namespace
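
Note on the hunk above: each new reader is built by passing a parser class and a dispatcher class to `__make_read`, optionally with an overridden `build_args` dict. The body of `__make_read` is outside the lines shown, so the sketch below only assumes one plausible shape: compose the classes into a new type, install `build_args` as class attributes, and return the composed `read` method. `make_read`, `ToyParser`, `ToyDispatcher`, and `frame_cls` are hypothetical stand-ins, not Modin names.

```python
def make_read(*classes, build_args):
    # Assumed shape: build a new class from the mixins, with build_args
    # installed as class attributes, and return its bound `read` classmethod.
    return type("ComposedReader", classes, build_args).read


class ToyParser:
    """Knows how to parse a single chunk of input."""

    @staticmethod
    def parse(chunk):
        return [int(value) for value in chunk.split(",")]


class ToyDispatcher:
    """Knows how to split input into chunks and assemble the result."""

    @classmethod
    def read(cls, text):
        # `parse` and `frame_cls` are not defined here; they are supplied
        # by the other mixin and by build_args when the class is composed.
        rows = [cls.parse(chunk) for chunk in text.split(";")]
        return cls.frame_cls(rows)


build_args = {"frame_cls": tuple}

# Default composition, analogous to __make_read(Parser, Dispatcher).
read_toy = make_read(ToyParser, ToyDispatcher, build_args=build_args)
result = read_toy("1,2;3,4")

# Per-reader override, analogous to build_args={**build_args, "base_read": ...}.
read_toy_list = make_read(
    ToyParser, ToyDispatcher, build_args={**build_args, "frame_cls": list}
)
result_list = read_toy_list("5;6")
```

Under this assumption, the `{**build_args, ...}` form copies the shared defaults and changes one entry for a single reader without mutating the dict that the other readers use.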