FEAT-#6767: Provide the ability to use experimental functionality when experimental mode is not enabled globally via an environment variable #6764

Merged
merged 19 commits into from
Dec 8, 2023
Changes from 16 commits
1 change: 0 additions & 1 deletion .github/workflows/ci-required.yml
Original file line number Diff line number Diff line change
@@ -66,7 +66,6 @@ jobs:
asv_bench/benchmarks/__init__.py asv_bench/benchmarks/io/__init__.py \
asv_bench/benchmarks/scalability/__init__.py \
modin/core/io \
modin/experimental/core/execution/ray/implementations/pandas_on_ray \
modin/experimental/core/execution/ray/implementations/pyarrow_on_ray \
modin/pandas/series.py \
modin/core/execution/python \
28 changes: 3 additions & 25 deletions docs/development/architecture.rst
@@ -224,18 +224,6 @@ documentation page on :doc:`contributing </development/contributing>`.
 - Uses native python execution - mainly used for debugging.
 - The storage format is `pandas` and the in-memory partition type is a pandas DataFrame.
 - For more information on the execution path, see the :doc:`pandas on Python </flow/modin/core/execution/python/implementations/pandas_on_python/index>` page.
-- pandas on Ray (experimental)
-- Uses the Ray_ execution framework.
-- The storage format is `pandas` and the in-memory partition type is a pandas DataFrame.
-- For more information on the execution path, see the :doc:`experimental pandas on Ray </flow/modin/experimental/core/execution/ray/implementations/pandas_on_ray/index>` page.
-- pandas on MPI (experimental)
-- Uses MPI_ through the Unidist_ execution framework.
-- The storage format is `pandas` and the in-memory partition type is a pandas DataFrame.
-- For more information on the execution path, see the :doc:`experimental pandas on Unidist </flow/modin/experimental/core/execution/unidist/implementations/pandas_on_unidist/index>` page.
-- pandas on Dask (experimental)
-- Uses the Dask_ execution framework.
-- The storage format is `pandas` and the in-memory partition type is a pandas DataFrame.
-- For more information on the execution path, see the :doc:`experimental pandas on Dask </flow/modin/experimental/core/execution/dask/implementations/pandas_on_dask/index>` page.
 - :doc:`HDK on Native </development/using_hdk>` (experimental)
 - Uses HDK as an engine.
 - The storage format is `hdk` and the in-memory partition type is a pyarrow Table. When defaulting to pandas, the pandas DataFrame is used.
@@ -341,19 +329,9 @@ details. The documentation covers most modules, with more docs being added every
 │ ├─── :doc:`experimental </flow/modin/experimental/index>`
 │ │ ├───core
 │ │ │ ├───execution
-│ │ │ │ ├───native
-│ │ │ │ │ └───implementations
-│ │ │ │ │ └─── :doc:`hdk_on_native </flow/modin/experimental/core/execution/native/implementations/hdk_on_native/index>`
-│ │ │ │ ├───ray
-│ │ │ │ │ └───implementations
-│ │ │ │ │ ├─── :doc:`pandas_on_ray </flow/modin/experimental/core/execution/ray/implementations/pandas_on_ray/index>`
-│ │ │ │ │ └─── :doc:`pyarrow_on_ray </flow/modin/experimental/core/execution/ray/implementations/pyarrow_on_ray>`
-│ │ │ │ ├───unidist
-│ │ │ │ | └───implementations
-│ │ │ │ | └─── :doc:`pandas_on_unidist </flow/modin/experimental/core/execution/unidist/implementations/pandas_on_unidist/index>`
-| │ | | └───dask
-| | | | └───implementations
-│ │ │ │ └─── :doc:`pandas_on_dask </flow/modin/experimental/core/execution/dask/implementations/pandas_on_dask/index>`
+│ │ │ │ └───native
+│ │ │ │ └───implementations
+│ │ │ │ └─── :doc:`hdk_on_native </flow/modin/experimental/core/execution/native/implementations/hdk_on_native/index>`
 │ │ │ ├─── :doc:`storage_formats </flow/modin/experimental/core/storage_formats/index>`
 | │ │ | ├─── :doc:`hdk </flow/modin/experimental/core/storage_formats/hdk/index>`
 │ │ │ | └─── :doc:`pyarrow </flow/modin/experimental/core/storage_formats/pyarrow/index>`

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

This file was deleted.

2 changes: 1 addition & 1 deletion docs/flow/modin/experimental/pandas.rst
@@ -13,4 +13,4 @@ Experimental API Reference
 .. autofunction:: read_csv_glob
 .. autofunction:: read_custom_text
 .. autofunction:: read_pickle_distributed
-.. automethod:: modin.experimental.pandas.DataFrame.to_pickle_distributed
+.. automethod:: modin.experimental.pandas.DataFrame.modin.to_pickle_distributed
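The rename in this hunk moves `to_pickle_distributed` from the `DataFrame` class itself to a `modin` namespace on the frame. A minimal sketch of that accessor pattern follows. `ToyDataFrame` and `ToyModinAccessor` are hypothetical stand-ins written for illustration only; they are not Modin classes and say nothing about how Modin wires up its accessor.

```python
class ToyModinAccessor:
    """Hypothetical namespace object holding the non-pandas extras."""

    def __init__(self, frame):
        self._frame = frame

    def to_pickle_distributed(self, filepath_or_buffer):
        # Placeholder body: the documented method writes to multiple files
        # in a directory; here we only report what would be written.
        return f"would write {len(self._frame.rows)} rows to {filepath_or_buffer}"


class ToyDataFrame:
    """Hypothetical frame that keeps its own namespace pandas-compatible."""

    def __init__(self, rows):
        self.rows = rows

    @property
    def modin(self):
        # Extras are reachable only through the ``modin`` namespace.
        return ToyModinAccessor(self)


df = ToyDataFrame([1, 2, 3])
message = df.modin.to_pickle_distributed("out/*.pkl")
```

Under this layout the frame's top-level attributes stay limited to the pandas-like surface, and anything experimental is spelled `df.modin.<name>`.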
3 changes: 2 additions & 1 deletion docs/supported_apis/dataframe_supported.rst
@@ -424,7 +424,8 @@ default to pandas.
 +----------------------------+---------------------------+------------------------+----------------------------------------------------+
 | ``to_period``              | `to_period`_              | D                      |                                                    |
 +----------------------------+---------------------------+------------------------+----------------------------------------------------+
-| ``to_pickle``              | `to_pickle`_              | D                      | Experimental implementation: to_pickle_distributed |
+| ``to_pickle``              | `to_pickle`_              | D                      | Experimental implementation:                       |
+|                            |                           |                        | DataFrame.modin.to_pickle_distributed              |
 +----------------------------+---------------------------+------------------------+----------------------------------------------------+
 | ``to_records``             | `to_records`_             | D                      |                                                    |
 +----------------------------+---------------------------+------------------------+----------------------------------------------------+
2 changes: 1 addition & 1 deletion docs/usage_guide/advanced_usage/index.rst
@@ -31,7 +31,7 @@ Modin also supports these experimental APIs on top of pandas that are under acti
 - :py:func:`~modin.experimental.pandas.read_sql` -- add optional parameters for the database connection
 - :py:func:`~modin.experimental.pandas.read_custom_text` -- read custom text data from file
 - :py:func:`~modin.experimental.pandas.read_pickle_distributed` -- read multiple files in a directory
-- :py:meth:`~modin.experimental.pandas.DataFrame.to_pickle_distributed` -- write to multiple files in a directory
+- :py:meth:`~modin.experimental.pandas.DataFrame.modin.to_pickle_distributed` -- write to multiple files in a directory

DataFrame partitioning API
--------------------------
@@ -31,6 +31,9 @@
     SQLDispatcher,
 )
 from modin.core.storage_formats.pandas.parsers import (
+    ExperimentalCustomTextParser,
+    ExperimentalPandasPickleParser,
+    PandasCSVGlobParser,
     PandasCSVParser,
     PandasExcelParser,
     PandasFeatherParser,
@@ -40,6 +43,12 @@
     PandasSQLParser,
 )
 from modin.core.storage_formats.pandas.query_compiler import PandasQueryCompiler
+from modin.experimental.core.io import (
+    ExperimentalCSVGlobDispatcher,
+    ExperimentalCustomTextDispatcher,
+    ExperimentalPickleDispatcher,
+    ExperimentalSQLDispatcher,
+)


 class PandasOnDaskIO(BaseIO):
@@ -74,5 +83,18 @@ def __make_write(*classes, build_args=build_args):
     to_sql = __make_write(SQLDispatcher)
     read_excel = __make_read(PandasExcelParser, ExcelDispatcher)

+    # experimental methods that don't exist in pandas
+    read_csv_glob = __make_read(PandasCSVGlobParser, ExperimentalCSVGlobDispatcher)
+    read_pickle_distributed = __make_read(
+        ExperimentalPandasPickleParser, ExperimentalPickleDispatcher
+    )
+    to_pickle_distributed = __make_write(ExperimentalPickleDispatcher)
+    read_custom_text = __make_read(
+        ExperimentalCustomTextParser, ExperimentalCustomTextDispatcher
+    )
+    read_sql_distributed = __make_read(
+        ExperimentalSQLDispatcher, build_args={**build_args, "base_read": read_sql}
+    )
Comment on lines +95 to +97
Collaborator Author

On master, the experimental SQL reader calls the non-experimental implementation of the same function. Because these classes are assembled from several other classes at runtime, I have not found a cleaner way to achieve this with the new class architecture introduced in this PR.
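
The arrangement this comment describes can be sketched in isolation: a helper builds a throwaway class from its bases with `type(...)`, and the distributed reader receives the regular reader through an extra `base_read` entry in `build_args`. Everything below (`ToyIO`, `PlainSQLDispatcher`, `DistributedSQLDispatcher`, `partition_column`, the returned strings) is a hypothetical stand-in chosen for illustration; only the `__make_read` / `build_args` / `base_read` shape is taken from the diff.

```python
class PlainSQLDispatcher:
    """Hypothetical stand-in for the regular SQL reader."""

    @classmethod
    def read(cls, query, **kwargs):
        return f"plain read of {query}"


class DistributedSQLDispatcher:
    """Hypothetical stand-in that can fall back to ``base_read``."""

    base_read = None  # injected through build_args when the class is built

    @classmethod
    def read(cls, query, partition_column=None, **kwargs):
        if partition_column is None:
            # Nothing to partition on: delegate to the regular reader.
            return cls.base_read(query, **kwargs)
        return f"distributed read of {query} by {partition_column}"


class ToyIO:
    build_args = {"frame_cls": object}

    def __make_read(*classes, build_args=build_args):
        # Build an anonymous class from the bases and return its bound reader.
        return type("", classes, dict(build_args)).read

    read_sql = __make_read(PlainSQLDispatcher)
    read_sql_distributed = __make_read(
        DistributedSQLDispatcher, build_args={**build_args, "base_read": read_sql}
    )

    del __make_read  # to not pollute class namespace


fallback = ToyIO.read_sql_distributed("SELECT 1")
partitioned = ToyIO.read_sql_distributed("SELECT 1", partition_column="id")
```

Passing `read_sql` in through `build_args` works here because the class body is executed top to bottom, so `read_sql` already exists when `read_sql_distributed` is built.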


     del __make_read  # to not pollute class namespace
     del __make_write  # to not pollute class namespace