-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT-#6767: Provide the ability to use experimental functionality when experimental mode is not enabled globally via an environment variable #6764
Conversation
cd64066
to
f5ba5a7
Compare
6e141e8
to
31dfdf1
Compare
modin.utils.enable_exp_mode
context manager
…ext manager Signed-off-by: Anatoly Myachev <[email protected]>
31dfdf1
to
e1caebb
Compare
modin/utils.py
Outdated
FactoryDispatcher._update_factory_if_already_initialized() | ||
yield | ||
finally: | ||
IsExperimental.put(old_value or False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
old_value
can be None
modin/pandas/accessor.py
Outdated
""" | ||
|
||
def __init__(self, data): | ||
if IsExperimental.get() is not True: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get rid of this requirement? I mean, if a user already gone that far by accessing an underscore df._exp
field, then he probably understands what he's doing and his intention to use experimental API is certain enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could have added a decorator for each of the methods in ExperimentalFunctions
that would enable/disable experimental mode in the fly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dchigarev done, ready for review
Signed-off-by: Anatoly Myachev <[email protected]>
modin/utils.py
Outdated
old_value = IsExperimental.get() | ||
try: | ||
IsExperimental.put(True) | ||
FactoryDispatcher._update_factory_if_already_initialized() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we hide the factory updating to the IsExperimental.put()
method? We can then make this context function generic, so it could work with other config variables that we tend to change often (for example, ExperimentalGroupbyImpl
, ExperimentalNumpyAPI
, AsyncReading
, ...):
with modin.utils.set_config_value(IsExperimental, value=True):
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An interesting idea, it seems possible, but let's check it separately.
setattr(DataFrame, "to_pickle_distributed", to_pickle_distributed) # noqa: F405 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? it seems like it breaks the old use-case:
from modin.config import IsExperimental
IsExperimental.put(True)
import modin.experimental.pandas as pd
pd.DataFrame(...).to_pickle_distributed() # fails
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect the new behavior via ._exp
to be adding a new way of invoking experimental method, but not replacing the previous one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it breaks. Initially, this was done without taking into account the case when the experimental mode should work in conjunction with the regular one. Updating attributes during import makes context switching very difficult.
Since this is experimental code, I thought that such a breaking change would not be a big problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updating attributes during import makes context switching very difficult
why? shouldn't enable_exp_mode
context do nothing if the experimental mode is already enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enable_exp_mode context do nothing if the experimental mode is already enabled?
Yes, but what if import modin.experimental.pandas
is made in this context? The expected behavior would be that when exiting the context, no side effects remain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it still should be fine then. If I understand correctly, after we leave the context, the experimental methods of a DataFrame will still be accessible but not functioning, which seems like a pretty logical behavior
with enable_exp_mode():
import modin.experimental.pandas as pd
pd.DataFrame([]).to_pickle_distributed() # works
pd.DataFrame([]).to_pickle_distributed() # raises: not in experimental mode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I brought back this function, but added a warning about future removal and an alternative to use. I hope this suits you.
modin/pandas/dataframe.py
Outdated
# Namespace for experimental functions | ||
_exp = CachedAccessor("_exp", ExperimentalFunctions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding a private accessor to be used by users so they can access experimental functionality is fine with me, however, would like to hear opinions from others @Garra1980 @YarShev
Signed-off-by: Anatoly Myachev <[email protected]>
Maybe we should initialize both a primary factory and a respective expetimental factory at once and expose experimental pandas API that is slightly different to the original one? Doing so we could avoid an extra accessor |
Do you mean to provide access to experimental functions in the same namespace as regular functions, but which will only work when experimental mode is enabled?
Moreover, we could simply combine them into one. |
Through
We could probably combine some of them into one, and, I think since we have the executions such as ExpetimentalPyarrowOnRay and ExperimentalHdkOnNative, we should provide them when |
@YarShev I don’t really understand how in this case we will be able to provide access to experimental dataframe methods. Could you provide some more details? |
Signed-off-by: Anatoly Myachev <[email protected]>
Signed-off-by: Anatoly Myachev <[email protected]>
Signed-off-by: Anatoly Myachev <[email protected]>
33818d3
to
bee5206
Compare
Signed-off-by: Anatoly Myachev <[email protected]>
read_sql_distributed = __make_read( | ||
ExperimentalSQLDispatcher, build_args={**build_args, "base_read": read_sql} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The experimental implementation of reading sql on the master is called a non-experimental implementation of the same function. Due to the fact that classes are built from several other classes at runtime, I have not found a more beautiful way to do the same thing, given the new class architecture (in the PR).
Signed-off-by: Anatoly Myachev <[email protected]>
2fc9113
to
dead18a
Compare
Signed-off-by: Anatoly Myachev <[email protected]>
modin.utils.enable_exp_mode
context managerSigned-off-by: Anatoly Myachev <[email protected]>
Signed-off-by: Anatoly Myachev <[email protected]>
@@ -115,7 +115,7 @@ def read_sql( | |||
|
|||
assert IsExperimental.get(), "This only works in experimental mode" | |||
|
|||
result = FactoryDispatcher.read_sql(**kwargs) | |||
result = FactoryDispatcher.read_sql_distributed(**kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new name is for internal use only.
Signed-off-by: Anatoly Myachev <[email protected]>
) | ||
def _read_sql_distributed(cls, **kwargs): | ||
current_execution = get_current_execution() | ||
if current_execution not in supported_executions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we move this check into existing is_distributed
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires adding the following attribute to all IO classes:
read_sql_distributed = __make_read(
ExperimentalSQLDispatcher, build_args={**build_args, "base_read": read_sql}
)
Looks like a complication.
Left a few comments + there are some not addressed ones. Other than that, LGTM. |
Signed-off-by: Anatoly Myachev <[email protected]>
@@ -13,4 +13,4 @@ Experimental API Reference | |||
.. autofunction:: read_csv_glob | |||
.. autofunction:: read_custom_text | |||
.. autofunction:: read_pickle_distributed | |||
.. automethod:: modin.experimental.pandas.DataFrame.to_pickle_distributed | |||
.. automethod:: modin.pandas.DataFrame.modin::to_pickle_distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a hint to autodoc extension, you can put a :: separator in between module name and object name to let autodoc know the correct module name if it is ambiguous.
from https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#directive-automethod
@dchigarev, @Garra1980, any other comments? |
What do these changes do?
Main changes:
Newenable_exp_mode
context managermodin = CachedAccessor("modin", ExperimentalFunctions)
modin.experimental.pandas
.When switching between two modes (Experimental and non experimental), it would be quite difficult to remove/add them again.As a side effect, the factory is now initialized more than once when usingenable_exp_mode
context manager.flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
docs/development/architecture.rst
is up-to-date