
OptunaSearchCV cannot multithread on a Pipeline with multiple ColumnTransformers referencing column names #146

Open · vkarakcheev opened this issue Aug 1, 2024 · 1 comment
Labels: bug (Something isn't working)


vkarakcheev commented Aug 1, 2024

Expected behavior

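OptunaSearchCV run on an sklearn Pipeline containing multiple ColumnTransformers fails when OptunaSearchCV's n_jobs > 1 and the transformers argument of the ColumnTransformers references columns by name: every trial fails with "The value nan is not acceptable." and the final refit raises "ValueError: No trials are completed yet." The same search works when n_jobs = 1, or when the transformers reference columns by integer index. The expected behavior is that the search also succeeds with n_jobs > 1 and column names.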
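Since integer indices reportedly work with any n_jobs, one stopgap is to translate the name lists into positions. Note that a ColumnTransformer with remainder='passthrough' places the transformed columns first and the remainder columns after them, so the indices for the second ColumnTransformer must refer to the first one's output, not to the original X. The helpers `column_order_after` and `names_to_indices` in this sketch are hypothetical and assume one-to-one transformers (such as the scalers in the reproducer) and verbose_feature_names_out=False.

```python
def column_order_after(columns, selected):
    """Hypothetical helper: predict the output column order of a ColumnTransformer
    that applies one one-to-one transformer to `selected` with remainder='passthrough'.

    Assumes verbose_feature_names_out=False, so output names are not prefixed.
    """
    missing = [c for c in selected if c not in columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    # Transformed columns come first, then the untouched remainder in original order.
    return list(selected) + [c for c in columns if c not in selected]


def names_to_indices(columns, selected):
    """Hypothetical helper: translate column names into positions within `columns`."""
    columns = list(columns)
    return [columns.index(c) for c in selected]


iris_cols = ['sepal length (cm)', 'sepal width (cm)',
             'petal length (cm)', 'petal width (cm)']
sc1_cols = ['sepal length (cm)', 'sepal width (cm)']
sc2_cols = ['petal length (cm)', 'petal width (cm)']

sc1_idx = names_to_indices(iris_cols, sc1_cols)       # positions in X
after_sc1 = column_order_after(iris_cols, sc1_cols)  # columns entering sc2
sc2_idx = names_to_indices(after_sc1, sc2_cols)      # positions in sc1's output
print(sc1_idx, sc2_idx)  # [0, 1] [2, 3]
```

In this particular pipeline the order happens to be preserved because sc1 selects the leading columns; if sc1 selected, say, only 'petal width (cm)', that column would move to position 0 and every index used by sc2 would shift.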

Environment

  • OS: Windows-10-10.0.19045-SP0
  • Python version: 3.11.7
  • Optuna version: 3.6.1
  • Optuna Integration version: 3.6.0
  • Sklearn version: 1.5.0
  • Pandas version: 2.1.4
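The environment list above can be collected with a short stdlib script such as this sketch. The PyPI distribution names in the hypothetical `PACKAGES` mapping are assumptions; packages that are not installed are reported as such instead of raising.

```python
import platform
from importlib import metadata

# Report label -> PyPI distribution name (assumed; adjust if yours differ).
PACKAGES = {
    "Optuna": "optuna",
    "Optuna Integration": "optuna-integration",
    "Sklearn": "scikit-learn",
    "Pandas": "pandas",
}


def environment_report():
    """Collect OS, Python and package versions into a label -> string dict."""
    report = {
        "OS": platform.platform(),
        "Python version": platform.python_version(),
    }
    for label, dist in PACKAGES.items():
        try:
            report[label + " version"] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[label + " version"] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"  • {key}: {value}")
```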

Error messages, stack traces, or logs

C:\Users\SC13015\AppData\Local\Temp\ipykernel_19552\2556591622.py:1: ExperimentalWarning: OptunaSearchCV is experimental (supported from v0.17.0). The interface can change in the future.
  model = OptunaSearchCV(
[I 2024-08-01 18:57:31,427] A new study created in memory with name: no-name-55d6303b-7b50-4255-af29-7f37bb81988a
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\optuna_integration\sklearn.py:377: RuntimeWarning: Mean of empty slice
  trial.set_user_attr("mean_{}".format(name), np.nanmean(array))
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\numpy\lib\nanfunctions.py:1879: RuntimeWarning: Degrees of freedom <= 0 for slice.
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
[W 2024-08-01 18:57:31,657] Trial 0 failed with parameters: {'est__alpha': 0.2564150272852753, 'est__l1_ratio': 0.946852427396853} because of the following error: The value nan is not acceptable.
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\optuna_integration\sklearn.py:377: RuntimeWarning: Mean of empty slice
  trial.set_user_attr("mean_{}".format(name), np.nanmean(array))
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\numpy\lib\nanfunctions.py:1879: RuntimeWarning: Degrees of freedom <= 0 for slice.
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
[W 2024-08-01 18:57:31,664] Trial 1 failed with parameters: {'est__alpha': 0.19084011171917495, 'est__l1_ratio': 0.05273897241757375} because of the following error: The value nan is not acceptable.
[W 2024-08-01 18:57:31,693] Trial 1 failed with value nan.
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\optuna_integration\sklearn.py:377: RuntimeWarning: Mean of empty slice
  trial.set_user_attr("mean_{}".format(name), np.nanmean(array))
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\numpy\lib\nanfunctions.py:1879: RuntimeWarning: Degrees of freedom <= 0 for slice.
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
[W 2024-08-01 18:57:31,665] Trial 0 failed with value nan.
[W 2024-08-01 18:57:31,705] Trial 2 failed with parameters: {'est__alpha': 0.0025768190494916683, 'est__l1_ratio': 0.9202702982029136} because of the following error: The value nan is not acceptable.
[W 2024-08-01 18:57:31,740] Trial 2 failed with value nan.
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\optuna_integration\sklearn.py:377: RuntimeWarning: Mean of empty slice
  trial.set_user_attr("mean_{}".format(name), np.nanmean(array))
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\numpy\lib\nanfunctions.py:1879: RuntimeWarning: Degrees of freedom <= 0 for slice.
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
[W 2024-08-01 18:57:31,749] Trial 8 failed with parameters: {'est__alpha': 71.11084264916614, 'est__l1_ratio': 0.6008590162012909} because of the following error: The value nan is not acceptable.
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\optuna_integration\sklearn.py:377: RuntimeWarning: Mean of empty slice
  trial.set_user_attr("mean_{}".format(name), np.nanmean(array))
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\numpy\lib\nanfunctions.py:1879: RuntimeWarning: Degrees of freedom <= 0 for slice.
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
[W 2024-08-01 18:57:31,758] Trial 3 failed with parameters: {'est__alpha': 0.002797310249235216, 'est__l1_ratio': 0.9280090705873864} because of the following error: The value nan is not acceptable.
[W 2024-08-01 18:57:31,759] Trial 4 failed with parameters: {'est__alpha': 25.786889650390023, 'est__l1_ratio': 0.5543902689675048} because of the following error: The value nan is not acceptable.
[W 2024-08-01 18:57:31,759] Trial 8 failed with value nan.
[W 2024-08-01 18:57:31,761] Trial 5 failed with parameters: {'est__alpha': 370.9566773707335, 'est__l1_ratio': 0.63458459809165} because of the following error: The value nan is not acceptable.
[W 2024-08-01 18:57:31,763] Trial 6 failed with parameters: {'est__alpha': 0.13413828959725388, 'est__l1_ratio': 0.18413969872571734} because of the following error: The value nan is not acceptable.
[W 2024-08-01 18:57:31,765] Trial 7 failed with parameters: {'est__alpha': 0.109174090958778, 'est__l1_ratio': 0.002713607083478675} because of the following error: The value nan is not acceptable.
[W 2024-08-01 18:57:31,766] Trial 3 failed with value nan.
[W 2024-08-01 18:57:31,770] Trial 9 failed with parameters: {'est__alpha': 0.030474994200691385, 'est__l1_ratio': 0.1817844394988457} because of the following error: The value nan is not acceptable.
[W 2024-08-01 18:57:31,782] Trial 9 failed with value nan.
[W 2024-08-01 18:57:31,773] Trial 5 failed with value nan.
[W 2024-08-01 18:57:31,776] Trial 6 failed with value nan.
[W 2024-08-01 18:57:31,779] Trial 7 failed with value nan.
[W 2024-08-01 18:57:31,770] Trial 4 failed with value nan.
No trials are completed yet.
Traceback (most recent call last):
  File "C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\optuna_integration\sklearn.py", line 820, in _refit
    self.best_estimator_.set_params(**self.study_.best_params)
                                      ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\optuna\study\study.py", line 114, in best_params
    return self.best_trial.params
           ^^^^^^^^^^^^^^^
  File "C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\optuna\study\study.py", line 157, in best_trial
    return copy.deepcopy(self._storage.get_best_trial(self._study_id))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\optuna\storages\_in_memory.py", line 234, in get_best_trial
    raise ValueError("No trials are completed yet.")
ValueError: No trials are completed yet.
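The repeated "Mean of empty slice" warnings are consistent with every cross-validation fold producing NaN. Assuming OptunaSearchCV's default error_score=np.nan, a fold whose fit raises is scored as NaN, np.nanmean over an all-NaN array returns NaN with exactly that warning, and Optuna then rejects the NaN objective value. This NumPy sketch, with made-up fold scores, reproduces the warning in isolation. Passing error_score='raise' to OptunaSearchCV may surface the underlying exception instead (not tested here).

```python
import warnings

import numpy as np

# Made-up per-fold test scores: every fold failed and was scored as NaN.
fold_scores = np.full(5, np.nan)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mean_score = np.nanmean(fold_scores)  # warns: Mean of empty slice

print(mean_score)  # nan
```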

Steps to reproduce

import pandas as pd

from sklearn import set_config
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.linear_model import ElasticNet

from optuna_integration import OptunaSearchCV
from optuna.distributions import FloatDistribution

iris = load_iris()
X = pd.DataFrame(iris['data'], columns=iris['feature_names'])
y = pd.Series(iris['target']).rename('iris type')

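# Make transformers return pandas DataFrames, so the second
# ColumnTransformer can select columns by name
set_config(transform_output='pandas')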

transf_params = dict(
    remainder='passthrough',
    verbose_feature_names_out=False,
    force_int_remainder_cols=False,
)

# Works only when n_jobs=1
sc1_cols = ['sepal length (cm)', 'sepal width (cm)']
sc2_cols = ['petal length (cm)', 'petal width (cm)']

# Works with any n_jobs
# sc1_cols = [0, 1]
# sc2_cols = [2, 3]

pipe = Pipeline([
    ('sc1', ColumnTransformer([('sc1', StandardScaler(), sc1_cols)], **transf_params)), 
    ('sc2', ColumnTransformer([('sc2', MinMaxScaler(), sc2_cols)], **transf_params)), 
    ('est', ElasticNet())
])

param_distributions = {
    'est__alpha': FloatDistribution(1e-3, 1e3, log=True),
    'est__l1_ratio': FloatDistribution(0, 1),
}

model = OptunaSearchCV(
    estimator=pipe,
    param_distributions=param_distributions,
    n_trials=10,
    n_jobs=-1,  # fails with n_jobs > 1 (or -1); works with n_jobs=1
)
model.fit(X, y)
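One possible explanation, which is an assumption not confirmed in this thread: scikit-learn keeps set_config settings per thread, and Optuna runs parallel trials in a thread pool when n_jobs > 1. Worker threads would then not see transform_output='pandas', so sc1 would hand a NumPy array to sc2, which can select columns by index but not by name. The stdlib-only sketch mimics that pattern with threading.local; `_DEFAULTS`, `get_config` and `set_config` here are hypothetical stand-ins, not scikit-learn's code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical process-wide defaults, never modified by set_config().
_DEFAULTS = {"transform_output": "default"}
_local = threading.local()


def get_config():
    """Return the calling thread's config, copying the defaults on first access."""
    if not hasattr(_local, "config"):
        _local.config = dict(_DEFAULTS)
    return _local.config


def set_config(**kwargs):
    """Update only the calling thread's config."""
    get_config().update(kwargs)


def read_transform_output():
    return get_config()["transform_output"]


set_config(transform_output="pandas")  # like the call in the reproducer
main_value = read_transform_output()

# A worker thread starts without a per-thread config, so it sees the defaults.
with ThreadPoolExecutor(max_workers=2) as pool:
    worker_value = pool.submit(read_transform_output).result()

print(main_value, worker_value)  # pandas default
```

If this explanation holds, a possible (untested) workaround is to call .set_output(transform='pandas') on each ColumnTransformer instead of relying on the global set_config, since that setting is stored on the estimator rather than per thread.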


vkarakcheev added the bug label Aug 1, 2024
vkarakcheev (Author) commented:

Added a similar issue to optuna/issues/
