OptunaSearchCV cannot multithread on a Pipeline with multiple ColumnTransformers referencing column names #146

vkarakcheev · 2024-08-01T17:03:17Z

Expected behavior

OptunaSearchCV run on an sklearn Pipeline with multiple ColumnTransformers crashes when OptunaSearchCV's n_jobs > 1 and the transformers argument of the ColumnTransformers references column names. It works fine when n_jobs = 1 or when the transformers argument references column indices.

Environment

OS: Windows-10-10.0.19045-SP0
Python version: 3.11.7
Optuna version: 3.6.1
Optuna Integration version: 3.6.0
Sklearn version: 1.5.0
Pandas version: 2.1.4

Error messages, stack traces, or logs

C:\Users\SC13015\AppData\Local\Temp\ipykernel_19552\2556591622.py:1: ExperimentalWarning: OptunaSearchCV is experimental (supported from v0.17.0). The interface can change in the future.
  model = OptunaSearchCV(
[I 2024-08-01 18:57:31,427] A new study created in memory with name: no-name-55d6303b-7b50-4255-af29-7f37bb81988a
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\optuna_integration\sklearn.py:377: RuntimeWarning: Mean of empty slice
  trial.set_user_attr("mean_{}".format(name), np.nanmean(array))
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\numpy\lib\nanfunctions.py:1879: RuntimeWarning: Degrees of freedom <= 0 for slice.
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
[W 2024-08-01 18:57:31,657] Trial 0 failed with parameters: {'est__alpha': 0.2564150272852753, 'est__l1_ratio': 0.946852427396853} because of the following error: The value nan is not acceptable.
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\optuna_integration\sklearn.py:377: RuntimeWarning: Mean of empty slice
  trial.set_user_attr("mean_{}".format(name), np.nanmean(array))
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\numpy\lib\nanfunctions.py:1879: RuntimeWarning: Degrees of freedom <= 0 for slice.
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
[W 2024-08-01 18:57:31,664] Trial 1 failed with parameters: {'est__alpha': 0.19084011171917495, 'est__l1_ratio': 0.05273897241757375} because of the following error: The value nan is not acceptable.
[W 2024-08-01 18:57:31,693] Trial 1 failed with value nan.
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\optuna_integration\sklearn.py:377: RuntimeWarning: Mean of empty slice
  trial.set_user_attr("mean_{}".format(name), np.nanmean(array))
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\numpy\lib\nanfunctions.py:1879: RuntimeWarning: Degrees of freedom <= 0 for slice.
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
[W 2024-08-01 18:57:31,665] Trial 0 failed with value nan.
[W 2024-08-01 18:57:31,705] Trial 2 failed with parameters: {'est__alpha': 0.0025768190494916683, 'est__l1_ratio': 0.9202702982029136} because of the following error: The value nan is not acceptable.
[W 2024-08-01 18:57:31,740] Trial 2 failed with value nan.
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\optuna_integration\sklearn.py:377: RuntimeWarning: Mean of empty slice
  trial.set_user_attr("mean_{}".format(name), np.nanmean(array))
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\numpy\lib\nanfunctions.py:1879: RuntimeWarning: Degrees of freedom <= 0 for slice.
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
[W 2024-08-01 18:57:31,749] Trial 8 failed with parameters: {'est__alpha': 71.11084264916614, 'est__l1_ratio': 0.6008590162012909} because of the following error: The value nan is not acceptable.
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\optuna_integration\sklearn.py:377: RuntimeWarning: Mean of empty slice
  trial.set_user_attr("mean_{}".format(name), np.nanmean(array))
C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\numpy\lib\nanfunctions.py:1879: RuntimeWarning: Degrees of freedom <= 0 for slice.
  var = nanvar(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
[W 2024-08-01 18:57:31,758] Trial 3 failed with parameters: {'est__alpha': 0.002797310249235216, 'est__l1_ratio': 0.9280090705873864} because of the following error: The value nan is not acceptable.
[W 2024-08-01 18:57:31,759] Trial 4 failed with parameters: {'est__alpha': 25.786889650390023, 'est__l1_ratio': 0.5543902689675048} because of the following error: The value nan is not acceptable.
[W 2024-08-01 18:57:31,759] Trial 8 failed with value nan.
[W 2024-08-01 18:57:31,761] Trial 5 failed with parameters: {'est__alpha': 370.9566773707335, 'est__l1_ratio': 0.63458459809165} because of the following error: The value nan is not acceptable.
[W 2024-08-01 18:57:31,763] Trial 6 failed with parameters: {'est__alpha': 0.13413828959725388, 'est__l1_ratio': 0.18413969872571734} because of the following error: The value nan is not acceptable.
[W 2024-08-01 18:57:31,765] Trial 7 failed with parameters: {'est__alpha': 0.109174090958778, 'est__l1_ratio': 0.002713607083478675} because of the following error: The value nan is not acceptable.
[W 2024-08-01 18:57:31,766] Trial 3 failed with value nan.
[W 2024-08-01 18:57:31,770] Trial 9 failed with parameters: {'est__alpha': 0.030474994200691385, 'est__l1_ratio': 0.1817844394988457} because of the following error: The value nan is not acceptable.
[W 2024-08-01 18:57:31,782] Trial 9 failed with value nan.
[W 2024-08-01 18:57:31,773] Trial 5 failed with value nan.
[W 2024-08-01 18:57:31,776] Trial 6 failed with value nan.
[W 2024-08-01 18:57:31,779] Trial 7 failed with value nan.
[W 2024-08-01 18:57:31,770] Trial 4 failed with value nan.
No trials are completed yet.
Traceback (most recent call last):
  File "C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\optuna_integration\sklearn.py", line 820, in _refit
    self.best_estimator_.set_params(**self.study_.best_params)
                                      ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\optuna\study\study.py", line 114, in best_params
    return self.best_trial.params
           ^^^^^^^^^^^^^^^
  File "C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\optuna\study\study.py", line 157, in best_trial
    return copy.deepcopy(self._storage.get_best_trial(self._study_id))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\SC13015\AppData\Local\Anaconda3\Lib\site-packages\optuna\storages\_in_memory.py", line 234, in get_best_trial
    raise ValueError("No trials are completed yet.")
ValueError: No trials are completed yet.

Steps to reproduce

import pandas as pd

from sklearn import set_config
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.linear_model import ElasticNet

from optuna_integration import OptunaSearchCV
from optuna.distributions import FloatDistribution

iris = load_iris()
X = pd.DataFrame(iris['data'], columns=iris['feature_names'])
y = pd.Series(iris['target']).rename('iris type')

set_config(transform_output='pandas')

transf_params = dict(
    remainder='passthrough',
    verbose_feature_names_out=False,
    force_int_remainder_cols=False,
)

# Works only when n_jobs=1
sc1_cols = ['sepal length (cm)', 'sepal width (cm)']
sc2_cols = ['petal length (cm)', 'petal width (cm)']

# Works with any n_jobs
# sc1_cols = [0, 1]
# sc2_cols = [2, 3]

pipe = Pipeline([
    ('sc1', ColumnTransformer([('sc1', StandardScaler(), sc1_cols)], **transf_params)), 
    ('sc2', ColumnTransformer([('sc2', MinMaxScaler(), sc2_cols)], **transf_params)), 
    ('est', ElasticNet())
])

param_distributions = {
    'est__alpha': FloatDistribution(1e-3, 1e3, log=True),
    'est__l1_ratio': FloatDistribution(0, 1),
}

model = OptunaSearchCV(
    estimator=pipe,
    param_distributions=param_distributions,
    n_trials=10,
    n_jobs=-1,
)
model.fit(X, y)
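
As a possible workaround sketch for the index-based variant noted in the comments above, the integer positions can be derived from the column names instead of being hard-coded. The helper `names_to_positions` below is hypothetical (it is not part of sklearn or Optuna) and takes the column names as a plain list; with the DataFrame above one would pass `list(X.columns)`. One assumption to keep in mind: the positions given to the second ColumnTransformer must match the column order produced by the first one, which places transformed columns first and passthrough columns after them. In this example that order happens to be unchanged.

```python
from typing import Sequence


def names_to_positions(all_columns: Sequence[str], wanted: Sequence[str]) -> list[int]:
    """Map column names to integer positions.

    Hypothetical helper for illustration only; not an sklearn or Optuna API.
    """
    position = {name: i for i, name in enumerate(all_columns)}
    missing = [name for name in wanted if name not in position]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return [position[name] for name in wanted]


# Same column order as iris['feature_names'] in the reproduction script.
feature_names = [
    'sepal length (cm)', 'sepal width (cm)',
    'petal length (cm)', 'petal width (cm)',
]

# Integer positions to pass to the ColumnTransformers instead of names.
sc1_cols = names_to_positions(feature_names, ['sepal length (cm)', 'sepal width (cm)'])
sc2_cols = names_to_positions(feature_names, ['petal length (cm)', 'petal width (cm)'])
```

The resulting `sc1_cols` and `sc2_cols` can then replace the name lists in the pipeline definition. This only sidesteps the failure; it does not explain why name-based selection breaks when n_jobs > 1.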

Additional context (optional)

No response

vkarakcheev · 2024-08-28T09:37:50Z

Added a similar issue to optuna/issues/

vkarakcheev added the bug (Something isn't working) label on Aug 1, 2024
