Use sklearn 1.4 or above #271

ilario · 2024-10-18T15:21:38Z

In sklearn 1.4.0, sklearn.utils.validation._check_fit_params has been renamed to sklearn.utils.validation._check_method_params, with this commit.

sklearn.utils.fixes.delayed has been moved to sklearn.utils.parallel.delayed (in sklearn 1.3?) and the pointer that remained in fixes was finally removed in 1.5.0.

ilario · 2024-10-18T15:24:23Z

Some tests are failing, maybe is too early for requiring at least sklearn 1.4.0?

marcosfelt · 2024-10-19T17:42:04Z

Thanks for this? Do you know what version change was implemented in?

ilario · 2024-10-21T09:08:08Z

Do you know what version change was implemented in?

I am not sure I understand the question. Can confirm that the info in my original comment does not already answer it?

marcosfelt · 2024-10-22T10:17:02Z

Sorry - should have read better🤦🏾‍♂️

I think the issue is that we need to change to python ^3.9: https://github.com/sustainable-processes/summit/actions/runs/11406460445/job/31740294173?pr=271

This can be done in pyproject.toml

marcosfelt · 2024-10-22T10:36:31Z

I think this needs to also be changed:

summit/.github/workflows/ci.yml

Line 43 in 1de682d

python: ["3.8", "3.9", "3.10"]

ilario · 2024-10-22T11:04:39Z

I think this needs to also be changed:

summit/.github/workflows/ci.yml

Line 43 in 1de682d

python: ["3.8", "3.9", "3.10"]

Done.

Now I observed a failure as this line

summit/summit/benchmarks/experimental_emulator.py

Line 369 in 34162c8

scores_list.append(_score(predictor, X_test, y_test, scorers))

does not provide to sklearn/model_selection/_validation._score a score_params parameter that become required with
scikit-learn/scikit-learn@6fa514a#diff-24fbe29b336ea0ad7ef67d54c362bfebd086d2367337526fb99dae4583891c2dR961

I have no idea of what that is. But it accepts a None value, so I am going to try with that. Please advise if None here is a bad idea.

marcosfelt · 2024-10-22T11:33:42Z

I have no idea of what that is. But it accepts a None value, so I am going to try with that. Please advise if None here is a bad idea.

I think you need to pass an empty dictionary? i.e., {}

ilario · 2024-10-22T12:26:14Z

Now it fails as we are providing a dict for scorer but scorer cannot be a dict anymore since this edit:
scikit-learn/scikit-learn@cf1fb22#diff-24fbe29b336ea0ad7ef67d54c362bfebd086d2367337526fb99dae4583891c2dL972-L974

They moved the dict conversion code to these lines:
scikit-learn/scikit-learn@cf1fb22#diff-24fbe29b336ea0ad7ef67d54c362bfebd086d2367337526fb99dae4583891c2dR361-R363

The change happened with sklearn 1.5.0.
Running _MultimetricScorer in our code should be compatible both with sklearn 1.5 and 1.4.

…n._score so we have to do it

ilario · 2024-10-22T13:03:36Z

There is still one error that gets raised at these lines:

https://github.com/pandas-dev/pandas/blob/2b9ca0734ead966146321fa22d009d5eb186eb0b/pandas/core/indexes/multi.py#L3525-L3527

And it happens when running the example from LogSpaceObjectives documentation.

>>> from summit.domain import *
>>> from summit.strategies import SNOBFIT, MultitoSingleObjective
>>> from summit.utils.dataset import DataSet
>>> domain = Domain()
>>> domain += ContinuousVariable(name="temperature",description="reaction temperature in celsius", bounds=[50, 100])
>>> domain += ContinuousVariable(name="flowrate_a", description="flow of reactant a in mL/min", bounds=[0.1, 0.5])
>>> domain += ContinuousVariable(name="flowrate_b", description="flow of reactant b in mL/min", bounds=[0.1, 0.5])
>>> domain += ContinuousVariable(name="yield_", description="", bounds=[0, 100], is_objective=True, maximize=True)
>>> domain += ContinuousVariable(name="de",description="diastereomeric excess",bounds=[0, 100],is_objective=True,maximize=True)
>>> columns = [v.name for v in domain.variables]
>>> values = {("temperature", "DATA"): 60,("flowrate_a", "DATA"): 0.5,("flowrate_b", "DATA"): 0.5,("yield_", "DATA"): 50,("de", "DATA"): 90}
>>> previous_results = DataSet([values], columns=columns)
>>> from summit.strategies.base import LogSpaceObjectives
>>> transform = LogSpaceObjectives(domain)
<stdin>:1: DeprecationWarning: This class will be deprecated in a future version of summit.
/home/user/summit/summit/strategies/base.py:554: RuntimeWarning: divide by zero encountered in log
  v._lower_bound = np.log(v.bounds[0])
>>> strategy = SNOBFIT(domain, transform=transform)
>>> next_experiments = strategy.suggest_experiments(5, previous_results)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/summit/summit/strategies/snobfit.py", line 142, in suggest_experiments
    next_experiments, xbest, fbest, param = self._inner_suggest_experiments(
  File "/home/user/summit/summit/strategies/snobfit.py", line 297, in _inner_suggest_experiments
    outputs[v.name] = -1 * outputs[v.name]
  File "/home/user/summit/summit/utils/dataset.py", line 281, in __getitem__
    return super().__getitem__(key)
  File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 3457, in __getitem__
    return self._getitem_multilevel(key)
  File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 3508, in _getitem_multilevel
    loc = self.columns.get_loc(key)
  File "/usr/lib/python3/dist-packages/pandas/core/indexes/multi.py", line 2922, in get_loc
    loc = self._get_level_indexer(key, level=0)
  File "/usr/lib/python3/dist-packages/pandas/core/indexes/multi.py", line 3211, in _get_level_indexer
    raise KeyError(key)
KeyError: 'log_yield_'

marcosfelt · 2024-10-23T02:32:58Z

I think that example should just be removed - sorry for the slow response and thanks for working through this!

ilario · 2024-10-23T07:05:11Z

Thank you indeed for the very useful and open source tool :D

marcosfelt · 2024-10-23T10:15:45Z

Do you want to remove the LogSpaceObjectives docstring example? I would do it on another PR but that would fail because of the other issues in this PR not being fixed

ilario · 2024-10-23T14:01:35Z

In the previous step, the test_runner_mo_integration[experiment1-NelderMead] test failed (did not understand why).
https://github.com/sustainable-processes/summit/actions/runs/11478033686/job/31941404822?pr=271#step:6:34

But now the tests are failing for a weird new reason:

Run actions/setup-python@v2
  with:
    python-version: 3.9
    cache: pip
    cache-dependency-path: poetry.lock
    token: ***
Successfully setup CPython (3.9.20)
/opt/hostedtoolcache/Python/3.9.20/x64/bin/pip cache dir
/home/runner/.cache/pip
Received 234881024 of 2150362014 (10.9%), 224.0 MBs/sec
Received 545259520 of 2150362014 (25.4%), 259.2 MBs/sec
Received 843055104 of 2150362014 (39.2%), 267.4 MBs/sec
Received 1140850688 of 2150362014 (53.1%), 271.5 MBs/sec
Received 1430257664 of 2150362014 (66.5%), 272.1 MBs/sec
Received 1761607680 of 2150362014 (81.9%), 279.3 MBs/sec
Received 2092957696 of 2150362014 (97.3%), 284.5 MBs/sec
Received 2150362014 of 2150362014 (100.0%), 285.9 MBs/sec
Error: The value of "length" is out of range. It must be >= 0 && <= 2147483647. Received 2150362014

ilario · 2024-10-25T08:19:42Z

In the workflow run there are some warnings that maybe are useful for identifying the error:
the warnings complain about the fact that we are using old versions of actions/checkout, actions/setup-python, Gr1N/setup-poetry, actions/upload-artifact, actions/upload-artifact in the workflow file https://github.com/sustainable-processes/summit/blob/main/.github/workflows/ci.yml

marcosfelt · 2024-10-28T01:50:17Z

Ahh I see - I'll put in fixes for these tomorrow. Thanks! After that, you should be able to rebase on top

ilario added 3 commits October 18, 2024 14:17

updated sklearn's _check_fit_params function name

ec85cd9

updated sklearn version needed for _check_method_params

af51444

delayed has been moved to sklearn.utils.parallel

cbd25eb

ilario added 2 commits October 22, 2024 12:29

require python 3.9 for scikit-learn >= 1.4

d210d14

remove python 3.8 from tests

181ec48

add required option for sklearn.model_selection._validation._score 1.4.0

ecb8b88

_MultimetricScorer was moved out of sklearn.model_selection.validatio…

9db3516

…n._score so we have to do it

ilario added 2 commits October 23, 2024 12:29

removed failing example from strategies.base.LogSpaceObjectives

b67c767

removed failing test from test_strategies

e8c97bc

ilario changed the title ~~Update sklearn function names~~ Use sklearn 1.4 or above Oct 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use sklearn 1.4 or above #271

Use sklearn 1.4 or above #271

ilario commented Oct 18, 2024

ilario commented Oct 18, 2024

marcosfelt commented Oct 19, 2024

ilario commented Oct 21, 2024

marcosfelt commented Oct 22, 2024 •

edited

Loading

marcosfelt commented Oct 22, 2024

ilario commented Oct 22, 2024

marcosfelt commented Oct 22, 2024

ilario commented Oct 22, 2024 •

edited

Loading

ilario commented Oct 22, 2024

marcosfelt commented Oct 23, 2024

ilario commented Oct 23, 2024

marcosfelt commented Oct 23, 2024

ilario commented Oct 23, 2024 •

edited

Loading

ilario commented Oct 25, 2024

marcosfelt commented Oct 28, 2024 •

edited

Loading

Use sklearn 1.4 or above #271

Are you sure you want to change the base?

Use sklearn 1.4 or above #271

Conversation

ilario commented Oct 18, 2024

ilario commented Oct 18, 2024

marcosfelt commented Oct 19, 2024

ilario commented Oct 21, 2024

marcosfelt commented Oct 22, 2024 • edited Loading

marcosfelt commented Oct 22, 2024

ilario commented Oct 22, 2024

marcosfelt commented Oct 22, 2024

ilario commented Oct 22, 2024 • edited Loading

ilario commented Oct 22, 2024

marcosfelt commented Oct 23, 2024

ilario commented Oct 23, 2024

marcosfelt commented Oct 23, 2024

ilario commented Oct 23, 2024 • edited Loading

ilario commented Oct 25, 2024

marcosfelt commented Oct 28, 2024 • edited Loading

marcosfelt commented Oct 22, 2024 •

edited

Loading

ilario commented Oct 22, 2024 •

edited

Loading

ilario commented Oct 23, 2024 •

edited

Loading

marcosfelt commented Oct 28, 2024 •

edited

Loading