Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use sklearn 1.4 or above #271

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

ilario
Copy link
Contributor

@ilario ilario commented Oct 18, 2024

In sklearn 1.4.0, sklearn.utils.validation._check_fit_params has been renamed to sklearn.utils.validation._check_method_params, with this commit.

sklearn.utils.fixes.delayed has been moved to sklearn.utils.parallel.delayed (in sklearn 1.3?) and the pointer that remained in fixes was finally removed in 1.5.0.

@ilario
Copy link
Contributor Author

ilario commented Oct 18, 2024

Some tests are failing, maybe is too early for requiring at least sklearn 1.4.0?

@marcosfelt
Copy link
Collaborator

Thanks for this? Do you know what version change was implemented in?

@ilario
Copy link
Contributor Author

ilario commented Oct 21, 2024

Do you know what version change was implemented in?

I am not sure I understand the question. Can confirm that the info in my original comment does not already answer it?

@marcosfelt
Copy link
Collaborator

marcosfelt commented Oct 22, 2024

Sorry - should have read better🤦🏾‍♂️

I think the issue is that we need to change to python ^3.9: https://github.com/sustainable-processes/summit/actions/runs/11406460445/job/31740294173?pr=271

This can be done in pyproject.toml

@marcosfelt
Copy link
Collaborator

I think this needs to also be changed:

python: ["3.8", "3.9", "3.10"]

@ilario
Copy link
Contributor Author

ilario commented Oct 22, 2024

I think this needs to also be changed:

python: ["3.8", "3.9", "3.10"]

Done.

Now I observed a failure as this line

scores_list.append(_score(predictor, X_test, y_test, scorers))

does not provide to sklearn/model_selection/_validation._score a score_params parameter that become required with
scikit-learn/scikit-learn@6fa514a#diff-24fbe29b336ea0ad7ef67d54c362bfebd086d2367337526fb99dae4583891c2dR961

I have no idea of what that is. But it accepts a None value, so I am going to try with that. Please advise if None here is a bad idea.

@marcosfelt
Copy link
Collaborator

I have no idea of what that is. But it accepts a None value, so I am going to try with that. Please advise if None here is a bad idea.

I think you need to pass an empty dictionary? i.e., {}

@ilario
Copy link
Contributor Author

ilario commented Oct 22, 2024

Now it fails as we are providing a dict for scorer but scorer cannot be a dict anymore since this edit:
scikit-learn/scikit-learn@cf1fb22#diff-24fbe29b336ea0ad7ef67d54c362bfebd086d2367337526fb99dae4583891c2dL972-L974

They moved the dict conversion code to these lines:
scikit-learn/scikit-learn@cf1fb22#diff-24fbe29b336ea0ad7ef67d54c362bfebd086d2367337526fb99dae4583891c2dR361-R363

The change happened with sklearn 1.5.0.
Running _MultimetricScorer in our code should be compatible both with sklearn 1.5 and 1.4.

@ilario
Copy link
Contributor Author

ilario commented Oct 22, 2024

There is still one error that gets raised at these lines:

https://github.com/pandas-dev/pandas/blob/2b9ca0734ead966146321fa22d009d5eb186eb0b/pandas/core/indexes/multi.py#L3525-L3527

And it happens when running the example from LogSpaceObjectives documentation.

>>> from summit.domain import *
>>> from summit.strategies import SNOBFIT, MultitoSingleObjective
>>> from summit.utils.dataset import DataSet
>>> domain = Domain()
>>> domain += ContinuousVariable(name="temperature",description="reaction temperature in celsius", bounds=[50, 100])
>>> domain += ContinuousVariable(name="flowrate_a", description="flow of reactant a in mL/min", bounds=[0.1, 0.5])
>>> domain += ContinuousVariable(name="flowrate_b", description="flow of reactant b in mL/min", bounds=[0.1, 0.5])
>>> domain += ContinuousVariable(name="yield_", description="", bounds=[0, 100], is_objective=True, maximize=True)
>>> domain += ContinuousVariable(name="de",description="diastereomeric excess",bounds=[0, 100],is_objective=True,maximize=True)
>>> columns = [v.name for v in domain.variables]
>>> values = {("temperature", "DATA"): 60,("flowrate_a", "DATA"): 0.5,("flowrate_b", "DATA"): 0.5,("yield_", "DATA"): 50,("de", "DATA"): 90}
>>> previous_results = DataSet([values], columns=columns)
>>> from summit.strategies.base import LogSpaceObjectives
>>> transform = LogSpaceObjectives(domain)
<stdin>:1: DeprecationWarning: This class will be deprecated in a future version of summit.
/home/user/summit/summit/strategies/base.py:554: RuntimeWarning: divide by zero encountered in log
  v._lower_bound = np.log(v.bounds[0])
>>> strategy = SNOBFIT(domain, transform=transform)
>>> next_experiments = strategy.suggest_experiments(5, previous_results)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/summit/summit/strategies/snobfit.py", line 142, in suggest_experiments
    next_experiments, xbest, fbest, param = self._inner_suggest_experiments(
  File "/home/user/summit/summit/strategies/snobfit.py", line 297, in _inner_suggest_experiments
    outputs[v.name] = -1 * outputs[v.name]
  File "/home/user/summit/summit/utils/dataset.py", line 281, in __getitem__
    return super().__getitem__(key)
  File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 3457, in __getitem__
    return self._getitem_multilevel(key)
  File "/usr/lib/python3/dist-packages/pandas/core/frame.py", line 3508, in _getitem_multilevel
    loc = self.columns.get_loc(key)
  File "/usr/lib/python3/dist-packages/pandas/core/indexes/multi.py", line 2922, in get_loc
    loc = self._get_level_indexer(key, level=0)
  File "/usr/lib/python3/dist-packages/pandas/core/indexes/multi.py", line 3211, in _get_level_indexer
    raise KeyError(key)
KeyError: 'log_yield_'

@marcosfelt
Copy link
Collaborator

I think that example should just be removed - sorry for the slow response and thanks for working through this!

@ilario
Copy link
Contributor Author

ilario commented Oct 23, 2024

Thank you indeed for the very useful and open source tool :D

@marcosfelt
Copy link
Collaborator

Do you want to remove the LogSpaceObjectives docstring example? I would do it on another PR but that would fail because of the other issues in this PR not being fixed

@ilario
Copy link
Contributor Author

ilario commented Oct 23, 2024

In the previous step, the test_runner_mo_integration[experiment1-NelderMead] test failed (did not understand why).
https://github.com/sustainable-processes/summit/actions/runs/11478033686/job/31941404822?pr=271#step:6:34

But now the tests are failing for a weird new reason:

Run actions/setup-python@v2
  with:
    python-version: 3.9
    cache: pip
    cache-dependency-path: poetry.lock
    token: ***
Successfully setup CPython (3.9.20)
/opt/hostedtoolcache/Python/3.9.20/x64/bin/pip cache dir
/home/runner/.cache/pip
Received 234881024 of 2150362014 (10.9%), 224.0 MBs/sec
Received 545259520 of 2150362014 (25.4%), 259.2 MBs/sec
Received 843055104 of 2150362014 (39.2%), 267.4 MBs/sec
Received 1140850688 of 2150362014 (53.1%), 271.5 MBs/sec
Received 1430257664 of 2150362014 (66.5%), 272.1 MBs/sec
Received 1761607680 of 2150362014 (81.9%), 279.3 MBs/sec
Received 2092957696 of 2150362014 (97.3%), 284.5 MBs/sec
Received 2150362014 of 2150362014 (100.0%), 285.9 MBs/sec
Error: The value of "length" is out of range. It must be >= 0 && <= 2147483647. Received 2150362014

@ilario ilario changed the title Update sklearn function names Use sklearn 1.4 or above Oct 23, 2024
@ilario
Copy link
Contributor Author

ilario commented Oct 25, 2024

In the workflow run there are some warnings that maybe are useful for identifying the error:
the warnings complain about the fact that we are using old versions of actions/checkout, actions/setup-python, Gr1N/setup-poetry, actions/upload-artifact, actions/upload-artifact in the workflow file https://github.com/sustainable-processes/summit/blob/main/.github/workflows/ci.yml

@marcosfelt
Copy link
Collaborator

marcosfelt commented Oct 28, 2024

Ahh I see - I'll put in fixes for these tomorrow. Thanks! After that, you should be able to rebase on top

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants