
EHVI & NEHVI break with more than 7 objectives #2387

Open
ronald-jaepel opened this issue Apr 21, 2024 · 12 comments

Labels
bug Something isn't working

Comments

@ronald-jaepel

ronald-jaepel commented Apr 21, 2024

Hello Ax Team,

When running EHVI or NEHVI with more than 7 objectives, we get an IndexError during candidate generation (i.e., while the acquisition function is being evaluated in get_next_trial).

Here's an MRE:

import numpy as np
from ax.service.ax_client import AxClient, ObjectiveProperties

N_OBJECTIVES = 8

ax_client = AxClient()
ax_client.create_experiment(
    name="test_experiment",
    parameters=[
        {
            "name": "x1",
            "type": "range",
            "bounds": [-5.0, 10.0],
            "value_type": "float",
        },
        {
            "name": "x2",
            "type": "range",
            "bounds": [0.0, 10.0],
            "value_type": "float",
        },
    ],
    objectives={
        f"Objective_{i}": ObjectiveProperties(minimize=True, threshold=1) for i in range(N_OBJECTIVES)
    },
)


def objective_function():
    # Dummy objectives: random values, independent of the parameters.
    res = {
        f"Objective_{i}": np.random.rand() for i in range(N_OBJECTIVES)
    }
    return res


for _ in range(15):
    parameters, trial_index = ax_client.get_next_trial()
    ax_client.complete_trial(trial_index=trial_index, raw_data=objective_function())

and here's the full traceback:

  File "...\ax\service\ax_client.py", line 531, in get_next_trial
    generator_run=self._gen_new_generator_run(), ttl_seconds=ttl_seconds
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\ax\service\ax_client.py", line 1763, in _gen_new_generator_run
    return not_none(self.generation_strategy).gen(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\ax\modelbridge\generation_strategy.py", line 478, in gen
    return self._gen_multiple(
           ^^^^^^^^^^^^^^^^^^^
  File "...\ax\modelbridge\generation_strategy.py", line 675, in _gen_multiple
    generator_run = self._curr.gen(
                    ^^^^^^^^^^^^^^^
  File "...\ax\modelbridge\generation_node.py", line 737, in gen
    gr = super().gen(
         ^^^^^^^^^^^^
  File "...\ax\modelbridge\generation_node.py", line 307, in gen
    generator_run = model_spec.gen(
                    ^^^^^^^^^^^^^^^
  File "...\ax\modelbridge\model_spec.py", line 219, in gen
    return fitted_model.gen(**model_gen_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\ax\modelbridge\base.py", line 784, in gen
    gen_results = self._gen(
                  ^^^^^^^^^^
  File "...\ax\modelbridge\torch.py", line 690, in _gen
    gen_results = not_none(self.model).gen(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\ax\models\torch\botorch_modular\model.py", line 428, in gen
    candidates, expected_acquisition_value = acqf.optimize(
                                             ^^^^^^^^^^^^^^
  File "...\ax\models\torch\botorch_modular\acquisition.py", line 439, in optimize
    return optimize_acqf(
           ^^^^^^^^^^^^^^
  File "...\botorch\optim\optimize.py", line 563, in optimize_acqf
    return _optimize_acqf(opt_acqf_inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\botorch\optim\optimize.py", line 584, in _optimize_acqf
    return _optimize_acqf_batch(opt_inputs=opt_inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\botorch\optim\optimize.py", line 274, in _optimize_acqf_batch
    batch_initial_conditions = opt_inputs.get_ic_generator()(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\botorch\optim\initializers.py", line 417, in gen_batch_initial_conditions
    Y_rnd_curr = acq_function(
                 ^^^^^^^^^^^^^
  File "...\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\botorch\utils\transforms.py", line 305, in decorated
    return method(cls, X, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\botorch\utils\transforms.py", line 259, in decorated
    output = method(acqf, X, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\botorch\acquisition\multi_objective\logei.py", line 468, in forward
    nehvi = self._compute_log_qehvi(samples=samples, X=X)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\botorch\acquisition\multi_objective\logei.py", line 267, in _compute_log_qehvi
    return logmeanexp(logsumexp(log_areas_per_segment, dim=-1), dim=0)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\botorch\utils\safe_math.py", line 146, in logsumexp
    return _inf_max_helper(torch.logsumexp, x=x, dim=dim, keepdim=keepdim)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "...\botorch\utils\safe_math.py", line 170, in _inf_max_helper
    M = x.amax(dim=dim, keepdim=True)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: amax(): Expected reduction dim -1 to have non-zero size.
@esantorella
Contributor

This reproduces. Thanks for reporting. Weird bug!

@Balandat
Contributor

One thing to note is that HV-based acquisition functions generally don't scale well to problems with many objectives. 2-3 is generally fine; for 4+ you'll likely see a substantial slowdown and/or memory explosion because of how complex the box decompositions of the Pareto set become. In cases with 8 objectives such as yours, you'll likely want to either express some of the objectives as constraints instead (if that's possible), drop them from the optimization, or use a different acquisition function such as qParEGO (which will scale better but is less sample efficient).
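For orientation, the reason qParEGO scales better is that it reduces the multi-objective problem to a sequence of scalarized single-objective problems, one random weight vector per iteration, so its cost never involves a box decomposition of the Pareto front. A minimal NumPy sketch of the augmented Chebyshev scalarization underlying ParEGO-style methods (illustrative only; `rho` and the minimization convention are assumptions, not Ax's internals):

```python
import numpy as np

def augmented_chebyshev(Y, weights, rho=0.05):
    """Augmented Chebyshev scalarization (minimization convention).

    Y: (n, m) array of objective values, assumed normalized to [0, 1].
    weights: (m,) nonnegative weights summing to 1.
    Returns a (n,) array of scalarized values to be minimized.
    """
    weighted = Y * weights                       # (n, m)
    # Worst weighted objective, plus a small augmentation term that
    # breaks ties and keeps solutions weakly Pareto optimal.
    return weighted.max(axis=-1) + rho * weighted.sum(axis=-1)

# One random simplex weight per BO iteration; cost is O(n * m),
# independent of how complex the Pareto front is.
rng = np.random.default_rng(0)
m = 8
w = rng.dirichlet(np.ones(m))
Y = rng.random((10, m))
scores = augmented_chebyshev(Y, w)
```

This is why the cost grows only linearly in the number of objectives, at the price of exploring the front one scalarization at a time.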

@esantorella
Contributor

I started looking into this, and the bug seems to stem from hypervolume computations starting to use zero cells once m>7, because this check for Pareto dominance always evaluates to False. I'm not sure why yet. That issue didn't happen when I tried to reproduce this using pure BoTorch, so I'm leaving this as an Ax issue for now -- I'm not sure if this is a BoTorch bug or if Ax is passing bad inputs.
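For readers following along: if the dominance check always returns False, every candidate cell gets pruned and the box decomposition ends up empty, which is consistent with the zero-size `amax` reduction in the traceback. A minimal, hypothetical dominance check for orientation (BoTorch's actual implementation lives in `botorch.utils.multi_objective.pareto` and is vectorized differently):

```python
import numpy as np

def is_non_dominated(Y):
    """Boolean mask of Pareto-optimal rows of Y (minimization).

    A point is dominated if some other point is <= in every objective
    and strictly < in at least one. O(n^2 * m) reference version.
    """
    n = Y.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        others = np.delete(Y, i, axis=0)
        dominated = np.any(
            np.all(others <= Y[i], axis=1) & np.any(others < Y[i], axis=1)
        )
        mask[i] = not dominated
    return mask
```

If a check like this were to return all-False on valid inputs, no cells would survive and any subsequent reduction over the cell dimension would fail exactly as in the reported `IndexError`.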

@Balandat
Contributor

cc @sdaulton

@schmoelder

Hi, I wanted to ask if there are any updates on this issue. Cheers!

@esantorella
Contributor

I'm afraid I haven't made any progress on this, but it remains a bug we want to understand.

@lena-kashtelyan lena-kashtelyan added the bug Something isn't working label Jul 31, 2024
@lena-kashtelyan
Contributor

@esantorella, @Balandat, my sense is that we may want to validate against too many objectives in (n)EHVI and simply not allow this behavior, as users are likely best served by a) converting some of their objectives into constraints or b) using parEGO if they do indeed have this many objectives, is that right?

@eytan
Contributor

eytan commented Jul 31, 2024

That is correct @lena-kashtelyan , people should not be using EHVI-based methods for 7+ dimensional objectives. I am not sure what the default approximation values are for the approximate HV computation, but if we are sufficiently aggressive (zeta=1e-3) at higher dimensions (say M=4 or M=5) then it could be reasonably fast relative to ParEGO (see p29 of https://arxiv.org/pdf/2006.05078). I would recommend making sure that we kick into more aggressive approximation for higher dimensionalities, and for anything 6 or higher default to ParEGO and throw a warning.

@schmoelder can you tell us a little bit more about your use case? MOO tends to be less useful and sample efficient when you have so many objectives, since the area of the frontier increases exponentially with the number of objectives and ultimately people are interested in just a few "good" tradeoffs. Many people have legit reasons for wanting to optimize so many objectives, and we've developed methods for using preference-based feedback to do the search more efficiently than multi-objective Bayesian optimization (paper @ https://arxiv.org/pdf/2203.11382, code @ https://botorch.org/tutorials/bope)

@esantorella esantorella removed their assignment Aug 19, 2024
@schmoelder

We include Ax as one of multiple optimizers in a process optimization tool (CADET-Process).
Our users can set up their process models / optimization problems and combine them with any of the provided optimizers, since these are internally translated to the individual APIs of the different libraries.

One of our users reported the issue at hand, and @ronald-jaepel was just relaying it with a stripped-down MRE. We also wouldn't recommend using EHVI with this many objectives, but I don't know much about their specific use case. And as maintainer of CADET-Process, I mostly care about the stability of the tool, not so much about the "why".

To avoid the crash, I could add an internal check to catch this before any optimization is started, but having a fix upstream would be preferable. Please let us know if we can help with anything.
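For a wrapping tool, such an internal pre-flight check could be as simple as the following sketch (names and the hard limit of 7 are assumptions drawn from this issue, not an Ax or CADET-Process API):

```python
MAX_EHVI_OBJECTIVES = 7  # empirical limit reported in this issue

def check_objective_count(objectives: dict) -> None:
    """Hypothetical pre-flight guard for a tool wrapping Ax.

    Fail fast with an actionable message before any optimization
    starts, instead of crashing deep inside acquisition optimization.
    """
    m = len(objectives)
    if m > MAX_EHVI_OBJECTIVES:
        raise ValueError(
            f"EHVI/NEHVI is known to fail with {m} objectives "
            f"(> {MAX_EHVI_OBJECTIVES}); consider qParEGO or converting "
            "some objectives to constraints (see Ax issue #2387)."
        )
```

The dict passed here would be the same `objectives` mapping handed to `ax_client.create_experiment` in the MRE above.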

@eytan
Contributor

eytan commented Dec 5, 2024 via email

@schmoelder

Hi Eytan, I understand your reasoning, but the bug in Ax is independent of the question of whether it is actually useful to have that many objectives, right? There seems to be some underlying structural issue in the code. Simply "catching" the user configuration to avoid the crash does not really fix this.

But I'm also not a maintainer of Ax and I understand that resources are always limited and this is not a super pressing issue. This is just my personal opinion and I will respect any decision on that matter.

@sdaulton
Contributor

sdaulton commented Dec 5, 2024

The bug here is that we use this heuristic in Ax to determine the approximation level that should be used in the approximate box decomposition when there are many objectives: https://github.com/pytorch/botorch/blob/5012fe8a39b434e1b0f3d3a968eb17b3dd0c9e27/botorch/acquisition/multi_objective/utils.py#L63

If there are 8 or more objectives, this breaks. I will update this to raise a warning and clamp the maximum alpha value to be < 1.
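To make the failure mode concrete, here is a paraphrase of the linked heuristic as discussed in this thread (the exact thresholds are an assumption reconstructed from the discussion, not a verbatim copy of the BoTorch source): `alpha` is the fractional-volume cutoff below which cells are dropped from the approximate box decomposition, and it grows by a factor of 10 per objective.

```python
def default_partitioning_alpha(num_objectives: int) -> float:
    """Sketch of the heuristic from the linked BoTorch utility
    (approximate reconstruction; thresholds are assumptions).

    Cells whose fractional hypervolume is below alpha are dropped
    from the approximate box decomposition.
    """
    if num_objectives <= 4:
        # Exact decomposition is still tractable: no approximation.
        return 0.0
    # Increasingly aggressive approximation as m grows.
    return 10 ** (-8 + num_objectives)

# With 8+ objectives, alpha reaches 1.0 or more: every cell is dropped,
# the decomposition becomes empty, and downstream reductions such as
# amax/logsumexp see a zero-size dimension -- matching the IndexError
# in the traceback above.
```

Clamping alpha strictly below 1 keeps at least some cells alive, which is exactly the fix described in the comment.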
