[WIP] Add chosen metric argument to clarify early stopping behaviour #6424

Open · wants to merge 4 commits into base: master · changes from 3 commits
3 changes: 3 additions & 0 deletions include/LightGBM/config.h
@@ -397,6 +397,9 @@ struct Config {
// desc = LightGBM allows you to provide multiple evaluation metrics. Set this to ``true``, if you want to use only the first metric for early stopping
bool first_metric_only = false;

// desc = LightGBM allows you to provide multiple evaluation metrics. Set this to a specific metric name, if you want to use only this metric for early stopping
std::string chosen_metric_early_stopping;
Collaborator:

Thinking about this some more... I don't think we should add this as a parameter in LightGBM's C API.

Right now, LightGBM (across all its interfaces) has this mix of behaviors:

  • you can provide multiple metrics via the metric parameter
  • if you set early_stopping_rounds > 0 and provide any validation sets, LightGBM will try to perform early stopping based on all metrics and all validation sets
    • ... unless you set first_metric_only = true, in which case LightGBM will perform early stopping on only 1 metric (but still for all validation sets)

related: #6360
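
To make that concrete, here is a minimal sketch of the current behavior (synthetic data; every value below is chosen purely for illustration, not taken from this PR):

```python
import numpy as np
import lightgbm as lgb

# Synthetic regression data, purely for illustration.
rng = np.random.default_rng(42)
X, y = rng.random((500, 5)), rng.random(500)
train_set = lgb.Dataset(X[:400], label=y[:400])
valid_set = lgb.Dataset(X[400:], label=y[400:], reference=train_set)

params = {
    "objective": "regression",
    "metric": ["l2", "l1"],      # multiple metrics: all are computed and logged
    "early_stopping_round": 10,  # stopping considers every metric on every validation set...
    "first_metric_only": True,   # ...unless restricted to the first metric ("l2") only
    "verbosity": -1,
}
booster = lgb.train(params, train_set, num_boost_round=100, valid_sets=[valid_set])
```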

Two types of behavior rely on that metric parameter:

  • which metrics should be computed and logged/recorded during training?
  • which metrics should be used for early stopping?

We still want to provide the ability to record multiple metrics during training.

In addition, the CLI and C API don't have a concept of "callbacks", so a parameter metric_name that only accepts a single metric wouldn't be sufficient for them if they want to perform early stopping on the basis of multiple metrics.

We also have to think carefully about what breaking changes (if any) to make to LightGBM's existing behavior of automatically performing early stopping on all metrics if you enable early stopping at all.

I'm not sure what direction to set you on... need to think about this for a few days.

This has been quite a complicated part of LightGBM's interface. I'd like to simplify it and give people finer control, but also do that in a way that minimizes the number of breaking changes made.

For example, maybe we could turn off the "automatically add the early stopping callback based on params" behavior if any lgb.early_stopping callbacks are passed through callbacks.
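
A rough sketch of that idea, reusing the data from the sketch above (this is not current behavior, just an illustration of the proposal): if the user passes the callback explicitly, the params-driven auto-creation could be skipped, so the callback alone defines the stopping behavior.

```python
# Sketch only: passing the callback explicitly, so all early-stopping
# behavior is spelled out in one place instead of inferred from params.
booster = lgb.train(
    {"objective": "regression", "metric": ["l2", "l1"], "verbosity": -1},
    train_set,
    num_boost_round=100,
    valid_sets=[valid_set],
    callbacks=[lgb.early_stopping(stopping_rounds=10, first_metric_only=True)],
)
```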

Author:

Thanks for your explanations, I now also realize all the implications of adjusting the Python part along with the other interfaces!

I also understand that being able to specify the metric_name in the parameters dict would be preferable, as other early stopping parameters can be specified there as well. However, feel free to tell me to undo the changes outside of the Callback class if it helps to split this into different PRs.
My tests with the callback API changes alone show the expected behaviour.

I will try to take a closer look at the C API and give you my two cents over the weekend about how the change could be implemented. I don't expect to come up with the solution, but I hope it could help you make a decision.

Collaborator:

Sure, thanks! I just want to be sure we're respectful of your time and limit how often we ask you to do something and then to undo it.

This is a part of LightGBM (and the Python package) that has to be handled with extreme care. Early stopping is a very important part of training GBDTs, and lots of existing code out in the world relies on the existing behavior.

If you want some more background on that, you might find this discussion useful: #5808


// alias = max_tree_output, max_leaf_output
// desc = used to limit the max output of tree leaves
// desc = ``<= 0`` means no constraint
29 changes: 27 additions & 2 deletions python-package/lightgbm/callback.py
@@ -279,16 +279,24 @@ def __init__(
first_metric_only: bool = False,
verbose: bool = True,
min_delta: Union[float, List[float]] = 0.0,
chosen_metric: Optional[str] = None,
) -> None:
self.enabled = _should_enable_early_stopping(stopping_rounds)

# Ensure first_metric_only and chosen_metric are not used together
if first_metric_only and chosen_metric is not None:
    raise ValueError("Only one of first_metric_only and chosen_metric parameters should be used")

self.order = 30
self.before_iteration = False

self.stopping_rounds = stopping_rounds
self.first_metric_only = first_metric_only
self.verbose = verbose
self.min_delta = min_delta
self.chosen_metric = chosen_metric

self._reset_storages()

@@ -345,7 +353,13 @@ def _init(self, env: CallbackEnv) -> None:

self._reset_storages()

n_metrics = len({m[1] for m in env.evaluation_result_list})
list_metrics = {m[1] for m in env.evaluation_result_list}
if (self.chosen_metric is not None) and (self.chosen_metric not in list_metrics):
error_message = f"""Chosen callback metric: {self.chosen_metric} is not in the evaluation list.
The list of available metrics for early stopping is: {list_metrics}."""
raise ValueError(error_message)

n_metrics = len(list_metrics)
n_datasets = len(env.evaluation_result_list) // n_metrics
if isinstance(self.min_delta, list):
if not all(t >= 0 for t in self.min_delta):
@@ -363,11 +377,14 @@ def _init(self, env: CallbackEnv) -> None:
raise ValueError("Must provide a single value for min_delta or as many as metrics.")
if self.first_metric_only and self.verbose:
_log_info(f"Using only {self.min_delta[0]} as early stopping min_delta.")
if (self.chosen_metric is not None) and self.verbose:
index_chosen_metric = list_metrics.index(self.chosen_metric)
Collaborator:

I don't think this works, list_metrics is not actually a list 👀

Author:

Will look into it.
Following @jameslamb's comment, this part of the code will be impacted, so I will need to rewrite it anyway.

_log_info(f"Using only {self.min_delta[index_chosen_metric]} as early stopping min_delta.")
deltas = self.min_delta * n_datasets
else:
if self.min_delta < 0:
raise ValueError("Early stopping min_delta must be non-negative.")
if self.min_delta > 0 and n_metrics > 1 and not self.first_metric_only and self.verbose:
if self.min_delta > 0 and n_metrics > 1 and not self.first_metric_only and (self.chosen_metric is None) and self.verbose:
_log_info(f"Using {self.min_delta} as min_delta for all metrics.")
deltas = [self.min_delta] * n_datasets * n_metrics

@@ -391,6 +408,8 @@ def _final_iteration_check(self, env: CallbackEnv, eval_name_splitted: List[str]
)
if self.first_metric_only:
_log_info(f"Evaluated only: {eval_name_splitted[-1]}")
if self.chosen_metric is not None:
_log_info(f"Evaluated only: {self.chosen_metric}")
raise EarlyStopException(self.best_iter[i], self.best_score_list[i])

def __call__(self, env: CallbackEnv) -> None:
@@ -418,6 +437,8 @@ def __call__(self, env: CallbackEnv) -> None:
eval_name_splitted = env.evaluation_result_list[i][1].split(" ")
if self.first_metric_only and self.first_metric != eval_name_splitted[-1]:
continue # use only the first metric for early stopping
if (self.chosen_metric is not None) and self.chosen_metric != eval_name_splitted[-1]:
continue # use only the chosen metric for early stopping
if self._is_train_set(
ds_name=env.evaluation_result_list[i][0],
eval_name=eval_name_splitted[0],
@@ -432,6 +453,8 @@ def __call__(self, env: CallbackEnv) -> None:
_log_info(f"Early stopping, best iteration is:\n[{self.best_iter[i] + 1}]\t{eval_result_str}")
if self.first_metric_only:
_log_info(f"Evaluated only: {eval_name_splitted[-1]}")
if self.chosen_metric is not None:
_log_info(f"Evaluated only: {self.chosen_metric}")
raise EarlyStopException(self.best_iter[i], self.best_score_list[i])
self._final_iteration_check(env, eval_name_splitted, i)

@@ -453,6 +476,7 @@ def early_stopping(
first_metric_only: bool = False,
verbose: bool = True,
min_delta: Union[float, List[float]] = 0.0,
chosen_metric: Optional[str] = None,
) -> _EarlyStoppingCallback:
"""Create a callback that activates early stopping.

@@ -492,4 +516,5 @@
first_metric_only=first_metric_only,
verbose=verbose,
min_delta=min_delta,
chosen_metric=chosen_metric,
)
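
Taken together, the callback changes above would be used like this. This is a sketch of the API proposed in this diff, not an existing LightGBM interface; data as in the earlier sketch:

```python
# Proposed in this PR (unmerged): pick the early-stopping metric by name.
booster = lgb.train(
    {"objective": "regression", "metric": ["l2", "l1"], "verbosity": -1},
    train_set,
    num_boost_round=100,
    valid_sets=[valid_set],
    callbacks=[lgb.early_stopping(stopping_rounds=10, chosen_metric="l1")],
)
```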
16 changes: 15 additions & 1 deletion python-package/lightgbm/engine.py
@@ -191,7 +191,13 @@ def train(
if params["early_stopping_round"] is None:
params.pop("early_stopping_round")
first_metric_only = params.get("first_metric_only", False)

chosen_metric_early_stopping = params.get("chosen_metric_early_stopping", None)
# Ensure first_metric_only and chosen_metric_early_stopping are not used together
if first_metric_only and chosen_metric_early_stopping is not None:
    raise ValueError("Only one of first_metric_only and chosen_metric_early_stopping parameters should be used")

predictor: Optional[_InnerPredictor] = None
if isinstance(init_model, (str, Path)):
predictor = _InnerPredictor.from_model_file(model_file=init_model, pred_parameter=params)
@@ -241,6 +247,7 @@
callback.early_stopping(
stopping_rounds=params["early_stopping_round"], # type: ignore[arg-type]
first_metric_only=first_metric_only,
chosen_metric=chosen_metric_early_stopping,
verbose=_choose_param_value(
main_param_name="verbosity",
params=params,
@@ -716,6 +723,12 @@ def cv(
if params["early_stopping_round"] is None:
params.pop("early_stopping_round")
first_metric_only = params.get("first_metric_only", False)
chosen_metric_early_stopping = params.get("chosen_metric_early_stopping", None)
# Ensure first_metric_only and chosen_metric_early_stopping are not used together
if first_metric_only and chosen_metric_early_stopping is not None:
    raise ValueError("Only one of first_metric_only and chosen_metric_early_stopping parameters should be used")

if isinstance(init_model, (str, Path)):
predictor = _InnerPredictor.from_model_file(
@@ -765,6 +778,7 @@
callback.early_stopping(
stopping_rounds=params["early_stopping_round"], # type: ignore[arg-type]
first_metric_only=first_metric_only,
chosen_metric=chosen_metric_early_stopping,
verbose=_choose_param_value(
main_param_name="verbosity",
params=params,
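
The params-based route added in engine.py above would look like this (again a sketch of the proposed, unmerged API, reusing the earlier synthetic data):

```python
# Proposed in this PR: lgb.train() builds the early-stopping callback
# from these params, stopping on "l1" only.
params = {
    "objective": "regression",
    "metric": ["l2", "l1"],
    "early_stopping_round": 10,
    "chosen_metric_early_stopping": "l1",  # new parameter from this PR
}
booster = lgb.train(params, train_set, num_boost_round=100, valid_sets=[valid_set])
```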
1 change: 1 addition & 0 deletions src/io/config_auto.cpp
@@ -815,6 +815,7 @@ const std::unordered_map<std::string, std::vector<std::string>>& Config::paramet
{"extra_seed", {}},
{"early_stopping_round", {"early_stopping_rounds", "early_stopping", "n_iter_no_change"}},
{"first_metric_only", {}},
{"chosen_metric_early_stopping", {}},
{"max_delta_step", {"max_tree_output", "max_leaf_output"}},
{"lambda_l1", {"reg_alpha", "l1_regularization"}},
{"lambda_l2", {"reg_lambda", "lambda", "l2_regularization"}},