
Include validation series with hyperparameter optimization in Darts #2301

Closed
ETTAN93 opened this issue Apr 4, 2024 · 5 comments
Labels: question (Further information is requested)

Comments


ETTAN93 commented Apr 4, 2024

When tuning hyperparameters for non-time-series data, one would normally split the dataset into a training set, a validation set, and a test set. The validation set is then used to determine which set of hyperparameters performs best.

How does this work for a historical backtest in time-series forecasting? I referred to the two examples in Darts: example1 and example2.

For example, when just doing a normal historical backtest, assuming I have hourly data from 2020-01-01 to 2023-12-31, I would simply specify when the test set starts, e.g. 2023-01-01, and carry out the historical backtest that way, e.g.

model_estimator.historical_forecasts(
    series=target_hf,
    past_covariates=None,
    future_covariates=future_cov_hf,
    start='2023-01-01',
    retrain=30,
    forecast_horizon=24,
    stride=24,
    train_length=2160,
    verbose=True,
    last_points_only=False,
)

This means that the model is retrained every 30 backtest iterations (with a stride of 24 hours, i.e. every 30 days) on the past 90 days of data (train_length=2160 hours), and it predicts the next 24 hours every 24 hours.

If I now want to do hyperparameter optimization with Optuna and Darts, would this make sense:

from darts.models import LinearRegressionModel

def objective(trial):
    forecast_horizon = 24

    fc_lags_dict = {}
    for feature in future_cov:
        future_cov_lags_lower_bound = trial.suggest_int(f'fc_lb_{feature}', -96, -1)
        future_cov_lags_upper_bound = trial.suggest_int(f'fc_up_{feature}', 1, 72)
        fc_lags_dict[feature] = list(range(future_cov_lags_lower_bound, future_cov_lags_upper_bound))

    target_lags_lower_bound = trial.suggest_int('target_lags_lower_bound', -96, -1)

    model = LinearRegressionModel(
        lags=list(range(target_lags_lower_bound, 0)),
        lags_past_covariates=None,
        lags_future_covariates=fc_lags_dict,
        output_chunk_length=forecast_horizon,
        multi_models=True,
    )

    hf_results = model.historical_forecasts(
        series=target_hf,
        past_covariates=None,
        future_covariates=future_cov_hf,
        start='2023-01-01',
        retrain=30,
        forecast_horizon=24,
        stride=24,
        train_length=2160,
        verbose=True,
        last_points_only=False,
    )

    mae = return_metrics(hf_results)
    return mae

But this then uses the full set of data to do the hyperparameter optimization. Do I need to split the data out separately for the test set?
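For reference, here is a minimal sketch of how this objective could be wired into an Optuna study; return_metrics is a hypothetical helper (it is not defined in the thread), assumed here to average the MAE over the list of backtest forecasts:

import optuna
from darts.metrics import mae as mae_metric

def return_metrics(hf_results):
    # hf_results is a list of forecast series (last_points_only=False);
    # average each forecast's MAE against the overlapping slice of the target.
    return sum(mae_metric(target_hf, fc) for fc in hf_results) / len(hf_results)

study = optuna.create_study(direction="minimize")  # lower MAE is better
study.optimize(objective, n_trials=50)
print(study.best_params)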

@dennisbader (Collaborator)

Hi @ETTAN93, yes, for this you can simply define a val_end as a pd.Timestamp and then set series=target_hf[:val_end] when calling historical_forecasts.

For the final test set, adjust the start date to be after the val_end and use the entire target_hf.
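For concreteness, a minimal sketch of that slicing (the timestamps are illustrative assumptions for the hourly example above):

import pandas as pd

# illustrative cut-off for the validation period (hourly data)
val_end = pd.Timestamp("2022-12-31 23:00")

# hyperparameter tuning: backtest only on data up to val_end
val_series = target_hf[:val_end]

# final evaluation: full series, with the backtest starting after val_end
test_start = val_end + pd.Timedelta(hours=1)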

@madtoinou added the question label (Further information is requested) on Apr 8, 2024

ETTAN93 commented Apr 28, 2024

Hi @dennisbader, just to clarify what you mean:

Assuming I have a dataset that goes from 2020-01-01 to 2023-12-31, are you saying to split the dataset into, for example:

train_start = '2020-01-01'
val_start = '2022-01-01'
val_end = '2022-12-31'
test_start = '2023-01-01'

Then within the objective function for hyperparameter optimization, you would set:

hf_results = model.historical_forecasts(
    series=target_hf[:val_end],
    past_covariates=None,
    future_covariates=future_cov_hf,
    start=val_start,
    retrain=30,
    forecast_horizon=24,
    stride=24,
    train_length=2160,
    verbose=True,
    last_points_only=False,
)

After getting the best hyperparameters, you would then evaluate on the test set with:

hf_results = model.historical_forecasts(
    series=target_hf,
    past_covariates=None,
    future_covariates=future_cov_hf,
    start=test_start,
    retrain=30,
    forecast_horizon=24,
    stride=24,
    train_length=2160,
    verbose=True,
    last_points_only=False,
)

Is that correct?

@dennisbader (Collaborator)

Hi @ETTAN93, yes, that's exactly it 👍 (assuming that your frequency is "D"/daily)


noori11 commented May 29, 2024

Hi @dennisbader,

It seems as if the model's hyperparameters are being tuned on the interval from 2022-01-01 until 2022-12-31, then used for all the forecasts made from 2023-01-01 onward.

However, what if you wanted to do hyperparameter optimization every month in an expanding- or sliding-window cross-validation instead? How would you structure that using Darts?

@madtoinou (Collaborator)

Hi @noori11,

This is usually the way to go; you train the model with as much data as possible "before" the validation set, use the validation set to identify the best parameters, and then just assess the performance on the test set.

If by "hyper-parameter optimization every month" you mean generating/assessing forecasts only once per month, you would have to re-use the trick described in #2497 to get forecasts at the desired frequency before computing the metrics. But I would highly recommend using the code snippet mentioned above in this thread.
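To tie the thread together, here is a sketch of evaluating the best trial on the untouched test period, under the same assumptions as the earlier snippets (the hypothetical return_metrics helper, an Optuna study built around the objective above, and the illustrative test_start timestamp):

from darts.models import LinearRegressionModel

# rebuild the model from the best trial; the parameter names follow
# the objective() sketched earlier in this thread
best = study.best_params
fc_lags_dict = {
    feature: list(range(best[f"fc_lb_{feature}"], best[f"fc_up_{feature}"]))
    for feature in future_cov
}
best_model = LinearRegressionModel(
    lags=list(range(best["target_lags_lower_bound"], 0)),
    lags_future_covariates=fc_lags_dict,
    output_chunk_length=24,
    multi_models=True,
)

# one final backtest on the test period that was never used for tuning
test_forecasts = best_model.historical_forecasts(
    series=target_hf,
    future_covariates=future_cov_hf,
    start=test_start,
    retrain=30,
    forecast_horizon=24,
    stride=24,
    train_length=2160,
    last_points_only=False,
)
test_mae = return_metrics(test_forecasts)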

Closing this since the initial question was answered, feel free to reopen this issue or open a new one if something is still not clear.
