TiDE Model Stops At A Specific Epoch #2496

ETTAN93 · 2024-08-12T12:23:25Z

A more general question: I am trying to run a historical backtest using the TiDE model for my use case:

from darts.models import TiDEModel

tide_model = TiDEModel(
    input_chunk_length=8,
    output_chunk_length=3,
    n_epochs=20,
)

tide_model.fit(
    series=...,
    past_covariates=...,
    future_covariates=...,
)

tide_hf_results = tide_model.historical_forecasts(
...
)

For some reason, the model always stalls at a specific point (77% of Epoch 5). I can see that the kernel is still running under the hood, but the progress bar stops moving. I have tried increasing the memory and CPU by 3x, but the model still stalls at exactly the same point. Has anyone run into this issue before, and are there any suggested solutions?

No error messages are returned at all so I am not sure how to debug the issue.
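
One way to get some information out of a silent stall like this is Python's standard-library `faulthandler` module, which can dump the stack of every thread if a call is still running after a timeout. The sketch below is only an illustration: the wrapper name `run_with_stall_dump` and the 600-second default are hypothetical choices, not part of Darts.

```python
import faulthandler
import sys


def run_with_stall_dump(fn, timeout_s=600, file=sys.stderr):
    """Run fn(); while it is still running, dump all thread stacks to
    `file` every `timeout_s` seconds (hypothetical helper, not a Darts API).

    `file` must be a real file object with a file descriptor and must stay
    open until this function returns.
    """
    # Arm a watchdog that writes the tracebacks of all threads after the timeout,
    # and again after each further timeout interval.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=file)
    try:
        return fn()
    finally:
        # Disarm the watchdog once fn() has finished or raised.
        faulthandler.cancel_dump_traceback_later()
```

Assuming a configured model and series, training could then be wrapped as `run_with_stall_dump(lambda: tide_model.fit(...))`. If the progress bar freezes, the dumped stack shows which Python frame each thread is sitting in, which is a starting point for telling a data-loading hang from a compute hang.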

madtoinou · 2024-08-13T11:19:25Z

Hi @ETTAN93,

Does this happen with a specific dataset or with all the datasets you're trying to use with the model? Does reducing the size of the model change the epoch at which the process gets stuck?

Can you share a reproducible example so that we can better investigate the source of the problem? Please include the arguments used as well as some synthetic data with features similar to the data you're using.

ETTAN93 · 2024-08-30T19:52:55Z

Hi @madtoinou, this happens with a specific dataset that I am using, but I did a bit more testing around the issue and discovered a few things:

The model runs completely fine if n_epochs = 5. If I set n_epochs >= 6, the progress bar gets stuck at 77% of Epoch 5, as previously mentioned.
It seems like the amount of data in the train or test set when carrying out the historical forecast could be causing the issue. I tried setting n_epochs = 6 with the date ranges below.
Original dates:

start_date= '2019-09-01 00:00:00'
split_date= '2023-01-31 23:59:00' 
end_date= '2024-05-31 23:59:00'

This causes the model to stall at 77% of Epoch 5.

When reducing the amount of data to:

start_date= '2019-09-01 00:00:00'
split_date= '2021-12-31 23:59:00'
end_date= '2022-12-31 23:59:00'

The model successfully completes Epoch 5.

When increasing the train data by 1 extra year while keeping test set at 1 year:

start_date= '2019-09-01 00:00:00'
split_date= '2022-12-31 23:59:00'
end_date= '2023-12-31 23:59:00'

The model stalls at Epoch 5 again, but this time at 92%. So it seems like the amount of training data could be causing it. Have you seen anything like this before?
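
To compare these runs, it can help to translate the progress-bar percentage into a batch count. The sketch below estimates the number of training batches per epoch for a date range. It rests on assumptions that are not confirmed in this thread: a regular 15-minute grid starting exactly at the start date, one training sample per sliding window of `input_chunk_length + output_chunk_length` points, and a batch size of 32 (believed to be the Darts default, but please verify for your version). The function name `batches_per_epoch` is hypothetical.

```python
import math
from datetime import datetime, timedelta


def batches_per_epoch(start, end, step=timedelta(minutes=15),
                      input_chunk_length=8, output_chunk_length=3,
                      batch_size=32):
    """Rough estimate of training batches per epoch (hypothetical helper).

    Assumes one sample per sliding window over a gap-free series and a
    final partial batch that is kept.
    """
    # Number of grid points between start and end, both ends included.
    n_points = (end - start) // step + 1
    # Each sample needs input_chunk_length + output_chunk_length points.
    n_samples = n_points - (input_chunk_length + output_chunk_length) + 1
    return math.ceil(n_samples / batch_size)


# Training ranges from the experiments above (start date to split date).
for split in (datetime(2021, 12, 31, 23, 59),
              datetime(2022, 12, 31, 23, 59),
              datetime(2023, 1, 31, 23, 59)):
    n = batches_per_epoch(datetime(2019, 9, 1), split)
    print(split.date(), "batches per epoch:", n)
```

Multiplying the batch count by the observed fraction (0.77 or 0.92) and adding the batches of the completed epochs gives the approximate total number of batches processed before the stall. If that total is similar across runs, the stall may depend on elapsed work rather than on a specific row of data. Note that if training batches are shuffled, a batch index does not map to fixed timestamps.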

The dataset below can be used to replicate the issue:

import numpy as np
import pandas as pd
from darts import TimeSeries
from darts.models import TiDEModel

num_rows = 175319
num_columns = 88
start_date = '2019-08-15 03:15:00'
end_date = '2024-08-14 08:45:00'

# Generate random float data
data = np.random.rand(num_rows, num_columns) * 100

# Generate the DatetimeIndex with a frequency of 15 minutes
datetime_index = pd.date_range(start=start_date, end=end_date, freq='15min', name='timestamp_utc')

column_names = [f'column_{i+1}' for i in range(num_columns)]

# Create the DataFrame
test_df = pd.DataFrame(data, columns=column_names)
test_df.index = datetime_index

# Use pd.Timestamp so that slicing a TimeSeries selects by time
start_date = pd.Timestamp('2019-09-01 00:00:00')
split_date = pd.Timestamp('2022-12-31 23:59:00')
end_date = pd.Timestamp('2023-12-31 23:59:00')

target_series = TimeSeries.from_dataframe(test_df[['column_1']])[start_date:end_date]  # should contain 1 column only
future_cov_series = TimeSeries.from_dataframe(test_df[....])[start_date:end_date]  # should contain 22 columns
past_cov_series = TimeSeries.from_dataframe(test_df[...])[start_date:end_date]  # should contain 65 columns

target_train = target_series[start_date:split_date]
future_cov_train = future_cov_series[start_date:split_date]
past_cov_train = past_cov_series[start_date:split_date]

tide_model = TiDEModel(
    input_chunk_length=8,
    output_chunk_length=3,
    n_epochs=6,
)

tide_model.fit(
    series=target_train,
    past_covariates=past_cov_train,
    future_covariates=future_cov_train,
)

tide_hf_results = tide_model.historical_forecasts(
    series=target_series,
    past_covariates=past_cov_series,
    future_covariates=future_cov_series,
    start=split_date,  # can change to different date examples mentioned above
    retrain=False,
    forecast_horizon=3,
    stride=1,
    train_length=None,
    verbose=True,
    last_points_only=False,
)

ETTAN93 · 2024-09-30T17:29:26Z

Hi @madtoinou, have there been any updates on this issue?

madtoinou · 2024-10-01T07:37:49Z

Hi @ETTAN93,

I tried your code snippet (with some corrections to make it run) and could not reproduce the issue (with the latest version of master). If the model stalls during the 5th epoch, the problem occurs during training, not during the historical forecasts (which use the optimized routine and rely on only one DataLoader). The problem might be hardware or memory related...

import numpy as np
import pandas as pd
from darts import TimeSeries
from darts.models import TiDEModel

num_rows = 175319
num_columns = 88
start_date = '2019-08-15 03:15:00'
end_date = '2024-08-14 08:45:00'

# Generate random float data
data = np.random.rand(num_rows, num_columns) * 100

# Generate the DatetimeIndex with a frequency of 15 minutes
datetime_index = pd.date_range(start=start_date, end=end_date, freq='15min', name='timestamp_utc')
column_names = [f'column_{i+1}' for i in range(num_columns)]

# Create the DataFrame
test_df = pd.DataFrame(data, columns = column_names)
test_df.index = datetime_index

start_date = pd.Timestamp('2019-09-01 00:00:00')
split_date = pd.Timestamp('2022-12-31 00:00:00')
end_date = pd.Timestamp('2023-12-31 00:00:00')

target_series = TimeSeries.from_dataframe(test_df[["column_1"]])[start_date:end_date] #should contain 1 column only
future_cov_series = TimeSeries.from_dataframe(test_df[[f'column_{i+1}' for i in range(1, 23)]])[start_date:end_date] #should contain 22 columns
past_cov_series = TimeSeries.from_dataframe(test_df[[f'column_{i+1}' for i in range(23, 88)]])[start_date:end_date] #should contain 65 columns

target_train = target_series[start_date:split_date]
future_cov_train = future_cov_series[start_date:split_date]
past_cov_train = past_cov_series[start_date:split_date]

tide_model = TiDEModel(
    input_chunk_length=8,
    output_chunk_length=3,
    n_epochs=6,
    pl_trainer_kwargs={"accelerator":"cpu"}
)

tide_model.fit(
    series=target_train,
    past_covariates=past_cov_train,
    future_covariates=future_cov_train,
)

tide_hf_results = tide_model.historical_forecasts(
    series=target_series,
    past_covariates=past_cov_series,
    future_covariates=future_cov_series,
    start=split_date,  # can change to different date examples mentioned above
    retrain=False,
    forecast_horizon=3,
    stride=1,
    train_length=None,
    verbose=True,
    last_points_only=False,
)
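
To check the memory hypothesis mentioned above, one option is to record the process's peak resident memory while training runs. The following is a minimal sketch using only the standard library. It assumes a Unix-like system (the `resource` module is not available on Windows), and the helper name `run_with_memory_log` and the 30-second interval are hypothetical.

```python
import resource
import sys
import threading
import time


def run_with_memory_log(fn, interval_s=30.0, out=sys.stderr):
    """Run fn() while a background thread logs the process's peak RSS
    every `interval_s` seconds (hypothetical helper, Unix only).

    Returns (result_of_fn, samples), where samples is a list of
    (elapsed_seconds, peak_rss) tuples.
    """
    samples = []
    stop = threading.Event()

    def watch():
        t0 = time.monotonic()
        # Event.wait returns False on timeout, True once stop is set.
        while not stop.wait(interval_s):
            # ru_maxrss is the peak RSS: kilobytes on Linux, bytes on macOS.
            peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            elapsed = time.monotonic() - t0
            samples.append((elapsed, peak))
            print(f"[{elapsed:8.1f}s] peak RSS: {peak}", file=out, flush=True)

    watcher = threading.Thread(target=watch, daemon=True)
    watcher.start()
    try:
        result = fn()
    finally:
        stop.set()
        watcher.join()
    return result, samples
```

With the model defined above, this could be called as `run_with_memory_log(lambda: tide_model.fit(...))`. A peak that keeps climbing until the stall would point towards memory pressure, while a flat peak would make that explanation less likely. If the log lines themselves stop appearing, the Python interpreter as a whole is probably blocked, which is also useful to know.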

madtoinou added the "bug" (Something isn't working) label on Sep 2, 2024