TiDE Model Stops At A Specific Epoch #2496

Open
ETTAN93 opened this issue Aug 12, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@ETTAN93

ETTAN93 commented Aug 12, 2024

A more general question: I am trying to run a historical backtest using the TiDE model for my use case:

from darts.models import TiDEModel

tide_model = TiDEModel(
    input_chunk_length=8,
    output_chunk_length=3,
    n_epochs=20
)

tide_model.fit(
    series=...,
    past_covariates=...,
    future_covariates=...,
)

tide_hf_results = tide_model.historical_forecasts(
    ...
)

For some reason, the model always stalls at a specific point (77% of Epoch 5). I can see that the kernel is still running under the hood, but the progress bar no longer moves. I have tried increasing the memory and CPU by 3x, but the model still stalls at exactly the same point. Has anyone encountered this issue before, and are there any suggested solutions?

No error messages are returned at all so I am not sure how to debug the issue.
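A general-purpose sketch (not specific to Darts) for locating a silent hang: Python's standard-library `faulthandler` module can periodically dump the stack of every thread to a file while a long call is running, which shows where the process is sitting when the progress bar stops. The names `run_with_hang_dump` and `slow_stand_in` are hypothetical; the stand-in only sleeps and would be replaced by the real `tide_model.fit(...)` call. A real file is passed because `faulthandler` needs a file descriptor, which a notebook's `sys.stderr` may not provide.

```python
import faulthandler
import tempfile
import time


def run_with_hang_dump(fn, timeout_s, log_file):
    """Run fn(); while it runs, dump all thread stacks to log_file every timeout_s seconds."""
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)
    try:
        return fn()
    finally:
        # Stop the watchdog once the call returns or raises.
        faulthandler.cancel_dump_traceback_later()


def slow_stand_in():
    # Hypothetical stand-in for a long-running call such as tide_model.fit(...).
    time.sleep(0.5)
    return "done"


with tempfile.TemporaryFile(mode="w+") as log_file:
    result = run_with_hang_dump(slow_stand_in, timeout_s=0.1, log_file=log_file)
    log_file.seek(0)
    dump = log_file.read()

print(dump)
```

For a real training run, a timeout of a few minutes and a named log file on disk would be more practical; the last dump written before the process is killed shows the frames in which each thread was waiting.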


@madtoinou
Collaborator

Hi @ETTAN93,

Does this happen with a specific dataset or with all the datasets you're trying to use with the model? Does reducing the size of the model change the epoch at which the process gets stuck?

Could you share a reproducible example so that we can better investigate the source of the problem? Please include the arguments used as well as some synthetic data with features similar to the data you're using.

@ETTAN93
Author

ETTAN93 commented Aug 30, 2024

Hi @madtoinou, this happens with a specific dataset that I am using. I did a bit more testing around the issue and discovered a few things:

  1. The model runs completely fine with n_epochs=5. If I set n_epochs >= 6, the progress bar gets stuck at 77% of Epoch 5, as previously mentioned.
  2. The amount of data in the train or test set when carrying out the historical forecast seems to be causing the issue. I tried the date ranges below with n_epochs=6.
    Original dates:
start_date = '2019-09-01 00:00:00'
split_date = '2023-01-31 23:59:00'
end_date = '2024-05-31 23:59:00'

This causes the model to stall at 77% of Epoch 5.

When reducing the amount of data to

start_date = '2019-09-01 00:00:00'
split_date = '2021-12-31 23:59:00'
end_date = '2022-12-31 23:59:00'

the model successfully completes Epoch 5.

When increasing the train data by 1 extra year while keeping the test set at 1 year:

start_date = '2019-09-01 00:00:00'
split_date = '2022-12-31 23:59:00'
end_date = '2023-12-31 23:59:00'

The model stalls in Epoch 5 again, but at 92%. So it seems the amount of train data could be causing it. Have you seen this before?
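One way to compare the three runs above is to convert each progress-bar percentage into an approximate count of training batches processed. The sketch below is a back-of-the-envelope calculation under several assumptions that the thread does not confirm: `batch_size` was left at the Darts default of 32, the training set yields one sample per sliding window of `input_chunk_length + output_chunk_length` rows, the data is complete at 15-minute frequency, and the progress bar numbers epochs from 0. The helper names are hypothetical.

```python
import math
from datetime import datetime, timedelta

INPUT_CHUNK_LENGTH = 8   # from the report
OUTPUT_CHUNK_LENGTH = 3  # from the report
BATCH_SIZE = 32          # assumption: Darts default, not stated in the report
STEP = timedelta(minutes=15)
START = "2019-09-01 00:00:00"


def batches_per_epoch(split, start=START):
    """Approximate number of training batches per epoch for one target series."""
    span = datetime.fromisoformat(split) - datetime.fromisoformat(start)
    n_rows = span // STEP + 1  # rows on the 15-minute grid between start and split
    # Assumption: one training sample per sliding window of input + output length.
    n_samples = n_rows - (INPUT_CHUNK_LENGTH + OUTPUT_CHUNK_LENGTH) + 1
    return math.ceil(n_samples / BATCH_SIZE)


def steps_at_stall(split, epoch_index, fraction):
    """Approximate number of batches processed when the progress bar stopped."""
    per_epoch = batches_per_epoch(split)
    return epoch_index * per_epoch + round(fraction * per_epoch)


stalled_a = steps_at_stall("2023-01-31 23:59:00", epoch_index=5, fraction=0.77)
stalled_b = steps_at_stall("2022-12-31 23:59:00", epoch_index=5, fraction=0.92)
finished = 6 * batches_per_epoch("2021-12-31 23:59:00")  # all 6 epochs completed

print(stalled_a, stalled_b, finished)  # 21620 21632 15354
```

Under these assumptions the two stalled runs stop within a few batches of each other (the percentages are only reported to the nearest 1%, which is about 37 batches here), while the run that completed never reaches that count. This would be consistent with the stall depending on the total number of batches processed rather than on a particular epoch, but it is only an estimate and does not identify a cause.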

The dataset below can be used to replicate the issue:

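import numpy as np
import pandas as pd
from darts import TimeSeries
from darts.models import TiDEModel

num_rows = 175319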
num_columns = 88
start_date = '2019-08-15 03:15:00'
end_date = '2024-08-14 08:45:00'

# Generate random float data
data = np.random.rand(num_rows, num_columns) * 100

# Generate the DatetimeIndex with a frequency of 15 minutes
datetime_index = pd.date_range(start=start_date, end=end_date, freq='15T', name='timestamp_utc')

column_names = [f'column_{i+1}' for i in range(num_columns)]

# Create the DataFrame
test_df = pd.DataFrame(data, columns = column_names)
test_df.index = datetime_index

# Timestamps (not plain strings) are used so that slicing a TimeSeries selects by time
start_date = pd.Timestamp('2019-09-01 00:00:00')
split_date = pd.Timestamp('2022-12-31 23:59:00')
end_date = pd.Timestamp('2023-12-31 23:59:00')

target_series = TimeSeries.from_dataframe(test_df[['column_1']])[start_date:end_date] #should contain 1 column only
future_cov_series = TimeSeries.from_dataframe(test_df[...])[start_date:end_date] #should contain 22 columns
past_cov_series = TimeSeries.from_dataframe(test_df[...])[start_date:end_date] #should contain 65 columns

target_train = target_series[start_date:split_date]
future_cov_train = future_cov_series[start_date:split_date]
past_cov_train = past_cov_series[start_date:split_date]

tide_model = TiDEModel(
    input_chunk_length=8,
    output_chunk_length=3,
    n_epochs=6
)

tide_model.fit(
    series=target_train,
    past_covariates=past_cov_train,
    future_covariates=future_cov_train,
)

tide_hf_results = tide_model.historical_forecasts(
    series=target_series,
    past_covariates=past_cov_series,
    future_covariates=future_cov_series,
    start=split_date, #can change to different date examples mentioned above
    retrain=False,
    forecast_horizon=3,
    stride=1,
    train_length=None,
    verbose=True,
    last_points_only=False,
)

@madtoinou added the bug (Something isn't working) label Sep 2, 2024
@ETTAN93
Author

ETTAN93 commented Sep 30, 2024

Hi @madtoinou, have there been any updates on this issue?

@madtoinou
Collaborator

Hi @ETTAN93,

I tried your code snippet (with some corrections to make it run) and could not reproduce the issue (with the latest version of master). If the model fails during the 5th epoch, the issue occurs during training, not during the historical forecasts (which use the optimized routine and rely on only one DataLoader). The problem might be hardware or memory related...

import numpy as np
import pandas as pd
from darts import TimeSeries
from darts.models import TiDEModel

num_rows = 175319
num_columns = 88
start_date = '2019-08-15 03:15:00'
end_date = '2024-08-14 08:45:00'

# Generate random float data
data = np.random.rand(num_rows, num_columns) * 100

# Generate the DatetimeIndex with a frequency of 15 minutes
datetime_index = pd.date_range(start=start_date, end=end_date, freq='15T', name='timestamp_utc')
column_names = [f'column_{i+1}' for i in range(num_columns)]

# Create the DataFrame
test_df = pd.DataFrame(data, columns = column_names)
test_df.index = datetime_index

start_date = pd.Timestamp('2019-09-01 00:00:00')
split_date = pd.Timestamp('2022-12-31 00:00:00')
end_date = pd.Timestamp('2023-12-31 00:00:00')

target_series = TimeSeries.from_dataframe(test_df[["column_1"]])[start_date:end_date] #should contain 1 column only
future_cov_series = TimeSeries.from_dataframe(test_df[[f'column_{i+1}' for i in range(1, 23)]])[start_date:end_date] #should contain 22 columns
past_cov_series = TimeSeries.from_dataframe(test_df[[f'column_{i+1}' for i in range(23, 88)]])[start_date:end_date] #should contain 65 columns

target_train = target_series[start_date:split_date]
future_cov_train = future_cov_series[start_date:split_date]
past_cov_train = past_cov_series[start_date:split_date]

tide_model = TiDEModel(
    input_chunk_length=8,
    output_chunk_length=3,
    n_epochs=6,
    pl_trainer_kwargs={"accelerator":"cpu"}
)

tide_model.fit(
    series =target_train ,
    past_covariates= past_cov_train,
    future_covariates= future_cov_train 
)
    
tide_hf_results = tide_model.historical_forecasts(
    series=target_series, 
    past_covariates= past_cov_series,
    future_covariates= future_cov_series,
    start=split_date, #can change to different date examples mentioned above
    retrain=False,
    forecast_horizon=3,
    stride=1,
    train_length = None,
    verbose=True,
    last_points_only=False,
) 
