TiDE Model Stops At A Specific Epoch #2496
Comments
Hi @ETTAN93, does this happen with one specific dataset or with every dataset you try with the model? Does reducing the size of the model change the epoch at which the process gets stuck? Can you share a reproducible example so that we can better investigate the source of the problem? Please include the arguments used as well as some synthetic data with features similar to the data you're using. |
Hi @madtoinou, this happens with a specific dataset that I am using, but I did a bit more testing around the issue and discovered a few things:
This would cause the model to fail at 77% of Epoch 5. When reducing the amount of data to
The model successfully completes Epoch 5. When increasing the train data by 1 extra year while keeping the test set at 1 year:
The model fails at Epoch 5 again, but at 92%. So it seems like the amount of train data could be causing it. Have you run into this before? The dataset below can be used to replicate the issue:
|
Hi @madtoinou, have there been any updates on this issue? |
Hi @ETTAN93, I tried your code snippet (with some corrections to make it run) and could not reproduce the issue (with the latest version of master). If the model fails during the 5th epoch, it means that the issue occurs during training, not during the historical forecasts (which use the optimized routine and rely on only one DataLoader). The problem might be hardware or memory related...
import numpy as np
import pandas as pd
from darts import TimeSeries
from darts.models import TiDEModel
num_rows = 175319
num_columns = 88
start_date = '2019-08-15 03:15:00'
end_date = '2024-08-14 08:45:00'
# Generate random float data
data = np.random.rand(num_rows, num_columns) * 100
# Generate the DatetimeIndex with a frequency of 15 minutes
datetime_index = pd.date_range(start=start_date, end=end_date, freq='15min', name='timestamp_utc')  # '15min' replaces the deprecated '15T' alias
column_names = [f'column_{i+1}' for i in range(num_columns)]
# Create the DataFrame
test_df = pd.DataFrame(data, columns = column_names)
test_df.index = datetime_index
start_date = pd.Timestamp('2019-09-01 00:00:00')
split_date = pd.Timestamp('2022-12-31 00:00:00')
end_date = pd.Timestamp('2023-12-31 00:00:00')
target_series = TimeSeries.from_dataframe(test_df[["column_1"]])[start_date:end_date] #should contain 1 column only
future_cov_series = TimeSeries.from_dataframe(test_df[[f'column_{i+1}' for i in range(1, 23)]])[start_date:end_date] #should contain 22 columns
past_cov_series = TimeSeries.from_dataframe(test_df[[f'column_{i+1}' for i in range(23, 88)]])[start_date:end_date] #should contain 65 columns
target_train = target_series[start_date:split_date]
future_cov_train = future_cov_series[start_date:split_date]
past_cov_train = past_cov_series[start_date:split_date]
tide_model = TiDEModel(
    input_chunk_length=8,
    output_chunk_length=3,
    n_epochs=6,
    pl_trainer_kwargs={"accelerator": "cpu"},
)
tide_model.fit(
    series=target_train,
    past_covariates=past_cov_train,
    future_covariates=future_cov_train,
)
tide_hf_results = tide_model.historical_forecasts(
    series=target_series,
    past_covariates=past_cov_series,
    future_covariates=future_cov_series,
    start=split_date,  # can change to different date examples mentioned above
    retrain=False,
    forecast_horizon=3,
    stride=1,
    train_length=None,
    verbose=True,
    last_points_only=False,
)
|
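To check whether the stall point lines up with a particular stretch of the training data, the progress-bar percentage can be translated into an approximate batch index. The sketch below is only a rough estimate and rests on assumptions that are not confirmed in this thread: that each training sample is one sliding window of `input_chunk_length + output_chunk_length` points, that the batch size is 32 (believed to be the Darts default, since the snippet above does not set it), and that the last partial batch is kept. The helper names `count_points`, `batches_per_epoch` and `stalled_batch` are hypothetical.

```python
import math
from datetime import datetime, timedelta


def count_points(start, end, step):
    """Number of timestamps on a regular grid from start to end, both included."""
    return (end - start) // step + 1


def batches_per_epoch(n_points, input_chunk_length, output_chunk_length, batch_size=32):
    """Assumption: one sample per sliding window, last partial batch kept."""
    n_samples = n_points - (input_chunk_length + output_chunk_length) + 1
    return math.ceil(n_samples / batch_size)


def stalled_batch(progress, n_batches):
    """Approximate batch index for a progress-bar fraction such as 0.77."""
    return int(progress * n_batches)


if __name__ == "__main__":
    # Training range and frequency taken from the synthetic example above.
    n_points = count_points(
        datetime(2019, 9, 1), datetime(2022, 12, 31), timedelta(minutes=15)
    )
    n_batches = batches_per_epoch(n_points, input_chunk_length=8, output_chunk_length=3)
    print(n_points, n_batches, stalled_batch(0.77, n_batches))
```

Comparing the estimated index across the different training ranges may help tell a data-dependent stall from a resource-dependent one, although this is only a heuristic.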
A more general question: I am trying to run a historical backtest using the TiDE model for my use case:
For some reason, the model always stalls at a specific point (77% of Epoch 5). I can see that the kernel is still running under the hood, but the progress bar no longer moves. I have tried increasing the memory and CPU by 3x, but the model still stalls at exactly the same point. Has anyone encountered this issue before, and are there any suggested solutions?
No error messages are returned at all, so I am not sure how to debug the issue.
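One generic way to see where a silent hang occurs is Python's standard-library `faulthandler`, which can periodically write the stack of every thread to a file. The sketch below is not specific to Darts or TiDE. The helper names, the log file name and the timeout are assumptions chosen for illustration. It writes to a real file because `faulthandler` needs a file with a file descriptor, which a notebook's `sys.stderr` may not provide.

```python
import faulthandler


def start_hang_watchdog(log_path="hang_traces.log", timeout_s=600):
    """Dump all thread stacks to log_path every timeout_s seconds.

    Returns the open file object, which must stay open while the watchdog runs.
    """
    log_file = open(log_path, "w")
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)
    return log_file


def stop_hang_watchdog(log_file):
    """Cancel the periodic dump and close the log file."""
    faulthandler.cancel_dump_traceback_later()
    log_file.close()
```

A possible usage would be to call `start_hang_watchdog()` just before `fit(...)`, call `stop_hang_watchdog(...)` afterwards, and inspect the last traceback in the log once the progress bar stops moving.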