[QUESTION] Training Loss Much Lower Than Validation Loss in TSMixerModel: Need Help Understanding Why #2558
Comments
Hi @erl61, could you provide a minimal reproducible example, including the model training (and potentially the data processing) and the series you provide to fit and predict?
Hi @dennisbader, here's an example taken from the documentation but applied to my data. Unfortunately, I cannot share the actual data due to an NDA, but my code looks like this:
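(The actual snippet isn't preserved in this thread; below is a minimal sketch of that kind of setup, assuming `series_list` was created with `from_group_dataframe()` as shown in the Data section of the issue. Split points and hyperparameters are placeholders.)

```python
from darts.dataprocessing.transformers import Scaler
from darts.models import TSMixerModel

# Split each per-group series into train/validation parts (80/20 is a placeholder).
splits = [s.split_after(0.8) for s in series_list]
train = [t for t, _ in splits]
val = [v for _, v in splits]

# By default, a Scaler fitted on a list fits one transformer per series.
scaler = Scaler()
train_scaled = scaler.fit_transform(train)
val_scaled = scaler.transform(val)

model = TSMixerModel(
    input_chunk_length=30,
    output_chunk_length=7,
    n_epochs=100,
)
model.fit(train_scaled, val_series=val_scaled)

# Forecast and map predictions back to the original scale.
preds = model.predict(n=7, series=train_scaled)
preds = scaler.inverse_transform(preds)
```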
Could my issue be related to the large number of zeros in the dataset (about 10% of the data), or to the scale of the target variable (which ranges from zero to millions, even though I use a scaler)? Would these factors affect the loss calculation and produce such a large discrepancy between training and validation losses?
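(As a side note on the scaling point: a Darts `Scaler` fitted on a list of series fits each series independently by default, so very different magnitudes across groups are each mapped to [0, 1]; recent Darts versions also expose a `global_fit` flag to fit a single transformer across all series instead. A small self-contained sketch:)

```python
from darts.dataprocessing.transformers import Scaler
from darts.utils.timeseries_generation import linear_timeseries

small = linear_timeseries(start_value=0, end_value=100, length=50)
large = linear_timeseries(start_value=0, end_value=1_000_000, length=50)

# Default: each series gets its own min/max, so both end up spanning [0, 1].
per_series = Scaler().fit_transform([small, large])

# global_fit=True: one min/max over all series, so `small` stays tiny.
shared = Scaler(global_fit=True).fit_transform([small, large])

print(per_series[0].values().max(), per_series[1].values().max())  # ~1.0 ~1.0
print(shared[0].values().max(), shared[1].values().max())          # ~1e-4 ~1.0
```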
The nature of your time series data could indeed be the issue.
@dennisbader Thank you! Yes, the loss curve starts to look correct when I use a subset of the data with stable time series behavior and fewer zeros. My data has a long tail, with the majority of values falling between 0 and 100,000. The values increase over time, so the beginning of each series has lower values than the end. I have 650 different group combinations, which reflects the complexity of the business model. I'm using the Temporal Fusion Transformer from the pytorch-forecasting package, and it performs well with EncoderNormalizer, which normalizes each individual time series sequence during training. Is there something similar I can use in Darts?
We have the `use_reversible_instance_norm` option for this: torch-based Darts models accept it at model creation, and it normalizes each input sequence individually (and de-normalizes the corresponding output), similar to pytorch-forecasting's EncoderNormalizer.
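A sketch of how that would look for the model in question, assuming a Darts version in which torch-based models accept this flag (all other values are placeholders):

```python
from darts.models import TSMixerModel

# Normalizes each input window on the fly and inverts the
# normalization on the model output, per sequence.
model = TSMixerModel(
    input_chunk_length=30,
    output_chunk_length=7,
    use_reversible_instance_norm=True,
    n_epochs=100,
)
```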
Issue
I am training a TSMixerModel to forecast multivariate time series. The model performs well overall, but I notice that the training loss is consistently much lower than the validation loss (sometimes by orders of magnitude).
I have already tried different loss functions (MAELoss, MapeLoss), and the issue persists. However, when I forecast using this model, I don’t observe signs of overfitting, and the model predictions look good.
Callback
I use the following setup for logging the losses:
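(The original callback code isn't shown; a minimal sketch of a PyTorch Lightning callback collecting the `train_loss` / `val_loss` metrics that Darts logs per epoch — the exact structure is an assumption:)

```python
from pytorch_lightning.callbacks import Callback

class LossLogger(Callback):
    """Collects the per-epoch losses that Darts logs during fit()."""

    def __init__(self):
        self.train_loss = []
        self.val_loss = []

    def on_train_epoch_end(self, trainer, pl_module):
        self.train_loss.append(float(trainer.callback_metrics["train_loss"]))

    def on_validation_epoch_end(self, trainer, pl_module):
        # Note: the first entry comes from Lightning's pre-training sanity check.
        self.val_loss.append(float(trainer.callback_metrics["val_loss"]))

loss_logger = LossLogger()
```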
Model
This is how I initialize the model:
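(The exact initialization is missing; a sketch consistent with the description, wiring in the `loss_logger` from the Callback section above — every hyperparameter value is a placeholder:)

```python
from darts.models import TSMixerModel
from darts.utils.losses import MapeLoss  # MAELoss was also tried

model = TSMixerModel(
    input_chunk_length=30,
    output_chunk_length=7,
    hidden_size=64,
    dropout=0.1,
    loss_fn=MapeLoss(),
    batch_size=256,
    n_epochs=100,
    pl_trainer_kwargs={"callbacks": [loss_logger]},
    random_state=42,
)
```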
Loss curves
Here are the plotted loss curves after training; the training curve sits consistently below the validation curve, sometimes by orders of magnitude.
Data
I create my multivariate time series using from_group_dataframe() as follows:
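(A self-contained sketch of that call; the column names, frequency, and synthetic values are stand-ins for the NDA-protected data:)

```python
import numpy as np
import pandas as pd
from darts import TimeSeries

# Synthetic long-format frame standing in for the real data.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": np.repeat([f"g{i}" for i in range(3)], 100),
    "date": np.tile(pd.date_range("2023-01-01", periods=100, freq="D"), 3),
    "target": rng.random(300) * 1_000_000,
})

# One TimeSeries per unique combination of group_cols
# (~650 combinations in the real data); passing several
# value_cols would make each series multivariate.
series_list = TimeSeries.from_group_dataframe(
    df,
    group_cols="group",
    time_col="date",
    value_cols="target",
    freq="D",
)
```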
Question
Why is my training loss significantly lower than the validation loss, sometimes by orders of magnitude? Could it be related to how the data is structured as a list of time series? Is this expected behavior in this scenario, or could there be an issue with scaling or loss calculation?
I appreciate any help or insights!
Thanks!