Model cannot be automatically initialized in the same programming file #976

Closed
ZhikangLai opened this issue Apr 18, 2024 · 7 comments

@ZhikangLai

What happened + What you expected to happen

Hello there. I found an interesting phenomenon: when I use a model from NeuralForecast multiple times in the same program file, only the first use yields good results, while subsequent uses produce poor outcomes.

You can see the example I have uploaded.

When I first ran an experiment with 'sigma=20' using NeuralForecast, the results were promising. However, when I then ran an experiment with 'sigma=30', the results deteriorated significantly, as shown in the following figure:
[figure1: forecasts for the sigma=30 run performed after the sigma=20 run]

However, when I restarted Jupyter and conducted the experiment with 'sigma=30' alone, the results improved significantly again. This suggests that the model might not have been initialized properly. To validate my hypothesis, I wrote the following model initialization function:
[figure4: screenshot of the model initialization function]
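The function itself is only shown in the screenshot, so purely as an illustrative sketch (the actual code in figure4 may differ): a re-initialization helper along these lines builds a brand-new model and NeuralForecast wrapper for each experiment instead of reusing a previously trained instance.

from neuralforecast import NeuralForecast
from neuralforecast.models import MLP

def init_fresh_model(h=12, input_size=24, max_steps=500, random_seed=1):
    # Construct a completely new MLP and wrapper so that no weights or
    # optimizer state carry over from an earlier experiment. All parameter
    # values here are placeholders, not the ones used in test.zip.
    model = MLP(h=h, input_size=input_size, max_steps=max_steps,
                random_seed=random_seed)
    return NeuralForecast(models=[model], freq='D')  # daily frequency is an assumption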

Then, I conducted the experiments again with 'sigma=20' and 'sigma=30', and the results are as follows:
[figure3: results for sigma=20 and sigma=30 after re-initializing the models]

Although these results are not as good as those from conducting the experiments separately, they at least confirm my hypothesis.

Versions / Dependencies

neuralforecast 1.7.0, python 3.11

Reproduction script

test.zip
Here is my example code.

Issue Severity

None

ZhikangLai added the bug label on Apr 18, 2024
@ZhikangLai
Author

Here is the result of running the experiment with 'sigma=30' alone:
[figure2: result of the sigma=30 experiment run on its own]

And here is my method code:
[figure5: screenshot of the method code]

@ZhikangLai
Author

Even though I set 'use_init_models = True', it doesn't work.

@elephaint
Contributor

Thanks for reporting - I can reproduce the issue.

@jmoralez
Member

Hey @ZhikangLai, thanks for using neuralforecast. I believe the issue is in the part that adds noise, since that doesn't have a seed and thus produces different values each time you run it. By having a fixed RNG there:

df['y'] = df['y'] + np.random.default_rng(seed=0).normal(loc=0, scale=0.02, size=len(df)) # add GP noise

I get the same results each time I run the model. Please let us know if this doesn't work as expected.
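For context, a minimal sketch of the suggestion end to end (the toy series below is made up; the real data construction lives in test.zip):

import numpy as np
import pandas as pd

# Toy series standing in for the data built in the reproduction script.
ds = pd.date_range('2020-01-01', periods=200, freq='D')
df = pd.DataFrame({'unique_id': 1, 'ds': ds, 'y': np.sin(np.arange(200) / 10)})

# Fixed-seed RNG: every run now adds exactly the same noise, so differences
# between runs can only come from the model, not from the data.
df['y'] = df['y'] + np.random.default_rng(seed=0).normal(loc=0, scale=0.02, size=len(df))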

@ZhikangLai
Author

Hey @jmoralez, I removed that noise line and got the same result each time I ran it. But when I set the random seed to 42 and added noise to the filtered data, the same issue appeared again, so I don't think this issue is related to that line of code.

Here is the code for model initialization:
[screenshot: model initialization code]
When I use this code to reinitialize each model, I get good results.

Here is the code for my entire test.
test.zip

@ZhikangLai
Author

@jmoralez By the way, if the noise line were the cause, it wouldn't only affect the second run.

When you run just the following code:

model_train(sigma = 20, model_name = 'MLP')

or

model_train(sigma = 30, model_name = 'MLP')

the result will be great. But if you run the following code in the same session:

model_train(sigma = 20, model_name = 'MLP')
model_train(sigma = 30, model_name = 'MLP')

or

model_train(sigma = 30, model_name = 'MLP')
model_train(sigma = 20, model_name = 'MLP')

only the first call gives a good result, while the second call gives a bad one. I'm not sure whether this is because, after the model is trained by the first call, its parameters leak into the model trained by the second call.

@elephaint
Contributor

@ZhikangLai

The difference between 'good' and 'bad' results is simply the choice of seed. Seed 42 produces good results; another seed may not.

You can remove your entire initialization script (it's unnecessary) and try out different seeds (i) at the location where @jmoralez suggested it and (ii) in the model_params. You'll sometimes see great results, sometimes bad ones. Seed 42 just happens to give good results.
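As a sketch of (ii), assuming the model is constructed roughly like this (all hyperparameter values are placeholders, and df is the long-format frame from the reproduction script with unique_id, ds, y columns), the seed is just the model's random_seed argument:

from neuralforecast import NeuralForecast
from neuralforecast.models import MLP

# Try a few seeds for the model itself and compare a numeric metric per seed.
for seed in (0, 1, 42):
    nf = NeuralForecast(models=[MLP(h=12, input_size=24, max_steps=500,
                                    random_seed=seed)], freq='D')
    nf.fit(df=df)
    forecasts = nf.predict()
    # ...evaluate forecasts numerically here rather than by plotting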

When executing model_train twice in the same cell, only one of the two runs is plotted (at least in my editor), so I'm not sure a visual evaluation of both forecasts makes sense in that case.

To conclude this issue:

  1. Make sure you properly seed all random number generation.
  2. Try out different random seeds.
  3. Evaluate numbers, not plots, to make sure you are comparing the right results between different runs or cells in which algorithms are run (see the sketch below).
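For point 3, a minimal sketch of such a numeric comparison (test_df and forecasts are assumed to be the held-out actuals and the output of predict(); 'MLP' is the forecast column NeuralForecast names after the model):

# Join the forecasts to the held-out actuals and compare one number per run
# instead of judging plots by eye. Depending on the neuralforecast version,
# the predictions may carry unique_id as the index; reset_index() first if so.
merged = test_df.merge(forecasts, on=['unique_id', 'ds'])
mae = (merged['y'] - merged['MLP']).abs().mean()
print(f'MAE: {mae:.4f}')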

Hope this helps.
