
TimeLLM takes a long time to set up training. #950

Closed
hxuaj opened this issue Apr 1, 2024 · 7 comments

hxuaj commented Apr 1, 2024

What happened + What you expected to happen

Hi,
I was trying to run the example code for the TimeLLM model: https://nixtlaverse.nixtla.io/neuralforecast/models.timellm.html#timellm
It took almost an hour before actual training started. During that time the terminal only showed "Seed set to 1", and the GPU showed no compute usage, with only about 500 MB of memory taken (roughly the size of GPT-2). Once training began, it finished in only ~10 s, with normal GPU usage throughout. Finally, it took another ~1 hour to wrap up (prediction time?).
I was wondering whether this is a normal situation for TimeLLM, since the model is new. If it is a problem, where could the bottleneck be?
To rule out network issues, I used local files to load GPT2:

from transformers import GPT2Config, GPT2Model, GPT2Tokenizer

# gpt2_local_path points to a local directory containing the GPT-2 checkpoint
gpt2_config = GPT2Config.from_pretrained(gpt2_local_path, local_files_only=True)
gpt2 = GPT2Model.from_pretrained(gpt2_local_path, config=gpt2_config, local_files_only=True)
gpt2_tokenizer = GPT2Tokenizer.from_pretrained(gpt2_local_path, local_files_only=True)
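
(A minimal sketch of how these objects would then be passed to the model, assuming the llm, llm_config, and llm_tokenizer constructor arguments of TimeLLM in neuralforecast 1.7.0; the horizon, input size, and frequency values below are illustrative.)

# Sketch only: assumes TimeLLM (neuralforecast 1.7.0) takes the pretrained
# weights, config, and tokenizer via llm, llm_config, and llm_tokenizer.
from neuralforecast import NeuralForecast
from neuralforecast.models import TimeLLM

model = TimeLLM(
    h=12,                          # illustrative forecast horizon
    input_size=36,                 # illustrative input window
    llm=gpt2,                      # GPT2Model loaded above
    llm_config=gpt2_config,        # GPT2Config loaded above
    llm_tokenizer=gpt2_tokenizer,  # GPT2Tokenizer loaded above
)
nf = NeuralForecast(models=[model], freq='M')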

Hardware: NVIDIA T4 (only tried it on one of my GPUs due to #937)
OS: Linux

Versions / Dependencies

Python 3.9
neuralforecast 1.7.0

Reproduction script

https://nixtlaverse.nixtla.io/neuralforecast/models.timellm.html#timellm

Issue Severity

None

hxuaj added the bug label Apr 1, 2024

JKYtydt commented Apr 8, 2024

(Quotes the original issue report above.)

Hello, I ran into the same problem as you. I saved the trained model and then used it for inference, and prediction was still very slow. Have you found a solution on your end?

elephaint (Contributor):

Thanks - I can reproduce the issue (very long time to set up training). We'll look into it.

elephaint self-assigned this Apr 17, 2024

JKYtydt commented Apr 23, 2024

Thanks - I can reproduce the issue (very long time to set up training). We'll look into it.

Thank you, looking forward to your reply.

elephaint (Contributor) commented Apr 29, 2024

I can't seem to find a solution for this, unfortunately. TimeLLM with the current model also seems slow on my machine. Maybe you could try a different LLM from the Transformers library?

hxuaj (Author) commented Apr 30, 2024

Thank you for the reply.
Do you mean applying LLMs other than GPT2 to TimeLLM? Since hitting this issue, I have already switched to other models like NHITS and TimesNet. Thanks again.

elephaint (Contributor):

Yes, indeed, that's what I'd try. But I haven't tried different models myself yet, so I can't recommend one, I'm sorry.
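
(For reference, a hedged sketch of what loading a different LLM from the Transformers library could look like, using the Auto* classes; the checkpoint name is only an illustrative example, not a tested recommendation.)

# Sketch only: load an alternative Hugging Face checkpoint, mirroring the
# local GPT2 snippet above; "distilgpt2" is just an illustrative example.
from transformers import AutoConfig, AutoModel, AutoTokenizer

llm_path = "distilgpt2"  # or a local checkpoint directory
llm_config = AutoConfig.from_pretrained(llm_path)
llm = AutoModel.from_pretrained(llm_path, config=llm_config)
llm_tokenizer = AutoTokenizer.from_pretrained(llm_path)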

jexterliangsufe:

So what causes this problem? I am facing the same problem as you.

ive2go added a commit to ive2go/neuralforecast that referenced this issue Jul 19, 2024
marcopeix added a commit that referenced this issue Sep 13, 2024
* Fix issue #950: Reduce TimeLLM setup time for training

* Restore changes on the examples

* Revert changes to nbs/models.ipynb, nbs/models.softs.ipynb and neuralforecast/_modidx.py

* Revert changes to nbs/models.ipynb, nbs/models.softs.ipynb and neuralforecast/_modidx.py

* Refactor code to dynamically load models with AutoModel, AutoTokenizer, and AutoConfig

- Updated load_model_and_tokenizer function to use AutoModel, AutoTokenizer, and AutoConfig for flexible model loading.
- Included a default model (gpt2) for cases where the specified model fails to load.
- Kept llm, llm_config, and llm_tokenizer arguments to minimize changes.
- Changed llm from storing pretrained weights to accepting pretrained model path to reduce necessary modifications.

This update enhances the flexibility and reliability of model loading based on received feedback while minimizing necessary changes.

* clear output

* modify test code

* Optimize model loading and add deprecation warning

- Simplify model loading logic
- Add constant for default model name
- Improve error handling for model loading
- Add success messages for model loading
- Implement deprecation warning for 'llm_config' and 'llm_tokenizer' parameters
- Update print messages for clarity
- Remove redundant code

This commit improves code readability, maintainability, and user experience
by providing clearer feedback and warnings about deprecated parameters.

* Resolved conflict in nbs/models.timellm.ipynb

---------

Co-authored-by: ive2go <[email protected]>
Co-authored-by: Olivier Sprangers <[email protected]>
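
(For illustration only, a hypothetical sketch of the fallback behaviour the commit message above describes; this is not the actual neuralforecast implementation.)

# Hypothetical sketch: try the requested checkpoint with the Auto* classes
# and fall back to the default gpt2 model if loading fails.
from transformers import AutoConfig, AutoModel, AutoTokenizer

DEFAULT_LLM = "gpt2"

def load_model_and_tokenizer(llm=None):
    """llm is a model name or local path; returns (model, tokenizer, config)."""
    name = llm or DEFAULT_LLM
    try:
        config = AutoConfig.from_pretrained(name)
        model = AutoModel.from_pretrained(name, config=config)
        tokenizer = AutoTokenizer.from_pretrained(name)
        print(f"Loaded model and tokenizer from '{name}'")
    except OSError:
        print(f"Could not load '{name}', falling back to '{DEFAULT_LLM}'")
        config = AutoConfig.from_pretrained(DEFAULT_LLM)
        model = AutoModel.from_pretrained(DEFAULT_LLM, config=config)
        tokenizer = AutoTokenizer.from_pretrained(DEFAULT_LLM)
    return model, tokenizer, config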
marcopeix added a commit that referenced this issue Sep 13, 2024 (same commit message as above)
marcopeix added a commit that referenced this issue Sep 16, 2024
* Use math.ceil to prevent shape mismatch

* Show exog support for KAN in doc

* FEAT: TimeLLM is faster and supports more LLMs (#1139); same commit message as above

* Consistency with math.ceil

---------

Co-authored-by: Olivier Sprangers <[email protected]>
Co-authored-by: ive2go <[email protected]>
marcopeix added a commit that referenced this issue Sep 16, 2024 (same commit message as above)
marcopeix added a commit that referenced this issue Sep 18, 2024
* WIP - Add reversible mixture of kan

* WIP - Allows import of RMoK

* AutoRMoK, add it to doc, add parameters

* Fix tests

* Get default config of AutoRMoK

* FIX: timemixer shapes mismatch and doc update (#1138)

(same #1138 and #1139 commit messages as above)

* Add image, docstring, fix typo in comment

---------

Co-authored-by: Olivier Sprangers <[email protected]>
Co-authored-by: ive2go <[email protected]>