Abnormal memory increase in eval step #77

Open
wants to merge 3 commits into
base: main

Conversation


@aaron1aaron2 aaron1aaron2 commented Apr 2, 2024

This problem is caused by using the Trainer provided by the transformers package. Memory usage increases abnormally during the eval step when using a customized compute_metrics(), but there is no problem during training.

I fine-tuned on my own data set; the evaluation set contained about 27k samples. At first there was insufficient memory, so I used the eval_accumulation_steps parameter to move the accumulated evaluation tensors to the CPU. That worked, but RAM usage eventually reached 140 GB and evaluation took a very long time.

My related settings:

  • model_max_length=500
  • per_device_eval_batch_size=16
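
For context, here is a hedged sketch of where these settings live in the transformers API. model_max_length is a tokenizer argument, while per_device_eval_batch_size and eval_accumulation_steps belong to TrainingArguments; the checkpoint name, output_dir, and the eval_accumulation_steps value are hypothetical, not taken from the post.

```python
# Sketch only: checkpoint name, output_dir, and eval_accumulation_steps
# value are hypothetical placeholders.
from transformers import AutoTokenizer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained(
    "zhihan1996/DNABERT-2-117M",  # hypothetical checkpoint name
    model_max_length=500,         # setting from the post
)

training_args = TrainingArguments(
    output_dir="output",            # hypothetical
    per_device_eval_batch_size=16,  # setting from the post
    eval_accumulation_steps=8,      # hypothetical value; periodically moves
                                    # accumulated eval tensors from GPU to CPU
)
```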

Referring to this article, I fixed the issue by adding a preprocess_logits_for_metrics() function to the Trainer.

In our code, we need to exclude DNABERT's last-layer hidden states in preprocess_logits_for_metrics() before the Trainer passes the outputs to compute_metrics() (by default they are passed along together with the classification logits), to avoid taking up too much memory.
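
The fix above can be sketched as follows. This is a minimal illustration, assuming the model's eval output is a tuple with the classification logits first and the extra hidden-state tensor second; the function signature matches what the transformers Trainer expects for preprocess_logits_for_metrics.

```python
import torch

def preprocess_logits_for_metrics(logits, labels):
    # When the model returns a tuple (e.g. classification logits plus a
    # large hidden-state tensor), keep only the first element so the
    # Trainer does not accumulate the hidden states across the eval set.
    if isinstance(logits, tuple):
        logits = logits[0]
    # Optionally reduce further to predicted class ids, so only a small
    # integer tensor is accumulated instead of the full logit matrix.
    return torch.argmax(logits, dim=-1)
```

The function is then passed to the Trainer, e.g. Trainer(..., preprocess_logits_for_metrics=preprocess_logits_for_metrics); note that with the argmax reduction, compute_metrics() receives predicted class ids rather than raw logits.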
