Abnormal memory increase in eval step #77
This problem is caused by the trainer provided by the `transformers` package. Memory usage increases abnormally during the eval step when using a customized `compute_metrics()`; there is no such problem during training. I fine-tuned on my own data set, with an evaluation set of about 27k examples. Memory was insufficient at the beginning, so I used the `eval_accumulation_steps` parameter to offload the accumulated evaluation tensors to the CPU. That worked, but RAM usage eventually reached 140 GB and evaluation took a long time. My related settings:
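The settings themselves are not shown here, but a minimal sketch of the kind of configuration being described might look like this (the parameter names come from the `transformers` `TrainingArguments` API; the specific values are illustrative assumptions, not the actual settings):

```python
from transformers import TrainingArguments

# Illustrative values only. eval_accumulation_steps moves the accumulated
# prediction tensors from GPU to CPU RAM every N eval steps, trading GPU
# memory for (potentially large) host-memory usage.
training_args = TrainingArguments(
    output_dir="out",
    per_device_eval_batch_size=32,  # assumed batch size
    eval_accumulation_steps=10,     # offload accumulated tensors every 10 steps
)
```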
Referring to this article, I fixed the issue by adding a `preprocess_logits_for_metrics()` function to the trainer. In our code, we need to exclude the DNABERT last-layer output in `preprocess_logits_for_metrics()` before the trainer passes the output to `compute_metrics()` (by default, the hidden states are passed along with the output classification logits), to avoid taking up too much memory.
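A minimal sketch of the fix described above (this assumes the `transformers` `Trainer` convention of calling `preprocess_logits_for_metrics(logits, labels)` on each eval batch before accumulating predictions, and assumes the model's forward returns a tuple whose first element is the classification logits):

```python
def preprocess_logits_for_metrics(logits, labels):
    # The model output here is assumed to be a tuple like
    # (classification_logits, last_hidden_state, ...). Keeping only the
    # classification logits prevents the Trainer from accumulating the
    # large hidden-state tensors over the whole evaluation set.
    if isinstance(logits, tuple):
        logits = logits[0]
    return logits

# Hypothetical wiring into the trainer:
# trainer = Trainer(
#     model=model,
#     args=training_args,
#     compute_metrics=compute_metrics,
#     preprocess_logits_for_metrics=preprocess_logits_for_metrics,
# )
```

Because only the reduced tensor is accumulated across eval steps, RAM growth stays proportional to the number of classification logits rather than the full hidden-state output.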