Out of memory error with ChemGPT-1.2B #56
Replies: 2 comments 2 replies
-
Hello @PatWalters, I ran some small tests and you would need a minimum of 25 GB of GPU memory. Although the model itself can fit on your GPU, additional memory is needed during training for the data, model outputs, optimizer states, gradients, etc. (You can find a great explanation here: https://huggingface.co/docs/transformers/perf_train_gpu_one)

The tutorial snippet can be optimized in a few ways. The most important one:

1. Instead of padding the entire dataset up front, pad the inputs during batching to reduce the amount of padding and the overall GPU memory usage.

I have made some modifications to your gist to implement (1) and to use PyTorch Lightning, which makes it easier to investigate the other memory optimizations covered in the guide above (e.g. gradient accumulation and mixed precision). You can find the modified version here: https://gist.github.com/maclandrol/8bd3e50cdfc345fa095e7c96bc3643b3

As a side note, we have recently added two new models, including `GPT2-Zinc480M-87M`:

```python
featurizer = PretrainedHFTransformer("GPT2-Zinc480M-87M", notation="smiles", dtype=torch.float, preload=True)
```

A quick test without any tuning for …
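For a rough sense of scale: 1.2B parameters in fp32 already take ~4.8 GB for the weights, roughly the same again for gradients, and about twice that for Adam's optimizer states, before activations and batch data are counted, so the ~25 GB estimate above is not surprising. Below is a minimal, self-contained sketch (not taken from the gist above) of what padding during batching can look like in plain PyTorch; `tokenize`, the dataset class, and the dummy data are hypothetical stand-ins for whatever tokenizer and data your featurizer actually uses:

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset

def tokenize(s: str) -> torch.Tensor:
    # Stand-in tokenizer: maps each character to an integer id (1-based, so 0 stays free for padding).
    # In practice, replace this with the real tokenizer used by your pretrained model.
    return torch.tensor([ord(c) % 100 + 1 for c in s], dtype=torch.long)

class SmilesDataset(Dataset):
    """Keeps raw strings; each item is tokenized individually, with no dataset-wide padding."""
    def __init__(self, smiles, labels):
        self.smiles = smiles
        self.labels = torch.as_tensor(labels, dtype=torch.float)

    def __len__(self):
        return len(self.smiles)

    def __getitem__(self, idx):
        return tokenize(self.smiles[idx]), self.labels[idx]

def collate_fn(batch, pad_token_id=0):
    # Pad only up to the longest sequence in *this* batch (dynamic padding),
    # instead of padding every molecule to the longest one in the whole dataset.
    token_ids, labels = zip(*batch)
    input_ids = pad_sequence(token_ids, batch_first=True, padding_value=pad_token_id)
    attention_mask = (input_ids != pad_token_id).long()
    return {"input_ids": input_ids, "attention_mask": attention_mask}, torch.stack(labels)

# Dummy data, just to make the sketch runnable end to end.
train_smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]
train_y = [0.1, 0.5, 0.9]

loader = DataLoader(SmilesDataset(train_smiles, train_y),
                    batch_size=2, shuffle=True, collate_fn=collate_fn)
for inputs, targets in loader:
    print(inputs["input_ids"].shape, targets.shape)
```

If you use the PyTorch Lightning version of the gist, memory can often be reduced further through Trainer options such as `precision="16-mixed"` and `accumulate_grad_batches`; which settings help most will depend on your GPU.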
-
Thank you! I'll give that a try.
-
I've been trying to use ChemGPT-1.2B, but I get an out of memory error. Can someone show me the trick to running ChemGPT-1.2B on a GPU without running out of memory? I put together a gist with an example of what I've been doing. The code is a minor modification of the molfeat "Finetuning a pretrained transformer" example.
https://gist.github.com/PatWalters/63caca95bfc808fa7580df2a3bb525b2