You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
How to convert the batch cell from the GenomicBenchmarks data to user data? CUDA memory overload if running "Single example" cell multiple times to produce embeddings.
#55
Open
Ontos46 opened this issue
Mar 14, 2024
· 0 comments
Could you, please, help me with using HyenaDNA for inference? I'm trying to produce embeddings for a series of long sequences (about 1500 sequences of up to 400,000 nucleotides). When I try running the "single example" method from colab notebook, it can only be run one time before CUDA memory is filled (torch.cuda.empty_cache() doesn't help) and colab session needs to be restarted. Most likely it is necessary to use the "Batch example" method but it seems to be designed around the GenomicBenchmarks dataset. Is there any way to repurpose it towards user-input data? Effectively I have a list of DNA sequences strings; how do I pass them to the model correctly in batch format?
The text was updated successfully, but these errors were encountered:
Could you, please, help me with using HyenaDNA for inference? I'm trying to produce embeddings for a series of long sequences (about 1500 sequences of up to 400,000 nucleotides). When I try running the "single example" method from colab notebook, it can only be run one time before CUDA memory is filled (torch.cuda.empty_cache() doesn't help) and colab session needs to be restarted. Most likely it is necessary to use the "Batch example" method but it seems to be designed around the GenomicBenchmarks dataset. Is there any way to repurpose it towards user-input data? Effectively I have a list of DNA sequences strings; how do I pass them to the model correctly in batch format?
The text was updated successfully, but these errors were encountered: