Dear lambda creators,

I think I may be missing something. I am trying to create a nucleotide index on a 677 GB FASTA (nt) file, and I get the expected warning:
```
WARNING: Your sequence file is already larger than your physical memory!
This means you will likely encounter a crash with "bad_alloc".
Split you sequence file into many smaller ones or use a computer
with more memory!
```
```
$ free -h
               total        used        free      shared  buff/cache   available
Mem:           503Gi        31Gi       432Gi       4.1Gi        39Gi       466Gi
Swap:           31Gi       1.8Gi        30Gi
```
My questions are: if I split the FASTA file, say into 3, and create separate indexes:

1. How would I run the search against the 3 `.lba` files?
2. Would I not still have too little memory?
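For context, I was planning to split at record boundaries along these lines (just a sketch; the file names and the part count are placeholders):

```sh
# Split nt.fasta into 3 parts, round-robin by record, so that no
# sequence is cut in half. A size-aware splitter would balance the
# chunk sizes better if sequence lengths are very uneven.
awk '/^>/ { f = sprintf("nt_part%d.fasta", (n++ % 3) + 1) } { print > f }' nt.fasta
```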
Kind regards
Armand
Even assuming that you manage to create the database, what is your use case for it? Unless you search more than ~10 GB of query sequences, your runtime will be dominated by just loading the database (which will take very long, as the index will be around 2 TB in total).

If you search very large query files, this could still be worth it, but you will need to split the database, run the searches individually, and then manually merge the output files. In that case, I would recommend using m8 output, reducing the desired number of hits per query, and then merging the files with a combination of the shell commands `sort` (increase the allowed memory usage and thread count) and `awk` (for filtering), e.g. as sketched below.
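A minimal sketch of that workflow, assuming a `lambda3 searchn` subcommand with `-q`/`-i`/`-o` options (the exact interface is an assumption; check `lambda3 searchn --help`) and standard BLAST m8 columns (query id in column 1, bit score in column 12):

```sh
# Run the search once per database chunk (options are assumptions --
# verify against `lambda3 searchn --help`).
for i in 1 2 3; do
    lambda3 searchn -q queries.fasta -i "nt_part${i}.lba.gz" -o "hits_${i}.m8"
done

# Merge the per-chunk m8 files: sort by query id, then by bit score
# (column 12, descending), and keep the best 25 hits per query.
# --parallel and -S raise sort's thread count and memory budget.
sort --parallel=8 -S 16G -k1,1 -k12,12gr hits_1.m8 hits_2.m8 hits_3.m8 \
  | awk 'BEGIN { FS = OFS = "\t" }
         { n = ($1 == prev) ? n + 1 : 1; prev = $1; if (n <= 25) print }' \
  > hits_merged.m8
```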
If you want to proceed with splitting the index, I would suggest the following:

- Try with a small chunk (~30 GB) first. Use `/usr/bin/time -v` to measure runtime and peak memory usage (the "Maximum resident set size" value); see the sketch after this list. This will give you an indication of whether the time constraints are viable for you and how large you can make the chunks in a production setting.
- I would definitely recommend using `.lba.gz` to reduce the on-disk size of the index files. This may even make loading faster.
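A minimal sketch of such a test run, assuming lambda3's `mkindexn` subcommand with `-d`/`-i` options (the option names are assumptions; check `lambda3 mkindexn --help`):

```sh
# Build an index for one ~30 GB chunk and record runtime + peak memory.
# The mkindexn options are assumptions -- verify with `lambda3 mkindexn --help`.
/usr/bin/time -v lambda3 mkindexn -d nt_part1.fasta -i nt_part1.lba.gz \
    2> mkindex_part1.time.log

# /usr/bin/time -v reports peak memory as "Maximum resident set size (kbytes)".
grep -E "Elapsed|Maximum resident set size" mkindex_part1.time.log
```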
If you have any further questions, feel free to ask :)