Test DIAMOND #27
I have built the DIAMOND database from the RefSeq complete non-redundant protein sequences (~87 GB).
The Taxprofiler process was terminated when [...]. I suspect this happened because of the usage of [...]. @sofstam We need to address this issue with SciLifeLab IT, since BLAST will be used to validate Taxprofiler results in the future.
☝️ Memory issue has been resolved. Some error messages from the tests:
Conclusions from standalone tests. Database: mentioned above (complete_nonredundant_protein_db). Diamond version 2.0.15 (the same version as the one in nf-core/taxprofiler v1.1.0) works fine.
Conclusions from
The time taken for this process is determined by the size of the input files. In most of our routine cases, the unmapped reads from Bowtie2/align are smaller than 2.5 GB.
The running time also increases with the size of the database. For instance, it takes about 28 h for read1/read2 of 2.5 GB using the RefSeq protein data.
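To get a feel for how long smaller inputs might take, here is a back-of-the-envelope sketch. It assumes that runtime scales linearly with input size for a fixed database, which has not been measured in these tests; the only data point is the 28 h for 2.5 GB mentioned above, and the function name is made up for illustration.

```python
# Single reference point reported in this thread: ~28 h for 2.5 GB of input
# against the RefSeq protein database.
REFERENCE_HOURS = 28.0
REFERENCE_INPUT_GB = 2.5


def estimate_runtime_hours(input_gb: float) -> float:
    """Rough runtime estimate for a given input size in GB.

    ASSUMPTION: runtime grows linearly with input size for a fixed database.
    This is an extrapolation from one measurement, not a benchmark.
    """
    if input_gb < 0:
        raise ValueError("input size must be non-negative")
    return REFERENCE_HOURS * input_gb / REFERENCE_INPUT_GB


if __name__ == "__main__":
    for size_gb in (0.5, 1.0, 2.5):
        print(f"{size_gb:.1f} GB -> ~{estimate_runtime_hours(size_gb):.1f} h")
```

Treat the numbers as an order-of-magnitude guide for choosing a time limit; real runs depend on the node, the thread count and the sensitivity settings.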
Shall we update this config and ask in taxprofiler to change the label of the process?
I plan to discuss that in the Slack channel once I finish all tests. Yes, for us, we need to update the above config.
Sounds great!
So from the practical point of view, we should use a complete non-redundant protein database.
@sofstam What do you think?
DIAMOND is a program for finding homologs of protein and DNA sequences in a reference database.
Run DIAMOND and compare with Kraken2 results.
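One possible way to do the comparison is to reduce both outputs to per-taxon read counts and look at the overlap. The sketch below assumes a standard six-column Kraken2 report (percent, clade reads, direct reads, rank code, taxid, name) and a DIAMOND taxonomic classification table with query ID, taxid and e-value per line (as far as I know this is what `--outfmt 102` produces, and it needs a database built with taxonomy information). The function names are hypothetical, and the actual Taxprofiler output files may need different parsing.

```python
from collections import Counter
from typing import Dict, Iterable, List


def kraken2_taxon_counts(report_lines: Iterable[str]) -> Counter:
    """Count reads assigned directly to each taxid in a Kraken2 report.

    ASSUMPTION: standard 6-column report, tab-separated.
    """
    counts: Counter = Counter()
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip malformed or empty lines
        direct_reads = int(fields[2])
        taxid = fields[4].strip()
        if direct_reads > 0 and taxid != "0":  # taxid 0 = unclassified
            counts[taxid] += direct_reads
    return counts


def diamond_taxon_counts(tsv_lines: Iterable[str]) -> Counter:
    """Count classified queries per taxid in a DIAMOND taxonomy table.

    ASSUMPTION: columns are query ID, taxid, e-value (tab-separated).
    """
    counts: Counter = Counter()
    for line in tsv_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        taxid = fields[1].strip()
        if taxid != "0":  # taxid 0 = unclassified
            counts[taxid] += 1
    return counts


def compare_taxa(kraken: Counter, diamond: Counter) -> Dict[str, List[str]]:
    """Split taxids into shared, Kraken2-only and DIAMOND-only groups."""
    shared = set(kraken) & set(diamond)
    return {
        "shared": sorted(shared),
        "kraken2_only": sorted(set(kraken) - shared),
        "diamond_only": sorted(set(diamond) - shared),
    }
```

In practice the two functions would be fed open file handles, e.g. `kraken2_taxon_counts(open("sample.kraken2.report.txt"))`, where the file name is only a placeholder. Comparing at a fixed rank (species or genus) would probably be fairer than raw taxids, but that needs the taxonomy tree and is left out here.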
TO DO:
1: Build the protein database
2: Run DIAMOND for clinical samples within #196939
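For testing outside the pipeline, the two steps could be driven with something like the sketch below. It only assembles the command lines and does not run anything; the file paths, database prefix and thread count are placeholders, not the values used in our setup. The flags (`--in`, `--db`, `--query`, `--out`, `--outfmt`, `--threads`) are standard DIAMOND options, but the exact options Taxprofiler passes may differ.

```python
import shlex
from typing import List


def makedb_command(protein_fasta: str, db_prefix: str, threads: int = 8) -> List[str]:
    """Command line for step 1: build a DIAMOND database from a protein FASTA."""
    return [
        "diamond", "makedb",
        "--in", protein_fasta,
        "--db", db_prefix,
        "--threads", str(threads),
    ]


def blastx_command(db_prefix: str, reads: str, output_tsv: str, threads: int = 8) -> List[str]:
    """Command line for step 2: align nucleotide reads against the protein database."""
    return [
        "diamond", "blastx",
        "--db", db_prefix,
        "--query", reads,
        "--out", output_tsv,
        "--outfmt", "6",  # BLAST-style tabular output
        "--threads", str(threads),
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(shlex.join(makedb_command("refseq_protein.faa.gz", "refseq_protein")))
    print(shlex.join(blastx_command("refseq_protein", "unmapped_R1.fastq.gz", "sample.diamond.tsv")))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the paths are real. For taxonomic output the database would additionally need the NCBI taxonomy files supplied at `makedb` time.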