Hello dear author, thank you for developing such an amazing tool.
I'm trying to build a custom database with some Fusarium genomes I'm interested in. To do that, I've created a file called genids.txt that contains only the GenBank IDs of my genomes of interest, one per line. It looks like this:
$ cat genids.txt
GCA_004367085.1
GCA_003615085.1
GCA_018894095.1
GCA_007994515.1
GCA_000350345.1
GCA_000149955.2
GCA_000260155.3
GCA_000259975.2
GCA_000260495.2
GCA_000271705.2
GCA_000271745.2
GCA_003615185.1
GCA_013085055.1
GCA_900096695.1
GCA_000260075.2
GCA_001702695.2
GCA_000260235.2
GCA_019157275.1
GCA_000260175.2
GCA_000149555.1
GCA_023509805.1
GCA_900079805.1
GCA_000240135.3
GCA_001703125.1
GCA_001931975.2
GCA_002233935.1
GCA_003025205.1
GCA_003615115.1
GCA_003615155.1
GCA_003615165.1
GCA_003704975.1
GCA_003705035.1
GCA_013347375.1
GCA_013347365.1
GCA_014325185.1
GCA_014325215.1
GCA_014857085.1
GCA_016163925.1
I'm running HGTector with the command
hgtector database -o customdb_dir --genbank -g genids.txt
and the output I'm getting is this:
Database building started at 2022-09-07 15:25:33.376946.
Using local file taxdump.tar.gz.
Reading NCBI taxonomy database... done.
Total number of TaxIDs: 2442490.
Using local file assembly_summary_refseq.txt.
Reading RefSeq assembly summary... done.
Using local file assembly_summary_genbank.txt.
Reading GenBank assembly summary... done.
Total number of genomes: 1637581.
Genome categories: archaea, bacteria, fungi, protozoa
Downloading genome list per RefSeq category...
Using local file refseq_archaea.txt.
archaea: 1319
Using local file refseq_bacteria.txt.
bacteria: 257994
Using local file refseq_fungi.txt.
fungi: 459
Using local file refseq_protozoa.txt.
protozoa: 95
Done.
Downloading genome list per GenBank category...
Using local file genbank_archaea.txt.
archaea: 10075
Using local file genbank_bacteria.txt.
bacteria: 1266069
Using local file genbank_fungi.txt.
fungi: 12012
Using local file genbank_protozoa.txt.
protozoa: 1469
Done.
Total number of genomes in categories: 1548312.
Filtering genomes...
Including 39 custom genome IDs...
Dropped 1548279 genomes.
Done.
Filtering genomes by taxonomy...
Done.
Traceback (most recent call last):
  File "/home/irecha/anaconda3/envs/HGTector/bin/hgtector", line 96, in <module>
    main()
  File "/home/irecha/anaconda3/envs/HGTector/bin/hgtector", line 35, in main
    module(args)
  File "/home/irecha/anaconda3/envs/HGTector/lib/python3.10/site-packages/hgtector/database.py", line 148, in __call__
    self.filter_to_sampled()
  File "/home/irecha/anaconda3/envs/HGTector/lib/python3.10/site-packages/hgtector/database.py", line 591, in filter_to_sampled
    raise ValueError('No genome is retained after sampling.')
ValueError: No genome is retained after sampling.
It seems like I'm not getting the genomes I specified in the file. Could you please help me with this issue?
Hello @IrechaC, thanks for reporting this issue. I looked at the code and found that it is a bug. I just fixed it (#103). You can update the program by:
Note that some of the genome IDs in your list either have no protein sequences available or lack key metadata fields, so the program cannot retrieve them. In the end, 20 out of 38 genomes were retrieved.
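For anyone who wants to pre-check an ID list against the local assembly summary before building, here is a minimal sketch. It assumes the summary file follows the usual NCBI layout (tab-separated, with a `#`-prefixed header line naming columns such as `assembly_accession`, `taxid` and `ftp_path`, and `na` for unavailable values). The function names and the choice of required fields are hypothetical, picked for illustration; they are not necessarily the fields HGTector itself checks, and the sketch does not test whether protein sequences exist for a genome.

```python
def read_ids(lines):
    """Return genome IDs, one per non-empty line, in file order."""
    return [ln.strip() for ln in lines if ln.strip()]


def read_summary(lines):
    """Parse an NCBI-style assembly summary into {accession: row dict}."""
    header, rows = None, {}
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#"):
            # The column header is a comment line starting with assembly_accession
            fields = line.lstrip("#").strip().split("\t")
            if fields[0] == "assembly_accession":
                header = fields
            continue
        if header is None or not line:
            continue
        row = dict(zip(header, line.split("\t")))
        rows[row["assembly_accession"]] = row
    return rows


def check_ids(ids, rows, required=("taxid", "ftp_path")):
    """Map each ID to 'ok', 'not in summary', or 'missing <fields>'.

    The default `required` fields are an assumption for this sketch.
    """
    report = {}
    for gid in ids:
        row = rows.get(gid)
        if row is None:
            report[gid] = "not in summary"
            continue
        # Treat empty values and "na" as unavailable
        missing = [f for f in required if row.get(f, "") in ("", "na")]
        report[gid] = "missing " + ", ".join(missing) if missing else "ok"
    return report


# Made-up rows for demonstration only (not real assemblies)
SAMPLE = (
    "#   README pointer line\n"
    "# assembly_accession\ttaxid\tftp_path\n"
    "GCA_000000001.1\t1234\thttps://example.org/a\n"
    "GCA_000000002.1\t1234\tna\n"
)

if __name__ == "__main__":
    rows = read_summary(SAMPLE.splitlines())
    ids = read_ids(["GCA_000000001.1\n", "GCA_000000002.1\n", "GCA_000000003.1\n"])
    for gid, status in check_ids(ids, rows).items():
        print(gid, status)
```

With real files, the same functions can be fed open file handles, for example `read_ids(open("genids.txt"))` and `read_summary(open("assembly_summary_genbank.txt", encoding="utf-8"))`. IDs reported as anything other than `ok` are candidates for the ones that cannot be retrieved.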