Error while building database with custom GenBank IDs list #101

IrechaC · 2022-09-07T21:09:31Z

Hello dear author, thank you in advance for developing such an amazing tool.

I'm trying to build a custom database from some Fusarium genomes I'm interested in. To do that, I created a file called genids.txt that lists the GenBank IDs of my genomes of interest, one per line. It looks like this:

$ cat genids.txt
GCA_004367085.1
GCA_003615085.1
GCA_018894095.1
GCA_007994515.1
GCA_000350345.1
GCA_000149955.2
GCA_000260155.3
GCA_000259975.2
GCA_000260495.2
GCA_000271705.2
GCA_000271745.2
GCA_003615185.1
GCA_013085055.1
GCA_900096695.1
GCA_000260075.2
GCA_001702695.2
GCA_000260235.2
GCA_019157275.1
GCA_000260175.2
GCA_000149555.1
GCA_023509805.1
GCA_900079805.1
GCA_000240135.3
GCA_001703125.1
GCA_001931975.2
GCA_002233935.1
GCA_003025205.1
GCA_003615115.1
GCA_003615155.1
GCA_003615165.1
GCA_003704975.1
GCA_003705035.1
GCA_013347375.1
GCA_013347365.1
GCA_014325185.1
GCA_014325215.1
GCA_014857085.1
GCA_016163925.1
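A list like this can be sanity-checked before it is passed to -g. The following is a minimal sketch, not part of HGTector: the function name read_genome_ids is hypothetical, and the only assumption is that every entry should be an NCBI assembly accession of the form GCA_ or GCF_, nine digits, a dot, and a version number. It skips blank lines, drops duplicates, and reports anything that does not match.

```python
import re

# NCBI assembly accessions: GCA_ (GenBank) or GCF_ (RefSeq),
# nine digits, a dot, and a version number, e.g. GCA_000149955.2.
ACCESSION_RE = re.compile(r"^GC[AF]_\d{9}\.\d+$")


def read_genome_ids(lines):
    """Return (valid_ids, rejected_entries) from an iterable of text lines.

    Hypothetical helper for illustration; it is not an HGTector function.
    Blank lines are skipped and duplicate accessions are kept only once.
    """
    valid, rejected, seen = [], [], set()
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue  # ignore empty lines, e.g. a trailing newline
        if not ACCESSION_RE.match(entry):
            rejected.append(entry)  # not shaped like an assembly accession
        elif entry not in seen:
            seen.add(entry)
            valid.append(entry)
    return valid, rejected


# Small in-memory example; with a real file, pass open("genids.txt") instead.
sample = ["GCA_004367085.1\n", "\n", "GCA_004367085.1\n", "GCA_00436\n"]
valid_ids, rejected_entries = read_genome_ids(sample)
print("valid:", valid_ids)
print("rejected:", rejected_entries)
```

This only checks the shape of each entry. It says nothing about whether the genome exists at NCBI or can be downloaded.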

I'm running HGTector with the following command:

hgtector database -o customdb_dir --genbank -g genids.txt

and this is the output I'm getting:

Database building started at 2022-09-07 15:25:33.376946.
Using local file taxdump.tar.gz.
Reading NCBI taxonomy database... done.
Total number of TaxIDs: 2442490.
Using local file assembly_summary_refseq.txt.
Reading RefSeq assembly summary... done.
Using local file assembly_summary_genbank.txt.
Reading GenBank assembly summary... done.
Total number of genomes: 1637581.
Genome categories: archaea, bacteria, fungi, protozoa
Downloading genome list per RefSeq category...
Using local file refseq_archaea.txt.
archaea: 1319
Using local file refseq_bacteria.txt.
bacteria: 257994
Using local file refseq_fungi.txt.
fungi: 459
Using local file refseq_protozoa.txt.
protozoa: 95
Done.
Downloading genome list per GenBank category...
Using local file genbank_archaea.txt.
archaea: 10075
Using local file genbank_bacteria.txt.
bacteria: 1266069
Using local file genbank_fungi.txt.
fungi: 12012
Using local file genbank_protozoa.txt.
protozoa: 1469
Done.
Total number of genomes in categories: 1548312.
Filtering genomes...
Including 39 custom genome IDs...
Dropped 1548279 genomes.
Done.
Filtering genomes by taxonomy...
Done.
Traceback (most recent call last):
  File "/home/irecha/anaconda3/envs/HGTector/bin/hgtector", line 96, in <module>
    main()
  File "/home/irecha/anaconda3/envs/HGTector/bin/hgtector", line 35, in main
    module(args)
  File "/home/irecha/anaconda3/envs/HGTector/lib/python3.10/site-packages/hgtector/database.py", line 148, in __call__
    self.filter_to_sampled()
  File "/home/irecha/anaconda3/envs/HGTector/lib/python3.10/site-packages/hgtector/database.py", line 591, in filter_to_sampled
    raise ValueError('No genome is retained after sampling.')
ValueError: No genome is retained after sampling.

It seems like I'm not getting the genomes I specified in the file. Could you please help me with this issue?

qiyunzhu · 2022-09-11T15:36:28Z

Hello @IrechaC, thanks for reporting this issue. I looked at the code and found a bug, which I have just fixed (#103). You can update the program by running:

pip install --force-reinstall --no-cache-dir git+https://github.com/qiyunlab/HGTector.git

Note that some of the genomes in the ID list you provided don't have available protein sequences or lack some key metadata fields, so the program cannot retrieve them. In the end, 20 out of 38 genomes were retrieved.
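To see in advance which IDs are likely to be dropped, the list can be compared against the NCBI assembly summary file that the log above reports using (assembly_summary_genbank.txt). The following is a rough sketch under stated assumptions, not HGTector's own filtering logic: the helper names are hypothetical, and the set of "key" fields (taxid, organism_name, ftp_path) is a guess at what matters. The summary file also does not say whether a protein file exists for a genome, so this check covers only missing entries and empty metadata.

```python
def load_assembly_summary(lines):
    """Map assembly accession -> {column: value} from assembly summary lines.

    The file is tab-separated; its column names sit on a comment line that
    starts with '#' and contains 'assembly_accession'.
    """
    header = None
    rows = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            fields = line.lstrip("#").strip().split("\t")
            if "assembly_accession" in fields:
                header = fields  # this comment line holds the column names
            continue
        if header is None or not line:
            continue
        row = dict(zip(header, line.split("\t")))
        rows[row["assembly_accession"]] = row
    return rows


def find_problem_ids(ids, summary,
                     required=("taxid", "organism_name", "ftp_path")):
    """Return {genome_id: reason} for IDs that look unretrievable.

    'required' is an assumed set of key fields, not taken from HGTector.
    A field counts as missing when it is empty or the placeholder 'na'.
    """
    problems = {}
    for gid in ids:
        row = summary.get(gid)
        if row is None:
            problems[gid] = "not listed in the summary"
            continue
        missing = [f for f in required
                   if row.get(f, "").strip() in ("", "na")]
        if missing:
            problems[gid] = "missing " + ", ".join(missing)
    return problems


# Made-up rows for illustration only; with real data, pass
# open("assembly_summary_genbank.txt") to load_assembly_summary.
example_summary = [
    "#   See the README for a description of the columns\n",
    "# assembly_accession\ttaxid\torganism_name\tftp_path\n",
    "GCA_000000001.1\t12345\tExample organism\thttps://example.org/a\n",
    "GCA_000000002.1\t12345\tExample organism\tna\n",
]
summary = load_assembly_summary(example_summary)
wanted = ["GCA_000000001.1", "GCA_000000002.1", "GCA_000000003.1"]
for gid, reason in find_problem_ids(wanted, summary).items():
    print(gid, reason)
```

An ID that passes this check can still be dropped if its protein file is absent from the download directory, so treat the result as a first filter rather than a guarantee.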

qiyunzhu mentioned this issue Sep 11, 2022

Fixed two bugs in the database workflow #103

Merged
