Problem in downloading database #126
Hi @Subhajeet1997 Thanks for reporting. I have not seen this problem before. It seems to be a problem outside HGTector's Python code. Perhaps the gzip library isn't correctly installed on your computer. To debug, you may grab a downloaded .gz file and try to read it directly in Python:

```python
import gzip
f = gzip.open('filename.gz', 'rb')
print(f.read().decode().splitlines()[0])
f.close()
```

If you get the same error, then my guess is correct.
Yes, I have tried to read a gzipped file using your script. It shows the following error:

But gzip is properly installed on my system: when I try to unzip the same file with `gzip -d filename`, it is unzipped without any problem.
I see. The gzip program and Python may use different libraries. Perhaps the Python part is not right. It could also be that the gzipped file you tested is not a text file, causing the decoding error. Can you please try a text file? Alternatively, you can modify the line of code from
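For illustration, a hedged sketch of that kind of change (not the project's actual code; the helper name and file path are placeholders): reading raw bytes first and decoding leniently avoids a hard failure when the file content is not plain text.

```python
import gzip

def first_line_of_gzip(path):
    # Read the first line of a gzipped file; decode leniently so that
    # non-text content does not raise UnicodeDecodeError.
    with gzip.open(path, 'rb') as f:
        return f.read().split(b'\n', 1)[0].decode(errors='replace')
```

For example, `first_line_of_gzip('test.txt.gz')` would return the first line of the decompressed file as a string, substituting any undecodable bytes instead of crashing.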
test.txt.gz
Hello, I can't download the database by the default method, so I downloaded the pre-built database named "hgtdb_20230102" and unzipped it. It contains the files db.faa, genome.map.gz, genomes.tsv, lineages.txt, taxdump, taxon.map.gz. I then tried manual database compilation using the following command.
Hello @Subhajeet1997 Thanks for the follow-up. I just tried to compile the "hgtdb_20230102" database using DIAMOND v2.1.8 (the latest version), and it worked. I also tried it on the demo database "ref107" and it worked too. Therefore, I am afraid I cannot reproduce the error you encountered. Which DIAMOND version did you use? If it's too old (like 0.7.x), there could be a problem. Otherwise, you can check the integrity of the downloaded database file: there is an MD5 checksum in the repository for this check.
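As an aside, the MD5 check mentioned above can be done with Python's standard library alone (a sketch; the file path is a placeholder, and the expected hash is whatever the repository publishes):

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    # Hash the file in chunks so a large database archive
    # never needs to fit in memory at once.
    h = hashlib.md5()
    with open(path, 'rb') as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Compare md5sum('path/to/downloaded_file') with the checksum
# published in the repository; they should match exactly.
```

On Linux or macOS, the command-line tools `md5sum` or `md5` do the same job.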
Also, I just built a small custom database using the
Yes, you are right; my DIAMOND is an older version (v0.9.25.126). I will update DIAMOND and try to compile the database again. For now, I have compiled the database using makeblastdb; it compiled successfully and I have run one search using BLAST. It is obviously slow compared to DIAMOND, taking 2-2.5 days to run, so I am waiting for the output. I hope I will get some results.
Hey, the BLAST run has finished successfully and I got results. But I have another query: what are the default parameters for `--maxhits`, `--evalue`, `--identity`, and `--coverage`? As I ran with the defaults, is running in default mode acceptable?
Hi @Subhajeet1997 The default parameters are stored in
Hello Prof. Zhu (@qiyunzhu),
The taxonomy code for Bacillota is 1239, which is what I want to download. But even this is taking an awfully long time (approx. 13 h). The download proceeds without any error, but it is very slow. Following are my system and Wi-Fi details:

Do I require more disk space for this download? Or is there anything wrong with the code? If you think my disk space is not enough, could you suggest another way to do this? Thank you.
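For reference, free disk space for the target directory can be checked quickly with Python's standard library (a sketch; `'.'` is a placeholder for the actual database output directory):

```python
import shutil

# Replace '.' with the database output directory to check its filesystem.
usage = shutil.disk_usage('.')
print(f"total: {usage.total / 1e9:.1f} GB, free: {usage.free / 1e9:.1f} GB")
```

Note that low disk space would make the extraction step fail, but it would not by itself slow down the network download.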
I used the command `hgtector database -o db_dir --default` to download the database. The protein files downloaded successfully, but while handling the genome files it shows the following error:
```
Using local file GCF_963082495.1_Q8283_protein.faa.gz.
Using local file GCF_963378075.1_MU0083_Flye_MinION_protein.faa.gz.
Using local file GCF_963378095.1_MU0053_Flye_MinION.2_protein.faa.gz.
Using local file GCF_963378105.1_MU0102_Flye_MinION_protein.faa.gz.
Using local file GCF_963394915.1_CCUG_26878_T_protein.faa.gz.
Done.
Extracting downloaded genomic data...Killed
```
What is the reason behind it?
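For context, a bare "Killed" at this stage usually means the operating system terminated the process, most often the Linux OOM killer when memory ran out. For illustration only (this is not HGTector's actual extraction code), a .gz file can be decompressed in a streaming fashion so memory use stays bounded regardless of file size:

```python
import gzip
import shutil

def gunzip_streaming(src, dst, chunk_size=1 << 20):
    # Decompress in fixed-size chunks instead of reading the whole
    # file into memory; peak memory stays near chunk_size.
    with gzip.open(src, 'rb') as fin, open(dst, 'wb') as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
```

Checking the kernel log (e.g., `dmesg` mentioning "Out of memory") can confirm whether the OOM killer was responsible.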