Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DIAMOND and BLAST version issue #41

Closed
Yliub opened this issue Nov 24, 2019 · 31 comments
Closed

DIAMOND and BLAST version issue #41

Yliub opened this issue Nov 24, 2019 · 31 comments

Comments

@Yliub
Copy link

Yliub commented Nov 24, 2019

Hi Qiyun,
I am using the HGTector2 pipeline to investigate the HGT event of my 30 bacterial strain. I set up my local diamond database using the file downloaded from your dropbox. I seem that everything is ok. but when I use the search function, the screen always traceback me a messages showed as follow:
Traceback (most recent call last):
File "/home/liub/miniconda3/envs/hgtector/bin/hgtector", line 100, in
main()
File "/home/liub/miniconda3/envs/hgtector/bin/hgtector", line 39, in main
module(args)
File "/home/liub/miniconda3/envs/hgtector/lib/python3.7/site-packages/hgtector/search.py", line 167, in call
else None)
File "/home/liub/miniconda3/envs/hgtector/lib/python3.7/site-packages/hgtector/search.py", line 706, in search_wf
res = self.diamond_search(seqs)
File "/home/liub/miniconda3/envs/hgtector/lib/python3.7/site-packages/hgtector/search.py", line 1542, in diamond_search
raise ValueError('diamond failed with error code {}.'.format(ec))
ValueError: diamond failed with error code 1.
what's wrong with this? waiting for your repy. thank you very much
Regards

@qiyunzhu
Copy link
Contributor

Hello @Yliub This error message means that when HGTector calls DIAMOND to do a homology search, the latter program failed for some reason. The error code "1" does not tell us much about the actual reason though. I would suggest that you try the following steps:

  1. Check your DIAMOND version. If it's too old (e.g., < 0.9), things could go wrong.

  2. Which database did you download from my Dropbox? If the small test database (ref107), the DIAMOND database was prebuilt using DIAMOND 0.9.26, and an earlier version of DIAMOND may not work. If the full database which you need to compile by yourself, the DIAMOND version won't matter since it will always match your local DIAMOND version.

  3. Try to run DIAMOND manually to see which specific error message it reports. The command is:

diamond blastp --query input.fa --db /path/to/db --outfmt 6 qseqid sseqid pident evalue bitscore qcovhsp staxids

@Yliub
Copy link
Author

Yliub commented Nov 25, 2019

Hi Qiyun,
I download the full database via the link you mentioned at hgtector database manual where it can directly click the ' download' word, I have tried different Diamond version including 0.7.x, 0.8.x ,0.9.26 and the latest version, it still remind me the same message, I once suspect that if I have built a bad database, so I delete the establised database and re-built it successfully, but the same trackback was still on the screen. I tried to conduct hgrector search by remote server. It still back me wrong message. I don't know why?
Regards
Yliub

@Yliub
Copy link
Author

Yliub commented Nov 25, 2019

If the CPU and memory have a effect on this phenomenon? my computer memory is 8Gb and CPU is dual core 4 thread

@qiyunzhu
Copy link
Contributor

Hello @Yliub It may be the case that 8Gb memory is not sufficient for DIAMOND. What do you see if you run the command manually as I suggested above?

And what do you mean by "conduct hgrector search by remote server"? I guess the wrong message won't be the same. Can you let me know it?

@Yliub
Copy link
Author

Yliub commented Nov 25, 2019

Hi,
I check the database I have downloaded, it is a pre-built default database available at 2019-10-21.

@qiyunzhu
Copy link
Contributor

qiyunzhu commented Nov 25, 2019

@Yliub Sorry. No. This database may require more than 8Gb. Can you try that command?

@Yliub
Copy link
Author

Yliub commented Nov 25, 2019

ok,I will reply you as soon as I come to my work room because my side is two o'clock now,thank you so much.

@qiyunzhu
Copy link
Contributor

You are welcome! :)

@Yliub
Copy link
Author

Yliub commented Nov 25, 2019

Hello @qiyunzhu is it necessary that the computer memory must be more than 8Gb if I run remote search?

@Yliub
Copy link
Author

Yliub commented Nov 26, 2019

Hi@qiyunzhu
I try to run the command you provided, the screen back me the following message: Failed to allocate sufficient memory. Please refer to the manual for instructions on memory usage.
I try to run remote search by run the command line: hgtector search -i input -o . but it back me the following message: File "/home/liub/miniconda3/envs/hgtector/bin/hgtector", line 100, in
main()
File "/home/liub/miniconda3/envs/hgtector/bin/hgtector", line 39, in main
module(args)
File "/home/liub/miniconda3/envs/hgtector/lib/python3.7/site-packages/hgtector/search.py", line 167, in call
else None)
File "/home/liub/miniconda3/envs/hgtector/lib/python3.7/site-packages/hgtector/search.py", line 701, in search_wf
res = self.remote_search(seqs)
File "/home/liub/miniconda3/envs/hgtector/lib/python3.7/site-packages/hgtector/search.py", line 1623, in remote_search
if time() - starttime > self.timeout:
TypeError: '>' not supported between instances of 'float' and 'NoneType'

@qiyunzhu
Copy link
Contributor

Hi @Yliub Thanks for the error message! I believe this is a bug, and I just fixed it. Can you reinstall HGTector2 and test if the problem is resolved? Thank you!

@Yliub
Copy link
Author

Yliub commented Nov 27, 2019

Hi @qiyunzhu I reinstall the hgtector2 pepiline and run hgtector remote search command. the screen back me the following:
Traceback (most recent call last):
File "/home/liub/miniconda3/envs/hgtector/bin/hgtector", line 100, in
main()
File "/home/liub/miniconda3/envs/hgtector/bin/hgtector", line 39, in main
module(args)
File "/home/liub/miniconda3/envs/hgtector/lib/python3.7/site-packages/hgtector/search.py", line 173, in call
self.taxinfo_wf(res)
File "/home/liub/miniconda3/envs/hgtector/lib/python3.7/site-packages/hgtector/search.py", line 804, in taxinfo_wf
xml = self.remote_taxinfo(tids2q)
File "/home/liub/miniconda3/envs/hgtector/lib/python3.7/site-packages/hgtector/search.py", line 1876, in remote_taxinfo
res = self.remote_fetches(ids, 'db=taxonomy&id={}')
File "/home/liub/miniconda3/envs/hgtector/lib/python3.7/site-packages/hgtector/search.py", line 1301, in remote_fetches
res += self.remote_fetch(urlapi.format(','.join(batch)))
File "/home/liub/miniconda3/envs/hgtector/lib/python3.7/site-packages/hgtector/search.py", line 1346, in remote_fetch
return response.read().decode('utf-8')
File "/home/liub/miniconda3/envs/hgtector/lib/python3.7/http/client.py", line 464, in read
return self._readall_chunked()
File "/home/liub/miniconda3/envs/hgtector/lib/python3.7/http/client.py", line 571, in _readall_chunked
chunk_left = self._get_chunk_left()
File "/home/liub/miniconda3/envs/hgtector/lib/python3.7/http/client.py", line 554, in _get_chunk_left
chunk_left = self._read_next_chunk_size()
File "/home/liub/miniconda3/envs/hgtector/lib/python3.7/http/client.py", line 514, in _read_next_chunk_size
line = self.fp.readline(_MAXLINE + 1)
File "/home/liub/miniconda3/envs/hgtector/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
File "/home/liub/miniconda3/envs/hgtector/lib/python3.7/ssl.py", line 1071, in recv_into
return self.read(nbytes, buffer)
File "/home/liub/miniconda3/envs/hgtector/lib/python3.7/ssl.py", line 929, in read
return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

@Yliub
Copy link
Author

Yliub commented Nov 27, 2019

Hi @qiyunzhu before the error messagae, the pepline has ran some time, and retrieved a part of the result. is that as a result of the internet speed limit?

@qiyunzhu
Copy link
Contributor

Hello @Yliub Sorry for the late response! It seems to me that the pipeline worked; and the error was due to Internet limit. If you run the same command again, the pipeline will resume from the checkpoint. Hopefully, you will get the final result after one or a few trials :)

But the gsul example is just an example. You will eventually need to set up the local database on a decent server (not the one with 8Gb memory) and run HGTector2 in local search mode. Do you have access to such a server.

@Yliub
Copy link
Author

Yliub commented Nov 29, 2019

Hi @qiyunzhu,
I have ran remote search successfully,but it's very slow,are there any other methods without memory limits to set up local database?

@qiyunzhu
Copy link
Contributor

Hello @Yliub ,

I think there may be a solution after carefulling reading the DIAMOND manual. It says:

--block-size/-b #
Block size in billions of sequence letters to be processed at a time. This is the main pa-
rameter for controlling the program’s memory usage. Bigger numbers will increase the use
of memory and temporary disk space, but also improve performance. The program can be
expected to use roughly six times this number of memory (in GB). So for the default value of
-b2.0, the memory usage will be about 12 GB.

So, you may follow this instruction and the HGTector database compilation instruction to build a new DIAMOND database which demands less than 8Gb memory on your computer.

By the way, I just fixed a bug of HGTector2. You may update your copy by running:

pip install git+https://github.com/DittmarLab/HGTector.git

@Yliub
Copy link
Author

Yliub commented Nov 30, 2019

@qiyunzhu Ok, thanks for your help.

@Yliub
Copy link
Author

Yliub commented Dec 5, 2019

Hi@qiyunzhu, I try to execute the --block-size/-b command in diamond algorithm. but it doesn‘t work, I carefulling reading the DIAMOND manual, I think the --block-size/-b command should be executed during making alignment process. so could you please make it inbuilted in your hgtector search command?

@qiyunzhu
Copy link
Contributor

qiyunzhu commented Dec 5, 2019

Hello @Yliub Thanks for letting me know! Running DIAMOND on a computer with 8GB memory is something unexpected and could reduce performance. So I hesitate to make the change you suggested into HGTector. That being said, you can modify your HGTector installation to include this feature. Specifically, you can edit this line in search.py to add this DIAMOND parameter. Hope it works!

@Yliub
Copy link
Author

Yliub commented Dec 5, 2019

@qiyunzhu ok,I will try it, thank you very much.

@Yliub
Copy link
Author

Yliub commented Dec 5, 2019

Dear@qiyunzhu, I am sorry for disturbing you again, I modified the search.py file in the line you mentioned and subsequently run hgtector search command, but it seem doesn't work, and the error meassage is showed in the attached picture. if possible, could you please add the --block-size parameter in the corresponding search.py line, it's appreciate that if you can help me to add this parameter and send the revised file to me. my e-mail is [email protected].
4
3

@Yliub
Copy link
Author

Yliub commented Dec 5, 2019

Hi@qiyunzhu, by the way, I run the command " diamond blastp --query input.fa --db /path/to/db --outfmt 6 qseqid sseqid pident evalue bitscore qcovhsp staxids" plus " --block-size" aligner options, it works successfully agianst the local database. so I think the "hgtector search" using diamond with " --block-size" aligner options will work.

@qiyunzhu
Copy link
Contributor

qiyunzhu commented Dec 5, 2019

Hello @Yliub , in the way you modified the code, you added an attribute block-size to self, however this attribute does not exist, which causes error. You need to hard-code this parameter. Here is the correct way (I set it as 1.0 so the memory consumption is 6GB):

        cmd = '{} blastp --block-size 1.0 --query {} --db {} --threads {} --tmpdir {}'.format(

I have sent a modified search.py to the email address you provided. Hope it helps!

@Yliub
Copy link
Author

Yliub commented Dec 6, 2019

@qiyunzhu hi,it works, thanks for your kind help

@Yliub
Copy link
Author

Yliub commented Dec 31, 2019

Hi @qiyunzhu, I am sorry for disturbing you again. the HGTector, as you known, is a powful tool for the predication of HGT events, my 30 bacterial genomes were used as a whole to predict the gene donors using default parameters. but the results I think were untrusted (I attached the results in the figure uploaded). the predicted gene donors were somtimes phylogenetic included in the upper level, for example Serratia sp. S1B belongs to Gammaproteobacteria, and the Gammaproteobacteria were also predicted, and all the predicted gene donors can eventually belongs to bacteria. so my puzzle is that how can I predicted the gene donors at a uniform bacterial level (such as Phylum,Class,Order or Family). thank you very much.

Best regards to you
liub
result

@qiyunzhu
Copy link
Contributor

Hello @Yliub Sorry for the late response. The beginning of 2020 has been quite tough in my calendar.

Re. your question: The scenario you described is not unusual. HGTector does not explicitly predict donors, but only reports a taxonomic unit that could summarize the top hits from the distal group. The details are here. This algorithm is consistent with DIAMOND.

Now here comes the question: if these "top hits" are too diverse in taxonomy, HGTector will just find the LCA, which can be at a very high level, and thus less meaningful. However this is the best the program can do given current search results. It is safe, though not precise. Please don't expect that every gene will have a donor at a lower level (e.g., species). If the program says "Gammaproteobacteria", it means that although Serratia sp. S1B could be a plausible donor, other gammaproteobacteria also have high bit scores, and the program cannot safely exclude them from the candidate donor list, therefore it chooses to report the LCA.

Please note that none of the taxonomic names shown here are the true donor, they are but close relatives to the true donor, which must be an organism once lived in history and died out already. The goal of donor prediction is to describe the hypothetical donor using extant taxonomy.


If you really want to get precise donors (e.g., genus- or species-level), you can add --distal-top 0 to the command. This will only report one best hit, or hits which have equally high bit scores as the best hit. Therefore you have a higher chance of getting precise output, although it is not the safest way.

If you want to force all donors at a specific rank, it is also doable: by translating the donors into a specific rank. Actually the previous HGTector can do it, but I dropped this functional module in HGTector 2. Maybe I should add it back.

Hope this helps!

@Yliub
Copy link
Author

Yliub commented Jan 15, 2020

@qiyunzhu So, first of all, thank you for your patient and professional guide. I don't understand what you have occured, but wishing you have a good day so as to pack your mood and work forward. Now, let's back to the questions. I just want to care about the hypothetical donors at a specific rank, though, you known, they are not true donor, so could you please add corresponding function module in the latest updated HGTector 2. Thank you very much.

Best regards to you
liub

@qiyunzhu
Copy link
Contributor

Hello @Yliub thanks! Same wishes to you. I am totally fine. Just busy working. I think this function is useful and I will add in the next version. It may not be too soon since I have got other priorities. Thanks for your interest.

@Yliub
Copy link
Author

Yliub commented Jan 15, 2020

@qiyunzhu OK, that's fine, please inform me when the new version is available. thanks!
Regards

@qiyunzhu qiyunzhu changed the title ValueError DIAMOND and BLAST version issue Apr 24, 2020
@Yliub
Copy link
Author

Yliub commented Sep 30, 2020

Hi @ @qiyunzhu, have you add the functional module which can predicte the hypothetical gene donors at at a specific rank in the newest HGTector version ?

@qiyunzhu
Copy link
Contributor

This feature request (also mentioned in issue #62 ) is now fulfilled in pull request #66 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants