blastCblast_stats

If you find this repo useful for your work, please cite and star it.

What is this script?

BLAST and cblaster are great tools for finding homologs and co-located homologs, respectively. This script breaks their long, complicated results down into a simple per-species summary by counting the number of hits per species. It then fetches the number of assembled genomes per species from the NCBI Assembly database (via Biopython's Entrez module). Dividing the first number by the second hints at the spread of your cluster among different species. The script also draws a tree of your species based on the pre-defined NCBI taxonomy, using the ete3 toolkit. Finally, merging the per-species results with the tree as pie charts gives you a nice visualization.
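The counting and dividing steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in blastcblast_stats.py: the function names are hypothetical, reducing an organism string to its first two words is an assumption about how species are grouped, and the `"<species>"[Organism]` search term is just one plausible way to query the Assembly database.

```python
from collections import Counter


def species_of(organism: str) -> str:
    """Reduce an organism string to 'Genus species' (assumed: first two words)."""
    return " ".join(organism.split()[:2])


def count_hits(organisms):
    """Count hits per species from an iterable of organism strings."""
    return Counter(species_of(name) for name in organisms)


def assembly_count(species: str, email: str) -> int:
    """Ask the NCBI Assembly database how many assemblies match a species."""
    # Imported lazily: this step needs Biopython and network access.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks every Entrez user to identify themselves
    handle = Entrez.esearch(db="assembly", term=f'"{species}"[Organism]', retmax=0)
    record = Entrez.read(handle)
    handle.close()
    return int(record["Count"])


def spread_percentage(hits, assemblies):
    """Return {species: (hits / assemblies) * 100}; species with no assemblies are skipped."""
    return {
        species: 100.0 * n / assemblies[species]
        for species, n in hits.items()
        if assemblies.get(species)
    }
```

For example, two hits for a species with eight assemblies give `spread_percentage({"Staphylococcus aureus": 2}, {"Staphylococcus aureus": 8})`, which returns `{"Staphylococcus aureus": 25.0}`. In a real run, the `assemblies` dictionary would be filled by calling `assembly_count` once per species.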

What do you need?

NCBI BLAST (DNA or protein) users need the description CSV table, as explained here.

Cblaster users need the binary file, which can be produced like this:

cblaster search --query_file CP018841.1.faa --binary example_binary.csv -bde "," -bhh -bdc 6 -mi 50 -mc 50 -hs 3000

PS: -hs is very useful if you get a lot of results because of a low coverage threshold (-mc) or a low identity threshold (-mi), as suggested by the last author in this issue.

Then run the script like this:

python blastcblast_stats.py -i example_cblaster_binary.csv -og deinococcus_radiodurans

"-i / --input_dir" is the path to your BlastN, BlastP, or Cblaster binary file.

"-og / --outgroup" (optional) is an outgroup species that you know is NOT in your results and is phylogenetically distant from them. PS: do not forget to use an underscore (_) instead of a space in the name of this species.
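To make the two options concrete, here is a minimal argparse sketch of a command line with the same flags. It is an illustration under assumptions, not the parser in blastcblast_stats.py: `build_parser` and `outgroup_name` are hypothetical names, and converting the underscore back into a space is only a guess at why the underscore is required.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the two options described above."""
    parser = argparse.ArgumentParser(prog="blastcblast_stats.py")
    parser.add_argument(
        "-i", "--input_dir", required=True,
        help="path to the BLAST description CSV or the cblaster binary CSV",
    )
    parser.add_argument(
        "-og", "--outgroup", default=None,
        help="optional outgroup species, written with an underscore",
    )
    return parser


def outgroup_name(raw):
    """Turn 'deinococcus_radiodurans' into 'deinococcus radiodurans' (assumed behavior)."""
    return raw.replace("_", " ") if raw else None


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input_dir, outgroup_name(args.outgroup))
```

Writing the species name with an underscore keeps it a single shell argument, so it does not need quoting on the command line.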

What about dependencies?

pandas, Biopython, ete3, argparse (argparse is part of the Python standard library)

For ete3, I recommend installing it in a conda environment (even though it takes a lot of time) if pip does not work properly.

What do you get?

Currently, there are three files.

database_percentage_your_file.csv: the main output, where you can find, for each species, its count in the input file, the number of assembled genomes in the NCBI Assembly database, and the percentage, (count/assembly)*100.
your_file_tree.nwk: the tree itself, in case you would like to take it to a visualization tool (iTOL, FigTree, ...).
your_file_tree.pdf: a basic tree that links your isolates together, with pie charts that show the results from the first file.

I hope this helps.

Thanks

