Dendogram #75
Comments
Hi,

Sure, you can do that. You'd start with a distance or similarity matrix, and then feed that into a hierarchical clustering algorithm. Good options include SciPy's hierarchical clustering (https://docs.scipy.org/doc/scipy/reference/cluster.hierarchy.html) or HDBSCAN, both of which can work on distance matrices.

For parameter selection, the k will depend on how similar the genomes are. 16-19 seems to be good for generating pairwise distances across all fungal genomes in RefSeq, but if you're working with many related strains you may want something more like 30-100.

An example workflow with SciPy's hierarchical clustering you might follow:

    import numpy as np
    import scipy.cluster.hierarchy as sch
    import matplotlib.pyplot as plt
    from scipy.spatial.distance import squareform

    x = ...  # Parse distance matrix from file somehow
    # If square, convert to a condensed distance matrix with scipy.spatial.distance.squareform
    if x.ndim > 1:
        x = squareform(x)
    L = sch.linkage(x)  # the default linkage method is 'single'
    dn = sch.dendrogram(L)

You can then export the dendrogram or visualize it with matplotlib.

The downside to this is that it only works for symmetric distances in SciPy, though you should be able to use containment distance with HDBSCAN. Of course, you can convert any similarity measure (containment, Jaccard) into a distance. Alternatively, some methods take similarities directly: Spectral Clustering, for instance, will use affinities rather than distances.

I hope this helps, and let me know if you have any further questions or problems.

Thanks,
Daniel
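As a rough illustration of the similarity-to-distance conversion mentioned in the reply, here is a minimal sketch. It assumes similarities lie in [0, 1], takes the distance to be 1 - similarity, and averages the two directions to symmetrize an asymmetric measure such as containment. All three are illustrative choices rather than anything prescribed in this thread, and the matrix values and genome names are made up.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import squareform


def similarity_to_condensed(sim):
    """Turn a square similarity matrix into a condensed distance vector."""
    sim = np.asarray(sim, dtype=float)
    dist = 1.0 - sim                # assumption: distance = 1 - similarity
    dist = (dist + dist.T) / 2.0    # assumption: average both directions to symmetrize
    np.fill_diagonal(dist, 0.0)     # self-distance must be exactly zero
    return squareform(dist, checks=True)


# Made-up Jaccard similarities for four hypothetical genomes
names = ["A", "B", "C", "D"]
sim = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
])

condensed = similarity_to_condensed(sim)
Z = sch.linkage(condensed, method="average")            # average linkage (UPGMA)
clusters = sch.fcluster(Z, t=2, criterion="maxclust")   # cut the tree into two groups
dn = sch.dendrogram(Z, labels=names, no_plot=True)      # layout only, no figure drawn
```

Dropping `no_plot=True` draws the tree on the current matplotlib axes, which can then be saved with `plt.savefig`.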
Quicktree also performs quite well
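If you go the Quicktree route, the distances usually need to be in a file format the tool can read. The sketch below writes a square matrix in a simple PHYLIP-style layout (taxon count on the first line, then one row per taxon). It assumes that this is the layout your Quicktree build expects, so please check its documentation for the exact input options and any limits on name length. The helper `write_phylip` and the genome names are hypothetical.

```python
import io

import numpy as np


def write_phylip(names, dist, handle):
    """Write a square distance matrix in a simple PHYLIP-style layout."""
    dist = np.asarray(dist, dtype=float)
    n = len(names)
    if dist.shape != (n, n):
        raise ValueError("dist must be an n x n matrix matching names")
    handle.write(f"{n}\n")  # first line: number of taxa
    for name, row in zip(names, dist):
        values = " ".join(f"{d:.6f}" for d in row)
        handle.write(f"{name} {values}\n")  # one row per taxon: name, then distances


# Made-up distances for three hypothetical genomes
names = ["genomeA", "genomeB", "genomeC"]
dist = [
    [0.0, 0.1, 0.8],
    [0.1, 0.0, 0.7],
    [0.8, 0.7, 0.0],
]

buf = io.StringIO()
write_phylip(names, dist, buf)
phylip_text = buf.getvalue()
```

To write to disk instead, pass an open file handle, for example `with open("dist.phy", "w") as fh: write_phylip(names, dist, fh)`.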
Can you create a dendrogram from the dist results?
Also, could you recommend parameters for large fungal genome comparison?