You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I generated a large similarity table from ~1 million genomes using sourmash'es branchwater and tried to cluster it with Clusty. Clusty didn't finish reading the input after 6 hours, while pyLeiden finished reading everything within ~30 min.
Is there a reason Clusty is taking so much time to read the input?
The text was updated successfully, but these errors were encountered:
We manged to run Clusty on distance tables having tens of gigabytes so it seems there is some issue which made Clusty hung on your dataset. How large is your data file and how many distances it contains? Could you please provide me with at least part of it?
I don't have the original input anymore, but a filtered version (which Clusty is also taking a long time to read) is ~34GB with 401,294,724 lines (not counting the header).
I generated a large similarity table from ~1 million genomes using sourmash'es branchwater and tried to cluster it with Clusty. Clusty didn't finish reading the input after 6 hours, while pyLeiden finished reading everything within ~30 min.
Is there a reason Clusty is taking so much time to read the input?
The text was updated successfully, but these errors were encountered: