Concordance Analysis Bug - Large Number of Partitions? #316

Open
jasongallant opened this issue Sep 10, 2024 · 6 comments

Comments

@jasongallant

Hi There,

I'm trying to follow along with this tutorial using my own data (http://iqtree.org/doc/recipes/concordance-vector).

I'm currently using the latest release (IQ-TREE multicore version 2.3.6 for Linux x86 64-bit built Aug 1 2024). I can successfully run this command:

iqtree2 -te astral_species_annotated.tree -p loci.best_model.nex --scfl 100 --prefix scfl -T 128

This example dataset contains 400 genes from a variety of bird species.

I'm trying to do something similar with about 25k genes. When I run this with the full dataset:

iqtree2 -te my_astral_species_annotated.tree -p my_loci.best_model.nex --scfl 100 --prefix scfl -T 128

I get this error:

Reading partition model file my_loci.best_model.nex ...
Reading "SETS" block...
terminate called after throwing an instance of 'std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >'
ERROR: STACK TRACE FOR DEBUGGING:
ERROR:
ERROR: *** IQ-TREE CRASHES WITH SIGNAL ABORTED
ERROR: *** For bug report please send to developers:
ERROR: *** Log file: loci.best_model.repaired.nex.log
ERROR: *** Alignment files (if possible)
Aborted

However, if I manually edit my_loci.best_model.nex to include only the first 10 genes, iqtree2 runs without issue. This makes me suspect the crash is related to the large number of partitions, although the program crashes almost instantly. I'm attempting this run on a machine with 128 processors and 2 TB of RAM.

Any suggestions how to fix or proceed with this? Many thanks in advance!

@jasongallant
Author

I wrote a little Python script that randomly subsets my_loci.best_model.nex. It looks like the limit is somewhere between 200 and 400 sequences before it crashes?
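
For readers who want to try the same thing, here is a minimal sketch of what such a subsetting script might look like. It is not the script mentioned above. It assumes the partition file has one `charset` statement per line and one `model: name` entry per line after a `charpartition` line; the function name `subset_partition_nexus` and the output partition name `mymodels` are hypothetical choices made for illustration.

```python
import random
import re

# Matches lines such as "  charset gene1 = 1-100;" (assumed layout: one per line).
CHARSET_RE = re.compile(r"^\s*charset\s+(\S+)\s*=\s*(.+?);\s*$", re.IGNORECASE)


def subset_partition_nexus(text, count, seed=0):
    """Return a NEXUS SETS block that keeps `count` randomly chosen partitions."""
    charsets = {}  # partition name -> charset definition
    models = {}    # partition name -> full "model: name" entry
    in_partition = False
    for line in text.splitlines():
        match = CHARSET_RE.match(line)
        if match:
            charsets[match.group(1)] = match.group(2).strip()
            continue
        stripped = line.strip()
        if stripped.lower().startswith("charpartition"):
            in_partition = True
            continue
        if in_partition:
            entry = stripped.rstrip(",;").strip()
            if ":" in entry:
                # The partition name follows the last colon; drop any
                # trailing "{...}" annotation when building the lookup key.
                name = entry.rsplit(":", 1)[1]
                key = name.split("{", 1)[0].strip()
                models[key] = entry
            if stripped.endswith(";"):
                in_partition = False

    names = [name for name in charsets if name in models]
    if not names:
        raise ValueError("no partitions with both a charset and a model found")

    rng = random.Random(seed)
    keep = set(rng.sample(names, min(count, len(names))))
    chosen = [name for name in names if name in keep]  # preserve file order

    lines = ["#nexus", "begin sets;"]
    lines += [f"  charset {name} = {charsets[name]};" for name in chosen]
    lines.append("  charpartition mymodels =")
    lines.append(",\n".join(f"    {models[name]}" for name in chosen) + ";")
    lines.append("end;")
    return "\n".join(lines) + "\n"
```

Reading the original file, calling `subset_partition_nexus(text, 300)`, and writing the result to a new file gives a smaller input for the same iqtree2 command. Changing `seed` draws a different random subset.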

@jasongallant
Author

For what it's worth, this is the same type of analysis attempted in #155.

@roblanf
Collaborator

roblanf commented Sep 16, 2024

@thomaskf and @bqminh any ideas here?

@jasongallant, one option you could try is to use --scf instead. I appreciate this is not the same, but it might get you some useful information and/or help us track down the bug

@jasongallant
Author

Hi @roblanf, thanks for the reply. I'm working with scf right now. I also noted another issue, #223, that affects tree calculations in scfl (originally noticed by @simone-says). It has made the going tough, but it looks like scf is the way forward until this gets ironed out. Let me know if I can provide more info on this end.

@roblanf
Collaborator

roblanf commented Sep 17, 2024

Thanks for the cross-linking! As on the other thread, the most useful thing is a reproducible example if you have one, then as soon as one of us has time we can get straight to debugging.

@thomaskf
Collaborator

Hi @jasongallant,
Thanks again for reporting the issue, and sorry for the delay. I have tested the program with a data set containing around 30K partitions, and it worked without any problems. Is it possible to share your data with us so we can investigate the issue further? If the dataset is too large, you may send a smaller subset where you encountered the error. Thank you very much!
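
One possible way to arrive at a small failing subset is a binary search over how many leading partitions are kept. The sketch below is only an illustration under a stated assumption: if the first k partitions crash the run, any longer prefix crashes too. The function `smallest_failing_prefix` and the callback `fails` are hypothetical names; `fails(k)` is expected to return True when a run on the first k partitions crashes.

```python
def smallest_failing_prefix(total, fails):
    """Return the smallest k in 1..total for which fails(k) is True.

    Assumes failures are monotonic in k. Returns None when even the
    full set of `total` partitions does not fail.
    """
    if not fails(total):
        return None
    low, high = 1, total  # invariant: fails(high) is True
    while low < high:
        mid = (low + high) // 2
        if fails(mid):
            high = mid      # a failing prefix ends at or before mid
        else:
            low = mid + 1   # the first failing prefix is longer than mid
    return low
```

In practice, `fails(k)` could write the first k partitions to a temporary file, run the same iqtree2 command on it, and report whether the process exited with a non-zero status. For 25k partitions this needs about 15 runs, and each crashing run ends almost instantly according to the report above.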
