[Feature Request] Add support for the --link-exchange-rates option in the MAST model #285

Open
StefanFlaumberg opened this issue Jul 26, 2024 · 2 comments

@StefanFlaumberg

Dear IQ-Tree team,

In a recent paper you showed that re-estimating the substitution matrix under a profile mixture model on a database of relevant sequences (resulting in a GTRpmix matrix) may improve phylogenetic reconstruction accuracy. However, this re-estimation itself needs a guide tree, which poses a self-reference problem: one would like to re-estimate the matrix to improve reconstruction of the very same tree that is being used as the guide tree. In short, the true topology of what should serve as the guide tree is usually unknown.
Fortunately, in practice we usually know the general topology of a species tree and are unsure about only a few bipartitions in it. This suggests an elegant solution: use the tree-mixture model (MAST) with equal tree weights during GTRpmix matrix estimation to express our partial knowledge of the guide tree topology.

Currently, MAST works well with frequency profile mixtures, but it cannot link the GTR20 matrix parameters across the frequency profiles. Including the --link-exchange-rates option triggers a segmentation fault, like this:

Estimate model parameters (epsilon = 0.99000)
1. Initial log-likelihood: -10878.02429
ERROR: STACK TRACE FOR DEBUGGING:
ERROR:
ERROR: *** IQ-TREE CRASHES WITH SIGNAL SEGMENTATION FAULT
ERROR: *** For bug report please send to developers:
ERROR: ***    Log file: ./aln_iqtree_gtrpmix.log
ERROR: ***    Alignment files (if possible)
803984 Segmentation fault      iqtree2 -seed 123 -nt 3 -mem 3G -s ./aln.fasta -m "GTR20+C10+T[x,x]" --gtr20-model "LG" --link-exchange-rates -mwopt -te ./trees.nwk -me 0.99 -pre ./aln_iqtree_gtrpmix

Could you please implement the --link-exchange-rates option in the MAST model so that this approach can work?
Thank you!

Best,
Stefan

@thomaskf
Collaborator

thomaskf commented Aug 1, 2024

@StefanFlaumberg
Thanks for the suggestion! This is a good idea. We are currently busy with various projects, but I will consider doing so, perhaps in the coming few weeks/months.

@thomaskf thomaskf added the enhancement New feature or request label Aug 1, 2024
@thomaskf thomaskf self-assigned this Aug 1, 2024
@roblanf
Collaborator

roblanf commented Aug 6, 2024

Hi Stefan,

Related to this, we are working on a different solution to this problem. I'm not totally convinced that MAST is the right way to go here. I like the idea in principle (as I like all ideas for making the different avenues of IQ-TREE work together), but the problem is that orthogonal mixture classes are multiplicative. So, if you have e.g. 5 MAST trees (i.e. tree classes), 60 profiles (i.e. frequency classes), and a +R4 model (i.e. 4 rate classes), then every site has 5 × 60 × 4 = 1200 likelihoods to calculate, and any estimation will need 1200 times the RAM of estimating a single likelihood per site.
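
To make the arithmetic concrete, here is a minimal Python sketch of how the class counts multiply. It is only an illustration and not IQ-TREE code; the function name `likelihoods_per_site` is made up for this example.

```python
from math import prod


def likelihoods_per_site(*class_counts: int) -> int:
    """Number of per-site likelihood terms when independent mixture
    dimensions (trees, profiles, rate categories, ...) are crossed.

    Hypothetical helper for illustration; not part of IQ-TREE.
    """
    return prod(class_counts)


n_trees = 5      # MAST tree classes
n_profiles = 60  # frequency profile classes
n_rates = 4      # +R4 rate classes

print(likelihoods_per_site(n_trees, n_profiles, n_rates))
# Fixing the tree removes one factor entirely:
print(likelihoods_per_site(n_profiles, n_rates))
```

Dropping the tree dimension (i.e. assuming a single tree) cuts the count by a factor equal to the number of trees, which is the saving discussed next.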

Because of this, anything we can do to rein in the number of classes is useful. One way is to assume a tree.

So, another solution to the circular problem is to do what is internal to phylogenetics programs anyway, and:

  1. Infer a tree
  2. Infer a new model
  3. Go to 1, until convergence

W.r.t. convergence, you could look at the correlation of the Q matrix from one iteration to the next. Le and Gascuel did that for the LG model, and we copied them for the QMaker paper (I think we required the correlation to be >0.999). We have been using the same approach for lots of estimates of Q matrices, and in my experience the process almost never goes beyond 2 iterations (even if the tree changes a decent amount after the first iteration), suggesting that in most cases the tree is not too important for estimating the Q matrix.
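
The loop above could be sketched roughly as follows in Python. This is a sketch under stated assumptions, not how IQ-TREE or QMaker implement it: `infer_tree` and `estimate_q` are hypothetical callables standing in for whatever tree search and matrix estimation you actually run, the correlation is taken as a Pearson correlation over the upper-triangle (off-diagonal) entries of a symmetric matrix, and the 0.999 default simply mirrors the threshold recalled above.

```python
import math
from typing import Any, Callable, Sequence

Matrix = Sequence[Sequence[float]]


def upper_triangle(q: Matrix) -> list[float]:
    """Flatten the off-diagonal upper triangle of a square matrix."""
    n = len(q)
    return [q[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def iterate_until_converged(
    infer_tree: Callable[[Matrix], Any],   # hypothetical: Q matrix -> tree
    estimate_q: Callable[[Any], Matrix],   # hypothetical: tree -> new Q matrix
    q_start: Matrix,
    threshold: float = 0.999,              # assumed, per the comment above
    max_iter: int = 10,
) -> tuple[Matrix, Any, int]:
    """Alternate tree inference and matrix estimation until successive
    Q matrices correlate above `threshold`."""
    q = q_start
    for iteration in range(1, max_iter + 1):
        tree = infer_tree(q)           # step 1: infer a tree
        q_new = estimate_q(tree)       # step 2: infer a new model
        r = pearson(upper_triangle(q), upper_triangle(q_new))
        q = q_new
        if r > threshold:              # step 3: stop once Q has stabilised
            return q, tree, iteration
    raise RuntimeError("Q matrix did not converge within max_iter iterations")
```

In practice each callable would wrap an external run (for example, launching the tree search and parsing the estimated matrix from its output); those wrappers are left out here because their details depend on your setup.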

I hope some of that helps.

Rob
