Context
We currently assign clade labels to trees in our main phylogenetic workflow using the `augur clades` command and the influenza clade nomenclature TSVs. However, clade assignments for some samples differ between these public/private trees and the Nextclade trees. Assignments can also differ between runs of the public/private trees from the same time period, because our random subsampling logic produces trees with different sample compositions. These mismatches can confuse users who look at both the Nextclade outputs for their own data and the public/private Nextstrain trees.
Description
Since Nextclade already provides a standard clade label interface, we should use Nextclade instead of `augur clades` to annotate clades in our main phylogenetic workflow. This change will ensure that each sample is assigned to the same clade regardless of the sample composition of a given public/private tree.
Possible solutions
In the short term, we could replace our `nextalign` alignment with `nextclade`, using the dataset for each subtype's corresponding reference. We would need to replace the current `augur clades` command with functionality like what @corneliusroemer proposed in nextstrain/augur#1329, which allows us to assign clades to internal nodes and branches for complete backward compatibility of clade display in Auspice. Instead of inferring clades for internal nodes as a discrete trait, we could consider running Nextclade on the inferred ancestral sequences of those nodes.
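As a rough illustration of the tip-level part of this idea, the sketch below converts Nextclade's tabular output into the node-data JSON layout that `augur clades` writes (`nodes` → name → `clade_membership`). It assumes the Nextclade TSV has `seqName` and `clade` columns. The function name, strain names, and clade labels are hypothetical placeholders. It covers tips only; internal nodes and branch labels would still need the functionality discussed above.

```python
import csv
import io
import json


def nextclade_tsv_to_node_data(tsv_text, clade_column="clade"):
    """Build a node-data dict of tip clade labels from Nextclade TSV text.

    Hypothetical helper: assumes a `seqName` column plus a clade column.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    nodes = {}
    for row in reader:
        clade = row.get(clade_column, "")
        if clade:  # skip sequences that have no clade assignment
            nodes[row["seqName"]] = {"clade_membership": clade}
    return {"nodes": nodes}


# Made-up example rows; "clade-A" is a placeholder, not a real clade name.
example_tsv = (
    "seqName\tclade\n"
    "A/Example/1/2023\tclade-A\n"
    "A/Example/2/2023\t\n"
)
node_data = nextclade_tsv_to_node_data(example_tsv)
print(json.dumps(node_data, indent=2))
```

Because each label comes from Nextclade's per-sequence result, a tip would get the same clade no matter which other samples the subsampling step happened to include.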
In the long (medium?) term, we could run Nextclade during our "data upload to S3" workflow, upload the alignments and the Nextclade annotations joined with metadata, and then start our workflows from those files. This approach would let us skip the alignment and clades steps of the current workflow, and it would put Nextclade data on S3 that we need for other analyses such as flu frequencies.
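The "annotations joined with metadata" step could look something like the sketch below. It assumes the metadata TSV is keyed by a `strain` column and the Nextclade TSV by `seqName`. The function name, the choice of columns to carry over, and the example rows are all hypothetical. It performs a left join, so metadata records without a Nextclade result keep an empty value.

```python
import csv
import io


def join_metadata_with_nextclade(metadata_tsv, nextclade_tsv, columns=("clade",)):
    """Left-join selected Nextclade columns onto metadata records by strain name."""
    annotations = {
        row["seqName"]: row
        for row in csv.DictReader(io.StringIO(nextclade_tsv), delimiter="\t")
    }
    joined = []
    for record in csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t"):
        annotation = annotations.get(record["strain"], {})
        for column in columns:
            # Missing annotations become empty strings instead of dropping the record.
            record[column] = annotation.get(column, "")
        joined.append(record)
    return joined


# Made-up example inputs; "clade-A" is a placeholder clade label.
metadata = "strain\tdate\nA/Example/1/2023\t2023-01-05\nA/Example/3/2023\t2023-02-11\n"
nextclade = "seqName\tclade\nA/Example/1/2023\tclade-A\n"
joined_records = join_metadata_with_nextclade(metadata, nextclade)
for record in joined_records:
    print(record)
```

A workflow that starts from a joined file like this could read clade labels directly from the metadata rather than recomputing them for each tree.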