Process HA and NA alignments separately #122
Closed
Description
Replaces the logic that ran distance calculations and embeddings on either an HA alignment or a concatenated HA/NA alignment with logic that gets distances and embeddings from the HA and NA alignments separately. This works because pathogen-embed accepts multiple input values for its alignment and distance matrix arguments. The benefit of this change is that the pathogen-distance command can calculate distances that ignore the leading and trailing gaps in each gene's alignment, which would otherwise be counted in the concatenated alignment. Since we did not calculate indel distances for the HA/NA analysis, this change to the workflow should only affect the PCA embeddings. Since the new simplex encoding of PCA inputs effectively ignores gaps, even the PCA embeddings should be minimally affected. The most important aspect of this change, however, is that it demonstrates how we recommend using these tools for this kind of reassortment analysis.
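To illustrate why concatenation matters for terminal gaps, here is a minimal Python sketch. It is not the pathogen-distance implementation: the helper names (`core_span`, `distance_ignoring_terminal_gaps`) and the toy sequences are hypothetical, and it counts every remaining gap column as a mismatch purely to make the effect visible. The point is only that trailing gaps in HA and leading gaps in NA become internal gaps once the two alignments are joined, so they can no longer be skipped as terminal gaps.

```python
def core_span(seq, gap="-"):
    """Return (start, end) of the region left after stripping terminal gaps."""
    start = len(seq) - len(seq.lstrip(gap))
    end = len(seq.rstrip(gap))
    return start, end


def distance_ignoring_terminal_gaps(a, b, gap="-"):
    """Count mismatches between two aligned sequences, skipping columns
    that fall in the leading or trailing gaps of either sequence.

    Illustrative assumption: internal gaps count as ordinary mismatches.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    a_start, a_end = core_span(a, gap)
    b_start, b_end = core_span(b, gap)
    start, end = max(a_start, b_start), min(a_end, b_end)
    return sum(1 for x, y in zip(a[start:end], b[start:end]) if x != y)


# Hypothetical toy alignments: sample 2 is missing the end of HA
# and the start of NA.
ha = {"sample1": "ACGTAC", "sample2": "ACGT--"}
na = {"sample1": "TTGACA", "sample2": "--GACA"}

# Concatenated alignment: the terminal gaps now sit in the middle.
concatenated = {name: ha[name] + na[name] for name in ha}
concatenated_distance = distance_ignoring_terminal_gaps(
    concatenated["sample1"], concatenated["sample2"]
)

# Separate alignments: each gene's terminal gaps are skipped, then summed.
per_gene_distance = sum(
    distance_ignoring_terminal_gaps(gene["sample1"], gene["sample2"])
    for gene in (ha, na)
)

print(concatenated_distance, per_gene_distance)
```

In this toy case the concatenated alignment reports four differences that come entirely from missing sequence at the gene boundary, while the per-gene calculation reports none.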
Related issues
Closes #121