Usher_PHB #149

Merged
merged 9 commits into from
Aug 16, 2023
Conversation

sage-wright
Member

@sage-wright sage-wright commented Aug 9, 2023

🛠️ Changes Being Made

This PR adds the Usher_PHB workflow

🧠 Context and Rationale

This workflow allows users to place their samples on the global phylogenetic tree. Presets are available for sars-cov-2, mpox, RSV-A, and RSV-B.

📋 Workflow/Task Steps

This workflow runs the UShER command line tool.

Inputs

required

  • assembly_fasta : either an array of assemblies or a single-sample assembly to be placed on the tree.
  • organism : the organism to run UShER on. The latest data are downloaded by default for the following organisms: sars-cov-2, mpox, RSV-A, and RSV-B.
  • tree_name : the name for your tree

optional

  • mutation_annotated_tree_pb : a protobuf file containing the mutation-annotated tree (only needed for an organism other than the four listed above)
  • reference_genome : the reference genome in .fasta format (only needed for an organism other than the four listed above)
  • subtree_size : the number of nearest neighbors to include in each subtree; the default is 20
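
For orientation, the inputs above could be collected into a Cromwell-style inputs JSON before submission. This is only a minimal sketch: the workflow name `usher_workflow` is taken from the output path quoted later in this conversation, the `workflow_name.input` key layout is an assumption, and the FASTA file names and tree name are hypothetical placeholders.

```python
import json


def build_usher_inputs(assembly_fastas, organism, tree_name,
                       subtree_size=None, workflow_name="usher_workflow"):
    """Build an inputs dictionary for the workflow (key layout is assumed)."""
    inputs = {
        f"{workflow_name}.assembly_fasta": list(assembly_fastas),
        f"{workflow_name}.organism": organism,
        f"{workflow_name}.tree_name": tree_name,
    }
    # Leave subtree_size out unless overridden, so the workflow default applies.
    if subtree_size is not None:
        inputs[f"{workflow_name}.subtree_size"] = subtree_size
    return inputs


# Hypothetical sample files and tree name, for illustration only.
example_inputs = build_usher_inputs(
    ["sample1.fasta", "sample2.fasta"], "mpox", "my_tree"
)
print(json.dumps(example_inputs, indent=2))
```

For an organism outside the four presets, mutation_annotated_tree_pb and reference_genome would need to be added in the same way.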

Outputs

  • usher_clades : the clades predicted for the samples
  • usher_phb_analysis_date : the date the analysis was run
  • usher_phb_version : the version of PHB the workflow is from
  • usher_protobuf_version : the version of the protobuf tree (for a default organism, the date and the samples included; otherwise, a note that the tree was user-provided)
  • usher_subtrees : an array of subtrees where your samples have been placed.
  • usher_uncondensed_tree : the entire global tree with your samples included
  • usher_version : the version of usher used

🧪 Testing

Locally

Terra

mpox

sars-cov-2

🔬 Quality checks

Pull Request (PR) checklist:

  • Include a description of what is in this pull request in this message.
  • The workflow/task has been tested locally and on Terra
  • The CI/CD has been adjusted and tests are passing
  • Everything follows the style guide

@jrotieno
Contributor

Hi @sage-wright, wonderful work. A few quick ones:

  1. Is the number of subtrees expected to equal the number of samples in a set, or does it depend ad hoc on how diverse the samples in the set are from one another?
  2. Related to 1 above: if it is the latter, I take it there is no better naming system than subtree-1, subtree-2, etc., right?
  3. Unlike the usher_clades and usher_uncondensed_tree outputs, the usher_subtrees output in Terra Job Manager is not a clickable link. It is displayed as [ "gs://fc-b10638b5-95fd-4758-b0cc-6a3d9153e5c0/submissions/e9879ba6-a760-4720-818d-53f33d241fe8/usher_workflow/a35b2b5c-aef4-4f1c-bbe2-a524d707519b/call-usher/glob-a0a6913e54ca67a3ab4b40632c05da6f/subtree-1.nh" ], where I would expect gs://fc-b10638b5-95fd-4758-b0cc-6a3d9153e5c0/submissions/e9879ba6-a760-4720-818d-53f33d241fe8/usher_workflow/a35b2b5c-aef4-4f1c-bbe2-a524d707519b/call-usher/glob-a0a6913e54ca67a3ab4b40632c05da6f/subtree-1.nh. However, the links work in the data table, which is what matters most.

@sage-wright
Member Author

Thanks James!

  1. The number of subtrees depends on how diverse the samples are. If they're very similar, fewer subtrees will be found; if they're very diverse, more will be found. The most subtrees that can be found is the number of samples in the set, i.e. one subtree per sample.
  2. Yes. Because of that, the naming convention can't really be changed unless we dig into the files, and that would get messy quickly if more than one of the provided samples is in the same subtree.
  3. I'm guessing they can't be clicked on in the Job Manager because they're in an array. But yes, since they're clickable in the table, that's what matters most.
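
Since the subtree files are only distinguishable by their number, a small helper can put the usher_subtrees array in numeric order and check it against the upper bound described in point 1. This is a rough sketch rather than part of the workflow: it assumes the `subtree-<n>.nh` naming seen in the path quoted above, and the bucket paths in the example are hypothetical.

```python
import re

# File-name pattern assumed from the example path in this thread.
SUBTREE_RE = re.compile(r"subtree-(\d+)\.nh$")


def subtree_index(path):
    """Return the numeric index embedded in a subtree file name."""
    match = SUBTREE_RE.search(path)
    if match is None:
        raise ValueError(f"not a subtree file: {path}")
    return int(match.group(1))


def sort_subtrees(paths):
    """Sort subtree paths numerically, so subtree-2 comes before subtree-10."""
    return sorted(paths, key=subtree_index)


def within_sample_bound(paths, n_samples):
    """True if there are no more subtrees than samples (at most one per sample)."""
    return len(paths) <= n_samples


# Hypothetical paths, for illustration only.
subtrees = [
    "gs://example-bucket/subtree-10.nh",
    "gs://example-bucket/subtree-2.nh",
    "gs://example-bucket/subtree-1.nh",
]
ordered = sort_subtrees(subtrees)
```

Sorting on the parsed integer instead of the raw string avoids subtree-10 being listed before subtree-2.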

@jrotieno jrotieno merged commit cfe1f29 into main Aug 16, 2023
5 checks passed
@cimendes cimendes deleted the smw-usher-dev branch August 16, 2023 09:26