Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exposing Variant Filtering Params #104

Closed

Conversation

Layth17
Copy link
Member

@Layth17 Layth17 commented Mar 24, 2023

  • Tested

Tumor VAF cutoff for FP filter applied to mutect and strelka calls can now be tuned at immuno.wdl level using immuno.min_var_freq in yaml.

immuno.varscan_min_var_freq is used specifically for varscan now.

immuno.filter_somatic_llr_threshold can be set at the level of immuno.wdl in yaml

@Layth17 Layth17 linked an issue Mar 24, 2023 that may be closed by this pull request
@Layth17
Copy link
Member Author

Layth17 commented Mar 24, 2023

unclear what the functional distinction between varscan_min_var_freq and min_var_freq for now ...

For now, immuno.varscan_min_var_freq can be set using YAML
immuno.filter_somatic_llr_threshold as well

@Layth17 Layth17 marked this pull request as ready for review March 24, 2023 20:55
@Layth17 Layth17 requested a review from malachig March 24, 2023 22:42
Copy link
Member

@tmooney tmooney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for adding filter_somatic_llr_threshold

@Layth17
Copy link
Member Author

Layth17 commented Mar 27, 2023

immuno.wdl takes in varscan_min_var_freq & min_var_freq then passes them to somatic_exome.wdl which passes them to detectVariants.wdl which passes min_var_freq for both subworkflows/mutect.wdl and subworkflows/strelka_and_post_processing.wdl (varscan_min_var_freq is used for the varscan process) and respectively pass that value to subworkflows.fp_filter.wdl which ultimately calls tools/fp_filters with that specified value.

@@ -42,6 +42,8 @@ workflow detectVariants {
Float varscan_p_value = 0.99
Float? varscan_max_normal_freq

Float? min_var_freq = varscan_min_var_freq
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change I don't follow. Currently, if one does not specify any cutoffs in the input YAML. The min_var_freq that will get used in tools.fp_filter.wdl is 0.05. It seems like this would set that default to be 0.1. I would like these two to be independent and if one doesn't specify anything all the behavior should stay exactly as it is now.

Copy link
Member

@malachig malachig Mar 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the way this PR is setup right now, if someone ran immuno.wdl then min_var_freq would get set to 0.05 by default. And if someone ran somatic_exome.wdl then min_var_freq would be set to 0.05 via varscan_min_var_freq. But if someone were to run detect_variants.wdl directly, then min_var_freq would be set to 0.1 via varscan_min_var_freq. I think this is confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will assert that best practice should be to set the default value for a tool in the tool itself, and then only expose it upstream (or override it) when necessary because different users might want it set differently.

This seems like a case where it could be set as 0.05 on the tool, then exposed (with no default) upstream for folks that would like to change it. (I do know that people are modifying this one sometimes for low-VAF calling)

@malachig
Copy link
Member

I think we should reconsider why we are setting defaults for these cutoffs at various places in the chain from immuno -> somatic -> detect variants -> variant caller sub workflow -> variant caller tool or false positive filter itself.

I think we want the user running analysis from each of these points to be able override the default, but unless we are explicitly doing it on purpose for a good reason, why should we define a default value anywhere other than the final place it is about to be used (in this case either in varscan or the false positive filter)?

@chrisamiller
Copy link
Member

Yeah, agree with Malachi. This seems like an opportunity to clean up some of this parameter passing//logic (see my comment inline upthread). It's gotten that way over time, through changes by lots of different folks, but we can draw a line in the sand and say "the mess stops now" :)

@malachig
Copy link
Member

malachig commented Nov 7, 2023

Working on testing but if all goes well this PR should be deprecated by:

#136

@malachig
Copy link
Member

Testing of #136 worked out. Closing this one.

@malachig malachig closed this Nov 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

expose variant filtering params up to immuno
4 participants