expose variant filtering params up to immuno #97

Closed
malachig opened this issue Mar 17, 2023 · 3 comments

@malachig
Member

It seems that currently I can't configure these values in my YAML?

Tumor VAF cutoff applied to individual variant caller results (mutect and strelka) prior to creating a merged VCF

Float min_var_freq = 0.05

LLR threshold applied to the merged VCF

Float filter_somatic_llr_threshold = 5

@malachig
Member Author

I would like to be able to do this in my immuno YAML:

#Reduce tumor VAF cutoff for FP filter applied to mutect and strelka calls (default is 0.05)
immuno.min_var_freq: 0.03

#Reduce tumor VAF cutoff used by varscan (default is 0.05)
immuno.varscan_min_var_freq: 0.03

#Reduce LLR threshold for filtering of the multi caller merged VCF (default is 5)
immuno.filter_somatic_llr_threshold: 2

But currently I think I can only do the varscan one?

@malachig added the enhancement and good first issue labels Mar 24, 2023
@Layth17 linked a pull request Mar 24, 2023 that will close this issue
@malachig
Member Author

Some notes on this issue:

  • There are two general ways that minimum variant allele frequency (VAF) filters are used in the pipelines: as a parameter supplied to tools/varscan_somatic.wdl and tools/varscan_germline.wdl, and as a parameter supplied to tools/fp_filter.wdl
  • Defaults for these filters are set to 0.05 or 0.1 for varscan and 0.05 for fp_filter
  • Even though we call it varscan_germline, it is NOT actually used for germline variant calling. Rather, it is used for tumor-only variant calling, and the default threshold reflects that. Our germline variant calling uses GATK.
  • fp_filter is applied to variant calls from varscan, pindel, strelka and mutect. This means two rounds of filtering on VAF are applied to the VarScan results.
  • At present the defaults are set in many places throughout the pipeline. The first one encountered (or the one you set in your inputs YAML) takes precedence.

The only real application I can see for having separate variables for varscan and fp_filter is where you set the varscan filter at a more stringent threshold (e.g. 0.1) but then apply a more relaxed filter in fp_filter (e.g. 0.05). This would allow Strelka/Mutect to contribute variants at a lower VAF than VarScan is allowed to. I'm not entirely convinced that this narrow use case is worth the complexity that separate variables create. However, it does seem that this was the intent of how the pipeline was created.
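
To make the two rounds of filtering concrete, here is a minimal Python sketch of the scenario described above. It is an illustration only, not code from the pipeline: the function name, the parameter name fp_min_var_freq, and the use of a "greater than or equal" comparison are assumptions, and the thresholds are the example values from this comment.

```python
# Illustrative sketch only; names and the >= comparison are assumptions,
# not taken from the pipeline's WDL.

def passes_filters(caller: str, vaf: float,
                   varscan_min_var_freq: float = 0.1,
                   fp_min_var_freq: float = 0.05) -> bool:
    """Return True if a call with tumor VAF `vaf` survives both rounds."""
    # Round 1: only VarScan applies its own minimum VAF while calling.
    if caller == "varscan" and vaf < varscan_min_var_freq:
        return False
    # Round 2: fp_filter is applied to calls from every caller.
    return vaf >= fp_min_var_freq


# Hypothetical calls as (caller, tumor VAF) pairs.
calls = [("varscan", 0.07), ("mutect", 0.07), ("strelka", 0.04), ("varscan", 0.12)]
kept = [(caller, vaf) for caller, vaf in calls if passes_filters(caller, vaf)]
print(kept)
```

With these example thresholds, a VarScan call at VAF 0.07 is dropped while a Mutect call at the same VAF is kept, which is the only behavior that separate variables make possible.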

I am experimenting with a PR that will implement the following approach:

  • Name the two min_var_freq variables according to their usage in varscan or fp_filter to make it easier to trace which is being used where
  • Do NOT set defaults for these variables anywhere except in the final tool WDL.
  • Make sure they can be passed in and set in the YAML when one runs immuno.wdl or any of the sub-workflows that involve variant detection.
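
The approach in the list above can be sketched as a Python analogy (this is not WDL, and not the code in the PR). The idea is that every intermediate layer forwards an optional value untouched and only the final tool knows the default. The names fp_filter_tool, detect_variants, and the input key immuno.fp_min_var_freq are hypothetical.

```python
from typing import Optional

# The single place where the default lives (the "tool" level).
FP_FILTER_DEFAULT = 0.05


def fp_filter_tool(min_var_freq: Optional[float] = None) -> float:
    """Leaf 'tool': the only layer that falls back to a default."""
    return FP_FILTER_DEFAULT if min_var_freq is None else min_var_freq


def detect_variants(fp_min_var_freq: Optional[float] = None) -> float:
    """Sub-workflow: forwards the value as-is and sets no default itself."""
    return fp_filter_tool(fp_min_var_freq)


def immuno(inputs: dict) -> float:
    """Top-level workflow: reads the optional key from the inputs mapping."""
    return detect_variants(inputs.get("immuno.fp_min_var_freq"))


default_value = immuno({})
override_value = immuno({"immuno.fp_min_var_freq": 0.03})
print(default_value, override_value)
```

Because no intermediate layer carries its own default, there is only one value to trace when nothing is set in the YAML, and a value set at the top level always reaches the tool.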

@malachig removed the good first issue label Apr 13, 2023
@malachig self-assigned this Apr 13, 2023
@malachig
Member Author

malachig commented Mar 7, 2024

This is now complete (#136) and working as expected.

@malachig closed this as completed Mar 7, 2024