fix: ffpe event handling now correctly considers the provided purity #33
base: main
This requires Varlociraptor 8.3, see PR snakemake-workflows/dna-seq-varlociraptor#249
…ows/dna-seq-mtb into fix/ffpe-event-handling
Walkthrough
The pull request introduces comprehensive updates across multiple configuration files for a genomic analysis workflow. The changes primarily focus on updating the reference genome version from 100 to 111, adding a new …
Changes
Sequence Diagram
sequenceDiagram
participant Config as Configuration
participant Workflow as Workflow Engine
participant VEP as Variant Effect Predictor
participant Plugins as Annotation Plugins
Config->>Workflow: Update reference genome
Config->>Workflow: Configure mutational signatures
Workflow->>VEP: Initialize variant annotation
VEP->>Plugins: Load REVEL
VEP->>Plugins: Load SpliceAI
VEP->>Plugins: Load AlphaMissense
Plugins-->>VEP: Return annotated variants
Actionable comments posted: 0
🧹 Nitpick comments (3)
workflow/resources/config/gene-sets/cancer-genes.txt (1)
1143-1143: Addition of MUC17 to cancer genes list.
The inclusion of MUC17 is appropriate as it's a known cancer-associated gene. Please ensure there's a newline at the end of the file.
workflow/resources/config/default.yaml (2)
116-122: Consolidated variant scoring logic.
The new any_score_malign filter effectively combines SpliceAI, AlphaMissense, and REVEL scores. However, there are trailing spaces in the expressions.
Remove trailing spaces from lines 120-121:
-(ANN['SpliceAI_pred_DS_AG'] > 0.5 or ANN['SpliceAI_pred_DS_AL'] > 0.5 or ANN['SpliceAI_pred_DS_DG'] > 0.5 or 
-ANN['SpliceAI_pred_DS_DL'] > 0.5) or (ANN['am_pathogenicity'] is NA or ANN['am_pathogenicity'] >= 0.34) or 
+(ANN['SpliceAI_pred_DS_AG'] > 0.5 or ANN['SpliceAI_pred_DS_AL'] > 0.5 or ANN['SpliceAI_pred_DS_DG'] > 0.5 or
+ANN['SpliceAI_pred_DS_DL'] > 0.5) or (ANN['am_pathogenicity'] is NA or ANN['am_pathogenicity'] >= 0.34) or
🧰 Tools
🪛 yamllint (1.35.1)
[error] 120-120: trailing spaces (trailing-spaces)
[error] 121-121: trailing spaces (trailing-spaces)
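For readers who want to reason about what the visible part of the any_score_malign expression accepts, here is a minimal Python sketch of the two clauses shown in the diff. It is not the workflow's filter implementation: the function name any_score_malign_partial is hypothetical, None stands in for NA, a missing SpliceAI score is assumed to compare as false, and the REVEL clause is omitted because the excerpt cuts off before it.

```python
from typing import Mapping, Optional

# Annotation keys exactly as they appear in the diff above.
SPLICEAI_KEYS = (
    "SpliceAI_pred_DS_AG",
    "SpliceAI_pred_DS_AL",
    "SpliceAI_pred_DS_DG",
    "SpliceAI_pred_DS_DL",
)


def any_score_malign_partial(ann: Mapping[str, Optional[float]]) -> bool:
    """Evaluate only the SpliceAI and AlphaMissense clauses from the diff.

    Assumptions: None represents NA, and a missing SpliceAI score never
    exceeds the cutoff. The REVEL clause is not reproduced here.
    """
    # Any of the four SpliceAI delta scores above 0.5 counts as a hit.
    splice_hit = any(
        ann.get(key) is not None and ann[key] > 0.5 for key in SPLICEAI_KEYS
    )
    # A missing AlphaMissense score passes, as does a score of at least 0.34.
    am = ann.get("am_pathogenicity")
    am_hit = am is None or am >= 0.34
    return splice_hit or am_hit


if __name__ == "__main__":
    print(any_score_malign_partial({"am_pathogenicity": 0.1}))
    print(any_score_malign_partial({}))
```

One consequence worth noticing in this sketch: because the AlphaMissense clause accepts NA, a record without an AlphaMissense score passes these two clauses regardless of its SpliceAI values.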
541-544: New custom alignment properties configuration.
The custom alignment properties feature is well-structured but currently disabled. Consider adding documentation about the expected format of the TSV file.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
.test/data/alphamissense_scores.tsv.gz is excluded by !**/*.gz
.test/data/spliceai_scores.raw.snv.hg38.vcf.gz is excluded by !**/*.gz
📒 Files selected for processing (9)
.test/config-tumor_normal/config.yaml (2 hunks)
.test/config-tumor_normal_ffpe/config.yaml (2 hunks)
.test/config-tumor_only/config.yaml (2 hunks)
.test/config-tumor_only_ffpe/config.yaml (2 hunks)
config/config.yaml (1 hunks)
workflow/Snakefile (1 hunks)
workflow/resources/config/default.yaml (19 hunks)
workflow/resources/config/gene-sets/cancer-genes.txt (1 hunks)
workflow/resources/config/scenario.yaml (4 hunks)
✅ Files skipped from review due to trivial changes (1)
- workflow/Snakefile
🧰 Additional context used
🪛 yamllint (1.35.1)
workflow/resources/config/default.yaml
[warning] 38-38: wrong indentation: expected 4 but found 6 (indentation)
[warning] 41-41: too many spaces after colon (colons)
[error] 120-120: trailing spaces (trailing-spaces)
[error] 121-121: trailing spaces (trailing-spaces)
workflow/resources/config/scenario.yaml
[error] 48-48: trailing spaces (trailing-spaces)
[error] 94-94: no new line character at the end of file (new-line-at-end-of-file)
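The two error classes reported above (trailing spaces and a missing final newline) are easy to check locally before pushing. The following is a small standalone sketch, not a replacement for yamllint: the function name whitespace_issues is hypothetical, and it only covers these two checks.

```python
from typing import List, Tuple


def whitespace_issues(text: str) -> List[Tuple[int, str]]:
    """Return (line number, message) pairs for two whitespace problems.

    Covers only trailing spaces/tabs and a missing newline at end of file;
    indentation and colon spacing are left to yamllint itself.
    """
    issues: List[Tuple[int, str]] = []
    lines = text.split("\n")
    for number, line in enumerate(lines, start=1):
        # A line with trailing blanks differs from its right-stripped form.
        if line != line.rstrip(" \t"):
            issues.append((number, "trailing spaces"))
    # A non-empty file should end with a newline character.
    if text and not text.endswith("\n"):
        issues.append((len(lines), "no new line character at the end of file"))
    return issues


if __name__ == "__main__":
    sample = "a: 1 \nb: 2"
    for number, message in whitespace_issues(sample):
        print(f"{number}: {message}")
```

To check a real file, read it with open(path, newline="") so that line endings are not translated, and pass the resulting string to the function.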
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Testing
🔇 Additional comments (13)
.test/config-tumor_only/config.yaml (3)
12-12: LGTM: Release version update
The Ensembl release has been consistently updated to version 111 across all configuration files.
20-22: LGTM: New mutational signatures section
The new section is properly structured and currently deactivated, which is a safe default.
63-65: Verify genome build compatibility with plugin data files
The configuration specifies saccharomyces_cerevisiae as the species, but the SpliceAI and AlphaMissense data files are for hg38 (human genome). This might cause issues during variant annotation.
Please confirm if:
- The species setting should be updated to homo_sapiens
- The data file paths should be updated for yeast genome
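A mismatch of this kind can be flagged mechanically. The sketch below is one possible heuristic, assuming that human-specific data files carry a build name such as hg38 in their file name; the function name human_build_paths and the marker list are hypothetical, and the example paths are the two test data files listed under "Files ignored due to path filters".

```python
from typing import Iterable, List

# Assumed markers for human genome builds in file names (a heuristic only).
HUMAN_BUILD_MARKERS = ("hg19", "hg38", "grch37", "grch38")


def human_build_paths(species: str, paths: Iterable[str]) -> List[str]:
    """Return paths that look human-specific when the species is not human."""
    if species.lower() == "homo_sapiens":
        return []
    return [
        path
        for path in paths
        if any(marker in path.lower() for marker in HUMAN_BUILD_MARKERS)
    ]


if __name__ == "__main__":
    suspicious = human_build_paths(
        "saccharomyces_cerevisiae",
        [
            ".test/data/alphamissense_scores.tsv.gz",
            ".test/data/spliceai_scores.raw.snv.hg38.vcf.gz",
        ],
    )
    print(suspicious)
```

A name-based check cannot prove incompatibility, since a file may be named without its build or contain test data for another organism; it only highlights entries worth a manual look.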
config/config.yaml (1)
44-55: LGTM: Well-documented VEP configuration template
The main configuration file provides a well-documented template for VEP plugins with:
- Clear documentation for plugin sources
- Proper placeholder syntax for file paths
- Consistent plugin list with test configurations
.test/config-tumor_normal/config.yaml (1)
12-12: Same issues as in tumor-only config
The changes in this file mirror those in the tumor-only configuration, including the potential species/genome build mismatch with plugin data files.
Also applies to: 20-22, 63-65
.test/config-tumor_only_ffpe/config.yaml (1)
12-12: Same issues as in tumor-only config
The changes in this file mirror those in the tumor-only configuration, including the potential species/genome build mismatch with plugin data files.
Also applies to: 20-22, 63-65
.test/config-tumor_normal_ffpe/config.yaml (2)
12-12: Same issues as in tumor-only config
The changes in this file mirror those in the tumor-only configuration, including the potential species/genome build mismatch with plugin data files.
Also applies to: 20-22, 63-65
Line range hint 1-1: Verify alignment with PR objective
The PR title suggests fixing FFPE event handling to consider purity, but the changes appear to focus on:
- Updating the Ensembl release version
- Adding a new (disabled) mutational signatures feature
- Adding VEP plugins for variant annotation
Please clarify how these changes relate to improving FFPE event handling and purity consideration.
workflow/resources/config/scenario.yaml (3)
19-27: Well-structured modularization of purity-related functions.
The addition of dedicated functions for purity handling improves code organization and reusability. The ffpe_threshold() function correctly scales the threshold based on sample purity.
43-47: Verify FFPE artifact threshold calculation.
The FFPE artifact detection now uses a dynamic threshold that scales with sample purity. Please ensure this scaling factor (0.05) has been validated with real FFPE samples across different purity levels.
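To make the validation question concrete, here is a minimal sketch of what a purity-scaled threshold could look like. It assumes simple linear scaling (factor times purity) with the factor 0.05 mentioned above; the actual expression in scenario.yaml is not shown in this review and may differ, so this Python function only borrows the name ffpe_threshold for illustration.

```python
def ffpe_threshold(purity: float, factor: float = 0.05) -> float:
    """Scale an allele-frequency threshold linearly with tumor purity.

    Assumption: threshold = factor * purity. This is an illustrative guess
    at the scaling, not the expression used in scenario.yaml.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError(f"purity must be in (0, 1], got {purity}")
    return factor * purity


if __name__ == "__main__":
    # Show how the threshold shrinks as purity drops.
    for purity in (1.0, 0.75, 0.5, 0.25):
        print(f"purity={purity:.2f} threshold={ffpe_threshold(purity):.4f}")
```

Under this assumption the threshold halves when purity halves, so low-purity samples get a much tighter cutoff; a table like the one printed here is a convenient starting point for comparing against real FFPE samples at different purity levels.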
68-74: Comprehensive update of event definitions with FFPE considerations.
The event definitions now properly exclude FFPE artifacts from various variant categories using the new is_ffpe_artifact variable. This should help reduce false positives in variant calling.
workflow/resources/config/default.yaml (2)
70-74: New mutational signatures analysis configuration.
The mutational signatures analysis is properly configured to analyze present variants.
35-43: Verify pangenome index compatibility.
The pangenome alignment feature is properly configured, but ensure the specified VCF is compatible with the GRCh38 build as mentioned in the comment.
Run this script to verify the pangenome VCF:
🧰 Tools
🪛 yamllint (1.35.1)
[warning] 38-38: wrong indentation: expected 4 but found 6 (indentation)
[warning] 41-41: too many spaces after colon (colons)
Summary by CodeRabbit
Configuration Updates
Workflow Improvements
Gene Database