Add RNU-gene and reference files cleanup #231

Merged: 3 commits, Sep 26, 2024

26 changes: 17 additions & 9 deletions .gitignore
@@ -1,18 +1,26 @@
# Files
git.hash
nextflow.config
todo
mito_full.nf

# Suffix
*~
__pycache__/
container/*.sif
*.sif
.nextflow*
nextflow.config
work/
*.html
*#
git.hash
todo
mito_full.nf
## ref tools
bin/reference_tools/*.log
bin/reference_tools/*.bed
*.orig

# Folders
__pycache__/
work/
data/
.vscode/

# Ref tools
bin/reference_tools/*.log
bin/reference_tools/*.bed
bin/reference_tools/refdata/*.vcf.gz

1 change: 1 addition & 0 deletions bin/reference_tools/refdata/user_added.bed
@@ -0,0 +1 @@
17 43387226 43387417 RNU2-4
Contributor @alkc commented on Sep 25, 2024:

Coordinates check out 👍

338 changes: 0 additions & 338 deletions bin/reference_tools/update_bed.pl

This file was deleted.

Empty file modified bin/reference_tools/update_bed.py
100644 → 100755
Empty file.
2 changes: 1 addition & 1 deletion configs/nextflow.hopper.config
@@ -113,7 +113,7 @@ profiles {
params.outdir = "${params.resultsdir}${params.dev_suffix}"
params.subdir = 'wgs'
params.crondir = "${params.outdir}/cron/"
- params.intersect_bed = "${params.refpath}/bed/wgsexome/exons_108padded20bp_clinvar-20231230padded5bp.bed"
Contributor

> Checked the diffs using bedtools intersect -a <> -b <> -v, confirming that the changes were some added ClinVar entries (~1000), a few removed, and the added RNU-gene (please double-check as a reviewer)

Sure about that ~1000 added, or is it a typo? I get 168 new entries when comparing the new file with the old:

(alkc-base) alkc@MTLUCMDS1:~$ bedtools intersect -a /fs1/resources/ref/hg38//bed/wgsexome/exons_108padded20bp_clinvar-20240825padded5bp.bed -b /fs1/resources/ref/hg38//bed/wgsexome/exons_108padded20bp_clinvar-20231230padded5bp.bed -v | wc -l
168
(alkc-base) alkc@MTLUCMDS1:~$ bedtools intersect -b /fs1/resources/ref/hg38//bed/wgsexome/exons_108padded20bp_clinvar-20240825padded5bp.bed -a /fs1/resources/ref/hg38//bed/wgsexome/exons_108padded20bp_clinvar-20231230padded5bp.bed -v | wc -l
33

RNU confirmed added w/ correct coordinates:

(alkc-base) alkc@MTLUCMDS1:~$ bedtools intersect -a /fs1/resources/ref/hg38//bed/wgsexome/exons_108padded20bp_clinvar-20240825padded5bp.bed -b /fs1/resources/ref/hg38//bed/wgsexome/exons_108padded20bp_clinvar-20231230padded5bp.bed -v | grep RNU
17      43387226        43387417        RNU2-4
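
For anyone re-checking this without bedtools at hand, here is a minimal Python sketch of what `bedtools intersect -a A -b B -v` reports: the entries of A that overlap nothing in B. It assumes whitespace-separated BED lines with 0-based, half-open coordinates. The helper names `read_bed` and `without_overlap` and the `EXON_A` entry are made up for illustration; only the RNU2-4 line comes from this PR. It is a brute-force comparison, not how bedtools is implemented.

```python
from collections import defaultdict


def read_bed(lines):
    """Parse BED lines into (chrom, start, end, name) tuples."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        name = fields[3] if len(fields) > 3 else ""
        records.append((fields[0], int(fields[1]), int(fields[2]), name))
    return records


def without_overlap(a_records, b_records):
    """Return the records of A that overlap no record of B (like `-v`)."""
    b_by_chrom = defaultdict(list)
    for chrom, start, end, _ in b_records:
        b_by_chrom[chrom].append((start, end))

    kept = []
    for rec in a_records:
        chrom, start, end, _ = rec
        # Half-open intervals overlap when each starts before the other ends.
        overlaps = any(
            start < b_end and b_start < end
            for b_start, b_end in b_by_chrom[chrom]
        )
        if not overlaps:
            kept.append(rec)
    return kept


# Toy input: EXON_A is a made-up placeholder, RNU2-4 is the entry added here.
old_bed = ["17 100 200 EXON_A"]
new_bed = ["17 100 200 EXON_A", "17 43387226 43387417 RNU2-4"]
added = without_overlap(read_bed(new_bed), read_bed(old_bed))
print(added)
```

Note that `-v` only lists intervals with no overlap at all, so an interval that merely grew or shrank between versions does not show up in either direction.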

Contributor Author

> Sure about that ~1000 added, or is it a typo? I get 168 new entries when comparing the new file with the old:

Here is the log output:

INFO: Clinvar in common between versions: 231560
INFO: Added new (unique targets): 42111 (1305)
INFO: Removed old (unique targets): 18330 (37)

But this comparison is between the ClinVar VCFs, not between the final intersect files. Maybe most of the new entries overlap with the exons.

I'll see if I can verify that hypothesis.
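
One way the hypothesis could be checked, as a rough sketch only: count how many new variant positions fall outside the existing intervals. The code below assumes a single chromosome, 0-based positions and half-open intervals. The function names and the toy numbers are invented for illustration and are not taken from `update_bed.py`.

```python
import bisect


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def count_uncovered(positions, intervals):
    """Count positions that lie outside every interval."""
    merged = merge_intervals(intervals)
    starts = [start for start, _ in merged]
    uncovered = 0
    for pos in positions:
        # Index of the last merged interval starting at or before pos.
        i = bisect.bisect_right(starts, pos) - 1
        if i < 0 or pos >= merged[i][1]:
            uncovered += 1
    return uncovered


# Toy data (made up): three exon intervals and six new variant positions.
exons = [(100, 200), (150, 300), (500, 600)]
new_variants = [120, 250, 299, 300, 450, 550]
print(count_uncovered(new_variants, exons), "of", len(new_variants), "outside exons")
```

If most new ClinVar positions turn out to be covered like this, a large number of new VCF records would translate into only a small number of new BED entries.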

Contributor Author

OK, I understand this now. The logged numbers are for targets added to and removed from the bed file based on the new ClinVar, i.e. what isn't present in the exons, Agilent and our custom bed.

I think the number is correct, but I'll see if I can clarify the code / output a bit so this is clear next time around.

I'll re-request review when I am done with this.
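
A hypothetical sketch of what a less ambiguous log line could look like. It assumes that the two logged numbers are meant as a variant count and a count of merged BED intervals, which is one reading of the log and not confirmed by the code. The 5 bp padding is taken from the `padded5bp` part of the file name. The function names and message wording are invented here and do not reflect the current `update_bed.py`.

```python
def pad_and_merge(positions, pad=5):
    """Pad each 0-based position by `pad` bp and merge into half-open targets."""
    targets = []
    for pos in sorted(positions):
        start, end = max(0, pos - pad), pos + pad + 1
        if targets and start <= targets[-1][1]:
            targets[-1] = (targets[-1][0], max(targets[-1][1], end))
        else:
            targets.append((start, end))
    return targets


def summary_line(label, positions, pad=5):
    """Build a log line that names both counts explicitly."""
    targets = pad_and_merge(positions, pad)
    return (
        f"INFO: {label}: {len(positions)} variants outside existing targets, "
        f"merged into {len(targets)} BED intervals"
    )


# Toy positions (made up): three nearby variants and one distant one.
print(summary_line("Added", [1000, 1003, 1010, 5000]))
```

Spelling out both units in the message avoids having to guess which number to compare against a `bedtools intersect -v` count.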

+ params.intersect_bed = "${params.refpath}/bed/wgsexome/exons_108padded20bp_clinvar-20240825padded5bp.bed"
params.align = true
params.varcall = true
params.annotate = true