Add RNU-gene and reference files cleanup #231
Got one question below about the added variants in the updated bed file.
@@ -113,7 +113,7 @@ profiles {
params.outdir = "${params.resultsdir}${params.dev_suffix}"
params.subdir = 'wgs'
params.crondir = "${params.outdir}/cron/"
params.intersect_bed = "${params.refpath}/bed/wgsexome/exons_108padded20bp_clinvar-20231230padded5bp.bed"
Checked the diffs using bedtools intersect -a <> -b <> -v, confirming that the changes were some added ClinVar (~1000), a few removed, and the added RNU-gene (please double check as a reviewer)
Sure about that ~1000 added, or is it a typo? I get 168 new entries when comparing the new file with the old:
(alkc-base) alkc@MTLUCMDS1:~$ bedtools intersect -a /fs1/resources/ref/hg38//bed/wgsexome/exons_108padded20bp_clinvar-20240825padded5bp.bed -b /fs1/resources/ref/hg38//bed/wgsexome/exons_108padded20bp_clinvar-20231230padded5bp.bed -v | wc -l
168
(alkc-base) alkc@MTLUCMDS1:~$ bedtools intersect -b /fs1/resources/ref/hg38//bed/wgsexome/exons_108padded20bp_clinvar-20240825padded5bp.bed -a /fs1/resources/ref/hg38//bed/wgsexome/exons_108padded20bp_clinvar-20231230padded5bp.bed -v | wc -l
33
RNU confirmed added w/ correct coordinates:
(alkc-base) alkc@MTLUCMDS1:~$ bedtools intersect -a /fs1/resources/ref/hg38//bed/wgsexome/exons_108padded20bp_clinvar-20240825padded5bp.bed -b /fs1/resources/ref/hg38//bed/wgsexome/exons_108padded20bp_clinvar-20231230padded5bp.bed -v | grep RNU
17 43387226 43387417 RNU2-4
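For readers less familiar with the `-v` flag used above: it reports entries in A that have no overlap at all with B, so a new entry that touches any old region is not counted. A minimal Python sketch of that behaviour follows; the function names and the toy intervals (apart from the RNU2-4 line) are hypothetical and only illustrate the idea, not the actual bedtools implementation.

```python
from collections import defaultdict


def parse_bed(lines):
    """Parse whitespace-separated BED lines into (chrom, start, end, name) tuples."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name = fields[3] if len(fields) > 3 else ""
        records.append((fields[0], int(fields[1]), int(fields[2]), name))
    return records


def without_overlap(a, b):
    """Return the records of `a` that overlap no record of `b`.

    Intervals are treated as 0-based, half-open, as in BED.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, _ in b:
        by_chrom[chrom].append((start, end))
    return [
        rec
        for rec in a
        if not any(rec[1] < e and s < rec[2] for s, e in by_chrom.get(rec[0], []))
    ]


# Toy data (hypothetical except for the RNU2-4 coordinates quoted above).
old = parse_bed(["17 100 200 EXON1", "17 500 600 EXON2"])
new = parse_bed([
    "17 100 200 EXON1",
    "17 590 700 CLINVAR_A",          # overlaps EXON2, so -v does not report it
    "17 43387226 43387417 RNU2-4",
])

added = without_overlap(new, old)     # only RNU2-4
removed = without_overlap(old, new)   # nothing
```

This is one reason a count from `-v` can be lower than the number of entries that were actually added: anything overlapping an existing region is silently excluded.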
Sure about that ~1000 added, or is it a typo? I get 168 new entries when comparing the new file with the old:
Here is the log output:
INFO: Clinvar in common between versions: 231560
INFO: Added new (unique targets): 42111 (1305)
INFO: Removed old (unique targets): 18330 (37)
But, this comparison is between the ClinVar vcfs, not between the final intersect files. Maybe most of the new entries are overlapping with the exons.
I'll see if I can verify that hypothesis.
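One way to check that hypothesis could be to split the new ClinVar positions into those already covered by existing targets and those that are not. The sketch below is only an illustration: the function names, the toy coordinates, and the padding convention (1-based VCF position converted to a 0-based half-open interval, padded by 5 bp on each side) are assumptions and may not match what the reference tooling actually does.

```python
def pad_variant(chrom, pos, pad=5):
    """Convert a 1-based VCF position to a padded 0-based, half-open interval."""
    start = max(0, pos - 1 - pad)
    return (chrom, start, pos + pad)


def split_by_coverage(variants, targets, pad=5):
    """Split (chrom, pos) variants into those overlapping a target and those not.

    `targets` is a list of (chrom, start, end) BED-style intervals.
    """
    covered, uncovered = [], []
    for chrom, pos in variants:
        _, start, end = pad_variant(chrom, pos, pad)
        hit = any(c == chrom and start < e and s < end for c, s, e in targets)
        (covered if hit else uncovered).append((chrom, pos))
    return covered, uncovered


# Hypothetical toy data: one exon target and three "new" ClinVar positions.
exons = [("17", 1000, 2000)]
new_variants = [("17", 1500), ("17", 2003), ("17", 5000)]

covered, uncovered = split_by_coverage(new_variants, exons)
# Only the uncovered variants would add new regions to the final bed file.
```

If most new variants end up in `covered`, that would be consistent with the small number of new entries seen in the final intersect file.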
OK, I understand this now. The logged numbers are for targets added to and removed from the bed file based on the new ClinVar, i.e. what isn't present in the exons, Agilent and our custom bed.
I think the number is correct, but I'll see if I can clarify the code / output a bit so this is clear next time around.
I'll re-request review when I am done with this.
@@ -0,0 +1 @@
17 43387226 43387417 RNU2-4
Coordinates check out 👍
…, and some cleanup
OK, I have worked through the ClinVar adding/removing logic again. What confused me before was the difference in the numbers:
I extended the log to include this:
This does not change the number of entries included in the output, only the logging. Beyond this, I have meditated for quite some time about why 183. When reducing overlapping ranges, it goes down to 175. The remaining few are Pathogenic entries which for some reason aren't seen as new compared to the previous run. But they are included in the final version, so I don't think it is dangerous.
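For reference, "reducing overlapping ranges" can be sketched as a sort-and-merge pass over the intervals. This is a generic illustration with hypothetical names and toy data, not the code used in this PR; like the default behaviour of `bedtools merge`, it also merges ranges that are directly adjacent.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent (chrom, start, end) intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of adding a new one.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical padded variants: the first two on chromosome 17 overlap.
ranges = [
    ("17", 100, 110),
    ("17", 105, 115),
    ("17", 200, 210),
    ("1", 50, 60),
]

reduced = merge_intervals(ranges)
# Four input ranges collapse to three after merging.
```

Padded variants that sit within a few bases of each other collapse into one target this way, which is why the count drops after merging.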
Description and reviewer info
reference_tools
Checked the diffs using bedtools intersect -a <> -b <> -v, confirming that the changes were some added ClinVar (~1000), a few removed, and the added RNU-gene (please double check as a reviewer)
.gitignore a bit (just shifting things around)
I wonder if we should move the config and things like reference data outside the repo itself. The workflow overall should be location agnostic, but these parts are coded to us in Lund (and contain details we might rather keep private, such as exact file locations on our servers). Discussion for the future.
I'll test run onco, wgs and wgs trio, and verify that things look OK.
Type of change
Checklist
Verification_samples_log Excel sheet
Documentation
Patch
Major / Minor change
onco run finishes without any new warnings/errors and the results can be loaded into scout
wgs single run finishes without any new warnings/errors and the results can be loaded into scout
wgs trio run finishes without any new warnings/errors and the results can be loaded into scout
Test/review documentation
Review performed by
(Add if missing)
Testing performed by