Fix createyml random image matching #221
…eate_yml.pl input
Can we briefly discuss the overall strategy here, @ViktorHy, before I wrap up the review? Filtering inside the process based on position seems fragile to me. If we change something upstream of
On the other hand, it seems like the script itself does not need to know, and thus shouldn't know, whether it is processing a proband file or not. So the right place to handle this seems to be on the Nextflow level. I tested if you could filter on "name" instead of position in the
As the
If we could handle that in a named way as the example above, then I think you are right - we should offload
If this works in testing - great! Some minor comments.
my @g_c = split/,/,$opt{g};
### Proband ### Could differ from group, needed to select correct eklipse image
### Clarity-ID ###
my @g_c;
The meaning of this variable name is not clear to me (I realize this is not from this PR)
Yes, it's an abbreviated name: group_clarity. The --g flag dates from when it only held the group.
@@ -132,52 +132,75 @@
}

### Group ###
if (!defined $opt{g}) { print STDERR "need group name"; exit;}
my @g_c = split/,/,$opt{g};
### Proband ### Could differ from group, needed to select correct eklipse image
Great addition - early fail and clear user info when calling with wrong input. I like it
print STDERR $_,"\n";
if ($tmp[0] eq "BAM") {
$INFO{BAM}->{$tmp[1]} = $tmp[2];
my $category = $tmp[0];
👍
bin/create_yml.pl
Outdated
}

}
close INFO;
print Dumper(%INFO);
my $info_json = to_json(\%INFO, { pretty => 1, indent => 4 });
print STDERR ($info_json);
For debugging only (to be removed) or for production (to be kept)? (Just wondering)
First I thought I'd keep it, but then again the input file shows the same information
@@ -121,7 +121,7 @@
if ($opt{assay}) {
my @a_a = split/,/,$opt{assay};
$assay = $a_a[0];
if ($a_a[1] ne 'false' && $a_a[1]) {
if ($a_a[1]) {
👍
bin/create_yml.pl
Outdated
@@ -203,9 +219,12 @@
}
What about adding a STDERR message in the if statement here, to give us a tiny chance of seeing this if it is ever triggered?
It really should be triggered every time, since the regex does not actually look for anything being added. Maybe I should just remove this. The problem it was meant to solve has been solved elsewhere, by using flags in the loqusdb processes instead.
OK removing sounds even better!
foreach my $key (@{$data}) {
if (ref $key->{institute} eq 'ARRAY') {
Hmm, tricky to follow here. Guessing you know this chunk well enough, so that no new issues are introduced here ...
Yes, this is some really old legacy code. It's from the time we had a manually curated list of gene panels. By now it was just confusing to keep.
@@ -1695,14 +1695,18 @@ process run_eklipse {
publishDir "${OUTDIR}/plots/mito", mode: 'copy', overwrite: 'true', pattern: '*.png'

input:
set group, id, file(bam), file(bai) from eklipse_bam

set group, id, file(bam), file(bai), sex, type from eklipse_bam.join(meta_eklipse, by: [0,1])
Is the [0, 1] due to the id and group flipping? We should really sort that out soon ... Maybe we can replace all the group + id with a meta object. Then we could also bunch the type and sex into that one. An issue for another day though
Maybe not so far away though. Seems like something to sort out before transitioning to DSL2
It's not because of the flipping. It's just to say that it should match on both. That way the group and id in the channel do not turn into a weird list.
If I only match on group it will be:
group, id1, bam, bai, id1/2/3(random), sex, type.
If I only match on id it will be:
[group,group], id, bam, bai, sex, type
I see!
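To make the point about matching on both keys concrete, here is a minimal Python sketch of a keyed join. It is a simplified model and not Nextflow's implementation: the helper `join_on` and the sample rows are hypothetical, and unlike the single, effectively random pairing described above, this model emits every matching pair when the key is not unique.

```python
def join_on(left, right, by):
    """Join two lists of tuples on the positions listed in `by`.

    Each output row is the key fields, then the non-key fields of the
    left row, then the non-key fields of the right row.
    """
    def key(row):
        return tuple(row[i] for i in by)

    def rest(row):
        return tuple(v for i, v in enumerate(row) if i not in by)

    # Index the right-hand rows by key so lookups are cheap.
    index = {}
    for row in right:
        index.setdefault(key(row), []).append(row)

    joined = []
    for row in left:
        for match in index.get(key(row), []):
            joined.append(key(row) + rest(row) + rest(match))
    return joined


# Hypothetical trio: one group, three sample ids.
eklipse_bam = [
    ("grp1", "child", "child.bam", "child.bai"),
    ("grp1", "mother", "mother.bam", "mother.bai"),
    ("grp1", "father", "father.bam", "father.bai"),
]
meta_eklipse = [
    ("grp1", "child", "M", "proband"),
    ("grp1", "mother", "F", "mother"),
    ("grp1", "father", "M", "father"),
]

# Keyed on (group, id): one flat row per sample.
both = join_on(eklipse_bam, meta_eklipse, by=[0, 1])

# Keyed on group only: the id appears twice per row and can disagree.
group_only = join_on(eklipse_bam, meta_eklipse, by=[0])
```

With the composite key each sample gets exactly one row of the form group, id, bam, bai, sex, type. With the group-only key, a row can pair one sample's BAM with another sample's sex and type, which is the mismatch the comment above describes.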
Tests passed for wgs single, trio and onko samples
Description and reviewer info
In this PR I aim to fix two issues.
First, eklipse images from trios would always be random, or rather the image that was created last in the pipeline. To fix this I have added proband ID information to the create_yaml process in main.nf and use that information in create_yml.pl to select the correct image.
Second, some gene lists would go missing from the YAML files for no apparent reason. I tracked the issue down to a crappy regex that matched anything containing "test" and ignored it. This resulted in missing all intestinal gene lists. I removed the regex part and cleaned up the code around gene lists a bit.
Moreover, I also changed some variable names to be more readable, removed some errors and made sure to only output to STDERR.
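The second issue can be illustrated with a small Python sketch. The actual regex in create_yml.pl is not shown in this PR, so the pattern below is an assumption (an unanchored match on "test"), and all gene list names except the "intestinal" prefix are hypothetical. The stricter pattern is shown only for comparison; the PR removes the filter altogether.

```python
import re

# Hypothetical gene list names; only "intestinal" comes from the PR description.
genelists = ["intestinal_failure", "test_panel", "cardiology"]

# Assumed shape of the old filter: an unanchored match on "test".
unanchored = re.compile(r"test")

# One possible stricter alternative: match "test" only as a whole token.
whole_token = re.compile(r"(?:^|_)test(?:_|$)")

# Lists that survive each filter.
kept_old = [g for g in genelists if not unanchored.search(g)]
kept_strict = [g for g in genelists if not whole_token.search(g)]

print(kept_old)     # ['cardiology']
print(kept_strict)  # ['intestinal_failure', 'cardiology']
```

Because "intestinal" contains the substring "test", the unanchored pattern drops it along with the genuine test panel.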