Fix createyml random image matching #221

ViktorHy · 2024-08-21T12:47:52Z

Description and reviewer info

In this PR I aim to fix two issues.

First, the eklipse image selected for trios was effectively random, or rather it was always the image created last in the pipeline. To fix this I have added proband ID information to the create_yaml process in main.nf and use that information in create_yml.pl to select the correct image.

Second, some genelists would go missing from the yaml files for no apparent reason. I tracked the issue down to a crappy regex that matched any name containing "test" and ignored it. Since "intestinal" contains "test", all intestinal genelists were dropped. I removed the regex part and cleaned up the code around genelists a bit.
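A minimal Python sketch of why an unanchored "test" pattern also drops intestinal genelists. The genelist names and the stricter pattern are hypothetical and only for illustration; the PR itself removes the regex rather than tightening it, and the real code is Perl.

```python
import re

# Hypothetical genelist names, chosen only to illustrate the problem.
genelists = ["intestinal_failure", "cardiology", "panel_test", "test_panel"]

# Unanchored pattern: matches "test" anywhere, including inside "intestinal".
loose = re.compile(r"test")
kept_loose = [name for name in genelists if not loose.search(name)]

# Stricter, hypothetical alternative: "test" only as a whole
# underscore-delimited token.
strict = re.compile(r"(?:^|_)test(?:_|$)")
kept_strict = [name for name in genelists if not strict.search(name)]

print(kept_loose)
print(kept_strict)
```

With the loose pattern only "cardiology" survives; the stricter pattern keeps "intestinal_failure" as well.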

I also renamed some variables to be more readable, fixed some errors, and made sure output only goes to STDERR.

Type of change

Documentation
Patch
Minor change
Major change

Checklist

Self-review of my code
Update the CHANGELOG
Tag the latest commit (vX.Y.Z format)
Log samples used for testing in the Verification_samples_log Excel sheet

Patch

Stub run completes without errors or new warnings
At least one other person has reviewed and approved my code (not required for trivial changes)

Test/review documentation

Review performed by

(Add if missing)

Testing performed by

ViktorHy · 2024-08-21T12:50:11Z

I just realized I could also filter the eklipse channel to only pass on the proband image. That would require fewer create_yml.pl fixes. I don't know what you guys prefer? @alkc @Jakob37

Jakob37 · 2024-08-22T05:26:28Z

Can we briefly discuss the overall strategy here, @ViktorHy, before I wrap up the review?

Doing this filtering in the process based on position seems fragile to me. If we change something upstream of yml_diag later, this might very well break.

set group, id, sex, mother, father, phenotype, diagnosis, type, assay, clarity_sample_id, ffpe, analysis, type, file(ped), file(INFO) from yml_diag.filter { it[7] == 'proband' }.join(ped_scout).join(yaml_INFO).view()

On the other hand, it seems like the script itself does not need to know, and thus shouldn't know, whether it is processing a proband file or not.

So the right place to handle this seems to be on the nextflow level.

I tested if you could filter on "name" instead of position in the yml_diag channel. It seems this cannot be done after converting it into a tuple, but it could be done before (tested using nextflow console).

Channel
    .fromPath("trio.csv")
    .splitCsv(header: true)
    .filter { row -> row.type == "proband" }
    .map { row -> tuple(row.type, row.group, row.id, row.sex) }.view()

As yml_diag only seems to be used in this process, could we create a separate channel with only the proband info instead, and then feed in csv_proband_only? I think this would give the same result as what you have here 🤔
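The difference between filtering by name and filtering by position can be sketched in plain Python. This is only an analogy to the Nextflow snippet, not pipeline code; the CSV content and the helper name proband_rows are assumptions made for the example.

```python
import csv
import io

# Hypothetical stand-in for trio.csv; column names follow the Nextflow snippet.
TRIO_CSV = """type,group,id,sex
proband,fam1,child,M
mother,fam1,mom,F
father,fam1,dad,M
"""

def proband_rows(handle):
    """Yield (type, group, id, sex) tuples for proband rows only."""
    for row in csv.DictReader(handle):
        # Filter while columns are still addressable by name ...
        if row["type"] == "proband":
            # ... and only then flatten to a positional tuple.
            yield (row["type"], row["group"], row["id"], row["sex"])

probands = list(proband_rows(io.StringIO(TRIO_CSV)))
print(probands)
```

Because the filter runs before the row is flattened, reordering or adding columns upstream does not change which rows are selected, which is the fragility that a positional check such as it[7] would have.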

I just realized I could also filter the eklipse channel to only pass on the proband image. That would require fewer create_yml.pl fixes. I don't know what you guys prefer? @alkc @Jakob37

If we could handle that in a named way, as in the example above, then I think you are right: we should relieve create_yml.pl of this responsibility and handle it on the nextflow level.

Jakob37

If this works in testing - great! Some minor comments.

Jakob37 · 2024-08-21T18:00:10Z

bin/create_yml.pl

-my @g_c = split/,/,$opt{g};
+### Proband ### Could differ from group, needed to select correct eklipse image
+### Clarity-ID ###
+my @g_c;


The meaning of this variable name is not clear to me (I realize this is not from this PR)

Yes, it's an abbreviated name: group_clarity. The --g flag dates from when it only held the group.

Jakob37 · 2024-08-21T18:03:03Z

bin/create_yml.pl

@@ -132,52 +132,75 @@
 }

 ### Group ###
-if (!defined $opt{g}) { print STDERR "need group name"; exit;}
-my @g_c = split/,/,$opt{g};
+### Proband ### Could differ from group, needed to select correct eklipse image


Great addition - early fail and clear user info when calling with wrong input. I like it

Jakob37 · 2024-08-21T18:04:09Z

bin/create_yml.pl

-    print STDERR $_,"\n";
-    if ($tmp[0] eq "BAM") {
-        $INFO{BAM}->{$tmp[1]} = $tmp[2];
+    my $category = $tmp[0];


Jakob37 · 2024-08-21T18:07:30Z

bin/create_yml.pl

    }

 }
 close INFO;
-print Dumper(%INFO);
+my $info_json = to_json(\%INFO, { pretty => 1, indent => 4 });
+print STDERR ($info_json);


For debugging only (to be removed) or for production (to be kept)? (Just wondering)

First I thought I'd keep it, but then again the input file shows the same information

Jakob37 · 2024-08-22T09:08:44Z

bin/create_yml.pl

@@ -121,7 +121,7 @@
 if ($opt{assay}) { 
    my @a_a = split/,/,$opt{assay};
    $assay = $a_a[0];
-    if ($a_a[1] ne 'false' && $a_a[1]) {
+    if ($a_a[1]) {


Jakob37 · 2024-08-22T09:14:26Z

bin/create_yml.pl

@@ -203,9 +219,12 @@
    }


What about adding a STDERR message in the if statement here. To give us a tiny chance to see this if ever triggered

It really should be triggered every time, since the regex does not actually look for anything being added. Maybe I should just remove this. The problem this was meant to solve has been solved elsewhere by using flags in the loqusdb processes instead.

OK removing sounds even better!

Jakob37 · 2024-08-22T09:16:04Z

bin/create_yml.pl

    foreach my $key (@{$data}) {
-        if (ref $key->{institute} eq 'ARRAY') {


Hmm, tricky to follow here. Guessing you know this chunk well enough, so that no new issues are introduced here ...

Yes, this is some really old legacy code. It's from the time we had a manually curated list of genepanels. Now it was just confusing to keep.

Jakob37 · 2024-08-22T09:17:17Z

main.nf

@@ -1695,14 +1695,18 @@ process run_eklipse {
 	publishDir "${OUTDIR}/plots/mito", mode: 'copy', overwrite: 'true', pattern: '*.png'

 	input:
-		set group, id, file(bam), file(bai) from eklipse_bam
-
+		set group, id, file(bam), file(bai), sex, type from eklipse_bam.join(meta_eklipse, by: [0,1])


Is the [0, 1] due to the id and group flipping? We should really sort that out soon ... Maybe we can replace all the group + id with a meta object. Then we could also bunch the type and sex into that one. An issue for another day though

Maybe not so far away though. Seems like something to sort out before transitioning to DSL2

It's not because of the flipping. It's just to say that it should match on both. The result is that the group and id fields do not become a weird list.
If I only match on group, it will be:
group, id1, bam, bai, id1/2/3(random), sex, type
If I only match on id, it will be:
[group,group], id, bam, bai, sex, type
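The effect of the join key can be illustrated with a small Python analogy. This is not how Nextflow's join is implemented (it does not emit a cross product); the helper join_on and the trio data are hypothetical, and the sketch only shows that group alone is an ambiguous key and leaves a second id field behind, while group plus id pairs each row exactly once.

```python
def join_on(left, right, key_idx):
    """Pair rows from left and right whose fields at key_idx are equal.

    Output rows are: key fields, remaining left fields, remaining right fields.
    Every matching pair is emitted, so ambiguous keys show up as extra rows.
    """
    def split(row):
        key = tuple(row[i] for i in key_idx)
        rest = tuple(v for i, v in enumerate(row) if i not in key_idx)
        return key, rest

    out = []
    for lrow in left:
        lkey, lrest = split(lrow)
        for rrow in right:
            rkey, rrest = split(rrow)
            if lkey == rkey:
                out.append(lkey + lrest + rrest)
    return out

# Hypothetical trio: (group, id, bam, bai) and (group, id, sex, type).
bams = [("fam1", "child", "child.bam", "child.bai"),
        ("fam1", "mom", "mom.bam", "mom.bai"),
        ("fam1", "dad", "dad.bam", "dad.bai")]
meta = [("fam1", "child", "M", "proband"),
        ("fam1", "mom", "F", "mother"),
        ("fam1", "dad", "M", "father")]

by_group = join_on(bams, meta, [0])     # group only: every bam matches all three metadata rows
by_both = join_on(bams, meta, [0, 1])   # group and id: one match per sample

print(len(by_group), len(by_both))
print(by_both[0])
```

With both keys, each output row has the six fields the run_eklipse input declares: group, id, bam, bai, sex, type.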

ViktorHy · 2024-08-26T06:52:51Z

Tests passed for wgs single, trio and onko samples

ViktorHy added 4 commits August 21, 2024 13:37

removed regex matching of genepanels, would remove all *test*

50c8962

fixed some uninitialized value errors

b2e3350

make sure proband channel gets to create_yml and add proband id to cr…

762cc75

…eate_yml.pl input

fixed some variable names, and made sure proband id is in eklipse ima…

b2371f9

…ge file name

ViktorHy added 3 commits August 22, 2024 10:18

removed proband-id import to script

0867f28

fixed syntax error

d80a27c

moved proband filtering of eklipse plot to eklipse process, using opt…

8027ef0

…ional output

Jakob37 approved these changes Aug 22, 2024

Jakob37 mentioned this pull request Aug 22, 2024

Group id, group, type and sex into a meta object #222

Open

ViktorHy and others added 3 commits August 22, 2024 13:57

removed debug print

bac9758

eklipse and create_yml.pl fixes

39cd08b

Merge branch 'master' into fix_createyml_random_image_matching

ef7db7b

ViktorHy merged commit 1d86c92 into master Aug 26, 2024
1 check passed

alkc mentioned this pull request Aug 29, 2024

d4_file not added to yaml in bam-only starts #223

Open
