-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dev-ratio] add deseq2 and gprofiler2 into the experimental branch #306
Conversation
|
…hways/contrast/etc into different subfolders
@@ -490,7 +490,7 @@ process { | |||
// However, later we need to better handle this, maybe by a bit of groovy scripting to | |||
// overwrite the repeated parameters (?) | |||
|
|||
withName: "PROPR"{ | |||
withName: PROPR { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
withName: PROPR { | |
withName: 'PROPR' { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency I'd use ' throughout the config
(meta.diff_method ? "-from-${meta.diff_method}" : ''), | ||
(meta.args_diff ? "-${meta.args_diff.replace('--','').replace(' ', '_')}" : ''), | ||
(meta.cor_method ? "-from-${meta.cor_method}" : ''), | ||
(meta.args_cor ? "-${meta.args_cor.replace('--','').replace(' ', '_')}" : ''), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With splitting the toolsheet information into only the subworkflow relevant chunks, the data accessed here wouldn't be existing anymore. I'm not sure such an elaborate file naming is necessary in this case - could it not be reduced to the pathway name and args_enr? (I dont see a reason for 2 different pathways with identical args_diff, args_cor and args_enr to exist)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but if you have two pathways:
- deseq2 + gprofiler2
- deseq2 + gsea
Would you save same deseq2 output twice, but into two different pathway directories?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm talking only about the enrichment methods like the GREA process here, where you would access not only enr arguments but also need to pass over all other args from previous steps to build the filename in the way you suggest. So, what I mean is to use only args_enr and if this is not specific enough maybe additionally the pathway_name.
For deseq2 in the differential analysis step using args_diff to build the name is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but now that we don't have the pathway name info inside the meta anymore, we don't have any way to distinguish between them and store them into different places (?)
pathway = "propd,propd_grea,propr,cor" | ||
|
||
// analysis | ||
pathway = "propd,propd_grea,propd_ora,propr,cor,deseq2_ora" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You think propd_fdr and pcorbshrink can be safely omitted from the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
propd_fdr is currently a bit useless for this test, because it is not giving any significant results for the test dataset, so it can be ignored.
But I would have to check other datasets for it to run and test
Then instead of pcorbshrink maybe I would ignore cor
include { DESEQ2_DIFFERENTIAL as DESEQ2 } from '../../../modules/nf-core/deseq2/differential/main' | ||
include { FILTER_DIFFTABLE as FILTER_DESEQ2 } from '../../../modules/local/filter_difftable' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally prefer to use renaming only when necessary, i.e. when a module is included multiple times. That way you always directly know for nf-core modules in which folder the module code is located, without needing to check the include statement.
ch_results_genewise_filtered | ||
.filter { it[0]["enr_method"] == "gprofiler2" } | ||
.combine(ch_gene_sets) | ||
.combine(ch_counts) | ||
.multiMap { meta_results, results, meta_gene_sets, gene_sets, meta_counts, counts -> | ||
de : [meta_results, results] | ||
gmt : [gene_sets] | ||
background : [counts] | ||
} | ||
.set{ ch_enrichment_gprofiler2 } | ||
|
||
GPROFILER2_GOST( | ||
ch_enrichment_gprofiler2.de, | ||
ch_enrichment_gprofiler2.gmt, | ||
ch_enrichment_gprofiler2.background | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a reason for the multiMap here.
Why not something like
ch_results_genewise_filtered | |
.filter { it[0]["enr_method"] == "gprofiler2" } | |
.combine(ch_gene_sets) | |
.combine(ch_counts) | |
.multiMap { meta_results, results, meta_gene_sets, gene_sets, meta_counts, counts -> | |
de : [meta_results, results] | |
gmt : [gene_sets] | |
background : [counts] | |
} | |
.set{ ch_enrichment_gprofiler2 } | |
GPROFILER2_GOST( | |
ch_enrichment_gprofiler2.de, | |
ch_enrichment_gprofiler2.gmt, | |
ch_enrichment_gprofiler2.background | |
) | |
GPROFILER2_GOST( | |
ch_results_genewise_filtered.filter { it[0]["enr_method"] == "gprofiler2" }, | |
ch_gene_sets.first()[1], | |
ch_counts.first()[1] | |
) |
// parse optional input files that affect the normalization | ||
// TODO we should consider to put this kind of stuff in a separate data handling / preprocessing / normalization block | ||
if (params.control_features) { | ||
ch_control_features = Channel.of([ [ "id": params.study_name ], file(params.control_features, checkIfExists: true)]).first() | ||
} else { | ||
ch_control_features = [[],[]] | ||
} | ||
if (params.transcript_length_matrix) { | ||
ch_transcript_lengths = Channel.of([ [ "id": params.study_name ], file(params.transcript_length_matrix, checkIfExists: true)]).first() | ||
} else { | ||
ch_transcript_lengths = [[],[]] | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is Deseq2 specific, wouldn't it be better to move it into the differential_analysis subworkflow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is good to call params within the subworkflows? I thought that should be avoided for a better encapsulation of the subworflow?
Also, not now, but in the future, I think it make sense to have two separated blocks with DESEQ2_NORM dealing the normalization of the data (so in the data processing block) that will use these control/transcript length features, and DESEQ2_DIFFERENTIAL running the differential analysis on the already normalized data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I see your point, but I would argue that we have a better encapsulation, when a subworkflow can be so transparently changed (i.e. by adding a new module) that there is no need for making any changes to the calling workflow.
PR checklist
nf-core lint
).nf-test test main.nf.test -profile test,docker
).nextflow run . -profile debug,test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).