Multi-tool functionality and subworkflows as hub of methods #385

suzannejin · 2024-12-10T09:58:39Z

grst · 2024-12-10T10:03:23Z

Regarding the "Toolsheet", how does that relate to what we proposed in #362?

suzannejin · 2024-12-10T11:02:45Z

> Regarding the "Toolsheet", how does that relate to what we proposed in #362?

The toolsheet decides which DE and functional analysis methods to run. An example is here. This is the default toolsheet, where each row is a combination of tools that makes sense together.

The idea is that the user can select, for example, `--pathway deseq2_gsea,limma_gprofiler2`, and the pipeline will run both combinations at the same time with default parameters for each method (the parameters can still be changed through the toolsheet or command-line flags).
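A minimal Python sketch of how a comma-separated `--pathway` value could be resolved against the default toolsheet. The column names (`pathway_name`, `diff_method`, `functional_method`) and the `deseq2_gprofiler2` row are hypothetical, not the actual layout of the pipeline's toolsheet, and in the pipeline itself this lookup would be done in Nextflow rather than Python.

```python
import csv
import io

# Hypothetical toolsheet contents; the real columns may differ.
DEFAULT_TOOLSHEET = """\
pathway_name,diff_method,functional_method
deseq2_gsea,deseq2,gsea
limma_gprofiler2,limma,gprofiler2
deseq2_gprofiler2,deseq2,gprofiler2
"""


def select_pathways(toolsheet_csv: str, pathway_param: str) -> list[dict]:
    """Return the toolsheet rows named in a comma-separated --pathway value."""
    rows = {row["pathway_name"]: row
            for row in csv.DictReader(io.StringIO(toolsheet_csv))}
    requested = [name.strip() for name in pathway_param.split(",") if name.strip()]
    missing = [name for name in requested if name not in rows]
    if missing:
        raise ValueError(f"Unknown pathway(s): {', '.join(missing)}")
    # Each selected row becomes one independent analysis run.
    return [rows[name] for name in requested]


for run in select_pathways(DEFAULT_TOOLSHEET, "deseq2_gsea,limma_gprofiler2"):
    print(run["pathway_name"], run["diff_method"], run["functional_method"])
```

Failing on an unknown name, rather than silently skipping it, is just one possible design choice here.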

As for your question, the `method` option in the contrasts file could be used to match each contrast with the method to run.
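Assuming the contrasts file gains a `method` column (a hypothetical name and layout used only for illustration), matching contrasts to methods reduces to grouping the rows by that column, as in this Python sketch:

```python
import csv
import io
from collections import defaultdict

# Contrasts file with a hypothetical "method" column appended.
CONTRASTS = """\
id,variable,reference,target,method
treatment_a_vs_b,treatment,A,B,deseq2
response_yes_vs_no,response,no,yes,limma
treatment_b_vs_c,treatment,B,C,deseq2
"""


def contrasts_by_method(contrasts_csv: str) -> dict[str, list[str]]:
    """Group contrast IDs by the method that should analyse them."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(contrasts_csv)):
        grouped[row["method"]].append(row["id"])
    return dict(grouped)


print(contrasts_by_method(CONTRASTS))
```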

grst · 2024-12-10T12:06:18Z

I'm wondering if it wouldn't be more convenient to specify everything in YAML format? Essentially, each list item would replace one row in your toolsheet, and everything could be specified in one place. YAML seems the more natural choice to me when a CSV file would otherwise have many empty columns and/or lists of values such as `deseq2_gsea,limma_gprofiler2`.

I'm also afraid that the parameters for a differentialabundance run get scattered across too many places: Nextflow params, contrasts file, toolsheet file, samplesheet... I'd rather reduce the number of places where parameters are specified.

Something like:

```yaml
models:
  - method: limma
    formula: ~ treatment + response
    contrasts:
      - id: treatment_a_vs_b
        type: simple
        comparison: ["treatment", "A", "B"]
    enrichment:
      - gsea
      - gprofiler2
  - method: propd
    permutations: 100
    contrasts:
      - id: treatment
        type: anova
        column: treatment
  - compositional: propr
    metric: rho
```
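To check that a nested config like the one above maps cleanly onto individual jobs, here is a Python sketch that walks the structure PyYAML's `yaml.safe_load` would return for it, written out as a dict so the example runs without PyYAML. The `(workflow, tool, contrast_id)` job tuple and the rule that each enrichment tool runs once per contrast are assumptions for illustration, not part of the proposal.

```python
# Equivalent of yaml.safe_load() on the proposed config.
CONFIG = {
    "models": [
        {
            "method": "limma",
            "formula": "~ treatment + response",
            "contrasts": [
                {"id": "treatment_a_vs_b", "type": "simple",
                 "comparison": ["treatment", "A", "B"]},
            ],
            "enrichment": ["gsea", "gprofiler2"],
        },
        {
            "method": "propd",
            "permutations": 100,
            "contrasts": [
                {"id": "treatment", "type": "anova", "column": "treatment"},
            ],
        },
        {"compositional": "propr", "metric": "rho"},
    ]
}


def expand_jobs(config: dict) -> list[tuple]:
    """Flatten the nested model list into (workflow, tool, contrast_id) jobs."""
    jobs = []
    for model in config["models"]:
        if "compositional" in model:
            # Compositional entries carry no contrasts in this proposal.
            jobs.append(("compositional", model["compositional"], None))
            continue
        for contrast in model.get("contrasts", []):
            jobs.append(("differential", model["method"], contrast["id"]))
            # Assumed: each enrichment tool runs on this contrast's results.
            for tool in model.get("enrichment", []):
                jobs.append(("enrichment", tool, contrast["id"]))
    return jobs


for job in expand_jobs(CONFIG):
    print(job)
```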

This obviously needs to be fleshed out in more detail. For that, it would be important to understand which workflows depend on each other. I guess the compositional workflow is completely separate from the differential workflow. The enrichment workflow could run independently on the expression data, but it could also work off a ranked gene list generated by the differential workflow.
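One way to make such dependencies explicit is a small graph resolved with Python's standard `graphlib` module. The graph below encodes only the guesses in this comment (compositional independent, enrichment optionally consuming the differential results), so treat it as a hypothetical sketch; each returned stage lists subworkflows that could run in parallel.

```python
from graphlib import TopologicalSorter


def subworkflow_stages(enrichment_uses_ranked_list: bool) -> list[list[str]]:
    """Group subworkflows into stages that respect their dependencies."""
    # Map each subworkflow to the set of subworkflows it depends on.
    deps = {
        "differential": set(),
        "compositional": set(),  # assumed independent of the others
        "enrichment": {"differential"} if enrichment_uses_ranked_list else set(),
    }
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


print(subworkflow_stages(True))
print(subworkflow_stages(False))
```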

suzannejin · 2024-12-10T16:04:22Z

> I'm wondering if it wouldn't be more convenient to specify everything in yaml format?

I don't have a strong preference between YAML and CSV. However, merging the contrasts file and the toolsheet into one file could become tricky. When many methods are available, it is nice to have a 'default' toolsheet that lists all the combinations of tools that make sense together from a theoretical perspective. This file will always be there, in the pipeline's GitHub repository, whereas the contrasts file is data-specific.

grst · 2024-12-10T16:15:04Z

> it is nice to have a 'default' toolsheet as a place to specify all the possible combinations of tools that really make sense to be together from the theoretical perspective

What are the implications of this? Would you fail the pipeline if a user specifies an "invalid" combination?

suzannejin · 2024-12-10T17:21:14Z

> What are the implications of this? Would you fail the pipeline if a user specifies an "invalid" combination?

We don't have a plan for that yet, but one option is to raise a warning that it is an untested combination.
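A sketch of the "warn, don't fail" option, assuming the tested combinations are the (differential, functional) pairs listed in the default toolsheet. `TESTED_COMBINATIONS` and `check_combination` are hypothetical names, and the pipeline would emit such a warning from Nextflow rather than Python.

```python
import warnings

# Hypothetical: (differential, functional) pairs from the default toolsheet.
TESTED_COMBINATIONS = {
    ("deseq2", "gsea"),
    ("limma", "gprofiler2"),
}


def check_combination(diff_method: str, functional_method: str) -> bool:
    """Warn, rather than fail, when a combination is not in the default toolsheet."""
    if (diff_method, functional_method) in TESTED_COMBINATIONS:
        return True
    warnings.warn(
        f"{diff_method} + {functional_method} is not a tested combination",
        UserWarning,
    )
    return False
```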

In addition, for benchmarking users, we considered providing an extra toolsheet containing all the rows one wants to benchmark.

suzannejin · 2024-12-10T17:36:00Z

> I'm also afraid that all the parameters for a differentialabundance run get scattered across too many places... nextflow params, contrasts file, toolsheet file, samplesheet... I'd rather reduce the number of places where to specify parameters.

This is also a concern for us, but for the moment we have not found a better solution. It would be nice to brainstorm at some point, and contributions are super welcome if you find a better way :)

grst · 2024-12-11T06:59:59Z

> This file will always be there, in the pipeline github. Whereas the contrast file is data specific.

Just to clarify again: this file will only live in the pipeline, and the user specifies the combination of tools using standard params, e.g. `--pathway deseq2_gsea,limma_gprofiler2`? Or will it be an additional input file for the user?

suzannejin · 2024-12-11T09:16:01Z

> Just to clarify again, this will only be in the pipeline and the user specifies the combination of tools using standard params, e.g. --pathway deseq2_gsea,limma_gprofiler2? Or will this be an additional input file for the user?

We defined `tools = "${projectDir}/assets/tools_samplesheet.csv"` in `nextflow.config`.
In theory, users should not need to provide an additional toolsheet to run the pipeline, but we also don't want to stop them from doing so. Hence, one can still point `tools` to a custom toolsheet at their own risk. Do you think this will be a problem?
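The default-plus-override behaviour can be sketched in Python as below. This only mirrors the logic: in the pipeline, assuming `tools` sits in the `params` scope, Nextflow itself would let a user override it with `--tools <path>`. The warning on a custom file is a hypothetical addition reflecting "at their own risk".

```python
import warnings
from pathlib import Path
from typing import Optional

# Python analogue of the nextflow.config default:
#   tools = "${projectDir}/assets/tools_samplesheet.csv"
DEFAULT_TOOLSHEET_RELPATH = Path("assets") / "tools_samplesheet.csv"


def resolve_toolsheet(project_dir: str, custom_path: Optional[str] = None) -> Path:
    """Return the toolsheet path, warning when a custom file replaces the default."""
    default = Path(project_dir) / DEFAULT_TOOLSHEET_RELPATH
    if custom_path is None or Path(custom_path) == default:
        return default
    # Custom toolsheets are allowed, but at the user's own risk.
    warnings.warn(f"Using custom toolsheet {custom_path}; combinations may be untested.")
    return Path(custom_path)


print(resolve_toolsheet("/opt/differentialabundance"))
```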

grst · 2024-12-11T09:18:53Z

No, it's all good then. All I wanted to confirm is that, in a standard pipeline run, the user isn't required to provide yet another config file.

As you said, we should still think about how to reduce the number of places where parameters are specified, but that's a topic for a separate issue.

suzannejin · 2024-12-11T15:23:39Z

Here I created a meta issue with all the steps/sub-issues needed to achieve what we agreed to do.
Let me know what you think and if you would add/modify anything :)

CC @mirpedrol @bjlang @JoseEspinosa @pinin4fjords @WackerO

mirpedrol · 2024-12-11T15:32:01Z

> I'm wondering if it wouldn't be more convenient to specify everything in yaml format?

Since the toolsheet will be read with nf-schema, it can be provided as either CSV or YAML, so users can pick whichever is more convenient for them.

suzannejin · 2024-12-12T15:20:38Z

> I'm wondering if it wouldn't be more convenient to specify everything in yaml format?

Actually @mirpedrol, would the YAML format be more flexible and make it easier to define optional methods/params?

mirpedrol · 2024-12-13T07:51:31Z

I would say they are equivalent if we use simple YAML (without nesting); which one is easier to type comes down to user preference.
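A quick Python check of the "equivalent without nesting" point: flat records survive a round trip through CSV unchanged, while a nested value (here a list) comes back as a plain string and would need custom encoding. The record fields are hypothetical, and this does not exercise nf-schema itself.

```python
import csv
import io


def csv_round_trip(records: list[dict]) -> list[dict]:
    """Write records to CSV and read them back as dicts of strings."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    buffer.seek(0)
    return list(csv.DictReader(buffer))


flat = [{"pathway_name": "deseq2_gsea", "diff_method": "deseq2",
         "functional_method": "gsea"}]
nested = [{"pathway_name": "limma_multi", "diff_method": "limma",
           "functional_method": ["gsea", "gprofiler2"]}]

print(csv_round_trip(flat) == flat)      # True: flat rows survive unchanged
print(csv_round_trip(nested) == nested)  # False: the list comes back as a string
```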

suzannejin added this to differentialabundance Dec 10, 2024

suzannejin converted this from a draft issue Dec 10, 2024

suzannejin changed the title ~~Multi-tool functionality~~ Multi-tool functionality and subworkflows as hub of methods Dec 10, 2024


