Workflow output definition (second preview) #5185

Merged: 21 commits into master from workflow-output-definition-final on Oct 24, 2024

bentsherman (Member) commented Jul 30, 2024

Close #5103

TODO:

language improvements:

  • support dynamic path
  • move publish options to config
  • remove publish section from process definition

runtime improvements:

  • prevent leading/trailing slashes
  • report unused target names
  • detect file collisions at runtime ?
  • detect outputs that aren't published or used downstream ?

new features (might be handled separately):

  • support JSON index file
  • generate output schema ?
  • include publish targets in inspect command ?


netlify bot commented Jul 30, 2024

Deploy Preview for nextflow-docs-staging ready!

Name Link
🔨 Latest commit 7d5d69b
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/67194094986ac9000862eb27
😎 Deploy Preview https://deploy-preview-5185--nextflow-docs-staging.netlify.app

@bentsherman bentsherman changed the title from "Finalize workflow output definition" to "Workflow output definition (second preview)" Sep 23, 2024
@bentsherman bentsherman marked this pull request as ready for review September 24, 2024 19:42
@bentsherman bentsherman requested a review from a team as a code owner September 24, 2024 19:42
@bentsherman bentsherman requested review from pditommaso and removed request for a team September 24, 2024 19:42
bentsherman (Member, Author) commented

The second preview is ready for review. The POCs have also been updated:

Signed-off-by: Ben Sherman <[email protected]>
nvnieuwk commented Oct 8, 2024

I just tried the dynamic publishing with some dummy code and it worked perfectly! I'm looking forward to implementing this in my pipelines 🥳

christopher-hakkaart (Contributor) left a comment

Added a small language suggestion.

Review threads:
docs/amazons3.md (outdated)
docs/channel.md (outdated)
docs/channel.md (outdated)
docs/channel.md (outdated)
docs/reference/config.md (outdated)
pditommaso and others added 2 commits October 21, 2024 14:42
Co-authored-by: Christopher Hakkaart <[email protected]>
Signed-off-by: Paolo Di Tommaso <[email protected]>

`header`
: When `true`, the keys of the first record are used as the column names (default: `false`). Can also be a list of column names.
- The process `publish:` section has been removed. Channels should be published only in workflows, ideally the entry workflow.
Review comment (Member):

I'm really not sure about this. I see the need for modules, but it will make writing workflows much more verbose. Could both approaches not co-exist?

Reply (Member, Author):

Allowing both would be too confusing. If a process publishes some outputs by default and you don't want to publish them, then you have to explicitly disable them. And one of the goals of workflow outputs was to have all published outputs in one place, instead of being scattered across the pipeline.

My initial feeling was to only allow publishing in the entry workflow, but that would also create a lot of boilerplate to pass all the channels up from the subworkflows. So a happy middle ground for now is to allow publishing from any workflow, but not from processes.

Reply (Member, Author):

Also, the original motivation for process publishing was to provide some sensible publishing defaults for a module. I think this is better handled by providing an entry workflow with the module:

process ASPERA_CLI {
    input:
    // ...

    output:
    tuple val(meta), path("*fastq.gz"), emit: fastq
    tuple val(meta), path("*md5")     , emit: md5

    script:
    // ...
}

workflow {
    main:
    ASPERA_CLI ( params.input, params.args )

    publish:
    ASPERA_CLI.out.fastq >> 'fastq'
    ASPERA_CLI.out.md5 >> 'md5'
}

This gives an example of how to use the module, covering both params and publishing, but both are opt-in. Users can incorporate these publishing rules into their pipeline explicitly if they want to.
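
As a rough sketch of how those two targets might then be configured, assuming the second-preview syntax in which a top-level `output` block declares each target named in a `publish:` section. The `path` and `mode` values below are illustrative and not taken from this PR, and the option names should be checked against the Nextflow docs for the version in use:

```nextflow
// Assumption: the feature flag is needed while workflow outputs are in preview
nextflow.preview.output = true

// ... process ASPERA_CLI and the entry workflow as above ...

output {
    // One block per target name used in the publish: section
    'fastq' {
        path 'fastq'   // hypothetical subdirectory under the output directory
        mode 'copy'    // hypothetical choice: copy files instead of linking them
    }

    'md5' {
        path 'md5'     // hypothetical subdirectory under the output directory
    }
}
```

Keeping these rules next to the entry workflow, rather than inside the process, is what lets a pipeline that imports only the process ignore them.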

pditommaso (Member) commented

@bentsherman can you please have a look at the conflicts?

@pditommaso pditommaso merged commit ea12846 into master Oct 24, 2024
21 of 23 checks passed
@pditommaso pditommaso deleted the workflow-output-definition-final branch October 24, 2024 14:52