From 8cb2b56d4d224602b70e356759575b084f208209 Mon Sep 17 00:00:00 2001 From: Ben Sherman Date: Wed, 22 Mar 2023 22:44:23 -0500 Subject: [PATCH 1/9] Incorporate DSL2 features into docs [ci skip] Signed-off-by: Ben Sherman --- docs/dsl1.rst | 192 ++++++++++++ docs/dsl2.rst | 769 ---------------------------------------------- docs/index.rst | 4 +- docs/module.rst | 267 ++++++++++++++++ docs/process.rst | 2 +- docs/script.rst | 34 ++ docs/wave.rst | 2 +- docs/workflow.rst | 409 ++++++++++++++++++++++++ 8 files changed, 907 insertions(+), 772 deletions(-) create mode 100644 docs/dsl1.rst delete mode 100644 docs/dsl2.rst create mode 100644 docs/module.rst create mode 100644 docs/workflow.rst diff --git a/docs/dsl1.rst b/docs/dsl1.rst new file mode 100644 index 0000000000..ce4cf5887d --- /dev/null +++ b/docs/dsl1.rst @@ -0,0 +1,192 @@ +.. _dsl1-page: + +******************** +Migrating from DSL 1 +******************** + +In Nextflow version ``22.03.0-edge``, DSL2 became the default DSL version. In version ``22.12.0-edge``, +DSL1 support was removed, and this documentation was updated to use DSL2 by default. Users who are still +using DSL1 should migrate their pipelines to DSL2 in order to use the latest versions of Nextflow. This page +describes the differences between DSL1 and DSL2, and how to migrate to DSL2. + +In Nextflow versions prior to ``22.03.0-edge``, you must enable DSL2 explicitly in order to use it. You can either +set the feature flag in your pipeline script:: + + nextflow.enable.dsl=2 + +Or set the environment variable where you launch Nextflow:: + + export NXF_DEFAULT_DSL=2 + + +Processes and workflows +======================= + +In DSL1, a process definition is also the process invocation. Process inputs and outputs are connected to channels +using ``from`` and ``into``. Here is the :ref:`getstarted-first` example written in DSL1:: + + nextflow.enable.dsl=1 + + params.str = 'Hello world!' + + process splitLetters { + + output: + file 'chunk_*' into letters + + """ + printf '${params.str}' | split -b 6 - chunk_ + """ + } + + process convertToUpper { + + input: + file x from letters.flatten() + + output: + stdout result + + """ + cat $x | tr '[a-z]' '[A-Z]' + """ + } + + result.view { it.trim() } + +To migrate this code to DSL2, you need to move all of your channel logic throughout the script +into a (nameless) ``workflow`` definition. Additionally, you must call each process explicitly, +passing any input channels as arguments (instead of ``from ...``) and receiving any output channels +as return values (instead of ``into ...``). + +Refer to the :ref:`workflow-page` page to learn how to define a workflow, as well as the original +:ref:`getstarted-first` example to see the resulting DSL2 version. + + +Channel forking +=============== + +In DSL1, a channel can be used as an input only once; to use a channel multiple times, the channel must +be forked using the ``into`` operator. + +In DSL2, channels are automatically forked when connecting two or more consumers. + +For example:: + + channel + .from('Hello','Hola','Ciao') + .set{ cheers } + + cheers + .map{ it.toUpperCase() } + .view() + + cheers + .map{ it.reverse() } + .view() + +Similarly, process outputs can be consumed by multiple consumers automatically, which makes workflow scripts +much easier to read and write. + + +Modules +======= + +In DSL1, the entire Nextflow pipeline must be defined in a single file (e.g. ``main.nf``). 
This +restriction becomes quite cumbersome as a pipeline becomes larger, and it hinders the sharing and +reuse of pipeline components. + +DSL2 introduces the concept of "module scripts" (or "modules" for short), which are Nextflow scripts +that can be "included" by other scripts. While modules are not essential to migrating to DSL2, nor are +they mandatory in DSL2 by any means, modules can help you organize a large pipeline into multiple smaller +files, and take advantage of modules created by others. Check out the :ref:`module-page` to get started. + + +Deprecations +============ + +Processes +--------- + +* The ``set`` process input type is no longer supported, use :ref:`tuple ` instead. +* The ``set`` process output type is no longer supported, use :ref:`tuple ` instead. +* The ``mode flatten`` option for process outputs is no longer available. Use the :ref:`operator-flatten` operator on the corresponding output channel instead. + +* Unqualified value and file elements in a tuple declaration are no longer allowed. Use an explicit + ``val`` or ``path`` qualifier. + + For example:: + + process foo { + input: + tuple X, 'some-file.sam' + output: + tuple X, 'some-file.bam' + + script: + ''' + your_command --in $X some-file.sam > some-file.bam + ''' + } + + Use:: + + process foo { + input: + tuple val(X), path('some-file.sam') + output: + tuple val(X), path('some-file.bam') + + script: + ''' + your_command --in $X some-file.sam > some-file.bam + ''' + } + + +Channels +-------- + +* Channel method ``bind`` has been deprecated in DSL2. +* Channel method ``<<`` has been deprecated in DSL2. +* Channel factory ``create`` has been deprecated in DSL2. + +Operators +--------- + +* Operator ``choice`` has been deprecated in DSL2. Use :ref:`operator-branch` instead. +* Operator ``close`` has been deprecated in DSL2. +* Operator ``countBy`` has been deprecated in DSL2. +* Operator ``into`` has been deprecated in DSL2, as it is no longer needed. +* Operator ``fork`` has been renamed to :ref:`operator-multimap`. +* Operator ``groupBy`` has been deprecated in DSL2. Use :ref:`operator-grouptuple` instead. +* Operators ``print`` and ``println`` have been deprecated in DSL2. Use :ref:`operator-view` instead. +* Operator ``route`` has been deprecated in DSL2. +* Operator ``separate`` has been deprecated in DSL2. +* Operator ``spread`` has been deprecated in DSL2. Use :ref:`operator-combine` instead. + +DSL2 Preview +------------ + +* The ``nextflow.preview.dsl=2`` feature flag is no longer needed. +* Anonymous and unwrapped includes are no longer supported. Use an explicit module inclusion instead. + + For example:: + + include './some/library' + include bar from './other/library' + + workflow { + foo() + bar() + } + + Should be replaced with:: + + include { foo } from './some/library' + include { bar } from './other/library' + + workflow { + foo() + bar() + } diff --git a/docs/dsl2.rst b/docs/dsl2.rst deleted file mode 100644 index f2907928f7..0000000000 --- a/docs/dsl2.rst +++ /dev/null @@ -1,769 +0,0 @@ -.. _dsl2-page: - -****** -DSL 2 -****** - -Nextflow provides a syntax extension that allows the definition of module libraries and -simplifies the writing of complex data analysis pipelines. - -To enable this feature you need to define the following directive at the beginning of -your workflow script:: - - nextflow.enable.dsl=2 - -.. tip:: - As of version ``22.03.0-edge`` Nextflow defaults to DSL 2 if no version is specified explicitly. 
- You can restore the previous behavior setting in into your environment the following variable:: - - export NXF_DEFAULT_DSL=1 - -.. note:: - As of version ``22.03.0-edge`` the DSL version specification (either 1 or 2) can also be specified in - the Nextflow configuration file using the same notation shown above. - -Function -======== - -Nextflow allows the definition of custom functions in the workflow script using the following syntax:: - - def ( arg1, arg, .. ) { - - } - -For example:: - - def foo() { - 'Hello world' - } - - def bar(alpha, omega) { - alpha + omega - } - - -The above snippet defines two simple functions, that can be invoked in the workflow script as ``foo()`` which -returns the ``Hello world`` string and ``bar(10,20)`` which returns the sum of two parameters (``30`` in this case). - -.. note:: Functions implicitly return the result of the last evaluated statement. - -The keyword ``return`` can be used to explicitly exit from a function and return the specified value. -For example:: - - def fib( x ) { - if( x <= 1 ) - return x - else - fib(x-1) + fib(x-2) - } - - -Process -======= - -Process definition ------------------- - -The new DSL separates the definition of a process from its invocation. The process definition follows the usual -syntax as described in the :ref:`process documentation `. The only difference is that the -``from`` and ``into`` channel declarations have to be omitted. - -Then a process can be invoked as a function in the ``workflow`` scope, passing the expected -input channels as parameters as if it were a custom function. For example:: - - nextflow.enable.dsl=2 - - process foo { - output: - path 'foo.txt' - - script: - """ - your_command > foo.txt - """ - } - - process bar { - input: - path x - - output: - path 'bar.txt' - - script: - """ - another_command $x > bar.txt - """ - } - - workflow { - data = channel.fromPath('/some/path/*.txt') - foo() - bar(data) - } - -.. warning:: - A process component can be invoked only once in the same workflow context. - - -Process composition -------------------- - -Processes having matching *input-output* declaration can be composed so that the output -of the first process is passed as input to the next process. Taking in consideration -the previous example, it's possible to write the following:: - - workflow { - bar(foo()) - } - - -Process output ---------------- - -A process output can also be accessed using the ``out`` attribute on the corresponding -process object. For example:: - - workflow { - foo() - bar(foo.out) - bar.out.view() - } - -When a process defines two or more output channels, each of them can be accessed -using the array element operator e.g. ``out[0]``, ``out[1]``, etc. or using -*named outputs* (see below). - - -Process named output --------------------- - -The ``emit`` option can be added to the process output definition to assign a name identifier. This name -can be used to reference the channel within the caller scope. For example:: - - process foo { - output: - path '*.bam', emit: samples_bam - - ''' - your_command --here - ''' - } - - workflow { - foo() - foo.out.samples_bam.view() - } - - -Process named stdout --------------------- - -The ``emit`` option can be used also to name the stdout:: - - process sayHello { - input: - val cheers - - output: - stdout emit: verbiage - - script: - """ - echo -n $cheers - """ - } - - workflow { - things = channel.of('Hello world!', 'Yo, dude!', 'Duck!') - sayHello(things) - sayHello.out.verbiage.view() - } - -.. 
note:: - Optional params for a process input/output are always prefixed with a comma, except for ``stdout``. Because - ``stdout`` does not have an associated name or value like other types, the first param should not be prefixed. - -Workflow -======== - -Workflow definition --------------------- - -The ``workflow`` keyword allows the definition of sub-workflow components that enclose the -invocation of one or more processes and operators:: - - workflow my_pipeline { - foo() - bar( foo.out.collect() ) - } - -For example, the above snippet defines a workflow component, named ``my_pipeline``, that can be invoked from -another workflow component definition as any other function or process with ``my_pipeline()``. - - -Workflow parameters ---------------------- - -A workflow component can access any variable and parameter defined in the outer scope:: - - params.data = '/some/data/file' - - workflow my_pipeline { - if( params.data ) - bar(params.data) - else - bar(foo()) - } - - -Workflow input ---------------- - -A workflow component can declare one or more input channels using the ``take`` keyword. For example:: - - workflow my_pipeline { - take: data - main: - foo(data) - bar(foo.out) - } - -.. warning:: - When the ``take`` keyword is used, the beginning of the workflow body must be identified with the - ``main`` keyword. - -Then, the input can be specified as an argument in the workflow invocation statement:: - - workflow { - my_pipeline( channel.from('/some/data') ) - } - -.. note:: - Workflow inputs are always channels by definition. If a basic data type is provided instead, - such as a number, string, list, etc, it is implicitly converted to a :ref:`value channel `. - - -Workflow output ----------------- - -A workflow component can declare one or more output channels using the ``emit`` keyword. For example:: - - workflow my_pipeline { - main: - foo(data) - bar(foo.out) - emit: - bar.out - } - -Then, the result of the ``my_pipeline`` execution can be accessed using the ``out`` property, i.e. -``my_pipeline.out``. When multiple output channels are declared, use the array bracket notation -to access each output channel as described for the `Process output`_ definition. - - -Workflow named output ---------------------- -If the output channel is assigned to an identifier in the ``emit`` declaration, such identifier can be used -to reference the channel within the caller scope. For example:: - - workflow my_pipeline { - main: - foo(data) - bar(foo.out) - emit: - my_data = bar.out - } - -Then, the result of the above snippet can accessed using ``my_pipeline.out.my_data``. - - -Workflow entrypoint -------------------- - -A workflow definition which does not declare any name (also known as *implicit workflow*) is -the entry point of execution for the workflow application. - -.. note:: - Implicit workflow definition is ignored when a script is included as a module. This - allows the writing of a workflow script that can be used either as a library module or as - an application script. - -.. tip:: - A different workflow entrypoint can be specified using the ``-entry`` command line option. - - -Workflow composition --------------------- - -Workflows defined in your script or imported with `Module inclusion`_ can be invoked and composed -as any other process in your application. 
- -:: - - workflow flow1 { - take: data - main: - foo(data) - bar(foo.out) - emit: - bar.out - } - - workflow flow2 { - take: data - main: - foo(data) - baz(foo.out) - emit: - baz.out - } - - workflow { - take: data - main: - flow1(data) - flow2(flow1.out) - } - -.. note:: - Nested workflow execution determines an implicit scope. Therefore the same process can be - invoked in two different workflow scopes, like for example ``foo`` in the above snippet that - is used both in ``flow1`` and ``flow2``. The workflow execution path, along with the - process names, determines the *fully qualified process name* that is used to distinguish the - two different process invocations, i.e. ``flow1:foo`` and ``flow2:foo`` in the above example. - -.. tip:: - The fully qualified process name can be used as a valid :ref:`process selector ` in the - ``nextflow.config`` file and it has priority over the simple process name. - - -Modules -======= - -The new DSL allows the definition of *module scripts* that -can be included and shared across workflow applications. - -A module script (or simply, module) can contain the definition of functions, processes and workflows -as described in the previous sections. - -.. note:: - Functions, processes and workflows are globally referred to as *components*. - - -Module inclusion ----------------- - -A component defined in a module script can be imported into another Nextflow script using the ``include`` keyword. - -For example:: - - include { foo } from './some/module' - - workflow { - data = channel.fromPath('/some/data/*.txt') - foo(data) - } - -The above snippet includes a process with name ``foo`` defined in the module script in the main -execution context. This way, ``foo`` can be invoked in the ``workflow`` scope. - -Nextflow implicitly looks for the script file ``./some/module.nf`` resolving the path -against the *including* script location. - -.. note:: - Relative paths must begin with the ``./`` prefix. Also, the ``include`` statement must be defined **outside** of the workflow definition. - -.. _dsl2-module-directory: - -Module directory ----------------- - -As of version ``22.10.0``, the module can be defined as a directory whose name matches the module name and -contains a script named ``main.nf``. For example:: - - some - └-module - └-main.nf - -When defined as a directory the module needs to be included specifying the module directory path:: - - include { foo } from './some/module' - -Module directories allows the use of module scoped binaries scripts. See `Module binaries`_ for details. - -Multiple inclusions -------------------- - -A Nextflow script allows the inclusion of an arbitrary number of modules and components. When multiple -components need to be included from the same module script, the component names can be -specified in the same inclusion using the curly brackets notation as shown below:: - - include { foo; bar } from './some/module' - - workflow { - data = channel.fromPath('/some/data/*.txt') - foo(data) - bar(data) - } - - -Module aliases --------------- - -When including a module component, it's possible to specify an *alias* with the ``as`` keyword. -This allows the inclusion and the invocation of components with the same name -in your script using different names. 
For example:: - - include { foo } from './some/module' - include { foo as bar } from './other/module' - - workflow { - foo(some_data) - bar(other_data) - } - -The same is possible when including the same component multiple times from the same module script as shown below:: - - include { foo; foo as bar } from './some/module' - - workflow { - foo(some_data) - bar(other_data) - } - - -Module parameters ------------------ - -A module script can define one or more parameters using the same syntax of a Nextflow workflow script:: - - params.foo = 'Hello' - params.bar = 'world!' - - def sayHello() { - println "$params.foo $params.bar" - } - - -Then, parameters are inherited from the including context. For example:: - - params.foo = 'Hola' - params.bar = 'Mundo' - - include {sayHello} from './some/module' - - workflow { - sayHello() - } - -The above snippet prints:: - - Hola Mundo - -.. note:: - The module inherits the parameters defined *before* the ``include`` statement, therefore any further - parameter set later is ignored. - -.. tip:: - Define all pipeline parameters at the beginning of the script *before* any ``include`` declaration. - -The option ``addParams`` can be used to extend the module parameters without affecting the external -scope. For example:: - - include {sayHello} from './some/module' addParams(foo: 'Ciao') - - workflow { - sayHello() - } - -The above snippet prints:: - - Ciao world! - -Finally, the include option ``params`` allows the specification of one or more parameters without -inheriting any value from the external environment. - - -.. _module-templates: - -Module templates ------------------ - -The module script can be defined in an external :ref:`template ` file. With DSL2 the template file -can be placed under the ``templates`` directory where the module script is located. - -For example, let's suppose to have a project L with a module script defining 2 processes (P1 and P2) and both use templates. -The template files can be made available under the local ``templates`` directory:: - - Project L - |─myModules.nf - └─templates - |─P1-template.sh - └─P2-template.sh - -Then, we have a second project A with a workflow that includes P1 and P2:: - - Pipeline A - └-main.nf - -Finally, we have a third project B with a workflow that includes again P1 and P2:: - - Pipeline B - └-main.nf - -With the possibility to keep the template files inside the project L, A and B can use the modules defined in L without any changes. -A future project C would do the same, just cloning L (if not available on the system) and including its module script. - -Beside promoting sharing modules across pipelines, there are several advantages in keeping the module template under the script path: - -1. module components are *self-contained*, -2. module components can be tested independently from the pipeline(s) importing them, -3. it is possible to create libraries of module components. - -Ultimately, having multiple template locations allows a more structured organization within the same project. If a project -has several module components, and all them use templates, the project could group module scripts and their templates as needed. 
For example:: - - baseDir - |─main.nf - └─Phase0-Modules - |─mymodules1.nf - |─mymodules2.nf - └─templates - |─P1-template.sh - |─P2-template.sh - └─Phase1-Modules - |─mymodules3.nf - |─mymodules4.nf - └─templates - |─P3-template.sh - └─P4-template.sh - └─Phase2-Modules - |─mymodules5.nf - |─mymodules6.nf - └─templates - |─P5-template.sh - |─P6-template.sh - └─P7-template.sh - -Module binaries ------------------ - -As of version ``22.10.0``, modules can define binary scripts that are locally scoped to the processes defined by the tasks. - -To enable this feature add the following setting in pipeline configuration file:: - - nextflow.enable.moduleBinaries = true - -The binary scripts must be placed in the module directory names ``/resources/usr/bin``:: - - - |─main.nf - └─resources - └─usr - └─bin - |─your-module-script1.sh - └─another-module-script2.py - -Those scripts will be accessible as any other command in the tasks environment, provided they have been granted -the Linux execute permissions. - -.. note:: - This feature requires the use of a local or shared file system as the pipeline work directory or - :ref:`wave-page` when using cloud based executors. - -Channel forking -=============== - -Using the new DSL, Nextflow channels are automatically forked when connecting two or more consumers. - -For example:: - - channel - .from('Hello','Hola','Ciao') - .set{ cheers } - - cheers - .map{ it.toUpperCase() } - .view() - - cheers - .map{ it.reverse() } - .view() - -The same is valid for the result (channel) of a process execution. Therefore a process output can be consumed by -two or more processes without the need to fork it using the :ref:`operator-into` operator, making the -writing of workflow scripts more fluent and readable. - - -Pipes -===== - -The *pipe* operator -------------------- - -Nextflow processes and operators can be composed using the ``|`` *pipe* operator. For example:: - - process foo { - input: - val data - - output: - val result - - exec: - result = "$data world" - } - - workflow { - channel.from('Hello','Hola','Ciao') | foo | map { it.toUpperCase() } | view - } - -The above snippet defines a process named ``foo`` and invokes it passing the content of the -``data`` channel. The result is then piped to the :ref:`operator-map` operator which converts each string -to uppercase and finally, the last :ref:`operator-view` operator prints it. - - -The *and* operator ------------------- - -The ``&`` *and* operator allows feeding of two or more processes with the content of the same -channel(s). For example:: - - process foo { - input: - val data - - output: - val result - - exec: - result = "$data world" - } - - process bar { - input: - val data - - output: - val result - - exec: - result = data.toUpperCase() - } - - workflow { - channel.from('Hello') | map { it.reverse() } | (foo & bar) | mix | view - } - -In the above snippet the channel emitting the ``Hello`` string is piped with the :ref:`operator-map` -which reverses the string value. Then, the result is passed to both ``foo`` and ``bar`` -processes which are executed in parallel. Each process outputs a channel, and the two channels are merged -into a single channel using the :ref:`operator-mix` operator. Finally the result is printed -using the :ref:`operator-view` operator. - -.. tip:: - The break-line operator ``\`` can be used to split long statements over multiple lines. 
- The above snippet can also be written as:: - - workflow { - channel.from('Hello') \ - | map { it.reverse() } \ - | (foo & bar) \ - | mix \ - | view - } - - -DSL2 migration notes -===================== - -* DSL2 final version is activated using the declaration ``nextflow.enable.dsl=2`` in place of ``nextflow.preview.dsl=2``. -* Process inputs of type ``set`` have to be replaced with :ref:`tuple `. -* Process outputs of type ``set`` have to be replaced with :ref:`tuple `. -* Process output option ``mode flatten`` is no longer available. Replace it using the :ref:`operator-flatten` operator on the corresponding output channel. -* Anonymous and unwrapped includes are not supported anymore. Replace them with an explicit module inclusion. For example:: - - include './some/library' - include bar from './other/library' - - workflow { - foo() - bar() - } - - Should be replaced with:: - - include { foo } from './some/library' - include { bar } from './other/library' - - workflow { - foo() - bar() - } - -* The use of unqualified value and file elements into input tuples is not allowed anymore. Replace them with a corresponding - ``val`` or ``path`` qualifier:: - - process foo { - input: - tuple X, 'some-file.bam' - - script: - ''' - your_command --in $X some-file.bam - ''' - } - - Use:: - - process foo { - input: - tuple val(X), path('some-file.bam') - - script: - ''' - your_command --in $X some-file.bam - ''' - } - -* The use of unqualified value and file elements into output tuples is not allowed anymore. Replace them with a corresponding - ``val`` or ``path`` qualifier:: - - process foo { - output: - tuple X, 'some-file.bam' - - script: - X = 'some value' - ''' - your_command > some-file.bam - ''' - } - - Use:: - - process foo { - output: - tuple val(X), path('some-file.bam') - - script: - X = 'some value' - ''' - your_command > some-file.bam - ''' - } - -* Operator :ref:`channel-bind1` has been deprecated by DSL2 syntax -* Operator :ref:`channel-bind2` has been deprecated by DSL2 syntax. -* Operator :ref:`operator-choice` has been deprecated by DSL2 syntax. Use :ref:`operator-branch` instead. -* Operator :ref:`operator-close` has been deprecated by DSL2 syntax. -* Operator :ref:`channel-create` has been deprecated by DSL2 syntax. -* Operator ``countBy`` has been deprecated by DSL2 syntax. -* Operator :ref:`operator-into` has been deprecated by DSL2 syntax since it's not needed anymore. -* Operator ``fork`` has been renamed to :ref:`operator-multimap`. -* Operator ``groupBy`` has been deprecated by DSL2 syntax. Replace it with :ref:`operator-grouptuple` -* Operator ``print`` and ``println`` have been deprecated by DSL2 syntax. Use :ref:`operator-view` instead. -* Operator :ref:`operator-separate` has been deprecated by DSL2 syntax. -* Operator :ref:`operator-spread` has been deprecated with DSL2 syntax. Replace it with :ref:`operator-combine`. -* Operator ``route`` has been deprecated by DSL2 syntax. diff --git a/docs/index.rst b/docs/index.rst index f5fe34eecb..975fa7742d 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -17,9 +17,10 @@ Contents: process channel operator + workflow + module executor config - dsl2 cli container wave @@ -39,3 +40,4 @@ Contents: mail plugins secrets + dsl1 diff --git a/docs/module.rst b/docs/module.rst new file mode 100644 index 0000000000..55011420e3 --- /dev/null +++ b/docs/module.rst @@ -0,0 +1,267 @@ +.. _module-page: + +******* +Modules +******* + +.. note:: + Modules were introduced in DSL2. 
If you are still using DSL1, see the :ref:`dsl1-page` page to
+    learn how to migrate your Nextflow pipelines to DSL2.
+
+In Nextflow, a **module** is a script that may contain functions, processes, and workflows. A module
+can be included in other modules or pipeline scripts and even shared across workflows.
+
+.. note::
+    Functions, processes, and workflows are collectively referred to as *components*.
+
+
+Module inclusion
+----------------
+
+A component defined in a module script can be imported into another Nextflow script using the ``include`` keyword.
+
+For example::
+
+    include { foo } from './some/module'
+
+    workflow {
+        data = channel.fromPath('/some/data/*.txt')
+        foo(data)
+    }
+
+The above snippet imports a process named ``foo``, defined in the module script, into the main
+execution context. This way, ``foo`` can be invoked in the ``workflow`` scope.
+
+Nextflow implicitly looks for the script file ``./some/module.nf``, resolving the path
+against the *including* script location.
+
+Module includes are subject to the following rules:
+
+- Relative paths must begin with the ``./`` prefix.
+- Include statements are not allowed from within a worklfow. They must occur at the script level.
+
+.. _module-directory:
+
+Module directory
+----------------
+
+.. note::
+    This feature requires Nextflow version ``22.10.0`` or later.
+
+A module can be defined as a directory with the same name as the module and with a script
+named ``main.nf``. For example::
+
+    some
+    └-module
+        └-main.nf
+
+When defined as a directory, the module must be included by specifying the module directory path::
+
+    include { foo } from './some/module'
+
+Module directories allow the use of module-scoped binary scripts. See `Module binaries`_ for details.
+
+Multiple inclusions
+-------------------
+
+A Nextflow script can include any number of modules, and an ``include`` statement can import any number of
+components from a module. Multiple components can be included from the same module by using the syntax
+shown below::
+
+    include { foo; bar } from './some/module'
+
+    workflow {
+        data = channel.fromPath('/some/data/*.txt')
+        foo(data)
+        bar(data)
+    }
+
+
+.. _module-aliases:
+
+Module aliases
+--------------
+
+When including a module component, it's possible to specify an *alias* with the ``as`` keyword.
+Aliasing allows you to include and invoke components with the same name, by assigning them different
+names in the including context. For example::
+
+    include { foo } from './some/module'
+    include { foo as bar } from './other/module'
+
+    workflow {
+        foo(some_data)
+        bar(other_data)
+    }
+
+You can even include the same component multiple times under different names::
+
+    include { foo; foo as bar } from './some/module'
+
+    workflow {
+        foo(some_data)
+        bar(other_data)
+    }
+
+
+Module parameters
+-----------------
+
+A module script can define parameters using the same syntax as a Nextflow workflow script::
+
+    params.foo = 'Hello'
+    params.bar = 'world!'
+
+    def sayHello() {
+        println "$params.foo $params.bar"
+    }
+
+
+When including a module, the module will first use parameters from the including context. For example::
+
+    params.foo = 'Hola'
+    params.bar = 'Mundo'
+
+    include { sayHello } from './some/module'
+
+    workflow {
+        sayHello()
+    }
+
+The above snippet prints::
+
+    Hola Mundo
+
+.. note::
+    The module inherits the parameters defined *before* the ``include`` statement; therefore, any parameters
+    set afterwards will not be used by the module.
+
+.. 
tip::
+    It is best to define all pipeline parameters *before* any ``include`` statements.
+
+The ``addParams`` option can be used to pass parameters to the module without affecting the including
+scope.
+
+::
+
+    params.foo = 'Hola'
+    params.bar = 'Mundo'
+
+    include { sayHello } from './some/module' addParams(foo: 'Ciao')
+
+    workflow {
+        sayHello()
+    }
+
+The above snippet prints::
+
+    Ciao Mundo
+
+Alternatively, the ``params`` option allows you to pass parameters to module without affecting the including
+scope, *and* without inheriting any parameters from the including scope.
+
+::
+
+    params.foo = 'Hola'
+    params.bar = 'Mundo'
+
+    include { sayHello } from './some/module' params(foo: 'Ciao')
+
+    workflow {
+        sayHello()
+    }
+
+The above snippet prints::
+
+    Ciao world!
+
+
+.. _module-templates:
+
+Module templates
+----------------
+
+The module script can be defined in an external :ref:`template ` file. The template file
+can be placed in the ``templates`` directory where the module script is located.
+
+For example, suppose we have a project L with a module script that defines two processes, P1 and P2, both
+of which use templates. The template files can be made available in the local ``templates`` directory::
+
+    Project L
+    |─myModules.nf
+    └─templates
+        |─P1-template.sh
+        └─P2-template.sh
+
+Then, we have a second project A with a workflow that includes P1 and P2::
+
+    Pipeline A
+    └-main.nf
+
+Finally, we have a third project B with a workflow that also includes P1 and P2::
+
+    Pipeline B
+    └-main.nf
+
+Because the template files are kept inside project L, projects A and B can use the modules defined in L without any changes.
+A future project C would do the same, just cloning L (if not available on the system) and including its module script.
+
+Besides promoting the sharing of modules across pipelines, there are several advantages to keeping the module template under the script path:
+
+1. module components are *self-contained*,
+2. module components can be tested independently from the pipeline(s) that import them,
+3. it is possible to create libraries of module components.
+
+Ultimately, having multiple template locations allows a more structured organization within the same project. If a project
+has several module components, and all of them use templates, the project could group module scripts and their templates as needed. For example::
+
+    baseDir
+    |─main.nf
+    └─Phase0-Modules
+        |─mymodules1.nf
+        |─mymodules2.nf
+        └─templates
+            |─P1-template.sh
+            |─P2-template.sh
+    └─Phase1-Modules
+        |─mymodules3.nf
+        |─mymodules4.nf
+        └─templates
+            |─P3-template.sh
+            └─P4-template.sh
+    └─Phase2-Modules
+        |─mymodules5.nf
+        |─mymodules6.nf
+        └─templates
+            |─P5-template.sh
+            |─P6-template.sh
+            └─P7-template.sh
+
+Module binaries
+---------------
+
+.. note::
+    This feature requires Nextflow version ``22.10.0`` or later.
+
+Modules can define binary scripts that are locally scoped to the processes defined by the tasks.
+
+To enable this feature, set the following flag in your pipeline script or configuration file::
+
+    nextflow.enable.moduleBinaries = true
+
+The binary scripts must be placed in the module directory named ``<module-dir>/resources/usr/bin``::
+
+    <module-dir>
+    |─main.nf
+    └─resources
+        └─usr
+            └─bin
+                |─your-module-script1.sh
+                └─another-module-script2.py
+
+Those scripts will be made accessible like any other command in the task environment, provided they have been granted
+the Linux execute permissions.
+
+.. 
note::
+    This feature requires the use of a local or shared file system for the pipeline work directory, or
+    :ref:`wave-page` when using container-native executors.
diff --git a/docs/process.rst b/docs/process.rst
index e24de605c8..f5bc616984 100644
--- a/docs/process.rst
+++ b/docs/process.rst
@@ -7,7 +7,7 @@ Processes
 
 In Nextflow, a **process** is the basic processing primitive to execute a user script.
 
 The process definition starts with the keyword ``process``, followed by process name and finally the process body
-delimited by curly brackets. The process body must contain a string which represents the command or, more generally,
+delimited by curly braces. The process body must contain a string which represents the command or, more generally,
 a script that is executed by it. A basic process looks like the following example::
 
 process sayHello {
diff --git a/docs/script.rst b/docs/script.rst
index d41376246d..242b166bc2 100644
--- a/docs/script.rst
+++ b/docs/script.rst
@@ -215,6 +215,40 @@ In the preceding example, ``blastp`` and its ``-in``, ``-out``, ``-db`` and ``-h
 their arguments are effectively a single line.
 
 
+Functions
+---------
+
+Functions can be defined using the following syntax::
+
+    def <function name>( arg1, arg2, ... ) {
+        <function body>
+    }
+
+For example::
+
+    def foo() {
+        'Hello world'
+    }
+
+    def bar(alpha, omega) {
+        alpha + omega
+    }
+
+The above snippet defines two simple functions that can be invoked in the workflow script as ``foo()``, which
+returns ``'Hello world'``, and ``bar(10,20)``, which returns the sum of two parameters (``30`` in this case).
+
+.. note:: Functions implicitly return the result of the last evaluated statement.
+
+The keyword ``return`` can be used to explicitly exit from a function and return the specified value. For example::
+
+    def fib( x ) {
+        if( x <= 1 )
+            return x
+        else
+            fib(x-1) + fib(x-2)
+    }
+
+
 .. _implicit-variables:
 
 Implicit variables
diff --git a/docs/wave.rst b/docs/wave.rst
index 3309965a90..69d59d57a9 100644
--- a/docs/wave.rst
+++ b/docs/wave.rst
@@ -65,7 +65,7 @@ Build module containers
 
 Wave can build and provision container images on-demand for your Nextflow pipelines.
 
-To enable this feature, add the Dockerfile of the container to be built in the :ref:`module directory <dsl2-module-directory>`
+To enable this feature, add the Dockerfile of the container to be built in the :ref:`module directory <module-directory>`
 where the pipeline process is defined. When Wave is enabled, it automatically uses the Dockerfile to build
 the required container, upload to the registry, and use the container to carry out the tasks defined in the module.
diff --git a/docs/workflow.rst b/docs/workflow.rst
new file mode 100644
index 0000000000..1cef5b778c
--- /dev/null
+++ b/docs/workflow.rst
@@ -0,0 +1,409 @@
+.. _workflow-page:
+
+*********
+Workflows
+*********
+
+.. note::
+    Workflows were introduced in DSL2. If you are still using DSL1, see the :ref:`dsl1-page` page to
+    learn how to migrate your Nextflow pipelines to DSL2.
+
+In Nextflow, a **workflow** is a composition of processes and dataflow logic (i.e. channels and operators).
+
+The workflow definition starts with the keyword ``workflow``, followed by an optional name, and finally the workflow body
+delimited by curly braces. A basic workflow looks like the following example::
+
+    workflow {
+        foo()
+    }
+
+Where ``foo`` could be a function, a process, or another workflow.
+
+Workflows are *lazily executed*, which means that Nextflow parses the entire workflow structure first, and then
+executes the entire workflow at once. 
The order in which a task is executed is determined only by its dependencies, so a task +will be executed as soon as all of its required inputs are available. + +The syntax of a workflow is defined as follows:: + + workflow [ name ] { + + take: + < workflow inputs > + + main: + < dataflow statements > + + emit: + < workflow outputs > + + } + +.. note:: + The ``main:`` label can be omitted if there are no ``take:`` or ``emit:`` blocks. + + +Invoking processes +================== + +A process can be invoked like a function in a workflow definition, passing the expected +input channels like function arguments. For example:: + + process foo { + output: + path 'foo.txt' + + script: + """ + your_command > foo.txt + """ + } + + process bar { + input: + path x + + output: + path 'bar.txt' + + script: + """ + another_command $x > bar.txt + """ + } + + workflow { + data = channel.fromPath('/some/path/*.txt') + foo() + bar(data) + } + +.. warning:: + A process can be invoked only once in the same workflow. See :ref:`module-aliases` for + a workaround. + + +Process composition +------------------- + +Processes with matching input/output declarations can be composed so that the output +of the first process is passed as input to the second process. Taking in consideration +the previous example, it's possible to write the following:: + + workflow { + bar(foo()) + } + + +Process outputs +--------------- + +A process output can be accessed using the ``out`` attribute on the corresponding +process object. For example:: + + workflow { + foo() + bar(foo.out) + bar.out.view() + } + +When a process defines multiple output channels, each output can be accessed +using the array element operator (``out[0]``, ``out[1]``, etc.) or using *named outputs* (see below). + +The process output(s) can also be accessed like the return value of a function:: + + workflow { + f_out = foo() + (b1, b2) = bar(f_out) + b1.view() + } + + +Process named outputs +--------------------- + +The ``emit`` option can be added to the process output definition to assign a name identifier. This name +can be used to reference the channel from the calling workflow. For example:: + + process foo { + output: + path '*.bam', emit: samples_bam + + ''' + your_command --here + ''' + } + + workflow { + foo() + foo.out.samples_bam.view() + } + + +Process named stdout +-------------------- + +The ``emit`` option can also be used to name a ``stdout`` output:: + + process sayHello { + input: + val cheers + + output: + stdout emit: verbiage + + script: + """ + echo -n $cheers + """ + } + + workflow { + things = channel.of('Hello world!', 'Yo, dude!', 'Duck!') + sayHello(things) + sayHello.out.verbiage.view() + } + +.. note:: + Optional params for a process input/output are always prefixed with a comma, except for ``stdout``. Because + ``stdout`` does not have an associated name or value like other types, the first param should not be prefixed. + + +Subworkflows +============ + +A named workflow is a "subworkflow" that can be invoked from other workflows. For example:: + + workflow my_pipeline { + foo() + bar( foo.out.collect() ) + } + + workflow { + my_pipeline() + } + +The above snippet defines a workflow named ``my_pipeline``, that can be invoked from +another workflow as ``my_pipeline()``, just like any other function or process. 
+ + +Workflow parameters +------------------- + +A workflow component can access any variable or parameter defined in the global scope:: + + params.data = '/some/data/file' + + workflow my_pipeline { + if( params.data ) + bar(params.data) + else + bar(foo()) + } + + +Workflow inputs +--------------- + +A workflow can declare one or more input channels using the ``take`` keyword. For example:: + + workflow my_pipeline { + take: data + main: + foo(data) + bar(foo.out) + } + +Multiple inputs must be specified on separate lines:: + + workflow my_pipeline { + take: + data1 + data2 + main: + foo(data1, data2) + bar(foo.out) + } + +.. warning:: + When the ``take`` keyword is used, the beginning of the workflow body must be defined with the + ``main`` keyword. + +Inputs can be specified like arguments when invoking the workflow:: + + workflow { + my_pipeline( channel.from('/some/data') ) + } + +.. note:: + Workflow inputs are always channels by definition. If a basic data type, such as a number, string, + list, etc, is provided, it is implicitly converted to a :ref:`value channel `. + + +Workflow outputs +---------------- + +A workflow can declare one or more output channels using the ``emit`` keyword. For example:: + + workflow my_pipeline { + main: + foo(data) + bar(foo.out) + emit: + bar.out + } + +When invoking the workflow, the output channel(s) can be accessed using the ``out`` property, i.e. +``my_pipeline.out``. When multiple output channels are declared, use the array bracket notation or +the assignment syntax to access each output channel as described for `Process outputs`_. + + +Workflow named outputs +---------------------- + +If an output channel is assigned to an identifier in the ``emit`` block, the identifier can be used +to reference the channel from the calling workflow. For example:: + + workflow my_pipeline { + main: + foo(data) + bar(foo.out) + emit: + my_data = bar.out + } + +The result of the above workflow can be accessed using ``my_pipeline.out.my_data``. + + +Workflow entrypoint +------------------- + +A workflow with no name (also known as the *implicit workflow*) is the default entrypoint of the +Nextflow pipeline. A different workflow entrypoint can be specified using the ``-entry`` command line option. + +.. note:: + Implicit workflow definitions are ignored when a script is included as a module. This way, + a workflow script can be written in such a way that it can be used either as a library module or + an application script. + + +Workflow composition +-------------------- + +Named workflows can be invoked and composed just like any other process or function. + +:: + + workflow flow1 { + take: data + main: + foo(data) + bar(foo.out) + emit: + bar.out + } + + workflow flow2 { + take: data + main: + foo(data) + baz(foo.out) + emit: + baz.out + } + + workflow { + take: data + main: + flow1(data) + flow2(flow1.out) + } + +.. note:: + Each workflow invocation has its own scope. As a result, the same process can be + invoked in two different workflow scopes, like ``foo`` in the above snippet, which + is used in both ``flow1`` and ``flow2``. The workflow execution path, along with the + process names, determines the *fully qualified process name* that is used to distinguish the + different process invocations, i.e. ``flow1:foo`` and ``flow2:foo`` in the above example. + +.. tip:: + The fully qualified process name can be used as a :ref:`process selector ` in a + Nextflow configuration file, and it takes priority over the simple process name. 
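+
+    For example, a minimal configuration sketch using the fully qualified name ``flow1:foo``
+    from the example above (the specific directive and value are only an illustration)::
+
+        process {
+            withName: 'flow1:foo' {
+                cpus = 2
+            }
+        }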
+ + +Special operators +================= + +Pipe (``|``) +------------ + +The ``|`` *pipe* operator can be used to compose Nextflow processes and operators. For example:: + + process foo { + input: + val data + + output: + val result + + exec: + result = "$data world" + } + + workflow { + channel.from('Hello','Hola','Ciao') | foo | map { it.toUpperCase() } | view + } + +The above snippet defines a process named ``foo`` and invokes it with the ``data`` channel. The +result is then piped to the :ref:`operator-map` operator, which converts each string to uppercase, +and finally to the :ref:`operator-view` operator which prints it. + +.. tip:: + Statements can also be split across multiple lines for better readability:: + + workflow { + channel.from('Hello','Hola','Ciao') + | foo + | map { it.toUpperCase() } + | view + } + + +And (``&``) +----------- + +The ``&`` *and* operator can be used to feed multiple processes with the same channel(s). For example:: + + process foo { + input: + val data + + output: + val result + + exec: + result = "$data world" + } + + process bar { + input: + val data + + output: + val result + + exec: + result = data.toUpperCase() + } + + workflow { + channel.from('Hello') + | map { it.reverse() } + | (foo & bar) + | mix + | view + } + +In the above snippet, the initial channel is piped to the :ref:`operator-map` operator, which +reverses the string value. Then, the result is passed to the processes ``foo`` and ``bar``, which +are executed in parallel. Each process outputs a channel, and the two channels are combined using +the :ref:`operator-mix` operator. Finally, the result is printed using the :ref:`operator-view` operator. From 4f6414a83fad59fbbbe20cf768e9eab1da486843 Mon Sep 17 00:00:00 2001 From: Ben Sherman Date: Wed, 22 Mar 2023 22:55:00 -0500 Subject: [PATCH 2/9] Rename "Nextflow scripting" page to "Scripts" [ci skip] Signed-off-by: Ben Sherman --- docs/script.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/script.rst b/docs/script.rst index 242b166bc2..88e04c383d 100644 --- a/docs/script.rst +++ b/docs/script.rst @@ -1,8 +1,8 @@ .. _script-page: -****************** -Nextflow scripting -****************** +******* +Scripts +******* The Nextflow scripting language is an extension of the Groovy programming language. Groovy is a powerful programming language for the Java virtual machine. The Nextflow From 49d0bb433be1f1eea50874664b4b83224045bbe2 Mon Sep 17 00:00:00 2001 From: Ben Sherman Date: Fri, 24 Mar 2023 09:37:13 -0500 Subject: [PATCH 3/9] Apply suggestions from code review [ci skip] Co-authored-by: Phil Ewels Signed-off-by: Ben Sherman --- docs/dsl1.rst | 10 ++++++---- docs/module.rst | 2 +- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/docs/dsl1.rst b/docs/dsl1.rst index ce4cf5887d..37976fb2b3 100644 --- a/docs/dsl1.rst +++ b/docs/dsl1.rst @@ -5,7 +5,7 @@ Migrating from DSL 1 ******************** In Nextflow version ``22.03.0-edge``, DSL2 became the default DSL version. In version ``22.12.0-edge``, -DSL1 support was removed, and this documentation was updated to use DSL2 by default. Users who are still +DSL1 support was removed, and the Nextflow documentation was updated to use DSL2 by default. Users who are still using DSL1 should migrate their pipelines to DSL2 in order to use the latest versions of Nextflow. This page describes the differences between DSL1 and DSL2, and how to migrate to DSL2. @@ -55,7 +55,7 @@ using ``from`` and ``into``. 
Here is the :ref:`getstarted-first` example written result.view { it.trim() } To migrate this code to DSL2, you need to move all of your channel logic throughout the script -into a (nameless) ``workflow`` definition. Additionally, you must call each process explicitly, +into a ``workflow`` definition. Additionally, you must call each process explicitly, passing any input channels as arguments (instead of ``from ...``) and receiving any output channels as return values (instead of ``into ...``). @@ -71,7 +71,7 @@ be forked using the ``into`` operator. In DSL2, channels are automatically forked when connecting two or more consumers. -For example:: +For example, this would not work in DSL1 but is not a problem in DSL2:: channel .from('Hello','Hola','Ciao') @@ -168,7 +168,9 @@ Operators DSL2 Preview ------------ -* The ``nextflow.preview.dsl=2`` feature flag is no longer needed. +An early preview of DSL2 was available in 2020. Note that some of that early DSL2 syntax has since changed. + +* The ``nextflow.preview.dsl=2`` (and ``nextflow.enable.dsl=1``) feature flags are no longer needed. * Anonymous and unwrapped includes are no longer supported. Use an explicit module inclusion instead. For example:: diff --git a/docs/module.rst b/docs/module.rst index 55011420e3..e631599374 100644 --- a/docs/module.rst +++ b/docs/module.rst @@ -83,7 +83,7 @@ Module aliases -------------- When including a module component, it's possible to specify an *alias* with the ``as`` keyword. -Aliasing allows you to include and invoke components with the same name, by assigning them different +Aliasing allows you to avoid module name clashes, by assigning them different names in the including context. For example:: include { foo } from './some/module' From f7cc88c2cf8d6cd7fecce05a257cdb928d72a762 Mon Sep 17 00:00:00 2001 From: Ben Sherman Date: Thu, 23 Mar 2023 04:30:26 -0500 Subject: [PATCH 4/9] Apply suggestions from review [ci skip] Signed-off-by: Ben Sherman --- docs/dsl1.rst | 41 +++++++++++++++++++++++++++++++++++------ docs/module.rst | 19 ++++++++----------- docs/script.rst | 5 ++--- docs/workflow.rst | 14 ++++++-------- 4 files changed, 51 insertions(+), 28 deletions(-) diff --git a/docs/dsl1.rst b/docs/dsl1.rst index 37976fb2b3..9f0b8d3819 100644 --- a/docs/dsl1.rst +++ b/docs/dsl1.rst @@ -59,8 +59,37 @@ into a ``workflow`` definition. Additionally, you must call each process explici passing any input channels as arguments (instead of ``from ...``) and receiving any output channels as return values (instead of ``into ...``). -Refer to the :ref:`workflow-page` page to learn how to define a workflow, as well as the original -:ref:`getstarted-first` example to see the resulting DSL2 version. +Refer to the :ref:`workflow-page` page to learn how to define a workflow. The DSL2 version of the above +script is duplicated here for your convenience :: + + params.str = 'Hello world!' 
+
+    process splitLetters {
+
+        output:
+        path 'chunk_*'
+
+        """
+        printf '${params.str}' | split -b 6 - chunk_
+        """
+    }
+
+    process convertToUpper {
+
+        input:
+        path x
+
+        output:
+        stdout
+
+        """
+        cat $x | tr '[a-z]' '[A-Z]'
+        """
+    }
+
+    workflow {
+        splitLetters | flatten | convertToUpper | view { it.trim() }
+    }
 
 
 Channel forking
 ===============
@@ -124,9 +124,9 @@ Processes
         tuple X, 'some-file.bam'
 
         script:
-        '''
+        """
         your_command --in $X some-file.sam > some-file.bam
-        '''
+        """
     }
 
   Use::
@@ -138,9 +167,9 @@ Processes
         tuple val(X), path('some-file.bam')
 
         script:
-        '''
+        """
         your_command --in $X some-file.sam > some-file.bam
-        '''
+        """
     }
 
diff --git a/docs/module.rst b/docs/module.rst
index e631599374..b224231924 100644
--- a/docs/module.rst
+++ b/docs/module.rst
@@ -4,16 +4,14 @@
 Modules
 *******
 
+In Nextflow, a **module** is a script that may contain functions, processes, and workflows
+(collectively referred to as *components*). A module can be included in other modules or
+pipeline scripts and even shared across workflows.
+
 .. note::
     Modules were introduced in DSL2. If you are still using DSL1, see the :ref:`dsl1-page` page to
     learn how to migrate your Nextflow pipelines to DSL2.
 
-In Nextflow, a **module** is a script that may contain functions, processes, and workflows. A module
-can be included in other modules or pipeline scripts and even shared across workflows.
-
-.. note::
-    Functions, processes, and workflows are collectively referred to as *components*.
-
 
 Module inclusion
 ----------------
@@ -139,8 +137,7 @@ The above snippet prints::
 
 .. tip::
     It is best to define all pipeline parameters *before* any ``include`` statements.
 
-The ``addParams`` option can be used to pass parameters to the module without affecting the including
-scope.
+The ``addParams`` option can be used to pass parameters to the module without adding them to the including scope.
 
 ::
 
@@ -157,8 +154,8 @@ The above snippet prints::
 
     Ciao Mundo
 
-Alternatively, the ``params`` option allows you to pass parameters to module without affecting the including
-scope, *and* without inheriting any parameters from the including scope.
+Alternatively, the ``params`` option can be used to pass parameters to the module without adding them
+to the including scope, *and* without inheriting any parameters from the including scope.
 
 ::
 
@@ -264,4 +261,4 @@ the Linux execute permissions.
 
 .. note::
     This feature requires the use of a local or shared file system for the pipeline work directory, or
-    :ref:`wave-page` when using container-native executors.
+    :ref:`wave-page` when using cloud-based executors.
diff --git a/docs/script.rst b/docs/script.rst
index 242b166bc2..a80e2dcf48 100644
--- a/docs/script.rst
+++ b/docs/script.rst
@@ -237,9 +237,8 @@ For example::
 The above snippet defines two simple functions that can be invoked in the workflow script as ``foo()``, which
 returns ``'Hello world'``, and ``bar(10,20)``, which returns the sum of two parameters (``30`` in this case).
 
-.. note:: Functions implicitly return the result of the last evaluated statement.
-
-The keyword ``return`` can be used to explicitly exit from a function and return the specified value. For example::
+Functions implicitly return the result of the last statement. Additionally, the ``return`` keyword can be used to
+explicitly exit from a function and return the specified value. 
For example::
 
     def fib( x ) {
         if( x <= 1 )
             return x
         else
             fib(x-1) + fib(x-2)
     }
 
 
 .. _implicit-variables:
 
 Implicit variables
diff --git a/docs/workflow.rst b/docs/workflow.rst
index 1cef5b778c..a4788dbb54 100644
--- a/docs/workflow.rst
+++ b/docs/workflow.rst
@@ -4,10 +4,6 @@
 Workflows
 *********
 
-.. note::
-    Workflows were introduced in DSL2. If you are still using DSL1, see the :ref:`dsl1-page` page to
-    learn how to migrate your Nextflow pipelines to DSL2.
-
 In Nextflow, a **workflow** is a composition of processes and dataflow logic (i.e. channels and operators).
 
 The workflow definition starts with the keyword ``workflow``, followed by an optional name, and finally the workflow body
@@ -38,9 +34,13 @@ The syntax of a workflow is defined as follows::
 
     }
 
-.. note::
+.. tip::
     The ``main:`` label can be omitted if there are no ``take:`` or ``emit:`` blocks.
 
+.. note::
+    Workflows were introduced in DSL2. If you are still using DSL1, see the :ref:`dsl1-page` page to
+    learn how to migrate your Nextflow pipelines to DSL2.
+
 
 Invoking processes
 ==================
@@ -78,9 +78,7 @@ input channels like function arguments. For example::
 }
 
 .. warning::
-    A process can be invoked only once in the same workflow. See :ref:`module-aliases` for
-    a workaround.
-
+    A process can only be invoked once in a single workflow, unless using :ref:`module-aliases`.
 
 Process composition
 -------------------
From b53f2fbd465bea783a8b8ca63953c4ff3d8049a8 Mon Sep 17 00:00:00 2001
From: Ben Sherman 
Date: Tue, 25 Apr 2023 11:19:39 -0500
Subject: [PATCH 5/9] minor edits

Signed-off-by: Ben Sherman 
---
 docs/azure.md     | 2 +-
 docs/container.md | 4 ++--
 docs/script.md    | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/azure.md b/docs/azure.md
index ac5b7b65b7..341db0ff49 100644
--- a/docs/azure.md
+++ b/docs/azure.md
@@ -215,7 +215,7 @@ The pool name can only contain alphanumeric, hyphen and underscore characters.
:::

:::{warning}
If the pool name includes a hyphen, make sure to wrap it with single quotes. For example:

```groovy
azure {
diff --git a/docs/container.md b/docs/container.md
index df6aa749c7..02e44cae6c 100644
--- a/docs/container.md
+++ b/docs/container.md
@@ -32,7 +32,7 @@ If your Apptainer installation support the "user bind control" feature, enable t

### How it works

The integration for Apptainer follows the same execution model implemented for Docker. You won't need to modify your Nextflow script in order to run it with Apptainer. Simply specify the Apptainer image file from where the containers are started by using the `-with-apptainer` command line option. For example:

```bash
nextflow run -with-apptainer [apptainer image file]
```
@@ -65,7 +65,7 @@ When a process input is a *symbolic link* file, make sure the linked file is sto

### Multiple containers

It is possible to specify a different Apptainer image for each process definition in your pipeline script. 
diff --git a/docs/script.md b/docs/script.md
index f7d0c739f3..df93696424 100644
--- a/docs/script.md
+++ b/docs/script.md
@@ -590,7 +590,7 @@ To include the schema, the `toUriString()` method should be used instead:
     assert ref.toUriString() == 's3://some-bucket/foo.txt'
 ```
 
-Also, instead of composing paths through string interpolation, the `resolve()` method or the `/` operator should be used instead::
+Also, instead of composing paths through string interpolation, the `resolve()` method or the `/` operator should be used:
 
 ```groovy
 def dir = file('s3://bucket/some/data/path')
From 8ffa4ff640d0830dfe5c2b7c8ec492dc33b66659 Mon Sep 17 00:00:00 2001
From: Ben Sherman 
Date: Mon, 18 Sep 2023 10:39:20 -0500
Subject: [PATCH 6/9] Minor updates

Signed-off-by: Ben Sherman 
---
 docs/dsl1.md                       | 39 ++++++------------------------
 docs/getstarted.md                 | 28 ++-------------------
 docs/module.md                     |  2 +-
 docs/snippets/your-first-script.nf | 26 ++++++++++++++++++++
 docs/workflow.md                   | 14 ++++++++---
 5 files changed, 47 insertions(+), 62 deletions(-)
 create mode 100644 docs/snippets/your-first-script.nf

diff --git a/docs/dsl1.md b/docs/dsl1.md
index ce50a4ae38..5635590b3d 100644
--- a/docs/dsl1.md
+++ b/docs/dsl1.md
@@ -26,7 +26,6 @@ nextflow.enable.dsl=1
 params.str = 'Hello world!'
 
 process splitLetters {
-
     output:
     file 'chunk_*' into letters
 
@@ -36,7 +35,6 @@ process splitLetters {
 }
 
 process convertToUpper {
-
     input:
     file x from letters.flatten()
 
@@ -55,35 +53,8 @@ To migrate this code to DSL2, you need to move all of your channel logic through
 
 Refer to the {ref}`workflow-page` page to learn how to define a workflow. The DSL2 version of the above script is duplicated here for your convenience:
 
-```groovy
-params.str = 'Hello world!'
-
-process splitLetters {
-
-    output:
-    path 'chunk_*'
-
-    """
-    printf '${params.str}' | split -b 6 - chunk_
-    """
-}
-
-process convertToUpper {
-
-    input:
-    path x
-
-    output:
-    stdout
-
-    """
-    cat $x | tr '[a-z]' '[A-Z]'
-    """
-}
-
-workflow {
-    splitLetters | flatten | convertToUpper | view { it.trim() }
-}
+```{literalinclude} snippets/your-first-script.nf
+:language: groovy
 ```
 
 ## Channel forking
@@ -95,7 +66,7 @@ In DSL2, channels are automatically forked when connecting two or more consumers
 For example, this would not work in DSL1 but is not a problem in DSL2:
 
 ```groovy
-channel
+Channel
     .from('Hello','Hola','Ciao')
     .set{ cheers }
 
@@ -116,6 +87,10 @@ In DSL1, the entire Nextflow pipeline must be defined in a single file (e.g. `ma
 
 DSL2 introduces the concept of "module scripts" (or "modules" for short), which are Nextflow scripts that can be "included" by other scripts. While modules are not essential to migrating to DSL2, nor are they mandatory in DSL2 by any means, modules can help you organize a large pipeline into multiple smaller files, and take advantage of modules created by others. Check out the {ref}`module-page` to get started.
 
+:::{note}
+With DSL2, the Groovy shell used by Nextflow also imposes a 64KB size limit on pipeline scripts, so if your DSL1 script is very large, you may need to split your script into modules anyway to avoid this limit.
+:::
+
 ## Deprecations
 
 ### Processes
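As an illustration of the kind of split suggested by the 64KB note above, a large script can be broken into a module and a small entry script; the file names and the process below are hypothetical:

```groovy
// modules/greet.nf (hypothetical module script)
process GREET {
    input:
    val name

    output:
    stdout

    """
    echo Hello, $name!
    """
}
```

```groovy
// main.nf: including the module keeps the entry script small
include { GREET } from './modules/greet'

workflow {
    names = Channel.of('Alice', 'Bob')
    GREET(names).view()
}
```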
diff --git a/docs/getstarted.md b/docs/getstarted.md
index 4049a0e694..da47ecb5a9 100644
--- a/docs/getstarted.md
+++ b/docs/getstarted.md
@@ -96,32 +96,8 @@ nextflow self-update
 
 Copy the following example into your favorite text editor and save it to a file named `tutorial.nf`:
 
-```groovy
-params.str = 'Hello world!'
-
-process splitLetters {
-    output:
-    path 'chunk_*'
-
-    """
-    printf '${params.str}' | split -b 6 - chunk_
-    """
-}
-
-process convertToUpper {
-    input:
-    path x
-    output:
-    stdout
-
-    """
-    cat $x | tr '[a-z]' '[A-Z]'
-    """
-}
-
-workflow {
-    splitLetters | flatten | convertToUpper | view { it.trim() }
-}
+```{literalinclude} snippets/your-first-script.nf
+:language: groovy
 ```
 
 :::{note}
diff --git a/docs/module.md b/docs/module.md
index 6d2409ff6c..9cae786a71 100644
--- a/docs/module.md
+++ b/docs/module.md
@@ -30,7 +30,7 @@ Nextflow implicitly looks for the script file `./some/module.nf`, resolving the
 
 Module includes are subject to the following rules:
 
 - Relative paths must begin with the `./` prefix.
-- Include statements are not allowed from within a worklfow. They must occur at the script level.
+- Include statements are not allowed from within a workflow. They must occur at the script level.
 
 (module-directory)=
diff --git a/docs/snippets/your-first-script.nf b/docs/snippets/your-first-script.nf
new file mode 100644
index 0000000000..f457ff3deb
--- /dev/null
+++ b/docs/snippets/your-first-script.nf
@@ -0,0 +1,26 @@
+params.str = 'Hello world!'
+
+process splitLetters {
+    output:
+    path 'chunk_*'
+
+    """
+    printf '${params.str}' | split -b 6 - chunk_
+    """
+}
+
+process convertToUpper {
+    input:
+    path x
+
+    output:
+    stdout
+
+    """
+    cat $x | tr '[a-z]' '[A-Z]'
+    """
+}
+
+workflow {
+    splitLetters | flatten | convertToUpper | view { it.trim() }
+}
\ No newline at end of file
diff --git a/docs/workflow.md b/docs/workflow.md
index fab1b7ec20..ebc3aae4ee 100644
--- a/docs/workflow.md
+++ b/docs/workflow.md
@@ -41,7 +41,7 @@ The `main:` label can be omitted if there are no `take:` or `emit:` blocks.
 Workflows were introduced in DSL2. If you are still using DSL1, see the {ref}`dsl1-page` page to learn how to migrate your Nextflow pipelines to DSL2.
 :::
 
-## Invoking processes
+## Process invocation
 
 A process can be invoked like a function in a workflow definition, passing the expected input channels like function arguments. For example:
 
@@ -77,12 +77,12 @@ workflow {
 ```
 
 :::{warning}
-A process can only be invoked once in a single workflow, unless using {ref}`module-aliases`.
+A process can only be invoked once in a single workflow; however, you can get around this restriction by using {ref}`module-aliases`.
 :::
 
 ### Process composition
 
-Processes with matching input/output declarations can be composed so that the output of the first process is passed as input to the second process. Taking in consideration the previous example, it's possible to write the following:
+Processes with matching input/output declarations can be composed so that the output of the first process is passed as input to the second process. The previous example can be rewritten as follows:
 
 ```groovy
 workflow {
@@ -134,6 +134,14 @@ workflow {
 }
 ```
 
+When referencing a named output directly from the process invocation, you can use a more concise syntax:
+
+```groovy
+workflow {
+    ch_samples = foo().samples_bam
+}
+```
+
 ### Process named stdout
 
 The `emit` option can also be used to name a `stdout` output:
From 0818aedc0277d58af56dcffde5551076a9e45396 Mon Sep 17 00:00:00 2001
From: Ben Sherman 
Date: Tue, 26 Sep 2023 16:12:54 -0500
Subject: [PATCH 7/9] Add redirect dsl2.html -> dsl1.html, fix warnings

Signed-off-by: Ben Sherman 
---
 docs/conf.py          | 7 ++++++-
 docs/module.md        | 2 ++
 docs/requirements.txt | 3 ++-
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/docs/conf.py b/docs/conf.py
index 6ef95e6b15..9a81200273 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -30,6 +30,7 @@
 extensions = [
     'sphinx.ext.mathjax',
     'sphinxcontrib.mermaid',
+    'sphinxext.rediraffe',
     'sphinx_rtd_theme',
     'myst_parser'
 ]
@@ -38,6 +39,10 @@
 
 myst_heading_anchors = 3
 
+rediraffe_redirects = {
+    'dsl2.md': 'dsl1.md'
+}
+
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ['_templates']
 
@@ -75,7 +80,7 @@
 
 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
-exclude_patterns = ['_build']
+exclude_patterns = ['_build', '**README.md']
 
 # The reST default role (used for this markup: `text`) to use for all documents.
 #default_role = None
diff --git a/docs/module.md b/docs/module.md
index 9cae786a71..11298699dc 100644
--- a/docs/module.md
+++ b/docs/module.md
@@ -238,6 +238,8 @@ baseDir
     └── P7-template.sh
 ```
 
+(module-binaries)=
+
 ## Module binaries
 
 :::{versionadded} 22.10.0
diff --git a/docs/requirements.txt b/docs/requirements.txt
index 6e887390ed..06694e6ca0 100644
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -1,4 +1,5 @@
 myst-parser==0.18.1
 sphinx==5.3.0
 sphinx-rtd-theme==1.1.1
-sphinxcontrib-mermaid==0.9.2
\ No newline at end of file
+sphinxcontrib-mermaid==0.9.2
+sphinxext-rediraffe==0.2.7
\ No newline at end of file
From 20001ca7611b38180005507baf25f3a4011b2937 Mon Sep 17 00:00:00 2001
From: Paolo Di Tommaso 
Date: Sun, 15 Oct 2023 16:34:08 +0200
Subject: [PATCH 8/9] Minor changes

Signed-off-by: Paolo Di Tommaso 
---
 docs/workflow.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/workflow.md b/docs/workflow.md
index ebc3aae4ee..209b3e769b 100644
--- a/docs/workflow.md
+++ b/docs/workflow.md
@@ -330,7 +330,7 @@ The fully qualified process name can be used as a {ref}`process selector
Date: Sun, 15 Oct 2023 16:34:44 +0200
Subject: [PATCH 9/9] Building sphinx container on demand

Signed-off-by: Paolo Di Tommaso 
---
 docs/make-html.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/make-html.sh b/docs/make-html.sh
index ebf3378c75..da26d88bf2 100755
--- a/docs/make-html.sh
+++ b/docs/make-html.sh
@@ -1,4 +1,4 @@
 #!/bin/bash
 
-docker run -v $(pwd):/tmp nextflow/sphinx:5.3.0 -- make html
+docker run -v $(pwd):/tmp $(wave -f Dockerfile --context .) -- make html
 echo "Done. See _build/html/index.html"
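For context on the concise named-output access introduced above, the `foo` process it presumes would declare a named output with the `emit` option; a minimal hypothetical sketch:

```groovy
process foo {
    output:
    path 'sample.bam', emit: samples_bam    // named output

    """
    touch sample.bam
    """
}

workflow {
    ch_samples = foo().samples_bam    // concise access to the named output
    ch_samples.view()
}
```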