diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index f9393180..67f02e93 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -19,7 +19,7 @@ If you'd like to write some code for nf-core/mag, the standard workflow is as fo 1. Check that there isn't already an issue about your idea in the [nf-core/mag issues](https://github.com/nf-core/mag/issues) to avoid duplicating work. If there isn't one already, please create one so that others know you're working on this 2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/mag repository](https://github.com/nf-core/mag) to your GitHub account 3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions) -4. Use `nf-core schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). +4. Use `nf-core pipelines schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). 5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/). @@ -40,7 +40,7 @@ There are typically two types of tests that run: ### Lint tests `nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. -To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint ` command. +To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core pipelines lint ` command. If any failures or warnings are encountered, please follow the listed URL for more documentation. @@ -75,7 +75,7 @@ If you wish to contribute a new step, please use the following coding standards: 2. Write the process block (see below). 3. Define the output channel if needed (see below). 4. Add any new parameters to `nextflow.config` with a default (see below). -5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core schema build` tool). +5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core pipelines schema build` tool). 6. Add sanity checks and validation for all relevant parameters. 7. Perform local tests to validate that the new code works as expected. 8. If applicable, add a new test command in `.github/workflow/ci.yml`. @@ -86,11 +86,11 @@ If you wish to contribute a new step, please use the following coding standards: Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope. -Once there, use `nf-core schema build` to add to `nextflow_schema.json`. +Once there, use `nf-core pipelines schema build` to add to `nextflow_schema.json`. 
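As a minimal sketch of this convention (the parameter name below is hypothetical and not an existing nf-core/mag option), a new parameter and its default would first be declared in `nextflow.config` and then picked up by `nf-core pipelines schema build`:

```groovy
// nextflow.config (sketch): declare the new parameter with a sensible default
// under the params scope, then run `nf-core pipelines schema build` so it is
// also added to nextflow_schema.json together with its help text.
params {
    // hypothetical example parameter, shown only to illustrate the convention
    my_new_option = 'default_value'
}
```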
### Default processes resource requirements -Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels. +Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/main/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels. The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block. @@ -103,7 +103,7 @@ Please use the following naming schemes, to make it easy to understand what is g ### Nextflow version bumping -If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]` +If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core pipelines bump-version --nextflow . [min-nf-version]` ### Images and figures diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 459141ac..e44a93d1 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -17,7 +17,7 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/mag/ - [ ] If you've fixed a bug or added code that should be tested, add tests! - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/mag/tree/master/.github/CONTRIBUTING.md) - [ ] If necessary, also make a PR on the nf-core/mag _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository. -- [ ] Make sure your code lints (`nf-core lint`). +- [ ] Make sure your code lints (`nf-core pipelines lint`). - [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir `). - [ ] Check for unexpected warnings in debug mode (`nextflow run . -profile debug,test,docker --outdir `). - [ ] Usage Documentation in `docs/usage.md` is updated. diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml index 9e382f35..395b231c 100644 --- a/.github/workflows/awsfulltest.yml +++ b/.github/workflows/awsfulltest.yml @@ -1,18 +1,36 @@ name: nf-core AWS full size tests -# This workflow is triggered on published releases. +# This workflow is triggered on PRs opened against the master branch. 
# It can be additionally triggered manually with GitHub actions workflow dispatch button. # It runs the -profile 'test_full' on AWS batch on: - release: - types: [published] + pull_request: + branches: + - master workflow_dispatch: + pull_request_review: + types: [submitted] + jobs: run-platform: name: Run AWS full tests - if: github.repository == 'nf-core/mag' + # run only if the PR is approved by at least 2 reviewers and against the master branch or manually triggered + if: github.repository == 'nf-core/mag' && github.event.review.state == 'approved' && github.event.pull_request.base.ref == 'master' || github.event_name == 'workflow_dispatch' runs-on: ubuntu-latest steps: + - uses: octokit/request-action@v2.x + if: github.event_name != 'workflow_dispatch' + id: check_approvals + with: + route: GET /repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/reviews?per_page=100 + env: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + - id: test_variables + if: github.event_name != 'workflow_dispatch' + run: | + JSON_RESPONSE='${{ steps.check_approvals.outputs.data }}' + CURRENT_APPROVALS_COUNT=$(echo $JSON_RESPONSE | jq -c '[.[] | select(.state | contains("APPROVED")) ] | length') + test $CURRENT_APPROVALS_COUNT -ge 2 || exit 1 # At least 2 approvals are required - name: Launch workflow via Seqera Platform uses: seqeralabs/action-tower-launch@v2 with: diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 3afa7887..d2fa6e12 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -7,9 +7,12 @@ on: pull_request: release: types: [published] + workflow_dispatch: env: NXF_ANSI_LOG: false + NXF_SINGULARITY_CACHEDIR: ${{ github.workspace }}/.singularity + NXF_SINGULARITY_LIBRARYDIR: ${{ github.workspace }}/.singularity concurrency: group: "${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}" @@ -17,15 +20,29 @@ concurrency: jobs: test: - name: Run pipeline with test data + name: "Run pipeline with test data (${{ matrix.NXF_VER }} | ${{ matrix.test_name }} | ${{ matrix.profile }})" # Only run on push if this is the nf-core dev branch (merged PRs) if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/mag') }}" runs-on: ubuntu-latest strategy: matrix: NXF_VER: - - "23.04.0" + - "24.04.2" - "latest-everything" + profile: + - "conda" + - "docker" + - "singularity" + test_name: + - "test" + isMaster: + - ${{ github.base_ref == 'master' }} + # Exclude conda and singularity on dev + exclude: + - isMaster: false + profile: "conda" + - isMaster: false + profile: "singularity" steps: - name: Free some space run: | @@ -35,17 +52,42 @@ jobs: - name: Check out pipeline code uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 - - name: Install Nextflow + - name: Set up Nextflow uses: nf-core/setup-nextflow@v2 with: version: "${{ matrix.NXF_VER }}" - - name: Disk space cleanup + - name: Set up Apptainer + if: matrix.profile == 'singularity' + uses: eWaterCycle/setup-apptainer@main + + - name: Set up Singularity + if: matrix.profile == 'singularity' + run: | + mkdir -p $NXF_SINGULARITY_CACHEDIR + mkdir -p $NXF_SINGULARITY_LIBRARYDIR + + - name: Set up Miniconda + if: matrix.profile == 'conda' + uses: conda-incubator/setup-miniconda@a4260408e20b96e80095f42ff7f1a15b27dd94ca # v3 + with: + miniconda-version: "latest" + auto-update-conda: true + conda-solver: libmamba + channels: conda-forge,bioconda + + - name: Set up Conda + if: matrix.profile == 'conda' + run: | + echo $(realpath 
$CONDA)/condabin >> $GITHUB_PATH + echo $(realpath python) >> $GITHUB_PATH + + - name: Clean up Disk space uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1 - - name: Run pipeline with test data + - name: "Run pipeline with test data ${{ matrix.NXF_VER }} | ${{ matrix.test_name }} | ${{ matrix.profile }}" run: | - nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results + nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.test_name }},${{ matrix.profile }} --outdir ./results profiles: name: Run workflow profile @@ -55,7 +97,7 @@ jobs: strategy: matrix: # Run remaining test profiles with minimum nextflow version - profile: + test_name: [ test_host_rm, test_hybrid, @@ -82,9 +124,12 @@ jobs: wget -qO- get.nextflow.io | bash sudo mv nextflow /usr/local/bin/ - - name: Run pipeline with ${{ matrix.profile }} test profile + - name: Clean up Disk space + uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1 + + - name: Run pipeline with ${{ matrix.test_name }} test profile run: | - nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.profile }},docker --outdir ./results + nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.test_name }},docker --outdir ./results checkm: name: Run single test to checkm due to database download @@ -99,17 +144,20 @@ jobs: sudo rm -rf "$AGENT_TOOLSDIRECTORY" - name: Check out pipeline code - uses: actions/checkout@v2 + uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 - name: Install Nextflow run: | wget -qO- get.nextflow.io | bash sudo mv nextflow /usr/local/bin/ + - name: Clean up Disk space + uses: jlumbroso/free-disk-space@54081f138730dfa15788a46383842cd2f914a1be # v1.3.1 + - name: Download and prepare CheckM database run: | mkdir -p databases/checkm - wget https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz -P databases/checkm + wget https://zenodo.org/records/7401545/files/checkm_data_2015_01_16.tar.gz -P databases/checkm tar xzvf databases/checkm/checkm_data_2015_01_16.tar.gz -C databases/checkm/ - name: Run pipeline with ${{ matrix.profile }} test profile diff --git a/.github/workflows/download_pipeline.yml b/.github/workflows/download_pipeline.yml index 2d20d644..713dc3e7 100644 --- a/.github/workflows/download_pipeline.yml +++ b/.github/workflows/download_pipeline.yml @@ -1,4 +1,4 @@ -name: Test successful pipeline download with 'nf-core download' +name: Test successful pipeline download with 'nf-core pipelines download' # Run the workflow when: # - dispatched manually @@ -8,7 +8,7 @@ on: workflow_dispatch: inputs: testbranch: - description: "The specific branch you wish to utilize for the test execution of nf-core download." + description: "The specific branch you wish to utilize for the test execution of nf-core pipelines download." 
required: true default: "dev" pull_request: @@ -39,9 +39,11 @@ jobs: with: python-version: "3.12" architecture: "x64" - - uses: eWaterCycle/setup-singularity@931d4e31109e875b13309ae1d07c70ca8fbc8537 # v7 + + - name: Setup Apptainer + uses: eWaterCycle/setup-apptainer@4bb22c52d4f63406c49e94c804632975787312b3 # v2.0.0 with: - singularity-version: 3.8.3 + apptainer-version: 1.3.4 - name: Install dependencies run: | @@ -54,33 +56,64 @@ jobs: echo "REPOTITLE_LOWERCASE=$(basename ${GITHUB_REPOSITORY,,})" >> ${GITHUB_ENV} echo "REPO_BRANCH=${{ github.event.inputs.testbranch || 'dev' }}" >> ${GITHUB_ENV} + - name: Make a cache directory for the container images + run: | + mkdir -p ./singularity_container_images + - name: Download the pipeline env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images run: | - nf-core download ${{ env.REPO_LOWERCASE }} \ + nf-core pipelines download ${{ env.REPO_LOWERCASE }} \ --revision ${{ env.REPO_BRANCH }} \ --outdir ./${{ env.REPOTITLE_LOWERCASE }} \ --compress "none" \ --container-system 'singularity' \ - --container-library "quay.io" -l "docker.io" -l "ghcr.io" \ + --container-library "quay.io" -l "docker.io" -l "community.wave.seqera.io" \ --container-cache-utilisation 'amend' \ - --download-configuration + --download-configuration 'yes' - name: Inspect download run: tree ./${{ env.REPOTITLE_LOWERCASE }} + - name: Count the downloaded number of container images + id: count_initial + run: | + image_count=$(ls -1 ./singularity_container_images | wc -l | xargs) + echo "Initial container image count: $image_count" + echo "IMAGE_COUNT_INITIAL=$image_count" >> ${GITHUB_ENV} + - name: Run the downloaded pipeline (stub) id: stub_run_pipeline continue-on-error: true env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images NXF_SINGULARITY_HOME_MOUNT: true run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -stub -profile test,singularity --outdir ./results - name: Run the downloaded pipeline (stub run not supported) id: run_pipeline if: ${{ job.steps.stub_run_pipeline.status == failure() }} env: - NXF_SINGULARITY_CACHEDIR: ./ + NXF_SINGULARITY_CACHEDIR: ./singularity_container_images NXF_SINGULARITY_HOME_MOUNT: true run: nextflow run ./${{ env.REPOTITLE_LOWERCASE }}/$( sed 's/\W/_/g' <<< ${{ env.REPO_BRANCH }}) -profile test,singularity --outdir ./results + + - name: Count the downloaded number of container images + id: count_afterwards + run: | + image_count=$(ls -1 ./singularity_container_images | wc -l | xargs) + echo "Post-pipeline run container image count: $image_count" + echo "IMAGE_COUNT_AFTER=$image_count" >> ${GITHUB_ENV} + + - name: Compare container image counts + run: | + if [ "${{ env.IMAGE_COUNT_INITIAL }}" -ne "${{ env.IMAGE_COUNT_AFTER }}" ]; then + initial_count=${{ env.IMAGE_COUNT_INITIAL }} + final_count=${{ env.IMAGE_COUNT_AFTER }} + difference=$((final_count - initial_count)) + echo "$difference additional container images were \n downloaded at runtime . The pipeline has no support for offline runs!" + tree ./singularity_container_images + exit 1 + else + echo "The pipeline can be downloaded successfully!" + fi diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml index 1fcafe88..a502573c 100644 --- a/.github/workflows/linting.yml +++ b/.github/workflows/linting.yml @@ -1,6 +1,6 @@ name: nf-core linting # This workflow is triggered on pushes and PRs to the repository. 
-# It runs the `nf-core lint` and markdown lint tests to ensure +# It runs the `nf-core pipelines lint` and markdown lint tests to ensure # that the code meets the nf-core guidelines. on: push: @@ -41,17 +41,32 @@ jobs: python-version: "3.12" architecture: "x64" + - name: read .nf-core.yml + uses: pietrobolcato/action-read-yaml@1.1.0 + id: read_yml + with: + config: ${{ github.workspace }}/.nf-core.yml + - name: Install dependencies run: | python -m pip install --upgrade pip - pip install nf-core + pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }} + + - name: Run nf-core pipelines lint + if: ${{ github.base_ref != 'master' }} + env: + GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} + run: nf-core -l lint_log.txt pipelines lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md - - name: Run nf-core lint + - name: Run nf-core pipelines lint --release + if: ${{ github.base_ref == 'master' }} env: GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} - run: nf-core -l lint_log.txt lint --dir ${GITHUB_WORKSPACE} --markdown lint_results.md + run: nf-core -l lint_log.txt pipelines lint --release --dir ${GITHUB_WORKSPACE} --markdown lint_results.md - name: Save PR number if: ${{ always() }} diff --git a/.github/workflows/linting_comment.yml b/.github/workflows/linting_comment.yml index 40acc23f..42e519bf 100644 --- a/.github/workflows/linting_comment.yml +++ b/.github/workflows/linting_comment.yml @@ -11,7 +11,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Download lint results - uses: dawidd6/action-download-artifact@09f2f74827fd3a8607589e5ad7f9398816f540fe # v3 + uses: dawidd6/action-download-artifact@bf251b5aa9c2f7eeb574a96ee720e24f801b7c11 # v6 with: workflow: linting.yml workflow_conclusion: completed diff --git a/.github/workflows/release-announcements.yml b/.github/workflows/release-announcements.yml index 03ecfcf7..c6ba35df 100644 --- a/.github/workflows/release-announcements.yml +++ b/.github/workflows/release-announcements.yml @@ -12,7 +12,7 @@ jobs: - name: get topics and convert to hashtags id: get_topics run: | - echo "topics=$(curl -s https://nf-co.re/pipelines.json | jq -r '.remote_workflows[] | select(.full_name == "${{ github.repository }}") | .topics[]' | awk '{print "#"$0}' | tr '\n' ' ')" >> $GITHUB_OUTPUT + echo "topics=$(curl -s https://nf-co.re/pipelines.json | jq -r '.remote_workflows[] | select(.full_name == "${{ github.repository }}") | .topics[]' | awk '{print "#"$0}' | tr '\n' ' ')" | sed 's/-//g' >> $GITHUB_OUTPUT - uses: rzr/fediverse-action@master with: diff --git a/.github/workflows/template_version_comment.yml b/.github/workflows/template_version_comment.yml new file mode 100644 index 00000000..e8aafe44 --- /dev/null +++ b/.github/workflows/template_version_comment.yml @@ -0,0 +1,46 @@ +name: nf-core template version comment +# This workflow is triggered on PRs to check if the pipeline template version matches the latest nf-core version. +# It posts a comment to the PR, even if it comes from a fork. 
+ +on: pull_request_target + +jobs: + template_version: + runs-on: ubuntu-latest + steps: + - name: Check out pipeline code + uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 + with: + ref: ${{ github.event.pull_request.head.sha }} + + - name: Read template version from .nf-core.yml + uses: nichmor/minimal-read-yaml@v0.0.2 + id: read_yml + with: + config: ${{ github.workspace }}/.nf-core.yml + + - name: Install nf-core + run: | + python -m pip install --upgrade pip + pip install nf-core==${{ steps.read_yml.outputs['nf_core_version'] }} + + - name: Check nf-core outdated + id: nf_core_outdated + run: echo "OUTPUT=$(pip list --outdated | grep nf-core)" >> ${GITHUB_ENV} + + - name: Post nf-core template version comment + uses: mshick/add-pr-comment@b8f338c590a895d50bcbfa6c5859251edc8952fc # v2 + if: | + contains(env.OUTPUT, 'nf-core') + with: + repo-token: ${{ secrets.NF_CORE_BOT_AUTH_TOKEN }} + allow-repeats: false + message: | + > [!WARNING] + > Newer version of the nf-core template is available. + > + > Your pipeline is using an old version of the nf-core template: ${{ steps.read_yml.outputs['nf_core_version'] }}. + > Please update your pipeline to the latest version. + > + > For more documentation on how to update your pipeline, please see the [nf-core documentation](https://github.com/nf-core/tools?tab=readme-ov-file#sync-a-pipeline-with-the-template) and [Synchronisation documentation](https://nf-co.re/docs/contributing/sync). + # diff --git a/.gitignore b/.gitignore index 5124c9ac..a42ce016 100644 --- a/.gitignore +++ b/.gitignore @@ -6,3 +6,4 @@ results/ testing/ testing* *.pyc +null/ diff --git a/.gitpod.yml b/.gitpod.yml index 105a1821..46118637 100644 --- a/.gitpod.yml +++ b/.gitpod.yml @@ -4,17 +4,14 @@ tasks: command: | pre-commit install --install-hooks nextflow self-update - - name: unset JAVA_TOOL_OPTIONS - command: | - unset JAVA_TOOL_OPTIONS vscode: extensions: # based on nf-core.nf-core-extensionpack - - esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code + #- esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code - EditorConfig.EditorConfig # override user/workspace settings with settings found in .editorconfig files - Gruntfuggly.todo-tree # Display TODO and FIXME in a tree view in the activity bar - mechatroner.rainbow-csv # Highlight columns in csv files in different colors - # - nextflow.nextflow # Nextflow syntax highlighting + - nextflow.nextflow # Nextflow syntax highlighting - oderwat.indent-rainbow # Highlight indentation level - streetsidesoftware.code-spell-checker # Spelling checker for source code - charliermarsh.ruff # Code linter Ruff diff --git a/.nf-core.yml b/.nf-core.yml index db428d16..5d193256 100644 --- a/.nf-core.yml +++ b/.nf-core.yml @@ -1,6 +1,4 @@ -repository_type: pipeline -nf_core_version: "2.14.1" - +bump_version: null lint: files_unchanged: - lib/NfcoreTemplate.groovy @@ -8,3 +6,17 @@ lint: - config_defaults: - params.phix_reference - params.lambda_reference +nf_core_version: 3.0.2 +org_path: null +repository_type: pipeline +template: + author: "Hadrien Gourlé, Daniel Straub, Sabrina Krakau, James A. Fellows Yates, Maxime Borry" + description: Assembly, binning and annotation of metagenomes + force: false + is_nfcore: true + name: mag + org: nf-core + outdir: . 
+ skip_features: null + version: 3.2.0 +update: null diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 4dc0f1dc..9e9f0e1c 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -7,7 +7,7 @@ repos: - prettier@3.2.5 - repo: https://github.com/editorconfig-checker/editorconfig-checker.python - rev: "2.7.3" + rev: "3.0.3" hooks: - id: editorconfig-checker alias: ec diff --git a/CHANGELOG.md b/CHANGELOG.md index a80183c0..43c68211 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,6 +3,38 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## 3.2.0 [2024-10-27] + +### `Added` + +- [#674](https://github.com/nf-core/mag/pull/674) - Added `--longread_adaptertrimming_tool`, where the user can choose between porechop_abi (default) and porechop (added by @muabnezor) + +### `Changed` + +- [#674](https://github.com/nf-core/mag/pull/674) - Changed to porechop-abi as the default adapter trimming tool for long reads. Users can still use porechop if preferred (added by @muabnezor) +- [#666](https://github.com/nf-core/mag/pull/666) - Update SPAdes to version 4.0.0, replace both METASPADES and MEGAHIT with official nf-core modules (requested by @elsherbini, fix by @jfy133) +- [#666](https://github.com/nf-core/mag/pull/666) - Update URLs to GTDB database downloads due to server move (reported by @Jokendo-collab, fix by @jfy133) +- [#695](https://github.com/nf-core/mag/pull/695) - Updated to nf-core 3.0.2 `TEMPLATE` (by @jfy133) +- [#695](https://github.com/nf-core/mag/pull/695) - Switch to a more stable Zenodo link for CheckM data (by @jfy133) + +### `Fixed` + +- [#674](https://github.com/nf-core/mag/pull/674) - Make longread preprocessing a subworkflow (added by @muabnezor) +- [#674](https://github.com/nf-core/mag/pull/674) - Add porechop and filtlong logs to multiqc (added by @muabnezor) +- [#674](https://github.com/nf-core/mag/pull/674) - Change local filtlong module to the official nf-core/filtlong module (added by @muabnezor) +- [#690](https://github.com/nf-core/mag/pull/690) - MaxBin2 now using the abundance information from different samples rather than an average (reported by @uel3 and fixed by @d4straub) +- [#698](https://github.com/nf-core/mag/pull/698) - Updated prodigal module to not pick up input symlinks for compression causing pigz errors (reported by @zackhenny, fix by @jfy133) + +### `Dependencies` + +| Tool | Previous version | New version | +| ------------ | ---------------- | ----------- | +| Porechop_ABI | | 0.5.0 | +| Filtlong | 0.2.0 | 0.2.1 | +| SPAdes | 3.15.3 | 4.0.0 | + +### `Deprecated` + ## 3.1.0 [2024-10-04] ### `Added` diff --git a/CITATIONS.md b/CITATIONS.md index 560a103a..52caa1e6 100644 --- a/CITATIONS.md +++ b/CITATIONS.md @@ -64,7 +64,7 @@ - [geNomad](https://doi.org/10.1101/2023.03.05.531206) - > Camargo, A. P., et al. (2023). You can move, but you can’t hide: identification of mobile genetic elements with geNomad. bioRxiv preprint. doi: https://doi.org/10.1101/2023.03.05.531206 + > Camargo, A. P., et al. (2023). You can move, but you can’t hide: identification of mobile genetic elements with geNomad. bioRxiv preprint. doi: 10.1101/2023.03.05.531206 - [GTDB-Tk](https://doi.org/10.1093/bioinformatics/btz848) @@ -96,11 +96,11 @@ - [MetaEuk](https://doi.org/10.1186/s40168-020-00808-x) -> Levy Karin, E., Mirdita, M. & Söding, J. MetaEuk—sensitive, high-throughput gene discovery, and annotation for large-scale eukaryotic metagenomics.
Microbiome 8, 48 (2020). https://doi.org/10.1186/s40168-020-00808-x + > Levy Karin, E., Mirdita, M. & Söding, J. MetaEuk—sensitive, high-throughput gene discovery, and annotation for large-scale eukaryotic metagenomics. Microbiome 8, 48 (2020). 10.1186/s40168-020-00808-x - [MMseqs2](https://www.nature.com/articles/nbt.3988) -> Steinegger, M., Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol 35, 1026–1028 (2017). https://doi.org/10.1038/nbt.3988 + > Steinegger, M., Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol 35, 1026–1028 (2017).10.1038/nbt.3988 - [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/) @@ -116,6 +116,10 @@ - [Porechop](https://github.com/rrwick/Porechop) +- [Porechop-abi](https://github.com/bonsai-team/Porechop_ABI) + + > Bonenfant, Q., Noé, L., & Touzet, H. (2022). Porechop_ABI: discovering unknown adapters in ONT sequencing reads for downstream trimming. bioRxiv. 10.1101/2022.07.07.499093 + - [Prodigal](https://pubmed.ncbi.nlm.nih.gov/20211023/) > Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010 Mar 8;11:119. doi: 10.1186/1471-2105-11-119. PMID: 20211023; PMCID: PMC2848648. @@ -145,6 +149,7 @@ ## Data - [Full-size test data](https://doi.org/10.1038/s41587-019-0191-2) + > Bertrand, D., Shaw, J., Kalathiyappan, M., Ng, A. H. Q., Kumar, M. S., Li, C., ... & Nagarajan, N. (2019). Hybrid metagenomic assembly enables high-resolution analysis of resistance determinants and mobile elements in human microbiomes. Nature biotechnology, 37(8), 937-944. doi: 10.1038/s41587-019-0191-2. ## Software packaging/containerisation tools diff --git a/README.md b/README.md index 405e298a..d82f04a9 100644 --- a/README.md +++ b/README.md @@ -9,7 +9,7 @@ [![GitHub Actions Linting Status](https://github.com/nf-core/mag/actions/workflows/linting.yml/badge.svg)](https://github.com/nf-core/mag/actions/workflows/linting.yml)[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/mag/results)[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.3589527-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.3589527) [![nf-test](https://img.shields.io/badge/unit_tests-nf--test-337ab7.svg)](https://www.nf-test.com)[![Cite Publication](https://img.shields.io/badge/Cite%20Us!-Cite%20Publication-orange)](https://doi.org/10.1093/nargab/lqac007) -[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A523.04.0-23aa62.svg)](https://www.nextflow.io/) +[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A524.04.2-23aa62.svg)](https://www.nextflow.io/) [![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/) [![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/) [![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/) @@ -58,8 +58,7 @@ nextflow run nf-core/mag -profile [!WARNING] -> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. 
Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; -> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files). +> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files). For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/mag/usage) and the [parameter documentation](https://nf-co.re/mag/parameters). @@ -90,6 +89,7 @@ Other code contributors include: - [Jim Downie](https://github.com/prototaxites) - [Phil Palmer](https://github.com/PhilPalmer) - [@willros](https://github.com/willros) +- [Adam Rosenbaum](https://github.com/muabnezor) Long read processing was inspired by [caspargross/HybridAssembly](https://github.com/caspargross/HybridAssembly) written by Caspar Gross [@caspargross](https://github.com/caspargross) diff --git a/assets/multiqc_config.yml b/assets/multiqc_config.yml index 8dcc79aa..e9da7a41 100644 --- a/assets/multiqc_config.yml +++ b/assets/multiqc_config.yml @@ -1,7 +1,7 @@ report_comment: > - This report has been generated by the nf-core/mag + This report has been generated by the nf-core/mag analysis pipeline. For information about how to interpret these results, please see the - documentation. + documentation. report_section_order: "nf-core-mag-methods-description": order: -1000 @@ -25,6 +25,8 @@ run_modules: - quast - kraken - prokka + - porechop + - filtlong ## Module order top_modules: @@ -35,6 +37,7 @@ top_modules: - "fastp" - "adapterRemoval" - "porechop" + - "filtlong" - "fastqc": name: "FastQC: after preprocessing" info: "After trimming and, if requested, contamination removal." @@ -109,6 +112,9 @@ sp: fn_re: ".*[kraken2|centrifuge].*report.txt" quast: fn_re: "report.*.tsv" + filtlong: + num_lines: 20 + fn_re: ".*_filtlong.log" ## File name cleaning extra_fn_clean_exts: diff --git a/assets/schema_assembly_input.json b/assets/schema_assembly_input.json index c6712717..404845b9 100644 --- a/assets/schema_assembly_input.json +++ b/assets/schema_assembly_input.json @@ -1,5 +1,5 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/nf-core/mag/master/assets/schema_input.json", "title": "nf-core/mag pipeline - params.input schema", "description": "Schema for the file provided with params.input", @@ -10,30 +10,26 @@ "id": { "type": "string", "pattern": "^\\S+$", - "errorMessage": "ID must be provided and cannot contain spaces", "meta": ["id"] }, "group": { - "type": "string", + "type": ["string", "integer"], "pattern": "^\\S+$", - "meta": ["group"], - "errorMessage": "Column 'group' contains an empty field. Either remove column 'group' or fill each field with a value." 
+ "meta": ["group"] }, "assembler": { "type": "string", "pattern": "MEGAHIT|SPAdes|SPAdesHybrid", - "meta": ["assembler"], - "errorMessage": "Only MEGAHIT or SPAdes assemblies are supported" + "meta": ["assembler"] }, "fasta": { "type": "string", "format": "file-path", "pattern": "^\\S+\\.(fasta|fas|fa|fna)(\\.gz)?$", - "exists": true, - "unique": true, - "errorMessage": "FastA file with pre-assembled contigs must be provided, cannot contain spaces and must have extension 'fasta', 'fas', 'fa', or 'fna', all optionally gzipped." + "exists": true } }, - "required": ["id", "assembler", "fasta"] - } + "required": ["id", "group", "assembler", "fasta"] + }, + "allOf": [{ "uniqueEntries": ["fasta"] }] } diff --git a/assets/schema_input.json b/assets/schema_input.json index b8177803..01b494b5 100644 --- a/assets/schema_input.json +++ b/assets/schema_input.json @@ -1,5 +1,5 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/nf-core/mag/master/assets/schema_input.json", "title": "nf-core/mag pipeline - params.input schema", "description": "Schema for the file provided with params.input", @@ -10,46 +10,43 @@ "sample": { "type": "string", "pattern": "^\\S+$", - "errorMessage": "Sample name must be provided and cannot contain spaces", "meta": ["id"] }, "run": { - "type": "string", + "type": ["string", "integer"], "pattern": "^\\S+$", "meta": ["run"], - "unique": ["sample"], - "errorMessage": "Column 'run' contains an empty field. Either remove column 'run' or fill each field with a value." + "unique": ["sample"] }, "group": { - "type": "string", + "type": ["string", "integer"], "pattern": "^\\S+$", - "meta": ["group"], - "errorMessage": "Column 'group' contains an empty field. Either remove column 'group' or fill each field with a value." + "meta": ["group"] }, "short_reads_1": { "type": "string", "format": "file-path", "exists": true, - "pattern": "^\\S+\\.f(ast)?q\\.gz$", - "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'" + "pattern": "^\\S+\\.f(ast)?q\\.gz$" }, "short_reads_2": { "type": "string", "format": "file-path", "exists": true, - "pattern": "^\\S+\\.f(ast)?q\\.gz$", - "dependentRequired": ["short_reads_1"], - "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'" + "pattern": "^\\S+\\.f(ast)?q\\.gz$" }, "long_reads": { "type": "string", "format": "file-path", "exists": true, - "pattern": "^\\S+\\.f(ast)?q\\.gz$", - "dependentRequired": ["short_reads_1", "short_reads_2"], - "errorMessage": "FastQ file for long reads cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'" + "pattern": "^\\S+\\.f(ast)?q\\.gz$" } }, - "required": ["sample", "short_reads_1"] + "required": ["sample", "group", "short_reads_1"] + }, + "uniqueEntries": ["sample", "run"], + "dependentRequired": { + "short_reads_2": ["short_reads_1"], + "long_reads": ["short_reads_1", "short_reads_2"] } } diff --git a/conf/base.config b/conf/base.config index 1e2540f3..21a8ac3e 100644 --- a/conf/base.config +++ b/conf/base.config @@ -10,9 +10,9 @@ process { - cpus = { check_max( 1 * task.attempt, 'cpus' ) } - memory = { check_max( 7.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + cpus = { 1 * task.attempt } + memory = { 7.GB * task.attempt } + time = { 4.h * task.attempt } errorStrategy = { task.exitStatus in ((130..145) + 104) ? 
'retry' : 'finish' } maxRetries = 3 @@ -24,150 +24,148 @@ process { // If possible, it would be nice to keep the same label naming convention when // adding in your local modules too. // See https://www.nextflow.io/docs/latest/config.html#config-process-selectors - withLabel:process_single { - cpus = { check_max( 1 , 'cpus' ) } - memory = { check_max( 6.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + withLabel: process_single { + cpus = { 1 } + memory = { 6.GB * task.attempt } + time = { 4.h * task.attempt } } - withLabel:process_low { - cpus = { check_max( 2 * task.attempt, 'cpus' ) } - memory = { check_max( 12.GB * task.attempt, 'memory' ) } - time = { check_max( 4.h * task.attempt, 'time' ) } + withLabel: process_low { + cpus = { 2 * task.attempt } + memory = { 12.GB * task.attempt } + time = { 4.h * task.attempt } } - withLabel:process_medium { - cpus = { check_max( 6 * task.attempt, 'cpus' ) } - memory = { check_max( 36.GB * task.attempt, 'memory' ) } - time = { check_max( 8.h * task.attempt, 'time' ) } + withLabel: process_medium { + cpus = { 6 * task.attempt } + memory = { 36.GB * task.attempt } + time = { 8.h * task.attempt } } - withLabel:process_high { - cpus = { check_max( 12 * task.attempt, 'cpus' ) } - memory = { check_max( 72.GB * task.attempt, 'memory' ) } - time = { check_max( 16.h * task.attempt, 'time' ) } + withLabel: process_high { + cpus = { 12 * task.attempt } + memory = { 72.GB * task.attempt } + time = { 16.h * task.attempt } } - withLabel:process_long { - time = { check_max( 20.h * task.attempt, 'time' ) } + withLabel: process_long { + time = { 20.h * task.attempt } } - withLabel:process_high_memory { - memory = { check_max( 200.GB * task.attempt, 'memory' ) } + withLabel: process_high_memory { + memory = { 200.GB * task.attempt } } - withLabel:error_ignore { + withLabel: error_ignore { errorStrategy = 'ignore' } - withLabel:error_retry { + withLabel: error_retry { errorStrategy = 'retry' maxRetries = 2 } withName: BOWTIE2_HOST_REMOVAL_BUILD { - cpus = { check_max (10 * task.attempt, 'cpus' ) } - memory = { check_max (20.GB * task.attempt, 'memory' ) } - time = { check_max (4.h * task.attempt, 'time' ) } + cpus = { 10 * task.attempt } + memory = { 20.GB * task.attempt } + time = { 4.h * task.attempt } } withName: BOWTIE2_HOST_REMOVAL_ALIGN { - cpus = { check_max (10 * task.attempt, 'cpus' ) } - memory = { check_max (10.GB * task.attempt, 'memory' ) } - time = { check_max (6.h * task.attempt, 'time' ) } + cpus = { 10 * task.attempt } + memory = { 10.GB * task.attempt } + time = { 6.h * task.attempt } } withName: BOWTIE2_PHIX_REMOVAL_ALIGN { - cpus = { check_max (4 * task.attempt, 'cpus' ) } - memory = { check_max (8.GB * task.attempt, 'memory' ) } - time = { check_max (6.h * task.attempt, 'time' ) } + cpus = { 4 * task.attempt } + memory = { 8.GB * task.attempt } + time = { 6.h * task.attempt } } withName: PORECHOP_PORECHOP { - cpus = { check_max (4 * task.attempt, 'cpus' ) } - memory = { check_max (30.GB * task.attempt, 'memory' ) } - time = { check_max (4.h * task.attempt, 'time' ) } + cpus = { 4 * task.attempt } + memory = { 30.GB * task.attempt } + time = { 4.h * task.attempt } } withName: NANOLYSE { - cpus = { check_max (2 * task.attempt, 'cpus' ) } - memory = { check_max (10.GB * task.attempt, 'memory' ) } - time = { check_max (3.h * task.attempt, 'time' ) } + cpus = { 2 * task.attempt } + memory = { 10.GB * task.attempt } + time = { 3.h * task.attempt } } //filtlong: exponential increase of memory and time with attempts 
withName: FILTLONG { - cpus = { check_max (8 * task.attempt , 'cpus' ) } - memory = { check_max (64.GB * (2**(task.attempt-1)), 'memory' ) } - time = { check_max (24.h * (2**(task.attempt-1)), 'time' ) } + cpus = { 8 * task.attempt } + memory = { 64.GB * (2 ** (task.attempt - 1)) } + time = { 24.h * (2 ** (task.attempt - 1)) } } withName: CENTRIFUGE_CENTRIFUGE { - cpus = { check_max (8 * task.attempt, 'cpus' ) } - memory = { check_max (40.GB * task.attempt, 'memory' ) } - time = { check_max (12.h * task.attempt, 'time' ) } + cpus = { 8 * task.attempt } + memory = { 40.GB * task.attempt } + time = { 12.h * task.attempt } } withName: KRAKEN2 { - cpus = { check_max (8 * task.attempt, 'cpus' ) } - memory = { check_max (40.GB * task.attempt, 'memory' ) } - time = { check_max (12.h * task.attempt, 'time' ) } + cpus = { 8 * task.attempt } + memory = { 40.GB * task.attempt } + time = { 12.h * task.attempt } } withName: KRONA_KTIMPORTTAXONOMY { - cpus = { check_max (8 * task.attempt, 'cpus' ) } - memory = { check_max (20.GB * task.attempt, 'memory' ) } - time = { check_max (12.h * task.attempt, 'time' ) } + cpus = { 8 * task.attempt } + memory = { 20.GB * task.attempt } + time = { 12.h * task.attempt } } withName: CAT_DB_GENERATE { - memory = { check_max (200.GB * task.attempt, 'memory' ) } - time = { check_max (16.h * task.attempt, 'time' ) } + memory = { 200.GB * task.attempt } + time = { 16.h * task.attempt } } withName: CAT { - cpus = { check_max (8 * task.attempt, 'cpus' ) } - memory = { check_max (40.GB * task.attempt, 'memory' ) } - time = { check_max (12.h * task.attempt, 'time' ) } + cpus = { 8 * task.attempt } + memory = { 40.GB * task.attempt } + time = { 12.h * task.attempt } } withName: GTDBTK_CLASSIFYWF { - cpus = { check_max (10 * task.attempt, 'cpus' ) } - memory = { check_max (128.GB * task.attempt, 'memory' ) } - time = { check_max (12.h * task.attempt, 'time' ) } + cpus = { 10 * task.attempt } + memory = { 128.GB * task.attempt } + time = { 12.h * task.attempt } } //MEGAHIT returns exit code 250 when running out of memory withName: MEGAHIT { - cpus = { check_megahit_cpus (8, task.attempt ) } - memory = { check_max (40.GB * task.attempt, 'memory' ) } - time = { check_max (16.h * task.attempt, 'time' ) } + cpus = { params.megahit_fix_cpu_1 ? 1 : (8 * task.attempt) } + memory = { 40.GB * task.attempt } + time = { 16.h * task.attempt } errorStrategy = { task.exitStatus in ((130..145) + 104 + 250) ? 'retry' : 'finish' } } //SPAdes returns error(1) if it runs out of memory (and for other reasons as well...)! //exponential increase of memory and time with attempts, keep number of threads to enable reproducibility withName: SPADES { - cpus = { check_spades_cpus (10, task.attempt) } - memory = { check_max (64.GB * (2**(task.attempt-1)), 'memory' ) } - time = { check_max (24.h * (2**(task.attempt-1)), 'time' ) } + cpus = { params.spades_fix_cpus != -1 ? params.spades_fix_cpus : (10 * task.attempt) } + memory = { 64.GB * (2 ** (task.attempt - 1)) } + time = { 24.h * (2 ** (task.attempt - 1)) } errorStrategy = { task.exitStatus in ((130..145) + 104 + 21 + 12 + 1) ? 'retry' : 'finish' } maxRetries = 5 } withName: SPADESHYBRID { - cpus = { check_spadeshybrid_cpus (10, task.attempt) } - memory = { check_max (64.GB * (2**(task.attempt-1)), 'memory' ) } - time = { check_max (24.h * (2**(task.attempt-1)), 'time' ) } + cpus = { params.spadeshybrid_fix_cpus != -1 ? 
params.spadeshybrid_fix_cpus : (10 * task.attempt) } + memory = { 64.GB * (2 ** (task.attempt - 1)) } + time = { 24.h * (2 ** (task.attempt - 1)) } errorStrategy = { task.exitStatus in ((130..145) + 104 + 21 + 12 + 1) ? 'retry' : 'finish' } maxRetries = 5 } //returns exit code 247 when running out of memory withName: BOWTIE2_ASSEMBLY_ALIGN { - cpus = { check_max (2 * task.attempt, 'cpus' ) } - memory = { check_max (8.GB * task.attempt, 'memory' ) } - time = { check_max (8.h * task.attempt, 'time' ) } + cpus = { 2 * task.attempt } + memory = { 8.GB * task.attempt } + time = { 8.h * task.attempt } errorStrategy = { task.exitStatus in ((130..145) + 104 + 247) ? 'retry' : 'finish' } } withName: METABAT2_METABAT2 { - cpus = { check_max (8 * task.attempt, 'cpus' ) } - memory = { check_max (20.GB * task.attempt, 'memory' ) } - time = { check_max (8.h * task.attempt, 'time' ) } + cpus = { 8 * task.attempt } + memory = { 20.GB * task.attempt } + time = { 8.h * task.attempt } } withName: MAG_DEPTHS { - memory = { check_max (16.GB * task.attempt, 'memory' ) } + memory = { 16.GB * task.attempt } } withName: BUSCO { - cpus = { check_max (8 * task.attempt, 'cpus' ) } - memory = { check_max (20.GB * task.attempt, 'memory' ) } + cpus = { 8 * task.attempt } + memory = { 20.GB * task.attempt } } withName: MAXBIN2 { - // often fails when insufficient information, so we allow it to gracefully fail without failing the pipeline - errorStrategy = { task.exitStatus in [ 1, 255 ] ? 'ignore' : 'retry' } + errorStrategy = { task.exitStatus in [1, 255] ? 'ignore' : 'retry' } } withName: DASTOOL_DASTOOL { - // if SCGs not found, bins cannot be assigned and DAS_tool will die with exit status 1 errorStrategy = { task.exitStatus in ((130..145) + 104) ? 'retry' : task.exitStatus == 1 ? 'ignore' : 'finish' } } } diff --git a/conf/igenomes_ignored.config b/conf/igenomes_ignored.config new file mode 100644 index 00000000..b4034d82 --- /dev/null +++ b/conf/igenomes_ignored.config @@ -0,0 +1,9 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for iGenomes paths +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Empty genomes dictionary to use when igenomes is ignored. +---------------------------------------------------------------------------------------- +*/ + +params.genomes = [:] diff --git a/conf/modules.config b/conf/modules.config index 954263a1..0fbea292 100644 --- a/conf/modules.config +++ b/conf/modules.config @@ -13,20 +13,11 @@ process { //default: do not publish into the results folder - publishDir = [ - path: { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, - enabled: false - ] + publishDir = [path: { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename }, enabled: false] withName: FASTQC_RAW { ext.args = '--quiet' - publishDir = [ - path: { "${params.outdir}/QC_shortreads/fastqc" }, - mode: params.publish_dir_mode, - pattern: "*.html" - ] + publishDir = [path: { "${params.outdir}/QC_shortreads/fastqc" }, mode: params.publish_dir_mode, pattern: "*.html"] ext.prefix = { "${meta.id}_run${meta.run}_raw" } tag = { "${meta.id}_run${meta.run}_raw" } } @@ -171,36 +162,40 @@ process { publishDir = [ path: { "${params.outdir}/QC_longreads/porechop" }, mode: params.publish_dir_mode, - pattern: "*_trimmed.fastq", + pattern: "*_porechop_trimmed.fastq.gz", enabled: params.save_porechop_reads ] - ext.prefix = { "${meta.id}_run${meta.run}_trimmed" } + ext.prefix = { "${meta.id}_run${meta.run}_porechop_trimmed" } + } + + withName: PORECHOP_ABI { + publishDir = [ + path: { "${params.outdir}/QC_longreads/porechop" }, + mode: params.publish_dir_mode, + pattern: "*_porechop-abi_trimmed.fastq.gz", + enabled: params.save_porechop_reads + ] + ext.prefix = { "${meta.id}_run${meta.run}_porechop-abi_trimmed" } } withName: FILTLONG { + ext.args = [ + "--min_length ${params.longreads_min_length}", + "--keep_percent ${params.longreads_keep_percent}", + "--trim", + "--length_weight ${params.longreads_length_weight}" + ].join(' ').trim() publishDir = [ path: { "${params.outdir}/QC_longreads/Filtlong" }, mode: params.publish_dir_mode, - pattern: "*_lr_filtlong.fastq.gz", + pattern: "*_filtlong.fastq.gz", enabled: params.save_filtlong_reads ] - ext.prefix = { "${meta.id}_run${meta.run}_lengthfiltered" } + ext.prefix = { "${meta.id}_run${meta.run}_filtlong" } } withName: NANOLYSE { - publishDir = [ - [ - path: { "${params.outdir}/QC_longreads/NanoLyse" }, - mode: params.publish_dir_mode, - pattern: "*.log" - ], - [ - path: { "${params.outdir}/QC_longreads/NanoLyse" }, - mode: params.publish_dir_mode, - pattern: "*_nanolyse.fastq.gz", - enabled: params.save_lambdaremoved_reads - ] - ] + publishDir = [[path: { "${params.outdir}/QC_longreads/NanoLyse" }, mode: params.publish_dir_mode, pattern: "*.log"], [path: { "${params.outdir}/QC_longreads/NanoLyse" }, mode: params.publish_dir_mode, pattern: "*_nanolyse.fastq.gz", enabled: params.save_lambdaremoved_reads]] ext.prefix = { "${meta.id}_run${meta.run}_lambdafiltered" } } @@ -236,20 +231,12 @@ process { } withName: CENTRIFUGE_CENTRIFUGE { - publishDir = [ - path: { "${params.outdir}/Taxonomy/centrifuge/${meta.id}" }, - mode: params.publish_dir_mode, - pattern: "*.txt" - ] + publishDir = [path: { "${params.outdir}/Taxonomy/centrifuge/${meta.id}" }, mode: params.publish_dir_mode, pattern: "*.txt"] } withName: CENTRIFUGE_KREPORT { ext.prefix = { "${meta.id}_kreport" } - publishDir = [ - path: { "${params.outdir}/Taxonomy/centrifuge/${meta.id}" }, - mode: params.publish_dir_mode, - pattern: "*.txt" - ] + publishDir = [path: { "${params.outdir}/Taxonomy/centrifuge/${meta.id}" }, mode: params.publish_dir_mode, pattern: "*.txt"] } withName: KRAKEN2 { @@ -262,62 +249,33 @@ process { } withName: KREPORT2KRONA_CENTRIFUGE { - publishDir = [ - path: { "${params.outdir}/Taxonomy/${meta.classifier}/${meta.id}" }, - mode: params.publish_dir_mode, - pattern: "*.txt", - enabled: false - ] + publishDir = [path: { "${params.outdir}/Taxonomy/${meta.classifier}/${meta.id}" }, mode: params.publish_dir_mode, pattern: "*.txt", enabled: false] } withName: KRONA_KTIMPORTTAXONOMY { - publishDir = [ - path: { "${params.outdir}/Taxonomy/${meta.classifier}/${meta.id}" }, - mode: params.publish_dir_mode, - pattern: "*.html" - ] + publishDir 
= [path: { "${params.outdir}/Taxonomy/${meta.classifier}/${meta.id}" }, mode: params.publish_dir_mode, pattern: "*.html"] } - //pattern: "*.{fa.gz,log}" //'pattern' didnt work, probably because the output is in a folder, solved with 'saveAs' withName: MEGAHIT { - ext.args = params.megahit_options ?: '' - publishDir = [ - path: { "${params.outdir}/Assembly" }, - mode: params.publish_dir_mode, - saveAs: { filename -> - filename.equals('versions.yml') - ? null - : filename.indexOf('.contigs.fa.gz') > 0 - ? filename - : filename.indexOf('.log') > 0 ? filename : null - } - ] + ext.args = { params.megahit_options ? params.megahit_options + "-m ${task.memory.toBytes()}" : "-m ${task.memory.toBytes()}" } + ext.prefix = { "MEGAHIT-${meta.id}" } + publishDir = [path: { "${params.outdir}/Assembly/MEGAHIT" }, mode: params.publish_dir_mode, pattern: "*.{fa.gz,log}"] } - withName: SPADES { - ext.args = params.spades_options ?: '' - publishDir = [ - path: { "${params.outdir}/Assembly/SPAdes" }, - mode: params.publish_dir_mode, - pattern: "*.{fasta.gz,gfa.gz,log}" - ] + withName: METASPADES { + ext.args = params.spades_options ? params.spades_options + ' --meta' : '--meta' + ext.prefix = { "SPAdes-${meta.id}" } + publishDir = [path: { "${params.outdir}/Assembly/SPAdes" }, mode: params.publish_dir_mode, pattern: "*.{fasta.gz,gfa.gz,fa.gz,log}"] } - withName: SPADESHYBRID { - ext.args = params.spades_options ?: '' - publishDir = [ - path: { "${params.outdir}/Assembly/SPAdesHybrid" }, - mode: params.publish_dir_mode, - pattern: "*.{fasta.gz,gfa.gz,log}" - ] + withName: METASPADESHYBRID { + ext.args = params.spades_options ? params.spades_options + ' --meta' : '--meta' + ext.prefix = { "SPAdesHybrid-${meta.id}" } + publishDir = [path: { "${params.outdir}/Assembly/SPAdesHybrid" }, mode: params.publish_dir_mode, pattern: "*.{fasta.gz,gfa.gz,fa.gz,log}"] } withName: QUAST { - publishDir = [ - path: { "${params.outdir}/Assembly/${meta.assembler}/QC/${meta.id}" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] + publishDir = [path: { "${params.outdir}/Assembly/${meta.assembler}/QC/${meta.id}" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename }] } withName: GENOMAD_ENDTOEND { @@ -352,11 +310,7 @@ process { } withName: 'MAG_DEPTHS_PLOT|MAG_DEPTHS_SUMMARY' { - publishDir = [ - path: { "${params.outdir}/GenomeBinning/depths/bins" }, - mode: params.publish_dir_mode, - pattern: "*.{png,tsv}" - ] + publishDir = [path: { "${params.outdir}/GenomeBinning/depths/bins" }, mode: params.publish_dir_mode, pattern: "*.{png,tsv}"] } withName: BIN_SUMMARY { @@ -368,11 +322,7 @@ process { } withName: BUSCO_DB_PREPARATION { - publishDir = [ - path: { "${params.outdir}/GenomeBinning/QC/BUSCO" }, - mode: params.publish_dir_mode, - pattern: "*.tar.gz" - ] + publishDir = [path: { "${params.outdir}/GenomeBinning/QC/BUSCO" }, mode: params.publish_dir_mode, pattern: "*.tar.gz"] } withName: BUSCO { @@ -387,40 +337,21 @@ process { } withName: BUSCO_SAVE_DOWNLOAD { - publishDir = [ - path: { "${params.outdir}/GenomeBinning/QC/BUSCO" }, - mode: params.publish_dir_mode, - overwrite: false, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] + publishDir = [path: { "${params.outdir}/GenomeBinning/QC/BUSCO" }, mode: params.publish_dir_mode, overwrite: false, saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename }] } withName: 'BUSCO_SUMMARY|QUAST_BINS|QUAST_BINS_SUMMARY' { - publishDir = [ - path: { "${params.outdir}/GenomeBinning/QC" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] + publishDir = [path: { "${params.outdir}/GenomeBinning/QC" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename }] } withName: ARIA2_UNTAR { - publishDir = [ - path: { "${params.outdir}/GenomeBinning/QC/CheckM/checkm_downloads" }, - mode: params.publish_dir_mode, - overwrite: false, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, - enabled: params.save_checkm_data - ] + publishDir = [path: { "${params.outdir}/GenomeBinning/QC/CheckM/checkm_downloads" }, mode: params.publish_dir_mode, overwrite: false, saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, enabled: params.save_checkm_data] } withName: CHECKM_LINEAGEWF { tag = { "${meta.assembler}-${meta.binner}-${meta.domain}-${meta.refinement}-${meta.id}" } ext.prefix = { "${meta.assembler}-${meta.binner}-${meta.domain}-${meta.refinement}-${meta.id}_wf" } - publishDir = [ - path: { "${params.outdir}/GenomeBinning/QC/CheckM" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] + publishDir = [path: { "${params.outdir}/GenomeBinning/QC/CheckM" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename }] } withName: CHECKM_QA { @@ -435,11 +366,7 @@ process { withName: COMBINE_CHECKM_TSV { ext.prefix = { "checkm_summary" } - publishDir = [ - path: { "${params.outdir}/GenomeBinning/QC" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] + publishDir = [path: { "${params.outdir}/GenomeBinning/QC" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename }] } withName: GUNC_DOWNLOADDB { @@ -470,27 +397,15 @@ process { } withName: CAT_DB_GENERATE { - publishDir = [ - path: { "${params.outdir}/Taxonomy/CAT" }, - mode: params.publish_dir_mode, - pattern: "*.tar.gz" - ] + publishDir = [path: { "${params.outdir}/Taxonomy/CAT" }, mode: params.publish_dir_mode, pattern: "*.tar.gz"] } withName: CAT { - publishDir = [ - path: { "${params.outdir}/Taxonomy/CAT/${meta.assembler}/${meta.binner}" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] + publishDir = [path: { "${params.outdir}/Taxonomy/CAT/${meta.assembler}/${meta.binner}" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename }] } withName: CAT_SUMMARY { ext.prefix = "cat_summary" - publishDir = [ - path: { "${params.outdir}/Taxonomy/CAT/" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] + publishDir = [path: { "${params.outdir}/Taxonomy/CAT/" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename }] } withName: GTDBTK_CLASSIFYWF { @@ -510,49 +425,30 @@ process { withName: GTDBTK_SUMMARY { ext.args = "--extension fa" - publishDir = [ - path: { "${params.outdir}/Taxonomy/GTDB-Tk" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename } - ] + publishDir = [path: { "${params.outdir}/Taxonomy/GTDB-Tk" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename }] } withName: PROKKA { ext.args = "--metagenome" - publishDir = [ - path: { "${params.outdir}/Annotation/Prokka/${meta.assembler}" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] + publishDir = [path: { "${params.outdir}/Annotation/Prokka/${meta.assembler}" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename }] } withName: PRODIGAL { ext.args = "-p meta" - publishDir = [ - path: { "${params.outdir}/Annotation/Prodigal/${meta.assembler}/${meta.id}" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] + ext.prefix = { "${meta.assembler}-${meta.id}_prodigal" } + publishDir = [path: { "${params.outdir}/Annotation/Prodigal/${meta.assembler}/${meta.id}" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename }] } withName: FREEBAYES { ext.prefix = { "${meta.assembler}-${meta.id}" } ext.args = "-p ${params.freebayes_ploidy} -q ${params.freebayes_min_basequality} -F ${params.freebayes_minallelefreq}" - publishDir = [ - path: { "${params.outdir}/Ancient_DNA/variant_calling/freebayes" }, - mode: params.publish_dir_mode, - pattern: "*.vcf.gz" - ] + publishDir = [path: { "${params.outdir}/Ancient_DNA/variant_calling/freebayes" }, mode: params.publish_dir_mode, pattern: "*.vcf.gz"] } withName: BCFTOOLS_VIEW { ext.prefix = { "${meta.assembler}-${meta.id}.filtered" } ext.args = "-v snps,mnps -i 'QUAL>=${params.bcftools_view_high_variant_quality} || (QUAL>=${params.bcftools_view_medium_variant_quality} && FORMAT/AO>=${params.bcftools_view_minimal_allelesupport})'" - publishDir = [ - path: { "${params.outdir}/Ancient_DNA/variant_calling/filtered" }, - mode: params.publish_dir_mode, - pattern: "*.vcf.gz" - ] + publishDir = [path: { "${params.outdir}/Ancient_DNA/variant_calling/filtered" }, mode: params.publish_dir_mode, pattern: "*.vcf.gz"] } withName: BCFTOOLS_CONSENSUS { @@ -601,32 +497,12 @@ process { } withName: METABAT2_JGISUMMARIZEBAMCONTIGDEPTHS { - publishDir = [ - path: { "${params.outdir}/GenomeBinning/depths/contigs" }, - mode: params.publish_dir_mode, - pattern: '*-depth.txt.gz' - ] + publishDir = [path: { "${params.outdir}/GenomeBinning/depths/contigs" }, mode: params.publish_dir_mode, pattern: '*-depth.txt.gz'] ext.prefix = { "${meta.assembler}-${meta.id}-depth" } } withName: METABAT2_METABAT2 { - publishDir = [ - [ - path: { "${params.outdir}/GenomeBinning/MetaBAT2/bins/" }, - mode: params.publish_dir_mode, - pattern: '*[!lowDepth|tooShort|unbinned].fa.gz' - ], - [ - path: { "${params.outdir}/GenomeBinning/MetaBAT2/discarded" }, - mode: params.publish_dir_mode, - pattern: '*tooShort.fa.gz' - ], - [ - path: { "${params.outdir}/GenomeBinning/MetaBAT2/discarded" }, - mode: params.publish_dir_mode, - pattern: '*lowDepth.fa.gz' - ] - ] + publishDir = [[path: { "${params.outdir}/GenomeBinning/MetaBAT2/bins/" }, mode: params.publish_dir_mode, pattern: '*[!lowDepth|tooShort|unbinned].fa.gz'], [path: { "${params.outdir}/GenomeBinning/MetaBAT2/discarded" }, mode: params.publish_dir_mode, pattern: '*tooShort.fa.gz'], [path: { "${params.outdir}/GenomeBinning/MetaBAT2/discarded" }, mode: params.publish_dir_mode, pattern: '*lowDepth.fa.gz']] ext.prefix = { 
"${meta.assembler}-MetaBAT2-${meta.id}" } ext.args = [ params.min_contig_size < 1500 ? "-m 1500" : "-m ${params.min_contig_size}", @@ -641,6 +517,11 @@ process { path: { "${params.outdir}/GenomeBinning/MaxBin2/discarded" }, mode: params.publish_dir_mode, pattern: '*.tooshort.gz' + ], + [ + path: { "${params.outdir}/GenomeBinning/MaxBin2/" }, + mode: params.publish_dir_mode, + pattern: '*.{summary,abundance}' ] ] ext.prefix = { "${meta.assembler}-MaxBin2-${meta.id}" } @@ -674,23 +555,7 @@ process { } withName: SPLIT_FASTA { - publishDir = [ - [ - path: { "${params.outdir}/GenomeBinning/${meta.binner}/unbinned" }, - mode: params.publish_dir_mode, - pattern: '*.*[0-9].fa.gz' - ], - [ - path: { "${params.outdir}/GenomeBinning/${meta.binner}/unbinned/discarded" }, - mode: params.publish_dir_mode, - pattern: '*.pooled.fa.gz' - ], - [ - path: { "${params.outdir}/GenomeBinning/${meta.binner}/unbinned/discarded" }, - mode: params.publish_dir_mode, - pattern: '*.remaining.fa.gz' - ] - ] + publishDir = [[path: { "${params.outdir}/GenomeBinning/${meta.binner}/unbinned" }, mode: params.publish_dir_mode, pattern: '*.*[0-9].fa.gz'], [path: { "${params.outdir}/GenomeBinning/${meta.binner}/unbinned/discarded" }, mode: params.publish_dir_mode, pattern: '*.pooled.fa.gz'], [path: { "${params.outdir}/GenomeBinning/${meta.binner}/unbinned/discarded" }, mode: params.publish_dir_mode, pattern: '*.remaining.fa.gz']] } withName: DASTOOL_FASTATOCONTIG2BIN_METABAT2 { @@ -752,32 +617,19 @@ process { } withName: TIARA_SUMMARY { - publishDir = [ - path: { "${params.outdir}/GenomeBinning/Tiara" }, - mode: params.publish_dir_mode, - pattern: "tiara_summary.tsv" - ] + publishDir = [path: { "${params.outdir}/GenomeBinning/Tiara" }, mode: params.publish_dir_mode, pattern: "tiara_summary.tsv"] ext.prefix = "tiara_summary" } withName: MMSEQS_DATABASES { ext.prefix = { "${params.metaeuk_mmseqs_db.replaceAll("/", "-")}" } - publishDir = [ - path: { "${params.outdir}/Annotation/mmseqs_db/" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, - enabled: params.save_mmseqs_db - ] + publishDir = [path: { "${params.outdir}/Annotation/mmseqs_db/" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, enabled: params.save_mmseqs_db] } withName: METAEUK_EASYPREDICT { ext.args = "" ext.prefix = { "${meta.id}" } - publishDir = [ - path: { "${params.outdir}/Annotation/MetaEuk/${meta.assembler}/${meta.id}" }, - mode: params.publish_dir_mode, - saveAs: { filename -> filename.equals('versions.yml') ? null : filename } - ] + publishDir = [path: { "${params.outdir}/Annotation/MetaEuk/${meta.assembler}/${meta.id}" }, mode: params.publish_dir_mode, saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename }] } withName: MULTIQC { diff --git a/conf/test.config b/conf/test.config index c7a3ff3f..04fced63 100644 --- a/conf/test.config +++ b/conf/test.config @@ -10,25 +10,28 @@ ---------------------------------------------------------------------------------------- */ -params { - config_profile_name = 'Test profile' - config_profile_description = 'Minimal test dataset to check pipeline function' +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' +params { + config_profile_name = 'Test profile' + config_profile_description = 'Minimal test dataset to check pipeline function' // Input data - input = params.pipelines_testdata_base_path + 'mag/samplesheets/samplesheet.multirun.csv' - centrifuge_db = params.pipelines_testdata_base_path + 'mag/test_data/minigut_cf.tar.gz' - kraken2_db = params.pipelines_testdata_base_path + 'mag/test_data/minigut_kraken.tgz' - skip_krona = false - min_length_unbinned_contigs = 1 - max_unbinned_contigs = 2 - busco_db = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2024-01-08.tar.gz" - busco_clean = true - skip_gtdbtk = true - gtdbtk_min_completeness = 0 - skip_concoct = true + input = params.pipelines_testdata_base_path + 'mag/samplesheets/samplesheet.multirun.csv' + centrifuge_db = params.pipelines_testdata_base_path + 'mag/test_data/minigut_cf.tar.gz' + kraken2_db = params.pipelines_testdata_base_path + 'mag/test_data/minigut_kraken.tgz' + skip_krona = false + min_length_unbinned_contigs = 1 + max_unbinned_contigs = 2 + busco_db = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2024-01-08.tar.gz" + busco_clean = true + skip_gtdbtk = true + gtdbtk_min_completeness = 0.01 + skip_concoct = true } diff --git a/conf/test_adapterremoval.config b/conf/test_adapterremoval.config index 7ec304e8..63f04cf6 100644 --- a/conf/test_adapterremoval.config +++ b/conf/test_adapterremoval.config @@ -10,14 +10,18 @@ ---------------------------------------------------------------------------------------- */ -params { - config_profile_name = 'Test profile for running with AdapterRemoval and domain classification' - config_profile_description = 'Minimal test dataset to check pipeline function with AdapterRemoval data and domain classification.' +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' +params { + config_profile_name = 'Test profile for running with AdapterRemoval and domain classification' + config_profile_description = 'Minimal test dataset to check pipeline function with AdapterRemoval data and domain classification.' 
// Input data input = params.pipelines_testdata_base_path + 'mag/samplesheets/samplesheet.euk.csv' @@ -29,7 +33,7 @@ params { max_unbinned_contigs = 2 busco_db = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2024-01-08.tar.gz" skip_gtdbtk = true - gtdbtk_min_completeness = 0 + gtdbtk_min_completeness = 0.01 clip_tool = 'adapterremoval' skip_concoct = true bin_domain_classification = true diff --git a/conf/test_ancient_dna.config b/conf/test_ancient_dna.config index e9d48205..e8dab425 100644 --- a/conf/test_ancient_dna.config +++ b/conf/test_ancient_dna.config @@ -10,14 +10,19 @@ ---------------------------------------------------------------------------------------- */ -params { - config_profile_name = 'Ancient DNA test profile ' - config_profile_description = 'Minimal test dataset to check pipeline function for ancient DNA step' +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' + +params { + config_profile_name = 'Ancient DNA test profile ' + config_profile_description = 'Minimal test dataset to check pipeline function for ancient DNA step' // Input data input = params.pipelines_testdata_base_path + 'mag/samplesheets/samplesheet.csv' @@ -28,7 +33,7 @@ params { max_unbinned_contigs = 2 busco_db = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2024-01-08.tar.gz" skip_gtdbtk = true - gtdbtk_min_completeness = 0 + gtdbtk_min_completeness = 0.01 ancient_dna = true binning_map_mode = 'own' skip_spades = false diff --git a/conf/test_bbnorm.config b/conf/test_bbnorm.config index 35442fea..223f99a3 100644 --- a/conf/test_bbnorm.config +++ b/conf/test_bbnorm.config @@ -10,32 +10,36 @@ ---------------------------------------------------------------------------------------- */ -params { - config_profile_name = 'Test profile' - config_profile_description = 'Minimal test dataset to check pipeline function' +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' +params { + config_profile_name = 'Test profile' + config_profile_description = 'Minimal test dataset to check pipeline function' // Input data - input = params.pipelines_testdata_base_path + 'mag/samplesheets/samplesheet.csv' - keep_phix = true - skip_clipping = true - skip_prokka = true - skip_prodigal = true - skip_quast = true - skip_binning = true - centrifuge_db = params.pipelines_testdata_base_path + 'mag/test_data/minigut_cf.tar.gz' - kraken2_db = params.pipelines_testdata_base_path + 'mag/test_data/minigut_kraken.tgz' - skip_krona = true - min_length_unbinned_contigs = 1 - max_unbinned_contigs = 2 - busco_db = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2024-01-08.tar.gz" - busco_clean = true - skip_gtdbtk = true - gtdbtk_min_completeness = 0 - bbnorm = true - coassemble_group = true + input = params.pipelines_testdata_base_path + 'mag/samplesheets/samplesheet.csv' + keep_phix = true + skip_clipping = true + skip_prokka = true + skip_prodigal = true + skip_quast = true + skip_binning = true + centrifuge_db = params.pipelines_testdata_base_path + 'mag/test_data/minigut_cf.tar.gz' + kraken2_db = params.pipelines_testdata_base_path + 'mag/test_data/minigut_kraken.tgz' + skip_krona = true + 
min_length_unbinned_contigs = 1 + max_unbinned_contigs = 2 + busco_db = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2024-01-08.tar.gz" + busco_clean = true + skip_gtdbtk = true + gtdbtk_min_completeness = 0.01 + bbnorm = true + coassemble_group = true } diff --git a/conf/test_binrefinement.config b/conf/test_binrefinement.config index 180775e2..9602197c 100644 --- a/conf/test_binrefinement.config +++ b/conf/test_binrefinement.config @@ -10,14 +10,18 @@ ---------------------------------------------------------------------------------------- */ -params { - config_profile_name = 'Test profile' - config_profile_description = 'Minimal test dataset to check pipeline function' +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' +params { + config_profile_name = 'Test profile' + config_profile_description = 'Minimal test dataset to check pipeline function' // Input data input = params.pipelines_testdata_base_path + 'mag/samplesheets/samplesheet.csv' @@ -29,7 +33,7 @@ params { max_unbinned_contigs = 2 busco_db = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2024-01-08.tar.gz" skip_gtdbtk = true - gtdbtk_min_completeness = 0 + gtdbtk_min_completeness = 0.01 refine_bins_dastool = true refine_bins_dastool_threshold = 0 // TODO not using 'both' until #489 merged diff --git a/conf/test_busco_auto.config b/conf/test_busco_auto.config index 8302f753..902a8d89 100644 --- a/conf/test_busco_auto.config +++ b/conf/test_busco_auto.config @@ -10,14 +10,18 @@ ---------------------------------------------------------------------------------------- */ -params { - config_profile_name = 'Test profile' - config_profile_description = 'Minimal test dataset to check pipeline function' +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' +params { + config_profile_name = 'Test profile' + config_profile_description = 'Minimal test dataset to check pipeline function' // Input data input = params.pipelines_testdata_base_path + 'mag/samplesheets/samplesheet.csv' @@ -25,7 +29,7 @@ params { min_length_unbinned_contigs = 1 max_unbinned_contigs = 2 skip_gtdbtk = true - gtdbtk_min_completeness = 0 + gtdbtk_min_completeness = 0.01 skip_prokka = true skip_prodigal = true skip_quast = true diff --git a/conf/test_concoct.config b/conf/test_concoct.config index b427fd2c..2d90ab50 100644 --- a/conf/test_concoct.config +++ b/conf/test_concoct.config @@ -11,33 +11,37 @@ ---------------------------------------------------------------------------------------- */ +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + params { config_profile_name = 'Test CONCOCT profile' config_profile_description = 'Minimal test dataset to check pipeline function' - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' - // Input data - input = params.pipelines_testdata_base_path + 'mag/samplesheets/samplesheet.csv' - centrifuge_db = null - kraken2_db = null - skip_krona = true - skip_clipping = true - skip_adapter_trimming = false - skip_spades = true - skip_spadeshybrid = 
true - skip_megahit = false - skip_quast = true - skip_prodigal = true - skip_binning = false - skip_metabat2 = false - skip_maxbin2 = true - skip_concoct = false - skip_prokka = true - skip_binqc = true - skip_gtdbtk = true - gtdbtk_min_completeness = 0 + input = params.pipelines_testdata_base_path + 'mag/samplesheets/samplesheet.csv' + centrifuge_db = null + kraken2_db = null + skip_krona = true + skip_clipping = true + skip_adapter_trimming = false + skip_spades = true + skip_spadeshybrid = true + skip_megahit = false + skip_quast = true + skip_prodigal = true + skip_binning = false + skip_metabat2 = false + skip_maxbin2 = true + skip_concoct = false + skip_prokka = true + skip_binqc = true + skip_gtdbtk = true + gtdbtk_min_completeness = 0.01 } diff --git a/conf/test_full.config b/conf/test_full.config index 9a01bc58..ed5923d0 100644 --- a/conf/test_full.config +++ b/conf/test_full.config @@ -16,30 +16,30 @@ params { // Input data for full size test // hg19 reference with highly conserved and low-complexity regions masked by Brian Bushnell - host_fasta = "s3://ngi-igenomes/test-data/mag/hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz" - input = "s3://ngi-igenomes/test-data/mag/samplesheets/samplesheet.full.csv" + host_fasta = "s3://ngi-igenomes/test-data/mag/hg19_main_mask_ribo_animal_allplant_allfungus.fa.gz" + input = "s3://ngi-igenomes/test-data/mag/samplesheets/samplesheet.full.csv" //centrifuge_db = "s3://ngi-igenomes/test-data/mag/p_compressed+h+v.tar.gz" - kraken2_db = "s3://ngi-igenomes/test-data/mag/minikraken_8GB_202003.tgz" - cat_db = "s3://ngi-igenomes/test-data/mag/CAT_prepare_20210107.tar.gz" + kraken2_db = "s3://ngi-igenomes/test-data/mag/minikraken_8GB_202003.tgz" + cat_db = "s3://ngi-igenomes/test-data/mag/CAT_prepare_20210107.tar.gz" // gtdb_db = "s3://ngi-igenomes/test-data/mag/gtdbtk_r214_data.tar.gz" ## This should be updated to release 220, once we get GTDB-Tk working again - skip_gtdbtk = true + skip_gtdbtk = true // TODO TEMPORARY: deactivate SPAdes due to incompatibility of container with fusion file system - skip_spades = true - skip_spadeshybrid = true + skip_spades = true + skip_spadeshybrid = true // reproducibility options for assembly - spades_fix_cpus = 10 - spadeshybrid_fix_cpus = 10 - megahit_fix_cpu_1 = true + spades_fix_cpus = 10 + spadeshybrid_fix_cpus = 10 + megahit_fix_cpu_1 = true // available options to enable reproducibility for BUSCO (--busco_db) not used here // to allow detection of possible problems in automated lineage selection mode using public databases // test CAT with official taxonomic ranks only - cat_official_taxonomy = true + cat_official_taxonomy = true // Skip CONCOCT due to timeout issues - skip_concoct = true + skip_concoct = true } diff --git a/conf/test_host_rm.config b/conf/test_host_rm.config index 68c03fb1..e241e03e 100644 --- a/conf/test_host_rm.config +++ b/conf/test_host_rm.config @@ -10,14 +10,18 @@ ---------------------------------------------------------------------------------------- */ -params { - config_profile_name = 'Test profile' - config_profile_description = 'Minimal test dataset to check pipeline function' +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' +params { + config_profile_name = 'Test profile' + config_profile_description = 'Minimal test dataset to check pipeline function' // Input data 
host_fasta = params.pipelines_testdata_base_path + 'mag/host_reference/genome.hg38.chr21_10000bp_region.fa' @@ -26,6 +30,6 @@ params { max_unbinned_contigs = 2 busco_db = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2024-01-08.tar.gz" skip_gtdbtk = true - gtdbtk_min_completeness = 0 + gtdbtk_min_completeness = 0.01 skip_concoct = true } diff --git a/conf/test_hybrid.config b/conf/test_hybrid.config index ca6f4c74..cfb0991c 100644 --- a/conf/test_hybrid.config +++ b/conf/test_hybrid.config @@ -10,14 +10,18 @@ ---------------------------------------------------------------------------------------- */ -params { - config_profile_name = 'Test profile' - config_profile_description = 'Minimal test dataset to check pipeline function' +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' +params { + config_profile_name = 'Test profile' + config_profile_description = 'Minimal test dataset to check pipeline function' // Input data input = params.pipelines_testdata_base_path + 'mag/samplesheets/samplesheet.hybrid.csv' @@ -25,6 +29,6 @@ params { max_unbinned_contigs = 2 busco_db = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2024-01-08.tar.gz" skip_gtdbtk = true - gtdbtk_min_completeness = 0 + gtdbtk_min_completeness = 0.01 skip_concoct = true } diff --git a/conf/test_hybrid_host_rm.config b/conf/test_hybrid_host_rm.config index 3d920995..9ffd3dc7 100644 --- a/conf/test_hybrid_host_rm.config +++ b/conf/test_hybrid_host_rm.config @@ -10,14 +10,18 @@ ---------------------------------------------------------------------------------------- */ -params { - config_profile_name = 'Test profile' - config_profile_description = 'Minimal test dataset to check pipeline function' +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' +params { + config_profile_name = 'Test profile' + config_profile_description = 'Minimal test dataset to check pipeline function' // Input data host_fasta = params.pipelines_testdata_base_path + 'mag/host_reference/genome.hg38.chr21_10000bp_region.fa' @@ -27,5 +31,5 @@ params { skip_binqc = true skip_concoct = true skip_gtdbtk = true - gtdbtk_min_completeness = 0 + gtdbtk_min_completeness = 0.01 } diff --git a/conf/test_nothing.config b/conf/test_nothing.config index e5905a9a..0270218f 100644 --- a/conf/test_nothing.config +++ b/conf/test_nothing.config @@ -11,34 +11,38 @@ ---------------------------------------------------------------------------------------- */ +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + params { config_profile_name = 'Test nothing profile' config_profile_description = 'Minimal test dataset to check pipeline function' - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' - // Input data - input = params.pipelines_testdata_base_path + 'mag/samplesheets/samplesheet.csv' - centrifuge_db = null - kraken2_db = null - skip_krona = true - skip_clipping = true - skip_adapter_trimming = true - skip_spades = true - skip_spadeshybrid = true - skip_megahit = true - skip_quast = true - 
skip_prodigal = true - skip_binning = true - skip_metabat2 = true - skip_maxbin2 = true - skip_concoct = true - skip_prokka = true - skip_binqc = true - skip_gtdbtk = true - gtdbtk_min_completeness = 0 - skip_concoct = true + input = params.pipelines_testdata_base_path + 'mag/samplesheets/samplesheet.csv' + centrifuge_db = null + kraken2_db = null + skip_krona = true + skip_clipping = true + skip_adapter_trimming = true + skip_spades = true + skip_spadeshybrid = true + skip_megahit = true + skip_quast = true + skip_prodigal = true + skip_binning = true + skip_metabat2 = true + skip_maxbin2 = true + skip_concoct = true + skip_prokka = true + skip_binqc = true + skip_gtdbtk = true + gtdbtk_min_completeness = 0.01 + skip_concoct = true } diff --git a/conf/test_single_end.config b/conf/test_single_end.config index fb60a3d0..951a4361 100644 --- a/conf/test_single_end.config +++ b/conf/test_single_end.config @@ -10,29 +10,33 @@ ---------------------------------------------------------------------------------------- */ -params { - config_profile_name = 'Test single-end profile' - config_profile_description = 'Minimal test dataset to check pipeline function' +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' +params { + config_profile_name = 'Test single-end profile' + config_profile_description = 'Minimal test dataset to check pipeline function' - input = params.pipelines_testdata_base_path + 'mag/samplesheets/samplesheet.single_end.csv' - single_end = true - centrifuge_db = params.pipelines_testdata_base_path + 'mag/test_data/minigut_cf.tar.gz' - kraken2_db = params.pipelines_testdata_base_path + 'mag/test_data/minigut_kraken.tgz' - skip_krona = true - megahit_fix_cpu_1 = true - spades_fix_cpus = 1 - binning_map_mode = 'own' - min_length_unbinned_contigs = 1000000 - max_unbinned_contigs = 2 - skip_gtdbtk = true - skip_concoct = true - skip_binqc = true - skip_gtdbtk = true - skip_prokka = true - skip_metaeuk = true + input = params.pipelines_testdata_base_path + 'mag/samplesheets/samplesheet.single_end.csv' + single_end = true + centrifuge_db = params.pipelines_testdata_base_path + 'mag/test_data/minigut_cf.tar.gz' + kraken2_db = params.pipelines_testdata_base_path + 'mag/test_data/minigut_kraken.tgz' + skip_krona = true + megahit_fix_cpu_1 = true + spades_fix_cpus = 1 + binning_map_mode = 'own' + min_length_unbinned_contigs = 1000000 + max_unbinned_contigs = 2 + skip_gtdbtk = true + skip_concoct = true + skip_binqc = true + skip_gtdbtk = true + skip_prokka = true + skip_metaeuk = true } diff --git a/conf/test_virus_identification.config b/conf/test_virus_identification.config index 24893899..380401b3 100644 --- a/conf/test_virus_identification.config +++ b/conf/test_virus_identification.config @@ -10,34 +10,38 @@ ---------------------------------------------------------------------------------------- */ +// Limit resources so that this can run on GitHub Actions +process { + resourceLimits = [ + cpus: 4, + memory: '15.GB', + time: '1.h' + ] +} + params { config_profile_name = 'Test profile for running virus_identification' config_profile_description = 'Minimal test dataset to check pipeline function virus identification' - // Limit resources so that this can run on GitHub Actions - max_cpus = 2 - max_memory = '6.GB' - max_time = '6.h' - // Input data - input = 
params.pipelines_testdata_base_path + 'mag/samplesheets/samplesheet.csv' - run_virus_identification = true - genomad_splits = 7 + input = params.pipelines_testdata_base_path + 'mag/samplesheets/samplesheet.csv' + run_virus_identification = true + genomad_splits = 7 // For computational efficiency - reads_minlength = 150 - coassemble_group = true - skip_gtdbtk = true - gtdbtk_min_completeness = 0 - skip_binning = true - skip_prokka = true - skip_spades = true - skip_spadeshybrid = true - skip_quast = true - skip_prodigal = true - skip_krona = true - skip_adapter_trimming = true - skip_metabat2 = true - skip_maxbin2 = true - skip_busco = true + reads_minlength = 150 + coassemble_group = true + skip_gtdbtk = true + gtdbtk_min_completeness = 0.01 + skip_binning = true + skip_prokka = true + skip_spades = true + skip_spadeshybrid = true + skip_quast = true + skip_prodigal = true + skip_krona = true + skip_adapter_trimming = true + skip_metabat2 = true + skip_maxbin2 = true + skip_busco = true } diff --git a/docs/images/mag_workflow.png b/docs/images/mag_workflow.png index 4c99e66b..2a2abfa3 100644 Binary files a/docs/images/mag_workflow.png and b/docs/images/mag_workflow.png differ diff --git a/docs/images/mag_workflow.svg b/docs/images/mag_workflow.svg index b3434968..e2039d81 100644 --- a/docs/images/mag_workflow.svg +++ b/docs/images/mag_workflow.svg @@ -350,11 +350,11 @@ borderopacity="1.0" inkscape:pageopacity="0.0" inkscape:pageshadow="2" - inkscape:zoom="0.94128294" - inkscape:cx="640.61503" - inkscape:cy="418.57765" + inkscape:zoom="1.8825659" + inkscape:cx="392.01815" + inkscape:cy="508.34874" inkscape:document-units="mm" - inkscape:current-layer="g6248" + inkscape:current-layer="g3-6" showgrid="true" inkscape:window-width="1664" inkscape:window-height="1051" @@ -1062,10 +1062,10 @@ inkscape:export-ydpi="289.40701" inkscape:export-xdpi="289.40701" ry="4.5584702" - y="43.085247" - x="-35.226036" - height="74.083328" - width="39.6875" + y="42.83075" + x="-40.226028" + height="74.33783" + width="44.687492" id="rect4728-66" style="fill:#ffffff;fill-opacity:1;stroke:#000000;stroke-width:0.489677;stroke-linecap:square;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;paint-order:normal" transform="rotate(-90)" />EvaluationtrimmingtrimmingporechopPorechopporechop_ABIv3.1v3.2 + + + +image/svg+xmlTaxonomicclassificationCentrifugeKraken2VisualizationKronaDomain classificationReportingMultiQC(MAG summary)tsvDBShort reads(required)Adapter/qualitytrimmingBBNormfastpAdapterRemovalHost read removalDepth normalisationBowtie2Remove PhiXBowtie2FastQCEvaluation csvLong reads(optional)NanoPlotEvaluationNanoLyseRemove LambdaFiltlongQuality filteringAdapter/qualitytrimmingPorechopporechop_ABIDBTaxonomic classificationCATGTDB-TkTiaraMetaEukGenome annotationPROKKAProtein-codinggene predictionPRODIGALVirus identificationAssembly(sample- or group-wise)EvaluationQUASTaDNA ValidationpyDamageFreebayesBCFToolsgeNomadSPAdesMEGAHITSPAdesHybridDBBinningMetaBAT2MaxBin2CONCOCTEvaluationBUSCOCheckMGUNCQUAST(Abundance estimation and visualization)v3.2Binning refinementDAS Toolnf-core/magCC-BY 4.0 Design originally by Zandra FagernäsBin post-processing diff --git a/docs/output.md b/docs/output.md index 5f889056..4e43ffb6 100644 --- a/docs/output.md +++ b/docs/output.md @@ -113,6 +113,19 @@ The pipeline uses Nanolyse to map the reads against the Lambda phage and removes The pipeline uses filtlong and porechop to perform quality control of the long reads that are eventually 
provided with the TSV input file.
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `QC_longreads/porechop/`
+  - `[sample]_[run]_porechop_trimmed.fastq.gz`: If `--longread_adaptertrimming_tool 'porechop'`, the adapter trimmed FASTQ files from porechop
+  - `[sample]_[run]_porechop-abi_trimmed.fastq.gz`: If `--longread_adaptertrimming_tool 'porechop_abi'`, the adapter trimmed FASTQ files from porechop_ABI
+- `QC_longreads/filtlong/`
+  - `[sample]_[run]_filtlong.fastq.gz`: The length and quality filtered reads in FASTQ from Filtlong
+
+ +Trimmed and filtered FASTQ output directories and files will only exist if `--save_porechop_reads` and/or `--save_filtlong_reads` (respectively) are provided to the run command . + No direct host read removal is performed for long reads. However, since within this pipeline filtlong uses a read quality based on k-mer matches to the already filtered short reads, reads not overlapping those short reads might be discarded. The lower the parameter `--longreads_length_weight`, the higher the impact of the read qualities for filtering. @@ -206,10 +219,10 @@ Trimmed (short) reads are assembled with both megahit and SPAdes. Hybrid assembl Output files - `Assembly/SPAdes/` - - `[sample/group]_scaffolds.fasta.gz`: Compressed assembled scaffolds in fasta format - - `[sample/group]_graph.gfa.gz`: Compressed assembly graph in gfa format - - `[sample/group]_contigs.fasta.gz`: Compressed assembled contigs in fasta format - - `[sample/group].log`: Log file + - `[sample/group].scaffolds.fa.gz`: Compressed assembled scaffolds in fasta format + - `[sample/group].assembly.gfa.gz`: Compressed assembly graph in gfa format + - `[sample/group].contigs.fa.gz`: Compressed assembled contigs in fasta format + - `[sample/group].spades.log`: Log file - `QC/[sample/group]/`: Directory containing QUAST files and Bowtie2 mapping logs - `SPAdes-[sample].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the sample that the metagenome was assembled from, only present if `--coassemble_group` is not set. - `SPAdes-[sample/group]-[sampleToMap].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the respective sample ("sampleToMap"). @@ -225,10 +238,10 @@ SPAdesHybrid is a part of the [SPAdes](http://cab.spbu.ru/software/spades/) soft Output files - `Assembly/SPAdesHybrid/` - - `[sample/group]_scaffolds.fasta.gz`: Compressed assembled scaffolds in fasta format - - `[sample/group]_graph.gfa.gz`: Compressed assembly graph in gfa format - - `[sample/group]_contigs.fasta.gz`: Compressed assembled contigs in fasta format - - `[sample/group].log`: Log file + - `[sample/group].scaffolds.fa.gz`: Compressed assembled scaffolds in fasta format + - `[sample/group].assembly.gfa.gz`: Compressed assembly graph in gfa format + - `[sample/group].contigs.fa.gz`: Compressed assembled contigs in fasta format + - `[sample/group].spades.log`: Log file - `QC/[sample/group]/`: Directory containing QUAST files and Bowtie2 mapping logs - `SPAdesHybrid-[sample].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the sample that the metagenome was assembled from, only present if `--coassemble_group` is not set. - `SPAdesHybrid-[sample/group]-[sampleToMap].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the respective sample ("sampleToMap"). diff --git a/docs/usage.md b/docs/usage.md index d3932206..f7582fe6 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -130,9 +130,9 @@ The above pipeline run specified with a params file in yaml format: nextflow run nf-core/mag -profile docker -params-file params.yaml ``` -with `params.yaml` containing: +with: -```yaml +```yaml title="params.yaml" input: './samplesheet.csv' outdir: './results/' <...> @@ -302,14 +302,6 @@ See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack) on the [`#configs` channel](https://nfcore.slack.com/channels/configs). 
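The custom-configuration guidance above ties in with the test-profile changes elsewhere in this diff, which drop the old `--max_cpus`/`--max_memory`/`--max_time` parameters in favour of the `process.resourceLimits` directive. As a minimal sketch (not taken from the pipeline's own configs; the selector name and the values are illustrative), a user-supplied file passed with `-c custom.config` could cap resources in the same way:

```groovy
// custom.config -- supplied to a run with: nextflow run nf-core/mag -c custom.config ...
process {
    // Upper bounds applied to every process, mirroring the resourceLimits
    // blocks this diff adds to the conf/test*.config profiles.
    resourceLimits = [
        cpus: 8,
        memory: '32.GB',
        time: '12.h'
    ]

    // Hypothetical per-process override using a withName selector,
    // the same selector style used throughout conf/modules.config.
    withName: MEGAHIT {
        cpus   = 8
        memory = '32.GB'
    }
}
```

Unlike the removed `max_*` parameters, `resourceLimits` is enforced by Nextflow itself, so task requests above the cap are clamped rather than rejected.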
-## Azure Resource Requests - -To be used with the `azurebatch` profile by specifying the `-profile azurebatch`. -We recommend providing a compute `params.vm_type` of `Standard_D16_v3` VMs by default but these options can be changed if required. - -Note that the choice of VM size depends on your quota and the overall workload during the analysis. -For a thorough list, please refer the [Azure Sizes for virtual machines in Azure](https://docs.microsoft.com/en-us/azure/virtual-machines/sizes). - ## Running in the background Nextflow handles job submissions and supervises the running jobs. The Nextflow process must run until the pipeline is finished. diff --git a/main.nf b/main.nf index 0bd1f674..3771fb1f 100644 --- a/main.nf +++ b/main.nf @@ -9,8 +9,6 @@ ---------------------------------------------------------------------------------------- */ -nextflow.enable.dsl = 2 - /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ IMPORT FUNCTIONS / MODULES / SUBWORKFLOWS / WORKFLOWS @@ -20,7 +18,6 @@ nextflow.enable.dsl = 2 include { MAG } from './workflows/mag' include { PIPELINE_INITIALISATION } from './subworkflows/local/utils_nfcore_mag_pipeline' include { PIPELINE_COMPLETION } from './subworkflows/local/utils_nfcore_mag_pipeline' - include { getGenomeAttribute } from './subworkflows/local/utils_nfcore_mag_pipeline' /* @@ -60,10 +57,8 @@ workflow NFCORE_MAG { raw_long_reads, input_assemblies ) - emit: multiqc_report = MAG.out.multiqc_report // channel: /path/to/multiqc_report.html - } /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -74,13 +69,11 @@ workflow NFCORE_MAG { workflow { main: - // // SUBWORKFLOW: Run initialisation tasks // PIPELINE_INITIALISATION ( params.version, - params.help, params.validate_params, params.monochrome_logs, args, @@ -96,7 +89,6 @@ workflow { PIPELINE_INITIALISATION.out.raw_long_reads, PIPELINE_INITIALISATION.out.input_assemblies ) - // // SUBWORKFLOW: Run completion tasks // diff --git a/modules.json b/modules.json index 5f4eb8bb..3eea27cd 100644 --- a/modules.json +++ b/modules.json @@ -104,7 +104,12 @@ }, "fastqc": { "branch": "master", - "git_sha": "285a50500f9e02578d90b3ce6382ea3c30216acd", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", + "installed_by": ["modules"] + }, + "filtlong": { + "branch": "master", + "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["modules"] }, "freebayes": { @@ -164,7 +169,12 @@ }, "maxbin2": { "branch": "master", - "git_sha": "911696ea0b62df80e900ef244d7867d177971f73", + "git_sha": "283613159e079152f1336cef0db1c836086206e0", + "installed_by": ["modules"] + }, + "megahit": { + "branch": "master", + "git_sha": "7755db15e36b30da564cd67fffdfe18a255092aa", "installed_by": ["modules"] }, "metabat2/jgisummarizebamcontigdepths": { @@ -189,7 +199,7 @@ }, "multiqc": { "branch": "master", - "git_sha": "b7ebe95761cd389603f9cc0e0dc384c0f663815a", + "git_sha": "cf17ca47590cc578dfb47db1c2a44ef86f89976d", "installed_by": ["modules"] }, "nanolyse": { @@ -202,6 +212,11 @@ "git_sha": "3135090b46f308a260fc9d5991d7d2f9c0785309", "installed_by": ["modules"] }, + "porechop/abi": { + "branch": "master", + "git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48", + "installed_by": ["modules"] + }, "porechop/porechop": { "branch": "master", "git_sha": "1d68c7f248d1a480c5959548a9234602b771199e", @@ -209,7 +224,7 @@ }, "prodigal": { "branch": "master", - "git_sha": "603ecbd9f45300c9788f197d2a15a005685b4220", + "git_sha": 
"666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["modules"] }, "prokka": { @@ -237,6 +252,11 @@ "git_sha": "911696ea0b62df80e900ef244d7867d177971f73", "installed_by": ["modules"] }, + "spades": { + "branch": "master", + "git_sha": "cfebb244d8c83ae533bf2db399f9af361927d504", + "installed_by": ["modules"] + }, "tiara/tiara": { "branch": "master", "git_sha": "911696ea0b62df80e900ef244d7867d177971f73", @@ -258,17 +278,17 @@ }, "utils_nextflow_pipeline": { "branch": "master", - "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa", + "git_sha": "3aa0aec1d52d492fe241919f0c6100ebf0074082", "installed_by": ["subworkflows"] }, "utils_nfcore_pipeline": { "branch": "master", - "git_sha": "92de218a329bfc9a9033116eb5f65fd270e72ba3", + "git_sha": "1b6b9a3338d011367137808b49b923515080e3ba", "installed_by": ["subworkflows"] }, - "utils_nfvalidation_plugin": { + "utils_nfschema_plugin": { "branch": "master", - "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa", + "git_sha": "bbd5a41f4535a8defafe6080e00ea74c45f4f96c", "installed_by": ["subworkflows"] } } diff --git a/modules/local/convert_depths.nf b/modules/local/convert_depths.nf index f61e0c29..0c54e5c6 100644 --- a/modules/local/convert_depths.nf +++ b/modules/local/convert_depths.nf @@ -11,14 +11,27 @@ process CONVERT_DEPTHS { output: // need to add empty val because representing reads as we dont want maxbin to calculate for us. - tuple val(meta), path(fasta), val([]), path("*_mb2_depth.txt"), emit: output - path "versions.yml" , emit: versions + tuple val(meta), path(fasta), val([]), path("*.abund"), emit: output + path "versions.yml" , emit: versions script: def prefix = task.ext.prefix ?: "${meta.id}" """ gunzip -f $depth - bioawk -t '{ { if (NR > 1) { { print \$1, \$3 } } } }' ${depth.toString() - '.gz'} > ${prefix}_mb2_depth.txt + + # Determine the number of abundance columns + n_abund=\$(awk 'NR==1 {print int((NF-3)/2)}' ${depth.toString() - '.gz'}) + + # Get column names + read -r header<${depth.toString() - '.gz'} + header=(\$header) + + # Generate abundance files for each read set + for i in \$(seq 1 \$n_abund); do + col=\$((i*2+2)) + name=\$( echo \${header[\$col-1]} | sed s/\\.bam\$// ) + bioawk -t '{if (NR > 1) {print \$1, \$'"\$col"'}}' ${depth.toString() - '.gz'} > \${name}.abund + done cat <<-END_VERSIONS > versions.yml "${task.process}": diff --git a/modules/local/filtlong.nf b/modules/local/filtlong.nf deleted file mode 100644 index 5410c1cb..00000000 --- a/modules/local/filtlong.nf +++ /dev/null @@ -1,33 +0,0 @@ -process FILTLONG { - tag "$meta.id" - - conda "bioconda::filtlong=0.2.0" - container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
- 'https://depot.galaxyproject.org/singularity/filtlong:0.2.0--he513fc3_3' : - 'biocontainers/filtlong:0.2.0--he513fc3_3' }" - - input: - tuple val(meta), path(long_reads), path(short_reads_1), path(short_reads_2) - - output: - tuple val(meta), path("${meta.id}_lr_filtlong.fastq.gz"), emit: reads - path "versions.yml" , emit: versions - - script: - """ - filtlong \ - -1 ${short_reads_1} \ - -2 ${short_reads_2} \ - --min_length ${params.longreads_min_length} \ - --keep_percent ${params.longreads_keep_percent} \ - --trim \ - --length_weight ${params.longreads_length_weight} \ - ${long_reads} | gzip > ${meta.id}_lr_filtlong.fastq.gz - - cat <<-END_VERSIONS > versions.yml - "${task.process}": - filtlong: \$(filtlong --version | sed -e "s/Filtlong v//g") - END_VERSIONS - """ -} - diff --git a/modules/local/mag_depths_plot.nf b/modules/local/mag_depths_plot.nf index 5f2f44ea..2291ca2d 100644 --- a/modules/local/mag_depths_plot.nf +++ b/modules/local/mag_depths_plot.nf @@ -1,18 +1,17 @@ process MAG_DEPTHS_PLOT { tag "${meta.assembler}-${meta.binner}-${meta.id}" - - conda "conda-forge::python=3.9 conda-forge::pandas=1.3.0 anaconda::seaborn=0.11.0" - container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/mulled-v2-d14219255233ee6cacc427e28a7caf8ee42e8c91:0a22c7568e4a509925048454dad9ab37fa8fe776-0' : - 'biocontainers/mulled-v2-d14219255233ee6cacc427e28a7caf8ee42e8c91:0a22c7568e4a509925048454dad9ab37fa8fe776-0' }" + conda "conda-forge::python=3.9 conda-forge::pandas=1.3.0 conda-forge::seaborn=0.11.0 conda-forge::matplotlib=3.4.2" + container "${workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container + ? 'https://depot.galaxyproject.org/singularity/mulled-v2-d14219255233ee6cacc427e28a7caf8ee42e8c91:0a22c7568e4a509925048454dad9ab37fa8fe776-0' + : 'biocontainers/mulled-v2-d14219255233ee6cacc427e28a7caf8ee42e8c91:0a22c7568e4a509925048454dad9ab37fa8fe776-0'}" input: tuple val(meta), path(depths) - path(sample_groups) + path sample_groups output: tuple val(meta), path("${meta.assembler}-${meta.binner}-${meta.id}-binDepths.heatmap.png"), emit: heatmap - path "versions.yml" , emit: versions + path "versions.yml", emit: versions script: """ diff --git a/modules/local/megahit.nf b/modules/local/megahit.nf deleted file mode 100644 index 6f31425c..00000000 --- a/modules/local/megahit.nf +++ /dev/null @@ -1,40 +0,0 @@ -process MEGAHIT { - tag "$meta.id" - - conda "bioconda::megahit=1.2.9" - container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/megahit:1.2.9--h2e03b76_1' : - 'biocontainers/megahit:1.2.9--h2e03b76_1' }" - - input: - tuple val(meta), path(reads1), path(reads2) - - output: - tuple val(meta), path("MEGAHIT/MEGAHIT-${meta.id}.contigs.fa"), emit: assembly - path "MEGAHIT/*.log" , emit: log - path "MEGAHIT/MEGAHIT-${meta.id}.contigs.fa.gz" , emit: assembly_gz - path "versions.yml" , emit: versions - - script: - def args = task.ext.args ?: '' - def input = meta.single_end ? 
"-r \"" + reads1.join(",") + "\"" : "-1 \"" + reads1.join(",") + "\" -2 \"" + reads2.join(",") + "\"" - mem = task.memory.toBytes() - if ( !params.megahit_fix_cpu_1 || task.cpus == 1 ) - """ - ## Check if we're in the same work directory as a previous failed MEGAHIT run - if [[ -d MEGAHIT ]]; then - rm -r MEGAHIT/ - fi - - megahit $args -t "${task.cpus}" -m $mem $input -o MEGAHIT --out-prefix "MEGAHIT-${meta.id}" - - gzip -c "MEGAHIT/MEGAHIT-${meta.id}.contigs.fa" > "MEGAHIT/MEGAHIT-${meta.id}.contigs.fa.gz" - - cat <<-END_VERSIONS > versions.yml - "${task.process}": - megahit: \$(echo \$(megahit -v 2>&1) | sed 's/MEGAHIT v//') - END_VERSIONS - """ - else - error "ERROR: '--megahit_fix_cpu_1' was specified, but not succesfully applied. Likely this is caused by changed process properties in a custom config file." -} diff --git a/modules/local/spades.nf b/modules/local/spades.nf deleted file mode 100644 index 9ef7ec77..00000000 --- a/modules/local/spades.nf +++ /dev/null @@ -1,51 +0,0 @@ -process SPADES { - tag "$meta.id" - - conda "bioconda::spades=3.15.3" - container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/spades:3.15.3--h95f258a_0' : - 'biocontainers/spades:3.15.3--h95f258a_0' }" - - input: - tuple val(meta), path(reads) - - output: - tuple val(meta), path("SPAdes-${meta.id}_scaffolds.fasta"), emit: assembly - path "SPAdes-${meta.id}.log" , emit: log - path "SPAdes-${meta.id}_contigs.fasta.gz" , emit: contigs_gz - path "SPAdes-${meta.id}_scaffolds.fasta.gz" , emit: assembly_gz - path "SPAdes-${meta.id}_graph.gfa.gz" , emit: graph - path "versions.yml" , emit: versions - - script: - def args = task.ext.args ?: '' - maxmem = task.memory.toGiga() - // The -s option is not supported for metaspades. Each time this is called with `meta.single_end` it's because - // read depth was normalized with BBNorm, which actually outputs pairs, but in an interleaved file. - def readstr = meta.single_end ? "--12 ${reads}" : "-1 ${reads[0]} -2 ${reads[1]}" - - if ( params.spades_fix_cpus == -1 || task.cpus == params.spades_fix_cpus ) - """ - metaspades.py \ - $args \ - --threads "${task.cpus}" \ - --memory $maxmem \ - ${readstr} \ - -o spades - mv spades/assembly_graph_with_scaffolds.gfa SPAdes-${meta.id}_graph.gfa - mv spades/scaffolds.fasta SPAdes-${meta.id}_scaffolds.fasta - mv spades/contigs.fasta SPAdes-${meta.id}_contigs.fasta - mv spades/spades.log SPAdes-${meta.id}.log - gzip "SPAdes-${meta.id}_contigs.fasta" - gzip "SPAdes-${meta.id}_graph.gfa" - gzip -c "SPAdes-${meta.id}_scaffolds.fasta" > "SPAdes-${meta.id}_scaffolds.fasta.gz" - - cat <<-END_VERSIONS > versions.yml - "${task.process}": - python: \$(python --version 2>&1 | sed 's/Python //g') - metaspades: \$(metaspades.py --version | sed "s/SPAdes genome assembler v//; s/ \\[.*//") - END_VERSIONS - """ - else - error "ERROR: '--spades_fix_cpus' was specified, but not succesfully applied. Likely this is caused by changed process properties in a custom config file." -} diff --git a/modules/local/spadeshybrid.nf b/modules/local/spadeshybrid.nf deleted file mode 100644 index 13578a69..00000000 --- a/modules/local/spadeshybrid.nf +++ /dev/null @@ -1,49 +0,0 @@ -process SPADESHYBRID { - tag "$meta.id" - - conda "bioconda::spades=3.15.3" - container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
- 'https://depot.galaxyproject.org/singularity/spades:3.15.3--h95f258a_0' : - 'biocontainers/spades:3.15.3--h95f258a_0' }" - - input: - tuple val(meta), path(long_reads), path(short_reads) - - output: - tuple val(meta), path("SPAdesHybrid-${meta.id}_scaffolds.fasta"), emit: assembly - path "SPAdesHybrid-${meta.id}.log" , emit: log - path "SPAdesHybrid-${meta.id}_contigs.fasta.gz" , emit: contigs_gz - path "SPAdesHybrid-${meta.id}_scaffolds.fasta.gz" , emit: assembly_gz - path "SPAdesHybrid-${meta.id}_graph.gfa.gz" , emit: graph - path "versions.yml" , emit: versions - - script: - def args = task.ext.args ?: '' - maxmem = task.memory.toGiga() - if ( params.spadeshybrid_fix_cpus == -1 || task.cpus == params.spadeshybrid_fix_cpus ) - """ - metaspades.py \ - $args \ - --threads "${task.cpus}" \ - --memory $maxmem \ - --pe1-1 ${short_reads[0]} \ - --pe1-2 ${short_reads[1]} \ - --nanopore ${long_reads} \ - -o spades - mv spades/assembly_graph_with_scaffolds.gfa SPAdesHybrid-${meta.id}_graph.gfa - mv spades/scaffolds.fasta SPAdesHybrid-${meta.id}_scaffolds.fasta - mv spades/contigs.fasta SPAdesHybrid-${meta.id}_contigs.fasta - mv spades/spades.log SPAdesHybrid-${meta.id}.log - gzip "SPAdesHybrid-${meta.id}_contigs.fasta" - gzip "SPAdesHybrid-${meta.id}_graph.gfa" - gzip -c "SPAdesHybrid-${meta.id}_scaffolds.fasta" > "SPAdesHybrid-${meta.id}_scaffolds.fasta.gz" - - cat <<-END_VERSIONS > versions.yml - "${task.process}": - python: \$(python --version 2>&1 | sed 's/Python //g') - metaspades: \$(metaspades.py --version | sed "s/SPAdes genome assembler v//; s/ \\[.*//") - END_VERSIONS - """ - else - error "ERROR: '--spadeshybrid_fix_cpus' was specified, but not succesfully applied. Likely this is caused by changed process properties in a custom config file." -} diff --git a/modules/nf-core/fastqc/environment.yml b/modules/nf-core/fastqc/environment.yml index 1787b38a..691d4c76 100644 --- a/modules/nf-core/fastqc/environment.yml +++ b/modules/nf-core/fastqc/environment.yml @@ -1,7 +1,5 @@ -name: fastqc channels: - conda-forge - bioconda - - defaults dependencies: - bioconda::fastqc=0.12.1 diff --git a/modules/nf-core/fastqc/main.nf b/modules/nf-core/fastqc/main.nf index d79f1c86..d8989f48 100644 --- a/modules/nf-core/fastqc/main.nf +++ b/modules/nf-core/fastqc/main.nf @@ -26,7 +26,10 @@ process FASTQC { def rename_to = old_new_pairs*.join(' ').join(' ') def renamed_files = old_new_pairs.collect{ old_name, new_name -> new_name }.join(' ') - def memory_in_mb = MemoryUnit.of("${task.memory}").toUnit('MB') + // The total amount of allocated RAM by FastQC is equal to the number of threads defined (--threads) time the amount of RAM defined (--memory) + // https://github.com/s-andrews/FastQC/blob/1faeea0412093224d7f6a07f777fad60a5650795/fastqc#L211-L222 + // Dividing the task.memory by task.cpu allows to stick to requested amount of RAM in the label + def memory_in_mb = MemoryUnit.of("${task.memory}").toUnit('MB') / task.cpus // FastQC memory value allowed range (100 - 10000) def fastqc_memory = memory_in_mb > 10000 ? 10000 : (memory_in_mb < 100 ? 
100 : memory_in_mb) diff --git a/modules/nf-core/fastqc/meta.yml b/modules/nf-core/fastqc/meta.yml index ee5507e0..4827da7a 100644 --- a/modules/nf-core/fastqc/meta.yml +++ b/modules/nf-core/fastqc/meta.yml @@ -16,35 +16,44 @@ tools: homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/ licence: ["GPL-2.0-only"] + identifier: biotools:fastqc input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - reads: - type: file - description: | - List of input FastQ files of size 1 and 2 for single-end and paired-end data, - respectively. + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FastQ files of size 1 and 2 for single-end and paired-end data, + respectively. output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - html: - type: file - description: FastQC report - pattern: "*_{fastqc.html}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.html": + type: file + description: FastQC report + pattern: "*_{fastqc.html}" - zip: - type: file - description: FastQC report archive - pattern: "*_{fastqc.zip}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.zip": + type: file + description: FastQC report archive + pattern: "*_{fastqc.zip}" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@drpatelh" - "@grst" diff --git a/modules/nf-core/fastqc/tests/main.nf.test b/modules/nf-core/fastqc/tests/main.nf.test index 70edae4d..e9d79a07 100644 --- a/modules/nf-core/fastqc/tests/main.nf.test +++ b/modules/nf-core/fastqc/tests/main.nf.test @@ -23,17 +23,14 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - // NOTE The report contains the date inside it, which means that the md5sum is stable per day, but not longer than that. So you can't md5sum it. - // looks like this:
<div id="header_filename">Mon 2 Oct 2023
- // test.gz</div>
- // https://github.com/nf-core/modules/pull/3903#issuecomment-1743620039 - - { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_single") } + { assert process.success }, + // NOTE The report contains the date inside it, which means that the md5sum is stable per day, but not longer than that. So you can't md5sum it. + // looks like this:
<div id="header_filename">Mon 2 Oct 2023
+ // test.gz</div>
+ // https://github.com/nf-core/modules/pull/3903#issuecomment-1743620039 + { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, + { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, + { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -54,16 +51,14 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" }, - { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" }, - { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" }, - { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" }, - { assert path(process.out.html[0][1][0]).text.contains("File typeConventional base calls") }, - { assert path(process.out.html[0][1][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_paired") } + { assert process.success }, + { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" }, + { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" }, + { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" }, + { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" }, + { assert path(process.out.html[0][1][0]).text.contains("File typeConventional base calls") }, + { assert path(process.out.html[0][1][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -83,13 +78,11 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_interleaved") } + { assert process.success }, + { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, + { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, + { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -109,13 +102,11 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_bam") } + { assert process.success }, + { assert process.out.html[0][1] ==~ ".*/test_fastqc.html" }, + { assert process.out.zip[0][1] ==~ ".*/test_fastqc.zip" }, + { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -138,22 +129,20 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" }, - { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" }, - { assert process.out.html[0][1][2] ==~ ".*/test_3_fastqc.html" }, - { assert process.out.html[0][1][3] ==~ ".*/test_4_fastqc.html" }, - { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" }, - { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" }, - { assert process.out.zip[0][1][2] ==~ ".*/test_3_fastqc.zip" }, - { assert process.out.zip[0][1][3] ==~ ".*/test_4_fastqc.zip" }, - { assert 
path(process.out.html[0][1][0]).text.contains("File typeConventional base calls") }, - { assert path(process.out.html[0][1][1]).text.contains("File typeConventional base calls") }, - { assert path(process.out.html[0][1][2]).text.contains("File typeConventional base calls") }, - { assert path(process.out.html[0][1][3]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_multiple") } + { assert process.success }, + { assert process.out.html[0][1][0] ==~ ".*/test_1_fastqc.html" }, + { assert process.out.html[0][1][1] ==~ ".*/test_2_fastqc.html" }, + { assert process.out.html[0][1][2] ==~ ".*/test_3_fastqc.html" }, + { assert process.out.html[0][1][3] ==~ ".*/test_4_fastqc.html" }, + { assert process.out.zip[0][1][0] ==~ ".*/test_1_fastqc.zip" }, + { assert process.out.zip[0][1][1] ==~ ".*/test_2_fastqc.zip" }, + { assert process.out.zip[0][1][2] ==~ ".*/test_3_fastqc.zip" }, + { assert process.out.zip[0][1][3] ==~ ".*/test_4_fastqc.zip" }, + { assert path(process.out.html[0][1][0]).text.contains("File typeConventional base calls") }, + { assert path(process.out.html[0][1][1]).text.contains("File typeConventional base calls") }, + { assert path(process.out.html[0][1][2]).text.contains("File typeConventional base calls") }, + { assert path(process.out.html[0][1][3]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } @@ -173,21 +162,18 @@ nextflow_process { then { assertAll ( - { assert process.success }, - - { assert process.out.html[0][1] ==~ ".*/mysample_fastqc.html" }, - { assert process.out.zip[0][1] ==~ ".*/mysample_fastqc.zip" }, - { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, - - { assert snapshot(process.out.versions).match("fastqc_versions_custom_prefix") } + { assert process.success }, + { assert process.out.html[0][1] ==~ ".*/mysample_fastqc.html" }, + { assert process.out.zip[0][1] ==~ ".*/mysample_fastqc.zip" }, + { assert path(process.out.html[0][1]).text.contains("File typeConventional base calls") }, + { assert snapshot(process.out.versions).match() } ) } } test("sarscov2 single-end [fastq] - stub") { - options "-stub" - + options "-stub" when { process { """ @@ -201,12 +187,123 @@ nextflow_process { then { assertAll ( - { assert process.success }, - { assert snapshot(process.out.html.collect { file(it[1]).getName() } + - process.out.zip.collect { file(it[1]).getName() } + - process.out.versions ).match("fastqc_stub") } + { assert process.success }, + { assert snapshot(process.out).match() } ) } } + test("sarscov2 paired-end [fastq] - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [id: 'test', single_end: false], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) ] + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("sarscov2 interleaved [fastq] - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [id: 'test', single_end: false], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_interleaved.fastq.gz', checkIfExists: true) + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert 
snapshot(process.out).match() } + ) + } + } + + test("sarscov2 paired-end [bam] - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [id: 'test', single_end: false], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/bam/test.paired_end.sorted.bam', checkIfExists: true) + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("sarscov2 multiple [fastq] - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [id: 'test', single_end: false], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test2_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test2_2.fastq.gz', checkIfExists: true) ] + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("sarscov2 custom_prefix - stub") { + + options "-stub" + when { + process { + """ + input[0] = Channel.of([ + [ id:'mysample', single_end:true ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) + ]) + """ + } + } + + then { + assertAll ( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } } diff --git a/modules/nf-core/fastqc/tests/main.nf.test.snap b/modules/nf-core/fastqc/tests/main.nf.test.snap index 86f7c311..d5db3092 100644 --- a/modules/nf-core/fastqc/tests/main.nf.test.snap +++ b/modules/nf-core/fastqc/tests/main.nf.test.snap @@ -1,88 +1,392 @@ { - "fastqc_versions_interleaved": { + "sarscov2 custom_prefix": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:40:07.293713" + "timestamp": "2024-07-22T11:02:16.374038" }, - "fastqc_stub": { + "sarscov2 single-end [fastq] - stub": { "content": [ - [ - "test.html", - "test.zip", - "versions.yml:md5,e1cc25ca8af856014824abd842e93978" - ] + { + "0": [ + [ + { + "id": "test", + "single_end": true + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": true + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": true + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": true + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:02:24.993809" + }, + "sarscov2 custom_prefix - stub": { + "content": [ + { + "0": [ + [ + { + "id": "mysample", + "single_end": true + }, + "mysample.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "mysample", + "single_end": true + }, + "mysample.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + 
{ + "id": "mysample", + "single_end": true + }, + "mysample.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "mysample", + "single_end": true + }, + "mysample.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:31:01.425198" + "timestamp": "2024-07-22T11:03:10.93942" }, - "fastqc_versions_multiple": { + "sarscov2 interleaved [fastq]": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:40:55.797907" + "timestamp": "2024-07-22T11:01:42.355718" }, - "fastqc_versions_bam": { + "sarscov2 paired-end [bam]": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:40:26.795862" + "timestamp": "2024-07-22T11:01:53.276274" }, - "fastqc_versions_single": { + "sarscov2 multiple [fastq]": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:39:27.043675" + "timestamp": "2024-07-22T11:02:05.527626" }, - "fastqc_versions_paired": { + "sarscov2 paired-end [fastq]": { "content": [ [ "versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:01:31.188871" + }, + "sarscov2 paired-end [fastq] - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:02:34.273566" + }, + "sarscov2 multiple [fastq] - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:39:47.584191" + "timestamp": "2024-07-22T11:03:02.304411" }, - "fastqc_versions_custom_prefix": { + "sarscov2 single-end [fastq]": { "content": [ [ 
"versions.yml:md5,e1cc25ca8af856014824abd842e93978" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:01:19.095607" + }, + "sarscov2 interleaved [fastq] - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" + }, + "timestamp": "2024-07-22T11:02:44.640184" + }, + "sarscov2 paired-end [bam] - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "html": [ + [ + { + "id": "test", + "single_end": false + }, + "test.html:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e1cc25ca8af856014824abd842e93978" + ], + "zip": [ + [ + { + "id": "test", + "single_end": false + }, + "test.zip:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.9.0", + "nextflow": "24.04.3" }, - "timestamp": "2024-01-31T17:41:14.576531" + "timestamp": "2024-07-22T11:02:53.550742" } } \ No newline at end of file diff --git a/modules/nf-core/filtlong/environment.yml b/modules/nf-core/filtlong/environment.yml new file mode 100644 index 00000000..746c83a4 --- /dev/null +++ b/modules/nf-core/filtlong/environment.yml @@ -0,0 +1,5 @@ +channels: + - conda-forge + - bioconda +dependencies: + - bioconda::filtlong=0.2.1 diff --git a/modules/nf-core/filtlong/main.nf b/modules/nf-core/filtlong/main.nf new file mode 100644 index 00000000..627247fe --- /dev/null +++ b/modules/nf-core/filtlong/main.nf @@ -0,0 +1,39 @@ +process FILTLONG { + tag "$meta.id" + label 'process_low' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/filtlong:0.2.1--h9a82719_0' : + 'biocontainers/filtlong:0.2.1--h9a82719_0' }" + + input: + tuple val(meta), path(shortreads), path(longreads) + + output: + tuple val(meta), path("*.fastq.gz"), emit: reads + tuple val(meta), path("*.log") , emit: log + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def short_reads = !shortreads ? "" : meta.single_end ? "-1 $shortreads" : "-1 ${shortreads[0]} -2 ${shortreads[1]}" + if ("$longreads" == "${prefix}.fastq.gz") error "Longread FASTQ input and output names are the same, set prefix in module configuration to disambiguate!" 
+ """ + filtlong \\ + $short_reads \\ + $args \\ + $longreads \\ + 2> >(tee ${prefix}.log >&2) \\ + | gzip -n > ${prefix}.fastq.gz + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + filtlong: \$( filtlong --version | sed -e "s/Filtlong v//g" ) + END_VERSIONS + """ +} diff --git a/modules/nf-core/filtlong/meta.yml b/modules/nf-core/filtlong/meta.yml new file mode 100644 index 00000000..804c1b0d --- /dev/null +++ b/modules/nf-core/filtlong/meta.yml @@ -0,0 +1,65 @@ +name: filtlong +description: Filtlong filters long reads based on quality measures or short read data. +keywords: + - nanopore + - quality control + - QC + - filtering + - long reads + - short reads +tools: + - filtlong: + description: Filtlong is a tool for filtering long reads. It can take a set of + long reads and produce a smaller, better subset. It uses both read length (longer + is better) and read identity (higher is better) when choosing which reads pass + the filter. + homepage: https://anaconda.org/bioconda/filtlong + tool_dev_url: https://github.com/rrwick/Filtlong + licence: ["GPL v3"] + identifier: biotools:filtlong +input: + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - shortreads: + type: file + description: fastq file + pattern: "*.{fq,fastq,fq.gz,fastq.gz}" + - longreads: + type: file + description: fastq file + pattern: "*.{fq,fastq,fq.gz,fastq.gz}" +output: + - reads: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.fastq.gz": + type: file + description: Filtered (compressed) fastq file + pattern: "*.fastq.gz" + - log: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - "*.log": + type: file + description: Standard error logging file containing summary statistics + pattern: "*.log" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@d4straub" + - "@sofstam" +maintainers: + - "@d4straub" + - "@sofstam" diff --git a/modules/nf-core/filtlong/tests/main.nf.test b/modules/nf-core/filtlong/tests/main.nf.test new file mode 100644 index 00000000..d54ce39c --- /dev/null +++ b/modules/nf-core/filtlong/tests/main.nf.test @@ -0,0 +1,108 @@ +nextflow_process { + + name "Test Process FILTLONG" + script "../main.nf" + process "FILTLONG" + config "./nextflow.config" + tag "filtlong" + tag "modules" + tag "modules_nfcore" + + test("sarscov2 nanopore [fastq]") { + + when { + params { + outdir = "$outputDir" + } + process { + """ + input[0] = [ + [ id:'test', single_end:false ], // meta map + [], + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/nanopore/fastq/test.fastq.gz', checkIfExists: true) ] + ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert path(process.out.log.get(0).get(1)).readLines().contains("Scoring long reads")}, + { assert snapshot( + process.out.reads, + process.out.versions + ).match() + } + ) + } + + } + + + test("sarscov2 nanopore [fastq] + Illumina single-end [fastq]") { + + when { + params { + outdir = "$outputDir" + } + process { + """ + input[0] = [ + [ id:'test', single_end:true ], // meta map + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ], + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/nanopore/fastq/test.fastq.gz', checkIfExists: true) ] + ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert path(process.out.log.get(0).get(1)).readLines().contains("Scoring long reads")}, + { assert snapshot( + process.out.reads, + process.out.versions + ).match() + } + ) + } + + } + + + test("sarscov2 nanopore [fastq] + Illumina paired-end [fastq]") { + + when { + params { + outdir = "$outputDir" + } + process { + """ + input[0] = [ + [ id:'test', single_end:false ], // meta map + [ + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) + ], + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/nanopore/fastq/test.fastq.gz', checkIfExists: true) ] + ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert path(process.out.log.get(0).get(1)).readLines().contains("Scoring long reads")}, + { assert snapshot( + process.out.reads, + process.out.versions + ).match() + } + ) + } + + } +} diff --git a/modules/nf-core/filtlong/tests/main.nf.test.snap b/modules/nf-core/filtlong/tests/main.nf.test.snap new file mode 100644 index 00000000..1a25c3fc --- /dev/null +++ b/modules/nf-core/filtlong/tests/main.nf.test.snap @@ -0,0 +1,65 @@ +{ + "sarscov2 nanopore [fastq]": { + "content": [ + [ + [ + { + "id": "test", + "single_end": false + }, + "test_lr.fastq.gz:md5,7567d853ada6ac142332619d0b541d76" + ] + ], + [ + "versions.yml:md5,af5988f30157282acdb0ac50ebb4c8cc" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.3" + }, + "timestamp": "2024-08-06T10:51:29.197603" + }, + "sarscov2 nanopore [fastq] + Illumina paired-end [fastq]": { + "content": [ + [ + [ + { + "id": "test", + 
"single_end": false + }, + "test_lr.fastq.gz:md5,7567d853ada6ac142332619d0b541d76" + ] + ], + [ + "versions.yml:md5,af5988f30157282acdb0ac50ebb4c8cc" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.3" + }, + "timestamp": "2024-08-06T10:51:39.68464" + }, + "sarscov2 nanopore [fastq] + Illumina single-end [fastq]": { + "content": [ + [ + [ + { + "id": "test", + "single_end": true + }, + "test_lr.fastq.gz:md5,7567d853ada6ac142332619d0b541d76" + ] + ], + [ + "versions.yml:md5,af5988f30157282acdb0ac50ebb4c8cc" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.3" + }, + "timestamp": "2024-08-06T10:51:34.404022" + } +} \ No newline at end of file diff --git a/modules/nf-core/filtlong/tests/nextflow.config b/modules/nf-core/filtlong/tests/nextflow.config new file mode 100644 index 00000000..d366b4c3 --- /dev/null +++ b/modules/nf-core/filtlong/tests/nextflow.config @@ -0,0 +1,4 @@ +process { + ext.args = "--min_length 10" + ext.prefix = "test_lr" +} diff --git a/modules/nf-core/maxbin2/environment.yml b/modules/nf-core/maxbin2/environment.yml new file mode 100644 index 00000000..8a881999 --- /dev/null +++ b/modules/nf-core/maxbin2/environment.yml @@ -0,0 +1,5 @@ +channels: + - conda-forge + - bioconda +dependencies: + - bioconda::maxbin2=2.2.7 diff --git a/modules/nf-core/maxbin2/main.nf b/modules/nf-core/maxbin2/main.nf index d5f49344..845c8e4e 100644 --- a/modules/nf-core/maxbin2/main.nf +++ b/modules/nf-core/maxbin2/main.nf @@ -2,7 +2,7 @@ process MAXBIN2 { tag "$meta.id" label 'process_medium' - conda "bioconda::maxbin2=2.2.7" + conda "${moduleDir}/environment.yml" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/maxbin2:2.2.7--he1b5a44_2' : 'biocontainers/maxbin2:2.2.7--he1b5a44_2' }" @@ -13,6 +13,7 @@ process MAXBIN2 { output: tuple val(meta), path("*.fasta.gz") , emit: binned_fastas tuple val(meta), path("*.summary") , emit: summary + tuple val(meta), path("*.abundance") , emit: abundance , optional: true tuple val(meta), path("*.log.gz") , emit: log tuple val(meta), path("*.marker.gz") , emit: marker_counts tuple val(meta), path("*.noclass.gz") , emit: unbinned_fasta @@ -27,7 +28,16 @@ process MAXBIN2 { script: def args = task.ext.args ?: '' def prefix = task.ext.prefix ?: "${meta.id}" - def associate_files = reads ? "-reads $reads" : "-abund $abund" + if (reads && abund) { error("ERROR: MaxBin2 can only accept one of `reads` or `abund`, no both. Check input.") } + def associate_files = "" + if ( reads ) { + associate_files = "-reads $reads" + } else if ( abund instanceof List ) { + associate_files = "-abund ${abund[0]}" + for (i in 2..abund.size()) { associate_files += " -abund$i ${abund[i-1]}" } + } else { + associate_files = "-abund $abund" + } """ mkdir input/ && mv $contigs input/ run_MaxBin.pl \\ diff --git a/modules/nf-core/maxbin2/meta.yml b/modules/nf-core/maxbin2/meta.yml index 7971d481..9546afb1 100644 --- a/modules/nf-core/maxbin2/meta.yml +++ b/modules/nf-core/maxbin2/meta.yml @@ -11,69 +11,133 @@ keywords: - contigs tools: - maxbin2: - description: MaxBin is software for binning assembled metagenomic sequences based on an Expectation-Maximization algorithm. + description: MaxBin is software for binning assembled metagenomic sequences based + on an Expectation-Maximization algorithm. 
homepage: https://sourceforge.net/projects/maxbin/ documentation: https://sourceforge.net/projects/maxbin/ tool_dev_url: https://sourceforge.net/projects/maxbin/ doi: "10.1093/bioinformatics/btv638" licence: ["BSD 3-clause"] - + identifier: "" input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - contigs: - type: file - description: Multi FASTA file containing assembled contigs of a given sample - pattern: "*.fasta" - - reads: - type: file - description: Reads used to assemble contigs in FASTA or FASTQ format. Do not supply at the same time as abundance files. - pattern: "*.fasta" - - abund: - type: file - description: Contig abundance files, i.e. reads against each contig. See MaxBin2 README for details. Do not supply at the same time as read files. - + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - contigs: + type: file + description: Multi FASTA file containing assembled contigs of a given sample + pattern: "*.fasta" + - reads: + type: file + description: Reads used to assemble contigs in FASTA or FASTQ format. Do not + supply at the same time as abundance files. + pattern: "*.fasta" + - abund: + type: list + description: One or more contig abundance files, i.e. average depth of reads against each contig. See MaxBin2 + README for details. Do not supply at the same time as read files. output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - versions: - type: file - description: File containing software versions - pattern: "versions.yml" - binned_fastas: - type: file - description: Binned contigs, one per bin designated with numeric IDs - pattern: "*.fasta.gz" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.fasta.gz": + type: file + description: Binned contigs, one per bin designated with numeric IDs + pattern: "*.fasta.gz" - summary: - type: file - description: Summary file describing which contigs are being classified into which bin - pattern: "*.summary" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.summary": + type: file + description: Summary file describing which contigs are being classified into + which bin + pattern: "*.summary" + - abundance: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.abundance": + type: file + description: Abundance of each bin if multiple abundance files were supplied + which bin + pattern: "*.abundance" - log: - type: file - description: Log file recording the core steps of MaxBin algorithm - pattern: "*.log.gz" - - marker: - type: file - description: Marker gene presence numbers for each bin - pattern: "*.marker.gz" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.log.gz": + type: file + description: Log file recording the core steps of MaxBin algorithm + pattern: "*.log.gz" + - marker_counts: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - "*.marker.gz": + type: file + description: Marker counts + pattern: "*.marker.gz" - unbinned_fasta: - type: file - description: All sequences that pass the minimum length threshold but are not classified successfully. - pattern: "*.noclass.gz" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.noclass.gz": + type: file + description: All sequences that pass the minimum length threshold but are not + classified successfully. + pattern: "*.noclass.gz" - tooshort_fasta: - type: file - description: All sequences that do not meet the minimum length threshold. - pattern: "*.tooshort.gz" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.tooshort.gz": + type: file + description: All sequences that do not meet the minimum length threshold. + pattern: "*.tooshort.gz" + - marker_bins: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*_bin.tar.gz": + type: file + description: Marker bins + pattern: "*_bin.tar.gz" - marker_genes: - type: file - description: All sequences that do not meet the minimum length threshold. - pattern: "*.marker_of_each_gene.tar.gz" - + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*_gene.tar.gz": + type: file + description: Marker genes + pattern: "*_gene.tar.gz" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@jfy133" +maintainers: + - "@jfy133" diff --git a/modules/nf-core/maxbin2/tests/main.nf.test b/modules/nf-core/maxbin2/tests/main.nf.test new file mode 100644 index 00000000..efb23c2b --- /dev/null +++ b/modules/nf-core/maxbin2/tests/main.nf.test @@ -0,0 +1,47 @@ + +nextflow_process { + + name "Test Process MAXBIN2" + script "../main.nf" + process "MAXBIN2" + + tag "modules" + tag "modules_nfcore" + tag "maxbin2" + + test("test-maxbin2") { + + when { + process { + """ + input[0] = [ + [ id:'test1', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/prokaryotes/bacteroides_fragilis/illumina/fasta/test1.contigs.fa.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/prokaryotes/bacteroides_fragilis/illumina/fastq/test1_1.fastq.gz', checkIfExists: true), + [] + ] + + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + process.out.binned_fastas, + process.out.summary, + file(process.out.log[0][1]).name, + process.out.marker_counts, + file(process.out.unbinned_fasta[0][1]).name, // empty + process.out.tooshort_fasta, + file(process.out.marker_bins[0][1]).name, // unstable + process.out.marker_genes, + process.out.versions + ).match() + } + ) + } + } + +} diff --git a/modules/nf-core/maxbin2/tests/main.nf.test.snap b/modules/nf-core/maxbin2/tests/main.nf.test.snap new file mode 100644 index 00000000..caecef8e --- /dev/null +++ b/modules/nf-core/maxbin2/tests/main.nf.test.snap @@ -0,0 +1,59 @@ +{ + "test-maxbin2": { + "content": [ + [ + [ + { + "id": "test1", + "single_end": false + }, + [ + "test1.001.fasta.gz:md5,92eeca569534d770af91a1c07e62afa9", + "test1.002.fasta.gz:md5,628ef3b2e6647aed95511c28ea0dc229" + ] + ] + ], + [ + [ + { + "id": "test1", + "single_end": false + }, + "test1.summary:md5,7cdbedbfadd7a96203bdeca55ad822da" + ] + ], + "test1.log.gz", + 
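Editor's note: the new MaxBin2 test above drives the module with contigs plus raw reads and an empty abundance slot. For the opposite case, a hedged sketch of how a pipeline might pass pre-computed depth files instead; the channel names are placeholders, not code from this PR:

```nextflow
// Hypothetical wiring: ch_contigs and ch_depths each emit [ meta, file ]; the reads slot is left empty.
MAXBIN2 (
    ch_contigs
        .join(ch_depths)
        .map { meta, contigs, depths -> [ meta, contigs, [], depths ] }
)
```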
[ + [ + { + "id": "test1", + "single_end": false + }, + "test1.marker.gz:md5,928994e84b9d723a8a48841432e1a262" + ] + ], + "test1.noclass.gz", + [ + [ + { + "id": "test1", + "single_end": false + }, + "test1.tooshort.gz:md5,b4e48e83637217aa9eba7f27f5990b24" + ] + ], + "test1.marker_of_each_bin.tar.gz", + [ + + ], + [ + "versions.yml:md5,a8b5754ee5df020d62ff25306376fc0a" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.4" + }, + "timestamp": "2024-08-30T14:56:43.557114" + } +} \ No newline at end of file diff --git a/modules/nf-core/megahit/environment.yml b/modules/nf-core/megahit/environment.yml new file mode 100644 index 00000000..eed8b725 --- /dev/null +++ b/modules/nf-core/megahit/environment.yml @@ -0,0 +1,6 @@ +channels: + - conda-forge + - bioconda +dependencies: + - bioconda::megahit=1.2.9 + - conda-forge::pigz=2.8 diff --git a/modules/nf-core/megahit/main.nf b/modules/nf-core/megahit/main.nf new file mode 100644 index 00000000..f6e50f94 --- /dev/null +++ b/modules/nf-core/megahit/main.nf @@ -0,0 +1,70 @@ +process MEGAHIT { + tag "${meta.id}" + label 'process_high' + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/f2/f2cb827988dca7067ff8096c37cb20bc841c878013da52ad47a50865d54efe83/data' : + 'community.wave.seqera.io/library/megahit_pigz:87a590163e594224' }" + + input: + tuple val(meta), path(reads1), path(reads2) + + output: + tuple val(meta), path("*.contigs.fa.gz") , emit: contigs + tuple val(meta), path("intermediate_contigs/k*.contigs.fa.gz") , emit: k_contigs + tuple val(meta), path("intermediate_contigs/k*.addi.fa.gz") , emit: addi_contigs + tuple val(meta), path("intermediate_contigs/k*.local.fa.gz") , emit: local_contigs + tuple val(meta), path("intermediate_contigs/k*.final.contigs.fa.gz"), emit: kfinal_contigs + tuple val(meta), path('*.log') , emit: log + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def args2 = task.ext.args2 ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def reads_command = meta.single_end || !reads2 ? "-r ${reads1}" : "-1 ${reads1.join(',')} -2 ${reads2.join(',')}" + """ + megahit \\ + ${reads_command} \\ + ${args} \\ + -t ${task.cpus} \\ + --out-prefix ${prefix} + + pigz \\ + --no-name \\ + -p ${task.cpus} \\ + ${args2} \\ + megahit_out/*.fa \\ + megahit_out/intermediate_contigs/*.fa + + mv megahit_out/* . + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + megahit: \$(echo \$(megahit -v 2>&1) | sed 's/MEGAHIT v//') + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def args2 = task.ext.args2 ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def reads_command = meta.single_end || !reads2 ? 
"-r ${reads1}" : "-1 ${reads1.join(',')} -2 ${reads2.join(',')}" + """ + mkdir -p intermediate_contigs + echo "" | gzip > ${prefix}.contigs.fa.gz + echo "" | gzip > intermediate_contigs/k21.contigs.fa.gz + echo "" | gzip > intermediate_contigs/k21.addi.fa.gz + echo "" | gzip > intermediate_contigs/k21.local.fa.gz + echo "" | gzip > intermediate_contigs/k21.final.contigs.fa.gz + touch ${prefix}.log + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + megahit: \$(echo \$(megahit -v 2>&1) | sed 's/MEGAHIT v//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/megahit/meta.yml b/modules/nf-core/megahit/meta.yml new file mode 100644 index 00000000..04dab4c2 --- /dev/null +++ b/modules/nf-core/megahit/meta.yml @@ -0,0 +1,114 @@ +name: megahit +description: An ultra-fast metagenomic assembler for large and complex metagenomics +keywords: + - megahit + - denovo + - assembly + - debruijn + - metagenomics +tools: + - megahit: + description: "An ultra-fast single-node solution for large and complex metagenomics + assembly via succinct de Bruijn graph" + homepage: https://github.com/voutcn/megahit + documentation: https://github.com/voutcn/megahit + tool_dev_url: https://github.com/voutcn/megahit + doi: "10.1093/bioinformatics/btv033" + licence: ["GPL v3"] + args_id: "$args" + identifier: biotools:megahit + - pigz: + description: "Parallel implementation of the gzip algorithm." + homepage: "https://zlib.net/pigz/" + documentation: "https://zlib.net/pigz/pigz.pdf" + args_id: "$args2" + + identifier: biotools:megahit +input: + - - meta: + type: map + description: | + Groovy Map containing sample information and input single, or paired-end FASTA/FASTQ files (optionally decompressed) + e.g. [ id:'test', single_end:false ] + - reads1: + type: file + description: | + A single or list of input FastQ files for single-end or R1 of paired-end library(s), + respectively in gzipped or uncompressed FASTQ or FASTA format. + - reads2: + type: file + description: | + A single or list of input FastQ files for R2 of paired-end library(s), + respectively in gzipped or uncompressed FASTQ or FASTA format. +output: + - contigs: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.contigs.fa.gz": + type: file + description: Final final contigs result of the assembly in FASTA format. + pattern: "*.contigs.fa.gz" + - k_contigs: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - intermediate_contigs/k*.contigs.fa.gz: + type: file + description: Contigs assembled from the de Bruijn graph of order-K + pattern: "k*.contigs.fa.gz" + - addi_contigs: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - intermediate_contigs/k*.addi.fa.gz: + type: file + description: Contigs assembled after iteratively removing local low coverage + unitigs in the de Bruijn graph of order-K + pattern: "k*.addi.fa.gz" + - local_contigs: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - intermediate_contigs/k*.local.fa.gz: + type: file + description: Contigs of the locally assembled contigs for k=K + pattern: "k*.local.fa.gz" + - kfinal_contigs: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - intermediate_contigs/k*.final.contigs.fa.gz: + type: file + description: Stand-alone contigs for k=K; if local assembly is turned on, the + file will be empty + pattern: "k*.final.contigs.fa.gz" + - log: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - "*.log": + type: file + description: Log file containing statistics of the assembly output + pattern: "*.log" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@jfy133" +maintainers: + - "@jfy133" diff --git a/modules/nf-core/megahit/tests/main.nf.test b/modules/nf-core/megahit/tests/main.nf.test new file mode 100644 index 00000000..b52765d4 --- /dev/null +++ b/modules/nf-core/megahit/tests/main.nf.test @@ -0,0 +1,126 @@ +nextflow_process { + + name "Test Process MEGAHIT" + script "../main.nf" + process "MEGAHIT" + + tag "modules" + tag "modules_nfcore" + tag "megahit" + + test("sarscov2 - fastq - se") { + + when { + process { + """ + input[0] = [ [id:"test", single_end:true], + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + []] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert path(process.out.contigs[0][1]).linesGzip.toString().contains(">k") }, + { assert process.out.k_contigs[0][1].each{path(it).linesGzip.toString().contains(">k")}}, + { assert process.out.addi_contigs[0][1].each{path(it).linesGzip.toString().contains(">k")}}, + { assert process.out.local_contigs[0][1].each{path(it).linesGzip.toString().contains(">k")}}, + { assert process.out.kfinal_contigs[0][1].each{path(it).linesGzip.toString().contains(">k")}}, + { assert snapshot( + path(process.out.log[0][1]).readLines().last().contains("ALL DONE. Time elapsed"), + process.out.versions + ).match() + } + ) + } + + } + + test("sarscov2 - fastq - pe") { + + when { + process { + """ + input[0] = [ [id:"test", single_end:false], + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true) + ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert path(process.out.contigs[0][1]).linesGzip.toString().contains(">k") }, + { assert process.out.k_contigs[0][1].each{path(it).linesGzip.toString().contains(">k")}}, + { assert process.out.addi_contigs[0][1].each{path(it).linesGzip.toString().contains(">k")}}, + { assert process.out.local_contigs[0][1].each{path(it).linesGzip.toString().contains(">k")}}, + { assert process.out.kfinal_contigs[0][1].each{path(it).linesGzip.toString().contains(">k")}}, + { assert snapshot( + path(process.out.log[0][1]).readLines().last().contains("ALL DONE. 
Time elapsed"), + process.out.versions + ).match() + } + ) + } + + } + + test("sarscov2 - fastq - pe - coassembly") { + + when { + process { + """ + input[0] = [ [id:"test", single_end:false], + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test2_1.fastq.gz', checkIfExists: true)] , + [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true), file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test2_2.fastq.gz', checkIfExists: true)] + ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert path(process.out.contigs[0][1]).linesGzip.toString().contains(">k") }, + { assert process.out.k_contigs[0][1].each{path(it).linesGzip.toString().contains(">k")}}, + { assert process.out.addi_contigs[0][1].each{path(it).linesGzip.toString().contains(">k")}}, + { assert process.out.local_contigs[0][1].each{path(it).linesGzip.toString().contains(">k")}}, + { assert process.out.kfinal_contigs[0][1].each{path(it).linesGzip.toString().contains(">k")}}, + { assert snapshot( + path(process.out.log[0][1]).readLines().last().contains("ALL DONE. Time elapsed"), + process.out.versions + ).match() + } + ) + } + + } + + test("sarscov2 - stub") { + + options "-stub" + + when { + process { + """ + input[0] = [ [id:"test", single_end:true], + file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true), + [] + ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + +} diff --git a/modules/nf-core/megahit/tests/main.nf.test.snap b/modules/nf-core/megahit/tests/main.nf.test.snap new file mode 100644 index 00000000..4677cc33 --- /dev/null +++ b/modules/nf-core/megahit/tests/main.nf.test.snap @@ -0,0 +1,172 @@ +{ + "sarscov2 - fastq - se": { + "content": [ + true, + [ + "versions.yml:md5,e3c0731297c9abe2f495ab6d541ac0e6" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.4" + }, + "timestamp": "2024-09-12T16:45:42.387947698" + }, + "sarscov2 - fastq - pe": { + "content": [ + true, + [ + "versions.yml:md5,e3c0731297c9abe2f495ab6d541ac0e6" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.4" + }, + "timestamp": "2024-09-12T16:45:48.679485983" + }, + "sarscov2 - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": true + }, + "test.contigs.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": true + }, + [ + "k21.contigs.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "k21.final.contigs.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "2": [ + [ + { + "id": "test", + "single_end": true + }, + "k21.addi.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "3": [ + [ + { + "id": "test", + "single_end": true + }, + "k21.local.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "4": [ + [ + { + "id": "test", + "single_end": true + }, + "k21.final.contigs.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "5": [ + [ + { + "id": "test", + "single_end": true + }, + "test.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "6": [ + "versions.yml:md5,e3c0731297c9abe2f495ab6d541ac0e6" + ], + "addi_contigs": [ + [ + { + "id": "test", + "single_end": true + }, + "k21.addi.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "contigs": [ + [ + { + "id": 
"test", + "single_end": true + }, + "test.contigs.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "k_contigs": [ + [ + { + "id": "test", + "single_end": true + }, + [ + "k21.contigs.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940", + "k21.final.contigs.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ] + ], + "kfinal_contigs": [ + [ + { + "id": "test", + "single_end": true + }, + "k21.final.contigs.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "local_contigs": [ + [ + { + "id": "test", + "single_end": true + }, + "k21.local.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "log": [ + [ + { + "id": "test", + "single_end": true + }, + "test.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "versions": [ + "versions.yml:md5,e3c0731297c9abe2f495ab6d541ac0e6" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.4" + }, + "timestamp": "2024-09-12T16:44:35.245399991" + }, + "sarscov2 - fastq - pe - coassembly": { + "content": [ + true, + [ + "versions.yml:md5,e3c0731297c9abe2f495ab6d541ac0e6" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.4" + }, + "timestamp": "2024-09-12T16:45:56.23363342" + } +} \ No newline at end of file diff --git a/modules/nf-core/megahit/tests/tags.yml b/modules/nf-core/megahit/tests/tags.yml new file mode 100644 index 00000000..9e865846 --- /dev/null +++ b/modules/nf-core/megahit/tests/tags.yml @@ -0,0 +1,2 @@ +megahit: + - "modules/nf-core/megahit/**" diff --git a/modules/nf-core/multiqc/environment.yml b/modules/nf-core/multiqc/environment.yml index ca39fb67..6f5b867b 100644 --- a/modules/nf-core/multiqc/environment.yml +++ b/modules/nf-core/multiqc/environment.yml @@ -1,7 +1,5 @@ -name: multiqc channels: - conda-forge - bioconda - - defaults dependencies: - - bioconda::multiqc=1.21 + - bioconda::multiqc=1.25.1 diff --git a/modules/nf-core/multiqc/main.nf b/modules/nf-core/multiqc/main.nf index 47ac352f..cc0643e1 100644 --- a/modules/nf-core/multiqc/main.nf +++ b/modules/nf-core/multiqc/main.nf @@ -3,14 +3,16 @@ process MULTIQC { conda "${moduleDir}/environment.yml" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? - 'https://depot.galaxyproject.org/singularity/multiqc:1.21--pyhdfd78af_0' : - 'biocontainers/multiqc:1.21--pyhdfd78af_0' }" + 'https://depot.galaxyproject.org/singularity/multiqc:1.25.1--pyhdfd78af_0' : + 'biocontainers/multiqc:1.25.1--pyhdfd78af_0' }" input: path multiqc_files, stageAs: "?/*" path(multiqc_config) path(extra_multiqc_config) path(multiqc_logo) + path(replace_names) + path(sample_names) output: path "*multiqc_report.html", emit: report @@ -23,16 +25,22 @@ process MULTIQC { script: def args = task.ext.args ?: '' + def prefix = task.ext.prefix ? "--filename ${task.ext.prefix}.html" : '' def config = multiqc_config ? "--config $multiqc_config" : '' def extra_config = extra_multiqc_config ? "--config $extra_multiqc_config" : '' - def logo = multiqc_logo ? /--cl-config 'custom_logo: "${multiqc_logo}"'/ : '' + def logo = multiqc_logo ? "--cl-config 'custom_logo: \"${multiqc_logo}\"'" : '' + def replace = replace_names ? "--replace-names ${replace_names}" : '' + def samples = sample_names ? "--sample-names ${sample_names}" : '' """ multiqc \\ --force \\ $args \\ $config \\ + $prefix \\ $extra_config \\ $logo \\ + $replace \\ + $samples \\ . 
cat <<-END_VERSIONS > versions.yml @@ -44,7 +52,7 @@ process MULTIQC { stub: """ mkdir multiqc_data - touch multiqc_plots + mkdir multiqc_plots touch multiqc_report.html cat <<-END_VERSIONS > versions.yml diff --git a/modules/nf-core/multiqc/meta.yml b/modules/nf-core/multiqc/meta.yml index 45a9bc35..b16c1879 100644 --- a/modules/nf-core/multiqc/meta.yml +++ b/modules/nf-core/multiqc/meta.yml @@ -1,5 +1,6 @@ name: multiqc -description: Aggregate results from bioinformatics analyses across many samples into a single report +description: Aggregate results from bioinformatics analyses across many samples into + a single report keywords: - QC - bioinformatics tools @@ -12,40 +13,59 @@ tools: homepage: https://multiqc.info/ documentation: https://multiqc.info/docs/ licence: ["GPL-3.0-or-later"] + identifier: biotools:multiqc input: - - multiqc_files: - type: file - description: | - List of reports / files recognised by MultiQC, for example the html and zip output of FastQC - - multiqc_config: - type: file - description: Optional config yml for MultiQC - pattern: "*.{yml,yaml}" - - extra_multiqc_config: - type: file - description: Second optional config yml for MultiQC. Will override common sections in multiqc_config. - pattern: "*.{yml,yaml}" - - multiqc_logo: - type: file - description: Optional logo file for MultiQC - pattern: "*.{png}" + - - multiqc_files: + type: file + description: | + List of reports / files recognised by MultiQC, for example the html and zip output of FastQC + - - multiqc_config: + type: file + description: Optional config yml for MultiQC + pattern: "*.{yml,yaml}" + - - extra_multiqc_config: + type: file + description: Second optional config yml for MultiQC. Will override common sections + in multiqc_config. + pattern: "*.{yml,yaml}" + - - multiqc_logo: + type: file + description: Optional logo file for MultiQC + pattern: "*.{png}" + - - replace_names: + type: file + description: | + Optional two-column sample renaming file. First column a set of + patterns, second column a set of corresponding replacements. Passed via + MultiQC's `--replace-names` option. + pattern: "*.{tsv}" + - - sample_names: + type: file + description: | + Optional TSV file with headers, passed to the MultiQC --sample_names + argument. 
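Editor's note: if a pipeline wants to expose the new renaming options described above, one possible pattern is to gate them behind parameters. The parameter names below are hypothetical; only the wiring pattern is the point:

```nextflow
// Hypothetical params; unset values fall back to plain empty lists so the optional inputs stay empty.
ch_replace_names = params.multiqc_replace_names
    ? Channel.fromPath(params.multiqc_replace_names, checkIfExists: true)
    : []
ch_sample_names  = params.multiqc_sample_names
    ? Channel.fromPath(params.multiqc_sample_names, checkIfExists: true)
    : []
```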
+ pattern: "*.{tsv}" output: - report: - type: file - description: MultiQC report file - pattern: "multiqc_report.html" + - "*multiqc_report.html": + type: file + description: MultiQC report file + pattern: "multiqc_report.html" - data: - type: directory - description: MultiQC data dir - pattern: "multiqc_data" + - "*_data": + type: directory + description: MultiQC data dir + pattern: "multiqc_data" - plots: - type: file - description: Plots created by MultiQC - pattern: "*_data" + - "*_plots": + type: file + description: Plots created by MultiQC + pattern: "*_data" - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@abhi18av" - "@bunop" diff --git a/modules/nf-core/multiqc/tests/main.nf.test b/modules/nf-core/multiqc/tests/main.nf.test index f1c4242e..33316a7d 100644 --- a/modules/nf-core/multiqc/tests/main.nf.test +++ b/modules/nf-core/multiqc/tests/main.nf.test @@ -8,6 +8,8 @@ nextflow_process { tag "modules_nfcore" tag "multiqc" + config "./nextflow.config" + test("sarscov2 single-end [fastqc]") { when { @@ -17,6 +19,8 @@ nextflow_process { input[1] = [] input[2] = [] input[3] = [] + input[4] = [] + input[5] = [] """ } } @@ -41,6 +45,8 @@ nextflow_process { input[1] = Channel.of(file("https://github.com/nf-core/tools/raw/dev/nf_core/pipeline-template/assets/multiqc_config.yml", checkIfExists: true)) input[2] = [] input[3] = [] + input[4] = [] + input[5] = [] """ } } @@ -66,6 +72,8 @@ nextflow_process { input[1] = [] input[2] = [] input[3] = [] + input[4] = [] + input[5] = [] """ } } diff --git a/modules/nf-core/multiqc/tests/main.nf.test.snap b/modules/nf-core/multiqc/tests/main.nf.test.snap index bfebd802..2fcbb5ff 100644 --- a/modules/nf-core/multiqc/tests/main.nf.test.snap +++ b/modules/nf-core/multiqc/tests/main.nf.test.snap @@ -2,14 +2,14 @@ "multiqc_versions_single": { "content": [ [ - "versions.yml:md5,21f35ee29416b9b3073c28733efe4b7d" + "versions.yml:md5,41f391dcedce7f93ca188f3a3ffa0916" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-02-29T08:48:55.657331" + "timestamp": "2024-10-02T17:51:46.317523" }, "multiqc_stub": { "content": [ @@ -17,25 +17,25 @@ "multiqc_report.html", "multiqc_data", "multiqc_plots", - "versions.yml:md5,21f35ee29416b9b3073c28733efe4b7d" + "versions.yml:md5,41f391dcedce7f93ca188f3a3ffa0916" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-02-29T08:49:49.071937" + "timestamp": "2024-10-02T17:52:20.680978" }, "multiqc_versions_config": { "content": [ [ - "versions.yml:md5,21f35ee29416b9b3073c28733efe4b7d" + "versions.yml:md5,41f391dcedce7f93ca188f3a3ffa0916" ] ], "meta": { - "nf-test": "0.8.4", - "nextflow": "23.10.1" + "nf-test": "0.9.0", + "nextflow": "24.04.4" }, - "timestamp": "2024-02-29T08:49:25.457567" + "timestamp": "2024-10-02T17:52:09.185842" } } \ No newline at end of file diff --git a/modules/nf-core/multiqc/tests/nextflow.config b/modules/nf-core/multiqc/tests/nextflow.config new file mode 100644 index 00000000..c537a6a3 --- /dev/null +++ b/modules/nf-core/multiqc/tests/nextflow.config @@ -0,0 +1,5 @@ +process { + withName: 'MULTIQC' { + ext.prefix = null + } +} diff --git a/modules/nf-core/porechop/abi/environment.yml b/modules/nf-core/porechop/abi/environment.yml new file mode 100644 index 00000000..dabb4921 --- 
/dev/null +++ b/modules/nf-core/porechop/abi/environment.yml @@ -0,0 +1,7 @@ +--- +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/modules/environment-schema.json +channels: + - conda-forge + - bioconda +dependencies: + - bioconda::porechop_abi=0.5.0 diff --git a/modules/nf-core/porechop/abi/main.nf b/modules/nf-core/porechop/abi/main.nf new file mode 100644 index 00000000..88ec5bd0 --- /dev/null +++ b/modules/nf-core/porechop/abi/main.nf @@ -0,0 +1,50 @@ +process PORECHOP_ABI { + tag "$meta.id" + label 'process_medium' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/porechop_abi:0.5.0--py310h590eda1_0': + 'biocontainers/porechop_abi:0.5.0--py310h590eda1_0' }" + + input: + tuple val(meta), path(reads) + + output: + tuple val(meta), path("*.fastq.gz") , emit: reads + tuple val(meta), path("*.log") , emit: log + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}.porechop_abi" + if ("$reads" == "${prefix}.fastq.gz") error "Input and output names are the same, use \"task.ext.prefix\" to disambiguate!" + """ + porechop_abi \\ + --input $reads \\ + --threads $task.cpus \\ + $args \\ + --output ${prefix}.fastq.gz \\ + | tee ${prefix}.log + cat <<-END_VERSIONS > versions.yml + "${task.process}": + porechop_abi: \$( porechop_abi --version ) + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}.porechop_abi" + """ + echo "" | gzip > ${prefix}.fastq.gz + touch ${prefix}.log + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + porechop_abi: \$( porechop_abi --version ) + END_VERSIONS + """ +} diff --git a/modules/nf-core/porechop/abi/meta.yml b/modules/nf-core/porechop/abi/meta.yml new file mode 100644 index 00000000..a856ffbe --- /dev/null +++ b/modules/nf-core/porechop/abi/meta.yml @@ -0,0 +1,48 @@ +name: "porechop_abi" +description: Extension of Porechop whose purpose is to process adapter sequences in ONT reads. +keywords: + - porechop_abi + - adapter + - nanopore +tools: + - "porechop_abi": + description: Extension of Porechop whose purpose is to process adapter sequences in ONT reads. + homepage: "https://github.com/bonsai-team/Porechop_ABI" + documentation: "https://github.com/bonsai-team/Porechop_ABI" + tool_dev_url: "https://github.com/bonsai-team/Porechop_ABI" + doi: "10.1101/2022.07.07.499093" + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: fastq/fastq.gz file + pattern: "*.{fastq,fastq.gz,fq,fq.gz}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - reads: + type: file + description: Adapter-trimmed fastq.gz file + pattern: "*.fastq.gz" + - log: + type: file + description: Log file containing stdout information + pattern: "*.log" +authors: + - "@sofstam" + - "LilyAnderssonLee" +maintainers: + - "@sofstam" + - "LilyAnderssonLee" diff --git a/modules/nf-core/porechop/abi/tests/main.nf.test b/modules/nf-core/porechop/abi/tests/main.nf.test new file mode 100644 index 00000000..b5a29f90 --- /dev/null +++ b/modules/nf-core/porechop/abi/tests/main.nf.test @@ -0,0 +1,59 @@ +nextflow_process { + + name "Test Process PORECHOP_ABI" + script "../main.nf" + process "PORECHOP_ABI" + tag "modules" + tag "modules_nfcore" + tag "porechop" + tag "porechop/abi" + + test("sarscov2-nanopore") { + + when { + process { + """ + input[0] = [ + [ id:'test'], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/nanopore/fastq/test.fastq.gz', checkIfExists: true) + ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot( + process.out.reads, + file(process.out.log.get(0).get(1)).readLines()[20..40], + process.out.versions).match() + } + ) + } + } + + test("sarscov2-nanopore - stub") { + + options "-stub" + + when { + + process { + """ + input[0] = [ + [ id:'test'], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/nanopore/fastq/test.fastq.gz', checkIfExists: true) + ] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } +} diff --git a/modules/nf-core/porechop/abi/tests/main.nf.test.snap b/modules/nf-core/porechop/abi/tests/main.nf.test.snap new file mode 100644 index 00000000..ad63f4ed --- /dev/null +++ b/modules/nf-core/porechop/abi/tests/main.nf.test.snap @@ -0,0 +1,94 @@ +{ + "sarscov2-nanopore": { + "content": [ + [ + [ + { + "id": "test" + }, + "test.porechop_abi.fastq.gz:md5,886fdb859fb50e0dddd35007bcff043e" + ] + ], + [ + " Best \u001b[0m", + " read Best \u001b[0m", + " start read end\u001b[0m", + " \u001b[4mSet %ID %ID \u001b[0m", + " \u001b[32mSQK-NSK007 100.0 73.1\u001b[0m", + " Rapid 40.4 0.0", + " RBK004_upstream 77.5 0.0", + " SQK-MAP006 75.8 72.7", + " SQK-MAP006 short 65.5 66.7", + " PCR adapters 1 73.9 69.6", + " PCR adapters 2 80.0 72.7", + " PCR adapters 3 70.8 69.6", + " 1D^2 part 1 71.4 70.0", + " 1D^2 part 2 84.8 75.8", + " cDNA SSP 63.0 61.7", + " \u001b[32mBarcode 1 (reverse) 100.0 100.0\u001b[0m", + " Barcode 2 (reverse) 70.8 69.2", + " Barcode 3 (reverse) 76.0 70.4", + " Barcode 4 (reverse) 74.1 71.4", + " Barcode 5 (reverse) 77.8 80.8", + " Barcode 6 (reverse) 73.1 70.8" + ], + [ + "versions.yml:md5,0e9e5e0d35a68ff8e6490c949b257f98" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.1" + }, + "timestamp": "2024-07-29T13:50:49.318599" + }, + "sarscov2-nanopore - stub": { + "content": [ + { + "0": [ + [ + { + "id": "test" + }, + "test.porechop_abi.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "1": [ + [ + { + "id": "test" + }, + "test.porechop_abi.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "2": [ + "versions.yml:md5,0e9e5e0d35a68ff8e6490c949b257f98" + ], + "log": [ + [ + { + "id": "test" + }, + "test.porechop_abi.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "reads": [ + [ + { + "id": "test" + }, + "test.porechop_abi.fastq.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "versions": [ + 
"versions.yml:md5,0e9e5e0d35a68ff8e6490c949b257f98" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.04.1" + }, + "timestamp": "2024-07-29T13:50:54.425389" + } +} \ No newline at end of file diff --git a/modules/nf-core/prodigal/environment.yml b/modules/nf-core/prodigal/environment.yml new file mode 100644 index 00000000..7609bf3b --- /dev/null +++ b/modules/nf-core/prodigal/environment.yml @@ -0,0 +1,6 @@ +channels: + - conda-forge + - bioconda +dependencies: + - bioconda::prodigal=2.6.3 + - conda-forge::pigz=2.6 diff --git a/modules/nf-core/prodigal/main.nf b/modules/nf-core/prodigal/main.nf index 8cf87a6d..49ced167 100644 --- a/modules/nf-core/prodigal/main.nf +++ b/modules/nf-core/prodigal/main.nf @@ -2,7 +2,7 @@ process PRODIGAL { tag "$meta.id" label 'process_single' - conda "bioconda::prodigal=2.6.3 conda-forge::pigz=2.6" + conda "${moduleDir}/environment.yml" container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 'https://depot.galaxyproject.org/singularity/mulled-v2-2e442ba7b07bfa102b9cf8fac6221263cd746ab8:57f05cfa73f769d6ed6d54144cb3aa2a6a6b17e0-0' : 'biocontainers/mulled-v2-2e442ba7b07bfa102b9cf8fac6221263cd746ab8:57f05cfa73f769d6ed6d54144cb3aa2a6a6b17e0-0' }" @@ -33,7 +33,10 @@ process PRODIGAL { -a "${prefix}.faa" \\ -s "${prefix}_all.txt" - pigz -nm ${prefix}* + pigz -nm ${prefix}.fna + pigz -nm ${prefix}.${output_format} + pigz -nm ${prefix}.faa + pigz -nm ${prefix}_all.txt cat <<-END_VERSIONS > versions.yml "${task.process}": @@ -41,4 +44,21 @@ process PRODIGAL { pigz: \$(pigz -V 2>&1 | sed 's/pigz //g') END_VERSIONS """ + + stub: + def args = task.ext.args ?: '' + prefix = task.ext.prefix ?: "${meta.id}" + """ + touch ${prefix}.fna.gz + touch ${prefix}.${output_format}.gz + touch ${prefix}.faa.gz + touch ${prefix}_all.txt.gz + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + prodigal: \$(prodigal -v 2>&1 | sed -n 's/Prodigal V\\(.*\\):.*/\\1/p') + pigz: \$(pigz -V 2>&1 | sed 's/pigz //g') + END_VERSIONS + """ + } diff --git a/modules/nf-core/prodigal/meta.yml b/modules/nf-core/prodigal/meta.yml index 30747a90..7d3d459e 100644 --- a/modules/nf-core/prodigal/meta.yml +++ b/modules/nf-core/prodigal/meta.yml @@ -1,57 +1,79 @@ name: prodigal -description: Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program +description: Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a + microbial (bacterial and archaeal) gene finding program keywords: - prokaryotes - gene finding - microbial tools: - prodigal: - description: Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program + description: Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) + is a microbial (bacterial and archaeal) gene finding program homepage: https://github.com/hyattpd/Prodigal documentation: https://github.com/hyattpd/prodigal/wiki tool_dev_url: https://github.com/hyattpd/Prodigal doi: "10.1186/1471-2105-11-119" licence: ["GPL v3"] - + identifier: biotools:prodigal input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - genome: - type: file - description: fasta/fasta.gz file - - output_format: - type: string - description: Output format ("gbk"/"gff"/"sqn"/"sco") - + - - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - genome: + type: file + description: fasta/fasta.gz file + - - output_format: + type: string + description: Output format ("gbk"/"gff"/"sqn"/"sco") output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - versions: - type: file - description: File containing software versions - pattern: "versions.yml" + - gene_annotations: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - ${prefix}.${output_format}.gz: + type: file + description: gene annotations in output_format given as input + pattern: "*.{output_format}" - nucleotide_fasta: - type: file - description: nucleotide sequences file - pattern: "*.{fna}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - ${prefix}.fna.gz: + type: file + description: nucleotide sequences file + pattern: "*.{fna}" - amino_acid_fasta: - type: file - description: protein translations file - pattern: "*.{faa}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - ${prefix}.faa.gz: + type: file + description: protein translations file + pattern: "*.{faa}" - all_gene_annotations: - type: file - description: complete starts file - pattern: "*.{_all.txt}" - - gene_annotations: - type: file - description: gene annotations in output_format given as input - pattern: "*.{output_format}" - + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - ${prefix}_all.txt.gz: + type: file + description: complete starts file + pattern: "*.{_all.txt}" + - versions: + - versions.yml: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - "@grst" +maintainers: + - "@grst" diff --git a/modules/nf-core/prodigal/tests/main.nf.test b/modules/nf-core/prodigal/tests/main.nf.test new file mode 100644 index 00000000..446bd0d1 --- /dev/null +++ b/modules/nf-core/prodigal/tests/main.nf.test @@ -0,0 +1,101 @@ +nextflow_process { + + name "Test Process PRODIGAL" + script "../main.nf" + process "PRODIGAL" + + tag "modules" + tag "modules_nfcore" + tag "prodigal" + + test("prodigal - sarscov2 - gff") { + when { + process { + """ + input[0] = [ + [id:'test', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) + ] + input[1] = 'gff' + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("prodigal - sarscov2 - gbk") { + when { + process { + """ + input[0] = [ + [id:'test', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) + ] + input[1] = 'gbk' + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + + test("prodigal - sarscov2 - gff - stub") { + + options "-stub" + + when { + process { + """ + input[0] = [ + [id:'test', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) + ] + input[1] = 'gff' + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out.out).match() } + ) + } + } + + test("prodigal - sarscov2 - gbk - stub") { + + 
options "-stub" + + when { + process { + """ + input[0] = [ + [id:'test', single_end:false ], // meta map + file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true) + ] + input[1] = 'gbk' + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out.out).match() } + ) + } + } + +} \ No newline at end of file diff --git a/modules/nf-core/prodigal/tests/main.nf.test.snap b/modules/nf-core/prodigal/tests/main.nf.test.snap new file mode 100644 index 00000000..f29802b4 --- /dev/null +++ b/modules/nf-core/prodigal/tests/main.nf.test.snap @@ -0,0 +1,196 @@ +{ + "prodigal - sarscov2 - gbk - stub": { + "content": null, + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-18T13:58:09.852618454" + }, + "prodigal - sarscov2 - gff": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.gff.gz:md5,612c2724c2891c63350f171f74165757" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fna.gz:md5,1bc8a05bcb72a3c324f5e4ffaa716d3b" + ] + ], + "2": [ + [ + { + "id": "test", + "single_end": false + }, + "test.faa.gz:md5,7168b854103f3586ccfdb71a44c389f7" + ] + ], + "3": [ + [ + { + "id": "test", + "single_end": false + }, + "test_all.txt.gz:md5,e6d6c50f0c39e5169f84ae3c90837fa9" + ] + ], + "4": [ + "versions.yml:md5,9541e53a6927e9856036bb97bfb30307" + ], + "all_gene_annotations": [ + [ + { + "id": "test", + "single_end": false + }, + "test_all.txt.gz:md5,e6d6c50f0c39e5169f84ae3c90837fa9" + ] + ], + "amino_acid_fasta": [ + [ + { + "id": "test", + "single_end": false + }, + "test.faa.gz:md5,7168b854103f3586ccfdb71a44c389f7" + ] + ], + "gene_annotations": [ + [ + { + "id": "test", + "single_end": false + }, + "test.gff.gz:md5,612c2724c2891c63350f171f74165757" + ] + ], + "nucleotide_fasta": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fna.gz:md5,1bc8a05bcb72a3c324f5e4ffaa716d3b" + ] + ], + "versions": [ + "versions.yml:md5,9541e53a6927e9856036bb97bfb30307" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-18T13:57:49.57989696" + }, + "prodigal - sarscov2 - gff - stub": { + "content": null, + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-18T13:58:03.210222528" + }, + "prodigal - sarscov2 - gbk": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.gbk.gz:md5,188b3a0e3f78740ded7f3ec4d876cb4b" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fna.gz:md5,1bc8a05bcb72a3c324f5e4ffaa716d3b" + ] + ], + "2": [ + [ + { + "id": "test", + "single_end": false + }, + "test.faa.gz:md5,7168b854103f3586ccfdb71a44c389f7" + ] + ], + "3": [ + [ + { + "id": "test", + "single_end": false + }, + "test_all.txt.gz:md5,e6d6c50f0c39e5169f84ae3c90837fa9" + ] + ], + "4": [ + "versions.yml:md5,9541e53a6927e9856036bb97bfb30307" + ], + "all_gene_annotations": [ + [ + { + "id": "test", + "single_end": false + }, + "test_all.txt.gz:md5,e6d6c50f0c39e5169f84ae3c90837fa9" + ] + ], + "amino_acid_fasta": [ + [ + { + "id": "test", + "single_end": false + }, + "test.faa.gz:md5,7168b854103f3586ccfdb71a44c389f7" + ] + ], + "gene_annotations": [ + [ + { + "id": "test", + "single_end": false + }, + "test.gbk.gz:md5,188b3a0e3f78740ded7f3ec4d876cb4b" + ] + ], + "nucleotide_fasta": [ + [ + { + "id": "test", + "single_end": false + }, + "test.fna.gz:md5,1bc8a05bcb72a3c324f5e4ffaa716d3b" + ] + ], + "versions": [ + 
"versions.yml:md5,9541e53a6927e9856036bb97bfb30307" + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "23.10.1" + }, + "timestamp": "2024-03-18T13:57:56.606374214" + } +} \ No newline at end of file diff --git a/modules/nf-core/prodigal/tests/tags.yml b/modules/nf-core/prodigal/tests/tags.yml new file mode 100644 index 00000000..fc0cb020 --- /dev/null +++ b/modules/nf-core/prodigal/tests/tags.yml @@ -0,0 +1,2 @@ +prodigal: + - "modules/nf-core/prodigal/**" diff --git a/modules/nf-core/spades/environment.yml b/modules/nf-core/spades/environment.yml new file mode 100644 index 00000000..8cc5321f --- /dev/null +++ b/modules/nf-core/spades/environment.yml @@ -0,0 +1,5 @@ +channels: + - conda-forge + - bioconda +dependencies: + - bioconda::spades=4.0.0 diff --git a/modules/nf-core/spades/main.nf b/modules/nf-core/spades/main.nf new file mode 100644 index 00000000..36cdfe44 --- /dev/null +++ b/modules/nf-core/spades/main.nf @@ -0,0 +1,102 @@ +process SPADES { + tag "$meta.id" + label 'process_high' + + conda "${moduleDir}/environment.yml" + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/spades:4.0.0--h5fb382e_1' : + 'biocontainers/spades:4.0.0--h5fb382e_1' }" + + input: + tuple val(meta), path(illumina), path(pacbio), path(nanopore) + path yml + path hmm + + output: + tuple val(meta), path('*.scaffolds.fa.gz') , optional:true, emit: scaffolds + tuple val(meta), path('*.contigs.fa.gz') , optional:true, emit: contigs + tuple val(meta), path('*.transcripts.fa.gz') , optional:true, emit: transcripts + tuple val(meta), path('*.gene_clusters.fa.gz'), optional:true, emit: gene_clusters + tuple val(meta), path('*.assembly.gfa.gz') , optional:true, emit: gfa + tuple val(meta), path('*.warnings.log') , optional:true, emit: warnings + tuple val(meta), path('*.spades.log') , emit: log + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def maxmem = task.memory.toGiga() + def illumina_reads = illumina ? ( meta.single_end ? "-s $illumina" : "-1 ${illumina[0]} -2 ${illumina[1]}" ) : "" + def pacbio_reads = pacbio ? "--pacbio $pacbio" : "" + def nanopore_reads = nanopore ? "--nanopore $nanopore" : "" + def custom_hmms = hmm ? "--custom-hmms $hmm" : "" + def reads = yml ? 
"--dataset $yml" : "$illumina_reads $pacbio_reads $nanopore_reads" + """ + spades.py \\ + $args \\ + --threads $task.cpus \\ + --memory $maxmem \\ + $custom_hmms \\ + $reads \\ + -o ./ + mv spades.log ${prefix}.spades.log + + if [ -f scaffolds.fasta ]; then + mv scaffolds.fasta ${prefix}.scaffolds.fa + gzip -n ${prefix}.scaffolds.fa + fi + if [ -f contigs.fasta ]; then + mv contigs.fasta ${prefix}.contigs.fa + gzip -n ${prefix}.contigs.fa + fi + if [ -f transcripts.fasta ]; then + mv transcripts.fasta ${prefix}.transcripts.fa + gzip -n ${prefix}.transcripts.fa + fi + if [ -f assembly_graph_with_scaffolds.gfa ]; then + mv assembly_graph_with_scaffolds.gfa ${prefix}.assembly.gfa + gzip -n ${prefix}.assembly.gfa + fi + + if [ -f gene_clusters.fasta ]; then + mv gene_clusters.fasta ${prefix}.gene_clusters.fa + gzip -n ${prefix}.gene_clusters.fa + fi + + if [ -f warnings.log ]; then + mv warnings.log ${prefix}.warnings.log + fi + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + spades: \$(spades.py --version 2>&1 | sed -n 's/^.*SPAdes genome assembler v//p') + END_VERSIONS + """ + + stub: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def maxmem = task.memory.toGiga() + def illumina_reads = illumina ? ( meta.single_end ? "-s $illumina" : "-1 ${illumina[0]} -2 ${illumina[1]}" ) : "" + def pacbio_reads = pacbio ? "--pacbio $pacbio" : "" + def nanopore_reads = nanopore ? "--nanopore $nanopore" : "" + def custom_hmms = hmm ? "--custom-hmms $hmm" : "" + def reads = yml ? "--dataset $yml" : "$illumina_reads $pacbio_reads $nanopore_reads" + """ + echo "" | gzip > ${prefix}.scaffolds.fa.gz + echo "" | gzip > ${prefix}.contigs.fa.gz + echo "" | gzip > ${prefix}.transcripts.fa.gz + echo "" | gzip > ${prefix}.gene_clusters.fa.gz + echo "" | gzip > ${prefix}.assembly.gfa.gz + touch ${prefix}.spades.log + touch ${prefix}.warnings.log + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + spades: \$(spades.py --version 2>&1 | sed -n 's/^.*SPAdes genome assembler v//p') + END_VERSIONS + """ +} diff --git a/modules/nf-core/spades/meta.yml b/modules/nf-core/spades/meta.yml new file mode 100644 index 00000000..986871be --- /dev/null +++ b/modules/nf-core/spades/meta.yml @@ -0,0 +1,99 @@ +name: spades +description: Assembles a small genome (bacterial, fungal, viral) +keywords: + - genome + - assembly + - genome assembler + - small genome + - de novo assembler +tools: + - spades: + description: SPAdes (St. Petersburg genome assembler) is intended for both standard isolates and single-cell MDA bacteria assemblies. + homepage: http://cab.spbu.ru/files/release3.15.0/manual.html + documentation: http://cab.spbu.ru/files/release3.15.0/manual.html + tool_dev_url: https://github.com/ablab/spades + doi: 10.1089/cmb.2012.0021 + licence: ["GPL v2"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - illumina: + type: file + description: | + List of input FastQ (Illumina or PacBio CCS reads) files + of size 1 and 2 for single-end and paired-end data, + respectively. This input data type is required. + - pacbio: + type: file + description: | + List of input PacBio CLR FastQ files of size 1. + - nanopore: + type: file + description: | + List of input FastQ files of size 1, originating from Oxford Nanopore technology. + - yml: + type: file + description: | + Path to yml file containing read information. 
+ The raw FASTQ files listed in this YAML file MUST be supplied to the respective illumina/pacbio/nanopore input channel(s) _in addition_ to this YML. + File entries in this yml must contain only the file name and no paths. + pattern: "*.{yml,yaml}" + - hmm: + type: file + description: File or directory with amino acid HMMs for Spades HMM-guided mode. +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - scaffolds: + type: file + description: | + Fasta file containing scaffolds + pattern: "*.fa.gz" + - contigs: + type: file + description: | + Fasta file containing contigs + pattern: "*.fa.gz" + - transcripts: + type: file + description: | + Fasta file containing transcripts + pattern: "*.fa.gz" + - gene_clusters: + type: file + description: | + Fasta file containing gene_clusters + pattern: "*.fa.gz" + - gfa: + type: file + description: | + gfa file containing assembly + pattern: "*.gfa.gz" + - log: + type: file + description: | + Spades log file + pattern: "*.spades.log" + - log: + type: file + description: | + Spades warning log file + pattern: "*.warning.log" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@JoseEspinosa" + - "@drpatelh" + - "@d4straub" +maintainers: + - "@JoseEspinosa" + - "@drpatelh" + - "@d4straub" diff --git a/modules/nf-core/spades/tests/main.nf.test b/modules/nf-core/spades/tests/main.nf.test new file mode 100644 index 00000000..3a93f486 --- /dev/null +++ b/modules/nf-core/spades/tests/main.nf.test @@ -0,0 +1,228 @@ +nextflow_process { + + name "Test Process SPADES" + script "../main.nf" + process "SPADES" + config "./nextflow.config" + tag "modules" + tag "modules_nfcore" + tag "spades" + + test("sarscov2 - se ") { + + when { + process { + """ + input[0] = [ [ id:'test', single_end:true ], + [ file(params.modules_testdata_base_path + "genomics/sarscov2/illumina/fastq/test_2.fastq.gz", checkIfExists: true) ], + [], + [] + ] + input[1] = [] + input[2] = [] + """ + } + } + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out.scaffolds, + process.out.contigs, + process.out.transcripts, + process.out.gene_clusters, + process.out.gfa, + process.out.versions + ).match() }, + { assert path(process.out.log[0][1]).readLines().any { it.contains("SPAdes pipeline finished") } } + ) + } + } + + test("sarscov2 - pe ") { + + when { + process { + """ + input[0] = [ [ id:'test', single_end:false ], // meta map + [ file(params.modules_testdata_base_path + "genomics/sarscov2/illumina/fastq/test_1.fastq.gz", checkIfExists: true), + file(params.modules_testdata_base_path + "genomics/sarscov2/illumina/fastq/test_2.fastq.gz", checkIfExists: true) ], + [], + [] + ] + input [1] = [] + input [2] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out.scaffolds, + process.out.contigs, + process.out.transcripts, + process.out.gene_clusters, + process.out.gfa, + process.out.versions + ).match() }, + { assert path(process.out.log[0][1]).readLines().any { it.contains("SPAdes pipeline finished") } }, + { assert file(process.out.warnings[0][1]).find{ file(it).name == "warnings.log"} } + ) + } + + } + // isnt perfect, because CCS reads should rather be used with -s instead of --pacbio + test("sarscov2 - pe - pacbio ") { + when { + process { + """ + input[0] = [ [ id:'test', single_end:false ], // meta map + [ file(params.modules_testdata_base_path + 
"genomics/sarscov2/illumina/fastq/test_1.fastq.gz", checkIfExists: true), + file(params.modules_testdata_base_path + "genomics/sarscov2/illumina/fastq/test_2.fastq.gz", checkIfExists: true) ], + [], + [ file(params.modules_testdata_base_path + "genomics/sarscov2/nanopore/fastq/test.fastq.gz", checkIfExists: true) ] + ] + input [1] = [] + input [2] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out.scaffolds, + process.out.contigs, + process.out.transcripts, + process.out.gene_clusters, + process.out.gfa, + process.out.versions + ).match() }, + { assert path(process.out.log[0][1]).readLines().any { it.contains("SPAdes pipeline finished") } }, + { assert file(process.out.warnings[0][1]).find{ file(it).name == "warnings.log"} } + ) + } + } + + test("sarscov2 - pe - nanopore ") { + + when { + process { + """ + input[0] = [ [ id:'test', single_end:false ], // meta map + [ file(params.modules_testdata_base_path + "genomics/sarscov2/illumina/fastq/test_1.fastq.gz", checkIfExists: true), + file(params.modules_testdata_base_path + "genomics/sarscov2/illumina/fastq/test_2.fastq.gz", checkIfExists: true) ], + [], + [ file(params.modules_testdata_base_path + "genomics/sarscov2/nanopore/fastq/test.fastq.gz", checkIfExists: true) ] + ] + input [1] = [] + input [2] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out.scaffolds, + process.out.contigs, + process.out.transcripts, + process.out.gene_clusters, + process.out.gfa, + process.out.versions + ).match() }, + { assert path(process.out.log[0][1]).readLines().any { it.contains("SPAdes pipeline finished") } }, + { assert file(process.out.warnings[0][1]).find{ file(it).name == "warnings.log"} } + ) + } + } + + test("sarscov2 - pe - nanopore - yml ") { + when { + process { + """ + input[0] = [ [ id:'test', single_end:false ], // meta map + [ file(params.modules_testdata_base_path + "genomics/sarscov2/illumina/fastq/test_1.fastq.gz", checkIfExists: true), + file(params.modules_testdata_base_path + "genomics/sarscov2/illumina/fastq/test_2.fastq.gz", checkIfExists: true) ], + [], + [ file(params.modules_testdata_base_path + "genomics/sarscov2/nanopore/fastq/test.fastq.gz", checkIfExists: true) ] + ] + input [1] = file(params.modules_testdata_base_path + "delete_me/spades/spades_input_yml.yml", checkIfExists: true) + input [2] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out.scaffolds, + process.out.contigs, + process.out.transcripts, + process.out.gene_clusters, + process.out.gfa, + process.out.versions + ).match() }, + { assert path(process.out.log[0][1]).readLines().any { it.contains("SPAdes pipeline finished") } }, + { assert file(process.out.warnings[0][1]).find{ file(it).name == "warnings.log"} } + ) + } + } + + test("sarscov2 - pe - hmm ") { + when { + process { + """ + input[0] = [ [ id:'test', single_end:false ], // meta map + [ file("https://github.com/nf-core/test-datasets/raw/viralrecon/illumina/sispa/SRR11140744_R1.fastq.gz", checkIfExists: true), + file("https://github.com/nf-core/test-datasets/raw/viralrecon/illumina/sispa/SRR11140744_R2.fastq.gz", checkIfExists: true) ], + [], + [] + ] + input [1] = [] + input [2] = [file(params.modules_testdata_base_path + "/genomics/sarscov2/genome/proteome.hmm.gz", checkIfExists: true)] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out.scaffolds, + process.out.contigs, + process.out.transcripts, + 
process.out.gene_clusters, + process.out.gfa, + process.out.versions + ).match() }, + { assert path(process.out.log[0][1]).readLines().any { it.contains("SPAdes pipeline finished") } } + ) + } + } + + test("sarscov2 - pe - stub ") { + options "-stub" + when { + process { + """ + input[0] = [ [ id:'test', single_end:false ], // meta map + [ file(params.modules_testdata_base_path + "genomics/sarscov2/illumina/fastq/test_1.fastq.gz", checkIfExists: true), + file(params.modules_testdata_base_path + "genomics/sarscov2/illumina/fastq/test_2.fastq.gz", checkIfExists: true) ], + [], + [] + ] + input [1] = [] + input [2] = [] + """ + } + } + + then { + assertAll( + { assert process.success }, + { assert snapshot(process.out).match() } + ) + } + } + +} diff --git a/modules/nf-core/spades/tests/main.nf.test.snap b/modules/nf-core/spades/tests/main.nf.test.snap new file mode 100644 index 00000000..e1b3b652 --- /dev/null +++ b/modules/nf-core/spades/tests/main.nf.test.snap @@ -0,0 +1,403 @@ +{ + "sarscov2 - pe - nanopore ": { + "content": [ + [ + [ + { + "id": "test", + "single_end": false + }, + "test.scaffolds.fa.gz:md5,7ddaf03740df422a93fcaffbcd7e9679" + ] + ], + [ + [ + { + "id": "test", + "single_end": false + }, + "test.contigs.fa.gz:md5,7ddaf03740df422a93fcaffbcd7e9679" + ] + ], + [ + + ], + [ + + ], + [ + [ + { + "id": "test", + "single_end": false + }, + "test.assembly.gfa.gz:md5,19418df83534fc93543dec4ec9b2ae72" + ] + ], + [ + "versions.yml:md5,990abcdf543421412170e5cf413ec56d" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.2" + }, + "timestamp": "2024-06-07T07:13:08.663068339" + }, + "sarscov2 - pe - hmm ": { + "content": [ + [ + [ + { + "id": "test", + "single_end": false + }, + "test.scaffolds.fa.gz:md5,ce077d5f3380690f8d9a5fe188f82128" + ] + ], + [ + + ], + [ + + ], + [ + + ], + [ + [ + { + "id": "test", + "single_end": false + }, + "test.assembly.gfa.gz:md5,07136eab8e231f095dc5dd62f1b62a91" + ] + ], + [ + "versions.yml:md5,990abcdf543421412170e5cf413ec56d" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.2" + }, + "timestamp": "2024-06-07T08:04:19.650636803" + }, + "sarscov2 - pe - pacbio ": { + "content": [ + [ + [ + { + "id": "test", + "single_end": false + }, + "test.scaffolds.fa.gz:md5,7ddaf03740df422a93fcaffbcd7e9679" + ] + ], + [ + [ + { + "id": "test", + "single_end": false + }, + "test.contigs.fa.gz:md5,7ddaf03740df422a93fcaffbcd7e9679" + ] + ], + [ + + ], + [ + + ], + [ + [ + { + "id": "test", + "single_end": false + }, + "test.assembly.gfa.gz:md5,19418df83534fc93543dec4ec9b2ae72" + ] + ], + [ + "versions.yml:md5,990abcdf543421412170e5cf413ec56d" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.2" + }, + "timestamp": "2024-06-07T07:12:49.305512756" + }, + "sarscov2 - pe ": { + "content": [ + [ + + ], + [ + [ + { + "id": "test", + "single_end": false + }, + "test.contigs.fa.gz:md5,70e4a5485dd59566b212a199c31c343b" + ] + ], + [ + + ], + [ + + ], + [ + [ + { + "id": "test", + "single_end": false + }, + "test.assembly.gfa.gz:md5,b773132d52be5090cdbdf5a643027093" + ] + ], + [ + "versions.yml:md5,990abcdf543421412170e5cf413ec56d" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.2" + }, + "timestamp": "2024-06-07T07:12:36.161628498" + }, + "sarscov2 - pe - nanopore - yml ": { + "content": [ + [ + [ + { + "id": "test", + "single_end": false + }, + "test.scaffolds.fa.gz:md5,7ddaf03740df422a93fcaffbcd7e9679" + ] + ], + [ + [ + { + "id": "test", + "single_end": false + }, + 
"test.contigs.fa.gz:md5,7ddaf03740df422a93fcaffbcd7e9679" + ] + ], + [ + + ], + [ + + ], + [ + [ + { + "id": "test", + "single_end": false + }, + "test.assembly.gfa.gz:md5,19418df83534fc93543dec4ec9b2ae72" + ] + ], + [ + "versions.yml:md5,990abcdf543421412170e5cf413ec56d" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.2" + }, + "timestamp": "2024-06-07T07:13:21.868805946" + }, + "sarscov2 - se ": { + "content": [ + [ + [ + { + "id": "test", + "single_end": true + }, + "test.scaffolds.fa.gz:md5,65ba6a517c152dbe219bf4b5b92bdad7" + ] + ], + [ + [ + { + "id": "test", + "single_end": true + }, + "test.contigs.fa.gz:md5,65ba6a517c152dbe219bf4b5b92bdad7" + ] + ], + [ + + ], + [ + + ], + [ + [ + { + "id": "test", + "single_end": true + }, + "test.assembly.gfa.gz:md5,e4836fdf7104d79e314e3e50986b4bb2" + ] + ], + [ + "versions.yml:md5,990abcdf543421412170e5cf413ec56d" + ] + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.2" + }, + "timestamp": "2024-06-07T07:12:16.562778962" + }, + "sarscov2 - pe - stub ": { + "content": [ + { + "0": [ + [ + { + "id": "test", + "single_end": false + }, + "test.scaffolds.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "1": [ + [ + { + "id": "test", + "single_end": false + }, + "test.contigs.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "2": [ + [ + { + "id": "test", + "single_end": false + }, + "test.transcripts.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "3": [ + [ + { + "id": "test", + "single_end": false + }, + "test.gene_clusters.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "4": [ + [ + { + "id": "test", + "single_end": false + }, + "test.assembly.gfa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "5": [ + [ + { + "id": "test", + "single_end": false + }, + "test.warnings.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "6": [ + [ + { + "id": "test", + "single_end": false + }, + "test.spades.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "7": [ + "versions.yml:md5,990abcdf543421412170e5cf413ec56d" + ], + "contigs": [ + [ + { + "id": "test", + "single_end": false + }, + "test.contigs.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "gene_clusters": [ + [ + { + "id": "test", + "single_end": false + }, + "test.gene_clusters.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "gfa": [ + [ + { + "id": "test", + "single_end": false + }, + "test.assembly.gfa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "log": [ + [ + { + "id": "test", + "single_end": false + }, + "test.spades.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ], + "scaffolds": [ + [ + { + "id": "test", + "single_end": false + }, + "test.scaffolds.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "transcripts": [ + [ + { + "id": "test", + "single_end": false + }, + "test.transcripts.fa.gz:md5,68b329da9893e34099c7d8ad5cb9c940" + ] + ], + "versions": [ + "versions.yml:md5,990abcdf543421412170e5cf413ec56d" + ], + "warnings": [ + [ + { + "id": "test", + "single_end": false + }, + "test.warnings.log:md5,d41d8cd98f00b204e9800998ecf8427e" + ] + ] + } + ], + "meta": { + "nf-test": "0.8.4", + "nextflow": "24.04.2" + }, + "timestamp": "2024-06-07T07:20:07.195881734" + } +} diff --git a/modules/nf-core/spades/tests/nextflow.config b/modules/nf-core/spades/tests/nextflow.config new file mode 100644 index 00000000..adec1bde --- /dev/null +++ b/modules/nf-core/spades/tests/nextflow.config @@ -0,0 +1,5 @@ +process { + withName: SPADES { + ext.args = '--rnaviral' + } +} diff --git a/modules/nf-core/spades/tests/tags.yml 
b/modules/nf-core/spades/tests/tags.yml new file mode 100644 index 00000000..035861ff --- /dev/null +++ b/modules/nf-core/spades/tests/tags.yml @@ -0,0 +1,2 @@ +spades: + - "modules/nf-core/spades/**" diff --git a/nextflow.config b/nextflow.config index 2ea9d994..026f67d8 100644 --- a/nextflow.config +++ b/nextflow.config @@ -27,6 +27,8 @@ params { adapterremoval_adapter2 = 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT' adapterremoval_trim_quality_stretch = false keep_phix = false + // long read preprocessing options + longread_adaptertrimming_tool = "porechop_abi" // phix_reference = "ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/viral/Enterobacteria_phage_phiX174_sensu_lato/all_assembly_versions/GCA_002596845.1_ASM259684v1/GCA_002596845.1_ASM259684v1_genomic.fna.gz" phix_reference = "${baseDir}/assets/data/GCA_002596845.1_ASM259684v1_genomic.fna.gz" save_phixremoved_reads = false @@ -88,7 +90,7 @@ params { cat_official_taxonomy = false save_cat_db = false skip_gtdbtk = false - gtdb_db = "https://data.ace.uq.edu.au/public/gtdb/data/releases/release220/220.0/auxillary_files/gtdbtk_package/full_package/gtdbtk_r220_data.tar.gz" + gtdb_db = "https://data.gtdb.ecogenomic.org/releases/release220/220.0/auxillary_files/gtdbtk_package/full_package/gtdbtk_r220_data.tar.gz" gtdb_mash = null gtdbtk_min_completeness = 50.0 gtdbtk_max_contamination = 10.0 @@ -127,7 +129,7 @@ params { busco_auto_lineage_prok = false save_busco_db = false busco_clean = false - checkm_download_url = "https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz" + checkm_download_url = "https://zenodo.org/records/7401545/files/checkm_data_2015_01_16.tar.gz" checkm_db = null save_checkm_data = false run_gunc = false @@ -148,7 +150,7 @@ params { save_mmseqs_db = false // References - genome = null + //genome = null // we use --host_genome instead igenomes_base = 's3://ngi-igenomes/igenomes/' igenomes_ignore = false @@ -168,51 +170,27 @@ params { monochrome_logs = false hook_url = null help = false + help_full = false + show_hidden = false version = false pipelines_testdata_base_path = 'https://raw.githubusercontent.com/nf-core/test-datasets/' - monochromeLogs = null // TODO remove once nf-validation removes the bug - // Config options config_profile_name = null config_profile_description = null + custom_config_version = 'master' custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" config_profile_contact = null config_profile_url = null - // Max resource options - // Defaults only, expecting to be overwritten - max_memory = '128.GB' - max_cpus = 16 - max_time = '240.h' - // Schema validation default options - validationFailUnrecognisedParams = false - validationLenientMode = false - validationSchemaIgnoreParams = 'genome,genomes,igenomes_base,monochromeLogs' - validationShowHiddenParams = false - validate_params = true - + validate_params = true } // Load base.config by default for all pipelines includeConfig 'conf/base.config' -// Load nf-core custom profiles from different Institutions -try { - includeConfig "${params.custom_config_base}/nfcore_custom.config" -} catch (Exception e) { - System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") -} - -// Load nf-core/mag custom profiles from different institutions. 
-try { - includeConfig "${params.custom_config_base}/pipeline/mag.config" -} catch (Exception e) { - System.err.println("WARNING: Could not load nf-core/config/mag profiles: ${params.custom_config_base}/pipeline/mag.config") -} - profiles { debug { dumpHashes = true @@ -227,7 +205,7 @@ profiles { podman.enabled = false shifter.enabled = false charliecloud.enabled = false - conda.channels = ['conda-forge', 'bioconda', 'defaults'] + conda.channels = ['conda-forge', 'bioconda'] apptainer.enabled = false } mamba { @@ -330,25 +308,24 @@ profiles { test_concoct { includeConfig 'conf/test_concoct.config' } } -// Set default registry for Apptainer, Docker, Podman and Singularity independent of -profile -// Will not be used unless Apptainer / Docker / Podman / Singularity are enabled -// Set to your registry if you have a mirror of containers -apptainer.registry = 'quay.io' -docker.registry = 'quay.io' -podman.registry = 'quay.io' -singularity.registry = 'quay.io' +// Load nf-core custom profiles from different Institutions +includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/nfcore_custom.config" : "/dev/null" -// Nextflow plugins -plugins { - id 'nf-validation@1.1.3' // Validation of pipeline parameters and creation of an input channel from a sample sheet -} +// Load nf-core/mag custom profiles from different institutions. +includeConfig !System.getenv('NXF_OFFLINE') && params.custom_config_base ? "${params.custom_config_base}/pipeline/mag.config" : "/dev/null" + +// Set default registry for Apptainer, Docker, Podman, Charliecloud and Singularity independent of -profile +// Will not be used unless Apptainer / Docker / Podman / Charliecloud / Singularity are enabled +// Set to your registry if you have a mirror of containers +apptainer.registry = 'quay.io' +docker.registry = 'quay.io' +podman.registry = 'quay.io' +singularity.registry = 'quay.io' +charliecloud.registry = 'quay.io' // Load igenomes.config if required -if (!params.igenomes_ignore) { - includeConfig 'conf/igenomes.config' -} else { - params.genomes = [:] -} +includeConfig !params.igenomes_ignore ? 'conf/igenomes.config' : 'conf/igenomes_ignored.config' + // Export these variables to prevent local Python/R libraries from conflicting with those in the container // The JULIA depot path has been adjusted to a fixed path `/usr/local/share/julia` that needs to be used for packages in the container. // See https://apeltzer.github.io/post/03-julia-lang-nextflow/ for details on that. Once we have a common agreement on where to keep Julia packages, this is adjustable. @@ -360,8 +337,15 @@ env { JULIA_DEPOT_PATH = "/usr/local/share/julia" } -// Capture exit codes from upstream processes when piping -process.shell = ['/bin/bash', '-euo', 'pipefail'] +// Set bash options +process.shell = """\ +bash + +set -e # Exit if a tool returns a non-zero status/exit code +set -u # Treat unset variables and parameters as an error +set -o pipefail # Returns the status of the last command to exit with a non-zero status or zero if all successfully execute +set -C # No clobber - prevent output redirection from overwriting files. +""" // Disable process selector warnings by default. Use debug profile to enable warnings. 
nextflow.enable.configProcessNamesValidation = false @@ -390,58 +374,46 @@ manifest { homePage = 'https://github.com/nf-core/mag' description = """Assembly, binning and annotation of metagenomes""" mainScript = 'main.nf' - nextflowVersion = '!>=23.04.0' - version = '3.1.0' + nextflowVersion = '!>=24.04.2' + version = '3.2.0' doi = '10.1093/nargab/lqac007' } -// Load modules.config for DSL2 module specific options -includeConfig 'conf/modules.config' +// Nextflow plugins +plugins { + id 'nf-schema@2.1.2' // Validation of pipeline parameters and creation of an input channel from a sample sheet +} -// Function to ensure that resource requirements don't go beyond -// a maximum limit -def check_max(obj, type) { - if (type == 'memory') { - try { - if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1) - return params.max_memory as nextflow.util.MemoryUnit - else - return obj - } catch (all) { - println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'time') { - try { - if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1) - return params.max_time as nextflow.util.Duration - else - return obj - } catch (all) { - println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj" - return obj - } - } else if (type == 'cpus') { - try { - return Math.min( obj, params.max_cpus as int ) - } catch (all) { - println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! Using default value: $obj" - return obj - } +validation { + defaultIgnoreParams = ["genomes"] + help { + enabled = true + command = "nextflow run $manifest.name -profile --input samplesheet.csv --outdir " + fullParameter = "help_full" + showHiddenParameter = "show_hidden" + beforeText = """ +-\033[2m----------------------------------------------------\033[0m- + \033[0;32m,--.\033[0;30m/\033[0;32m,-.\033[0m +\033[0;34m ___ __ __ __ ___ \033[0;32m/,-._.--~\'\033[0m +\033[0;34m |\\ | |__ __ / ` / \\ |__) |__ \033[0;33m} {\033[0m +\033[0;34m | \\| | \\__, \\__/ | \\ |___ \033[0;32m\\`-._,-`-,\033[0m + \033[0;32m`._,._,\'\033[0m +\033[0;35m ${manifest.name} ${manifest.version}\033[0m +-\033[2m----------------------------------------------------\033[0m- +""" + afterText = """${manifest.doi ? "* The pipeline\n" : ""}${manifest.doi.tokenize(",").collect { " https://doi.org/${it.trim().replace('https://doi.org/','')}"}.join("\n")}${manifest.doi ? 
"\n" : ""} +* The nf-core framework + https://doi.org/10.1038/s41587-020-0439-x + +* Software dependencies + https://github.com/${manifest.name}/blob/master/CITATIONS.md +""" + } + summary { + beforeText = validation.help.beforeText + afterText = validation.help.afterText } } -// Functions to fix number of cpus to allow reproducibility for MEGAHIT and SPAdes -// if corresponding parameters are specified, number of cpus is not increased with retries -def check_megahit_cpus (x, attempt ) { - if (params.megahit_fix_cpu_1) return 1 - else return check_max (x * attempt, 'cpus' ) -} -def check_spades_cpus (x, attempt ) { - if (params.spades_fix_cpus != -1) return check_max (params.spades_fix_cpus, 'cpus' ) - else return check_max (x * attempt, 'cpus' ) -} -def check_spadeshybrid_cpus (x, attempt ) { - if (params.spadeshybrid_fix_cpus != -1) return check_max (params.spadeshybrid_fix_cpus, 'cpus' ) - else return check_max (x * attempt, 'cpus' ) -} +// Load modules.config for DSL2 module specific options +includeConfig 'conf/modules.config' diff --git a/nextflow_schema.json b/nextflow_schema.json index aaff9835..ceb3ac08 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -1,10 +1,10 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/nf-core/mag/master/nextflow_schema.json", "title": "nf-core/mag pipeline parameters", "description": "Assembly, binning and annotation of metagenomes", "type": "object", - "definitions": { + "$defs": { "input_output_options": { "title": "Input/output options", "type": "object", @@ -66,20 +66,20 @@ "fa_icon": "fas fa-dna", "description": "Reference genome related files and options required for the workflow.", "properties": { - "igenomes_base": { - "type": "string", - "format": "directory-path", - "description": "Directory / URL base for iGenomes references.", - "default": "s3://ngi-igenomes/igenomes/", - "fa_icon": "fas fa-cloud-download-alt", - "hidden": true - }, "igenomes_ignore": { "type": "boolean", "description": "Do not load the iGenomes reference config.", "fa_icon": "fas fa-ban", "hidden": true, "help_text": "Do not load `igenomes.config` when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in `igenomes.config`." + }, + "igenomes_base": { + "type": "string", + "format": "directory-path", + "description": "The base path to the igenomes reference files", + "fa_icon": "fas fa-ban", + "hidden": true, + "default": "s3://ngi-igenomes/igenomes/" } } }, @@ -131,41 +131,6 @@ } } }, - "max_job_request_options": { - "title": "Max job request options", - "type": "object", - "fa_icon": "fab fa-acquisitions-incorporated", - "description": "Set the top limit for requested resources for any single job.", - "help_text": "If you are running on a smaller system, a pipeline step requesting more resources than are available may cause the Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.\n\nNote that you can not _increase_ the resources requested by any job using these options. For that you will need your own configuration file. 
See [the nf-core website](https://nf-co.re/usage/configuration) for details.", - "properties": { - "max_cpus": { - "type": "integer", - "description": "Maximum number of CPUs that can be requested for any single job.", - "default": 16, - "fa_icon": "fas fa-microchip", - "hidden": true, - "help_text": "Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. `--max_cpus 1`" - }, - "max_memory": { - "type": "string", - "description": "Maximum amount of memory that can be requested for any single job.", - "default": "128.GB", - "fa_icon": "fas fa-memory", - "pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$", - "hidden": true, - "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`" - }, - "max_time": { - "type": "string", - "description": "Maximum amount of time that can be requested for any single job.", - "default": "240.h", - "fa_icon": "far fa-clock", - "pattern": "^(\\d+\\.?\\s*(s|m|h|d|day)\\s*)+$", - "hidden": true, - "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`" - } - } - }, "generic_options": { "title": "Generic options", "type": "object", @@ -173,12 +138,6 @@ "description": "Less common options for the pipeline, typically set in a config file.", "help_text": "These options are common to all nf-core pipelines and allow you to customise some of the core preferences for how the pipeline runs.\n\nTypically these options would be set in a Nextflow config file loaded for all pipeline runs, such as `~/.nextflow/config`.", "properties": { - "help": { - "type": "boolean", - "description": "Display help text.", - "fa_icon": "fas fa-question-circle", - "hidden": true - }, "version": { "type": "boolean", "description": "Display version and exit.", @@ -194,6 +153,11 @@ "enum": ["symlink", "rellink", "link", "copy", "copyNoFollow", "move"], "hidden": true }, + "monochrome_logs": { + "type": "boolean", + "description": "Use monochrome_logs", + "hidden": true + }, "email_on_fail": { "type": "string", "description": "Email address for completion summary, only when pipeline fails.", @@ -216,12 +180,6 @@ "fa_icon": "fas fa-file-upload", "hidden": true }, - "monochrome_logs": { - "type": "boolean", - "description": "Do not use coloured log outputs.", - "fa_icon": "fas fa-palette", - "hidden": true - }, "hook_url": { "type": "string", "description": "Incoming hook URL for messaging service", @@ -254,27 +212,6 @@ "fa_icon": "fas fa-check-square", "hidden": true }, - "validationShowHiddenParams": { - "type": "boolean", - "fa_icon": "far fa-eye-slash", - "description": "Show all params when using `--help`", - "hidden": true, - "help_text": "By default, parameters set as _hidden_ in the schema are not shown on the command line when a user runs with `--help`. Specifying this option will tell the pipeline to show all parameters." - }, - "validationFailUnrecognisedParams": { - "type": "boolean", - "fa_icon": "far fa-check-circle", - "description": "Validation of parameters fails when an unrecognised parameter is found.", - "hidden": true, - "help_text": "By default, when an unrecognised parameter is found, it returns a warinig." 
- }, - "validationLenientMode": { - "type": "boolean", - "fa_icon": "far fa-check-circle", - "description": "Validation of parameters in lenient more.", - "hidden": true, - "help_text": "Allows string values that are parseable as numbers or booleans. For further information see [JSONSchema docs](https://github.com/everit-org/json-schema#lenient-mode)." - }, "pipelines_testdata_base_path": { "type": "string", "fa_icon": "far fa-check-circle", @@ -485,6 +422,12 @@ "save_filtlong_reads": { "type": "boolean", "description": "Specify to save the resulting length filtered FASTQ files to --outdir." + }, + "longread_adaptertrimming_tool": { + "type": "string", + "description": "Specify which long read adapter trimming tool to use.", + "enum": ["porechop", "porechop_abi"], + "default": "porechop_abi" } } }, @@ -542,7 +485,7 @@ "gtdb_db": { "type": "string", "description": "Specify the location of a GTDBTK database. Can be either an uncompressed directory or a `.tar.gz` archive. If not specified will be downloaded for you when GTDBTK or binning QC is not skipped.", - "default": "https://data.ace.uq.edu.au/public/gtdb/data/releases/release220/220.0/auxillary_files/gtdbtk_package/full_package/gtdbtk_r220_data.tar.gz" + "default": "https://data.gtdb.ecogenomic.org/releases/release220/220.0/auxillary_files/gtdbtk_package/full_package/gtdbtk_r220_data.tar.gz" }, "gtdb_mash": { "type": "string", @@ -550,7 +493,7 @@ }, "gtdbtk_min_completeness": { "type": "number", - "default": 50, + "default": 50.0, "description": "Min. bin completeness (in %) required to apply GTDB-tk classification.", "help_text": "Completeness assessed with BUSCO analysis (100% - %Missing). Must be greater than 0 (min. 0.01) to avoid GTDB-tk errors. If too low, GTDB-tk classification results can be impaired due to not enough marker genes!", "minimum": 0.01, @@ -558,7 +501,7 @@ }, "gtdbtk_max_contamination": { "type": "number", - "default": 10, + "default": 10.0, "description": "Max. bin contamination (in %) allowed to apply GTDB-tk classification.", "help_text": "Contamination approximated based on BUSCO analysis (%Complete and duplicated). If too high, GTDB-tk classification results can be impaired due to contamination!", "minimum": 0, @@ -566,7 +509,7 @@ }, "gtdbtk_min_perc_aa": { "type": "number", - "default": 10, + "default": 10.0, "description": "Min. fraction of AA (in %) in the MSA for bins to be kept.", "minimum": 0, "maximum": 100 @@ -579,14 +522,13 @@ "maximum": 1 }, "gtdbtk_pplacer_cpus": { - "type": "number", + "type": "integer", "default": 1, "description": "Number of CPUs used for the by GTDB-Tk run tool pplacer.", "help_text": "A low number of CPUs helps to reduce the memory required/reported by GTDB-Tk. See also the [GTDB-Tk documentation](https://ecogenomics.github.io/GTDBTk/faq.html#gtdb-tk-reaches-the-memory-limit-pplacer-crashes)." }, "gtdbtk_pplacer_useram": { "type": "boolean", - "default": false, "description": "Speed up pplacer step of GTDB-Tk by loading to memory.", "help_text": "Will be faster than writing to disk (default setting), however at the expense of much larger memory (RAM) requirements for GDTBTK/CLASSIFY." }, @@ -609,8 +551,8 @@ }, "spades_options": { "type": "string", - "description": "Additional custom options for SPAdes.", - "help_text": "An example is adjusting k-mers (\"-k 21,33,55,77\") or adding [advanced options](https://github.com/ablab/spades#advanced-options). But not -t, -m, -o or --out-prefix, because these are already in use. 
Must be used like this: --spades_options \"-k 21,33,55,77\")" + "description": "Additional custom options for SPAdes and SPAdesHybrid. Do not specify `--meta` as this will be added for you!", + "help_text": "An example is adjusting k-mers (\"-k 21,33,55,77\") or adding [advanced options](https://github.com/ablab/spades#advanced-options). But not --meta, -t, -m, -o or --out-prefix, because these are already in use. Must be used like this: --spades_options \"-k 21,33,55,77\")" }, "megahit_options": { "type": "string", @@ -802,7 +744,7 @@ }, "checkm_download_url": { "type": "string", - "default": "https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz", + "default": "https://zenodo.org/records/7401545/files/checkm_data_2015_01_16.tar.gz", "hidden": true, "description": "URL pointing to checkM database for auto download, if local path not supplied.", "help_text": "You can use this parameter to point to an online copy of the checkM database TAR archive that the pipeline will use for auto download if a local path is not supplied to `--checkm_db`." @@ -909,49 +851,46 @@ }, "allOf": [ { - "$ref": "#/definitions/input_output_options" - }, - { - "$ref": "#/definitions/reference_genome_options" + "$ref": "#/$defs/input_output_options" }, { - "$ref": "#/definitions/institutional_config_options" + "$ref": "#/$defs/reference_genome_options" }, { - "$ref": "#/definitions/max_job_request_options" + "$ref": "#/$defs/institutional_config_options" }, { - "$ref": "#/definitions/generic_options" + "$ref": "#/$defs/generic_options" }, { - "$ref": "#/definitions/reproducibility_options" + "$ref": "#/$defs/reproducibility_options" }, { - "$ref": "#/definitions/quality_control_for_short_reads_options" + "$ref": "#/$defs/quality_control_for_short_reads_options" }, { - "$ref": "#/definitions/quality_control_for_long_reads_options" + "$ref": "#/$defs/quality_control_for_long_reads_options" }, { - "$ref": "#/definitions/taxonomic_profiling_options" + "$ref": "#/$defs/taxonomic_profiling_options" }, { - "$ref": "#/definitions/assembly_options" + "$ref": "#/$defs/assembly_options" }, { - "$ref": "#/definitions/gene_prediction_and_annotation_options" + "$ref": "#/$defs/gene_prediction_and_annotation_options" }, { - "$ref": "#/definitions/virus_identification_options" + "$ref": "#/$defs/virus_identification_options" }, { - "$ref": "#/definitions/binning_options" + "$ref": "#/$defs/binning_options" }, { - "$ref": "#/definitions/bin_quality_check_options" + "$ref": "#/$defs/bin_quality_check_options" }, { - "$ref": "#/definitions/ancient_dna_assembly" + "$ref": "#/$defs/ancient_dna_assembly" } ] } diff --git a/subworkflows/local/longread_preprocessing.nf b/subworkflows/local/longread_preprocessing.nf new file mode 100644 index 00000000..ec434858 --- /dev/null +++ b/subworkflows/local/longread_preprocessing.nf @@ -0,0 +1,90 @@ +/* + * LONGREAD_PREPROCESSING: Preprocessing and QC for long reads + */ + +include { NANOPLOT as NANOPLOT_RAW } from '../../modules/nf-core/nanoplot/main' +include { NANOPLOT as NANOPLOT_FILTERED } from '../../modules/nf-core/nanoplot/main' +include { NANOLYSE } from '../../modules/nf-core/nanolyse/main' +include { PORECHOP_PORECHOP } from '../../modules/nf-core/porechop/porechop/main' +include { PORECHOP_ABI } from '../../modules/nf-core/porechop/abi/main' +include { FILTLONG } from '../../modules/nf-core/filtlong' + +workflow LONGREAD_PREPROCESSING { + take: + ch_raw_long_reads // [ [meta] , fastq] (mandatory) + ch_short_reads // [ [meta] , fastq1, fastq2] (mandatory) + 
ch_nanolyse_db // [fasta] + + main: + ch_versions = Channel.empty() + ch_multiqc_files = Channel.empty() + + NANOPLOT_RAW ( + ch_raw_long_reads + ) + ch_versions = ch_versions.mix(NANOPLOT_RAW.out.versions.first()) + + ch_long_reads = ch_raw_long_reads + .map { + meta, reads -> + def meta_new = meta - meta.subMap('run') + [ meta_new, reads ] + } + + if ( !params.assembly_input ) { + if (!params.skip_adapter_trimming) { + if (params.longread_adaptertrimming_tool && + params.longread_adaptertrimming_tool == 'porechop_abi') { + PORECHOP_ABI ( + ch_raw_long_reads + ) + ch_long_reads = PORECHOP_ABI.out.reads + ch_versions = ch_versions.mix(PORECHOP_ABI.out.versions.first()) + ch_multiqc_files = ch_multiqc_files.mix( PORECHOP_ABI.out.log ) + } else if (params.longread_adaptertrimming_tool == 'porechop') { + PORECHOP_PORECHOP ( + ch_raw_long_reads + ) + ch_long_reads = PORECHOP_PORECHOP.out.reads + ch_versions = ch_versions.mix(PORECHOP_PORECHOP.out.versions.first()) + ch_multiqc_files = ch_multiqc_files.mix( PORECHOP_PORECHOP.out.log ) + } + } + + if (!params.keep_lambda) { + NANOLYSE ( + ch_long_reads, + ch_nanolyse_db + ) + ch_long_reads = NANOLYSE.out.fastq + ch_versions = ch_versions.mix(NANOLYSE.out.versions.first()) + } + + // join long and short reads by sample name + ch_short_reads_tmp = ch_short_reads + .map { meta, sr -> [ meta.id, meta, sr ] } + + ch_short_and_long_reads = ch_long_reads + .map { meta, lr -> [ meta.id, meta, lr ] } + .join(ch_short_reads_tmp, by: 0) + .map { id, meta_lr, lr, meta_sr, sr -> [ meta_lr, sr, lr ] } // should not occur for single-end, since SPAdes (hybrid) does not support single-end + + FILTLONG ( + ch_short_and_long_reads + ) + ch_long_reads = FILTLONG.out.reads + ch_versions = ch_versions.mix(FILTLONG.out.versions.first()) + ch_multiqc_files = ch_multiqc_files.mix( FILTLONG.out.log ) + + NANOPLOT_FILTERED ( + ch_long_reads + ) + + ch_versions = ch_versions.mix(NANOPLOT_FILTERED.out.versions.first()) + } + + emit: + long_reads = ch_long_reads + versions = ch_versions + multiqc_files = ch_multiqc_files +} diff --git a/subworkflows/local/utils_nfcore_mag_pipeline/main.nf b/subworkflows/local/utils_nfcore_mag_pipeline/main.nf index 29806112..11f65460 100644 --- a/subworkflows/local/utils_nfcore_mag_pipeline/main.nf +++ b/subworkflows/local/utils_nfcore_mag_pipeline/main.nf @@ -1,4 +1,3 @@ - // Subworkflow with functionality specific to the nf-core/mag pipeline // @@ -8,29 +7,24 @@ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -include { UTILS_NFVALIDATION_PLUGIN } from '../../nf-core/utils_nfvalidation_plugin' -include { paramsSummaryMap } from 'plugin/nf-validation' -include { fromSamplesheet } from 'plugin/nf-validation' -include { UTILS_NEXTFLOW_PIPELINE } from '../../nf-core/utils_nextflow_pipeline' -include { completionEmail } from '../../nf-core/utils_nfcore_pipeline' -include { completionSummary } from '../../nf-core/utils_nfcore_pipeline' -include { dashedLine } from '../../nf-core/utils_nfcore_pipeline' -include { nfCoreLogo } from '../../nf-core/utils_nfcore_pipeline' -include { imNotification } from '../../nf-core/utils_nfcore_pipeline' -include { UTILS_NFCORE_PIPELINE } from '../../nf-core/utils_nfcore_pipeline' -include { workflowCitation } from '../../nf-core/utils_nfcore_pipeline' +include { UTILS_NFSCHEMA_PLUGIN } from '../../nf-core/utils_nfschema_plugin' +include { paramsSummaryMap } from 'plugin/nf-schema' +include { samplesheetToList } from 'plugin/nf-schema' +include { completionEmail } from 
'../../nf-core/utils_nfcore_pipeline' +include { completionSummary } from '../../nf-core/utils_nfcore_pipeline' +include { imNotification } from '../../nf-core/utils_nfcore_pipeline' +include { UTILS_NFCORE_PIPELINE } from '../../nf-core/utils_nfcore_pipeline' +include { UTILS_NEXTFLOW_PIPELINE } from '../../nf-core/utils_nextflow_pipeline' /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW TO INITIALISE PIPELINE -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow PIPELINE_INITIALISATION { - take: version // boolean: Display version and exit - help // boolean: Display help text validate_params // boolean: Boolean whether to validate parameters against the schema at runtime monochrome_logs // boolean: Do not use coloured log outputs nextflow_cli_args // array: List of positional nextflow CLI args @@ -44,7 +38,7 @@ workflow PIPELINE_INITIALISATION { // // Print version and exit if required and dump pipeline parameters to JSON file // - UTILS_NEXTFLOW_PIPELINE ( + UTILS_NEXTFLOW_PIPELINE( version, true, outdir, @@ -54,22 +48,16 @@ workflow PIPELINE_INITIALISATION { // // Validate parameters and generate parameter summary to stdout // - pre_help_text = nfCoreLogo(monochrome_logs) - post_help_text = '\n' + workflowCitation() + '\n' + dashedLine(monochrome_logs) - def String workflow_command = "nextflow run ${workflow.manifest.name} -profile --input samplesheet.csv --outdir " - UTILS_NFVALIDATION_PLUGIN ( - help, - workflow_command, - pre_help_text, - post_help_text, + UTILS_NFSCHEMA_PLUGIN( + workflow, validate_params, - "nextflow_schema.json" + null ) // // Check config provided to the pipeline // - UTILS_NFCORE_PIPELINE ( + UTILS_NFCORE_PIPELINE( nextflow_cli_args ) @@ -83,35 +71,38 @@ workflow PIPELINE_INITIALISATION { // Validate FASTQ input ch_samplesheet = Channel - .fromSamplesheet("input") + .fromList(samplesheetToList(params.input, "${projectDir}/assets/schema_input.json")) .map { validateInputSamplesheet(it[0], it[1], it[2], it[3]) } // Prepare FASTQs channel and separate short and long reads and prepare - ch_raw_short_reads = ch_samplesheet - .map { meta, sr1, sr2, lr -> - meta.run = meta.run == null ? "0" : meta.run - meta.single_end = params.single_end - - if (params.single_end) { - return [ meta, [ sr1 ] ] - } else { - return [ meta, [ sr1, sr2 ] ] - } - } + ch_raw_short_reads = ch_samplesheet.map { meta, sr1, sr2, lr -> + meta.run = meta.run == [] ? "0" : meta.run + meta.single_end = params.single_end - ch_raw_long_reads = ch_samplesheet - .map { meta, sr1, sr2, lr -> - if (lr) { - meta.run = meta.run == null ? "0" : meta.run - return [ meta, lr ] - } - } + if (params.single_end) { + return [meta, [sr1]] + } + else { + return [meta, [sr1, sr2]] + } + } + + ch_raw_long_reads = ch_samplesheet.map { meta, sr1, sr2, lr -> + if (lr) { + meta.run = meta.run == [] ? 
"0" : meta.run + return [meta, lr] + } + } // Check already if long reads are provided, for later parameter validation - def hybrid =false - ch_raw_long_reads.map{if (it) hybrid = true} + def hybrid = false + ch_raw_long_reads.map { + if (it) { + hybrid = true + } + } // // Custom validation for pipeline parameters @@ -122,18 +113,17 @@ workflow PIPELINE_INITIALISATION { // Validate PRE-ASSEMBLED CONTIG input when supplied if (params.assembly_input) { - ch_input_assemblies = Channel - .fromSamplesheet("assembly_input") + ch_input_assemblies = Channel.fromList(samplesheetToList(params.assembly_input, "${projectDir}/assets/schema_assembly_input.json")) } // Prepare ASSEMBLY input channel if (params.assembly_input) { - ch_input_assemblies - .map { meta, fasta -> - return [ meta + [id: params.coassemble_group ? "group-${meta.group}" : meta.id], [ fasta ] ] - } - } else { - ch_input_assemblies = Channel.empty() + ch_input_assemblies.map { meta, fasta -> + return [meta + [id: params.coassemble_group ? "group-${meta.group}" : meta.id], [fasta]] + } + } + else { + ch_input_assemblies = Channel.empty() } // Cross validation of input assembly and read IDs: ensure groups are all represented between reads and assemblies @@ -150,31 +140,30 @@ workflow PIPELINE_INITIALISATION { .toList() .sort() - ch_read_ids.concat(ch_assembly_ids).collect(flat: false) // need flat:false to ensure the two lists of IDs in the channels don't get smushed into a single list (and thus no ids1 and ids2 lists to compare) + ch_read_ids + .concat(ch_assembly_ids) + .collect(flat: false) .map { ids1, ids2 -> if (ids1.sort() != ids2.sort()) { - exit 1, "[nf-core/mag] ERROR: supplied IDs or Groups in read and assembly CSV files do not match!" + exit(1, "[nf-core/mag] ERROR: supplied IDs or Groups in read and assembly CSV files do not match!") } } } - - emit: raw_short_reads = ch_raw_short_reads raw_long_reads = ch_raw_long_reads input_assemblies = ch_input_assemblies - versions = ch_versions + versions = ch_versions } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW FOR PIPELINE COMPLETION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow PIPELINE_COMPLETION { - take: email // string: email address email_on_fail // string: email address sent on pipeline failure @@ -185,7 +174,6 @@ workflow PIPELINE_COMPLETION { multiqc_report // string: Path to MultiQC report main: - summary_params = paramsSummaryMap(workflow, parameters_schema: "nextflow_schema.json") // @@ -193,25 +181,32 @@ workflow PIPELINE_COMPLETION { // workflow.onComplete { if (email || email_on_fail) { - completionEmail(summary_params, email, email_on_fail, plaintext_email, outdir, monochrome_logs, multiqc_report.toList()) + completionEmail( + summary_params, + email, + email_on_fail, + plaintext_email, + outdir, + monochrome_logs, + multiqc_report.toList() + ) } completionSummary(monochrome_logs) - if (hook_url) { imNotification(summary_params, hook_url) } } workflow.onError { - log.error "Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting" + log.error("Pipeline failed. 
Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting") } } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // // Check and validate pipeline parameters @@ -224,95 +219,86 @@ def validateInputParameters(hybrid) { error("[nf-core/mag] ERROR: Invalid combination of parameter '--binning_map_mode own' and parameter '--coassemble_group'. Select either 'all' or 'group' mapping mode when performing group-wise co-assembly.") } - // Check if specified cpus for SPAdes are available - if ( params.spades_fix_cpus > params.max_cpus ) { - error("[nf-core/mag] ERROR: Invalid parameter '--spades_fix_cpus ${params.spades_fix_cpus}', max cpus are '${params.max_cpus}'.") - } - if ( params.spadeshybrid_fix_cpus > params.max_cpus ) { - error("[nf-core/mag] ERROR: Invalid parameter '--spadeshybrid_fix_cpus ${params.spadeshybrid_fix_cpus}', max cpus are '${params.max_cpus}'.") - } // Check if settings concerning reproducibility of used tools are consistent and print warning if not if (params.megahit_fix_cpu_1 || params.spades_fix_cpus != -1 || params.spadeshybrid_fix_cpus != -1) { if (!params.skip_spades && params.spades_fix_cpus == -1) { - log.warn "[nf-core/mag]: At least one assembly process is run with a parameter to ensure reproducible results, but SPAdes not. Consider using the parameter '--spades_fix_cpus'." + log.warn("[nf-core/mag]: At least one assembly process is run with a parameter to ensure reproducible results, but SPAdes not. Consider using the parameter '--spades_fix_cpus'.") } if (hybrid && params.skip_spadeshybrid && params.spadeshybrid_fix_cpus == -1) { - log.warn "[nf-core/mag]: At least one assembly process is run with a parameter to ensure reproducible results, but SPAdes hybrid not. Consider using the parameter '--spadeshybrid_fix_cpus'." + log.warn("[nf-core/mag]: At least one assembly process is run with a parameter to ensure reproducible results, but SPAdes hybrid not. Consider using the parameter '--spadeshybrid_fix_cpus'.") } if (!params.skip_megahit && !params.megahit_fix_cpu_1) { - log.warn "[nf-core/mag]: At least one assembly process is run with a parameter to ensure reproducible results, but MEGAHIT not. Consider using the parameter '--megahit_fix_cpu_1'." + log.warn("[nf-core/mag]: At least one assembly process is run with a parameter to ensure reproducible results, but MEGAHIT not. Consider using the parameter '--megahit_fix_cpu_1'.") } if (!params.skip_binning && params.metabat_rng_seed == 0) { - log.warn "[nf-core/mag]: At least one assembly process is run with a parameter to ensure reproducible results, but for MetaBAT2 a random seed is specified ('--metabat_rng_seed 0'). Consider specifying a positive seed instead." + log.warn("[nf-core/mag]: At least one assembly process is run with a parameter to ensure reproducible results, but for MetaBAT2 a random seed is specified ('--metabat_rng_seed 0'). Consider specifying a positive seed instead.") } } // Check if SPAdes and single_end - if ( (!params.skip_spades || !params.skip_spadeshybrid) && params.single_end) { - log.warn '[nf-core/mag]: metaSPAdes does not support single-end data. SPAdes will be skipped.' 
+ if ((!params.skip_spades || !params.skip_spadeshybrid) && params.single_end) { + log.warn('[nf-core/mag]: metaSPAdes does not support single-end data. SPAdes will be skipped.') } // Check if parameters for host contamination removal are valid - if ( params.host_fasta && params.host_genome) { + if (params.host_fasta && params.host_genome) { error('[nf-core/mag] ERROR: Both host fasta reference and iGenomes genome are specified to remove host contamination! Invalid combination, please specify either --host_fasta or --host_genome.') } - if ( hybrid && (params.host_fasta || params.host_genome) ) { - log.warn '[nf-core/mag]: Host read removal is only applied to short reads. Long reads might be filtered indirectly by Filtlong, which is set to use read qualities estimated based on k-mer matches to the short, already filtered reads.' - if ( params.longreads_length_weight > 1 ) { - log.warn "[nf-core/mag]: The parameter --longreads_length_weight is ${params.longreads_length_weight}, causing the read length being more important for long read filtering than the read quality. Set --longreads_length_weight to 1 in order to assign equal weights." + if (hybrid && (params.host_fasta || params.host_genome)) { + log.warn('[nf-core/mag]: Host read removal is only applied to short reads. Long reads might be filtered indirectly by Filtlong, which is set to use read qualities estimated based on k-mer matches to the short, already filtered reads.') + if (params.longreads_length_weight > 1) { + log.warn("[nf-core/mag]: The parameter --longreads_length_weight is ${params.longreads_length_weight}, causing the read length being more important for long read filtering than the read quality. Set --longreads_length_weight to 1 in order to assign equal weights.") } } - if ( params.host_genome ) { + if (params.host_genome) { if (!params.genomes) { error('[nf-core/mag] ERROR: No config file containing genomes provided!') } // Check if host genome exists in the config file if (!params.genomes.containsKey(params.host_genome)) { - error('=============================================================================\n' + - " Host genome '${params.host_genome}' not found in any config files provided to the pipeline.\n" + - ' Currently, the available genome keys are:\n' + - " ${params.genomes.keySet().join(', ')}\n" + - '===================================================================================') + error( + '=============================================================================\n' + " Host genome '${params.host_genome}' not found in any config files provided to the pipeline.\n" + ' Currently, the available genome keys are:\n' + " ${params.genomes.keySet().join(', ')}\n" + '===================================================================================' + ) } - if ( !params.genomes[params.host_genome].fasta ) { + if (!params.genomes[params.host_genome].fasta) { error("[nf-core/mag] ERROR: No fasta file specified for the host genome ${params.host_genome}!") } - if ( !params.genomes[params.host_genome].bowtie2 ) { + if (!params.genomes[params.host_genome].bowtie2) { error("[nf-core/mag] ERROR: No Bowtie 2 index file specified for the host genome ${params.host_genome}!") } } // Check MetaBAT2 inputs - if ( !params.skip_metabat2 && params.min_contig_size < 1500 ) { - log.warn "[nf-core/mag]: Specified min. contig size under minimum for MetaBAT2. MetaBAT2 will be run with 1500 (other binners not affected). 
You supplied: --min_contig_size ${params.min_contig_size}" + if (!params.skip_metabat2 && params.min_contig_size < 1500) { + log.warn("[nf-core/mag]: Specified min. contig size under minimum for MetaBAT2. MetaBAT2 will be run with 1500 (other binners not affected). You supplied: --min_contig_size ${params.min_contig_size}") } // Check more than one binner is run for bin refinement (required DAS by Tool) // If the number of run binners (i.e., number of not-skipped) is more than one, otherwise throw an error - if ( params.refine_bins_dastool && !([ params.skip_metabat2, params.skip_maxbin2, params.skip_concoct ].count(false) > 1) ) { + if (params.refine_bins_dastool && !([params.skip_metabat2, params.skip_maxbin2, params.skip_concoct].count(false) > 1)) { error('[nf-core/mag] ERROR: Bin refinement with --refine_bins_dastool requires at least two binners to be running (not skipped). Check input.') } // Check that bin refinement is actually turned on if any of the refined bins are requested for downstream if (!params.refine_bins_dastool && params.postbinning_input != 'raw_bins_only') { - error("[nf-core/mag] ERROR: The parameter '--postbinning_input ${ params.postbinning_input }' for downstream steps can only be specified if bin refinement is activated with --refine_bins_dastool! Check input.") + error("[nf-core/mag] ERROR: The parameter '--postbinning_input ${params.postbinning_input}' for downstream steps can only be specified if bin refinement is activated with --refine_bins_dastool! Check input.") } // Check if BUSCO parameters combinations are valid if (params.skip_binqc && params.binqc_tool == 'checkm') { - error('[nf-core/mag] ERROR: Both --skip_binqc and --binqc_tool \'checkm\' are specified! Invalid combination, please specify either --skip_binqc or --binqc_tool.') + error('[nf-core/mag] ERROR: Both --skip_binqc and --binqc_tool 'checkm' are specified! Invalid combination, please specify either --skip_binqc or --binqc_tool.') } if (params.skip_binqc) { if (params.busco_db) { - error('[nf-core/mag] ERROR: Both --skip_binqc and --busco_db are specified! Invalid combination, please specify either --skip_binqc or --binqc_tool \'busco\' with --busco_db.') + error('[nf-core/mag] ERROR: Both --skip_binqc and --busco_db are specified! Invalid combination, please specify either --skip_binqc or --binqc_tool 'busco' with --busco_db.') } if (params.busco_auto_lineage_prok) { - error('[nf-core/mag] ERROR: Both --skip_binqc and --busco_auto_lineage_prok are specified! Invalid combination, please specify either --skip_binqc or --binqc_tool \'busco\' with --busco_auto_lineage_prok.') + error('[nf-core/mag] ERROR: Both --skip_binqc and --busco_auto_lineage_prok are specified! Invalid combination, please specify either --skip_binqc or --binqc_tool 'busco' with --busco_auto_lineage_prok.') } } if (params.skip_binqc && !params.skip_gtdbtk) { - log.warn '[nf-core/mag]: --skip_binqc is specified, but --skip_gtdbtk is explictly set to run! GTDB-tk will be omitted because GTDB-tk bin classification requires bin filtering based on BUSCO or CheckM QC results to avoid GTDB-tk errors.' + log.warn('[nf-core/mag]: --skip_binqc is specified, but --skip_gtdbtk is explictly set to run! 
GTDB-tk will be omitted because GTDB-tk bin classification requires bin filtering based on BUSCO or CheckM QC results to avoid GTDB-tk errors.') } // Check if CAT parameters are valid @@ -337,8 +323,12 @@ def validateInputParameters(hybrid) { // def validateInputSamplesheet(meta, sr1, sr2, lr) { - if ( !sr2 && !params.single_end ) { error("[nf-core/mag] ERROR: Single-end data must be executed with `--single_end`. Note that it is not possible to mix single- and paired-end data in one run! Check input TSV for sample: ${meta.id}") } - if ( sr2 && params.single_end ) { error("[nf-core/mag] ERROR: Paired-end data must be executed without `--single_end`. Note that it is not possible to mix single- and paired-end data in one run! Check input TSV for sample: ${meta.id}") } + if (!sr2 && !params.single_end) { + error("[nf-core/mag] ERROR: Single-end data must be executed with `--single_end`. Note that it is not possible to mix single- and paired-end data in one run! Check input TSV for sample: ${meta.id}") + } + if (sr2 && params.single_end) { + error("[nf-core/mag] ERROR: Paired-end data must be executed without `--single_end`. Note that it is not possible to mix single- and paired-end data in one run! Check input TSV for sample: ${meta.id}") + } return [meta, sr1, sr2, lr] } @@ -347,9 +337,9 @@ def validateInputSamplesheet(meta, sr1, sr2, lr) { // Get attribute from genome config file e.g. fasta // def getGenomeAttribute(attribute) { - if (params.genomes && params.genome && params.genomes.containsKey(params.genome)) { - if (params.genomes[ params.genome ].containsKey(attribute)) { - return params.genomes[ params.genome ][ attribute ] + if (params.genomes && params.host_genome && params.genomes.containsKey(params.host_genome)) { + if (params.genomes[params.host_genome].containsKey(attribute)) { + return params.genomes[params.host_genome][attribute] } } return null @@ -359,16 +349,11 @@ def getGenomeAttribute(attribute) { // Exit pipeline if incorrect --genome key provided // def genomeExistsError() { - if (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) { - def error_string = "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" + - " Genome '${params.genome}' not found in any config files provided to the pipeline.\n" + - " Currently, the available genome keys are:\n" + - " ${params.genomes.keySet().join(", ")}\n" + - "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" + if (params.genomes && params.host_genome && !params.genomes.containsKey(params.genome)) { + def error_string = "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" + " Genome '${params.host_genome}' not found in any config files provided to the pipeline.\n" + " Currently, the available genome keys are:\n" + " ${params.host_genomes.keySet().join(", ")}\n" + "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" error(error_string) } } - // // Generate methods description for MultiQC // @@ -377,11 +362,11 @@ def toolCitationText() { // Can use ternary operators to dynamically construct based conditions, e.g. params["run_xyz"] ? "Tool (Foo et al. 2023)" : "", // Uncomment function in methodsDescriptionText to render in MultiQC report def citation_text = [ - "Tools used in the workflow included:", - "FastQC (Andrews 2010),", - "MultiQC (Ewels et al. 2016)", - "." - ].join(' ').trim() + "Tools used in the workflow included:", + "FastQC (Andrews 2010),", + "MultiQC (Ewels et al. 
2016)", + "." + ].join(' ').trim() return citation_text } @@ -391,9 +376,9 @@ def toolBibliographyText() { // Can use ternary operators to dynamically construct based conditions, e.g. params["run_xyz"] ? "
<li>Author (2023) Pub name, Journal, DOI</li>" : "", // Uncomment function in methodsDescriptionText to render in MultiQC report def reference_text = [
- "<li>Andrews S, (2010) FastQC, URL: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).</li>",
- "<li>Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics , 32(19), 3047–3048. doi: /10.1093/bioinformatics/btw354</li>"
- ].join(' ').trim()
+ "<li>Andrews S, (2010) FastQC, URL: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).</li>",
+ "<li>Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics , 32(19), 3047–3048. doi: /10.1093/bioinformatics/btw354
  • " + ].join(' ').trim() return reference_text } @@ -410,10 +395,15 @@ def methodsDescriptionText(mqc_methods_yaml) { // Removing `https://doi.org/` to handle pipelines using DOIs vs DOI resolvers // Removing ` ` since the manifest.doi is a string and not a proper list def temp_doi_ref = "" - String[] manifest_doi = meta.manifest_map.doi.tokenize(",") - for (String doi_ref: manifest_doi) temp_doi_ref += "(doi: ${doi_ref.replace("https://doi.org/", "").replace(" ", "")}), " + def manifest_doi = meta.manifest_map.doi.tokenize(",") + manifest_doi.each { doi_ref -> + temp_doi_ref += "(doi: ${doi_ref.replace("https://doi.org/", "").replace(" ", "")}), " + } meta["doi_text"] = temp_doi_ref.substring(0, temp_doi_ref.length() - 2) - } else meta["doi_text"] = "" + } + else { + meta["doi_text"] = "" + } meta["nodoi_text"] = meta.manifest_map.doi ? "" : "
<li>If available, make sure to update the text to include the Zenodo DOI of the version of the pipeline used.
  • " // Tool references @@ -427,7 +417,7 @@ def methodsDescriptionText(mqc_methods_yaml) { def methods_text = mqc_methods_yaml.text - def engine = new groovy.text.SimpleTemplateEngine() + def engine = new groovy.text.SimpleTemplateEngine() def description_html = engine.createTemplate(methods_text).make(meta) return description_html.toString() diff --git a/subworkflows/nf-core/utils_nextflow_pipeline/main.nf b/subworkflows/nf-core/utils_nextflow_pipeline/main.nf index ac31f28f..0fcbf7b3 100644 --- a/subworkflows/nf-core/utils_nextflow_pipeline/main.nf +++ b/subworkflows/nf-core/utils_nextflow_pipeline/main.nf @@ -2,18 +2,13 @@ // Subworkflow with functionality that may be useful for any Nextflow pipeline // -import org.yaml.snakeyaml.Yaml -import groovy.json.JsonOutput -import nextflow.extension.FilesEx - /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW DEFINITION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow UTILS_NEXTFLOW_PIPELINE { - take: print_version // boolean: print version dump_parameters // boolean: dump parameters @@ -26,7 +21,7 @@ workflow UTILS_NEXTFLOW_PIPELINE { // Print workflow version and exit on --version // if (print_version) { - log.info "${workflow.manifest.name} ${getWorkflowVersion()}" + log.info("${workflow.manifest.name} ${getWorkflowVersion()}") System.exit(0) } @@ -49,16 +44,16 @@ workflow UTILS_NEXTFLOW_PIPELINE { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // // Generate version string // def getWorkflowVersion() { - String version_string = "" + def version_string = "" as String if (workflow.manifest.version) { def prefix_v = workflow.manifest.version[0] != 'v' ? 
'v' : '' version_string += "${prefix_v}${workflow.manifest.version}" @@ -76,13 +71,13 @@ def getWorkflowVersion() { // Dump pipeline parameters to a JSON file // def dumpParametersToJSON(outdir) { - def timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss') - def filename = "params_${timestamp}.json" - def temp_pf = new File(workflow.launchDir.toString(), ".${filename}") - def jsonStr = JsonOutput.toJson(params) - temp_pf.text = JsonOutput.prettyPrint(jsonStr) + def timestamp = new java.util.Date().format('yyyy-MM-dd_HH-mm-ss') + def filename = "params_${timestamp}.json" + def temp_pf = new File(workflow.launchDir.toString(), ".${filename}") + def jsonStr = groovy.json.JsonOutput.toJson(params) + temp_pf.text = groovy.json.JsonOutput.prettyPrint(jsonStr) - FilesEx.copyTo(temp_pf.toPath(), "${outdir}/pipeline_info/params_${timestamp}.json") + nextflow.extension.FilesEx.copyTo(temp_pf.toPath(), "${outdir}/pipeline_info/params_${timestamp}.json") temp_pf.delete() } @@ -90,37 +85,40 @@ def dumpParametersToJSON(outdir) { // When running with -profile conda, warn if channels have not been set-up appropriately // def checkCondaChannels() { - Yaml parser = new Yaml() + def parser = new org.yaml.snakeyaml.Yaml() def channels = [] try { def config = parser.load("conda config --show channels".execute().text) channels = config.channels - } catch(NullPointerException | IOException e) { - log.warn "Could not verify conda channel configuration." - return + } + catch (NullPointerException e) { + log.warn("Could not verify conda channel configuration.") + return null + } + catch (IOException e) { + log.warn("Could not verify conda channel configuration.") + return null } // Check that all channels are present // This channel list is ordered by required channel priority. - def required_channels_in_order = ['conda-forge', 'bioconda', 'defaults'] + def required_channels_in_order = ['conda-forge', 'bioconda'] def channels_missing = ((required_channels_in_order as Set) - (channels as Set)) as Boolean // Check that they are in the right order - def channel_priority_violation = false - def n = required_channels_in_order.size() - for (int i = 0; i < n - 1; i++) { - channel_priority_violation |= !(channels.indexOf(required_channels_in_order[i]) < channels.indexOf(required_channels_in_order[i+1])) - } + def channel_priority_violation = required_channels_in_order != channels.findAll { ch -> ch in required_channels_in_order } if (channels_missing | channel_priority_violation) { - log.warn "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" + - " There is a problem with your Conda configuration!\n\n" + - " You will need to set-up the conda-forge and bioconda channels correctly.\n" + - " Please refer to https://bioconda.github.io/\n" + - " The observed channel order is \n" + - " ${channels}\n" + - " but the following channel order is required:\n" + - " ${required_channels_in_order}\n" + - "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" + log.warn """\ + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + There is a problem with your Conda configuration! + You will need to set-up the conda-forge and bioconda channels correctly. 
+ Please refer to https://bioconda.github.io/ + The observed channel order is + ${channels} + but the following channel order is required: + ${required_channels_in_order} + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" + """.stripIndent(true) } } diff --git a/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config b/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config index d0a926bf..a09572e5 100644 --- a/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config +++ b/subworkflows/nf-core/utils_nextflow_pipeline/tests/nextflow.config @@ -3,7 +3,7 @@ manifest { author = """nf-core""" homePage = 'https://127.0.0.1' description = """Dummy pipeline""" - nextflowVersion = '!>=23.04.0' + nextflowVersion = '!>=23.04.0' version = '9.9.9' doi = 'https://doi.org/10.5281/zenodo.5070524' } diff --git a/subworkflows/nf-core/utils_nfcore_pipeline/main.nf b/subworkflows/nf-core/utils_nfcore_pipeline/main.nf index 14558c39..5cb7bafe 100644 --- a/subworkflows/nf-core/utils_nfcore_pipeline/main.nf +++ b/subworkflows/nf-core/utils_nfcore_pipeline/main.nf @@ -2,17 +2,13 @@ // Subworkflow with utility functions specific to the nf-core pipeline template // -import org.yaml.snakeyaml.Yaml -import nextflow.extension.FilesEx - /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SUBWORKFLOW DEFINITION -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ workflow UTILS_NFCORE_PIPELINE { - take: nextflow_cli_args @@ -25,23 +21,20 @@ workflow UTILS_NFCORE_PIPELINE { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FUNCTIONS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // // Warn if a -profile or Nextflow config has not been provided to run the pipeline // def checkConfigProvided() { - valid_config = true + def valid_config = true as Boolean if (workflow.profile == 'standard' && workflow.configFiles.size() <= 1) { - log.warn "[$workflow.manifest.name] You are attempting to run the pipeline without any custom configuration!\n\n" + - "This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" + - " (1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" + - " (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" + - " (3) Using your own local custom config e.g. `-c /path/to/your/custom.config`\n\n" + - "Please refer to the quick start section and usage docs for the pipeline.\n " + log.warn( + "[${workflow.manifest.name}] You are attempting to run the pipeline without any custom configuration!\n\n" + "This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" + " (1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" + " (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" + " (3) Using your own local custom config e.g. 
`-c /path/to/your/custom.config`\n\n" + "Please refer to the quick start section and usage docs for the pipeline.\n " + ) valid_config = false } return valid_config @@ -52,12 +45,14 @@ def checkConfigProvided() { // def checkProfileProvided(nextflow_cli_args) { if (workflow.profile.endsWith(',')) { - error "The `-profile` option cannot end with a trailing comma, please remove it and re-run the pipeline!\n" + - "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + error( + "The `-profile` option cannot end with a trailing comma, please remove it and re-run the pipeline!\n" + "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + ) } if (nextflow_cli_args[0]) { - log.warn "nf-core pipelines do not accept positional arguments. The positional argument `${nextflow_cli_args[0]}` has been detected.\n" + - "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + log.warn( + "nf-core pipelines do not accept positional arguments. The positional argument `${nextflow_cli_args[0]}` has been detected.\n" + "HINT: A common mistake is to provide multiple values separated by spaces e.g. `-profile test, docker`.\n" + ) } } @@ -66,25 +61,21 @@ def checkProfileProvided(nextflow_cli_args) { // def workflowCitation() { def temp_doi_ref = "" - String[] manifest_doi = workflow.manifest.doi.tokenize(",") - // Using a loop to handle multiple DOIs + def manifest_doi = workflow.manifest.doi.tokenize(",") + // Handling multiple DOIs // Removing `https://doi.org/` to handle pipelines using DOIs vs DOI resolvers // Removing ` ` since the manifest.doi is a string and not a proper list - for (String doi_ref: manifest_doi) temp_doi_ref += " https://doi.org/${doi_ref.replace('https://doi.org/', '').replace(' ', '')}\n" - return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + - "* The pipeline\n" + - temp_doi_ref + "\n" + - "* The nf-core framework\n" + - " https://doi.org/10.1038/s41587-020-0439-x\n\n" + - "* Software dependencies\n" + - " https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md" + manifest_doi.each { doi_ref -> + temp_doi_ref += " https://doi.org/${doi_ref.replace('https://doi.org/', '').replace(' ', '')}\n" + } + return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + "* The pipeline\n" + temp_doi_ref + "\n" + "* The nf-core framework\n" + " https://doi.org/10.1038/s41587-020-0439-x\n\n" + "* Software dependencies\n" + " https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md" } // // Generate workflow version string // def getWorkflowVersion() { - String version_string = "" + def version_string = "" as String if (workflow.manifest.version) { def prefix_v = workflow.manifest.version[0] != 'v' ? 
'v' : '' version_string += "${prefix_v}${workflow.manifest.version}" @@ -102,8 +93,8 @@ def getWorkflowVersion() { // Get software versions for pipeline // def processVersionsFromYAML(yaml_file) { - Yaml yaml = new Yaml() - versions = yaml.load(yaml_file).collectEntries { k, v -> [ k.tokenize(':')[-1], v ] } + def yaml = new org.yaml.snakeyaml.Yaml() + def versions = yaml.load(yaml_file).collectEntries { k, v -> [k.tokenize(':')[-1], v] } return yaml.dumpAsMap(versions).trim() } @@ -113,8 +104,8 @@ def processVersionsFromYAML(yaml_file) { def workflowVersionToYAML() { return """ Workflow: - $workflow.manifest.name: ${getWorkflowVersion()} - Nextflow: $workflow.nextflow.version + ${workflow.manifest.name}: ${getWorkflowVersion()} + Nextflow: ${workflow.nextflow.version} """.stripIndent().trim() } @@ -122,11 +113,7 @@ def workflowVersionToYAML() { // Get channel of software versions used in pipeline in YAML format // def softwareVersionsToYAML(ch_versions) { - return ch_versions - .unique() - .map { processVersionsFromYAML(it) } - .unique() - .mix(Channel.of(workflowVersionToYAML())) + return ch_versions.unique().map { version -> processVersionsFromYAML(version) }.unique().mix(Channel.of(workflowVersionToYAML())) } // @@ -134,25 +121,31 @@ def softwareVersionsToYAML(ch_versions) { // def paramsSummaryMultiqc(summary_params) { def summary_section = '' - for (group in summary_params.keySet()) { - def group_params = summary_params.get(group) // This gets the parameters of that particular group - if (group_params) { - summary_section += "

    <p style=\"font-size:110%\"><b>$group</b></p>\n"
- summary_section += "    <dl class=\"dl-horizontal\">\n"
- for (param in group_params.keySet()) {
- summary_section += "        <dt>$param</dt><dd><samp>${group_params.get(param) ?: '<span style=\"color:#999999;\">N/A</a>'}</samp></dd>\n"
+ summary_params
+ .keySet()
+ .each { group ->
+ def group_params = summary_params.get(group)
+ // This gets the parameters of that particular group
+ if (group_params) {
+ summary_section += "    <p style=\"font-size:110%\"><b>${group}</b></p>\n"
+ summary_section += "    <dl class=\"dl-horizontal\">\n"
+ group_params
+ .keySet()
+ .sort()
+ .each { param ->
+ summary_section += "        <dt>${param}</dt><dd><samp>${group_params.get(param) ?: '<span style=\"color:#999999;\">N/A</a>'}</samp></dd>\n"
+ }
+ summary_section += "
    \n" } - summary_section += "
    \n" } - } - String yaml_file_text = "id: '${workflow.manifest.name.replace('/','-')}-summary'\n" - yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n" - yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n" - yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n" - yaml_file_text += "plot_type: 'html'\n" - yaml_file_text += "data: |\n" - yaml_file_text += "${summary_section}" + def yaml_file_text = "id: '${workflow.manifest.name.replace('/', '-')}-summary'\n" as String + yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n" + yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n" + yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n" + yaml_file_text += "plot_type: 'html'\n" + yaml_file_text += "data: |\n" + yaml_file_text += "${summary_section}" return yaml_file_text } @@ -161,7 +154,7 @@ def paramsSummaryMultiqc(summary_params) { // nf-core logo // def nfCoreLogo(monochrome_logs=true) { - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map String.format( """\n ${dashedLine(monochrome_logs)} @@ -180,7 +173,7 @@ def nfCoreLogo(monochrome_logs=true) { // Return dashed line // def dashedLine(monochrome_logs=true) { - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map return "-${colors.dim}----------------------------------------------------${colors.reset}-" } @@ -188,7 +181,7 @@ def dashedLine(monochrome_logs=true) { // ANSII colours used for terminal logging // def logColours(monochrome_logs=true) { - Map colorcodes = [:] + def colorcodes = [:] as Map // Reset / Meta colorcodes['reset'] = monochrome_logs ? '' : "\033[0m" @@ -200,54 +193,54 @@ def logColours(monochrome_logs=true) { colorcodes['hidden'] = monochrome_logs ? '' : "\033[8m" // Regular Colors - colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" - colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" - colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" - colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" - colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" - colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" - colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" - colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" + colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" + colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" + colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" + colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" + colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" + colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" + colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" + colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" // Bold - colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m" - colorcodes['bred'] = monochrome_logs ? '' : "\033[1;31m" - colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m" - colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m" - colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m" - colorcodes['bpurple'] = monochrome_logs ? '' : "\033[1;35m" - colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m" - colorcodes['bwhite'] = monochrome_logs ? '' : "\033[1;37m" + colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m" + colorcodes['bred'] = monochrome_logs ? 
'' : "\033[1;31m" + colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m" + colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m" + colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m" + colorcodes['bpurple'] = monochrome_logs ? '' : "\033[1;35m" + colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m" + colorcodes['bwhite'] = monochrome_logs ? '' : "\033[1;37m" // Underline - colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" - colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" - colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" - colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" - colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" - colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m" - colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" - colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m" + colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" + colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" + colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" + colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" + colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" + colorcodes['upurple'] = monochrome_logs ? '' : "\033[4;35m" + colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" + colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m" // High Intensity - colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" - colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" - colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" - colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m" - colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" - colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" - colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" - colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" + colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" + colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" + colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" + colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m" + colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" + colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" + colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" + colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" // Bold High Intensity - colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" - colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" - colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" - colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" - colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" - colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" - colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m" - colorcodes['biwhite'] = monochrome_logs ? '' : "\033[1;97m" + colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" + colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" + colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" + colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" + colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" + colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" + colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m" + colorcodes['biwhite'] = monochrome_logs ? 
'' : "\033[1;97m" return colorcodes } @@ -262,14 +255,15 @@ def attachMultiqcReport(multiqc_report) { mqc_report = multiqc_report.getVal() if (mqc_report.getClass() == ArrayList && mqc_report.size() >= 1) { if (mqc_report.size() > 1) { - log.warn "[$workflow.manifest.name] Found multiple reports from process 'MULTIQC', will use only one" + log.warn("[${workflow.manifest.name}] Found multiple reports from process 'MULTIQC', will use only one") } mqc_report = mqc_report[0] } } - } catch (all) { + } + catch (Exception all) { if (multiqc_report) { - log.warn "[$workflow.manifest.name] Could not attach MultiQC report to summary email" + log.warn("[${workflow.manifest.name}] Could not attach MultiQC report to summary email") } } return mqc_report @@ -281,26 +275,35 @@ def attachMultiqcReport(multiqc_report) { def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdir, monochrome_logs=true, multiqc_report=null) { // Set up the e-mail variables - def subject = "[$workflow.manifest.name] Successful: $workflow.runName" + def subject = "[${workflow.manifest.name}] Successful: ${workflow.runName}" if (!workflow.success) { - subject = "[$workflow.manifest.name] FAILED: $workflow.runName" + subject = "[${workflow.manifest.name}] FAILED: ${workflow.runName}" } def summary = [:] - for (group in summary_params.keySet()) { - summary << summary_params[group] - } + summary_params + .keySet() + .sort() + .each { group -> + summary << summary_params[group] + } def misc_fields = [:] misc_fields['Date Started'] = workflow.start misc_fields['Date Completed'] = workflow.complete misc_fields['Pipeline script file path'] = workflow.scriptFile misc_fields['Pipeline script hash ID'] = workflow.scriptId - if (workflow.repository) misc_fields['Pipeline repository Git URL'] = workflow.repository - if (workflow.commitId) misc_fields['Pipeline repository Git Commit'] = workflow.commitId - if (workflow.revision) misc_fields['Pipeline Git branch/tag'] = workflow.revision - misc_fields['Nextflow Version'] = workflow.nextflow.version - misc_fields['Nextflow Build'] = workflow.nextflow.build + if (workflow.repository) { + misc_fields['Pipeline repository Git URL'] = workflow.repository + } + if (workflow.commitId) { + misc_fields['Pipeline repository Git Commit'] = workflow.commitId + } + if (workflow.revision) { + misc_fields['Pipeline Git branch/tag'] = workflow.revision + } + misc_fields['Nextflow Version'] = workflow.nextflow.version + misc_fields['Nextflow Build'] = workflow.nextflow.build misc_fields['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp def email_fields = [:] @@ -338,39 +341,41 @@ def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdi // Render the sendmail template def max_multiqc_email_size = (params.containsKey('max_multiqc_email_size') ? 
params.max_multiqc_email_size : 0) as nextflow.util.MemoryUnit - def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "${workflow.projectDir}", mqcFile: mqc_report, mqcMaxSize: max_multiqc_email_size.toBytes() ] + def smail_fields = [email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "${workflow.projectDir}", mqcFile: mqc_report, mqcMaxSize: max_multiqc_email_size.toBytes()] def sf = new File("${workflow.projectDir}/assets/sendmail_template.txt") def sendmail_template = engine.createTemplate(sf).make(smail_fields) def sendmail_html = sendmail_template.toString() // Send the HTML e-mail - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map if (email_address) { try { - if (plaintext_email) { throw GroovyException('Send plaintext e-mail, not HTML') } + if (plaintext_email) { +new org.codehaus.groovy.GroovyException('Send plaintext e-mail, not HTML') } // Try to send HTML e-mail using sendmail def sendmail_tf = new File(workflow.launchDir.toString(), ".sendmail_tmp.html") sendmail_tf.withWriter { w -> w << sendmail_html } - [ 'sendmail', '-t' ].execute() << sendmail_html - log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (sendmail)-" - } catch (all) { + ['sendmail', '-t'].execute() << sendmail_html + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Sent summary e-mail to ${email_address} (sendmail)-") + } + catch (Exception all) { // Catch failures and try with plaintext - def mail_cmd = [ 'mail', '-s', subject, '--content-type=text/html', email_address ] + def mail_cmd = ['mail', '-s', subject, '--content-type=text/html', email_address] mail_cmd.execute() << email_html - log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Sent summary e-mail to $email_address (mail)-" + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Sent summary e-mail to ${email_address} (mail)-") } } // Write summary e-mail HTML to a file def output_hf = new File(workflow.launchDir.toString(), ".pipeline_report.html") output_hf.withWriter { w -> w << email_html } - FilesEx.copyTo(output_hf.toPath(), "${outdir}/pipeline_info/pipeline_report.html"); + nextflow.extension.FilesEx.copyTo(output_hf.toPath(), "${outdir}/pipeline_info/pipeline_report.html") output_hf.delete() // Write summary e-mail TXT to a file def output_tf = new File(workflow.launchDir.toString(), ".pipeline_report.txt") output_tf.withWriter { w -> w << email_txt } - FilesEx.copyTo(output_tf.toPath(), "${outdir}/pipeline_info/pipeline_report.txt"); + nextflow.extension.FilesEx.copyTo(output_tf.toPath(), "${outdir}/pipeline_info/pipeline_report.txt") output_tf.delete() } @@ -378,15 +383,17 @@ def completionEmail(summary_params, email, email_on_fail, plaintext_email, outdi // Print pipeline summary on completion // def completionSummary(monochrome_logs=true) { - Map colors = logColours(monochrome_logs) + def colors = logColours(monochrome_logs) as Map if (workflow.success) { if (workflow.stats.ignoredCount == 0) { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.green} Pipeline completed successfully${colors.reset}-" - } else { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.yellow} Pipeline completed successfully, but with errored process(es) ${colors.reset}-" + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.green} Pipeline completed 
successfully${colors.reset}-") + } + else { + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.yellow} Pipeline completed successfully, but with errored process(es) ${colors.reset}-") } - } else { - log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed with errors${colors.reset}-" + } + else { + log.info("-${colors.purple}[${workflow.manifest.name}]${colors.red} Pipeline completed with errors${colors.reset}-") } } @@ -395,21 +402,30 @@ def completionSummary(monochrome_logs=true) { // def imNotification(summary_params, hook_url) { def summary = [:] - for (group in summary_params.keySet()) { - summary << summary_params[group] - } + summary_params + .keySet() + .sort() + .each { group -> + summary << summary_params[group] + } def misc_fields = [:] - misc_fields['start'] = workflow.start - misc_fields['complete'] = workflow.complete - misc_fields['scriptfile'] = workflow.scriptFile - misc_fields['scriptid'] = workflow.scriptId - if (workflow.repository) misc_fields['repository'] = workflow.repository - if (workflow.commitId) misc_fields['commitid'] = workflow.commitId - if (workflow.revision) misc_fields['revision'] = workflow.revision - misc_fields['nxf_version'] = workflow.nextflow.version - misc_fields['nxf_build'] = workflow.nextflow.build - misc_fields['nxf_timestamp'] = workflow.nextflow.timestamp + misc_fields['start'] = workflow.start + misc_fields['complete'] = workflow.complete + misc_fields['scriptfile'] = workflow.scriptFile + misc_fields['scriptid'] = workflow.scriptId + if (workflow.repository) { + misc_fields['repository'] = workflow.repository + } + if (workflow.commitId) { + misc_fields['commitid'] = workflow.commitId + } + if (workflow.revision) { + misc_fields['revision'] = workflow.revision + } + misc_fields['nxf_version'] = workflow.nextflow.version + misc_fields['nxf_build'] = workflow.nextflow.build + misc_fields['nxf_timestamp'] = workflow.nextflow.timestamp def msg_fields = [:] msg_fields['version'] = getWorkflowVersion() @@ -434,13 +450,13 @@ def imNotification(summary_params, hook_url) { def json_message = json_template.toString() // POST - def post = new URL(hook_url).openConnection(); + def post = new URL(hook_url).openConnection() post.setRequestMethod("POST") post.setDoOutput(true) post.setRequestProperty("Content-Type", "application/json") - post.getOutputStream().write(json_message.getBytes("UTF-8")); - def postRC = post.getResponseCode(); - if (! postRC.equals(200)) { - log.warn(post.getErrorStream().getText()); + post.getOutputStream().write(json_message.getBytes("UTF-8")) + def postRC = post.getResponseCode() + if (!postRC.equals(200)) { + log.warn(post.getErrorStream().getText()) } } diff --git a/subworkflows/nf-core/utils_nfschema_plugin/main.nf b/subworkflows/nf-core/utils_nfschema_plugin/main.nf new file mode 100644 index 00000000..4994303e --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/main.nf @@ -0,0 +1,46 @@ +// +// Subworkflow that uses the nf-schema plugin to validate parameters and render the parameter summary +// + +include { paramsSummaryLog } from 'plugin/nf-schema' +include { validateParameters } from 'plugin/nf-schema' + +workflow UTILS_NFSCHEMA_PLUGIN { + + take: + input_workflow // workflow: the workflow object used by nf-schema to get metadata from the workflow + validate_params // boolean: validate the parameters + parameters_schema // string: path to the parameters JSON schema. 
+ // this has to be the same as the schema given to `validation.parametersSchema` + // when this input is empty it will automatically use the configured schema or + // "${projectDir}/nextflow_schema.json" as default. This input should not be empty + // for meta pipelines + + main: + + // + // Print parameter summary to stdout. This will display the parameters + // that differ from the default given in the JSON schema + // + if(parameters_schema) { + log.info paramsSummaryLog(input_workflow, parameters_schema:parameters_schema) + } else { + log.info paramsSummaryLog(input_workflow) + } + + // + // Validate the parameters using nextflow_schema.json or the schema + // given via the validation.parametersSchema configuration option + // + if(validate_params) { + if(parameters_schema) { + validateParameters(parameters_schema:parameters_schema) + } else { + validateParameters() + } + } + + emit: + dummy_emit = true +} + diff --git a/subworkflows/nf-core/utils_nfschema_plugin/meta.yml b/subworkflows/nf-core/utils_nfschema_plugin/meta.yml new file mode 100644 index 00000000..f7d9f028 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/meta.yml @@ -0,0 +1,35 @@ +# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json +name: "utils_nfschema_plugin" +description: Run nf-schema to validate parameters and create a summary of changed parameters +keywords: + - validation + - JSON schema + - plugin + - parameters + - summary +components: [] +input: + - input_workflow: + type: object + description: | + The workflow object of the used pipeline. + This object contains meta data used to create the params summary log + - validate_params: + type: boolean + description: Validate the parameters and error if invalid. + - parameters_schema: + type: string + description: | + Path to the parameters JSON schema. + This has to be the same as the schema given to the `validation.parametersSchema` config + option. When this input is empty it will automatically use the configured schema or + "${projectDir}/nextflow_schema.json" as default. The schema should not be given in this way + for meta pipelines. 
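For reference, a minimal sketch (not part of this patch) of how a caller might invoke UTILS_NFSCHEMA_PLUGIN, assuming the same relative include path used by PIPELINE_INITIALISATION earlier in this diff; the explicit schema path in the final comment is purely hypothetical:

```groovy
include { UTILS_NFSCHEMA_PLUGIN } from '../../nf-core/utils_nfschema_plugin'

workflow EXAMPLE_INITIALISATION {
    main:
    // Passing null as the schema lets nf-schema fall back to
    // validation.parametersSchema or ${projectDir}/nextflow_schema.json
    UTILS_NFSCHEMA_PLUGIN(
        workflow, // workflow metadata used to build the parameter summary
        true,     // validate_params: fail the run if parameters do not match the schema
        null      // parameters_schema: use the configured/default schema
    )

    // A meta pipeline would instead pass an explicit schema, e.g. (hypothetical path):
    // UTILS_NFSCHEMA_PLUGIN(workflow, true, "${projectDir}/assets/sub_schema.json")
}
```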
+output: + - dummy_emit: + type: boolean + description: Dummy emit to make nf-core subworkflows lint happy +authors: + - "@nvnieuwk" +maintainers: + - "@nvnieuwk" diff --git a/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test b/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test new file mode 100644 index 00000000..842dc432 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/main.nf.test @@ -0,0 +1,117 @@ +nextflow_workflow { + + name "Test Subworkflow UTILS_NFSCHEMA_PLUGIN" + script "../main.nf" + workflow "UTILS_NFSCHEMA_PLUGIN" + + tag "subworkflows" + tag "subworkflows_nfcore" + tag "subworkflows/utils_nfschema_plugin" + tag "plugin/nf-schema" + + config "./nextflow.config" + + test("Should run nothing") { + + when { + + params { + test_data = '' + } + + workflow { + """ + validate_params = false + input[0] = workflow + input[1] = validate_params + input[2] = "" + """ + } + } + + then { + assertAll( + { assert workflow.success } + ) + } + } + + test("Should validate params") { + + when { + + params { + test_data = '' + outdir = 1 + } + + workflow { + """ + validate_params = true + input[0] = workflow + input[1] = validate_params + input[2] = "" + """ + } + } + + then { + assertAll( + { assert workflow.failed }, + { assert workflow.stdout.any { it.contains('ERROR ~ Validation of pipeline parameters failed!') } } + ) + } + } + + test("Should run nothing - custom schema") { + + when { + + params { + test_data = '' + } + + workflow { + """ + validate_params = false + input[0] = workflow + input[1] = validate_params + input[2] = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + """ + } + } + + then { + assertAll( + { assert workflow.success } + ) + } + } + + test("Should validate params - custom schema") { + + when { + + params { + test_data = '' + outdir = 1 + } + + workflow { + """ + validate_params = true + input[0] = workflow + input[1] = validate_params + input[2] = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + """ + } + } + + then { + assertAll( + { assert workflow.failed }, + { assert workflow.stdout.any { it.contains('ERROR ~ Validation of pipeline parameters failed!') } } + ) + } + } +} diff --git a/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config new file mode 100644 index 00000000..0907ac58 --- /dev/null +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow.config @@ -0,0 +1,8 @@ +plugins { + id "nf-schema@2.1.0" +} + +validation { + parametersSchema = "${projectDir}/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json" + monochromeLogs = true +} \ No newline at end of file diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json similarity index 95% rename from subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json rename to subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json index 7626c1c9..331e0d2f 100644 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/nextflow_schema.json +++ b/subworkflows/nf-core/utils_nfschema_plugin/tests/nextflow_schema.json @@ -1,10 +1,10 @@ { - "$schema": "http://json-schema.org/draft-07/schema", + "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://raw.githubusercontent.com/./master/nextflow_schema.json", "title": ". 
pipeline parameters", "description": "", "type": "object", - "definitions": { + "$defs": { "input_output_options": { "title": "Input/output options", "type": "object", @@ -87,10 +87,10 @@ }, "allOf": [ { - "$ref": "#/definitions/input_output_options" + "$ref": "#/$defs/input_output_options" }, { - "$ref": "#/definitions/generic_options" + "$ref": "#/$defs/generic_options" } ] } diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf b/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf deleted file mode 100644 index 2585b65d..00000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/main.nf +++ /dev/null @@ -1,62 +0,0 @@ -// -// Subworkflow that uses the nf-validation plugin to render help text and parameter summary -// - -/* -======================================================================================== - IMPORT NF-VALIDATION PLUGIN -======================================================================================== -*/ - -include { paramsHelp } from 'plugin/nf-validation' -include { paramsSummaryLog } from 'plugin/nf-validation' -include { validateParameters } from 'plugin/nf-validation' - -/* -======================================================================================== - SUBWORKFLOW DEFINITION -======================================================================================== -*/ - -workflow UTILS_NFVALIDATION_PLUGIN { - - take: - print_help // boolean: print help - workflow_command // string: default commmand used to run pipeline - pre_help_text // string: string to be printed before help text and summary log - post_help_text // string: string to be printed after help text and summary log - validate_params // boolean: validate parameters - schema_filename // path: JSON schema file, null to use default value - - main: - - log.debug "Using schema file: ${schema_filename}" - - // Default values for strings - pre_help_text = pre_help_text ?: '' - post_help_text = post_help_text ?: '' - workflow_command = workflow_command ?: '' - - // - // Print help message if needed - // - if (print_help) { - log.info pre_help_text + paramsHelp(workflow_command, parameters_schema: schema_filename) + post_help_text - System.exit(0) - } - - // - // Print parameter summary to stdout - // - log.info pre_help_text + paramsSummaryLog(workflow, parameters_schema: schema_filename) + post_help_text - - // - // Validate parameters relative to the parameter JSON schema - // - if (validate_params){ - validateParameters(parameters_schema: schema_filename) - } - - emit: - dummy_emit = true -} diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml b/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml deleted file mode 100644 index 3d4a6b04..00000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml +++ /dev/null @@ -1,44 +0,0 @@ -# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json -name: "UTILS_NFVALIDATION_PLUGIN" -description: Use nf-validation to initiate and validate a pipeline -keywords: - - utility - - pipeline - - initialise - - validation -components: [] -input: - - print_help: - type: boolean - description: | - Print help message and exit - - workflow_command: - type: string - description: | - The command to run the workflow e.g. 
"nextflow run main.nf" - - pre_help_text: - type: string - description: | - Text to print before the help message - - post_help_text: - type: string - description: | - Text to print after the help message - - validate_params: - type: boolean - description: | - Validate the parameters and error if invalid. - - schema_filename: - type: string - description: | - The filename of the schema to validate against. -output: - - dummy_emit: - type: boolean - description: | - Dummy emit to make nf-core subworkflows lint happy -authors: - - "@adamrtalbot" -maintainers: - - "@adamrtalbot" - - "@maxulysse" diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test b/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test deleted file mode 100644 index 5784a33f..00000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/main.nf.test +++ /dev/null @@ -1,200 +0,0 @@ -nextflow_workflow { - - name "Test Workflow UTILS_NFVALIDATION_PLUGIN" - script "../main.nf" - workflow "UTILS_NFVALIDATION_PLUGIN" - tag "subworkflows" - tag "subworkflows_nfcore" - tag "plugin/nf-validation" - tag "'plugin/nf-validation'" - tag "utils_nfvalidation_plugin" - tag "subworkflows/utils_nfvalidation_plugin" - - test("Should run nothing") { - - when { - - params { - monochrome_logs = true - test_data = '' - } - - workflow { - """ - help = false - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success } - ) - } - } - - test("Should run help") { - - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } } - ) - } - } - - test("Should run help with command") { - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = "nextflow run noorg/doesntexist" - pre_help_text = null - post_help_text = null - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('nextflow run noorg/doesntexist') } }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } } - ) - } - } - - test("Should run help with extra text") { - - - when { - - params { - monochrome_logs = true - test_data = '' - } - workflow { - """ - help = true - workflow_command = "nextflow run noorg/doesntexist" - pre_help_text = "pre-help-text" - 
post_help_text = "post-help-text" - validate_params = false - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.success }, - { assert workflow.exitStatus == 0 }, - { assert workflow.stdout.any { it.contains('pre-help-text') } }, - { assert workflow.stdout.any { it.contains('nextflow run noorg/doesntexist') } }, - { assert workflow.stdout.any { it.contains('Input/output options') } }, - { assert workflow.stdout.any { it.contains('--outdir') } }, - { assert workflow.stdout.any { it.contains('post-help-text') } } - ) - } - } - - test("Should validate params") { - - when { - - params { - monochrome_logs = true - test_data = '' - outdir = 1 - } - workflow { - """ - help = false - workflow_command = null - pre_help_text = null - post_help_text = null - validate_params = true - schema_filename = "$moduleTestDir/nextflow_schema.json" - - input[0] = help - input[1] = workflow_command - input[2] = pre_help_text - input[3] = post_help_text - input[4] = validate_params - input[5] = schema_filename - """ - } - } - - then { - assertAll( - { assert workflow.failed }, - { assert workflow.stdout.any { it.contains('ERROR ~ ERROR: Validation of pipeline parameters failed!') } } - ) - } - } -} diff --git a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml b/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml deleted file mode 100644 index 60b1cfff..00000000 --- a/subworkflows/nf-core/utils_nfvalidation_plugin/tests/tags.yml +++ /dev/null @@ -1,2 +0,0 @@ -subworkflows/utils_nfvalidation_plugin: - - subworkflows/nf-core/utils_nfvalidation_plugin/** diff --git a/workflows/mag.nf b/workflows/mag.nf index 6c158284..7afb4316 100644 --- a/workflows/mag.nf +++ b/workflows/mag.nf @@ -3,27 +3,27 @@ IMPORT MODULES / SUBWORKFLOWS / FUNCTIONS ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ - -include { MULTIQC } from '../modules/nf-core/multiqc/main' -include { paramsSummaryMap } from 'plugin/nf-validation' -include { paramsSummaryMultiqc } from '../subworkflows/nf-core/utils_nfcore_pipeline' -include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline' -include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_mag_pipeline' +include { MULTIQC } from '../modules/nf-core/multiqc/main' +include { paramsSummaryMap } from 'plugin/nf-schema' +include { paramsSummaryMultiqc } from '../subworkflows/nf-core/utils_nfcore_pipeline' +include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline' +include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_mag_pipeline' // // SUBWORKFLOW: Consisting of a mix of local and nf-core/modules // -include { BINNING_PREPARATION } from '../subworkflows/local/binning_preparation' -include { BINNING } from '../subworkflows/local/binning' -include { BINNING_REFINEMENT } from '../subworkflows/local/binning_refinement' -include { BUSCO_QC } from '../subworkflows/local/busco_qc' -include { VIRUS_IDENTIFICATION } from '../subworkflows/local/virus_identification' -include { CHECKM_QC } from '../subworkflows/local/checkm_qc' -include { GUNC_QC } from '../subworkflows/local/gunc_qc' -include { GTDBTK } from '../subworkflows/local/gtdbtk' -include { ANCIENT_DNA_ASSEMBLY_VALIDATION } from '../subworkflows/local/ancient_dna' -include 
{ DOMAIN_CLASSIFICATION } from '../subworkflows/local/domain_classification' -include { DEPTHS } from '../subworkflows/local/depths' +include { BINNING_PREPARATION } from '../subworkflows/local/binning_preparation' +include { BINNING } from '../subworkflows/local/binning' +include { BINNING_REFINEMENT } from '../subworkflows/local/binning_refinement' +include { BUSCO_QC } from '../subworkflows/local/busco_qc' +include { VIRUS_IDENTIFICATION } from '../subworkflows/local/virus_identification' +include { CHECKM_QC } from '../subworkflows/local/checkm_qc' +include { GUNC_QC } from '../subworkflows/local/gunc_qc' +include { GTDBTK } from '../subworkflows/local/gtdbtk' +include { ANCIENT_DNA_ASSEMBLY_VALIDATION } from '../subworkflows/local/ancient_dna' +include { DOMAIN_CLASSIFICATION } from '../subworkflows/local/domain_classification' +include { DEPTHS } from '../subworkflows/local/depths' +include { LONGREAD_PREPROCESSING } from '../subworkflows/local/longread_preprocessing' // // MODULE: Installed directly from nf-core/modules @@ -32,10 +32,6 @@ include { ARIA2 as ARIA2_UNTAR } from '../modul include { FASTQC as FASTQC_RAW } from '../modules/nf-core/fastqc/main' include { FASTQC as FASTQC_TRIMMED } from '../modules/nf-core/fastqc/main' include { SEQTK_MERGEPE } from '../modules/nf-core/seqtk/mergepe/main' -include { PORECHOP_PORECHOP } from '../modules/nf-core/porechop/porechop/main' -include { NANOPLOT as NANOPLOT_RAW } from '../modules/nf-core/nanoplot/main' -include { NANOPLOT as NANOPLOT_FILTERED } from '../modules/nf-core/nanoplot/main' -include { NANOLYSE } from '../modules/nf-core/nanolyse/main' include { BBMAP_BBNORM } from '../modules/nf-core/bbmap/bbnorm/main' include { FASTP } from '../modules/nf-core/fastp/main' include { ADAPTERREMOVAL as ADAPTERREMOVAL_PE } from '../modules/nf-core/adapterremoval/main' @@ -47,7 +43,11 @@ include { KRONA_KRONADB } from '../modul include { KRONA_KTIMPORTTAXONOMY } from '../modules/nf-core/krona/ktimporttaxonomy/main' include { KRAKENTOOLS_KREPORT2KRONA as KREPORT2KRONA_CENTRIFUGE } from '../modules/nf-core/krakentools/kreport2krona/main' include { CAT_FASTQ } from '../modules/nf-core/cat/fastq/main' +include { MEGAHIT } from '../modules/nf-core/megahit/main' +include { SPADES as METASPADES } from '../modules/nf-core/spades/main' +include { SPADES as METASPADESHYBRID } from '../modules/nf-core/spades/main' include { GUNZIP as GUNZIP_ASSEMBLIES } from '../modules/nf-core/gunzip' +include { GUNZIP as GUNZIP_ASSEMBLYINPUT } from '../modules/nf-core/gunzip' include { PRODIGAL } from '../modules/nf-core/prodigal/main' include { PROKKA } from '../modules/nf-core/prokka/main' include { MMSEQS_DATABASES } from '../modules/nf-core/mmseqs/databases/main' @@ -56,140 +56,131 @@ include { METAEUK_EASYPREDICT } from '../modul // // MODULE: Local to the pipeline // -include { BOWTIE2_REMOVAL_BUILD as BOWTIE2_HOST_REMOVAL_BUILD } from '../modules/local/bowtie2_removal_build' -include { BOWTIE2_REMOVAL_ALIGN as BOWTIE2_HOST_REMOVAL_ALIGN } from '../modules/local/bowtie2_removal_align' -include { BOWTIE2_REMOVAL_BUILD as BOWTIE2_PHIX_REMOVAL_BUILD } from '../modules/local/bowtie2_removal_build' -include { BOWTIE2_REMOVAL_ALIGN as BOWTIE2_PHIX_REMOVAL_ALIGN } from '../modules/local/bowtie2_removal_align' -include { FILTLONG } from '../modules/local/filtlong' -include { KRAKEN2_DB_PREPARATION } from '../modules/local/kraken2_db_preparation' -include { KRAKEN2 } from '../modules/local/kraken2' -include { POOL_SINGLE_READS as POOL_SHORT_SINGLE_READS } from 
'../modules/local/pool_single_reads' -include { POOL_PAIRED_READS } from '../modules/local/pool_paired_reads' -include { POOL_SINGLE_READS as POOL_LONG_READS } from '../modules/local/pool_single_reads' -include { MEGAHIT } from '../modules/local/megahit' -include { SPADES } from '../modules/local/spades' -include { SPADESHYBRID } from '../modules/local/spadeshybrid' -include { QUAST } from '../modules/local/quast' -include { QUAST_BINS } from '../modules/local/quast_bins' -include { QUAST_BINS_SUMMARY } from '../modules/local/quast_bins_summary' -include { CAT_DB } from '../modules/local/cat_db' -include { CAT_DB_GENERATE } from '../modules/local/cat_db_generate' -include { CAT } from '../modules/local/cat' -include { CAT_SUMMARY } from "../modules/local/cat_summary" -include { BIN_SUMMARY } from '../modules/local/bin_summary' -include { COMBINE_TSV as COMBINE_SUMMARY_TSV } from '../modules/local/combine_tsv' - -//////////////////////////////////////////////////// -/* -- Create channel for reference databases -- */ -//////////////////////////////////////////////////// - -if ( params.host_genome ) { - host_fasta = params.genomes[params.host_genome].fasta ?: false - ch_host_fasta = Channel - .value(file( "${host_fasta}" )) - host_bowtie2index = params.genomes[params.host_genome].bowtie2 ?: false - ch_host_bowtie2index = Channel - .value(file( "${host_bowtie2index}/*" )) -} else if ( params.host_fasta ) { - ch_host_fasta = Channel - .value(file( "${params.host_fasta}" )) -} else { - ch_host_fasta = Channel.empty() -} +include { BOWTIE2_REMOVAL_BUILD as BOWTIE2_HOST_REMOVAL_BUILD } from '../modules/local/bowtie2_removal_build' +include { BOWTIE2_REMOVAL_ALIGN as BOWTIE2_HOST_REMOVAL_ALIGN } from '../modules/local/bowtie2_removal_align' +include { BOWTIE2_REMOVAL_BUILD as BOWTIE2_PHIX_REMOVAL_BUILD } from '../modules/local/bowtie2_removal_build' +include { BOWTIE2_REMOVAL_ALIGN as BOWTIE2_PHIX_REMOVAL_ALIGN } from '../modules/local/bowtie2_removal_align' +include { KRAKEN2_DB_PREPARATION } from '../modules/local/kraken2_db_preparation' +include { KRAKEN2 } from '../modules/local/kraken2' +include { POOL_SINGLE_READS as POOL_SHORT_SINGLE_READS } from '../modules/local/pool_single_reads' +include { POOL_PAIRED_READS } from '../modules/local/pool_paired_reads' +include { POOL_SINGLE_READS as POOL_LONG_READS } from '../modules/local/pool_single_reads' +include { QUAST } from '../modules/local/quast' +include { QUAST_BINS } from '../modules/local/quast_bins' +include { QUAST_BINS_SUMMARY } from '../modules/local/quast_bins_summary' +include { CAT_DB } from '../modules/local/cat_db' +include { CAT_DB_GENERATE } from '../modules/local/cat_db_generate' +include { CAT } from '../modules/local/cat' +include { CAT_SUMMARY } from '../modules/local/cat_summary' +include { BIN_SUMMARY } from '../modules/local/bin_summary' +include { COMBINE_TSV as COMBINE_SUMMARY_TSV } from '../modules/local/combine_tsv' -if (params.busco_db) { - ch_busco_db = file(params.busco_db, checkIfExists: true) -} else { - ch_busco_db = [] -} +workflow MAG { + take: + ch_raw_short_reads // channel: samplesheet read in from --input + ch_raw_long_reads + ch_input_assemblies -if(params.checkm_db) { - ch_checkm_db = file(params.checkm_db, checkIfExists: true) -} + main: -if (params.gunc_db) { - ch_gunc_db = file(params.gunc_db, checkIfExists: true) -} else { - ch_gunc_db = Channel.empty() -} + ch_versions = Channel.empty() + ch_multiqc_files = Channel.empty() -if(params.kraken2_db){ - ch_kraken2_db_file = file(params.kraken2_db, 
checkIfExists: true) -} else { - ch_kraken2_db_file = [] -} + //////////////////////////////////////////////////// + /* -- Create channel for reference databases -- */ + //////////////////////////////////////////////////// -if(params.cat_db){ - ch_cat_db_file = Channel - .value(file( "${params.cat_db}" )) -} else { - ch_cat_db_file = Channel.empty() -} + if (params.host_genome) { + host_fasta = params.genomes[params.host_genome].fasta ?: false + ch_host_fasta = Channel.value(file("${host_fasta}")) + host_bowtie2index = params.genomes[params.host_genome].bowtie2 ?: false + ch_host_bowtie2index = Channel.value(file("${host_bowtie2index}/*")) + } + else if (params.host_fasta) { + ch_host_fasta = Channel.value(file("${params.host_fasta}")) + } + else { + ch_host_fasta = Channel.empty() + } -if(params.krona_db){ - ch_krona_db_file = Channel - .value(file( "${params.krona_db}" )) -} else { - ch_krona_db_file = Channel.empty() -} + if (params.busco_db) { + ch_busco_db = file(params.busco_db, checkIfExists: true) + } + else { + ch_busco_db = [] + } -if(!params.keep_phix) { - ch_phix_db_file = Channel - .value(file( "${params.phix_reference}" )) -} + if (params.checkm_db) { + ch_checkm_db = file(params.checkm_db, checkIfExists: true) + } -if (!params.keep_lambda) { - ch_nanolyse_db = Channel - .value(file( "${params.lambda_reference}" )) -} + if (params.gunc_db) { + ch_gunc_db = file(params.gunc_db, checkIfExists: true) + } + else { + ch_gunc_db = Channel.empty() + } -if (params.genomad_db){ - ch_genomad_db = file(params.genomad_db, checkIfExists: true) -} else { - ch_genomad_db = Channel.empty() -} + if (params.kraken2_db) { + ch_kraken2_db_file = file(params.kraken2_db, checkIfExists: true) + } + else { + ch_kraken2_db_file = [] + } -gtdb = ( params.skip_binqc || params.skip_gtdbtk ) ? false : params.gtdb_db + if (params.cat_db) { + ch_cat_db_file = Channel.value(file("${params.cat_db}")) + } + else { + ch_cat_db_file = Channel.empty() + } -if (gtdb) { - gtdb = file( "${gtdb}", checkIfExists: true) - gtdb_mash = params.gtdb_mash ? file("${params.gtdb_mash}", checkIfExists: true) : [] -} else { - gtdb = [] -} + if (params.krona_db) { + ch_krona_db_file = Channel.value(file("${params.krona_db}")) + } + else { + ch_krona_db_file = Channel.empty() + } -if(params.metaeuk_db && !params.skip_metaeuk) { - ch_metaeuk_db = Channel. - value(file("${params.metaeuk_db}", checkIfExists: true)) -} else { - ch_metaeuk_db = Channel.empty() -} + if (!params.keep_phix) { + ch_phix_db_file = Channel.value(file("${params.phix_reference}")) + } -/* -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - RUN MAIN WORKFLOW -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -*/ + if (!params.keep_lambda) { + ch_nanolyse_db = Channel.value(file("${params.lambda_reference}")) + } -// Additional info for completion email and summary -def busco_failed_bins = [:] + if (params.genomad_db) { + ch_genomad_db = file(params.genomad_db, checkIfExists: true) + } + else { + ch_genomad_db = Channel.empty() + } -workflow MAG { + gtdb = params.skip_binqc || params.skip_gtdbtk ? false : params.gtdb_db - take: - ch_raw_short_reads // channel: samplesheet read in from --input - ch_raw_long_reads - ch_input_assemblies + if (gtdb) { + gtdb = file("${gtdb}", checkIfExists: true) + gtdb_mash = params.gtdb_mash ? 
file("${params.gtdb_mash}", checkIfExists: true) : [] + } + else { + gtdb = [] + } - main: + if (params.metaeuk_db && !params.skip_metaeuk) { + ch_metaeuk_db = Channel.value(file("${params.metaeuk_db}", checkIfExists: true)) + } + else { + ch_metaeuk_db = Channel.empty() + } - ch_versions = Channel.empty() - ch_multiqc_files = Channel.empty() + // Additional info for completion email and summary + def busco_failed_bins = [:] // Get checkM database if not supplied - if ( !params.skip_binqc && params.binqc_tool == 'checkm' && !params.checkm_db ) { - ARIA2_UNTAR (params.checkm_download_url) + if (!params.skip_binqc && params.binqc_tool == 'checkm' && !params.checkm_db) { + ARIA2_UNTAR(params.checkm_download_url) ch_checkm_db = ARIA2_UNTAR.out.downloaded_file } @@ -200,25 +191,22 @@ workflow MAG { ch_versions = ch_versions.mix(MMSEQS_DATABASES.out.versions) } - - - /* ================================================================================ Preprocessing and QC for short reads ================================================================================ */ - FASTQC_RAW ( + FASTQC_RAW( ch_raw_short_reads ) ch_versions = ch_versions.mix(FASTQC_RAW.out.versions.first()) ch_bowtie2_removal_host_multiqc = Channel.empty() - if ( !params.assembly_input ) { - if ( !params.skip_clipping ) { - if ( params.clip_tool == 'fastp' ) { - ch_clipmerge_out = FASTP ( + if (!params.assembly_input) { + if (!params.skip_clipping) { + if (params.clip_tool == 'fastp') { + ch_clipmerge_out = FASTP( ch_raw_short_reads, [], params.fastp_save_trimmed_fail, @@ -226,70 +214,72 @@ workflow MAG { ) ch_short_reads_prepped = FASTP.out.reads ch_versions = ch_versions.mix(FASTP.out.versions.first()) - - } else if ( params.clip_tool == 'adapterremoval' ) { + } + else if (params.clip_tool == 'adapterremoval') { // due to strange output file scheme in AR2, have to manually separate // SE/PE to allow correct pulling of reads after. 
- ch_adapterremoval_in = ch_raw_short_reads - .branch { - single: it[0]['single_end'] - paired: !it[0]['single_end'] - } + ch_adapterremoval_in = ch_raw_short_reads.branch { + single: it[0]['single_end'] + paired: !it[0]['single_end'] + } - ADAPTERREMOVAL_PE ( ch_adapterremoval_in.paired, [] ) - ADAPTERREMOVAL_SE ( ch_adapterremoval_in.single, [] ) + ADAPTERREMOVAL_PE(ch_adapterremoval_in.paired, []) + ADAPTERREMOVAL_SE(ch_adapterremoval_in.single, []) - ch_short_reads_prepped = Channel.empty() - ch_short_reads_prepped = ch_short_reads_prepped.mix(ADAPTERREMOVAL_SE.out.singles_truncated, ADAPTERREMOVAL_PE.out.paired_truncated) + ch_short_reads_prepped = Channel.empty() + ch_short_reads_prepped = ch_short_reads_prepped.mix(ADAPTERREMOVAL_SE.out.singles_truncated, ADAPTERREMOVAL_PE.out.paired_truncated) ch_versions = ch_versions.mix(ADAPTERREMOVAL_PE.out.versions.first(), ADAPTERREMOVAL_SE.out.versions.first()) } - } else { + } + else { ch_short_reads_prepped = ch_raw_short_reads } - if (params.host_fasta){ - if ( params.host_fasta_bowtie2index ) { + if (params.host_fasta) { + if (params.host_fasta_bowtie2index) { ch_host_bowtie2index = file(params.host_fasta_bowtie2index, checkIfExists: true) - } else { - BOWTIE2_HOST_REMOVAL_BUILD ( + } + else { + BOWTIE2_HOST_REMOVAL_BUILD( ch_host_fasta ) ch_host_bowtie2index = BOWTIE2_HOST_REMOVAL_BUILD.out.index } - } ch_bowtie2_removal_host_multiqc = Channel.empty() - if (params.host_fasta || params.host_genome){ - BOWTIE2_HOST_REMOVAL_ALIGN ( + if (params.host_fasta || params.host_genome) { + BOWTIE2_HOST_REMOVAL_ALIGN( ch_short_reads_prepped, ch_host_bowtie2index ) ch_short_reads_hostremoved = BOWTIE2_HOST_REMOVAL_ALIGN.out.reads ch_bowtie2_removal_host_multiqc = BOWTIE2_HOST_REMOVAL_ALIGN.out.log ch_versions = ch_versions.mix(BOWTIE2_HOST_REMOVAL_ALIGN.out.versions.first()) - } else { + } + else { ch_short_reads_hostremoved = ch_short_reads_prepped } - if(!params.keep_phix) { - BOWTIE2_PHIX_REMOVAL_BUILD ( + if (!params.keep_phix) { + BOWTIE2_PHIX_REMOVAL_BUILD( ch_phix_db_file ) - BOWTIE2_PHIX_REMOVAL_ALIGN ( + BOWTIE2_PHIX_REMOVAL_ALIGN( ch_short_reads_hostremoved, BOWTIE2_PHIX_REMOVAL_BUILD.out.index ) ch_short_reads_phixremoved = BOWTIE2_PHIX_REMOVAL_ALIGN.out.reads ch_versions = ch_versions.mix(BOWTIE2_PHIX_REMOVAL_ALIGN.out.versions.first()) - } else { + } + else { ch_short_reads_phixremoved = ch_short_reads_hostremoved } if (!(params.keep_phix && params.skip_clipping && !(params.host_genome || params.host_fasta))) { - FASTQC_TRIMMED ( + FASTQC_TRIMMED( ch_short_reads_phixremoved ) ch_versions = ch_versions.mix(FASTQC_TRIMMED.out.versions) @@ -298,62 +288,60 @@ workflow MAG { // Run/Lane merging ch_short_reads_forcat = ch_short_reads_phixremoved - .map { - meta, reads -> - def meta_new = meta - meta.subMap('run') - [ meta_new, reads ] + .map { meta, reads -> + def meta_new = meta - meta.subMap('run') + [meta_new, reads] } .groupTuple() - .branch { - meta, reads -> - cat: reads.size() >= 2 // SE: [[meta], [S1_R1, S2_R1]]; PE: [[meta], [[S1_R1, S1_R2], [S2_R1, S2_R2]]] - skip_cat: true // Can skip merging if only single lanes + .branch { meta, reads -> + cat: reads.size() >= 2 + skip_cat: true } - CAT_FASTQ ( ch_short_reads_forcat.cat.map { meta, reads -> [ meta, reads.flatten() ]} ) + CAT_FASTQ(ch_short_reads_forcat.cat.map { meta, reads -> [meta, reads.flatten()] }) // Ensure we don't have nests of nests so that structure is in form expected for assembly - ch_short_reads_catskipped = ch_short_reads_forcat.skip_cat - .map { meta, reads -> - 
def new_reads = meta.single_end ? reads[0] : reads.flatten() - [ meta, new_reads ] - } + ch_short_reads_catskipped = ch_short_reads_forcat.skip_cat.map { meta, reads -> + def new_reads = meta.single_end ? reads[0] : reads.flatten() + [meta, new_reads] + } // Combine single run and multi-run-merged data ch_short_reads = Channel.empty() ch_short_reads = CAT_FASTQ.out.reads.mix(ch_short_reads_catskipped) - ch_versions = ch_versions.mix(CAT_FASTQ.out.versions.first()) + ch_versions = ch_versions.mix(CAT_FASTQ.out.versions.first()) - if ( params.bbnorm ) { - if ( params.coassemble_group ) { + if (params.bbnorm) { + if (params.coassemble_group) { // Interleave pairs, to be able to treat them as single ends when calling bbnorm. This prepares // for dropping the single_end parameter, but keeps assembly modules as they are, i.e. not // accepting a mix of single end and pairs. - SEQTK_MERGEPE ( - ch_short_reads.filter { ! it[0].single_end } + SEQTK_MERGEPE( + ch_short_reads.filter { !it[0].single_end } ) ch_versions = ch_versions.mix(SEQTK_MERGEPE.out.versions.first()) // Combine the interleaved pairs with any single end libraries. Set the meta.single_end to true (used by the bbnorm module). - ch_bbnorm = SEQTK_MERGEPE.out.reads - .mix(ch_short_reads.filter { it[0].single_end }) - .map { [ [ id: sprintf("group%s", it[0].group), group: it[0].group, single_end: true ], it[1] ] } - .groupTuple() - } else { + ch_bbnorm = SEQTK_MERGEPE.out.reads + .mix(ch_short_reads.filter { it[0].single_end }) + .map { [[id: sprintf("group%s", it[0].group), group: it[0].group, single_end: true], it[1]] } + .groupTuple() + } + else { ch_bbnorm = ch_short_reads } - BBMAP_BBNORM ( ch_bbnorm ) + BBMAP_BBNORM(ch_bbnorm) ch_versions = ch_versions.mix(BBMAP_BBNORM.out.versions) ch_short_reads_assembly = BBMAP_BBNORM.out.fastq - } else { + } + else { ch_short_reads_assembly = ch_short_reads } - } else { - ch_short_reads = ch_raw_short_reads - .map { - meta, reads -> - def meta_new = meta - meta.subMap('run') - [ meta_new, reads ] - } + } + else { + ch_short_reads = ch_raw_short_reads.map { meta, reads -> + def meta_new = meta - meta.subMap('run') + [meta_new, reads] + } } /* @@ -361,55 +349,15 @@ workflow MAG { Preprocessing and QC for long reads ================================================================================ */ - NANOPLOT_RAW ( - ch_raw_long_reads - ) - ch_versions = ch_versions.mix(NANOPLOT_RAW.out.versions.first()) - ch_long_reads = ch_raw_long_reads - .map { - meta, reads -> - def meta_new = meta - meta.subMap('run') - [ meta_new, reads ] - } - - if ( !params.assembly_input ) { - if (!params.skip_adapter_trimming) { - PORECHOP_PORECHOP ( - ch_raw_long_reads - ) - ch_long_reads = PORECHOP_PORECHOP.out.reads - ch_versions = ch_versions.mix(PORECHOP_PORECHOP.out.versions.first()) - } - - if (!params.keep_lambda) { - NANOLYSE ( - ch_long_reads, - ch_nanolyse_db - ) - ch_long_reads = NANOLYSE.out.fastq - ch_versions = ch_versions.mix(NANOLYSE.out.versions.first()) - } - - // join long and short reads by sample name - ch_short_reads_tmp = ch_short_reads - .map { meta, sr -> [ meta.id, meta, sr ] } - - ch_short_and_long_reads = ch_long_reads - .map { meta, lr -> [ meta.id, meta, lr ] } - .join(ch_short_reads_tmp, by: 0) - .map { id, meta_lr, lr, meta_sr, sr -> [ meta_lr, lr, sr[0], sr[1] ] } // should not occur for single-end, since SPAdes (hybrid) does not support single-end + LONGREAD_PREPROCESSING( + ch_raw_long_reads, + ch_short_reads, + ch_nanolyse_db + ) - FILTLONG ( - ch_short_and_long_reads - ) - 
ch_long_reads = FILTLONG.out.reads - ch_versions = ch_versions.mix(FILTLONG.out.versions.first()) - - NANOPLOT_FILTERED ( - ch_long_reads - ) - } + ch_versions = ch_versions.mix(LONGREAD_PREPROCESSING.out.versions) + ch_long_reads = LONGREAD_PREPROCESSING.out.long_reads /* ================================================================================ @@ -418,18 +366,20 @@ workflow MAG { */ // Centrifuge - if ( !params.centrifuge_db ) { + if (!params.centrifuge_db) { ch_db_for_centrifuge = Channel.empty() - } else { - if ( file(params.centrifuge_db).isDirectory() ) { + } + else { + if (file(params.centrifuge_db).isDirectory()) { ch_db_for_centrifuge = Channel.of(file(params.centrifuge_db, checkIfExists: true)) - } else { - ch_db_for_centrifuge = CENTRIFUGEDB_UNTAR ( Channel.of([[id: 'db'], file(params.centrifuge_db, checkIfExists: true)])).untar.map{it[1]}.first() + } + else { + ch_db_for_centrifuge = CENTRIFUGEDB_UNTAR(Channel.of([[id: 'db'], file(params.centrifuge_db, checkIfExists: true)])).untar.map { it[1] }.first() ch_versions = ch_versions.mix(CENTRIFUGEDB_UNTAR.out.versions.first()) } } - CENTRIFUGE_CENTRIFUGE ( + CENTRIFUGE_CENTRIFUGE( ch_short_reads, ch_db_for_centrifuge, false, @@ -437,70 +387,74 @@ workflow MAG { ) ch_versions = ch_versions.mix(CENTRIFUGE_CENTRIFUGE.out.versions.first()) - CENTRIFUGE_KREPORT ( CENTRIFUGE_CENTRIFUGE.out.results, ch_db_for_centrifuge ) + CENTRIFUGE_KREPORT(CENTRIFUGE_CENTRIFUGE.out.results, ch_db_for_centrifuge) ch_versions = ch_versions.mix(CENTRIFUGE_KREPORT.out.versions.first()) // Kraken2 - if ( !ch_kraken2_db_file.isEmpty() ) { - if ( ch_kraken2_db_file.extension in ['gz', 'tgz'] ) { + if (!ch_kraken2_db_file.isEmpty()) { + if (ch_kraken2_db_file.extension in ['gz', 'tgz']) { // Expects to be tar.gz! 
- ch_db_for_kraken2 = KRAKEN2_DB_PREPARATION ( ch_kraken2_db_file ).db - } else if ( ch_kraken2_db_file.isDirectory() ) { + ch_db_for_kraken2 = KRAKEN2_DB_PREPARATION(ch_kraken2_db_file).db + } + else if (ch_kraken2_db_file.isDirectory()) { ch_db_for_kraken2 = Channel - .fromPath( "${ch_kraken2_db_file}/*.k2d" ) - .collect() - .map{ - file -> - if (file.size() >= 3) { - def db_name = file[0].getParent().getName() - [ db_name, file ] - } else { - error("Kraken2 requires '{hash,opts,taxo}.k2d' files.") - } - } - } else { + .fromPath("${ch_kraken2_db_file}/*.k2d") + .collect() + .map { file -> + if (file.size() >= 3) { + def db_name = file[0].getParent().getName() + [db_name, file] + } + else { + error("Kraken2 requires '{hash,opts,taxo}.k2d' files.") + } + } + } + else { ch_db_for_kraken2 = Channel.empty() } - } else { + } + else { ch_db_for_kraken2 = Channel.empty() } - KRAKEN2 ( + KRAKEN2( ch_short_reads, ch_db_for_kraken2 ) ch_versions = ch_versions.mix(KRAKEN2.out.versions.first()) - if (( params.centrifuge_db || params.kraken2_db ) && !params.skip_krona){ - if (params.krona_db){ + if ((params.centrifuge_db || params.kraken2_db) && !params.skip_krona) { + if (params.krona_db) { ch_krona_db = ch_krona_db_file - } else { - KRONA_KRONADB () + } + else { + KRONA_KRONADB() ch_krona_db = KRONA_KRONADB.out.db ch_versions = ch_versions.mix(KRONA_KRONADB.out.versions) } - if ( params.centrifuge_db ) { - ch_centrifuge_for_krona = KREPORT2KRONA_CENTRIFUGE ( CENTRIFUGE_KREPORT.out.kreport ).txt.map{ meta, files -> ['centrifuge', meta, files] } + if (params.centrifuge_db) { + ch_centrifuge_for_krona = KREPORT2KRONA_CENTRIFUGE(CENTRIFUGE_KREPORT.out.kreport).txt.map { meta, files -> ['centrifuge', meta, files] } ch_versions = ch_versions.mix(KREPORT2KRONA_CENTRIFUGE.out.versions.first()) - } else { + } + else { ch_centrifuge_for_krona = Channel.empty() } // Join together for Krona ch_tax_classifications = ch_centrifuge_for_krona - .mix(KRAKEN2.out.results_for_krona) - .map { classifier, meta, report -> - def meta_new = meta + [classifier: classifier] - [ meta_new, report ] - } + .mix(KRAKEN2.out.results_for_krona) + .map { classifier, meta, report -> + def meta_new = meta + [classifier: classifier] + [meta_new, report] + } - KRONA_KTIMPORTTAXONOMY ( + KRONA_KTIMPORTTAXONOMY( ch_tax_classifications, ch_krona_db ) ch_versions = ch_versions.mix(KRONA_KTIMPORTTAXONOMY.out.versions.first()) - } /* @@ -509,137 +463,146 @@ workflow MAG { ================================================================================ */ - if ( !params.assembly_input ) { - // Co-assembly: prepare grouping for MEGAHIT and for pooling for SPAdes + if (!params.assembly_input) { + + // Co-assembly preparation: grouping for MEGAHIT and for pooling for SPAdes if (params.coassemble_group) { // short reads // group and set group as new id ch_short_reads_grouped = ch_short_reads_assembly - .map { meta, reads -> [ meta.group, meta, reads ] } + .map { meta, reads -> [meta.group, meta, reads] } .groupTuple(by: 0) .map { group, metas, reads -> - def assemble_as_single = params.single_end || ( params.bbnorm && params.coassemble_group ) - def meta = [:] - meta.id = "group-$group" - meta.group = group - meta.single_end = assemble_as_single - if ( assemble_as_single ) [ meta, reads.collect { it }, [] ] - else [ meta, reads.collect { it[0] }, reads.collect { it[1] } ] + def assemble_as_single = params.single_end || (params.bbnorm && params.coassemble_group) + def meta = [:] + meta.id = "group-${group}" + meta.group = group + meta.single_end = 
assemble_as_single + if (assemble_as_single) { + [meta, reads.collect { it }, []] + } + else { + [meta, reads.collect { it[0] }, reads.collect { it[1] }] + } } // long reads // group and set group as new id ch_long_reads_grouped = ch_long_reads - .map { meta, reads -> [ meta.group, meta, reads ] } + .map { meta, reads -> [meta.group, meta, reads] } .groupTuple(by: 0) .map { group, metas, reads -> def meta = [:] - meta.id = "group-$group" - meta.group = group - [ meta, reads.collect { it } ] + meta.id = "group-${group}" + meta.group = group + [meta, reads.collect { it }] } - } else { + } + else { ch_short_reads_grouped = ch_short_reads_assembly .filter { it[0].single_end } - .map { meta, reads -> [ meta, [ reads ], [] ] } - .mix ( - ch_short_reads_assembly - .filter { ! it[0].single_end } - .map { meta, reads -> [ meta, [ reads[0] ], [ reads[1] ] ] } + .map { meta, reads -> [meta, [reads], []] } + .mix( + ch_short_reads_assembly.filter { !it[0].single_end }.map { meta, reads -> [meta, [reads[0]], [reads[1]]] } ) ch_long_reads_grouped = ch_long_reads } - ch_assemblies = Channel.empty() - - if (!params.skip_megahit){ - MEGAHIT ( ch_short_reads_grouped ) - ch_megahit_assemblies = MEGAHIT.out.assembly - .map { meta, assembly -> - def meta_new = meta + [assembler: 'MEGAHIT'] - [ meta_new, assembly ] + if (!params.skip_spades || !params.skip_spadeshybrid) { + if (params.coassemble_group) { + if (params.bbnorm) { + ch_short_reads_spades = ch_short_reads_grouped.map { [it[0], it[1]] } } - ch_assemblies = ch_assemblies.mix(ch_megahit_assemblies) - ch_versions = ch_versions.mix(MEGAHIT.out.versions.first()) - } - - // Co-assembly: pool reads for SPAdes - if ( ! params.skip_spades || ! params.skip_spadeshybrid ){ - if ( params.coassemble_group ) { - if ( params.bbnorm ) { - ch_short_reads_spades = ch_short_reads_grouped.map { [ it[0], it[1] ] } - } else { - POOL_SHORT_SINGLE_READS ( - ch_short_reads_grouped - .filter { it[0].single_end } + else { + POOL_SHORT_SINGLE_READS( + ch_short_reads_grouped.filter { it[0].single_end } ) - POOL_PAIRED_READS ( - ch_short_reads_grouped - .filter { ! 
it[0].single_end } + POOL_PAIRED_READS( + ch_short_reads_grouped.filter { !it[0].single_end } ) - ch_short_reads_spades = POOL_SHORT_SINGLE_READS.out.reads - .mix(POOL_PAIRED_READS.out.reads) + ch_short_reads_spades = POOL_SHORT_SINGLE_READS.out.reads.mix(POOL_PAIRED_READS.out.reads) } - } else { + } + else { ch_short_reads_spades = ch_short_reads_assembly } // long reads - if (!params.single_end && !params.skip_spadeshybrid){ - POOL_LONG_READS ( ch_long_reads_grouped ) + if (!params.single_end && !params.skip_spadeshybrid) { + POOL_LONG_READS(ch_long_reads_grouped) ch_long_reads_spades = POOL_LONG_READS.out.reads - } else { + } + else { ch_long_reads_spades = Channel.empty() } - } else { + } + else { ch_short_reads_spades = Channel.empty() - ch_long_reads_spades = Channel.empty() + ch_long_reads_spades = Channel.empty() } - if (!params.single_end && !params.skip_spades){ - SPADES ( ch_short_reads_spades ) - ch_spades_assemblies = SPADES.out.assembly - .map { meta, assembly -> - def meta_new = meta + [assembler: 'SPAdes'] - [ meta_new, assembly ] - } - ch_assemblies = ch_assemblies.mix(ch_spades_assemblies) - ch_versions = ch_versions.mix(SPADES.out.versions.first()) + // Assembly + + ch_assembled_contigs = Channel.empty() + + if (!params.single_end && !params.skip_spades) { + METASPADES(ch_short_reads_spades.map { meta, reads -> [meta, reads, [], []] }, [], []) + ch_spades_assemblies = METASPADES.out.scaffolds.map { meta, assembly -> + def meta_new = meta + [assembler: 'SPAdes'] + [meta_new, assembly] + } + ch_assembled_contigs = ch_assembled_contigs.mix(ch_spades_assemblies) + ch_versions = ch_versions.mix(METASPADES.out.versions.first()) } - if (!params.single_end && !params.skip_spadeshybrid){ - ch_short_reads_spades_tmp = ch_short_reads_spades - .map { meta, reads -> [ meta.id, meta, reads ] } + if (!params.single_end && !params.skip_spadeshybrid) { + ch_short_reads_spades_tmp = ch_short_reads_spades.map { meta, reads -> [meta.id, meta, reads] } ch_reads_spadeshybrid = ch_long_reads_spades - .map { meta, reads -> [ meta.id, meta, reads ] } + .map { meta, reads -> [meta.id, meta, reads] } .combine(ch_short_reads_spades_tmp, by: 0) - .map { id, meta_long, long_reads, meta_short, short_reads -> [ meta_short, long_reads, short_reads ] } + .map { id, meta_long, long_reads, meta_short, short_reads -> [meta_short, short_reads, [], long_reads] } - SPADESHYBRID ( ch_reads_spadeshybrid ) - ch_spadeshybrid_assemblies = SPADESHYBRID.out.assembly - .map { meta, assembly -> - def meta_new = meta + [assembler: "SPAdesHybrid"] - [ meta_new, assembly ] - } - ch_assemblies = ch_assemblies.mix(ch_spadeshybrid_assemblies) - ch_versions = ch_versions.mix(SPADESHYBRID.out.versions.first()) + METASPADESHYBRID(ch_reads_spadeshybrid, [], []) + ch_spadeshybrid_assemblies = METASPADESHYBRID.out.scaffolds.map { meta, assembly -> + def meta_new = meta + [assembler: "SPAdesHybrid"] + [meta_new, assembly] + } + ch_assembled_contigs = ch_assembled_contigs.mix(ch_spadeshybrid_assemblies) + ch_versions = ch_versions.mix(METASPADESHYBRID.out.versions.first()) } - } else { - ch_assemblies_split = ch_input_assemblies - .branch { meta, assembly -> - gzipped: assembly.getExtension() == "gz" - ungzip: true + + if (!params.skip_megahit) { + MEGAHIT(ch_short_reads_grouped) + ch_megahit_assemblies = MEGAHIT.out.contigs.map { meta, assembly -> + def meta_new = meta + [assembler: 'MEGAHIT'] + [meta_new, assembly] } + ch_assembled_contigs = ch_assembled_contigs.mix(ch_megahit_assemblies) + ch_versions = 
ch_versions.mix(MEGAHIT.out.versions.first()) + } + - GUNZIP_ASSEMBLIES(ch_assemblies_split.gzipped) + + GUNZIP_ASSEMBLIES(ch_assembled_contigs) ch_versions = ch_versions.mix(GUNZIP_ASSEMBLIES.out.versions) + ch_assemblies = GUNZIP_ASSEMBLIES.out.gunzip + } + else { + ch_assemblies_split = ch_input_assemblies.branch { meta, assembly -> + gzipped: assembly.getExtension() == "gz" + ungzip: true + } + + GUNZIP_ASSEMBLYINPUT(ch_assemblies_split.gzipped) + ch_versions = ch_versions.mix(GUNZIP_ASSEMBLYINPUT.out.versions) + ch_assemblies = Channel.empty() - ch_assemblies = ch_assemblies.mix(ch_assemblies_split.ungzip, GUNZIP_ASSEMBLIES.out.gunzip) + ch_assemblies = ch_assemblies.mix(ch_assemblies_split.ungzip, GUNZIP_ASSEMBLYINPUT.out.gunzip) } ch_quast_multiqc = Channel.empty() - if (!params.skip_quast){ - QUAST ( ch_assemblies ) + if (!params.skip_quast) { + QUAST(ch_assemblies) ch_versions = ch_versions.mix(QUAST.out.versions.first()) } @@ -649,8 +612,8 @@ workflow MAG { ================================================================================ */ - if (!params.skip_prodigal){ - PRODIGAL ( + if (!params.skip_prodigal) { + PRODIGAL( ch_assemblies, 'gff' ) @@ -663,7 +626,7 @@ workflow MAG { ================================================================================ */ - if (params.run_virus_identification){ + if (params.run_virus_identification) { VIRUS_IDENTIFICATION(ch_assemblies, ch_genomad_db) ch_versions = ch_versions.mix(VIRUS_IDENTIFICATION.out.versions.first()) } @@ -674,11 +637,11 @@ workflow MAG { ================================================================================ */ - ch_busco_summary = Channel.empty() - ch_checkm_summary = Channel.empty() + ch_busco_summary = Channel.empty() + ch_checkm_summary = Channel.empty() - if ( !params.skip_binning || params.ancient_dna ) { - BINNING_PREPARATION ( + if (!params.skip_binning || params.ancient_dna) { + BINNING_PREPARATION( ch_assemblies, ch_short_reads ) @@ -691,7 +654,7 @@ workflow MAG { ================================================================================ */ - if (params.ancient_dna){ + if (params.ancient_dna) { ANCIENT_DNA_ASSEMBLY_VALIDATION(BINNING_PREPARATION.out.grouped_mappings) ch_versions = ch_versions.mix(ANCIENT_DNA_ASSEMBLY_VALIDATION.out.versions.first()) } @@ -702,88 +665,81 @@ workflow MAG { ================================================================================ */ - if (!params.skip_binning){ + if (!params.skip_binning) { // Make sure if running aDNA subworkflow to use the damage-corrected contigs for higher accuracy if (params.ancient_dna && !params.skip_ancient_damagecorrection) { - BINNING ( - BINNING_PREPARATION.out.grouped_mappings - .join(ANCIENT_DNA_ASSEMBLY_VALIDATION.out.contigs_recalled) - .map { it -> [ it[0], it[4], it[2], it[3] ] }, // [meta, contigs_recalled, bam, bais] + BINNING( + BINNING_PREPARATION.out.grouped_mappings.join(ANCIENT_DNA_ASSEMBLY_VALIDATION.out.contigs_recalled).map { it -> [it[0], it[4], it[2], it[3]] }, ch_short_reads ) - } else { - BINNING ( + } + else { + BINNING( BINNING_PREPARATION.out.grouped_mappings, ch_short_reads ) } ch_versions = ch_versions.mix(BINNING.out.versions) - if ( params.bin_domain_classification ) { + if (params.bin_domain_classification) { // Make sure if running aDNA subworkflow to use the damage-corrected contigs for higher accuracy if (params.ancient_dna && !params.skip_ancient_damagecorrection) { ch_assemblies_for_domainclassification = ANCIENT_DNA_ASSEMBLY_VALIDATION.out.contigs_recalled - } else { + } + else { 
ch_assemblies_for_domainclassification = ch_assemblies } - DOMAIN_CLASSIFICATION ( ch_assemblies_for_domainclassification, BINNING.out.bins, BINNING.out.unbinned ) - ch_binning_results_bins = DOMAIN_CLASSIFICATION.out.classified_bins + DOMAIN_CLASSIFICATION(ch_assemblies_for_domainclassification, BINNING.out.bins, BINNING.out.unbinned) + ch_binning_results_bins = DOMAIN_CLASSIFICATION.out.classified_bins ch_binning_results_unbins = DOMAIN_CLASSIFICATION.out.classified_unbins - ch_versions = ch_versions.mix(DOMAIN_CLASSIFICATION.out.versions) - - - } else { - ch_binning_results_bins = BINNING.out.bins - .map { meta, bins -> - def meta_new = meta + [domain: 'unclassified'] - [meta_new, bins] - } - ch_binning_results_unbins = BINNING.out.unbinned - .map { meta, bins -> - def meta_new = meta + [domain: 'unclassified'] - [meta_new, bins] - } + ch_versions = ch_versions.mix(DOMAIN_CLASSIFICATION.out.versions) + } + else { + ch_binning_results_bins = BINNING.out.bins.map { meta, bins -> + def meta_new = meta + [domain: 'unclassified'] + [meta_new, bins] + } + ch_binning_results_unbins = BINNING.out.unbinned.map { meta, bins -> + def meta_new = meta + [domain: 'unclassified'] + [meta_new, bins] + } } /* * DAS Tool: binning refinement */ - ch_binning_results_bins = ch_binning_results_bins - .map { meta, bins -> - def meta_new = meta + [refinement:'unrefined'] - [meta_new , bins] - } + ch_binning_results_bins = ch_binning_results_bins.map { meta, bins -> + def meta_new = meta + [refinement: 'unrefined'] + [meta_new, bins] + } - ch_binning_results_unbins = ch_binning_results_unbins - .map { meta, bins -> - def meta_new = meta + [refinement:'unrefined_unbinned'] - [meta_new, bins] - } + ch_binning_results_unbins = ch_binning_results_unbins.map { meta, bins -> + def meta_new = meta + [refinement: 'unrefined_unbinned'] + [meta_new, bins] + } // If any two of the binners are both skipped at once, do not run because DAS_Tool needs at least one - if ( params.refine_bins_dastool ) { - ch_prokarya_bins_dastool = ch_binning_results_bins - .filter { meta, bins -> - meta.domain != "eukarya" - } + if (params.refine_bins_dastool) { + ch_prokarya_bins_dastool = ch_binning_results_bins.filter { meta, bins -> + meta.domain != "eukarya" + } - ch_eukarya_bins_dastool = ch_binning_results_bins - .filter { meta, bins -> - meta.domain == "eukarya" - } + ch_eukarya_bins_dastool = ch_binning_results_bins.filter { meta, bins -> + meta.domain == "eukarya" + } if (params.ancient_dna) { ch_contigs_for_binrefinement = ANCIENT_DNA_ASSEMBLY_VALIDATION.out.contigs_recalled - } else { - ch_contigs_for_binrefinement = BINNING_PREPARATION.out.grouped_mappings - .map{ meta, contigs, bam, bai -> [ meta, contigs ] } + } + else { + ch_contigs_for_binrefinement = BINNING_PREPARATION.out.grouped_mappings.map { meta, contigs, bam, bai -> [meta, contigs] } } - BINNING_REFINEMENT ( ch_contigs_for_binrefinement, ch_prokarya_bins_dastool ) + BINNING_REFINEMENT(ch_contigs_for_binrefinement, ch_prokarya_bins_dastool) // ch_refined_bins = ch_eukarya_bins_dastool // .map{ meta, bins -> // def meta_new = meta + [refinement: 'eukaryote_unrefined'] @@ -794,23 +750,26 @@ workflow MAG { ch_refined_unbins = BINNING_REFINEMENT.out.refined_unbins ch_versions = ch_versions.mix(BINNING_REFINEMENT.out.versions) - if ( params.postbinning_input == 'raw_bins_only' ) { - ch_input_for_postbinning_bins = ch_binning_results_bins + if (params.postbinning_input == 'raw_bins_only') { + ch_input_for_postbinning_bins = ch_binning_results_bins 
ch_input_for_postbinning_bins_unbins = ch_binning_results_bins.mix(ch_binning_results_unbins) - } else if ( params.postbinning_input == 'refined_bins_only' ) { - ch_input_for_postbinning_bins = ch_refined_bins + } + else if (params.postbinning_input == 'refined_bins_only') { + ch_input_for_postbinning_bins = ch_refined_bins ch_input_for_postbinning_bins_unbins = ch_refined_bins.mix(ch_refined_unbins) - } else if ( params.postbinning_input == 'both' ) { + } + else if (params.postbinning_input == 'both') { ch_all_bins = ch_binning_results_bins.mix(ch_refined_bins) - ch_input_for_postbinning_bins = ch_all_bins + ch_input_for_postbinning_bins = ch_all_bins ch_input_for_postbinning_bins_unbins = ch_all_bins.mix(ch_binning_results_unbins).mix(ch_refined_unbins) } - } else { - ch_input_for_postbinning_bins = ch_binning_results_bins + } + else { + ch_input_for_postbinning_bins = ch_binning_results_bins ch_input_for_postbinning_bins_unbins = ch_binning_results_bins.mix(ch_binning_results_unbins) } - DEPTHS ( ch_input_for_postbinning_bins_unbins, BINNING.out.metabat2depths, ch_short_reads ) + DEPTHS(ch_input_for_postbinning_bins_unbins, BINNING.out.metabat2depths, ch_short_reads) ch_input_for_binsummary = DEPTHS.out.depths_summary ch_versions = ch_versions.mix(DEPTHS.out.versions) @@ -820,74 +779,70 @@ workflow MAG { ch_input_bins_for_qc = ch_input_for_postbinning_bins_unbins.transpose() - if (!params.skip_binqc && params.binqc_tool == 'busco'){ + if (!params.skip_binqc && params.binqc_tool == 'busco') { /* * BUSCO subworkflow: Quantitative measures for the assessment of genome assembly */ - BUSCO_QC ( + BUSCO_QC( ch_busco_db, ch_input_bins_for_qc ) ch_busco_summary = BUSCO_QC.out.summary ch_versions = ch_versions.mix(BUSCO_QC.out.versions.first()) // process information if BUSCO analysis failed for individual bins due to no matching genes - BUSCO_QC.out - .failed_bin - .splitCsv(sep: '\t') - .map { bin, error -> if (!bin.contains(".unbinned.")) busco_failed_bins[bin] = error } + BUSCO_QC.out.failed_bin.splitCsv(sep: '\t').map { bin, error -> + if (!bin.contains(".unbinned.")) { + busco_failed_bins[bin] = error + } + } } - if (!params.skip_binqc && params.binqc_tool == 'checkm'){ + if (!params.skip_binqc && params.binqc_tool == 'checkm') { /* * CheckM subworkflow: Quantitative measures for the assessment of genome assembly */ - ch_input_bins_for_checkm = ch_input_bins_for_qc - .filter { meta, bins -> - meta.domain != "eukarya" - } + ch_input_bins_for_checkm = ch_input_bins_for_qc.filter { meta, bins -> + meta.domain != "eukarya" + } - CHECKM_QC ( + CHECKM_QC( ch_input_bins_for_checkm.groupTuple(), ch_checkm_db ) ch_checkm_summary = CHECKM_QC.out.summary ch_versions = ch_versions.mix(CHECKM_QC.out.versions) - } - if ( params.run_gunc && params.binqc_tool == 'checkm' ) { - GUNC_QC ( ch_input_bins_for_checkm, ch_gunc_db, CHECKM_QC.out.checkm_tsv ) - ch_versions = ch_versions.mix( GUNC_QC.out.versions ) - } else if ( params.run_gunc ) { - ch_input_bins_for_gunc = ch_input_for_postbinning_bins_unbins - .filter { meta, bins -> - meta.domain != "eukarya" - } - GUNC_QC ( ch_input_bins_for_qc, ch_gunc_db, [] ) - ch_versions = ch_versions.mix( GUNC_QC.out.versions ) + if (params.run_gunc && params.binqc_tool == 'checkm') { + GUNC_QC(ch_input_bins_for_checkm, ch_gunc_db, CHECKM_QC.out.checkm_tsv) + ch_versions = ch_versions.mix(GUNC_QC.out.versions) + } + else if (params.run_gunc) { + ch_input_bins_for_gunc = ch_input_for_postbinning_bins_unbins.filter { meta, bins -> + meta.domain != "eukarya" + } + 
GUNC_QC(ch_input_bins_for_qc, ch_gunc_db, []) + ch_versions = ch_versions.mix(GUNC_QC.out.versions) } ch_quast_bins_summary = Channel.empty() - if (!params.skip_quast){ + if (!params.skip_quast) { ch_input_for_quast_bins = ch_input_for_postbinning_bins_unbins - .groupTuple() - .map { - meta, bins -> - def new_bins = bins.flatten() - [meta, new_bins] - } - - QUAST_BINS ( ch_input_for_quast_bins ) + .groupTuple() + .map { meta, bins -> + def new_bins = bins.flatten() + [meta, new_bins] + } + + QUAST_BINS(ch_input_for_quast_bins) ch_versions = ch_versions.mix(QUAST_BINS.out.versions.first()) - ch_quast_bin_summary = QUAST_BINS.out.quast_bin_summaries - .collectFile(keepHeader: true) { - meta, summary -> - ["${meta.id}.tsv", summary] + ch_quast_bin_summary = QUAST_BINS.out.quast_bin_summaries.collectFile(keepHeader: true) { meta, summary -> + ["${meta.id}.tsv", summary] } - QUAST_BINS_SUMMARY ( ch_quast_bin_summary.collect() ) + QUAST_BINS_SUMMARY(ch_quast_bin_summary.collect()) ch_quast_bins_summary = QUAST_BINS_SUMMARY.out.summary } @@ -895,23 +850,22 @@ workflow MAG { * CAT: Bin Annotation Tool (BAT) are pipelines for the taxonomic classification of long DNA sequences and metagenome assembled genomes (MAGs/bins) */ ch_cat_db = Channel.empty() - if (params.cat_db){ - CAT_DB ( ch_cat_db_file ) + if (params.cat_db) { + CAT_DB(ch_cat_db_file) ch_cat_db = CAT_DB.out.db - } else if (params.cat_db_generate){ - CAT_DB_GENERATE () + } + else if (params.cat_db_generate) { + CAT_DB_GENERATE() ch_cat_db = CAT_DB_GENERATE.out.db } - CAT ( + CAT( ch_input_for_postbinning_bins_unbins, ch_cat_db ) // Group all classification results for each sample in a single file - ch_cat_summary = CAT.out.tax_classification_names - .collectFile(keepHeader: true) { - meta, classification -> - ["${meta.id}.txt", classification] - } + ch_cat_summary = CAT.out.tax_classification_names.collectFile(keepHeader: true) { meta, classification -> + ["${meta.id}.txt", classification] + } // Group all classification results for the whole run in a single file CAT_SUMMARY( ch_cat_summary.collect() @@ -920,9 +874,10 @@ workflow MAG { ch_versions = ch_versions.mix(CAT_SUMMARY.out.versions) // If CAT is not run, then the CAT global summary should be an empty channel - if ( params.cat_db_generate || params.cat_db) { + if (params.cat_db_generate || params.cat_db) { ch_cat_global_summary = CAT_SUMMARY.out.combined - } else { + } + else { ch_cat_global_summary = Channel.empty() } @@ -930,17 +885,16 @@ workflow MAG { * GTDB-tk: taxonomic classifications using GTDB reference */ - if ( !params.skip_gtdbtk ) { + if (!params.skip_gtdbtk) { ch_gtdbtk_summary = Channel.empty() - if ( gtdb ){ + if (gtdb) { - ch_gtdb_bins = ch_input_for_postbinning_bins_unbins - .filter { meta, bins -> - meta.domain != "eukarya" - } + ch_gtdb_bins = ch_input_for_postbinning_bins_unbins.filter { meta, bins -> + meta.domain != "eukarya" + } - GTDBTK ( + GTDBTK( ch_gtdb_bins, ch_busco_summary, ch_checkm_summary, @@ -950,12 +904,13 @@ workflow MAG { ch_versions = ch_versions.mix(GTDBTK.out.versions.first()) ch_gtdbtk_summary = GTDBTK.out.summary } - } else { + } + else { ch_gtdbtk_summary = Channel.empty() } - if ( ( !params.skip_binqc ) || !params.skip_quast || !params.skip_gtdbtk){ - BIN_SUMMARY ( + if ((!params.skip_binqc) || !params.skip_quast || !params.skip_gtdbtk) { + BIN_SUMMARY( ch_input_for_binsummary, ch_busco_summary.ifEmpty([]), ch_checkm_summary.ifEmpty([]), @@ -969,17 +924,18 @@ workflow MAG { * Prokka: Genome annotation */ - if (!params.skip_prokka){ - 
ch_bins_for_prokka = ch_input_for_postbinning_bins_unbins.transpose() - .map { meta, bin -> - def meta_new = meta + [id: bin.getBaseName()] - [ meta_new, bin ] - } - .filter { meta, bin -> - meta.domain != "eukarya" - } + if (!params.skip_prokka) { + ch_bins_for_prokka = ch_input_for_postbinning_bins_unbins + .transpose() + .map { meta, bin -> + def meta_new = meta + [id: bin.getBaseName()] + [meta_new, bin] + } + .filter { meta, bin -> + meta.domain != "eukarya" + } - PROKKA ( + PROKKA( ch_bins_for_prokka, [], [] @@ -988,16 +944,17 @@ workflow MAG { } if (!params.skip_metaeuk && (params.metaeuk_db || params.metaeuk_mmseqs_db)) { - ch_bins_for_metaeuk = ch_input_for_postbinning_bins_unbins.transpose() + ch_bins_for_metaeuk = ch_input_for_postbinning_bins_unbins + .transpose() .filter { meta, bin -> meta.domain in ["eukarya", "unclassified"] } .map { meta, bin -> def meta_new = meta + [id: bin.getBaseName()] - [ meta_new, bin ] + [meta_new, bin] } - METAEUK_EASYPREDICT (ch_bins_for_metaeuk, ch_metaeuk_db) + METAEUK_EASYPREDICT(ch_bins_for_metaeuk, ch_metaeuk_db) ch_versions = ch_versions.mix(METAEUK_EASYPREDICT.out.versions) } } @@ -1008,35 +965,42 @@ workflow MAG { softwareVersionsToYAML(ch_versions) .collectFile( storeDir: "${params.outdir}/pipeline_info", - name: 'nf_core_pipeline_software_mqc_versions.yml', + name: 'nf_core_' + 'pipeline_software_' + 'mqc_' + 'versions.yml', sort: true, newLine: true - ).set { ch_collated_versions } + ) + .set { ch_collated_versions } + // // MODULE: MultiQC // - ch_multiqc_config = Channel.fromPath( - "$projectDir/assets/multiqc_config.yml", checkIfExists: true) - ch_multiqc_custom_config = params.multiqc_config ? - Channel.fromPath(params.multiqc_config, checkIfExists: true) : - Channel.empty() - ch_multiqc_logo = params.multiqc_logo ? - Channel.fromPath(params.multiqc_logo, checkIfExists: true) : - Channel.fromPath("${workflow.projectDir}/docs/images/mag_logo_mascot_light.png", checkIfExists: true) - - summary_params = paramsSummaryMap( - workflow, parameters_schema: "nextflow_schema.json") + ch_multiqc_config = Channel.fromPath( + "${projectDir}/assets/multiqc_config.yml", + checkIfExists: true + ) + ch_multiqc_custom_config = params.multiqc_config + ? Channel.fromPath(params.multiqc_config, checkIfExists: true) + : Channel.empty() + ch_multiqc_logo = params.multiqc_logo + ? Channel.fromPath(params.multiqc_logo, checkIfExists: true) + : Channel.fromPath("${workflow.projectDir}/docs/images/mag_logo_mascot_light.png", checkIfExists: true) + + summary_params = paramsSummaryMap( + workflow, + parameters_schema: "nextflow_schema.json" + ) ch_workflow_summary = Channel.value(paramsSummaryMultiqc(summary_params)) - - ch_multiqc_custom_methods_description = params.multiqc_methods_description ? - file(params.multiqc_methods_description, checkIfExists: true) : - file("$projectDir/assets/methods_description_template.yml", checkIfExists: true) - ch_methods_description = Channel.value( - methodsDescriptionText(ch_multiqc_custom_methods_description)) - ch_multiqc_files = ch_multiqc_files.mix( - ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')) + ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml') + ) + ch_multiqc_custom_methods_description = params.multiqc_methods_description + ? 
+        ? file(params.multiqc_methods_description, checkIfExists: true)
+        : file("${projectDir}/assets/methods_description_template.yml", checkIfExists: true)
+    ch_methods_description = Channel.value(
+        methodsDescriptionText(ch_multiqc_custom_methods_description)
+    )
+
     ch_multiqc_files = ch_multiqc_files.mix(ch_collated_versions)
     ch_multiqc_files = ch_multiqc_files.mix(
         ch_methods_description.collectFile(
@@ -1045,70 +1009,66 @@ workflow MAG {
         )
     )

-    ch_multiqc_files = ch_multiqc_files.mix(FASTQC_RAW.out.zip.collect{it[1]}.ifEmpty([]))
+    ch_multiqc_files = ch_multiqc_files.mix(FASTQC_RAW.out.zip.collect { it[1] }.ifEmpty([]))
+    ch_multiqc_files = ch_multiqc_files.mix(LONGREAD_PREPROCESSING.out.multiqc_files.collect { it[1] }.ifEmpty([]))

     if (!params.assembly_input) {
-        if ( !params.skip_clipping && params.clip_tool == 'adapterremoval' ) {
-            ch_multiqc_files = ch_multiqc_files.mix(ADAPTERREMOVAL_PE.out.settings.collect{it[1]}.ifEmpty([]))
-            ch_multiqc_files = ch_multiqc_files.mix(ADAPTERREMOVAL_SE.out.settings.collect{it[1]}.ifEmpty([]))
-
-        } else if ( !params.skip_clipping && params.clip_tool == 'fastp' ) {
-            ch_multiqc_files = ch_multiqc_files.mix(FASTP.out.json.collect{it[1]}.ifEmpty([]))
+        if (!params.skip_clipping && params.clip_tool == 'adapterremoval') {
+            ch_multiqc_files = ch_multiqc_files.mix(ADAPTERREMOVAL_PE.out.settings.collect { it[1] }.ifEmpty([]))
+            ch_multiqc_files = ch_multiqc_files.mix(ADAPTERREMOVAL_SE.out.settings.collect { it[1] }.ifEmpty([]))
+        }
+        else if (!params.skip_clipping && params.clip_tool == 'fastp') {
+            ch_multiqc_files = ch_multiqc_files.mix(FASTP.out.json.collect { it[1] }.ifEmpty([]))
         }

         if (!(params.keep_phix && params.skip_clipping && !(params.host_genome || params.host_fasta))) {
-            ch_multiqc_files = ch_multiqc_files.mix(FASTQC_TRIMMED.out.zip.collect{it[1]}.ifEmpty([]))
+            ch_multiqc_files = ch_multiqc_files.mix(FASTQC_TRIMMED.out.zip.collect { it[1] }.ifEmpty([]))
         }

-        if ( params.host_fasta || params.host_genome ) {
-            ch_multiqc_files = ch_multiqc_files.mix(BOWTIE2_HOST_REMOVAL_ALIGN.out.log.collect{it[1]}.ifEmpty([]))
+        if (params.host_fasta || params.host_genome) {
+            ch_multiqc_files = ch_multiqc_files.mix(BOWTIE2_HOST_REMOVAL_ALIGN.out.log.collect { it[1] }.ifEmpty([]))
         }

-        if(!params.keep_phix) {
-            ch_multiqc_files = ch_multiqc_files.mix(BOWTIE2_PHIX_REMOVAL_ALIGN.out.log.collect{it[1]}.ifEmpty([]))
+        if (!params.keep_phix) {
+            ch_multiqc_files = ch_multiqc_files.mix(BOWTIE2_PHIX_REMOVAL_ALIGN.out.log.collect { it[1] }.ifEmpty([]))
         }
-
     }

-    ch_multiqc_files = ch_multiqc_files.mix(CENTRIFUGE_KREPORT.out.kreport.collect{it[1]}.ifEmpty([]))
-    ch_multiqc_files = ch_multiqc_files.mix(KRAKEN2.out.report.collect{it[1]}.ifEmpty([]))
+    ch_multiqc_files = ch_multiqc_files.mix(CENTRIFUGE_KREPORT.out.kreport.collect { it[1] }.ifEmpty([]))
+    ch_multiqc_files = ch_multiqc_files.mix(KRAKEN2.out.report.collect { it[1] }.ifEmpty([]))

-    if (!params.skip_quast){
+    if (!params.skip_quast) {
         ch_multiqc_files = ch_multiqc_files.mix(QUAST.out.report.collect().ifEmpty([]))

-        if ( !params.skip_binning ) {
+        if (!params.skip_binning) {
             ch_multiqc_files = ch_multiqc_files.mix(QUAST_BINS.out.dir.collect().ifEmpty([]))
         }
     }

-    if ( !params.skip_binning || params.ancient_dna ) {
+    if (!params.skip_binning || params.ancient_dna) {
         ch_multiqc_files = ch_multiqc_files.mix(BINNING_PREPARATION.out.bowtie2_assembly_multiqc.collect().ifEmpty([]))
     }

-    if (!params.skip_binning && !params.skip_prokka){
-        ch_multiqc_files = ch_multiqc_files.mix(PROKKA.out.txt.collect{it[1]}.ifEmpty([]))
+    if (!params.skip_binning && !params.skip_prokka) {
+        ch_multiqc_files = ch_multiqc_files.mix(PROKKA.out.txt.collect { it[1] }.ifEmpty([]))
     }

-    if (!params.skip_binning && !params.skip_binqc && params.binqc_tool == 'busco'){
+    if (!params.skip_binning && !params.skip_binqc && params.binqc_tool == 'busco') {
         ch_multiqc_files = ch_multiqc_files.mix(BUSCO_QC.out.multiqc.collect().ifEmpty([]))
     }

-    MULTIQC (
+    MULTIQC(
         ch_multiqc_files.collect(),
         ch_multiqc_config.toList(),
         ch_multiqc_custom_config.toList(),
-        ch_multiqc_logo.toList()
+        ch_multiqc_logo.toList(),
+        [],
+        []
     )

     emit:
     multiqc_report = MULTIQC.out.report.toList() // channel: /path/to/multiqc_report.html
-    versions       = ch_versions                 // channel: [ path(versions.yml) ]
+    versions = ch_versions // channel: [ path(versions.yml) ]
 }
-
-/*
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-    THE END
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-*/