Use nf-core template for the pipeline to facilitate standardisation (#84)

* Using `nf-core` 🍏  template on MTBseq-nf step 1️⃣  (#78)

* using base nf-core template from nf-core tools

* Update nf-core-mtbseq/main.nf

---------

Co-authored-by: Abhinav Sharma <[email protected]>

* updates using the last version

* creating all modules using nf-core template

* updating `tbamend`

* adding double backslash (`\\`) to `tbfull`

* updating `tbgroups`

* updating `tbpile`

* using meta.id instead of genomeFileName

* update `tbstrains` and `tbvariants`

* update `tbbwa`

* update `tbjoin`

* adding external args on `tbbwa` and `tbvariants`

* update tblist

* update tbstats

* update nf-core tool and run `nf-core create` again (#82)

* creating all modules using nf-core template

* updating `tbamend`

* adding double backslash (`\\`) to `tbfull`

* updating `tbgroups`

* updating `tbpile`

* using meta.id instead of genomeFileName

* update `tbstrains` and `tbvariants`

* update `tbbwa`

* update `tbjoin`

* adding external args on `tbbwa` and `tbvariants`

* update tblist

* update tbstats

* updating tbrefine

* using meta.id and params.project on tbpile and tbvariants tags

* re-write abhinav's suggestion

* update modules according to abhinav's suggestions

* move the changes to the correct folder

* updating the nf-core template using nf-core tools

* remove nf-core-mtbseq residual folder

* create a new nf-core-mtbseq using the nf-core tools template

* add the modules to the new template

* move the old workflows inside the new template

* merge the new template into the old pipeline

* remove template folder and add hidden files

* move subworkflows to correct location

* using sed to rename the pipeline accordingly

* creating basic pipeline logic

* first `-stub-run` viable version

* add container option

* fix config regex for tb processes

* dev-iteration-1

* add resources labels to all mtbseq processes

* due to a missing `\\`, the program was throwing exit code 1 even on success

* tweaking tbbwa input filename

* replace `genomeFileName` with `meta.id` to follow nf-core guidelines

* remove error on tbfull

* tweak processes output to export `meta` tuple instead of `meta.id`

* tweak processes input to `meta` instead of `meta.id`

* tweaking the tsv samplesheet generation (needs test)

---------

Co-authored-by: Davi Marcon <[email protected]>
abhi18av and Mxrcon authored Mar 15, 2024
1 parent fe4a1d5 commit 281216f
Showing 94 changed files with 4,503 additions and 554 deletions.
20 changes: 20 additions & 0 deletions .devcontainer/devcontainer.json
@@ -0,0 +1,20 @@
{
    "name": "nfcore",
    "image": "nfcore/gitpod:latest",
    "remoteUser": "gitpod",
    "runArgs": ["--privileged"],

    // Configure tool-specific properties.
    "customizations": {
        // Configure properties specific to VS Code.
        "vscode": {
            // Set *default* container specific settings.json values on container create.
            "settings": {
                "python.defaultInterpreterPath": "/opt/conda/bin/python"
            },

            // Add the IDs of extensions you want installed when the container is created.
            "extensions": ["ms-python.python", "ms-python.vscode-pylance", "nf-core.nf-core-extensionpack"]
        }
    }
}
37 changes: 37 additions & 0 deletions .editorconfig
@@ -0,0 +1,37 @@
root = true

[*]
charset = utf-8
end_of_line = lf
insert_final_newline = true
trim_trailing_whitespace = true
indent_size = 4
indent_style = space

[*.{md,yml,yaml,html,css,scss,js}]
indent_size = 2

# These files are edited and tested upstream in nf-core/modules
[/modules/nf-core/**]
charset = unset
end_of_line = unset
insert_final_newline = unset
trim_trailing_whitespace = unset
indent_style = unset
[/subworkflows/nf-core/**]
charset = unset
end_of_line = unset
insert_final_newline = unset
trim_trailing_whitespace = unset
indent_style = unset

[/assets/email*]
indent_size = unset

# ignore Readme
[README.md]
indent_style = unset

# ignore python
[*.{py,md}]
indent_style = unset
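EditorConfig applies every section whose glob matches a file, in file order, with later sections overriding earlier ones and the value `unset` removing a property that an earlier section set. The minimal Python sketch below resolves the effective settings for a path under the `.editorconfig` added in this commit. It is an illustration only: the glob translation is simplified (it handles `*`, `**`, `?` and flat `{a,b}` alternation, but not character classes, numeric ranges or nested braces), and `glob_to_regex` / `resolve` are hypothetical helpers, not part of any EditorConfig library.

```python
import re

# Properties cleared for vendored nf-core code (see the .editorconfig above).
UNSET_ALL = {
    "charset": "unset",
    "end_of_line": "unset",
    "insert_final_newline": "unset",
    "trim_trailing_whitespace": "unset",
    "indent_style": "unset",
}

# Sections of the .editorconfig above, in file order (later sections win).
SECTIONS = [
    ("*", {"charset": "utf-8", "end_of_line": "lf", "insert_final_newline": "true",
           "trim_trailing_whitespace": "true", "indent_size": "4", "indent_style": "space"}),
    ("*.{md,yml,yaml,html,css,scss,js}", {"indent_size": "2"}),
    ("/modules/nf-core/**", UNSET_ALL),
    ("/subworkflows/nf-core/**", UNSET_ALL),
    ("/assets/email*", {"indent_size": "unset"}),
    ("README.md", {"indent_style": "unset"}),
    ("*.{py,md}", {"indent_style": "unset"}),
]


def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a simplified EditorConfig glob into a compiled regex."""
    # A glob containing "/" is relative to the .editorconfig directory;
    # otherwise it may match a file at any depth.
    anchored = "/" in glob
    g = glob.lstrip("/")
    out, i = [], 0
    while i < len(g):
        if g.startswith("**", i):
            out.append(".*")  # any characters, including "/"
            i += 2
            continue
        c = g[i]
        if c == "*":
            out.append("[^/]*")  # any characters except "/"
        elif c == "?":
            out.append("[^/]")
        elif c == "{":
            end = g.index("}", i)
            alternatives = g[i + 1:end].split(",")
            out.append("(?:" + "|".join(re.escape(a) for a in alternatives) + ")")
            i = end + 1
            continue
        else:
            out.append(re.escape(c))
        i += 1
    prefix = "" if anchored else "(?:.*/)?"
    return re.compile("^" + prefix + "".join(out) + "$")


def resolve(path: str) -> dict:
    """Effective properties for a path relative to the repository root."""
    props = {}
    for glob, section in SECTIONS:
        if glob_to_regex(glob).match(path):
            props.update(section)
    # "unset" removes whatever an earlier section had set.
    return {k: v for k, v in props.items() if v != "unset"}


print(resolve("modules/nf-core/fastqc/main.nf"))  # {'indent_size': '4'}
```

In other words, files vendored from nf-core/modules keep only the inherited `indent_size`, while Markdown files get two-space indentation with no enforced `indent_style`.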
4 changes: 4 additions & 0 deletions .gitattributes
@@ -0,0 +1,4 @@
*.config linguist-language=nextflow
*.nf.test linguist-language=nextflow
modules/nf-core/** linguist-generated
subworkflows/nf-core/** linguist-generated
20 changes: 20 additions & 0 deletions .gitpod.yml
@@ -0,0 +1,20 @@
image: nfcore/gitpod:latest
tasks:
  - name: Update Nextflow and setup pre-commit
    command: |
      pre-commit install --install-hooks
      nextflow self-update
  - name: unset JAVA_TOOL_OPTIONS
    command: |
      unset JAVA_TOOL_OPTIONS
vscode:
  extensions: # based on nf-core.nf-core-extensionpack
    - esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code
    - EditorConfig.EditorConfig # override user/workspace settings with settings found in .editorconfig files
    - Gruntfuggly.todo-tree # Display TODO and FIXME in a tree view in the activity bar
    - mechatroner.rainbow-csv # Highlight columns in csv files in different colors
    # - nextflow.nextflow # Nextflow syntax highlighting
    - oderwat.indent-rainbow # Highlight indentation level
    - streetsidesoftware.code-spell-checker # Spelling checker for source code
    - charliermarsh.ruff # Code linter Ruff
50 changes: 50 additions & 0 deletions .nf-core.yml
@@ -0,0 +1,50 @@
lint:
  files_exist:
    - CODE_OF_CONDUCT.md
    - assets/nf-core-mtbseqnf_logo_light.png
    - docs/images/nf-core-mtbseqnf_logo_light.png
    - docs/images/nf-core-mtbseqnf_logo_dark.png
    - .github/ISSUE_TEMPLATE/config.yml
    - .github/workflows/awstest.yml
    - .github/workflows/awsfulltest.yml
    - .github/ISSUE_TEMPLATE/bug_report.yml
    - .github/ISSUE_TEMPLATE/feature_request.yml
    - .github/PULL_REQUEST_TEMPLATE.md
    - .github/CONTRIBUTING.md
    - .github/.dockstore.yml
    - .gitignore
    - .github/workflows/branch.yml
    - .github/workflows/ci.yml
    - .github/workflows/linting_comment.yml
    - .github/workflows/linting.yml
    - conf/igenomes.config
  files_unchanged:
    - CODE_OF_CONDUCT.md
    - assets/nf-core-mtbseqnf_logo_light.png
    - docs/images/nf-core-mtbseqnf_logo_light.png
    - docs/images/nf-core-mtbseqnf_logo_dark.png
    - .github/ISSUE_TEMPLATE/bug_report.yml
    - .github/ISSUE_TEMPLATE/config.yml
    - .github/ISSUE_TEMPLATE/feature_request.yml
    - .github/PULL_REQUEST_TEMPLATE.md
    - .github/workflows/branch.yml
    - .github/workflows/linting_comment.yml
    - .github/workflows/linting.yml
    - .github/CONTRIBUTING.md
    - .github/.dockstore.yml
    - .github/ISSUE_TEMPLATE/bug_report.yml
  multiqc_config:
    - report_comment
  nextflow_config:
    - manifest.name
    - manifest.homePage
  readme:
    - nextflow_badge
repository_type: pipeline
template:
  prefix: mtbseq-nf
  skip:
    - github
    - ci
    - github_badges
    - igenomes
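Under `lint:`, each key names an `nf-core lint` test and the list beneath it names the files or checks that test should skip (this is our reading of the nf-core tools lint configuration). A small stdlib-only Python sketch can make that explicit, and it also surfaces a redundant entry: `.github/ISSUE_TEMPLATE/bug_report.yml` is listed twice under `files_unchanged`. The dictionary is hand-transcribed from the `lint:` block above (the `files_exist` list is omitted for brevity; a real check would parse the YAML file), and `is_skipped` / `duplicate_entries` are hypothetical helpers.

```python
from collections import Counter

# Hand-transcribed subset of the `lint:` block of .nf-core.yml (files_exist omitted).
LINT = {
    "files_unchanged": [
        "CODE_OF_CONDUCT.md",
        "assets/nf-core-mtbseqnf_logo_light.png",
        "docs/images/nf-core-mtbseqnf_logo_light.png",
        "docs/images/nf-core-mtbseqnf_logo_dark.png",
        ".github/ISSUE_TEMPLATE/bug_report.yml",
        ".github/ISSUE_TEMPLATE/config.yml",
        ".github/ISSUE_TEMPLATE/feature_request.yml",
        ".github/PULL_REQUEST_TEMPLATE.md",
        ".github/workflows/branch.yml",
        ".github/workflows/linting_comment.yml",
        ".github/workflows/linting.yml",
        ".github/CONTRIBUTING.md",
        ".github/.dockstore.yml",
        ".github/ISSUE_TEMPLATE/bug_report.yml",
    ],
    "multiqc_config": ["report_comment"],
    "nextflow_config": ["manifest.name", "manifest.homePage"],
    "readme": ["nextflow_badge"],
}


def is_skipped(test: str, item: str) -> bool:
    """An item listed under a lint test is excluded from that test."""
    return item in LINT.get(test, [])


def duplicate_entries(config: dict) -> dict:
    """Map each lint test to the entries listed more than once under it."""
    found = {}
    for test, items in config.items():
        dupes = sorted(item for item, n in Counter(items).items() if n > 1)
        if dupes:
            found[test] = dupes
    return found


print(duplicate_entries(LINT))
```

Because the lint configuration is only used for membership checks, the repeated entry should not change behaviour; removing it would simply tidy the file.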
10 changes: 10 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,10 @@
repos:
  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: "v3.1.0"
    hooks:
      - id: prettier
  - repo: https://github.com/editorconfig-checker/editorconfig-checker.python
    rev: "2.7.3"
    hooks:
      - id: editorconfig-checker
        alias: ec
12 changes: 12 additions & 0 deletions .prettierignore
@@ -0,0 +1,12 @@
email_template.html
adaptivecard.json
slackreport.json
.nextflow*
work/
data/
results/
.DS_Store
testing/
testing*
*.pyc
bin/
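Prettier reads `.prettierignore` using `.gitignore` syntax. To make the effect of the patterns above concrete, here is a minimal Python sketch that assumes simplified semantics sufficient for this particular file: no pattern contains an inner `/` or a `!` negation, so each pattern may match any single path component, and a trailing `/` restricts the match to directories. `is_ignored` is a hypothetical helper, not Prettier's own matcher.

```python
from fnmatch import fnmatchcase

# Patterns from the .prettierignore above.
PATTERNS = [
    "email_template.html", "adaptivecard.json", "slackreport.json", ".nextflow*",
    "work/", "data/", "results/", ".DS_Store", "testing/", "testing*", "*.pyc", "bin/",
]


def is_ignored(path: str) -> bool:
    """True if a file path (relative to the repo root) matches any pattern.

    Ignoring a directory ignores everything beneath it, so a pattern only
    needs to match one component of the path.
    """
    parts = path.split("/")
    for pattern in PATTERNS:
        dir_only = pattern.endswith("/")
        name = pattern.rstrip("/")
        # For a file path, every component except the last is a directory.
        candidates = parts[:-1] if dir_only else parts
        if any(fnmatchcase(part, name) for part in candidates):
            return True
    return False


for p in ["work/ab/123456/.command.sh", "assets/email_template.html", "main.nf"]:
    print(p, is_ignored(p))
```

Note that `testing*` already covers everything `testing/` matches, so one of the two entries is redundant.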
1 change: 1 addition & 0 deletions .prettierrc.yml
@@ -0,0 +1 @@
printWidth: 120
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,16 @@
# mycobactopia-org/MTBseq-nf: Changelog

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v1.0dev - [date]

Initial release of mycobactopia-org/MTBseq-nf, created with the [nf-core](https://nf-co.re/) template.

### `Added`

### `Fixed`

### `Dependencies`

### `Deprecated`
41 changes: 41 additions & 0 deletions CITATIONS.md
@@ -0,0 +1,41 @@
# mycobactopia-org/MTBseq-nf: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) Abhinav Sharma (@abhi18av) and Davi Marcon (@mxrcon)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
95 changes: 61 additions & 34 deletions README.md
@@ -1,54 +1,81 @@
-# MTBseq-nf
+## Introduction

-`MTBseq-nf` pipeline makes [MTBseq](https://github.com/ngs-fzb/MTBseq_source) simple and easy to use via [Nextflow](https://www.nextflow.io/) workflow manager.
+**mycobactopia-org/MTBseq-nf** is a bioinformatics pipeline that ...

-# Benefits of the Nextflow wrapper
+<!-- TODO nf-core:
+Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the
+major pipeline sections and the types of output it produces. You're giving an overview to someone new
+to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction
+-->

-- Ability to analyze genomes in **parallel** in addition to the default execution mode.
-- Fine-grain control over resource (CPU/Memory/Storage) allocation
-- Use of bioconda and biocontainers for installing packages for reproducibility
-- Ease of use on a range of infrastructure
-  - Local machine - A strong server/laptop
-  - Cloud - Azure / AWS
-  - On-prem clusters - SLURM / PBS
-- Resumability for failed processes
-- Centralized locations for specifying
-  - MTBseq parameters (`default_params.config`)
-  - Hardware requirements (`conf/standard.config`)
-  - Software requirements (`conf/docker.config` or `conf/conda.config`)
-- Dedicated user interface for all parameters for wider audience (`nextflow_schema.json`). This allows [Nextflow Tower](tower.nf/) to generate a launch form dynamically.
-- Easier customizability for the pipeline, using explicit parameters (`default_params.config`).
+<!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core
+workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. -->
+<!-- TODO nf-core: Fill in short bullet-pointed list of the default steps in the pipeline -->

-# Parallel execution of MTBseq via MTBseq-nf
+1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))
+2. Present QC for raw reads ([`MultiQC`](http://multiqc.info/))

-![](./docs/MTBseq-nf-modes.png)
+## Usage

-This pipeline add a new option for running MTBseq with paralellization using nextflow to control the resource utilization, as well optimizing the overall time to run it.
+> [!NOTE]
+> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
-## Normal and Parallel workflows
+<!-- TODO nf-core: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
+Explain what rows and columns represent. For instance (please edit as appropriate):
-This pipeline has two execution types: normal and parallel and here is a visual representation.
+First, prepare a samplesheet with your input data that looks as follows:
-The execution type is determined by the presence of `parallel` parameter.
+`samplesheet.csv`:
-## What are the differences between `Normal` and `Parallel` analysis modes?
+```csv
+sample,fastq_1,fastq_2
+CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
+```
-A normal MTBseq run would use `MTBseq --step full` and all samples would move to the next stage of the analysis in sync with each other, hence not allowing parallelization of analysis for samples which have been analyzed at a given step. Steps like `TB BWA` and `TB Variants` and leading to suboptimal usage of the available hardware.
+Each row represents a fastq file (single-end) or a pair of fastq files (paired end).
-Using `--parallel` run we enforce the parallelization of each step. The main advantage of it is the precise resource usage as the steps are controlled by Nextflow, and some steps require less CPUs and RAM than other, this allow us to optimize the run time and resource costs.
+-->

-# Installation and Usage
+Now, you can run the pipeline using:

-For installation and usage please refer the dedicated [INSTALL](./docs/INSTALL.md) and [USAGE](./docs/USAGE.md) documents.
+<!-- TODO nf-core: update the following command to include all required parameters for a minimal example -->

-# Contributions
+```bash
+nextflow run mycobactopia-org/MTBseq-nf \
+   -profile <docker/singularity/.../institute> \
+   --input samplesheet.csv \
+   --outdir <OUTDIR>
+```

-Contributions are warmly accepted!
+> [!WARNING]
+> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
+> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).
-# License
+## Credits

-The inspiration for this project is [MTBseq](https://github.com/ngs-fzb/MTBseq_source) which is released under a GPL-3 license as of [v1.0.3](https://github.com/ngs-fzb/MTBseq_source/blob/v1.0.3/LICENSE.md).
+mycobactopia-org/MTBseq-nf was originally written by Abhinav Sharma (@abhi18av) and Davi Marcon (@mxrcon).

-The components related to `MTBseq-nf` project itself (the Nextflow wrapper code) are licensed under the liberal MPL-2.0 license.
+We thank the following people for their extensive assistance in the development of this pipeline:

-We would like to thank the developers of MTBseq for putting in the initial effort!
+<!-- TODO nf-core: If applicable, make list of people who have also contributed -->
+
+## Contributions and Support
+
+If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md).
+
+## Citations
+
+<!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. -->
+<!-- If you use mycobactopia-org/MTBseq-nf for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) -->
+
+<!-- TODO nf-core: Add bibliography of tools and data used in your pipeline -->
+
+An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file.
+
+This pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/master/LICENSE).
+
+> **The nf-core framework for community-curated bioinformatics pipelines.**
+>
+> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
+>
+> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).
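The template samplesheet in the new README (still inside a TODO comment) has a `sample` column plus `fastq_1` and an optional `fastq_2`, one row per single-end file or read pair. That shape maps onto the nf-core `[ meta, reads ]` tuple convention the commit message refers to when it switches processes from `genomeFileName` to `meta.id`. The Python sketch below only illustrates that mapping: `parse_samplesheet` is a hypothetical helper, the `single_end` key follows common nf-core practice rather than anything this commit confirms, and the pipeline itself would build these tuples in Nextflow.

```python
import csv
import io


def parse_samplesheet(text: str) -> list:
    """Turn samplesheet CSV text into (meta, reads) tuples, one per row."""
    entries = []
    # Line 1 is the header, so data rows start at line 2
    # (assumes no quoted fields spanning several lines).
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        sample = (row.get("sample") or "").strip()
        fastq_1 = (row.get("fastq_1") or "").strip()
        fastq_2 = (row.get("fastq_2") or "").strip()
        if not sample or not fastq_1:
            raise ValueError(f"line {line_no}: 'sample' and 'fastq_1' are required")
        # An empty fastq_2 marks a single-end sample.
        meta = {"id": sample, "single_end": not fastq_2}
        reads = [fastq_1, fastq_2] if fastq_2 else [fastq_1]
        entries.append((meta, reads))
    return entries


EXAMPLE = """sample,fastq_1,fastq_2
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz
"""

for meta, reads in parse_samplesheet(EXAMPLE):
    print(meta, reads)
```

Keeping the sample name in `meta["id"]` rather than in a bare string is what lets downstream steps carry extra per-sample fields without changing every process signature.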