Add kraken2 phylogenetic assignment subworkflow #47

Open: wants to merge 53 commits into base: dev
Commits
036daf0 first commit with kraken2 module (ctuni, Oct 28, 2024)
e7c1e71 added a database to the config (ctuni, Oct 28, 2024)
4549e1a added kraken2 param to schema (ctuni, Oct 28, 2024)
c158510 readded modules (ctuni, Oct 28, 2024)
d058ccf fixed kraken2 name (ctuni, Oct 28, 2024)
e942664 added missing kraken2 options (ctuni, Oct 28, 2024)
033df6e fixed something in the linting (ctuni, Oct 28, 2024)
c5edd67 added output to multiqc (ctuni, Oct 28, 2024)
6c55393 changed the kraken2 channels (ctuni, Oct 28, 2024)
59c4108 removed kraken2 from mutiqc for now (ctuni, Oct 28, 2024)
bf8eaa2 added kraken2 reports to the multqc channel (ctuni, Oct 28, 2024)
a9aba3e trying other methods to creae the multiqc files channel (ctuni, Oct 28, 2024)
07bd918 trying to pass kraken2 report to multiqc (ctuni, Oct 28, 2024)
f4dc19a why is multiqc not working? (ctuni, Oct 28, 2024)
d430a27 why is multiqc not working? (ctuni, Oct 28, 2024)
3e0d357 ugh (ctuni, Oct 28, 2024)
b6c0d88 removed kraken2 from multiqc files (ctuni, Oct 28, 2024)
ff8efdd updated schema (ctuni, Oct 28, 2024)
5b602dd Merge branch 'dev' into feature/kraken2 (ctuni, Oct 28, 2024)
7b2f665 fixing some things (ctuni, Oct 28, 2024)
ca597f7 changed schema version (ctuni, Oct 28, 2024)
400a76b trying to unbreak things (ctuni, Oct 28, 2024)
4906699 added prettier (ctuni, Oct 28, 2024)
54ce903 further unbreaking things (ctuni, Oct 28, 2024)
a3ef27b re-updated schema (ctuni, Oct 28, 2024)
4b7aaad prettier (ctuni, Oct 28, 2024)
0192830 forgot to change changelog (ctuni, Oct 28, 2024)
a6f5c6b added krona plots to the emit block and added check for uncompressed … (ctuni, Oct 28, 2024)
d8c31ec fixed typo (ctuni, Oct 29, 2024)
7e41b19 several improvements (ctuni, Oct 29, 2024)
82c68e7 prettier (ctuni, Oct 29, 2024)
3e9b674 updated citations (ctuni, Oct 29, 2024)
dbb92c1 Update docs/output.md (ctuni, Oct 29, 2024)
0324ead updated output (ctuni, Oct 29, 2024)
f1759e1 updated output (ctuni, Oct 29, 2024)
dd9e07c Merge branch 'dev' into feature/kraken2 (ctuni, Oct 30, 2024)
828abd6 added kraken2 reports to multiqc (ctuni, Oct 30, 2024)
31c1f82 Merge branch 'feature/kraken2' of https://github.com/ctuni/seqinspect… (ctuni, Oct 30, 2024)
3a22a9e schema (ctuni, Oct 30, 2024)
5a4ce9c Merge branch 'dev' into feature/kraken2 (ctuni, Oct 30, 2024)
34b766d Merge branch 'feature/kraken2' of https://github.com/ctuni/seqinspect… (ctuni, Oct 30, 2024)
b698d41 prettier (ctuni, Oct 30, 2024)
6dd4859 Update workflows/seqinspector.nf (ctuni, Oct 30, 2024)
07dee47 disabled the publish of the taxonomy file (ctuni, Oct 30, 2024)
abea138 Merge branch 'feature/kraken2' of https://github.com/ctuni/seqinspect… (ctuni, Oct 30, 2024)
a60d573 removed default database for a null one (ctuni, Oct 30, 2024)
2488887 added test kraken2 database to the test configs (ctuni, Oct 30, 2024)
e28397a fixed typo (ctuni, Oct 30, 2024)
9d8f631 miseq test is failing (ctuni, Oct 30, 2024)
5d1a15b miseq test is failing (ctuni, Oct 30, 2024)
4729ada promethion test is failing (ctuni, Oct 30, 2024)
bb8e3b0 novaseq test is failing (ctuni, Oct 30, 2024)
758ce12 novaseq test is failing (ctuni, Oct 30, 2024)
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@ Initial release of nf-core/seqinspector, created with the [nf-core](https://nf-c
- [#20](https://github.com/nf-core/seqinspector/pull/20) Use tags to generate group reports
- [#13](https://github.com/nf-core/seqinspector/pull/13) Generate reports per run, per project and per lane.
- [#49](https://github.com/nf-core/seqinspector/pull/49) Merge with template 3.0.2.
- [#47](https://github.com/nf-core/seqinspector/pull/47) Added kraken2 subworkflow
- [#50](https://github.com/nf-core/seqinspector/pull/50) Add an optional subsampling step.
- [#51](https://github.com/nf-core/seqinspector/pull/51) Add nf-test to CI.
- [#63](https://github.com/nf-core/seqinspector/pull/63) Contribution guidelines added about displaying results for new tools
8 changes: 8 additions & 0 deletions CITATIONS.md
@@ -14,6 +14,14 @@

> Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [Kraken2](https://doi.org/10.1186/s13059-019-1891-0)

> Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome Biology, 20(1), 257. https://doi.org/10.1186/s13059-019-1891-0

- [Krona](https://doi.org/10.1186/1471-2105-12-385)

> Ondov, B. D., Bergman, N. H., & Phillippy, A. M. (2011). Interactive metagenomic visualization in a Web browser. BMC Bioinformatics, 12. https://doi.org/10.1186/1471-2105-12-385

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.
34 changes: 34 additions & 0 deletions conf/modules.config
@@ -26,6 +26,40 @@ process {
        ext.args = '--quiet'
    }

    withName: 'KRAKEN2_KRAKEN2' {
        publishDir = [
Review comment: I think there is already a general statement for this in this [file](https://github.com/nf-core/seqinspector/blob/31c1f829d97c4b98d21b68beed4af050fd331a37/conf/modules.config#L15), so I don't think it needs to be added twice, unless that bit is going to be removed later?

Author reply: There is! I added it here for two reasons. The first is that I wanted more descriptive folder names (kraken2_reports instead of just kraken2), and I wanted the krona plots to sit inside the kraken2_reports folder, also under a more descriptive name.
The second reason for this seemingly redundant code is that kraken2 and kronatools can produce more output than they do now. I left these lines here with the future in mind: they may need to be modified depending on the needs of the pipeline once it reaches a more stable state.

            path: { "${params.outdir}/kraken2_reports" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }

    withName: 'KRONA_KTUPDATETAXONOMY' {
        publishDir = [
            path: { "${params.outdir}/kraken2_reports/krona_reports" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
            enabled: false
        ]
    }

    withName: 'KRONA_KTIMPORTTAXONOMY' {
        publishDir = [
Review comment: same as before

            path: { "${params.outdir}/kraken2_reports/krona_reports" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }

    withName: 'UNTAR' {
        publishDir = [
Review comment: I'm not sure you want to output the kraken db: its size can be huge (depending on the one selected), and the user has already downloaded it, so it is already on their device. You may want to use storeDir if you want to store the db and reuse it later without publishing it in the output.
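
A minimal config sketch of the storeDir idea, assuming the directive is set from the pipeline config rather than by patching the module. The cache path is a placeholder and is not part of this PR:

```groovy
process {
    withName: 'UNTAR' {
        // Hypothetical cache location; any persistent directory outside the work dir would do.
        // If the expected output already exists here, Nextflow skips the task and reuses it.
        storeDir = '/path/to/kraken2_db_cache'
    }
}
```

With this approach the unpacked database would live in the cache directory rather than under the results folder, so the publishDir block for UNTAR would probably no longer be needed.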

Author reply: I see what you mean! I could patch the UNTAR module to add the storeDir directive; that would also need some changes to the config, but it can be done.

In any case, to avoid wasting space on the uncompressed database, the pipeline behaves differently depending on whether the user provides a gzipped or an uncompressed database. If the database is gzipped, the UNTAR module uncompresses it and the pipeline uses it, but by default the uncompressed copy is not saved.

Publishing the uncompressed kraken2 db is turned off by default through params.save_uncompressed_k2db, which is set to false. In modules.config, this parameter is read by the enabled option of publishDir.

If the database is already uncompressed and the user passes its path to the kraken2_db param, the UNTAR module is not called; the database is simply used in place and remains in the user's original directory.
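
The branching described above could be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the channel name `ch_kraken2_db`, the `.tar.gz` suffix check and the `id` value are hypothetical, and the `untar` output name is assumed from the nf-core UNTAR module.

```groovy
// Hypothetical sketch of the database handling inside the workflow body.
ch_kraken2_db = Channel.empty()

if (params.kraken2_db) {
    if (params.kraken2_db.toString().endsWith('.tar.gz')) {
        // Compressed database: unpack it in the work directory.
        // It is only published when params.save_uncompressed_k2db is true.
        UNTAR ( [ [ id: 'kraken2_db' ], file(params.kraken2_db, checkIfExists: true) ] )
        ch_kraken2_db = UNTAR.out.untar.map { meta, db_dir -> db_dir }
    } else {
        // Uncompressed database: use the directory in place; UNTAR is never called.
        ch_kraken2_db = Channel.value(file(params.kraken2_db, checkIfExists: true))
    }
}
```

Either branch ends with a channel holding the database directory, so the downstream KRAKEN2_KRAKEN2 call would not need to know which form the user supplied.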

            path: { "${params.outdir}/kraken2_db" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
            enabled: params.save_uncompressed_k2db
        ]
    }

    withName: 'MULTIQC_GLOBAL' {
        ext.args = { params.multiqc_title ? "--title \"$params.multiqc_title\"" : '' }
        publishDir = [
4 changes: 4 additions & 0 deletions conf/test.config
@@ -29,4 +29,8 @@ params {

    // Genome references
    genome = 'R64-1-1'

    // Kraken options
    // Database information: https://github.com/nf-core/test-datasets/blob/taxprofiler/README.md#kraken2
    kraken2_db = 'https://github.com/nf-core/test-datasets/raw/taxprofiler/data/database/kraken2/testdb-kraken2.tar.gz'
}
4 changes: 4 additions & 0 deletions conf/test_full.config
@@ -21,4 +21,8 @@ params {

    // Genome references
    genome = 'R64-1-1'

    // Kraken
    // Database information: https://github.com/nf-core/test-datasets/blob/taxprofiler/README.md#kraken2
    kraken2_db = 'https://github.com/nf-core/test-datasets/raw/taxprofiler/data/database/kraken2/testdb-kraken2.tar.gz'
}
37 changes: 37 additions & 0 deletions docs/output.md
@@ -12,6 +12,8 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d

- [Seqtk](#seqtk) - Subsample a specific number of reads per sample
- [FastQC](#fastqc) - Raw read QC
- [Kraken2](#kraken2) - Phylogenetic assignment of reads using k-mers
- [Krona](#krona) - Interactive visualization of Kraken2 results
- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

@@ -40,6 +42,41 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d

[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/).

### Kraken2

[Kraken](https://ccb.jhu.edu/software/kraken2/) is a taxonomic sequence classifier that assigns taxonomic labels to DNA sequences. Kraken examines the k-mers within a query sequence and uses the information within those k-mers to query a database. That database maps k-mers to the lowest common ancestor (LCA) of all genomes known to contain a given k-mer.

<details markdown="1">
<summary>Output files</summary>

- `kraken2/`
  - `<sample>.kraken2.report.txt`: A report containing information on the phylogenetic assignment of reads in a given sample.
  - `<db_name>/`
    - `<sample_id>_<db_name>.classified.fastq.gz`: FASTQ file containing all reads that had a hit against a reference in the database for a given sample
    - `<sample_id>_<db_name>.unclassified.fastq.gz`: FASTQ file containing all reads that did not have a hit in the database for a given sample
    - `<sample_id>_<db_name>.classifiedreads.txt`: A list of read IDs and the hits each read had against each database for a given sample

</details>

The main taxonomic classification file from Kraken2 is the `*report.txt` file. It gives you the most information for a single sample.
You will only receive the `.fastq` and `*classifiedreads.txt` files if you supply the `--kraken2_save_reads` and/or `--kraken2_save_readclassifications` parameters to the pipeline.

### Krona

[Krona](https://github.com/marbl/Krona) allows the exploration of (metagenomic) hierarchical data with interactive zooming, multi-layered pie charts.

Krona charts will be generated by the pipeline for supported tools (Kraken2, Centrifuge, Kaiju, and MALT).

<details markdown="1">
<summary>Output files</summary>

- `krona/`
  - `<tool_name>_<db_name>.html`: per-tool/per-database interactive HTML file containing hierarchical piecharts

</details>

The resulting HTML files can be loaded into your web browser for exploration. Each file will have a dropdown to allow you to switch between each sample aligned against the given database of the tool.

### MultiQC

nf-core/seqinspector will generate the following MultiQC reports:
20 changes: 20 additions & 0 deletions modules.json
@@ -10,11 +10,31 @@
    "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
    "installed_by": ["modules"]
},
"kraken2/kraken2": {
    "branch": "master",
    "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
    "installed_by": ["modules"]
},
"krona/ktimporttaxonomy": {
    "branch": "master",
    "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
    "installed_by": ["modules"]
},
"krona/ktupdatetaxonomy": {
    "branch": "master",
    "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
    "installed_by": ["modules"]
},
"multiqc": {
    "branch": "master",
    "git_sha": "cf17ca47590cc578dfb47db1c2a44ef86f89976d",
    "installed_by": ["modules"]
},
"untar": {
    "branch": "master",
    "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
    "installed_by": ["modules"]
},
"seqtk/sample": {
    "branch": "master",
    "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
7 changes: 7 additions & 0 deletions modules/nf-core/kraken2/kraken2/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

85 changes: 85 additions & 0 deletions modules/nf-core/kraken2/kraken2/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

99 changes: 99 additions & 0 deletions modules/nf-core/kraken2/kraken2/meta.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
