Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DSL2: Qualimap #998

Closed
wants to merge 29 commits into from
Closed
Show file tree
Hide file tree
Changes from 6 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
2811ea3
Adding first version of qualimap (not functional yet)
jbv2 Jun 2, 2023
b4da458
adding snpcapture as channel
jbv2 Jun 6, 2023
56f9ae6
changing mqc output
jbv2 Jun 9, 2023
5e5d357
Merge branch 'dev' into qualimap
jbv2 Jun 9, 2023
fe23a31
minor changes, waiting for docker
jbv2 Jun 9, 2023
10c6d2d
Apply suggestions from code review
jbv2 Jun 23, 2023
0014cd0
Apply suggestions from code review
jbv2 Jun 23, 2023
f4797ed
Apply suggestions from code review
jbv2 Jun 30, 2023
f00573b
prettier
jbv2 Jun 30, 2023
abecece
Apply suggestions from code review
jbv2 Jun 30, 2023
31f473f
Apply suggestions from code review
jbv2 Jun 30, 2023
8e053c4
minor changes after merging reviews
jbv2 Jun 30, 2023
c4c1daa
removing duplicated code lines
jbv2 Jun 30, 2023
89776d8
removing duplicated code lines
jbv2 Jun 30, 2023
4ddc639
Apply suggestions from code review
jbv2 Jul 6, 2023
937e96a
Apply suggestions from code review
jbv2 Jul 6, 2023
5bcc7ff
Merge branch 'qualimap' of https://github.com/jbv2/eager into qualimap
jbv2 Jul 6, 2023
c5bacf3
Apply suggestions from code review
jbv2 Jul 6, 2023
a45be9d
Adding warning
jbv2 Jul 6, 2023
f033a4c
Merge branch 'qualimap'
jbv2 Jul 6, 2023
6db6d45
Adding map to snpcapture
jbv2 Jul 6, 2023
57a9e49
adding map to snpcapture
jbv2 Jul 6, 2023
727b98f
prettier
jbv2 Jul 6, 2023
2d7f9b9
Merge branch 'dev' into qualimap
jbv2 Jul 6, 2023
6f3344f
Update nextflow_schema.json
jbv2 Jul 14, 2023
51f4853
removing duplicate lines
jbv2 Jul 14, 2023
fcfa34f
Merge branch 'qualimap' of https://github.com/jbv2/eager into qualimap
jbv2 Jul 14, 2023
c84d68f
Merge branch 'dev' into qualimap
jbv2 Jul 14, 2023
b032ef7
readd qualimap
jbv2 Jul 14, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions CITATIONS.md
Original file line number Diff line number Diff line change
Expand Up @@ -85,8 +85,13 @@
> Jun, G., Wing, M. K., Abecasis, G. R., & Kang, H. M. (2015). An efficient and scalable analysis framework for variant extraction and refinement from population-scale DNA sequence data. Genome Research, 25(6), 918–925. doi: [10.1101/gr.176552.114](https://doi.org/10.1101/gr.176552.114)

- [DamageProfiler](https://doi.org/10.1093/bioinformatics/btab190)

> DamageProfiler Neukamm, J., Peltzer, A., & Nieselt, K. (2020). DamageProfiler: Fast damage pattern calculation for ancient DNA. In Bioinformatics (btab190). doi: [10.1093/bioinformatics/btab190](https://doi.org/10.1093/bioinformatics/btab190). Download: https://github.com/Integrative-Transcriptomics/DamageProfiler

- [QualiMap](https://doi.org/10.1093/bioinformatics/btv566)

> QualiMap Okonechnikov, K., Conesa, A., & García-Alcalde, F. (2016). Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics , 32(2), 292–294. Download: http://qualimap.bioinfo.cipf.es/
jbv2 marked this conversation as resolved.
Show resolved Hide resolved

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
Expand Down
8 changes: 8 additions & 0 deletions conf/modules.config
Original file line number Diff line number Diff line change
Expand Up @@ -726,4 +726,12 @@ process {
enabled: true
]
}

withName: "QUALIMAP_BAMQC" {
jbv2 marked this conversation as resolved.
Show resolved Hide resolved
publishDir = [
jbv2 marked this conversation as resolved.
Show resolved Hide resolved
jbv2 marked this conversation as resolved.
Show resolved Hide resolved
path: { "${params.outdir}/qualimap/" },
mode: params.publish_dir_mode,
enabled: true
]
}
}
24 changes: 17 additions & 7 deletions docs/output.md
Original file line number Diff line number Diff line change
Expand Up @@ -380,18 +380,28 @@ Within nf-core/eager, when BAM trimming is activated alongside PMD-filtering, tr
[DamageProfiler](https://github.com/Integrative-Transcriptomics/DamageProfiler)
is a tool which calculates a variety of standard 'aDNA' metrics from a BAM file. The primary plots here are the misincorporation and length distribution plots. Ancient DNA undergoes depurination and hydrolysis, causing fragmentation of molecules into gradually shorter fragments, and cytosine to thymine deamination damage, that occur on the subsequent single-stranded overhangs at the ends of molecules.

### Contamination estimation for Nuclear DNA

#### ANGSD nuclear contamination
#### QualiMap

<details markdown="1">
<summary>Output files</summary>

- `contamination_estimation/angsd/`
- `qualimap/`: this contains a sub-directory for every sample, which includes a qualimap report and associated raw statistic files. You can open the `.html` file in your internet browser to see the in-depth report (this will be more detailed than in MultiQC). This includes stuff like percent coverage, depth coverage, GC content and so on of your mapped reads.
jbv2 marked this conversation as resolved.
Show resolved Hide resolved

jbv2 marked this conversation as resolved.
Show resolved Hide resolved
</details>

[QualiMap](http://qualimap.bioinfo.cipf.es/)
is a tool which provides statistics on the quality of the mapping of your reads to your reference genome. It allows you to assess how well covered your reference genome is by your data, both in 'fold' depth (average number of times a given base on the reference is covered by a read) and 'percentage' (the percentage of all bases on the reference genome that is covered at a given fold depth). These outputs allow you to make decision if you have enough quality data for downstream applications like genotyping, and how to adjust the parameters for those tools accordingly.

- `*.txt`: Text file containing the results of nuclear contamination estimation with ANGSD for each library.
- `nuclear_contamination.txt`: Text file containing a summary table of contamination estimates for all libraries.
- `nuclear_contamination_mqc.json`: JSON file containing a summary table of contamination estimates for all libraries.
> NB: Neither fold coverage nor percent coverage on there own is sufficient to assess whether you have a high quality mapping. Abnormally high fold coverages of a smaller region such as highly conserved genes or un-removed-adapter-containing reference genomes can artificially inflate the mean coverage, yet a high percent coverage is not useful if all bases of the genome are covered at just 1x coverage.

**Note that many of the statistics from this module are displayed in the General Stats table, as they represent single values that are not plottable.**

You will receive output for each sample. This means you will statistics of deduplicated values of all types of libraries combined in a single value (i.e. non-UDG treated, full-UDG, paired-end, single-end all together).
jbv2 marked this conversation as resolved.
Show resolved Hide resolved
warning: If your library has no reads mapping to the reference, this will result in an empty BAM file. Qualimap will therefore not produce any output even if a BAM exists!
jbv2 marked this conversation as resolved.
Show resolved Hide resolved

### Contamination estimation for Nuclear DNA

#### ANGSD nuclear contamination

jbv2 marked this conversation as resolved.
Show resolved Hide resolved
</details>

Expand Down
5 changes: 5 additions & 0 deletions modules.json
Original file line number Diff line number Diff line change
Expand Up @@ -150,6 +150,11 @@
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
"installed_by": ["modules"]
},
"qualimap/bamqc": {
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
"installed_by": ["modules"]
},
"samtools/faidx": {
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
Expand Down
123 changes: 123 additions & 0 deletions modules/nf-core/qualimap/bamqc/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

47 changes: 47 additions & 0 deletions modules/nf-core/qualimap/bamqc/meta.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 4 additions & 0 deletions nextflow.config
Original file line number Diff line number Diff line change
Expand Up @@ -164,6 +164,10 @@ params {
damageprofiler_threshold = 15
damageprofiler_yaxis = 0.3

// Qualimap options
skip_qualimap = false
snpcapture_bed = null

// Damage Manipulation options
run_mapdamage_rescaling = false
damage_manipulation_rescale_seqlength = 12
Expand Down
40 changes: 32 additions & 8 deletions nextflow_schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -966,19 +966,20 @@
"description": "Specifies the length filter for DamageProfiler.",
"help_text": "> Modifies DamageProfile parameter: -l "
},
"damageprofiler_threshold": {
"type": "integer",
"default": 15,
"description": "Specify number of bases of each read to consider for DamageProfiler calculations.",
"fa_icon": "fas fa-ruler-horizontal",
"help_text": "Specifies the length of the read start and end to be considered for profile generation in DamageProfiler.\n\n> Modifies DamageProfile parameter: -t\n"
},

"damageprofiler_yaxis": {
"type": "number",
"default": 0.3,
"fa_icon": "fas fa-ruler-vertical",
"description": "Specify the maximum misincorporation frequency that should be displayed on damage plot. Set to 0 to 'autoscale'.",
"help_text": "Specifies what the maximum misincorporation frequency should be displayed as, in the DamageProfiler damage plot. This is set to 0.30 (i.e. 30%) by default as this matches the popular [mapDamage2.0 program](https://ginolhac.github.io/mapDamage/). However, the default behavior of the DamageProfiler is to 'autoscale' the y-axis maximum to zoom in on any possible damage that may occur (e.g. if the damage is about 10%, the highest value on the y-axis would be set to 0.12). This 'autoscale' behavior can be turned on by specifying the number to 0.\n\n> Modifies DamageProfile parameter: -yaxis_dp_max"
"help_text": "Specifies what the maximum misincorporation frequency should be displayed as, in the DamageProfiler damage plot. This is set to 0.30 (i.e. 30%) by default as this matches the popular [mapDamage2.0 program](https://ginolhac.github.io/mapDamage/). However, the default behavior of the DamageProfiler is to 'autoscale' the y-axis maximum to zoom in on any possible damage that may occur (e.g. if the damage is about 10%, the highest value on the y-axis would be set to 0.12). This 'autoscale' behavior can be turned on by specifying the number to 0. Default: 0.30.\n\n> Modifies DamageProfile parameter: -yaxis_dp_max"
},
"damageprofiler_threshold": {
"type": "integer",
"default": 15,
"description": "Specify number of bases of each read to consider for DamageProfiler calculations.",
"fa_icon": "fas fa-ruler-horizontal",
"help_text": "Specifies the length of the read start and end to be considered for profile generation in DamageProfiler. By default set to 15 bases.\n\n> Modifies DamageProfile parameter: -t"
}
}
},
Expand Down Expand Up @@ -1061,6 +1062,26 @@
"fa_icon": "fas fa-map"
}
}
},
"statistics": {
"title": "Statistics",
"type": "object",
"description": "Additional options.",
"default": "",
"properties": {
"skip_qualimap": {
"type": "boolean",
"fa_icon": "fas fa-forward",
"description": "Turns off QualiMap and thus does not compute coverage and other mapping metrics."
jbv2 marked this conversation as resolved.
Show resolved Hide resolved
jbv2 marked this conversation as resolved.
Show resolved Hide resolved
},
"snpcapture_bed": {
"type": "string",
"default": null,
jbv2 marked this conversation as resolved.
Show resolved Hide resolved
"help_text": "Can be used to set a path to a BED file (3/6 column format) of SNP positions of a reference genome, to calculate SNP captured libraries on-target efficiency. This should be used for array or in-solution SNP capture protocols such as 390K, 1240K, etc. If supplied, some on-target metrics are automatically generated for you by qualimap in the 'Globals inside' section of the 'genome_results.txt' file in the qualimap results directory. These statistics are currently NOT displayed in MultiQC!",
jbv2 marked this conversation as resolved.
Show resolved Hide resolved
jbv2 marked this conversation as resolved.
Show resolved Hide resolved
jbv2 marked this conversation as resolved.
Show resolved Hide resolved
"fa_icon": "fas fa-magnet"
}
},
"fa_icon": "fas fa-calculator"
}
},
"allOf": [
Expand Down Expand Up @@ -1112,6 +1133,9 @@
{
"$ref": "#/definitions/adna_damage_analysis"
},
{
"$ref": "#/definitions/statistics"
},
{
"$ref": "#/definitions/contamination_estimation"
},
Expand Down
25 changes: 25 additions & 0 deletions workflows/eager.nf
Original file line number Diff line number Diff line change
Expand Up @@ -91,6 +91,7 @@ include { PRESEQ_CCURVE } from '../modules/nf-core/preseq/ccurve/m
include { PRESEQ_LCEXTRAP } from '../modules/nf-core/preseq/lcextrap/main'
include { FALCO } from '../modules/nf-core/falco/main'
include { MTNUCRATIO } from '../modules/nf-core/mtnucratio/main'
include { QUALIMAP_BAMQC } from '../modules/nf-core/qualimap/bamqc/main'
include { HOST_REMOVAL } from '../modules/local/host_removal'

/*
Expand Down Expand Up @@ -128,6 +129,13 @@ workflow EAGER {
if ( params.preprocessing_tool == 'fastp' && !adapterlist.extension.matches(".*(fa|fasta|fna|fas)") ) error "[nf-core/eager] ERROR: fastp adapter list requires a `.fasta` format and extension (or fa, fas, fna). Check input: --preprocessing_adapterlist ${params.preprocessing_adapterlist}"
}

// QualiMap
if ( params.snpcapture_bed ) {
ch_snpcapture_bed = Channel.fromPath(params.snpcapture_bed, checkIfExists: true).collect()
jbv2 marked this conversation as resolved.
Show resolved Hide resolved
} else {
ch_snpcapture_bed = Channel.empty()
}

// Contamination estimation
hapmap_file = file(params.contamination_estimation_angsd_hapmap, checkIfExists:true)

Expand Down Expand Up @@ -352,6 +360,23 @@ workflow EAGER {
}

//
// MODULE: QUALIMAP_BAMQC
//

if ( !params.skip_qualimap ) {

qualimap_input = ch_dedupped_bams
.map {
meta, bam, bai ->
[ meta, bam ]
}

jbv2 marked this conversation as resolved.
Show resolved Hide resolved
QUALIMAP_BAMQC( qualimap_input, ch_snpcapture_bed )
ch_versions = ch_versions.mix( QUALIMAP_BAMQC.out.versions )
ch_multiqc_files = ch_multiqc_files.mix(QUALIMAP_BAMQC.out.results.collect{it[1]}.ifEmpty([]))

}

// SUBWORKFLOW: Contamination estimation
//

Expand Down