Skip to content

Commit

Permalink
Merge pull request nf-core#1045 from scarlhoff/dsl2_mapDamage
Browse files Browse the repository at this point in the history
Dsl2 map damage
  • Loading branch information
scarlhoff authored Feb 16, 2024
2 parents 2a10626 + 15da81d commit 2effd73
Show file tree
Hide file tree
Showing 8 changed files with 108 additions and 35 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ jobs:
- "-profile test,docker --preprocessing_tool fastp --preprocessing_adapterlist 'https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/fastp/adapters.fasta'"
- "-profile test,docker --preprocessing_tool adapterremoval --preprocessing_adapterlist 'https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/adapterremoval/adapterremoval_adapterlist.txt' --sequencing_qc_tool falco"
- "-profile test,docker --mapping_tool bwamem --run_mapdamage_rescaling --run_pmd_filtering --run_trim_bam"
- "-profile test,docker --mapping_tool bowtie2"
- "-profile test,docker --mapping_tool bowtie2 --damagecalculation_tool mapdamage --damagecalculation_mapdamage_downsample 100"
- "-profile test,docker --skip_preprocessing"
- "-profile test_humanbam,docker --run_mtnucratio --run_contamination_estimation_angsd --snpcapture_bed 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz'"
- "-profile test_multiref,docker" ## TODO add damage manipulation here instead once it goes multiref
Expand Down
23 changes: 20 additions & 3 deletions conf/modules.config
Original file line number Diff line number Diff line change
Expand Up @@ -876,9 +876,9 @@ process {
withName: DAMAGEPROFILER {
tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}" }
ext.args = [
"-l ${params.damageprofiler_length}",
"-t ${params.damageprofiler_threshold}",
"-yaxis_dp_max ${params.damageprofiler_yaxis}"
"-l ${params.damagecalculation_damageprofiler_length}",
"-t ${params.damagecalculation_xaxis}",
"-yaxis_dp_max ${params.damagecalculation_yaxis}"
].join(' ').trim()
ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.reference}" }
publishDir = [
Expand All @@ -888,6 +888,23 @@ process {
]
}

withName: CALCULATE_MAPDAMAGE2 {
tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}" }
ext.args = { [
"--no-stats",
"-y ${params.damagecalculation_yaxis}",
params.damagecalculation_mapdamage_downsample != 0 ? "-n ${params.damagecalculation_mapdamage_downsample} --downsample-seed=1" : "",
{ meta.strandedness } == "single" ? '--single-stranded' : '',
"-m ${params.damagecalculation_xaxis}"
].join(' ').trim() }
ext.prefix = { "${meta.sample_id}_${meta.library_id}_${meta.reference}" }
publishDir = [
path: { "${params.outdir}/damage_estimation/mapDamage2/" },
mode: params.publish_dir_mode,
pattern: 'results_*/*'
]
}

//
// LIBRARY MERGE
//
Expand Down
14 changes: 14 additions & 0 deletions docs/development/manual_tests.md
Original file line number Diff line number Diff line change
Expand Up @@ -575,6 +575,20 @@ nextflow run ../main.nf -profile docker,test_multiref --outdir results_endorspy_
nextflow run main.nf -profile test,conda --outdir ./results -resume
```
#### With mapDamage2
```bash
## mapDamage2 with default parameters
## Expect: mapdamage directory with 3pGtoA_freq, 5pCtoT_freq, dnacomp_genome, dnacomp, Fragmisincorporation_plot, Length_plot, lgdistribution, misincorporation, Runtime_log
nextflow run main.nf -profile test,docker --outdir ./results --damagecalculation_tool mapdamage
```
```bash
## mapDamage2 with downsampling to 100 reads
## Expect: mapdamage directory with 3pGtoA_freq, 5pCtoT_freq, dnacomp_genome, dnacomp, Fragmisincorporation_plot, Length_plot, lgdistribution, misincorporation, Runtime_log
nextflow run main.nf -profile test,docker --outdir ./results --damagecalculation_tool mapdamage --damagecalculation_mapdamage_downsample 100
```
### ESTIMATE CONTAMINATION
#### With ANGSD
Expand Down
20 changes: 20 additions & 0 deletions docs/output.md
Original file line number Diff line number Diff line change
Expand Up @@ -504,6 +504,26 @@ Within nf-core/eager, when BAM trimming is activated alongside PMD-filtering, tr
[DamageProfiler](https://github.com/Integrative-Transcriptomics/DamageProfiler)
is a tool which calculates a variety of standard 'aDNA' metrics from a BAM file. The primary plots here are the misincorporation and length distribution plots. Ancient DNA undergoes depurination and hydrolysis, causing fragmentation of molecules into gradually shorter fragments, and cytosine to thymine deamination damage, that occur on the subsequent single-stranded overhangs at the ends of molecules.

#### mapDamage2

<details markdown="1">
<summary>Output files</summary>

- `damage_estimation/mapDamage2/`: this contains sample specific directories containing raw statistics and damage plots from mapDamage2.

- `Fragmisincorporation_plot.pdf`: a pdf file that displays both fragmentation and misincorporation patterns.
- `Length_plot.pdf`: a pdf file that displays length distribution of singleton reads per strand and cumulative frequencies of C->T at 5'-end and G->A at 3'-end are also displayed per strand.
- `misincorporation.txt`: contains a table with occurrences for each type of mutations and relative positions from the reads ends.
- `5pCtoT_freq.txt`: contains frequencies of Cytosine to Thymine mutations per position from the 5'-ends.
- `3pGtoA_freq.txt`: contains frequencies of Guanine to Adenine mutations per position from the 3'-ends.
- `dnacomp.txt`: contains a table of the reference genome base composition per position, inside reads and adjacent regions.
- `lgdistribution.txt`: contains a table with read length distributions per strand.
- `Runtime_log.txt`: log file with a summary of command lines used and timestamps.

</details>

[mapDamage2](https://ginolhac.github.io/mapDamage/) is a computational framework written in Python and R, which tracks and quantifies DNA damage patterns among ancient DNA sequencing reads generated by Next-Generation Sequencing platforms. Based on a sample bam file and a fasta reference file, the software calculates the frequency of 5' C to T and 3' G to A misincorporations, as well as the read length distribution per strand.

### Contamination estimation for Nuclear DNA

#### ANGSD nuclear contamination
Expand Down
10 changes: 6 additions & 4 deletions nextflow.config
Original file line number Diff line number Diff line change
Expand Up @@ -173,10 +173,12 @@ params {
mapstats_preseq_defects_mode = false

// Damage Calculation options
skip_damage_calculation = false
damageprofiler_length = 100
damageprofiler_threshold = 15
damageprofiler_yaxis = 0.3
skip_damagecalculation = false
damagecalculation_tool = 'damageprofiler'
damagecalculation_yaxis = 0.3
damagecalculation_xaxis = 25
damagecalculation_damageprofiler_length = 100
damagecalculation_mapdamage_downsample = 0

// Damage Manipulation options
run_mapdamage_rescaling = false
Expand Down
52 changes: 32 additions & 20 deletions nextflow_schema.json
Original file line number Diff line number Diff line change
Expand Up @@ -979,31 +979,46 @@
"fa_icon": "fas fa-chart-line",
"help_text": "More documentation can be seen in the follow links for:\n\n[DamageProfiler](https://github.com/Integrative-Transcriptomics/DamageProfiler)\n\nIf using TSV input, DamageProfiler is performed per library, i.e. after lane merging. BAM Trimming is only performed on non-UDG and half-UDG treated data.",
"properties": {
"skip_damage_calculation": {
"skip_damagecalculation": {
"type": "boolean",
"fa_icon": "fas fa-forward",
"help_text": "Turns off the DamageProfiler module to compute DNA damage profiles."
"help_text": "Turns off damage calculation to compute DNA damage profiles."
},
"damageprofiler_length": {
"damagecalculation_tool": {
"type": "string",
"default": "damageprofiler",
"enum": ["damageprofiler", "mapdamage"],
"fa_icon": "fas fa-tools",
"description": "Specify the tool to use for damage calculation.",
"help_text": "Specify the tool to be used for damage calculation. DamageProfiler is generally faster than mapDamage2, but the latter has an option to limit the number of reads used. This can significantly speed up the processing of very large files, where the damage estimates are already accurate after processing only a fraction of the input."
},
"damagecalculation_yaxis": {
"type": "number",
"default": 0.3,
"description": "Specify the maximum misincorporation frequency that should be displayed on damage plot. Set to 0 to 'autoscale'.",
"help_text": "Specifies what the maximum misincorporation frequency should be displayed as, in the damage plot.\n\n> Modifies DamageProfiler parameter: -yaxis_dp_max or mapDamage2 parameter: --ymax",
"fa_icon": "fas fa-ruler-combined"
},
"damagecalculation_xaxis": {
"type": "integer",
"default": 25,
"description": "Specify number of bases of each read to be considered for plotting damage estimation.",
"help_text": "Specifies the number of bases to be considered for plotting nucleotide misincorporations.\n\n> Modifies DamageProfiler parameter: -t or mapDamage2 parameter: -m\n",
"fa_icon": "far fa-chart-bar"
},
"damagecalculation_damageprofiler_length": {
"type": "integer",
"default": 100,
"fa_icon": "fas fa-sort-amount-up",
"description": "Specifies the length filter for DamageProfiler.",
"help_text": "> Modifies DamageProfile parameter: -l "
"help_text": "Number of bases which are considered for frequency computations, by default set to 100.`\n\n> Modifies DamageProfiler parameter: -l",
"fa_icon": "fas fa-sort-amount-down"
},
"damageprofiler_threshold": {
"damagecalculation_mapdamage_downsample": {
"type": "integer",
"default": 15,
"description": "Specify number of bases of each read to consider for DamageProfiler calculations.",
"fa_icon": "fas fa-ruler-horizontal",
"help_text": "Specifies the length of the read start and end to be considered for profile generation in DamageProfiler.\n\n> Modifies DamageProfile parameter: -t\n"
},
"damageprofiler_yaxis": {
"type": "number",
"default": 0.3,
"fa_icon": "fas fa-ruler-vertical",
"description": "Specify the maximum misincorporation frequency that should be displayed on damage plot. Set to 0 to 'autoscale'.",
"help_text": "Specifies what the maximum misincorporation frequency should be displayed as, in the DamageProfiler damage plot. This is set to 0.30 (i.e. 30%) by default as this matches the popular [mapDamage2.0 program](https://ginolhac.github.io/mapDamage/). However, the default behavior of the DamageProfiler is to 'autoscale' the y-axis maximum to zoom in on any possible damage that may occur (e.g. if the damage is about 10%, the highest value on the y-axis would be set to 0.12). This 'autoscale' behavior can be turned on by specifying the number to 0.\n\n> Modifies DamageProfile parameter: -yaxis_dp_max"
"default": 0,
"fa_icon": "fas fa-compress-alt",
"description": "Specify the maximum number of reads to consider for damage calculation. Defaults value is 0 (i.e. no downsampling is performed).",
"help_text": "The maximum number of reads used for damage calculation in mapDamage2. Can be used to significantly reduce the amount of time required for damage assessment. Note that a too low value can also obtain incorrect results.\n\n>Modifies mapDamage2 parameter: -n\n"
}
}
},
Expand Down Expand Up @@ -1162,9 +1177,6 @@
{
"$ref": "#/definitions/host_removal"
},
{
"$ref": "#/definitions/contamination_estimation"
},
{
"$ref": "#/definitions/contamination_estimation"
}
Expand Down
20 changes: 14 additions & 6 deletions subworkflows/local/calculate_damage.nf
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
// Calculate damage profile
//

include { DAMAGEPROFILER } from '../../modules/nf-core/damageprofiler/main'
include { DAMAGEPROFILER } from '../../modules/nf-core/damageprofiler/main'
include { MAPDAMAGE2 as CALCULATE_MAPDAMAGE2 } from '../../modules/nf-core/mapdamage2/main'

workflow CALCULATE_DAMAGE {
take:
Expand All @@ -21,7 +22,7 @@ workflow CALCULATE_DAMAGE {
WorkflowEager.addNewMetaFromAttributes( it, "id" , "reference" , false )
}
// Ensure input bam matches the regions file
ch_damageprofiler_input = ch_bam_bai
ch_damagetool_input = ch_bam_bai
.map {
// Prepend a new meta that contains the meta.reference value as the new_meta.reference attribute
WorkflowEager.addNewMetaFromAttributes( it, "reference" , "reference" , false )
Expand All @@ -37,14 +38,21 @@ workflow CALCULATE_DAMAGE {
fasta_fai: fasta_fai
}
// Calculate damage
DAMAGEPROFILER(
ch_damageprofiler_input.bam,
ch_damageprofiler_input.fasta,
ch_damageprofiler_input.fasta_fai,
if ( params.damagecalculation_tool == 'damageprofiler' ) {
DAMAGEPROFILER(
ch_damagetool_input.bam,
ch_damagetool_input.fasta,
ch_damagetool_input.fasta_fai,
[]
)
ch_versions = ch_versions.mix( DAMAGEPROFILER.out.versions.first() )
ch_multiqc_files = ch_multiqc_files.mix( DAMAGEPROFILER.out.results )
} else if ( params.damagecalculation_tool == 'mapdamage' ) {
CALCULATE_MAPDAMAGE2 ( ch_damagetool_input.bam, ch_damagetool_input.fasta )

ch_versions = ch_versions.mix( CALCULATE_MAPDAMAGE2.out.versions.first() )
ch_multiqc_files = ch_multiqc_files.mix( CALCULATE_MAPDAMAGE2.out.folder )
}

emit:
versions = ch_versions // channel: [ versions.yml ]
Expand Down
2 changes: 1 addition & 1 deletion workflows/eager.nf
Original file line number Diff line number Diff line change
Expand Up @@ -505,7 +505,7 @@ workflow EAGER {
fasta_fai: [ meta, fai ]
}

if ( !params.skip_damage_calculation ) {
if ( !params.skip_damagecalculation ) {
CALCULATE_DAMAGE( ch_dedupped_bams, ch_fasta_for_damagecalculation.fasta, ch_fasta_for_damagecalculation.fasta_fai )
ch_versions = ch_versions.mix( CALCULATE_DAMAGE.out.versions )
ch_multiqc_files = ch_multiqc_files.mix(CALCULATE_DAMAGE.out.mqc.collect{it[1]}.ifEmpty([]))
Expand Down

0 comments on commit 2effd73

Please sign in to comment.