Merge pull request nf-core#1060 from nf-core/dsl2_bam2fastq
DSL2: Convert input BAM
TCLamnidis authored May 10, 2024
2 parents 0a09d5d + 2ad49ec commit 62f1b57
Showing 11 changed files with 284 additions and 38 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -31,7 +31,7 @@ jobs:
- "-profile test,docker --preprocessing_tool adapterremoval --preprocessing_adapterlist 'https://github.com/nf-core/test-datasets/raw/modules/data/delete_me/adapterremoval/adapterremoval_adapterlist.txt' --sequencing_qc_tool falco --run_genotyping --genotyping_tool 'freebayes' --genotyping_source 'raw'"
- "-profile test,docker --mapping_tool bwamem --run_mapdamage_rescaling --run_pmd_filtering --run_trim_bam --run_genotyping --genotyping_tool 'ug' --genotyping_source 'trimmed'"
- "-profile test,docker --mapping_tool bowtie2 --damagecalculation_tool mapdamage --damagecalculation_mapdamage_downsample 100 --run_genotyping --genotyping_tool 'hc' --genotyping_source 'raw'"
- "-profile test,docker --skip_preprocessing"
- "-profile test,docker --skip_preprocessing --convert_inputbam"
- "-profile test_humanbam,docker --run_mtnucratio --run_contamination_estimation_angsd --snpcapture_bed 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz' --run_genotyping --genotyping_tool 'pileupcaller' --genotyping_source 'raw'"
- "-profile test_humanbam,docker --run_sexdeterrmine --run_genotyping --genotyping_tool 'angsd' --genotyping_source 'raw'"
- "-profile test_multiref,docker" ## TODO add damage manipulation here instead once it goes multiref
19 changes: 19 additions & 0 deletions conf/modules.config
@@ -18,6 +18,25 @@ process {
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]

//
// CONVERT INPUT BAM
//
withName: SAMTOOLS_CONVERT_BAM_INPUT {
tag = { "${meta.sample_id}_${meta.library_id}_L${meta.lane}" }
ext.prefix = { "${meta.sample_id}_${meta.library_id}_L${meta.lane}" }
publishDir = [
enabled: false
]
}

withName: CAT_FASTQ_CONVERTED_BAM {
tag = { "${meta.sample_id}_${meta.library_id}_L${meta.lane}" }
ext.prefix = { "${meta.sample_id}_${meta.library_id}_L${meta.lane}" }
publishDir = [
enabled: false
]
}

//
// READ PREPROCESSING
//
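
To illustrate how the `tag` and `ext.prefix` closures above resolve, here is a minimal shell sketch of the same string pattern. The function name `build_prefix` and the sample values are hypothetical; in the pipeline the values come from the `meta` map of each input row.

```bash
#!/usr/bin/env bash
# Hypothetical helper: mirrors the Groovy pattern
#   "${meta.sample_id}_${meta.library_id}_L${meta.lane}"
# used for `tag` and `ext.prefix` in conf/modules.config.
#   $1: sample_id, $2: library_id, $3: lane
build_prefix() {
    local sample_id="$1" library_id="$2" lane="$3"
    printf '%s_%s_L%s\n' "$sample_id" "$library_id" "$lane"
}

# Example with made-up values (not taken from any real samplesheet):
build_prefix sampleA libA1 2   # prints: sampleA_libA1_L2
```

Because the lane is part of the prefix, files converted from different lanes of the same library should not collide on name.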
8 changes: 8 additions & 0 deletions docs/development/manual_tests.md
@@ -903,3 +903,11 @@ nextflow run main.nf -profile test,docker --outdir ./results -w work/ -resume --
## Expect: One `glf.gz` file in binary_three format per reference.
nextflow run main.nf -profile test,docker --outdir ./results -w work/ -resume --run_genotyping --genotyping_tool 'angsd' --genotyping_angsd_glmodel 'syk' --genotyping_angsd_glformat 'text' --genotyping_source 'raw' -ansi-log false -dump-channels
```
# CONVERT BAM INPUT
```bash
## BAM input converted to FastQ and remapped.
## Expect: BAM input shows up in FastQC -> mapping results.
nextflow run main.nf -profile test,docker --outdir ./results -w work/ --convert_inputbam --skip_deduplication -resume -ansi-log false -dump-channels
```
12 changes: 12 additions & 0 deletions docs/usage.md
@@ -27,6 +27,18 @@ CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz
```

### Supplying BAM input

You can also supply BAM files as input to nf-core/eager. This lets you skip the earlier steps of the pipeline (preprocessing and mapping), e.g. when re-processing public data. Alternatively, you can convert input BAM files back to FASTQ with `--convert_inputbam` so that the reads go through preprocessing and mapping again. This is useful when you want to standardise mapping parameters between your own and previously published data.

You still need to fill in the `pairment` column of the input TSV sheet for BAM files. If you do not convert the BAM files back to FASTQ, set the column to `single`. If you do convert them, specify the type of reads the BAM file contains:

- If the mapped reads in the BAM file are single-end, specify `single`.
- If the mapped reads in the BAM file are paired-end _but merged pairs_ (i.e. overlapping pairs collapsed to a single read), also specify `single`.
- If the mapped reads in the BAM file are paired-end and are _not_ merged (i.e. paired-end mapping was originally performed), specify `paired`.

Note that if you do not request merging of the paired-end FASTQs converted from BAM (i.e. you request paired-end mapping), only complete forward and reverse pairs will be used; singletons in the BAM files will be discarded!
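
The rules above can be summarised in a small shell sketch. The helper `pairment_for_bam` and its argument values are hypothetical and not part of nf-core/eager; it only restates the decision described in this section.

```bash
#!/usr/bin/env bash
# Hypothetical helper: suggests the `pairment` value for a BAM row.
#   $1: "yes" if the run uses --convert_inputbam, otherwise "no"
#   $2: "single" or "paired" (read layout inside the BAM)
#   $3: "merged" or "unmerged" (only meaningful for paired-end reads)
pairment_for_bam() {
    local convert="$1" layout="$2" merged="${3:-unmerged}"
    if [ "$convert" != "yes" ]; then
        # Without conversion the column is always `single`.
        echo "single"
    elif [ "$layout" = "paired" ] && [ "$merged" = "unmerged" ]; then
        # Only unmerged paired-end reads are declared as `paired`.
        echo "paired"
    else
        # Single-end reads, or paired-end reads already merged.
        echo "single"
    fi
}

pairment_for_bam no  paired unmerged   # prints: single
pairment_for_bam yes single            # prints: single
pairment_for_bam yes paired merged     # prints: single
pairment_for_bam yes paired unmerged   # prints: paired
```

Only the last case yields `paired`, which is also the case in which singletons are discarded unless merging is requested.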

### Full samplesheet

The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire, however, there is a strict requirement for the first 3 columns to match those defined in the table below.
5 changes: 5 additions & 0 deletions modules.json
@@ -210,6 +210,11 @@
"git_sha": "6b0e4fe14ca1b12e131f64608f0bbaf36fd11451",
"installed_by": ["modules"]
},
"samtools/collatefastq": {
"branch": "master",
"git_sha": "f4596fe0bdc096cf53ec4497e83defdb3a94ff62",
"installed_by": ["modules"]
},
"samtools/depth": {
"branch": "master",
"git_sha": "a1ffbc1fd87bd5a829e956cc26ec9cc53af3e817",
8 changes: 8 additions & 0 deletions modules/nf-core/samtools/collatefastq/environment.yml


55 changes: 55 additions & 0 deletions modules/nf-core/samtools/collatefastq/main.nf


76 changes: 76 additions & 0 deletions modules/nf-core/samtools/collatefastq/meta.yml


3 changes: 3 additions & 0 deletions nextflow.config
@@ -13,6 +13,9 @@ params {
// Input options
input = null

// Input BAM conversion
convert_inputbam = false

// References
genome = null
igenomes_base = 's3://ngi-igenomes/igenomes/'
7 changes: 6 additions & 1 deletion nextflow_schema.json
@@ -23,6 +23,12 @@
"help_text": "You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row. See [usage docs](https://nf-co.re/eager/usage#samplesheet-input).",
"fa_icon": "fas fa-file-csv"
},
"convert_inputbam": {
"type": "boolean",
"description": "Specify to convert input BAM files back to FASTQ for remapping",
"help_text": "This parameter tells the pipeline to convert the BAM files listed in the `--input` TSV or CSV sheet back to FASTQ format to allow re-preprocessing and mapping.\n\nThis can be useful when you want to ensure consistent mapping parameters across all libraries when incorporating public data. However, be careful of biases that may come from re-processing: the BAM files may already be clipped, or may only contain reads that mapped under different settings, so you may not have all reads from the original publication.",
"fa_icon": "fas fa-undo-alt"
},
"outdir": {
"type": "string",
"format": "directory-path",
@@ -1308,7 +1314,6 @@
"properties": {
"run_sexdeterrmine": {
"type": "boolean",
"default": false,
"fa_icon": "fas fa-transgender-alt",
"description": "Turn on sex determination for human reference genomes. This will run on single- and double-stranded variants of a library separately.",
"help_text": "Specify to run the optional process of sex determination."