Merge pull request nf-core#1058 from nf-core/add_angsd_gl
Add angsd gl
jfy133 authored Mar 20, 2024
2 parents cb178b6 + 127302a commit eef7726
Showing 11 changed files with 333 additions and 6 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -33,7 +33,7 @@ jobs:
- "-profile test,docker --mapping_tool bowtie2 --damagecalculation_tool mapdamage --damagecalculation_mapdamage_downsample 100 --run_genotyping --genotyping_tool 'hc' --genotyping_source 'raw'"
- "-profile test,docker --skip_preprocessing"
- "-profile test_humanbam,docker --run_mtnucratio --run_contamination_estimation_angsd --snpcapture_bed 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz' --run_genotyping --genotyping_tool 'pileupcaller' --genotyping_source 'raw'"
- "-profile test_humanbam,docker --run_sexdeterrmine"
- "-profile test_humanbam,docker --run_sexdeterrmine --run_genotyping --genotyping_tool 'angsd' --genotyping_source 'raw'"
- "-profile test_multiref,docker" ## TODO add damage manipulation here instead once it goes multiref
steps:
- name: Check out pipeline code
20 changes: 20 additions & 0 deletions conf/modules.config
@@ -1152,4 +1152,24 @@ process {
pattern: '*.txt'
]
}

withName: ANGSD_GL {
tag = { "${meta.reference}" }
ext.args = {
gl_model = params.genotyping_angsd_glmodel == 'samtools' ? 1 : params.genotyping_angsd_glmodel == 'gatk' ? 2 : params.genotyping_angsd_glmodel == 'soapsnp' ? 3 : 4
gl_format = params.genotyping_angsd_glformat == 'binary' ? 1 : params.genotyping_angsd_glformat == 'beagle_binary' ? 2 : params.genotyping_angsd_glformat == 'binary_three' ? 3 : 4
[
( gl_format == 2 || gl_format == 3 ) ? '-doMajorMinor 1': '',
"-GL ${gl_model}",
"-doGlf ${gl_format}",
].join(' ').trim()
}
ext.prefix = { "angsd_${meta.reference}" }
publishDir = [
path: { "${params.outdir}/genotyping/" },
mode: params.publish_dir_mode,
enabled: true,
pattern: '*.{glf,beagle}.gz'
]
}
}
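The `ext.args` closure for `ANGSD_GL` above translates the two pipeline parameters into ANGSD's numeric `-GL` and `-doGlf` codes, and adds `-doMajorMinor 1` only for the `beagle_binary` and `binary_three` formats. The following is a minimal Python sketch of that same mapping, for illustration only. The pipeline itself uses the Groovy closure; the function name `angsd_gl_args` and the two dictionaries are hypothetical.

```python
# Illustrative re-implementation of the ANGSD_GL ext.args closure.
# Names below are hypothetical; they do not exist in the pipeline.

GL_MODELS = {"samtools": 1, "gatk": 2, "soapsnp": 3, "syk": 4}
GL_FORMATS = {"binary": 1, "beagle_binary": 2, "binary_three": 3, "text": 4}


def angsd_gl_args(glmodel: str = "samtools", glformat: str = "binary") -> str:
    """Build the ANGSD argument string the way the config closure does."""
    # The Groovy ternary chain falls through to 4 for any unrecognised value,
    # so the same fallback is used here.
    gl_model = GL_MODELS.get(glmodel, 4)
    gl_format = GL_FORMATS.get(glformat, 4)
    parts = [
        "-doMajorMinor 1" if gl_format in (2, 3) else "",
        f"-GL {gl_model}",
        f"-doGlf {gl_format}",
    ]
    # Joining with an empty first element leaves a leading space; strip it,
    # as the closure does with .trim().
    return " ".join(parts).strip()


if __name__ == "__main__":
    print(angsd_gl_args())                         # defaults
    print(angsd_gl_args("gatk", "beagle_binary"))  # beagle output
```

Because the fallback branch silently maps any unknown string to code 4, a typo in either parameter would select `syk` or `text` rather than fail at this point. The `enum` constraints in `nextflow_schema.json` are what restrict the accepted values.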
26 changes: 26 additions & 0 deletions docs/development/manual_tests.md
@@ -877,3 +877,29 @@ nextflow run main.nf -profile test_humanbam,docker --outdir ./results -w work/ -
## Specifically, no geno/snp/ind for the reference that has no bed/snp file (Mammoth). Only data for "human" reference.
nextflow run main.nf -profile test_multiref,docker --input test/samplesheet_multilane_multilib_noBAM.tsv --outdir ./results -w work/ -resume --run_genotyping --genotyping_tool 'pileupcaller' --genotyping_source 'raw' -ansi-log false -dump-channels
```
## ANGSD
```bash
## ANGSD on raw reads. Default parameters. (--genotyping_angsd_glmodel 'samtools' --genotyping_angsd_glformat 'binary' )
## Expect: One `glf.gz` file in binary format per reference.
nextflow run main.nf -profile test,docker --outdir ./results -w work/ -resume --run_genotyping --genotyping_tool 'angsd' --genotyping_source 'raw' -ansi-log false -dump-channels
```
```bash
## ANGSD on raw reads. gatk model, beagle binary format.
## Expect: One `beagle.gz` file in beagle format per reference.
nextflow run main.nf -profile test,docker --outdir ./results -w work/ -resume --run_genotyping --genotyping_tool 'angsd' --genotyping_angsd_glmodel 'gatk' --genotyping_angsd_glformat 'beagle_binary' --genotyping_source 'raw' -ansi-log false -dump-channels
```
```bash
## ANGSD on raw reads. soapSNP model, binary_three format.
## Expect: One `glf.gz` file in binary_three format per reference.
nextflow run main.nf -profile test,docker --outdir ./results -w work/ -resume --run_genotyping --genotyping_tool 'angsd' --genotyping_angsd_glmodel 'soapsnp' --genotyping_angsd_glformat 'binary_three' --genotyping_source 'raw' -ansi-log false -dump-channels
```
```bash
## ANGSD on raw reads. syk model, text format.
## Expect: One `glf.gz` file in text format per reference.
nextflow run main.nf -profile test,docker --outdir ./results -w work/ -resume --run_genotyping --genotyping_tool 'angsd' --genotyping_angsd_glmodel 'syk' --genotyping_angsd_glformat 'text' --genotyping_source 'raw' -ansi-log false -dump-channels
```
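The "Expect" comments in the manual tests above can be checked after a run by listing what was published. The sketch below assumes the `--outdir ./results` used in those commands and the `genotyping/` publish directory with the `*.{glf,beagle}.gz` pattern from `conf/modules.config`. The helper name `find_gl_outputs` is hypothetical and not part of the pipeline.

```python
from pathlib import Path


def find_gl_outputs(outdir: str) -> list[str]:
    """Return the names of genotype likelihood files under <outdir>/genotyping/."""
    genotyping_dir = Path(outdir) / "genotyping"
    if not genotyping_dir.is_dir():
        return []
    # Mirrors the publishDir pattern '*.{glf,beagle}.gz'.
    suffixes = (".glf.gz", ".beagle.gz")
    return sorted(
        p.name
        for p in genotyping_dir.iterdir()
        if p.is_file() and p.name.endswith(suffixes)
    )


if __name__ == "__main__":
    # Assumes the manual test was run with --outdir ./results.
    for name in find_gl_outputs("./results"):
        print(name)
```

This only confirms that one file per reference was published with the expected extension. It does not inspect the file contents, so telling `binary`, `binary_three`, and `text` output apart still requires looking inside the `glf.gz` file.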
13 changes: 13 additions & 0 deletions docs/output.md
@@ -621,3 +621,16 @@ When using pileupCaller for genotyping, single-stranded and double-stranded libr
</details>

[FreeBayes](https://github.com/freebayes/freebayes) is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment. It calls variants based on the literal sequences of reads aligned to a particular target, not their precise alignment. This model is a straightforward generalization of previous ones (e.g. PolyBayes, samtools, GATK) which detect or report variants based on alignments. This method avoids one of the core problems with alignment-based variant detection - that identical sequences may have multiple possible alignments. The output provided is a bgzipped VCF file containing the genotype calls of each sample, its index file, and the statistics of the VCF file generated by `bcftools stats`.

#### ANGSD

<details markdown="1">
<summary>Output files</summary>

- `genotyping/`

- `*.{glf,beagle}.gz`: Genotype likelihood file, containing likelihoods across all samples per reference.

</details>

[ANGSD](http://www.popgen.dk/angsd/index.php/ANGSD) is a software package for analyzing next-generation sequencing data. It can estimate genotype likelihoods and allele frequencies from such data. The output provided is a bgzipped genotype likelihood file, containing likelihoods across all samples per reference. Users can specify the model used for genotype likelihood estimation, as well as the output format. For more information on the available options, see the [ANGSD genotype likelihoods documentation](https://www.popgen.dk/angsd/index.php/Genotype_Likelihoods).
5 changes: 5 additions & 0 deletions modules.json
@@ -20,6 +20,11 @@
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"installed_by": ["bam_docounts_contamination_angsd"]
},
"angsd/gl": {
"branch": "master",
"git_sha": "c22aa6082716bd372cbb8f7ccf7c83220f180864",
"installed_by": ["modules"]
},
"bamutil/trimbam": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
10 changes: 10 additions & 0 deletions modules/nf-core/angsd/gl/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

112 changes: 112 additions & 0 deletions modules/nf-core/angsd/gl/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

65 changes: 65 additions & 0 deletions modules/nf-core/angsd/gl/meta.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions nextflow.config
@@ -227,6 +227,8 @@ params {
genotyping_gatk_hc_emitrefconf = 'GVCF'
genotyping_freebayes_min_alternate_count = 1
genotyping_freebayes_skip_coverage = 0
genotyping_angsd_glmodel = 'samtools'
genotyping_angsd_glformat = 'binary'
}

// Load base.config by default for all pipelines
16 changes: 16 additions & 0 deletions nextflow_schema.json
@@ -1025,6 +1025,22 @@
"description": "Specify to skip over regions of high depth by discarding alignments overlapping positions where total read depth is greater than specified.",
"help_text": "Specify to skip over regions of high depth by discarding alignments overlapping positions where total read depth is greater than the specified value. Setting to 0 (the default) deactivates this behaviour.\n\n> Modifies freebayes parameter: `-g`",
"fa_icon": "fab fa-think-peaks"
},
"genotyping_angsd_glmodel": {
"type": "string",
"default": "samtools",
"fa_icon": "fas fa-project-diagram",
"description": "Specify which ANGSD genotype likelihood model to use.",
"help_text": "Specify which genotype likelihood model to use.\n\n> Modifies angsd parameter: `-GL`",
"enum": ["samtools", "gatk", "soapsnp", "syk"]
},
"genotyping_angsd_glformat": {
"type": "string",
"default": "binary",
"fa_icon": "fas fa-text-height",
"description": "Specify the output file format for ANGSD genotype likelihood results.",
"help_text": "Specifies which genotype likelihood file format will be output.\n\nThe options refer to the following descriptions respectively:\n\n- `binary`: binary output of all 10 log genotype likelihoods\n- `beagle_binary`: beagle likelihood file\n- `binary_three`: binary 3 times likelihood\n- `text`: text output of all 10 log genotype likelihoods.\n\nSee the [ANGSD documentation](http://www.popgen.dk/angsd/) for more information on which to select for your downstream applications.\n\n> Modifies angsd parameter: `-doGlf`",
"enum": ["binary", "beagle_binary", "binary_three", "text"]
}
},
"fa_icon": "fas fa-sliders-h",
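The two `enum` lists in the schema entries above define the only accepted values for `--genotyping_angsd_glmodel` and `--genotyping_angsd_glformat`. As a rough illustration of what that constraint means, the Python sketch below checks a parameter dictionary against those lists. It is not how the pipeline performs schema validation; the `ALLOWED` table and the `check_angsd_params` helper are hypothetical.

```python
# Enum values copied from the nextflow_schema.json entries in this commit.
ALLOWED = {
    "genotyping_angsd_glmodel": ["samtools", "gatk", "soapsnp", "syk"],
    "genotyping_angsd_glformat": ["binary", "beagle_binary", "binary_three", "text"],
}


def check_angsd_params(params: dict) -> list[str]:
    """Return one error message per parameter whose value is outside its enum."""
    errors = []
    for name, allowed in ALLOWED.items():
        # Parameters that are not supplied keep their default, so skip them.
        if name in params and params[name] not in allowed:
            errors.append(f"{name}: '{params[name]}' is not one of {allowed}")
    return errors


if __name__ == "__main__":
    # 'vcf' is not a valid glformat value, so one error is reported.
    for message in check_angsd_params({"genotyping_angsd_glformat": "vcf"}):
        print(message)
```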