Merge pull request nf-core#1035 from nf-core/dsl2-sex_determination
Dsl2 sex determination
TCLamnidis authored Mar 20, 2024
2 parents fd6fa52 + 624ba69 commit cb178b6
Showing 16 changed files with 378 additions and 18 deletions.
1 change: 1 addition & 0 deletions .github/workflows/ci.yml
@@ -33,6 +33,7 @@ jobs:
- "-profile test,docker --mapping_tool bowtie2 --damagecalculation_tool mapdamage --damagecalculation_mapdamage_downsample 100 --run_genotyping --genotyping_tool 'hc' --genotyping_source 'raw'"
- "-profile test,docker --skip_preprocessing"
- "-profile test_humanbam,docker --run_mtnucratio --run_contamination_estimation_angsd --snpcapture_bed 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz' --run_genotyping --genotyping_tool 'pileupcaller' --genotyping_source 'raw'"
- "-profile test_humanbam,docker --run_sexdeterrmine"
- "-profile test_multiref,docker" ## TODO add damage manipulation here instead once it goes multiref
steps:
- name: Check out pipeline code
4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -122,6 +122,10 @@

> Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics (2011) 27(21) 2987-93.doi: [10.1093/bioinformatics/btr509](https://doi.org/10.1093/bioinformatics/btr509).
- [Sex.DetERRmine.py](http://dx.doi.org/10.1038/s41467-018-07483-5)

> Lamnidis TC, et al. Ancient Fennoscandian genomes reveal origin and spread of Siberian ancestry in Europe. Nature Communications (2018) 9(1) 5018. doi: [10.1038/s41467-018-07483-5](https://doi.org/10.1038/s41467-018-07483-5). Download: https://github.com/TCLamnidis/Sex.DetERRmine

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)
23 changes: 23 additions & 0 deletions conf/modules.config
@@ -919,6 +919,29 @@ process {
]
}

//
// RUN SEXDETERRMINE
//
withName: SAMTOOLS_DEPTH_SEXDETERRMINE {
tag = { "${meta1.reference}|${meta1.sample_id}_${meta1.library_id}" }
ext.prefix = { "${meta2.id}_samtoolsdepth" }
ext.args = '-aa -q30 -Q30 -H'
publishDir = [
enabled: false
]
}

withName: SEXDETERRMINE {
tag = { "${meta.reference}|${meta.sample_id}_${meta.library_id}" }
ext.prefix = { "${meta.reference}_sexdeterrmine" }
publishDir = [
path: { "${params.outdir}/sex_determination/" },
mode: params.publish_dir_mode,
pattern: '*{_sexdeterrmine}*',
enabled: true
]
}

//
// LIBRARY MERGE
//
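The `SAMTOOLS_DEPTH_SEXDETERRMINE` process configured above passes `-aa` (report every position, including zero-depth positions and reference sequences with no reads), `-q30 -Q30` (minimum base and mapping quality) and `-H` (write a header line) to `samtools depth`, and its output is handed to `SEXDETERRMINE`. As a minimal sketch of the kind of per-chromosome tally that relative-coverage estimates start from, the Python below counts sites and sums depth for the autosomes, X and Y. It assumes a tab-separated table whose header starts with `#CHROM` and `POS` followed by one depth column per BAM, and hs37d5-style chromosome names (`1` to `22`, `X`, `Y`); the helper name `summarise_depth` is hypothetical and this is not Sex.DetERRmine's own code.

```python
import io
from typing import Dict, List, TextIO

# Assumption: hs37d5-style chromosome names ("1".."22", "X", "Y"), as in the
# 1240K bedfile used by the test_humanbam profile.
AUTOSOMES = {str(i) for i in range(1, 23)}


def summarise_depth(handle: TextIO) -> Dict[str, Dict[str, List[int]]]:
    """Return {bam_column: {"aut"|"X"|"Y": [n_sites, total_depth]}}.

    Hypothetical helper: expects `samtools depth -H` style output, i.e. a
    header "#CHROM<TAB>POS<TAB>bam1..." and one depth column per input BAM.
    """
    header = handle.readline().rstrip("\n").split("\t")
    bams = header[2:]
    summary = {bam: {"aut": [0, 0], "X": [0, 0], "Y": [0, 0]} for bam in bams}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        chrom = fields[0]
        if chrom in AUTOSOMES:
            group = "aut"
        elif chrom in ("X", "Y"):
            group = chrom
        else:
            continue  # e.g. MT or decoy contigs are ignored
        for bam, depth in zip(bams, fields[2:]):
            summary[bam][group][0] += 1           # one more site; zero-depth rows exist because of -aa
            summary[bam][group][1] += int(depth)  # reads covering this site
    return summary


if __name__ == "__main__":
    example = io.StringIO(
        "#CHROM\tPOS\tlib1.bam\n"
        "1\t100\t2\n"
        "1\t200\t0\n"
        "X\t50\t1\n"
        "Y\t10\t0\n"
        "MT\t5\t9\n"
    )
    print(summarise_depth(example))
```

Counting zero-depth rows as sites matters: without `-aa`, uncovered positions would be silently dropped and the per-site coverage on each chromosome class would be overestimated.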
2 changes: 1 addition & 1 deletion conf/test_humanbam.config
@@ -36,7 +36,7 @@ params {

// TODO Reactivate sexDet and genotyping params when those steps get implemented.
// //Sex Determination
// sexdeterrmine_bedfile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz'
sexdeterrmine_bedfile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz'
// // Genotyping
genotyping_pileupcaller_bedfile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K.pos.list_hs37d5.0based.bed.gz'
genotyping_pileupcaller_snpfile = 'https://raw.githubusercontent.com/nf-core/test-datasets/eager/reference/Human/1240K_covered_in_JK2067_downsampled_s0.1.numeric_chromosomes.snp'
14 changes: 14 additions & 0 deletions docs/development/manual_tests.md
@@ -110,6 +110,12 @@ Tool Specific combinations
- single reference: with damage manipulation (pmd + trimming), on pmd filtered data ✅
- multi reference: no damage manipulation ✅

- Sex determination

- With sexdeterrmine

- with default parameters

### Multi-reference tests

```bash
@@ -739,6 +745,14 @@ nextflow run main.nf -profile test,docker --outdir ./results -w work/ -resume --
nextflow run main.nf -profile test_multiref,docker --outdir ./results -w work/ -resume --genotyping_source 'raw' -ansi-log false -dump-channels
```

# Run Sexdeterrmine

```bash
## Running sex determination subworkflow from deduplicated bams
## Expect: sex_determination/ directory with a TSV summary table for all individuals.
nextflow run main.nf -profile test_humanbam,arm,docker --outdir ./results --run_sexdeterrmine
```

# GENOTYPING

These tests were run before library merging was implemented.
24 changes: 20 additions & 4 deletions docs/output.md
@@ -541,9 +541,25 @@ is a tool which calculates a variety of standard 'aDNA' metrics from a BAM file.

[ANGSD](http://www.popgen.dk/angsd/index.php/ANGSD) is a software for analyzing next generation sequencing data. Among other functions, ANGSD can estimate contamination for chromosomes for which one copy exists, i.e. X-chromosome for humans with karyotype XY. To do this, we first generate a binary count file for the X-chromosome (`angsd`) and then perform a Fisher's exact test for finding a p-value and jackknife to get an estimate of contamination (`contamination`). Contamination is estimated with Method of Moments (MOM) and Maximum Likelihood (ML) for both Method1 and Method2. Method1 compares the total number of minor and major reads at SNP sites with the number of minor and major reads at adjacent sites, assuming independent errors between reads and sites, while Method2 only samples one read at each site to remove the previous assumption. The results of all methods for each library, as well as respective standard errors are summarised in `nuclear_contamination.txt` and `nuclear_contamination_mqc.json`.

### Sex Determination

<details markdown="1">
<summary>Output files</summary>

- `sex_determination/`: this contains the output for the sex determination run. This is a single `.tsv` file that includes a table with the sample name, the number of autosomal SNPs, the number of SNPs on the X/Y chromosome, the number of reads mapping to the autosomes, the number of reads mapping to the X/Y chromosome, the relative coverage on the X/Y chromosomes, and the standard error associated with the relative coverages. These measures are provided for each BAM file, one row per file. If the `sexdeterrmine_bedfile` option has not been provided, the error bars cannot be trusted!

</details>

#### Sex.DetERRmine

Sex.DetERRmine calculates the coverage of your mapped reads on the X and Y chromosomes relative to the coverage on the autosomes (X-/Y-rate). This metric can be thought of as the number of copies of chromosomes X and Y found within each cell, relative to the autosomal copies. The number of autosomal copies is assumed to be two, so an X-rate of 1.0 means there are two X chromosomes in each cell, while 0.5 means there is a single copy of the X chromosome per cell. Human females have two copies of the X chromosome and no Y chromosome (XX), while human males have one copy each of the X and Y chromosomes (XY).

When a bedfile of specific sites is provided, Sex.DetERRmine runs much faster and additionally calculates error bars around each relative coverage estimate. For this estimate to be trustworthy, the sites included in the bedfile should be spaced far enough apart that a single sequencing read cannot overlap multiple sites. When a suitable bedfile is provided, each observation of a covered site is independent, and the error around the coverage is equal to the binomial error estimate. This error is then propagated during the calculation of relative coverage for the X and Y chromosomes. Hence, when a bedfile has not been provided, the reported error should be ignored.
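To make the relative rate and its error concrete, here is a minimal Python sketch under stated assumptions. The rate is reads per site on the sex chromosome divided by reads per site on the autosomes, so with two autosomal copies assumed, twice the rate approximates copies per cell. For the error, each read is treated as an independent draw landing on the autosomes, X or Y, giving a binomial standard error for each count, which is then propagated to the ratio with first-order (delta-method) error propagation, ignoring covariance between counts. These modelling choices and the name `rate_and_error` are assumptions for illustration; Sex.DetERRmine's exact formulas may differ.

```python
import math
from typing import Tuple


def rate_and_error(chr_sites: int, chr_reads: int,
                   aut_sites: int, aut_reads: int,
                   total_reads: int) -> Tuple[float, float]:
    """Relative coverage of one sex chromosome and a propagated binomial error.

    Hypothetical helper; all counts refer to a single library.
    """
    # Reads per site on the sex chromosome, relative to reads per autosomal site.
    rate = (chr_reads / chr_sites) / (aut_reads / aut_sites)

    # Binomial standard error of a count k out of total_reads independent draws.
    def binomial_se(k: int) -> float:
        p = k / total_reads
        return math.sqrt(total_reads * p * (1 - p))

    # Partial derivatives of rate = (chr_reads * aut_sites) / (chr_sites * aut_reads).
    d_chr = aut_sites / (chr_sites * aut_reads)
    d_aut = chr_reads * aut_sites / (chr_sites * aut_reads ** 2)
    error = math.sqrt((d_chr * binomial_se(chr_reads)) ** 2 +
                      (d_aut * binomial_se(aut_reads)) ** 2)
    return rate, error


if __name__ == "__main__":
    # Made-up counts for one library: autosomes, X and Y.
    aut_sites, aut_reads = 1000, 2000
    x_sites, x_reads = 100, 100
    y_sites, y_reads = 100, 100
    total = aut_reads + x_reads + y_reads
    for name, sites, reads in (("X", x_sites, x_reads), ("Y", y_sites, y_reads)):
        rate, err = rate_and_error(sites, reads, aut_sites, aut_reads, total)
        # Two autosomal copies are assumed, so 2 * rate approximates copies per cell.
        print(f"{name}-rate {rate:.2f} +/- {err:.3f} (~{2 * rate:.1f} copies)")
```

Propagating absolute count errors, rather than relative ones, keeps the calculation defined when a chromosome has no reads at all, as expected for Y in an XX individual.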

> Note that in nf-core/eager this will be run on single- and double-stranded variants of the same library separately. This can also help assess differential contamination between libraries.

### Genotyping

### pileupCaller
#### pileupCaller

<details markdown="1">
<summary>Output files</summary>
@@ -561,7 +577,7 @@ is a tool which calculates a variety of standard 'aDNA' metrics from a BAM file.

When using pileupCaller for genotyping, single-stranded and double-stranded libraries are genotyped separately. Single-stranded libraries are genotyped with the additional option `--singleStrandMode`, which ensures that deamination damage artefacts cannot affect the genotype calls by only using the forward- or reverse-mapping reads when genotyping transitions (depending on the alleles of the transition).
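The strand rule for single-stranded libraries can be sketched as follows. This is a minimal illustration, not pileupCaller's code: it assumes that in single-stranded libraries C-to-T deamination appears as spurious T on forward-mapping reads and, seen from the other strand, as spurious A on reverse-mapping reads, so at C/T and G/A transitions only reads from the strand that cannot carry the damage are kept. The names `Read` and `usable_bases` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Read:
    base: str      # base observed at the SNP position, in reference orientation
    forward: bool  # True if the read maps to the forward strand


def usable_bases(ref: str, alt: str, reads: List[Read]) -> List[str]:
    """Bases kept for genotyping a single-stranded library at one SNP.

    Hypothetical sketch: at C/T SNPs drop forward-strand reads (possible
    C->T damage); at G/A SNPs drop reverse-strand reads (the same damage seen
    from the other strand); keep all reads at other SNP types.
    """
    alleles = {ref.upper(), alt.upper()}
    if alleles == {"C", "T"}:
        kept = [r for r in reads if not r.forward]
    elif alleles == {"G", "A"}:
        kept = [r for r in reads if r.forward]
    else:
        kept = reads  # transversions cannot arise from deamination
    return [r.base for r in kept]


if __name__ == "__main__":
    pileup = [Read("T", True), Read("C", False), Read("T", False)]
    print(usable_bases("C", "T", pileup))  # the forward-strand T is discarded
```

Discarding half of the reads at transitions lowers coverage at those sites, which is the price of making the calls robust to damage without trimming.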

### GATK UnifiedGenotyper
#### GATK UnifiedGenotyper

<details markdown="1">
<summary>Output files</summary>
@@ -576,7 +592,7 @@ When using pileupCaller for genotyping, single-stranded and double-stranded libr

[GATK's UnifiedGenotyper](https://github.com/broadinstitute/gatk-docs/blob/master/gatk3-tooldocs/3.5-0/org_broadinstitute_gatk_tools_walkers_genotyper_UnifiedGenotyper.html) uses a Bayesian genotype likelihood model to simultaneously estimate the most likely genotypes and allele frequency in a population of N samples, emitting a genotype for each sample. The system can emit either just the variant sites or complete genotypes (which include homozygous reference calls) satisfying some phred-scaled confidence value. This tool has been deprecated by the GATK developers in favour of HaplotypeCaller, but is still considered a preferable genotyper for ancient DNA, given its ability to handle low coverage data. The output provided is a bgzipped VCF file containing the genotype calls of each sample, its index file, as well as the statistics of the VCF file generated by `bcftools stats`.
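As a rough illustration of what a genotype likelihood model computes, the sketch below scores the three diploid genotypes at a biallelic site under a simplified textbook model: each read's base is correct with probability 1 - e and otherwise one of the three other bases with probability e / 3, and each read comes from either chromosome copy with equal probability. This is not GATK's implementation, which is considerably more involved (base qualities, priors and joint multi-sample allele-frequency estimation); the fixed error rate and the name `genotype_log_likelihoods` are assumptions.

```python
import math
from typing import Dict, List


def genotype_log_likelihoods(bases: List[str], ref: str, alt: str,
                             error: float = 0.01) -> Dict[str, float]:
    """log P(bases | genotype) for genotypes ref/ref, ref/alt and alt/alt.

    Simplified textbook diploid model (hypothetical helper, fixed error rate).
    """
    def p_base_given_allele(base: str, allele: str) -> float:
        return 1 - error if base == allele else error / 3

    genotypes = {f"{ref}{ref}": (ref, ref),
                 f"{ref}{alt}": (ref, alt),
                 f"{alt}{alt}": (alt, alt)}
    result = {}
    for name, (a1, a2) in genotypes.items():
        log_lik = 0.0
        for b in bases:
            # Each read is drawn from either chromosome copy with probability 1/2.
            log_lik += math.log(0.5 * p_base_given_allele(b, a1) +
                                0.5 * p_base_given_allele(b, a2))
        result[name] = log_lik
    return result


if __name__ == "__main__":
    lls = genotype_log_likelihoods(["A", "A", "G", "A"], ref="A", alt="G")
    # With a flat prior, the most likely genotype is also the posterior mode.
    print(max(lls, key=lls.get), lls)
```

At low coverage the likelihoods of competing genotypes are often close, which is one reason a genotyper's prior and confidence threshold matter so much for ancient DNA.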

### GATK HaplotypeCaller
#### GATK HaplotypeCaller

<details markdown="1">
<summary>Output files</summary>
@@ -591,7 +607,7 @@ When using pileupCaller for genotyping, single-stranded and double-stranded libr

[GATK's HaplotypeCaller](https://gatk.broadinstitute.org/hc/en-us/articles/13832687299739-HaplotypeCaller) is capable of calling SNPs and indels simultaneously via local de-novo assembly of haplotypes in an active region. In addition, HaplotypeCaller is able to handle non-diploid organisms as well as pooled experiment data. This is the preferred genotyper for modern DNA. The output provided is a bgzipped VCF file containing the genotype calls of each sample, its index file, as well as the statistics of the VCF file generated by `bcftools stats`.

### FreeBayes
#### FreeBayes

<details markdown="1">
<summary>Output files</summary>
32 changes: 21 additions & 11 deletions modules.json
@@ -62,12 +62,12 @@
},
"bwa/aln": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"git_sha": "6278bf9afd4a4b2d00fa6052250e73da3d91546f",
"installed_by": ["fastq_align_bwaaln"]
},
"bwa/index": {
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"git_sha": "6278bf9afd4a4b2d00fa6052250e73da3d91546f",
"installed_by": ["modules"]
},
"bwa/mem": {
@@ -117,7 +117,7 @@
},
"fastp": {
"branch": "master",
"git_sha": "f4ae1d942bd50c5c0b9bd2de1393ce38315ba57c",
"git_sha": "003920c7f9a8ae19b69a97171922880220bedf56",
"installed_by": ["modules"]
},
"fastqc": {
@@ -205,34 +205,39 @@
"git_sha": "6b0e4fe14ca1b12e131f64608f0bbaf36fd11451",
"installed_by": ["modules"]
},
"samtools/depth": {
"branch": "master",
"git_sha": "a1ffbc1fd87bd5a829e956cc26ec9cc53af3e817",
"installed_by": ["modules"]
},
"samtools/faidx": {
"branch": "master",
"git_sha": "ce0b1aed7d504883061e748f492a31bf44c5777c",
"installed_by": ["modules"]
},
"samtools/fastq": {
"branch": "master",
"git_sha": "ce0b1aed7d504883061e748f492a31bf44c5777c",
"git_sha": "8d8f0ae52d6c9342bd41c33dda0b74b07e32153d",
"installed_by": ["modules"]
},
"samtools/flagstat": {
"branch": "master",
"git_sha": "ce0b1aed7d504883061e748f492a31bf44c5777c",
"git_sha": "8d8f0ae52d6c9342bd41c33dda0b74b07e32153d",
"installed_by": ["modules"]
},
"samtools/idxstats": {
"branch": "master",
"git_sha": "ce0b1aed7d504883061e748f492a31bf44c5777c",
"git_sha": "8d8f0ae52d6c9342bd41c33dda0b74b07e32153d",
"installed_by": ["modules"]
},
"samtools/index": {
"branch": "master",
"git_sha": "ce0b1aed7d504883061e748f492a31bf44c5777c",
"git_sha": "8d8f0ae52d6c9342bd41c33dda0b74b07e32153d",
"installed_by": ["bam_split_by_region", "fastq_align_bwaaln"]
},
"samtools/merge": {
"branch": "master",
"git_sha": "ce0b1aed7d504883061e748f492a31bf44c5777c",
"git_sha": "8d8f0ae52d6c9342bd41c33dda0b74b07e32153d",
"installed_by": ["modules"]
},
"samtools/mpileup": {
@@ -242,13 +247,13 @@
},
"samtools/sort": {
"branch": "master",
"git_sha": "ce0b1aed7d504883061e748f492a31bf44c5777c",
"git_sha": "8d8f0ae52d6c9342bd41c33dda0b74b07e32153d",
"installed_by": ["modules"]
},
"samtools/view": {
"branch": "master",
"git_sha": "ce0b1aed7d504883061e748f492a31bf44c5777c",
"installed_by": ["bam_split_by_region", "modules"]
"git_sha": "8d8f0ae52d6c9342bd41c33dda0b74b07e32153d",
"installed_by": ["bam_split_by_region"]
},
"seqkit/split2": {
"branch": "master",
@@ -259,6 +264,11 @@
"branch": "master",
"git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
"installed_by": ["modules"]
},
"sexdeterrmine": {
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
"installed_by": ["modules"]
}
}
},
39 changes: 39 additions & 0 deletions modules/nf-core/samtools/depth/main.nf

54 changes: 54 additions & 0 deletions modules/nf-core/samtools/depth/meta.yml

40 changes: 40 additions & 0 deletions modules/nf-core/sexdeterrmine/main.nf
