Release 1.1.0 (#11)
* feat: add modules

* feat: add docker image

* fix: swap to wave containers

* feat: update schema and config

* feat: add modules to workflow

* fix: usage of vcf2mat

* fix: remove .view and add more comments explaining the code

* feat: add gvcf to vcf conversion

* feat: add GATK iGenomes

* feat: add tabix

* update docs

* update input schema

* update sbwf

* update schemas

* add pipeline tests

* feat: add correct tests

* change access to index bool

* prettier to hopefully fix linting?

* fix indentation

* add env to local module

* fix linting for first version

* add nf-test and ignore external configs

* wip on nf-test

* finalize tests

* Template update for nf-core/tools version 3.0.2

* Template update for nf-core/tools version 3.1.0

* Apply suggestions from code review

* fix: filename collision by using different subsets of the filename (#4)

* Fix/filenamecoll (#5)

* First release :) (#1)

* feat: add modules

* feat: add docker image

* fix: swap to wave containers

* feat: update schema and config

* feat: add modules to workflow

* fix: usage of vcf2mat

* fix: remove .view and add more comments explaining the code

* feat: add gvcf to vcf conversion

* feat: add GATK iGenomes

* feat: add tabix

* update docs

* update input schema

* update sbwf

* update schemas

* add pipeline tests

* feat: add correct tests

* change access to index bool

* prettier to hopefully fix linting?

* fix indentation

* add env to local module

* fix linting for first version

* add nf-test and ignore external configs

* wip on nf-test

* finalize tests

* Template update for nf-core/tools version 3.0.2

* Template update for nf-core/tools version 3.1.0

* Apply suggestions from code review

* Change famosab to qbic-pipelines after transfer (#2)

* Update README.md after transfer

* change famosab to qbic-pipelines

* update main

* fix: filename collision by using filebasename and include modules config again

* fix: update snaps

* prettier

* add sample names to columns and restructure (#7)

* add sample names and restructure

* add new param

* prettier

* expand docs

* remove weird param

* bump-version to dev

* prettier

* tests

* add concatenation of same samples with same label (#8)

* add concatenation of same samples with same label

* update changelog and docs

* maybe we need to switch to a subway map soon

* try other ci file

* try other ci file

* fix name

* remove ci from linting

* update pipeline level tests

* ignore modules

* add nft-vcf

* update ci and snaps

* modify

* revert

* prepare release 1.1.0 (#10)

* prepare release

* version correction

* prettier

* fix schema

* remove dev

Co-authored-by: Daniel Straub <[email protected]>

* add forgotten explanations to output

* prettier

---------

Co-authored-by: Daniel Straub <[email protected]>
famosab and d4straub authored Jan 8, 2025
1 parent bd07ec4 commit e93735a
Showing 46 changed files with 2,486 additions and 187 deletions.
8 changes: 8 additions & 0 deletions .gitignore
@@ -10,3 +10,11 @@ null/
.nf-test
.nf-test*
.nf-test/*

.vscode
.vscode/*

tests/unmergedgvcfs
tests/unmergedgvcfs/*
tests/input-full-ncgm.csv
conf/test_full_ncgm.config
5 changes: 3 additions & 2 deletions .nf-core.yml
@@ -18,6 +18,7 @@ lint:
- docs/images/nf-core-vcftomat_logo_dark.png
- .github/ISSUE_TEMPLATE/bug_report.yml
included_configs: false
actions_ci: false
multiqc_config:
- report_comment
nextflow_config:
@@ -30,7 +31,7 @@ lint:
nf_core_version: 3.1.0
repository_type: pipeline
template:
author: "Famke B\xE4uerle, Dorothy Ellis"
author: "Famke Bäuerle, Dorothy Ellis"
description: Nextflow pipeline to convert (g)vcfs to matrices suitable for statistical
analysis
force: false
@@ -43,4 +44,4 @@ template:
- codespaces
- fastqc
- adaptivecard
version: 1.0.0dev
version: 1.1.0
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,18 @@
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## v1.1.0 - Newton Puccoon - 08.01.2025

### Added

- [#7](https://github.com/qbic-pipelines/vcftomat/pull/7) - sample names to columns
- [#8](https://github.com/qbic-pipelines/vcftomat/pull/8) - concat for sample, label pairs

### Fixed

- [#5](https://github.com/qbic-pipelines/vcftomat/pull/5) - filename collision
- [#10](https://github.com/qbic-pipelines/vcftomat/pull/10) - prepare release 1.1.0

## v1.0.0 - Curie Purpureal - 16.12.2024

Initial release of qbic-pipelines/vcftomat, created with the [nf-core](https://nf-co.re/) template.
19 changes: 11 additions & 8 deletions README.md
@@ -16,9 +16,11 @@

1. Indexes (g.)vcf files ([`tabix`](http://www.htslib.org/doc/tabix.html))
2. Converts g.vcf files to vcf with `genotypegvcf` ([`GATK`](https://gatk.broadinstitute.org/hc/en-us))
3. Merges all vcfs from the same sample with `bcftools/merge` ([`bcftools`](https://samtools.github.io/bcftools/bcftools.html))
4. Converts the (merged) vcfs to a matrix using a custom R script written by @ellisdoro ([`R`](https://www.r-project.org/))
5. Collects all reports into a MultiQC report ([`MultiQC`](http://multiqc.info/))
3. Concatenates all vcfs that have the same id and the same label with `bcftools/concat` ([`bcftools`](https://samtools.github.io/bcftools/bcftools.html))
4. Changes the sample name in the vcf file to the label given in the samplesheet with `bcftools/reheader` ([`bcftools`](https://samtools.github.io/bcftools/bcftools.html)). This can be turned off by adding `--rename false` to the `nextflow run` command.
5. Merges all vcfs from the same sample with `bcftools/merge` ([`bcftools`](https://samtools.github.io/bcftools/bcftools.html))
6. Converts the (merged) vcfs to a matrix using a custom R script written by @ellisdoro ([`R`](https://www.r-project.org/))
7. Collects all reports into a MultiQC report ([`MultiQC`](http://multiqc.info/))

![](./docs/images/vcftomat.excalidraw.png)

@@ -32,13 +34,14 @@ First, prepare a samplesheet with your input data that looks as follows:
`samplesheet.csv`:

```csv
sample,gvcf,vcf_path,vcf_index_path
SAMPLE-1,false,path/to/vcf.gz,path/to/.vcf.gz.tbi
SAMPLE-1,false,path/to/vcf.gz,path/to/.vcf.gz.tbi
SAMPLE-2,true,path/to/g.vcf.gz,path/to/g.vcf.gz.tbi
sample,label,gvcf,vcf_path,vcf_index_path
SAMPLE-1,pipelineA-callerA,false,path/to/vcf.gz,path/to/.vcf.gz.tbi
SAMPLE-1,pipelineB-callerA,false,path/to/vcf.gz,path/to/.vcf.gz.tbi
SAMPLE-2,pipelineB-callerB,true,path/to/g.vcf.gz,path/to/g.vcf.gz.tbi
SAMPLE-2,pipelineB-callerB,true,path/to/g.vcf.gz,path/to/g.vcf.gz.tbi
```

Each row represents a VCF file coming from a sample. The `gvcf` column indicates whether the file is a g.vcf file or not. The `vcf_path` and `vcf_index_path` columns contain the path to the VCF file and its index, respectively.
Each row represents a VCF file coming from a sample. The `label` column enables concatenation of vcfs: all files with the same `sample` and the same `label` are concatenated into one VCF (for example, when a pipeline produces separate vcfs for chrM and chrY). The `gvcf` column indicates whether the file is a g.vcf file or not. The `vcf_path` and `vcf_index_path` columns contain the path to the VCF file and its index, respectively.
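The two grouping passes described above (concatenate by `sample` + `label`, then merge by `sample`) can be sketched in Python. This is a minimal illustration, not the pipeline's Nextflow channel logic: the function `plan_groups` is hypothetical, it skips the tabix and GenotypeGVCFs steps, and the `.concat.vcf.gz` names only mimic the `ext.prefix` values in `conf/modules.config`.

```python
from collections import defaultdict


def plan_groups(rows):
    """Group samplesheet rows as the README describes (hypothetical sketch).

    Returns (concat_groups, merge_groups):
      concat_groups maps (sample, label) -> VCF paths to concatenate
      merge_groups maps sample -> per-label VCF names to merge
    """
    concat_groups = defaultdict(list)
    for row in rows:
        concat_groups[(row["sample"], row["label"])].append(row["vcf_path"])

    merge_groups = defaultdict(list)
    for sample, label in concat_groups:
        # Assumed naming, mirroring ext.prefix = "${meta.label}.concat"
        merge_groups[sample].append(f"{label}.concat.vcf.gz")

    return dict(concat_groups), dict(merge_groups)


rows = [
    {"sample": "SAMPLE-1", "label": "pipelineA-callerA", "vcf_path": "a.vcf.gz"},
    {"sample": "SAMPLE-1", "label": "pipelineB-callerA", "vcf_path": "b.vcf.gz"},
    {"sample": "SAMPLE-2", "label": "pipelineB-callerB", "vcf_path": "c.g.vcf.gz"},
    {"sample": "SAMPLE-2", "label": "pipelineB-callerB", "vcf_path": "d.g.vcf.gz"},
]
concat_groups, merge_groups = plan_groups(rows)
print(concat_groups[("SAMPLE-2", "pipelineB-callerB")])  # ['c.g.vcf.gz', 'd.g.vcf.gz']
print(merge_groups["SAMPLE-1"])
```

With the example samplesheet, SAMPLE-2 ends up with a single labelled VCF, which matches the documented case where merging is skipped.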

Now, you can run the pipeline using:

4 changes: 2 additions & 2 deletions assets/multiqc_config.yml
@@ -1,6 +1,6 @@
report_comment: >
This report has been generated by the <a href="https://github.com/qbic-pipelines/vcftomat/releases/tag/1.0.0" target="_blank">qbic-pipelines/vcftomat</a>
analysis pipeline.
This report has been generated by the <a href="https://github.com/qbic-pipelines/vcftomat/releases/tag/1.1.0"
target="_blank">qbic-pipelines/vcftomat</a> analysis pipeline.
report_section_order:
"qbic-pipelines-vcftomat-methods-description":
order: -1000
9 changes: 5 additions & 4 deletions assets/samplesheet.csv
@@ -1,4 +1,5 @@
sample,gvcf,vcf_path,vcf_index_path
SAMPLE-1,false,path/to/vcf.gz,path/to/.vcf.gz.tbi
SAMPLE-1,false,path/to/vcf.gz,path/to/.vcf.gz.tbi
SAMPLE-2,true,path/to/g.vcf.gz,path/to/g.vcf.gz.tbi
sample,label,gvcf,vcf_path,vcf_index_path
SAMPLE-1,pipelineA-callerA,false,path/to/vcf.gz,path/to/.vcf.gz.tbi
SAMPLE-1,pipelineB-callerA,false,path/to/vcf.gz,path/to/.vcf.gz.tbi
SAMPLE-2,pipelineB-callerB,true,path/to/g.vcf.gz,path/to/g.vcf.gz.tbi
SAMPLE-2,pipelineB-callerB,true,path/to/g.vcf.gz,path/to/g.vcf.gz.tbi
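The new five-column samplesheet can be read with Python's standard `csv` module. This is only a sketch for inspecting a sheet locally (the pipeline itself validates input through its JSON schema). The function name `read_samplesheet` is hypothetical, and the second example row deliberately omits the optional index to show how an empty `vcf_index_path` might be handled.

```python
import csv
import io

SAMPLESHEET = """\
sample,label,gvcf,vcf_path,vcf_index_path
SAMPLE-1,pipelineA-callerA,false,path/to/vcf.gz,path/to/.vcf.gz.tbi
SAMPLE-2,pipelineB-callerB,true,path/to/g.vcf.gz,
"""


def read_samplesheet(text):
    """Parse samplesheet CSV text into a list of dicts (hypothetical sketch)."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append(
            {
                "sample": record["sample"],
                "label": record["label"],
                # The gvcf column holds the literal strings "true"/"false"
                "gvcf": record["gvcf"].strip().lower() == "true",
                "vcf_path": record["vcf_path"],
                # An empty optional column means "no index supplied"
                "vcf_index_path": record.get("vcf_index_path") or None,
            }
        )
    return rows


rows = read_samplesheet(SAMPLESHEET)
print(rows[1]["gvcf"], rows[1]["vcf_index_path"])  # True None
```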
8 changes: 7 additions & 1 deletion assets/schema_input.json
@@ -13,6 +13,12 @@
"errorMessage": "Sample name must be provided and cannot contain spaces",
"meta": ["id"]
},
"label": {
"type": "string",
"pattern": "^\\S+$",
"errorMessage": "Label must be provided and cannot contain spaces",
"meta": ["label"]
},
"gvcf": {
"type": "boolean",
"errorMessage": "",
@@ -40,6 +46,6 @@
"errorMessage": "Index of VCF file must have extension '.tbi'- Optional"
}
},
"required": ["sample", "gvcf", "vcf_path"]
"required": ["sample", "label", "gvcf", "vcf_path"]
}
}
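The schema change makes `label` required and applies the same `^\S+$` pattern used for `sample`. The following Python sketch reproduces those two rules for a quick local check. It is not nf-schema: the function `validate_row` and the "missing required column" message are hypothetical, while the two pattern error messages are copied from the schema.

```python
import re

REQUIRED = ["sample", "label", "gvcf", "vcf_path"]
NO_SPACES = re.compile(r"^\S+$")  # same pattern as the schema's "^\\S+$"


def validate_row(row):
    """Return a list of error messages for one samplesheet row (sketch)."""
    errors = []
    for column in REQUIRED:
        if row.get(column) in (None, ""):
            # Hypothetical wording; nf-schema reports its own message
            errors.append(f"missing required column '{column}'")
    if row.get("sample") and not NO_SPACES.fullmatch(row["sample"]):
        errors.append("Sample name must be provided and cannot contain spaces")
    if row.get("label") and not NO_SPACES.fullmatch(row["label"]):
        errors.append("Label must be provided and cannot contain spaces")
    return errors


print(validate_row({"sample": "SAMPLE-1", "label": "pipeline A",
                    "gvcf": "false", "vcf_path": "x.vcf.gz"}))
```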
33 changes: 29 additions & 4 deletions conf/modules.config
@@ -23,13 +23,38 @@ process {
}

withName: 'GATK4_GENOTYPEGVCFS' {
ext.prefix = { "${input.baseName.tokenize('.')[0]}" }
ext.prefix = { "${meta.name}" }
}

withName: 'BCFTOOLS_CONCAT' {
memory = 8.GB
ext.prefix = { "${meta.label}.concat" }
ext.args = { " --allow-overlaps --output-type z --write-index=tbi" }
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/bcftools/concat/" },
]
}

withName: 'BCFTOOLS_REHEADER' {
beforeScript = { "echo ${meta.label} > ${meta.label}.txt" }
ext.args = { "--samples ${meta.label}.txt" }
ext.prefix = { "${meta.label}.reheader" }
ext.args2 = { "--output-type z --write-index=tbi" }
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/bcftools/reheader/" },
]
}

withName: 'BCFTOOLS_MERGE' {
memory = 8.GB
ext.args = { '--force-samples' }
ext.prefix = { "${meta.id}.merged" }
memory = 8.GB
ext.args = { "--force-samples --output-type z --write-index=tbi" }
ext.prefix = { "${meta.id}.merge" }
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/bcftools/merge/" },
]
}

withName: 'MULTIQC' {
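The `ext.prefix` closures and `publishDir` paths above determine where each bcftools result lands. The Python sketch below resolves them for one `meta` map. It assumes that the nf-core bcftools modules append `.vcf.gz` and `.vcf.gz.tbi` when called with `--output-type z --write-index=tbi`; that suffix behaviour is an assumption, not something this diff shows. The function `expected_outputs` is hypothetical.

```python
def expected_outputs(meta, outdir="results"):
    """Resolve output paths implied by conf/modules.config (sketch).

    Assumes '.vcf.gz' / '.vcf.gz.tbi' suffixes for '--output-type z
    --write-index=tbi'; check the module's main.nf to confirm.
    """
    prefixes = {
        "concat": f"{meta['label']}.concat",      # ext.prefix of BCFTOOLS_CONCAT
        "reheader": f"{meta['label']}.reheader",  # ext.prefix of BCFTOOLS_REHEADER
        "merge": f"{meta['id']}.merge",           # ext.prefix of BCFTOOLS_MERGE
    }
    return {
        step: (
            f"{outdir}/bcftools/{step}/{prefix}.vcf.gz",
            f"{outdir}/bcftools/{step}/{prefix}.vcf.gz.tbi",
        )
        for step, prefix in prefixes.items()
    }


outputs = expected_outputs({"id": "SAMPLE-1", "label": "pipelineA-callerA"})
print(outputs["merge"][0])  # results/bcftools/merge/SAMPLE-1.merge.vcf.gz
```

Because the concat and reheader prefixes use only `meta.label`, two samples sharing a label would produce identically named files in the same publish directory; whether that matters depends on how labels are chosen in the samplesheet.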
Binary file modified docs/images/vcftomat.excalidraw.png
20 changes: 18 additions & 2 deletions docs/output.md
@@ -6,27 +6,43 @@ This document describes the output produced by the pipeline. Most of the plots a

The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

<!-- TODO nf-core: Write this documentation describing your workflow's output -->

## Pipeline overview

The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:

- [Tabix](#tabix) - Indexes (g.)vcf files
- [GenotypeGVCFs](#genotypegvcfs) - Converts g.vcf files to vcf with GATK
- [Concatenate VCFs](#concatenate-vcfs) - Concatenates all vcfs that have the same id and the same label with bcftools/concat
- [Rename Samples](#rename-samples) - Changes the sample name in the vcf file to the label with bcftools/reheader
- [Merge VCFs](#merge-vcfs) - Merges all vcfs from the same sample with bcftools/merge
- [Convert to matrix](#convert-to-matrix) - Converts the (merged) vcfs to a matrix using a custom R script written by @ellisdoro
- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution

### Tabix

Tabix generates index files with the `.tbi` extension for all (g)VCF files that are given to the pipeline without an index.

### GenotypeGVCFs

The GATK GenotypeGVCFs module converts genomic VCF (gVCF) files into regular VCF files. The key difference between a regular VCF and a gVCF is that the gVCF has records for all sites, whether or not there is a variant call there.

### Concatenate VCFs

Some variant calling pipelines return multiple (g)VCF files for one patient (for example, separate files for chrM and chrY). `bcftools concat` combines all VCFs that share the same sample ID and label into a single VCF.

### Rename Samples

To enable comparison of the finalized CSV files, `bcftools reheader` renames the sample in each VCF from the generic name given by the variant caller to the custom label given in the samplesheet. This step can be turned off with `--rename false`.

### Merge VCFs

To enable comparison of different variant callers or variant calling pipelines, all VCFs that come from the same sample are merged based on the sample ID submitted by the user.

### Convert to matrix

A custom R script is used to convert the finalized VCF to a CSV, which can be used for further downstream analysis. The script was written by [Dorothy Ellis](https://github.com/ellisdoro).
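The per-file route through these steps depends on two samplesheet fields and one parameter. The sketch below encodes the order documented above: tabix only when no index is supplied, GenotypeGVCFs only for gVCFs, reheader unless `--rename false`. The function `route` and its step strings are hypothetical labels, not process names from the pipeline, and the sketch does not model the documented case where merging is skipped for samples with a single VCF.

```python
def route(row, rename=True):
    """List the steps one samplesheet row passes through (sketch)."""
    steps = []
    if not row.get("vcf_index_path"):
        steps.append("tabix")              # index files supplied without one
    if row["gvcf"]:
        steps.append("genotypegvcfs")      # gVCF -> regular VCF
    steps.append("concat")                 # grouped by sample + label
    if rename:
        steps.append("reheader")           # sample name -> label
    steps.append("merge")                  # grouped by sample
    steps.append("convert_to_matrix")      # custom R script -> CSV
    return steps


print(route({"gvcf": True, "vcf_index_path": None}))
```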

### MultiQC

<details markdown="1">
19 changes: 10 additions & 9 deletions docs/usage.md
@@ -19,15 +19,17 @@ You will need to create a samplesheet with information about the samples you wou
The `sample` identifiers must be the same when the vcfs originate from the same bam but were produced by different callers. The pipeline merges all vcfs from the same sample into one vcf file; if a sample has only one vcf file, merging is skipped.

```csv title="samplesheet.csv"
sample,gvcf,vcf_path,vcf_index_path
SAMPLE-1,false,path/to/vcf.gz,path/to/.vcf.gz.tbi
SAMPLE-1,false,path/to/vcf.gz,path/to/.vcf.gz.tbi
SAMPLE-2,true,path/to/g.vcf.gz,path/to/g.vcf.gz.tbi
sample,label,gvcf,vcf_path,vcf_index_path
SAMPLE-1,pipelineA-callerA,false,path/to/vcf.gz,path/to/.vcf.gz.tbi
SAMPLE-1,pipelineB-callerA,false,path/to/vcf.gz,path/to/.vcf.gz.tbi
SAMPLE-2,pipelineB-callerB,true,path/to/g.vcf.gz,path/to/g.vcf.gz.tbi
SAMPLE-2,pipelineB-callerB,true,path/to/g.vcf.gz,path/to/g.vcf.gz.tbi
```

| Column | Description |
| ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `sample` | Custom sample name. This entry will be identical for vcfs that originate from the same bam but were yielded with different callers. Spaces in sample names are automatically converted to underscores (`_`). |
| `label` | Label for the vcf file. This is used to concatenate vcfs with the same label. |
| `gvcf` | Boolean whether the supplied sample is a gvcf (true) or a normal vcf (false). |
| `vcf_path` | Full path to VCF file, should have the extension ".g.vcf.gz", ".vcf.gz", ".g.vcf" or ".vcf". |
| `vcf_index_path` | Full path to index of (g)VCF file. Optional. Should have extension ".tbi". |
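The extension rules in the table can be checked before launching a run. This Python sketch applies exactly the suffixes listed above; the function `check_paths` is hypothetical and only inspects file names, not file contents.

```python
VCF_SUFFIXES = (".g.vcf.gz", ".vcf.gz", ".g.vcf", ".vcf")


def check_paths(vcf_path, vcf_index_path=None):
    """Check file extensions against the samplesheet table (sketch)."""
    problems = []
    if not vcf_path.endswith(VCF_SUFFIXES):
        problems.append(f"unexpected VCF extension: {vcf_path}")
    # The index is optional, but if given it must be a tabix index
    if vcf_index_path and not vcf_index_path.endswith(".tbi"):
        problems.append(f"index must end in .tbi: {vcf_index_path}")
    return problems


print(check_paths("s.vcf.gz", "s.vcf.gz.csi"))
```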
@@ -39,7 +41,7 @@ An [example samplesheet](../assets/samplesheet.csv) has been provided with the p
The typical command for running the pipeline is as follows:

```bash
nextflow run qbic-pipelines/vcftomat --input ./samplesheet.csv --outdir ./results --genome GATK.GRCh38 -profile docker
nextflow run qbic-pipelines/vcftomat --input ./samplesheet.csv --outdir ./results --genome GATK.GRCh38 --rename true -profile docker
```

This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles.
@@ -69,10 +71,9 @@ nextflow run qbic-pipelines/vcftomat -profile docker -params-file params.yaml
with:

```yaml title="params.yaml"
input: './samplesheet.csv'
outdir: './results/'
genome: 'GATK.GRCh38'
<...>
input: "./samplesheet.csv"
outdir: "./results/"
genome: "GATK.GRCh38"
```
You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch).
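Nextflow's `-params-file` also accepts JSON, so a params file can be generated with Python's standard `json` module. A minimal sketch, assuming the same values as the YAML example plus the `rename` parameter from the run command; the file name `params.json` is an arbitrary choice.

```python
import json

params = {
    "input": "./samplesheet.csv",
    "outdir": "./results/",
    "genome": "GATK.GRCh38",
    "rename": True,  # equivalent to --rename true on the command line
}

text = json.dumps(params, indent=2)
with open("params.json", "w") as handle:
    handle.write(text)
```

It can then be passed as `nextflow run qbic-pipelines/vcftomat -profile docker -params-file params.json`.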
10 changes: 10 additions & 0 deletions modules.json
@@ -5,11 +5,21 @@
"https://github.com/nf-core/modules.git": {
"modules": {
"nf-core": {
"bcftools/concat": {
"branch": "master",
"git_sha": "d1e0ec7670fa77905a378627232566ce54c3c26d",
"installed_by": ["modules"]
},
"bcftools/merge": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"bcftools/reheader": {
"branch": "master",
"git_sha": "666652151335353eef2fcd58880bcef5bc2928e1",
"installed_by": ["modules"]
},
"gatk4/genotypegvcfs": {
"branch": "master",
"git_sha": "1999eff2c530b2b185a25cc42117a1686f09b685",
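`modules.json` pins each installed nf-core module to a commit. The Python sketch below lists those pins, assuming the standard nf-core layout with a top-level `repos` key (the diff above shows only the inner part). The embedded JSON is a trimmed, hand-assembled subset of the entries added in this commit, and `pinned_modules` is a hypothetical helper.

```python
import json

MODULES_JSON = """
{
  "repos": {
    "https://github.com/nf-core/modules.git": {
      "modules": {
        "nf-core": {
          "bcftools/concat": {"branch": "master", "git_sha": "d1e0ec7670fa77905a378627232566ce54c3c26d", "installed_by": ["modules"]},
          "bcftools/merge": {"branch": "master", "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["modules"]},
          "bcftools/reheader": {"branch": "master", "git_sha": "666652151335353eef2fcd58880bcef5bc2928e1", "installed_by": ["modules"]}
        }
      }
    }
  }
}
"""


def pinned_modules(text):
    """Map each installed module name to its pinned git SHA (sketch)."""
    data = json.loads(text)
    pins = {}
    for repo in data["repos"].values():
        for org_modules in repo["modules"].values():
            for name, info in org_modules.items():
                pins[name] = info["git_sha"]
    return pins


pins = pinned_modules(MODULES_JSON)
print(pins["bcftools/concat"][:7])  # d1e0ec7
```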
5 changes: 5 additions & 0 deletions modules/nf-core/bcftools/concat/environment.yml

59 changes: 59 additions & 0 deletions modules/nf-core/bcftools/concat/main.nf
