Variant calling and imputation from low coverage sequencing data

cguyomar/PARSEC

Introduction

PARSEC (imPutAtion for spaRSE sequenCing) is a bioinformatics pipeline designed to genotype large populations from low-coverage sequencing data. It relies on bcftools mpileup to detect SNP sites and on STITCH to impute genotypes.

The pipeline is still in early development.

  1. Index bams (SAMtools)
  2. Prepare fixed size genomic chunks (bedtools)
  3. Optional: call variants from sparse data
    1. Merge bams on each window (SAMtools)
    2. Call variants for each window (bcftools)
    3. Concatenate vcf files (bcftools)
    4. Sort vcf (bcftools)
    5. Filter variants (bcftools)
  4. Impute genotypes (STITCH)
  5. Index vcf (Tabix)
  6. Concatenate vcf files (bcftools)
  7. Sort vcf (bcftools)
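
To illustrate what step 2 produces, here is a minimal Python sketch of fixed-size windowing over a genome. The pipeline itself uses bedtools for this step; the function name `make_windows`, the example chromosome sizes, and the window size below are hypothetical and chosen only for illustration. Coordinates follow the BED convention (0-based start, exclusive end).

```python
from typing import Dict, List, Tuple


def make_windows(chrom_sizes: Dict[str, int], window_size: int) -> List[Tuple[str, int, int]]:
    """Split each chromosome into consecutive fixed-size windows.

    Windows are BED-style intervals (0-based start, exclusive end).
    The last window of a chromosome is shorter when the chromosome
    length is not a multiple of window_size.
    """
    if window_size <= 0:
        raise ValueError("window_size must be a positive integer")
    windows = []
    for chrom, length in chrom_sizes.items():
        for start in range(0, length, window_size):
            end = min(start + window_size, length)
            windows.append((chrom, start, end))
    return windows


if __name__ == "__main__":
    # Hypothetical chromosome sizes, for illustration only.
    sizes = {"chr1": 2500, "chr2": 1000}
    for chrom, start, end in make_windows(sizes, 1000):
        print(f"{chrom}\t{start}\t{end}")
```

Each resulting window can then be processed independently (variant calling, imputation) before the per-window VCF files are concatenated and sorted, as in the later steps of the list.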

Usage

Note: If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
