Haplomap

Haplotype-based computational genetic mapping (a.k.a. HBCGM)

Haplomap is the successor to HBCGM, which has not been developed since 2010. It has been adopted as a replacement for the original HBCGM.

Citation:

Zhuoqing Fang, Gary Peltz, An Automated Multi-Modal Graph-Based Pipeline for Mouse Genetic Discovery, Bioinformatics, 2022, btac356, https://doi.org/10.1093/bioinformatics/btac356

See what's new in the CHANGELOG.

Dependencies

Works on both Linux and macOS.

Haplomap:

CMake
GCC >= 4.8
clang >= 11.0.3 (only tested with 11.x version)
C++11
GSL

For Variant Calling, you need:

GATK 4.x
SAMtools
BCFtools
BEDtools
BWA

For running the pipeline, you need:

Snakemake

Installation

conda install -c bioconda haplomap

Install from source

Install GSL first, e.g.

Ubuntu

sudo apt-get install libgsl-dev

macOS

brew install gsl

or compile GSL from source (make sure that the GSL include and lib paths are exported)

./configure --prefix=${HOME}/program/gsl
make && make install
# you may need to add this line to your .bashrc 
export LD_LIBRARY_PATH="${HOME}/program/gsl/lib:$LD_LIBRARY_PATH"
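Before building, it can help to confirm that the shell can actually see a GSL installation. The following is a minimal sketch, not part of haplomap: `check_gsl` is a hypothetical helper name, and it relies on `gsl-config`, the helper script that a GSL installation places in `<prefix>/bin`. For the source build above, this assumes `${HOME}/program/gsl/bin` has been added to `PATH`.

```shell
# Hypothetical helper (not part of haplomap): report whether GSL is discoverable.
# Uses gsl-config, which GSL installs alongside the library.
check_gsl() {
    if command -v gsl-config >/dev/null 2>&1; then
        # --version and --prefix are standard gsl-config options
        echo "GSL $(gsl-config --version) found under $(gsl-config --prefix)"
    else
        echo "gsl-config not found on PATH" >&2
        return 1
    fi
}
```

Calling `check_gsl` prints the detected version and prefix, or returns a non-zero status when `gsl-config` cannot be found.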

Build and install to path

cd ${haplomap_repo}
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/path/to/directory/bin ..
make

Usage

Run haplomap standalone

See more details in the haplomap subfolder: Run haplomap standalone

Use the `snakemake` workflow to run Mouse Phenome Database (MPD) datasets

0. Variant calling

See the variant calling workflows using GATK, BCFtools, and svtools.

e.g.

# modify the file path in haplomap and run with 12 cores
snakemake -s workflows/bcftools.call.smk  --configfile config.yaml \
          -k -p -j 12

The Mouse Phenome Database has more than 10K datasets. Configure the files below to run them.

1. Prepare the MPD `measnum` ID file: one ID per row, suffixed with "-m" or "-f" (m: male, f: female)

26720-m
26720-f
9940-f
...
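A malformed ID file is easy to catch before starting a long run. The sketch below is not part of haplomap: `validate_ids` is a hypothetical helper, and it assumes that every `measnum` is purely numeric, as in the example above, and that blank lines are not allowed.

```shell
# Hypothetical helper (not part of haplomap): check an MPD measnum ID file.
# A valid line is "<digits>-m" or "<digits>-f"; anything else is reported.
validate_ids() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
    # grep -v selects lines that do NOT match; -n prefixes their line numbers.
    # grep exits 0 when at least one such (invalid) line was found.
    if grep -vnE '^[0-9]+-[mf]$' "$1" >&2; then
        echo "malformed ids in $1 (shown above as line:content)" >&2
        return 1
    fi
}
```

For example, `validate_ids drug-diet.ids.txt && echo ok` prints `ok` only when every line has the expected form.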

2. Edit the file paths in `config.yaml` in the `workflows` folder:

Only edit the HBCGM section.

HBCGM:
    # working directory
    WORKSPACE: "/data/bases/fangzq/MPD/results_drug_diet"
    # path to haplomap
    BIN: "/home/fangzq/github/HBCGM/build/bin"
    
    # MPD id file, one id per line 
    TRAIT_IDS: "/data/bases/fangzq/MPD/drug-diet.ids.txt"
    # if true, use individual animal data. Default: use strain means.
    USE_RAWDATA: false 
    # strains metadata: map strain abbrev to full name, jax ids, etc. 
    # see docs folder to view examples
    STRAIN_ANNO: "/data/bases/shared/haplomap/PELTZ_20210609/strains.metadata.csv"
    
    # filtered VCF files after variant calling step 
    VCF_DIR: "/data/bases/shared/haplomap/PELTZ_20210609/VCFs"
    # Ensembl-vep output after variant calling step
    VEP_DIR: "/data/bases/shared/haplomap/PELTZ_20210609/VEP"

    ## Optional files
    # genetic relation file from PLINK output
    GENETIC_REL: "/data/bases/shared/haplomap/PELTZ_20210609/mouse54_grm.rel"
    # gene expression file 
    GENE_EXPRS: "/data/bases/shared/haplomap/PELTZ_20210609/mus.compact.exprs.txt"
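Since most entries in the HBCGM section are file paths, a quick existence check can catch typos before the workflow starts. This is a minimal sketch, not part of haplomap: `list_hbcgm_paths` and `check_hbcgm_paths` are hypothetical helpers, and they assume the layout shown above (an unindented `HBCGM:` line followed by indented `KEY: "value"` lines) rather than parsing YAML in general.

```shell
# Hypothetical helper (not part of haplomap): print "KEY path" for every
# quoted value found under the top-level HBCGM section of a config file.
list_hbcgm_paths() {
    awk '
        # Any unindented, non-comment line starts a new top-level section.
        /^[^ \t#]/ { in_section = ($0 ~ /^HBCGM:/) }
        in_section && /^[ \t]+[A-Z_]+:[ \t]*"/ {
            key = $1; sub(/:$/, "", key)           # strip the trailing colon
            val = $0
            sub(/^[^"]*"/, "", val)                # drop text up to the opening quote
            sub(/".*$/, "", val)                   # drop the closing quote and the rest
            print key " " val
        }
    ' "$1"
}

# Report every listed path that does not exist; non-zero status if any is missing.
check_hbcgm_paths() {
    list_hbcgm_paths "$1" | {
        missing=0
        while read -r key path; do
            if [ ! -e "$path" ]; then
                echo "missing: $key -> $path" >&2
                missing=1
            fi
        done
        [ "$missing" -eq 0 ]
    }
}
```

Running `check_hbcgm_paths workflows/config.yaml` lists the keys whose paths are absent. A reported path is not necessarily an error: an output directory may not exist before the first run, and the optional files may simply be unused.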

3. Run the haplomap pipeline

3.1 Create the conda environment

conda env create -n hbcgm -f environment.yaml

3.2 Run on a local computing node

source activate hbcgm
# modify the file path in haplomap and run with 24 cores
snakemake -s workflows/haplomap.smk \
          --configfile workflows/config.yaml \
          -k -p -j 24

3.3 Run on the HPC, e.g. Stanford Sherlock

For example, with Slurm on Sherlock:

Edit slurm.submit.sh and change the file path to HBCGM/workflows.
Edit workflows/slurm_config.yaml and specify the resources you need.
Submit the job:

sbatch slurm.submit.sh

Output

For an explanation of the output, see: Run haplomap standalone

Contact

Email:

Zhuoqing Fang: [email protected]
Gary Peltz: [email protected]

Copyright and License Information

Authors: Zhuoqing Fang and Gary Peltz.

The original HBCGM (the maximal haplotype construction method) was developed by Dr. David Dill and Dr. Gary Peltz at Stanford.

HBCGM/Haplomap is patented to Dr. Gary Peltz.

This program is released under a license that restricts commercial use. Please see the attached LICENSE file for details.
