HLApers integrates software such as kallisto, Salmon and STAR. Before using it, please read the license notices of these tools.
git clone https://github.com/genevol-usp/HLApers.git
- from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Biostrings")
- from GitHub:
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github("genevol-usp/hlaseqlib")
- STAR v2.5.3a+
- Salmon v0.8.2+
- samtools 1.3+
- seqtk
- kallisto
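Before running HLApers it can help to confirm that the tools listed above are reachable from the shell. The following is a minimal sketch and not part of HLApers: `check_tools` is a hypothetical helper, the executable names (`STAR`, `salmon`, `samtools`, `seqtk`, `kallisto`) are assumed to match your installation, and the minimum versions are not checked.

```shell
# Hypothetical helper: report which tools are missing from PATH.
# Returns 0 if every tool is found, 1 otherwise.
# Version requirements (e.g. STAR v2.5.3a+) are NOT verified here.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed executable names; adjust them if your installation differs.
check_tools STAR salmon samtools seqtk kallisto \
    || echo "Install the missing tools before running HLApers."
```

The function only tests whether each name resolves to a command, so it also works for other tools you may want to add to the list.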
git clone https://github.com/ANHIG/IMGTHLA.git
- transcripts fasta (e.g., Gencode v37 fasta)
- corresponding annotations GTF (e.g., Gencode v37 GTF)
Link the hlapers executable into a directory on your PATH, or change to the
HLApers directory and run the program as ./hlapers.
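As one possible way to make the executable callable from any directory, the repository can be added to PATH for the current shell session. This is only a sketch: it assumes the repository was cloned to `$HOME/HLApers`, and `HLAPERS_DIR` is a hypothetical variable name introduced for illustration.

```shell
# Assumption: the repository lives in $HOME/HLApers; override HLAPERS_DIR if not.
HLAPERS_DIR="${HLAPERS_DIR:-$HOME/HLApers}"

# Prepend the directory to PATH for this session, unless it is already there.
case ":$PATH:" in
    *":$HLAPERS_DIR:"*) ;;                          # already on PATH, nothing to do
    *) PATH="$HLAPERS_DIR:$PATH"; export PATH ;;
esac
```

To keep the setting across sessions, the same lines could be placed in your shell startup file.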
HLApers is composed of the following modes:
hlapers --help
Usage: hlapers [modes]
    prepare-ref    Prepare transcript fasta files.
    index          Create index for read alignment.
    bam2fq         Convert BAM to fastq.
    genotype       Infer HLA genotypes.
    quant          Quantify HLA expression.
The first step is to use hlapers prepare-ref to prepare the transcript fasta
from which the index is built: the Gencode transcripts, with the HLA
transcripts replaced by IMGT HLA allele sequences.
hlapers prepare-ref --help
Usage: hlapers prepare-ref [options]
    -t | --transcripts    Fasta with Gencode transcript sequences.
    -a | --annotations    GTF from Gencode for the same Genome version.
    -i | --imgt           Path to IMGT directory.
    -o | --out            Output directory.
Example:
hlapers prepare-ref -t gencode.v37.transcripts.fa.gz -a gencode.v37.annotation.gtf.gz -i IMGTHLA -o hladb
hlapers index --help
Usage: hlapers index [options]
    -t | --transcripts    Fasta with Gencode transcript sequences.
    -p | --threads        Number of threads.
    -o | --out            Output directory.
         --kallisto       Create index for kallisto pipeline instead of STARsalmon.
Example:
hlapers index -t hladb/transcripts_MHC_HLAsupp.fa -p 4 -o index
Given a BAM file from a previous alignment to the genome, we first need
to extract the reads mapped to the MHC region and those which are
unmapped. For this, we can use the bam2fq utility.
hlapers bam2fq --help
Usage: hlapers bam2fq [options]
    -m | --mhc-coords    Genomic coordinates of the MHC region in chrN:start-end format if MHC fastq is desired.
    -b | --bam           BAM file (if -m is specified, needs to be sorted by coordinate; otherwise use --sort).
    -o | --outprefix     Output prefix name.
         --sort          Sort input BAM file by coordinate (REQUIRED if -m is specified and BAM is not sorted by coordinate).
Example:
hlapers bam2fq -b HG00096.bam -m ./hladb/mhc_coords.txt -o HG00096
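Because `-m` requires a coordinate-sorted BAM, it can be useful to inspect the header before deciding whether `--sort` is needed. The sketch below is not part of HLApers: `is_coord_sorted` is a hypothetical helper that looks for the `SO:coordinate` tag on the `@HD` header line, and it assumes the header declares the sort order accurately. It is demonstrated on a literal header so that it runs without any BAM file.

```shell
# Hypothetical helper: succeeds if the SAM header read from stdin
# declares coordinate sorting (an @HD line carrying SO:coordinate).
is_coord_sorted() {
    grep -q '^@HD.*SO:coordinate'
}

# Demonstration on a literal header line. With a real file, pipe in the
# output of `samtools view -H HG00096.bam` instead of the printf.
if printf '@HD\tVN:1.6\tSO:coordinate\n' | is_coord_sorted; then
    echo "sorted by coordinate: --sort not needed"
else
    echo "not coordinate-sorted: add --sort"
fi
```

A BAM whose header lacks an `@HD` line or reports `SO:unsorted` would take the second branch, in which case adding `--sort` to the bam2fq call is the safer choice.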
Then we run the genotyping module.
hlapers genotype --help
Usage: hlapers genotype [options]
    -i | --index          Index generated by 'hlapers index'.
    -t | --transcripts    Fasta with Gencode transcripts sequences used for 'hlapers index'.
    -1 | --fq1            Fastq for READ 1.
    -2 | --fq2            Fastq for READ 2.
    -p | --threads        Number of threads.
    -o | --outprefix      Output prefix name.
         --kallisto       Use kallisto for genotyping.
Example:
hlapers genotype -i index/STARMHC -t ./hladb/transcripts_MHC_HLAsupp.fa -1 HG00096_mhc_1.fq -2 HG00096_mhc_2.fq -p 8 -o results/HG00096
In order to quantify expression, we use the quant module. If the
original fastq files are available, we can proceed directly to the
quantification step. If only a BAM file of a previous alignment to the
genome is available, we first need to convert the BAM to fastq using the
bam2fq utility.
Example:
hlapers bam2fq -b HG00096.bam -o HG00096
Proceed to the quantification step.
hlapers quant --help
Usage: hlapers quant [options]
    -t | --transcripts    Reference transcripts directory.
    -g | --genotypes      *_genotypes.tsv file generated by 'hlapers genotype'.
    -1 | --fq1            Fastq for READ 1.
    -2 | --fq2            Fastq for READ 2.
    -p | --threads        Number of threads.
    -o | --out            Output prefix name.
         --salmonreads    Use Salmon lightweight alignment for quantification (NOT TESTED)
         --kallisto       Use kallisto for quantification.
Example:
hlapers quant -t ./hladb -g ./results/HG00096_genotypes.tsv -1 HG00096_1.fq.gz -2 HG00096_2.fq.gz -o ./results/HG00096 -p 8
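When several samples are processed, the genotype and quant examples above can be generated per sample. The following is a dry-run sketch under stated assumptions: the sample list, the `SAMPLES` and `THREADS` variables and the `hlapers_commands` function are hypothetical, and the file layout simply mirrors the examples in this document. It only prints the commands, so it can be reviewed before anything is executed.

```shell
# Hypothetical sample list and thread count; adjust to your data.
SAMPLES="HG00096 HG00097"
THREADS=8

# Print (do not run) the genotype and quant commands for each sample,
# using the same file naming as the single-sample examples.
hlapers_commands() {
    for s in $SAMPLES; do
        echo "hlapers genotype -i index/STARMHC -t ./hladb/transcripts_MHC_HLAsupp.fa -1 ${s}_mhc_1.fq -2 ${s}_mhc_2.fq -p $THREADS -o results/$s"
        echo "hlapers quant -t ./hladb -g ./results/${s}_genotypes.tsv -1 ${s}_1.fq.gz -2 ${s}_2.fq.gz -o ./results/$s -p $THREADS"
    done
}

hlapers_commands            # review the printed commands
# hlapers_commands | sh     # uncomment to execute them one after another
```

Each quant command reads the `_genotypes.tsv` file written by the genotype command of the same sample, so the printed order (genotype, then quant) should be preserved when the commands are run.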