hlaTyper

Technical comparison of computational HLA prediction tools from NGS data.

Folder content

data
- comparison - contains the scripts to run and evaluate tools
- goldstandard - contains the curated information of use genotypes per sample

Comparison

Runs are triggered with submitAll WGS, submitAll WES submitAll RNAseq, respectively, extracting bam-files for the HLA region in each sample and submitting the wrapper script for the 5 tools. These scripts are named Run_.sbatch and can be found in the bin-folder.

Results are then collected over all samples for each tool and evaluated to obtain the final success and accuracy values.

The log file output for ClassI+II prediction of the evaluation script evaluatePredictions.py (as triggered by eval.sh) can be found in the RepositoryLogFiles-folder.

File structure is results1000<Dataset>ExtrChr6<Tool>(<Variables>).evalCII.txt, e.g. results1000EExtrChr6.hlamineralnTop.evalCII.txt

Dataset

E - Whole Exome Seq
G - Whole Genome Seq
R - RNA seq

Tools

hlamineraln - HLAminer alignment
hlaminersbly - HLAminer assembly
hlavbseq
optiytype
phlat
seq2hla

Variables

Top - single results for hlaminer
Top2 - single results for hlavbseq (per Chromosome)
Top3 - top 3 results for hlaminer
Top5 - top 5 results for hlavbseq

File content looks like so:

W2 NA19676.A.1 gold: 01:01:01:01/01:01:01:02/01:04/01:22/01:32/01:34/01:37 hlaminer: 31:01
W4 NA19676.A.1 gold: 01:01:01:01/01:01:01:02/01:04/01:22/01:32/01:34/01:37 hlaminer: 31:01
...
R4 NA19676.DRB1.2 gold: 12:01:01/12:06/12:10/12:17 hlaminer: 12:01
SUM NA19676 rightLow 6 wrongLow 6 success = 0.50 accuracy = 0.50 | rightHigh 3 wrongHigh 9 success = 0.25 accracy = 0.25 | na 0 total 12
...
Class I+II rightLow 6690 wrongLow 4953 success = 0.57 accuracy = 0.56 | rightHigh 3128 wrongHigh 8515 success = 0.27 accuracy = 0.26 | na 226 total 11869 NA 0.02 | samples 992 predictions 992 failed 0

For each individual the file contains for each HLA loci one row with

evaluation R(ight)2(-digit)R4,W(rong)2,W4
sample name and HLA loci
gold standard entry for this loci (can be blank)
tool name
prediction from tool (can be blank)

Followed by a SUM line summarizing performance of this tool for this individual with

number of right 2-digits
number of wrong 2-digits
success 2-digits
accuracy 2-digits
number of right 4-digits
number of wrong 4-digits
success 4-digits
accuracy 4-digits
number of NA from tool
total number of known alleles
percentage

The file is concluded with a final summary Class I+II

number of right 2-digits
number of wrong 2-digits
success 2-digits
accuracy 2-digits
number of right 4-digits
number of wrong 4-digits
success 4-digits
accuracy 4-digits
number of NA from tool
total number of known alleles
percentage
number of samples
number of predictions
number of failed runs

Other Analysis

Scripts for the other analyses performed for the paper (e.g. coverage stats, or correlations) can be found in bin

Citing

Bauer DC, Zadoorian A, Wilson LOW, Melbourne Genomics Health Alliance, and Thorne NP, “Evaluation of computational programs to predict HLA genotypes from genomic sequencing data” in peparation

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
data		data
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

hlaTyper

Folder content

Comparison

Other Analysis

Citing

About

Releases

Packages

Languages

BauerLab/hlaTyper

Folders and files

Latest commit

History

Repository files navigation

hlaTyper

Folder content

Comparison

Other Analysis

Citing

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages