02 Parameters and Usage
vcfdist <query.vcf> <truth.vcf> <ref.fasta> [optional arguments]
type: string, default: none
Phased VCF file containing variant calls to evaluate.
type: string, default: none
Phased VCF file containing ground truth variant calls.
type: string, default: none
FASTA file containing the draft reference sequence.
type: string, default: none
BED file containing regions to evaluate. Variants located on the border of a BED region are currently excluded from the evaluation (details here).
type: integer, default: 1
Printing verbosity (0: succinct, 1: default, 2: verbose).
- Succinct: Only warnings, errors, and the precision-recall summary are logged to console.
- Default: High-level info on parsed variants, superclustering, phasing, output results, and timing is additionally logged.
- Verbose: For debugging; warnings are printed each time they occur with helpful data included.
type: string, default: ./
Prefix for output files (directories need a trailing slash).
For example, -p results/ will store results/summary.vcf, and -p test_ will store test_summary.vcf.
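The two examples above are consistent with the prefix being prepended verbatim to each output file name. A minimal Python sketch of that behavior follows; it assumes plain string concatenation, and the helper name `output_path` is hypothetical, not part of vcfdist.

```python
def output_path(prefix: str, filename: str) -> str:
    """Build an output path by plain concatenation (assumed behavior).

    No path separator is inserted, which is why a directory prefix
    needs its own trailing slash.
    """
    return prefix + filename


print(output_path("results/", "summary.vcf"))  # directory prefix
print(output_path("test_", "summary.vcf"))     # file-name prefix
print(output_path("results", "summary.vcf"))   # missing slash: names run together
```

Under this assumption, forgetting the trailing slash does not create a directory; it produces a file whose name starts with the prefix.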
type: flag
Skip writing output files, only print summary to console.
type: comma-separated string, default: all variants pass filtering stage
Select just variants passing these FILTERs (OR operation).
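As an illustration of the OR semantics, the sketch below keeps a variant if any of its FILTER values appears in the selected list. It is a hedged sketch only: the function name `passes_filter` is hypothetical, the FILTER names in the examples are made up, and it assumes the VCF convention that multiple FILTER values in one record are separated by semicolons.

```python
def passes_filter(record_filter: str, selected: str = "") -> bool:
    """Return True if the variant should be kept.

    record_filter: the VCF FILTER column, e.g. "PASS" or "q10;s50".
    selected: comma-separated FILTER names to keep; empty means the
              default, where every variant passes the filtering stage.
    """
    if not selected:
        return True
    wanted = set(selected.split(","))
    # OR operation: one matching FILTER value is enough.
    return any(f in wanted for f in record_filter.split(";"))


print(passes_filter("q10;s50"))               # default: kept
print(passes_filter("q10;s50", "PASS,s50"))   # kept, matches s50
print(passes_filter("q10", "PASS,s50"))       # dropped, no match
```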
type: integer, default: 5000
Maximum variant size to be evaluated; larger variants are ignored.
type: integer, default: 50
Variants of this size or larger are considered SVs, not INDELs. This is useful because precision-recall summary statistics are reported separately for SNPs, INDELs, and SVs.
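The two size thresholds interact as sketched below, using the defaults of 5000 and 50. This is an illustrative sketch, not vcfdist's code: the function name `classify` is hypothetical, and how vcfdist measures a variant's size or detects a SNP is not specified here, so both are passed in as inputs.

```python
def classify(size: int, is_snp: bool = False,
             largest: int = 5000, sv_threshold: int = 50) -> str:
    """Assign a variant to a reporting category by size.

    size: variant size in bases (measurement left to the caller).
    is_snp: True for a single-base substitution.
    """
    if size > largest:
        return "ignored"        # larger than the maximum evaluated size
    if is_snp:
        return "SNP"
    if size >= sv_threshold:    # "this size or larger" counts as an SV
        return "SV"
    return "INDEL"


for size in (1, 49, 50, 5000, 5001):
    print(size, classify(size))
```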
type: integer, default: 0
Minimum variant quality; lower-quality variants are ignored.
type: integer, default: 60
Maximum variant quality; higher-quality variants are kept, but their Q-score is capped at this value.
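The minimum and maximum quality settings behave differently: one discards variants, the other only caps their score. A minimal sketch with the default values follows; the function name `threshold_quality` is hypothetical.

```python
from typing import Optional


def threshold_quality(qual: float, min_qual: float = 0,
                      max_qual: float = 60) -> Optional[float]:
    """Apply the minimum and maximum quality settings to one variant.

    Returns None if the variant is ignored (below the minimum),
    otherwise its quality capped at the maximum.
    """
    if qual < min_qual:
        return None
    return min(qual, max_qual)


print(threshold_quality(30))               # unchanged
print(threshold_quality(75))               # kept, capped at 60
print(threshold_quality(5, min_qual=10))   # ignored
```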
type: string [and integer], default: biwfa
Select clustering method, one of: biwfa, size N, and gap N.
With biwfa, clusters are generated using bi-directional wavefront alignment (BiWFA), an efficient algorithm for finding possible alternate alignments and therefore for determining whether nearby variants are independent. See the papers on BiWFA and WFA for more details. This is the currently recommended (and default) vcfdist clustering algorithm because it is the most accurate; it will always find dependencies if they exist. However, when evaluating large structural variants (above 1 kbp) it tends to create large clusters, which results in high memory usage and slower evaluations. For evaluating large variants, --cluster size 100 may be preferable.
Gap-based clustering (gap N) is the simplest and fastest clustering method: it groups together all variants less than N bases apart. It is also the least accurate, and will miss variant dependencies if N is too small. Conversely, as N nears the reciprocal of the background rate of genomic variation between humans (roughly one SNP every 1000 bases), clusters will grow very large. We recommend 50 < N < 200 and limiting evaluations to small variants when using this option.
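The gap heuristic can be sketched in a few lines of Python. This is an illustration of the idea rather than vcfdist's implementation: the function name `gap_clusters` is hypothetical, and variants are assumed to be half-open (start, end) reference intervals.

```python
def gap_clusters(variants, n):
    """Group variants that lie less than n bases apart.

    variants: iterable of (start, end) half-open reference intervals.
    Returns a list of clusters, each a list of intervals.
    """
    clusters = []
    cluster_end = None  # rightmost reference position covered so far
    for start, end in sorted(variants):
        if clusters and start - cluster_end < n:
            clusters[-1].append((start, end))   # too close: same cluster
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([(start, end)])     # far enough: new cluster
            cluster_end = end
    return clusters


variants = [(100, 101), (130, 131), (400, 405), (450, 451)]
for cluster in gap_clusters(variants, 50):
    print(cluster)
```

With N = 50 the four variants above fall into two clusters; raising N to 300 merges them into one, which shows how cluster size grows with N.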
Size-based clustering (size N) is a heuristic that trades off efficiency and accuracy, extending the gap N heuristic to work with larger variants. Once a variant is larger than N, the gap required to consider it independent of an adjacent variant is the size of the variant instead of N.
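A sketch of the size heuristic follows. It is one possible reading of the rule, not vcfdist's implementation: the function name `size_clusters` is hypothetical, and it assumes the required gap between two adjacent variants is the larger of N and either variant's size.

```python
def size_clusters(variants, n):
    """Cluster variants, widening the required gap for large variants.

    variants: iterable of (start, end, size) with half-open intervals.
    Assumption: two adjacent variants are independent only if the gap
    between them is at least max(n, size of either variant).
    """
    clusters = []
    cluster_end = prev_size = None
    for start, end, size in sorted(variants):
        if clusters and start - cluster_end < max(n, prev_size, size):
            clusters[-1].append((start, end, size))
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([(start, end, size)])
            cluster_end = end
        prev_size = size
    return clusters


variants = [(1000, 1001, 1), (1300, 1800, 500),
            (2200, 2201, 1), (5000, 5001, 1)]
for cluster in size_clusters(variants, 100):
    print(cluster)
```

In this example the 500-base variant pulls both of its neighbors into its cluster even though they are more than 100 bases away, while the distant variant at position 5000 stays separate.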
type: integer, default: 4
Maximum number of iterations for expanding/merging clusters; only applicable if --cluster biwfa is selected (which is the default).
type: flag
Realign query variants using the Smith-Waterman parameters -x, -o, and -e.
type: flag
Realign truth variants using the Smith-Waterman parameters -x, -o, and -e.
type: flag
Standardize truth and query variant representations, then exit.
type: integer, default: 3
Smith-Waterman mismatch (substitution) penalty.
type: integer, default: 2
Smith-Waterman gap opening penalty.
type: integer, default: 1
Smith-Waterman gap extension penalty.
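To show how the three penalties combine, the sketch below scores an alignment written as a string of operations using the defaults 3, 2, and 1. It assumes a conventional affine gap model in which a gap of length L costs o + L * e; whether vcfdist charges the extension penalty for the first gap base is not stated here, and the function name `alignment_penalty` is hypothetical.

```python
def alignment_penalty(ops: str, x: int = 3, o: int = 2, e: int = 1) -> int:
    """Total penalty of an alignment under an affine gap model.

    ops: one character per alignment column:
         '=' match, 'X' mismatch, 'I' inserted base, 'D' deleted base.
    Assumption: a gap of length L costs o + L * e.
    """
    total = 0
    prev = None
    for op in ops:
        if op == "X":
            total += x
        elif op in ("I", "D"):
            if op != prev:      # first base of a new gap
                total += o
            total += e
        elif op != "=":
            raise ValueError(f"unknown alignment operation: {op!r}")
        prev = op
    return total


print(alignment_penalty("==X=="))    # one substitution
print(alignment_penalty("==DDD=="))  # one 3-base deletion
print(alignment_penalty("=I=I="))    # two separate 1-base insertions
```

Under these assumptions a single 3-base gap is cheaper than three separate 1-base gaps, so realignment favors merging nearby INDELs into one event.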
type: float, default: 0.70
Minimum partial credit required to consider a query variant a true positive. Partial credit is calculated as the fractional reduction in edit distance achieved by including the variant, compared with omitting it.
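The partial-credit calculation can be sketched as follows. The formula is an assumed reading of "fractional reduction in edit distance", and the function names and the handling of a zero baseline are hypothetical.

```python
def partial_credit(dist_without: int, dist_with: int) -> float:
    """Fractional reduction in edit distance from including the variant.

    dist_without: edit distance to the truth when the variant is omitted.
    dist_with:    edit distance to the truth when the variant is applied.
    """
    if dist_without == 0:
        return 0.0  # nothing to improve on (assumed convention)
    return (dist_without - dist_with) / dist_without


def is_true_positive(credit: float, min_credit: float = 0.70) -> bool:
    """A query variant counts as a true positive at or above the threshold."""
    return credit >= min_credit


print(is_true_positive(partial_credit(10, 2)))  # large reduction
print(is_true_positive(partial_credit(10, 4)))  # smaller reduction
```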
type: float, default: 0.60
Minimum fractional reduction in edit distance relative to the other phasing required to consider a supercluster phased. Phased superclusters are then used to calculate switch and flip errors.
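One way to picture the phasing threshold is to compare the edit distances obtained under the two possible phasings of a supercluster. The sketch below is an assumption-laden illustration: the function name, the return labels, and the exact formula are hypothetical and not taken from vcfdist.

```python
def phase_supercluster(dist_keep: int, dist_swap: int,
                       threshold: float = 0.60) -> str:
    """Decide whether a supercluster is phased.

    dist_keep: edit distance with the query haplotypes as given.
    dist_swap: edit distance with the query haplotypes swapped.
    Returns "keep", "swap", or "unphased".
    """
    worse = max(dist_keep, dist_swap)
    better = min(dist_keep, dist_swap)
    if worse == 0:
        return "unphased"   # both phasings are perfect: no evidence
    reduction = (worse - better) / worse
    if reduction < threshold:
        return "unphased"   # improvement too small to call a phase
    return "keep" if dist_keep < dist_swap else "swap"


print(phase_supercluster(2, 10))   # clear preference for the given phasing
print(phase_supercluster(10, 2))   # clear preference for the swapped phasing
print(phase_supercluster(6, 10))   # ambiguous
```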
type: flag
Flag to include alignment distance calculations, which are skipped by default.
type: integer, default: 3
Mismatch penalty (--distance evaluation only).
type: integer, default: 2
Gap opening penalty (--distance evaluation only).
type: integer, default: 1
Gap extension penalty (--distance evaluation only).
type: integer, default: 64
Maximum threads to use for clustering and precision/recall alignment.
type: float, default: 64.00
Approximate maximum RAM (measured in GB) to use for precision/recall alignment. Evaluation of superclusters requiring RAM usage above this threshold will still occur, but with a warning.
type: flag
Prints a help message listing all required and optional command-line parameters.
type: flag
Prints a help message listing all command-line parameters, including advanced options that are not recommended for most users.
type: flag
Prints the BibTeX and MLA formatted citations for vcfdist.
Prints the current version of vcfdist.