diff --git a/README.md b/README.md
index 2c7eda1..a8e5201 100644
--- a/README.md
+++ b/README.md
@@ -7,8 +7,7 @@
 * [Introduction](#introduction)
 * [Installation](#installation)
 * [Usage](#usage)
-* [Acknowledgements](#acknowledgements)
-* [Limitations](#limitations)
+* [Wiki](#wiki)
 * [License](#license)
@@ -24,11 +23,11 @@ vcfdist is a distance-based **germline variant calling evaluation tool** that:
 This results in more stable and accurate SNP, INDEL, and SV precision-recall curves than previous work, particularly when complex variants are involved.

-This project is currently under active development. We welcome the submission of any feedback, issues, or suggestions for improvement!
+This project is currently under active development. We welcome the submission of any feedback, issues, or suggestions for improvement! Check out the [wiki](https://github.com/TimD1/vcfdist/wiki) for more information.

 ### Citation
-Please cite the following work if you use vcfdist:
+Please cite the following works if you use vcfdist:
@@ -51,11 +50,29 @@ Please cite the following work if you use vcfdist:
+
+
+[bioRxiv] Jointly benchmarking small and structural variant calls with vcfdist
+
+
+@article{dunn2024vcfdist,
+  author={Dunn, Tim and Zook, Justin M and Holt, James M and Narayanasamy, Satish},
+  title={Jointly benchmarking small and structural variant calls with vcfdist},
+  journal={bioRxiv},
+  year={2024},
+  publisher={Cold Spring Harbor Laboratory},
+  doi={10.1101/2024.01.23.575922},
+  URL={https://doi.org/10.1101/2024.01.23.575922}
+}
+
+
+

 ## Installation

 ### Option 1: GitHub Source
-vcfdist is developed for Linux and its only dependencies are GCC v8+ and HTSlib. Please note that on Mac, `g++` is aliased to `clang`, which is currently not supported. If you don't have HTSlib already, please set it up as follows:
+vcfdist is developed for Linux and its only dependencies are GCC v8+ and HTSlib. If you don't have HTSlib already, please set it up as follows:
 ```bash
 > wget https://github.com/samtools/htslib/releases/download/1.17/htslib-1.17.tar.bz2
 > tar -xvf htslib-1.17.tar.bz2
@@ -96,26 +113,30 @@ vcfdist \
     -v 0
 ```

-You can expect to see this output.
+
+You can expect to see the following output:
+```
+PRECISION-RECALL SUMMARY

-To include more details on intermediate results, run it again at higher verbosity by removing the `-v 0` flag.
-Please note that your results may not be identical, since vcfdist is under active development and handling of edge-cases may differ between versions.
+TYPE   THRESHOLD       TRUTH_TP  QUERY_TP  TRUTH_FN  QUERY_FP  PREC    RECALL  F1_SCORE  F1_QSCORE
+SNP    NONE  Q >= 0    8222      8222      1         2         0.9997  0.9998  0.9998    37.3885
+SNP    BEST  Q >= 0    8222      8222      1         2         0.9997  0.9998  0.9998    37.3885

-Please see additional options documented here, or run `vcfdist --help`.
+INDEL  NONE  Q >= 0    876       876       51        12        0.9864  0.9449  0.9652    14.5953
+INDEL  BEST  Q >= 0    876       876       51        12        0.9864  0.9449  0.9652    14.5953

-The output TSV files are documented here.
+SV     NONE  Q >= 0    0         0         0         0         1.0000  1.0000  1.0000    100.000
+SV     BEST  Q >= 0    0         0         0         0         1.0000  1.0000  1.0000    100.000

-Find out more information on using `hap.py` to stratify variants here.
+ALL    NONE  Q >= 0    9098      9098      52        14        0.9984  0.9943  0.9963    24.4200
+ALL    BEST  Q >= 0    9098      9098      52        14        0.9984  0.9943  0.9963    24.4200
+```
+To include more details on intermediate results, run it again at higher verbosity by removing the `-v 0` flag.


-## Acknowledgements
-Datasets used in the evaluation of the accompanying paper are listed here.
+## Wiki

-## Limitations
-The current version of vcfdist is not designed to support:
-- overlapping or unphased variants
-- polyploid contigs
-- somatic variants
+The [vcfdist wiki](https://github.com/TimD1/vcfdist/wiki) is currently a work-in-progress, but has helpful information on [command-line parameters](https://github.com/TimD1/vcfdist/wiki/02-Parameters-and-Usage) and [output documentation](https://github.com/TimD1/vcfdist/wiki/09-Outputs). If something isn't covered yet, just start a [discussion](https://github.com/TimD1/vcfdist/discussions) or file an [issue](https://github.com/TimD1/vcfdist/issues) and I'd be happy to answer.

 ## License
 This project is covered under the GNU GPL v3 license.
diff --git a/src/README.md b/src/README.md
deleted file mode 100644
index b03e7e6..0000000
--- a/src/README.md
+++ /dev/null
@@ -1,73 +0,0 @@
-### Usage
-```
-Usage: vcfdist [options]
-
-Required:
-  query.vcf    phased VCF file containing variant calls to evaluate
-  truth.vcf    phased VCF file containing ground truth variant calls
-  ref.fasta    FASTA file containing draft reference sequence
-
-Options:
-
-  Inputs/Outputs:
-  -b, --bed
-      BED file containing regions to evaluate
-  -v, --verbosity [1]
-      printing verbosity (0: succinct, 1: default, 2:verbose)
-  -p, --prefix [./]
-      prefix for output files (directory needs a trailing slash)
-  -n, --no-output-files
-      skip writing output files, only print summary to console
-
-  Variant Filtering/Selection:
-  -f, --filter [ALL]
-      select just variants passing these FILTERs (OR operation)
-  -s, --smallest-variant [1]
-      minimum variant size, smaller variants ignored (SNPs are size 1)
-  -l, --largest-variant [5000]
-      maximum variant size, larger variants ignored
-  -sv, --sv-threshold [50]
-      variants of this size or larger are considered SVs, not INDELs
-  -mn, --min-qual [0]
-      minimum variant quality, lower qualities ignored
-  -mx, --max-qual [60]
-      maximum variant quality, higher qualities kept but thresholded
-
-  Re-Alignment:
-  -rq, --realign-query
-      realign query variants using Smith-Waterman parameters
-  -rt, --realign-truth
-      realign truth variants using Smith-Waterman parameters
-  -ro, --realign-only
-      standardize truth and query variant representations, then exit
-  -x, --mismatch-penalty [3]
-      Smith-Waterman mismatch (substitution) penalty
-  -o, --gap-open-penalty [2]
-      Smith-Waterman gap opening penalty
-  -e, --gap-extend-penalty [1]
-      Smith-Waterman gap extension penalty
-
-  Precision-Recall:
-  -ct, --credit-threshold [0.70]
-      minimum partial credit to consider variant a true positive
-
-  Distance:
-  -d, --distance
-      flag to include alignment distance calculations, skipped by default
-
-  Utilization:
-  -t, --max-threads [64]
-      maximum threads to use for clustering and precision/recall alignment
-  -r, --max-ram [64.000GB]
-      (approximate) maximum RAM to use for precision/recall alignment
-
-  Miscellaneous:
-  -h, --help
-      show this help message
-  -a, --advanced
-      show advanced options, not recommended for most users
-  -c, --citation
-      please cite vcfdist if used in your analyses; thanks :)
-  -v, --version
-      print vcfdist version (v2.3.2)
-```
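
Reviewer note: the PRECISION-RECALL SUMMARY added to the README in this diff lists raw counts next to derived metrics. The sketch below shows one plausible way those columns relate. It is an illustration only and is not taken from the vcfdist source. It assumes the standard definitions (precision from `QUERY_TP` and `QUERY_FP`, recall from `TRUTH_TP` and `TRUTH_FN`, F1 as their harmonic mean) and assumes `F1_QSCORE` is `-10 * log10(1 - F1)` capped at 100, which approximately matches the printed rows. The names `ratio`, `summarize`, and `MAX_QSCORE` are hypothetical, and treating an empty denominator as 1.0 is inferred from the all-zero SV rows.

```python
import math

# Assumed cap, inferred from the 100.000 printed for the empty SV rows.
MAX_QSCORE = 100.0


def ratio(numerator, denominator):
    """Return numerator/denominator, or 1.0 when there is nothing to count.

    The 1.0 fallback is an assumption inferred from the SV rows, where all
    counts are zero and PREC/RECALL are printed as 1.0000.
    """
    return numerator / denominator if denominator else 1.0


def summarize(truth_tp, query_tp, truth_fn, query_fp):
    """Derive (precision, recall, f1, f1_qscore) from the four count columns."""
    precision = ratio(query_tp, query_tp + query_fp)
    recall = ratio(truth_tp, truth_tp + truth_fn)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    error = 1.0 - f1
    # Phred-style score of the F1 error rate (assumed formula, see lead-in).
    if error <= 0:
        qscore = MAX_QSCORE
    else:
        qscore = min(MAX_QSCORE, -10 * math.log10(error))
    return precision, recall, f1, qscore


# Counts copied from the summary table in the diff above.
rows = {
    "SNP": (8222, 8222, 1, 2),
    "INDEL": (876, 876, 51, 12),
    "SV": (0, 0, 0, 0),
    "ALL": (9098, 9098, 52, 14),
}
for name, counts in rows.items():
    precision, recall, f1, qscore = summarize(*counts)
    print(f"{name:6s} {precision:.4f} {recall:.4f} {f1:.4f} {qscore:.4f}")
```

The printed values may differ from the README table in the last digit, since the table's rounding behaviour and any partial-credit handling inside vcfdist are not modelled here.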