An efficient dynamic programming implementation of the Sequence-Levenshtein distance, written in Cython. This is a direct translation of the Sequence-Levenshtein algorithm presented in Buschmann & Bystrykh (BMC Bioinformatics, 2013; see references below).
This repository contains a Cython source file and its associated setup script, both of which target Python 3. Once compiled, the distance function can be imported like any other module function.
modified-Levenshtein requires Cython 0.28.2 to build.
Building from source:
$ cython -V
Cython version 0.28.2
$ python3 setup.py build_ext --inplace
Importing within Python:
>>> from cLev import dist
>>> dist("CAGG", "CGTC")
2
>>> dist("TAGG", "TCCATGCATA")
3
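For cross-checking the compiled module, here is a minimal pure-Python sketch of the same distance. It assumes the definition as I understand it from the Buschmann & Bystrykh paper: fill the usual Levenshtein matrix, then take the minimum over the last row and last column, so that truncating or elongating the end of either sequence costs nothing. The name sequence_levenshtein is made up for this example and is not part of cLev; this is not the Cython source itself.

```python
def sequence_levenshtein(a: str, b: str) -> int:
    """Reference sketch of the Sequence-Levenshtein distance (pure Python)."""
    rows, cols = len(a) + 1, len(b) + 1

    # d[i][j] = Levenshtein distance between a[:i] and b[:j]
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b[:j]

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )

    # Assumed Sequence-Levenshtein step: the best score anywhere in the
    # last row or last column, i.e. either full sequence against any
    # prefix of the other.
    last_row = min(d[rows - 1])
    last_col = min(row[cols - 1] for row in d)
    return min(last_row, last_col)


print(sequence_levenshtein("CAGG", "CGTC"))        # 2
print(sequence_levenshtein("TAGG", "TCCATGCATA"))  # 3
```

Both calls reproduce the values shown for dist above. The sketch is quadratic in time and memory and is only meant for checking small inputs, not as a replacement for the compiled version.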
It's been a while since I wrote this code, so if you find any mistakes or possible optimizations, please let me know! Feel free to submit a pull request!
I found the following references particularly helpful while hacking this up:
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3853030/
- https://github.com/gfairchild/pyxDamerauLevenshtein
- https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance
- http://hackmap.blogspot.com/2008/04/levenshtein-in-cython.html
MIT