Skip to content

Lower bound on cardinalities for fixed length fixed alphabet

Yoann Dufresne edited this page Oct 8, 2019 · 1 revision

Remark on minimality

an looks to be the minimal cardinality neighbourhood words. This observation comes from experimentations on different size of alphabet, word length and reachable k values.

Proof needed !

A closed formula for an

Remarks on edit operations

  • substitution then deletion on another letter is the same than deletion then substitution
  • insert a letter is the same than insert a then substitute it by any other letter

Closed formula

k edits of a word of size n lead to words of size between n-k to n+k. To create the formula, let's count the words by size.

Size n words

For the size n, we can generate all the words with 0 to k substitution. So the results is a sum over k of exact Hamming neighbourhood.

Ham(n, k) = comb(k, n) * (a-1)k

size m < n words

To reach size m, we will have to perform n-m deletions. As said in the remarks, we can chose to do the deletions first. After doing n-m deletions, we still have k-(n-m) edit operations available for substitutions. So, we can sum hamming distance over i from 0 to k-(n-m). The total is sum of all the possible sizes of Ham(m, i).

size m > n words