Damerau–Levenshtein Distance Asymmetry with Certain Strings #757

Open
mrname5 opened this issue Dec 19, 2024 · 0 comments

Describe the bug
When using the DamerauLevenshteinDistance function on certain strings, the returned distance depends on the order in which the two strings are passed as arguments. As I understand it, Damerau–Levenshtein distance should be symmetric (i.e., distance(a, b) should equal distance(b, a)). However, for at least one pair of strings, calling the function with the arguments reversed yields a different result. One of the two results is also 0, even though the strings differ.

To Reproduce
The first pair below (s1, s2) shows the unexpected behaviour; the other two pairs are included for comparison and behave as expected. The output of this code is in the Additional context section below:

const natural = require('natural');

let s1 = '0,1,10,11';
let s2 = '0,11,110,111';
let bin1 = '0,10,100,110';
let bin2 = '0,11,110,111';
let normal1 = 'pig';
let normal2 = 'chicken';

console.log('Strings under test:');
console.log('s1:', JSON.stringify(s1));
console.log('s2:', JSON.stringify(s2));
console.log('bin1:', JSON.stringify(bin1));
console.log('bin2:', JSON.stringify(bin2));
console.log('normal1:', JSON.stringify(normal1));
console.log('normal2:', JSON.stringify(normal2));

console.log('\n--- Unexpected Result ---');
console.log('Arguments used and order:', 'DamerauLevenshteinDistance(s1, s2)');
console.log('Result:', natural.DamerauLevenshteinDistance(s1, s2)); // Expected symmetric result
console.log('Arguments used and order:', 'DamerauLevenshteinDistance(s2, s1)');
console.log('Result:', natural.DamerauLevenshteinDistance(s2, s1)); // Unexpectedly different

console.log('\n---  Expected Result 1 ---');
console.log('Arguments used and order:', 'DamerauLevenshteinDistance(bin1, bin2)');
console.log('Result:', natural.DamerauLevenshteinDistance(bin1, bin2));
console.log('Arguments used and order:', 'DamerauLevenshteinDistance(bin2, bin1)');
console.log('Result:', natural.DamerauLevenshteinDistance(bin2, bin1)); // Symmetric as expected

console.log('\n---  Expected Result 2 ---');
console.log('Arguments used and order:', 'DamerauLevenshteinDistance(normal1, normal2)');
console.log('Result:', natural.DamerauLevenshteinDistance(normal1, normal2));
console.log('Arguments used and order:', 'DamerauLevenshteinDistance(normal2, normal1)');
console.log('Result:', natural.DamerauLevenshteinDistance(normal2, normal1)); // Symmetric as expected

Expected behavior
As I understand it, the Damerau–Levenshtein distance is a metric and should therefore be symmetric: DamerauLevenshteinDistance(s1, s2) should return the same value as DamerauLevenshteinDistance(s2, s1).
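
For reference, here is a minimal sketch of the restricted variant (optimal string alignment) with unit costs, written independently of natural. The function name osaDistance is my own and is not part of the library, and natural may use a different variant by default. With unit costs, either variant should be symmetric. For s1 and s2 I would expect 3 in both directions: s2 can be obtained from s1 by inserting three '1' characters, and because s2 contains three more '1' characters than s1 and each edit changes that count by at most one, fewer than three edits cannot suffice.

```javascript
// Reference sketch only: restricted Damerau-Levenshtein (optimal string alignment)
// with unit costs. osaDistance is a hypothetical name, not part of natural.
function osaDistance(a, b) {
  const rows = a.length + 1;
  const cols = b.length + 1;
  // d[i][j] = distance between the first i chars of a and the first j chars of b
  const d = Array.from({ length: rows }, () => new Array(cols).fill(0));
  for (let i = 0; i < rows; i++) d[i][0] = i; // delete all i chars
  for (let j = 0; j < cols; j++) d[0][j] = j; // insert all j chars
  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(
        d[i - 1][j] + 1,        // deletion
        d[i][j - 1] + 1,        // insertion
        d[i - 1][j - 1] + cost  // match or substitution
      );
      // Swap of two adjacent characters
      if (i > 1 && j > 1 && a[i - 1] === b[j - 2] && a[i - 2] === b[j - 1]) {
        d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
      }
    }
  }
  return d[a.length][b.length];
}

console.log(osaDistance('0,1,10,11', '0,11,110,111')); // 3
console.log(osaDistance('0,11,110,111', '0,1,10,11')); // 3
```

The recurrence treats insertion and deletion with the same cost, so swapping the arguments only transposes the table and cannot change the result.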

Screenshots
Not applicable.

Desktop (please complete the following information):

  • OS: Arch Linux
  • Browser: not applicable (running under Node.js only)
  • Node.js version: v20.18.0
  • natural version: [email protected]

Additional context
Output of code above:

Strings under test:
s1: "0,1,10,11"
s2: "0,11,110,111"
bin1: "0,10,100,110"
bin2: "0,11,110,111"
normal1: "pig"
normal2: "chicken"

--- Unexpected Result ---
Arguments used and order: DamerauLevenshteinDistance(s1, s2)
Result: 0
Arguments used and order: DamerauLevenshteinDistance(s2, s1)
Result: 3

---  Expected Result 1 ---
Arguments used and order: DamerauLevenshteinDistance(bin1, bin2)
Result: 3
Arguments used and order: DamerauLevenshteinDistance(bin2, bin1)
Result: 3

---  Expected Result 2 ---
Arguments used and order: DamerauLevenshteinDistance(normal1, normal2)
Result: 6
Arguments used and order: DamerauLevenshteinDistance(normal2, normal1)
Result: 6

The code demonstrates that certain comma-separated numeric strings (s1 and s2) produce asymmetric Damerau–Levenshtein distances, while the other pairs (bin1, bin2 and normal1, normal2) behave as expected. I expect all pairs to yield symmetric results. Please let me know if my understanding of Damerau–Levenshtein distance is incorrect or if I am using this function incorrectly. I appreciate the help.
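
In case it helps with triage, below is a small sketch of a checker that flags the two properties violated above: symmetry, and a zero distance for strings that differ. The helper name checkDistanceProperties and the shape of its return value are my own invention. It takes any two-argument distance function, so it makes no assumption about natural's internals.

```javascript
// Hypothetical helper: reports pairs for which a distance function is
// asymmetric or returns 0 for two different strings.
function checkDistanceProperties(distance, pairs) {
  const failures = [];
  for (const [a, b] of pairs) {
    const ab = distance(a, b);
    const ba = distance(b, a);
    if (ab !== ba) {
      failures.push({ a, b, problem: 'asymmetric', ab, ba });
    }
    if (a !== b && (ab === 0 || ba === 0)) {
      failures.push({ a, b, problem: 'zero distance for different strings', ab, ba });
    }
  }
  return failures;
}

// Intended usage with the library (assumes natural is installed):
// const natural = require('natural');
// const report = checkDistanceProperties(
//   (a, b) => natural.DamerauLevenshteinDistance(a, b),
//   [['0,1,10,11', '0,11,110,111'], ['pig', 'chicken']]
// );
// console.log(report);
```

An empty array means no violation was found among the sample pairs; it does not prove the function is correct in general.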
