Nilsimsa is a hashing function which belongs to the family of locality sensitive hashes (LSH). This library provides functions to compute and compare the Nilsimsa string similarity hash for a given string.
The code is a port of the Python Nilsimsa implementation by Michael Itz to the Java language: http://code.google.com/p/py-nilsimsa.
Original C nilsimsa-0.2.4 implementation by cmeclax: http://ixazon.dynip.com/~cmeclax/nilsimsa.html
- Java 8+
<dependency>
<groupId>com.weblyzard.lib.string</groupId>
<artifactId>nilsimsa</artifactId>
<version>0.0.5</version>
</dependency>
String text = "A short test message";
Nilsimsa n = Nilsimsa.getHash(text);
System.out.println("Nilsimsa hash for message '" + text + "': " + n.hexdigest());
Nilsimsa first = Nilsimsa.getHash("A short test message");
Nilsimsa second = Nilsimsa.getHash("A short test message!");
Nilsimsa third = Nilsimsa.getHash("Something completely different");
System.out.println(first.bitwiseDifference(first)); // 0
System.out.println(first.bitwiseDifference(second)); // 3
System.out.println(first.bitwiseDifference(third)); // 133
List<String> testList = Arrays.asList("A short test message",
"A short test message!",
"Something completely different");
for (String firstString: testList) {
for (String secondString: testList) {
Nilsimsa firstHash = Nilsimsa.getHash(firstString);
Nilsimsa secondHash = Nilsimsa.getHash(secondString);
System.out.println("The hash value of text '" + firstString + "' and '"
+ secondString + "' differ in " + firstHash.bitwiseDifference(secondHash) + " bits.");
}
}
Please refer to the releases page.