ArDoCo/wmd4j

 
 

wmd4j

wmd4j is a Java library for computing Word Mover's Distance (WMD) between two text documents. It provides the same functionality as Word2Vec.wmdistance in Gensim.

wmd4j depends on the deeplearning4j WordVectors interface for word vector manipulation. Underneath, it uses an optimized version of JFastEMD (which solves the Earth Mover's Distance transportation problem) that is about 1.8x faster.
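
To make the quantities behind the transportation problem more concrete, here is a minimal, self-contained sketch. It is not wmd4j's implementation and it does not call JFastEMD. Instead it computes the simpler relaxed lower bound on WMD, in which every word sends all of its weight to its nearest counterpart in the other document. The class name, the method names, and the toy 2-d vectors are hypothetical and chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RelaxedWmdSketch {

    // Hypothetical toy 2-d embeddings; a real setup would load a word2vec model.
    // Every token passed to the methods below must be present in this map.
    static final Map<String, double[]> TOY_VECTORS = Map.of(
            "obama", new double[] {1.0, 0.0},
            "president", new double[] {0.9, 0.1},
            "media", new double[] {0.0, 1.0},
            "press", new double[] {0.1, 0.9});

    // Normalized bag-of-words: each distinct word's share of the document's tokens.
    static Map<String, Double> nbow(List<String> tokens) {
        Map<String, Double> weights = new LinkedHashMap<>();
        for (String token : tokens) {
            weights.merge(token, 1.0 / tokens.size(), Double::sum);
        }
        return weights;
    }

    // Euclidean distance between two word vectors (the per-unit transport cost).
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // One-sided relaxation: each word in `from` moves all its weight
    // to the closest word in `to`.
    static double relaxedCost(List<String> from, List<String> to) {
        double total = 0.0;
        for (Map.Entry<String, Double> entry : nbow(from).entrySet()) {
            double nearest = Double.POSITIVE_INFINITY;
            for (String word : to) {
                double d = euclidean(TOY_VECTORS.get(entry.getKey()), TOY_VECTORS.get(word));
                nearest = Math.min(nearest, d);
            }
            total += entry.getValue() * nearest;
        }
        return total;
    }

    // The larger of the two one-sided relaxations is a lower bound on the exact WMD.
    static double relaxedWmd(List<String> a, List<String> b) {
        return Math.max(relaxedCost(a, b), relaxedCost(b, a));
    }

    public static void main(String[] args) {
        List<String> first = List.of("obama", "media");
        List<String> second = List.of("president", "press");
        System.out.println(relaxedWmd(first, second));
    }
}
```

The exact WMD additionally requires that each word in the second document receives exactly its own weight, which is what turns the computation into a transportation problem and is the part delegated to an EMD solver.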

This is a forked and updated version of crtomirmajer/wmd4j.

Usage

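// Load pre-trained word vectors through deeplearning4j (word2vecPath is the path to a word2vec model file)
WordVectors vectors = WordVectorSerializer.loadStaticModel(new File(word2vecPath));
WordMovers wordMovers = WordMovers.builder().wordVectors(vectors).build();

// Compute the Word Mover's Distance between two sentences; a smaller value means the sentences are more similar
wordMovers.distance("obama speaks to the media in illinois", "the president greets the press in chicago");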
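
A common follow-up is to rank several candidate sentences by their distance to a query. The sketch below shows one way this might look. It assumes that `distance(String, String)` returns a `double`, so that `wordMovers::distance` could be passed as the distance function. The class `NearestSentenceSketch`, the method `rankByDistance`, and the stand-in distance used in `main` are hypothetical and keep the example runnable without a word2vec model.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

public class NearestSentenceSketch {

    // Returns a copy of the candidates sorted by ascending distance to the query
    // (a smaller distance means a more similar sentence).
    static List<String> rankByDistance(String query, List<String> candidates,
                                       ToDoubleBiFunction<String, String> distance) {
        List<String> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator.comparingDouble((String c) -> distance.applyAsDouble(query, c)));
        return ranked;
    }

    public static void main(String[] args) {
        // Stand-in distance for demonstration only: difference in character length.
        // With wmd4j available, one would presumably pass wordMovers::distance instead
        // (an assumption about the method's return type, see above).
        ToDoubleBiFunction<String, String> standIn =
                (a, b) -> (double) Math.abs(a.length() - b.length());

        List<String> candidates = List.of(
                "the president greets the press in chicago",
                "a short one",
                "obama speaks in illinois");
        System.out.println(rankByDistance("obama speaks to the media in illinois", candidates, standIn));
    }
}
```

Note that the distance function is called again on every comparison during the sort. For large candidate lists it may be worth computing each distance once and sorting on the cached values.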

Validation

wmd4j is validated against Gensim's wmdistance results on a custom word2vec model.

License

MIT (see the LICENSE file). The repository also contains a license-header file whose license is not recognized.
