Tag Clustering using `wordnet` and `word2vec` distance metrics

Clusters a set of wordnet synsets using k-means. The wordnet pair-wise distance (semantic relatedness) between word senses, computed with the edge-counting method of Wu & Palmer (1994), is mapped to a Euclidean distance so that k-means can converge while preserving the original pair-wise relationships.
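As a rough illustration of the Wu & Palmer measure, the sketch below computes `2 * depth(lcs) / (depth(a) + depth(b))` on a tiny hand-made taxonomy, where `lcs` is the deepest common ancestor of the two nodes. The taxonomy, the function names, and the `1 - similarity` conversion to a distance are assumptions made for illustration; the script in this repository works on real wordnet synsets and may map similarity to distance differently.

```python
# Toy taxonomy (hypothetical, not real wordnet data): child -> parent.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}


def path_to_root(node):
    """Return the nodes from `node` up to the root, starting with `node`."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def wu_palmer(a, b):
    """Wu & Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # Walking up from b, the first node that is also an ancestor of a
    # is the deepest common ancestor (least common subsumer).
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


def wu_palmer_distance(a, b):
    """Assumed conversion: identical senses get distance 0."""
    return 1.0 - wu_palmer(a, b)


for pair in [("dog", "cat"), ("dog", "car"), ("dog", "dog")]:
    print(pair, round(wu_palmer(*pair), 3), round(wu_palmer_distance(*pair), 3))
```

Senses that share a deep ancestor (`dog`, `cat`) score higher than senses that only meet at the root (`dog`, `car`), which is the pair-wise relationship the clustering is meant to preserve.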

Changing `use_wordnet` from True to False switches the distance metric between words to the word2vec similarity value from a GloVe model, `glove.6B.300d_word2vec.txt` (the file must be in the word2vec format).
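The word2vec text format starts with a header line giving the vocabulary size and vector dimension, followed by one `word v1 v2 ...` line per word; raw GloVe files lack that header, which is why a converted file is needed. The sketch below parses a tiny in-memory model in that format and computes cosine similarity, the usual word2vec similarity measure. The toy vectors and the helper names `load_word2vec_text` and `cosine_similarity` are invented for illustration and are not taken from the repository.

```python
import io
import math

# Hypothetical 3-word, 2-dimensional model in word2vec text format.
TOY_MODEL = """3 2
dog 1.0 0.0
cat 0.8 0.6
car 0.0 1.0
"""


def load_word2vec_text(handle):
    """Parse word2vec text format from an open text handle into a dict."""
    count, dim = (int(x) for x in handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != dim + 1:
            raise ValueError("unexpected number of fields: %r" % line)
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != count:
        raise ValueError("header says %d words, found %d" % (count, len(vectors)))
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


vectors = load_word2vec_text(io.StringIO(TOY_MODEL))
print(round(cosine_similarity(vectors["dog"], vectors["cat"]), 3))
print(round(cosine_similarity(vectors["dog"], vectors["car"]), 3))
```

For a full-size model such as the 300-dimensional GloVe file, a library loader such as gensim's `KeyedVectors.load_word2vec_format` would normally replace the hand-written parser; whether this repository uses it is not stated here.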

The `extras` folder contains proof-of-concept code and experiments.

To Use:

- Create a newline-delimited file containing a list of wordnet senses (e.g. `data/example_tags.txt`).
- To use wordnet, set `use_wordnet = True`; to use word2vec, set `use_wordnet = False`.
- Run `python generate-tag-clusters.py data/example_tags.txt 25 0.7`, where:
  - `25` is the number of clusters to segment the list of wordnet senses into.
  - `0.7` is the similarity threshold; below this value, words are considered not similar.
- The results are placed in the `results` folder as a JSON file.
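To make the three command-line arguments concrete, here is a minimal sketch of how a script could read them and apply the threshold. It is not the code of `generate-tag-clusters.py`: the argument names, the helpers `parse_args`, `read_tags`, and `is_similar`, and the `dog.n.01` style of sense names (NLTK's synset naming) are all assumptions for illustration.

```python
import argparse


def parse_args(argv):
    """Parse the tags file, cluster count, and similarity threshold."""
    parser = argparse.ArgumentParser(description="Cluster a list of wordnet senses.")
    parser.add_argument("tags_file", help="newline-delimited file of wordnet senses")
    parser.add_argument("num_clusters", type=int, help="number of clusters")
    parser.add_argument("threshold", type=float, help="similarity threshold")
    return parser.parse_args(argv)


def read_tags(lines):
    """Strip whitespace and drop empty lines from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def is_similar(similarity, threshold):
    """Scores below the threshold are treated as not similar."""
    return similarity >= threshold


# Same arguments as the example command above.
args = parse_args(["data/example_tags.txt", "25", "0.7"])
# Hypothetical file contents, given inline instead of reading from disk.
tags = read_tags(["dog.n.01\n", "cat.n.01\n", "\n", "car.n.01\n"])

print(args.tags_file, args.num_clusters, args.threshold)
print(tags)
print(is_similar(0.8, args.threshold), is_similar(0.5, args.threshold))
```

In a real run the list of lines would come from `open(args.tags_file)` and the argument list from `sys.argv[1:]`.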
