Can a K-Means Cluster algorithm identify famous books?

Blauyourmind/clustering_books

Description

In this repository, I run an experiment to see whether a K-Means clustering algorithm can correctly group the books of three famous authors by series: Harry Potter, Lord of The Rings, and Game of Thrones.

Dependencies

  1. Python 3.7+
pip install pandas
pip install matplotlib
pip install scikit-learn
pip install nltk

Book Vectors

Each book is represented by a vector of relative token frequencies. Only the tokens with the highest document frequency, meaning the tokens that appear in the most books, are included in the vectors.
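
The vector construction described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code. The function names (`tokenize`, `build_book_vectors`), the regex tokenizer, the `vocab_size` parameter, and the choice to divide each count by the book's total token count are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def build_book_vectors(books, vocab_size=100):
    """Return (vocabulary, vectors) for a dict mapping book title -> raw text.

    vocab_size is a hypothetical parameter: how many of the tokens with the
    highest document frequency to keep as vector dimensions.
    """
    token_counts = {title: Counter(tokenize(text)) for title, text in books.items()}

    # Document frequency: the number of books in which each token appears.
    doc_freq = Counter()
    for counts in token_counts.values():
        doc_freq.update(counts.keys())

    # Keep the tokens with the highest document frequency; ties are broken
    # alphabetically so the vocabulary order is deterministic.
    ranked = sorted(doc_freq, key=lambda tok: (-doc_freq[tok], tok))
    vocabulary = ranked[:vocab_size]

    # Relative frequency (assumed definition): token count / total tokens in the book.
    vectors = {}
    for title, counts in token_counts.items():
        total = sum(counts.values())
        vectors[title] = [counts[tok] / total if total else 0.0 for tok in vocabulary]
    return vocabulary, vectors
```

With scikit-learn installed, the resulting rows could then be clustered with something like `KMeans(n_clusters=3).fit_predict(list(vectors.values()))`, one cluster per series. Whether the clusters match the series is the question the experiment sets out to answer.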

Data Sources

  1. GOT txt files: https://www.kaggle.com/muhammedfathi/game-of-thrones-book-files
  2. Harry Potter and Lord of The Rings txt files: http://www.glozman.com/textpages.html
