Description

Can a K-Means clustering algorithm identify famous books? In this repository, I run an experiment to see whether K-Means clustering can correctly group the books of three famous authors and their series: Harry Potter, The Lord of the Rings, and Game of Thrones.
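The experiment boils down to running K-Means with k = 3 on the book vectors and checking whether each cluster contains books from a single series. The sketch below is a minimal, dependency-free illustration, not the notebook's code. It uses a from-scratch Lloyd's algorithm with a deterministic farthest-point initialization instead of scikit-learn's `KMeans` (which defaults to k-means++). The names `k_means`, `farthest_point_init` and `purity` are hypothetical.

```python
from collections import Counter


def squared_distance(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(vectors, k):
    # Deterministic seeding (an assumption for this sketch): start from the first
    # vector, then repeatedly add the vector farthest from all chosen centroids.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        nxt = max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def k_means(vectors, k=3, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    centroids = farthest_point_init(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each book joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def purity(labels, series):
    # Fraction of books that belong to the majority series of their cluster;
    # 1.0 means every cluster contains exactly one series.
    clusters = {}
    for lab, s in zip(labels, series):
        clusters.setdefault(lab, []).append(s)
    majority = sum(Counter(m).most_common(1)[0][1] for m in clusters.values())
    return majority / len(series)
```

Usage: pass one vector per book to `k_means`, then pass the returned labels and the true series names (e.g. `"HP"`, `"LOTR"`, `"GOT"`) to `purity`. A purity of 1.0 means K-Means separated the three series perfectly.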

Dependencies

Python 3.7+

pip install pandas
pip install matplotlib
pip install scikit-learn
pip install nltk

Book Vectors

Each book is represented by a vector of relative token frequencies. Only the tokens with the highest document frequency (the tokens that appear in the most books) are included in the vectors.
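The vectorization described above can be sketched in plain Python as follows. Three assumptions here are not confirmed by the notebook: the regex tokenizer (the project may use `nltk` instead), the vocabulary size `vocab_size`, and normalizing each count by the book's total token count. The helper names `tokenize` and `build_vectors` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def build_vectors(docs, vocab_size=100):
    """Return (vocab, vectors): one relative-frequency vector per document,
    restricted to the vocab_size tokens with the highest document frequency."""
    token_lists = [tokenize(d) for d in docs]
    # Document frequency: the number of books each token appears in at least once.
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    # Highest document frequency first; ties broken alphabetically for determinism.
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [token for token, _ in ranked[:vocab_size]]
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for an empty book
        # Relative frequency = occurrences of the token / total tokens in the book.
        vectors.append([counts[t] / total for t in vocab])
    return vocab, vectors
```

Keeping only high document-frequency tokens tends to favor common function words ("the", "and", "of"), whose usage rates are a classic stylometric signal of authorship.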

Data Sources

Game of Thrones .txt files: https://www.kaggle.com/muhammedfathi/game-of-thrones-book-files
Harry Potter and The Lord of the Rings .txt files: http://www.glozman.com/textpages.html

Blauyourmind/clustering_books

Folders and files

Latest commit

History

Repository files navigation

Description

Dependencies

Book Vectors

Data Sources

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages