Novel Heatmap

Create a heatmap of the words in a book, or any other text, like this image:

which was synthesized from the text of this article about Vincent van Gogh.

How does it work?

(Image created from Isaac Asimov's The Last Question.)

It works by building a matrix from the sentences in the input file(s): each sentence is a vector, and each number in it is the length of the word at the corresponding position.
For example, "The quick brown fox jumped over the lazy dog" becomes [3 5 5 3 6 4 3 4 3] and is represented like this:
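A minimal Python sketch of that sentence-to-vector step is below. It is not the code from book_map.py: the function name sentence_to_vector is hypothetical, and treating a word as a run of word characters or apostrophes (so punctuation such as a trailing comma does not count toward a word's length) is an assumption.

```python
import re

# Hypothetical helper, not taken from book_map.py.
# Assumption: a "word" is a run of word characters or apostrophes,
# so punctuation attached to a word does not add to its length.
WORD = re.compile(r"[\w']+")


def sentence_to_vector(sentence):
    """Return the length of each word in `sentence`, in reading order."""
    return [len(word) for word in WORD.findall(sentence)]


print(sentence_to_vector("The quick brown fox jumped over the lazy dog"))
# [3, 5, 5, 3, 6, 4, 3, 4, 3]
```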

Processing a .txt file containing "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quisque laoreet convallis leo vitae dignissim." would produce the following result:

A full paragraph of lorem ipsum would look like this:
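The matrix behind the two-sentence lorem ipsum example can be approximated with the sketch below. It is an illustration rather than book_map.py's implementation: the name text_to_matrix, the sentence-splitting rule (a ".", "!" or "?" followed by whitespace) and right-padding shorter sentences with 0 so the matrix is rectangular are all assumptions.

```python
import re

# Hypothetical sketch, not taken from book_map.py.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")  # assumed sentence boundary
WORD = re.compile(r"[\w']+")                 # assumed definition of a word


def text_to_matrix(text):
    """Turn text into a list of equal-length rows, one per sentence.

    Each row holds the word lengths of one sentence; shorter rows are
    right-padded with 0 up to the length of the longest sentence.
    """
    sentences = [s for s in SENTENCE_END.split(text.strip()) if s]
    rows = [[len(w) for w in WORD.findall(s)] for s in sentences]
    rows = [r for r in rows if r]  # drop "sentences" that contain no words
    width = max((len(r) for r in rows), default=0)
    return [r + [0] * (width - len(r)) for r in rows]


lorem = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
         "Quisque laoreet convallis leo vitae dignissim.")
for row in text_to_matrix(lorem):
    print(row)
```

For the two lorem ipsum sentences this yields two rows of width 8, [5, 5, 5, 3, 4, 11, 10, 4] and [7, 7, 9, 3, 5, 9, 0, 0]. A list of rows like this can then be drawn as an image, for instance with matplotlib's imshow.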

Workflow

To make an image, run book_map.py and select your file(s) in the tkinter GUI. Options within the code control things like "normalizing" the data when processing multiple files and writing .mat files for use with other programs such as Octave.

If you have Calibre installed, you can convert an .epub (or similarly formatted) book to a .txt file with its ebook-convert command in a terminal, e.g. ebook-convert book.epub book.txt.
I think an image of a whole book works best when it is assembled from separate images for each chapter, so I also wrote a script, split_book_to_chapters.py, that splits the output of ebook-convert into one file per chapter. The delimiter is currently the string "Chapter ", so the output will likely need some cleaning up, if it works at all.
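A sketch of the chapter-splitting idea is below. It is not split_book_to_chapters.py itself; the function names split_chapters and write_chapters and the chapter_NN.txt output naming are hypothetical. It expects the plain-text output of ebook-convert as input.

```python
from pathlib import Path


def split_chapters(text, delimiter="Chapter "):
    """Split book text into chunks at each occurrence of `delimiter`.

    Anything before the first delimiter (title page, contents, ...) is
    kept as its own leading chunk; whitespace-only chunks are dropped.
    """
    parts = text.split(delimiter)
    chunks = [parts[0]] + [delimiter + p for p in parts[1:]]
    return [c for c in chunks if c.strip()]


def write_chapters(txt_path, out_dir):
    """Write each chunk of `txt_path` to out_dir/chapter_NN.txt (hypothetical naming)."""
    text = Path(txt_path).read_text(encoding="utf-8")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chapters = split_chapters(text)
    for i, chapter in enumerate(chapters, start=1):
        # Zero-padded numbers keep the files in order when sorted by name.
        (out / f"chapter_{i:02d}.txt").write_text(chapter, encoding="utf-8")
    return len(chapters)
```

Like the original delimiter, this splits on every occurrence of "Chapter ", including mid-sentence references such as "see Chapter 3", which is one source of the cleanup mentioned above.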
For *nix users, autotrim.sh can be used to remove whitespace from all images within the directory it's called from.

Normalization

When multiple chapters (files) are processed, they are "normalized" in two steps. First, 0s are appended to the end of every sentence vector until it matches the length of the longest sentence. Then each chapter is padded with zero-filled vectors until all matrices are the same size.
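The two padding steps can be sketched on plain Python lists as follows. The function name normalize is hypothetical, book_map.py may well use NumPy arrays instead, and appending the zero rows at the end of each chapter (rather than, say, above and below it) is an assumption.

```python
def normalize(matrices):
    """Pad a list of chapter matrices (lists of rows) to one common shape.

    Step 1: right-pad every sentence vector with 0s up to the length of
            the longest sentence in any chapter.
    Step 2: append all-zero rows to each chapter until it has as many
            rows as the chapter with the most sentences.
    """
    width = max((len(row) for m in matrices for row in m), default=0)
    height = max((len(m) for m in matrices), default=0)
    result = []
    for m in matrices:
        rows = [row + [0] * (width - len(row)) for row in m]   # step 1
        rows += [[0] * width for _ in range(height - len(rows))]  # step 2
        result.append(rows)
    return result


chapters = [[[3, 5], [4]], [[1, 2, 3]]]
print(normalize(chapters))
```

Both chapters come out as 2 x 3 matrices, so they can be stacked or compared directly.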
If you're not worried about visualizing data and just want pretty pictures, turning off normalization and experimenting with the files can produce some nice images. Below is a somewhat extreme comparison, made by processing and combining the first 5 chapters of Tolstoy's What is Art? (which conveniently has a sentence-rich third chapter), with normalization (left or top) and without it (right or bottom).

(NB: These are not fully automated; both images were made by importing the individual chapter images into Inkscape, stacking them, and center-aligning them. For the raw image, I manually changed the color of the area outside each chapter's matrix to match the color scheme, to demonstrate the aesthetic intent of that setting.)
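The stacking and center-aligning done by hand in Inkscape could, in principle, be approximated at the matrix level before plotting. The sketch below is hypothetical and not part of the repository: stack_centered pads each chapter's own rows with 0 inside the chapter's matrix and writes a separate fill value into the margins outside it, so a sentinel such as -1 can later be given its own color instead of being recolored by hand.

```python
def stack_centered(matrices, fill=0):
    """Stack chapter matrices vertically, centering narrower chapters.

    Rows shorter than their own chapter's widest row are padded with 0
    (inside the matrix); the margins left and right of a narrower chapter
    are filled with `fill` (outside the matrix).
    """
    width = max((len(row) for m in matrices for row in m), default=0)
    stacked = []
    for m in matrices:
        m_width = max((len(row) for row in m), default=0)
        left = (width - m_width) // 2        # margin before the chapter
        right = width - left - m_width       # margin after the chapter
        for row in m:
            inner = row + [0] * (m_width - len(row))
            stacked.append([fill] * left + inner + [fill] * right)
    return stacked


print(stack_centered([[[1, 2, 3, 4]], [[5], [6, 7]]], fill=-1))
```

When the chapter widths differ by an odd amount, the extra margin column goes on the right.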

ChemiKyle/Novel-heatmap
