This is a Word Cloud Generator that attempts to place the `w` most common words from an input text file, `f`, on an image of size `NxN`. Below I've summarized my implementation and some of the considerations taken:
1. Collect the input parameters `w`, `f`, and `N`.
2. Build a dictionary of `stopwords`, which are uninteresting words to blacklist.
3. Build the `vocabulary` dictionary, which holds the words contained in `f` that are NOT in the `stopwords` dictionary.
4. Generate an `NxN` grid to optimize placement of words in the generated Word Cloud: words are only tried at places where two lines of the same colour/thickness intersect perpendicularly.
5. Attempt to place each word on a square in the `NxN` grid. A valid placement is:
   1. When the collision box around the word does NOT collide with any other placed word's collision box. For this I used the `does_overlap` function from the Axis Aligned Bounding Box (AABB) Trees library, where the tree contains the coordinates of the word boxes for each placed word. This allows efficient checks of whether a word placement would overlap with an already placed word.
   2. When the collision box around the word lies within the `NxN` grid's coordinates.
6. Upon successful execution, the output Word Cloud image is saved to `output/wordcloud.png`.
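
The stopword filtering and vocabulary-building steps above could look roughly like the following. This is only a minimal sketch, not the code in this repository: the `build_vocabulary` function, the tiny `STOPWORDS` set, and the regex-based tokenizer are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword set (illustration only).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "it"}


def build_vocabulary(text, stopwords, w):
    """Return the w most common words in text that are not stopwords."""
    # Lowercase the text and split it into runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Count only the words that are not blacklisted.
    counts = Counter(word for word in words if word not in stopwords)
    return counts.most_common(w)


sample = "The clocks were striking thirteen. The clocks, the day, the clocks and the day."
print(build_vocabulary(sample, STOPWORDS, 2))
```

Here `w` plays the same role as the input parameter of the same name: it caps how many of the most frequent words are kept for placement.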
The following Word Cloud was generated using the novel 1984 by George Orwell:
- The leftmost image is the generated `NxN` grid mentioned in A.O. #4.
- The middle image is a visualization of the word collision boxes mentioned in A.O. #5.1.
- The rightmost image is the generated Word Cloud image mentioned in A.O. #6.
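
To make the collision boxes in the middle image more concrete, the two placement rules from A.O. #5 can be sketched in plain Python. This sketch is an assumption-laden stand-in: it replaces the AABB tree with a simple linear scan over the placed boxes (slower, but the same validity rule), and the `Box`, `boxes_overlap`, and `is_valid_placement` names are hypothetical. It also assumes that boxes which merely touch along an edge do not count as colliding.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned collision box around a word (hypothetical helper type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def boxes_overlap(a: Box, b: Box) -> bool:
    # Two axis-aligned boxes overlap only if they overlap on both axes.
    # Strict comparisons: boxes sharing just an edge are not a collision.
    return (a.x_min < b.x_max and b.x_min < a.x_max
            and a.y_min < b.y_max and b.y_min < a.y_max)


def is_valid_placement(candidate: Box, placed: List[Box], n: float) -> bool:
    # Rule 1: the box must lie entirely within the NxN image.
    inside = (candidate.x_min >= 0 and candidate.y_min >= 0
              and candidate.x_max <= n and candidate.y_max <= n)
    # Rule 2: the box must not collide with any already placed box.
    # (Linear scan here; an AABB tree answers the same query faster.)
    return inside and not any(boxes_overlap(candidate, other) for other in placed)


placed = [Box(10, 10, 60, 30)]
print(is_valid_placement(Box(50, 20, 90, 40), placed, 100))   # collides with the placed box
print(is_valid_placement(Box(70, 20, 95, 40), placed, 100))   # free spot inside the image
print(is_valid_placement(Box(80, 80, 120, 95), placed, 100))  # sticks out of the image
```

The linear scan costs one comparison per placed word for every attempted placement, which is the cost the tree-based `does_overlap` check is meant to reduce.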
Clone the repository, install the dependencies via `pip`, then from the terminal run: `python main.py`
Upon execution, you will be prompted for three pieces of information:
- Input text (.txt) file `f`, which must be in the `input` directory
- Number of words `w`
- Image dimension `N`