ZeroDivisionError: division by zero #274

Open
ArpiJakab opened this issue Jun 13, 2017 · 7 comments

Comments

@ArpiJakab

wordcloud = WordCloud().generate('not funny, funny,')

Generates error:
    wordcloud = WordCloud().generate('not funny, funny,')
  File "/Library/Python/2.7/site-packages/wordcloud/wordcloud.py", line 556, in generate
    return self.generate_from_text(text)
  File "/Library/Python/2.7/site-packages/wordcloud/wordcloud.py", line 541, in generate_from_text
    words = self.process_text(text)
  File "/Library/Python/2.7/site-packages/wordcloud/wordcloud.py", line 522, in process_text
    word_counts = unigrams_and_bigrams(words, self.normalize_plurals)
  File "/Library/Python/2.7/site-packages/wordcloud/tokenization.py", line 57, in unigrams_and_bigrams
    if score(count, counts[word1], counts[word2], n_words) > 30:
  File "/Library/Python/2.7/site-packages/wordcloud/tokenization.py", line 22, in score
    p2 = (c2 - c12) / (N - c1)
ZeroDivisionError: division by zero

@amueller
Owner

amueller commented Jun 15, 2017

Thanks, I can reproduce this. I'm not sure what a good fix is. Did you hit this in a real use case?
I "fixed" it so that "funny funny" is no longer detected as a collocation; the output is now a word cloud containing "funny", not "funny funny".
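
For readers following along, here is a minimal sketch of why the denominator in the traceback reaches zero. The counts are an assumption based on the thread: once "not" is dropped as a stop word, the text is just "funny funny", so the total word count equals the count of the first word. The helper names (p2_unguarded, is_degenerate) are hypothetical and are not part of wordcloud; the guard only illustrates one way to skip such a bigram and is not necessarily the fix that was committed.

```python
def p2_unguarded(c12, c1, c2, n_words):
    # Same expression as the failing line in the traceback.
    return (c2 - c12) / (n_words - c1)


def is_degenerate(c1, n_words):
    # Hypothetical guard: every token in the text is word1, so there is
    # nothing left to compare the bigram against.
    return n_words == c1


# Assumed counts for 'not funny, funny,' after "not" is removed:
# two tokens in total, "funny" seen twice, the bigram "funny funny" once.
n_words, c1, c2, c12 = 2, 2, 2, 1

try:
    p2_unguarded(c12, c1, c2, n_words)
    raised = False
except ZeroDivisionError:
    raised = True

# A caller could check the guard first and treat the pair as "not a collocation".
skip_bigram = is_degenerate(c1, n_words)
```

With these counts the unguarded expression raises ZeroDivisionError, while the guard flags the pair so it can be skipped before any division happens.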

@ArpiJakab
Author

ArpiJakab commented Jun 16, 2017 via email

@amueller
Owner

"not" is removed because it's a stop word, and English stop words are removed: https://github.com/amueller/word_cloud/blob/master/wordcloud/wordcloud.py#L195

If you have already tokenized the text, I think it's better to call generate_from_frequencies, which bypasses the tokenization and normalization in WordCloud. You need to count the occurrences first, but that should be a one-line for-loop.
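
A rough sketch of that suggestion, assuming a wordcloud version in which generate_from_frequencies accepts a dict mapping each word to its count. The token list and the build_cloud helper are hypothetical, and collections.Counter stands in for the for-loop.

```python
from collections import Counter

# Hypothetical pre-tokenized input; in practice this comes from your own tokenizer.
tokens = ["not", "funny", "funny"]

# Counter does the counting that the one-line for-loop would do.
frequencies = Counter(tokens)


def build_cloud(freqs):
    # Imported here so the counting above runs even without wordcloud installed.
    from wordcloud import WordCloud

    # Assumes generate_from_frequencies takes a word -> count mapping.
    return WordCloud().generate_from_frequencies(freqs)
```

Because this path skips WordCloud's own text processing, "not" stays in the frequencies unless you filter it out yourself, and no bigram scoring takes place.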

@ArpiJakab
Author

ArpiJakab commented Jun 16, 2017 via email

@Prashiksha

I would like to work on this issue. A go-ahead or approval would be helpful. I am a master's student looking to contribute to this open source project.

@amueller
Owner

@Prashiksha go for it!

@Prashiksha

I have found a solution to this issue and would like to resolve it so that it can be closed. Should I make the change in the API, or how would you like me to incorporate it?
I would appreciate some help, @amueller.
