ZeroDivisionError: division by zero #274
Thanks, I can reproduce. Not sure what a good fix is. Did you get this in a real use case?
Hi Andreas, thank you for responding quickly. My data includes a comma-separated set of unigram and bigram sentiments like "funny", "not funny", "bad", "never good", etc.
The word cloud only shows the second word of each bigram. For example, "not funny" is the most common sentiment, but the cloud only shows "funny". I've tried changing every "not funny" to "not-funny", but nothing changed.
I reduced the data set to a single line, and that's when I hit the division-by-zero error.
- Arpi
On Jun 15, 2017, at 1:49 PM, Andreas Mueller ***@***.*** wrote:
I "fixed" it so that "funny funny" is no longer detected as a collocation; the output is now a word cloud containing "funny", not "funny funny".
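A minimal sketch of the fix described above, assuming the unigram counts, the bigram counts and N are all taken from the same token list (candidate_bigrams is a hypothetical helper, not part of wordcloud's API). Skipping bigrams whose two words are identical is enough to avoid the zero denominator: for any remaining bigram (w1, w2), the word w2 differs from w1 and occurs at least once, so c1 is at most N - 1.

```python
from collections import Counter


def candidate_bigrams(words):
    """Yield (word1, word2, count) for bigrams that may be scored as collocations.

    Hypothetical sketch of the fix: a bigram made of the same word twice,
    such as ("funny", "funny"), is skipped instead of being scored.
    """
    bigram_counts = Counter(zip(words, words[1:]))
    for (word1, word2), count in bigram_counts.items():
        if word1 == word2:
            # For any bigram that is kept, word2 != word1 occurs at least once,
            # so counts[word1] <= len(words) - 1 and N - c1 stays positive.
            continue
        yield word1, word2, count


# 'not funny, funny,' after "not" is dropped as a stop word
print(list(candidate_bigrams(["funny", "funny"])))          # []
print(list(candidate_bigrams(["not", "funny", "funny"])))   # [('not', 'funny', 1)]
```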
"not" is removed because it's a stop word, and English stop words are removed by default: https://github.com/amueller/word_cloud/blob/master/wordcloud/wordcloud.py#L195
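The effect on the data in this thread can be sketched with a simplified tokenizer. Two assumptions, also labeled in the code: the token pattern \w[\w']+ stands in for WordCloud's default regexp, and STOPWORDS_SUBSET is a hypothetical three-word stand-in for the real stop word list, which is much larger. Under these assumptions "not funny" loses "not", and "not-funny" is split at the hyphen before "not" is dropped, which would explain why the hyphenated version made no difference.

```python
import re

# Assumption: a token pattern of the form \w[\w']+, similar to WordCloud's
# default regexp. A hyphen is not a word character, so "not-funny" splits.
TOKEN_PATTERN = r"\w[\w']+"

# Hypothetical stand-in for wordcloud.STOPWORDS: the real set is far larger,
# but, as noted in the comment above, it contains "not".
STOPWORDS_SUBSET = {"not", "the", "and"}


def tokens_after_stopwords(text, stopwords=STOPWORDS_SUBSET):
    """Tokenize with TOKEN_PATTERN, then drop stop words case-insensitively."""
    words = re.findall(TOKEN_PATTERN, text)
    return [w for w in words if w.lower() not in stopwords]


print(tokens_after_stopwords("not funny, funny,"))   # ['funny', 'funny']
print(tokens_after_stopwords("not-funny"))           # ['funny']
print(tokens_after_stopwords("not funny", set()))    # ['not', 'funny']
```

With the library itself, one option is to keep "not" by passing a reduced set, e.g. WordCloud(stopwords=STOPWORDS - {"not"}) with STOPWORDS imported from wordcloud; whether "not funny" then appears as a bigram still depends on the collocation scoring.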
Great, I'll give it a go, thanks!
- Arpi
On Jun 16, 2017, at 7:44 AM, Andreas Mueller ***@***.*** wrote:
If you already did tokenization, I think it's better to call generate_from_frequencies to bypass the tokenization and normalization in WordCloud. You need to count the occurrences first, but that should be a one-line for loop.
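Since the input here is already a list of comma-separated phrases, the counting step can be as small as the following sketch (phrase_frequencies is a hypothetical helper name; collections.Counter does the counting in place of an explicit for loop).

```python
from collections import Counter


def phrase_frequencies(text):
    """Count comma-separated phrases such as "not funny" as single units.

    Each phrase is kept whole, so no tokenization or stop word removal
    happens before counting.
    """
    phrases = (part.strip() for part in text.split(","))
    return dict(Counter(p for p in phrases if p))  # skip empty entries


freqs = phrase_frequencies("not funny, funny, not funny, bad, never good,")
print(freqs)  # {'not funny': 2, 'funny': 1, 'bad': 1, 'never good': 1}
```

The resulting dict maps each phrase to its count and can be passed to WordCloud().generate_from_frequencies(freqs), so "not funny" is drawn as one entry instead of being split into words.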
I would like to work on this issue, and a heads-up or approval would be helpful. I am a Master's student trying to contribute to this open-source project.
@Prashiksha go for it!
I have found a solution to this issue and would like to get it resolved and closed. Should I make the change in the API, or how would you like me to incorporate it?
from wordcloud import WordCloud
wordcloud = WordCloud().generate('not funny, funny,')
This generates the following error:
    wordcloud = WordCloud().generate('not funny, funny,')
  File "/Library/Python/2.7/site-packages/wordcloud/wordcloud.py", line 556, in generate
    return self.generate_from_text(text)
  File "/Library/Python/2.7/site-packages/wordcloud/wordcloud.py", line 541, in generate_from_text
    words = self.process_text(text)
  File "/Library/Python/2.7/site-packages/wordcloud/wordcloud.py", line 522, in process_text
    word_counts = unigrams_and_bigrams(words, self.normalize_plurals)
  File "/Library/Python/2.7/site-packages/wordcloud/tokenization.py", line 57, in unigrams_and_bigrams
    if score(count, counts[word1], counts[word2], n_words) > 30:
  File "/Library/Python/2.7/site-packages/wordcloud/tokenization.py", line 22, in score
    p2 = (c2 - c12) / (N - c1)
ZeroDivisionError: division by zero
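The arithmetic behind this traceback can be reproduced without the library. This is a sketch assuming score() computes a Dunning-style log-likelihood ratio for the bigram; only the p2 = (c2 - c12) / (N - c1) line is confirmed by the traceback, and the remaining terms follow the textbook form rather than wordcloud's exact source. Once "not" is removed as a stop word, the input leaves the tokens ["funny", "funny"]: N = 2, c1 = c2 = 2 and c12 = 1, so N - c1 = 0.

```python
import math
from collections import Counter


def log_l(k, n, x):
    # Binomial log-likelihood of k hits in n trials with hit probability x;
    # x is clamped into (0, 1) so math.log never receives 0.
    x = min(max(x, 1e-10), 1 - 1e-10)
    return k * math.log(x) + (n - k) * math.log(1 - x)


def score(c12, c1, c2, n):
    # Assumed Dunning log-likelihood ratio for a bigram (w1, w2):
    # c12 = bigram count, c1/c2 = counts of w1/w2, n = total number of words.
    p = c2 / n
    p1 = c12 / c1
    p2 = (c2 - c12) / (n - c1)  # the expression that fails in the traceback
    return -2 * (log_l(c12, c1, p) + log_l(c2 - c12, n - c1, p)
                 - log_l(c12, c1, p1) - log_l(c2 - c12, n - c1, p2))


# 'not funny, funny,' once "not" is removed as a stop word
words = ["funny", "funny"]
counts = Counter(words)                                    # c1 = c2 = 2
c12 = Counter(zip(words, words[1:]))[("funny", "funny")]   # 1
n_words = len(words)                                       # N = 2, so N - c1 = 0

try:
    score(c12, counts["funny"], counts["funny"], n_words)
    reproduced = False
except ZeroDivisionError:
    reproduced = True
print(reproduced)  # True
```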