Bug Report: ProfileReport fails if column has only empty strings #1373

Closed
3 tasks done
BraginIvan opened this issue Jun 29, 2023 · 2 comments · Fixed by #1459
Labels: bug 🐛 Something isn't working

Comments

BraginIvan commented Jun 29, 2023

Current Behaviour

    profile = ProfileReport(
        pd.DataFrame({
            "q": ["", None, " "],
            "v": [1, 2, 3]
        }),
        minimal=True,
    ).html

Fails with

    wordcloud = WordCloud(
  File "/wordcloud/wordcloud.py", line 410, in generate_from_frequencies
    raise ValueError("We need at least 1 word to plot a word cloud, "
ValueError: We need at least 1 word to plot a word cloud, got 0.

Expected Behaviour

The HTML report should be generated without raising an error.

Data Description

At least one column of the DataFrame contains only empty or whitespace-only strings and None values.

Code that reproduces the bug

    import pandas as pd
    from ydata_profiling import ProfileReport

    profile = ProfileReport(
        pd.DataFrame({
            "q": ["", None, " "],
            "v": [1, 2, 3]
        }),
        minimal=True,
    ).html

pandas-profiling version

v4.3.1

Dependencies

pandas=1.5.3

OS

macOS 13.4 (22F66)

Checklist

  • There is not yet another bug report for this issue in the issue tracker
  • The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
  • The issue has not been resolved by the entries listed under Common Issues.

Hints

    pd.DataFrame({
        "q": ["", None, " "],
        "v": [1, 2, 3]
    })
  • The type of q in the DataFrame above is inferred as Text.
  • The text summarizer calculates word counts but strips punctuation and whitespace first, so we end up with an empty series:
    ydata_profiling.model.pandas.describe_categorical_pandas.word_summary_vc
    words = word_lists.explode().str.strip(string.punctuation + string.whitespace)
  • wordcloud cannot work with an empty series:
    wordcloud.wordcloud.WordCloud.generate_from_frequencies
    if len(frequencies) <= 0:
        raise ValueError("We need at least 1 word to plot a word cloud, "
                         "got %d." % len(frequencies))
  • As far as I understand, in previous versions q would have been inferred as Categorical, not Text.
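
To make the hints above concrete, here is a minimal plain-Python sketch of the same sequence of steps: split each value into tokens, strip punctuation and whitespace, and count what is left. It only imitates the behaviour described above and is not the ydata_profiling code; the names `word_counts` and `can_plot_word_cloud` are hypothetical, and the guard is just one possible way a caller could avoid handing an empty frequency table to wordcloud.

```python
import string
from collections import Counter

# Same character set the hint shows being stripped from each word.
STRIP_CHARS = string.punctuation + string.whitespace


def word_counts(values):
    """Count words after stripping punctuation and whitespace.

    Plain-Python imitation of the steps in the hints (hypothetical helper,
    not the ydata_profiling implementation).
    """
    counts = Counter()
    for value in values:
        if not isinstance(value, str):
            continue  # skip None and other non-string entries
        for token in value.split():
            word = token.strip(STRIP_CHARS)
            if word:  # drop tokens that are empty after stripping
                counts[word] += 1
    return counts


def can_plot_word_cloud(values):
    """Hypothetical guard: True only if at least one word survives."""
    return len(word_counts(values)) > 0


if __name__ == "__main__":
    column_q = ["", None, " "]
    print(word_counts(column_q))          # no words survive for this column
    print(can_plot_word_cloud(column_q))  # so a word cloud should be skipped
```

For the column q from this report no token survives, which matches the "got 0" in the error message.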
@BraginIvan BraginIvan changed the title Bug Report Bug Report: ProfileReport fails if column has only empty strings Jun 29, 2023
@fabclmnt fabclmnt added bug 🐛 Something isn't working and removed needs-triage labels Jul 6, 2023

Ge0f3 commented Jul 7, 2023

Any update on this or a solution to bypass this?

@alexbarros
Contributor

> Any update on this or a solution to bypass this?

Not yet, but you can bypass this by setting the variable as categorical using type_schema:

    import pandas as pd
    from ydata_profiling import ProfileReport

    profile = ProfileReport(
        pd.DataFrame({
            "q": ["", None, " "],
            "v": [1, 2, 3]
        }),
        minimal=True,
        type_schema={"q": "categorical"},
    ).html
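
If you do not know in advance which columns are affected, the workaround above could be automated along these lines. This is only a sketch under assumptions: `blank_only_type_schema` is a hypothetical helper, not part of ydata_profiling, and it treats a column as affected when it holds at least one string and every string is empty once punctuation and whitespace are stripped (missing values are ignored). It takes a plain mapping of column names to value lists so that it runs without pandas.

```python
import string

STRIP_CHARS = string.punctuation + string.whitespace


def _is_missing(value):
    # None, or a float NaN (NaN is the only float not equal to itself).
    return value is None or (isinstance(value, float) and value != value)


def blank_only_type_schema(columns):
    """Build a type_schema dict for columns that contain no usable words.

    `columns` maps column names to lists of values. Hypothetical helper,
    written for illustration only.
    """
    schema = {}
    for name, values in columns.items():
        strings = [v for v in values if isinstance(v, str)]
        others = [
            v for v in values
            if not isinstance(v, str) and not _is_missing(v)
        ]
        # Affected: has strings, nothing else, and every string is blank.
        if strings and not others and all(
            not s.strip(STRIP_CHARS) for s in strings
        ):
            schema[name] = "categorical"
    return schema


if __name__ == "__main__":
    data = {"q": ["", None, " "], "v": [1, 2, 3]}
    print(blank_only_type_schema(data))
```

The returned dict has the same shape as the `type_schema` argument used in the workaround. With a DataFrame `df`, the input mapping could be built as `{c: df[c].tolist() for c in df.columns}`.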
