Word clouds for text fields display information when in sensitive mode #1377

Closed
3 tasks done
Mjboothaus opened this issue Jul 6, 2023 · 1 comment · Fixed by #1415
Labels: bug 🐛 (Something isn't working)

Comments

@Mjboothaus

Current Behaviour

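from ydata_profiling import ProfileReport

# df is the DataFrame being profiled and title is the report title, both defined elsewhere.
profile_report = ProfileReport(df, title=title, tsmode=False, dark_mode=True, sensitive=True)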

With sensitive=True, text values are redacted elsewhere in the report, but the same values still appear in the word clouds.

Expected Behaviour

Word clouds should either not be displayed, or the text values should be de-identified first.
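
One way to read the second option, as a user-side workaround rather than a fix in the library, is to replace each text value with a salted hash before building the report, so that any word cloud only ever sees pseudonyms. The sketch below uses only the standard library. The helper name `pseudonymize`, the salt value, and the 12-character truncation are illustrative assumptions and are not part of ydata-profiling.

```python
import hashlib


def pseudonymize(value: str, salt: str = "change-me") -> str:
    """Return a stable token for a text value (hypothetical helper)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    # Truncation keeps tokens short; the length is an arbitrary choice for this sketch.
    return digest[:12]


ip_addresses = ["192.168.0.1", "10.0.0.7", "192.168.0.1"]
tokens = [pseudonymize(ip) for ip in ip_addresses]

# Equal inputs map to equal tokens, so value counts and cardinality are preserved.
print(tokens)
```

With a DataFrame whose column holds only strings, the same function could be applied before profiling, for example `df["ip_address"] = df["ip_address"].map(pseudonymize)`. For a small value space such as IPv4 addresses, a salted hash can be brute-forced if the salt leaks, so the salt should be kept secret.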

Data Description

e.g. df["ip_address"] contains IP addresses, or any other field that holds PII.

Code that reproduces the bug

As above.

pandas-profiling version

v4.3.1

Dependencies

│ pathspec                 │ 0.11.1   │          │
│ typing_extensions        │ 4.5.0    │          │
│ setuptools               │ 68.0.0   │          │
│ multidict                │ 6.0.4    │          │
│ tangled-up-in-unicode    │ 0.2.0    │          │
│ ply                      │ 3.11     │          │
│ uri-template             │ 1.3.0    │          │
│ patsy                    │ 0.5.3    │          │
│ rfc3339-validator        │ 0.1.4    │          │
│ appnope                  │ 0.1.3    │          │
│ Pillow                   │ 9.5.0    │          │
│ pyrsistent               │ 0.19.3   │          │
│ PyWavelets               │ 1.4.1    │          │
│ wcwidth                  │ 0.2.6    │          │
│ fqdn                     │ 1.5.1    │          │
│ decorator                │ 5.1.1    │          │
│ notebook                 │ 6.5.4    │          │
│ numpy                    │ 1.23.5   │          │
│ Jinja2                   │ 3.1.2    │          │
│ pandocfilters            │ 1.5.0    │          │
│ ptyprocess               │ 0.7.0    │          │
│ jupyter_server           │ 2.7.0    │          │
│ aiosignal                │ 1.3.1    │          │
│ statsmodels              │ 0.14.0   │          │
│ isoduration              │ 20.11.0  │          │
│ cfgv                     │ 3.3.1    │          │
│ matplotlib               │ 3.7.1    │          │
│ tinycss2                 │ 1.2.1    │          │
│ bleach                   │ 6.0.0    │          │
│ pip                      │ 23.1.2   │          │
│ webencodings             │ 0.5.1    │          │
│ pyzmq                    │ 25.1.0   │          │
│ python-dateutil          │ 2.8.2    │          │
│ pre-commit               │ 3.3.3    │          │
│ attrs                    │ 23.1.0   │          │
│ Pygments                 │ 2.15.1   │          │
│ platformdirs             │ 3.8.0    │          │
│ pytz                     │ 2023.3   │          │
│ asttokens                │ 2.2.1    │          │
│ tornado                  │ 6.3.2    │          │
│ cohere                   │ 4.9.0    │          │
│ async-timeout            │ 4.0.2    │          │
│ python-json-logger       │ 2.0.7    │          │
│ jupyter_server_terminals │ 0.4.4    │          │
│ networkx                 │ 3.1      │          │
│ defusedxml               │ 0.7.1    │          │
│ marshmallow              │ 3.19.0   │          │
│ jsonpointer              │ 2.4      │          │
│ wordcloud                │ 1.9.2    │          │
│ pickleshare              │ 0.7.5    │          │
│ stack-data               │ 0.6.2    │          │
│ nest-asyncio             │ 1.5.6    │          │
│ jupyter-events           │ 0.6.3    │          │
│ widgetsnbextension       │ 4.0.7    │          │
│ langchain                │ 0.0.159  │          │
│ parso                    │ 0.8.3    │          │
│ executing                │ 1.2.0    │          │
│ six                      │ 1.16.0   │          │
│ mypy-extensions          │ 1.0.0    │          │
│ jsonpath-ng              │ 1.5.3    │          │
│ kiwisolver               │ 1.4.4    │          │
│ contourpy                │ 1.1.0    │          │
│ filelock                 │ 3.12.2   │          │
│ typing-inspect           │ 0.9.0    │          │
│ yarl                     │ 1.9.2    │          │
│ click                    │ 8.1.3    │          │
│ websocket-client         │ 1.6.1    │          │
│ webcolors                │ 1.13     │          │
│ openapi-schema-pydantic  │ 1.2.4    │          │
│ scipy                    │ 1.10.1   │          │
│ virtualenv               │ 20.23.1  │          │
│ argon2-cffi              │ 21.3.0   │          │
│ tzdata                   │ 2023.3   │          │
│ numexpr                  │ 2.8.4    │          │
│ zipp                     │ 3.15.0   │          │
│ debugpy                  │ 1.6.7    │          │
│ nodeenv                  │ 1.8.0    │          │
│ arrow                    │ 1.2.3    │          │
│ idna                     │ 3.4      │          │
│ pycparser                │ 2.21     │          │
│ distlib                  │ 0.3.6    │          │
│ comm                     │ 0.1.3    │          │
│ certifi                  │ 2023.5.7 │          │
│ pure-eval                │ 0.2.2    │          │
│ backcall                 │ 0.2.0    │          │
│ cycler                   │ 0.11.0   │          │
│ requests                 │ 2.31.0   │          │
│ htmlmin                  │ 0.1.12   │          │
│ SQLAlchemy               │ 2.0.17   │          │
│ pandas                   │ 2.0.2    │          │
│ matplotlib-inline        │ 0.1.6    │          │
│ notebook_shim            │ 0.2.3    │          │
│ joblib                   │ 1.2.0    │          │
│ terminado                │ 0.17.1   │          │
│ crawlerdetect            │ 0.1.5    │          │
│ ipython-genutils         │ 0.2.0    │          │
│ multimethod              │ 1.9.1    │          │
│ dataclasses-json         │ 0.5.8    │          │
│ jupyter_client           │ 8.3.0    │          │
│ PyYAML                   │ 6.0      │          │
│ importlib-metadata       │ 5.2.0    │          │
│ beautifulsoup4           │ 4.12.2   │          │
│ packaging                │ 23.1     │          │
│ jupyterlab-pygments      │ 0.2.2    │          │
│ typeguard                │ 2.13.3   │          │
│ mistune                  │ 3.0.1    │          │
│ ipykernel                │ 6.23.3   │          │
│ nbclassic                │ 1.0.0    │          │
│ pexpect                  │ 4.8.0    │          │
│ cffi                     │ 1.15.1   │          │
│ psutil                   │ 5.9.5    │          │
│ tokenize-rt              │ 5.1.0    │          │
│ frozenlist               │ 1.3.3    │          │
│ nbformat                 │ 5.9.0    │          │
│ ipywidgets               │ 8.0.6    │          │
│ jupyterlab-widgets       │ 3.0.7    │          │
│ jupyter_core             │ 5.3.1    │          │
│ wheel                    │ 0.40.0   │          │
│ rfc3986-validator        │ 0.1.1    │          │
│ tomli                    │ 2.0.1    │          │
│ prompt-toolkit           │ 3.0.38   │          │
│ phik                     │ 0.12.3   │          │
│ argon2-cffi-bindings     │ 21.2.0   │          │
│ Send2Trash               │ 1.8.2    │          │
│ marshmallow-enum         │ 1.5.1    │          │
│ tenacity                 │ 8.2.2    │          │
│ jupyter_ai_magics        │ 0.8.0    │          │
│ exceptiongroup           │ 1.1.1    │          │
│ sh                       │ 2.0.4    │          │
│ ydata-profiling          │ 4.3.1    │          │
│ sniffio                  │ 1.3.0    │          │
│ nbconvert                │ 7.6.0    │          │
│ soupsieve                │ 2.4.1    │          │
│ backoff                  │ 2.2.1    │          │
│ ImageHash                │ 4.3.1    │          │
│ dacite                   │ 1.8.1    │          │
│ tqdm                     │ 4.65.0   │          │
│ prometheus-client        │ 0.17.0   │          │
│ overrides                │ 7.3.1    │          │
│ nbclient                 │ 0.8.0    │          │
│ fastjsonschema           │ 2.17.1   │          │
│ black                    │ 23.3.0   │          │
│ aiohttp                  │ 3.8.4    │          │
│ pydantic                 │ 1.10.9   │          │
│ anyio                    │ 3.7.0    │          │
│ MarkupSafe               │ 2.1.3    │          │
│ urllib3                  │ 2.0.3    │          │
│ charset-normalizer       │ 3.1.0    │          │
│ seaborn                  │ 0.12.2   │          │
│ fonttools                │ 4.40.0   │          │
│ jedi                     │ 0.18.2   │          │
│ pyparsing                │ 3.1.0    │          │
│ visions                  │ 0.7.5    │          │
│ ipython                  │ 8.14.0   │          │
│ identify                 │ 2.5.24   │          │
│ traitlets                │ 5.9.0    │          │
│ jsonschema               │ 4.17.3   │          │

OS

macOS

Checklist

  • There is not yet another bug report for this issue in the issue tracker
  • The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
  • The issue has not been resolved by the entries listed under Common Issues.
@fabclmnt added the bug 🐛 (Something isn't working) label and removed the needs-triage label on Jul 11, 2023
@fabclmnt
Contributor

Hi @Mjboothaus,

Thank you for your request. This is already being considered for the next release, with the following logic:

  • If the report is private, text variables are expected to follow the same plotting behaviour as the "categorical" type.
  • Otherwise, the word cloud is calculated.
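
The logic described above can be sketched as a single branch. This is an illustration only, not the code merged in #1415: the function `describe_text_column`, its return shape, and the whitespace tokenization are assumptions made for this example.

```python
from collections import Counter
from typing import Any, Dict, List


def describe_text_column(values: List[str], sensitive: bool) -> Dict[str, Any]:
    """Hypothetical summary step for a text column."""
    summary: Dict[str, Any] = {"n": len(values), "n_distinct": len(set(values))}
    if sensitive:
        # Private report: produce no word-level data, so no word cloud can be drawn.
        # (Assumed here to be what "same behaviour as categorical" amounts to.)
        summary["word_counts"] = None
    else:
        # Non-private report: count words as input for the word cloud.
        words = [w.lower() for v in values for w in v.split()]
        summary["word_counts"] = Counter(words)
    return summary


rows = ["alice likes tea", "bob likes coffee"]
private = describe_text_column(rows, sensitive=True)
public = describe_text_column(rows, sensitive=False)
```

In this sketch the rendering step would only draw a word cloud when `word_counts` is not `None`, which keeps the privacy decision in one place instead of spreading it across the plotting code.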

