bug: PII Filter Incorrectly Masks the Word 'individual' as Sensitive Data #818
Comments
Hi @Pouyanpi
Hi @mohilmakwana3107, sorry for getting back to you so late. I can confirm this issue. Note that this behavior is expected in general. So basically, Presidio tags the word … So we should set a proper threshold in the … I'll open a PR to fix this bug soon.
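The fix described in the comment above comes down to a confidence threshold. A detector such as Presidio attaches a score to every hit, and a hit whose score falls below a chosen cutoff should be left alone instead of masked. The following is a minimal plain-Python sketch of that idea, not the NeMo-Guardrails or Presidio implementation. `Detection`, `mask_detections`, the entity labels, and the scores are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector hit: entity label, character span, confidence score."""
    entity_type: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    score: float


def mask_detections(text: str, detections: list[Detection],
                    score_threshold: float = 0.0, mask_char: str = "X") -> str:
    """Replace each character of every detection scoring >= score_threshold with mask_char."""
    chars = list(text)
    for d in detections:
        if d.score < score_threshold:
            continue  # low-confidence hit: keep the original text
        for i in range(d.start, d.end):
            chars[i] = mask_char
    return "".join(chars)


if __name__ == "__main__":
    text = "The individual named John Smith lives here."
    # Invented hits: a weak false positive on "individual" and a strong PERSON hit.
    dets = [
        Detection("HYPOTHETICAL_ENTITY", 4, 14, 0.35),
        Detection("PERSON", 21, 31, 0.85),
    ]
    print(mask_detections(text, dets))
    # -> The XXXXXXXXXX named XXXXXXXXXX lives here.
    print(mask_detections(text, dets, score_threshold=0.5))
    # -> The individual named XXXXXXXXXX lives here.
```

With no threshold, every hit is masked, which reproduces the symptom in this issue. Raising the cutoff drops the weak hit while still masking the confident one. Presidio's `AnalyzerEngine.analyze` accepts a `score_threshold` argument for this purpose. How the threshold is exposed in the NeMo-Guardrails config is determined by the PR, not by this sketch.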
@mohilmakwana3107, a draft PR is available. Feel free to test it on your side. ✏ Anyone willing to contribute can take a look at the draft PR and continue from there.
Thank you, @Pouyanpi, for jumping on this issue and providing a fix so quickly! I really appreciate your help and the detailed explanation. I'll check out the draft PR on my end. Thanks again for your support!
@mohilmakwana3107, did you find time to verify the fix? If you or anyone else is interested in verifying it, here is how to test the PR locally:

@Pouyanpi
Did you check docs and existing issues?
Python version (python --version)
Python 3.11.0
Operating system/version
Ubuntu 20.04.6 LTS
NeMo-Guardrails version (if you must use a specific version and not the latest)
0.10.1
Describe the bug
I am currently testing the PII filter functionality for a project. Initially, everything was working fine. However, I recently noticed that the PII filter masks the word "individual" with `X` (my code is configured to mask sensitive data with the `X` character). I reviewed the logs, but I couldn't find any information explaining why "individual" is being treated as sensitive data.
YAML Configuration
Below is the YAML configuration I'm using:
I also checked the RAG output, but there doesn't seem to be any issue there.
Version Specifications
Let me know if you need further details or clarification!
Steps To Reproduce
1. YAML Configuration: Use the attached YAML config for PII filtering.
2. PDF Creation: Create a PDF containing sample PII data and ingest it into PGVector:
3. Run Test: Use the PII filter and observe that the word "individual" is incorrectly masked with `X`.
4. RAG Setup:

Note: The PII data is fictional and was generated using an LLM.
Expected Behavior
The word "individual" should not be flagged or masked as sensitive data.
RAG response:
Logs:
Actual Behavior
Answer from NeMo-Guardrails: