Triplicates can be avoided by using special characters #24

Open
itsrachelfish opened this issue Nov 3, 2015 · 4 comments

@itsrachelfish
Member

By including special characters (color codes, bold, etc.) or only making small changes like adding a space or exclamation mark, it is possible to bypass fishy's triplicate detection.

Fishy should strip special characters from messages and do a text comparison of the most recent lines to make sure they don't have repeating sections. For example, the following messages should trigger triplicate detection even though they aren't exact matches:

hex: IS IT TRUE THAT YOU LOVE BUTTS?
weazzy: IS IT TRUE THAT YOU LOVE BUTTS?
rachel: IS IT TRUE THAT YOU LOVE BUTTS?
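
A minimal sketch of the "strip special characters, then compare" idea, assuming mIRC-style control codes for color and bold. The names `FORMATTING`, `normalize`, and `isTriplicate` are hypothetical and are not taken from fishy's code; the ASCII-only cleanup is a simplification.

```typescript
// Hypothetical helpers for illustration; not fishy's actual implementation.

// mIRC-style formatting: \x03 followed by optional fg[,bg] color digits,
// plus bold (\x02), reset (\x0F), reverse (\x16), italic (\x1D), underline (\x1F).
const FORMATTING = /\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0F\x16\x1D\x1F]/g;

// Reduce a message to lowercase ASCII words so that formatting, punctuation,
// and extra whitespace no longer affect the comparison.
// Assumption: non-ASCII letters are dropped, which is a simplification.
function normalize(message: string): string {
  return message
    .replace(FORMATTING, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .trim();
}

// True when the three most recent messages are identical after normalization.
function isTriplicate(recent: string[]): boolean {
  if (recent.length < 3) return false;
  const last = recent.slice(-3).map(normalize);
  return last[0] !== "" && last.every((m) => m === last[0]);
}
```

With these helpers, a colored copy, a bold copy, and a copy with extra exclamation marks of the same line would all normalize to the same string and count as a triplicate.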

tmick0 commented Nov 3, 2015

this sounds like a job for FUZZY HASHING


ghost commented Nov 3, 2015

Nah, easier solved with "stripColorsAndStyle" from https://github.com/fent/irc-colors.js

@itsrachelfish
Member Author

@edwin-pers Your "stripColorsAndStyle" solution would not solve the example given.

@le1ca Thank you for the tip. I found a fuzzy hashing lib for node.js: https://github.com/huwenshuo/ctph.js
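
The API of the linked library isn't shown in this thread, so the sketch below does not use it. As a simpler stand-in for fuzzy hashing, it scores two lines with a Dice coefficient over character bigrams, which tolerates small edits such as an added space or exclamation mark. The names `bigrams`, `similarity`, and `looksRepeated`, and the `0.8` threshold, are assumptions for illustration only.

```typescript
// Stand-in similarity measure (Dice coefficient on character bigrams).
// This is NOT context-triggered piecewise hashing; it only illustrates
// near-duplicate matching.

// Count each two-character window in the text.
function bigrams(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (let i = 0; i < text.length - 1; i++) {
    const gram = text.slice(i, i + 2);
    counts.set(gram, (counts.get(gram) ?? 0) + 1);
  }
  return counts;
}

// Returns a score between 0 (nothing shared) and 1 (identical).
function similarity(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0;
  const left = bigrams(a);
  const right = bigrams(b);
  let overlap = 0;
  left.forEach((count, gram) => {
    overlap += Math.min(count, right.get(gram) ?? 0);
  });
  return (2 * overlap) / (a.length - 1 + (b.length - 1));
}

// True when each of the last three lines is close to the one before it.
// The threshold is a hypothetical value and would need tuning.
function looksRepeated(recent: string[], threshold = 0.8): boolean {
  if (recent.length < 3) return false;
  let previous: string | null = null;
  for (const line of recent.slice(-3)) {
    if (previous !== null && similarity(previous, line) < threshold) return false;
    previous = line;
  }
  return true;
}
```

A score like this would most likely run on text that has already had formatting codes removed, so that color and bold codes don't lower the match.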


tmick0 commented Nov 3, 2015

👍
