Hashing Emojis in strings is not consistent #11
Comments
Ouch. Thanks for reporting this. We have only tested some Unicode symbols here: https://github.com/replikativ/hasch/blob/master/test/hasch/test.cljc#L35 It seems this was not sufficient. Do you have some experience with Unicode?
Unfortunately not. An internal user of mine just started using emoticons. I use hasch to validate things cross-platform, so I was considering disabling this specific validation for now. But since you are actively maintaining this, I wouldn't mind looking into Unicode and making a pull request in the next few days if you want.
That would be very good. Let me know if you get stuck and I will try to help. This bug is bad.
What is your setup for running and testing this project? So far, I have been getting stuck with the error In the meantime, I started taking snippets of the code to test against. Getting the byte values, things are consistent between the two so far. However, the 'utf8' step in
Maybe my bytes-to-int function is off; I am just using it to see the results at the moment. I will try some different ways to visualize the bytes to narrow it down. Then maybe I will look into the Java implementation of .getBytes.
Alright... I think the problem is that cljs I think either the .cljs version of 'utf8' needs to go into a signed int array instead of unsigned, or we need to coerce the results of
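The signed versus unsigned point can be illustrated independently of hasch. The following is a minimal Python sketch, not hasch's code: the helper names `to_signed` and `to_unsigned` are hypothetical, and the only assumptions are that a JVM `byte` is signed (-128..127) while an unsigned byte array on the JavaScript side holds 0..255. The same UTF-8 bytes then show up as different integers depending on which view is used.

```python
def to_signed(byte_values):
    """Reinterpret unsigned byte values (0..255) as signed bytes (-128..127)."""
    return [b - 256 if b > 127 else b for b in byte_values]


def to_unsigned(byte_values):
    """Reinterpret signed bytes (-128..127) as unsigned values (0..255)."""
    return [b & 0xFF for b in byte_values]


# UTF-8 bytes of the thumbs-up emoji (U+1F44D), as unsigned values.
unsigned_view = list("👍".encode("utf-8"))   # [240, 159, 145, 141]

# The same four bytes as a signed-byte platform would report them.
signed_view = to_signed(unsigned_view)        # [-16, -97, -111, -115]

# Masking with 0xFF recovers the unsigned view, so the underlying bits agree.
round_trip = to_unsigned(signed_view)

print(unsigned_view, signed_view, round_trip == unsigned_view)
```

If integers from the two views are fed into later arithmetic without first being normalized to one representation, the results can diverge even though the underlying bytes are identical.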
Hey, sorry for taking so long, I had to work through some other issues first. I have updated the Could you open a separate branch against
@Erdromian Did that fix your issue with the piggieback middleware?
No, I basically just stopped using it in my project.
Ok, no problem. Thanks for reporting back. How did you make things consistent in your project? Did you manage to make JavaScript's and Java's UTF8 representation match?
Just ran into this. I think the JS algorithm isn't entirely correct: it doesn't handle UTF-16 surrogate pairs. I was able to get around this by using the Google Closure fn that seems to do the right thing:
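Independently of the Closure function mentioned above (not shown here), the surrogate-pair problem itself can be sketched in a few lines. This is an illustrative Python model under stated assumptions, not hasch's actual JS code: `utf16_code_units`, `naive_utf8`, and `combine_surrogates` are hypothetical helpers, and `naive_utf8` assumes an encoder that converts each 16-bit code unit on its own, which is one way such a mismatch could arise.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text, as a JS string would expose them."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def encode_unit(unit):
    """Encode one 16-bit value with the 1 to 3 byte UTF-8 patterns, ignoring surrogates."""
    if unit < 0x80:
        return [unit]
    if unit < 0x800:
        return [0xC0 | (unit >> 6), 0x80 | (unit & 0x3F)]
    return [0xE0 | (unit >> 12), 0x80 | ((unit >> 6) & 0x3F), 0x80 | (unit & 0x3F)]


def naive_utf8(text):
    """Hypothetical faulty encoder: treats every code unit as its own code point."""
    out = []
    for unit in utf16_code_units(text):
        out.extend(encode_unit(unit))
    return out


def combine_surrogates(high, low):
    """Merge a high/low surrogate pair into the code point it represents."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


emoji = "👍"  # U+1F44D, outside the Basic Multilingual Plane
units = utf16_code_units(emoji)         # [0xD83D, 0xDC4D]
correct = list(emoji.encode("utf-8"))   # [0xF0, 0x9F, 0x91, 0x8D], 4 bytes
faulty = naive_utf8(emoji)              # [0xED, 0xA0, 0xBD, 0xED, 0xB1, 0x8D], 6 bytes

print([hex(u) for u in units], hex(combine_surrogates(*units)))
print([hex(b) for b in correct], [hex(b) for b in faulty])
```

For text inside the Basic Multilingual Plane, both approaches produce the same bytes, which would explain why tests covering only such symbols can pass while emoji fail.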
This is great news! Thanks so much for coming back here. I can incorporate it as soon as possible, but if you feel like opening a PR and becoming a contributor, I am also happy to review that.
Running into an issue where strings don't hash the same cross-platform when emojis are present. I assume this extends to more Unicode as well.
(is (= "1a934e63-1467-50af-b644-a35343cb16b9" (str (hasch/uuid "👍 👍 👍")))) ; in Clojure
(is (= "28fe1faf-09a0-5def-a1e6-78326e03882b" (str (hasch/uuid "👍 👍 👍")))) ; in ClojureScript