Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rcpp sugar indexHash (unique, sort_unique, etc.) distinguish signed and unsigned zeros #1340

Open
SebKrantz opened this issue Oct 31, 2024 · 2 comments

Comments

@SebKrantz
Copy link

Hi Dirk, sorry if this is not exactly as much effort as you expect for an issue, I just wanted to flag something reported to collapse (#648), which is present in both my hash functions written in C and your hash functions, and that is the following:

library(collapse)
#> Warning: package 'collapse' was built under R version 4.3.3
#> collapse 2.0.17, see ?`collapse-package` or ?`collapse-documentation`

x = round(rnorm(100))
unique(x)               # R
#> [1]  1  0  2 -1 -2
funique(x)              # My hash function in C
#> [1]  1  0  0  2 -1 -2
funique(x, sort = TRUE) # Rcpp::sugar::sort_unique()
#> [1] -2 -1  0  0  1  2
# More explicit proof
collapse:::sortuniqueCpp(x)
#> [1] -2 -1  0  0  1  2

# The solution
y = x + 0L

funique(y)              
#> [1]  1  0  2 -1 -2
collapse:::sortuniqueCpp(y)
#> [1] -2 -1  0  1  2

Created on 2024-10-31 with reprex v2.0.2

In words: R functions like round() create signed and unsigned zeros, whose hashes differ. A quite efficient remedy is to add an integer zero (gives like a 3% slower execution on my very efficient C hash). I'm considering to roll this out, but of course cannot control your code. So just pushing it to you as food for thought.

@eddelbuettel
Copy link
Member

eddelbuettel commented Oct 31, 2024

Thanks for the note. We will ponder this, Making a change may risk upsetting existing code, but documenting your workaround of adding an explicit zero seems like a good idea.

@SebKrantz
Copy link
Author

I'd be surprised if adding a zero breaks any code - except for this particular issue which is clearly not meant to be as far as R is concerned. But yeah, haven't checked yet what base R does to avoid sign bits with zeros - probably something smarter but slower...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants