Update hashFromAddress function to return larger hashes #3429
Conversation
I'm a little confused. If I open up M2 v1.24.05, I get higher hash numbers:

    i1 : hash(() -> null)

    o1 = 1761012022

    i2 : hash(() -> null)

    o2 = 1760525980

Did we change this recently, and are we now changing it back? To be clear, I see #3335, but I don't understand how that change limited hashes to 512.
Oh, I see, never mind, with p=9 you changed
    -- so we swap them. We use p = 10 here since #buckets Matrix = 2^10.
    export hashFromAddress(e:Expr):hash_t := (
        x := Ccode(hash_t, "11400714819323198485ull * (unsigned long)", e);
        Ccode(hash_t, "(", x, " >> 54) | (", x, " << 10)"));
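To see what the Ccode lines above compute, here is a small Python model of the unsigned 64-bit C arithmetic (the constants come from the snippet itself; everything else is illustrative):

```python
MASK = (1 << 64) - 1          # model unsigned 64-bit wraparound
K = 11400714819323198485      # the multiplier from the Ccode above (~ 2^64 / golden ratio)

def hash_from_address(addr: int) -> int:
    """Model of hashFromAddress: Fibonacci-multiply, then rotate left by 10."""
    x = (addr * K) & MASK
    # (x >> 54) | (x << 10) is a left rotation by 10 bits: the 10 most
    # significant bits of x become the 10 least significant bits of the hash
    return ((x >> 54) | (x << 10)) & MASK
```

Because the rotation puts the top 10 bits of the multiplied value at the bottom, reducing the hash modulo a 2^10-bucket table recovers exactly the classic Fibonacci-hashing bucket index.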
The comment talks about p, but the code never uses p. Maybe keeping the two functions separate would make this clear?
The last sentence of the comment "assigns" a specific value to p.

I figured removing the fibonacciHashing function might make sense for several reasons: we only called it in one place, we only ever used one specific value for p, and the name was no longer quite accurate, since we're now doing something slightly different from "Fibonacci hashing" by hanging on to the least significant bits.
I'm a little confused about this statement. If in practice we use the p least significant bits of the hash for assigning buckets in hash tables and such, then isn't it misleading to show huge hash numbers to us? In other words, if two objects are going to end up in the same bucket, I want to see that their hashes are the same, because I would never guess that they actually have the same p least significant bits.
It wouldn't affect the hash size, but we'd end up using something other than the Fibonacci hashing function since we wouldn't be using the most significant bits after our multiplication any more.
Hash codes serve at least two purposes -- assigning keys to buckets in hash tables and keeping track of the age of types. I agree that using these huge 64-bit integers can be misleading for the first purpose, since we end up modding out by some much smaller power of 2, but it's definitely useful for the second purpose, since folks were starting to hit the upper bound of 32-bit hash codes. Also, even with small hash codes, it's hard to tell if two objects will end up in the same bucket unless we know how many buckets we have. For example, 1 and 5 get sorted into the same bucket if our hash table is small, but into different buckets if it gets bigger:

    i2 : buckets set {1, 5}

    o2 = {{}, {(5, 1), (1, 1)}, {}, {}}

    o2 : List

    i3 : buckets set {1, 2, 3, 4, 5}

    o3 = {{}, {(1, 1)}, {(2, 1)}, {(3, 1)}, {(4, 1)}, {(5, 1)}, {}, {}}

    o3 : List
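The point of the session above can be sketched abstractly: whether two keys collide depends on the current bucket count, not on the hashes alone. A minimal Python model (the power-of-two bucket counts mirror the M2 example; the function name is illustrative):

```python
def bucket_index(h: int, nbuckets: int) -> int:
    """With a power-of-two bucket count, reducing the hash modulo
    nbuckets keeps only its low bits."""
    return h % nbuckets

# 1 and 5 share a bucket in a 4-bucket table (both are 1 mod 4),
# but land in different buckets once the table grows to 8 buckets.
assert bucket_index(1, 4) == bucket_index(5, 4)
assert bucket_index(1, 8) != bucket_index(5, 8)
```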
Thanks for the explanation!
Currently, function closures have relatively small hashes:

This is because, by construction, they will always be less than 2^9 = 512. But this leads to occasional build failures:

On average, we'd expect this build failure to happen 1 out of every 512 builds. (Note that res and syz actually have slightly larger hashes -- they're SpecialExpr objects and not FunctionClosure objects, so we take a linear combination of the hashes of the underlying function closures and the MethodFunctionWithOptions type.)

Currently, what we do is apply Fibonacci hashing to the address of the function in memory (basically, divide it by the golden ratio) and take only the 9 most significant bits.
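The current scheme described above can be modeled in a few lines of Python (the multiplier is the one used in the code; p = 9 as stated, everything else is illustrative):

```python
MASK = (1 << 64) - 1        # unsigned 64-bit wraparound
K = 11400714819323198485    # ~ 2^64 / golden ratio

def fibonacci_hash(addr: int, p: int = 9) -> int:
    """Fibonacci-hash an address and keep only the p most significant
    bits, so the result always lies in [0, 2^p)."""
    x = (addr * K) & MASK
    return x >> (64 - p)
```

With p = 9 every hash is below 512, which is exactly why distinct closures collide so often.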
The proposal is to move these most significant bits (switching from 9 to 10, since there's at least one hash table -- the type Matrix -- with 2^10 buckets) to the end of the hash and the remaining 54 least significant bits to the beginning. This way, the chances of two functions having the same hash will be vanishingly small, but we still take advantage of Fibonacci hashing when it's time to use the hash to stuff things in a hash table.

So afterwards:
We also combine the corresponding code into a single function and improve the accompanying comment.
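The compatibility claim above -- that the rotated hash still behaves like Fibonacci hashing once we mod out by the bucket count -- can be checked with a small Python model (function names are illustrative; the constants are from the change itself):

```python
MASK = (1 << 64) - 1
K = 11400714819323198485

def old_hash(addr: int, p: int = 10) -> int:
    # previous scheme: keep only the p most significant bits
    return ((addr * K) & MASK) >> (64 - p)

def new_hash(addr: int) -> int:
    # proposed scheme: rotate the full 64-bit product left by 10
    x = (addr * K) & MASK
    return ((x >> 54) | (x << 10)) & MASK

# the low 10 bits of the new hash are exactly the old p = 10 hash,
# so a 2^10-bucket table sees no difference
for addr in (0x7f1234567890, 42, 2**48 + 17):
    assert new_hash(addr) % (1 << 10) == old_hash(addr, p=10)
```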