Update hashFromAddress function to return larger hashes #3429
Conversation
I'm a little confused. If I open up M2 v1.24.05, I get higher hash numbers:

    i1 : hash(() -> null)

    o1 = 1761012022

    i2 : hash(() -> null)

    o2 = 1760525980

Did we change this recently, and are we now changing it back? To be clear, I see #3335, but I don't understand how that change limited hashes to 512.
Oh, I see, never mind, with p=9 you changed
    -- so we swap them. We use p = 10 here since #buckets Matrix = 2^10.
    export hashFromAddress(e:Expr):hash_t := (
        x := Ccode(hash_t, "11400714819323198485ull * (unsigned long)", e);
        Ccode(hash_t, "(", x, " >> 54) | (", x, " << 10)"));
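To see what the Ccode lines above compute, here is a small Python model of the unsigned 64-bit C arithmetic (the constants come from the snippet itself; everything else is illustrative):

```python
MASK = (1 << 64) - 1          # model unsigned 64-bit wraparound
K = 11400714819323198485      # the multiplier from the Ccode above (~ 2^64 / golden ratio)

def hash_from_address(addr: int) -> int:
    """Model of hashFromAddress: Fibonacci-multiply, then rotate left by 10."""
    x = (addr * K) & MASK
    # (x >> 54) | (x << 10) is a left rotation by 10 bits: the 10 most
    # significant bits of x become the 10 least significant bits of the hash
    return ((x >> 54) | (x << 10)) & MASK
```

Because the rotation puts the top 10 bits of the multiplied value at the bottom, reducing the hash modulo a 2^10-bucket table recovers exactly the classic Fibonacci-hashing bucket index.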
The comment talks about p, but the code never uses p. Maybe keeping the two functions separate would make this clear?
The last sentence of the comment "assigns" a specific value to p.

I figured removing the fibonacciHashing function might make sense for several reasons: we only called it in one place, we only ever used one specific value for p, and the name was no longer quite accurate, since we're now doing something slightly different from "Fibonacci hashing" by hanging on to the least significant bits.
I'm a little confused about this statement. If in practice we use the p least significant bits of the hash for assigning buckets in hash tables and such, then isn't it misleading to show huge hash numbers to us? In other words, if two objects are going to end up in the same bucket, I want to see that their hashes are the same, because I would never guess that they actually have the same p least significant bits.
It wouldn't affect the hash size, but we'd end up using something other than the Fibonacci hashing function since we wouldn't be using the most significant bits after our multiplication any more.
Hash codes serve at least two purposes -- assigning keys to buckets in hash tables and keeping track of the age of types. I agree that using these huge 64-bit integers can be misleading for the first purpose, since we end up modding out by some much smaller power of 2, but it's definitely useful for the second purpose, since folks were starting to hit the upper bound of 32-bit hash codes. Also, even with small hash codes, it's hard to tell if two objects will end up in the same bucket unless we know how many buckets we have. For example, 1 and 5 get sorted into the same bucket if our hash table is small, but into different buckets if it gets bigger:

    i2 : buckets set {1, 5}

    o2 = {{}, {(5, 1), (1, 1)}, {}, {}}

    o2 : List

    i3 : buckets set {1, 2, 3, 4, 5}

    o3 = {{}, {(1, 1)}, {(2, 1)}, {(3, 1)}, {(4, 1)}, {(5, 1)}, {}, {}}

    o3 : List
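The point of the session above can be sketched abstractly: whether two keys collide depends on the current bucket count, not on the hashes alone. A minimal Python model (the power-of-two bucket counts mirror the M2 example; the function name is illustrative):

```python
def bucket_index(h: int, nbuckets: int) -> int:
    """With a power-of-two bucket count, reducing the hash modulo
    nbuckets keeps only its low bits."""
    return h % nbuckets

# 1 and 5 share a bucket in a 4-bucket table (both are 1 mod 4),
# but land in different buckets once the table grows to 8 buckets.
assert bucket_index(1, 4) == bucket_index(5, 4)
assert bucket_index(1, 8) != bucket_index(5, 8)
```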
Thanks for the explanation!
Currently, function closures have relatively small hashes:

This is because, by construction, they will always be less than 2^9 = 512. But this leads to occasional build failures:

On average, we'd expect this build failure to happen 1 out of every 512 builds. (Note that res and syz actually have slightly larger hashes -- they're SpecialExpr objects and not FunctionClosure objects, so we take a linear combination of the hashes of the underlying function closures and the MethodFunctionWithOptions type.)

Currently, what we do is apply Fibonacci hashing to the address of the function in memory (basically, divide it by the golden ratio) and take only the 9 most significant bits.
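The current scheme described above can be modeled in a few lines of Python (the multiplier is the one used in the code; p = 9 as stated, everything else is illustrative):

```python
MASK = (1 << 64) - 1        # unsigned 64-bit wraparound
K = 11400714819323198485    # ~ 2^64 / golden ratio

def fibonacci_hash(addr: int, p: int = 9) -> int:
    """Fibonacci-hash an address and keep only the p most significant
    bits, so the result always lies in [0, 2^p)."""
    x = (addr * K) & MASK
    return x >> (64 - p)
```

With p = 9 every hash is below 512, which is exactly why distinct closures collide so often.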
The proposal is to move these most significant bits (switching from 9 to 10, since there's at least one hash table -- the type Matrix -- with 2^10 buckets) to the end of the hash and the remaining 54 least significant bits to the beginning. This way, the chances of two functions having the same hash will be vanishingly small, but we still take advantage of Fibonacci hashing when it's time to use the hash to stuff things in a hash table.

So afterwards:
We also combine the corresponding code into a single function and improve the accompanying comment.
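The compatibility claim above -- that the rotated hash still behaves like Fibonacci hashing once we mod out by the bucket count -- can be checked with a small Python model (function names are illustrative; the constants are from the change itself):

```python
MASK = (1 << 64) - 1
K = 11400714819323198485

def old_hash(addr: int, p: int = 10) -> int:
    # previous scheme: keep only the p most significant bits
    return ((addr * K) & MASK) >> (64 - p)

def new_hash(addr: int) -> int:
    # proposed scheme: rotate the full 64-bit product left by 10
    x = (addr * K) & MASK
    return ((x >> 54) | (x << 10)) & MASK

# the low 10 bits of the new hash are exactly the old p = 10 hash,
# so a 2^10-bucket table sees no difference
for addr in (0x7f1234567890, 42, 2**48 + 17):
    assert new_hash(addr) % (1 << 10) == old_hash(addr, p=10)
```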