
Update hashFromAddress function to return larger hashes #3429

Merged 1 commit into Macaulay2:development on Aug 25, 2024

Conversation

d-torrance
Member

@d-torrance d-torrance commented Aug 23, 2024

Currently, function closures have relatively small hashes:

i1 : hash(() -> null)

o1 = 402

i2 : hash(() -> null)

o2 = 251

This is because, by construction, they will always be less than 2^9 = 512. But this leads to occasional build failures:

ii17 : assert ( hash res =!= hash syz )
testing.m2:24:6:(3):[5]: error: assertion failed

On average, we'd expect this build failure to happen 1 out of every 512 builds. (Note that res and syz actually have slightly larger hashes -- they're SpecialExpr objects and not FunctionClosure objects, so we take a linear combination of the hashes of the underlying function closures and the MethodFunctionWithOptions type.)

Currently, what we do is apply Fibonacci hashing to the address of the function in memory (basically, divide it by the golden ratio) and take only the 9 most significant bits.

The proposal is to move these most significant bits (switching from 9 to 10, since there's at least one hash table -- the type Matrix -- with 2^10 buckets) to the end of the hash and the remaining 54 least significant bits to the beginning. This way, the chances of two functions having the same hash will be vanishingly small, but we still take advantage of Fibonacci hashing when it's time to use the hash to stuff things in a hash table.
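The proposed transformation can be sketched as follows (a Python model, not the actual Macaulay2 source, which is written in its internal D dialect; the constant 11400714819323198485 is floor(2^64/φ), the standard Fibonacci hashing multiplier):

```python
# Sketch of the proposed hashFromAddress on a 64-bit system (Python model,
# not the actual Macaulay2 source).
MASK64 = (1 << 64) - 1
FIB = 11400714819323198485  # floor(2^64 / golden ratio), an odd number

def hash_from_address(addr: int, p: int = 10) -> int:
    """Multiply the address by the Fibonacci constant, then rotate left
    by p bits so that the p most significant (well-mixed) bits of the
    product become the p least significant bits of the hash, where a
    hash table with 2^p buckets will read them."""
    x = (addr * FIB) & MASK64
    return ((x >> (64 - p)) | (x << p)) & MASK64
```

Because multiplication by an odd constant mod 2^64 is invertible and a rotation is a permutation of the bits, the whole map is a bijection: distinct addresses always get distinct 64-bit hashes, while reducing the hash mod 2^p still recovers the old Fibonacci-hashed bucket index.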

So afterwards:

i1 : hash(() -> null)

o1 = 9799832789158200301

i2 : hash(() -> null)

o2 = 14699749183737299083

We also combine the corresponding code into a single function and improve the accompanying comment.

@d-torrance d-torrance requested a review from mahrud August 23, 2024 16:20
@mahrud
Member

mahrud commented Aug 25, 2024

I'm a little confused. If I open up M2 v1.24.05, I get larger hash values:

i1 : hash(() -> null)

o1 = 1761012022

i2 : hash(() -> null)

o2 = 1760525980

Did we change this recently, and are we now changing it back?

To be clear, I see #3335, but I don't understand how that change limited hashes to 512.

@mahrud
Member

mahrud commented Aug 25, 2024

Oh I see, never mind: with p = 9, the shift >> (32 - p) = 23 changed to >> (64 - p) = 55, so on 64-bit systems this leaves just the rightmost 9 bits non-zero. Why not just change 64 back to 32? That doesn't affect the hash size, does it?

-- so we swap them. We use p = 10 here since #buckets Matrix = 2^10.
export hashFromAddress(e:Expr):hash_t := (
x := Ccode(hash_t, "11400714819323198485ull * (unsigned long)", e);
Ccode(hash_t, "(", x, " >> 54) | (", x, " << 10)"));
Member


The comment talks about p, but the code never uses p. Maybe keeping the two functions separate would make this clearer?

Member Author


The last sentence of the comment "assigns" a specific value to p.

I figured removing the fibonacciHashing function might make sense for several reasons: we only called it in one place, we only ever used one specific value for p, and the name was no longer quite accurate, since we're now doing something slightly different from "Fibonacci hashing" by hanging on to the least significant bits.

@mahrud
Member

mahrud commented Aug 25, 2024

This way, the chances of two functions having the same hash will be vanishingly small, but we still take advantage of Fibonacci hashing when it's time to use the hash to stuff things in a hash table.

I'm a little confused about this statement. If in practice we use the p least significant bits of the hash for assigning buckets in hash tables and such, then isn't it misleading to show huge hash numbers to us? In other words, if two objects are going to end up in the same bucket I want to see that their hash is the same, because I would never guess that actually they have the same p least significant bits.

@d-torrance
Member Author

Why not just change 64 back to 32? That doesn't affect the hash size, does it?

It wouldn't affect the hash size, but we'd end up using something other than the Fibonacci hashing function since we wouldn't be using the most significant bits after our multiplication any more.
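The point can be illustrated with a small sketch (Python, with made-up 16-byte-aligned addresses, not actual closure addresses): after multiplying by floor(2^64/φ), it is the most significant bits of the 64-bit product that carry the golden-ratio mixing, which is why the bucket bits are taken relative to 64.

```python
# Hypothetical illustration: Fibonacci hashing's guarantee lives in the
# *top* bits of the 64-bit product, so the shift is relative to 64.
FIB = 11400714819323198485  # floor(2^64 / phi)
MASK64 = (1 << 64) - 1

def bucket(addr: int, p: int = 9) -> int:
    # the p most significant bits of the 64-bit product,
    # as in the pre-existing code with >> (64 - p)
    return ((addr * FIB) & MASK64) >> (64 - p)

# 100 consecutive 16-byte-aligned addresses (roughly what back-to-back
# allocated closures might look like) land in 100 distinct buckets out
# of 2^9 = 512, even though the inputs differ only in a few low bits.
addrs = [0x1000 + 16 * i for i in range(100)]
```

Shifting relative to 32 instead would read a run of middle bits of the product, discarding exactly the bits that Fibonacci hashing mixes best.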

If in practice we use the p least significant bits of the hash for assigning buckets in hash tables and such, then isn't it misleading to show huge hash numbers to us? In other words, if two objects are going to end up in the same bucket I want to see that their hash is the same, because I would never guess that actually they have the same p least significant bits.

Hash codes serve at least two purposes -- assigning keys to buckets in hash tables and keeping track of the age of types. I agree that using these huge 64-bit integers can be misleading for the first purpose since we end up modding out by some much smaller power of 2, but it's definitely useful for the second purpose since folks were starting to hit the upper bound of 32-bit hash codes.

Also, even with small hash codes, it's hard to tell whether two objects will end up in the same bucket unless we know how many buckets we have. For example, 1 and 5 get sorted into the same bucket if our hash table is small, but different buckets if it gets bigger:

i2 : buckets set {1, 5}

o2 = {{}, {(5, 1), (1, 1)}, {}, {}}

o2 : List

i3 : buckets set {1, 2, 3, 4, 5}

o3 = {{}, {(1, 1)}, {(2, 1)}, {(3, 1)}, {(4, 1)}, {(5, 1)}, {}, {}}

o3 : List

Member

@mahrud mahrud left a comment


Thanks for the explanation!

@mahrud mahrud merged commit 5e304c5 into Macaulay2:development Aug 25, 2024
5 checks passed
@d-torrance d-torrance deleted the fibonacci-hashing branch September 3, 2024 13:03