Use memory address instead of counter for hashes of some mutable types #2934
This is a first draft addressing the points in #2591 (comment). I've started with a relatively conservative approach:
- Add `hashFromAddress`, which takes an `Expr` and returns a hash code based on its address in memory. Because the addresses are all multiples of powers of 2, they make awful hash codes on their own, with lots of collisions, so we use Fibonacci hashing to spread them out.
- Every `Expr` type that previously used `nextHash` to determine its hash code, but is not one of the input types for `youngest` or `serialNumber`, now uses `hashFromAddress`.
Before
After
Questions
- `youngest` says it only accepts mutable hash tables, but in reality it also accepts files, compiled function closures, and symbols. Are these necessary? Should we move these over to `hashFromAddress` as well?
- Should we restrict `nextHash` to just `Type` objects, or keep it for all mutable hash tables?
- The `serialNumber` function, which also uses hash codes for some types to determine the age of an object, is only used by the `Serialization` package, which is "still experimental and preliminary". Do we need this? Or could we move these over to `hashFromAddress`, too?
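For anyone unfamiliar with the Fibonacci hashing step that `hashFromAddress` relies on, here is a rough standalone sketch in C. It is not the code from this PR: the names `hash_from_address`, `raw_bucket`, and `fib_bucket` are hypothetical, and the 64-bit multiplier and 32-bit result width are assumptions chosen for illustration.

```c
#include <stdint.h>

/* 2^64 divided by the golden ratio, rounded to an odd integer.
   This is the usual 64-bit Fibonacci hashing multiplier (an assumption
   here; the PR may use a different width or constant). */
#define FIB_MULT_64 UINT64_C(0x9E3779B97F4A7C15)

/* Hypothetical stand-in for hashFromAddress: multiply the address by the
   constant (wrapping mod 2^64) and keep the high 32 bits, which are the
   best-mixed ones. */
static uint32_t hash_from_address(const void *p)
{
    uint64_t addr = (uint64_t)(uintptr_t)p;
    return (uint32_t)((addr * FIB_MULT_64) >> 32);
}

/* Naive bucket choice: the low `bits` bits of the raw address.
   Aligned allocations share these bits, so they collide.
   Requires 1 <= bits <= 32. */
static uint32_t raw_bucket(const void *p, unsigned bits)
{
    uint64_t addr = (uint64_t)(uintptr_t)p;
    return (uint32_t)(addr & ((UINT64_C(1) << bits) - 1));
}

/* Bucket choice after Fibonacci hashing: the top `bits` bits of the hash.
   Requires 1 <= bits <= 32. */
static uint32_t fib_bucket(const void *p, unsigned bits)
{
    return hash_from_address(p) >> (32 - bits);
}
```

With 16-byte-aligned addresses, `raw_bucket(p, 4)` is 0 for every pointer, whereas `fib_bucket(p, 4)` differs between neighbouring allocations. The spread is not perfect (distinct addresses can still share a bucket), but it avoids the systematic collisions that the alignment causes.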