added byte shuffle, CRC32c, carryless multiplication, and xor-rotate #13
base: master
451b7fd adds support for carryless multiplication. Compared to "regular" integer multiplication, carryless multiplication replaces the addition of all partial products with an exclusive-or operation. It is invertible as long as the constant multiplier is odd.
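A minimal portable sketch of the operation, assuming 32-bit words and a product truncated to the low 32 bits (polynomial multiplication modulo x^32 over GF(2)). The names `clmul32` and `clmul32_inverse` are hypothetical and not taken from the PR, which may rely on hardware instructions instead. The inverse routine shows why an odd constant suffices: because bit 0 of the constant is set, each bit of the inverse can be fixed in turn.

```c
#include <assert.h>
#include <stdint.h>

/* Carryless product of a and b truncated to 32 bits: each partial
   product (a shifted by the position of a set bit of b) is combined
   with XOR instead of addition, so no carries propagate. */
static uint32_t clmul32(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++)
        if ((b >> i) & 1u)
            r ^= a << i;
    return r;
}

/* Carryless inverse of an odd constant c modulo x^32, built bit by bit:
   setting bit i of inv XORs (c << i) into the product, which flips bit i
   (because bit 0 of c is 1) and leaves all lower bits untouched. */
static uint32_t clmul32_inverse(uint32_t c)
{
    assert(c & 1u); /* even constants have no carryless inverse */
    uint32_t inv = 0;
    for (int i = 0; i < 32; i++)
        if (((clmul32(c, inv) ^ 1u) >> i) & 1u)
            inv |= (uint32_t)1 << i;
    return inv;
}
```

For any odd `c`, `clmul32(clmul32(x, c), clmul32_inverse(c))` returns `x`, which is what makes this step usable in a reversible hash.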
0f70a23 adds support for xor-rotate. It is internally called … As this is the first operation to accept two operands, some additional but minor changes to the code were required (data struct members, parsing). At this stage, the pull request also fixes #14.
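The exact definition of the operation is not spelled out here. As a minimal sketch, assume it computes `x ^ rotl(x, a) ^ rotl(x, b)` on 32-bit words, with the two rotation amounts as its two operands; the names `rotl32`, `xorrot32` and `xorrot32_inverse` are hypothetical. A single `x ^ rotl(x, a)` would not be invertible (0 and 0xFFFFFFFF both map to 0), whereas the three-term form is invertible for any `a` and `b` on power-of-two word sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left; masking keeps both shift counts in range, so r == 0 is safe. */
static uint32_t rotl32(uint32_t x, unsigned r)
{
    r &= 31;
    return (x << r) | (x >> (-r & 31));
}

/* Hypothetical two-operand xor-rotate: x ^ rotl(x, a) ^ rotl(x, b). */
static uint32_t xorrot32(uint32_t x, unsigned a, unsigned b)
{
    return x ^ rotl32(x, a) ^ rotl32(x, b);
}

/* The map is GF(2)-linear and equals multiplication by 1 + t^a + t^b
   modulo t^32 - 1. Its 32nd power is the identity, so applying it
   31 more times undoes one application. */
static uint32_t xorrot32_inverse(uint32_t y, unsigned a, unsigned b)
{
    for (int i = 0; i < 31; i++)
        y = xorrot32(y, a, b);
    return y;
}
```

The 31-fold loop only demonstrates invertibility; it is far too slow for anything beyond testing.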
I might have a look at some point.
The last two commits add support for a CRC32c step if the hardware supports it. This works for 32-bit hashes only. Use …
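A portable sketch of a single CRC32c step on a 32-bit value, equivalent to what the SSE4.2 `crc32` instruction (exposed as `_mm_crc32_u32` in `<nmmintrin.h>`) computes: reflected Castagnoli polynomial, no pre- or post-inversion. How the PR seeds the step (initial CRC value) is not shown here; the inverse below assumes the hash step is `x -> crc32c_u32(0, x)`. All function names are hypothetical. The inverse demonstrates that the step is a bijection on 32-bit values.

```c
#include <assert.h>
#include <stdint.h>

#define CRC32C_POLY 0x82F63B78u /* reflected Castagnoli polynomial */

/* One CRC32c update over a 32-bit operand, as _mm_crc32_u32(crc, v). */
static uint32_t crc32c_u32(uint32_t crc, uint32_t v)
{
    crc ^= v;
    for (int i = 0; i < 32; i++)
        crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1u)));
    return crc;
}

/* The same update over a single byte, used to check the standard
   CRC-32C test vector and the equivalence with four byte steps. */
static uint32_t crc32c_u8(uint32_t crc, uint8_t v)
{
    crc ^= v;
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ (CRC32C_POLY & (0u - (crc & 1u)));
    return crc;
}

/* Inverse of x -> crc32c_u32(0, x). Each shift is undone by inspecting
   the top bit: it is set exactly when the polynomial was XORed in,
   because the polynomial's top bit is 1 and (crc >> 1) leaves it 0. */
static uint32_t crc32c_u32_inverse(uint32_t y)
{
    for (int i = 0; i < 32; i++)
        y = (y & 0x80000000u) ? ((y ^ CRC32C_POLY) << 1) | 1u : y << 1;
    return y;
}
```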
It would be interesting to check the results; maybe new good hash functions were found?
You can download the current code of this branch from GitHub's auto-archiver for testing. However,
So, I have put it into … Also, for verification purposes, I tried … I even conducted some trials using the very same constant at all three spots, leading to exactly the same bias. Since several constants no longer need to be loaded, this approach could slightly increase the hashing function's speed. So, … Furthermore, removing the final … So, a CRC32-MUL-CRC32 scheme looks quite promising, unless I am mistaken here, which would be extremely embarrassing... 😉 I will try to throw some more tests at this kind of hashing function and report back soon.
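A minimal sketch of the shape of a CRC32-MUL-CRC32 hash, assuming each CRC step is `crc32c(0, x)` on the full 32-bit value. `HYPOTHETICAL_C` is a placeholder odd constant, not the constant discussed in this thread, and all function names are hypothetical. The inverse shows that the whole scheme stays a bijection: each CRC step can be undone, and an odd multiplier has an inverse modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

#define CRC32C_POLY 0x82F63B78u
#define HYPOTHETICAL_C 0x9E3779B1u /* placeholder odd constant, not from the thread */

/* crc32c(0, x): the software equivalent of _mm_crc32_u32(0, x). */
static uint32_t crc32c_step(uint32_t x)
{
    for (int i = 0; i < 32; i++)
        x = (x >> 1) ^ (CRC32C_POLY & (0u - (x & 1u)));
    return x;
}

/* Undo crc32c_step by reversing its 32 shift steps. */
static uint32_t crc32c_unstep(uint32_t y)
{
    for (int i = 0; i < 32; i++)
        y = (y & 0x80000000u) ? ((y ^ CRC32C_POLY) << 1) | 1u : y << 1;
    return y;
}

/* Inverse of an odd c modulo 2^32 by Newton iteration: starting from
   c (correct to 3 bits), each round doubles the correct low bits. */
static uint32_t mul_inverse(uint32_t c)
{
    uint32_t inv = c;
    for (int i = 0; i < 4; i++)
        inv *= 2u - c * inv;
    return inv;
}

static uint32_t hash_crc_mul_crc(uint32_t x)
{
    x = crc32c_step(x);
    x *= HYPOTHETICAL_C;
    x = crc32c_step(x);
    return x;
}

static uint32_t unhash_crc_mul_crc(uint32_t x)
{
    x = crc32c_unstep(x);
    x *= mul_inverse(HYPOTHETICAL_C);
    x = crc32c_unstep(x);
    return x;
}
```

Note that with a zero initial CRC value, this sketch maps 0 to 0, since both the CRC step and the multiplication fix 0.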
I am excited to see that it passes SmallCrush (15/15) and does well on Crush (140/144). More to follow.
Another function worth adding: the upper 64 bits of a 64-bit × 64-bit → 128-bit multiplication.
On the one hand, the upper bits might not make a huge difference compared to the lower 64 bits as they are used now. The influence of the other bits is just reversed, i.e. the MSB of the upper 64 bits has almost no dependencies and in that respect corresponds to the LSB of the lower 64 bits. On the other hand, however, the carry-smearing effect of addition is presumably stronger on the upper 64 bits of the result than on the lower half. So, yes, I think it might be slightly more interesting than the current multiplication. The assembly part will be easy, just … But let's see... [EDIT] First trials suggest not expecting too much from it.
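A minimal sketch of both halves of the 64-bit × 64-bit product, assuming a GCC/Clang compiler that provides the `unsigned __int128` extension. On x86-64 this typically compiles to a single `mul`, which leaves the low half in `rax` and the high half in `rdx`. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 64 bits of the full 128-bit product x * c. */
static uint64_t mulhi64(uint64_t x, uint64_t c)
{
    return (uint64_t)(((unsigned __int128)x * c) >> 64);
}

/* Lower 64 bits, i.e. the "regular" multiplication used now. */
static uint64_t mullo64(uint64_t x, uint64_t c)
{
    return x * c;
}
```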
I have had it running for a while now. For regular multiplication (lower bits of the result), I get
The modified multiplication (upper half of the result) gave
I cross-checked each multiplier constant with the respective other multiplication and saw … So, the modified multiplication might offer some bias opportunities but seems to require its own class of constants. I will clean up my test implementation and offer it as an additional option in a while. But, before moving on, one more thought, as I see SmallCrush failing on the "upper bits of the multiplication result": under what circumstances is it guaranteed to be bijective?
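One way to approach the bijectivity question for a fixed constant: with w-bit words and constant c < 2^w, the upper half is floor(x·c / 2^w) ≤ floor((2^w − 1)·c / 2^w) < c, so it takes at most c < 2^w distinct values and can never be a bijection. The sketch below checks this exhaustively on a small-width analog (8-bit words, keeping bits 8..15 of the product); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Small-width analog of "upper half of the product": 8-bit x times an
   8-bit constant c, keeping bits 8..15. Returns how many distinct
   outputs the 256 possible inputs produce. */
static int distinct_high_outputs8(unsigned c)
{
    unsigned char seen[256] = {0};
    int count = 0;
    for (unsigned x = 0; x < 256; x++) {
        unsigned hi = (x * c) >> 8; /* always < c, so at most c values */
        if (!seen[hi]) {
            seen[hi] = 1;
            count++;
        }
    }
    return count;
}
```

Even the best case, c = 255, reaches only 255 of the 256 possible outputs.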
It might be better to use the mulx primitive, which XORs the low and high parts of the multiplication, instead of just taking the high part. To the best of my knowledge, this was first proposed by Vladimir Makarov in MUM hash. The original uses addition, but later improvements such as wyhash seem to have standardized on XOR. Abseil uses this in its hash function for integers (https://godbolt.org/z/MTeznrTc7), and it seems to work well for them, as it gives good spread with only a single multiplication. I don't think this primitive is a bijection, though.
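A minimal sketch of this folded multiply (the "mulx" of the comment, not the x86 BMI2 instruction of the same name), assuming the GCC/Clang `unsigned __int128` extension. The name `mul_fold64` is hypothetical. The MUM original would use `+` instead of `^` on the last line.

```c
#include <assert.h>
#include <stdint.h>

/* Folded multiply: XOR of the low and high 64-bit halves of the
   full 128-bit product a * b. */
static uint64_t mul_fold64(uint64_t a, uint64_t b)
{
    unsigned __int128 p = (unsigned __int128)a * b;
    return (uint64_t)p ^ (uint64_t)(p >> 64);
}
```

A quick way to see that it is not a bijection in general: with b = 2^64 − 1, the low half of x·b is ~(x − 1) and the high half is x − 1, so every nonzero x maps to 2^64 − 1.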
Sounds interesting... If there is interest, I might give it a shot soon and add it to the PR (probably not before September).
This pull request adds a byte shuffle `shf` to hash prospector for 32-bit as well as 64-bit hashing functions. It relies on SSSE3's `pshufb` instruction and only works on corresponding hardware, which should include most recent Intel/AMD CPUs. All related additions to the code are guarded by `#ifdef`s.

`-p`-provided patterns can use `shf` or `shf:<perm>`, where `<perm>` denotes a permutation of the byte positions. In 32-bit mode, `shf:03020100` describes the identity, i.e. no change of position, and `shf:00010203` is an endianness-changing byte swap. In 64-bit mode (`-8`), those permutations need to be longer, e.g. `shf:0605040302010007`, corresponding to an 8-bit left rotate. A sole `shf` employs a randomly generated permutation.

Some additional thoughts, especially on the 32-bit byte shuffle and its implementation, can be found here.
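A portable sketch for checking what a given `<perm>` string does without SSSE3 hardware. Consistent with the examples (identity `03020100`, byte swap `00010203`), it assumes that the first hex pair selects the source byte for the most significant output byte. The name `byte_shuffle` is hypothetical; the PR itself uses `pshufb` (available in C as `_mm_shuffle_epi8`), which likewise picks each output byte by source index.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value of one hex digit, or -1 if ch is not a hex digit. */
static int hexval(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

/* Portable emulation of shf:<perm> for an nbytes-wide word (4 or 8).
   perm holds two hex digits per output byte, starting with the most
   significant output byte; each pair names the source byte index.
   Returns 1 on success, 0 if perm is malformed or not a permutation. */
static int byte_shuffle(uint64_t x, const char *perm, int nbytes, uint64_t *out)
{
    if (strlen(perm) != (size_t)(2 * nbytes))
        return 0;
    uint64_t r = 0;
    unsigned used = 0; /* bit k set once source byte k has been taken */
    for (int i = 0; i < nbytes; i++) {
        int hi = hexval(perm[2 * i]), lo = hexval(perm[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return 0;
        int src = hi * 16 + lo;
        if (src >= nbytes || ((used >> src) & 1u))
            return 0;
        used |= 1u << src;
        int dst = nbytes - 1 - i; /* first pair targets the top byte */
        r |= ((x >> (8 * src)) & 0xFFu) << (8 * dst);
    }
    *out = r;
    return 1;
}
```

For example, with `nbytes = 8` the string `0605040302010007` turns `0x0123456789ABCDEF` into `0x23456789ABCDEF01`, i.e. the 8-bit left rotate.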
Fixes #7.
Fixes #14.
Fixes #17.