Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added byte shuffle, CRC32c, carryless multiplication, and xor-rotate #13

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

Logan007
Copy link
Contributor

@Logan007 Logan007 commented May 8, 2021

This pull request adds a byte shuffle shf to hash prospector for 32-bit as well as 64-bit hashing functions.

It relies on SSSE3's pshufb instruction and only works on corresponding hardware which should comprise most recent Intel/AMD CPUs. All related additions to the code are guarded by #ifdefs.

-p-provided patterns can use shf or shf:<perm> where <perm> denotes a permutation of the byte positions. In 32-bit mode, shf:03020100 describes identity, i.e. no change of position, and shf:00010203 equals an endianess changing byte swap. In 64-bit mode (-8), those permutations need to be longer, e.g. shf:0605040302010007 corresponding to an 8-bit left rotate. A sole shf employs a randomly generated permutation.

Some additional thoughts especially on 32-bit byte shuffle and their implementation can be found here.

Fixes #7.
Fixes #14.
Fixes #17.

@Logan007
Copy link
Contributor Author

Logan007 commented Jun 12, 2021

451b7fd adds support for the carryless multiplication – clmul or clmul:<constant>in -p's patterns – in case hardware supports the CLMUL instruction set (usually all modern CPUs).

Compared to "regular" integer multiplication, carryless multiplications replaces the addition of all minor products by an exclusive-or operation. It is invertible as long as the constant multiplier is odd.

@Logan007 Logan007 changed the title added byte shuffle added byte shuffle and carryless multiplication Jun 12, 2021
@Logan007
Copy link
Contributor Author

Logan007 commented Jun 12, 2021

0f70a23 adds support for xor-rotate x ^= (x <<< a) ^ (x <<< b), xrot2 in patterns. It optionally takes (exactly) two operands, e.g. -p mul,xrot2:17:13,mul.

It internally is called xrot2 because it needs two rotates to be invertible.

As this is the first operation to accept two operands, some additional but minor changes to the code were required (data struct members, parsing).

At this stage, the pull request also fixes #14.

@Logan007 Logan007 changed the title added byte shuffle and carryless multiplication added byte shuffle, carryless multiplication, and xor-rotate Jun 12, 2021
@alexey-milovidov
Copy link

@Logan007 Maybe also add #17 to this pull request?

@Logan007
Copy link
Contributor Author

I might have a look some time or other.

@Logan007 Logan007 changed the title added byte shuffle, carryless multiplication, and xor-rotate added byte shuffle, CRC32c, carryless multiplication, and xor-rotate Nov 20, 2021
@Logan007
Copy link
Contributor Author

Logan007 commented Nov 20, 2021

The last two commits add CRC32c-step support if hardware supports it. Works for 32-bit hashes only. Use crc32 as pattern, e.g. -p crc32,mul,crc32.

@alexey-milovidov
Copy link

It's interesting to check the results - maybe new good hash functions were found?

@Logan007
Copy link
Contributor Author

Logan007 commented Nov 23, 2021

It's interesting to check the results - maybe new good hash functions were found?

You can download the current code of this branch from github's auto-archiver for testing.

However, ./prospector -p crc32,mul,crc32 proposes quite some CRC32-MUL-CRC32 functions. They all seem to have similar bias. For example, one function was

// score = 1.9029479221291472
uint32_t
hash(uint32_t x)
{
    x = _mm_crc32_u32(x, 0x2e84292d);
    x *= 0x941325ab;
    x = _mm_crc32_u32(x, 0x902c1e3f);
    return x;
}

So, I have put it into hash.c – mind the #include <immintrin.h> – and compiled and ran ./prospector -4 -Eel ./hash.so to determine the more exact bias which was calculated to 0.21402936517975638 and apparently hits the magnitude of two-round functions (XORSHIFT-MUL-XORSHIFT-MUL-XORSHIFT).

Also, for verification purpose, I tried ./prospector -4 -l ./hash.so -L | more which seems to deliver identical output as ./prospector -p crc32:2e84292d,mul:941325ab,crc32:902c1e3f -L | more.

I even conducted some trials using the very same constant at all three spots – leading to the exact same bias. Not needing to load several constants, this approach could slightly increase the hashing function's speed. So, ./prospector -p crc32:941325ab,mul:941325ab,crc32:941325ab -Ee outputs 0.21402936517975638.

Furthermore, removing the final crc32 gives a bias of 299.51029008431851 (CRC32-MUL) whereas removing the first crc32 results in a bias of 275.79160312307044 (MUL-CRC32).

So, a CRC32-MUL-CRC32 scheme looks quite promising – if I am not misled here which would be extremely embarrassing... 😉

I will try to throw some more tests at this kind of hashing function and report back soon.

@Logan007
Copy link
Contributor Author

I am excited to see that it passes SmallCrunch (15/15) and masters Crunch (140/144). More to follow.

@alexey-milovidov
Copy link

Another function to add is higher 64 bits of 64bit * 64bit -> 128bit multiplication.
Not sure if it will be useful... found somewhere here: https://github.com/erthink/t1ha

@Logan007
Copy link
Contributor Author

Logan007 commented Nov 25, 2021

higher 64 bits of 64bit * 64bit -> 128bit multiplication

On the one hand, the higher bits might not make a huge difference when compared to the lower 64 bits as they get used now. The influence of the other bits is just inverse, i.e. MSB of upper 64 bits has almost no dependencies and in that respect corresponds to the LSB of the lower 64 bits.

On the other hand however, we have addition's carry smearing effect presumably being stronger on the upper 64 bits of the result than in the lower half.

So, yes, I think it might be slightly more interesting than current multiplication. The assembly part will be easy, just mul (instead of currently used imul) and mov the upper 64 bit from RDX to RAX. The C-code will be a bit more tricky, slower and perhaps not even portable – either (a * (unsigned __int128)b) >> 64 or manual multiplication, see here.

But let's see... [EDIT] First trials suggest to not expect too much from it

@Logan007
Copy link
Contributor Author

Logan007 commented Nov 26, 2021

I had it running for a while now.

For regular multiplication (lower bits of result), I get 761(32 bit) and 772 (64 bit) bias:

// score = 761.60037986541965
uint32_t
hash(uint32_t x)
{
    x *= 0x365554ab;
    return x;
}
// score = 772.3235805382576
uint64_t
hash(uint64_t x)
{
    x *= 0x6edd5445d569aaad;
    return x;
}

The modified multiplication (upper half of result) gave 701 (32 bit) and 741 (64 bit) bias.

// score = 701.09132626454516
uint32_t
hash(uint32_t x)
{
    x *= 0xd5554ac7; // upper half of result !!!
    return x;
}
// score = 741.46122558848492
uint64_t
hash(uint64_t x)
{
    x *= 0xab566a5dab5acd4d; // upper half of result !!!
    return x;
}

I cross-checked the multiplier constant with the respectively other multiplication and saw 743 (32 bit) and 757 (64 bit) when using the constant from normal multiplication in modified multiplication. Using the constants found with modified multiplication in regular multiplication yielded 778 (32 bit) and 771 (64 bit).

So, the modified multiplication might offer some bias opportunities but seems to require its own class of constants. I will clean-up my test implementation and offer it as additional option in a while.

But – before moving on – one more thought as I see failing SmallCrush on the "upper-bits of multiplication result": Under what circumstances is it guaranteed to be bijective?

@pdimov
Copy link

pdimov commented Jul 13, 2022

It might be better to use the mulx primitive, which xors the low and high part of the multiplication, instead of just taking the high part. This - to the best of my knowledge - has been first proposed by Vladimir Makarov in MUM hash. The original uses addition, but later improvements such as wyhash seem to have standardized on xor.

Abseil uses this in its hash function for integers (https://godbolt.org/z/MTeznrTc7) and it seems to work well for them as it gives good spread with only a single multiplication.

I don't think this primitive is a bijection though.

@Logan007
Copy link
Contributor Author

Logan007 commented Aug 7, 2022

Sounds interesting... If interest is, I might give it a shot sometime soon and add it to the PR (probably not before September).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants