About the avalanche bias #28

jonmaiga · 2023-11-14T13:28:38Z

First I think this is a really awesome project! I stumbled across it some years ago when I was playing with a similar thing (automatically find good mixers) and I remember it as very inspiring!

So I found the lists of mixers (in the readme+an issue) and I though it would be a good correlation test to run through my Tests for Randomness (TFR) and PractRand, both using Pelle Evensen's harsh RRC-scheme to scramble the input bits.

I ran through +100 of the prospector mixers (love the plain text format plus 32-bit mixers are so quick to evaluate even with RRC), and I thought it might be interesting to share the results here.

What may come as a surprise is that the original 32-bit prospector (from the article) seems to performing very well. The original is the only one that goes to 2^18 in TFR the second runner up fails at 2^13 of all tested prospector mixers.

This leads me to the question if something in the search code of prospector changed since the original was found?

RRC-PractRand	TFR	Construction	Alias	Comment
14	12	16 85ebca6b 13 c2b2ae35 16 = ?	murmur3	For reference
16	18	15 2c1b3c6d 12 297a2d39 15 = 0.34968228323361017	original	From the prospector article
14	10	16 21f0aaad 15 d35a2d97 15 = 0.10760229515479501	low_bias	The one with lowest bias (by TheIronBorn)

For reference, the furthest I've ever seen a xmxmx 32-bit mixer go in RRC-PractRand is to 2^18 and in TFR to 2^19.

Now PractRand is a bit of a blackbox to me, but since I am very familiar with TFR I investigated how the low_bias one fails. If you run it through TFR it will say something like:

==========================================================================================
mix32::prospector_low_bias 2^10 with 128 samples using 32-bits (0s)
bic,coupon,divisibility,gap,mean,permutation,runs,sac,uniform
==========================================================================================
test       p-value  remark       stream/description
uniform-m  0        failure(10)  mix32::prospector_low_bias(reverse_co-31(counter-1))

***FAILED*** at 2^10 with a total of 1 failures and 0 suspicious results.

In the uniform test we expect a uniform distribution but the statistics says that this is very likely not the case. The mix32::prospector_low_bias(reverse_co-31(counter-1)) indicates the used test sample (out of 128). This is a unitary counter has reversed bits + complement and rotated 31 bits (as per RRC). If we were to recreate this in code it would look something like this in c++ (copy pasted a bit so pardon the overkill with templates etc):

#include <iostream>
#include <vector>

template <typename T>
constexpr int bit_sizeof() {
	return 8 * sizeof(T);
}

template <typename T>
constexpr T ror(T v, int r) {
	constexpr auto Bits = bit_sizeof<T>();
	return (v << r) | (v >> (Bits - r));
}

constexpr uint32_t reverse_bits(uint32_t x) {
	x = ((x >> 1) & 0x55555555) | ((x & 0x55555555) << 1);
	x = ((x >> 2) & 0x33333333) | ((x & 0x33333333) << 2);
	x = ((x >> 4) & 0x0F0F0F0F) | ((x & 0x0F0F0F0F) << 4);
	x = ((x >> 8) & 0x00FF00FF) | ((x & 0x00FF00FF) << 8);
	return (x >> 16) | (x << 16);
}

int main() {
	std::vector<uint32_t> ys;
	for (uint32_t i = 1; i <= 1024; ++i) {
		auto x = ror(~reverse_bits(i), 31);
		x ^= x >> 16;
		x *= 0x21f0aaad;
		x ^= x >> 15;
		x *= 0xd35a2d97;
		x ^= x >> 15;
		ys.push_back(x);
	}

	for (const auto y : ys) {
		std::cout << y << ",";
	}
}

This will output 2^10 values: 2731428922,3383114888,3653537803,1575712730,... now if we examine those with chi2 statistics in Mathematica we get:

PearsonChiSquareTest[low_bias_values, UniformDistribution[{0, 2^32-1}]] this outputs a p-value close to zero (2.23e-22).

Doing the same for the original mixer gives a p-value of 0.11 which means indistinguishable from uniform distribution. It's possible to see this with the naked eye, here are histograms of the output for the same rrc-stream from the lowbias, the original and trng (true random data):

Here it looks like the lowest bias version is not necessarily the best according to RRC-PractRand and TFR (see attached logs for many more examples). Maybe the bias calculated in prospector is an insufficient metric?

On a positive note, I strongly believe avalanche in some form is a sufficient test. In TFR, both sac and bic are very strong indicators, while most others (uniform, gap, coupons etc ..) statistics will fail early or not at all. So maybe only some minor changes could help.

I've attached two text files with logs for all prospector mixers I tested (tfr+practrand). The first column is how far the mixer came in the tests:

tfr_prospector.txt
pr_prospector.txt

This is more of a thought, if this seems interesting/relevant we can dig deeper, keep up the good work!

The text was updated successfully, but these errors were encountered:

TheIronBorn · 2023-11-15T05:35:53Z

Others have pointed out issues with this project's metric (#21) though I'm not sure if that's common to all avalanche metrics.
Note that this project uses Root Mean Square Relative Error, while TFR seems to use a Chi-Squared test

jonmaiga · 2023-11-15T12:49:32Z

Oh interesting, I didn't see that issue! Yes cycles will appear and with a mere 32-bit state I guess the probability makes it likely in practice.

For mixing or hashing arbitrary data I'm not sure how/if the cycles affects the result? In the case of using it as an PRNG I can see how (at least in a naïve implementation (state=mix(state)) some seeds should definitely be avoided. But maybe minimizing short cycles will lead to better mixing/hashing too? I haven't thought of that.

The chi2 test will only be able to say pass/fail (p-values can't be used as a guiding metric). The way I have done it is to test the avalanche (pass/fail) at intervals (perhaps at power of two) and use how much data before failure as primary metric (then I add some other statistics to get more fine grained).

But yeah, there might be better/more efficient ways.

TheIronBorn · 2023-11-15T21:43:22Z

I wonder if something like Testing for goodness rather than lack of fit of continuous probability distributions would be more useful

jonmaiga · 2023-11-23T07:53:20Z

Yes maybe, it sounds like equivalence testing (new to me) possibly would fail the test earlier than traditional testing. But tbh I've no intuition about this.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

About the avalanche bias #28

About the avalanche bias #28

jonmaiga commented Nov 14, 2023 •

edited

Loading

TheIronBorn commented Nov 15, 2023 •

edited

Loading

jonmaiga commented Nov 15, 2023 •

edited

Loading

TheIronBorn commented Nov 15, 2023

jonmaiga commented Nov 23, 2023

About the avalanche bias #28

About the avalanche bias #28

Comments

jonmaiga commented Nov 14, 2023 • edited Loading

TheIronBorn commented Nov 15, 2023 • edited Loading

jonmaiga commented Nov 15, 2023 • edited Loading

TheIronBorn commented Nov 15, 2023

jonmaiga commented Nov 23, 2023

jonmaiga commented Nov 14, 2023 •

edited

Loading

TheIronBorn commented Nov 15, 2023 •

edited

Loading

jonmaiga commented Nov 15, 2023 •

edited

Loading