You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Jun 27, 2022. It is now read-only.
Could you please add LogData(Guid data) overload to prevent unnecessary boxing/unboxing? (and probably overloads for int, long, string, and other popular types)
The text was updated successfully, but these errors were encountered:
Your implementation is comparable to Microsoft's one in performance, both in speed and accuracy.
Main performace bottleneck is here
_registers.Sum(r => 1d / Math.Pow(2, r));
where power of 2 can be replaced with faster implementation:
public double FastPow2(int x)
{
//return (double)((System.Numerics.BigInteger)1 << x);
return (ulong)1 << x; // right up to 2^63
}
I wondering why Pow2_32 = 4294967297, not 4294967296. A mistake?
And this line
dv = (int)(Math.Pow(-2, 32)* Math.Log(1 - dvEstimate / Pow2_32));
should be
dv = (int)(-Pow2_32* Math.Log(1 - dvEstimate / Pow2_32));
bacause it now returns negative estimate for 1,000,000,000 guids.
And I'd prefer Count() to return long, for large data sets. (Upd: Int is OK. I found that this algorithm is dimensioned for cardinalities up to 10^9. Please, specify this in the description.).
I also opened LogHash function, because I found that Guids are good hashes itself (at least first 8 bytes) and hashing them adds no gain.
public void LogData(Guid data)
{
var hash = GetHash(data);
LogHash(hash);
}
public void LogHash(ulong hash)
{
var registerIndex = GetRegisterIndex(hash); // binary address of the rightmost b bits
var runLength = RunOfZeros(hash); // length of the run of zeroes starting at bit b+1
_registers[registerIndex] = Math.Max(_registers[registerIndex], runLength);
}
I created a pull request with the changes above.
Thanks
Sign up for freeto subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Hi, thanks for the implementation.
Could you please add LogData(Guid data) overload to prevent unnecessary boxing/unboxing? (and probably overloads for int, long, string, and other popular types)
The text was updated successfully, but these errors were encountered: