Skip to content
This repository has been archived by the owner on Jun 27, 2022. It is now read-only.

working with Guids #1

Open
yatskovsky opened this issue Oct 26, 2015 · 2 comments
Open

working with Guids #1

yatskovsky opened this issue Oct 26, 2015 · 2 comments

Comments

@yatskovsky
Copy link
Contributor

Hi, thanks for the implementation.

Could you please add LogData(Guid data) overload to prevent unnecessary boxing/unboxing? (and probably overloads for int, long, string, and other popular types)

@saguiitay
Copy link
Owner

I strongly recommend that you look at the following implementation by Microsoft: https://github.com/Microsoft/CardinalityEstimation

It has support for various value types, several performance improvements, etc.

If you still prefer using my implementation, I'd be happy to accept a PR.

@yatskovsky
Copy link
Contributor Author

Your implementation is comparable to Microsoft's one in performance, both in speed and accuracy.

Main performace bottleneck is here

_registers.Sum(r => 1d / Math.Pow(2, r));

where power of 2 can be replaced with faster implementation:

    public double FastPow2(int x)
    {
        //return (double)((System.Numerics.BigInteger)1 << x);
        return (ulong)1 << x;  // right up to 2^63
    }

I wondering why Pow2_32 = 4294967297, not 4294967296. A mistake?

And this line

    dv = (int)(Math.Pow(-2, 32)* Math.Log(1 - dvEstimate / Pow2_32));

should be

    dv = (int)(-Pow2_32* Math.Log(1 - dvEstimate / Pow2_32));

bacause it now returns negative estimate for 1,000,000,000 guids.

And I'd prefer Count() to return long, for large data sets. (Upd: Int is OK. I found that this algorithm is dimensioned for cardinalities up to 10^9. Please, specify this in the description.).

I also opened LogHash function, because I found that Guids are good hashes itself (at least first 8 bytes) and hashing them adds no gain.

    public void LogData(Guid data)
    {
        var hash = GetHash(data);

        LogHash(hash);
    }

    public void LogHash(ulong hash)
    {
        var registerIndex = GetRegisterIndex(hash); // binary address of the rightmost b bits
        var runLength = RunOfZeros(hash); // length of the run of zeroes starting at bit b+1
        _registers[registerIndex] = Math.Max(_registers[registerIndex], runLength);
    }

I created a pull request with the changes above.
Thanks

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants