Should the 1 billion row file be deterministic? #35
Comments
PR welcome for this change to the generator. Note that I am already using the same measurements.txt file for evaluating all entries, i.e. fairness is ensured.
The evaluation shouldn't use a public test file because that allows the contenders to tightly optimize for the exact keyset in that file. For example, tweaking the hash function to minimize collisions, having special cases for some keys, sizing everything exactly right for the keyset, etc.
If there's concern that some solution may just get unlucky with a given keyset, the winner can be determined by repeating the test with 2-3 different test files. I very much doubt that this would be a factor, given the large keyset size (10,000); more noise can be expected from all the environmental factors on the test machine.
You are absolutely correct, and I agree! I do not think that the current test file should be shared or changed. I am asking for determinism so that it becomes much easier to compare and run against the same 1 billion row file that other contestants are using, without having to transmit the entire data file.
Currently it seems that the 1 billion row file is generated randomly, so each run of the generator produces a different file. Making the generation deterministic (pseudorandom with a fixed seed) would make sharing the 1 billion row file a little easier, since it would always be the same, and would make sure that everyone is running exactly the same test.
Just using a Random with a predefined seed to pick out stations, and seeding a Random with the hash code of the city name to obtain measurements, should do the trick.
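A minimal sketch of that idea, not the project's actual generator: the class name, the seed value, the five-station list, and the measurement distribution (mean 0, standard deviation 10, one decimal place) are all assumptions made for illustration. It relies only on java.util.Random and String.hashCode(), both of which are specified to give the same results on every JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class DeterministicGenerator {

    // Hypothetical fixed seed; any constant works as long as everyone uses the same one.
    static final long STATION_SEED = 42L;

    // Hypothetical station list, far smaller than a real one.
    static final String[] STATIONS = {
        "Hamburg", "Bulawayo", "Palembang", "St. John's", "Cracow"
    };

    static List<String> generate(int rows) {
        // One Random with a predefined seed decides which station each row belongs to.
        Random stationPicker = new Random(STATION_SEED);

        // One Random per station, seeded with the hash code of the station name,
        // produces that station's measurements.
        Random[] measurementRandoms = new Random[STATIONS.length];
        for (int i = 0; i < STATIONS.length; i++) {
            measurementRandoms[i] = new Random(STATIONS[i].hashCode());
        }

        List<String> lines = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            int idx = stationPicker.nextInt(STATIONS.length);
            // Assumed distribution: Gaussian, mean 0, standard deviation 10,
            // rounded to one decimal place.
            double value = Math.round(measurementRandoms[idx].nextGaussian() * 100.0) / 10.0;
            lines.add(STATIONS[idx] + ";" + value);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Prints the same five rows on every run.
        generate(5).forEach(System.out::println);
    }
}
```

Because every source of randomness is seeded from a constant, two people running this with the same row count get identical output, so only the generator and the seed need to be shared rather than the data file itself.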