Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Entry for abhinav-upadhyay #696

Closed
wants to merge 6 commits into from
Closed

Entry for abhinav-upadhyay #696

wants to merge 6 commits into from

Conversation

abhinav-upadhyay
Copy link

Check List:

  • You have run ./mvnw verify and the project builds successfully
  • Tests pass (./test.sh <username> shows no differences between expected and actual outputs)
  • All formatting changes by the build are committed
  • Your launch script is named calculate_average_<username>.sh (make sure to match casing of your GH user name) and is executable
  • Output matches that of calculate_average_baseline.sh
  • For new entries, or after substantial changes: When implementing custom hash structures, please point to where you deal with hash collisions (line number): Line 62 (I did not find any collisions in the 1 billion file or the test case, so removed the handling).
  • Execution time: 3.70 seconds
  • Execution time of reference implementation: 3 min 33 seconds

@abhinav-upadhyay
Copy link
Author

So my GitHub username is abhinav-upadhyay. But a file name cannot contain -. The CI job is failing to find the script to run my program. Is there any alternative to get around it?

@abhinav-upadhyay
Copy link
Author

So my GitHub username is abhinav-upadhyay. But a file name cannot contain -. The CI job is failing to find the script to run my program. Is there any alternative to get around it?

Realized, I can have file names with -. Just need to use quotes when referring to them on the shell.

final long uhash = nameHash ^ (nameHash >> 29);
// final long uhash = nameHash;
int index = (int) uhash & TABLE_MASK;
Row row = table[index];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to handle collisions, otherwise you'd aggregate the values from several stations with the same hash.

@gunnarmorling
Copy link
Owner

Fails with the 10K key set test (see create_measurements_3.sh):

+ timeout -v 300 ./test.sh abhinav-upadhyay measurements_10K_1B.txt
Validating calculate_average_abhinav-upadhyay.sh -- measurements_10K_1B.txt
410c410
< Fort Porta;-21.8;9.3;42.0
---
> Fort Porta;-21.8;8.6;38.0
617c617
< Ki;-17.4;15.5;47.4
---
> Ki;-17.4;17.6;47.4
1020c1020
< Roch;-22.0;14.3;46.6
---
> Roch;-11.6;18.9;48.0
...

@gunnarmorling
Copy link
Owner

Still incorrect results for the 10K keyset. Note that we're after the cut-off time, so you'll have two more changes you can make to this PR (see note at the top of the README). If it's not working or valid then, I'll have to close it unfortunately. Updates should be pushed quickly, so I can evaluate all the pending entries. Thx!

return;
}

while (row.hash != uhash) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checking the hash isn't enough, you'll need to compare the actual name to differentiate between multiple stations with the same hash.

The code was generating probes but not using it.
Also, fix rounding issue when computing the mean temperature value
@gunnarmorling
Copy link
Owner

Hey @abhinav-upadhyay, I still believe this does not handle hash collisions correctly. In case of a collision, the actual names of the existing and the incoming entry must be compared. I can't find where this is happening? You can push one more time to this PR before it'll either be merged (if valid) or closed (if invalid). Thx!

@abhinav-upadhyay
Copy link
Author

Hey @abhinav-upadhyay, I still believe this does not handle hash collisions correctly. In case of a collision, the actual names of the existing and the incoming entry must be compared. I can't find where this is happening? You can push one more time to this PR before it'll either be merged (if valid) or closed (if invalid). Thx!

I am building the hash using all the bytes of the location name (lines 180-198 in readFile method). This hash is stored into a 64 bit long value. In the test cases and even in case of the 10k keys file (which contained some huge names), it is not resulting in any cases where two names have the same hash. Because of this, I have not added any checks for comparing the names themselves .

@gunnarmorling
Copy link
Owner

Understood, but absence of collisions in specific key sets doesn't mean that there cannot be collisions at all. As per the rules, you cannot make any assumptions on specific names:

Q: Can I make assumptions on the names of the weather stations showing up in the data set?
A: No, while only a fixed set of station names is used by the data set generator, any solution should work with arbitrary UTF-8 station names (for the sake of simplicity, names are guaranteed to contain no ; or \n characters).

So, in order for the entry to be considered valid, it must fall back to comparing names in case of collisions.

@gunnarmorling
Copy link
Owner

I am going to close this one as I haven't heard back within two days and we are after the cut-off time. Thanks for your efforts nevertheless!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants