Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIMD parsing newlines, integer parsing, custom hashtable with SIMD lookup table for equality #663

Merged
merged 11 commits into from
Feb 1, 2024

Conversation

ChrisBellew
Copy link
Contributor

@ChrisBellew ChrisBellew commented Jan 30, 2024

Thanks for this amazing competition!

Here's my implementation. I must have sunk 100+ hours into background benchmarking of this. I learned a lot! I purposely didn't look at any other submissions so I've no idea how similar they are. Looking forward to learning from the crazy fast submissions from others.

Check List:

  • You have run ./mvnw verify and the project builds successfully
  • Tests pass (./test.sh <username> shows no differences between expected and actual outputs)
  • All formatting changes by the build are committed
  • Your launch script is named calculate_average_<username>.sh (make sure to match casing of your GH user name) and is executable
  • Output matches that of calculate_average_baseline.sh
  • For new entries, or after substantial changes: When implementing custom hash structures, please point to where you deal with hash collisions (line number) -> line 480 (slotIndex = (slotIndex + 1) % NUM_SLOTS;)
  • Execution time: 3.974s i9 7900X, 10 Cores, 20 HT, 64GB RAM. Expecting it to take ~5s on the Hetzner AX161.
  • Execution time of reference implementation: 3m28.684s i9 7900X, 10 Cores, 20 HT, 64GB RAM

@gunnarmorling
Copy link
Owner

Please run the formatter and amend the PR with the changes.

@ChrisBellew
Copy link
Contributor Author

ChrisBellew commented Jan 31, 2024

Formatting committed, thanks @gunnarmorling!

I ran ./mvnw clean verify and committed the result.

@gunnarmorling
Copy link
Owner

This fails on the 10K keyset test (see create_measurements_3.sh) with 1B rows. Seeing these:

Exception in thread "Thread-8" java.lang.IndexOutOfBoundsException: Index 262139 out of bounds for length 262137
	at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:100)
	at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:106)
	at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:302)
	at java.base/java.util.Objects.checkIndex(Objects.java:385)
	at jdk.incubator.vector/jdk.incubator.vector.VectorIntrinsics.checkFromIndexSize(VectorIntrinsics.java:61)
	at jdk.incubator.vector/jdk.incubator.vector.ByteVector.fromArray(ByteVector.java:2975)
	at dev.morling.onebrc.CalculateAverage_chrisbellew$ThreadProcessor.processBuffer(CalculateAverage_chrisbellew.java:339)
	at dev.morling.onebrc.CalculateAverage_chrisbellew$ThreadProcessor.processRange(CalculateAverage_chrisbellew.java:303)
	at dev.morling.onebrc.CalculateAverage_chrisbellew$ThreadProcessor.run(CalculateAverage_chrisbellew.java:279)

@ChrisBellew
Copy link
Contributor Author

Fixed the issue, thanks @gunnarmorling. :)

@gunnarmorling
Copy link
Owner

This produces incorrect results for the 10K keyset test (see create_measurements_3.sh). Note that we're after the cut-off time, you'll have two more changes you can make to this PR (see note at the top of the README). If it's not working or valid then, I'll have to close it unfortunately. Updates should be pushed quickly, so I can evaluate all the pending entries. Thx!

+ timeout -v 300 ./test.sh ChrisBellew measurements_10K_1B.txt
Validating calculate_average_ChrisBellew.sh -- measurements_10K_1B.txt
WARNING: Using incubator modules: jdk.incubator.vector
48c48
< -so;-15.7;15.1;45.6
---
> -so;-15.7;14.8;45.6
50c50
< -su;-25.0;7.0;46.0
---
> -su;-25.0;5.6;35.0
94c94
< Alot;-13.9;16.6;48.5
---
> Alot;-12.1;17.1;48.5
...

@ChrisBellew
Copy link
Contributor Author

Thanks for your patience @gunnarmorling. I wasn't making good enough use of the test scripts with the different files.

I located the issue and fixed it. We should be good to go now!

@gunnarmorling
Copy link
Owner

Looking good now: 00:06.982.

@gunnarmorling gunnarmorling merged commit 8ab88e9 into gunnarmorling:main Feb 1, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants