-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JNI get_helper code sharing / multiGet() use efficient batch C++ support [replaced by https://github.com/facebook/rocksdb/pull/12344] #11827
Conversation
8f4147f
to
38eda77
Compare
e5e7c04
to
5fe9745
Compare
I wonder if @ajkr or @pdillinger could help me here ? When I call the optimized version of I have been using the changed |
f20018f
to
046298c
Compare
fdf5085
to
5acf2f6
Compare
5acf2f6
to
7998b73
Compare
8d37b7e
to
73e9b48
Compare
All the uses of this helper in rocksjni.cc Single use in transaction.cc
Suprising that this doesn’t exist already. At best, this should get fixed at a lower level
We have added a GetForUpdate(… PinnableSlice* …) override which implements a missing method in the case where the column family is defaulted. A version with explicit column family already exists. The overloads are not distinguishable in the case where nullptr is passed as the value, and this is manifest in transaction_test.cc. We cast appropriately in tests to fix this. Does this constitute a breaking API change ? Add a test for the PinnableSlice GetForUpdate() variant to confirm the basic usage of the new overload.
First step, new supporting RAII classes and change 1 of the MultiGet() methods The test that calls it still passes Next, add other support classes/methods and update the other MultiGet() methods before deleting the helpers.
Bring java/jmh MultiGet tests back to life, and update Compare old, intermediate and new implementations of simplest multiGet for performance Do we need linux for it to make any difference ? (io_uring)
We have confirmed that the new mechanisms lead to a significant (region of 20% in the small sample jmh tests in ./java/jmh we ran) performance improvement: ``` java -jar target/rocksdbjni-jmh-1.0-SNAPSHOT-benchmarks.jar multiGetList10 -p columnFamilyTestType=no_column_family -p keyCount=10000 -p multiGetSize=100 -p valueSize=64,1024 ``` it is reasonable to assume that the other versions of multiGet, when we change them, will also benefit.
So we have 2 “classes” which are nearer to namespaces for static helper methods: MultiGetKeys, and MultiGetValues
Missed that this was orphaned in a previous checkin
Ignore block checksum-based verifyFlag test for multiGet(), the block checksum compute count appears not to be updated (and this is probably reasonable ?).
Convert the variants of transactional multiGet and multGetForUpdate methods to share the new helpers with those for DB multiGet.
Make some get() methods look like multiGet() instead - helpers to set up the parameters, call the appropriate RocksDB C++ get() method directly, and then helpers to extract results.
All transaction.cc Get methods returning jbyteArray us the new style of helpers introduced with this PR; get the keys, call the C++ get(), extract the result.. Now jbyteArray jni_get_helper is removed.
Break down the get() methods which use it to use component helpers Add some extra testing where a few of the many variants of get() could be tested a bit more thoroughly (e.g. partial results).
Use unique_ptr<jbyte[]> for copied buffers Makes de-allocation more automatic/RAII-style Also, Start to comment some methods
Give the helpers a final once over Sort the naming as it is unclear/inconsistent; say what type we are copying into, rather than “values” Slightly unclear if a pinnable which might have a buffer doesn’t get reset when there’s an error. Just reset it before returning the error.
Use JDirectBufferSlice helpers Re-instate a rebase issue which removed one of the getDirect() instances.
Use JByteArraySlice, JDirectBufferSlice for and their pinnable variants for JNI get code. Makes the changes in this PR consistent with previous re-organisation and extension of rocksjni methods and removes redundant semi-duplicate code we produced initially in this work.
Single key get() now solely uses kv_helper.h methods
Return a sensible status when the result length overflows a 32 bit signed quantity.
These tests were verifying that overflow sizes of java buffers returned correctly. They are useful, but creating enough data and fetching it back is slow/unreliable and so we @ignore them for now.
527f422
to
102e86b
Compare
Implement RAII-based helpers for JNI
Get()
andmultiGet()
Replace JNI C++ helpers
rocksdb_get_helper
,rocksdb_get_helper_direct
,multi_get_helper
,multi_get_helper_direct
,multi_get_helper_release_keys
,txn_get_helper
,txn_multi_get_helper
The model is to entirely do away with a single helper, instead a number of utility methods allow each separate
JNI
Get()
andMultiGet()
method to organise their parameters efficiently, then call the underlying C++db->Get()
,db->MultiGet()
,txn->Get()
ortxn->MultiGet()
method itself, anduse further utilities to retrieve results.
Roughly speaking:
We achieve a useful performance gain as part of this work; by using the updated C++
multiGet
we immediately pick up its performance gains (batch improvements to multiGet C++ were previously implemented, but not until now used by Java/JNI).multiGetBB
already uses the batchedC++
multiGet()
, and all other benchmarks show consistent improvement after the changes:Before this PR
Benchmark (columnFamilyTestType) (keyCount) (keySize) (multiGetSize) (valueSize) Mode Cnt Score Error Units
MultiGetNewBenchmarks.multiGetBB200 no_column_family 10000 1024 100 256 thrpt 25 5315.459 ± 20.465 ops/s
MultiGetNewBenchmarks.multiGetBB200 no_column_family 10000 1024 100 1024 thrpt 25 5673.115 ± 78.299 ops/s
MultiGetNewBenchmarks.multiGetBB200 no_column_family 10000 1024 100 4096 thrpt 25 2616.860 ± 46.994 ops/s
MultiGetNewBenchmarks.multiGetBB200 no_column_family 10000 1024 100 16384 thrpt 25 1700.058 ± 24.034 ops/s
MultiGetNewBenchmarks.multiGetBB200 no_column_family 10000 1024 100 65536 thrpt 25 791.171 ± 13.955 ops/s
MultiGetNewBenchmarks.multiGetList10 no_column_family 10000 1024 100 256 thrpt 25 6129.929 ± 94.200 ops/s
MultiGetNewBenchmarks.multiGetList10 no_column_family 10000 1024 100 1024 thrpt 25 7012.405 ± 97.886 ops/s
MultiGetNewBenchmarks.multiGetList10 no_column_family 10000 1024 100 4096 thrpt 25 2799.014 ± 39.352 ops/s
MultiGetNewBenchmarks.multiGetList10 no_column_family 10000 1024 100 16384 thrpt 25 1417.205 ± 22.272 ops/s
MultiGetNewBenchmarks.multiGetList10 no_column_family 10000 1024 100 65536 thrpt 25 655.594 ± 13.050 ops/s
MultiGetNewBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 256 thrpt 25 6147.247 ± 82.711 ops/s
MultiGetNewBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 1024 thrpt 25 7004.213 ± 79.251 ops/s
MultiGetNewBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 4096 thrpt 25 2715.154 ± 110.017 ops/s
MultiGetNewBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 16384 thrpt 25 1408.070 ± 31.714 ops/s
MultiGetNewBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 65536 thrpt 25 623.829 ± 57.374 ops/s
MultiGetNewBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 256 thrpt 25 6119.243 ± 116.313 ops/s
MultiGetNewBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 1024 thrpt 25 6931.873 ± 128.094 ops/s
MultiGetNewBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 4096 thrpt 25 2678.253 ± 39.113 ops/s
MultiGetNewBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 16384 thrpt 25 1337.384 ± 19.500 ops/s
MultiGetNewBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 65536 thrpt 25 625.596 ± 14.525 ops/s
After this PR
Benchmark (columnFamilyTestType) (keyCount) (keySize) (multiGetSize) (valueSize) Mode Cnt Score Error Units
MultiGetBenchmarks.multiGetBB200 no_column_family 10000 1024 100 256 thrpt 25 5320.496 ± 24.702 ops/s
MultiGetBenchmarks.multiGetBB200 no_column_family 10000 1024 100 1024 thrpt 25 5656.477 ± 54.518 ops/s
MultiGetBenchmarks.multiGetBB200 no_column_family 10000 1024 100 4096 thrpt 25 2562.934 ± 19.211 ops/s
MultiGetBenchmarks.multiGetBB200 no_column_family 10000 1024 100 16384 thrpt 25 1677.339 ± 13.946 ops/s
MultiGetBenchmarks.multiGetBB200 no_column_family 10000 1024 100 65536 thrpt 25 763.773 ± 16.721 ops/s
MultiGetBenchmarks.multiGetList10 no_column_family 10000 1024 100 256 thrpt 25 6988.732 ± 53.948 ops/s
MultiGetBenchmarks.multiGetList10 no_column_family 10000 1024 100 1024 thrpt 25 7531.901 ± 76.555 ops/s
MultiGetBenchmarks.multiGetList10 no_column_family 10000 1024 100 4096 thrpt 25 2906.175 ± 35.923 ops/s
MultiGetBenchmarks.multiGetList10 no_column_family 10000 1024 100 16384 thrpt 25 1590.764 ± 27.832 ops/s
MultiGetBenchmarks.multiGetList10 no_column_family 10000 1024 100 65536 thrpt 25 743.695 ± 10.272 ops/s
MultiGetBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 256 thrpt 25 6883.862 ± 46.907 ops/s
MultiGetBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 1024 thrpt 25 7579.593 ± 96.047 ops/s
MultiGetBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 4096 thrpt 25 2920.348 ± 43.066 ops/s
MultiGetBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 16384 thrpt 25 1583.203 ± 35.926 ops/s
MultiGetBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 65536 thrpt 25 744.396 ± 7.844 ops/s
MultiGetBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 256 thrpt 25 6975.793 ± 23.194 ops/s
MultiGetBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 1024 thrpt 25 7600.065 ± 46.056 ops/s
MultiGetBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 4096 thrpt 25 2939.329 ± 41.117 ops/s
MultiGetBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 16384 thrpt 25 1606.477 ± 8.143 ops/s
MultiGetBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 65536 thrpt 25 744.661 ± 9.443 ops/s
It would be good to extend this work to other helpers (other than get()), but if we did so at this point the scope would be far too great.
This PR closes #11518
This PR is being replaced by #12344 - I will delete this when that one has been completed.