batcheval: 24.3 microbench regression in `BenchmarkRefreshRange` #134420

tbg · 2024-11-06T15:55:39Z

./dev bench --ignore-cache --stream-output --bench-mem ./pkg/kv/kvserver/batcheval/ --filter 'RefreshRange/mixed-case/refresh_window=.0.00,0.00.' --test-args '-test.memprofile=mem.pb.gz -test.cpuprofile=cpu.pb.gz -test.benchtime=50000x -test.cpu=8'

Clearly the new one indeed is much slower (I think the spreadsheet says ~75%, and I verified that), but it's unclear why. The additional time seems to be spent in iterator.SeekGE.

old

new

There's something weird about the refresh_window=[N,N] case. Note how [0,0] is way slower than [0,anything-else]. Similar, how [50,50] is by far the slowest of the [50,*] family, etc.

$ ./dev bench --ignore-cache --stream-output --bench-mem ./pkg/kv/kvserver/batcheval/ --filter 'RefreshRange/mixed-case/' --test-args '-test.memprofile=mem.pb.gz -test.cpuprofile=cpu.pb.gz -test.benchtime=3s -test.cpu=8' 2>&1 | grep --line-buffered -E '^Benchmark' | tee bench.txt
BenchmarkRefreshRange/mixed-case/refresh_window=[0.00,0.00]-8              16135            222172 ns/op            1325 B/op         20 allocs/op
BenchmarkRefreshRange/mixed-case/refresh_window=[0.00,50.00]-8            209877             16824 ns/op            1398 B/op         23 allocs/op
BenchmarkRefreshRange/mixed-case/refresh_window=[0.00,75.00]-8            208724             16817 ns/op            1399 B/op         23 allocs/op
BenchmarkRefreshRange/mixed-case/refresh_window=[0.00,95.00]-8            170229             20697 ns/op            1398 B/op         23 allocs/op
BenchmarkRefreshRange/mixed-case/refresh_window=[0.00,99.00]-8            180392             19752 ns/op            1398 B/op         23 allocs/op
BenchmarkRefreshRange/mixed-case/refresh_window=[50.00,50.00]-8               25         132782851 ns/op            7538 B/op         38 allocs/op
BenchmarkRefreshRange/mixed-case/refresh_window=[50.00,75.00]-8           200251             17568 ns/op            1404 B/op         24 allocs/op
BenchmarkRefreshRange/mixed-case/refresh_window=[50.00,95.00]-8           168374             21018 ns/op            1404 B/op         24 allocs/op
BenchmarkRefreshRange/mixed-case/refresh_window=[50.00,99.00]-8           177073             20014 ns/op            1404 B/op         24 allocs/op
BenchmarkRefreshRange/mixed-case/refresh_window=[75.00,75.00]-8               26         128251366 ns/op            4628 B/op         38 allocs/op
BenchmarkRefreshRange/mixed-case/refresh_window=[75.00,95.00]-8           168642             20962 ns/op            1404 B/op         24 allocs/op
BenchmarkRefreshRange/mixed-case/refresh_window=[75.00,99.00]-8           177039             20013 ns/op            1404 B/op         24 allocs/op
BenchmarkRefreshRange/mixed-case/refresh_window=[95.00,95.00]-8           113584             31360 ns/op            1321 B/op         20 allocs/op
BenchmarkRefreshRange/mixed-case/refresh_window=[95.00,99.00]-8           211741             16561 ns/op            1404 B/op         24 allocs/op
BenchmarkRefreshRange/mixed-case/refresh_window=[99.00,99.00]-8             1128           3128315 ns/op            1359 B/op         20 allocs/op

The mixed-case benchmark family roughly writes 1 million (separate) keys in random order, key i at timestamp 5ns*(1+i). The engine is set up with an LBaseMaxBytes of 256 to create a more interesting LSM. Then, it benchmarks RefreshRange with a timestamp range that covers the given percentage of cases.

For example, [0,0] translates to refreshing from ts=5ns to ts=5ns+1 logical. It makes zero sense to me that that would take 222us per op, whereas refreshing over essentially all of the data would take around 20us.

But here's the kicker, the above table is from b20c617, i.e. the "old" commit in the microbench regression. There's little point in musing about why it's gotten almost twice as bad (and it has - ~388 micros for the [0,0] case) if we don't understand either result in the first place.

Jira issue: CRDB-44073

The text was updated successfully, but these errors were encountered:

tbg mentioned this issue Nov 6, 2024

kv: triage microbenchmark regressions for v24.3 #133682

Open

13 tasks

This comment was marked as resolved.

Sign in to view

blathers-crl bot added the X-blathers-untriaged blathers was unable to find an owner label Nov 6, 2024

tbg changed the title ~~RefreshRange @tbg~~ batcheval: 24.3 microbench regression in BenchmarkRefreshRange Nov 6, 2024

tbg added branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 C-investigation Further steps needed to qualify. C-label will change. T-kv KV Team labels Nov 6, 2024

tbg self-assigned this Nov 6, 2024

yuzefovich removed the X-blathers-untriaged blathers was unable to find an owner label Nov 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

batcheval: 24.3 microbench regression in `BenchmarkRefreshRange` #134420

batcheval: 24.3 microbench regression in `BenchmarkRefreshRange` #134420

tbg commented Nov 6, 2024 •

edited

Loading

This comment was marked as resolved.

batcheval: 24.3 microbench regression in BenchmarkRefreshRange #134420

batcheval: 24.3 microbench regression in BenchmarkRefreshRange #134420

Comments

tbg commented Nov 6, 2024 • edited Loading

This comment was marked as resolved.

batcheval: 24.3 microbench regression in `BenchmarkRefreshRange` #134420

batcheval: 24.3 microbench regression in `BenchmarkRefreshRange` #134420

tbg commented Nov 6, 2024 •

edited

Loading