kv: triage microbenchmark regressions for v24.3 #133682

arulajmani · 2024-10-29T15:38:56Z

Describe the problem

release-24.1 (b20c617) -> master (1b5c419); raw sheets from.

All relevant performance regressions are reflected in the sec/op metric unless otherwise specified.

pkg/kv/kvserver

ReplicaProposal (secs/op and allocs/op) @andrewbaptist
OptimisticEval/latches @arulajmani
BumpSideTransportClosed @arulajmani
StoreRangeSplit (@stevendanna)

pkg/kv/kvserver/gc

Run (allocs/op) @tbg¹

pkg/kv/kvserver/batcheval

batcheval: 24.3 microbench regression in BenchmarkRefreshRange #134420

pkg/kv/kvserver/tscache

IntervalSklAdd @tbg²

pkg/kv/kvclient/kvcoord

TruncateLoop (secs/op and allocs/op) @andrewbaptist
SingleRoundtripWithLatency (secs/op and allocs/op) @arulajmani

pkg/raft/rafttest

Proposal3Nodes (secs/op and allocs/op) @iskettaneh (test-related regression)
Status/members (secs/op and allocs/op) @iskettaneh (Marked as expected in this issue)
RawNode/two-voters-32 (allocs/op) @iskettaneh (opened this issue)
OneNode-32 @nvanbenschoten

Tips for debugging

Benchdiff can be a useful tool for debugging. Install the tool, then follow these steps to auto-bisect the regression.

# Setup steps:
export base_sha=b20c6171
export master_sha=1b5c4193
echo 'git clean -e benchdiff -e post-checkout.sh -e .bazelrc.user -e _bazel -e bin/.dev-status -e bin/dev-versions -dxff' >> post-checkout.sh
echo './dev generate' >> post-checkout.sh
echo './dev build short' >> post-checkout.sh

# Per-benchmark configuration.
export pkg_name='./pkg/kv/kvserver/spanlatch'
export test_name='BTreeDeleteInsert/count=16'
export threshold=0.2

# Single comparison between master and release-24.1:
benchdiff --new=${master_sha} --old=${base_sha} --post-checkout='bash post-checkout.sh' --run=${test_name} --count=10 ${pkg_name}
# OR auto-bisect regression:
rm -rf ./benchdiff/*
git bisect reset
git bisect start ${master_sha} ${base_sha}
git bisect run benchdiff --old=${base_sha} --post-checkout='bash post-checkout.sh' --run=${test_name} --count=10 --threshold=${threshold} ${pkg_name}

Jira issue: CRDB-43730

I think this is about the 5.41% more in allocs? Most other flavors of this benchmark are doing significantly better (according to this spreadsheet). This test also sets up a giant dataset and somehow takes literal ages under benchdiff on my gceworker; I think it somehow doesn't cache the dataset maybe? It takes 15m on my M1. Either way this is probably not worth seeing through, the regressed test case is 10% intents and nothing deleted, so it isn't anything that needs to be particularly fast in real life. ↩
I ran $ git log --oneline 1b5c419 --not $(git merge-base b20c617 1b5c419) -- './pkg/kv/kvserver/tscache/*.go' and there haven't been any code changes to this package. I ran benchdiff just to make sure and it confirmed that everything was ~. ↩

The text was updated successfully, but these errors were encountered:

andrewbaptist · 2024-10-30T19:06:46Z

I don't think the CPU regression is real. There is a fairly high variance in the test and the measurement is in microseconds/op. After running a few times with different combinations, it seems most likely that this is just test variance rather than a real issue:

CPU regression:

baptist_cockroachlabs_com@gceworker-abaptist:~/go/src/github.com/cockroachdb/cockroach-perf$ benchdiff --new 1b5c4193 --old b20c6171 --post-checkout ~/post-checkout.sh --run='TruncateLoop/asc/reqs=128/ranges=64/type=scan' --count 20 ./pkg/kv/kvclient/kvcoord
test binaries already exist for 'b20c617'; skipping build
test binaries already exist for '1b5c419'; skipping build

  pkg=1/1 iter=20/20 cockroachdb/cockroach/pkg/kv/kvclient/kvcoord |

name                                              old time/op    new time/op    delta
TruncateLoop/asc/reqs=128/ranges=64/type=scan-24     533µs ±16%     518µs ± 7%   ~     (p=0.110 n=20+17)

name                                              old alloc/op   new alloc/op   delta
TruncateLoop/asc/reqs=128/ranges=64/type=scan-24     447kB ±22%     437kB ±11%   ~     (p=0.270 n=20+17)

name                                              old allocs/op  new allocs/op  delta
TruncateLoop/asc/reqs=128/ranges=64/type=scan-24     6.60k ±17%     6.43k ±16%   ~     (p=0.184 n=20+19)
baptist_cockroachlabs_com@gceworker-abaptist:~/go/src/github.com/cockroachdb/cockroach-perf$ benchdiff --new 1b5c4193 --old b20c6171 --post-checkout ~/post-checkout.sh --run='TruncateLoop/asc/reqs=128/ranges=64/type=scan' --count 20 ./pkg/kv/kvclient/kvcoord
test binaries already exist for 'b20c617'; skipping build
test binaries already exist for '1b5c419'; skipping build

  pkg=1/1 iter=20/20 cockroachdb/cockroach/pkg/kv/kvclient/kvcoord \

name                                              old time/op    new time/op    delta
TruncateLoop/asc/reqs=128/ranges=64/type=scan-24     539µs ± 9%     534µs ±19%   ~     (p=0.478 n=20+20)

name                                              old alloc/op   new alloc/op   delta
TruncateLoop/asc/reqs=128/ranges=64/type=scan-24     454kB ±15%     453kB ±24%   ~     (p=0.698 n=20+20)

name                                              old allocs/op  new allocs/op  delta
TruncateLoop/asc/reqs=128/ranges=64/type=scan-24     6.68k ± 9%     6.66k ±18%   ~     (p=0.654 n=20+20)

The memory regression is real, but its due to known changes to the ScanRequest. It is all coming from ScanRequest.ShallowCopy using more memory. In the last two releases we increased the size of a ScanRequest by adding 2 fields to it key_locking_strength and return_raw_mvcc_values. More than half of the memory allocations in this tests are ScanRequests so I think it makes sense that it would pick up on this change. That said, there are more objects being allocated also, so I might be misreading the profile:

baptist_cockroachlabs_com@gceworker-abaptist:~/go/src/github.com/cockroachdb/cockroach-perf$ benchdiff --new 1b5c4193 --old b20c6171 --post-checkout ~/post-checkout.sh --run='TruncateLoop/asc/reqs=128/ranges=4/type=scan' --count 20 ./pkg/kv/kvclient/kvcoord
test binaries already exist for 'b20c617'; skipping build
test binaries already exist for '1b5c419'; skipping build

  pkg=1/1 iter=20/20 cockroachdb/cockroach/pkg/kv/kvclient/kvcoord /

name                                             old time/op    new time/op    delta
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24    64.4µs ±25%    65.8µs ±29%   ~     (p=0.383 n=20+20)

name                                             old alloc/op   new alloc/op   delta
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24    48.4kB ±21%    50.7kB ±19%   ~     (p=0.209 n=19+19)

name                                             old allocs/op  new allocs/op  delta
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24       487 ±30%       519 ±23%   ~     (p=0.261 n=19+19)
baptist_cockroachlabs_com@gceworker-abaptist:~/go/src/github.com/cockroachdb/cockroach-perf$ benchdiff --new 1b5c4193 --old b20c6171 --post-checkout ~/post-checkout.sh --run='TruncateLoop/asc/reqs=128/ranges=4/type=scan' --count 20 ./pkg/kv/kvclient/kvcoord
test binaries already exist for 'b20c617'; skipping build
test binaries already exist for '1b5c419'; skipping build

  pkg=1/1 iter=20/20 cockroachdb/cockroach/pkg/kv/kvclient/kvcoord /

name                                             old time/op    new time/op    delta
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24    62.5µs ±24%    65.3µs ±15%   ~     (p=0.398 n=20+20)

name                                             old alloc/op   new alloc/op   delta
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24    45.7kB ±35%    49.1kB ±23%   ~     (p=0.199 n=20+20)

name                                             old allocs/op  new allocs/op  delta
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24       450 ±46%       502 ±29%   ~     (p=0.199 n=20+20)
baptist_cockroachlabs_com@gceworker-abaptist:~/go/src/github.com/cockroachdb/cockroach-perf$ benchdiff --new 1b5c4193 --old b20c6171 --post-checkout ~/post-checkout.sh --run='TruncateLoop/asc/reqs=128/ranges=4/type=scan' --count 20 ./pkg/kv/kvclient/kvcoord
test binaries already exist for 'b20c617'; skipping build
test binaries already exist for '1b5c419'; skipping build

  pkg=1/1 iter=20/20 cockroachdb/cockroach/pkg/kv/kvclient/kvcoord |

name                                             old time/op    new time/op    delta
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24    65.8µs ±16%    66.2µs ±14%   ~     (p=0.885 n=19+19)

name                                             old alloc/op   new alloc/op   delta
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24    47.9kB ±31%    49.8kB ±26%   ~     (p=0.513 n=20+19)

name                                             old allocs/op  new allocs/op  delta
TruncateLoop/asc/reqs=128/ranges=4/type=scan-24       485 ±41%       506 ±25%   ~     (p=0.593 n=20+19)

kvoli · 2024-10-31T19:23:16Z

I am not going to get to the regression I assigned myself against today. I'll be back on Monday the 10th of November to assist if necessary. Un-assigning myself in the meantime.

arulajmani · 2024-10-31T19:32:02Z

BumpSideTransportClosed bisected down to 37fd8f1. Seems unrelated, but I haven't dug into it. Posting interim results.

BumpSideTransportClosed-24      0.00           0.00        ~     (all equal)
37fd8f1a082efa6c0dde73cb36fbbc7d1409bc11 is the first bad commit
commit 37fd8f1a082efa6c0dde73cb36fbbc7d1409bc11
Merge: 786cb46ea5f 5884ff568d0
Author: craig[bot] <[email protected]>
Date:   Tue Jul 2 17:11:44 2024 +0000

    Merge #125905

    125905: ccl,keys,sql,storage,util: restructure fuzz tests r=srosenberg a=Nukitt

    Previously we've had the fuzz tests and unit tests in the same `_test.go` file. This has to be changed now since oss-fuzz uses [this](https://github.com/AdamKorcz/go-118-fuzz-build/) to compile fuzzers which has a limitation that

    > _test.go files are currently not included in the scope of the build.

    To address this we have separate our fuzz tests and unit tests, putting the fuzz tests in `X_fuzz.go` files.

    Epic: None
    Release note: None

    Co-authored-by: Nukitt <[email protected]>

Ran another bisect, pointing to 78732f6. This seems more likely. Will investigate.

78732f6096e614bef9edea02707ad1885c65beab is the first bad commit
commit 78732f6096e614bef9edea02707ad1885c65beab
Author: Nathan VanBenschoten <[email protected]>
Date:   Fri Jun 14 17:34:12 2024 -0400

    kv: move leaseStatus to leases package

    Informs #123498.

    This commit moves the leaseStatus function to the leases package. This
    allows for easier unit testing and helps to further consolidate leasing
    logic in a single location.

    Code movement only, not behavior changes.

    Release note: None

 pkg/kv/kvserver/BUILD.bazel                 |   1 -
 pkg/kv/kvserver/leases/BUILD.bazel          |  13 ++-
 pkg/kv/kvserver/leases/status.go            | 173 ++++++++++++++++++++++++++++
 pkg/kv/kvserver/leases/status_test.go       | 137 ++++++++++++++++++++++
 pkg/kv/kvserver/replica_range_lease.go      | 153 ++----------------------
 pkg/kv/kvserver/replica_range_lease_test.go | 156 -------------------------
 6 files changed, 331 insertions(+), 302 deletions(-)
 create mode 100644 pkg/kv/kvserver/leases/status.go
 create mode 100644 pkg/kv/kvserver/leases/status_test.go
 delete mode 100644 pkg/kv/kvserver/replica_range_lease_test.go
bisect run success

From what I can tell, GetLiveness used to be in-lined before, but is no longer inlined after 78732f6 because we pulled it behind an interface. The implicated change was required to build leader leases as it centralizes all the leasing logic in a single place. Given we're moving to leader leases in 25.1 anyway, and this regression only affects epoch based leases, we might have to live with this one for 24.3. I think we mark it off as a regrettable regression. cc @nvanbenschoten -- what do you think?

benchdiff --new=${first_bad} --old=${last_good} --post-checkout='bash post-checkout.sh' --run=${test_name} --count=10 ${pkg_name} --cpuprofile
test binaries already exist for 'a598299'; skipping build
test binaries already exist for '78732f6'; skipping build

  pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/kv/kvserver -

name                        old time/op    new time/op    delta
BumpSideTransportClosed-24     238ns ±10%     280ns ± 8%  +17.50%  (p=0.000 n=10+10)

name                        old alloc/op   new alloc/op   delta
BumpSideTransportClosed-24     1.00B ± 0%     1.30B ±54%     ~     (p=0.211 n=10+10)

name                        old allocs/op  new allocs/op  delta
BumpSideTransportClosed-24      0.00           0.00          ~     (all equal)

wrote cpu profiles to:
  old=benchdiff/a598299/artifacts/cpu.prof
  new=benchdiff/78732f6/artifacts/cpu.prof

arulajmani · 2024-11-04T19:33:50Z

name                                                                     old time/op    new time/op    delta
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-24     140µs ± 5%     132µs ± 5%   -5.69%  (p=0.000 n=9+9)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-24     134µs ± 4%     128µs ± 8%     ~     (p=0.079 n=9+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-24     1.39ms ± 7%    1.35ms ± 2%     ~     (p=0.315 n=10+9)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-24     2.55ms ±15%    2.61ms ±13%     ~     (p=0.684 n=10+10)
OptimisticEval/latches/real-contention=false/num-writers=1-24              80.8µs ± 5%    83.2µs ± 6%     ~     (p=0.075 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=1-24                195µs ± 4%     201µs ± 3%   +3.04%  (p=0.011 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-24                596µs ± 5%     615µs ± 6%   +3.21%  (p=0.023 n=10+10)
OptimisticEval/latches/real-contention=false/num-writers=4-24              99.2µs ± 8%   103.8µs ± 6%   +4.60%  (p=0.023 n=10+10)

name                                                                     old alloc/op   new alloc/op   delta
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-24    15.1kB ± 1%    14.4kB ± 1%   -4.29%  (p=0.000 n=9+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-24    15.2kB ± 1%    14.6kB ± 1%   -4.17%  (p=0.000 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=4-24              15.8kB ± 3%    15.2kB ± 2%   -4.00%  (p=0.000 n=10+9)
OptimisticEval/latches/real-contention=true/num-writers=1-24               22.5kB ± 1%    21.6kB ± 2%   -3.97%  (p=0.000 n=8+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-24     72.5kB ± 1%    70.3kB ± 2%   -3.05%  (p=0.000 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-24      128kB ± 5%     129kB ± 8%     ~     (p=0.400 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=1-24              12.9kB ± 1%    12.9kB ± 1%     ~     (p=0.549 n=10+9)
OptimisticEval/latches/real-contention=true/num-writers=4-24               63.1kB ± 2%    62.7kB ± 1%     ~     (p=0.720 n=9+10)

name                                                                     old allocs/op  new allocs/op  delta
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-24        572 ± 2%       499 ± 0%  -12.71%  (p=0.000 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-24       107 ± 1%        94 ± 1%  -11.59%  (p=0.000 n=10+8)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-24       108 ± 1%        96 ± 1%  -11.52%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=1-24                  175 ± 2%       155 ± 1%  -11.37%  (p=0.000 n=9+10)
OptimisticEval/latches/real-contention=false/num-writers=4-24                 109 ± 7%        97 ± 2%  -10.93%  (p=0.000 n=10+9)
OptimisticEval/latches/real-contention=true/num-writers=4-24                  473 ± 1%       427 ± 0%   -9.68%  (p=0.000 n=9+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-24      1.04k ± 2%     0.97k ± 7%   -7.18%  (p=0.000 n=8+10)
OptimisticEval/latches/real-contention=false/num-writers=1-24                87.4 ± 2%      82.0 ± 0%   -6.18%  (p=0.000 n=10+9)

We seem to have improved on all but OptimisticEval/latches/real-contention={true,false} benchmarks here. I'll run a bisect.

Doesn't work out of the box, because some benchmarks here have been improved over the course of 24.3 development:

The merge base cc6cf240d1dc800527c3bd9c489db4aa16bf64b1 is bad.
This means the bug has been fixed between cc6cf240d1dc800527c3bd9c489db4aa16bf64b1 and [b20c617197fc13f08ae9be59ec40254c17c9e0f5].
bisect run failed:
'bisect_state bad' exited with error code 3

I'm going to focus just on OptimisticEval/latches/real-contention=true/ and see what shakes out.

  pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/kv/kvserver |

name                                                                     old time/op    new time/op    delta
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-24     140µs ± 7%     143µs ± 4%     ~     (p=0.393 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-24     139µs ± 4%     142µs ± 6%     ~     (p=0.165 n=10+10)
OptimisticEval/latches/real-contention=false/num-writers=1-24              80.7µs ± 5%    82.6µs ± 5%     ~     (p=0.143 n=10+10)
OptimisticEval/latches/real-contention=false/num-writers=4-24               101µs ± 3%     104µs ± 2%   +2.19%  (p=0.006 n=9+8)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-24     1.38ms ± 5%    1.43ms ± 4%   +3.43%  (p=0.009 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=1-24                199µs ± 4%     208µs ± 4%   +4.46%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-24                595µs ± 4%     633µs ± 3%   +6.38%  (p=0.000 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-24     2.79ms ±11%    3.02ms ±14%   +8.20%  (p=0.011 n=10+10)

name                                                                     old alloc/op   new alloc/op   delta
OptimisticEval/latches/real-contention=false/num-writers=1-24              12.9kB ± 1%    13.6kB ± 1%   +5.60%  (p=0.000 n=8+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-24    15.2kB ± 1%    16.2kB ± 1%   +6.18%  (p=0.000 n=10+8)
OptimisticEval/latches/real-contention=false/num-writers=4-24              15.8kB ± 2%    16.9kB ± 3%   +6.55%  (p=0.000 n=10+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-24    15.1kB ± 1%    16.1kB ± 1%   +6.70%  (p=0.000 n=9+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-24     72.9kB ± 1%    78.6kB ± 2%   +7.86%  (p=0.000 n=10+9)
OptimisticEval/latches/real-contention=true/num-writers=4-24               63.1kB ± 1%    68.5kB ± 2%   +8.50%  (p=0.000 n=10+9)
OptimisticEval/latches/real-contention=true/num-writers=1-24               22.4kB ± 1%    24.7kB ± 6%  +10.27%  (p=0.000 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-24      133kB ± 2%     155kB ± 4%  +16.31%  (p=0.000 n=9+10)

name                                                                     old allocs/op  new allocs/op  delta
OptimisticEval/latches/real-contention=false/num-writers=1-24                87.4 ± 1%     102.8 ± 2%  +17.65%  (p=0.000 n=8+10)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=4-24       108 ± 0%       129 ± 1%  +19.14%  (p=0.000 n=6+9)
OptimisticEval/latches-and-locks/real-contention=false/num-writers=1-24       107 ± 1%       127 ± 1%  +19.21%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=false/num-writers=4-24                 109 ± 2%       131 ± 4%  +20.04%  (p=0.000 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-24        577 ± 2%       706 ± 5%  +22.31%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=1-24                  174 ± 2%       217 ± 8%  +24.47%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-24                  472 ± 2%       590 ± 4%  +25.02%  (p=0.000 n=10+9)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-24      1.07k ± 4%     1.41k ± 2%  +31.51%  (p=0.000 n=10+7)

No luck there either. Seems like we overall regressed this benchmark and then improved it. Whatever improvement we made resulted in most of the variants getting better, but 3 of them still showed a slight regression overall.

  pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/kv/kvserver \

name                                                                    old time/op    new time/op    delta
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-24    1.43ms ± 5%    1.42ms ± 3%     ~     (p=0.123 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=4-24               601µs ± 4%     630µs ± 5%   +4.75%  (p=0.001 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=1-24               196µs ± 4%     208µs ± 5%   +6.06%  (p=0.000 n=9+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-24    2.78ms ±20%    3.08ms ± 7%  +10.86%  (p=0.002 n=10+10)

name                                                                    old alloc/op   new alloc/op   delta
OptimisticEval/latches/real-contention=true/num-writers=4-24              63.6kB ± 2%    68.6kB ± 2%   +7.76%  (p=0.000 n=10+10)
OptimisticEval/latches/real-contention=true/num-writers=1-24              22.5kB ± 2%    24.3kB ± 2%   +7.89%  (p=0.000 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-24    72.5kB ± 1%    79.4kB ± 2%   +9.57%  (p=0.000 n=9+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-24     135kB ± 8%     155kB ± 5%  +14.74%  (p=0.000 n=9+9)

name                                                                    old allocs/op  new allocs/op  delta
OptimisticEval/latches/real-contention=true/num-writers=1-24                 177 ± 2%       212 ± 3%  +19.86%  (p=0.000 n=8+10)
OptimisticEval/latches/real-contention=true/num-writers=4-24                 479 ± 2%       589 ± 3%  +23.08%  (p=0.000 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=1-24       571 ± 2%       708 ± 2%  +23.93%  (p=0.000 n=10+10)
OptimisticEval/latches-and-locks/real-contention=true/num-writers=4-24     1.08k ±16%     1.40k ± 5%  +29.96%  (p=0.000 n=10+9)
fatal: time/op regression in OptimisticEval/latches/real-contention=true/num-writers=4-24 of +4.75% exceeded threshold of 3.00%
The merge base cc6cf240d1dc800527c3bd9c489db4aa16bf64b1 is bad.
This means the bug has been fixed between cc6cf240d1dc800527c3bd9c489db4aa16bf64b1 and [b20c617197fc13f08ae9be59ec40254c17c9e0f5].
bisect run failed:
'bisect_state bad' exited with error code 3

Given the regression between 24.1 and master is <4% for 3 variants, and most variants of the benchmark have actually improved, I'm going to check this off.

Make this usable with Benchdiff. Informs cockroachdb#133682 Release note: None

arulajmani · 2024-11-05T01:21:04Z

I bisected some of the regression in SingleRoundtripWithLatency to 9f0e04a.

bisect result

  pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/kv/kvclient/kvcoord |

name                                        old time/op    new time/op    delta
SingleRoundtripWithLatency/latency=10ms-24    12.7ms ± 1%    12.6ms ± 2%     ~     (p=0.447 n=9+10)
SingleRoundtripWithLatency/latency=0s-24       236µs ± 3%     244µs ± 6%   +3.10%  (p=0.043 n=9+10)

name                                        old alloc/op   new alloc/op   delta
SingleRoundtripWithLatency/latency=0s-24      17.8kB ± 1%    18.4kB ± 1%   +3.22%  (p=0.000 n=10+10)
SingleRoundtripWithLatency/latency=10ms-24    21.4kB ± 5%    24.3kB ± 6%  +13.79%  (p=0.000 n=9+10)

name                                        old allocs/op  new allocs/op  delta
SingleRoundtripWithLatency/latency=0s-24         144 ± 0%       144 ± 0%     ~     (all equal)
SingleRoundtripWithLatency/latency=10ms-24       183 ± 1%       184 ± 0%   +0.61%  (p=0.009 n=8+8)
fatal: alloc/op regression in SingleRoundtripWithLatency/latency=10ms-24 of +13.79% exceeded threshold of 8.00%
9f0e04af27b6e7be8b38db3a625641fd616449b0 is the first bad commit
commit 9f0e04af27b6e7be8b38db3a625641fd616449b0
Author: Kyle Wong <[email protected]>

Comparing to the commit prior, we see:


name                                        old time/op    new time/op    delta
SingleRoundtripWithLatency/latency=0s-24       236µs ± 3%     242µs ± 7%    ~     (p=0.105 n=10+10)
SingleRoundtripWithLatency/latency=10ms-24    12.6ms ± 1%    12.7ms ± 2%  +0.96%  (p=0.043 n=9+10)

name                                        old alloc/op   new alloc/op   delta
SingleRoundtripWithLatency/latency=0s-24      18.4kB ± 1%    18.4kB ± 1%    ~     (p=0.617 n=10+10)
SingleRoundtripWithLatency/latency=10ms-24    22.5kB ± 7%    24.3kB ± 6%  +8.31%  (p=0.000 n=10+10)

name                                        old allocs/op  new allocs/op  delta
SingleRoundtripWithLatency/latency=0s-24         144 ± 0%       144 ± 0%    ~     (all equal)
SingleRoundtripWithLatency/latency=10ms-24       184 ± 0%       187 ± 3%    ~     (p=0.273 n=9+10)

Looking at the change and the heap profile, the only thing I can weakly conclude is that the allocations here are a result of introducing a new system table.

For context, the entire regression between master and 24.1 looks like:

test binaries already exist for 'b20c617'; skipping build
test binaries already exist for '1b5c419'; skipping build

  pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/kv/kvclient/kvcoord \

name                                        old time/op    new time/op    delta
SingleRoundtripWithLatency/latency=10ms-24    12.6ms ± 1%    12.7ms ± 2%    ~     (p=0.447 n=9+10)
SingleRoundtripWithLatency/latency=0s-24       240µs ± 5%     255µs ± 1%  +6.59%  (p=0.000 n=10+7)

name                                        old alloc/op   new alloc/op   delta
SingleRoundtripWithLatency/latency=0s-24      17.8kB ± 1%    18.3kB ± 1%  +2.82%  (p=0.000 n=10+10)
SingleRoundtripWithLatency/latency=10ms-24    21.7kB ± 5%    23.7kB ± 6%  +9.35%  (p=0.000 n=10+10)

name                                        old allocs/op  new allocs/op  delta
SingleRoundtripWithLatency/latency=10ms-24       184 ± 2%       185 ± 3%    ~     (p=0.927 n=10+10)
SingleRoundtripWithLatency/latency=0s-24         144 ± 0%       146 ± 0%  +1.39%  (p=0.000 n=10+10)

I'll focus on the CPU regression for SingleRoundtripWithLatency/latency=0s next and run a bisect.

Bisect finished; points to 1b5c419.

bisect results

Successfully built binary for target //pkg/cmd/cockroach-short:cockroach-short at cockroach-short
Successfully built binary for target //pkg/cmd/cockroach-short:cockroach-short at cockroach
building benchmark binaries for '7a7e3f9' 1/1 /

  pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/kv/kvclient/kvcoord \

name                                      old time/op    new time/op    delta
SingleRoundtripWithLatency/latency=0s-24     228µs ± 8%     239µs ± 5%  +4.67%  (p=0.011 n=10+10)

name                                      old alloc/op   new alloc/op   delta
SingleRoundtripWithLatency/latency=0s-24    17.8kB ± 0%    18.3kB ± 1%  +3.06%  (p=0.000 n=9+10)

name                                      old allocs/op  new allocs/op  delta
SingleRoundtripWithLatency/latency=0s-24       144 ± 0%       146 ± 0%  +1.39%  (p=0.000 n=10+10)
1b5c41939197efb0fa50ded9795e2e83f5c1dd34 is the first bad commit
commit 1b5c41939197efb0fa50ded9795e2e83f5c1dd34
Merge: 2afb5fa1583 7a7e3f98e75
Author: craig[bot] <[email protected]>
Date:   Sat Oct 26 14:59:33 2024 +0000

    Merge #133508

    133508: kvserver: preinitialize some Pebble category stats r=RaduBerinde a=RaduBerinde

    This change fixes non-determinism when generating `metrics.html` due
    to the Pebble category stats.

    Informs #133507
    Unblocks #133409

    Epic: none
    Release note: None

    Co-authored-by: Radu Berinde <[email protected]>

 docs/generated/metrics/metrics.html | 18 ++++++++++++++++++
 pkg/kv/kvserver/metrics.go          | 11 +++++++++++
 2 files changed, 29 insertions(+)
bisect run success

I'm not convinced this isn't just "noise". On 4 different runs, I see:

run 1


test binaries already exist for '2afb5fa'; skipping build
test binaries already exist for '1b5c419'; skipping build

  pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/kv/kvclient/kvcoord |

name                                      old time/op    new time/op    delta
SingleRoundtripWithLatency/latency=0s-24     233µs ± 4%     237µs ± 5%   ~     (p=0.089 n=10+10)

name                                      old alloc/op   new alloc/op   delta
SingleRoundtripWithLatency/latency=0s-24    18.4kB ± 1%    18.4kB ± 1%   ~     (p=0.971 n=10+10)

name                                      old allocs/op  new allocs/op  delta
SingleRoundtripWithLatency/latency=0s-24       146 ± 0%       146 ± 0%   ~     (all equal)

run 2


test binaries already exist for '2afb5fa'; skipping build
test binaries already exist for '1b5c419'; skipping build

  pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/kv/kvclient/kvcoord /

name                                      old time/op    new time/op    delta
SingleRoundtripWithLatency/latency=0s-24     229µs ± 8%     238µs ± 5%  +3.61%  (p=0.043 n=10+10)

name                                      old alloc/op   new alloc/op   delta
SingleRoundtripWithLatency/latency=0s-24    18.4kB ± 1%    18.4kB ± 1%    ~     (p=0.425 n=10+10)

name                                      old allocs/op  new allocs/op  delta
SingleRoundtripWithLatency/latency=0s-24       146 ± 0%       146 ± 0%    ~     (all equal)

run 3


test binaries already exist for '2afb5fa'; skipping build
test binaries already exist for '1b5c419'; skipping build

  pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/kv/kvclient/kvcoord \

name                                      old time/op    new time/op    delta
SingleRoundtripWithLatency/latency=0s-24     225µs ± 5%     240µs ± 5%  +6.74%  (p=0.000 n=10+9)

name                                      old alloc/op   new alloc/op   delta
SingleRoundtripWithLatency/latency=0s-24    18.3kB ± 0%    18.4kB ± 1%    ~     (p=0.055 n=10+10)

name                                      old allocs/op  new allocs/op  delta
SingleRoundtripWithLatency/latency=0s-24       146 ± 0%       146 ± 0%    ~     (all equal)

wrote cpu profiles to:
  old=benchdiff/2afb5fa/artifacts/cpu.prof
  new=benchdiff/1b5c419/artifacts/cpu.prof

run 4

test binaries already exist for '2afb5fa'; skipping build
test binaries already exist for '1b5c419'; skipping build

  pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/kv/kvclient/kvcoord |

name                                      old time/op    new time/op    delta
SingleRoundtripWithLatency/latency=0s-24     232µs ± 8%     238µs ± 7%    ~     (p=0.218 n=10+10)

name                                      old alloc/op   new alloc/op   delta
SingleRoundtripWithLatency/latency=0s-24    18.4kB ± 0%    18.3kB ± 1%  -0.24%  (p=0.012 n=9+10)

name                                      old allocs/op  new allocs/op  delta
SingleRoundtripWithLatency/latency=0s-24       146 ± 0%       146 ± 0%    ~     (all equal)

wrote cpu profiles to:
  old=benchdiff/2afb5fa/artifacts/cpu.prof
  new=benchdiff/1b5c419/artifacts/cpu.prof

For any on-lookers that don't want to expand, that's 0%, 3.61%, 6.74%, 0%. Looking at the commit, and diffing the CPU profiles from the 6.74%, nothing jumps out either.

nvanbenschoten · 2024-11-05T21:30:35Z

The 20.19% increase in alloc/op in OneNode is proportional to the increase the the raftpb.Message size due to the introduction of 3 new fields (Match, Lead, LeadEpoch). The Message's in-memory foortprint grew from 160 bytes to 184 bytes, a 15% increase.

cockroach/pkg/raft/raftpb/raft_test.go

Line 48 in b20c617

assert(unsafe.Sizeof(m), if64Bit(160, 112), "Message")

cockroach/pkg/raft/raftpb/raft_test.go

Line 51 in 1b5c419

assert(unsafe.Sizeof(m), if64Bit(184, 112), "Message")

Marking the regression on that metric as expected.

stevendanna · 2024-11-06T08:15:32Z

I'm still investigating since I'm using it as a learning activity,but StoreRangeSplit to me looks like a pretty noisy benchmark. It splits range 1, so the amount of data in the range isn't controlled over time. The amount of data in the range can matter substantially because of stats recomputation.

arulajmani added C-investigation Further steps needed to qualify. C-label will change. A-benchmarking GA-blocker T-kv KV Team branch-release-24.3 Used to mark GA and release blockers, technical advisories, and bugs for 24.3 labels Oct 29, 2024

arulajmani assigned stevendanna, pav-kv, tbg, nvanbenschoten, arulajmani, kvoli, andrewbaptist, miraradeva and iskettaneh Oct 29, 2024

arulajmani added a commit to arulajmani/cockroach that referenced this issue Nov 5, 2024

kvserver: fix logging for BenchmarkReplicaProposal

fad2283

Make this usable with Benchdiff. Informs cockroachdb#133682 Release note: None

arulajmani mentioned this issue Nov 5, 2024

kvserver: fix logging for BenchmarkReplicaProposal #134264

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kv: triage microbenchmark regressions for v24.3 #133682

kv: triage microbenchmark regressions for v24.3 #133682

arulajmani commented Oct 29, 2024 •

edited by iskettaneh

Loading

andrewbaptist commented Oct 30, 2024 •

edited

Loading

kvoli commented Oct 31, 2024

arulajmani commented Oct 31, 2024 •

edited

Loading

arulajmani commented Nov 4, 2024 •

edited

Loading

arulajmani commented Nov 5, 2024 •

edited

Loading

nvanbenschoten commented Nov 5, 2024

stevendanna commented Nov 6, 2024

kv: triage microbenchmark regressions for v24.3 #133682

kv: triage microbenchmark regressions for v24.3 #133682

Comments

arulajmani commented Oct 29, 2024 • edited by iskettaneh Loading

Tips for debugging

Footnotes

andrewbaptist commented Oct 30, 2024 • edited Loading

kvoli commented Oct 31, 2024

arulajmani commented Oct 31, 2024 • edited Loading

arulajmani commented Nov 4, 2024 • edited Loading

arulajmani commented Nov 5, 2024 • edited Loading

nvanbenschoten commented Nov 5, 2024

stevendanna commented Nov 6, 2024

arulajmani commented Oct 29, 2024 •

edited by iskettaneh

Loading

andrewbaptist commented Oct 30, 2024 •

edited

Loading

arulajmani commented Oct 31, 2024 •

edited

Loading

arulajmani commented Nov 4, 2024 •

edited

Loading

arulajmani commented Nov 5, 2024 •

edited

Loading