
record: allow unbounded queued blocks in LogWriter #2762

Merged
jbowens merged 2 commits into cockroachdb:master from endless-WAL-queue on Jul 24, 2023

Conversation

jbowens
Collaborator

@jbowens jbowens commented Jul 20, 2023

Previously, a LogWriter would allocate up to 16 blocks of 32 KiB for buffering WAL writes. If all 16 blocks had been allocated and no free blocks were available, a batch writing to the WAL would queue until the flushing goroutine freed blocks. In testing of write-heavy workloads, especially with larger value sizes, we've seen queueing at the LogWriter. This queueing blocks the commit pipeline, preventing any batches from committing, regardless of their priority or whether they require waiting for fsync.

This commit modifies LogWriter to allow the queueing of an unbounded number of blocks. In practice, for the current WAL, the memtable size serves as an upper bound. With a 64 MiB memtable, at most 64 MiB / 32 KiB = 2,048 blocks may queue. This is not an unreasonable amount of additional memory overhead for a write-heavy workload.
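
A rough sketch of the shape of this change, with simplified stand-ins for the real record/log_writer.go types (not the actual diff): when the free list is empty, the writer allocates another block instead of waiting for the flushing goroutine to return one.

```
// Simplified sketch only; not the actual pebble record.LogWriter code.
package main

import "sync"

const blockSize = 32 << 10 // 32 KiB, matching the WAL block size

type block struct {
	buf [blockSize]byte
}

type logWriter struct {
	free struct {
		sync.Mutex
		blocks []*block
	}
}

// queueBlock hands the caller a free block. Previously the caller would wait
// for the flushing goroutine when all 16 preallocated blocks were in use; now
// an empty free list simply allocates another block, so the commit pipeline
// never stalls here.
func (w *logWriter) queueBlock() *block {
	w.free.Lock()
	defer w.free.Unlock()
	if len(w.free.blocks) == 0 {
		w.free.blocks = append(w.free.blocks, &block{})
	}
	n := len(w.free.blocks)
	b := w.free.blocks[n-1]
	w.free.blocks = w.free.blocks[:n-1]
	return b
}

func main() {
	w := &logWriter{}
	_ = w.queueBlock() // never blocks, even with thousands of blocks queued
}
```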

Beyond improving throughput for write-heavy workloads, removing this hard bound improves tolerance of momentary fsync stalls.

Informs cockroachdb/cockroach#88699.

@cockroach-teamcity
Member

This change is Reviewable

@jbowens jbowens requested review from sumeerbhola and a team July 20, 2023 22:08
@jbowens jbowens changed the base branch from endless-WAL-queue to master July 21, 2023 14:59
Collaborator Author

@jbowens jbowens left a comment


We might also consider adding additional memory accounting for these blocks. I don't think it's urgent, given the 64 MiB bound. But if in the future we allow pipelining WALs and permit buffering writes across the WAL rotation (which I think would help tail latencies considerably), I think we'll need some kind of accounting.

Reviewable status: 0 of 6 files reviewed, all discussions resolved (waiting on @sumeerbhola)

Collaborator

@sumeerbhola sumeerbhola left a comment


Is it worth adding some commentary to #2540?

Can you add a test that does allocate a large number of blocks, so we can be sure that (a) it is correct, and (b) the slices are behaving reasonably, in that we are not incurring slice allocations where we don't expect to?
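
As an illustration of the kind of allocation check being asked for here (hypothetical code, not the test this PR adds), testing.AllocsPerRun can confirm that once the free list and its blocks exist, cycling blocks through the slice incurs no further allocations:

```
// Hypothetical allocation check; the PR's real test lives in the record package.
package main

import (
	"fmt"
	"testing"
)

const blockSize = 32 << 10 // 32 KiB

type block struct {
	buf [blockSize]byte
}

func main() {
	// Roughly a 64 MiB memtable's worth of WAL blocks, per the PR description.
	const numBlocks = 2048
	free := make([]*block, 0, numBlocks)
	for i := 0; i < numBlocks; i++ {
		free = append(free, &block{})
	}

	// Pop and re-append every block; with the slice and blocks preallocated,
	// this should report zero allocations per run.
	allocs := testing.AllocsPerRun(100, func() {
		for i := 0; i < numBlocks; i++ {
			b := free[len(free)-1]
			free = free[:len(free)-1]
			free = append(free, b)
		}
	})
	fmt.Printf("allocs/run = %.0f\n", allocs)
}
```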

Reviewed 6 of 6 files at r1.
Reviewable status: all files reviewed (commit messages unreviewed), 2 unresolved discussions (waiting on @jbowens)


batch.go line 323 at r2 (raw file):

	// WALQueueWaitDuration is the wait time for allocating memory blocks in the
	// LogWriter (due to the LogWriter not writing fast enough).
	WALQueueWaitDuration time.Duration

will this come back once we add some memory bound?
I am wondering whether we want to keep this for now given that we have the plumbing for it in CockroachDB already.
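
For context, a hedged sketch of how a wait-duration stat like this is populated under the bounded design; the surrounding names are illustrative rather than pebble's actual commit-pipeline plumbing:

```
// Illustrative only; pebble's real BatchCommitStats plumbing differs.
package main

import (
	"fmt"
	"time"
)

type BatchCommitStats struct {
	// WALQueueWaitDuration is the wait time for allocating memory blocks in
	// the LogWriter (due to the LogWriter not writing fast enough).
	WALQueueWaitDuration time.Duration
}

// waitForFreeBlock times the (hypothetical) wait for the flusher to free a
// block and accumulates it. With unbounded queueing this wait largely goes
// away, which is what prompts the question of whether to keep the field.
func waitForFreeBlock(stats *BatchCommitStats, wait func()) {
	start := time.Now()
	wait()
	stats.WALQueueWaitDuration += time.Since(start)
}

func main() {
	var stats BatchCommitStats
	waitForFreeBlock(&stats, func() { time.Sleep(time.Millisecond) })
	fmt.Println("WAL queue wait:", stats.WALQueueWaitDuration)
}
```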


record/log_writer.go line 569 at r2 (raw file):

	w.free.Lock()
	if len(w.free.blocks) == 0 {
		w.free.blocks = append(w.free.blocks, &block{})

should we consider allocating the free blocks from a sync.Pool, given the well-defined lifecycle and the larger number of blocks a logWriter may allocate? That is, two levels of free pool management -- what we have here, and then the sync.Pool for &block{} and return them all to the sync.Pool in LogWriter.Close.
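
A minimal sketch of the suggested two-level scheme, assuming a package-level sync.Pool as the second level (identifiers are illustrative, not the exact ones the PR adds):

```
// Illustrative sketch of two-level block pooling; not the exact PR code.
package main

import "sync"

const blockSize = 32 << 10 // 32 KiB

type block struct {
	buf [blockSize]byte
}

// blockPool lets blocks outlive a single LogWriter: a closed writer returns
// its blocks here, and the next writer draws from it before allocating fresh
// blocks.
var blockPool = sync.Pool{
	New: func() any { return &block{} },
}

// newBlock is the fallback when the writer's own free list is empty.
func newBlock() *block {
	return blockPool.Get().(*block)
}

// releaseBlocks would run from LogWriter.Close, handing every buffered block
// back to the shared pool for reuse by the next LogWriter.
func releaseBlocks(blocks []*block) {
	for _, b := range blocks {
		blockPool.Put(b)
	}
}

func main() {
	b := newBlock()
	releaseBlocks([]*block{b})
}
```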

Collaborator Author

@jbowens jbowens left a comment


Added a comment to #2540 and a unit test + benchmark.

Reviewable status: 3 of 6 files reviewed, 2 unresolved discussions (waiting on @sumeerbhola)


batch.go line 323 at r2 (raw file):

Previously, sumeerbhola wrote…

will this come back once we add some memory bound?
I am wondering whether we want to keep this for now given that we have the plumbing for it in CockroachDB already.

Yeah, I started to wonder whether we ultimately should only have MemtableStopWritesThreshold, which would also bound the queued WAL blocks. But I'll keep this around so we can make a decision then.


record/log_writer.go line 569 at r2 (raw file):

Previously, sumeerbhola wrote…

should we consider allocating the free blocks from a sync.Pool, given the well-defined lifecycle and the larger number of blocks a logWriter may allocate? That is, two levels of free pool management -- what we have here, and then the sync.Pool for &block{} and return them all to the sync.Pool in LogWriter.Close.

Done.

@jbowens jbowens force-pushed the endless-WAL-queue branch 3 times, most recently from a9d5013 to b1800d7 on July 24, 2023 16:50
When a LogWriter is closed, return its blocks to a sync.Pool for reuse by a
subsequent LogWriter to reduce allocations.

```
goos: darwin
goarch: arm64
pkg: github.com/cockroachdb/pebble/record
                                    │   old.txt    │               new.txt               │
                                    │    sec/op    │   sec/op     vs base                │
QueueWALBlocks/record-size=64B-10      32.80m ± 2%   28.23m ± 2%  -13.93% (p=0.000 n=10)
QueueWALBlocks/record-size=512B-10     15.12m ± 2%   11.55m ± 1%  -23.58% (p=0.000 n=10)
QueueWALBlocks/record-size=1.0KB-10    14.23m ± 4%   10.93m ± 4%  -23.20% (p=0.000 n=10)
QueueWALBlocks/record-size=2.0KB-10    13.84m ± 2%   10.52m ± 2%  -24.00% (p=0.000 n=10)
QueueWALBlocks/record-size=32KB-10    13.197m ± 1%   9.784m ± 2%  -25.87% (p=0.000 n=10)
geomean                                16.67m        12.97m       -22.22%

                                    │   old.txt    │               new.txt                │
                                    │     B/s      │     B/s       vs base                │
QueueWALBlocks/record-size=64B-10     1.906Gi ± 2%   2.214Gi ± 2%  +16.18% (p=0.000 n=10)
QueueWALBlocks/record-size=512B-10    4.134Gi ± 2%   5.411Gi ± 1%  +30.86% (p=0.000 n=10)
QueueWALBlocks/record-size=1.0KB-10   4.393Gi ± 4%   5.721Gi ± 4%  +30.21% (p=0.000 n=10)
QueueWALBlocks/record-size=2.0KB-10   4.514Gi ± 2%   5.940Gi ± 2%  +31.58% (p=0.000 n=10)
QueueWALBlocks/record-size=32KB-10    4.736Gi ± 1%   6.388Gi ± 2%  +34.89% (p=0.000 n=10)
geomean                               3.748Gi        4.820Gi       +28.58%

                                    │     old.txt     │               new.txt                │
                                    │      B/op       │     B/op      vs base                │
QueueWALBlocks/record-size=64B-10     96058.79Ki ± 0%   99.72Ki ± 1%  -99.90% (p=0.000 n=10)
QueueWALBlocks/record-size=512B-10    83738.81Ki ± 0%   98.77Ki ± 0%  -99.88% (p=0.000 n=10)
QueueWALBlocks/record-size=1.0KB-10   82858.79Ki ± 0%   98.99Ki ± 0%  -99.88% (p=0.000 n=10)
QueueWALBlocks/record-size=2.0KB-10   82418.81Ki ± 0%   98.78Ki ± 0%  -99.88% (p=0.000 n=10)
QueueWALBlocks/record-size=32KB-10    82019.15Ki ± 0%   99.02Ki ± 0%  -99.88% (p=0.000 n=10)
geomean                                  83.26Mi        99.06Ki       -99.88%

                                    │   old.txt    │              new.txt               │
                                    │  allocs/op   │ allocs/op   vs base                │
QueueWALBlocks/record-size=64B-10     2413.00 ± 0%   14.00 ± 0%  -99.42% (p=0.000 n=10)
QueueWALBlocks/record-size=512B-10    2105.00 ± 0%   14.00 ± 0%  -99.33% (p=0.000 n=10)
QueueWALBlocks/record-size=1.0KB-10   2083.00 ± 0%   14.00 ± 7%  -99.33% (p=0.000 n=10)
QueueWALBlocks/record-size=2.0KB-10   2072.00 ± 0%   14.00 ± 0%  -99.32% (p=0.000 n=10)
QueueWALBlocks/record-size=32KB-10    2062.00 ± 0%   14.00 ± 7%  -99.32% (p=0.000 n=10)
geomean                                2.143k        14.00       -99.35%
```
Collaborator

@sumeerbhola sumeerbhola left a comment


:lgtm:

Reviewed 4 of 4 files at r3, 1 of 1 files at r4, 1 of 1 files at r6.
Reviewable status: 6 of 7 files reviewed, 2 unresolved discussions (waiting on @jbowens)

@jbowens
Collaborator Author

jbowens commented Jul 24, 2023

TFTR!

@jbowens jbowens merged commit 2e84c30 into cockroachdb:master Jul 24, 2023
9 checks passed
@jbowens jbowens deleted the endless-WAL-queue branch July 24, 2023 17:37