record: allow unbounded queued blocks in LogWriter #2762
Conversation
Force-pushed from 5db333d to 3a83819
We might also consider adding additional memory accounting for these blocks. I don't think it's urgent, given the 64 MiB bound. But if in the future we allow pipelining WALs and permit buffering writes across the WAL rotation (which I think would help tail latencies considerably), I think we'll need some kind of accounting.
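For illustration, a minimal sketch of the kind of accounting contemplated here, assuming a simple process-wide atomic counter; the names queuedBlockBytes, noteBlockQueued, and noteBlockFreed are hypothetical and not part of the LogWriter.

```go
// Package walsketch is an illustrative stand-in, not the real record package.
package walsketch

import "sync/atomic"

const blockSize = 32 << 10 // 32 KiB WAL block size

// queuedBlockBytes would track memory held by blocks queued for flushing.
var queuedBlockBytes atomic.Int64

// noteBlockQueued would be called when a block is handed to the flusher.
func noteBlockQueued() { queuedBlockBytes.Add(blockSize) }

// noteBlockFreed would be called when the flusher returns a block.
func noteBlockFreed() { queuedBlockBytes.Add(-blockSize) }
```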
Reviewable status: 0 of 6 files reviewed, all discussions resolved (waiting on @sumeerbhola)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth adding some commentary to #2540?
Can you add a test that does allocate a large number of blocks, so we can be sure that (a) it is correct, and (b) the slices are behaving reasonably, in that we are not incurring slice allocations where we don't expect them.
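For illustration, a rough sketch of the shape such a benchmark could take, sweeping record sizes as sub-benchmarks; the writeRecords helper and the use of io.Discard are assumptions for the sketch, not the benchmark that was ultimately added.

```go
package walsketch

import (
	"fmt"
	"io"
	"testing"
)

// writeRecords is a hypothetical stand-in for writing n records of the given
// size through a LogWriter; here it just writes to the provided io.Writer.
func writeRecords(w io.Writer, recordSize, n int) error {
	rec := make([]byte, recordSize)
	for i := 0; i < n; i++ {
		if _, err := w.Write(rec); err != nil {
			return err
		}
	}
	return nil
}

func BenchmarkQueueWALBlocks(b *testing.B) {
	// Write roughly one 64 MiB memtable's worth of WAL per iteration so that
	// many 32 KiB blocks must be queued.
	const totalSize = 64 << 20
	for _, recordSize := range []int{64, 512, 1 << 10, 2 << 10, 32 << 10} {
		b.Run(fmt.Sprintf("record-size=%d", recordSize), func(b *testing.B) {
			b.ReportAllocs()
			b.SetBytes(totalSize)
			for i := 0; i < b.N; i++ {
				if err := writeRecords(io.Discard, recordSize, totalSize/recordSize); err != nil {
					b.Fatal(err)
				}
			}
		})
	}
}
```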
Reviewed 6 of 6 files at r1.
Reviewable status: all files reviewed (commit messages unreviewed), 2 unresolved discussions (waiting on @jbowens)
batch.go
line 323 at r2 (raw file):
// WALQueueWaitDuration is the wait time for allocating memory blocks in the
// LogWriter (due to the LogWriter not writing fast enough).
WALQueueWaitDuration time.Duration
will this come back once we add some memory bound?
I am wondering whether we want to keep this for now given that we have the plumbing for it in CockroachDB already.
record/log_writer.go
line 569 at r2 (raw file):
w.free.Lock()
if len(w.free.blocks) == 0 {
	w.free.blocks = append(w.free.blocks, &block{})
should we consider allocating the free blocks from a sync.Pool, given the well-defined lifecycle and the larger number of blocks a logWriter may allocate? That is, two levels of free pool management -- what we have here, and then the sync.Pool for &block{}, and return them all to the sync.Pool in LogWriter.Close.
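For illustration, a minimal sketch of the two-level scheme being suggested, using a pared-down stand-in for the writer-local free list; the block layout and method names are assumptions for the sketch rather than the exact code in the package.

```go
// Package walsketch is an illustrative stand-in, not the real record package.
package walsketch

import "sync"

const blockSize = 32 << 10 // 32 KiB WAL block size

// block is a fixed-size WAL buffer; written tracks how much of buf is filled.
type block struct {
	written int
	buf     [blockSize]byte
}

// blockPool is the second, process-wide level of pooling: blocks released by
// one writer at Close can be reused by the next writer.
var blockPool = sync.Pool{
	New: func() any { return &block{} },
}

// writerFreeList is a pared-down stand-in for the writer-local free list
// (the first level of pooling, guarded by a mutex as in the discussion).
type writerFreeList struct {
	sync.Mutex
	blocks []*block
}

// get pops a block from the local free list, falling back to the sync.Pool,
// which allocates a fresh block only when the pool itself is empty.
func (f *writerFreeList) get() *block {
	f.Lock()
	defer f.Unlock()
	if n := len(f.blocks); n > 0 {
		b := f.blocks[n-1]
		f.blocks = f.blocks[:n-1]
		return b
	}
	return blockPool.Get().(*block)
}

// release returns every locally held block to the sync.Pool, as would happen
// in LogWriter.Close.
func (f *writerFreeList) release() {
	f.Lock()
	defer f.Unlock()
	for _, b := range f.blocks {
		b.written = 0
		blockPool.Put(b)
	}
	f.blocks = nil
}
```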
Force-pushed from 3a83819 to d91ae94
Added a comment to #2540 and a unit test + benchmark.
Reviewable status: 3 of 6 files reviewed, 2 unresolved discussions (waiting on @sumeerbhola)
batch.go
line 323 at r2 (raw file):
Previously, sumeerbhola wrote…
will this come back once we add some memory bound?
I am wondering whether we want to keep this for now given that we have the plumbing for it in CockroachDB already.
Yeah, I started to wonder whether we ultimately should only have MemtableStopWritesThreshold, which will also bound the queued WAL blocks. But I'll keep this around so we can make a decision then.
record/log_writer.go
line 569 at r2 (raw file):
Previously, sumeerbhola wrote…
should we consider allocating the free blocks from a sync.Pool, given the well-defined lifecycle and the larger number of blocks a logWriter may allocate? That is, two levels of free pool management -- what we have here, and then the sync.Pool for &block{}, and return them all to the sync.Pool in LogWriter.Close.
Done.
Force-pushed from a9d5013 to b1800d7
Previously, a LogWriter would allocate up to 16 blocks of 32 KiB for buffering WAL writes. If all 16 blocks had been allocated and no free blocks were available, a batch writing to the WAL would queue until the flushing goroutine freed blocks. In testing of write-heavy workloads, especially with larger value sizes, we've seen queueing at the LogWriter. This queueing blocks the commit pipeline, preventing any batches from committing regardless of priority and whether they require waiting for fsync.

This commit modifies LogWriter to allow the queueing of an unbounded number of blocks. In practice, for the current WAL, the memtable size serves as an upper bound. With a 64 MiB memtable, at most 64 MiB / 32 KiB = 2,048 blocks may queue. This is not an unreasonable amount of additional memory overhead for a write-heavy workload.

Beyond improving throughput for write-heavy workloads, removing this hard bound improves tolerance of momentary disk stalls.

Informs cockroachdb/cockroach#88699.
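For illustration, a minimal before/after sketch of the allocation path described above, using a pared-down free list; the condition-variable wait in the "before" version and the names here are assumptions standing in for the real LogWriter internals.

```go
// Package walsketch is an illustrative stand-in, not the real record package.
package walsketch

import "sync"

// block stands in for a 32 KiB WAL buffer.
type block struct {
	buf [32 << 10]byte
}

// freeList is a pared-down stand-in for the writer-local pool of blocks.
type freeList struct {
	sync.Mutex
	cond   sync.Cond // signaled by the flusher when it returns a block
	blocks []*block
}

// newFreeList initializes the condition variable so getBounded can wait on it.
func newFreeList() *freeList {
	f := &freeList{}
	f.cond.L = &f.Mutex
	return f
}

// getBounded models the old behavior: with a hard cap of 16 blocks, the
// committing goroutine must wait for the flusher once all blocks are in use.
func (f *freeList) getBounded() *block {
	f.Lock()
	defer f.Unlock()
	for len(f.blocks) == 0 {
		f.cond.Wait()
	}
	b := f.blocks[len(f.blocks)-1]
	f.blocks = f.blocks[:len(f.blocks)-1]
	return b
}

// getUnbounded models the new behavior: never wait; allocate a fresh block
// when the free list is empty. Memory stays bounded in practice because the
// memtable size caps the amount of unflushed WAL (64 MiB / 32 KiB = 2,048
// blocks for a 64 MiB memtable).
func (f *freeList) getUnbounded() *block {
	f.Lock()
	defer f.Unlock()
	if len(f.blocks) == 0 {
		f.blocks = append(f.blocks, &block{})
	}
	b := f.blocks[len(f.blocks)-1]
	f.blocks = f.blocks[:len(f.blocks)-1]
	return b
}
```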
When a LogWriter is closed, return its blocks to a sync.Pool for reuse by a subsequent LogWriter to reduce allocations.

```
goos: darwin
goarch: arm64
pkg: github.com/cockroachdb/pebble/record
                                     │  old.txt    │              new.txt                │
                                     │   sec/op    │   sec/op     vs base                │
QueueWALBlocks/record-size=64B-10      32.80m ± 2%   28.23m ± 2%  -13.93% (p=0.000 n=10)
QueueWALBlocks/record-size=512B-10     15.12m ± 2%   11.55m ± 1%  -23.58% (p=0.000 n=10)
QueueWALBlocks/record-size=1.0KB-10    14.23m ± 4%   10.93m ± 4%  -23.20% (p=0.000 n=10)
QueueWALBlocks/record-size=2.0KB-10    13.84m ± 2%   10.52m ± 2%  -24.00% (p=0.000 n=10)
QueueWALBlocks/record-size=32KB-10    13.197m ± 1%   9.784m ± 2%  -25.87% (p=0.000 n=10)
geomean                                16.67m        12.97m       -22.22%

                                     │   old.txt    │              new.txt                │
                                     │     B/s      │     B/s      vs base                │
QueueWALBlocks/record-size=64B-10      1.906Gi ± 2%   2.214Gi ± 2%  +16.18% (p=0.000 n=10)
QueueWALBlocks/record-size=512B-10     4.134Gi ± 2%   5.411Gi ± 1%  +30.86% (p=0.000 n=10)
QueueWALBlocks/record-size=1.0KB-10    4.393Gi ± 4%   5.721Gi ± 4%  +30.21% (p=0.000 n=10)
QueueWALBlocks/record-size=2.0KB-10    4.514Gi ± 2%   5.940Gi ± 2%  +31.58% (p=0.000 n=10)
QueueWALBlocks/record-size=32KB-10     4.736Gi ± 1%   6.388Gi ± 2%  +34.89% (p=0.000 n=10)
geomean                                3.748Gi        4.820Gi       +28.58%

                                     │    old.txt      │             new.txt               │
                                     │      B/op       │    B/op      vs base              │
QueueWALBlocks/record-size=64B-10      96058.79Ki ± 0%   99.72Ki ± 1%  -99.90% (p=0.000 n=10)
QueueWALBlocks/record-size=512B-10     83738.81Ki ± 0%   98.77Ki ± 0%  -99.88% (p=0.000 n=10)
QueueWALBlocks/record-size=1.0KB-10    82858.79Ki ± 0%   98.99Ki ± 0%  -99.88% (p=0.000 n=10)
QueueWALBlocks/record-size=2.0KB-10    82418.81Ki ± 0%   98.78Ki ± 0%  -99.88% (p=0.000 n=10)
QueueWALBlocks/record-size=32KB-10     82019.15Ki ± 0%   99.02Ki ± 0%  -99.88% (p=0.000 n=10)
geomean                                83.26Mi           99.06Ki       -99.88%

                                     │   old.txt    │             new.txt               │
                                     │  allocs/op   │  allocs/op   vs base              │
QueueWALBlocks/record-size=64B-10      2413.00 ± 0%   14.00 ± 0%  -99.42% (p=0.000 n=10)
QueueWALBlocks/record-size=512B-10     2105.00 ± 0%   14.00 ± 0%  -99.33% (p=0.000 n=10)
QueueWALBlocks/record-size=1.0KB-10    2083.00 ± 0%   14.00 ± 7%  -99.33% (p=0.000 n=10)
QueueWALBlocks/record-size=2.0KB-10    2072.00 ± 0%   14.00 ± 0%  -99.32% (p=0.000 n=10)
QueueWALBlocks/record-size=32KB-10     2062.00 ± 0%   14.00 ± 7%  -99.32% (p=0.000 n=10)
geomean                                2.143k         14.00       -99.35%
```
Force-pushed from b1800d7 to 4445b51
Reviewed 4 of 4 files at r3, 1 of 1 files at r4, 1 of 1 files at r6.
Reviewable status: 6 of 7 files reviewed, 2 unresolved discussions (waiting on @jbowens)
TFTR!
Previously, a LogWriter would allocate up to 16 blocks of 32 KiB for buffering WAL writes. If all 16 blocks had been allocated and no free blocks were available, a batch writing to the WAL would queue until the flushing goroutine freed blocks. In testing of write-heavy workloads, especially with larger value sizes, we've seen queueing at the LogWriter. This queueing blocks the commit pipeline, preventing any batches from committing regardless of priority and whether they require waiting for fsync.
This commit modifies LogWriter to allow the queueing of an unbounded number of blocks. In practice, for the current WAL, the memtable size serves as an upper bound. With a 64 MiB memtable, at most 64 MiB / 32 KiB = 2,048 blocks may queue. This is not an unreasonable amount of additional memory overhead for a write-heavy workload.
Beyond improving throughput for write-heavy workloads, removing this hard bound improves tolerance of momentary fsync stalls.
Informs cockroachdb/cockroach#88699.