Let L be the fsync latency of the WAL storage medium.
When the memtable and WAL are rotated, the first batch application to the new WAL may, at worst, need to wait:

1. For an in-flight fsync of entries to the previous WAL to complete (at worst, L).
2. For a final fsync of entries to the previous WAL that did not make the in-flight fsync (L).
3. For a final fsync in LogWriter.Close to ensure the EOF trailer is synced (L).
4. For an fsync of the WAL directory to ensure the new WAL is durably linked at its new name (L).
5. For the fsync of the new batch itself (L).
Cumulatively, these can cause commit tail latencies to increase 5x. There are a few ways this could be reduced.
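Spelling out the arithmetic: if each of the five waits above costs a full fsync and none of them overlap, the first commit after a rotation can wait roughly 5 × L, compared with the single fsync of roughly L that an ordinary commit waits on; hence the ~5x tail.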
(2) & (3) could together be bounded by a single L through more coordination between LogWriter.Close and the LogWriter's flush loop. The final flush of log entries (2) could include the EOF trailer and the final sync that is currently performed separately (pebble/record/log_writer.go, lines 638 to 645 at f6eaf9a); a rough sketch of the combined flush follows the snippet:
```go
// Sync any flushed data to disk. NB: flushLoop will sync after flushing the
// last buffered data only if it was requested via syncQ, so we need to sync
// here to ensure that all the data is synced.
err := w.flusher.err
var syncLatency time.Duration
if err == nil && w.s != nil {
	syncLatency, err = w.syncWithLatency()
}
```
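As a very rough illustration of the shape this coordination could take, here is a toy model, not pebble's actual LogWriter or its syncQ machinery; miniLogWriter, eofTrailer, and the rest are hypothetical stand-ins. The point is only that Close hands the trailer to the flush loop and requests a sync on that same flush, so steps (2) and (3) share one fsync:

```go
package walsketch

import (
	"os"
	"sync"
)

// miniLogWriter is a toy model, not pebble's LogWriter: Close appends the EOF
// trailer to the same buffer the flush loop drains and asks for a sync on that
// final flush, so the last flush of entries (2) and the trailer sync (3) cost
// a single fsync instead of two.
type miniLogWriter struct {
	mu      sync.Mutex
	cond    *sync.Cond
	pending []byte     // buffered, unflushed log entries
	syncReq chan error // non-nil when the next flush must also fsync
	closed  bool
	f       *os.File
}

func newMiniLogWriter(f *os.File) *miniLogWriter {
	w := &miniLogWriter{f: f}
	w.cond = sync.NewCond(&w.mu)
	go w.flushLoop()
	return w
}

func (w *miniLogWriter) flushLoop() {
	for {
		w.mu.Lock()
		for len(w.pending) == 0 && !w.closed {
			w.cond.Wait()
		}
		if len(w.pending) == 0 && w.closed {
			w.mu.Unlock()
			return
		}
		buf, syncReq := w.pending, w.syncReq
		w.pending, w.syncReq = nil, nil
		w.mu.Unlock()

		_, err := w.f.Write(buf)
		if err == nil && syncReq != nil {
			err = w.f.Sync() // one fsync covers the entries and, on Close, the trailer
		}
		if syncReq != nil {
			syncReq <- err
		}
	}
}

// Close queues the EOF trailer behind any still-buffered entries and waits for
// the flush loop's combined flush+sync, instead of flushing and then writing
// and syncing the trailer separately.
func (w *miniLogWriter) Close() error {
	w.mu.Lock()
	w.pending = append(w.pending, eofTrailer()...)
	done := make(chan error, 1)
	w.syncReq = done
	w.closed = true
	w.mu.Unlock()
	w.cond.Signal()

	err := <-done
	if cerr := w.f.Close(); err == nil {
		err = cerr
	}
	return err
}

// eofTrailer stands in for the real record-format trailer bytes.
func eofTrailer() []byte { return []byte{0} }
```

In the real LogWriter the trailer bytes and the waiting mechanism would come from the existing record format and syncQ; the sketch only shows the control flow.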
(4) & (5) could happen in parallel, but it would require some additional, delicate synchronization.
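A minimal sketch of what overlapping (4) and (5) might look like, assuming the commit is only acknowledged once both fsyncs return; syncNewWALAndFirstBatch and writeBatch are hypothetical names, and the delicate ordering of sync acknowledgements that pebble would actually need is not shown:

```go
package walsketch

import (
	"os"

	"golang.org/x/sync/errgroup"
)

// syncNewWALAndFirstBatch overlaps (4) and (5): the fsync of the WAL directory
// that durably links the new WAL and the fsync of the first batch written to
// it are issued concurrently, and the caller only proceeds after both return.
func syncNewWALAndFirstBatch(walDir, walFile *os.File, writeBatch func(*os.File) error) error {
	var g errgroup.Group
	g.Go(func() error {
		return walDir.Sync() // (4) make the new WAL's directory entry durable
	})
	g.Go(func() error {
		if err := writeBatch(walFile); err != nil {
			return err
		}
		return walFile.Sync() // (5) make the first batch's entries durable
	})
	return g.Wait() // only now may waiting committers be signalled
}
```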
Alternatively, we could prepare the next WAL ahead of time. In a steady state, Pebble would have two open WALs with log numbers >= minUnflushedLogNum: current and next. The next LogWriter's flushLoop would synchronize with current's Close, refusing to signal waiting syncQueuers until current's Close has completed. Combined with addressing (2) & (3), this would eliminate any additional worst-case fsync latency from the WAL rotation itself, bringing it in line with ordinary WAL fsyncs.
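A small sketch of the gating this would require, with illustrative names rather than pebble's: the next WAL's flush loop can write and fsync eagerly, but holds back acknowledgements to waiting syncQueuers until a channel closed by current's Close is observed:

```go
package walsketch

// Each LogWriter after the first would be constructed with a prevClosed
// channel that is closed once the previous LogWriter's Close has returned.
// Writes and fsyncs proceed as usual; only the acknowledgement is delayed,
// so acknowledged commits remain durable in WAL order.
type syncWaiter struct {
	done chan error
}

type nextWALGate struct {
	prevClosed <-chan struct{} // closed when the current WAL's Close completes
}

// notifyWaiters would be called by the next WAL's flushLoop after it has
// fsynced a chunk of entries.
func (g *nextWALGate) notifyWaiters(waiters []syncWaiter, syncErr error) {
	<-g.prevClosed // wait until the previous WAL is durably closed
	for _, w := range waiters {
		w.done <- syncErr
	}
}
```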
In Open, we would need to relax or rework the strictWALTail option. Currently, all replayed WALs besides the most recent one are required to have clean tails indicating that they were deliberately closed; anything else is interpreted as corruption. With this change, it would be possible for the second most recent WAL to have an unclean tail for some time. We could include a marker entry in the next WAL, written only after the next WAL has observed that current's Close completed, indicating that if recovery observes an unclean tail in the previous WAL, it should be treated as corruption.
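A sketch of the replay-time rule the marker would enable, under the assumption that only the second most recent WAL gets this slack; replayedWAL and uncleanTailIsCorruption are hypothetical names:

```go
package walsketch

// Today every replayed WAL except the most recent must have a clean tail. With
// a prepared next WAL, the second most recent WAL may legitimately have an
// unclean tail, unless its successor carries the marker proving that the
// previous writer's Close completed. Older WALs are ignored by this sketch.
type replayedWAL struct {
	cleanTail        bool // ended with a deliberate close / EOF trailer
	successorHasMark bool // the following WAL contains the close-completed marker
	mostRecent       bool // the last WAL, which may always be torn by a crash
}

func uncleanTailIsCorruption(w replayedWAL) bool {
	if w.cleanTail || w.mostRecent {
		return false
	}
	// An unclean tail in an older WAL is only provably corruption if the next
	// WAL promises, via the marker, that this WAL's Close had completed.
	return w.successorHasMark
}
```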
In #2762 we've unbounded the amount of data that may be queued for flushing within a single WAL. Today, the 1:1 relationship between WALs and memtables means that the amount of data queued for flushing is bounded by the size of the mutable memtable. If we begin pipelining WALs, allowing more than one WAL to queue writes, this bound is effectively lifted to opts.MemTableStopWritesThreshold * opts.MemtableSize. If/when we make this change, we should reevaluate what, if any, additional bound we want to impose on blocks queued for flushing.
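For a concrete sense of scale, with illustrative (not default) settings of opts.MemTableStopWritesThreshold = 4 and opts.MemtableSize = 64 MiB, the data queued for flushing could grow to roughly 4 × 64 MiB = 256 MiB, versus the single 64 MiB mutable memtable that bounds it today.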
Jira issue: PEBBLE-192