Enable file system buffer reuse for compaction prefetches #13187
Conversation
Just to clarify:
Why do we need the overlap buffer logic for reuse file system buffer + prefetch + non-async IO? Is it because there can actually be overlapping data, or is it just because the …
// readahead_size_.
uint64_t trimmed_readahead_size = 0;
if (n < readahead_size_) {
  trimmed_readahead_size = readahead_size_ - n;
Where in the original code does it show that compaction prefetches only readahead_size_ - n and not readahead_size_?
I remember it was very confusing to me what std::max(n, readahead_size_) in Prefetch() was intended for - why not n + readahead_size_?
Where in the original code does it show that compaction prefetches only readahead_size_ - n and not readahead_size_?

https://github.com/facebook/rocksdb/blob/main/file/file_prefetch_buffer.cc#L823

if (for_compaction) {
  s = Prefetch(opts, reader, offset, std::max(n, readahead_size_));

I think you have it correct. The std::max(n, readahead_size_) that you mentioned is the reason that the prefetch is only readahead_size_ - n. If n < readahead_size_, we read the original n bytes plus readahead_size_ - n of prefetched data.
@anand1976 I am in favor of unifying the logic so that we treat compaction readaheads the same as non-compaction readaheads (e.g. fetch the requested amount + readahead_size_ in either case). Do you have any objections / what are your thoughts? I think this would simplify our logic, and I don't see any downsides.
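For concreteness, here is a small sketch of the two sizing policies being compared. The byte counts are made up for illustration; only the std::max(n, readahead_size_) expression reflects the current compaction path quoted above.

#include <algorithm>
#include <cstdint>
#include <iostream>

int main() {
  // Illustrative values only: a 4 KiB request with an 8 KiB readahead setting.
  const uint64_t n = 4096;               // bytes the caller asked for
  const uint64_t readahead_size = 8192;  // stands in for readahead_size_

  // Current compaction path: Prefetch(..., std::max(n, readahead_size_)),
  // i.e. the total read is 8192 bytes, so only readahead_size - n = 4096
  // bytes beyond the requested data are actually prefetched.
  const uint64_t current_total = std::max(n, readahead_size);

  // Proposed unified behavior: the request size plus readahead_size,
  // matching how non-compaction readahead is sized.
  const uint64_t unified_total = n + readahead_size;

  std::cout << "current total read: " << current_total    // 8192
            << ", unified total read: " << unified_total  // 12288
            << std::endl;
  return 0;
}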
readahead_params.max_readahead_size = 8192;
readahead_params.num_buffers = 1;

FilePrefetchBuffer fpb(readahead_params, true, false, fs(), nullptr,
nit: add a comment with the parameter name right after each hard-coded value; same for TryReadFromCache()
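For illustration, a minimal sketch of the inline parameter-name comment style being requested. Configure() and its parameter names below are placeholders, not the actual FilePrefetchBuffer or TryReadFromCache signatures.

#include <cstdint>

// Placeholder function standing in for a call site with hard-coded arguments.
void Configure(uint64_t max_readahead_size, bool enable, bool track_min_offset) {
  (void)max_readahead_size;
  (void)enable;
  (void)track_min_offset;
}

int main() {
  // Each literal argument carries the name of the parameter it binds to, so
  // readers do not have to look up the signature to interpret true / false.
  Configure(/*max_readahead_size=*/8192,
            /*enable=*/true,
            /*track_min_offset=*/false);
  return 0;
}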
@@ -819,8 +819,20 @@ bool FilePrefetchBuffer::TryReadFromCacheUntracked(
  assert(reader != nullptr);
  assert(max_readahead_size_ >= readahead_size_);

  // We disallow async IO for compaction reads since their
  // latencies are not user-visible
nit: ... are more tolerable? We have stats for users to know when compaction reads are slow
I can update the wording. The idea was that a compaction read is a background operation, in contrast to a latency-sensitive, user-initiated scan.
ASSERT_EQ(stats->getAndResetTickerCount(PREFETCH_HITS), 1);
ASSERT_EQ(stats->getAndResetTickerCount(PREFETCH_BYTES_USEFUL),
          4096);  // 12288-16384
if (!for_compaction) {
why !for_compaction?
I can add a comment in the code for this.
We do not update the prefetch stats for compaction reads https://github.com/facebook/rocksdb/blob/main/file/file_prefetch_buffer.cc#L856-L857, so I need this check to keep the tests passing for all the different test parameter variations
} else if (!for_compaction) {
UpdateStats(/*found_in_buffer=*/true, n);
}
I don't know exactly why we don't do this. @anand1976 may know. I would guess that we don't want to "contaminate" the prefetch stats for non-compaction reads (PREFETCH_HITS and PREFETCH_BYTES_USEFUL), which we are presumably more interested in. I think we could have defined a separate set of prefetch stats specifically for compaction reads, but maybe we just did not care because compaction reads are all background ops anyway.
@hx235 With async IO, there are 2 buffers, so the overlap buffer is for when the requested data spans the 2 buffers (i.e. buffer 1 has the first half of the data and buffer 2 has the second half).

Without async IO, and with the file system buffer optimization, we use the overlap buffer when the main (and only) buffer has a "partial hit." Say the buffer contains offsets 100-200 and we request 150-300. Without the file system buffer optimization, we would normally "refit tail": move bytes 150-200 from the end of the buffer to the start of the buffer, and then request bytes 200-300 (plus any additional prefetching). With the file system buffer optimization, we don't do that because we are pointing to a buffer allocated by the file system. So instead we copy bytes 150-200 to the overlap buffer, request bytes 200-300 (plus any additional prefetching), and then copy the newly read bytes 200-300 into the overlap buffer as well. Ultimately, the overlap buffer contains exactly what the user asked for (bytes 150-300) and no more. We also avoid refetching data we already have (i.e. making a request for all of bytes 150-300).
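To make the non-async, file-system-buffer case concrete, here is a small standalone simulation of the copy sequence described above, using the same illustrative offsets. This is not the actual FilePrefetchBuffer code.

#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

int main() {
  // Simulated file contents so the final assertion can verify correctness.
  std::vector<char> file(400);
  for (size_t i = 0; i < file.size(); i++) file[i] = static_cast<char>(i);

  // The main buffer (owned by the file system, so we must not shuffle it in
  // place) currently holds offsets [100, 200); the caller asks for [150, 300).
  const uint64_t buf_offset = 100, buf_len = 100;
  const uint64_t req_offset = 150, req_len = 150;

  std::vector<char> main_buf(file.begin() + buf_offset,
                             file.begin() + buf_offset + buf_len);
  std::vector<char> overlap_buf(req_len);

  // Step 1: copy the partial hit, bytes [150, 200), out of the main buffer.
  const uint64_t hit_len = buf_offset + buf_len - req_offset;  // 50 bytes
  std::memcpy(overlap_buf.data(), main_buf.data() + (req_offset - buf_offset),
              hit_len);

  // Step 2: "read" the missing tail [200, 300) (plus any extra readahead,
  // not shown) and append it to the overlap buffer.
  std::memcpy(overlap_buf.data() + hit_len, file.data() + buf_offset + buf_len,
              req_len - hit_len);

  // The overlap buffer now holds exactly the requested bytes [150, 300),
  // and bytes [150, 200) were never fetched a second time.
  assert(std::memcmp(overlap_buf.data(), file.data() + req_offset, req_len) ==
         0);
  return 0;
}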
Summary
In #13177, I discussed an unsigned integer overflow issue that affects compaction reads inside FilePrefetchBuffer when we attempt to enable the file system buffer reuse optimization. In that PR, I disabled the optimization whenever for_compaction was true to eliminate the source of the bug.

This PR safely re-enables the optimization when for_compaction is true. We need to properly set the overlap buffer through PrefetchInternal rather than simply calling Prefetch. Prefetch assumes num_buffers_ is 1 (i.e. async IO is disabled), so historically it did not have any overlap buffer logic. What ends up happening (with the old bug) is that, when we try to reuse the file-system-provided buffer, the Prefetch method reads the remaining missing data. However, since we do not perform the RefitTail step when use_fs_buffer is true, we would normally rely on copying the partially relevant data into an overlap buffer. That overlap buffer logic was missing, so the final main buffer ends up storing data starting at an offset greater than the requested offset, and we effectively end up "throwing away" part of the requested data.
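A minimal sketch of that failure mode, with illustrative offsets only (it mirrors the overlap-buffer example earlier in the thread, but shows what the buggy path left behind rather than the fixed behavior).

#include <cstdint>
#include <iostream>

int main() {
  // Caller requests [150, 300); the reused buffer previously held [100, 200).
  const uint64_t req_offset = 150, req_end = 300;
  const uint64_t old_buf_end = 200;

  // Old buggy path: Prefetch() fetched only the missing tail and, without
  // RefitTail or an overlap-buffer copy, the main buffer simply became the
  // newly fetched range [200, 300).
  const uint64_t new_buf_offset = old_buf_end;

  // Bytes [150, 200) were requested but are no longer available anywhere:
  // they were effectively "thrown away".
  std::cout << "requested offset " << req_offset << ", buffer now starts at "
            << new_buf_offset << ", dropped bytes: "
            << (new_buf_offset - req_offset) << " of "
            << (req_end - req_offset) << std::endl;
  return 0;
}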
Test Plan

I removed the temporary test case from #13200 and incorporated the same test cases into my updated parameterized test case, which tests the valid combinations of use_async_prefetch and for_compaction.

I went further and added a randomized test case that simply tries to hit assertion failures and catch any missing areas in the logic.

I also added a test case for compaction reads without the file system buffer reuse optimization. I think it may be valuable to make a future PR that unifies a lot of these prefetch tests and parameterizes as much of them as possible. That way we can avoid writing duplicate tests and just loop over the different parameters for async IO, direct IO, file system buffer reuse, and for_compaction.
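For what it's worth, here is a rough sketch of the kind of parameterized test unification suggested above, using GoogleTest's Combine. The fixture, parameter accessors, and test body are hypothetical and not taken from the actual prefetch test code.

#include <tuple>

#include <gtest/gtest.h>

// Hypothetical fixture: each test runs once per combination of
// (use_async_prefetch, for_compaction, use_fs_buffer, use_direct_io).
class PrefetchCombinationTest
    : public ::testing::TestWithParam<std::tuple<bool, bool, bool, bool>> {
 protected:
  bool UseAsyncPrefetch() const { return std::get<0>(GetParam()); }
  bool ForCompaction() const { return std::get<1>(GetParam()); }
  bool UseFsBuffer() const { return std::get<2>(GetParam()); }
  bool UseDirectIO() const { return std::get<3>(GetParam()); }
};

TEST_P(PrefetchCombinationTest, Placeholder) {
  // Async IO is disallowed for compaction reads, so skip that combination.
  if (UseAsyncPrefetch() && ForCompaction()) {
    GTEST_SKIP() << "async IO + for_compaction is not a supported combination";
  }
  // A real test body would construct a FilePrefetchBuffer with these flags
  // and exercise Prefetch() / TryReadFromCache() accordingly.
  SUCCEED();
}

INSTANTIATE_TEST_CASE_P(
    PrefetchCombinations, PrefetchCombinationTest,
    ::testing::Combine(::testing::Bool(), ::testing::Bool(), ::testing::Bool(),
                       ::testing::Bool()));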