
Enable file system buffer reuse for compaction prefetches #13187

Draft · wants to merge 12 commits into main from assert-preconditions-prefetch
Conversation

@archang19 (Contributor) commented Dec 5, 2024

Summary

In #13177, I discussed an unsigned integer overflow issue that affects compaction reads inside FilePrefetchBuffer when we attempt to enable the file system buffer reuse optimization. In that PR, I disabled the optimization whenever for_compaction was true to eliminate the source of the bug.

This PR safely re-enables the optimization when for_compaction is true. We need to set up the overlap buffer properly through PrefetchInternal rather than simply calling Prefetch. Prefetch assumes num_buffers_ is 1 (i.e. async IO is disabled), so historically it did not have any overlap buffer logic. With the old bug, when we tried to reuse the file system provided buffer, Prefetch would read the remaining missing data. However, since we do not perform any RefitTail when use_fs_buffer is true, we would normally rely on copying the partially relevant data into an overlap buffer. That overlap buffer logic was missing, so the final main buffer ended up storing data starting at an offset greater than the requested offset, and we effectively "threw away" part of the requested data.

Test Plan

I removed the temporary test case from #13200 and incorporated the same test cases into my updated parameterized test, which covers the valid combinations of use_async_prefetch and for_compaction.

I went further and added a randomized test case that will simply try to hit assertion failures and catch any missing areas in the logic.

I also added a test case for compaction reads without the file system buffer reuse optimization. I think it may be valuable to make a future PR that unifies many of these prefetch tests and parameterizes as many of them as possible. That way we can avoid writing duplicate tests and simply iterate over different parameters for async IO, direct IO, file system buffer reuse, and for_compaction.

@facebook-github-bot (Contributor)

@archang19 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@archang19 has updated the pull request. You must reimport the pull request before landing.

@archang19 archang19 force-pushed the assert-preconditions-prefetch branch from cb16e90 to 32d2d17 Compare December 6, 2024 23:28

@archang19 archang19 requested a review from anand1976 December 6, 2024 23:50
@archang19 archang19 force-pushed the assert-preconditions-prefetch branch from 3fd9280 to 9a0a2c4 Compare December 6, 2024 23:55
@archang19 archang19 force-pushed the assert-preconditions-prefetch branch from f87e358 to 88ebd8b Compare December 9, 2024 23:30

@archang19 archang19 changed the title Add precondition assertions for file prefetch buffer reads Enable file system buffer reuse for compaction prefetches Dec 9, 2024
@archang19 archang19 force-pushed the assert-preconditions-prefetch branch from 88ebd8b to ab57916 Compare December 9, 2024 23:44

@archang19 archang19 force-pushed the assert-preconditions-prefetch branch from 315bfc0 to 5e4483f Compare December 12, 2024 18:12

@hx235 (Contributor) commented Dec 13, 2024

Just to clarify:

Prefetch assumes num_buffers_ is 1 (i.e. async IO is disabled), so historically it did not have any overlap buffer logic. ... However, since we do not do any RefitTail method when use_fs_buffer is true, normally we would rely on copying the partial relevant data into an overlap buffer. That overlap buffer logic was missing, so the final main buffer ends up storing data from an offset that is greater than the requested offset, and we effectively end up "throwing away" part of the requested data.

Why do we need overlap buffer logic for reuse file system buffer + prefetch + non-async IO? Is it because there can actually be overlapping data, or is it just because the overlap_buf_ is reused while the data being copied is non-overlapping? I'm trying to get a high-level understanding of why we ended up with "so the final main buffer ends up storing data from an offset that is greater than the requested offset..." in Prefetch()

  // readahead_size_.
  uint64_t trimmed_readahead_size = 0;
  if (n < readahead_size_) {
    trimmed_readahead_size = readahead_size_ - n;

@hx235 (Contributor) Dec 13, 2024

Where in the original code shows that compaction is prefetching only readahead_size_ - n but not readahead_size_?

I remember it was very confusing to me what std::max(n, readahead_size_) in Prefetch() was intended for: why not n + readahead_size_?

Contributor Author

Where in the original code shows that compaction is prefetching only readahead_size_ - n but not readahead_size_?

https://github.com/facebook/rocksdb/blob/main/file/file_prefetch_buffer.cc#L823

      if (for_compaction) {
        s = Prefetch(opts, reader, offset, std::max(n, readahead_size_));

I think you have it correct. The std::max(n, readahead_size_) that you mentioned is the reason the prefetch covers only readahead_size_ - n extra bytes. If n < readahead_size_, we read the requested n bytes plus readahead_size_ - n bytes of prefetched data, for readahead_size_ bytes in total.

Contributor Author

@anand1976 I am in favor of unifying the logic so that we treat compaction readaheads the same as non-compaction readaheads (e.g. fetch the requested amount plus readahead_size_ in either case). Do you have any objections, and what are your thoughts? I think this would simplify our logic, and I don't see downsides.

readahead_params.max_readahead_size = 8192;
readahead_params.num_buffers = 1;

FilePrefetchBuffer fpb(readahead_params, true, false, fs(), nullptr,

@hx235 (Contributor) Dec 13, 2024

nit: add comments with the parameter name right after each hard-coded value; same for TryReadFromCache()

@@ -819,8 +819,20 @@ bool FilePrefetchBuffer::TryReadFromCacheUntracked(
assert(reader != nullptr);
assert(max_readahead_size_ >= readahead_size_);

// We disallow async IO for compaction reads since their
// latencies are not user-visible

Contributor


nit: "... are more tolerable"? We have stats that let users know when compaction reads are slow

Contributor Author

I can update the wording. The idea was that a compaction read is a background operation, in contrast to a latency-sensitive, user-initiated scan.

ASSERT_EQ(stats->getAndResetTickerCount(PREFETCH_HITS), 1);
ASSERT_EQ(stats->getAndResetTickerCount(PREFETCH_BYTES_USEFUL),
4096); // 12288-16384
if (!for_compaction) {

Contributor


why !for_compaction?

Contributor Author

I can add a comment in the code for this.

We do not update the prefetch stats for compaction reads (https://github.com/facebook/rocksdb/blob/main/file/file_prefetch_buffer.cc#L856-L857), so I need this check to keep the tests passing across all the different test parameter variations

  } else if (!for_compaction) {
    UpdateStats(/*found_in_buffer=*/true, n);
  }

I don't know exactly why we don't do this; @anand1976 may know. I would guess that we don't want to "contaminate" the prefetch stats for non-compaction reads (PREFETCH_HITS and PREFETCH_BYTES_USEFUL), which we are presumably more interested in. We could have defined a separate set of prefetch stats specifically for compaction reads, but maybe we just did not care because compaction reads are all background operations anyway.

@archang19 archang19 force-pushed the assert-preconditions-prefetch branch from 2d72233 to ac5fd5c Compare December 19, 2024 16:50

@archang19 (Contributor, Author)

Why do we need overlap buffer logic for reuse file system buffer + prefetch + non-async IO? Is it because there can actually be overlapping data, or is it just because the overlap_buf_ is reused while the data being copied is non-overlapping? I'm trying to get a high-level understanding of why we ended up with "so the final main buffer ends up storing data from an offset that is greater than the requested offset..." in Prefetch()

@hx235 With async IO, there are 2 buffers, so the overlap buffer is for when the requested data spans the 2 buffers (i.e. buffer 1 has the first half of the data and buffer 2 has the second half).

Without async IO, and with the file system buffer optimization, we use the overlap buffer when the main (and only) buffer has a "partial hit." Say the buffer contains offsets 100-200 and we request 150-300. Without the file system buffer optimization, we would normally "refit tail": move bytes 150-200 from the end of the buffer to the start, and then request bytes 200-300 (plus any additional prefetching). With the file system buffer optimization, we can't do that because we are pointing to a buffer allocated by the file system. Instead, we copy bytes 150-200 into the overlap buffer, request bytes 200-300 (plus any additional prefetching), and then copy bytes 200-300 into the overlap buffer as well. Ultimately, the overlap buffer contains exactly what the caller asked for (bytes 150-300) and no more. We also avoid refetching data we already have (i.e. issuing a request for all of bytes 150-300).
