Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intercept fetches to R2 and use direct CARPARK pull #141

Merged
merged 5 commits into from
Jan 9, 2025

Conversation

hannahhoward
Copy link
Member

@hannahhoward hannahhoward commented Jan 4, 2025

Goals

When we fetch blocks from an R2 CARPark URL, we appear to get TTFB around 400ms. Interestingly, if we instead use Cloudflare Worker's direct R2 access, our TTFB drops to around 200ms. Bandwidth doesn't seem to change much, but this is still an important win.

Implementation

  • Using a seperate change in blob-fetcher to allow passing a custom fetch implementation, we now write a version of fetch that checks URLs to see if they are CAR Park URLs. if they are, instead of using the fetch API, we use our direct connection to CARPark in the worker to perform the fetch request.
  • Pass this custom fetch to the batching fetch used by content claims dagula
  • Also, to assemble a browser like Response object use the existing code in withCarBlockHandler, factored out to utility file
  • In the process of doing this, I discovered a bug in that code where it was treating the last byte in the range as exclusive to the returned content rather inclusive, and this was undetected as the test was expecting the wrong result.

For Discussion

Comparison traces, note the average TTFB of 200 vs 400 on requests to R2

Regular fetch:
Screenshot 2025-01-03 at 8 42 11 PM

With intercepted direct gets to R2:
Screenshot 2025-01-04 at 3 48 55 PM

@hannahhoward
Copy link
Member Author

hannahhoward commented Jan 4, 2025

blocked by storacha/blob-fetcher#30

Base automatically changed from fix/tracing-usage to main January 5, 2025 00:02
@hannahhoward hannahhoward force-pushed the feat/use-carpark-fetch branch from 24f687a to d79f917 Compare January 5, 2025 00:03
Copy link
Member

@fforbeck fforbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@alanshaw alanshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code change looks good to me but does require non-multipart byte range requests from the fetcher.

Do you see the 200ms perf issue when hitting the public URL directly...or just from the worker? I'm just wondering if it's the public bucket code that needs optimizing? https://github.com/storacha/public-bucket - looks like it does a bucket.head then bucket.get for all requests - perhaps that's the difference?

if (rangeHeader) {
try {
range = parseRange(rangeHeader)
} catch (err) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} catch (err) {
} catch (err) {
console.warn('parsing range header', err)

@hannahhoward hannahhoward force-pushed the feat/use-carpark-fetch branch 2 times, most recently from 51e9393 to e0ba7c8 Compare January 8, 2025 02:15
@hannahhoward hannahhoward force-pushed the feat/use-carpark-fetch branch from e0ba7c8 to c1944ff Compare January 8, 2025 02:22
@hannahhoward
Copy link
Member Author

@alanshaw now uses the public public handler directly. perf looks really good:

Screenshot 2025-01-07 at 6 24 36 PM

@hannahhoward hannahhoward merged commit 4daf8a3 into main Jan 9, 2025
1 check passed
@hannahhoward hannahhoward deleted the feat/use-carpark-fetch branch January 9, 2025 02:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants