
feat(fetcher): first step at optimization #30

Merged · 12 commits · Jan 9, 2025

Conversation

@hannahhoward (Member) commented Dec 14, 2024

Goals

First pass to optimize TTFB and bandwidth on the batching fetcher

Implementation

see https://www.notion.so/storacha/Notes-on-Gateway-optimization-15c5305b55248022ad9be861df4899fb?pvs=4

The final algorithm that I settled on, which balances resource usage with optimal TTFB & bandwidth, is as follows:

  • during a batch fetch, return blocks to the caller as soon as they are available (previously, the entire batch had to be processed before any blocks were returned to the caller)
  • remove multipart range requests, opting for a single range request that covers the entire batch. This may fetch a few extra bytes, but multipart range request support varies considerably from server to server, and I'd rather not require it as we move into a decentralized world.
  • when multiple batch requests are present at the same time, queue them up as follows (see the sketch after this list):
    • make a request
    • as soon as the first byte is received, kick off the next request while processing the response body of the first
    • iterate through batches, but make sure at most two requests are processing blocks at any time (so memory usage never exceeds 2 × batch size). In other words, once you have received the first byte on two requests, do not kick off any more until at least one finishes processing.
    • this should produce a steady stream of blocks that can be consumed as fast as the client can handle
    • implementing this algorithm does require the use of async generators
  • this also adds some helpful tracing helpers. I'm not sure if these should live in this library though.
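
A minimal sketch of that pipelining (hypothetical names, not the PR's actual implementation; it assumes each batch is fetched with a single URL and that block parsing happens downstream of the yielded bytes):

```ts
/**
 * Pipelined batch fetching: at most two responses are in flight at once.
 * The next request starts as soon as the current response's headers
 * arrive, but no further requests start until a body has been fully
 * consumed, bounding memory to roughly 2 × batch size.
 */
async function* fetchBatches (
  batchUrls: URL[],
  fetchImpl: typeof fetch = fetch
): AsyncGenerator<Uint8Array> {
  let inflight: Promise<Response> | undefined
  for (let i = 0; i < batchUrls.length; i++) {
    // reuse the request kicked off while the previous body was streaming
    const res = await (inflight ?? fetchImpl(batchUrls[i]))
    // headers (first byte) received: start the next request now so its
    // TTFB overlaps with consuming this body
    inflight = i + 1 < batchUrls.length ? fetchImpl(batchUrls[i + 1]) : undefined
    if (!res.ok || !res.body) throw new Error(`batch fetch failed: ${res.status}`)
    const reader = res.body.getReader()
    while (true) {
      const { done, value } = await reader.read()
      if (done) break
      if (value) yield value // hand bytes to the caller as soon as they arrive
    }
  }
}
```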

For discussion

Future suggestion: freeway should be a monorepo, and this should be part of freeway.

Final trace: [screenshot of the final trace, captured 2025-01-03]

@hannahhoward hannahhoward changed the title WIP: Optimization Tracing and Optimization Dec 25, 2024
@hannahhoward hannahhoward requested a review from alanshaw January 4, 2025 04:31
@hannahhoward hannahhoward marked this pull request as ready for review January 4, 2025 04:31
@hannahhoward hannahhoward changed the title Tracing and Optimization feat(fetcher): first step at optimization Jan 4, 2025
allows simple & batching fetchers to use a custom fetch implementation. also exposes tracing library.
@alanshaw (Member) commented Jan 4, 2025

I’m hesitant to move away from multipart byte range requests. I really feel that the client should ask for exactly what it wants and the server should decide whether it’s optimal to “over share” ranges because they’re close together. Not using byte range requests forces the server side to over-egress with no real recourse, i.e. no one is going to be willing to pay for it. I’m not sure how we’d justify this in the decentralized network.

I’d like to see some evidence of multipart byte range support varying per server, and some reasoning for why it would cause us problems. It’s well spec’d and in my experience seems consistent between implementations. Note we don’t need to mandate support for multipart, but there would be no incentive to not support it because you’d end up being a slow server, requiring a request per block (assuming we’d not send requests to cover multiple blocks since the extra egress is not accountable).
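
For reference, the shape of the exchange under discussion (standard HTTP range semantics, not code from this PR):

```ts
// One request asking for two discontiguous byte ranges. A server that
// supports multipart replies 206 with one part per range; a server that
// doesn't may coalesce the ranges or ignore the header entirely.
const res = await fetch('https://example.com/data.car', {
  headers: { Range: 'bytes=0-99,200-299' }
})
console.log(res.status)                      // 206 Partial Content
console.log(res.headers.get('Content-Type')) // multipart/byteranges; boundary=...
```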

feat(fetcher): allow passing a custom fetch implementation
@alanshaw (Member) commented Jan 5, 2025

I have slept on this and I’m not totally against it. I think that the multipart header in a response is likely larger than a block header within a CAR file. So requesting multiple blocks in a single request might actually be less egress than the equivalent multipart request, assuming the blocks are next to each other (which is also likely) and we don’t allow the distance between blocks to be larger than, say, your average multipart header. I don’t have any evidence for this, but I reckon it’s probably true 😜.
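
For a rough sense of the sizes involved, a back-of-envelope comparison (assumed typical values, not measurements from this PR):

```ts
// Per-part overhead in a multipart/byteranges response: boundary line,
// Content-Type and Content-Range headers, and a blank line.
const multipartPartOverhead =
  '--3d6b6a416f9b5\r\n'.length +                        // boundary delimiter
  'Content-Type: application/octet-stream\r\n'.length +
  'Content-Range: bytes 1000-9999/123456\r\n'.length +
  '\r\n'.length                                         // ≈ 98 bytes total

// The "gap" between two adjacent blocks in a CAR file is the second
// block's frame header: a varint length prefix plus a CIDv1 (sha2-256).
const carFrameOverhead = 2 + 36                         // ≈ 38 bytes

// So when blocks are adjacent, one coalesced range over-fetches ~38 bytes
// per extra block, while multipart adds ~98 bytes per extra part.
console.log(multipartPartOverhead, carFrameOverhead)
```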

@alanshaw (Member) commented Jan 6, 2025

> Future suggestion: freeway should be a monorepo, and this should be part of freeway.

I'm fine with this going forwards. Gateway related libraries were originally separate because freeway was just one of a few different gateways we implemented, e.g. autobahn, so we needed related libraries in multiple implementations. Arguably this should have been part of gateway-lib originally.

@alanshaw (Member) commented Jan 6, 2025

I'm not clear on the performance benefits this change has had. Is there a comparable before/after trace that I can look at?

I'm just wondering if simply altering the existing code to resolve blocks as soon as they are processed would yield basically the same results?

```js
// get the last byte to fetch across all blobs (assumes resolvedBlobs is
// sorted by starting offset)
const aggregateRangeEnd = resolvedBlobs.reduce(
  (aggregateEnd, r) => r.range[1] > aggregateEnd ? r.range[1] : aggregateEnd,
  0
)
// fetch bytes from the first blob's starting byte through the last byte
const headers = { Range: `bytes=${resolvedBlobs[0].range[0]}-${aggregateRangeEnd}` }
```
Member commented:
As far as I can tell this doesn't account for the space between blocks, and it would be easy to upload a CAR file where the blocks are not ordered, which would cause the worker to download far more data than needed, e.g. two small blocks at opposite ends of a large CAR would force downloading nearly the whole file.

Member commented:
Note: I have seen this type of thing where DAGs have been generated and stored to disk using, for example, leveldb and then streamed back out in key order (effectively random) to be stored.

fix(blob-fetcher): remove unused package
@hannahhoward (Member, Author) commented:
@alanshaw I've reverted to using multipart ranges, let me know if you need any additional changes here.

@alanshaw (Member) left a comment:
LGTM

src/api.ts (outdated):
* [-100]
* ```
*/
export type Range = AbsoluteRange | SuffixRange
Member commented:
Since we're not removing multipart-byte-range can we just re-use or re-export the types from there?
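
A minimal sketch of that re-export (assuming the multipart-byte-range package exposes these type names):

```ts
// re-use the range types from multipart-byte-range instead of redefining them
export type { Range, AbsoluteRange, SuffixRange } from 'multipart-byte-range'
```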

Member (Author) commented:
oh right :)

@hannahhoward merged commit c2c609e into main on Jan 9, 2025. 1 check passed.