feat(fetcher): first step at optimization #30
Conversation
Force-pushed from dcef703 to 321d03c (compare)
Allows simple & batching fetchers to use a custom fetch implementation. Also exposes the tracing library.
I’m hesitant to move away from multipart byte range requests. I really feel that the client should ask for exactly what it wants, and the server should decide whether it’s optimal to “over share” ranges because they’re close together. Not using byte range requests forces the server side to over-egress with no real recourse, i.e. no-one is going to be willing to pay for it. I’m not sure how we justify this in the decentralized network?

I’d like to see some evidence of multipart byte range support varying per server, and some reasoning for why it would cause us problems. It’s well spec’d and in my experience seems consistent between implementations.

Note we don’t need to mandate support for multipart, but there would be no incentive not to support it, because you’d end up being a slow server, requiring a request per block (assuming we’d not send requests to cover multiple blocks, since the extra egress is not accountable).
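For context, a multipart byte-range request is just a single `Range` header listing each wanted span. A minimal sketch of building one (the helper name is illustrative, not from this repo):

```typescript
// Sketch: ask for several discrete block ranges in one request via a
// multi-range `Range` header (RFC 9110 syntax). A server may answer
// with `206 Partial Content` and a `multipart/byteranges` body.
type AbsRange = [start: number, end: number]

// Encode exactly the ranges the client wants; the server may still
// choose to coalesce ranges that are close together.
const encodeRangeHeader = (ranges: AbsRange[]): string =>
  `bytes=${ranges.map(([start, end]) => `${start}-${end}`).join(',')}`

// encodeRangeHeader([[0, 99], [512, 1023]]) === 'bytes=0-99,512-1023'
```

This keeps the decision about "over sharing" nearby ranges on the server side, so any extra egress is the server's choice rather than forced by the client.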
feat(fetcher): allow passing a custom fetch implementation
I have slept on this and I’m not totally against it. I think that a multipart part header in a response is likely larger than a block header within a CAR file. So requesting multiple blocks in a single request might actually incur less egress than the equivalent multipart request, assuming the blocks are next to each other (which is also likely) and we don’t allow the distance between blocks to be larger than, say, your average multipart header. I don’t have any evidence for this, but I reckon it’s probably true 😜.
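A back-of-envelope sketch of that reasoning. The ~100-byte multipart part header size below is an assumption for illustration (boundary line plus `Content-Type`/`Content-Range` plus CRLFs), not a measurement:

```typescript
// Assumed per-part framing overhead of a multipart/byteranges response.
const MULTIPART_PART_HEADER = 100 // bytes, illustrative guess

// Egress for fetching two blocks `gap` bytes apart in one contiguous
// range: we pay for the gap bytes between them...
const contiguousEgress = (blockBytes: number, gap: number): number =>
  2 * blockBytes + gap

// ...versus the same two blocks as two multipart parts: we pay for a
// part header per block instead of the gap.
const multipartEgress = (blockBytes: number): number =>
  2 * (blockBytes + MULTIPART_PART_HEADER)
```

Under these assumed numbers, the contiguous fetch is the smaller response whenever the gap between the blocks is less than the total multipart framing (here ~200 bytes for two parts), which matches the "don't allow the distance between blocks to be larger than your average multipart header" heuristic.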
I'm fine with this going forwards. Gateway related libraries were originally separate because freeway was just one of a few different gateways we implemented e.g. autobahn - we needed related libraries in multiple implementations. Arguably this should have been part of
I'm not clear on the performance benefits this change has had. Is there a comparable before/after trace that I can look at? I'm just wondering if simply altering the existing code to resolve blocks as soon as they are processed would yield basically the same results.
src/fetcher/batching.js
```js
// get last byte to fetch
const aggregateRangeEnd = resolvedBlobs.reduce((aggregateEnd, r) => r.range[1] > aggregateEnd ? r.range[1] : aggregateEnd, 0)
// fetch bytes from the first starting byte to the last byte to fetch
const headers = { Range: `bytes=${resolvedBlobs[0].range[0]}-${aggregateRangeEnd}` }
```
As far as I can tell this doesn't account for the space between blocks, and it would be easy to upload a CAR file where the blocks are not ordered, which would cause the worker to download far more data than needed.
Note: I have seen this type of thing happen where DAGs have been generated and stored to disk using, for example, LevelDB, and then streamed back out in key order (effectively random) to be stored.
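A minimal illustration of the concern, using hypothetical block ranges: when blocks are far apart (or out of order), the aggregate start-to-furthest-end range can cover almost the whole CAR.

```typescript
// Illustration only: hypothetical block ranges inside a CAR file.
type ByteRange = [start: number, end: number]

// The aggregate-range approach from the diff above: fetch from the
// first block's start byte to the furthest end byte of any block.
const aggregateSpan = (ranges: ByteRange[]): number => {
  const end = ranges.reduce((max, [, e]) => (e > max ? e : max), 0)
  return end - ranges[0][0] + 1
}

// Bytes actually required: only the blocks themselves.
const neededBytes = (ranges: ByteRange[]): number =>
  ranges.reduce((sum, [s, e]) => sum + (e - s + 1), 0)

// Two 1 KiB blocks at opposite ends of a ~1 GiB CAR: the aggregate
// range fetches ~1 GiB to deliver 2 KiB of useful data.
const blocks: ByteRange[] = [[0, 1023], [2 ** 30 - 1024, 2 ** 30 - 1]]
```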
fix(blob-fetcher): remove unused package
@alanshaw I've reverted back to using multipart ranges, let me know if you need any additional changes here
LGTM
src/api.ts
````ts
 * [-100]
 * ```
 */
export type Range = AbsoluteRange | SuffixRange
````
Since we're not removing `multipart-byte-range`, can we just re-use or re-export the types from there?
oh right :)
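For reference, a sketch of the types in question. The `AbsoluteRange`/`SuffixRange` tuple shapes are assumed from the `[-100]` suffix example and standard byte-range semantics, and the helper is hypothetical, not part of the API:

```typescript
// Assumed tuple shapes; a suffix range's negative offset counts from
// the end of the resource (e.g. [-100] = the last 100 bytes).
export type AbsoluteRange = [first: number, last: number]
export type SuffixRange = [offset: number]
export type Range = AbsoluteRange | SuffixRange

// Hypothetical helper: render a range as an HTTP Range header value.
// TypeScript narrows the tuple union by its `length`.
export const toHeaderValue = (r: Range): string =>
  r.length === 1 ? `bytes=${r[0]}` : `bytes=${r[0]}-${r[1]}`

// toHeaderValue([-100]) === 'bytes=-100'    (last 100 bytes)
// toHeaderValue([0, 499]) === 'bytes=0-499' (first 500 bytes)
```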
Goals
First pass to optimize TTFB and bandwidth on the batching fetcher
Implementation
See https://www.notion.so/storacha/Notes-on-Gateway-optimization-15c5305b55248022ad9be861df4899fb?pvs=4
The final algorithm that I settled on, which balances resource usage with optimal TTFB & bandwidth, is as follows:
For discussion
Future suggestion: freeway should be a monorepo, and this should be part of freeway.
Final trace: