Add "decompress" response utility #3423
base: main
Conversation
it'd be nice if this worked with the rest of undici too, not just fetch
@KhafraDev, can you point me to other usages where this should work? I admit, we mostly need it for fetch, since
`undici.request`, `stream`, etc. #1155
Added a basic set of unit tests for the

```
✔ ignores responses without the "Content-Encoding" header (32.383292ms)
✔ ignores responses with empty "Content-Encoding" header (0.780042ms)
✔ ignores redirect responses (0.439708ms)
✔ ignores HEAD requests (0.456958ms)
﹣ ignores CONNECT requests (0.0775ms) # SKIP
✔ ignores responses with unsupported encoding (0.219292ms)
✔ decompresses responses with "gzip" encoding (15.465084ms)
```

Still need to cover multiple
@KhafraDev, thinking of tailoring this to a generic body decompression purpose, I can imagine it being used this way:

```js
function createDecompressionStream(args): TransformStream

// Then, usage:
anyCompressedBodyStream.pipe(createDecompressionStream(args))
```

The most challenging part is figuring out the
I think for arguments a stream and a header value would be fine.
Got it. Added a

Now,
So far, it LGTM; I believe the `decompressStream` can be used within an interceptor if we want to aim for that in the other PR as well.
lib/web/fetch/decompress.js (outdated)

```js
decoders.length = 0
break
```
```suggestion
decoders.length = 0
break
return null
```

Maybe `return null` or otherwise there is no way to know if it "failed".
Also, `decompressStream` should probably not live under `/web/`.
@ronag, can it fail though? The intention was that, if there's no known compression provided, the stream is returned as-is.

You can still handle failures as you normally would:

```js
decompressStream(myStream).on('error', callback)
```
> if there's no known compression provided

The logic seems to be that if an unknown coding is sent, the body isn't decompressed at all. I'm not sure if that sounds right either.

- `Content-Encoding: gzip, zstd, br` -> ignored (zstd is not supported)
- `Content-Encoding: gzip, br` -> decompressed
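The "skip if any coding is unknown" behavior described above can be sketched as follows. The `supportedCodings` set is an assumption for illustration, not undici's actual export:

```javascript
// Sketch of the "skip decompression if any coding is unknown" behavior.
// supportedCodings is an illustrative assumption, not an undici export.
const supportedCodings = new Set(['gzip', 'x-gzip', 'deflate', 'br'])

function shouldDecompress (contentEncoding) {
  if (!contentEncoding) return false
  const codings = contentEncoding
    .toLowerCase()
    .split(',')
    .map((coding) => coding.trim())
  // If any listed coding is unsupported, the body is left as-is.
  return codings.every((coding) => supportedCodings.has(coding))
}

console.log(shouldDecompress('gzip, br'))       // true
console.log(shouldDecompress('gzip, zstd, br')) // false: zstd is unsupported
```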
As a consumer, I expect the body stream to be returned as-is if:

- It has no `Content-Encoding` set (no `codings` provided as an argument);
- It has an unknown `Content-Encoding` set or an unknown `coding(s)` is provided as an argument.

I wouldn't expect it to throw because that's a disruptive behavior that implies I need to add a guard before trying to call `decompress`/`decompressStream`. At least, not at a level this low.
If the API is potentially disruptive, I need to have an extra check in place to prevent scenarios where decompression is impossible. This puts extra work on me as a consumer but also results in a more fragile API, since the list of supported encodings is baked into Undici and may change across releases.

I strongly suggest returning the body stream as-is in the two cases I described above. `decompressStream()` must never error, save for the stream errors themselves (and those are irrelevant in the context of this function).
@metcoder95 @KhafraDev do you agree with my reasoning in the post above?
Hah, so there's an existing expectation that decompression is skipped if an unknown encoding is encountered:

```
✖ should skip decompression if unsupported codings
```

I suppose that answers the question.
throwing an error will make fetch slower
Regarding the decoders customization discussion, I don't think it justifies the cost of making `decompressStream` more complex. The only thing you need is the list of decodings, which you can get using a one-liner:

```js
const codings = contentEncoding
  .toLowerCase()
  .split(',')
  .map((coding) => coding.trim())
```

The purpose of `decompressStream` is to grab the exact behavior Undici has under the hood. If you need a different behavior, you should (1) parse the `Content-Encoding` header; (2) map encodings to decompression streams with custom options.
I can see how you'd want to extend or override certain things from the default decompression, and for that we, perhaps, can consider exporting those options separately?
```js
module.exports = {
  gzipDecompressionOptions: {
    flush: zlib.constants.Z_SYNC_FLUSH,
    finishFlush: zlib.constants.Z_SYNC_FLUSH
  }
}
```
Even then, that's all the options Undici uses right now. To me this looks like a nice thing to have without a substantial use case behind it. I lean toward iterating on this once we have more of the latter.
Regarding erroring or throwing: as you suggest @kettanaito, I'd like to fail faster and inform the implementer the encoding is not supported without even disturbing the input stream, so fallbacks can be applied if they want to.

For `fetch`, as @KhafraDev points out, the `throw`ing might be slower, so we can disable it by default and not throw at all, but rather ignore the non-supported encodings.
The issue I'd like to avoid, is putting the implementer in the spot of all or nothing.
> Even then, that's all the options Undici uses right now. To me this looks like a nice thing to have without substantial use case argumentation. I lean toward iterating on this when we have more of the latter.
Agree, let's do that 👍
Force-pushed 788a1df to 1b06bf1.
I've added unit tests for

Also added test cases for
The CI is failing with some issues downloading the Node.js binary, it seems. Doesn't appear to be related to the changes.
@KhafraDev, what do you think if

```js
myStream.pipe(createDecompressStream(codings))
```

I'd love to learn from you about the difference/implications here. Thanks.
**Update**

I've added remaining unit tests for the

Based on the conclusion of the discussion around unknown encodings, and taking into consideration that there's an explicit existing test suite asserting that unknown encodings must be ignored when decompressing, the

I suggest adding the throwing logic to

With this, I believe this pull request is ready for review. cc @KhafraDev @mcollina
lib/web/fetch/decompress.js (outdated)

```js
 * @param {Response} response
 * @returns {ReadableStream<Uint8Array> | null}
 */
function decompress (request, response) {
```
I think `decompressStream` is sufficient. It should already be redirected.
I'd like to have a designated utility that performs the decompression including the handling of redirect responses, etc. `decompressStream` implies you handle that yourself.
It looks to me like you are targeting the Fetch API response.
In this case, have you already been redirected?
@tsctx, I'm moving the existing content encoding logic to this utility function. In the existing logic, Undici does accept the response, which is the redirect response, not the redirected response. This is correct. This utility must also decide if decompression is unnecessary if the response is a redirect response.
I would recommend adding 'Web' to the suffix since it targets the web api.
```js
body: decompress(request, {
  status,
  statusText,
  headers: headersList,
```
`headersList` is not an instance of `Headers`.
Technically, no, but its APIs are compatible, from what I can see. At least, the entire test suite hasn't proven me wrong.

I can construct a `Headers` instance out of `headersList`, but it may have performance implications. Would you advise me to do that?
LGTM, just having the considerations of @tsctx about the util namespace and the following thread: https://github.com/nodejs/undici/pull/3423/files#r1694215822
Can you please add the types for this and the related type tests?
On a related subject, I've recently learned about

```js
response.body.pipeThrough(new DecompressionStream('gzip'))
```

One practical downside of this is that there's no connection between the response content encoding and the decompression used. You can also provide only one decompression format (gzip, deflate, deflate-raw), so in case of response streams encoded with multiple codings, you'd have to create a pipe of decompression streams. Does anybody know if there's any difference in

I'd still much like to be consistent with Undici here, but knowing more would be good.
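Such a pipe of decompression streams could look roughly like this; `decompressBody` is a hypothetical helper, and only the codings the web `DecompressionStream` supports (gzip, deflate, deflate-raw) will work. Per RFC 9110, codings are listed in the order they were applied, so this sketch decodes them in reverse:

```javascript
// Hypothetical helper: chain one DecompressionStream per content-coding.
// Only 'gzip', 'deflate', and 'deflate-raw' are supported by the web API.
function decompressBody (body, contentEncoding) {
  const codings = contentEncoding
    .toLowerCase()
    .split(',')
    .map((coding) => coding.trim())

  // Codings are listed in the order applied, so decode in reverse.
  return codings
    .reverse()
    .reduce(
      (stream, coding) => stream.pipeThrough(new DecompressionStream(coding)),
      body
    )
}
```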
Moved the tests to the root level of

```
✖ should include encodedBodySize in performance entry (1.82966ms)
TypeError [Error]: webidl.converters.USVString is not a function
    at fetch (/Users/kettanaito/Projects/contrib/undici/index.js:121:13)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Server.<anonymous> (/Users/kettanaito/Projects/contrib/undici/test/fetch/resource-timing.js:68:18)
```

The tests are in their own modules; no existing tests were edited. The errors themselves don't seem to be related to the changes. Does anybody know what's causing this?
@kettanaito are you still considering finishing this?
@mcollina, yes. I've fallen off lately, doing other things. This is still on my todo list, and I'd love to see this merged. Still got those test shenanigans, no idea what's causing seemingly unrelated tests to fail. I would appreciate your patience with me here; it may take me time to get back to this.
I've rebased the branch against the latest

Is the rightmost

```diff
-body: decoders.length
-  ? pipeline(this.body, ...decoders, (err) => {
-      if (err) {
-        this.onError(err)
-      }
-    }).on('error', onError)
-  : this.body.on('error', onError)
+body: decompress(request, {
+  status,
+  statusText,
+  headers: headersList,
+  body: this.body
+}).on('error', onError)
```

Basically, my rebase removed the

Also, since
**On code formatting**

I undid the code formatting to keep the diff sane, but I highly encourage you to investigate your Prettier setup. Running
Can I please get some help with these failing tests? They don't seem to be related to the changes I'm introducing. Looks more like I need to build something or update something...
Why is
It's a circular require,
lib/core/util.js (outdated)

```js
let createInflate

function lazyCreateInflate (zlibOptions) {
  createInflate ??= require('../web/fetch/util').createInflate
```
@KhafraDev, wouldn't it make more sense to lift `createInflate` to `../core/util` in this case? If multiple modules depend on it, perhaps it's a sign to make it a generic utility.

From what I can see, it was collocated with fetch only because it was used for the body decompression. Now that the decompression logic is a core util, maybe it should be there, too.
I went for the path of least resistance, I have no objections in either case.
```js
/**
 * @note Undici does not support the "CONNECT" request method.
 */
test.skip('ignores CONNECT requests', async () => {
```
We should, or what do you mean exactly?
CONNECT is a special method which opens a tunnel.
Sure, I get that, but we do support it (maybe fetch doesn't, and that's the part that I missed?)
Should this utility get publicly exposed?
@mcollina, yes, that's the intention. We would like to reuse it in MSW to have a consistent body decompression across the board. As suggested by prior reviewers, we can expose this from utils, but I suppose that still means the root export of the package.
@KhafraDev, thank you so much for your support on this one! I believe there are still type tests remaining on my end, and making sure we export this correctly (perhaps @mcollina has some thoughts on this).
Force-pushed cc18ceb to 4cd64f3.
@kettanaito add the export from the root package and related docs.
Will do! I should have some time tomorrow to finalize this.
Is this needed after mswjs/interceptors#661? I think there are some semver issues to consider, along with the general weirdness of exposing internals for public use.
@KhafraDev, that's a good question. I've come to realize that we wouldn't be able to reuse this compression straight away, and would have to either ship

I would rather not ship

I was surprised to learn about the Compression Streams API. Sadly, it doesn't support Brotli, and I'm researching the possible ways to move forward here.

That being said, I still find the
I think that moving the code from fetch to core is wrong. `decompress` expects the `Request` and `Response` from fetch, not the core `Request` and `Response` instances. So the decompression needs to be moved back to the fetch folder. Or do I understand that wrong?
This relates to...

## Rationale

Exposing the response decompression logic (i.e. the handling of the `Content-Encoding` header) will allow other Node.js libraries to reuse it, resulting in a consistent experience for everyone.

## Changes

### Features

- `decompress` utility, abstracting the `Content-Type` handling from the existing `lib/fetch/index.js`.
- `decompress` utility in the `onHeaders` callback.
- `decompress`.
- `decompressStream`.
- `decompress` utility publicly from `undici`.
- `decompressStream` utility publicly from `undici`.

### Bug Fixes

### Breaking Changes and Deprecations

### Status