Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to compress request body #989

Merged
merged 23 commits into from
Apr 12, 2019

Conversation

mstoykov
Copy link
Contributor

@mstoykov mstoykov commented Apr 8, 2019

This adds compression param to all http methods that will compress the body
set the correct Content-Encoding and Content-Length and will send the request
with the compressed body. The currently supported algorithms are 'gzip' and 'deflate'.
Combinations of the two separated by a comma(",") are also supported.

This fixes #988

This adds `compression` param to all http methods with only current
possible value `gzip` that will compress the body set the correct
`Content-Encoding` and `Content-Length` and will send the request with
the compressed body.
@mstoykov mstoykov modified the milestone: v1.0.0 Apr 8, 2019
Copy link
Member

@robingustafsson robingustafsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just don't forget to update release notes 🙂

@mstoykov
Copy link
Contributor Author

mstoykov commented Apr 8, 2019

LGTM, just don't forget to update release notes slightly_smiling_face

I am even going to update the docs :P
after all the checks pass ... and figure out what appveyor's problem is >.>

@codecov
Copy link

codecov bot commented Apr 8, 2019

Codecov Report

Merging #989 into master will decrease coverage by 0.02%.
The diff coverage is 53.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     grafana/k6#989      +/-   ##
==========================================
- Coverage   72.12%   72.09%   -0.03%     
==========================================
  Files         131      131              
  Lines        9621     9636      +15     
==========================================
+ Hits         6939     6947       +8     
- Misses       2268     2272       +4     
- Partials      414      417       +3
Impacted Files Coverage Δ
js/modules/k6/http/request.go 78.42% <53.33%> (-1.36%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3506ee1...9efecb2. Read the comment docs.

@codecov
Copy link

codecov bot commented Apr 8, 2019

Codecov Report

Merging #989 into master will increase coverage by 0.16%.
The diff coverage is 90.69%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     grafana/k6#989      +/-   ##
==========================================
+ Coverage   72.12%   72.28%   +0.16%     
==========================================
  Files         131      132       +1     
  Lines        9621     9699      +78     
==========================================
+ Hits         6939     7011      +72     
- Misses       2268     2273       +5     
- Partials      414      415       +1
Impacted Files Coverage Δ
js/modules/k6/http/request.go 80.48% <100%> (+0.7%) ⬆️
lib/netext/httpext/compression_type_gen.go 86.66% <86.66%> (ø)
lib/netext/httpext/request.go 88.28% <89.65%> (+0.11%) ⬆️
lib/netext/httpext/tracer.go 98.21% <0%> (+1.78%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3506ee1...5ae8d5c. Read the comment docs.

js/modules/k6/http/request.go Outdated Show resolved Hide resolved
js/modules/k6/http/request.go Outdated Show resolved Hide resolved
release notes/upcoming.md Outdated Show resolved Hide resolved
@na--
Copy link
Member

na-- commented Apr 9, 2019

Moving the discussion about just using the Content-Encoding header to specify any compression here, instead of in the inline code comments. In my mind, this would make both for better UX, and make the implementation cleaner.

It seems fine to me if we should throw errors when users try to specify unsupported Content-Encoding values (e.g. br for Brotli, or multiple encodings like deflate, gzip). Until we support those things, of course. For example, there's already an open issue for brotli support, and we can easily add other ones for zstd, compress and any others or multiple layered encodings like gzip, deflate

A somewhat connected issue would be the Accept-Encoding and Content-Encoding headers we generate in the HAR file converter, as mentioned here - we need to make sure we don't advertise things we don't support there.

@robingustafsson
Copy link
Member

I'm against using user specified headers as an implicit signal that we should do additional processing of the body contents (if you specify application/json or any other Content-Type we don't do any special processing of the body data afaik?).
Specifying headers like Content-Encoding to me means you're doing low-level modifications of the request and I'd thus expect you to also do the compression of the body yourself.
I think any situation where we want to do further processing in Go, it should be a "param" as it's more explicit that there will be some processing done in Go that will (potentially) modify the HTTP headers and/or body.

@na--
Copy link
Member

na-- commented Apr 9, 2019

Well... I'm not totally convinced that we need separate options. I get that it might be a bit strange to modify k6's behavior by setting the headers map of a request object, but if the alternative is having multiple different and separate settings that magically modify the headers for every little thing? This will only exacerbate the config mess we're currently in...

Even ignoring that, how do we handle things if those settings and the user-specified header values conflict? Do we silently overwrite them or do we throw an error? Or what if the user specified a gzip content-type but didn't compress the body themselves? And in any case, such compression will be super inefficient in the JS runtime, so should we expose the gzip/deflate/brotly/etc. primitives in the K6 JS API as well? The whole issue becomes much more complicated for no apparent UX benefit...

Regarding Content-Type: there are a bunch of open issues for application/json and multipart/form-data content types that would be much better with this approach: #843, #747, #878, #988 (yes, this one 😁 ). The actual k6 code will also be more readable. This seems true to me also for any converter output, i.e. the internal HAR converter (https://github.com/loadimpact/k6/issues/762) and the other jmeter/postman/etc. converters.

And for super-advanced users that want to manually compress their request bodies or mess around with other internals, we could add a single boolean ignoreRequestHeaders option in http.Params that would disable any extra k6 behavior, though I doubt many users would use it.

Also, we kind of already read the HTTP headers, to get any Host value the user manually specified. And to merge cookies between the request and the cookie jar... I think I actually found a bug with the current k6 behavior when I was checking this. What do you think this should produce?

import http from "k6/http";

export let options = {
  iterations: 2,
};

export default function () {
  // Set cookies
  if (__ITER === 0) {
    let resp1 = http.get("https://httpbingo.org/cookies/set?key=value1");
    console.log(resp1.body);
  }

  // Check cookies
  let resp2 = http.get("https://httpbingo.org/cookies");
  console.log(resp2.body);

  // Check custom cookies...
  let resp3 = http.get("https://httpbingo.org/cookies", {
    headers: { "Cookie": "key=value2;anotherkey=anothervalue" }
  });
  console.log(resp3.body);
}

And similarly, we probably need to validate any Accept-encoding headers that contain encoding that k6 doesn't support.

@mstoykov
Copy link
Contributor Author

mstoykov commented Apr 9, 2019

Us reading the headers is not a problem, us doing "magical" interactions with them is.
My opinion on how params vs headers should work (not how it works now but how it will be least confusing):

  1. headers take precedence
  2. If we would modify a header because of param we don't
    2.1. Maybe as a better UX print a warning when params are specifically specified that will not overwrite headers, but not if they are set for the whole group()/script after Ideas for improving the group() built-in function #884 .

The bug above is strange ... I guess it has something to do with the way cookies are used and cookiejars and such.

As an additional question how do you think #970 should be implemented ? because I have always thought it will be through params supposedly or do you propose they add the headers themself

@na--
Copy link
Member

na-- commented Apr 9, 2019

Sorry, I'm not sure I understand what you mean by "headers take precedence". Let's try with examples.... Here's this script, what should happen?

http.get("https://httpbin.org/anything", {
  compression: "gzip",
  headers: {
    "Content-Encoding": "br",
  }
});

Here's a similar script, with Content-Type this time:

http.post("https://httpbin.org/anything", {foo: "bar", "baz": "zaz"}, {
  contentType: "application/json",
  headers: {
    "Content-Type": "multipart/form-data; boundary=--myBoundary",
  }
});

If "headers take precedence" means we do nothing, since the user explicitly set the header, then this seems like a poor UX to me, since the user also explicitly set our parameter. Or are those errors, warnings, or do we silently overwrite the headers? And how would you deal with more complicated cases, like setting that custom multipart boundary with params? Like this:

http.post("https://httpbin.org/anything", {foo: "bar", "baz": "zaz"}, {
  contentType: "multipart/form-data; boundary=--myBoundary",
});

or by separate parameters, using an object? Or multiple encodings? In a header it's simply Content-Encoding: deflate, gzip, bit do you use contentEncoding: "deflate, gzip" or contentEncoding: ["deflate", "gzip"] in the JS code? I'm struggling to understand why we're trying to re-invent the wheel when there's already a standardized way all of these options are configured that would be in a needless conflict with any custom options we introduce.

@na--
Copy link
Member

na-- commented Apr 9, 2019

Regarding #970 - in the same way I'm not suggesting we make users implement brotly/gzip/zstd compression themselves, or NTLM/digest/basic authentication, I'm also not suggesting that they should implement the h2c headers themselves. There are also other protocol-level issues that are way beyond the level of headers...

@mstoykov
Copy link
Contributor Author

mstoykov commented Apr 9, 2019

p.s. While writing this I decided that this discussion should probably be moved outside of this PR as this is getting out of scope

By "precedence" I mean - we do nothing if they've setted a header that our param would've set. I think it means that they have mistaken something and we either error out (seem excessive but maybe the better option) or print a warning (probably going to get noisy).

first example - we don't encode anything send the header as it is and print a warning/error out without doing anything. Will their server get uncompressed body when they expected compressed one -yes . Is that bad - yes. Do we have to read minds in this situation - I don't think so.

second example - what does contentType does ? and leave it to what the Header is.

third example - they are not doing it correctly ... it is not our fault the HTTP spec for multipart/form-data requires a lot of hard to write by hand text ...
Which after I read some more leaves me to believe that

  1. K6 Does not respect the boundary value placed in a Content-Type header #843 is invalid - they've set the boundary wrong in the header. (even if they are using the map/json object body)
  2. The whole thing with the file seems to me like it was a try to make it work in all cases but tried to be magical by not actually not letting the user set the Content-Type through some useful manner. I would propose that we add some params option that sets it instead.

All my arguments have been about: lets not overwrite what the user has said or change things without it being extremely obvious that this is what the user wants is, because I HATE MAGIC (not magic the gathering, that one is great). Everything that does something without it being obvious is potential issue.

  • If you set param that says compression:gzip you would expect that we will compress something as gzip ... for example your body as there is nothing else that is gzipable in that particular case. Should we also require them to set the header ? I would say "no" because we don't make them set Content-Length* as well. (I suppose if they Content-Length we should respect that ... although this could be problematic)
  • If you set your header to something you will expect that your header will have the value you have set it ... no matter what.
  • If you have given us some json object(map) instead of string/array of integers you would ... I suppose expect that we will send it as formdata and here the magic (given that we want to support this case) is inevitable because if we also support setting files there are 2 Content-Types we can choose from and this depends on what the server expects. IMO we should keep the current behaviour and add an option for through params for users to say that this request even though it can be application/x-www-form-urlencoded you want it to be mutlipart/form-data. I guess the name should multipart: true. I would need to go look at all possible/likely Content-Types to see if there are other funky corner cases.
    But this whole issue comes from magic that we have decided to add for user convenience.

And the thing while writing this that I found out is that there doesn't appear to be cases where k6 reads headers and than change them- it just change them to not make users set them but we require them to set other things in order for things like auth/multipart/formdata and so on to work, without the user setting the. We obviously will need to continue to set some headers on behalf of the user as thing such as Cookies and Content-Length and Agent and so on are hard to do. But IMO we should always leave what they have set. Otherwise there is no way to check how an API behaves if you send it crappy headers/data. To that end we should probably add another params flag such as don't touch anything or at least don't touch headers.
This of course all goes toward what exactly do we want our users to be able to do ... easy and how easy is easy for particular use case. I think less magic is always better. And in my opinion the route with "a lot" of params for different things leaves it possible for users to do testing for things that are not "usual"/"standard" or downright broken and see how their tests behave. Going down the route of us deciding based on headers what the user actually wanted seems to me to tie our and our users hand in the future to only ways of working that we have considered useful.

@na--
Copy link
Member

na-- commented Apr 9, 2019

I think it might be a little too late to move the discussion to the issue, since a lot of the context is already here...

#843 is invalid - they've set the boundary wrong in the header. (even if they are using the map/json object body)

They haven't though - boundary=--myBoundary is perfectly valid, unless it's contained somewhere in the body.

second example - what does contentType does ? and leave it to what the Header is.

You've kind of answered yourself below in your post. Say that we "ignore" (i.e. not use for configuration but warn/error/? if there's mismatch?) the user-specified headers. Then we'd need another new http.Params option that could be used to specify how k6 should encode the body users send in http requests, if it's an object. I'll use it as an example again, since it may be the most tricky use case out of all of them...

The current Content-Type logic is roughly this:

  • if there's no body or if the body is string or []byte - k6 won't mess with the Content-Type header and whatever the user has specified will be sent, and if nothing is specified, nothing will be sent.
  • if we pass an object as body:
    • if that object contains an http.FileData element, we'll send Content-Type: multipart/form-data; boundary=some-random-boundary
    • otherwise, we'll urlencode the body and send Content-Type: application/x-www-form-urlencoded

This is mostly reasonable, and yes - slightly "magical" ✨ 😄 . If I was starting from scratch, maybe I would have had a much more explicit system, like how you're forced to build a multipart request in Go. But, given that k6 is a load testing tool, this small amount of magic is probably justified and results in better UX, especially if it actually worked better. It currently has a lot of shortcomings, and it breaks in some corner cases:

  • it's very inflexible, if users want to force Content-Type: multipart/form-data for non-file-upload requests, there's no way to do it (Better handling of multipart/form-data requests #747)
  • there's no way to specify a custom boundary (K6 Does not respect the boundary value placed in a Content-Type header #843)
  • even more general than the above issue, it breaks when there's a mismatch between a custom Content-Type header and the body we're sending. For example, sending Content-Type: application/x-www-form-urlencoded and an http.file() body produces something completely broken, since the headers have precedence
  • there are no warnings or errors since k6 doesn't check for custom Content-Type values

All of that can be easily fixed if we check for custom Content-Type headers values that users are specifying and adjust k6's behavior accordingly. For another minor UX win, we'll even be able to handle Content-Type: application/json requests accordingly (#878). I don't think this would substantially increase the amount of magic, it will just fix the currently broken magic 😉

But instead of making the two options (body and Content-Type header) work in sync, like users expect them to, you're instead proposing that we should add a third option multipart: true in the http.Params object so we avoid more magic 🙌 😄. Not only do I think that it won't fix any of the issues above, it will simply make things more complicated and unclear, and add more validation code that we'll have to add so users don't shoot themselves in the foot.

I feel your dislike for magic, I usually don't like it as well, but mostly the "magic at a distance" phenomenon. In this case, k6 reading the Content-Type and Content-Encoding headers of a single request and acting accordingly seems like an OK compromise to me... And even though I feel like testing for things that are not "usual"/"standard" or downright broken is a bit of a straw-man use case for a load-testing tool, it can easily be accommodated! I mentioned it above, but we can simply add single, simple, boolean option in http.Params that would cause k6 to disregard any request headers. We can call it disableMagic, or disregardHeaders or ignoreCustomHeaders or whatever - it will turn off any special handling for the following headers:

  • Host - we already do that... though probably in the wrong place 😄
  • Cookie - we already do that as well, as demonstrated in the example above
  • Content-Encoding - what started the argument
  • Content-Type - what I described above as the most complicated use case IMO
  • Accept-Encoding - we probably should verify that we support the encodings, otherwise servers will return garbage
  • any other "special" headers we're currently not validating but probably should: TE, Trailer, Upgrade, Connection, Keep-Alive, ...? there are a lot of corner cases we've likely glossed over and ignore when we should instead return a warning or an error...

@na-- na-- mentioned this pull request Apr 9, 2019
@na--
Copy link
Member

na-- commented Apr 10, 2019

After sleeping on this and thinking about it some more, I've mostly come around to your opinion, @mstoykov and @robingustafsson 😄, at least with regards to this specific feature (compression / contentEncoding). Probably also any other similar features we'll want in the future as well, with the only big exception being Content-Type.

Because we're already using some "magic" to determine the request Content-Type based on the body contents, and the complexity of introducing a third variable in that whole mess (see above message), I still feel like, in that specific case, it would make more sense for us to actually fix the magic so it works like people expect it to work, i.e. read any custom Content-Type header values that users are specifying and adjust k6's behavior accordingly. But only for this specific header, which we'll properly document.

I also propose that instead of a disableMagic (or equivalent 😄) option in http.Params, in the future we probably should add a global validateHeaders option, probably defaulting to true. If that option is enabled, any mismatches or wrong or unsupported headers will cause k6 to emit a warning, or if throw is also enabled - throw an error.

That will be a totally separate PR of course, and if you agree, I can add an issue about it.

js/modules/k6/http/request.go Outdated Show resolved Hide resolved
release notes/upcoming.md Outdated Show resolved Hide resolved
js/modules/k6/http/request_test.go Outdated Show resolved Hide resolved
@na-- na-- changed the title Add ability to gzip compress request body fix #988 Add ability to gzip compress request body Apr 10, 2019
- Move compression to httpext
- support deflate and gzip
- support combination of the two
@mstoykov mstoykov changed the title Add ability to gzip compress request body Add ability to compress request body Apr 10, 2019
js/modules/k6/http/request_test.go Outdated Show resolved Hide resolved
js/modules/k6/http/request_test.go Outdated Show resolved Hide resolved
js/modules/k6/http/request.go Outdated Show resolved Hide resolved
lib/netext/httpext/request.go Show resolved Hide resolved
lib/netext/httpext/request.go Show resolved Hide resolved
js/modules/k6/http/request.go Show resolved Hide resolved
js/modules/k6/http/request.go Outdated Show resolved Hide resolved
lib/netext/httpext/request.go Outdated Show resolved Hide resolved
lib/netext/httpext/request.go Outdated Show resolved Hide resolved
release notes/upcoming.md Outdated Show resolved Hide resolved
lib/netext/httpext/request.go Outdated Show resolved Hide resolved
Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM now, besides that minor nitpick about more informative error messages and having this code not use the new constants.

lib/netext/httpext/request.go Outdated Show resolved Hide resolved
js/modules/k6/http/request.go Outdated Show resolved Hide resolved
lib/netext/httpext/request.go Outdated Show resolved Hide resolved
lib/netext/httpext/request.go Outdated Show resolved Hide resolved
lib/netext/httpext/request.go Outdated Show resolved Hide resolved
lib/netext/httpext/request.go Show resolved Hide resolved
Copy link
Member

@na-- na-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mstoykov mstoykov merged commit 3fc300c into master Apr 12, 2019
@mstoykov mstoykov deleted the feature/AddRequestBodyCompression branch April 12, 2019 08:50
@artych
Copy link

artych commented Apr 12, 2019

Thanks, @mstoykov and @na-- , will try this tomorrow :)

na-- added a commit that referenced this pull request Aug 7, 2019
This would restore the behavior pre-#989, so that any content-length header values manually specified by users would be overwritten by k6. Only, this time it won't be done silently, a warning would be shown every time there was a mismatch.

The HAR converter is also modified to not include the content-length header in the generated sources. While that header isn't going to be a problem if its values matches the actual body content length, some HAR files are broken and include the header without any body data, and other times users may modify the body but forget the header. So, since k6 would automatically send it, we shouldn't include it in the script.
na-- added a commit that referenced this pull request Aug 8, 2019
This would restore the behavior pre-#989, so that any content-length header values manually specified by users would be overwritten by k6. Only, this time it won't be done silently, a warning would be shown every time there was a mismatch.

The HAR converter is also modified to not include the content-length header in the generated sources. While that header isn't going to be a problem if its values matches the actual body content length, some HAR files are broken and include the header without any body data, and other times users may modify the body but forget the header. So, since k6 would automatically send it, we shouldn't include it in the script.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for request compression
5 participants