-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fails on duplicate Content-Type
headers
#29
Comments
Content-Type
headersContent-Type
headers
Ok, doing a little bit of initial digging. The previously referenced RFC in the readme and code is superseded by this one and the old one does not contain this language afaict. Seems likely that this implementation just pre-dates the official handling of this in the spec. I think the problem goes a bit deeper than JUST parsing this for multiple values though. Since there is a following section about handling multiple
And the
This leads me to think that while our handling is slightly incorrect, it is also could be a security risk if we do anything other than consider this completely invalid behavior and throw/bail on it (see the paragraph above this one on how mime-sniffing can open issues). I wonder if maybe @jasnell might be a good person to help us on this as someone who both deeply knows the domain and also works at CF. I am inclined to say we should do one of two things (depending on James' input):
|
The RFC mentioned in the readme does actually make mention of multiple header fields, with
The reference link mentioned in this section, https://datatracker.ietf.org/doc/html/rfc7230, reads:
So this is still something to be expected even in the older specification, it seems
I think this would be the most sensible thing to do imo, but I may just be biased
That would be great. Right now we have an additional CF Snippet running that will automatically convert a duplicate header into a single value by taking the first entry (same as what was proposed here), but that's not super ideal for us in the long run since CF Snippets are limited in quantity and this is eating one of our slots |
I think there might also be an argument to be made about upping the target spec? If the lack of presence of certain language in the older spec is an issue at least. These libraries are being used by |
I don't think it was an issue, I was just surprised it was not accounting for this since I know proxies have done this for years. Thanks for finding the section on that, I must have just missed it. But yeah, I think we can work toward updating the references to the spec (good first PRs 😉) as well as doing the proposal here. Ideally we would do the doc references separately so they can breeze on through in case we need to go back and forth on the implementation updates. Especially since there is some risk this becomes a breaking change. |
tldr; lists of values in Content-Type is not spec compliant, it's the job of an application to deal with invalid header valuesSection 5.2 explains why I think the quoted part of section 5.3 below is being misinterpretted in terms of what "good example" means. It does not support the idea that a list is valid for
The spec is referring to how robust For the called out section from rfc7230 section 3.2.2
This is typical spec speak, and how I interpret is is informed by
Combining of multiple headers this way is common for proxies, yes, but proxies typically do it without any idea if they are creating valid values. They do it per 5.2 in order to guard downstream servers from seeing duplicate headers and being confused. The root is almost always a misconfigured client, and the proxy is just doing its best. An invalid value better represents the semantics of the transported message than duplicate headers which could cause unexpected results. I see a few paths forward here, ordered in terms of my preference
If you're already handling these at the ingest, then that sounds sane to me and like the right approach. Fixing the client is the best approach, but since you're working with hardware I understand that's not an option. |
Thank you for the reply, and I apologize for my lateness. Thank you for the detailed look at the targeted spec, and while I do still believe this is within spec as per the current RFC (due to section 5.2 seemingly being much more lenient in it's language than the current target specs language) that's just my interpretation
Yes unfortunately we don't have much control here, even though I agree this is the best option. Though to be fair even in situations outside of ours where someone may not be working with hardware like we are, fixing the client is often not an option for most people since you don't typically control the client outside of your own products
I unfortunately have no pull over how Cloudflare operates in this regard. Even in our current setup, we aren't preventing the combined header from being created. Cloudflare creates it and we're retroactively deduplicating it. So I'm not sure how much of an option this actually is
We're handling this via Cloudflare Snippets right now yes, but that's not super ideal for us since Cloudflare only provides a limited number of these Snippets and this takes up a slot. We are doing this at the Cloudflare level because the issue presents itself through That's why I initially made several issues on various repositories, since I'm aware it's up to the application to decide what to do but I was not sure whose responsibility it actually was in this case since there's 6 different areas where this issue could potentially be handled (Cloudflare Snippets, Express/the Express server, The issues in the other repositories were closed in favor of handling this downstream, but if the interpretation is that this isn't spec compliant then I suppose those would be reopened? Or is the expectation that none of the packages handle this and it's left up to the developer? I'm genuinely asking, to be clear, since I really have no idea who would be responsible in this case and we're looking to settle on a solution for this on our side
Imo regardless of what happens, even if none of the mentioned packages fix this themselves, a better error would probably be in order just to make it more clear as to why the header is being rejected. At the very least it would save people some time having to dig through the source of multiple libraries to figure that out (plus digging through multiple versions of the same libraries, as in some of these cases GitHub does not match npm) |
Ok, we have a few long posts here, but to help bring it back to a decision for this library specifically, we need to choose between one of these to I think:
My vote goes to 1 since it is invalid in the first place, but I think 2 is viable because it would solve a real problem here. |
Both of those sound reasonable to me Given that, at least the current target spec, it does seem to forbid this I do agree that it probably isn't best to fix it in this repo but rather a clear error should be thrown to indicate why it's being rejected, and the deduplication should probably be handled either in type-is or body-parser |
According to spec, an HTTP message may contain headers with the same name. In those cases the values of the headers are concatenated into a comma separated list https://datatracker.ietf.org/doc/html/rfc9110#section-5.2:
However
content-type
fails to parse values when they are concatenated like this. Basic repro:While an uncommon situation, it is one that can happen and is perfectly within the spec. A common way which this may happen is when using Cloudflare Workers/Snippets. If a request comes in and is processed by a Cloudflare Worker/Snippet and contains a duplicate header, Cloudflare will automatically convert this into a comma separated list, and thus fail to be processed here
This is caused by the following regex only allowing a single value for the header:
This issue is related to several other issues, which I will link here after they are made, since I'm not sure who should have the responsibility of ensuring the header is split correctly
The text was updated successfully, but these errors were encountered: