Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chunked encoding gets doubly chunked when using puma corrupting the content #6

Open
pedrocr opened this issue Apr 18, 2014 · 14 comments

Comments

@pedrocr
Copy link

pedrocr commented Apr 18, 2014

I am running a Rails app in puma and using rack-streaming-proxy to proxy a MJPEG HTTP stream. The stream is a multipart message in chunked encoding. The proxy seems to break the stream by not proxying it as chunked encoding. All the code is here:

https://github.com/pedrocr/camerasink

Testing with telnet the original stream returns:

HTTP/1.0 200 OK
Server: camerasave
Date: Fri, 18 Apr 2014 11:13:06 GMT
Transfer-Encoding: chunked
Content-Type: multipart/x-mixed-replace;boundary=SurelyJPEGDoesntIncludeThis

51
--SurelyJPEGDoesntIncludeThis
Content-Type: image/jpeg
Content-Length: 9437

(the image contents go here, the headers after "51" are repeated on every new image)

The same stream after being proxied with rack-streaming-proxy and puma returns:

HTTP/1.0 200 OK
server: camerasave
date: Fri, 18 Apr 2014 11:12:16 GMT
content-type: multipart/x-mixed-replace;boundary=SurelyJPEGDoesntIncludeThis
Cache-Control: no-cache
X-Request-Id: c68247f6-e4ef-4507-b34c-c37330062289
X-Runtime: 0.256272
Connection: close

--SurelyJPEGDoesntIncludeThis
Content-Type: image/jpeg
Content-Length: 9427

(the image contents go here and the headers get repeated as well)

The difference seems to be that the proxied request doesn't have chunked encoding. At least in firefox this breaks the MJPEG streaming.

@pedrocr
Copy link
Author

pedrocr commented Apr 18, 2014

Just looked at the specs in the code and HTTP/1.0 requests will strip the chunked encoding. Don't know why that is but Firefox will be doing HTTP/1.1 requests anyway. I was just testing with HTTP/1.0 since it's easier. Here's the same with HTTP/1.1:

$ telnet localhost 8080
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET /camera/stream/test HTTP/1.1
Host: localhost

HTTP/1.1 200 OK
server: camerasave
date: Fri, 18 Apr 2014 11:25:27 GMT
transfer-encoding: chunked
content-type: multipart/x-mixed-replace;boundary=SurelyJPEGDoesntIncludeThis
Cache-Control: no-cache
X-Request-Id: 48dda60f-baae-455f-a277-e34c06c8220a
X-Runtime: 0.124132
Transfer-Encoding: chunked

57
51
--SurelyJPEGDoesntIncludeThis
Content-Type: image/jpeg
Content-Length: 9453



24f5
24ed
�JFIF����C���������

The problem seems to be that the proxy is actually keeping the chunked encoding markers from the original stream on the proxied stream. This corrupts the JPEG images.

@pedrocr pedrocr changed the title Proxying a MJPEG stream seems to drop the chunked encoding, breaking it Chunked encoding proxying passes through the original chunk markers, corrupting the stream Apr 18, 2014
@pedrocr
Copy link
Author

pedrocr commented Apr 18, 2014

I was looking through the spec and noticed a comment about it only working on unicorn. This indeed works properly on unicorn. Unfortunately I can hang unicorn very easily running these long-lived clients. I probably need to figure out how to increase the number of workers.

Is this a bug in puma then or is rack-streaming-proxy doing something that makes puma redo the chunking and break the stream?

@pedrocr
Copy link
Author

pedrocr commented Apr 18, 2014

Created a test case for this here:

https://github.com/pedrocr/camerasink/blob/b01aac64e8aaef96f8527f6e29768cc1f5f6199b/testcases/puma_double_chunking.ru

It's runnable as "rackup puma_double_chunking.ru" and then:

The original server first (a basic webrick server):

$ curl http://localhost:9000
Hello!

The puma+rack-streaming-proxy server, proxying the first server:

$ curl http://localhost:9292
7
Hello!

0

@pedrocr pedrocr changed the title Chunked encoding proxying passes through the original chunk markers, corrupting the stream Chunked encoding gets doubly chunked when using puma corrupting the content Apr 18, 2014
@evanphx
Copy link

evanphx commented Apr 18, 2014

The reason this happens is that puma always uses chunked encoding because it's more efficient, server wise, when the body is being returned in sections via #each.

rack-streaming-proxy assumes that the rack server is a dumb one and just outputs the chunks end-to-end back to the client, but thats not what puma does.

I'd like to get some clarification from someone on the rack team (@rkh, @spastorino, @raggi) on whether Puma is violating the rack spec or if rack-streaming-proxy is. rack-streaming-proxy and puma probably need to figure out a way to communicate who is doing the chunked encoding so the other doesn't.

At any rate, you don't need the chunked proxy with puma because, as you've seen, puma does it by default for you.

@pedrocr
Copy link
Author

pedrocr commented Apr 18, 2014

Thanks for looking into it @evanphx. Unfortunately I really do need the proxying to work. The chunking is already done from the source I just need to pass it through. My use case is that I run a C program to do some video capture on a set of surveillance cameras and create a MJPEG monitor output for each camera on http://127.0.0.1:30###/mjpeg. I then want my app to show that as http://myapp/camera/stream/cameraname so that it's all self-contained in a single HTTP server.

I don't think I can disable the chunked encoding in the C program either because since this is an infinite stream there's no way to do a content-length encoding.

@evanphx
Copy link

evanphx commented Apr 19, 2014

Ah! If you're using puma, I'd suggest you use rack.hijack, which puma supports natively. That will allow you to very efficiently stream the output from your program directly to the HTTP client. You can even attach the socket that rack.hijack returns straight up to the stdout of your C program so that you don't have to copy the data yourself.

@sonots
Copy link
Collaborator

sonots commented Apr 19, 2014

@pedrocr chunked transfer is supported from http/1.1. So, your first example on original stream

HTTP/1.0 200 OK
Server: camerasave
Date: Fri, 18 Apr 2014 11:13:06 GMT
Transfer-Encoding: chunked

looks wrong already. If the server can speak HTTP/1.1, but the request is from HTTP/1.0 client, it should retrun response like

HTTP/1.1 200 OK
Server: camerasave
Date: Fri, 18 Apr 2014 11:13:06 GMT
content-length: xxxx

The first line expresses whether the server can speak http/1.1. But, because the request is from http/1.0, it should not return chunked transfer because the client can not understand it.

I will look into the case of http/1.1.

@sonots
Copy link
Collaborator

sonots commented Apr 19, 2014

Hmm, well, as you guys tell, it looks puma is doing something ...
rack-streaming-proxy can't control it because the puma tweaks and returns response to the client finally.

@pedrocr
Copy link
Author

pedrocr commented Apr 19, 2014

@sonots, I was just using HTTP/1.0 requests to make it easier to write it out on the terminal (saves the host line). The clients will be doing HTTP/1.1. Strangely libsoup will do chunked encoding with 1.0. But as you've seen the problem exists with HTTP/1.1 as well. rack-streaming-proxy and puma are not seeing eye to eye on the rack spec.

@evenphx, I'll look into rack.hijack. At worst it should allow me to implement the proxy myself. But it may actually be better than that. Note that the C program doesn't output on stdout it runs a libsoup HTTP server itself. But maybe I can get rid of that and instead use a pipe or a file socket directly. Need to figure out how I'd handle multiple clients like that, but sounds interesting. Thanks.

@pathsny
Copy link

pathsny commented Mar 22, 2015

I've managed to get a basic version working with rack_hijack and this works with puma. Unfortunately this will mean a separate implementation for servers that support rack hijacking and those that dont.

Additionally I've made it work by violating the encapsulation of the Net::HTTP class to grab the socket inside once the request is sent which is probably not ideal lol.

@raggi
Copy link

raggi commented Mar 23, 2015

@evanphx Using middleware for chunking was a bad idea from the outset, it turns out. The problem being that it's consistently poorly defined for a server whether an app has performed chunking already. My recommendation is that you assume that if you are on http 1.1+ and have no content length and no connection:close, and no transfer-encoding:chunked, then you should be safe to apply chunked encoding. If any of these are otherwise, you may break user required semantics. If you see an explicit transfer-encoding:identity (which is deprecated since RFC7230) then you also probably shouldn't re-encode. Some protocols have unfortunate recommendations in this area (glaring at Server Sent Events in particular).

@raggi
Copy link

raggi commented Mar 23, 2015

As far as this ticket goes, I think the core problem in rack-streaming-proxy is that this class https://github.com/zerowidth/rack-streaming-proxy/blob/master/lib/rack/streaming_proxy/response.rb should be declaring Transfer-Encoding:chunked, given that it's outputting chunked encoding. It is not doing so, and as such servers have no way to know of the encoding. Somewhere else, code is generating a non-canonical "transfer-encoding" header also, that may be missed in other tests. I strongly recommend canonicalizing to avoid such issues, but either way, whatever is missing that is buggy too.

https://github.com/zerowidth/rack-streaming-proxy/blob/master/lib/rack/streaming_proxy/proxy.rb#L87-L91 is wrong.

https://github.com/zerowidth/rack-streaming-proxy/blob/master/lib/rack/streaming_proxy/proxy.rb#L93-L96 is also wrong, iff it still chunks.

https://github.com/zerowidth/rack-streaming-proxy/blob/master/lib/rack/streaming_proxy/response.rb#L76 single line reads are insufficient to read all the headers. You need a parser here, although I'm not entirely sure what @piper is, given that this is a rack proxy, but gets suggests the wrong kind of read.

@pathsny
Copy link

pathsny commented Mar 23, 2015

@raggi @piper is an interprocess communicator. The actual response handling code is here https://github.com/zerowidth/rack-streaming-proxy/blob/master/lib/rack/streaming_proxy/session.rb#L85 which just uses Net::HTTP

@evanphx
Copy link

evanphx commented Mar 23, 2015

@raggi Ok, thanks for the clarification. Puma already disables it's own chunked encoding if any Transfer-Encoding header is present, since that's the only safe assumption.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants