Suggestion: Speed up decoder by queue-ing queueInputBuffer (previous idea: BUFFER_FLAG_PARTIAL_FRAME) #2436
Comments
It looks like the safest way to do this is to have VideoDecoderSink produce something like a SocketBuffer which can be fed into SocketReader, which will eventually get a
Abandoning the idea since StreamSocket.recv is used for multiple streams :)
Sorry I haven't replied sooner. The idea actually makes sense, but not in the way you think. While we cannot do transmission and decoding in parallel for parts of the same frame, we can receive and copy the frame in parallel. This could maybe save 1-2 ms. I know because previously we managed to shave some latency just by being careful not to do unnecessary copies of the data.
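The receive/copy overlap described above can be sketched with a channel: one thread stands in for the network receiving shards, while the consumer copies each shard into the frame buffer as soon as it arrives instead of waiting for the whole frame. This is a minimal illustrative model, not ALVR's actual code; all names and the chunking scheme are assumptions.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical sketch: overlap network receive with the copy into the
// decoder's frame buffer. Names and chunk sizes are illustrative.
fn receive_and_copy(chunks: Vec<Vec<u8>>, frame_size: usize) -> Vec<u8> {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();

    // "Network" thread: stands in for the socket delivering shards over time.
    let recv_thread = thread::spawn(move || {
        for chunk in chunks {
            tx.send(chunk).unwrap();
        }
        // Dropping tx closes the channel, ending the copy loop below.
    });

    // Consumer: copy each shard into the frame buffer as it arrives,
    // so the copy overlaps with the remaining network transmission.
    let mut frame = vec![0u8; frame_size];
    let mut cursor = 0;
    for chunk in rx {
        frame[cursor..cursor + chunk.len()].copy_from_slice(&chunk);
        cursor += chunk.len();
    }
    recv_thread.join().unwrap();
    frame
}
```

Since the copy is much faster than the transmission, the saving is bounded by the duration of one frame copy, consistent with the 1-2 ms estimate above.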
This would actually be the first instance of intra-frame pipelining in ALVR. Currently we show statistics as a simple bar graph where it's impossible to show overlaps. I plan to rework this soon by switching to a hybrid scatter/timeline plot. Let's work on this after I finish the statistics rewrite.
Thank you for taking notice :) You are talking about StreamSocket.recv around line 563, I guess. I haven't wrapped my head around the shard logic, but I believe it would require "partial packet" logic around the whole thing.
Actually no, I didn't mean touching the StreamSocket logic. I mean sending smaller packets and receiving smaller packets, so that StreamSocket will not split the packets at all. Doing this will not take advantage of the packet reordering logic, but it doesn't matter; we can wait, because the copy logic is much faster than the network transmission. In any case, we would still need to test this: it's not guaranteed that it will reduce latency. It will increase CPU usage, because the send, recv, and copy logic need to be repeated for each smaller packet instead of once per frame. We might need to find a compromise regarding the packet size. I'd say this improvement is a micro-optimization, since the most we can save is one video frame copy.
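A minimal sketch of the "smaller packets" idea: split one encoded frame into packets no larger than some payload limit, each prefixed with an index so the receiver can reassemble them in order. The 4-byte little-endian index header is an assumption for illustration; ALVR's real packet header is different.

```rust
// Hypothetical sketch: split a frame into small packets that StreamSocket
// will not shard further. Header layout (4-byte LE index) is illustrative.
fn packetize(frame: &[u8], max_payload: usize) -> Vec<Vec<u8>> {
    frame
        .chunks(max_payload)
        .enumerate()
        .map(|(index, payload)| {
            let mut packet = Vec::with_capacity(4 + payload.len());
            // Prefixing the header is itself a copy: header and payload
            // must land in one contiguous send buffer per packet.
            packet.extend_from_slice(&(index as u32).to_le_bytes());
            packet.extend_from_slice(payload);
            packet
        })
        .collect()
}
```

This also makes the CPU trade-off above concrete: the per-packet header copy and send/recv calls are repeated once per chunk rather than once per frame.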
As suggested by @xytovl, we don't need to use BUFFER_FLAG_PARTIAL_FRAME; we can just acquire an input buffer and queue it when it has been filled. Let's keep this issue open for the copy pipelining feature.
I see multiple ideas overlapping, so I'll try and summarize what I understand:
The obvious caveat right now, as I see it, is that it is currently guaranteed that we have a whole frame before push_frame_nal. My thought on b) is that, I believe, it is not really a problem, because shard_length is probably constant, so the resize is not that common. Not sure. My suggested approach is two-fold:
ii) Make subscribe_to_stream() require a supplier object that can deliver buffers to the socket logic. i+ii) means the decoder provides buffers, they are filled with shards, each can be allocated once and delivered in order as data comes through, enabling lost packets and skipped frames to be handled as they are. Noob question: how do I make rust-analyzer build dependencies for Android? Where do I put --platform in VS Code?
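The supplier-object approach above could be modeled as a trait the socket logic accepts. Everything here is hypothetical: the trait, `DecoderSink`, and the method names are invented to illustrate the shape, with a `Vec` standing in for a MediaCodec input buffer.

```rust
// Hypothetical sketch of the two-fold approach: the decoder supplies
// buffers (i), and the socket layer fills them with shards in order (ii).
trait BufferSupplier {
    // Hand out a writable buffer of at least `size` bytes.
    fn acquire(&mut self, size: usize) -> Vec<u8>;
    // Return the filled buffer, to be queued into the decoder.
    fn submit(&mut self, buffer: Vec<u8>);
}

struct DecoderSink {
    queued: Vec<Vec<u8>>, // stands in for the MediaCodec input queue
}

impl BufferSupplier for DecoderSink {
    fn acquire(&mut self, size: usize) -> Vec<u8> {
        vec![0u8; size]
    }
    fn submit(&mut self, buffer: Vec<u8>) {
        self.queued.push(buffer);
    }
}

// Socket side: fill a supplied buffer shard by shard, then hand it back.
fn deliver_shards(supplier: &mut impl BufferSupplier, shards: &[&[u8]]) {
    let total: usize = shards.iter().map(|s| s.len()).sum();
    let mut buffer = supplier.acquire(total);
    let mut cursor = 0;
    for shard in shards {
        buffer[cursor..cursor + shard.len()].copy_from_slice(shard);
        cursor += shard.len();
    }
    supplier.submit(buffer);
}
```

Because the supplier owns buffer allocation, lost packets or skipped frames can be handled by simply never calling `submit` for an abandoned buffer.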
I don't think we can do any better. We have one copy from the socket buffer to the decoder buffer. On the server/send side instead we need to prefix the data with the packet header (at least the packet index), which means a copy. So there is nothing we can save.
This is an interesting idea, but of limited use. We can just make it its own separate thing for video.
You should be able to change the "check" command for rust-analyzer. I did it a few times, but changing it back and forth got old. If I need to edit Android cfg-gated files, I comment out the cfg-gate temporarily.
I agree. Also for audio, maybe?
I would say audio is not even a hot code path. The data size is much lower, plus we are not as sensitive to audio latency.
For posterity:
Ah ok. I think the best thing is to compromise by splitting the frame into a set number of packets, so as not to exceed the max MediaCodec input buffers.
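The compromise above amounts to simple arithmetic: given the frame length and the number of input buffers the codec exposes, pick a packet size so the frame never needs more packets than buffers. A minimal sketch (function name is invented):

```rust
// Hypothetical sketch: choose a packet size so one frame splits into at
// most `max_input_buffers` packets. Ceiling division ensures the tail
// bytes of the frame are not dropped.
fn packet_size_for(frame_len: usize, max_input_buffers: usize) -> usize {
    (frame_len + max_input_buffers - 1) / max_input_buffers
}
```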
I am having trouble with the decoder flags. If anyone is interested in testing on another device (just for the heck of it at this point), I can provide a branch to play around with. I am testing on a Xiaomi Note 7 and the decoder doesn't really know what to do with the partial frames. It almost looks like it does not support the flag and simply tries to render the partial frame. It's weird: the top of the image is perfect, and the rest is extremely low bitrate. It's as if the first partial frame is all that gets rendered.
That's not the first time we've seen vendors not supporting MediaCodec flags. It seems the best option is not to mess with MediaCodec at all.
Just an idea: I've been browsing through the client decoder stream setup and it seems a whole frame needs to be queued to start the decoder. Ignoring any details that I have no idea about, it could be possible to retrieve the buffer from dequeueInputBuffer and dump each chunk from the socket directly into the MediaCodec buffer. I haven't explored the details (Rust is new to me), but it seems it would save a ptr::copy_nonoverlapping, as well as priming with more data as soon as it arrives. Any thoughts on why this can't work?
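The original idea above can be sketched as follows: instead of staging the whole frame in an intermediate buffer (one extra copy per frame), each socket chunk is written straight into the decoder's input buffer. The slice here stands in for the memory behind a dequeued MediaCodec input buffer; this is an illustrative model, not ALVR's code.

```rust
// Hypothetical sketch: fill a decoder input buffer incrementally from
// socket chunks. `decoder_buffer` stands in for the writable region of a
// dequeued MediaCodec input buffer; the slice copy plays the role of the
// ptr::copy_nonoverlapping that staging the full frame would require.
fn fill_decoder_buffer(decoder_buffer: &mut [u8], chunks: &[&[u8]]) -> usize {
    let mut written = 0;
    for chunk in chunks {
        decoder_buffer[written..written + chunk.len()].copy_from_slice(chunk);
        written += chunk.len();
    }
    written // total byte count to report when queueing the buffer
}
```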