
JVB doesn't resume media stream after LastN is limited for long #1648

Open
jerry2013 opened this issue May 5, 2021 · 7 comments

@jerry2013


Description

We use lib-jitsi-meet to build a custom layout and take advantage of the new VideoConstraints. I found that after video streams have been suspended (by setting LastN to a small value) for a long time (e.g. 15-20 min), it becomes impossible to get the streams back, whether by setting LastN to a large number or to -1.

All those suspended streams stay suspended; even setting them as "on stage" doesn't work.

Other viewers who did not change LastN keep receiving from everyone else just fine.

I then tested against https://beta.meet.jit.si/ to verify it's not my setup or code.

Current behavior

Have 10+ videos.

Set LastN to a small number. In my own setup it is set to match what is on screen, e.g. during a screenshare there is 1 on-stage stream plus 4 selected, so LastN is set to 5.

After 15-20 min (e.g. when the screenshare stops), LastN is set back to 20 (i.e. everyone).

Problem: the suspended streams don't come back.

Expected Behavior

All the video streams flow again once LastN is large again.

Possible Solution

Having other participants turn their cameras off and on may restore the stream (not sure if that's always the case).

Steps to reproduce

  1. Start a meeting on https://beta.meet.jit.si/ (tried an hour ago today).
  2. Join with video on, using test robots or other computers, to get to 10 or more video streams.
  3. In the console:
> APP.conference._room.setLastN(2)
> console.log(JSON.stringify(APP.conference._room.receiveVideoController._receiverVideoConstraints))
{
	"_defaultConstraints": {
		"maxHeight": 180
	},
	"_lastN": 2,
	"_maxFrameHeight": 180,
	"_selectedEndpoints": [
		"64f0b270",
		"29839958",
		"9f4d8dc6",
		"53717bcf",
		"4a6e6c1e",
		"7d1b72d6",
		"7ca69f19",
		"fb74688c",
		"31ef978c"
	],
	"_receiverVideoConstraints": {
		"constraints": {},
		"defaultConstraints": {
			"maxHeight": 180
		},
		"lastN": 2,
		"onStageEndpoints": [],
		"selectedEndpoints": []
	}
}
  4. Then let the meeting continue for 15-20 min (I tried once with <10 min and it worked fine). I'm not sure it will happen every time, but it has happened twice for me.
  5. Either toggle the stage/tile view, or run APP.conference._room.setLastN(20); the sketch below shows how these calls map onto the receiver video constraints message.
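
For anyone reproducing this from a lib-jitsi-meet client, a minimal sketch of the two ways to drive this (assuming `conference` is the app's JitsiConference instance; the field names match the dump above):

```typescript
// `conference` is assumed to be a lib-jitsi-meet JitsiConference
// (APP.conference._room in the console commands above).

// The legacy helper used in the repro steps:
conference.setLastN(2);

// The newer receiver-constraints message; the shape matches the
// `_receiverVideoConstraints` object dumped above:
conference.setReceiverConstraints({
    lastN: 2,                               // forward at most 2 streams
    defaultConstraints: { maxHeight: 180 }, // cap for all other endpoints
    onStageEndpoints: [],                   // endpoint IDs shown large
    selectedEndpoints: [],
    constraints: {}                         // per-endpoint overrides
});
```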

Environment details

latest jitsi unstable on beta.meet

@bgrozev (Member) commented May 5, 2021

Interesting. My first thought is the SRTP ROC getting out of sync. None of the machines on beta.meet.jit.si have any packets dropped in SRTP, so if that's the case it's between the bridge and a receiver. When the issue occurs, do you see an updated "forwarded endpoints" message in the console? Do the streams appear if you switch to tile/stage view, click on a thumbnail, or just wait for 2-3 minutes?
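
For context: SRTP never transmits the rollover counter (ROC) on the wire; sender and receiver each derive it locally from the 16-bit RTP sequence number, and the full packet index is i = 2^16 * ROC + SEQ. A rough sketch of the receiver-side estimate from RFC 3711, section 3.3.1:

```typescript
// Rough sketch of the RFC 3711 (section 3.3.1) packet index estimate.
// seq: 16-bit sequence number of the incoming packet.
// s_l: highest sequence number seen so far.
// roc: local rollover counter (never transmitted; both sides infer it).
function estimateIndex(seq: number, s_l: number, roc: number): number {
    let v: number;
    if (s_l < 0x8000) {
        // A much larger seq is taken to be a straggler from the
        // previous 2^16 cycle.
        v = seq - s_l > 0x8000 ? roc - 1 : roc;
    } else {
        // A much smaller seq is taken to be from the next cycle.
        v = s_l - 0x8000 > seq ? roc + 1 : roc;
    }
    return v * 0x10000 + seq; // i = 2^16 * ROC + SEQ
}
```

A receiver whose last-seen SEQ is far behind the sender's (because packets stopped flowing) can therefore settle on the wrong ROC when the stream resumes, which is the "out of sync" scenario.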

@jerry2013 (Author)

I waited for 2 minutes while toggling the tile/stage view back and forth and selecting different tiles to be on stage, and the video streams did not resume.

I did not check the console earlier because with 10+ videos it's a wall of text, but I can run the test again and filter for "forwarded".

@jerry2013 (Author)

So, yes, there are "forwarded" messages after setting LastN back to the big number. And it just keeps cycling through all the endpoints without being able to bring any of them back live.

Logger.js:154 2021-05-06T01:09:49.870Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 838fbf70
Logger.js:154 2021-05-06T01:09:49.969Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,838fbf70
Logger.js:154 2021-05-06T01:09:50.050Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,838fbf70,8585ee5a
Logger.js:154 2021-05-06T01:09:50.142Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 06446c0c,344d5a4f,838fbf70,8585ee5a
Logger.js:154 2021-05-06T01:09:50.239Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 06446c0c,344d5a4f,838fbf70,6c8ba4ed,8585ee5a
Logger.js:154 2021-05-06T01:09:50.512Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 8585ee5a,344d5a4f,838fbf70,6c8ba4ed,06446c0c,671f63c5
Logger.js:154 2021-05-06T01:09:50.929Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 838fbf70,06446c0c,dcccf638,671f63c5,8585ee5a,6c8ba4ed,344d5a4f
Logger.js:154 2021-05-06T01:09:51.292Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 8585ee5a,6c8ba4ed,dcccf638,b184e261,06446c0c,344d5a4f,838fbf70,671f63c5

Date()
"Wed May 05 2021 21:10:22 GMT-0400 (Eastern Daylight Time)"

APP.conference._room.setLastN(2)

Logger.js:154 2021-05-06T01:10:34.396Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 8585ee5a,344d5a4f

APP.conference._room.setLastN(20)

Logger.js:154 2021-05-06T01:43:16.861Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 6c8ba4ed,06446c0c,b184e261,344d5a4f,671f63c5,8585ee5a
Logger.js:154 2021-05-06T01:43:17.389Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 06446c0c,dcccf638,344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:17.805Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:19.170Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 06446c0c,344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:20.788Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:22.210Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 06446c0c,344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:23.176Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:25.802Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 06446c0c,344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:26.766Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:28.236Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 06446c0c,344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:29.348Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:31.829Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 6c8ba4ed,344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:32.390Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:35.023Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 06446c0c,344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:35.976Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:38.768Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 6c8ba4ed,344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:39.981Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:42.763Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: b184e261,344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:43.627Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:44.889Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a,6c8ba4ed
Logger.js:154 2021-05-06T01:43:46.404Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:51.026Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 06446c0c,344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:52.185Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:54.471Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 06446c0c,344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:55.836Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:43:59.140Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,06446c0c,8585ee5a
Logger.js:154 2021-05-06T01:44:00.354Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:44:03.896Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 06446c0c,344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:44:04.955Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:44:08.246Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 06446c0c,344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:44:13.816Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:44:15.083Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 06446c0c,344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:44:15.988Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:44:18.473Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 06446c0c,344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:44:19.582Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:44:21.050Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 06446c0c,344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:44:22.622Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:44:25.561Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 06446c0c,344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:44:26.420Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:44:29.206Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 06446c0c,344d5a4f,8585ee5a
Logger.js:154 2021-05-06T01:44:30.216Z [modules/RTC/BridgeChannel.js] <WebSocket.e.onmessage>:  New forwarded endpoints: 344d5a4f,8585ee5a

@jerry2013 (Author)

Hi @bgrozev, were you able to reproduce the problem I encountered? I hope this can get fixed soon, as I've been trying to use ReceiverVideoConstraints to make Jitsi scale even better.

@jerry2013 (Author)

Update: I ran a different test:

  • Not setting lastN in ReceiverVideoConstraints; only defaultConstraints (low resolution, low fps), plus higher constraints for the selected endpoints (roughly as sketched below).
  • JVB has its jvb-last-n set to 24.
  • I viewed one batch (1-24) for 20 minutes, verified by checking framesDecoded in the stats.
  • I then selected a separate batch of 24 and didn't encounter any problem with the new set of streams coming online.

Given this test, is it still the SRTP ROC getting out of sync? Is the JVB-enforced last-n handled differently from the client-requested constraints?
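
For clarity, the constraints in this test looked roughly like the sketch below (the endpoint IDs are reused from the earlier dump as placeholders, and only the maxHeight field shown in that dump is used):

```typescript
// Sketch of the second test: no lastN, a low default for everyone,
// and higher constraints for the currently viewed batch of 24.
// Endpoint IDs are placeholders.
conference.setReceiverConstraints({
    defaultConstraints: { maxHeight: 180 },
    constraints: {
        '64f0b270': { maxHeight: 720 },
        '29839958': { maxHeight: 720 }
        // ...one entry per endpoint in the viewed batch
    }
});
```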

@inventerprising commented Feb 6, 2022

I think this issue is caused by the JVB incorrectly resuming the VP8 RTP stream's SEQ number, resulting in the WebRTC library discarding the packets as part of its replay protection.

Steps to reproduce

  1. Join meet.jit.si with 5 people. Ensure at least one participant is on Firefox or Safari (so the call drops back to VP8).
  2. Ensure all participants are in tile view.
  3. Reduce one participant's window size until the tiles are 2x2, so that one video stream is not displayed and becomes inactive.
  4. Wait 20 minutes.
  5. Reactivate the video stream by increasing the window size, scrolling down, or switching to stage view (so the reactivated stream is shown in the filmstrip).
  6. The reactivated stream is now “corrupted” and will report packet losses, resulting in reduced BWE and other video streams being degraded.

Summary

After an endpoint's stream has been suspended (due to bandwidth allocation, last-n, or pagination) and later resumes, the RTP SEQ number should continue incrementing without a gap in the sequence. In the case of VP9 packets, or VP8 packets where no other endpoint is requesting that SSRC, the SEQ number correctly “pauses” and then “resumes” from the previous count. However, VP8 packets (with at least one other endpoint requesting the SSRC) resume after a pause with a gap in the SEQ number, as though the counter had continued incrementing during the paused state.

If the pause was sufficiently long (15-20 minutes), the discontiguous SEQ number of the resumed stream will appear to the WebRTC library as a replay attack, and all of the packets for that SSRC will be discarded. The browser will then report all the packets as lost (via the TCC feedback in its RTCP), and the JVB's TCC node will reduce the BWE and suspend endpoints. After the problematic SSRC becomes inactive the packet losses stop, the BWE increases, the allocator reactivates the endpoint, and lost packets are reported again; rinse and repeat.
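
To make the failure mode concrete, here is the effect of a large jump run through the RFC 3711 index estimate sketched earlier in the thread (the specific numbers are illustrative only):

```typescript
// Illustrative values only: suppose the receiver last saw seq 1000 at
// roc = 3, and the resumed stream arrives with seq jumped ahead by 35000.
const s_l = 1000;
const roc = 3;
const resumedSeq = (s_l + 35000) % 0x10000; // 36000

// resumedSeq - s_l = 35000 > 32768, so the receiver guesses roc - 1:
const receiverIndex = estimateIndex(resumedSeq, s_l, roc);
// = 2 * 65536 + 36000 = 167072

// The bridge, whose counter effectively kept advancing, is at:
const senderIndex = roc * 0x10000 + resumedSeq;
// = 3 * 65536 + 36000 = 232608

// The receiver's estimate is a full 2^16 behind the sender's actual
// index, far outside the SRTP replay window, so the packets are
// rejected as replays ("index too old") until the counters realign.
```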

If the “corrupted” video is the current speaker, you won't see any video. If the corrupted video is one of the thumbnails in stage view, the on-stage video will be degraded and some thumbnails may become inactive. If the corrupted video is included in tile view, most tiles will show low frame rates and/or numerous tiles will switch on and off (flickering every few seconds) without any improvement.

In larger meetings, if at least one person is viewing 5x5 tiles, then other participants viewing stage mode will have approximately 6 videos active (in the filmstrip) and 19 videos inactive and vulnerable to this issue. After 20 minutes, if a participant switches to tile view they will experience the issue. Or, if one of the inactive participants speaks, they become a recent speaker and appear in the other participants' filmstrips, which triggers the issue.

Details

[Image 1: RTP original SEQ (green) vs. projected SEQ (yellow) over time]

The first image shows the RTP original SEQ (green) and the RTP projected SEQ (yellow) sent by the JVB to the participant. The JVB stores a delta value and uses it to calculate the projected SEQ. In the image, the first gaps illustrate the bug: the SEQ effectively keeps incrementing during the period when this endpoint doesn't require that SSRC but at least one other participant is still receiving the stream, resulting in a jump in the SEQ when the stream resumes. The last gap in the image illustrates the case where no participant is receiving the SSRC, and there is no jump in the SEQ when it resumes.
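
To illustrate the mechanism, here is a hypothetical sketch of per-receiver sequence projection (the names and structure are mine, not the JVB's actual code): the projection only stays gapless if the stored delta is rebased whenever forwarding to this particular receiver resumes.

```typescript
// Hypothetical sketch of per-receiver SEQ projection; the JVB's real
// implementation differs. projectedSeq = (origSeq - delta) mod 2^16.
class SeqProjection {
    private delta = 0;
    private lastProjected = 0; // last seq actually sent to this receiver
    private forwarding = true;

    project(origSeq: number): number | null {
        if (!this.forwarding) {
            // The bug described above corresponds to doing nothing here:
            // `delta` stays frozen while origSeq keeps advancing for the
            // other receivers, so the projected seq jumps by the entire
            // suspended gap once forwarding resumes.
            return null;
        }
        this.lastProjected = (origSeq - this.delta) & 0xffff;
        return this.lastProjected;
    }

    suspend(): void {
        this.forwarding = false;
    }

    resume(origSeq: number): void {
        // Gapless behavior: rebase delta so the projected seq continues
        // from lastProjected + 1 with no discontinuity.
        this.delta = (origSeq - (this.lastProjected + 1)) & 0xffff;
        this.forwarding = true;
    }
}
```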

[Image 2: SEQ counter continuing to increment while the participant's video is inactive, then jumping on resume]

The bottom chart in the second image shows the bug being triggered and its results. The red dotted lines were added to illustrate the continuation of the SEQ increments during the period when the participant's video was inactive.

The chart shows that when the video resumed, the SEQ counter had jumped by 35,000. This jump is more than half of 2^16, so WebRTC assumes the latest SEQ value is less than the previous SEQ value (a ROC of -1) and discards the RTP packets as a replay attack. This behavior can be observed by capturing packets and noticing that the relevant RTP packets are received but the RTCP TCC packets report them as lost. You can also debug WebRTC in Chrome:

path_to_chrome\chrome.exe --enable-logging -vmodule=*/webrtc/*=-1

and look for errors such as “Failed to unprotect SRTP packet, err=9, previous failure count: 100”. Err 9 indicates “replay check failed (bad index)”, which means the index is in the list of recently received SEQ indexes; err 10 indicates “replay check failed (index too old)”, which means the index is before the list of recent SEQ indexes. In both cases it only “appears” to be a replay attack because the index has “wrapped around”. The error messages are reported once every 100 instances.

The SEQ counter eventually increments back into the valid range and the system returns to a stable state; however, the rate of incrementing slows down after the counter has become corrupted (shown in the second image), so recovery takes a long time. Also, this issue causes other videos to become inactive (due to the reduced BWE), and it's likely those other streams will then become corrupted for the same reasons.

HD streams increment the SEQ counter significantly faster; the steep slopes at the beginning of the second image's chart are an example of this. At that rate a pause of only 200 seconds is enough to cause the large SEQ gap. I experienced this during my initial exploration of the issue, but further testing wasn't able to reproduce it. It's possible I was mistaken and the issue only occurs when the uploader's HD stream has been suspended (i.e. only when no other participant is requesting the HD video).
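
As a rough sanity check on the 200-second figure (my arithmetic, not measured): a gap of more than 2^15 = 32768 packets in 200 s works out to about 164 packets per second, which is plausible for an HD stream (for example, ~1.5 Mbps with ~1200-byte payloads is roughly 160 packets/s).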

Let me know if any additional details would be helpful.

  • Nigel

@bukharin commented Feb 7, 2022

I can also confirm that this issue still occurs on our jitsi-meet installation with the latest packages.
