Temporal layer in simulcast for VP8 not working

I noticed that, both in the Janus EchoTest and in our application, temporal simulcast no longer seems to work for VP8. I click on L, but I keep receiving 24 or 30 FPS. I remember that 1-2 years ago it worked fine and the FPS went down. Could you double-check temporal layers in VP8? Maybe something broke during the VP9 simulcast refactoring effort? Temporal L works fine for us with VP9 simulcast, just not with VP8 anymore.

I've just tried the EchoTest with Firefox and it works as expected for me: the UI shows three buttons for temporal layers, but there are really only two (TL1 and TL0).

With Chrome, apparently only TL0 exists, but that's not a problem in Janus. It probably means Chrome stopped sending temporal layers for VP8, since they still work with Firefox.

But if I look at the WebRTC stats, I still receive a high frame rate (around 30 FPS), even when I click TL0. Can you check the actual traffic and confirm that FPS and bytes per second are lower in the WebRTC stats?

(That's in about:webrtc in Firefox and chrome://webrtc-internals in Chrome.)

That's because it's the only temporal layer that exists. My guess is Chrome stopped adding temporal layers when doing VP8 simulcast.


I see there's a new rtp-hdrext/video-layers-allocation00 extension being negotiated, which apparently provides info on which layers exist. I'll take note to negotiate and parse that extension for experimentation, as it may give more details on what a sender is actually configured to send.


Thank you for the prompt response! I found this old Chrome discussion from 2020, from when Chrome switched from 3 to 2 temporal layers for VP8: https://groups.google.com/g/discuss-webrtc/c/N1sMEBJhOz4. I also noticed this comment, not sure if it's related:

There are platforms where Chrome uses HW encoders that don't have temporal layer support. In which case Chrome will only send a single temporal layer.

I doubt my Dell laptop has a VP8 hardware encoder :sweat_smile:


FYI, I created a basic parser for that extension, which seems to say there should still be two temporal layers:

```
a1549c048407b401ac023c6404ff02cf1e027f01671e013f00b31e (54 bytes)
	  -- 27 bytes
	a1 54 9c 04 84 07 b4 01 ac 02 3c 64 04 ff 02 cf 1e 02 7f 01 67 1e 01 3f 00 b3 1e 
	  -- -- rid=2, ns=2 (3 RTP streams), sl_bm=1 (0001)
	  -- -- -- RTP #0, sl_bm=1 (0001)
	  -- -- -- RTP #1, sl_bm=1 (0001)
	  -- -- -- RTP #2, sl_bm=1 (0001)
	  -- -- Temporal layers (54)
	  -- -- -- RTP #0, sl=0, tl=1 (2 temporal layers)
	  -- -- -- RTP #1, sl=0, tl=1 (2 temporal layers)
	  -- -- -- RTP #2, sl=0, tl=1 (2 temporal layers)
	  -- -- Target bitrates (9c 04 84 ...)
	  -- -- -- RTP #0, sl=0, tl=0, bitrate=540
	  -- -- -- RTP #0, sl=0, tl=1, bitrate=900
	  -- -- -- RTP #1, sl=0, tl=0, bitrate=180
	  -- -- -- RTP #1, sl=0, tl=1, bitrate=300
	  -- -- -- RTP #2, sl=0, tl=0, bitrate=60
	  -- -- -- RTP #2, sl=0, tl=1, bitrate=100
	  -- -- Resolutions (04 ff 02 ...)
	  -- -- -- RTP #0, sl=0, res=1280x720 @ 30fps
	  -- -- -- RTP #1, sl=0, res=640x360 @ 30fps
	  -- -- -- RTP #2, sl=0, res=320x180 @ 30fps
Parsed 27/27 bytes
Bye!
```
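For reference, here is a minimal C sketch of a decoder for the video-layers-allocation00 payload, following the field layout published with the extension: a first byte with RID, NS (number of streams minus one) and a shared spatial-layer bitmask, 2-bit temporal layer counts (minus one) per active spatial layer, LEB128 target bitrates in kbps, and optional width/height/framerate per active spatial layer. The `vla_info` struct and the function names are hypothetical, not Janus APIs. Fed the 27-byte payload from the dump above, it should yield the same values (2 temporal layers per stream, 540/900 kbps for stream 0, 1280x720 @ 30fps, and so on).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for one decoded video-layers-allocation00 payload.
 * Array sizes follow the field widths: at most 4 RTP streams, 4 spatial
 * layers per stream and 4 temporal layers per spatial layer. */
typedef struct {
	int rid;                    /* index of the RTP stream carrying this packet */
	int num_streams;            /* NS + 1 */
	uint8_t sl_bm[4];           /* active spatial layers, per stream */
	int num_tl[4][4];           /* temporal layers per (stream, spatial layer) */
	uint32_t kbps[4][4][4];     /* target bitrate per (stream, sl, tl), as signalled */
	int has_resolution;
	uint16_t width[4][4], height[4][4];
	uint8_t fps[4][4];
} vla_info;

/* Decodes one LEB128 value; returns the number of bytes used, 0 on error. */
static size_t vla_leb128(const uint8_t *buf, size_t len, uint32_t *out) {
	uint32_t value = 0;
	for (size_t i = 0; i < len && i < 5; i++) {
		value |= (uint32_t)(buf[i] & 0x7f) << (7 * i);
		if (!(buf[i] & 0x80)) {
			*out = value;
			return i + 1;
		}
	}
	return 0;
}

/* Returns 0 on success, -1 if the payload is truncated or malformed. */
int vla_parse(const uint8_t *buf, size_t len, vla_info *info) {
	memset(info, 0, sizeof(*info));
	if (len < 1)
		return -1;
	info->rid = buf[0] >> 6;
	info->num_streams = ((buf[0] >> 4) & 0x03) + 1;
	uint8_t shared_bm = buf[0] & 0x0f;
	size_t off = 1;
	for (int s = 0; s < info->num_streams; s++) {
		if (shared_bm != 0) {
			info->sl_bm[s] = shared_bm;
			continue;
		}
		/* sl_bm == 0: one 4-bit mask per stream, two streams per byte */
		if (off + s / 2 >= len)
			return -1;
		uint8_t b = buf[off + s / 2];
		info->sl_bm[s] = (s % 2 == 0) ? (b >> 4) : (b & 0x0f);
	}
	if (shared_bm == 0)
		off += (info->num_streams + 1) / 2;
	/* 2 bits (#tl - 1) for each active spatial layer, zero-padded to a byte */
	int active = 0;
	for (int s = 0; s < info->num_streams; s++) {
		for (int sl = 0; sl < 4; sl++) {
			if (!(info->sl_bm[s] & (1 << sl)))
				continue;
			if (off + active / 4 >= len)
				return -1;
			uint8_t b = buf[off + active / 4];
			info->num_tl[s][sl] = ((b >> (6 - 2 * (active % 4))) & 0x03) + 1;
			active++;
		}
	}
	off += (active + 3) / 4;
	/* One LEB128 target bitrate (kbps) per temporal layer */
	for (int s = 0; s < info->num_streams; s++)
		for (int sl = 0; sl < 4; sl++)
			for (int tl = 0; tl < info->num_tl[s][sl]; tl++) {
				size_t used = vla_leb128(buf + off, len - off, &info->kbps[s][sl][tl]);
				if (used == 0)
					return -1;
				off += used;
			}
	/* Optional: width-1, height-1 (16 bits each) and max fps per active layer */
	if (off == len)
		return 0;
	if (len - off < (size_t)(5 * active))
		return -1;
	for (int s = 0; s < info->num_streams; s++)
		for (int sl = 0; sl < 4; sl++) {
			if (!(info->sl_bm[s] & (1 << sl)))
				continue;
			info->width[s][sl] = (uint16_t)(((buf[off] << 8) | buf[off + 1]) + 1);
			info->height[s][sl] = (uint16_t)(((buf[off + 2] << 8) | buf[off + 3]) + 1);
			info->fps[s][sl] = buf[off + 4];
			off += 5;
		}
	info->has_resolution = 1;
	return 0;
}
```

Note that the extension only describes what the sender says it is configured to produce; as the captures in this thread show, it says nothing about where (or whether) the per-packet temporal index is carried.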

When I do an unencrypted Wireshark capture, though, I see different things:

  • Firefox sets X=1 in the VP8 descriptor to specify there's more info, and then T=1 to say the descriptor contains the temporal layer index. This is what we parse and use for simulcast switching.
  • Chrome sets X=0, which means none of the additional info we need is there. I don't know if it isn't there because, despite what the extension says, there actually aren't multiple temporal layers, or because they're now signalling that info somewhere else (maybe the AV1 descriptor? But why? That's optional.)
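To make the X/T distinction concrete, this is a C sketch of reading the VP8 payload descriptor as laid out in RFC 7741 (Section 4.2). The struct and function names are illustrative, not the ones Janus uses. When X=0, as in the Chrome capture, there is no extension byte, so the function reports no temporal layer index at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the VP8 payload descriptor (RFC 7741) that matter for
 * temporal layer switching. Names are illustrative. */
typedef struct {
	int has_picture_id;
	uint16_t picture_id;
	int has_tl0picidx;
	uint8_t tl0picidx;
	int has_tid;        /* T bit set: the temporal layer index is present */
	uint8_t tid;
	int layer_sync;     /* Y bit, only meaningful when T is set */
	size_t header_len;  /* bytes before the VP8 payload itself */
} vp8_descriptor;

/* Returns 0 on success, -1 if the descriptor is truncated. */
int vp8_parse_descriptor(const uint8_t *buf, size_t len, vp8_descriptor *d) {
	size_t off = 0;
	*d = (vp8_descriptor){0};
	if (len < 1)
		return -1;
	uint8_t first = buf[off++];      /* |X|R|N|S|R| PID | */
	if (!(first & 0x80)) {           /* X=0: no extension byte, so no TID */
		d->header_len = off;
		return 0;
	}
	if (off >= len)
		return -1;
	uint8_t ext = buf[off++];        /* |I|L|T|K| RSV | */
	if (ext & 0x80) {                /* I: PictureID present */
		if (off >= len)
			return -1;
		if (buf[off] & 0x80) {       /* M=1: 15-bit PictureID */
			if (off + 1 >= len)
				return -1;
			d->picture_id = (uint16_t)(((buf[off] & 0x7f) << 8) | buf[off + 1]);
			off += 2;
		} else {                     /* M=0: 7-bit PictureID */
			d->picture_id = buf[off] & 0x7f;
			off += 1;
		}
		d->has_picture_id = 1;
	}
	if (ext & 0x40) {                /* L: TL0PICIDX present */
		if (off >= len)
			return -1;
		d->tl0picidx = buf[off++];
		d->has_tl0picidx = 1;
	}
	if (ext & 0x30) {                /* T or K: the |TID|Y|KEYIDX| byte is present */
		if (off >= len)
			return -1;
		if (ext & 0x20) {
			d->has_tid = 1;
			d->tid = buf[off] >> 6;
			d->layer_sync = (buf[off] >> 5) & 0x01;
		}
		off++;
	}
	d->header_len = off;
	return 0;
}
```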

Thanks to Philipp Hancke, we found the root cause of the issue:
https://issues.webrtc.org/issues/42226269

When the Dependency Descriptor extension is negotiated, Chrome stops sending the temporal layer index in the payload descriptor and puts it in the DD instead. We always negotiate the extension because we might need it in case AV1 is used, but we don't do anything with it for other codecs, since it's an extension that was originally conceived only for AV1 SVC usage.
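The temporal index is not written directly in the Dependency Descriptor: each packet carries a 6-bit template ID, and the temporal ID of that template comes from a template structure that is only sent on some packets (typically key frames) and must be remembered in between. A partial C sketch of that lookup, following the descriptor syntax in the AV1 RTP specification, might look like this. It decodes only the mandatory fields and the `template_layers()` part of the structure, stops before the DTIs, frame diffs, chains and resolutions, and all names are hypothetical; a real implementation would need to parse the rest as well.

```c
#include <stddef.h>
#include <stdint.h>

#define DD_MAX_TEMPLATES 64

/* Minimal MSB-first bit reader over the extension payload. */
typedef struct {
	const uint8_t *buf;
	size_t len;
	size_t pos;   /* in bits */
	int error;    /* set when reading past the end */
} dd_bitreader;

static uint32_t dd_read_bits(dd_bitreader *br, int n) {
	uint32_t v = 0;
	for (int i = 0; i < n; i++) {
		if (br->pos >= br->len * 8) {
			br->error = 1;
			return 0;
		}
		v = (v << 1) | ((br->buf[br->pos / 8] >> (7 - br->pos % 8)) & 0x01);
		br->pos++;
	}
	return v;
}

/* Template structure, kept across packets until a new one arrives. */
typedef struct {
	int valid;
	int template_id_offset;
	int template_cnt;
	uint8_t spatial_id[DD_MAX_TEMPLATES];
	uint8_t temporal_id[DD_MAX_TEMPLATES];
} dd_structure;

/* Mandatory per-packet fields. */
typedef struct {
	int start_of_frame;
	int end_of_frame;
	int template_id;
	uint16_t frame_number;
} dd_frame;

/* Parses one DD; updates *st only if this packet carries a new structure. */
int dd_parse(const uint8_t *buf, size_t len, dd_frame *f, dd_structure *st) {
	dd_bitreader br = {buf, len, 0, 0};
	f->start_of_frame = (int)dd_read_bits(&br, 1);
	f->end_of_frame = (int)dd_read_bits(&br, 1);
	f->template_id = (int)dd_read_bits(&br, 6);
	f->frame_number = (uint16_t)dd_read_bits(&br, 16);
	if (br.error)
		return -1;
	if (len <= 3)             /* mandatory fields only */
		return 0;
	int structure_present = (int)dd_read_bits(&br, 1);
	dd_read_bits(&br, 4);     /* active_decode_targets_present + custom_* flags, ignored */
	if (br.error)
		return -1;
	if (!structure_present)
		return 0;
	dd_structure tmp = {0};
	tmp.template_id_offset = (int)dd_read_bits(&br, 6);
	dd_read_bits(&br, 5);     /* dt_cnt_minus_one: not needed for the temporal ID */
	/* template_layers(): next_layer_idc 0 = same layer, 1 = next temporal
	 * layer, 2 = next spatial layer (temporal reset), 3 = no more templates */
	int sid = 0, tid = 0;
	uint32_t idc;
	do {
		if (tmp.template_cnt >= DD_MAX_TEMPLATES)
			return -1;
		tmp.spatial_id[tmp.template_cnt] = (uint8_t)sid;
		tmp.temporal_id[tmp.template_cnt] = (uint8_t)tid;
		tmp.template_cnt++;
		idc = dd_read_bits(&br, 2);
		if (br.error)
			return -1;
		if (idc == 1) {
			tid++;
		} else if (idc == 2) {
			tid = 0;
			sid++;
		}
	} while (idc != 3);
	tmp.valid = 1;
	*st = tmp;
	return 0;
}

/* Temporal ID of the frame, or -1 if no usable structure is known yet. */
int dd_temporal_id(const dd_frame *f, const dd_structure *st) {
	if (!st->valid)
		return -1;
	int idx = (f->template_id + 64 - st->template_id_offset) % 64;
	return idx < st->template_cnt ? st->temporal_id[idx] : -1;
}
```

The stateful part is what makes this harder than reading the payload descriptor: a packet that only carries the 3 mandatory bytes can't be mapped to a temporal layer unless an earlier packet delivered the structure.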

As such, there are a few potential fixes:

  1. just munge the SDP and remove that extension before sending the offer to Janus: this is the easiest fix that you can deploy right away, no change to Janus needed;
  2. we change the code in Janus (or the plugins) so that we only negotiate the extension if AV1 is the codec we'll actually use;
  3. we change the code in Janus (or the plugins) so that we try the "usual" way of getting the temporal index first, and if we can't find it, we look in the DD.
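Fix 1 amounts to dropping a single `a=extmap` line. In a browser this would be a few lines of JavaScript applied to the offer before it is sent to Janus; the same string filtering is sketched here in C only to keep one language across examples. The function names are hypothetical; the URI is the one defined in the AV1 RTP specification and used by Chrome for the Dependency Descriptor.

```c
#include <stdlib.h>
#include <string.h>

/* Dependency Descriptor header extension URI (AV1 RTP specification). */
#define DD_EXTMAP_URI "https://aomediacodec.github.io/av1-rtp-spec/#dependency-descriptor-rtp-header-extension"

/* True if this line is "a=extmap:<id>[/<direction>] <DD URI>[ ...]". */
static int is_dd_extmap(const char *line, size_t line_len) {
	const size_t uri_len = strlen(DD_EXTMAP_URI);
	if (line_len < 9 || strncmp(line, "a=extmap:", 9) != 0)
		return 0;
	size_t i = 9;
	while (i < line_len && line[i] != ' ')   /* skip "<id>[/<direction>]" */
		i++;
	i++;                                     /* skip the separating space */
	if (i + uri_len > line_len || strncmp(line + i, DD_EXTMAP_URI, uri_len) != 0)
		return 0;
	char next = (i + uri_len < line_len) ? line[i + uri_len] : '\0';
	return next == '\r' || next == '\n' || next == ' ' || next == '\0';
}

/* Returns a newly allocated copy of the SDP without the DD extmap line(s),
 * or NULL on allocation failure. The caller frees the result. */
char *sdp_strip_dd_extmap(const char *sdp) {
	size_t total = strlen(sdp), o = 0;
	char *out = malloc(total + 1);
	if (out == NULL)
		return NULL;
	const char *line = sdp;
	while (*line != '\0') {
		const char *nl = strchr(line, '\n');
		size_t line_len = nl ? (size_t)(nl - line) + 1 : strlen(line);
		if (!is_dd_extmap(line, line_len)) {   /* keep every other line as-is */
			memcpy(out + o, line, line_len);
			o += line_len;
		}
		line += line_len;
	}
	out[o] = '\0';
	return out;
}
```

Since the answer can only accept extensions that were offered, removing the line from the offer is enough to keep the DD out of the session, which is why this works without touching Janus.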

My guess is that 3. will be the "cleanest" approach in the longer run, but it's also the one that will probably take me more time, as I'll have to check the implications of using code I've mostly hardcoded for AV1/SVC with other codecs as well.


I ended up implementing 2., which was much easier and quicker to do, and transparent to plugins (no change needed there): Don't accept Dependency Descriptor extension unless the negotiated co… · meetecho/janus-gateway@13e7260 · GitHub

In case we want to go for 3. in the future, that will need a dedicated refactoring, as the code makes many assumptions tied to SVC. It will probably require a new dedicated structure or set of utilities that can transparently use either payload descriptors or the DD, exposing a unified C API to plugins.
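A rough idea of what such a unified helper could look like, as a C sketch that assumes the DD temporal ID has already been resolved elsewhere (passed in as `dd_tid`, negative when unavailable): prefer the VP8 payload descriptor, fall back to the DD, and report which source was used. All names here (`video_temporal_layer`, `relay_packet`, `tl_source`) are hypothetical and do not exist in Janus, and the relay rule is a simplification of real simulcast switching.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: where the temporal layer index came from. */
typedef enum { TL_SOURCE_NONE, TL_SOURCE_PAYLOAD, TL_SOURCE_DD } tl_source;

/* Extracts TID from a VP8 payload descriptor (RFC 7741); -1 if absent. */
static int vp8_descriptor_tid(const uint8_t *p, size_t len) {
	size_t off = 1;
	if (len < 2 || !(p[0] & 0x80))      /* X=0: no extension byte */
		return -1;
	uint8_t ext = p[off++];              /* |I|L|T|K| RSV | */
	if (ext & 0x80)                      /* I: 7-bit or 15-bit PictureID */
		off += (off < len && (p[off] & 0x80)) ? 2 : 1;
	if (ext & 0x40)                      /* L: TL0PICIDX */
		off++;
	if (!(ext & 0x20) || off >= len)     /* T not set, or truncated */
		return -1;
	return p[off] >> 6;
}

/* Prefer the payload descriptor, fall back to a DD-derived temporal ID. */
int video_temporal_layer(const uint8_t *vp8, size_t len, int dd_tid, tl_source *src) {
	int tid = vp8_descriptor_tid(vp8, len);
	if (tid >= 0) {
		*src = TL_SOURCE_PAYLOAD;
		return tid;
	}
	if (dd_tid >= 0) {
		*src = TL_SOURCE_DD;
		return dd_tid;
	}
	*src = TL_SOURCE_NONE;
	return -1;
}

/* Simplified switching rule: relay packets at or below the target layer.
 * A missing index is treated as TL0, so nothing is ever dropped in that case. */
int relay_packet(int tid, int target_tl) {
	return tid < 0 || tid <= target_tl;
}
```

The last function shows why the fallback matters: if neither source yields an index, every packet looks like TL0 and gets relayed, which is consistent with the ~30 FPS reported at the start of the thread.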


PS: I created a PR for parsing the video-layers-allocation00 RTP extension: the patch makes partial use of it in the EchoTest demo; if it works and makes sense, we can extend that to the VideoRoom plugin.
