Huge thanks for making this project open source, and for all the time the team has put into it.
I’m having an issue with ICE negotiation on a particular client network. The use case is one-way video, where the client (using Chrome) subscribes to a video from the VideoRoom plugin.
Looking at the chrome://webrtc-internals/ dump from the session, a srflx/prflx candidate pair is chosen. The browser client receives the first 3 STUN responses and then stops receiving any responses. After about 7-15 seconds the ICE connection state changes from connected to disconnected.
I’ve tried doing ICE restarts after the state switches to disconnected, which will generate relay candidates, but the connection is still trying to prioritize the srflx/prflx candidate pair despite it being unstable.
If I change the client configuration so that iceTransportPolicy is set to relay everything works as expected, but I don’t want to force all clients to use TURN.
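For reference, the relay-only workaround described above can be sketched as follows. This is only a minimal illustration: the local types stand in for the browser's `RTCConfiguration`, `buildPeerConfig` is a hypothetical helper, and the TURN URL and credentials are placeholders.

```typescript
// Minimal local types so the sketch compiles without DOM typings.
// In a browser these correspond to RTCIceServer / RTCConfiguration.
type IceTransportPolicy = "all" | "relay";

interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
  iceTransportPolicy: IceTransportPolicy;
}

// Hypothetical helper: build the configuration, optionally forcing TURN.
function buildPeerConfig(iceServers: IceServer[], forceRelay: boolean): PeerConfig {
  return {
    iceServers: iceServers,
    // "relay" makes the browser gather and use relay candidates only.
    iceTransportPolicy: forceRelay ? "relay" : "all",
  };
}

// Placeholder TURN server, not a real deployment.
const exampleServers: IceServer[] = [
  { urls: "turn:turn.example.org:3478", username: "user", credential: "secret" },
];

const relayOnlyConfig = buildPeerConfig(exampleServers, true);
const defaultConfig = buildPeerConfig(exampleServers, false);
// In a browser: const pc = new RTCPeerConnection(relayOnlyConfig);
```

With `forceRelay` set to true the browser never offers host or srflx candidates, which is why the unstable pair is not selected in that mode.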
Looking at the Janus logs I see the following messages repeated over and over again. I also see the component state changing continuously between ready and connected.
[ERR] [ice.c:janus_ice_outgoing_traffic_handle:4892] [3241648296573675] ... only sent -1 bytes? (was 284)
[ERR] [ice.c:janus_ice_outgoing_traffic_handle:4892] [3241648296573675] ... only sent -1 bytes? (was 284)
[ERR] [ice.c:janus_ice_outgoing_traffic_handle:4892] [3241648296573675] ... only sent -1 bytes? (was 284)
[ERR] [ice.c:janus_ice_outgoing_traffic_handle:4892] [3241648296573675] ... only sent -1 bytes? (was 284)
[ERR] [ice.c:janus_ice_outgoing_traffic_handle:4892] [3241648296573675] ... only sent -1 bytes? (was 284)
[ERR] [ice.c:janus_ice_outgoing_traffic_handle:4892] [3241648296573675] ... only sent -1 bytes? (was 284)
[ERR] [ice.c:janus_ice_outgoing_traffic_handle:4892] [3241648296573675] ... only sent -1 bytes? (was 284)
[ERR] [ice.c:janus_ice_outgoing_traffic_handle:4892] [3241648296573675] ... only sent -1 bytes? (was 284)
[ERR] [ice.c:janus_ice_outgoing_traffic_handle:4892] [3241648296573675] ... only sent -1 bytes? (was 284)
[ERR] [ice.c:janus_ice_outgoing_traffic_handle:4892] [3241648296573675] ... only sent -1 bytes? (was 284)
[ERR] [ice.c:janus_ice_outgoing_traffic_handle:4892] [3241648296573675] ... only sent -1 bytes? (was 284)
[ERR] [ice.c:janus_ice_outgoing_traffic_handle:4892] [3241648296573675] ... only sent -1 bytes? (was 284)
[ERR] [ice.c:janus_ice_outgoing_traffic_handle:4892] [3241648296573675] ... only sent -1 bytes? (was 1344)
[ERR] [ice.c:janus_ice_outgoing_traffic_handle:4892] [3241648296573675] ... only sent -1 bytes? (was 716)
Janus was compiled with libnice 0.1.22 and libsrtp 2.2.0. The dtls-mtu is set to 1200.
Libnice in general does not support switching to a selected pair with a lower priority.
In the scenario you described Janus is the ICE controlling entity, whereas Chrome is the controlled one. That means that, according to the standard, Janus is the entity that must choose the pair in use and notify the other party. However, browsers (and Chrome in particular) tend not to honor the standard and decide to change the selected pair autonomously even when they are the controlled entity. In cases like yours the browser detects network issues and switches pair (without notifying Janus in any way). Janus/libnice, on the other hand, will detect a temporary failure, but a STUN check coming from the working candidate (relay) on the other side will restore the ICE liveness. Even in that case, libnice will not switch pair and will keep trying to send to an invalid port, hence the “only sent -1 bytes” errors.
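To observe from the client which pair Chrome is actually using at a given moment, the same information shown in chrome://webrtc-internals/ can be read from `getStats()`. The following is a rough sketch, assuming the standard WebRTC stats fields `selectedCandidatePairId`, `localCandidateId`, `remoteCandidateId` and `candidateType` are populated (they are in Chrome; other browsers may differ). `StatsEntry` and `selectedPairTypes` are names made up for this example.

```typescript
// Loose shape of one entry in an RTCStatsReport.
interface StatsEntry {
  id: string;
  type: string;
  [key: string]: unknown;
}

interface PairTypes {
  local: string;  // e.g. "host", "srflx", "prflx", "relay"
  remote: string;
}

// Follow transport -> candidate-pair -> local/remote candidate
// and return the candidate types of the currently selected pair.
function selectedPairTypes(entries: StatsEntry[]): PairTypes | null {
  const byId: Record<string, StatsEntry> = {};
  for (let i = 0; i < entries.length; i++) {
    byId[entries[i].id] = entries[i];
  }
  for (let i = 0; i < entries.length; i++) {
    const t = entries[i];
    if (t.type !== "transport") continue;
    const pair = byId[String(t.selectedCandidatePairId)];
    if (!pair) continue;
    const local = byId[String(pair.localCandidateId)];
    const remote = byId[String(pair.remoteCandidateId)];
    if (local && remote) {
      return {
        local: String(local.candidateType),
        remote: String(remote.candidateType),
      };
    }
  }
  return null; // no selected pair reported yet
}

// In a browser (not run here):
//   const report = await pc.getStats();
//   const entries: StatsEntry[] = [];
//   report.forEach((s) => entries.push(s));
//   console.log(selectedPairTypes(entries));
```

Polling this periodically would show whether the browser has moved to a relay pair while Janus keeps sending to the old one.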
We have tried to find workarounds through the config options ice_consent_freshness and hangup_on_failed. Setting both to true makes it easier in these cases to have the PC fail, avoiding a loop of “only sent -1 bytes”.
Your case seems even trickier: the prflx candidate works for some seconds, then a network policy kicks in (firewall, shaping or whatever) and you lose that path. So an ICE restart or a PC setup from scratch will basically bring you back to the same state.
You can try experimenting with the config options I mentioned, but I’m not sure those would help. In that case I guess your only option is falling back to relay-only candidates. That is logically correct, since it’s the only working path.
This was my assumption. I figured since Janus makes the offer it’s the controlling agent. It’s also in an environment where the clients are likely to have enterprise-y firewalls. I’ve already tried different combinations of ice_consent_freshness and ice_keepalive_conncheck to no avail. Unfortunately hangup_on_failed isn’t an option for us, because we need to support WHIP clients with ICE restarts, not that restarting the PC would be much better.
I appreciate the detailed response. I’m considering starting two PCs simultaneously, one with iceTransportPolicy set to relay and one set to all, so that we don’t have to wait for negotiation on the direct path to fail before falling back. If the relay PC succeeds and the all PC does not, we go with the relay connection. If the all PC succeeds, we use that one and close the relay one.
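The selection logic for that two-PC idea could look roughly like this. It is a sketch under assumptions, not a tested recipe: `Attempt`, `pickConnection` and the `stable` promise are hypothetical names, and how `stable` is computed is left open. Given that the direct path here drops after 7-15 seconds, it would presumably need to resolve to true only after the connection has stayed connected for longer than that.

```typescript
interface Closable {
  close(): void;
}

// One connection attempt that is already in progress.
// `stable` resolves to true once the attempt is considered usable,
// false (or a rejection) if it failed.
interface Attempt<T extends Closable> {
  conn: T;
  stable: Promise<boolean>;
}

// Treat a rejected promise the same as a failed attempt.
function settled(p: Promise<boolean>): Promise<boolean> {
  return p.catch(() => false);
}

// Prefer the direct ("all") attempt; fall back to relay only if it fails.
// Both attempts are expected to have been started before this is called,
// so the relay path is already negotiated by the time it is needed.
async function pickConnection<T extends Closable>(
  direct: Attempt<T>,
  relay: Attempt<T>,
): Promise<T | null> {
  if (await settled(direct.stable)) {
    relay.conn.close();
    return direct.conn;
  }
  direct.conn.close();
  if (await settled(relay.stable)) {
    return relay.conn;
  }
  relay.conn.close();
  return null; // neither path worked
}
```

One trade-off of this approach is that both PCs are live until the direct attempt has been judged, so the subscription is duplicated for that period.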