Hi, we are seeing issues with one of our customers. At peak they have 400 rooms, with 700 participants joined to VideoRoom and 300 audio publishers in AudioBridge (not all in the same room, but split across those 400 rooms). Their Janus servers are bare-metal machines with 112 logical CPU cores and 1 TB of memory.
During the problem we see messages like these in the Janus logs:
2025-12-03T11:03:07.935383323Z Destroying session 5296646086800046; 0x744d8770b9f0
2025-12-03T11:03:07.935391095Z Detaching handle from JANUS VideoRoom plugin; 0x744d7ff814a0 0x744d51f4f7c0 0x744d7ff814a0 0x744d870f69d0
2025-12-03T11:03:07.935556587Z [1862024904566746] Handle and related resources freed; 0x744d7ff814a0 0x744d8770b9f0
2025-12-03T11:03:08.047161660Z [WARN] [4362754542292474] Didn't receive audio for more than 2 second(s)...
2025-12-03T11:03:08.047194814Z [WARN] [8561031195446425] Didn't receive audio for more than 2 second(s)...
2025-12-03T11:03:08.049458629Z [WARN] [2907460136359324] Didn't receive video for more than 2 second(s)...
2025-12-03T11:03:08.084176494Z [ERR] [plugins/janus_videoroom.c:janus_videoroom_remote_publisher_thread:14471] remote publisher 81334128 audio receiving: false
2025-12-03T11:03:08.133518727Z [WARN] Participant queue-in contains too many packets, clearing now (count=5)
2025-12-03T11:03:08.133767501Z [WARN] Participant queue-in contains too many packets, clearing now (count=5)
2025-12-03T11:03:08.133822400Z [WARN] Participant queue-in contains too many packets, clearing now (count=5)
2025-12-03T11:03:08.138042398Z [WARN] Participant queue-in contains too many packets, clearing now (count=5)
2025-12-03T11:03:08.138293580Z [WARN] Participant queue-in contains too many packets, clearing now (count=5)
2025-12-03T11:03:08.141789492Z [WARN] Participant queue-in contains too many packets, clearing now (count=5)
and these:
2025-12-03T11:03:29.420822528Z [WARN] [8600477068923980] Discarding too old outgoing packet (age=1055487us)
2025-12-03T11:03:29.420825426Z [WARN] [8600477068923980] Discarding too old outgoing packet (age=1053981us)
2025-12-03T11:03:29.420830704Z [WARN] [8600477068923980] Discarding too old outgoing packet (age=1047758us)
2025-12-03T11:03:29.420833441Z [WARN] [8600477068923980] Discarding too old outgoing packet (age=1020543us)
2025-12-03T11:03:29.420836508Z [WARN] [8600477068923980] Discarding too old outgoing packet (age=1006888us)
At the same time CPU usage rises to 100% and clients experience mass hangups.
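To line these warnings up against the CPU graph, they could be counted per second. Below is a minimal sketch in Python, assuming the log format shown above (a timestamp at the start of each line, followed by the message). The function name and the three labels are made up for illustration and are not part of Janus; the matched substrings are copied from the excerpts in this post.

```python
import re
from collections import Counter

# Substrings copied from the log excerpts in this post; the labels are
# hypothetical names chosen for this sketch, not Janus identifiers.
PATTERNS = {
    "no_media": "Didn't receive",
    "queue_in_cleared": "queue-in contains too many packets",
    "old_outgoing": "Discarding too old outgoing packet",
}

# Leading timestamp, truncated to whole seconds (assumed format).
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def count_warnings_per_second(lines):
    """Return a Counter keyed by (second, label) for the known warnings."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a leading timestamp
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[(match.group(1), label)] += 1
                break  # one label per line is enough here
    return counts
```

Passing an open log file to `count_warnings_per_second` and printing the sorted items gives one row per second and warning type, which can then be compared with the timestamps of the CPU spikes.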
Do the Janus encoders run on separate cores? What logic does AudioBridge use to select cores for its encoders/decoders? Is there anything we can tune?
How can we figure out what is happening so that we can solve this? The same issue occurs in different DC locations that are unrelated to each other.
We could split the rooms across more physical servers, but the customer says the hardware is already powerful enough and wants to find out what is wrong with these servers.