spscream
(Alexander Malaev)
April 18, 2025, 6:53pm
1
Hi, I’m investigating crash on our fork then helper threads are enabled and found some missing reference increment to room here: janus-gateway/src/plugins/janus_videoroom.c at master · meetecho/janus-gateway · GitHub
In other places publisher->room reference is incremented(in dummy and remote publishers). Is it correct or bug maybe?
I’m not sure crash is occuring on master version so I don’t make issue on github, since I can’t provide any details now, continue to digging into it. In our case looks like room or streams references cleared ahead of time and janus crashes in helper thread on room destroy.
spscream
(Alexander Malaev)
April 21, 2025, 3:05pm
2
I also found what ref counter is incremented on helper threads creation, before it was initialized:
for(i=0; i<helper_threads; i++) {
janus_videoroom_helper *helper = g_malloc0(sizeof(janus_videoroom_helper));
helper->id = i+1;
helper->room = videoroom;
helper->subscribers = g_hash_table_new(NULL, NULL);
helper->queued_packets = g_async_queue_new_full((GDestroyNotify)janus_videoroom_rtp_relay_packet_free);
janus_mutex_init(&helper->mutex);
janus_refcount_init(&helper->ref, janus_videoroom_helper_free);
/* Spawn a thread and add references */
g_snprintf(tname, sizeof(tname), "vhelp %u-%s", helper->id, videoroom->room_id_str);
janus_refcount_increase(&videoroom->ref);
janus_refcount_increase(&helper->ref);
helper->thread = g_thread_try_new(tname, &janus_videoroom_helper_thread, helper, &error);
if(error != NULL) {
/* TODO Should this be a hard failure? */
JANUS_LOG(LOG_ERR, "Got error %d (%s) trying to launch the helper thread...\n",
error->code, error->message ? error->message : "??");
} else {
janus_refcount_increase(&helper->ref);
videoroom->threads = g_list_append(videoroom->threads, helper);
}
videoroom->record = json_is_true(record);
}
if(rec_dir) {
videoroom->rec_dir = g_strdup(json_string_value(rec_dir));
}
if(lock_record) {
videoroom->lock_record = json_is_true(lock_record);
}
g_atomic_int_set(&videoroom->destroyed, 0);
janus_mutex_init(&videoroom->mutex);
janus_refcount_init(&videoroom->ref, janus_videoroom_room_free);
videoroom->participants = g_hash_table_new_full(string_ids ? g_str_hash : g_int64_hash, string_ids ? g_str_equal : g_int64_equal,
(GDestroyNotify)g_free, (GDestroyNotify)janus_videoroom_publisher_dereference);
videoroom->private_ids = g_hash_table_new(NULL, NULL);
videoroom->allowed = g_hash_table_new_full(g_str_hash, g_str_equal, (GDestroyNotify)g_free, NULL);
if(allowed != NULL) {
/* Populate the "allowed" list as an ACL for people trying to join */
if(json_array_size(allowed) > 0) {
size_t i = 0;
for(i=0; i<json_array_size(allowed); i++) {
const char *token = json_string_value(json_array_get(allowed, i));
spscream
(Alexander Malaev)
April 23, 2025, 9:09am
3
spscream
(Alexander Malaev)
April 23, 2025, 9:09am
4
but I found a problem - if helper threads are enabled and if room is destroyed via ‘destroy’ call, refs to room from helper threads aren’t cleared
lorenzo
(Lorenzo Miniero)
April 23, 2025, 10:12am
5
Dummy publishers were using references incorrectly: see here for a fix we’ve prepared that should be merged soon. As such, that’s what should be used as an example as to how references are handled.
I haven’t investigated how helper threads use references, as I can’t remember on the top of my head. Any additional reference that may be needed will need to be dealt with accordingly when cleaning up, though: this was the issue with dummy publishers, for instance, which don’t go through the same cleanup procedures regular and remote publishers go through. Besides, there are differences between destroying a room using “destroy” and having it destroyed by just shutting down Janus: the former may have more things happening, while the latter relies entirely on the callback that’s invoked when the room is removed from its hashtable.
spscream
(Alexander Malaev)
April 23, 2025, 10:49am
6
I got the following on room destroy(don’t look on line numbers, this is our fork, so it’s not related to master of janus):
2025-04-23T09:24:05.167031831Z [plugins/janus_videoroom.c:janus_videoroom_room_destroy:2891:decrease] 0x61200061a3b8 (38)
2025-04-23T09:24:05.167074441Z [plugins/janus_videoroom.c:janus_videoroom_room_dereference:2886:decrease] 0x61200061a3b8 (37)
2025-04-23T09:24:05.167262265Z [plugins/janus_videoroom.c:janus_videoroom_room_dereference:2886:decrease] 0x61200061a3b8 (36)
2025-04-23T09:24:05.167401455Z [plugins/janus_videoroom.c:janus_videoroom_room_dereference:2886:decrease] 0x61200061a3b8 (35)
2025-04-23T09:24:05.167581355Z [plugins/janus_videoroom.c:janus_videoroom_room_dereference:2886:decrease] 0x61200061a3b8 (34)
2025-04-23T09:24:05.167683167Z [plugins/janus_videoroom.c:janus_videoroom_room_dereference:2886:decrease] 0x61200061a3b8 (33)
2025-04-23T09:24:05.167688216Z [plugins/janus_videoroom.c:janus_videoroom_room_dereference:2886:decrease] 0x61200061a3b8 (32)
2025-04-23T09:24:05.167839583Z [plugins/janus_videoroom.c:janus_videoroom_room_dereference:2886:decrease] 0x61200061a3b8 (31)
2025-04-23T09:24:05.167850145Z [plugins/janus_videoroom.c:janus_videoroom_room_dereference:2886:decrease] 0x61200061a3b8 (30)
2025-04-23T09:24:05.167853535Z [plugins/janus_videoroom.c:janus_videoroom_room_dereference:2886:decrease] 0x61200061a3b8 (29)
2025-04-23T09:24:05.167905480Z [plugins/janus_videoroom.c:janus_videoroom_room_dereference:2886:decrease] 0x61200061a3b8 (28)
2025-04-23T09:24:05.168003048Z [plugins/janus_videoroom.c:janus_videoroom_room_dereference:2886:decrease] 0x61200061a3b8 (27)
2025-04-23T09:24:05.168009406Z [plugins/janus_videoroom.c:janus_videoroom_process_synchronous_request:5661:decrease] 0x61200061a3b8 (26)
2025-04-23T09:24:05.199128516Z [plugins/janus_videoroom.c:janus_videoroom_destroy_session:4628:decrease] 0x61200061a3b8 (25)
2025-04-23T09:24:05.221558701Z [plugins/janus_videoroom.c:janus_videoroom_destroy_session:4628:decrease] 0x61200061a3b8 (24)
2025-04-23T09:24:05.227838943Z [plugins/janus_videoroom.c:janus_videoroom_destroy_session:4628:decrease] 0x61200061a3b8 (23)
2025-04-23T09:24:05.231123795Z [plugins/janus_videoroom.c:janus_videoroom_destroy_session:4628:decrease] 0x61200061a3b8 (22)
2025-04-23T09:24:05.234499909Z [plugins/janus_videoroom.c:janus_videoroom_destroy_session:4628:decrease] 0x61200061a3b8 (21)
2025-04-23T09:24:05.237615655Z [plugins/janus_videoroom.c:janus_videoroom_destroy_session:4628:decrease] 0x61200061a3b8 (20)
2025-04-23T09:24:05.916402152Z [plugins/janus_videoroom.c:janus_videoroom_remote_publisher_thread:14615:decrease] 0x61200061a3b8 (19)
2025-04-23T09:24:05.969971884Z [plugins/janus_videoroom.c:janus_videoroom_remote_publisher_thread:14615:decrease] 0x61200061a3b8 (18)
2025-04-23T09:24:06.124293857Z [plugins/janus_videoroom.c:janus_videoroom_remote_publisher_thread:14615:decrease] 0x61200061a3b8 (17)
2025-04-23T09:24:06.162886473Z [plugins/janus_videoroom.c:janus_videoroom_remote_publisher_thread:14615:decrease] 0x61200061a3b8 (16)
16 - is the number of our helper threads
I think since helper threads use g_async_queue_pop, it blocks forever if no data available and thread will run forever even if room already detroyed (janus-gateway/src/plugins/janus_videoroom.c at master · meetecho/janus-gateway · GitHub )
maybe we should change it to g_async_queue_timeout_pop? to force while cycle check for room->destroyed periodically?
lorenzo
(Lorenzo Miniero)
April 23, 2025, 10:52am
7
You can probably check with gdb if that’s indeed where those threads are waiting. In the core we artificially inject a fake “exit” object whose only purpose is to wake those pop and break the loop (e.g., janus_ice_detach_handle
in ice.c
), so I guess we could do the same here.
spscream
(Alexander Malaev)
April 23, 2025, 11:17am
8
yeah, in this case helper thread awaits for exit_packet, but exit_packet queued on janus_videoroom_room_free, which called if room ref counter goes to 0, looks like deadlock for me
lorenzo
(Lorenzo Miniero)
April 23, 2025, 2:44pm
9
I think the problem may be that we only add exit_packet
in janus_videoroom_room_free
, but that’s the method that’s invoked when the room’s references get to 0: since helper threads hold a ref too, that never happens, so it’s indeed a catch-22 kind of situation. Maybe that block of code should be moved to janus_videoroom_room_destroy
, even though we should then ensure no race condition occurs for publishers still using them.
spscream
(Alexander Malaev)
April 24, 2025, 7:41am
10
moving to janus_videoroom_room_destroy fixes it:
2025-04-24T07:34:08.113640416Z [plugins/janus_videoroom.c:janus_videoroom_leave_or_unpublish:4530:increase] 0x612000023b38 (21)
2025-04-24T07:34:08.113859596Z [plugins/janus_videoroom.c:janus_videoroom_leave_or_unpublish:4569:decrease] 0x612000023b38 (20)
2025-04-24T07:34:08.126752014Z [plugins/janus_videoroom.c:janus_videoroom_process_synchronous_request:5615:increase] 0x612000023b38 (20)
2025-04-24T07:34:08.126816872Z [plugins/janus_videoroom.c:janus_videoroom_helper_thread:14678:decrease] 0x612000023b38 (19)
2025-04-24T07:34:08.126941673Z [plugins/janus_videoroom.c:janus_videoroom_helper_thread:14678:decrease] 0x612000023b38 (18)
2025-04-24T07:34:08.126955350Z [plugins/janus_videoroom.c:janus_videoroom_room_destroy:2899:decrease] 0x612000023b38 (17)
2025-04-24T07:34:08.127135918Z [plugins/janus_videoroom.c:janus_videoroom_room_dereference:2886:decrease] 0x612000023b38 (16)
2025-04-24T07:34:08.127252828Z [plugins/janus_videoroom.c:janus_videoroom_room_dereference:2886:decrease] 0x612000023b38 (15)
2025-04-24T07:34:08.127551317Z [plugins/janus_videoroom.c:janus_videoroom_helper_thread:14678:decrease] 0x612000023b38 (14)
2025-04-24T07:34:08.127561325Z [plugins/janus_videoroom.c:janus_videoroom_helper_thread:14678:decrease] 0x612000023b38 (13)
2025-04-24T07:34:08.127569269Z [plugins/janus_videoroom.c:janus_videoroom_helper_thread:14678:decrease] 0x612000023b38 (12)
2025-04-24T07:34:08.127582185Z [plugins/janus_videoroom.c:janus_videoroom_helper_thread:14678:decrease] 0x612000023b38 (11)
2025-04-24T07:34:08.127660782Z [plugins/janus_videoroom.c:janus_videoroom_helper_thread:14678:decrease] 0x612000023b38 (11)
2025-04-24T07:34:08.127687402Z [plugins/janus_videoroom.c:janus_videoroom_helper_thread:14678:decrease] 0x612000023b38 (9)
2025-04-24T07:34:08.127707631Z [plugins/janus_videoroom.c:janus_videoroom_helper_thread:14678:decrease] 0x612000023b38 (8)
2025-04-24T07:34:08.127719653Z [plugins/janus_videoroom.c:janus_videoroom_helper_thread:14678:decrease] 0x612000023b38 (7)
2025-04-24T07:34:08.127746333Z [plugins/janus_videoroom.c:janus_videoroom_helper_thread:14678:decrease] 0x612000023b38 (6)
2025-04-24T07:34:08.127753205Z [plugins/janus_videoroom.c:janus_videoroom_process_synchronous_request:5663:decrease] 0x612000023b38 (5)
2025-04-24T07:34:08.127761791Z [plugins/janus_videoroom.c:janus_videoroom_helper_thread:14678:decrease] 0x612000023b38 (4)
2025-04-24T07:34:08.127790850Z [plugins/janus_videoroom.c:janus_videoroom_helper_thread:14678:decrease] 0x612000023b38 (3)
2025-04-24T07:34:08.127799871Z [plugins/janus_videoroom.c:janus_videoroom_helper_thread:14678:decrease] 0x612000023b38 (2)
2025-04-24T07:34:08.127962664Z [plugins/janus_videoroom.c:janus_videoroom_helper_thread:14678:decrease] 0x612000023b38 (1)
2025-04-24T07:34:08.127976467Z [plugins/janus_videoroom.c:janus_videoroom_helper_thread:14678:decrease] 0x612000023b38 (0)
lorenzo
(Lorenzo Miniero)
April 24, 2025, 8:06am
11
Cool! Before submitting a patch to do that, though, you may want to make sure that messing with room->threads
there doesn’t cause any issue in other places, e.g., publishers still pushing traffic to the room and trying to use helpers for that before they’re removed.
spscream
(Alexander Malaev)
April 25, 2025, 12:48pm
12
How can be be sure, can you suggest?
We run janus under load testing and I see no problems now(janus doesn’t crash) all streams are cleared before room is destroyed.
Maybe check for room->destroyed before trying to iterate on room->threads?
spscream
(Alexander Malaev)
April 25, 2025, 1:48pm
13
I also found what helper->ref != 0 on room destroy:
2025-04-25T13:06:17.760739783Z [plugins/janus_videoroom.c:janus_videoroom_process_synchronous_request:5071:init] 0x60700001d950 (1)
2025-04-25T13:06:17.760857039Z [plugins/janus_videoroom.c:janus_videoroom_process_synchronous_request:5075:increase] 0x60700001d950 (2)
2025-04-25T13:06:17.760916837Z [plugins/janus_videoroom.c:janus_videoroom_process_synchronous_request:5082:increase] 0x60700001d950 (3)
2025-04-25T13:15:46.529385498Z [plugins/janus_videoroom.c:janus_videoroom_helper_destroy:2414:decrease] 0x60700001d950 (2)
2025-04-25T13:15:46.529414289Z [plugins/janus_videoroom.c:janus_videoroom_helper_thread:14651:decrease] 0x60700001d950 (1)