Hi all,
We run several production systems with Janus 0.15.2 on CentOS 7.
Periodically we hit a deadlock in the SIP plugin.
It can occur anywhere from a few hours to a few weeks after startup.
When it happens, all existing janus_sip sessions expire and close, but new ones cannot be created.
According to the core dump, one thread is blocked on janus_mutex_lock(&master->mutex) in janus_sip_destroy_session():
Thread 24 (Thread 0x7f4566ff5700 (LWP 18689)):
#0 0x00007f45b5613e29 in syscall () from /lib64/libc.so.6
#1 0x00007f45b6fabf42 in g_mutex_lock_slowpath () from /lib64/libglib-2.0.so.0
#2 0x00007f45b00f520c in janus_sip_destroy_session (handle=0x7f45a4005730, error=) at plugins/janus_sip.c:2460
#3 0x0000000000448d4b in janus_ice_outgoing_traffic_handle (handle=0x7f45a40079a0, pkt=) at ice.c:4896
#4 0x000000000044be84 in janus_ice_outgoing_traffic_dispatch (source=0x7f45a40080b0, callback=, user_data=) at ice.c:492
#5 0x00007f45b6fdc119 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#6 0x00007f45b6fdc478 in g_main_context_iterate.isra.19 () from /lib64/libglib-2.0.so.0
#7 0x00007f45b6fdc74a in g_main_loop_run () from /lib64/libglib-2.0.so.0
#8 0x000000000043d370 in janus_ice_handle_thread (data=0x7f45a40079a0) at ice.c:1316
#9 0x00007f45b70035b0 in g_thread_proxy () from /lib64/libglib-2.0.so.0
#10 0x00007f45b58f0ea5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f45b5619b0d in clone () from /lib64/libc.so.6
The other threads are waiting on janus_mutex_lock(&sessions_mutex) in janus_sip_destroy_session(), presumably because thread 24 still holds that mutex. For example:
Thread 23 (Thread 0x7f4580ff9700 (LWP 18685)):
#0 0x00007f45b5613e29 in syscall () from /lib64/libc.so.6
#1 0x00007f45b6fabf42 in g_mutex_lock_slowpath () from /lib64/libglib-2.0.so.0
#2 0x00007f45b00f4eec in janus_sip_destroy_session (handle=0x7f45a4005790, error=0x7f4580ff88a0) at plugins/janus_sip.c:2425
#3 0x0000000000448d4b in janus_ice_outgoing_traffic_handle (handle=0x7f45a4003e50, pkt=) at ice.c:4896
#4 0x000000000044be84 in janus_ice_outgoing_traffic_dispatch (source=0x7f45a4005860, callback=, user_data=) at ice.c:492
#5 0x00007f45b6fdc119 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#6 0x00007f45b6fdc478 in g_main_context_iterate.isra.19 () from /lib64/libglib-2.0.so.0
#7 0x00007f45b6fdc74a in g_main_loop_run () from /lib64/libglib-2.0.so.0
#8 0x000000000043d370 in janus_ice_handle_thread (data=0x7f45a4003e50) at ice.c:1316
#9 0x00007f45b70035b0 in g_thread_proxy () from /lib64/libglib-2.0.so.0
#10 0x00007f45b58f0ea5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f45b5619b0d in clone () from /lib64/libc.so.6
The service does not crash, but only a restart restores it.
Janus was compiled with GMutex (USE_PTHREAD_MUTEX is not defined).
According to GDB, in the blocked thread 24 the helper session has an invalid session->master pointer (possibly a use-after-free).
Q: Can the session->master pointer in a helper session be left uncleared after the master session is destroyed?
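
To make the question concrete, here is a minimal sketch of the invariant I would expect between a master and its helpers. It is not the actual Janus code. All names (sketch_session, registry_mutex, sketch_session_new, sketch_session_destroy) are hypothetical, it uses plain pthread mutexes instead of janus_mutex, and the fixed-size helper array is only there to keep the example short. The idea is that each helper holds a reference on its master, and that destroying the master clears every helper's master pointer under the same global lock, so a helper can never lock a mutex that lives in freed memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define SKETCH_MAX_HELPERS 8

/* Hypothetical stand-in for a SIP session; NOT the real janus_sip_session. */
typedef struct sketch_session {
    pthread_mutex_t mutex;             /* protects helpers[] and helper_count */
    int refcount;                      /* protected by registry_mutex */
    struct sketch_session *master;     /* non-NULL only while attached to a live master */
    struct sketch_session *helpers[SKETCH_MAX_HELPERS];
    int helper_count;
} sketch_session;

/* Plays the role of a global sessions lock (assumption of this sketch). */
static pthread_mutex_t registry_mutex = PTHREAD_MUTEX_INITIALIZER;
static int live_sessions = 0;          /* allocated sessions, for checking only */

/* Drop one reference; the caller must hold registry_mutex. */
static void sketch_unref_locked(sketch_session *s) {
    if (--s->refcount == 0) {
        pthread_mutex_destroy(&s->mutex);
        free(s);
        live_sessions--;
    }
}

/* Create a session; pass a master to create a helper attached to it. */
sketch_session *sketch_session_new(sketch_session *master) {
    sketch_session *s = calloc(1, sizeof(*s));
    if (s == NULL)
        return NULL;
    pthread_mutex_init(&s->mutex, NULL);
    s->refcount = 1;
    pthread_mutex_lock(&registry_mutex);
    live_sessions++;
    if (master != NULL) {
        pthread_mutex_lock(&master->mutex);
        assert(master->helper_count < SKETCH_MAX_HELPERS);
        master->helpers[master->helper_count++] = s;
        pthread_mutex_unlock(&master->mutex);
        master->refcount++;            /* the helper keeps its master alive */
        s->master = master;
    }
    pthread_mutex_unlock(&registry_mutex);
    return s;
}

/* Destroy a session. Lock order is always registry_mutex, then a session mutex. */
void sketch_session_destroy(sketch_session *s) {
    pthread_mutex_lock(&registry_mutex);
    if (s->master != NULL) {
        /* Helper: the master is still valid because we hold a reference on it. */
        sketch_session *master = s->master;
        pthread_mutex_lock(&master->mutex);
        for (int i = 0; i < master->helper_count; i++) {
            if (master->helpers[i] == s) {
                master->helpers[i] = master->helpers[--master->helper_count];
                break;
            }
        }
        pthread_mutex_unlock(&master->mutex);
        s->master = NULL;
        sketch_unref_locked(master);
    } else {
        /* Master (or already detached helper): clear every back-pointer. */
        pthread_mutex_lock(&s->mutex);
        for (int i = 0; i < s->helper_count; i++) {
            s->helpers[i]->master = NULL;
            s->refcount--;             /* release the reference that helper held */
        }
        s->helper_count = 0;
        pthread_mutex_unlock(&s->mutex);
    }
    sketch_unref_locked(s);            /* drop the session's own reference */
    pthread_mutex_unlock(&registry_mutex);
}
```

In this sketch, destroying the master first leaves each helper with master == NULL, so a later helper destroy skips the master lock entirely. What I see in the core dump looks like the opposite case: the helper still has a non-NULL master pointer, but the memory behind it is no longer a valid session.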