[BUG] Possible dialog memory leak on combination $DLG_timeout $DLG_delay_delete #3370

Open
volga629-1 opened this issue Apr 22, 2024 · 16 comments
@volga629-1

volga629-1 commented Apr 22, 2024

Version

 opensips -V
version: opensips 3.4.0 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: f3e0d5333
main.c compiled on 06:17:39 Aug  9 2023 with gcc 12

Issue

Combining the variables $DLG_del_delay and $DLG_timeout leaves dialogs in state 5 that are never removed, which eventually causes out-of-memory errors.

This VM has 8 GB of shared memory and 12 GB of physical memory (not overprovisioned).

Apr 21 14:09:02 sbc5 /usr/sbin/opensips[7126]: ERROR:core:hp_shm_malloc_dbg: not enough free shm memory (3406256 bytes left, need 6568), please increase the "-m" command line parameter!
Apr 21 14:09:02 sbc5 /usr/sbin/opensips[7126]: ERROR:tm:sip_msg_cloner: no more share memory
Apr 21 14:09:02 sbc5 /usr/sbin/opensips[7126]: ERROR:tm:new_t: out of mem

Shared memory stats over 24 hours

# sbc 5 21 Apr 2024
(opensips-cli): mi get_statistics all
{
    "shmem:total_size": 8589934592,
    "shmem:max_used_size": 278452104,
    "shmem:free_size": 8330746920,
    "shmem:used_size": 212396184,
    "shmem:real_used_size": 259187672,
    "shmem:fragments": 956846,

# sbc 5 22 Apr 2024
(opensips-cli): mi get_statistics all
{
    "shmem:total_size": 8589934592,
    "shmem:max_used_size": 783103000,
    "shmem:free_size": 7813004096,
    "shmem:used_size": 642682864,
    "shmem:real_used_size": 776930496,
    "shmem:fragments": 2774774,

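As a rough cross-check of the two snapshots above, the growth can be worked out directly. The sketch below is only an illustration: it assumes the snapshots were taken about 24 hours apart and that usage keeps growing linearly, neither of which is confirmed beyond the "24 hours" note. The helper names are hypothetical.

```python
# Values copied from the two "mi get_statistics all" snapshots in this report.
SNAPSHOT_APR21 = {
    "shmem:total_size": 8589934592,
    "shmem:free_size": 8330746920,
    "shmem:used_size": 212396184,
    "shmem:real_used_size": 259187672,
    "shmem:fragments": 956846,
}
SNAPSHOT_APR22 = {
    "shmem:total_size": 8589934592,
    "shmem:free_size": 7813004096,
    "shmem:used_size": 642682864,
    "shmem:real_used_size": 776930496,
    "shmem:fragments": 2774774,
}


def shm_growth(before, after):
    """Return the per-statistic difference between two snapshots."""
    return {key: after[key] - before[key] for key in before}


def days_until_exhaustion(before, after, interval_days=1.0):
    """Estimate days until free shm reaches zero, assuming linear growth.

    interval_days is the assumed time between the two snapshots.
    Returns None if real used memory did not grow.
    """
    growth = after["shmem:real_used_size"] - before["shmem:real_used_size"]
    if growth <= 0:
        return None
    return after["shmem:free_size"] / (growth / interval_days)


if __name__ == "__main__":
    for name, delta in shm_growth(SNAPSHOT_APR21, SNAPSHOT_APR22).items():
        print(f"{name}: {delta:+d}")
    estimate = days_until_exhaustion(SNAPSHOT_APR21, SNAPSHOT_APR22)
    print(f"estimated days until exhaustion: {estimate:.1f}")
```

With these figures, real used memory grows by about 518 million bytes per day, and the fragment count nearly triples over the same period.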
Code


        # CSTA INVITE
        if (!has_totag() && is_method("INVITE") && has_body("application/csta+xml")) {

                ##xlog("[REQ_ROUTE] [$rm] [$cfg_line] CSTA request from => [$si] enabling debug\n");
                # Send BYE on dialog timeout
                create_dialog("B");
                # Delete delay in seconds (late BYE)
                $DLG_del_delay = 180;
                # Dialog timeout in seconds
                $DLG_timeout = 120;

SHM dump
opensips-SHM-dump.txt

@volga629-1 volga629-1 changed the title [BUG] Dialog memory leak on combination $DLG_timeout $DLG_delay_delete [BUG] Possible dialog memory leak on combination $DLG_timeout $DLG_delay_delete Apr 23, 2024

github-actions bot commented May 9, 2024

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@github-actions github-actions bot added the stale label May 9, 2024
@volga629-1
Author

in progress

@stale stale bot removed the stale label May 9, 2024

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@github-actions github-actions bot added the stale label May 25, 2024
@volga629-1
Author

In progress

@github-actions github-actions bot removed the stale label May 31, 2024
@bogdan-iancu
Member

Do you have a minimal working cfg that reproduces the issue (i.e. one showing how to combine the two options so that dialogs get stuck in state 5)?

@bogdan-iancu bogdan-iancu self-assigned this Jun 3, 2024
@bogdan-iancu bogdan-iancu added this to the 3.4.6 milestone Jun 3, 2024
@volga629-1
Author

Hello Bogdan,
This is the cfg for a SIP INVITE.

        # Regular INVITE without Alert Info header
        if (!has_totag() && is_method("INVITE") && !has_body("application/csta+xml")) {
                # Create dialog
                create_dialog("B");

                $DLG_timeout = 120;

                # Dialog delete delay (late BYE)
                $DLG_del_delay = 1800;

@bogdan-iancu
Member

And how does the call terminate: via timeout or via BYE?

@volga629-1
Author

Normally, calls are completed with a BYE.


Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@github-actions github-actions bot added the stale label Jun 19, 2024
@volga629-1
Author

In progress

@stale stale bot removed the stale label Jun 20, 2024

github-actions bot commented Jul 6, 2024

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@github-actions github-actions bot added the stale label Jul 6, 2024
@volga629-1
Author

in progress

@stale stale bot removed the stale label Jul 9, 2024

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@github-actions github-actions bot added the stale label Jul 31, 2024
@volga629-1
Author

In progress

@github-actions github-actions bot removed the stale label Aug 13, 2024

Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.

@github-actions github-actions bot added the stale label Aug 29, 2024
@luislza

luislza commented Sep 17, 2024

Hi,

We're seeing the same behaviour in an active / passive cluster (version 3.4.8) with active/backup sharing tags (no DB) and have narrowed the cause down to one scenario (in our case).

It seems that dialogs that are CANCELed before answer and completed with a 487 response (no BYE) stay on the active server until it is restarted; they still show up in the output of dlg_list on the active server.

These dialogs replicate correctly to the passive node and correctly disappear from the output of dlg_list on the passive node.

The odd part in the output of dlg_list on the active node seems to be that they all have the following in common:

        "state": 5,
        "timestart": 0,
        "timeout": 0

No timestart and no timeout.
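
To spot such entries without reading the whole dlg_list output, a small filter can help. This is a minimal sketch, assuming the MI reply has been saved as JSON with a top-level "Dialogs" array (an assumption about the output layout) and that each entry carries the state, timestart and timeout fields quoted above. The sample data, the "callid" values and the function name are made up for illustration.

```python
import json


def find_stuck_dialogs(dlg_list_output):
    """Return dialogs that match the pattern described in this thread:
    state 5 with both timestart and timeout equal to 0."""
    dialogs = dlg_list_output.get("Dialogs", [])
    return [
        dlg
        for dlg in dialogs
        if dlg.get("state") == 5
        and dlg.get("timestart") == 0
        and dlg.get("timeout") == 0
    ]


# Hypothetical sample reply; a real one would come from the dlg_list MI command.
SAMPLE_REPLY = json.loads("""
{
    "Dialogs": [
        {"callid": "stuck-1", "state": 5, "timestart": 0, "timeout": 0},
        {"callid": "active-1", "state": 4, "timestart": 1726500000, "timeout": 1726500120},
        {"callid": "ended-1", "state": 5, "timestart": 1726400000, "timeout": 1726400120}
    ]
}
""")

if __name__ == "__main__":
    for dlg in find_stuck_dialogs(SAMPLE_REPLY):
        print(dlg["callid"])
```

Running the same filter on both nodes would show whether the stuck entries exist only on the active one.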

Restarting the active node clears the dialogs and syncs active dialogs from the passive node correctly.

EDIT: In our case there is no delete delay set and adding one makes no difference to the behaviour.
EDIT2: I have an opensips debug log and pcaps available for this scenario that I can e-mail through (due to the sensitive personal information contained).

Luis

@stale stale bot removed the stale label Sep 17, 2024