Connection issues after host system reboot #333

Open
teemupiiroinenwirepas opened this issue Nov 16, 2022 · 6 comments

@teemupiiroinenwirepas

teemupiiroinenwirepas commented Nov 16, 2022

When the host system is rebooted with the "sudo reboot" command, some connections change from clean_session true to false. Some old connection names can also still be seen after the restart. We were not able to reproduce this with a native VerneMQ installation; it only happens when using the VerneMQ Docker image. The issue is seen with connections from Python, JavaScript and C/C++ clients, so it doesn't seem to be a client library issue.

The issue can be "fixed" by adding an extra sleep before the Docker daemon starts: run "sudo systemctl edit docker.service" and add these lines:
[Service]
ExecStartPre=/bin/sleep 10

Host: Ubuntu 20.04.4 LTS running in AWS EC2
Docker version: 20.10.21 (issue is also seen with older version)
VerneMQ Docker image version: latest (issue is also seen with older versions)

Before host restart:

+---------------+-------------------------------------+------------------+
| clean_session | client_id                           | offline_messages |
+---------------+-------------------------------------+------------------+
| true          | rtsituation_manager-1-1668591944364 | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId44                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId19                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId52                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId47                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId38                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | Parser                              | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId42                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId32                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId28                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | clientSimu                          | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId33                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId35                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | anon-QIn+Oi9jgpwVhAtOBOWuvdTVSXI=   | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId24                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | anon-H9oQ8Kwn+zL8M5Wgf6Kqta8HlJE=   | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId29                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | rtsituation_manager-0-1668591944369 | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId43                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId8                      | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId30                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId0                      | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId51                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId12                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | gateway_communicator                | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId54                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId6                      | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId16                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId17                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId18                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId21                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId4                      | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId36                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | 7469ba00_ddabc3ae                   | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId45                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId34                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId20                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId48                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId49                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId22                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId25                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | mqttjs_473365cb                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | dc7b229a_33c0eee9                   | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId46                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId9                      | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId26                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId13                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId3                      | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId50                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId2                      | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId31                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId1                      | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId53                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId15                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId5                      | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId37                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId14                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId41                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId7                      | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId40                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId23                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId10                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId27                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId11                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | extraClientId39                     | 0                |
+---------------+-------------------------------------+------------------+

After host restart:

+---------------+-------------------------------------+------------------+
| clean_session | client_id                           | offline_messages |
+---------------+-------------------------------------+------------------+
| false         | rtsituation_manager-1-1668591944364 | 0                |
+---------------+-------------------------------------+------------------+
| true          | Parser                              | 0                |
+---------------+-------------------------------------+------------------+
| false         | extraClientId32                     | 0                |
+---------------+-------------------------------------+------------------+
| false         | extraClientId28                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | anon-og5vBDO3RSjEaY+hIdo+UbAS/Sw=   | 0                |
+---------------+-------------------------------------+------------------+
| false         | extraClientId33                     | 0                |
+---------------+-------------------------------------+------------------+
| false         | extraClientId35                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | anon-uM3uqhgZHmrWFh+2zmdJZUciysE=   | 0                |
+---------------+-------------------------------------+------------------+
| false         | anon-QIn+Oi9jgpwVhAtOBOWuvdTVSXI=   | 0                |
+---------------+-------------------------------------+------------------+
| false         | rtsituation_manager-0-1668591944369 | 0                |
+---------------+-------------------------------------+------------------+
| false         | extraClientId43                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | rtsituation_manager-1-1668592049839 | 0                |
+---------------+-------------------------------------+------------------+
| true          | gateway_communicator                | 0                |
+---------------+-------------------------------------+------------------+
| false         | extraClientId6                      | 0                |
+---------------+-------------------------------------+------------------+
| true          | mqttjs_2b3f0295                     | 0                |
+---------------+-------------------------------------+------------------+
| false         | extraClientId21                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | d7ecf346_6ec39341                   | 0                |
+---------------+-------------------------------------+------------------+
| false         | extraClientId34                     | 0                |
+---------------+-------------------------------------+------------------+
| false         | extraClientId20                     | 0                |
+---------------+-------------------------------------+------------------+
| true          | rtsituation_manager-0-1668592049813 | 0                |
+---------------+-------------------------------------+------------------+
| false         | extraClientId22                     | 0                |
+---------------+-------------------------------------+------------------+
| false         | dc7b229a_33c0eee9                   | 0                |
+---------------+-------------------------------------+------------------+
| false         | extraClientId26                     | 0                |
+---------------+-------------------------------------+------------------+
| false         | extraClientId13                     | 0                |
+---------------+-------------------------------------+------------------+
| false         | extraClientId3                      | 0                |
+---------------+-------------------------------------+------------------+
| false         | extraClientId50                     | 0                |
+---------------+-------------------------------------+------------------+
| false         | extraClientId2                      | 0                |
+---------------+-------------------------------------+------------------+
| false         | extraClientId31                     | 0                |
+---------------+-------------------------------------+------------------+
| false         | extraClientId41                     | 0                |
+---------------+-------------------------------------+------------------+
| false         | extraClientId7                      | 0                |
+---------------+-------------------------------------+------------------+
| true          | b6542364_53f850b8                   | 0                |
+---------------+-------------------------------------+------------------+
| false         | extraClientId23                     | 0                |
+---------------+-------------------------------------+------------------+
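
For anyone comparing their own dumps, here is a minimal Python sketch (not part of the original report) that diffs two session tables in the format shown above. The function names `parse_sessions` and `compare_sessions` are hypothetical, the sample rows are a small excerpt of the tables above, and the parser assumes client IDs never contain a `|` character.

```python
def parse_sessions(table_text):
    """Parse an ASCII session table into {client_id: clean_session_bool}."""
    sessions = {}
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip "+---+" border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) < 2 or cells[0] == "clean_session":
            continue  # skip the header row
        sessions[cells[1]] = (cells[0] == "true")
    return sessions


def compare_sessions(before, after):
    """Return (flipped, new, gone) client ID lists, each sorted.

    flipped: present in both, clean_session went from true to false
    new:     only present after the reboot
    gone:    only present before the reboot
    """
    flipped = sorted(c for c in after
                     if c in before and before[c] and not after[c])
    new = sorted(c for c in after if c not in before)
    gone = sorted(c for c in before if c not in after)
    return flipped, new, gone


# Small excerpt of the two tables above, used only as sample input.
BEFORE = """
+---------------+-----------------+------------------+
| clean_session | client_id       | offline_messages |
+---------------+-----------------+------------------+
| true          | Parser          | 0                |
| true          | extraClientId32 | 0                |
| true          | mqttjs_473365cb | 0                |
+---------------+-----------------+------------------+
"""

AFTER = """
+---------------+-----------------+------------------+
| clean_session | client_id       | offline_messages |
+---------------+-----------------+------------------+
| true          | Parser          | 0                |
| false         | extraClientId32 | 0                |
| true          | mqttjs_2b3f0295 | 0                |
+---------------+-----------------+------------------+
"""

flipped, new, gone = compare_sessions(parse_sessions(BEFORE),
                                      parse_sessions(AFTER))
print("flipped to clean_session=false:", flipped)
print("new client IDs:", new)
print("client IDs no longer listed:", gone)
```

Run against the full tables, the "flipped" list would be the stale sessions left over from before the reboot, and the "new" list the connections the clients re-established afterwards.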
@ioolkos
Contributor

ioolkos commented Nov 16, 2022

@teemupiiroinenwirepas hm...
Do you know what will happen to the RAM state of the base OS when the host OS starts? Does Docker try to persist this between container restarts?


👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
👉 Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.

@teemupiiroinenwirepas
Author

  • Could you please elaborate on what you mean by RAM state?
  • We use docker compose with volumes to persist the data. Volume definition from compose file:
    volumes:
    - vernemq_data:/vernemq/data

@teemupiiroinenwirepas
Author

@ioolkos Do you need more debugging information from us?

@ioolkos
Contributor

ioolkos commented Nov 19, 2022

@teemupiiroinenwirepas no need for more debugging info (no free bandwidth to undertake an investigation), but let's focus on why a delay in the Docker daemon start fixes this. What can possibly be concluded from it? Would the volume mounts for the VerneMQ data dir not be available, or something like that?



@teemupiiroinenwirepas
Author

teemupiiroinenwirepas commented Nov 21, 2022

I doubt that it is about file system mounting, as we mount the root directory directly. We also run Postgres and Influx in the VM and we haven't seen any issues with them. Volumes are in the default directory /var/lib/docker/volumes.

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        49G   20G   29G  41% /
devtmpfs        3.8G     0  3.8G   0% /dev
tmpfs           3.8G     0  3.8G   0% /dev/shm
tmpfs           766M  892K  765M   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.8G     0  3.8G   0% /sys/fs/cgroup
/dev/loop0       26M   26M     0 100% /snap/amazon-ssm-agent/5656
/dev/loop1       25M   25M     0 100% /snap/amazon-ssm-agent/6312
/dev/loop3       45M   45M     0 100% /snap/certbot/2511
/dev/loop4       56M   56M     0 100% /snap/core18/2566
/dev/loop5       56M   56M     0 100% /snap/core18/2620
/dev/loop6       64M   64M     0 100% /snap/core20/1634
/dev/loop8       48M   48M     0 100% /snap/snapd/17336
/dev/loop7       71M   71M     0 100% /snap/lxd/21029
/dev/loop9       68M   68M     0 100% /snap/lxd/22753
/dev/loop10      64M   64M     0 100% /snap/core20/1695
/dev/loop11      50M   50M     0 100% /snap/snapd/17576
/dev/loop12      45M   45M     0 100% /snap/certbot/2539
tmpfs           766M     0  766M   0% /run/user/1000
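
As a rough cross-check of that reasoning (an illustrative sketch, not something run on the affected host), the volume directory can be mapped onto the "Mounted on" column above by longest path prefix. The helper name `mount_point_for` is hypothetical, and the list of mount points is copied by hand from the `df` output.

```python
import posixpath


def mount_point_for(path, mount_points):
    """Return the longest mount point that is a path prefix of `path`."""
    path = posixpath.normpath(path)
    best = None
    for mount_point in mount_points:
        mp = posixpath.normpath(mount_point)
        # "/" contains every absolute path; otherwise require an exact
        # match or a match up to a path separator ("/run" != "/runner").
        is_prefix = mp == "/" or path == mp or path.startswith(mp + "/")
        if is_prefix and (best is None or len(mp) > len(best)):
            best = mp
    return best


# Mount points copied from the df output above (snap loop mounts omitted).
MOUNT_POINTS = [
    "/", "/dev", "/dev/shm", "/run", "/run/lock",
    "/sys/fs/cgroup", "/run/user/1000",
]

print(mount_point_for("/var/lib/docker/volumes", MOUNT_POINTS))
```

This only shows which file system the directory lives on. It says nothing about what is or isn't ready at the moment the Docker daemon starts, so it does not by itself explain why the extra sleep helps.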

@ioolkos
Contributor

ioolkos commented Nov 21, 2022

Ok, thanks. Please keep us posted on your findings.


