Error enabling msgr2 messenger in Ceph during Ansible playbook execution #11
Comments
That would happen if the Ceph cluster isn't functional. This most commonly happens if you have fully redone your deployment without also wiping the old data. In this scenario you end up with a freshly deployed cluster that's still expecting the servers from the previous deployment and so is unable to achieve a quorum, causing the Ceph API to fail to come online and resulting in the configuration failure you're getting. |
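A quick way to tell whether the monitors have formed a quorum, without waiting for the 300-second timeout seen in the error, might look like the sketch below. It assumes the `ceph` CLI and an admin keyring are available on the host where it runs; `mon_quorum_reachable` is a hypothetical helper name, not something from the playbook.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the monitors answer a quorum query
# within the given number of seconds (default 15).
mon_quorum_reachable() {
    # --connect-timeout bounds how long the client waits for the cluster.
    ceph --connect-timeout "${1:-15}" quorum_status >/dev/null 2>&1
}
```

For example, `mon_quorum_reachable 10 || echo "no quorum yet" >&2` on one of the monitor hosts would report a non-functional cluster after about ten seconds.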
Ceph monitor initialization issue: monmap min_mon_release older than installed version |
Can you show Normally the logic in the playbook is to set the min-mon-release in the mon map to the same release as |
I have already cleaned the data/ceph/ folder and others. I also used both Quincy and Reef versions. I am lost in this deployment. |
Also the output of |
root@haruunkal:~/incus-deploy# git rev-parse HEAD |
Okay, so it shouldn't be because of lack of support for calling monmaptool with the needed set-min-mon-release, but then it's pretty confusing as to why it would have set a release of 15 when it should have been passed 18. The output of |
Thank you very much for your support. It seems that there was an issue with my lab workstation that was resolved only when I disabled the IPv6 network. After that, the entire process ran perfectly. |
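Since the monitor addresses in the playbook output of the original report are all IPv6 (`fd42:...`), one thing worth checking in a similar situation is which address family each monitor entry uses. The following is only a rough sketch: `addr_family` is a hypothetical helper that classifies an address by its shape and does not validate it.

```shell
#!/bin/sh
# Hypothetical helper: classify an address string by its textual shape only.
addr_family() {
    case "$1" in
        *:*)     echo ipv6 ;;     # any colon is taken to mean IPv6
        *.*.*.*) echo ipv4 ;;     # dotted quad is taken to mean IPv4
        *)       echo unknown ;;
    esac
}
```

Running `addr_family fd42:60dc:dec6:a73b:216:3eff:fe2d:4c57` on one of the addresses from the log shows that the monitor map was built with IPv6 addresses.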
root@haruunkal:~/incus-deploy# monmaptool --print ansible/data/ceph/cluster.e2850e1f-7aab-472e-b6b1-824e19a75071.mon.map |
Haha. Other error: |
Yeah, so the Maybe that older version of You could add the Ceph repository to your own machine and then update to a new version of |
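To check up front whether the locally installed tool is older than the release being deployed, a comparison along these lines could be used. This is a sketch under assumptions: both function names are hypothetical, and the parsing relies on the `ceph version X.Y.Z (...) name (stable)` format that `monmaptool -v` prints elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the major version from a string such as
# "ceph version 17.2.7 (<hash>) quincy (stable)".
ceph_major_version() {
    printf '%s\n' "$1" | sed -n 's/^ceph version \([0-9][0-9]*\)\..*/\1/p'
}

# Hypothetical helper: succeed if the local tool's major version is at least
# the release being deployed (18 for Reef in this thread).
local_tool_new_enough() {
    have=$(ceph_major_version "$1")
    want="$2"
    [ -n "$have" ] && [ "$have" -ge "$want" ]
}
```

For example, `local_tool_new_enough "$(monmaptool -v)" 18 || echo "local Ceph tools are too old" >&2` would flag a Quincy (17.x) install before the mon map is generated.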
Having the exact same issue here |
Same here as well |
OK, I got the full installation working. Here are my logs. I had the same error relating to monmaptool:
monmaptool -v
ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
monmaptool --create --set-min-mon-release 18 --fsid e2850e1f-7aab-472e-b6b1-824e19a75071 data/ceph/cluster.e2850e1f-7aab-472e-b6b1-824e19a75071.mon.map --clobber
monmaptool: monmap file data/ceph/cluster.e2850e1f-7aab-472e-b6b1-824e19a75071.mon.map
setting min_mon_release = octopus
monmaptool: set fsid to e2850e1f-7aab-472e-b6b1-824e19a75071
monmaptool: writing epoch 0 to data/ceph/cluster.e2850e1f-7aab-472e-b6b1-824e19a75071.mon.map (0 monitors)
On Ubuntu, I used:
CEPH_RELEASE=18.2.0
curl --silent --remote-name --location https://download.ceph.com/rpm-${CEPH_RELEASE}/el9/noarch/cephadm
chmod +x cephadm
sudo mv cephadm /usr/local/bin/
sudo cephadm add-repo --release reef
sudo apt update
sudo apt upgrade -y
Once updated to version 18:
monmaptool -v
ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)
monmaptool --create --set-min-mon-release reef --fsid e2850e1f-7aab-472e-b6b1-824e19a75071 ansible/data/ceph/cluster.e2850e1f-7aab-472e-b6b1-824e19a75071.mon.map --clobber
monmaptool: monmap file ansible/data/ceph/cluster.e2850e1f-7aab-472e-b6b1-824e19a75071.mon.map
setting min_mon_release = reef
monmaptool: set fsid to e2850e1f-7aab-472e-b6b1-824e19a75071
monmaptool: writing epoch 0 to ansible/data/ceph/cluster.e2850e1f-7aab-472e-b6b1-824e19a75071.mon.map (0 monitors) Finnally, the installation is reset and re-applied.
It succeed. PLAY RECAP ***********************************************************************************************************************************************************************************
server01 : ok=67 changed=0 unreachable=0 failed=0 skipped=40 rescued=0 ignored=0
server02 : ok=67 changed=0 unreachable=0 failed=0 skipped=40 rescued=0 ignored=0
server03 : ok=68 changed=0 unreachable=0 failed=0 skipped=41 rescued=0 ignored=0
server04 : ok=56 changed=0 unreachable=0 failed=0 skipped=51 rescued=0 ignored=0
server05 : ok=56 changed=0 unreachable=0 failed=0 skipped=51 rescued=0 ignored=0
Note, I had to re-execute
|
Yeah, we need to re-shuffle things a bit so that the monmap is generated on the target servers and pulled back onto the source. You'd think that monmaptool having the argument would work or, if not, would at least give an error, but it doesn't... |
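Because an older monmaptool can silently apply a different release (the Quincy 17.2.7 run above asked for 18 and logged `octopus`), a guard like the one below could catch the mismatch before the map is transferred to the monitors. This is a minimal sketch, not the playbook's logic: `check_min_mon_release` is a hypothetical helper, and it inspects only the `setting min_mon_release = ...` line that monmaptool printed in the logs in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if captured monmaptool output reports the
# expected release name on its "setting min_mon_release = ..." line.
check_min_mon_release() {
    expected="$1"
    output="$2"
    # Extract the release name that monmaptool says it applied.
    actual=$(printf '%s\n' "$output" |
        sed -n 's/^setting min_mon_release = //p' | head -n 1)
    if [ "$actual" = "$expected" ]; then
        return 0
    fi
    echo "min_mon_release is '${actual:-unset}', expected '$expected'" >&2
    return 1
}
```

One possible use is to capture the tool's output with `out=$(monmaptool --create ... 2>&1)` and then call `check_min_mon_release reef "$out"`, failing the run if the reported release differs.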
One of the things I worked on in my fork (https://github.com/mttjohnson/incus-deploy/tree/fixes-to-run-for-me) was to get the ceph commands running from the target host because I'm initiating the incus-deploy actions from a mac and couldn't get the ceph tools installed on my mac. That branch on my fork |
Ah, it'd be great if you could extract that logic and send it as a PR! |
Description:
When running the Ansible playbook deploy.yaml from the incus-deploy project, an error occurs while attempting to enable the msgr2 messenger in Ceph. The ceph mon enable-msgr2 command fails with a timeout, indicating that it could not connect to the RADOS cluster.
Error Message:
fatal: [server01]: FAILED! => {"changed": true, "cmd": "ceph mon enable-msgr2", "delta": "0:05:00.070563", "end": "2024-07-11 13:43:18.284315", "msg": "non-zero return code", "rc": 1, "start": "2024-07-11 13:38:18.213752", "stderr": "2024-07-11T13:43:18.279+0000 7ff21f567640 0 monclient(hunting): authenticate timed out after 300\n[errno 110] RADOS timed out (error connecting to the cluster)", "stderr_lines": ["2024-07-11T13:43:18.279+0000 7ff21f567640 0 monclient(hunting): authenticate timed out after 300", "[errno 110] RADOS timed out (error connecting to the cluster)"], "stdout": "", "stdout_lines": []}
fatal: [server03]: FAILED! => {"changed": true, "cmd": "ceph mon enable-msgr2", "delta": "0:05:00.109144", "end": "2024-07-11 13:43:18.320621", "msg": "non-zero return code", "rc": 1, "start": "2024-07-11 13:38:18.211477", "stderr": "2024-07-11T13:43:18.316+0000 7fc48f66d640 0 monclient(hunting): authenticate timed out after 300\n[errno 110] RADOS timed out (error connecting to the cluster)", "stderr_lines": ["2024-07-11T13:43:18.316+0000 7fc48f66d640 0 monclient(hunting): authenticate timed out after 300", "[errno 110] RADOS timed out (error connecting to the cluster)"], "stdout": "", "stdout_lines": []}
fatal: [server02]: FAILED! => {"changed": true, "cmd": "ceph mon enable-msgr2", "delta": "0:05:00.093801", "end": "2024-07-11 13:43:18.316757", "msg": "non-zero return code", "rc": 1, "start": "2024-07-11 13:38:18.222956", "stderr": "2024-07-11T13:43:18.314+0000 7f4cb7b4a640 0 monclient(hunting): authenticate timed out after 300\n[errno 110] RADOS timed out (error connecting to the cluster)", "stderr_lines": ["2024-07-11T13:43:18.314+0000 7f4cb7b4a640 0 monclient(hunting): authenticate timed out after 300", "[errno 110] RADOS timed out (error connecting to the cluster)"], "stdout": "", "stdout_lines": []}
Steps to Reproduce:
Execute the Ansible playbook deploy.yaml in the directory ~/incus-deploy/ansible.
Observe the error during the task to enable the msgr2 messenger in Ceph.
Expected Behavior:
The ceph mon enable-msgr2 command should execute without errors, enabling the msgr2 messenger in the Ceph cluster.
Actual Behavior:
The ceph mon enable-msgr2 command fails with a timeout, indicating it could not connect to the RADOS cluster.
Additional Details:
The error occurs on multiple servers (server01, server02, server03).
Specific error message: RADOS timed out (error connecting to the cluster).
The playbook was executed as root.
Environment:
Ansible version: 2.17.1
Ubuntu: 22.04
Execute:
root@haruunkal:~/incus-deploy/terraform# cd ../ansible/
root@haruunkal:~/incus-deploy/ansible# ansible-playbook deploy.yaml
PLAY [Ceph - Generate cluster keys and maps] ********************************************************************************************
TASK [Gathering Facts] ******************************************************************************************************************
[WARNING]: Platform linux on host server03 is using the discovered Python interpreter at /usr/bin/python3.10, but future installation of
another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-
core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [server03]
[WARNING]: Platform linux on host server04 is using the discovered Python interpreter at /usr/bin/python3.10, but future installation of
another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-
core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [server04]
[WARNING]: Platform linux on host server02 is using the discovered Python interpreter at /usr/bin/python3.10, but future installation of
another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-
core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [server02]
[WARNING]: Platform linux on host server05 is using the discovered Python interpreter at /usr/bin/python3.10, but future installation of
another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-
core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [server05]
[WARNING]: Platform linux on host server01 is using the discovered Python interpreter at /usr/bin/python3.10, but future installation of
another Python interpreter could change the meaning of that path. See https://docs.ansible.com/ansible-
core/2.17/reference_appendices/interpreter_discovery.html for more information.
ok: [server01]
TASK [Generate mon keyring] *************************************************************************************************************
changed: [server03 -> 127.0.0.1]
ok: [server04 -> 127.0.0.1]
ok: [server01 -> 127.0.0.1]
ok: [server05 -> 127.0.0.1]
ok: [server02 -> 127.0.0.1]
TASK [Generate client.admin keyring] ****************************************************************************************************
changed: [server03 -> 127.0.0.1]
ok: [server04 -> 127.0.0.1]
ok: [server01 -> 127.0.0.1]
ok: [server05 -> 127.0.0.1]
ok: [server02 -> 127.0.0.1]
TASK [Generate bootstrap-osd keyring] ***************************************************************************************************
changed: [server03 -> 127.0.0.1]
ok: [server04 -> 127.0.0.1]
ok: [server01 -> 127.0.0.1]
ok: [server05 -> 127.0.0.1]
ok: [server02 -> 127.0.0.1]
TASK [Generate mon map] *****************************************************************************************************************
changed: [server03 -> 127.0.0.1]
ok: [server04 -> 127.0.0.1]
ok: [server01 -> 127.0.0.1]
ok: [server05 -> 127.0.0.1]
ok: [server02 -> 127.0.0.1]
RUNNING HANDLER [Add key to client.admin keyring] ***************************************************************************************
changed: [server03 -> 127.0.0.1]
RUNNING HANDLER [Add key to bootstrap-osd keyring] **************************************************************************************
changed: [server03 -> 127.0.0.1]
RUNNING HANDLER [Add nodes to mon map] **************************************************************************************************
changed: [server03 -> 127.0.0.1] => (item={'name': 'server01', 'ip': 'fd42:60dc:dec6:a73b:216:3eff:fe2d:4c57'})
changed: [server03 -> 127.0.0.1] => (item={'name': 'server02', 'ip': 'fd42:60dc:dec6:a73b:216:3eff:fe05:31f6'})
changed: [server03 -> 127.0.0.1] => (item={'name': 'server03', 'ip': 'fd42:60dc:dec6:a73b:216:3eff:fe01:1c21'})
PLAY [Ceph - Add package repository] ****************************************************************************************************
TASK [Gathering Facts] ******************************************************************************************************************
ok: [server04]
ok: [server05]
ok: [server03]
ok: [server01]
ok: [server02]
TASK [Create apt keyring path] **********************************************************************************************************
ok: [server03]
ok: [server01]
ok: [server05]
ok: [server04]
ok: [server02]
TASK [Add ceph GPG key] *****************************************************************************************************************
changed: [server04]
changed: [server03]
changed: [server05]
changed: [server01]
changed: [server02]
TASK [Get DPKG architecture] ************************************************************************************************************
ok: [server04]
ok: [server03]
ok: [server05]
ok: [server01]
ok: [server02]
TASK [Add ceph package sources] *********************************************************************************************************
changed: [server03]
changed: [server05]
changed: [server04]
changed: [server02]
changed: [server01]
RUNNING HANDLER [Update apt] ************************************************************************************************************
changed: [server01]
changed: [server04]
changed: [server05]
changed: [server03]
changed: [server02]
PLAY [Ceph - Install packages] **********************************************************************************************************
TASK [Gathering Facts] ******************************************************************************************************************
ok: [server01]
ok: [server04]
ok: [server05]
ok: [server03]
ok: [server02]
TASK [Install ceph-common] **************************************************************************************************************
changed: [server02]
changed: [server03]
changed: [server05]
changed: [server04]
changed: [server01]
TASK [Install ceph-mon] *****************************************************************************************************************
skipping: [server04]
skipping: [server05]
changed: [server03]
changed: [server01]
changed: [server02]
TASK [Install ceph-mgr] *****************************************************************************************************************
skipping: [server04]
skipping: [server05]
changed: [server03]
changed: [server02]
changed: [server01]
TASK [Install ceph-mds] *****************************************************************************************************************
skipping: [server04]
skipping: [server05]
changed: [server01]
changed: [server02]
changed: [server03]
TASK [Install ceph-osd] *****************************************************************************************************************
changed: [server01]
changed: [server04]
changed: [server03]
changed: [server02]
changed: [server05]
TASK [Install ceph-rbd-mirror] **********************************************************************************************************
skipping: [server01]
skipping: [server02]
skipping: [server04]
skipping: [server05]
skipping: [server03]
TASK [Install radosgw] ******************************************************************************************************************
skipping: [server01]
skipping: [server02]
skipping: [server03]
changed: [server04]
changed: [server05]
PLAY [Ceph - Set up config and keyrings] ************************************************************************************************
TASK [Transfer the cluster configuration] ***********************************************************************************************
changed: [server01]
changed: [server04]
changed: [server03]
changed: [server05]
changed: [server02]
TASK [Create main storage directory] ****************************************************************************************************
ok: [server04]
ok: [server01]
ok: [server03]
ok: [server05]
ok: [server02]
TASK [Create monitor bootstrap path] ****************************************************************************************************
skipping: [server05]
skipping: [server04]
changed: [server01]
changed: [server03]
changed: [server02]
TASK [Create OSD bootstrap path] ********************************************************************************************************
changed: [server05]
changed: [server04]
changed: [server01]
changed: [server03]
changed: [server02]
TASK [Transfer main admin keyring] ******************************************************************************************************
changed: [server05]
changed: [server03]
changed: [server01]
changed: [server02]
changed: [server04]
TASK [Transfer additional client keyrings] **********************************************************************************************
skipping: [server05]
skipping: [server03]
skipping: [server04]
skipping: [server01]
skipping: [server02]
TASK [Transfer bootstrap mon keyring] ***************************************************************************************************
skipping: [server05]
skipping: [server04]
changed: [server03]
changed: [server02]
changed: [server01]
TASK [Transfer bootstrap mon map] *******************************************************************************************************
skipping: [server05]
skipping: [server04]
changed: [server03]
changed: [server02]
changed: [server01]
TASK [Transfer bootstrap OSD keyring] ***************************************************************************************************
changed: [server05]
changed: [server04]
changed: [server01]
changed: [server03]
changed: [server02]
RUNNING HANDLER [Restart Ceph] **********************************************************************************************************
changed: [server05]
changed: [server03]
changed: [server02]
changed: [server04]
changed: [server01]
PLAY [Ceph - Deploy mon] ****************************************************************************************************************
TASK [Gathering Facts] ******************************************************************************************************************
ok: [server01]
ok: [server02]
ok: [server05]
ok: [server04]
ok: [server03]
TASK [Bootstrap Ceph mon] ***************************************************************************************************************
skipping: [server04]
skipping: [server05]
changed: [server02]
changed: [server03]
changed: [server01]
TASK [Enable and start Ceph mon] ********************************************************************************************************
skipping: [server04]
skipping: [server05]
changed: [server02]
changed: [server03]
changed: [server01]
RUNNING HANDLER [Enable msgr2] **********************************************************************************************************
fatal: [server01]: FAILED! => {"changed": true, "cmd": "ceph mon enable-msgr2", "delta": "0:05:00.070563", "end": "2024-07-11 13:43:18.284315", "msg": "non-zero return code", "rc": 1, "start": "2024-07-11 13:38:18.213752", "stderr": "2024-07-11T13:43:18.279+0000 7ff21f567640 0 monclient(hunting): authenticate timed out after 300\n[errno 110] RADOS timed out (error connecting to the cluster)", "stderr_lines": ["2024-07-11T13:43:18.279+0000 7ff21f567640 0 monclient(hunting): authenticate timed out after 300", "[errno 110] RADOS timed out (error connecting to the cluster)"], "stdout": "", "stdout_lines": []}
fatal: [server03]: FAILED! => {"changed": true, "cmd": "ceph mon enable-msgr2", "delta": "0:05:00.109144", "end": "2024-07-11 13:43:18.320621", "msg": "non-zero return code", "rc": 1, "start": "2024-07-11 13:38:18.211477", "stderr": "2024-07-11T13:43:18.316+0000 7fc48f66d640 0 monclient(hunting): authenticate timed out after 300\n[errno 110] RADOS timed out (error connecting to the cluster)", "stderr_lines": ["2024-07-11T13:43:18.316+0000 7fc48f66d640 0 monclient(hunting): authenticate timed out after 300", "[errno 110] RADOS timed out (error connecting to the cluster)"], "stdout": "", "stdout_lines": []}
fatal: [server02]: FAILED! => {"changed": true, "cmd": "ceph mon enable-msgr2", "delta": "0:05:00.093801", "end": "2024-07-11 13:43:18.316757", "msg": "non-zero return code", "rc": 1, "start": "2024-07-11 13:38:18.222956", "stderr": "2024-07-11T13:43:18.314+0000 7f4cb7b4a640 0 monclient(hunting): authenticate timed out after 300\n[errno 110] RADOS timed out (error connecting to the cluster)", "stderr_lines": ["2024-07-11T13:43:18.314+0000 7f4cb7b4a640 0 monclient(hunting): authenticate timed out after 300", "[errno 110] RADOS timed out (error connecting to the cluster)"], "stdout": "", "stdout_lines": []}
PLAY RECAP ******************************************************************************************************************************
server01 : ok=29 changed=18 unreachable=0 failed=1 skipped=3 rescued=0 ignored=0
server02 : ok=29 changed=18 unreachable=0 failed=1 skipped=3 rescued=0 ignored=0
server03 : ok=32 changed=25 unreachable=0 failed=1 skipped=3 rescued=0 ignored=0
server04 : ok=22 changed=11 unreachable=0 failed=0 skipped=10 rescued=0 ignored=0
server05 : ok=22 changed=11 unreachable=0 failed=0 skipped=10 rescued=0 ignored=0
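When a run this long fails, the recap can be reduced to just the hosts that need attention. The sketch below assumes recap lines in the `host : ok=N changed=N ... failed=N ...` layout shown above; `failed_hosts` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read PLAY RECAP lines on stdin and print the name of
# every host whose failed= counter is non-zero.
failed_hosts() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^failed=[1-9]/) print $1 }'
}
```

Piping the recap above through `failed_hosts` would list server01, server02 and server03, the three monitor hosts on which the msgr2 handler timed out.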