Temporary error message: hook failed: "start" on VM recovery #618

taurus-forever · 2024-09-12T12:08:55Z

Hi,

This is a follow-up to #566.
It is mostly a cosmetic issue, but it is worth reporting and polishing, since it may also abort nightly CI tests.

Steps to reproduce

Deploy the landscape-scalable bundle using the latest 14/stable charm revision in cluster mode:

> cat overlay.yaml
applications:
  postgresql:
    charm: ch:postgresql
    channel: 14/stable
    revision: 468
    num_units: 3

> juju deploy landscape-scalable --overlay overlay.yaml
> # wait until all units are deployed successfully

Check which primary IP Landscape is using:

juju exec --unit landscape-server/0 -- grep 5432 /etc/landscape/service.conf

Stop the primary using lxc stop --force juju-3088be-3 (where juju-3088be-3 is the primary's LXC container)
Wait until a new primary is promoted and has joined the cluster, then check the new IP in use: juju exec --unit landscape-server/0 -- grep 5432 /etc/landscape/service.conf
Start the manually stopped VM: lxc start juju-3088be-3

Expected behavior

The newly started/restored VM joins the cluster without errors.

Actual behavior

A temporary error appears in Juju status (it disappears after some time).
Multiple errors appear in the debug log (we should check with the LXD team what systemd tuning is needed for a reliable mount point):

unit-postgresql-2: 13:19:53 ERROR unit.postgresql/2.juju-log Data directory not attached. Reboot unit.                                                                                                                                                  
unit-postgresql-2: 13:20:00 ERROR unit.postgresql/2.juju-log Data directory not attached. Reboot unit.                                                                                                                                                  
unit-postgresql-2: 13:20:09 ERROR unit.postgresql/2.juju-log Data directory not attached. Reboot unit.                                                                                                                                                  
unit-postgresql-2: 13:20:09 ERROR unit.postgresql/2.juju-log Data directory not attached. Reboot unit.
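
To judge how long the transient error lasts in a CI run, the debug log can be summarized per unit. The following is a minimal sketch, assuming the line layout seen in the excerpts in this issue (agent, time, level, logger, message); the names LINE_RE and summarize_errors are hypothetical and are not part of Juju or the charm.

```python
import re
from collections import Counter

# Assumed layout, inferred from the excerpts in this issue:
# <agent>: <HH:MM:SS> <LEVEL> <logger> <message>
LINE_RE = re.compile(
    r"^(?P<agent>\S+): (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+) (?P<message>.*)$"
)


def summarize_errors(log_text):
    """Count ERROR lines per (agent, message) and track first/last timestamps."""
    counts = Counter()
    window = {}
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match or match["level"] != "ERROR":
            continue  # skip non-matching lines and non-error levels
        key = (match["agent"], match["message"])
        counts[key] += 1
        first, _ = window.get(key, (match["time"], None))
        window[key] = (first, match["time"])
    return counts, window


# Sample input copied from the log excerpts in this issue.
SAMPLE = """\
unit-postgresql-2: 13:19:53 ERROR unit.postgresql/2.juju-log Data directory not attached. Reboot unit.
unit-postgresql-2: 13:20:00 ERROR unit.postgresql/2.juju-log Data directory not attached. Reboot unit.
unit-postgresql-2: 13:20:09 ERROR unit.postgresql/2.juju-log Data directory not attached. Reboot unit.
unit-postgresql-2: 13:20:09 ERROR unit.postgresql/2.juju-log Data directory not attached. Reboot unit.
unit-postgresql-1: 13:33:00 DEBUG unit.postgresql/1.juju-log Emitting Juju event start.
unit-postgresql-1: 13:33:00 ERROR unit.postgresql/1.juju-log Data directory not attached. Reboot unit.
"""

if __name__ == "__main__":
    counts, window = summarize_errors(SAMPLE)
    for (agent, message), n in counts.items():
        first, last = window[(agent, message)]
        print(f"{agent}: {n}x between {first} and {last}: {message}")
```

A nightly test could use such a summary to tolerate the error for a bounded time window instead of failing on its first occurrence.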

Versions

Operating system: 22.04
Juju CLI: 3.5.3
Juju agent: 3.5.3
Charm revision: 468/amd
LXD: 5.0.3-80aeff7

Log output

Juju debug log:

machine-3: 13:32:58 INFO juju.worker.authenticationworker "machine-3" key updater worker started                                                                                                                                                        
unit-postgresql-1: 13:32:58 INFO juju.worker.uniter unit "postgresql/1" started                                                                                                                                                                         
unit-postgresql-1: 13:32:58 INFO juju.worker.uniter hooks are retried true                                                                                                                                                                              
unit-postgresql-1: 13:32:58 INFO juju.worker.uniter reboot detected; triggering implicit start hook to notify charm                                                                                                                                     
unit-postgresql-1: 13:32:59 DEBUG unit.postgresql/1.juju-log ops 2.16.0 up and running.                                                                                                                                                                 
unit-postgresql-1: 13:32:59 INFO unit.postgresql/1.juju-log Running legacy hooks/start.                                                                                                                                                                 
unit-postgresql-1: 13:32:59 DEBUG unit.postgresql/1.juju-log ops 2.16.0 up and running.                                                                                                                                                                 
unit-postgresql-1: 13:32:59 DEBUG unit.postgresql/1.juju-log Charm called itself via hooks/start.                                                                                                                                                       
unit-postgresql-1: 13:32:59 DEBUG unit.postgresql/1.juju-log Legacy hooks/start exited with status 0.                                                                                                                                                   
unit-postgresql-1: 13:33:00 INFO unit.postgresql/1.juju-log Starting cluster topology observer process                                                                                                                                                  
unit-postgresql-1: 13:33:00 INFO unit.postgresql/1.juju-log Started cluster topology observer process with PID 786                                                                                                                                      
unit-postgresql-1: 13:33:00 DEBUG unit.postgresql/1.juju-log no relation on 'tracing': tracing not ready                                                                                                                                                
unit-postgresql-1: 13:33:00 DEBUG unit.postgresql/1.juju-log Emitting Juju event start.                                                                                                                                                                 
unit-postgresql-1: 13:33:00 DEBUG unit.postgresql/1.juju-log Deferring <StartEvent via PostgresqlOperatorCharm/on/start[240]>.                                                                                                                          
unit-postgresql-1: 13:33:00 ERROR unit.postgresql/1.juju-log Data directory not attached. Reboot unit.

Additional context

All of these errors are cosmetic: the PostgreSQL cluster is resurrected, but it takes some time (it may have to wait for the next update-status).
We can try to polish UX here.
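
One possible direction for the polish, shown only as a sketch and not as the charm's actual code: poll for the data directory mount for a bounded time before reporting an error. The function wait_for_mount and its parameters are hypothetical, and the path passed to it would have to be the charm's real storage mount point, which is not stated in this issue.

```python
import os
import time


def wait_for_mount(
    path,
    timeout=60.0,
    interval=2.0,
    is_mounted=os.path.ismount,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Return True once `path` is a mount point, or False after `timeout` seconds.

    The checker, sleep, and clock are injectable so the loop can be unit-tested
    without real mounts or real waiting.
    """
    deadline = clock() + timeout
    while True:
        if is_mounted(path):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Hypothetical usage: "/" is always a mount point on Linux, so this returns
    # immediately; a real caller would pass the storage mount path instead.
    print(wait_for_mount("/", timeout=5.0, interval=0.5))
```

Whether blocking inside the start hook is acceptable, or whether deferring the event quietly is the better fit, is a design decision for the charm maintainers.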

Also, in the jhack tail output I see many start events even though the charm is not restarting the container.

No idea why...


syncronize-issues-to-jira · 2024-09-12T12:09:02Z

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/DPE-5455.

This message was autogenerated

taurus-forever added the bug label Sep 12, 2024

taurus-forever mentioned this issue Sep 12, 2024

Failed to retrieve the PostgreSQL version to initialise/update db-admin relation during failover #566

Closed
