Temporary error message: hook failed: "start" on VM recovery #618

Open
taurus-forever opened this issue Sep 12, 2024 · 1 comment
Labels
bug Something isn't working


taurus-forever commented Sep 12, 2024

Hi,

This is a follow-up to #566.
It is mostly a cosmetic issue, but it is worth reporting and polishing, as it may also abort CI nightly tests.

Steps to reproduce

  • Deploy landscape-scalable bundle using the latest 14/stable charm revision in cluster mode:
> cat overlay.yaml
applications:
  postgresql:
    charm: ch:postgresql
    channel: 14/stable
    revision: 468
    num_units: 3

>  juju deploy landscape-scalable --overlay overlay.yaml
> # wait until everything is successfully deployed
  • Check which Primary IP Landscape is using:
juju exec --unit landscape-server/0 -- grep 5432 /etc/landscape/service.conf
  • Stop the Primary using lxc stop --force juju-3088be-3 (where juju-3088be-3 is the Primary's LXC container)
  • Wait for a new Primary to be promoted and to join the cluster, then check the new IP in use: juju exec --unit landscape-server/0 -- grep 5432 /etc/landscape/service.conf
  • Start the manually stopped VM: lxc start juju-3088be-3
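
The steps above could be scripted for a CI job. What follows is a minimal sketch, not part of the charm or its test suite: it only assembles the command lines already shown in this report. The names `build_repro_commands` and `run` are hypothetical, and the container name has to be looked up for each deployment.

```python
import subprocess

# The check used in this report to see which PostgreSQL IP Landscape points at.
CONF_CHECK = ["grep", "5432", "/etc/landscape/service.conf"]


def build_repro_commands(primary_container, unit="landscape-server/0"):
    """Return the commands from the steps above, in order, as argv lists.

    primary_container is the LXC container of the current Primary
    (juju-3088be-3 in this report); it differs for every deployment.
    """
    check_ip = ["juju", "exec", "--unit", unit, "--", *CONF_CHECK]
    return [
        check_ip,                                       # IP before the failure
        ["lxc", "stop", "--force", primary_container],  # stop the Primary
        check_ip,                                       # IP after the failover
        ["lxc", "start", primary_container],            # bring the VM back
    ]


def run(commands, runner=subprocess.run):
    """Run each command in order, raising on the first non-zero exit."""
    for argv in commands:
        runner(argv, check=True)
```

A real job would still have to wait for the failover to complete between the second and third command; the sketch deliberately leaves that out.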

Expected behavior

The newly started/restored VM joins the cluster without errors.

Actual behavior

  1. Temporary error in Juju status (it goes away after some time):
    Screenshot from 2024-09-12 13-35-41

  2. Multiple errors in the debug log (we should check with the LXD team what systemd tuning is needed for a reliable mount point):

unit-postgresql-2: 13:19:53 ERROR unit.postgresql/2.juju-log Data directory not attached. Reboot unit.                                                                                                                                                  
unit-postgresql-2: 13:20:00 ERROR unit.postgresql/2.juju-log Data directory not attached. Reboot unit.                                                                                                                                                  
unit-postgresql-2: 13:20:09 ERROR unit.postgresql/2.juju-log Data directory not attached. Reboot unit.                                                                                                                                                  
unit-postgresql-2: 13:20:09 ERROR unit.postgresql/2.juju-log Data directory not attached. Reboot unit.         
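
To judge how noisy this is in a nightly run, the repeated error could be counted per unit from saved `juju debug-log` output. This is a rough sketch that assumes the line layout shown above (`unit-<name>: HH:MM:SS LEVEL <logger> <message>`); `count_transient_errors` is a hypothetical helper, not something the charm provides.

```python
import re
from collections import Counter

# Matches ERROR lines such as:
# unit-postgresql-2: 13:19:53 ERROR unit.postgresql/2.juju-log Data directory not attached. Reboot unit.
LINE_RE = re.compile(
    r"^unit-(?P<agent>\S+): (?P<time>\d{2}:\d{2}:\d{2}) ERROR \S+ (?P<msg>.*\S)\s*$"
)
TRANSIENT = "Data directory not attached. Reboot unit."


def count_transient_errors(log_text):
    """Count occurrences of the transient error per unit agent name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group("msg") == TRANSIENT:
            counts[match.group("agent")] += 1
    return counts
```

A CI check could then fail only when the count keeps growing after the unit has settled, rather than on the first occurrence.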

Versions

Operating system: 22.04
Juju CLI: 3.5.3
Juju agent: 3.5.3
Charm revision: 468/amd
LXD: 5.0.3-80aeff7

Log output

Juju debug log:

machine-3: 13:32:58 INFO juju.worker.authenticationworker "machine-3" key updater worker started                                                                                                                                                        
unit-postgresql-1: 13:32:58 INFO juju.worker.uniter unit "postgresql/1" started                                                                                                                                                                         
unit-postgresql-1: 13:32:58 INFO juju.worker.uniter hooks are retried true                                                                                                                                                                              
unit-postgresql-1: 13:32:58 INFO juju.worker.uniter reboot detected; triggering implicit start hook to notify charm                                                                                                                                     
unit-postgresql-1: 13:32:59 DEBUG unit.postgresql/1.juju-log ops 2.16.0 up and running.                                                                                                                                                                 
unit-postgresql-1: 13:32:59 INFO unit.postgresql/1.juju-log Running legacy hooks/start.                                                                                                                                                                 
unit-postgresql-1: 13:32:59 DEBUG unit.postgresql/1.juju-log ops 2.16.0 up and running.                                                                                                                                                                 
unit-postgresql-1: 13:32:59 DEBUG unit.postgresql/1.juju-log Charm called itself via hooks/start.                                                                                                                                                       
unit-postgresql-1: 13:32:59 DEBUG unit.postgresql/1.juju-log Legacy hooks/start exited with status 0.                                                                                                                                                   
unit-postgresql-1: 13:33:00 INFO unit.postgresql/1.juju-log Starting cluster topology observer process                                                                                                                                                  
unit-postgresql-1: 13:33:00 INFO unit.postgresql/1.juju-log Started cluster topology observer process with PID 786                                                                                                                                      
unit-postgresql-1: 13:33:00 DEBUG unit.postgresql/1.juju-log no relation on 'tracing': tracing not ready                                                                                                                                                
unit-postgresql-1: 13:33:00 DEBUG unit.postgresql/1.juju-log Emitting Juju event start.                                                                                                                                                                 
unit-postgresql-1: 13:33:00 DEBUG unit.postgresql/1.juju-log Deferring <StartEvent via PostgresqlOperatorCharm/on/start[240]>.                                                                                                                          
unit-postgresql-1: 13:33:00 ERROR unit.postgresql/1.juju-log Data directory not attached. Reboot unit.     

Additional context

All of these errors are cosmetic: the PostgreSQL cluster is resurrected, but it takes some time (it may have to wait for the next update-status).
We can try to polish the UX here.
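
Until that is polished, a CI test could tolerate the temporary error instead of aborting on it. The helper below is only a sketch under stated assumptions: `wait_until_settled` and `get_status` are hypothetical names, `get_status` is assumed to return the unit's workload status as a string, and the timeout and interval are arbitrary placeholders.

```python
import time


def wait_until_settled(get_status, timeout=600.0, interval=10.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "active" or the timeout expires.

    An intermediate "error" status is tolerated, because the hook failure
    described in this report disappears on its own after some time.
    Returns True if the unit settled, False if the timeout was reached.
    """
    deadline = clock() + timeout
    while True:
        if get_status() == "active":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the helper can be exercised in a unit test without real waiting.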

Also, in the jhack tail output I see many start events even though the charm is not restarting the container:
image
No idea why...

@taurus-forever taurus-forever added the bug Something isn't working label Sep 12, 2024

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/DPE-5455.

This message was autogenerated
