recover when using kubernetes statefulset #2

Open
zetaab opened this issue Sep 7, 2017 · 3 comments


@zetaab

zetaab commented Sep 7, 2017

Hi

I am trying to figure out whether this Galera setup is really "production ready", as the blog post said. However, I think there may be a logic problem here: https://github.com/severalnines/galera-docker-mariadb/blob/master/entrypoint.sh#L167-L203 The idea of that code is that if the whole cluster goes down, it checks all Galera nodes to find which one has the latest sequence number. However, with StatefulSets the pods start in order, which means that only one pod can report its status, and it will always get bootstrapped. We should be able to start all nodes under the StatefulSet and only then decide which one has the latest sequence number, right? Or am I missing something?
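
To make the concern concrete, here is a minimal sketch of a "highest sequence number wins" election. It is not the logic from entrypoint.sh; the function name `elect_bootstrap_node`, the pod names, and the sequence numbers are all made up for illustration. It only shows that when a single pod has reported, that pod wins by default, even if another pod holds newer data.

```python
def elect_bootstrap_node(reported):
    """Return the name of the node with the highest reported seqno.

    `reported` maps a node name to its last committed sequence number.
    Ties are broken by the lowest node name (an arbitrary choice made here).
    """
    if not reported:
        raise ValueError("no node has reported a sequence number")
    return min(reported, key=lambda name: (-reported[name], name))


# Hypothetical case 1: every node has reported, so the most advanced one wins.
all_nodes = {"galera-0": 120, "galera-1": 135, "galera-2": 131}

# Hypothetical case 2: ordered startup, so only the first pod has reported.
# It wins by default, although galera-1 above holds newer data.
only_first = {"galera-0": 120}

print(elect_bootstrap_node(all_nodes))
print(elect_bootstrap_node(only_first))
```

With all three pods reporting, the election picks the pod with the most recent data; with only the first pod visible, the comparison has nothing to compare against.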

@ashraf-s9s
Collaborator

Hi @zetaab, you are right. The entrypoint was built for Docker Swarm and Kubernetes ReplicaSet, but a StatefulSet must be handled differently. At the moment, this is what I can think of:

  1. During startup, check whether gcache and grastate.data exist.
  2. If they do, MySQL is started with wsrep_on=0. The pod is then seen as healthy, so the next pod (n+1) can be started.
  3. Then, every pod reports its last committed value.
  4. The winner bootstraps the cluster using SET GLOBAL wsrep_provider_options = 'pc.bootstrap=1';
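
As a rough illustration of step 3, the last committed value could be read from grastate.data, which stores simple `key: value` lines. This is only a sketch: `parse_grastate`, `last_committed`, and the sample file contents (including the UUID) are invented for this example, and how each pod would publish its value to the others is left open. Note that after an unclean shutdown the file typically holds seqno -1, in which case the position would have to be recovered some other way (for example with mysqld --wsrep-recover).

```python
def parse_grastate(text):
    """Parse the contents of a grastate.data file into a dict of strings."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def last_committed(text):
    """Return the saved seqno as an int, or -1 if the field is missing."""
    return int(parse_grastate(text).get("seqno", "-1"))


# Made-up sample contents, for illustration only.
SAMPLE = """# GALERA saved state
version: 2.1
uuid:    6b1b0e5c-93c5-11e7-8f2a-0242ac110002
seqno:   1234
safe_to_bootstrap: 0
"""

print(last_committed(SAMPLE))
```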

What do you think?

@pantaoran

But when and how do you determine the winner from your step 4?
If all pods are running happily and are seen as healthy, then no entrypoint script will run.
I think they would need to self-destruct after a while...

@syz521

syz521 commented Oct 14, 2019

I solved the problem another way.
During startup, check whether gvwstate.dat exists. If it does, start the node. Otherwise, check safe_to_bootstrap: if it is 1, bootstrap; if not, make the liveness and readiness probes return true and wait for the bootstrapper.
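
The decision described above can be written as a small function. This is a sketch of the described flow, not the actual script used; the `Action` enum and `decide_startup` are hypothetical names, and the caller is assumed to have already checked the data directory for gvwstate.dat and read safe_to_bootstrap from grastate.data.

```python
from enum import Enum


class Action(Enum):
    START = "start"          # gvwstate.dat exists: start the node
    BOOTSTRAP = "bootstrap"  # this node may bootstrap a new cluster
    WAIT = "wait"            # report healthy and wait for the bootstrapper


def decide_startup(has_gvwstate, safe_to_bootstrap):
    """Pick a startup action from the two checks described in the comment."""
    if has_gvwstate:
        return Action.START
    if safe_to_bootstrap == 1:
        return Action.BOOTSTRAP
    return Action.WAIT


print(decide_startup(True, 0))
print(decide_startup(False, 1))
print(decide_startup(False, 0))
```

In the waiting case, the liveness and readiness probes have to keep returning success so that the StatefulSet continues on to the next pod, which may be the one that is allowed to bootstrap.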
