
2 nodes of galera cluster (3 nodes totally) restart periodically #398

Closed
ybxiang opened this issue Oct 9, 2021 · 6 comments

Comments


ybxiang commented Oct 9, 2021

Dear experts:

I use the docker-stack.yml below to start a Galera cluster with 3 nodes, but 2 of the 3 nodes restart periodically:

version: '3.7'

services:

  mariadb01:
    image: mariadb:10.5
    networks:
      - terra-overlay-net
    environment:
      MYSQL_ROOT_PASSWORD: "test-root"
    command: --wsrep-new-cluster --binlog-format=ROW --wsrep-on=1 --wsrep-cluster-name=terra-mariadb-cluster --wsrep-cluster-address=gcomm://mariadb02,mariadb03,mariadb01 --wsrep-forced-binlog-format=ROW --wsrep-provider=/usr/lib/galera/libgalera_smm.so --wsrep-sst-method=rsync --wsrep-node-address=mariadb01 --wsrep-node-name=server1 --server-id=1 --bind-address=0.0.0.0 --default-storage-engine=InnoDB --innodb-autoinc-lock-mode=2
    volumes:
      - ./mariadb01-data:/var/lib/mysql
    deploy:
      mode: replicated
      replicas: 1

  mariadb02:
    image: mariadb:10.5
    depends_on:
      - mariadb01
    networks:
      - terra-overlay-net
    environment:
      MYSQL_ROOT_PASSWORD: "test-root"
    command: --binlog-format=ROW --wsrep-on=1 --wsrep-cluster-name=terra-mariadb-cluster --wsrep-cluster-address=gcomm://mariadb01,mariadb03,mariadb02 --wsrep-forced-binlog-format=ROW --wsrep-provider=/usr/lib/galera/libgalera_smm.so --wsrep-sst-method=rsync --wsrep-node-address=mariadb02 --wsrep-node-name=server2 --server-id=2 --bind-address=0.0.0.0 --default-storage-engine=InnoDB --innodb-autoinc-lock-mode=2
    volumes:
      - ./mariadb02-data:/var/lib/mysql
    deploy:
      mode: replicated
      replicas: 1

  mariadb03:
    image: mariadb:10.5
    depends_on:
      - mariadb01
    networks:
      - terra-overlay-net
    environment:
      MYSQL_ROOT_PASSWORD: "test-root"
    command: --binlog-format=ROW --wsrep-on=1 --wsrep-cluster-name=terra-mariadb-cluster --wsrep-cluster-address=gcomm://mariadb01,mariadb02,mariadb03 --wsrep-forced-binlog-format=ROW --wsrep-provider=/usr/lib/galera/libgalera_smm.so --wsrep-sst-method=rsync --wsrep-node-address=mariadb03 --wsrep-node-name=server3 --server-id=3 --bind-address=0.0.0.0 --default-storage-engine=InnoDB --innodb-autoinc-lock-mode=2
    volumes:
      - ./mariadb03-data:/var/lib/mysql
    deploy:
      mode: replicated
      replicas: 1

networks:
  terra-overlay-net:
    driver: overlay
    name: terra-overlay-net
    external: true

The commands to create the network and start the service stack:

echo "prepare clean data directories ******************************************"
rm -rf mariadb01-data  mariadb02-data  mariadb03-data
sleep 1
mkdir  mariadb01-data  mariadb02-data  mariadb03-data

echo "prepare fresh overlay network *******************************************"
docker network rm terra-overlay-net
docker network prune -f
sleep 1
docker network create -d overlay --attachable --subnet 172.16.238.0/24 terra-overlay-net
sleep 1

echo "start services **********************************************************"
docker stack deploy --compose-file=docker-stack.yml terra-mariadb-cluster

But mariadb02 and mariadb03 restart periodically with the errors below:

2021-10-09  7:08:04 0 [Note] WSREP: Flow-control interval: [23, 23]
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20211009 07:08:04.838)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20211009 07:08:05.849)
WSREP_SST: [ERROR] previous SST script still running. (20211009 07:08:05.852)
2021-10-09  7:08:05 0 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync --role 'joiner' --address 'mariadb02' --datadir '/var/lib/mysql/' --parent '1' --mysqld-args --binlog-format=ROW --wsrep-on=1 --wsrep-cluster-name=terra-mariadb-cluster --wsrep-cluster-address=gcomm://mariadb01,mariadb03,mariadb02 --wsrep-forced-binlog-format=ROW --wsrep-provider=/usr/lib/galera/libgalera_smm.so --wsrep-sst-method=rsync --wsrep-node-address=mariadb02 --wsrep-node-name=server2 --server-id=2 --bind-address=0.0.0.0 --default-storage-engine=InnoDB --innodb-autoinc-lock-mode=2
 Read: '(null)'
2021-10-09  7:08:05 0 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address 'mariadb02' --datadir '/var/lib/mysql/' --parent '1' --mysqld-args --binlog-format=ROW --wsrep-on=1 --wsrep-cluster-name=terra-mariadb-cluster --wsrep-cluster-address=gcomm://mariadb01,mariadb03,mariadb02 --wsrep-forced-binlog-format=ROW --wsrep-provider=/usr/lib/galera/libgalera_smm.so --wsrep-sst-method=rsync --wsrep-node-address=mariadb02 --wsrep-node-name=server2 --server-id=2 --bind-address=0.0.0.0 --default-storage-engine=InnoDB --innodb-autoinc-lock-mode=2: 114 (Operation already in progress)
2021-10-09  7:08:05 1 [ERROR] WSREP: Failed to prepare for 'rsync' SST. Unrecoverable.
2021-10-09  7:08:05 1 [ERROR] WSREP: SST request callback failed. This is unrecoverable, restart required.

Are there any possible bugs in the docker-stack.yml?
I have spent more than a week on this issue; please help me!


ybxiang commented Oct 9, 2021

I am wondering why the official mariadb image can NOT easily be configured as a cluster, and why we should have to use a wrapper image such as https://hub.docker.com/r/bitnami/mariadb-galera/ or https://hub.docker.com/r/toughiq/mariadb-cluster.

I can NOT find any docker-stack.yml that uses the official mariadb image to deploy a MariaDB cluster. Isn't that a bad thing?


ybxiang commented Oct 11, 2021

By chance, I checked the description of wsrep-node-address. After I removed that option, the cluster started successfully.
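For reference, a sketch of what one service looks like with that flag dropped. The explanation is an assumption on my part: under a Swarm overlay network the service name resolves to a virtual IP rather than the task's own address, so a hard-coded --wsrep-node-address can point the SST joiner at the wrong endpoint, while omitting it lets Galera auto-detect the container's real IP.

```yaml
# Sketch only: mariadb02 from the stack above, with --wsrep-node-address
# removed so Galera auto-detects the container's own overlay-network IP.
mariadb02:
  image: mariadb:10.5
  depends_on:
    - mariadb01
  networks:
    - terra-overlay-net
  environment:
    MYSQL_ROOT_PASSWORD: "test-root"
  command: >
    --binlog-format=ROW --wsrep-on=1
    --wsrep-cluster-name=terra-mariadb-cluster
    --wsrep-cluster-address=gcomm://mariadb01,mariadb03,mariadb02
    --wsrep-forced-binlog-format=ROW
    --wsrep-provider=/usr/lib/galera/libgalera_smm.so
    --wsrep-sst-method=rsync
    --wsrep-node-name=server2 --server-id=2
    --bind-address=0.0.0.0
    --default-storage-engine=InnoDB
    --innodb-autoinc-lock-mode=2
  volumes:
    - ./mariadb02-data:/var/lib/mysql
  deploy:
    mode: replicated
    replicas: 1
```

The other two services would change the same way, each keeping its own --wsrep-node-name and --server-id.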

@ybxiang ybxiang closed this as completed Oct 11, 2021
@grooverdan
Member

Glad you got it going. I do need to revisit the review of #377 soon.

@wuqiang0720

2024-12-18 13:13:21 0 [Note] WSREP: Start replication
2024-12-18 13:13:21 0 [Note] WSREP: Connecting with bootstrap option: 0
2024-12-18 13:13:21 0 [Note] WSREP: Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1
2024-12-18 13:13:21 0 [Note] WSREP: protonet asio version 0
2024-12-18 13:13:21 0 [Note] WSREP: Using CRC-32C for message checksums.
2024-12-18 13:13:21 0 [Note] WSREP: backend: asio
2024-12-18 13:13:21 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2024-12-18 13:13:21 0 [Note] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2024-12-18 13:13:21 0 [Note] WSREP: restore pc from disk failed
2024-12-18 13:13:21 0 [Note] WSREP: GMCast version 0
2024-12-18 13:13:21 0 [Note] WSREP: (daa21bca-8ab0, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2024-12-18 13:13:21 0 [Note] WSREP: (daa21bca-8ab0, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2024-12-18 13:13:21 0 [Note] WSREP: EVS version 1
2024-12-18 13:13:21 0 [Note] WSREP: gcomm: connecting to group 'openstack', peer '192.168.126.100:4567,192.168.126.100:4568,192.168.126.100:4569'
2024-12-18 13:13:21 0 [Note] WSREP: (daa21bca-8ab0, 'tcp://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address tcp://192.168.126.100:4567
2024-12-18 13:13:21 0 [Note] WSREP: (daa21bca-8ab0, 'tcp://0.0.0.0:4567') connection established to d0c11824-880c tcp://192.168.126.100:4568
2024-12-18 13:13:21 0 [Note] WSREP: (daa21bca-8ab0, 'tcp://0.0.0.0:4567') connection established to d0cbc809-b497 tcp://192.168.126.100:4569
2024-12-18 13:13:21 0 [Note] WSREP: EVS version upgrade 0 -> 1
2024-12-18 13:13:21 0 [Note] WSREP: declaring d0c11824-880c at tcp://192.168.126.100:4568 stable
2024-12-18 13:13:21 0 [Note] WSREP: declaring d0cbc809-b497 at tcp://192.168.126.100:4569 stable
2024-12-18 13:13:21 0 [Note] WSREP: PC protocol upgrade 0 -> 1
2024-12-18 13:13:21 0 [Note] WSREP: Node d0c11824-880c state prim
2024-12-18 13:13:21 0 [Note] WSREP: view(view_id(PRIM,d0c11824-880c,334) memb {
d0c11824-880c,0
d0cbc809-b497,0
daa21bca-8ab0,0
} joined {
} left {
} partitioned {
})
2024-12-18 13:13:21 0 [Note] WSREP: save pc into disk
2024-12-18 13:13:22 0 [Note] WSREP: gcomm: connected
2024-12-18 13:13:22 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2024-12-18 13:13:22 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2024-12-18 13:13:22 0 [Note] WSREP: Opened channel 'openstack'
2024-12-18 13:13:22 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 3
2024-12-18 13:13:22 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2024-12-18 13:13:22 0 [Note] WSREP: STATE EXCHANGE: sent state msg: daefd1a7-bd41-11ef-9d8d-3a9e32c104d3
2024-12-18 13:13:22 0 [Note] WSREP: STATE EXCHANGE: got state msg: daefd1a7-bd41-11ef-9d8d-3a9e32c104d3 from 0 (ubuntu-focal)
2024-12-18 13:13:22 0 [Note] WSREP: STATE EXCHANGE: got state msg: daefd1a7-bd41-11ef-9d8d-3a9e32c104d3 from 1 (ubuntu-focal)
2024-12-18 13:13:22 2 [Note] WSREP: Starting rollbacker thread 2
2024-12-18 13:13:22 0 [Note] WSREP: STATE EXCHANGE: got state msg: daefd1a7-bd41-11ef-9d8d-3a9e32c104d3 from 2 (ubuntu-focal)
2024-12-18 13:13:22 0 [Note] WSREP: Quorum results:
version = 6,
component = PRIMARY,
conf_id = 326,
members = 2/3 (joined/total),
act_id = 327,
last_appl. = 0,
protocols = 4/11/4 (gcs/repl/appl),
vote policy= 0,
group UUID = d0c2c986-bd29-11ef-90ca-faaaf15585a9
2024-12-18 13:13:22 1 [Note] WSREP: Starting applier thread 1
2024-12-18 13:13:22 0 [Note] WSREP: Flow-control interval: [28, 28]
2024-12-18 13:13:22 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 328)
2024-12-18 13:13:22 1 [Note] WSREP: ####### processing CC 328, local, ordered
2024-12-18 13:13:22 1 [Note] WSREP: Process first view: d0c2c986-bd29-11ef-90ca-faaaf15585a9 my uuid: daa21bca-bd41-11ef-8ab0-42bb73b544d0
2024-12-18 13:13:22 1 [Note] WSREP: Server ubuntu-focal connected to cluster at position d0c2c986-bd29-11ef-90ca-faaaf15585a9:328 with ID daa21bca-bd41-11ef-8ab0-42bb73b544d0
2024-12-18 13:13:22 1 [Note] WSREP: Server status change disconnected -> connected
2024-12-18 13:13:22 1 [Note] WSREP: ####### My UUID: daa21bca-bd41-11ef-8ab0-42bb73b544d0
2024-12-18 13:13:22 1 [Note] WSREP: Cert index reset to 00000000-0000-0000-0000-000000000000:-1 (proto: 11), state transfer needed: yes
2024-12-18 13:13:22 0 [Note] WSREP: Service thread queue flushed.
2024-12-18 13:13:22 1 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: -1
2024-12-18 13:13:22 1 [Note] WSREP: State transfer required:
Group state: d0c2c986-bd29-11ef-90ca-faaaf15585a9:328
Local state: 00000000-0000-0000-0000-000000000000:-1
2024-12-18 13:13:22 1 [Note] WSREP: Server status change connected -> joiner
2024-12-18 13:13:22 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'joiner' --address '192.168.126.100' --datadir '/var/lib/mysql/' --parent 1 --progress 0 --mysqld-args --wsrep_on=ON --wsrep_provider=/usr/lib/galera/libgalera_smm.so --wsrep_cluster_address=gcomm://192.168.126.100:4567,192.168.126.100:4568,192.168.126.100:4569'
2024-12-18 13:13:22 0 [Note] WSREP: Joiner monitor thread started to monitor
WSREP_SST: [INFO] mariabackup SST started on joiner (20241218 13:13:22.072)
WSREP_SST: [INFO] SSL configuration: CA='', CAPATH='', CERT='', KEY='', MODE='DISABLED', encrypt='0' (20241218 13:13:22.100)
WSREP_SST: [INFO] Progress reporting tool pv not found in path: /usr//bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/sbin:/usr/bin:/sbin:/bin (20241218 13:13:22.169)
WSREP_SST: [INFO] Disabling all progress/rate-limiting (20241218 13:13:22.171)
WSREP_SST: [INFO] Streaming with mbstream (20241218 13:13:22.183)
WSREP_SST: [INFO] Using socat as streamer (20241218 13:13:22.185)
WSREP_SST: [INFO] Stale sst_in_progress file: /var/lib/mysql/sst_in_progress (20241218 13:13:22.188)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20241218 13:13:22.209)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20241218 13:13:23.219)
2024-12-18 13:13:24 0 [Note] WSREP: (daa21bca-8ab0, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20241218 13:13:24.229)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20241218 13:13:25.240)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20241218 13:13:26.252)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20241218 13:13:27.263)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20241218 13:13:28.275)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20241218 13:13:29.289)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20241218 13:13:30.301)
WSREP_SST: [INFO] previous SST is not completed, waiting for it to exit (20241218 13:13:31.316)
WSREP_SST: [ERROR] previous SST script still running. (20241218 13:13:31.321)
2024-12-18 13:13:31 0 [ERROR] WSREP: Failed to read 'ready ' from: wsrep_sst_mariabackup --role 'joiner' --address '192.168.126.100' --datadir '/var/lib/mysql/' --parent 1 --progress 0 --mysqld-args --wsrep_on=ON --wsrep_provider=/usr/lib/galera/libgalera_smm.so --wsrep_cluster_address=gcomm://192.168.126.100:4567,192.168.126.100:4568,192.168.126.100:4569
Read: '(null)'
2024-12-18 13:13:31 0 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'joiner' --address '192.168.126.100' --datadir '/var/lib/mysql/' --parent 1 --progress 0 --mysqld-args --wsrep_on=ON --wsrep_provider=/usr/lib/galera/libgalera_smm.so --wsrep_cluster_address=gcomm://192.168.126.100:4567,192.168.126.100:4568,192.168.126.100:4569: 114 (Operation already in progress)
2024-12-18 13:13:31 1 [ERROR] WSREP: Failed to prepare for 'mariabackup' SST. Unrecoverable.
2024-12-18 13:13:31 1 [ERROR] WSREP: SST request callback failed. This is unrecoverable, restart required.
2024-12-18 13:13:31 1 [Note] WSREP: ReplicatorSMM::abort()
2024-12-18 13:13:31 1 [Note] WSREP: Closing send monitor...
2024-12-18 13:13:31 1 [Note] WSREP: Closed send monitor.
2024-12-18 13:13:31 1 [Note] WSREP: gcomm: terminating thread
2024-12-18 13:13:31 1 [Note] WSREP: gcomm: joining thread
2024-12-18 13:13:31 1 [Note] WSREP: gcomm: closing backend
2024-12-18 13:13:32 1 [Note] WSREP: view(view_id(NON_PRIM,d0c11824-880c,334) memb {
daa21bca-8ab0,0
} joined {
} left {
} partitioned {
d0c11824-880c,0
d0cbc809-b497,0
})
2024-12-18 13:13:32 1 [Note] WSREP: PC protocol downgrade 1 -> 0
2024-12-18 13:13:32 1 [Note] WSREP: view((empty))
2024-12-18 13:13:32 1 [Note] WSREP: gcomm: closed
2024-12-18 13:13:32 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2024-12-18 13:13:32 0 [Note] WSREP: Flow-control interval: [16, 16]
2024-12-18 13:13:32 0 [Note] WSREP: Received NON-PRIMARY.
2024-12-18 13:13:32 0 [Note] WSREP: Shifting PRIMARY -> OPEN (TO: 328)
2024-12-18 13:13:32 0 [Note] WSREP: New SELF-LEAVE.
2024-12-18 13:13:32 0 [Note] WSREP: Flow-control interval: [0, 0]
2024-12-18 13:13:32 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2024-12-18 13:13:32 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 328)
2024-12-18 13:13:32 0 [Note] WSREP: RECV thread exiting 0: Success
2024-12-18 13:13:32 1 [Note] WSREP: recv_thread() joined.
2024-12-18 13:13:32 1 [Note] WSREP: Closing send queue.
2024-12-18 13:13:32 1 [Note] WSREP: Closing receive queue.
2024-12-18 13:13:32 1 [Note] WSREP: mysqld: Terminated.
241218 13:13:32 [ERROR] mysqld got signal 11 ;
Sorry, we probably made a mistake, and this is a bug.

Your assistance in bug reporting will enable us to fix this for the next release.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.5.26-MariaDB-ubu2004-log source revision: 7a5b8bf0f5470a13094101f0a4bdfa9e1b9ded02
key_buffer_size=0
read_buffer_size=131072
max_used_connections=0
max_threads=65537
thread_count=3
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 144287360 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fe59c000c58
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fe5b1d85d98 thread_stack 0x49000
Printing to addr2line failed
mysqld(my_print_stacktrace+0x32)[0x557e4ba5dad2]
mysqld(handle_fatal_signal+0x475)[0x557e4b4825f5]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7fe5c0ebf420]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x213)[0x7fe5c09a2941]
/usr/lib/galera/libgalera_smm.so(+0x22b532)[0x7fe5bbf84532]
/usr/lib/galera/libgalera_smm.so(+0x6fae8)[0x7fe5bbdc8ae8]
/usr/lib/galera/libgalera_smm.so(+0x7fc95)[0x7fe5bbdd8c95]
/usr/lib/galera/libgalera_smm.so(+0x806ff)[0x7fe5bbdd96ff]
/usr/lib/galera/libgalera_smm.so(+0x80d4d)[0x7fe5bbdd9d4d]
/usr/lib/galera/libgalera_smm.so(+0xb236b)[0x7fe5bbe0b36b]
/usr/lib/galera/libgalera_smm.so(+0xb285f)[0x7fe5bbe0b85f]
/usr/lib/galera/libgalera_smm.so(+0x7ef10)[0x7fe5bbdd7f10]
/usr/lib/galera/libgalera_smm.so(+0x52581)[0x7fe5bbdab581]
mysqld(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0x12)[0x557e4bafe4c2]
mysqld(+0xcef8d1)[0x557e4b77e8d1]
mysqld(_Z15start_wsrep_THDPv+0x26f)[0x557e4b76d3ef]
mysqld(+0xc66eff)[0x557e4b6f5eff]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7fe5c0eb3609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7fe5c0a9f353]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): (null)
Connection ID (thread ID): 1
Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off

The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/ contains
information that should help you find out what is causing the crash.

We think the query pointer is invalid, but we will try to print it anyway.
Query:

Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 0 bytes
Max resident set unlimited unlimited bytes
Max processes unlimited unlimited processes
Max open files 1048576 1048576 files
Max locked memory 65536 65536 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 15312 15312 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
Core pattern: |/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E

Kernel version: Linux version 5.4.0-202-generic (buildd@lcy02-amd64-104) (gcc version 9.4.0 (Ubuntu 9.4.0-1ubuntu1~20.04.2)) #222-Ubuntu SMP Fri Nov 8 14:45:04 UTC 2024

@wuqiang0720

Does anyone know the root cause of the above issue?

@grooverdan
Member

I can't see this as an existing issue, so can you create a new issue on https://jira.mariadb.org following https://mariadb.com/kb/en/reporting-bugs?

There are two issues:

  1. The stale SST marker: /var/lib/mysql/sst_in_progress

I can't tell if there's a real possibility that this node already has an SST in progress from another donor machine; it seems unlikely. But I recommend removing this file.

  2. The crash on shutdown as a result of this sst_in_progress error.

Please include the logs from all nodes.
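A minimal sketch of the cleanup suggested above, assuming the datadir from the log (/var/lib/mysql) and that mariadbd on the stuck joiner has been stopped first:

```shell
# Sketch: clear a stale SST marker on the joiner node before restarting
# mariadbd. DATADIR defaults to the path from the log above; override it
# if your node uses a different datadir.
DATADIR="${DATADIR:-/var/lib/mysql}"
marker="$DATADIR/sst_in_progress"
if [ -f "$marker" ]; then
    rm -f "$marker"
    echo "removed stale marker: $marker"
else
    echo "no stale marker found in $DATADIR"
fi
```

After the marker is gone, restarting the node lets the SST script start cleanly instead of failing with "previous SST script still running".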

@MariaDB MariaDB locked as resolved and limited conversation to collaborators Dec 18, 2024