PG server won't start after crash #4042

zhangluva · 2024-12-05T02:23:54Z

I don't really know how to reproduce this. We have the same system and configuration deployed to multiple clusters, and only one of them has the problem. The database is used by another OTF application; the only difference is the load pattern.

Overview

When the primary server crashes, failover happens correctly, but the old primary won't start and logs the following error message. Killing the pod clears the error and lets the database container start correctly.

2024-12-04 15:35:46.387 | 2024-12-04 20:35:46.385 UTC [671109] HINT:  Terminate any old server processes associated with data directory "/pgdata/pg15". 
2024-12-04 15:35:46.387 | 2024-12-04 20:35:46,386 INFO:  stderr=2024-12-04 20:35:46.385 UTC [671109] FATAL:  pre-existing shared memory block (key 524289, ID 1) is still in use
2024-12-04 15:35:46.387 | 2024-12-04 20:35:46,386 INFO:  stdout= 
2024-12-04 15:35:46.386 | 2024-12-04 20:35:46,386 ERROR: Crash recovery finished with code=1 
2024-12-04 15:35:46.362 | 2024-12-04 20:35:46,362 INFO: doing crash recovery in a single user mode
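
For anyone triaging the same symptom, the FATAL line can be picked out of the wrapped log output mechanically. This is only a minimal sketch, not part of PGO: the function name `find_shm_conflicts` is hypothetical, and the pattern is based solely on the log lines quoted above, so other PostgreSQL versions may word the message differently.

```python
import re

# Pattern copied from the FATAL line in this report (assumption: the wording
# is stable for the PostgreSQL version in use).
SHM_IN_USE = re.compile(
    r"FATAL:\s+pre-existing shared memory block "
    r"\(key (\d+), ID (\d+)\) is still in use"
)


def find_shm_conflicts(log_lines):
    """Return a list of (key, shm_id) tuples, one per matching FATAL line."""
    conflicts = []
    for line in log_lines:
        match = SHM_IN_USE.search(line)
        if match:
            conflicts.append((int(match.group(1)), int(match.group(2))))
    return conflicts


# Two of the lines from the log excerpt above.
sample = [
    "2024-12-04 20:35:46,386 INFO:  stderr=2024-12-04 20:35:46.385 UTC "
    "[671109] FATAL:  pre-existing shared memory block "
    "(key 524289, ID 1) is still in use",
    "2024-12-04 20:35:46,386 ERROR: Crash recovery finished with code=1",
]
print(find_shm_conflicts(sample))  # [(524289, 1)]
```

The key and ID it returns are the same values shown in the FATAL message, which may help when comparing the affected pod against a healthy one.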

Environment

Kubernetes version: 1.29.10
PGO version: 5.7.0
PG server version: 15.8 (image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-15.8-2)
Server resources:

cpu requests/limits: 2000m/3000m
memory requests/limits: 2500/3000
Platform: EKS
Platform Version: 1.29.10
PGO Image Tag: ubi8-5.7.0-0
Postgres Version: ubi8-15.8-2
Storage: 10G PVC on GP3

Steps to Reproduce

REPRO


EXPECTED

If the database process keeps restarting, the liveness probe should fail so that the pod gets restarted.
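
To make the expected behavior concrete, here is a rough sketch of the kind of check I have in mind. It is not an existing PGO or Kubernetes feature: the function name `crash_loop_detected` and the default threshold of 3 are assumptions for illustration, and it only looks for the "Crash recovery finished with code=" message seen in the log above. It expects the lines in chronological order.

```python
import re

# Message text taken from the ERROR line in this report.
CRASH_RECOVERY_RESULT = re.compile(r"Crash recovery finished with code=(\d+)")


def crash_loop_detected(log_lines, threshold=3):
    """Return True if the most recent `threshold` crash-recovery results
    (in chronological order) all have a non-zero exit code."""
    consecutive_failures = 0
    for line in log_lines:
        match = CRASH_RECOVERY_RESULT.search(line)
        if not match:
            continue  # unrelated log line
        if int(match.group(1)) == 0:
            # Assumption: if a code=0 line ever appears, treat it as a
            # successful recovery and reset the count.
            consecutive_failures = 0
        else:
            consecutive_failures += 1
    return consecutive_failures >= threshold


failing = ["2024-12-04 20:35:46,386 ERROR: Crash recovery finished with code=1"]
print(crash_loop_detected(failing * 3))  # True
print(crash_loop_detected(failing * 2))  # False
```

A probe built on something like this would report failure once the restart loop is established, which would trigger the same pod restart that currently has to be done by hand.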

ACTUAL

The database process keeps restarting and the pod never reaches the ready state (4/5 containers ready). If the new primary crashes during this time, the cluster becomes unavailable.
