Cluster fails to restart if lock is held by other pod than the first one in the statefulset

Please, answer some short questions which should help us to understand your problem / question better?

- **Which image of the operator are you using?** registry.opensource.zalan.do/acid/postgres-operator:v1.8.2
- **Where do you run it - cloud or metal? Kubernetes or OpenShift?** k0s
- **Are you running Postgres Operator in production?** yes
- **Type of issue?** Bug

Some general remarks when posting a bug report:
- Please, check the operator, pod (Patroni) and postgresql logs first. When copy-pasting many log lines please do it in a separate GitHub gist together with your Postgres CRD and configuration manifest.
- If you feel this issue might be more related to the [Spilo](https://github.com/zalando/spilo/issues) docker image or [Patroni](https://github.com/zalando/patroni/issues), consider opening issues in the respective repos.

---

The cluster fails to restart if the lock is held by pod-1 while the statefulset is restarting. Pod-0 starts, recognises that pod-1 holds the lock, and waits. It eventually ends up in a state where it knows it is not up to date, and therefore keeps waiting forever.

Because the statefulset uses podManagementPolicy `OrderedReady` instead of `Parallel`, pod-1 is never started while pod-0 is not ready, so the cluster stays broken.

In my case this happened when a node failed: there was a failover to pod-1, and then both pods died.
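
To make the deadlock described above easier to follow, here is a minimal toy model of it in Python. It is not operator, Patroni or Kubernetes code. The function `simulate_restart` and its readiness rule are assumptions made for illustration only: a pod is treated as Ready if it holds the lock itself or if the lock holder is already Ready, and under `OrderedReady` a pod is only started once all pods with lower ordinals are Ready.

```python
def simulate_restart(policy, lock_holder, replicas=2):
    """Toy model: return the set of pod ordinals that become Ready
    after all pods of the statefulset were stopped.

    policy      -- "OrderedReady" or "Parallel"
    lock_holder -- ordinal of the pod that held the lock before the restart
    """
    started = set()
    ready = set()

    if policy == "Parallel":
        # All pods are started at once, regardless of readiness.
        started = set(range(replicas))

    changed = True
    while changed:
        changed = False

        if policy == "OrderedReady":
            # The next ordinal is only started when every lower ordinal is Ready.
            nxt = len(started)
            if nxt < replicas and all(i in ready for i in range(nxt)):
                started.add(nxt)
                changed = True

        for pod in started - ready:
            # Assumed readiness rule: the lock holder can resume on its own,
            # any other pod has to wait until the lock holder is Ready.
            if pod == lock_holder or lock_holder in ready:
                ready.add(pod)
                changed = True

    return ready


if __name__ == "__main__":
    for policy in ("OrderedReady", "Parallel"):
        for holder in (0, 1):
            result = sorted(simulate_restart(policy, holder))
            print(f"{policy:12} lock on pod-{holder}: ready pods = {result}")
```

In this model only the combination `OrderedReady` with the lock on pod-1 ends with no Ready pods, which matches the situation reported here. With `Parallel`, pod-1 is started alongside pod-0 and both become Ready.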


Cluster fails to restart if lock is held by other pod than the first one in the statefulset #1978
