Description
Please answer some short questions, which should help us to understand your problem / question better:
- Which image of the operator are you using? `registry.opensource.zalan.do/acid/spilo-13:2.0-p7`
- Where do you run it? Azure / AWS
- Are you running Postgres Operator in production? Yes
- Type of issue? Bug report
We are seeing Spilo pods occasionally end up in a deadlock where Patroni keeps restarting over and over, failing each time because Postgres is already running.
This issue is already known on Patroni's side: patroni/patroni#1733
However, @CyberDem0n closed it without resolution, believing the user was using Patroni incorrectly.
Unfortunately, it appears that normal postgres-operator operation can also trigger this bug.
We're not doing anything special for this to occur; it seems to just happen on its own, and it's not clear what the trigger is.
There's only one log entry prior to this occurring:
`Dec 14 17:05:17.903 aks-nodepool1-28461340-vmss000001 spilo-13 prod-celanese /run/service/patroni: finished with code=-1 signal=9`
This lines up with what @CyberDem0n mentions in the Patroni issue: `kill -9` on Patroni can trigger this bug. However, no one is manually running `kill -9` on our side; this appears to be something happening from within either Spilo or Kubernetes itself.
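For what it's worth, a SIGKILL inside a pod typically comes from either the kernel OOM killer or the container runtime (e.g. after a failed liveness probe). A sketch of how one might check, assuming a hypothetical pod name `spilo-0` in namespace `default` (adjust both to your deployment):

```shell
# Check whether Kubernetes recorded an OOM kill or restart reason for the
# container ("spilo-0" and "default" are placeholders, not real names).
kubectl describe pod spilo-0 -n default | grep -A 5 "Last State"

# Check recent pod-level events (probe failures, evictions, etc.).
kubectl get events -n default --field-selector involvedObject.name=spilo-0

# On the node itself, the kernel log shows OOM-killer activity, if any.
dmesg -T | grep -i "killed process"
```

If none of these show anything, the signal may originate from within the container (e.g. Spilo's own supervision), which would point back at the Spilo image rather than Kubernetes.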
As I mention in the other issue as well, simply running `pkill postgresql`
within the deadlocked pod is enough to fix the issue.
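For reference, that workaround can be applied without opening a shell in the pod, again assuming a hypothetical pod name `spilo-0` in namespace `default`:

```shell
# Kill the orphaned Postgres processes in the deadlocked pod so that
# Patroni's next start attempt succeeds. This is the workaround described
# above, not a fix; "spilo-0" and "default" are placeholders.
kubectl exec -n default spilo-0 -- pkill postgresql
```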
So here are the issues at play:
- What is causing this to happen in postgres-operator?
- Is this a problem with the postgres-operator, or with Kubernetes?
- How can one prevent this from happening?
- Can this be fixed on the postgres-operator side?