OOM Results in Endless Patroni Restart Loop #1725

Open
@apeschel

Please answer some short questions which should help us understand your problem / question better.

  • Which image of the operator are you using? registry.opensource.zalan.do/acid/spilo-13:2.0-p7
  • Where do you run it? Azure / AWS
  • Are you running Postgres Operator in production? Yes
  • Type of issue? Bug report

We are seeing the Spilo pods occasionally end up in a deadlock where Patroni keeps restarting over and over but fails each time because Postgres is already running.

This issue is already known on Patroni's side: patroni/patroni#1733

However, it appears that @CyberDem0n closed it without a resolution, in the belief that the user was using Patroni incorrectly.

Unfortunately, it appears that normal operation of the postgres-operator also triggers this bug.

We're not doing anything special for this to occur - it seems to just happen on its own, and it's not clear exactly what the trigger is.

There's only one log entry prior to this occurring:

Dec 14 17:05:17.903 aks-nodepool1-28461340-vmss000001   spilo-13    prod-celanese   /run/service/patroni: finished with code=-1 signal=9

This lines up with what @CyberDem0n mentions in the Patroni issue: that kill -9 on Patroni can trigger this bug. However, no one is manually running kill -9 on our side. The signal appears to come either from within Spilo or from Kubernetes itself.
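
For triage, a log line of this shape can be picked out automatically. The following is only a minimal sketch, assuming the supervisor always writes the format quoted above (`<service>: finished with code=<n> signal=<n>`). The names `EXIT_RE` and `parse_exit` are hypothetical and are not part of Spilo, Patroni, or the operator. Signal 9 (SIGKILL) is what the kernel OOM killer sends, but the log line alone cannot prove who sent it.

```python
import re
import signal

# Assumed log format, taken from the single line quoted in this report.
EXIT_RE = re.compile(
    r"(?P<service>\S+): finished with code=(?P<code>-?\d+) signal=(?P<signal>\d+)"
)

def parse_exit(line):
    """Return (service, exit_code, signal_name), or None if the line does not match."""
    m = EXIT_RE.search(line)
    if m is None:
        return None
    signum = int(m.group("signal"))
    try:
        # Translate the number into a symbolic name such as SIGKILL.
        name = signal.Signals(signum).name
    except ValueError:
        # Unknown or zero signal number: keep the raw value.
        name = f"signal {signum}"
    return m.group("service"), int(m.group("code")), name

line = (
    "Dec 14 17:05:17.903 aks-nodepool1-28461340-vmss000001   spilo-13    "
    "prod-celanese   /run/service/patroni: finished with code=-1 signal=9"
)
print(parse_exit(line))
```

A result naming SIGKILL for the patroni service, with no operator having run kill -9, is consistent with an out-of-memory kill. Confirming that would still require checking the container's termination reason or the node's kernel log.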

As I mention in the other issue as well, simply doing a pkill postgresql within the deadlocked pod is enough to fix the issue.

So here are the questions at play:

  • What is causing this to happen in postgres-operator?
  • Is this a problem with the postgres-operator, or with Kubernetes?
  • How can one prevent this from happening?
  • Can this be fixed on the postgres-operator side?

Labels: postgres, question
