Recovery fails after all nodes are powered off #2205

Open
@hshmilo

Please answer some short questions to help us understand your problem / question better.

  • Which image of the operator are you using? registry.opensource.zalan.do/acid/postgres-operator:v1.9.0
  • Where do you run it - cloud or metal? Kubernetes or OpenShift? AWS K8s
  • Are you running Postgres Operator in production? no
  • Type of issue? Bug report

Hello,
After rebooting all K8s nodes (VMs), I see that some members (pods) of the PG cluster have empty Host and State fields:

postgres@postgres-db-pg-cluster-2:~$ patronictl list
+ Cluster: postgres-db-pg-cluster (7195856528308064328) -----+----+-----------+
| Member                   | Host        | Role    | State   | TL | Lag in MB |
+--------------------------+-------------+---------+---------+----+-----------+
| postgres-db-pg-cluster-0 | x.x.x.149   | Replica | running |  6 |       128 |
| postgres-db-pg-cluster-1 | x.x.x.247   | Replica | running |  6 |       112 |
| postgres-db-pg-cluster-2 |             | Leader  |         |    |           |
+--------------------------+-------------+---------+---------+----+-----------+

All pods are in the Running state.
I can run SELECT queries against the database, but any ALTER TABLE query hangs:

postgres@postgres-db-pg-cluster-2:~$ psql -U postgres -d mydb -h postgres-db-pg-cluster
Password for user postgres:
psql (15.1 (Ubuntu 15.1-1.pgdg22.04+1), server 14.6 (Ubuntu 14.6-1.pgdg22.04+1))
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

mydb=#  ALTER TABLE mytable alter column name type ltree using name::ltree;

After I deleted the postgres-db-pg-cluster-2 pod and it was recreated, its Host and State fields were populated:

postgres@postgres-db-pg-cluster-2:~$ patronictl list
+ Cluster: postgres-db-pg-cluster (7195856528308064328) +---------+----+-----------+
| Member                   | Host        | Role         | State   | TL | Lag in MB |
+--------------------------+-------------+--------------+---------+----+-----------+
| postgres-db-pg-cluster-0 |             | Sync Standby |         |    |   unknown |
| postgres-db-pg-cluster-1 | 10.42.2.41  | Replica      | running | 10 |         0 |
| postgres-db-pg-cluster-2 | 10.42.0.114 | Leader       | running | 10 |           |
+--------------------------+-------------+--------------+---------+----+-----------+

After that, ALTER TABLE queries complete normally.
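
For anyone trying to spot this condition without reading the table by eye, here is a minimal sketch that parses the pretty-printed patronictl list output shown above and reports members whose Host or State column is empty. The helper names (parse_patronictl_table, members_missing_status) are hypothetical and not part of Patroni or the operator, and the parser assumes the exact table layout pasted in this issue.

```python
def parse_patronictl_table(text):
    """Parse the pretty-printed `patronictl list` table into a list of dicts.

    Assumes the layout shown in this issue: border lines start with '+',
    while the header row and data rows start and end with '|'.
    """
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # skip borders, the cluster title line, and blank lines
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if header is None:
            header = cells  # first '|' row holds the column names
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def members_missing_status(rows):
    """Return the names of members whose Host or State column is empty."""
    return [r["Member"] for r in rows if not r.get("Host") or not r.get("State")]


# Output captured after the reboot, copied from this issue.
SAMPLE = """\
+ Cluster: postgres-db-pg-cluster (7195856528308064328) -----+----+-----------+
| Member                   | Host        | Role    | State   | TL | Lag in MB |
+--------------------------+-------------+---------+---------+----+-----------+
| postgres-db-pg-cluster-0 | x.x.x.149   | Replica | running |  6 |       128 |
| postgres-db-pg-cluster-1 | x.x.x.247   | Replica | running |  6 |       112 |
| postgres-db-pg-cluster-2 |             | Leader  |         |    |           |
+--------------------------+-------------+---------+---------+----+-----------+
"""

print(members_missing_status(parse_patronictl_table(SAMPLE)))
# prints ['postgres-db-pg-cluster-2']
```

Running the same check on the second output, taken after the pod was recreated, would flag postgres-db-pg-cluster-0 instead, since that member is now the one with empty Host and State fields.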

I have the following manifest:

apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: postgres-db-pg-cluster
spec:
  databases: {}
  dockerImage: ghcr.io/zalando/spilo-15:2.1-p9
  enableConnectionPooler: false
  enableLogicalBackup: false
  enableMasterLoadBalancer: false
  enableReplicaLoadBalancer: false
  numberOfInstances: 3
  patroni:
    synchronous_mode: true
    synchronous_mode_strict: true
  postgresql:
    parameters:
      log_min_error_statement: debug1
      log_min_messages: debug1
      max_connections: "250"
      password_encryption: scram-sha-256
    version: "15"
