PGO no primary instance with error Readiness probe failed: HTTP probe failed with statuscode: 503 #3521

Closed
@modigithub

Description

Hello,
I'm not yet sure whether this is a bug. I installed PGO in my cluster exactly as instructed, with a few modifications to postgres.yaml. The reason I'm filing a bug report is that I can't connect to the hippo primary service.

Overview

I installed PGO in my cluster following the quickstart: https://access.crunchydata.com/documentation/postgres-operator/latest/quickstart/

Essentially everything worked, although I had to provide some PVs myself. I then set up pgAdmin to access the cluster. That's when I first noticed that the PostgreSQL instance wasn't working.
[screenshot]

At first I thought the services were not working, but they look fine.
[screenshot]

Then I checked the pods and noticed that the instances had not come up completely.
[screenshot]

Environment

  • Platform: Kubernetes (Kubespray), Debian, bare metal
  • Kubernetes Version: 1.25.4
  • PGO Image Tag: ubi8-5.3.0-0
  • Postgres Version: 14
  • Storage: hostpath

Steps to Reproduce

I installed according to the instructions:
Step 1: Download the examples
Step 2: Install PGO, the Postgres operator
My adapted postgres.yaml:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.6-2
  postgresVersion: 14
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.3.0-0
  instances:
    - name: instance1
      replicas: 2
      dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.41-2
      repos:
      - name: repo1
        volume:
          volumeClaimSpec:
            accessModes:
            - "ReadWriteOnce"
            resources:
              requests:
                storage: 1Gi
  userInterface:
    pgAdmin:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgadmin4:ubi8-4.30-4
      dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi

REPRO

No explicit error appears (apart from not being able to connect).
A hint can be found in the pod's event description:
kubectl describe pod -n postgres-operator hippo-instance1-hd2m-0
[screenshot]

EXPECTED

I want to be able to connect to the Postgres cluster.

ACTUAL

I can't connect due to the error message above.

Logs

The log shows that something is wrong. The instance apparently can't connect to anything. I have no firewall rules or other blocks in place, so I can't tell why connections are being rejected.
2023-01-05 09:41:44,512 INFO: Lock owner: None; I am hippo-instance1-hd2m-0
2023-01-05 09:41:44,512 INFO: not healthy enough for leader race
2023-01-05 09:41:44,512 INFO: restarting after failure in progress
/tmp/postgres:5432 - no response
2023-01-05 09:41:54,512 WARNING: Postgresql is not running.
2023-01-05 09:41:54,512 INFO: Lock owner: None; I am hippo-instance1-hd2m-0
2023-01-05 09:41:54,517 INFO: pg_controldata:
pg_control version number: 1300
Catalog version number: 202107181
Database system identifier: 7184294301444526168
Database cluster state: shut down in recovery
pg_control last modified: Thu Jan 5 09:18:38 2023
Latest checkpoint location: 0/C000180
Latest checkpoint's REDO location: 0/C000180
Latest checkpoint's REDO WAL file: 0000000C000000000000000C
Latest checkpoint's TimeLineID: 12
Latest checkpoint's PrevTimeLineID: 12
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:749
Latest checkpoint's NextOID: 32768
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 726
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 0
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid: 0
Latest checkpoint's newestCommitTsXid: 0
Time of latest checkpoint: Tue Jan 3 10:34:34 2023
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/C0020D0
Min recovery ending loc's timeline: 12
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: logical
wal_log_hints setting: on
max_connections setting: 100
max_worker_processes setting: 8
max_wal_senders setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 1
Mock authentication nonce: f7bbed266ef770855f400b49b8a10707c7397a9ef3cb7ee24f56f6f1f286c4e5

2023-01-05 09:41:54,529 INFO: Lock owner: None; I am hippo-instance1-hd2m-0
2023-01-05 09:41:54,623 INFO: starting as a secondary
2023-01-05 09:41:54.819 UTC [4988] LOG: pgaudit extension initialized
2023-01-05 09:41:54,824 INFO: postmaster pid=4988
2023-01-05 09:41:54.831 UTC [4988] LOG: redirecting log output to logging collector process
2023-01-05 09:41:54.831 UTC [4988] HINT: Future log output will appear in directory "log".
/tmp/postgres:5432 - no response
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
2023-01-05 09:42:04,513 INFO: Lock owner: None; I am hippo-instance1-hd2m-0
2023-01-05 09:42:04,513 INFO: not healthy enough for leader race
2023-01-05 09:42:04,566 INFO: restarting after failure in progress
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections

I only copied a small part here, since the same lines keep repeating.
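
Because the excerpt is so repetitive, a tally can make it easier to read. The following is a minimal sketch and not part of PGO or Patroni. It assumes that the timestamped `INFO`/`WARNING` lines come from Patroni and that the `/tmp/postgres:5432 - ...` lines are `pg_isready`-style status output. `summarize` and `SAMPLE` are hypothetical names, and `SAMPLE` contains a few lines copied from the log above.

```python
import re
from collections import Counter

# Assumed Patroni format: "2023-01-05 09:41:44,512 INFO: message"
PATRONI_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (INFO|WARNING|ERROR): (.*)$"
)
# Assumed pg_isready-style format: "/tmp/postgres:5432 - rejecting connections"
READY_RE = re.compile(r"^(\S+:\d+) - (.+)$")


def summarize(log_text):
    """Count Patroni messages and connection-check results in a log excerpt."""
    patroni = Counter()
    ready = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        match = PATRONI_RE.match(line)
        if match:
            patroni[(match.group(1), match.group(2))] += 1
            continue
        match = READY_RE.match(line)
        if match:
            ready[match.group(2)] += 1
    return patroni, ready


# A few lines copied from the log in this issue.
SAMPLE = """\
2023-01-05 09:41:44,512 INFO: Lock owner: None; I am hippo-instance1-hd2m-0
2023-01-05 09:41:44,512 INFO: not healthy enough for leader race
2023-01-05 09:41:44,512 INFO: restarting after failure in progress
/tmp/postgres:5432 - no response
2023-01-05 09:41:54,512 WARNING: Postgresql is not running.
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
"""

if __name__ == "__main__":
    patroni, ready = summarize(SAMPLE)
    for (level, message), count in patroni.most_common():
        print(f"{count:3d}  {level:7s} {message}")
    for status, count in ready.items():
        print(f"{count:3d}  connection check: {status}")
```

Run against the full pod log, a tally like this shows at a glance whether the instance ever leaves the "rejecting connections" state and whether any member ever holds the leader lock (`Lock owner: None` means no member does).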

Additional Information

I've been trying to solve this problem for two days now, without success. I've restarted the nodes several times, deleted the pods repeatedly, and deleted and reinstalled PGO several times.

As already mentioned, the primary instance is missing. I think it is related to this:
[screenshot]

I hope someone can help me. Unfortunately, I've run out of ideas.

In addition, the log shows the following message:
2023-01-05 09:41:54,512 WARNING: Postgresql is not running.

I didn't find anything about it in the manual. Surely I'm not expected to install and start Postgres manually?
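
For context, the same log shows that right after this warning Patroni prints the `pg_controldata` output and then starts a postmaster on its own (`starting as a secondary`, `postmaster pid=4988`). The sketch below pulls the most relevant fields out of that output. It is only an illustration: `parse_controldata`, `lsn_to_int`, and `SAMPLE` are hypothetical names, and the values in `SAMPLE` are copied from the log above.

```python
def parse_controldata(text):
    """Parse "key: value" lines from pg_controldata output into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as "0:749" keep theirs.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def lsn_to_int(lsn):
    """Convert an LSN such as "0/C000180" to an integer byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


# Values copied from the pg_controldata output in this issue.
SAMPLE = """\
Database cluster state: shut down in recovery
Latest checkpoint location: 0/C000180
Latest checkpoint's TimeLineID: 12
Minimum recovery ending location: 0/C0020D0
"""

if __name__ == "__main__":
    fields = parse_controldata(SAMPLE)
    gap = lsn_to_int(fields["Minimum recovery ending location"]) - lsn_to_int(
        fields["Latest checkpoint location"]
    )
    print("cluster state:", fields["Database cluster state"])
    print("timeline:", fields["Latest checkpoint's TimeLineID"])
    print("bytes between checkpoint and minimum recovery end:", gap)
```

The state `shut down in recovery` means this data directory was last stopped while it was in recovery, that is, while running as a standby. That is consistent with Patroni starting it as a secondary rather than as a primary.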
