Description
Hello,
I'm not yet sure whether this is a bug. I installed PGO in my cluster exactly as instructed and made a few changes to postgres.yaml. The reason I'm filing a bug report is that I can't connect to the hippo-primary service.
Overview
I installed PGO in my cluster following the quickstart: https://access.crunchydata.com/documentation/postgres-operator/latest/quickstart/
Essentially everything worked. I had to provide some PVs, and I set up pgAdmin to access the cluster. That's when I first noticed that the PostgreSQL instance wasn't working.
At first I thought the services were the problem, but they are fine.
I then checked the pods and noticed that the instance pods had not come up completely.
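For reference, the pod check described above can be sketched as follows. This is a minimal sketch, not an official procedure: the helper function names are hypothetical, and the label keys (`postgres-operator.crunchydata.com/cluster` and `postgres-operator.crunchydata.com/role=master`) are assumed to match the labels PGO v5 puts on its pods, which is worth verifying against the installed release.

```shell
# Hypothetical helpers; the label keys are assumed PGO v5 conventions.
cluster_selector() {
  printf 'postgres-operator.crunchydata.com/cluster=%s' "$1"
}

primary_selector() {
  printf '%s,postgres-operator.crunchydata.com/role=master' "$(cluster_selector "$1")"
}

# List every pod that belongs to the cluster ($1 = namespace, $2 = cluster name).
list_cluster_pods() {
  kubectl get pods -n "$1" --selector="$(cluster_selector "$2")" -o wide
}

# List only the pod that currently holds the primary role, if there is one.
list_primary_pod() {
  kubectl get pods -n "$1" --selector="$(primary_selector "$2")"
}

# Usage (needs access to the cluster):
#   list_cluster_pods postgres-operator hippo
#   list_primary_pod postgres-operator hippo
```

If the second command returns no pod, no instance has been promoted to primary, which would be consistent with a primary service that has nothing to route to.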
Environment
- Platform: Kubernetes (Kubespray), Debian, Bare Metal
- Kubernetes Version: 1.25.4
- PGO Image Tag: ubi8-5.3.0-0
- Postgres Version: 14
- Storage: hostpath
Steps to Reproduce
Installation according to the instructions:
Step 1: Download the examples
Step 2: Install PGO, the Postgres operator
Adaptation of the postgres.yaml:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: hippo
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.6-2
  postgresVersion: 14
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.3.0-0
  instances:
    - name: instance1
      replicas: 2
      dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.41-2
      repos:
      - name: repo1
        volume:
          volumeClaimSpec:
            accessModes:
            - "ReadWriteOnce"
            resources:
              requests:
                storage: 1Gi
  userInterface:
    pgAdmin:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgadmin4:ubi8-4.30-4
      dataVolumeClaimSpec:
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
REPRO
No error appears (apart from not being able to connect).
A hint lies in the pod event description:
kubectl describe pod -n postgres-operator hippo-instance1-hd2m-0
EXPECTED
I want to be able to connect to the Postgres cluster.
ACTUAL
I can't connect due to the error message above.
Logs
The log shows that something is wrong. The instance apparently can't connect to anything. I have no firewall rules or other blocks in place, so I can't say why connections are being rejected.
2023-01-05 09:41:44,512 INFO: Lock owner: None; I am hippo-instance1-hd2m-0
2023-01-05 09:41:44,512 INFO: not healthy enough for leader race
2023-01-05 09:41:44,512 INFO: restarting after failure in progress
/tmp/postgres:5432 - no response
2023-01-05 09:41:54,512 WARNING: Postgresql is not running.
2023-01-05 09:41:54,512 INFO: Lock owner: None; I am hippo-instance1-hd2m-0
2023-01-05 09:41:54,517 INFO: pg_controldata:
pg_control version number: 1300
Catalog version number: 202107181
Database system identifier: 7184294301444526168
Database cluster state: shut down in recovery
pg_control last modified: Thu Jan 5 09:18:38 2023
Latest checkpoint location: 0/C000180
Latest checkpoint's REDO location: 0/C000180
Latest checkpoint's REDO WAL file: 0000000C000000000000000C
Latest checkpoint's TimeLineID: 12
Latest checkpoint's PrevTimeLineID: 12
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0:749
Latest checkpoint's NextOID: 32768
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 726
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 0
Latest checkpoint's oldestMultiXid: 1
Latest checkpoint's oldestMulti's DB: 1
Latest checkpoint's oldestCommitTsXid: 0
Latest checkpoint's newestCommitTsXid: 0
Time of latest checkpoint: Tue Jan 3 10:34:34 2023
Fake LSN counter for unlogged rels: 0/3E8
Minimum recovery ending location: 0/C0020D0
Min recovery ending loc's timeline: 12
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
wal_level setting: logical
wal_log_hints setting: on
max_connections setting: 100
max_worker_processes setting: 8
max_wal_senders setting: 10
max_prepared_xacts setting: 0
max_locks_per_xact setting: 64
track_commit_timestamp setting: off
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Size of a large-object chunk: 2048
Date/time type storage: 64-bit integers
Float8 argument passing: by value
Data page checksum version: 1
Mock authentication nonce: f7bbed266ef770855f400b49b8a10707c7397a9ef3cb7ee24f56f6f1f286c4e5
2023-01-05 09:41:54,529 INFO: Lock owner: None; I am hippo-instance1-hd2m-0
2023-01-05 09:41:54,623 INFO: starting as a secondary
2023-01-05 09:41:54.819 UTC [4988] LOG: pgaudit extension initialized
2023-01-05 09:41:54,824 INFO: postmaster pid=4988
2023-01-05 09:41:54.831 UTC [4988] LOG: redirecting log output to logging collector process
2023-01-05 09:41:54.831 UTC [4988] HINT: Future log output will appear in directory "log".
/tmp/postgres:5432 - no response
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
2023-01-05 09:42:04,513 INFO: Lock owner: None; I am hippo-instance1-hd2m-0
2023-01-05 09:42:04,513 INFO: not healthy enough for leader race
2023-01-05 09:42:04,566 INFO: restarting after failure in progress
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
I only copied a small part here, because the same sections keep repeating.
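Because the log repeats so heavily, it can help to condense it before reading or posting it. The following is a minimal sketch under stated assumptions: `collapse_repeats` and `controldata_field` are hypothetical helper names, and the container name `database` in the usage comment is assumed to be the Postgres container of a PGO instance pod. Only consecutive identical lines collapse, so the timestamped Patroni lines stay separate while the `rejecting connections` runs shrink to one line each.

```shell
# Collapse consecutive duplicate lines, prefixing each with its repeat count.
collapse_repeats() {
  uniq -c | sed 's/^ *//'
}

# Print the value of one pg_controldata field, e.g. "Database cluster state".
controldata_field() {
  awk -v key="$1" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)            # tolerate indented log output
      if (index(line, key ":") == 1) {    # field name must start the line
        value = substr(line, length(key) + 2)
        sub(/^[ \t]+/, "", value)         # drop padding after the colon
        print value
        exit
      }
    }'
}

# Usage (needs access to the cluster; container name is an assumption):
#   kubectl logs -n postgres-operator hippo-instance1-hd2m-0 -c database | collapse_repeats
#   kubectl logs -n postgres-operator hippo-instance1-hd2m-0 -c database \
#     | controldata_field "Database cluster state"
```

Applied to the excerpt above, the second helper would print `shut down in recovery`, the state reported in the `pg_controldata` block.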
Additional Information
I've been trying to solve the problem for two days now, unfortunately without success. I've restarted the nodes several times, deleted the pods repeatedly, and deleted and reinstalled PGO several times.
As already mentioned, I'm missing the primary instance. I think it has to do with that:
I hope that someone can help me. Unfortunately I've run out of ideas.
In addition, the log shows the following message:
2023-01-05 09:41:54,512 WARNING: Postgresql is not running.
I didn't find anything about it in the manual. I assume I'm not expected to install and start Postgres manually?