Overview
I use Operator Lifecycle Manager (OLM) to manage the lifecycle of the Crunchy Postgres operator on OpenShift.
The Subscription uses channel v5, with installPlanApproval set to Automatic.
Last Friday, OLM upgraded the Crunchy operator automatically, and the operator pod has been stuck in CrashLoopBackOff since then.
The upgrade was from postgresoperator.v5.8.1 to postgresoperator.v5.8.2.
Environment
- Platform: OpenShift
- Platform Version: 4.16.39 (kubernetes 1.29.14)
- PGO Image Tag: registry.connect.redhat.com/crunchydata/postgres-operator@sha256:2e010468471f3c55acdfe67e7b71d15af973fb5708d4a1199eeace03e4da4d69
- Postgres Version: N/A
- Storage: azureDisk or EBS
Steps to Reproduce
REPRO
- Install the Crunchy operator from channel v5, pinning the starting CSV to postgresoperator.v5.8.1 and disabling automatic approval
- Approve the InstallPlan that installs v5.8.1 and confirm that this version works
- Approve the InstallPlan that upgrades to v5.8.2; the operator should then log the errors shown under Logs
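For anyone trying to reproduce this, the first step could look roughly like the Subscription below. This is only a sketch: the channel, starting CSV, and namespace come from this report, but the Subscription name, package name, catalog source, and catalog namespace are assumptions and may differ on your cluster. The OperatorGroup the namespace also needs is not shown.

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: crunchy-postgres-operator          # assumed Subscription name
  namespace: my-crunchy-postgres-operator  # namespace seen in the logs
spec:
  channel: v5
  name: crunchy-postgres-operator          # assumed package name in the catalog
  source: certified-operators              # assumed catalog source
  sourceNamespace: openshift-marketplace   # assumed catalog namespace
  installPlanApproval: Manual              # disable automatic approval
  startingCSV: postgresoperator.v5.8.1     # pin the initial install to v5.8.1
```

With Manual approval, OLM creates an InstallPlan for v5.8.1 first and a second one for v5.8.2, and each has to be approved explicitly.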
EXPECTED
The operator upgrades successfully and starts.
ACTUAL
The operator gets stuck during the upgrade and its pod ends up in CrashLoopBackOff.
Logs
time="2025-05-19T08:12:52Z" level=debug msg="debug flag set to true" version=5.8.2-0
time="2025-05-19T08:12:52Z" level=info msg="feature gates" PGO_FEATURE_GATES= enabled="AutoCreateUserSchema=true,InstanceSidecars=true,PGUpgradeCPUConcurrency=true" version=5.8.2-0
time="2025-05-19T08:12:52Z" level=debug msg="Found APIs" index_size=377 version=5.8.2-0
time="2025-05-19T08:12:52Z" level=info msg="connected to Kubernetes" api=v1.29.14+7cf4c05 openshift=true version=5.8.2-0
time="2025-05-19T08:12:52Z" level=info msg="upgrade checking enabled" version=5.8.2-0
time="2025-05-19T08:12:52Z" level=info msg="Starting metrics server" version=5.8.2-0
time="2025-05-19T08:12:52Z" level=info msg="enabling metrics via http/1.1" version=5.8.2-0
time="2025-05-19T08:12:52Z" level=info msg="starting server" addr="[::]:8081" name="health probe" version=5.8.2-0
I0519 08:12:52.814126 1 leaderelection.go:254] attempting to acquire leader lease my-crunchy-postgres-operator/cpk-leader-election-lease...
time="2025-05-19T08:12:53Z" level=info msg="Serving metrics server" bindAddress=":8443" secure=true version=5.8.2-0
I0519 08:13:10.270440 1 leaderelection.go:268] successfully acquired lease my-crunchy-postgres-operator/cpk-leader-election-lease
time="2025-05-19T08:13:10Z" level=debug msg="pgo-5b7547d7cd-pgjxg_71c5e0ae-acd2-45a2-8d2c-1f9ad4993516 became leader" object="{Lease my-crunchy-postgres-operator cpk-leader-election-lease 59a9eb1b-22f9-4b8d-be38-537d6e2a0faf coordination.k8s.io/v1 2280434222 }" reason=LeaderElection type=Normal version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=pgupgrade controllerGroup=postgres-operator.crunchydata.com controllerKind=PGUpgrade source="kind source: *v1beta1.PGUpgrade" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=pgupgrade controllerGroup=postgres-operator.crunchydata.com controllerKind=PGUpgrade source="kind source: *v1.Job" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=pgupgrade controllerGroup=postgres-operator.crunchydata.com controllerKind=PGUpgrade source="kind source: *v1beta1.PostgresCluster" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting Controller" controller=pgupgrade controllerGroup=postgres-operator.crunchydata.com controllerKind=PGUpgrade version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1beta1.PostgresCluster" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.ConfigMap" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=pgadmin controllerGroup=postgres-operator.crunchydata.com controllerKind=PGAdmin source="kind source: *v1beta1.PGAdmin" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=pgadmin controllerGroup=postgres-operator.crunchydata.com controllerKind=PGAdmin source="kind source: *v1.ConfigMap" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=pgadmin controllerGroup=postgres-operator.crunchydata.com controllerKind=PGAdmin source="kind source: *v1.PersistentVolumeClaim" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=crunchybridgecluster controllerGroup=postgres-operator.crunchydata.com controllerKind=CrunchyBridgeCluster source="kind source: *v1beta1.CrunchyBridgeCluster" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=pgadmin controllerGroup=postgres-operator.crunchydata.com controllerKind=PGAdmin source="kind source: *v1.Secret" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=pgadmin controllerGroup=postgres-operator.crunchydata.com controllerKind=PGAdmin source="kind source: *v1.StatefulSet" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=crunchybridgecluster controllerGroup=postgres-operator.crunchydata.com controllerKind=CrunchyBridgeCluster source="kind source: *v1.Secret" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Endpoints" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.PersistentVolumeClaim" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Secret" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=pgadmin controllerGroup=postgres-operator.crunchydata.com controllerKind=PGAdmin source="kind source: *v1.Service" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=pgadmin controllerGroup=postgres-operator.crunchydata.com controllerKind=PGAdmin source="kind source: *v1beta1.PostgresCluster" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=pgadmin controllerGroup=postgres-operator.crunchydata.com controllerKind=PGAdmin source="kind source: *v1.Secret" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=crunchybridgecluster controllerGroup=postgres-operator.crunchydata.com controllerKind=CrunchyBridgeCluster source="kind source: *v1.Secret" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting Controller" controller=pgadmin controllerGroup=postgres-operator.crunchydata.com controllerKind=PGAdmin version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Service" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.ServiceAccount" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Deployment" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.StatefulSet" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=crunchybridgecluster controllerGroup=postgres-operator.crunchydata.com controllerKind=CrunchyBridgeCluster source="every 5m0s" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Job" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Role" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.RoleBinding" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.CronJob" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.PodDisruptionBudget" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Pod" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.StatefulSet" version=5.8.2-0
time="2025-05-19T08:13:10Z" level=info msg="Starting Controller" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster version=5.8.2-0
W0519 08:13:10.300554 1 reflector.go:561] k8s.io/client-go@v0.31.0/tools/cache/reflector.go:243: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:my-crunchy-postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0519 08:13:10.300658 1 reflector.go:158] "Unhandled Error" err="k8s.io/client-go@v0.31.0/tools/cache/reflector.go:243: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User \"system:serviceaccount:my-crunchy-postgres-operator:pgo\" cannot list resource \"poddisruptionbudgets\" in API group \"policy\" at the cluster scope" logger="UnhandledError"
time="2025-05-19T08:13:10Z" level=info msg="Starting Controller" controller=crunchybridgecluster controllerGroup=postgres-operator.crunchydata.com controllerKind=CrunchyBridgeCluster version=5.8.2-0
W0519 08:13:11.724946 1 reflector.go:561] k8s.io/client-go@v0.31.0/tools/cache/reflector.go:243: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:my-crunchy-postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0519 08:13:11.725000 1 reflector.go:158] "Unhandled Error" err="k8s.io/client-go@v0.31.0/tools/cache/reflector.go:243: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User \"system:serviceaccount:my-crunchy-postgres-operator:pgo\" cannot list resource \"poddisruptionbudgets\" in API group \"policy\" at the cluster scope" logger="UnhandledError"
W0519 08:13:13.460489 1 reflector.go:561] k8s.io/client-go@v0.31.0/tools/cache/reflector.go:243: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:my-crunchy-postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0519 08:13:13.460553 1 reflector.go:158] "Unhandled Error" err="k8s.io/client-go@v0.31.0/tools/cache/reflector.go:243: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User \"system:serviceaccount:my-crunchy-postgres-operator:pgo\" cannot list resource \"poddisruptionbudgets\" in API group \"policy\" at the cluster scope" logger="UnhandledError"
W0519 08:13:18.918743 1 reflector.go:561] k8s.io/client-go@v0.31.0/tools/cache/reflector.go:243: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:my-crunchy-postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0519 08:13:18.918803 1 reflector.go:158] "Unhandled Error" err="k8s.io/client-go@v0.31.0/tools/cache/reflector.go:243: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User \"system:serviceaccount:my-crunchy-postgres-operator:pgo\" cannot list resource \"poddisruptionbudgets\" in API group \"policy\" at the cluster scope" logger="UnhandledError"
W0519 08:13:29.043899 1 reflector.go:561] k8s.io/client-go@v0.31.0/tools/cache/reflector.go:243: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:my-crunchy-postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0519 08:13:29.043950 1 reflector.go:158] "Unhandled Error" err="k8s.io/client-go@v0.31.0/tools/cache/reflector.go:243: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User \"system:serviceaccount:my-crunchy-postgres-operator:pgo\" cannot list resource \"poddisruptionbudgets\" in API group \"policy\" at the cluster scope" logger="UnhandledError"
time="2025-05-19T08:13:38Z" level=debug msg="could not complete upgrade check" response="Get \"https://operator-maestro.crunchydata.com/pgo-versions\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)" version=5.8.2-0
W0519 08:13:45.065829 1 reflector.go:561] k8s.io/client-go@v0.31.0/tools/cache/reflector.go:243: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:my-crunchy-postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0519 08:13:45.065877 1 reflector.go:158] "Unhandled Error" err="k8s.io/client-go@v0.31.0/tools/cache/reflector.go:243: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User \"system:serviceaccount:my-crunchy-postgres-operator:pgo\" cannot list resource \"poddisruptionbudgets\" in API group \"policy\" at the cluster scope" logger="UnhandledError"
W0519 08:14:25.821900 1 reflector.go:561] k8s.io/client-go@v0.31.0/tools/cache/reflector.go:243: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:my-crunchy-postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0519 08:14:25.821954 1 reflector.go:158] "Unhandled Error" err="k8s.io/client-go@v0.31.0/tools/cache/reflector.go:243: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User \"system:serviceaccount:my-crunchy-postgres-operator:pgo\" cannot list resource \"poddisruptionbudgets\" in API group \"policy\" at the cluster scope" logger="UnhandledError"
time="2025-05-19T08:15:10Z" level=error msg="Could not wait for Cache to sync" controller=pgadmin controllerGroup=postgres-operator.crunchydata.com controllerKind=PGAdmin error="failed to wait for pgadmin caches to sync: timed out waiting for cache to be synced for Kind *v1beta1.PGAdmin" version=5.8.2-0
time="2025-05-19T08:15:10Z" level=info msg="Stopping and waiting for non leader election runnables" version=5.8.2-0
time="2025-05-19T08:15:10Z" level=info msg="Stopping and waiting for leader election runnables" version=5.8.2-0
time="2025-05-19T08:15:10Z" level=error msg="Could not wait for Cache to sync" controller=pgupgrade controllerGroup=postgres-operator.crunchydata.com controllerKind=PGUpgrade error="failed to wait for pgupgrade caches to sync: timed out waiting for cache to be synced for Kind *v1beta1.PGUpgrade" version=5.8.2-0
time="2025-05-19T08:15:10Z" level=error msg="Could not wait for Cache to sync" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster error="failed to wait for postgrescluster caches to sync: cache did not sync" version=5.8.2-0
time="2025-05-19T08:15:10Z" level=error msg="error received after stop sequence was engaged" error="failed to wait for pgupgrade caches to sync: timed out waiting for cache to be synced for Kind *v1beta1.PGUpgrade" version=5.8.2-0
time="2025-05-19T08:15:10Z" level=error msg="error received after stop sequence was engaged" error="failed to wait for postgrescluster caches to sync: cache did not sync" version=5.8.2-0
time="2025-05-19T08:15:10Z" level=error msg="failed to get informer from cache" error="Timeout: failed waiting for *v1.PodDisruptionBudget Informer to sync" version=5.8.2-0
time="2025-05-19T08:15:10Z" level=error msg="Could not wait for Cache to sync" controller=crunchybridgecluster controllerGroup=postgres-operator.crunchydata.com controllerKind=CrunchyBridgeCluster error="failed to wait for crunchybridgecluster caches to sync: cache did not sync" version=5.8.2-0
time="2025-05-19T08:15:10Z" level=info msg="Stopping and waiting for caches" version=5.8.2-0
time="2025-05-19T08:15:10Z" level=error msg="error received after stop sequence was engaged" error="failed to wait for crunchybridgecluster caches to sync: cache did not sync" version=5.8.2-0
time="2025-05-19T08:15:10Z" level=info msg="Stopping and waiting for webhooks" version=5.8.2-0
time="2025-05-19T08:15:10Z" level=info msg="Stopping and waiting for HTTP servers" version=5.8.2-0
time="2025-05-19T08:15:10Z" level=info msg="Shutting down metrics server with timeout of 1 minute" version=5.8.2-0
time="2025-05-19T08:15:10Z" level=info msg="shutting down server" addr="[::]:8081" name="health probe" version=5.8.2-0
time="2025-05-19T08:15:10Z" level=info msg="Wait completed, proceeding to shutdown the manager" version=5.8.2-0
time="2025-05-19T08:15:10Z" level=error msg="error received after stop sequence was engaged" error="leader election lost" version=5.8.2-0
time="2025-05-19T08:15:10Z" level=debug msg="pgo-5b7547d7cd-pgjxg_71c5e0ae-acd2-45a2-8d2c-1f9ad4993516 stopped leading" object="{Lease my-crunchy-postgres-operator cpk-leader-election-lease 59a9eb1b-22f9-4b8d-be38-537d6e2a0faf coordination.k8s.io/v1 2280437629 }" reason=LeaderElection type=Normal version=5.8.2-0
time="2025-05-19T08:15:52Z" level=info msg="received signal from OS" signal=terminated version=5.8.2-0
time="2025-05-19T08:15:52Z" level=info msg="shutting down" version=5.8.2-0
time="2025-05-19T08:15:52Z" level=error msg="shutdown failed" error="failed to wait for pgadmin caches to sync: timed out waiting for cache to be synced for Kind *v1beta1.PGAdmin" version=5.8.2-0
Additional Information
I found a ClusterRole and ClusterRoleBinding that were created on the day the upgrade was triggered.
That ClusterRole contains the following rule:
    - apiGroups:
      - policy/v1
      resources:
      - poddisruptionbudgets
      verbs:
      - create
      - delete
      - get
      - list
      - patch
      - watch
The policy/v1 value looks like a mistake: shouldn't the API group be just policy, without the version?
If I duplicate the ClusterRole and ClusterRoleBinding under new names and change the API group to policy, the operator starts just fine.
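As a rough illustration of that workaround, the manifests below grant the PodDisruptionBudget permissions with the API group written as policy. This is a sketch, not the exact objects I created: the name pgo-pdb-workaround is hypothetical, and only the PodDisruptionBudget rule is shown, whereas I duplicated the full ClusterRole. The service account name and namespace are taken from the "forbidden" messages in the logs.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pgo-pdb-workaround   # hypothetical name
rules:
  - apiGroups:
      - policy               # group name only, no "/v1" version suffix
    resources:
      - poddisruptionbudgets
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pgo-pdb-workaround   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: pgo-pdb-workaround
subjects:
  - kind: ServiceAccount
    name: pgo                                # from the "forbidden" log messages
    namespace: my-crunchy-postgres-operator  # from the "forbidden" log messages
```

Because RBAC rules are additive, an extra role like this should not conflict with the OLM-generated one, but OLM does not manage it, so it would need to be cleaned up by hand once the bundle is fixed.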