Bug(?): invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source #3364

Closed
@Richard87

Description

I had some issues with restore Jobs hanging, so I deleted the cluster-related pods. Since then nothing works, and PGO logs the following:

time="2022-08-26T14:02:56Z" level=debug msg="debug flag set to true" file="cmd/postgres-operator/main.go:63" func=main.main version=5.1.0-0
I0826 14:02:57.301458       1 request.go:655] Throttling request took 1.034908398s, request: GET:https://10.56.0.1:443/apis/acme.cert-manager.io/v1beta1?timeout=32s
time="2022-08-26T14:02:58Z" level=info msg="metrics server is starting to listen" addr=":8080" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/log/deleg.go:130" func="log.(*DelegatingLogger).Info" version=5.1.0-0
time="2022-08-26T14:02:58Z" level=info msg="starting controller runtime manager and will wait for signal to exit" file="cmd/postgres-operator/main.go:84" func=main.main version=5.1.0-0
time="2022-08-26T14:02:58Z" level=info msg="upgrade checking enabled" file="cmd/postgres-operator/main.go:89" func=main.main version=5.1.0-0
time="2022-08-26T14:02:58Z" level=info msg="starting metrics server" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/manager/internal.go:385" func="manager.(*controllerManager).serveMetrics.func2" path=/metrics version=5.1.0-0
time="2022-08-26T14:02:58Z" level=info msg="Starting EventSource" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:165" func="controller.(*Controller).Start.func1" reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster source="kind source: /, Kind=" version=5.1.0-0
time="2022-08-26T14:02:58Z" level=info msg="Starting EventSource" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:165" func="controller.(*Controller).Start.func1" reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster source="kind source: /, Kind=" version=5.1.0-0
time="2022-08-26T14:02:58Z" level=info msg="Starting EventSource" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:165" func="controller.(*Controller).Start.func1" reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster source="kind source: /, Kind=" version=5.1.0-0
time="2022-08-26T14:02:58Z" level=info msg="Starting EventSource" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:165" func="controller.(*Controller).Start.func1" reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster source="kind source: /, Kind=" version=5.1.0-0
time="2022-08-26T14:02:58Z" level=info msg="Starting EventSource" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:165" func="controller.(*Controller).Start.func1" reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster source="kind source: /, Kind=" version=5.1.0-0
time="2022-08-26T14:02:58Z" level=info msg="Starting EventSource" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:165" func="controller.(*Controller).Start.func1" reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster source="kind source: /, Kind=" version=5.1.0-0
time="2022-08-26T14:02:58Z" level=info msg="Starting EventSource" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:165" func="controller.(*Controller).Start.func1" reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster source="kind source: /, Kind=" version=5.1.0-0
time="2022-08-26T14:02:59Z" level=info msg="Starting EventSource" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:165" func="controller.(*Controller).Start.func1" reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster source="kind source: /, Kind=" version=5.1.0-0
time="2022-08-26T14:02:59Z" level=info msg="{\"pgo_versions\":[{\"tag\":\"v5.1.0\"},{\"tag\":\"v5.0.5\"},{\"tag\":\"v5.0.4\"},{\"tag\":\"v5.0.3\"},{\"tag\":\"v5.0.2\"},{\"tag\":\"v5.0.1\"},{\"tag\":\"v5.0.0\"}]}" X-Crunchy-Client-Metadata="{\"deployment_id\":\"46fad48b-2358-4353-969c-e69550caff50\",\"kubernetes_env\":\"v1.21.14-gke.700\",\"pgo_clusters_total\":9,\"pgo_version\":\"5.1.0-0\",\"is_open_shift\":false}" file="internal/upgradecheck/http.go:181" func=upgradecheck.CheckForUpgradesScheduler version=5.1.0-0
time="2022-08-26T14:02:59Z" level=info msg="Starting EventSource" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:165" func="controller.(*Controller).Start.func1" reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster source="kind source: /, Kind=" version=5.1.0-0
time="2022-08-26T14:02:59Z" level=info msg="Starting EventSource" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:165" func="controller.(*Controller).Start.func1" reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster source="kind source: /, Kind=" version=5.1.0-0
time="2022-08-26T14:02:59Z" level=info msg="Starting EventSource" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:165" func="controller.(*Controller).Start.func1" reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster source="kind source: /, Kind=" version=5.1.0-0
time="2022-08-26T14:02:59Z" level=info msg="Starting EventSource" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:165" func="controller.(*Controller).Start.func1" reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster source="kind source: /, Kind=" version=5.1.0-0
time="2022-08-26T14:02:59Z" level=info msg="Starting EventSource" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:165" func="controller.(*Controller).Start.func1" reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster source="kind source: /, Kind=" version=5.1.0-0
time="2022-08-26T14:02:59Z" level=info msg="Starting EventSource" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:165" func="controller.(*Controller).Start.func1" reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster source="kind source: /, Kind=" version=5.1.0-0
time="2022-08-26T14:02:59Z" level=info msg="Starting EventSource" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:165" func="controller.(*Controller).Start.func1" reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster source="kind source: /, Kind=" version=5.1.0-0
time="2022-08-26T14:02:59Z" level=info msg="Starting EventSource" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:165" func="controller.(*Controller).Start.func1" reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster source="kind source: /, Kind=" version=5.1.0-0
time="2022-08-26T14:02:59Z" level=info msg="Starting Controller" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:173" func="controller.(*Controller).Start.func1" reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
time="2022-08-26T14:02:59Z" level=info msg="Starting workers" file="sigs.k8s.io/controller-runtime@v0.8.3/pkg/internal/controller/controller.go:211" func="controller.(*Controller).Start.func1" reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0 worker count=2
time="2022-08-26T14:02:59Z" level=error msg="Reconciler error" error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" file="internal/controller/postgrescluster/cluster.go:277" func="postgrescluster.(*Reconciler).reconcileDataSource" name=eportaldb namespace=eportal reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
time="2022-08-26T14:02:59Z" level=error msg="Reconciler error" error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" file="internal/controller/postgrescluster/cluster.go:277" func="postgrescluster.(*Reconciler).reconcileDataSource" name=eportaldb namespace=default reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
time="2022-08-26T14:02:59Z" level=error msg="Reconciler error" error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" file="internal/controller/postgrescluster/cluster.go:277" func="postgrescluster.(*Reconciler).reconcileDataSource" name=eportaldb namespace=staging reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
time="2022-08-26T14:02:59Z" level=error msg="Reconciler error" error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" file="internal/controller/postgrescluster/cluster.go:277" func="postgrescluster.(*Reconciler).reconcileDataSource" name=eportaldb namespace=richard reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
time="2022-08-26T14:02:59Z" level=error msg="Reconciler error" error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" file="internal/controller/postgrescluster/cluster.go:277" func="postgrescluster.(*Reconciler).reconcileDataSource" name=eportaldb namespace=maja reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
time="2022-08-26T14:02:59Z" level=error msg="Reconciler error" error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" file="internal/controller/postgrescluster/cluster.go:277" func="postgrescluster.(*Reconciler).reconcileDataSource" name=eportaldb namespace=staging-153 reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
time="2022-08-26T14:02:59Z" level=error msg="Reconciler error" error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" file="internal/controller/postgrescluster/cluster.go:277" func="postgrescluster.(*Reconciler).reconcileDataSource" name=eportaldb namespace=richard2 reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
time="2022-08-26T14:02:59Z" level=error msg="Reconciler error" error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" file="internal/controller/postgrescluster/cluster.go:277" func="postgrescluster.(*Reconciler).reconcileDataSource" name=eportaldb namespace=morten reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
time="2022-08-26T14:02:59Z" level=error msg="Reconciler error" error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" file="internal/controller/postgrescluster/cluster.go:277" func="postgrescluster.(*Reconciler).reconcileDataSource" name=eportaldb namespace=eportal reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
time="2022-08-26T14:02:59Z" level=error msg="Reconciler error" error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" file="internal/controller/postgrescluster/cluster.go:277" func="postgrescluster.(*Reconciler).reconcileDataSource" name=eportaldb namespace=staging-151 reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
time="2022-08-26T14:02:59Z" level=error msg="Reconciler error" error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" file="internal/controller/postgrescluster/cluster.go:277" func="postgrescluster.(*Reconciler).reconcileDataSource" name=eportaldb namespace=default reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
time="2022-08-26T14:02:59Z" level=error msg="Reconciler error" error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" file="internal/controller/postgrescluster/cluster.go:277" func="postgrescluster.(*Reconciler).reconcileDataSource" name=eportaldb namespace=richard reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
time="2022-08-26T14:03:00Z" level=error msg="Reconciler error" error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" file="internal/controller/postgrescluster/cluster.go:277" func="postgrescluster.(*Reconciler).reconcileDataSource" name=eportaldb namespace=staging reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
time="2022-08-26T14:03:00Z" level=error msg="Reconciler error" error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" file="internal/controller/postgrescluster/cluster.go:277" func="postgrescluster.(*Reconciler).reconcileDataSource" name=eportaldb namespace=staging-153 reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
time="2022-08-26T14:03:00Z" level=error msg="Reconciler error" error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" file="internal/controller/postgrescluster/cluster.go:277" func="postgrescluster.(*Reconciler).reconcileDataSource" name=eportaldb namespace=maja reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
time="2022-08-26T14:03:00Z" level=error msg="Reconciler error" error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" file="internal/controller/postgrescluster/cluster.go:277" func="postgrescluster.(*Reconciler).reconcileDataSource" name=eportaldb namespace=richard2 reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
time="2022-08-26T14:03:00Z" level=error msg="Reconciler error" error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" file="internal/controller/postgrescluster/cluster.go:277" func="postgrescluster.(*Reconciler).reconcileDataSource" name=eportaldb namespace=morten reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
time="2022-08-26T14:03:00Z" level=error msg="Reconciler error" error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" file="internal/controller/postgrescluster/cluster.go:277" func="postgrescluster.(*Reconciler).reconcileDataSource" name=eportaldb namespace=staging-151 reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
time="2022-08-26T14:03:00Z" level=error msg="Reconciler error" error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" file="internal/controller/postgrescluster/cluster.go:277" func="postgrescluster.(*Reconciler).reconcileDataSource" name=eportaldb namespace=eportal reconciler group=postgres-operator.crunchydata.com reconciler kind=PostgresCluster version=5.1.0-0
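To see at a glance which clusters are affected, the log above can be tallied by namespace. The following is only a rough sketch, assuming the operator log has been saved as logfmt-style text like the lines shown; the helper names `parse_logfmt` and `count_restore_job_errors` are made up for illustration and are not part of PGO, and the sample lines are abbreviated copies of the entries above.

```python
import shlex
from collections import Counter

# Error text taken from the PGO log above.
TARGET = "invalid number of restore Jobs found"


def parse_logfmt(line):
    """Split one logfmt-style line into a dict of key=value fields.

    Bare words without '=' (for example the stray 'reconciler' tokens)
    are ignored. Lines that cannot be tokenized yield an empty dict.
    """
    try:
        tokens = shlex.split(line)
    except ValueError:
        return {}
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def count_restore_job_errors(lines):
    """Count error-level entries containing TARGET, grouped by namespace."""
    counts = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("level") == "error" and TARGET in fields.get("error", ""):
            counts[fields.get("namespace", "<unknown>")] += 1
    return counts


# Abbreviated sample lines (some fields dropped for brevity).
SAMPLE = [
    'time="2022-08-26T14:02:58Z" level=info msg="starting metrics server" path=/metrics version=5.1.0-0',
    'time="2022-08-26T14:02:59Z" level=error msg="Reconciler error" '
    'error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" '
    'name=eportaldb namespace=eportal reconciler kind=PostgresCluster version=5.1.0-0',
    'time="2022-08-26T14:02:59Z" level=error msg="Reconciler error" '
    'error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" '
    'name=eportaldb namespace=default reconciler kind=PostgresCluster version=5.1.0-0',
    'time="2022-08-26T14:03:00Z" level=error msg="Reconciler error" '
    'error="invalid number of restore Jobs found when attempting to reconcile a pgBackRest data source" '
    'name=eportaldb namespace=eportal reconciler kind=PostgresCluster version=5.1.0-0',
]

if __name__ == "__main__":
    for namespace, count in sorted(count_restore_job_errors(SAMPLE).items()):
        print(namespace, count)
```

To run it against a real log, replace `SAMPLE` with the lines of the saved log file. In the log above, the error shows up for the `eportaldb` cluster in nine distinct namespaces, which matches the `pgo_clusters_total` value of 9 reported in the upgrade-check line.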

Almost nothing works now:

  • Restoring a database: nothing happens.
  • Deleting a PostgresCluster: this works!
  • Creating a new PostgresCluster (recreating the deleted one): the new cluster is not created, no restore Job starts, and no repo host StatefulSet or other resources appear.

Environment

  • Platform: GKE 1.21.14-gke.700
  • PGO Image Tag: registry.developers.crunchydata.com/crunchydata/postgres-operator:ubi8-5.1.0-0
  • Postgres Version: 14.1
  • Storage: kubernetes.io/gce-pd
