
pgbackrest cronjob concurrency - is another pgBackRest process running? #3145

Open
@djtaylor


Overview

I have full and incremental backup schedules defined for a Postgres cluster. I consistently see more than one Job scheduled per run, which fills the namespace with pods in an Error state. The duplicate Jobs (the ones that did not start first) fail with this error:

[ERROR: [050]: unable to acquire lock on file '/tmp/pgbackrest/db-backup.lock': Resource temporarily unavailable\n HINT: is another pgBackRest process running?\n]

As far as I can tell, this is not preventing backups from running: at least one of the Jobs succeeds, and only the others are left in an Error state.
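The error pattern fits an exclusive, non-blocking lock on the per-stanza lock file (`db-backup.lock` for `--stanza=db`). The second process asking for the lock is refused immediately instead of waiting, which is why the duplicate pod fails within the same second it starts. The following is a minimal Python model of that behaviour, not pgBackRest's own code (pgBackRest is written in C). It assumes a `flock()` with `LOCK_EX | LOCK_NB`. The helpers `try_acquire` and `simulate_two_jobs` are hypothetical names used only for illustration.

```python
import errno
import fcntl
import os
import tempfile


def try_acquire(path):
    """Hypothetical helper: open `path` and try to take an exclusive,
    non-blocking flock(). Returns the fd on success, or None if another
    open file description already holds the lock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o640)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        # EWOULDBLOCK (equal to EAGAIN on Linux) is reported as
        # "Resource temporarily unavailable", the text seen in the error.
        if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return None
        raise
    return fd


def simulate_two_jobs():
    """Model two Job pods racing for the same lock file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "db-backup.lock")
        first = try_acquire(path)    # first pod wins the lock
        second = try_acquire(path)   # duplicate pod is refused at once
        outcome = [first is not None, second is not None]
        for fd in (first, second):
            if fd is not None:
                os.close(fd)         # closing the fd releases the lock
        third = try_acquire(path)    # a later run succeeds once the lock is free
        outcome.append(third is not None)
        if third is not None:
            os.close(third)
        return tuple(outcome)


if __name__ == "__main__":
    print(simulate_two_jobs())
```

Because the lock is never waited on, the losing pod exits with an error instead of queuing behind the winner. This matches the report that backups still complete while extra pods pile up in Error.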

Environment


  • Platform: Kubernetes
  • Platform Version: 1.20.15
  • PGO Image Tag: ubi8-5.0.4-0
  • Postgres Version: 13
  • Storage: iSCSI PVCs for pods, S3 for pgBackRest

Steps to Reproduce

REPRO

  1. Build a Postgres cluster with the operator, using S3 as the pgBackRest repository backend
  2. Define a full and an incremental backup schedule
  3. Observe the full backup Jobs conflicting with each other

EXPECTED

  1. I would expect only one Job to be created per schedule invocation, and no new Job to start until it has finished

ACTUAL

  1. More than one Job is created, causing the errors shown above

Logs

time="2022-04-05T01:00:19Z" level=info msg="crunchy-pgbackrest starts"
time="2022-04-05T01:00:19Z" level=info msg="debug flag set to false"
time="2022-04-05T01:00:19Z" level=info msg="backrest backup command requested"
time="2022-04-05T01:00:19Z" level=info msg="command to execute is [pgbackrest backup --stanza=db --repo=1 --type=full]"
time="2022-04-05T01:00:19Z" level=info msg="output=[]"
time="2022-04-05T01:00:19Z" level=info msg="stderr=[ERROR: [050]: unable to acquire lock on file '/tmp/pgbackrest/db-backup.lock': Resource temporarily unavailable\n       HINT: is another pgBackRest process running?\n]"
time="2022-04-05T01:00:19Z" level=fatal msg="command terminated with exit code 50"

Additional Information

When inspecting the CronJob definition, I see that ConcurrencyPolicy is set to Allow. Would it not make more sense to set it to Forbid (or at least expose it as an option in the CRD) to prevent this kind of scheduling conflict?
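Kubernetes documents three values for a CronJob's `concurrencyPolicy`. `Allow` is the default and starts the new Job even if earlier ones are still running. `Forbid` skips the new run while a previous Job is unfinished. `Replace` stops the running Job and starts the new one in its place. The following minimal Python model illustrates the difference when a scheduled run fires while a backup Job is still active. It is a simulation of the documented semantics, not controller code, and `ConcurrencyPolicy` and `on_schedule_tick` are hypothetical names.

```python
from enum import Enum


class ConcurrencyPolicy(Enum):
    ALLOW = "Allow"      # default: start the new Job even if others are running
    FORBID = "Forbid"    # skip this run while a previous Job is still active
    REPLACE = "Replace"  # stop the running Job and start the new one instead


def on_schedule_tick(policy, active_jobs, new_job):
    """Hypothetical model: return the active Jobs after one scheduled run
    of a single CronJob fires, under the given concurrency policy."""
    if not active_jobs or policy is ConcurrencyPolicy.ALLOW:
        return active_jobs + [new_job]
    if policy is ConcurrencyPolicy.FORBID:
        return list(active_jobs)   # new run is skipped
    return [new_job]               # REPLACE: old Job is stopped


if __name__ == "__main__":
    running = ["backup-full-1"]  # previous Job still holding the pgBackRest lock
    for policy in ConcurrencyPolicy:
        print(policy.value, on_schedule_tick(policy, running, "backup-full-2"))
```

Under `Allow`, two Jobs are active at once and the second one loses the lock race. `Forbid` would avoid that for repeated runs of the same schedule. Note that `concurrencyPolicy` only applies to Jobs created by the same CronJob. It would not by itself stop a full and an incremental CronJob from overlapping if their schedules coincide.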

Metadata


Assignees: No one assigned
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
