Random crashing #2138

Closed
@echel0n

Describe the bug
I have 4 PGO clusters created and running. After some time they randomly crash and fail to come back. I've tried restarting the clusters and even killing all the pods in the pgo namespace, which only results in a complaint about multi-attached PVCs. So far the only way to fix this is to delete the clusters and restore from backups.

Storage for both the primary and the replicas is rook-ceph-block, and pgBackRest storage is S3. I can't for the life of me figure out what is causing the crash, as the logs themselves do not show anything that jumps out at me. Please see below:

2020-12-20 02:21:09,854 INFO: no action.  i am the leader with the lock
2020-12-20 02:21:19,847 INFO: Lock owner: keycloak-8445bc9877-997sw; I am keycloak-8445bc9877-997sw
2020-12-20 02:21:19,916 INFO: no action.  i am the leader with the lock
2020-12-20 02:21:29,847 INFO: Lock owner: keycloak-8445bc9877-997sw; I am keycloak-8445bc9877-997sw
2020-12-20 02:21:29,872 INFO: no action.  i am the leader with the lock
2020-12-20 02:21:39,847 WARNING: Postgresql is not running.
2020-12-20 02:21:39,847 INFO: Lock owner: keycloak-8445bc9877-997sw; I am keycloak-8445bc9877-997sw
2020-12-20 02:21:39,907 INFO: Reaped pid=45567, exit status=0
2020-12-20 02:21:39,908 INFO: pg_controldata:

Please tell us about your environment:

  • Operating System: Ubuntu 20.04.1 LTS
  • Where is this running (Local, Cloud Provider): Local
  • Storage being used (NFS, Hostpath, Gluster, etc): S3 and rook-ceph-block
  • Container Image Tag: centos7-12.5-4.5.1
  • PostgreSQL Version: 12.5
  • Platform (Docker, Kubernetes, OpenShift): Kubernetes 1.20.1
  • Platform Version: 4.5.1
