mysql container in server pod crashes intermittently after deployment. #259
Description
Is this a BUG REPORT or FEATURE REQUEST?
Choose one: BUG REPORT
Versions
MySQL Operator Version:
helm chart master (c98210b)
Values.image.tag 0.3.0
Environment:
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:08:12Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.5-gke.5", GitCommit:"2c44750044d8aeeb6b51386ddb9c274ff0beb50b", GitTreeState:"clean", BuildDate:"2019-02-01T23:53:25Z", GoVersion:"go1.10.8b4", Compiler:"gc", Platform:"linux/amd64"}
- Cloud provider or hardware configuration: GKE
What happened?
When creating a cluster, one of the pods in the resulting stateful set often enters a crash loop with the following logs:
What you expected to happen?
All pods to start successfully
How to reproduce it (as minimally and precisely as possible)?
Here's my cluster.yaml:
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: mysql-config
data:
  my.cnf: |-
    [mysqld]
    default_authentication_plugin=mysql_native_password
---
apiVersion: mysql.oracle.com/v1alpha1
kind: Cluster
metadata:
  name: alchemy-database
spec:
  members: 3
  version: 8.0.12
  config:
    name: mysql-config
  volumeClaimTemplate:
    metadata:
      name: data
    spec:
      storageClassName: standard
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
  name: alchemy-database-router
  labels:
    app: alchemy-database-router
spec:
  ports:
  - name: read-write
    port: 6446
    targetPort: 6446
    protocol: TCP
  - name: read-only
    port: 6447
    targetPort: 6447
    protocol: TCP
  selector:
    app: alchemy-database-router
  type: ClusterIP
  clusterIP: None
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: alchemy-database-router
  labels:
    app: alchemy-database-router
spec:
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: alchemy-database-router
    spec:
      containers:
      - name: mysqlrouter
        image: mysql/mysql-router:8.0.12
        env:
        - name: MYSQL_PASSWORD
          valueFrom:
            secretKeyRef:
              name: alchemy-database-root-password
              key: password
        - name: MYSQL_USER
          value: root
        - name: MYSQL_PORT
          value: "3306"
        - name: MYSQL_HOST
          value: alchemy-database
        - name: MYSQL_INNODB_NUM_MEMBERS
          value: "3"
        command:
        - "/bin/bash"
        - "-cx"
        - "exec /run.sh mysqlrouter"
        ports:
        - containerPort: 6446
        - containerPort: 6447
Anything else we need to know?
mysql-operator is installed into the same namespace as the above yaml, "alchemy".
This yaml is based on some of the examples provided in this repo. However, I've changed the access mode on the volume claims to ReadWriteOnce because ReadWriteMany isn't supported on GKE out of the box. Perhaps ReadWriteMany is required for mysql-operator?
By following the link at the end of the crash log I found the line:
The preceding means that normally you should not get corrupted tables unless one of the following happens:
- Some external program is manipulating data files or index files at the same time as mysqld without locking the table properly.
Also, once a pod crashes, it continues to crash every time it's restarted, while the others keep running with no issues.
All of this makes me think it might have something to do with the access mode, but I was under the impression that each pod mounts its own PV and so ReadWriteOnce should be sufficient.
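To make that expectation concrete, here is a rough sketch of the claim I'd expect to exist for the first pod. It assumes the operator creates a StatefulSet named after the cluster (alchemy-database) and passes the volumeClaimTemplate through unchanged. I haven't confirmed either, so the claim name below is hypothetical. It follows the usual StatefulSet naming convention of claim template name, StatefulSet name, then pod ordinal.

```yaml
# Hypothetical per-pod claim, assuming the StatefulSet is named "alchemy-database"
# and the claim template keeps the name "data" from the Cluster spec above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-alchemy-database-0   # assumed name; pods 1 and 2 would get ...-1 and ...-2
  namespace: alchemy
spec:
  storageClassName: standard
  accessModes:
    - ReadWriteOnce               # one claim per pod, so a single writer each
  resources:
    requests:
      storage: 10Gi
```

If three separate claims like this exist and each is bound to a different PV, then no two mysqld instances should be sharing data files, and ReadWriteOnce would not be the cause.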