
Commit 8d5d40f

prlaurencecahoonpwork
authored and committed
Add Overview to Doc Content (#955)
* Add Overview to Doc Content Adding a new Overview file to the documentation. This includes some content that is redundant to the designoverview and getting started sections. Assuming this looks like a good addition, I can create new PRs to rationalize Overview with the other existing content. * Create storage-overview.md
1 parent c1a0edd commit 8d5d40f

11 files changed

+414
-0
lines changed

hugo/content/overview/_index.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
---
2+
title: "Design"
3+
date:
4+
draft: false
5+
weight: 4
6+
---
7+
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
1+
---
2+
title: "PostgreSQL Operator Backup and Restore Capability"
3+
date:
4+
draft: false
5+
weight: 5
6+
---
7+
8+
## PostgreSQL Operator Backup and Restore Capability
9+
10+
The PostgreSQL Operator lets users manage PostgreSQL cluster backups through both native PostgreSQL backup functionality and pgbackrest, an open source backup and restore solution designed to scale up to the largest databases. By default, beginning with version 4.0, the PostgreSQL Operator backup command performs a pgbackrest backup.
11+
12+
The three backup types that can be configured through the PostgreSQL Operator CLI are:
13+
14+
* pgbackrest is a simple, reliable backup and restore solution that can seamlessly scale up to the largest databases and workloads. It provides full, incremental, differential
15+
backups, and point-in-time recovery.
16+
17+
* pg_basebackup is used to take base backups of a running PostgreSQL database cluster. These are taken without affecting other clients to the database, and can be used both for
18+
point-in-time recovery and as the starting point for log shipping or streaming replication standby servers.
19+
20+
* pg_dump is a utility for backing up a single PostgreSQL database. It makes consistent backups even if the database is being used concurrently. pg_dump does not block other users
21+
accessing the database (readers or writers).
22+
23+
### pgBackRest Integration
24+
25+
The PostgreSQL Operator integrates various features of the [pgbackrest backup and restore project](https://pgbackrest.org) to support backup and restore capability.
26+
27+
The *pgo-backrest-repo* container acts as a pgBackRest remote repository for the PostgreSQL cluster to use for storing archive files and backups.
28+
29+
The following diagram depicts some of the integration features:
30+
31+
![alt text](/operator-backrest-integration.png "Operator Backrest Integration")
32+
33+
In this diagram, starting from left to right we see the following:
34+
35+
* when a user enters *pgo backup mycluster --backup-type=pgbackrest*, a pgo-backrest container is run as a Job; that container executes a *pgbackrest backup* command in the pgBackRest repository container to perform the backup.
36+
37+
* when a user enters *pgo show backup mycluster --backup-type=pgbackrest*, a *pgbackrest info* command is executed on the pgBackRest repository container, and the *info* output is sent directly back to the user
38+
39+
* the PostgreSQL container itself uses an archive command, *pgbackrest archive-push*, to send archives to the pgBackRest repository container
40+
41+
* when a user enters *pgo create cluster mycluster --pgbackrest*, a pgBackRest repository container deployment is created; that repository is used exclusively by this PostgreSQL cluster
42+
43+
* lastly, when a user enters *pgo restore mycluster*, a *pgo-backrest-restore* container is created as a Job; that container executes the *pgbackrest restore* command
44+
45+
### Support for pgBackRest Use of S3 Buckets
46+
47+
The PostgreSQL Operator supports the use of AWS S3 storage buckets for the pgbackrest repository in any pgbackrest-enabled cluster. When S3 support is enabled for a cluster, all archives will automatically be pushed to a pre-configured S3 storage bucket, and that same bucket can then be utilized for the creation of any backups as well as when performing restores. Please note that once a storage type has been selected for a cluster during cluster creation (specifically `local`, `s3`, or _both_, as described in detail below), it cannot be changed.
48+
49+
The PostgreSQL Operator allows for the configuration of a single storage bucket, which can then be utilized across multiple clusters. Once S3 support has been enabled for a cluster, pgbackrest will create a `backrestrepo` directory in the root of the configured S3 storage bucket (if it does not already exist), and subdirectories will then be created under the `backrestrepo` directory for each cluster created with S3 storage enabled.
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
1+
---
2+
title: "Custom Resource Definitions Overview"
3+
date:
4+
draft: false
5+
weight: 5
6+
---
7+
8+
## PostgreSQL Operator Custom Resource Definitions
9+
10+
The PostgreSQL Operator defines the following series of Kubernetes [Custom Resource Definitions (CRDs)](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#custom-resources):
11+
12+
![Reference](/operator-crd-architecture.png)
13+
14+
Each of these CRDs is used in the design of the PostgreSQL Operator to perform PostgreSQL-related operations that enable on-demand, PostgreSQL-as-a-Service workflows.
15+
16+
### Cluster (pgclusters)
17+
18+
The Cluster or pgcluster CRD is used by the PostgreSQL Operator to define the PostgreSQL cluster definition and make new PostgreSQL cluster requests.
19+
20+
### Backup (pgbackups)
21+
22+
The Backup or pgbackup CRD is used by the PostgreSQL Operator to perform a pg_basebackup and to hold the workflow and status of the last backup job. Crunchy Data plans to deprecate this CRD in a future release in favor of a more general pgtask resource.
23+
24+
### Tasks (pgtask)
25+
26+
The Tasks or pgtask CRD is used by the PostgreSQL Operator to perform workflow and other related administration tasks. The pgtasks CRD captures workflows and administrative tasks for a given pgcluster.
27+
28+
### Replica (pgreplica)
29+
30+
The Replica or pgreplica CRD is used by the PostgreSQL Operator to create a PostgreSQL replica. When a user creates a PostgreSQL replica, a pgreplica resource is created to define that replica.
31+
32+
Metadata about each PostgreSQL cluster deployed by the PostgreSQL Operator is stored within these CRD resources, which act as the source of truth for the
33+
Operator. The PostgreSQL Operator makes use of CRDs to maintain state and resource definitions as offered by the PostgreSQL Operator.
34+
35+
36+
37+
38+
39+
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
1+
---
2+
title: "Failover in the PostgreSQL Operator Overview"
3+
date:
4+
draft: false
5+
weight:
6+
---
7+
8+
## Failover in the PostgreSQL Operator
9+
10+
There are a number of potential events that could cause a primary database instance or cluster to become unavailable during the course of normal operations, including:
11+
12+
* A database storage (disk) failure or any other hardware failure
13+
* The network on which the database resides becomes unreachable
14+
* The host operating system becomes unstable and crashes
15+
* A key database file becomes corrupted
16+
* Total loss of data center
17+
18+
There may also be downtime events that occur during normal operations, such as performing a minor upgrade, security patching of the operating system, a hardware upgrade, or other maintenance.
19+
20+
To enable rapid recovery from the unavailability of the primary PostgreSQL instance within a PostgreSQL cluster, the PostgreSQL Operator supports both Manual and Automated failover within a single Kubernetes cluster.
21+
22+
### PostgreSQL Cluster Architecture
23+
24+
Failover moves the primary role from a primary PostgreSQL instance to a replica PostgreSQL instance within a PostgreSQL cluster.
25+
26+
### Manual Failover
27+
28+
Manual failover is performed through PostgreSQL Operator API actions: a *query* is issued first, and a *target* is then specified to pick the replica to fail over to.
29+
30+
### Automatic Failover
31+
32+
Automatic failover is performed by the PostgreSQL Operator by evaluating the readiness of a primary. Automated failover can be enabled globally for all clusters or for specific clusters. If desired, users can configure the PostgreSQL Operator to replace a failed primary PostgreSQL instance with a new PostgreSQL replica.
33+
34+
The PostgreSQL Operator automatic failover logic includes:
35+
36+
* deletion of the failed primary Deployment
37+
* selection of the best replica to become the new primary
38+
* label change of the targeted Replica to match the primary Service
39+
* execution of the PostgreSQL promote command on the targeted replica
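
The four steps above can be sketched as a small in-memory simulation. This is an illustration only, not the Operator's implementation: the `Instance` class, the `replay_lag_bytes` field, the `service-name` label key, and the "least lag wins" rule for choosing the best replica are all assumptions, since the text does not say how the best replica is chosen.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical stand-in for a PostgreSQL Deployment in a cluster."""
    name: str
    role: str                      # "primary" or "replica"
    ready: bool = True
    replay_lag_bytes: int = 0      # assumed metric for ranking replicas
    labels: dict = field(default_factory=dict)


def fail_over(instances, primary_service):
    """Simulate the failover steps and return the surviving instances."""
    primary = next(i for i in instances if i.role == "primary")
    if primary.ready:
        return instances  # primary is healthy, nothing to do

    # 1. Delete the failed primary Deployment.
    survivors = [i for i in instances if i is not primary]

    # 2. Pick the best replica (assumption: ready, with the least lag).
    candidates = [i for i in survivors if i.role == "replica" and i.ready]
    if not candidates:
        raise RuntimeError("no ready replica available for promotion")
    target = min(candidates, key=lambda i: i.replay_lag_bytes)

    # 3. Relabel the target so the primary Service selects it
    #    ("service-name" is a hypothetical label key).
    target.labels["service-name"] = primary_service

    # 4. Promote the target (stands in for the PostgreSQL promote command).
    target.role = "primary"
    return survivors


cluster = [
    Instance("mycluster", "primary", ready=False),
    Instance("mycluster-r1", "replica", replay_lag_bytes=4096),
    Instance("mycluster-r2", "replica", replay_lag_bytes=128),
]
after = fail_over(cluster, "mycluster")
print([(i.name, i.role) for i in after])
# prints [('mycluster-r1', 'replica'), ('mycluster-r2', 'primary')]
```

Keeping the relabel step before the promote step mirrors the order of the list above.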
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
1+
---
2+
title: "PostgreSQL Operator Namespace Considerations Overview"
3+
date:
4+
draft: false
5+
weight: 6
6+
---
7+
8+
## Kubernetes Namespaces and the PostgreSQL Operator
9+
10+
In Kubernetes, namespaces provide the user the ability to divide cluster resources between multiple users (via resource quota).
11+
12+
The PostgreSQL Operator makes use of the Kubernetes Namespace support in order to define the Namespace to which the PostgreSQL Operator will deploy PostgreSQL clusters, enabling users to more easily allocate Kubernetes resources to specific areas within their business (users, projects, departments).
13+
14+
#### Namespaces Applied to Organizational Requirements
15+
16+
Prior to PostgreSQL Operator version 4.0, the PostgreSQL Operator could only be deployed with a Namespace deployment pattern where both the PostgreSQL Operator and the PostgreSQL Clusters it deployed existed within a single Kubernetes namespace.
17+
18+
With the PostgreSQL Operator 4.0 release, the operator now supports a variety of Namespace deployment patterns, including:
19+
20+
* **OwnNamespace**: Operator and PostgreSQL clusters deployed to the same Kubernetes Namespace
21+
22+
* **SingleNamespace and MultiNamespace**: Operator and PostgreSQL clusters deployed to a predefined set of Kubernetes Namespaces
23+
24+
* **AllNamespaces**: Operator deployed into a single Kubernetes Namespace but watching all Namespaces on a Kubernetes cluster
25+
26+
#### Configuration of the Namespace to which PostgreSQL Operator is Deployed
27+
28+
To configure the Kubernetes Namespace within which the PostgreSQL Operator will run, set the PGO_OPERATOR_NAMESPACE environment variable. Both the Ansible and Bash installation methods enable you to modify this PGO_OPERATOR_NAMESPACE environment variable in connection with the PostgreSQL Operator installation.
29+
30+
#### Configuration of the Namespaces to which PostgreSQL Operator will Deploy PostgreSQL Clusters
31+
32+
At startup time, the PostgreSQL Operator determines the Kubernetes Namespaces to which it will be able to deploy and administer PostgreSQL databases and clusters. These Namespaces are determined by the NAMESPACE environment variable, which is set as part of the PostgreSQL Operator installation process. The format of the NAMESPACE value in the PostgreSQL Operator is modeled after the Operator Lifecycle Manager project.
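
As a rough sketch of how a NAMESPACE value could map onto the deployment patterns listed earlier, the following assumes a convention in the style of Operator Lifecycle Manager: an empty value means all Namespaces, and anything else is a comma-separated list of names. The exact format is not spelled out in this text, so both helper functions and the default `pgo` Namespace are hypothetical.

```python
import os


def watched_namespaces(value):
    """Parse a NAMESPACE-style value into a list of Namespace names.

    Assumed convention: comma-separated names; an empty result stands
    for "watch all Namespaces".
    """
    return [n.strip() for n in value.split(",") if n.strip()]


def deployment_pattern(value, operator_namespace):
    """Classify a NAMESPACE value into one of the four patterns."""
    names = watched_namespaces(value)
    if not names:
        return "AllNamespaces"
    if names == [operator_namespace]:
        return "OwnNamespace"
    if len(names) == 1:
        return "SingleNamespace"
    return "MultiNamespace"


if __name__ == "__main__":
    value = os.environ.get("NAMESPACE", "")
    # "pgo" is an assumed default, used only for this illustration.
    operator_ns = os.environ.get("PGO_OPERATOR_NAMESPACE", "pgo")
    print(deployment_pattern(value, operator_ns))
```

Under these assumptions, `deployment_pattern("pgouser1,pgouser2", "pgo")` classifies the value as MultiNamespace, while an empty value is treated as AllNamespaces.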
33+
34+
35+
Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
---
2+
title: "Node Affinity in PostgreSQL Operator"
3+
date:
4+
draft: false
5+
weight: 3
6+
---
7+
8+
## Node Affinity in PostgreSQL Operator
9+
10+
Kubernetes node affinity allows you to constrain which nodes your pod is eligible to be scheduled on, based on labels on the node.
11+
12+
The PostgreSQL Operator provides users with the ability to add a node affinity section to a new Cluster Deployment. By adding a node affinity section to the Cluster Deployment, users can direct Kubernetes to attempt to schedule a primary PostgreSQL instance within a cluster on a specific Kubernetes node.
13+
14+
As an example, you can see the nodes on your Kubernetes cluster by running the following:
15+
```
16+
kubectl get nodes
17+
```
18+
19+
You can then specify one of those Kubernetes node names (e.g. kubeadm-node2) when creating a PostgreSQL cluster:
20+
```
21+
pgo create cluster thatcluster --node-label=kubeadm-node2
22+
```
23+
24+
The node affinity rule inserted in the Deployment uses a *preferred* strategy, so if the node is down or otherwise unavailable, Kubernetes will go ahead and schedule the Pod on another node.
25+
26+
When you scale up a PostgreSQL cluster by adding a PostgreSQL replica instance, the scaling will take into account the use of `--node-label`. If it sees that a PostgreSQL cluster was created with a specific node name, then the PostgreSQL replica Deployment will add an affinity rule to attempt to schedule the Pod.
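
To make the *preferred* strategy concrete, here is a minimal sketch that builds the kind of soft affinity stanza Kubernetes accepts in a Pod spec. The field names follow the Kubernetes Pod API; the helper function, the default weight, and the use of the `kubernetes.io/hostname` label key are assumptions for illustration, and the rule the Operator actually generates may differ.

```python
import json


def preferred_node_affinity(node_name, weight=100):
    """Build a soft (preferred) node affinity stanza for a Pod spec.

    Hypothetical helper: the kubernetes.io/hostname key is an assumed
    choice of node label, not taken from the Operator's templates.
    """
    return {
        "nodeAffinity": {
            # "preferred" rules are hints: the scheduler falls back to
            # other nodes if no matching node is available.
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": weight,  # 1-100; higher is a stronger preference
                    "preference": {
                        "matchExpressions": [
                            {
                                "key": "kubernetes.io/hostname",
                                "operator": "In",
                                "values": [node_name],
                            }
                        ]
                    },
                }
            ]
        }
    }


print(json.dumps(preferred_node_affinity("kubeadm-node2"), indent=2))
```

A `requiredDuringSchedulingIgnoredDuringExecution` rule would instead leave the Pod unscheduled while the node is unavailable, which is the behavior the preferred strategy avoids.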
27+
28+
29+
30+
31+
32+
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
1+
## PostgreSQL Operator Containers Overview
2+
3+
The PostgreSQL Operator orchestrates a series of PostgreSQL and PostgreSQL-related containers that enable rapid deployment of PostgreSQL, including administration and monitoring tools, in a Kubernetes environment. The PostgreSQL Operator supports PostgreSQL 9.5+ with multiple PostgreSQL cluster deployment strategies and a variety of PostgreSQL-related extensions and tools enabling enterprise-grade PostgreSQL-as-a-Service. A full list of the containers supported by the PostgreSQL Operator is provided below.
4+
5+
### PostgreSQL Server and Extensions
6+
7+
* **PostgreSQL** (crunchy-postgres). PostgreSQL database server. The crunchy-postgres container image is unmodified, open source PostgreSQL packaged and maintained by Crunchy Data.
8+
9+
* **PostGIS** (crunchy-postgres-gis). PostgreSQL database server including the PostGIS extension. The crunchy-postgres-gis container image is unmodified, open source PostgreSQL packaged and maintained by Crunchy Data. This image is identical to the crunchy-postgres image except it includes the open source geospatial extension PostGIS for PostgreSQL in addition to the language extension PL/R which allows for writing functions in the R statistical computing language.
10+
11+
### Backup and Restore
12+
13+
* **pgBackRest** (crunchy-backrest-restore). pgBackRest is a high performance backup and restore utility for PostgreSQL. The crunchy-backrest-restore container executes the pgBackRest utility, allowing FULL and DELTA restore capability.
14+
15+
* **pg_basebackup** (crunchy-backup). pg_basebackup is used to take base backups of a running PostgreSQL database cluster. The crunchy-backup container executes a full backup against another database container using the standard pg_basebackup utility that is included with PostgreSQL.
16+
17+
* **pg_dump** (crunchy-pgdump). The crunchy-pgdump container executes either a pg_dump or pg_dumpall database backup against another PostgreSQL database.
18+
19+
* **pg_restore** (crunchy-pgrestore). The crunchy-pgrestore container image provides a means of performing a restore of a dump from pg_dump or pg_dumpall via psql or pg_restore to a PostgreSQL container database.
20+
21+
22+
### Administration Tools
23+
24+
* **pgAdmin4** (crunchy-pgadmin4). pgAdmin4 is a graphical user interface administration tool for PostgreSQL. The crunchy-pgadmin4 container executes the pgAdmin4 web application.
25+
26+
* **pgBadger** (crunchy-pgbadger). pgBadger is a PostgreSQL log analyzer with fully detailed reports and graphs. The crunchy-pgbadger container executes the pgBadger utility, which generates a PostgreSQL log analysis report using a small HTTP server running on the container.
27+
28+
* **pg_upgrade** (crunchy-upgrade). The crunchy-upgrade container contains 9.5, 9.6, 10, and 11 PostgreSQL packages in order to perform a pg_upgrade from 9.5 to 9.6, 9.6 to 10, and 10 to 11 versions.
29+
30+
* **scheduler** (crunchy-scheduler). The crunchy-scheduler container provides a cron-like microservice for automating pg_basebackup and pgBackRest backups within a single namespace.
31+
32+
### Metrics and Monitoring
33+
34+
* **Metrics Collection** (crunchy-collect). The crunchy-collect container provides real-time metrics about the PostgreSQL database via an API. These metrics are scraped and stored by a Prometheus time-series database and are then graphed and visualized through the open source data visualizer Grafana.
35+
36+
* **Grafana** (crunchy-grafana). Visual dashboards are created from the collected and stored data that crunchy-collect and crunchy-prometheus provide for the crunchy-grafana container, which hosts an open source web-based graphing dashboard called Grafana.
37+
38+
* **Prometheus** (crunchy-prometheus). Prometheus is a time series database with a multi-dimensional data model and a flexible query language. It is used in collaboration with crunchy-collect and Grafana to provide metrics.
39+
40+
### Connection Pooling and Load Balancing
41+
42+
* **pgbouncer** (crunchy-pgbouncer). pgbouncer is a lightweight connection pooler for PostgreSQL. The crunchy-pgbouncer container provides a pgbouncer image.
43+
44+
* **pgpool** (crunchy-pgpool). pgPool II is middleware that works between PostgreSQL servers and a PostgreSQL database client. The crunchy-pgpool container executes the pgPool utility. pgPool can be used to provide a smart PostgreSQL-aware proxy to a PostgreSQL cluster, both primary and replica, so that applications only have to work with a single database connection.
45+
46+
47+
48+
49+
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
1+
---
2+
title: "PostgreSQL Operator Overview"
3+
date:
4+
draft: false
5+
weight: 5
6+
---
7+
8+
## PostgreSQL Operator Overview
9+
10+
The PostgreSQL Operator extends Kubernetes to provide a higher-level abstraction enabling the rapid creation and management of PostgreSQL databases and clusters.
11+
12+
The PostgreSQL Operator includes the following components:
13+
14+
* PostgreSQL Operator
15+
* PostgreSQL Operator Containers
16+
* PostgreSQL Operator PGO Client
17+
* PostgreSQL Operator REST API Server
18+
* PostgreSQL Operator PGO Scheduler
19+
20+
#### PostgreSQL Operator
21+
22+
The PostgreSQL Operator makes use of Kubernetes “Custom Resource Definitions” or “CRDs” to extend Kubernetes with custom, PostgreSQL-specific Kubernetes objects such as “Database” and “Cluster”. The PostgreSQL Operator uses these CRDs to enable users to deploy, configure and administer PostgreSQL databases and clusters as Kubernetes-native, open source PostgreSQL-as-a-Service infrastructure.
23+
24+
#### PostgreSQL Operator Containers
25+
26+
The PostgreSQL Operator orchestrates a series of PostgreSQL and PostgreSQL-related containers that enable rapid deployment of PostgreSQL, including administration and monitoring tools, in a Kubernetes environment.
27+
28+
#### PostgreSQL Operator PGO Client
29+
30+
The PostgreSQL Operator provides a command line interface (CLI), pgo. This CLI tool may be used by an end user to create databases or clusters, or to make changes to existing databases. The CLI interacts with the REST API deployed within the postgres-operator deployment.
31+
32+
#### PostgreSQL Operator REST API Server
33+
34+
The PostgreSQL Operator provides a REST API through which users or custom applications can inspect and trigger actions within the Operator, such as provisioning resources or viewing the status of resources. This API is secured by an RBAC (role-based access control) security model in which each API call has a permission assigned to it. API roles are defined to provide granular authorization to Operator services.
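
The permission-per-call model can be sketched as two lookup tables: one mapping each API call to the single permission it requires, and one mapping each role to the permissions it holds. The role names, permission names, and routes below are hypothetical placeholders chosen for illustration, not the Operator's actual configuration.

```python
# Hypothetical roles and the permissions granted to each.
ROLE_PERMISSIONS = {
    "reader": {"ShowCluster", "ShowBackup"},
    "admin": {"ShowCluster", "ShowBackup", "CreateCluster", "Restore"},
}

# Hypothetical API calls, each tagged with the one permission it requires.
API_CALL_PERMISSION = {
    "GET /clusters": "ShowCluster",
    "POST /clusters": "CreateCluster",
    "POST /restore": "Restore",
}


def is_authorized(role, api_call):
    """Return True if the role holds the permission the call requires."""
    required = API_CALL_PERMISSION.get(api_call)
    if required is None:
        return False  # unknown calls are denied by default
    return required in ROLE_PERMISSIONS.get(role, set())


print(is_authorized("reader", "GET /clusters"))   # prints True
print(is_authorized("reader", "POST /clusters"))  # prints False
```

Denying unknown calls and unknown roles by default is a design choice of this sketch; it keeps a missing table entry from silently granting access.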
35+
36+
#### PostgreSQL Operator PGO Scheduler
37+
38+
The PostgreSQL Operator includes a cron-like scheduler application called pgo-scheduler. The purpose of pgo-scheduler is to run automated tasks such as PostgreSQL backups or SQL policies against PostgreSQL clusters created by the PostgreSQL Operator. PGO Scheduler watches Kubernetes for configmaps with the label crunchy-scheduler=true in the same namespace where the PostgreSQL Operator is deployed.
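
The selection rule described here (label `crunchy-scheduler=true`, same namespace as the Operator) can be illustrated with a small filter over plain dictionaries shaped like Kubernetes object metadata. This is a sketch of the rule only: the function name and the sample data are made up, and a real scheduler would watch the Kubernetes API rather than filter a static list.

```python
def schedule_configmaps(configmaps, operator_namespace):
    """Return the names of configmaps a scheduler would act on.

    Each configmap is a plain dict shaped like Kubernetes metadata:
    {"metadata": {"name": ..., "namespace": ..., "labels": {...}}}.
    """
    selected = []
    for cm in configmaps:
        meta = cm.get("metadata", {})
        labels = meta.get("labels") or {}
        in_operator_ns = meta.get("namespace") == operator_namespace
        is_schedule = labels.get("crunchy-scheduler") == "true"
        if in_operator_ns and is_schedule:
            selected.append(meta.get("name"))
    return selected


# Made-up sample data for illustration.
sample = [
    {"metadata": {"name": "nightly-backup", "namespace": "pgo",
                  "labels": {"crunchy-scheduler": "true"}}},
    {"metadata": {"name": "other-config", "namespace": "pgo",
                  "labels": {}}},
    {"metadata": {"name": "foreign-backup", "namespace": "elsewhere",
                  "labels": {"crunchy-scheduler": "true"}}},
]
print(schedule_configmaps(sample, "pgo"))  # prints ['nightly-backup']
```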
39+
