Feat: enable owner references (#2688)
* feat(498): Add ownerReferences to managed entities
* empty owner reference for cross namespace secret and more tests
* update ownerReferences of existing resources
* removing ownerReference requires Update API call
* CR ownerReference on PVC blocks pvc retention policy of statefulset
* make ownerreferences optional and disabled by default
* update unit test to check len ownerReferences
* update codegen
* add owner references e2e test
* update unit test
* add block_owner_deletion field to test owner reference
* fix typos and update docs once more
* reflect code feedback

---------

Co-authored-by: Max Begenau <[email protected]>
FxKu and mbegenau authored Aug 9, 2024
1 parent d5a88f5 commit a87307e
Showing 28 changed files with 534 additions and 205 deletions.
7 changes: 5 additions & 2 deletions charts/postgres-operator/crds/operatorconfigurations.yaml
@@ -211,9 +211,9 @@ spec:
enable_init_containers:
type: boolean
default: true
enable_secrets_deletion:
enable_owner_references:
type: boolean
default: true
default: false
enable_persistent_volume_claim_deletion:
type: boolean
default: true
@@ -226,6 +226,9 @@ spec:
enable_readiness_probe:
type: boolean
default: false
enable_secrets_deletion:
type: boolean
default: true
enable_sidecars:
type: boolean
default: true
2 changes: 2 additions & 0 deletions charts/postgres-operator/templates/clusterrole.yaml
@@ -120,6 +120,7 @@ rules:
- create
- delete
- get
- patch
- update
# to check nodes for node readiness label
- apiGroups:
@@ -196,6 +197,7 @@ rules:
- get
- list
- patch
- update
# to CRUD cron jobs for logical backups
- apiGroups:
- batch
6 changes: 4 additions & 2 deletions charts/postgres-operator/values.yaml
@@ -129,8 +129,8 @@ configKubernetes:
enable_finalizers: false
# enables initContainers to run actions before Spilo is started
enable_init_containers: true
# toggles if operator should delete secrets on cluster deletion
enable_secrets_deletion: true
# toggles if child resources should have an owner reference to the postgresql CR
enable_owner_references: false
# toggles if operator should delete PVCs on cluster deletion
enable_persistent_volume_claim_deletion: true
# toggles pod anti affinity on the Postgres pods
@@ -139,6 +139,8 @@ configKubernetes:
enable_pod_disruption_budget: true
# toggles readiness probe for database pods
enable_readiness_probe: false
# toggles if operator should delete secrets on cluster deletion
enable_secrets_deletion: true
# enables sidecar containers to run alongside Spilo in the same pod
enable_sidecars: true

70 changes: 62 additions & 8 deletions docs/administrator.md
@@ -223,9 +223,9 @@ configuration:

Now, every cluster manifest must contain the configured annotation keys to
trigger the delete process when running `kubectl delete pg`. Note, that the
`Postgresql` resource would still get deleted as K8s' API server does not
block it. Only the operator logs will tell, that the delete criteria wasn't
met.
`Postgresql` resource would still get deleted because the operator does not
instruct K8s' API server to block it. Only the operator logs will tell, that
the delete criteria was not met.

**cluster manifest**
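
A minimal sketch of such a manifest; the annotation keys `delete-date` and
`delete-clustername` are example names and must match whatever
`delete_annotation_date_key` and `delete_annotation_name_key` were configured to:

```yaml
apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
  annotations:
    # must equal the current date in YYYY-MM-DD format
    delete-date: "2024-08-09"
    # must equal the cluster name
    delete-clustername: "acid-minimal-cluster"
spec:
  teamId: "acid"
  numberOfInstances: 2
  volume:
    size: 1Gi
```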

@@ -243,11 +243,65 @@ spec:

In case, the resource has been deleted accidentally or the annotations were
simply forgotten, it's safe to recreate the cluster with `kubectl create`.
Existing Postgres cluster are not replaced by the operator. But, as the
original cluster still exists the status will show `CreateFailed` at first.
On the next sync event it should change to `Running`. However, as it is in
fact a new resource for K8s, the UID will differ which can trigger a rolling
update of the pods because the UID is used as part of backup path to S3.
Existing Postgres cluster are not replaced by the operator. But, when the
original cluster still exists the status will be `CreateFailed` at first. On
the next sync event it should change to `Running`. However, because it is in
fact a new resource for K8s, the UID and therefore, the backup path to S3,
will differ and trigger a rolling update of the pods.

## Owner References and Finalizers

The Postgres Operator can set [owner references](https://kubernetes.io/docs/concepts/overview/working-with-objects/owners-dependents/) on most of a cluster's child resources to improve
monitoring with GitOps tools and enable cascading deletes. There are three
exceptions:

* Persistent Volume Claims, because they are handled by the [PV Reclaim Policy](https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy/) of the Stateful Set
* The config endpoint + headless service resource because it is managed by Patroni
* Cross-namespace secrets, because owner references are not allowed across namespaces by design

The operator cleans these resources up with its regular delete loop, provided
they were synced correctly. If for some reason the initial cluster sync fails,
e.g. after a cluster creation or operator restart, a deletion of the cluster
manifest would leave orphaned resources behind which the user has to clean up
manually.
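
For the child resources that do receive an owner reference, the injected
metadata looks roughly like the sketch below. The `kind: postgresql` value and
the `controller`/`blockOwnerDeletion` flags follow the e2e assertions added in
this commit; the secret name and UID are purely illustrative:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: postgres.acid-minimal-cluster.credentials.postgresql.acid.zalan.do
  namespace: default
  ownerReferences:
    - apiVersion: acid.zalan.do/v1
      kind: postgresql
      name: acid-minimal-cluster
      # UID of the owning postgresql custom resource (illustrative value)
      uid: 7c437c4e-0000-0000-0000-000000000000
      controller: true
      blockOwnerDeletion: true
```

With such a reference in place, deleting the `postgresql` object lets the
Kubernetes garbage collector remove the child automatically, which is the
cascading delete mentioned above.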

Another option is to enable finalizers which first ensures the deletion of all
child resources before the cluster manifest gets removed. There is a trade-off
though: The deletion is only performed after the next two operator SYNC cycles
with the first one setting a `deletionTimestamp` and the latter reacting to it.
The final removal of the custom resource will add a DELETE event to the worker
queue but the child resources are already gone at this point. If you do not
desire this behavior consider enabling owner references instead.

**postgres-operator ConfigMap**

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: postgres-operator
data:
enable_finalizers: "false"
enable_owner_references: "true"
```

**OperatorConfiguration**

```yaml
apiVersion: "acid.zalan.do/v1"
kind: OperatorConfiguration
metadata:
name: postgresql-operator-configuration
configuration:
kubernetes:
enable_finalizers: false
enable_owner_references: true
```

:warning: Please note, both options are disabled by default. When enabling owner
references, the operator cannot block cascading deletes, even when the [delete protection annotations](administrator.md#delete-protection-via-annotations)
are in place. You would need a K8s admission controller that blocks the actual
`kubectl delete` API call, e.g. based on existing annotations.
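
One possible approach is a `ValidatingAdmissionPolicy`. The sketch below is only
an illustration and not something the operator ships or manages; the policy name
is made up and it assumes `delete_annotation_name_key` was set to
`delete-clustername`:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: protect-postgres-clusters   # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups:   ["acid.zalan.do"]
        apiVersions: ["v1"]
        operations:  ["DELETE"]
        resources:   ["postgresqls"]
  validations:
    # reject the DELETE unless the protection annotation is present
    # and matches the cluster name
    - expression: >-
        has(oldObject.metadata.annotations) &&
        'delete-clustername' in oldObject.metadata.annotations &&
        oldObject.metadata.annotations['delete-clustername'] == oldObject.metadata.name
      message: "postgresql clusters can only be deleted when the delete-clustername annotation matches the cluster name"
```

A matching `ValidatingAdmissionPolicyBinding` is still required to put the policy
into effect; admission controllers such as Kyverno or OPA Gatekeeper can express
the same rule.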

## Role-based access control for the operator

49 changes: 25 additions & 24 deletions docs/reference/operator_parameters.md
@@ -263,6 +263,31 @@ Parameters to configure cluster-related Kubernetes objects created by the
operator, as well as some timeouts associated with them. In a CRD-based
configuration they are grouped under the `kubernetes` key.

* **enable_finalizers**
By default, a deletion of the Postgresql resource will trigger an event
that leads to a cleanup of all child resources. However, if the database
cluster is in a broken state (e.g. failed initialization) and the operator
cannot fully sync it, there can be leftovers. By enabling finalizers the
operator will ensure all managed resources are deleted prior to the
Postgresql resource. See also [admin docs](../administrator.md#owner-references-and-finalizers)
for more information. The default is `false`.

* **enable_owner_references**
The operator can set owner references on its child resources (except PVCs,
Patroni config service/endpoint, cross-namespace secrets) to improve cluster
monitoring and enable cascading deletion. The default is `false`. Warning:
enabling this option disables the configured delete protection checks (see below).

* **delete_annotation_date_key**
key name for annotation that compares manifest value with current date in the
YYYY-MM-DD format. Allowed pattern: `'([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]'`.
The default is empty which also disables this delete protection check.

* **delete_annotation_name_key**
key name for annotation that compares manifest value with Postgres cluster name.
Allowed pattern: `'([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]'`. The default is
empty which also disables this delete protection check. An example configuration
with both keys is sketched below.
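
For illustration, both keys set in a CRD-based configuration; the key names
`delete-date` and `delete-clustername` are examples, not defaults:

```yaml
apiVersion: "acid.zalan.do/v1"
kind: OperatorConfiguration
metadata:
  name: postgresql-operator-configuration
configuration:
  kubernetes:
    # cluster manifests must carry both annotations with matching values
    # before the operator processes their deletion
    delete_annotation_date_key: "delete-date"
    delete_annotation_name_key: "delete-clustername"
```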

* **pod_service_account_name**
service account used by Patroni running on individual Pods to communicate
with the operator. Required even if native Kubernetes support in Patroni is
@@ -293,16 +318,6 @@ configuration they are grouped under the `kubernetes` key.
of a database created by the operator. If the annotation key is also provided
by the database definition, the database definition value is used.

* **delete_annotation_date_key**
key name for annotation that compares manifest value with current date in the
YYYY-MM-DD format. Allowed pattern: `'([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]'`.
The default is empty which also disables this delete protection check.

* **delete_annotation_name_key**
key name for annotation that compares manifest value with Postgres cluster name.
Allowed pattern: `'([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]'`. The default is
empty which also disables this delete protection check.

* **downscaler_annotations**
An array of annotations that should be passed from Postgres CRD on to the
statefulset and, if exists, to the connection pooler deployment as well.
@@ -332,20 +347,6 @@ configuration they are grouped under the `kubernetes` key.
drained if the node_readiness_label is not used. If this option is set to
`false` the `spilo-role=master` selector will not be added to the PDB.

* **enable_finalizers**
By default, a deletion of the Postgresql resource will trigger an event
that leads to a cleanup of all child resources. However, if the database
cluster is in a broken state (e.g. failed initialization) and the operator
cannot fully sync it, there can be leftovers. By enabling finalizers the
operator will ensure all managed resources are deleted prior to the
Postgresql resource. There is a trade-off though: The deletion is only
performed after the next two SYNC cycles with the first one updating the
internal spec and the latter reacting on the `deletionTimestamp` while
processing the SYNC event. The final removal of the custom resource will
add a DELETE event to the worker queue but the child resources are already
gone at this point.
The default is `false`.

* **persistent_volume_claim_retention_policy**
The operator tries to protect volumes as much as possible. If somebody
accidentally deletes the statefulset or scales in the `numberOfInstances` the
109 changes: 100 additions & 9 deletions e2e/tests/test_e2e.py
@@ -96,7 +96,7 @@ def setUpClass(cls):
print("Failed to delete the 'standard' storage class: {0}".format(e))

# operator deploys pod service account there on start up
# needed for test_multi_namespace_support()
# needed for test_multi_namespace_support and test_owner_references
cls.test_namespace = "test"
try:
v1_namespace = client.V1Namespace(metadata=client.V1ObjectMeta(name=cls.test_namespace))
@@ -1419,17 +1419,11 @@ def test_multi_namespace_support(self):
k8s.wait_for_pod_start("spilo-role=master", self.test_namespace)
k8s.wait_for_pod_start("spilo-role=replica", self.test_namespace)
self.assert_master_is_unique(self.test_namespace, "acid-test-cluster")
# acid-test-cluster will be deleted in test_owner_references test

except timeout_decorator.TimeoutError:
print('Operator log: {}'.format(k8s.get_operator_log()))
raise
finally:
# delete the new cluster so that the k8s_api.get_operator_state works correctly in subsequent tests
# ideally we should delete the 'test' namespace here but
# the pods inside the namespace stuck in the Terminating state making the test time out
k8s.api.custom_objects_api.delete_namespaced_custom_object(
"acid.zalan.do", "v1", self.test_namespace, "postgresqls", "acid-test-cluster")
time.sleep(5)

@timeout_decorator.timeout(TEST_TIMEOUT_SEC)
@unittest.skip("Skipping this test until fixed")
@@ -1640,6 +1634,71 @@ def test_overwrite_pooler_deployment(self):
self.eventuallyEqual(lambda: k8s.count_running_pods("connection-pooler="+pooler_name),
0, "Pooler pods not scaled down")

@timeout_decorator.timeout(TEST_TIMEOUT_SEC)
def test_owner_references(self):
'''
Enable owner references, test if resources get updated and test cascade deletion of test cluster.
'''
k8s = self.k8s
cluster_name = 'acid-test-cluster'
cluster_label = 'application=spilo,cluster-name={}'.format(cluster_name)
default_test_cluster = 'acid-minimal-cluster'

try:
# enable owner references in config
enable_owner_refs = {
"data": {
"enable_owner_references": "true"
}
}
k8s.update_config(enable_owner_refs)
self.eventuallyEqual(lambda: k8s.get_operator_state(), {"0": "idle"}, "Operator does not get in sync")

time.sleep(5) # wait for the operator to sync the cluster and update resources

# check if child resources were updated with owner references
self.assertTrue(self.check_cluster_child_resources_owner_references(cluster_name, self.test_namespace), "Owner references not set on all child resources of {}".format(cluster_name))
self.assertTrue(self.check_cluster_child_resources_owner_references(default_test_cluster), "Owner references not set on all child resources of {}".format(default_test_cluster))

# delete the new cluster to test owner references
# and also to make k8s_api.get_operator_state work better in subsequent tests
# ideally we should delete the 'test' namespace here but the pods
# inside the namespace stuck in the Terminating state making the test time out
k8s.api.custom_objects_api.delete_namespaced_custom_object(
"acid.zalan.do", "v1", self.test_namespace, "postgresqls", cluster_name)

# statefulset, pod disruption budget and secrets should be deleted via owner reference
self.eventuallyEqual(lambda: k8s.count_pods_with_label(cluster_label), 0, "Pods not deleted")
self.eventuallyEqual(lambda: k8s.count_statefulsets_with_label(cluster_label), 0, "Statefulset not deleted")
self.eventuallyEqual(lambda: k8s.count_pdbs_with_label(cluster_label), 0, "Pod disruption budget not deleted")
self.eventuallyEqual(lambda: k8s.count_secrets_with_label(cluster_label), 0, "Secrets were not deleted")

time.sleep(5) # wait for the operator to also delete the leftovers

# pvcs and Patroni config service/endpoint should not be affected by owner reference
# but deleted by the operator almost immediately
self.eventuallyEqual(lambda: k8s.count_pvcs_with_label(cluster_label), 0, "PVCs not deleted")
self.eventuallyEqual(lambda: k8s.count_services_with_label(cluster_label), 0, "Patroni config service not deleted")
self.eventuallyEqual(lambda: k8s.count_endpoints_with_label(cluster_label), 0, "Patroni config endpoint not deleted")

# disable owner references in config
disable_owner_refs = {
"data": {
"enable_owner_references": "false"
}
}
k8s.update_config(disable_owner_refs)
self.eventuallyEqual(lambda: k8s.get_operator_state(), {"0": "idle"}, "Operator does not get in sync")

time.sleep(5) # wait for the operator to remove owner references

# check if child resources were updated without Postgresql owner references
self.assertTrue(self.check_cluster_child_resources_owner_references(default_test_cluster, "default", True), "Owner references still present on some child resources of {}".format(default_test_cluster))

except timeout_decorator.TimeoutError:
print('Operator log: {}'.format(k8s.get_operator_log()))
raise

@timeout_decorator.timeout(TEST_TIMEOUT_SEC)
def test_password_rotation(self):
'''
@@ -1838,7 +1897,6 @@ def test_rolling_update_flag(self):
replica = k8s.get_cluster_replica_pod()
self.assertTrue(replica.metadata.creation_timestamp > old_creation_timestamp, "Old master pod was not recreated")


except timeout_decorator.TimeoutError:
print('Operator log: {}'.format(k8s.get_operator_log()))
raise
@@ -2412,6 +2470,39 @@ def assert_distributed_pods(self, target_nodes, cluster_labels='cluster-name=aci

return True

def check_cluster_child_resources_owner_references(self, cluster_name, cluster_namespace='default', inverse=False):
k8s = self.k8s

# check if child resources were updated with owner references
sset = k8s.api.apps_v1.read_namespaced_stateful_set(cluster_name, cluster_namespace)
self.assertTrue(self.has_postgresql_owner_reference(sset.metadata.owner_references, inverse), "statefulset owner reference check failed")

svc = k8s.api.core_v1.read_namespaced_service(cluster_name, cluster_namespace)
self.assertTrue(self.has_postgresql_owner_reference(svc.metadata.owner_references, inverse), "primary service owner reference check failed")
replica_svc = k8s.api.core_v1.read_namespaced_service(cluster_name + "-repl", cluster_namespace)
self.assertTrue(self.has_postgresql_owner_reference(replica_svc.metadata.owner_references, inverse), "replica service owner reference check failed")

ep = k8s.api.core_v1.read_namespaced_endpoints(cluster_name, cluster_namespace)
self.assertTrue(self.has_postgresql_owner_reference(ep.metadata.owner_references, inverse), "primary endpoint owner reference check failed")
replica_ep = k8s.api.core_v1.read_namespaced_endpoints(cluster_name + "-repl", cluster_namespace)
self.assertTrue(self.has_postgresql_owner_reference(replica_ep.metadata.owner_references, inverse), "replica owner reference check failed")

pdb = k8s.api.policy_v1.read_namespaced_pod_disruption_budget("postgres-{}-pdb".format(cluster_name), cluster_namespace)
self.assertTrue(self.has_postgresql_owner_reference(pdb.metadata.owner_references, inverse), "pod disruption owner reference check failed")

pg_secret = k8s.api.core_v1.read_namespaced_secret("postgres.{}.credentials.postgresql.acid.zalan.do".format(cluster_name), cluster_namespace)
self.assertTrue(self.has_postgresql_owner_reference(pg_secret.metadata.owner_references, inverse), "postgres secret owner reference check failed")
standby_secret = k8s.api.core_v1.read_namespaced_secret("standby.{}.credentials.postgresql.acid.zalan.do".format(cluster_name), cluster_namespace)
self.assertTrue(self.has_postgresql_owner_reference(standby_secret.metadata.owner_references, inverse), "standby secret owner reference check failed")

return True

def has_postgresql_owner_reference(self, owner_references, inverse):
if inverse:
return owner_references is None or owner_references[0].kind != 'postgresql'

return owner_references is not None and owner_references[0].kind == 'postgresql' and owner_references[0].controller

def list_databases(self, pod_name):
'''
Get list of databases we might want to iterate over
3 changes: 2 additions & 1 deletion manifests/configmap.yaml
@@ -49,7 +49,7 @@ data:
enable_master_pooler_load_balancer: "false"
enable_password_rotation: "false"
enable_patroni_failsafe_mode: "false"
enable_secrets_deletion: "true"
enable_owner_references: "false"
enable_persistent_volume_claim_deletion: "true"
enable_pgversion_env_var: "true"
# enable_pod_antiaffinity: "false"
@@ -59,6 +59,7 @@ data:
enable_readiness_probe: "false"
enable_replica_load_balancer: "false"
enable_replica_pooler_load_balancer: "false"
enable_secrets_deletion: "true"
# enable_shm_volume: "true"
# enable_sidecars: "true"
enable_spilo_wal_path_compat: "true"
2 changes: 2 additions & 0 deletions manifests/operator-service-account-rbac-openshift.yaml
@@ -94,6 +94,7 @@ rules:
- create
- delete
- get
- patch
- update
# to check nodes for node readiness label
- apiGroups:
@@ -166,6 +167,7 @@ rules:
- get
- list
- patch
- update
# to CRUD cron jobs for logical backups
- apiGroups:
- batch