
Add volume group replication controller code #610

Open · wants to merge 3 commits into main from add-vgr-controller-us

Conversation

Contributor

@Nikhil-Ladha Nikhil-Ladha commented Jul 8, 2024

This PR adds the following code/logic:

  • Added volume group replication controller logic
  • Added generated CRDs, RBACs
  • Added docs to create VGR, VGRClass and VGRContent CRs

Contributor Author

Nikhil-Ladha commented Jul 8, 2024

The changes are not yet tested; I will wait for #605 to be merged, then update a few things and test this.
In the meantime, please feel free to provide some initial reviews.
Thanks :)

@Nikhil-Ladha Nikhil-Ladha force-pushed the add-vgr-controller-us branch from f322110 to 10df032 on July 11, 2024 07:36
internal/proto/volumegroup.proto (review thread: outdated, resolved)
internal/proto/volumegroup.proto (review thread: outdated, resolved)
@Nikhil-Ladha Nikhil-Ladha force-pushed the add-vgr-controller-us branch from 10df032 to 15b901d on July 11, 2024 09:27
Member

@Madhu-1 Madhu-1 left a comment


Still need to do a few more rounds of review; I've only done it for a few files so far.

controllers/replication.storage/finalizers.go (review thread: outdated, resolved)
controllers/replication.storage/finalizers.go (review thread: outdated, resolved)
controllers/replication.storage/finalizers.go (review thread: outdated, resolved)
controllers/replication.storage/utils.go (review thread: outdated, resolved)
// Check if the PVC has any labels defined
objLabels := obj.GetLabels()
if len(objLabels) == 0 {
	return []reconcile.Request{}
}
Member

There can be a case where someone removed the label from the PVC; we need to remove that PVC from the VGR as well.

Contributor Author

From what I understand from the docs, the enqueue function is run for both the old and the new object in the case of an update event. So, when the request is run for the older object, the details from the PVC annotation will be fetched and the VGR reconcile request will be enqueued; and since the object is already updated in the cluster, the selector won't include that PVC.
Though this is all theoretical; I will test it to confirm.
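The update-event semantics being discussed can be sketched with a simplified, dependency-free stand-in for the controller-runtime mapping function. All types here (`request`, `vgr`, `mapPVCToVGR`) are illustrative stand-ins, not the real controller API:

```go
package main

import "fmt"

// request is a stand-in for reconcile.Request (hypothetical simplification).
type request struct{ namespace, name string }

// vgr is a stand-in for a VolumeGroupReplication carrying a parsed pvcSelector.
type vgr struct {
	namespace, name string
	selector        map[string]string // parsed form of the pvcSelector annotation
}

// matches reports whether the PVC labels satisfy every selector key/value pair.
func (v vgr) matches(pvcLabels map[string]string) bool {
	for k, want := range v.selector {
		if pvcLabels[k] != want {
			return false
		}
	}
	return true
}

// mapPVCToVGR mimics the enqueue function: given one object's labels,
// return reconcile requests for every VGR whose selector matches them.
func mapPVCToVGR(vgrs []vgr, pvcLabels map[string]string) []request {
	var reqs []request
	for _, v := range vgrs {
		if v.matches(pvcLabels) {
			reqs = append(reqs, request{v.namespace, v.name})
		}
	}
	return reqs
}

func main() {
	vgrs := []vgr{{"rook-ceph", "volumegroupreplication-sample", map[string]string{"group": "replication"}}}

	// On an Update event the mapping function is invoked for BOTH the old
	// and the new object, so a label removal still enqueues the VGR once
	// (via the old object), and a re-list against the updated cluster state
	// then excludes the PVC from the group.
	oldLabels := map[string]string{"group": "replication"}
	newLabels := map[string]string{} // label removed by the user

	fmt.Println(len(mapPVCToVGR(vgrs, oldLabels))) // old object still matches -> 1
	fmt.Println(len(mapPVCToVGR(vgrs, newLabels))) // new object no longer matches -> 0
}
```

This only models the matching logic; the real wiring would go through `handler.EnqueueRequestsFromMapFunc`, which receives each object involved in an event.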

Comment on lines 445 to 473
vgrObjsList := &replicationv1alpha1.VolumeGroupReplicationList{}
logger := log.FromContext(context)
err := r.Client.List(context, vgrObjsList)
if err != nil {
	logger.Error(err, "failed to list VolumeGroupReplication instances")
	return []reconcile.Request{}
}

// Check if the PVC labels match any VGR based on the selector present in its annotation
for _, vgr := range vgrObjsList.Items {
	if vgr.Annotations != nil && vgr.Annotations["pvcSelector"] != "" {
		labelSelector, err := labels.Parse(vgr.Annotations["pvcSelector"])
		if err != nil {
			logger.Error(err, "failed to parse selector from VolumeGroupReplication's annotation")
			return []reconcile.Request{}
		}
		objLabelsSet := labels.Set(objLabels)
		if labelSelector.Matches(objLabelsSet) {
			return []reconcile.Request{
				{
					NamespacedName: types.NamespacedName{
						Namespace: vgr.Namespace,
						Name:      vgr.Name,
					},
				},
			}
		}
	}
}

Contributor Author

@Nikhil-Ladha Nikhil-Ladha Jul 12, 2024

Well, I will just update the code to fetch the VGR name from the annotation, since now we are adding the owner annotation to the PVC.
EDIT: OK, I can't exactly do that, since new PVCs wouldn't be considered then.

Contributor Author

The reason we can't use a helper function like the one you mentioned is that the selector is not a single field we can compare directly; to match labels we need to build a selector requirement, as done in the getPVCFromDataSource function.

@Nikhil-Ladha Nikhil-Ladha force-pushed the add-vgr-controller-us branch from 15b901d to 6b05893 on July 12, 2024 12:02
@mergify mergify bot added the api Change to the API, requires extra care label Jul 12, 2024
@mergify mergify bot requested a review from ShyamsundarR July 12, 2024 12:02
@Nikhil-Ladha Nikhil-Ladha force-pushed the add-vgr-controller-us branch 7 times, most recently from 0121241 to 8e8830d on July 18, 2024 11:38
@mergify mergify bot added the vendor Pull requests that update vendored dependencies label Jul 18, 2024
@Nikhil-Ladha Nikhil-Ladha force-pushed the add-vgr-controller-us branch from 8e8830d to fbeb1a7 on July 19, 2024 10:13
Collaborator

@nixpanic nixpanic left a comment


Can you also add documentation with an example in the docs/ directory? That makes it easier to visualize how things are supposed to work.

@nixpanic nixpanic dismissed their stale review August 5, 2024 09:49

Nothing blocking, only test results missing.


"github.com/csi-addons/kubernetes-csi-addons/internal/controller/replication.storage/replication"
Member

reorganize imports, internal pkgs go last

- Then, create the VR CR and add VGR name/namespace as the annotation to it
- Update the VGR status with the VR status.

In case of deletion:
Member

Handle the deletion case of the object first.
Maybe define an instance object like https://github.com/csi-addons/kubernetes-csi-addons/blob/main/internal/controller/replication.storage/volumereplication_controller.go#L616
and then write helper functions with it?

The reconcile function is >250 lines.

Contributor Author

@Nikhil-Ladha Nikhil-Ladha Aug 7, 2024

> Handle deletion case of object first.

I feel that if we handle deletion first, it becomes a bit arbitrary when reading the code to understand what exactly is happening and why. Having it at the end tells the reader that all these operations happened when the CR was created, and that's why all these cleanups are needed.

> maybe define an instance object like https://github.com/csi-addons/kubernetes-csi-addons/blob/main/internal/controller/replication.storage/volumereplication_controller.go#L616
> and then write helper functions with it ?

I don't see a use case for defining such an instance object here; the only helper function is getPVCFromDataSource, and it doesn't make sense to add an extra type for a single function.

> The reconcile function is >250 lines.

It's a bit long, I agree, but are there any rules defined for the length of a reconcile function?
Secondly, the operations happening in the reconcile function are very basic and can't be broken into individual helper functions; those that can be broken out already have been. If you have any suggestions, I'd be happy to hear them.

Contributor Author

Updated the code to move the creation of CRs into their individual functions.
Now it's < 250 :)
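The refactor described in this comment, moving each CR creation into its own helper on an instance object (as the reviewer suggested), might be sketched like so. All types and steps are illustrative stand-ins, not the real controller structs:

```go
package main

import (
	"errors"
	"fmt"
)

// vgrInstance loosely mirrors the suggested pattern: bundle the state a
// reconcile pass needs, then hang small helpers off it so the top-level
// reconcile stays short.
type vgrInstance struct {
	name           string
	contentCreated bool
	replicaCreated bool
}

// createVGRContent stands in for creating the VolumeGroupReplicationContent CR.
func (i *vgrInstance) createVGRContent() error {
	i.contentCreated = true
	return nil
}

// createVolumeReplication stands in for creating the dependent VolumeReplication CR.
func (i *vgrInstance) createVolumeReplication() error {
	if !i.contentCreated {
		return errors.New("VGRContent must exist before the VR is created")
	}
	i.replicaCreated = true
	return nil
}

// reconcile stays short because each CR creation lives in its own helper.
func (i *vgrInstance) reconcile() error {
	if err := i.createVGRContent(); err != nil {
		return err
	}
	return i.createVolumeReplication()
}

func main() {
	inst := &vgrInstance{name: "volumegroupreplication-sample"}
	if err := inst.reconcile(); err != nil {
		fmt.Println("reconcile failed:", err)
		return
	}
	fmt.Println("reconcile succeeded")
}
```

The payoff is mostly readability: each helper can be unit-tested and error-wrapped on its own, while the reconcile body reads as a sequence of named steps.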

@Nikhil-Ladha Nikhil-Ladha force-pushed the add-vgr-controller-us branch from cb4b33d to 9ad19b0 on August 7, 2024 09:56
@Nikhil-Ladha
Contributor Author

> Looks very promising to me, thanks!
>
> Can you include some testing results in a PR-comment?

Here's the result of creating the VGR:

apiVersion: v1
items:
- apiVersion: replication.storage.openshift.io/v1alpha1
  kind: VolumeGroupReplication
  metadata:
    annotations:
      pvcSelector: group=replication
      replication.storage.openshift.io/volume-replication-name: vr-c4e68900-e0c5-465b-802a-c6fd011ff40c
      replication.storage.openshift.io/volumegroupreplication-content-name: vgrcontent-c4e68900-e0c5-465b-802a-c6fd011ff40c
    creationTimestamp: "2024-08-08T05:53:25Z"
    finalizers:
    - replication.storage.openshift.io/vgr-protection
    generation: 3
    labels:
      app.kubernetes.io/created-by: kubernetes-csi-addons
      app.kubernetes.io/instance: volumegroupreplication-sample
      app.kubernetes.io/managed-by: kustomize
      app.kubernetes.io/name: volumegroupreplication
      app.kubernetes.io/part-of: kubernetes-csi-addons
    name: volumegroupreplication-sample
    namespace: rook-ceph
    resourceVersion: "193975"
    uid: c4e68900-e0c5-465b-802a-c6fd011ff40c
  spec:
    autoResync: false
    replicationState: primary
    source:
      selector:
        matchLabels:
          group: replication
    volumeGroupReplicationClassName: volumegroupreplicationclass-sample
    volumeGroupReplicationContentName: vgrcontent-c4e68900-e0c5-465b-802a-c6fd011ff40c
    volumeReplicationClassName: rbd-volumereplicationclass
    volumeReplicationName: vr-c4e68900-e0c5-465b-802a-c6fd011ff40c
  status:
    conditions:
    - lastTransitionTime: "2024-08-08T05:53:26Z"
      message: ""
      observedGeneration: 1
      reason: FailedToPromote
      status: "False"
      type: Completed
    - lastTransitionTime: "2024-08-08T05:53:26Z"
      message: ""
      observedGeneration: 1
      reason: Error
      status: "True"
      type: Degraded
    - lastTransitionTime: "2024-08-08T05:53:26Z"
      message: ""
      observedGeneration: 1
      reason: NotResyncing
      status: "False"
      type: Resyncing
    message: 'volume 0001-0009-rook-ceph-0000000000000002-104cfb18-4ea6-4916-b301-94401d466e34
      not found: Failed as image not found (internal RBD image not found)'
    observedGeneration: 1
    persistentVolumeClaimsRefList:
    - name: test-pvc
    state: Unknown

The CR is created, and the dependent CRs, i.e. VR and VGRContent, are created as well:

apiVersion: v1
items:
- apiVersion: replication.storage.openshift.io/v1alpha1
  kind: VolumeGroupReplicationContent
  metadata:
    annotations:
      replication.storage.openshift.io/volumegroupref: volumegroupreplication-sample/rook-ceph
    creationTimestamp: "2024-08-08T05:53:25Z"
    finalizers:
    - replication.storage.openshift.io/vgr-protection
    generation: 2
    name: vgrcontent-c4e68900-e0c5-465b-802a-c6fd011ff40c
    resourceVersion: "193969"
    uid: 10c4a992-107b-49ad-aa77-6bc989f42184
  spec:
    provisioner: rook-ceph.rbd.csi.ceph.com
    source:
      volumeHandles:
      - 0001-0009-rook-ceph-0000000000000002-6526bb74-044a-489f-a821-7d8c042719b4
    volumeGroupReplicationClassName: volumegroupreplicationclass-sample
    volumeGroupReplicationHandle: 0001-0009-rook-ceph-0000000000000002-104cfb18-4ea6-4916-b301-94401d466e34
    volumeGroupReplicationRef:
      apiVersion: replication.storage.openshift.io/v1alpha1
      kind: VolumeGroupReplication
      name: volumegroupreplication-sample
      namespace: rook-ceph
      uid: c4e68900-e0c5-465b-802a-c6fd011ff40c
  status:
    persistentVolumeRefList:
    - name: pvc-7228cbb8-7a2f-4daf-bea4-4dabbd2ecabe
apiVersion: v1
items:
- apiVersion: replication.storage.openshift.io/v1alpha1
  kind: VolumeReplication
  metadata:
    annotations:
      replication.storage.openshift.io/volumegroupref: volumegroupreplication-sample/rook-ceph
    creationTimestamp: "2024-08-08T05:53:26Z"
    finalizers:
    - replication.storage.openshift.io
    generation: 1
    name: vr-c4e68900-e0c5-465b-802a-c6fd011ff40c
    namespace: rook-ceph
    ownerReferences:
    - apiVersion: replication.storage.openshift.io/v1alpha1
      kind: VolumeGroupReplication
      name: volumegroupreplication-sample
      uid: c4e68900-e0c5-465b-802a-c6fd011ff40c
    resourceVersion: "193974"
    uid: 2f1f45bf-417b-41ad-9973-71741dda555d
  spec:
    autoResync: false
    dataSource:
      apiGroup: replication.storage.openshift.io
      kind: VolumeGroupReplication
      name: volumegroupreplication-sample
    replicationHandle: ""
    replicationState: primary
    volumeReplicationClass: rbd-volumereplicationclass
  status:
    conditions:
    - lastTransitionTime: "2024-08-08T05:53:26Z"
      message: ""
      observedGeneration: 1
      reason: FailedToPromote
      status: "False"
      type: Completed
    - lastTransitionTime: "2024-08-08T05:53:26Z"
      message: ""
      observedGeneration: 1
      reason: Error
      status: "True"
      type: Degraded
    - lastTransitionTime: "2024-08-08T05:53:26Z"
      message: ""
      observedGeneration: 1
      reason: NotResyncing
      status: "False"
      type: Resyncing
    message: 'volume 0001-0009-rook-ceph-0000000000000002-104cfb18-4ea6-4916-b301-94401d466e34
      not found: Failed as image not found (internal RBD image not found)'
    observedGeneration: 1
    state: Unknown

@nixpanic
Collaborator

There are a few minor conflicts that need to be resolved.

Is there a way to set the status.conditions[*].message instead, or in addition to the single (last?) status.message?

@Nikhil-Ladha Nikhil-Ladha force-pushed the add-vgr-controller-us branch from 9ad19b0 to 6ed68f4 on August 27, 2024 10:14
@Nikhil-Ladha
Contributor Author

Nikhil-Ladha commented Aug 27, 2024

> There are a few minor conflicts that need to be resolved.

Resolved the conflicts.

> Is there a way to set the status.conditions[*].message instead, or in addition to the single (last?) status.message?

Well, we are just copying the status from the VolumeReplication CR, so I am not sure there is any way to change that other than updating the VolumeReplication CR :/
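One possible (hypothetical) way to address the reviewer's question would be to aggregate the per-condition messages into a combined string rather than copying only the single status.message. This is a sketch with simplified stand-in types, not the real metav1.Condition API or the controller's actual behavior:

```go
package main

import (
	"fmt"
	"strings"
)

// condition mirrors the fields of a status condition used here (simplified).
type condition struct {
	condType string
	status   string
	reason   string
	message  string
}

// aggregateMessages is a hypothetical helper: it joins the non-empty
// per-condition messages so they could be surfaced alongside (or instead
// of) the single copied status.message.
func aggregateMessages(conds []condition) string {
	var parts []string
	for _, c := range conds {
		if c.message != "" {
			parts = append(parts, fmt.Sprintf("%s=%s: %s", c.condType, c.status, c.message))
		}
	}
	return strings.Join(parts, "; ")
}

func main() {
	conds := []condition{
		{"Completed", "False", "FailedToPromote", "promote failed"},
		{"Degraded", "True", "Error", ""}, // empty messages are skipped
	}
	fmt.Println(aggregateMessages(conds))
}
```

Whether this belongs in the VGR controller or in the VolumeReplication CR's own status handling is exactly the open question in the thread above.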

@mergify mergify bot requested a review from black-dragon74 September 3, 2024 12:15
@Nikhil-Ladha Nikhil-Ladha force-pushed the add-vgr-controller-us branch 3 times, most recently from 569acd1 to 9ada567 on September 3, 2024 13:25
@Nikhil-Ladha Nikhil-Ladha force-pushed the add-vgr-controller-us branch from 9ada567 to 7e35516 on October 25, 2024 05:28
@Nikhil-Ladha Nikhil-Ladha force-pushed the add-vgr-controller-us branch 2 times, most recently from 3634fbb to 1588709 on November 21, 2024 04:07
@Nikhil-Ladha Nikhil-Ladha force-pushed the add-vgr-controller-us branch 10 times, most recently from 7d2176f to fdbc6b0 on December 4, 2024 08:40
added controller logic for volume group replication

Signed-off-by: Nikhil-Ladha <[email protected]>
add generated changes for crds, rbacs

Signed-off-by: Nikhil-Ladha <[email protected]>
add docs for volumegroupreplication and volumegroupreplicationclass
CRs.

Signed-off-by: Nikhil-Ladha <[email protected]>
Labels
api Change to the API, requires extra care vendor Pull requests that update vendored dependencies
4 participants