-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update precedence for schedule #677
Conversation
InfoIt was decided to have the precedence like this so that it is easier for the admin to update the schedule on all the PVCs by just updating it on NS or SC. Without it the admin would need to update it on per PVC basis. Since we do not have a controller that watches NS changes yet, updates to NS would not trigger a reconcile, but if a schedule is present on NS, it will be read and used while reconciling SC or PVC. TestingUsing precedence: sc-first
Using Precedence: pvc-first
Regards |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch enhances the parsing logic of the schedule within annotations to establish the following precedence: NS > SC > PVC.
This adjustment applies to both key rotation and reclaim space processes.
The schedule indicated in the PVC annotations will consistently reflect the value of the highest precedence.
Any manual modifications to this schedule will be overwritten.
Where was this decided ?
b283354
to
cd3dab1
Compare
As we had a meeting about this, it would be good to include a summary of the discussion in this PR. From what I remember, we want to prevent users (non admins) from interfering with space reclaim, which needs:
... did I forget something? |
7a80ff8
to
1b1247c
Compare
// DS flag, read only from the SC | ||
if schedule = r.getScheduleFromSC(ctx, pvc, logger, annotationKey); schedule != "" { | ||
return schedule, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The option should be SC-only instead pvc-first/sc-first?
This would make is a lot simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Having one mode of operation is always better... But it was the general consensus to preserve the existing (current) functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Having one mode of operation is always better... But it was the general consensus to preserve the existing (current) functionality.
I meant SC-only option which toggles between pvc>ns>sc and only sc annotation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. Makes sense :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// DS flag, read only from the SC
This comment should be present. The option can be used by all users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IG you meant shouldn't be present? Removed it.
if _, ok := krcJob.GetAnnotations()[krcJobExcludeAnnotation]; ok { | ||
logger.Info("EncryptionKeyRotationCronJob has exclude annotation set, exiting reconcile") | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered using suspend functionality https://github.com/csi-addons/kubernetes-csi-addons/blob/main/api/csiaddons/v1alpha1/reclaimspacecronjob_types.go#L59-L62 instead of exclude annotation ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The /exclude
annotation serves the purpose of not overwriting the user modifications in a next reconcile. It is a way for the user to tell the reconciler that he/she wants to have control over the CR. Without it the suspend value would be reset (to false) in a subsequent reconcile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@black-dragon74, instead of /exclude
annotation can we just update suspend field from existing CronJob when updating the CronJob?
kubernetes-csi-addons/internal/controller/csiaddons/persistentvolumeclaim_controller.go
Lines 629 to 647 in 598d50f
if rsCronJob != nil { | |
newRSCronJob := constructRSCronJob(rsCronJob.Name, req.Namespace, schedule, pvc.Name) | |
if reflect.DeepEqual(newRSCronJob.Spec, rsCronJob.Spec) { | |
logger.Info("No change in reclaimSpaceCronJob.Spec, exiting reconcile") | |
return ctrl.Result{}, nil | |
} | |
// update rsCronJob spec | |
rsCronJob.Spec = newRSCronJob.Spec | |
err = r.Client.Update(ctx, rsCronJob) | |
if err != nil { | |
logger.Error(err, "Failed to update reclaimSpaceCronJob") | |
return ctrl.Result{}, err | |
} | |
logger.Info("Successfully updated reclaimSpaceCronJob") | |
return ctrl.Result{}, nil | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a user edits a CronJob and adds the suspend field to its spec, it will stop the CronJob controller from executing related operations (like creating new Jobs). However, this won’t prevent the PersistentVolume (PV) controller from updating or recreating the CronJob resource if it notices a difference between the current spec and the expected one.
This is where the /exclude
option comes in. If this option is present, the PV controller will ignore the CronJob spec assertion, allowing the user's changes to persist. The CronJob controller can then reconcile the resource based on the updated spec. This is particularly useful in cases where user wants to have a different schedule.
TLDR; We need a way to tell the controller that the user wants to have control of the existing CR and I would prefer it to be explicit. /exclude
ticks both the boxes. If you have anything else to suggest, I'd love that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@black-dragon74 So its a 3 step process.
user finds the cronjob, adds exclude annotation on it and then sets suspend field to true ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We can do without the exclude annotation for suspend but modifications to schedule depend on it. So we decided to have it present in both the cases.
c8e62f9
to
69abd2c
Compare
69abd2c
to
950696b
Compare
@black-dragon74 the annotations should be mentioned in the documentation too. Please add a paragraph about those. |
May I follow the documentation updates in a separate PR? P.S: The upcoming disable operations are related to this PR and documentation would be similar as well. |
if schedule, err = r.getScheduleFromNS(ctx, pvc, logger, driverName, annotationKey); schedule != "" { | ||
return schedule, nil | ||
} | ||
if errors.Is(err, ErrConnNotFoundRequeueNeeded) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should proceed to check schedule on SC only when you get ErrScheduleNotFound here right ?
you need to error out for ConnNotFound + errors other than ErrScheduleNotFound ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, in cases where the schedule in present only on the NS and we are unable to fetch the NS.
950696b
to
75c600a
Compare
|
||
krcJobScheduleTimeAnnotation = "keyrotation." + csiaddonsv1alpha1.GroupVersion.Group + "/schedule" | ||
krcJobNameAnnotation = "keyrotation." + csiaddonsv1alpha1.GroupVersion.Group + "/cronjob" | ||
krCSIAddonsDriverAnnotation = "keyrotation." + csiaddonsv1alpha1.GroupVersion.Group + "/drivers" | ||
krcJobExcludeAnnotation = "keyrotation." + csiaddonsv1alpha1.GroupVersion.Group + "/exclude" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of exclude, can we have a state that represents state i.e. managed/unmanaged. by default the controller sets it to managed state, and if the user/admin wants they can change the state to unmanaged and modify the CR. if they want to revert back the changes, just set the state back to managed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an annotation or a spec field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now annotation only.
75c600a
to
059a581
Compare
My preference is to include a commit about it in this PR. There is a large chance it is forgotten otherwise. |
func annotationValueMissing(scAnnotations, pvcAnnotations map[string]string, keys []string) bool { | ||
// AnnotationValueMissingOrDiff checks if any of the specified keys are missing | ||
// or differ from the PVC annotations when they are present in the StorageClass annotations. | ||
func annotationValueMissingOrDiff(scAnnotations, pvcAnnotations map[string]string, keys []string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AnnotationValueMissingOrDiff
to annotationValueMissingOrDiff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
} | ||
|
||
logger.Error(err, "Failed to get StorageClass", "StorageClass", storageClassName) | ||
return "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we are returning empty in case of actual error, IMO this should be retried if there any problem in getting the SC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
determineScheduleAndRequeue
will return the ErrScheduleNotFound
upon getting an empty schedule from getScheduleFromSC
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about the case where we failed to get the SC (apart from not found error)? we should retry in that case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -91,6 +91,7 @@ func main() { | |||
flag.StringVar(&cfg.Namespace, "namespace", cfg.Namespace, "Namespace where the CSIAddons pod is deployed") | |||
flag.BoolVar(&enableAdmissionWebhooks, "enable-admission-webhooks", false, "[DEPRECATED] Enable the admission webhooks") | |||
flag.BoolVar(&showVersion, "version", false, "Print Version details") | |||
flag.StringVar(&cfg.SchedulePrecedence, "schedule-precedence", "", "The order of precedence in which schedule of reclaimspace and keyrotation is considered. Possible values are sc-only") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we have a validation for this one to ensure that only user is passing expected values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of invalid values, we fall back to the default precedence (ignoring it). The new precedence is used only when the value is sc-only
. You want to exit in case of invalid value (we do that in case of configmap)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to validate and exit in the main.go or from where ever we are reading the value from the configmap, if we don't have the logs we will not get to know why it got skipped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
519fb8e
to
ba60945
Compare
da1b1e9
to
e5241db
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, please update the PR description to match the code changes
This patch modifies the parsing logic of schedule found in annotation so that the precedence is in form: SC > NS > PVC This applies for both keyrotation and reclaimspace The schedule present on the PVC annotations will always be equal to that of the highest precedence. Modifying it manually will lead to it being overwritten. Signed-off-by: Niraj Yadav <[email protected]>
This commit introduces a new annotation that tracks the managed state for the CRs created by PersistentVolumeClaim controller. If the value of this annotation is `unmanaged` the pvc controller will not make any modifications to the CR. Signed-off-by: Niraj Yadav <[email protected]>
This commit updates the docs for ReclaimSpace and EncryptionKeyRotation for the new state annotation: `csiaddons.openshift.io/state` Signed-off-by: Niraj Yadav <[email protected]> fixme Signed-off-by: Niraj Yadav <[email protected]> fixme Signed-off-by: Niraj Yadav <[email protected]>
e5241db
to
4cb36d7
Compare
This patch updates the schedule parsing logic in the following manner:
schedule-precedence
. Valid values are:sc-first
.sc-first
is the new DS specific flag that only considers SCs as source of truth for schedule.This change aims to put the control of managing RS/KR operations to the Storage Admins.
If an application has specific needs, the Admin can grant the necessary RBACs so that the app owner
can modify the schedule on RS/KR CronJobs. One would achive it in the following manner.
csiaddons.openshift.io/state=managed
tocsiaddons.openshift.io/state=unmanaged
Once a CronJob has
state
set tounmanaged
, the application owner is in control of the operations.