-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add kserve raw reconciler #250
base: main
Are you sure you want to change the base?
add kserve raw reconciler #250
Conversation
Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
Skipping CI for Draft Pull Request. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: VedantMahabaleshwarkar The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
NewKServeMetricsServiceReconciler(client), | ||
NewKServeMetricsServiceMonitorReconciler(client), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this break existing metrics functionality for serverless?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I got my answer: #250 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I remember is that these reconcilers were left for doing clean-up of some resources that we no longer need (see the old createDesiredResource
that just returns nil as the "wanted" resource).
By removing these reconcilers, the clean-up is what will no longer happen for version upgrades. I'm not sure if not doing the clean-up will break something, but I'm quite sure we still want the clean-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old metrics reconcilers for kserve serverless were present pre 2.8 (EUS release)
Kserve Raw support will happen after 2.13 at minimum (2.12 is a EUS if i'm not wrong)
I can keep an "empty reconciliation" loop around, but the old resources should long by the time kserve raw is GA. This was my reasoning for assuming that it's safe to delete the old reconcilers for serverless
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the upgrade path for any install that still has those reconcilers should look like
2.7 or lower
-> 2.8
-> 2.12
(at minimum) -> kserve raw GA.
If i'm mistaken in this assumption I'll re-add the cleanup.
} | ||
|
||
if !createRoute { | ||
log.Info("InferenceService does not have '" + constants.LabelEnableRoute + "' annotation set to 'True'. Skipping route creation") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either set to false or does not exist
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't get this, could you elaborate?
} | ||
|
||
func (r *KserveRawRouteReconciler) Cleanup(_ context.Context, _ logr.Logger, _ string) error { | ||
// NOOP - resources are deleted together with ISVCs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to check existing route that are not created by the controller, e.g. no owner reference?
NoResourceRemoval | ||
client client.Client | ||
serviceMonitorHandler resources.ServiceMonitorHandler | ||
deltaProcessor processors.DeltaProcessor | ||
} | ||
|
||
func NewKServeMetricsServiceMonitorReconciler(client client.Client) *KserveMetricsServiceMonitorReconciler { | ||
return &KserveMetricsServiceMonitorReconciler{ | ||
func NewRawKServeMetricsServiceMonitorReconciler(client client.Client) *KserveRawMetricsServiceMonitorReconciler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe keep the same but skips reconciliation for serverless mode?
@@ -17,6 +17,7 @@ package reconcilers | |||
|
|||
import ( | |||
"context" | |||
"github.com/hashicorp/go-multierror" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this to the next group of imports.
func (r *KserveRawInferenceServiceReconciler) Reconcile(ctx context.Context, log logr.Logger, isvc *kservev1beta1.InferenceService) error { | ||
var reconcileErrors *multierror.Error | ||
for _, reconciler := range r.subResourceReconcilers { | ||
reconcileErrors = multierror.Append(reconcileErrors, reconciler.Reconcile(ctx, log, isvc)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also will need to implement OnDeletionOfKserveInferenceService
and CleanupNamespaceIfNoKserveIsvcExists
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep those changes are made already :D, they will be a part of the next set of commits
NewKServeMetricsServiceReconciler(client), | ||
NewKServeMetricsServiceMonitorReconciler(client), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I remember is that these reconcilers were left for doing clean-up of some resources that we no longer need (see the old createDesiredResource
that just returns nil as the "wanted" resource).
By removing these reconcilers, the clean-up is what will no longer happen for version upgrades. I'm not sure if not doing the clean-up will break something, but I'm quite sure we still want the clean-up.
if err != nil { | ||
return nil, err | ||
} | ||
metricsService := &v1.Service{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Raw, isn't it better to inject the serving.kserve.io/enable-prometheus-scraping: "true"
as documented in https://kserve.github.io/website/master/modelserving/observability/prometheus_metrics/ ?
I'm not sure we should be doing our own way, given the push for better API support in ODH.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm this puts the onus of enabling metrics for a inferenceservice on the creator of the inferenceservice, rather than the platform. Even if we add logic to inject this in the UI flow, if a user is using yamls then they would have to specifically remember to enable metrics. In ODH/RHOAI, I'm guessing we always want metrics and the user should not have to worry about remembering to enable them.
But I can be convinced otherwise :)
cc @danielezonca @terrytangyuan WDYT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we could enable it by default in the ConfigMap: https://github.com/opendatahub-io/kserve/blob/ae30b971320f45bc98dd362b6b8873d7c1a48873/config/configmap/inferenceservice.yaml#L447. But doing that would also affect Serverless deployments, so not sure if it will impact something.
What I don't know is if this is specific to Serverless, because docs only refer to the Serverless use-case.
In case it works for Raw, I said to inject meaning we could use a mutating webhook in odh-model-controller to inject the annotation on the ISVC. I wasn't meaning that the UI should do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@israel-hdez revisiting this comment, we actively moved away from the kserve native annotations for metrics i.e serving.kserve.io/enable-prometheus-scraping: "true"
for Serverless, and we have instead opted for prometheus.io/port
and prometheus.io/path
annotations on the ServingRuntime instead.
So going back to serving.kserve.io/enable-prometheus-scraping: "true"
for RawDeployment would IMO be more confusing. Creating a SeviceMonitor and Service for metrics will allow us to be consistent w.r.t metrics within RHOAI.
} | ||
} | ||
|
||
func (r *KserveRawRouteReconciler) Reconcile(ctx context.Context, log logr.Logger, isvc *kservev1beta1.InferenceService) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For exposing, KServe supports creating Ingress
resources, but we have it turned off. OpenShift can follow-up and create Routes.
Given the push for better supporting the API, rather than doing our own thing, we should try to align better with upstream.
By creating Routes on our side, KServe is no longer aware. So, it won't expose what's the endpoint in status.url
of the InferenceService. This will just increase complexity on dashboard code. Also, it will be more complex for users because the status.url
field will be useless for Raw, but will be useful for Serverless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nice, I realized the thing about the status.url
myself but didn't realize an option existed to do this in a way such that Kserve is aware. I'll incorporate this.
…e constants Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
Signed-off-by: Vedant Mahabaleshwarkar <[email protected]>
/retest-required |
@VedantMahabaleshwarkar: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
not sure why the images job is failing, I think it's likely a infra issue. Local builds are succeeding |
/hold |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
hi @VedantMahabaleshwarkar could u pleas point me where is the unit test for this part of code ? |
@mwaykole about unit tests:
Hope that helps |
) | ||
|
||
const ( | ||
inferenceServiceLabelName = "serving.kserve.io/inferenceservice" | ||
) | ||
|
||
var _ SubResourceReconciler = (*KserveMetricsServiceReconciler)(nil) | ||
var _ SubResourceReconciler = (*KserveRawMetricsServiceReconciler)(nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we will support Kserve Raw Metrics as well , and it should work same as server less ?
Description
https://issues.redhat.com/browse/RHOAIENG-10293
How Has This Been Tested?
Merge criteria: