Sync Catalog: Add k8s endpoint health state to consul health check #3874
2d02bf3 to e360996
@david-yu could you possibly help get a review on this please?
bumpety bump in case anyone's watching
@jukie taking a look at this now
Confirmed that the relevant acceptance tests are passing, and the changes look good to me. Just one small comment to address!
Tomorrow I want to make sure the failing peering acceptance tests are unrelated, and if you're able to make that one small test case change then I think we can merge after.
Hey @jukie, I was about to merge yesterday, but then I was thinking more about the use cases you listed in the issue. Since Consul supports health checks, should we instead use Consul health checks to reflect the readiness of an endpoint, rather than ignoring it entirely if it's non-ready? That way the user can configure Consul as they wish based on the health checks. I talked to the team and the consensus was that it makes more sense to use the health checks to reflect the readiness status. If you're willing to make that change, I'll keep an eye on this PR (or a new PR if you make one, feel free to tag me) for those changes, or I can bring up the changes for prioritization within the team and work on it myself (but I'm not sure when I'll be able to commit to having this done). Thanks so much for your effort on this.
@ndhanushkodi could you explain how that would work? I think that may be in addition to or an extension of this change vs instead of.
I'd be willing to make a new PR to support consul health checks as well but I'd like to understand the opinion that it wouldn't be a separate feature contribution. The existing behavior reacts to Service Endpoint changes and this extends that to make use of the additional state provided by EndpointSlices.
Thanks for your willingness to contribute and so sorry for the late response! The existing behavior seems to register all endpoints in the endpoint slice, regardless of the condition of that endpoint, so if I created a service where one of the endpoints went into a non-ready state, that would still get registered as a service instance in Consul. This PR would change the behavior to no longer register service instances in Consul for endpoints that are in a non-ready state. My thought with the suggestion above was we should continue to register non-ready endpoints into Consul as we do today, but instead mark the health check unhealthy. Rather than changing the behavior to not register the non-ready endpoints, I'm proposing the health check change as the way to deal with non-ready endpoints. For the cases that use the function Let me know if I'm misunderstanding anything you're saying, or if this makes sense.
This is for K8s Services, right? Whether the endpoint addresses should exist in the K8s Endpoint itself? This change is about whether the K8s Endpoint addresses that exist should get synced to Consul if they are non-ready, so I think that feature doesn't quite match the behavior I mentioned in my suggestion. Again, let me know if I'm misunderstanding what you mean!
@ndhanushkodi Sorry for the late reply as well but coming back to this. You are correct that it would change behavior, but I think that behavior matches the intention given the way that services would already be registered with details stating that "Kubernetes health checks passing": https://github.com/hashicorp/consul-k8s/blob/main/control-plane/catalog/to-consul/resource.go#L44-L48 If you'd like to include a way that replicates prior behavior for backwards compatibility, I could instead add a new option that would always register an endpoint no matter what. This change is to achieve the desired behavior of what catalog sync says it's doing. As shown in the link above, when services are registered it is already assumed that they are "ready", and skipping non-ready endpoints looks to be a limitation in how Endpoints worked since that state information wasn't available. Now that EndpointSlices are in use we have more granular control and should make use of it.
Thinking more, ignoring might not be the best path but instead adjusting the health check state that's added to the registered consul service.
94b6367 to 4ab7e80
@ndhanushkodi I've made that adjustment to instead include endpoint health info and retain the logic of still registering all services. Could you have another look please?
Not sure if @ndhanushkodi is still an active maintainer so adding @zalimeni as well
Sorry for the late response, I was working on some other tickets. Thanks for making these changes, I agree with this approach. I think the resource_test.go file will have failing tests (when I pulled those changes locally, they didn't work). Are you able to update those? Once that's in a passing state I can run our pipelines against your changes and we can get it merged.
Signed-off-by: jukie <[email protected]>
@ndhanushkodi sorry for my lack of checking beforehand! Updated the tests and re-ran the acceptance ones as well.
@ndhanushkodi were there any other changes you'd like to see here or is it ready to merge?
We are very interested in having this functionality at my company!
…3874) Signed-off-by: jukie <[email protected]> --------- Signed-off-by: jukie <[email protected]>
Changes proposed in this PR
I recently contributed a change here from using Endpoints to EndpointSlices.
EndpointSlices offer additional state, such as readiness information, but the current approach doesn't take advantage of it and instead only adds or removes endpoints when they are created or deleted.
We can utilize the Endpoint conditions to accurately represent a registered service's health state instead of always registering it as healthy.
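As a rough illustration of the idea (not the code in this PR), the mapping from endpoint conditions to a Consul check status could look something like the sketch below. The `EndpointConditions` struct here is a local stand-in that mirrors the shape of the Kubernetes `discovery/v1` type, the status strings match Consul's "passing" and "critical" check states, and the `consulStatusFor` and `boolPtr` helpers are hypothetical names made up for this example. Treating an unset `Ready` as ready follows the Kubernetes API guidance for unknown readiness.

```go
package main

import "fmt"

// EndpointConditions is a local stand-in mirroring the shape of the
// Kubernetes discovery/v1 EndpointConditions type (assumption: only the
// fields relevant to this sketch are included).
type EndpointConditions struct {
	Ready       *bool
	Serving     *bool
	Terminating *bool
}

// Consul health check status values used in this sketch.
const (
	healthPassing  = "passing"
	healthCritical = "critical"
)

// consulStatusFor is a hypothetical helper: every endpoint is still
// registered, but a non-ready endpoint gets a critical check instead of
// being reported as healthy. A nil Ready is interpreted as ready.
func consulStatusFor(c EndpointConditions) string {
	if c.Ready == nil || *c.Ready {
		return healthPassing
	}
	return healthCritical
}

// boolPtr is a small helper for building the pointer fields above.
func boolPtr(b bool) *bool { return &b }

func main() {
	ready := EndpointConditions{Ready: boolPtr(true)}
	terminating := EndpointConditions{Ready: boolPtr(false), Terminating: boolPtr(true)}
	unknown := EndpointConditions{}

	fmt.Println(consulStatusFor(ready))
	fmt.Println(consulStatusFor(terminating))
	fmt.Println(consulStatusFor(unknown))
}
```

The point of the sketch is that the endpoint is never dropped from the catalog; only the status attached to its health check changes, which leaves it to the Consul side to decide how to treat unhealthy instances.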
How I've tested this PR
How I expect reviewers to test this PR
An easy way to test would be to launch a pod with a long terminationGracePeriodSeconds that stays up, then initiate a deployment rollout or delete the pod. Once the pod enters the terminating state you'll see the same state in the relevant EndpointSlice. At this point the registered Consul service instance will have its health check updated to "critical".
Checklist
fixes #3898