Healthchecks #7686
Replies: 7 comments
-
Here's one example -- in this case the server is not known. This particular issue is less relevant with a server launcher - but other errors like permissions would be the same. Note that the real info we need is in relatedHTTPCode.
From a script we can get this out with
which may be sufficient (though jq is likely not on our base image, nor is python, so we may need to add one of them, or do it in the script).
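For illustration, here is a minimal Python sketch of pulling relatedHTTPCode out of a response body. It is not the command from this thread: the function name and the sample body are made up, and only the relatedHTTPCode field name comes from the discussion above.

```python
import json
from typing import Optional


def related_http_code(body: str) -> Optional[int]:
    """Return relatedHTTPCode from a JSON body, or None if it is missing or unparseable."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    code = payload.get("relatedHTTPCode")
    # Reject booleans, which Python would otherwise accept as ints.
    if isinstance(code, int) and not isinstance(code, bool):
        return code
    return None


# Hypothetical body for illustration only; real responses carry more fields.
sample = '{"relatedHTTPCode": 404, "exceptionErrorMessage": "server not known"}'
print(related_http_code(sample))
```

Returning None for anything unexpected lets the caller treat "could not tell" as a failure rather than guessing.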
-
Adding 'jq' (I mostly use this, or python) is probably the most pragmatic & understandable option. An awk version could work for this case, but would be less readable and likely more buggy.
-
If we assume that a relatedHTTPCode of 200 is both necessary and sufficient for a good status, then a very simple check can be done by running an exec with the above curl command and piping the output to a simple regex -- in the test case this might be
This does not solve the more general case, or even begin to address aggregated results, but it will work today with the existing platform to determine that a server is at least running. Generally, an external check is also much better than relying on exec. However, I will add this note to the docs.
-
HOWEVER ... A server that is starting up will also return 200 -- yet it is not ready.
Therefore the relatedHTTPCode is insufficient. Using jq seems inevitable if any check is to be possible without adding new REST API calls. A better solution would be clean status check calls.
-
As discussed with Nigel, I think a new healthcheck API would make sense, one which returns 200 or not as per Kubernetes expectations. I envisage that for each server type it checks the status of its capabilities - initially this could use existing status calls. If these are not adequate, we could add more logic in this server-based healthcheck API shim layer. For example, it might want to issue a call to a downstream OMAS to see if it is working. The glossary author UI issues a 'get all glossaries' request with a page size of 1 to see if the backend is active and the UI is usable; we could do something similar in this layer if we need to.
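To make the shim idea concrete, here is a rough Python sketch of the roll-up logic only. It is not the prototype: the function name, the check names, and the choice of 503 for "not healthy" are all assumptions for illustration.

```python
from typing import Callable, Dict

# A capability check returns True when that capability is working.
CapabilityCheck = Callable[[], bool]


def health_status(checks: Dict[str, CapabilityCheck]) -> int:
    """Return 200 only if every capability check passes, otherwise 503 (assumed code)."""
    for check in checks.values():
        try:
            if not check():
                return 503
        except Exception:
            # A check that raises is treated the same as one that fails.
            return 503
    return 200


# Hypothetical checks: stand-ins for existing status calls or a downstream OMAS call.
checks = {
    "platform_active": lambda: True,
    "downstream_omas": lambda: False,
}
print(health_status(checks))
```

The point is that the caller only ever sees a plain HTTP status, and any body inspection stays inside the shim.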
-
I'll take a look at prototyping this.
-
I've merged the first version of some prototype code into the egeria-cloudnative repository. This is very rudimentary but offers a basic API to see if a server is running. Next steps may include
As well as some tech omissions like
However I thought it worth checking in as-is since
-
In the workgroup call yesterday we talked briefly about healthchecks.
I started off capturing some output and summarizing the behaviour. Since this was a first pass to add into the documentation, I worked in markdown. I haven't spent long on it yet.
There's a PR at odpi/egeria-docs#775
However it reminded me immediately of the significant issue we have with our current status calls - even before we consider what 'ready' means, and how to rollup status.
One of the most used Kubernetes healthcheck approaches is to make an HTTP request to a specific endpoint. If it returns a status >=200 and <400, life is good - pass. If not, or if it times out - fail. This is simple to define, and would usually work well.
Egeria though -- assuming the platform is active -- will always return HTTP/200, and inspection of the body is required to determine the real result.
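A small Python sketch of the mismatch, under stated assumptions: the function names and the sample body are made up, and only the >=200 and <400 rule and the relatedHTTPCode field come from this thread.

```python
import json


def http_probe_passes(status_code: int) -> bool:
    """Pass rule for an HTTP probe: any status from 200 up to, but not including, 400."""
    return 200 <= status_code < 400


def body_reports_success(body: str) -> bool:
    """True only if the body parses as JSON and its relatedHTTPCode is 200."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("relatedHTTPCode") == 200


# Hypothetical exchange: transport status is 200, but the body reports a failure.
transport_status = 200
body = '{"relatedHTTPCode": 404}'
print(http_probe_passes(transport_status))  # True: the probe would pass
print(body_reports_success(body))           # False: the real result is a failure
```

So a plain HTTP probe would pass in exactly the case we want it to fail.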
Healthchecks can be defined to issue a request within the container. One could imagine this being a script, which would do the interpretation, and return the simpler result we are looking for, but this is (imo) ugly.
A much better approach would be to support simpler http requests aligned with the way many applications work, but this is different to our standard style. Perhaps it could be more flexible in our simple server launcher (vs the chassis)
We could potentially take a dual-track here:
a) Define the appropriate command which could - however ugly - be used within an 'exec' k8s healthcheck. This could be used today and just needs a little experimentation. I'll take a look, and see if I can add it into these docs.
a1) Extension: do this via a simple cli tool (script or java). This could also be done fairly quickly.
b) Implement simple http requests which are k8s healthcheck friendly - at least in our server launcher. This is cleaner, but could take significantly longer.
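As a sketch of what option a/a1 might look like, here is a small Python helper that fetches a status URL and turns the body into an exit code an exec check could use. The function names and timeout are assumptions, certificate handling is left out, and it only looks at relatedHTTPCode, so it shares the limitation noted earlier in the thread that a server which is still starting also reports 200.

```python
import json
import urllib.request


def exit_code_for(body: str) -> int:
    """Map a response body to a process exit code: 0 = healthy, 1 = unhealthy."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 1
    if isinstance(payload, dict) and payload.get("relatedHTTPCode") == 200:
        return 0
    return 1


def main(url: str, timeout_seconds: float = 5.0) -> int:
    """Fetch the status URL and translate the body into an exit code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            body = response.read().decode("utf-8")
    except Exception:
        # Bad URLs, connection errors, timeouts and HTTP errors all count as unhealthy.
        return 1
    return exit_code_for(body)
```

A wrapper script would end with `raise SystemExit(main(url))`, passing whichever status URL is being checked, so that the exec check sees only 0 or 1.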
Note - There is a feature request open against Kubernetes to add some body matching - kubernetes/kubernetes#55405 - which also refers to an example in healthcare, and to the same workaround I mention here.
cc: @juergenhemelt @davidradl