Is your feature request related to a problem? Please describe.
Currently, workflow failures are not monitored. If a call to VAI fails, for example, the workflow finishes and no one is the wiser.
Currently, the various workflows the KFP Operator triggers start a pod from the provider image, which provides a CLI for the various pieces of functionality, e.g.:
/provider --provider <config location> pipeline create --pipeline-definition /resource-definition.yaml
To allow for better monitoring of provider events, this step should be turned into a call to an API on a running deployment that can expose metrics (using OpenTelemetry), rather than relying on publishing metrics to the Argo Workflow controller, which we have already found to be a poor solution.
Describe the solution you'd like
We already have the event source server/event processor running as a deployment to handle run completion events. The idea is to add an HTTP REST interface to this deployment, turning the different CLI commands into CRUD HTTP requests against the running deployment. All the workflows will then need to be changed to make HTTP requests rather than start a pod. A metrics endpoint should also be added to expose metrics for the various events that are processed and whether each was successful. Analysis should be carried out to determine which metrics would be useful for the different endpoints.
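As a rough illustration of the success/failure metrics described above, here is a minimal sketch using only the Go standard library. The counter and handler names are assumptions; a real implementation would register OpenTelemetry instruments and use its exporter instead of hand-rolled atomics.

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// Hypothetical per-outcome counters; a real implementation would use
// OpenTelemetry instruments rather than raw atomics.
var (
	eventsSucceeded atomic.Int64
	eventsFailed    atomic.Int64
)

// recordOutcome is called after each provider operation completes.
func recordOutcome(ok bool) {
	if ok {
		eventsSucceeded.Add(1)
	} else {
		eventsFailed.Add(1)
	}
}

// metricsHandler exposes the counters in a Prometheus-style text form;
// OpenTelemetry's exporter would normally take this role.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "provider_events_succeeded %d\n", eventsSucceeded.Load())
	fmt.Fprintf(w, "provider_events_failed %d\n", eventsFailed.Load())
}

func main() {
	http.HandleFunc("/metrics", metricsHandler)
	// http.ListenAndServe(":8080", nil) // omitted so the sketch runs as-is
	recordOutcome(true)
	recordOutcome(false)
}
```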
--provider = this refers to the Provider custom resource, so the service will still need to load the resource to get the config, as it currently does. The provider name will be a template var.
The different resource definitions will be passed in the body of the request.
/pipeline
/run
/schedule
/experiment
PUT = Create
POST = Update
DELETE = Delete