Change from Provider CLI to Provider API #376

Open
grahamia opened this issue Oct 16, 2024 · 0 comments

Is your feature request related to a problem? Please describe.
Workflow failures are currently not monitored. If a call to VAI fails, for example, the workflow simply finishes and no one is the wiser.

Currently, each workflow that KFP Operator triggers starts a pod from the provider image, which exposes the provider functionality through a CLI.

e.g. /provider --provider <config location> pipeline create --pipeline-definition /resource-definition.yaml

To allow for better monitoring of provider events, this step should instead call an API on a running deployment that can expose metrics (using OpenTelemetry), rather than relying on publishing metrics to the Argo workflow controller, which we have already found is not the best solution.

Describe the solution you'd like
We already run the event source server/event processor as a deployment to handle run completion events. The idea is to add an HTTP REST interface to this deployment so that the CLI commands currently in use become CRUD HTTP requests against the running deployment. The workflows will then all need to be changed so that they make HTTP requests rather than starting a pod. A metrics endpoint should also be added to expose metrics for the events that are processed and whether each one succeeded. Analysis should be carried out to determine which metrics would be useful for the different endpoints.
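As a rough illustration of the success/failure metrics described above, the sketch below counts outcomes per endpoint and operation. It uses a plain in-memory counter as a stand-in for an OpenTelemetry counter instrument; the metric name `provider_requests_total`, the label names, and the helper functions are all hypothetical and not part of the existing code base.

```python
from collections import Counter
from typing import Callable

# Hypothetical in-memory stand-in for an OpenTelemetry counter instrument.
# Keys are (endpoint, operation, outcome) tuples.
request_outcomes: Counter = Counter()


def record_outcome(endpoint: str, operation: str, handler: Callable[[], None]) -> bool:
    """Run a provider call and count whether it succeeded or failed."""
    try:
        handler()
    except Exception:
        request_outcomes[(endpoint, operation, "failure")] += 1
        return False
    request_outcomes[(endpoint, operation, "success")] += 1
    return True


def render_metrics() -> str:
    """Render the counters as text lines, one per label combination."""
    lines = []
    for (endpoint, operation, outcome), count in sorted(request_outcomes.items()):
        lines.append(
            f'provider_requests_total{{endpoint="{endpoint}",'
            f'operation="{operation}",outcome="{outcome}"}} {count}'
        )
    return "\n".join(lines)
```

Labelling by endpoint, operation, and outcome is only one possible starting point; the analysis mentioned above would decide which dimensions are actually useful.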

--provider = this is the custom resource, so the service will still need to load the resource to get the config, as it currently does. The provider name will be a template variable.

The different resource definitions will be passed in the body of the request.
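From the workflow side, a call might look roughly like the sketch below, which builds (but does not send) a request carrying the resource definition in the body. The base URL, the `provider` query parameter, and the `application/yaml` content type are assumptions made for illustration; the issue does not specify how the provider name or the definition format would be conveyed.

```python
import urllib.request
from urllib.parse import urlencode


def build_provider_request(
    base_url: str,
    resource: str,
    method: str,
    provider: str,
    definition_yaml: str,
) -> urllib.request.Request:
    """Build an HTTP request with the resource definition as the body."""
    # Assumption: the provider name travels as a query parameter.
    query = urlencode({"provider": provider})
    url = f"{base_url.rstrip('/')}/{resource}?{query}"
    return urllib.request.Request(
        url,
        data=definition_yaml.encode("utf-8"),
        method=method,
        # Assumption: the definition is sent as YAML, as with the CLI file.
        headers={"Content-Type": "application/yaml"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; a workflow step would presumably also check the response status so that failures surface instead of passing silently.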

/pipeline
/run
/schedule
/experiment

PUT = Create
POST = Update
DELETE = Delete
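The mapping above could be resolved on the server with a small lookup like the following sketch. It follows the method mapping exactly as proposed here (PUT for create, POST for update); the function name and error types are hypothetical, and it assumes every resource accepts every operation, which may not hold in practice.

```python
from typing import Tuple

# Resource paths and method mapping as listed in this proposal.
RESOURCES = {"pipeline", "run", "schedule", "experiment"}
OPERATIONS = {"PUT": "create", "POST": "update", "DELETE": "delete"}


def resolve_operation(method: str, path: str) -> Tuple[str, str]:
    """Map an HTTP method and path to the equivalent CLI resource and action."""
    resource = path.strip("/")
    if resource not in RESOURCES:
        raise LookupError(f"unknown resource path: {path}")
    operation = OPERATIONS.get(method.upper())
    if operation is None:
        raise ValueError(f"unsupported method: {method}")
    return resource, operation
```

For example, `resolve_operation("PUT", "/pipeline")` yields `("pipeline", "create")`, which corresponds to the `pipeline create` CLI invocation shown earlier.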
