if experiment name is too long the suggestion service can't start #2454
Comments
I may be interested in contributing a fix if it would be welcome and some guidance could be provided as to how to go about it.
Thank you for creating this @garymm! /assign @garymm Feel free to reach out if you have any questions.
/remove-label lifecycle/needs-triage
What exactly should the validation be, though? I think the better fix is to name the experiment service in a way that is guaranteed to be legal. Where might that happen in the code?
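One way to read "a name that is guaranteed to be legal" is to shorten over-long names deterministically. The following is a minimal sketch, not Katib code: `legalServiceName` is a hypothetical helper, and the `<experiment>-<algorithm>` naming scheme and the 8-character hash suffix are assumptions made for illustration. It also assumes the experiment name itself is already a valid lowercase DNS label.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// maxLabelLen is the Kubernetes length limit for DNS label names such as Service names.
const maxLabelLen = 63

// legalServiceName is a hypothetical helper (not part of Katib).
// It returns "<experiment>-<algorithm>" when that fits in 63 characters;
// otherwise it truncates the name and appends a short hash of the full
// name so that different long names still map to different results.
func legalServiceName(experiment, algorithm string) string {
	full := experiment + "-" + algorithm // assumed naming scheme
	if len(full) <= maxLabelLen {
		return full
	}
	sum := sha256.Sum256([]byte(full))
	suffix := hex.EncodeToString(sum[:])[:8]
	// Reserve room for "-" plus the hash suffix, and avoid a double dash.
	prefix := strings.TrimRight(full[:maxLabelLen-len(suffix)-1], "-")
	return prefix + "-" + suffix
}

func main() {
	name := legalServiceName(strings.Repeat("a", 57), "random")
	fmt.Println(len(name), name)
}
```

The trade-off is that the resulting service name no longer matches the experiment name exactly, so anything that looks the service up by name would need to use the same helper.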
@garymm Our longest algorithm name is: You simply need to add this additional check here: https://github.com/kubeflow/katib/blob/master/pkg/webhook/v1beta1/experiment/validator/validator.go#L85 E.g.
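The snippet that followed this comment is not preserved here. As a rough illustration of what such a length check could look like, here is a standalone sketch. `validateNameLength` is a hypothetical function, not the actual Katib validator, and it assumes the suggestion service is named `<experiment>-<algorithm>`.

```go
package main

import (
	"fmt"
	"strings"
)

// maxServiceNameLen is the Kubernetes limit for Service names (DNS labels).
const maxServiceNameLen = 63

// validateNameLength is a hypothetical check (not the real Katib validator).
// It rejects an experiment whose derived service name would be too long.
func validateNameLength(experimentName, algorithmName string) error {
	// Assumed naming scheme: "<experiment>-<algorithm>", so add 1 for the dash.
	total := len(experimentName) + 1 + len(algorithmName)
	if total > maxServiceNameLen {
		return fmt.Errorf(
			"experiment name %q plus algorithm name %q is %d characters, limit is %d",
			experimentName, algorithmName, total, maxServiceNameLen)
	}
	return nil
}

func main() {
	fmt.Println(validateNameLength("short-exp", "random"))
	fmt.Println(validateNameLength(strings.Repeat("a", 57), "random"))
}
```

Rejecting the experiment at admission time gives the user an immediate error instead of an experiment that waits indefinitely for trials.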
/good-first-issue
@andreyvelich: Please ensure the request meets the requirements listed here. If this request no longer meets these requirements, the label can be removed In response to this:
Hello! @andreyvelich has a PR been made to fix this issue? If not, I would be open to helping out as well :)
There are no PRs open that reference this issue.
@AydanPirani Yes, please feel free to submit a PR
Sorry for the delay, will get this up and going ASAP :)
Hi @garymm (cc @andreyvelich)! I'm trying to reproduce this locally, and I had a few questions:
Thanks!
I ran:
Just create an experiment with a long name, such that the experiment name plus the algorithm name is more than 63 characters.
@garymm is right, this command prints the version of the Katib controller, which is the same as the Katib version (e.g. v0.17.0).
@andreyvelich Got it, thanks! Also, can you please point me to the docs page re. deployment? I'm looking at this, but I don't see much about local Docker builds... For clarity: I got the Docker build to work and now have an image katib-master (that follows the master branch). How do I run this locally? Thanks again for the help, I appreciate it!
@AydanPirani You can find some info here: https://github.com/kubeflow/katib/blob/master/CONTRIBUTING.md#build-from-source-code, but we don't really explain how to update images. After that, you can follow steps similar to those for Training Operator to deploy a Kind cluster locally and install Katib using the standalone Kustomize overlay: https://github.com/kubeflow/training-operator/blob/master/CONTRIBUTING.md#run-a-kubernetes-cluster.
Hi all,
Update: I have a local repro, getting the PR up now.
What happened?
I provided an experiment name that was 57 characters long.
It got stuck waiting for trials to be created: the suggestion service could not start because its name was more than 63 characters long.
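Kubernetes requires Service names to be DNS labels of at most 63 characters, which is why a 57-character experiment name can fail once a suffix is appended. The sketch below illustrates that rule. `isValidServiceName` is a hypothetical helper written for this example, and the `-random` suffix assumes the service is named `<experiment>-<algorithm>` with the `random` algorithm.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// dnsLabel matches a lowercase DNS label: it starts with a letter, ends with
// a letter or digit, and contains only letters, digits, and dashes.
var dnsLabel = regexp.MustCompile(`^[a-z]([-a-z0-9]*[a-z0-9])?$`)

// isValidServiceName is a hypothetical helper that applies the
// 63-character limit together with the label character rules.
func isValidServiceName(name string) bool {
	return len(name) <= 63 && dnsLabel.MatchString(name)
}

func main() {
	experiment := strings.Repeat("e", 57) // a 57-character experiment name
	name := experiment + "-random"        // assumed naming scheme
	fmt.Println(len(name), isValidServiceName(name)) // prints: 64 false
}
```

Under this assumed scheme, a 57-character experiment name leaves room for an algorithm name of at most 5 characters.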
What did you expect to happen?
Katib to pick a valid name for the service.
Environment
Kubernetes version:
Katib controller version:
$ kubectl get pods -n kubeflow -l katib.kubeflow.org/component=controller -o jsonpath="{.items[*].spec.containers[*].image}"
docker.io/kubeflowkatib/katib-controller:v0.17.0
Katib Python SDK version:
Impacted by this bug?
Give it a 👍 We prioritize the issues with most 👍