-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat[frontend]: implement artifact-repositories configmap support #11354
Conversation
Signed-off-by: droctothorpe <[email protected]> Co-authored-by: quinnovator <[email protected]>
4a25f6a
to
b373f18
Compare
@HumairAK to recreate the edge case:
metadata:
annotations:
workflows.argoproj.io/default-artifact-repository: artifact-repositories
name: artifact-repositories
namespace: kubeflow
data:
artifact-repositories: |-
archiveLogs: true
s3:
accessKeySecret:
key: accesskey
name: mlpipeline-minio-artifact
bucket: mlpipeline
endpoint: minio-service.kubeflow:9000
insecure: true
keyFormat: foo
secretKeySecret:
key: secretkey
name: mlpipeline-minio-artifact
kind: ConfigMap
|
I updated the instructions to reproduce to edge case to make manual validation a lot easier by running the UI on your host machine. |
Hey @droctothorpe /@quinnovator, thanks for this. I'm trying this out now and encountering some hiccups. I suspect it might be my environment/setup, but I want to confirm with you first.
the UI yaml is as posted below: deployment.yamlkind: Deployment
apiVersion: apps/v1
metadata:
annotations:
deployment.kubernetes.io/revision: '3'
kubectl.kubernetes.io/last-applied-configuration: |
{"apiVersion":"apps/v1","kind":"Deployment","metadata":{"annotations":{},"labels":{"app":"ml-pipeline-ui","application-crd-id":"kubeflow-pipelines"},"name":"ml-pipeline-ui","namespace":"kubeflow"},"spec":{"selector":{"matchLabels":{"app":"ml-pipeline-ui","application-crd-id":"kubeflow-pipelines"}},"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"true"},"labels":{"app":"ml-pipeline-ui","application-crd-id":"kubeflow-pipelines"}},"spec":{"containers":[{"env":[{"name":"DISABLE_GKE_METADATA","value":"true"},{"name":"VIEWER_TENSORBOARD_POD_TEMPLATE_SPEC_PATH","value":"/etc/config/viewer-pod-template.json"},{"name":"MINIO_NAMESPACE","valueFrom":{"fieldRef":{"fieldPath":"metadata.namespace"}}},{"name":"MINIO_ACCESS_KEY","valueFrom":{"secretKeyRef":{"key":"accesskey","name":"mlpipeline-minio-artifact"}}},{"name":"MINIO_SECRET_KEY","valueFrom":{"secretKeyRef":{"key":"secretkey","name":"mlpipeline-minio-artifact"}}},{"name":"ALLOW_CUSTOM_VISUALIZATIONS","value":"true"},{"name":"FRONTEND_SERVER_NAMESPACE","valueFrom":{"fieldRef":{"fieldPath":"metadata.namespace"}}}],"image":"gcr.io/ml-pipeline/frontend:2.3.0","imagePullPolicy":"IfNotPresent","livenessProbe":{"exec":{"command":["wget","-q","-S","-O","-","http://localhost:3000/apis/v1beta1/healthz"]},"initialDelaySeconds":3,"periodSeconds":5,"timeoutSeconds":2},"name":"ml-pipeline-ui","ports":[{"containerPort":3000}],"readinessProbe":{"exec":{"command":["wget","-q","-S","-O","-","http://localhost:3000/apis/v1beta1/healthz"]},"initialDelaySeconds":3,"periodSeconds":5,"timeoutSeconds":2},"resources":{"requests":{"cpu":"10m","memory":"70Mi"}},"volumeMounts":[{"mountPath":"/etc/config","name":"config-volume","readOnly":true}]}],"serviceAccountName":"ml-pipeline-ui","volumes":[{"configMap":{"name":"ml-pipeline-ui-configmap"},"name":"config-volume"}]}}}}
resourceVersion: '627852'
name: ml-pipeline-ui
uid: 4d454d73-05f6-4b01-b744-8a4c9aec05ec
creationTimestamp: '2024-11-12T22:37:37Z'
generation: 8
namespace: kubeflow
labels:
app: ml-pipeline-ui
application-crd-id: kubeflow-pipelines
spec:
replicas: 0
selector:
matchLabels:
app: ml-pipeline-ui
application-crd-id: kubeflow-pipelines
template:
metadata:
creationTimestamp: null
labels:
app: ml-pipeline-ui
application-crd-id: kubeflow-pipelines
annotations:
cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'
spec:
restartPolicy: Always
serviceAccountName: ml-pipeline-ui
schedulerName: default-scheduler
terminationGracePeriodSeconds: 30
securityContext: {}
containers:
- resources:
requests:
cpu: 10m
memory: 70Mi
readinessProbe:
exec:
command:
- wget
- '-q'
- '-S'
- '-O'
- '-'
- 'http://localhost:3000/apis/v1beta1/healthz'
initialDelaySeconds: 3
timeoutSeconds: 2
periodSeconds: 5
successThreshold: 1
failureThreshold: 3
terminationMessagePath: /dev/termination-log
name: ml-pipeline-ui
livenessProbe:
exec:
command:
- wget
- '-q'
- '-S'
- '-O'
- '-'
- 'http://localhost:3000/apis/v1beta1/healthz'
initialDelaySeconds: 3
timeoutSeconds: 2
periodSeconds: 5
successThreshold: 1
failureThreshold: 3
env:
- name: DISABLE_GKE_METADATA
value: 'true'
- name: VIEWER_TENSORBOARD_POD_TEMPLATE_SPEC_PATH
value: /etc/config/viewer-pod-template.json
- name: MINIO_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: MINIO_ACCESS_KEY
valueFrom:
secretKeyRef:
name: mlpipeline-minio-artifact
key: accesskey
- name: MINIO_SECRET_KEY
valueFrom:
secretKeyRef:
name: mlpipeline-minio-artifact
key: secretkey
- name: ALLOW_CUSTOM_VISUALIZATIONS
value: 'true'
- name: FRONTEND_SERVER_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: ARGO_ARTIFACT_REPOSITORIES_LOOKUP
value: 'true'
- name: ARGO_ARCHIVE_LOGS
value: 'true'
ports:
- containerPort: 3000
protocol: TCP
imagePullPolicy: IfNotPresent
volumeMounts:
- name: config-volume
readOnly: true
mountPath: /etc/config
terminationMessagePolicy: File
image: 'quay.io/hukhan/ds-pipelines-frontend:11354'
serviceAccount: ml-pipeline-ui
volumes:
- name: config-volume
configMap:
name: ml-pipeline-ui-configmap
defaultMode: 420
dnsPolicy: ClusterFirst
strategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 25%
maxSurge: 25%
revisionHistoryLimit: 10
progressDeadlineSeconds: 600
status:
observedGeneration: 8
conditions:
- type: Progressing
status: 'True'
lastUpdateTime: '2024-11-13T04:36:07Z'
lastTransitionTime: '2024-11-12T22:37:37Z'
reason: NewReplicaSetAvailable
message: ReplicaSet "ml-pipeline-ui-5c5479dc67" has successfully progressed.
- type: Available
status: 'True'
lastUpdateTime: '2024-11-13T22:54:29Z'
lastTransitionTime: '2024-11-13T22:54:29Z'
reason: MinimumReplicasAvailable
message: Deployment has minimum availability.
And the artifact repository configmap: kind: ConfigMap
apiVersion: v1
metadata:
name: artifact-repositories
namespace: kubeflow
# edit: forgot to add this
annotations:
workflows.argoproj.io/default-artifact-repository: artifact-repositories
data:
artifact-repositories: |-
archiveLogs: true
s3:
accessKeySecret:
key: accesskey
name: mlpipeline-minio-artifact
bucket: mlpipeline
endpoint: minio-service.kubeflow:9000
insecure: true
keyFormat: custom_location/{{workflow.name}}/{{workflow.creationTimestamp.Y}}/{{workflow.creationTimestamp.m}}/{{workflow.creationTimestamp.d}}/{{pod.name}}
secretKeySecret:
key: secretkey
name: mlpipeline-minio-artifact
As you can see I have set
The pipeline used: pipeline.yaml# PIPELINE DEFINITION
# Name: tutorial-data-passing-2
# Inputs:
# message: str [Default: 'message']
components:
comp-preprocess:
executorLabel: exec-preprocess
inputDefinitions:
parameters:
message:
parameterType: STRING
outputDefinitions:
artifacts:
output_dataset_one:
artifactType:
schemaTitle: system.Dataset
schemaVersion: 0.0.1
output_dataset_two_path:
artifactType:
schemaTitle: system.Dataset
schemaVersion: 0.0.1
parameters:
output_bool_parameter_path:
parameterType: BOOLEAN
output_dict_parameter_path:
parameterType: STRUCT
output_list_parameter_path:
parameterType: LIST
output_parameter_path:
parameterType: STRING
comp-train:
executorLabel: exec-train
inputDefinitions:
artifacts:
dataset_one_path:
artifactType:
schemaTitle: system.Dataset
schemaVersion: 0.0.1
dataset_two:
artifactType:
schemaTitle: system.Dataset
schemaVersion: 0.0.1
parameters:
input_bool:
parameterType: BOOLEAN
input_dict:
parameterType: STRUCT
input_list:
parameterType: LIST
message:
parameterType: STRING
num_steps:
defaultValue: 100.0
isOptional: true
parameterType: NUMBER_INTEGER
outputDefinitions:
artifacts:
model:
artifactType:
schemaTitle: system.Model
schemaVersion: 0.0.1
deploymentSpec:
executors:
exec-preprocess:
container:
args:
- --executor_input
- '{{$}}'
- --function_to_execute
- preprocess
command:
- sh
- -c
- "\nif ! [ -x \"$(command -v pip)\" ]; then\n python3 -m ensurepip ||\
\ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
\ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.3.0'\
\ '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"\
$0\" \"$@\"\n"
- sh
- -ec
- 'program_path=$(mktemp -d)
printf "%s" "$0" > "$program_path/ephemeral_component.py"
_KFP_RUNTIME=true python3 -m kfp.dsl.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@"
'
- "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import\
\ *\n\ndef preprocess(\n # An input parameter of type string.\n message:\
\ str,\n # Use Output[T] to get a metadata-rich handle to the output\
\ artifact\n # of type `Dataset`.\n output_dataset_one: Output[Dataset],\n\
\ # A locally accessible filepath for another output artifact of type\n\
\ # `Dataset`.\n output_dataset_two_path: OutputPath('Dataset'),\n\
\ # A locally accessible filepath for an output parameter of type string.\n\
\ output_parameter_path: OutputPath(str),\n # A locally accessible\
\ filepath for an output parameter of type bool.\n output_bool_parameter_path:\
\ OutputPath(bool),\n # A locally accessible filepath for an output parameter\
\ of type dict.\n output_dict_parameter_path: OutputPath(Dict[str, int]),\n\
\ # A locally accessible filepath for an output parameter of type list.\n\
\ output_list_parameter_path: OutputPath(List[str]),\n):\n \"\"\"\
Dummy preprocessing step.\"\"\"\n\n # Use Dataset.path to access a local\
\ file path for writing.\n # One can also use Dataset.uri to access the\
\ actual URI file path.\n with open(output_dataset_one.path, 'w') as\
\ f:\n f.write(message)\n\n # OutputPath is used to just pass\
\ the local file path of the output artifact\n # to the function.\n \
\ with open(output_dataset_two_path, 'w') as f:\n f.write(message)\n\
\n with open(output_parameter_path, 'w') as f:\n f.write(message)\n\
\n with open(output_bool_parameter_path, 'w') as f:\n f.write(\n\
\ str(True)) # use either `str()` or `json.dumps()` for bool\
\ values.\n\n import json\n with open(output_dict_parameter_path,\
\ 'w') as f:\n f.write(json.dumps({'A': 1, 'B': 2}))\n\n with\
\ open(output_list_parameter_path, 'w') as f:\n f.write(json.dumps(['a',\
\ 'b', 'c']))\n\n"
image: quay.io/opendatahub/ds-pipelines-ci-executor-image:v1.0
exec-train:
container:
args:
- --executor_input
- '{{$}}'
- --function_to_execute
- train
command:
- sh
- -c
- "\nif ! [ -x \"$(command -v pip)\" ]; then\n python3 -m ensurepip ||\
\ python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1\
\ python3 -m pip install --quiet --no-warn-script-location 'kfp==2.3.0'\
\ '--no-deps' 'typing-extensions>=3.7.4,<5; python_version<\"3.9\"' && \"\
$0\" \"$@\"\n"
- sh
- -ec
- 'program_path=$(mktemp -d)
printf "%s" "$0" > "$program_path/ephemeral_component.py"
_KFP_RUNTIME=true python3 -m kfp.dsl.executor_main --component_module_path "$program_path/ephemeral_component.py" "$@"
'
- "\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import *\nfrom typing import\
\ *\n\ndef train(\n # Use InputPath to get a locally accessible path\
\ for the input artifact\n # of type `Dataset`.\n dataset_one_path:\
\ InputPath('Dataset'),\n # Use Input[T] to get a metadata-rich handle\
\ to the input artifact\n # of type `Dataset`.\n dataset_two: Input[Dataset],\n\
\ # An input parameter of type string.\n message: str,\n # Use\
\ Output[T] to get a metadata-rich handle to the output artifact\n #\
\ of type `Model`.\n model: Output[Model],\n # An input parameter\
\ of type bool.\n input_bool: bool,\n # An input parameter of type\
\ dict.\n input_dict: Dict[str, int],\n # An input parameter of type\
\ List[str].\n input_list: List[str],\n # An input parameter of type\
\ int with a default value.\n num_steps: int = 100,\n):\n \"\"\"Dummy\
\ Training step.\"\"\"\n with open(dataset_one_path, 'r') as input_file:\n\
\ dataset_one_contents = input_file.read()\n\n with open(dataset_two.path,\
\ 'r') as input_file:\n dataset_two_contents = input_file.read()\n\
\ print(\"test3\")\n line = (f'dataset_one_contents: {dataset_one_contents}\
\ || '\n f'dataset_two_contents: {dataset_two_contents} || '\n\
\ f'message: {message} || '\n f'input_bool: {input_bool},\
\ type {type(input_bool)} || '\n f'input_dict: {input_dict},\
\ type {type(input_dict)} || '\n f'input_list: {input_list},\
\ type {type(input_list)} \\n')\n\n with open(model.path, 'w') as output_file:\n\
\ for i in range(num_steps):\n output_file.write('Step\
\ {}\\n{}\\n=====\\n'.format(i, line))\n\n # model is an instance of\
\ Model artifact, which has a .metadata dictionary\n # to store arbitrary\
\ metadata for the output artifact.\n model.metadata['accuracy'] = 0.9\n\
\n"
image: quay.io/opendatahub/ds-pipelines-ci-executor-image:v1.0
pipelineInfo:
name: tutorial-data-passing-2
root:
dag:
tasks:
preprocess:
cachingOptions:
enableCache: true
componentRef:
name: comp-preprocess
inputs:
parameters:
message:
componentInputParameter: message
taskInfo:
name: preprocess
train:
cachingOptions:
enableCache: true
componentRef:
name: comp-train
dependentTasks:
- preprocess
inputs:
artifacts:
dataset_one_path:
taskOutputArtifact:
outputArtifactKey: output_dataset_one
producerTask: preprocess
dataset_two:
taskOutputArtifact:
outputArtifactKey: output_dataset_two_path
producerTask: preprocess
parameters:
input_bool:
taskOutputParameter:
outputParameterKey: output_bool_parameter_path
producerTask: preprocess
input_dict:
taskOutputParameter:
outputParameterKey: output_dict_parameter_path
producerTask: preprocess
input_list:
taskOutputParameter:
outputParameterKey: output_list_parameter_path
producerTask: preprocess
message:
taskOutputParameter:
outputParameterKey: output_parameter_path
producerTask: preprocess
taskInfo:
name: train
inputDefinitions:
parameters:
message:
defaultValue: message
isOptional: true
parameterType: STRING
schemaVersion: 2.1.0
sdkVersion: kfp-2.3.0 So question for you:
|
Thank you so much for taking the time to validate this, @HumairAK, and for being so thorough with documenting the issue that you encountered. I'm going to try a couple of things to see if I can repro your results:
Results:
I think I see the problem. I did a comparison of the two configmaps: Your configmap is missing the Can you run the same test but make sure to include that annotation in your configmap? Thank you! 🙏 |
Ah yeah I actually did include the annotation, when trying to prune the configmap of runtime information for the github post, I accidentally left it out. I see that in your configmap your path is set to |
AHA found the reason:
we'll need to update the mlpipeline ui role for this and we should probably surface this error so it's easier to find next time |
frontend/server/workflow-helper.ts
Outdated
const [configMap] = await getConfigMap('artifact-repositories', namespace); | ||
if (configMap === undefined) { | ||
// If there is no artifact-repositories configmap, return undefined. The | ||
// caller will just use keyFormat as specified in configs.ts. | ||
return undefined; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's surface the k8serr from getConfigmap, and report that to console so it can help with troubleshooting if this part fails for whatever reasons (e.g. rbac is insufficient)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch, @HumairAK! I pushed up a fix. Here's some manual validation evidence as well:
Logs not available. Error surfacing in UI pod:
Is surfacing the error in the pod sufficient? We want to continue execution in case the default path specified by ARGO_KEYFORMAT
in configs.ts
works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The failing CI check seems unrelated to the code.
Signed-off-by: droctothorpe <[email protected]>
559d431
to
efabeb5
Compare
/lgtm let me retry running the ci |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: HumairAK The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@droctothorpe are you guys willing to provide some supplementing docs for this functionality? I'm thinking https://www.kubeflow.org/docs/components/pipelines/operator-guides/ here makes the most sense |
Thanks for the review and merge, @HumairAK! Happy to add some supplemental docs. Would you prefer for it to be in a standalone page or appended to https://www.kubeflow.org/docs/components/pipelines/operator-guides/configure-object-store/? |
I think we can create a standalone page on logging. Especially since we're planning future work on that front, there will probably be more to add later. WDYT? |
I didn't forget about this, @HumairAK! We just really slow down in Dec, heh. My only reservation wrt tackling it now is if we turn logs into first class artifacts, which I plan to start on very soon, the logging documentation will no longer be necessary / relevant. |
hrmm good point |
Unless someone is calling the backend for the frontend directly, it's hard to imagine a scenario in which an end user would be impacted by this. Maybe it's just lack of imagination on my part though, heh. Users do the darndest things. Once logs are proper artifacts, a user would have to be out of their mind to not use that pattern. This sequential guess work on the UIs part is terrible, inconsistent, and brittle. |
Description of your changes:
Argo Workflows supports the ability to specify an artifact repository for archived logs uniquely for each namespace rather than just globally as documented here.
This PR adds support for this functionality to the KFP UI. If a workflow runs and...
artifact-repositories
configmap in the target namespace that provides a unique, namespacedkeyFormat
...this PR ensures that logs are still accessible in the UI. This functionality is disabled by default to avoid unnecessary network calls (for end users who don't leverage namespace-specific
artifact-repositories
). It can be toggled on throughconfigs.ts
or an environment variable.Co-authored with @quinnovator 🌟 .
Fixes: #11339
Checklist: