feat(metadata): ability to get artifacts location for Argo-Workflows v3.0+ #5829

Closed
wants to merge 1 commit into from

Conversation

Subreptivus

Starting from this change, Argo Workflows no longer includes the full artifact data (the provider and bucket properties) in the outputs annotation by default.
With this PR, when that data is missing from the outputs annotation, metadata_writer combines the artifact file location from the outputs annotation with the provider and bucket data from the archiveLocation property of the template annotation.
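
A minimal sketch of the fallback described above, not the PR's actual code. It assumes each S3 artifact is a dict with `endpoint`, `bucket` and `key` fields; the helper name `merge_s3_location` and the sample endpoint, bucket and key values are hypothetical.

```python
def merge_s3_location(output_s3: dict, archive_s3: dict) -> dict:
    """Fill provider fields missing from an output artifact.

    Hypothetical helper: the file ``key`` always comes from the output
    artifact, while ``endpoint`` and ``bucket`` fall back to the
    template's archiveLocation when the output artifact omits them.
    """
    merged = dict(output_s3)
    for field in ('endpoint', 'bucket'):
        if not merged.get(field) and archive_s3.get(field):
            merged[field] = archive_s3[field]
    return merged


# Made-up sample data in the shape assumed above.
output_s3 = {'key': 'artifacts/run-1/step-1/hello-art.tgz'}
archive_s3 = {
    'endpoint': 'minio-service:9000',
    'bucket': 'mlpipeline',
    'key': 'artifacts/run-1/step-1',
}
merged = merge_s3_location(output_s3, archive_s3)
```

The archiveLocation `key` is deliberately ignored here, so the merged dict keeps the file path reported in the outputs annotation.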

@google-cla google-cla bot added the cla: yes label Jun 10, 2021
@google-oss-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign ark-kun after the PR has been reviewed.
You can assign the PR to them by writing /assign @ark-kun in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-robot

Hi @Subreptivus. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

bucket=s3_artifact.get('bucket', ''),
key=s3_artifact.get('key', ''),
)
if (s3_artifact.keys() >= {'endpoint', 'bucket'}):
Contributor

This comparison looks fragile. Can we do this check in a more robust way?

Author

Hm, why does it look fragile to you?
To me it looks like a clean, short way to check for multiple keys in a dict, compared with an explicit subset check or unnecessary loops.
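
For comparison, a small sketch of the check under discussion next to two alternatives. The function names and sample dicts are hypothetical. The stricter third variant is only one possible reading of "more robust", not necessarily what the reviewer had in mind.

```python
REQUIRED_KEYS = frozenset({'endpoint', 'bucket'})


def has_required_keys(s3_artifact: dict) -> bool:
    # Form used in the diff: dict key views support set comparison,
    # so this is a superset test on the keys.
    return s3_artifact.keys() >= REQUIRED_KEYS


def has_required_keys_subset(s3_artifact: dict) -> bool:
    # Equivalent test written as an explicit subset check.
    return REQUIRED_KEYS.issubset(s3_artifact)


def has_required_values(s3_artifact: dict) -> bool:
    # Stricter variant: each key must be present with a non-empty value.
    return all(s3_artifact.get(key) for key in REQUIRED_KEYS)


# Made-up sample data.
complete = {'endpoint': 'minio-service:9000', 'bucket': 'mlpipeline', 'key': 'a/b.tgz'}
key_only = {'key': 'a/b.tgz'}
empty_bucket = {'endpoint': 'minio-service:9000', 'bucket': '', 'key': 'a/b.tgz'}
```

The first two forms agree on every input. They differ from the third only when a required key is present but empty, as in `empty_bucket`.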

@@ -325,8 +333,14 @@ def is_kfp_v2_pod(pod) -> bool:

output_artifacts = []
for name, art in argo_output_artifacts.items():
artifact_uri = argo_artifact_to_uri(art)
if not artifact_uri:
artifact_uri_check = argo_artifact_to_uri(art)
Contributor

Can we just call this artifact_uri?

Author

Yes, we can. It's just a habit of not overwriting variables while making checks.

if not artifact_uri:
artifact_uri_check = argo_artifact_to_uri(art)
if artifact_uri_check:
if re.search('(\W+)', artifact_uri_check).group(1) != '://':
Contributor

Should this be something like if re.match(r'^\W+://', artifact_uri):?

Author

Agreed, it could be if not re.match(r'\w+://', artifact_uri):. I just don't like to use match when I'm searching for something in the middle of the string.
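
A short sketch contrasting the two checks. `first_separator_is_scheme` reproduces the expression from the diff, and `has_scheme` uses the `\w+://` pattern suggested in this reply. Both function names and the sample strings are hypothetical.

```python
import re

SCHEME_RE = re.compile(r'\w+://')


def first_separator_is_scheme(uri: str) -> bool:
    # Expression from the diff: look at the first run of non-word
    # characters anywhere in the string. re.search returns None when
    # the string has no such run, so .group() raises AttributeError.
    return re.search(r'(\W+)', uri).group(1) == '://'


def has_scheme(uri: str) -> bool:
    # Anchored alternative: word characters followed by '://' at the
    # start of the string. Never raises on a plain string.
    return SCHEME_RE.match(uri) is not None


with_scheme = 'minio://mlpipeline/artifacts/hello-art.tgz'
bare_key = 'artifacts/run-1/hello-art.tgz'
no_separator = 'hello'
```

Both functions agree on `with_scheme` and `bare_key`. Only the anchored version handles `no_separator` without raising.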

artifact_uri_check = argo_artifact_to_uri(art)
if artifact_uri_check:
if re.search('(\W+)', artifact_uri_check).group(1) != '://':
artifact_uri_wo_key = argo_artifact_to_uri(argo_template.get('archiveLocation', {}), True)
Contributor

Maybe we can always pass argo_template.get('archiveLocation', {}) to argo_artifact_to_uri and encapsulate this logic there?

Author

The thing is that, despite the property name, archiveLocation stores the folder that contains the artifact, not the actual archive (file) location. For example, in the outputs you will have a file location like "key":"artifacts/artifact-passing-lr6fg/artifact-passing-lr6fg-2908156709/hello-art.tgz", while within archiveLocation you will have something like "key":"artifacts/artifact-passing-lr6fg/artifact-passing-lr6fg-2908156709".
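
The relation between the two example keys quoted above can be checked with a few lines. This only illustrates those two sample values and makes no claim about every Argo Workflows configuration.

```python
import posixpath

# Keys copied from the example in the comment above.
output_key = ('artifacts/artifact-passing-lr6fg/'
              'artifact-passing-lr6fg-2908156709/hello-art.tgz')
archive_key = ('artifacts/artifact-passing-lr6fg/'
               'artifact-passing-lr6fg-2908156709')

# Object-store keys use '/' separators, so posixpath splits them the
# same way on every platform.
folder, filename = posixpath.split(output_key)
```

Here `folder` equals `archive_key`, so the archiveLocation key on its own lacks the file name needed to address the artifact.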

@Ark-kun
Contributor

Ark-kun commented Jun 11, 2021

Thank you for this contribution.

@stale

stale bot commented Mar 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label Mar 2, 2022
Member

@Arhell Arhell left a comment

/ok-to-test

@google-oss-prow

@Subreptivus: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| kubeflow-pipeline-backend-test | b2ea678 | link | true | /test kubeflow-pipeline-backend-test |
| kubeflow-pipeline-upgrade-test | b2ea678 | link | true | /test kubeflow-pipeline-upgrade-test |
| kubeflow-pipeline-e2e-test | b2ea678 | link | true | /test kubeflow-pipeline-e2e-test |


@Sharathmk99

I'm facing the same problem, using KF version 1.3. Is there any workaround for now?

@stale stale bot removed the lifecycle/stale label Jun 12, 2023
@github-actions github-actions bot added the Stale label Jun 21, 2024
@github-actions github-actions bot closed this Jul 13, 2024
5 participants