Cannot specify key for input artifact (without full artifact location) #3307

Closed
hadim opened this issue Jun 25, 2020 · 16 comments · Fixed by #4618
Labels
solution/workaround There's a workaround, might not be great, but exists type/feature Feature request

@hadim
hadim commented Jun 25, 2020

Tested on 2.9.0-rc3

The official example works:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifactory-repository-ref-
spec:
  entrypoint: main
  artifactRepositoryRef:
    key: minio
  templates:
    - name: main
      container:
        image: docker/whalesay:latest
        command: [sh, -c]
        args: ["cowsay hello world | tee /tmp/hello_world.txt"]
      outputs:
        artifacts:
          - name: hello_world
            path: /tmp/hello_world.txt

When switching to input it fails:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifactory-repository-ref-
spec:
  entrypoint: main
  artifactRepositoryRef:
    key: minio
  templates:
    - name: main
      container:
        image: docker/whalesay:latest
        command: [sh, -c]
        args: ["cowsay hello world | tee /tmp/hello_world.txt"]
      inputs:
        artifacts:
          - name: hello_world
            path: /tmp/hello_world.txt

with

Failed to submit workflow: templates.entrypoint.steps[0].main templates.main-template inputs.artifacts.hello_world was not supplied

Another important point, IMO: I don't see any way to specify the key (the location of your object within the S3 bucket) during workflow creation. My understanding is that artifactRepositoryRef can be used to set up default repositories, which can then be reused within workflows by specifying only the location of the folder or file we want to use as inputs or outputs. Was it designed for that purpose?

@hadim hadim added the type/bug label Jun 25, 2020
@alexec
Contributor

alexec commented Jun 25, 2020

Can I confirm if this used to work and does not work anymore? Or if it just never seemed to work?

@hadim
Author

hadim commented Jun 25, 2020

It never worked for me. See also #2461 (comment)

@hadim hadim changed the title artifactRepositoryRef does not work for inputs + no way to specify the localtion within the artifact (key) artifactRepositoryRef does not work for inputs + no way to specify the location within the artifact (key) Jun 25, 2020
@hadim
Author

hadim commented Jun 25, 2020

I have tested it using both gcs and s3.

@alexec
Contributor

alexec commented Jun 25, 2020

I think you must specify the key within the bucket, as well as bucket, endpoint etc. Not great I agree, but that is how it is today.
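
For reference, a fully explicit S3 input artifact might look like the sketch below. It reuses the artifact name and path from the failing example above; the endpoint, bucket, key, and secret names are placeholders chosen for illustration, not values taken from this issue.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: explicit-input-artifact-
spec:
  entrypoint: main
  templates:
    - name: main
      inputs:
        artifacts:
          - name: hello_world
            path: /tmp/hello_world.txt
            # Every location field is repeated here, even if the same values
            # already exist in the default artifact repository configuration.
            s3:
              endpoint: minio:9000            # placeholder endpoint
              insecure: true                  # assumes a non-TLS test endpoint
              bucket: my-bucket               # placeholder bucket
              key: path/to/hello_world.txt    # placeholder object key
              accessKeySecret:
                name: my-minio-cred           # placeholder Secret name
                key: accesskey
              secretKeySecret:
                name: my-minio-cred
                key: secretkey
      container:
        image: docker/whalesay:latest
        command: [sh, -c]
        args: ["cat /tmp/hello_world.txt"]
```

The duplication of endpoint, bucket, and credentials in each artifact is the cost being discussed in this thread.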

@hadim
Author

hadim commented Jun 25, 2020

So what's the point of artifactRepositoryRef if you must replicate all the config?

@alexec alexec added type/feature Feature request solution/workaround There's a workaround, might not be great, but exists and removed type/bug labels Jul 16, 2020
@alexec
Contributor

alexec commented Jul 16, 2020

@hadim I'm going to recategorize this as an "enhancement". We should do more work in this area, and I'd like to assess interest.

@vitalyrychkov
Contributor

Hi, I was looking for the same solution. I need many different artifacts as inputs and was hoping to define the S3 parameters only once.

@alexec
Contributor

alexec commented Jul 31, 2020

Do 👍 to show interest.

@dekovach

It was really surprising to me to find out that you cannot specify input artifacts from the default artifact repository. It should be supported by default. I would expect to be able to specify only the key in the default bucket, and then it should work out of the box.

@alexec
Contributor

alexec commented Oct 14, 2020

This issue isn't really to do with artifactRepositoryRef; specifying only the key for an input artifact is not supported at all. I'm going to rename this issue to reflect this.

@alexec alexec changed the title artifactRepositoryRef does not work for inputs + no way to specify the location within the artifact (key) Cannot specify key for input artifact (without full artifact location) Oct 14, 2020
@Bobgy
Contributor

Bobgy commented Oct 21, 2020

Kubeflow Pipelines (KFP) tries to separate cluster config (like the artifact repository) from workflow config. If this feature were supported by Argo, we could let each user manage their own artifact repository in their own namespace while keeping the workflow definitions shareable.

This is one of the areas where KFP hasn't been able to achieve multi-tenancy separation: kubeflow/pipelines#1223 (comment).

Some recent discussion: https://kubeflow.slack.com/archives/CE10KS9M4/p1602516358147900

@Ark-kun
Member

Ark-kun commented Oct 22, 2020

It was really surprising to me to find out that you cannot specify input artifacts from the default artifact repository.

Please help me understand how it's supposed to choose the input artifact from the repository (out of thousands of artifacts already there) based on this information alone?

      inputs:
        artifacts:
          - name: hello_world
            path: /tmp/hello_world.txt

This does not seem to contain any information that can be used to get an artifact.

@Ark-kun
Member

Ark-kun commented Oct 22, 2020

One way to address this feature request is to add support for a generic uri field in the artifact. The rest of the artifact repository information can be selected based on the artifact URI scheme.

      inputs:
        artifacts:
          - name: hello_world
            path: /tmp/hello_world.txt
            uri: s3://my-bucket/my_key

This would also be a step towards making it possible to pass the artifact URIs using placeholders like {{tasks.some-task.outputs.hello_world.uri}}.

@alexec
Contributor

alexec commented Oct 22, 2020

generic uri

How would you support secrets for username + password?

@alexec alexec added this to the v3.0 milestone Dec 20, 2020
@fvdnabee
Contributor

It was really surprising to me to find out that you cannot specify input artifacts from the default artifact repository.

Please help me understand how it's supposed to choose the input artifact from the repository (out of thousands of artifacts already there) based on this information alone?

      inputs:
        artifacts:
          - name: hello_world
            path: /tmp/hello_world.txt

This does not seem to contain any information that can be used to get an artifact.

I agree. However, if the workflow definition defines the artifact type and the key (in the case of s3), then the workflow controller should be able to figure out that it could try fetching the artifact from the default artifact repository (if that is s3) or from a referenced artifact location. This more closely resembles the scenario posted by the OP in #2461.

I think you must specify the key within the bucket, as well as bucket, endpoint etc. Not great I agree, but that is how it is today.

This is also my understanding today.

For output artifacts, it is sufficient to only specify the name and the path in the wf spec. The wf controller will upload the output artifacts to the default artifact repository (if available). This enables you to decouple the artifact repository configuration from the workflow definition (e.g. this configuration might depend on a local, staging or production environment).

For input artifacts, however, you need to specify the endpoint, bucket and key in the artifact definition (in the case of s3). I've been unable to decouple the artifact configuration from the workflow definition. In my case, this means the workflow spec depends on the environment, as input arguments are fetched from different places in development, staging and production argo workflows.

For both input and output artifacts, it would be desirable to:
a) decouple artifact configuration from workflow specification
b) enable this decoupling on a per-artifact basis, as input artifacts [A, B] might be fetched from repositories [X, Y] whereas output artifact C might be uploaded to repository Z.
Item b) is of interest in workflows where the input artifacts come from an artifact repository that differs from the artifact repo where argo stores its workflow output (which is typically more ephemeral in nature).

#4618 addresses some of these issues, but it is not clear to me whether the decoupling is provided on a per-artifact basis. It appears to be configured for an entire workflow.

@alexec
Contributor

alexec commented Jan 12, 2021

#4618 de-couples artifact configuration from the workflow by allowing you to store it in a config map (or secret, I forget). You can only specify this configuration at the workflow level, so all artifacts must be stored in the same place UNLESS you are completely explicit.

I.e. it does not do (b). However, it does lay a lot of groundwork that would make (b) straightforward to do.

Do you think (b) is a common use case? Give me a 👍 or 👎
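
As a rough illustration of the workflow-level decoupling described above, the sketch below assumes the key-only artifact form documented for Argo Workflows v3.0 and later, where an input artifact gives only `s3.key` and the remaining settings come from the referenced artifact repository. The ConfigMap name `artifact-repositories`, the `minio` entry, the bucket, the secret names, and the object key are placeholders for illustration; whether this matches the exact behavior of #4618 is an assumption.

```yaml
# Repository settings live in a ConfigMap, outside any workflow definition.
apiVersion: v1
kind: ConfigMap
metadata:
  name: artifact-repositories       # assumed ConfigMap name
data:
  minio: |
    s3:
      endpoint: minio:9000          # placeholder endpoint
      insecure: true                # assumes a non-TLS test endpoint
      bucket: my-bucket             # placeholder bucket
      accessKeySecret:
        name: my-minio-cred         # placeholder Secret name
        key: accesskey
      secretKeySecret:
        name: my-minio-cred
        key: secretkey
---
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: key-only-input-artifact-
spec:
  entrypoint: main
  # Applies to the whole workflow, so every key-only artifact uses this repository.
  artifactRepositoryRef:
    configMap: artifact-repositories
    key: minio
  templates:
    - name: main
      inputs:
        artifacts:
          - name: hello_world
            path: /tmp/hello_world.txt
            s3:
              # Only the object key is given; endpoint, bucket and credentials
              # are expected to come from the repository referenced above.
              key: path/to/hello_world.txt
      container:
        image: docker/whalesay:latest
        command: [sh, -c]
        args: ["cat /tmp/hello_world.txt"]
```

Because the reference is set once per workflow, this covers (a) but not (b): an artifact that lives in a different repository would still need its full location spelled out.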

@alexec alexec linked a pull request Jan 20, 2021 that will close this issue

7 participants