Vertex AI Pipeline - Container OP set_cpu_limit does not work with parameter_values nor at runtime #6681

Closed
SaschaHeyer opened this issue Oct 5, 2021 · 19 comments
Assignees
Labels
area/sdk kind/bug lifecycle/stale The issue / pull request is stale, any activities remove this label.

Comments

SaschaHeyer commented Oct 5, 2021

Hello Kubeflow Team,
Hello Google Team,

The container op method .set_cpu_limit only works when the value is set explicitly, not when it is passed via parameter_values or resolved at runtime.

Reproduce

  1. parameter_values: see steps to reproduce
  2. runtime: see https://github.com/kubeflow/pipelines/blob/master/samples/core/resource_spec/runtime_resource_request.py

Environment

  • How did you deploy Kubeflow Pipelines (KFP)? Vertex AI Pipelines
  • kfp 1.8.4
  • kfp-pipeline-spec 0.1.11
  • kfp-server-api 1.7.0

Steps to reproduce

Not working

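from kfp.v2 import compiler
from kfp.v2.dsl import pipeline
from kfp.v2.google.client import AIPlatformClient

# train is assumed to be a component defined elsewhere.
@pipeline(name="reproduction",
          pipeline_root="ADD PIPELINE ROOT")
def reproduction_pipeline(cpu_limit: str):
    # cpu_limit is a pipeline parameter, so its value is only known at run time.
    train_op = train().set_cpu_limit(cpu_limit)

compiler.Compiler().compile(pipeline_func=reproduction_pipeline,
                            package_path='pipeline.json')

api_client = AIPlatformClient(
    project_id="ADD PROJECT",
    region="us-central1",
)

# The CPU limit is supplied as a parameter value when the run is created.
response = api_client.create_run_from_job_spec(
    'pipeline.json',
    parameter_values={
        'cpu_limit': "16"
    }
)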

Working

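from kfp.v2 import compiler
from kfp.v2.dsl import pipeline
from kfp.v2.google.client import AIPlatformClient

# train is assumed to be a component defined elsewhere.
@pipeline(name="reproduction",
          pipeline_root="ADD PIPELINE ROOT")
def reproduction_pipeline():
    # The limit is a literal, so it is known at compile time.
    train_op = train().set_cpu_limit("16")

compiler.Compiler().compile(pipeline_func=reproduction_pipeline,
                            package_path='pipeline.json')

api_client = AIPlatformClient(
    project_id="ADD PROJECT",
    region="us-central1",
)

response = api_client.create_run_from_job_spec(
    'pipeline.json'
)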

Expected result

The CPU limit can be set via parameter_values.

Looking forward to your feedback

@zijianjoy
Collaborator

cc @chensun

@SaschaHeyer
Author

Morning
any updates?

@chensun chensun self-assigned this Nov 11, 2021
@chensun
Member

chensun commented Nov 11, 2021

Hi @SaschaHeyer, this is indeed a known limitation, and we plan to discuss the best solution for it in Q1/Q2 2022.

Can you help us understand your use case for setting a dynamic value for the CPU limit, and how critical this feature is to you? Thanks!

@SaschaHeyer
Author

SaschaHeyer commented Nov 23, 2021

Hi @chensun
Thanks a lot for your feedback.

I work for one of the biggest Google Cloud partners, and we get this request regularly from our customers, at least once every two weeks.
Parameterizing the machine type (CPU and memory) is really useful if you use the same pipeline for different datasets and/or hyperparameters, because there is then no need to re-compile.

Changing those hyperparameters might also require bigger machines, for example if you increase the batch size.

Currently, a re-compile of the pipeline is required. It would be useful if we could do this via a parameter as well.
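
One possible stopgap, sketched under assumptions rather than confirmed by the maintainers: compile once, then patch the CPU limit in the compiled job spec before submitting it. The sketch assumes the compiled JSON keeps resources under pipelineSpec.deploymentSpec.executors.<executor>.container.resources.cpuLimit and that the executor is named exec-train; both should be checked against your own pipeline.json. parse_cpu and with_cpu_limit are hypothetical helper names, not part of kfp.

```python
import copy
import json


def parse_cpu(value: str) -> float:
    """Convert a Kubernetes-style CPU quantity ("16", "0.5", "500m") to cores."""
    value = value.strip()
    if value.endswith("m"):
        # "m" means millicores, i.e. 1/1000 of a core.
        return int(value[:-1]) / 1000.0
    return float(value)


def with_cpu_limit(job_spec: dict, executor: str, cpu: str) -> dict:
    """Return a copy of job_spec with the executor's CPU limit replaced.

    ASSUMPTION: resources live under
    pipelineSpec.deploymentSpec.executors[executor].container.resources.
    """
    patched = copy.deepcopy(job_spec)
    executors = patched["pipelineSpec"]["deploymentSpec"]["executors"]
    if executor not in executors:
        raise KeyError(f"unknown executor {executor!r}; have {sorted(executors)}")
    container = executors[executor]["container"]
    container.setdefault("resources", {})["cpuLimit"] = parse_cpu(cpu)
    return patched


# Minimal stand-in for a compiled pipeline.json (hypothetical content).
compiled = {
    "pipelineSpec": {
        "deploymentSpec": {
            "executors": {
                "exec-train": {
                    "container": {"image": "IMAGE", "resources": {"cpuLimit": 4.0}}
                }
            }
        }
    }
}

patched = with_cpu_limit(compiled, "exec-train", "16")
print(json.dumps(patched["pipelineSpec"]["deploymentSpec"]["executors"], indent=2))
```

The patched dict could then be written to a new file with json.dump and that file passed to create_run_from_job_spec in place of 'pipeline.json'. The original dict is left untouched, so one compiled spec can be reused for several resource settings.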

@iuiu34

iuiu34 commented Nov 30, 2021

Along these lines, it would also be nice if, when a task throws a KFP error for being out of memory, one of the following were possible:
a) you can adjust the memory limit as a parameter, as SaschaHeyer requests, and re-run just the task, not the whole pipeline (though if caching is enabled this may already be solved)
b) the task is upscaled automatically and re-run

@chensun
Member

chensun commented Dec 16, 2021

@SaschaHeyer Thanks for the context!

@chensun
Member

chensun commented Dec 16, 2021

> in this line, would be nice also if when a task throws a kfp error for being out of memory, that
> a) you can play with the memory-limit as a parameter as SaschaHeyer requests, just re-running the task, not the whole pipeline (though if cache is enabled this maybe is already solved)

Yes, caching would help here if the upstream tasks don't have any changes to their inputs.

> b) does the upscale automatically and re-runs the task again

This might create some surprise billing issue :)

@iuiu34

iuiu34 commented Dec 16, 2021

> This might create some surprise billing issue :)

Yep. If implemented, there should be an autoscale: bool = False argument in the function kfp.v2.compiler.Compiler().compile.
But I agree that option b), auto-scaling, could cause serious cost problems for the user that option a) doesn't have.
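
To make the opt-in idea concrete, here is a minimal sketch in plain Python, not a kfp feature. run_with_memory_retry, fake_train and max_memory_gb are hypothetical names, and MemoryError stands in for whatever out-of-memory failure the platform would report. The flag defaults to False, and a cap bounds how far the retry may scale, which is one way to limit the billing surprise mentioned above.

```python
from typing import Callable


def run_with_memory_retry(
    task: Callable[[int], str],
    memory_gb: int,
    autoscale: bool = False,
    max_memory_gb: int = 64,
) -> str:
    """Run task(memory_gb); on MemoryError optionally retry with doubled memory.

    autoscale defaults to False, so nothing scales up unless the caller asks
    for it; max_memory_gb caps how far the memory may grow.
    """
    while True:
        try:
            return task(memory_gb)
        except MemoryError:
            # Give up if scaling is disabled or the next step would exceed the cap.
            if not autoscale or memory_gb * 2 > max_memory_gb:
                raise
            memory_gb *= 2


def fake_train(memory_gb: int) -> str:
    # Stand-in for a real task: pretends to need at least 16 GB.
    if memory_gb < 16:
        raise MemoryError(f"{memory_gb} GB is not enough")
    return f"trained with {memory_gb} GB"


print(run_with_memory_retry(fake_train, memory_gb=4, autoscale=True))
```

With autoscale left at its default, the same call re-raises the first MemoryError, which corresponds to option a): the user picks a new limit and re-runs the task.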

@ashrafgt

Huge +1 on this!

@stale

stale bot commented Apr 17, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Apr 17, 2022
@iuiu34

iuiu34 commented Apr 17, 2022

Are there plans to support this, or is it being explored in another ticket?

@stale stale bot removed the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Apr 17, 2022
@SaschaHeyer
Author

Hi
are there any updates? This would be a huge benefit for re-using pipelines without the need to re-compile them.

@saigirishgilly98

> Hi are there any updates? This would be a huge benefit for re-using pipelines without the need to re-compile them.

+++

I agree with @SaschaHeyer. We are building reusable pipeline templates where only the data changes, and depending on the data size we want to be able to configure the CPU and memory for each component through pipeline parameters or some other way.
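
As an illustration of sizing by data volume, the sketch below maps a dataset size to a CPU and memory setting. Everything in it is an assumption for the sake of the example: the Resources type, the resources_for helper and the tier thresholds are hypothetical and would need tuning for a real workload.

```python
from typing import NamedTuple


class Resources(NamedTuple):
    cpu_limit: str
    memory_limit: str


# Hypothetical tiers: (max dataset size in GB, resources to request).
TIERS = [
    (1.0, Resources("2", "8G")),
    (50.0, Resources("8", "32G")),
    (float("inf"), Resources("16", "64G")),
]


def resources_for(dataset_gb: float) -> Resources:
    """Return the first tier whose size bound covers dataset_gb."""
    if dataset_gb < 0:
        raise ValueError("dataset_gb must be non-negative")
    for max_gb, resources in TIERS:
        if dataset_gb <= max_gb:
            return resources
    # Only reachable for values that compare false everywhere, such as NaN.
    raise ValueError("dataset_gb must be a number")


print(resources_for(0.5))
print(resources_for(120))
```

Given the limitation described in this issue, the selected values would have to be passed to set_cpu_limit and set_memory_limit as literals when the pipeline is compiled, so each tier still means a separate compile.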

@acarvalho2-wiq

Hi guys, do we have any updates on this?
I am also looking for exactly the same dynamic parameterisation of my pipeline.


This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Jun 25, 2024

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.


@entsarangi: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen


@entsarangi

Having cpu_limit available via pipeline_params would be a useful feature. Is there any update, or a workaround that doesn't involve hard-coded values?

@tymorton

tymorton commented Jan 7, 2025

This has been resolved.
#11097

9 participants