feat: support unknown resources #603

Closed
universam1 wants to merge 1 commit


@universam1 universam1 commented Oct 13, 2023

Description

Karpenter cannot be used on clusters where pods request custom resources, such as device resources that expose /dev/fuse for Podman, and many more (see references).

The following error is logged:

karpenter-778b9dbc4f-gk88t {"level":"ERROR",..."logger":"controller.provisioner","message":"Could not schedule pod, incompatible with provisioner \"default\", daemonset overhead={\"cpu\":\"562m\",\"memory\":\"758026799\",\"pods\":\"10\"}, no instance type satisfied resources {\"cpu\":\"1562m\",\"memory\":\"1831768623\",\"pods\":\"11\",\"smarter-devices/fuse\":\"1\"} and requirements karpenter.k8s.aws/instance-category In [c m r], karpenter.k8s.aws/instance-generation Exists >2, karpenter.k8s.aws/instance-hypervisor In [nitro], karpenter.k8s.aws/instance-size NotIn [medium micro nano small], karpenter.sh/capacity-type In [on-demand spot], karpenter.sh/provisioner-name In [default], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux], node.kubernetes.io/node-group In [primary] (no instance type has enough resources)"}
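
One way to read the final clause of this message, as a minimal sketch under assumptions: if each requested resource is compared against what an instance type advertises, a resource name the instance type does not list counts as zero, so even a request of 1 can never fit. The function `fits` and the capacity numbers used with it are hypothetical illustrations, not Karpenter's actual code.

```go
package main

// fits reports whether every requested resource is covered by capacity.
// Hypothetical helper for illustration only; quantities are plain integers
// (CPU in millicores) rather than Kubernetes resource.Quantity values.
func fits(requests, capacity map[string]int64) bool {
	for name, want := range requests {
		// A name missing from capacity reads as 0 from the map,
		// so any positive request for it fails the check.
		if capacity[name] < want {
			return false
		}
	}
	return true
}
```

Under this reading, a pod requesting `smarter-devices/fuse: 1` fails against every instance type, because no instance type advertises that resource, regardless of how much CPU or memory is available.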

Here we add a setting that instructs Karpenter to ignore the listed resources, which allows Karpenter to be used on these clusters:

apiVersion: v1
kind: ConfigMap
metadata:
  name: karpenter-global-settings
data:
  ignoredDeviceRequests: "smarter-devices/fuse,some-other-device"
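
The effect of such a setting can be sketched as follows. This is an illustration under assumptions, not the code in this PR: `parseIgnoredResources` and `filterRequests` are hypothetical names, and resource quantities are kept as plain strings instead of Kubernetes quantity types so the example has no external dependencies.

```go
package main

import "strings"

// parseIgnoredResources splits a comma-separated value such as
// "smarter-devices/fuse,some-other-device" into a set of resource names.
// Surrounding whitespace and empty entries are dropped.
func parseIgnoredResources(value string) map[string]struct{} {
	ignored := map[string]struct{}{}
	for _, name := range strings.Split(value, ",") {
		if name = strings.TrimSpace(name); name != "" {
			ignored[name] = struct{}{}
		}
	}
	return ignored
}

// filterRequests returns a copy of requests without the ignored names,
// leaving the input map untouched.
func filterRequests(requests map[string]string, ignored map[string]struct{}) map[string]string {
	filtered := make(map[string]string, len(requests))
	for name, quantity := range requests {
		if _, skip := ignored[name]; skip {
			continue
		}
		filtered[name] = quantity
	}
	return filtered
}
```

Applied to the requests from the error above, filtering with `smarter-devices/fuse` in the ignore list would leave only `cpu`, `memory`, and `pods` for instance-type matching. The ignored resource is then expected to be provided on the node by something else, such as a device plugin.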

Fixes #751
Fixes aws/karpenter-provider-aws#2390
Fixes aws/karpenter-provider-aws#2899
Fixes navvis-dev/karpenter#3
Fixes aws/karpenter-provider-aws#3535
Fixes https://gitlab.com/gitlab-org/gitlab-runner/-/merge_requests/3717
Fixes #308
Fixes aws/karpenter-provider-aws#3315
Fixes aws/karpenter-provider-aws#3693

How was this change tested?

This fork runs in dozens of production clusters.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@universam1 universam1 requested a review from a team as a code owner October 13, 2023 08:52
universam1 added a commit to o11n/karpenter that referenced this pull request Oct 13, 2023
@github-actions

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 27, 2023
@universam1
Author

bump

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 28, 2023
@njtran
Contributor

njtran commented Oct 31, 2023

Hey @universam1, we've deprecated the configmap as part of the v1beta1 APIs. Due to that, we won't be accepting any changes to the ConfigMap. More details here https://karpenter.sh/docs/upgrading/v1beta1-migration/

As another point, it looks like there are CI failures, and this is a fairly complex problem that warrants a design. Can you come to the working group or kubernetes/karpenter-dev to discuss?

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 22, 2023
@k8s-triage-robot

Unknown CLA label state. Rechecking for CLA labels.

Send feedback to sig-contributor-experience at kubernetes/community.

/check-cla
/easycla


linux-foundation-easycla bot commented Jan 19, 2024

CLA Not Signed

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Jan 19, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: universam1
Once this PR has been reviewed and has the lgtm label, please assign jonathan-innis for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 28, 2024
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 29, 2024
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 29, 2024
@coveralls

Pull Request Test Coverage Report for Build 8092652375

Details

  • 1 of 2 (50.0%) changed or added relevant lines in 1 file are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.01%) to 80.97%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| pkg/utils/resources/resources.go | 1 | 2 | 50.0% |

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| pkg/controllers/disruption/expiration.go | 2 | 90.91% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 8060854800 | 0.01% |
| Covered Lines | 8178 |
| Relevant Lines | 10100 |
💛 - Coveralls

@jonathan-innis
Member

Consider looking at #1305! This is our first iteration at solving this problem more comprehensively!

@universam1
Author

closing in favor of #1305
Thank you @jonathan-innis for the effort!

@universam1 universam1 closed this Jun 11, 2024
Labels
cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. needs-design size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
6 participants