Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Base resource requests on actual usage #2540

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

NimJay
Copy link
Collaborator

@NimJay NimJay commented May 10, 2024

Background

  • This fixes Base CPU/Memory Requests on Actual Usage #591.
  • I looked at a deployment of Online Boutique from 4 days ago. I noted the max (actually used) vCPU and memory values of each microservice — within the full 4 days. (The deployment included the loadgenerator.) Olivier did a similar analysis in 2022.
  • For each value, I rounded up to the next 10m (for CPU) and the next 10MiB for memory — to add a buffer.

Change Summary

  • I've updated the resource.requests.cpu and resource.requests.memory values of each default microservices's Deployment.

Testing Procedure

  • We just have to make sure the staging URL works fine. :)

Copy link

🚲 PR staged at http://34.42.127.234

@NimJay NimJay marked this pull request as ready for review May 10, 2024 23:47
@NimJay NimJay requested review from yoshi-approver and a team as code owners May 10, 2024 23:47
Copy link

🚲 PR staged at http://34.42.127.234

@NimJay NimJay marked this pull request as draft May 11, 2024 00:15
@NimJay NimJay added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label May 11, 2024
@NimJay NimJay marked this pull request as ready for review May 11, 2024 00:16
@NimJay NimJay removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label May 11, 2024
@NimJay
Copy link
Collaborator Author

NimJay commented May 13, 2024

I have tested the new Kubernetes manifests on the regular channel and rapid channel (see delete-me-test-resources-new and delete-me-test-resources-new-rapid below).
The new manifests work.
Screenshot 2024-05-13 at 11 32 42 AM

However, I believe, on GKE autopilot, this only shaved off 0.25 vCPUs because the Managed Service for Prometheus is enabled by default on GKE autopilot and requires a lot (relative to Online Boutique) of resources.

Screenshot 2024-05-13 at 11 34 52 AM
Screenshot 2024-05-13 at 11 36 30 AM

Copy link

🚲 PR staged at http://34.42.127.234

Copy link

github-actions bot commented Jun 2, 2024

🚲 PR staged at http://34.42.127.234

cpu: 200m
memory: 450Mi
cpu: 30m
memory: 30Mi
Copy link
Contributor

@mathieu-benoit mathieu-benoit Jun 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting OOMKilled with this on a brand new Kind cluster locally, let's have 70Mi in limits and 50Mi in requests like the other Python app emailservice has?

@mathieu-benoit
Copy link
Contributor

mathieu-benoit commented Jun 3, 2024

FYI: I just deployed this in my own GKE Autopilot cluster, and I'm getting these values, k top pods -n onlineboutique-development:

NAME                                    CPU(cores)   MEMORY(bytes)
adservice-bc9667f7c-cdbrd               254m         128Mi
cartservice-69754cfcc9-pvf88            7m           89Mi
checkoutservice-856c5f7f68-cwnnn        84m          49Mi
currencyservice-66cc88ff55-5mk9l        11m          74Mi
emailservice-7d69bb6774-cvx2b           14m          81Mi
frontend-7fcf78c9bd-nxrz8               4m           50Mi
loadgenerator-76f787d859-nthg6          12m          142Mi
paymentservice-7495f4f8d9-b94rh         4m           66Mi
productcatalogservice-5dffd678c-8b7nn   5m           51Mi
recommendationservice-ff8fcbf9d-cgbjz   6m           91Mi
redis-575569978c-2z5fh                  6m           40Mi
shippingservice-7b6764896f-9zhd5        5m           47Mi

In addition to the comment on the recommendationservice, I'm getting these restarts because of memory limits too restrictive:

adservice-bc9667f7c-cdbrd               1/2     CrashLoopBackOff   115 (72s ago)   5h39m
checkoutservice-856c5f7f68-cwnnn        2/2     Running            1 (7m50s ago)   5h39m
currencyservice-66cc88ff55-5mk9l        2/2     Running            5 (62m ago)     5h39m```

Should we have:

  • For adservice: 150Mi for memory limits? and 150m for cpu limits?
  • For checkoutservice: 80Mi for memory limits?
  • For currencyservice: 90Mi for memory limits?

Copy link

github-actions bot commented Jun 4, 2024

🚲 PR staged at http://34.42.127.234

@bourgeoisor
Copy link
Member

@NimJay do you remember where this PR was left off?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Base CPU/Memory Requests on Actual Usage
4 participants