ai-on-gke benchmark locust load inferencer hits 90%+ cpu usage with master at 200+ users #766

annapendleton · 2024-08-06T21:19:30Z

At 200+ users, the Locust master reports 90%+ CPU usage and becomes unresponsive.

Removing the custom metrics calls on these lines fixes the issue:

https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/benchmarks/benchmark/tools/locust-load-inference/locust-docker/locust-tasks/tasks.py#L174

ai-on-gke/benchmarks/benchmark/tools/locust-load-inference/locust-docker/locust-tasks/tasks.py

Line 179 in 54531da

handle_failed_response(request, resp)

Observation:

The custom metrics calls add load on the master that does not scale. The master is single-threaded, and these calls appear to put too much CPU pressure on it when scaling up the number of users for larger load tests.

Feature request:

Adjust the custom metrics implementation so that it doesn't add excessive load on the master at 200+ users.
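One possible direction, as a rough sketch only and not the current implementation: have each worker accumulate metric values locally and forward a single summary per interval, so the number of messages the master handles depends on the flush interval rather than on the request rate. `BatchedMetrics`, `send`, and `flush_interval_s` are hypothetical names for illustration. `send` stands in for whatever mechanism actually delivers data to the master, which this sketch does not assume.

```python
import time
from collections import defaultdict
from typing import Callable, Dict


class BatchedMetrics:
    """Accumulates metric values locally and forwards one summary per interval.

    Hypothetical helper: `send` is any callable that delivers a dict of
    totals to the master (the real transport is not assumed here).
    """

    def __init__(
        self,
        send: Callable[[Dict[str, float]], None],
        flush_interval_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._interval = flush_interval_s
        self._clock = clock
        self._totals: Dict[str, float] = defaultdict(float)
        self._last_flush = clock()

    def record(self, name: str, value: float = 1.0) -> None:
        # Cheap local update; nothing is sent on the per-request path
        # unless the flush interval has elapsed.
        self._totals[name] += value
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self) -> None:
        # Send one aggregated message, then reset the local totals.
        if self._totals:
            self._send(dict(self._totals))
            self._totals.clear()
        self._last_flush = self._clock()
```

With something like this, a call such as `handle_failed_response(request, resp)` could call `metrics.record("failed_requests")` on the worker instead of reporting each event individually. The trade-off is that the master sees metrics with up to `flush_interval_s` of delay.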

annapendleton changed the title ~~ai-on-gke benchmark locust~~ ai-on-gke benchmark locust load inferencer hits 90%+ cpu usage with master at 200+ users Aug 6, 2024
