GRPC endpoint not responding properly after the InferenceService reports as Loaded
#146
Comments
I could work around the issue by increasing the memory limit of the Istio egress/ingress Pods (to 4GB, to be safe), but this wasn't happening a few weeks ago with RHOAI 2.1.0 and 300 models (when running on AWS with 35 nodes, whereas this bug occurred on a single-node OpenShift). Can this be a regression, or is it somehow expected?
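That limit bump can also be expressed declaratively against the ServiceMeshControlPlane resource; below is a minimal sketch of just the ingress-gateway part (field paths follow the maistra.io/v2 schema used later in this thread, and the control-plane name data-science-smcp is assumed):

```yaml
# Sketch: raise only the ingress gateway memory limit.
# Merge-patch this into smcp/data-science-smcp in istio-system, e.g.:
#   kubectl patch smcp/data-science-smcp -n istio-system --type=merge --patch-file <file>
spec:
  gateways:
    ingress:
      runtime:
        container:
          resources:
            limits:
              memory: 4G
```

This is the same mechanism the fuller script later in the thread uses; it additionally tunes the egress gateway and istiod.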
@kpouget I am wondering if we can get some insights into these metrics as well:
@kpouget was it also running on Istio underneath? If so, how was it configured?
Yes, it was. Istio was using these files for configuration (pinned commit from what I used at the time of the test).
I managed to reduce resource consumption roughly by half. Here's the script, which you can apply. In short, this script:
#!/bin/bash
cat <<EOF > smcp-patch.yaml
apiVersion: maistra.io/v2
kind: ServiceMeshControlPlane
metadata:
  name: data-science-smcp
  namespace: istio-system
spec:
  gateways:
    egress:
      runtime:
        container:
          resources:
            limits:
              cpu: 1024m
              memory: 4G
            requests:
              cpu: 128m
              memory: 1G
    ingress:
      runtime:
        container:
          resources:
            limits:
              cpu: 1024m
              memory: 4G
            requests:
              cpu: 128m
              memory: 1G
  runtime:
    components:
      pilot:
        container:
          env:
            PILOT_FILTER_GATEWAY_CLUSTER_CONFIG: "true"
          resources:
            limits:
              cpu: 1024m
              memory: 4G
            requests:
              cpu: 128m
              memory: 1024Mi
EOF
trap '{ rm -rf -- smcp-patch.yaml; }' EXIT
kubectl patch smcp/data-science-smcp -n istio-system --type=merge --patch-file smcp-patch.yaml
namespaces=$(kubectl get ns -ltopsail.scale-test -o name | cut -d'/' -f 2)
# limit sidecar proxy endpoints to its own ns and istio-system
for ns in $namespaces; do
cat <<EOF | kubectl apply -f -
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: $ns
spec:
  egress:
  - hosts:
    - "./*"
    - "istio-system/*"
EOF
done
# force changes to take effect
for ns in $namespaces; do
kubectl delete pods --all -n "${ns}"
done
# force re-creation of all pods with envoy service registry rebuilt
kubectl delete pods --all -n istio-system

Initial state

❯ istioctl proxy-config endpoint deployment/istio-ingressgateway -n istio-system | wc -l
1052
❯ istioctl proxy-config endpoint $(kubectl get pods -o name -n watsonx-scale-test-u1) -n watsonx-scale-test-u1 | wc -l
1065
❯ kubectl top pods -n istio-system
NAME                                        CPU(cores)   MEMORY(bytes)
istio-egressgateway-6b7fdb6cb9-lh5jg        100m         2519Mi
istio-ingressgateway-7dbdc66dd7-nkxxq       91m          2320Mi
istiod-data-science-smcp-65f4877fff-tndf4   82m          1392Mi
❯ kubectl top pods -n watsonx-scale-test-u0 --containers
POD                                               NAME          CPU(cores)   MEMORY(bytes)
u0-m0-predictor-00001-deployment-c46f9d59-jv9pq   POD           0m           0Mi
u0-m0-predictor-00001-deployment-c46f9d59-jv9pq   istio-proxy   14m          372Mi
...

Modifications

❯ istioctl proxy-config endpoint deployment/istio-ingressgateway -n istio-system | wc -l
1052 // it knows the whole world, so that is the same
❯ istioctl proxy-config endpoint $(kubectl get pods -o name -n watsonx-scale-test-u1) -n watsonx-scale-test-u1 | wc -l
34
❯ kubectl top pods -n istio-system
NAME                                        CPU(cores)   MEMORY(bytes)
istio-egressgateway-5778df8594-j869r        83m          444Mi
istio-ingressgateway-6847d4b974-sk25z       77m          946Mi
istiod-data-science-smcp-5568884d7d-45zkz   36m          950Mi
❯ kubectl top pods -n watsonx-scale-test-u0 --containers
POD                                               NAME          CPU(cores)   MEMORY(bytes)
u0-m0-predictor-00001-deployment-c46f9d59-jv9pq   POD           0m           0Mi
u0-m0-predictor-00001-deployment-c46f9d59-jv9pq   istio-proxy   6m           136Mi
...
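For a quick read on the savings, the two `kubectl top` snapshots above can be compared with a small awk one-liner. The numbers from this thread are inlined here as sample data; the helper itself is just an illustration, not part of the fix (replace the data with live `kubectl top` output):

```shell
#!/bin/bash
# Each line: component, MEMORY(bytes) before, MEMORY(bytes) after
# (values copied from the snapshots in this thread).
data="istio-egressgateway 2519 444
istio-ingressgateway 2320 946
istiod 1392 950"

# Print the before/after values and the percentage reduction.
echo "$data" | awk '{ printf "%s: %dMi -> %dMi (-%.0f%%)\n", $1, $2, $3, 100 * (1 - $3 / $2) }'
# istio-egressgateway: 2519Mi -> 444Mi (-82%)
# istio-ingressgateway: 2320Mi -> 946Mi (-59%)
# istiod: 1392Mi -> 950Mi (-32%)
```

So the egress gateway shrinks the most, consistent with the Sidecar resource trimming the set of endpoints each proxy has to track.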
As part of my automated scale test, I observe that the InferenceService sometimes reports as Loaded, but the call to the gRPC endpoint returns with errors. Examples:
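One way for a test harness to distinguish a transient race (the endpoint not yet wired up when Loaded is reported) from a hard failure is to retry the gRPC probe a few times before recording an error. A minimal sketch follows; the grpcurl invocation in the trailing comment is an assumption, so substitute whatever client the harness actually uses:

```shell
#!/bin/bash
# Hypothetical helper: retry a probe command with a fixed delay
# until it succeeds or the attempt budget runs out.
probe_with_retry() {
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0    # probe succeeded
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1        # still failing after all attempts
}

# Example (assumed client and flags; adjust to the InferenceService URL):
# probe_with_retry 10 3 grpcurl -insecure "$ENDPOINT:443" list
```

If the endpoint only fails on the first one or two attempts, that points to a readiness race rather than a broken route.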
Versions