ServingRuntime:
torchserve

Current behavior
We send requests with client-side timeouts to put load on our ModelMesh deployment. After some time, the client starts to receive:
ERROR:
Code: Internal
Message: org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions: INTERNAL: Model "test-search-vectorizer__isvc-409d074e44" has no worker to serve inference request. Please use scale workers API to add workers.
If we then stop the load and send only a single request without a timeout, we get the same error, or the request hangs.
In my opinion, some resources are not released when a client timeout fires, which eventually exhausts the workers; if we turn client timeouts off, there are no errors.
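For reference, this is roughly how the load was generated. A minimal sketch, assuming Python stubs generated from TorchServe's inference.proto, a ModelMesh gRPC endpoint on localhost:8033, and an mm-vmodel-id header; the host, port, header value, and payload are illustrative only:

```python
import grpc
import inference_pb2, inference_pb2_grpc  # generated from TorchServe's inference.proto

channel = grpc.insecure_channel("localhost:8033")  # assumed ModelMesh endpoint
stub = inference_pb2_grpc.InferenceAPIsServiceStub(channel)

request = inference_pb2.PredictionsRequest(
    model_name="test-search-vectorizer__isvc-409d074e44",
    input={"data": b'{"text": "example query"}'},  # illustrative payload
)

try:
    # Client-side deadline: the call is cancelled on the client after 100 ms,
    # but the TorchServe worker may still be processing the request server-side.
    response = stub.Predictions(
        request,
        timeout=0.1,
        metadata=(("mm-vmodel-id", "test-search-vectorizer"),),  # assumed header value
    )
except grpc.RpcError as e:
    # Under load: DEADLINE_EXCEEDED; after a while: INTERNAL "has no worker to serve..."
    print(e.code(), e.details())
```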
Expected behavior
Requests with client timeouts should not exhaust the workers or cause a single request to ModelMesh to hang.
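The "scale workers API" mentioned in the error is TorchServe's management API. A hedged sketch of checking and scaling workers for the affected model, assuming the management port 8081 can be reached (e.g. via a port-forward into the runtime pod); ModelMesh does not normally expose it:

```python
import requests

base = "http://localhost:8081"  # assumed reachable TorchServe management endpoint
model = "test-search-vectorizer__isvc-409d074e44"  # internal model name from the error

# Inspect the current worker status for the model
print(requests.get(f"{base}/models/{model}").json())

# Ask TorchServe to keep at least one worker alive for the model
requests.put(f"{base}/models/{model}", params={"min_worker": 1, "synchronous": "true"})
```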
Log Info
message
org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions: INTERNAL: Model "test-search-vectorizer__isvc-409d074e44" has no worker to serve inference request. Please use scale workers API to add workers.
InternalServerException.()
trace
ApplierException(message:org.pytorch.serve.grpc.inference.InferenceAPIsService/Predictions: INTERNAL: Model "test-search-vectorizer__isvc-409d074e44" has no worker to serve inference request. Please use scale workers API to add workers.
InternalServerException.(), causeStacktrace:io.grpc.StatusException: INTERNAL: Model "test-search-vectorizer__isvc-409d074e44" has no worker to serve inference request. Please use scale workers API to add workers.
InternalServerException.()
at com.ibm.watson.modelmesh.SidecarModelMesh$ExternalModel$1.onClose(SidecarModelMesh.java:450)
, grpcStatusCode:INTERNAL)
at com.ibm.watson.modelmesh.thrift.ModelMeshService$applyModelMulti_result$applyModelMulti_resultStandardScheme.read(ModelMeshService.java:1940)
at com.ibm.watson.modelmesh.thrift.ModelMeshService$applyModelMulti_result$applyModelMulti_resultStandardScheme.read(ModelMeshService.java:1905)
at com.ibm.watson.modelmesh.thrift.ModelMeshService$applyModelMulti_result.read(ModelMeshService.java:1815)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:93)
at com.ibm.watson.modelmesh.thrift.ModelMeshService$Client.recv_applyModelMulti(ModelMeshService.java:74)
at com.ibm.watson.modelmesh.thrift.ModelMeshService$Client.applyModelMulti(ModelMeshService.java:59)
at jdk.internal.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at com.ibm.watson.litelinks.client.ClientInvocationHandler.invoke(ClientInvocationHandler.java:496)
at com.ibm.watson.litelinks.client.ClientInvocationHandler.invokeWithRetries(ClientInvocationHandler.java:383)
at com.ibm.watson.litelinks.client.ClientInvocationHandler.doInvoke(ClientInvocationHandler.java:184)
at com.ibm.watson.litelinks.client.ClientInvocationHandler.invoke(ClientInvocationHandler.java:118)
at jdk.proxy2/jdk.proxy2.$Proxy30.applyModelMulti(Unknown Source)
at jdk.internal.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at com.ibm.watson.modelmesh.SidecarModelMesh.invokeRemoteModel(SidecarModelMesh.java:1071)
at com.ibm.watson.modelmesh.ModelMesh.invokeRemote(ModelMesh.java:4399)
at com.ibm.watson.modelmesh.ModelMesh.invokeModel(ModelMesh.java:3644)
at com.ibm.watson.modelmesh.SidecarModelMesh.callModel(SidecarModelMesh.java:1106)
at com.ibm.watson.modelmesh.ModelMeshApi.callModel(ModelMeshApi.java:457)
at com.ibm.watson.modelmesh.ModelMeshApi$4.onHalfClose(ModelMeshApi.java:733)
at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:355)
at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:867)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.base/java.lang.Thread.run(Thread.java:833)
at com.ibm.watson.litelinks.server.ServerRequestThread.run(ServerRequestThread.java:47)
@fsatka -- it's been a long time since you opened this issue and several updates have been made since. Can you still reproduce this bug in the latest version of ModelMesh-Serving v0.11.2?