Based on watsonx requirements, we should make available at least these metrics:

- Number of inference requests over a defined time period
- Average response time over a defined time period
- Number of successful / failed inference requests over a defined time period
- Compute utilization (CPU, GPU, memory)
However, users won't find metrics with these exact names, and some of them need to be computed by combining others. Examples:

- Failed inference requests over a defined time period: you must do something like `tgi_batch_inference_count - tgi_batch_inference_success`, plus adding the time-period syntax.
- Memory consumption: there is no specific Istio/TGI/Caikit metric for it (at least, I didn't find one). Users could compute it with something similar to: `sum(container_memory_working_set_bytes{pod='<isvc_predictor_pod_name>', namespace='<isvc_namespace>', container=''}) by (pod, namespace)`
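To make the two computations above concrete, here are hedged PromQL sketches. The 1h window is an arbitrary placeholder for "defined time period", and they assume the TGI metrics are monotonic counters (suitable for `increase()`); the `<...>` label values are placeholders to be filled in by the user:

```promql
# Failed inference requests over the last hour (assumed window; adjust as needed):
# total batch inferences minus successful ones
sum(increase(tgi_batch_inference_count[1h]))
  - sum(increase(tgi_batch_inference_success[1h]))

# Pod-level working-set memory of the predictor pod
# (container="" selects the pod-level cgroup record in cAdvisor metrics)
sum(container_memory_working_set_bytes{pod="<isvc_predictor_pod_name>", namespace="<isvc_namespace>", container=""}) by (pod, namespace)
```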
Moreover, there are additional metrics that deserve to be documented, such as `tgi_request_generated_tokens_count`.
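For instance, assuming `tgi_request_generated_tokens_count` is a counter, a sketch of a token-throughput query (the 5m window is an arbitrary choice):

```promql
# Generated tokens per second, averaged over the last 5 minutes
sum(rate(tgi_request_generated_tokens_count[5m]))
```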