Thanos, Prometheus and Golang version used:
Thanos version 0.35.0
Go version go1.21.9
Prometheus version 2.39.1
Object Storage Provider: Dell ECS
What happened: Query Frontend is unable to render some metrics in graph format at any resolution, even though Querier and Grafana can.
What you expected to happen: Metric in graph view to return valid results just like Querier does.
How to reproduce it (as minimally and precisely as possible): Query a heavy metric such as container_oom_events_total in graph view on both Querier and Query Frontend. The downstream tripper config sets response_header_timeout: 2m and max_idle_conns_per_host: 100. The downstream URL points at the Querier load balancer.
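For reference, the tripper settings mentioned above are the ones Query Frontend accepts via --query-frontend.downstream-tripper-config (or the -file variant); a minimal sketch using the values from this report:

```yaml
# Downstream tripper configuration for Thanos Query Frontend
# (passed via --query-frontend.downstream-tripper-config-file)
response_header_timeout: 2m    # how long to wait for the Querier's response headers
max_idle_conns_per_host: 100   # keep-alive connections per downstream host
```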
Full logs to relevant components: Error executing query: Unexpected token '<', " <bod"... is not valid JSON
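The "Unexpected token '<'" message is the browser's JSON.parse choking on an HTML body: something in the path (likely the load balancer or the frontend itself) returned an HTML error page instead of a JSON query result. A minimal sketch of the same failure mode, using Python's json module as a stand-in for the browser's JSON.parse and a hypothetical error page body:

```python
import json

# Hypothetical HTML error page, as a load balancer might return on timeout,
# in place of the expected JSON query result.
html_body = "<body><h1>504 Gateway Time-out</h1></body>"

try:
    json.loads(html_body)
except json.JSONDecodeError as err:
    # Any body starting with '<' is not valid JSON, hence the UI error.
    print(f"not valid JSON: {err}")
```

Capturing the raw response of the failing query_range request (rather than the UI's parse error) would show which component produced the HTML page.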
Anything else we need to know:
sum(container_oom_events_total) and sum(container_oom_events_total) by (pod) both work in query frontend. It is only the metric on its own that doesn't.
Not all metrics have this issue. For example, kube_cronjob_info works just fine in graph format.
Running the query a few times in a row doesn't help; the error persists.
The table format works for container_oom_events_total; it is only the graph view that fails.
Environment:
OpenShift 4.12.27
Hey @RomyKess! 👋 I'm here to help you with any bugs, questions, or contributions while you wait for a human maintainer. Let's squash some bugs together!
Additionally, for handling heavy metrics like container_oom_events_total, consider the following best practices:
Query Splitting: Use the --query-range.split-interval flag to split long queries into shorter ones, which can help prevent OOM issues and improve load balancing.
Caching: Enable caching to store and reuse query results, reducing backend load.
Retry Mechanism: Configure retries for failed queries using the --query-range.max-retries-per-request flag.
Horizontal Scalability: Deploy multiple instances of the Query Frontend to distribute the load.
Downstream Tripper Configuration: Ensure max_idle_conns_per_host is set to at least 100 to leverage HTTP keep-alive connections, reducing latency.
Slow Query Logging: Use the --query-frontend.log-queries-longer-than flag to log and optimize slow queries.
Compression: Enable response compression to reduce data transfer over the network.
These practices can help optimize the performance of the Query Frontend under high-load conditions.
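Taken together, the suggestions above map onto Query Frontend flags roughly as follows. This is a sketch, not a tested deployment; querier-lb:9090 is a hypothetical address for the Querier load balancer mentioned in the report:

```shell
# Sketch: Query Frontend invocation combining the suggestions above.
# querier-lb:9090 is a hypothetical downstream Querier load balancer.
thanos query-frontend \
  --query-frontend.downstream-url=http://querier-lb:9090 \
  --query-range.split-interval=24h \
  --query-range.max-retries-per-request=5 \
  --query-frontend.compress-responses \
  --query-frontend.log-queries-longer-than=10s
```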