Query Frontend: unable to run queries in Graph Format #7799

Open
RomyKess opened this issue Oct 6, 2024 · 1 comment

RomyKess commented Oct 6, 2024

Thanos, Prometheus and Golang version used:

Thanos version 0.35.0
Go version go1.21.9
Prometheus version 2.39.1

Object Storage Provider: Dell ECS

What happened: The Query Frontend cannot run queries for some metrics in graph view, regardless of resolution, even though the Querier and Grafana can.

What you expected to happen: Queries for the metric in graph view return valid results, as they do through the Querier.

How to reproduce it (as minimally and precisely as possible): Query a heavy metric such as container_oom_events_total in graph view through both the Querier and the Query Frontend. The downstream tripper config sets response_header_timeout: 2m and max_idle_conns_per_host: 100, and the downstream URL points to the Querier load balancer.

Full logs from relevant components: Error executing query: Unexpected token '<', " <bod"... is not valid JSON
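The error text shows that the response body began with "<bod", meaning the browser received HTML where it expected a JSON API reply. The report does not say which component produced that HTML. The sketch below shows one way a client could tell the two apart before decoding; the helper name and the sample bodies are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// checkJSONBody is a hypothetical helper that classifies a query API response
// body. A body whose first non-space byte is '<' is markup (e.g. HTML), the
// kind of input that makes a JSON parser fail with "Unexpected token '<'".
func checkJSONBody(body []byte) error {
	trimmed := bytes.TrimSpace(body)
	if bytes.HasPrefix(trimmed, []byte("<")) {
		// Include a short prefix of the body to make the error actionable.
		n := len(trimmed)
		if n > 40 {
			n = 40
		}
		return fmt.Errorf("response is HTML, not JSON: %q", trimmed[:n])
	}
	if !json.Valid(trimmed) {
		return fmt.Errorf("response is not valid JSON")
	}
	return nil
}

func main() {
	// Hypothetical bodies: an HTML page shaped like the one in the report, and
	// a minimal Prometheus-style JSON reply.
	fmt.Println(checkJSONBody([]byte(" <body>error page</body>")))
	fmt.Println(checkJSONBody([]byte(`{"status":"success"}`)))
}
```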

Anything else we need to know:
sum(container_oom_events_total) and sum(container_oom_events_total) by (pod) both work in the Query Frontend; only the bare metric fails.
Not all metrics have this issue. For example, kube_cronjob_info works fine in graph view.
Running the query several times in a row doesn't help; the error persists.
The table view works for container_oom_events_total; only the graph view fails.

Environment:

  • OpenShift 4.12.27

dosubot bot commented Oct 6, 2024

Hey @RomyKess! 👋 I'm here to help you with any bugs, questions, or contributions while you wait for a human maintainer. Let's squash some bugs together!

I found a similar closed issue that might be relevant to your problem: "Error executing query: Unexpected token '<', ' <h'... is not a valid json". It was resolved by a pull request that handled query parsing errors in the Query Frontend [1].

Additionally, for handling heavy metrics like container_oom_events_total, consider the following best practices:

  1. Query Splitting: Use the --query-range.split-interval flag to split long queries into shorter ones, which can help prevent OOM issues and improve load balancing.
  2. Caching: Enable caching to store and reuse query results, reducing backend load.
  3. Retry Mechanism: Configure retries for failed queries using the --query-range.max-retries-per-request flag.
  4. Horizontal Scalability: Deploy multiple instances of the Query Frontend to distribute the load.
  5. Downstream Tripper Configuration: Ensure max_idle_conns_per_host is set to at least 100 to leverage HTTP keep-alive connections, reducing latency.
  6. Slow Query Logging: Use the --query-frontend.log-queries-longer-than flag to log and optimize slow queries.
  7. Compression: Enable response compression to reduce data transfer over the network [2][3].

These practices can help optimize the performance of the Query Frontend under high-load conditions.
