query: Timings/Stats in API looks different than Prometheus #7960

jan-kantert · 2024-12-04T17:52:11Z

Thanos, Prometheus and Golang version used: 0.37.1

Object Storage Provider: S3

What happened:

When requesting stats=all from the API my client library breaks since the output looks different than what Prometheus returns.

URL http://thanos/api/v1/query?stats=all&query=sum(nginx_ingress_controller_requests)

What I got:

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {},
        "value": [
          1733334578.612,
          "0"
        ]
      }
    ],
    "stats": {
      "samples": {
        "totalQueryableSamples": 66,
        "totalQueryableSamplesPerStep": null,
        "peakSamples": 18
      }
    },
    "analysis": {
      "name": "",
      "executionTime": "0s",
      "children": null
    }
  },
  "warnings": [
    "PromQL info: metric might not be a counter, name does not end in _total/_sum/_count/_bucket: \"nginx_ingress_controller_requests\""
  ]
}

What you expected to happen:

I expected an output such as (this is from Prometheus):

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {},
        "value": [
          1733334244.573,
          "0"
        ]
      }
    ],
    "stats": {
      "timings": {
        "evalTotalTime": 0.000113446,
        "resultSortTime": 0,
        "queryPreparationTime": 0.000037608,
        "innerEvalTime": 0.000066364,
        "execQueueTime": 0.000018282,
        "execTotalTime": 0.000123218
      },
      "samples": {
        "totalQueryableSamples": 0,
        "peakSamples": 5
      }
    }
  }
}

How to reproduce it (as minimally and precisely as possible):

Open in your browser: http://thanos/api/v1/query?stats=all&query=sum(nginx_ingress_controller_requests)
Then open the same in your browser for Prometheus: http://prometheus/api/v1/query?stats=all&query=sum(nginx_ingress_controller_requests)
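The difference between the two responses can also be checked programmatically. The following is a minimal sketch that lists which key paths under `data.stats` appear in one response but not the other. It uses trimmed copies of the two payloads shown above instead of live HTTP calls; the helper name `key_paths` is made up for this illustration.

```python
import json


def key_paths(node, prefix=""):
    """Return the set of dotted key paths found in a nested JSON object."""
    paths = set()
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(value, path)
    return paths


# Trimmed copies of the two "stats" objects shown in this report.
thanos_stats = json.loads("""
{"samples": {"totalQueryableSamples": 66,
             "totalQueryableSamplesPerStep": null,
             "peakSamples": 18}}
""")
prometheus_stats = json.loads("""
{"timings": {"evalTotalTime": 0.000113446, "resultSortTime": 0,
             "queryPreparationTime": 0.000037608,
             "innerEvalTime": 0.000066364,
             "execQueueTime": 0.000018282,
             "execTotalTime": 0.000123218},
 "samples": {"totalQueryableSamples": 0, "peakSamples": 5}}
""")

# Keys Prometheus returns that the Thanos response lacks, and vice versa.
missing_in_thanos = sorted(key_paths(prometheus_stats) - key_paths(thanos_stats))
extra_in_thanos = sorted(key_paths(thanos_stats) - key_paths(prometheus_stats))

print("missing in Thanos:", missing_in_thanos)
print("only in Thanos:", extra_in_thanos)
```

For these two payloads the missing paths are `timings` and its six fields, and the only extra path is `samples.totalQueryableSamplesPerStep`.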

Anything else we need to know:

My client library breaks due to the missing timings key: puetzp/prometheus-http-query#15
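As an illustration of how a client could tolerate the missing key, here is a minimal Python sketch that treats `timings` as optional. It is not the code of the linked client library; the `QueryStats` class and `parse_stats` function are hypothetical names, and only a subset of the fields is modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryStats:
    peak_samples: int
    total_queryable_samples: int
    # Both stay None when the server omits the "timings" object.
    eval_total_time: Optional[float] = None
    exec_total_time: Optional[float] = None


def parse_stats(stats: dict) -> QueryStats:
    """Parse a `data.stats` object, accepting a missing or null `timings`."""
    samples = stats.get("samples") or {}
    timings = stats.get("timings") or {}
    return QueryStats(
        peak_samples=samples.get("peakSamples", 0),
        total_queryable_samples=samples.get("totalQueryableSamples", 0),
        eval_total_time=timings.get("evalTotalTime"),
        exec_total_time=timings.get("execTotalTime"),
    )


# Values taken from the two responses in this report.
thanos = parse_stats({
    "samples": {"totalQueryableSamples": 66,
                "totalQueryableSamplesPerStep": None,
                "peakSamples": 18},
})
prometheus = parse_stats({
    "timings": {"evalTotalTime": 0.000113446, "execTotalTime": 0.000123218},
    "samples": {"totalQueryableSamples": 0, "peakSamples": 5},
})

print(thanos)
print(prometheus)
```

With this shape, the Thanos response parses with both timing fields set to `None` instead of raising an error.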


dosubot · 2024-12-04T17:56:00Z

Hey @jan-kantert! I'm here to assist you with any bugs, questions, or contributions. Let's tackle this issue together!

The discrepancy you're experiencing between Thanos and Prometheus API responses when using stats=all is due to differences in how each handles and structures the response data. In Thanos, the response includes an analysis key with fields like executionTime, while Prometheus provides a timings key with detailed timing metrics such as evalTotalTime, resultSortTime, etc.

Currently, there is no documentation or code comments in the Thanos repository that explain the purpose of the analysis key or any configuration settings that influence the inclusion of timing metrics in the API response [1]. This difference in response structure is likely causing your client library to break due to the missing timings key.

To address this, you might need to adjust your client library to handle the Thanos response format or implement a workaround that maps the Thanos response to the expected Prometheus format.

To continue talking to Dosu, mention @dosu.

jan-kantert · 2024-12-04T20:08:45Z

> Hey @jan-kantert! I'm here to assist you with any bugs, questions, or contributions. Let's tackle this issue together!
>
> The discrepancy you're experiencing between Thanos and Prometheus API responses when using stats=all is due to differences in how each handles and structures the response data. In Thanos, the response includes an analysis key with fields like executionTime, while Prometheus provides a timings key with detailed timing metrics such as evalTotalTime, resultSortTime, etc.
>
> Currently, there is no documentation or code comments in the Thanos repository that explain the purpose of the analysis key or any configuration settings that influence the inclusion of timing metrics in the API response [1]. This difference in response structure is likely causing your client library to break due to the missing timings key.
>
> To address this, you might need to adjust your client library to handle the Thanos response format or implement a workaround that maps the Thanos response to the expected Prometheus format.
>
> To continue talking to Dosu, mention @dosu.

I disagree with your judgement. One of the stated goals of Thanos is to offer a Prometheus compatible API. From the webpage:

Prometheus Compatible

Use the same tools you love, such as Grafana and others, that support the Prometheus Query API.

The same tool breaks because the Thanos Query API does not behave like the Prometheus Query API, which seems to violate a stated project goal.

yeya24 · 2024-12-05T01:13:21Z

Are you using the new Thanos engine or the Prometheus query engine?

If it is the new engine I understand we don't support the timing stats.
If it is the Prometheus query engine, the same stats should be propagated. Do you use Thanos Query Frontend? The issue might be that Thanos Query Frontend doesn't propagate that timing info.

MichaHoffmann · 2024-12-05T20:19:09Z

Just out of curiosity: are stats part of the official API? I didn't find them in the Prometheus HTTP API spec. It feels like which stats make sense to display depends somewhat on the engine and backend.

jan-kantert · 2024-12-06T08:43:54Z

> Are you using the new Thanos engine or the Prometheus query engine?
>
> If it is the new engine I understand we don't support the timing stats. If it is the Prometheus query engine, the same stats should be propagated. Do you use Thanos Query Frontend? The issue might be that Thanos Query Frontend doesn't propagate that timing info.

Good question. I am not 100% sure.

thanos-query-frontend -> Missing timing field
thanos-query -> Everything as expected

So yeah it looks like your analysis might be correct.

> Just out of curiosity: are stats part of the official API? I didn't find them in the Prometheus HTTP API spec. It feels like which stats make sense to display depends somewhat on the engine and backend.

I wondered that as well. It seems like most libraries support them based on what Prometheus returns.

yeya24 · 2024-12-10T07:09:21Z

I do think Thanos Query Frontend needs additional work to propagate them. But because Thanos queries are usually sharded, it probably doesn't make much sense to propagate exactly the same timings info as Prometheus does.

For example, Thanos can have its own resultSortTime and execQueueTime. Query evaluations usually run in parallel.

      "timings": {
        "evalTotalTime": 0.000113446,
        "resultSortTime": 0,
        "queryPreparationTime": 0.000037608,
        "innerEvalTime": 0.000066364,
        "execQueueTime": 0.000018282,
        "execTotalTime": 0.000123218
      },

> I wondered that as well. It seems like most libraries support them based on what Prometheus returns.

That's because those libraries just proxy or forward the Prometheus response, while Thanos has additional sharding and other logic.
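To illustrate the point about sharded, parallel evaluation, here is a small sketch of why a single set of timings is ambiguous once a query is split. The per-shard numbers are invented, and the two merge strategies (summing versus taking the slowest shard) are assumptions for illustration only; neither is claimed to be what Thanos does.

```python
# Hypothetical timings (in seconds) reported by three shards of one query.
shard_timings = [
    {"evalTotalTime": 0.10, "execQueueTime": 0.01},
    {"evalTotalTime": 0.30, "execQueueTime": 0.02},
    {"evalTotalTime": 0.20, "execQueueTime": 0.01},
]


def merge(shards, combine):
    """Combine each timing field across shards using `combine` (e.g. sum or max)."""
    keys = shards[0].keys()
    return {key: combine(shard[key] for shard in shards) for key in keys}


# Total work done across all shards.
summed = merge(shard_timings, sum)
# Slowest shard, closer to what a caller waits for when shards run in parallel.
slowest = merge(shard_timings, max)

print("summed: ", summed)
print("slowest:", slowest)
```

The two results differ by roughly a factor of two for `evalTotalTime` here, so a frontend would have to choose and document one meaning before reporting a Prometheus-style `timings` object.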

dosubot bot added the bug and component: query labels on Dec 4, 2024
