query: Timings/Stats in API looks different than Prometheus #7960

Open
jan-kantert opened this issue Dec 4, 2024 · 6 comments

Comments

@jan-kantert

Thanos, Prometheus and Golang version used: 0.37.1

Object Storage Provider: S3

What happened:

When requesting stats=all from the API, my client library breaks because the output differs from what Prometheus returns.

URL http://thanos/api/v1/query?stats=all&query=sum(nginx_ingress_controller_requests)

What I got:

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {},
        "value": [
          1733334578.612,
          "0"
        ]
      }
    ],
    "stats": {
      "samples": {
        "totalQueryableSamples": 66,
        "totalQueryableSamplesPerStep": null,
        "peakSamples": 18
      }
    },
    "analysis": {
      "name": "",
      "executionTime": "0s",
      "children": null
    }
  },
  "warnings": [
    "PromQL info: metric might not be a counter, name does not end in _total/_sum/_count/_bucket: \"nginx_ingress_controller_requests\""
  ]
}

What you expected to happen:

I expected an output such as (this is from Prometheus):

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {},
        "value": [
          1733334244.573,
          "0"
        ]
      }
    ],
    "stats": {
      "timings": {
        "evalTotalTime": 0.000113446,
        "resultSortTime": 0,
        "queryPreparationTime": 0.000037608,
        "innerEvalTime": 0.000066364,
        "execQueueTime": 0.000018282,
        "execTotalTime": 0.000123218
      },
      "samples": {
        "totalQueryableSamples": 0,
        "peakSamples": 5
      }
    }
  }
}

How to reproduce it (as minimally and precisely as possible):

Open in your browser: http://thanos/api/v1/query?stats=all&query=sum(nginx_ingress_controller_requests)
Then open the same in your browser for Prometheus: http://prometheus/api/v1/query?stats=all&query=sum(nginx_ingress_controller_requests)
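
The same comparison can be scripted. This is a minimal sketch, assuming Python 3 and only the standard library. The helper names (`fetch_query`, `stats_sections`, `missing_sections`) are hypothetical, the hosts are the placeholders from the URLs above, and the two embedded payloads are trimmed copies of the responses shown in this issue.

```python
import json
import urllib.parse
import urllib.request


def fetch_query(base_url, query, timeout=10.0):
    """Run an instant query with stats=all and return the decoded JSON body."""
    params = urllib.parse.urlencode({"query": query, "stats": "all"})
    url = f"{base_url}/api/v1/query?{params}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def stats_sections(payload):
    """Return the set of top-level keys under data.stats (empty if absent)."""
    stats = (payload.get("data") or {}).get("stats") or {}
    return set(stats)


def missing_sections(reference, candidate):
    """List stats sections present in `reference` but absent in `candidate`."""
    return sorted(stats_sections(reference) - stats_sections(candidate))


# Trimmed copies of the two responses quoted in this issue.
PROMETHEUS_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "stats": {
            "timings": {"evalTotalTime": 0.000113446, "execTotalTime": 0.000123218},
            "samples": {"totalQueryableSamples": 0, "peakSamples": 5},
        },
    },
}
THANOS_RESPONSE = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "stats": {
            "samples": {
                "totalQueryableSamples": 66,
                "totalQueryableSamplesPerStep": None,
                "peakSamples": 18,
            }
        },
        "analysis": {"name": "", "executionTime": "0s", "children": None},
    },
}

print(missing_sections(PROMETHEUS_RESPONSE, THANOS_RESPONSE))  # ['timings']
```

Against live servers, passing the results of `fetch_query("http://prometheus", q)` and `fetch_query("http://thanos", q)` to `missing_sections` would perform the same check.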

Anything else we need to know:

My client library breaks due to the missing timings key: puetzp/prometheus-http-query#15


dosubot bot commented Dec 4, 2024

Hey @jan-kantert! I'm here to assist you with any bugs, questions, or contributions. Let's tackle this issue together!

The discrepancy you're experiencing between Thanos and Prometheus API responses when using stats=all is due to differences in how each handles and structures the response data. In Thanos, the response includes an analysis key with fields like executionTime, while Prometheus provides a timings key with detailed timing metrics such as evalTotalTime, resultSortTime, etc.

Currently, there is no documentation or code comments in the Thanos repository that explain the purpose of the analysis key or any configuration settings that influence the inclusion of timing metrics in the API response [1]. This difference in response structure is likely causing your client library to break due to the missing timings key.

To address this, you might need to adjust your client library to handle the Thanos response format or implement a workaround that maps the Thanos response to the expected Prometheus format.
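
One way a client could tolerate both response shapes is to treat the timings section as optional. The following is a minimal sketch in Python, not the code of any existing client library. `QueryStats` and `parse_stats` are hypothetical names, and the field list is taken only from the Prometheus response quoted in this issue.

```python
from dataclasses import dataclass
from typing import Optional

# Timing fields as they appear in the Prometheus response quoted above.
TIMING_FIELDS = (
    "evalTotalTime",
    "resultSortTime",
    "queryPreparationTime",
    "innerEvalTime",
    "execQueueTime",
    "execTotalTime",
)


@dataclass
class QueryStats:
    total_queryable_samples: Optional[int]
    peak_samples: Optional[int]
    timings: Optional[dict]  # None when the server omits "timings"


def parse_stats(payload):
    """Parse data.stats, returning None if the response carries no stats."""
    stats = (payload.get("data") or {}).get("stats")
    if stats is None:
        return None
    samples = stats.get("samples") or {}
    raw_timings = stats.get("timings")
    timings = None
    if raw_timings is not None:
        # Keep only the known fields that are actually present.
        timings = {
            name: float(raw_timings[name])
            for name in TIMING_FIELDS
            if name in raw_timings
        }
    return QueryStats(
        total_queryable_samples=samples.get("totalQueryableSamples"),
        peak_samples=samples.get("peakSamples"),
        timings=timings,
    )


# Shape of the stats section returned in the "What I got" response.
thanos_like = {
    "data": {
        "stats": {
            "samples": {
                "totalQueryableSamples": 66,
                "totalQueryableSamplesPerStep": None,
                "peakSamples": 18,
            }
        }
    }
}
parsed = parse_stats(thanos_like)
print(parsed.timings)       # None
print(parsed.peak_samples)  # 18
```

With this shape, callers decide what a missing timings section means instead of failing during deserialization.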

To continue talking to Dosu, mention @dosu.

@jan-kantert
Author

jan-kantert commented Dec 4, 2024

> Hey @jan-kantert! I'm here to assist you with any bugs, questions, or contributions. Let's tackle this issue together!
>
> The discrepancy you're experiencing between Thanos and Prometheus API responses when using stats=all is due to differences in how each handles and structures the response data. In Thanos, the response includes an analysis key with fields like executionTime, while Prometheus provides a timings key with detailed timing metrics such as evalTotalTime, resultSortTime, etc.
>
> Currently, there is no documentation or code comments in the Thanos repository that explain the purpose of the analysis key or any configuration settings that influence the inclusion of timing metrics in the API response [1]. This difference in response structure is likely causing your client library to break due to the missing timings key.
>
> To address this, you might need to adjust your client library to handle the Thanos response format or implement a workaround that maps the Thanos response to the expected Prometheus format.
>
> To continue talking to Dosu, mention @dosu.

I disagree with your judgement. One of the stated goals of Thanos is to offer a Prometheus compatible API. From the webpage:

Prometheus Compatible

Use the same tools you love, such as Grafana and others, that support the Prometheus Query API.

The same tools break because the Thanos Query API does not behave like the Prometheus Query API, which seems to violate a stated project goal.

@yeya24
Contributor

yeya24 commented Dec 5, 2024

Are you using the new Thanos engine or the Prometheus query engine?

If it is the new engine, my understanding is that we don't support the timing stats.
If it is the Prometheus query engine, the same stats should be propagated. Do you use Thanos Query Frontend? The issue might be that Thanos Query Frontend doesn't propagate that timing info.

@MichaHoffmann
Contributor

Just out of curiosity: are stats part of the official API? I didn't find them in the Prometheus HTTP API spec. It feels like which stats make sense to display depends somewhat on the engine and backend.

@jan-kantert
Author

> Are you using the new Thanos engine or the Prometheus query engine?
>
> If it is the new engine, my understanding is that we don't support the timing stats. If it is the Prometheus query engine, the same stats should be propagated. Do you use Thanos Query Frontend? The issue might be that Thanos Query Frontend doesn't propagate that timing info.

Good question. I am not 100% sure.

thanos-query-frontend -> Missing timing field
thanos-query -> Everything as expected

So yeah it looks like your analysis might be correct.

> Just out of curiosity: are stats part of the official API? I didn't find them in the Prometheus HTTP API spec. It feels like which stats make sense to display depends somewhat on the engine and backend.

I wondered that as well. It seems most libraries support them based on what Prometheus returns.

@yeya24
Contributor

yeya24 commented Dec 10, 2024

I do think Thanos Query Frontend needs additional work to propagate them. But given the nature of Thanos, where queries are usually sharded, it probably doesn't make much sense to propagate exactly the same timings info as Prometheus does.

For example, Thanos can have its own resultSortTime and execQueueTime, and query evaluations usually run in parallel.

      "timings": {
        "evalTotalTime": 0.000113446,
        "resultSortTime": 0,
        "queryPreparationTime": 0.000037608,
        "innerEvalTime": 0.000066364,
        "execQueueTime": 0.000018282,
        "execTotalTime": 0.000123218
      },
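
To illustrate why per-shard numbers do not map one-to-one onto the fields above, here is a small sketch of two possible ways to combine them. It is purely hypothetical and does not describe how Thanos merges or reports stats: `merge_shard_timings` is an invented helper and the shard values are made up. Summing gives total work, while taking the maximum is closer to wall-clock time when shards run in parallel, and neither matches what a single Prometheus evaluation reports.

```python
def merge_shard_timings(shards, parallel=True):
    """Combine per-shard timing dicts (seconds) into a single dict.

    Hypothetical policy: use the slowest shard when shards run in
    parallel, otherwise add the shard times together.
    """
    merged = {}
    for timings in shards:
        for name, seconds in timings.items():
            if parallel:
                merged[name] = max(merged.get(name, 0.0), seconds)
            else:
                merged[name] = merged.get(name, 0.0) + seconds
    return merged


# Made-up timings for two shards of the same query.
shards = [
    {"evalTotalTime": 0.25, "execQueueTime": 0.5},
    {"evalTotalTime": 0.75, "execQueueTime": 0.125},
]

print(merge_shard_timings(shards, parallel=True))
# {'evalTotalTime': 0.75, 'execQueueTime': 0.5}
print(merge_shard_timings(shards, parallel=False))
# {'evalTotalTime': 1.0, 'execQueueTime': 0.625}
```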

> I wondered that as well. It seems most libraries support them based on what Prometheus returns.

That's because those libraries just proxy or forward the Prometheus response, while Thanos has additional sharding and other logic.
