Reporting the previous state on the current scrape cycle #130
Comments
Thanks for reporting. I won't have time to work on this, but if someone wants to tackle it and submit a PR I can take a look. In general I'd try to minimize the runtime of the queries that sql_exporter executes, as the main use case is really simple health-check queries that are lightweight and fast.
Hello Philipp, thank you for the fast response. In our situation, optimizing a particular SQL query will not be a sufficient solution, since we have a number of heavy queries and sooner or later we will face the same issue with another scrape query.
You are the first to report this, and we also haven't seen it in our own usage over the past 7 years. The project is provided "as is", but contributions and bug fixes are very welcome. Thanks!
@pgeorgie1 my metrics with this exporter do not have that label, but there are other ways to solve this:
In that case I would argue you need to change your configuration to make sure the DB scrapes don't overlap with each other. If you have queries that are expected to take longer than the configured interval, then you should increase the interval to make sure this cannot happen.
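For reference, a minimal sketch of what that could look like in the job configuration, assuming the YAML layout this sql_exporter uses. The job name, connection string, query, and the 10m interval are all illustrative and not taken from the reporter's setup; the point is simply to pick an interval comfortably longer than the slowest query:

```yaml
jobs:
- name: "heavy_report"
  # Chosen to be longer than the slowest query so scrape cycles never overlap.
  interval: '10m'
  connections:
  - 'postgres://user:password@dbhost:5432/appdb?sslmode=disable'
  queries:
  - name: "heavy_report_rows"
    help: "Row count produced by the heavy reporting query"
    values:
      - "count"
    query: |
      SELECT COUNT(*)::float AS count
      FROM some_large_table;
```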
Summary: There is a possible race condition defect related to reporting the previous state during the current scrape cycle.
Found in: sql_exporter, version 0.4.5
Actual state:
We have a scrape job which performs one database query per scrape cycle. The SQL query executed on each cycle adds a current_timestamp field to every row of the result. As a result, each scraped data point should be unique: since the scrape cycles run at a 4 minute interval, the current_timestamp fields of consecutive samples should be offset by roughly 4 minutes.
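For illustration only, a query of the shape described here might look like the sketch below. The table, column, and metric names are hypothetical, and exporting the timestamp as a text label (rather than a numeric value) is our assumption; only the use of current_timestamp comes from the report above:

```yaml
queries:
- name: "order_backlog"
  help: "Backlog size, stamped with the time the query ran"
  labels:
    - "sample_ts"   # current_timestamp rendered as text, so every scrape cycle is unique
  values:
    - "backlog"
  query: |
    SELECT current_timestamp::text AS sample_ts,
           COUNT(*)::float         AS backlog
    FROM   orders
    WHERE  status = 'pending';
```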
The problematic observation is that, when we inspect the data points persisted in Prometheus, we find a data point spanning an 8 minute interval. With the current setup, where current_timestamp makes each data point unique, this behavior is incorrect.
We should highlight that, for Prometheus, merging two sequential data points into one combined data point whose total interval equals the sum of the merged intervals is only possible if the sequential data points are completely equal (all of their fields match).
In our case this merging behavior should not be possible, since each data point carries a unique current_timestamp field.
This improper behavior is shown in the attached screenshots: the referenced data point spans 8 minutes instead of appearing as two data points of 4 minutes each. Note that the screenshot shows only one broken data point; all other data points have the expected 4 minute length.
A sample of the broken data point:
The broken data point's start time:
The broken data point's end time:
Our root-cause hypothesis is that the 4 minute scrape interval is sometimes not enough to complete the SQL query for the current scrape cycle, and as a result sql-exporter reports the previous state again (the previously persisted data point).
This hypothesis is consistent with the observation that a single data point is reported on two consecutive cycles and merged into one data point of double length (8 minutes instead of the expected 4), since all of its fields are equal, including the sample timestamp.
Desired state:
A data state should not be reported more than once: once a data point has been reported successfully, its state should be removed from the sql-exporter service's registry. In addition, a warning message could be raised if the current SQL query does not complete within the current scrape cycle interval.