Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TraceQL Metrics] Remove all obsolete settings and code for RF3 metrics #3995

Merged
merged 5 commits into from
Aug 26, 2024

Conversation

mdisibio
Copy link
Contributor

@mdisibio mdisibio commented Aug 22, 2024

What this PR does:
Deletes all obsolete settings and code for metrics queries against RF3 blocks. This path didn't scale, so we pivoted in the last release to a read path using RF1 blocks flushed from the generators. These don't need deduping or complex job handling, and it scales really well. This is the long-term duration, and some of the RF3 path is already broken accidentally by other feature additions, so it's good to delete it now.

This introduces some breaking changes against main builds in the past few months. (There is no breaking change with v2.5, these are unreleased features changing pre-2.6)

  1. the rf1_read_path frontend setting is deleted. The read path is always rf1 now. This will cause a configuration parse error on startup if you have set this value. It must be removed.
  2. The local-blocks processor must be enabled to start using metrics queries like { } | rate(). If not enabled metrics queries fail with the error localblocks processor not found. Enabling the local-blocks processor can be done two ways:

(a) . Per-tenant in the per-tenant overrides:

overrides:
  'tenantID':
    metrics_generator_processors:
      - local-blocks

(b) or by default for all tenants in the main config:

overrides:
  defaults:
    metrics_generator:
      processors: [local-blocks] 
  1. The get metrics queries on historical data, you must configure the local-blocks processor to flush rf1 blocks to object storage:
metrics_generator:
  processor:
    local_blocks:
      flush_to_storage: true

Other changes:

  1. The query hints sample and job_interval are obsolete and were only parts of the RF3 sharding mechanism. These hints no longer do anything.

Which issue(s) this PR fixes:
Fixes n/a

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@mdisibio mdisibio marked this pull request as ready for review August 23, 2024 13:20
@mdisibio mdisibio changed the title WIP - [TraceQL Metrics] Remove all obsolete settings and code for RF3 metrics [TraceQL Metrics] Remove all obsolete settings and code for RF3 metrics Aug 23, 2024
Copy link
Member

@joe-elliott joe-elliott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@javiermolinar
Copy link
Contributor

javiermolinar commented Aug 26, 2024

@mdisibio I have found another 500 error response when the metrics-generator is disabled:

error finding generators in Querier.SpanMetricsSummary: empty ring

This comes from here:

return nil, fmt.Errorf("error finding generators in Querier.SpanMetricsSummary: %w", err)

when the metrics_generator: entry in the Tempo config yaml is not present.

Since this will be there for the whole release until RF1 is done, we should consider stopping early any traceql metric query if the generator is not enabled, with a more descriptive error.

@mdisibio
Copy link
Contributor Author

@mdisibio I have found another 500 error response when the metrics-generator is disabled:

error finding generators in Querier.SpanMetricsSummary: empty ring

Thanks. That should stop the query already, when the querier job returns error to the frontend. By earlier do you mean do the check in the query-frontend? The new error could be a little cleaner but the query-frontend doesn't access the generator ring currently, and not sure I want to introduce that as a last-minute change.

However the error message is incorrect, it was copy-and-pasted from the other API. Fixed that and a few others.

@mdisibio mdisibio merged commit 43d9326 into grafana:main Aug 26, 2024
17 checks passed
@joe-elliott joe-elliott added type/bug Something isn't working backport release-v2.6 labels Aug 26, 2024
Copy link
Contributor

The backport to release-v2.6 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new branch
git switch --create backport-3995-to-release-v2.6 origin/release-v2.6
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x 43d932647732f236e09b21d149d0f9ea66199e5c

When the conflicts are resolved, stage and commit the changes:

git add . && git cherry-pick --continue

If you have the GitHub CLI installed:

# Push the branch to GitHub:
git push --set-upstream origin backport-3995-to-release-v2.6
# Create the PR body template
PR_BODY=$(gh pr view 3995 --json body --template 'Backport 43d932647732f236e09b21d149d0f9ea66199e5c from #3995{{ "\n\n---\n\n" }}{{ index . "body" }}')
# Create the PR on GitHub
echo "${PR_BODY}" | gh pr create --title '[release-v2.6] [TraceQL Metrics] Remove all obsolete settings and code for RF3 metrics' --body-file - --label 'type/bug' --label 'backport' --base release-v2.6 --milestone release-v2.6 --web

Or, if you don't have the GitHub CLI installed (we recommend you install it!):

# Push the branch to GitHub:
git push --set-upstream origin backport-3995-to-release-v2.6

# Create a pull request where the `base` branch is `release-v2.6` and the `compare`/`head` branch is `backport-3995-to-release-v2.6`.

# Remove the local backport branch
git switch main
git branch -D backport-3995-to-release-v2.6

@joe-elliott joe-elliott removed the type/bug Something isn't working label Aug 26, 2024
joe-elliott pushed a commit that referenced this pull request Aug 26, 2024
…cs (#3995)

* Remove all obsolete settings and code for RF3 metrics. The only path going forward is RF1-based

* Remove remaining references to rf1_read_path

* changelog

* Remove another reference to obsolete sampling rate

* Update error messages for correctness

(cherry picked from commit 43d9326)
joe-elliott added a commit that referenced this pull request Aug 27, 2024
…cs (#3995) (#4010)

* Remove all obsolete settings and code for RF3 metrics. The only path going forward is RF1-based

* Remove remaining references to rf1_read_path

* changelog

* Remove another reference to obsolete sampling rate

* Update error messages for correctness

(cherry picked from commit 43d9326)

Co-authored-by: Martin Disibio <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants