Improve storage by normalizing the current metrics table #181

Open
surister opened this issue Oct 14, 2024 · 0 comments
Labels
enhancement go Pull requests that update Go code


surister (Contributor) commented Oct 14, 2024

As @proddata pointed out in private conversations, we could further reduce storage usage by normalizing how we store the data in CrateDB.

Currently we store the same information, the label set (label_set), many times. For example, this query counts the rows that share one labels_hash:

```sql
SELECT
  count(*)
FROM
  "doc"."metrics"
WHERE
  labels_hash = '478d4639912fc742';
-- result: 36558
```

In my dataset of about 33M rows, this single label hash appears 36,558 times. That means that (in this specific case) we write the following object 36,557 times unnecessarily:

```json
{
  "instance": "some_domain",
  "__name__": "probe_http_content_length",
  "job": "blackbox"
}
```
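A rough sketch of what normalization could look like on the write path follows, written in Go to match the project's Go code. Each label set is stored once in a separate table keyed by its hash, and each metrics row keeps only the hash. The types LabelSetRow, SampleRow, Sample and Normalizer are hypothetical, and so is the FNV-1a hashLabels function. They are illustrations only and are not the adapter's current code or its actual labels_hash computation.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// LabelSetRow is a hypothetical row in a separate labels table:
// one row per distinct label set, keyed by its hash.
type LabelSetRow struct {
	LabelsHash string
	Labels     map[string]string
}

// SampleRow is a hypothetical metrics-table row that references the
// label set by hash instead of repeating the labels object.
type SampleRow struct {
	LabelsHash  string
	TimestampMs int64
	Value       float64
}

// Sample is an incoming sample carrying its full label set.
type Sample struct {
	Labels      map[string]string
	TimestampMs int64
	Value       float64
}

// hashLabels is an illustrative hash (FNV-1a 64-bit over sorted
// name/value pairs). It is an assumption, NOT necessarily how
// labels_hash is computed today.
func hashLabels(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names) // map order is random; sort for a stable hash
	h := fnv.New64a()
	for _, name := range names {
		// 0xff separators keep e.g. {"a":"bc"} and {"ab":"c"} apart.
		h.Write([]byte(name))
		h.Write([]byte{0xff})
		h.Write([]byte(labels[name]))
		h.Write([]byte{0xff})
	}
	return fmt.Sprintf("%016x", h.Sum64())
}

// Normalizer remembers which label sets it has already emitted.
type Normalizer struct {
	seen map[string]bool
}

func NewNormalizer() *Normalizer {
	return &Normalizer{seen: make(map[string]bool)}
}

// Normalize splits a batch into label-set rows (first sighting only)
// and sample rows (one per sample, referencing the hash).
func (n *Normalizer) Normalize(batch []Sample) ([]LabelSetRow, []SampleRow) {
	var labelRows []LabelSetRow
	sampleRows := make([]SampleRow, 0, len(batch))
	for _, s := range batch {
		hash := hashLabels(s.Labels)
		if !n.seen[hash] {
			n.seen[hash] = true
			labelRows = append(labelRows, LabelSetRow{LabelsHash: hash, Labels: s.Labels})
		}
		sampleRows = append(sampleRows, SampleRow{LabelsHash: hash, TimestampMs: s.TimestampMs, Value: s.Value})
	}
	return labelRows, sampleRows
}

func main() {
	labels := map[string]string{
		"instance": "some_domain",
		"__name__": "probe_http_content_length",
		"job":      "blackbox",
	}
	batch := []Sample{
		{Labels: labels, TimestampMs: 1000, Value: 10},
		{Labels: labels, TimestampMs: 2000, Value: 11},
		{Labels: labels, TimestampMs: 3000, Value: 12},
	}
	labelRows, sampleRows := NewNormalizer().Normalize(batch)
	fmt.Println(len(labelRows), len(sampleRows)) // 1 3
}
```

The in-memory seen set only deduplicates within one process. With several adapter instances, or after a restart, the labels-table insert would also need to be idempotent. One option is CrateDB's INSERT ... ON CONFLICT DO NOTHING with labels_hash as the primary key. Read queries would then join the metrics and labels tables on the hash.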

Also, the whole test dataset contains only 939 unique label hashes. With a normalized layout we could potentially avoid writing such objects about 33 × 10^6 − 939 times, which is almost once per row.
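The arithmetic behind these numbers is total rows minus distinct label sets, because each distinct set still has to be stored once. The small Go check below uses the figures from this issue. The 33M total is approximate, and redundantLabelWrites is a hypothetical helper.

```go
package main

import "fmt"

// redundantLabelWrites returns how many label-set objects are written
// unnecessarily when every row repeats its labels: every row except the
// first occurrence of each distinct label set.
func redundantLabelWrites(totalRows, uniqueLabelSets int64) int64 {
	return totalRows - uniqueLabelSets
}

func main() {
	// One label hash from the query above: 36,558 rows, 1 distinct label set.
	fmt.Println(redundantLabelWrites(36_558, 1)) // 36557
	// Whole test dataset: ~33M rows (approximate), 939 distinct label hashes.
	fmt.Println(redundantLabelWrites(33_000_000, 939)) // 32999061
	// Label-set rows relative to sample rows after normalizing.
	fmt.Printf("%.4f%%\n", 100*float64(939)/33_000_000) // 0.0028%
}
```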

surister added the enhancement and go labels on Oct 14, 2024