This repository has been archived by the owner on Sep 2, 2020. It is now read-only.

Proposal: use metric md5 hash instead of the name to reduce memory footprint #208

Open
ifesdjeen opened this issue May 21, 2016 · 0 comments

Comments

@ifesdjeen
Collaborator

Storing the md5 hash of each metric instead of its name could reduce memory use. Currently, we keep the metric names in memory for the lifetime of the application. That said, at roughly 50 characters per name, 100K processed metrics would take at most about 5 MB, so the saving may be small. Still, I thought the idea was worth throwing in. I may be underestimating how much memory the strings actually take up.
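To make the estimate concrete, here is a back-of-the-envelope sketch of the arithmetic. It assumes one byte per character and counts only the raw payload. The per-object overhead of the runtime's string representation is ignored, which is the part the estimate above may be underestimating. The constant and function names are illustrative only.

```python
# Rough payload-only estimate; real in-memory strings carry extra
# per-object overhead that is NOT modelled here.
METRIC_COUNT = 100_000      # "100K processed metrics" from the estimate
NAME_LENGTH = 50            # assumed average name length, in 1-byte characters
MD5_DIGEST_BYTES = 16       # an md5 digest is always 128 bits


def payload_bytes(count: int, bytes_per_item: int) -> int:
    """Total raw bytes for `count` items of `bytes_per_item` each."""
    return count * bytes_per_item


names_total = payload_bytes(METRIC_COUNT, NAME_LENGTH)
hashes_total = payload_bytes(METRIC_COUNT, MD5_DIGEST_BYTES)

print(f"names:  {names_total / 1_000_000:.1f} MB")
print(f"hashes: {hashes_total / 1_000_000:.1f} MB")
```

Under these assumptions the raw saving is a few megabytes at most, so the proposal only pays off if the real per-string cost is much higher than the character count suggests.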

The PK of the metric can be the md5 hash. To run a query, one can always recompute the hash from the metric name. The real metric name can be stored in a static column, so it is returned with the end result; this could also help with storage, since the name is then stored just once per partition.
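A minimal sketch of that lookup scheme, using an in-memory dictionary as a stand-in for the table. The `partitions` dict, `write_point`, and `query` are hypothetical names made up for illustration and are not part of the project. Only `hashlib.md5` is a real library call.

```python
import hashlib


def metric_key(name: str) -> bytes:
    """Return the 16-byte md5 digest that would serve as the partition key."""
    return hashlib.md5(name.encode("utf-8")).digest()


# Hypothetical stand-in for the table: one partition per digest, with the
# metric name stored once per partition (playing the role of a static column).
partitions = {}


def write_point(name: str, timestamp: int, value: float) -> None:
    """Append a point to the partition addressed by the name's digest."""
    part = partitions.setdefault(metric_key(name), {"name": name, "points": []})
    part["points"].append((timestamp, value))


def query(name: str):
    """Recompute the digest from the name and return (name, points), or None."""
    part = partitions.get(metric_key(name))
    if part is None:
        return None
    return part["name"], list(part["points"])


write_point("servers.web01.cpu.user", 1463788800, 0.42)
print(query("servers.web01.cpu.user"))
```

The caller never needs to keep the name around: the digest is derived on demand for both writes and reads, and the readable name comes back from the partition itself.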

The obvious problem here is hash collisions, which in this particular case have no real solution. Even if we can detect a collision using some cardinality measure, we can't know exactly which metric the new one collides with.
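For a sense of scale, the chance of an accidental collision can be estimated with the standard birthday approximation, p ≈ n(n-1) / 2^(b+1) for n items and a b-bit hash. The sketch below assumes md5 digests behave like uniformly random 128-bit values and that metric names are not chosen adversarially; md5 is known to be vulnerable to deliberately crafted collisions, which this estimate does not cover. The function name is illustrative.

```python
def collision_probability(n: int, bits: int = 128) -> float:
    """Birthday approximation for the chance that any two of n items collide.

    Assumes hash outputs are uniformly random over 2**bits values.
    Only accurate while the result is much smaller than 1.
    """
    return n * (n - 1) / 2 ** (bits + 1)


p = collision_probability(100_000)
print(f"approximate collision probability for 100K metrics: {p:.3e}")
```

Under these assumptions an accidental collision is extremely unlikely at 100K metrics, but the point above still stands: if one does occur, the two metrics silently share a partition.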
