Challenge of one metric per document #91775
Let's say each metric is ingested as a single document (so the metric name would be a dimension) all the time. How many times more documents would be ingested? For example in this node scraper integration. In the second json example that you shared, I think we would end up with sparse fields, which isn't ideal either. In the first case we would end up with denser fields.
What I posted above is a cutout of the full output from the node exporter; I only took a very small subset of the metrics to make it easier to explain the problem. To get a better sense of how many more documents we would be talking about when shipping one document per metric, let's have a look at some existing metricsets:
There are smaller and larger metricsets, but I think we are usually talking about 10x - 100x the number of documents if we had 1 metric per document. If we follow an approach of only 1 metric per document, there is also the option to fundamentally change the structure to something like:
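A minimal sketch of what such a restructured document could look like, assuming the metric name becomes a field value instead of a field name (the exact field names and values here are hypothetical, not an agreed-upon schema):

```json
{
  "@timestamp": "2022-11-21T15:04:05.000Z",
  "labels": {
    "collector": "arp"
  },
  "metric": {
    "name": "node_scrape_collector_duration_seconds",
    "value": 0.000121
  }
}
```

Every document would then share the same small, dense set of fields, at the cost of queries having to filter on the metric name instead of addressing a dedicated field.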
It would fundamentally change how we query for metrics. @martijnvg Can you elaborate more on dense vs sparse (with examples) and what the related challenges are in this context?
Are the mentioned metrics also from integrations that push metrics to us?
True, this would result in very different queries. I do think labels / dimensions should be included in that structure, though? Otherwise we would need join-like functionality?
Actually this used to be problematic in Lucene but not today. A dense field is a field that every document in an index has a value for. A sparse field is a field that not every document has a value for. Sparse fields used to take as much disk space as dense fields and as much CPU time during merging. This is no longer the case, so I don't think we need to worry about this.
Pinging @elastic/es-analytics-geo (Team:Analytics)
For a time series database, I think it's appropriate to not have an identity or even a dedicated document for the individual data points within a time series. Could we optimize time series data streams further if we didn't need to support operations that rely on the document's identity, such as the single document APIs? For metric values, we only need them in doc_values. The rest is time series dimensions or tags, which need to be indexed and added to doc_values. Maybe that's effectively what we're already doing with doc_value only mappings for metrics, synthetic source, and _id-less indices. But maybe there's more we can do by not thinking of individual data points as documents.
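As a rough sketch of that combination (the exact settings and their availability vary by version, and all field names below are made up), a time series index along these lines would pair doc-values-only metric fields with synthetic source:

```json
{
  "settings": {
    "index.mode": "time_series",
    "index.routing_path": ["host.name", "metric.name"]
  },
  "mappings": {
    "_source": { "mode": "synthetic" },
    "properties": {
      "@timestamp": { "type": "date" },
      "host.name": { "type": "keyword", "time_series_dimension": true },
      "metric.name": { "type": "keyword", "time_series_dimension": true },
      "metric.value": { "type": "double", "time_series_metric": "gauge", "index": false }
    }
  }
}
```

The metric value only lives in doc_values (no inverted index or points), while the dimensions stay indexed for filtering and routing.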
For the record, here are the downsides of having one metric per document that I can think of:
One interesting property that doesn't solve the problem is that all these issues would disappear with a downsampling operation: the downsampled index of an index that stored one metric per document and the downsampled index of the same index that stored multiple metrics per document would be exactly the same.

For the record, another approach could consist of having one data stream per metric. This would remove the issues around sparse fields and query performance, but it still has the same issue with the higher disk and indexing overhead, and it may introduce more issues with the high numbers of indexes that we'd like to avoid. One metric per document feels like a better option for us than one metric per data stream.

Being a bit creative, it may be possible for Elasticsearch to automatically "fix" sparseness by hooking into the merging process: when some input segments have sparse metrics, Elasticsearch could create a view around the incoming segments for the merge that collapses adjacent documents (the fact that TSDB uses index sorting is important here) with the same dimensions and timestamps but different sets of metrics. This would not address the issue around slower indexing, but it would address the issues with storage efficiency and query performance.
This is interesting. Would be nice if Elasticsearch could do something like this in real time during ingestion. For example, group all the metrics with the same dimensions within the same second?
I prefer to think of it as one index per metric behind a data stream, so that from a user perspective how it is persisted is an implementation detail. But as there is a big chance that many metrics will only show up a few times, I expect we would have too many small indices.
Sounds similar to the approach mentioned before, like having 1s downsampling, but here it happens after ingestion instead of during ingestion. One important bit is that we would have to ensure on the edge that all separate metric docs that could potentially be merged have an identical timestamp.
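For illustration (field names and values are invented for this sketch), two single-metric docs that share the timestamp and dimensions:

```json
{ "@timestamp": "2022-11-21T15:04:05Z", "host": { "name": "host-1" }, "system": { "cpu": { "user": { "pct": 0.12 } } } }
{ "@timestamp": "2022-11-21T15:04:05Z", "host": { "name": "host-1" }, "system": { "cpu": { "system": { "pct": 0.03 } } } }
```

After collapsing, they would become a single multi-metric doc:

```json
{ "@timestamp": "2022-11-21T15:04:05Z", "host": { "name": "host-1" }, "system": { "cpu": { "user": { "pct": 0.12 }, "system": { "pct": 0.03 } } } }
```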
This would have the nice property that I can query time series via metric names. For example, I can easily and simultaneously query all metrics starting with a given prefix.

But I guess this 1 metric per document approach could pose some cost challenges. If you look at Prometheus's TSDB or Facebook's Gorilla, they are able to compress a metric sample (8 byte timestamp + 8 byte value) down to 1-2 bytes of storage space. They need roughly 1 bit per timestamp (via double delta) and the rest for the compressed metric value and index structures.

In comparison: in our Metricbeat index with a mixture of Prometheus and Kubernetes metrics, we are seeing ~10 metrics per document with a size of ~180 bytes. We are not using TSDB yet. I would assume TSDB could bring us down to 60 bytes per document, so about 6 bytes per metric. So even with TSDB we'd still be less efficient than these other time series databases by a meaningful factor, and this gap would probably widen if we moved to one metric per document.
With #92045, the overhead of the @timestamp is kept very low due to double delta compression. In the future, we'd ideally not even create
What's the overhead of sparse fields, and what does the on-disk structure look like for sparse vs dense doc values?
During ingest, APM Server is already doing that for OTLP: metrics with the same set of dimensions are stored in the same document. The issue is that the metrics make more use of dimensions, which implies that not as many metrics can be packed into the same document. I don't think downsampling or merging documents in ES would help.
That would be fantastic.
Currently, sparse doc-values fields encode the set of documents that have a field in a similar way as roaring bitmaps. See the javadocs of https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/codecs/lucene90/IndexedDISI.java, which give a bit more detail. Dense doc-values fields just store a flag in the field's metadata that says that the field is dense. At search time, such fields get a specialized implementation that skips checks for "does this doc have a value", and some queries go a bit further.

So space-wise, the overhead of sparse fields can be very low (hardly noticeable) if docs that have the same fields are grouped together, and up to one bit per field per doc in the shard. I don't have a good intuition for the search-time CPU overhead of sparse fields, however; it probably highly depends on the queries.
That's good news. Due to the default index sorting in TSDB, docs with the same fields are almost always grouped together (except for cases when a new metric gets added). I think this means that they would be stored in a dense range in Lucene even though the docs across the whole index may have sparse values, right?
If documents that have the same dimension values also have the same metric fields, then the TSDB index sorting should help keep the overhead low indeed. Lucene has a nightly benchmark that verifies that: https://home.apache.org/~mikemccand/lucenebench/sparseResults.html#index_size. It indexes the same dataset both in a dense way and in a sparse way by giving yellow cabs and green cabs different field names, and checks index sizes of dense vs. sparse vs. sparse sorted by cab color. As you can see, the space efficiency is the same in the dense and sparse sorted cases.
Feedback provided, closing.
@ruflin @felixbarny Is there anything that we need to do now for this issue? I think removing the _id / seq_no is covered in another issue.
What is the conclusion? Would be great if it could be summarised here as a final comment. @martijnvg We need a summary of what the recommendation is. Also, if there are follow-up issues, can you link them here? Most important from my side is that the Elasticsearch team is aware that more and more single metrics will be ingested and that TSDB needs to be able to deal with it in a good way.
Summary of the current downsides of the one metric per document approach:
Trimming or removing the _id / _seq_no fields only solves part of challenges 1 & 2. Moving to a data stream per metric could help with challenge 3, but we would likely run into another scaling issue, the number of shards/indices/data streams, although each data stream would then have a much smaller mapping than today.
Maybe this is where a dedicated metrics query language (ESQL, PromQL) could help, by automatically adding a filter on the metric name.
Creating that many different data streams would also make data management tasks, such as managing retention and permissions, very difficult, especially when you don't want to manage that on a per-metric basis but on a per-service or per-team basis.
True, but we still need an additional filter that we wouldn't need otherwise.
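As a sketch of what that extra filter looks like in practice (field names are illustrative and assume a metric.name dimension as discussed above), a query that today only filters on dimensions and time would become something like:

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "metric.name": "system.memory.used.bytes" } },
        { "term": { "host.name": "host-1" } },
        { "range": { "@timestamp": { "gte": "now-15m" } } }
      ]
    }
  }
}
```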
Good point. This would make authorisation much more complex. Another reason against a data stream per metric.
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Something I have not seen mentioned anywhere in these discussions is how we would handle multivariate aggregations/statistics, e.g. cross-correlation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-matrix-stats-aggregation.html

Or a simpler one: calculate the ratio over time between data points of two time series, like in the example of https://prometheus.io/docs/prometheus/latest/querying/operators/#one-to-one-vector-matches

These can be done fairly trivially when the metrics are in the same document, which is what we have historically done with our own data shippers (Metricbeat, APM), but that's not always something we can control. E.g. if OTel metrics have the same attributes but different timestamps, then they will end up in different documents, and we effectively end up with a single metric per document. See also elastic/apm-data#372 (comment).

I think this problem should be solvable, and should probably be part of requirements & design.
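To make the same-document case concrete: a per-data-point ratio can be computed with a search-time runtime field, because both values are available on the same document (a sketch with illustrative field names, omitting missing-value and divide-by-zero handling):

```json
{
  "size": 10,
  "runtime_mappings": {
    "memory.used.ratio": {
      "type": "double",
      "script": {
        "source": "emit((double) doc['system.memory.used.bytes'].value / doc['system.memory.total'].value)"
      }
    }
  },
  "fields": ["@timestamp", "memory.used.ratio"],
  "_source": false
}
```

As soon as the two metrics live in separate documents, there is no single document to compute the ratio on, and the operation turns into a join-like problem.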
@martijnvg is this something ES|QL will be able to do, even if the metrics are in different documents? |
I don't think this is something that will just work in es|ql; it requires some additional work. If I follow the
Metricbeat and Elastic Agent ship multiple metrics with the same dimensions in a single document to Elasticsearch (example system memory doc). Having multiple metrics in a single document was the preferred way for Elasticsearch when Metricbeat was created. It is also nice for users to see many related metrics in a single document. But having many metrics in a single document only works as long as we are in control of the collection.
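For reference, such a document bundles several related metrics under one timestamp and set of dimensions, roughly along these lines (values are made up, field names approximate the Metricbeat system module):

```json
{
  "@timestamp": "2022-11-21T15:04:05Z",
  "host": { "name": "host-1" },
  "system": {
    "memory": {
      "total": 16777216000,
      "free": 4194304000,
      "used": { "bytes": 12582912000, "pct": 0.75 }
    }
  }
}
```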
Many time series databases think of a single metric with labels as a time series. For example, prometheus defines it as follows:
During the collection of prometheus and otel metrics, Metricbeat tries to group together all the metrics with the same labels to not create one document per metric. In some cases this works well, in others not so much. In our data model, labels are mostly used for metadata; in prometheus, labels are used as dimensions of metrics, where we instead use the metric names for that.
Taking the node exporter output as a prometheus example: `{collect=*}` is a label for the metric `node_scrape_collector_duration_seconds`. Each value of the label is unique to the metric, so our conversion creates one document for each entry. This would result in documents similar to:
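Roughly like this, with made-up values and one document per collector label value (field names approximate the Metricbeat prometheus module):

```json
{
  "@timestamp": "2022-11-21T15:04:05Z",
  "prometheus": {
    "labels": { "collector": "arp" },
    "metrics": { "node_scrape_collector_duration_seconds": 0.000121 }
  }
}
```

```json
{
  "@timestamp": "2022-11-21T15:04:05Z",
  "prometheus": {
    "labels": { "collector": "cpu" },
    "metrics": { "node_scrape_collector_duration_seconds": 0.000384 }
  }
}
```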
In our metrics format, it would look more like:
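Something along these lines, where the label value (arp, cpu, ...) ends up embedded inside the metric field name (the exact naming below is invented purely for illustration):

```json
{
  "@timestamp": "2022-11-21T15:04:05Z",
  "node": {
    "scrape": {
      "collector": {
        "arp": { "duration": { "seconds": 0.000121 } },
        "cpu": { "duration": { "seconds": 0.000384 } }
      }
    }
  }
}
```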
As you see above, the label made it into the metric name, but not necessarily in a predictable place. This becomes even more obvious when there are many labels.
How should we group this magically into a single document? Even in our scenario we send 1 document per file system, but we group together multiple metrics for the same file system.
Push metrics
In more and more scenarios, like otel and prometheus remote write, we don't collect the metrics; the metrics are pushed to us. In these scenarios, we lose control over what metrics we receive and in which order. Some caching would still allow us to group together some of the metrics, but as described above, not all. The number of documents increases further.
Too many docs
The number of metric documents keeps increasing, 10x to 100x compared to Metricbeat, even though we combine metrics, and the number of metrics per document keeps decreasing. In some scenarios we have reached 1 metric per document. But historically, Elasticsearch did not like the overhead of single-metric docs.
I'm creating this issue to bring awareness to the Elasticsearch team around this problem and to discuss potential solutions.
Links