event.duration takes a significant amount of disk space #31574
Comments
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
Pinging @elastic/obs-dc (Team:Obs-DC)
If you dropped it to second precision or microsecond precision it might still be useful and take up much, much less space. You could hit it with the disk usage API.
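For reference, a quick way to check the per-field footprint is the Elasticsearch analyze disk usage API; a sketch below, where the index pattern is just an example:

```sh
# Per-field disk usage breakdown (API is in technical preview);
# the response reports how many bytes event.duration contributes.
curl -X POST "localhost:9200/metricbeat-*/_disk_usage?run_expensive_tasks=true&pretty"
```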
@cmacknz any thoughts here? Should we change the precision or rather get rid of it?
Could we do a test with reduced precision? Microsecond is definitely enough. The idea behind this field was to visualise and detect potential delays in event collection. If we don't use it anywhere, we could also just introduce a config option and turn it off by default.
I'm not sure reducing precision is a realistic option: this is defined as nanoseconds in ECS, and there can be uses of this field at that precision. Also, there are cases where an event can take sub-millisecond times, such as when collecting system or runtime metrics, or in performance monitoring. I think that disabling it by default in Metricbeat would be a better option, but that can also be a breaking change if someone is using it.
Agreed, I suspect this field is only used during module development and possibly in SDHs, where we can just request that it be enabled. Does anyone from @elastic/obs-cloud-monitoring or @elastic/obs-cloudnative-monitoring have any thoughts on this field?
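As an aside, dropping the field per deployment is already possible today with Beats' drop_fields processor; a minimal sketch of what that would look like in metricbeat.yml:

```yaml
# Drop the automatically added event.duration before events are shipped.
processors:
  - drop_fields:
      fields: ["event.duration"]
      ignore_missing: true
```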
Just want to make sure we are only talking about the event.duration field that Metricbeat adds automatically?
From a disk usage standpoint you can send the value in nanoseconds if you want and round it to microseconds. It'll just have a lot of trailing zeros, which we will optimize away in storage, at least once synthetic source is available. But, yeah, not saving it at all is wonderful if we can get away with it.
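To illustrate the rounding idea in Go (a standalone sketch, not actual Metricbeat code): the value is still reported in nanoseconds, but the lower digits become trailing zeros that compress well or can be optimized away.

```go
package main

import (
	"fmt"
	"time"
)

// roundedNanos reports a duration in nanoseconds, rounded to the given
// precision, so the stored value ends in zeros.
func roundedNanos(d, precision time.Duration) int64 {
	return int64(d.Round(precision))
}

func main() {
	fetch := 123456789 * time.Nanosecond               // a 123.456789 ms fetch
	fmt.Println(roundedNanos(fetch, time.Microsecond)) // 123457000
	fmt.Println(roundedNanos(fetch, time.Millisecond)) // 123000000
}
```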
@kaiyan-sheng yes, the main concern is the field added automatically by Metricbeat. In the cases where the field is explicitly collected and used by an integration, I think this is fine.
Hi! We're labeling this issue as stale.
Commenting, as we should not drop this issue.
Hi! We're labeling this issue as stale.
👍
This seems like something we could just start logging a summary of. In theory this should match the period in the configuration; I imagine the intent is to spot cases where we are slightly out of sync.
FYI @strawgate, this is another place we could be reducing ingest volume.
Great, thanks Craig! Perhaps we can round the field to a desired resolution (maybe 0.1s?) and have a setting to enable nanosecond precision? I imagine that if we keep it as nanoseconds but all the durations are rounded, we reduce cardinality by a lot.
Edit: this is referencing
Most of the text describing the need is targeting
Whoops, I misread that entire issue as applying to
Reducing to millisecond precision would keep it useful while greatly reducing cardinality. I can test what the savings are, if that would be useful.
As a plan moving forward: Short-term: since the precision can't be adjusted, implement what @nik9000 suggests HERE and document/benchmark the disk usage savings. Medium/long-term: since disabling the field is a breaking change, I suggest adding it to the list for 9.0 and making it a configurable option then; we just don't have enough information on how users may be utilizing this field today. Does this work?
I think we should reduce precision as much as we can get away with, as each decimal we drop is a 10x reduction in cardinality. I don't know that nano to micro will be a big enough savings, but if someone can benchmark, maybe we can find the "sweet spot".
Looks like… I wonder if the tests that suggested that are still accurate. @martijnvg, do you have recent numbers for a storage breakdown by field, so that we can see whether removing event.duration is still worth it?
I believe last time we gathered the disk usage numbers from Rally benchmarks. If the tracks aren't updated, then we also won't see any improvements.
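For context, a hedged sketch of how such a storage comparison could be re-run with Rally against an existing cluster; the tsdb track name is an assumption, so substitute whatever track actually carries the Metricbeat-style data:

```sh
# Benchmark-only run against an existing cluster; compare index store size
# (and the _disk_usage breakdown) with and without event.duration indexed.
esrally race --track=tsdb \
  --pipeline=benchmark-only \
  --target-hosts=localhost:9200
```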
While analyzing the disk space used by data collected by Metricbeat and stored in indices with TSDB and synthetic _source enabled (elastic/elasticsearch#85649), @nik9000 found that event.duration takes up to 16.7% of the disk space. This field is automatically added by Metricbeat, by default, with the duration of the fetch operation.
I guess that the main purpose of this field is to monitor or debug the metrics collection itself, but it may not be so useful for the final users of most modules.
Being in Metricbeat, this is also added to metrics documents collected by Agent.
We should reconsider this field, or disable it by default.
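Purely as an illustration of the "configurable and off by default" idea, a hypothetical metricbeat.yml snippet; the add_event_duration option below does not exist in Metricbeat today:

```yaml
# Hypothetical sketch only: add_event_duration is not a real Metricbeat setting.
metricbeat.modules:
  - module: system
    metricsets: ["cpu", "memory"]
    period: 10s
    # opt back in to the automatically added event.duration field
    add_event_duration: true
```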
cc @ruflin