`event.duration` takes a significative amount of disk space #31574

jsoriano · 2022-05-10T14:09:48Z

While analyzing disk space used by data collected by Metricbeat and stored in indexes with TSDB and synthetic _source enabled (elastic/elasticsearch#85649), @nik9000 found that event.duration takes up to 16.7% of the disk space.

This field is automatically added by Metricbeat, by default, with the duration of the fetch operation.

I guess that the main purpose of this field is to monitor or debug metrics collection itself, but this may not be so useful for the final users of most modules.

Being in Metricbeat, this is also added to metrics documents collected by Agent.

We should reconsider this field, or disable it by default.

cc @ruflin


elasticmachine · 2022-05-10T14:09:50Z

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

elasticmachine · 2022-05-10T14:09:51Z

Pinging @elastic/obs-dc (Team:Obs-DC)

nik9000 · 2022-05-10T17:57:07Z

We should reconsider this field

If you dropped it to second precision or microsecond precision it might still be useful and take up much much less space. You could hit it with the disk usage API.
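As a rough sketch of how the disk usage API output could be turned into a per-field share: the snippet below assumes a response shaped like the one returned by `POST /<index>/_disk_usage?run_expensive_tasks=true`, with `all_fields.total_in_bytes` and `fields.<name>.total_in_bytes` entries. The index name, the helper `field_disk_share`, and all byte counts are made up for illustration; they are not measurements from this issue.

```python
def field_disk_share(response: dict, index: str, field: str) -> float:
    """Return the fraction of an index's analyzed bytes used by one field."""
    index_stats = response[index]
    field_bytes = index_stats["fields"][field]["total_in_bytes"]
    total_bytes = index_stats["all_fields"]["total_in_bytes"]
    return field_bytes / total_bytes


# Hypothetical response fragment; the numbers are illustrative only.
sample_response = {
    "metrics-example": {
        "all_fields": {"total_in_bytes": 2000},
        "fields": {
            "event.duration": {"total_in_bytes": 250},
            "@timestamp": {"total_in_bytes": 180},
        },
    }
}

share = field_disk_share(sample_response, "metrics-example", "event.duration")
print(f"event.duration: {share:.1%}")  # event.duration: 12.5%
```

Running the same calculation before and after a precision change would give a direct comparison of what the field costs.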

jlind23 · 2022-05-11T07:59:16Z

@cmacknz any thoughts here? Should we change the precision or rather get rid of it?

ruflin · 2022-05-11T09:32:37Z

Could we do a test with reduced precision? Microsecond is definitely enough.

The idea behind this field was to visualise and detect potential delays of the event collection. If we don't use it anywhere, we could also just introduce a config and turn it off by default.

jsoriano · 2022-05-11T09:40:46Z

Not sure if reducing precision is a realistic option, this is defined as nanoseconds in ECS, there can be uses of this with this precision. Also, there are cases where an event can take sub-millisecond times, as when collecting system or runtime metrics, or in performance monitoring.
So this field is probably ok in nanoseconds, but it should be used only where/when needed, and maybe we should have a different field for the cases when less precision is needed.

I think that disabling it by default in Metricbeat would be a better option, but it can be also a breaking change if someone is using it.

cmacknz · 2022-05-11T14:35:53Z

I think that disabling it by default in Metricbeat would be a better option, but it can be also a breaking change if someone is using it.

Agreed, I suspect this field is only used during module development and possibly in SDHs where we can just request it be enabled.

Does anyone from @elastic/obs-cloud-monitoring or @elastic/obs-cloudnative-monitoring have any thoughts on this field?

kaiyan-sheng · 2022-05-11T14:45:31Z

Just want to make sure we are only talking about event.duration field in Metricbeat right? I don't think we are using it in metrics collection. But we are definitely leveraging this field in Filebeat and log data streams.

nik9000 · 2022-05-11T15:10:33Z

Not sure if reducing precision is a realistic option, this is defined as nanoseconds in ECS, there can be uses of this with this precision. Also, there are cases where an event can take sub-millisecond times, as when collecting system or runtime metrics, or in performance monitoring.

From a disk usage standpoint you can send in the number in nanoseconds if you want and round it to microseconds. It'll just have a lot of trailing 0s which we will optimize away in storage. At least when synthetic source is available. But, yeah, not saving it at all is wonderful if we can get away with it.
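A minimal sketch of that idea, assuming the rounding happens on the client before the event is sent: the value stays in nanoseconds, as ECS defines the field, but is rounded to a coarser resolution so that it ends in zeros. The helper name `round_duration_ns` is hypothetical and is not an existing Beats function.

```python
NS_PER_US = 1_000


def round_duration_ns(duration_ns: int, resolution_ns: int = NS_PER_US) -> int:
    """Round a nanosecond duration to the nearest multiple of resolution_ns.

    The result is still expressed in nanoseconds; ties round up.
    """
    if duration_ns < 0:
        raise ValueError("duration must be non-negative")
    return (duration_ns + resolution_ns // 2) // resolution_ns * resolution_ns


print(round_duration_ns(1_234_567))  # 1235000
```

Passing `resolution_ns=1_000_000` would give millisecond resolution with the same unit.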

jsoriano · 2022-05-11T15:15:06Z

Just want to make sure we are only talking about event.duration field in Metricbeat right? I don't think we are using it in metrics collection. But we are definitely leveraging this field in Filebeat and log data streams.

@kaiyan-sheng yes, the main concern is the field added automatically by Metricbeat. In the cases when the field is explicitly collected and used by an integration I think that this is fine.

botelastic · 2023-05-11T15:29:45Z

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

ruflin · 2023-05-12T08:50:08Z

Commenting as we should not drop this issue.

botelastic · 2024-05-11T09:01:33Z

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

ruflin · 2024-05-14T11:21:36Z

👍

cmacknz · 2024-05-14T13:30:52Z

This field is automatically added by Metricbeat, by default, with the duration of the fetch operation.

This seems like something we could just start logging a summary of. In theory this should match the period in the configuration, I imagine the intent is to spot cases where we are slightly out of sync.
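One way this could look, as a sketch only: keep a running summary of fetch durations and log it periodically instead of attaching a value to every event. The class name and its fields are assumptions made for illustration, not existing Metricbeat code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FetchDurationSummary:
    """Running summary of fetch durations, in nanoseconds."""

    count: int = 0
    total_ns: int = 0
    min_ns: Optional[int] = None
    max_ns: Optional[int] = None

    def observe(self, duration_ns: int) -> None:
        # Update the aggregate instead of storing each individual duration.
        self.count += 1
        self.total_ns += duration_ns
        self.min_ns = duration_ns if self.min_ns is None else min(self.min_ns, duration_ns)
        self.max_ns = duration_ns if self.max_ns is None else max(self.max_ns, duration_ns)

    def mean_ns(self) -> float:
        return self.total_ns / self.count if self.count else 0.0


summary = FetchDurationSummary()
for duration in (1_200_000, 900_000, 1_500_000):
    summary.observe(duration)
print(summary.count, summary.min_ns, summary.max_ns, summary.mean_ns())
```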

While analyzing disk space used by data collected by Metricbeat and stored in indexes with TSDB and synthetic _source enabled (elastic/elasticsearch#85649), @nik9000 found that event.duration takes up to 16.7% of the disk space

FYI @strawgate this is another place we could be reducing ingest volume.

strawgate · 2024-05-14T13:34:24Z

Great, thanks Craig!

Perhaps we can round the field to a desired resolution (maybe 0.1s?) and have a setting to enable nanosecond precision?

I imagine if we keep it as nanosecond but all the durations are rounded we reduce cardinality by a lot

cmacknz · 2024-05-14T13:38:30Z

~~There is a lot more context starting from elastic/integrations#4894 (comment) on how this gets used.~~

Edit: this is referencing event.ingested not event.duration

strawgate · 2024-05-14T14:59:32Z

Most of the text describing the need is targeting event.ingested, which we aren't talking about changing in this ticket. I didn't actually see any info on how event.duration is used, just a bunch of ideas for how to reduce its storage requirement.

cmacknz · 2024-05-14T16:05:39Z

Whoops, I misread that entire issue as applying to event.duration. You are right what I linked is not relevant to event.duration.

strawgate · 2024-05-14T21:56:41Z

Reducing to millisecond would make it still useful while reducing cardinality by 1,000,000x (nanoseconds to milliseconds drops six decimal digits). I can test what the savings are if that would be useful.

nimarezainia · 2024-05-14T21:56:53Z

As a plan moving forward:

Short-term: since the precision can't be adjusted, implement what @nik9000 suggests HERE. Document/benchmark the disk usage savings.

Medium/long-term: Since this is a breaking change, I suggest adding it to the list for 9.0 and making it a configurable option then. We just don't have enough information on how users may be utilizing this field today.

Does this work?

strawgate · 2024-05-14T22:04:43Z

I think we should reduce precision as much as we can get away with, as each decimal we drop is a 10x reduction in cardinality. I don't know that nano to micro will be a big enough savings, but if someone can benchmark, maybe we can find the "sweet spot".
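To make the 10x-per-decimal arithmetic concrete, here is a small sketch that counts how many distinct values a duration under one second can take at each resolution. This is only an upper bound on cardinality under that assumption; the real figure depends on how durations are actually distributed, which would still need a benchmark.

```python
NS_PER_SECOND = 1_000_000_000

# Candidate resolutions, expressed in nanoseconds.
RESOLUTIONS_NS = {
    "nanosecond": 1,
    "microsecond": 1_000,
    "millisecond": 1_000_000,
    "0.1 second": 100_000_000,
}


def possible_values(max_duration_ns: int, resolution_ns: int) -> int:
    """Count distinct values in [0, max_duration_ns) after truncating to resolution_ns."""
    return -(-max_duration_ns // resolution_ns)  # ceiling division


for name, resolution_ns in RESOLUTIONS_NS.items():
    print(f"{name}: {possible_values(NS_PER_SECOND, resolution_ns)}")
```

For sub-second durations this gives 10^9 possible values at nanosecond resolution, 10^6 at microsecond, 10^3 at millisecond, and 10 at 0.1 second.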

felixbarny · 2024-10-23T08:32:33Z

Looks like event.duration is already in second precision since 8.0/7.15: elastic/kibana#104044

I wonder if the tests that suggested that event.duration takes up to 16.7% of disk space used an old version of the pipeline that didn't do that truncation.

@martijnvg do you have recent numbers of a storage breakdown by field so that we can see if event.ingested is still an issue from a storage perspective?

Removing event.ingested is problematic as transforms, such as the ones used for the SLO feature rely on it. See also elastic/integrations#4894 (comment)

martijnvg · 2024-10-23T08:37:20Z

@martijnvg do you have recent numbers of a storage breakdown by field so that we can see if event.ingested is still an issue from a storage perspective?

I believe last time we gathered the disk usage from Rally benchmarks. If the tracks aren't updated, then we also won't see any improvements.

jsoriano added the bug, discuss (Issue needs further discussion), Team:Obs-DC (Label for the Data Collection team), and Team:Elastic-Agent-Data-Plane (Label for the Agent Data Plane team) labels May 10, 2022

ruflin mentioned this issue Dec 23, 2022

Remove event.duration and event.ingested from metric events elastic/integrations#4894

Open

botelastic bot added the Stalled label May 11, 2023

botelastic bot removed the Stalled label May 12, 2023

botelastic bot added the Stalled label May 11, 2024

botelastic bot removed the Stalled label May 14, 2024

felixbarny mentioned this issue Oct 23, 2024

Re-introduce event.ingested in observability project types? elastic/integrations#11491

Open
