# [8.15] [Data Forge] Add artificial delay feature (#187901) #188333

Merged 1 commit into elastic:8.15 on Jul 15, 2024

Conversation

**kibanamachine** (Contributor) commented:
Backport

This will backport the following commits from main to 8.15:

- [Data Forge] Add artificial delay feature (#187901)

Questions? Please refer to the Backport tool documentation.

## Summary

This PR adds a new setting, `indexing.artificialIndexDelay`, to the
indexing configuration to control how much artificial delay to add to
the timestamps. It also adds a "final" ingest pipeline to each data
source and injects a new base `component_template` that includes the
`event.ingested` field.
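
A rough sketch of that mechanism, using the Elasticsearch JS client; the pipeline id and wiring below are illustrative, not the PR's actual code:

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Pipeline that stamps every document with the time it was ingested.
await client.ingest.putPipeline({
  id: 'fake_logs@final', // illustrative id
  description: 'Adds event.ingested to each document',
  processors: [
    { set: { field: 'event.ingested', value: '{{{_ingest.timestamp}}}' } },
  ],
});

// Each index template then runs it last via index.final_pipeline.
await client.indices.putIndexTemplate({
  name: 'fake_logs', // illustrative name
  index_patterns: ['kbn-data-forge-fake_logs.*'],
  composed_of: ['fake_logs_8.0.0_base'], // base component_template from this PR
  template: {
    settings: { 'index.final_pipeline': 'fake_logs@final' },
  },
});
```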

The artificial delay is useful for testing transforms on data that has
a significant delay. It also lets us test whether we miss data when
syncing the transforms using `event.ingested`.

- Installs a default ingest pipeline that adds `event.ingested` to each
document
- Adds a `final_pipeline` to each installed index template
- Injects the base `component_template` into each index template at install time
- Adds an artificial delay for "current" events; historical events are
ingested without delay
- Changes the index math to produce monthly indices (the delay and index
math are sketched below)
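
A hypothetical sketch of those last two items; the names and the "historical" boundary here are assumptions, not the actual Data Forge implementation:

```ts
// indexing.artificialIndexDelay, in milliseconds (300000 in the example below).
const ARTIFICIAL_INDEX_DELAY = 300_000;

// Hold back "current" events; backfilled (historical) events go straight through.
// Treating anything older than now - delay as historical is an assumption.
async function maybeDelay(timestamp: number, now = Date.now()): Promise<void> {
  const isHistorical = timestamp < now - ARTIFICIAL_INDEX_DELAY;
  if (!isHistorical) {
    await new Promise((resolve) => setTimeout(resolve, ARTIFICIAL_INDEX_DELAY));
  }
}

// Monthly index math: route each document to one index per calendar month.
function monthlyIndexName(dataset: string, timestamp: number): string {
  const d = new Date(timestamp);
  const month = String(d.getUTCMonth() + 1).padStart(2, '0');
  return `kbn-data-forge-${dataset}.${dataset}-${d.getUTCFullYear()}-${month}`;
}
```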

### How to test:

Copy the following to `fake_logs.delayed.yaml`:

```YAML
---
elasticsearch:
  installKibanaUser: false

kibana:
  installAssets: true
  host: "http://localhost:5601/kibana"

indexing:
  dataset: "fake_logs"
  eventsPerCycle: 100
  artificialIndexDelay: 300000

schedule:
  - template: "good"
    start: "now-1h"
    end: false
    eventsPerCycle: 100
```
Then run `node x-pack/scripts/data_forge.js --config fake_logs.delayed.yaml`.
This should index an hour of data immediately, then add a 300s delay when
indexing in "real time". The logs will look like:

```
 info Starting index to http://localhost:9200 with a payload size of 10000 using 5 workers to index 100 events per cycle
 info Installing index templates (fake_logs)
 info Installing components for fake_logs (fake_logs_8.0.0_base,fake_logs_8.0.0_event,fake_logs_8.0.0_log,fake_logs_8.0.0_host,fake_logs_8.0.0_metricset)
 info Installing index template (fake_logs)
 info Indexing "good" events from 2024-07-09T16:23:36.803Z to indefinitely
 info Delaying 100 by 300000ms
 info Waiting 60000ms
 info { took: 2418721239, latency: 541, indexed: 6000 } Indexing 6000 documents.
...
```
Then after `300s`, it will index another `100` documents every `60s`.
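
To make the cadence concrete, here is the arithmetic implied by the config and logs above (reading a one-minute cycle off the `Waiting 60000ms` line is an assumption):

```ts
const eventsPerCycle = 100; // from the YAML config
const cycleMs = 60_000;     // wait between cycles, per the logs
const delayMs = 300_000;    // artificialIndexDelay

// One hour of backfill, indexed immediately:
const backfillDocs = ((60 * 60 * 1000) / cycleMs) * eventsPerCycle; // 6000, matching "indexed: 6000"

// A "current" event generated at time t becomes searchable at roughly:
const searchableAt = (t: number) => t + delayMs;
```
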
You can also inspect the delay per minute using the following ES|QL in
Discover:
```
FROM kbn-data-forge-fake_logs.fake_logs-*
| EVAL diff = DATE_DIFF("seconds", @timestamp, event.ingested)
| STATS delay = AVG(diff) BY timestamp = BUCKET(@timestamp, 1 minute)
```
This should give you a chart that looks something like this, with the
average delay hovering around the configured `300` seconds:

<img width="1413" alt="image"
src="https://github.com/elastic/kibana/assets/41702/2f48cb85-a410-487e-8f3b-41311ff95186">

There should also be a 5-minute gap at the end in Discover, since the
most recent events are still being held back:

<img width="1413" alt="image"
src="https://github.com/elastic/kibana/assets/41702/660acc87-6958-4ce9-a544-d66d56f805dd">

---------

Co-authored-by: kibanamachine <[email protected]>
(cherry picked from commit 2fac5e8)
kibanamachine merged commit 4a9ca6f into elastic:8.15 on Jul 15, 2024. 22 of 23 checks passed.