# [8.15] [Data Forge] Add artificial delay feature (#187901) #188333

Merged 1 commit into elastic:8.15 on Jul 15, 2024

Conversation

**kibanamachine** (Contributor) commented:
Backport

This will backport the following commits from main to 8.15:

- [Data Forge] Add artificial delay feature (#187901)

Questions? Please refer to the Backport tool documentation.

## Summary

This PR adds a new setting, `indexing.artificialIndexDelay`, to the
indexing configuration to control how much artificial delay to add to
the timestamps. It also adds a "final" ingest pipeline to each data
source and injects a new base `component_template` that includes the
`event.ingested` field.
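
A rough sketch of that mechanism, using the Elasticsearch JS client; the pipeline id and wiring below are illustrative, not the PR's actual code:

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Pipeline that stamps every document with the time it was ingested.
await client.ingest.putPipeline({
  id: 'fake_logs@final', // illustrative id
  description: 'Adds event.ingested to each document',
  processors: [
    { set: { field: 'event.ingested', value: '{{{_ingest.timestamp}}}' } },
  ],
});

// Each index template then runs it last via index.final_pipeline.
await client.indices.putIndexTemplate({
  name: 'fake_logs', // illustrative name
  index_patterns: ['kbn-data-forge-fake_logs.*'],
  composed_of: ['fake_logs_8.0.0_base'], // base component_template from this PR
  template: {
    settings: { 'index.final_pipeline': 'fake_logs@final' },
  },
});
```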

The artificial delay is useful for testing transforms on data that has
a significant delay. It also lets us test whether we miss data when
syncing the transforms using `event.ingested`.

- Installs a default ingest pipeline that adds `event.ingested` to each
document
- Adds a `final_pipeline` to each installed index template
- Injects the base `component_template` into each index template at install time
- Adds an artificial delay for "current" events; historical events are
ingested without delay
- Changes the index math to produce monthly indices (the delay and index
math are sketched below)
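
A hypothetical sketch of those last two items; the names and the "historical" boundary here are assumptions, not the actual Data Forge implementation:

```ts
// indexing.artificialIndexDelay, in milliseconds (300000 in the example below).
const ARTIFICIAL_INDEX_DELAY = 300_000;

// Hold back "current" events; backfilled (historical) events go straight through.
// Treating anything older than now - delay as historical is an assumption.
async function maybeDelay(timestamp: number, now = Date.now()): Promise<void> {
  const isHistorical = timestamp < now - ARTIFICIAL_INDEX_DELAY;
  if (!isHistorical) {
    await new Promise((resolve) => setTimeout(resolve, ARTIFICIAL_INDEX_DELAY));
  }
}

// Monthly index math: route each document to one index per calendar month.
function monthlyIndexName(dataset: string, timestamp: number): string {
  const d = new Date(timestamp);
  const month = String(d.getUTCMonth() + 1).padStart(2, '0');
  return `kbn-data-forge-${dataset}.${dataset}-${d.getUTCFullYear()}-${month}`;
}
```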

### How to test:

Copy the following to `fake_logs.delayed.yaml`:

```YAML
---
elasticsearch:
  installKibanaUser: false

kibana:
  installAssets: true
  host: "http://localhost:5601/kibana"

indexing:
  dataset: "fake_logs"
  eventsPerCycle: 100
  artificialIndexDelay: 300000

schedule:
  - template: "good"
    start: "now-1h"
    end: false
    eventsPerCycle: 100
```
Then run `node x-pack/scripts/data_forge.js --config fake_logs.delayed.yaml`.
This should index an hour of data immediately, then add a 300s delay when
indexing in "real time". The logs will look like:

```
 info Starting index to http://localhost:9200 with a payload size of 10000 using 5 workers to index 100 events per cycle
 info Installing index templates (fake_logs)
 info Installing components for fake_logs (fake_logs_8.0.0_base,fake_logs_8.0.0_event,fake_logs_8.0.0_log,fake_logs_8.0.0_host,fake_logs_8.0.0_metricset)
 info Installing index template (fake_logs)
 info Indexing "good" events from 2024-07-09T16:23:36.803Z to indefinitely
 info Delaying 100 by 300000ms
 info Waiting 60000ms
 info { took: 2418721239, latency: 541, indexed: 6000 } Indexing 6000 documents.
...
```
Then after `300s`, it will index another `100` documents every `60s`.
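
To make the cadence concrete, here is the arithmetic implied by the config and logs above (reading a one-minute cycle off the `Waiting 60000ms` line is an assumption):

```ts
const eventsPerCycle = 100; // from the YAML config
const cycleMs = 60_000;     // wait between cycles, per the logs
const delayMs = 300_000;    // artificialIndexDelay

// One hour of backfill, indexed immediately:
const backfillDocs = ((60 * 60 * 1000) / cycleMs) * eventsPerCycle; // 6000, matching "indexed: 6000"

// A "current" event generated at time t becomes searchable at roughly:
const searchableAt = (t: number) => t + delayMs;
```
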
You can also inspect the delay per minute using the following ES|QL in
Discover:
```
FROM kbn-data-forge-fake_logs.fake_logs-*
| EVAL diff = DATE_DIFF("seconds", @timestamp, event.ingested)
| STATS delay = AVG(diff) BY timestamp = BUCKET(@timestamp, 1 minute)
```
This should give you a chart that looks something like this, with the
average delay hovering around the configured `300` seconds:

<img width="1413" alt="image"
src="https://github.com/elastic/kibana/assets/41702/2f48cb85-a410-487e-8f3b-41311ff95186">

There should also be a 5-minute gap at the end in Discover, since the
most recent events are still being held back:

<img width="1413" alt="image"
src="https://github.com/elastic/kibana/assets/41702/660acc87-6958-4ce9-a544-d66d56f805dd">

---------

Co-authored-by: kibanamachine <[email protected]>
(cherry picked from commit 2fac5e8)
kibanamachine merged commit 4a9ca6f into elastic:8.15 on Jul 15, 2024. 22 of 23 checks passed.