The Problem
The workers tend to flush at (roughly) the same time.
The main reason for this is that worker buffers fill evenly: all workers fetch their items from the same Go channel in parallel.
The buffer expiration made it worse, because up to go-elasticsearch 8.7 there is a single ticker that flushes all workers at the same time. #624 fixed this for 8.8+.
The flushing at the same time has several bad effects:
peak in memory usage: the bulk indexer items are kept in memory until flush, the buffer memory (HTTP body) is allocated/filled at the same time
peak in CPU consumption: the HTTP bodies are generated and compressed at the same time
peak in network usage: all requests to ES go out in parallel
the same peak effects on the Elasticsearch side, leading to 429 responses more often than needed (and expected), consequently leading to more retries, which amplifies the peak behavior above
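To make the lockstep concrete, here is a toy model of the pre-8.8 setup (all names are invented for illustration; this is not the library's actual code): N workers drain one shared channel, so their buffers grow at the same rate, and a single ticker is fanned out to all of them.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const nWorkers = 4
	const flushBytes = 64

	items := make(chan []byte)     // all workers fetch from this one channel
	flushNow := make(chan struct{})

	// One ticker for all workers: every tick produces one flush signal per
	// worker, a rough stand-in for the single shared ticker in <=8.7.
	go func() {
		for range time.Tick(250 * time.Millisecond) {
			for i := 0; i < nWorkers; i++ {
				flushNow <- struct{}{}
			}
		}
	}()

	for w := 0; w < nWorkers; w++ {
		go func(id int) {
			var buf []byte
			for {
				select {
				case it := <-items:
					// Buffers fill evenly, so they all hit FlushBytes at
					// roughly the same moment.
					buf = append(buf, it...)
					if len(buf) >= flushBytes {
						fmt.Printf("worker %d: size flush\n", id)
						buf = buf[:0]
					}
				case <-flushNow: // the same tick expires every buffer at once
					fmt.Printf("worker %d: ticker flush\n", id)
					buf = buf[:0]
				}
			}
		}(w)
	}

	for i := 0; i < 200; i++ {
		items <- []byte(`{"index":{}}`)
		time.Sleep(2 * time.Millisecond)
	}
}
```

Running this, the size flushes and ticker flushes cluster together across workers, which is exactly the memory/CPU/network peak behavior described above.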
Possible solution: Fill buffers sequentially, flush in background
(Changing the API is out of scope.)
Let's say the number of workers is set to N, with FlushBytes B and FlushInterval I.
Let the Add() function collect items into an array A0 until B or I is reached, then flush it in the background.
Further calls to Add() go into a new array A1 until B or I is reached, then flush it in the background.
...
Allow at most N background flushes at a time. If that limit is reached, throttle ingestion in the Add() function (as we do now).
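A minimal Go sketch of this scheme, assuming a single mutex-guarded buffer plus a semaphore is an acceptable shape; Indexer, NewIndexer, and flush are hypothetical names for illustration, not the go-elasticsearch API:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"time"
)

// Indexer sketches the proposal: Add() fills one buffer at a time; full
// (or expired) buffers are flushed by background goroutines, and a
// semaphore of size N throttles Add() once N flushes are in flight.
type Indexer struct {
	mu         sync.Mutex
	buf        *bytes.Buffer
	lastFlush  time.Time
	flushBytes int           // B
	flushIvl   time.Duration // I
	sem        chan struct{} // capacity N: max concurrent background flushes
	wg         sync.WaitGroup
}

func NewIndexer(n, flushBytes int, flushIvl time.Duration) *Indexer {
	return &Indexer{
		buf:        &bytes.Buffer{},
		lastFlush:  time.Now(),
		flushBytes: flushBytes,
		flushIvl:   flushIvl,
		sem:        make(chan struct{}, n),
	}
}

// Add collects items sequentially (A0, A1, ...) and hands the buffer to a
// background flush once B bytes or I time is reached.
func (ix *Indexer) Add(item []byte) {
	ix.mu.Lock()
	defer ix.mu.Unlock()

	ix.buf.Write(item)
	ix.buf.WriteByte('\n')

	if ix.buf.Len() >= ix.flushBytes || time.Since(ix.lastFlush) >= ix.flushIvl {
		full := ix.buf
		ix.buf = &bytes.Buffer{} // further Add() calls fill a fresh buffer
		ix.lastFlush = time.Now()

		ix.sem <- struct{}{} // blocks (throttles Add) when N flushes are in flight
		ix.wg.Add(1)
		go func() {
			defer func() { <-ix.sem; ix.wg.Done() }()
			flush(full) // build, compress, and send the _bulk request here
		}()
	}
	// A real implementation would also need a timer to flush an idle,
	// partially filled buffer when no further Add() arrives; omitted here.
}

func flush(b *bytes.Buffer) {
	fmt.Printf("flushing %d bytes\n", b.Len())
	time.Sleep(50 * time.Millisecond) // stand-in for the HTTP round trip
}

func main() {
	ix := NewIndexer(4, 1<<10, time.Second) // N=4, B=1 KiB, I=1s
	for i := 0; i < 2000; i++ {
		ix.Add([]byte(`{"index":{}}`))
	}
	ix.wg.Wait()
}
```

Because a flush starts whenever the single buffer fills, flushes naturally spread out over time instead of firing together, and the semaphore bounds worst-case memory to roughly N in-flight bodies plus the one buffer currently being filled.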
Pros:
spread the workload over time to reduce peak behavior and pressure on ES
Cons:
?
I don't see any real cons. Depending on the actual implementation, it could be hard to figure out the number of workers; I'm thinking about the actual allocated memory in the worst-case scenario, when everything is stuck.
I would envision that as a worker pool that would need a basic scheduler to handle the handover of items. I wonder whether making that pluggable has any value, to allow different strategies.
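For what it's worth, a pluggable strategy could be as small as a single interface; this shape is purely an assumption sketched from the comment above, and none of these names exist in go-elasticsearch:

```go
package bulk

import "time"

// FlushStrategy is a hypothetical pluggable policy: the indexer would ask
// it on every Add() whether the current buffer should be handed off to a
// background flush.
type FlushStrategy interface {
	ShouldFlush(buffered int, sinceLastFlush time.Duration) bool
}

// SizeOrInterval reproduces the existing FlushBytes/FlushInterval rule.
type SizeOrInterval struct {
	Bytes    int
	Interval time.Duration
}

func (s SizeOrInterval) ShouldFlush(buffered int, since time.Duration) bool {
	return buffered >= s.Bytes || since >= s.Interval
}
```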