Pre-Fetch Streamer Messages #264

darunrs · 2023-09-29T18:49:45Z

Coordinator does not fetch historical streamer messages before the IndexerRunner call, so we cannot get the latency savings of having coordinator cache the streamer message for runner to use. Instead, we want runner to prefetch messages from S3 so that invocations don't have to wait for the streamer message to be retrieved. The prefetched messages are held as promises in an array and awaited in order, which ensures block heights are processed in order.

Below is the current flow for historical processing:

Coordinator retrieves the timestamp for the block height that historical processing starts from, as well as the current block height. It uses these to determine which days to look for index files.
Index files are files generated for each day an indexer is active. They contain information such as the block heights to which the particular indexer function applies. Coordinator fetches each index file available for the indexer, starting from the day the starting block height falls into.
Coordinator parses the block heights from each file and puts them in the historical Redis stream. This is where real-time and historical processing diverge. Real-time processing has no index files, so it reads the streamer message from S3 to get data, including the block height. That block height is put into the real-time stream.
Runner reads the block height from the historical stream. It pulls the streamer message from S3, parses it, and uses it for execution. As a result, each invocation takes at least 200ms. I've seen as high as 700ms in a sample of 20 invocations, and the 99th percentile might be much higher.

Below is the new workflow:

Coordinator functionality remains the same.
In runner, start fetching X blocks from S3, with one promise per block.
Load the promises into an array, which is used as a queue.
Delete each block height from the stream once its promise has been successfully placed on the queue.
Await the first block in the queue. When the promise completes, trigger the function call and pass in the loaded data.

I've also made real-time processing use the prefetch mechanism, on top of the existing caching.

While an indexer function is running, several other blocks are being loaded simultaneously. On each loop iteration, we top up the array so that it is as full as possible. As a result, only a few invocations wait on a block fetch, rather than all of them.
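
The new workflow can be sketched roughly as follows. This is a minimal illustration, not the code from the linked PR: `InMemoryStream` stands in for the Redis stream, and `StreamerMessage`, `FetchBlock`, `IndexerFunction`, `processWithPrefetch`, and `queueSize` are all hypothetical names chosen for the example. The S3 fetch is passed in as a function so the sketch does not assume any particular client API.

```typescript
// Hypothetical types; none of these names come from the runner codebase.
type StreamerMessage = { blockHeight: number; payload: string };
type FetchBlock = (blockHeight: number) => Promise<StreamerMessage>;
type IndexerFunction = (message: StreamerMessage) => Promise<void>;

// Stand-in for the Redis stream: an in-memory list of pending block heights.
class InMemoryStream {
  constructor(private readonly heights: number[]) {}

  // Return up to `count` of the oldest pending heights without removing them.
  peek(count: number): number[] {
    return this.heights.slice(0, count);
  }

  // Remove a height once its fetch has been placed on the queue.
  delete(height: number): void {
    const index = this.heights.indexOf(height);
    if (index !== -1) this.heights.splice(index, 1);
  }

  get length(): number {
    return this.heights.length;
  }
}

async function processWithPrefetch(
  stream: InMemoryStream,
  fetchBlock: FetchBlock,
  run: IndexerFunction,
  queueSize: number, // the "X" in the workflow; an assumed tuning parameter
): Promise<void> {
  // Promises are queued in block-height order, so awaiting them from the
  // front preserves ordering even if later fetches finish first.
  const queue: Array<Promise<StreamerMessage>> = [];

  while (true) {
    // Top up the queue so that several fetches are in flight at once.
    for (const height of stream.peek(queueSize - queue.length)) {
      queue.push(fetchBlock(height)); // starts the fetch; does not await it
      stream.delete(height); // the height is now owned by the queue
    }

    const next = queue.shift();
    if (next === undefined) break; // stream drained and queue empty

    // Await only the oldest fetch, then invoke the function with its data.
    const message = await next;
    await run(message);
  }
}
```

The sketch omits error handling: a rejected fetch would need to be retried or have its block height put back on the stream, since the height has already been deleted by the time the promise is awaited.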

Tasks

Migrate Shared S3 Code to New Class
Add console statements to break down execution time
Tune queue size parameter
Get accurate estimates of overhead latency
Create metrics on overhead, database, code execution, and so on
Reduce latency as much as possible, targeting 100+ blocks per second (BPS)
Update BPS to also include skipped blocks

darunrs mentioned this issue Sep 29, 2023

Optimize Runner Streamer Message Acquisition #204

Closed

darunrs self-assigned this Sep 29, 2023

darunrs changed the title ~~Store Historical Streamer Message in Redis~~ Cache Historical Streamer Message in Redis Oct 2, 2023

darunrs changed the title ~~Cache Historical Streamer Message in Redis~~ Pre-Fetch Historical Streamer Messages Oct 4, 2023

darunrs linked a pull request Oct 5, 2023 that will close this issue

feat: Pre-Fetch Streamer Messages #269

Merged

darunrs mentioned this issue Oct 5, 2023

feat: Pre-Fetch Streamer Messages #269

Merged

darunrs changed the title ~~Pre-Fetch Historical Streamer Messages~~ Pre-Fetch Streamer Messages Nov 2, 2023

darunrs closed this as completed in #269 Nov 8, 2023
