
feat: Pre-Fetch Streamer Messages #269

Merged
merged 24 commits into main from cacheHistorical
Nov 8, 2023

Conversation

@darunrs (Collaborator) commented Oct 5, 2023

Historical streamer messages are not fetched in coordinator prior to the IndexerRunner call. As a result, we cannot apply the latency-saving benefit of having coordinator cache the streamer message for use by runner. Instead, we want to pre-fetch from S3 so that runner invocations don't have to wait for the streamer message to be retrieved from S3.

In addition, real-time messages can temporarily back up, preventing the cached message from being used. So we also want to pre-fetch any messages that aren't found in the cache.

The new workflow runs two loops in each worker thread: a producer and a consumer. The producer pushes a promise that fetches the block (either from cache or S3) onto an array. The consumer removes the first element from the array, processes it, and deletes the streamer message upon success. While one block is being processed, the other blocks are being fetched, which keeps wait time minimal. The producer loop attempts to keep the array as close to full as possible.
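The producer/consumer workflow described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual runner implementation; `fetchStreamerMessage`, `processStream`, and `PREFETCH_QUEUE_SIZE` are hypothetical stand-ins for the cache/S3 lookup and the worker loop.

```typescript
// Sketch: a producer fills a bounded queue with in-flight fetch promises
// while a consumer drains it, so fetching overlaps with processing.

type StreamerMessage = { blockHeight: number };

const PREFETCH_QUEUE_SIZE = 10; // hypothetical bound on in-flight fetches

async function fetchStreamerMessage (blockHeight: number): Promise<StreamerMessage> {
  // Stand-in: real code would check the cache first, then fall back to S3.
  return { blockHeight };
}

async function processStream (
  blockHeights: number[],
  handle: (msg: StreamerMessage) => Promise<void>
): Promise<void> {
  const queue: Array<Promise<StreamerMessage>> = [];
  let next = 0;

  // Producer: keep the queue as close to full as possible.
  const producer = (async () => {
    while (next < blockHeights.length) {
      if (queue.length < PREFETCH_QUEUE_SIZE) {
        // Start the fetch immediately; the promise settles in the background.
        queue.push(fetchStreamerMessage(blockHeights[next++]));
      } else {
        await new Promise((resolve) => setTimeout(resolve, 10)); // queue full, yield
      }
    }
  })();

  // Consumer: take the oldest in-flight fetch and process it.
  const consumer = (async () => {
    while (next < blockHeights.length || queue.length > 0) {
      const pending = queue.shift();
      if (pending === undefined) {
        await new Promise((resolve) => setTimeout(resolve, 10)); // nothing prefetched yet
        continue;
      }
      const message = await pending; // usually already resolved by fetch overlap
      await handle(message); // real flow deletes the streamer message on success
    }
  })();

  await Promise.all([producer, consumer]);
}
```

Because the queue holds promises rather than resolved messages, fetches for later blocks proceed while the consumer awaits the current one.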

@darunrs darunrs linked an issue Oct 5, 2023 that may be closed by this pull request
@darunrs darunrs changed the base branch from main to cacheStreamer October 5, 2023 03:48
@darunrs (Collaborator, Author) commented Oct 5, 2023

#264

Base automatically changed from cacheStreamer to main October 5, 2023 23:50
@darunrs darunrs force-pushed the cacheHistorical branch 3 times, most recently from f71a38d to 4e10f80, November 2, 2023 01:12
@darunrs darunrs marked this pull request as ready for review November 2, 2023 02:18
@darunrs darunrs requested a review from a team as a code owner November 2, 2023 02:18
@darunrs darunrs changed the title Pre-Fetch Historical Streamer Messages feat: Pre-Fetch Historical Streamer Messages Nov 2, 2023
@darunrs darunrs changed the title feat: Pre-Fetch Historical Streamer Messages feat: Pre-Fetch Streamer Messages Nov 2, 2023
@morgsmccauley (Collaborator) left a comment:

Really great start on this - left a couple comments :)

Review comments (since resolved) on runner/src/lake-client/lake-client.ts, runner/src/metrics.ts, and runner/src/stream-handler/worker.ts.
@darunrs (Collaborator, Author) commented Nov 6, 2023

I'm still experimenting with the blocking values. In the meantime, I released the changes I've made so far. Let me know what you think!

@darunrs (Collaborator, Author) commented Nov 6, 2023

I ran roughly 5,000 messages of the sweat_blockheight stream before and after the blocking commit, and I can visually see an improvement. Left is with the commit and right is without it. Block wait time is lower and less spread out, although it was already low enough to not have much impact. The spread of execution duration and overhead latency is tighter too, and the BPS curve is more linear with a higher peak. In any case, the commit definitely helped.

[Screenshot: metrics comparison with the blocking commit (left) and without it (right)]

Review comments (since resolved) on runner/src/lake-client/lake-client.ts, runner/src/redis-client/redis-client.ts, and runner/src/stream-handler/worker.ts.
@darunrs (Collaborator, Author) commented Nov 8, 2023

I did some more experimentation and found that blocking does in fact block both loops, not just the producer loop. As a result, once no stream messages are present, execution of the functions becomes slow, as the blocking xread call blocks the whole thread for the blocking duration. Removing it restored the speed improvements.
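The fix described here amounts to polling the stream without a blocking read, so the consumer loop is never stalled by the producer. A rough illustration of the non-blocking poll pattern, using an in-memory array as a stand-in for the Redis stream (all names are hypothetical, and a real xread call with a BLOCK option is what this avoids):

```typescript
// Non-blocking poll: return immediately when the stream is empty
// instead of blocking the worker thread, so function execution can
// continue while the producer waits for new messages.

const stream: string[] = []; // stand-in for a Redis stream

function readNonBlocking (count: number): string[] {
  // Analogous to XREAD COUNT n with no BLOCK: returns at once, possibly empty.
  return stream.splice(0, count);
}

async function producerPoll (
  onMessage: (msg: string) => void,
  stop: () => boolean
): Promise<void> {
  while (!stop()) {
    const messages = readNonBlocking(1);
    if (messages.length === 0) {
      // Nothing to read: yield briefly rather than blocking the thread.
      await new Promise((resolve) => setTimeout(resolve, 5));
      continue;
    }
    messages.forEach(onMessage);
  }
}
```

The short sleep cedes the event loop to the consumer, trading a few milliseconds of polling latency for the thread never being held hostage by an empty stream.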

In addition, I ran a test against sweat_blockheight where I appended the block height of each stream message to one file, and then appended the block height to a second file after runFunction was called. Diffing the two files showed no differences, which was enough to verify that every message in the stream is seen and processed exactly once.

@morgsmccauley (Collaborator) left a comment:

LGTM - great work :)

@morgsmccauley (Collaborator) commented:

Just need to fix failing tests

@darunrs darunrs merged commit 262b183 into main Nov 8, 2023
3 checks passed
@darunrs darunrs deleted the cacheHistorical branch November 8, 2023 22:23
morgsmccauley added a commit that referenced this pull request Nov 9, 2023
morgsmccauley added a commit that referenced this pull request Nov 9, 2023
morgsmccauley added a commit that referenced this pull request Nov 9, 2023
Development

Successfully merging this pull request may close these issues.

Pre-Fetch Streamer Messages
2 participants