feat: Expose health info from Block Stream/Executor RPC #889

morgsmccauley · 2024-07-17T09:33:25Z

This PR exposes a health field on the Block Stream/Executor info, which can be accessed via RPC. The intent is for Coordinator to monitor this field and act accordingly. I'm raising this work on its own first so that the overall change does not get too large.

Essentially, health contains only a single enum describing the "state" of the process, but this can be expanded over time as needed.

morgsmccauley · 2024-07-17T09:37:16Z

block-streamer/src/block_stream.rs

+#[derive(Clone)]
+pub struct BlockStreamHealth {
+    pub processing_state: ProcessingState,
+    pub last_updated: SystemTime,


Unlike Runner, which pushes the information up, Block Streamer polls it. For this reason I've added a timestamp, so that we can determine whether the data is stale or not. Stale data will probably result in a restart.
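A minimal sketch of how a consumer could use the timestamp to detect stale health data. The `is_stale` helper, its `max_age` threshold, and the trimmed-down `ProcessingState` variants are assumptions for illustration and are not part of this PR; only the `BlockStreamHealth` fields come from the diff above.

```rust
use std::time::{Duration, SystemTime};

/// Trimmed-down stand-in for the enum in this PR; the real one may have other variants.
#[derive(Clone, Debug, PartialEq)]
pub enum ProcessingState {
    Running,
    Waiting,
    Stalled,
}

/// Same two fields as the struct in the diff above.
#[derive(Clone)]
pub struct BlockStreamHealth {
    pub processing_state: ProcessingState,
    pub last_updated: SystemTime,
}

/// Hypothetical helper: returns true when the snapshot is older than `max_age`.
/// `now` is passed in explicitly so the check is easy to test.
pub fn is_stale(health: &BlockStreamHealth, now: SystemTime, max_age: Duration) -> bool {
    match now.duration_since(health.last_updated) {
        Ok(age) => age > max_age,
        // `last_updated` is later than `now` (e.g. clock skew): treat the data as fresh.
        Err(_) => false,
    }
}
```

A caller would typically pass `SystemTime::now()` and a threshold somewhat larger than the polling interval, so that a single slow poll does not trigger a restart.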

morgsmccauley · 2024-07-17T19:13:57Z

runner/src/stream-handler/worker.ts

@@ -162,7 +162,7 @@ async function blockQueueConsumer (workerContext: WorkerContext): Promise<void>
        });

        const postRunSpan = tracer.startSpan('Delete redis message and shift queue', {}, context.active());
-        parentPort?.postMessage({ type: WorkerMessageType.STATUS, data: { status: IndexerStatus.RUNNING } });
+        parentPort?.postMessage({ type: WorkerMessageType.EXECUTION_STATE, data: { state: ExecutionState.RUNNING } });


IndexerStatus is outward facing, and we need a bit more granularity for internal use, so I opted to create a separate type.

darunrs

Looks fine to me! I'm definitely still confused about the states, though, as they look and feel very distinct from any action items. A state like INDEXER_BACKFILL or LAKE_BACKFILL makes more direct sense in terms of what the block stream is actually doing, for example. But I imagine you're keeping the states vague on purpose to avoid having to manage them more frequently? From a clarity perspective, I'd prefer more detailed state information, provided there's distinct Coordinator behavior for each state.

darunrs · 2024-07-17T20:11:21Z

block-streamer/proto/block_streamer.proto

+    uint64 updated_at_timestamp_secs = 2;
+}
+
+enum ProcessingState {


It seems you just want to represent any possible states of the service from the get go? I imagine the scenarios for UNSPECIFIED and WAITING are vague right now. Specifically for WAITING, I'm confused about what it refers to. Does Runner back pressure count as WAITING? If it does, then the state could swing between WAITING and RUNNING repeatedly. Otherwise, I'm not sure. I'm also trying to think about what actions Coordinator would take on receiving each of these states.

> It seems you just want to represent any possible states of the service from the get go?

Yes, essentially. While we only really need Stalled, it was pretty trivial to add these other states.

And yes, Waiting was created specifically for the back pressure case. It doesn't really serve any purpose right now. But it could be beneficial to expose these states as metrics from Coordinator so we can view what state each indexer is in 🤔
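As a rough illustration of the metrics idea, the sketch below tallies how many indexers are in each state. The `count_by_state` helper is hypothetical, and the enum only lists the variants named in this thread; how Coordinator would actually export such counts is not covered by this PR.

```rust
use std::collections::BTreeMap;

/// Variants named in this thread; the real enum may contain others.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum ProcessingState {
    Unspecified,
    Waiting,
    Running,
    Stalled,
}

/// Hypothetical helper: counts how many entries are in each state.
/// States that never occur are simply absent from the returned map.
pub fn count_by_state(states: &[ProcessingState]) -> BTreeMap<ProcessingState, usize> {
    let mut counts = BTreeMap::new();
    for state in states {
        *counts.entry(*state).or_insert(0) += 1;
    }
    counts
}
```

Each entry in the resulting map could then be reported as one gauge per state, which would also make the Waiting/Running swing under back pressure visible over time.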

darunrs · 2024-07-17T20:12:30Z

block-streamer/src/block_stream.rs

+/// Represents the processing state of a block stream
+#[derive(Clone)]
+pub enum ProcessingState {
+    /// Block Stream is not currently active but can be started. Either has not been started or was


A block stream which wasn't started wouldn't even return a state, right? It wouldn't be present in the list of block streams.

Yes, and no. The RPC logic is to only add "started" BlockStreams. But it's still possible to create a BlockStream and not start it, and it would make sense to default to Running

morgsmccauley · 2024-07-17T21:01:47Z

> I'd say from a clarity perspective I'd prefer more detailed state information provided there's distinct Coordinator behavior for them.

At this stage we don't have a requirement for these more granular states. The goal here is to remove the need for manually restarting the Block Streamer/Runner services. To achieve that, we only need to know if the process has stalled.

We could perhaps expand this to have granular states, and then restart if the bitmap backfill finished abnormally, but I wouldn't say that's required in the short term.
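To make the "only Stalled matters" point concrete, here is a minimal sketch of the decision a monitoring loop could make. The `Action` type and `decide` function are hypothetical names for illustration; the actual Coordinator logic is not part of this PR.

```rust
/// Variants named in this thread; the real enum may contain others.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ProcessingState {
    Unspecified,
    Waiting,
    Running,
    Stalled,
}

/// Hypothetical outcome of a single health check.
#[derive(Debug, PartialEq)]
pub enum Action {
    Nothing,
    Restart,
}

/// Only a stalled process triggers a restart; every other state is left alone.
pub fn decide(state: ProcessingState) -> Action {
    match state {
        ProcessingState::Stalled => Action::Restart,
        ProcessingState::Unspecified
        | ProcessingState::Waiting
        | ProcessingState::Running => Action::Nothing,
    }
}
```

Listing the non-restart variants explicitly, rather than using a wildcard arm, means the compiler would flag this match if more granular states were added later.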

morgsmccauley requested a review from a team as a code owner July 17, 2024 09:33

morgsmccauley linked an issue Jul 17, 2024 that may be closed by this pull request

Restart unhealthy block streams and executors #875

Closed

darunrs approved these changes Jul 17, 2024

Base automatically changed from refactor/dedicated-control-loops to main July 18, 2024 01:56

morgsmccauley force-pushed the main branch from 61036b2 to f2cdc78 Compare July 18, 2024 02:19

morgsmccauley added 12 commits July 18, 2024 15:03

feat: Use less confusing name for start_block_stream span

ffc7ff2

feat: Monitor and expose block stream processing state

e28db69

feat: Expose block stream health via rpc

ea86d61

fix: Handle streams larger than max size

1ec7bd3

feat: Expose timestamp to ensure health is not stale

5c58292

fix: Cancel monitoring task on block stream stop

abdd075

refactor: Rename ProcessingState variants

89d3d34

chore: Add doc comments to block streamer proto

fd4f554

feat: Expose health information from executor rpc

167a7c5

fix: executor rpc examples

11459b0

refactor: Paused -> Waiting

7c33263

feat: Update runner client proto

249acc2

morgsmccauley force-pushed the feat/health-rpc branch from 606bc64 to 249acc2 Compare July 18, 2024 03:03

morgsmccauley merged commit 29bde3c into main Jul 18, 2024
7 checks passed

morgsmccauley deleted the feat/health-rpc branch July 18, 2024 03:11

This was referenced Jul 18, 2024

Prod Release 19/07/24 #897

Merged

Prod Release 19/07/24 - 2 #899

Merged
