-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Expose health info from Block Stream/Executor RPC #889
Conversation
#[derive(Clone)] | ||
pub struct BlockStreamHealth { | ||
pub processing_state: ProcessingState, | ||
pub last_updated: SystemTime, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike Runner which pushes the information up, Block Streamer polls it. For this reason I've added a timestamp, so that we can determine whether the data is stale or not. Stale will probably result in restart.
@@ -162,7 +162,7 @@ async function blockQueueConsumer (workerContext: WorkerContext): Promise<void> | |||
}); | |||
|
|||
const postRunSpan = tracer.startSpan('Delete redis message and shift queue', {}, context.active()); | |||
parentPort?.postMessage({ type: WorkerMessageType.STATUS, data: { status: IndexerStatus.RUNNING } }); | |||
parentPort?.postMessage({ type: WorkerMessageType.EXECUTION_STATE, data: { state: ExecutionState.RUNNING } }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IndexerStatus
is outward facing, and we need a bit more granularity for internal use, so opted to create a separate type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine to me! I'm definitely still confused about the states as they look and feel very distinct from any action items though. A state like INDEXER_BACKFILL
or LAKE_BACKFILL
make more direct sense to what the block stream is actually doing for example. But I imagine you're keeping the states vague on purpose to avoid having to manage them more frequently? I'd say from a clarity perspective I'd prefer more detailed state information provided there's distinct Coordinator behavior for them.
uint64 updated_at_timestamp_secs = 2; | ||
} | ||
|
||
enum ProcessingState { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems you just want to represent any possible states of the service from the get go? I imagine scenarios for UNSPECIFIED and WAITING are vague right now. Specifically for WAITING, I'm confused what that refers to. Does Runner back pressure count as WAITING? If it does, then it could swing between WAITING and RUNNING repeatedly. Otherwise, I'm not sure. I'm also trying to think what action items Coordinator would intend to have when receiving each of these states.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems you just want to represent any possible states of the service from the get go?
Yes essentially. While we only really need Stalled
, it was pretty trivial to add these other states.
And yes, Waiting
was created specifically for the back pressure case. Doesn't really serve any purpose now. But it could be beneficial to expose these as metrics from Coordinator so we can view what state each indexer is in 🤔
/// Represents the processing state of a block stream | ||
#[derive(Clone)] | ||
pub enum ProcessingState { | ||
/// Block Stream is not currently active but can be started. Either has not been started or was |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A block stream which wasn't started wouldn't even return a state right? It wouldn't be present in the list of block streams.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and no. The RPC logic is to only add "started" BlockStream
s. But it's still possible to create a BlockStream
and not start it, and it would make sense to default to Running
At this stage we don't have requirement for these more granular states. The goal here is to remove the need for manually restarting the Block Streamer/Runner services. To achieve that, we only need to know if the process has stalled. We could perhaps expand this to have granular states, and then restart if the bitmap backfill abnormally finished, but I wouldn't say that's required in the short term. |
606bc64
to
249acc2
Compare
This PR exposes a
health
field on the Block Stream/Executor info, which can be accessed via RPC. The intent of this field is for Coordinator to monitor it, and then act accordingly. I wanted to raise this work first, so that the overall result is not too large.Essentially,
health
contains only a singleenum
describing the "state" of the process, but this can be expanded over time as needed.