Prod Release 03/07/24 #851

Merged
merged 10 commits into from
Jul 3, 2024

Conversation

morgsmccauley
Collaborator

@morgsmccauley morgsmccauley commented Jul 3, 2024

morgsmccauley and others added 10 commits June 25, 2024 17:26
Also runs `cargo format` (1st commit)

closes: #822
This PR introduces back pressure to the Redis Stream in Block Streamer,
ensuring that the stream does not exceed a specified maximum length.
This is achieved by blocking the `redis.publish_block()` call,
intermittently polling the Stream length, and publishing once it falls
below the configured limit.
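
For illustration, a minimal sketch of that back-pressure loop is shown below. The Block Streamer itself is written in Rust; this TypeScript version, using `ioredis` with an assumed `MAX_STREAM_LENGTH` and poll interval, only demonstrates the block-poll-publish pattern described above.

```ts
import Redis from 'ioredis';

const MAX_STREAM_LENGTH = 100; // assumed limit, not the production value
const POLL_INTERVAL_MS = 500;  // assumed polling interval

const redis = new Redis();

// Block until the stream has capacity, then publish the block.
async function publishBlock(streamKey: string, blockHeight: number, payload: string): Promise<void> {
  // Intermittently poll the stream length until it drops below the configured limit.
  while (await redis.xlen(streamKey) >= MAX_STREAM_LENGTH) {
    await new Promise((resolve) => setTimeout(resolve, POLL_INTERVAL_MS));
  }

  // Capacity is available: append the block to the stream.
  await redis.xadd(streamKey, '*', 'block_height', blockHeight.toString(), 'payload', payload);
}
```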

To aid testing, the current `RedisClient` struct has been split into two:
- `RedisCommands` - thin wrapper around redis commands to make mocking
possible.
- `RedisClient` - provides higher-level redis functionality, e.g.
"publishing blocks", utilising the above.

In most cases, `RedisClient` will be used. The split just allows us to test `RedisClient` itself, by mocking `RedisCommands`.
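
A rough TypeScript analogue of that layering (the real types are Rust structs, and the method names here are assumptions): the thin command wrapper is the piece that gets mocked in tests, while the higher-level client carries the domain logic.

```ts
import Redis from 'ioredis';

// Thin wrapper around raw redis commands -- a small surface that is easy to mock.
class RedisCommands {
  constructor(private readonly redis = new Redis()) {}

  xadd(key: string, field: string, value: string): Promise<string | null> {
    return this.redis.xadd(key, '*', field, value);
  }
}

// Higher-level client expressing domain operations ("publish a block") via RedisCommands.
class RedisClient {
  constructor(private readonly commands: RedisCommands = new RedisCommands()) {}

  publishBlock(streamKey: string, blockHeight: number): Promise<string | null> {
    return this.commands.xadd(streamKey, 'block_height', blockHeight.toString());
  }
}
```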
This PR adds a Node script to Runner to suspend Indexers due to
inactivity. The script will:
1. Call Coordinator to disable the indexer
2. Write to the Indexer's logs table to notify of the suspension

Note that as Coordinator is in a private network, you must tunnel to the machine to expose the gRPC server. This can be achieved by running the following in a separate terminal:
```sh
gcloud compute ssh ubuntu@queryapi-coordinator-mainnet -- -L 9003:0.0.0.0:9003
```

The following environment variables are required:
- `HASURA_ADMIN_SECRET`
- `HASURA_ENDPOINT`
- `PGPORT`
- `PGHOST`

All of which can be found in the Runner compute instance metadata:
```sh
gcloud compute instances describe queryapi-runner-mainnet
```


Usage: `npm run script:suspend-indexer -- <accountId> <functionName>`
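
For reference, a minimal sketch of the two steps the script performs. The `CoordinatorClient` interface, table name, and column names below are hypothetical placeholders, not the actual gRPC API or schema:

```ts
import { Client } from 'pg';

// Hypothetical interface for the Coordinator gRPC client reached through the tunnel above.
interface CoordinatorClient {
  disableIndexer(accountId: string, functionName: string): Promise<void>;
}

async function suspendIndexer(coordinator: CoordinatorClient, accountId: string, functionName: string): Promise<void> {
  // 1. Call Coordinator to disable the indexer.
  await coordinator.disableIndexer(accountId, functionName);

  // 2. Write a suspension notice to the Indexer's logs table
  //    (placeholder table/column names; connection settings come from the env vars listed above).
  const pg = new Client({ host: process.env.PGHOST, port: Number(process.env.PGPORT) });
  await pg.connect();
  try {
    await pg.query(
      'INSERT INTO logs (account_id, function_name, message) VALUES ($1, $2, $3)',
      [accountId, functionName, 'Indexer suspended due to inactivity'],
    );
  } finally {
    await pg.end();
  }
}
```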
… Separate Concerns (#830)

Refactored the Editor component to TypeScript. This involved breaking the Editor file into smaller chunks and separating concerns into distinct components. Also did some minor work converting the validator to TypeScript, as it is a major consumer of the Editor; this sets things up to later iterate on additional tests for the validators.
Promises without rejection handlers, i.e. `.catch` or `try/catch`, will throw "unhandled rejection" errors, which bubble up to the worker thread and cause it to exit. This PR adds handlers to the various `simultaneousPromises` triggered within the Executor to avoid this.
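
A minimal sketch of the pattern, with `runSimultaneously` as a hypothetical stand-in for the Executor's `simultaneousPromises` usage: each concurrently-run promise gets its own rejection handler so that a single failure cannot take down the worker thread.

```ts
// Attach a `.catch` to every concurrently-run promise; without it, a rejection
// surfaces as an "unhandled rejection" and crashes the worker thread.
async function runSimultaneously(tasks: Array<() => Promise<void>>): Promise<void> {
  await Promise.all(
    tasks.map((task) =>
      task().catch((error) => {
        // Log and swallow the error so the remaining tasks keep running.
        console.error('Simultaneous task failed:', error);
      }),
    ),
  );
}
```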
…843)

The current methods for determining both Block Stream and Executor health are flawed. This PR addresses these flaws by adding new, more reliable metrics for use within Grafana.

### Block Streams

A Block Stream is considered healthy if `LAST_PROCESSED_BLOCK` is
continuously incremented, i.e. we are continuously downloading blocks
from S3. This is flawed for the following reasons:
1. When the Redis Stream is full, we halt the Block Stream, preventing it from processing more blocks
2. When a Block Stream is intentionally stopped, we no longer process
blocks

To address these flaws, I've introduced a new dedicated metric:
`BLOCK_STREAM_UP`, which:
- is incremented every time the Block Stream future is polled, i.e. the
task is doing work. A static value means unhealthy.
- is removed when the Block Stream is stopped, so that it doesn't
trigger the false positive described above
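
The metric itself lives in the Rust Block Streamer; below is an illustrative TypeScript sketch of the same idea using `prom-client`, with the label name and the poll/stop hooks as assumptions.

```ts
import { Counter } from 'prom-client';

// Incremented on every poll of the Block Stream task; a static value reads as unhealthy.
const blockStreamUp = new Counter({
  name: 'block_stream_up',
  help: 'Incremented every time the Block Stream task is polled',
  labelNames: ['indexer'],
});

function onBlockStreamPoll(indexer: string): void {
  blockStreamUp.labels(indexer).inc();
}

// When a Block Stream is intentionally stopped, remove its series so the
// now-static value is not mistaken for an unhealthy stream.
function onBlockStreamStop(indexer: string): void {
  blockStreamUp.remove(indexer);
}
```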

### Executors

An Executor is considered unhealthy if it has messages in the Redis Stream but no reported execution durations, the latter only being recorded on success; the inverse of this is used to determine "healthy". This is flawed for the following reasons:
1. We cannot distinguish between a genuinely broken Indexer and one broken due to system failures
2. "Health" is only determined when there are messages in Redis, meaning we catch issues later than we could

To address these I have added the following metrics:
1. `EXECUTOR_UP`, which is incremented on every Executor loop; like above, a static value means unhealthy.
2. `SUCCESSFUL_EXECUTIONS`/`FAILED_EXECUTIONS`, which track successful/failed executions directly, rather than inferring them from durations. This will be useful for tracking the health of specific Indexers, e.g. the `staking` indexer should never have failed executions.
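
An illustrative sketch of how these counters could be wired into the Executor loop with `prom-client`; the metric names mirror those above, while the loop shape and `indexer` label are assumptions.

```ts
import { Counter } from 'prom-client';

// Incremented on every Executor loop iteration; a static value reads as unhealthy.
const executorUp = new Counter({ name: 'executor_up', help: 'Executor loop liveness', labelNames: ['indexer'] });

// Track execution outcomes directly rather than inferring them from recorded durations.
const successfulExecutions = new Counter({ name: 'successful_executions', help: 'Successful executions', labelNames: ['indexer'] });
const failedExecutions = new Counter({ name: 'failed_executions', help: 'Failed executions', labelNames: ['indexer'] });

async function executorLoop(indexer: string, runOnce: () => Promise<void>): Promise<void> {
  while (true) {
    executorUp.labels(indexer).inc();
    try {
      await runOnce();
      successfulExecutions.labels(indexer).inc();
    } catch {
      failedExecutions.labels(indexer).inc();
    }
  }
}
```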
We skip reporting metrics if there are no messages in the pre-fetch
queue/Redis Stream. This is especially problematic for `EXECUTOR_UP`, as
we won't increment the metric even though we are processing.

This PR moves the metrics logic so that metrics are always reported, even when there are no messages in the stream.
)

Added logic to the Monaco and React lifecycle methods to persist the line/cursor position between swapped files. Also set the Monaco "scroll past last line" flag to true.
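
A minimal sketch of the view-state persistence using Monaco's `saveViewState`/`restoreViewState`; the per-file map and function names are assumptions, not the actual component code.

```ts
import * as monaco from 'monaco-editor';

// Remember each file's view state (cursor position, scroll offsets) so that
// swapping files restores where the user left off.
const viewStates = new Map<string, monaco.editor.ICodeEditorViewState | null>();

function swapFile(
  editor: monaco.editor.IStandaloneCodeEditor,
  currentFile: string,
  nextFile: string,
  nextModel: monaco.editor.ITextModel,
): void {
  // Save the outgoing file's view state before replacing the model.
  viewStates.set(currentFile, editor.saveViewState());

  editor.setModel(nextModel);

  // Restore the incoming file's view state, if we have one.
  const saved = viewStates.get(nextFile);
  if (saved) {
    editor.restoreViewState(saved);
    editor.focus();
  }

  // Allow scrolling past the last line, as enabled in this PR.
  editor.updateOptions({ scrollBeyondLastLine: true });
}
```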
@morgsmccauley morgsmccauley requested a review from a team as a code owner July 3, 2024 01:22
@morgsmccauley morgsmccauley reopened this Jul 3, 2024
@morgsmccauley morgsmccauley merged commit 932c277 into stable Jul 3, 2024
36 checks passed