Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sequencer-relayer): provide a shutdown controller #889

Merged

Conversation

Fraser999
Copy link
Contributor

@Fraser999 Fraser999 commented Apr 1, 2024

Summary

Adds a shutdown handle for the sequencer-relayer.

Background

We want the ability to invoke the shutdown sequence from main and from tests.

Changes

A new RAII object ShutdownHandle has been added. This cancels the wrapped CancellationToken when the ShutdownHandle is dropped or when its shutdown method is called.

SIGTERM handling has been moved to main.rs; on receiving a SIGTERM, shutdown is called on the shutdown handle associated with the relayer.

The relayer's main select! loop was simplified to only contain the two tasks.

Testing

The TestSequencerRelayer was updated to make use of the new shutdown controller. Also manually tested by sending a SIGTERM to a running sequencer-relayer.

Related Issues

Closes #882

@github-actions github-actions bot added the sequencer-relayer pertaining to the astria-sequencer-relayer crate label Apr 1, 2024
timeout(Duration::from_secs(30), sequencer_relayer)
.await
.unwrap_or_else(|_| {
panic!("timed out waiting for sequencer relayer to shut down");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally avoid putting panicking code in drop impls, but I think it's probably ok here since this is a test-only object, and having a test process abort seems like a better option than just logging here since the log message could easily be missed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this panic here makes sense as that means the shutdown logic failed, which we'd otherwise not catch.

There is also std::thread::panicking to check if the thread is ... panicking. You could issue a debug! in that case so it doesn't re-panick.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's what I was meaning in my comment. I think it's better to not check for std::thread::panicking here since all we'd win is avoiding the test process aborting. But if the thread is already panicking and all we do here is log an error, then I think it'll be easy to miss that error.

@@ -305,6 +326,12 @@ pub struct TestSequencerRelayerConfig {

impl TestSequencerRelayerConfig {
pub async fn spawn_relayer(self) -> TestSequencerRelayer {
assert_ne!(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the tokio runtime to not be of the "current thread" flavour so as to allow the TestSequencerRelayer::drop to not hang at the futures::executor::block_on call.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting. The wiremock mock servers are fine running on current_thread. Do you know where this might be deadlocking? I wonder if we should try and address this problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's not a problem with code currently in the main branch. It's the addition in this PR of the call to futures::executor::block_on in the TestSequencerRelayer::drop.

The async function passed to block_on can't be cancelled or paused, and I believe that's stopping the relayer making progress inside the main select! macro as the single tokio thread is blocked.

Copy link
Member

@SuperFluffy SuperFluffy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like explicitly handling the shutdowns very much so that we can test this on every test.

I think this is almost ready to merge, but I would like to explore whether we can both simplify the ShutdownHandle (by moving the signal trap into main.rs and relying on a oneshot channel or a CancellationToken/DropGuard) and making it more explicit (by providing an ShutdownHandle::shutdown method).

crates/astria-sequencer-relayer/Cargo.toml Show resolved Hide resolved
crates/astria-sequencer-relayer/src/sequencer_relayer.rs Outdated Show resolved Hide resolved
crates/astria-sequencer-relayer/src/sequencer_relayer.rs Outdated Show resolved Hide resolved
timeout(Duration::from_secs(30), sequencer_relayer)
.await
.unwrap_or_else(|_| {
panic!("timed out waiting for sequencer relayer to shut down");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this panic here makes sense as that means the shutdown logic failed, which we'd otherwise not catch.

There is also std::thread::panicking to check if the thread is ... panicking. You could issue a debug! in that case so it doesn't re-panick.

@@ -305,6 +326,12 @@ pub struct TestSequencerRelayerConfig {

impl TestSequencerRelayerConfig {
pub async fn spawn_relayer(self) -> TestSequencerRelayer {
assert_ne!(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting. The wiremock mock servers are fine running on current_thread. Do you know where this might be deadlocking? I wonder if we should try and address this problem.

crates/astria-sequencer-relayer/src/sequencer_relayer.rs Outdated Show resolved Hide resolved
crates/astria-sequencer-relayer/src/sequencer_relayer.rs Outdated Show resolved Hide resolved
Copy link
Member

@SuperFluffy SuperFluffy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neat, thank you!

crates/astria-sequencer-relayer/src/main.rs Outdated Show resolved Hide resolved
crates/astria-sequencer-relayer/src/main.rs Outdated Show resolved Hide resolved
@Fraser999 Fraser999 added this pull request to the merge queue Apr 3, 2024
Merged via the queue into astriaorg:main with commit 0877d69 Apr 3, 2024
36 checks passed
@Fraser999 Fraser999 deleted the 882-improve-relayer-shutdown-control branch April 3, 2024 21:37
github-merge-queue bot pushed a commit that referenced this pull request Apr 5, 2024
…#919)

## Summary
Added the `--all-targets` flag to `cargo hack check` to ensure tests can
also be built outside the workspace.

## Background
While implementing #889 it was
found that sequencer-relayer tests can only be run from the repository
root but not from the crate directory itself. The CI job that is
supposed to ensure this failed because it was not running with the right
flags.

## Changes
- Added `--all-targets` flag to `cargo hack check` in the rust test
github workflow.
- 
## Related Issues
Closes #893
github-merge-queue bot pushed a commit that referenced this pull request May 21, 2024
## Summary
Conductor now respects shutdown signals it receives during init.

## Background
Conductor's task ignored shutdowns while still initializing. This meant
that Conductor would hang for up to 30 seconds.

## Changes
- refactor conductor's constituent long-running tasks to separate
initialization and running
- listen for the shutdown signal in all of conductor's tasks

## Testing
Run conductor with endpoints that hang indefinitely and sending it
SIGTERM. Observe that conductor shuts down quickly.

The main operation of conductor is unaffected on the happy path: all
blackbox tests run to completion.

A proper test for the shutdown logic will be implemented in a follow-up
refactor similar to #889
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sequencer-relayer pertaining to the astria-sequencer-relayer crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sequencer-relayer: improve shutdown control
2 participants