Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: don't crash relayer due to flaky RPCs #5115

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

kamiyaa
Copy link
Collaborator

@kamiyaa kamiyaa commented Jan 7, 2025

Description

  • instead of crashing relayer if any RPC calls fail during build_mailboxes and build_validator_announces, we just log the error and metric
  • relayer will continue running without those chains

Drive-by changes

Related issues

Fixes #4887

Backward compatibility

Are these changes backward compatible? Are there any infrastructure implications, e.g. changes that would prohibit deploying older commits using this infra tooling?

Yes

Testing

Added unittests to test ensure functions run correctly and log errors in metrics

Copy link

changeset-bot bot commented Jan 7, 2025

⚠️ No Changeset found

Latest commit: a90a9bb

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@kamiyaa kamiyaa force-pushed the jeff/bad-rpc branch 3 times, most recently from ca808d3 to 1c9af66 Compare January 7, 2025 14:04
let db = dbs.get(origin).unwrap().clone();
let metadata_builder = BaseMetadataBuilder::new(
origin.clone(),
destination_chain_setup.clone(),
prover_syncs[origin].clone(),
validator_announces[origin].clone(),
validator_annonce.clone(),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we can no longer assume all mailboxes / validator_announce were successfully instantiated, we should loop through the successfully instantiated ones instead of blindly going through all origin_chains / destination_chains

/// Any chains that fail to build mailbox will not be included
/// in the hashmap. Errors will be logged and chain metrics
/// will be updated for chains that fail to build mailbox.
pub async fn build_mailboxes(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move logic into functions so we can test them in isolation

 - instead of crashing relayer if any RPC calls fail
 we just log the error and metric

#[tokio::test]
#[tracing_test::traced_test]
async fn test_failed_build_mailboxes() {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out we don't need to do too many fancy things to cause an error for build_mailboxes or build_validator_announces.
All we need to do is just not have a ChainConf associated with said chain and it will cause an error in build_mailbox / build_validator_announce

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Relayer can still crash upon startup due to bad RPCs
1 participant