
Non persistent TSS signer and x25519 keypair #1216

Open

wants to merge 40 commits into master
Conversation


@ameba23 ameba23 commented Dec 13, 2024

entropy-tss has two keypairs which define its identity on the network: the TSS account used to sign extrinsics, and the x25519 encryption keypair. Since both are generated in a confidential virtual machine and their public keys included in the attestation, authenticating with these keypairs proves that the attestation still holds.

However, because they are currently stored on disk, it is possible that the validator operator restarts the virtual machine, makes some modifications, and continues to use these keypairs without needing to make another attestation.

This PR makes the signing / x25519 keypairs no longer persistent; instead they are generated anew every time entropy-tss launches (except in development / testing, where test mnemonics are used). If the entropy-tss process is killed for whatever reason, it will not be able to continue to participate in the protocols until the validator operator updates the TSS details using the change_threshold_accounts extrinsic. I am hoping to find a way to automate this - see #1214

This PR has implications for our devops flow - the --setup-only command line option is no longer available. entropy-tss should be run only once, and the public keys retrieved using the /info http route.

Closes #1203 by adding a boolean ready flag to the application state, which is set to true once the 'prerequisite checks' are complete. These checks now also include a check that the TSS account ID has been registered with the staking pallet, and the balance check is now mandatory (previously, only a warning was logged if the TSS account had no funds). In a non-ready state, all the HTTP routes relating to the protocols will return an error.

This also has implications for slashing, as a TSS node should not be in a non-ready state for too long. How long is acceptable depends a bit on whether we are able to automate the process of the node getting funded and calling change_threshold_accounts.

* master:
  Add TDX test network chainspec (#1204)
  Test CLI command to retrieve quote and change endpoint / TSS account in one command (#1198)
  Bump the patch-dependencies group with 2 updates (#1212)
  Bump thiserror from 2.0.4 to 2.0.6 in the patch-dependencies group (#1206)
  Downgrade parity-scale-codec as version we currently use has been yanked (#1205)
  Bump clap from 4.5.22 to 4.5.23 in the patch-dependencies group (#1202)
@ameba23 ameba23 marked this pull request as draft December 13, 2024 10:58
@ameba23 ameba23 changed the title Non persistant TSS signer and x25519 keypair Non persistent TSS signer and x25519 keypair Dec 16, 2024
@@ -101,26 +93,6 @@ use crate::{
validation::EncryptedSignedMessage,
};

#[tokio::test]
#[serial]
async fn test_get_signer_does_not_throw_err() {
@ameba23 (Contributor Author) commented Dec 17, 2024:
I think this test is no longer needed as the process of getting a signer from app state is now infallible

}

/// Convenience function to get chain api and rpc
pub async fn get_api_rpc(
ameba23 (Contributor Author):

Not related to this PR, but since I was here anyway I thought why not.

Collaborator:

Can you open a Good First Issue to go through the codebase and use this?

Base automatically changed from peg/generate-mnemonic to master December 18, 2024 07:20
Ok(())
};

if let Err(error) = backoff::future::retry(backoff.clone(), balance_query).await {
ameba23 (Contributor Author):

This balance check and the one below could maybe be combined into a state machine which makes both checks, but I think for now it's ok to make one after the other.

Collaborator:

Doesn't look like multiple balance queries like this are handled natively by subxt

ameba23 (Contributor Author):

Doesn't look like multiple balance queries like this are handled natively by subxt

Can you explain what you mean a bit more? I don't get it. It looks like the person who made that issue wants to check balances for a bunch of different accounts, but here we are making a balance query for one account several times, one after the other.

let backoff = backoff::ExponentialBackoff::default();
match backoff::future::retry(backoff, connect_to_substrate_node).await {
// Never give up trying to connect
let backoff = backoff::ExponentialBackoff { max_elapsed_time: None, ..Default::default() };
@ameba23 (Contributor Author) commented Dec 18, 2024:

I've not looked at the backoff crate in too much detail, but I think adding max_elapsed_time: None means it will keep checking indefinitely.

Collaborator:

Yeah, I also think it means it won't stop based off time (but may stop under other conditions).

Why do you want to do it this way as opposed to the 15 min limit we had before?

@ameba23 (Contributor Author) commented Jan 7, 2025:

I don't see what there is to gain by limiting it. It will not be immediately obvious to the operator that the entropy-tss process has stopped if the VM is still up and running, so I am imagining entropy-tss is going to be automatically restarted if it bails, in which case it will continue to attempt to connect anyway.

Collaborator:

By having it spin down after 15 minutes (or some other amount) it becomes a very clear error state for any operators, assuming they have some way to get notified that the process is down.

While it may "just get restarted" by the operator it will also signal to them that some manual action (in this case funding the accounts) needs to be taken.

ameba23 (Contributor Author):

Ok, I will add a limit, but I'll open an issue regarding the devops-related workflow for how the host operator can be notified that the process has stopped.

kv: &KvManager,
) -> Result<(PairSigner<EntropyConfig, sr25519::Pair>, StaticSecret), UserErr> {
let hkdf = get_hkdf(kv).await?;
pub fn get_signer_and_x25519_secret(
ameba23 (Contributor Author):

This is now only used when a ValidatorName (eg: --alice) is given, or in tests.

@@ -179,6 +179,8 @@ pub enum UserErr {
TooFewSigners,
#[error("Non signer sent from relayer")]
IncorrectSigner,
#[error("Node has started fresh and not yet successfully set up")]
NotReady,
ameba23 (Contributor Author):

If we want to react programmatically to this error (e.g. for slashing) we could change the impl IntoResponse below to give some special status code if this variant is present.

@ameba23 ameba23 marked this pull request as ready for review December 18, 2024 09:55
/// - Communication has been established with the chain node
/// - The TSS account is funded
/// - The TSS account is registered with the staking extension pallet
ready: Arc<RwLock<bool>>,
ameba23 (Contributor Author):

This bool could maybe be replaced with an enum with the various states of readiness to make it easier to determine why the node is not ready: no connection to chain, no funds, or not registered with staking pallet.

Collaborator:

Unless we have a defined state transition flow here I don't think this makes sense. But I could be convinced either way

@JesseAbram (Member) commented:

Are we fully set on not keeping this in the kvdb? I thought we were still on the fence about that. If so, I can push for removing the whole kvdb, as its purpose is now kinda obsolete and everything can be held in state. This probably also needs devops eyes on it, as it will break their flow.

@HCastano (Collaborator) left a comment:

Generally looks good, but I need another review to finish up



}

/// Get a [PairSigner] for submitting extrinsics with subxt
pub fn signer(&self) -> PairSigner<EntropyConfig, sr25519::Pair> {
Collaborator:

Nice, I like all the getters 👍

@ameba23 (Contributor Author) commented Jan 7, 2025:

@JesseAbram

are we fully set on not keeping this in the kvdb, I thought we were still on the fence about that, if so then I can push removing the whole kvdb as its purpose is now kinda obsolete and everything can be held in state,

Personally, I think it would be great to remove the kvdb. I'm still not totally sure what our best option is for keyshare storage - I am looking into using the SGX seal API which Hang from Phala suggested to us. But I would suggest that for now we keep it in memory only.

also probably needs to get devops eyes on this too as it will break their flow

I have a call with @vitropy tomorrow to look at the setup we have on the TDX machine, which hopefully will make clear why these changes are needed.

@HCastano (Collaborator) left a comment:

Nice work with this 💪

cc @entropyxyz/system-reliability-engineers since this will impact your flow



@@ -220,12 +220,14 @@ pub async fn sign_tx(
State(app_state): State<AppState>,
Json(encrypted_msg): Json<EncryptedSignedMessage>,
) -> Result<(StatusCode, Body), UserErr> {
let (signer, x25519_secret) = get_signer_and_x25519_secret(&app_state.kv_store).await?;
if !app_state.is_ready() {
Collaborator:

I do worry a bit that we'll forget this check in some new endpoint and it could cause problems. But maybe it'll be quite obvious when somebody tries to do something when the app isn't ready

ameba23 (Contributor Author):

Agree, this is a problem. The other way to approach this would be to start with a mini axum app which only exposes /version and /info, then when the checks have passed, kill it and start our full axum app. But the disadvantage is that when entropy-tss goes down and restarts, we would get a 404 when hitting e.g. /sign_tx, which is hard to distinguish from some other problem.

@@ -98,8 +92,7 @@ pub async fn create_clients(
values: Vec<Vec<u8>>,
keys: Vec<String>,
validator_name: &Option<ValidatorName>,
) -> (IntoMakeService<Router>, KvManager) {
let listener_state = ListenerState::default();
) -> (IntoMakeService<Router>, KvManager, SubxtAccountId32) {
Collaborator:

Do we need to return the account ID here? Or maybe said another way, is there no way to access the app state in the test context after this setup helper?

ameba23 (Contributor Author):

I hear you, but I am not sure how to access app state after we have called .into_make_service()

@@ -32,6 +32,9 @@ runtime
- In [#1147](https://github.com/entropyxyz/entropy-core/pull/1147) a field is added to the
chainspec: `jump_started_signers` which allows the chain to be started in a pre-jumpstarted state
for testing. If this is not desired it should be set to `None`.
- In [#1216](https://github.com/entropyxyz/entropy-core/pull/1216) the `--setup-only` option for `entropy-tss`
Collaborator:

The second part of this breaking change kinda understates the impact this will have on the operator flow...we can maybe elaborate on this more when cutting the next release

@ameba23 (Contributor Author) commented Jan 8, 2025:

As this is a significant breaking change, and I feel like there is still a bit of uncertainty, I am not going to merge until we have talked about it at the core sync.

Having looked at the option of SGX-sealing the keyshare - I have concerns about how we can use a recovered keyshare if we have a new TSS account ID, since the keyshare's party ID is the TSS account ID, and we need to sign protocol messages with it in order to do anything useful with the keyshare.

But that said, I don't really see another option than this, because we need the keypairs to prove attestation.

@JesseAbram @HCastano

Successfully merging this pull request may close these issues.

entropy-tss /info route to get account ID cannot be used until a connection to chain has been made