
REST API: return participation key from generate endpoint #6158

Draft
wants to merge 5 commits into base: master

Conversation

PhearZero

Summary

  • Returns the model.ParticipationKey from /v2/participation/generate

Test Plan

Contributor

@jannotti left a comment


I'm unsure about the idea of blocking until the part keys are created. Generation can take a very long time because of Falcon keys.
Perhaps we can generate and return the ID immediately, and the caller can be expected to ask for the details, perhaps repeatedly, until they have been generated.

@@ -5629,4 +5629,4 @@
"name": "private"
}
]
}
}
Contributor


Let's not remove end-of-file newlines gratuitously, else we'll end up with commit wars that bring them back and remove them depending on who edits them last.

daemon/algod/api/server/v2/handlers.go (outdated)
Comment on lines 312 to 315
// Semaphore was acquired, generate the key.
go func() {
	defer v2.KeygenLimiter.Release(1)
	err := v2.generateKeyHandler(address, params)
	if err != nil {
		v2.Log.Warnf("Error generating participation keys: %v", err)
	}
}()
Contributor


This starts a (nearly empty) go routine which immediately releases the KeygenLimiter semaphore. Therefore, this allows any number of keygens to happen concurrently. I doubt that's what we want, so the defer line should be moved outside of the go routine, which should go away. I would put it before line 307.

Since generating part keys can take a very long time, I'm not sure what will happen to the TCP connection that's being held open the entire time. I expect timeouts will occur in many situations.
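For illustration, here is a standalone sketch of the shape being suggested (not the actual handler; it uses golang.org/x/sync/semaphore as the PR does, but keygenLimiter, generateKey, and handleGenerate are placeholder names): acquire the limiter in the handler, defer the release there instead of inside a goroutine, and run the keygen synchronously so the semaphore really does cap concurrent generations.

package main

import (
	"errors"
	"fmt"
	"time"

	"golang.org/x/sync/semaphore"
)

// keygenLimiter caps participation-key generation at one at a time.
var keygenLimiter = semaphore.NewWeighted(1)

// generateKey stands in for the slow Falcon participation-key generation.
func generateKey(id int) error {
	time.Sleep(200 * time.Millisecond)
	fmt.Println("generated key", id)
	return nil
}

func handleGenerate(id int) error {
	// Reject concurrent requests instead of queueing them.
	if !keygenLimiter.TryAcquire(1) {
		return errors.New("key generation already in progress")
	}
	// Release only after the synchronous keygen finishes, so the semaphore
	// genuinely limits concurrency.
	defer keygenLimiter.Release(1)
	return generateKey(id)
}

func main() {
	for i := 0; i < 3; i++ {
		go func(i int) {
			if err := handleGenerate(i); err != nil {
				fmt.Println("request", i, "rejected:", err)
			}
		}(i)
	}
	time.Sleep(time.Second)
}

With this shape, a second request arriving mid-generation is rejected immediately rather than silently running in parallel.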

Author

@PhearZero Nov 12, 2024


This was the only way I could get the lock to release while testing. I will take another look at it.

Contributor


Somewhat off-topic. It's unclear to me why a semaphore.Weighted was used. It seems to be used like a simple sync.WaitGroup. You don't have to fix that, I suppose.
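To make that observation concrete, a hypothetical snippet (not go-algorand code): if the limiter were only ever used to wait for outstanding keygens to finish, rather than to reject or cap concurrent requests, a sync.WaitGroup would express the same thing more directly.

package keygen

import "sync"

var wg sync.WaitGroup

// generateAsync tracks one in-flight keygen; work is a placeholder for the
// actual key generation.
func generateAsync(work func()) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		work()
	}()
}

// drain waits for any in-flight keygen to complete, e.g. on shutdown.
func drain() {
	wg.Wait()
}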

@PhearZero
Author

PhearZero commented Nov 12, 2024

I'm unsure about the idea of blocking until the part keys are created. Generation can take a very long time because of Falcon keys. Perhaps we can generate and return the ID immediately, and the caller can be expected to ask for the details, perhaps repeatedly, until they have been generated.

@jannotti This is a good question; I am currently watching the participation endpoint during key generation. I agree that returning the participation ID would be an improvement over the current implementation. It would still require the caller to handle errors, effectively just shifting the problem to a different machine, and it adds RPC overhead to the node.

Since it's a private API guarded by the admin token, it may make sense to allow a long-lived connection. At least that was my thinking; I'd love to know what you have in mind.

@jannotti
Contributor

jannotti commented Nov 12, 2024

Since it's a private API guarded by the admin token, it may make sense to allow a long-lived connection.

I think that's fine, in principle. But I think that extremely long-lived TCP connections that don't send any data are sometimes subject to timeouts. For example, if there is a NAT box between the client and algod, it's going to time out the connection's state eventually. This is an area where you just can't count on any particular behavior past a minute or two, so I would not want our APIs to depend on the correct behavior of very long-lived, silent connections.

Polling by the client using an ID that is returned quickly may seem inelegant, but I think it's the most robust.

A particularly annoying thing that could occur:

  1. ask for a lot of part keys
  2. server starts chugging along.
  3. connection is lost
  4. client tries again
  5. NOPE. Server is still working, so mutex is held. Client must wait before asking again.
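A minimal sketch of the client-side polling described above, assuming (both points are assumptions for illustration, not the current API contract) that the generate call hands back an ID immediately and that GET /v2/participation/{participation-id} with the admin token returns 404 until the key has been installed. Each poll is a short request, so no long-lived, silent connection is held open.

package main

import (
	"fmt"
	"net/http"
	"time"
)

// waitForPartKey polls the admin participation endpoint until the key with the
// given ID exists (200) or an unexpected status is returned.
func waitForPartKey(algodURL, adminToken, partID string) error {
	client := &http.Client{Timeout: 10 * time.Second}
	for {
		req, err := http.NewRequest("GET", algodURL+"/v2/participation/"+partID, nil)
		if err != nil {
			return err
		}
		req.Header.Set("X-Algo-API-Token", adminToken)
		resp, err := client.Do(req)
		if err != nil {
			return err
		}
		resp.Body.Close()
		switch resp.StatusCode {
		case http.StatusOK:
			return nil // key is available
		case http.StatusNotFound:
			time.Sleep(5 * time.Second) // still generating; poll again
		default:
			return fmt.Errorf("unexpected status %d", resp.StatusCode)
		}
	}
}

func main() {
	err := waitForPartKey("http://localhost:8080", "<admin token>", "<participation id>")
	if err != nil {
		fmt.Println("error:", err)
	}
}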

@PhearZero
Author

Polling by the client using an ID that is returned quickly may seem inelegant, but I think it's the most robust.

I agree with the approach of returning just an ID; as long as we can handle the errors elegantly, it should be a good solution. I wasn't able to find an elegant way to handle the error when keys already exist for the given first and last rounds. Do you have any reservations about checking the file system, similar to the generate method?

For example, if there is a NAT box between the client and algod, it's going to time out the connection's state eventually. This is an area where you just can't count on any particular behavior past a minute or two, so I would not want our APIs to depend on the correct behavior of very long-lived connections.

Absolutely, whatever you think is best. The server should generally be in charge of state, and in this case polling would require a state machine on clients for a resource owned by the service. We could help consumers here, maybe with a status-after-block-style endpoint for the generator, which would allow for both. It would also add resiliency to failures, since we could restart the wait with a given ID. The generate endpoint could return an ID or fail early, and then we could handle state changes client-side by watching the long-lived endpoint. (The status endpoint could be a separate PR.)

What do you think @jannotti?

@PhearZero
Author

I'm not seeing a way to return the ID, since it is dependent on the generated secrets (I may be missing something obvious). There are quite a few error messages that are not returned; is this something that is expected?

Just refactoring the error handling a bit may get us what we need until we find a solution for the ID.

@jannotti
Contributor

I'm not seeing a way to return the ID since it is dependent on the generated secrets

That seems to be the case, which explains why this wasn't done before, despite the comment saying that maybe it could be done. Without looking into it at all, I don't know why the ID is a hash of the contents. I don't know if that's just "cute" - an interesting way to generate a unique ID - or an actual meaningful aspect of the implementation. Generally speaking, I like my database IDs to just be sequential, or perhaps UUIDs. I'd support such a change if it's straightforward. If we did that, we could generate the ID first and perform the insertion later, with the pregenerated ID.

I don't know if there's any code that needs to find the right partkey based on contents. That would explain the deterministic ID. If there is, it would have to be refactored - presumably it could be done with a more elaborate WHERE clause in the SQL to find the key.
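A hypothetical sketch of the pregenerated-ID idea - this is not how the ParticipationID is derived today (the real ID is a hash of the key material); it only illustrates handing back an identifier before the slow keygen starts and inserting the record with that ID later.

package main

import (
	"crypto/rand"
	"encoding/base32"
	"fmt"
)

// newRandomID returns a random identifier that could be handed back to the
// caller before the slow keygen starts. Purely illustrative; the encoding is
// only meant to look roughly like existing Algorand identifiers.
func newRandomID() (string, error) {
	var buf [32]byte
	if _, err := rand.Read(buf[:]); err != nil {
		return "", err
	}
	return base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(buf[:]), nil
}

func main() {
	id, err := newRandomID()
	if err != nil {
		panic(err)
	}
	fmt.Println("pregenerated participation ID:", id)
	// The handler could return this ID right away, then run the keygen and
	// insert the record with the same ID once the secrets exist.
}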

returns participation id on generate
@PhearZero
Author

PhearZero commented Nov 14, 2024

It appears as if the table has a sequential primary key and stores the Participation ID as an additional field. I was able to remove the generated keys from the Participation Identity, which produces the hash. This should keep the changes small, but you raise a good point.

I don't know if that's just "cute" - an interesting way to generate a unique ID - or an actual meaningful aspect of the implementation

It seems like it could be the former; I will ask around to see if anyone has more information. If this implementation is accepted, it would be good to expose a few errors before shipping (just the most common ones).

@PhearZero
Author

I've come up empty for justifications so far. The ID was introduced with the RPC endpoints in #3164. I assume it was just an ad-hoc decision.
