feat: Create rust GRPC client for Runner #491
Conversation
So it looks like you went with the approach of having the client be a separate folder and keeping a copy of the Runner proto file. I wasn't sure about that approach since we'd have duplicate proto files, and I felt like the build process would be a bit wonky: (New) Coordinator would then have to copy the folder in, build it separately, and then, I assume, do something with the build output? I wasn't really sure how that would end up looking. What does the Docker integration look like? Have you tried using it in new Coordinator, existing Coordinator, or Block Streamer as a test?
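(For reference, a separate client crate that carries its own copy of the proto would typically drive codegen from a `build.rs` with `tonic-build`, so a consumer like Coordinator only depends on the crate rather than handling any build output itself. A minimal sketch, assuming the copied file lives at `proto/runner.proto` and declares `package runner;`; both the path and the package name are assumptions, not taken from this PR.)

```rust
// build.rs - minimal tonic-build sketch; the proto path is an assumption.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Generates the Rust gRPC stubs into OUT_DIR at build time.
    tonic_build::compile_protos("proto/runner.proto")?;
    Ok(())
}
```

```rust
// lib.rs - expose the generated code; "runner" must match the proto package name.
pub mod runner {
    tonic::include_proto!("runner");
}
```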
if (validationResult !== null) {
  callback(validationResult, null);
  return;
}

const { accountId, functionName, code, schema, redisStream, version } = call.request;
const executorId = hashString(`${accountId}/${functionName}`);
If we're using accountId and functionName anyway, is there a need for hashing here? Couldn't we just use accountId/functionName directly? I had envisioned executorId as something Coordinator decides (including whatever information it wants to generate it: function name, account ID, maybe machine ID, version, etc. in the future) and simply passes over to Runner so the executor can be stored and accessed that way. Coordinator could then pass that ID, in the future, to one of many Runners, or maybe a load balancer, depending on how we set things up. Did you have any ideas on leveraging the determinism of the ID for anything, or is it more just because it's a good quality to have?
Typically the service (Runner) which manages the entities (Executors) assigns the IDs, as it has the most intimate knowledge of those entities and can ensure there are no collisions. I'm not sure there is a benefit in Coordinator generating these IDs.
As for hashing, it's generally a good idea to keep IDs opaque. This allows us to change how the IDs are generated going forward without impacting clients, which I have been thinking about; we may not need deterministic IDs after all. Without opaque IDs, clients might construct the IDs themselves and then run into issues when the IDs change on the server side.
Gotcha. Yeah you're right that Coordinator managing the IDs is not particularly valuable since it will be given the ID back when it creates the executor, which it can use for stopping an executor later. This is actually better as Runner being the ID creator means Coordinator is less "stateful". Like, if Coordinator dies and we restart it and have it load the existing runner streams and their IDs, then having Coordinator assume those IDs makes more sense if Runner created them, rather than some previous instance of Coordinator.
And yeah, your points on changing ID generation were what I was considering. Changing the way we ID the executors is probably not too difficult either, if we continue with the approach of Runner creating it and passing it back.
This is being used in #444 now. The Docker context needs to be at the root directory anyway since Coordinator depends on two separate crates. Duplicate proto files might be something we just have to live with until we migrate Runner to Rust 😅. Let me try symlinking them; that might be more convenient.
LGTM! I understand what you mean by having the client at root. I'll learn whatever strategy you use for linking so I know it for the future. And yeah, relying on proto files for codegen is indeed one of the cons of using protobuf. We might have to do it again if we eventually allow developers to stop and start their indexers from the UI, but that's likely a long time from now, if we ever do so.
One last thing: I noticed you are using hard-coded host:port values. That's also fine for now, though in my PR I did address that by setting it up. It's up to you if you want to include that in this PR; I feel like it's a good place to do so. I or you can of course do it in a future PR too.
Let's get this merged and we can address it in a future PR :)
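(As a rough illustration of the configuration point above, the hard-coded host:port could later be replaced by reading the endpoint from an environment variable when building the channel. This is only a sketch; the variable name and fallback address are assumptions, not something defined in this PR.)

```rust
use tonic::transport::Channel;

// Sketch only: RUNNER_URL and the fallback address are hypothetical.
async fn connect_to_runner() -> Result<Channel, Box<dyn std::error::Error>> {
    let url = std::env::var("RUNNER_URL")
        .unwrap_or_else(|_| "http://localhost:50051".to_string());
    let channel = Channel::from_shared(url)?.connect().await?;
    Ok(channel)
}
```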
This PR creates a Rust-based gRPC client for Runner, for use within Coordinator V2. For now, this exists as its own crate within the top-level directory. The Coordinator PR was becoming quite large, so I decided to separate this out.
Additionally, I've made a couple changes to the Runner proto:
- Renamed `spec` -> `runner`
- Removed `executorId` from `StartExecutorRequest` - it should be deterministic and assigned internally
- Added `version` to the executor - this will be used to determine whether an executor should be restarted
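(To illustrate how these proto changes shape a call from the client, a rough usage sketch is below. The module, message, and field names are assumptions inferred from the discussion, not verified against the generated code: `executorId` is no longer sent, `version` is included, and the Runner-assigned ID is read from the response.)

```rust
use runner::runner_client::RunnerClient; // assumed module/client names
use runner::StartExecutorRequest;        // field names below are assumed too

async fn start_executor() -> Result<(), Box<dyn std::error::Error>> {
    // Endpoint is a placeholder; see the configuration discussion above.
    let mut client = RunnerClient::connect("http://localhost:50051").await?;

    // No executor_id here: Runner assigns it internally.
    let request = StartExecutorRequest {
        account_id: "example.near".to_string(),
        function_name: "example_indexer".to_string(),
        code: String::new(),
        schema: String::new(),
        redis_stream: "example.near/example_indexer".to_string(),
        version: 1, // assumed numeric type; used to decide whether to restart
    };

    // The assigned ID comes back in the response and can be kept for a
    // later StopExecutor call.
    let response = client.start_executor(request).await?.into_inner();
    println!("executor_id: {}", response.executor_id);

    Ok(())
}
```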