RFC: Nelson control plane #240
Meeting notes 2019-08-13. Participants: @adelbertc @drewgonzales360 @lu4nm3 @miranhpark @timperrett. Terminology: CP = Nelson Control Plane. The notes covered the high-level flow, background details, and open questions.
Summary
Nelson was designed to be extensible: all high-level operations are abstracted away, and the system as a whole is programmed against interfaces. The fact that adding Kubernetes support required only new interpreters for the scheduler and health checker is a testament to this.
However, interpreters alone were not enough. We soon realized that, given the variety and flexibility of the systems Nelson integrates with, there was no way for Nelson to be prescriptive about deployments. This led to the implementation of blueprints, which have been used with great success and have solved numerous issues around organizational choices without sacrificing Nelson's rigorous approach to deployment infrastructure.
We are now at another crossroads. While deployment blueprints give us a lot of flexibility along certain axes, they are not sufficient for full flexibility in deploying Nelson. To date the following issues have come up:
This RFC proposes to re-architect Nelson as a control plane. Instead of both "deciding what to do" and "actually doing it", Nelson becomes purely about "deciding what to do" and emits these decisions as events, leaving the "actually doing it" to a data plane. The data plane subscribes to events from Nelson, acts on them accordingly, and, most importantly, is controlled by the organization. Different organizations with different policies would simply run different data planes.
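One way to picture the split is as an algebra of events that the control plane emits and a data plane interprets. This is purely illustrative; none of these event names exist in Nelson today:

```scala
// Purely illustrative sketch: an algebra of events the control plane might
// emit. None of these names exist in Nelson today; they only show the split
// between "deciding what to do" (emitting) and "actually doing it" (interpreting).
sealed trait ControlPlaneEvent
final case class LaunchRequested(unit: String, version: String, datacenter: String) extends ControlPlaneEvent
final case class DeleteRequested(deploymentGuid: String) extends ControlPlaneEvent
final case class HealthCheckRequested(deploymentGuid: String) extends ControlPlaneEvent

// A data plane subscribes to these events and interprets them however the
// organization's policies dictate; here the "interpretation" is just a string.
def interpret(event: ControlPlaneEvent): String = event match {
  case LaunchRequested(u, v, dc)  => s"launch $u@$v in $dc"
  case DeleteRequested(guid)      => s"delete deployment $guid"
  case HealthCheckRequested(guid) => s"health check deployment $guid"
}
```

Because `interpret` lives outside the control plane, two organizations can attach entirely different interpreters to the same event stream.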
Design
Relevant initial Gitter discussion
Nelson is already built around an internal queuing model:
The idea, then, is to take the subset of these background jobs that constitutes the "data plane" and, instead of having both a producer and a consumer inside Nelson, keep only the producer and relegate consumption to the downstream data plane.
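A minimal sketch of that producer-only shape, with hypothetical types standing in for Nelson's actual queue machinery:

```scala
// Hypothetical sketch of the producer-only shape. Today Nelson both enqueues
// background work and consumes it in-process; under this proposal the in-process
// consumer is replaced by an emitter that hands events to the data plane.
trait Emitter[A] { def emit(a: A): Unit }

// A test/demo emitter that just records what the control plane produced.
final class RecordingEmitter[A] extends Emitter[A] {
  val emitted = scala.collection.mutable.ListBuffer.empty[A]
  def emit(a: A): Unit = emitted += a
}

// The control plane's producer side is unchanged: it still decides what work
// exists, but now publishes that work instead of executing it.
def runControlPlaneTick(emitter: Emitter[String]): Unit = {
  emitter.emit("deploy:howdy-http@1.0.0")
  emitter.emit("cleanup:stale-deployment-42")
}
```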
The current thinking is these will stay in the control plane:
These will be relegated to the data plane:
For each of the data plane components, events will no longer be consumed in-process by an implementation that acts on them; instead, each event will be emitted to any subscribers listening on a network port. It is then on the subscriber to act on this information.
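The emission side could be sketched as a simple fan-out. In practice this would be a streaming network endpoint; plain callbacks stand in for remote subscribers here so the example is self-contained, and all names are hypothetical:

```scala
// Sketch of the fan-out from control plane to subscribers. Callbacks stand in
// for subscribers connected over the network; the point is only that the
// control plane publishes and takes no further action itself.
final class EventBus[A] {
  private val subscribers = scala.collection.mutable.ListBuffer.empty[A => Unit]
  def subscribe(handler: A => Unit): Unit = subscribers += handler
  def publish(event: A): Unit = subscribers.foreach(_(event))
}
```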
Implementation
Because deploying to a scheduler is the largest burden at the moment, the pipeline processor will be our first target. However, because launching, deleting, and health checking are all scheduler-specific, we cannot migrate just the pipeline processor; the cleanup pipeline, sweeper, and deployment monitor must move as well. The routing cron can likely be left as a separate step. Thus the migration order is:
As for how to emit events, current proposals are:
Implementation steps
Other notes
Testing: Re-architecting Nelson into a control plane may also bring benefits to testing and demonstrations. Right now it has been hard to showcase Nelson because the control plane and data plane are tied together, requiring things like a functional Kubernetes cluster just to start up. If Nelson instead just emitted events, we could, say, have a dummy scheduler interpret those events for demo purposes, or even have something that interprets them as D3.js actions, where "deploy X" becomes rendering a node, service dependencies become edges, traffic shifting becomes moving edges, and garbage collection becomes deleting nodes.
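For instance, a demo data plane could interpret events against a purely in-memory "cluster". This is a hypothetical sketch, not an existing component, and the string event encoding is invented for the demo:

```scala
// Hypothetical demo data plane: interprets emitted events against an in-memory
// "cluster" instead of a real scheduler, so Nelson can be demonstrated without
// a functional Kubernetes cluster.
final class DummyScheduler {
  private val running = scala.collection.mutable.Set.empty[String]
  def handle(event: String): Unit = event match {
    case e if e.startsWith("deploy:") => running += e.stripPrefix("deploy:")
    case e if e.startsWith("delete:") => running -= e.stripPrefix("delete:")
    case _                            => () // ignore events this demo does not model
  }
  def deployments: Set[String] = running.toSet
}
```

The same event stream could feed the D3.js visualizer described above; only the interpreter changes.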