Scaling flow graphs #3386

shaunc · 2020-09-24T13:51:43Z

In response to my post on Slack about coroutines, @jlowin was kind enough to talk with me last weekend about your plans to introduce more dynamism into flow graphs, which is a huge and welcome step forward. I do worry, though, that these changes will exacerbate another bottleneck in the current design: the process running the FlowRunner drives all of the activity. In principle, unrolling loops may allow a long-running flow to create thousands or millions of tasks, say, when processing a fast stream from Kafka. Unless the FlowRunner is rethought, however, it will get in the way of executing such a flow in practice.

My first thought was to use Dask pub/sub for edges, but more abstractly, a good way forward would be a pluggable "edge engine" that implements inter-task communication and monitoring for a flow (or sub-flow). The current "centralized, imperative" edge engine is perfectly appropriate for debugging and for small flows. Another edge engine could inject wrappers around tasks that use a pub/sub framework (Dask, NATS, Redis, WebSockets, gRPC... who knows; perhaps even hybrids of these with IPC for tasks scheduled on workers that share an IPC domain) and asynchronously report back to a monitoring flow (including the ability to restart them).
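To make the idea concrete, here is a minimal in-process sketch of what a pluggable edge engine interface might look like. `EdgeEngine`, `CentralizedEdgeEngine` and `PubSubEdgeEngine` are hypothetical names, not part of Prefect's API, and `asyncio.Queue` stands in for a real pub/sub transport such as Dask pub/sub or NATS. The centralized engine mirrors the current design, where one controller sits on every edge; the pub/sub engine wraps each task in a worker that only talks to its own input and output channels.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Callable, List, Tuple

# Hypothetical flow shape: a linear chain of (name, fn); each fn takes the upstream result.
Task = Tuple[str, Callable[[Any], Any]]


class EdgeEngine(ABC):
    """Hypothetical pluggable interface: decides how results travel between tasks."""

    @abstractmethod
    async def run(self, tasks: List[Task], source: List[Any]) -> List[Any]:
        ...


class CentralizedEdgeEngine(EdgeEngine):
    """Mimics the current design: one controller calls every task and hands results along."""

    async def run(self, tasks: List[Task], source: List[Any]) -> List[Any]:
        results = []
        for item in source:
            value = item
            for _name, fn in tasks:
                value = fn(value)  # the controller sits on every edge
            results.append(value)
        return results


class PubSubEdgeEngine(EdgeEngine):
    """Wraps each task in a worker that subscribes to its input channel and publishes
    to its output channel; asyncio.Queue is a stand-in for a real pub/sub system."""

    async def run(self, tasks: List[Task], source: List[Any]) -> List[Any]:
        channels = [asyncio.Queue() for _ in range(len(tasks) + 1)]
        done = object()  # end-of-stream sentinel

        async def worker(fn: Callable[[Any], Any], inbox: asyncio.Queue,
                         outbox: asyncio.Queue) -> None:
            while True:
                msg = await inbox.get()
                if msg is done:
                    await outbox.put(done)  # propagate end-of-stream downstream
                    return
                await outbox.put(fn(msg))

        workers = [
            asyncio.create_task(worker(fn, channels[i], channels[i + 1]))
            for i, (_name, fn) in enumerate(tasks)
        ]
        for item in source:
            await channels[0].put(item)
        await channels[0].put(done)

        results = []
        while (msg := await channels[-1].get()) is not done:
            results.append(msg)
        await asyncio.gather(*workers)
        return results


tasks = [("double", lambda x: x * 2), ("inc", lambda x: x + 1)]
for engine in (CentralizedEdgeEngine(), PubSubEdgeEngine()):
    print(type(engine).__name__, asyncio.run(engine.run(tasks, [1, 2, 3])))
```

Both engines print `[3, 5, 7]` for this chain; the flow definition (`tasks`) stays the same while the transport behind the edges is swapped. A real engine would also have to deal with fan-in/fan-out, failures and serialization, which this sketch ignores.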

Using different edge engines for different portions of the graph would allow, for instance, an external event stream to drive one portion of a flow, while the processing of those external events is unfolded using a different edge engine, say one based on Dask pub/sub.
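A rough sketch of mixing two engines within one flow, again using hypothetical classes (`StreamEdgeEngine`, `FanOutEdgeEngine`) rather than any existing Prefect or Dask API. The ingest portion pulls events from a simulated external stream (an async generator standing in for a Kafka consumer), and the processing portion unfolds each event into its own concurrent `asyncio` task.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List


async def external_stream(events: List[int]) -> AsyncIterator[int]:
    """Hypothetical stand-in for an external source such as a Kafka topic."""
    for event in events:
        await asyncio.sleep(0)  # yield control, as a real consumer would between polls
        yield event


class StreamEdgeEngine:
    """Hypothetical engine for the ingest portion: its edges are an external event stream."""

    def __init__(self, events: List[int]):
        self.events = events

    def edges(self) -> AsyncIterator[int]:
        return external_stream(self.events)


class FanOutEdgeEngine:
    """Hypothetical engine for the processing portion: every incoming event is unfolded
    into its own concurrent task (the role a Dask pub/sub engine might play)."""

    def __init__(self, handler: Callable[[int], Awaitable[int]]):
        self.handler = handler

    async def process(self, upstream: AsyncIterator[int]) -> List[int]:
        # Schedule one task per event as it arrives, without waiting for earlier ones.
        pending = [asyncio.create_task(self.handler(e)) async for e in upstream]
        # gather() returns results in submission order, not completion order.
        return list(await asyncio.gather(*pending))


async def handle(event: int) -> int:
    await asyncio.sleep(0.01 / (event + 1))  # higher-numbered events sleep less
    return event * 10


async def run_flow(events: List[int]) -> List[int]:
    ingest = StreamEdgeEngine(events)      # portion 1: external event stream
    processing = FanOutEdgeEngine(handle)  # portion 2: unfolded per-event processing
    return await processing.process(ingest.edges())


print(asyncio.run(run_flow([0, 1, 2])))  # [0, 10, 20]
```

The boundary between the two portions is just the async iterator returned by `edges()`, so either side could be swapped for a different engine without the other noticing.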

An edge engine would have both a "task" component and a "flow" component: the former for messages between tasks, the latter for control and monitoring. The flow component would need to be flexible enough to accommodate both the current imperative/direct management and a declarative compile-and-monitor strategy, while the declarative strategy should still be able to simulate the imperative one.
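The split between the two components might look roughly like the following sketch, in which the data edges and the control channel are separate queues. `StateEvent`, `task_component` and `flow_component` are hypothetical names, and the restart policy (put the input back, retry up to `max_attempts`) is an assumption chosen for illustration. The flow component never handles task data; it only reacts to state reports, in the monitoring style rather than driving each task directly.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class StateEvent:
    """Hypothetical control-plane message sent from a task wrapper to the flow component."""
    task: str
    state: str  # "Running", "Success" or "Failed"
    attempt: int


async def task_component(name: str, fn: Callable[[Any], Any], data_in: asyncio.Queue,
                         data_out: asyncio.Queue, control: asyncio.Queue, attempt: int) -> None:
    """Task component: consumes from the inbound edge, publishes to the outbound edge,
    and reports its state on the separate control channel."""
    await control.put(StateEvent(name, "Running", attempt))
    value = await data_in.get()
    try:
        result = fn(value)
    except Exception:
        await data_in.put(value)  # return the input so a restarted attempt can retry it
        await control.put(StateEvent(name, "Failed", attempt))
        return
    await data_out.put(result)
    await control.put(StateEvent(name, "Success", attempt))


async def flow_component(name: str, fn: Callable[[Any], Any], data_in: asyncio.Queue,
                         data_out: asyncio.Queue, max_attempts: int = 3) -> List[StateEvent]:
    """Flow component: only watches state events, restarting a failed task
    until it succeeds or max_attempts is reached."""
    control: asyncio.Queue = asyncio.Queue()
    log: List[StateEvent] = []
    attempt = 1
    running = [asyncio.create_task(
        task_component(name, fn, data_in, data_out, control, attempt))]
    while True:
        event = await control.get()
        log.append(event)
        if event.state == "Failed" and attempt < max_attempts:
            attempt += 1
            running.append(asyncio.create_task(
                task_component(name, fn, data_in, data_out, control, attempt)))
        elif event.state in ("Success", "Failed"):
            await asyncio.gather(*running)  # let the wrappers finish cleanly
            return log


calls = {"n": 0}


def flaky_double(x: int) -> int:
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient failure")  # simulate a crash on the first call
    return 2 * x


async def main():
    data_in: asyncio.Queue = asyncio.Queue()
    data_out: asyncio.Queue = asyncio.Queue()
    await data_in.put(21)
    log = await flow_component("double", flaky_double, data_in, data_out)
    return await data_out.get(), [(e.state, e.attempt) for e in log]


result = asyncio.run(main())
print(result)  # (42, [('Running', 1), ('Failed', 1), ('Running', 2), ('Success', 2)])
```

Because monitoring travels on its own channel, the same `flow_component` logic could in principle sit on top of a centralized transport or a distributed one, which is the flexibility the flow component needs.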

Edge engines would allow flow graphs to pass messages much more quickly, and to scale past the capabilities of a centralized controller. In the future, they would provide a path to run flows that outstrip the abilities of centralized monitoring, as well.
