In response to my post on Slack about co-routines, @jlowin was kind enough to talk with me last weekend about your plans to introduce more dynamism into flow graphs, which is a huge and welcome step forward. I do worry that they will exacerbate another sort of bottleneck in your current design: the process running the FlowRunner currently drives all the activity. Formally, unrolling loops may allow creating thousands or millions of tasks in long-running flows that, say, may be processing a fast stream from Kafka. It seems, however, that unless the FlowRunner is rethought, it will get in the way of practically executing such a flow.
My first thought was to use Dask pub/sub for edges, but more abstractly, a good way forward would be to have a pluggable "edge engine" that implements inter-task communication and monitoring for a flow (or sub-flow). The current "centralized, imperative" edge engine is perfectly appropriate for debugging and for small flows. Another edge engine could inject wrappers around tasks that use a pub/sub framework (Dask, NATS, Redis, WebSockets, gRPC... who knows -- perhaps even hybrids of these with IPC for tasks scheduled on workers that share an IPC domain) and asynchronously report back to the monitoring flow (including the ability to restart).
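To make the wrapper idea concrete, here is a rough sketch, not anything from the existing codebase. It assumes a toy in-process bus (`InProcessPubSub`) standing in for Dask pub/sub, NATS, Redis, and so on; `wrap_task`, `STOP`, `run_demo`, and the topic names are all hypothetical. Each wrapped task pulls from its input topic, pushes to its output topic, and reports status asynchronously, so no central process sits in the message path.

```python
import queue
import threading
from typing import Any, Callable, Dict, List


class InProcessPubSub:
    """Toy stand-in for a real pub/sub framework (hypothetical, in-process only)."""

    def __init__(self) -> None:
        self._topics: Dict[str, List[queue.Queue]] = {}
        self._lock = threading.Lock()

    def subscribe(self, topic: str) -> queue.Queue:
        inbox: queue.Queue = queue.Queue()
        with self._lock:
            self._topics.setdefault(topic, []).append(inbox)
        return inbox

    def publish(self, topic: str, message: Any) -> None:
        with self._lock:
            subscribers = list(self._topics.get(topic, []))
        for inbox in subscribers:
            inbox.put(message)


STOP = object()  # hypothetical end-of-stream marker


def wrap_task(fn: Callable[[Any], Any], bus: InProcessPubSub,
              in_topic: str, out_topic: str, status_topic: str) -> threading.Thread:
    """Wrap a plain function so its edges are pub/sub topics."""
    inbox = bus.subscribe(in_topic)  # subscribe before starting so nothing is missed

    def run() -> None:
        while True:
            message = inbox.get()
            if message is STOP:
                bus.publish(out_topic, STOP)  # propagate shutdown downstream
                bus.publish(status_topic, (fn.__name__, "finished"))
                return
            try:
                bus.publish(out_topic, fn(message))
            except Exception as exc:  # report the failure instead of dying silently
                bus.publish(status_topic, (fn.__name__, f"failed: {exc!r}"))

    return threading.Thread(target=run, daemon=True)


def run_demo():
    def double(x):
        return 2 * x

    def increment(x):
        return x + 1

    bus = InProcessPubSub()
    results = bus.subscribe("out")
    status = bus.subscribe("status")
    workers = [
        wrap_task(double, bus, "source", "doubled", "status"),
        wrap_task(increment, bus, "doubled", "out", "status"),
    ]
    for worker in workers:
        worker.start()
    for item in [1, 2, 3]:
        bus.publish("source", item)
    bus.publish("source", STOP)
    for worker in workers:
        worker.join()

    collected = []
    while (message := results.get()) is not STOP:
        collected.append(message)
    statuses = []
    while not status.empty():
        statuses.append(status.get())
    return collected, statuses
```

In this sketch the "monitoring flow" is just whoever subscribes to the status topic; restart logic would live there, and a real engine would swap the toy bus for an actual transport.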
Using different edge engines for different portions of the graph would allow, for instance, an external event stream to be used for one portion of a flow, while allowing the processing of external events to be unfolded using a different edge engine -- say one based on Dask pub/sub.
An edge engine would have both a "task" component and a "flow" component: the former for messages between tasks, the latter for control and monitoring. The flow component would need to be flexible enough to incorporate both the current imperative/direct management and a declarative compile-and-monitor strategy. However, the declarative strategy should allow simulation of the imperative one.
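One way to picture the split, as a minimal sketch only: every class and method name below (`TaskComponent`, `FlowComponent`, `EdgeEngine`, `run_task`, and the in-memory implementations) is hypothetical and not part of any existing API. The task component moves values along edges; the flow component only records and answers questions about state.

```python
from abc import ABC, abstractmethod
from collections import defaultdict, deque
from typing import Any, Callable, Deque, Dict, Tuple

Edge = Tuple[str, str]  # (upstream task name, downstream task name)


class TaskComponent(ABC):
    """Carries messages between tasks."""

    @abstractmethod
    def send(self, edge: Edge, value: Any) -> None: ...

    @abstractmethod
    def receive(self, edge: Edge) -> Any: ...


class FlowComponent(ABC):
    """Carries control and monitoring information for the flow."""

    @abstractmethod
    def report(self, task: str, state: str) -> None: ...

    @abstractmethod
    def state_of(self, task: str) -> str: ...


class InMemoryTasks(TaskComponent):
    """Simplest possible transport: one FIFO buffer per edge."""

    def __init__(self) -> None:
        self._buffers: Dict[Edge, Deque[Any]] = defaultdict(deque)

    def send(self, edge: Edge, value: Any) -> None:
        self._buffers[edge].append(value)

    def receive(self, edge: Edge) -> Any:
        return self._buffers[edge].popleft()


class InMemoryFlow(FlowComponent):
    """Simplest possible monitor: remembers the last reported state."""

    def __init__(self) -> None:
        self._states: Dict[str, str] = {}

    def report(self, task: str, state: str) -> None:
        self._states[task] = state

    def state_of(self, task: str) -> str:
        return self._states.get(task, "pending")


class EdgeEngine:
    """Pairs a task component with a flow component."""

    def __init__(self, tasks: TaskComponent, flow: FlowComponent) -> None:
        self.tasks = tasks
        self.flow = flow

    def run_task(self, name: str, fn: Callable[[Any], Any],
                 upstream: str, downstream: str) -> None:
        value = self.tasks.receive((upstream, name))
        try:
            result = fn(value)
        except Exception:
            self.flow.report(name, "failed")
            raise
        self.tasks.send((name, downstream), result)
        self.flow.report(name, "success")
```

Because both components sit behind interfaces, a centralized in-memory pair like this one and a distributed pub/sub pair could be swapped per flow or per sub-flow without touching the task functions.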
Edge engines would allow flow graphs to pass messages much more quickly, and to scale past the capabilities of a centralized controller. In the future, they would provide a path to run flows that outstrip the abilities of centralized monitoring, as well.