RxPY operator to distribute a computation with Ray
The distribute operator can be used directly in an existing pipeline to parallelize computations:
import ray
import rx
import rx.operators as ops
import rxray

data = range(200)
ray.init()

rx.from_(data).pipe(
    rxray.distribute(
        lambda: rx.pipe(ops.map(lambda i: i * 2)),  # runs on Ray workers
    ),
).subscribe()
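The example above subscribes without using the results. To bring them back to the driver, standard RxPY operators can be chained after distribute. The snippet below is a minimal, self-contained sketch; the to_list collection and the print callback are illustrative choices, not part of the RxRay API:

import ray
import rx
import rx.operators as ops
import rxray

ray.init()
data = range(200)

rx.from_(data).pipe(
    rxray.distribute(
        lambda: rx.pipe(ops.map(lambda i: i * 2)),
    ),
    ops.to_list(),  # gather the results into one list on the driver
).subscribe(on_next=print)  # 200 doubled values; ordering across workers may vary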
When the distributed computation is stateful, items can be pinned to an actor with a key-based selector:
import random

import ray
import rx
import rx.operators as ops
import rxray

data = [(i, j) for i in range(17) for j in range(100)]
random.shuffle(data)
ray.init()

rx.from_(data).pipe(
    rxray.distribute(
        lambda: rx.pipe(
            ops.group_by(lambda i: i[0]),
            ops.flat_map(lambda g: g.pipe(
                ops.map(lambda i: i[1]),
                ops.average(),
                ops.map(lambda i: (g.key, i)),
            )),
        ),
        actor_selector=rxray.partition_by_key(lambda i: i[0]),  # same key -> same actor
    ),
).subscribe()
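Each key in this data set sees the values 0 through 99, so every emitted pair should be (key, 49.5). The following is a minimal sketch of checking that on the driver side; the to_list collection, the sorted print, and the surrounding setup are illustrative assumptions rather than part of the RxRay API:

import random

import ray
import rx
import rx.operators as ops
import rxray

ray.init()
data = [(i, j) for i in range(17) for j in range(100)]
random.shuffle(data)

rx.from_(data).pipe(
    rxray.distribute(
        lambda: rx.pipe(
            ops.group_by(lambda i: i[0]),
            ops.flat_map(lambda g: g.pipe(
                ops.map(lambda i: i[1]),
                ops.average(),
                ops.map(lambda avg: (g.key, avg)),
            )),
        ),
        actor_selector=rxray.partition_by_key(lambda i: i[0]),
    ),
    ops.to_list(),  # collect every (key, average) pair on the driver
).subscribe(on_next=lambda pairs: print(sorted(pairs)))  # each pair should be (key, 49.5)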
RxRay is available on PyPI and can be installed with pip:
python3 -m pip install rxray