Fix README.md
Kobzol committed Nov 11, 2020
1 parent ccb2b90 commit 345e85e
Showing 1 changed file with 16 additions and 16 deletions.
32 changes: 16 additions & 16 deletions README.md
@@ -17,19 +17,20 @@ If your pipeline cannot be run by `rsds`, feel free to send us an issue.
To compile and use `rsds`, you must have the Rust toolchain installed. You can install it using e.g. [Rustup](https://rustup.rs/).

1) Build `rsds`:
```bash
$ RUSTFLAGS="-C target-cpu=native" cargo build --release
```
```bash
$ RUSTFLAGS="-C target-cpu=native" cargo build --release
```
2) Install our modified version of Dask:
```bash
$ pip install git+https://github.com/Kobzol/distributed@simplified-encoding
```
The modifications that we had to perform to make it manageable to implement the Dask
protocol in Rust are described [here](https://github.com/dask/distributed/pull/3809).
```bash
$ pip install git+https://github.com/Kobzol/distributed@simplified-encoding
```
The modifications that we had to perform to make it manageable to implement the Dask
protocol in Rust are described [here](https://github.com/dask/distributed/pull/3809).

3) Use `rsds-scheduler` instead of `dask-scheduler` when starting a Dask cluster:
```bash
$ ./target/release/rsds-scheduler
```
```bash
$ ./target/release/rsds-scheduler
```

After that, use `target/release/rsds-scheduler` as you would use `dask-scheduler`.
Note, however, that most of the command-line options of `dask-scheduler` are not supported.
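
Because `rsds-scheduler` is started in place of `dask-scheduler`, workers and clients connect to it over TCP in the same way. The following is only a minimal sketch, not part of `rsds`: it waits until something accepts connections on the scheduler address before a worker or client is started. The helper name `wait_for_scheduler` and its parameters are hypothetical, and the default port 8786 is assumed from the `localhost:8786` address used in the usage example of this README.

```python
import socket
import time


def wait_for_scheduler(host="localhost", port=8786, timeout=10.0, interval=0.2):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: it only checks that a listener is present; it does
    not verify that the listener speaks the Dask protocol.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection to probe the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet (or the attempt timed out); retry.
            time.sleep(interval)
    return False
```

A client script could call `wait_for_scheduler()` and create `Client("tcp://localhost:8786")` only once it returns `True`.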
@@ -39,17 +40,17 @@ Be wary that most of the command line options from `dask-scheduler` are not supp
```bash
# run server
$ ./target/release/rsds-scheduler
# run worker
# run worker (in another shell)
$ dask-worker localhost:8786
```

2) Run a simple example that uses Dask dataframe:
2) Run a simple example that uses a Dask dataframe:
```python
import dask
from dask.distributed import Client
client = Client("tcp://localhost:8786")
df = dask.datasets.timeseries(start="2020-01-01", end="2020-01-03")
result = df.groupby("name")["x"].mean().compute()
print(result)
@@ -64,5 +65,4 @@ on 1 and 7 node clusters with 24 workers per node.
![image](resources/speedup-rsds-ws-7.png)

## Reports

* https://github.com/dask/distributed/issues/3139
