This repository demonstrates how to set up a dask.distributed cluster on EC2 spot instances. By leveraging existing DevOps tools like Terraform, Packer, and Docker, we aim to provide easy-to-deploy, highly customizable dask cluster configurations while keeping custom scripts to a minimum.
For now, the recipes provided in this repository remain highly experimental, untested, and not recommended for use in any setting. For a more thorough and mature solution, see dask/dask-ec2, an easy-to-use command-line tool to create dask clusters on EC2, part of the official dask project.
Create the base AMI with packer (this installs docker and builds the container):
packer build images/dask_base_cpu.json
Note: add the --force argument to re-create an existing AMI.
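For reference, such a template typically pairs an amazon-ebs builder with a shell provisioner. The sketch below is illustrative only and is not the actual contents of images/dask_base_cpu.json; the region, source AMI, instance type, and docker image are placeholder assumptions:

{
  "builders": [{
    "type": "amazon-ebs",
    "region": "us-east-1",
    "source_ami": "ami-xxxxxxxx",
    "instance_type": "t2.medium",
    "ssh_username": "ubuntu",
    "ami_name": "dask-base-cpu-{{timestamp}}"
  }],
  "provisioners": [{
    "type": "shell",
    "inline": [
      "curl -fsSL https://get.docker.com/ | sudo sh",
      "sudo usermod -aG docker ubuntu",
      "sudo docker pull daskdev/dask"
    ]
  }]
}

Baking the container into the AMI at build time means spot instances can start workers immediately instead of pulling the image on boot.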
Generate an SSH key to access the dask-scheduler and worker nodes:
mkdir -p .ssh
ssh-keygen -t rsa -b 4096 -C "dask_node" -f .ssh/dask_node_key -P ""
Create the cluster with terraform:
terraform plan
terraform apply
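Under the hood, the Terraform configuration requests spot instances for the scheduler and worker nodes. As a rough sketch (the resource names, AMI ID, instance type, and bid price below are illustrative assumptions, not this repository's actual configuration), a worker spot request might look like:

resource "aws_key_pair" "dask_node" {
  key_name   = "dask_node"
  public_key = "${file(".ssh/dask_node_key.pub")}"
}

resource "aws_spot_instance_request" "dask_worker" {
  # Placeholder values: point ami at the image built with packer above.
  ami                  = "ami-xxxxxxxx"
  instance_type        = "m4.large"
  spot_price           = "0.05"
  key_name             = "${aws_key_pair.dask_node.key_name}"
  wait_for_fulfillment = true
}

Setting wait_for_fulfillment makes terraform apply block until AWS actually grants the spot request, so the instance's public IP is available once the run completes.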
Set up port forwarding to access the dask scheduler (8786 is the scheduler port, 9786 the HTTP API, and 8787 the Bokeh web UI):
ssh -N -i .ssh/dask_node_key \
-L 8786:localhost:8786 \
-L 9786:localhost:9786 \
-L 8787:localhost:8787 \
ubuntu@[scheduler.public_ip]
Run dask jobs:
from dask.distributed import Client

def square(x):
    return x ** 2

client = Client('127.0.0.1:8786')
squares = client.map(square, range(100))
print(client.gather(squares))
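As a more substantive sketch, the snippet below estimates pi with distributed Monte Carlo sampling. It is a hypothetical example (it assumes nothing beyond dask.distributed and the Python standard library on the workers), not a recipe shipped with this repository:

import random
from dask.distributed import Client

client = Client('127.0.0.1:8786')

def sample(seed, n=100000):
    # Count random points that fall inside the unit quarter circle.
    rng = random.Random(seed)
    return sum(1 for _ in range(n)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)

# One task per seed; distinct arguments give each future a distinct key.
futures = client.map(sample, range(100))
hits = client.submit(sum, futures).result()
print('pi ~=', 4.0 * hits / (100 * 100000))

Aggregating with client.submit(sum, futures) runs the final reduction on the cluster instead of pulling every partial count back through the SSH tunnel.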
Check the scheduler web UI (through the tunnel set up above): http://localhost:8787/status
Release AWS resources when all jobs are finished:
terraform destroy
TODO:
- Use a docker registry
- Use Kubernetes or ECS
- Command-line tool?
- GPU image/container with CUDA 8 support
- Dask job example
- Use terraform variables
- Support for other cloud providers (OpenStack, GCE, etc.)