Move things & add few pages #212

Open · wants to merge 2 commits into master
17 changes: 17 additions & 0 deletions .readthedocs.yml
@@ -0,0 +1,17 @@
version: 2

# Set the OS, Python version and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.9"

# Build documentation in the "docs/" directory with Sphinx
sphinx:
  configuration: docs/conf.py

python:
  install:
    - method: pip
      path: .
    - requirements: docs/requirements.txt
5 changes: 5 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,5 @@
{
  "githubPullRequests.ignoredPullRequestBranches": [
    "master"
  ]
}
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
49 changes: 49 additions & 0 deletions docs/Contributing/Design.rst
@@ -0,0 +1,49 @@
Design
======

Milabench aims to simulate research workloads for benchmarking purposes.

* Performance is measured as throughput (samples per second).
  For example, for a model like ResNet the throughput would be images per second.

* Single-GPU workloads are spawned once per GPU to ensure the entire machine is used,
  simulating something similar to a hyperparameter search.
  The performance of the benchmark is the sum of the throughput of each process
  (e.g. 8 GPUs at 300 images per second each yield a score of 2400 images per second);
  see the sketch after this list.

* Multi-GPU workloads

* Multi-node workloads
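
Conceptually, the single-GPU setup behaves like the sketch below: one copy of the
workload per GPU, each reporting its own throughput. This is a simplification
(``train.py`` here is a hypothetical script that prints its samples per second);
milabench orchestrates the real processes internally.

.. code-block:: bash

   # Launch one copy of the workload per GPU
   for gpu in 0 1 2 3 4 5 6 7; do
       CUDA_VISIBLE_DEVICES=$gpu python train.py > "throughput_$gpu.log" &
   done
   wait

   # The benchmark score is the sum of the per-process throughputs
   cat throughput_*.log | awk '{sum += $1} END {print sum, "samples/sec total"}'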


Run
===

* Milabench manager process

  * Handles messages from benchmark processes
  * Saves messages into a file for future analysis

* Benchmark processes

  * Run using ``voir``
  * voir is configured to intercept and send events during the training process
  * This allows us to add models from git repositories without modification
  * voir sends data through a file descriptor created by the milabench main process


What milabench is
=================

* Training focused

  * milabench shows candid performance numbers
  * No optimization beyond batch size scaling is performed
  * We want to measure the performance our researchers will see,
    not the performance they could get

* PyTorch centric

  * PyTorch has become the de facto library for research
  * We are looking for accelerators with good maturity that can support
    this framework with limited code changes


What milabench is not
=====================

* milabench's goal is not to be a performance showcase for an accelerator.
File renamed without changes.
File renamed without changes.
161 changes: 44 additions & 117 deletions docs/docker.rst → docs/GettingStarted/Docker.rst
@@ -3,6 +3,40 @@ Docker

`Docker Images <https://github.com/mila-iqia/milabench/pkgs/container/milabench>`_ are created for each release. They come with all the benchmarks installed and the necessary datasets. No additional downloads are necessary.


Setup
------

1. Make sure the machines can ssh to each other without passwords
2. Pull the milabench docker image you would like to run on all machines

   - ``docker pull``

3. Create the output directory

   - ``mkdir -p results``

4. Create a list of the nodes that will participate in the benchmark inside a ``results/system.yaml`` file (see example below)

   - ``vi results/system.yaml``

5. Call milabench, specifying the node list we created (a full command is sketched after the example below)

   - ``docker ... -v $(pwd)/results:/milabench/envs/runs -v <privatekey>:/milabench/id_milabench milabench run ... --system /milabench/envs/runs/system.yaml``


.. code-block:: yaml

   system:
     sshkey: <privatekey>
     arch: cuda
     docker_image: ghcr.io/mila-iqia/milabench:${system.arch}-nightly

     nodes:
       - name: node1
         ip: 192.168.0.25
         main: true
         port: 8123
         user: <username>

       - name: node2
         ip: 192.168.0.26
         main: false
         user: <username>
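
Putting the pieces together, the command in step 5 expands to something like the
following sketch (the key path, results directory, and image tag are examples to
adapt; it mirrors the multi-node command shown further below):

.. code-block:: bash

   export SSH_KEY_FILE=$HOME/.ssh/id_rsa
   export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:cuda-nightly

   docker run -it --rm --gpus all --network host --ipc=host --privileged \
     -v $SSH_KEY_FILE:/milabench/id_milabench \
     -v $(pwd)/results:/milabench/envs/runs \
     $MILABENCH_IMAGE \
     milabench run --system /milabench/envs/runs/system.yaml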

CUDA
----

@@ -22,16 +56,19 @@ storing the results inside the ``results`` folder on the host machine:

.. code-block:: bash

   export SSH_KEY_FILE=$HOME/.ssh/id_rsa

   # Choose the image you want to use
   export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:cuda-nightly

   # Pull the image we are going to run
   docker pull $MILABENCH_IMAGE

   # Run milabench
   docker run -it --rm --ipc=host --gpus=all --network host --privileged \
     -v $SSH_KEY_FILE:/milabench/id_milabench \
     -v $(pwd)/results:/milabench/envs/runs \
     $MILABENCH_IMAGE \
     milabench run

``--ipc=host`` removes shared memory restrictions, but you can also set ``--shm-size`` to a high value instead (at least ``8G``, possibly more).
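
For example, the run command above could drop ``--ipc=host`` in favour of an
explicit shared-memory size (a sketch; the size your benchmarks need may be higher):

.. code-block:: bash

   docker run -it --rm --shm-size=8G --gpus=all --network host --privileged \
     -v $SSH_KEY_FILE:/milabench/id_milabench \
     -v $(pwd)/results:/milabench/envs/runs \
     $MILABENCH_IMAGE \
     milabench run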
@@ -63,16 +100,19 @@ For ROCM the usage is similar to CUDA, but you must use a different image and th

.. code-block:: bash

   export SSH_KEY_FILE=$HOME/.ssh/id_rsa

   # Choose the image you want to use
   export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:rocm-nightly

   # Pull the image we are going to run
   docker pull $MILABENCH_IMAGE

   # Run milabench
   docker run -it --rm --ipc=host --network host --privileged \
     --device=/dev/kfd --device=/dev/dri \
     --security-opt seccomp=unconfined --group-add video \
     -v $SSH_KEY_FILE:/milabench/id_milabench \
     -v /opt/amdgpu/share/libdrm/amdgpu.ids:/opt/amdgpu/share/libdrm/amdgpu.ids \
     -v /opt/rocm:/opt/rocm \
     -v $(pwd)/results:/milabench/envs/runs \
     $MILABENCH_IMAGE \
     milabench run
@@ -90,119 +130,6 @@ For the performance report, it is the same command:
.. code-block:: bash

   milabench report --runs /milabench/envs/runs


Multi-node benchmark
^^^^^^^^^^^^^^^^^^^^

There are currently two multi-node benchmarks: ``opt-1_3b-multinode`` (data-parallel) and
``opt-6_7b-multinode`` (model-parallel; that model is too large to fit on a single GPU). Here is how to run them:

1. Make sure the machines can ssh to each other without passwords
2. Pull the milabench docker image you would like to run on all machines

   - ``docker pull``

3. Create the output directory

   - ``mkdir -p results``

4. Create a list of the nodes that will participate in the benchmark inside a ``results/system.yaml`` file (see example below)

   - ``vi results/system.yaml``

5. Call milabench, specifying the node list we created

   - ``docker ... -v $(pwd)/results:/milabench/envs/runs -v <privatekey>:/milabench/id_milabench milabench run ... --system /milabench/envs/runs/system.yaml``

.. note::

   The main node is the node that will be in charge of managing the other worker nodes.

.. code-block:: yaml

   system:
     sshkey: <privatekey>
     arch: cuda
     docker_image: ghcr.io/mila-iqia/milabench:${system.arch}-nightly

     nodes:
       - name: node1
         ip: 192.168.0.25
         main: true
         port: 8123
         user: <username>

       - name: node2
         ip: 192.168.0.26
         main: false
         user: <username>


Then, the command should look like this:

.. code-block:: bash

   # On the manager node:

   # Change if needed
   export SSH_KEY_FILE=$HOME/.ssh/id_rsa
   export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:cuda-nightly

   docker run -it --rm --gpus all --network host --ipc=host --privileged \
     -v $SSH_KEY_FILE:/milabench/id_milabench \
     -v $(pwd)/results:/milabench/envs/runs \
     $MILABENCH_IMAGE \
     milabench run --system /milabench/envs/runs/system.yaml \
     --select multinode

The last line (``--select multinode``) specifically selects the multi-node benchmarks. Omit that line to run all benchmarks.

If you need to use more than two nodes, edit or copy ``system.yaml`` and simply add the other nodes' addresses under ``nodes``.
You will also need to update the benchmark definition and increase the maximum number of nodes by creating a new ``overrides.yaml`` file.

For example, for 4 nodes:


.. code-block:: yaml

   # Name of the benchmark. You can also override values in other benchmarks.
   opt-6_7b-multinode:
     num_machines: 4


.. code-block:: yaml

   system:
     arch: cuda
     docker_image: ghcr.io/mila-iqia/milabench:${system.arch}-nightly

     nodes:
       - name: node1
         ip: 192.168.0.25
         main: true
         port: 8123
         user: <username>

       - name: node2
         ip: 192.168.0.26
         main: false
         user: <username>

       - name: node3
         ip: 192.168.0.27
         main: false
         user: <username>

       - name: node4
         ip: 192.168.0.28
         main: false
         user: <username>


The command would then look like this:

.. code-block:: bash

   docker ... milabench run ... --system /milabench/envs/runs/system.yaml --overrides /milabench/envs/runs/overrides.yaml
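
Expanded with the same mounts and environment variables as the earlier runs, that
might look like the sketch below (illustrative; adapt the paths and image tag):

.. code-block:: bash

   docker run -it --rm --gpus all --network host --ipc=host --privileged \
     -v $SSH_KEY_FILE:/milabench/id_milabench \
     -v $(pwd)/results:/milabench/envs/runs \
     $MILABENCH_IMAGE \
     milabench run --system /milabench/envs/runs/system.yaml \
     --overrides /milabench/envs/runs/overrides.yaml \
     --select multinode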


.. note::

   The multi-node benchmark is sensitive to network performance. If the mono-node benchmark ``opt-6_7b`` is significantly faster than ``opt-6_7b-multinode`` (e.g. it processes more than twice the items per second), this likely indicates that InfiniBand is either not present or not used. (It is not abnormal for the multi-node benchmark to perform *a bit* worse than the mono-node benchmark, since it has not been optimized to minimize the impact of communication costs.)

   Even if InfiniBand is properly configured, the benchmark may fail to use it unless the ``--privileged`` flag is set when running the container.
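
One quick way to confirm that InfiniBand hardware is visible on each node is the
sketch below; it assumes the ``infiniband-diags`` package is installed on the host:

.. code-block:: bash

   # List InfiniBand devices and their ports
   ibstat

   # A usable fabric shows an Active state and a LinkUp physical state
   ibstat | grep -E "State|Rate"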


Building images
---------------
