diff --git a/.readthedocs.yml b/.readthedocs.yml new file mode 100644 index 000000000..6e083ca7e --- /dev/null +++ b/.readthedocs.yml @@ -0,0 +1,17 @@ +version: 2 + +# Set the OS, Python version and other tools you might need +build: + os: ubuntu-22.04 + tools: + python: "3.9" + +# Build documentation in the "docs/" directory with Sphinx +sphinx: + configuration: docs/conf.py + +python: + install: + - method: pip + path: . + - requirements: docs/requirements.txt \ No newline at end of file diff --git a/.vscode/settings.json b/.vscode/settings.json new file mode 100644 index 000000000..6c2ff60b6 --- /dev/null +++ b/.vscode/settings.json @@ -0,0 +1,5 @@ +{ + "githubPullRequests.ignoredPullRequestBranches": [ + "master" + ] +} \ No newline at end of file diff --git a/docs/ref-pack.rst b/docs/API/ref-pack.rst similarity index 100% rename from docs/ref-pack.rst rename to docs/API/ref-pack.rst diff --git a/docs/sizer.rst b/docs/Contributing/BatchSizer.rst similarity index 100% rename from docs/sizer.rst rename to docs/Contributing/BatchSizer.rst diff --git a/docs/config.rst b/docs/Contributing/Config.rst similarity index 100% rename from docs/config.rst rename to docs/Contributing/Config.rst diff --git a/docs/dev-usage.rst b/docs/Contributing/Debugging.rst similarity index 100% rename from docs/dev-usage.rst rename to docs/Contributing/Debugging.rst diff --git a/docs/Contributing/Design.rst b/docs/Contributing/Design.rst new file mode 100644 index 000000000..cf802cfca --- /dev/null +++ b/docs/Contributing/Design.rst @@ -0,0 +1,49 @@ +Design +====== + +Milabench aims to simulate research workloads for benchmarking purposes. + +* Performance is measured as throughput (samples per second). + For example, for a model like resnet the throughput would be images per second. + +* Single GPU workloads are spawned per GPU to ensure the entire machine is used. + This simulates something similar to a hyperparameter search.
+ The performance of the benchmark is the sum of the throughput of each process. + +* Multi GPU workloads + +* Multi Nodes + + +Run +=== + +* Milabench Manager Process + * Handles messages from benchmark processes + * Saves messages into a file for future analysis + +* Benchmark processes + * run using ``voir`` + * voir is configured to intercept and send events during the training process + * This allows us to add models from git repositories without modification + * voir sends data through a file descriptor that was created by the milabench main process + + +What milabench is +================= + +* Training focused +* milabench shows candid performance numbers + * No optimization beyond batch size scaling is performed + * we want to measure the performance our researchers will see, + not the performance they could get. +* PyTorch centric + * PyTorch has become the de facto library for research + * We are looking for accelerators with good maturity that can support + this framework with limited code changes. + + +What milabench is not +===================== + +* milabench's goal is not to be a performance showcase for an accelerator. diff --git a/docs/instrument.rst b/docs/Contributing/Instrumentation.rst similarity index 100% rename from docs/instrument.rst rename to docs/Contributing/Instrumentation.rst diff --git a/docs/new_benchmarks.rst b/docs/Contributing/NewBenchmark.rst similarity index 100% rename from docs/new_benchmarks.rst rename to docs/Contributing/NewBenchmark.rst diff --git a/docs/docker.rst b/docs/GettingStarted/Docker.rst similarity index 57% rename from docs/docker.rst rename to docs/GettingStarted/Docker.rst index 582ca95a6..22881acf2 100644 --- a/docs/docker.rst +++ b/docs/GettingStarted/Docker.rst @@ -3,6 +3,40 @@ Docker `Docker Images `_ are created for each release. They come with all the benchmarks installed and the necessary datasets. No additional downloads are necessary. + +Setup +------ + +0.
Make sure the machines can ssh to each other without passwords +1. Pull the milabench docker image you would like to run on all machines + - ``docker pull`` +2. Create the output directory + - ``mkdir -p results`` +3. Create a list of nodes that will participate in the benchmark inside a ``results/system.yaml`` file (see example below) + - ``vi results/system.yaml`` +4. Call milabench, specifying the node list we created. + - ``docker ... -v $(pwd)/results:/milabench/envs/runs -v :/milabench/id_milabench milabench run ... --system /milabench/envs/runs/system.yaml`` + + +.. code-block:: yaml + + system: + sshkey: + arch: cuda + docker_image: ghcr.io/mila-iqia/milabench:${system.arch}-nightly + + nodes: + - name: node1 + ip: 192.168.0.25 + main: true + port: 8123 + user: + + - name: node2 + ip: 192.168.0.26 + main: false + user: + CUDA ---- @@ -22,6 +56,8 @@ storing the results inside the ``results`` folder on the host machine: .. code-block:: bash + export SSH_KEY_FILE=$HOME/.ssh/id_rsa + # Choose the image you want to use export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:cuda-nightly @@ -29,9 +65,10 @@ storing the results inside the ``results`` folder on the host machine: docker pull $MILABENCH_IMAGE # Run milabench - docker run -it --rm --ipc=host --gpus=all \ - -v $(pwd)/results:/milabench/envs/runs \ - $MILABENCH_IMAGE \ + docker run -it --rm --ipc=host --gpus=all --network host --privileged \ + -v $SSH_KEY_FILE:/milabench/id_milabench \ + -v $(pwd)/results:/milabench/envs/runs \ + $MILABENCH_IMAGE \ milabench run ``--ipc=host`` removes shared memory restrictions, but you can also set ``--shm-size`` to a high value instead (at least ``8G``, possibly more). @@ -63,6 +100,8 @@ For ROCM the usage is similar to CUDA, but you must use a different image and th ..
code-block:: bash + export SSH_KEY_FILE=$HOME/.ssh/id_rsa + # Choose the image you want to use export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:rocm-nightly @@ -70,9 +109,10 @@ For ROCM the usage is similar to CUDA, but you must use a different image and th docker pull $MILABENCH_IMAGE # Run milabench - docker run -it --rm --ipc=host \ + docker run -it --rm --ipc=host --network host --privileged \ --device=/dev/kfd --device=/dev/dri \ --security-opt seccomp=unconfined --group-add video \ + -v $SSH_KEY_FILE:/milabench/id_milabench \ -v /opt/amdgpu/share/libdrm/amdgpu.ids:/opt/amdgpu/share/libdrm/amdgpu.ids \ -v /opt/rocm:/opt/rocm \ -v $(pwd)/results:/milabench/envs/runs \ @@ -90,119 +130,6 @@ For the performance report, it is the same command: milabench report --runs /milabench/envs/runs -Multi-node benchmark -^^^^^^^^^^^^^^^^^^^^ - -There are currently two multi-node benchmarks, ``opt-1_3b-multinode`` (data-parallel) and -``opt-6_7b-multinode`` (model-parallel, that model is too large to fit on a single GPU). Here is how to run them: - -0. Make sure the machine can ssh between each other without passwords -1. Pull the milabench docker image you would like to run on all machines - - ``docker pull`` -1. Create the output directory - - ``mkdir -p results`` -2. Create a list of nodes that will participate in the benchmark inside a ``results/system.yaml`` file (see example below) - - ``vi results/system.yaml`` -3. Call milabench with by specifying the node list we created. - - ``docker ... -v $(pwd)/results:/milabench/envs/runs -v :/milabench/id_milabench milabench run ... --system /milabench/envs/runs/system.yaml`` - -.. notes:: - - The main node is the node that will be in charge of managing the other worker nodes. - -.. 
code-block:: yaml - - system: - sshkey: - arch: cuda - docker_image: ghcr.io/mila-iqia/milabench:${system.arch}-nightly - - nodes: - - name: node1 - ip: 192.168.0.25 - main: true - port: 8123 - user: - - - name: node2 - ip: 192.168.0.26 - main: false - user: - - -Then, the command should look like this: - -.. code-block:: bash - - # On manager-node: - - # Change if needed - export SSH_KEY_FILE=$HOME/.ssh/id_rsa - export MILABENCH_IMAGE=ghcr.io/mila-iqia/milabench:cuda-nightly - docker run -it --rm --gpus all --network host --ipc=host --privileged \ - -v $SSH_KEY_FILE:/milabench/id_milabench \ - -v $(pwd)/results:/milabench/envs/runs \ - $MILABENCH_IMAGE \ - milabench run --system /milabench/envs/runs/system.yaml \ - --select multinode - -The last line (``--select multinode``) specifically selects the multi-node benchmarks. Omit that line to run all benchmarks. - -If you need to use more than two nodes, edit or copy ``system.yaml`` and simply add the other nodes' addresses in ``nodes``. -You will also need to update the benchmark definition and increase the max number of nodes by creating a new ``overrides.yaml`` file. - -For example, for 4 nodes: - - -.. code-block:: yaml - - # Name of the benchmark. You can also override values in other benchmarks. - opt-6_7b-multinode: - num_machines: 4 - - -.. code-block:: yaml - - system: - arch: cuda - docker_image: ghcr.io/mila-iqia/milabench:${system.arch}-nightly - - nodes: - - name: node1 - ip: 192.168.0.25 - main: true - port: 8123 - user: - - - name: node2 - ip: 192.168.0.26 - main: false - user: - - - name: node3 - ip: 192.168.0.27 - main: false - user: - - - name: node4 - ip: 192.168.0.28 - main: false - user: - - -The command would look like - -.. code-block:: bash - - docker ... milabench run ... --system /milabench/envs/runs/system.yaml --overrides /milabench/envs/runs/overrides.yaml - - -.. note:: - The multi-node benchmark is sensitive to network performance. 
If the mono-node benchmark ``opt-6_7b`` is significantly faster than ``opt-6_7b-multinode`` (e.g. processes more than twice the items per second), this likely indicates that Infiniband is either not present or not used. (It is not abnormal for the multinode benchmark to perform *a bit* worse than the mono-node benchmark since it has not been optimized to minimize the impact of communication costs.) - - Even if Infiniband is properly configured, the benchmark may fail to use it unless the ``--privileged`` flag is set when running the container. - - Building images --------------- diff --git a/docs/GettingStarted/Usage.rst b/docs/GettingStarted/Usage.rst new file mode 100644 index 000000000..cc76e3f43 --- /dev/null +++ b/docs/GettingStarted/Usage.rst @@ -0,0 +1,185 @@ + +Install and use +--------------- + +.. note:: + + You may use Docker to run the benchmarks, which will likely be easier. See the Docker section of this documentation for more information. + + +To install, clone the repo: + +.. code-block:: bash + + # You may need to upgrade pip + pip install pip -U + git clone git@github.com:mila-iqia/milabench.git + cd milabench + # + # Install in editable mode + pip install -e . + +This will install two commands, ``milabench`` and ``voir``. + + +Before running the benchmarks +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +1. Create a system configuration + + Milabench can run on multiple accelerators and multiple nodes. + The ``system.yaml`` file specifies setup-specific values that milabench will use to run the benchmarks. + You can find an example below. + + * ``sshkey``: path to a private key to use to ssh to the worker nodes + * ``arch``: GPU accelerator kind. This is optional; if not specified, milabench will try to deduce the backend. + It might be necessary if accelerators from multiple vendors are installed on the system. + * ``docker_image``: docker image to load up on worker nodes for multi-node benchmarks + * ``nodes``: A list of worker nodes that will run the benchmarks.
+ + .. note:: + + The main node is the node that will be in charge of managing the other worker nodes. + + .. code-block:: yaml + + system: + sshkey: + arch: cuda + docker_image: ghcr.io/mila-iqia/milabench:${system.arch}-nightly + + nodes: + - name: node1 + ip: 192.168.0.25 + main: true + port: 8123 + user: + + - name: node2 + ip: 192.168.0.26 + main: false + user: + + +2. Set the ``$MILABENCH_BASE`` environment variable to the base directory in which all the code, + virtual environments and data should be put. + + .. code-block:: text + + base/ # folder $MILABENCH_BASE + ├── extra # Cache and lock files + ├── data # Datasets used by benchmarks + ├── venv/ # Virtual environment created for each benchmark + │ └── torch + └── runs/ # Benchmark metrics + ├── run_name_1 + └── run_name_2 + + +3. Set the ``$MILABENCH_CONFIG`` environment variable to the configuration file that represents the benchmark suite you want to run. + Normally it should be set to ``config/standard.yaml``. + +4. ``milabench install --system system.yaml``: Install the individual benchmarks in virtual environments. + +5. ``milabench prepare --system system.yaml``: Download the datasets, weights, etc. + +If the machine has both NVIDIA/CUDA and AMD/ROCm GPUs, you may have to set the +``MILABENCH_GPU_ARCH`` environment variable as well, to either ``cuda`` or ``rocm``. + + +Run milabench +~~~~~~~~~~~~~ + +The following command will run the whole benchmark and will put the results in a new directory in ``$MILABENCH_BASE/runs`` (the path will be printed to stdout). + +.. code-block:: bash + + milabench run + +Here are a few useful options for ``milabench run``: + +..
code-block:: bash + + # Only run the bert benchmark + milabench run --system system.yaml --select bert + + # Run all benchmarks EXCEPT bert and stargan + milabench run --system system.yaml --exclude bert,stargan + + # Run the benchmark suite three times in a row + milabench run --system system.yaml --repeat 3 + + +Batch Resizing +^^^^^^^^^^^^^^ + +Milabench supports automatic batch resizing to accommodate different GPU memory capacities. +The feature is disabled by default and can be enabled using the environment variable ``MILABENCH_SIZER_AUTO``. +Additional constraints on memory usage can be set to test different conditions. + +.. code-block:: text + + MILABENCH_SIZER_BATCH_SIZE int # Override the batch size + MILABENCH_SIZER_AUTO False # Enable autoscaling from the GPU max memory + MILABENCH_SIZER_MULTIPLE int # Force the batch size to be a multiple of this value + MILABENCH_SIZER_OPTIMIZED int # Use configured batch + MILABENCH_SIZER_CAPACITY str # Override GPU max memory + + +We recommend the following constraints: + +.. code-block:: text + + export MILABENCH_SIZER_AUTO=true + export MILABENCH_SIZER_MULTIPLE=8 + + +Reports +~~~~~~~ + +The following command will print out a report of the tests that ran, the metrics and if there were any failures. It will also produce an HTML report that contains more detailed information about errors if there are any. + +.. code-block:: bash + + milabench report --runs $MILABENCH_BASE/runs/some_specific_run --html report.html + +The report will also print out a score based on a weighting of the metrics, as defined in the file that ``$MILABENCH_CONFIG`` points to. + + +..
code-block:: text + + ================= + Benchmark results + ================= + fail n perf sem% std% peak_memory score weight + bert-fp16 0 8 155.08 0.3% 4.3% 24552 1241.260310 0.00 + bert-fp32 0 8 29.52 0.0% 0.5% 31524 236.337218 0.00 + bert-tf32 0 8 120.46 0.4% 6.1% 31524 964.713297 0.00 + bert-tf32-fp16 0 8 154.76 0.3% 4.1% 24552 1238.477257 3.00 + convnext_large-fp16 0 8 337.48 0.9% 14.0% 27658 2741.604444 0.00 + convnext_large-fp32 0 8 44.61 0.8% 12.6% 49786 354.207225 0.00 + convnext_large-tf32 0 8 135.99 0.7% 11.2% 49786 1089.394916 0.00 + convnext_large-tf32-fp16 0 8 338.58 0.8% 13.0% 27658 2744.325170 3.00 + davit_large 0 8 312.79 0.3% 6.7% 35058 2515.326450 1.00 + davit_large-multi 0 1 2401.65 1.0% 7.7% 42232 2401.651720 5.00 + dlrm 0 1 188777.20 1.8% 14.0% 3194 188777.203190 1.00 + focalnet 0 8 400.47 0.2% 5.4% 26604 3215.431924 2.00 + opt-1_3b 0 1 26.71 0.1% 0.4% 44116 26.714365 5.00 + opt-1_3b-multinode 0 2 34.62 0.2% 1.0% 43552 34.618292 10.00 + opt-6_7b 0 1 14.32 0.0% 0.1% 55750 14.319587 5.00 + opt-6_7b-multinode 0 2 10.79 0.1% 0.7% 49380 10.792595 10.00 + reformer 0 8 61.70 0.0% 0.9% 25376 494.110834 1.00 + regnet_y_128gf 0 8 99.96 0.2% 5.0% 31840 803.012507 2.00 + resnet152 0 8 710.18 0.3% 6.2% 36732 5710.828608 1.00 + resnet152-multi 0 1 5367.34 1.0% 8.1% 38638 5367.338469 5.00 + resnet50 0 8 984.43 0.9% 19.1% 5026 7927.257351 1.00 + rwkv 0 8 428.65 0.2% 3.8% 5546 3435.097716 1.00 + stargan 0 8 51.32 1.8% 40.8% 37848 413.238870 1.00 + super-slomo 0 8 41.63 0.1% 2.3% 34082 332.395065 1.00 + t5 0 8 48.05 0.2% 3.9% 35466 384.317023 2.00 + whisper 0 8 248.16 0.0% 0.6% 37006 1985.861017 1.00 + + Scores + ------ + Failure rate: 0.00% (PASS) + Score: 219.06 diff --git a/docs/Welcome/Changelog.rst b/docs/Welcome/Changelog.rst new file mode 100644 index 000000000..7dc58dfe7 --- /dev/null +++ b/docs/Welcome/Changelog.rst @@ -0,0 +1,4 @@ +Changelog +========= + +TBD \ No newline at end of file diff --git a/docs/Welcome/Features.rst 
b/docs/Welcome/Features.rst new file mode 100644 index 000000000..ebf3b32e4 --- /dev/null +++ b/docs/Welcome/Features.rst @@ -0,0 +1,55 @@ +Features +======== + +* Non-intrusive instrumentation +* Validation Layers +* Automatic batch resizing +* Docker +* Hardware + * ROCm 5.7 + * NVIDIA +* Metrics gathering + * Performance throughput + * GPU util + * CPU util + * IO util + + +Benchmarks +---------- + +.. code-block:: text + + +--------------------------+-----------+-----------+-------------+-----------+-------------------+ + | Benchmark | Unit | Domain | Network | Focus | Task | + +==========================+===========+===========+=============+===========+===================+ + | bf16 | TFlops | Synthetic | | Training | | + | fp16 | TFlops | Synthetic | | Training | | + | tf32 | TFlops | Synthetic | | Training | | + | fp32 | TFlops | Synthetic | | Training | | + | bert-fp16 | | NLP | Transformer | Training | Language Modeling | + | bert-fp32 | | NLP | Transformer | Training | Language Modeling | + | bert-tf32 | | NLP | Transformer | Training | Language Modeling | + | bert-tf32-fp16 | | NLP | Transformer | Training | Language Modeling | + | opt-1_3b | | NLP | Transformer | Training | Language Modeling | + | opt-6_7b | | NLP | Transformer | Training | Language Modeling | + | reformer | | NLP | Transformer | Training | Language Modeling | + | rwkv | | NLP | RNN | Training | Language Modeling | + | llama | Token/sec | NLP | Transformer | Inference | Generation | + | dlrm | | NLP | | Training | Recommendation | + | convnext_large-fp16 | img/sec | Vision | Convolution | Training | Classification | + | convnext_large-fp32 | img/sec | Vision | Convolution | Training | Classification | + | convnext_large-tf32 | img/sec | Vision | Convolution | Training | Classification | + | convnext_large-tf32-fp16 | img/sec | Vision | Convolution | Training | Classification | + | davit_large | img/sec | Vision | Transformer | Training | Classification | + | focalnet | | Vision |
Convolution | Training | Classification | + | davit_large-multi | img/sec | Vision | Transformer | Training | Classification | + | regnet_y_128gf | img/sec | Vision | Convolution | Training | Classification | + | resnet152 | img/sec | Vision | Convolution | Training | Classification | + | resnet152-multi | img/sec | Vision | Convolution | Training | Classification | + | resnet50 | img/sec | Vision | Convolution | Training | Classification | + | stargan | img/sec | Vision | Convolution | Training | GAN | + | super-slomo | img/sec | Vision | Convolution | Training | | + | t5 | | NLP | Transformer | Training | | + | whisper | | Audio | | Training | | + +--------------------------+-----------+-----------+-------------+-----------+-------------------+ \ No newline at end of file diff --git a/docs/Welcome/Roadmap.rst b/docs/Welcome/Roadmap.rst new file mode 100644 index 000000000..5dbac9771 --- /dev/null +++ b/docs/Welcome/Roadmap.rst @@ -0,0 +1,10 @@ +Roadmap +======= + +* Cloud CI +* ROCm 6.0 - MI300 support +* GPU Max Series - 1550 support +* Evaluate suitability + * Tenstorrent + * Graphcore + * Cerebras diff --git a/docs/index.rst b/docs/index.rst index 3ac990fcf..dfd97c7b1 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -1,20 +1,38 @@ +Milabench +========= -Welcome to milabench's documentation! -===================================== .. toctree:: + :caption: News + :maxdepth: 1 + + Welcome/Features + Welcome/Roadmap + Welcome/Changelog + +.. toctree:: + :caption: Getting Started :maxdepth: 2 - :caption: Contents: - usage.rst - docker.rst - dev-usage.rst - new_benchmarks.rst - reference.rst - sizer.rst + GettingStarted/Usage + GettingStarted/Docker + +.. toctree:: + :caption: Contributing + :maxdepth: 1 + + Contributing/NewBenchmark + Contributing/Instrumentation + Contributing/BatchSizer + Contributing/Debugging + Contributing/Design + +.. 
toctree:: + :caption: API + :maxdepth: 1 + + API/ref-pack.rst -Indices and tables -================== * :ref:`genindex` * :ref:`modindex` diff --git a/docs/reference.rst b/docs/reference.rst deleted file mode 100644 index 3a8121c94..000000000 --- a/docs/reference.rst +++ /dev/null @@ -1,7 +0,0 @@ -Reference -========= - -.. toctree:: - :maxdepth: 3 - - ref-pack.rst diff --git a/docs/requirements.txt b/docs/requirements.txt new file mode 100644 index 000000000..cbf1e3658 --- /dev/null +++ b/docs/requirements.txt @@ -0,0 +1,2 @@ +sphinx +sphinx-rtd-theme diff --git a/docs/usage.rst b/docs/usage.rst deleted file mode 100644 index ecea88b75..000000000 --- a/docs/usage.rst +++ /dev/null @@ -1,71 +0,0 @@ - -Install and use ---------------- - -.. note:: - - You may use Docker to run the benchmarks, which will likely be easier. See the Docker section of this documentation for more information. - - -To install, clone the repo: - -.. code-block:: bash - - # You may need to upgrade pip - pip install pip -U - git clone git@github.com:mila-iqia/milabench.git - cd milabench - # - # Install in editable mode - pip install -e . - -This will install two commands, ``milabench`` and ``voir``. - - -Before running the benchmarks -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -1. Set the ``$MILABENCH_BASE`` environment variable to the base directory in which all the code, virtual environments and data should be put. - -2. Set the ``$MILABENCH_CONFIG`` environment variable to the configuration file that represents the benchmark suite you want to run. Normally it should be set to ``config/standard.yaml``. - -3. ``milabench install``: Install the individual benchmarks in virtual environments. - -4. ``milabench prepare``: Download the datasets, weights, etc. - -If the machine has both NVIDIA/CUDA and AMD/ROCm GPUs, you may have to set the ``MILABENCH_GPU_ARCH`` environment variable as well, to either ``cuda`` or ``rocm``. 
- - -Run milabench -~~~~~~~~~~~~~ - -The following command will run the whole benchmark and will put the results in a new directory in ``$MILABENCH_BASE/runs`` (the path will be printed to stdout). - -.. code-block:: bash - - milabench run - -Here are a few useful options for ``milabench run``: - -.. code-block:: bash - - # Only run the bert benchmark - milabench run --select bert - - # Run all benchmarks EXCEPT bert and stargan - milabench run --exclude bert,stargan - - # Run the benchmark suite three times in a row - milabench run --repeat 3 - - -Reports -~~~~~~~ - -The following command will print out a report of the tests that ran, the metrics and if there were any failures. It will also produce an HTML report that contains more detailed information about errors if there are any. - -.. code-block:: bash - - milabench report --runs $MILABENCH_BASE/runs/some_specific_run --html report.html - -The report will also print out a score based on a weighting of the metrics, as defined in the file ``$MILABENCH_CONFIG`` points to.