Commit

Doc

muellerzr committed Aug 24, 2023
1 parent 430ee9d commit 39ffc3a
Showing 2 changed files with 127 additions and 0 deletions.
2 changes: 2 additions & 0 deletions docs/source/_toctree.yml
@@ -23,6 +23,8 @@
title: Example Zoo
- local: usage_guides/big_modeling
title: How to perform inference on large models with small resources
- local: usage_guides/model_size_estimator
title: Knowing how big of a model you can fit into memory
- local: usage_guides/quantization
title: How to quantize model
- local: usage_guides/distributed_inference
125 changes: 125 additions & 0 deletions docs/source/usage_guides/model_size_estimator.md
@@ -0,0 +1,125 @@
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# Understanding how big of a model can fit on your machine

One very difficult aspect of exploring potential models to use on your machine is knowing just how big of a model will *fit* into memory on your current graphics card.

To help alleviate this, 🤗 Accelerate has a CLI interface available through `accelerate estimate-memory`. This tutorial will
walk you through how to use it, what to expect, and at the end link to the interactive demo hosted on the 🤗 Hub which will
even let you post those results directly on the model repo!

Currently we support searching for models that can be used in `timm` and `transformers`.

## The Command

When using `accelerate estimate-memory`, you need to pass in the name of the model you want to use, potentially the framework
that model utilizes (if it can't be found automatically), and the data types you want the model to be loaded in with.

For example, here is how we can calculate the memory footprint for `bert-base-cased`:

```bash
accelerate estimate-memory bert-base-cased
```

This will download the `config.json` for `bert-base-cased`, load the model on the `meta` device, and report back how much space
it will use:

```
┌────────────────────────────────────────────────────┐
│ Memory Usage for loading `bert-base-cased` │
├───────┬─────────────┬──────────┬───────────────────┤
│ dtype │Largest Layer│Total Size│Training using Adam│
├───────┼─────────────┼──────────┼───────────────────┤
│float32│ 84.95 MB │413.18 MB │ 1.61 GB │
│float16│ 42.47 MB │206.59 MB │ 826.36 MB │
│ int8 │ 21.24 MB │103.29 MB │ 413.18 MB │
│ int4 │ 10.62 MB │ 51.65 MB │ 206.59 MB │
└───────┴─────────────┴──────────┴───────────────────┘
```
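
If you want to reproduce this kind of estimate from Python, the snippet below is a rough sketch of the same idea (not the library's exact implementation): the model is built from its config on the `meta` device so no weights are actually allocated, and the parameter sizes are simply summed up. The result should land in the same ballpark as the `float32` "Total Size" row above.

```python
# A rough sketch of the idea behind `accelerate estimate-memory`
# (not the exact implementation used by the CLI).
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("bert-base-cased")  # only needs config.json
with init_empty_weights():
    model = AutoModel.from_config(config)  # weights live on the `meta` device, no real memory used

total_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"Total size in float32: {total_bytes / 1024**2:.2f} MB")
```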

By default, it will return all the supported dtypes (`int4` through `float32`), but if you are only interested in specific ones, they can be filtered.
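
The other rows follow directly from the `float32` footprint, since each dtype simply stores fewer bytes per parameter. As a back-of-the-envelope illustration (the bytes-per-parameter values below are our own assumptions for this sketch, the CLI computes all of this for you):

```python
# Back-of-the-envelope dtype scaling of a float32 footprint.
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int8": 1.0, "int4": 0.5}

def scaled_size_mb(float32_size_mb: float, dtype: str) -> float:
    """Scale a float32 model footprint (in MB) down to another dtype."""
    return float32_size_mb * BYTES_PER_PARAM[dtype] / BYTES_PER_PARAM["float32"]

# e.g. starting from the ~413.18 MB float32 size of `bert-base-cased` above
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {scaled_size_mb(413.18, dtype):.2f} MB")
```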

### Specific libraries

If the source library cannot be determined automatically (as it could be in the case of `bert-base-cased`), a library name can
be passed in.

```bash
accelerate estimate-memory HuggingFaceM4/idefics-80b-instruct --library_name transformers
```

```
┌────────────────────────────────────────────────────────────────────┐
│ Memory Usage for loading `HuggingFaceM4/idefics-80b-instruct`      │
├───────┬─────────────┬──────────┬───────────────────────────────────┤
│ dtype │Largest Layer│Total Size│ Training using Adam │
├───────┼─────────────┼──────────┼───────────────────────────────────┤
│float32│ 3.02 GB │297.12 GB │ 1.16 TB │
│float16│ 1.51 GB │148.56 GB │ 594.24 GB │
│ int8 │ 772.52 MB │ 74.28 GB │ 297.12 GB │
│ int4 │ 386.26 MB │ 37.14 GB │ 148.56 GB │
└───────┴─────────────┴──────────┴───────────────────────────────────┘
```

```bash
accelerate estimate-memory timm/resnet50.a1_in1k --library_name timm
```

```
┌────────────────────────────────────────────────────┐
│ Memory Usage for loading `timm/resnet50.a1_in1k` │
├───────┬─────────────┬──────────┬───────────────────┤
│ dtype │Largest Layer│Total Size│Training using Adam│
├───────┼─────────────┼──────────┼───────────────────┤
│float32│ 9.0 MB │ 97.7 MB │ 390.78 MB │
│float16│ 4.5 MB │ 48.85 MB │ 195.39 MB │
│ int8 │ 2.25 MB │ 24.42 MB │ 97.7 MB │
│ int4 │ 1.12 MB │ 12.21 MB │ 48.85 MB │
└───────┴─────────────┴──────────┴───────────────────┘
```

### Specific dtypes

As mentioned earlier, while we return `int4` through `float32` by default, any combination of `float32`, `float16`, `int8`, and `int4` can be used.

To do so, pass them in after specifying `--dtypes`:

```bash
accelerate estimate-memory bert-base-cased --dtypes float32 float16
```

```
┌────────────────────────────────────────────────────┐
│ Memory Usage for loading `bert-base-cased` │
├───────┬─────────────┬──────────┬───────────────────┤
│ dtype │Largest Layer│Total Size│Training using Adam│
├───────┼─────────────┼──────────┼───────────────────┤
│float32│ 84.95 MB │413.18 MB │ 1.61 GB │
│float16│ 42.47 MB │206.59 MB │ 826.36 MB │
└───────┴─────────────┴──────────┴───────────────────┘
```

## Caveats with this calculator

This calculator will tell you how much memory is needed to purely load the model in, *not* to perform inference.

In general, you can expect to need up to an additional 20% on top of this number, as found by [EleutherAI](https://blog.eleuther.ai/transformer-math/). We'll be conducting research into finding a more accurate estimate for these values, and will update
this calculator once that is done.
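
As a quick rule of thumb based on that ~20% figure (our own illustration, not an official formula from the library), you can budget inference memory like this:

```python
# Rough rule of thumb for inference memory: loading footprint plus a ~20%
# cushion (an assumption for illustration, not an official formula).
def rough_inference_mb(load_size_mb: float, overhead: float = 0.20) -> float:
    """Loading footprint plus a ~20% cushion for inference."""
    return load_size_mb * (1 + overhead)

# `bert-base-cased` loads at ~206.59 MB in float16, so inference needs roughly:
print(f"~{rough_inference_mb(206.59):.0f} MB")
```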

## Live Gradio Demo

Lastly, we invite you to try the [live Gradio demo](https://huggingface.co/spaces/hf-accelerate/model-memory-usage) of this utility,
which includes an option to post a discussion thread on a model's repository with this data. Doing so helps make these numbers available to the community faster and lets other users know what you've learned!
