Documentation improvements (#764)
* minimal example for enabling Andrej's runner, from commit 2d47702

* Minimal example

* Documentation improvements to align the documentation with changes in the code base
mikekgfb authored and malfet committed Jul 17, 2024
1 parent b6a09aa commit b165142
Showing 2 changed files with 22 additions and 12 deletions.
12 changes: 11 additions & 1 deletion README.md
@@ -250,9 +250,19 @@ Now, follow the app's UI guidelines to pick the model and tokenizer files from t

### Deploy and run on Android

**This section is copied from the original README and may require additional integration work**

Please refer to our [tutorial on how to build an Android app running
your PyTorch models with
ExecuTorch](https://pytorch.org/executorch/main/llm/llama-demo-android.html)
for an example of how to run your torchchat models on Android.

![Screenshot](https://pytorch.org/executorch/main/_static/img/android_llama_app.png
"Android app running Llama model")

For a detailed step-by-step walkthrough, in conjunction with the
ExecuTorch (ET) Android build, see `scripts/android_example.sh`, which
runs a model on an Android simulator (on Mac), and `docs/Android.md`.



22 changes: 11 additions & 11 deletions docs/ADVANCED-USERS.md
@@ -206,8 +206,8 @@ which are not available for exported DSO and PTE models.

## Eval

For an introduction to the model evaluation tool `eval`, please see
the introductory README.

In addition to running eval on models in eager mode (optionally
compiled with `torch.compile()`), you can also load DSO and PTE models
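
As a rough sketch (not torchchat's actual eval code; the model and
shapes below are stand-ins), eager-mode evaluation with optional
compilation looks like this:

```python
import torch

# Stand-in model; torchchat's eval would load a real checkpoint instead.
model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8)
model.eval()

# Optional: compile the model for faster evaluation.
compiled = torch.compile(model)

x = torch.randn(16, 2, 512)  # dummy (seq_len, batch, d_model) input
with torch.no_grad():
    out = compiled(x)
print(out.shape)  # torch.Size([16, 2, 512])
```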
@@ -276,11 +276,16 @@ to achieve this.

### Visualizing the backend delegate on ExecuTorch export

By default, export will lower to the XNNPACK delegate for improved
performance. ExecuTorch export provides APIs to visualize what happens
after the `to_backend()` call in the lowering process.

- `get_delegation_info()`: provides a summary of the model after the
  `to_backend()` call, including the total delegated subgraphs, the
  number of delegated nodes, and the number of non-delegated nodes.

- `format_delegated_graph`: returns a formatted string of the whole
  graph, as well as the subgraph(s) consumed by the backend.
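
A minimal sketch of using these two APIs follows; the import paths and
the `edge_manager` variable are assumptions (an `EdgeProgramManager` on
which `to_backend()` has already been called), and exact module
locations may differ across ExecuTorch releases:

```python
# Sketch only: import paths are assumed and may vary between releases.
from executorch.devtools.backend_debug import get_delegation_info
from executorch.exir.backend.utils import format_delegated_graph

# `edge_manager` is assumed to be an EdgeProgramManager produced during
# export, after to_backend() has lowered parts of the graph to XNNPACK.
graph_module = edge_manager.exported_program().graph_module

# Summary: delegated subgraphs, delegated nodes, non-delegated nodes.
info = get_delegation_info(graph_module)
print(info.get_summary())

# Full graph, including the subgraph(s) consumed by the backend.
print(format_delegated_graph(graph_module))
```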

See the
[debug backend delegate documentation](https://pytorch.org/executorch/main/debug-backend-delegate.html)
@@ -319,11 +324,6 @@ python3 generate.py --dtype [bf16 | fp16 | fp32] ...
python3 export.py --dtype [bf16 | fp16 | fp32] ...
```

**Unlike gpt-fast, which uses bfloat16 as its default, torchchat uses
float32 as the default. As a consequence, you will have to pass
`--dtype bf16` or `--dtype fp16` on server / desktop for best
performance.**
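
As an illustrative sketch (not torchchat code), the effect of `--dtype`
is akin to casting model weights before generation or export:

```python
import torch

# Stand-in layer; imagine a full model checkpoint loaded in the fp32 default.
model = torch.nn.Linear(4096, 4096)

# --dtype bf16 corresponds conceptually to casting the weights down:
model = model.to(dtype=torch.bfloat16)  # half the memory of fp32, and
                                        # typically faster on modern hardware
```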

You can find instructions for quantizing models in
[docs/quantization.md](file:///./quantization.md). Advantageously,
quantization is available in eager mode as well as during export,
