From b1651420f69865080e9186bbfb8f9c7bb5cf3c73 Mon Sep 17 00:00:00 2001
From: Michael Gschwind <61328285+mikekgfb@users.noreply.github.com>
Date: Sun, 12 May 2024 17:46:50 -0700
Subject: [PATCH] Documentation improvements (#764)

* minimal example for enabling Andrej's runner, from commit 2d477022986843bbe23b60ea0529cd5d2718377b

* Minimal example

* Documentation improvements to align documentation with changes in the code base
---
 README.md              | 12 +++++++++++-
 docs/ADVANCED-USERS.md | 22 +++++++++++-----------
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/README.md b/README.md
index 5cbeee088..d6d271103 100644
--- a/README.md
+++ b/README.md
@@ -250,9 +250,19 @@ Now, follow the app's UI guidelines to pick the model and tokenizer files from t
 
 ### Deploy and run on Android
 
+**This section is copied from the original README and may require additional integration work.**
 
+Please refer to our [tutorial on how to build an Android app running
+your PyTorch models with
+ExecuTorch](https://pytorch.org/executorch/main/llm/llama-demo-android.html)
+for an example of how to run your torchchat models on Android.
 
-MISSING. TBD.
+![Screenshot](https://pytorch.org/executorch/main/_static/img/android_llama_app.png
+ "Android app running Llama model")
+
+For detailed step-by-step instructions on running a model on an
+Android simulator (on Mac) in conjunction with the ExecuTorch Android
+build, see `scripts/android_example.sh` and `docs/Android.md`.
 
diff --git a/docs/ADVANCED-USERS.md b/docs/ADVANCED-USERS.md
index 910a0f597..ab87654a9 100644
--- a/docs/ADVANCED-USERS.md
+++ b/docs/ADVANCED-USERS.md
@@ -206,8 +206,8 @@ which are not available for exported DSO and PTE models.
 
 ## Eval
 
-For an introduction to the model evaluation tool `eval`, please see the introductory
-README.
+For an introduction to the model evaluation tool `eval`, please see
+the introductory README.
 
 In addition to running eval on models in eager mode (optionally
 compiled with `torch.compile()`), you can also load dso and pte models
@@ -276,11 +276,16 @@ to achieve this.
 
 ### Visualizing the backend delegate on ExecuTorch export
 
-By default, export will lower to the XNNPACK delegate for improved performance. ExecuTorch export
-provides APIs to visualize what happens after the `to_backend()` call in the lowering process.
+By default, export will lower to the XNNPACK delegate for improved
+performance. ExecuTorch export provides APIs to visualize what happens
+after the `to_backend()` call in the lowering process.
 
-- `get_delegation_info()`: provide a summary of the model after the `to_backend()` call, including the total delegated subgraphs, number of delegated nodes and number of non-delegated nodes.
-- `format_delegated_graph`: a formatted str of the whole graph, as well as the subgraph/s consumed by the backend.
+- `get_delegation_info()`: provides a summary of the model after the
+  `to_backend()` call, including the total delegated subgraphs, the
+  number of delegated nodes, and the number of non-delegated nodes.
+
+- `format_delegated_graph`: returns a formatted string of the whole
+  graph, as well as the subgraph(s) consumed by the backend.
 
 See the
 [debug backend delegate documentation](https://pytorch.org/executorch/main/debug-backend-delegate.html)
@@ -319,11 +324,6 @@ python3 generate.py --dtype [bf16 | fp16 | fp32] ...
 python3 export.py --dtype [bf16 | fp16 | fp32] ...
 ```
 
-**Unlike gpt-fast which uses bfloat16 as default, Torchchat uses
- float32 as the default. As a consequence you will have to set to
- `--dtype bf16` or `--dtype fp16` on server / desktop for best
- performance.**
-
 You can find instructions for quantizing models in
 [docs/quantization.md](file:///./quantization.md). Advantageously,
 quantization is available in eager mode as well as during export,
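
The ADVANCED-USERS.md hunk above names the two delegation-visualization APIs without showing them in use. Below is a minimal sketch of how they might be called after an ExecuTorch export, assuming the `executorch.exir.backend.utils` import path and an `edge_program` handle from the export flow; both come from the linked debug-backend-delegate documentation, not from this patch, and may differ between releases.

```python
# Minimal sketch: inspect XNNPACK delegation after ExecuTorch export.
# Assumptions (not part of this patch): `edge_program` is the edge
# program produced by the export flow after to_backend(), and the
# import path matches the ExecuTorch docs linked above.
from executorch.exir.backend.utils import (
    format_delegated_graph,
    get_delegation_info,
)

graph_module = edge_program.exported_program().graph_module

# Summary: total delegated subgraphs, number of delegated nodes,
# and number of non-delegated nodes.
delegation_info = get_delegation_info(graph_module)
print(delegation_info.get_summary())

# Formatted string of the whole graph, including the subgraph(s)
# consumed by the XNNPACK backend.
print(format_delegated_graph(graph_module))
```

Nodes that remain undelegated in this output fall back to ExecuTorch's portable kernels, which is usually the first thing to check when an exported model runs slower than expected.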