Documentation improvements (#764)
* minimal example for enabling Andrej's runner, from commit 2d47702

* Minimal example

* Documentation improvements to align the documentation with changes in the code base
mikekgfb authored and malfet committed Jul 17, 2024
1 parent b6a09aa commit b165142
Showing 2 changed files with 22 additions and 12 deletions.
12 changes: 11 additions & 1 deletion README.md
@@ -250,9 +250,19 @@ Now, follow the app's UI guidelines to pick the model and tokenizer files from t

### Deploy and run on Android

**This section is copied from the original README and may require additional integration work**

Please refer to our [tutorial on how to build an Android app running
your PyTorch models with
ExecuTorch](https://pytorch.org/executorch/main/llm/llama-demo-android.html)
for an example of how to run your torchchat models on Android.

![Screenshot](https://pytorch.org/executorch/main/_static/img/android_llama_app.png
"Android app running Llama model")

For a detailed step-by-step walkthrough, in conjunction with the
ExecuTorch (ET) Android build, see `scripts/android_example.sh`, which
runs a model on an Android simulator (on Mac), and `docs/Android.md`.



22 changes: 11 additions & 11 deletions docs/ADVANCED-USERS.md
@@ -206,8 +206,8 @@ which are not available for exported DSO and PTE models.

## Eval

For an introduction to the model evaluation tool `eval`, please see
the introductory README.

In addition to running eval on models in eager mode (optionally
compiled with `torch.compile()`), you can also load DSO and PTE models
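
As a rough sketch (not torchchat's actual eval code; the model and
shapes below are stand-ins), eager-mode evaluation with optional
compilation looks like this:

```python
import torch

# Stand-in model; torchchat's eval would load a real checkpoint instead.
model = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8)
model.eval()

# Optional: compile the model for faster evaluation.
compiled = torch.compile(model)

x = torch.randn(16, 2, 512)  # dummy (seq_len, batch, d_model) input
with torch.no_grad():
    out = compiled(x)
print(out.shape)  # torch.Size([16, 2, 512])
```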
@@ -276,11 +276,16 @@ to achieve this.

### Visualizing the backend delegate on ExecuTorch export

By default, export will lower to the XNNPACK delegate for improved
performance. ExecuTorch export provides APIs to visualize what happens
after the `to_backend()` call in the lowering process.

- `get_delegation_info()`: provides a summary of the model after the
  `to_backend()` call, including the total delegated subgraphs, the
  number of delegated nodes, and the number of non-delegated nodes.

- `format_delegated_graph`: returns a formatted string of the whole
  graph, as well as the subgraph(s) consumed by the backend.
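
A minimal sketch of using these two APIs follows; the import paths and
the `edge_manager` variable are assumptions (an `EdgeProgramManager` on
which `to_backend()` has already been called), and exact module
locations may differ across ExecuTorch releases:

```python
# Sketch only: import paths are assumed and may vary between releases.
from executorch.devtools.backend_debug import get_delegation_info
from executorch.exir.backend.utils import format_delegated_graph

# `edge_manager` is assumed to be an EdgeProgramManager produced during
# export, after to_backend() has lowered parts of the graph to XNNPACK.
graph_module = edge_manager.exported_program().graph_module

# Summary: delegated subgraphs, delegated nodes, non-delegated nodes.
info = get_delegation_info(graph_module)
print(info.get_summary())

# Full graph, including the subgraph(s) consumed by the backend.
print(format_delegated_graph(graph_module))
```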

See the
[debug backend delegate documentation](https://pytorch.org/executorch/main/debug-backend-delegate.html)
@@ -319,11 +324,6 @@ python3 generate.py --dtype [bf16 | fp16 | fp32] ...
python3 export.py --dtype [bf16 | fp16 | fp32] ...
```

**Unlike gpt-fast, which uses bfloat16 as its default, torchchat uses
float32 as the default. As a consequence, you will have to pass
`--dtype bf16` or `--dtype fp16` on server / desktop for best
performance.**
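
As an illustrative sketch (not torchchat code), the effect of `--dtype`
is akin to casting model weights before generation or export:

```python
import torch

# Stand-in layer; imagine a full model checkpoint loaded in the fp32 default.
model = torch.nn.Linear(4096, 4096)

# --dtype bf16 corresponds conceptually to casting the weights down:
model = model.to(dtype=torch.bfloat16)  # half the memory of fp32, and
                                        # typically faster on modern hardware
```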

You can find instructions for quantizing models in
[docs/quantization.md](file:///./quantization.md). Advantageously,
quantization is available in eager mode as well as during export,
