Merge pull request #109 from AmbiqAI/carlos/ap4-lite
Fixes occasional USB hang on 4.4.1, adds AP4 Lite support to AutoDeploy
carloshmorales authored Jul 27, 2023
2 parents fb97657 + 9d4377a commit cd48666
Showing 16 changed files with 307 additions and 89 deletions.
62 changes: 39 additions & 23 deletions README.md
@@ -1,13 +1,22 @@
# NeuralSPOT
NeuralSPOT is Ambiq's AI SDK. It is open-source, real-time, and OS-agnostic. NeuralSPOT is designed to help AI feature developers in 3 important ways:

1. **Initial development and fine-tuning of the AI model**: neuralSPOT has tools to rapidly [characterize the performance and size](./docs/From%20TF%20to%20EVB%20-%20testing,%20profiling,%20and%20deploying%20AI%20models.md) of a TFLite model on Ambiq processors.
2. **Rapid AI feature prototyping**: neuralSPOT's library of easy to use drivers, feature extractors, helper functions, and communication mechanisms accelerate the development of stand-alone AI feature applications to test the model in real-world situations with real-world data and latencies.
3. **AI model library export**: once an AI model has been developed and refined via prototyping, neuralSPOT allows one-click deployment of a static library implementing the AI model, suitable for linking into larger embedded applications.

![image-20230727151931018](./docs/images/image-20230727151931018.png)


NeuralSPOT wraps an AI-centric API around the AmbiqSuite SDK to ease common tasks such as sensing, computing features from the sensor data, performance profiling, and controlling Ambiq's many on-board peripherals.

![image-20220811095223908](./docs/images/image-20220811095223908.png)

NeuralSPOT's documentation is spread throughout the repository - generally, every component has its own documentation, which can be overwhelming. Please visit [`doc/`](https://github.com/AmbiqAI/neuralSPOT/tree/main/docs) for high level documents useful as a starting point for understanding neuralSPOT's overall structure and intended usage.

# Building and Deploying NeuralSPOT

NeuralSPOT's make system can be used in two ways:
1. **As the 'base of operations' for your AI development**. Intended for stand-alone EVB development, you can add new binary (axf) targets to the /examples directory.
2. **As a seed for adding NeuralSPOT to a larger project**. In this mode of operations, you would use NeuralSPOT to create a stub project (a "nest", described below) with everything needed to start running AI on EVBs.

@@ -23,43 +32,48 @@ All `make` invocations for NS must be done from the base directory ("nest" makes
| `make nest` | creates a minimal '[nest](#The_Nest)' *without* a basic main.cc stub file and without overwriting makefiles |
| `make nestcomponent` | updates a single component in the nest |
| `make deploy` | Uses jlink to deploy an application to a connected EVB |
| `make view` | Starts a SWO terminal interface |

Besides targets, NeuralSPOT has a standard set of compile-time switches to help you configure the build exactly the way you need. These are set via the normal make convention, e.g. `make BOARD=apollo4b`.

| Parameter | Description | Default |
| --------- | ----------- | ------- |
| BOARD | Defines the target SoC (currently apollo4b, apollo4p, or apollo4l) | apollo4p |
| EVB | Defines the EVB type (evb, blue_evb, blue_kxr_evb, or blue_kbr_evb) | evb |
| BINDIR | Name of directories where binaries and build artifacts are stored. | build |
| EXAMPLE | Name of example to be built. By default, all examples will be built. | All |
| NESTDIR | Relative path and directory name where nest will be created | nest |
| NESTCOMP | root path to a single component to be updated for `make nestcomponent` | extern/AmbiqSuite |
| NESTEGG | name of neuralspot example used to create nest | basic_tf_stub |
| AS_VERSION | AmbiqSuite Version | R4.4.1 |
| TF_VERSION | TensorFlow Lite for Microcontrollers Version | fecdd5d |
| TARGET | Defines what target will be loaded by `make deploy` | basic_tf_stub |
| MLDEBUG | Setting to '1' turns on TF debug prints and use debug TFLM | 0 |
| MLPROFILE | Setting to '1' enables TFLM profiling and logs (*NOTE* not supported for TF_VERSION R2.3.1) | 0 |
| AUDIO_DEBUG | Setting to '1' turns on RTT audio dump | 0 |

> **Note** Defaults for these values are set in `./make/neuralspot_config.mk`. Ambiq EVBs are available in a number of flavors, each of which requires slightly different config settings. For convenience, these settings can be placed in `./make/local_overrides.mk` (note that this file is ignored by git to prevent inadvertent overrides making it into the repo). To make changes to this file without tracking them in git, you can do the following:
> `$> git update-index --assume-unchanged make/local_overrides.mk`
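
For example, a minimal `make/local_overrides.mk` might look like the following. The variable names come from the parameter table above; the specific values shown are purely illustrative:

```shell
# Write an illustrative make/local_overrides.mk (values are examples only)
mkdir -p make
cat > make/local_overrides.mk <<'EOF'
# Local, untracked build settings for my bench setup
BOARD := apollo4p
EVB := blue_kxr_evb
AS_VERSION := R4.4.1
EOF
cat make/local_overrides.mk
```

Because the file is untracked, each developer can keep their own EVB flavor settings without touching `neuralspot_config.mk`.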
# NeuralSPOT Structure and Directories

NeuralSPOT consists of the neuralspot library, required external components, tools, and examples.

<img src="./docs/images/image-20230727151002947.png" alt="image-20230727151002947" style="zoom:50%;" />

The directory structure reflects the code organization:

```
/neuralspot  # Sensor, communications, and helper libraries
/extern      # External dependencies, including TF and AmbiqSuite
/examples    # Example applications, each of which can be compiled to a deployable binary
/projects    # Examples of how to integrate external projects such as EdgeImpulse models
/make        # Makefile helpers, including neuralspot-config.mk and local_overrides.mk
/tools       # AutoDeploy and RPC python-based tools
/tests       # Simple compatibility tests
/docs        # Introductory documents, guides, and release notes
```

# NeuralSPOT Theory of Operations

NeuralSPOT is an SDK for AI development on Ambiq products via an AI-friendly API. It offers a set of libraries for accessing hardware, pre-configured instances of external dependencies such as AmbiqSuite and TensorFlow Lite for Microcontrollers, and a handful of examples which compile into deployable binaries.

@@ -71,9 +85,11 @@ NeuralSPOT is continuously growing, and offers the following libraries today - f

1. `ns-audio`: [Library for sampling audio](neuralspot/ns-audio/ns-audio.md) from Ambiq's audio interfaces and sending it to an AI application via several IPC methods. This library also contains common audio-centric AI feature helpers such as configurable mel-spectrogram computation.
2. `ns-peripherals`: API for controlling Ambiq's power modes, performance modes, and helpers for commonly used I/O devices such as EVB buttons.
3. `ns-usb` and `ns-rpc`: libraries for talking to PC applications via remote procedure calls to ease cross-platform development.
4. `ns-ble`: an easy-to-use BLE wrapper for creating simple BLE-based demos such as WebBLE dashboards.
5. `ns-harness`: a simple harness for abstracting common AmbiqSuite code, meant to be replaced when NeuralSPOT is not being used by AmbiqSuite.
6. `ns-ipc`: Common mechanisms for presenting collected sensor data to AI applications
7. ... and many more

## The Nest

143 changes: 136 additions & 7 deletions docs/From TF to EVB - testing, profiling, and deploying AI models.md
@@ -21,9 +21,7 @@ Fortunately, all the information needed to automate the middle 2 steps is tucked

AutoDeploy is a tool that speeds up the AI/embedded iteration cycle by automating most of the tedious bits - given a TFLite file, the tool will convert it to code that can run on an Ambiq EVB, then run a series of tests to characterize its embedded behavior. It then generates a minimal static library implementing the model, suitable for easy integration into applications.



![image-20230727120409828](./images/image-20230727120409828.png)

### Pain Points

@@ -36,17 +34,148 @@ AutoDeploy was designed to address many common pain points:
- AutoDeploy runs the same model inputs on the PC and EVB, and compares the results, leading to early discovery and easier debugging of behavior differences. Configuration of the model, input/output tensors, and statistics gathering are driven by AutoDeploy using RPC.
- Model Performance Profiling
- AutoDeploy extends the TFLite for Microcontrollers Profiling to produce detailed reports including per-layer latency, MAC count, and cache and CPU performance statistics.
- Model Power Usage Profiling
- If a Joulescope is available, AutoDeploy can use it to automatically measure the model inference power consumption.


# Using AutoDeploy

Using AutoDeploy is easy - just give it a TFLite model, connect an EVB, and let it go to work:

```bash
$> cd tools
$> python -m ns_autodeploy --tflite-filename=mymodel.tflite --model-name mymodel --measure-power
```

As part of the process, AutoDeploy generates a number of artifacts, including three ready-to-deploy binary files and the source code used to generate them:

```bash
.../projects/autodeploy
./mymodel
./lib # minimal static library and API header
./src # tiny example that compiles the lib into an EVB image
./tflm_validator
./src # Highly instrumented model validation application (leverages USB and RPC)
./mymodel_power
./src # Power measurement application (requires Joulescope and GPIO connections)
mymodel.csv # Per-layer profile
mymodel_mc.pkl # artifact generated during validation, useful for non-USB EVBs
mymodel_md.pkl # artifact generated during validation, useful for non-USB EVBs

```
The `mymodel.csv` file contains a CSV representation of per-layer profiling stats. For a keyword-spotting (KWS) model, for example, the CSV contains the following:

| Event | Tag | uSeconds| Est MACs| cycles| cpi| exc| sleep| lsu| fold| daccess| dtaglookup| dhitslookup| dhitsline| iaccess| itaglookup| ihitslookup| ihitsline |
| ------| ------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------|------ |
|0| CONV_2D| 47794| 320000| 9176380| 141| 252| 0| 188| 166| 58335| 55777| 0| 2558| 1220789| 357761| 0| 863028 |
|1| DEPTHWISE_C| 23334| 72000| 4480072| 153| 54| 0| 200| 123| 30068| 25311| 0| 4757| 625707| 175380| 0| 450327 |
|2| CONV_2D| 44674| 512000| 8577368| 168| 102| 0| 96| 194| 81745| 80093| 0| 1652| 1087045| 309917| 0| 777128 |
|3| DEPTHWISE_C| 23337| 72000| 4480468| 74| 147| 0| 77| 126| 30072| 25311| 0| 4761| 625704| 175399| 0| 450305 |
|4| CONV_2D| 44671| 512000| 8578178| 1| 120| 0| 21| 195| 81736| 80070| 0| 1666| 1086915| 309928| 0| 776987 |
|5| DEPTHWISE_C| 23328| 72000| 4478896| 44| 99| 0| 225| 122| 30059| 25301| 0| 4758| 625584| 175343| 0| 450241 |
|6| CONV_2D| 44676| 512000| 8576362| 83| 135| 0| 27| 195| 81736| 80079| 0| 1657| 1086955| 309892| 0| 777063 |
|7| DEPTHWISE_C| 23336| 72000| 4478996| 85| 238| 0| 165| 118| 30067| 25313| 0| 4754| 625660| 175393| 0| 450267 |
|8| CONV_2D| 44675| 512000| 8577472| 118| 184| 0| 161| 196| 81745| 80092| 0| 1653| 1087073| 309922| 0| 777151 |
|9| AVERAGE_POO| 583| 0| 111772| 100| 169| 0| 39| 197| 202| 177| 0| 25| 60827| 1541| 0| 59286 |
|10| RESHAPE| 21| 0| 3748| 213| 74| 0| 219| 1| 31| 24| 0| 7| 422| 142| 0| 280 |
|11| FULLY_CONNE| 134| 768| 25644| 69| 76| 0| 249| 21| 275| 263| 0| 12| 3014| 911| 0| 2103 |
|12| SOFTMAX| 85| 0| 17574| 227| 231| 0| 24| 43| 102| 63| 0| 39| 2552| 717| 0| 1835 |
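
Because the profile is plain CSV, per-layer stats are easy to post-process with standard tools. As a sketch (assuming a header-less file whose third column is uSeconds, and using three rows from the KWS table above), total latency is just the column sum:

```shell
# Build a tiny three-row stats file from the KWS numbers above
cat > kws_stats.csv <<'EOF'
0,CONV_2D,47794
1,DEPTHWISE_C,23334
2,CONV_2D,44674
EOF
# Sum the uSeconds column (column 3) for total latency across these layers
awk -F, '{ total += $3 } END { printf "total: %d us\n", total }' kws_stats.csv
# prints "total: 115802 us"
```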

This information is collected from several sources, combining static analysis on the PC with dynamic profiling on the EVB. The dynamically collected statistics are:
1. The Tag and uSeconds columns come from TFLM's micro-profiler and are collected by TFLM during the first inference only
1. The cycles, cpi, exc, sleep, lsu, and fold columns come from Arm's ETM profiling registers and are measured during the first inference only
1. The daccess, dtaglookup, dhitslookup, dhitsline, iaccess, itaglookup, ihitslookup, and ihitsline columns come from Ambiq's cache profiling module and are measured during the first inference only

Estimated MACs are based on a static analysis of the TFLite file and are calculated as the theoretical MAC count for each layer's type and shape.
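
For a CONV_2D layer, for instance, the theoretical count is `out_h * out_w * out_ch * k_h * k_w * in_ch`. The shapes below are assumed for illustration (they correspond to the first convolution of a common DS-CNN-style KWS model) and reproduce the 320000 MACs reported for layer 0 in the table above:

```shell
# Theoretical MACs for one CONV_2D layer; shapes are assumed here,
# not extracted from any particular TFLite file.
out_h=25; out_w=5; out_ch=64   # output feature map: 25x5, 64 channels
k_h=10; k_w=4; in_ch=1         # 10x4 kernel over a single input channel
echo $(( out_h * out_w * out_ch * k_h * k_w * in_ch ))   # prints 320000
```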

## Measuring Power
AutoDeploy can use a Joulescope to measure power. It does so by:
1. Creating and deploying a power measurement binary to the EVB
2. Triggering a number of inference operations using the Joulescope's GP0out pin (which is monitored by the EVB)
3. Waiting for certain patterns on the Joulescope's GPIn pins to know when the inference code is running
4. Using Joulescope's python driver to sample power measurements during that time.

It does this twice: once at 96MHz and once at 192MHz.

### Requirements
1. A Joulescope
2. Connections between the Joulescope's GPIn and GPOut pins and the appropriate GPIO pins (see the wiring guide below)

### Connecting the Joulescope to the EVB
1. Power connections (follow the EVB power measurement connection guide)
2. EVB Pin 22 to Joulescope In0
3. EVB Pin 23 to Joulescope In1
4. EVB Pin 24 to Joulescope Out0

### Procedure
1. Plug in EVB and Joulescope
2. Start the Joulescope desktop application to power on the device
3. Stop the Joulescope desktop application in order to allow the Python driver to control the Joulescope instead
4. Run the power measurement script

*NOTE*: AutoDeploy needs characterization information to create the power binary - this data can be obtained by running AutoDeploy's profiling step (it will run by default),
or it can be loaded from a previous run (see the Apollo4 Lite section below for one scenario in which this is necessary).

### Result Output
```
Characterization Report for har:
[Profile] Per-Layer Statistics file: har_stats.csv
[Profile] Max Perf Inference Time (ms): 355.427
[Profile] Total Estimated MACs: 3465792
[Profile] Total CPU Cycles: 68241382
[Profile] Total Model Layers: 17
[Profile] MACs per second: 9751065.620
[Profile] Cycles per MAC: 19.690
[Power] Max Perf Inference Time (ms): 20.249
[Power] Max Perf Inference Energy (uJ): 189.257
[Power] Max Perf Inference Avg Power (mW): 9.346
[Power] Min Perf Inference Time (ms): 20.110
[Power] Min Perf Inference Energy (uJ): 190.665
[Power] Min Perf Inference Avg Power (mW): 9.481
Notes:
- Statistics marked with [Profile] are collected from the first inference, whereas [Power] statistics
are collected from the average of the 100 inferences. This will lead to slight
differences due to cache warmup, etc.
- CPU cycles are captured via Arm ETM traces
- MACs are estimated based on the number of operations in the model, not via instrumented code
```
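
The derived [Profile] figures follow directly from the raw values in the same report, so they can be sanity-checked by hand (the last decimal places may differ slightly because the reported milliseconds are rounded):

```shell
# Recompute the derived metrics from the raw report values above
cycles=68241382; macs=3465792; ms=355.427
awk -v c="$cycles" -v m="$macs" -v t="$ms" 'BEGIN {
  printf "Cycles per MAC: %.3f\n", c / m            # ~19.690
  printf "MACs per second: %.3f\n", m / (t / 1000)  # ~9751065.6
}'
```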


## Profiling Apollo4 Lite
Apollo4 Lite doesn't have a USB port, which is needed by AutoDeploy to fine-tune and profile the model. Furthermore, the data collected from these steps is needed when creating the power measurement binary and minimal library. In order to enable power measurement and library generation on Apollo4 Lite, AutoDeploy is capable of generating the required metadata by profiling the model on Apollo4 Plus and saving it for subsequent use on Apollo4 Lite.

### Requirements
To measure power and generate model libraries for Apollo4 Lite, you'll need:
1. An Apollo4P or Apollo4P Blue EVB
2. An Apollo4 Lite
3. A Joulescope (to measure power)

### Procedure
The overall procedure is:
1. Run characterization on Apollo4 Plus or Blue Plus EVB, which generates the required metadata
2. Switch the EVB to Apollo4 Lite
3. Run just the library generation and/or power measurement steps on Apollo4 Lite

Using the human activity recognition model as an example, this procedure translates to something like the following steps.

```bash
cd .../neuralSPOT/tools
# Plug in Apollo4 Plus or Blue Plus EVB
vi ../make/local_overrides.mk # uncomment the BOARD:=apollo4p line, comment out all other BOARD settings in this file

python -m ns_autodeploy --model-name har --tflite-filename har.tflite --runs 3 --no-create-library # small number of runs and skip creating the library to save time
# Wait for successful completion of script

ls ../projects/autodeploy/har
# check for existence of har_md.pkl and har_mc.pkl

# Switch the Apollo4P EVB for the Apollo4 Lite EVB, including Joulescope GPIO cables if needed

vi ../make/local_overrides.mk # uncomment the BOARD:=apollo4l line, comment out all other BOARD settings in this file

python -m ns_autodeploy --model-name har --tflite-filename har.tflite --runs 3 --measure-power --no-create-binary --no-create-profile # Skip fine-tuning and profiling steps on AP4 Lite
# Wait for successful completion of script
```
