diff --git a/docs/guide_development.md b/docs/guide_development.md
index 60315ff..3c67cb5 100644
--- a/docs/guide_development.md
+++ b/docs/guide_development.md
@@ -36,9 +36,7 @@ Building an ad-hoc model is sooo easy with NNoM since most of the codes are auto
 ### What can NNoM provide to embedded engineers?
 
 It provides an **easy to use** and **easy to evaluate** inference tools for fast neural network development.
 
-As embedded engineers, we might not know well how does neural network work and how can we optimize it for the MCU.
-
-NNoM together with Keras can help you to start practising within half an hour. There is no need to learn other ML libs from scratch. Deployment can be done with one line of python code after you have trained a model using Keras.
+As embedded engineers, we might not know well how a neural network works or how to optimize it for the MCU. NNoM together with Keras can help you start practising within half an hour. There is no need to learn other ML libs from scratch, and deployment can be done with one line of Python code after you have trained a model using Keras.
 
 Other than building a model, NNoM also provides a set of evaluation methods. These evaluation methods will give the developer a layer-to-layer performance evaluation of the model.
@@ -73,6 +71,8 @@ NNoM currently only support 8bit weights and 8bit activations. The model will be
 The input data (activations) will need to be quantised then feed to the model.
+Please see any of the examples for how to quantise the input data.
+
 
 -----
 
 ## Optimization
 
@@ -96,7 +96,7 @@ Some of them can be further optimized by square shape, however, the optimization
 
 > Trick, if you keep the channel size is a multiple of 4, it should work in most of the case.
 
-If you are not sure whether the optimization is working, simply us the `model_stat()` in [Evaluation API](api_evaluation.md) to print the performance of each layer. The comparison will be shown in the following sections.
+If you are not sure whether the optimization is working, simply use `model_stat()` in the [Evaluation API](api_evaluation.md) to print the performance of each layer. The comparison will be shown in the following sections.
 
 Fully connected layers and pooling layers are less constrained.
 
@@ -133,9 +133,9 @@ In the table, layer 3 and 5 are both Convolution layer with input and output cha
 
 You can already see the efficiency difference. When input channel = 3, the convolution is performed by `arm_convolve_HWC_q7_RGB()`. This method is partially optimized since the input channel is not a multiple of 4, While Layer 3 and layer 5 are fully optimized. The efficiency difference is already huge (`0.36` vs `0.71/0.68`).
 
-To achieve high efficiency, you should keep both input channel = a multiple of 4 and output is a multiple of 2.
+To achieve high efficiency, you should keep the input channel a multiple of 4 and the output channel a multiple of 2.
 
-What does this number mean? You can estimate your runtime while designing your ad-hoc model.
+What does this number mean? You can use this number to estimate the largest model that will fit the target MCU.
 
 In typical applications:
 
@@ -158,11 +158,11 @@ Firstly, the model structure is printed during compiling in `model_compile()`, w
 
 Secondly, the runtime performance is printed by `model_stat()`.
 
-Thirdly, there is a set of `prediction_*()` APIs to validate a set of testing data and print out Top-K accuracy, confusion matrix.
+Thirdly, there is a set of `prediction_*()` APIs to validate a set of testing data and print out the Top-K accuracy, the confusion matrix and other info.
 
 ### An NNoM model
 
-This is what a typical model looks like in the `weights.h` or `model.h` or whatever you name is. These codes are generated by the script.
+This is what a typical model looks like in `weights.h` or `model.h` or whatever you name it. This code is generated by the script.
 In user's `main()`, call `nnom_model_create()` will create and compile the model.
 
 ~~~
@@ -236,7 +236,8 @@ Compling done in 179 ms
 
 It shows the run order, Layer names, activations, the output shape of the layer, the operation counts, the buffer size, and the memory block assignments.
 
-Later, it prints the maximum memory cost for each memory block. Since the memory block is shared between layers, the model only use 3 memory blocks, altogether gives a sum memory cost by `18144 Bytes`.
+Later, it prints the maximum memory cost for each memory block. Since the memory block is shared between layers, the model only
+uses 3 memory blocks, which altogether gives a sum memory cost of `18144 Bytes`.
 
 ### Runtime statistices
 
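For reference, the workflow these hunks document — quantise the input data, create the model with `nnom_model_create()`, then check per-layer performance with `model_stat()` — could look roughly like the sketch below on the MCU side. This is a minimal sketch rather than code from the repo: `INPUT_DEC`, `INPUT_LENGTH` and `get_sample()` are placeholders, and `nnom_input_data[]` is assumed to be the input buffer declared by the generated header.

~~~C
#include <stdint.h>
#include "nnom.h"
#include "weights.h"   /* generated by the script; provides nnom_model_create() */

/* Placeholders -- replace with your model's real values. INPUT_DEC is the
 * number of fractional bits (Q-format) of the input layer, which the
 * conversion script reports for your model. */
#define INPUT_DEC     7
#define INPUT_LENGTH  784

extern float get_sample(uint32_t index);   /* your own data source */

int main(void)
{
    nnom_model_t *model = nnom_model_create();   /* build and compile the model */

    while (1)
    {
        /* Quantise float input to 8-bit: scale by 2^INPUT_DEC and saturate. */
        for (uint32_t i = 0; i < INPUT_LENGTH; i++)
        {
            int32_t q = (int32_t)(get_sample(i) * (1 << INPUT_DEC));
            if (q > 127)  q = 127;
            if (q < -128) q = -128;
            nnom_input_data[i] = (int8_t)q;      /* input buffer from weights.h */
        }

        model_run(model);    /* one inference pass */
        model_stat(model);   /* print the per-layer runtime statistics */
    }
}
~~~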