Merge pull request pytorch#255 from jeffxtang/pocket_fft
Version 2 of the Streaming ASR app
jeffxtang authored Jul 7, 2022
2 parents 86aff27 + d9ad95c commit 8e2700a
Showing 11 changed files with 4,228 additions and 645 deletions.
6 changes: 0 additions & 6 deletions StreamingASR/CMakeLists.txt

This file was deleted.

39 changes: 14 additions & 25 deletions StreamingASR/README.md
@@ -6,9 +6,9 @@ In the Speech Recognition Android [demo app](https://github.com/pytorch/android-

 ## Prerequisites

-* PyTorch 1.11 and torchaudio 0.11 or above (Optional)
+* PyTorch 1.12 and torchaudio 0.12 or above (Optional)
 * Python 3.8 (Optional)
-* Android Pytorch library org.pytorch:pytorch_android_lite:1.11.0
+* Android Pytorch library org.pytorch:pytorch_android_lite:1.12.2
 * Android Studio 4.0.1 or later

 ## Quick Start
@@ -22,39 +22,32 @@ git clone https://github.com/pytorch/android-demo-app
 cd android-demo-app/StreamingASR
 ```

-If you don't have PyTorch 1.11 and torchaudio 0.11 installed or want to have a quick try of the demo app, you can download the optimized scripted model file [streaming_asr.ptl](https://drive.google.com/file/d/1awT_1S6H5IXSOOqpFLmpeg0B-kQVWG2y/view?usp=sharing), then drag and drop it to the `StreamingASR/app/src/main/assets` folder inside `android-demo-app/StreamingASR`, and continue to Step 3.
-
-Also you need to download [Eigen](https://eigen.tuxfamily.org/), a C++ template library for linear algebra, for Android NDK build required to run the app (see last section of this README for more info):
-```
-mkdir external; cd external
-git clone https://github.com/jeffxtang/eigen
-```
+If you don't have PyTorch 1.12 and torchaudio 0.12 installed or want to have a quick try of the demo app, you can download the optimized scripted model file [streaming_asrv2.ptl](https://drive.google.com/file/d/1XRCAFpMqOSz5e7VP0mhiACMGCCcYfpk-/view?usp=sharing), then drag and drop it to the `StreamingASR/app/src/main/assets` folder inside `android-demo-app/StreamingASR`, and continue to Step 3.

 ### 2. Test and Prepare the Model

-To install PyTorch 1.11, torchaudio 0.11, and other required Python packages (numpy and pyaudio), do something like this:
+To install PyTorch 1.12, torchaudio 0.12, and other required packages (numpy, pyaudio, and fairseq), do something like this:

 ```
-conda create -n pt1.11 python=3.8.5
-conda activate pt1.11
-pip install torch torchaudio numpy pyaudio
+conda create -n pt1.12 python=3.8.5
+conda activate pt1.12
+pip install torch torchaudio numpy pyaudio fairseq
 ```

-Now download the streaming ASR model file
-[scripted_wrapper_tuple_no_transform.pt](https://drive.google.com/file/d/1_49DwHS_a3p3THGdHZj3TXmjNJj60AhP/view?usp=sharing) to the `android-demo-app/StreamingASR` directory.
+First, create the model file `scripted_wrapper_tuple.pt` by running `python generate_ts.py`.

-To test the model, run `python run_sasr.py`. After you see:
+Then, to test the model, run `python run_sasr.py`. After you see:
 ```
 Initializing model...
 Initialization complete.
 ```
-say something like "good afternoon happy new year", and you'll likely see the streaming recognition results `good afternoon happy new year` while you speak. Hit Ctrl-C to end.
+say something like "good afternoon happy new year", and you'll likely see the streaming recognition results `good afternoon happy new year` while you speak. Hit Ctrl-C to end.

-To optimize and convert the model to the format that can run on Android, run the following commands:
+Finally, to optimize and convert the model to the format that can run on Android, run the following commands:
 ```
 mkdir -p StreamingASR/app/src/main/assets
 python save_model_for_mobile.py
-mv streaming_asr.ptl StreamingASR/app/src/main/assets
+mv streaming_asrv2.ptl StreamingASR/app/src/main/assets
 ```

 ### 3. Build and run with Android Studio
@@ -67,10 +60,6 @@ Start Android Studio, open the project located in `android-demo-app/StreamingASR

 ## Librosa C++, Eigen, and JNI

-Note that this demo uses a [C++ port](https://github.com/ewan-xu/LibrosaCpp/) of [Librosa](https://librosa.org), a popular audio processing library in Python, to perform the MelSpectrogram transform. In the Python script `run_sasr.py` above, the torchaudio's [MelSpectrogram](https://pytorch.org/audio/stable/transforms.html#melspectrogram) is used, but you can achieve the same transform result by replacing `spectrogram = transform(tensor).transpose(1, 0)`, line 46 of run_sasr.py with:
-```
-mel = librosa.feature.melspectrogram(np_array, sr=16000, n_fft=400, n_mels=80, hop_length=160)
-spectrogram = torch.tensor(mel).transpose(1, 0)
-```
+The first version of this demo uses a [C++ port](https://github.com/ewan-xu/LibrosaCpp/) of [Librosa](https://librosa.org), a popular audio processing library in Python, to perform the MelSpectrogram transform, because torchaudio before version 0.11 doesn't support fft on Android (see [here](https://github.com/pytorch/audio/issues/408)). Using the Librosa C++ port and [JNI](https://developer.android.com/training/articles/perf-jni) (Java Native Interface) on Android makes the MelSpectrogram possible on Android. Furthermore, the Librosa C++ port requires [Eigen](https://eigen.tuxfamily.org/), a C++ template library for linear algebra, so both the port and the Eigen library are included in the first version of the demo app and built as JNI.

-Because torchaudio currently doesn't support fft on Android (see [here](https://github.com/pytorch/audio/issues/408)), using the Librosa C++ port and [JNI](https://developer.android.com/training/articles/perf-jni) (Java Native Interface) on Android makes the MelSpectrogram possible on Android. Furthermore, the Librosa C++ port requires [Eigen](https://eigen.tuxfamily.org/), a C++ template library for linear algebra, so both the port and the Eigen library are included in the demo app and built as JNI, using the `CMakeLists.txt` and `MainActivityJNI.cpp` in `StreamingASR/app/src/main/cpp`.
+See [here](https://github.com/jeffxtang/android-demo-app/tree/librosa_jni/StreamingASR) for the first version of the demo if interested in an example of using native C++ to expand operations not yet supported in PyTorch or one of its domain libraries.
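
Reviewer note: the README text removed in this hunk quotes the MelSpectrogram parameters `sr=16000, n_fft=400, n_mels=80, hop_length=160`. The sketch below only works out the frame arithmetic those values imply. It is not code from the repository: the helper names are hypothetical, and it assumes centered framing, which is the default in both torchaudio's `MelSpectrogram` and librosa's `melspectrogram`.

```python
# Parameters quoted in the removed README snippet.
SAMPLE_RATE = 16000   # samples per second
N_FFT = 400           # FFT window length in samples
HOP_LENGTH = 160      # stride between successive frames in samples
N_MELS = 80           # mel filterbank channels per frame


def window_ms(n_fft=N_FFT, sample_rate=SAMPLE_RATE):
    """Duration of one analysis window in milliseconds."""
    return 1000.0 * n_fft / sample_rate


def hop_ms(hop_length=HOP_LENGTH, sample_rate=SAMPLE_RATE):
    """Time step between successive frames in milliseconds."""
    return 1000.0 * hop_length / sample_rate


def num_frames(n_samples, hop_length=HOP_LENGTH):
    """Frame count, assuming centered framing (center=True)."""
    return 1 + n_samples // hop_length


def spectrogram_shape(n_samples):
    """Shape after the transpose(1, 0) in the snippet: (frames, mel bins)."""
    return (num_frames(n_samples), N_MELS)


if __name__ == "__main__":
    print(window_ms(), hop_ms())            # window and hop in ms
    print(spectrogram_shape(SAMPLE_RATE))   # shape for one second of audio
```

Under these assumptions each frame covers 25 ms of audio with a 10 ms step, so one second of 16 kHz audio yields a 101 x 80 feature matrix after the transpose.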
16 changes: 1 addition & 15 deletions StreamingASR/StreamingASR/app/build.gradle
@@ -13,13 +13,6 @@ android {
 versionName "1.0"

 testInstrumentationRunner "androidx.test.runner.AndroidJUnitRunner"
-
-externalNativeBuild {
-cmake {
-cppFlags ""
-arguments "-DLOGGER_BUILD_HEADER_LIB=ON", "-DBUILD_TESTING=OFF"
-}
-}
 }

 buildTypes {
@@ -32,13 +25,6 @@ android {
 sourceCompatibility JavaVersion.VERSION_1_8
 targetCompatibility JavaVersion.VERSION_1_8
 }
-
-externalNativeBuild {
-cmake {
-path "../../CMakeLists.txt"
-version "3.10.2"
-}
-}
 }

 dependencies {
@@ -50,5 +36,5 @@ dependencies {
 androidTestImplementation 'androidx.test.ext:junit:1.1.3'
 androidTestImplementation 'androidx.test.espresso:espresso-core:3.4.0'

-implementation 'org.pytorch:pytorch_android_lite:1.11'
+implementation 'org.pytorch:pytorch_android_lite:1.12.2'
 }
5 changes: 0 additions & 5 deletions StreamingASR/StreamingASR/app/src/main/cpp/CMakeLists.txt

This file was deleted.

92 changes: 0 additions & 92 deletions StreamingASR/StreamingASR/app/src/main/cpp/MainActivityJNI.cpp

This file was deleted.
