Python example? #11
Comments
Please refer to the attached Colab, which uses Python code to run inference on a TFLite model. If I have the time, I will write a simple Python script to demonstrate this as well. In general, one way to run inference on a TFLite model from Python is to use the TensorFlow Lite Interpreter. Here is a sample snippet of how to do this:
# Import necessary packages
import numpy as np
import tensorflow as tf
# Load the TFLite model and allocate tensors
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Preprocess the input data (write the actual spectrogram data here)
input_data = np.random.randn(1, 256, 256, 3).astype(np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
# Run inference
interpreter.invoke()
# Obtain and postprocess the output
output_data = interpreter.get_tensor(output_details[0]['index'])
# Save and/or visualize the results
np.savetxt("output.txt", output_data)
|
Thanks, I'll check that out! 👍 |
Using your instructions I built a very naive example but haven't had much luck so far. Here is the code:
import wave
import tensorflow as tf
import numpy as np
audio_file="test_wavs/1089-134686-0001.wav"
print(f'Loading audio file: {audio_file}')
wf = wave.open(audio_file, "rb")
sample_rate_orig = wf.getframerate()
audio_length = wf.getnframes() * (1 / sample_rate_orig)
if (wf.getnchannels() != 1 or wf.getsampwidth() != 2
or wf.getcomptype() != "NONE" or sample_rate_orig != 16000):
print("Audio file must be WAV format mono PCM.")
exit (1)
input_data = np.frombuffer(wf.readframes(wf.getnframes()), np.int16)
#input_data = np.random.randn(1, 256, 256, 3)
print(f'Samplerate: {sample_rate_orig}, length: {audio_length}s')
print(f'Loading tflite model ...')
interpreter = tf.lite.Interpreter(model_path="models/whisper.tflite")
input_details = interpreter.get_input_details()
interpreter.resize_tensor_input(input_details[0]['index'], input_data.shape)
interpreter.allocate_tensors()
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
output_details = interpreter.get_output_details()
output_data = interpreter.get_tensor(output_details[0]['index'])
output_data = output_data.squeeze()
np.savetxt("output.txt", output_data) The error I get is (for both real and random data):
I'm using Python 3.9 on aarch64, with tensorflow 2.11.0, tflite 2.10.0, and numpy 1.24.0. |
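For reference, the example above feeds raw int16 PCM samples straight into the interpreter, while the model most likely expects a float32 log-mel spectrogram (as the Colab in the next comment shows). A minimal sketch of the missing preprocessing step, assuming the standard Whisper front end and a (1, 80, 3000) input shape; the model path is the same placeholder used above:
import numpy as np
import tensorflow as tf
from whisper.audio import log_mel_spectrogram, pad_or_trim, N_FRAMES
# Compute the log-mel spectrogram and pad/trim it to Whisper's 30-second window (3000 frames)
mel = pad_or_trim(log_mel_spectrogram("test_wavs/1089-134686-0001.wav"), N_FRAMES)
input_data = np.expand_dims(np.asarray(mel, dtype=np.float32), 0)  # shape (1, 80, 3000)
# Feed the spectrogram to the interpreter instead of the raw samples
interpreter = tf.lite.Interpreter(model_path="models/whisper.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()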
I have used Google Colab to test the code below. Please install the required tools/repo first:
!git lfs install
!git clone https://github.com/usefulsensors/openai-whisper.git
!pip install git+https://github.com/openai/whisper.git
and then run the code below to generate tokens:
import whisper
from whisper.audio import load_audio, log_mel_spectrogram,pad_or_trim,N_FRAMES, SAMPLE_RATE
import tensorflow as tf
import numpy as np
audio_file="/content/openai-whisper/samples/jfk.wav"
print(f'Loading audio file: {audio_file}')
mel_from_file = log_mel_spectrogram(audio_file)
input_data = pad_or_trim(mel_from_file, N_FRAMES)
input_data = tf.expand_dims(input_data, 0)
print(f'Loading tflite model ...')
# Load the TFLite model and allocate tensors
interpreter = tf.lite.Interpreter(model_path="/content/openai-whisper/models/whisper.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
print(input_data.shape)
interpreter.resize_tensor_input(input_details[0]['index'], input_data.shape)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
output_details = interpreter.get_output_details()
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)
Convert the tokens into text:
import torch
wtokenizer = whisper.tokenizer.get_tokenizer(False, language="en")
for token in output_data:
token[token == -100] = wtokenizer.eot
text = wtokenizer.decode(token, skip_special_tokens=True)
print(text) |
Thanks!
|
Please run the Colab link below. |
Ah sorry, I didn't see that you had removed the
I saw you've worked on a streaming version as well. I read that this will likely increase the WER due to the missing context (Whisper is trained on 30s windows if I remember correctly), but I'd like to try and see what happens ^^. Could you maybe give me a hint on how to adapt the basic demo to handle chunks instead of all the data at once? Is it even possible with this configuration? |
Hey @nyadla-sys, I was doing some tests today and wondered if it is possible to use the
I'm assuming this is because the input has the wrong data type:
|
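One way to verify that assumption is to ask the interpreter what it expects before setting the tensor. A generic sketch (the model path and the random stand-in input are placeholders, not the actual files used above):
import numpy as np
import tensorflow as tf
interpreter = tf.lite.Interpreter(model_path="models/whisper.tflite")  # placeholder path
interpreter.allocate_tensors()
details = interpreter.get_input_details()[0]
print("expected shape:", details['shape'], "expected dtype:", details['dtype'])
# Cast the prepared input to the expected dtype (and shape) before handing it over;
# a mismatch is what makes set_tensor() raise an error.
input_data = np.random.randn(*details['shape']).astype(details['dtype'])  # stand-in data
interpreter.set_tensor(details['index'], input_data)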
@fquirin Interesting! Did you get the Python code to work with the streaming version? Btw: I have run the inference on aarch64 on both an RPi3 and an RPi4 without problems, as seen in my benchmark issue. Since inference on the standard 11-second jfk.wav only takes about 5 seconds to transcribe, using this from Python together with a streaming mode could make for a very nice STT engine on embedded devices. Is your Python work available somewhere? |
I kind of gave up on streaming for the current version of Whisper. The model is just not built for it, and the workarounds all mess with the context and cut the audio at specific time intervals, which leads to all sorts of artifacts and WER problems, at least in my experience. I hope OpenAI will release a proper streaming model soon (same as Nvidia NeMo ^^).
For SEPIA I rarely need anything longer than 4s; unfortunately, that means the user experience is rather bad when you have to wait another 3-4s for the result (on an RPi4) AFTER you've finished speaking.
I quickly put together a new repository for ASR experiments with the test code 🙂. |
Currently I am seeing an issue with the TFLite converter for int8 whisper model conversion and am working with Google to resolve it. |
thanks for the info 👍 |
I am investigating the option of converting the encoder and decoder into separate TFLite models and will provide additional information later. |
@fquirin I have seen your performance reports at whisper.cpp. They are in line with what I did and reported for the RPi4 and RPi3. Also, I fully agree with you: streaming really has a PPP (Piss Poor Performance) WER. Just feeding the few-seconds WAV to the normal binary, which adds empty audio to the end up to 30 seconds, works surprisingly well, so no clue what is happening there. I guess the streaming mode throws away too much at the beginning. It really needs to keep all audio from beginning to end and just continue inference on the growing audio until VAD detection shuts it down. From that point one last inference on the final wav needs to happen. I just don't know if that is possible or if it already happens. @nyadla-sys Tried testing your last medium model, however as expected it does not fit into the 2 GB memory of the RPi4. It appears to work as it does not segfault like reported in that other issue. Could you also upload tflite-converted base and small models? I am able to load them into memory running whisper.cpp, so I guess the tflite ones should fit as well. Anyhow, great work! |
The way I understand it, non-streaming Transformer models need to see the whole input at once to reach their high accuracy, because they are trained on long context. In Whisper's case this seems to be a 30s window, meaning the model looks at all 30s at once and the first second will influence the last. That is also the reason for hallucinations: in a way the model recognizes a certain part and makes up the most probable rest, similar to LLMs that finish a story.
I've seen systems doing that, but for obvious reasons on high-performance machines like a Mac, where you can afford to transcribe the same data over and over again until your audio is complete or has reached the VAD stop signal. |
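For reference, a rough sketch of the grow-and-retranscribe approach described in the last two comments: audio is accumulated chunk by chunk, the whole buffer is re-transcribed each round, and a final pass runs once a VAD (here just a silence-energy placeholder) stops the loop. transcribe_buffer() and is_silence() are hypothetical stand-ins for the TFLite inference shown earlier and for a real VAD:
import numpy as np

SAMPLE_RATE = 16000

def transcribe_buffer(audio: np.ndarray) -> str:
    # Placeholder: run Whisper TFLite inference (as in the snippets above) on the
    # full buffer, padded/trimmed to the 30 s window, and return the decoded text.
    raise NotImplementedError

def is_silence(chunk: np.ndarray, threshold: float = 1e-3) -> bool:
    # Placeholder VAD: treat a low-energy chunk as the stop signal.
    return float(np.mean(chunk.astype(np.float32) ** 2)) < threshold

def pseudo_streaming(chunks):
    # chunks: iterable of float32 audio arrays coming from the microphone
    buffer = np.zeros(0, dtype=np.float32)
    text = ""
    for chunk in chunks:
        buffer = np.concatenate([buffer, chunk])   # keep all audio from the beginning
        text = transcribe_buffer(buffer)           # re-run inference on the growing buffer
        if is_silence(chunk):                      # VAD shuts the loop down...
            return transcribe_buffer(buffer)       # ...then one last pass on the final audio
    return text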
I will update base and small models ASAP |
Hi @nyadla-sys ,
this is very interesting work on OpenAI's Whisper 🙂👍.
I've built a multi-engine, streaming server for STT (SEPIA STT-Server) that runs on a Raspberry Pi and was thinking about Whisper integration a while ago, but didn't really follow up on it since Whisper is a non-streaming system by design. Then I saw your TFLite port and was wondering if it may be fast enough to get something like a pseudo-real-time experience ^^.
Since the SEPIA STT-Server is built on Python, I was wondering if you have a simple Python demo available? 🙂
Ty,
Florian