The goals of this project are:
- Evaluating the feasibility of deploying the DeepSpeech speech-to-text (STT) model on a device without GPU acceleration.
- Building a qualitative intuition for where the model's predictions succeed and where they fail.
- Building a quantitative picture of the same successes and failures (e.g. with Word Error Rate).
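
For readers unfamiliar with the Word Error Rate mentioned above, here is a minimal sketch of how it is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model's hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not necessarily the exact procedure used in this project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Hypothetical helper for illustration: words are split on whitespace,
    with no normalization of case or punctuation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deleted word out of six reference words, giving 1/6. Note that WER can exceed 1.0 when the hypothesis contains many inserted words.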
A PDF report is available here.