Previous work done in this space
Normalisation rules challenge: https://www.kaggle.com/headsortails/watch-your-language-update-feature-engineering/report
From one of the textAV event problem domains: notes from a group focused on defining specs for an STT benchmarking tool.
The Speech Recognition Benchmark is a program that assesses and compares the performance of automated speech recognition (ASR) APIs. It runs on Mac OS X, Microsoft Windows and Ubuntu. It currently supports the following ASR APIs: Amazon Lex, Google, Google Cloud, Houndify, IBM Watson, Microsoft (a.k.a. Bing), Speechmatics and Wit.
Made in Vancouver, Canada by Picovoice
This is a minimalist and extensible framework for benchmarking different speech-to-text engines. It has been developed and tested on Ubuntu with Python 3.
This issue in their repository would suggest the system has WER functionality.
Tried out a few word-level diff libraries; as expected, they all give slightly different results.
Asked Dylan from AssemblyAI how they compute their WER; he said:
We do a couple of things to compare. When we have the ground truth, we look for the WER using this WER algorithm
We also normalize the text by lowercasing everything, removing all punctuation, and converting numbers to written form (eg, "7" -> "seven") because different engines return numbers differently. Some write them out (like us) and some transform them to symbol format.
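A minimal sketch of that normalization step in Python; the regexes and the use of the num2words package are my assumptions, not AssemblyAI's actual code:

```python
import re

from num2words import num2words  # third-party: pip install num2words

def normalize(text: str) -> str:
    """Lowercase, spell out digits, and strip punctuation before WER comparison."""
    text = text.lower()
    # Convert digit runs to written form, e.g. "7" -> "seven".
    text = re.sub(r"\d+", lambda m: num2words(int(m.group())), text)
    # Replace anything that is not a word character, whitespace or apostrophe.
    text = re.sub(r"[^\w\s']", " ", text)
    # Collapse the repeated whitespace left behind by the removals.
    return " ".join(text.split())

print(normalize("I bought 7 apples, OK?"))  # -> "i bought seven apples ok"
```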
WER algorithm - Word Error Rate Calculation
Blog post about BBC R&D's STT benchmarking tool. There is interest in open-sourcing the scripts and sharing notes.
Levenshtein distance is a string metric for measuring the difference between two sequences. wikipedia
Used in the context of calculating WER.
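Concretely, WER is usually computed as the word-level Levenshtein distance divided by the number of reference words. A self-contained sketch (not the specific algorithm linked above):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

# "sat" -> "sit" (substitution) plus a dropped "the" (deletion): 2 errors / 6 words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 0.333...
```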
An efficient Javascript implementation of the Levenshtein algorithm with locale-specific collator support.
A JavaScript module which provides classes and functions for comparing sequences. It can be used for example, for comparing files, and can produce difference information in various formats, including context and unified diffs. Ported from Python's difflib module.
- difflib (Python standard library): provides classes and functions for comparing sequences. It can be used, for example, for comparing files, and can produce difference information in various formats, including HTML and context and unified diffs. For comparing directories and files, see also the filecmp module.
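For example, difflib's SequenceMatcher can report word-level edit operations directly; a quick sketch:

```python
import difflib

ref = "the cat sat on the mat".split()
hyp = "the cat sit on mat".split()

# get_opcodes() yields (tag, i1, i2, j1, j2) describing how to turn ref into hyp.
for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, ref, hyp).get_opcodes():
    print(tag, ref[i1:i2], hyp[j1:j2])
# equal   ['the', 'cat'] ['the', 'cat']
# replace ['sat'] ['sit']
# equal   ['on'] ['on']
# delete  ['the'] []
# equal   ['mat'] ['mat']
```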
Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.
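A quick usage sketch of the Python port (the import path below is from the PyPI diff-match-patch package and may differ in other ports):

```python
# pip install diff-match-patch
from diff_match_patch import diff_match_patch

dmp = diff_match_patch()
diffs = dmp.diff_main("the cat sat on the mat", "the cat sit on mat")
dmp.diff_cleanupSemantic(diffs)  # merge trivial edits into human-readable chunks
for op, text in diffs:           # op: -1 = delete, 0 = equal, 1 = insert
    print(op, repr(text))
```

Note that Diff Match Patch diffs at the character level, which is one reason different diff libraries give slightly different word-level results.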
- sclite: alignment engine used to "align" errorful hypothesized texts, such as output from an ASR system, to the correct reference texts. After alignment, sclite generates a variety of summary as well as detailed scoring reports. Bundled with the CMU-Cambridge Statistical Language Modeling Toolkit v2; the toolkit is used to compute word weights based on an N-gram language model.
- sc_stats: compares system performance between more than one system. Inter-system comparisons are made by running paired-comparison statistical significance tests.
- ROVER: combines ASR system outputs into a composite word transition network, which is then searched and scored to retrieve the best-scoring word sequence.
The Multi-Genre Broadcast (MGB) Challenge is an evaluation of speech recognition, speaker diarization, dialect detection and lightly supervised alignment using TV recordings in English and Arabic.
A React component to make correcting automated transcriptions of audio and video easier and faster. By BBC News Labs.
- GitHub repository
- Demo (click "load demo")