Previous work done in this space

EyalLavi edited this page Apr 24, 2019 · 16 revisions

Normalisation rules challenge: https://www.kaggle.com/headsortails/watch-your-language-update-feature-engineering/report

TextAV 2018

In one of the TextAV event's problem domains, a group focused on defining specs for an STT benchmarking tool.

Franck-Dernoncourt/ASR_benchmark

The Speech Recognition Benchmark is a program that assesses and compares the performance of automated speech recognition (ASR) APIs. It runs on Mac OS X, Microsoft Windows and Ubuntu. It currently supports the following ASR APIs: Amazon Lex, Google, Google Cloud, Houndify, IBM Watson, Microsoft (a.k.a. Bing), Speechmatics and Wit.

Picovoice/stt-benchmark

Made in Vancouver, Canada by Picovoice

This is a minimalist and extensible framework for benchmarking different speech-to-text engines. It has been developed and tested on Ubuntu with Python3.

Mozilla DeepSpeech

This issue in their repository would suggest the system has a WER component/functionality.

pietrop/STT-Services-comparator

Tried out a few word level diff libraries - as expected they all give slightly different results.

AssemblyAI Benchmarking script

Asked Dylan from AssemblyAI how they compute their WER - he said:

We do a couple of things to compare. When we have the ground truth, we look for the WER using this WER algorithm

We also normalize the text by lowercasing everything, removing all punctuation, and converting numbers to written form (eg, "7" -> "seven") because different engines return numbers differently. Some write them out (like us) and some transform them to symbol format.
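The normalisation steps Dylan describes can be sketched as below. This is an assumption of how such a pipeline might look, not AssemblyAI's actual code; the hand-rolled `number_to_words` only covers integers below 100 and is a stand-in for a proper number-to-words library.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_to_words(n: int) -> str:
    """Spell out integers below 100; larger numbers are left as digits."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    return str(n)

def normalise(text: str) -> str:
    """Lowercase, convert digits to words, strip punctuation."""
    text = text.lower()
    # Convert standalone digit runs to words before stripping punctuation.
    text = re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text)
    # Keep only letters, digits, whitespace and hyphens.
    text = re.sub(r"[^a-z0-9\s-]", "", text)
    return " ".join(text.split())

print(normalise("Call me at 7, OK?"))  # -> "call me at seven ok"
```

With both the reference and each engine's hypothesis passed through the same `normalise`, "7" and "seven" no longer count as an error against each other.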

WER algorithm - Word Error Rate Calculation
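For reference, the common textbook formulation of WER is the word-level Levenshtein (edit) distance between reference and hypothesis, divided by the number of reference words. This is a minimal sketch of that formulation, not necessarily the exact algorithm linked above:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit-distance table.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
print(wer("the cat sat on the mat", "the cat sit on mat"))  # -> 2/6 ≈ 0.333
```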

BBC R&D STT benchmarking

Blog post about the BBC R&D STT benchmarking tool. There is interest in open-sourcing the scripts and sharing notes.

fast-levenshtein

Levenshtein distance is a string metric for measuring the difference between two sequences. wikipedia

Used in the context of calculating WER.

An efficient JavaScript implementation of the Levenshtein algorithm with locale-specific collator support.
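fast-levenshtein works at the character level; the same metric is easy to sketch in Python (shown here rather than JavaScript, to match the other examples on this page) using two rolling rows instead of a full matrix:

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[len(b)]

print(levenshtein("kitten", "sitting"))  # -> 3
```

Run at the character level it measures spelling differences; run over lists of words (as in the WER sketch above) it yields word-level error counts.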

difflib

A JavaScript module which provides classes and functions for comparing sequences. It can be used, for example, for comparing files, and can produce difference information in various formats, including context and unified diffs. Ported from Python's difflib module, which additionally produces HTML diffs; for comparing directories and files, see also Python's filecmp module.

google/diff-match-patch

Diff Match Patch is a high-performance library in multiple languages that manipulates plain text.

  • sclite: alignment engine used to "align" errorful hypothesised texts, such as output from an ASR system, to the correct reference texts. After alignment, sclite generates a variety of summary and detailed scoring reports. Bundled with the CMU-Cambridge Statistical Language Modeling Toolkit v2, which is used to compute word weights based on an N-gram language model.
  • sc_stats: compares performance between two or more systems. Inter-system comparisons are made by running paired-comparison statistical significance tests.
  • Rover: combines ASR system outputs into a composite Word Transition Network, which is then searched and scored to retrieve the best-scoring word sequence.

The Multi-Genre Broadcast (MGB) Challenge is an evaluation of speech recognition, speaker diarization, dialect detection and lightly supervised alignment using TV recordings in English and Arabic.

@bbc/react-transcript-editor

A React component to make correcting automated transcriptions of audio and video easier and faster. By BBC News Labs.
