Glossary
In this context, synonymous with STT.
A combination of speaker segmentation and speaker clustering. The first aims to find speaker change points in an audio stream. The second aims to group speech segments together on the basis of speaker characteristics.
The process or tool for comparing two text files and presenting the deletions, insertions and replacements between them. Diff is a processing step in determining WER. Diff tools may use different algorithms and so produce different results.
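As an illustration of what a diff step can look like, here is a minimal sketch using `difflib.SequenceMatcher` from Python's standard library. The function name `word_diff` and the choice to split on whitespace are assumptions made for this example, not part of any provider's tooling. Note that `SequenceMatcher` does not guarantee a minimal edit sequence, which is one reason different diff tools can report different results.

```python
from difflib import SequenceMatcher


def word_diff(reference: str, hypothesis: str):
    """Return the non-matching spans between two transcripts, word by word.

    Hypothetical helper for illustration: tokenisation is plain whitespace
    splitting, with no case or punctuation normalisation.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    # autojunk=False disables difflib's heuristic that ignores very frequent tokens.
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # tag is one of 'equal', 'replace', 'delete', 'insert'.
        if tag != "equal":
            changes.append((tag, ref_words[i1:i2], hyp_words[j1:j2]))
    return changes


if __name__ == "__main__":
    for change in word_diff("the cat sat on the mat", "the cat sat on a mat"):
        print(change)
```

Each returned tuple holds the operation, the affected reference words and the affected transcript words, so deletions have an empty third element and insertions an empty second element.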
In the context of STT, a high-accuracy transcript against which the results of the STT provider are compared. Usually prepared manually.
A machine learning term. Here, a synonym for 'results'.
A measurement applied to the final transcript returned by the provider or to the process of transcription. Represents a dimension of difference between providers.
A system, service or tool that provides speech-to-text capability.
A machine learning term. Here, a synonym for 'ground truth'.
In the context of STT, the transcript returned by the provider for an audio file.
Recognising a real-world speaker from their voice.
Speech-to-text. This is the loose term for automatic transcription systems. Other terms may describe specific technical functions.
In the context of STT, audio-visual files with a corresponding transcript against which the results of STT providers are evaluated.
A commercial speech-to-text provider.
Word Error Rate. A commonly-used (but coarse) metric to evaluate the accuracy of machine transcription.
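WER is conventionally computed as the number of word substitutions, deletions and insertions needed to turn the ground truth into the provider's transcript, divided by the number of words in the ground truth. The sketch below computes it with a word-level edit distance. The function name `word_error_rate` and the whitespace tokenisation are assumptions for illustration; real evaluations usually normalise case and punctuation first, and that choice changes the score.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration: words are split on whitespace
    and compared exactly, with no normalisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the transcript contains many inserted words, and it weights every word equally, which is part of why the metric is described as coarse.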