
Glossary

EyalLavi edited this page Dec 17, 2018 · 5 revisions

ASR

Automatic speech recognition. In this context, synonymous with STT.

Diarisation / speaker diarisation

A combination of speaker segmentation and speaker clustering. Segmentation finds the points in an audio stream where the speaker changes. Clustering groups speech segments together on the basis of speaker characteristics.

Diff / Diff tool

The process or tool for comparing two text files and presenting the deletions, insertions and replacements between them. Diff is a processing step in determining WER. Diff tools may use different algorithms and so produce different results.
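As an illustration only, a word-level diff between a ground truth and a provider's results could be sketched with Python's standard `difflib` module. The function name `word_diff` and the sample sentences are hypothetical, and splitting on whitespace is an assumed (very naive) tokenisation. `difflib` uses its own matching heuristic, so another diff tool may report a different set of operations for the same pair of texts.

```python
import difflib


def word_diff(reference: str, hypothesis: str):
    """Return the non-matching word-level operations between two transcripts.

    Each item is (tag, reference_words, hypothesis_words), where tag is one of
    'delete', 'insert' or 'replace' as reported by difflib.
    """
    # Assumption: whitespace splitting is good enough for this illustration.
    ref = reference.split()
    hyp = hypothesis.split()

    # autojunk=False turns off difflib's "popular element" heuristic.
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)

    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            ops.append((tag, ref[i1:i2], hyp[j1:j2]))
    return ops


if __name__ == "__main__":
    for op in word_diff("the cat sat on the mat", "the cat sat on mat"):
        print(op)
```

In practice, both texts would normally be normalised (case, punctuation, number formatting) before diffing. That step is left out here.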

Ground truth

In the context of STT, a high-accuracy transcript against which the results of the STT provider are compared. Usually prepared manually.

Hypothesis

A machine learning term. Here, a synonym for 'results'.

Metric

A measurement applied to the final transcript returned by the provider or to the process of transcription. Represents a dimension of difference between providers.

Provider / STT provider

A system, service or tool that provides speech-to-text capability.

Reference

A machine learning term. Here, a synonym for 'ground truth'.

Results

In the context of STT, the transcript returned by the provider for an audio file.

Speaker recognition

Recognising a real-world speaker from their voice.

STT

Speech-to-text. A loose term for automatic transcription systems. Other terms may describe specific technical functions.

Test set / test data

In the context of STT, audio-visual files with a corresponding transcript against which the results of STT providers are evaluated.

Vendor / STT vendor

A commercial speech-to-text provider.

WER

Word Error Rate. A commonly-used (but coarse) metric to evaluate the accuracy of machine transcription.
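WER is conventionally computed as (substitutions + deletions + insertions) divided by the number of words in the reference, using the minimum number of word-level edits. The sketch below shows one way to calculate it. The function name `word_error_rate` is hypothetical, whitespace tokenisation is an assumption, and no text normalisation is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    # Assumption: words are separated by whitespace; no normalisation is done.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.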
