# Metrics
This is a list of possible metrics for assessing STT providers, in no particular order. Some apply only to streams or only to files. Some are useful when analysing a single asset; others are meaningful only in the context of a large and varied data set. All require further detail.
- Word Error Rate (WER; see the sketch after this list)
- Weighted WER (forthcoming?)
- NER (subjective)
- Timing accuracy
- Speaker change (voice changes)
- Speaker identification (words matched with speaker)
- Voice recognition (voice matched to a real-world person)
- Repeatability of results
- Punctuation: sentence boundaries, semantic phrases
- Capitalisation: sentence structure and proper nouns
- Ratio of processing time to audio duration, i.e. real-time factor (see the sketch below)
- Stream vs file performance
- Accuracy of initial ‘partials’ (live)
- Latency of word recognition (live)
- Latency/accuracy ratio (where configurable)
- Growing results lookup distance (live)
- Tolerance of noise
- Performance on different accent groups
- Tolerance of uncommon vocabulary
- Some sense of how errors are grouped: are they peppered throughout, or clustered together?
- The gap between the reported confidence score and actual accuracy: average confidence vs measured accuracy, and their association at a per-word level (see the calibration sketch below). This is useful when you want to automatically steer people towards the places that need correction, or to quickly rule files in or out as appropriate for ASR at all.
- How much performance improves when a custom vocabulary list is supplied (this is a bit fiddly, though, as it may end up tuned to quirks of particular speech engines)
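
A minimal sketch of the standard WER calculation, for reference: word-level Levenshtein distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. It assumes whitespace tokenisation and skips the text normalisation (case, punctuation, numerals) that a real harness would apply first.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1 error / 6 words ≈ 0.17
```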
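
For the processing-time-to-duration ratio (real-time factor), something along these lines works for file-based requests; `transcribe` here is a hypothetical placeholder for whatever call wraps a provider's API:

```python
import time

def real_time_factor(transcribe, audio_path: str, audio_duration_s: float) -> float:
    # `transcribe` is any callable that sends the file to an STT provider
    # and blocks until the transcript comes back (hypothetical placeholder).
    start = time.monotonic()
    transcribe(audio_path)
    elapsed = time.monotonic() - start
    # A value below 1.0 means the provider processes faster than real time.
    return elapsed / audio_duration_s
```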
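
And a sketch of the confidence-vs-accuracy comparison, assuming you already have per-word `(confidence, correct)` pairs from aligning the hypothesis against the reference (e.g. via the WER alignment above): bucket words by confidence and compare mean confidence with observed accuracy in each bucket. A consistently signed gap flags an over- or under-confident engine, and low-confidence buckets point at the places worth steering correctors towards.

```python
from collections import defaultdict

def calibration_report(words, buckets: int = 10) -> None:
    """words: iterable of (confidence in [0, 1], correct: bool) pairs."""
    by_bucket = defaultdict(list)
    for conf, correct in words:
        # Clamp conf == 1.0 into the top bucket.
        by_bucket[min(int(conf * buckets), buckets - 1)].append((conf, correct))
    for b in sorted(by_bucket):
        pairs = by_bucket[b]
        mean_conf = sum(c for c, _ in pairs) / len(pairs)
        accuracy = sum(1 for _, ok in pairs if ok) / len(pairs)
        print(f"bucket {b}: n={len(pairs)}  mean confidence={mean_conf:.2f}  "
              f"accuracy={accuracy:.2f}  gap={mean_conf - accuracy:+.2f}")

calibration_report([(0.95, True), (0.92, True), (0.40, True), (0.35, False)])
```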