Use cases
UC1: A command-line user views Word Error Rate (WER) scores for transcriptions produced by the big 4 STT providers, using a built-in dataset. This can be used to compare providers over time.
Preconditions:
- Access to big 4 APIs
- Normalisation rules for ground truth per language
- Normalised ground truth per language
- Normalisation rules for transcripts
- WER algorithm per language
- UI access control (e.g. login page)
- Good diffing tool (build or source)
Primary flow: Select language > Submit audio test files to the 4 APIs > Normalise results > Calculate WER for each result > Output results by provider.
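The "Calculate WER" step in the flow above can be sketched as a word-level Levenshtein distance. This is only an illustrative implementation; the framework's actual WER algorithm is per language and may differ:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

A perfect transcript scores 0.0; one missing word out of three scores 1/3.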
UC2: A web user views Word Error Rate scores for transcriptions produced by the big 4 STT providers, using a built-in dataset. Same as UC1, but accessed through a web UI so that users don't have to configure their own access to the providers' APIs.
UC3: A developer creates a component for their chosen STT provider and connects it to the framework through a documented interface. Same as UC1, with the addition of this provider.
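The documented interface might look something like the sketch below. The class and method names here are hypothetical, not the framework's actual API; it only illustrates the idea of a pluggable provider component:

```python
from abc import ABC, abstractmethod

class STTProvider(ABC):
    """Hypothetical base class a provider component would implement."""
    name: str

    @abstractmethod
    def transcribe(self, audio_path: str, language: str) -> str:
        """Return the raw transcript for the given audio file."""

class DummyProvider(STTProvider):
    """Minimal example provider: always returns the same transcript."""
    name = "dummy"

    def transcribe(self, audio_path: str, language: str) -> str:
        return "hello world"
```

A real component would wrap one provider's SDK or REST API behind `transcribe`, so the rest of the pipeline (normalisation, WER, reporting) stays provider-agnostic.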
UC4: A user views WER for streams. Same as UC1-3, but the audio is sent to the providers as a stream. Results are converted to transcript files for WER analysis.
UC5: A user views the time a provider has taken to return the transcript for an audio file, expressed as a ratio of audio duration to processing time.
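The duration-to-processing-time ratio could be measured as in this sketch, where a value above 1 means the provider is faster than real time. The `timed_transcription` helper and its parameters are illustrative, not part of the framework:

```python
import time

def timed_transcription(transcribe, audio_path: str, audio_duration_s: float):
    """Call a provider's transcribe callable and return the transcript
    together with the ratio of audio duration to processing time."""
    start = time.perf_counter()
    transcript = transcribe(audio_path)
    elapsed = time.perf_counter() - start
    return transcript, audio_duration_s / elapsed
```

For example, a 60-second file transcribed in 30 seconds yields a ratio of 2.0.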
UC6: A user views the average time it has taken the provider to return the correct word (TBC: providers that support configurable latency).
UC7: A user submits their own test data for benchmarking the big 4 STT providers. Audio files are optimised for the providers, and ground truth text files are normalised.
Preconditions:
- Ground truth preparation guidelines
- Audio file preparation guidelines
- Normalisation rules for ground truth, per language
Primary flow: Submit audio test files to 4 APIs > Normalise results > Calculate WER for each result > Output results by provider.
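For English, the "Normalise results" step above could amount to rules like lowercasing, punctuation stripping, and whitespace collapsing. The exact rules are per language and configurable; this function is only an illustrative sketch:

```python
import re

def normalise(text: str) -> str:
    """Illustrative English normalisation: lowercase, strip punctuation
    (keeping apostrophes), and collapse whitespace. Real rules would be
    per language and applied to both ground truth and transcripts."""
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)  # replace punctuation with spaces
    return " ".join(text.split())          # collapse runs of whitespace
```

Applying the same rules to ground truth and transcripts ensures WER reflects recognition errors rather than formatting differences.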