
Use cases

EyalLavi edited this page Dec 14, 2018 · 5 revisions

UC1: Get WER for big 4 providers using CLI and a built-in dataset.

A command line user views Word Error Rate scores for transcriptions produced by the big 4 STT providers using a built-in dataset. This can be used to compare providers over time.

Preconditions:

  • Access to the big 4 providers' APIs
  • Normalisation rules for ground truth per language
  • Normalised ground truth per language
  • Normalisation rules for transcripts
  • WER algorithm per language
  • UI access control (e.g. login page)
  • A good diffing tool for comparing transcripts with ground truth (built in-house or sourced externally)
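Per-language normalisation rules could be kept as ordered pattern/replacement pairs and applied identically to ground truth and provider transcripts, so that WER reflects recognition errors rather than formatting differences. The sketch below is illustrative only. The `NORMALISATION_RULES` table, its three English rules and the `normalise` function are hypothetical, and a real rule set would be much larger.

```python
import re
import unicodedata

# Hypothetical per-language rules: (regex pattern, replacement) pairs applied in
# order after lower-casing. These are examples, not the project's actual rules.
NORMALISATION_RULES = {
    "en": [
        (r"\bmr\b\.?", "mister"),
        (r"&", " and "),
        (r"%", " percent"),
    ],
}


def normalise(text: str, language: str) -> str:
    """Return a normalised copy of text for WER scoring in the given language."""
    # Compose Unicode characters so accented letters compare consistently.
    text = unicodedata.normalize("NFC", text).lower()
    # Languages without rules fall back to the generic steps below.
    for pattern, replacement in NORMALISATION_RULES.get(language, []):
        text = re.sub(pattern, replacement, text)
    # Replace punctuation (except apostrophes) with spaces, then collapse whitespace.
    text = re.sub(r"[^\w\s']", " ", text)
    return " ".join(text.split())
```

For example, `normalise("Mr. Smith paid 50% & left!", "en")` gives `"mister smith paid 50 percent and left"`.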

Primary flow: Select language > Submit audio test files to 4 APIs > Normalise results > Calculate WER for each result > Output results by provider.
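The "Calculate WER" and "Output results by provider" steps could look roughly like the sketch below. It uses the standard definition WER = (substitutions + deletions + insertions) / number of reference words, computed with a word-level Levenshtein distance. The function names are hypothetical, and both inputs are assumed to have been normalised already.

```python
from typing import Dict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Both strings are assumed to be normalised already; words are split on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev holds the previous row of the word-level edit-distance table.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def wer_by_provider(reference: str, transcripts: Dict[str, str]) -> Dict[str, float]:
    """Score each provider's normalised transcript against the same reference."""
    return {provider: word_error_rate(reference, text) for provider, text in transcripts.items()}
```

Because insertions count as errors, WER can exceed 1.0 when a provider outputs many extra words.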

UC2: Get WER for big 4 providers using web UI and a built-in dataset.

A web user views Word Error Rate scores for transcriptions produced by the big 4 STT providers using a built-in dataset. Same as UC1 but accessed through a web UI so that users don't have to configure their own access to the providers' APIs.

UC3: Get WER for any provider using CLI and a built-in dataset.

A developer creates a component for their chosen STT provider and is able to connect it to the framework using a documented interface. Same as UC1 with the addition of this provider.
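The documented interface itself is not specified here. As a minimal sketch, assuming a Python framework, a provider component could be an abstract base class with a single `transcribe` method, plus a registry the framework iterates over. `SttProvider`, `EchoProvider`, `register` and `run_all` are all hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Dict


class SttProvider(ABC):
    """Hypothetical interface that a provider component would implement."""

    name: str

    @abstractmethod
    def transcribe(self, audio: bytes, language: str) -> str:
        """Return the raw (un-normalised) transcript for one audio file."""


class EchoProvider(SttProvider):
    """Toy provider that returns canned text, useful for testing without network access."""

    name = "echo"

    def __init__(self, canned: str) -> None:
        self.canned = canned

    def transcribe(self, audio: bytes, language: str) -> str:
        return self.canned


# Providers registered with the framework, keyed by name.
PROVIDERS: Dict[str, SttProvider] = {}


def register(provider: SttProvider) -> None:
    PROVIDERS[provider.name] = provider


def run_all(audio: bytes, language: str) -> Dict[str, str]:
    """Send the same audio to every registered provider and collect transcripts."""
    return {name: p.transcribe(audio, language) for name, p in PROVIDERS.items()}
```

Keeping the interface to one method means a new provider only has to handle its own API call and authentication. Normalisation and scoring stay in the framework.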

UC4: Get WER for streams using CLI and a built-in dataset.

A user views WER for streams. Same as UC1-3, but the audio is sent to the providers as a stream. Results are converted to transcript files for WER analysis.
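Streaming APIs commonly return interim hypotheses that are later revised, followed by final results. Under that assumption, converting a stream into a transcript file could mean keeping only the final segments in arrival order. The `StreamResult` shape and both functions below are hypothetical; each provider's streaming message format would need its own adapter.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable


@dataclass
class StreamResult:
    """Hypothetical shape of one message from a streaming STT API."""

    text: str
    is_final: bool  # interim hypotheses may be revised; final ones are not


def stream_to_transcript(results: Iterable[StreamResult]) -> str:
    """Keep only non-empty final segments, in arrival order, joined by spaces."""
    return " ".join(r.text.strip() for r in results if r.is_final and r.text.strip())


def write_transcript(results: Iterable[StreamResult], path: Path) -> None:
    """Write the assembled transcript so it can go through the same WER pipeline as files."""
    path.write_text(stream_to_transcript(results) + "\n", encoding="utf-8")
```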

UC5: Get processing speed for files.

A user views the time the provider took to return the transcript for an audio file, expressed as the ratio of audio duration to processing time.
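As a sketch of the ratio described above: audio duration divided by processing time, so a value of 2.0 means a 60-second file was transcribed in 30 seconds. The timing wrapper is hypothetical. It measures wall-clock time around a blocking call, which for a file-based API includes upload and queueing time.

```python
import time
from typing import Callable, Tuple


def speed_ratio(audio_duration_s: float, processing_time_s: float) -> float:
    """Audio duration divided by processing time; above 1 means faster than real time."""
    if processing_time_s <= 0:
        raise ValueError("processing time must be positive")
    return audio_duration_s / processing_time_s


def timed_transcription(transcribe: Callable[[], str], audio_duration_s: float) -> Tuple[str, float]:
    """Run a blocking transcription callable; return its transcript and the speed ratio."""
    start = time.perf_counter()
    transcript = transcribe()
    elapsed = time.perf_counter() - start
    return transcript, speed_ratio(audio_duration_s, elapsed)
```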

UC6: Get latency for streams.

A user views the average time the provider takes to return each correctly recognised word (TBC: providers that support configurable latency).
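One possible way to compute this metric, assuming the audio is streamed in real time and that each reference word has already been aligned to the moment the provider first returned it correctly: take the delay between the word ending in the audio and that arrival, then average. The alignment step and the `average_word_latency` function are hypothetical. Words that were never returned correctly are excluded here, which is itself a design choice to confirm.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def average_word_latency(events: Sequence[Tuple[float, Optional[float]]]) -> Optional[float]:
    """Mean delay between a word ending in the audio and the provider first returning it correctly.

    events: (word_end_s, first_correct_arrival_s) pairs, both measured in seconds from the
    start of the stream. Words never returned correctly (arrival is None) are excluded.
    Returns None if no word was returned correctly.
    """
    delays = [arrival - end for end, arrival in events if arrival is not None]
    return mean(delays) if delays else None
```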

UC: Make test data for the big 4 providers

Allows the user to submit their own test data for benchmarking the big 4 STT providers. Audio files are optimised for the providers and ground truth text files are normalised.
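Part of audio preparation could be an automated check that submitted files match a target format before any provider is called. The sketch below uses Python's standard `wave` module. The target of 16 kHz, mono, 16-bit PCM WAV is an assumption for illustration only. The actual audio preparation guidelines may specify something else, or different targets per provider.

```python
import wave
from typing import List

# Hypothetical target format; the real preparation guidelines may differ per provider.
TARGET_SAMPLE_RATE = 16_000
TARGET_CHANNELS = 1
TARGET_SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM


def format_problems(sample_rate: int, channels: int, sample_width: int) -> List[str]:
    """Describe each way a file's format differs from the target; empty list means OK."""
    problems = []
    if sample_rate != TARGET_SAMPLE_RATE:
        problems.append(f"sample rate {sample_rate} Hz, expected {TARGET_SAMPLE_RATE} Hz")
    if channels != TARGET_CHANNELS:
        problems.append(f"{channels} channels, expected {TARGET_CHANNELS}")
    if sample_width != TARGET_SAMPLE_WIDTH_BYTES:
        problems.append(
            f"{8 * sample_width}-bit samples, expected {8 * TARGET_SAMPLE_WIDTH_BYTES}-bit"
        )
    return problems


def check_wav(path) -> List[str]:
    """Read a WAV file's header and check it against the target format."""
    with wave.open(path, "rb") as wav:
        return format_problems(wav.getframerate(), wav.getnchannels(), wav.getsampwidth())
```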

Preconditions:

  • Ground truth preparation guidelines
  • Audio file preparation guidelines
  • Normalisation rules for ground truth, per language

Primary flow: Submit audio test files to 4 APIs > Normalise results > Calculate WER for each result > Output results by provider.