Audio transcription support #781
Conversation
Outside the scope of this PR, but I'm starting to doubt our pattern of creating Engines that are just thin wrappers over Drivers. Engines should augment one or more Drivers for a specific use case that a single Driver cannot handle on its own.
What do you think?
CC @vasinov
I strongly agree that this layer is way too thin, but I also did not want to create a weird one-off audio-to-text modality that breaks our established pattern. Optimistically, Engines are a layer where we can add functionality later; for now, they're mostly an additional initialization step.
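To make the pattern under discussion concrete, here is an illustrative, self-contained sketch of an Engine that only forwards to its Driver. The class names, fields, and `run()` signature are assumptions for illustration, not the PR's actual code.

```python
# Illustrative sketch of the "thin wrapper" Engine pattern being discussed;
# class names, fields, and method signatures are assumptions, not PR code.
from abc import ABC, abstractmethod

from attrs import define, field


@define
class AudioArtifact:
    value: bytes = field()  # raw audio bytes


@define
class TextArtifact:
    value: str = field()  # transcribed text


class BaseAudioTranscriptionDriver(ABC):
    @abstractmethod
    def run(self, audio: AudioArtifact) -> TextArtifact: ...


@define
class AudioTranscriptionEngine:
    audio_transcription_driver: BaseAudioTranscriptionDriver = field(kw_only=True)

    def run(self, audio: AudioArtifact) -> TextArtifact:
        # Pure delegation: the Engine adds no behavior of its own, which is
        # the "thin wrapper" shape questioned in the comments above.
        return self.audio_transcription_driver.run(audio)
```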
Nice work, just naming comments
griptape/drivers/audio_transcription/base_audio_transcription_driver.py
@define
class BaseAudioTranscriptionDriver(SerializableMixin, ExponentialBackoffMixin, ABC): |
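For orientation, a simplified, self-contained sketch of what this base class provides: ExponentialBackoffMixin retries a single provider call, SerializableMixin handles (de)serialization (omitted below), and subclasses implement the actual transcription. The `run`/`try_run` method names, fields, and artifact stubs are assumptions for illustration, not the PR's implementation.

```python
# Simplified sketch of the base Driver; the retry loop stands in for
# ExponentialBackoffMixin, SerializableMixin is omitted, and the method
# names and fields are illustrative assumptions rather than the PR's code.
import time
from abc import ABC, abstractmethod

from attrs import define, field


@define
class AudioArtifact:
    value: bytes = field()  # raw audio bytes


@define
class TextArtifact:
    value: str = field()  # transcribed text


@define
class BaseAudioTranscriptionDriver(ABC):
    model: str = field(kw_only=True)
    max_attempts: int = field(default=3, kw_only=True)

    def run(self, audio: AudioArtifact) -> TextArtifact:
        # Retry try_run with doubling delays on transient provider errors,
        # re-raising after the final attempt.
        for attempt in range(self.max_attempts):
            try:
                return self.try_run(audio)
            except Exception:
                if attempt == self.max_attempts - 1:
                    raise
                time.sleep(2**attempt)

    @abstractmethod
    def try_run(self, audio: AudioArtifact) -> TextArtifact:
        """Perform a single transcription attempt against the provider."""
```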
Should we name these Drivers BaseSpeechToTextDriver for consistency with the inverse Drivers?
I think I've steered a bit too far in the direction of naming Drivers based on their artifact interfaces. The specificity of this name is helpful; what do you think about giving BaseTextToSpeechDriver a similarly specific name, such as BaseSpeechGenerationDriver?
Yeah, thinking about it more, I think the current convention is the more "correct" one. I'm down to rename BaseTextToSpeechDriver (though maybe in a separate PR); what do you think about BaseAudioGenerationDriver? It may extend beyond speech in the future.
Great work!
This PR adds:
- AudioLoader, responsible for producing an AudioArtifact from binary audio data
- AudioTranscriptionTask, accepting an AudioArtifact and responding with a TextArtifact containing the transcribed text
- AudioTranscriptionClient, allowing an Agent to generate a transcription from audio in memory or on the filesystem
- OpenAiAudioTranscriptionDriver, to support integration with OpenAI's speech-to-text models
- OpenAiAudioTranscriptionDriver or DummyAudioTranscriptionDriver
📚 Documentation preview 📚: https://griptape--781.org.readthedocs.build//781/
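For readers new to these components, a hypothetical end-to-end sketch of how they might be wired together follows; the import paths, constructor arguments, and the "whisper-1" model name are assumptions rather than the PR's documented API (see the documentation preview above for actual usage).

```python
# Hypothetical usage sketch; import paths, constructor signatures, and the
# model name are assumptions, not the PR's documented API.
from griptape.drivers import OpenAiAudioTranscriptionDriver
from griptape.engines import AudioTranscriptionEngine
from griptape.loaders import AudioLoader
from griptape.structures import Agent
from griptape.tasks import AudioTranscriptionTask

# Produce an AudioArtifact from binary audio data on disk.
with open("meeting.wav", "rb") as f:  # placeholder file name
    audio_artifact = AudioLoader().load(f.read())

# Run an AudioTranscriptionTask on an Agent; the Task responds with a
# TextArtifact containing the transcribed text.
agent = Agent()
agent.add_task(
    AudioTranscriptionTask(
        audio_artifact,
        audio_transcription_engine=AudioTranscriptionEngine(
            audio_transcription_driver=OpenAiAudioTranscriptionDriver(model="whisper-1"),
        ),
    )
)
agent.run()
```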