
Proof of concept implementation for OpenAI compatible API format #237

Open · wants to merge 1 commit into main
Conversation

ayancey (Collaborator) commented on Aug 7, 2024

Quick and dirty implementation of what it would look like to support OpenAI's API format. This is an attempt to satisfy #227.

Example OpenAI output from /v1/audio/transcriptions:

{
    "task": "transcribe",
    "language": "english",
    "duration": 9.90999984741211,
    "text": "The dog jumped over the big fence and then it ran over to the farm.",
    "segments": [
        {
            "id": 0,
            "seek": 0,
            "start": 0.0,
            "end": 11.0,
            "text": " The dog jumped over the big fence and then it ran over to the farm.",
            "tokens": [
                50364,
                440,
                3000,
                13864,
                670,
                264,
                955,
                15422,
                293,
                550,
                309,
                5872,
                670,
                281,
                264,
                5421,
                13,
                50914
            ],
            "temperature": 0.0,
            "avg_logprob": -0.3397972285747528,
            "compression_ratio": 1.0634920597076416,
            "no_speech_prob": 0.02906951494514942
        }
    ]
}

Some notes:

  • OpenAI's implementation uses form data, not JSON input (see the route sketch after this list).
  • The response formats offered are json, text, srt, verbose_json, and vtt. json returns only a "text" key, whereas verbose_json includes the other basic info shown above and is used together with the timestamp_granularities[] array to provide segment- or word-level timestamps. Since we always get segments internally, we have to throw them away when the json format is used (second sketch below).
  • Need a way to get the duration of the file to match OpenAI's output. I can divide the length of the NumPy array by the sample rate to get the number of seconds, but I also have to divide by 2. I doubt it's stereo; that wouldn't make much sense (see the duration sketch below).
  • The abstraction between the endpoint route methods and the core.py methods for whisper/faster-whisper needs to change, mostly so the JSON keys can be modified before they're turned into a StringIO stream.
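
Since the route takes multipart form data rather than a JSON body, the parameters have to be declared as Form/File fields. A minimal sketch of what that could look like with FastAPI; the parameter names follow OpenAI's documented fields, and everything past the signature is deliberately omitted:

from fastapi import FastAPI, File, Form, UploadFile

app = FastAPI()

@app.post("/v1/audio/transcriptions")
async def transcriptions(
    file: UploadFile = File(...),         # the audio upload, sent as form data
    model: str = Form("whisper-1"),       # accepted for compatibility; likely ignored here
    response_format: str = Form("json"),  # json | text | srt | verbose_json | vtt
    temperature: float = Form(0.0),
):
    ...  # decode the upload, run whisper/faster-whisper, shape the response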
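
Dropping the segments for the plain json format can then be a small post-processing step. A sketch using a hypothetical shape_response helper (not an existing function in this repo):

def shape_response(result: dict, response_format: str) -> dict:
    # json exposes only the transcript; verbose_json keeps task, language,
    # duration, and the segments we always compute internally anyway.
    if response_format == "json":
        return {"text": result["text"]}
    return result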
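
On the duration question: Whisper's load_audio decodes everything to mono float32 at 16 kHz, so seconds should just be the array length divided by 16000. The extra divide-by-2 is more plausibly because the measured buffer holds raw 16-bit PCM bytes (2 bytes per sample) than because the audio is stereo. A sketch under that assumption:

import numpy as np

SAMPLE_RATE = 16000  # whisper resamples all input to 16 kHz mono

def duration_seconds(audio: np.ndarray) -> float:
    # decoded float32 array: one entry per sample
    return audio.shape[0] / SAMPLE_RATE

def duration_from_pcm_bytes(raw: bytes) -> float:
    # raw 16-bit PCM: each sample occupies 2 bytes, which would explain
    # the extra divide-by-2 without the audio being stereo
    return len(raw) / 2 / SAMPLE_RATE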

ahmetoner self-assigned this on Aug 19, 2024
ahmetoner added the enhancement label on Aug 19, 2024
ahmetoner (Owner) commented:

I am planning to merge this update for version 2.
