
Provide wav data directly? #241

Closed
ThatHackerDudeFromCyberspace opened this issue May 5, 2024 · 2 comments

@ThatHackerDudeFromCyberspace

Is there a way to provide the WAV data directly, say from a microphone-like peripheral, instead of providing a WAV file or having to record to one first?

@deeeed

deeeed commented Aug 1, 2024

Same here. I would like to pass data in externally from JS or another module. The data could be passed down as a base64-encoded string or as a number[], assuming it is already in the correct PCM format (16 kHz / 16-bit PCM). It would also be great if we could pass mel spectrogram data directly and skip the WAV conversion.

For context, I am working on a separate package, @siteed/expo-audio-stream, to do the audio streaming, and I want to pass the data down to whisper.rn. I have built an audio playground demo with live transcription on web and now want to integrate the same in native (try it at https://deeeed.github.io/expo-audio-stream/playground/).

I forked the repository and started experimenting with the code to see if I could integrate this feature, but it doesn't seem straightforward.

My initial thoughts were to:

1. Add new methods to the JS interface:

   ```typescript
   startRealtimeTranscribeWithAudioInput(
     contextId: number,
     jobId: number,
     options: TranscribeOptions,
   ): Promise<void>;
   receiveAudioDataChunk(jobId: number, audioData: number[]): Promise<void>;
   ```
2. Implement the methods in C++:

   ```cpp
   namespace rnwhisper {

   std::mutex job_mutex;
   std::unordered_map<int, job*> job_map;

   void start_realtime_transcribe_with_audio_input(int job_id, const whisper_full_params& params) {
       RNWHISPER_LOG_INFO("Started real-time transcription with audio input for job_id: %d\n", job_id);

       std::lock_guard<std::mutex> lock(job_mutex);
       job* new_job = new job{job_id, false, params};
       job_map[job_id] = new_job;
   }

   void receive_audio_data(int job_id, const std::vector<short>& audio_data) {
       std::lock_guard<std::mutex> lock(job_mutex);
       auto it = job_map.find(job_id);
       if (it != job_map.end()) {
           job* current_job = it->second;

           // Log the size of the received audio data
           RNWHISPER_LOG_INFO("Received audio data for job_id: %d, data size: %zu\n", job_id, audio_data.size());

           // The data already arrives as 16-bit samples, so no conversion copy is needed
           current_job->put_pcm_data(audio_data.data(), 0, audio_data.size(), audio_data.size());

           // Log after processing the audio data
           RNWHISPER_LOG_INFO("Processed audio data for job_id: %d\n", job_id);
       } else {
           RNWHISPER_LOG_WARN("No job found for job_id: %d\n", job_id);
       }
   }

   } // namespace rnwhisper
   ```
3. Integrate Android and iOS:
   • Android: add native method declarations in the RNWhisper class, then implement methods to call the native functions, handle audio data, and resolve promises.
   • iOS: add corresponding method declarations in the Objective-C/Swift code, then implement methods to interface with the C++ functions, handle audio data, and communicate with JavaScript.
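To make step 1 concrete, here is a rough TypeScript sketch of how the proposed methods could be driven from JS. Neither method exists in whisper.rn today; the names come from the proposal above, and the chunk size is an arbitrary choice:

```typescript
// Split a PCM sample stream into fixed-size chunks for sending over the bridge.
function chunkPcm(samples: number[], chunkSize: number): number[][] {
  const chunks: number[][] = [];
  for (let i = 0; i < samples.length; i += chunkSize) {
    chunks.push(samples.slice(i, i + chunkSize));
  }
  return chunks;
}

// Hypothetical native module shape, mirroring the proposed JS interface.
interface RealtimeNativeModule {
  startRealtimeTranscribeWithAudioInput(
    contextId: number,
    jobId: number,
    options: object,
  ): Promise<void>;
  receiveAudioDataChunk(jobId: number, audioData: number[]): Promise<void>;
}

async function streamToWhisper(
  native: RealtimeNativeModule,
  contextId: number,
  jobId: number,
  samples: number[],
): Promise<void> {
  await native.startRealtimeTranscribeWithAudioInput(contextId, jobId, {});
  // At 16 kHz mono, 1600 samples is a 100 ms chunk.
  for (const chunk of chunkPcm(samples, 1600)) {
    await native.receiveAudioDataChunk(jobId, chunk);
  }
}
```

This is only a sketch of the JS side under the assumptions above; the native side would still need the job bookkeeping from step 2.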

Unfortunately, it would take me a while to figure out the full implementation as I haven't done much C++ integration in RN before.

@jhen0409 does this feature make sense to you? Could you outline the steps you would take and things to watch out for? I can then take a proper pass at implementing it with much less struggle :)

Cheers

@jhen0409
Member

jhen0409 commented Nov 9, 2024

After #267, we support base64 WAV decoding in context.transcribe.

> @jhen0409 does this feature makes sense to you? Could you highlight the steps you would take and things to watch for? I can then take a proper pass at implementing it with much less struggles :)

Sorry for the long delay in responding.

Our goal is to move out the realtime transcription logic so we can process audio buffers more easily, and as a first step I added context.transcribeData for transcribing PCM data (base64). You can see the TranscribeData example. (Note that it is not currently a realtime transcription example.)
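For anyone arriving here, a minimal sketch of preparing base64 PCM for context.transcribeData might look like the following. The helper name pcm16ToBase64 and the options shape in the usage comment are assumptions; see the TranscribeData example in the repo for the actual API:

```typescript
const B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Encode 16 kHz / 16-bit mono PCM samples as a base64 string.
// Works without Buffer, so it also runs in React Native environments
// that lack node polyfills. Samples are read in platform byte order
// (little-endian on all common targets).
function pcm16ToBase64(samples: Int16Array): string {
  const bytes = new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength);
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    out += B64[b0 >> 2];
    out += B64[((b0 & 3) << 4) | (b1 >> 4)];
    out += i + 1 < bytes.length ? B64[((b1 & 15) << 2) | (b2 >> 6)] : '=';
    out += i + 2 < bytes.length ? B64[b2 & 63] : '=';
  }
  return out;
}

// Hypothetical usage (context is a whisper.rn context; options are an assumption):
// const { result } = await context.transcribeData(pcm16ToBase64(samples), { language: 'en' });
```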

#52 is the next thing we need to implement, for sending data without going through the bridge.

@jhen0409 jhen0409 closed this as completed Nov 9, 2024