
Provide wav data directly? #241

Closed
ThatHackerDudeFromCyberspace opened this issue May 5, 2024 · 2 comments

@ThatHackerDudeFromCyberspace

Is there a way to provide the WAV data directly, say from a microphone-like peripheral, instead of providing a WAV file or having to record to one first?

@deeeed

deeeed commented Aug 1, 2024

Same here. I would like to pass data in externally from JS or another module. The data could be passed down as a base64-encoded string or as a number[], assuming it is already in the correct PCM format (16 kHz / 16-bit PCM). It would also be great if we could pass mel spectrogram data directly and skip the WAV conversion.

For context, I am working on a separate package, @siteed/expo-audio-stream, to do the audio streaming, and I want to pass the data down to whisper.rn. I have built an audio playground demo with live transcription on web and now want to integrate the same in native (try it at https://deeeed.github.io/expo-audio-stream/playground/).

I forked the repository and started experimenting with the code to see if I could integrate this feature, but it doesn't seem straightforward.

My initial thoughts were to:

1. Add new methods to the JS interface:

   ```typescript
   startRealtimeTranscribeWithAudioInput(
     contextId: number,
     jobId: number,
     options: TranscribeOptions,
   ): Promise<void>;
   receiveAudioDataChunk(jobId: number, audioData: number[]): Promise<void>;
   ```
2. Implement the methods in C++:

   ```cpp
   namespace rnwhisper {

   std::mutex job_mutex;
   std::unordered_map<int, job*> job_map;

   void start_realtime_transcribe_with_audio_input(int job_id, const whisper_full_params& params) {
       RNWHISPER_LOG_INFO("Started real-time transcription with audio input for job_id: %d\n", job_id);

       std::lock_guard<std::mutex> lock(job_mutex);
       job* new_job = new job{job_id, false, params};
       job_map[job_id] = new_job;
   }

   void receive_audio_data(int job_id, const std::vector<short>& audio_data) {
       std::lock_guard<std::mutex> lock(job_mutex);
       auto it = job_map.find(job_id);
       if (it != job_map.end()) {
           job* current_job = it->second;

           // Log the size of the received audio data
           RNWHISPER_LOG_INFO("Received audio data for job_id: %d, data size: %zu\n", job_id, audio_data.size());

           // The data already arrives as 16-bit samples, so no conversion copy is needed
           current_job->put_pcm_data(audio_data.data(), 0, audio_data.size(), audio_data.size());

           // Log after processing the audio data
           RNWHISPER_LOG_INFO("Processed audio data for job_id: %d\n", job_id);
       } else {
           RNWHISPER_LOG_WARN("No job found for job_id: %d\n", job_id);
       }
   }

   } // namespace rnwhisper
   ```
3. Integrate Android and iOS:
   • Android: add native method declarations in the RNWhisper class, then implement methods to call the native functions, handle audio data, and resolve promises.
   • iOS: add corresponding method declarations in the Objective-C/Swift code, then implement methods to interface with the C++ functions, handle audio data, and communicate with JavaScript.
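To make step 1 concrete, here is a rough TypeScript sketch of how the proposed methods could be driven from JS. Neither method exists in whisper.rn today; the names come from the proposal above, and the chunk size is an arbitrary choice:

```typescript
// Split a PCM sample stream into fixed-size chunks for sending over the bridge.
function chunkPcm(samples: number[], chunkSize: number): number[][] {
  const chunks: number[][] = [];
  for (let i = 0; i < samples.length; i += chunkSize) {
    chunks.push(samples.slice(i, i + chunkSize));
  }
  return chunks;
}

// Hypothetical native module shape, mirroring the proposed JS interface.
interface RealtimeNativeModule {
  startRealtimeTranscribeWithAudioInput(
    contextId: number,
    jobId: number,
    options: object,
  ): Promise<void>;
  receiveAudioDataChunk(jobId: number, audioData: number[]): Promise<void>;
}

async function streamToWhisper(
  native: RealtimeNativeModule,
  contextId: number,
  jobId: number,
  samples: number[],
): Promise<void> {
  await native.startRealtimeTranscribeWithAudioInput(contextId, jobId, {});
  // At 16 kHz mono, 1600 samples is a 100 ms chunk.
  for (const chunk of chunkPcm(samples, 1600)) {
    await native.receiveAudioDataChunk(jobId, chunk);
  }
}
```

This is only a sketch of the JS side under the assumptions above; the native side would still need the job bookkeeping from step 2.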

Unfortunately, it would take me a while to figure out the full implementation as I haven't done much C++ integration in RN before.

@jhen0409 does this feature make sense to you? Could you outline the steps you would take and things to watch out for? I can then take a proper pass at implementing it with much less struggle :)

Cheers

@jhen0409
Member

jhen0409 commented Nov 9, 2024

After #267, we support base64 WAV decoding in context.transcribe.

> @jhen0409 does this feature makes sense to you? Could you highlight the steps you would take and things to watch for? I can then take a proper pass at implementing it with much less struggles :)

Sorry for the long delay in responding.

Our goal is to move out the realtime transcription logic so we can process audio buffers more easily, and as a first step I added context.transcribeData for transcribing PCM data (base64). You can see the TranscribeData example. (Note that it is not currently a realtime transcription example.)
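For anyone arriving here, a minimal sketch of preparing base64 PCM for context.transcribeData might look like the following. The helper name pcm16ToBase64 and the options shape in the usage comment are assumptions; see the TranscribeData example in the repo for the actual API:

```typescript
const B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Encode 16 kHz / 16-bit mono PCM samples as a base64 string.
// Works without Buffer, so it also runs in React Native environments
// that lack node polyfills. Samples are read in platform byte order
// (little-endian on all common targets).
function pcm16ToBase64(samples: Int16Array): string {
  const bytes = new Uint8Array(samples.buffer, samples.byteOffset, samples.byteLength);
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    out += B64[b0 >> 2];
    out += B64[((b0 & 3) << 4) | (b1 >> 4)];
    out += i + 1 < bytes.length ? B64[((b1 & 15) << 2) | (b2 >> 6)] : '=';
    out += i + 2 < bytes.length ? B64[b2 & 63] : '=';
  }
  return out;
}

// Hypothetical usage (context is a whisper.rn context; options are an assumption):
// const { result } = await context.transcribeData(pcm16ToBase64(samples), { language: 'en' });
```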

#52 is the next thing we need to implement, for sending data without going through the bridge.

@jhen0409 jhen0409 closed this as completed Nov 9, 2024