Provide wav data directly? #241
Comments
Same, I would like to pass data in from JS or another module. The data could be passed as a base64-encoded string or as a number[], assuming it is in the correct PCM format (16 kHz, 16-bit PCM). It would also be great if we could pass mel spectrogram data directly and skip the WAV conversion. For context, I am working on a separate package, @siteed/expo-audio-stream, to do the audio streaming and want to pass the data down to whisper.rn. I have built an audio playground demo with live transcription on web, but now want to integrate the same in native (try it at https://deeeed.github.io/expo-audio-stream/playground/). I forked the repository and started experimenting with the code to see if I could integrate this feature, but it doesn't seem straightforward. My initial thoughts were to:
startRealtimeTranscribeWithAudioInput(
  contextId: number,
  jobId: number,
  options: TranscribeOptions,
): Promise<void>;

receiveAudioDataChunk(jobId: number, audioData: number[]): Promise<void>;
#include <mutex>
#include <unordered_map>
#include <vector>

namespace rnwhisper {

std::mutex job_mutex;                   // guards job_map
std::unordered_map<int, job*> job_map;  // job_id -> active job

void start_realtime_transcribe_with_audio_input(int job_id, const whisper_full_params& params) {
  RNWHISPER_LOG_INFO("Started real-time transcription with audio input for job_id: %d\n", job_id);
  std::lock_guard<std::mutex> lock(job_mutex);
  job* new_job = new job{job_id, false, params};
  job_map[job_id] = new_job;
}

void receive_audio_data(int job_id, const std::vector<short>& audio_data) {
  std::lock_guard<std::mutex> lock(job_mutex);
  auto it = job_map.find(job_id);
  if (it != job_map.end()) {
    job* current_job = it->second;
    // Log the size of the received audio data
    RNWHISPER_LOG_INFO("Received audio data for job_id: %d, data size: %zu\n", job_id, audio_data.size());
    // Copy the samples into a local std::vector<short> buffer
    std::vector<short> audio_data_short(audio_data.begin(), audio_data.end());
    current_job->put_pcm_data(audio_data_short.data(), 0, audio_data_short.size(), audio_data_short.size());
    // Log after processing the audio data
    RNWHISPER_LOG_INFO("Processed audio data for job_id: %d\n", job_id);
  } else {
    RNWHISPER_LOG_WARN("No job found for job_id: %d\n", job_id);
  }
}

}  // namespace rnwhisper
Unfortunately, it would take me a while to figure out the full implementation, as I haven't done much C++ integration in RN before. @jhen0409 does this feature make sense to you? Could you highlight the steps you would take and the things to watch for? I can then take a proper pass at implementing it with far less struggle :) Cheers
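To make the base64 / number[] idea above concrete, here is a minimal JS-side sketch. It assumes the payload is raw little-endian 16-bit PCM with no WAV header. The helper names (base64ToBytes, bytesToPcm16, chunkSamples) are hypothetical and not part of whisper.rn; the resulting chunks are shaped like the audioData argument of the receiveAudioDataChunk signature proposed above.

```typescript
// Hypothetical helpers for illustration only; none of these exist in whisper.rn.

// Standard base64 alphabet (RFC 4648); '=' padding is stripped before decoding.
const B64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/** Decode a base64 string into raw bytes without relying on atob or Buffer. */
function base64ToBytes(b64: string): Uint8Array {
  const clean = b64.replace(/=+$/, "");
  const out = new Uint8Array(Math.floor((clean.length * 3) / 4));
  let acc = 0; // bit accumulator
  let bits = 0; // number of not-yet-emitted bits held in acc
  let o = 0;
  for (let i = 0; i < clean.length; i++) {
    const v = B64_ALPHABET.indexOf(clean.charAt(i));
    if (v < 0) throw new Error("invalid base64 character at index " + i);
    acc = ((acc << 6) | v) & 0xffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out[o++] = (acc >> bits) & 0xff;
    }
  }
  return out;
}

/** Interpret bytes as little-endian signed 16-bit PCM samples (assumed layout). */
function bytesToPcm16(bytes: Uint8Array): number[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const count = Math.floor(bytes.byteLength / 2); // a trailing odd byte is ignored
  const samples = new Array<number>(count);
  for (let i = 0; i < count; i++) {
    samples[i] = view.getInt16(i * 2, true);
  }
  return samples;
}

/** Split samples into fixed-size chunks; the last chunk may be shorter. */
function chunkSamples(samples: number[], chunkSize: number): number[][] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  const chunks: number[][] = [];
  for (let i = 0; i < samples.length; i += chunkSize) {
    chunks.push(samples.slice(i, i + chunkSize));
  }
  return chunks;
}
```

At 16 kHz, a chunkSize of 16000 corresponds to one second of audio per chunk; the right size in practice would depend on how the native side buffers samples, which this sketch does not address.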
After #267, we provided base64 wav decode in
Sorry for the long delay in responding. Our goal is to move the realtime transcription logic, which will allow us to process audio buffers more easily, and I'd add that #52 is what we need to implement next for sending data without going through the bridge.
Is there a way to provide the wav data directly, say from a microphone-like peripheral instead of providing a wav file or having to record to one first?