fe_process_frames_ext can discard speech data?
#41
Comments
The same problem occurs at lines 525-535:

    /* Process all remaining frames. */
    while (*inout_nframes > 0 && *inout_nsamps >= (size_t)fe->frame_shift) {
        fe_shift_frame(fe, *inout_spch, fe->frame_shift);
        fe_write_frame(fe, buf_cep[outidx], voiced_spch != NULL);
        outidx = fe_check_prespeech(fe, inout_nframes, buf_cep, outidx, out_frameidx, inout_nsamps, orig_nsamps);
        /* Update input-output pointers and counters. */
        *inout_spch += fe->frame_shift;
        *inout_nsamps -= fe->frame_shift;
    }

If
Honestly, there are so many issues here. Yes, sometimes data is skipped. We desperately need a frontend rework rather than simple bug fixing: a totally new architecture with proper estimation of parameters is required. If you are interested in working on this, I can outline the design in a document.
Good to hear that.
In my opinion, despite what is claimed on https://cmusphinx.github.io/wiki/faq/, noise suppression should be done externally. The VAD and noise removal code has added even more complexity to a frontend that was already too complex. For a live application we do not even want to manage the audio input ourselves, since that will be done by an external audio graph or pipeline such as GStreamer, which is how it has been done on all platforms for quite some time now. Putting VAD in the gst-plugin was the right idea.
Given that PocketSphinx development is essentially abandoned, we should revert to the 0.8 frontend code, particularly since alignment in batch mode is a common use case and we never want to discard any input in that case. We should also drop the audio library entirely, as its API is backwards for any modern platform, where audio is always pushed to a processing node. The feature extractor should extract features and do nothing else. This is what I have done in SoundSwallower, for instance: https://github.com/ReadAlongs/SoundSwallower
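To make the "audio is pushed to a processing node" idea concrete, here is a minimal sketch of a push-style framer that only slices samples into overlapping frames and never drops input. It is not the PocketSphinx or SoundSwallower code: `toy_framer`, `toy_framer_push`, `frame_cb`, and the tiny `FRAME_SIZE` and `FRAME_SHIFT` values are all hypothetical names and numbers chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes, far smaller than real frontend values. */
#define FRAME_SIZE  4   /* samples per analysis window */
#define FRAME_SHIFT 2   /* samples between successive windows */

/* Hypothetical framer state: a partially filled window plus a counter. */
typedef struct {
    short  window[FRAME_SIZE];
    size_t filled;      /* valid samples currently held in window */
    size_t frames_out;  /* complete frames emitted so far */
} toy_framer;

/* Callback invoked once per complete frame (may be NULL). */
typedef void (*frame_cb)(const short *frame, size_t n, void *user);

/* Push any number of samples. Every sample is stored in the window;
 * leftovers stay buffered until the next push, so nothing is discarded. */
void toy_framer_push(toy_framer *f, const short *samples, size_t n,
                     frame_cb cb, void *user)
{
    for (size_t i = 0; i < n; i++) {
        f->window[f->filled++] = samples[i];
        if (f->filled == FRAME_SIZE) {
            if (cb != NULL)
                cb(f->window, FRAME_SIZE, user);
            f->frames_out++;
            /* Keep the overlapping tail for the next frame. */
            memmove(f->window, f->window + FRAME_SHIFT,
                    (FRAME_SIZE - FRAME_SHIFT) * sizeof(f->window[0]));
            f->filled = FRAME_SIZE - FRAME_SHIFT;
        }
    }
}
```

With these sizes, pushing 10 samples in chunks of 3 and 7 yields the same frames as pushing them all at once, because the chunk boundaries never affect what is buffered. VAD or noise suppression would then sit in a separate node before or after this one.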
fe_interface.c, lines 490-498:
If *inout_nframes is less than the prespch_buf's ncep, the code returns from here and the input speech data is ignored entirely. I have verified this case; it looks like a bug.
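The hazard can be illustrated with a small, self-contained model. This is not the PocketSphinx implementation: `toy_fe`, `toy_process`, and their fields are hypothetical, and the sketch only assumes the general shape described above (queued pre-speech frames are flushed first and can use up all the output space). The point is that when this happens, the function should report that zero input samples were consumed so the caller can resubmit them.

```c
#include <stddef.h>

/* Hypothetical toy frontend state, for illustration only. */
typedef struct {
    size_t prespeech_frames; /* frames queued before speech was detected */
    size_t frame_shift;      /* input samples consumed per new frame */
} toy_fe;

/* Flush queued frames first, then frame new input with whatever output
 * space remains.
 *   nsamps        : number of input samples offered by the caller
 *   inout_nframes : in  = output capacity in frames,
 *                   out = frames actually written
 * Returns the number of input samples actually consumed, so the caller
 * knows exactly which samples still have to be resubmitted. */
size_t toy_process(toy_fe *fe, size_t nsamps, size_t *inout_nframes)
{
    size_t capacity = *inout_nframes;
    size_t written = 0;
    size_t consumed = 0;

    if (fe->frame_shift == 0) {   /* guard against an endless loop */
        *inout_nframes = 0;
        return 0;
    }
    /* Queued pre-speech frames take priority over new input. */
    while (written < capacity && fe->prespeech_frames > 0) {
        fe->prespeech_frames--;
        written++;
    }
    /* Only consume input for which an output slot is available. */
    while (written < capacity && nsamps - consumed >= fe->frame_shift) {
        consumed += fe->frame_shift;
        written++;
    }
    *inout_nframes = written;
    return consumed;
}
```

For example, with 3 queued frames, a `frame_shift` of 160, and room for only 2 output frames, a call offering 480 samples writes 2 frames and consumes 0 samples. A caller that assumes all 480 samples were processed at that point would silently lose them, which is the behaviour reported in this issue.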