`fe_process_frames_ext` can discard speech data? #41

wutiantong · 2017-03-27T07:50:58Z

fe_interface.c, lines 490-498:

    /* Try to read from prespeech buffer */
    if (fe->vad_data->in_speech && fe_prespch_ncep(fe->vad_data->prespch_buf) > 0) {
        outidx = fe_copy_from_prespch(fe, inout_nframes, buf_cep, outidx);
        if ((*inout_nframes) < 1) {
            /* mfcc buffer is filled from prespeech buffer */
            *inout_nframes = outidx;
            return 0;
        }
    }

If `*inout_nframes` is smaller than the number of frames held in the prespeech buffer (`fe_prespch_ncep`), the function returns here and the input speech data is ignored entirely.
I have verified this case, and it looks like a bug.
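The early return described above can be reproduced in a small standalone model. This is only a sketch under stated assumptions: `toy_fe_t` and `toy_process` are hypothetical names, not part of the PocketSphinx API, and the model assumes one output frame per `frame_shift` input samples. It shows that when buffered prespeech frames fill the caller's output capacity, the function returns without reading any input.

```c
#include <stddef.h>

/* Hypothetical toy model, NOT the PocketSphinx structures. */
typedef struct {
    int in_speech;      /* stand-in for vad_data->in_speech */
    int prespch_frames; /* frames waiting in the prespeech buffer */
} toy_fe_t;

/*
 * Emits up to *inout_nframes frames, buffered prespeech frames first.
 * On return, *inout_nframes holds the number of frames emitted and
 * *inout_nsamps holds the number of input samples NOT consumed.
 */
static void toy_process(toy_fe_t *fe, size_t frame_shift,
                        size_t *inout_nsamps, int *inout_nframes)
{
    int room = *inout_nframes;
    int emitted = 0;

    if (fe->in_speech && fe->prespch_frames > 0) {
        int n = fe->prespch_frames < room ? fe->prespch_frames : room;
        fe->prespch_frames -= n;
        emitted += n;
        room -= n;
        if (room < 1) {
            /* Output is full: return before reading any input. */
            *inout_nframes = emitted;
            return;
        }
    }

    /* Otherwise turn input into frames, one per frame_shift samples. */
    while (room > 0 && *inout_nsamps >= frame_shift) {
        *inout_nsamps -= frame_shift;
        emitted++;
        room--;
    }
    *inout_nframes = emitted;
}
```

With 5 buffered frames, room for 3 output frames, and 1600 input samples, the call returns 3 frames and leaves all 1600 samples unconsumed. Whether that audio is actually lost then depends on whether the caller notices the unconsumed samples and submits them again.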


wutiantong · 2017-03-27T14:49:18Z

The same problem occurs at lines 525-535:

    /* Process all remaining frames. */
    while (*inout_nframes > 0 && *inout_nsamps >= (size_t)fe->frame_shift) {
        fe_shift_frame(fe, *inout_spch, fe->frame_shift);
        fe_write_frame(fe, buf_cep[outidx], voiced_spch != NULL);

        outidx = fe_check_prespeech(fe, inout_nframes, buf_cep, outidx, out_frameidx, inout_nsamps, orig_nsamps);

        /* Update input-output pointers and counters. */
        *inout_spch += fe->frame_shift;
        *inout_nsamps -= fe->frame_shift;
    }

If `fe_write_frame` flips `vad_data->in_speech` from false to true, `fe_check_prespeech` can use up all of `inout_nframes` with the contents of `vad_data->prespch_buf`, which ends this `while` loop early. The remaining speech data is then skipped, even though the code that follows tries to handle `overflow_samps`.
I'm sure some speech data is skipped here.
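The loop behaviour can be illustrated with another toy model. Again, this is a sketch and not the real code: `toy_vad_t`, `toy_loop`, and the `speech_start` field are hypothetical, and the model assumes that frames seen during silence are held back and then flushed in one go when speech starts.

```c
#include <stddef.h>

/* Hypothetical toy state, NOT the PocketSphinx structures. */
typedef struct {
    int in_speech;      /* stand-in for vad_data->in_speech */
    int prespch_frames; /* frames held in the prespeech buffer */
    int speech_start;   /* input frame index at which the VAD fires */
} toy_vad_t;

/*
 * Processes input one frame at a time. Returns the number of input
 * samples left unread when the loop stops; *inout_nframes is set to
 * the number of frames emitted.
 */
static size_t toy_loop(toy_vad_t *vad, size_t frame_shift,
                       size_t nsamps, int *inout_nframes)
{
    int room = *inout_nframes;
    int emitted = 0;
    int frame_no = 0;

    while (room > 0 && nsamps >= frame_shift) {
        nsamps -= frame_shift; /* consume one frame of input */

        if (!vad->in_speech && frame_no == vad->speech_start)
            vad->in_speech = 1; /* silence -> speech transition */

        if (vad->in_speech) {
            /* Flush buffered frames plus the current one, up to room. */
            int pending = vad->prespch_frames + 1;
            int n = pending < room ? pending : room;
            emitted += n;
            room -= n;
            vad->prespch_frames = pending - n;
        } else {
            vad->prespch_frames++; /* hold the frame back */
        }
        frame_no++;
    }

    *inout_nframes = emitted;
    return nsamps;
}
```

With speech starting at the third frame, room for 2 output frames, and 1600 input samples at a shift of 160, the flush at the transition fills the output and the loop stops with 1120 samples (7 frames) still unread. In this model the caller has to handle that remainder itself.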

nshmyrev · 2017-03-27T14:51:15Z

Honestly, there are so many issues here. Yes, sometimes data is skipped. We desperately need a frontend rework rather than individual bug fixes: a completely new architecture with proper estimation of parameters is required. If you are interested in working on this, I can outline the design in a document.

wutiantong · 2017-03-27T15:30:09Z

Good to hear that.
Yes, I'm interested, though I probably lack the experience for this work.
I can't promise anything, but I'll try my best.

dhdaines · 2022-06-08T13:17:14Z

In my opinion, despite what is claimed on https://cmusphinx.github.io/wiki/faq/, noise suppression should be done externally. The VAD and noise removal code has added even more complexity to a frontend that was already too complex. This matters particularly because in a live application we do not want to manage the audio input at all: that is done by an external audio graph or pipeline such as GStreamer, which is how it has been done on all platforms for quite some time now. Putting VAD in the gst-plugin was the right idea.

Given that PocketSphinx development is essentially abandoned, we should revert to the 0.8 frontend code, particularly since alignment in batch mode is actually a common use case, and we never want to discard any input in that case.

We should also discard the audio library entirely as its API is backwards for any modern platform where audio is always pushed to a processing node. The feature extractor should extract features and do nothing else. This is what I have done in SoundSwallower for instance: https://github.com/ReadAlongs/SoundSwallower
