This repository has been archived by the owner on Jun 9, 2022. It is now read-only.

fe_process_frames_ext can discard speech data? #41

Open
wutiantong opened this issue Mar 27, 2017 · 4 comments

@wutiantong

wutiantong commented Mar 27, 2017

fe_interface.c, lines 490-498:

    /* Try to read from prespeech buffer */
    if (fe->vad_data->in_speech && fe_prespch_ncep(fe->vad_data->prespch_buf) > 0) {
        outidx = fe_copy_from_prespch(fe, inout_nframes, buf_cep, outidx);
        if ((*inout_nframes) < 1) {
            /* mfcc buffer is filled from prespeech buffer */
            *inout_nframes = outidx;
            return 0;
        }
    }

If *inout_nframes is smaller than the number of frames held in prespch_buf, the function returns here and the input speech data is ignored entirely.
I have verified this case, and it looks like a bug.
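
The control flow described above can be reproduced with a toy model. This is only a sketch under stated assumptions: `toy_fe_t`, `toy_copy_from_prespch`, and `toy_process` are hypothetical stand-ins written for illustration, not the sphinxbase code, and they mirror nothing but the early-return logic of the quoted snippet.

```c
#include <stddef.h>

/* Hypothetical stand-in for the frontend state; not the real fe_t. */
typedef struct {
    int in_speech;       /* VAD currently reports speech */
    int prespch_frames;  /* frames waiting in the prespeech buffer */
} toy_fe_t;

/* Drain up to *inout_nframes buffered frames; returns the number copied. */
static int toy_copy_from_prespch(toy_fe_t *fe, int *inout_nframes)
{
    int n = fe->prespch_frames < *inout_nframes ? fe->prespch_frames
                                                : *inout_nframes;
    fe->prespch_frames -= n;
    *inout_nframes -= n;
    return n;
}

/* Mirrors only the control flow of the quoted snippet.
 * Returns the number of input samples consumed. */
static size_t toy_process(toy_fe_t *fe, size_t nsamps, int *inout_nframes,
                          size_t frame_shift)
{
    int outidx = 0;
    size_t consumed = 0;

    if (fe->in_speech && fe->prespch_frames > 0) {
        outidx = toy_copy_from_prespch(fe, inout_nframes);
        if (*inout_nframes < 1) {
            /* Output is full of prespeech frames: return before
             * a single input sample has been examined. */
            *inout_nframes = outidx;
            return 0;
        }
    }
    /* Otherwise turn input samples into frames while room remains. */
    while (*inout_nframes > 0 && nsamps - consumed >= frame_shift) {
        consumed += frame_shift;
        outidx++;
        (*inout_nframes)--;
    }
    *inout_nframes = outidx;
    return consumed;
}
```

With 5 buffered frames and room for only 3 output frames, `toy_process` reports 0 samples consumed. Whether that input is actually lost then depends on the caller noticing and resubmitting it, which is the behaviour this issue calls into question.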

@wutiantong
Author

wutiantong commented Mar 27, 2017

The same problem occurs at lines 525-535:

    /* Process all remaining frames. */
    while (*inout_nframes > 0 && *inout_nsamps >= (size_t)fe->frame_shift) {
        fe_shift_frame(fe, *inout_spch, fe->frame_shift);
        fe_write_frame(fe, buf_cep[outidx], voiced_spch != NULL);

        outidx = fe_check_prespeech(fe, inout_nframes, buf_cep, outidx, out_frameidx, inout_nsamps, orig_nsamps);

        /* Update input-output pointers and counters. */
        *inout_spch += fe->frame_shift;
        *inout_nsamps -= fe->frame_shift;
    }

If fe_write_frame changes vad_data->in_speech from false to true, fe_check_prespeech can exhaust inout_nframes entirely with frames from vad_data->prespch_buf and so terminate the while loop partway through. The remaining speech data is then skipped, even though the code that follows tries to handle overflow_samps.
I'm sure some speech data is skipped here.
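
A toy model of that loop shows how it can stop with input left over. Everything here is an assumption made for illustration: `toy_remaining_after_loop` and its `onset_frame` parameter are hypothetical, and the way frames enter and leave the prespeech buffer is simplified rather than copied from sphinxbase.

```c
#include <stddef.h>

/* Simplified model of the frame loop. Returns the number of input
 * samples still unprocessed when the loop exits.
 *   nframes        - free slots in the output buffer
 *   prespch_frames - frames already held in the prespeech buffer
 *   onset_frame    - hypothetical: index of the frame at which the
 *                    VAD flips from silence to speech */
static size_t toy_remaining_after_loop(size_t nsamps, size_t frame_shift,
                                       int nframes, int prespch_frames,
                                       int onset_frame)
{
    int in_speech = 0;
    int frame = 0;

    while (nframes > 0 && nsamps >= frame_shift) {
        if (!in_speech) {
            /* Frame is held back until the VAD makes a decision. */
            prespch_frames++;
            if (frame == onset_frame) {
                in_speech = 1;
                /* Stand-in for fe_check_prespeech: flush buffered
                 * frames into the output, using up output slots. */
                while (nframes > 0 && prespch_frames > 0) {
                    prespch_frames--;
                    nframes--;
                }
            }
        } else {
            nframes--;  /* speech frame goes straight to the output */
        }
        frame++;
        nsamps -= frame_shift;
    }
    return nsamps;
}
```

For example, with 1600 samples, a frame shift of 160, 4 output slots, and speech onset at frame 5, the flush fills all 4 slots and the loop exits with 640 samples unprocessed. The sketch only demonstrates the early exit; what happens to those samples afterwards is up to the surrounding code.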

@nshmyrev
Contributor

Honestly, there are many issues here. Yes, sometimes data is skipped. We desperately need a frontend rework: not just bug fixes, but a completely new architecture with proper estimation of parameters. If you are interested in working on this, I can outline the design in a document.

@wutiantong
Author

Good to hear that.
Yes, I'm interested, though I probably lack experience with this kind of work.
I can't promise anything, but I'll try my best.

@dhdaines
Contributor

dhdaines commented Jun 8, 2022

In my opinion, despite what is claimed on https://cmusphinx.github.io/wiki/faq/, noise suppression should be done externally. The VAD and noise removal code has added even more complexity to a frontend that was already too complex. This matters particularly for a live application, where we do not want to manage the audio input at all: that is done by an external audio graph/pipeline such as GStreamer, which is how it has been done on all platforms for quite some time now. Putting VAD in the gst-plugin was the right idea.

Given that PocketSphinx development is essentially abandoned, we should revert to the 0.8 frontend code, particularly since alignment in batch mode is a common use case and we never want to discard any input in that case.

We should also discard the audio library entirely as its API is backwards for any modern platform where audio is always pushed to a processing node. The feature extractor should extract features and do nothing else. This is what I have done in SoundSwallower for instance: https://github.com/ReadAlongs/SoundSwallower
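
A push-style extractor of the kind described above could look roughly like this. It is a minimal sketch, not SoundSwallower's or PocketSphinx's API: every name (`toy_extractor_t`, `toy_push`, `toy_count_sink`, `TOY_FRAME_SHIFT`) is hypothetical, and frame energy stands in for real feature computation. The point is only that every pushed sample ends up either in an emitted frame or in the carry-over buffer, so nothing is discarded.

```c
#include <stddef.h>

#define TOY_FRAME_SHIFT 4  /* tiny value, for illustration only */

/* Hypothetical extractor state: samples not yet forming a full frame. */
typedef struct {
    short pending[TOY_FRAME_SHIFT];
    size_t npending;
} toy_extractor_t;

/* Callback that receives one feature value per complete frame. */
typedef void (*toy_sink_fn)(long feature, void *user);

/* Push any number of samples. Each complete frame is handed to the
 * sink immediately; leftover samples wait for the next call. */
static void toy_push(toy_extractor_t *ex, const short *samps, size_t n,
                     toy_sink_fn sink, void *user)
{
    for (size_t i = 0; i < n; i++) {
        ex->pending[ex->npending++] = samps[i];
        if (ex->npending == TOY_FRAME_SHIFT) {
            long energy = 0;  /* placeholder for real feature extraction */
            for (size_t j = 0; j < TOY_FRAME_SHIFT; j++)
                energy += (long)ex->pending[j] * ex->pending[j];
            sink(energy, user);
            ex->npending = 0;
        }
    }
}

/* Example sink: counts frames; `user` must point to a size_t. */
static void toy_count_sink(long feature, void *user)
{
    (void)feature;
    (*(size_t *)user)++;
}
```

Pushing 6 samples and then 2 more with `toy_count_sink` yields one frame after the first call and a second frame after the next, with the 2 leftover samples carried across the call boundary. VAD and noise suppression would sit outside such a component, in whatever pipeline feeds it.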
