adding word boundaries to the acoustic model for Na #210

alexis-michaud · 2018-10-25T15:39:21Z

Since 2018, the model for Na has included tone-group boundaries, but as of Oct. 2018 it still disregards word boundaries. A look at the story-fold cross-validation materials suggests that longer words have somewhat different acoustic properties from shorter ones, so adding word boundaries to the training data could benefit phoneme & tone recognition.

A first step (suggested by @oadams) could be to produce separate error rates for short words versus longer words, using the word segmentation in the reference transcription as a guide.
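One possible way to sketch that first step: align the hypothesis to the reference with a plain Levenshtein alignment, then charge each error to the reference word it falls in and pool the counts by word length. Everything named here is hypothetical (`error_rates_by_word_length`, the `short_max_tokens` threshold, measuring word length in tokens); it does not use any existing Persephone function, and attributing an insertion to the preceding reference word is just one reasonable convention.

```python
def error_rates_by_word_length(ref_words, hyp_tokens, short_max_tokens=2):
    """Per-bucket token error rate, using reference word boundaries as a guide.

    ref_words: list of words, each a list of tokens (phonemes and/or tones).
    hyp_tokens: flat list of predicted tokens (the model has no word boundaries).
    short_max_tokens: assumed threshold; words with at most this many tokens
        count as "short", all others as "long".
    """
    # Flatten the reference, remembering which word each token belongs to.
    ref = [(tok, wi) for wi, word in enumerate(ref_words) for tok in word]
    if not ref:
        return {}
    n, m = len(ref), len(hyp_tokens)

    # Standard edit-distance table.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = int(ref[i - 1][0] != hyp_tokens[j - 1])
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)

    # Backtrace, charging each error to a reference word.
    errors = [0] * len(ref_words)
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and ref[i - 1][0] != hyp_tokens[j - 1]
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + int(mismatch):
            if mismatch:  # substitution
                errors[ref[i - 1][1]] += 1
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:  # deletion
            errors[ref[i - 1][1]] += 1
            i -= 1
        else:  # insertion: charged to the preceding reference word (assumption)
            errors[ref[i - 1][1] if i > 0 else 0] += 1
            j -= 1

    # Pool errors and token counts per length bucket.
    totals = {"short": [0, 0], "long": [0, 0]}
    for word, errs in zip(ref_words, errors):
        bucket = "short" if len(word) <= short_max_tokens else "long"
        totals[bucket][0] += errs
        totals[bucket][1] += len(word)
    return {b: (e / t if t else None) for b, (e, t) in totals.items()}
```

Summing the per-utterance counts over a whole cross-validation fold before dividing would give corpus-level rates rather than an average of per-utterance rates.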

(Suggested label for this Issue: Yongning Na)

alexis-michaud · 2019-02-10T15:33:51Z

This relates to #214, in that the word boundary in the training corpus is a space.

"it's important that if users want to explicitly predict spaces (in character prediction), then that is accounted for. Probably best with a flag to segment_into_chars() or something similar, which would generate special tokens that represent spaces, such as underscores, for training and decoding. These then would get removed as a postprocessing step."
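A minimal sketch of what the quoted suggestion could look like. This is not the existing `segment_into_chars()`; the `keep_spaces` flag, the `SPACE_TOKEN` constant and the two postprocessing helpers are hypothetical names chosen for illustration. It also assumes one code point per token, which would split combining diacritics and multi-character tone marks, and that the transcription never contains a literal underscore.

```python
SPACE_TOKEN = "_"  # assumed boundary symbol; must not occur in the transcription


def segment_into_chars(utterance, keep_spaces=False):
    """Split an utterance into character tokens.

    With keep_spaces=True, each word boundary (a space in the training
    corpus) becomes an explicit SPACE_TOKEN so the model can predict it.
    """
    tokens = []
    for word_index, word in enumerate(utterance.split()):
        if keep_spaces and word_index > 0:
            tokens.append(SPACE_TOKEN)
        tokens.extend(word)  # one token per code point (simplifying assumption)
    return tokens


def strip_space_tokens(tokens):
    """Postprocessing: drop the boundary tokens, e.g. before phoneme scoring."""
    return [tok for tok in tokens if tok != SPACE_TOKEN]


def restore_spaces(tokens):
    """Postprocessing: turn boundary tokens back into spaces for display."""
    return "".join(" " if tok == SPACE_TOKEN else tok for tok in tokens)
```

For example, `segment_into_chars("ab cd", keep_spaces=True)` yields `["a", "b", "_", "c", "d"]`, and `restore_spaces` on that list gives back `"ab cd"`.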

shuttle1987 added the Yongning Na label Oct 26, 2018
