Is it a duplication that AA0 and AA are both in the cmudict.symbols file? #33

Open
JohnHerry opened this issue Mar 27, 2023 · 1 comment

@JohnHerry
Thanks for your work. English is not my native language, but I am learning to train English text-to-speech models. I want to use cmudict.symbols as the symbol table for encoding English phonemes. As far as I know, the digit at the end of a cmudict symbol is a stress label, and AA0 means no stress. So do I need both AA and AA0 in the symbol table when encoding English phonemes?

@Alexir
Alexir commented Jun 1, 2023

A good question.
Whether you use AA0 depends on how your synthesizer deals with stress. For example, an unstressed vowel might be shortened or centralized (moved towards AX in phonetic space).

Note that not all entries have this annotation. This dictionary was developed for speech recognition, where stress marks became less important at some point: stress was modeled implicitly in triphones, with the stressed and unstressed variants of a vowel represented by (essentially) a bimodal distribution. More recently, with more training data, stress marks became even less important.
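
To make the two options concrete, here is a minimal Python sketch of how a phoneme encoder could either keep the stress digits or collapse AA0/AA1/AA2 to the bare symbol AA. The names `split_stress` and `encode` are hypothetical helpers written for illustration, not part of cmudict or any toolkit, and the phoneme sequence in the usage note is only an illustrative input.

```python
import re

# A cmudict symbol is a run of uppercase letters, optionally followed by a
# stress digit: 0 = no stress, 1 = primary stress, 2 = secondary stress.
_SYMBOL_RE = re.compile(r"^([A-Z]+)([012])?$")


def split_stress(symbol):
    """Split a symbol such as 'AA0' into ('AA', 0).

    Symbols without a digit (bare vowels and all consonants) return
    None for the stress, e.g. 'AA' -> ('AA', None).
    """
    match = _SYMBOL_RE.match(symbol)
    if match is None:
        raise ValueError(f"not a recognized symbol: {symbol!r}")
    base, stress = match.groups()
    return base, (int(stress) if stress is not None else None)


def encode(phonemes, keep_stress=True):
    """Return the symbol sequence, with or without stress digits.

    keep_stress=True  -> the symbol table needs AA0, AA1, AA2, ...
    keep_stress=False -> the symbol table only needs the bare AA, ...
    """
    if keep_stress:
        return list(phonemes)
    return [split_stress(p)[0] for p in phonemes]
```

For example, `encode(["HH", "AH0", "L", "OW1"], keep_stress=False)` yields `["HH", "AH", "L", "OW"]`. If your synthesizer models stress explicitly, keep the digits and use only the stress-marked vowel symbols; if it does not, stripping them leaves a smaller symbol table with only the bare vowels.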
