The listed pronunciation of "for" and "four" both sound like "far" #37

Open
jsalsman opened this issue Sep 11, 2024 · 13 comments

Comments

@jsalsman

This has wide-ranging implications:

Microsoft/Duolingo: https://www.youtube.com/watch?v=DTj7VILryRo

Google: https://www.youtube.com/watch?v=K-tEkivp_YM

This happens because CMUDICT lists "for" as F AO R instead of F OW R, using the "ah" vowel sound in "caught" instead of the "oh" sound in "oat."

This is NOT because of the cot-caught merger, or any other linguistic reason. It is a bona fide coding error which occurs in over 50 similar entries.

I have repeatedly attempted to raise this issue with Alex Rudnicky and others, to no avail. Recommendations for a reasonable plan to approach this issue are both sorely needed and welcome.

@dhdaines

dhdaines commented Sep 11, 2024

As far as I know this really is intentional. I have always assumed that AO is /ɔ/ whereas OW is /oʊ/ and I am not alone in this assumption. Those are not the same vowel, even in Canadian / Midwestern, cot-caught merged English.

And in fact it is because of the cot-caught merger, because if "caught" (or the other one, I have no idea which is which) is transcribed "K AO T", then yes obviously if you train a TTS from cot-caught merged English (which, frankly, I suspect the younger generation speaks nearly everywhere in North America) then "for" is going to sound like "far"...
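The reading of the symbols assumed in this comment can be written down as a small lookup table. This is only a sketch: the table `ARPABET_TO_IPA` and the helper `to_ipa` are hypothetical names introduced here for illustration, not part of cmudict, and the IPA values follow the conventional reading of AO as /ɔ/ and OW as /oʊ/ described above.

```python
# Conventional ARPABET-to-IPA readings for the vowels discussed in this thread.
# This table is an illustrative assumption, not something shipped with cmudict.
ARPABET_TO_IPA = {
    "AA": "ɑ",   # the vowel cmudict lists for "far"
    "AO": "ɔ",   # the vowel cmudict lists for "for"
    "OW": "oʊ",  # the vowel of "boat"
}


def strip_stress(phone: str) -> str:
    """Drop a trailing stress digit (0, 1, 2) from a vowel symbol."""
    return phone.rstrip("012")


def to_ipa(phones: str) -> str:
    """Convert a space-separated ARPABET string to a rough IPA string.

    Symbols missing from the table (only plain consonants in these examples)
    are simply lowercased, which is adequate for F and R but not in general.
    """
    out = []
    for phone in phones.split():
        base = strip_stress(phone)
        out.append(ARPABET_TO_IPA.get(base, base.lower()))
    return "".join(out)


print(to_ipa("F AO1 R"))  # fɔr
print(to_ipa("F AA1 R"))  # fɑr
print(to_ipa("F OW1 R"))  # foʊr
```

Under this reading, the listed entry for "for" and the proposed replacement with OW come out as different vowels, and neither matches the vowel listed for "far".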

@dhdaines

I would add though that perhaps it is a distinction without a difference as I can't think of a minimal pair for ɔ ~ oʊ off the top of my head. But phonetically, definitely not the same, see, e.g.

https://en.wiktionary.org/wiki/for#Pronunciation
https://en.wiktionary.org/wiki/boat#Pronunciation

@jsalsman
Author

Nobody says, "one two three four" in a way that rhymes with "far", do they?

@dhdaines

> Nobody says, "one two three four" in a way that rhymes with "far", do they?

I don't really understand your question. Perhaps I wasn't clear in what I said?

Are you under the impression that the vowel in "far" is /ɔ/ (that is, AO)? It's not listed that way in CMUDict.

Are you under the impression that AO is pronounced as something other than /ɔ/?

@dhdaines

I'll refer you to this table, tell me if there are pronunciations in cmudict that don't correspond to "General American":

https://en.wikipedia.org/wiki/Template:English_-or-_table

@dhdaines

Note also this for "for" versus "four" which are the same vowel nearly everywhere in the US (I think they're the same vowel too but many Canadians may disagree):

https://en.wikipedia.org/wiki/English-language_vowel_changes_before_historic_/r/#Horse%E2%80%93hoarse_merger

@dhdaines

dhdaines commented Sep 14, 2024

Also, note that "caught" is transcribed with two alternatives to cover dialects of US English with and without the merger:

caught K AA1 T
caught(2) K AO1 T

Again. AO is /ɔ/. Transcribing "four" as "F AO R" is not an error. In the case of "for" the vowel can be reduced to "ER", which again, is also present in the dictionary (not sure what for(3) is supposed to be...):

for F AO1 R
for(2) F ER0
for(3) F R ER0
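
For anyone who wants to inspect these alternatives programmatically, here is a minimal sketch of reading entries in this layout, where a headword may carry a variant marker such as `(2)`. The names `ENTRY_RE` and `parse_entries` are hypothetical, and skipping lines that start with `;;;` assumes the comment convention used in some cmudict releases.

```python
import re

# Entries copied from the comment above.
SAMPLE = """\
caught K AA1 T
caught(2) K AO1 T
for F AO1 R
for(2) F ER0
for(3) F R ER0
"""

# A headword, an optional variant marker such as "(2)", then the phones.
ENTRY_RE = re.compile(r"^(?P<word>\S+?)(?:\((?P<variant>\d+)\))?\s+(?P<phones>.+)$")


def parse_entries(text):
    """Group pronunciations by headword, in the order they appear."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and comment lines (";;;" is an assumed convention).
        if not line or line.startswith(";;;"):
            continue
        match = ENTRY_RE.match(line)
        if match is None:
            continue
        word = match["word"].lower()
        entries.setdefault(word, []).append(match["phones"].split())
    return entries


entries = parse_entries(SAMPLE)
for phones in entries["for"]:
    print(" ".join(phones))
```

Grouping by headword keeps all variants together, so a downstream tool can choose among them instead of silently taking the first one.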

@danmartinez

danmartinez commented Sep 14, 2024

Isn't the issue here that the Microsoft/Google pronunciations are not following what's in the cmudict?

Wondering if they might be using some older version where the entries were incorrect, I went all the way back to the original 0.7a commit 11 years ago; the entries mentioned are still as stated above; i.e.,

DOOR D AO1 R
FAR F AA1 R
FOR F AO1 R

It seems to me that this isn't an instance of incorrect cmudict transcriptions.
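
The same conclusion can be checked mechanically from the three entries quoted above. The sketch below uses one common working definition of rhyme (identical phones from the last primary-stressed vowel onward); that definition and the helper names `rhyme_part` and `rhymes` are assumptions made for illustration, not anything cmudict defines.

```python
# Entries as quoted in the comment above.
PRONUNCIATIONS = {
    "door": ["D", "AO1", "R"],
    "far": ["F", "AA1", "R"],
    "for": ["F", "AO1", "R"],
}


def rhyme_part(phones):
    """Return the phones from the last primary-stressed vowel onward,
    with stress digits removed."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i].endswith("1"):
            return [p.rstrip("012") for p in phones[i:]]
    # No primary stress marked: compare the whole pronunciation.
    return [p.rstrip("012") for p in phones]


def rhymes(a, b):
    """True if the two words share the same rhyme part in PRONUNCIATIONS."""
    return rhyme_part(PRONUNCIATIONS[a]) == rhyme_part(PRONUNCIATIONS[b])


print(rhymes("for", "door"))  # True
print(rhymes("for", "far"))   # False
```

By these entries "for" rhymes with "door" and not with "far", so any system that renders "for" like "far" is not getting that from the listed transcriptions alone.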

The question is then, why are those pronunciations seemingly treating "far" like "for"? I'm struggling to think of any instance of a North American dialect where this is even the case.

Regarding the -or table linked to above: I do not believe anyone with the particular NA dialectal tendency to pronounce "Florida" to rhyme with "far" would ever pronounce "for" or "four" like "far" when it's a free morpheme. (So, yes, to my knowledge, no one with such a dialect tendency would say, "one, two, three, far.")

@jsalsman
Author

Here's what Merriam-Webster.com has, including on its pronunciation key:

Far: fär, where ä is "as o in mop"

For and Four: fȯr, where ȯ is "as aw in law," but is very clearly spoken in both of the words' audio clips as ō, which is described "as o in go"

So this is not just a CMUDICT problem! Were CMUDICT pronunciations originally taken from a revision of Webster?

@jsalsman changed the title on Sep 21, 2024
from: The listed pronunciation for "for" and "four" both sound like "far"
to: The listed pronunciation of "for" and "four" both sound like "far"

@danmartinez

@jsalsman What change(s) would you suggest to how these sounds might be encoded then?

@jsalsman
Author

jsalsman commented Sep 21, 2024

I think AO should be changed to OW anywhere it appears as /ɔ/ in the OED.com pronunciations for U.S. English.

I'm not a fan of diphthongs, but otherwise it's just wrong.

@dhdaines

dhdaines commented Sep 23, 2024

> I think AO should be changed to OW anywhere it appears as /ɔ/ in the OED.com pronunciations for U.S. English.
>
> I'm not a fan of diphthongs, but otherwise it's just wrong.

What vowel is AO supposed to represent, then, if it's not /ɔ/?

If it's /ɒ/, that's not a phoneme of General American English.

I don't understand your comment about diphthongs at all. It's not a question of whether you like them but:

  1. Whether they exist in people's speech (the vowel in "for" is not a diphthong in GAE)
  2. Whether they are distinctive phonemes.

I can't think of a minimal pair for /ɔ/ and /oʊ/ in GAE so yes, this might make sense to merge them, but they are quite phonetically distinct...
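
One way to test the minimal-pair question against the dictionary itself is to group words by their pronunciation with the AO or OW slot masked out. This is a rough sketch: `minimal_pairs` is a hypothetical helper, and apart from "caught" and "for", which are quoted earlier in this thread, the sample entries are assumptions that should be verified against the actual dictionary file before drawing any conclusion.

```python
from collections import defaultdict

# Illustrative sample. "caught" and "for" are quoted earlier in this thread;
# the other entries are assumptions to be verified against the dictionary file.
SAMPLE = {
    "caught": ["K", "AO1", "T"],
    "for": ["F", "AO1", "R"],
    "coat": ["K", "OW1", "T"],  # assumed entry
    "law": ["L", "AO1"],        # assumed entry
    "low": ["L", "OW1"],        # assumed entry
}


def minimal_pairs(entries, phone_a="AO", phone_b="OW"):
    """Find word pairs whose phones differ only by phone_a versus phone_b."""
    buckets = defaultdict(lambda: {phone_a: [], phone_b: []})
    for word, phones in entries.items():
        bare = [p.rstrip("012") for p in phones]  # ignore stress digits
        for i, phone in enumerate(bare):
            if phone in (phone_a, phone_b):
                # Key: the pronunciation with the target vowel masked out.
                key = tuple(bare[:i] + ["_"] + bare[i + 1:])
                buckets[key][phone].append(word)
    pairs = []
    for group in buckets.values():
        for a in group[phone_a]:
            for b in group[phone_b]:
                pairs.append((a, b))
    return sorted(pairs)


print(minimal_pairs(SAMPLE))
```

Run over the full dictionary, a non-empty result would list the words that a blanket AO-to-OW merge would turn into homophones; restricting the search to vowels before R would show whether the pre-/r/ context behaves differently.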

@danmartinez

danmartinez commented Sep 23, 2024

I think I must be missing the point. @jsalsman If you would, please, I'd like more explanation as to the rationale for such changes.

My thinking is, if the concern arises from a speech analyzer determining that (as an example) door rhymes with far, I believe the most likely applicable scenarios (in General American) are 1) the treatment of /ɔ/ as /ɑ/ ~ /ɒ/ (unlikely) and/or 2) /ɔ/ ~ /oʊ/.

We can easily disregard /ɑ/~/ɒ/. Some North American speakers do realize a non-coda /ɔ/ as /ɑ/ ~ /ɒ/, but this is allophonic. Regardless, as this trait only applies to non-coda positions, if the analyzer's rationale stems from this in some way, it's simply incorrect. In any case, this would have nothing to do with cmudict.

As to whether this is between /ɔ/ and /oʊ/ or perhaps /o/, there are several things to consider. Are we only talking about vowels that occur before /r/? What mergers, if any, are we accounting for? (Several concern these sounds.)

Lastly, are we even sure this software is using cmudict in the first place?

Thanks.


3 participants