Adding new words to the vocabulary #79
Comments
Hello, I'm quite new to Kaldi and not qualified to answer, but does this help? I hope it helps at least with the format question.
Thanks for the answer, but what I want to do is add a set of new in-domain words to the vocabulary (I do not want them to be considered OOV). To do that, I need to generate pronunciations for them, coming either from the CMU dictionary or from a G2P system.
Any hint?
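To make the G2P route concrete, here is a hedged sketch using Sequitur G2P, one of the common Kaldi-adjacent tools for this; it assumes sequitur-g2p is installed and that a model trained on a suitable dictionary already exists as model.g2p (both file names are illustrative):

```python
# a hedged sketch: generate pronunciations for new in-domain words with a
# pre-trained Sequitur G2P model. sequitur-g2p must be installed; the file
# names new_words.txt and model.g2p are illustrative.
import subprocess

with open("new_words.txt", "w", encoding="utf-8") as f:
    f.write("zamia\nkaldi\n")  # one word per line

# g2p.py prints "word<TAB>pronunciation" lines to stdout
subprocess.run(["g2p.py", "--model", "model.g2p", "--apply", "new_words.txt"],
               check=True)
```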
Would it work to add the words to the original dict.ipa, use the scripts to generate the new phones.txt and the graph, and use those for rescoring on decoding?
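On the Kaldi side, that rebuild could look roughly like the sketch below; it assumes a standard egs/-style recipe checkout, that the extended dictionary is already in data/local/dict, and the exp/ model path is illustrative:

```python
# a rough sketch of the usual Kaldi steps after extending the dictionary:
# rebuild the lang directory, then recompile the decoding graph.
import subprocess

subprocess.run(["utils/prepare_lang.sh",
                "data/local/dict", "<UNK>",           # dict dir, OOV symbol
                "data/local/lang_tmp", "data/lang"],  # tmp dir, output dir
               check=True)

subprocess.run(["utils/mkgraph.sh",
                "data/lang_test",         # lang dir including G.fst
                "exp/nnet3/tdnn",         # acoustic model dir (illustrative)
                "exp/nnet3/tdnn/graph"],  # output graph dir
               check=True)
```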
@joazoa If the new word is not in the language model, you have to extend the language model too. An approach is provided by this repo: https://github.com/gooofy/kaldi-adapt-lm
Yes, sorry, I forgot to mention that part; you'd also have to run KenLM again.
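Rebuilding the LM with KenLM could look roughly like this; corpus.txt is an assumed text file with the adaptation sentences (including the new words), and the 3-gram order is just an example:

```python
# a rough sketch: retrain an n-gram LM with KenLM's lmplz, then binarize it.
# Assumes the KenLM binaries are on PATH; corpus.txt and order 3 are
# illustrative choices.
import subprocess

with open("corpus.txt") as text, open("lm.arpa", "w") as arpa:
    subprocess.run(["lmplz", "-o", "3"], stdin=text, stdout=arpa, check=True)

# optional: compile the ARPA file into KenLM's binary format
subprocess.run(["build_binary", "lm.arpa", "lm.binary"], check=True)
```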
Yes, but with kaldi-adapt-lm it seems you are restricted to the words the model is already able to recognise (i.e. words that are part of the lexicon), cf. "we also want to limit our language model to the vocabulary the audio model supports, so let's extract the vocabulary next".
You can use the speech_lex_edit.py script to add new words to the dictionary. The original dict uses IPA phoneme symbols; for the Kaldi models those get converted to X-SAMPA, AFAIR. You can find translation tables as well as mapping helper functions here: https://github.com/gooofy/py-nltools/blob/master/nltools/phonetics.py
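A minimal sketch of using those helpers, assuming py-nltools is installed and that ipa2xsampa / xsampa2ipa have roughly this interface (check the linked phonetics.py for the exact signatures); the example word and IPA transcription are my own:

```python
# a minimal sketch, assuming py-nltools is installed (pip install py-nltools)
# and nltools.phonetics exposes ipa2xsampa/xsampa2ipa roughly as below
from nltools.phonetics import ipa2xsampa, xsampa2ipa

word = "church"   # illustrative entry, not from the zamia dictionary
ipa = "tʃɜːtʃ"    # hypothetical IPA transcription

xs = ipa2xsampa(word, ipa)   # first argument is the grapheme string,
print(xs)                    # AFAIR used mainly for error reporting
print(xsampa2ipa(word, xs))  # round-trip back to IPA
```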
Did you manage to do this? @ckobus
Sorry, I just noticed your message.
Hi @ckobus, which scripts did you use, and from where, after converting to IPA? Can you please clarify?
@gooofy, @ckobus, @ammyt
Hi @fquirin, there is a script in the package that does the conversion automatically (at least for German).
Hi @abdullah-tayeh, thanks for the note :-)
@fquirin, please check out the tables in https://github.com/gooofy/py-nltools/blob/master/nltools/phonetics.py which should contain all the phonemes used in zamia-speech.
Hey @gooofy, yes, that's where I found them. I'm building a new version of kaldi-adapt-lm and wanted to add an espeak-to-zamia feature (espeak IPA) for new lexicon entries 🙂. Btw the 2019 Zamia Kaldi models still rock 😎 👍
AFAIR I decided against the concept of "primary stress" vs. "secondary stress" when designing the Zamia phoneme set; instead I went with a general "stress" mark which can appear multiple times within one word. The main reason was dealing with German compound words, but also practicality: Zamia's phoneme set is geared towards dealing with TTS results, which can contain arbitrary numbers of stress marks depending on the tool used. In fact, I don't recall any TTS engine distinguishing primary and secondary stress.
Thanks for the explanation @gooofy! I tried to search for info about "AFAIR" before but couldn't find anything ^^.
From my experience, converting from IPA can always be difficult, depending on the source. That IPA-normalization table grew when I started extracting IPA from Wiktionary and is certainly by no means complete (or correct, for that matter).
Ok, weird, shouldn't there be a clear set of characters and conversion rules for IPA to X-SAMPA? 😕
To be honest I don't understand this IPA normalization table entirely 🤔. For example those characters:
All 4 of them exist in the IPA table and have a different purpose. Why would you convert one to another? [EDIT]
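For what it's worth, here is a hedged illustration of the kind of mapping such a normalization table typically performs; the character pairs below are common IPA look-alikes chosen for illustration, not necessarily the four discussed above:

```python
# illustrative only: map common look-alike characters onto the code points
# the IPA chart actually uses; these pairs are my own examples, not taken
# from the zamia-speech table
IPA_NORMALIZE = {
    "g": "\u0261",  # ASCII g       -> ɡ (LATIN SMALL LETTER SCRIPT G)
    "'": "\u02c8",  # apostrophe    -> ˈ (primary stress mark)
    ":": "\u02d0",  # ASCII colon   -> ː (length mark)
    "ε": "\u025b",  # Greek epsilon -> ɛ (open-mid front unrounded vowel)
}

def normalize_ipa(s: str) -> str:
    """Replace look-alike characters with their canonical IPA code points."""
    return "".join(IPA_NORMALIZE.get(ch, ch) for ch in s)

print(normalize_ipa("g'e:t"))  # -> ɡˈeːt
```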
I am by no means an expert here; maybe you should discuss these questions with someone more proficient in the field of (computer-)linguistics. That said, here is my take: IPA is typically written by humans, for humans, to convey some idea of how a written word could be pronounced. I came across dozens of Wiktionary IPA entries that looked very sensible to me until I fed them into a TTS system and listened to what that system produced out of them. IPA defines a huge number of phonemes and lots of additional symbols; all of that helps convey pronunciations to humans and supports lots of different languages.

Designing a phoneme set for machines to produce mathematical models of human speech is a very different affair: typically you want a small set of phonemes, especially when you start with a relatively small set of samples. The larger your phoneme set, the more phonemes will have very few samples (or none at all) they occur in, causing instabilities in your model. But even if you have a large sample base, there is still the question of what good additional phonemes will do to your model: will those additional phonemes really improve recognition performance or the quality of the speech produced?

At some point you will also face the question of which phonemes actually exist in nature and which of them you want to model; after all, speech is a natural, analog phenomenon which you model using discrete phonemes. In fact, even amongst linguists these questions seem debatable: https://en.wikipedia.org/wiki/Phoneme#The_non-uniqueness_of_phonemic_solutions. One of my favorite examples in the German language is r vs ʀ vs ʁ: which one is used differs by region/dialect, so in this case it comes down to the question of whether you want to model dialects in your pronunciation dictionary. In Zamia I definitely decided against that, but of course other designers may decide otherwise for their phoneme set.
Thanks again for the background info. I see now it's not a trivial problem to solve 😁. So, back at the drawing board: what's actually the best way to generate new words for the Zamia lexicon.txt files? 🤷♂️ NOTE: The reason why I would like to use espeak is that I can create the phoneme set by actually listening to it (looking at the original speech_lex_edit.py file, I think you had the same intention).
In my experience, if you want high-quality lexicon entries there is no way around checking them manually. In general I would use speech_lex_edit to add new entries to the dictionary (either directly or through speech_editor while reviewing samples). Inside that tool you have options to generate pronunciations via espeak, MaryTTS and Sequitur G2P. Usually I would listen to all three and pick the best one, sometimes with manual improvements (like fixing stress marks etc.).
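For reference, a minimal sketch of querying espeak for an IPA pronunciation from Python; it assumes espeak-ng is installed and on PATH (classic espeak may not support --ipa), and the word is arbitrary:

```python
# a minimal sketch: ask espeak-ng for an IPA transcription without playing
# audio (-q); assumes the espeak-ng binary is on PATH
import subprocess

result = subprocess.run(
    ["espeak-ng", "-v", "en-us", "-q", "--ipa", "vocabulary"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # something like: vəkˈæbjʊlˌɛɹi
```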
Hi,
I would like to use the pretrained acoustic model for English, but in combination with a new in-domain language model, for which I have to generate pronunciations.
I am used to the Kaldi toolkit and the CMU dictionary, which uses the ARPAbet alphabet. I saw in your repo the script to convert the CMU dictionary to the IPA format, but when I look at the phones.txt file associated with the acoustic model, I do not recognize the IPA format. For example, which phoneme in the ARPA alphabet does tS correspond to?
I hope my question is clear enough.
Thank you for your answer!
CK
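For what it's worth: X-SAMPA tS is the IPA affricate tʃ, which ARPAbet writes as CH. A hedged sketch of a few such correspondences, written from the published definitions of both alphabets (this is not the repo's conversion table):

```python
# illustrative only: a handful of ARPAbet -> X-SAMPA correspondences taken
# from the published definitions of both alphabets; NOT the zamia-speech
# conversion table
ARPABET_TO_XSAMPA = {
    "CH": "tS",  # IPA tʃ, as in "church"
    "JH": "dZ",  # IPA dʒ, as in "judge"
    "SH": "S",   # IPA ʃ,  as in "ship"
    "TH": "T",   # IPA θ,  as in "thin"
    "DH": "D",   # IPA ð,  as in "this"
    "NG": "N",   # IPA ŋ,  as in "sing"
}

def arpabet_to_xsampa(phones):
    """Map ARPAbet phones to X-SAMPA; unknown phones pass through lowercased
    as a rough fallback."""
    return [ARPABET_TO_XSAMPA.get(p, p.lower()) for p in phones]

print(arpabet_to_xsampa(["CH", "ER", "CH"]))  # -> ['tS', 'er', 'tS']
```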