Word capitalization, everything lowercase #57
Comments
Agreed, especially for words like I, I'm, I'll, and I'd. I don't think there's any situation where those should be lowercase. Please improve this; it would make a great difference.
We could keep a list of words together with whether each one should be capitalized. Does anyone know of such a (multilingual) list?
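As a rough illustration of the point above, the pronoun "I" and its contractions could be fixed in a post-processing step on the recognized text. This is only a sketch: the function name `capitalize_pronoun_i` is hypothetical, it is not Sayboard's actual code, and it assumes English, all-lowercase input.

```python
import re

# A standalone "i" is the pronoun. The apostrophe is a non-word character,
# so \b also matches between "i" and "'m", which covers i'm, i'll, i'd, i've.
_PRONOUN_I = re.compile(r"\bi\b")


def capitalize_pronoun_i(text: str) -> str:
    """Capitalize the English pronoun "I" (hypothetical helper, English only)."""
    return _PRONOUN_I.sub("I", text)


print(capitalize_pronoun_i("i think i'm late but i'll try"))
```

Words that merely start or end with "i" (such as "in" or "hawaii") are left alone because the pattern requires a word boundary on both sides.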
This is probably more of an issue with the (smaller) models: alphacep/vosk-api#1204
Maybe a combination of a small base Vosk model with a (reduced) punctuation model would work?
Word lists
Such a word list would need to include the context or consist of generated patterns, since in some languages the capitalization cannot be determined from the word form alone. E.g. in German all nouns are capitalized, including nominalizations. Most word lists with nouns etc. would therefore probably produce a lot of incorrect capitalization, since the verb form in such cases is more common.
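To make the problem with context-free word lists concrete, here is a small sketch. The noun set and the function `capitalize_from_list` are made up for illustration; "essen" is used because it is both a verb (to eat) and, as "das Essen", a noun (the meal).

```python
# Hypothetical noun list. "essen" is included because "das Essen" is a noun.
NOUNS = {"essen", "haus", "hund"}


def capitalize_from_list(text: str) -> str:
    """Capitalize every word found in NOUNS, ignoring context entirely."""
    return " ".join(w.capitalize() if w in NOUNS else w for w in text.split())


# Noun use: the lookup gives the right result.
print(capitalize_from_list("das essen ist im haus"))
# Verb use: the same lookup wrongly capitalizes "essen".
print(capitalize_from_list("wir essen im haus"))
```

The second call returns "wir Essen im Haus", which is wrong, and shows why a plain list would need context or patterns to be reliable.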
As far as English goes: Months... (January → December) The hard ones are titles of books, movies, and the like, but we can't expect everything. At least taking care of the most common cases listed above would help tremendously. Right now Sayboard just outputs one long sentence in all lowercase. It doesn't look good, and I have to spend 15 minutes afterwards correcting everything.
Perhaps also add a user-defined list of replacements, so users can add their own words to auto-replace and tune the app to their own needs. Example, I can add: So with such a user-defined list, we can fine-tune the app to our personal needs. But definitely have built-in lists of common names, and "I".
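A minimal sketch of how a built-in list combined with a user-defined replacement list might work. Everything here is an assumption for illustration: `BUILT_IN_MONTHS`, `build_replacer`, and the sample entries are hypothetical and not part of Sayboard. "march" and "may" are left out of the built-in list on purpose, since they are also ordinary words.

```python
import re
from typing import Callable, Dict

# Hypothetical built-in list; "march" and "may" are omitted as ambiguous.
BUILT_IN_MONTHS = [
    "january", "february", "april", "june", "july", "august",
    "september", "october", "november", "december",
]


def build_replacer(user_replacements: Dict[str, str]) -> Callable[[str], str]:
    """Return a function that applies built-in and user-defined replacements."""
    table = {m: m.capitalize() for m in BUILT_IN_MONTHS}
    # User entries are added last so they override the built-in ones.
    table.update({k.lower(): v for k, v in user_replacements.items()})
    # Longest keys first, so multi-word entries win over shorter ones.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")

    def apply(text: str) -> str:
        return pattern.sub(lambda m: table[m.group(1)], text)

    return apply


# Sample user entries, chosen only for this example.
replacer = build_replacer({"new york": "New York", "github": "GitHub"})
print(replacer("see you in new york in january, check github"))
```

Matching is done on whole words only, so an entry never rewrites part of a longer word.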
I use Sayboard mostly in German. For me it would be a great help if all nouns were capitalized. I have found the following possible word lists:
I understand the restrictions described in #57 (comment), but capitalising all nouns would help me a lot. I could still edit nominalisations and similar special cases by hand. This issue is also related to #58, which would also help.
Text generated by Sayboard is all lowercase, except for the first word after punctuation.
Not sure if this is an issue with the models or the app, but it makes the app not very usable beyond casual/lazy chats, especially for languages capitalizing nouns etc., like German.