Czech quotation marks in English input result in omitting whole sentences #39

Open
martinpopel opened this issue Jul 8, 2021 · 0 comments


@martinpopel
Member

Czech (and German) use low opening and high closing quotation marks („…“); English does not. Nevertheless, at least two users have reported errors caused by Czech-style quotes in English input. For example, „I am working on that,“ the AI told him. „Repairs now at sixty-one percent.“ is translated as „Pracuji na tom,“ sdělila mu umělá inteligence. (roughly "I am working on it," the artificial intelligence told him). The translation of the second sentence is lost.

I plan to investigate whether the problem lies in the backend model or in the frontend. Either way, normalizing quotes in English input should yield better translations. Note that the same Unicode character, U+201C (“), is the opening quote in English but the closing quote in Czech, so we need to be careful and apply the substitution only if low quotes („, U+201E) are present in the English input:

```python
if '„' in src:  # a Czech-style low opening quote means “ is a Czech closing quote
    src = src.replace('“', '"').replace('„', '"')
```
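
A minimal, self-contained sketch of this first option, wrapped in a function and applied to the example above. The names `normalize_to_straight_quotes`, `CZECH_OPEN` and `CZECH_CLOSE` are hypothetical and not part of the project; `unicodedata.name` is used only to confirm which code points are involved.

```python
import unicodedata

CZECH_OPEN = "\u201e"   # „ DOUBLE LOW-9 QUOTATION MARK (Czech/German opening quote)
CZECH_CLOSE = "\u201c"  # “ LEFT DOUBLE QUOTATION MARK (Czech closing, English opening)


def normalize_to_straight_quotes(src: str) -> str:
    """Replace Czech-style „…“ quotes with ASCII straight quotes.

    The substitution runs only when a low opening quote is present,
    because “ on its own is a legitimate English opening quote.
    """
    if CZECH_OPEN in src:
        src = src.replace(CZECH_CLOSE, '"').replace(CZECH_OPEN, '"')
    return src


if __name__ == "__main__":
    print(unicodedata.name(CZECH_OPEN))   # DOUBLE LOW-9 QUOTATION MARK
    print(unicodedata.name(CZECH_CLOSE))  # LEFT DOUBLE QUOTATION MARK
    text = "„I am working on that,“ the AI told him. „Repairs now at sixty-one percent.“"
    print(normalize_to_straight_quotes(text))
    # "I am working on that," the AI told him. "Repairs now at sixty-one percent."
```

English-style input such as “Hi,” he said. contains no „ and is therefore returned unchanged.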

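Alternatively, we could convert them to English directional (curly) quotes. In this case the order of the replacements matters: the Czech closing “ must be turned into ” before „ is turned into “, otherwise the second replacement would also convert the newly created opening quotes into closing ones: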

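```python
if '„' in src:  # a Czech-style low opening quote means “ is a Czech closing quote
    src = src.replace('“', '”').replace('„', '“')  # closing quotes first, then opening
```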
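
The second option can be sketched the same way. Both function names below are hypothetical; `normalize_wrong_order` exists only to illustrate why the order of the two `str.replace` calls matters: replacing „ first creates new “ characters, which the following replacement then turns into ”.

```python
CZECH_OPEN = "\u201e"    # „ Czech opening quote
LEFT_DOUBLE = "\u201c"   # “ Czech closing quote / English opening quote
RIGHT_DOUBLE = "\u201d"  # ” English closing quote


def normalize_to_english_curly(src: str) -> str:
    """Convert Czech-style „…“ quotes to English-style “…” quotes."""
    if CZECH_OPEN in src:
        # First turn the Czech closing “ into ”, then the Czech opening „ into “.
        src = src.replace(LEFT_DOUBLE, RIGHT_DOUBLE).replace(CZECH_OPEN, LEFT_DOUBLE)
    return src


def normalize_wrong_order(src: str) -> str:
    """Same replacements in the opposite order (buggy, for illustration only)."""
    if CZECH_OPEN in src:
        src = src.replace(CZECH_OPEN, LEFT_DOUBLE).replace(LEFT_DOUBLE, RIGHT_DOUBLE)
    return src


if __name__ == "__main__":
    text = "„Repairs now at sixty-one percent.“"
    print(normalize_to_english_curly(text))  # “Repairs now at sixty-one percent.”
    print(normalize_wrong_order(text))       # ”Repairs now at sixty-one percent.”
```

Like the snippet above, this sketch assumes the whole input uses a single quoting convention: if a text contains „ somewhere and also a genuine English opening “ elsewhere, that English quote would be converted to ” as well.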