Czech quotation marks in English input result in whole sentences being omitted #39

martinpopel · 2021-07-08T10:36:54Z

Czech (and German) use low opening quotes („) paired with high closing quotes (“). English does not. However, at least two users have reported errors caused by English input written with these Czech-style quotes. For example, „I am working on that,“ the AI told him. „Repairs now at sixty-one percent.“ is translated as „Pracuji na tom,“ sdělila mu umělá inteligence. The translation of the second sentence is lost.

I plan to inspect whether the problem is in the backend model or in the frontend. Either way, by normalizing quotes in English inputs, we will get better-quality translations. Note that the same Unicode character U+201C (“) is the opening quote in English but the closing quote in Czech, so we need to be careful and apply the substitution only if low quotes („, U+201E) are present in the English input:

if '„' in src:
    src = src.replace('“', '"').replace('„', '"')
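The snippet above can be wrapped in a small function to check its behaviour on both kinds of input. This is a minimal sketch: the name `to_straight_quotes` is hypothetical, and the characters are written as escapes because „, “ and ” are easy to confuse visually. It shows that Czech-style quotes are straightened, while text using genuine English quotes (no „ present) is left untouched. The check is per input string, so a string mixing both conventions would also have its English “ replaced.

```python
import unicodedata

CZ_OPEN = '\u201e'  # „ DOUBLE LOW-9 QUOTATION MARK (Czech opening quote)
EN_OPEN = '\u201c'  # “ LEFT DOUBLE QUOTATION MARK (English opening, Czech closing)


def to_straight_quotes(src: str) -> str:
    """Hypothetical helper: replace Czech-style „…“ quotes with ASCII '"'.

    Applied only when a low quote „ is present, because U+201C is an
    opening quote in English but a closing quote in Czech.
    """
    if CZ_OPEN in src:
        src = src.replace(EN_OPEN, '"').replace(CZ_OPEN, '"')
    return src


if __name__ == '__main__':
    print(unicodedata.name(CZ_OPEN), '/', unicodedata.name(EN_OPEN))
    print(to_straight_quotes('\u201eI am working on that,\u201c the AI told him.'))
    print(to_straight_quotes('\u201cDone,\u201d she said.'))  # unchanged
```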

Alternatively, we could use English directional (curved) quotes. Note that in this case the order of replacements is important: replacing „ first would produce “ characters that the second replacement would then turn into ”:

if '„' in src:
    src = src.replace('“', '”').replace('„', '“')
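To make the ordering pitfall concrete, the sketch below compares the replacement order above with the reversed order on one of the reported sentences. The function names are hypothetical. As a swapped-in alternative, it also shows `str.translate` with a table from `str.maketrans`, which maps each character in a single pass, so the result does not depend on any ordering.

```python
CZ_OPEN = '\u201e'   # „ Czech opening quote
EN_OPEN = '\u201c'   # “ English opening quote (also the Czech closing quote)
CZ_CLOSE = EN_OPEN
EN_CLOSE = '\u201d'  # ” English closing quote


def to_english_curly_quotes(src: str) -> str:
    """Hypothetical helper: the order used in the snippet above."""
    if CZ_OPEN in src:
        # Closing “ -> ” first, then opening „ -> “.
        src = src.replace(CZ_CLOSE, EN_CLOSE).replace(CZ_OPEN, EN_OPEN)
    return src


def replace_in_wrong_order(src: str) -> str:
    """Reversed order: the “ created from „ is then turned into ” too."""
    if CZ_OPEN in src:
        src = src.replace(CZ_OPEN, EN_OPEN).replace(CZ_CLOSE, EN_CLOSE)
    return src


# Single-pass mapping: each character is translated independently.
_CZ_TO_EN = str.maketrans({CZ_OPEN: EN_OPEN, CZ_CLOSE: EN_CLOSE})


def to_english_curly_quotes_translate(src: str) -> str:
    if CZ_OPEN in src:
        src = src.translate(_CZ_TO_EN)
    return src


if __name__ == '__main__':
    s = '\u201eRepairs now at sixty-one percent.\u201c'
    print(to_english_curly_quotes(s))            # “Repairs now at sixty-one percent.”
    print(replace_in_wrong_order(s))             # ”Repairs now at sixty-one percent.”
    print(to_english_curly_quotes_translate(s))  # same as the first line
```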
