When a long document is edited by a user in the frontend, all sentences of the document are re-translated after each edit.
For sentence-level models, we should re-translate only the sentences which have been changed.
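As a sketch of the frontend-side fix, changed sentences can be detected by diffing the old and new sentence lists, so that insertions do not force re-translation of everything after them (function name and use of `difflib` are my assumption, not part of this proposal):

```python
import difflib

def sentences_to_retranslate(old, new):
    """Indices of sentences in `new` that do not appear unchanged in
    `old`; only these need to be sent to the sentence-level model.
    Uses a sequence diff so an inserted sentence does not shift-invalidate
    everything that follows it."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    unchanged = set()
    for block in matcher.get_matching_blocks():
        # block.b..block.b+size are positions in `new` matched verbatim in `old`
        unchanged.update(range(block.b, block.b + block.size))
    return [i for i in range(len(new)) if i not in unchanged]
```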
Perhaps even a better solution would be to add a cache of recently translated sentences into the API server, so it can be reused by various frontends.
For a given translation direction and model, the cache should include:
- The last N sentences requested for translation (whether translated by the backend or retrieved from the cache).
- The M most frequently translated sentences. This list can be built from actual usage over a longer time period, or from a monolingual corpus, and does not need to be updated frequently.
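The two tiers above could be combined roughly like this (a minimal sketch; class and parameter names are hypothetical, and a production cache would also need thread safety and size limits in bytes):

```python
from collections import OrderedDict

class TranslationCache:
    """Two-tier cache: a static tier of M frequently translated
    sentences plus an LRU tier of the last N requested sentences.
    Keys are (direction, model, sentence) tuples."""

    def __init__(self, max_recent=10000, frequent=None):
        self.recent = OrderedDict()           # LRU tier (insertion order = recency)
        self.max_recent = max_recent
        self.frequent = dict(frequent or {})  # static tier, rarely updated

    def get(self, direction, model, sentence):
        key = (direction, model, sentence)
        if key in self.frequent:
            return self.frequent[key]
        if key in self.recent:
            self.recent.move_to_end(key)      # mark as most recently used
            return self.recent[key]
        return None                           # miss: caller queries the backend

    def put(self, direction, model, sentence, translation):
        key = (direction, model, sentence)
        self.recent[key] = translation
        self.recent.move_to_end(key)
        if len(self.recent) > self.max_recent:
            self.recent.popitem(last=False)   # evict least recently used
```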
For document-level models, we could have a similar cache of the (possibly multi-sentence) sequences which are sent for translation to the backend. In other words, the cache could be integrated into the load balancer, so it does not need to distinguish how many sentences are in a sequence. We just should not introduce the same bug as DeepL, which uses a doc-level model (for en-cs) but appears to cache at the sentence level.
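Integrated at the load-balancer level, the cache could key on the whole request payload, making it agnostic to sentence count (a sketch under my own assumptions; class name, round-robin dispatch, and hashing scheme are illustrative, not a spec):

```python
import hashlib
from collections import OrderedDict

def sequence_key(direction, model, text):
    """Key on the full request text, so the cache does not care how many
    sentences the sequence contains."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return (direction, model, digest)

class SequenceCachingBalancer:
    """Hypothetical load-balancer wrapper: checks the sequence cache
    before dispatching a request to one of the translation backends."""

    def __init__(self, backends, max_entries=10000):
        self.backends = backends      # callables: text -> translation
        self.cache = OrderedDict()    # LRU over whole sequences
        self.max_entries = max_entries
        self._next = 0

    def translate(self, direction, model, text):
        key = sequence_key(direction, model, text)
        if key in self.cache:
            self.cache.move_to_end(key)          # cache hit, no backend call
            return self.cache[key]
        backend = self.backends[self._next % len(self.backends)]
        self._next += 1                          # simple round-robin dispatch
        result = backend(text)
        self.cache[key] = result
        if len(self.cache) > self.max_entries:
            self.cache.popitem(last=False)       # evict least recently used
        return result
```

Because the key covers the whole sequence, a doc-level model never sees a stale sentence-level hit, which is exactly the DeepL-style bug to avoid.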