Cache recent/frequent translations #50

Open
martinpopel opened this issue Apr 4, 2022 · 0 comments

When a user edits a long document in the frontend, all sentences of the document are translated again after each edit.

For sentence-level models, we should re-translate only the sentences that have changed.
Perhaps an even better solution would be to add a cache of recently translated sentences to the API server, so that it can be reused by various frontends.
For a given translation direction and model, the cache should include:

  • The last N sentences requested for translation (whether they were translated by the backend or retrieved from the cache).
  • The M most frequently translated sentences. This list can be built from real usage over a longer time period or from a monolingual corpus. This list does not need to be updated (frequently).
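
A minimal sketch of such a two-part cache, to make the proposal concrete. All names here (`TranslationCache`, `translate_cached`, the `backend` callable and its argument order) are hypothetical and not part of the existing API server; the sketch assumes the backend can be called as `backend(direction, model, sentence)` and that the "last N" part uses least-recently-used eviction.

```python
from collections import OrderedDict
from typing import Callable, Dict, Optional, Tuple

# Cache key: (translation direction, model name, source sentence).
Key = Tuple[str, str, str]


class TranslationCache:
    """Hypothetical cache: a static 'frequent' table plus an LRU of recent requests."""

    def __init__(self, max_recent: int, frequent: Optional[Dict[Key, str]] = None):
        self.max_recent = max_recent              # N in the proposal
        self.frequent = dict(frequent or {})      # M precomputed entries, rarely updated
        self.recent: "OrderedDict[Key, str]" = OrderedDict()

    def get(self, key: Key) -> Optional[str]:
        if key in self.frequent:
            return self.frequent[key]
        if key in self.recent:
            # A cache hit also counts as a recent request.
            self.recent.move_to_end(key)
            return self.recent[key]
        return None

    def put(self, key: Key, translation: str) -> None:
        if key in self.frequent:
            return                                # already covered by the static table
        self.recent[key] = translation
        self.recent.move_to_end(key)
        while len(self.recent) > self.max_recent:
            self.recent.popitem(last=False)       # evict the least recently requested


def translate_cached(
    cache: TranslationCache,
    backend: Callable[[str, str, str], str],
    direction: str,
    model: str,
    sentence: str,
) -> str:
    """Return a cached translation if available, otherwise ask the backend."""
    key = (direction, model, sentence)
    hit = cache.get(key)
    if hit is not None:
        return hit
    translation = backend(direction, model, sentence)
    cache.put(key, translation)
    return translation
```

Keeping the frequent table separate from the recent entries means that a burst of one-off sentences cannot evict the precomputed frequent translations.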

For document-level models, we could have a similar cache of the (possibly multi-sentence) sequences that are sent to the backend for translation. In other words, the cache could be integrated into the load balancer, so that it does not need to distinguish how many sentences a sequence contains. We just should not introduce the same bug as DeepL, which uses a doc-level model (for en-cs) but seems to cache at the sentence level.
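
The sequence-level idea could look roughly like the following sketch. It wraps a hypothetical `backend(direction, model, sequence)` callable (an assumption, not the real load balancer interface) with Python's standard `functools.lru_cache`, keyed on the exact sequence sent to the backend. Because the whole sequence is the key, a sentence translated in a different context is a cache miss, which avoids the sentence-level caching problem described above.

```python
from functools import lru_cache
from typing import Callable


def with_sequence_cache(
    backend: Callable[[str, str, str], str],
    max_size: int = 1024,
) -> Callable[[str, str, str], str]:
    """Wrap a backend so that identical (direction, model, sequence) requests are reused.

    `sequence` is whatever string is sent to the backend: a single sentence for
    sentence-level models, or several sentences for document-level models.
    The wrapper never splits it, so it does not care how many sentences it holds.
    """

    @lru_cache(maxsize=max_size)
    def dispatch(direction: str, model: str, sequence: str) -> str:
        return backend(direction, model, sequence)

    return dispatch
```

With this wrapper, `dispatch("en-cs", "doc", "It is red. I like it.")` and `dispatch("en-cs", "doc", "He is tall. I like it.")` are two separate cache entries, even though both end with the same sentence.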
