(feature request) a way to pause and continue a translation process that takes days to execute; plus a progress indicator #416

Open
bruceleerabbit opened this issue May 17, 2024 · 1 comment

Comments

@bruceleerabbit

A document with 50k words takes about a day to translate on a 16-year-old machine. For the past day, I’ve been working on a doc double that size. There is no progress indicator and the target output file is zero bytes in size, so I have no idea how far along the processing is. If I lose power or need to shut down the machine, the past day of work is lost. Most people have newer hardware, but considering the heavy computation needed, anyone can be trapped in a task that takes days given enough text to translate.

Workaround: we can split the original doc into manageable pieces. But this is labor-intensive manual work because we cannot just chop the doc at arbitrary locations: no piece may break a paragraph. The pieces can also become so numerous that they are hard to manage.
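
The splitting itself could probably be scripted. Below is a minimal sketch, assuming paragraphs are separated by blank lines; `chunk_paragraphs` and `max_words` are hypothetical names for illustration and are not part of the library. It groups whole paragraphs into pieces of roughly `max_words` words without ever cutting inside a paragraph.

```python
import re
from typing import List


def chunk_paragraphs(text: str, max_words: int) -> List[str]:
    """Group whole paragraphs into chunks of at most ~max_words words.

    Assumption: paragraphs are separated by one or more blank lines.
    A single paragraph longer than max_words becomes its own chunk.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for paragraph in paragraphs:
        n = len(paragraph.split())
        # Close the current chunk before it would exceed the budget.
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each returned chunk could then be written to its own numbered file and translated separately, which still leaves the bookkeeping problem described above.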

So I suggest these improvements:

  • A progress indicator
  • A way to send a signal to stop processing gracefully and preserve the work completed so far
  • A way to resume processing on a giant document later
  • A way to see and make use of partial translations. E.g. on day 5 of a translation process, a user should be able to access output generated 2 days into the translation.
@AddioElectronics

You definitely shouldn't be translating whole documents at once, that would take a long time even on a beast of a machine.

This could be considered a "low-level" library, and the features you are asking for don't really make sense for what it is.
Everything you are asking for is fairly easy to accomplish, but I don't think it really suits the library. Maybe I'm wrong, though; I just found it an hour ago.

If I were you, I'd translate it line by line, or sentence by sentence, like you mentioned. This is easy to automate, even for a beginner coder. Or are you from a non-coding background?

  1. Read the document stream until you reach a period.
  2. Translate the sentence, save to your output stream.
  3. Save the position in a secondary file, so if you need to resume later, you can open the file and begin where you left off.
  4. Update your progress after each sentence by comparing your current position to the length of the stream.
  5. As for viewing past translations, if you are doing it in chunks you can just open the file to view it.
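
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's API: `translate_resumable` is a hypothetical helper, `translate` stands for whatever function you use to translate one string, sentences are assumed to end with a period followed by whitespace, and the whole source is read into memory for simplicity rather than streamed.

```python
import json
import os
import re
from typing import Callable


def translate_resumable(src_path: str, out_path: str, state_path: str,
                        translate: Callable[[str], str]) -> None:
    """Translate sentence by sentence, checkpointing after each one.

    `translate` is a placeholder for your own translation call.
    """
    with open(src_path, encoding="utf-8") as f:
        # Naive split: a period followed by whitespace ends a sentence.
        sentences = [s for s in re.split(r"(?<=\.)\s+", f.read()) if s]

    # Load the checkpoint from the secondary file, if one exists.
    state = {"done": 0, "out_bytes": 0}
    if os.path.exists(state_path):
        with open(state_path, encoding="utf-8") as f:
            state = json.load(f)

    mode = "r+b" if os.path.exists(out_path) else "w+b"
    with open(out_path, mode) as out:
        # Drop any output written after the last saved checkpoint.
        out.truncate(state["out_bytes"])
        out.seek(state["out_bytes"])
        for i in range(state["done"], len(sentences)):
            out.write((translate(sentences[i]) + "\n").encode("utf-8"))
            out.flush()
            state = {"done": i + 1, "out_bytes": out.tell()}
            # Write the checkpoint to a temp file, then swap it in.
            tmp_path = state_path + ".tmp"
            with open(tmp_path, "w", encoding="utf-8") as f:
                json.dump(state, f)
            os.replace(tmp_path, state_path)
            percent = 100 * (i + 1) / len(sentences)
            print(f"{i + 1}/{len(sentences)} sentences ({percent:.0f}%)")
```

Stopping the script (for example with Ctrl+C) and running it again with the same three paths continues from the last checkpoint, and the output file can be opened at any time to read what has been translated so far.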

If you need help automating it, let me know, and we can talk about it.
