Three transcription problems #295
Any updates on this request whatsoever?
It wouldn't be hard to add such one-letter exceptions, but then there will be cases where an initial, or a single letter followed by a period, ends a sentence, and the sentence won't be split.
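To illustrate the trade-off, here is a minimal Python sketch of a sentence splitter with such a one-letter exception. The function `split_sentences` and its `skip_initials` flag are hypothetical and are not how faster-whisper-xxl implements `--sentence`. The sketch only assumes that a sentence ends at ".", "?" or "!" followed by whitespace.

```python
import re


def split_sentences(text: str, skip_initials: bool = True) -> list[str]:
    """Hypothetical splitter: break after '.', '?' or '!' followed by whitespace.

    With skip_initials=True, a period that follows a single capital letter
    (e.g. the "D." in "Franklin D. Roosevelt") is treated as an initial and
    does not end the sentence.
    """
    sentences: list[str] = []
    start = 0
    for match in re.finditer(r"[.?!](?=\s)", text):
        end = match.end()
        # The word that ends with this punctuation mark.
        word = text[start:end].split()[-1]
        if skip_initials and re.fullmatch(r"[A-Z]\.", word):
            continue  # looks like an initial: do not split here
        sentences.append(text[start:end].strip())
        start = end
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences


print(split_sentences("Franklin D. Roosevelt won. He ran again."))
print(split_sentences("The winner was Plan B. It passed easily."))
```

With the exception enabled, "Franklin D. Roosevelt won." stays whole, but "The winner was Plan B. It passed easily." is no longer split at all, which is exactly the failure case described above.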
Do you have JSON files for those issues? In general, for issues like these, please provide the JSON file produced by faster-whisper-xxl.exe and the commands used.
I am using Faster-Whisper-XXL from within Subtitle Edit. What should I do to get a JSON file?
Using Faster-Whisper-XXL_r192.3.4_windows.
Initial
Problem with an initial in a transcription with --max_line_width 37 --max_line_count 2 --sentence --max_comma_cent 50:
Problem with initials in a transcription with --max_line_width 37 --max_line_count 2 --sentence --max_comma_cent 50:
Hyphen
Problem with a hyphen in a transcription with --max_line_width 37 --max_line_count 2 --sentence --max_comma_cent 50:
Problem with the hyphen in "Glass-Steagall" in a transcription with --max_line_width 37 --max_line_count 2 --sentence --max_comma_cent 50:
Problem with a hyphen in a transcription with --max_line_width 37 --max_line_count 2 --sentence --max_comma_cent 50:
(When a line break followed by a hyphen is automatically removed or repositioned in Subtitle Edit, it is replaced by a space. If the line break is removed automatically from this sentence, the sentence becomes: "In 1936, Roosevelt stood for re -election.")
Instead of this:
It should look like this:
This would be even better (no risk of problems with subsequent processing):
This would also be better:
This problem with a hyphen may be related to what is discussed here.
Comma
Problem with a comma in a transcription with --standard:
Problem with a comma in a transcription with --max_line_width 37 --max_line_count 2 --sentence --max_comma_cent 50:
(When a line break followed by a comma is automatically removed in Subtitle Edit, it is replaced by a space. If the line break is removed automatically from this subtitle, the subtitle becomes: "under the Linden, how there were 100 ,000 of people when they passed".)
It would be best if line breaks were only created at positions where the line break can be replaced by a space character.
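A minimal Python sketch of that rule, assuming (as the "re -election" and "100 ,000" results suggest) that the transcriber emits word pieces such as " re" and "-election", where a piece without a leading space continues the previous word. The helpers `glue_pieces` and `wrap_words` are hypothetical, not part of faster-whisper-xxl or Subtitle Edit, and they only handle line width, not `--max_line_count`.

```python
def glue_pieces(pieces: list[str]) -> list[str]:
    """Hypothetical: merge pieces without a leading space into the previous word.

    Assumes a piece starting with a space begins a new word, so
    " re" + "-election" -> "re-election" and " 100" + ",000" -> "100,000".
    """
    words: list[str] = []
    for piece in pieces:
        if words and not piece.startswith(" "):
            words[-1] += piece  # continuation of the previous word
        else:
            words.append(piece)
    return [w.strip() for w in words]


def wrap_words(words: list[str], max_width: int) -> list[str]:
    """Greedy wrap that only breaks between whole words, i.e. at spaces."""
    lines: list[str] = []
    current = ""
    for word in words:
        candidate = f"{current} {word}" if current else word
        if current and len(candidate) > max_width:
            lines.append(current)  # break at the space before `word`
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


pieces = [" In", " 1936,", " Roosevelt", " stood", " for", " re", "-election."]
lines = wrap_words(glue_pieces(pieces), 37)
print(lines)
```

Because every break falls on a space, replacing a line break with a space, as Subtitle Edit does, reproduces the original text ("In 1936, Roosevelt stood for re-election."). A single word longer than `max_width` is kept on its own line rather than split.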
Added later: It may not be possible to find a satisfactory solution for the problem with initials. See here.
Added later: I'm not sure how many people pronounce the English abbreviations "i.e." and "e.g." letter by letter, but if they do, something like this can happen: