While testing subtitle output for hour-long files on version 2.3, I'm noticing that English contractions are being split across different segments, and I'm not sure how to control this behavior. On top of the default regrouping, I'm using the split_by_length function to limit SRT subtitle lines to 84 characters, but contractions sometimes end up split onto separate lines. I'm not sure whether the problem is that split_by_length doesn't recognize apostrophe characters. I know it has to do with the regrouping function calls. So far I have this, which isn't too bad except for the contractions:
Problematic output as follows:
or
Here's the whole script I've been working with:
Edit:
Replies: 1 comment 11 replies
One of the later splits might have split it. Have you tried it with result.merge_by_punctuation(["'"], lock=True)?
This should do the trick. If you use ["'"], it only looks for the ones ending with ' and not those starting with '. But in your case it starts with '.
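To make the ending-versus-starting distinction concrete, here is a toy sketch in plain Python. It is not the library's implementation: the function `merge_on_apostrophe`, its `match_end` and `match_start` flags, and the sample word tokens are all hypothetical, and it assumes a contraction arrives as two word tokens (" don" and "'t") that an earlier split has placed in different segments.

```python
from typing import List

# Hypothetical model: a segment is a list of word tokens, e.g. [" I", " don"].
Segment = List[str]


def merge_on_apostrophe(
    segments: List[Segment],
    match_end: bool = True,
    match_start: bool = False,
    mark: str = "'",
) -> List[Segment]:
    """Merge a segment into the previous one when the apostrophe rule matches.

    match_end:   merge if the previous segment's last word ends with `mark`.
    match_start: merge if the current segment's first word starts with `mark`.
    """
    merged: List[Segment] = []
    for seg in segments:
        if not seg:
            continue  # ignore empty segments
        if merged:
            prev = merged[-1]
            ends = match_end and prev[-1].rstrip().endswith(mark)
            starts = match_start and seg[0].lstrip().startswith(mark)
            if ends or starts:
                prev.extend(seg)
                continue
        merged.append(list(seg))  # copy so the input is not mutated
    return merged


# "don't" was split across two segments; the second one STARTS with an apostrophe.
segments = [[" I", " don"], ["'t", " know"]]

ending_only = merge_on_apostrophe(segments, match_end=True, match_start=False)
with_start = merge_on_apostrophe(segments, match_end=True, match_start=True)

print(len(ending_only))        # still two segments: nothing ends with '
print("".join(with_start[0]))  # merged back into one segment
```

With the ending-only rule the two segments stay apart, because no segment ends with an apostrophe. Once the rule also checks the start of the following segment, the contraction is rejoined.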