Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subtitles parser doesn't handle time codes that include more than 3 digits in the milliseconds field #1997

Closed
1 task
warren-bank opened this issue Dec 24, 2024 · 6 comments
Assignees

Comments

@warren-bank
Copy link

warren-bank commented Dec 24, 2024

Version

Media3 main branch

More version details

you ask: why would the milliseconds field ever contain more than 3 digits
I reply: it shouldn't, but for whatever reason.. captions on PBS do

example:

  • VTT for PBS News Hour
    • episode: 12/23/2024
    • notice that all time codes have 6 digits in the milliseconds field
      • ex: 00:00:00.000000 --> 00:00:00.100000
      • I didn't dig into the code to see how the parser is processing this field, but I can tell you that in ExoPlayer.. no subtitles show for a while.. and then (at about 7 minutes) a giant dump of text covers the screen
    • quick test:
      • I downloaded the file
      • opened it in a text editor
      • performed the following regex operation on its contents:
        • search: (\.\d{3})\d{3}
        • replace: $1
      • note: this effectively truncates the milliseconds field from 6 digits to 3 digits
      • loaded the resulting vtt file in ExoPlayer
      • issue fixed.. captions play as expected

note: I haven't yet tested their SRT captions; I suspect the same time code format. TBD..

Devices that reproduce the issue

OS: Android
app: ExoAirPlayer v3.7.0
lib: AndroidX Media3 v1.5.0

Devices that do not reproduce the issue

No response

Reproducible in the demo app?

Not tested

Reproduction steps

load captions from: VTT

Expected result

captions display at the correct time

Actual result

captions don't display at the correct time

Media

-or-

Bug Report

  • You will email the zip file produced by adb bugreport to [email protected] after filing this issue.
@warren-bank
Copy link
Author

[off-topic] quick observation regarding PBS..

I just checked a few previous episodes, and they all include both SRT and VTT captions having 3 decimal millisecond fields..
only the episode on 12/23 is different.. only offering VTT captions (no SRT), and the format being.. as described.

on the one hand, hopefully PBS will fix itself.
on the other hand, it illustrated a bug in ExoPlayer that should be easily fixed.

@warren-bank
Copy link
Author

quick update..

I can confirm that 6 digit millisecond fields in SRT files are also not parsed correctly..
experiencing slightly different behavior

  • no big block of text that covers the entire screen
  • caption timing is incorrect and out of sync

to reproduce the sample SRT files, I performed the following steps to convert the VTT to SRT:

  1. with the original VTT having 6 digit millisecond fields (named: pbs-0.vtt)
    • remove the first 2 lines:
        WEBVTT
        X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:207000
      
    • regex operation to add placeholder index numbers and reformat the milliseconds field separator from "." to ","
      • search: ^(\d\d:\d\d:\d\d)\.(\d{6})(\s+-->\s+)(\d\d:\d\d:\d\d)\.(\d{6})$
      • replace: 1\n$1,$2$3$4,$5
    • save file with name: pbs-0.srt
  2. with the SRT file having 6 digit millisecond fields (named: pbs-0.srt)
    • regex operation to truncate the milliseconds field to 3 digits
      • search: (,\d{3})\d{3}
      • replace: $1
    • save file with name: pbs-1.srt

attachments:

@warren-bank
Copy link
Author

warren-bank commented Dec 24, 2024

code trace for VTT

https://github.com/androidx/media/blob/1.5.1/libraries/extractor/src/main/java/androidx/media3/extractor/text/webvtt/WebvttCueParser.java#L200

  String firstLine = webvttData.readLine();
  Matcher cueHeaderMatcher = WebvttCueParser.CUE_HEADER_PATTERN.matcher(firstLine);
  return parseCue(null, cueHeaderMatcher, webvttData, styles);

https://github.com/androidx/media/blob/1.5.1/libraries/extractor/src/main/java/androidx/media3/extractor/text/webvtt/WebvttCueParser.java#L346-L349

  builder.startTimeUs = WebvttParserUtil.parseTimestampUs(Assertions.checkNotNull(cueHeaderMatcher.group(1)));
  builder.endTimeUs   = WebvttParserUtil.parseTimestampUs(Assertions.checkNotNull(cueHeaderMatcher.group(2)));

https://github.com/androidx/media/blob/1.5.1/libraries/extractor/src/main/java/androidx/media3/extractor/text/webvtt/WebvttParserUtil.java#L67

  String[] parts = Util.splitAtFirst(timestamp, "\\.");
  value += Long.parseLong(parts[1]);
  return value * 1000;

fix for VTT

  • line
    • from:
        value += Long.parseLong(parts[1]);
    • to:
        if (parts[1].length() > 3) {
          try {
            parts[1] = parts[1].substring(0, 3);
          }
          catch(IndexOutOfBoundsException e) {}
        }
        value += Long.parseLong(parts[1]);

@warren-bank
Copy link
Author

warren-bank commented Dec 24, 2024

code trace for SRT

https://github.com/androidx/media/blob/1.5.1/libraries/extractor/src/main/java/androidx/media3/extractor/text/subrip/SubripParser.java#L135-L142

  long startTimeUs;
  long endTimeUs;
  Matcher matcher = SUBRIP_TIMING_LINE.matcher(currentLine);
  if (matcher.matches()) {
    startTimeUs = parseTimecode(matcher, /* groupOffset= */ 1);
    endTimeUs   = parseTimecode(matcher, /* groupOffset= */ 6);
  }

https://github.com/androidx/media/blob/1.5.1/libraries/extractor/src/main/java/androidx/media3/extractor/text/subrip/SubripParser.java#L288-291

  String millis = matcher.group(groupOffset + 4);
  if (millis != null) {
    timestampMs += Long.parseLong(millis);
  }
  return timestampMs * 1000;

fix for SRT

  • line
    • from:
        timestampMs += Long.parseLong(millis);
    • to:
        if (millis.length() > 3) {
          try {
            millis = millis.substring(0, 3);
          }
          catch(IndexOutOfBoundsException e) {}
        }
        timestampMs += Long.parseLong(millis);

@icbaker
Copy link
Collaborator

icbaker commented Jan 2, 2025

This isn't a bug in the library, the media is invalid (and therefore any player behaviour is undefined), see further reasoning in #1999 (comment).

copybara-service bot pushed a commit that referenced this issue Jan 7, 2025
We previously parsed an arbitrary number of decimal places, but assumed
the value was in milliseconds, which doesn't make sense if there is
greater or fewer than 3. This change restricts the parsing to match
exactly 3, meaning the millisecond assumption is always true.

The WebVTT spec requires there to be exactly 3 decimal places:
https://www.w3.org/TR/webvtt1/#webvtt-timestamp

The SubRip spec is less clearly defined, but the Wikipedia article
defines it as having exactly 3 decimal places
(https://en.wikipedia.org/wiki/SubRip#Format) and ExoPlayer has always
assumed 3 decimal places (anything else is already handled incorrectly),
so this change just ensures we don't show subtitles at the wrong time.

Issue: #1997
PiperOrigin-RevId: 712885023
@icbaker
Copy link
Collaborator

icbaker commented Jan 7, 2025

The change linked above tightens ExoPlayer's parsing to completely ignore cues with more or less than 3 decimal places (rather than render them at an incorrect time).

@icbaker icbaker closed this as completed Jan 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants