Fix occasional DTS overlap #423

Open, wants to merge 1 commit into base: master

j0sh (Collaborator) commented Sep 13, 2024

Fix an occasional DTS overlap by
closing the filtergraph after each
segment and re-creating it at the
beginning of each segment, instead
of attempting to persist the
filtergraph in between segments.

This overlap occurred mostly when
flip-flopping segments between transcoders,
or processing non-consecutive segments within
a single transcoder. This was due to drift in
adjusting input timestamps to match the fps
filter's expectation of mostly consecutive
timestamps while adjusting output timestamps
to remove accumulated delay from the filter.

There is roughly a 1% performance hit on my
machine from re-creating the filtergraph.

Because we are now resetting the filter after
each segment, we can remove a good chunk of
the special-cased timestamp handling code
before and after the filtergraph since
we no longer need to handle discontinuities
between segments.

However, we do need to keep some filter flushing
logic in order to accommodate low-fps or low-frame
content.
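
The per-segment lifecycle described above can be sketched with a toy model. This is only an illustration, not the LPMS implementation: `ToyFilter` and `transcode_segment` are hypothetical names, and the "hold back one frame" behavior is an assumed simplification of how a buffering filter might act. It shows why a flush is still needed when a segment has very few frames.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyFilter:
    """Hypothetical stand-in for a filtergraph that holds back one frame."""
    held: Optional[int] = None

    def push(self, pts: int) -> List[int]:
        # Emit the previously held frame only once a newer frame arrives.
        out = [] if self.held is None else [self.held]
        self.held = pts
        return out

    def flush(self) -> List[int]:
        # Drain whatever is still buffered at the end of the segment.
        out = [] if self.held is None else [self.held]
        self.held = None
        return out


def transcode_segment(frame_pts: List[int], flush: bool = True) -> List[int]:
    filt = ToyFilter()  # fresh filter state for every segment (assumption)
    out: List[int] = []
    for pts in frame_pts:
        out.extend(filt.push(pts))
    if flush:
        out.extend(filt.flush())
    return out  # filt is dropped here, so nothing carries over


# A single-frame segment produces no output unless the filter is flushed.
print(transcode_segment([0], flush=False))
print(transcode_segment([0]))
```

Because each call builds its own filter, non-consecutive segments (for example one starting at 0 and the next at 5000) are processed independently in this model.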

This does change our outputs, usually by one
fewer frame. Sometimes we seem to produce an
additional frame - it is unclear why. However,
as the test cases note, this actually clears up a
number of long-standing oddities around the expected
frame count, so it should be seen as an improvement.


It is important to note that while this fixes DTS
overlap in a (rather unpredictable) general case,
there is another overlap bug in one very specific case.

These are the conditions for the bug:

  1. The first and second segments of the stream are being processed. This could be by the same transcoder or by different ones.

  2. The first segment starts at or near zero PTS.

  3. mpegts is the output format.

  4. B-frames are being used.

What happens is we may see DTS < PTS for the
very first frames in the very first segment,
potentially starting with PTS = 0, DTS < 0.
This is expected for B-frames.

However, if mpegts is in use, it cannot take negative timestamps. To accommodate a negative DTS on the first packet, the muxer
will set PTS = -DTS, DTS = 0 and delay (offset) the rest of the packets in the segment accordingly.

Unfortunately, subsequent transcodes will not know about this delay! This typically leads to an overlap between the first and second segments (but segments after that would be fine).
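
A minimal sketch of this behavior, assuming the muxer simply shifts a whole segment so that its smallest timestamp becomes zero. `mux_mpegts` is a hypothetical toy function, not an FFmpeg or LPMS API, and the tick values are made up for illustration.

```python
from typing import List, Tuple

Packet = Tuple[int, int]  # (dts, pts) in arbitrary ticks


def mux_mpegts(packets: List[Packet]) -> List[Packet]:
    """Toy model: shift one segment so that no timestamp is negative."""
    lowest = min(min(dts, pts) for dts, pts in packets)
    shift = -lowest if lowest < 0 else 0
    return [(dts + shift, pts + shift) for dts, pts in packets]


# Encoder output: B-frames give the first segment a negative DTS.
seg1 = [(-20, 0), (-10, 10), (0, 20), (10, 30)]
seg2 = [(20, 40), (30, 50), (40, 60)]

# Each segment is muxed on its own, so the second one never learns
# about the shift applied to the first.
muxed1 = mux_mpegts(seg1)
muxed2 = mux_mpegts(seg2)

print(muxed1)  # shifted by 20 ticks
print(muxed2)  # unchanged
print(muxed1[-1][0] >= muxed2[0][0])  # True: segment 2 starts before segment 1 ends
```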

The normal way to fix this would be to add a constant delay to all segments - ffmpeg adds 1.4s to mpegts by default.

However, introducing a delay right now feels a little odd since we don't really offer any other knobs to control the timestamp (re-transcodes would accumulate the delay) and there is some concern about falling out of sync with the source segment since we have historically tried to make timestamps follow the source as closely as possible.
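
To make the trade-off concrete, here is a small sketch of the constant-delay approach. `DELAY` and `add_delay` are hypothetical, and the 20-tick value is chosen only so that the example timestamps stay non-negative; it is not the delay FFmpeg uses.

```python
from typing import List, Tuple

Packet = Tuple[int, int]  # (dts, pts) in arbitrary ticks

DELAY = 20  # hypothetical constant offset, large enough to cover the negative DTS


def add_delay(packets: List[Packet], delay: int = DELAY) -> List[Packet]:
    """Shift every packet of a segment by the same constant amount."""
    return [(dts + delay, pts + delay) for dts, pts in packets]


seg1 = [(-20, 0), (-10, 10), (0, 20), (10, 30)]
seg2 = [(20, 40), (30, 50), (40, 60)]

# Applying the same delay to every segment keeps DTS increasing across the boundary.
out1 = add_delay(seg1)
out2 = add_delay(seg2)
print(out1[-1][0] < out2[0][0])  # True

# A re-transcode that applies the delay again drifts further from the source.
again = add_delay(out1)
print(again[0][1] - seg1[0][1])  # 40 ticks of accumulated offset
```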

So we're leaving this particular bug as-is for now. There is some commented-out code that adds this delay in case we feel that we would need it in the future.

Note that the FFmpeg CLI also has the exact same problem when the muxer delay is removed, so this is not an
LPMS-specific issue. This is exercised in the test cases.

Example of non-monotonic DTS after encoding and after muxing:

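Segment.Frame | Encoder DTS | Encoder PTS | Muxer DTS | Muxer PTS
--------------|-------------|-------------|-----------|-----------
      1.1     |  -20        |    0        |    0      | 20
      1.2     |  -10        |   10        |   10      | 30
      1.3     |    0        |   20        |  *20*     | 40
      1.4     |   10        |   30        |  *30*     | 50
      2.1     |   20        |   40        |  *20*     | 40
      2.2     |   30        |   50        |  *30*     | 50
      2.3     |   40        |   60        |   40      | 60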
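
One way to check a sequence like the one in the table is to flag every DTS that does not exceed the highest DTS seen so far. This is a sketch only: `find_dts_overlaps` is a hypothetical helper, not part of LPMS or its test suite.

```python
from typing import List, Optional


def find_dts_overlaps(dts_values: List[int]) -> List[int]:
    """Return the indices whose DTS is not greater than the running maximum."""
    overlaps: List[int] = []
    highest: Optional[int] = None
    for i, dts in enumerate(dts_values):
        if highest is not None and dts <= highest:
            overlaps.append(i)
        else:
            highest = dts
    return overlaps


# DTS columns from the table, frames 1.1 through 2.3 in order.
encoder_dts = [-20, -10, 0, 10, 20, 30, 40]
muxer_dts = [0, 10, 20, 30, 20, 30, 40]

print(find_dts_overlaps(encoder_dts))  # []
print(find_dts_overlaps(muxer_dts))    # [4, 5], i.e. frames 2.1 and 2.2
```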
