Now back to GOP types. A GOP always begins with an I picture, but that is not always the first picture displayed. In an open GOP the first B pictures can use the last reference picture of the previous GOP. They don't have to, but if they do, making an edit at that point will break the decoding of those B pictures. These GOPs have a picture order of IBBPBB... Open GOPs can also be self-contained, with a picture order of IPBBPBB; here the I and first P are the reference pictures for the B pictures.
A closed GOP has the picture order of IBBPBB, just like the first type of open GOP, except that the encoder has not used the previous GOP's last reference frame. The GOP can be decoded all by itself, and editing will not break the decoding.
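To make the picture orders concrete, here is a small sketch that converts a GOP from the order in which pictures are stored in the stream to the order in which they are displayed. It assumes the orders quoted above are stream (coded) order, and the function name display_order is made up for this illustration.

```python
def display_order(coded):
    """Reorder a GOP from coded (stream) order to display order.

    Assumption: `coded` is a string of picture types such as "IBBPBB",
    given in the order the pictures appear in the stream.
    """
    shown = []
    held = None  # most recent reference picture (I or P) not yet displayed
    for pic in coded:
        if pic == "B":
            # B pictures are displayed as soon as they are decoded
            shown.append(pic)
        else:
            # a new reference picture releases the one being held
            if held is not None:
                shown.append(held)
            held = pic
    if held is not None:
        shown.append(held)  # flush the last reference picture
    return "".join(shown)


print(display_order("IBBPBB"))   # BBIBBP
print(display_order("IPBBPBB"))  # IBBPBBP
```

In the IBBPBB case the two leading B pictures are displayed before the I picture, which is why they are the ones at risk when the previous GOP is not available.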
Now enter DVD2AVI and MPEG2DEC3. Not fully understanding closed GOPs, the author thought that a picture order of IBBPBB always required the previous GOP to decode properly. So if the first GOP was a closed GOP, MPEG2DEC would discard its leading B pictures, acting as if they could not be decoded properly. This led to a negative delay (audio starts before video), usually of 67ms for NTSC or 80ms for PAL (2 frames).
Donald Graft has corrected this problem in his own versions of DVD2AVI and MPEG2DEC3, called DGIndex
and DGDecode.
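The 67ms and 80ms figures are simply the duration of two frames. A quick sketch of the arithmetic, assuming the 90 kHz timestamp clock, 3003 ticks per NTSC frame, and 3600 ticks per PAL frame (25 frames per second); the names below are only for illustration:

```python
CLOCK_HZ = 90_000                          # MPEG timestamp clock
FRAME_TICKS = {"NTSC": 3003, "PAL": 3600}  # ticks per frame (PAL value assumes 25 fps)


def dropped_b_delay_ms(standard, dropped=2):
    """Delay in ms caused by discarding `dropped` leading B pictures.

    The result is negative because the audio now starts before the video.
    """
    return -dropped * FRAME_TICKS[standard] * 1000 / CLOCK_HZ


print(round(dropped_b_delay_ms("NTSC")))  # -67
print(round(dropped_b_delay_ms("PAL")))   # -80
```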
An example would probably help a lot here. Let's say we are dealing with an NTSC DVD, the authoring
program has chosen the typical video delay value of 25257 clock ticks, and the audio is AC3 at 1536
bytes per frame. The clock we are referring to is the 90 kHz clock used by all timestamps in MPEG. The
first vobu contains 12 frames, each with a duration of 3003 clock ticks, for a total of 36036 clock
ticks. Delivery of data begins at time 0, so the video will begin at time 25257 and end
at 25257+36036 = 61293. But because of buffer constraints the audio multiplexed into this vobu, which is
gold in the graphic above, will end at time 45417. The rest of the audio, colored yellow in the graphic,
must be delivered later, so it is in the next vobu, which begins delivery at time 36036.
If we demultiplex the first vobu without considering this factor, we end up missing 61293-45417 = 15876
clock ticks of audio (176.4 milliseconds). OK, not such a big problem. But if we start demultiplexing at
the second vobu without considering this factor, we have an extra 176ms of audio at the beginning that
belonged to the previous vobu, so the program will report a delay of -176ms.
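The arithmetic in this example can be checked with a few lines of Python. This is only a sketch using the numbers quoted above; the function and variable names are made up for the illustration.

```python
CLOCK_HZ = 90_000  # MPEG timestamp clock


def vobu_audio_spill(video_delay, frames, frame_ticks, audio_end):
    """Return (video_end, spill_ticks, spill_ms) for a single vobu.

    spill_ticks is the amount of audio that belongs to this vobu's video
    but is delivered in the next vobu.
    """
    video_end = video_delay + frames * frame_ticks
    spill_ticks = video_end - audio_end
    spill_ms = spill_ticks * 1000 / CLOCK_HZ
    return video_end, spill_ticks, spill_ms


video_end, spill_ticks, spill_ms = vobu_audio_spill(
    video_delay=25257, frames=12, frame_ticks=3003, audio_end=45417
)
print(video_end)            # 61293
print(spill_ticks)          # 15876
print(round(spill_ms, 1))   # 176.4
```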
Proper demultiplexing, using the PTS values and not vobu boundaries, reduces the delay to no more
than the duration of one audio frame, which for AC3 is 32ms. Should this delay be fed back into an authoring
program? That depends on what is being done with it. First of all, the only time a delay can be applied during
authoring is at the start of a non-seamless vob. Use the delay if the audio and video are being used as
a clip, otherwise ignore the delay.
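As a rough sketch of why PTS-based demultiplexing leaves less than one audio frame of delay, the code below keeps only the audio frames that are still playing at, or start after, the first video PTS. It is not how any particular demultiplexer is written. It assumes 48 kHz AC3 (1536 samples per frame, so 32ms or 2880 ticks) and, purely for illustration, that audio frames continue back to back from tick 45417 in the example; trim_audio is a hypothetical name.

```python
CLOCK_HZ = 90_000
AC3_FRAME_TICKS = 2880  # 32ms per AC3 frame, assuming 48 kHz audio


def trim_audio(video_start, audio_pts):
    """Drop audio frames that end before the video starts.

    audio_pts: ascending PTS values (90 kHz ticks), one per audio frame.
    Returns (kept_pts, delay_ms). delay_ms is negative when the first
    kept audio frame starts before the video.
    """
    kept = [pts for pts in audio_pts if pts + AC3_FRAME_TICKS > video_start]
    if not kept:
        raise ValueError("no audio frame overlaps or follows the video start")
    delay_ms = (kept[0] - video_start) * 1000 / CLOCK_HZ
    return kept, delay_ms


# Second vobu of the example: video starts at 25257 + 36036 = 61293.
audio = [45417 + n * AC3_FRAME_TICKS for n in range(12)]  # assumed frame layout
kept, delay_ms = trim_audio(61293, audio)
print(kept[0])             # 59817
print(round(delay_ms, 1))  # -16.4
```

With these assumed frame positions the reported delay drops from -176ms to about -16ms, which is within one audio frame.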
Demultiplexing Seamless Joints
This problem always shows up at vobu boundaries and cell boundaries, as they are always seamless. The
problem can also appear at vob boundaries if they are seamlessly multiplexed. The problem here is that
audio and video have different buffer requirements in the players, with video being allowed a greater
delay before presentation than audio. I know, there's that word "delay" - in any streaming digital
media there is a delay for decoding information before it is presented (shown or heard). Each stream
can have a different delay tailored to the needs of the encoding method. In the end the streams get
synchronized again by a timestamp called "PTS" (Presentation Time Stamp). The time window for delivering
data in each vobu is determined by the video data; audio and subpicture data are then added based on
the requirements of each. Since audio has a shorter delay, some of the audio for each vobu ends up in the
next vobu.
Of course the values vary greatly, and can be especially high at seamless vob joints.
Copyright © 2006 - 2025 MPUCoder, all rights reserved.