This overview shows the logical structure of VOB files and helps you see where all the pieces fit.
MPEG-2 System Stream
A VOB file is an MPEG-2 system stream. This means that it complies fully with the
MPEG-2 system level standard, ISO/IEC 13818-1. However, VOB files are a very strict
subset of the standard. So while all VOB files are MPEG-2 system streams,
not all MPEG-2 system streams comply with the definition for a VOB file.
Pack/sector size
A DVD sector contains 2048 bytes of data, which is also the size of one pack. In MPEG-2 generally,
packs are used mainly to group together elements (such as audio and video) that are to be presented
simultaneously, and their size is variable. The pack header can also contain timing information
used for synchronization.
In DVD-Video each sector is one pack. This adds some overhead, but makes random access to the
stream much easier.
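Because every pack occupies exactly one sector, finding a pack is simple multiplication. The following is a minimal sketch of that idea, assuming the VOB data is already in memory as bytes; the names `pack_offset` and `iter_packs` are illustrative, and the only format detail relied on is the MPEG-2 pack start code (00 00 01 BA).

```python
SECTOR_SIZE = 2048                      # bytes per DVD sector, and per pack
PACK_START_CODE = b"\x00\x00\x01\xba"   # MPEG-2 pack start code


def pack_offset(sector_index: int) -> int:
    """Return the byte offset of the pack stored in a 0-based sector."""
    return sector_index * SECTOR_SIZE


def iter_packs(data: bytes):
    """Yield (sector_index, pack_bytes) for every complete 2048-byte pack."""
    for index in range(len(data) // SECTOR_SIZE):
        start = pack_offset(index)
        pack = data[start:start + SECTOR_SIZE]
        if not pack.startswith(PACK_START_CODE):
            raise ValueError(f"sector {index} does not begin with a pack header")
        yield index, pack
```

With variable-size packs, the same lookup would require scanning the stream from the beginning, which is the random-access cost the fixed size avoids.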
Pack contents
Each pack begins with a pack header and contains one or two packets, and no more.
The information in one pack is all of one kind, which may be navigation data (a NAV Pack), video, audio, or subpicture.
The NAV pack contains the system header and two fixed-length packets called
Presentation Control Information (PCI) and Data Search Information
(DSI).
The video, audio, and subpicture packs contain only the Packetized Elementary Stream
(PES) for the content, and, if needed, a padding packet.
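One way to tell what a pack carries is to skip the pack header and look at the start code that follows it. The sketch below assumes the MPEG-2 program stream pack header layout (14 fixed bytes, with the low three bits of the last byte giving the number of stuffing bytes) and the standard MPEG-2 stream IDs. The function name `classify_pack` and the label strings are hypothetical, chosen only for illustration.

```python
PACK_START_CODE = b"\x00\x00\x01\xba"
PACK_HEADER_FIXED = 14  # start code (4) + SCR (6) + mux rate (3) + stuffing length (1)


def classify_pack(pack: bytes) -> str:
    """Label a pack by the first start code found after its pack header."""
    if pack[:4] != PACK_START_CODE:
        raise ValueError("data does not begin with a pack header")
    stuffing = pack[13] & 0x07            # low 3 bits: count of stuffing bytes
    pos = PACK_HEADER_FIXED + stuffing
    if pack[pos:pos + 3] != b"\x00\x00\x01":
        raise ValueError("no start code after the pack header")
    stream_id = pack[pos + 3]
    if stream_id == 0xBB:                 # system header, so this is a NAV pack
        return "nav"
    if 0xE0 <= stream_id <= 0xEF:         # MPEG video stream IDs
        return "video"
    if 0xC0 <= stream_id <= 0xDF:         # MPEG audio stream IDs
        return "mpeg audio"
    if stream_id == 0xBD:                 # private stream 1
        return "private stream 1"
    if stream_id == 0xBE:                 # padding stream
        return "padding"
    if stream_id == 0xBF:                 # private stream 2
        return "private stream 2"
    return "other"
```

The function inspects only the first packet. Since a pack holds one kind of information, with at most a padding packet after it, that is enough to label the whole pack.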
Non-standard stuff
The MPEG-2 system standard fortunately left provisions for non-standard data in the form of private
streams. There are two private stream types, only one of which carries timing information in the form of
Presentation Time Stamps (PTS) and Decoding Time Stamps (DTS). The actual content of a private stream
is determined by the application, in our case, DVD-Video.
Private Stream 1 is the one that has the timing information, and so DVD-Video uses this stream
for subpictures and all the additional audio formats (AC-3, DTS audio, LPCM, etc.) which are not MPEG.
The actual content of each private stream packet is determined by the sub-stream number.
The other stream, Private Stream 2, is used for the navigation packets found in the NAV pack.
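As a rough illustration of how the sub-stream number selects the content type, the sketch below reads the first payload byte of a Private Stream 1 packet and maps it to a kind. The ID ranges are an assumption taken from commonly published descriptions of DVD-Video, not from the text above, and the function names are hypothetical. The PES header layout used (the header data length at byte 8, payload starting right after the header data) is the standard MPEG-2 one.

```python
PRIVATE_STREAM_1 = b"\x00\x00\x01\xbd"


def private1_substream_id(pes: bytes) -> int:
    """Return the sub-stream ID: the first payload byte of a Private Stream 1 packet."""
    if pes[:4] != PRIVATE_STREAM_1:
        raise ValueError("not a Private Stream 1 packet")
    header_data_length = pes[8]           # bytes of optional header fields (PTS, DTS, ...)
    return pes[9 + header_data_length]    # payload begins after the header data


def substream_kind(substream_id: int) -> str:
    """Map a sub-stream ID to a content kind (ranges are assumed, see lead-in)."""
    if 0x20 <= substream_id <= 0x3F:
        return "subpicture"
    if 0x80 <= substream_id <= 0x87:
        return "AC-3 audio"
    if 0x88 <= substream_id <= 0x8F:
        return "DTS audio"
    if 0xA0 <= substream_id <= 0xA7:
        return "LPCM audio"
    return "unknown"
```

For example, a packet whose first payload byte is 0x80 would be reported as AC-3 audio under these assumed ranges.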
The VOBU
The next higher logical structure is called the Video OBject Unit, or VOBU. Each VOBU starts
with a NAV pack and contains approximately half a second of the program. The size of the VOBU is
determined by the video coding unit called a Group Of Pictures (GOP). A VOBU will contain one or
more complete GOPs, as needed. The last video pack in each VOBU is padded, if needed, with either a padding
stream or stuffing bytes. Audio and subpictures with DTS values within the same range of values as the video
are included in each VOBU. Audio is not padded until the end of the cell; therefore, audio frames can span VOBUs.
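Since every VOBU starts with a NAV pack, a sequence of packs can be split into VOBUs by starting a new group at each NAV pack. A minimal sketch follows, assuming each pack has already been reduced to a label string such as "nav", "video", or "audio"; the labels and the function name `group_into_vobus` are illustrative.

```python
def group_into_vobus(pack_kinds):
    """Split a sequence of pack labels into VOBUs, each beginning with 'nav'."""
    vobus = []
    for kind in pack_kinds:
        if kind == "nav":
            vobus.append([kind])          # a NAV pack opens a new VOBU
        elif vobus:
            vobus[-1].append(kind)        # any other pack joins the current VOBU
        else:
            raise ValueError("stream does not start with a NAV pack")
    return vobus


packs = ["nav", "video", "video", "audio", "nav", "video", "subpicture"]
for number, vobu in enumerate(group_into_vobus(packs)):
    print(number, vobu)
```

The error branch corresponds to data that begins partway through a VOBU, where the leading packs belong to a VOBU whose NAV pack is not present.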
The Cell
Cells are the next higher logical structure, containing any number of whole VOBUs. Their length
and placement are entirely arbitrary and depend on the overall organization of the program (movie).
Chapters, multiple angles, titles, and even how the "prev" and "next" buttons on a remote act all
dictate the placement of cells.
The VOB
The VOB is a collection of one or more cells. An entire title could use just one VOB, but most
titles use more. The division is often arbitrary, usually along the lines of a new VOB for each
chapter, with cells inside the VOB for each scene. This is not a requirement. In fact, there is only one
place where separate VOBs are required, and that is multiple angles.
Several 1GB files
All the content for one title set (VTS) is contiguous on the DVD, but it is broken up into 1GB files
in the computer-compatible file systems for the convenience of the various operating systems.
You can see that there really is no break by examining the second or later file and looking at the
Logical Block Address (LBA), contained in NAV packs.
The files are broken up without regard to content, which is why it is difficult to process any
file but the first: a later file most likely will not start at a VOBU boundary (that is, with a NAV pack). The usual
split point is at 524,287 sectors (1,048,574 KB, 1,073,739,776 bytes). In hexadecimal this is
7FFFF sectors (2^19 - 1), or 3FFFF800 bytes.
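The split-point figures can be checked with a few lines of arithmetic. The sketch below also shows how a sector number within the title set's VOB data could be mapped to a file and a position inside it, on the assumption that every file before the last is exactly the usual split size. The function name `locate_sector` and the 0-based file index are illustrative choices.

```python
SECTOR_SIZE = 2048                        # bytes per sector
SPLIT_SECTORS = 2**19 - 1                 # usual number of sectors per file
SPLIT_BYTES = SPLIT_SECTORS * SECTOR_SIZE


def locate_sector(sector: int):
    """Return (file_index, sector_within_file), both 0-based.

    Assumes every file before the last holds exactly SPLIT_SECTORS sectors.
    """
    return divmod(sector, SPLIT_SECTORS)


print(SPLIT_SECTORS, hex(SPLIT_SECTORS))  # 524287 0x7ffff
print(SPLIT_BYTES, hex(SPLIT_BYTES))      # 1073739776 0x3ffff800
print(SPLIT_BYTES // 1024)                # 1048574 (KB)
print(locate_sector(524287))              # (1, 0): first sector of the second file
```

Each file is one sector (2048 bytes) short of a full 2^30 bytes, which is why the KB figure is 1,048,574 rather than 1,048,576.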