Video Demystified: A Handbook for the Digital Engineer, Fourth Edition

A packetized elementary stream (PES) consists of a single elementary stream (ES) that has been divided into packets, each starting with an added packet header. A PES contains only one type of data (audio, video, etc.) from one source.
The general format of the PES packet is shown in Figure 13.21. Note that start codes (000001xx H) must be byte aligned by inserting 0 to 7 "0" bits before the start code.
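The byte-alignment rule can be illustrated with a minimal sketch. The `BitWriter` class below is a hypothetical helper, not taken from any library. It assumes bits are written most significant bit first, and it pads with 0 to 7 "0" bits so that each 32-bit start code (000001 H followed by an 8-bit code) begins on a byte boundary.

```python
class BitWriter:
    """Hypothetical MSB-first bit writer, for illustration only."""

    def __init__(self) -> None:
        self.bits: list[int] = []

    def write_bits(self, value: int, width: int) -> None:
        # Append `width` bits of `value`, most significant bit first.
        for shift in range(width - 1, -1, -1):
            self.bits.append((value >> shift) & 1)

    def byte_align(self) -> int:
        # Pad with "0" bits up to the next byte boundary; returns 0..7.
        pad = (-len(self.bits)) % 8
        self.bits.extend([0] * pad)
        return pad

    def write_start_code(self, code: int) -> None:
        # Start codes (000001xx H) must begin on a byte boundary.
        self.byte_align()
        self.write_bits(0x000001, 24)
        self.write_bits(code, 8)

    def to_bytes(self) -> bytes:
        if len(self.bits) % 8:
            raise ValueError("bit stream is not byte aligned")
        out = bytearray()
        for i in range(0, len(self.bits), 8):
            byte = 0
            for bit in self.bits[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)
```

For example, after writing the three bits "101", `write_start_code(0xE0)` inserts five "0" bits, so the output is A0 00 00 01 E0 H.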
The 24-bit packet_start_code_prefix field has a value of 000001 H and, in conjunction with stream_ID, indicates the beginning of a packet.
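Because start codes are byte aligned, a parser only needs to look for the 000001 H prefix at byte offsets, not at every bit position. The hypothetical `find_start_codes` function below is a simplified sketch. It reports every 000001xx H pattern it finds, including start codes inside a video elementary stream and accidental matches in audio or private data. A real demultiplexer would normally use the packet length or transport stream framing to skip over payloads rather than rely on scanning alone.

```python
def find_start_codes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, code) for each byte-aligned 000001xx H pattern."""
    prefix = b"\x00\x00\x01"
    hits: list[tuple[int, int]] = []
    pos = data.find(prefix)
    # Only byte offsets are searched, which is valid because start codes
    # are byte aligned. A prefix with no following code byte is ignored.
    while pos != -1 and pos + 3 < len(data):
        hits.append((pos, data[pos + 3]))  # the byte after 000001 H
        pos = data.find(prefix, pos + 3)
    return hits
```

Keeping only codes of BC H or above isolates candidate PES stream_ID values, since lower codes are used by elementary stream and program stream start codes.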
The 8-bit stream_ID code specifies the type and number of the elementary stream, as shown in Table 13.41. For the ATSC and OpenCable standards, the value for audio streams must be "10111101" (private stream 1) to indicate Dolby Digital.
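The sketch below decodes a stream_ID value. It is based on the stream_id assignments in ISO/IEC 13818-1, not copied from Table 13.41. Codes FA H to FE H are treated as reserved here, although later amendments to the standard assign some of them. The function name `describe_stream_id` is hypothetical.

```python
def describe_stream_id(stream_id: int) -> str:
    """Describe an 8-bit PES stream_ID (assumes ISO/IEC 13818-1 codes)."""
    if not 0xBC <= stream_id <= 0xFF:
        raise ValueError(f"{stream_id:#04x} is not a PES stream_ID")
    fixed = {
        0xBC: "program stream map",
        0xBD: "private stream 1",  # Dolby Digital audio in ATSC/OpenCable
        0xBE: "padding stream",
        0xBF: "private stream 2",
        0xF0: "ECM stream",
        0xF1: "EMM stream",
        0xF2: "DSM-CC stream",
        0xF3: "ISO/IEC 13522 stream",
        0xF9: "ancillary stream",
        0xFF: "program stream directory",
    }
    if stream_id in fixed:
        return fixed[stream_id]
    if 0xC0 <= stream_id <= 0xDF:  # 110x xxxx: audio stream number x xxxx
        return f"MPEG audio stream {stream_id & 0x1F}"
    if 0xE0 <= stream_id <= 0xEF:  # 1110 xxxx: video stream number xxxx
        return f"MPEG video stream {stream_id & 0x0F}"
    if 0xF4 <= stream_id <= 0xF8:  # H.222.1 types A through E
        return f"H.222.1 type {'ABCDE'[stream_id - 0xF4]}"
    return "reserved"  # FA H to FE H in this simplified table
```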
The 16-bit PES_packet_length field specifies the number of bytes in the PES packet following this field. A value of zero indicates that the packet length is neither specified nor bounded; this is allowed only in transport streams. For the ATSC standard, the value must be 0000 H for video streams.
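Because PES_packet_length counts only the bytes after itself, the full packet size is 6 bytes larger: 3 for the start code prefix, 1 for stream_ID, and 2 for the length field. The hypothetical `parse_pes_fixed_header` function below reads these first 6 bytes and returns `None` for the total size when the length is zero (unbounded).

```python
import struct
from typing import NamedTuple, Optional


class PesFixedHeader(NamedTuple):
    stream_id: int
    packet_length: int          # bytes following the length field
    total_size: Optional[int]   # None when the length is unbounded (0)


def parse_pes_fixed_header(data: bytes) -> PesFixedHeader:
    """Parse packet_start_code_prefix, stream_ID and PES_packet_length."""
    if len(data) < 6:
        raise ValueError("need at least 6 bytes")
    if data[0:3] != b"\x00\x00\x01":
        raise ValueError("missing 000001 H start code prefix")
    stream_id = data[3]
    (length,) = struct.unpack(">H", data[4:6])  # big-endian 16-bit value
    total = None if length == 0 else 6 + length
    return PesFixedHeader(stream_id, length, total)
```

For example, a Dolby Digital packet starting 00 00 01 BD 01 00 H has PES_packet_length = 256, so the whole packet is 262 bytes long.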
| Note | The following fields (until the next note) are not present if stream_ID = program stream map, padding stream, private stream 2, ECM stream, EMM stream, DSM-CC stream, H.222.1 type E, or program stream directory. |
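The note above amounts to a simple membership test on stream_ID. The sketch below assumes the ISO/IEC 13818-1 code values for the listed stream types. The names `NO_OPTIONAL_HEADER` and `has_optional_pes_header` are hypothetical.

```python
# stream_ID values whose PES packets omit the optional header fields
# (assumed ISO/IEC 13818-1 code assignments).
NO_OPTIONAL_HEADER = frozenset({
    0xBC,  # program stream map
    0xBE,  # padding stream
    0xBF,  # private stream 2
    0xF0,  # ECM stream
    0xF1,  # EMM stream
    0xF2,  # DSM-CC stream
    0xF8,  # H.222.1 type E
    0xFF,  # program stream directory
})


def has_optional_pes_header(stream_id: int) -> bool:
    """True if the fields following PES_packet_length are present."""
    return stream_id not in NO_OPTIONAL_HEADER
```

For these excluded stream types, the bytes after PES_packet_length are packet data rather than the optional header fields.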
These optional two bits have a fixed value of "10".
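This optional 2-bit code (PES_scrambling_control) specifies the scrambling mode of the PES packet payload; "00" indicates that the payload is not scrambled.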
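The "10" bits and the scrambling code share the first byte of the optional header, occupying its two most significant bits and the next two bits. The hypothetical `parse_scrambling_control` function below checks the fixed "10" value and extracts the 2-bit scrambling code. It assumes, as in ISO/IEC 13818-1, that only "00" (not scrambled) is defined by the standard and that the other values are user defined.

```python
def parse_scrambling_control(first_byte: int) -> int:
    """Validate the '10' bits and return the 2-bit scrambling code."""
    if not 0 <= first_byte <= 0xFF:
        raise ValueError("expected a single byte value")
    marker = first_byte >> 6  # two most significant bits
    if marker != 0b10:
        raise ValueError(f"expected '10' marker bits, got {marker:02b}")
    # Next two bits: 0 means not scrambled; 1..3 are user defined.
    return (first_byte >> 4) & 0b11
```

For example, a first byte of 80 H gives a scrambling code of 0 (not scrambled), while B4 H gives 3.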