Video Coding: An Introduction to Standard Codecs

If VOPs are not available, the video frames must be segmented into objects and a VOP derived for each one. In general, segmentation consists of extracting image regions with similar properties, such as brightness, colour or texture. These regions are then used as masks to extract the objects of interest from the image. Before describing object segmentation, a brief overview of image segmentation in general is given.
Segmentation methods can be categorised, according to the processing strategy used, into the following classes [4]:
Region merging: Starting with tiny uniform regions, similar regions are merged until no further merging is possible. The starting regions at the finest level, which could be the pixels themselves, are called atomic regions. Atomic regions are usually chosen as regions whose grey levels lie within a predefined range. A suitable atomic region can also be constructed from the edges, where closed contour edges form the boundary of the object of interest. Edges are extracted by edge extraction methods, such as first-order differentiation with the Sobel operator [5], where the image pixels are convolved with a 3 × 3 masking array and the result is thresholded to separate edge from non-edge image content. The masking elements are defined such that the slope of the edge is amplified and can easily be thresholded. In order to obtain many atomic regions, the threshold for edge extraction must be low enough that low-contrast edges are also detected.
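The edge extraction step described above can be illustrated with a minimal sketch in plain Python. It is not taken from the text: the function name sobel_edges, the toy images and the threshold values are assumptions chosen for illustration, and the gradient magnitude is approximated by |gx| + |gy|, which is one common choice among several.

```python
# Standard 3 x 3 Sobel masks for the horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def sobel_edges(image, threshold):
    """Return a binary edge map (1 = edge) for a 2-D grey-level image.

    `image` is a list of equal-length rows of numbers. Border pixels are
    left as non-edge because the 3 x 3 mask does not fit there.
    (Hypothetical helper, written for illustration only.)
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            # Apply both masks to the 3 x 3 neighbourhood of (x, y).
            # The masks are not flipped; this only changes the sign of
            # gx and gy, which the absolute values below discard.
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            # Assumed approximation of the gradient magnitude.
            magnitude = abs(gx) + abs(gy)
            edges[y][x] = 1 if magnitude > threshold else 0
    return edges


# Toy images with a vertical step edge between columns 2 and 3.
high_contrast = [[10, 10, 10, 200, 200, 200] for _ in range(5)]
low_contrast = [[10, 10, 10, 30, 30, 30] for _ in range(5)]

edge_map = sobel_edges(high_contrast, threshold=100)
missed = sobel_edges(low_contrast, threshold=100)   # threshold too high
found = sobel_edges(low_contrast, threshold=50)     # lower threshold
```

In this sketch the high-contrast step is marked as an edge at a threshold of 100, whereas the low-contrast step is only marked once the threshold is lowered to 50, which mirrors the remark that a low threshold is needed to capture low-contrast edges.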
Region...