Emerging H.264 Standard: Overview and TMS320DM642-
Based Solutions for Real-Time Video Applications


White Paper




UB Video Inc.

Suite 400, 1788 West 5th Avenue

Vancouver, British Columbia, Canada V6J 1P2

Tel: 604-737-2426; Fax: 604-737-1514

www.ubvideo.com




Copyright © 2002 UB Video Inc.                www.ubvideo.com   12-2002
H.264: Introduction
Digital video is being adopted in a growing array of applications, ranging from video telephony and
videoconferencing to DVD and digital TV. The adoption of digital video in many applications has been
fueled by the development of video coding standards, which have emerged targeting different application
areas. These standards provide the means needed to achieve interoperability between systems designed by
different manufacturers for any given application, hence facilitating the growth of the video market. The
International Telecommunication Union, Telecommunication Standardization Sector (ITU-T) is one of two
formal organizations that develop video coding standards; the other is the International Organization for
Standardization/International Electrotechnical Commission, Joint Technical Committee 1 (ISO/IEC JTC1).
The ITU-T video coding standards are called recommendations, and they are denoted H.26x (e.g., H.261,
H.262, H.263 and H.264). The ISO/IEC standards are denoted MPEG-x (e.g., MPEG-1, MPEG-2 and
MPEG-4).

The ITU-T recommendations have been designed mostly for real-time video communication applications,
such as video conferencing and video telephony. On the other hand, the MPEG standards have been designed
mostly to address the needs of video storage (DVD), broadcast video (Cable, DSL, Satellite TV), and video
streaming (e.g., video over the Internet, video over wireless) applications. For the most part, the two
standardization committees have worked independently on the different standards. The only exception has
been the H.262/MPEG-2 standard, which was developed jointly by the two committees. Recently, the ITU-T
and the ISO/IEC JTC1 have agreed to join their efforts in the development of the emerging H.264 standard,
which was initiated by the ITU-T committee. H.264 (a.k.a. MPEG-4 Part 10 or MPEG-4 AVC) is being
adopted by the two committees because it represents a departure in terms of performance from all existing
video coding standards. Figure 1 summarizes the evolution of the ITU-T recommendations and the ISO/IEC
MPEG standards.


[Figure 1 (timeline, 1984-2004): ITU-T standards: H.261, H.263, H.263+, H.263++; joint ITU-T/MPEG
standards: H.262/MPEG-2, H.264; MPEG standards: MPEG-1, MPEG-4.]

Figure 1. Progression of the ITU-T Recommendations and MPEG standards.




H.264: Overview
The main objective behind the H.264 project was to develop a high-performance video coding standard by
adopting a “back to basics” approach with simple and straightforward design using well-known building
blocks. The ITU-T Video Coding Experts Group (VCEG) initiated the work on the H.264 standard in 1997.
Towards the end of 2001, having witnessed the superiority of the video quality offered by H.264-based
software over that achieved by the most optimized existing MPEG-4-based software, ISO/IEC MPEG joined
ITU-T VCEG by forming a Joint Video Team (JVT) that took over the H.264 project of the ITU-T. The JVT
objective was to create a single video coding standard that would simultaneously result in a new part (i.e.,
Part-10) of the MPEG-4 family of standards and a new ITU-T (i.e., H.264) recommendation. The H.264
development work is an on-going activity, with the first version of the standard expected to be finalized
technically early in the year 2003 and officially before the end of the year 2003.

The emerging H.264 standard has a number of advantages that distinguish it from existing standards, while at
the same time, sharing common features with other existing standards. The following are some of the key
advantages of H.264:

1.   Up to 50% in bit rate savings: Compared to H.263v2 (H.263+) or MPEG-4 Simple Profile, H.264
     permits a reduction in bit rate by up to 50% for a similar degree of encoder optimization at most bit rates.

2.   High quality video: H.264 offers consistently good video quality at high and low bit rates.

3.   Error resilience: H.264 provides the tools necessary to deal with packet loss in packet networks and bit
     errors in error-prone wireless networks.

4.   Network friendliness: Through the Network Adaptation Layer, H.264 bit streams can be easily
     transported over different networks.

The above advantages make H.264 an ideal standard for several applications such as video conferencing and
broadcast video.




[Figure 2 (block diagram): the video source minus the predicted macroblock (intra prediction and
compensation, or inter motion compensation driven by motion estimation over the frame store) passes
through a fixed 4x4 integer transform and quantization, with step sizes increased at a compounding rate of
approximately 12.5%. The quantized transform coefficients and the side information (motion vectors,
macroblock type, reference frame index, CBP, intra prediction mode) are entropy coded using either a
single universal VLC with context-adaptive VLC, or context-based adaptive binary arithmetic coding.
Inverse quantization and the inverse transform (no mismatch) feed a loop filter and the frame store.
Annotations: seven block sizes and shapes; 1/4-pel motion estimation accuracy; five prediction modes with
B-pictures; multiple reference picture selection; intra prediction modes: nine 4x4 and four 16x16 luma
modes plus four chroma modes, 17 modes in total.]

Figure 2. Block Diagram of the H.264 encoder.




H.264: Technical Description
The main objective of the emerging H.264 standard is to provide a means to achieve substantially higher
video quality as compared to what could be achieved using any of the existing video coding standards.
Nonetheless, the underlying approach of H.264 is similar to that adopted in previous standards such as H.263
and MPEG-4, and consists of the following four main stages:

   1. Dividing each video frame into blocks of pixels so that processing of the video frame can be
      conducted at the block level.

   2. Exploiting the spatial redundancies that exist within the video frame by coding some of the original
      blocks through spatial prediction, transform, quantization and entropy coding (or variable-length
      coding).

   3. Exploiting the temporal dependencies that exist between blocks in successive frames, so that only
      changes between successive frames need to be encoded. This is accomplished by using motion
      estimation and compensation. For any given block, a search is performed in one or more previously
      coded frames to determine the motion vectors that are then used by the encoder and the decoder to
      predict the subject block.

   4. Exploiting any remaining spatial redundancies that exist within the video frame by coding the
      residual blocks, i.e., the difference between the original blocks and the corresponding predicted
      blocks, again through transform, quantization and entropy coding.
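The four stages above can be caricatured in a few lines of Python. This is a hedged sketch, not the H.264 algorithm: the prediction here is zero-motion (each block is predicted from the co-located block of the reference frame), the transform is skipped, and counting nonzero quantized levels stands in for entropy coding.

```python
import numpy as np

def macroblocks(frame, size=16):
    """Stage 1: tile a frame into size x size blocks (dimensions assumed multiples of size)."""
    h, w = frame.shape
    for y in range(0, h, size):
        for x in range(0, w, size):
            yield (y, x), frame[y:y + size, x:x + size]

def encode_frame(frame, reference, qstep=8):
    """Stages 3-4 in caricature: predict each block from the co-located block of the
    reference frame, quantize the residual, and count nonzero levels as a stand-in
    for the bits produced by entropy coding."""
    bits_proxy = 0
    for (y, x), block in macroblocks(frame):
        prediction = reference[y:y + 16, x:x + 16]           # temporal prediction
        residual = block.astype(int) - prediction.astype(int)
        levels = np.round(residual / qstep).astype(int)      # quantization (transform skipped)
        bits_proxy += np.count_nonzero(levels)               # entropy-coding stand-in
    return bits_proxy

# Two identical frames leave nothing to encode:
f = np.random.randint(0, 256, (144, 176))
assert encode_frame(f, f) == 0
```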

From a coding point of view, the main differences between H.264 and the other standards are summarized in
Figure 2 through an encoder block diagram. On the motion estimation/compensation side, H.264 employs
blocks of different sizes and shapes, higher resolution 1/4-pel motion estimation, multiple reference frame
selection and complex multiple bi-directional mode selection. On the transform side, H.264 uses an integer-
based transform that approximates roughly the Discrete Cosine Transform (DCT) used in previous
standards, but does not have the mismatch problem in the inverse transform. In H.264, entropy coding can be
performed using either a combination of a single Universal Variable Length Codes (UVLC) table with a
Context Adaptive Variable Length Codes (CAVLC) for the transform coefficients or using Context-based
Adaptive Binary Arithmetic Coding (CABAC).

Organization of the Bit Stream
A given video picture is divided into a number of small blocks referred to as macroblocks. For example, a
picture with QCIF resolution (176x144) is divided into 99 16x16 macroblocks as indicated in Figure 3. A
similar macroblock segmentation is used for other frame sizes. The luminance component of the picture is
sampled at these frame resolutions, while the chrominance components, Cb and Cr, are down-sampled by
two in the horizontal and vertical directions. In addition, a picture may be divided into an integer number of
“slices”, which are valuable for resynchronization should some data be lost.
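The macroblock arithmetic above is easy to verify directly; the helper names below are illustrative, not from the standard.

```python
def macroblock_grid(width, height, mb=16):
    """Number of 16x16 macroblocks in a frame (dimensions assumed multiples of 16)."""
    return (width // mb) * (height // mb)

def chroma_size(width, height):
    """4:2:0 sampling: Cb and Cr are down-sampled by two in each direction."""
    return width // 2, height // 2

assert macroblock_grid(176, 144) == 99        # QCIF: 11 x 9 macroblocks
assert chroma_size(176, 144) == (88, 72)      # chroma planes for a QCIF picture
```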





[Figure 3: a QCIF picture is divided into an 11 x 9 grid of 16x16 macroblocks.]

Figure 3. Subdivision of a QCIF picture into 16x16 macroblocks.

Intra Prediction and Coding

Intra coding refers to the case where only spatial redundancies within a video picture are exploited. The
resulting frame is referred to as an I-picture. I-pictures are typically encoded by directly applying the
transform to the different macroblocks in the frame. Consequently, encoded I-pictures are large in size since
a large amount of information is usually present in the frame, and no temporal information is used as part of
the encoding process. In order to increase the efficiency of the intra coding process in H.264, spatial
correlation between adjacent macroblocks in a given frame is exploited. The idea is based on the observation
that adjacent macroblocks tend to have similar properties. Therefore, as a first step in the encoding process
for a given macroblock, one may predict the macroblock of interest from the surrounding macroblocks
(typically the ones located on top and to the left of the macroblock of interest, since those macroblocks would
have already been encoded). The difference between the actual macroblock and its prediction is then coded,
which results in fewer bits to represent the macroblock of interest as compared to when applying the
transform directly to the macroblock itself.




                             Figure 4. Intra prediction modes for 4x4 luminance blocks.


In order to perform the intra prediction mentioned above, H.264 offers nine modes for prediction of 4x4
luminance blocks, including DC prediction (Mode 2) and eight directional modes, labelled 0, 1, 3, 4, 5, 6, 7,
and 8 in Figure 4. This process is illustrated in Figure 4, in which pixels A to M from neighbouring blocks
have already been encoded and may be used for prediction. For example, if Mode 0 (Vertical prediction) is
selected, then the values of the pixels a to p are assigned as follows:

    ♦   a, e, i and m are equal to A,
    ♦   b, f, j and n are equal to B,
    ♦   c, g, k and o are equal to C, and
    ♦   d, h, l and p are equal to D.

In the case where Mode 3 (Diagonal_Down_Left prediction) is chosen, the values of a to p are given as
follows:

    ♦   a is equal to (A+2B+C+2)/4,
    ♦   b, e are equal to (B+2C+D+2)/4,
    ♦   c, f, i are equal to (C+2D+E+2)/4,
    ♦   d, g, j, m are equal to (D+2E+F+2)/4,
    ♦   h, k, n are equal to (E+2F+G+2)/4,
    ♦   l, o are equal to (F+2G+H+2)/4, and
    ♦   p is equal to (G+3H+2)/4.

For regions with less spatial detail (i.e., flat regions), H.264 supports 16x16 intra coding, in which one of
four prediction modes (DC, Vertical, Horizontal and Planar) is chosen for the prediction of the entire
luminance component of the macroblock. In addition, H.264 supports intra prediction for the 8x8
chrominance blocks also using four prediction modes (DC, Vertical, Horizontal and Planar). Finally, the
prediction mode for each block is efficiently coded by assigning shorter symbols to more likely modes,
where the probability of each mode is determined based on the modes used for coding the surrounding
blocks.

Inter Prediction and Coding

H.264 derives most of its coding efficiency advantage from motion estimation. Inter prediction and coding
is based on using motion estimation and compensation to take advantage of the temporal redundancies that
exist between successive frames, hence providing very efficient coding of video sequences. When the
selected reference frame (or frames) for motion estimation is a previously encoded frame, the frame to be
encoded is referred to as a P-picture.
When both a previously encoded frame and a future frame are chosen as reference frames, then the frame to
be encoded is referred to as a B-picture. Motion estimation in H.264 supports most of the key features
adopted in earlier video standards, but its efficiency is improved through added flexibility and functionality.
In addition to supporting P-pictures (with single and multiple reference frames) and B-pictures, H.264
supports a new inter-stream transitional picture called an SP-picture. The inclusion of SP-pictures in a bit
stream enables efficient switching between bit streams with similar content encoded at different bit rates, as
well as random access and fast playback modes.




[Figure 5 (macroblock partitions for motion estimation):
Mode 1: one 16x16 block, one motion vector.
Mode 2: two 8x16 blocks, two motion vectors.
Mode 3: two 16x8 blocks, two motion vectors.
Mode 4: four 8x8 blocks, four motion vectors.
Mode 5: eight 4x8 blocks, eight motion vectors.
Mode 6: eight 8x4 blocks, eight motion vectors.
Mode 7: sixteen 4x4 blocks, sixteen motion vectors.]

Figure 5. Different modes of dividing a macroblock for motion estimation in H.264.



Block sizes

Motion compensation for each 16x16 macroblock can be performed using a number of different block sizes
and shapes, illustrated in Figure 5. Individual motion vectors can be transmitted for blocks as small as 4x4,
so up to 16 motion vectors may be transmitted for a single macroblock. Block sizes of 16x8, 8x16, 8x8, 8x4
and 4x8 are also supported as shown. Using seven different block sizes and shapes can translate into bit rate
savings of more than 15% as compared to using only a 16x16 block size. The availability of smaller motion
compensation blocks improves prediction in general; in particular, the small blocks improve the ability of
the model to handle fine motion detail and result in better subjective viewing quality because they do not
produce large blocking artifacts.

Moreover, through the recently adopted tree structure segmentation method, it is possible to have a
combination of 4x8, 8x4, or 4x4 sub-blocks within an 8x8 sub-block. Figure 6 shows an example of such a
configuration for a 16x16 macroblock.
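The bookkeeping implied by the tree structure can be sketched as follows; the mode tables and function name are illustrative, not taken from the standard, and only the motion vector count is modelled.

```python
# Motion vectors contributed by each partition choice (cf. Figures 5 and 6).
MACROBLOCK_MODES = {"16x16": 1, "16x8": 2, "8x16": 2}   # 16x16-level splits
SUB8X8_MODES = {"8x8": 1, "8x4": 2, "4x8": 2, "4x4": 4}  # per-8x8 sub-splits

def motion_vector_count(mb_mode, sub_modes=None):
    """Total motion vectors for one macroblock. When mb_mode is '8x8',
    sub_modes lists the split chosen for each of the four 8x8 quadrants."""
    if mb_mode != "8x8":
        return MACROBLOCK_MODES[mb_mode]
    return sum(SUB8X8_MODES[m] for m in sub_modes)

# A mixed configuration in the spirit of Figure 6: one quadrant split into
# 4x4s, one kept 8x8, one split into 4x8s, one into 8x4s.
assert motion_vector_count("8x8", ["4x4", "8x8", "4x8", "8x4"]) == 9
assert motion_vector_count("16x16") == 1
```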

[Figure 6: a 16x16 macroblock whose four 8x8 quadrants are split, respectively, into four 4x4 sub-blocks,
one 8x8 sub-block, two 4x8 sub-blocks and two 8x4 sub-blocks.]

Figure 6. A configuration of sub-blocks within a macroblock based on the tree structure segmentation method in H.264.




Motion estimation accuracy

The prediction capability of the motion compensation algorithm in H.264 is further improved by allowing
motion vectors to be determined with higher levels of spatial accuracy than in existing standards. Using
1/4-pel spatial accuracy can yield as much as 20% in bit rate savings as compared to using integer-pel
spatial accuracy. Quarter-pixel accurate motion compensation is the lowest-accuracy form of motion
compensation in H.264 (in contrast with prior standards based primarily on half-pel accuracy, with
quarter-pel accuracy only available in the newest version of MPEG-4).
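Sub-pel samples have to be interpolated before they can be used for prediction. The sketch below follows the general shape of H.264 luma interpolation, with a six-tap filter (1, -5, 20, 20, -5, 1)/32 for half-sample positions and a rounded average for quarter-sample positions; the function names and one-dimensional framing are illustrative.

```python
def halfpel(samples):
    """Six-tap filter for the half-sample position between samples[2] and
    samples[3]; the result is clipped to the 8-bit range."""
    e, f, g, h, i, j = samples
    val = (e - 5 * f + 20 * g + 20 * h - 5 * i + j + 16) >> 5
    return max(0, min(255, val))

def quarterpel(a, b):
    """Quarter-sample positions: rounded average of the two nearest
    integer/half-sample values."""
    return (a + b + 1) >> 1

row = [10, 10, 10, 10, 10, 10]
assert halfpel(row) == 10            # a flat signal interpolates to itself
```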

Multiple reference picture selection

The H.264 standard offers the option of having multiple reference frames in inter picture coding, resulting
in better subjective video quality and more efficient coding of the video frame under consideration. Using
five reference frames for prediction can yield 5-10% in bit rate savings as compared to using only one
reference frame. Moreover, using multiple reference frames helps make the H.264 bit stream more error
resilient. However, from an implementation point of view, there would be additional processing delays and
higher memory requirements at both the encoder and the decoder.

De-blocking (Loop) Filter

H.264 specifies the use of an adaptive de-blocking filter that operates on the horizontal and vertical block
edges within the prediction loop in order to remove artifacts caused by block prediction errors. The filtering
is generally based on 4x4 block boundaries, in which two pixels on either side of the boundary may be
updated using a different filter. The rules for applying the de-blocking filter are intricate and quite complex;
however, its use is optional for each slice (loosely defined as an integer number of macroblocks).
Nonetheless, the de-blocking filter yields a substantial improvement in subjective quality that often more
than justifies the increase in complexity.

Integer Transform

The information contained in a prediction error block resulting from either intra prediction or inter prediction
is then re-expressed in the form of transform coefficients. H.264 is unique in that it employs a purely integer
spatial transform (a rough approximation of the DCT) which is primarily 4x4 in shape, as opposed to the
usual floating-point 8x8 DCT specified with rounding-error tolerances as used in earlier standards. The small
shape helps reduce blocking and ringing artifacts, while the precise integer specification eliminates any
mismatch issues between the encoder and decoder in the inverse transform.
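The integer-only property is easy to demonstrate. The matrix below is the core of the H.264 4x4 forward transform (the per-coefficient scaling that the real codec folds into quantization is omitted here, so this is a sketch of the core operation only).

```python
import numpy as np

# Core matrix of the H.264 4x4 forward transform.
C = np.array([[1,  1,  1,  1],
              [2,  1, -1, -2],
              [1, -1, -1,  1],
              [1, -2,  2, -1]])

def forward4x4(x):
    """Integer-only 2-D transform Y = C X C^T: every intermediate value is an
    exact integer, so the encoder and decoder cannot drift apart."""
    return C @ x @ C.T

x = np.full((4, 4), 7)                    # constant block
y = forward4x4(x)
assert y[0, 0] == 16 * 7                  # all energy lands in the DC coefficient
assert np.count_nonzero(y) == 1
```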

Quantization and Transform Coefficient Scanning

The quantization step is where a significant portion of the data compression takes place. In H.264, the
transform coefficients are quantized using scalar quantization with no widened dead-zone. Fifty-two
different quantization step sizes can be chosen on a macroblock basis; this is different from prior standards
(H.263 supports thirty-one, for example). Moreover, in H.264 the step sizes are increased at a compounding
rate of approximately 12.5%, rather than by a constant increment. The fidelity of the chrominance
components is improved by using finer quantization step sizes than those used for the luminance
coefficients, particularly when the luminance coefficients are coarsely quantized.
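The compounding rule means the step size doubles every six quantizer values. As a sketch (the base value 0.625 matches the step size commonly tabulated for QP 0, but the exact per-QP values in the standard come from tables rather than this closed form):

```python
def qstep(qp, base=0.625):
    """Approximate quantizer step size for QP 0..51: the step doubles every six
    QP values, i.e. grows by 2**(1/6) - 1, about 12.2% ('approximately 12.5%')
    per step."""
    return base * 2 ** (qp / 6)

assert qstep(6) == 2 * qstep(0)                 # doubling every six steps
assert abs(qstep(1) / qstep(0) - 2 ** (1 / 6)) < 1e-12
```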


                                              0    1    5    6

                                              2    4    7   12

                                              3    8   11   13

                                              9   10   14   15



                                 Figure 7. Scan pattern for frame coding in H.264.


The quantized transform coefficients correspond to different frequencies, with the coefficient at the top left
hand corner in Figure 7 representing the DC value, and the rest of the coefficients corresponding to different
nonzero frequency values. The next step in the encoding process is to arrange the quantized coefficients in
an array, starting with the DC coefficient. A single coefficient-scanning pattern is available in H.264 (Figure
7) for frame coding, and another one is being added for field coding. The zigzag scan illustrated in Figure 7
is used in all frame-coding cases, and it is identical to the conventional scan used in earlier video coding
standards. The zigzag scan arranges the coefficients in ascending order of the corresponding frequencies.
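The scan of Figure 7 can be written directly as the sequence of (row, column) positions visited in order of increasing frequency:

```python
# 4x4 zigzag scan of Figure 7: positions listed in scan order, DC first.
ZIGZAG_4X4 = [(0, 0), (0, 1), (1, 0), (2, 0),
              (1, 1), (0, 2), (0, 3), (1, 2),
              (2, 1), (3, 0), (3, 1), (2, 2),
              (1, 3), (2, 3), (3, 2), (3, 3)]

def zigzag(block):
    """Flatten a 4x4 block of quantized coefficients into scan order."""
    return [block[r][c] for r, c in ZIGZAG_4X4]

block = [[1, 2, 0, 0],
         [3, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
assert zigzag(block) == [1, 2, 3] + [0] * 13    # nonzeros cluster at the front
```

Arranging the coefficients this way concentrates the zeros at the end of the array, which is exactly what the run-length techniques in the entropy coder exploit.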

Entropy Coding

The last step in the video coding process is entropy coding. Entropy coding is based on assigning shorter
codewords to symbols with higher probabilities of occurrence, and longer codewords to symbols with less
frequent occurrences. Some of the parameters to be entropy coded include transform coefficients for the
residual data, motion vectors and other encoder information. Two types of entropy coding have been
adopted. The first method represents a combination of Universal Variable Length Coding (UVLC) and
Context Adaptive Variable-Length coding (CAVLC). The second method is represented by Context-Based
Adaptive Binary Arithmetic Coding (CABAC).
UVLC/CAVLC

In some video coding standards, symbols and the associated codewords are organized in look-up tables,
referred to as variable length coding (VLC) tables, which are stored at both the encoder and decoder. In
H.263, a number of VLC tables are used, depending on the type of data under consideration (e.g., transform
coefficients, motion vectors). H.264 offers a single Universal VLC (UVLC) table that is to be used in entropy
coding of all symbols in the encoder except for the transform coefficients. Although the use of a single
UVLC table is simple, it has a major disadvantage: the single table is usually derived using a static
probability distribution model, which ignores the correlations between the encoder symbols.
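The universal table is, in essence, the family of Exp-Golomb codes, which can be generated rather than stored. As a sketch:

```python
def ue_golomb(v):
    """Unsigned Exp-Golomb codeword: [M zeros][binary of v+1], where M is one
    less than the bit length of v+1. One rule covers every symbol value."""
    bits = bin(v + 1)[2:]                 # binary of v+1, without the '0b' prefix
    return "0" * (len(bits) - 1) + bits

assert [ue_golomb(v) for v in range(5)] == ["1", "010", "011", "00100", "00101"]
```

Because the codeword lengths are fixed by this one rule regardless of the symbol being coded, the table cannot adapt to the actual symbol statistics, which is the disadvantage noted above.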

In H.264, the transform coefficients are coded using Context Adaptive Variable Length Coding (CAVLC).
CAVLC is designed to take advantage of several characteristics of quantized 4x4 blocks. First, non-zero
coefficients at the end of the zigzag scan are often equal to +/- 1. CAVLC encodes the number of these
coefficients (“trailing 1s”) in a compact way. Second, CAVLC employs run-level coding efficiently to
represent the string of zeros in a quantized 4x4 block. Moreover, the numbers of non-zero coefficients in


neighbouring blocks are usually correlated. Thus, the number of non-zero coefficients is encoded using a
look-up table that depends on the numbers of non-zero coefficients in neighbouring blocks. Finally, the
magnitude (level) of non-zero coefficients tends to be larger near the DC coefficient and smaller at the
high-frequency end. CAVLC takes advantage of this by making the choice of the VLC look-up table for the
level adaptive, with the choice depending on the recently coded levels.
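Two of the quantities CAVLC signals for each block, the total number of nonzero coefficients and the number of trailing +/-1s (capped at three), can be extracted with a short sketch; the function name is illustrative.

```python
def trailing_ones_and_total(scanned):
    """For a zigzag-scanned coefficient list, return (number of nonzero
    coefficients, number of trailing coefficients equal to +/-1, up to 3)."""
    nonzeros = [c for c in scanned if c != 0]
    total = len(nonzeros)
    t1 = 0
    for c in reversed(nonzeros):
        if abs(c) == 1 and t1 < 3:
            t1 += 1
        else:
            break
    return total, t1

# Large levels near DC, +/-1s at the high-frequency end of the scan:
assert trailing_ones_and_total([7, -3, 2, 1, -1, 1, 0, 0]) == (6, 3)
```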

H.264 Profiles

To date, three major profiles have been agreed upon: Baseline, mainly for video conferencing and
telephony/mobile applications; Main, primarily for broadcast video applications; and X, mainly for
streaming and mobile video applications. Figure 8 shows the common features between the Baseline and
Main profiles as well as the additional specific features for each. The Baseline profile allows the use of
Arbitrary Slice Ordering (ASO) to reduce the latency in real-time communication applications, as well as the
use of Flexible Macroblock Ordering (FMO) and redundant slices to improve error resilience in the coded bit
stream. The Main profile enables an additional reduction in bandwidth over the Baseline profile through
mainly sophisticated Bi-directional prediction (B-pictures), Context Adaptive Binary Arithmetic Coding
(CABAC) and weighted prediction.

                              Figure 8. Features for the Baseline and Main profiles

Baseline Profile: Specific Features

Arbitrary Slice Ordering
Arbitrary slice ordering allows the decoder to process slices in an arbitrary order as they arrive at the
decoder. Hence, the decoder does not have to wait for all the slices to be properly arranged before it starts
processing them. This reduces the processing delay at the decoder, resulting in less overall latency in
real-time video communication applications.

Flexible Macroblock Ordering (FMO)
Macroblocks in a given frame are usually coded in a raster scan order. With FMO, macroblocks are coded
according to a macroblock allocation map that groups, within a given slice, macroblocks from spatially


different locations in the frame. Such an arrangement enhances error resilience in the coded bit stream since
it reduces the interdependency that would otherwise exist in coding data within adjacent macroblocks in a
given frame. In the case of packet loss, the loss is scattered throughout the picture and can be easily
concealed.

Redundant slices
Redundant slices allow the transmission of duplicate slices over error-prone networks to increase the
likelihood of the delivery of a slice that is free of errors.

Main Profile: Specific Features

B pictures
B-pictures provide a compression advantage as compared to P-pictures by allowing a larger number of
prediction modes for each macroblock. Here, the prediction is formed by averaging the sample values in two
reference blocks, generally, but not necessarily, using one reference block that is forward in time and one that
is backward in time with respect to the current picture. In addition, "Direct Mode" prediction is supported, in
which the motion vectors for the macroblock are interpolated based on the motion vectors used for coding the
co-located macroblock in a nearby reference frame. Thus, no motion information is transmitted. By allowing
so many prediction modes, the prediction accuracy is improved, often reducing the bit rate by 5-10%.

Weighted prediction
This allows the modification of motion compensated sample intensities using a global multiplier and a global
offset. The multiplier and offset may be explicitly sent, or implicitly inferred. The use of the multiplier and
the offset aims at reducing the prediction residuals due, for example, to global changes in brightness, and
consequently, leads to enhanced coding efficiency for sequences with fades, lighting changes, and other
special effects.
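The multiplier-and-offset idea can be sketched in a few lines. This follows the general shape of explicit weighted prediction (a weight with a fixed number of fractional bits, a rounding term, an additive offset and a clip); the parameter names and 8-bit framing are illustrative.

```python
def weighted_pred(sample, w, offset, log_wd=5):
    """Scale a motion-compensated sample by weight w (log_wd fractional bits),
    add an offset, and clip to the 8-bit range."""
    val = ((sample * w + (1 << (log_wd - 1))) >> log_wd) + offset
    return max(0, min(255, val))

# A reference pixel of 200 predicting a frame faded to half brightness:
assert weighted_pred(200, w=16, offset=0) == 100
```

With `w = 1 << log_wd` and `offset = 0`, the prediction reduces to ordinary motion compensation, so the tool costs nothing when no fade is present.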

CABAC
Context Adaptive Binary Arithmetic Coding (CABAC) makes use of a probability model at both the
encoder and decoder for all the syntax elements (transform coefficients, motion vectors, etc.). The use of
CABAC in H.264 yields an improvement of approximately 10% in bit rate savings. To increase the coding
efficiency of arithmetic coding, the underlying probability model is adapted to the changing statistics within
a video frame, through a process called context modeling.

Context modeling provides estimates of the conditional probabilities of the coding symbols. Given suitable
context models, inter-symbol redundancy can be exploited by switching between different probability
models according to already-coded symbols in the neighborhood of the current symbol. Context modeling
is responsible for most of CABAC's roughly 10% savings in bit rate over the VLC entropy coding method
(UVLC/CAVLC).


Example Applications: Video Conferencing
Video conferencing systems are being increasingly used around the world as tools that enable cost-effective
and efficient communication while reducing expenses and enhancing productivity. Deployed mostly within



the enterprise setting, video conferencing offers collaboration tools, in addition to the exchange of audio and
video information, that make it an attractive alternative to business travel.

Video Conferencing Industry: Challenges

The use of video as part of videoconferencing applications involves a number of challenges from both the
bandwidth and the quality points of view. Such challenges are discussed next.

•   Efficient bandwidth utilization: Most of the bandwidth allocated to a typical video conferencing
    session is usually consumed by the video component of the data that is being transmitted. Any
    significant reduction in the video bandwidth that is required to maintain a desired video quality could
    result in a number of benefits, such as increasing the number of participants in a conferencing
    session and increasing the amount of data that can be exchanged during the session.
    Moreover, the fact that video conferencing applications run at low rates makes it even more important to
    use the most effective video compression tools in order to maintain good video quality at such rates.

•   Low processing delay: It is important that the delay associated with the processing and transmission of
    the data be kept to a minimum in order to maintain good quality in the decoded video. A large
    processing delay at the encoder would lead to the perception of non-smooth motion in the reconstructed
    video sequences. The delay (a.k.a. latency) is the sum of encoding, network and decoding delays. In
    real-time interactive applications, the user will notice some objectionable delay if the round trip time
    exceeds 250ms. In order to minimize latency, it is essential to have very small processing delays at the
    encoder and decoder.

•   Better Video Quality: Video quality in conferencing applications can be affected negatively by a
    number of factors, including noise and brightness changes in the source video, the presence of trailing
    artifacts in the reconstructed video and network packet loss.

    !   Pre-processing: The presence of noise in the video frames and brightness changes between
        successive video frames can substantially reduce the efficiency of the inter-coding process.
        Consequently, it is important to incorporate pre-processing tools in order to minimize the effects of
        both the noise and the brightness changes on the video quality.

    !   Avoidance of Trailing Artifacts: Trailing artifacts usually appear as a series of dots that trail a
        moving object, and are highly visible. These artifacts arise because the encoder is
        forced to discard a lot of useful data (i.e., employ a large quantizer) when coding video data at low
        rates. Consequently, one would need to detect and eliminate (or at least reduce) such artifacts.

    !   Error Resilience: The compressed bit stream generated by the video encoder is first segmented into
        fixed or variable length packets, multiplexed with audio and other data types, and then transported
        over the network. Some of the transmitted packets can get lost or corrupted during transport over the
        network, due, for example, to congestion in the network or impairment of the physical channel,
        leading to a distortion in the audio and video data. Therefore, it is often necessary to employ error
        resilience tools in the encoder to guarantee a minimal level of resilience to transmission errors.

UBLive-264BP: A Solution for Video Conferencing Applications

UB Video’s UBLive-264BP is a complete video processing solution that has at its core an optimized
Baseline H.264 encoder/decoder. UBLive-264BP has a number of features that make it suitable for video
conferencing applications. UBLive-264BP was designed to address all of the major video processing issues
that plague video conferencing applications, as explained in the following.

•   Up to 50% in bit rate savings: Most video conferencing systems deployed today are
    based on video coding solutions that make use of either the H.261 or H.263 video coding standards.
    Compared to H.263-based solutions, UBLive-264BP yields an average reduction in bit rate by up to
    50% for a similar degree of encoder optimization at most bit rates and for the same subjective video
    quality. The savings in bit rate are due in part to the flexibility provided by H.264 but also to the
    collection of complexity-reduction algorithms UB Video has researched and developed over the past few
    years.

•   Low Processing Delay: UBLive-264BP also features very low processing latency, a key requirement in
    video conferencing applications. UBLive-264BP guarantees an excellent video quality even for single-
    pass encoding, thereby reducing processing latency.

•   Better Video Quality: UBLive-264BP yields high video quality through the effective use of the H.264
    Baseline Profile features, the use of pre-processing methods such as noise filtering, brightness
    compensation in the source video, and trailing-artifact avoidance/reduction, as well as error resilience
    methods such as Flexible Macroblock Ordering (FMO).


Example Applications: Broadcast Video
Broadcast video applications are currently undergoing a transition in which more and more content is produced
and distributed in digital format as opposed to the traditional analog format. Service providers in the broadcast
industry face an unprecedented competition for viewers. In a landscape that was previously dominated by the
cable industry, satellite and Digital Subscriber Line (DSL) companies are now also competing for the same
customers. The competitive landscape is driving service providers to develop ways to differentiate their
services and to adopt new solutions for the production and delivery of digital video.

Broadcast Video Industry: Challenges

Digital broadcast video service providers need to overcome significant technical hurdles in order to
differentiate their services in the market. From a video coding perspective, there are three major challenges
that the industry has to deal with: making efficient use of the available bandwidth, ensuring high-
reproduction video quality, and providing a cost-effective embedded decoder solution. A fourth challenge is
that the H.264 Main Profile standard, while holding the promise of improving the bandwidth-quality
tradeoffs, is still not final and may in fact change quite significantly over the next few months. Each of these
four challenges is discussed in more detail in the following.

•   Efficient Bandwidth Utilization: A key differentiating factor for service providers is the number of
    channels that can be accommodated over a given transmission bandwidth, which in turn affects the
amount of generated revenues. For the different service providers, whether based on cable, satellite, or
    DSL services, a more efficient use of the transmission bandwidth could translate into making available
    more channels to the customers, or providing additional services, hence enhancing the service providers’
    service offerings. For some broadcast video applications, such as video over DSL, the available
    bandwidth is already very limited, making bandwidth savings even more valuable.
    However, any savings in the bandwidth allocated to the broadcasting of digital video should not come at
    the expense of video quality, as customers expect the service to provide the same broadcast video quality
    they are used to, or even better. The only viable solution, therefore, is to use the most effective
    compression tools available.

•   Broadcast Video Quality: Viewers used to the superior quality produced by DVD players will not
    accept any broadcast video that does not measure up to that level of quality. Providing broadcast video
    quality at a limited channel bandwidth (e.g., 1.5 Mbps for DSL channels) for the Standard-Definition
    (SD) resolution (720x480) can therefore be quite challenging, particularly when the video content is
    characterized by action sequences with significant amount of motion, scene changes as well as fades and
    dissolves. Moreover, for most broadcast video applications, the video quality could be significantly
    affected by the presence of spatial and temporal noise, often yielding very objectionable artifacts such as
    contouring and blocking, especially in bright areas. Therefore, pre-processing the source video to remove
    noise is critical to ensuring a high level of video quality.

•   Decoder complexity: The H.264 standard is significantly more complex than any of the previous video
    coding standards. Motion compensation, for instance, makes use of 7 block sizes from 16x16 down to
    4x4. Consequently, the H.264 decoder is expected to be significantly more demanding in terms of
    computations and memory requirements. Any decoder should be able to handle all “legal” bit streams
    (i.e., worst-case scenario), making the decoder implementation even more complicated. Moreover, the
    development of an embedded decoder implementation where the internal memory size is limited is a
    challenging task. For example, when performing motion compensation on a macroblock coded using bi-
    directional prediction, the decoder must refer to multiple reference frames in both directions.
    Transferring the appropriate macroblocks for motion compensation can slow down the decoder
    significantly as memory transfers may become too demanding in terms of cycles.

•   H.264 Main Profile Status: The H.264 standard is in the last stage of development and changes to the
    standard may still be possible, particularly in features included only by the Main Profile. As a result,
    broadcast infrastructure companies, which normally look for hardware solutions, are currently looking
    for solutions that are fully programmable in order to be able to quickly adapt to the evolving standard.
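The motion-compensation flexibility cited above can be made concrete. The Python sketch below (our own illustration) enumerates the seven luma block sizes and shows why the 4x4 worst case is demanding: a single macroblock may then carry sixteen motion vectors, each potentially pointing into a different reference area.

```python
# The seven H.264 motion-compensation block sizes, from 16x16 down to 4x4:
MC_BLOCK_SIZES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]

def blocks_per_macroblock(width, height):
    """Number of width x height blocks tiling one 16x16 macroblock."""
    return (16 // width) * (16 // height)

# Worst case for the decoder: 4x4 partitions everywhere.
worst_case_vectors = blocks_per_macroblock(4, 4)
```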


UBLive-264MP: A Solution for Broadcast Video Applications

UB Video’s UBLive-264MP consists of three components: a pre-processing tool, an encoder and a decoder.
This solution is based on the H.264 Main profile, which has a number of features that make it suitable for
broadcast video applications. UBLive-264MP was designed to address all the major video processing
challenges (discussed above) that face the broadcast video industry. In the following, the three components of
UBLive-264MP and how they address the different challenges are discussed.

•   Pre-processing: One of the challenges in broadcast video is the presence of noise that may potentially
    lead to objectionable artifacts. UBLive-264MP applies motion-compensated filtering to reduce noise in
    successive video frames. Motion-compensated filtering reduces noise in the temporal direction while
    taking into account motion in successive video frames. This pre-processing not only achieves good
    performance in terms of reducing the occurrence of artifacts, but it also results in bit savings, leading to a
    better overall video quality.

•   Encoder: Existing broadcast systems are generally based on video coding solutions that make use of the
    MPEG-2 coding standard. The emerging H.264 standard is poised to become the enabling technology
    that would address the bandwidth utilization problem. Compared to MPEG-2, H.264 promises an average
    reduction in bit rate by more than 50% for a similar degree of encoder optimization at most bit rates and
    for the same subjective video quality. The savings in bit rate are the result of a number of techniques,
    including the use of sophisticated intra prediction, multiple block sizes and multiple reference frames in
    motion estimation, more advanced B-picture coding and highly efficient Context-Based Adaptive Binary
    Arithmetic Coding. UBLive-264MP, the result of a three-year research and development effort
    involving various algorithmic and subjective optimizations, achieves good bandwidth-quality tradeoffs.
    In order to achieve efficient implementation of the encoder, UB Video has developed rate-distortion
    optimized motion estimation and mode decision algorithms that provide excellent quality-complexity
    tradeoffs. To further enhance the video quality, UB Video has also developed new algorithms that take
    into account the perception limitation of the human visual system.

•   Decoder on the TMS320DM642 Digital Media Processor: Since the H.264 standard is still under
    development and is expected to stabilize in 2003, broadcast infrastructure companies are looking for a
    programmable chip solution that is powerful enough to handle the processing power associated with such
    a complex decoder. UB Video’s highly optimized H.264 Main profile decoder software running on the
    Texas Instruments’ TMS320DM642 Digital Media Processor represents a very compelling solution. This
    solution currently includes the only embedded real-time broadcast-quality H.264 Main profile decoder for
    SD resolution. This was achieved through algorithmic optimizations that take full advantage of the DSP
    capabilities. This hardware/software solution offers an attractive and viable option for a new generation
    of set top boxes featuring enhanced flexibility through the programmability offered by the DM642 digital
    media processor. Adopting this solution will certainly help service providers roll out their H.264
    compliant products in the very near future.
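The idea behind the motion-compensated temporal filtering described in the pre-processing component can be conveyed with a deliberately tiny 1-D Python sketch (ours, not UB Video's algorithm): find the shift of the previous frame that best matches the current one, then average along that motion trajectory, so noise is attenuated without smearing the moving edge.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length signals."""
    return sum(abs(x - y) for x, y in zip(a, b))

def best_shift(cur, prev, search=2):
    """Tiny exhaustive 1-D motion search: the shift of the previous
    frame that best matches the current one (minimum SAD)."""
    best, best_cost = 0, float("inf")
    n = len(cur)
    for s in range(-search, search + 1):
        ref = [prev[min(max(i + s, 0), n - 1)] for i in range(n)]
        cost = sad(cur, ref)
        if cost < best_cost:
            best, best_cost = s, cost
    return best

def temporal_filter(cur, prev, weight=0.5):
    """Average each pixel with its motion-compensated predecessor,
    attenuating temporal noise along the motion trajectory."""
    s = best_shift(cur, prev)
    n = len(cur)
    ref = [prev[min(max(i + s, 0), n - 1)] for i in range(n)]
    return [round(weight * c + (1 - weight) * r) for c, r in zip(cur, ref)]
```

In the example below, a step edge that moves by one sample between frames is tracked at shift 1, and the small noise added to the current frame is halved by the filter.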


UBLive-264-DM642 Solutions: Demonstration Software
UB Video has demonstration software for the UBLive-264-DM642 solutions (namely UBLive-264BP and
UBLive-264MP) that runs on the TMS320DM642 Network Video Developer’s Kit (NVDK) board from
Texas Instruments.

TMS320DM642 Digital Media Platform

The TMS320DM642™ digital media processor (DM642) delivers the highest performance at the
lowest power consumption of any DSP available on the market to date. With its 600 MHz clock rate
today and the planned next-generation 1.1 GHz version, this DSP is well suited to overcoming the complexity
and computational requirements of H.264 and to delivering high-quality full-screen video for most broadband
video applications. The DSP core processor has 64 general-purpose 32-bit registers and eight highly
independent functional units - two multipliers and six arithmetic logic units (ALUs) with VelociTI.2
extensions. The VelociTI.2™ extensions in the eight functional units include new instructions to accelerate
the performance in video and imaging applications and extend the parallelism of the VelociTI™ architecture.
The DM642 uses a two-level cache-based architecture and has a powerful and diverse set of peripherals. The
Level 1 program cache (L1P) is a 128-Kbit direct-mapped cache and the Level 1 data cache (L1D) is a
128-Kbit 2-way set-associative cache. The Level 2 memory/cache (L2) consists of a 2-Mbit memory space that
is shared between program and data space. L2 memory can be configured as mapped memory, cache, or
combinations of the two.
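Translating these memory figures into bytes makes the embedded constraint concrete (a simple Python computation; the frame-size arithmetic is ours): even the 2-Mbit L2 cannot hold a single SD reference frame, which is why reference data must reside off-chip and be transferred per macroblock, as noted in the decoder-complexity discussion above.

```python
def kbit_to_bytes(kbits):
    """Convert a size quoted in kilobits to bytes (1 Kbit = 1024 bits)."""
    return kbits * 1024 // 8

L1P_BYTES = kbit_to_bytes(128)       # 16 KB direct-mapped program cache
L1D_BYTES = kbit_to_bytes(128)       # 16 KB 2-way set-associative data cache
L2_BYTES = kbit_to_bytes(2 * 1024)   # 256 KB shared program/data space

# One SD 4:2:0 frame: 720x480 luma plus two quarter-size chroma planes.
SD_FRAME_BYTES = 720 * 480 * 3 // 2  # 518,400 bytes -- larger than L2
```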




                                 Figure 9. Block diagram of the UBLive-264BP demo.



UBLive-264BP: Demonstration Software

A block diagram of the demo is shown in Figure 9. The NVDK is the platform on which the real-time
encode/decode API calls are made and on which the algorithms are tested for compliance. The UB
Video demonstration software works as follows: a camera captures frames at a rate of 30 frames per
second and a resolution of 640x480. In the case where the SIF resolution mode is selected, frames are scaled
down to SIF resolution. The resulting frame then passes through a pre-processing stage to reduce the
presence of noise in the source frame and to remove any significant brightness changes between successive
video frames. The output frame is then passed to the encoder API, and then to the decoder API. The decoded
frame is converted from YUV to RGB and then displayed on the screen along with the original (possibly
scaled) video frame. Please see [1] for more details on the UBLive-264BP demonstration software.
The UBLive-264BP software illustrates that UB Video’s Baseline H.264 solution allows a 35-50%
bandwidth reduction over that required by H.263-based solutions. While the additional complexity is
perceived to be high, the software also demonstrates that UBLive-264BP can simultaneously encode and
decode 640x480 interlaced video sequences on a single 600 MHz DM642 chip, possibly with enough room
for audio and the other video conferencing system components.
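The loop just described can be summarized as an ordered list of stages. The Python sketch below is our own shorthand for the data flow; the stage names are descriptive, not API identifiers from the actual software.

```python
def bp_demo_pipeline(sif_mode=False):
    """Ordered processing stages of the UBLive-264BP demo loop
    (stage names are our own descriptions of the flow)."""
    stages = ["capture 640x480 @ 30 fps"]
    if sif_mode:
        stages.append("scale down to SIF")
    stages += [
        "pre-process (noise reduction, brightness compensation)",
        "H.264 Baseline encode",
        "H.264 Baseline decode",
        "convert YUV to RGB",
        "display alongside source frame",
    ]
    return stages
```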

UBLive-264MP: Demonstration Software

A block diagram of the demo is shown in Figure 10. The way the UB Video demonstration software works is
as follows: A bit stream is parsed by the decoder and YUV 4:2:0 frames are output. The resolution of the
frame is 720x360 for Y and 360x180 for chrominance components. The vertical resolution (360) is due to the
wide-screen DVD content that is used. Once a frame is decoded, it goes through a post-processing stage as
shown. Each chrominance line is duplicated vertically, yielding YUV 4:2:2; a transformation to packed
format is then performed, followed by a 3:2 pull-down operation that up-samples the frame rate
to 30 frames per second. Each frame is delivered to the board display hardware for output to a television.
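The two post-processing steps lend themselves to a short Python sketch (a frame-level simplification of ours; real 3:2 pull-down operates on fields, alternating three-field and two-field frames):

```python
def chroma_420_to_422(rows):
    """Duplicate every chrominance line vertically (4:2:0 -> 4:2:2),
    e.g. 180 chroma lines become 360."""
    out = []
    for row in rows:
        out.append(row)
        out.append(list(row))
    return out

def pulldown_3_2(frames):
    """Frame-level approximation of 3:2 pull-down: repeat one frame in
    every four, turning 24 fps film-rate content into 30 fps."""
    out = []
    for i, frame in enumerate(frames):
        out.append(frame)
        if i % 4 == 0:
            out.append(frame)
    return out
```

Duplicating the 180 chrominance lines yields the 360 lines of 4:2:2, and repeating one frame in every four turns 24 input frames into 30 output frames.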

The UBLive-264MP decoder demo software has been optimized to decode SD-resolution video at 30 fps,
illustrating that UBLive-264MP on the TMS320DM642 Digital Media Processor provides a
viable solution for digital video set-top applications. Moreover, the demo illustrates the capability of the
UBLive-264MP encoder software, which yields, at a bit rate of 1.5 Mbps, the same broadcast-quality video
that MPEG-2 would provide at a rate of 3 Mbps. Please see [2] for more details on the UBLive-264MP solution.




                          Figure 10. Block diagram of the UBLive-264MP demo.


References
[1] UBLive-264BP: An H.264-Based Solution on the TMS320DM642 for Video Conferencing Applications.
UB Video Inc. www.ubvideo.com.
[2] UBLive-264MP: An H.264-Based Solution on the TMS320DM642 for Video Broadcast Applications.
UB Video Inc. www.ubvideo.com.




Copyright © 2002 UB Video Inc.                         www.ubvideo.com                              12-2002

More Related Content

What's hot

h.264 video compression standard.
h.264 video compression standard.h.264 video compression standard.
h.264 video compression standard.Videoguy
 
An Introduction to Versatile Video Coding (VVC) for UHD, HDR and 360 Video
An Introduction to  Versatile Video Coding (VVC) for UHD, HDR and 360 VideoAn Introduction to  Versatile Video Coding (VVC) for UHD, HDR and 360 Video
An Introduction to Versatile Video Coding (VVC) for UHD, HDR and 360 VideoDr. Mohieddin Moradi
 
2011_12_4K research in PSNC
2011_12_4K research in PSNC2011_12_4K research in PSNC
2011_12_4K research in PSNCmglowiak
 
Complexity Analysis in Scalable Video Coding
Complexity Analysis in Scalable Video CodingComplexity Analysis in Scalable Video Coding
Complexity Analysis in Scalable Video CodingWaqas Tariq
 
High Efficiency Video Codec
High Efficiency Video CodecHigh Efficiency Video Codec
High Efficiency Video CodecTejus Adiga M
 
Video Coding Standard
Video Coding StandardVideo Coding Standard
Video Coding StandardVideoguy
 
DIC_video_coding_standards_07
DIC_video_coding_standards_07DIC_video_coding_standards_07
DIC_video_coding_standards_07aniruddh Tyagi
 
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...Ijripublishers Ijri
 

What's hot (16)

h.264 video compression standard.
h.264 video compression standard.h.264 video compression standard.
h.264 video compression standard.
 
HEVC overview main
HEVC overview mainHEVC overview main
HEVC overview main
 
An Introduction to Versatile Video Coding (VVC) for UHD, HDR and 360 Video
An Introduction to  Versatile Video Coding (VVC) for UHD, HDR and 360 VideoAn Introduction to  Versatile Video Coding (VVC) for UHD, HDR and 360 Video
An Introduction to Versatile Video Coding (VVC) for UHD, HDR and 360 Video
 
H.263 Video Codec
H.263 Video CodecH.263 Video Codec
H.263 Video Codec
 
2011_12_4K research in PSNC
2011_12_4K research in PSNC2011_12_4K research in PSNC
2011_12_4K research in PSNC
 
Deblocking_Filter_v2
Deblocking_Filter_v2Deblocking_Filter_v2
Deblocking_Filter_v2
 
Complexity Analysis in Scalable Video Coding
Complexity Analysis in Scalable Video CodingComplexity Analysis in Scalable Video Coding
Complexity Analysis in Scalable Video Coding
 
H263.ppt
H263.pptH263.ppt
H263.ppt
 
High Efficiency Video Codec
High Efficiency Video CodecHigh Efficiency Video Codec
High Efficiency Video Codec
 
[IJET-V1I2P1] Authors :Imran Ullah Khan ,Mohd. Javed Khan ,S.Hasan Saeed ,Nup...
[IJET-V1I2P1] Authors :Imran Ullah Khan ,Mohd. Javed Khan ,S.Hasan Saeed ,Nup...[IJET-V1I2P1] Authors :Imran Ullah Khan ,Mohd. Javed Khan ,S.Hasan Saeed ,Nup...
[IJET-V1I2P1] Authors :Imran Ullah Khan ,Mohd. Javed Khan ,S.Hasan Saeed ,Nup...
 
Video Coding Standard
Video Coding StandardVideo Coding Standard
Video Coding Standard
 
H261
H261H261
H261
 
DIC_video_coding_standards_07
DIC_video_coding_standards_07DIC_video_coding_standards_07
DIC_video_coding_standards_07
 
mpeg4
mpeg4mpeg4
mpeg4
 
HEVC intra coding
HEVC intra codingHEVC intra coding
HEVC intra coding
 
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...
 

Similar to H.264 Standard: 50% Bit Rate Savings for Real-Time Video

/conferences/spr2004/presentations/eubanks/eubanks_mpeg4.ppt
/conferences/spr2004/presentations/eubanks/eubanks_mpeg4.ppt/conferences/spr2004/presentations/eubanks/eubanks_mpeg4.ppt
/conferences/spr2004/presentations/eubanks/eubanks_mpeg4.pptVideoguy
 
H.264 Library
H.264 LibraryH.264 Library
H.264 LibraryVideoguy
 
H 264 in cuda presentation
H 264 in cuda presentationH 264 in cuda presentation
H 264 in cuda presentationashoknaik120
 
martelli.ppt
martelli.pptmartelli.ppt
martelli.pptVideoguy
 
Introduction to Video Compression Techniques - Anurag Jain
Introduction to Video Compression Techniques - Anurag JainIntroduction to Video Compression Techniques - Anurag Jain
Introduction to Video Compression Techniques - Anurag JainVideoguy
 
Applied technology
Applied technologyApplied technology
Applied technologyErica Fressa
 
Aruna Ravi - M.S Thesis
Aruna Ravi - M.S ThesisAruna Ravi - M.S Thesis
Aruna Ravi - M.S ThesisArunaRavi
 
Motion Vector Recovery for Real-time H.264 Video Streams
Motion Vector Recovery for Real-time H.264 Video StreamsMotion Vector Recovery for Real-time H.264 Video Streams
Motion Vector Recovery for Real-time H.264 Video StreamsIDES Editor
 
H.264 video compression standard.
H.264 video compression standard.H.264 video compression standard.
H.264 video compression standard.Axis Communications
 
H264 video compression explained
H264 video compression explainedH264 video compression explained
H264 video compression explainedcnssources
 
Paper id 2120148
Paper id 2120148Paper id 2120148
Paper id 2120148IJRAT
 
10.1.1.184.6612
10.1.1.184.661210.1.1.184.6612
10.1.1.184.6612NITC
 

Similar to H.264 Standard: 50% Bit Rate Savings for Real-Time Video (20)

/conferences/spr2004/presentations/eubanks/eubanks_mpeg4.ppt
/conferences/spr2004/presentations/eubanks/eubanks_mpeg4.ppt/conferences/spr2004/presentations/eubanks/eubanks_mpeg4.ppt
/conferences/spr2004/presentations/eubanks/eubanks_mpeg4.ppt
 
H.264 Library
H.264 LibraryH.264 Library
H.264 Library
 
H 264 in cuda presentation
H 264 in cuda presentationH 264 in cuda presentation
H 264 in cuda presentation
 
martelli.ppt
martelli.pptmartelli.ppt
martelli.ppt
 
Introduction to Video Compression Techniques - Anurag Jain
Introduction to Video Compression Techniques - Anurag JainIntroduction to Video Compression Techniques - Anurag Jain
Introduction to Video Compression Techniques - Anurag Jain
 
Applied technology
Applied technologyApplied technology
Applied technology
 
H.264 vs HEVC
H.264 vs HEVCH.264 vs HEVC
H.264 vs HEVC
 
Aruna Ravi - M.S Thesis
Aruna Ravi - M.S ThesisAruna Ravi - M.S Thesis
Aruna Ravi - M.S Thesis
 
Motion Vector Recovery for Real-time H.264 Video Streams
Motion Vector Recovery for Real-time H.264 Video StreamsMotion Vector Recovery for Real-time H.264 Video Streams
Motion Vector Recovery for Real-time H.264 Video Streams
 
H.264 video compression standard.
H.264 video compression standard.H.264 video compression standard.
H.264 video compression standard.
 
H264 video compression explained
H264 video compression explainedH264 video compression explained
H264 video compression explained
 
proposal
proposalproposal
proposal
 
A04840107
A04840107A04840107
A04840107
 
Paper id 2120148
Paper id 2120148Paper id 2120148
Paper id 2120148
 
video compression2
video compression2video compression2
video compression2
 
video compression2
video compression2video compression2
video compression2
 
video compression2
video compression2video compression2
video compression2
 
HEVC Main Main10 Profile Encoding
HEVC Main Main10 Profile EncodingHEVC Main Main10 Profile Encoding
HEVC Main Main10 Profile Encoding
 
10.1.1.184.6612
10.1.1.184.661210.1.1.184.6612
10.1.1.184.6612
 
Digital TV, IPTV
Digital TV, IPTVDigital TV, IPTV
Digital TV, IPTV
 

More from Videoguy

Energy-Aware Wireless Video Streaming
Energy-Aware Wireless Video StreamingEnergy-Aware Wireless Video Streaming
Energy-Aware Wireless Video StreamingVideoguy
 
Microsoft PowerPoint - WirelessCluster_Pres
Microsoft PowerPoint - WirelessCluster_PresMicrosoft PowerPoint - WirelessCluster_Pres
Microsoft PowerPoint - WirelessCluster_PresVideoguy
 
Proxy Cache Management for Fine-Grained Scalable Video Streaming
Proxy Cache Management for Fine-Grained Scalable Video StreamingProxy Cache Management for Fine-Grained Scalable Video Streaming
Proxy Cache Management for Fine-Grained Scalable Video StreamingVideoguy
 
Free-riding Resilient Video Streaming in Peer-to-Peer Networks
Free-riding Resilient Video Streaming in Peer-to-Peer NetworksFree-riding Resilient Video Streaming in Peer-to-Peer Networks
Free-riding Resilient Video Streaming in Peer-to-Peer NetworksVideoguy
 
Instant video streaming
Instant video streamingInstant video streaming
Instant video streamingVideoguy
 
Video Streaming over Bluetooth: A Survey
Video Streaming over Bluetooth: A SurveyVideo Streaming over Bluetooth: A Survey
Video Streaming over Bluetooth: A SurveyVideoguy
 
Video Streaming
Video StreamingVideo Streaming
Video StreamingVideoguy
 
Reaching a Broader Audience
Reaching a Broader AudienceReaching a Broader Audience
Reaching a Broader AudienceVideoguy
 
Considerations for Creating Streamed Video Content over 3G ...
Considerations for Creating Streamed Video Content over 3G ...Considerations for Creating Streamed Video Content over 3G ...
Considerations for Creating Streamed Video Content over 3G ...Videoguy
 
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMINGADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMING
ADVANCES IN CHANNEL-ADAPTIVE VIDEO STREAMINGVideoguy
 
Impact of FEC Overhead on Scalable Video Streaming
Impact of FEC Overhead on Scalable Video StreamingImpact of FEC Overhead on Scalable Video Streaming
Impact of FEC Overhead on Scalable Video StreamingVideoguy
 
Application Brief
Application BriefApplication Brief
Application BriefVideoguy
 
Video Streaming Services – Stage 1
Video Streaming Services – Stage 1Video Streaming Services – Stage 1
Video Streaming Services – Stage 1Videoguy
 
Streaming Video into Second Life
Streaming Video into Second LifeStreaming Video into Second Life
Streaming Video into Second LifeVideoguy
 
Flash Live Video Streaming Software
Flash Live Video Streaming SoftwareFlash Live Video Streaming Software
Flash Live Video Streaming SoftwareVideoguy
 
Videoconference Streaming Solutions Cookbook
Videoconference Streaming Solutions CookbookVideoconference Streaming Solutions Cookbook
Videoconference Streaming Solutions CookbookVideoguy
 
Streaming Video Formaten
Streaming Video FormatenStreaming Video Formaten
Streaming Video FormatenVideoguy
 
iPhone Live Video Streaming Software
iPhone Live Video Streaming SoftwareiPhone Live Video Streaming Software
iPhone Live Video Streaming SoftwareVideoguy
 
Glow: Video streaming training guide - Firefox
Glow: Video streaming training guide - FirefoxGlow: Video streaming training guide - Firefox
Glow: Video streaming training guide - FirefoxVideoguy
 

More from Videoguy (20)

Energy-Aware Wireless Video Streaming
Energy-Aware Wireless Video StreamingEnergy-Aware Wireless Video Streaming
Energy-Aware Wireless Video Streaming
 
Microsoft PowerPoint - WirelessCluster_Pres
Microsoft PowerPoint - WirelessCluster_PresMicrosoft PowerPoint - WirelessCluster_Pres
Microsoft PowerPoint - WirelessCluster_Pres
 
Proxy Cache Management for Fine-Grained Scalable Video Streaming
Proxy Cache Management for Fine-Grained Scalable Video StreamingProxy Cache Management for Fine-Grained Scalable Video Streaming
Proxy Cache Management for Fine-Grained Scalable Video Streaming
 
Adobe
AdobeAdobe
Adobe
 
Free-riding Resilient Video Streaming in Peer-to-Peer Networks
Free-riding Resilient Video Streaming in Peer-to-Peer NetworksFree-riding Resilient Video Streaming in Peer-to-Peer Networks
Free-riding Resilient Video Streaming in Peer-to-Peer Networks
 
Instant video streaming
Instant video streamingInstant video streaming
Instant video streaming
 
Video Streaming over Bluetooth: A Survey
been the H.262/MPEG-2 standard, which was developed jointly by the two committees. Recently, the ITU-T and the ISO/IEC JTC1 have agreed to join their efforts in the development of the emerging H.264 standard, which was initiated by the ITU-T committee. H.264 (a.k.a.
MPEG-4 Part 10 or MPEG-4 AVC) is being adopted by the two committees because it represents a departure in terms of performance from all existing video coding standards. Figure 1 summarizes the evolution of the ITU-T recommendations and the ISO/IEC MPEG standards.

Figure 1. Progression of the ITU-T recommendations and MPEG standards (timeline, 1984-2004): ITU-T standards H.261, H.263, H.263+, H.263++; joint ITU-T/MPEG standards H.262/MPEG-2 and H.264; MPEG standards MPEG-1 and MPEG-4.
H.264: Overview

The main objective behind the H.264 project was to develop a high-performance video coding standard by adopting a “back to basics” approach with a simple and straightforward design using well-known building blocks. The ITU-T Video Coding Experts Group (VCEG) initiated the work on the H.264 standard in 1997. Towards the end of 2001, witnessing the superiority of the video quality offered by H.264-based software over that achieved by the most optimized existing MPEG-4-based software, ISO/IEC MPEG joined ITU-T VCEG by forming a Joint Video Team (JVT) that took over the H.264 project of the ITU-T. The JVT objective was to create a single video coding standard that would simultaneously result in a new part (i.e., Part 10) of the MPEG-4 family of standards and a new ITU-T recommendation (i.e., H.264). The H.264 development work is an ongoing activity, with the first version of the standard expected to be finalized technically early in the year 2003 and officially before the end of the year 2003.

The emerging H.264 standard has a number of advantages that distinguish it from existing standards, while at the same time sharing common features with them. The following are some of the key advantages of H.264:

1. Up to 50% in bit rate savings: Compared to H.263v2 (H.263+) or MPEG-4 Simple Profile, H.264 permits a reduction in bit rate by up to 50% for a similar degree of encoder optimization at most bit rates.
2. High quality video: H.264 offers consistently good video quality at both high and low bit rates.
3. Error resilience: H.264 provides the tools necessary to deal with packet loss in packet networks and bit errors in error-prone wireless networks.
4. Network friendliness: Through the Network Adaptation Layer, H.264 bit streams can be easily transported over different networks.

The above advantages make H.264 an ideal standard for several applications, such as video conferencing and broadcast video.
Figure 2. Block diagram of the H.264 encoder. Key elements annotated in the diagram: a fixed 4x4 integer transform with no inverse-transform mismatch; quantization step sizes increased at a compounding rate of approximately 12.5%; intra prediction with 9 4x4 modes and 4 16x16 modes for luma plus 4 chroma modes (17 modes in total); motion estimation with seven block sizes and shapes, 1/4-pel accuracy, five prediction modes with B-pictures, and multiple reference picture selection; an in-loop (de-blocking) filter; and entropy coding using either a single universal VLC table with context-adaptive VLC, or context-based adaptive binary arithmetic coding.
H.264: Technical Description

The main objective of the emerging H.264 standard is to provide a means to achieve substantially higher video quality than could be achieved using any of the existing video coding standards. Nonetheless, the underlying approach of H.264 is similar to that adopted in previous standards such as H.263 and MPEG-4, and consists of the following four main stages:

1. Dividing each video frame into blocks of pixels, so that processing of the video frame can be conducted at the block level.
2. Exploiting the spatial redundancies that exist within the video frame by coding some of the original blocks through spatial prediction, transform, quantization and entropy coding (or variable-length coding).
3. Exploiting the temporal dependencies that exist between blocks in successive frames, so that only changes between successive frames need to be encoded. This is accomplished by using motion estimation and compensation. For any given block, a search is performed in one or more previously coded frames to determine the motion vectors that are then used by the encoder and the decoder to predict the subject block.
4. Exploiting any remaining spatial redundancies within the video frame by coding the residual blocks, i.e., the difference between the original blocks and the corresponding predicted blocks, again through transform, quantization and entropy coding.

From a coding point of view, the main differences between H.264 and the other standards are summarized in Figure 2 through an encoder block diagram. On the motion estimation/compensation side, H.264 employs blocks of different sizes and shapes, higher-resolution 1/4-pel motion estimation, multiple reference frame selection and complex multiple bi-directional mode selection.
On the transform side, H.264 uses an integer-based transform that roughly approximates the Discrete Cosine Transform (DCT) used in previous standards, but does not have the mismatch problem in the inverse transform. In H.264, entropy coding can be performed using either a combination of a single Universal Variable Length Codes (UVLC) table with Context Adaptive Variable Length Codes (CAVLC) for the transform coefficients, or using Context-based Adaptive Binary Arithmetic Coding (CABAC).

Organization of the Bit Stream

A given video picture is divided into a number of small blocks referred to as macroblocks. For example, a picture with QCIF resolution (176x144) is divided into 99 16x16 macroblocks, as indicated in Figure 3. A similar macroblock segmentation is used for other frame sizes. The luminance component of the picture is sampled at these frame resolutions, while the chrominance components, Cb and Cr, are down-sampled by two in the horizontal and vertical directions. In addition, a picture may be divided into an integer number of “slices”, which are valuable for resynchronization should some data be lost.
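As a quick illustrative sketch (the helper name is hypothetical, not part of the standard), the macroblock grid and chroma sub-sampling described above work out as follows:

```python
def macroblock_grid(width, height, mb_size=16):
    """Number of macroblocks covering a frame whose dimensions
    are exact multiples of the macroblock size."""
    assert width % mb_size == 0 and height % mb_size == 0
    cols, rows = width // mb_size, height // mb_size
    return cols, rows, cols * rows

# QCIF luminance resolution: 176x144
cols, rows, total = macroblock_grid(176, 144)
print(cols, rows, total)      # 11 columns x 9 rows = 99 macroblocks

# Chrominance (Cb, Cr) is down-sampled by two in each direction
print(176 // 2, 144 // 2)     # 88x72 chroma samples per component
```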
Figure 3. Subdivision of a QCIF picture into 16x16 macroblocks (11 macroblocks across, 9 down).

Intra Prediction and Coding

Intra coding refers to the case where only spatial redundancies within a video picture are exploited. The resulting frame is referred to as an I-picture. I-pictures are typically encoded by directly applying the transform to the different macroblocks in the frame. Consequently, encoded I-pictures are large in size, since a large amount of information is usually present in the frame, and no temporal information is used as part of the encoding process.

In order to increase the efficiency of the intra coding process in H.264, spatial correlation between adjacent macroblocks in a given frame is exploited. The idea is based on the observation that adjacent macroblocks tend to have similar properties. Therefore, as a first step in the encoding process for a given macroblock, one may predict the macroblock of interest from the surrounding macroblocks (typically the ones located on top and to the left, since those macroblocks would have already been encoded). The difference between the actual macroblock and its prediction is then coded, which results in fewer bits to represent the macroblock of interest than when applying the transform directly to the macroblock itself.

Figure 4. Intra prediction modes for 4x4 luminance blocks.

In order to perform the intra prediction mentioned above, H.264 offers nine modes for prediction of 4x4 luminance blocks, including DC prediction (Mode 2) and eight directional modes, labelled 0, 1, 3, 4, 5, 6, 7 and 8 in Figure 4. This process is illustrated in Figure 4, in which pixels A to M from neighbouring blocks have already been encoded and may be used for prediction. For example, if Mode 0 (Vertical prediction) is selected, then the values of the pixels a to p are assigned as follows:

♦ a, e, i and m are equal to A,
♦ b, f, j and n are equal to B,
♦ c, g, k and o are equal to C, and
♦ d, h, l and p are equal to D.

In the case where Mode 3 (Diagonal_Down_Left prediction) is chosen, the values of a to p are given as follows:

♦ a is equal to (A+2B+C+2)/4,
♦ b, e are equal to (B+2C+D+2)/4,
♦ c, f, i are equal to (C+2D+E+2)/4,
♦ d, g, j, m are equal to (D+2E+F+2)/4,
♦ h, k, n are equal to (E+2F+G+2)/4,
♦ l, o are equal to (F+2G+H+2)/4, and
♦ p is equal to (G+3H+2)/4.

For regions with less spatial detail (i.e., flat regions), H.264 supports 16x16 intra coding, in which one of four prediction modes (DC, Vertical, Horizontal and Planar) is chosen for the prediction of the entire luminance component of the macroblock. In addition, H.264 supports intra prediction for the 8x8 chrominance blocks, also using four prediction modes (DC, Vertical, Horizontal and Planar). Finally, the prediction mode for each block is efficiently coded by assigning shorter symbols to more likely modes, where the probability of each mode is determined based on the modes used for coding the surrounding blocks.

Inter Prediction and Coding

H.264 derives most of its coding efficiency advantage from motion estimation. Inter prediction and coding is based on using motion estimation and compensation to take advantage of the temporal redundancies that exist between successive frames, hence providing very efficient coding of video sequences. When the selected reference frame(s) for motion estimation is a previously encoded frame(s), the frame to be encoded is referred to as a P-picture. When both a previously encoded frame and a future frame are chosen as reference frames, the frame to be encoded is referred to as a B-picture. Motion estimation in H.264 supports most of the key features adopted in earlier video standards, but its efficiency is improved through added flexibility and functionality.
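The vertical and diagonal-down-left 4x4 intra prediction rules listed above can be sketched in code; the function names are illustrative, and only two of the nine modes are shown. Pixels a..p are laid out in raster order, so the diagonal index of a pixel at row r, column c is r+c:

```python
def intra4x4_vertical(top):
    """Mode 0: each column is filled with the pixel above it (A..D)."""
    A, B, C, D = top
    return [[A, B, C, D] for _ in range(4)]

def intra4x4_diag_down_left(t):
    """Mode 3: rounded 3-tap averages of the top/top-right pixels
    t = [A, B, C, D, E, F, G, H]."""
    pred = [[0] * 4 for _ in range(4)]
    for r in range(4):
        for c in range(4):
            i = r + c                        # diagonal index, 0..6
            if i < 6:
                pred[r][c] = (t[i] + 2 * t[i + 1] + t[i + 2] + 2) // 4
            else:                            # bottom-right pixel p
                pred[r][c] = (t[6] + 3 * t[7] + 2) // 4
    return pred

print(intra4x4_vertical([1, 2, 3, 4]))
print(intra4x4_diag_down_left([10, 20, 30, 40, 50, 60, 70, 80]))
```

Note how the integer arithmetic (`// 4` after adding 2) implements the rounded division in the formulas above, so encoder and decoder compute bit-identical predictions.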
In addition to supporting P-pictures (with single and multiple reference frames) and B-pictures, H.264 supports a new inter-stream transitional picture called an SP-picture. The inclusion of SP-pictures in a bit stream enables efficient switching between bit streams with similar content encoded at different bit rates, as well as random access and fast playback modes.
Figure 5. Different modes of dividing a macroblock for motion estimation in H.264: Mode 1, one 16x16 block (one motion vector); Mode 2, two 8x16 blocks (two motion vectors); Mode 3, two 16x8 blocks (two motion vectors); Mode 4, four 8x8 blocks (four motion vectors); Mode 5, eight 4x8 blocks (eight motion vectors); Mode 6, eight 8x4 blocks (eight motion vectors); Mode 7, sixteen 4x4 blocks (sixteen motion vectors).

Block sizes

Motion compensation for each 16x16 macroblock can be performed using a number of different block sizes and shapes, illustrated in Figure 5. Individual motion vectors can be transmitted for blocks as small as 4x4, so up to 16 motion vectors may be transmitted for a single macroblock. Block sizes of 16x8, 8x16, 8x8, 8x4 and 4x8 are also supported, as shown. Using seven different block sizes and shapes can translate into bit rate savings of more than 15% as compared to using only a 16x16 block size. The availability of smaller motion compensation blocks improves prediction in general; in particular, the small blocks improve the ability of the model to handle fine motion detail and result in better subjective viewing quality because they do not produce large blocking artifacts. Moreover, through the recently adopted tree structure segmentation method, it is possible to have a combination of 4x8, 8x4 or 4x4 sub-blocks within an 8x8 sub-block. Figure 6 shows an example of such a configuration for a 16x16 macroblock.

Figure 6. A configuration of sub-blocks within a macroblock based on the tree structure segmentation method in H.264 (four 4x4 sub-blocks, one 8x8 sub-block, two 4x8 sub-blocks and two 8x4 sub-blocks).
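The seven partition modes of Figure 5 can be tabulated in a short sketch (illustrative Python, not from any H.264 reference implementation); each partition carries one motion vector, so the count per macroblock follows directly from the block dimensions:

```python
# (block width, block height) for the seven macroblock partition modes
MODES = {
    1: (16, 16),
    2: (8, 16),
    3: (16, 8),
    4: (8, 8),
    5: (4, 8),
    6: (8, 4),
    7: (4, 4),
}

def motion_vectors_per_macroblock(mode):
    """One motion vector per block; a macroblock is 16x16 pixels."""
    w, h = MODES[mode]
    return (16 // w) * (16 // h)

for mode in MODES:
    print(mode, motion_vectors_per_macroblock(mode))
# Mode 1 -> 1 vector, ..., Mode 7 -> 16 vectors
```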
Motion estimation accuracy

The prediction capability of the motion compensation algorithm in H.264 is further improved by allowing motion vectors to be determined with higher levels of spatial accuracy than in existing standards. Quarter-pixel accurate motion compensation is the lowest-accuracy form of motion compensation in H.264 (in contrast with prior standards based primarily on half-pel accuracy, with quarter-pel accuracy only available in the newest version of MPEG-4). Using ¼-pel spatial accuracy can yield as much as 20% in bit rate savings as compared to using integer-pel spatial accuracy.

Multiple reference picture selection

The H.264 standard offers the option of having multiple reference frames in inter picture coding, resulting in better subjective video quality and more efficient coding of the video frame under consideration. Using 5 reference frames for prediction can yield 5-10% in bit rate savings as compared to using only one reference frame. Moreover, using multiple reference frames helps make the H.264 bit stream more error resilient. However, from an implementation point of view, there would be additional processing delays and higher memory requirements at both the encoder and decoder.

De-blocking (Loop) Filter

H.264 specifies the use of an adaptive de-blocking filter that operates on the horizontal and vertical block edges within the prediction loop in order to remove artifacts caused by block prediction errors. The filtering is generally based on 4x4 block boundaries, in which two pixels on either side of the boundary may be updated using a different filter. The rules for applying the de-blocking filter are intricate and quite complex; however, its use is optional for each slice (loosely defined as an integer number of macroblocks). The de-blocking filter yields a substantial improvement in subjective quality, which often more than justifies the increase in complexity.
Integer Transform

The information contained in a prediction error block resulting from either intra prediction or inter prediction is then re-expressed in the form of transform coefficients. H.264 is unique in that it employs a purely integer spatial transform (a rough approximation of the DCT), which is primarily 4x4 in shape, as opposed to the usual floating-point 8x8 DCT specified with rounding-error tolerances used in earlier standards. The small shape helps reduce blocking and ringing artifacts, while the precise integer specification eliminates any mismatch issues between the encoder and decoder in the inverse transform.

Quantization and Transform Coefficient Scanning

The quantization step is where a significant portion of data compression takes place. In H.264, the transform coefficients are quantized using scalar quantization with no widened dead-zone. Fifty-two different quantization step sizes can be chosen on a macroblock basis; this is different from prior standards (H.263 supports thirty-one, for example). Moreover, in H.264 the step sizes are increased at a compounding
rate of approximately 12.5%, rather than being increased by a constant increment. The fidelity of the chrominance components is improved by using finer quantization step sizes than those used for the luminance coefficients, particularly when the luminance coefficients are coarsely quantized.

Figure 7. Scan pattern for frame coding in H.264. The scan index assigned to each position of the 4x4 block is:

0  1  5  6
2  4  7 12
3  8 11 13
9 10 14 15

The quantized transform coefficients correspond to different frequencies, with the coefficient at the top left-hand corner in Figure 7 representing the DC value, and the rest of the coefficients corresponding to different non-zero frequencies. The next step in the encoding process is to arrange the quantized coefficients in an array, starting with the DC coefficient. A single coefficient-scanning pattern is available in H.264 (Figure 7) for frame coding, and another one is being added for field coding. The zigzag scan illustrated in Figure 7 is used in all frame-coding cases, and it is identical to the conventional scan used in earlier video coding standards. The zigzag scan arranges the coefficients in ascending order of the corresponding frequencies.

Entropy Coding

The last step in the video coding process is entropy coding. Entropy coding is based on assigning shorter codewords to symbols with higher probabilities of occurrence, and longer codewords to symbols that occur less frequently. Some of the parameters to be entropy coded include the transform coefficients for the residual data, motion vectors and other encoder information. Two types of entropy coding have been adopted. The first method is a combination of Universal Variable Length Coding (UVLC) and Context Adaptive Variable Length Coding (CAVLC). The second method is Context-Based Adaptive Binary Arithmetic Coding (CABAC).
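The zigzag order of Figure 7 and the compounding step-size rule can be illustrated with a small sketch (hypothetical helper names; note that a ~12.5% per-step growth implies the step size roughly doubles every six steps):

```python
# Scan indices from Figure 7: entry k is the scan position of the
# coefficient at raster position k within the 4x4 block.
ZIGZAG = [0, 1, 5, 6,
          2, 4, 7, 12,
          3, 8, 11, 13,
          9, 10, 14, 15]

def zigzag_scan(block):
    """Reorder a flattened 4x4 coefficient block into zigzag scan order,
    DC coefficient first."""
    out = [0] * 16
    for raster_pos, scan_pos in enumerate(ZIGZAG):
        out[scan_pos] = block[raster_pos]
    return out

# Compounding growth of ~12.5% per step:
print(1.125 ** 6)   # ~2.03, i.e. the step size about doubles every 6 steps
```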
UVLC/CAVLC

In some video coding standards, symbols and the associated codewords are organized in look-up tables, referred to as variable length coding (VLC) tables, which are stored at both the encoder and decoder. In H.263, a number of VLC tables are used, depending on the type of data under consideration (e.g., transform coefficients, motion vectors). H.264 offers a single Universal VLC (UVLC) table that is used in entropy coding of all symbols in the encoder except for the transform coefficients. Although the use of a single UVLC table is simple, it has a major disadvantage: the single table is usually derived using a static probability distribution model, which ignores the correlations between the encoder symbols.

In H.264, the transform coefficients are coded using Context Adaptive Variable Length Coding (CAVLC). CAVLC is designed to take advantage of several characteristics of quantized 4x4 blocks. First, non-zero coefficients at the end of the zigzag scan are often equal to +/-1; CAVLC encodes the number of these coefficients (“trailing 1s”) in a compact way. Second, CAVLC employs run-level coding to efficiently represent the strings of zeros in a quantized 4x4 block. Moreover, the numbers of non-zero coefficients in
neighbouring blocks are usually correlated. Thus, the number of non-zero coefficients is encoded using a look-up table that depends on the numbers of non-zero coefficients in neighbouring blocks. Finally, the magnitude (level) of the non-zero coefficients tends to be larger near the DC coefficient and smaller around the high-frequency coefficients. CAVLC takes advantage of this by making the choice of the VLC look-up table for the level adaptive, in a way that depends on the recently coded levels.

H.264 Profiles

To this date, three major profiles have been agreed upon: Baseline, mainly for video conferencing and telephony/mobile applications; Main, primarily for broadcast video applications; and X, mainly for streaming and mobile video applications. Figure 8 shows the common features between the Baseline and Main profiles as well as the additional features specific to each. The Baseline profile allows the use of Arbitrary Slice Ordering (ASO) to reduce the latency in real-time communication applications, as well as the use of Flexible Macroblock Ordering (FMO) and redundant slices to improve error resilience in the coded bit stream. The Main profile enables an additional reduction in bandwidth over the Baseline profile mainly through sophisticated bi-directional prediction (B-pictures), Context Adaptive Binary Arithmetic Coding (CABAC) and weighted prediction.

Figure 8. Features of the Baseline and Main profiles.

Baseline Profile: Specific Features

Arbitrary Slice Ordering

Arbitrary slice ordering allows the decoder to process slices in an arbitrary order as they arrive at the decoder. Hence, the decoder does not have to wait for all the slices to be properly arranged before it starts processing them. This reduces the processing delay at the decoder, resulting in less overall latency in real-time video communication applications.

Flexible Macroblock Ordering (FMO)

Macroblocks in a given frame are usually coded in a raster scan order.
With FMO, macroblocks are coded according to a macroblock allocation map that groups, within a given slice, macroblocks from spatially
different locations in the frame. Such an arrangement enhances error resilience in the coded bit stream, since it reduces the interdependency that would otherwise exist when coding data within adjacent macroblocks in a given frame. In the case of packet loss, the loss is scattered throughout the picture and can be easily concealed.

Redundant slices

Redundant slices allow the transmission of duplicate slices over error-prone networks to increase the likelihood of the delivery of a slice that is free of errors.

Main Profile: Specific Features

B-pictures

B-pictures provide a compression advantage over P-pictures by allowing a larger number of prediction modes for each macroblock. Here, the prediction is formed by averaging the sample values in two reference blocks, generally, but not necessarily, using one reference block that is forward in time and one that is backward in time with respect to the current picture. In addition, "Direct Mode" prediction is supported, in which the motion vectors for the macroblock are interpolated based on the motion vectors used for coding the co-located macroblock in a nearby reference frame; thus, no motion information is transmitted. By allowing so many prediction modes, the prediction accuracy is improved, often reducing the bit rate by 5-10%.

Weighted prediction

Weighted prediction allows the modification of motion compensated sample intensities using a global multiplier and a global offset. The multiplier and offset may be explicitly sent or implicitly inferred. The use of the multiplier and the offset aims at reducing the prediction residuals due, for example, to global changes in brightness, and consequently leads to enhanced coding efficiency for sequences with fades, lighting changes and other special effects.
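The weighted-prediction idea can be illustrated with a minimal sketch (hypothetical function and values; the actual standard uses fixed-point weights and rounding rules not modeled here). A fade that halves intensities is predicted almost exactly by a 0.5 multiplier, leaving a near-zero residual:

```python
def weighted_prediction(ref_samples, weight, offset):
    """Scale motion-compensated reference samples by a global multiplier,
    add a global offset, and clip to the 8-bit sample range."""
    return [max(0, min(255, round(weight * s + offset))) for s in ref_samples]

reference = [100, 120, 140, 160]   # samples from the reference block
faded     = [50, 60, 70, 80]       # same content after a fade-to-black
pred = weighted_prediction(reference, 0.5, 0)
residual = [f - p for f, p in zip(faded, pred)]
print(pred)        # matches the faded block
print(residual)    # all zeros: nothing left to code but the weights
```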
CABAC

Context Adaptive Binary Arithmetic Coding (CABAC) makes use of a probability model at both the encoder and decoder for all the syntax elements (transform coefficients, motion vectors, etc.). To increase the coding efficiency of arithmetic coding, the underlying probability model is adapted to the changing statistics within a video frame through a process called context modeling. Context modeling provides estimates of the conditional probabilities of the coding symbols. Utilizing suitable context models, the inter-symbol redundancy can be exploited by switching between different probability models according to already-coded symbols in the neighborhood of the current symbol to encode. The context modeling is responsible for most of CABAC's savings in bit rate, which amount to approximately 10% over the VLC entropy coding method (UVLC/CAVLC).

Example Applications: Video Conferencing

Video conferencing systems are being increasingly used around the world as tools that enable cost-effective and efficient communication while reducing expenses and enhancing productivity. Deployed mostly within
the enterprise setting, video conferencing offers collaboration tools, in addition to the exchange of audio and video information, that make it an attractive alternative to business travel.

Video Conferencing Industry: Challenges

The use of video as part of videoconferencing applications involves a number of challenges from both the bandwidth and quality points of view. These challenges are discussed next.

• Efficient bandwidth utilization: Most of the bandwidth allocated to a typical video conferencing session is usually consumed by the video component of the transmitted data. Any significant reduction in the video bandwidth required to maintain a desired video quality could bring a number of benefits, such as increasing the number of participants during a conferencing session and increasing the amount of data that could be exchanged during the session. Moreover, the fact that video conferencing applications run at low rates makes it even more important to use the most effective video compression tools in order to maintain good video quality at such rates.

• Low processing delay: It is important that the delay associated with the processing and transmission of the data be kept to a minimum in order to maintain good quality in the decoded video. A large processing delay at the encoder would lead to the perception of non-smooth motion in the reconstructed video sequences. The delay (a.k.a. latency) is the sum of the encoding, network and decoding delays. In real-time interactive applications, the user will notice some objectionable delay if the round-trip time exceeds 250 ms. In order to minimize latency, it is essential to have very small processing delays at the encoder and decoder.
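As a back-of-the-envelope illustration of the 250 ms round-trip threshold mentioned above (the component delays below are purely hypothetical, not measurements of any system):

```python
def round_trip_latency_ms(encode_ms, network_one_way_ms, decode_ms):
    """One-way latency is encode + network + decode; a round trip
    traverses that path twice."""
    return 2 * (encode_ms + network_one_way_ms + decode_ms)

# Hypothetical budget: 30 ms encode, 60 ms one-way network, 20 ms decode
rtt = round_trip_latency_ms(30, 60, 20)
print(rtt, rtt <= 250)   # 220 True: within the objectionable-delay threshold
```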
• Better video quality: Video quality in conferencing applications can be affected negatively by a number of factors, including noise and brightness changes in the source video, the presence of trailing artifacts in the reconstructed video, and network packet loss.

  - Pre-processing: The presence of noise in the video frames and brightness changes between successive video frames can substantially reduce the efficiency of the inter-coding process. Consequently, it is important to incorporate pre-processing tools in order to minimize the effects of both the noise and the changes in brightness on the video quality.

  - Avoidance of trailing artifacts: Trailing artifacts usually appear as a series of very visible dots that trail a moving object. These artifacts arise because the encoder is forced to discard a lot of useful data (i.e., employ a large quantizer) when coding video data at low rates. Consequently, one needs to detect and eliminate, or at least reduce, such artifacts.

  - Error resilience: The compressed bit stream generated by the video encoder is first segmented into fixed- or variable-length packets, multiplexed with audio and other data types, and then transported over the network. Some of the transmitted packets can get lost or corrupted during transport, due, for example, to congestion in the network or impairment of the physical channel, leading to distortion in the audio and video data. Therefore, it is often necessary to employ error resilience tools in the encoder to guarantee a minimal level of resilience to transmission errors.
UBLive-264BP: A Solution for Video Conferencing Applications

UB Video's UBLive-264BP is a complete video processing solution that has at its core an optimized Baseline H.264 encoder/decoder. UBLive-264BP has a number of features that make it suitable for video conferencing applications, and it was designed to address all of the major video processing issues that plague such applications, as explained in the following.

• Up to 50% in bit rate savings: Most of the video conferencing systems in existence today are based on video coding solutions that make use of either the H.261 or H.263 video coding standards. Compared to H.263-based solutions, UBLive-264BP yields an average reduction in bit rate of up to 50% for a similar degree of encoder optimization at most bit rates and for the same subjective video quality. The savings in bit rate are due in part to the flexibility provided by H.264, but also to the collection of complexity-reduction algorithms UB Video has researched and developed over the past few years.

• Low processing delay: UBLive-264BP features very low processing latency, a key requirement in video conferencing applications. UBLive-264BP guarantees excellent video quality even for single-pass encoding, thereby reducing processing latency.

• Better video quality: UBLive-264BP yields high video quality through the effective use of the H.264 Baseline Profile features, the use of pre-processing methods such as noise filtering, brightness compensation in the source video and trailing artifact avoidance/reduction, as well as error resilience methods such as Flexible Macroblock Ordering (FMO).

Example Applications: Broadcast Video

Broadcast video applications are currently undergoing a transition in which more and more content is produced and diffused in digital format as opposed to the traditional analog format. Service providers in the broadcast industry face unprecedented competition for viewers.
In a landscape that was previously dominated by the cable industry, satellite and Digital Subscriber Line (DSL) companies are now also competing for the same customers. The competitive landscape is driving service providers to develop ways to differentiate their services and to adopt new solutions for the production and delivery of digital video.

Broadcast Video Industry: Challenges

Digital broadcast video service providers need to overcome significant technical hurdles in order to differentiate their services in the market. From a video coding perspective, there are three major challenges that the industry has to deal with, namely making efficient use of the available bandwidth, ensuring high-reproduction video quality, and providing a cost-effective embedded decoder solution. Another challenge is that the H.264 Main Profile standard, while holding the promise of improved bandwidth-quality tradeoffs, is still not final, and may in fact change quite significantly over the next few months. Each of these four challenges is discussed in more detail in the following.

• Efficient bandwidth utilization: A key differentiating factor for service providers is the number of channels that can be accommodated over a given transmission bandwidth, which in turn affects the
amount of generated revenues. For the different service providers, whether based on cable, satellite, or DSL services, more efficient use of the transmission bandwidth could translate into making more channels available to customers, or into providing additional services, hence enhancing the service providers' offerings. For some broadcast video applications, such as video over DSL, the available bandwidth is already very limited, making bandwidth savings even more critical. However, any savings in the bandwidth allocated to broadcast digital video must not come at the expense of video quality, as customers expect the service to provide the same broadcast video quality they are used to, or better. Therefore, the only available solution is to use the most effective compression tools.

• Broadcast Video Quality: Viewers accustomed to the superior quality produced by DVD players will not accept broadcast video that does not measure up to that level of quality. Providing broadcast video quality at a limited channel bandwidth (e.g., 1.5 Mbps for DSL channels) at Standard-Definition (SD) resolution (720x480) can therefore be quite challenging, particularly when the video content is characterized by action sequences with significant amounts of motion, scene changes, fades and dissolves. Moreover, for most broadcast video applications, the video quality can be significantly degraded by the presence of spatial and temporal noise, often yielding very objectionable artifacts such as contouring and blocking, especially in bright areas. Pre-processing the source video to remove noise is therefore critical to ensuring a high level of video quality.

• Decoder Complexity: The H.264 standard is significantly more complex than any of the previous video coding standards. Motion compensation, for instance, makes use of 7 block sizes from 16x16 down to 4x4.
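As a rough illustration of why the partition structure matters for the decoder, the sketch below (an illustration, not UB Video code) enumerates the seven inter-prediction block sizes and the worst-case number of motion-compensated partitions in a single 16x16 macroblock:

```python
# The 7 inter-prediction block sizes allowed by H.264 motion compensation.
BLOCK_SIZES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]

def partitions_per_macroblock(w, h):
    """Number of (w x h) partitions that tile one 16x16 macroblock."""
    return (16 // w) * (16 // h)

# Worst case: a macroblock split entirely into 4x4 blocks carries 16
# separate motion vectors (and twice that with bi-directional prediction).
worst_case = max(partitions_per_macroblock(w, h) for w, h in BLOCK_SIZES)
print(worst_case)  # 16
```

Each of those partitions can point at a different reference area, which is what drives up the memory-transfer load described next.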
Consequently, the H.264 decoder is expected to be significantly more demanding in terms of computation and memory requirements. Any decoder must be able to handle all "legal" bit streams (i.e., the worst-case scenario), making the decoder implementation even more complicated. Moreover, developing an embedded decoder implementation where internal memory is limited is a challenging task. For example, when performing motion compensation on a macroblock coded using bi-directional prediction, the decoder must refer to multiple reference frames in both directions. Transferring the appropriate macroblocks for motion compensation can slow down the decoder significantly, as memory transfers may become too demanding in terms of cycles.

• H.264 Main Profile Status: The H.264 standard is in the last stage of development, and changes to the standard are still possible, particularly in features included only in the Main Profile. As a result, broadcast infrastructure companies, which normally look for hardware solutions, are currently seeking solutions that are fully programmable, in order to be able to adapt quickly to the evolving standard.

UBLive-264MP: A Solution for Broadcast Video Applications

UB Video's UBLive-264MP consists of three components: a pre-processing tool, an encoder and a decoder. The solution is based on the H.264 Main Profile, which has a number of features that make it suitable for broadcast video applications. UBLive-264MP was designed to address all of the major video processing challenges (discussed above) facing the broadcast video industry. In the following, the three components of UBLive-264MP and how they address these challenges are discussed.
• Pre-processing: One of the challenges in broadcast video is the presence of noise that may lead to objectionable artifacts. UBLive-264MP applies motion-compensated filtering, which reduces noise in the temporal direction while taking into account motion across successive video frames. This pre-processing not only performs well in reducing the occurrence of artifacts, but also results in bit savings, leading to better overall video quality.

• Encoder: Existing broadcast systems are generally based on video coding solutions that use the MPEG-2 coding standard. The emerging H.264 standard is poised to become the enabling technology that addresses the bandwidth utilization problem. Compared to MPEG-2, H.264 promises an average reduction in bit rate of more than 50% for a similar degree of encoder optimization, at most bit rates and for the same subjective video quality. The savings in bit rate result from a number of techniques, including sophisticated intra prediction, multiple block sizes and multiple reference frames in motion estimation, more advanced B-picture coding, and highly efficient Context-Based Adaptive Binary Arithmetic Coding (CABAC). UBLive-264MP, the result of a three-year research and development effort involving various algorithmic and subjective optimizations, achieves good bandwidth-quality tradeoffs. To implement the encoder efficiently, UB Video has developed rate-distortion optimized motion estimation and mode decision algorithms that provide excellent quality-complexity tradeoffs. To further enhance video quality, UB Video has also developed new algorithms that take into account the perceptual limitations of the human visual system.
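Rate-distortion optimized mode decision can be pictured as choosing, for each macroblock, the coding mode that minimizes a Lagrangian cost J = D + λ·R. The sketch below illustrates that selection rule only; the mode names and the distortion/rate numbers are hypothetical and not taken from UBLive-264MP:

```python
# Lagrangian mode decision: pick the mode minimizing J = D + lambda * R,
# where D is distortion and R is the bit cost. Candidate values are made up.
def best_mode(candidates, lam):
    """candidates: list of (mode_name, distortion, rate_in_bits)."""
    return min(candidates, key=lambda m: m[1] + lam * m[2])[0]

candidates = [
    ("INTRA_16x16", 900.0, 120),  # cheap in bits, high distortion
    ("INTER_16x16", 400.0, 180),  # balanced tradeoff
    ("INTER_4x4",   250.0, 520),  # low distortion, expensive in bits
]
print(best_mode(candidates, lam=1.0))  # INTER_16x16 at this lambda
```

Raising λ (low bit rates) pushes the decision toward cheaper modes; lowering it (high bit rates) favors the low-distortion, bit-hungry partitions.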
• Decoder on the TMS320DM642 Digital Media Processor: Since the H.264 standard is still under development and is expected to stabilize in 2003, broadcast infrastructure companies are looking for a programmable chip solution powerful enough to handle the processing load of such a complex decoder. UB Video's highly optimized H.264 Main Profile decoder software running on Texas Instruments' TMS320DM642 Digital Media Processor represents a very compelling solution: it is currently the only embedded real-time broadcast-quality H.264 Main Profile decoder for SD resolution, an achievement made possible by algorithmic optimizations that take full advantage of the DSP's capabilities. This hardware/software solution offers an attractive and viable option for a new generation of set-top boxes, featuring enhanced flexibility through the programmability of the DM642 digital media processor. Adopting this solution will help service providers roll out H.264-compliant products in the very near future.

UBLive-264-DM642 Solutions: Demonstration Software

UB Video has demonstration software for the UBLive-264-DM642 solutions (namely UBLive-264BP and UBLive-264MP) that runs on the TMS320DM642 Network Video Developer's Kit (NVDK) board from Texas Instruments.

TMS320DM642 Digital Media Platform

The TMS320DM642™ digital media processor (DM642) DSP core delivers the highest performance at the lowest power consumption of any DSP available on the market to date. With its 600 MHz processing power today, and a planned next-generation 1.1 GHz version, this DSP is well suited to meeting the complexity and computational requirements of H.264 and to delivering high-quality full-screen video for most broadband
video applications. The DSP core has 64 general-purpose 32-bit registers and eight highly independent functional units: two multipliers and six arithmetic logic units (ALUs) with VelociTI.2™ extensions. The VelociTI.2™ extensions add new instructions to the eight functional units that accelerate performance in video and imaging applications and extend the parallelism of the VelociTI™ architecture. The DM642 uses a two-level cache-based architecture and has a powerful and diverse set of peripherals. The Level 1 program cache (L1P) is a 128-Kbit direct-mapped cache, and the Level 1 data cache (L1D) is a 128-Kbit 2-way set-associative cache. The Level 2 memory/cache (L2) consists of a 2-Mbit memory space shared between program and data; L2 memory can be configured as mapped memory, cache, or a combination of the two.

Figure 9. Block diagram of the UBLive-264BP demo.

UBLive-264BP: Demonstration Software

A block diagram of the demo is shown in Figure 9. The NVDK is the platform on which real-time API encode/decode function calls are made and algorithms are tested for compliance. The UB Video demonstration software works as follows: a camera captures frames at a rate of 30 frames per second and a resolution of 640x480. If the SIF resolution mode is selected, frames are scaled down to SIF resolution. The resulting frame then passes through a pre-processing stage to reduce noise in the source frame and to remove any significant brightness changes between successive video frames. The output frame is then passed to the encoder API, and then to the decoder API. The decoded frame is converted from YUV to RGB and displayed on the screen alongside the original (possibly scaled) video frame. Please see [1] for more details on the UBLive-264BP demonstration software.
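The final YUV-to-RGB step of the demo pipeline can be sketched with one common BT.601 full-range conversion; this illustrates the color-space arithmetic only and is not the demo's actual implementation:

```python
def yuv_to_rgb(y, u, v):
    """Convert one full-range BT.601 YUV (YCbCr) pixel to RGB.

    One common variant of the conversion; real decoders typically use
    fixed-point equivalents of these constants.
    """
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    # Clamp to the displayable 8-bit range and round.
    clamp = lambda x: max(0, min(255, round(x)))
    return clamp(r), clamp(g), clamp(b)

print(yuv_to_rgb(128, 128, 128))  # neutral chroma -> mid-gray (128, 128, 128)
```

In the embedded decoder this per-pixel math is vectorized across the DSP's functional units rather than computed in floating point.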
The UBLive-264BP software illustrates that UB Video's Baseline H.264 solution allows a 35-50% bandwidth reduction relative to H.263-based solutions. While the additional complexity is perceived to be high, the software also demonstrates that UBLive-264BP can simultaneously encode and decode 640x480 interlaced video sequences on a single 600 MHz DM642 chip, potentially with enough headroom for audio and other video conferencing system components.

UBLive-264MP: Demonstration Software

A block diagram of the demo is shown in Figure 10. The UB Video demonstration software works as follows: a bit stream is parsed by the decoder, which outputs YUV 4:2:0 frames. The frame resolution is 720x360 for Y and 360x180 for the chrominance components. The vertical resolution (360) is due to the
wide-screen DVD content that is used. Once a frame is decoded, it goes through a post-processing stage as shown. The chrominance lines are vertically duplicated, yielding YUV 4:2:2; a transformation to packed format is then performed, followed by a 3:2 pull-down operation to up-sample the frame rate to 30 frames per second. Each frame is delivered to the board's display hardware for output to a television. The UBLive-264MP decoder demo software has been optimized to decode SD-resolution video at 30 fps, illustrating that UBLive-264MP on the TMS320DM642 Digital Media Processor provides a viable solution for digital video set-top applications. Moreover, the demo illustrates the capability of the UBLive-264MP encoder software, which yields, at a bit rate of 1.5 Mbps, the same broadcast-quality video that MPEG-2 would provide at a rate of 3 Mbps. Please see [2] for more details on the UBLive-264MP solution.

Figure 10. Block diagram of the UBLive-264MP demo.

References

[1] UBLive-264BP: An H.264-Based Solution on the TMS320DM642 for Video Conferencing Applications. UB Video Inc. www.ubvideo.com.

[2] UBLive-264MP: An H.264-Based Solution on the TMS320DM642 for Video Broadcast Applications. UB Video Inc. www.ubvideo.com.