Multimedia Technology

- Overview
  - Introduction
  - Chapter 1: Background of compression techniques
  - Chapter 2: Multimedia technologies
    - JPEG
    - MPEG-1/MPEG-2 Audio & Video
    - MPEG-4
    - MPEG-7 (brief introduction)
    - HDTV (brief introduction)
    - H.261/H.263 (brief introduction)
    - Model-based coding (MBC) (brief introduction)
  - Chapter 3: Some real-world systems
    - CATV systems
    - DVB systems
  - Chapter 4: Multimedia networks


Introduction

- The importance of multimedia technologies: multimedia is everywhere!
  - On PCs:
    - RealPlayer, QuickTime, Windows Media.
    - Music and video are freely available on the Internet (mp2, mp3, mp4, asf, mpeg, mov, ra, ram, mid, DivX, etc.).
    - Video/audio conferencing.
    - Webcast / streaming applications.
    - Distance learning (tele-education).
    - Tele-medicine.
    - Tele-anything (use your imagination!).
  - On TVs and other home electronic devices:
    - DVB-T/DVB-C/DVB-S (Digital Video Broadcasting – Terrestrial/Cable/Satellite) → demonstrates MPEG-2's superior quality over traditional analog TV.
    - Interactive TV → Internet applications (mail, web, e-commerce) on a TV, with no need to wait for a PC to start up and shut down.
    - CD/VCD/DVD/MP3 players.
  - Also appearing on handheld devices (3G mobile phones, wireless PDAs).




Introduction (2)

- Multimedia networks
  - The Internet was designed in the 1960s for low-speed inter-networks carrying simple textual applications → high delay, high jitter.
  - Multimedia applications therefore require drastic modifications of the Internet infrastructure.
  - Many frameworks have been investigated and deployed to support the next-generation multimedia Internet (e.g. IntServ, DiffServ).
  - In the future, all TVs (and PCs) will be connected to the Internet and freely tuned to any of millions of broadcast stations all over the world.
  - At present, multimedia networks run over ATM (almost obsolete), IPv4, and in the future IPv6 → these should guarantee QoS (Quality of Service).


Chapter 1: Background of compression techniques

- Why compression?
  - For communication: reduce the bandwidth needed by multimedia network applications such as streaming media, Video-on-Demand (VoD), and Internet telephony.
  - For digital storage (VCD, DVD, tape, etc.): reduce size and cost, increase media capacity and quality.
- Compression factor (compression ratio)
  - The ratio between the size of the source data and the size of the compressed data (e.g. 10:1).
- Two types of compression:
  - Lossless compression
  - Lossy compression
Information content and redundancy

- Information rate
  - Entropy is the measure of information content, expressed in bits per source output unit (such as bits/pixel).
  - The more information in the signal, the higher the entropy.
  - Lossy compression reduces entropy; lossless compression does not.
- Redundancy
  - The difference between the bit rate and the information rate.
  - Usually the information rate is much less than the bit rate.
  - Compression works by eliminating this redundancy.


Lossless Compression

- The data from the decoder is identical to the source data.
  - Example: archives produced by utilities such as pkzip or gzip.
  - The compression factor is around 2:1.
- A lossless coder cannot guarantee a fixed compression ratio → the output data rate is variable → this causes problems for recording mechanisms and communication channels.
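
To make these ideas concrete, here is a small illustrative sketch (my own example, not from the slides): it estimates the entropy of a byte stream, compares it with the 8 bits/byte actually spent, and checks that a general-purpose lossless coder (zlib) restores the data exactly while removing much of the redundancy. The sample text and the choice of zlib are assumptions made for illustration.

```python
import math
import zlib
from collections import Counter

data = b"multimedia networks carry compressed audio and video " * 200

# Entropy estimate in bits per byte, from the byte-value histogram.
total = len(data)
counts = Counter(data)
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

compressed = zlib.compress(data)            # lossless: decompress() restores data exactly
assert zlib.decompress(compressed) == data

print("bit rate          : 8.00 bits/byte")
print(f"entropy estimate  : {entropy:.2f} bits/byte")   # information rate << bit rate
print(f"compression factor: {len(data) / len(compressed):.0f}:1 (this input is very repetitive)")
```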




Lossy Compression

- The data from the expander is not identical to the source data, but the difference cannot be perceived by eye or ear.
  - Suitable for audio and video compression.
  - The compression factor is much higher than that of lossless compression (up to 100:1).
- Based on an understanding of psychoacoustic and psychovisual perception.
- Can be forced to operate at a fixed compression factor.


Process of Compression

- Communication (reduces the cost of the data link):
  - Data → Compressor (coder) → transmission channel → Expander (decoder) → Data'
- Recording (extends playing time in proportion to the compression factor):
  - Data → Compressor (coder) → Storage device (tape, disk, RAM, etc.) → Expander (decoder) → Data'
Sampling and quantization

- Why sampling?
  - Computers cannot process analog signals directly.
- PCM
  - Sample the analog signal at a constant rate and use a fixed number of bits (usually 8 or 16) to represent each sample.
  - bit rate = sampling rate × number of bits per sample
- Quantization
  - Map the sampled analog signal (generally of infinite precision) to discrete levels (finite precision).
  - Represent each discrete level with a number.


Predictive coding

- Prediction
  - Use previous sample(s) to estimate the current sample.
  - For most signals, the difference between the predicted and actual values is small → a smaller number of bits can code the difference while maintaining the same accuracy.
  - Noise is completely unpredictable.
    - Most codecs require the data to be preprocessed; otherwise they may perform badly when the data contains noise.
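
A small sketch tying these two slides together (my own illustration, with an arbitrary 8 kHz / 8-bit choice of parameters): it computes the PCM bit rate, quantizes a sampled sine wave to 8-bit levels, and then shows that the sample-to-sample differences used by a one-step predictor span a much smaller range than the samples themselves.

```python
import math

sample_rate = 8000                       # samples per second
bits_per_sample = 8
print("PCM bit rate:", sample_rate * bits_per_sample, "bit/s")   # rate x bits per sample

# Quantize a slowly varying "analog" signal to 8-bit levels (0..255).
signal = [math.sin(2 * math.pi * 50 * t / sample_rate) for t in range(64)]
quantized = [round((s + 1.0) / 2.0 * 255) for s in signal]

# One-step prediction: predict each sample by the previous one, code only the difference.
differences = [quantized[i] - quantized[i - 1] for i in range(1, len(quantized))]

print("sample range    :", min(quantized), "to", max(quantized))
print("difference range:", min(differences), "to", max(differences))
# The differences span a much smaller range, so fewer bits per value are enough.
```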




Statistical coding: the Huffman code

- Assign short codes to the most probable data patterns and long codes to the less frequent patterns.
- Bit assignment is based on the statistics of the source data.
- The statistics of the data must therefore be known prior to the bit assignment.


Drawbacks of compression

- Sensitivity to data errors
  - Compression eliminates the redundancy that is essential to making data resistant to errors.
- Concealment is required for real-time applications
  - Error-correction coding is required, which adds redundancy back to the compressed data.
- Artifacts
  - Artifacts appear when the coder eliminates part of the entropy.
  - The higher the compression factor, the more visible the artifacts.
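
A minimal sketch of Huffman code construction for the statistical coding slide above (my own illustration; JPEG and MPEG actually use fixed, pre-computed code tables): frequent symbols receive short codes, rare symbols long ones.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return {symbol: bitstring} built from the symbol statistics."""
    freq = Counter(symbols)
    # Heap entries are (weight, tie_breaker, tree); a tree is a symbol or a (left, right) pair.
    heap = [(weight, i, symbol) for i, (symbol, weight) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (left, right)))
        next_id += 1
    codes = {}
    def assign(tree, prefix):
        if isinstance(tree, tuple):
            assign(tree[0], prefix + "0")
            assign(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"      # degenerate single-symbol source
    assign(heap[0][2], "")
    return codes

source = "aaaaaaabbbccd"                     # 'a' most probable, 'd' least probable
codes = huffman_code(source)
print(codes)                                 # e.g. {'a': '0', 'b': '11', 'c': '101', 'd': '100'}
print(sum(len(codes[s]) for s in source), "bits vs", 2 * len(source), "bits with a fixed 2-bit code")
```
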
A coding example: Clustering color pixels

- In an image, pixel values are clustered in several peaks.
- Each cluster represents the color range of one object in the image (e.g. blue sky).
- Coding process:
  1. Separate the pixel values into a limited number of data clusters (e.g. pixels clustered around sky blue or grass green).
  2. Send the average color of each cluster and an identifying number for each cluster as side information.
  3. Transmit, for each pixel:
     - the number of the average cluster color that it is closest to, and
     - its difference from that average cluster color (the differences are often similar, so they can themselves be coded to reduce redundancy → prediction).


Frame-Differential Coding

- Frame-differential coding (FDC) = prediction from a previous video frame.
- A video frame is stored in the encoder for comparison with the present frame → this causes an encoding latency of one frame time.
- For still images:
  - Data needs to be sent only for the first instance of a frame.
  - All subsequent prediction error values are zero.
  - The frame is retransmitted occasionally so that receivers that have just been turned on have a starting point.
- FDC reduces the information needed for still images, but leaves significant data for moving images (e.g. a movement of the camera).
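
A tiny sketch of frame-differential coding (an illustration I added, not from the slides): only the difference from the previous frame is transmitted, so an almost-still scene yields almost all zeros, and the decoder recovers the new frame exactly by adding the difference back.

```python
import numpy as np

h, w = 4, 8
frame0 = np.full((h, w), 100, dtype=np.int16)      # flat background
frame1 = frame0.copy()
frame1[1, 2] += 5                                   # one pixel changes (a small moving object)

difference = frame1 - frame0                        # what FDC would transmit
print("non-zero prediction errors:", np.count_nonzero(difference), "of", difference.size)

reconstructed = frame0 + difference                 # decoder: previous frame + difference
assert np.array_equal(reconstructed, frame1)
```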




Motion Compensated Prediction

- More data than with frame-differential coding can be eliminated by comparing the present pixel to the location of the same object in the previous frame (not to the same spatial location in the previous frame).
- The encoder estimates the motion in the image to find the corresponding area in a previous frame.
- The encoder searches for a portion of a previous frame which is similar to the part of the new frame to be transmitted.
- It then sends (as side information) a motion vector telling the decoder what portion of the previous frame it will use to predict the new frame.
- It also sends the prediction error so that the exact new frame may be reconstituted.
- Top figure: without motion compensation; bottom figure: with motion compensation.


Unpredictable Information

- Information that cannot be predicted from the previous frame:
  1. Scene changes (e.g. the background landscape changes).
  2. Newly uncovered information due to object motion across a background, or at the edges of a panned scene (e.g. a soccer player's face uncovered by a flying ball).
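
A sketch of the motion-estimation search described in the motion-compensated prediction slide above (my own simplified illustration): for one 8 × 8 block of the new frame, an exhaustive search over a small window of the previous frame finds the best-matching block, yielding the motion vector and a (here zero) prediction error.

```python
import numpy as np

def best_match(prev, block, top, left, search=4):
    """Exhaustive block search around (top, left); returns (dy, dx, absolute error)."""
    h, w = block.shape
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= prev.shape[0] - h and 0 <= x <= prev.shape[1] - w:
                candidate = prev[y:y + h, x:x + w]
                err = np.abs(candidate.astype(int) - block.astype(int)).sum()
                if best is None or err < best[2]:
                    best = (dy, dx, err)
    return best

rng = np.random.default_rng(0)
prev_frame = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
new_frame = np.roll(prev_frame, shift=(2, 3), axis=(0, 1))   # whole scene shifted by (2, 3)

block = new_frame[8:16, 8:16]                # 8x8 block of the new frame to be predicted
dy, dx, err = best_match(prev_frame, block, 8, 8)
print("motion vector:", (dy, dx), "prediction error:", err)   # expected: (-2, -3), error 0
```
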
Dealing with unpredictable Information

- Scene change
  - An intra-coded picture (an MPEG I picture) must be sent as a starting point → it requires more data than a predicted picture (P picture).
  - I pictures are sent about twice per second → their timing and sending frequency may be adjusted to accommodate scene changes.
- Uncovered information
  - Handled by the bi-directionally coded type of picture, the B picture.
  - There must be enough frame storage in the system to wait for the later picture that has the desired information.
  - To limit the amount of decoder memory, the encoder stores pictures and sends the required reference pictures before sending the B picture.


Transform Coding

- Converts spatial image pixel values to transform coefficient values.
- The number of coefficients produced is equal to the number of pixels transformed.
- A few coefficients contain most of the energy in a picture → the coefficients may be further coded by lossless entropy coding.
- The transform process concentrates the energy into particular coefficients (generally the "low-frequency" coefficients).




Types of picture transform coding

- Types of picture coding:
  - Discrete Fourier Transform (DFT)
  - Karhunen-Loève
  - Walsh-Hadamard
  - Lapped orthogonal
  - Discrete Cosine Transform (DCT) → used in MPEG-2!
  - Wavelets → new!
- The differences between transform coding methods lie in:
  - the degree of concentration of energy in a few coefficients,
  - the region of influence of each coefficient in the reconstructed picture,
  - the appearance and visibility of coding noise due to coarse quantization of the coefficients.


DCT Lossy Coding

- Lossless coding alone cannot achieve a high compression ratio (4:1 or less).
- Lossy coding = discarding selected information so that the reproduction is visually or aurally indistinguishable from the source, or has the fewest artifacts.
- Lossy coding can be achieved by:
  - eliminating some DCT coefficients, or
  - adjusting the quantizing coarseness of the coefficients → the better option!
Masking

- Masking makes certain types of coding noise invisible or inaudible because of psycho-visual or psycho-acoustical effects.
  - In audio, a pure tone masks energy at higher frequencies and also, with weaker effect, at lower frequencies.
  - In video, high-contrast edges mask random noise.
- Noise introduced at low bit rates is kept in the frequency, spatial, or temporal regions where it is masked.


Variable quantization

- Variable quantization is the main technique of lossy coding → it greatly reduces the bit rate.
- The less significant coefficients of a transform (less noticeable: low energy, less visible or audible) are quantized more coarsely.
- It can be applied to a complete signal or to individual frequency components of a transformed signal.
- Variable quantization also controls the instantaneous bit rate in order to:
  - match the average bit rate to a constant channel bit rate, and
  - prevent buffer overflow or underflow.




Run-Level coding

- "Run-Level" coding = coding a run length of zeros followed by a non-zero level.
  - Instead of sending all the zero values individually, the length of the run is sent.
  - Useful for any data with long runs of zeros.
  - Run lengths are easily encoded with a Huffman code.


Key points

- Compression process
- Sampling & quantization
- Coding:
  - Lossless & lossy coding
  - Frame-differential coding
  - Motion-compensated prediction
  - Variable quantization
  - Run-level coding
- Masking
Chapter 2: Multimedia technologies

- Roadmap
  - JPEG
  - MPEG-1/MPEG-2 Video
  - MPEG-1 Layer 3 Audio (mp3)
  - MPEG-4
  - MPEG-7 (brief introduction)
  - HDTV (brief introduction)
  - H.261/H.263 (brief introduction)
  - Model-based coding (MBC) (brief introduction)


JPEG (Joint Photographic Experts Group)

- JPEG encoder
  - Partitions the image into blocks of 8 × 8 pixels.
  - Calculates the Discrete Cosine Transform (DCT) of each block.
  - A quantizer rounds off the DCT coefficients according to the quantization matrix → lossy, but allows for large compression ratios.
  - Produces a series of DCT coefficients using zig-zag scanning.
  - Applies a variable-length code (VLC) to these DCT coefficients.
  - Writes the compressed data stream to an output file (*.jpg or *.jpeg).
- JPEG decoder
  - File → input data stream → variable-length decoder → IDCT (inverse DCT) → image.




JPEG – Zig-zag scanning

- (Figure: zig-zag scan order over the 8 × 8 block of DCT coefficients.)


JPEG – DCT

- The DCT is similar to the Discrete Fourier Transform → it transforms a signal or image from the spatial domain to the frequency domain.
- The DCT requires fewer multiplications than the DFT.
- Input image A:
  - The input image A is N2 pixels wide by N1 pixels high.
  - A(i,j) is the intensity of the pixel in row i and column j.
- Output image B:
  - B(k1,k2) is the DCT coefficient in row k1 and column k2 of the DCT matrix.
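
A small sketch of these two steps (my own illustration: a direct, unoptimized implementation of the usual orthonormal DCT-II definition, plus generation of the zig-zag visiting order; the flat test block is an assumption chosen to make the result obvious).

```python
import numpy as np

N = 8

def dct_2d(block):
    """Two-dimensional 8x8 DCT-II with orthonormal scaling (direct, unoptimized)."""
    out = np.zeros((N, N))
    for u in range(N):
        for v in range(N):
            cu = 1 / np.sqrt(2) if u == 0 else 1.0
            cv = 1 / np.sqrt(2) if v == 0 else 1.0
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x, y]
                          * np.cos((2 * x + 1) * u * np.pi / (2 * N))
                          * np.cos((2 * y + 1) * v * np.pi / (2 * N)))
            out[u, v] = (2.0 / N) * cu * cv * s
    return out

def zigzag_order(n=8):
    """(row, col) visiting order of zig-zag scanning: anti-diagonals, alternating direction."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))

block = np.full((N, N), 128.0)               # a flat block: all energy ends up in the DC term
coeffs = dct_2d(block)
print("DC coefficient:", round(coeffs[0, 0]))                 # 8 * 128 = 1024
print("non-zero AC coefficients:", np.count_nonzero(np.round(coeffs)) - 1)   # 0
print("start of zig-zag scan:", [round(coeffs[r, c]) for r, c in zigzag_order()][:6])
```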
JPEG - Quantization Matrix

- The quantization matrix is an 8 × 8 matrix of step sizes (sometimes called quantums), one element for each DCT coefficient.
- It is usually symmetric.
- Step sizes are:
  - small in the upper left (low frequencies),
  - large toward the lower right (high frequencies);
  - a step size of 1 is the most precise.
- The quantizer divides each DCT coefficient by its corresponding quantum, then rounds to the nearest integer.
- Large quantums drive small coefficients down to zero.
- The result:
  - Many high-frequency coefficients become zero → they are easily coded away.
  - The low-frequency coefficients undergo only minor adjustment.


JPEG Coding process illustrated

DCT coefficients of an 8 × 8 block:

  1255  -15   43   58  -12    1   -4   -6
    11  -65   80  -73  -27   -1   -5    1
   -49   37  -87    8   12    6   10    8
    27  -50   29   13    3   13   -6    5
   -16   21  -11  -10   10  -21    9   -6
     3  -14    0   14  -14   16   -8    4
    -4   -1    8  -13   12   -9    5   -1
    -4    2   -2    6   -7    6   -1    3

Quantization result (after dividing by the quantization matrix, Q, and rounding):

    78   -1    4    4   -1    0    0    0
     1   -5    6   -4   -1    0    0    0
    -4    3   -5    0    0    0    0    0
     2   -3    1    0    0    0    0    0
    -1    1    0    0    0    0    0    0
     0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0

Zig-zag scan result: 78 -1 1 -4 -5 4 4 6 3 2 -1 -3 -5 -4 -1 0 -1 0 1 1, followed by zeros only → EOB
→ Easily coded by run-length Huffman coding.
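
A sketch of the quantization step just illustrated (my own code; the quantum table below is the top-left corner of the well-known example luminance quantization table from the JPEG specification, chosen because its output happens to match the upper-left corner of the quantization result shown above):

```python
import numpy as np

# Top-left 4x4 corner of the example DCT coefficient block shown above.
coeffs = np.array([[1255, -15,  43,  58],
                   [  11, -65,  80, -73],
                   [ -49,  37, -87,   8],
                   [  27, -50,  29,  13]], dtype=float)

# Corresponding corner of the JPEG example luminance quantization matrix:
# small quantums for low frequencies, larger ones for higher frequencies.
quantums = np.array([[16, 11, 10, 16],
                     [12, 12, 14, 19],
                     [14, 13, 16, 24],
                     [14, 17, 22, 29]], dtype=float)

quantized = np.round(coeffs / quantums).astype(int)
print(quantized)                      # matches the upper-left of the quantization result above

# Decoder side: multiply back by the quantums; the rounding error is the loss.
reconstructed = quantized * quantums
print(np.abs(reconstructed - coeffs).max(), "= largest error introduced by quantization")
```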




MPEG (Moving Picture Experts Group)

- MPEG is at the heart of:
  - digital television set-top boxes,
  - HDTV decoders,
  - DVD players,
  - video conferencing,
  - Internet video, etc.
- MPEG standards:
  - MPEG-1, MPEG-2, MPEG-4, MPEG-7
  - (The MPEG-3 standard was abandoned and became an extension of MPEG-2.)


MPEG standards

- MPEG-1 (obsolete)
  - A standard for storage and retrieval of moving pictures and audio on storage media.
  - Application: VCD (Video CD).
- MPEG-2 (widely implemented)
  - A standard for digital television.
  - Applications: DVD (Digital Versatile Disc), HDTV (high-definition TV), DVB (European Digital Video Broadcasting group), etc.
- MPEG-4 (newly implemented, still being researched)
  - A standard for multimedia applications.
  - Applications: Internet, cable TV, virtual studio, etc.
- MPEG-7 (future work, ongoing research)
  - A content-representation standard for information search ("Multimedia Content Description Interface").
  - Applications: Internet, video search engines, digital libraries.
MPEG-2 formal standards

- The international standard ISO/IEC 13818-2, "Generic Coding of Moving Pictures and Associated Audio Information".
- ATSC (Advanced Television Systems Committee) document A/54, "Guide to the Use of the ATSC Digital Television Standard".


MPEG video data structure

- The MPEG-2 video data stream is constructed in layers, from lowest to highest:
  - PIXEL is the fundamental unit.
  - BLOCK is an 8 × 8 array of pixels.
  - MACROBLOCK consists of 4 luma blocks and 2 chroma blocks (with both field DCT coding and frame DCT coding).
  - SLICE consists of a variable number of macroblocks.
  - PICTURE consists of a frame (or field) of slices.
  - GROUP OF PICTURES (GOP) consists of a variable number of pictures.
  - SEQUENCE consists of a variable number of GOPs.
  - PACKETIZED ELEMENTARY STREAM (optional).




Pixel & Block

- Pixel = "picture element"
  - A discrete spatial point sample of an image.
  - A color pixel may be represented digitally as a number of bits for each of three primary color values.
- Block
  - An 8 × 8 array of pixels.
  - The block is the fundamental unit for DCT (discrete cosine transform) coding.


Macroblock

- A macroblock is a 16 × 16 array of luma (Y) pixels (= 4 blocks, in a 2 × 2 block array).
- The number of chroma pixels (Cr, Cb) varies depending on the chroma pixel structure indicated in the sequence header (e.g. 4:2:0).
- The macroblock is the fundamental unit for motion compensation and has motion vector(s) associated with it if it is predictively coded.
- A macroblock is classified as either:
  - field coded (an interlaced frame consists of 2 fields), or
  - frame coded,
  depending on how the four blocks are extracted from the macroblock.
Slice

- Pictures are divided into slices.
- A slice consists of an arbitrary number of successive macroblocks (going left to right), but is typically an entire row of macroblocks. A slice does not extend beyond one row.
- The slice header carries address information that allows the Huffman decoder to resynchronize at slice boundaries.


Picture

- A source picture is a contiguous rectangular array of pixels.
- A picture may be a complete frame of video ("frame picture") or one of the interlaced fields from an interlaced source ("field picture").
- A field picture does not have any blank lines between its active lines of pixels.
- A coded picture (also called a video access unit) begins with a start code and a header. The header carries:
  - the picture type (I, P, B),
  - temporal reference information,
  - the motion vector search range,
  - optional user data.
- A frame picture consists of:
  - a frame of a progressive source, or
  - a frame (2 spatially interlaced fields) of an interlaced source.




I, P, B Pictures

Encoded pictures are classified into three types: I, P, and B.

- I pictures = intra-coded pictures
  - All macroblocks are coded without prediction.
  - Needed to give the receiver a "starting point" for prediction after a channel change and to recover from errors.
- P pictures = predicted pictures
  - Macroblocks may be coded with forward prediction from references made from previous I and P pictures, or may be intra coded.
- B pictures = bi-directionally predicted pictures
  - Macroblocks may be coded with forward prediction from a previous I or P reference.
  - Macroblocks may be coded with backward prediction from the next I or P reference.
  - Macroblocks may be coded with interpolated prediction from past and future I or P references.
  - Macroblocks may be intra coded (no prediction).


Group of pictures (GOP)

- The group-of-pictures layer is optional in MPEG-2.
- A GOP begins with a start code and a header.
- The header carries:
  - time code information,
  - editing information,
  - optional user data.
- The first encoded picture in a GOP is always an I picture.
- A typical length is 15 pictures with the following structure (in display order): I B B P B B P B B P B B P B B → this provides an I picture with sufficient frequency to allow a decoder to decode correctly.
- (Figure: forward motion compensation from I/P anchors and bidirectional motion compensation for B pictures along the time axis.)
Sequence

- A sequence begins with a unique 32-bit start code followed by a header.
- The header carries:
  - picture size,
  - aspect ratio,
  - frame rate and bit rate,
  - optional quantizer matrices,
  - required decoder buffer size,
  - chroma pixel structure,
  - optional user data.
- The sequence information is needed for channel changing.
- The sequence length depends on the acceptable channel-change delay.


Packetized Elementary Stream (PES)

- A video elementary stream (video ES) consists of all the video data for a sequence, including the sequence header and all the subparts of a sequence.
- An ES carries only one type of data (video or audio) from a single video or audio encoder.
- A PES consists of a single ES which has been split into packets, each starting with an added packet header.
- A PES stream contains only one type of data from one source, e.g. from one video or audio encoder.
- PES packets have variable length, not corresponding to the fixed packet length of transport packets, and may be much longer than a transport packet.




Transport stream

- Transport packets (of fixed length) are formed from a PES stream:
  - Each transport packet carries a transport packet header; the PES header is placed at the start of the first packet's payload.
  - Successive transport packet payloads are filled with the remaining PES packet content until the PES packet is all used.
  - The final transport packet is filled to the fixed length by stuffing with 0xFF bytes (all ones).
- Each PES packet header includes:
  - an 8-bit stream ID identifying the source of the payload,
  - timing references: PTS (presentation time stamp), the time at which a decoded audio or video access unit is to be presented by the decoder,
  - DTS (decoding time stamp), the time at which an access unit is decoded by the decoder,
  - ESCR (elementary stream clock reference).


Intra Frame Coding

- Intra coding is concerned only with information within the current frame, not with any other frame in the video sequence.
- The MPEG intra-frame coding block diagram (see bottom figure) is similar to JPEG (→ review the JPEG coding mechanism!).
- Basic blocks of the intra-frame coder:
  - video filter,
  - discrete cosine transform (DCT),
  - DCT coefficient quantizer,
  - run-length amplitude / variable-length coder (VLC).
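
A sketch of the transport-stream packetization described above (my own simplified illustration: the 4-byte header here only carries the sync byte and a PID-like field, not the full MPEG-2 Systems header syntax).

```python
# Split one (pretend) PES packet into fixed-size 188-byte transport packets,
# each starting with the 0x47 sync byte, with 0xFF stuffing in the last packet.
SYNC_BYTE = 0x47
PACKET_SIZE = 188
HEADER_SIZE = 4                  # real transport headers are 4 bytes plus optional fields

def packetize(pes_packet: bytes, pid: int):
    payload_size = PACKET_SIZE - HEADER_SIZE
    packets = []
    for offset in range(0, len(pes_packet), payload_size):
        chunk = pes_packet[offset:offset + payload_size]
        chunk = chunk + b"\xff" * (payload_size - len(chunk))   # stuffing in the final packet
        header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])  # simplified header
        packets.append(header + chunk)
    return packets

pes = bytes(range(256)) * 3                  # a 768-byte pretend PES packet
ts_packets = packetize(pes, pid=0x0100)
print(len(ts_packets), "packets of", len(ts_packets[0]), "bytes each")
assert all(p[0] == SYNC_BYTE and len(p) == PACKET_SIZE for p in ts_packets)
```
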
Video Filter

- The Human Visual System (HVS) is:
  - most sensitive to changes in luminance,
  - less sensitive to variations in chrominance.
- MPEG therefore uses the YCbCr color space to represent the data values instead of RGB, where:
  - Y is the luminance signal,
  - Cb is the blue color-difference signal,
  - Cr is the red color-difference signal.
- What is a "4:4:4", "4:2:0", etc., video format?
  - 4:4:4 is full-bandwidth YCbCr video → each macroblock consists of 4 Y blocks, 4 Cb blocks, and 4 Cr blocks → a waste of bandwidth!
  - 4:2:0 is the format most commonly used in MPEG-2.


Applications of chroma formats

chroma_format     | Multiplex order (time) within macroblock | Application
4:2:0 (6 blocks)  | Y Y Y Y Cb Cr                            | Mainstream television, consumer entertainment
4:2:2 (8 blocks)  | Y Y Y Y Cb Cr Cb Cr                      | Studio production environments, professional editing equipment
4:4:4 (12 blocks) | Y Y Y Y Cb Cr Cb Cr Cb Cr Cb Cr          | Computer graphics
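
A small sketch (my own illustration) relating the chroma formats in the table above to the amount of data in one macroblock:

```python
# A 16x16 macroblock always has 4 Y blocks; the number of Cb/Cr blocks depends
# on the chroma format, and with it the uncompressed data per macroblock.
blocks_per_macroblock = {
    "4:2:0": {"Y": 4, "Cb": 1, "Cr": 1},   # chroma subsampled 2:1 both ways  -> 6 blocks
    "4:2:2": {"Y": 4, "Cb": 2, "Cr": 2},   # chroma subsampled horizontally   -> 8 blocks
    "4:4:4": {"Y": 4, "Cb": 4, "Cr": 4},   # full-bandwidth chroma            -> 12 blocks
}

for fmt, blocks in blocks_per_macroblock.items():
    total = sum(blocks.values())
    samples = total * 8 * 8                 # 64 samples per 8x8 block
    print(f"{fmt}: {total:2d} blocks/macroblock, {samples} samples "
          f"({samples} bytes uncompressed at 8 bits/sample)")
```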




MPEG Profiles & levels

- MPEG-2 is classified into several profiles.
- Main Profile features:
  - 4:2:0 chroma sampling format,
  - I, P, and B pictures,
  - non-scalable.
- Main Profile is subdivided into levels:
  - MP@ML (Main Profile at Main Level):
    - designed around the CCIR 601 standard for interlaced standard-definition digital video,
    - 720 × 576 (PAL) or 720 × 483 (NTSC),
    - 30 Hz progressive, 60 Hz interlaced,
    - maximum bit rate 15 Mbit/s.
  - MP@HL (Main Profile at High Level), upper bounds:
    - 1920 × 1152, 60 Hz progressive,
    - 80 Mbit/s.


MPEG encoder/decoder

- (Figure: MPEG encoder and decoder block diagrams.)
Prediction

- Backward prediction is done by storing pictures until the desired anchor picture is available before encoding the currently stored frames.
- The encoder can decide to use:
  - forward prediction from a previous picture,
  - backward prediction from a following picture,
  - or interpolated prediction,
  so as to minimize the prediction error.
- The encoder must transmit pictures in an order different from that of the source pictures, so that the decoder has the anchor pictures before decoding predicted pictures (see the next slide).
- The decoder must have two frame stores.


I P B Picture Reordering

- Pictures are coded and decoded in a different order than they are displayed, because of the bidirectional prediction used for B pictures.
- Example for a 12-picture GOP:
  - Source order and encoder input order:
    I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
  - Encoding order and order in the coded bitstream:
    I(1) P(4) B(2) B(3) P(7) B(5) B(6) P(10) B(8) B(9) I(13) B(11) B(12)
  - Decoder output order and display order (same as the input):
    I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
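
A sketch of how the bitstream order above can be derived from the display order (my own illustration for the simple case where every B picture uses the next I or P picture as its backward anchor):

```python
def coded_order(display_order):
    """Emit each I/P anchor before the B pictures that depend on it."""
    coded, pending_b = [], []
    for picture in display_order:
        if picture.startswith("B"):
            pending_b.append(picture)       # hold B pictures until their next anchor is sent
        else:                               # I or P picture: an anchor
            coded.append(picture)
            coded.extend(pending_b)
            pending_b = []
    return coded + pending_b

display = ["I1", "B2", "B3", "P4", "B5", "B6", "P7", "B8", "B9",
           "P10", "B11", "B12", "I13"]
print(coded_order(display))
# ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6', 'P10', 'B8', 'B9', 'I13', 'B11', 'B12']
```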




DCT and IDCT formulas

- DCT:
  - Eq. 1 → normal (summation) form
  - Eq. 2 → matrix form
- IDCT:
  - Eq. 3 → normal (summation) form
  - Eq. 4 → matrix form
- Where:
  - F(u,v) is the two-dimensional N×N DCT;
  - u, v, x, y = 0, 1, 2, ..., N-1;
  - x, y are spatial coordinates in the sample domain;
  - u, v are frequency coordinates in the transform domain;
  - C(u), C(v) = 1/√2 for u, v = 0;
  - C(u), C(v) = 1 otherwise.


DCT versus DFT

- The DCT is conceptually similar to the DFT, except:
  - The DCT concentrates energy into lower-order coefficients better than the DFT.
  - The DCT is purely real; the DFT is complex (magnitude and phase).
  - A DCT operation on a block of pixels produces coefficients that are similar to the frequency-domain coefficients produced by a DFT operation.
    - An N-point DCT has the same frequency resolution as a 2N-point DFT.
    - The N frequencies of a 2N-point DFT correspond to N points on the upper half of the unit circle in the complex frequency plane.
  - Assuming a periodic input, the magnitude of the DFT coefficients is spatially invariant (the phase of the input does not matter). This is not true for the DCT.
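
The equations referred to above as Eq. 1-4 were figures in the original slide. Written out with the symbols defined there, and assuming the standard orthonormal DCT-II convention (which matches the stated values of C(u), C(v)), the summation and matrix forms are:

```latex
% DCT, normal form (Eq. 1)
F(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1}
         f(x,y)\, \cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}

% IDCT, normal form (Eq. 3)
f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1}
         C(u)\, C(v)\, F(u,v)\, \cos\frac{(2x+1)u\pi}{2N}\, \cos\frac{(2y+1)v\pi}{2N}

% Matrix forms (Eq. 2 and Eq. 4), with A_{u,x} = \sqrt{2/N}\; C(u)\, \cos\frac{(2x+1)u\pi}{2N}:
F = A\, f\, A^{T} \qquad\text{and}\qquad f = A^{T}\, F\, A
```
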
Quantization matrix

- Note: the quantization matrix entries (step sizes) are
  - small in the upper left (low frequencies),
  - large toward the lower right (high frequencies)
  → recall the JPEG mechanism!
- Why?
  - The HVS is less sensitive to errors in high-frequency coefficients than to errors in lower-frequency coefficients.
  - → Higher frequencies should be quantized more coarsely!


Result DCT matrix (example)

- After adaptive quantization, the result is a matrix containing many zeros.
- (Figure: example quantized DCT coefficient matrix.)




MPEG scanning

- Left figure: zig-zag scanning (as in JPEG).
- Right figure: alternate scanning → better for interlaced frames!


Huffman / Run-Level Coding

- Huffman coding, in combination with run-level coding and zig-zag scanning, is applied to the quantized DCT coefficients.
- "Run-Level" = a run length of zeros followed by a non-zero level.
- Huffman coding is also applied to various types of side information.
- A Huffman code is an entropy code which achieves the shortest possible average code word length for a source.
- This average code word length is ≥ the entropy of the source.
Huffman/ Run-Level coding illustrated

n    Using the DCT output matrix in the previous slide, after being zigzag scanned
     à the output will be a sequence of numbers: 4, 4, 2, 2, 2, 1, 1, 1, 1,
     0 (12 zeros), 1, 0 (41 zeros)
n    These values are looked up in a fixed table of variable length codes
     q    à The most probable occurrence is given a relatively short code,
     q    à The least probable occurrence is given a relatively long code.

     Zero Run-Length    Amplitude       MPEG Code Value
     N/A                8 (DC Value)    110 1000
     0                  4               00001100
     0                  4               00001100
     0                  2               01000
     0                  2               01000
     0                  2               01000
     0                  1               110
     0                  1               110
     0                  1               110
     0                  1               110
     12                 1               0010 0010 0
     EOB                EOB             10

MPEG Data Transport

n    MPEG packages all data into fixed-size 188-byte packets for transport.
n    Video or audio payload data is first placed in PES packets, which are then broken up
     into fixed-length transport packet payloads.
n    A PES packet may be much longer than a transport packet à Requires segmentation:
     q    The PES header is placed immediately following a transport header.
     q    Successive portions of the PES packet are then placed in the payloads of
          transport packets.
     q    Remaining space in the final transport packet payload is filled with stuffing
          bytes = 0xFF (all ones).
     q    Each transport packet starts with a sync byte = 0x47.
     q    In the ATSC US terrestrial DTV VSB transmission system, the sync byte is not
          processed, but is replaced by a different sync symbol especially suited to RF
          transmission.
     q    The transport packet header contains a 13-bit PID (packet ID), which corresponds
          to a particular elementary stream of video, audio, or other program element.
     q    PID 0x0000 is reserved for transport packets carrying a program association
          table (PAT).
     q    The PAT points to a Program Map Table (PMT) à points to particular elements
          of a program.

4/2/2003                Nguyen Chan Hung– Hanoi University of Technology                  57       4/2/2003                        Nguyen Chan Hung– Hanoi University of Technology                    59
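
A much-simplified sketch of how a PES packet could be split across 188-byte transport packets. Real TS headers carry more fields (continuity counter, adaptation field, scrambling control), and stuffing is normally carried in the adaptation field, so treat this only as an illustration of the segmentation idea described above.

SYNC_BYTE = 0x47
TS_PACKET_SIZE = 188
HEADER_SIZE = 4          # simplified: a fixed 4-byte header, no adaptation field

def packetize_pes(pes: bytes, pid: int):
    """Split one PES packet into fixed-size 188-byte transport packets for a single PID."""
    payload_size = TS_PACKET_SIZE - HEADER_SIZE
    packets = []
    for i in range(0, len(pes), payload_size):
        chunk = pes[i:i + payload_size]
        start = 1 if i == 0 else 0     # the PES header follows the first transport header
        header = bytes([SYNC_BYTE,
                        (start << 6) | ((pid >> 8) & 0x1F),   # start flag + PID high bits
                        pid & 0xFF,                            # PID low bits (13 bits total)
                        0x10])                                 # payload only, counter ignored
        # Space left in the final packet is filled with stuffing bytes (0xFF).
        chunk = chunk + b"\xff" * (payload_size - len(chunk))
        packets.append(header + chunk)
    return packets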




Huffman/ Run-Level coding illustrated (2)

n   à The first run of 12 zeroes has been efficiently coded by only 9 bits
n   à The last run of 41 zeroes has been entirely eliminated, represented only
    with a 2-bit End Of Block (EOB) indicator.
n   à The quantized DCT coefficients are now represented by a sequence of
    61 binary bits (see the table).
n   Considering that the original 8x8 block of 8-bit pixels required 512 bits for
    full representation, à the compression rate is approx. 8.4:1.

MPEG Transport packet

n   Adaptation Field:
    q   8 bits specifying the length of the adaptation field.
    q   The first group of flags consists of eight 1-bit flags:
        q   discontinuity_indicator
        q   random_access_indicator
        q   elementary_stream_priority_indicator
        q   PCR_flag
        q   OPCR_flag
        q   splicing_point_flag
        q   transport_private_data_flag
        q   adaptation_field_extension_flag
    q   The optional fields are present if indicated by one of the preceding flags.
    q   The remainder of the adaptation field is filled with stuffing bytes (0xFF, all ones).

4/2/2003                Nguyen Chan Hung– Hanoi University of Technology                  58       4/2/2003                        Nguyen Chan Hung– Hanoi University of Technology                    60
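
As an illustration, the flag byte that follows the adaptation_field_length can be unpacked as eight 1-bit flags, MSB first, in the order listed above. The helper below is a sketch, not a reference parser; check the MPEG-2 Systems specification if the exact bit layout matters.

FLAG_NAMES = [
    "discontinuity_indicator",
    "random_access_indicator",
    "elementary_stream_priority_indicator",
    "PCR_flag",
    "OPCR_flag",
    "splicing_point_flag",
    "transport_private_data_flag",
    "adaptation_field_extension_flag",
]

def parse_adaptation_flags(flag_byte: int) -> dict:
    """Unpack the eight 1-bit flags of the adaptation field, MSB first."""
    return {name: bool((flag_byte >> (7 - i)) & 1) for i, name in enumerate(FLAG_NAMES)}

# Example: 0x50 sets random_access_indicator and PCR_flag.
print(parse_adaptation_flags(0x50))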
Demultiplexing a Transport Stream (TS)

n   Demultiplexing a transport stream involves:
    1.  Finding the PAT by selecting packets with PID = 0x0000
    2.  Reading the PIDs for the PMTs
    3.  Reading the PIDs for the elements of a desired program from its PMT
        (for example, a basic program will have a PID for audio and a PID for video)
    4.  Detecting packets with the desired PIDs and routing them to the decoders
q   An MPEG-2 transport stream can carry:
    §   Video stream
    §   Audio stream
    §   Any type of data
    à MPEG-2 TS is the packet format for CATV downstream data communication.

Timing - Synchronization

n   The decoder is synchronized with the encoder by time stamps
n   The encoder contains a master oscillator and counter, called the System Time
    Clock (STC). (See previous block diagram.)
    q   à The STC belongs to a particular program and is the master clock of the
        video and audio encoders for that program.
    q   à Multiple programs, each with its own STC, can also be multiplexed into
        a single stream.
n   A program component can even have no time stamps à but cannot be
    synchronized with other components.
n   At encoder input (Point A), the time of occurrence of an input video picture
    or audio block is noted by sampling the STC.
n   A total delay of encoder and decoder buffer (constant) is added to the STC,
    creating a Presentation Time Stamp (PTS),
    q   à PTS is then inserted in the first of the packet(s) representing that
        picture or audio block, at Point B.

4/2/2003               Nguyen Chan Hung– Hanoi University of Technology                 61       4/2/2003                   Nguyen Chan Hung– Hanoi University of Technology     63
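
The four demultiplexing steps can be sketched as PID filtering. The PAT/PMT parsers here are hypothetical stubs supplied by the caller; real section parsing (pointer fields, section lengths, CRC checks) is omitted.

def split_ts(stream: bytes, packet_size: int = 188):
    """Yield (pid, packet) for each 188-byte transport packet in the stream."""
    for i in range(0, len(stream) - packet_size + 1, packet_size):
        pkt = stream[i:i + packet_size]
        if pkt[0] != 0x47:                        # every packet must start with the sync byte
            continue
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]     # 13-bit packet ID
        yield pid, pkt

def demultiplex(stream: bytes, parse_pat, parse_pmt, wanted_program: int):
    """Follow the steps above: PAT (PID 0x0000) -> PMT -> elementary PIDs -> route packets."""
    pat_packets = [p for pid, p in split_ts(stream) if pid == 0x0000]      # step 1
    pmt_pid = parse_pat(pat_packets)[wanted_program]                       # step 2
    pmt_packets = [p for pid, p in split_ts(stream) if pid == pmt_pid]
    elementary_pids = parse_pmt(pmt_packets)                               # step 3
    # Step 4: route packets with the desired PIDs to the decoders.
    return {pid: [p for q, p in split_ts(stream) if q == pid] for pid in elementary_pids}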




Timing & buffer control

n   Point A: Encoder input à Constant/specified rate
n   Point B: Encoder output à Variable rate
n   Point C: Encoder buffer output à Constant rate
n   Point D: Communication channel + decoder buffer à Constant rate
n   Point E: Decoder input à Variable rate
n   Point F: Decoder output à Constant/specified rate

Timing – Synchronization (2)

n   A Decode Time Stamp (DTS) can optionally be combined into the bit stream
    à it represents the time at which the data should be taken instantaneously
    from the decoder buffer and decoded.
    q   DTS and PTS are identical except in the case of picture reordering for
        B pictures.
    q   The DTS is only used where it is needed because of reordering.
        Whenever DTS is used, PTS is also coded.
    q   PTS (or DTS) insertion interval = 700 ms.
    q   In ATSC à PTS (or DTS) must be inserted at the beginning of each
        coded picture (access unit).
n   In addition, the output of the encoder buffer (Point C) is time stamped with
    System Time Clock (STC) values, called:
    q   System Clock Reference (SCR) in a Program Stream.
    q   Program Clock Reference (PCR) in a Transport Stream.
n   PCR time stamp interval = 100 ms.
n   SCR time stamp interval = 700 ms.
n   PCR and/or the SCR are used to synchronize the decoder STC with the
    encoder STC.
4/2/2003               Nguyen Chan Hung– Hanoi University of Technology                 62       4/2/2003                   Nguyen Chan Hung– Hanoi University of Technology     64
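
A toy illustration of the time-stamping idea: sample the STC at the encoder input and add the constant end-to-end buffer delay to form the PTS. The 90 kHz tick rate is the unit commonly used for MPEG PTS/DTS; it is an assumption here, not something stated on the slides.

STC_HZ = 90_000                        # assumed 90 kHz units for PTS/DTS

class Encoder:
    def __init__(self, total_buffer_delay_s: float):
        self.stc = 0                                   # System Time Clock counter
        self.delay_ticks = int(total_buffer_delay_s * STC_HZ)

    def tick(self, seconds: float):
        """Advance the master oscillator/counter (STC)."""
        self.stc += int(seconds * STC_HZ)

    def stamp_picture(self) -> int:
        """PTS = STC sampled at the encoder input + constant end-to-end buffer delay."""
        return self.stc + self.delay_ticks

enc = Encoder(total_buffer_delay_s=0.5)
enc.tick(1 / 25)                        # one picture period later at 25 Hz
print(enc.stamp_picture())              # PTS in 90 kHz ticks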
Timing – Synchronization (3)

n   All video and audio streams included in a program must get their time stamps
    from a common STC so that synchronization of the video and audio decoders
    with each other may be accomplished.
n   The data rate and packet rate on the channel (at the multiplexer output) can be
    completely asynchronous with the System Time Clock (STC).
n   PCR time stamps allow synchronization of different multiplexed programs having
    different STCs while allowing STC recovery for each program.
n   If there is no buffer underflow or overflow à delays in the buffers and
    transmission channel for both video and audio are constant.
n   The encoder input and decoder output run at equal and constant rates.
n   Fixed end-to-end delay from encoder input to decoder output.
n   If exact synchronization is not required, the decoder clock can be free running
    à video frames can be repeated / skipped as necessary to prevent buffer
    underflow / overflow, respectively.

HDTV (2)

n   HDTV proposals are for a screen which is wider than the conventional TV image
    by about 33%. It is generally agreed that the HDTV aspect ratio will be 16:9, as
    opposed to the 4:3 ratio of conventional TV systems. This ratio has been chosen
    because psychological tests have shown that it best matches the human visual field.
n   It also enables use of existing cinema film formats as additional source material,
    since this is the same aspect ratio used in normal 35 mm film. Figure 16.6(a)
    shows how the aspect ratio of HDTV compares with that of conventional television,
    using the same resolution, or the same surface area, as the comparison metric.
n   To achieve the improved resolution the video image used in HDTV must contain
    over 1000 lines, as opposed to the 525 and 625 provided by the existing NTSC
    and PAL systems. This gives a much improved vertical resolution. The exact value
    is chosen to be a simple multiple of one or both of the vertical resolutions used
    in conventional TV.
n   However, due to the higher scan rates the bandwidth requirement for analogue
    HDTV is approximately 12 MHz, compared to the nominal 6 MHz of conventional TV.
4/2/2003               Nguyen Chan Hung– Hanoi University of Technology   65   4/2/2003                Nguyen Chan Hung– Hanoi University of Technology    67




HDTV (High definition television)

n   High definition television (HDTV) first came to public attention in 1981, when
    NHK, the Japanese broadcasting authority, first demonstrated it in the United States.
n   HDTV is defined by the ITU-R as:
    q   'A system designed to allow viewing at about three times the picture height,
        such that the system is virtually, or nearly, transparent to the quality or
        portrayal that would have been perceived in the original scene ... by a
        discerning viewer with normal visual acuity.'

HDTV (3)

n   The introduction of a non-compatible TV transmission format for HDTV would
    require the viewer either to buy a new receiver, or to buy a converter to receive
    the picture on their old set.
n   The initial thrust in Japan was towards an HDTV format which is compatible with
    conventional TV standards, and which can be received by conventional receivers,
    with conventional quality. However, to get the full benefit of HDTV, a new wide
    screen, high resolution receiver has to be purchased.
n   One of the principal reasons that HDTV is not already common is that a general
    standard has not yet been agreed. The 26th CCIR plenary assembly recommended
    the adoption of a single, worldwide standard for high definition television.
n   Unfortunately, Japan, Europe and North America are all investing significant time
    and money in their own systems based on their own, current, conventional TV
    standards and other national considerations.


4/2/2003               Nguyen Chan Hung– Hanoi University of Technology   66   4/2/2003                Nguyen Chan Hung– Hanoi University of Technology    68
H261- H263

n   The H.261 algorithm was developed for the purpose of image transmission rather
    than image storage.
n   It is designed to produce a constant output of p x 64 kbit/s, where p is an integer
    in the range 1 to 30.
    q   This allows transmission over a digital network or data link of varying capacity.
    q   It also allows transmission over a single 64 kbit/s digital telephone channel for
        low quality video-telephony, or at higher bit rates for improved picture quality.
n   The basic coding algorithm is similar to that of MPEG in that it is a hybrid of
    motion compensation, DCT and straightforward DPCM (intra-frame coding mode),
    without the MPEG I, P, B frames.
n   The DCT operation is performed at a low level on 8 x 8 blocks of error samples
    from the predicted luminance pixel values, with sub-sampled blocks of
    chrominance data.

H261-H263 (3)

n   H.261 is widely used on 176 x 144 pixel images.
n   The ability to select a range of output rates for the algorithm allows it to be used
    in different applications.
n   Low output rates (p = 1 or 2) are only suitable for face-to-face (videophone)
    communication. H.261 is thus the standard used in many commercial videophone
    systems such as the UK BT/Marconi Relate 2000 and the US ATT 2500 products.
n   Video-conferencing would require a greater output data rate (p > 6) and might go
    as high as 2 Mbit/s for high quality transmission with larger image sizes.
n   A further development of H.261 is H.263 for lower fixed transmission rates.
n   This deploys arithmetic coding in place of the variable length coding (see the H261
    diagram); with other modifications, the data rate is reduced to only 20 kbit/s.

4/2/2003                   Nguyen Chan Hung– Hanoi University of Technology   69   4/2/2003              Nguyen Chan Hung– Hanoi University of Technology   71
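
The p x 64 kbit/s rate family follows directly from the text and is easy to tabulate:

# H.261 output rate is p x 64 kbit/s, with p an integer from 1 to 30.
def h261_rate_kbps(p: int) -> int:
    if not 1 <= p <= 30:
        raise ValueError("p must be in the range 1..30")
    return p * 64

print(h261_rate_kbps(1))    # 64 kbit/s  - low quality videophone
print(h261_rate_kbps(6))    # 384 kbit/s - the text says video-conferencing needs p > 6
print(h261_rate_kbps(30))   # 1920 kbit/s, roughly the 2 Mbit/s upper end mentioned above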




H261-H263 (2)                                                                      Model Based Coding (MBC)
                                                                                   n   At the very low bit rates (20 kbit/s or less) associated with video
                                                                                       telephony, the requirements for image transmission stretch the
                                                                                       compression techniques described earlier to their limits.
                                                                                   n   In order to achieve the necessary degree of compression they
                                                                                       often require reduction in spatial resolution or even the
                                                                                       elimination of frames from the sequence.
                                                                                   n   Model based coding (MBC) attempts to exploit a greater degree
                                                                                       of redundancy in images than current techniques, in order to
                                                                                       achieve significant image compression but without adversely
                                                                                       degrading the image content information.
                                                                                   n   It relies upon the fact that the image quality is largely subjective.
                                                                                       Providing that the appearance of scenes within an observed
                                                                                       image is kept at a visually acceptable level, it may not matter that
                                                                                       the observed image is not a precise reproduction of reality.



4/2/2003                   Nguyen Chan Hung– Hanoi University of Technology   70   4/2/2003              Nguyen Chan Hung– Hanoi University of Technology   72
Model Based Coding (2)

n   One MBC method for producing an artificial image of a head sequence utilizes a
    feature codebook where a range of facial expressions, sufficient to create an
    animation, are generated from sub-images or templates which are joined together
    to form a complete face.
n   The most important areas of a face, for conveying an expression, are the eyes and
    mouth, hence the objective is to create an image in which the movement of the
    eyes and mouth is a convincing approximation to the movements of the original
    subject.
n   When forming the synthetic image, the feature template vectors which form the
    closest match to those of the original moving sequence are selected from the
    codebook and then transmitted as low bit rate coded addresses.
n   By using only 10 eye and 10 mouth templates, for instance, a total of 100
    combinations exists, implying that only a 7-bit codebook address need be
    transmitted.
n   It has been found that there are only 13 visually distinct mouth shapes for vowel
    and consonant formation during speech.
n   However, the number of mouth sub-images is usually increased, to include
    intermediate expressions and hence avoid step changes in the image.

Model based coding (4)

n   A synthetic image is created by texture mapping detail from an initial full-face
    source image over the wire-frame. Facial movement can be achieved by
    manipulation of the vertices of the wire-frame.
n   Head rotation requires the use of simple matrix operations upon the coordinate
    array. Facial expression requires the manipulation of the features controlling the
    vertices.
n   This model based feature codebook approach suffers from the drawback of
    codebook formation.
n   This has to be done off-line and, consequently, the image is required to be
    prerecorded, with a consequent delay.
n   However, the actual image sequence can be sent at a very low data rate. For a
    codebook with 128 entries where 7 bits are required to code each mouth, a
    25 frame/s sequence requires less than 200 bit/s to code the mouth movements.
n   When it is finally implemented, rates as low as 1 kbit/s are confidently expected
    from MBC systems, but they can only transmit image sequences which match the
    stored model, e.g. head and shoulders displays.
    4/2/2003                   Nguyen Chan Hung– Hanoi University of Technology   73   4/2/2003                    Nguyen Chan Hung– Hanoi University of Technology     75
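
The bit-rate claims on these two slides can be checked in a few lines (the template counts, codebook size and frame rate are the ones quoted above):

import math

eye_templates, mouth_templates = 10, 10
combinations = eye_templates * mouth_templates            # 100 expression combinations
address_bits = math.ceil(math.log2(combinations))         # 7 bits suffice to index them

codebook_entries = 128
bits_per_mouth = math.ceil(math.log2(codebook_entries))   # 7 bits per mouth address
frame_rate = 25                                            # frames per second
mouth_bitrate = bits_per_mouth * frame_rate                # 175 bit/s, i.e. under 200 bit/s
print(address_bits, mouth_bitrate)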




Model Based Coding (3)

n   Another common way of representing objects in three-dimensional computer
    graphics is by a net of interconnecting polygons.
n   A model is stored as a set of linked arrays which specify the coordinates of each
    polygon vertex, with the lines connecting the vertices together forming each side
    of a polygon.
n   To make realistic models, the polygon net can be shaded to reflect the presence
    of light sources.
n   The wire-frame model [Welch 1991] can be modified to fit the shape of a person's
    head and shoulders. The wire-frame, composed of over 100 interconnecting
    triangles, can produce subjectively acceptable synthetic images, providing that the
    frame is not rotated by more than 30° from the full-face position.
n   The model (see the Figure) uses smaller triangles in areas associated with high
    degrees of curvature where significant movement is required.
n   Large flat areas, such as the forehead, contain fewer triangles.
n   A second wire-frame is used to model the mouth interior.

Key points:

n   JPEG coding mechanism à DCT / Zigzag Scanning / Adaptive Quantization / VLC
n   MPEG layered structure:
    q   Pixel, Block, Macroblock, Field DCT Coding / Frame DCT Coding, Slice, Picture,
        Group of Pictures (GOP), Sequence, Packetized Elementary Stream (PES)
n   MPEG compression mechanism:
    q   Prediction
    q   Motion compensation
    q   Scanning
    q   YCbCr formats (4:4:4, 4:2:0, etc)
    q   Profiles @ Levels
    q   I, P, B pictures & reordering
    q   Encoder / Decoder process & Block diagram
n   MPEG Data transport
n   MPEG Timing & Buffer control
    q   STC/SCR/DTS
    q   PCR/PTS


    4/2/2003                   Nguyen Chan Hung– Hanoi University of Technology   74   4/2/2003                    Nguyen Chan Hung– Hanoi University of Technology     76
Technical terms                                                              A Brief History:
n   Macro blocks
n   HVS = Human Visual System                                                    q      CATV appeared in the 60s in the US, where high
n   GOP = Group of Pictures                                                             buildings are the great obstacles for the
n   VLC = Variable Length Coding/Coder                                                  propagation of TV signal.
n   IDCT/DCT = (Inverse) Discrete Cosine Transform
n   PES = Packetized Elementary Stream                                           q      Old CATV networks à
n   MP@ML = Main profile @ Main Level                                                   n   Coaxial only
n   PCR = Program Clock Reference                                                       n   Tree-and-Branch only
n   SCR = System Clock Reference
                                                                                        n   TV only
n   STC = System Time Clock
n   PTS = Presentation Time Stamp                                                       n   No return path (à high-pass filters are installed in
n   DTS = Decode Time Stamp                                                                 customer’s houses to block return low frequency noise)
n   PAT = Program Association Table
n   PMT = Program Map Table



4/2/2003             Nguyen Chan Hung– Hanoi University of Technology   77   4/2/2003                  Nguyen Chan Hung– Hanoi University of Technology                 79




Chapter 3. CATV systems                                                      Modern CATV networks
                                                                                                                                                          n   Key elements:
                                                                                                                                                               q   CO or
n   Overview:                                                                                                                                                      Master
                                                                                                                                                                   Headend
      q A brief history                                                                                                                                        q   Headends/
                                                                                                                                                                   Hub
      q Modern CATV networks                                                                                                                                   q   Server
                                                                                                                                                                   complex
                                                                                                                                                               q   CMTS
      q CATV systems and equipment                                                                                                                             q   TV content
                                                                                                                                                                   provider
                                                                                                                                                               q   Optical
                                                                                                                                                                   Nodes
                                                                                                                                                               q   Taps
                                                                                                                                    q   Amplifiers (GNA/TNA/LE)



4/2/2003             Nguyen Chan Hung– Hanoi University of Technology   78   4/2/2003                  Nguyen Chan Hung– Hanoi University of Technology                 80
Modern CATV networks (2)                                                            CATV systems and equipment
n   Based on Hybrid Fiber-Coaxial architecture à also referred to
    as “HFC networks”
n   The optical section is based on modern optical communication
    technologies à
      q    Star/ring/mesh, etc topologies
      q    SDH/SONET for digital fibers
      q    Various architectures à digital, analog or mixed fiber cabling
           systems.
n   Part of forward path spectrum is used for high-speed Internet
    access
n   Return path is exploited for Digital data communication à the
    root of new problems !!
      q    5-60 MHz band for upstream
      q    88-860 MHz band for downstream
            n   88-450 MHz for analog/digital TV channels
            n   450-860 MHz for Internet access
      q    FDM

4/2/2003                    Nguyen Chan Hung– Hanoi University of Technology   81   4/2/2003       Nguyen Chan Hung– Hanoi University of Technology   83
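
As a small illustration of the spectrum split listed above (band edges taken from the bullets; the helper name is made up):

# Hypothetical helper reflecting the HFC frequency plan above (all values in MHz).
def classify_catv_frequency(mhz: float) -> str:
    if 5 <= mhz <= 60:
        return "upstream (return path)"
    if 88 <= mhz <= 450:
        return "downstream: analog/digital TV channels"
    if 450 < mhz <= 860:
        return "downstream: Internet access"
    return "outside the allocated CATV spectrum / guard band"

print(classify_catv_frequency(30))    # upstream
print(classify_catv_frequency(200))   # TV channels
print(classify_catv_frequency(600))   # Internet access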




Spectrum allocation of CATV networks                                                Vocabulary
                                                                                    n   Perception = Su nhan thuc
                                                                                    n   Lap = Phu len




4/2/2003                    Nguyen Chan Hung– Hanoi University of Technology   82   4/2/2003       Nguyen Chan Hung– Hanoi University of Technology   84

Introduction to multimedia technologies and compression techniques

  • 1. Multimedia Technology Introduction (2) n Overview n Multimedia network q Introduction q The Internet was designed in the 60s for low-speed inter- q Chapter 1: Background of compression techniques networks with boring textual applications à High delay, q Chapter 2: Multimedia technologies high jitter. n JPEG q à Multimedia applications require drastic modifications n MPEG-1/MPEG -2 Audio & Video of the INTERNET infrastructure. n MPEG-4 q Many frameworks have been being investigated and n MPEG-7 (brief introduction) deployed to support the next generation multimedia n HDTV (brief introduction) Internet. (e.g. IntServ, DiffServ) n H261/H263 (brief introduction) q In the future, all TVs (and PCs) will be connected to the n Model base coding (MBC) (brief introduction) Internet and freely tuned to any of millions broadcast q Chapter 3: Some real-world systems stations all over the World. n CATV systems q At present, multimedia networks run over ATM (almost n DVB systems obsolete), IPv4, and in the future IPv6 à should guarantee QoS (Quality of Service) !! q Chapter 4: Multimedia Network 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 1 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 3 Introduction Chapter 1: Background of compression n The importance of Multimedia technologies: à Multimedia everywhere !! q On PCs: techniques n Real Player, QuickTime, Windows Media. n Music and Video are free on the INTERNET (mp2, mp3, mp4, asf, mpeg, n Why compression ? mov, ra, ram, mid, DIVX, etc) q For communication: reduce bandwidth in multimedia n Video/Audio Conferences. network applications such as Streaming media, Video-on- n Webcast / Streaming Applications Demand (VOD), Internet Phone n Distance Learning (or Tele-Education) q Digital storage (VCD, DVD, tape, etc) à Reduce size & n Tele-Medicine cost, increase media capacity & quality. n Tele-xxx (Let’s imagine !!) q On TVs and other home electronic devices: n Compression factor or compression ratio n DVB-T/DVB-C/DVB-S (Digital Video Broadcasting – q Ratio between the source data and the compressed data. Terrestrial/Cable/Satellite) à shows MPEG -2 superior quality over traditional analog TV !! (e.g. 10:1) n Interactive TV à Internet applications (Mail, Web, E -commerce) on a TV !! n 2 types of compression: à No need to wait for a PC to startup and shutdown !! n CD/VCD/DVD/Mp3 players q Lossless compression q Also appearing in Handheld devices (3G Mobile phones, wireless PDA) !! q Lossy compression 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 2 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 4
  • 2. Information content and redundancy Lossy Compression n Information rate n The data from the expander is not identical to q Entropy is the measure of information content. the source data but the difference can not be n à Expressed in bits/source output unit (such as bits/pixel). The more information in the signal, the higher the q distinguished auditorily or visually. entropy. q Suitable for audio and video compression. q Lossy compression reduce entropy while lossless q Compression factor is much higher than that of compression does not. lossless. (up to 100:1) n Redundancy q The difference between the information rate and bit n Based on the understanding of rate. psychoacoustic and psychovisual perception. q Usually the information rate is much less than the bit rate. n Can be forced to operate at a fixed q Compression is to eliminate the redundancy. compression factor. 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 5 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 7 Lossless Compression Process of Compression n The data from the decoder is identical to the n Communication (reduce the cost of the data source data. link) q Example: archives resulting from utilities such as q Data ? Compressor (coder) ? transmission pkzip or Gzip channel ? Expander (decoder) ? Data' q Compression factor is around 2:1. n Recording (extend playing time: in proportion n Can not guarantee a fix compression ratio à to compression factor The output data rate is variable à problems q Data ? Compressor (coder) ? Storagedevice for recoding mechanisms or communication (tape, disk, RAM, etc.) ? Expander (decoder) channel. ? Data‘ 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 6 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 8
  • 3. Sampling and quantization Statistical coding: the Huffman code n Why sampling? n Assign short code to the most probable data q Computer can not process analog signal directly. pattern and long code to the less frequent n PCM data pattern. q Sample the analog signal at a constant rate and use a fixed number of bits (usually 8 or 16) to n Bit assignment based on statistic of the represent the samples. source data. q bit rate = sampling rate * number of bits per n The statistics of the data should be known sample prior to the bit assignment. n Quantization q Map the sampled analog signal (generally, infinite precision) to discrete level (finite precision). q Represent each discrete level with a number. 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 9 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 11 Predictive coding Drawbacks of compression n Prediction n Sensitive to data error q Use previous sample(s) to estimate the current q Compression eliminates the redundancy which is essential to making data resistant to errors. sample. q For most signal, the difference of the prediction n Concealment required for real time application Error correction code is required, hence, adds redundancy and actual values is small. à We can use smaller q to the compressed data. number of bits to code the difference while maintaining the same accuracy !! n Artifacts q Artifacts appear when the coder eliminates part of the q Noise is completely unpredictable entropy. n Most codec requires the data being preprocessed or q The higher the compression factor, the more the artifacts. otherwise it may perform badly when the data contains noise. 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 10 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 12
  • 4. A coding example: Clustering color pixels Motion Compensated Prediction n More data in Frame-Differential Coding can n In an image, pixel values are clustered in several be eliminated by comparing the present peaks pixel to the location of the same object in the previous frame. (à not to the n Each cluster representing the color range of one same spatial location in the previous frame) object in the image (e.g. blue sky) n The encoder estimates the motion in the image to find the corresponding area in a n Coding process: previous frame. 1. Separate the pixel values into a limited number of data n The encoder searches for a portion of a previous frame which is similar to the part clusters (e.g., clustered pixels of sky blue or grass green) of the new frame to be transmitted. 2. Send the average color of each cluster and an n It then sends (as side information) a identifying number for each cluster as side information. motion vector telling the decoder what portion of the previous frame it will use to 3. Transmit, for each pixel: predict the new frame. n The number of the average cluster color that it is close to. n It also sends the prediction error so that n Its difference from that average cluster color. (à can be the exact new frame may be reconstituted coded to reduce redundancy since the differences are often n See top figure à without motion similar !!) à Prediction compensation – Bottom figure à With motion compensation 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 13 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 15 Frame-Differential Coding Unpredictable Information n Frame-Differential Coding = prediction from a n Unpredictable information from the previous previous video frame. n A video frame is stored in the encoder for frame: comparison with the present frame à causes 1. Scene change (e.g. background landscape encodinglatency of one frame time. change) n For still images: 2. Newly uncovered information due to object q Data can be sent only for the first instance of a frame q All subsequent prediction error values are zero. motion across a background, or at the edges of a q Retransmit the frame occasionally to allow receivers that panned scene. (e.g. a soccer ’s face uncovered have just been turned on to have a starting point. by a flying ball) n à FDC reduces the information for still images, but leaves significant data for moving images (e.g. a movement of the camera) 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 14 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 16
  • 5. Dealing with unpredictable Information Types of picture transform coding n Scene change n Types of picture coding: q à An Intra-coded picture (MPEG I picture) must be sent q Discrete Fourier (DFT) for a starting point à require more data than Predicted q Karhonen-Loeve picture (P picture) q Walsh-Hadamard q I pictures are sent about twice per second àTheir time and q Lapped orthogonal sending frequency may be adjusted to accommodate scene changes q Discrete Cosine (DCT) à used in MPEG-2 ! q Wavelets à New ! n Uncovered information q Bi-directionally coded type of picture, or B picture. n The differences between transform coding methods: q There must be enough frame storage in the system to wait q The degree of concentration of energy in a few coefficients for the later picture that has the desired information. q The region of influence of each coefficient in the q To limit the amount of decoder’s memory, the encoder reconstructed picture stores pictures and sends the required reference q The appearance and visibility of coding noise due to coarse pictures before sending the B picture. quantization of the coefficients 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 17 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 19 Transform Coding DCT Lossy Coding n Convert spatial image pixel values to n Lossless coding cannot obtain high transform coefficient values compression ratio (4:1 or less) n à the number of coefficients produced is n Lossy coding = discard selective information equal to the number of pixels transformed. so that the reproduction is visually or aurally n Few coefficients contain most of the indistinguishable from the source or having energy in a picture à coefficients may be least artifacts. further coded by lossless entropy coding n Lossy coding can be achieved by: n The transform process concentrates the energy into particular coefficients q Eliminating some DCT coefficients (generally the “low frequency” coefficients ) q Adjusting the quantizing coarseness of the coefficients à better !! 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 18 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 20
  • 6. Masking Run-Level coding n Masking make certain types of coding n "Run-Level" coding = Coding a run-length of noise invisible or inaudible due to some zeros followed by a nonzero level. psycho-visual/acoustical effect. q à Instead of sending all the zero values individually, the length of the run is sent. q In audio, a pure tone will mask energy of higher frequency and also lower frequency (with weaker q Useful for any data with long runs of zeros. effect). q Run lengths are easily encoded by Huffman code q In video, high contrast edges mask random noise. n Noise introduced at low bit rates falls in the frequency, spatial, or temporal regions 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 21 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 23 Variable quantization Key points: n Variable quantization is the main technique of lossy n Compression process coding à greatly reduce bit rate. n Quantization & Sampling n Coarsely quantizing the less significant coefficients in a transform (à less noticeable / low energy / less n Coding: visible/audible) q Lossless & lossy coding n Can be applied to a complete signal or to individual q Frame-Differential Coding frequency components of a transformed signal. q Motion Compensated Prediction n VQ also controls instantaneous bit rate in order to: q Variable quantization q Match average bit rate to a constant channel bit rate. q Run level coding q Prevent buffer overflow or underflow. n Masking 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 22 4/2/2003 Nguyen Chan Hung– Hanoi University of Technology 24
Chapter 2: Multimedia technologies
- Roadmap
  - JPEG
  - MPEG-1/MPEG-2 Video
  - MPEG-1 Layer 3 Audio (mp3)
  - MPEG-4
  - MPEG-7 (brief introduction)
  - HDTV (brief introduction)
  - H261/H263 (brief introduction)
  - Model base coding (MBC) (brief introduction)

JPEG (Joint Photographic Experts Group)
- JPEG encoder
  - Partitions the image into blocks of 8 x 8 pixels.
  - Calculates the Discrete Cosine Transform (DCT) of each block.
  - A quantizer rounds off the DCT coefficients according to the quantization matrix → lossy, but allows for large compression ratios.
  - Produces a series of DCT coefficients using zig-zag scanning.
  - Uses a variable length code (VLC) on these DCT coefficients.
  - Writes the compressed data stream to an output file (*.jpg or *.jpeg).
- JPEG decoder
  - File → input data stream → variable length decoder → IDCT (inverse DCT) → image.

JPEG – Zig-zag scanning
- (Figure only: the zig-zag scan order over the 8 x 8 block of DCT coefficients.)

JPEG - DCT
- The DCT is similar to the Discrete Fourier Transform → it transforms a signal or image from the spatial domain to the frequency domain.
- The DCT requires fewer multiplications than the DFT.
- Input image A:
  - The input image A is N2 pixels wide by N1 pixels high;
  - A(i,j) is the intensity of the pixel in row i and column j.
- Output image B:
  - B(k1,k2) is the DCT coefficient in row k1 and column k2 of the DCT matrix.
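Since the zig-zag figure is not reproduced here, a small sketch of the scan order it depicts (illustrative only; the 4 x 4 demo matrix simply encodes its own scan positions so the output reads 1..16):

    def zigzag_scan(block):
        n = len(block)
        order = sorted(((i, j) for i in range(n) for j in range(n)),
                       key=lambda p: (p[0] + p[1],                       # diagonal index
                                      p[0] if (p[0] + p[1]) % 2 else p[1]))
        return [block[i][j] for i, j in order]

    demo = [[ 1,  2,  6,  7],
            [ 3,  5,  8, 13],
            [ 4,  9, 12, 14],
            [10, 11, 15, 16]]
    print(zigzag_scan(demo))   # [1, 2, 3, ..., 16]

The scan starts at the DC coefficient in the top-left corner and snakes along the diagonals, so that low-frequency coefficients come first and long runs of zero-valued high-frequency coefficients are pushed to the end of the sequence.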
JPEG - Quantization Matrix
- The quantization matrix is the 8 by 8 matrix of step sizes (sometimes called quantums) - one element for each DCT coefficient.
- Usually symmetric.
- Step sizes will be:
  - small in the upper left (low frequencies),
  - large toward the lower right (high frequencies).
  - A step size of 1 is the most precise.
- The quantizer divides the DCT coefficient by its corresponding quantum, then rounds to the nearest integer.
- Large quantums drive small coefficients down to zero.
- The result:
  - Many high-frequency coefficients become zero → easily removed.
  - The low-frequency coefficients undergo only minor adjustment.

JPEG Coding process illustrated
- DCT coefficients (before quantization):

    1255  -15   43   58  -12    1   -4   -6
      11  -65   80  -73  -27   -1   -5    1
     -49   37  -87    8   12    6   10    8
      27  -50   29   13    3   13   -6    5
     -16   21  -11  -10   10  -21    9   -6
       3  -14    0   14  -14   16   -8    4
      -4   -1    8  -13   12   -9    5   -1
      -4    2   -2    6   -7    6   -1    3

- Quantization result (after dividing by the quantization matrix, Q):

      78   -1    4    4   -1    0    0    0
       1   -5    6   -4   -1    0    0    0
      -4    3   -5    0    0    0    0    0
       2   -3    1    0    0    0    0    0
      -1    1    0    0    0    0    0    0
       0    0    0    0    0    0    0    0
       0    0    0    0    0    0    0    0
       0    0    0    0    0    0    0    0

- Zig-zag scan result: 78 -1 1 -4 -5 4 4 6 3 2 -1 -3 -5 -4 -1 0 -1 0 1 1, followed by 44 trailing zeros and EOB → easily coded by run-length Huffman coding.

MPEG (Moving Picture Expert Group)
- MPEG is the heart of:
  - digital television set-top boxes,
  - HDTV decoders,
  - DVD players,
  - video conferencing,
  - Internet video, etc.
- MPEG standards:
  - MPEG-1, MPEG-2, MPEG-4, MPEG-7
  - (The MPEG-3 standard was abandoned and became an extension of MPEG-2.)

MPEG standards
- MPEG-1 (obsolete)
  - A standard for storage and retrieval of moving pictures and audio on storage media.
  - Application: VCD (video compact disc).
- MPEG-2 (widely implemented)
  - A standard for digital television.
  - Applications: DVD (digital versatile disc), HDTV (high definition TV), DVB (European Digital Video Broadcasting Group), etc.
- MPEG-4 (newly implemented – still being researched)
  - A standard for multimedia applications.
  - Applications: Internet, cable TV, virtual studio, etc.
- MPEG-7 (future work – ongoing research)
  - Content representation standard for information search ("Multimedia Content Description Interface").
  - Applications: Internet, video search engines, digital libraries.
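A minimal sketch of the quantize/round step just described (the step-size matrix below is made up for illustration and is not one of the JPEG standard tables; NumPy is assumed):

    import numpy as np

    def quantize(dct_coeffs, q_matrix):
        # divide each coefficient by its step size and round to the nearest integer
        return np.rint(dct_coeffs / q_matrix).astype(int)

    def dequantize(levels, q_matrix):
        # multiply back; the rounding error is the information that was discarded
        return levels * q_matrix

    i, j = np.indices((8, 8))
    q_matrix = 16 + 4 * (i + j)            # step sizes grow with frequency (made-up values)

    rng = np.random.default_rng(0)
    dct_coeffs = np.round(rng.normal(0.0, 40.0, (8, 8)))
    levels = quantize(dct_coeffs, q_matrix)
    print(int((levels == 0).sum()), "of 64 coefficients quantized to zero")

Because the step sizes are largest where the coefficients are least significant, most high-frequency entries collapse to zero, which is what makes the subsequent run-length and Huffman coding so effective.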
MPEG-2 formal standards
- The international standard ISO/IEC 13818-2, "Generic Coding of Moving Pictures and Associated Audio Information".
- ATSC (Advanced Television Systems Committee) document A/54, "Guide to the Use of the ATSC Digital Television Standard".

MPEG video data structure
- The MPEG-2 video data stream is constructed in layers, from lowest to highest, as follows (a structural sketch appears below):
  - PIXEL is the fundamental unit.
  - BLOCK is an 8 x 8 array of pixels.
  - MACROBLOCK consists of 4 luma blocks and 2 chroma blocks (field DCT coding and frame DCT coding).
  - SLICE consists of a variable number of macroblocks.
  - PICTURE consists of a frame (or field) of slices.
  - GROUP OF PICTURES (GOP) consists of a variable number of pictures.
  - SEQUENCE consists of a variable number of GOPs.
  - PACKETIZED ELEMENTARY STREAM (optional).

Pixel & Block
- Pixel = "picture element".
  - A discrete spatial point sample of an image.
  - A color pixel may be represented digitally as a number of bits for each of three primary color values.
- Block
  - An 8 x 8 array of pixels.
  - A block is the fundamental unit for DCT coding (discrete cosine transform).

Macroblock
- A macroblock = a 16 x 16 array of luma (Y) pixels (= 4 blocks = a 2 x 2 block array).
- The number of chroma pixels (Cr, Cb) will vary depending on the chroma pixel structure indicated in the sequence header (e.g. 4:2:0, etc).
- The macroblock is the fundamental unit for motion compensation and will have motion vector(s) associated with it if it is predictively coded.
- A macroblock is classified as:
  - field coded (→ an interlaced frame consists of 2 fields), or
  - frame coded,
  depending on how the four blocks are extracted from the macroblock.
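A purely structural sketch of that layering (illustrative names only, not the bitstream syntax):

    from dataclasses import dataclass, field
    from typing import List, Tuple

    @dataclass
    class Block:                 # 8 x 8 pixel (or coefficient) array: the unit of DCT coding
        coeffs: List[int] = field(default_factory=list)

    @dataclass
    class Macroblock:            # 16 x 16 luma area: 4 luma blocks plus chroma blocks
        luma: List[Block] = field(default_factory=list)
        chroma: List[Block] = field(default_factory=list)
        motion_vectors: List[Tuple[int, int]] = field(default_factory=list)

    @dataclass
    class Slice:                 # a run of macroblocks, typically one row
        macroblocks: List[Macroblock] = field(default_factory=list)

    @dataclass
    class Picture:               # a frame or field of slices; ptype is 'I', 'P' or 'B'
        ptype: str = "I"
        slices: List[Slice] = field(default_factory=list)

    @dataclass
    class GroupOfPictures:       # starts with an I picture
        pictures: List[Picture] = field(default_factory=list)

    @dataclass
    class Sequence:              # sequence header parameters plus a list of GOPs
        gops: List[GroupOfPictures] = field(default_factory=list)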
Slice
- Pictures are divided into slices.
- A slice consists of an arbitrary number of successive macroblocks (going left to right), but is typically an entire row of macroblocks. A slice does not extend beyond one row.
- The slice header carries address information that allows the Huffman decoder to resynchronize at slice boundaries.

Picture
- A source picture is a contiguous rectangular array of pixels.
- A picture may be a complete frame of video ("frame picture") or one of the interlaced fields from an interlaced source ("field picture").
- A field picture does not have any blank lines between its active lines of pixels.
- A coded picture (also called a video access unit) begins with a start code and a header. The header consists of:
  - picture type (I, B, P),
  - temporal reference information,
  - motion vector search range,
  - optional user data.
- A frame picture consists of:
  - a frame from a progressive source, or
  - a frame (2 spatially interlaced fields) from an interlaced source.

I, P, B Pictures
- Encoded pictures are classified into 3 types: I, P, and B.
- I pictures = intra coded pictures
  - All macroblocks are coded without prediction.
  - Needed to give the receiver a "starting point" for prediction after a channel change and to recover from errors.
- P pictures = predicted pictures
  - Macroblocks may be coded with forward prediction from references made from previous I and P pictures, or may be intra coded.
- B pictures = bi-directionally predicted pictures
  - Macroblocks may be coded with forward prediction from previous I or P references.
  - Macroblocks may be coded with backward prediction from the next I or P reference.
  - Macroblocks may be coded with interpolated prediction from past and future I or P references.
  - Macroblocks may be intra coded (no prediction).

Group of pictures (GOP)
- The group of pictures layer is optional in MPEG-2.
- A GOP begins with a start code and a header.
- The header carries:
  - time code information,
  - editing information,
  - optional user data.
- The first encoded picture in a GOP is always an I picture.
- Typical length is 15 pictures with the following structure (in display order):
  - I B B P B B P B B P B B P B B → provides an I picture with sufficient frequency to allow a decoder to decode correctly.
- (Figure: forward motion compensation from I/P anchors and bidirectional motion compensation for B pictures along the sequence I B B P B B P B B P B ..., with time running left to right.)
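A minimal sketch that builds the typical 15-picture display-order pattern quoted above, parameterised by the GOP length and the anchor spacing (often called N and M in MPEG discussions; the names here are illustrative):

    def gop_pattern(n=15, m=3):
        # n = pictures per GOP, m = distance between anchor (I/P) pictures
        pics = []
        for k in range(n):
            if k == 0:
                pics.append("I")
            elif k % m == 0:
                pics.append("P")
            else:
                pics.append("B")
        return " ".join(pics)

    print(gop_pattern())   # I B B P B B P B B P B B P B B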
Sequence
- A sequence begins with a unique 32-bit start code followed by a header.
- The header carries:
  - picture size,
  - aspect ratio,
  - frame rate and bit rate,
  - optional quantizer matrices,
  - required decoder buffer size,
  - chroma pixel structure,
  - optional user data.
- The sequence information is needed for channel changing.
- The sequence length depends on the acceptable channel-change delay.

Packetized Elementary Stream (PES)
- A Video Elementary Stream (video ES) consists of all the video data for a sequence, including the sequence header and all the subparts of a sequence.
- An ES carries only one type of data (video or audio) from a single video or audio encoder.
- A PES consists of a single ES which has been split into packets, each starting with an added packet header.
- A PES stream contains only one type of data from one source, e.g. from one video or audio encoder.
- PES packets have variable length, not corresponding to the fixed packet length of transport packets, and may be much longer than a transport packet.

Transport stream
- Transport packets (fixed length) are formed from a PES stream (a small packetization sketch appears below), including:
  - the PES header,
  - the transport packet header.
  - Successive transport packets' payloads are filled by the remaining PES packet content until the PES packet is all used.
  - The final transport packet is filled to a fixed length by stuffing with 0xFF bytes (all ones).
- Each PES packet header includes:
  - an 8-bit stream ID identifying the source of the payload;
  - timing references: PTS (presentation time stamp), the time at which a decoded audio or video access unit is to be presented by the decoder;
  - DTS (decoding time stamp), the time at which an access unit is decoded by the decoder;
  - ESCR (elementary stream clock reference).

Intra Frame Coding
- Intra coding is concerned only with information within the current frame (not relative to any other frame in the video sequence).
- MPEG intra-frame coding block diagram (see bottom figure) → similar to JPEG (→ review the JPEG coding mechanism!).
- Basic blocks of the intra-frame coder:
  - video filter,
  - discrete cosine transform (DCT),
  - DCT coefficient quantizer,
  - run-length amplitude / variable length coder (VLC).
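An illustrative sketch of the segmentation rule on the Transport stream slide: a variable-length PES packet is cut into fixed 188-byte transport packets, and the last one is padded with 0xFF stuffing. The 4-byte header built here is simplified (sync byte plus the 13-bit PID only), not the full MPEG-2 header syntax:

    TS_PACKET_SIZE = 188
    TS_HEADER_SIZE = 4
    PAYLOAD_SIZE = TS_PACKET_SIZE - TS_HEADER_SIZE   # 184 bytes of payload per packet

    def packetize_pes(pes_packet, pid):
        packets = []
        for offset in range(0, len(pes_packet), PAYLOAD_SIZE):
            payload = pes_packet[offset:offset + PAYLOAD_SIZE]
            payload = payload.ljust(PAYLOAD_SIZE, b"\xff")   # 0xFF stuffing in the last packet
            header = bytes([0x47,                            # sync byte
                            (pid >> 8) & 0x1F,               # top 5 bits of the 13-bit PID
                            pid & 0xFF,                      # low 8 bits of the PID
                            0x10])                           # simplified flags byte
            packets.append(header + payload)
        return packets

    print(len(packetize_pes(bytes(500), pid=0x101)), "transport packets")   # -> 3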
Video Filter
- The Human Visual System (HVS) is:
  - most sensitive to changes in luminance,
  - less sensitive to variations in chrominance.
- MPEG uses the YCbCr color space to represent the data values instead of RGB, where:
  - Y is the luminance signal,
  - Cb is the blue color difference signal,
  - Cr is the red color difference signal.
- What is a "4:4:4", "4:2:0", etc., video format?
  - 4:4:4 is full-bandwidth YCbCr video → each macroblock consists of 4 Y blocks, 4 Cb blocks, and 4 Cr blocks → a waste of bandwidth!
  - 4:2:0 is most commonly used in MPEG-2.

Applications of chroma formats

    chroma_format   Multiplex order (time) within macroblock   Application
    4:2:0           YYYYCbCr (6 blocks)                        Mainstream television; consumer entertainment
    4:2:2           YYYYCbCrCbCr (8 blocks)                    Studio production environments; professional editing equipment
    4:4:4           YYYYCbCrCbCrCbCrCbCr (12 blocks)           Computer graphics

MPEG Profiles & levels
- MPEG-2 is classified into several profiles.
- Main profile features:
  - 4:2:0 chroma sampling format,
  - I, P, and B pictures,
  - non-scalable.
- The Main Profile is subdivided into levels:
  - MP@ML (Main Profile @ Main Level):
    - designed around the CCIR 601 standard for interlaced standard digital video;
    - 720 x 576 (PAL) or 720 x 483 (NTSC);
    - 30 Hz progressive, 60 Hz interlaced;
    - maximum bit rate is 15 Mbit/s.
  - MP@HL (Main Profile @ High Level), upper bounds:
    - 1152 x 1920, 60 Hz progressive;
    - 80 Mbit/s.

MPEG encoder/decoder
- (Figure only: MPEG encoder and decoder block diagrams.)
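As a small illustration of the YCbCr representation behind the Video Filter slide, here is a BT.601-style full-range RGB-to-YCbCr conversion (a sketch; broadcast systems add offsets and restricted ranges on top of this):

    def rgb_to_ycbcr(r, g, b):
        y  = 0.299 * r + 0.587 * g + 0.114 * b   # luminance
        cb = 0.564 * (b - y)                     # blue colour difference
        cr = 0.713 * (r - y)                     # red colour difference
        return y, cb, cr

    print(rgb_to_ycbcr(255, 0, 0))   # saturated red: modest Y, negative Cb, large positive Cr

The point of the slide is visible in the weights: most of the luminance comes from green and red, and because the HVS tolerates chrominance errors better, the Cb/Cr planes can be sub-sampled (4:2:0) with little visible loss.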
Prediction
- Backward prediction is done by storing pictures until the desired anchor picture is available before encoding the currently stored frames.
- The encoder can decide to use:
  - forward prediction from a previous picture,
  - backward prediction from a following picture, or
  - interpolated prediction,
  → whichever minimizes the prediction error.
- The encoder must transmit pictures in an order different from that of the source pictures, so that the decoder has the anchor pictures before decoding predicted pictures (see the reordering below).
- The decoder must have two frame stores.

I P B Picture Reordering
- Pictures are coded and decoded in a different order than they are displayed.
- → Due to bidirectional prediction for B pictures.
- For example, with a 12-picture GOP:
  - Source order and encoder input order:
    I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
  - Encoding order and order in the coded bitstream:
    I(1) P(4) B(2) B(3) P(7) B(5) B(6) P(10) B(8) B(9) I(13) B(11) B(12)
  - Decoder output order and display order (same as input):
    I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)

DCT and IDCT formulas
- DCT:
  - Eq. 1 → normal form
  - Eq. 2 → matrix form
- IDCT:
  - Eq. 3 → normal form
  - Eq. 4 → matrix form
- Where:
  - F(u,v) = the two-dimensional N x N DCT;
  - u, v, x, y = 0, 1, 2, ..., N-1;
  - x, y are spatial coordinates in the sample domain;
  - u, v are frequency coordinates in the transform domain;
  - C(u), C(v) = 1/sqrt(2) for u, v = 0;
  - C(u), C(v) = 1 otherwise.
- (The standard forms consistent with these definitions are reproduced after the next slide.)

DCT versus DFT
- The DCT is conceptually similar to the DFT, except:
  - The DCT concentrates energy into lower-order coefficients better than the DFT.
  - The DCT is purely real; the DFT is complex (magnitude and phase).
  - A DCT operation on a block of pixels produces coefficients that are similar to the frequency-domain coefficients produced by a DFT operation.
  - An N-point DCT has the same frequency resolution as a 2N-point DFT.
  - The N frequencies of a 2N-point DFT correspond to N points on the upper half of the unit circle in the complex frequency plane.
  - Assuming a periodic input, the magnitude of the DFT coefficients is spatially invariant (the phase of the input does not matter). This is not true for the DCT.
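The equation images on the "DCT and IDCT formulas" slide are not reproduced here; for reference, the standard normal forms consistent with the definitions given there (using the usual 2/N normalization, with f(x,y) denoting the input samples) are:

    \[
    F(u,v) \;=\; \frac{2}{N}\,C(u)\,C(v)
    \sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x,y)\,
    \cos\!\frac{(2x+1)u\pi}{2N}\,
    \cos\!\frac{(2y+1)v\pi}{2N}
    \]

    \[
    f(x,y) \;=\; \frac{2}{N}
    \sum_{u=0}^{N-1}\sum_{v=0}^{N-1} C(u)\,C(v)\,F(u,v)\,
    \cos\!\frac{(2x+1)u\pi}{2N}\,
    \cos\!\frac{(2y+1)v\pi}{2N}
    \]

with C(0) = 1/sqrt(2) and C(k) = 1 for k > 0. The corresponding matrix forms (Eq. 2 and Eq. 4) can be written F = T f T^T and f = T^T F T, where T is the N x N orthonormal DCT basis matrix.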
Quantization matrix
- Note → the quantization step sizes for the DCT coefficients are:
  - small in the upper left (low frequencies),
  - large toward the lower right (high frequencies),
  → recall the JPEG mechanism!
- Why?
  - The HVS is less sensitive to errors in high-frequency coefficients than it is for lower frequencies.
  - → Higher frequencies should be more coarsely quantized!

Result DCT matrix (example)
- After adaptive quantization, the result is a matrix containing many zeros.
- (Figure only: the example quantized matrix used in the Huffman/run-level example that follows.)

MPEG scanning
- Left → zig-zag scanning (like JPEG).
- Right → alternate scanning → better for interlaced frames!
- (Figure only: the two scan patterns.)

Huffman / Run-Level Coding
- Huffman coding, in combination with run-level coding and zig-zag scanning, is applied to the quantized DCT coefficients.
- "Run-Level" = a run-length of zeros followed by a non-zero level.
- Huffman coding is also applied to various types of side information.
- A Huffman code is an entropy code which optimally achieves the shortest possible average code word length for a source.
- → This average code word length is >= the entropy of the source.
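To show why the most frequent run/level pairs receive the shortest code words, here is a minimal Huffman construction using the classic two-least-probable merge (the symbols and frequencies are made up for illustration; real MPEG tables are fixed by the standard):

    import heapq
    from itertools import count

    def huffman_code(freqs):
        tie = count()                                  # tie-breaker keeps heap entries comparable
        heap = [(f, next(tie), {s: ""}) for s, f in freqs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)            # two least probable subtrees
            f2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + code for s, code in c1.items()}
            merged.update({s: "1" + code for s, code in c2.items()})
            heapq.heappush(heap, (f1 + f2, next(tie), merged))
        return heap[0][2]

    # run/level pairs with made-up frequencies: the common (0, 1) pair gets the shortest word
    print(huffman_code({(0, 1): 40, (0, 2): 20, (1, 1): 15, (0, 3): 10, (2, 1): 10, "EOB": 5}))

The resulting average code word length is never shorter than the source entropy, which is the bound quoted on the slide above.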
Huffman / Run-Level coding illustrated
- Using the quantized DCT output matrix from the "Result DCT matrix (example)" slide, after zig-zag scanning the output is the sequence: 4, 4, 2, 2, 2, 1, 1, 1, 1, 0 (12 zeros), 1, 0 (41 zeros).
- These values are looked up in a fixed table of variable length codes:
  - → the most probable occurrence is given a relatively short code,
  - → the least probable occurrence is given a relatively long code.

    Zero Run-Length   Amplitude      Code Value
    N/A               8 (DC value)   110 1000
    0                 4              00001100
    0                 4              00001100
    0                 2              01000
    0                 2              01000
    0                 2              01000
    0                 1              110
    0                 1              110
    0                 1              110
    0                 1              110
    12                1              0010 0010 0
    EOB               EOB            10

Huffman / Run-Level coding illustrated (2)
- → The first run of 12 zeros has been efficiently coded by only 9 bits.
- → The last run of 41 zeros has been entirely eliminated, represented only by a 2-bit End Of Block (EOB) indicator.
- → The quantized DCT coefficients are now represented by a sequence of 61 binary bits (see the table).
- Considering that the original 8 x 8 block of 8-bit pixels required 512 bits for full representation, → the compression ratio is approx. 8.4:1.

MPEG Data Transport
- MPEG packages all data into fixed-size 188-byte packets for transport.
- Video or audio payload data placed in PES packets is broken up into fixed-length transport packet payloads.
- A PES packet may be much longer than a transport packet → segmentation is required:
  - The PES header is placed immediately following a transport header.
  - Successive portions of the PES packet are then placed in the payloads of transport packets.
  - Remaining space in the final transport packet payload is filled with stuffing bytes = 0xFF (all ones).
  - Each transport packet starts with a sync byte = 0x47.
  - In the ATSC US terrestrial DTV VSB transmission system, the sync byte is not processed, but is replaced by a different sync symbol especially suited to RF transmission.
  - The transport packet header contains a 13-bit PID (packet ID), which corresponds to a particular elementary stream of video, audio, or other program element.
  - PID 0x0000 is reserved for transport packets carrying a program association table (PAT).
  - The PAT points to a Program Map Table (PMT) → which points to the particular elements of a program.

MPEG Transport packet
- Adaptation field:
  - 8 bits specifying the length of the adaptation field.
  - The first group of flags consists of eight 1-bit flags:
    - discontinuity_indicator
    - random_access_indicator
    - elementary_stream_priority_indicator
    - PCR_flag
    - OPCR_flag
    - splicing_point_flag
    - transport_private_data_flag
    - adaptation_field_extension_flag
  - The optional fields are present if indicated by one of the preceding flags.
  - The remainder of the adaptation field is filled with stuffing bytes (0xFF, all ones).
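The sync byte and the 13-bit PID are the two fields the demultiplexer keys on; a minimal parsing sketch of the fixed 4-byte transport packet header (simplified, ignoring scrambling and adaptation-field details):

    def parse_ts_header(packet):
        if len(packet) != 188 or packet[0] != 0x47:
            raise ValueError("not an aligned MPEG-2 transport packet")
        return {
            "transport_error": bool(packet[1] & 0x80),
            "payload_unit_start": bool(packet[1] & 0x40),
            "pid": ((packet[1] & 0x1F) << 8) | packet[2],   # 13-bit packet ID
            "continuity_counter": packet[3] & 0x0F,
        }

    pkt = bytes([0x47, 0x41, 0x01, 0x1A]) + bytes(184)
    print(hex(parse_ts_header(pkt)["pid"]))   # 0x101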
Demultiplexing a Transport Stream (TS)
- Demultiplexing a transport stream involves:
  1. Finding the PAT by selecting packets with PID = 0x0000.
  2. Reading the PIDs for the PMTs.
  3. Reading the PIDs for the elements of a desired program from its PMT (for example, a basic program will have a PID for audio and a PID for video).
  4. Detecting packets with the desired PIDs and routing them to the decoders.
- An MPEG-2 transport stream can carry:
  - video streams,
  - audio streams,
  - any type of data,
  → the MPEG-2 TS is the packet format for CATV downstream data communication.

Timing & buffer control
- Point A: encoder input → constant/specified rate
- Point B: encoder output → variable rate
- Point C: encoder buffer output → constant rate
- Point D: communication channel + decoder buffer → constant rate
- Point E: decoder input → variable rate
- Point F: decoder output → constant/specified rate

Timing - Synchronization
- The decoder is synchronized with the encoder by time stamps.
- The encoder contains a master oscillator and counter, called the System Time Clock (STC). (See the previous block diagram.)
  - → The STC belongs to a particular program and is the master clock of the video and audio encoders for that program.
  - → Multiple programs, each with its own STC, can also be multiplexed into a single stream.
- A program component can even have no time stamps → but then it cannot be synchronized with other components.
- At the encoder input (Point A), the time of occurrence of an input video picture or audio block is noted by sampling the STC.
- A total (constant) encoder-plus-decoder buffer delay is added to the STC sample, creating a Presentation Time Stamp (PTS).
  - → The PTS is then inserted in the first of the packet(s) representing that picture or audio block, at Point B.

Timing – Synchronization (2)
- A Decode Time Stamp (DTS) can optionally be combined into the bit stream → it represents the time at which the data should be taken instantaneously from the decoder buffer and decoded.
  - DTS and PTS are identical except in the case of picture reordering for B pictures.
  - The DTS is only used where it is needed because of reordering. Whenever DTS is used, PTS is also coded.
  - PTS (or DTS) insertion interval = 700 ms.
  - In ATSC → PTS (or DTS) must be inserted at the beginning of each coded picture (access unit).
- In addition, the output of the encoder buffer (Point C) is time stamped with System Time Clock (STC) values, called:
  - the System Clock Reference (SCR) in a Program Stream,
  - the Program Clock Reference (PCR) in a Transport Stream.
- PCR time stamp interval = 100 ms.
- SCR time stamp interval = 700 ms.
- The PCR and/or the SCR are used to synchronize the decoder STC with the encoder STC.
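A minimal sketch of the time-stamping rule just described: the encoder samples its STC when a picture arrives at Point A and adds the constant end-to-end buffer delay to form the PTS. The 90 kHz tick unit and the 33-bit wrap used below are assumptions of this sketch (common for PTS/DTS fields) rather than values stated on the slides:

    PTS_WRAP = 1 << 33

    def make_pts(stc_at_input, end_to_end_delay_ticks):
        # constant encoder + decoder buffer delay added to the sampled STC
        return (stc_at_input + end_to_end_delay_ticks) % PTS_WRAP

    frame_period = 90_000 // 25          # one 25 Hz frame period in 90 kHz ticks
    delay = 15 * frame_period            # assumed constant buffer delay
    for n in range(3):
        stc = n * frame_period           # STC sampled at Point A for frame n
        print(n, make_pts(stc, delay))

Because the added delay is constant, the decoder that presents each access unit at its PTS automatically reproduces the encoder input timing, which is the fixed end-to-end delay property discussed on the next slide.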
Timing – Synchronization (3)
- All video and audio streams included in a program must get their time stamps from a common STC, so that synchronization of the video and audio decoders with each other may be accomplished.
- The data rate and packet rate on the channel (at the multiplexer output) can be completely asynchronous with the System Time Clock (STC).
- PCR time stamps allow synchronization of different multiplexed programs having different STCs, while allowing STC recovery for each program.
- If there is no buffer underflow or overflow → the delays in the buffers and transmission channel for both video and audio are constant.
- The encoder input and decoder output run at equal and constant rates.
- Fixed end-to-end delay from encoder input to decoder output.
- If exact synchronization is not required, the decoder clock can be free running → video frames can be repeated or skipped as necessary to prevent buffer underflow or overflow, respectively.

HDTV (High definition television)
- High definition television (HDTV) first came to public attention in 1981, when NHK, the Japanese broadcasting authority, first demonstrated it in the United States.
- HDTV is defined by the ITU-R as:
  - 'A system designed to allow viewing at about three times the picture height, such that the system is virtually, or nearly, transparent to the quality or portrayal that would have been perceived in the original scene ... by a discerning viewer with normal visual acuity.'

HDTV (2)
- HDTV proposals are for a screen which is wider than the conventional TV image by about 33%. It is generally agreed that the HDTV aspect ratio will be 16:9, as opposed to the 4:3 ratio of conventional TV systems. This ratio has been chosen because psychological tests have shown that it best matches the human visual field.
- It also enables use of existing cinema film formats as additional source material, since this is the same aspect ratio used in normal 35 mm film. Figure 16.6(a) shows how the aspect ratio of HDTV compares with that of conventional television, using the same resolution, or the same surface area, as the comparison metric.
- To achieve the improved resolution, the video image used in HDTV must contain over 1000 lines, as opposed to the 525 and 625 provided by the existing NTSC and PAL systems. This gives a much improved vertical resolution. The exact value is chosen to be a simple multiple of one or both of the vertical resolutions used in conventional TV.
- However, due to the higher scan rates, the bandwidth requirement for analogue HDTV is approximately 12 MHz, compared to the nominal 6 MHz of conventional TV.

HDTV (3)
- The introduction of a non-compatible TV transmission format for HDTV would require the viewer either to buy a new receiver, or to buy a converter to receive the picture on their old set.
- The initial thrust in Japan was towards an HDTV format which is compatible with conventional TV standards, and which can be received by conventional receivers, with conventional quality. However, to get the full benefit of HDTV, a new wide-screen, high-resolution receiver has to be purchased.
- One of the principal reasons that HDTV is not already common is that a general standard has not yet been agreed. The 26th CCIR plenary assembly recommended the adoption of a single, worldwide standard for high definition television.
- Unfortunately, Japan, Europe and North America are all investing significant time and money in their own systems based on their own, current, conventional TV standards and other national considerations.
H261 - H263
- The H.261 algorithm was developed for the purpose of image transmission rather than image storage.
- It is designed to produce a constant output of p x 64 kbit/s, where p is an integer in the range 1 to 30.
  - This allows transmission over a digital network or data link of varying capacity.
  - It also allows transmission over a single 64 kbit/s digital telephone channel for low-quality video-telephony, or at higher bit rates for improved picture quality.
- The basic coding algorithm is similar to that of MPEG in that it is a hybrid of motion compensation, DCT and straightforward DPCM (intra-frame coding mode), without the MPEG I, P, B frames.
- The DCT operation is performed at a low level on 8 x 8 blocks of error samples from the predicted luminance pixel values, with sub-sampled blocks of chrominance data.

H261-H263 (2)
- (Figure only: the H.261 diagram referred to on the next slide.)

H261-H263 (3)
- H.261 is widely used on 176 x 144 pixel images.
- The ability to select a range of output rates for the algorithm allows it to be used in different applications.
- Low output rates (p = 1 or 2) are only suitable for face-to-face (videophone) communication. H.261 is thus the standard used in many commercial videophone systems such as the UK BT/Marconi Relate 2000 and the US ATT 2500 products.
- Video-conferencing would require a greater output data rate (p > 6) and might go as high as 2 Mbit/s for high-quality transmission with larger image sizes.
- A further development of H.261 is H.263, for lower fixed transmission rates.
- This deploys arithmetic coding in place of the variable length coding (see the H.261 diagram); with other modifications, the data rate is reduced to only 20 kbit/s.

Model Based Coding (MBC)
- At the very low bit rates (20 kbit/s or less) associated with video telephony, the requirements for image transmission stretch the compression techniques described earlier to their limits.
- In order to achieve the necessary degree of compression, they often require a reduction in spatial resolution or even the elimination of frames from the sequence.
- Model based coding (MBC) attempts to exploit a greater degree of redundancy in images than current techniques, in order to achieve significant image compression but without adversely degrading the image content information.
- It relies upon the fact that image quality is largely subjective. Providing that the appearance of scenes within an observed image is kept at a visually acceptable level, it may not matter that the observed image is not a precise reproduction of reality.
Model Based Coding (2)
- One MBC method for producing an artificial image of a head sequence utilizes a feature codebook, where a range of facial expressions, sufficient to create an animation, is generated from sub-images or templates which are joined together to form a complete face.
- The most important areas of a face, for conveying an expression, are the eyes and mouth; hence the objective is to create an image in which the movement of the eyes and mouth is a convincing approximation to the movements of the original subject.
- When forming the synthetic image, the feature template vectors which form the closest match to those of the original moving sequence are selected from the codebook and then transmitted as low bit rate coded addresses.
- By using only 10 eye and 10 mouth templates, for instance, a total of 100 combinations exists, implying that only a 7-bit codebook address need be transmitted.
- It has been found that there are only 13 visually distinct mouth shapes for vowel and consonant formation during speech.
- However, the number of mouth sub-images is usually increased, to include intermediate expressions and hence avoid step changes in the image.

Model Based Coding (3)
- Another common way of representing objects in three-dimensional computer graphics is by a net of interconnecting polygons.
- A model is stored as a set of linked arrays which specify the coordinates of each polygon vertex, with the lines connecting the vertices together forming each side of a polygon.
- To make realistic models, the polygon net can be shaded to reflect the presence of light sources.
- The wire-frame model [Welch 1991] can be modified to fit the shape of a person's head and shoulders. The wire-frame, composed of over 100 interconnecting triangles, can produce subjectively acceptable synthetic images, providing that the frame is not rotated by more than 30 degrees from the full-face position.
- The model (see the figure) uses smaller triangles in areas associated with high degrees of curvature, where significant movement is required.
- Large flat areas, such as the forehead, contain fewer triangles.
- A second wire-frame is used to model the mouth interior.

Model based coding (4)
- A synthetic image is created by texture mapping detail from an initial full-face source image over the wire-frame. Facial movement can be achieved by manipulation of the vertices of the wire-frame.
- Head rotation requires the use of simple matrix operations upon the coordinate array. Facial expression requires the manipulation of the features controlling the vertices.
- This model based feature codebook approach suffers from the drawback of codebook formation.
- This has to be done off-line and, consequently, the image is required to be prerecorded, with a consequent delay.
- However, the actual image sequence can be sent at a very low data rate. For a codebook with 128 entries, where 7 bits are required to code each mouth, a 25 frame/s sequence requires less than 200 bit/s to code the mouth movements.
- When it is finally implemented, rates as low as 1 kbit/s are confidently expected from MBC systems, but they can only transmit image sequences which match the stored model, e.g. head-and-shoulders displays.

Key points:
- JPEG coding mechanism → DCT / zig-zag scanning / adaptive quantization / VLC
- MPEG layered structure:
  - Pixel, Block, Macroblock, Field DCT Coding / Frame DCT Coding, Slice, Picture, Group of Pictures (GOP), Sequence, Packetized Elementary Stream (PES)
- MPEG compression mechanism:
  - Prediction
  - Motion compensation
  - Scanning
  - YCbCr formats (4:4:4, 4:2:0, etc)
  - Profiles @ Levels
  - I, P, B pictures & reordering
  - Encoder/decoder process & block diagram
- MPEG data transport
- MPEG timing & buffer control
  - STC/SCR/DTS
  - PCR/PTS
Technical terms
- Macroblocks
- HVS = Human Visual System
- GOP = Group of Pictures
- VLC = Variable Length Coding/Coder
- IDCT/DCT = (Inverse) Discrete Cosine Transform
- PES = Packetized Elementary Stream
- MP@ML = Main Profile @ Main Level
- PCR = Program Clock Reference
- SCR = System Clock Reference
- STC = System Time Clock
- PTS = Presentation Time Stamp
- DTS = Decode Time Stamp
- PAT = Program Association Table
- PMT = Program Map Table

Chapter 3. CATV systems
- Overview:
  - A brief history
  - Modern CATV networks
  - CATV systems and equipment

A Brief History
- CATV appeared in the 1960s in the US, where tall buildings were major obstacles to the propagation of TV signals.
- Old CATV networks →
  - coaxial only,
  - tree-and-branch only,
  - TV only,
  - no return path (→ high-pass filters are installed in customers' houses to block return low-frequency noise).

Modern CATV networks
- Key elements:
  - CO or Master Headend
  - Headends / Hubs
  - Server complex
  - CMTS
  - TV content provider
  - Optical nodes
  - Taps
  - Amplifiers (GNA/TNA/LE)
Modern CATV networks (2)
- Based on a Hybrid Fiber-Coaxial architecture → also referred to as "HFC networks".
- The optical section is based on modern optical communication technologies →
  - star/ring/mesh, etc., topologies,
  - SDH/SONET for digital fibers,
  - various architectures → digital, analog or mixed fiber cabling systems.
- Part of the forward-path spectrum is used for high-speed Internet access.
- The return path is exploited for digital data communication → the root of new problems!
  - 5-60 MHz band for upstream
  - 88-860 MHz band for downstream
    - 88-450 MHz for analog/digital TV channels
    - 450-860 MHz for Internet access
  - FDM

Spectrum allocation of CATV networks
- (Figure only: the upstream/downstream frequency plan.)

CATV systems and equipment
- (Figure only.)

Vocabulary
- Perception = Su nhan thuc (Vietnamese gloss: perception, awareness)
- Lap = Phu len (Vietnamese gloss: to overlap / lay over)