Multimedia Technology
- Overview
  - Introduction
  - Chapter 1: Background of compression techniques
  - Chapter 2: Multimedia technologies
    - JPEG
    - MPEG-1/MPEG-2 Audio & Video
    - MPEG-4
    - MPEG-7 (brief introduction)
    - HDTV (brief introduction)
    - H.261/H.263 (brief introduction)
    - Model-based coding (MBC) (brief introduction)
  - Chapter 3: Some real-world systems
    - CATV systems
    - DVB systems
  - Chapter 4: Multimedia networks
Introduction to multimedia technologies and compression techniques

Multimedia networks
- The Internet was designed in the 1960s for low-speed inter-networks with purely textual applications → high delay, high jitter.
- → Multimedia applications require drastic modifications of the Internet infrastructure.
- Many frameworks have been investigated and deployed to support the next-generation multimedia Internet (e.g. IntServ, DiffServ).
- In the future, all TVs (and PCs) will be connected to the Internet and freely tuned to any of millions of broadcast stations all over the world.
- At present, multimedia networks run over ATM (almost obsolete) and IPv4, and in the future IPv6 → they should guarantee QoS (Quality of Service)!

4/2/2003 · Nguyen Chan Hung – Hanoi University of Technology
Introduction
- The importance of multimedia technologies: → multimedia is everywhere!
  - On PCs:
    - Real Player, QuickTime, Windows Media.
    - Music and video are freely available on the Internet (mp2, mp3, mp4, asf, mpeg, mov, ra, ram, mid, DivX, etc.).
    - Video/audio conferencing.
    - Webcast / streaming applications.
    - Distance learning (tele-education).
    - Tele-medicine.
    - Tele-xxx (let's imagine!).
  - On TVs and other home electronic devices:
    - DVB-T/DVB-C/DVB-S (Digital Video Broadcasting – Terrestrial/Cable/Satellite) → shows the superior quality of MPEG-2 over traditional analog TV.
    - Interactive TV → Internet applications (mail, Web, e-commerce) on a TV, with no need to wait for a PC to start up and shut down.
    - CD/VCD/DVD/MP3 players.
  - Also appearing in handheld devices (3G mobile phones, wireless PDAs)!

Chapter 1: Background of compression techniques
- Why compression?
  - For communication: reduce bandwidth in multimedia network applications such as streaming media, Video-on-Demand (VOD), and Internet telephony.
  - For digital storage (VCD, DVD, tape, etc.) → reduce size and cost, increase media capacity and quality.
- Compression factor (or compression ratio)
  - The ratio between the size of the source data and the compressed data (e.g. 10:1).
- Two types of compression:
  - Lossless compression
  - Lossy compression
Information content and redundancy
- Information rate
  - Entropy is the measure of information content, expressed in bits per source output unit (such as bits/pixel).
  - The more information in the signal, the higher the entropy.
  - Lossy compression reduces entropy, while lossless compression does not.
- Redundancy
  - The difference between the information rate and the bit rate.
  - Usually the information rate is much less than the bit rate.
  - Compression works by eliminating the redundancy.

Lossy Compression
- The data from the expander is not identical to the source data, but the difference cannot be distinguished auditorily or visually.
  - Suitable for audio and video compression.
  - The compression factor is much higher than that of lossless compression (up to 100:1).
- Based on an understanding of psychoacoustic and psychovisual perception.
- Can be forced to operate at a fixed compression factor.
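The entropy measure above can be made concrete. A minimal Python sketch (the sample data is hypothetical) computing Shannon entropy in bits per sample:

```python
from collections import Counter
from math import log2

def entropy(samples):
    """Shannon entropy in bits per sample: H = -sum(p * log2(p))."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# A flat region of an 8-bit image: every pixel identical -> zero entropy,
# so the 8 bits/pixel of the raw representation are pure redundancy.
flat = [128] * 64
# A region with four equally likely values carries more information.
mixed = [0, 64, 128, 192] * 16

print(entropy(flat))   # 0.0
print(entropy(mixed))  # 2.0
```

With 8 bits spent per pixel but only 2 bits of information per pixel, the difference (6 bits/pixel) is the redundancy a lossless coder can remove.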
Lossless Compression
- The data from the decoder is identical to the source data.
  - Example: archives produced by utilities such as pkzip or gzip.
  - The compression factor is around 2:1.
- Cannot guarantee a fixed compression ratio → the output data rate is variable → problems for recording mechanisms or communication channels.

Process of Compression
- Communication (reduce the cost of the data link):
  - Data → Compressor (coder) → transmission channel → Expander (decoder) → Data′
- Recording (extend playing time, in proportion to the compression factor):
  - Data → Compressor (coder) → Storage device (tape, disk, RAM, etc.) → Expander (decoder) → Data′
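The lossless round trip above can be demonstrated with Python's `zlib` (a DEFLATE implementation, in the same family as gzip). The sample text is hypothetical and deliberately redundant, so the factor here exceeds the ~2:1 typical for general data:

```python
import zlib

# Text with obvious redundancy compresses well; the decoder output is
# bit-for-bit identical to the source (lossless).
source = b"multimedia " * 200
packed = zlib.compress(source)

assert zlib.decompress(packed) == source      # identical reconstruction
ratio = len(source) / len(packed)
print(f"compression factor {ratio:.1f}:1")
```

Note that `ratio` depends entirely on the input's redundancy, which is exactly why lossless coding cannot guarantee a fixed output rate.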
Sampling and quantization
- Why sampling?
  - Computers cannot process analog signals directly.
- PCM
  - Sample the analog signal at a constant rate and use a fixed number of bits (usually 8 or 16) to represent each sample.
  - bit rate = sampling rate × number of bits per sample
- Quantization
  - Map the sampled analog signal (generally of infinite precision) to discrete levels (finite precision).
  - Represent each discrete level with a number.

Statistical coding: the Huffman code
- Assign short codes to the most probable data patterns and long codes to the less frequent patterns.
- Bit assignment is based on the statistics of the source data.
- The statistics of the data must be known prior to the bit assignment.
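As a sketch of the idea, a minimal Huffman code builder in Python (the input string is a toy source; real coders work on symbol statistics such as quantized coefficient values):

```python
import heapq
from collections import Counter

def huffman_code(data):
    """Build a Huffman code table {symbol: bitstring} from symbol statistics."""
    freq = Counter(data)
    if len(freq) == 1:                      # degenerate single-symbol source
        return {next(iter(freq)): "0"}
    # Heap of (weight, tiebreak, {symbol: code-so-far}) entries.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)     # two least probable subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, count, merged))
        count += 1
    return heap[0][2]

table = huffman_code("aaaabbc")
# The most frequent symbol 'a' gets the shortest code word.
assert all(len(table["a"]) <= len(table[s]) for s in "abc")
print(table)
```

The resulting code is prefix-free, so the decoder needs no separators between code words.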
Predictive coding
- Prediction
  - Use previous sample(s) to estimate the current sample.
  - For most signals, the difference between the predicted and actual values is small → we can use a smaller number of bits to code the difference while maintaining the same accuracy!
  - Noise is completely unpredictable.
- Most codecs require the data to be preprocessed, otherwise they may perform badly when the data contains noise.

Drawbacks of compression
- Sensitive to data errors
  - Compression eliminates the redundancy that is essential to making data resistant to errors.
- Concealment required for real-time applications
  - An error-correction code is required, which adds redundancy back to the compressed data.
- Artifacts
  - Artifacts appear when the coder eliminates part of the entropy.
  - The higher the compression factor, the more artifacts.
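A minimal sketch of previous-sample prediction (DPCM-style; the signal values are hypothetical):

```python
def dpcm_encode(samples):
    """Previous-sample prediction: transmit the first sample, then differences."""
    out = [samples[0]]
    out += [cur - prev for prev, cur in zip(samples, samples[1:])]
    return out

def dpcm_decode(residuals):
    """Reverse: a running sum reconstructs the original samples exactly."""
    samples = [residuals[0]]
    for d in residuals[1:]:
        samples.append(samples[-1] + d)
    return samples

# A slowly varying signal: residuals are small and need fewer bits to code.
signal = [100, 102, 103, 103, 105, 104]
residuals = dpcm_encode(signal)
print(residuals)                       # [100, 2, 1, 0, 2, -1]
assert dpcm_decode(residuals) == signal
```

The residuals fit in far fewer bits than the raw samples, while the reconstruction stays exact; a noisy signal would defeat this, since noise cannot be predicted.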
A coding example: clustering color pixels
- In an image, pixel values are clustered in several peaks.
- Each cluster represents the color range of one object in the image (e.g. blue sky).
- Coding process:
  1. Separate the pixel values into a limited number of data clusters (e.g. clustered pixels of sky blue or grass green).
  2. Send the average color of each cluster and an identifying number for each cluster as side information.
  3. Transmit, for each pixel:
     - The number of the average cluster color it is closest to.
     - Its difference from that average cluster color (→ can be coded to reduce redundancy, since the differences are often similar!) → prediction.

Motion Compensated Prediction
- More data in frame-differential coding can be eliminated by comparing the present pixel to the location of the same object in the previous frame (→ not to the same spatial location in the previous frame).
- The encoder estimates the motion in the image to find the corresponding area in a previous frame.
- The encoder searches for a portion of a previous frame that is similar to the part of the new frame to be transmitted.
- It then sends (as side information) a motion vector telling the decoder which portion of the previous frame to use to predict the new frame.
- It also sends the prediction error so that the exact new frame can be reconstituted.
- See the top figure → without motion compensation; bottom figure → with motion compensation.
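The motion-vector search described above can be sketched as exhaustive block matching (a toy example; real encoders use larger blocks such as 16 × 16 macroblocks and much faster search strategies):

```python
def sad(prev, cur_block, top, left):
    """Sum of absolute differences between cur_block and the same-sized
    window of `prev` whose top-left corner is (top, left)."""
    h, w = len(cur_block), len(cur_block[0])
    return sum(abs(prev[top + i][left + j] - cur_block[i][j])
               for i in range(h) for j in range(w))

def best_motion_vector(prev, cur_block, top, left, search=4):
    """Exhaustive block matching: try every offset within +/- search pixels
    and keep the one with the minimum SAD (prediction error)."""
    h, w = len(cur_block), len(cur_block[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if 0 <= y and 0 <= x and y + h <= len(prev) and x + w <= len(prev[0]):
                cost = sad(prev, cur_block, y, x)
                if best is None or cost < best[0]:
                    best = (cost, (dy, dx))
    return best[1], best[0]

# Toy frames: a bright 2 x 2 object sits at (2, 2) in the previous frame
# and has moved right by 2 pixels in the current frame.
prev = [[0] * 12 for _ in range(12)]
for i in (2, 3):
    for j in (2, 3):
        prev[i][j] = 200
cur_block = [[200, 200], [200, 200]]   # the object as seen at (2, 4) now

mv, err = best_motion_vector(prev, cur_block, top=2, left=4)
print(mv, err)  # (0, -2) with zero prediction error
```

Only the motion vector and the (here zero) prediction error need to be sent, instead of the block's pixels.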
Frame-Differential Coding
- Frame-differential coding (FDC) = prediction from a previous video frame.
- A video frame is stored in the encoder for comparison with the present frame → causes an encoding latency of one frame time.
- For still images:
  - Data can be sent only for the first instance of a frame.
  - All subsequent prediction-error values are zero.
  - Retransmit the frame occasionally to allow receivers that have just been turned on to have a starting point.
- → FDC reduces the information for still images, but leaves significant data for moving images (e.g. a movement of the camera).

Unpredictable Information
- Unpredictable information from the previous frame:
  1. Scene change (e.g. background landscape change).
  2. Newly uncovered information due to object motion across a background, or at the edges of a panned scene (e.g. a soccer player's face uncovered by a flying ball).
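The still-image claim above, that all prediction-error values are zero after the first frame, can be seen directly (toy frames, plain Python lists standing in for pixel arrays):

```python
prev = [[120] * 8 for _ in range(8)]          # a still scene
cur = [row[:] for row in prev]                # identical next frame

diff = [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, prev)]
assert all(d == 0 for row in diff for d in row)   # nothing to send

cur[3][3] += 40                               # two pixels changed
cur[3][4] += 40
diff = [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, prev)]
nonzero = sum(1 for row in diff for d in row if d != 0)
print(nonzero)  # 2 residuals to transmit; the rest are zero runs
```

A camera pan would instead change almost every pixel difference, which is exactly why plain FDC needs the motion compensation described earlier.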
Dealing with unpredictable information
- Scene change
  - → An intra-coded picture (MPEG I picture) must be sent as a starting point → requires more data than a predicted picture (P picture).
  - I pictures are sent about twice per second → their timing and frequency may be adjusted to accommodate scene changes.
- Uncovered information
  - Handled by the bi-directionally coded type of picture, or B picture.
  - There must be enough frame storage in the system to wait for the later picture that has the desired information.
  - To limit the amount of decoder memory, the encoder stores pictures and sends the required reference pictures before sending the B picture.

Types of picture transform coding
- Types of picture coding:
  - Discrete Fourier Transform (DFT)
  - Karhunen-Loève
  - Walsh-Hadamard
  - Lapped orthogonal
  - Discrete Cosine Transform (DCT) → used in MPEG-2!
  - Wavelets → newer!
- The differences between transform coding methods:
  - The degree of concentration of energy in a few coefficients.
  - The region of influence of each coefficient in the reconstructed picture.
  - The appearance and visibility of coding noise due to coarse quantization of the coefficients.
Transform Coding
- Converts spatial image pixel values to transform coefficient values.
- → The number of coefficients produced is equal to the number of pixels transformed.
- A few coefficients contain most of the energy in a picture → the coefficients may be further coded by lossless entropy coding.
- The transform process concentrates the energy into particular coefficients (generally the "low-frequency" coefficients).

DCT Lossy Coding
- Lossless coding cannot achieve a high compression ratio (4:1 or less).
- Lossy coding = discard selected information so that the reproduction is visually or aurally indistinguishable from the source, or has the fewest artifacts.
- Lossy coding can be achieved by:
  - Eliminating some DCT coefficients.
  - Adjusting the quantizing coarseness of the coefficients → better!
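The energy-concentration property can be illustrated with a direct implementation of the 2-D DCT-II (normalization matching the C(u) = 1/√2 convention used later in these slides; the gradient block is a hypothetical smooth image patch):

```python
import math

N = 8

def c(k):
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct2(block):
    """2-D DCT-II of an N x N block:
    F(u,v) = (2/N) C(u) C(v) sum_x sum_y f(x,y) cos((2x+1)u*pi/2N) cos((2y+1)v*pi/2N)."""
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = (2 / N) * c(u) * c(v) * s
    return out

# A smooth gradient block: after the DCT nearly all energy sits in a few
# low-frequency coefficients, which is what makes entropy coding effective.
block = [[x + y for y in range(N)] for x in range(N)]
coeffs = dct2(block)
total = sum(f * f for row in coeffs for f in row)
low = sum(coeffs[u][v] ** 2 for u in range(2) for v in range(2))
print(f"{100 * low / total:.1f}% of the energy in the 4 lowest coefficients")
```

Because this normalization is orthonormal, the total coefficient energy equals the total pixel energy; only its distribution changes.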
Masking
- Masking makes certain types of coding noise invisible or inaudible due to psycho-visual/psycho-acoustic effects.
  - In audio, a pure tone masks energy at higher frequencies and also at lower frequencies (with a weaker effect).
  - In video, high-contrast edges mask random noise.
- Noise introduced at low bit rates is placed in the frequency, spatial, or temporal regions where it is masked.

Run-Level coding
- "Run-level" coding = coding a run length of zeros followed by a nonzero level.
  - → Instead of sending all the zero values individually, the length of the run is sent.
  - Useful for any data with long runs of zeros.
  - Run lengths are easily encoded with a Huffman code.
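A minimal sketch of run-level encoding on a zig-zag-ordered coefficient list (values hypothetical; real MPEG/JPEG coders then Huffman-code each (run, level) pair):

```python
def run_level_encode(coeffs):
    """Encode a list of quantized coefficients as (run-of-zeros, level) pairs,
    with a final 'EOB' once only zeros remain."""
    pairs, run = [], 0
    last_nonzero = max((i for i, v in enumerate(coeffs) if v != 0), default=-1)
    for v in coeffs[:last_nonzero + 1]:
        if v == 0:
            run += 1            # count zeros instead of sending them
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")         # everything after the last nonzero value
    return pairs

print(run_level_encode([8, 4, 4, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0]))
# [(0, 8), (0, 4), (0, 4), (3, 2), 'EOB']
```

The trailing run of zeros, however long, costs only the EOB symbol.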
Variable quantization
- Variable quantization is the main technique of lossy coding → greatly reduces the bit rate.
- Coarsely quantizes the less significant coefficients in a transform (→ less noticeable: low energy, less visible or audible).
- Can be applied to a complete signal or to individual frequency components of a transformed signal.
- Variable quantization also controls the instantaneous bit rate in order to:
  - Match the average bit rate to a constant channel bit rate.
  - Prevent buffer overflow or underflow.

Key points
- Compression process
- Quantization & sampling
- Coding:
  - Lossless & lossy coding
  - Frame-differential coding
  - Motion-compensated prediction
  - Variable quantization
  - Run-level coding
- Masking
Chapter 2: Multimedia technologies
- Roadmap
  - JPEG
  - MPEG-1/MPEG-2 Video
  - MPEG-1 Layer 3 Audio (mp3)
  - MPEG-4
  - MPEG-7 (brief introduction)
  - HDTV (brief introduction)
  - H.261/H.263 (brief introduction)
  - Model-based coding (MBC) (brief introduction)

JPEG – Zig-zag scanning
(figure: zig-zag scan order over an 8 × 8 coefficient block)
JPEG (Joint Photographic Experts Group)
- JPEG encoder
  - Partitions the image into blocks of 8 × 8 pixels.
  - Calculates the Discrete Cosine Transform (DCT) of each block.
  - A quantizer rounds off the DCT coefficients according to the quantization matrix.
  - Produces a series of DCT coefficients using zig-zag scanning.
  - Uses a variable-length code (VLC) on these DCT coefficients.
  - Writes the compressed data stream to an output file (*.jpg or *.jpeg).
- JPEG decoder
  - File → input data stream → variable-length decoder → IDCT (inverse DCT) → image.

JPEG - DCT
- The DCT is similar to the Discrete Fourier Transform → it transforms a signal or image from the spatial domain to the frequency domain.
- The DCT requires fewer multiplications than the DFT.
- The overall coding is lossy but allows for large compression ratios.
- Input image A:
  - The input image A is N2 pixels wide by N1 pixels high.
  - A(i,j) is the intensity of the pixel in row i and column j.
- Output image B:
  - B(k1,k2) is the DCT coefficient in row k1 and column k2 of the DCT matrix.
JPEG - Quantization Matrix
- The quantization matrix is the 8 × 8 matrix of step sizes (sometimes called quantums), one element for each DCT coefficient.
- Usually symmetric.
- Step sizes are:
  - Small in the upper left (low frequencies),
  - Large in the lower right (high frequencies).
  - A step size of 1 is the most precise.
- The quantizer divides each DCT coefficient by its corresponding quantum, then rounds to the nearest integer.
- Large quantums drive small coefficients down to zero.
- The result:
  - Many high-frequency coefficients become zero → easily removed.
  - The low-frequency coefficients undergo only minor adjustment.

MPEG (Moving Picture Expert Group)
- MPEG is the heart of:
  - Digital television set-top boxes
  - HDTV decoders
  - DVD players
  - Video conferencing
  - Internet video, etc.
- MPEG standards:
  - MPEG-1, MPEG-2, MPEG-4, MPEG-7
  - (The MPEG-3 standard was abandoned and became an extension of MPEG-2.)
JPEG Coding process illustrated

DCT coefficients:

  1255  -15   43   58  -12    1   -4   -6
    11  -65   80  -73  -27   -1   -5    1
   -49   37  -87    8   12    6   10    8
    27  -50   29   13    3   13   -6    5
   -16   21  -11  -10   10  -21    9   -6
     3  -14    0   14  -14   16   -8    4
    -4   -1    8  -13   12   -9    5   -1
    -4    2   -2    6   -7    6   -1    3

Quantization result (after division by the quantization matrix Q and rounding):

    78   -1    4    4   -1    0    0    0
     1   -5    6   -4   -1    0    0    0
    -4    3   -5    0    0    0    0    0
     2   -3    1    0    0    0    0    0
    -1    1    0    0    0    0    0    0
     0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0

Zigzag scan result: 78 -1 1 -4 -5 4 4 6 3 2 -1 -3 -5 -4 -1 0 -1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 EOB
→ Easily coded by run-length Huffman coding.

MPEG standards
- MPEG-1 (obsolete)
  - A standard for storage and retrieval of moving pictures and audio on storage media.
  - Application: VCD (video compact disk).
- MPEG-2 (widely implemented)
  - A standard for digital television.
  - Applications: DVD (digital versatile disk), HDTV (high-definition TV), DVB (European Digital Video Broadcasting Group), etc.
- MPEG-4 (newly implemented, still being researched)
  - A standard for multimedia applications.
  - Applications: Internet, cable TV, virtual studio, etc.
- MPEG-7 (future work, ongoing research)
  - A content representation standard for information search ("Multimedia Content Description Interface").
  - Applications: Internet, video search engines, digital libraries.
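The quantization step of the worked JPEG example above can be reproduced for the top-left 4 × 4 corner, assuming the step sizes come from the standard JPEG luminance quantization table (an assumption; the slides do not state which matrix was used):

```python
# Top-left 4 x 4 corner of the standard JPEG luminance quantization table
# (assumed) and of the example DCT block from the slide.
quant = [[16, 11, 10, 16],
         [12, 12, 14, 19],
         [14, 13, 16, 24],
         [14, 17, 22, 29]]
dct = [[1255, -15, 43, 58],
       [11, -65, 80, -73],
       [-49, 37, -87, 8],
       [27, -50, 29, 13]]

# Divide each coefficient by its step size and round to the nearest integer;
# large quantums drive the small high-frequency coefficients toward zero.
quantized = [[round(dct[i][j] / quant[i][j]) for j in range(4)]
             for i in range(4)]
print(quantized)
# [[78, -1, 4, 4], [1, -5, 6, -4], [-4, 3, -5, 0], [2, -3, 1, 0]]
```

The output matches the top-left corner of the quantization result shown in the example.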
MPEG-2 formal standards
- The international standard ISO/IEC 13818-2, "Generic Coding of Moving Pictures and Associated Audio Information".
- ATSC (Advanced Television Systems Committee) document A/54, "Guide to the Use of the ATSC Digital Television Standard".

Pixel & Block
- Pixel = "picture element".
  - A discrete spatial point sample of an image.
  - A color pixel may be represented digitally as a number of bits for each of three primary color values.
- Block
  - = 8 × 8 array of pixels.
  - A block is the fundamental unit for DCT coding (discrete cosine transform).
MPEG video data structure
- The MPEG-2 video data stream is constructed in layers, from lowest to highest:
  - PIXEL is the fundamental unit.
  - BLOCK is an 8 × 8 array of pixels.
  - MACROBLOCK consists of 4 luma blocks and 2 chroma blocks (field DCT coding or frame DCT coding).
  - SLICE consists of a variable number of macroblocks.
  - PICTURE consists of a frame (or field) of slices.
  - GROUP OF PICTURES (GOP) consists of a variable number of pictures.
  - SEQUENCE consists of a variable number of GOPs.
  - PACKETIZED ELEMENTARY STREAM (optional).

Macroblock
- A macroblock = 16 × 16 array of luma (Y) pixels (= 4 blocks = 2 × 2 block array).
- The number of chroma pixels (Cr, Cb) varies depending on the chroma pixel structure indicated in the sequence header (e.g. 4:2:0, etc.).
- The macroblock is the fundamental unit for motion compensation and has motion vector(s) associated with it if it is predictively coded.
- A macroblock is classified as:
  - Field coded (→ an interlaced frame consists of 2 fields), or
  - Frame coded,
  → depending on how the four blocks are extracted from the macroblock.
Slice
- Pictures are divided into slices.
- A slice consists of an arbitrary number of successive macroblocks (going left to right), but is typically an entire row of macroblocks. A slice does not extend beyond one row.
- The slice header carries address information that allows the Huffman decoder to resynchronize at slice boundaries.

I, P, B Pictures
- Encoded pictures are classified into 3 types: I, P, and B.
- I pictures = intra-coded pictures
  - All macroblocks are coded without prediction.
  - Needed to give the receiver a "starting point" for prediction after a channel change and to recover from errors.
- P pictures = predicted pictures
  - Macroblocks may be coded with forward prediction from references made from previous I and P pictures, or may be intra coded.
- B pictures = bi-directionally predicted pictures
  - Macroblocks may be coded with forward prediction from previous I or P references.
  - Macroblocks may be coded with backward prediction from the next I or P reference.
  - Macroblocks may be coded with interpolated prediction from past and future I or P references.
  - Macroblocks may be intra coded (no prediction).
Picture
- A source picture is a contiguous rectangular array of pixels.
- A picture may be a complete frame of video ("frame picture") or one of the interlaced fields from an interlaced source ("field picture").
- A field picture does not have any blank lines between its active lines of pixels.
- A coded picture (also called a video access unit) begins with a start code and a header. The header consists of:
  - picture type (I, P, B)
  - temporal reference information
  - motion vector search range
  - optional user data
- A frame picture consists of:
  - a frame of a progressive source, or
  - a frame (2 spatially interlaced fields) of an interlaced source.

Group of pictures (GOP)
- The group-of-pictures layer is optional in MPEG-2.
- A GOP begins with a start code and a header.
- The header carries:
  - time code information
  - editing information
  - optional user data
- The first encoded picture in a GOP is always an I picture.
- A typical length is 15 pictures with the following structure (in display order):
  - I B B P B B P B B P B B P B B → provides an I picture with sufficient frequency to allow a decoder to decode correctly.
- (Figure: forward motion compensation from I/P anchors and bidirectional motion compensation for B pictures, over the sequence I B B P B B P B B P B … in time.)
Sequence
- A sequence begins with a unique 32-bit start code followed by a header.
- The header carries:
  - picture size
  - aspect ratio
  - frame rate and bit rate
  - optional quantizer matrices
  - required decoder buffer size
  - chroma pixel structure
  - optional user data
- The sequence information is needed for channel changing.
- The sequence length depends on the acceptable channel-change delay.

Transport stream
- Transport packets (fixed length) are formed from a PES stream:
  - The PES header is placed after the transport packet header.
  - Successive transport packet payloads are filled with the remaining PES packet content until the PES packet is all used.
  - The final transport packet is filled to a fixed length by stuffing with 0xFF bytes (all ones).
- Each PES packet header includes:
  - An 8-bit stream ID identifying the source of the payload.
  - Timing references: PTS (presentation time stamp), the time at which a decoded audio or video access unit is to be presented by the decoder.
  - DTS (decoding time stamp), the time at which an access unit is decoded by the decoder.
  - ESCR (elementary stream clock reference).
Packetized Elementary Stream (PES)
- A video elementary stream (video ES) consists of all the video data for a sequence, including the sequence header and all the subparts of a sequence.
- An ES carries only one type of data (video or audio) from a single video or audio encoder.
- A PES consists of a single ES which has been split into packets, each starting with an added packet header.
- A PES stream contains only one type of data from one source, e.g. from one video or audio encoder.
- PES packets have variable length, not corresponding to the fixed packet length of transport packets, and may be much longer than a transport packet.

Intra Frame Coding
- Intra coding is only concerned with information within the current frame (not relative to any other frame in the video sequence).
- The MPEG intra-frame coding block diagram (see bottom figure) is similar to JPEG (→ review the JPEG coding mechanism!).
- Basic blocks of the intra-frame coder:
  - Video filter
  - Discrete cosine transform (DCT)
  - DCT coefficient quantizer
  - Run-length amplitude/variable-length coder (VLC)
Video Filter
- The Human Visual System (HVS) is:
  - Most sensitive to changes in luminance,
  - Less sensitive to variations in chrominance.
- MPEG uses the YCbCr color space to represent the data values instead of RGB, where:
  - Y is the luminance signal,
  - Cb is the blue color-difference signal,
  - Cr is the red color-difference signal.
- What are the "4:4:4", "4:2:0", etc. video formats?
  - 4:4:4 is full-bandwidth YCbCr video → each macroblock consists of 4 Y blocks, 4 Cb blocks, and 4 Cr blocks → a waste of bandwidth!
  - 4:2:0 is most commonly used in MPEG-2.

MPEG Profiles & levels
- MPEG-2 is classified into several profiles.
- Main Profile features:
  - 4:2:0 chroma sampling format
  - I, P, and B pictures
  - Non-scalable
- Main Profile is subdivided into levels:
  - MP@ML (Main Profile at Main Level):
    - Designed to match the CCIR 601 standard for interlaced standard digital video.
    - 720 × 576 (PAL) or 720 × 483 (NTSC).
    - 30 Hz progressive, 60 Hz interlaced.
    - Maximum bit rate is 15 Mbit/s.
  - MP@HL (Main Profile at High Level), upper bounds:
    - 1152 × 1920, 60 Hz progressive.
    - 80 Mbit/s.
Applications of chroma formats

chroma_format       Multiplex order (time) within macroblock   Application
4:2:0 (6 blocks)    Y Y Y Y Cb Cr                              Mainstream television; consumer entertainment
4:2:2 (8 blocks)    Y Y Y Y Cb Cr Cb Cr                        Studio production environments; professional editing equipment
4:4:4 (12 blocks)   Y Y Y Y Cb Cr Cb Cr Cb Cr Cb Cr            Computer graphics

MPEG encoder/decoder
(figure: encoder and decoder block diagrams)
Prediction
- Backward prediction is done by storing pictures until the desired anchor picture is available before encoding the current stored frames.
- The encoder can decide to use:
  - Forward prediction from a previous picture,
  - Backward prediction from a following picture,
  - or interpolated prediction,
  → to minimize the prediction error.
- The encoder must transmit pictures in an order different from that of the source pictures, so that the decoder has the anchor pictures before decoding the predicted pictures. (See next slide.)
- The decoder must have two frame stores.

DCT and IDCT formulas
- DCT:
  - Eq. 1 → normal form
  - Eq. 2 → matrix form
- IDCT:
  - Eq. 3 → normal form
  - Eq. 4 → matrix form
- Where:
  - F(u,v) = two-dimensional N×N DCT.
  - u, v, x, y = 0, 1, 2, … N−1.
  - x, y are spatial coordinates in the sample domain.
  - u, v are frequency coordinates in the transform domain.
  - C(u), C(v) = 1/√2 for u, v = 0; C(u), C(v) = 1 otherwise.
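The equations referenced above (Eq. 1 and Eq. 3) appear only as figures in the slides; a standard normalized form consistent with the definitions given (C(u) = C(v) = 1/√2 for u, v = 0, and 1 otherwise) is:

```latex
F(u,v) = \frac{2}{N}\, C(u)\, C(v)
         \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)\,
         \cos\frac{(2x+1)\,u\pi}{2N}\,
         \cos\frac{(2y+1)\,v\pi}{2N}

f(x,y) = \frac{2}{N}
         \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\, C(v)\, F(u,v)\,
         \cos\frac{(2x+1)\,u\pi}{2N}\,
         \cos\frac{(2y+1)\,v\pi}{2N}
```

With this normalization the transform is orthonormal, so applying the IDCT to the DCT output recovers f(x,y) exactly.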
I P B Picture Reordering
- Pictures are coded and decoded in a different order than they are displayed.
- → Due to bidirectional prediction for B pictures.
- For example, with a 12-picture GOP:
  - Source order and encoder input order:
    I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)
  - Encoding order and order in the coded bitstream:
    I(1) P(4) B(2) B(3) P(7) B(5) B(6) P(10) B(8) B(9) I(13) B(11) B(12)
  - Decoder output order and display order (same as input):
    I(1) B(2) B(3) P(4) B(5) B(6) P(7) B(8) B(9) P(10) B(11) B(12) I(13)

DCT versus DFT
- The DCT is conceptually similar to the DFT, except:
  - The DCT concentrates energy into lower-order coefficients better than the DFT.
  - The DCT is purely real; the DFT is complex (magnitude and phase).
  - A DCT operation on a block of pixels produces coefficients that are similar to the frequency-domain coefficients produced by a DFT operation.
- An N-point DCT has the same frequency resolution as a 2N-point DFT.
- The N frequencies of a 2N-point DFT correspond to N points on the upper half of the unit circle in the complex frequency plane.
- Assuming a periodic input, the magnitude of the DFT coefficients is spatially invariant (the phase of the input does not matter). This is not true for the DCT.
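The reordering shown above follows a simple rule: emit each I/P anchor first, then the B pictures that were waiting for it. A sketch (picture labels are illustrative strings):

```python
def encode_order(display_order):
    """Reorder a display-order GOP (strings like 'I1', 'B2', 'P4') into coded
    order: each I/P anchor is emitted before the B pictures that reference it."""
    coded, pending_b = [], []
    for pic in display_order:
        if pic[0] == "B":
            pending_b.append(pic)      # wait for the following anchor
        else:                          # I or P: emit it, then the held B's
            coded.append(pic)
            coded.extend(pending_b)
            pending_b = []
    return coded + pending_b

gop = ["I1", "B2", "B3", "P4", "B5", "B6", "P7"]
print(encode_order(gop))  # ['I1', 'P4', 'B2', 'B3', 'P7', 'B5', 'B6']
```

This is why the decoder needs two frame stores: both anchors of a B picture arrive before the B picture itself.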
Quantization matrix
- Note → the quantization step sizes are:
  - Small in the upper left (low frequencies),
  - Large in the lower right (high frequencies).
  → Recall the JPEG mechanism!
- Why?
  - The HVS is less sensitive to errors in high-frequency coefficients than in lower frequencies.
  - → Higher frequencies should be more coarsely quantized!

MPEG scanning
- Left → zig-zag scanning (as in JPEG).
- Right → alternate scanning → better for interlaced frames!
Result DCT matrix (example)
- After adaptive quantization, the result is a matrix containing many zeros.

Huffman/Run-Level Coding
- Huffman coding in combination with run-level coding and zig-zag scanning is applied to the quantized DCT coefficients.
- "Run-level" = a run length of zeros followed by a non-zero level.
- Huffman coding is also applied to various types of side information.
- A Huffman code is an entropy code which optimally achieves the shortest possible average code word length for a source.
- → This average code word length is ≥ the entropy of the source.
Huffman/Run-Level coding illustrated
- Using the DCT output matrix in the previous slide, after zig-zag scanning the output will be the sequence: 8 (DC value), 4, 4, 2, 2, 2, 1, 1, 1, 1, 0 (12 zeros), 1, 0 (41 zeros).
- These values are looked up in a fixed table of variable-length codes:
  - → The most probable occurrence is given a relatively short code,
  - → The least probable occurrence is given a relatively long code.

  Zero Run-Length   Amplitude Value   Code
  N/A               8 (DC value)      110 1000
  0                 4                 00001100
  0                 4                 00001100
  0                 2                 01000
  0                 2                 01000
  0                 2                 01000
  0                 1                 110
  0                 1                 110
  0                 1                 110
  0                 1                 110
  12                1                 0010 0010 0
  EOB               EOB               10

MPEG Data Transport
- MPEG packages all data into fixed-size 188-byte packets for transport.
- Video or audio payload data, placed in PES packets beforehand, is broken up into fixed-length transport packet payloads.
- A PES packet may be much longer than a transport packet → requires segmentation:
  - The PES header is placed immediately following a transport header.
  - Successive portions of the PES packet are then placed in the payloads of transport packets.
  - Remaining space in the final transport packet payload is filled with stuffing bytes = 0xFF (all ones).
  - Each transport packet starts with a sync byte = 0x47.
  - In the ATSC US terrestrial DTV VSB transmission system, the sync byte is not processed, but is replaced by a different sync symbol especially suited to RF transmission.
  - The transport packet header contains a 13-bit PID (packet ID), which corresponds to a particular elementary stream of video, audio, or another program element.
  - PID 0x0000 is reserved for transport packets carrying a program association table (PAT).
  - The PAT points to a Program Map Table (PMT) → which points to the particular elements of a program.
Huffman/Run-Level coding illustrated (2)

n → The first run of 12 zeros has been efficiently coded by only 9 bits.
n → The last run of 41 zeros has been entirely eliminated, represented only by a 2-bit End Of Block (EOB) indicator.
n → The quantized DCT coefficients are now represented by a sequence of 61 binary bits (see the table).
n Considering that the original 8x8 block of 8-bit pixels required 512 bits for full representation → the compression ratio is approx. 8.4:1.

MPEG Transport packet

n Adaptation Field:
q 8 bits specifying the length of the adaptation field.
q The first group of flags consists of eight 1-bit flags:
n discontinuity_indicator
n random_access_indicator
n elementary_stream_priority_indicator
n PCR_flag
n OPCR_flag
n splicing_point_flag
n transport_private_data_flag
n adaptation_field_extension_flag
q The optional fields are present if indicated by one of the preceding flags.
q The remainder of the adaptation field is filled with stuffing bytes (0xFF, all ones).
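The run-level pairing shown in the table above can be reproduced with a short sketch. The Huffman code assignment itself comes from the fixed VLC table and is omitted here; treating the leading value of the sequence as a separately coded DC coefficient is our reading of the example:

```python
def run_level_pairs(coeffs):
    """Collapse a zigzag-scanned AC coefficient list into (run, level)
    pairs; the trailing run of zeros is replaced by an EOB marker."""
    last = len(coeffs)
    while last > 0 and coeffs[last - 1] == 0:
        last -= 1                      # drop the final run of zeros (→ EOB)
    pairs, run = [], 0
    for c in coeffs[:last]:
        if c == 0:
            run += 1                   # count zeros preceding a nonzero level
        else:
            pairs.append((run, c))
            run = 0
    pairs.append("EOB")
    return pairs

# The AC sequence from the table (leading DC value coded separately)
seq = [4, 2, 2, 2, 1, 1, 1, 1] + [0] * 12 + [1] + [0] * 41
pairs = run_level_pairs(seq)
```

Running this reproduces the (run, level) column of the table, including the (12, 1) pair and the bare EOB that absorbs the final 41 zeros.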
Demultiplexing a Transport Stream (TS)

n Demultiplexing a transport stream involves:
1. Finding the PAT by selecting packets with PID = 0x0000
2. Reading the PIDs for the PMTs
3. Reading the PIDs for the elements of a desired program from its PMT (for example, a basic program will have a PID for audio and a PID for video)
4. Detecting packets with the desired PIDs and routing them to the decoders
q An MPEG-2 transport stream can carry:
§ Video streams
§ Audio streams
§ Any type of data
→ MPEG-2 TS is the packet format for CATV downstream data communication.

Timing - Synchronization

n The decoder is synchronized with the encoder by time stamps.
n The encoder contains a master oscillator and counter, called the System Time Clock (STC). (See previous block diagram.)
q → The STC belongs to a particular program and is the master clock of the video and audio encoders for that program.
q → Multiple programs, each with its own STC, can also be multiplexed into a single stream.
n A program component can even have no time stamps → but it then cannot be synchronized with other components.
n At encoder input (Point A), the time of occurrence of an input video picture or audio block is noted by sampling the STC.
n The total delay of the encoder and decoder buffers (constant) is added to the STC, creating a Presentation Time Stamp (PTS).
q → The PTS is then inserted in the first of the packet(s) representing that picture or audio block, at Point B.
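The four demultiplexing steps can be sketched with hypothetical table contents; the PAT maps program numbers to PMT PIDs, and each PMT maps stream types to elementary-stream PIDs. The PID values below are invented for illustration:

```python
# Hypothetical decoded table contents (PAT is carried on PID 0x0000)
PAT = {1: 0x0020}                                   # program 1 -> PMT PID
PMT = {0x0020: {"video": 0x0031, "audio": 0x0034}}  # PMT -> elementary PIDs

def pids_for_program(program_number, pat, pmt):
    """Steps 1-3: PAT -> PMT -> elementary-stream PIDs for one program."""
    pmt_pid = pat[program_number]
    return pmt[pmt_pid]

def route(packet_pids, wanted):
    """Step 4 in miniature: keep only packets whose PID is wanted."""
    return [pid for pid in packet_pids if pid in wanted]

elements = pids_for_program(1, PAT, PMT)
selected = route([0x0000, 0x0031, 0x0100, 0x0034], set(elements.values()))
```

A real demultiplexer parses the PAT and PMT sections out of the packet payloads; here they are given directly to keep the routing logic visible.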
Timing & buffer control

n Point A: Encoder input → Constant/specified rate
n Point B: Encoder output → Variable rate
n Point C: Encoder buffer output → Constant rate
n Point D: Communication channel + decoder buffer → Constant rate
n Point E: Decoder input → Variable rate
n Point F: Decoder output → Constant/specified rate

Timing - Synchronization (2)

n A Decode Time Stamp (DTS) can optionally be combined into the bit stream → it represents the time at which the data should be taken instantaneously from the decoder buffer and decoded.
q DTS and PTS are identical except in the case of picture reordering for B pictures.
q The DTS is only used where it is needed because of reordering. Whenever DTS is used, PTS is also coded.
q PTS (or DTS) insertion interval = 700 ms.
q In ATSC → PTS (or DTS) must be inserted at the beginning of each coded picture (access unit).
n In addition, the output of the encoder buffer (Point C) is time stamped with System Time Clock (STC) values, called:
q System Clock Reference (SCR) in a Program Stream.
q Program Clock Reference (PCR) in a Transport Stream.
n PCR time stamp interval = 100 ms.
n SCR time stamp interval = 700 ms.
n The PCR and/or SCR are used to synchronize the decoder STC with the encoder STC.
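The PTS construction described above (STC sampled at encoder input, plus the constant end-to-end buffer delay) amounts to simple arithmetic. PTS/DTS values are expressed in 90 kHz units in MPEG-2; the 0.5 s delay and the STC sample below are illustrative figures, not values from the course:

```python
STC_HZ = 90_000                     # PTS/DTS resolution in MPEG-2

def make_pts(stc_at_input, total_buffer_delay_s):
    """PTS = STC sampled at encoder input + constant end-to-end delay,
    both expressed in 90 kHz ticks."""
    return stc_at_input + round(total_buffer_delay_s * STC_HZ)

# e.g. a combined encoder + decoder buffer delay of 0.5 s
pts = make_pts(stc_at_input=123_456, total_buffer_delay_s=0.5)
```

At 90 kHz, 0.5 s corresponds to 45 000 ticks, so the PTS leads the sampled STC by exactly that amount.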
Timing - Synchronization (3)

n All video and audio streams included in a program must get their time stamps from a common STC, so that synchronization of the video and audio decoders with each other may be accomplished.
n The data rate and packet rate on the channel (at the multiplexer output) can be completely asynchronous with the System Time Clock (STC).
n PCR time stamps allow synchronization of different multiplexed programs having different STCs, while allowing STC recovery for each program.
n If there is no buffer underflow or overflow → the delays in the buffers and transmission channel for both video and audio are constant.
n The encoder input and decoder output run at equal and constant rates.
n Fixed end-to-end delay from encoder input to decoder output.
n If exact synchronization is not required, the decoder clock can be free-running → video frames can be repeated / skipped as necessary to prevent buffer underflow / overflow, respectively.

HDTV (2)

n HDTV proposals are for a screen which is wider than the conventional TV image by about 33%. It is generally agreed that the HDTV aspect ratio will be 16:9, as opposed to the 4:3 ratio of conventional TV systems. This ratio has been chosen because psychological tests have shown that it best matches the human visual field.
n It also enables use of existing cinema film formats as additional source material, since this is the same aspect ratio used in normal 35 mm film. Figure 16.6(a) shows how the aspect ratio of HDTV compares with that of conventional television, using the same resolution or the same surface area as the comparison metric.
n To achieve the improved resolution, the video image used in HDTV must contain over 1000 lines, as opposed to the 525 and 625 provided by the existing NTSC and PAL systems. This gives a much improved vertical resolution. The exact value is chosen to be a simple multiple of one or both of the vertical resolutions used in conventional TV.
n However, due to the higher scan rates, the bandwidth requirement for analogue HDTV is approximately 12 MHz, compared to the nominal 6 MHz of conventional TV.
HDTV (High definition television)

n High definition television (HDTV) first came to public attention in 1981, when NHK, the Japanese broadcasting authority, first demonstrated it in the United States.
n HDTV is defined by the ITU-R as:
q 'A system designed to allow viewing at about three times the picture height, such that the system is virtually, or nearly, transparent to the quality or portrayal that would have been perceived in the original scene ... by a discerning viewer with normal visual acuity.'

HDTV (3)

n The introduction of a non-compatible TV transmission format for HDTV would require the viewer either to buy a new receiver, or to buy a converter to receive the picture on their old set.
n The initial thrust in Japan was towards an HDTV format which is compatible with conventional TV standards, and which can be received by conventional receivers with conventional quality. However, to get the full benefit of HDTV, a new wide-screen, high-resolution receiver has to be purchased.
n One of the principal reasons that HDTV is not already common is that a general standard has not yet been agreed. The 26th CCIR plenary assembly recommended the adoption of a single, worldwide standard for high definition television.
n Unfortunately, Japan, Europe and North America are all investing significant time and money in their own systems, based on their own current conventional TV standards and other national considerations.
H261-H263

n The H.261 algorithm was developed for the purpose of image transmission rather than image storage.
n It is designed to produce a constant output of p x 64 kbit/s, where p is an integer in the range 1 to 30.
q This allows transmission over a digital network or data link of varying capacity.
q It also allows transmission over a single 64 kbit/s digital telephone channel for low-quality video-telephony, or at higher bit rates for improved picture quality.
n The basic coding algorithm is similar to that of MPEG in that it is a hybrid of motion compensation, DCT and straightforward DPCM (intra-frame coding mode), without the MPEG I, P, B frames.
n The DCT operation is performed at a low level on 8 x 8 blocks of error samples from the predicted luminance pixel values, with sub-sampled blocks of chrominance data.

H261-H263 (3)

n H.261 is widely used on 176 x 144 pixel images.
n The ability to select a range of output rates for the algorithm allows it to be used in different applications.
n Low output rates (p = 1 or 2) are only suitable for face-to-face (videophone) communication. H.261 is thus the standard used in many commercial videophone systems, such as the UK BT/Marconi Relate 2000 and the US ATT 2500 products.
n Video-conferencing would require a greater output data rate (p > 6) and might go as high as 2 Mbit/s for high-quality transmission with larger image sizes.
n A further development of H.261 is H.263, for lower fixed transmission rates.
n H.263 deploys arithmetic coding in place of the variable length coding (see the H.261 diagram); with other modifications, the data rate is reduced to only 20 kbit/s.
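The p x 64 kbit/s rate family is easy to tabulate; the application labels in the comments follow the text above:

```python
# H.261 output rates: p * 64 kbit/s, with integer p in the range 1..30
rates_kbits = {p: p * 64 for p in range(1, 31)}

low = rates_kbits[1]     # p = 1: a single 64 kbit/s channel (videophone)
mid = rates_kbits[6]     # p = 6: rough lower bound for video-conferencing
high = rates_kbits[30]   # p = 30: 1920 kbit/s, i.e. roughly 2 Mbit/s
```

The top of the range, 1920 kbit/s, is the "as high as 2 Mbit/s" figure quoted for high-quality conferencing.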
H261-H263 (2)

Model Based Coding (MBC)
n At the very low bit rates (20 kbit/s or less) associated with video
telephony, the requirements for image transmission stretch the
compression techniques described earlier to their limits.
n In order to achieve the necessary degree of compression they
often require reduction in spatial resolution or even the
elimination of frames from the sequence.
n Model based coding (MBC) attempts to exploit a greater degree
of redundancy in images than current techniques, in order to
achieve significant image compression but without adversely
degrading the image content information.
n It relies upon the fact that the image quality is largely subjective.
Providing that the appearance of scenes within an observed
image is kept at a visually acceptable level, it may not matter that
the observed image is not a precise reproduction of reality.
Model Based Coding (2)

n One MBC method for producing an artificial image of a head sequence utilizes a feature codebook, where a range of facial expressions, sufficient to create an animation, are generated from sub-images or templates which are joined together to form a complete face.
n The most important areas of a face for conveying an expression are the eyes and mouth; hence the objective is to create an image in which the movement of the eyes and mouth is a convincing approximation to the movements of the original subject.
n When forming the synthetic image, the feature template vectors which form the closest match to those of the original moving sequence are selected from the codebook and then transmitted as low bit rate coded addresses.
n By using only 10 eye and 10 mouth templates, for instance, a total of 100 combinations exists, implying that only a 7-bit codebook address need be transmitted.
n It has been found that there are only 13 visually distinct mouth shapes for vowel and consonant formation during speech.
n However, the number of mouth sub-images is usually increased to include intermediate expressions and hence avoid step changes in the image.

Model based coding (4)

n A synthetic image is created by texture mapping detail from an initial full-face source image over the wire-frame. Facial movement can be achieved by manipulation of the vertices of the wire-frame.
n Head rotation requires the use of simple matrix operations upon the coordinate array. Facial expression requires the manipulation of the features controlling the vertices.
n This model-based feature codebook approach suffers from the drawback of codebook formation. This has to be done off-line and, consequently, the image is required to be prerecorded, with a consequent delay.
n However, the actual image sequence can be sent at a very low data rate. For a codebook with 128 entries, where 7 bits are required to code each mouth, a 25 frame/s sequence requires less than 200 bit/s to code the mouth movements.
n When it is finally implemented, rates as low as 1 kbit/s are confidently expected from MBC systems, but they can only transmit image sequences which match the stored model, e.g. head-and-shoulders displays.
Model Based Coding (3)

n Another common way of representing objects in three-dimensional computer graphics is by a net of interconnecting polygons.
n A model is stored as a set of linked arrays which specify the coordinates of each polygon vertex, with the lines connecting the vertices together forming each side of a polygon.
n To make realistic models, the polygon net can be shaded to reflect the presence of light sources.
n The wire-frame model [Welch 1991] can be modified to fit the shape of a person's head and shoulders. The wire-frame, composed of over 100 interconnecting triangles, can produce subjectively acceptable synthetic images, providing that the frame is not rotated by more than 30° from the full-face position.
n The model (see the Figure) uses smaller triangles in areas associated with high degrees of curvature, where significant movement is required.
n Large flat areas, such as the forehead, contain fewer triangles.
n A second wire-frame is used to model the mouth interior.

Key points:

n JPEG coding mechanism → DCT / Zigzag Scanning / Adaptive Quantization / VLC
n MPEG layered structure:
q Pixel, Block, Macroblock, Field DCT Coding / Frame DCT Coding, Slice, Picture, Group of Pictures (GOP), Sequence, Packetized Elementary Stream (PES)
n MPEG compression mechanism:
q Prediction
q Motion compensation
q Scanning
q YCbCr formats (4:4:4, 4:2:0, etc.)
q Profiles @ Levels
q I, P, B pictures & reordering
q Encoder / Decoder process & block diagram
n MPEG data transport
n MPEG timing & buffer control
q STC / SCR / DTS
q PCR / PTS
Technical terms

n Macroblocks
n HVS = Human Visual System
n GOP = Group of Pictures
n VLC = Variable Length Coding/Coder
n IDCT/DCT = (Inverse) Discrete Cosine Transform
n PES = Packetized Elementary Stream
n MP@ML = Main Profile @ Main Level
n PCR = Program Clock Reference
n SCR = System Clock Reference
n STC = System Time Clock
n PTS = Presentation Time Stamp
n DTS = Decode Time Stamp
n PAT = Program Association Table
n PMT = Program Map Table

A Brief History:

q CATV appeared in the 60s in the US, where high buildings are great obstacles to the propagation of TV signals.
q Old CATV networks →
n Coaxial only
n Tree-and-branch only
n TV only
n No return path (→ high-pass filters are installed in customers' houses to block low-frequency return-path noise)
Chapter 3. CATV systems

n Overview:
q A brief history
q Modern CATV networks
q CATV systems and equipment

Modern CATV networks

n Key elements:
q CO or Master Headend
q Headends / Hub
q Server complex
q CMTS
q TV content provider
q Optical Nodes
q Taps
q Amplifiers (GNA/TNA/LE)
Modern CATV networks (2)

n Based on a Hybrid Fiber-Coaxial architecture → also referred to as "HFC networks"
n The optical section is based on modern optical communication technologies →
q Star / ring / mesh, etc. topologies
q SDH/SONET for digital fibers
q Various architectures → digital, analog or mixed fiber cabling systems
n Part of the forward path spectrum is used for high-speed Internet access
n The return path is exploited for digital data communication → the root of new problems !!
q 5-60 MHz band for upstream
q 88-860 MHz band for downstream
n 88-450 MHz for analog/digital TV channels
n 450-860 MHz for Internet access
q FDM (frequency-division multiplexing) separates the bands

CATV systems and equipment
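The upstream/downstream split given for the HFC spectrum plan can be captured in a small helper. The band edges are those quoted in the text; real operator plans vary, and the label strings are our own:

```python
def catv_band(freq_mhz):
    """Classify a carrier frequency against the HFC spectrum plan above."""
    if 5 <= freq_mhz <= 60:
        return "upstream (return path)"
    if 88 <= freq_mhz <= 450:
        return "downstream: analog/digital TV"
    if 450 < freq_mhz <= 860:
        return "downstream: Internet access"
    return "outside the plan (guard band)"
```

For example, a 30 MHz carrier falls in the return path, while a 600 MHz carrier lands in the downstream Internet-access band; 60-88 MHz is the guard region between the two directions.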
Spectrum allocation of CATV networks

Vocabulary

n Perception = sự nhận thức
n Lap (overlap) = phủ lên