2. Unit - IV
Stream characteristics for Continuous media – Temporal Relationship – Object Stream Interactions, Media Levity, Media Synchronization – Models for Temporal Specifications – Streaming of Audio and Video – Jitter – Fixed playout and Adaptive playout – Recovering from packet loss – RTSP – Multimedia Communication Standards – RTP/RTCP – SIP and H.263
3. Multimedia- Definitions
– Multi - many; much; multiple
– Medium - a substance regarded as the means of
transmission of a force or effect; a channel or
system of communication, information, or
entertainment
(Merriam-Webster Dictionary )
• So, Multimedia???
– The terms Multi & Medium don't seem to fit well together
– The term Medium needs more clarification!!!
4. Multimedia- Definitions
• Medium
– Means for distribution and presentation of information
• Criteria for the classification of a medium: perception, representation, presentation, storage, transmission, information exchange
• Classification based on perception (text, audio, video) is appropriate for defining multimedia
5. Multimedia- Definitions
• Time always takes a separate dimension in the media representation
• Based on the time dimension in the representation space, media can be
– Time-independent (Discrete)
• Text, Graphics
– Time-dependent (Continuous)
• Audio, Video
• Video: a sequence of frames (images) presented to the user periodically.
• Time-dependent aperiodic media is not continuous!!
– Discrete & Continuous here have no connection with the internal representation!! (they relate to the viewer's impression…)
6. Multimedia- Definitions
• Multimedia is any combination of digitally manipulated text,
art, sound, animation and video.
• A stricter version of the definition does not allow just any combination of media.
• It requires
– Both continuous & discrete media to be utilized
– A significant level of independence between the media being used
• The less strict version of the definition is used in practice.
7. Multimedia- Definitions
• Multimedia elements are composed into a project using
authoring tools.
• Multimedia Authoring tools are those programs that provide the capability for creating a complete multimedia presentation by linking together objects such as a paragraph of text, an illustration, or an audio clip (song), with appropriate interactive user control.
8. Multimedia- Definitions
• By defining the objects' relationships to each other, and by
sequencing them in an appropriate order, authors (those who
use authoring tools) can produce attractive and useful
graphical applications.
• To name a few authoring tools
– Macromedia Flash
– Macromedia Director
– Authorware
• The hardware and software that govern the limits of what can happen constitute the multimedia platform or environment
9. Multimedia- Definitions
• Multimedia is interactive when the end-user is allowed to control what elements are delivered and when.
• Interactive Multimedia is Hypermedia, when the end-user is
provided with the structure of linked elements through which
he/she can navigate.
10. Multimedia- Definitions
• Multimedia is linear, when it is not interactive and the users
just sit and watch as if it is a movie.
• Multimedia is nonlinear, when the users are given the
navigational control and can browse the contents at will.
11. Multimedia System
• Following the dictionary definitions, a multimedia system is any system that supports more than a single kind of media
– This implies any system processing text and images would be a multimedia system!!!
– Note, this definition is quantitative. A qualitative definition would be more appropriate.
– The kind of media supported should be considered, rather than the number of media
12. Multimedia System
• A multimedia system is characterized by computer-controlled,
integrated production, manipulation, storage and
communication of independent information, which is encoded
at least through a continuous (time-dependent) and a discrete
(time-independent) medium.
13. Data streams
• Data Stream is any sequence of individual packets transmitted
in a time-dependent fashion
– Packets can carry information of either continuous or
discrete media
• Transmission modes
– Asynchronous
• Packets can reach the receiver as fast as possible.
• Suited for discrete media
• Additional time constraints must be imposed for
continuous media
14. Data streams
– Synchronous
• Defines a maximum end-to-end delay
• Packets can be received at an arbitrarily earlier time
• For retrieving uncompressed video at a data rate of 140 Mbit/s and a maximal end-to-end delay of 1 second, the receiver must provide 17.5 Mbytes of temporary storage
– Isochronous
• Defines maximum & minimum end-to-end delay
• Storage requirements at the receiver are reduced
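To make the buffer figures concrete, here is a minimal sketch using the slide's example numbers; the 0.9 s minimum delay in the isochronous case is an assumed value for illustration:

```python
# Receiver buffer sizing for synchronous vs. isochronous transmission,
# using the slide's example: uncompressed video at 140 Mbit/s with a
# maximum end-to-end delay of 1 second.

def buffer_bytes(data_rate_bit_s: float, delay_spread_s: float) -> float:
    """Worst case: everything that may arrive early must be buffered."""
    return data_rate_bit_s * delay_spread_s / 8

# Synchronous: packets may arrive arbitrarily early, so the buffer must
# absorb up to the full 1 s maximum delay.
print(buffer_bytes(140e6, 1.0) / 1e6)        # 17.5 (Mbytes)

# Isochronous: an assumed minimum delay of 0.9 s bounds how early packets
# can arrive, so only the 0.1 s delay spread must be buffered.
print(buffer_bytes(140e6, 1.0 - 0.9) / 1e6)  # ~1.75 (Mbytes)
```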
15. Characterizing continuous media streams
• Periodicity
– Strongly periodic
• PCM (pulse-code modulation) coded speech in telephony behaves this way.
– Weakly periodic
– Aperiodic
• Cooperative applications with shared windows
• Variation of consecutive packet sizes
– Strongly regular
• Uncompressed digital data transmission
– Weakly regular
• The MPEG frame-size ratio I:P:B is 10:1:2
– Irregular
16. Characterizing continuous media streams
• Contiguous packets
– Are packets transmitted directly one after another, or with some delay in between?
– Can be seen as a measure of resource utilization
– Connected / unconnected data streams
17. Streaming Media
• Popular approach to delivering continuous media over the Internet
• Playback at the user's computer is done while the media is being transferred (no waiting for the complete download!!!)
• You can find streaming in
– Internet radio stations
– Distance learning
– Cricket Live!!!
– Movie Trailers
19. Multimedia- Applications
Multimedia plays a major role in the following areas
– Instruction
– Business
– Advertisements
– Training materials
– Presentations
– Customer support services
– Entertainment
– Interactive Games
20. Multimedia- Applications
– Enabling Technology
– Accessibility to web based materials
– Teaching-learning disabled children & adults
– Fine Arts & Humanities
– Museum tours
– Art exhibitions
– Presentations of literature
24. Notion of Synchronization
• Multimedia synchronization is understood in terms of content relations, spatial relations and temporal relations
• Content Relation: defines a dependency of media objects on some data
– Example: dependency between a filled spreadsheet and the graphics that represent the data listed in the spreadsheet
25. Spatial Relation
• Spatial relation is represented through layout relations and defines the space used for the presentation of a media object on an output device at a certain point in time in a multimedia document
– Example: desktop publishing
– Layout frames are placed on an output device and content is assigned to each frame
• Positioning of layout frames:
– Fixed to a position of a document
– Fixed to a position on a page
– Relative to the position of another frame
– Concept of frames is used for positioning of time-dependent
objects
• Example: in window-based systems, layout frames correspond
to windows and video can be positioned in a window
26. Temporal Relation
• Temporal relation defines temporal dependencies between
media objects
– Example: lip synchronization
• This relation is the focus of this discussion; we will not talk about content or spatial relations
• Time-dependent objects represent a media stream, because temporal relations exist between consecutive units of the stream
• Time-independent objects are traditional media such as images or text.
27. Temporal Relations (2)
• Temporal synchronization is supported by many system
components:
– OS (CPU scheduling, semaphores during IPC)
– Communication systems (traffic shaping, network
scheduling)
– Databases
– Document handling
• Synchronization needed at several levels of a Multimedia
System
28. Temporal Relations (3)
• Level 1: the OS and lower communication layers bundle single streams
– Objective: avoid jitter at the presentation time of one stream
• Level 2: on top of this level is the run-time support for the synchronization of multimedia streams (schedulers)
– Objective: bounded skews between the various streams
• Level 3: the next level holds the run-time support for synchronization between time-dependent and time-independent media, together with the handling of user interaction
– Objective: bounded skews between time-dependent and time-independent media
29. Specification of Synchronization
• Implicit Specification
– Temporal relation may be specified implicitly during capturing of
the media objects; the goal of a presentation is to present media in
the same ways as they were originally captured
• Audio/video recording and playback /VOD applications
• Explicit Specification
– Temporal relation may be specified explicitly in the case of
presentations that are composed of independently captured or
otherwise created objects
• Slide show where the presentation designer
– Selects appropriate slides
– Creates the audio for the slides
– Defines the units of the audio presentation stream
– Defines the points in the audio stream where the slides have to be presented
30. Inter-object and Intra-Object
Synchronization
• Intra-object synchronization refers to the time relation between
various presentation units of one time-dependent media object
• Inter-object synchronization refers to the synchronization
between media objects
[Figure: intra-object synchronization as the 40 ms spacing between consecutive video/animation frames; inter-object synchronization among Audio 1, Audio 2, Slides, Video and Animation]
31. Classification of Synchronization Units
• Logical Data Units (LDU)
– Samples or Pixels,
– Notes or frames
– Movements or scenes
– Symphony or movie
• Fixed LDU vs Variable LDU
• LDU specification during recording vs LDU defined by
user
• Open LDU vs Closed LDU
32. Live Synchronization
• Goal is to exactly reproduce at a presentation the temporal
relations as they existed during the capturing process
• Need to capture temporal relation information during the
capturing
• Live Sync is needed in conversational services
– Video Conferencing, Video Phone
– Recording followed by later retrieval is instead considered a retrieval service, i.e., a presentation with delay
33. Synthetic Synchronization
• Temporal relations are artificially specified
• Often used in presentation and retrieval-based systems
with stored data objects that are arranged to provide new
combined multimedia objects
– Authoring and tutoring systems
• Need synchronization editors to support flexible
synchronization relations between media
• Two phases: (1) specification phase defines temporal
relations, (2) presentation phase presents data in a
synchronized mode
– Example: 4 recorded audio messages relate to parts of an engine shown in an animation; the animation sequence shows a slow 360-degree rotation of the engine.
34. Synchronization Requirements
• For intra-object synchronization:
– Accuracy concerning jitters and end-to-end delays in the
presentations of LDUs
• For inter-object synchronization:
– Accuracy in the parallel presentation of media objects
• Implications of the blocking method:
– Fine for time-independent media
– Gap problem for time-dependent media
• What does the blocking of a stream mean for the output device?
• Should previous parts be repeated in the case of speech or music?
• Should the last picture of a stream be shown?
• How long can such a gap exist?
35. Synchronization Requirements (2)
• Solutions to Gap problem
– Restricted blocking method
• Switch to an alternative presentation if the gap between late video and audio exceeds a predefined threshold
• Show the last picture as a still image
– Re-sampling of a stream
• Speed up or slow down streams for the purpose of synchronization
– Off-line re-sampling – used after capturing media streams with independent devices
» Example: a concert captured with two independent audio and video devices
– Online re-sampling – used during a presentation when a gap between media streams occurs at run time
36. Synchronization Requirements (3)
• Lip Synchronization Requirements refer to temporal relation
between audio and video streams for human speaking
• Time difference between related audio and video LDUs is
called synchronization skew
• Streams are in sync if skew = 0 or skew is within the bound
• Streams are out of sync if skew exceeds the bound
• Bounds:
– Audio/Video in sync means -80ms ≤ skew ≤ 80ms
– Audio/Video out of sync means skew < -160ms or skew >
160ms
– Transient means -160ms ≤ skew < -80ms, or 80ms < skew ≤ 160ms
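As an illustration, a minimal sketch that classifies a measured skew against the bounds above (the function name and test values are our own):

```python
# Classify a measured lip-sync skew (in ms, audio relative to video)
# against the in-sync / transient / out-of-sync bounds quoted above.

def lip_sync_state(skew_ms: float) -> str:
    if -80 <= skew_ms <= 80:
        return "in sync"
    if skew_ms < -160 or skew_ms > 160:
        return "out of sync"
    return "transient"  # 80 < |skew| <= 160

for s in (-200, -100, 0, 90, 170):
    print(s, "->", lip_sync_state(s))
```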
37. Synchronization Requirements (4)
• Pointer Synchronization Requirements are very important in
computer-supported cooperative work (CSCW)
• We need synchronization between graphics, pointers and audio
• Comparison
– A lip sync error is a skew of 40 to 60 ms
– A pointer sync error is a skew of 250 to 1500 ms
• Bounds:
– Pointer/Audio/Graphics in sync means -500ms ≤ skew ≤ 750ms
– Out of sync means skew < -1000ms or skew > 1250ms
– Transient means -1000ms ≤ skew < -500ms, or 750ms < skew ≤ 1250ms
38. Synchronization Requirements (5)
• Digital audio on CD-ROM:
– the maximum allowable jitter delay in perception experiments is 5-10 ns; other experiments suggest 2 ms
• Combination of audio and animation is not as stringent as
lip synchronization
– Maximum allowable skew is +/- 80ms
• Stereo audio is tightly coupled
– Maximum allowable skew is 20ms; because of
listening errors, suggestion for skew is +/- 11ms
• Loosely coupled audio channels: speaker and background
music
– Maximum allowable skew is 500ms.
39. Synchronization Requirements (6)
• Production-level synchronization should be guaranteed prior to the presentation of data at the user interface
– For example, in the case of recording synchronized data for subsequent playback
• Stored data should be captured and recorded with no skew
• For playback, the defined lip sync boundaries are 80 ms
• For playback at local and remote workstations simultaneously, sync skews should be between -160 ms and 0 ms (video should be ahead of audio for the remote station due to pre-fetching)
40. Synchronization Requirements (7)
• Presentation-level synchronization should be defined at
the user interface
• This synchronization focuses on human perception
– Examples
• Video and image overlay: +/- 240ms
• Video and image non-overlay: +/- 500ms
• Audio and image (music with notes): +/- 5ms
• Audio and slide show (loosely coupled image): +/- 500ms
• Audio and text (text annotation): +/- 240ms
41. Reference Model for Synchronization
• Synchronization of multimedia objects is classified with respect to a four-level system:
– SPECIFICATION LEVEL: an open layer that includes applications and tools which allow the creation of synchronization specifications (e.g., sync editors); editing and formatting, mapping of user QoS to abstractions at the object level
– OBJECT/SERVICE LEVEL: operates on all types of media and hides the differences between discrete and continuous media; plans, coordinates and initiates presentations
– STREAM LEVEL: operates on multiple media streams and provides inter-stream synchronization; resource reservation and scheduling
– MEDIA LEVEL: operates on a single stream, treated as a sequence of LDUs, and provides intra-stream synchronization; file and device access
42. Synchronization in Distributed
Environments
• Information of synchronization must be transmitted with audio
and video streams, so that the receiver side can synchronize the
streams
• Delivery of complete sync information can be done before the
start of presentation
– This is used in synthetic synchronization
• Advantage: simple implementation
• Disadvantage: presentation delay
• Delivery of complete sync information can be done using out-of-band communication via a separate sync channel
– This is used in live synchronization
• Advantage: no additional presentation delays
• Disadvantage: additional channel is needed; additional
errors can occur
43. Synchronization in Distributed
Environments (2)
• Delivery of complete synchronization information can be done
using in-band communication via multiplexed data streams,
i.e., synchronization information is in headers of the
multimedia PDU
– Advantage: related sync information is delivered together
with media units
– Disadvantage: difficult to use for multiple sources
44. Synchronization in Distributed
Environments (3)
Location of Synchronization Operations
• It is possible to synchronize media objects by recording
objects together and leave them together as one object, i.e.,
combine objects into a new media object during creation;
Synchronization operation happens then at the recording site
• Synchronization operation can be placed at the sink. In this
case the demand on bandwidth is larger because additional
sync information must be transported
• Synchronization operation can be placed at the source. In this
case the demand on bandwidth is smaller because the streams
are multiplexed according to synchronization requirements
45. Synchronization in Distributed Environments (4)
Clock Synchronization
• Consider synchronization accuracy between clocks at
source and destination
• Global time-based synchronization needs clock
synchronization
• In order to re-synchronize, we can allocate buffers at the
sink and start transmission of audio and video in advance,
or use NTP (Network Time Protocol) to bound the
maximum clock offset
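As a sketch of the NTP option, the textbook offset/delay computation from four timestamps; the timestamp values below are hypothetical:

```python
# NTP-style estimate of the clock offset between source and sink.
# t0 = client send time, t1 = server receive time,
# t2 = server send time,  t3 = client receive time.

def ntp_offset_and_delay(t0, t1, t2, t3):
    offset = ((t1 - t0) + (t2 - t3)) / 2   # estimated clock offset
    delay = (t3 - t0) - (t2 - t1)          # round-trip network delay
    return offset, delay

# Hypothetical timestamps (seconds): the sink's clock runs ~0.05 s fast.
print(ntp_offset_and_delay(10.000, 10.070, 10.071, 10.041))
# -> (0.05, 0.04): bound the skew by correcting with the offset estimate
```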
46. Synchronization in Distributed
Environments (5)
Other Synchronization Issues
• Synchronization in distributed environment is a multi-step
process
– Sync must be considered during object acquisition (during video
digitization)
– Sync must be considered during retrieval (synchronize access to
frames of a stored video)
– Sync must be considered during delivery of LDUs to the network
(traffic shaping)
– Sync must be considered during transport (use isochronous
protocols if possible)
– Sync must be considered at the sink (sync delivery to the output
devices)
47. Synchronization Specification Methods
• Interval-based Specification
– The presentation duration of an object is considered as an interval
• Examples of operations: A before(0) B, A overlaps B, A starts B, A equals B, A during B, A while(0,0) B
– Advantage: easy to handle open LDUs and therefore user interactions
– Disadvantage: the model does not include skew specifications
[Figure: example interval timeline for Audio 1, Audio 2, Slides, Video 1, Animation]
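A hedged sketch of how such interval operations can be checked, with intervals as (start, end) pairs; the simplified predicates below take the delay arguments of before(0) and while(0,0) as zero:

```python
# Interval-based relation checks over (start, end) pairs.

def before(a, b):   return a[1] < b[0]                    # A ends, then B starts
def overlaps(a, b): return a[0] < b[0] < a[1] < b[1]      # A spans B's start
def starts(a, b):   return a[0] == b[0] and a[1] < b[1]   # same start, A shorter
def equals(a, b):   return a == b
def during(a, b):   return b[0] < a[0] and a[1] < b[1]    # A lies inside B

audio = (0, 10)   # e.g. Audio 1 plays from t=0 to t=10
video = (0, 12)
print(starts(audio, video))   # True: both start together, audio ends first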
48. Synchronization Specification (2)
• Control Flow-based Specification – Hierarchical
Approach
– The flow of concurrent presentation threads is synchronized at predefined points of the presentation
– Basic hierarchical specification: 1. serial
synchronization, 2. parallel synchronization of actions
– Action can be atomic or compound
– Atomic action handles presentation of a single media
object, user input or delay
– Compound actions are combinations of
synchronization operators and atomic actions
– Delay as an atomic action allows modeling of further
synchronization (e.g. delay in serial presentation)
49. Synchronization Specification (3)
• Control Flow-based Specification – Hierarchical Approach
• Advantage: easy to understand, natural support of hierarchy and
integration of interactive objects is easy
• Disadvantage: additional description of skews and QoS is necessary; presentation durations must be added
[Figure: hierarchical presentation tree combining Slides, Animation, Audio 1, Video and Audio 2]
50. Synchronization Specification (4)
• Control flow-based Synchronization Specification – Timed Petri Nets
[Figure: a Petri net with an input place holding a token, a transition, and an output place]
• Advantage: Timed Petri nets allow all kinds of synchronization specifications
• Disadvantage: complex specifications become difficult, and the model offers insufficient abstraction of media object content because the media objects must be split into sub-objects
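A minimal sketch of the Petri-net firing rule that underlies such specifications; place and transition names are illustrative:

```python
# Timed-Petri-net step rule: a transition fires when all its input places
# hold tokens, consuming them and (after its delay) marking its outputs.

marking = {"audio_ready": 1, "video_ready": 1, "presented": 0}
transitions = {
    "present_sync": {"in": ["audio_ready", "video_ready"],
                     "out": ["presented"], "delay_ms": 40},
}

def fire(name: str) -> bool:
    t = transitions[name]
    if all(marking[p] > 0 for p in t["in"]):   # enabled only if all inputs marked
        for p in t["in"]:
            marking[p] -= 1                    # consume input tokens
        for p in t["out"]:
            marking[p] += 1                    # produce output tokens after delay_ms
        return True
    return False

fire("present_sync")
print(marking)   # {'audio_ready': 0, 'video_ready': 0, 'presented': 1}
```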
52. OBJECTIVES:
To show how audio/video files can be downloaded for future use or
broadcast to clients over the Internet. The Internet can also be used for live
audio/video interaction. Audio and video need to be digitized before being
sent over the Internet.
To discuss how audio and video files are compressed for transmission
through the Internet.
To discuss the phenomenon called jitter that can be created on a packet-switched network when transmitting real-time data.
To introduce the Real-Time Transport Protocol (RTP) and Real-Time
Transport Control Protocol (RTCP) used in multimedia applications.
To discuss voice over IP as a real-time interactive audio/video application.
53. OBJECTIVES (continued):
To introduce the Session Initiation Protocol (SIP) as an application
layer protocol that establishes, manages, and terminates multimedia
sessions.
To introduce quality of service (QoS) and how it can be improved
using scheduling techniques and traffic shaping techniques.
To discuss Integrated Services and Differentiated Services and how they can be implemented.
To introduce Resource Reservation Protocol (RSVP) as a signaling
protocol that helps IP create a flow and makes a resource reservation.
54. INTRODUCTION
We can divide audio and video services into three broad
categories: streaming stored audio/video, streaming live
audio/video, and interactive audio/video, as shown in Figure 1.
Streaming means a user can listen (or watch) the file after the
downloading has started.
59. DIGITIZING AUDIO AND VIDEO
Before audio or video signals can be sent on the
Internet, they need to be digitized. We discuss audio and
video separately.
60. Topics Discussed in the Section
Digitizing Audio
Digitizing Video
62. AUDIO AND VIDEO COMPRESSION
To send audio or video over the Internet requires
compression. In this section, we first discuss audio
compression and then video compression.
63. Topics Discussed in the Section
Audio Compression
Video Compression
72. STREAMING STORED AUDIO/VIDEO
Now that we have discussed digitizing and compressing
audio/video, we turn our attention to specific applications.
The first is streaming stored audio and video.
Downloading these types of files from a Web server can
be different from downloading other types of files. To
understand the concept, let us discuss four approaches,
each with a different complexity.
73. Topics Discussed in the Section
First Approach: Using a Web Server
Second Approach: Using a Web Server with Metafile
Third Approach: Using a Media Server
Fourth Approach: Using a Media Server and RTSP
74. Figure 10 Using a Web server
[Figure: 1. the browser sends GET: audio/video file; 2. the server sends the RESPONSE; 3. the browser passes the audio/video file to the media player]
75. Figure 11 Using a Web server with a metafile
[Figure: 1. the browser sends GET: metafile; 2. the server sends the RESPONSE; 3. the browser passes the metafile to the media player; 4. the media player sends GET: audio/video file; 5. the server sends the RESPONSE to the media player]
76. Figure 12 Using a media server
[Figure: 1. the browser sends GET: metafile to the Web server; 2. the Web server sends the RESPONSE; 3. the browser passes the metafile to the media player; 4. the media player sends GET: audio/video file to the media server; 5. the media server sends the RESPONSE]
77. Figure 13 Using a media server and RTSP
[Figure: 1. GET: metafile; 2. RESPONSE; 3. the metafile is passed to the media player; 4. SETUP; 5. RESPONSE; 6. PLAY; 7. RESPONSE; 8. the audio/video stream flows from the media server; 9. TEARDOWN and RESPONSE]
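To make steps 4-9 concrete, a sketch of the corresponding RTSP messages; the host, URL, session ID and client ports are hypothetical, and the message layout follows RFC 2326:

```python
# Illustrative RTSP requests behind steps 4-9 of Figure 13.
import socket

URL = "rtsp://media.example.com/demo/clip"   # hypothetical media URL

requests = [
    # Steps 4/5: create a session; tell the server where to send RTP/RTCP.
    f"SETUP {URL} RTSP/1.0\r\nCSeq: 1\r\n"
    "Transport: RTP/AVP;unicast;client_port=8000-8001\r\n\r\n",
    # Steps 6/7: start the audio/video stream.
    f"PLAY {URL} RTSP/1.0\r\nCSeq: 2\r\nSession: 12345678\r\n\r\n",
    # Step 8 is the RTP stream itself (not shown); step 9 closes the session.
    f"TEARDOWN {URL} RTSP/1.0\r\nCSeq: 3\r\nSession: 12345678\r\n\r\n",
]

with socket.create_connection(("media.example.com", 554)) as s:  # 554 = RTSP port
    for req in requests:
        s.sendall(req.encode())
        print(s.recv(4096).decode())   # the server's RESPONSE for each step
```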
78. STREAMING LIVE AUDIO/VIDEO
Streaming live audio/video is similar to the
broadcasting of audio and video by radio and TV
stations. Instead of broadcasting to the air, the
stations broadcast through the Internet. There are
several similarities between streaming stored
audio/video and streaming live audio/video. They are
both sensitive to delay; neither can accept
retransmission. However, there is a difference. In the
first application, the communication is unicast and
on-demand. In the second, the communication is
multicast and live.
79. REAL-TIME INTERACTIVE AUDIO/VIDEO
In real-time interactive audio/video, people communicate
with one another in real time. The Internet phone or voice
over IP is an example of this type of application. Video
conferencing is another example that allows people to
communicate visually and orally.
90. Note
Translation means changing the encoding of a payload to a lower quality to match the bandwidth of the receiving network.
92. Note
TCP, with all its sophistication, is not suitable for interactive multimedia traffic because we cannot allow retransmission of packets.
93. Note
UDP is more suitable than TCP for interactive traffic. However, we need the services of RTP, another transport layer protocol, to make up for the deficiencies of UDP.
94. RTP
Real-time Transport Protocol (RTP) is the protocol
designed to handle real-time traffic on the Internet. RTP
does not have a delivery mechanism (multicasting, port
numbers, and so on); it must be used with UDP. RTP
stands between UDP and the application program. The
main contributions of RTP are timestamping, sequencing,
and mixing facilities.
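As an illustration of the timestamping, sequencing and mixing fields just mentioned, a sketch that parses the 12-byte fixed RTP header defined in RFC 3550 (the dictionary keys are our own):

```python
# Parse the fixed RTP header (RFC 3550) from the first 12 bytes of a packet.
import struct

def parse_rtp_header(packet: bytes) -> dict:
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version":    b0 >> 6,         # always 2 for current RTP
        "padding":    (b0 >> 5) & 1,
        "extension":  (b0 >> 4) & 1,
        "csrc_count": b0 & 0x0F,       # contributing sources (mixing)
        "marker":     b1 >> 7,
        "payload_type": b1 & 0x7F,     # identifies the media encoding
        "sequence":   seq,             # detects loss and reordering
        "timestamp":  ts,              # drives playout timing
        "ssrc":       ssrc,            # identifies the stream source
    }

# Example packet: version 2, payload type 0 (PCMU), sequence number 1.
pkt = struct.pack("!BBHII", 0x80, 0x00, 1, 160, 0xDEADBEEF)
print(parse_rtp_header(pkt))
```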
95. Topics Discussed in the Section
RTP Packet Format
UDP Port
96. Figure 18 RTP packet header format
97. Figure 19 RTP packet header format
99. Note
RTP uses a temporary even-numbered UDP port.
100. RTCP
RTP allows only one type of message, one that
carries data from the source to the destination. In
many cases, there is a need for other messages in a
session. These messages control the flow and
quality of data and allow the recipient to send
feedback to the source or sources. Real-Time
Transport Control Protocol (RTCP) is a protocol
designed for this purpose.
101. Topics Discussed in the Section
Sender Report
Receiver Report
Source Description Message
Bye Message
Application-Specific Message
UDP Port
103. Note
RTCP uses an odd-numbered UDP port number that follows the port number selected for RTP.
104. VOICE OVER IP
Let us concentrate on one real-time interactive
audio/video application: voice over IP, or Internet
telephony. The idea is to use the Internet as a
telephone network with some additional capabilities.
Instead of communicating over a circuit-switched
network, this application allows communication
between two parties over the packet-switched
Internet. Two protocols have been designed to
handle this type of communication: SIP and H.323.
We briefly discuss both.
112. Figure 27 H.323 example
[Figure: the terminal finds the IP address of the gatekeeper and registers; Q.931 messages set up the call; RTP carries the audio exchange and RTCP the management; Q.931 messages terminate the call]
114. Chronology of Video Standards
[Timeline 1990-2002 – ITU-T track: H.261 (1990), H.263 (1996), H.263+ (1998), H.263++ (2000), H.263L (2002); ISO track: MPEG-1 (1992), MPEG-2 (1994), MPEG-4 (1999), MPEG-7 (2001)]
115. Chronology of Video Standards
• (1990) H.261, ITU-T
– Designed to work at multiples of 64 kb/s (p×64).
– Operates on standard frame sizes CIF, QCIF.
• (1992) MPEG-1, ISO “Storage & Retrieval of Audio &
Video”
– Evolution of H.261.
– Main application is CD-ROM based video (~1.5 Mb/s).
116. Chronology continued
• (1994-5) MPEG-2, ISO “Digital Television”
– Evolution of MPEG-1.
– Main application is video broadcast (DirecTV, DVD,
HDTV).
– Typically operates at data rates of 2-3 Mb/s and above.
117. Chronology continued
• (1996) H.263, ITU-T
– Evolution of all of the above.
– Supports more standard frame sizes (SQCIF, QCIF,
CIF, 4CIF, 16CIF).
– Targeted low bit rate video <64 kb/s. Works well at
high rates, too.
• (1/98) H.263 Ver. 2 (H.263+), ITU-T
– Additional negotiable options for H.263.
– New features include: deblocking filter, scalability,
slicing for network packetization and local decode,
square pixel support, arbitrary frame size, chromakey
transparency, etc…
118. Chronology continued
• (1/99) MPEG-4, ISO “Multimedia Applications”
– MPEG4 video based on H.263, similar to H.263+
– Adds more sophisticated binary and multi-bit
transparency support.
– Support for multi-layered, non-rectangular video display.
• (2H/’00) H.263++ (H.263V3), ITU-T
– Tentative work item.
– Addition of features to H.263.
– Maintain backward compatibility with H.263 V.1.
119. Chronology continued
• (2001) MPEG7, ISO “Content Representation for Info
Search”
– Specify a standardized description of various types of
multimedia information. This description shall be
associated with the content itself, to allow fast and
efficient searching for material that is of a user’s
interest.
• (2002) H.263L, ITU-T
– Call for Proposals, early ‘98.
– Proposals reviewed through 11/98, decision to proceed.
– Determined in 2001
120. Section 1: Conferencing Video
Video Compression Review
Chronology of Video Standards
The Input Video Format
H.263 Overview
H.263+ Overview
121. Video Format for Conferencing
• Input color format is YCbCr (a.k.a. YUV). Y is the
luminance component, U & V are chrominance (color
difference) components.
• Chrominance is subsampled by two in each direction.
• Input frame size is based on the Common Intermediate
Format (CIF) which is 352x288 pixels for luminance and
176x144 for each of the chrominance components.
[Figure: a CIF frame = one Y plane plus the subsampled Cb and Cr planes]
122. YCbCr (YUV) Color Space
• Defined as input color space to H.263, H.263+, H.261,
MPEG, etc.
• It’s a 3x3 transformation from RGB.
Y  =  0.299·R + 0.587·G + 0.114·B
Cb = -0.169·R - 0.331·G + 0.500·B
Cr =  0.500·R - 0.419·G - 0.081·B
Y represents the luminance of a pixel.
Cb, Cr represent the color difference or chrominance of a pixel.
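A small sketch applying this transform, with R, G, B normalized to [0, 1]:

```python
# RGB -> YCbCr using the 3x3 matrix above.

def rgb_to_ycbcr(r: float, g: float, b: float):
    y  =  0.299 * r + 0.587 * g + 0.114 * b   # luminance
    cb = -0.169 * r - 0.331 * g + 0.500 * b   # blue color difference
    cr =  0.500 * r - 0.419 * g - 0.081 * b   # red color difference
    return y, cb, cr

print(rgb_to_ycbcr(1.0, 1.0, 1.0))  # white: y = 1.0, cb ~ 0, cr ~ 0
```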
123. Subsampled Chrominance
• The human eye is more sensitive to spatial detail in
luminance than in chrominance.
• Hence, it doesn’t make sense to have as many pixels
in the chrominance planes.
124. Spatial relation between luma and chroma pels for CIF 4:2:0
[Figure: sampling grid showing luminance pels, chrominance pels and block edges; the chroma sample positions are different than MPEG-2 4:2:0]
125. Common Intermediate Format
•The input video format is based on Common Intermediate
Format or CIF.
•It is called Common Intermediate Format because it is
derivable from both 525 line/60 Hz (NTSC) and 625 line/50
Hz (PAL) video signals.
•CIF is defined as 352 pels per line and 288 lines per frame.
•The picture area for CIF is defined to have an aspect ratio of about 4:3. However, (352 / 4) × 3 = 264 ≠ 288, so the pixels cannot be square.
126. Picture & Pixel Aspect Ratios
Pixels are not square in CIF.
[Figure: a 352×288 CIF picture; picture aspect ratio 4:3, pixel aspect ratio 12:11]
127. Picture & Pixel Aspect Ratios
Hence on a square pixel display such as a computer screen,
the video will look slightly compressed horizontally. The
solution is to spatially resample the video frames to be
384 x 288 or 352 x 264
This corresponds to a 4:3 aspect ratio for the picture area on
a square pixel display.
128. Blocks and Macroblocks
The luma and chroma planes are divided into 8x8 pixel
blocks. Every four luma blocks are associated with a
corresponding Cb and Cr block to create a macroblock.
[Figure: a macroblock = four 8x8 Y blocks plus one 8x8 Cb block and one 8x8 Cr block]
129. Section 1: Conferencing Video
Video Compression Review
Chronology of Video Standards
The Input Video Format
H.263+ Overview
131. ITU-T Recommendation H.263
• H.263 targets low data rates (< 28 kb/s). For example, it can compress QCIF video to 10-15 fps at 20 kb/s.
• For the first time there is a standard video codec that can be
used for video conferencing over normal phone lines (H.324).
• H.263 is also used in ISDN-based VC (H.320) and
network/Internet VC (H.323).
132. ITU-T Recommendation H.263
Composed of a baseline plus
four negotiable options
Baseline Codec
Unrestricted/Extended Motion Vector Mode
Advanced Prediction Mode
PB Frames Mode
Syntax-based Arithmetic Coding Mode
134. Picture & Macroblock Types
• Two picture types:
– INTRA (I-frame) implies no temporal prediction is
performed.
– INTER (P-frame) may employ temporal prediction.
• Macroblock (MB) types:
– INTRA & INTER MB types (even in P-frames).
• INTER MBs have shorter symbols in P frames
• INTRA MBs have shorter symbols in I frames
– Not coded - MB data is copied from previous decoded
frame.
135. Motion Vectors
• Motion vectors have 1/2 pixel granularity. Reference frames must be interpolated by two.
• MVs are not coded directly; rather, a median predictor is used.
• The predictor residual is then coded using a VLC table.
[Figure: neighboring blocks A (left), B (above) and C (above-right) around the current block X]
MVX = median(MVA, MVB, MVC)
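A minimal sketch of this median prediction; the vector values are illustrative:

```python
# Median motion-vector prediction: the MV for block X is predicted from
# neighbors A (left), B (above), C (above-right); only the residual is coded.

def median3(a, b, c):
    return sorted((a, b, c))[1]

def predict_mv(mv_a, mv_b, mv_c):
    """Each argument is an (x, y) vector in half-pel units."""
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))

mv_x = (5, -2)                              # actual MV for block X
pred = predict_mv((4, 0), (6, -3), (5, 1))  # -> (5, 0)
residual = (mv_x[0] - pred[0], mv_x[1] - pred[1])
print(residual)                             # (0, -2) is what gets VLC-coded
```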
137. Transform Coefficient Coding
Assign a variable length code according to three parameters (3-D VLC):
1 - Length of the run of zeros preceding the current nonzero coefficient.
2 - Amplitude of the current coefficient.
3 - Indication of whether the current coefficient is the last one in the block.
The most common combinations are variable length coded (3-13 bits); the rest are coded with escape sequences (22 bits).
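As an illustration, a sketch that derives the (last, run, level) triplets from a zig-zag-ordered coefficient list; the actual VLC table lookup is omitted:

```python
# Turn quantized, zig-zag-ordered coefficients into (last, run, level)
# events, the triplets that index the 3-D VLC table.

def run_level_events(coeffs):
    events, run = [], 0
    nonzero = [i for i, c in enumerate(coeffs) if c != 0]
    for i, c in enumerate(coeffs):
        if c == 0:
            run += 1                     # extend the run of zeros
        else:
            last = (i == nonzero[-1])    # is this the final nonzero coeff?
            events.append((last, run, c))
            run = 0
    return events

print(run_level_events([12, 0, 0, -3, 1, 0, 0, 0]))
# [(False, 0, 12), (False, 2, -3), (True, 0, 1)]
```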
138. Quantization
• H.263 uses a scalar quantizer with center clipping.
• The quantizer varies from 2 to 62, by 2's.
• It can be varied ±1, ±2 at macroblock boundaries (2 bits), or set to any value 2-62 at row and picture boundaries (5 bits).
[Figure: center-clipping quantizer transfer curve; small inputs map to 0, larger inputs map to ±Q, ±2Q, …]
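A simplified sketch of a dead-zone ("center clipping") scalar quantizer in this spirit; the exact H.263 quantization and reconstruction formulas differ in detail:

```python
# Simplified center-clipping scalar quantizer (not the exact H.263 rules).

def quantize(coeff: int, qp: int) -> int:
    """qp: quantizer step (2..62, even). Small inputs clip to zero."""
    if abs(coeff) < qp:
        return 0                                   # dead zone around zero
    return coeff // qp if coeff > 0 else -((-coeff) // qp)

def dequantize(level: int, qp: int) -> int:
    if level == 0:
        return 0
    sign = 1 if level > 0 else -1
    return sign * (abs(level) * qp + qp // 2)      # reconstruct mid-bin

for c in (-130, -40, 3, 75, 200):
    lev = quantize(c, 30)
    print(c, "->", lev, "->", dequantize(lev, 30))
```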
139. Bit Stream Syntax
Hierarchy of three layers:
Picture Layer
GOB* Layer
MB Layer
*A GOB is usually a row of macroblocks, except for frame sizes greater than CIF.
[Figure: bit stream layout – Picture Hdr, then repeated GOB Hdr followed by its MBs]
140. Picture Layer Concepts
Fields: Picture Start Code | Temporal Reference | Picture Type | Picture Quant
• PSC - a sequence of bits that cannot be emulated anywhere else in the bit stream.
• TR - a 29.97 Hz counter indicating the time reference for a picture.
• PType - denotes INTRA, INTER-coded, etc.
• PQuant - indicates which quantizer (2…62) is used initially for the picture.
141. GOB Layer Concepts
GOB Headers are optional.
Fields: GOB Start Code | GOB Number | GOB Quant
• GSC - Another unique start code (17 bits).
• GOB Number - Indicates which GOB, counting
vertically from the top (5 bits).
• GOB Quant - Indicates which quantizer (2…62) is
used for this GOB (5 bits).
• GOB can be decoded independently from the
rest of the frame.
142. Macroblock Layer Concepts
Fields: Coded Flag | MB Type | Coded Block Pattern | DQuant | MV Deltas | Transform Coefficients
• COD - if set, indicates empty INTER MB.
• MB Type - indicates INTER, INTRA, whether MV is
present, etc.
• CBP - indicates which blocks, if any, are empty.
• DQuant - indicates a quantizer change by +/- 2, 4.
• MV Deltas - are the MV prediction residuals.
• Transform coefficients - are the 3-D VLC’s for the
coefficients.
143. Unrestricted/Extended Motion Vector
Mode
• Motion vectors are permitted to point outside the picture
boundaries.
– non-existent pixels are created by replicating the edge
pixels.
– improves compression when there is movement across
the edge of a picture boundary or when there is camera
panning.
• Also possible to extend the range of the motion vectors
from [-16,15.5] to [-31.5,31.5] with some restrictions. This
better addresses high motion scenes.
146. Advanced Prediction Mode
• Includes motion vectors across picture boundaries from
the previous mode.
• Option of using four motion vectors for 8x8 blocks
instead of one motion vector for 16x16 blocks as in
baseline.
• Overlapped motion compensation to reduce blocking
artifacts.
147. Overlapped Motion Compensation
• In normal motion compensation, the current block is composed
of
– the predicted block from the previous frame (referenced by
the motion vectors), plus
– the residual data transmitted in the bit stream for the current
block.
• In overlapped motion compensation, the prediction is a
weighted sum of three predictions.
148. Overlapped Motion Compensation
• Let (m, n) be the column & row indices of an 8×8 pixel block in a frame.
• Let (i, j) be the column & row indices of a pixel within an 8×8 block.
• Let (x, y) be the column & row indices of a pixel within the entire frame, so that:
(x, y) = (m·8 + i, n·8 + j)
149. Overlapped Motion Comp.
• Let (MV0x,MV0y) denote the
motion vectors for the current
block.
• Let (MV1x,MV1y) denote the
motion vectors for the block
above (below) if the current pixel
is in the top (bottom) half of the
current block.
• Let (MV2x,MV2y) denote the
motion vectors for the block to
the left (right) if the current pixel
is in the left (right) half of the
current block.
[Figure: the current block's MV0, with MV1 taken from the block above or below and MV2 from the block to the left or right]
150. Overlapped Motion Comp.
Then the summed, weighted prediction is:
P(x,y) = (q(x,y)·H0(i,j) + r(x,y)·H1(i,j) + s(x,y)·H2(i,j) + 4) / 8
where q, r, s are the previous-frame pixel values displaced by the three motion vectors:
q(x,y) = p_prev(x + MV0x, y + MV0y)
r(x,y) = p_prev(x + MV1x, y + MV1y)
s(x,y) = p_prev(x + MV2x, y + MV2y)
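A sketch of the per-pixel weighted sum; H.263 defines position-dependent 8×8 weight matrices H0, H1, H2 that sum to 8 at each position, but constant illustrative weights (5, 2, 1) are used here just to show the computation:

```python
# Per-pixel overlapped motion compensation (illustrative constant weights;
# the standard's H0/H1/H2 matrices vary with pixel position in the block).

def overlapped_prediction(q: int, r: int, s: int,
                          h0: int = 5, h1: int = 2, h2: int = 1) -> int:
    """q, r, s: the three motion-compensated pixel predictions."""
    return (q * h0 + r * h1 + s * h2 + 4) // 8   # +4 rounds to nearest

print(overlapped_prediction(100, 120, 90))  # blended pixel value: 104
```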
153. PB Frames Mode
• Permits two pictures to be coded as one unit: a P frame
as in baseline, and a bi-directionally predicted frame or
B frame.
• B frames provide more efficient compression at times.
• Can increase frame rate 2X with only about 30%
increase in bit rate.
• Restriction: the backward predictor cannot extend
outside the current MB position of the future frame.
See diagram.
154. PB Frames
[Figure: a PB unit codes Picture 2 (B frame) bidirectionally between Picture 1 and Picture 3 (P or I frames), with forward vector V·1/2 and backward vector -V·1/2 derived from the P frame's motion vector V]
2X frame rate for only 30% more bits.
155. Syntax based Arithmetic Coding Mode
• In this mode, all the variable length coding and
decoding of baseline H.263 is replaced with
arithmetic coding/decoding. This removes the
restriction that each symbol must be represented by
an integer number of bits, thus improving
compression efficiency.
• Experiments indicate that compression can be
improved by up to 10% over variable length
coding/decoding.
• Complexity of arithmetic coding is higher than
variable length coding, however.
156. H.263 Improvements over H.261
• H.261 only accepts the QCIF and CIF formats.
• No 1/2 pel motion estimation in H.261, instead it uses a
spatial loop filter.
• H.261 does not use median predictors for motion vectors
but simply uses the motion vector in the MB to the left as
predictor.
• H.261 does not use a 3-D VLC for transform coefficient
coding.
• GOB headers are mandatory in H.261.
• Quantizer changes at MB granularity require 5 bits in H.261 and only 2 bits in H.263.