Media processing in the cloud: what, where and how
Ericsson Review, the communications technology journal since 1924. 2013 • 5
April 11, 2013
The evolution to IP technology, VoLTE and new video services will have a profound
impact on the way person-to-person media processing will be performed in the
networks of the future. This evolution raises some questions: what processing will be
needed, where will it take place and how will it be implemented?
Initially, the services provided by the telephone network were carried out by switchboard operators. Gradually, as computing resources were introduced, control logic processing and media handling became entirely automatic, leading to today's models where cloud-based services are provisioned over a network using shared pools of computing resources, and where users pay for what they consume.

Phones were initially simple devices, consisting of a microphone and a loudspeaker. When routing of calls became automatic, a rotary dial was added. Today, more than one billion smartphones around the world provide a computing platform that is capable of running millions of applications and of providing extensive media processing.

Two of the questions addressed in this article are: what media processing will take place in the communication services of the future, and where will this media processing be provided – will it be handled in a cloud-like manner or will it be pushed out to terminals?

The deployment of generic industry hardware that is capable of running many kinds of applications in a flexible manner is a growing trend within the ICT industry. It follows then that generic computers offering cloud services will also be used to implement future telecommunication networks in operator cloud centers.

The third and final question addressed in this article is: how will media be processed in evolved telecommunications networks – how much generic hardware will be used, and will DSPs on dedicated platforms continue to be the preferred approach?

Bearing in mind that the cloud is not just about technology, this article also describes how cloud principles can be applied to the various business models for communication services.
JOHAN LUNDSTRÖM
BOX A Terms and abbreviations
AMR Adaptive Multi-Rate
AMR-WB AMR-wideband
AS application server
ATM Asynchronous Transfer Mode
BGF border gateway function
BSC Base Station Controller
CAGR Compound Annual Growth Rate
DSP digital signal processor
EFR Enhanced Full Rate
IETF Internet Engineering Task Force
IMS IP Multimedia Subsystem
MGC Media Gateway Controller
MGW Media Gateway
MSC mobile switching center
MSC-S MSC server
M-MGW Mobile Media Gateway
MMTel AS multimedia telephony application server
MRF Media Resource Function
MRS media resource system
MSS mobile softswitch
OM operations and maintenance
OSS operations support systems
PCM pulse-code modulation
PLMN public land mobile network
PSTN public switched telephone network
RNC radio network controller
SBG Session Border Gateway
SGC Session Gateway Controller
SGW Signaling Gateway
SIP Session Initiation Protocol
TDM time division multiplexing
TrFO transcoder free operation
VLR visitor location register
VoLTE voice over LTE
There’s a strong argument for
regarding telephony as one of the
first cloud-based services. Since
the invention of the telephone,
the industry has evolved
significantly and operators have
developed a flexible range of
services for subscribers provided
on a pay-as-you-use basis.
Smartphones have brought an
enriched experience to users
and theoretically they, along
with other advanced terminals,
could perform much of the media
processing traditionally taken
care of by networks. However, the
constraints posed by bandwidth
and battery life, along with the
desire to provide new services
independent of terminal type,
tend to indicate that most media-
processing services will remain
in the network.
ERICSSON REVIEW • APRIL 11, 2013
Voice and video in the cloud
Processing and network evolution

The digitalization of voice was one of the first steps in network evolution and electronic media processing. The shift to digital led to lower distortion levels and reduced attenuation of the voice signal, improving its quality.

Digitalization led the way in the development of new approaches for improving voice quality, such as echo cancelling and noise reduction. Without the digitalization of voice, and the development of efficient voice codecs that save bandwidth, such as Enhanced Full Rate (EFR) and Adaptive Multi-Rate (AMR), mobile telephony would not be the reality it is today.

Pulse-code modulation (PCM) is still the most common method of digitally representing analog voice signals over the PSTN and among PLMNs. As networks and devices use and support different codecs and protocols, mobile telephony networks usually need to convert voice – by transcoding – from one format to another.

Further improvements to voice quality are taking place through the application of new codecs, such as AMR-WB, which supports HD voice, combined with mechanisms such as transcoder free operation (TrFO), based on codec negotiation between the end points involved in a call [1, 2].
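As a minimal illustration of the PCM principle described above, narrowband telephony samples the analog signal 8,000 times per second at 8 bits per sample, giving the familiar 64 kbit/s channel. The sketch below also shows continuous mu-law companding, a simplified stand-in for the segmented companding real G.711 codecs use; it is illustrative only, not any network element's implementation.

```python
import math

# Narrowband PCM as used on TDM trunks: 8 kHz sampling, 8 bits per sample.
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8

bitrate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
print(bitrate_bps)  # 64000 -> the classic 64 kbit/s voice channel

# Simplified (continuous) mu-law companding of one normalized sample.
# Real G.711 uses a segmented approximation; this is only illustrative.
MU = 255.0

def mu_law_compress(x):
    """Compress a sample in [-1, 1] with mu-law companding."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Invert the compression."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

sample = 0.1
compressed = mu_law_compress(sample)
restored = mu_law_expand(compressed)
# Quiet signals are boosted before quantization, which is the point of
# companding: more of the 8 bits are spent where the ear is sensitive.
print(round(compressed, 3), round(restored, 3))
```

Companding is why an 8-bit PCM sample achieves speech quality closer to that of a linear 12-bit sample.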
Tones, such as dial and busy tones, and announcements, such as faulty-service indications, are examples of general network-generated services that users have grown accustomed to over the years. Other services, such as conferences, where voice streams from multiple sources are combined, are also network-generated and exemplify the trend towards advanced voice services.

Circuit-switched networks still handle most of today's voice traffic. The architecture of these networks tends to be based on softswitches consisting of Media Gateways (MGWs) and Media Gateway Controllers (MGCs). For mobile softswitches (MSSs), the MGC is integrated in the mobile switching center server (MSC-S). For the most part, echo cancelling, transcoding, and the sending of tones and announcements are carried out by MGWs. These gateways also interwork with the PSTN for circuit-switched data and fax, they handle multi-party calls, and they reframe media samples on the borders between 3GPP and IETF networks. In addition to performing media processing, the MGWs also act as a bridge between different bearer technologies, such as between TDM and IP.

As networks evolve, and people's use of them progresses, voice will be handled by the IMS, and communication with video will become a mainstream activity for enterprises and consumers. Media handling in this environment is performed primarily in a logical node called the Media Resource Function (MRF), which uses SIP to communicate with the rest of the network. The MRF provides services such as tones, announcements and conferences, and will support new services developed in response to subscriber demand.

In an all-IP environment, such as IMS, operators no longer have end-to-end control over networks, resulting in greater emphasis on security. For SIP signaling and related media, it is the responsibility of Session Border Gateways (SBGs) to handle security. These SBGs can be implemented as stand-alone boxes, or integrated into other network elements in a layered architecture, which reduces capex and opex. These gateways may also provide limited media-processing capabilities, such as transcoding.

Further development in media processing will be needed to meet the exponential growth in person-to-person video communication.
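The benefit of end-to-end codec negotiation, the idea behind TrFO, can be sketched as follows. The codec names and preference ordering here are illustrative, not taken from any 3GPP profile:

```python
def negotiate_codec(offer, answer_supported):
    """Pick the first codec in the caller's preference order that the callee
    also supports. If there is a match, both ends use the same codec and no
    transcoding resource needs to be linked into the media path."""
    for codec in offer:
        if codec in answer_supported:
            return codec
    return None  # no common codec: the network must transcode

# Illustrative preference lists (not from any real terminal profile).
caller = ["AMR-WB", "AMR", "PCMU"]
callee = ["AMR", "PCMU"]

print(negotiate_codec(caller, callee))        # AMR: transcoder-free call
print(negotiate_codec(["AMR-WB"], ["PCMU"]))  # None: transcoding needed
```

When negotiation succeeds, the voice stream passes through unchanged, preserving quality and saving processing; only mismatched end points pull a transcoder into the path.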
FIGURE 1 Ericsson media-resource-system architecture: common resources (DSP devices; ATM, TDM and IP ports) that are pooled and dynamically shared by the different applications (MGW, BGF, MRF, SGW) under common resource handling, controlled by the MSC-S/MGC, SGC and MMTel AS, with a common OM implementation and OSS interface giving a one-node view.

Consider the media-processing requirements for videoconferencing. Most videoconference services show participants using two primary display modes: voice activated and continuous presence. In voice-activated mode, the stream from the active speaker dominates the available display area, while other participants are shown in smaller windows, or not at all. In continuous-presence mode, all participants are displayed simultaneously. To deliver a videoconference, the network has two choices: it can either collect all video streams from participating users and send all streams to all users; or it can mix the video streams into one preferred format before sending the single, combined stream to participating users. In the all-streams-to-all-users approach, media processing is performed by the participating terminals, whereas the mixing approach relieves the
terminal of the need to perform any media processing. The combined approach can save a significant amount of bandwidth in the access network. Yet another way to save bandwidth is to send only the video stream associated with the active speaker to the participants' terminals.

Videoconferencing is just one example of a video-based application. Many new services that will typically be delivered by the cloud, such as recording, storage, announcements and mailboxes, will be implemented later on. Advanced voice and video services may include real-time speech recognition; speech-to-text conversion; automatic language translation; speech-controlled supplementary services; embedded banner advertising; speaker identification; and real-time generation and translation of subtitles in video calls.

The cloud versus the terminal

To ensure good media quality and efficient use of the access network, terminals need to be able to encode and decode digital media. In theory, terminals could provide more or less all the media-processing power needed to deliver services offered by the network. To do this, terminals would, for example, need to:

- support all codecs, so that all potential peers can use the codec best suited to their architecture;
- generate tones and announcements based on error codes received from the network; and
- act as a conference bridge, or support multiple ways of acting as a video client, to ensure interoperability with all potential peers.

But is this approach cost efficient? And is it good for users? The success of a new communication service lies in its rapid adoption by a critical mass of users. New services therefore need to be as terminal-independent as possible, reach as many users as possible and be interoperable from day one.

To maintain interoperability and avoid fragmentation of some types of services, such as video communication, performing media processing in the network is key. Using standardized interfaces between networks helps to ensure interoperability among operators and secures optimal performance and quality. In addition, codec negotiation (including interworking between control protocols), transcoding, reframing and video-mixing services can be used in networks to support interoperability.

As illustrated by the videoconference example, handling media processing in the network, rather than the terminal, can save bandwidth. This expensive resource can also be used more economically if the network is allowed to provide all transcoding processing, leaving terminals free to use the codec that is best suited to their specific architecture.

Terminals that use less bandwidth often require less power. And so, by handing over bandwidth-hungry services – such as voice and video mixing – to the network, power consumption in the terminal can be reduced, extending the recharging interval and improving battery life.

Algorithms for voice and video processing tend to be patented, and terminal manufacturers have to pay royalties to use them. Performing transcoding in the network through pooled instances reduces the number of algorithms needed for terminal media processing, resulting in lower usage fees and reduced overall cost to subscribers.

When all the factors are brought together, it seems the current approach to media processing – performing it in the network – remains the most efficient. As it is likely that the network will continue to be the most practical alternative in the future, it stands to reason that media processing will also remain a cloud-based service.

Cost-driven platform evolution

Requirements for reliability, energy efficiency, redundancy and a low carbon footprint have led to the use of dedicated hardware platforms to build telecommunication network elements – until now. In an operator cloud, a competitive hardware platform not only needs to meet all of these requirements but should be generic enough to support multiple applications and flexible enough to accommodate fluctuating traffic patterns and changing application capacity needs.

To efficiently provide communication services in a network, two different platform types are needed: one for control, such as the MSC-S, and one for media-processing applications, such as the MGW or the MRF.

Today, control applications tend to be built on dedicated, carrier-grade platforms with generic processor architectures, such as x86. Some of these platforms can already run multiple telecom applications and provide many of the benefits offered by operator cloud centers. It is likely that these platforms will develop into telecom cloud centers supporting virtualized software and applications, allowing operators to further reduce their capex and opex.

The requirements placed on media-processing platforms are, however, significantly different from those for processing control applications. This is because the amount of processing needed for media is much greater and the requirements for real-time processing and latency are more stringent. In addition to supporting multiple services and adapting to changing traffic profiles automatically, media-resource platforms will need to support TDM interfaces for some time to maintain interaction with legacy systems.

General-purpose processors, such as the x86, have become more cost efficient for handling media; however, their performance compared with DSPs varies significantly depending on the media being processed. A DSP, for example, offers superior performance for voice processing, such as transcoding, but when it comes to certain types of video processing the performance of a DSP is not significantly better.

It is hard to predict whether the cost-to-performance ratio for DSPs and general-purpose processors will change as new chips are introduced to the market and the types of media-processing services evolve. For the moment, DSPs provide the best performance in comparison to overall cost for services requiring both high channel capacity and density, such as voice in circuit-switched networks.

In the long term, as the need to interface with TDM systems disappears and the volume of voice transcoding consequently shrinks, using generic processors and operator cloud centers for media processing will become a more competitive option.
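The cost-to-performance trade-off discussed above can be framed as simple cost-per-channel arithmetic. All figures below are invented placeholders to show the shape of the comparison, not vendor data:

```python
def cost_per_channel(platform_cost, channels):
    """Cost of one simultaneously processed media channel on a platform."""
    return platform_cost / channels

# Invented, illustrative figures only: a DSP board that is cheap per dense
# voice-transcoding channel but gains little on video, versus a generic
# x86 server at the same price point.
dsp = {"cost": 10_000.0, "voice_channels": 4000, "video_channels": 40}
x86 = {"cost": 10_000.0, "voice_channels": 1000, "video_channels": 35}

for media in ("voice", "video"):
    d = cost_per_channel(dsp["cost"], dsp[f"{media}_channels"])
    g = cost_per_channel(x86["cost"], x86[f"{media}_channels"])
    print(f"{media}: DSP {d:.2f} vs x86 {g:.2f} per channel")
# With these assumed figures the DSP wins clearly on dense voice
# transcoding, while for video the two platforms end up in the same range,
# which matches the article's qualitative argument.
```

As the voice-transcoding volume shrinks and video grows, the left-hand column of this comparison matters less, which is exactly why generic processors become a more competitive option over time.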
Sharing resources reduces cost

The concept underlying Ericsson's media-processing platform is based on providing processing capabilities in the network. Such a platform – a media resource system (MRS) – uses DSP resources in a dynamic way, is capable of allocating resources to the different media-processing functions automatically, and can pool user requests among the various DSPs.

The MRS concept provides both media-gateway and signaling-gateway functionality for MSS networks. It contains an MRF for media processing in IMS networks and provides session border functionality for MSS and IMS networks. The session border functionality uses a layered architecture, under which a border gateway function (BGF) in the MRS handles the media plane, while a Session Gateway Controller (SGC) handles the control plane. Figure 1 shows the high-level distributed and integrated architecture of this system.

Networks with Ericsson Mobile MGW (M-MGW) nodes installed can be upgraded to an MRS with support for future media-processing features, as the M-MGW/MRS can be part of both an MSS and an IMS environment. Performing this type of upgrade simply involves a software update.

The MRS can be considered a media cloud platform, as it supports multiple media-processing applications and can share the available computing resources, as well as external interfaces, dynamically among those applications. Plans to develop the system include the addition of open interfaces that allow specialized external products to provide functionality via the common MRF.
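The dynamic sharing the MRS concept is built on can be sketched as a simple pool allocator. This is only a conceptual model with made-up numbers, not Ericsson's implementation:

```python
class DspPool:
    """Toy model of a shared DSP pool: every application (MGW, MRF, BGF)
    draws channels from one common pool instead of owning dedicated devices."""

    def __init__(self, total_channels):
        self.free = total_channels
        self.in_use = {}  # application name -> channels currently held

    def allocate(self, app, channels):
        """Grant channels if the pool has capacity; refuse otherwise."""
        if channels > self.free:
            return False  # pool exhausted
        self.free -= channels
        self.in_use[app] = self.in_use.get(app, 0) + channels
        return True

    def release(self, app, channels):
        """Return channels to the pool so other applications can use them."""
        held = self.in_use.get(app, 0)
        channels = min(channels, held)
        self.in_use[app] = held - channels
        self.free += channels

pool = DspPool(total_channels=100)
pool.allocate("MGW", 60)          # circuit-switched voice peak
pool.allocate("MRF", 30)          # IMS tones and announcements
print(pool.allocate("BGF", 20))   # False: would exceed the shared pool
pool.release("MGW", 40)           # voice traffic drops off
print(pool.allocate("BGF", 20))   # True: the freed capacity is reused
```

The point of the model is the last two lines: capacity released by one application is immediately available to another, which is what makes a shared pool cheaper than dedicating hardware to each function's individual peak.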
Network scenarios

As illustrated by the example in Figure 2, fixed and mobile network architectures have traditionally been distributed and hierarchical. In such networks, the node closest to the subscriber takes care of voice coding or transcoding to PCM when a call enters the network.

FIGURE 2 Traditional network architecture: coding and decoding at the edges (BSC and local exchange), with transcoding at the MSC/VLR and transit exchange in between.

FIGURE 3 Structure of a modern mobile voice network: pooled media resources (MGWs) and pooled media-control and call-routing resources (MSC-S) connecting BSCs and RNCs over IP to the PLMN, PSTN and IMS.

Today's mobile switching solutions allow the control logic – the MSC server nodes – to be centralized to just a few sites, even in fairly large networks. Media, meanwhile, is handled
locally to save bandwidth and minimize latency. To ensure hardware resources are used efficiently and a high level of resilience is maintained, MSC-S nodes are often pooled. IP-based bearers used on the interface to the radio network also allow pooling of MGWs, offering similar benefits in terms of efficient resource usage and resilience. Figure 3 shows a simple network where both the media gateways and the servers are pooled.

The introduction of VoLTE and IMS has naturally led to a new network structure, especially in the media plane. The first task the network needs to take care of is security, and so an SBG makes sure that it is safe to establish a session. Media processing may then be needed in the set-up phase to, for example, produce tones and announcements – services that can be provided by temporarily linking in an MRF. During the call-establishment phase, the control layer determines whether transcoding and reframing are needed. If so, an MRF is linked in, or alternatively a BGF may be able to handle the transcoding. Certain services, such as conferencing, may also require additional media processing. As end-to-end codec negotiation will be more common in IMS networks than it is in circuit-switched networks, the need for media processing will diminish as networks evolve. However, new and advanced processing services will be introduced to handle special cases.

The best network architecture, illustrated in Figure 4, is based on distributed SBGs or BGFs that optimize latency and ensure bandwidth efficiency, while advanced services that are not used so often can be centralized.

The flexible nature of the MRS supports all network architectures. It is a scalable solution that can be used at the edge of a network or in a centralized way. In cases where an operator wants to avoid overprovisioning to cater for occasional traffic peaks, MRS nodes can be pooled to balance the load throughout the network. This can be achieved even if the nodes are in different geographic locations.
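The load balancing described above can be sketched as a least-loaded selection over a pool of MRS nodes. The node names and load figures are illustrative only:

```python
def pick_mrs_node(nodes):
    """Select the least-loaded MRS node for the next media session.
    'nodes' maps node name -> current load as a fraction of capacity."""
    return min(nodes, key=nodes.get)

# Illustrative node names and loads; pooling means a traffic peak at one
# site can spill over to spare capacity at another, so no single site has
# to be dimensioned for the whole network's peak.
mrs_pool = {
    "mrs-site-a": 0.92,  # near capacity during a local traffic peak
    "mrs-site-b": 0.40,
    "mrs-site-c": 0.55,
}
print(pick_mrs_node(mrs_pool))  # mrs-site-b: the session is steered away
                                # from the overloaded site
```

A real controller would also weigh latency and transport cost between sites, but even this minimal policy shows why pooling lets an operator avoid overprovisioning every site for its own occasional peaks.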
Changing business models

A significant aspect of cloud computing is the business model. The cloud approach enables enterprises to buy IT services instead of investing in infrastructure. Telecommunication operators provide communication services, such as voice, to consumers and enterprises in much the same way. And it is likely that additional products will be cloud-based [3].

Vendors can provide wholesale cloud services to operators who, in turn, break them up into smaller, retail offerings for enterprises and consumers. Ericsson's Device Connection Platform, for example, supports machine-to-machine communication as a cloud service for operators that offer retail cloud services. Other services, such as low-volume media processing, may be provided to operators as cloud services in the future. The sharing of network elements among several operators enables vendors to obtain better economies of scale than individual operators can for certain services.
The what, the where and the how: the answers

Even though terminals are fast becoming advanced computers capable of performing sophisticated media processing, this function is likely to remain a network-based service for reasons of efficiency. Telecommunication platforms are developing into multi-application systems that support both local and geographic spreading of resource pools.

Cloud platforms based on generic processors are likely to be introduced in the control plane first. Whether these platforms will be used for media processing, and when, will depend on: the need for legacy interfaces; the evolution of the cost-to-performance ratio for DSPs; the type of media-processing services that will be required in the future; and the volume of these services.

One of the important aspects of cloud computing is the business model. The market is already showing evidence of increased flexibility when it comes to who will provide communication services. In the future, enterprises will be able to rely on operators to provide communication services instead of buying their own equipment. Operators will, in turn, be able to rely on vendors to provide cloud services, creating an efficient value chain in which each player pays for services based on usage.
FIGURE 4 Architecture of an all-IP and IMS network: security and transcoding handled on the network edges by distributed BGFs towards the Evolved Packet Core and external networks, with centralized and pooled media resources in the MRF, all connected over an IP transport network under the IMS control plane (SGC and AS).
Johan Lundström

is a strategy manager for mobile softswitch and media-processing solutions within product area Core and IMS at Business Unit Networks. He joined Ericsson in 1991 and has since worked primarily with mobile core networks. He has held various positions in both R&D and product management, including line management. He holds an M.Sc. in telecommunications and software science from the Helsinki University of Technology, Finland.
References

1. Ericsson, 2010, Ericsson Review, Evolution of the voice interconnect, available at: http://www.ericsson.com/res/thecompany/docs/publications/ericsson_review/2010/evolution_voice_interconnect.pdf

2. Ericsson, 2011, White Paper, HD voice – it speaks for itself, available at: http://www.ericsson.com/res/docs/whitepapers/WP-HD-voice.pdf

3. Ericsson, 2011, White Paper, Visual communication – why operators should address the enterprise market, available at: http://www.ericsson.com/res/docs/whitepapers/wp-visual-communication.pdf
Acknowledgements

The author gratefully acknowledges the colleagues who have contributed to this article: Patrik Roséen, Mats Alendal, Joakim Haldin, Markku Korpi, Peter Jungner, András Vajda, Kari-Pekka Perttula and Jörg Ewert.