Research paper on VOIP Technology
VoIP Techniques and Challenges
Karama Said Mohamed
School of Engineering, Design and Technology
University of Bradford
Kartix_2008@yahoo.com
Abstract
Voice over Internet Protocol (VoIP) is a technology for
carrying voice over the internet and other networks based on
packet switching. VoIP emerged as an alternative to the much
more expensive Public Switched Telephone Network (PSTN)
for voice transmission.
When considering the Quality of Service (QoS) of VoIP
systems, factors such as system capacity, jitter, packet delay,
packet loss and channel configuration are of utmost
importance. These factors, together with security issues,
channel bandwidth allocation and the question of reliability,
are the great challenges facing VoIP systems.
To master VoIP systems, a clear understanding of the
Internet Protocol (IP) is mandatory. Voice is carried by the
RTP protocol, whose header is added into the IP packet. In
this research paper I will focus on the techniques employed in
VoIP systems and the challenges that these systems face. With
the voice transmission market still in transition from the
traditional PSTN to VoIP, new techniques are still being
tested to improve the quality of the services offered by VoIP
systems. The main point of focus is to find ways to deal with
high traffic and the booming demand for VoIP systems.
Keywords: Voice over IP, PSTN, Quality of Service, Jitter,
Packet Loss, Packet Delay, Internet Protocol, RTP, Channel
Bandwidth
I. Introduction
The interest and need for Voice over Internet Protocol
(VoIP) has been felt since the introduction of the first
computer network. The main aim of VoIP Systems is to
provide services that are either very hard or very
expensive to implement using the traditional PSTN. The
VoIP system is mainly based on offering voice
communication by using the already existing Internet
Protocol (IP). [1]
Over the years, the internet has proved itself to be a cheap
medium for sending data, such as e-mails, globally. For
this reason, VoIP was invented to carry voice over this
cheap medium and to cut down the high costs associated
with the traditional telephone lines. VoIP enables people
to communicate by voice all over the world at a much
lower price than they used to pay on normal telephone
lines. This huge difference in cost has made VoIP systems
an interesting area for researchers.
On the other hand, cheaper communication does not
guarantee the best service. Ever since their first use,
applications associated with VoIP have shown much poorer
performance than conventional telephone services.
In aspects like Quality of Service, VoIP still needs a lot
of improvement to stand up against the traditional
telephone lines. The unsatisfying quality of data
transmitted over the existing internet infrastructure
could be due to the non-uniform nature of the internet
services that are available in different parts of the world.
[2]
Currently, there is a lot of interest in using VoIP over
cellular networks. Given the increasing number of users of
VoIP through the internet and the ever-present need of
cellular operators to maximize profits, introducing VoIP
over cellular networks is a problem worth solving. By
introducing VoIP into cellular networks, existing cellular
operators can switch to all-IP networks easily and at a low
cost. This move would greatly reduce operational costs
and hence increase profitability. Not only the operators
but also the users would benefit from this. The users
will enjoy much cheaper voice communication.
This will be achieved by using products such as
Truphone, which carries software that turns a mobile
cellular phone into a VoIP phone while it is
connected to the internet. [3][15]
All of this could be realized if the QoS of VoIP systems
were improved as far as possible. To achieve the
desired QoS, system capacity, jitter, packet delay,
packet loss and channel configuration must be
monitored closely.
II. How Does VoIP Work?
Because VoIP makes use of the normal internet
architecture, it functions more or less like any
other internet service at the basic level. At the
transmitting end, the voice undergoes analogue to digital
conversion and compression, and is then broken into
data packets that carry distinct sequence numbers. A lookup
table then helps the server at the transmitting end to
determine the IP address of the intended receiver. Once the
server knows the receiver's IP address, it starts sending the
data in the same manner as emails and other internet data. [14]
At the receiver end, the packets are collected and arranged
according to the sequence numbers they carry, and the data
then undergoes digital to analogue conversion to enable the
receiver to hear the voice. Transmission using VoIP can
be between two computers, two telephones or even
between a telephone and a computer. With VoIP, a person
can even use a PC to originate a call to a landline or vice
versa. [Jain, 2004]. Figure 1 shows an overview of a
functioning VoIP network.
Figure 1: basic VoIP functionality
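The packetization and reordering steps described above can be sketched as follows. This is a minimal illustration only, not taken from the cited sources: the function names, the 160-byte payload size and the shuffled list standing in for the network are all assumptions, and the conversion and compression stages are left out.

```python
import random

def packetize(voice_bytes, payload_size=160):
    """Split an encoded voice stream into (sequence_number, payload) packets."""
    return [
        (seq, voice_bytes[start:start + payload_size])
        for seq, start in enumerate(range(0, len(voice_bytes), payload_size))
    ]

def reassemble(packets):
    """Sort received packets by sequence number and join their payloads."""
    return b"".join(payload for _, payload in sorted(packets))

# Stand-in for digitized, compressed voice (hypothetical data).
stream = bytes(range(256)) * 4
sent = packetize(stream)

# Simulate a network that delivers the packets out of order.
received = sent[:]
random.Random(0).shuffle(received)
restored = reassemble(received)
```

Because every packet carries its own sequence number, the receiver can rebuild the original stream regardless of the order of arrival.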
This is a basic view of how VoIP functions. In the next
section I will elaborate in detail on the techniques used in
VoIP systems.
III. Carrying Voice over IP
As I mentioned earlier, in VoIP systems, the voice is
carried using the existing IP. This is made possible by
adding the RTP header in the IP packet. Figure 2 shows
the normal IP packet and the VoIP packet where the RTP
header is included.
Figure 2: RTP header added onto an IP packet
RTP is encapsulated in UDP and IP. The voice
bandwidth per conversation depends on the codec and the
sampling rate. [4]
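As an illustration of the RTP header mentioned above, the sketch below builds the 12-byte fixed RTP header (version 2, no CSRC list, no extension) and prepends it to a voice payload. The function name and the sample values are hypothetical; in a real system the result would then be handed to UDP and IP, which add their own headers.

```python
import struct

RTP_VERSION = 2

def build_rtp_packet(payload, seq, timestamp, ssrc, payload_type=0):
    """Prefix a voice payload with the 12-byte fixed RTP header."""
    first_byte = RTP_VERSION << 6        # version=2, padding=0, extension=0, CSRC count=0
    second_byte = payload_type & 0x7F    # marker=0, payload type 0 = G.711 mu-law (PCMU)
    header = struct.pack(
        "!BBHII",                        # network byte order
        first_byte,
        second_byte,
        seq & 0xFFFF,                    # 16-bit sequence number
        timestamp & 0xFFFFFFFF,          # 32-bit media timestamp
        ssrc & 0xFFFFFFFF,               # 32-bit synchronization source identifier
    )
    return header + payload

# 160 bytes corresponds to 20 ms of G.711 audio (assumed sample values).
packet = build_rtp_packet(b"\x00" * 160, seq=1, timestamp=160, ssrc=0x1234ABCD)
```

The sequence number and timestamp in this header are what allow the receiver to reorder packets and to measure delay variation.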
IV. VoIP Performance
For VoIP to stand up against the likes of UMTS, a high
QoS must be achieved. As mentioned earlier, one of the
reasons for the poor performance of VoIP is that internet
performance differs between regions of the world. To
analyze the performance of this system in detail we need
to look into the following parameters:
System capacity/ Available bandwidth
Packet loss
Delay/Network Latency
Jitter
Echo
Security
These mentioned factors are so far the major challenges
faced by VoIP systems. All these parameters are variables
that can be altered to achieve a desirable QoS. [14]
V. What Is QoS?
QoS or Quality of Service is an overall set of network
standards and mechanisms that ensure that the services
offered are of high performance. Network administrators
normally use the QoS mechanisms as a reference model
to make optimum use of existing network resources to
achieve the desired performance without the need for
expanding or providing more resources to the network.
Initially, quality in networks just meant equal treatment
of the entire network traffic. This meant the network’s
best effort was distributed equally to all traffic. This
condition offered no guarantees for network performance
characteristics such as delay and its variations, reliability
and security. QoS is here to change the situation with the
idea that different applications have different
requirements and different users also have different needs.
The main idea of QoS is that the effort of the network
should not be distributed evenly to all the traffic in that
some traffic needs to be given priorities over others.
The main goal of QoS is to provide prioritized delivery
services to the applications that require it. This is done by
ensuring the provision of sufficient bandwidth,
monitoring and controlling delay and jitter, and by
minimizing data loss.
In IP-based networks, VoIP being one of them, there are
two main models of QoS defined by the Internet
Engineering Task Force (IETF): Integrated Services
(IntServ) and Differentiated Services (DiffServ).
These two models have several mechanisms that ensure
preferential services are given to specific traffic in the
network.
If VoIP applications achieve a desirable QoS, they
will enjoy the following benefits:
Administrators will have a good control over the
usage of network resources which will enable
them to operate the network in a business
perspective to maximize profits.
Applications and users who are time sensitive
and critical will be provided with the resources
they require at the same time other applications
and users can have access to the network.
User experience will be improved as a result of
improved system performance.
Due to the fact that existing resources will be put
to optimum use, the general operating cost will
be reduced. This will also ensure that there is a
minimum need for expansions and upgrades. [5]
VI. VoIP Challenges
A. System Capacity/Available bandwidth.
The provision of sufficient bandwidth for voice
transmission is the first crucial step towards achieving a
desirable QoS. The challenge here is that the available
bandwidth is a limited resource. This implies that VoIP
systems must be designed to make use of the available
bandwidth without exceeding the limit, while at the same
time carrying real-time voice efficiently. Table 3 shows the
bandwidth provisioning for VoIP.
Table 3: Bandwidth provisioning for VoIP.
For more accurate provisioning, the layer 2 header is
included in the bandwidth calculations. The results are
shown in Table 4.
Table 4: Bandwidth provisioning for VoIP with the
layer 2 header included in calculations.
Table 5: Bandwidth allocations for compressed
RTP
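The kind of calculation behind such provisioning tables can be sketched as below. The figures are not taken from Tables 3 to 5; they rest on commonly quoted assumptions: 40 bytes of IP, UDP and RTP headers per packet, an 18-byte Ethernet layer 2 overhead, 20 ms of speech per packet, and codec rates of 64 kbit/s (G.711) and 8 kbit/s (G.729). The function name is hypothetical.

```python
IP_UDP_RTP_HEADER_BYTES = 40   # 20 (IPv4) + 8 (UDP) + 12 (RTP)
ETHERNET_L2_BYTES = 18         # assumed layer 2 overhead; other link types differ

def voip_bandwidth_kbps(codec_kbps, packet_ms, l2_header_bytes=0,
                        header_bytes=IP_UDP_RTP_HEADER_BYTES):
    """One-way bandwidth of a single call in kbit/s."""
    payload_bytes = codec_kbps * packet_ms / 8       # kbit/s * ms = bits; / 8 = bytes
    packets_per_second = 1000 / packet_ms
    packet_bytes = payload_bytes + header_bytes + l2_header_bytes
    return packet_bytes * 8 * packets_per_second / 1000

g711_ip_only = voip_bandwidth_kbps(64, 20)                        # headers only
g711_ethernet = voip_bandwidth_kbps(64, 20, ETHERNET_L2_BYTES)    # plus layer 2
g729_ip_only = voip_bandwidth_kbps(8, 20)
g729_crtp = voip_bandwidth_kbps(8, 20, header_bytes=2)            # compressed headers
```

Under these assumptions a G.711 call needs about 80 kbit/s before layer 2 overhead, although the codec itself produces only 64 kbit/s, which shows how much of the provisioned bandwidth is consumed by headers.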
VoIP services operate using symmetrical bandwidth in
the uplink and downlink. The main problem is that a
bandwidth imbalance may exist in the uplink and the
downlink even in the HSDPA phase.
The system capacity also affects the bandwidth that can
be provided for each subscriber, thus limiting the
number of subscribers. In GSM and UMTS systems, the
adaptive multi-rate (AMR) audio codec (12.2 Kbps) is
extensively used in the CS voice services. In the case of the
VoIP protocol stack, the Real-time Transport Protocol (RTP)
and the User Datagram Protocol (UDP) are put to use. These
two protocols are carried by the IP packet. Considering
that the IP packet carries the RTP, UDP and IP headers
in addition to the voice payload, the voice will require a
bidirectional data rate of 32 Kbps or 64 Kbps to carry out
the transmission successfully. [6]
B. Packet Loss
In IP networks voice is treated as normal data. As a
result, voice packets are vulnerable to being dropped
when the traffic is high and the network is congested.
Re-transmission of lost packets can solve the problem in
data transmission, but it is not a solution for voice data.
It fails for voice because a voice packet can contain 40 to
80 ms of speech information, and a re-transmitted packet
would arrive too late to be played out. Packet loss greatly
reduces the QoS of the systems. With the ITU-T G.711 codec,
a standard for toll quality, a packet loss rate as low as 1%
can cause a serious degradation in user experience. Other
types of coders that carry out more severe data
compression tend to degrade even more. [13]
In the calculation of jitter, which I will discuss later in
this paper, lost packets are usually neglected, as they are
considered to be packets with an infinite delay, and
including them would distort the calculations. Packet loss
can be compensated for at the end point by using algorithms
like Packet Loss Concealment (PLC) or Packet Loss Recovery
(PLR). Payload redundancy can be applied to counter packet
loss, but its use will require additional bandwidth. [7]
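One very simple form of packet loss concealment is to replay the last frame that arrived whenever a frame is missing. The sketch below illustrates only this idea; it is an assumption chosen for brevity, not the algorithm of any particular codec or of the cited sources, and production concealment algorithms are considerably more elaborate.

```python
def conceal_losses(frames, frame_size=160):
    """Replace lost frames (None) with the last good frame, or silence at the start."""
    last_good = bytes(frame_size)        # silence until the first frame arrives
    output = []
    for frame in frames:
        if frame is None:                # packet was lost or arrived too late
            output.append(last_good)
        else:
            output.append(frame)
            last_good = frame
    return output

# Hypothetical received stream: the 2nd, 4th and 5th frames were lost.
received = [b"A" * 160, None, b"C" * 160, None, None]
played = conceal_losses(received)
```

The listener hears a short repetition instead of a gap, which works for isolated losses but degrades quickly when several consecutive frames are missing.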
In order to secure sufficient bandwidth for the packets
in a VoIP channel, a network device should be able to
identify the VoIP packets among all the other IP traffic.
The network devices carry out this identification by
referring to the source and destination addresses in the
IP header or to the port numbers in the User Datagram
Protocol header. This process of packet identification is
termed classification, and it is the foundation for
achieving a desired QoS.
Another method of carrying out classification is by
using the Resource Reservation Protocol (RSVP)
mechanism. This mechanism carries out dynamic
classification, unlike the header-based method stated
previously, which is a static way of classification.
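Static classification from header fields can be sketched as follows. The `Packet` type, the `classify` function and the UDP port range treated as voice are assumptions for illustration; the actual range depends on the deployment.

```python
from dataclasses import dataclass

# Assumed RTP port range; a real network would use its own configured range.
VOICE_UDP_PORTS = range(16384, 32768)

@dataclass
class Packet:
    src_ip: str
    dst_ip: str
    protocol: str      # "udp" or "tcp"
    dst_port: int
    size: int          # bytes

def classify(packet):
    """Label a packet 'voice' or 'data' using only its header fields."""
    if packet.protocol == "udp" and packet.dst_port in VOICE_UDP_PORTS:
        return "voice"
    return "data"

voice_packet = Packet("10.0.0.1", "10.0.0.2", "udp", 16500, 200)
web_packet = Packet("10.0.0.1", "10.0.0.3", "tcp", 80, 1500)
labels = [classify(voice_packet), classify(web_packet)]
```

The rule is fixed in advance, which is what makes this approach static, in contrast with a reservation made dynamically per call.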
After the classification process is completed by each
hop in the network, each VoIP packet is provided
with the QoS it needs. At this stage, special techniques
can be applied to achieve priority queuing. Priority
queuing ensures that any large data packets involved do
not interfere with the ongoing voice transmission.
Bandwidth requirements can be reduced further by
compressed RTP, which compresses the 40-byte IP, UDP
and RTP headers to only 2 or 4 bytes. [8]
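The effect of priority queuing can be sketched with a strict priority scheduler in which voice packets always leave before data packets, while arrival order is kept within each class. The class name and the packet labels are hypothetical, and real routers implement this in their forwarding path rather than with a heap.

```python
import heapq
from itertools import count

class PriorityScheduler:
    """Strict priority queue: voice is always sent before data; FIFO within a class."""
    PRIORITY = {"voice": 0, "data": 1}

    def __init__(self):
        self._heap = []
        self._arrival = count()      # tie-breaker that preserves arrival order

    def enqueue(self, traffic_class, packet):
        entry = (self.PRIORITY[traffic_class], next(self._arrival), packet)
        heapq.heappush(self._heap, entry)

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)

scheduler = PriorityScheduler()
scheduler.enqueue("data", "file-chunk-1")
scheduler.enqueue("voice", "rtp-1")
scheduler.enqueue("data", "file-chunk-2")
scheduler.enqueue("voice", "rtp-2")
order = [scheduler.dequeue() for _ in range(len(scheduler))]
```

Both voice packets are sent first even though data packets arrived earlier, so a large file transfer cannot hold up the conversation.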
C. Delay/Network Latency
Delay in networks is a condition that arises when voice
packets take longer than expected to arrive at their
destinations. This condition eventually results in
distortion of the voice quality.
When voice packets are transmitted, some of them get
delayed and reach the destination later than expected.
This delay may be caused by many factors, the main
one being the underlying network. Delayed packets
arrive at the destination late or never arrive at all. Voice
transmission tends to be more tolerant of packet loss
than text transmission, but less tolerant of delay. [9]
The main known causes of delay are:
Codec
Queuing
Wait for packet being transmitted
Serialization
Jitter buffer
Figure 6: The acceptable range of delay
Figure 6 shows the acceptable delay for different
applications. Some of the causes of delay can be dealt
with, but for others there is no solution. Figure 7
shows the delay components at different levels of
transmission. [10]
Figure 7: delay components from source to
destination.
Delay can be classified into two categories: the fixed
delay and the variable delay also known as jitter. Figure 8
shows the existence and causes of fixed delay in a
network. The fixed delays are due to propagation,
serialization and processing as shown in the figure.
Figure 8: fixed delays in a network
The propagation delay is normally about six
microseconds per kilometre. Serialization delay is the
time taken to move a packet from the buffer onto the
serial link. The processes that impose a processing delay
include coding, compression, packetisation,
decompression and decoding.
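The three fixed delay components can be combined in a short calculation. The sketch below uses the six microseconds per kilometre quoted above; the distance, packet size, link speed and processing time in the example are assumed values, and the function names are hypothetical.

```python
PROPAGATION_US_PER_KM = 6      # figure quoted in the text

def propagation_delay_ms(distance_km):
    """Time for the signal to travel the given distance."""
    return distance_km * PROPAGATION_US_PER_KM / 1000

def serialization_delay_ms(packet_bytes, link_kbps):
    """Time to clock one packet onto a serial link: bits / (kbit/s) gives ms."""
    return packet_bytes * 8 / link_kbps

def fixed_delay_ms(distance_km, packet_bytes, link_kbps, processing_ms):
    """Sum of propagation, serialization and processing delay."""
    return (propagation_delay_ms(distance_km)
            + serialization_delay_ms(packet_bytes, link_kbps)
            + processing_ms)

# Assumed example: 1000 km path, 200-byte packet, 64 kbit/s link, 20 ms processing.
total = fixed_delay_ms(1000, 200, 64, 20)
```

With these assumed values the serialization delay on the slow link (25 ms) outweighs the propagation delay (6 ms), which illustrates why low-speed access links contribute so much to the delay budget.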
The other type of delay is the variable delay, commonly
known as jitter. Figure 9 shows the variable delay in a
network. The main component here is the queuing delay,
which occurs throughout the network and is greatly
influenced by the packet size. The other contributors
to the variable delay are the de-jitter buffers, which
introduce a variable delay in order to smooth out voice
playout. [4]
Fixed delays are out of our control, but other delays can
be reduced by marking voice packets as being delay
sensitive. Another solution is to absorb the effects of
jitter in a jitter buffer once the packets arrive at the
destination. This process has the side effect of
increasing delay. [1]
Figure 9: variable delays
D. Jitter
As mentioned earlier, jitter is a variable delay caused
mainly by queuing, contention and serialization along the
network. In general terms, jitter occurs most in links
that are either slow or suffer from heavy congestion.
QoS mechanisms such as class-based queuing, bandwidth
reservation and faster links can greatly reduce jitter
problems in future. Until then, jitter remains a notorious
drawback of VoIP channels.
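Jitter can be quantified from packet timestamps. The sketch below follows the running interarrival jitter estimate in the style of RFC 3550, where each new sample moves the estimate by one sixteenth of the difference; this choice of estimator is an assumption, as the sources cited here do not prescribe one. Times are in milliseconds and the sample values are hypothetical; lost packets are simply left out of the lists.

```python
def interarrival_jitter(send_times, arrival_times):
    """Running jitter estimate: J += (|D| - J) / 16 for each consecutive packet pair."""
    jitter = 0.0
    for i in range(1, len(send_times)):
        # D is the change in transit time between consecutive packets.
        d = ((arrival_times[i] - arrival_times[i - 1])
             - (send_times[i] - send_times[i - 1]))
        jitter += (abs(d) - jitter) / 16
    return jitter

send = [0, 20, 40, 60, 80]              # one packet every 20 ms
steady = [50, 70, 90, 110, 130]         # constant 50 ms transit time
bursty = [50, 75, 90, 125, 130]         # transit time varies per packet

steady_jitter = interarrival_jitter(send, steady)
bursty_jitter = interarrival_jitter(send, bursty)
```

A constant delay, however large, yields zero jitter; only the variation in delay between packets contributes to the estimate.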
Jitter in real-time voice transmission can be classified
into three types:
Type A: this type is classified as constant jitter. The
packet to packet delay variation is almost constant.
Type B: this type is termed transient jitter. The
main characteristic of this type of jitter is a
substantial incremental delay that may affect a single
packet.
Type C: this is jitter composed of short term delay
variations. Here an increase in delay affects numerous
packets. Apart from this, a packet to packet delay
variation may also be present. This type of jitter normally
results from congestion and changes in routes.
Transmit time jitter can occur in soft phones because the
processes involved in the VoIP systems have to compete
for the CPU time with other processes. This jitter is due
to scheduling delays.
Figure 10: experimental results of effect of packet
congestion on delay (x = packet)
Figure 10 shows how packet congestion in a network
increases the delay time. The X’s in the graph represent
packets, and we see that the delay is greatest where the
packets are most congested.
Figure 11 investigates another factor that affects the
delay, namely access link congestion.
Other main causes of delay are:
Sharing of load between many access links or IP
service providers
Sharing of load within IP service
Inter-router load sharing
Routing table updates
Route flapping
Timing drift
Figure 11: Access link congestion effects on
delay
The most commonly used remedy for the effects of
jitter is the jitter buffer. Jitter buffers are designed
to remove the effects of jitter from the decoded voice
stream. This is done by buffering each individual
packet for a short interval before it is heard by the
receiver. As a result, an additional delay is introduced and
some packets are lost, but the jitter is removed. Adaptive
jitter buffers are preferable to fixed jitter buffers
because they are capable of adjusting their size to
optimize the trade-off between delay and discards.
In terms of delay, both the fixed and adaptive jitter
buffers are capable of carrying out automatic adjustments
according to the changes in delay. For instance if a delay
undergoes a step change of 19 milliseconds, then some
packets may be discarded due to the change but the jitter
buffer will be realigned fast.
A jitter buffer is commonly viewed as a time window
with the early side aligned with the recent minimum delay
and the late side representing the maximum allowed delay
before a packet is discarded. [11]
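The time window view of a jitter buffer can be sketched as below. This is a simplified fixed buffer under stated assumptions: the early edge is taken as the minimum delay over the whole sample rather than a recent minimum, the delays are hypothetical values in milliseconds, and the function name is illustrative.

```python
def playout_decisions(delays_ms, window_ms):
    """Fixed jitter buffer: packets delayed beyond the late edge are discarded."""
    earliest = min(delays_ms)            # early side of the window
    latest = earliest + window_ms        # late side: maximum allowed delay
    return ["play" if delay <= latest else "discard" for delay in delays_ms]

delays = [40, 45, 42, 95, 50, 41]        # one packet is delayed far more than the rest
narrow = playout_decisions(delays, window_ms=20)
wide = playout_decisions(delays, window_ms=60)
```

The narrow window discards the late packet, while the wide window plays every packet at the price of holding all of them longer, which is the delay and discard trade-off that an adaptive buffer tries to balance.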
E. Echo
Sometimes when users of VoIP make calls they hear their
own voice reflected back to their phone’s speaker
after a few milliseconds. This annoying phenomenon is
known as echo. The time interval between speaking and
hearing your own voice varies with the different causes of
echo. A short interval echo does not cause much harm,
but a longer one could completely destroy the
conversation. A noticeable echo is one that is both loud
and delayed. The PSTN suffers from echo, but not as much as
VoIP systems. This is because the PSTN has much less
delay than VoIP. The maximum allowable delay
for the PSTN is about 10 milliseconds while that of VoIP can
be up to 400 milliseconds. This implies that VoIP is more
vulnerable to echo. [9][13]
When a portion of the talker’s voice is echoed back to
him, this is known as talker echo. Listener echo happens
when a portion of the talker’s voice is echoed back from
the listener’s side and is then followed by a second echo
that causes a portion of the signal to reflect back to the
listener. The end result is the listener hearing the talker’s
voice twice, i.e. echoed.
The other type of echo is the convergence echo. This
occurs at the beginning of a call and it occurs due to the
delay in the echo canceller’s convergence.
To solve the problems of echo, VoIP gateways make
use of line echo cancellers to eradicate or minimize echo
levels originating from analogue loops. Identifying a
source of echo and checking its configurations are
important processes involved in echo removal. [9]
Echo cancellers normally face towards the PSTN tail
circuit and they carry out elimination of echoes in the tail
circuit on its respective side of the network. [8]
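The behaviour of an echo canceller, including the convergence period mentioned above, can be sketched with a normalized least mean squares (NLMS) adaptive filter. NLMS is a common choice for this task, but its use here is an assumption; the sources cited do not specify the algorithm, and the echo path, tap count and step size below are hypothetical.

```python
import random

def nlms_echo_canceller(far_end, mic, taps=8, step=0.5, eps=1e-6):
    """Adaptive FIR filter that learns the echo path and subtracts the estimated echo."""
    weights = [0.0] * taps
    history = [0.0] * taps               # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate             # what remains after cancellation
        norm = sum(h * h for h in history) + eps
        weights = [w + step * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual

rng = random.Random(1)
far_end = [rng.uniform(-1, 1) for _ in range(2000)]   # stand-in for the talker's voice
echo_path = [0.0, 0.0, 0.6, 0.3, 0.1]                 # assumed delayed, attenuated reflection
mic = [sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
       for n in range(len(far_end))]
residual = nlms_echo_canceller(far_end, mic)

early_energy = sum(e * e for e in residual[:100])     # before convergence
late_energy = sum(e * e for e in residual[-100:])     # after convergence
```

The residual echo is largest at the start of the call, while the filter is still learning the echo path, and falls away once the filter has converged.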
F. Security
Although it is much easier to secure a phone with VoIP
than PSTN, a good number of consumer VoIP solutions
do not support encryption yet. This makes it easier to
carry out eavesdropping in VoIP and even change the
contents of the data.
Numerous open source solutions are available that
facilitate the sniffing process of conversations through
VoIP. A small degree of security is afforded by the use of
scarce patented audio codecs that are not easily obtainable
for open source applications. The use of this method of
security is not proven effective.
Compression is used by some vendors to counter
eavesdropping. This method only makes it more difficult to
eavesdrop but does not prevent it. Encryption and
cryptography are essential to ensure proper security in
VoIP. It is possible to use IPsec to secure
VoIP by means of opportunistic encryption. [12]
VII. Conclusion
VoIP is an evolutionary step in voice communication
that makes use of the widespread and well established
internet backbone. VoIP has managed to provide a much
cheaper means of voice communication, but it is still not
wholly embraced by all. This might be because of its
trade-off of low cost for poor QoS.
The core reason for this low QoS in VoIP is that
the internet was not designed for voice transmission.
As a result, the performance of VoIP is significantly
hindered by factors like delay and packet loss. Delay has
the greater impact on the performance of VoIP because
voice data is sensitive to delay.
The nature of transmitting voice data over the internet will
always result in packet loss. The techniques used to
counter packet loss need to be closely monitored, as
most of them trade off packet loss against delay.
Apart from delay, jitter and packet loss, the question of
security and reliability often arises because the
voice is transmitted over a public, widespread medium:
the internet.
To conclude, the PSTN system was designed for the
sole purpose of carrying voice. Will the use of the internet as
a backbone to carry voice reach the standards of the PSTN?
At present, I can say that VoIP can only be used in
conjunction with the PSTN and not to replace it. VoIP
may have a chance of replacing the PSTN if and only if
definite communication standards are set for VoIP,
solutions to compatibility problems are defined and a cross
platform communication system is developed.
References:
[1] Voice over IP. Available at:
http://en.wikipedia.org/w/index.php?title=Voice_over_IP&redirect=no
[2] Term paper on voice over internet protocol. Available at:
http://www.termpapergenie.com/voiceover%20.html
[3] www.theiet.com
[4] Voice over IP, by David Lake, Cisco Systems Ltd. Available
at:
http://www2.theiet.org/oncomms/sector/communications/Articles/Heading/132
[5] http://technet2.microsoft.com/windowsserver/en/library
[6] Leading Edge-VoIP over HSPA: running in the fast lane, by
Li Xuanbo. Available at:
http://www.huawei.com/publications/view.do?id=2938&cid=5331&pid=61
[7] Overcoming Barriers to High-Quality Voice over
IP Deployments. Available at:
http://www.intel.com/network/csp/pdf/8539.pdf
[8] Quality of Service, Quality of Service for Voice over IP.
Available at:
http://www.cisco.com/en/US/docs/ios/solutions_docs/qos_solutions/QoSVoIP/QoSVoIP.html
[9] http://voip.about.com/od/glossary/g/delay.htm
[10] Quality of Service for Voice over IP (QoS for VoIP),
presented by Dr. Peter J. Welcher. Available at:
www.netcraftsmen.net/welcher/seminars/qos-voip.pdf
[11] In depth: jitter. Available at:
http://www.voiptroubleshooter.com/indepth/jittersources.html
[12] Examining Two Well-Known Attacks on VoIP, by Peter
Thermos. Available at:
http://www.circleid.com/posts/examining_two_well_known_attacks_on_voip1/
[13] Olivier Hersent, Jean-Pierre Petit, David Gurle, Beyond
VoIP Protocols: Understanding Voice Technology and
Networking Techniques for IP Telephony, John Wiley and Sons,
2005.
[14] Jonathan Davidson, James Peters, contributor Brian
Gracely, Voice over IP Fundamentals, Cisco Press, 2000.
[15] Ted Wallingford, Switching to VoIP, O’Reilly, 2005.