Audio Compression
                    by: Philipp Herget




                 Sufficiency Course Sequence:

Course Number   Course Title                                Term
HI1341          Introduction to Global History              A92
HI2328          History of Revolution in the 20th Century   B92
MU1611          Fundamentals of Music I                     A93
MU2611          Fundamentals of Music II                    B93
MU3611          Computer Techniques in Music                C94




                             Presented to: Professor Bianchi
                                  Department of Humanities & Arts
                                  Term B, 1996
                                  FWB5102


              Submitted in Partial Fulfillment
                  of the Requirements of
         the Humanities & Arts Sufficiency Program
               Worcester Polytechnic Institute
                 Worcester, Massachusetts
Abstract
    This report examines the area of audio compression and its rapidly expanding use
in the world today. Covered topics include a primer on digital audio, a discussion of
different compression techniques, a description of a variety of compressed formats, and
compression in computers and Hi-Fi stereo equipment. Information was gathered on a
multitude of different compression uses.
Contents
1 Introduction
2 Digital Audio Basics
3 Compression Basics
  3.1 Lossless vs. Lossy Compression
  3.2 Audio Compression Techniques
  3.3 Common Audio Compression Techniques
4 Uses of Compression
  4.1 Compression in File Formats
  4.2 Compression in Recording Devices
5 Conclusion
Bibliography




1 Introduction
The first form of audio compression came out in 1939 when Dudley first introduced the
VOCODER (VOice CODER) to reduce the amount of bandwidth needed to transmit speech
over a telephone line (Lynch, 222). The VOCODER broke speech down into certain fre-
quency bands, transmitted information about the amount of energy in each band, and then
synthesized speech using the transmitted information on the receiving end of the device. Since
then, there has been a great deal of research conducted in the area of audio compression. In
the 1960s, compression was used in telephony, and extensive research was done to minimize
the bandwidth needed to transmit audio data (Nelson, 313). Today, audio compression is a large
subarea of audio engineering.
    The need for audio compression is brought about by the tremendous amount of space
required to store high-quality digital audio data. One minute of CD-quality audio data
takes up 4 Mbytes of storage space (Ratcliff, 32). The use of compression allows a significant
reduction in the amount of data needed to create audio sounds, usually with only a minimal loss
in the quality of the audio signal. Compression comes at the expense of the extra hardware or
software needed to compress the signal. However, in today's technologically advanced times,
this cost is usually small compared to the cost of the space that is saved.
    Compression is used in almost all new digital audio devices on the market, and in many of
the older ones. Some examples are the telephone system, digital message recorders, like those
in answering machines, and Sony's new MiniDisc player. With the use of compression, these
devices are able to store more information in less space. Compression is accompanied by a
loss in quality, but the loss is usually so minimal that it cannot be heard by most people. A
good example of this is the anti-shock mechanism found in newer CD players. This mechanism
uses a small portion of digital memory to buffer digital data from the CD. When a physical
shock disrupts the player and it can no longer read data from the CD, the data from the memory
buffer is used to generate the audio signal until the player re-tracks on the CD. To store a
maximum amount of data, the player uses compression to store the data in the memory. The

Panasonic SL-S600C has such an anti-shock mechanism with 10 seconds of storage buffer.
The Panasonic SL-S600C Operating Instructions state:
     The extra anti-shock function incorporates digital signal compression technology.
     When listening to sound with the unit connected to a system at home, it is
     recommended that the extra anti-shock switch be set to the OFF position.
The recommendation is given because the compression algorithm used in the storage has a
slightly detrimental impact on the sound quality.
    The use of audio compression is a tradeoff among different factors. Knowledge of audio
compression is useful not only to the designer, but also to the consumer. The key questions that
arise in the evaluation of an audio compression system are how much the data is compressed,
what losses are associated with the compression, and what the compression costs.
This paper will answer some of these questions by providing a basic awareness of compression,
giving background on compression, explaining various popular compression techniques, and
discussing the compression formats used in various audio devices and audio computer files.


2 Digital Audio Basics
Compression can be accomplished using two different methods. The first method is to take
the data from a standard digital audio system and compress it using software. The second is
to encode the signal in a different yet similar manner to that used in a normal digital audio
system. Both of these methods are based on digital audio theory; therefore, understanding
their functionality and performance requires an understanding of digital audio basics.
    The sounds we hear are caused by variations in air pressure which are picked up by our
ears. In an analog electronic audio system, these pressure signals are converted to an electric
voltage by a microphone. The changing voltage, which represents the sound pressure, is
stored on a medium (like tape), and later used to control a speaker to reproduce the original
sound. The largest source of error in such an audio system occurs in the storage and retrieval
process, where noise is added to the sound.

                       Figure 1: An Example of an Analog Waveform
                  (voltage, representing air pressure, plotted against time)


    The idea behind a digital system is to represent an analog (continuous) waveform as a
finite number of discrete values. These values can be stored in any digital media, such as a
computer. Later, the values can be converted back to an analog audio signal. This method
is advantageous over the older analog techniques because no information (quality) is lost in
the storage and retrieval process. Also, unlike analog, when a copy of a digital recording is
made, the values can be exactly duplicated, creating an exact replica of the original digital
work. However, the process does suffer other losses. These losses occur in the conversion
process from the analog to the digital format.
    To explain the analog to digital conversion process, we will look at an analog audio
waveform and show each of the steps taken in digitizing it. The waveform in Figure 1
represents a brief moment of an audible sound. The amplitude of the waveform represents
the relative air pressure due to the sound.
    In a digital system, the waveform is represented by a series of discrete values. To get
these values, two steps must be taken. First, the signal is sampled, meaning that discrete
values of the signal are selected in time. The second step is to quantize each of the values
attained in the sampling step. Quantization reduces the amount of storage space required for
each value in a digital system.
    In the first step, the samples are taken at constant intervals. The number of samples


                  Figure 2: An Example of a Sampled Analog Waveform
                 (voltage versus time, with samples marked every T seconds)


taken every second is called the sampling rate. Figure 2 shows the result of sampling the
signal. The X's on the waveform represent the samples which were taken. Since the samples
were taken every T seconds, there are 1/T samples per second. The sampling rate shown
in Figure 2 is therefore 1/T samples/s. Typical sampling rates range from 8000 samples/s
up to 44100 samples/s, the rate used for a CD. The term samples/s is often replaced by the
term Hz, kHz, or MHz to represent units of samples/s, kilosamples/s, or megasamples/s
respectively (Audio FAQ).
    The sample values, the values with the X's, now represent the original waveform. These
values could now be stored, and used at a later time to recreate the original signal. How
well the original signal can be recreated is related to the number of samples taken in a given
time period. Therefore, the sampling rate is a critical factor in the quality of the digitized
signal. If too few samples are taken, then the original signal cannot be regenerated correctly.
    In 1933, a publication by Harry Nyquist proved that if the sampling rate is greater
than twice the highest frequency of the original signal, the original signal can be exactly
reconstructed (Nelson, 321). This means that if we sample our original signal at a rate that
is twice as high as the highest frequency contained in the signal, there will be no theoretical
loss of quality. This sampling rate, necessary for perfect reconstruction, is commonly
referred to as the Nyquist rate.
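    The consequence of sampling below the Nyquist rate can be illustrated numerically. The
following sketch (Python, purely illustrative; the function name and frequencies are chosen for
the example) samples two cosines at 8000 samples/s: a 1000 Hz tone, which is below the 4000 Hz
Nyquist limit, and a 7000 Hz tone, which is above it. The two tones produce identical sample
values, so once sampled they can no longer be told apart:

```python
import math

def sample_cosine(freq_hz, rate_hz, n_samples):
    """Sample a cosine of the given frequency at the given sampling rate."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz)
            for n in range(n_samples)]

# At 8000 samples/s, only frequencies below 4000 Hz (half the sampling
# rate) are captured unambiguously.  A 7000 Hz cosine yields exactly the
# same samples as a 1000 Hz cosine; the information distinguishing them
# is lost, which is the aliasing that under-sampling causes.
low = sample_cosine(1000, 8000, 16)
aliased = sample_cosine(7000, 8000, 16)
assert all(abs(a - b) < 1e-9 for a, b in zip(low, aliased))
```

This is why the highest frequency in the input must be limited (filtered) before sampling.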
    Now that we have a set of consecutive samples of the original signal, the samples need

      Figure 3: An Example of the Quantization of a Sampled Analog Waveform
          (sample values rounded to one of 16 discrete voltage levels)


to be quantized in order to reduce the storage space required by each sample. The process
involves converting the sampled values into a certain number of discrete levels, which are
stored as binary numbers. A sample value is typically converted to one of 2^n levels, where n
is the number of bits used to represent each sample digitally. This process is carried out in
hardware by a device called an analog to digital converter (ADC).
    The result of quantizing the values from Figure 2 is shown in Figure 3. The samples still
have approximately the same value as before, but have been "rounded off" to the nearest of
16 different levels. In a digital system, the amount of storage space required by a number
is governed by the number of possible values that number could have. By quantizing the
sample, the number of possible values is limited, significantly reducing the required storage
space. After quantizing the value of each sample in the figure to one of 2^4 = 16 levels, only 4 bits
of storage are needed for each sample. In most digital audio systems, either 8 or 16 bits are
used for storage, yielding 2^8 = 256 or 2^16 = 65536 different levels in the quantization process.
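    The level-mapping step can be sketched in a few lines of code. The following Python is an
illustration only (the function names and the choice of a -1.0 to 1.0 sample range are
assumptions made for the example): it maps a sample onto one of 2^n uniform levels, stores the
integer code, and recovers the center value of that level.

```python
def quantize(sample, n_bits):
    """Map a sample in [-1.0, 1.0] to one of 2**n_bits uniform levels,
    returned as an integer code."""
    levels = 2 ** n_bits
    step = 2.0 / levels
    code = int((sample + 1.0) / step)
    return min(code, levels - 1)          # clamp +1.0 into the top level

def dequantize(code, n_bits):
    """Recover the center value of the quantization level for a code."""
    step = 2.0 / 2 ** n_bits
    return -1.0 + (code + 0.5) * step

# With 4 bits there are 2**4 = 16 levels and a step of 0.125, so the
# reconstructed value is never more than half a step (0.0625) away
# from the original sample.
x = 0.3
x_hat = dequantize(quantize(x, 4), 4)
assert abs(x - x_hat) <= 0.0625
```

Using more bits shrinks the step, and with it the worst-case rounding error, at the cost of storage.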
    The quantization process is the most significant source of error in a digital audio signal.
Each time a value is quantized, the original value is lost, and the value is replaced by an
approximation of the original. The peak value of the error is 1/2 the value of the quantization
step. Thus, the smaller the quantization steps, the smaller the error is. This means the more


          Figure 4: An Example of a Signal Reconstructed from the Digital Data
                   (a stepped waveform, prior to low-pass filtering)


bits used to quantize the signal, the better the quality of the reconstructed sound signal, and the
more space required to store the signal values.
    To regain the original signal, each of the values stored as the digital audio signal is
converted back to an analog audio signal using a Digital to Analog Converter (DAC). An
example of the output of the DAC is shown in Figure 4. The DAC takes the sample points and
makes an analog waveform out of them. Due to the process used to convert the waveform,
the resulting signal is comprised of a series of steps. To remedy this, the signal is then put
through a low-pass filter which smooths out the waveform, removing all of the sharp edges
caused by the DAC. The resulting signal is very close to the original.
    All the losses in the digital system occur in the conversion process to and from a digital
signal. Once the signal is digital, it can be duplicated, or replayed any number of times and
never lose any quality. This is the advantage of a digital system. The losses generated by
the conversion process can be measured as a Signal to Noise Ratio (SNR), the same measure
used for analog signals. The noise in the signal is considered to be the signal that would have
to be subtracted from the reconstructed signal to obtain the original. SNR is used to compare
the quality of different types of quantization, and is also used in the quality measurement of
compression techniques.
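    As a concrete illustration, the SNR can be computed directly from the two sample
sequences. The Python sketch below is my own illustration (the function name and the sample
data are invented for the example); it treats the per-sample difference as the noise, exactly as
described above:

```python
import math

def snr_db(original, reconstructed):
    """Signal-to-Noise Ratio in dB.  The noise is the signal that would
    have to be subtracted from the reconstructed signal to obtain the
    original, i.e. the per-sample difference."""
    signal_power = sum(s * s for s in original)
    noise_power = sum((s - r) ** 2 for s, r in zip(original, reconstructed))
    return 10 * math.log10(signal_power / noise_power)

# A small constant error added to every sample acts as the "noise".
original = [0.5, -0.25, 0.8, -0.6]
reconstructed = [s + 0.01 for s in original]
print(round(snr_db(original, reconstructed), 1))  # -> 35.2
```

A higher SNR means the reconstruction is closer to the original, so compression schemes that preserve more of the waveform score higher on this measure.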

3 Compression Basics
The underlying idea behind data compression is that a data file can be re-written in a different
format that takes up less space. A data format is called compressed when it saves either
more information in the same space, or the same information in less space than a standard
uncompressed format. A compression algorithm for an audio signal will analyze the signal
and store it in a different way, hopefully saving space. An analogy could be made between
compression and shorthand. In shorthand, words are represented by symbols, effectively
shortening the amount of space occupied. Data compression uses the same concept.

3.1 Lossless vs. Lossy Compression
The field of compression is divided into two categories, lossless and lossy compression. In
lossless compression, no data is lost in the compression process. An example of a lossless
compression program is pkzip for the IBM PC. This is a shareware utility which is widely
available. It can be used to compress and uncompress any type of computer file. When a file
is uncompressed, the exact original is retrieved. The amount of compression that is achieved
is highly dependent on the type of file, and varies greatly from file to file.
     In lossy compression schemes, the goal is to encode an approximation of the original.
By using a close approximation of the signal, the coding can usually be accomplished using
much less space. Since an approximation is saved, instead of the original, lossy compression
schemes can only be used to compress information when the exact original is not needed.
This is the case for audio and video data. With these types of data, any digital format used
is an approximation of the original signal. Computer data and program files, on the other
hand, must be compressed using lossless compression because all of the data is usually critical.
     In general, lossy compression schemes yield much higher compression ratios than lossless
compression schemes. In many cases, the difference in quality between the compressed
version and the original is so minimal that it is not noticeable. Yet, in other compression

schemes there is a significant difference in quality. Deciding how much information is
to be lost is up to the discretion of the designer of the algorithm or technique. It is a tradeoff
between size and quality.
    If the shorthand writer from the previous analogy were to write down only the main ideas
of the text, it would be analogous to lossy compression. Using only the main ideas would be
an extreme form of compression. If he or she were to leave out some adjectives and adverbs,
it would again be a form of lossy compression, this one less lossy than the first. From
the analogy, it can be seen how the writer (programmer) can decide how important the details
are and how many details to include.
    Almost all compression techniques used in digital systems are lossy. This is because
lossless compression algorithms are generally very unpredictable in the amount of compres-
sion they can achieve. In a typical application, there is a limited amount of "space" for the
digital audio data that is generated. If the audio data cannot be compressed to a guaranteed
size, it simply will not fit in the required space, which is unacceptable.
    The reason for the unpredictability of a lossless technique lies in the technique itself. Data
which happens to be in a format which does not lend itself to the way the lossless technique
"re-writes" the data will not be compressed. In The Data Compression Book, Mark Nelson
compares raw speech files which were compressed with a shareware lossless data compression
program, ARJ, to demonstrate how well a typical lossless compression scheme will compress
an audio signal. He states:
       ARJ results showed that voice files did in fact compress relatively well. The six
       sample raw sound files gave the following results:
                       Filename          Original Compressed Ratio
                   SAMPLE-1.RAW            50777          33036         35%
                   SAMPLE-2.RAW            12033           8796         27%
                   SAMPLE-3.RAW            73019          59527         19%
                   SAMPLE-4.RAW            23702           9418         60%
                   SAMPLE-5.RAW            27411          19037         30%
                   SAMPLE-6.RAW            15913          12771         20%

His data shows that the compression ratios fluctuate greatly depending on the particular
sample of speech that is used.

3.2 Audio Compression Techniques
For any type of compression, the compression ratio and the algorithm used are highly depend-
ent on the type of data that is being compressed. The data source used in this paper is audio
data, and we have already determined that lossy compression will be used in most cases.
Now we can further subdivide the source into music and voice data.
    The more information that is known about the source, the better the compression tech-
nique can be tailored toward that type of data. The differences between music and speech
allow audio compression techniques to be subdivided into two categories: waveform coding
and voice coding. Waveform coding can be used on all types of audio data, including voice.
The goal of waveform coding is to recreate the original waveform after decompression. The
closer the decompressed waveform is to the original, the better the quality of the coding
algorithm is. The second technique, voice coding, yields a much higher compression ratio,
but can only be used if the audio source is a voice. In voice coding, the goal is to recreate the
words that were spoken and not the actual voice. The algorithms "utilize a priori information
about the human voice, in particular the mechanism that produces it" (Lynch, 255).
    Since the two techniques are fundamentally different, the performance of each technique
is measured differently. The performance of waveform coding techniques is measured by
determining how well the uncompressed signal matches the original waveform. This
is usually done by measuring the SNR. With the voice coding technique this is not possible,
since the technique doesn't try to mimic the waveform. Therefore, in voice coding algorithms,
the quality of the algorithm is measured by listener preference.
    These coding techniques can be further subdivided into two categories, time domain
coding and frequency domain coding. In a time domain coding technique, information on each
of the samples of the original signal is encoded. In a frequency domain coding technique,

the signal is transformed into its frequency representation. This frequency representation is
then encoded into a compressed format. Later, the information is decoded, and transformed
back into the time representation of the signal to get back the original samples. Most simple
compression algorithms use a time domain coding technique.
    The more recent waveform coding techniques provide a much higher compression ratio by
using psychoacoustics to aid in the compression. Psychoacoustics is "the study of how sounds
are heard subjectively and of the individual's response to sound stimuli" (Webster's New
World Dictionary, 1147). By basing the compression scheme on psychoacoustic phenomena,
data that can't be heard by humans can be discarded. For example, in psychoacoustics it has
been determined that certain levels of sounds cannot be heard while other louder sounds are
present (Beerends, 965). This effect is called masking. By eliminating the unheard sounds
from the audio signal, the signal is simplified, and can be more easily compressed. Techniques
like these are used in modern systems where high compression ratios are necessary, like Sony's
new MiniDisc player.

3.3 Common Audio Compression Techniques
The techniques that have been discussed thus far are general subcategories of the approaches
that can be taken when designing an audio compression algorithm. In this section, the details
of some popular compression techniques will be discussed. Since compression is such a large
area, a comprehensive guide to all the different compression methods is far beyond the scope
of this paper. However, this section covers some fundamental and some advanced techniques
to provide a general idea of how different compression techniques are implemented.
    To give a general background, both waveform and voice coding techniques are discussed.
Since the waveform coding techniques are simpler, they will be discussed first. In these
techniques, the compressed digital data is often obtained from the original signal itself, rather
than creating standard digital audio data and compressing it with software.


3.3.1 Waveform Coding Techniques
PCM
Pulse Code Modulation (PCM) refers to the technique used to code the raw digital audio
data as described in Section 2. It is the fundamental digital audio technique that is used
most frequently in digital audio systems. Although PCM is not a compression technique,
when it is used along with non-uniform quantization such as µ-Law or A-Law, it can be
considered compression. PCM combined with non-uniform quantization is used as a reference
for comparing the performance of other compression schemes (Lynch, 225).

µ-Law and A-Law Companding
Since the dynamic range of an audio signal is very wide, an audio waveform having a maximum
possible amplitude of 1 volt may never reach over 0.1 volts if the audio signal is not very
loud. If the signal is quantized with a linear scale, the values attained by the signal will
cover only 1/10 of the quantization range. As a result, the softer audio signals have a very
granular waveform after being quantized, and the quality of the sound deteriorates rapidly
as the sound gets softer. To compensate for the wide dynamic range of audio signals, a non-
linear scale can be used to quantize the signal. Using this method, the digitized signal will
have an increased number of steps in the lower range, alleviating the problem (Couch, 152).
Using non-uniform quantization can raise the SNR for a softer sound, making the SNR for
a wide range of sound levels approximately uniform (Couch, 155). Typically, non-uniform
quantization is done on a logarithmic scale.
    The two standard formats for the logarithmic quantization of a signal are µ-Law and
A-Law. A-Law is the standard format used in Europe (Couch, 153), and µ-Law is used in
the telephone systems of the United States, Canada, and Japan. The µ-Law quantization,
used in phone systems, uses eight bits of data to provide the dynamic range that normally
requires twelve bits of PCM data (Audio FAQ).
    The process of converting a computer file to µ-Law is a form of compression, since the
amount of data that is needed per sample is reduced and the dynamic range of the sample
is increased. The result is much less data with more information. To create µ-Law or A-Law
data, the signal must first be compressed and later expanded. This process is
commonly referred to as companding.
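    The continuous µ-Law compression curve is simple enough to sketch directly. The Python
below is illustrative only (µ = 255 is the value used in the North American standard; the
function names are my own), showing the compress/expand pair that gives companding its
name:

```python
import math

MU = 255  # value used in the North American / Japanese mu-Law standard

def mu_law_compress(x):
    """Map a sample in [-1, 1] onto the logarithmic mu-Law scale."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Invert the compression (the 'expanding' half of companding)."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

# Companding round-trips: expand(compress(x)) recovers x exactly
# (before any quantization of the compressed value is applied).
x = 0.05
assert abs(mu_law_expand(mu_law_compress(x)) - x) < 1e-12

# A quiet sample occupies far more of the coded range than on a linear
# scale: 0.05 (5% of full scale) compresses to about 0.47 of full
# scale, so quantization steps are effectively finer for soft sounds.
```

Quantizing the compressed value with 8 bits and expanding on playback is what yields the dynamic range that would otherwise require twelve bits of linear PCM.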

Silence Compression
Silence compression is a form of lossless compression that is extremely easy to implement.
In silence compression, periods of relative silence in a audio signal are replaced by actual
silence. The samples of data that were used to represent the silent part are replaced by a
code and a number telling the device which reconstructs the analog signal how much silence
to insert. This reduces all of the data needed to represent the silent part of the signal down
to a few bytes.
    To implement this, the compression algorithm rst determines if the audio data is silent
by comparing the level of the digital audio data to a threshold. If the level is lower than the
threshold, that part of the audio signal is considered silent, and the samples are replaced by
zeros. The performance of the algorithm therefore hinges on the threshold level. The higher
the level, the more compression there is but the more lossy the technique is. The amount of
compression achieved also depends on the total length of all the silent periods in an audio
signal. The amount can be very signi cant in some types of audio data like voice data.
      Silence encoding is extremely important for human speech. If you examine a
      waveform of human speech, you will see long, relatively flat pauses between the
      spoken words. (Ratcliff, 32)
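    A minimal variant of this scheme can be sketched as a run-length substitution. The Python
below is my own illustration, not taken from any particular product: the escape code, threshold,
and unsigned 8-bit sample format (midpoint 128) are all assumptions made for the example.

```python
SILENCE_CODE = 127  # escape byte; with this threshold, a 127 sample is
                    # itself "silent", so the code can never collide
                    # with a loud sample value in the output

def silence_compress(samples, threshold=2):
    """Replace each run of near-silent 8-bit samples (within `threshold`
    of the midpoint 128) with a (SILENCE_CODE, run_length) pair."""
    out, i = [], 0
    while i < len(samples):
        if abs(samples[i] - 128) < threshold:
            run = 0
            while i < len(samples) and abs(samples[i] - 128) < threshold:
                run += 1
                i += 1
            out.extend([SILENCE_CODE, run])
        else:
            out.append(samples[i])
            i += 1
    return out

# A 7-sample clip with a 4-sample quiet stretch shrinks to 5 bytes; a
# decoder would re-insert `run` midpoint samples at each escape code.
clip = [200, 50, 128, 129, 127, 128, 210]
assert silence_compress(clip) == [200, 50, SILENCE_CODE, 4, 210]
```

Raising the threshold turns more low-level detail into "silence", increasing compression and loss together, exactly the tradeoff described above.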
   In The Data Compression Book, Mark Nelson wrote silence compression code in C, and
used it to compress some PCM audio data files. The results he obtained were as follows:
                       Filename         Original Compressed Ratio
                   SAMPLE-1.RAW          50777          37769         26%
                   SAMPLE-2.RAW          12033          11657          3%
                   SAMPLE-3.RAW          73019          73072          0%
                   SAMPLE-4.RAW          13852          10962         21%
                   SAMPLE-5.RAW          27411          22865         17%

Figure 5: An Example of Signals in a DM waveform: a) the original and reconstructed
waveforms and b) the DM waveform


The table indicates that silence compression can be very effective in some instances, but in
others it may have no effect at all, or even increase the file size slightly. Silence compression
is used mainly in file formats found in computers.

DM
Delta Modulation (DM) is one of the most primitive forms of audio encoding. In DM, a
stream of 1 bit values is used to represent the analog signal. Each bit contains information
on whether the DM signal is greater or less than the actual audio signal. With this information,
the original signal can then be reconstructed.
    Figure 5 shows an example DM signal, the original signal it was generated from, and the
reconstructed signal before filtering. The actual DM signal, Figure 5b, contains information
on whether the output should rise or fall. The size of the step and the rate of the steps are
fixed. The reconstruction algorithm simply raises or lowers the output value according to the
DM waveform.
    DM suffers from two major losses, granular noise and slope overload. Granular noise
occurs when the input signal is flat. The DM signal simulates flat regions by rising and
falling, leading to granular noise. Slope overload is caused when the input signal rises faster

than the DM signal can follow it. Granular noise can be eliminated by making the step size
small enough, and slope overload can be prevented by increasing the data rate. However,
decreasing the step size and increasing the data rate also increases the amount of data
needed to store the signal. DM is rarely used, but was explained here to provide a basis for
understanding ADM, which offers a significant advantage over PCM.
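    The DM loop is short enough to sketch directly. The Python below is an illustrative toy
(the step size and signal values are arbitrary choices), and it demonstrates the granular-noise
effect described above: a perfectly flat input still produces an alternating bit stream, so the
reconstructed signal oscillates around the true value instead of holding it.

```python
def dm_encode(signal, step=0.1):
    """Delta modulation: emit 1 when the tracked value must rise to
    follow the input, 0 when it must fall.  One bit per sample."""
    bits, tracked = [], 0.0
    for x in signal:
        bit = 1 if x > tracked else 0
        tracked += step if bit else -step
        bits.append(bit)
    return bits

def dm_decode(bits, step=0.1):
    """Rebuild the staircase waveform by raising or lowering the output
    one fixed step per bit (low-pass filtering would follow in practice)."""
    out, tracked = [], 0.0
    for bit in bits:
        tracked += step if bit else -step
        out.append(tracked)
    return out

# Granular noise: a flat 0.0 input yields alternating bits, and the
# decoded output oscillates between -0.1 and 0.0 instead of staying flat.
assert dm_encode([0.0] * 6) == [0, 1, 0, 1, 0, 1]
```

Slope overload appears with the same code: feed in a ramp that rises faster than 0.1 per sample and the decoded staircase falls behind the input.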

ADM
Adaptive Delta Modulation (ADM) is the solution to the problems with DM. In ADM, the
step size is continuously adjusted, making the step size larger in the fast changing parts of
the signal and smaller in the slower changing parts of the signal. Using this technique, both
the granular noise and the slope overload problems are solved.
    In order to adjust the step size, an estimation must be made to determine if the signal is
changing rapidly. The estimation in ADM is usually based on the last sample. If the signal
increased for two consecutive samples, the step size is increased. If the two previous steps
were opposite in direction, then the step size is decreased. This estimation method is simple
yet effective.
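    The step-size adjustment can be sketched as a small modification of the DM loop. In the
Python below (an illustration; the doubling and halving factors are my own choice, since the
text only says the step is increased or decreased), two bits in the same direction grow the step
and a reversal shrinks it, so a steep ramp no longer outruns the encoder:

```python
def adm_encode(signal, base_step=0.05):
    """Adaptive delta modulation: after two moves in the same direction
    the step doubles (fighting slope overload); after a reversal it is
    halved (fighting granular noise)."""
    bits, tracked, step, prev_bit = [], 0.0, base_step, None
    for x in signal:
        bit = 1 if x > tracked else 0
        if prev_bit is not None:
            step = step * 2 if bit == prev_bit else step / 2
        tracked += step if bit else -step
        bits.append(bit)
        prev_bit = bit
    return bits

# A steeply rising ramp: plain DM with a 0.05 step could climb only
# 0.25 in five samples, but the growing step keeps pace, so the encoder
# tracks the signal (all 1-bits while the ramp rises).
assert adm_encode([0.1, 0.3, 0.6, 1.0, 1.5]) == [1, 1, 1, 1, 1]
```

The decoder mirrors the same step-adaptation rule, so no extra side information needs to be transmitted.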
    The performance of ADM using the above technique turns out to be better than Log PCM
when little data is used to represent a signal.¹ When more data is used, however, Log PCM
performs better (Lynch 229).

DPCM
A Di erential Pulse Code Modulation (DPCM) system consists of a predictor, a di erence
calculator, and a quantizer. The predictor predicts the value of the next sample. The
di erence calculator then determines the di erence between the predicted value and the actual
value. Finally, this di erence value is quantized by the quantizer. The quantized di erences
are used to represent the original signal.
  ¹ Performance is measured with SNR.

Essentially, a DM signal is a DPCM signal with one bit being used in the quantization
process and a predictor based on the previous bit. In a DM system, the predicted value
is always the same as the previous value, and the difference between the predicted value
(previous value) and the actual signal is quantized using one bit (two levels).
    The performance of a DPCM signal depends on the predictor. The better it can predict
where the signal is headed, the better it will perform. A DPCM system using one previous
value in the predictor can achieve the same SNR as a µ-Law PCM system using one less bit
to quantize each sample value. If three previous values are used for the predictor, the same
SNR can be achieved using two bits less to represent each sample (Lynch 227). This is a
significant performance increase over PCM because it obtains the same SNR using less data.
This technique can be extended even further by making the prediction method adaptive to the
input data. The technique is called Adaptive Differential Pulse Code Modulation (ADPCM).
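    A DPCM system with the simplest possible predictor (the previous reconstructed value)
can be sketched as follows. The Python is illustrative only; the quantization step is an
arbitrary choice, and a real coder would pack the small difference codes into fewer bits than
the samples themselves would need:

```python
def dpcm_encode(samples, step=0.02):
    """Quantize the difference between each sample and its predicted
    value; the predictor here is the previous reconstructed value."""
    codes, predicted = [], 0.0
    for x in samples:
        code = round((x - predicted) / step)   # quantized difference
        codes.append(code)
        predicted += code * step               # track the decoder's state
    return codes

def dpcm_decode(codes, step=0.02):
    """Accumulate the quantized differences to rebuild the samples."""
    out, predicted = [], 0.0
    for code in codes:
        predicted += code * step
        out.append(predicted)
    return out

# A smoothly varying signal yields small difference codes, and the
# reconstruction error stays within half a quantization step.
samples = [0.0, 0.05, 0.12, 0.2]
decoded = dpcm_decode(dpcm_encode(samples))
assert max(abs(a - b) for a, b in zip(samples, decoded)) <= 0.01 + 1e-9
```

Because the encoder predicts from the reconstructed value rather than the true one, its state matches the decoder's and quantization errors do not accumulate.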

ADPCM
ADPCM is a modification of the DPCM technique, making the algorithm adapt to the char-
acteristics of the signal. The relationship between DM and ADM is the same as that between
DPCM and ADPCM. In both of these, the algorithm is made adaptive to the changes in the
audio signal. The adaptive part of the system can be built into the predictor, the quantizer,
or both, but has been shown to be most effective in the quantizer (Lynch 227).
    Using this adaptive algorithm, the compression performance can be increased beyond
that of DPCM. "Cohen (1973) shows that by using the two most significant bits in the
previous three samples, a gain in SNR of 7 dB over non-adaptive DPCM can be obtained"
(Lynch, 227). Different forms of ADPCM are used in many applications, including
inexpensive digital recorders. ADPCM is also used in public compression standards which
are slowly gaining popularity, like CCITT G.721 and G.723, which use ADPCM at
32 kbits/s and at 24 or 40 kbits/s respectively (Audio FAQ).
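The adaptive-quantizer idea can be sketched by letting the step size grow after large coded differences and shrink after small ones. The adaptation rule below is an invented toy for illustration only; it is not the G.721 or G.723 algorithm:

```python
def adpcm_encode(samples):
    """Toy ADPCM: DPCM with a step size that adapts in the quantizer."""
    predicted, step = 0, 1
    codes = []
    for s in samples:
        diff = s - predicted
        code = max(-8, min(7, round(diff / step)))   # 4-bit code
        codes.append(code)
        predicted += code * step
        # Grow the step after large codes, shrink it after small ones.
        step = max(1, round(step * (1.5 if abs(code) > 4 else 0.8)))
    return codes

def adpcm_decode(codes):
    """Mirror the encoder's step adaptation to stay in sync."""
    value, step = 0, 1
    out = []
    for code in codes:
        value += code * step
        out.append(value)
        step = max(1, round(step * (1.5 if abs(code) > 4 else 0.8)))
    return out
```

Because the step size is driven only by the transmitted codes, the decoder can reproduce the encoder's adaptation without any side information.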


PASC and ATRAC
All of the previously mentioned compression techniques are relatively simple re-encodings
of the audio data. Precision Adaptive Subband Coding (PASC) and Adaptive TRansform
Acoustic Coding (ATRAC) differ from these because they are much more complex
proprietary schemes which were developed for a specific purpose. PASC and ATRAC were
both developed for use in the Hi-Fi audio market. PASC was developed by Philips for use
with the Digital Compact Cassette (DCC), and ATRAC was developed by Sony for use
with their MiniDisc player. Both of these techniques use psychoacoustic phenomena as a
basis for the compression algorithm in order to achieve the extreme compression ratios
required for their applications.
    The details of the algorithms are complicated and will not be discussed here. More
information is given in the discussion of compression used in Hi-Fi audio equipment in
Section 4.2. In addition, details on PASC can be found in Advanced Digital Audio, edited
by Ken Pohlmann, and details on ATRAC can be found in the Proceedings of the IEEE in
an article titled "The Rewritable MiniDisc System" by Tadao Yoshida.

3.3.2 Voice Coding Techniques
LPC
Linear Predictive Coding (LPC) is one of the most popular voice coding techniques. In an
LPC system, the voice signal is represented by storing characteristics of the system creating
the voice. When the data is played back, the voice is synthesized from the stored data by
the playing device. The model used in an LPC system includes the source of the sound, a
variable filter resembling the human vocal tract, and a variable amplifier resembling the
amplitude of the sound.
    The source of the sound is modeled in two different ways depending on how the voice
is being produced. This is done because humans can produce two types of sound, voiced
and unvoiced. Voiced sounds are those created using the vocal cords, and unvoiced sounds
are created by pushing air through the vocal tract. An LPC algorithm models these sounds
by using either driven periodic pulses (voiced) or a random noise generator (unvoiced) as
the source.
    The human vocal tract is modeled in the system as a time-varying filter (Lynch, 240).
Parameters are calculated for the filter to mimic the changing characteristics of the vocal
tract when the sound was being produced. The data used to represent the voice in an LPC
algorithm consists of the information on the filter parameters, the source used (voiced or
unvoiced), the pitch of the voice, and the volume of the voice. The amount of data generated
by storing these parameters is significantly less than the amount of data used to represent
the waveform of the speech signal.
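The "linear prediction" at the heart of LPC can be illustrated by fitting a tiny order-2 predictor with least squares. A real vocoder uses a much higher order plus the pitch, gain, and voicing analysis described above; this pure-Python sketch shows only the coefficient estimation:

```python
def lpc_order2(x):
    """Estimate two linear-prediction coefficients for signal x.

    Solves the 2x2 normal equations for the predictor
    x[n] ~ a1*x[n-1] + a2*x[n-2] (covariance-style least squares).
    """
    s11 = s12 = s22 = r1 = r2 = 0.0
    for n in range(2, len(x)):
        s11 += x[n-1] * x[n-1]
        s12 += x[n-1] * x[n-2]
        s22 += x[n-2] * x[n-2]
        r1  += x[n] * x[n-1]
        r2  += x[n] * x[n-2]
    det = s11 * s22 - s12 * s12
    a1 = (r1 * s22 - r2 * s12) / det
    a2 = (r2 * s11 - r1 * s12) / det
    return a1, a2
```

On a signal that really is generated by a two-tap filter, the fit recovers the filter exactly, which is why storing filter parameters can be so much cheaper than storing the waveform.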

GSM
The Global System for Mobile communications (GSM) is a standard used for compression
of speech in the European digital cellular telephone system. GSM is an advanced
compression technique that can achieve a compression ratio of 8:1. To obtain this high
compression ratio and still produce high-quality sound, GSM is based on the LPC voice
coding technique and also incorporates a form of waveform coding (Degener, 30).


4 Uses of Compression
Compression is used in almost all modern digital audio applications, including computer
files, audio playback devices, telephony applications, and digital recording devices. Many of
these, like the telephone system, have been using compression for many years now; others
have just recently started using it. The type of compression that is used depends on cost,
size, space, and many other factors.
    After reviewing a basic background on compression, one question remains unanswered:
what type of compression is used for a particular application? In the following sections, the
uses of compression in two major areas will be discussed: computer files and digital hi-fi
stereo equipment. Knowledge about these areas is particularly useful because it can help in
deciding which device to use.

4.1 Compression in File Formats
When digital audio technology was first appearing on the market, each computer
manufacturer had its own file format, or formats, associated with its computers (Audio
FAQ). As software became more advanced, computers attained the ability to read more
than one file format. Today, most software can read and write a wide range of file formats,
leaving the choice to the user.
    In general, there are two types of file formats, "raw" and self-describing. In a raw file
format, the data can be in any encoding, but the encoding and its parameters are fixed and
must be known in advance in order to read the file. A self-describing format has a header
in which information about the data, such as the sampling rate and compression scheme, is
stored. The main concern here will be with self-describing file formats, since these are the
most often used and the most versatile.
    A disadvantage of using compression in computer files is that the file usually needs to
be converted to linear PCM data for playback on digital audio devices. This requires extra
code and processing time. It may also be one of the reasons why approximately half of the
file formats available for computers don't support compression. The following chart, taken
from the "Audio Tutorial FAQ" of the Center for Innovative Computer Applications,
describes most of the popular file formats on the market and the compression that is used,
if any:




Extension, Name   Origin         Variable Parameters
.au or .snd       NeXT, Sun      rate, #channels, encoding, info string
.aif(f), AIFF     Apple, SGI     rate, #channels, sample width, lots of info
.aif(f), AIFC     Apple, SGI     same (extension of AIFF with compression)
.iff, IFF/8SVX    Amiga          rate, #channels, instrument info (8 bits)
.voc              Soundblaster   rate (8 bits/1 ch; can use silence deletion)
.wav, WAVE        Microsoft      rate, #channels, sample width, lots of info
                                 [including compression scheme]
.sf               IRCAM          rate, #channels, encoding, info
none, HCOM        Mac            rate (8 bits/1 ch; uses Huffman compression)
none, MIME        Internet       [usually 8-bit µ-Law compression, 8000 samp/s]
.mod or .nst      Amiga          [bank of digitized instrument samples with
                                 sequencing information]
    Many of these file formats are just uncompressed PCM data with the sampling rate
and the number of channels used during recording specified in the header. For the formats
that do support compression, it is usually optional. For example, in the Soundblaster ".voc"
format, silence compression can be used, and in the Microsoft ".wav" format, a number of
different encoding schemes can be used, including PCM, DM, DPCM, and ADPCM.
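Python's standard wave module, for example, reads and writes exactly these self-describing header fields. Note that it handles only uncompressed PCM data, itself an illustration of how often compression support is absent:

```python
import io
import wave

# Write one second of silence as 8-bit mono PCM at 8000 samples/s.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(1)                    # 1 byte = 8 bits per sample
    w.setframerate(8000)
    w.writeframes(bytes([128]) * 8000)   # 128 = silence for unsigned 8-bit

# Read the self-describing header back.
buf.seek(0)
with wave.open(buf, "rb") as w:
    print(w.getframerate(), w.getnchannels(), w.getnframes())
# prints: 8000 1 8000
```

The reader needs no out-of-band information: the rate, channel count, and sample width all come from the file's own header.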
    Conversion from one format to another can be accomplished via software. The "Audio
FAQ" also provides information on a number of different programs that will do the
conversion. When converting from an uncompressed to a compressed format, the file is
generally smaller afterwards, but some quality is lost. If the file is later converted back, the
size will increase, but the lost quality can never be regained.

4.2 Compression in Recording Devices
There are currently four major digital stereo devices on the market: the Compact Disc
(CD), the Digital Audio Tape (DAT), the Digital Compact Cassette (DCC), and the
MiniDisc (MD). They are all very different from each other. The CD and MD use an
optical storage mechanism, while the DAT and DCC use magnetic tape to store the data.
There are also a number of other apparent differences between the media. For example, a
CD is not re-writable while the others are.
    A major difference that may not be apparent, however, is that the MD and DCC utilize
digital data compression while the DAT and CD do not. This allows the MD and DCC to
be physically smaller than their uncompressed counterparts. In both devices, the smaller
data size is necessary and advantageous.
    In the MD, the design goal was to make the optical disc small enough to be portable.
The MD stores data at the same density as the CD, so only by using compression could
the disc be made physically smaller than the CD. In addition to reducing the size, the
compression used gave the MD other advantages. It allowed the MD to be the first optical
player with the digital anti-shock mechanism described in the introduction. Since less data
is required to generate sound and the MD reads at the same speed as the CD, the MD can
read data faster than it needs it. The extra data is stored in a buffer, which does not need
to be very big. CDs eventually came out with the same technology, but in order to
implement it, the reading speed of the CD needed to be increased, and the data needed to
be compressed after reading to fit it into a memory buffer.
    The design goal of the DCC was to make the storage medium inexpensive and the
same size as an audio cassette. By doing this, a DCC player could accept standard audio
cassettes as well as the new DCC tapes, making it more marketable. To fit the data onto a
relatively inexpensive tape medium which can be housed in an audio cassette case, digital
compression was required.
    In both the MD and DCC, the space available for digital audio data was approximately
1/4 of the size required for PCM data. The compression ratio needed was therefore
approximately 4:1. To obtain such high compression ratios, the compression schemes utilize
psychoacoustic phenomena.
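The arithmetic behind the required ratio is straightforward, taking CD-quality PCM as the baseline (the published bit rates for PASC and ATRAC differ slightly from these round numbers):

```python
# CD-quality PCM: 44,100 samples/s x 16 bits/sample x 2 channels
pcm_rate = 44_100 * 16 * 2
print(pcm_rate)          # 1411200 bits/s
print(pcm_rate // 4)     # 352800 bits/s at 4:1 (DCC/PASC target)
print(pcm_rate // 5)     # 282240 bits/s at 5:1 (MD/ATRAC target)
```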
    Precision Adaptive Subband Coding (PASC) is the compression algorithm used in the
DCC to provide a 4:1 compression of the digital PCM data. PASC is described in the book
Advanced Digital Audio, edited by Ken Pohlmann:
      The PASC system is based on three principles. First, the ear only hears sounds
      above the threshold of hearing. Second, louder sounds mask softer sounds of
      similar frequency, thus dynamically changing the threshold of hearing. Similarly,
      other masking properties such as high- and low-frequency masking may be
      utilized. Third, sufficient data must be allocated for precise encoding of sounds
      above the dynamic threshold of hearing.

Using PASC, enough digital data can fit onto a medium the size of a cassette to make the
DCC player feasible.
    The MD uses the ATRAC compression algorithm, which is based on the same
psychoacoustic phenomena. Compression in the MiniDisc is more advanced, however: the
MiniDisc achieves a compression ratio of 5:1 "in order to offer 74 min of playback time"
(Yoshida, 1498).
    Although these algorithms offer such high compression, there are some losses involved.
Experts claim that they can hear a difference between a CD and an MD, but the actual
losses are so minimal that the average person will not hear them. The largest errors occur
with certain types of audio sounds that the compression algorithm has problems with. In
an article in Audio magazine, Edward Foster writes:

      Although the test was not double-blind, and thus is suspect, I convinced
      myself I could reliably tell the original from the copy - just barely, but
      different nonetheless.

      The differences occurred in three areas: a slight suppression of low-level
      high-frequency content when the algorithm needed most of the available
      bitstream to handle strong bass and midrange content, a slight dulling of the
      attack of percussion instruments (piano, harpsichord, glockenspiel, etc.)
      probably caused by imperfect masking of "pre-echo," and a slight "post-echo"
      (noise puff) at the cessation of a sharp sound (such as claves struck in an
      acoustically dead environment). The second and third of these anomalies were
      most readily discernible on single instruments played one note at a time in a
      quiet environment and were taken from a recording specifically made to
      evaluate perceptual encoders.
Similar effects exist when listening to a DCC recording. Although the losses are minimal,
they are still present, being the tradeoff of having the small, compact, portable format.

5 Conclusion
In the last decade, the field of digital audio compression has grown tremendously. With the
expansion of the electronics industry and the decreasing prices of digital audio, many devices
which once used analog audio technology now use digital technology. Many of these digital
devices use compression to reduce storage space and bring down cost.
    Digital audio compression has become a sub-area of Audio Engineering, supporting
many professionals who specialize in this field. Millions of dollars are invested by companies
such as Sony and Philips to develop proprietary compression schemes for their digital audio
applications (Audio FAQ).
    Because of the widespread use of compression, knowledge in this area can be useful.
For a musician working with modern digital recording and editing equipment, the study of
compression can provide an advantage. Knowledge in the field of compression can help in
the evaluation and understanding of recording and playback equipment. It can also aid in
manipulating digital files with computers. As we move into the next century, and digital
audio technology continues to grow, knowledge of audio compression will become an
increasingly valuable asset.




Bibliography
"Audio tutorial FAQ." FTP://pub/usenet/news.answers/audio-fmts/part 12], Center for
Innovative Computer Applications, August 1994.
J. G. Beerends and J. A. Stemerdink, "A perceptual audio quality measure based on
a psychoacoustic sound representation," AES: Journal of the Audio Engineering Society,
vol. 40, p. 963, December 1992.
L. W. Couch, Digital and Analog Communication Systems. New York, NY: Macmillan
Publishing Company, fourth ed., 1993.
J. Degener, "Digital speech compression," Dr. Dobb's Journal, vol. 19, p. 30, December
1994.
M. Fleischmann, "Digital recording arrives," Popular Science, vol. 242, p. 84, April 1993.
E. J. Foster, "Sony MDS-501 minidisc deck," Audio, vol. 78, p. 56, November 1994.
D. B. Guralnik, ed., Webster's New World Dictionary. New York, NY: Prentice Hall Press,
second college ed., 1986.
P. Lutter, M. Muller-Wernhart, J. Ramharter, F. Rattay, and P. Slowik, "Speech research
with WAVE-GL," Dr. Dobb's Journal, vol. 21, p. 50, November 1996.
T. J. Lynch, Data Compression: Techniques and Applications. New York, NY: Van
Nostrand Reinhold, 1985.
M. Nelson, The Data Compression Book. San Mateo, CA: M&T Books, 1992.
Panasonic Portable CD Player SL-S600C Operating Instructions.
K. C. Pohlmann, ed., Advanced Digital Audio. Carmel, IN: SAMS, first ed., 1993.
J. W. Ratcliff, "Audio compression," Dr. Dobb's Journal, vol. 17, p. 32, July 1992.
J. W. Ratcliff, "Examining PC audio," Dr. Dobb's Journal, vol. 18, p. 78, March 1993.
J. Rothstein, MIDI: A Comprehensive Introduction. Madison, WI: A-R Editions, Inc., 1992.
A. Vollmer, "Minidisc, digital compact cassette vie for digital recording market," Electronics,
vol. 66, p. 11, September 13, 1993.
J. Watkinson, An Introduction to Digital Audio. Jordan Hill, Oxford (GB): Focal Press,
1994.
T. Yoshida, "The rewritable minidisc system," Proceedings of the IEEE, vol. 82, p. 1492,
October 1994.

                                            23

Más contenido relacionado

La actualidad más candente

Lecture6 audio
Lecture6   audioLecture6   audio
Lecture6 audio
Mr SMAK
 
Audio compression
Audio compressionAudio compression
Audio compression
Sahil Garg
 
Lecture 8 audio compression
Lecture 8 audio compressionLecture 8 audio compression
Lecture 8 audio compression
Mr SMAK
 
Digital Audio
Digital  AudioDigital  Audio
Digital Audio
surprisem
 

La actualidad más candente (20)

Digaudio
DigaudioDigaudio
Digaudio
 
Audio compression
Audio compressionAudio compression
Audio compression
 
Digital audio
Digital audioDigital audio
Digital audio
 
3 Digital Audio
3 Digital Audio3 Digital Audio
3 Digital Audio
 
Lecture6 audio
Lecture6   audioLecture6   audio
Lecture6 audio
 
Audio compression
Audio compression Audio compression
Audio compression
 
Audio compression
Audio compressionAudio compression
Audio compression
 
Digital audio
Digital audioDigital audio
Digital audio
 
Sample rate
Sample rateSample rate
Sample rate
 
MPEG/Audio Compression
MPEG/Audio CompressionMPEG/Audio Compression
MPEG/Audio Compression
 
Sampling rate bit depth
Sampling rate bit depthSampling rate bit depth
Sampling rate bit depth
 
Audio compression 1
Audio compression 1Audio compression 1
Audio compression 1
 
Digital audio
Digital audioDigital audio
Digital audio
 
Audio and Video Compression
Audio and Video CompressionAudio and Video Compression
Audio and Video Compression
 
Multimedia seminar ppt
Multimedia seminar pptMultimedia seminar ppt
Multimedia seminar ppt
 
Sound
SoundSound
Sound
 
Lecture 8 audio compression
Lecture 8 audio compressionLecture 8 audio compression
Lecture 8 audio compression
 
Digital Audio
Digital  AudioDigital  Audio
Digital Audio
 
Mm Unit 3
Mm Unit 3Mm Unit 3
Mm Unit 3
 
Sampling rate bit depth_lossey lossless
Sampling rate bit depth_lossey losslessSampling rate bit depth_lossey lossless
Sampling rate bit depth_lossey lossless
 

Destacado (7)

Lec5 Compression
Lec5 CompressionLec5 Compression
Lec5 Compression
 
Zvuk
ZvukZvuk
Zvuk
 
Audio and video compression
Audio and video compressionAudio and video compression
Audio and video compression
 
Lesson 4 - Audio File Formats
Lesson 4 - Audio File FormatsLesson 4 - Audio File Formats
Lesson 4 - Audio File Formats
 
LO4 - Lesson 3 - Feedback
LO4 - Lesson 3 - FeedbackLO4 - Lesson 3 - Feedback
LO4 - Lesson 3 - Feedback
 
Audio compression
Audio compressionAudio compression
Audio compression
 
multimedia technologies Introduction
multimedia technologies Introductionmultimedia technologies Introduction
multimedia technologies Introduction
 

Similar a Compression

Richard_Final_Poster
Richard_Final_PosterRichard_Final_Poster
Richard_Final_Poster
Richard Jung
 
multimedia chapter1
multimedia chapter1multimedia chapter1
multimedia chapter1
nes
 
Damayo on recordings
Damayo on recordingsDamayo on recordings
Damayo on recordings
SFYC
 
Analog to digital conversion
Analog to digital conversionAnalog to digital conversion
Analog to digital conversion
Firman Bachtiar
 

Similar a Compression (20)

Optical recording and reproduction
Optical recording and reproductionOptical recording and reproduction
Optical recording and reproduction
 
Universal sound recorder using arm 9
Universal sound recorder using arm 9Universal sound recorder using arm 9
Universal sound recorder using arm 9
 
Chapter 2- Digital Data Acquistion.ppt
Chapter 2- Digital Data Acquistion.pptChapter 2- Digital Data Acquistion.ppt
Chapter 2- Digital Data Acquistion.ppt
 
Multimedia elements
Multimedia elementsMultimedia elements
Multimedia elements
 
Digital Audio Watermarking Using Psychoacoustic Model and CDMA Modulation
Digital Audio Watermarking Using Psychoacoustic Model and CDMA ModulationDigital Audio Watermarking Using Psychoacoustic Model and CDMA Modulation
Digital Audio Watermarking Using Psychoacoustic Model and CDMA Modulation
 
Lecture 1.pptx
Lecture 1.pptxLecture 1.pptx
Lecture 1.pptx
 
Audio-1
Audio-1Audio-1
Audio-1
 
Audio
AudioAudio
Audio
 
F5242832
F5242832F5242832
F5242832
 
Multimedia and-system-design-sound-images by zubair yaseen& yameen shakir
Multimedia and-system-design-sound-images by zubair yaseen& yameen shakirMultimedia and-system-design-sound-images by zubair yaseen& yameen shakir
Multimedia and-system-design-sound-images by zubair yaseen& yameen shakir
 
M1L1-2.ppt
M1L1-2.pptM1L1-2.ppt
M1L1-2.ppt
 
Richard_Final_Poster
Richard_Final_PosterRichard_Final_Poster
Richard_Final_Poster
 
Mk3422222228
Mk3422222228Mk3422222228
Mk3422222228
 
multimedia chapter1
multimedia chapter1multimedia chapter1
multimedia chapter1
 
Multimedia.pdf
Multimedia.pdfMultimedia.pdf
Multimedia.pdf
 
Damayo on recordings
Damayo on recordingsDamayo on recordings
Damayo on recordings
 
N017657985
N017657985N017657985
N017657985
 
Data Compression using Multiple Transformation Techniques for Audio Applicati...
Data Compression using Multiple Transformation Techniques for Audio Applicati...Data Compression using Multiple Transformation Techniques for Audio Applicati...
Data Compression using Multiple Transformation Techniques for Audio Applicati...
 
Novel Approach of Implementing Psychoacoustic model for MPEG-1 Audio
Novel Approach of Implementing Psychoacoustic model for MPEG-1 AudioNovel Approach of Implementing Psychoacoustic model for MPEG-1 Audio
Novel Approach of Implementing Psychoacoustic model for MPEG-1 Audio
 
Analog to digital conversion
Analog to digital conversionAnalog to digital conversion
Analog to digital conversion
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Último (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 

Compression

  • 1. Audio Compression by: Philipp Herget Su ciency Course Sequence: Course Number Course Title Term HI1341 Introduction to Global History A92 HI2328 History of Revolution in the 20th Century B92 MU1611 Fundamentals of Music I A93 MU2611 Fundamentals of Music II B93 MU3611 Computer Techniques in Music C94 Presented to: Professor Bianchi Department of Humanities & Arts Term B, 1996 FWB5102 Submitted in Partial Ful llment of the Requirements of the Humanities & Arts Su ciency Program Worcester Polytechnic Institute Worcester, Massachusetts
  • 2. Abstract This report examines the area of audio compression and its rapidly expanding use in the world today. Covered topics include a primer on digital audio, discussion of di erent compression techniques, a description of a variety of compressed formats, and compression in computers and Hi-Fi stereo equipment. Information was gathered on a multitude of di erent compression uses.
  • 3. Contents 1 Introduction 1 2 Digital Audio Basics 2 3 Compression Basics 7 3.1 Lossless vs. Lossy Compression : : : : : : : : : : : : : : : : : : : : : : : : : 7 3.2 Audio Compression Techniques : : : : : : : : : : : : : : : : : : : : : : : : : 9 3.3 Common Audio Compression Techniques : : : : : : : : : : : : : : : : : : : : 10 4 Uses of Compression 17 4.1 Compression in File Formats : : : : : : : : : : : : : : : : : : : : : : : : : : : 18 4.2 Compression in Recording Devices : : : : : : : : : : : : : : : : : : : : : : : 19 5 Conclusion 22 Bibliography 23 i
  • 4. 1 Introduction The rst form of audio compression came out in 1939 when Dudley rst introduced the VOCODER (VOice CODER) to reduce the amount of bandwidth needed to transmit speech over a telephone line (Lynch, 222). The VOCODER broke speech down into certain fre- quency bands, transmitted information about the amount of energy in each band, and then synthesized speech using the transmitted information on the receiving end of the device. Since then, there has been a great deal of research conducted in the area of audio compression. In the 1960's, compression was used in telephony, and extensive research was done to minimize bandwidth needed to transmit audio data (Nelson, 313). Today, audio compression is a large subarea of Audio Engineering. The need for audio compression is brought about by the tremendous amount of space required to store high quality digital audio data. One minute of CD quality audio data takes up 4Mbytes of storage space (Ratcli , 32). The use of compression allows a signi cant reduction in the amount of data needed to create audio sounds with usually only a minimal loss in the quality of the audio signal. Compression comes at the expense of the extra hardware or software needed to compress the signal. However, in todays technologically advanced times, this cost is usually small compared to the cost of space that is saved. Compression is used in almost all new digital audio devices on the market, and in many of the older ones. Some examples are the telephone system, digital message recorders, like those in answering machines, and Sony's new MiniDisc player. With the use of compression, these devices are able to store more information in less space. Compression is accompanied by a loss in quality, but usually so minimal it cannot be heard by most people. A good example of this is the anti-shock mechanism found in the newer CD players. This mechanism uses a small portion of digital memory to bu er digital data from the CD. 
When a physical shock disrupts the player and it can no longer read data from the CD, the data from the memory bu er is used to generate the audio signal until the player re-tracks on the CD. To store a maximum amount of data, the player uses compression to store the data in the memory. The 1
  • 5. Panasonic SL-S600C has such an anti-shock mechanism with 10 seconds of storage bu er. The Panasonic SL-S600C Operating Instructions state: The extra anti-shock function incorporates digital signal compression technology. When listening to sound with the unit connected to a system at home, it is recommended that the extra anti-shock switch be set to the OFF position. The recommendation is given because the compression algorithm used in the storage has a slightly detrimental impact on the sound quality. The use of audio compression is a tradeo among di erent factors. Knowledge of audio compression is useful not only to the designer, but also the consumer. The key questions that arise in the evaluation of an audio compression systems are how much is the data compressed, what are the losses associated with the compression, and what is the cost of the compression. This paper will answer some of these questions by providing a basic awareness of compression, giving background on compression, explaining various popular compression techniques, and discussing the compression formats used in various audio devices and audio computer les. 2 Digital Audio Basics Compression can be accomplished using two di erent methods. The rst method is to take the data from a standard digital audio system and compress it using software. The second is to encode the signal in a di erent yet similar manner to that done in a normal digital audio system. Both of these methods are based on digital audio theory, therefore, the understanding of their functionality and performances requires an understanding of digital audio basics. The sounds we hear are caused by variations in air pressure which are picked up by our ear. In an analog electronic audio system, these pressure signals are converted to a electric voltage by a microphone. The changing voltage, which represents the sound pressure, is stored on a medium (like tape), and later used to control a speaker to reproduce the original sound. 
The largest source of error in such an audio system occurs in the storage and retrieval process, where noise is added to the sound.
[Figure 1: An Example of an Analog Waveform (voltage/air pressure vs. time)]

The idea behind a digital system is to represent an analog (continuous) waveform as a finite number of discrete values. These values can be stored in any digital medium, such as a computer. Later, the values can be converted back to an analog audio signal. This method is advantageous over the older analog techniques because no information (quality) is lost in the storage and retrieval process. Also unlike analog, when a copy of a digital recording is made, the values can be exactly duplicated, creating an exact replica of the original digital work. However, the process does suffer other losses. These losses occur in the conversion process from the analog to the digital format.

To explain the analog to digital conversion process, we will look at an analog audio waveform and show each of the steps taken in digitizing it. The waveform in Figure 1 represents a brief moment of an audible sound. The amplitude of the waveform represents the relative air pressure due to the sound. In a digital system, the waveform is represented by a series of discrete values. To get these values, two steps must be taken. First the signal is sampled, meaning that discrete values of the signal are selected in time. The second step is to quantize each of the values obtained in the sampling step. Quantization reduces the amount of storage space required for each value in a digital system.

In the first step, the samples are taken at constant intervals. The number of samples
[Figure 2: An Example of a Sampled Analog Waveform (samples spaced T seconds apart)]

taken every second is called the sampling rate. Figure 2 shows the result of sampling the signal. The X's on the waveform represent the samples which were taken. Since the samples were taken every T seconds, there are 1/T samples per second; the sampling rate shown in Figure 2 is therefore 1/T samples/s. Typical sampling rates range from 8000 samples/s up to 44100 samples/s, the rate used for a CD. The term samples/s is often replaced by the term Hz, kHz, or MHz to represent units of samples/s, kilosamples/s, or megasamples/s respectively (Audio FAQ).

The sample values, the values at the X's, now represent the original waveform. These values could now be stored, and be used at a later time to recreate the original signal. How well the original signal can be recreated is related to the number of samples taken in a given time period. Therefore, the sampling rate is a critical factor in the quality of the digitized signal. If too few samples are taken, then the original signal cannot be regenerated correctly. In 1933, a publication by Harry Nyquist proved that if the sampling rate is greater than twice the highest frequency of the original signal, the original signal can be exactly reconstructed (Nelson, 321). This means that if we sample our original signal at a rate that is twice as high as the highest frequency contained in the signal, there will be no theoretical loss of quality. This sampling rate, necessary for perfect reconstruction, is commonly referred to as the Nyquist rate.

Now that we have a set of consecutive samples of the original signal, the samples need
[Figure 3: An Example of Quantization of a Sampled Analog Waveform]

to be quantized in order to reduce the storage space required by each sample. The process involves converting the sampled values into a certain number of discrete levels, which are stored as binary numbers. A sample value is typically converted to one of 2^n levels, where n is the number of bits used to represent each sample digitally. This process is carried out in hardware by a device called an analog to digital converter (ADC). The result of quantizing the values from Figure 2 is shown in Figure 3. The samples still have approximately the same value as before, but have been "rounded off" to the nearest of 16 different levels.

In a digital system, the amount of storage space required by a number is governed by the number of possible values that number could have. By quantizing the sample, the number of possible values is limited, significantly reducing the required storage space. After quantizing the value of each sample in the figure to one of 2^4 = 16 levels, only 4 bits of storage are needed for each sample. In most digital audio systems, either 8 or 16 bits are used for storage, yielding 2^8 = 256 or 2^16 = 65536 different levels in the quantization process.

The quantization process is the most significant source of error in a digital audio signal. Each time a value is quantized, the original value is lost, and the value is replaced by an approximation of the original. The peak value of the error is 1/2 the value of the quantization step. Thus the smaller the quantization steps, the smaller the error is. This means the more
[Figure 4: An Example of a Signal Reconstructed from the Digital Data]

bits used to quantize the signal, the better the quality of the reconstructed sound signal, and the more space required to store the signal values.

To regain the original signal, each of the values stored as the digital audio signal is converted back to an analog audio signal using a Digital to Analog Converter (DAC). An example of the output of the DAC is shown in Figure 4. The DAC takes the sample points and makes an analog waveform out of them. Due to the process used to convert the waveform, the resulting signal is comprised of a series of steps. To remedy this, the signal is then put through a low pass filter which smooths out the waveform, removing all of the sharp edges caused by the DAC. The resulting signal is very close to the original.

All the losses in the digital system occur in the conversion process to and from a digital signal. Once the signal is digital, it can be duplicated or replayed any number of times and never lose any quality. This is the advantage of a digital system. The losses generated by the conversion process can be measured as a Signal to Noise Ratio (SNR), the same measure used for analog signals. The noise in the signal is considered to be the signal that would have to be subtracted from the reconstructed signal to obtain the original. SNR is used to compare the quality of different types of quantization, and is also used in the quality measurement of compression techniques.
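The sampling, quantization, and SNR steps described above can be sketched in a few lines of code. This is a toy illustration only (the 1 kHz test tone, the 8000 samples/s rate, and the bit depths are arbitrary assumptions, not taken from any system in the text): a sine wave is sampled, each sample is rounded to one of 2^n levels, and the SNR of the result is measured.

```python
import math

def quantize(samples, n_bits):
    """Round each sample (assumed to lie in [-1, 1]) to the nearest of 2**n_bits levels."""
    step = 2.0 / (2 ** n_bits)          # quantization step size
    return [round(s / step) * step for s in samples]

def snr_db(original, reconstructed):
    """Signal-to-noise ratio in dB: signal power over quantization-error power."""
    sig = sum(s * s for s in original)
    err = sum((s - r) ** 2 for s, r in zip(original, reconstructed))
    return 10 * math.log10(sig / err)

# Sample one second of a 1 kHz sine wave at 8000 samples/s (above the Nyquist rate).
rate = 8000
samples = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(rate)]

for bits in (4, 8, 16):
    print(bits, "bits:", round(snr_db(samples, quantize(samples, bits)), 1), "dB")
```

Each additional bit roughly halves the quantization step, raising the SNR by about 6 dB, which matches the tradeoff the text describes: smaller steps, smaller error, more storage.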
3 Compression Basics

The underlying idea behind data compression is that a data file can be rewritten in a different format that takes up less space. A data format is called compressed when it saves either more information in the same space, or the same information in less space, than a standard uncompressed format. A compression algorithm for an audio signal will analyze the signal and store it in a different way, hopefully saving space. An analogy can be made between compression and shorthand. In shorthand, words are represented by symbols, effectively shortening the amount of space occupied. Data compression uses the same concept.

3.1 Lossless vs. Lossy Compression

The field of compression is divided into two categories, lossless and lossy compression. In lossless compression, no data is lost in the compression process. An example of a lossless compression program is pkzip for the IBM PC, a widely available shareware utility. It can be used to compress and uncompress any type of computer file. When a file is uncompressed, the exact original is retrieved. The amount of compression that is achieved is highly dependent on the type of file, and varies greatly from file to file.

In lossy compression schemes, the goal is to encode an approximation of the original. By using a close approximation of the signal, the coding can usually be accomplished using much less space. Since an approximation is saved instead of the original, lossy compression schemes can only be used to compress information when the exact original is not needed. This is the case for audio and video data; with these types of data, any digital format used is already an approximation of the original signal. Computer data or program files must be compressed using lossless compression because all of the data is usually critical. In general, lossy compression schemes yield much higher compression ratios than lossless compression schemes.
In many cases, the difference in quality between the compressed version and the original is so minimal that it is not noticeable. Yet, in other compression
schemes there is a significant difference in quality. Deciding how much information is to be lost is up to the discretion of the designer of the algorithm or technique. It is a tradeoff between size and quality. If the shorthand writer, from the previous analogy, were to write down only the main ideas of the text, it would be analogous to lossy compression; using only the main ideas would be an extreme form of compression. If he or she were to leave out some adjectives and adverbs, it would again be a form of lossy compression, this one being less lossy than the first. From the analogy, it can be seen how the writer (programmer) can decide how important the details are and how many details to include.

Almost all compression techniques used in digital audio systems are lossy. This is because lossless compression algorithms are generally very unpredictable in the amount of compression they can achieve. In a typical application, there is a limited amount of "space" for the digital audio data that is generated. If the audio data cannot be compressed to a guaranteed size, it simply will not fit in the required space, which is unacceptable. The reason for the unpredictability of a lossless technique lies in the technique itself. Data which happens to be in a format which does not lend itself to the way the lossless technique "re-writes" the data will not be compressed.

In The Data Compression Book, Mark Nelson compressed raw speech files with a shareware lossless data compression program, ARJ, to demonstrate how well a typical lossless compression scheme will compress an audio signal. He states that the "ARJ results showed that voice files did in fact compress relatively well." The six sample raw sound files gave the following results:

    Filename        Original  Compressed  Ratio
    SAMPLE-1.RAW    50777     33036       35%
    SAMPLE-2.RAW    12033     8796        27%
    SAMPLE-3.RAW    73019     59527       19%
    SAMPLE-4.RAW    23702     9418        60%
    SAMPLE-5.RAW    27411     19037       30%
    SAMPLE-6.RAW    15913     12771       20%
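Nelson's point, that lossless compression ratios swing with the nature of the data, is easy to reproduce with any modern lossless coder. The sketch below uses Python's zlib (an illustrative stand-in, not the ARJ program from the text): a repetitive pure tone compresses dramatically, while random noise barely compresses at all.

```python
import zlib, math, random

def ratio(data: bytes) -> float:
    """Fraction of space saved by lossless (DEFLATE) compression."""
    return 1 - len(zlib.compress(data)) / len(data)

# A pure tone quantized to 8 bits: the waveform repeats exactly every
# 200 samples, so the coder finds plenty of structure to exploit.
tone = bytes(int(127 + 120 * math.sin(2 * math.pi * 440 * t / 8000))
             for t in range(8000))

# Uniform random noise: no structure for the coder to exploit.
random.seed(0)
noise = bytes(random.randrange(256) for _ in range(8000))

print(f"tone:  {ratio(tone):.0%} saved")
print(f"noise: {ratio(noise):.0%} saved")
```

Real audio sits somewhere between these extremes, which is exactly why a lossless coder cannot guarantee a fixed output size.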
His data shows that the compression ratios fluctuate greatly depending on the particular sample of speech that is used.

3.2 Audio Compression Techniques

For any type of compression, the compression ratio and the algorithm used are highly dependent on the type of data that is being compressed. The data source considered in this paper is audio data, and we have already determined that lossy compression will be used in most cases. Now we can further subdivide the source into music and voice data. The more information that is known about the source, the better the compression technique can be tailored toward that type of data. The differences between music and speech allow audio compression techniques to be subdivided into two categories: waveform coding and voice coding.

Waveform coding can be used on all types of audio data, including voice. The goal of waveform coding is to recreate the original waveform after decompression. The closer the decompressed waveform is to the original, the better the quality of the coding algorithm is. The second technique, voice coding, yields a much higher compression ratio, but can only be used if the audio source is a voice. In voice coding, the goal is to recreate the words that were spoken and not the actual voice. The algorithms "utilize a priori information about the human voice, in particular the mechanism that produces it" (Lynch, 255).

Since the two techniques are fundamentally different, the performance of each technique is measured differently. The performance of a waveform coding technique is measured by determining how well the decompressed signal matches the original speech waveform. This is usually done by measuring the SNR. With the voice coding technique this is not possible, since the technique doesn't try to mimic the waveform. Therefore, in voice coding algorithms, the quality of the algorithm is measured by listener preference.
These coding techniques can be further subdivided into two categories, time domain coding and frequency domain coding. In a time domain coding technique, information on each of the samples of the original signal is encoded. In a frequency domain coding technique,
the signal is transformed into its frequency representation. This frequency representation is then encoded into a compressed format. Later the information is decoded, and transformed back into the time representation of the signal to get back the original samples. Most simple compression algorithms use a time domain coding technique.

The more recent waveform coding techniques provide a much higher compression ratio by using psychoacoustics to aid in the compression. Psychoacoustics is "the study of how sounds are heard subjectively and of the individual's response to sound stimuli" (Webster's New World Dictionary, 1147). By basing the compression scheme on psychoacoustic phenomena, data that can't be heard by humans can be discarded. For example, in psychoacoustics it has been determined that certain levels of sounds cannot be heard while other louder sounds are present (Beerends, 965). This effect is called masking. By eliminating the unheard sounds from the audio signal, the signal is simplified, and can be more easily compressed. Techniques like these are used in modern systems where high compression ratios are necessary, like Sony's new MiniDisc player.

3.3 Common Audio Compression Techniques

The techniques that have been discussed thus far are general subcategories of the approaches that can be taken when designing an audio compression algorithm. In this section, the details of some popular compression techniques will be discussed. Since compression is such a large area, a comprehensive guide to all the different compression methods is far beyond the scope of this paper. However, this section covers some fundamental and some advanced techniques to provide a general idea of how different compression techniques are implemented. To give a general background, both waveform and voice coding techniques are discussed. Since the waveform coding techniques are simpler, they will be discussed first.
In these techniques, the compressed digital data is often obtained from the original signal itself, rather than by creating standard digital audio data and compressing it with software.
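To make the frequency domain approach concrete before looking at specific schemes, here is a toy transform coder. It is purely illustrative (the signal, block size, and coefficient count are assumptions, and real frequency domain coders quantize subband or transform coefficients rather than simply discarding them): a short signal is transformed, only its few strongest frequency coefficients are kept, and the result is transformed back.

```python
import cmath, math

def dft(x):
    """Naive discrete Fourier transform (fine for a small toy example)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse transform back to the time domain (real part of each sample)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

# A signal made of two tones; its energy sits in just four DFT coefficients.
N = 64
x = [math.sin(2 * math.pi * 3 * n / N) + 0.3 * math.sin(2 * math.pi * 7 * n / N)
     for n in range(N)]

X = dft(x)
keep = sorted(range(N), key=lambda k: -abs(X[k]))[:4]      # strongest 4 of 64
X_compressed = [X[k] if k in keep else 0 for k in range(N)]
y = idft(X_compressed)

error = max(abs(a - b) for a, b in zip(x, y))
print(f"max reconstruction error: {error:.2e}")
```

Because this particular signal concentrates its energy in a few frequencies, storing 4 coefficients instead of 64 samples reconstructs it almost perfectly; the frequency representation exposed redundancy the time representation hid.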
3.3.1 Waveform Coding Techniques

PCM

Pulse Code Modulation (PCM) refers to the technique used to code raw digital audio data as described in Section 2. It is the fundamental digital audio technique that is used most frequently in digital audio systems. Although PCM is not a compression technique, when it is used along with non-uniform quantization such as µ-law or A-law, it can be considered compression. PCM combined with non-uniform quantization is used as a reference for comparing the performance of other compression schemes (Lynch, 225).

µ-Law and A-Law Companding

Since the dynamic range of an audio signal is very wide, an audio waveform having a maximum possible amplitude of 1 volt may never reach over 0.1 volts if the audio signal is not very loud. If the signal is quantized with a linear scale, the values attained by the signal will cover only 1/10 of the quantization range. As a result, the softer audio signals have a very granular waveform after being quantized, and the quality of the sound deteriorates rapidly as the sound gets softer. To compensate for the wide dynamic range of audio signals, a non-linear scale can be used to quantize the signal. Using this method, the digitized signal will have an increased number of steps in the lower range, alleviating the problem (Couch, 152). Using non-uniform quantization can raise the SNR for a softer sound, making the SNR for a wide range of sound levels approximately uniform (Couch, 155).

Typically, non-uniform quantization is done on a logarithmic scale. The two standard formats for the logarithmic quantization of a signal are µ-law and A-law. A-law is the standard format used in Europe (Couch, 153), and µ-law is used in the telephone systems of the United States, Canada, and Japan. µ-law quantization, used in phone systems, uses eight bits of data to provide the dynamic range that normally requires twelve bits of PCM data (Audio FAQ).
The process of converting a computer file to µ-law is a form of compression, since the
amount of data that is needed per sample is reduced and the dynamic range of the sample is increased. The result is much less data with more information. To create µ-law or A-law data, the signal must originally be compressed and later expanded. This process is commonly referred to as companding.

Silence Compression

Silence compression is a form of lossy compression that is extremely easy to implement. In silence compression, periods of relative silence in an audio signal are replaced by actual silence. The samples of data that were used to represent the silent part are replaced by a code and a number telling the device which reconstructs the analog signal how much silence to insert. This reduces all of the data needed to represent the silent part of the signal down to a few bytes.

To implement this, the compression algorithm first determines if the audio data is silent by comparing the level of the digital audio data to a threshold. If the level is lower than the threshold, that part of the audio signal is considered silent, and the samples are replaced by zeros. The performance of the algorithm therefore hinges on the threshold level: the higher the level, the more compression there is, but the more lossy the technique is. The amount of compression achieved also depends on the total length of all the silent periods in an audio signal. The amount can be very significant in some types of audio data, like voice data. "Silence encoding is extremely important for human speech. If you examine a waveform of human speech, you will see long, relatively flat pauses between the spoken words." (Ratcliff, 32)

In The Data Compression Book, Mark Nelson wrote silence compression code in C, and used it to compress some PCM audio data files. The results he obtained were as follows:

    Filename        Original  Compressed  Ratio
    SAMPLE-1.RAW    50777     37769       26%
    SAMPLE-2.RAW    12033     11657       3%
    SAMPLE-3.RAW    73019     73072       0%
    SAMPLE-4.RAW    13852     10962       21%
    SAMPLE-5.RAW    27411     22865       17%
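The scheme behind Nelson's table can be sketched in a few lines. This is a simplified illustration, not his C code: the threshold, the escape code value, and the minimum run length are assumptions, and a real implementation would have to keep the escape code from colliding with sample values and cap run lengths at one byte.

```python
SILENCE_THRESHOLD = 4   # assumed: 8-bit samples within +/-4 of zero count as "silent"
RUN_MARKER = 0x80       # hypothetical escape code marking a silence run

def compress_silence(samples):
    """Replace runs of near-silent samples with (marker, run length) pairs."""
    out, i = [], 0
    while i < len(samples):
        j = i
        while j < len(samples) and abs(samples[j]) <= SILENCE_THRESHOLD:
            j += 1
        if j - i >= 4:                 # only encode runs long enough to pay off
            out += [RUN_MARKER, j - i]
            i = j
        else:
            out.append(samples[i])
            i += 1
    return out

def expand_silence(data):
    """Rebuild the sample stream, reinserting actual silence for each run."""
    out, i = [], 0
    while i < len(data):
        if data[i] == RUN_MARKER:
            out += [0] * data[i + 1]
            i += 2
        else:
            out.append(data[i])
            i += 1
    return out

signal = [0, 1, -2, 0, 0, 1, 90, 85, -60, 2, 0, -1, 0, 1, 3]
packed = compress_silence(signal)
print(len(signal), "samples ->", len(packed), "values")
```

Note that the round trip is lossy exactly as the text describes: the small nonzero samples inside each quiet stretch come back as true zeros.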
[Figure 5: An Example of Signals in a DM Waveform: a) the original and reconstructed waveforms, and b) the DM waveform]

The table indicates that silence compression can be very effective in some instances, but in others it may have no effect at all, or even increase the file size slightly. Silence compression is used mainly in file formats found in computers.

DM

Delta Modulation (DM) is one of the most primitive forms of audio encoding. In DM, a stream of 1-bit values is used to represent the analog signal. Each bit contains information on whether the DM signal is greater or less than the actual audio signal. With this information, the original signal can then be reconstructed. Figure 5 shows an example DM signal, the original signal it was generated from, and the reconstructed signal before filtering. The actual DM signal, Figure 5b, contains information on whether the output should rise or fall. The size of the steps and the rate of the steps are fixed. The reconstruction algorithm simply raises or lowers the output value according to the DM waveform.

DM suffers from two major losses: granular noise and slope overload. Granular noise occurs when the input signal is flat. The DM signal simulates flat regions by alternately rising and falling, leading to granular noise. Slope overload is caused when the input signal rises faster
than the DM signal can follow it. Granular noise can be reduced by making the step size small enough, and slope overload can be prevented by increasing the data rate. However, decreasing the step size and increasing the data rate also increase the amount of data needed to store the signal. DM is rarely used, but was explained here to provide a basis for understanding ADM, which offers a significant advantage over PCM.

ADM

Adaptive Delta Modulation (ADM) is the solution to the problems with DM. In ADM, the step size is continuously adjusted, making the step size larger in the fast-changing parts of the signal and smaller in the slower-changing parts. Using this technique, both the granular noise and the slope overload problems are solved. In order to adjust the step size, an estimation must be made to determine if the signal is changing rapidly. The estimation in ADM is usually based on the last few samples. If the signal increased for two consecutive samples, the step size is increased. If the two previous steps were opposite in direction, then the step size is decreased. This estimation method is simple yet effective. The performance of ADM using the above technique turns out to be better than Log PCM when little data is used to represent a signal [1]. When more data is used, however, Log PCM performs better (Lynch, 229).

DPCM

A Differential Pulse Code Modulation (DPCM) system consists of a predictor, a difference calculator, and a quantizer. The predictor predicts the value of the next sample. The difference calculator then determines the difference between the predicted value and the actual value. Finally, this difference value is quantized by the quantizer. The quantized differences are used to represent the original signal.

[1] Performance is measured with SNR.
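The plain DM scheme described above is simple enough to sketch directly. This is a toy version with an assumed fixed step size; choose the step too small and a fast signal causes slope overload, too large and a flat signal shows granular noise, exactly the tradeoff discussed under DM.

```python
import math

STEP = 0.05   # assumed fixed step size for signals in [-1, 1]

def dm_encode(samples):
    """1-bit delta modulation: each bit says whether to step up or down."""
    bits, level = [], 0.0
    for s in samples:
        up = s > level                 # is the signal above our running estimate?
        bits.append(1 if up else 0)
        level += STEP if up else -STEP
    return bits

def dm_decode(bits):
    """Rebuild the staircase waveform by raising or lowering the output."""
    out, level = [], 0.0
    for b in bits:
        level += STEP if b else -STEP
        out.append(level)
    return out

# A slow sine wave whose per-sample slope stays below STEP, so no slope overload.
samples = [0.5 * math.sin(2 * math.pi * t / 100) for t in range(200)]
decoded = dm_decode(dm_encode(samples))
err = max(abs(a - b) for a, b in zip(samples, decoded))
print(f"max error: {err:.3f}")
```

One bit per sample is all that is stored; the residual error here is the granular noise, bounded by roughly two step sizes.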
Essentially, a DM signal is a DPCM signal with one bit being used in the quantization process and a predictor based on the previous bit. In a DM system, the predicted value is always the same as the previous value, and the difference between the predicted value (previous value) and the actual signal is quantized using one bit (two levels).

The performance of a DPCM system depends on the predictor: the better it can predict where the signal is headed, the better it will perform. A DPCM system using one previous value in the predictor can achieve the same SNR as a µ-law PCM system while using one less bit to quantize each sample value. If three previous values are used for the predictor, the same SNR can be achieved using two bits less to represent each sample (Lynch, 227). This is a significant performance increase over PCM because it obtains the same SNR using less data. This technique can be extended even further by making the prediction method adaptive to the input data. The resulting technique is called Adaptive Differential Pulse Code Modulation (ADPCM).

ADPCM

ADPCM is a modification of the DPCM technique that makes the algorithm adapt to the characteristics of the signal. The relationship between DM and ADM is the same as that between DPCM and ADPCM: in both, the algorithm is made adaptive to the changes in the audio signal. The adaptive part of the system can be built into the predictor, the quantizer, or both, but has been shown to be most effective in the quantizer (Lynch, 227). Using this adaptive algorithm, the compression performance can be increased beyond that of DPCM. "Cohen (1973) shows that by using the two most significant bits in the previous three samples, a gain in SNR of 7 dB over non-adaptive DPCM can be obtained" (Lynch, 227). Different forms of ADPCM are used in many applications, including inexpensive digital recorders.
Also, ADPCM is used in public compression standards which are slowly gaining popularity, like CCITT G.721 and G.723, which use ADPCM at 32 kbits/s and at 24 or 40 kbits/s respectively (Audio FAQ).
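A minimal, non-adaptive DPCM sketch follows, using the simplest possible predictor (the previous reconstructed sample). The 4-bit step size is an assumption for illustration, and clamping the difference to the 4-bit range is omitted for brevity; note how the encoder predicts from its own reconstructed values so that encoder and decoder never drift apart.

```python
import math

QBITS = 4                      # assumed bits used to quantize each difference
QSTEP = 2.0 / (2 ** QBITS)     # assumed step size for signals in [-1, 1]

def quantize_diff(d):
    """Quantize a prediction error to the nearest representable level."""
    return round(d / QSTEP) * QSTEP

def dpcm_encode(samples):
    """Predictor = previous reconstructed sample; store quantized differences."""
    out, predicted = [], 0.0
    for s in samples:
        q = quantize_diff(s - predicted)
        out.append(q)
        predicted += q             # track what the decoder will reconstruct
    return out

def dpcm_decode(diffs):
    """Accumulate the quantized differences to rebuild the waveform."""
    out, level = [], 0.0
    for q in diffs:
        level += q
        out.append(level)
    return out

samples = [math.sin(2 * math.pi * t / 50) for t in range(200)]
decoded = dpcm_decode(dpcm_encode(samples))
err = max(abs(a - b) for a, b in zip(samples, decoded))
print(f"max error: {err:.4f}")
```

Because neighboring audio samples are strongly correlated, the differences are small and can be coded with fewer bits than the samples themselves; ADPCM goes one step further by adapting the quantizer step to the signal.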
PASC and ATRAC

All of the previously mentioned compression techniques are relatively simple re-writings of the audio data. Precision Adaptive Subband Coding (PASC) and Adaptive TRansform Acoustic Coding (ATRAC) differ from these, because they are much more complex proprietary schemes which were developed for a specific purpose. PASC and ATRAC were both developed for use in the Hi-Fi audio market. PASC was developed by Philips for use with the Digital Compact Cassette (DCC), and ATRAC was developed by Sony for use with their MiniDisc player. Both of these techniques use psychoacoustic phenomena as a basis for the compression algorithm in order to achieve the extreme compression ratios required for their applications. The details of the algorithms are complicated, and will not be discussed here. More information is given in the discussion of compression used in Hi-Fi audio equipment in Section 4.2. In addition, details on PASC can be found in Advanced Digital Audio, edited by Ken Pohlmann, and details on ATRAC can be found in the Proceedings of the IEEE in an article titled "The Rewritable MiniDisc System" by Tadao Yoshida.

3.3.2 Voice Coding Techniques

LPC

Linear Predictive Coding (LPC) is one of the most popular voice coding techniques. In an LPC system, the voice signal is represented by storing characteristics about the system creating the voice. When the data is played back, the voice is synthesized from the stored data by the playing device. The model used in an LPC system includes the source of the sound, a variable filter resembling the human vocal tract, and a variable amplifier setting the amplitude of the sound.

The source of the sound is modeled in two different ways depending on how the voice is being produced. This is done because humans produce two types of sound, voiced and unvoiced. Voiced sounds are those which are created by using the vocal cords, and unvoiced
sounds are created by pushing air through the vocal tract. An LPC algorithm models these sounds by using either periodic driven pulses (voiced) or a random noise generator (unvoiced) as the source. The human vocal tract is modeled in the system as a time-varying filter (Lynch, 240). Parameters are calculated for the filter to mimic the changing characteristics of the vocal tract at the time the sound was being produced.

The data used to represent the voice in an LPC algorithm consists of information on the filter parameters, the source used (voiced or unvoiced), the pitch of the voice, and the volume of the voice. The amount of data generated by storing these parameters is significantly less than the amount of data used to represent the waveform of the speech signal.

GSM

The Global System for Mobile telecommunications (GSM) standard is used for compression of speech in the European digital cellular telephone system. GSM is an advanced compression technique that can achieve a compression ratio of 8:1. To obtain this high compression ratio and still produce high quality sound, GSM is based on the LPC voice coding technique and also incorporates a form of waveform coding (Degener, 30).

4 Uses of Compression

Compression is used in almost all modern digital audio applications, including computer files, audio playing devices, telephony applications, and digital recording devices. Many of these, like the telephone system, have been using compression for many years now. Others have just recently started using it. The type of compression that is used depends on cost, size, space, and many other factors.

After reviewing a basic background on compression, one question remains unanswered: what type of compression is used for a particular application? In the following sections, the
uses of compression in two major areas will be discussed: computer files and digital hi-fi stereo equipment. Knowledge about these areas is particularly useful, because it can help in deciding which device to use.

4.1 Compression in File Formats

When digital audio technology was first appearing on the market, each computer manufacturer had their own file format, or formats, associated with their computer (Audio FAQ). As software became more advanced, computers attained the ability to read more than one file format. Today, most software can read and write a wide range of file formats, leaving the choice to the user.

In general, there are two types of file formats, "raw" and self-describing. In a raw file format the data can be in any format; the encoding and parameters are fixed and must be known in advance in order to read the file. A self-describing format has a header in which information about the data type, such as the sampling rate and compression scheme, is stored. The main concern here will be with self-describing file formats, since these are the most often used and most versatile.

A disadvantage of using compression in computer files is that the file usually needs to be converted to linear PCM data for playback on digital audio devices. This requires extra code and processing time. It may also be one of the reasons why approximately half of the file formats available for computers don't support compression. The following is a chart taken from the "Audio Tutorial FAQ" of The Center for Innovative Computer Applications. It describes most of the popular file formats on the market, and the compression that is used, if any:
    Extension, Name   Origin        Variable Parameters
    .au or .snd       NeXT, Sun     rate, #channels, encoding, info string
    .aif(f), AIFF     Apple, SGI    rate, #channels, sample width, lots of info
    .aif(f), AIFC     Apple, SGI    same (extension of AIFF with compression)
    .iff, IFF/8SVX    Amiga         rate, #channels, instrument info (8 bits)
    .voc              Soundblaster  rate (8 bits/1 ch; can use silence deletion)
    .wav, WAVE        Microsoft     rate, #channels, sample width, lots of info
                                    [including compression scheme]
    .sf               IRCAM         rate, #channels, encoding, info
    none, HCOM        Mac           rate (8 bits/1 ch; uses Huffman compression)
    none, MIME        Internet      usually 8-bit µ-law compression [8000 samp/s]
    .mod or .nst      Amiga         bank of digitized instrument samples
                                    [with sequencing information]

Many of these file formats are just uncompressed PCM data with the sampling rate and the number of channels used during recording specified in the header. For the formats that do support compression, it is usually optional. For example, in the Soundblaster ".voc" format, silence compression can be used, and in the Microsoft ".wav" format, a number of different encoding schemes can be used, including PCM, DM, DPCM, and ADPCM.

Conversion from one format to another can be accomplished via software. The "Audio FAQ" also provides information on a number of different programs that will do the conversion. When converting from an uncompressed to a compressed format, the file is generally smaller afterwards, but some quality is lost. If the file is later converted back, the size will increase, but the quality can never be regained.

4.2 Compression in Recording Devices

There are currently four major digital stereo devices on the market: the Compact Disc (CD), the Digital Audio Tape (DAT), the Digital Compact Cassette (DCC), and the MiniDisc (MD). They are all very different from each other. The CD and MD use an optical storage mechanism, while the DAT and DCC use a magnetic tape to store the data. There are also a number of other apparent differences between the mediums.
For example, a CD is not
re-writable while the others are. A major difference that may not be apparent, however, is that the MD and DCC utilize digital data compression while the DAT and CD do not. This allows the MD and DCC to be physically smaller than their uncompressed counterparts. In both devices, the smaller data size is necessary and advantageous.

In the MD, the design goal was to make the optical disc small so that it would be portable. The MD contains the same density of data as the CD; only by using compression can the disc be made physically smaller than the CD. In addition to reducing the size, the compression used gave the MD other advantages. It allowed the MD to be the first optical player with the digital anti-shock mechanism described in the introduction. Since less data is required to generate sound, and the MD reads at the same speed as the CD, the MD can read more data than it needs to generate sound. The extra data is stored in a buffer, which does not need to be very big. CDs eventually came out with the same technology, but in order to implement it, the reading speed of the CD needed to be increased, and the data needed to be compressed after reading to fit it into a memory buffer.

The design goal of the DCC was to make the storage medium inexpensive and the same size as an audio cassette. By doing this, a DCC player could accept standard audio cassettes as well as the new DCC tapes, making it more marketable. To be able to fit the data onto a relatively inexpensive tape medium which can be housed in an audio cassette case, digital compression was required.

In both the MD and DCC, the space available for digital audio data was approximately 1/4 of the size required for PCM data. The compression ratio needed was therefore approximately 4:1. To obtain such high compression ratios, the compression schemes utilize psychoacoustic phenomena.
Precision Adaptive Subband Coding (PASC) is the compression algorithm that is used for the DCC to provide a 4:1 compression of the digital PCM data. PASC is described in the book Advanced Digital Audio, edited by Ken Pohlmann:
The PASC system is based on three principles. First, the ear only hears sounds above the threshold of hearing. Second, louder sounds mask softer sounds of similar frequency, thus dynamically changing the threshold of hearing. Similarly, other masking properties such as high- and low-frequency masking may be utilized. Third, sufficient data must be allocated for precise encoding of sounds above the dynamic threshold of hearing.

Using PASC, enough digital data can fit onto a medium the size of a cassette to make the DCC player feasible.

The MD uses the ATRAC compression algorithm, which is based on the same psychoacoustical phenomena. Compression in a MiniDisc is more advanced, however: "The MiniDisc achieves a compression ratio of 5:1 in order to offer 74 min of playback time" (Yoshida, 1498). Although these algorithms offer such high compression, there are some losses involved. Experts claim that they can hear a difference between a CD and an MD, but the actual losses are so minimal that the average person will not hear them. The largest errors occur with certain types of audio sounds that the compression algorithm has problems with. In an article in Audio Magazine, Edward Foster writes:

Although the test was not double-blind, and thus is suspect, I convinced myself I could reliably tell the original from the copy, just barely, but different nonetheless. The differences occurred in three areas: a slight suppression of low-level high-frequency content when the algorithm needed most of the available bitstream to handle strong bass and midrange content, a slight dulling of the attack of percussion instruments (piano, harpsichord, glockenspiel, etc.) probably caused by imperfect masking of "pre-echo," and a slight "post-echo" (noise puff) at the cessation of a sharp sound (such as claves struck in an acoustically dead environment).
The second and third of these anomalies were most readily discernible on single instruments played one note at a time in a quiet environment and were taken from a recording specifically made to evaluate perceptual encoders.

Similar effects exist when listening to a DCC recording. Although the losses are minimal, they are still present, being the tradeoff of having the small compact portable format.
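The masking principle behind PASC and ATRAC — a loud sound raising the threshold of hearing for nearby frequencies — can be illustrated with a toy model. This is not the actual PASC or ATRAC psychoacoustic model; the 10 dB-per-Bark masking slope and the flat 0 dB threshold in quiet are illustrative assumptions chosen only to make the mechanism visible:

```python
def audible(components, slope_db_per_bark=10.0, quiet_threshold_db=0.0):
    """Toy simultaneous-masking check (illustrative, not the PASC/ATRAC model).

    components: list of (bark_band, level_db) pairs. A component is kept
    only if it exceeds both the threshold in quiet and the masking curve
    of every other component, modeled as a triangle falling off linearly
    with distance in Bark bands. Masked components need no encoding bits.
    """
    kept = []
    for band, level in components:
        masked_level = max(
            (lvl - slope_db_per_bark * abs(band - b)
             for b, lvl in components if (b, lvl) != (band, level)),
            default=float("-inf"))
        if level > max(quiet_threshold_db, masked_level):
            kept.append((band, level))
    return kept

# A 60 dB tone in band 10 raises the threshold two Bark away to
# 60 - 2*10 = 40 dB: a 20 dB neighbor is masked, a 45 dB one is not.
print(audible([(10, 60), (12, 20)]))  # [(10, 60)]
print(audible([(10, 60), (12, 45)]))  # [(10, 60), (12, 45)]
```

An encoder built on this idea spends its bits only on the components that survive the check, which is how a 4:1 or 5:1 reduction can remain nearly inaudible.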
5 Conclusion

In the last decade, the field of digital audio compression has grown tremendously. With the expansion of the electronics industry and the decreasing prices of digital audio, many devices which once used analog audio technology now use digital technology. Many of these digital devices use compression to reduce storage space and bring down cost. Digital audio compression has become a sub-area of Audio Engineering, supporting many professionals who specialize in this field. Millions of dollars are invested by companies, such as Sony and Philips, to develop proprietary compression schemes for their digital audio applications (Audio FAQ). Because of the widespread use of compression, knowledge in this area can be useful. As a musician working with modern digital recording and editing equipment, the study of compression can provide an advantage. Knowledge in the field of compression can help in the evaluation and understanding of recording and playback equipment. It can also aid when manipulating digital files with computers. As we move into the next century, and digital audio technology continues to grow, the knowledge of audio compression will become an increasingly valuable asset.
Bibliography

"Audio tutorial FAQ." FTP://pub/usenet/news.answers/audio-fmts/[part 12], Center for Innovative Computer Applications, August 1994.
J. G. Beerends and J. A. Stemerdink, "A perceptual audio quality measure based on a psychoacoustic sound representation," AES: Journal of the Audio Engineering Society, vol. 40, p. 963, December 1992.
L. W. Couch, Digital and Analog Communication Systems. New York, NY: Macmillan Publishing Company, fourth ed., 1993.
J. Degener, "Digital speech compression," Dr. Dobb's Journal, vol. 19, p. 30, December 1994.
M. Fleischmann, "Digital recording arrives," Popular Science, vol. 242, p. 84, April 1993.
E. J. Foster, "Sony MDS-501 minidisc deck," Audio, vol. 78, p. 56, November 1994.
D. B. Guralnik, ed., Webster's New World Dictionary. New York, NY: Prentice Hall Press, second college ed., 1986.
P. Lutter, M. Muller-Wernhart, J. Ramharter, F. Rattay, and P. Slowik, "Speech research with WAVE-GL," Dr. Dobb's Journal, vol. 21, p. 50, November 1996.
T. J. Lynch, Data Compression: Techniques and Applications. New York, NY: Van Nostrand Reinhold, 1985.
M. Nelson, The Data Compression Book. San Mateo, CA: M&T Books, 1992.
Panasonic Portable CD Player SL-S600C Operating Instructions.
K. C. Pohlmann, ed., Advanced Digital Audio. Carmel, IN: SAMS, first ed., 1993.
J. W. Ratcliff, "Audio compression," Dr. Dobb's Journal, vol. 17, p. 32, July 1992.
J. W. Ratcliff, "Examining PC audio," Dr. Dobb's Journal, vol. 18, p. 78, March 1993.
J. Rothstein, MIDI: A Comprehensive Introduction. Madison, WI: A-R Editions, Inc., 1992.
A. Vollmer, "Minidisc, digital compact cassette vie for digital recording market," Electronics, vol. 66, p. 11, September 13, 1993.
J. Watkinson, An Introduction to Digital Audio. Jordan Hill, Oxford (GB): Focal Press, 1994.
T. Yoshida, "The rewritable minidisc system," Proceedings of the IEEE, vol. 82, p. 1492, October 1994.