Abstract Algebra Applications with Maple,
Richard E. Klima, Ernest Stitzinger, and Neil P. Sigmon
Algebraic Number Theory, Richard A. Mollin
An Atlas of The Smaller Maps in Orientable and Nonorientable Surfaces,
David M. Jackson and Terry I. Visentin
An Introduction to Cryptography, Richard A. Mollin
Combinatorial Algorithms: Generation, Enumeration, and Search,
Donald L. Kreher and Douglas R. Stinson
The CRC Handbook of Combinatorial Designs,
Charles J. Colbourn and Jeffrey H. Dinitz
Cryptography: Theory and Practice, Second Edition, Douglas R. Stinson
Design Theory, Charles C. Lindner and Christopher A. Rodgers
Frames and Resolvable Designs: Uses, Constructions, and Existence,
Steven Furino, Ying Miao, and Jianxing Yin
Fundamental Number Theory with Applications, Richard A. Mollin
Graph Theory and Its Applications, Jonathan Gross and Jay Yellen
Handbook of Applied Cryptography,
Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone
Handbook of Constrained Optimization,
Herbert B. Shulman and Venkat Venkateswaran
Handbook of Discrete and Combinatorial Mathematics, Kenneth H. Rosen
Handbook of Discrete and Computational Geometry,
Jacob E. Goodman and Joseph O’Rourke
Introduction to Information Theory and Data Compression,
Darrel R. Hankerson, Greg A. Harris, and Peter D. Johnson
DISCRETE MATHEMATICS AND ITS APPLICATIONS
Series Editor
Kenneth H. Rosen, Ph.D.
AT&T Laboratories
Middletown, New Jersey

Continued Titles
Network Reliability: Experiments with a Symbolic Algebra Environment,
Daryl D. Harms, Miroslav Kraetzl, Charles J. Colbourn, and John S. Devitt
RSA and Public-Key Cryptography, Richard A. Mollin
Quadratics, Richard A. Mollin
Verification of Computer Codes in Computational Science and Engineering,
Patrick Knupp and Kambiz Salari
CHAPMAN & HALL/CRC
A CRC Press Company
Boca Raton  London  New York  Washington, D.C.
Roberto Togneri
Christopher J.S. deSilva
FUNDAMENTALS of
INFORMATION THEORY
and CODING DESIGN
DISCRETE MATHEMATICS AND ITS APPLICATIONS
Series Editor KENNETH H. ROSEN
This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying.

Direct all inquiries to CRC Press LLC, N.W. Corporate Blvd., Boca Raton, Florida.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at www.crcpress.com

© by Chapman & Hall/CRC
No claim to original U.S. Government works
International Standard Book Number
Library of Congress Card Number
Printed in the United States of America
Printed on acid-free paper

Library of Congress Cataloging-in-Publication Data
Togneri, Roberto
Fundamentals of information theory and coding design / Roberto Togneri and Christopher J.S. deSilva
p. cm. — (Discrete mathematics and its applications)
Includes bibliographical references and index
ISBN (alk. paper)
1. Information theory. 2. Coding theory. I. DeSilva, Christopher J. S. II. Title. III. CRC Press series on discrete mathematics and its applications
This edition published in the Taylor & Francis e-Library, 2006.
“To purchase your own copy of this or any of Taylor & Francis or Routledge’s
collection of thousands of eBooks please go to www.eBookstore.tandf.co.uk.”
Preface
What is information? How do we quantify or measure the amount of information
that is present in a file of data, or a string of text? How do we encode the information
so that it can be stored efficiently, or transmitted reliably?
The main concepts and principles of information theory were developed by Claude E.
Shannon in the 1940s. Yet only now, and thanks to the emergence of the information
age and digital communication, are the ideas of information theory being looked at
again in a new light. Because of information theory and the results arising from
coding theory we now know how to quantify information, how we can efficiently
encode it and how reliably we can transmit it.
This book introduces the main concepts behind how we model information sources
and channels, how we code sources for efficient storage and transmission, and the
fundamentals of coding theory and applications to state-of-the-art error correcting
and error detecting codes.
This textbook has been written for upper level undergraduate students and graduate
students in mathematics, engineering and computer science. Most of the material
presented in this text was developed over many years at The University of West-
ern Australia in the unit Information Theory and Coding 314, which was a core unit
for students majoring in Communications and Electrical and Electronic Engineering,
and was a unit offered to students enrolled in the Master of Engineering by Course-
work and Dissertation in the Intelligent Information Processing Systems course.
The number of books on the market dealing with information theory and coding has
been on the rise over the past five years. However, very few, if any, of these books
cover the fundamentals of the theory without losing the reader in complex mathematical
abstractions, and even fewer provide the important theoretical framework needed when
discussing the algorithms and implementation details of modern coding systems. This
book does not abandon the theoretical foundations of information and coding theory,
and it also presents working algorithms and implementations that can be used to design
and build real systems. The main emphasis is on the underlying concepts that govern
information theory and the necessary mathematical background that describes modern
coding systems. One of the strengths of the book is the many worked examples that
appear throughout it, which allow the reader to immediately understand the concept
being explained or the algorithm being described. These are backed up by fairly
comprehensive exercise sets at the end of each chapter (including exercises, identified
by an *, that are more advanced or challenging).
The material in the book has been selected for completeness and to present a balanced
coverage. There is discussion of cascading of information channels and additivity
of information, which is rarely found in modern texts. Arithmetic coding is fully
explained, with worked examples for both encoding and decoding. The connection
between coding of extensions and Markov modelling is clearly established (this is
usually not apparent in other textbooks). Three complete chapters are devoted to
block codes for error detection and correction. A large part of these chapters deals
with an exposition of the concepts from abstract algebra that underpin the design of
these codes. We decided that this material should form part of the main text (rather
than be relegated to an appendix) to emphasise the importance of understanding the
mathematics of these and other advanced coding strategies.
Chapter 1 introduces the concepts of entropy and information sources and explains
how information sources are modelled. In Chapter 2 this analysis is extended to
information channels where the concept of mutual information is introduced and
channel capacity is discussed. Chapter 3 covers source coding for efficient storage
and transmission with an introduction to the theory and main concepts, a discussion
of Shannon’s Noiseless Coding Theorem and details of the Huffman and arithmetic
coding algorithms. Chapter 4 provides the basic principles behind the various com-
pression algorithms including run-length coding and dictionary coders. Chapter 5
introduces the fundamental principles of channel coding, the importance of the Ham-
ming distance in the analysis and design of codes and a statement of what Shannon’s
Fundamental Coding Theorem tells us we can do with channel codes. Chapter 6
introduces the algebraic concepts of groups, rings, fields and linear spaces over the
binary field and introduces binary block codes. Chapter 7 provides the details of the
theory of rings of polynomials and cyclic codes and describes how to analyse and
design various linear cyclic codes including Hamming codes, Cyclic Redundancy
Codes and Reed-Muller codes. Chapter 8 deals with burst-correcting codes and de-
scribes the design of Fire codes, BCH codes and Reed-Solomon codes. Chapter 9
completes the discussion on channel coding by describing the convolutional encoder,
decoding of convolutional codes, trellis modulation and Turbo codes.
This book can be used as a textbook for a one semester undergraduate course in in-
formation theory and source coding (all of Chapters 1 to 4), a one semester graduate
course in coding theory (all of Chapters 5 to 9) or as part of a one semester under-
graduate course in communications systems covering information theory and coding
(selected material from Chapters 1, 2, 3, 5, 6 and 7).
We would like to thank Sean Davey and Nishith Arora for their help with the LaTeX
formatting of the manuscript. We would also like to thank Ken Rosen for his review
of our draft manuscript and his many helpful suggestions and Sunil Nair from CRC
Press for encouraging us to write this book in the first place!
Our examples on arithmetic coding were greatly facilitated by the use of the conver-
sion calculator (which is one of the few that can handle fractions!) made available
by www.math.com.
The manuscript was written in LaTeX and we are indebted to the open source software
community for developing such a powerful text processing environment. We are
especially grateful to the developers of LyX (www.lyx.org) for making writing the
document that much more enjoyable and to the makers of xfig (www.xfig.org) for
providing such an easy-to-use drawing package.
Roberto Togneri
Chris deSilva
Contents
1 Entropy and Information
1.1 Structure
1.2 Structure in Randomness
1.3 First Concepts of Probability Theory
1.4 Surprise and Entropy
1.5 Units of Entropy
1.6 The Minimum and Maximum Values of Entropy
1.7 A Useful Inequality
1.8 Joint Probability Distribution Functions
1.9 Conditional Probability and Bayes' Theorem
1.10 Conditional Probability Distributions and Conditional Entropy
1.11 Information Sources
1.12 Memoryless Information Sources
1.13 Markov Sources and n-gram Models
1.14 Stationary Distributions
1.15 The Entropy of Markov Sources
1.16 Sequences of Symbols
1.17 The Adjoint Source of a Markov Source
1.18 Extensions of Sources
1.19 Infinite Sample Spaces
1.20 Exercises
1.21 References

2 Information Channels
2.1 What Are Information Channels?
2.2 BSC and BEC Channels
2.3 Mutual Information
2.3.1 Importance of Mutual Information
2.3.2 Properties of the Mutual Information
2.4 Noiseless and Deterministic Channels
2.4.1 Noiseless Channels
2.4.2 Deterministic Channels
2.5 Cascaded Channels
2.6 Additivity of Mutual Information
2.7 Channel Capacity: Maximum Mutual Information
2.7.1 Channel Capacity of a BSC
2.7.2 Channel Capacity of a BEC
2.7.3 Channel Capacity of Weakly Symmetric Channels
2.8 Continuous Channels and Gaussian Channels
2.9 Information Capacity Theorem
2.10 Rate Distortion Theory
2.10.1 Properties of R(D)
2.11 Exercises
2.12 References

3 Source Coding
3.1 Introduction
3.2 Instantaneous Codes
3.2.1 Construction of Instantaneous Codes
3.2.2 Decoding Instantaneous Codes
3.2.3 Properties of Instantaneous Codes
3.2.4 Sensitivity to Bit Errors
3.3 The Kraft Inequality and McMillan's Theorem
3.3.1 The Kraft Inequality
3.3.2 McMillan's Theorem
3.4 Average Length and Compact Codes
3.4.1 Average Length
3.4.2 Lower Bound on Average Length
3.5 Shannon's Noiseless Coding Theorem
3.5.1 Shannon's Theorem for Zero-Memory Sources
3.5.2 Shannon's Theorem for Markov Sources
3.5.3 Code Efficiency and Channel Capacity
3.6 Fano Coding
3.7 Huffman Coding
3.7.1 Huffman Codes
3.7.2 Binary Huffman Coding Algorithm
3.7.3 Software Implementation of Binary Huffman Coding
3.7.4 r-ary Huffman Codes
3.8 Arithmetic Coding
3.8.1 Encoding and Decoding Algorithms
3.8.2 Encoding and Decoding with Scaling
3.8.3 Is Arithmetic Coding Better Than Huffman Coding?
3.9 Higher-order Modelling
3.9.1 Higher-order Huffman Coding
3.9.2 Higher-order Arithmetic Coding
3.10 Exercises
3.11 References

4 Data Compression
4.1 Introduction
4.2 Basic Concepts of Data Compression
4.3 Run-length Coding
4.4 The CCITT Standard for Facsimile Transmission
4.5 Block-sorting Compression
4.6 Dictionary Coding
4.7 Statistical Compression
4.8 Prediction by Partial Matching
4.9 Image Coding
4.10 Exercises
4.11 References

5 Fundamentals of Channel Coding
5.1 Introduction
5.2 Code Rate
5.3 Decoding Rules
5.4 Hamming Distance
5.4.1 Hamming Distance Decoding Rule for BSCs
5.4.2 Error Detection/Correction Using the Hamming Distance
5.5 Bounds on M, Maximal Codes and Perfect Codes
5.5.1 Upper Bounds on M and the Hamming Bound
5.5.2 Maximal Codes and the Gilbert Bound
5.5.3 Redundancy Requirements for t-bit Error Correction
5.5.4 Perfect Codes for t-bit Error Correction
5.6 Error Probabilities
5.6.1 Bit and Block Error Probabilities and Code Rate
5.6.2 Probability of Undetected Block Error
5.7 Shannon's Fundamental Coding Theorem
5.8 Exercises
5.9 References

6 Error-Correcting Codes
6.1 Introduction
6.2 Groups
6.3 Rings and Fields
6.4 Linear Spaces
6.5 Linear Spaces over the Binary Field
6.6 Linear Codes
6.7 Encoding and Decoding
6.8 Codes Derived from Hadamard Matrices
6.9 Exercises
6.10 References

7 Cyclic Codes
7.1 Introduction
7.2 Rings of Polynomials
7.3 Cyclic Codes
7.4 Encoding and Decoding of Cyclic Codes
7.5 Encoding and Decoding Circuits for Cyclic Codes
7.6 The Golay Code
7.7 Hamming Codes
7.8 Cyclic Redundancy Check Codes
7.9 Reed-Muller Codes
7.10 Exercises
7.11 References

8 Burst-Correcting Codes
8.1 Introduction
8.2 Finite Fields
8.3 Irreducible Polynomials
8.4 Construction of Finite Fields
8.5 Bursts of Errors
8.6 Fire Codes
8.7 Minimum Polynomials
8.8 Bose-Chaudhuri-Hocquenghem Codes
8.9 Other Fields
8.10 Reed-Solomon Codes
8.11 Exercises
8.12 References

9 Convolutional Codes
9.1 Introduction
9.2 A Simple Example
9.3 Binary Convolutional Codes
9.4 Decoding Convolutional Codes
9.5 The Viterbi Algorithm
9.6 Sequential Decoding
9.7 Trellis Modulation
9.8 Turbo Codes
9.9 Exercises
9.10 References

Index
Chapter 1
Entropy and Information
1.1 Structure
Structure is a concept of which we all have an intuitive understanding. However,
it is not easy to articulate that understanding and give a precise definition of what
structure is. We might try to explain structure in terms of such things as regularity,
predictability, symmetry and permanence. We might also try to describe what struc-
ture is not, using terms such as featureless, random, chaotic, transient and aleatory.
Part of the problem of trying to define structure is that there are many different kinds
of behaviour and phenomena which might be described as structured, and finding a
definition that covers all of them is very difficult.
Consider the distribution of the stars in the night sky. Overall, it would appear that
this distribution is random, without any structure. Yet people have found patterns in
the stars and imposed a structure on the distribution by naming constellations.
Again, consider what would happen if you took the pixels on the screen of your
computer when it was showing a complicated and colourful scene and strung them
out in a single row. The distribution of colours in this single row of pixels would
appear to be quite arbitrary, yet the complicated pattern of the two-dimensional array
of pixels would still be there.
These two examples illustrate the point that we must distinguish between the pres-
ence of structure and our perception of structure. In the case of the constellations,
the structure is imposed by our brains. In the case of the picture on our computer
screen, we can only see the pattern if the pixels are arranged in a certain way.
Structure relates to the way in which things are put together, the way in which the
parts make up the whole. Yet there is a difference between the structure of, say, a
bridge and that of a piece of music. The parts of the Golden Gate Bridge or the
Sydney Harbour Bridge are solid and fixed in relation to one another. Seeing one
part of the bridge gives you a good idea of what the rest of it looks like.
The structure of pieces of music is quite different. The notes of a melody can be
arranged according to the whim or the genius of the composer. Having heard part
of the melody you cannot be sure of what the next note is going to be, let alone
any other part of the melody. In fact, pieces of music often have a complicated,
multi-layered structure, which is not obvious to the casual listener.
In this book, we are going to be concerned with things that have structure. The kinds
of structure we will be concerned with will be like the structure of pieces of music.
They will not be fixed and obvious.
1.2 Structure in Randomness
Structure may be present in phenomena that appear to be random. When it is present,
it makes the phenomena more predictable. Nevertheless, the fact that randomness is
present means that we have to talk about the phenomena in terms of probabilities.
Let us consider a very simple example of how structure can make a random phe-
nomenon more predictable. Suppose we have a fair die. The probability of any face
coming up when the die is thrown is 1/6. In this case, it is not possible to predict
which face will come up more than one-sixth of the time, on average.
On the other hand, if we have a die that has been biased, this introduces some struc-
ture into the situation. Suppose that the biasing has the effect of making the probabil-
ity of the face with six spots coming up 55/100, the probability of the face with one
spot coming up 5/100 and the probability of any other face coming up 1/10. Then
the prediction that the face with six spots will come up will be right more than half
the time, on average.
Another example of structure in randomness that facilitates prediction arises from
phenomena that are correlated. If we have information about one of the phenomena,
we can make predictions about the other. For example, we know that the IQ of iden-
tical twins is highly correlated. In general, we cannot make any reliable prediction
about the IQ of one of a pair of twins. But if we know the IQ of one twin, we can
make a reliable prediction of the IQ of the other.
In order to talk about structure in randomness in quantitative terms, we need to use
probability theory.
1.3 First Concepts of Probability Theory
To describe a phenomenon in terms of probability theory, we need to define a set
of outcomes, which is called the sample space. For the present, we will restrict
consideration to sample spaces which are finite sets.
DEFINITION 1.1 Probability Distribution  A probability distribution on a sample space $S = \{s_1, s_2, \ldots, s_N\}$ is a function $P$ that assigns a probability to each outcome in the sample space. $P$ is a map from $S$ to the unit interval, $P : S \to [0, 1]$, which must satisfy $\sum_{i=1}^{N} P(s_i) = 1$.
DEFINITION 1.2 Events Events are subsets of the sample space.
We can extend a probability distribution $P$ from $S$ to the set of all subsets of $S$, which we denote by $\mathcal{P}(S)$, by setting $P(A) = \sum_{s \in A} P(s)$ for any $A \in \mathcal{P}(S)$. Note that $P(\emptyset) = 0$.
An event whose probability is 0 is impossible and an event whose probability is 1 is certain to occur.
If $A$ and $B$ are events and $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.
DEFINITION 1.3 Expected Value  If $S = \{s_1, s_2, \ldots, s_N\}$ is a sample space with probability distribution $P$, and $F : S \to V$ is a function from the sample space to a vector space $V$, the expected value of $F$ is $\sum_{i=1}^{N} P(s_i)F(s_i)$.
NOTE  We will often have equations that involve summation over the elements of a finite set. In the equations above, the set has been $S = \{s_1, s_2, \ldots, s_N\}$ and the summation has been denoted by $\sum_{i=1}^{N}$. In other places in the text we will denote such summations simply by $\sum_{s \in S}$.
1.4 Surprise and Entropy
In everyday life, events can surprise us. Usually, the more unlikely or unexpected
an event is, the more surprising it is. We can quantify this idea using a probability
distribution.
DEFINITION 1.4 Surprise  If $E$ is an event in a sample space $S$, we define the surprise of $E$ to be $s(E) = -\log(P(E)) = \log(1/P(E))$.
Events for which $P(E) = 1$, which are certain to occur, have zero surprise, as we would expect, and events that are impossible, that is, for which $P(E) = 0$, have infinite surprise.
Defining the surprise as the negative logarithm of the probability not only gives us the
appropriate limiting values as the probability tends to 0 or 1, it also makes surprise
additive. If several independent events occur in succession, the total surprise they
generate is the sum of their individual surprises.
DEFINITION 1.5 Entropy We can restrict the surprise to the sample space
and consider it to be a function from the sample space to the real numbers. The
expected value of the surprise is the entropy of the probability distribution.
If the sample space is $S = \{s_1, s_2, \ldots, s_N\}$, with probability distribution $P$, the entropy of the probability distribution is given by
$$H(P) = -\sum_{i=1}^{N} P(s_i)\log(P(s_i)) \qquad (1.1)$$
The concept of entropy was introduced into thermodynamics in the nineteenth cen-
tury. It was considered to be a measure of the extent to which a system was disor-
dered. The tendency of systems to become more disordered over time is described by
the Second Law of Thermodynamics, which states that the entropy of a system can-
not spontaneously decrease. In the 1940’s, Shannon [6] introduced the concept into
communications theory and founded the subject of information theory. It was then
realised that entropy is a property of any stochastic system and the concept is now
used widely in many fields. Today, information theory (as described in books such
as [1], [2], [3]) is still principally concerned with communications systems, but there
are widespread applications in statistics, information processing and computing (see
[2], [4], [5]).
Let us consider some examples of probability distributions and see how the entropy is
related to predictability. First, let us note the form of the function $s(p) = -p\log(p)$, where $0 \le p \le 1$ and $\log$ denotes the logarithm to base 2. (The actual base does not matter, but we shall be using base 2 throughout the rest of this book, so we may as well start here.) The graph of this function is shown in Figure 1.1.
Note that $-p\log(p)$ approaches 0 as $p$ tends to 0 and also as $p$ tends to 1. This means that outcomes that are almost certain to occur and outcomes that are unlikely to occur both contribute little to the entropy. Outcomes whose probability is close to 0.5 make a comparatively large contribution to the entropy.
EXAMPLE 1.1
$S = \{s_1, s_2\}$ with $P(s_1) = 0.5 = P(s_2)$. The entropy is
$$H(P) = -(0.5)(-1) - (0.5)(-1) = 1$$
In this case, $s_1$ and $s_2$ are equally likely to occur and the situation is as unpredictable as it can be.
FIGURE 1.1
The graph of $-p\log(p)$, plotted against $p$ for $0 \le p \le 1$.
EXAMPLE 1.2
$S = \{s_1, s_2\}$ with $P(s_1) = 0.96875$ and $P(s_2) = 0.03125$. The entropy is
$$H(P) = -(0.96875)(-0.0458) - (0.03125)(-5) = 0.2006$$
In this case, the situation is more predictable, with $s_1$ more than thirty times more likely to occur than $s_2$. The entropy is close to zero.
EXAMPLE 1.3
$S = \{s_1, s_2\}$ with $P(s_1) = 1.0$ and $P(s_2) = 0.0$. Using the convention that $0\log(0) = 0$, the entropy is 0. The situation is entirely predictable, as $s_1$ always occurs.
EXAMPLE 1.4
$S = \{s_1, s_2, s_3, s_4, s_5, s_6\}$, with $P(s_i) = 1/6$ for $i = 1, 2, \ldots, 6$. The entropy is 2.585 and the situation is as unpredictable as it can be.
EXAMPLE 1.5
$S = \{s_1, s_2, s_3, s_4, s_5, s_6\}$, with $P(s_1) = 0.995$ and $P(s_i) = 0.001$ for $i = 2, 3, \ldots, 6$.
The entropy is 0.057 and the situation is fairly predictable, as $s_1$ will occur far more frequently than any other outcome.
EXAMPLE 1.6
$S = \{s_1, s_2, s_3, s_4, s_5, s_6\}$, with $P(s_1) = 0.498 = P(s_2)$ and $P(s_i) = 0.001$ for $i = 3, 4, 5, 6$. The entropy is 1.042 and the situation is about as predictable as in Example 1.1 above, with outcomes $s_1$ and $s_2$ equally likely to occur and the others very unlikely to occur.
Roughly speaking, a system whose entropy is $H$ is about as unpredictable as a system with $2^H$ equally likely outcomes.
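The entropy calculations in Examples 1.1 to 1.6 are easy to reproduce numerically. The following short Python sketch (an illustration only, not part of the original text) computes $H(P)$ for a finite distribution using base-2 logarithms and the convention $0\log 0 = 0$; the distributions of Examples 1.1, 1.2 and 1.6 are used as inputs.

```python
import math

def entropy(probs, base=2.0):
    """Entropy of a finite probability distribution, with 0*log(0) taken as 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)

# Example 1.1: two equally likely outcomes -> 1 bit
print(entropy([0.5, 0.5]))                      # 1.0
# Example 1.2: a very skewed distribution -> close to zero
print(entropy([0.96875, 0.03125]))              # about 0.2006
# Example 1.6: two likely outcomes, four rare ones -> about 1.042
print(entropy([0.498, 0.498, 0.001, 0.001, 0.001, 0.001]))
```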
1.5 Units of Entropy
The units in which entropy is measured depend on the base of the logarithms used
to calculate it. If we use logarithms to the base 2, then the unit is the bit. If we
use natural logarithms (base $e$), the entropy is measured in natural units, sometimes
referred to as nits. Converting between the different units is simple.
PROPOSITION 1.1
If $H_e$ is the entropy of a probability distribution measured using natural logarithms, and $H_r$ is the entropy of the same probability distribution measured using logarithms to the base $r$, then
$$H_r = \frac{H_e}{\ln(r)} \qquad (1.2)$$
PROOF  Let the sample space be $S = \{s_1, s_2, \ldots, s_N\}$, with probability distribution $P$. For any positive number $x$,
$$\ln(x) = \ln(r)\log_r(x) \qquad (1.3)$$
It follows that
$$H_r(P) = -\sum_{i=1}^{N} P(s_i)\log_r(P(s_i)) = -\sum_{i=1}^{N} P(s_i)\frac{\ln(P(s_i))}{\ln(r)} = \frac{-\sum_{i=1}^{N} P(s_i)\ln(P(s_i))}{\ln(r)} = \frac{H_e(P)}{\ln(r)} \qquad (1.4)$$
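As a quick check of Proposition 1.1, the sketch below (an illustration, not from the book) computes the entropy of the same distribution in nats and in bits and confirms that $H_2 = H_e/\ln(2)$.

```python
import math

def entropy(probs, base=math.e):
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)

p = [0.498, 0.498, 0.001, 0.001, 0.001, 0.001]
h_nats = entropy(p, base=math.e)   # entropy in natural units (nits)
h_bits = entropy(p, base=2.0)      # entropy in bits
# Equation (1.2): H_r = H_e / ln(r) with r = 2
print(h_bits, h_nats / math.log(2.0))   # the two values agree
```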
1.6 The Minimum and Maximum Values of Entropy
If we have a sample space $S$ with $N$ elements, and probability distribution $P$ on $S$, it is convenient to denote the probability of $s_i \in S$ by $p_i$. We can construct a vector in $\mathbb{R}^N$ consisting of the probabilities:
$$\mathbf{p} = \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{bmatrix}$$
Because the probabilities have to add up to unity, the set of all probability distributions forms a simplex in $\mathbb{R}^N$, namely
$$K = \left\{ \mathbf{p} \in \mathbb{R}^N : \sum_{i=1}^{N} p_i = 1 \right\}$$
We can consider the entropy to be a function defined on this simplex. Since it is a continuous function, extreme values will occur at the vertices of this simplex, at points where all except one of the probabilities are zero. If $\mathbf{p}_v$ is a vertex, then the entropy there will be
$$H(\mathbf{p}_v) = -(N - 1)(0\log(0)) - 1\log(1)$$
The logarithm of zero is not defined, but the limit of $x\log(x)$ as $x$ tends to 0 exists and is equal to zero. If we take the limiting values, we see that at any vertex, $H(\mathbf{p}_v) = 0$, as $\log(1) = 0$. This is the minimum value of the entropy function.
The entropy function has a maximum value at an interior point of the simplex. To find it we can use Lagrange multipliers.
THEOREM 1.1
If we have a sample space with $N$ elements, the maximum value of the entropy function is $\log(N)$.
PROOF  We want to find the maximum value of
$$H(\mathbf{p}) = -\sum_{i=1}^{N} p_i\log(p_i) \qquad (1.5)$$
subject to the constraint
$$\sum_{i=1}^{N} p_i = 1 \qquad (1.6)$$
We introduce the Lagrange multiplier $\lambda$, and put
$$F(\mathbf{p}) = H(\mathbf{p}) + \lambda\left(\sum_{i=1}^{N} p_i - 1\right) \qquad (1.7)$$
To find the maximum value we have to solve
$$\frac{\partial F}{\partial p_i} = 0 \qquad (1.8)$$
for $i = 1, 2, \ldots, N$ and
$$\sum_{i=1}^{N} p_i = 1 \qquad (1.9)$$
Now
$$\frac{\partial F}{\partial p_i} = -\log(p_i) - 1 + \lambda \qquad (1.10)$$
so
$$p_i = 2^{\lambda - 1} \qquad (1.11)$$
for each $i$. The remaining condition gives
$$N 2^{\lambda - 1} = 1 \qquad (1.12)$$
which can be solved for $\lambda$, or can be used directly to give
$$p_i = \frac{1}{N} \qquad (1.13)$$
for all $i$. Using these values for the $p_i$, we get
$$H(\mathbf{p}) = -N\frac{1}{N}\log(1/N) = \log(N) \qquad (1.14)$$
1.7 A Useful Inequality
LEMMA 1.1
If $p_1, p_2, \ldots, p_N$ and $q_1, q_2, \ldots, q_N$ are all non-negative numbers that satisfy the conditions $\sum_{i=1}^{N} p_i = 1$ and $\sum_{i=1}^{N} q_i = 1$, then
$$-\sum_{i=1}^{N} p_i\log(p_i) \le -\sum_{i=1}^{N} p_i\log(q_i) \qquad (1.15)$$
with equality if and only if $p_i = q_i$ for all $i$.
PROOF  We prove the result for the natural logarithm; the result for any other base follows immediately from the identity
$$\ln(x) = \ln(r)\log_r(x) \qquad (1.16)$$
It is a standard result about the logarithm function that
$$\ln(x) \le x - 1 \qquad (1.17)$$
for $x > 0$, with equality if and only if $x = 1$. Substituting $x = q_i/p_i$, we get
$$\ln(q_i/p_i) \le q_i/p_i - 1 \qquad (1.18)$$
with equality if and only if $p_i = q_i$. This holds for all $i = 1, 2, \ldots, N$, so if we multiply by $p_i$ and sum over the $i$, we get
$$\sum_{i=1}^{N} p_i\ln(q_i/p_i) \le \sum_{i=1}^{N}(q_i - p_i) = \sum_{i=1}^{N} q_i - \sum_{i=1}^{N} p_i = 1 - 1 = 0 \qquad (1.19)$$
with equality if and only if $p_i = q_i$ for all $i$. So
$$\sum_{i=1}^{N} p_i\ln(q_i) - \sum_{i=1}^{N} p_i\ln(p_i) \le 0 \qquad (1.20)$$
which is the required result.
The inequality can also be written in the form
$$\sum_{i=1}^{N} p_i\log(q_i/p_i) \le 0 \qquad (1.21)$$
with equality if and only if $p_i = q_i$ for all $i$.
Note that putting $q_i = 1/N$ for all $i$ in this inequality gives us an alternative proof that the maximum value of the entropy function is $\log(N)$.
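Lemma 1.1 (sometimes known as the Gibbs inequality) can be checked numerically. The sketch below, included here only as an illustration, draws two random distributions $p$ and $q$ and verifies inequality (1.15), and also recovers the bound $H(P) \le \log(N)$ by taking $q_i = 1/N$.

```python
import math
import random

def cross_entropy(p, q):
    # -sum p_i log2(q_i); terms with p_i = 0 contribute nothing
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0.0)

def random_distribution(n):
    xs = [random.random() for _ in range(n)]
    total = sum(xs)
    return [x / total for x in xs]

random.seed(0)
p = random_distribution(5)
q = random_distribution(5)

h_p = cross_entropy(p, p)                 # the entropy H(P)
print(h_p <= cross_entropy(p, q))         # inequality (1.15): always True
# Putting q_i = 1/N recovers the bound H(P) <= log2(N) of Theorem 1.1
print(h_p, cross_entropy(p, [1 / 5] * 5), math.log2(5))
```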
1.8 Joint Probability Distribution Functions
There are many situations in which it is useful to consider sample spaces that are the
Cartesian product of two or more sets.
DEFINITION 1.6 Cartesian Product  Let $S = \{s_1, s_2, \ldots, s_M\}$ and $T = \{t_1, t_2, \ldots, t_N\}$ be two sets. The Cartesian product of $S$ and $T$ is the set $S \times T = \{(s_i, t_j) : 1 \le i \le M,\ 1 \le j \le N\}$.
The extension to the Cartesian product of more than two sets is immediate.
DEFINITION 1.7 Joint Probability Distribution A joint probability distribution
is a probability distribution on the Cartesian product of a number of sets.
If we have $S$ and $T$ as above, then a joint probability distribution function assigns a probability to each pair $(s_i, t_j)$. We can denote this probability by $p_{ij}$. Since these values form a probability distribution, we have
$$0 \le p_{ij} \le 1 \qquad (1.22)$$
for $1 \le i \le M$, $1 \le j \le N$, and
$$\sum_{i=1}^{M}\sum_{j=1}^{N} p_{ij} = 1 \qquad (1.23)$$
If $P$ is the joint probability distribution function on $S \times T$, the definition of entropy becomes
$$H(P) = -\sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j)\log(P(s_i, t_j)) = -\sum_{i=1}^{M}\sum_{j=1}^{N} p_{ij}\log(p_{ij}) \qquad (1.24)$$
If we want to emphasise the spaces $S$ and $T$, we will denote the entropy of the joint probability distribution on $S \times T$ by $H(P_{S \times T})$ or simply by $H(S, T)$. This is known as the joint entropy of $S$ and $T$.
If there are probability distributions $P_S$ and $P_T$ on $S$ and $T$, respectively, and these are independent, the joint probability distribution on $S \times T$ is given by
$$p_{ij} = P_S(s_i)P_T(t_j) \qquad (1.25)$$
for $1 \le i \le M$, $1 \le j \le N$. If there are correlations between the $s_i$ and $t_j$, then this formula does not apply.
DEFINITION 1.8 Marginal Distribution  If $P$ is a joint probability distribution function on $S \times T$, the marginal distribution on $S$ is $P_S : S \to [0, 1]$ given by
$$P_S(s_i) = \sum_{j=1}^{N} P(s_i, t_j) \qquad (1.26)$$
for $1 \le i \le M$, and the marginal distribution on $T$ is $P_T : T \to [0, 1]$ given by
$$P_T(t_j) = \sum_{i=1}^{M} P(s_i, t_j) \qquad (1.27)$$
for $1 \le j \le N$.
There is a simple relationship between the entropy of the joint probability distribution
function and that of the marginal distribution functions.
THEOREM 1.2
If $P$ is a joint probability distribution function on $S \times T$, and $P_S$ and $P_T$ are the marginal distributions on $S$ and $T$, respectively, then
$$H(P) \le H(P_S) + H(P_T) \qquad (1.28)$$
with equality if and only if the marginal distributions are independent.
PROOF
$$H(P_S) = -\sum_{i=1}^{M} P_S(s_i)\log(P_S(s_i)) = -\sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j)\log(P_S(s_i)) \qquad (1.29)$$
and similarly
$$H(P_T) = -\sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j)\log(P_T(t_j)) \qquad (1.30)$$
So
$$H(P_S) + H(P_T) = -\sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j)\left[\log(P_S(s_i)) + \log(P_T(t_j))\right] = -\sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j)\log(P_S(s_i)P_T(t_j)) \qquad (1.31)$$
Also,
$$H(P) = -\sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j)\log(P(s_i, t_j)) \qquad (1.32)$$
Since
$$\sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j) = 1 \qquad (1.33)$$
and
$$\sum_{i=1}^{M}\sum_{j=1}^{N} P_S(s_i)P_T(t_j) = \sum_{i=1}^{M} P_S(s_i)\sum_{j=1}^{N} P_T(t_j) = 1 \qquad (1.34)$$
we can use the inequality of Lemma 1.1 to conclude that
$$H(P) \le H(P_S) + H(P_T) \qquad (1.35)$$
with equality if and only if $P(s_i, t_j) = P_S(s_i)P_T(t_j)$ for all $i$ and $j$, that is, if the two marginal distributions are independent.
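Theorem 1.2 is easy to illustrate numerically. The following sketch (an illustration only; the small joint distribution is made up for the purpose) computes $H(P)$, $H(P_S)$ and $H(P_T)$ and confirms that $H(P) \le H(P_S) + H(P_T)$, with equality when the joint distribution is the product of its marginals.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

# A hypothetical joint distribution P(s_i, t_j) with M = 2 rows and N = 2 columns
P = [[0.4, 0.1],
     [0.2, 0.3]]
P_S = [sum(row) for row in P]                               # marginal on S (eq. 1.26)
P_T = [sum(P[i][j] for i in range(2)) for j in range(2)]    # marginal on T (eq. 1.27)

H_joint = entropy([p for row in P for p in row])
print(H_joint, entropy(P_S) + entropy(P_T))        # H(P) <= H(P_S) + H(P_T)

# For the product of the marginals, equality holds
Q = [[P_S[i] * P_T[j] for j in range(2)] for i in range(2)]
print(entropy([q for row in Q for q in row]), entropy(P_S) + entropy(P_T))
```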
1.9 Conditional Probability and Bayes’ Theorem
DEFINITION 1.9 Conditional Probability  If $S$ is a sample space with a probability distribution function $P$, and $A$ and $B$ are events in $S$, the conditional probability of $A$ given $B$ is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} \qquad (1.36)$$
It is obvious that
$$P(A \mid B)P(B) = P(A \cap B) = P(B \mid A)P(A) \qquad (1.37)$$
Almost as obvious is one form of Bayes' Theorem:
THEOREM 1.3
If $S$ is a sample space with a probability distribution function $P$, and $A$ and $B$ are events in $S$, then
$$P(B \mid A) = \frac{P(A \mid B)P(B)}{P(A)} \qquad (1.38)$$
Bayes’ Theorem is important because it enables us to derive probabilities of hypothe-
ses from observations, as in the following example.
EXAMPLE 1.7
We have two jars, A and B. Jar A contains 8 green balls and 2 red balls. Jar B contains
3 green balls and 7 red balls. One jar is selected at random and a ball is drawn from
it.
We have probabilities as follows. The set of jars forms one sample space, $S = \{A, B\}$, with
$$P(A) = 0.5 = P(B)$$
as one jar is as likely to be chosen as the other.
The set of colours forms another sample space, $T = \{G, R\}$. The probability of drawing a green ball is
$$P(G) = 11/20 = 0.55$$
as 11 of the 20 balls in the jars are green. Similarly,
$$P(R) = 9/20 = 0.45$$
We have a joint probability distribution over the colours of the balls and the jars, with the probability of selecting Jar A and drawing a green ball being given by
$$P((A, G)) = 0.4$$
Similarly, we have the probability of selecting Jar A and drawing a red ball
$$P((A, R)) = 0.1$$
the probability of selecting Jar B and drawing a green ball
$$P((B, G)) = 0.15$$
and the probability of selecting Jar B and drawing a red ball
$$P((B, R)) = 0.35$$
We have the conditional probabilities: given that Jar A was selected, the probability of drawing a green ball is
$$P(G \mid A) = 0.8$$
and the probability of drawing a red ball is
$$P(R \mid A) = 0.2$$
Given that Jar B was selected, the corresponding probabilities are:
$$P(G \mid B) = 0.3$$
and
$$P(R \mid B) = 0.7$$
We can now use Bayes' Theorem to work out the probability of having drawn from either jar, given the colour of the ball that was drawn. If a green ball was drawn, the probability that it was drawn from Jar A is
$$P(A \mid G) = \frac{P(G \mid A)P(A)}{P(G)} = \frac{0.8 \times 0.5}{0.55} = 0.73$$
while the probability that it was drawn from Jar B is
$$P(B \mid G) = \frac{P(G \mid B)P(B)}{P(G)} = \frac{0.3 \times 0.5}{0.55} = 0.27$$
If a red ball was drawn, the probability that it was drawn from Jar A is
$$P(A \mid R) = \frac{P(R \mid A)P(A)}{P(R)} = \frac{0.2 \times 0.5}{0.45} = 0.22$$
while the probability that it was drawn from Jar B is
$$P(B \mid R) = \frac{P(R \mid B)P(B)}{P(R)} = \frac{0.7 \times 0.5}{0.45} = 0.78$$
(In this case, we could have derived these conditional probabilities from the joint probability distribution, but we chose not to do so to illustrate how Bayes' Theorem allows us to go from the conditional probabilities of the colours given the jar selected to the conditional probabilities of the jars selected given the colours drawn.)
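The arithmetic in Example 1.7 can be reproduced directly. The short sketch below (illustrative only) encodes the jar contents and applies Bayes' Theorem to recover the posterior probabilities 0.73, 0.27, 0.22 and 0.78.

```python
# Jar contents: number of green and red balls, with each jar chosen with probability 0.5
jars = {"A": {"G": 8, "R": 2}, "B": {"G": 3, "R": 7}}
P_jar = {"A": 0.5, "B": 0.5}

# Conditional probabilities of each colour given the jar, e.g. P(G|A) = 8/10 = 0.8
P_colour_given_jar = {
    jar: {c: n / sum(counts.values()) for c, n in counts.items()}
    for jar, counts in jars.items()
}

# Total probability of each colour, e.g. P(G) = 0.8*0.5 + 0.3*0.5 = 0.55
P_colour = {
    c: sum(P_colour_given_jar[j][c] * P_jar[j] for j in jars) for c in ("G", "R")
}

# Bayes' Theorem (eq. 1.38): P(jar | colour) = P(colour | jar) P(jar) / P(colour)
for c in ("G", "R"):
    for j in ("A", "B"):
        posterior = P_colour_given_jar[j][c] * P_jar[j] / P_colour[c]
        print(f"P({j}|{c}) = {posterior:.2f}")
```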
1.10 Conditional Probability Distributions and Conditional Entropy
In this section, we have a joint probability distribution $P$ on a Cartesian product $S \times T$, where $S = \{s_1, s_2, \ldots, s_M\}$ and $T = \{t_1, t_2, \ldots, t_N\}$, with marginal distributions $P_S$ and $P_T$.
DEFINITION 1.10 Conditional Probability of $s_i$ given $t_j$  For $s_i \in S$ and $t_j \in T$, the conditional probability of $s_i$ given $t_j$ is
$$P(s_i \mid t_j) = \frac{P(s_i, t_j)}{P_T(t_j)} = \frac{P(s_i, t_j)}{\sum_{i=1}^{M} P(s_i, t_j)} \qquad (1.39)$$
DEFINITION 1.11 Conditional Probability Distribution given $t_j$  For a fixed $t_j$, the conditional probabilities $P(s_i \mid t_j)$ sum to 1 over $i$, so they form a probability distribution on $S$, the conditional probability distribution given $t_j$. We will denote this by $P_{S \mid t_j}$.
DEFINITION 1.12 Conditional Entropy given $t_j$  The conditional entropy given $t_j$ is the entropy of the conditional probability distribution on $S$ given $t_j$. It will be denoted $H(P_{S \mid t_j})$.
$$H(P_{S \mid t_j}) = -\sum_{i=1}^{M} P(s_i \mid t_j)\log(P(s_i \mid t_j)) \qquad (1.40)$$
DEFINITION 1.13 Conditional Probability Distribution on $S$ given $T$  The conditional probability distribution on $S$ given $T$ is the weighted average of the conditional probability distributions given $t_j$ for all $j$. It will be denoted $P_{S \mid T}$.
$$P_{S \mid T}(s_i) = \sum_{j=1}^{N} P_T(t_j)P_{S \mid t_j}(s_i) \qquad (1.41)$$
DEFINITION 1.14 Conditional Entropy given $T$  The conditional entropy given $T$ is the weighted average of the conditional entropies on $S$ given $t_j$ for all $t_j \in T$. It will be denoted $H(P_{S \mid T})$.
$$H(P_{S \mid T}) = -\sum_{j=1}^{N} P_T(t_j)\sum_{i=1}^{M} P(s_i \mid t_j)\log(P(s_i \mid t_j)) \qquad (1.42)$$
Since $P_T(t_j)P(s_i \mid t_j) = P(s_i, t_j)$, we can re-write this as
$$H(P_{S \mid T}) = -\sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j)\log(P(s_i \mid t_j)) \qquad (1.43)$$
We now prove two simple results about the conditional entropies.
THEOREM 1.4
$H(P) = H(P_T) + H(P_{S \mid T}) = H(P_S) + H(P_{T \mid S})$
PROOF
$$H(P) = -\sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j)\log(P(s_i, t_j)) = -\sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j)\log(P_T(t_j)P(s_i \mid t_j))$$
$$= -\sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j)\log(P_T(t_j)) - \sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j)\log(P(s_i \mid t_j))$$
$$= -\sum_{j=1}^{N} P_T(t_j)\log(P_T(t_j)) - \sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j)\log(P(s_i \mid t_j)) = H(P_T) + H(P_{S \mid T}) \qquad (1.44)$$
The proof of the other equality is similar.
THEOREM 1.5
$H(P_{S \mid T}) \le H(P_S)$, with equality if and only if $P_S$ and $P_T$ are independent.
PROOF  From the previous theorem, $H(P) = H(P_T) + H(P_{S \mid T})$.
From Theorem 1.2, $H(P) \le H(P_S) + H(P_T)$, with equality if and only if $P_S$ and $P_T$ are independent.
So $H(P_T) + H(P_{S \mid T}) \le H(P_S) + H(P_T)$.
Subtracting $H(P_T)$ from both sides we get $H(P_{S \mid T}) \le H(P_S)$, with equality if and only if $P_S$ and $P_T$ are independent.
This result is obviously symmetric in $S$ and $T$, so we also have $H(P_{T \mid S}) \le H(P_T)$ with equality if and only if $P_S$ and $P_T$ are independent. We can sum up this result by saying that conditioning reduces entropy or conditioning reduces uncertainty.
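The identities of Theorems 1.4 and 1.5 can be verified for any small joint distribution. The sketch below (illustrative; the joint distribution is the same made-up one used after Theorem 1.2) computes $H(P_{S\mid T})$ from equation (1.43) and checks that $H(P) = H(P_T) + H(P_{S\mid T})$ and $H(P_{S\mid T}) \le H(P_S)$.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

# Hypothetical joint distribution P(s_i, t_j), rows indexed by S, columns by T
P = [[0.4, 0.1],
     [0.2, 0.3]]
P_S = [sum(row) for row in P]
P_T = [sum(P[i][j] for i in range(2)) for j in range(2)]

H_joint = entropy([p for row in P for p in row])

# Conditional entropy H(P_{S|T}) from equation (1.43)
H_S_given_T = -sum(
    P[i][j] * math.log2(P[i][j] / P_T[j])
    for i in range(2) for j in range(2) if P[i][j] > 0.0
)

print(H_joint, entropy(P_T) + H_S_given_T)   # Theorem 1.4: the two values are equal
print(H_S_given_T, entropy(P_S))             # Theorem 1.5: conditioning reduces entropy
```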
1.11 Information Sources
Most of this book will be concerned with random sequences. Depending on the
context, such sequences may be called time series, (discrete) stochastic processes or
signals. The first term is used by statisticians, the second by mathematicians and the
third by engineers. This may reflect differences in the way these people approach the
subject: statisticians are primarily interested in describing such sequences in terms
of probability theory, mathematicians are interested in the behaviour of such series
and the ways in which they may be generated and engineers are interested in ways of
using such sequences and processing them to extract useful information from them.
A device or situation that produces such a sequence is called an information source.
The elements of the sequence are usually drawn from a finite set, which may be
referred to as the alphabet. The source can be considered to be emitting an element
of the alphabet at each instant of a sequence of instants in time. The elements of the
alphabet are referred to as symbols.
EXAMPLE 1.8
Tossing a coin repeatedly and recording the outcomes as heads (H) or tails (T) gives
us a random sequence whose alphabet is $\{H, T\}$.
EXAMPLE 1.9
Throwing a die repeatedly and recording the number of spots on the uppermost face
gives us a random sequence whose alphabet is $\{1, 2, 3, 4, 5, 6\}$.
EXAMPLE 1.10
Computers and telecommunications equipment generate sequences of bits which are
random sequences whose alphabet is $\{0, 1\}$.
EXAMPLE 1.11
A text in the English language is a random sequence whose alphabet is the set con-
sisting of the letters of the alphabet, the digits and the punctuation marks. While we
normally consider text to be meaningful rather than random, it is only possible to
predict which letter will come next in the sequence in probabilistic terms, in general.
The last example above illustrates the point that a random sequence may not appear
to be random at first sight. The difference between the earlier examples and the final
example is that in the English language there are correlations between each letter in
the sequence and those that precede it. In contrast, there are no such correlations in
the cases of tossing a coin or throwing a die repeatedly. We will consider both kinds
of information sources below.
An obvious question that is raised by the term “information source” is: What is the
“information” that the source produces? A second question, perhaps less obvious,
is: How can we measure the information produced by an information source?
An information source generates a sequence of symbols which has a certain degree
of unpredictability. The more unpredictable the sequence is, the more information
is conveyed by each symbol. The information source may impose structure on the
sequence of symbols. This structure will increase the predictability of the sequence
and reduce the information carried by each symbol.
The random behaviour of the sequence may be described by probability distribu-
tions over the alphabet. If the elements of the sequence are uncorrelated, a simple
probability distribution over the alphabet may suffice. In other cases, conditional
probability distributions may be required.
We have already seen that entropy is a measure of predictability. For an information
source, the information content of the sequence that it generates is measured by the
entropy per symbol. We can compute this if we make assumptions about the kinds
of structure that the information source imposes upon its output sequences.
To describe an information source completely, we need to specify both the alphabet and the probability distribution that governs the generation of sequences. The entropy of the information source $S$ with alphabet $A$ and probability distribution $P$ will be denoted by $H(S)$ in the following sections, even though it is actually the entropy of $P$. Later on, we will wish to concentrate on the alphabet and will use $H(A)$ to denote the entropy of the information source, on the assumption that the alphabet will have a probability distribution associated with it.
1.12 Memoryless Information Sources
For a memoryless information source, there are no correlations between the outputs
of the source at different times. For each instant at which an output is emitted, there
is a probability distribution over the alphabet that describes the probability of each
symbol being emitted at that instant. If all the probability distributions are the same,
the source is said to be stationary. If we know these probability distributions, we can
calculate the information content of the sequence.
EXAMPLE 1.12
Tossing a fair coin gives us an example of a stationary memoryless information source. At any instant, the probability distribution is given by $P(H) = 0.5$, $P(T) = 0.5$. This probability distribution has an entropy of 1 bit; so the information content is 1 bit/symbol.
EXAMPLE 1.13
As an example of a non-stationary memoryless information source, suppose we have a fair coin and a die with H painted on four faces and T painted on two faces. Tossing the coin and throwing the die in alternation will create a memoryless information source with alphabet $\{H, T\}$. Every time the coin is tossed, the probability distribution of the outcomes is $P(H) = 0.5$, $P(T) = 0.5$, and every time the die is thrown, the probability distribution is $P(H) = 0.667$, $P(T) = 0.333$.
The probability distribution of the outcomes of tossing the coin has an entropy of 1 bit. The probability distribution of the outcomes of throwing the die has an entropy of 0.918 bits. The information content of the sequence is the average entropy per symbol, which is 0.959 bits/symbol.
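The averaging in Example 1.13 can be written out explicitly; the following lines (illustrative only) compute the entropy of the coin, the entropy of the biased die, and the average entropy per symbol of the alternating source.

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

h_coin = entropy([0.5, 0.5])          # 1.0 bit
h_die = entropy([4 / 6, 2 / 6])       # about 0.918 bits
print(h_coin, h_die, (h_coin + h_die) / 2)   # average: about 0.959 bits/symbol
```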
Memoryless information sources are relatively simple. More realistic information
sources have memory, which is the property that the emission of a symbol at any
instant depends on one or more of the symbols that were generated before it.
1.13 Markov Sources and n-gram Models
Markov sources and n-gram models are descriptions of a class of information sources
with memory.
DEFINITION 1.15 Markov Source  A Markov source consists of an alphabet $A$, a set of states $\Sigma$, a set of transitions between states, a set of labels for the transitions and two sets of probabilities. The first set of probabilities is the initial probability distribution on the set of states, which determines the probabilities of sequences starting with each symbol in the alphabet. The second set of probabilities is a set of transition probabilities. For each pair of states $i$ and $j$, the probability of a transition from $i$ to $j$ is $P(j \mid i)$. (Note that these probabilities are fixed and do not depend on time, so that there is an implicit assumption of stationarity.) The labels on the transitions are symbols from the alphabet.
To generate a sequence, a state is selected on the basis of the initial probability distri-
bution. A transition from this state to another state (or to the same state) is selected
on the basis of the transition probabilities, and the label of this transition is output.
This process is repeated to generate the sequence of output symbols.
It is convenient to represent Markov models diagrammatically in the form of a graph,
with the states represented by vertices and the transitions by edges, as in the follow-
ing example.
FIGURE 1.2
Diagrammatic representation of a Markov source: states 1, 2 and 3, with transitions labelled 1, $P(2 \mid 1) = 1.0$; 0, $P(3 \mid 2) = 1.0$; 1, $P(1 \mid 3) = 0.6$; and 0, $P(2 \mid 3) = 0.4$.
EXAMPLE 1.14
Consider a Markov source with alphabet $\{0, 1\}$ and set of states $\Sigma = \{1, 2, 3\}$.
Suppose there are four transitions:
1. $1 \to 2$, with label 1 and $P(2 \mid 1) = 1.0$;
2. $2 \to 3$, with label 0 and $P(3 \mid 2) = 1.0$;
3. $3 \to 1$, with label 1 and $P(1 \mid 3) = 0.6$;
4. $3 \to 2$, with label 0 and $P(2 \mid 3) = 0.4$.
The initial probability distribution is $P(1) = 1/3$, $P(2) = 1/3$, $P(3) = 1/3$.
The diagrammatic representation of this is shown in Figure 1.2.
The random sequences generated by this source all consist of subsequences of an
even number of 0’s separated by a pair of 1’s, except at the very beginning of the
sequence, where there may be a single 0 or 1.
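A Markov source of this kind is straightforward to simulate. The sketch below (an illustration, not part of the text) encodes the states, transition probabilities and labels of Example 1.14 and generates a short output sequence; the pairing of 0's and 1's described above can be seen in the output.

```python
import random

# Transitions of Example 1.14: state -> list of (next_state, probability, emitted label)
transitions = {
    1: [(2, 1.0, "1")],
    2: [(3, 1.0, "0")],
    3: [(1, 0.6, "1"), (2, 0.4, "0")],
}

def generate(n_symbols, seed=1):
    random.seed(seed)
    state = random.choice([1, 2, 3])          # uniform initial distribution
    output = []
    for _ in range(n_symbols):
        r = random.random()
        cumulative = 0.0
        for next_state, prob, label in transitions[state]:
            cumulative += prob
            if r <= cumulative:
                output.append(label)
                state = next_state
                break
    return "".join(output)

print(generate(30))
```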
It is possible to describe these sources without explicit reference to the states. In an
n-gram model, the description is solely in terms of the probabilities of all the possible
sequences of symbols of length Ò.
EXAMPLE 1.15
The following probabilities give us a 3-gram model on the language $\{0, 1\}$:
P(000) = 0.32    P(001) = 0.08
P(010) = 0.15    P(011) = 0.15
P(100) = 0.16    P(101) = 0.04
P(110) = 0.06    P(111) = 0.04
To describe the relationship between n-gram models and Markov sources, we need
to look at special cases of Markov sources.
DEFINITION 1.16 $m$th-order Markov Source  A Markov source whose states are sequences of $m$ symbols from the alphabet is called an $m$th-order Markov source.
When we have an $m$th-order Markov model, the transition probabilities are usually given in terms of the probabilities of single symbols being emitted when the source is in a given state. For example, in a second-order Markov model on $\{0, 1\}$, the transition probability from 01 to 10, which would be represented by $P(10 \mid 01)$, would be represented instead by the probability of emission of 0 when in the state 01, that is $P(0 \mid 01)$. Obviously, some transitions are impossible. For example, it is not possible to go from the state 01 to the state 00, as the state following 01 must have 1 as its first symbol.
We can construct an $m$th-order Markov model from an $(m + 1)$-gram model and an $m$-gram model. The $m$-gram model gives us the probabilities of strings of length $m$, such as $P(s_1 s_2 \cdots s_m)$. To find the emission probability of $s$ from this state, we set
$$P(s \mid s_1 s_2 \cdots s_m) = \frac{P(s_1 s_2 \cdots s_m s)}{P(s_1 s_2 \cdots s_m)} \qquad (1.45)$$
where the probability $P(s_1 s_2 \cdots s_m s)$ is given by the $(m + 1)$-gram model.
EXAMPLE 1.16
In the previous Example 1.15 we had a 3-gram model on the language $\{0, 1\}$ given by
P(000) = 0.32    P(001) = 0.08
P(010) = 0.15    P(011) = 0.15
P(100) = 0.16    P(101) = 0.04
P(110) = 0.06    P(111) = 0.04
FIGURE 1.3
Diagrammatic representation of a Markov source equivalent to a 3-gram model: states 00, 01, 10 and 11, with emission probabilities P(0|00) = 0.8, P(1|00) = 0.2, P(0|01) = 0.5, P(1|01) = 0.5, P(0|10) = 0.8, P(1|10) = 0.2, P(0|11) = 0.6 and P(1|11) = 0.4.
If a 2-gram model for the same source is given by P(00) = 0.4, P(01) = 0.3, P(10) = 0.2 and P(11) = 0.1, then we can construct a second-order Markov source as follows:
P(0|00) = P(000)/P(00) = 0.32/0.4 = 0.8
P(1|00) = P(001)/P(00) = 0.08/0.4 = 0.2
P(0|01) = P(010)/P(01) = 0.15/0.3 = 0.5
P(1|01) = P(011)/P(01) = 0.15/0.3 = 0.5
P(0|10) = P(100)/P(10) = 0.16/0.2 = 0.8
P(1|10) = P(101)/P(10) = 0.04/0.2 = 0.2
P(0|11) = P(110)/P(11) = 0.06/0.1 = 0.6
P(1|11) = P(111)/P(11) = 0.04/0.1 = 0.4
Figure 1.3 shows this Markov source.
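The construction of Example 1.16 amounts to dividing each 3-gram probability by the corresponding 2-gram probability, as in equation (1.45). The sketch below (illustrative only) performs this division for the model above.

```python
# 3-gram and 2-gram probabilities from Examples 1.15 and 1.16
trigrams = {"000": 0.32, "001": 0.08, "010": 0.15, "011": 0.15,
            "100": 0.16, "101": 0.04, "110": 0.06, "111": 0.04}
bigrams = {"00": 0.4, "01": 0.3, "10": 0.2, "11": 0.1}

# Emission probabilities P(s | s1 s2) = P(s1 s2 s) / P(s1 s2)  (equation 1.45)
emission = {
    (context, symbol): trigrams[context + symbol] / bigrams[context]
    for context in bigrams for symbol in "01"
}

for (context, symbol), prob in sorted(emission.items()):
    print(f"P({symbol}|{context}) = {prob:.1f}")
```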
To describe the behaviour of a Markov source mathematically, we use the transition matrix of probabilities. If the set of states is $\Sigma = \{1, 2, \ldots, N\}$, the transition matrix is the $N \times N$ matrix
$$\Pi = \begin{bmatrix} P(1 \mid 1) & P(1 \mid 2) & \cdots & P(1 \mid N) \\ P(2 \mid 1) & P(2 \mid 2) & \cdots & P(2 \mid N) \\ \vdots & \vdots & \ddots & \vdots \\ P(N \mid 1) & P(N \mid 2) & \cdots & P(N \mid N) \end{bmatrix} \qquad (1.46)$$
The probability of the source being in a given state varies over time. Let $w_i^t$ be the probability of the source being in state $i$ at time $t$, and set
$$W^t = \begin{bmatrix} w_1^t \\ w_2^t \\ \vdots \\ w_N^t \end{bmatrix} \qquad (1.47)$$
Then $W^0$ is the initial probability distribution and
$$W^{t+1} = \Pi W^t \qquad (1.48)$$
and so, by induction,
$$W^t = \Pi^t W^0 \qquad (1.49)$$
Because they all represent probability distributions, each of the columns of $\Pi$ must add up to 1, and all the $w_i^t$ must add up to 1 for each $t$.
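Equation (1.49) just says that the state distribution evolves by repeated multiplication by the transition matrix. A minimal sketch of this iteration (not from the book) is given below; the matrix and starting vector used in Example 1.17 below are taken as input, and the iterates can be seen approaching the stationary distribution $(1/3, 2/3)$.

```python
def step(Pi, W):
    """One application of the transition matrix: W(t+1) = Pi W(t)."""
    n = len(W)
    return [sum(Pi[i][j] * W[j] for j in range(n)) for i in range(n)]

# Transition matrix and initial distribution of Example 1.17 (columns sum to 1)
Pi = [[0.6, 0.2],
      [0.4, 0.8]]
W = [0.5, 0.5]
for t in range(1, 5):
    W = step(Pi, W)
    print(t, W)    # [0.4, 0.6], [0.36, 0.64], [0.344, 0.656], [0.3376, 0.6624]
```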
1.14 Stationary Distributions
The vectors $W^t$ describe how the behaviour of the source changes over time. The asymptotic (long-term) behaviour of sources is of interest in some cases.
EXAMPLE 1.17
Consider a source with transition matrix
$$\Pi = \begin{bmatrix} 0.6 & 0.2 \\ 0.4 & 0.8 \end{bmatrix}$$
Suppose the initial probability distribution is
$$W^0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$$
Then
$$W^1 = \Pi W^0 = \begin{bmatrix} 0.6 & 0.2 \\ 0.4 & 0.8 \end{bmatrix}\begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} = \begin{bmatrix} 0.4 \\ 0.6 \end{bmatrix}$$
Similarly,
$$W^2 = \Pi W^1 = \begin{bmatrix} 0.36 \\ 0.64 \end{bmatrix}, \qquad W^3 = \Pi W^2 = \begin{bmatrix} 0.344 \\ 0.656 \end{bmatrix}, \qquad W^4 = \Pi W^3 = \begin{bmatrix} 0.3376 \\ 0.6624 \end{bmatrix}$$
and so on.
Suppose instead that
$$W^0_s = \begin{bmatrix} 1/3 \\ 2/3 \end{bmatrix}$$
Then
$$W^1_s = \Pi W^0_s = \begin{bmatrix} 0.6 & 0.2 \\ 0.4 & 0.8 \end{bmatrix}\begin{bmatrix} 1/3 \\ 2/3 \end{bmatrix} = \begin{bmatrix} 1/3 \\ 2/3 \end{bmatrix} = W^0_s$$
so that
$$W^t_s = W^0_s$$
for all $t \ge 0$. This distribution will persist for all time.
In the example above, the initial distribution $W^0_s$ has the property that
$$\Pi W^0_s = W^0_s \qquad (1.50)$$
and persists for all time.
DEFINITION 1.17 Stationary Distribution  A probability distribution $W$ over the states of a Markov source with transition matrix $\Pi$ that satisfies the equation $\Pi W = W$ is a stationary distribution.
As shown in the example, if $W^0$ is a stationary distribution, it persists for all time, $W^t = W^0$ for all $t$. The defining equation shows that a stationary distribution $W$ must be an eigenvector of $\Pi$ with eigenvalue 1. To find a stationary distribution for $\Pi$, we must solve the equation $\Pi W = W$ together with the condition that $\sum_i w_i = 1$.
EXAMPLE 1.18
Suppose
$$\Pi = \begin{bmatrix} 0.25 & 0.50 & 0.00 \\ 0.50 & 0.00 & 0.25 \\ 0.25 & 0.50 & 0.75 \end{bmatrix}$$
Then the equation $\Pi W = W$ gives
$$0.25w_1 + 0.50w_2 + 0.00w_3 = w_1$$
$$0.50w_1 + 0.00w_2 + 0.25w_3 = w_2$$
$$0.25w_1 + 0.50w_2 + 0.75w_3 = w_3$$
The first equation gives us
$$0.75w_1 - 0.50w_2 = 0$$
and the other two give
$$0.50w_1 - 1.00w_2 + 0.25w_3 = 0$$
$$0.25w_1 + 0.50w_2 - 0.25w_3 = 0$$
from which we get
$$-2.00w_2 + 0.75w_3 = 0$$
So
$$w_1 = \frac{2}{3}w_2$$
and
$$w_3 = \frac{8}{3}w_2$$
Substituting these values in
$$w_1 + w_2 + w_3 = 1$$
we get
$$\frac{2}{3}w_2 + w_2 + \frac{8}{3}w_2 = 1$$
which gives us
$$w_2 = \frac{3}{13}, \quad w_1 = \frac{2}{13}, \quad w_3 = \frac{8}{13}$$
So the stationary distribution is
$$W = \begin{bmatrix} 2/13 \\ 3/13 \\ 8/13 \end{bmatrix}$$
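The calculation in Example 1.18 can also be carried out by finding the eigenvector of $\Pi$ with eigenvalue 1 and normalising it so that its entries sum to 1. A sketch using NumPy (an illustration, not part of the book) is given below; it reproduces the stationary distribution $(2/13, 3/13, 8/13)$.

```python
import numpy as np

Pi = np.array([[0.25, 0.50, 0.00],
               [0.50, 0.00, 0.25],
               [0.25, 0.50, 0.75]])

eigenvalues, eigenvectors = np.linalg.eig(Pi)
# Pick the eigenvector whose eigenvalue is (numerically) 1
index = np.argmin(np.abs(eigenvalues - 1.0))
w = np.real(eigenvectors[:, index])
w = w / w.sum()                      # normalise so the probabilities sum to 1
print(w)                             # approximately [2/13, 3/13, 8/13]
print(np.array([2, 3, 8]) / 13.0)
```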
In the examples above, the source has a unique stationary distribution. This is not
always the case.
EXAMPLE 1.19
Consider the source with four states and probability transition matrix
¥
¾
½ ¼ ¼ ¼ ¼ ¼ ¼
¼ ¼ ¼ ¼ ¼ ¼ ¼
¼ ¼ ¼ ¼ ¼ ¼ ¼
¼ ¼ ¼ ¼ ¼ ½ ¼
¿
The diagrammatic representation of this source is shown in Figure 1.4.
FIGURE 1.4
A source with two stationary distributions.
For this source, any distribution with w2 = 0.0, w3 = 0.0 and w1 + w4 = 1.0 satisfies
the equation ΠW = W. However, inspection of the transition matrix shows that once
the source enters either the first state or the fourth state, it cannot leave it. The only
stationary distributions that can occur are w1 = 1.0, w2 = 0.0, w3 = 0.0, w4 = 0.0
or w1 = 0.0, w2 = 0.0, w3 = 0.0, w4 = 1.0.
Some Markov sources have the property that every sequence generated by the source
has the same statistical properties. That is, the various frequencies of occurrence of
symbols, pairs of symbols, and so on, obtained from any sequence generated by the
source will, as the length of the sequence increases, approach some definite limit
which is independent of the particular sequence. Sources that have this property are
called ergodic sources.
The source of Example 1.19 is not an ergodic source. The sequences generated by
that source fall into two classes, one of which is generated by sequences of states
that end in the first state, the other of which is generated by sequences that end in the
fourth state. The fact that there are two distinct stationary distributions shows that
the source is not ergodic.
1.15 The Entropy of Markov Sources
There are various ways of defining the entropy of an information source. The fol-
lowing is a simple approach which applies to a restricted class of Markov sources.
DEFINITION 1.18 Entropy of the ith State of a Markov Source The entropy of
the ith state of a Markov source is the entropy of the probability distribution on the
set of transitions from that state.

If we denote the probability distribution on the set of transitions from the ith state by
P_i, then the entropy of the ith state is given by

    H(P_i) = - Σ_{j=1}^{N} P_i(j) log(P_i(j))                            (1.51)
DEFINITION 1.19 Unifilar Markov Source A unifilar Markov source is one with
the property that the labels on the transitions from any given state are all distinct.
We need this property in order to be able to define the entropy of a Markov source.
We assume that the source has a stationary distribution.
DEFINITION 1.20 Entropy of a Unifilar Markov Source The entropy
of a unifilar Markov source M, whose stationary distribution is given by
w1, w2, ..., wN, and whose transition probabilities are P(j|i) for 1 ≤ i ≤ N,
1 ≤ j ≤ N, is

    H(M) = Σ_{i=1}^{N} w_i H(P_i) = - Σ_{i=1}^{N} Σ_{j=1}^{N} w_i P(j|i) log(P(j|i))    (1.52)
It can be shown that this definition is consistent with more general definitions of the
entropy of an information source.
EXAMPLE 1.20
For the Markov source of Example 1.14, there are three states, 1, 2 and 3. The
probability distribution on the set of transitions from state i is P_i for i = 1, 2, 3.
P_1 is given by

    P_1(1) = P(1|1) = 0.0,  P_1(2) = P(2|1) = 1.0,  P_1(3) = P(3|1) = 0.0

Its entropy is

    H(P_1) = -(0.0) log(0.0) - (1.0)(0.0) - (0.0) log(0.0) = 0.0

using the usual convention that 0.0 log(0.0) = 0.0.
P_2 is given by

    P_2(1) = P(1|2) = 0.0,  P_2(2) = P(2|2) = 0.0,  P_2(3) = P(3|2) = 1.0

Its entropy is

    H(P_2) = -(0.0) log(0.0) - (0.0) log(0.0) - (1.0)(0.0) = 0.0

P_3 is given by

    P_3(1) = P(1|3) = 0.6,  P_3(2) = P(2|3) = 0.4,  P_3(3) = P(3|3) = 0.0

Its entropy is

    H(P_3) = -(0.6)(-0.73697) - (0.4)(-1.32193) - (0.0) log(0.0) = 0.97095

The stationary distribution of the source is given by

    w1 = 3/13,  w2 = 5/13,  w3 = 5/13

The entropy of the source is

    H(M) = (0.0)(3/13) + (0.0)(5/13) + (0.97095)(5/13) = 0.37344
EXAMPLE 1.21
For the source of Example 1.16, the states are 00, 01, 10, 11.
P_00 is given by P_00(0) = P(0|00) = 0.8, and P_00(1) = P(1|00) = 0.2. Its entropy
is

    H(P_00) = -(0.8)(-0.32193) - (0.2)(-2.32193) = 0.72193

P_01 is given by P_01(0) = P(0|01) = 0.5, and P_01(1) = P(1|01) = 0.5. Its entropy
is

    H(P_01) = -(0.5)(-1.0) - (0.5)(-1.0) = 1.0

P_10 is given by P_10(0) = P(0|10) = 0.8, and P_10(1) = P(1|10) = 0.2. Its entropy
is

    H(P_10) = -(0.8)(-0.32193) - (0.2)(-2.32193) = 0.72193

P_11 is given by P_11(0) = P(0|11) = 0.6, and P_11(1) = P(1|11) = 0.4. Its entropy
is

    H(P_11) = -(0.6)(-0.73697) - (0.4)(-1.32193) = 0.97095

The stationary distribution of the source is given by

    w1 = 120/213,  w2 = 30/213,  w3 = 35/213,  w4 = 28/213

The entropy of the source is

    H(M) = [120(0.72193) + 30(1.00000) + 35(0.72193) + 28(0.97095)] / 213 = 0.79383
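The computation in Definition 1.20 is straightforward to program. The sketch below is not from the text; it assumes numpy is available and uses the transition distributions and stationary distribution of Example 1.20:

import numpy as np

def entropy(p):
    # Entropy in bits, with the convention 0 log 0 = 0.
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def unifilar_markov_entropy(w, P):
    # H(M) = sum of w_i H(P_i), as in Definition 1.20.
    return sum(w_i * entropy(P_i) for w_i, P_i in zip(w, P))

# Transition distributions and stationary distribution of Example 1.20.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.6, 0.4, 0.0]]
w = [3/13, 5/13, 5/13]
print(unifilar_markov_entropy(w, P))

It prints approximately 0.373, the value obtained in Example 1.20.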
1.16 Sequences of Symbols
It is possible to estimate the entropy of a Markov source using information about
the probabilities of occurrence of sequences of symbols. The following results apply
to ergodic Markov sources and are stated without proof. In a sense, they justify
the use of the conditional probabilities of emission of symbols instead of transition
probabilities between states in mth-order Markov models.
THEOREM 1.6
Given any ε > 0 and any δ > 0, we can find a positive integer N0 such that all
sequences of length N ≥ N0 fall into two classes: a set of sequences whose total
probability is less than ε; and the remainder, for which the following inequality holds:

    | log(1/p)/N - H | < δ                                               (1.53)

where p is the probability of the sequence and H is the entropy of the source.

PROOF See [6], Appendix 3.
THEOREM 1.7
Let M be a Markov source with alphabet A = {a1, a2, ..., an}, and entropy H. Let
A^N denote the set of all sequences of symbols from A of length N. For s ∈ A^N, let
P(s) be the probability of the sequence s being emitted by the source. Define

    G_N = -(1/N) Σ_{s ∈ A^N} P(s) log(P(s))                              (1.54)

which is the entropy per symbol of the sequences of N symbols. Then G_N is a
monotonic decreasing function of N and

    lim_{N→∞} G_N = H                                                    (1.55)

PROOF See [6], Appendix 3.
THEOREM 1.8
Let M be a Markov source with alphabet A = {a1, a2, ..., an}, and entropy H. Let
A^N denote the set of all sequences of symbols from A of length N. For s ∈ A^{N-1},
let P(s, a_j) be the probability of the source emitting the sequence s followed by the
symbol a_j, and let P(a_j|s) be the conditional probability of the symbol a_j being
emitted immediately after the sequence s. Define

    F_N = - Σ_{s ∈ A^{N-1}} Σ_{j=1}^{n} P(s, a_j) log(P(a_j|s))          (1.56)

which is the conditional entropy of the next symbol when the (N - 1) preceding
symbols are known. Then F_N is a monotonic decreasing function of N and

    lim_{N→∞} F_N = H                                                    (1.57)

PROOF See [6], Appendix 3.
THEOREM 1.9
If G_N and F_N are defined as in the previous theorems, then

    F_N = N G_N - (N - 1) G_{N-1}                                        (1.58)

    G_N = (1/N) Σ_{I=1}^{N} F_I                                          (1.59)

and

    F_N ≤ G_N                                                            (1.60)

PROOF See [6], Appendix 3.
These results show that a series of approximations to the entropy of a source can
be obtained by considering only the statistical behaviour of sequences of symbols
of increasing length. The sequence of estimates F_N is a better approximation than
the sequence G_N. If the dependencies in a source extend over no more than N
symbols, so that the conditional probability of the next symbol knowing the preceding
(N - 1) symbols is the same as the conditional probability of the next symbol
knowing the preceding N' ≥ N symbols, then F_N = H.
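In practice these quantities are estimated from observed frequencies. The following Python sketch is not from the text; it estimates F_N from a sample sequence by replacing the probabilities in (1.56) with relative frequencies, and it uses an independent fair-coin string purely as an illustration, for which the estimates should stay close to 1 bit:

from collections import Counter
import math
import random

def F_N_estimate(sequence, N):
    # Empirical estimate of F_N: the conditional entropy of the next symbol
    # given the preceding N - 1 symbols, with probabilities replaced by
    # relative frequencies taken from the sequence.
    joint = Counter(sequence[i:i + N] for i in range(len(sequence) - N + 1))
    prefix = Counter(sequence[i:i + N - 1] for i in range(len(sequence) - N + 2))
    total = sum(joint.values())
    h = 0.0
    for block, count in joint.items():
        p_joint = count / total
        p_cond = count / prefix[block[:-1]]
        h -= p_joint * math.log2(p_cond)
    return h

random.seed(0)
s = ''.join(random.choice('01') for _ in range(100000))
for N in (1, 2, 3, 4):
    print(N, F_N_estimate(s, N))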
1.17 The Adjoint Source of a Markov Source
It is possible to approximate the behaviour of a Markov source by a memoryless
source.
DEFINITION 1.21 Adjoint Source of a Markov Source The adjoint source
of a Markov source is the memoryless source with the same alphabet which emits
symbols independently of each other with the same probabilities as the Markov
source.
If we have an mth-order Markov source M with alphabet A = {a1, a2, ..., aq}, the
probabilities of emission of the symbols are

    P(a_j) = Σ P(a_j | a_{i1} a_{i2} ... a_{im}) P(a_{i1} a_{i2} ... a_{im})    (1.61)

where a_{i1} a_{i2} ... a_{im} represents a sequence of m symbols from the alphabet of the
source, P(a_{i1} a_{i2} ... a_{im}) is the probability of this sequence in the stationary
distribution of the Markov source and the summation over i indicates that all such
sequences are included in the summation. The adjoint source of this Markov source,
denoted M̄, is the memoryless source that emits these symbols with the same
probabilities.
EXAMPLE 1.22
For the 3-gram model of Example 1.15, we have the transition probabilities P(0|00),
P(1|00), P(0|01), P(1|01), P(0|10), P(1|10), P(0|11) and P(1|11), which give us the
transition matrix

    Π = [ P(0|00)   0         P(0|10)   0
          P(1|00)   0         P(1|10)   0
          0         P(0|01)   0         P(0|11)
          0         P(1|01)   0         P(1|11) ]

with the states taken in the order 00, 01, 10, 11.
We need to find the stationary distribution of the source. The equation ΠW = W
gives

    P(0|00) w1 + P(0|10) w3 = w1
    P(1|00) w1 + P(1|10) w3 = w2
    P(0|01) w2 + P(0|11) w4 = w3
    P(1|01) w2 + P(1|11) w4 = w4

Solving these equations together with the constraint

    w1 + w2 + w3 + w4 = 1.0

we get the stationary distribution P(00), P(01), P(10), P(11).
The probabilities for the adjoint source of the 3-gram model are

    P(0) = P(0|00)P(00) + P(0|01)P(01) + P(0|10)P(10) + P(0|11)P(11)

and

    P(1) = P(1|00)P(00) + P(1|01)P(01) + P(1|10)P(10) + P(1|11)P(11)

Although the probabilities of emission of single symbols are the same for both the
Markov source and its adjoint source, the probabilities of emission of sequences
of symbols may not be the same. For example, the probability that the Markov
source emits the sequence 000 from the state 00 is P(0|00)³, while the adjoint source
assigns the sequence 000 the probability P(0)³ (by the assumption of independence).
Going from a Markov source to its adjoint reduces the number of constraints on the
output sequence and hence increases the entropy. This is formalised by the following
theorem.
THEOREM 1.10
If M̄ is the adjoint of the Markov source M, their entropies are related by

    H(M̄) ≥ H(M)                                                         (1.62)

PROOF If M is an mth-order source with alphabet {a1, a2, ..., aq}, we will
denote the states, which are m-tuples of the a_i, by α_I, where 1 ≤ I ≤ q^m. We
assume that M has a stationary distribution.
The probabilities of emission of the symbols are

    P(a_j) = Σ_I w_I P(a_j|α_I)                                          (1.63)

where the summation is over all states and w_I is the probability of state α_I in the
stationary distribution of the source.
The entropy of the adjoint is

    H(M̄) = - Σ_{j=1}^{q} P(a_j) log(P(a_j))
          = - Σ_{j=1}^{q} Σ_I w_I P(a_j|α_I) log(P(a_j))
          = - Σ_I w_I Σ_{j=1}^{q} P(a_j|α_I) log(P(a_j))                 (1.64)

The entropy of the Ith state of M is

    H(P_I) = - Σ_{j=1}^{q} P(a_j|α_I) log(P(a_j|α_I))                    (1.65)

and the entropy of M is

    H(M) = - Σ_I Σ_{j=1}^{q} w_I P(a_j|α_I) log(P(a_j|α_I))
         = - Σ_I w_I Σ_{j=1}^{q} P(a_j|α_I) log(P(a_j|α_I))              (1.66)

If we apply the inequality of Lemma 1.1 to each summation over j, the result follows.
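The inequality can be checked numerically for any particular source. The following sketch is not from the text; the first-order source, its transition probabilities and its stationary distribution are arbitrary illustrative values chosen here:

import math

def H(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A first-order Markov source with illustrative transition probabilities
# P(symbol | state); these numbers are not taken from the text.
P = {'0': {'0': 0.9, '1': 0.1},
     '1': {'0': 0.3, '1': 0.7}}
w = {'0': 0.75, '1': 0.25}          # its stationary distribution

# Adjoint source, equation (1.63): P(a) = sum over states of w_I P(a | state).
adjoint = {a: sum(w[s] * P[s][a] for s in P) for a in ('0', '1')}

H_adjoint = H(adjoint.values())
H_markov = sum(w[s] * H(P[s].values()) for s in P)
print(H_adjoint, H_markov, H_adjoint >= H_markov)

The adjoint's entropy (about 0.81 bits) exceeds the entropy of the Markov source itself (about 0.57 bits), as Theorem 1.10 requires.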
1.18 Extensions of Sources
In situations where codes of various types are being developed, it is often useful to
consider sequences of symbols emitted by a source.
DEFINITION 1.22 Extension of a Stationary Memoryless Source The nth
extension of a stationary memoryless source S is the stationary memoryless source
whose alphabet consists of all sequences of n symbols from the alphabet of S, with
the emission probabilities of the sequences being the same as the probabilities of
occurrence of the sequences in the output of S.

The nth extension of S will be denoted by S^n. Because the emission of successive
symbols by S is statistically independent, the emission probabilities in S^n can be
computed by multiplying the appropriate emission probabilities in S.
EXAMPLE 1.23
Consider the memoryless source S with alphabet {0, 1} and emission probabilities
P(0) = 0.2, P(1) = 0.8.
The second extension of S has alphabet {00, 01, 10, 11} with emission probabilities

    P(00) = P(0)P(0) = (0.2)(0.2) = 0.04
    P(01) = P(0)P(1) = (0.2)(0.8) = 0.16
    P(10) = P(1)P(0) = (0.8)(0.2) = 0.16
    P(11) = P(1)P(1) = (0.8)(0.8) = 0.64

The third extension of S has alphabet {000, 001, 010, 011, 100, 101, 110, 111} with
emission probabilities

    P(000) = P(0)P(0)P(0) = (0.2)(0.2)(0.2) = 0.008
    P(001) = P(0)P(0)P(1) = (0.2)(0.2)(0.8) = 0.032
    P(010) = P(0)P(1)P(0) = (0.2)(0.8)(0.2) = 0.032
    P(011) = P(0)P(1)P(1) = (0.2)(0.8)(0.8) = 0.128
    P(100) = P(1)P(0)P(0) = (0.8)(0.2)(0.2) = 0.032
    P(101) = P(1)P(0)P(1) = (0.8)(0.2)(0.8) = 0.128
    P(110) = P(1)P(1)P(0) = (0.8)(0.8)(0.2) = 0.128
    P(111) = P(1)P(1)P(1) = (0.8)(0.8)(0.8) = 0.512
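Extension probabilities of a memoryless source can be generated mechanically. The following Python sketch is not from the text; it rebuilds the probabilities of Example 1.23 by multiplying emission probabilities, and also prints the entropies of the extensions:

import itertools
import math

P = {'0': 0.2, '1': 0.8}            # the source S of Example 1.23

def extension(P, n):
    # Emission probabilities of the nth extension S^n.
    return {''.join(seq): math.prod(P[s] for s in seq)
            for seq in itertools.product(P, repeat=n)}

def H(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

print(extension(P, 2))              # the four probabilities of the second extension
for n in (1, 2, 3):
    print(n, H(extension(P, n)), n * H(extension(P, 1)))

The printed entropies anticipate the relation H(S^n) = nH(S) proved below.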
There is a simple relationship between the entropy of a stationary memoryless source
and the entropies of its extensions.
THEOREM 1.11
If S^n is the nth extension of the stationary memoryless source S, their entropies are
related by

    H(S^n) = nH(S)                                                       (1.67)

PROOF If the alphabet of S is {a1, a2, ..., aq}, and the emission probabilities
of the symbols are P(a_i) for i = 1, 2, ..., q, the entropy of S is

    H(S) = - Σ_{i=1}^{q} P(a_i) log(P(a_i))                              (1.68)

The alphabet of S^n consists of all sequences a_{i1} a_{i2} ... a_{in}, where each a_{ij} ∈ {a1, a2, ..., aq}.
The emission probability of a_{i1} a_{i2} ... a_{in} is

    P(a_{i1} a_{i2} ... a_{in}) = P(a_{i1}) P(a_{i2}) ... P(a_{in})      (1.69)

The entropy of S^n is

    H(S^n) = - Σ_{i1=1}^{q} ... Σ_{in=1}^{q} P(a_{i1} a_{i2} ... a_{in}) log(P(a_{i1} a_{i2} ... a_{in}))
           = - Σ_{i1=1}^{q} ... Σ_{in=1}^{q} P(a_{i1} a_{i2} ... a_{in}) Σ_{j=1}^{n} log(P(a_{ij}))    (1.70)

We can interchange the order of summation to get

    H(S^n) = - Σ_{j=1}^{n} Σ_{i1=1}^{q} ... Σ_{in=1}^{q} P(a_{i1} a_{i2} ... a_{in}) log(P(a_{ij}))    (1.71)

Breaking P(a_{i1} a_{i2} ... a_{in}) into the product of probabilities, and rearranging, we get

    H(S^n) = - Σ_{j=1}^{n} [ Σ_{i1=1}^{q} P(a_{i1}) ] ... [ Σ_{ij=1}^{q} P(a_{ij}) log(P(a_{ij})) ] ... [ Σ_{in=1}^{q} P(a_{in}) ]    (1.72)

Since

    Σ_{i=1}^{q} P(a_i) = 1                                               (1.73)

for each of the summations other than the jth, we are left with

    H(S^n) = - Σ_{j=1}^{n} Σ_{i=1}^{q} P(a_i) log(P(a_i)) = Σ_{j=1}^{n} H(S) = nH(S)    (1.74)
We also have extensions of Markov sources.
DEFINITION 1.23 Extension of an mth-order Markov Source Let m and n
be positive integers, and let p be the smallest integer that is greater than or equal
to m/n. The nth extension of the mth-order Markov source M is the pth-order
Markov source whose alphabet consists of all sequences of n symbols from the
alphabet of M and for which the transition probabilities between states are equal
to the probabilities of the corresponding n-fold transitions of the mth-order source.
We will use M^n to denote the nth extension of M.
EXAMPLE 1.24
Let M be the first-order Markov source with alphabet {0, 1} and transition probabilities

    P(0|0) = 0.3   P(1|0) = 0.7   P(0|1) = 0.4   P(1|1) = 0.6

The second extension of M has m = 1 and n = 2, so p = 1. It is a first-order
source with alphabet {00, 01, 10, 11}. We can calculate the transition probabilities
as follows.

    P(00|00) = P(0|0)P(0|0) = (0.3)(0.3) = 0.09
    P(01|00) = P(0|0)P(1|0) = (0.3)(0.7) = 0.21
    P(10|00) = P(1|0)P(0|1) = (0.7)(0.4) = 0.28
    P(11|00) = P(1|0)P(1|1) = (0.7)(0.6) = 0.42
    P(00|11) = P(0|1)P(0|0) = (0.4)(0.3) = 0.12
    P(01|11) = P(0|1)P(1|0) = (0.4)(0.7) = 0.28
    P(10|11) = P(1|1)P(0|1) = (0.6)(0.4) = 0.24
    P(11|11) = P(1|1)P(1|1) = (0.6)(0.6) = 0.36
EXAMPLE 1.25
Consider the second-order Markov source with alphabet {0, 1} and transition probabilities

    P(0|00) = 0.8   P(1|00) = 0.2
    P(0|01) = 0.5   P(1|01) = 0.5
    P(0|10) = 0.2   P(1|10) = 0.8
    P(0|11) = 0.6   P(1|11) = 0.4

The transition probabilities of the second extension are

    P(00|00) = P(0|00)P(0|00) = (0.8)(0.8) = 0.64
    P(01|00) = P(0|00)P(1|00) = (0.8)(0.2) = 0.16
    P(10|00) = P(1|00)P(0|01) = (0.2)(0.5) = 0.10
    P(11|00) = P(1|00)P(1|01) = (0.2)(0.5) = 0.10
    P(00|01) = P(0|01)P(0|10) = (0.5)(0.2) = 0.10
    P(01|01) = P(0|01)P(1|10) = (0.5)(0.8) = 0.40
    P(10|01) = P(1|01)P(0|11) = (0.5)(0.6) = 0.30
    P(11|01) = P(1|01)P(1|11) = (0.5)(0.4) = 0.20
    P(00|10) = P(0|10)P(0|00) = (0.2)(0.8) = 0.16
    P(01|10) = P(0|10)P(1|00) = (0.2)(0.2) = 0.04
    P(10|10) = P(1|10)P(0|01) = (0.8)(0.5) = 0.40
    P(11|10) = P(1|10)P(1|01) = (0.8)(0.5) = 0.40
    P(00|11) = P(0|11)P(0|10) = (0.6)(0.2) = 0.12
    P(01|11) = P(0|11)P(1|10) = (0.6)(0.8) = 0.48
    P(10|11) = P(1|11)P(0|11) = (0.4)(0.6) = 0.24
    P(11|11) = P(1|11)P(1|11) = (0.4)(0.4) = 0.16

If we denote the states of the second extension by α1 = 00, α2 = 01, α3 = 10 and
α4 = 11, we have the transition probabilities of a first-order Markov source:

    P(α1|α1) = 0.64   P(α2|α1) = 0.16
    P(α3|α1) = 0.10   P(α4|α1) = 0.10
    P(α1|α2) = 0.10   P(α2|α2) = 0.40
    P(α3|α2) = 0.30   P(α4|α2) = 0.20
    P(α1|α3) = 0.16   P(α2|α3) = 0.04
    P(α3|α3) = 0.40   P(α4|α3) = 0.40
    P(α1|α4) = 0.12   P(α2|α4) = 0.48
    P(α3|α4) = 0.24   P(α4|α4) = 0.16
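The same bookkeeping can be automated. The following Python sketch is not from the text; it recomputes the second-extension transition probabilities of Example 1.25 from the second-order transition probabilities, using the fact that after emitting the first symbol the state becomes the last symbol of the old state followed by the emitted symbol:

import itertools

# Second-order source of Example 1.25: P[state][symbol], state = last two symbols.
P = {'00': {'0': 0.8, '1': 0.2},
     '01': {'0': 0.5, '1': 0.5},
     '10': {'0': 0.2, '1': 0.8},
     '11': {'0': 0.6, '1': 0.4}}

def second_extension(P):
    # Transition probabilities P(j1 j2 | state) of the second extension.
    ext = {}
    for state in P:
        for j1, j2 in itertools.product('01', repeat=2):
            next_state = state[1] + j1          # state after emitting j1
            ext[(j1 + j2, state)] = P[state][j1] * P[next_state][j2]
    return ext

for (pair, state), p in sorted(second_extension(P).items(),
                               key=lambda kv: (kv[0][1], kv[0][0])):
    print(f"P({pair}|{state}) = {p:.2f}")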
EXAMPLE 1.26
Consider the fourth-order Markov source with alphabet {0, 1} and transition probabilities

    P(0|0000) = 0.9   P(1|0000) = 0.1
    P(0|0001) = 0.8   P(1|0001) = 0.2
    P(0|0010) = 0.7   P(1|0010) = 0.3
    P(0|0011) = 0.6   P(1|0011) = 0.4
    P(0|0100) = 0.5   P(1|0100) = 0.5
    P(0|0101) = 0.4   P(1|0101) = 0.6
    P(0|0110) = 0.3   P(1|0110) = 0.7
    P(0|0111) = 0.2   P(1|0111) = 0.8
    P(0|1000) = 0.1   P(1|1000) = 0.9
    P(0|1001) = 0.2   P(1|1001) = 0.8
    P(0|1010) = 0.3   P(1|1010) = 0.7
    P(0|1011) = 0.4   P(1|1011) = 0.6
    P(0|1100) = 0.5   P(1|1100) = 0.5
    P(0|1101) = 0.6   P(1|1101) = 0.4
    P(0|1110) = 0.7   P(1|1110) = 0.3
    P(0|1111) = 0.8   P(1|1111) = 0.2

We can use these probabilities to compute the transition probabilities of the second
extension, for example,

    P(01|1010) = P(0|1010)P(1|0100) = (0.3)(0.5) = 0.15

If we denote the states of the second extension by α1 = 00, α2 = 01, α3 = 10 and
α4 = 11, we have the transition probabilities of a second-order Markov source:

    P(α1|α1α1) = 0.81   P(α2|α1α1) = 0.09
    P(α3|α1α1) = 0.08   P(α4|α1α1) = 0.02
    P(α1|α1α2) = 0.56   P(α2|α1α2) = 0.24
    P(α3|α1α2) = 0.12   P(α4|α1α2) = 0.08
    P(α1|α1α3) = 0.35   P(α2|α1α3) = 0.35
    P(α3|α1α3) = 0.12   P(α4|α1α3) = 0.18
    P(α1|α1α4) = 0.18   P(α2|α1α4) = 0.42
    P(α3|α1α4) = 0.08   P(α4|α1α4) = 0.32
    P(α1|α2α1) = 0.05   P(α2|α2α1) = 0.45
    P(α3|α2α1) = 0.10   P(α4|α2α1) = 0.40
    P(α1|α2α2) = 0.12   P(α2|α2α2) = 0.28
    P(α3|α2α2) = 0.24   P(α4|α2α2) = 0.36
    P(α1|α2α3) = 0.15   P(α2|α2α3) = 0.15
    P(α3|α2α3) = 0.42   P(α4|α2α3) = 0.28
    P(α1|α2α4) = 0.14   P(α2|α2α4) = 0.06
    P(α3|α2α4) = 0.64   P(α4|α2α4) = 0.16
    P(α1|α3α1) = 0.09   P(α2|α3α1) = 0.01
    P(α3|α3α1) = 0.72   P(α4|α3α1) = 0.18
    P(α1|α3α2) = 0.14   P(α2|α3α2) = 0.06
    P(α3|α3α2) = 0.48   P(α4|α3α2) = 0.32
    P(α1|α3α3) = 0.15   P(α2|α3α3) = 0.15
    P(α3|α3α3) = 0.28   P(α4|α3α3) = 0.42
    P(α1|α3α4) = 0.12   P(α2|α3α4) = 0.28
    P(α3|α3α4) = 0.12   P(α4|α3α4) = 0.48
    P(α1|α4α1) = 0.05   P(α2|α4α1) = 0.45
    P(α3|α4α1) = 0.10   P(α4|α4α1) = 0.40
    P(α1|α4α2) = 0.18   P(α2|α4α2) = 0.42
    P(α3|α4α2) = 0.16   P(α4|α4α2) = 0.24
    P(α1|α4α3) = 0.35   P(α2|α4α3) = 0.35
    P(α3|α4α3) = 0.18   P(α4|α4α3) = 0.12
    P(α1|α4α4) = 0.56   P(α2|α4α4) = 0.24
    P(α3|α4α4) = 0.16   P(α4|α4α4) = 0.04
It is convenient to represent elements of the alphabet of M^n by single symbols as
we have done in the examples above. If the alphabet of M is {a1, a2, ..., aq}, then
we will use α for a generic element of the alphabet of M^n, and α_{i1 i2 ... in} will stand
for the sequence a_{i1} a_{i2} ... a_{in}. For further abbreviation, we will let I stand for
i1 i2 ... in and use α_I to denote α_{i1 i2 ... in}, and so on.
The statistics of the extension M^n are given by the conditional probabilities
P(α_J | α_{I1} α_{I2} ... α_{Ip}). In terms of the alphabet of M, we have

    P(α_J | α_{I1} α_{I2} ... α_{Ip}) = P(a_{j1} ... a_{jn} | a_{i11} ... a_{i1n} ... a_{ip1} ... a_{ipn})    (1.75)

This can also be written as

    P(α_J | α_{I1} α_{I2} ... α_{Ip}) = P(a_{j1} | a_{i11} ... a_{ipn}) P(a_{j2} | a_{i12} ... a_{ipn} a_{j1})
                                        ... P(a_{jn} | a_{i1n} ... a_{ipn} a_{j1} ... a_{j(n-1)})    (1.76)
We can use this relationship to prove the following result.
THEOREM 1.12
If M^n is the nth extension of the Markov source M, their entropies are related by

    H(M^n) = nH(M)                                                       (1.77)

The proof is similar to the proof of the corresponding result for memoryless sources.
(Exercise 17 deals with a special case.)
Note that if m ≤ n, then

    H(M^m) = mH(M) ≤ nH(M) = H(M^n)                                      (1.78)

Since an extension of a Markov source is a pth-order Markov source, we can consider
its adjoint source. If M is an mth-order Markov source, M^n is the nth extension of
M, and M̄^n the adjoint source of the extension, then we can combine the results of
Theorem 1.10 and Theorem 1.12 to get

    H(M̄^n) ≥ H(M^n) = nH(M)                                             (1.79)
1.19 Infinite Sample Spaces
The concept of entropy carries over to infinite sample spaces, but there are a number
of technical issues that have to be considered.
If the sample space is countable, the entropy has to be defined in terms of the limit
of a series, as in the following example.
EXAMPLE 1.27
Suppose the sample space is the set of natural numbers, N = {0, 1, 2, ...}, and the
probability distribution is given by

    P(n) = 2^{-(n+1)}                                                    (1.80)
for n ∈ N. The entropy of this distribution is

    H(P) = - Σ_{n=0}^{∞} P(n) log(P(n))
         = - Σ_{n=0}^{∞} 2^{-(n+1)} (-n - 1)
         = Σ_{n=0}^{∞} (n + 1)/2^{n+1}                                   (1.81)

This infinite sum converges to 2, so H(P) = 2.
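The convergence of the series can be checked by computing partial sums. The following Python sketch is not from the text:

def partial_entropy(N):
    # Partial sum of (n + 1) / 2**(n + 1) for n = 0, ..., N - 1.
    return sum((n + 1) / 2 ** (n + 1) for n in range(N))

for N in (5, 10, 20, 50):
    print(N, partial_entropy(N))

The partial sums approach 2 bits, in agreement with the value quoted above.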
If the sample space is a continuum, in particular, the real line, the summations become
integrals. Instead of the probability distribution function, we use the probability
density function f, which has the property that

    P(A) = ∫_A f(x) dx                                                   (1.82)

where A is a closed interval and f is the probability density function.
The mean and variance of the probability density function are defined by

    μ = ∫_{-∞}^{∞} x f(x) dx                                             (1.83)

and

    σ² = ∫_{-∞}^{∞} (x - μ)² f(x) dx                                     (1.84)

The obvious generalisation of the definition of entropy for a probability density
function defined on the real line is

    H(f) = - ∫_{-∞}^{∞} f(x) log(f(x)) dx                                (1.85)

provided this integral exists. This definition was proposed by Shannon, but has been
the subject of debate because it is not invariant with respect to change of scale or
change of co-ordinates in general. It is sometimes known as the differential entropy.
If we accept this definition of the entropy of a continuous distribution, it is easy to
compute the entropy of a Gaussian distribution.

THEOREM 1.13
The entropy of a Gaussian distribution with mean μ and variance σ² is ln(√(2πe) σ)
in natural units.
PROOF The density function of the Gaussian distribution is

    f(x) = (1/(√(2π) σ)) e^{-(x-μ)²/2σ²}                                 (1.86)

Since this is a probability density function, we have

    ∫_{-∞}^{∞} f(x) dx = 1                                               (1.87)

Taking the natural logarithm of f, we get

    ln(f(x)) = -ln(√(2π) σ) - (x - μ)²/2σ²                               (1.88)

By definition,

    σ² = ∫_{-∞}^{∞} (x - μ)² f(x) dx                                     (1.89)

this will be used below.
We now calculate the entropy:

    H(f) = - ∫_{-∞}^{∞} f(x) ln(f(x)) dx
         = - ∫_{-∞}^{∞} f(x) [ -ln(√(2π) σ) - (x - μ)²/2σ² ] dx
         = ∫_{-∞}^{∞} f(x) ln(√(2π) σ) dx + ∫_{-∞}^{∞} f(x) (x - μ)²/2σ² dx
         = ln(√(2π) σ) ∫_{-∞}^{∞} f(x) dx + (1/2σ²) ∫_{-∞}^{∞} f(x)(x - μ)² dx
         = ln(√(2π) σ) + σ²/2σ²
         = ln(√(2π) σ) + 1/2
         = ln(√(2π) σ) + ln(√e)
         = ln(√(2πe) σ)                                                  (1.90)
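The closed form can be checked by numerical integration. The following sketch is not from the text; it assumes numpy is available and integrates -f ln f on a finite interval wide enough for the tails to be negligible:

import math
import numpy as np

def gaussian_entropy_numeric(mu, sigma, points=20001, width=12.0):
    x = np.linspace(mu - width * sigma, mu + width * sigma, points)
    f = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
    return np.trapz(-f * np.log(f), x)

for sigma in (0.5, 1.0, 2.0):
    print(sigma,
          gaussian_entropy_numeric(0.0, sigma),
          math.log(math.sqrt(2 * math.pi * math.e) * sigma))

For each σ the two printed values agree to several decimal places.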
If the probability density function is defined over the whole real line, it is not possi-
ble to find a specific probability density function whose entropy is greater than the
entropy of all other probability density functions defined on the real line. However,
if we restrict consideration to probability density functions with a given variance, it
can be shown that the Gaussian distribution has the maximum entropy of all these
distributions. (Exercises 20 and 21 outline the proof of this result.)
We have used H(f) to denote the entropy of the probability distribution whose probability
density function is f. If X is a random variable whose probability density
function is f, we will denote its entropy by either H(f) or H(X).
1.20 Exercises
1. Let S = {s1, s2, s3} be a sample space with probability distribution P given
   by P(s1) = 0.2, P(s2) = 0.3, P(s3) = 0.5. Let f be a function defined on S
   by f(s1) = , f(s2) = -2, f(s3) = 1. What is the expected value of f?
2. Let S = {s1, s2} be a sample space with probability distribution P given by
   P(s1) = 0.7, P(s2) = 0.3. Let f be the function from S to R² given by
´×½µ
¾ ¼
¿ ¼
and
´×¾µ
¼
  ¼
What is the expected value of ?
3. Suppose that a fair die is tossed. What is the expected number of spots on the
uppermost face of the die when it comes to rest? Will this number of spots
ever be seen when the die is tossed?
4. Let S = {s1, s2, s3, s4} be a sample space with probability distribution P
   given by P(s1) = 0.5, P(s2) = 0.25, P(s3) = 0.125, P(s4) = 0.125.
   There are sixteen possible events that can be formed from the elements of S.
   Compute the probability and surprise of these events.
5. Let S = {s1, s2, ..., sN} be a sample space for some N. Compute the entropy
   of each of the following probability distributions on S:
   (a) N = 3, P(s1) = 0.5, P(s2) = 0.25, P(s3) = 0.25;
   (b) N = 4, P(s1) = 0.5, P(s2) = 0.25, P(s3) = 0.125, P(s4) = 0.125;
   (c) N = 5, P(s1) = 0.5, P(s2) = 0.125, P(s3) = 0.125, P(s4) = 0.125,
       P(s5) = 0.125;
   (d) N = 5, P(s1) = 0.25, P(s2) = 0.25, P(s3) = 0.25, P(s4) = 0.125,
       P(s5) = 0.125;
   (e) N = 5, P(s1) = 0.0625, P(s2) = 0.125, P(s3) = 0.25, P(s4) = 0.5,
       P(s5) = 0.0625.
6. Convert the following entropy values from bits to natural units: (a) H(P) =
   0.5; (b) H(P) = 1.0; (c) H(P) = 1.5; (d) H(P) = 2.0; (e) H(P) = 2.5; (f)
   H(P) = 3.0.
7. Convert the following entropy values from natural units to bits: (a) H(P) =
   0.5; (b) H(P) = 1.0; (c) H(P) = 1.5; (d) H(P) = 2.0; (e) H(P) = 2.5; (f)
   H(P) = 3.0.
8. Let S = {s1, s2} and T = {t1, t2, t3}. Let P be a joint probability distribution
   function on S × T, given by P(s1, t1) = 0.5, P(s1, t2) = 0.25, P(s1, t3) = 0.125,
   P(s2, t1) = 0.0625, P(s2, t2) = 0.03125, P(s2, t3) = 0.03125. Compute
   the marginal distributions P_S and P_T and the entropies H(P), H(P_S)
   and H(P_T). Also compute the conditional probability distributions P_{S|T} and
   P_{T|S} and their entropies H(P_{S|T}) and H(P_{T|S}).
9. Draw a diagram to represent the Markov source with alphabet {0, 1} and set
   of states Σ = {1, 2, 3} with the following five transitions:
(a) ½ ¾, with label 1 and È´¾ ½µ ¼ ;
(b) ½ ¿, with label 0 and È´¿ ½µ ¼ ;
   (c) 2 → 1, with label 0 and P(1|2) = 0.8;
   (d) 2 → 3, with label 1 and P(3|2) = 0.2;
   (e) 3 → 1, with label 1 and P(1|3) = 1.0.
   Write down the transition matrix for this source. Is it possible for this source to
   generate an output sequence that includes the subsequence 000? Is it possible
   for this source to generate an output sequence that includes the subsequence
   111?
10. A 2-gram model on the language {a, b, c} is given by the probabilities

    P(aa) = 0.000   P(ab) = 0.200   P(ac) = 0.133
    P(ba) = 0.133   P(bb) = 0.000   P(bc) = 0.200
    P(ca) = 0.200   P(cb) = 0.133   P(cc) = 0.000

    The probabilities of the individual symbols are

    P(a) = 1/3   P(b) = 1/3   P(c) = 1/3

    Construct a (first-order) Markov source from these models. Draw a diagram
    to represent the source and write down its transition matrix.
11. A 4-gram model on the language {0, 1} is given by

    P(1010) = 0.50   P(0101) = 0.50

    with all other probabilities 0. All the 3-gram probabilities are also 0, except for

    P(010) = 0.50   P(101) = 0.50

    Construct a third-order Markov source from these models. Draw a diagram to
    represent the source and write down its transition matrix.
12. Consider a Markov source with transition matrix

    Π = [ 0.3  0.8
          0.7  0.2 ]
and initial probability distribution
Ï
¼ ¼
¼
    Compute W^t for t = 1, 2, 3, 4.
13. Find the stationary distribution of the Markov source whose transition matrix
    is

    Π = [ 0.1  0.8
          0.9  0.2 ]
*14. Prove that a Markov source with two states always has a stationary distribution,
     provided that none of the transition probabilities are 0.
15. Find the stationary distribution of the Markov source whose transition matrix
    is
¥
¾
¼ ¼ ¿ ¼ ¾
¼ ¼ ¾ ¼
¼ ½ ¼ ¼ ¾
¿
16. Compute the entropy of the Markov source in Exercise 9 above.
*17. Prove that if M^n is the nth extension of the first-order Markov source M, their
     entropies are related by H(M^n) = nH(M).
18. Let S = {1, 2, 3, ...} be the set of positive integers and let P be the probability
    distribution given by P(i) = 2^{-i}. Let f be the function on S defined by
    f(i) = (-1)^{i+1}. What is the expected value of f?
19. Let u be the probability density function of the uniform distribution over the
    closed and bounded interval [a, b], so that u(x) = (b - a)^{-1} if a ≤ x ≤ b
    and u(x) = 0 otherwise. Compute the mean, variance and entropy (in natural
    units) of u.
The next two exercises prove the result that the Gaussian distribution has the
maximum entropy of all distributions with a given variance.
20. Use the identity ln(x) ≤ x - 1 to show that if f and g are probability density
    functions defined on the real line then

    ∫_{-∞}^{∞} f(x) ln(f(x)) dx ≥ ∫_{-∞}^{∞} f(x) ln(g(x)) dx

    provided both these integrals exist.
21. Show that if f is a probability density function with mean μ and variance σ²
    and g is the probability density function of a Gaussian distribution with the
    same mean and variance, then

    - ∫_{-∞}^{∞} f(x) ln(g(x)) dx = ln(√(2πe) σ)

    Conclude that of all the probability density functions on the real line with
    variance σ², the Gaussian has the greatest entropy.
22. Compute the entropy of the probability density function of a uniform distribution
    on a closed interval whose variance is σ² and confirm that it is less than
    the entropy of a Gaussian distribution with variance σ². (Use the results of
    Exercise 19.)
The next four exercises are concerned with the Kullback-Leibler Divergence.
*23. Let P and Q be probability distributions on the sample space

     S = {s1, s2, ..., sN}

     with P(s_i) = p_i and Q(s_i) = q_i for 1 ≤ i ≤ N. The Kullback-Leibler
     Divergence of P and Q is defined by

     D(P||Q) = Σ_{i=1}^{N} p_i log(p_i/q_i)

     Compute D(P||Q) and D(Q||P), when N = 4, p_i = 0.25 for i = 1, 2, 3, 4,
     and q_1 = 0.125, q_2 = 0.25, q_3 = 0.125, q_4 = 0.5. Is D(P||Q) = D(Q||P)?
*24. If P and Q are probability distributions on S, a finite sample space, show that
     D(P||Q) ≥ 0 and D(P||Q) = 0 if and only if P = Q.
*25. If P, Q, and R are probability distributions on S = {s1, s2, ..., sN}, with
     P(s_i) = p_i, Q(s_i) = q_i and R(s_i) = r_i for 1 ≤ i ≤ N, show that

     D(P||R) ≥ D(P||Q) + D(Q||R)

     if and only if

     Σ_{i=1}^{N} (p_i - q_i) log(q_i/r_i) ≥ 0
*26. If the sample space is the real line, it is easier to define the Kullback-Leibler
     Divergence in terms of probability density functions. If p and q are probability
     density functions on R, with p(x) > 0 and q(x) > 0 for all x ∈ R, then we
     define

     D(p||q) = ∫_{-∞}^{∞} p(x) ln(p(x)/q(x)) dx

     Compute D(p||q) when p is the probability density function of a Gaussian
     distribution with mean μ1 and variance σ², and q is the probability density
     function of a Gaussian distribution with mean μ2 and variance σ².
The next two exercises deal with the topic of Maximum Entropy Estimation.
*27. We have shown in Section 1.6 that the entropy of the uniform distribution on a
     sample space with N elements is log(N) and this is the maximum value of the
     entropy for any distribution defined on that sample space. In many situations,
     we need to find a probability distribution that satisfies certain constraints and
     has the maximum entropy of all the distributions that satisfy those constraints.
     One type of constraint that is common is to require that the mean of the
     distribution should have a certain value. This can be done using Lagrange
     multipliers. Find the probability distribution on S = {1, 2, 3, 4} that has
     maximum entropy subject to the condition that the mean of the distribution
     is 2. To do this you have to find p1, p2, p3 and p4 that maximise the entropy

     H(P) = Σ_{i=1}^{4} p_i log(1/p_i)

     subject to the two constraints

     Σ_{i=1}^{4} p_i = 1

     (so that the p_i form a probability distribution) and

     Σ_{i=1}^{4} i p_i = 2
*28. Find the probability distribution on S = {1, 2, 3, 4} that has maximum entropy
     subject to the conditions that the mean of the distribution is 2 and the second
     moment of the distribution is a given value m2. In this case the constraints are

     Σ_{i=1}^{4} p_i = 1,   Σ_{i=1}^{4} i p_i = 2

     and

     Σ_{i=1}^{4} i² p_i = m2
The next two exercises deal with the topic of Mutual Information.
*29. If P is a joint probability distribution on S × T, the Mutual Information of
     P, denoted I_M(P), is the Kullback-Leibler Divergence between P and the
     product of the marginals Q, given by

     Q(s_i, t_j) = P_S(s_i) P_T(t_j)

     Compute the Mutual Information of P when S = {s1, s2}, T = {t1, t2, t3},
     and P is given by P(s1, t1) = 0.5, P(s1, t2) = 0.25, P(s1, t3) = 0.125,
     P(s2, t1) = 0.0625, P(s2, t2) = 0.03125, P(s2, t3) = 0.03125.
*30. Show that if S = {s1, s2, ..., sM} and T = {t1, t2, ..., tN}, an alternative
     expression for the Mutual Information of P, a joint probability distribution on
     S × T, is given by

     I_M(P) = Σ_{i=1}^{M} Σ_{j=1}^{N} P(s_i, t_j) log(P(s_i|t_j)/P_S(s_i))
1.21 References
[1] R. Ash, Information Theory, John Wiley & Sons, New York, 1965.
[2] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley &
Sons, New York, 1991.
[3] D. S. Jones, Elementary Information Theory, Oxford University Press, Oxford,
1979.
[4] H. S. Leff and A. F. Rex, Eds., Maxwell’s Demon: Entropy, Information, Com-
puting, Adam Hilger, Bristol, 1990.
[5] R. D. Rosenkrantz, Ed., E. T. Jaynes: Papers on Probability, Statistics and Sta-
tistical Physics, D. Reidel, Dordrecht, 1983.
[6] C. E. Shannon and W. Weaver, The Mathematical Theory Of Communication,
The University of Illinois Press, Urbana, IL, 1949.
Chapter 2
Information Channels
2.1 What Are Information Channels?
Information does not only have to be used or stored, it also has to be transmitted. In
communication systems a transmitter converts, or encodes, a message to a form suit-
able for transmission through a communications medium, be it a fibre optic channel,
satellite link or radio signal through space. The receiver then detects the transmitted
signal and decodes it back to the original message. The encoding and decoding oper-
ations will be discussed in the following chapters. Although we will be dealing with
digital data, physical transmission and storage has to deal with analog signals and
media. Thus digital data has to be modulated in some way prior to transmission and
storage, and detected by the receiver to reproduce the original digital sequence. Dis-
cussion of modern analog and digital communication systems is beyond the scope
of this book and interested readers should refer to the standard textbooks in the area,
for example [7].
As we will see, two forms of coding will be necessary. Source coding is required to
efficiently store and transmit the information generated by a general-purpose source
(e.g., ASCII text, images, audio, etc.) on the storage or transmission medium in use
(e.g., computer disk or digital data networks, both requiring conversion to binary
data). Channel coding is required to mitigate the effects of noise that may corrupt
data during storage and transmission through physical systems. What is noise? Noise
can be defined as any unwanted signal or effect in addition to the desired signal. As
such there can be many causes of noise: interference from other signals, thermal and
spurious effects generated by the electronic device being used to store or transmit the
signal, environmental disturbances, etc.
In Chapter 1 we were concerned with defining the information content of a source.
In this chapter we will be concerned with the equally important problem of defining
or measuring the information carrying capacity of a channel. Intuitively a channel
can carry no more information than is being pushed through the channel itself, that
is, the entropy of the source or message being transmitted through the channel. But
as we will see the presence of noise reduces the information carrying capacity of
the channel and leads us to define another important quantity (alongside entropy)
called mutual information. As we view the channel from an information theoretic
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_
Fundamentals of information_theory_and_coding_design__discrete_mathematics_and_its_applications_

Fundamentals of Information Theory and Coding Design (Discrete Mathematics and Its Applications), Roberto Togneri and Christopher J.S. deSilva

  • 8. Preface

What is information? How do we quantify or measure the amount of information that is present in a file of data, or a string of text? How do we encode the information so that it can be stored efficiently, or transmitted reliably?

The main concepts and principles of information theory were developed by Claude E. Shannon in the 1940s. Yet only now, thanks to the emergence of the information age and digital communication, are the ideas of information theory being looked at again in a new light. Because of information theory and the results arising from coding theory we now know how to quantify information, how we can efficiently encode it and how reliably we can transmit it.

This book introduces the main concepts behind how we model information sources and channels, how we code sources for efficient storage and transmission, and the fundamentals of coding theory and applications to state-of-the-art error-correcting and error-detecting codes.

This textbook has been written for upper-level undergraduate students and graduate students in mathematics, engineering and computer science. Most of the material presented in this text was developed over many years at The University of Western Australia in the unit Information Theory and Coding 314, which was a core unit for students majoring in Communications and Electrical and Electronic Engineering, and was a unit offered to students enrolled in the Master of Engineering by Coursework and Dissertation in the Intelligent Information Processing Systems course.

The number of books on the market dealing with information theory and coding has been on the rise over the past five years. However, very few, if any, of these books have been able to cover the fundamentals of the theory without losing the reader in complex mathematical abstractions. And fewer books are able to provide the important theoretical framework when discussing the algorithms and implementation details of modern coding systems. This book does not abandon the theoretical foundations of information and coding theory, and it presents working algorithms and implementations which can be used to fabricate and design real systems. The main emphasis is on the underlying concepts that govern information theory and the necessary mathematical background that describes modern coding systems.

One of the strengths of the book is the many worked examples that appear throughout it, which allow the reader to immediately understand the concept being explained or the algorithm being described. These are backed up by fairly comprehensive exercise sets at the end of each chapter (including exercises identified by an * which are more advanced or challenging).
  • 9. The material in the book has been selected for completeness and to present a balanced coverage. There is a discussion of the cascading of information channels and the additivity of information which is rarely found in modern texts. Arithmetic coding is fully explained, with worked examples for both encoding and decoding. The connection between coding of extensions and Markov modelling is clearly established (this is usually not apparent in other textbooks). Three complete chapters are devoted to block codes for error detection and correction. A large part of these chapters deals with an exposition of the concepts from abstract algebra that underpin the design of these codes. We decided that this material should form part of the main text (rather than be relegated to an appendix) to emphasise the importance of understanding the mathematics of these and other advanced coding strategies.

Chapter 1 introduces the concepts of entropy and information sources and explains how information sources are modelled. In Chapter 2 this analysis is extended to information channels, where the concept of mutual information is introduced and channel capacity is discussed. Chapter 3 covers source coding for efficient storage and transmission, with an introduction to the theory and main concepts, a discussion of Shannon's Noiseless Coding Theorem and details of the Huffman and arithmetic coding algorithms. Chapter 4 provides the basic principles behind the various compression algorithms, including run-length coding and dictionary coders. Chapter 5 introduces the fundamental principles of channel coding, the importance of the Hamming distance in the analysis and design of codes and a statement of what Shannon's Fundamental Coding Theorem tells us we can do with channel codes. Chapter 6 introduces the algebraic concepts of groups, rings, fields and linear spaces over the binary field and introduces binary block codes. Chapter 7 provides the details of the theory of rings of polynomials and cyclic codes and describes how to analyse and design various linear cyclic codes, including Hamming codes, Cyclic Redundancy Check codes and Reed-Muller codes. Chapter 8 deals with burst-correcting codes and describes the design of Fire codes, BCH codes and Reed-Solomon codes. Chapter 9 completes the discussion of channel coding by describing the convolutional encoder, decoding of convolutional codes, trellis modulation and Turbo codes.

This book can be used as a textbook for a one-semester undergraduate course in information theory and source coding (all of Chapters 1 to 4), a one-semester graduate course in coding theory (all of Chapters 5 to 9) or as part of a one-semester undergraduate course in communications systems covering information theory and coding (selected material from Chapters 1, 2, 3, 5, 6 and 7).

We would like to thank Sean Davey and Nishith Arora for their help with the LaTeX formatting of the manuscript. We would also like to thank Ken Rosen for his review of our draft manuscript and his many helpful suggestions, and Sunil Nair from CRC Press for encouraging us to write this book in the first place! Our examples on arithmetic coding were greatly facilitated by the use of the conversion calculator (one of the few that can handle fractions!) made available by www.math.com.
  • 10. The manuscript was written in LaTeX and we are indebted to the open source software community for developing such a powerful text processing environment. We are especially grateful to the developers of LyX (www.lyx.org) for making writing the document that much more enjoyable and to the makers of xfig (www.xfig.org) for providing such an easy-to-use drawing package.

Roberto Togneri
Chris deSilva
  • 12. Contents

1 Entropy and Information
   1.1 Structure
   1.2 Structure in Randomness
   1.3 First Concepts of Probability Theory
   1.4 Surprise and Entropy
   1.5 Units of Entropy
   1.6 The Minimum and Maximum Values of Entropy
   1.7 A Useful Inequality
   1.8 Joint Probability Distribution Functions
   1.9 Conditional Probability and Bayes' Theorem
   1.10 Conditional Probability Distributions and Conditional Entropy
   1.11 Information Sources
   1.12 Memoryless Information Sources
   1.13 Markov Sources and n-gram Models
   1.14 Stationary Distributions
   1.15 The Entropy of Markov Sources
   1.16 Sequences of Symbols
   1.17 The Adjoint Source of a Markov Source
   1.18 Extensions of Sources
   1.19 Infinite Sample Spaces
   1.20 Exercises
   1.21 References

2 Information Channels
   2.1 What Are Information Channels?
   2.2 BSC and BEC Channels
   2.3 Mutual Information
      2.3.1 Importance of Mutual Information
      2.3.2 Properties of the Mutual Information
   2.4 Noiseless and Deterministic Channels
      2.4.1 Noiseless Channels
      2.4.2 Deterministic Channels
   2.5 Cascaded Channels
   2.6 Additivity of Mutual Information
   2.7 Channel Capacity: Maximum Mutual Information
      2.7.1 Channel Capacity of a BSC
      2.7.2 Channel Capacity of a BEC
  • 13.
      2.7.3 Channel Capacity of Weakly Symmetric Channels
   2.8 Continuous Channels and Gaussian Channels
   2.9 Information Capacity Theorem
   2.10 Rate Distortion Theory
      2.10.1 Properties of R(D)
   2.11 Exercises
   2.12 References

3 Source Coding
   3.1 Introduction
   3.2 Instantaneous Codes
      3.2.1 Construction of Instantaneous Codes
      3.2.2 Decoding Instantaneous Codes
      3.2.3 Properties of Instantaneous Codes
      3.2.4 Sensitivity to Bit Errors
   3.3 The Kraft Inequality and McMillan's Theorem
      3.3.1 The Kraft Inequality
      3.3.2 McMillan's Theorem
   3.4 Average Length and Compact Codes
      3.4.1 Average Length
      3.4.2 Lower Bound on Average Length
   3.5 Shannon's Noiseless Coding Theorem
      3.5.1 Shannon's Theorem for Zero-Memory Sources
      3.5.2 Shannon's Theorem for Markov Sources
      3.5.3 Code Efficiency and Channel Capacity
   3.6 Fano Coding
   3.7 Huffman Coding
      3.7.1 Huffman Codes
      3.7.2 Binary Huffman Coding Algorithm
      3.7.3 Software Implementation of Binary Huffman Coding
      3.7.4 r-ary Huffman Codes
   3.8 Arithmetic Coding
      3.8.1 Encoding and Decoding Algorithms
      3.8.2 Encoding and Decoding with Scaling
      3.8.3 Is Arithmetic Coding Better Than Huffman Coding?
   3.9 Higher-order Modelling
      3.9.1 Higher-order Huffman Coding
      3.9.2 Higher-order Arithmetic Coding
   3.10 Exercises
   3.11 References

4 Data Compression
   4.1 Introduction
   4.2 Basic Concepts of Data Compression
   4.3 Run-length Coding
  • 14.
   4.4 The CCITT Standard for Facsimile Transmission
   4.5 Block-sorting Compression
   4.6 Dictionary Coding
   4.7 Statistical Compression
   4.8 Prediction by Partial Matching
   4.9 Image Coding
   4.10 Exercises
   4.11 References

5 Fundamentals of Channel Coding
   5.1 Introduction
   5.2 Code Rate
   5.3 Decoding Rules
   5.4 Hamming Distance
      5.4.1 Hamming Distance Decoding Rule for BSCs
      5.4.2 Error Detection/Correction Using the Hamming Distance
   5.5 Bounds on M, Maximal Codes and Perfect Codes
      5.5.1 Upper Bounds on M and the Hamming Bound
      5.5.2 Maximal Codes and the Gilbert Bound
      5.5.3 Redundancy Requirements for t-bit Error Correction
      5.5.4 Perfect Codes for t-bit Error Correction
   5.6 Error Probabilities
      5.6.1 Bit and Block Error Probabilities and Code Rate
      5.6.2 Probability of Undetected Block Error
   5.7 Shannon's Fundamental Coding Theorem
   5.8 Exercises
   5.9 References

6 Error-Correcting Codes
   6.1 Introduction
   6.2 Groups
   6.3 Rings and Fields
   6.4 Linear Spaces
   6.5 Linear Spaces over the Binary Field
   6.6 Linear Codes
   6.7 Encoding and Decoding
   6.8 Codes Derived from Hadamard Matrices
   6.9 Exercises
   6.10 References

7 Cyclic Codes
   7.1 Introduction
   7.2 Rings of Polynomials
   7.3 Cyclic Codes
   7.4 Encoding and Decoding of Cyclic Codes
  • 15.
   7.5 Encoding and Decoding Circuits for Cyclic Codes
   7.6 The Golay Code
   7.7 Hamming Codes
   7.8 Cyclic Redundancy Check Codes
   7.9 Reed-Muller Codes
   7.10 Exercises
   7.11 References

8 Burst-Correcting Codes
   8.1 Introduction
   8.2 Finite Fields
   8.3 Irreducible Polynomials
   8.4 Construction of Finite Fields
   8.5 Bursts of Errors
   8.6 Fire Codes
   8.7 Minimum Polynomials
   8.8 Bose-Chaudhuri-Hocquenghem Codes
   8.9 Other Fields
   8.10 Reed-Solomon Codes
   8.11 Exercises
   8.12 References

9 Convolutional Codes
   9.1 Introduction
   9.2 A Simple Example
   9.3 Binary Convolutional Codes
   9.4 Decoding Convolutional Codes
   9.5 The Viterbi Algorithm
   9.6 Sequential Decoding
   9.7 Trellis Modulation
   9.8 Turbo Codes
   9.9 Exercises
   9.10 References

Index
  • 16. Chapter 1  Entropy and Information

1.1 Structure

Structure is a concept of which we all have an intuitive understanding. However, it is not easy to articulate that understanding and give a precise definition of what structure is. We might try to explain structure in terms of such things as regularity, predictability, symmetry and permanence. We might also try to describe what structure is not, using terms such as featureless, random, chaotic, transient and aleatory.

Part of the problem of trying to define structure is that there are many different kinds of behaviour and phenomena which might be described as structured, and finding a definition that covers all of them is very difficult.

Consider the distribution of the stars in the night sky. Overall, it would appear that this distribution is random, without any structure. Yet people have found patterns in the stars and imposed a structure on the distribution by naming constellations.

Again, consider what would happen if you took the pixels on the screen of your computer when it was showing a complicated and colourful scene and strung them out in a single row. The distribution of colours in this single row of pixels would appear to be quite arbitrary, yet the complicated pattern of the two-dimensional array of pixels would still be there.

These two examples illustrate the point that we must distinguish between the presence of structure and our perception of structure. In the case of the constellations, the structure is imposed by our brains. In the case of the picture on our computer screen, we can only see the pattern if the pixels are arranged in a certain way.

Structure relates to the way in which things are put together, the way in which the parts make up the whole. Yet there is a difference between the structure of, say, a bridge and that of a piece of music. The parts of the Golden Gate Bridge or the Sydney Harbour Bridge are solid and fixed in relation to one another. Seeing one part of the bridge gives you a good idea of what the rest of it looks like.

The structure of pieces of music is quite different. The notes of a melody can be arranged according to the whim or the genius of the composer. Having heard part of the melody you cannot be sure of what the next note is going to be, let alone
  • 17. any other part of the melody. In fact, pieces of music often have a complicated, multi-layered structure, which is not obvious to the casual listener.

In this book, we are going to be concerned with things that have structure. The kinds of structure we will be concerned with will be like the structure of pieces of music. They will not be fixed and obvious.

1.2 Structure in Randomness

Structure may be present in phenomena that appear to be random. When it is present, it makes the phenomena more predictable. Nevertheless, the fact that randomness is present means that we have to talk about the phenomena in terms of probabilities.

Let us consider a very simple example of how structure can make a random phenomenon more predictable. Suppose we have a fair die. The probability of any face coming up when the die is thrown is 1/6. In this case, it is not possible to predict which face will come up more than one-sixth of the time, on average.

On the other hand, if we have a die that has been biased, this introduces some structure into the situation. Suppose that the biasing has the effect of making the probability of the face with six spots coming up 55/100, the probability of the face with one spot coming up 5/100 and the probability of any other face coming up 1/10. Then the prediction that the face with six spots will come up will be right more than half the time, on average (a short simulation sketch is given below).

Another example of structure in randomness that facilitates prediction arises from phenomena that are correlated. If we have information about one of the phenomena, we can make predictions about the other. For example, we know that the IQ of identical twins is highly correlated. In general, we cannot make any reliable prediction about the IQ of one of a pair of twins. But if we know the IQ of one twin, we can make a reliable prediction of the IQ of the other.

In order to talk about structure in randomness in quantitative terms, we need to use probability theory.

1.3 First Concepts of Probability Theory

To describe a phenomenon in terms of probability theory, we need to define a set of outcomes, which is called the sample space. For the present, we will restrict consideration to sample spaces which are finite sets.
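To make the biased-die example concrete, here is a minimal Python sketch (not from the book; the probabilities are the ones quoted above) that estimates how often the prediction "the face with six spots" turns out to be correct.

```python
import random

# Biased die from Section 1.2: P(six) = 55/100, P(one) = 5/100,
# and each of the other four faces has probability 1/10.
faces = [1, 2, 3, 4, 5, 6]
probs = [0.05, 0.10, 0.10, 0.10, 0.10, 0.55]
assert abs(sum(probs) - 1.0) < 1e-12  # a valid probability distribution

random.seed(0)
trials = 100_000
hits = sum(1 for _ in range(trials) if random.choices(faces, probs)[0] == 6)

# The prediction "six" is right roughly 55% of the time, compared with
# about 16.7% for any single face of a fair die.
print(f"prediction 'six' correct in {hits / trials:.3f} of throws")
```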
  • 18. DEFINITION 1.1 Probability Distribution  A probability distribution on a sample space $S = \{s_1, s_2, \ldots, s_N\}$ is a function $P$ that assigns a probability to each outcome in the sample space. $P$ is a map from $S$ to the unit interval, $P: S \to [0,1]$, which must satisfy $\sum_{i=1}^{N} P(s_i) = 1$.

DEFINITION 1.2 Events  Events are subsets of the sample space. We can extend a probability distribution $P$ from $S$ to the set of all subsets of $S$, which we denote by $\wp(S)$, by setting $P(A) = \sum_{s \in A} P(s)$ for any $A \in \wp(S)$. Note that $P(\emptyset) = 0$.

An event whose probability is $0$ is impossible and an event whose probability is $1$ is certain to occur. If $A$ and $B$ are events and $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.

DEFINITION 1.3 Expected Value  If $S = \{s_1, s_2, \ldots, s_N\}$ is a sample space with probability distribution $P$, and $F: S \to V$ is a function from the sample space to a vector space $V$, the expected value of $F$ is $\sum_{i=1}^{N} P(s_i) F(s_i)$.

NOTE  We will often have equations that involve summation over the elements of a finite set. In the equations above, the set has been $S = \{s_1, s_2, \ldots, s_N\}$ and the summation has been denoted by $\sum_{i=1}^{N}$. In other places in the text we will denote such summations simply by $\sum_{s \in S}$.

1.4 Surprise and Entropy

In everyday life, events can surprise us. Usually, the more unlikely or unexpected an event is, the more surprising it is. We can quantify this idea using a probability distribution.

DEFINITION 1.4 Surprise  If $E$ is an event in a sample space $S$, we define the surprise of $E$ to be $s(E) = -\log(P(E)) = \log(1/P(E))$.

Events for which $P(E) = 1$, which are certain to occur, have zero surprise, as we would expect, and events that are impossible, that is, for which $P(E) = 0$, have infinite surprise.
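These definitions translate almost directly into code. The following short Python sketch (ours, for illustration; the helper names `prob_of_event`, `expected_value` and `surprise` are not from the book) represents a probability distribution as a dictionary and computes event probabilities, expected values and surprise in bits.

```python
import math

# A probability distribution on S = {s1, ..., sN}, stored as outcome -> probability.
P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
assert abs(sum(P.values()) - 1.0) < 1e-12

def prob_of_event(P, A):
    """P(A) is the sum of P(s) over the outcomes s in the event A (Definition 1.2)."""
    return sum(P[s] for s in A)

def expected_value(P, F):
    """Expected value of a function F on the sample space (Definition 1.3)."""
    return sum(P[s] * F(s) for s in P)

def surprise(p):
    """Surprise of an event of probability p, in bits: log2(1/p) (Definition 1.4)."""
    return math.inf if p == 0 else math.log2(1 / p)

print(prob_of_event(P, {"a", "b"}))         # 0.75
print(expected_value(P, lambda s: ord(s)))  # expected character code of the outcome
print(surprise(0.125), surprise(1.0))       # 3.0 bits and 0.0 bits
print(surprise(0.0))                        # an impossible event has infinite surprise
```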
  • 19. Defining the surprise as the negative logarithm of the probability not only gives us the appropriate limiting values as the probability tends to $0$ or $1$, it also makes surprise additive. If several independent events occur in succession, the total surprise they generate is the sum of their individual surprises.

DEFINITION 1.5 Entropy  We can restrict the surprise to the sample space and consider it to be a function from the sample space to the real numbers. The expected value of the surprise is the entropy of the probability distribution. If the sample space is $S = \{s_1, s_2, \ldots, s_N\}$, with probability distribution $P$, the entropy of the probability distribution is given by

$$H(P) = -\sum_{i=1}^{N} P(s_i) \log(P(s_i)) \qquad (1.1)$$

The concept of entropy was introduced into thermodynamics in the nineteenth century. It was considered to be a measure of the extent to which a system was disordered. The tendency of systems to become more disordered over time is described by the Second Law of Thermodynamics, which states that the entropy of a system cannot spontaneously decrease. In the 1940s, Shannon [6] introduced the concept into communications theory and founded the subject of information theory. It was then realised that entropy is a property of any stochastic system and the concept is now used widely in many fields. Today, information theory (as described in books such as [1], [2], [3]) is still principally concerned with communications systems, but there are widespread applications in statistics, information processing and computing (see [2], [4], [5]).

Let us consider some examples of probability distributions and see how the entropy is related to predictability. First, let us note the form of the function $s(p) = -p\log(p)$, where $0 \le p \le 1$ and $\log$ denotes the logarithm to base 2. (The actual base does not matter, but we shall be using base 2 throughout the rest of this book, so we may as well start here.) The graph of this function is shown in Figure 1.1. Note that $-p\log(p)$ approaches $0$ as $p$ tends to $0$ and also as $p$ tends to $1$. This means that outcomes that are almost certain to occur and outcomes that are unlikely to occur both contribute little to the entropy. Outcomes whose probability is close to $0.5$ make a comparatively large contribution to the entropy.

EXAMPLE 1.1
$S = \{s_1, s_2\}$ with $P(s_1) = 0.5 = P(s_2)$. The entropy is

$$H(P) = -(0.5)(-1) - (0.5)(-1) = 1$$

In this case, $s_1$ and $s_2$ are equally likely to occur and the situation is as unpredictable as it can be.
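Equation (1.1) is a one-line computation. The sketch below (a minimal illustration, not the book's code) defines the entropy in bits, reproduces Example 1.1, and tabulates the per-outcome contribution $-p\log(p)$ plotted in Figure 1.1.

```python
import math

def entropy(probs):
    """H(P) = -sum_i p_i log2(p_i), with the convention 0*log(0) = 0 (equation 1.1)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Example 1.1: two equally likely outcomes give one bit of entropy.
print(entropy([0.5, 0.5]))                 # 1.0

# The per-outcome contribution -p*log2(p): small near p = 0 and p = 1,
# comparatively large in between (the curve of Figure 1.1).
for p in (0.01, 0.1, 0.37, 0.5, 0.9, 0.99):
    print(f"p = {p:4.2f}  -p*log2(p) = {-p * math.log2(p):.4f}")
```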
  • 20. [FIGURE 1.1  The graph of $-p\log(p)$, plotted against $p$ for $0 \le p \le 1$.]

EXAMPLE 1.2
$S = \{s_1, s_2\}$ with $P(s_1) = 0.96875$ and $P(s_2) = 0.03125$. The entropy is

$$H(P) = -(0.96875)(-0.0458) - (0.03125)(-5) \approx 0.2006$$

In this case, the situation is more predictable, with $s_1$ more than thirty times more likely to occur than $s_2$. The entropy is close to zero.

EXAMPLE 1.3
$S = \{s_1, s_2\}$ with $P(s_1) = 1.0$ and $P(s_2) = 0.0$. Using the convention that $0\log(0) = 0$, the entropy is $0$. The situation is entirely predictable, as $s_1$ always occurs.

EXAMPLE 1.4
$S = \{s_1, s_2, s_3, s_4, s_5, s_6\}$, with $P(s_i) = 1/6$ for $i = 1, 2, \ldots, 6$. The entropy is $2.585$ and the situation is as unpredictable as it can be.

EXAMPLE 1.5
$S = \{s_1, s_2, s_3, s_4, s_5, s_6\}$, with $P(s_1) = 0.995$ and $P(s_i) = 0.001$ for $i = 2, 3, \ldots, 6$.
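The entropy values quoted in Examples 1.2 to 1.5 can be checked numerically. A small sketch (ours; the probabilities are as given in the examples above):

```python
import math

def entropy(probs):
    # H(P) in bits, with the 0*log(0) = 0 convention.
    return -sum(p * math.log2(p) for p in probs if p > 0)

examples = {
    "1.2": [0.96875, 0.03125],        # close to zero: highly predictable
    "1.3": [1.0, 0.0],                # exactly zero: entirely predictable
    "1.4": [1 / 6] * 6,               # log2(6) = 2.585: as unpredictable as possible
    "1.5": [0.995] + [0.001] * 5,     # about 0.057: fairly predictable
}
for name, dist in examples.items():
    print(f"Example {name}: H = {entropy(dist):.4f} bits")
```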
  • 21. 6 Fundamentals of Information Theory and Coding Design The entropy is ¼ ¼ and the situation is fairly predictable as ×½ will occur far more frequently than any other outcome. EXAMPLE 1.6 Ë ×½ ×¾ ׿ × × × , with È ´×½µ ¼ È ´×¾µ È ´× µ ¼ ¼¼½ for ¿ . The entropy is ½ ¼ ¾ and the situation is about as predictable as in Example 1.1 above, with outcomes ×½ and ×¾ equally likely to occur and the others very unlikely to occur. Roughly speaking, a system whose entropy is is about as unpredictable as a system with ¾ equally likely outcomes. 1.5 Units of Entropy The units in which entropy is measured depend on the base of the logarithms used to calculate it. If we use logarithms to the base 2, then the unit is the bit. If we use natural logarithms (base ), the entropy is measured in natural units, sometimes referred to as nits. Converting between the different units is simple. PROPOSITION 1.1 If À is the entropy of a probability distribution measured using natural logarithms, and ÀÖ istheentropyofthesameprobabilitydistribution measured using logarithms to the base Ö, then ÀÖ À ÐÒ´Öµ (1.2) PROOF Let the sample space be Ë ×½ ×¾ ×Æ , with probability distri- bution È . For any positive number Ü, ÐҴܵ ÐÒ´Öµ ÐÓ Ö´Üµ (1.3) It follows that ÀÖ´È µ   Æ ½ È ´× µ ÐÓ Ö´È ´× µµ   Æ ½ È ´× µ ÐÒ´È ´× µµ ÐÒ´Öµ
  • 22. Entropy and Information 7  ÈÆ ½ È´× µÐÒ´È´× µµ ÐÒ´Öµ À ´Èµ ÐÒ´Öµ (1.4) 1.6 The Minimum and Maximum Values of Entropy If we have a sample space Ë with Æ elements, and probability distribution È on Ë, it is convenient to denote the probability of × ¾ Ë by Ô . We can construct a vector in ÊÆ consisting of the probabilities: Ô ¾ Ô½ Ô¾ ... ÔÆ ¿ Because the probabilities have to add up to unity, the set of all probability distribu- tions forms a simplex in ÊÆ, namely à ´ Ô ¾ ÊÆ Æ ½ Ô ½ µ We can consider the entropy to be a function defined on this simplex. Since it is a continuous function, extreme values will occur at the vertices of this simplex, at points where all except one of the probabilities are zero. If ÔÚ is a vertex, then the entropy there will be À´ÔÚµ ´Æ  ½µ ¼ ÐÓ ´¼µ·½ ÐÓ ´½µ The logarithm of zero is not defined, but the limit of ÜÐÓ ´Üµ as Ü tends to ¼ ex- ists and is equal to zero. If we take the limiting values, we see that at any vertex, À´ÔÚµ ¼, as ÐÓ ´½µ ¼. This is the minimum value of the entropy function. The entropy function has a maximum value at an interior point of the simplex. To find it we can use Lagrange multipliers. THEOREM 1.1 If we have a sample space with Æ elements, the maximum value of the entropy function is ÐÓ ´Æµ.
  • 23. 8 Fundamentals of Information Theory and Coding Design PROOF We want to find the maximum value of À´Ôµ   Æ ½ Ô ÐÓ ´Ô µ (1.5) subject to the constraint Æ ½ Ô ½ (1.6) We introduce the Lagrange multiplier , and put ´Ôµ À´Ôµ · Æ ½ Ô  ½ (1.7) To find the maximum value we have to solve Ô ¼ (1.8) for ½ ¾ Æ and Æ ½ Ô ½ (1.9) Ô  ÐÓ ´Ô µ  ½ · (1.10) so Ô  ½ (1.11) for each . The remaining condition gives Æ  ½ ½ (1.12) which can be solved for , or can be used directly to give Ô ½ Æ (1.13) for all . Using these values for the Ô , we get À´Ôµ  Æ ½ Æ ÐÓ ´½ Ƶ ÐÓ ´Æµ (1.14)
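The two preceding results are easy to check numerically. The sketch below is my own illustration (not from the text), using base-2 logarithms unless stated: it verifies that the uniform distribution on $N$ outcomes attains $\log_2(N)$ bits, that perturbed (non-uniform) distributions have strictly smaller entropy, and that dividing the natural-logarithm entropy by $\ln(2)$ reproduces the value in bits, as in Proposition 1.1.

```python
import math
import random

def entropy(probs, base=2.0):
    """H(P) = -sum p_i log_base(p_i), with 0 log 0 taken as 0."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

N = 8
uniform = [1.0 / N] * N
print(entropy(uniform), math.log2(N))          # both 3.0 bits

# Randomly perturb the uniform distribution and renormalise:
# every perturbed (non-uniform) distribution has entropy below log2(N).
random.seed(0)
for _ in range(3):
    weights = [1.0 + 0.5 * random.random() for _ in range(N)]
    total = sum(weights)
    perturbed = [w / total for w in weights]
    print(entropy(perturbed) < math.log2(N))   # True

# Proposition 1.1: H_r = H_e / ln(r); here r = 2.
p = [0.5, 0.25, 0.125, 0.125]
print(entropy(p, base=math.e) / math.log(2), entropy(p, base=2))  # equal
```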
1.7 A Useful Inequality

LEMMA 1.1
If $p_1, p_2, \ldots, p_N$ and $q_1, q_2, \ldots, q_N$ are all non-negative numbers that satisfy the conditions $\sum_{n=1}^{N} p_n = 1$ and $\sum_{n=1}^{N} q_n = 1$, then
$$-\sum_{i=1}^{N} p_i \log(p_i) \le -\sum_{i=1}^{N} p_i \log(q_i) \qquad (1.15)$$
with equality if and only if $p_i = q_i$ for all $i$.

PROOF We prove the result for the natural logarithm; the result for any other base follows immediately from the identity
$$\ln(x) = \ln(r)\log_r(x) \qquad (1.16)$$
It is a standard result about the logarithm function that
$$\ln x \le x - 1 \qquad (1.17)$$
for $x > 0$, with equality if and only if $x = 1$.

Substituting $x = q_i/p_i$, we get
$$\ln(q_i/p_i) \le q_i/p_i - 1 \qquad (1.18)$$
with equality if and only if $p_i = q_i$. This holds for all $i = 1, 2, \ldots, N$, so if we multiply by $p_i$ and sum over the $i$, we get
$$\sum_{i=1}^{N} p_i \ln(q_i/p_i) \le \sum_{i=1}^{N} (q_i - p_i) = \sum_{i=1}^{N} q_i - \sum_{i=1}^{N} p_i = 1 - 1 = 0 \qquad (1.19)$$
with equality if and only if $p_i = q_i$ for all $i$. So
$$\sum_{i=1}^{N} p_i \ln(q_i) - \sum_{i=1}^{N} p_i \ln(p_i) \le 0 \qquad (1.20)$$
which is the required result.

The inequality can also be written in the form
$$\sum_{i=1}^{N} p_i \log(q_i/p_i) \le 0 \qquad (1.21)$$
with equality if and only if $p_i = q_i$ for all $i$.

Note that putting $q_i = 1/N$ for all $i$ in this inequality gives us an alternative proof that the maximum value of the entropy function is $\log(N)$.
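Lemma 1.1 (often called the Gibbs inequality) is easy to test numerically: the entropy $-\sum_i p_i\log p_i$ never exceeds the cross term $-\sum_i p_i\log q_i$. The Python sketch below is my own check, not part of the text.

```python
import math
import random

def neg_sum_p_log_q(p, q):
    """Compute -sum_i p_i log2(q_i); this equals the entropy when q == p."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def random_distribution(n, rng):
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(42)
for _ in range(5):
    p = random_distribution(4, rng)
    q = random_distribution(4, rng)
    lhs = neg_sum_p_log_q(p, p)   # entropy H(P)
    rhs = neg_sum_p_log_q(p, q)   # cross term with a different distribution Q
    print(f"{lhs:.4f} <= {rhs:.4f}  ->  {lhs <= rhs + 1e-12}")
```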
1.8 Joint Probability Distribution Functions

There are many situations in which it is useful to consider sample spaces that are the Cartesian product of two or more sets.

DEFINITION 1.6 Cartesian Product Let $S = \{s_1, s_2, \ldots, s_M\}$ and $T = \{t_1, t_2, \ldots, t_N\}$ be two sets. The Cartesian product of $S$ and $T$ is the set $S \times T = \{(s_i, t_j) : 1 \le i \le M,\ 1 \le j \le N\}$.

The extension to the Cartesian product of more than two sets is immediate.

DEFINITION 1.7 Joint Probability Distribution A joint probability distribution is a probability distribution on the Cartesian product of a number of sets.

If we have $S$ and $T$ as above, then a joint probability distribution function assigns a probability to each pair $(s_i, t_j)$. We can denote this probability by $p_{ij}$. Since these values form a probability distribution, we have
$$0 \le p_{ij} \le 1 \qquad (1.22)$$
for $1 \le i \le M$, $1 \le j \le N$, and
$$\sum_{i=1}^{M}\sum_{j=1}^{N} p_{ij} = 1 \qquad (1.23)$$

If $P$ is the joint probability distribution function on $S \times T$, the definition of entropy becomes
$$H(P) = -\sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j)\log(P(s_i, t_j)) = -\sum_{i=1}^{M}\sum_{j=1}^{N} p_{ij}\log(p_{ij}) \qquad (1.24)$$

If we want to emphasise the spaces $S$ and $T$, we will denote the entropy of the joint probability distribution on $S \times T$ by $H(P_{S \times T})$ or simply by $H(S, T)$. This is known as the joint entropy of $S$ and $T$.

If there are probability distributions $P_S$ and $P_T$ on $S$ and $T$, respectively, and these are independent, the joint probability distribution on $S \times T$ is given by
$$p_{ij} = P_S(s_i)P_T(t_j) \qquad (1.25)$$
for $1 \le i \le M$, $1 \le j \le N$. If there are correlations between the $s_i$ and $t_j$, then this formula does not apply.

DEFINITION 1.8 Marginal Distribution If $P$ is a joint probability distribution function on $S \times T$, the marginal distribution on $S$ is $P_S\colon S \to [0,1]$ given by
$$P_S(s_i) = \sum_{j=1}^{N} P(s_i, t_j) \qquad (1.26)$$
for $1 \le i \le M$, and the marginal distribution on $T$ is $P_T\colon T \to [0,1]$ given by
$$P_T(t_j) = \sum_{i=1}^{M} P(s_i, t_j) \qquad (1.27)$$
for $1 \le j \le N$.

There is a simple relationship between the entropy of the joint probability distribution function and that of the marginal distribution functions.

THEOREM 1.2
If $P$ is a joint probability distribution function on $S \times T$, and $P_S$ and $P_T$ are the marginal distributions on $S$ and $T$, respectively, then
$$H(P) \le H(P_S) + H(P_T) \qquad (1.28)$$
with equality if and only if the marginal distributions are independent.

PROOF
$$H(P_S) = -\sum_{i=1}^{M} P_S(s_i)\log(P_S(s_i)) = -\sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j)\log(P_S(s_i)) \qquad (1.29)$$
and similarly
$$H(P_T) = -\sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j)\log(P_T(t_j)) \qquad (1.30)$$
So
$$H(P_S) + H(P_T) = -\sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j)\left[\log(P_S(s_i)) + \log(P_T(t_j))\right] = -\sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j)\log(P_S(s_i)P_T(t_j)) \qquad (1.31)$$
Also,
$$H(P) = -\sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j)\log(P(s_i, t_j)) \qquad (1.32)$$
Since
$$\sum_{i=1}^{M}\sum_{j=1}^{N} P(s_i, t_j) = 1 \qquad (1.33)$$
and
$$\sum_{i=1}^{M}\sum_{j=1}^{N} P_S(s_i)P_T(t_j) = \sum_{i=1}^{M} P_S(s_i)\sum_{j=1}^{N} P_T(t_j) = 1 \qquad (1.34)$$
we can use the inequality of Lemma 1.1 to conclude that
$$H(P) \le H(P_S) + H(P_T) \qquad (1.35)$$
with equality if and only if $P(s_i, t_j) = P_S(s_i)P_T(t_j)$ for all $i$ and $j$, that is, if the two marginal distributions are independent.

1.9 Conditional Probability and Bayes' Theorem

DEFINITION 1.9 Conditional Probability If $S$ is a sample space with a probability distribution function $P$, and $A$ and $B$ are events in $S$, the conditional probability of $A$ given $B$ is
$$P(A|B) = \frac{P(A \cap B)}{P(B)} \qquad (1.36)$$

It is obvious that
$$P(A|B)P(B) = P(A \cap B) = P(B|A)P(A) \qquad (1.37)$$

Almost as obvious is one form of Bayes' Theorem:

THEOREM 1.3
If $S$ is a sample space with a probability distribution function $P$, and $A$ and $B$ are events in $S$, then
$$P(A|B) = \frac{P(B|A)P(A)}{P(B)} \qquad (1.38)$$
Bayes' Theorem is important because it enables us to derive probabilities of hypotheses from observations, as in the following example.

EXAMPLE 1.7
We have two jars, A and B. Jar A contains 8 green balls and 2 red balls. Jar B contains 3 green balls and 7 red balls. One jar is selected at random and a ball is drawn from it.

We have probabilities as follows. The set of jars forms one sample space, $S = \{A, B\}$, with $P(A) = 0.5 = P(B)$, as one jar is as likely to be chosen as the other. The set of colours forms another sample space, $T = \{G, R\}$. The probability of drawing a green ball is
$$P(G) = 11/20 = 0.55$$
as 11 of the 20 balls in the jars are green. Similarly,
$$P(R) = 9/20 = 0.45$$

We have a joint probability distribution over the colours of the balls and the jars, with the probability of selecting Jar A and drawing a green ball being given by
$$P((G, A)) = 0.4$$
Similarly, we have the probability of selecting Jar A and drawing a red ball
$$P((R, A)) = 0.1$$
the probability of selecting Jar B and drawing a green ball
$$P((G, B)) = 0.15$$
and the probability of selecting Jar B and drawing a red ball
$$P((R, B)) = 0.35$$

We have the conditional probabilities: given that Jar A was selected, the probability of drawing a green ball is
$$P(G|A) = 0.8$$
and the probability of drawing a red ball is
$$P(R|A) = 0.2$$
Given that Jar B was selected, the corresponding probabilities are:
$$P(G|B) = 0.3$$
and
$$P(R|B) = 0.7$$

We can now use Bayes' Theorem to work out the probability of having drawn from either jar, given the colour of the ball that was drawn. If a green ball was drawn, the probability that it was drawn from Jar A is
$$P(A|G) = \frac{P(G|A)P(A)}{P(G)} = \frac{0.8 \times 0.5}{0.55} = 0.7273$$
while the probability that it was drawn from Jar B is
$$P(B|G) = \frac{P(G|B)P(B)}{P(G)} = \frac{0.3 \times 0.5}{0.55} = 0.2727$$
If a red ball was drawn, the probability that it was drawn from Jar A is
$$P(A|R) = \frac{P(R|A)P(A)}{P(R)} = \frac{0.2 \times 0.5}{0.45} = 0.2222$$
while the probability that it was drawn from Jar B is
$$P(B|R) = \frac{P(R|B)P(B)}{P(R)} = \frac{0.7 \times 0.5}{0.45} = 0.7778$$

(In this case, we could have derived these conditional probabilities from the joint probability distribution, but we chose not to do so to illustrate how Bayes' Theorem allows us to go from the conditional probabilities of the colours given the jar selected to the conditional probabilities of the jars selected given the colours drawn.)

1.10 Conditional Probability Distributions and Conditional Entropy

In this section, we have a joint probability distribution $P$ on a Cartesian product $S \times T$, where $S = \{s_1, s_2, \ldots, s_M\}$ and $T = \{t_1, t_2, \ldots, t_N\}$, with marginal distributions $P_S$ and $P_T$.

DEFINITION 1.10 Conditional Probability of $s_i$ given $t_j$ For $s_i \in S$ and $t_j \in T$, the conditional probability of $s_i$ given $t_j$ is
$$P(s_i|t_j) = \frac{P(s_i, t_j)}{P_T(t_j)} = \frac{P(s_i, t_j)}{\sum_{i=1}^{M} P(s_i, t_j)} \qquad (1.39)$$
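A compact Python sketch of Example 1.7 (my own illustration, not from the text): starting from the joint distribution over (colour, jar) pairs, it recovers the marginals and then the Bayes posteriors computed above.

```python
# Joint distribution over (colour, jar) pairs for Example 1.7.
joint = {
    ("G", "A"): 0.40, ("R", "A"): 0.10,
    ("G", "B"): 0.15, ("R", "B"): 0.35,
}

def marginal(index):
    """Marginal distribution over one coordinate (0 = colour, 1 = jar)."""
    out = {}
    for pair, p in joint.items():
        out[pair[index]] = out.get(pair[index], 0.0) + p
    return out

colour_marginal = marginal(0)   # {'G': 0.55, 'R': 0.45}
jar_marginal = marginal(1)      # {'A': 0.5, 'B': 0.5}

def posterior(jar, colour):
    """P(jar | colour) via Bayes' Theorem, equation (1.38)."""
    likelihood = joint[(colour, jar)] / jar_marginal[jar]   # P(colour | jar)
    return likelihood * jar_marginal[jar] / colour_marginal[colour]

for colour in ("G", "R"):
    for jar in ("A", "B"):
        print(f"P({jar}|{colour}) = {posterior(jar, colour):.4f}")
```

Running this reproduces the four posterior probabilities 0.7273, 0.2727, 0.2222 and 0.7778 obtained above.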
  • 30. Entropy and Information 15 DEFINITION 1.11 Conditional Probability Distribution given Ø For a fixed Ø , the conditional probabilities È ´× Ø µ sum to 1 over , so they form a probability distribution on Ë, the conditional probability distribution given Ø . We will denote this by ÈË Ø . DEFINITION 1.12 Conditional Entropy given Ø The conditional entropy given Ø is the entropy of the conditional probability distribution on Ë given Ø . It will be denoted À´ÈË Ø µ. À´ÈË Ø µ   Å ½ È ´× Ø µ ÐÓ ´È ´× Ø µµ (1.40) DEFINITION 1.13 Conditional Probability Distribution on Ë given Ì The conditional probability distribution on Ë given Ì is the weighted average of the conditional probability distributions given Ø for all . It will be denoted ÈË Ì . ÈË Ì ´× µ Æ ½ ÈÌ ´Ø µÈË Ø ´× µ (1.41) DEFINITION1.14 ConditionalEntropygivenÌ Theconditional entropy given Ì is the weighted average of the conditional entropies on Ë given Ø for all Ø ¾ Ì . It will be denoted À´ÈË Ì µ. À´ÈË Ì µ   Æ ½ ÈÌ ´Ø µ Å ½ È ´× Ø µ ÐÓ ´È ´× Ø µµ (1.42) Since ÈÌ ´Ø µÈ ´× Ø µ È ´× Ø µ, we can re-write this as À´ÈË Ì µ   Å ½ Æ ½ È ´× Ø µ ÐÓ ´È ´× Ø µµ (1.43) We now prove two simple results about the conditional entropies. THEOREM 1.4 À´È µ À´ÈÌ µ · À´ÈË Ì µ À´È˵ · À´ÈÌ Ëµ
  • 31. 16 Fundamentals of Information Theory and Coding Design PROOF À´Èµ   Å ½ Æ ½ È´× Ø µ ÐÓ ´È´× Ø µµ   Å ½ Æ ½ È´× Ø µ ÐÓ ´ÈÌ´Ø µÈ´× Ø µµ   Å ½ Æ ½ È´× Ø µ ÐÓ ´ÈÌ´Ø µµ   Å ½ Æ ½ È´× Ø µ ÐÓ ´È´× Ø µµ   Æ ½ ÈÌ´Ø µ ÐÓ ´ÈÌ´Ø µµ   Å ½ Æ ½ È´× Ø µ ÐÓ ´È´× Ø µµ À´È̵ · À´ÈË Ìµ (1.44) The proof of the other equality is similar. THEOREM 1.5 À´ÈË Ìµ À´È˵ with equality if and only if ÈË and ÈÌ are independent. PROOF From the previous theorem, À´Èµ À´È̵ · À´ÈË Ìµ From Theorem 1.2, À´Èµ À´È˵ · À´È̵ with equality if and only if ÈË and ÈÌ are independent. So À´È̵ · À´ÈË Ìµ À´È˵ · À´È̵. Subtracting À´È̵ from both sides we get À´ÈË Ìµ À´È˵, with equality if and only if ÈË and ÈÌ are independent. This result is obviously symmetric in Ë and Ì; so we also have À´ÈÌ Ëµ À´È̵ with equality if and only if ÈË and ÈÌ are independent. We can sum up this result by saying the conditioning reduces entropy or conditioning reduces uncertainty. 1.11 Information Sources Most of this book will be concerned with random sequences. Depending on the context, such sequences may be called time series, (discrete) stochastic processes or
signals. The first term is used by statisticians, the second by mathematicians and the third by engineers. This may reflect differences in the way these people approach the subject: statisticians are primarily interested in describing such sequences in terms of probability theory, mathematicians are interested in the behaviour of such series and the ways in which they may be generated and engineers are interested in ways of using such sequences and processing them to extract useful information from them.

A device or situation that produces such a sequence is called an information source. The elements of the sequence are usually drawn from a finite set, which may be referred to as the alphabet. The source can be considered to be emitting an element of the alphabet at each instant of a sequence of instants in time. The elements of the alphabet are referred to as symbols.

EXAMPLE 1.8
Tossing a coin repeatedly and recording the outcomes as heads (H) or tails (T) gives us a random sequence whose alphabet is $\{H, T\}$.

EXAMPLE 1.9
Throwing a die repeatedly and recording the number of spots on the uppermost face gives us a random sequence whose alphabet is $\{1, 2, 3, 4, 5, 6\}$.

EXAMPLE 1.10
Computers and telecommunications equipment generate sequences of bits which are random sequences whose alphabet is $\{0, 1\}$.

EXAMPLE 1.11
A text in the English language is a random sequence whose alphabet is the set consisting of the letters of the alphabet, the digits and the punctuation marks. While we normally consider text to be meaningful rather than random, it is only possible to predict which letter will come next in the sequence in probabilistic terms, in general.

The last example above illustrates the point that a random sequence may not appear to be random at first sight. The difference between the earlier examples and the final example is that in the English language there are correlations between each letter in the sequence and those that precede it. In contrast, there are no such correlations in the cases of tossing a coin or throwing a die repeatedly. We will consider both kinds of information sources below.
  • 33. 18 Fundamentals of Information Theory and Coding Design An obvious question that is raised by the term “information source” is: What is the “information” that the source produces? A second question, perhaps less obvious, is: How can we measure the information produced by an information source? An information source generates a sequence of symbols which has a certain degree of unpredictability. The more unpredictable the sequence is, the more information is conveyed by each symbol. The information source may impose structure on the sequence of symbols. This structure will increase the predictability of the sequence and reduce the information carried by each symbol. The random behaviour of the sequence may be described by probability distribu- tions over the alphabet. If the elements of the sequence are uncorrelated, a simple probability distribution over the alphabet may suffice. In other cases, conditional probability distributions may be required. We have already seen that entropy is a measure of predictability. For an information source, the information content of the sequence that it generates is measured by the entropy per symbol. We can compute this if we make assumptions about the kinds of structure that the information source imposes upon its output sequences. To describe an information source completely, we need to specify both the alpha- bet and the probability distribution that governs the generation of sequences. The entropy of the information source Ë with alphabet and probability distribution È will be denoted by À´Ëµ in the following sections, even though it is actually the en- tropy of È. Later on, we will wish to concentrate on the alphabet and will use À´ µ to denote the entropy of the information source, on the assumption that the alphabet will have a probability distribution associated with it. 1.12 Memoryless Information Sources For a memoryless information source, there are no correlations between the outputs of the source at different times. For each instant at which an output is emitted, there is a probability distribution over the alphabet that describes the probability of each symbol being emitted at that instant. If all the probability distributions are the same, the source is said to be stationary. If we know these probability distributions, we can calculate the information content of the sequence. EXAMPLE 1.12 Tossing a fair coin gives us an example of a stationary memoryless information source. At any instant, the probability distribution is given by È´Àµ ¼ , ȴ̵ ¼ ¼. This probability distribution has an entropy of 1 bit; so the information content is 1 bit/symbol.
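As an illustration (my own sketch, not from the text), the entropy per symbol of a stationary memoryless source can be estimated from a long output sequence by using the empirical symbol frequencies; for the fair coin of Example 1.12 the estimate settles near 1 bit/symbol.

```python
import math
import random
from collections import Counter

def empirical_entropy(sequence) -> float:
    """Estimate entropy per symbol (in bits) from observed symbol frequencies."""
    counts = Counter(sequence)
    n = len(sequence)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

rng = random.Random(1)
# Stationary memoryless source: a fair coin, as in Example 1.12.
sequence = [rng.choice("HT") for _ in range(100_000)]
print(f"{empirical_entropy(sequence):.4f} bits/symbol")   # close to 1.0
```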
  • 34. Entropy and Information 19 EXAMPLE 1.13 As an exampleof a non-stationarymemorylessinformationsource, suppose we have a faircoinandadiewith Àpainted on fourfacesandÌpainted on two faces. Tossing the coin and throwing the die in alternation will create a memoryless information source with alphabet À Ì . Every time the coin is tossed, the probability distribution of the outcomes is È´Àµ ¼ , ȴ̵ ¼ , and every time the die is thrown, the probability distribution is È´Àµ ¼ , ȴ̵ ¼ ¿¿¿. The probability distribution of the outcomes of tossing the coin has an entropy of 1 bit. The probability distribution of the outcomes of throwing the die has an entropy of 0.918 bits. The information content of the sequence is the average entropy per symbol, which is 0.959 bits/symbol. Memoryless information sources are relatively simple. More realistic information sources have memory, which is the property that the emission of a symbol at any instant depends on one or more of the symbols that were generated before it. 1.13 Markov Sources and n-gram Models Markov sources and n-gram models are descriptions of a class of information sources with memory. DEFINITION 1.15Markov Source A Markov source consists of an alphabet , a set of states ¦, a set of transitions between states, a set of labels for the transitions and two sets of probabilities. The first set of probabilities is the initial probability distribution on the set of states, which determines the probabilities of sequences starting with each symbol in the alphabet. The second set of probabilities is a set of transition probabilities. For each pair of states, and , the probability of a transition from to is È´ µ. (Note that these probabilities are fixed and do not depend on time, so that there is an implicit assumption of stationarity.) The labels on the transitions are symbols from the alphabet. To generate a sequence, a state is selected on the basis of the initial probability distri- bution. A transition from this state to another state (or to the same state) is selected on the basis of the transition probabilities, and the label of this transition is output. This process is repeated to generate the sequence of output symbols. It is convenient to represent Markov models diagrammatically in the form of a graph, with the states represented by vertices and the transitions by edges, as in the follow- ing example.
  • 35. 20 Fundamentals of Information Theory and Coding Design ²±³°½ ²± ³°¾ ²± ³°¿           © Label 1, È´¾ ½µ ½ ¼ Á Label 1, È´½ ¿µ ¼ ¹Label 0, È´¿ ¾µ ½ ¼ Label 0, È´¾ ¿µ ¼ FIGURE 1.2 Diagrammatic representation of a Markov source. EXAMPLE 1.14 Consider a Markov source with alphabet ¼ ½ and set of states ¦ ½ ¾ ¿ . Suppose there are four transitions: 1. ½ ¾, with label 1 and È´¾ ½µ ½ ¼; 2. ¾ ¿, with label 0 and È´¿ ¾µ ½ ¼; 3. ¿ ½, with label 1 and È´½ ¿µ ¼ ; 4. ¿ ¾, with label 0 and È´¾ ¿µ ¼ . The initial probability distribution is È´ ½µ ½ ¿, È´ ¾µ ½ ¿, È´ ¿µ ½ ¿. The diagrammatic representation of this is shown in Figure 1.2. The random sequences generated by this source all consist of subsequences of an even number of 0’s separated by a pair of 1’s, except at the very beginning of the sequence, where there may be a single 0 or 1. It is possible to describe these sources without explicit reference to the states. In an n-gram model, the description is solely in terms of the probabilities of all the possible sequences of symbols of length Ò.
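The generation procedure described above is straightforward to simulate. The Python sketch below is my own illustration of a source with the structure of Example 1.14 (three states, transitions labelled with output symbols, uniform initial distribution); the 0.6/0.4 split of the transition probabilities out of state 3 is my reading of that example and should be treated as an assumption.

```python
import random

# States 1, 2, 3; each transition carries (next_state, output_label, probability).
# Structure follows Example 1.14; the 0.6/0.4 split out of state 3 is an assumption.
transitions = {
    1: [(2, "1", 1.0)],
    2: [(3, "0", 1.0)],
    3: [(1, "1", 0.6), (2, "0", 0.4)],
}
initial = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}

def generate(length: int, rng: random.Random) -> str:
    state = rng.choices(list(initial), weights=initial.values())[0]
    output = []
    for _ in range(length):
        nxt, label, _ = rng.choices(
            transitions[state],
            weights=[p for _, _, p in transitions[state]],
        )[0]
        output.append(label)
        state = nxt
    return "".join(output)

print(generate(40, random.Random(7)))
# The output shows runs of an even number of 0's separated by pairs of 1's,
# apart from a possible single 0 or 1 at the very start, as described above.
```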
  • 36. Entropy and Information 21 EXAMPLE 1.15 The following probabilities give us a 3-gram model on the language ¼ ½ . È ´¼¼¼µ ¼ ¿¾ È ´¼¼½µ ¼ ¼ È ´¼½¼µ ¼ ½ È ´¼½½µ ¼ ½ È ´½¼¼µ ¼ ½ È ´½¼½µ ¼ ¼ È ´½½¼µ ¼ ¼ È ´½½½µ ¼ ¼ To describe the relationship between n-gram models and Markov sources, we need to look at special cases of Markov sources. DEFINITION 1.16 Ñth-order Markov Source A Markov source whose states are sequences of Ñ symbols from the alphabet is called an Ñth-order Markov source. When we have an Ñth-order Markov model, the transition probabilities are usually given in terms of the probabilities of single symbols being emitted when the source is in a given state. For example, in a second-order Markov model on ¼ ½ , the transition probability from ¼½ to ½¼, which would be represented by È ´½¼ ¼½µ, would be represented instead by the probability of emission of ¼ when in the state ¼½, that is È ´¼ ¼½µ. Obviously, some transitions are impossible. For example, it is not possible to go from the state ¼½ to the state ¼¼, as the state following ¼½ must have ½ as its first symbol. We can construct a Ñth-order Markov model from an ´Ñ · ½µ-gram model and an Ñ-gram model. The Ñ-gram model gives us the probabilities of strings of length Ñ, such as È ´×½ ×¾ ×ѵ. To find the emission probability of × from this state, we set È ´× ×½ ×¾ ×ѵ È ´×½ ×¾ ×Ñ ×µ È ´×½ ×¾ ×ѵ (1.45) where the probability È ´×½ ×¾ ×Ñ ×µ is given by the ´Ñ · ½µ-gram model. EXAMPLE 1.16 In the previous example 1.15 we had a 3-gram model on the language ¼ ½ given by È ´¼¼¼µ ¼ ¿¾ È ´¼¼½µ ¼ ¼ È ´¼½¼µ ¼ ½ È ´¼½½µ ¼ ½ È ´½¼¼µ ¼ ½ È ´½¼½µ ¼ ¼ È ´½½¼µ ¼ ¼ È ´½½½µ ¼ ¼
  • 37. 22 Fundamentals of Information Theory and Coding Design P(1|01)=0.5 P(0|11)=0.6P(1|00)=0.2 01 11 1000 P(0|10)=0.8 P(1|10)=0.2 P(0|01)=0.5 P(1|11)=0.4 P(0|00)=0.8 FIGURE 1.3 Diagrammatic representation of a Markov source equivalent to a 3-gram model. If a 2-gram model for the same source is given by È´¼¼µ ¼ , È´¼½µ ¼ ¿, È´½¼µ ¼ ¾ and È´½½µ ¼ ½, then we can construct a second-order Markov source as follows: È´¼ ¼¼µ È´¼¼¼µ È´¼¼µ ¼ ¿¾ ¼ ¼ È´½ ¼¼µ È´¼¼½µ È´¼¼µ ¼ ¼ ¼ ¼ ¾ È´¼ ¼½µ È´¼½¼µ È´¼½µ ¼ ½ ¼ ¿ ¼ È´½ ¼½µ È´¼½½µ È´¼½µ ¼ ½ ¼ ¿ ¼ È´¼ ½¼µ È´½¼¼µ È´½¼µ ¼ ½ ¼ ¾ ¼ È´½ ½¼µ È´½¼½µ È´½¼µ ¼ ¼ ¼ ¾ ¼ ¾ È´¼ ½½µ È´½½¼µ È´½½µ ¼ ¼ ¼ ½ ¼ È´½ ½½µ È´½½½µ È´½½µ ¼ ¼ ¼ ½ ¼ Figure 1.3 shows this Markov source. To describe the behaviour of a Markov source mathematically, we use the transition matrix of probabilities. If the set of states is ¦ ½ ¾ Æ
  • 38. Entropy and Information 23 the transition matrix is the Æ ¢Æ matrix ¥ ¾ È´½ ½µ È´½ ¾µ ¡¡¡ È´½ Ƶ È´¾ ½µ È´¾ ¾µ ¡¡¡ È´¾ Ƶ ... ... ... ... È´Æ ½µ È´Æ ¾µ ¡¡¡ È´Æ Æµ ¿ (1.46) The probability of the source being in a given state varies over time. Let ÛØ be the probability of the source being in state at time Ø, and set ÏØ ¾ ÛØ ½ ÛØ ¾ ... ÛØ Æ ¿ (1.47) Then ϼ is the initial probability distribution and ÏØ·½ ¥ÏØ (1.48) and so, by induction, ÏØ ¥Ø ϼ (1.49) Because they all represent probability distributions, each of the columns of ¥ must add up to ½, and all the ÛØ must add up to ½ for each Ø. 1.14 Stationary Distributions The vectors ÏØ describe how the behaviour of the source changes over time. The asymptotic (long-term) behaviour of sources is of interest in some cases. EXAMPLE 1.17 Consider a source with transition matrix ¥ ¼ ¼ ¾ ¼ ¼ Suppose the initial probability distribution is ϼ ¼ ¼
  • 39. 24 Fundamentals of Information Theory and Coding Design Then Ͻ ¥Ï¼ ¼ ¼ ¾ ¼ ¼ ¼ ¼ ¼ ¼ Similarly, Ͼ ¥Ï½ ¼ ¿ ¼ Ï¿ ¥Ï¾ ¼ ¿ ¼ Ï ¥Ï¿ ¼ ¿¿ ¼ ¾ and so on. Suppose instead that ϼ × ½ ¿ ¾ ¿ Then Ͻ × ¥Ï¼ × ¼ ¼ ¾ ¼ ¼ ½ ¿ ¾ ¿ ½ ¿ ¾ ¿ ϼ × so that ÏØ × Ï¼ × for all Ø ¼. This distribution will persist for all time. In the example above, the initial distribution ϼ × has the property that ¥Ï¼ × Ï¼ × (1.50) and persists for all time. DEFINITION 1.17 Stationary Distribution A probability distribution Ï over the states of a Markov source with transition matrix ¥ that satisfies the equation ¥Ï Ï is a stationary distribution. As shown in the example, if ϼ is a stationary distribution, it persists for all time, ÏØ Ï¼ for all Ø. The defining equation shows that a stationary distribution Ï must be an eigenvector of ¥with eigenvalue ½. To find a stationary distribution for ¥, we must solve the equation ¥Ï Ï together with the condition that È Û ½. EXAMPLE 1.18 Suppose ¥ ¾ ¼ ¾ ¼ ¼ ¼ ¼¼ ¼ ¼ ¼ ¼¼ ¼ ¾ ¼ ¾ ¼ ¼ ¼ ¿
  • 40. Entropy and Information 25 Then the equation ¥Ï Ï gives ¼ ¾ Û½ ·¼ ¼Û¾ ·¼ ¼¼Û¿ Û½ ¼ ¼Û½ ·¼ ¼¼Û¾ ·¼ ¾ Û¿ Û¾ ¼ ¾ Û½ ·¼ ¼Û¾ ·¼ Û¿ Û¿ The first equation gives us ¼ Û½  ¼ ¼Û¾ ¼ and the other two give ¼ ¼Û½  ½ ¼¼Û¾ ·¼ ¾ Û¿ ¼ ¼ ¾ Û½ ·¼ ¼Û¾  ¼ ¾ Û¿ ¼ from which we get  ¾ ¼¼Û¾ ·¼ Û¿ ¼ So Û½ ¾ ¿Û¾ and Û¿ ¿Û¾ Substituting these values in Û½ ·Û¾ ·Û¿ ½ we get ¾ ¿Û¾ ·Û¾ · ¿Û¾ ½ which gives us Û¾ ¿ ½¿ Û½ ¾ ½¿ Û¿ ½¿ So the stationary distribution is Ï ¾ ¾ ½¿ ¿ ½¿ ½¿ ¿ In the examples above, the source has an unique stationary distribution. This is not always the case.
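Stationary distributions can also be found numerically by solving $\Pi W = W$ together with the normalisation constraint, for example by replacing one row of $(\Pi - I)W = 0$ with the equation $\sum_i w_i = 1$. The sketch below is my own illustration and assumes NumPy is available; the transition matrix is my reconstruction of the one in Example 1.18 (its stationary distribution works out to the $2/13$, $3/13$, $8/13$ found above), so treat the specific entries as an assumption.

```python
import numpy as np

def stationary_distribution(Pi: np.ndarray) -> np.ndarray:
    """Solve Pi @ W = W with sum(W) = 1 for a column-stochastic matrix Pi."""
    n = Pi.shape[0]
    A = Pi - np.eye(n)
    A[-1, :] = 1.0                  # replace the last equation by the normalisation
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

# Transition matrix as read from Example 1.18; columns give the transition
# probabilities out of each state and therefore sum to 1.
Pi = np.array([
    [0.25, 0.50, 0.00],
    [0.50, 0.00, 0.25],
    [0.25, 0.50, 0.75],
])
W = stationary_distribution(Pi)
print(W)                    # about [0.1538, 0.2308, 0.6154] = [2/13, 3/13, 8/13]
print(Pi @ W - W)           # approximately zero
```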
  • 41. 26 Fundamentals of Information Theory and Coding Design EXAMPLE 1.19 Consider the source with four states and probability transition matrix ¥ ¾ ½ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ½ ¼ ¿ The diagrammatic representation of this source is shown in Figure 1.4. È´ ¾µ ¼ ¾½ È´ ½ ¿µ ¼ È´ ¾ ¿µ ¼ È´ ¿ ¾µ ¼ È´ ½ ½µ ½ ¼ ¿ È´ µ ½ ¼ FIGURE 1.4 A source with two stationary distributions. For this source, any distribution with Û¾ ¼ ¼, Û¿ ¼ ¼and Û½ ·Û ½ ¼satisfies the equation ¥Ï Ï . However, inspection of the transition matrix shows that once the source enters either the first state or the fourth state, it cannot leave it. The only stationary distributions that can occur are Û½ ½ ¼, Û¾ ¼ ¼, Û¿ ¼ ¼, Û ¼ ¼ or Û½ ¼ ¼, Û¾ ¼ ¼, Û¿ ¼ ¼, Û ½ ¼. Some Markov sources have the property that every sequence generated by the source has the same statistical properties. That is, the various frequencies of occurrence of
  • 42. Entropy and Information 27 symbols, pairs of symbols, and so on, obtained from any sequence generated by the source will, as the length of the sequence increases, approach some definite limit which is independent of the particular sequence. Sources that have this property are called ergodic sources. The source of Example 1.19 is not an ergodic source. The sequences generated by that source fall into two classes, one of which is generated by sequences of states that end in the first state, the other of which is generated by sequences that end in the fourth state. The fact that there are two distinct stationary distributions shows that the source is not ergodic. 1.15 The Entropy of Markov Sources There are various ways of defining the entropy of an information source. The fol- lowing is a simple approach which applies to a restricted class of Markov sources. DEFINITION 1.18 Entropy of the th State of a Markov Source The entropy of the th state of a Markov source is the entropy of the probability distribution on the set of transitions from that state. If we denote the probability distribution on the set of transitions from the th state by È , then the entropy of the th state is given by À´È µ   Æ ½ È ´ µ ÐÓ ´È ´ µµ (1.51) DEFINITION 1.19 Unifilar Markov Source A unifilar Markov source is one with the property that the labels on the transitions from any given state are all distinct. We need this property in order to be able to define the entropy of a Markov source. We assume that the source has a stationary distribution.
  • 43. 28 Fundamentals of Information Theory and Coding Design DEFINITION 1.20 Entropy of a Unifilar Markov Source The entropy of a unifilar Markov source Å, whose stationary distribution is given by Û½ Û¾ ÛÆ, and whose transition probabilities are È´ µ for ½ Æ, ½ Æ, is À´Åµ Æ ½ Û À´È µ   Æ ½ Æ ½ Û È´ µ ÐÓ ´È´ µµ (1.52) It can be shown that this definition is consistent with more general definitions of the entropy of an information source. EXAMPLE 1.20 For the Markov source of Example 1.14, there are three states, ½, ¾ and ¿. The probability distribution on the set of transitions from is È for ½ ¾ ¿. Ƚ is given by Ƚ´½µ È´½ ½µ ¼ ¼ Ƚ´¾µ È´¾ ½µ ½ ¼ Ƚ´¿µ È´¿ ½µ ¼ ¼ Its entropy is À´È½µ  ´¼ ¼µ ÐÓ ´¼ ¼µ   ´½ ¼µ´¼ ¼µ   ´¼ ¼µ ÐÓ ´¼ ¼µ ¼ ¼ using the usual convention that ¼ ¼ ÐÓ ´¼ ¼µ ¼ ¼. Ⱦ is given by Ⱦ´½µ È´½ ¾µ ¼ ¼ Ⱦ´¾µ È´¾ ¾µ ¼ ¼ Ⱦ´¿µ È´¿ ¾µ ½ ¼ Its entropy is À´È¾µ  ´¼ ¼µ ÐÓ ´¼ ¼µ   ´¼ ¼µ ÐÓ ´¼ ¼µ   ´½ ¼µ´¼ ¼µ ¼ ¼ È¿ is given by È¿´½µ È´½ ¿µ ¼ È¿´¾µ È´¾ ¿µ ¼ È¿´¿µ È´¿ ¿µ ¼ ¼ Its entropy is À´È¿µ  ´¼ µ´ ¼ ¿ µ   ´¼ µ´ ½ ¿¾½ ¿µ   ´¼ ¼µ ÐÓ ´¼ ¼µ ¼ ¼ The stationary distribution of the source is given by Û½ ¿ ½¿ Û¾ ½¿ Û¿ ½¿
  • 44. Entropy and Information 29 The entropy of the source is À´Åµ ´¼ ¼µ´¿ ½¿µ · ´¼ ¼µ´ ½¿µ · ´¼ ¼ µ´ ½¿µ ¼ ¿ ¿ EXAMPLE 1.21 For the source of Example 1.16, the states are ¼¼, ¼½, ½¼, ½½. ȼ¼ is given by ȼ¼´¼µ È´¼ ¼¼µ ¼ , and ȼ¼´½µ È´½ ¼¼µ ¼ ¾. Its entropy is À´È¼¼µ  ´¼ µ´ ¼ ¿¾½ ¿µ   ´¼ ¾µ´ ¾ ¿¾½ ¿µ ¼ ¾½ ¿ ȼ½ is given by ȼ½´¼µ È´¼ ¼½µ ¼ , and ȼ½´½µ È´½ ¼½µ ¼ . Its entropy is À´È¼½µ  ´¼ µ´ ½ ¼µ   ´¼ µ´ ½ ¼µ ½ ¼ Ƚ¼ is given by Ƚ¼´¼µ È´¼ ½¼µ ¼ , and Ƚ¼´½µ È´½ ½¼µ ¼ ¾. Its entropy is À´È½¼µ  ´¼ µ´ ¼ ¿¾½ ¿µ   ´¼ ¾µ´ ¾ ¿¾½ ¿µ ¼ ¾½ ¿ Ƚ½ is given by Ƚ½´¼µ È´¼ ½½µ ¼ , and Ƚ½´½µ È´½ ½½µ ¼ . Its entropy is À´È½½µ  ´¼ µ´ ¼ ½¼ ¿µ   ´¼ µ´ ½ ¿¾½ ¿µ ¼ ¼ The stationary distribution of the source is given by Û½ ½¾¼ ¾½¿ Û¾ ¿¼ ¾½¿ Û¿ ¿ ¾½¿ Û ¾ ¾½¿ The entropy of the source is À´Åµ ½¾¼´¼ ¾½ ¿µ · ¿¼´½ ¼¼¼¼¼µ · ¿ ´¼ ¾½ ¿µ · ¾ ´¼ ¼ µ ¾½¿ ¼ ¿ ¿ 1.16 Sequences of Symbols It is possible to estimate the entropy of a Markov source using information about the probabilities of occurrence of sequences of symbols. The following results apply
  • 45. 30 Fundamentals of Information Theory and Coding Design to ergodic Markov sources and are stated without proof. In a sense, they justify the use of the conditional probabilities of emission of symbols instead of transition probabilities between states in Ñth-order Markov models. THEOREM 1.6 Given any ¯ ¼ and any Æ ¼, we can find a positive integer Ƽ such that all sequences of length Æ Æ¼ fall into two classes: a set of sequences whose total probability is less than ¯; and the remainder, for which the following inequality holds: ¬ ¬ ¬ ¬ ÐÓ ´½ Ôµ Æ  À ¬ ¬ ¬ ¬ Æ (1.53) where Ôis the probability of the sequence and À is the entropy of the source. PROOF See [6], Appendix 3. THEOREM 1.7 Let Å be a Markov source with alphabet ½ ¾ Ò , and entropy À. Let Æ denote the set of all sequences of symbols from of length Æ. For × ¾ Æ, let È´×µ be the probability of the sequence ×being emitted by the source. Define Æ   ½ Æ ×¾ Æ È´×µ ÐÓ ´È´×µµ (1.54) which is the entropy per symbol of the sequences of Æ symbols. Then Æ is a monotonic decreasing function of Æ and Ð Ñ Æ ½ Æ À (1.55) PROOF See [6], Appendix 3. THEOREM 1.8 Let Å be a Markov source with alphabet ½ ¾ Ò , and entropy À. Let Æ denote the set of all sequences of symbols from of length Æ. For × ¾ Æ ½, let È´× µ be the probability of the source emitting the sequence × followed by the symbol , and let È´ ×µ be the conditional probability of the symbol being emitted immediately after the sequence ×. Define Æ   ×¾ Æ ½ Ò ½ È´× µ ÐÓ ´È´ ×µµ (1.56)
  • 46. Entropy and Information 31 which is the conditional entropy of the next symbol when the ´Æ   ½µ preceding symbols are known. Then Æ is a monotonic decreasing function of Æ and Ð Ñ Æ ½ Æ À (1.57) PROOF See [6], Appendix 3. THEOREM 1.9 If Æ and Æ are defined as in the previous theorems, then Æ Æ Æ  ´Æ  ½µ Æ ½ (1.58) Æ ½ Æ Æ Á ½ Á (1.59) and Æ Æ (1.60) PROOF See [6], Appendix 3. These results show that a series of approximations to the entropy of a source can be obtained by considering only the statistical behaviour of sequences of symbols of increasing length. The sequence of estimates Æ is a better approximation than the sequence Æ . If the dependencies in a source extend over no more than Æ symbols, so that the conditional probability of the next symbol knowing the preced- ing ´Æ  ½µ symbols is the same as the conditional probability of the next symbol knowing the preceding Æ Æ symbols, then Æ À. 1.17 The Adjoint Source of a Markov Source It is possible to approximate the behaviour of a Markov source by a memoryless source. DEFINITION 1.21 Adjoint Source of a Markov Source The adjoint source of a Markov source is the memoryless source with the same alphabet which emits symbols independently of each other with the same probabilities as the Markov source.
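To make the comparison concrete, here is a small Python sketch of my own (not from the text). For a hypothetical two-state unifilar Markov source it computes the source entropy $H(M)$ as the stationary-weighted average of the per-state entropies (equation (1.52)), the symbol probabilities of the adjoint source, and the adjoint's entropy. The adjoint never has smaller entropy, which is formalised below in Theorem 1.10.

```python
import math

def entropy(probs):
    """Entropy in bits, with 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical first-order (hence unifilar) Markov source over {a, b}:
# the state is the last symbol emitted. All numbers here are illustrative.
emit = {                      # emit[state][symbol] = P(symbol | state)
    "a": {"a": 0.9, "b": 0.1},
    "b": {"a": 0.4, "b": 0.6},
}

# Stationary distribution over the states; for this 2-state source it can be
# written down directly (check: 0.8 * 0.9 + 0.2 * 0.4 = 0.8).
w = {"a": 0.8, "b": 0.2}

# Entropy of the Markov source: stationary-weighted average of per-state entropies.
H_markov = sum(w[s] * entropy(emit[s].values()) for s in emit)

# Adjoint source: a memoryless source with the stationary symbol probabilities.
p_adjoint = {x: sum(w[s] * emit[s][x] for s in emit) for x in ("a", "b")}
H_adjoint = entropy(p_adjoint.values())

print(f"H(M)       = {H_markov:.4f} bits/symbol")   # about 0.57
print(f"H(adjoint) = {H_adjoint:.4f} bits/symbol")  # about 0.72, never smaller
```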
  • 47. 32 Fundamentals of Information Theory and Coding Design If we have an Ñth-order Markov source Å with alphabet ½ ¾ Õ , the probabilities of emission of the symbols are È´ µ È´ ½ ¾ ѵȴ ½ ¾ ѵ (1.61) where ½ ¾ Ñ represents a sequence of Ñ symbols from the alphabet of the source, È´ ½ ¾ ѵ is the probability of this sequence in the stationary dis- tribution of the Markov source and the summation over indicates that all such sequences are included in the summation. The adjoint source of this Markov source, denoted Å, is the memoryless source that emits these symbols with the same prob- abilities. EXAMPLE 1.22 For the 3-gram model of Example 1.15, we have transition probabilities È´¼ ¼¼µ ¼ È´½ ¼¼µ ¼ ¾ È´¼ ¼½µ ¼ È´½ ¼½µ ¼ È´¼ ½¼µ ¼ È´½ ½¼µ ¼ ¾ È´¼ ½½µ ¼ È´½ ½½µ ¼ which give us the transition matrix ¥ ¾ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¾ ¼ ¼ ¼ ¾ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¼ ¿ We need to find the stationary distribution of the source. The equation ¥Ï Ï gives ¼ Û½ ·¼ ¼Û¾ ·¼ Û¿ ·¼ ¼Û Û½ ¼ ¾Û½ ·¼ ¼Û¾ ·¼ ¾Û¿ ·¼ ¼Û Û¾ ¼ ¼Û½ ·¼ Û¾ ·¼ ¼Û¿ ·¼ Û Û¿ ¼ ¼Û½ ·¼ Û¾ ·¼ ¼Û¿ ·¼ Û Û Solving these equations together with the constraint Û½ ·Û¾ ·Û¿ ·Û ½ ¼ we get the stationary distribution È´¼¼µ ¾ È´¼½µ È´½¼µ È´½½µ
  • 48. Entropy and Information 33 The probabilities for the adjoint source of the 3-gram models are È´¼µ È´¼ ¼¼µÈ´¼¼µ · È´¼ ¼½µÈ´¼½µ · È´¼ ½¼µÈ´½¼µ · È´¼ ½½µÈ´½½µ ½ ¾¿ and È´½µ È´½ ¼¼µÈ´¼¼µ · È´½ ¼½µÈ´¼½µ · È´½ ½¼µÈ´½¼µ · È´½ ½½µÈ´½½µ ¾¿ Although the probabilities of emission of single symbols are the same for both the Markov source and its adjoint source, the probabilities of emission of sequences of symbols may not be the same. For example the probability of emission of the sequence ¼¼¼ by the Markov source is È´¼ ¼¼µ ¼ , while for the adjoint source it is È´¼µ¿ ¼ ¼¿ (by the assumption of independence). Going from a Markov source to its adjoint reduces the number of constraints on the output sequence and hence increases the entropy. This is formalised by the following theorem. THEOREM 1.10 If Å is the adjoint of the Markov source Å, their entropies are related by À´Åµ À´Åµ (1.62) PROOF If Å is an Ñth-order source with alphabet ½ ¾ Õ , we will denote the states, which are Ñ-tuples of the , by «Á , where ½ Á ÕÑ . We assume that Å has a stationary distribution. The probabilities of emission of the symbols are È´ µ Á ÛÁÈ´ «Áµ (1.63) where the summation is over all states and ÛÁ is the probability of state «Á in the stationary distribution of the source. The entropy of the adjoint is À´Åµ   Õ ½ È´ µ ÐÓ ´È´ µµ
  • 49. 34 Fundamentals of Information Theory and Coding Design   Õ ½ Á ÛÁ È´ «ÁµÐÓ ´È´ µµ   Á ÛÁ Õ ½ È´ «ÁµÐÓ ´È´ µµ (1.64) The entropy of the Áth state of Å is À´ÈÁ µ   Õ ½ È´ «Á µÐÓ ´È´ «Áµµ (1.65) and the entropy of Å is À´Åµ   Á Õ ½ ÛÁÈ´ «ÁµÐÓ ´È´ «Á µµ   Á ÛÁ Õ ½ È´ «ÁµÐÓ ´È´ «Á µµ (1.66) If we apply the inequality of Lemma 1.1 to each summation over , the result follows. 1.18 Extensions of Sources In situations where codes of various types are being developed, it is often useful to consider sequences of symbols emitted by a source. DEFINITION 1.22 Extension of a Stationary Memoryless Source The Òth extension of a stationary memoryless source Ë is the stationary memoryless source whose alphabet consists of all sequences of Ò symbols from the alphabet of Ë, with the emission probabilities of the sequences being the same as the probabilities of occurrence of the sequences in the output of Ë. The Òth extension of Ë will be denoted by ËÒ. Because the emission of successive symbols by Ë is statistically independent, the emission probabilities in ËÒ can be computed by multiplying the appropriate emission probabilities in Ë. EXAMPLE 1.23 Consider the memoryless source Ë with alphabet ¼ ½ and emission probabilities È´¼µ ¼ ¾, È´½µ ¼ .
  • 50. Entropy and Information 35 The second extension of Ë has alphabet ¼¼ ¼½ ½¼ ½½ with emission probabilities È´¼¼µ È´¼µÈ´¼µ ´¼ ¾µ´¼ ¾µ ¼ ¼ È´¼½µ È´¼µÈ´½µ ´¼ ¾µ´¼ µ ¼ ½ È´½¼µ È´½µÈ´¼µ ´¼ µ´¼ ¾µ ¼ ½ È´½½µ È´½µÈ´½µ ´¼ µ´¼ µ ¼ The third extension of Ë has alphabet ¼¼¼ ¼¼½ ¼½¼ ¼½½ ½¼¼ ½¼½ ½½¼ ½½½ with emission probabilities È´¼¼¼µ È´¼µÈ´¼µÈ´¼µ ´¼ ¾µ´¼ ¾µ´¼ ¾µ ¼ ¼¼ È´¼¼½µ È´¼µÈ´¼µÈ´½µ ´¼ ¾µ´¼ ¾µ´¼ µ ¼ ¼¿¾ È´¼½¼µ È´¼µÈ´½µÈ´¼µ ´¼ ¾µ´¼ µ´¼ ¾µ ¼ ¼¿¾ È´¼½½µ È´¼µÈ´½µÈ´½µ ´¼ ¾µ´¼ µ´¼ µ ¼ ½¾ È´½¼¼µ È´½µÈ´¼µÈ´¼µ ´¼ µ´¼ ¾µ´¼ ¾µ ¼ ¼¿¾ È´½¼½µ È´½µÈ´¼µÈ´½µ ´¼ µ´¼ ¾µ´¼ µ ¼ ½¾ È´½½¼µ È´½µÈ´½µÈ´¼µ ´¼ µ´¼ µ´¼ ¾µ ¼ ½¾ È´½½½µ È´½µÈ´½µÈ´½µ ´¼ µ´¼ µ´¼ µ ¼ ½¾ There is a simple relationship between the entropy of a stationary memoryless source and the entropies of its extensions. THEOREM 1.11 If ËÒ is the Òth extension of the stationary memoryless source Ë, their entropies are related by À´ËÒµ ÒÀ´Ëµ (1.67) PROOF If the alphabet of Ë is ½ ¾ Õ , and the emission probabilities of the symbols are È´ µ for ½ ¾ Õ, the entropy of Ë is À´Ëµ   Õ ½ È´ µ ÐÓ ´È´ µµ (1.68) The alphabet of ËÒ consists of all sequences ½ ¾ Ò, where ¾ ½ ¾ Õ . The emission probability of ½ ¾ Ò is È´ ½ ¾ Òµ È´ ½ µÈ´ ¾ µ È´ Òµ (1.69)
  • 51. 36 Fundamentals of Information Theory and Coding Design The entropy of ËÒ is À´ËÒµ   Õ ½ ½ Õ Ò ½ È´ ½ ¾ Òµ ÐÓ ´È´ ½ ¾ Òµµ   Õ ½ ½ Õ Ò ½ È´ ½ ¾ Òµ Ò ½ ÐÓ ´È´ µµ (1.70) We can interchange the order of summation to get À´ËÒµ   Ò ½ Õ ½ ½ Õ Ò ½ È´ ½ ¾ Ò µ ÐÓ ´È´ µµ (1.71) Breaking È´ ½ ¾ Ò µ into the product of probabilities, and rearranging, we get À´ËÒµ   Ò ½ Õ ½ ½ È´ ½ µ Õ ½ È´ µ ÐÓ ´È´ µµ Õ Ò ½ È´ Ò µ (1.72) Since Õ ½ È´ µ ½ (1.73) for , we are left with À´ËÒµ   Ò ½ Õ ½ È´ µ ÐÓ ´È´ µµ Ò ½ À´Ëµ ÒÀ´Ëµ (1.74) We also have extensions of Markov sources. DEFINITION 1.23 Extension of an Ñth-order Markov Source Let Ñ and Ò be positive integers, and let Ô be the smallest integer that is greater than or equal to Ñ Ò. The Òth extension of the Ñth-order Markov source Å is the Ôth-order Markov source whose alphabet consists of all sequences of Ò symbols from the alphabet of Å and for which the transition probabilities between states are equal to the probabilities of the corresponding Ò-fold transitions of the Ñth-order source. We will use ÅÒ to denote the Òth extension of Å.
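Theorem 1.11 is easy to confirm numerically for a memoryless source: building all length-$n$ sequences, multiplying probabilities, and computing the entropy of the result gives exactly $n$ times the original entropy. The Python sketch below is my own check, using the source of Example 1.23 with $P(0) = 0.2$ and $P(1) = 0.8$.

```python
import math
from itertools import product

def entropy(probs):
    """Entropy in bits, with 0 log 0 = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Memoryless source of Example 1.23: P(0) = 0.2, P(1) = 0.8.
source = {"0": 0.2, "1": 0.8}

def extension(src, n):
    """n-th extension: all length-n sequences with product probabilities."""
    return {
        "".join(seq): math.prod(src[sym] for sym in seq)
        for seq in product(src, repeat=n)
    }

H = entropy(source.values())
for n in (1, 2, 3):
    Hn = entropy(extension(source, n).values())
    print(f"n = {n}:  H(S^n) = {Hn:.6f}  =  {n} * H(S) = {n * H:.6f}")
```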
  • 52. Entropy and Information 37 EXAMPLE 1.24 Let Å be the first-order Markov source with alphabet ¼ ½ and transition probabil- ities È´¼ ¼µ ¼ ¿ È´½ ¼µ ¼ È´¼ ½µ ¼ È´½ ½µ ¼ The second extension of Å has Ñ ½ and Ò ¾, so Ô ½. It is a first-order source with alphabet ¼¼ ¼½ ½¼ ½½ . We can calculate the transition probabilities as follows. È´¼¼ ¼¼µ È´¼ ¼µÈ´¼ ¼µ ´¼ ¿µ´¼ ¿µ ¼ ¼ È´¼½ ¼¼µ È´¼ ¼µÈ´½ ¼µ ´¼ ¿µ´¼ µ ¼ ¾½ È´½¼ ¼¼µ È´½ ¼µÈ´¼ ½µ ´¼ µ´¼ µ ¼ ¾ È´½½ ¼¼µ È´½ ¼µÈ´½ ½µ ´¼ µ´¼ µ ¼ ¾ È´¼¼ ½½µ È´¼ ½µÈ´¼ ¼µ ´¼ µ´¼ ¿µ ¼ ½¾ È´¼½ ½½µ È´¼ ½µÈ´½ ¼µ ´¼ µ´¼ µ ¼ ¾ È´½¼ ½½µ È´½ ½µÈ´¼ ½µ ´¼ µ´¼ µ ¼ ¾ È´½½ ½½µ È´½ ½µÈ´½ ½µ ´¼ µ´¼ µ ¼ ¿ EXAMPLE 1.25 Consider the second order Markov source with alphabet ¼ ½ and transition proba- bilities È´¼ ¼¼µ ¼ È´½ ¼¼µ ¼ ¾ È´¼ ¼½µ ¼ È´½ ¼½µ ¼ È´¼ ½¼µ ¼ ¾ È´½ ½¼µ ¼ È´¼ ½½µ ¼ È´½ ½½µ ¼ The transition probabilities of the second extension are
  • 53. 38 Fundamentals of Information Theory and Coding Design È´¼¼ ¼¼µ È´¼ ¼¼µÈ´¼ ¼¼µ ´¼ µ´¼ µ ¼ È´¼½ ¼¼µ È´¼ ¼¼µÈ´½ ¼¼µ ´¼ µ´¼ ¾µ ¼ ½ È´½¼ ¼¼µ È´½ ¼¼µÈ´¼ ¼½µ ´¼ ¾µ´¼ µ ¼ ½¼ È´½½ ¼¼µ È´½ ¼¼µÈ´½ ¼½µ ´¼ ¾µ´¼ µ ¼ ½¼ È´¼¼ ¼½µ È´¼ ¼½µÈ´¼ ½¼µ ´¼ µ´¼ ¾µ ¼ ½¼ È´¼½ ¼½µ È´¼ ¼½µÈ´½ ½¼µ ´¼ µ´¼ µ ¼ ¼ È´½¼ ¼½µ È´½ ¼½µÈ´¼ ½½µ ´¼ µ´¼ µ ¼ ¿¼ È´½½ ¼½µ È´½ ¼½µÈ´½ ½½µ ´¼ µ´¼ µ ¼ ¾¼ È´¼¼ ½¼µ È´¼ ½¼µÈ´¼ ¼¼µ ´¼ ¾µ´¼ µ ¼ ½ È´¼½ ½¼µ È´¼ ½¼µÈ´½ ¼¼µ ´¼ ¾µ´¼ ¾µ ¼ ¼ È´½¼ ½¼µ È´½ ½¼µÈ´¼ ¼½µ ´¼ µ´¼ µ ¼ ¼ È´½½ ½¼µ È´½ ½¼µÈ´½ ¼½µ ´¼ µ´¼ µ ¼ ¼ È´¼¼ ½½µ È´¼ ½½µÈ´¼ ½¼µ ´¼ µ´¼ ¾µ ¼ ½¾ È´¼½ ½½µ È´¼ ½½µÈ´½ ½¼µ ´¼ µ´¼ µ ¼ È´½¼ ½½µ È´½ ½½µÈ´¼ ½½µ ´¼ µ´¼ µ ¼ ¾ È´½½ ½½µ È´½ ½½µÈ´½ ½½µ ´¼ µ´¼ µ ¼ ½ If we denote the states of the second extension by «½ ¼¼, «¾ ¼½, «¿ ½¼ and « ½½, we have the transition probabilities of a first-order Markov source: È´«½ «½µ ¼ È´«¾ «½µ ¼ ½ È´«¿ «½µ ¼ ½¼ È´« «½µ ¼ ½¼ È´«½ «¾µ ¼ ½¼ È´«¾ «¾µ ¼ ¼ È´«¿ «¾µ ¼ ¿¼ È´« «¾µ ¼ ¾¼ È´«½ «¿µ ¼ ½ È´«¾ «¿µ ¼ ¼ È´«¿ «¿µ ¼ ¼ È´« «¿µ ¼ ¼ È´«½ « µ ¼ ½¾ È´«¾ « µ ¼ È´«¿ « µ ¼ ¾ È´« « µ ¼ ½ EXAMPLE 1.26 Consider the fourth order Markov source with alphabet ¼ ½ and transition proba- bilities
  • 54. Entropy and Information 39 È´¼ ¼¼¼¼µ ¼ È´½ ¼¼¼¼µ ¼ ½ È´¼ ¼¼¼½µ ¼ È´½ ¼¼¼½µ ¼ ¾ È´¼ ¼¼½¼µ ¼ È´½ ¼¼½¼µ ¼ ¿ È´¼ ¼¼½½µ ¼ È´½ ¼¼½½µ ¼ È´¼ ¼½¼¼µ ¼ È´½ ¼½¼¼µ ¼ È´¼ ¼½¼½µ ¼ È´½ ¼½¼½µ ¼ È´¼ ¼½½¼µ ¼ ¿ È´½ ¼½½¼µ ¼ È´¼ ¼½½½µ ¼ ¾ È´½ ¼½½½µ ¼ È´¼ ½¼¼¼µ ¼ ½ È´½ ½¼¼¼µ ¼ È´¼ ½¼¼½µ ¼ ¾ È´½ ½¼¼½µ ¼ È´¼ ½¼½¼µ ¼ ¿ È´½ ½¼½¼µ ¼ È´¼ ½¼½½µ ¼ È´½ ½¼½½µ ¼ È´¼ ½½¼¼µ ¼ È´½ ½½¼¼µ ¼ È´¼ ½½¼½µ ¼ È´½ ½½¼½µ ¼ È´¼ ½½½¼µ ¼ È´½ ½½½¼µ ¼ ¿ È´¼ ½½½½µ ¼ È´½ ½½½½µ ¼ ¾ We can use these probabilities to compute the transition probabilities of the second extension, for example, È´¼½ ½¼½¼µ È´¼ ½¼½¼µÈ´½ ¼½¼¼µ ´¼ ¿µ´¼ µ ¼ ½ If we denote the states of the second extension by «½ ¼¼, «¾ ¼½, «¿ ½¼ and « ½½, we have the transition probabilities of a second-order Markov source:
  • 55. 40 Fundamentals of Information Theory and Coding Design È´«½ «½«½µ ¼ ½ È´«¾ «½«½µ ¼ ¼ È´«¿ «½«½µ ¼ ¼ È´« «½«½µ ¼ ¼¾ È´«½ «½«¾µ ¼ È´«¾ «½«¾µ ¼ ¾ È´«¿ «½«¾µ ¼ ½¾ È´« «½«¾µ ¼ ¼ È´«½ «½«¿µ ¼ ¿ È´«¾ «½«¿µ ¼ ¿ È´«¿ «½«¿µ ¼ ½¾ È´« «½«¿µ ¼ ½ È´«½ «½« µ ¼ ½ È´«¾ «½« µ ¼ ¾ È´«¿ «½« µ ¼ ¼ È´« «½« µ ¼ ¿¾ È´«½ «¾«½µ ¼ ¼ È´«¾ «¾«½µ ¼ È´«¿ «¾«½µ ¼ ½¼ È´« «¾«½µ ¼ ¼ È´«½ «¾«¾µ ¼ ½¾ È´«¾ «¾«¾µ ¼ ¾ È´«¿ «¾«¾µ ¼ ¾ È´« «¾«¾µ ¼ ¿ È´«½ «¾«¿µ ¼ ½ È´«¾ «¾«¿µ ¼ ½ È´«¿ «¾«¿µ ¼ ¾ È´« «¾«¿µ ¼ ¾ È´«½ «¾« µ ¼ ½ È´«¾ «¾« µ ¼ ¼ È´«¿ «¾« µ ¼ È´« «¾« µ ¼ ½ È´«½ «¿«½µ ¼ ¼ È´«¾ «¿«½µ ¼ ¼½ È´«¿ «¿«½µ ¼ ¾ È´« «¿«½µ ¼ ½ È´«½ «¿«¾µ ¼ ½ È´«¾ «¿«¾µ ¼ ¼ È´«¿ «¿«¾µ ¼ È´« «¿«¾µ ¼ ¿¾ È´«½ «¿«¿µ ¼ ½ È´«¾ «¿«¿µ ¼ ½ È´«¿ «¿«¿µ ¼ ¾ È´« «¿«¿µ ¼ ¾ È´«½ «¿« µ ¼ ½¾ È´«¾ «¿« µ ¼ ¾ È´«¿ «¿« µ ¼ ½¾ È´« «¿« µ ¼ È´«½ « «½µ ¼ ¼ È´«¾ « «½µ ¼ È´«¿ « «½µ ¼ ½¼ È´« « «½µ ¼ ¼ È´«½ « «¾µ ¼ ½ È´«¾ « «¾µ ¼ ¾ È´«¿ « «¾µ ¼ ½ È´« « «¾µ ¼ ¾ È´«½ « «¿µ ¼ ¿ È´«¾ « «¿µ ¼ ¿ È´«¿ « «¿µ ¼ ½ È´« « «¿µ ¼ ½¾ È´«½ « « µ ¼ È´«¾ « « µ ¼ ¾ È´«¿ « « µ ¼ ½ È´« « « µ ¼ ¼ It is convenient to represent elements of the alphabet of ÅÒ by single symbols as we have done in the examples above. If the alphabet of Å is ½ ¾ Õ , then we will use « for a generic element of the alphabet of ÅÒ, and « ½ ¾ Ò will stand for the sequence ½ ¾ Ò. For further abbreviation, we will let Á stand for ½ ¾ Ò and use «Á to denote « ½ ¾ Ò, and so on. The statistics of the extension ÅÒ are given by the conditional probabilities È´«Â «Á½ «Á¾ «ÁÔµ. In terms of the alphabet of Å, we have È´«Â «Á½ «Á¾ «ÁÔµ È´ ½ Ò ½½ ½Ò ÔÒµ (1.75)
  • 56. Entropy and Information 41 This can also be written as È´«Â «Á½ «Á¾ «ÁÔµ È´ ½ ½½ ½Ò Ôҵȴ ¾ ½¾ ÔÒ ½ µ È´ Ò ½Ò ÔÒ ½ Ò ½µ (1.76) We can use this relationship to prove the following result. THEOREM 1.12 If ÅÒ is the Òth extension of the Markov source Å their entropies are related by À´ÅÒµ ÒÀ´Åµ (1.77) The proof is similar to the proof of the corresponding result for memoryless sources. (Exercise 17 deals with a special case.) Note that if Ñ Ò, then À´Åѵ ÑÀ´Åµ ÒÀ´Åµ À´ÅÒµ (1.78) Since an extension of a Markov source is a Ôth-order Markov source, we can consider its adjoint source. If Å is an Ñth-order Markov source, ÅÒ is an Òth extension of Å, and ÅÒ the adjoint source of the extension, then we can combine the results of Theorem 1.10 and Theorem 1.12 to get À´ÅÒµ À´ÅÒµ ÒÀ´Åµ (1.79) 1.19 Infinite Sample Spaces The concept of entropy carries over to infinite sample spaces, but there are a number of technical issues that have to be considered. If the sample space is countable, the entropy has to be defined in terms of the limit of a series, as in the following example. EXAMPLE 1.27 Suppose the sample space is the set of natural numbers, Æ ¼ ½ ¾ and the probability distribution is given by È´Òµ ¾  Ò ½ (1.80)
  • 57. 42 Fundamentals of Information Theory and Coding Design for Ò ¾ Æ. The entropy of this distribution is À´Èµ   ½ Ò ¼ È´Òµ ÐÓ ´È´Òµ   ½ Ò ¼ ¾  Ò ½´ Ò   ½µ ½ Ò ¼ Ò · ½ ¾Ò·½ (1.81) This infinite sum converges to ¾, so À´Èµ ¾. If the sample space is a continuum, in particular, the real line, the summations be- come integrals. Instead of the probability distribution function, we use the probabil- ity density function , which has the property that È´ µ ´Üµ Ü (1.82) where is a closed interval and is the probability density function. The mean and variance of the probability density function are defined by ½  ½ Ü ´Üµ Ü (1.83) and ¾ ½  ½ ´Ü   µ ¾ ´Üµ Ü (1.84) The obvious generalisation of the definition of entropy for a probability density func- tion defined on the real line is À´ µ   ½  ½ ´Üµ ÐÓ ´ ´Üµµ Ü (1.85) provided this integral exists. This definition was proposed by Shannon, but has been the subject of debate because it is not invariant with respect to change of scale or change of co-ordinates in general. It is sometimes known as the differential entropy. If we accept this definition of the entropy of a continuous distribution, it is easy to compute the entropy of a Gaussian distribution. THEOREM 1.13 The entropy of a Gaussian distribution with mean and variance ¾ is ÐÒ´ Ô ¾ µ in natural units.
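The claim of Theorem 1.13 can be checked by numerical integration of the defining integral (1.85) with the natural logarithm. The sketch below is my own check (not from the text) and assumes NumPy is available; it compares the numerically integrated differential entropy of a Gaussian density with the closed form $\ln(\sigma\sqrt{2\pi e})$.

```python
import numpy as np

mu, sigma = 1.5, 2.0   # illustrative mean and standard deviation

def gaussian_pdf(x):
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# Numerical integration of -integral f(x) ln f(x) dx over a wide grid.
x = np.linspace(mu - 12 * sigma, mu + 12 * sigma, 200_001)
f = gaussian_pdf(x)
integrand = np.zeros_like(f)
mask = f > 0
integrand[mask] = -f[mask] * np.log(f[mask])
h_numeric = np.trapz(integrand, x)

h_closed_form = np.log(sigma * np.sqrt(2 * np.pi * np.e))
print(h_numeric, h_closed_form)   # both about 2.112 natural units
```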
  • 58. Entropy and Information 43 PROOF The density function of the Gaussian distribution is ´Üµ ½ Ô¾ ´Ü  µ¾ ¾ ¾ (1.86) Since this is a probability density function, we have ½  ½ ´Üµ Ü ½ (1.87) Taking the natural logarithm of , we get ÐÒ´ ´Üµµ  ÐÒ´ Ô ¾ µ   ´Ü   µ¾ ¾ ¾ (1.88) By definition, ¾ ½  ½ ´Ü   µ ¾ ´Üµ Ü (1.89) this will be used below. We now calculate the entropy: À´ µ   ½  ½ ´Üµ ÐÒ´ ´Üµµ Ü   ½  ½ ´Üµ  ÐÒ´ Ô ¾ µ   ´Ü   µ¾ ¾ ¾ Ü ½  ½ ´Üµ ÐÒ´ Ô ¾ µ Ü · ½  ½ ´Üµ ´Ü   µ¾ ¾ ¾ Ü ÐÒ´ Ô ¾ µ ½  ½ ´Üµ Ü · ½ ¾ ¾ ½  ½ ´Üµ´Ü   µ ¾ Ü ÐÒ´ Ô ¾ µ · ¾ ¾ ¾ ÐÒ´ Ô ¾ µ · ½ ¾ ÐÒ´ Ô ¾ µ · ÐÓ ´ Ô µ ÐÒ´ Ô ¾ µ (1.90) If the probability density function is defined over the whole real line, it is not possi- ble to find a specific probability density function whose entropy is greater than the entropy of all other probability density functions defined on the real line. However, if we restrict consideration to probability density functions with a given variance, it
  • 59. 44 Fundamentals of Information Theory and Coding Design can be shown that the Gaussian distribution has the maximum entropy of all these distributions. (Exercises 20 and 21 outline the proof of this result.) We have used À´ µ to denote the entropy of the probability distribution whose prob- ability density function is . If is a random variable whose probability density function is , we will denote its entropy by either À´ µ or À´ µ. 1.20 Exercises 1. Let Ë ×½ ×¾ ׿ be a sample space with probability distribution È given by È´×½µ ¼ ¾, È´×¾µ ¼ ¿, ȴ׿µ ¼ . Let be a function defined on Ë by ´×½µ , ´×¾µ  ¾, ´×¿µ ½. What is the expected value of ? 2. Let Ë ×½ ×¾ be a sample space with probability distribution È given by È´×½µ ¼ , È´×¾µ ¼ ¿. Let be the function from Ë to ʾ given by ´×½µ ¾ ¼ ¿ ¼ and ´×¾µ ¼   ¼ What is the expected value of ? 3. Suppose that a fair die is tossed. What is the expected number of spots on the uppermost face of the die when it comes to rest? Will this number of spots ever be seen when the die is tossed? 4. Let Ë ×½ ×¾ ׿ × be a sample space with probability distribution È given by È´×½µ ¼ , È´×¾µ ¼ ¾ , ȴ׿µ ¼ ½¾ , È´× µ ¼ ½¾ . There are sixteen possible events that can be formed from the elements of Ë. Compute the probability and surprise of these events. 5. Let Ë Ë½ ×½ ×Æ be a sample space for some Æ. Compute the en- tropy of each of the following probability distributions on Ë: (a) Æ ¿, È´×½µ ¼ , È´×¾µ ¼ ¾ , ȴ׿µ ¼ ¾ ; (b) Æ , È´×½µ ¼ , È´×¾µ ¼ ¾ , ȴ׿µ ¼ ½¾ , È´× µ ¼ ½¾ ; (c) Æ , È´×½µ ¼ , È´×¾µ ¼ ½¾ , ȴ׿µ ¼ ½¾ , È´× µ ¼ ½¾ , È´× µ ¼ ½¾ ; (d) Æ , È´×½µ ¼ ¾ , È´×¾µ ¼ ¾ , ȴ׿µ ¼ ¾ , È´× µ ¼ ½¾ , È´× µ ¼ ½¾ ;
  • 60. Entropy and Information 45 (e) Æ , È´×½µ ¼ ¼ ¾ , È´×¾µ ¼ ½¾ , ȴ׿µ ¼ ¾ , È´× µ ¼ , È´× µ ¼ ¼ ¾ . 6. Convert the following entropy values from bits to natural units: (a) À´Èµ ¼ ; (b) À´Èµ ½ ¼; (c) À´Èµ ½ ; (d) À´Èµ ¾ ¼; (e) À´Èµ ¾ ; (f) À´Èµ ¿ ¼. 7. Convert the following entropy values from natural units to bits: (a) À´Èµ ¼ ; (b) À´Èµ ½ ¼; (c) À´Èµ ½ ; (d) À´Èµ ¾ ¼; (e) À´Èµ ¾ ; (f) À´Èµ ¿ ¼. 8. Let Ë ×½ ×¾ and Ì Ø½ ؾ Ø¿ . Let È be a joint probability distribution function on Ë ¢ Ì, given by È´×½ ؽµ ¼ , È´×½ ؾµ ¼ ¾ , È´×½ Ø¿µ ¼ ½¾ , È´×¾ ؽµ ¼ ¼ ¾ , È´×¾ ؾµ ¼ ¼¿½¾ , È´×¾ Ø¿µ ¼ ¼¿½¾ . Com- pute the marginal distributions ÈË and ÈÌ and the entropies À´Èµ, À´È˵ and À´ÈÌ µ. Also compute the conditional probability distributions ÈË Ì and ÈÌ Ë and their entropies À´ÈË Ì µ and À´ÈÌ Ëµ. 9. Draw a diagram to represent the Markov source with alphabet ¼ ½ and set of states ¦ ½ ¾ ¿ with the following five transitions: (a) ½ ¾, with label 1 and È´¾ ½µ ¼ ; (b) ½ ¿, with label 0 and È´¿ ½µ ¼ ; (c) ¾ ½, with label 0 and È´½ ¾µ ¼ ; (d) ¾ ¿, with label 1 and È´¿ ¾µ ¼ ¾; (e) ¿ ½, with label 1 and È´½ ¿µ ½ ¼. Write down the transition matrix for this source. Is it possible for this source to generate an output sequence that includes the subsequence ¼¼¼? Is it possible for this source to generate an output sequence that includes the subsequence ½½½? 10. A 2-gram model on the language is given by the probabilities È´ µ ¼ ¼¼¼ È´ µ ¼ ¾¼¼ È´ µ ¼ ½¿¿ È´ µ ¼ ½¿¿ È´ µ ¼ ¼¼¼ È´ µ ¼ ¾¼¼ È´ µ ¼ ¾¼¼ È´ µ ¼ ½¿¿ È´ µ ¼ ¼¼¼ The probabilities of the individual symbols are È´ µ ½ ¿ È´ µ ½ ¿ È´ µ ½ ¿ Construct a (first-order) Markov source from these models. Draw a diagram to represent the source and write down its transition matrix.
  • 61. 46 Fundamentals of Information Theory and Coding Design 11. A 4-gram model on the language ¼ ½ is given by È ´½¼½¼µ ¼ ¼ È ´¼½¼½µ ¼ ¼ with all other probabilities ¼. All the 3-gram probabilities are also ¼, except for È ´¼½¼µ ¼ ¼ È ´½¼½µ ¼ ¼ Construct a third-order Markov source from these models. Draw a diagram to represent the source and write down its transition matrix. 12. Consider a Markov source with transition matrix ¥ ¼ ¿ ¼ ¼ ¼ ¾ and initial probability distribution Ï ¼ ¼ ¼ Compute Ï Ø for Ø ½ ¾ ¿ . 13. Find the stationary distribution of the Markov source whose transition matrix is ¥ ¼ ½ ¼ ¼ ¼ ¾ *14. Prove that a Markov source with two states always has a stationary distribution, provided that none of the transition probabilities are ¼. 15. Find the stationary distribution of the Markov source whose transition matrix is ¥ ¾ ¼ ¼ ¿ ¼ ¾ ¼ ¼ ¾ ¼ ¼ ½ ¼ ¼ ¾ ¿ 16. Compute the entropy of the Markov source in Exercise 9 above. *17. Prove that if Å Ò is the Òth extension of the first-order Markov source Å , their entropies are related by À ´Å Ò µ ÒÀ ´Å µ. 18. Let Ë ½ ¾ ¿ be the set of positive integers and let È be the probability distribution given by È ´ µ ¾  . Let be the function on Ë defined by ´ µ ´ ½µ ·½. What is the expected value of ? 19. Let Ù be the probability density function of the uniform distribution over the closed and bounded interval , so that ٴܵ ´   µ ½ if Ü and ٴܵ ¼ otherwise. Compute the mean, variance and entropy (in natural units) of Ù. The next two exercises prove the result that the Gaussian distribution has the maximum entropy of all distributions with a given variance.
  • 62. Entropy and Information 47 20. Use the identity ÐҴܵ Ü  ½ to show that if and are probability density functions defined on the real line then ½  ½ ´Üµ ÐÒ´ ´Üµµ Ü ½  ½ ´Üµ ÐÒ´ ´Üµµ Ü provided both these integrals exist. 21. Show that if is a probability density function with mean and variance ¾ and is the probability density function of a Gaussian distribution with the same mean and variance, then   ½  ½ ´Üµ ÐÒ´ ´Üµµ Ü ÐÒ´ Ô ¾ µ Conclude that of all the probability density functions on the real line with variance ¾, the Gaussian has the greatest entropy. 22. Compute the entropy of the probability density function of a uniform distribu- tion on a closed interval whose variance is ¾ and confirm that it is less than the entropy of a Gaussian distribution with variance ¾. (Use the results of Exercise 19.) The next four exercises are concerned with the Kullback-Leibler Divergence. *23. Let È and É be probability distributions on the sample space Ë ×½ ×¾ ×Æ with È´× µ Ô and É´× µ Õ for ½ Æ. The Kullback-Leibler Divergence of È and É is defined by ´È ɵ Æ ½ Ô ÐÓ ´Ô Õ µ Compute ´È ɵ and ´É ȵ, when Æ , Ô ¼ ¾ for ½ ¾ ¿ , and Õ½ ¼ ½¾ , Õ¾ ¼ ¾ , Õ¿ ¼ ½¾ , Õ ¼ . Is ´È ɵ ´É ȵ? *24. If È and É are probability distributions on Ë, a finite sample space, show that ´È ɵ ¼ and ´È ɵ ¼ if and only if È É. *25. If È, É, and Ê are probability distributions on Ë ×½ ×¾ ×Æ , with È´× µ Ô , É´× µ Õ and Ê´× µ Ö for ½ Æ, show that ´È ʵ ´È ɵ · ´É ʵ if and only if Æ ½ ´Ô  Õ µ ÐÓ ´Õ Ö µ ¼
  • 63. 48 Fundamentals of Information Theory and Coding Design *26. If the sample space is the real line, it is easier to define the Kullback-Leibler Divergence in terms of probability density functions. If Ô and Õ are probability density functions on Ê, with Դܵ ¼ and մܵ ¼ for all Ü ¾ Ê, then we define ´Ô Õµ ½  ½ Դܵ ÐҴԴܵ մܵµ Ü Compute ´Ô Õµ when Ô is the probability density function of a Gaussian distribution with mean ½ and variance ¾ and Õ is the probability density function of a Gaussian distribution with mean ¾ and variance ¾. The next two exercises deal with the topic of Maximum Entropy Estimation. *27. We have shown in Section 1.6 that the entropy of the uniform distribution on a sample space with Æ elements is ÐÓ ´Æµ and this is the maximum value of the entropy for any distribution defined on that sample space. In many situations, we need to find a probability distribution that satisfies certain constraints and has the maximum entropy of all the distributions that satisfy those constraints. One type of constraint that is common is to require that the mean of the dis- tribution should have a certain value. This can be done using Lagrange multi- pliers. Find the probability distribution on Ë ½ ¾ ¿ that has maximum entropy subject to the condition that the mean of the distribution is ¾. To do this you have to find Ô½, Ô¾, Ô¿ and Ô that maximise the entropy À´Èµ ½ Ô ÐÓ ´½ Ô µ subject to the two constraints ½ Ô ½ (so that the Ô form a probability distribution) and ½ Ô ¾ *28. Find the probability distribution on Ë ½ ¾ ¿ that has maximum entropy subject to the conditions that the mean of the distribution is ¾ and the second moment of the distribution of the distribution is ¾. In this case the constraints are ½ Ô ½ ½ Ô ¾
  • 64. Entropy and Information 49 and ½ ¾ Ô ¾ The next two exercises deal with the topic of Mutual Information. *29. If È is a joint probability distribution on Ë ¢ Ì , the Mutual Information of È , denoted ÁÅ´È µ, is the Kullback-Leibler Divergence between È and the product of the marginals É, given by É´× Ø µ ÈË´× µÈÌ´Ø µ Compute the Mutual Information of È when Ë ×½ ×¾ , Ì Ø½ ؾ Ø¿ , and È is given by È ´×½ ؽµ ¼ , È ´×½ ؾµ ¼ ¾ , È ´×½ Ø¿µ ¼ ½¾ , È ´×¾ ؽµ ¼ ¼ ¾ , È ´×¾ ؾµ ¼ ¼¿½¾ , È ´×¾ Ø¿µ ¼ ¼¿½¾ . *30. Show that if Ë ×½ ×¾ ×Å and Ì Ø½ ؾ ØÆ , an alternative expression for the Mutual Information of È , a joint probability distribution on Ë ¢Ì , is given by ÁÅ´È µ Å ½ Æ ½ È ´× Ø µÐÓ ´È ´× Ø µ ÈË´× µµ 1.21 References [1] R. Ash, Information Theory, John Wiley Sons, New York, 1965. [2] T. M. Cover and J. A. Thomas, Elements of Information Theory, John Wiley Sons, New York, 1991. [3] D. S. Jones, Elementary Information Theory, Oxford University Press, Oxford, 1979. [4] H. S. Leff and A. F. Rex, Eds., Maxwell’s Demon: Entropy, Information, Com- puting, Adam Hilger, Bristol, 1990. [5] R. D. Rosenkrantz, Ed., E. T. Jaynes: Papers on Probability, Statistics and Sta- tistical Physics, D. Reidel, Dordrecht, 1983. [6] C. E. Shannon and W. Weaver, The Mathematical Theory Of Communication, The University of Illinois Press, Urbana, IL, 1949.
  • 66. Chapter 2 Information Channels 2.1 What Are Information Channels? Information does not only have to be used or stored, it also has to be transmitted. In communication systems a transmitter converts, or encodes, a message to a form suit- able for transmission through a communications medium, be it a fibre optic channel, satellite link or radio signal through space. The receiver then detects the transmitted signal and decodes it back to the original message. The encoding and decoding oper- ations will be discussed in the following chapters. Although we will be dealing with digital data, physical transmission and storage has to deal with analog signals and media. Thus digital data has to be modulated in some way prior to transmission and storage, and detected by the receiver to reproduce the original digital sequence. Dis- cussion of modern analog and digital communication systems is beyond the scope of this book and interested readers should refer to the standard textbooks in the area, for example [7]. As we will see, two forms of coding will be necessary. Source coding is required to efficiently store and transmit the information generated by a general-purpose source (e.g., ASCII text, images, audio, etc.) on the storage or transmission medium in use (e.g., computer disk or digital data networks, both requiring conversion to binary data). Channel coding is required to mitigate the effects of noise that may corrupt data during storage and transmission through physical systems. What is noise? Noise can be defined as any unwanted signal or effect in addition to the desired signal. As such there can be many causes of noise: interference from other signals, thermal and spurious effects generated by the electronic device being used to store or transmit the signal, environmental disturbances, etc. In Chapter 1 we were concerned with defining the information content of a source. In this chapter we will be concerned with the equally important problem of defining or measuring the information carrying capacity of a channel. Intuitively a channel can carry no more information than is being pushed through the channel itself, that is, the entropy of the source or message being transmitted through the channel. But as we will see the presence of noise reduces the information carrying capacity of the channel and leads us to define another important quantity (alongside entropy) called mutual information. As we view the channel from an information theoretic 51