Chapter 17

Greedy Algorithms
Introduction
 Dynamic programming is a good technique for many optimization
problems, but may be overkill for many others
 Relies on global optimization of all choices at all steps
 Greedy algorithms make the choice that seems best at the
moment
 Choices are locally optimal, in the hopes that this will lead to a
globally optimal solution
 Because of this, greedy algorithms do not always yield optimal
solutions, but many times they do
An Activity-Selection Problem
 Suppose we have a set S={1, 2, …, n} of n proposed
activities that wish to use a resource - a lecture hall, for
example - that can only be used by one activity at a time
 Each activity i has a start time si, and finish time fi, where si
<= fi
 Selected activities take place during the interval [si, fi)
 Activities i and j are compatible if their respective intervals [si,
fi) and [sj, fj) do not overlap
 The activity-selection problem is to select a maximum-size
set of mutually compatible activities
An Activity-Selection Problem
 Here is a greedy algorithm to select a maximum-size set of
mutually compatible activities
 It assumes that the input activities have been sorted by
nondecreasing finish time
GreedyActivitySelector(time s[], time f[], stack<int> &A, int size)
{
    A.push(0);                  // the first activity is always selected
    int j = 0;                  // index of the most recent addition to A
    for ( int i = 1 ; i < size ; ++i ) {
        if ( s[i] >= f[j] ) {   // compatible with the last selection
            A.push(i);
            j = i;
        }
    }
}
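For concreteness, here is a self-contained version of the same routine with
concrete types (double times, std::stack<int>) plus a small driver; the sample
data follows the textbook's running example and is illustrative only:

#include <iostream>
#include <stack>

// Assumes activities are already sorted by nondecreasing finish time.
void GreedyActivitySelector(const double s[], const double f[],
                            std::stack<int> &A, int size)
{
    A.push(0);                 // the first activity is always selected
    int j = 0;                 // index of the most recent selection
    for ( int i = 1 ; i < size ; ++i ) {
        if ( s[i] >= f[j] ) {  // compatible with the last selection
            A.push(i);
            j = i;
        }
    }
}

int main()
{
    double s[] = { 1, 3, 0, 5, 3, 5,  6,  8,  8,  2, 12 };
    double f[] = { 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 };
    std::stack<int> A;
    GreedyActivitySelector(s, f, A, 11);
    while ( !A.empty() ) {     // prints 10 7 3 0: activities {0, 3, 7, 10}
        std::cout << A.top() << ' ';
        A.pop();
    }
    std::cout << '\n';
}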
An Activity-Selection Problem
 How does it work?
 A collects the selected activities
 j specifies the most recent addition to A
 As such, fj is always the maximum finishing time of any
activity in A
 We use this to check compatibility - an activity is
compatible (and added to A) if its start time is not earlier
than the finish time of the most recent addition to A
An Activity-Selection Problem
 As the algorithm progresses, the activity picked next
is always the one with the earliest finish time that
can legally be scheduled
 This is thus a “greedy” choice in that it leaves as much
opportunity as possible for the remaining activities to be
scheduled
 i.e., it maximizes the amount of unscheduled time
remaining
An Activity-Selection Problem
 How efficient is this algorithm?
 There’s only a single loop, so it can schedule a set of
activities that have already been sorted in O(n) time
 How does this compare with a dynamic
programming approach?
 Base the approach on computing mi iteratively for i=1, 2, …,
n, where mi is the size of the largest set of mutually
compatible activities among activities {1, 2, …, i} - one way
to set up the recurrence is sketched below
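One way to make this precise (a sketch; the slides do not spell out the
recurrence): with activities sorted by finish time, take

    mi = max( mi-1, 1 + mk(i) )

where k(i) is the largest index j < i with fj <= si, and mk(i) = 0 if no such
j exists. The first term excludes activity i, the second includes it, so every
step compares solutions to subproblems - exactly the work the greedy
algorithm's single left-to-right scan avoids.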
Elements of the Greedy Strategy
 Greedy algorithms solve problems by making locally
optimal choices
 This won’t always lead to a globally optimal solution, but
many times will
 Problems that lend themselves to the greedy
strategy exhibit two main properties:
 The greedy-choice property
 Optimal substructure
Elements of the Greedy Strategy
 The greedy-choice property
 A globally optimal solution can be arrived at by
making a locally optimal (greedy) choice
 In dynamic programming, choices may depend on
the answers to previously computed subproblems
 In a greedy algorithm, we make the best choice
we can at the moment then solve the resulting
subproblems
 May depend on choices so far, but cannot depend
on future choices or on solutions to subproblems
 Usually progresses in top-down fashion
Elements of the Greedy Strategy
 Proving that greedy choices yield globally optimal
solutions
 Examine a globally optimal solution
 Modify the solution to make a greedy choice at the first
step, and show that this reduces the problem to a similar
but smaller problem
 Reduces the problem of correctness to demonstrating that an
optimal solution must exhibit optimal substructure
 Apply induction to show that a greedy choice can be used
at every step
Elements of the Greedy Strategy
 Optimal substructure
 A problem exhibits optimal substructure if an optimal
solution to the problem contains within it optimal solutions
to subproblems
 This is a key ingredient of both dynamic programming
and greedy algorithms
Greedy vs. Dynamic Programming
 Consider two similar problems
 The 0-1 knapsack problem
 A thief robbing a store finds n items; the ith item is worth vi
dollars and weighs wi pounds (both integers)
 The thief wants to take the most valuable load he can, subject
to the maximum weight W that he can carry
 What items does he choose?
 The fractional knapsack problem
 Same thief, same problem, except that the thief may take
fractions of items rather than having to either take it or leave it
Greedy vs. Dynamic Programming
 Both problems exhibit the optimal
substructure property
 Removing an item means that the items that
remain must form the most valuable load that can be
carried within the remaining weight allowance
(W - w, where w is the weight of the item or
fraction of an item removed)
Greedy vs. Dynamic Programming
 Solving the fractional knapsack problem
 First compute the value per pound vi/wi for each item
 By the greedy strategy, the thief starts by grabbing as much
as possible of the item with the greatest value per pound
 If he runs out of the first item, he starts with the next
highest value per pound and grabs as much of it as
possible
 And so on, until no more can be carried - see the sketch below
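A minimal runnable sketch of this strategy (the Item struct and driver are
illustrative; the data is the standard textbook example with W = 50):

#include <algorithm>
#include <iostream>
#include <vector>

struct Item { double value, weight; };

// Greedy fractional knapsack: take items in order of value per pound,
// splitting the last item if needed; returns the best achievable value.
double FractionalKnapsack(std::vector<Item> items, double W)
{
    std::sort(items.begin(), items.end(),
              [](const Item &a, const Item &b)
              { return a.value / a.weight > b.value / b.weight; });
    double total = 0.0;
    for ( const Item &it : items ) {
        if ( W <= 0 ) break;
        double take = std::min(it.weight, W);   // whole item, or a fraction
        total += it.value * (take / it.weight);
        W -= take;
    }
    return total;
}

int main()
{
    std::vector<Item> items = { {60, 10}, {100, 20}, {120, 30} };
    std::cout << FractionalKnapsack(items, 50) << '\n';   // prints 240
}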
Greedy vs. Dynamic Programming
 Solving the 0-1 knapsack problem
 The greedy solution doesn’t work here
 Grabbing the item with the highest value per pound doesn’t
guarantee that you get the most value, nor does grabbing the
biggest item
 What must happen in this case is that solutions to
subproblems must be compared to determine which one
provides the most valuable load
 This is the overlapping-subproblems property; a DP sketch follows
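For contrast, a minimal DP sketch for the 0-1 case (assuming integer weights;
the data is the same three items, for which greedy-by-density yields only 160):

#include <algorithm>
#include <iostream>
#include <vector>

// 0-1 knapsack: dp[w] holds the best value achievable with capacity w.
int Knapsack01(const std::vector<int> &v, const std::vector<int> &wt, int W)
{
    std::vector<int> dp(W + 1, 0);
    for ( size_t i = 0 ; i < v.size() ; ++i )
        for ( int w = W ; w >= wt[i] ; --w )   // downward: each item used once
            dp[w] = std::max(dp[w], dp[w - wt[i]] + v[i]);
    return dp[W];
}

int main()
{
    std::vector<int> v  = { 60, 100, 120 };
    std::vector<int> wt = { 10,  20,  30 };
    std::cout << Knapsack01(v, wt, 50) << '\n';   // prints 220 (items 1 and 2)
}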
Huffman Codes
 Huffman codes are an effective technique for
compressing data
 The algorithm builds a table of the frequencies of
each character in a file
 The table is then used to determine an optimal
way of representing each character as a binary
string
Huffman Codes
 Consider a file of 100,000 characters from a–f,
with these frequencies:
 a = 45,000
 b = 13,000
 c = 12,000
 d = 16,000
 e = 9,000
 f = 5,000
Huffman Codes
 Typically each character in a file is stored as a
single byte (8 bits)
 If we know we only have six characters, we can use a 3-bit
code for the characters instead:
 a = 000, b = 001, c = 010, d = 011, e = 100, f = 101
 This is called a fixed-length code
 With this scheme, we can encode the whole file with 300,000
bits
 We can do better
 Better compression
 More flexibility
Huffman Codes
 Variable length codes can perform significantly
better
 Frequent characters are given short code words, while
infrequent characters get longer code words
 Consider this scheme:
 a = 0; b = 101; c = 100; d = 111; e = 1101; f = 1100
 How many bits are now required to encode our file?
 45,000*1 + 13,000*3 + 12,000*3 + 16,000*3 + 9,000*4 + 5,000*4
= 224,000 bits
 This is in fact an optimal character code for this file
Huffman Codes
 Prefix codes
 Huffman codes are constructed in such a way that they can
be unambiguously translated back to the original data, yet
still be an optimal character code
 Huffman codes are really considered “prefix codes”
 No code word is a prefix of any other codeword
 This guarantees unambiguous decoding
 Once a code word is recognized, we can replace it with the
decoded data, without worrying about whether we may also
match some other code
Huffman Codes
 Both the encoder and decoder make use of a binary
tree to recognize codes
 The leaves of the tree represent the unencoded characters
 Each left branch indicates a “0” placed in the encoded bit
string
 Each right branch indicates a “1” placed in the bit string
Huffman Codes
                (100)
               0/    \1
            a:45      (55)
                    0/     \1
                 (25)       (30)
                0/   \1    0/   \1
             c:12   b:13 (14)   d:16
                         0/ \1
                       f:5   e:9
A Huffman Code Tree
 To encode:
 Search the tree for the character
to encode
 As you progress, add “0” or “1” to
right of code
 Code is complete when you find
character
 To decode a code:
 Proceed through bit string left to right
 For each bit, proceed left or right as
indicated
 When you reach a leaf, that is the
decoded character
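A compact sketch of this decoding loop (the Node struct and the hand-built
partial tree in the driver are illustrative assumptions, not the deck's code):

#include <iostream>
#include <string>

struct Node {
    char ch;             // meaningful only at leaves
    Node *left, *right;  // the 0-branch and the 1-branch
};

// Walk the tree bit by bit; emit a character at each leaf, then restart.
std::string Decode(const Node *root, const std::string &bits)
{
    std::string out;
    const Node *cur = root;
    for ( char b : bits ) {
        cur = (b == '0') ? cur->left : cur->right;
        if ( !cur->left && !cur->right ) {   // reached a leaf
            out += cur->ch;
            cur = root;
        }
    }
    return out;
}

int main()
{
    // Just enough of the slide's tree for a = 0, c = 100, b = 101.
    Node a{'a', nullptr, nullptr}, c{'c', nullptr, nullptr}, b{'b', nullptr, nullptr};
    Node n25{0, &c, &b};        // the internal "25" node
    Node n55{0, &n25, nullptr}; // right subtree omitted in this sketch
    Node root{0, &a, &n55};
    std::cout << Decode(&root, "0100101") << '\n';   // prints "acb"
}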
Huffman Codes
 Using this representation, an optimal code will
always be represented by a full binary tree
 Every non-leaf node has two children
 If this were not true, then there would be wasted bits, as in
the fixed-length code, leading to non-optimal compression
 For a set of c characters, this requires c leaves, and c-1
internal nodes
Huffman Codes
 Given a Huffman tree, how do we compute the
number of bits required to encode a file?
 For every character c:
 Let f(c) denote the character’s frequency
 Let dT(c) denote the character’s depth in the tree
 This is also the length of the character’s code word
 The total bits required is then:
B(T) = ∑c∈C f(c) · dT(c)
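For the example file this gives B(T) = (45·1 + 13·3 + 12·3 + 16·3 + 9·4 + 5·4)
× 1,000 = 224,000 bits, matching the count computed earlier.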
Constructing a Huffman Code
 Huffman developed a greedy algorithm for
constructing an optimal prefix code
 The algorithm builds the tree in a bottom-up manner
 It begins with the leaves, then performs merging operations
to build up the tree
 At each step, it merges the two least frequent members
together
 It removes these characters from the set, and replaces them
with a “metacharacter” with frequency = sum of the removed
characters’ frequencies
Constructing a Huffman Code
// Builds a Huffman tree over the n characters in C and returns its root
node *Huffman(table C[], int n, tree &T)
{
    // Create a priority queue that has all characters
    // sorted by their frequencies - each entry happens
    // to also be a complete tree node
    PriorityQueue Q(C);
    for ( int i = 0 ; i < n-1 ; ++i )
    {
        node *z = T.AllocateNode();
        node *x = z->Left  = Q.ExtractMin();   // least frequent entry
        node *y = z->Right = Q.ExtractMin();   // next least frequent
        z->freq = x->freq + y->freq;           // the merged "metacharacter"
        Q.Insert(z);                           // reinsert at sorted position
    }
    return Q.ExtractMin();                     // the sole survivor is the root
}
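Why n-1 iterations? Each pass removes two nodes from the queue and inserts one
back, shrinking the queue by exactly one; starting from n leaves, after n-1
merges a single node remains, and that node is the root of the finished tree.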
Constructing a Huffman Code
Initial queue:  f:5   e:9   c:12   b:13   d:16   a:45

The algorithm starts by placing an entry in a priority queue
for each character, sorted by frequency.

Queue after the first merge:  c:12   b:13   (14 = f:5 + e:9)   d:16   a:45

Next it removes the two entries with the lowest frequency and
inserts them into the Huffman tree, replacing them in the queue
with a node whose frequency equals the sum of the removed
frequencies.
Constructing a Huffman Code
The tree after removing and merging c & b
Because we’re using a priority queue, each node is inserted
into its sorted position, based on the sum of the frequencies
of its children

Queue:  (14 = f:5 + e:9)   d:16   (25 = c:12 + b:13)   a:45
Constructing a Huffman Code
Queue:  (25 = c:12 + b:13)   (30 = (14) + d:16)   a:45

The tree after removing the previously merged f & e node (14), and d,
and merging them
Constructing a Huffman Code
Queue:  a:45   (55 = (25) + (30))

The tree after removing and merging the “25” and “30” frequency nodes
Constructing a Huffman Code
Queue:  (100 = a:45 + (55))

The final tree - the complete structure shown earlier as
“A Huffman Code Tree”
Constructing a Huffman Code
 What is the running time?
 If we use a binary heap for the priority queue, it
takes O(n) to build
 The loop is executed n-1 times
 Each heap operation requires O(log2 n) time
 So, building the Huffman tree requires O(n log2 n) time