SlideShare una empresa de Scribd logo
1 de 44
C#ODE Studio
CS 102
Huffman Coding:
An Application of Binary
Trees and Priority Queues
C#ODE Studio
CS 102
Encoding and
Compression of Data
• Fax Machines
• ASCII
• Variations on ASCII
– min number of bits needed
– cost of savings
– patterns
– modifications
C#ODE Studio
CS 102
Purpose of Huffman
Coding
• Proposed by Dr. David A.
Huffman in 1952
– “A Method for the Construction
of Minimum Redundancy Codes”
• Applicable to many forms of
data transmission
– Our example: text files
C#ODE Studio
CS 102
The Basic Algorithm
• Huffman coding is a form of
statistical coding
• Not all characters occur with the
same frequency!
• Yet all characters are allocated
the same amount of space
– 1 char = 1 byte, be it e or x
C#ODE Studio
CS 102
The Basic Algorithm
• Any savings in tailoring codes
to frequency of character?
• Code word lengths are no longer
fixed like ASCII.
• Code word lengths vary and will
be shorter for the more
frequently used characters.
C#ODE Studio
CS 102
The (Real) Basic
Algorithm
1. Scan text to be compressed and tally
occurrence of all characters.
2. Sort or prioritize characters based on
number of occurrences in text.
3. Build Huffman code tree based on
prioritized list.
4. Perform a traversal of tree to determine
all code words.
5. Scan text again and create new file
using the Huffman codes.
C#ODE Studio
CS 102
Building a Tree
Scan the original text
• Consider the following short
text:
Eerie eyes seen near lake.
• Count up the occurrences of all
characters in the text
C#ODE Studio
CS 102
Building a Tree
Scan the original text
Eerie eyes seen near lake.
• What characters are present?
E e r i space
y s n a r l k .
C#ODE Studio
CS 102
Building a Tree
Scan the original text
Eerie eyes seen near lake.
• What is the frequency of each
character in the text?
Char Freq. Char Freq. Char Freq.
E 1 y 1 k 1
e 8 s 2 . 1
r 2 n 2
i 1 a 2
space 4 l 1
C#ODE Studio
CS 102
Building a Tree
Prioritize characters
• Create binary tree nodes with
character and frequency of
each character
• Place nodes in a priority
queue
– The lower the occurrence, the
higher the priority in the queue
C#ODE Studio
CS 102
Building a Tree
Prioritize characters
• Uses binary tree nodes
public class HuffNode
{
public char myChar;
public int myFrequency;
public HuffNode myLeft, myRight;
}
priorityQueue myQueue;
C#ODE Studio
CS 102
Building a Tree
• The queue after inserting all nodes
• Null Pointers are not shown
E
1
i
1
y
1
l
1
k
1
.
1
r
2
s
2
n
2
a
2
sp
4
e
8
C#ODE Studio
CS 102
Building a Tree
• While priority queue contains two
or more nodes
– Create new node
– Dequeue node and make it left subtree
– Dequeue next node and make it right
subtree
– Frequency of new node equals sum of
frequency of left and right children
– Enqueue new node back into queue
C#ODE Studio
CS 102
Building a Tree
E
1
i
1
y
1
l
1
k
1
.
1
r
2
s
2
n
2
a
2
sp
4
e
8
C#ODE Studio
CS 102
Building a Tree
E i
y
1
l
1
k
1
.
1
r
2
s
2
n
2
a
2
sp
4
e
8
2
C#ODE Studio
CS 102
Building a Tree
E i
y
1
l
1
k
1
.
1
r
2
s
2
n
2
a
2
sp
4
e
8
C#ODE Studio
CS 102
Building a Tree
E i
k
1
.
1
r
2
s
2
n
2
a
2
sp
4
e
8
y l
C#ODE Studio
CS 102
Building a Tree
E i
k
1
.
1
r
2
s
2
n
2
a
2
sp
4
e
8
2
y l
2
C#ODE Studio
CS 102
Building a Tree
E i
r
2
s
2
n
2
a
2
sp
4
e
8
2
y l
2
k .
2
C#ODE Studio
CS 102
Building a Tree
E i
r
2
s
2
n
2
a
2
sp
4
e
8
2
y l
2
k .
2
C#ODE Studio
CS 102
Building a Tree
E i
n
2
a
2
sp
4
e
8
2
y l
2
k .
2
r s
4
C#ODE Studio
CS 102
Building a Tree
E i
n
2
a
2
sp
4
e
8
2
y l
2
k .
2
r s
4
C#ODE Studio
CS 102
Building a Tree
E i
sp
4
e
8
2
y l
2
k .
2
r s
4
n a
4
C#ODE Studio
CS 102
Building a Tree
E i
sp
4
e
8
2
y l
2
k .
2
r s
4
n a
4
C#ODE Studio
CS 102
Building a Tree
E i
sp
4
e
8
2
y l
2
k .
2
r s
4
n a
4
4
C#ODE Studio
CS 102
Building a Tree
E i
sp
4
e
82
y l
2
k .
2
r s
4
n a
4 4
C#ODE Studio
CS 102
Building a Tree
E i
sp
e
82
y l
2
k .
2
r s
4
n a
4 4
6
C#ODE Studio
CS 102
Building a Tree
E i
sp
e
8
2
y l
2
k .
2
r s
4
n a
4 4 6
What is happening to the characters
with a low number of occurrences?
C#ODE Studio
CS 102
Building a Tree
E i
sp
e
82
y l
2
k .
2
r s
4
n a
4
4
6
8
C#ODE Studio
CS 102
Building a Tree
E i
sp
e
82
y l
2
k .
2
r s
4
n a
4
4
6 8
C#ODE Studio
CS 102
Building a Tree
E i
sp
e
8
2
y l
2
k .
2
r s
4
n a
4
4
6
8
10
C#ODE Studio
CS 102
Building a Tree
E i
sp
e
8
2
y l
2
k .
2r s
4
n a
4 4
6
8 10
C#ODE Studio
CS 102
Building a Tree
E i
sp
e2
y l
2
k .
2
r s
4
n a
4
4
6
8
10
16
C#ODE Studio
CS 102
Building a Tree
E i
sp
e
2
y l
2
k .
2
r s
4
n a
4
4
6
8
10 16
C#ODE Studio
CS 102
Building a Tree
E i
sp
e
2
y l
2
k .
2
r s
4
n a
4
4
6
8
10
16
26
C#ODE Studio
CS 102
Building a Tree
E i
sp
e
2
y l
2
k .
2
r s
4
n a
4
4
6
8
10
16
26
•After
enqueueing
this node
there is only
one node left
in priority
queue.
C#ODE Studio
CS 102
Building a Tree
Dequeue the single node
left in the queue.
This tree contains the
new code words for each
character.
Frequency of root node
should equal number of
characters in text.
E i
sp
e
2
y l
2
k .
2
r s
4
n a
4
4
6 8
10
16
26
Eerie eyes seen near lake. 26 characters
C#ODE Studio
CS 102
Encoding the File
Traverse Tree for Codes
• Perform a traversal
of the tree to
obtain new code
words
• Going left is a 0
going right is a 1
• code word is only
completed when a
leaf node is
reached
E i
sp
e
2
y l
2
k .
2
r s
4
n a
4
4
6 8
10
16
26
C#ODE Studio
CS 102
Encoding the File
Traverse Tree for Codes
Char Code
E 0000
i 0001
y 0010
l 0011
k 0100
. 0101
space 011
e 10
r 1100
s 1101
n 1110
a 1111
E i
sp
e
2
y l
2
k .
2
r s
4
n a
4
4
6 8
10
16
26
C#ODE Studio
CS 102
Encoding the File
• Rescan text and
encode file using
new code words
Eerie eyes seen near lake.
Char Code
E 0000
i 0001
y 0010
l 0011
k 0100
. 0101
space 011
e 10
r 1100
s 1101
n 1110
a 1111
0000101100000110011
1000101011011010011
1110101111110001100
1111110100100101
• Why is there no need
for a separator
character?
.
C#ODE Studio
CS 102
Encoding the File
Results
• Have we made
things any
better?
• 73 bits to encode
the text
• ASCII would take
8 * 26 = 208 bits
0000101100000110011
1000101011011010011
1110101111110001100
1111110100100101
If modified code used 4 bits per
character are needed. Total bits
4 * 26 = 104. Savings not as great.
C#ODE Studio
CS 102
Decoding the File
• How does receiver know what the codes are?
• Tree constructed for each text file.
– Considers frequency for each file
– Big hit on compression, especially for smaller
files
• Tree predetermined
– based on statistical analysis of text files or
file types
• Data transmission is bit based versus byte
based
C#ODE Studio
CS 102
Decoding the File
• Once receiver has
tree it scans
incoming bit stream
• 0 ⇒ go left
• 1 ⇒ go right
E i
sp
e
2
y l
2
k .
2
r s
4
n a
4
4
6 8
10
16
26
101000110111101111
01111110000110101
C#ODE Studio
CS 102
Summary
• Huffman coding is a technique
used to compress files for
transmission
• Uses statistical coding
– more frequently used symbols have
shorter code words
• Works well for text and fax
transmissions
• An application that uses several
data structures

Más contenido relacionado

La actualidad más candente

180nm process typical parameter values
180nm process typical parameter values180nm process typical parameter values
180nm process typical parameter values
Mark Abadies
 
Vlsi design and fabrication ppt
Vlsi design and fabrication  pptVlsi design and fabrication  ppt
Vlsi design and fabrication ppt
Manjushree Mashal
 
io and pad ring.pdf
io and pad ring.pdfio and pad ring.pdf
io and pad ring.pdf
quandao25
 

La actualidad más candente (20)

180nm process typical parameter values
180nm process typical parameter values180nm process typical parameter values
180nm process typical parameter values
 
An Introduction to Crosstalk Measurements
An Introduction to Crosstalk MeasurementsAn Introduction to Crosstalk Measurements
An Introduction to Crosstalk Measurements
 
mosfet scaling_
mosfet scaling_mosfet scaling_
mosfet scaling_
 
Iain Richardson: An Introduction to Video Compression
Iain Richardson: An Introduction to Video CompressionIain Richardson: An Introduction to Video Compression
Iain Richardson: An Introduction to Video Compression
 
A view of semiconductor industry
A view of semiconductor industryA view of semiconductor industry
A view of semiconductor industry
 
01 nand flash_reliability_notes
01 nand flash_reliability_notes01 nand flash_reliability_notes
01 nand flash_reliability_notes
 
Minor Project- AES Implementation in Verilog
Minor Project- AES Implementation in VerilogMinor Project- AES Implementation in Verilog
Minor Project- AES Implementation in Verilog
 
CMOS-IC Design NOTES Lodhi.pdf
CMOS-IC Design NOTES Lodhi.pdfCMOS-IC Design NOTES Lodhi.pdf
CMOS-IC Design NOTES Lodhi.pdf
 
Layout rules
Layout rulesLayout rules
Layout rules
 
Spot and Projection Welding
Spot and Projection WeldingSpot and Projection Welding
Spot and Projection Welding
 
Optimized Local I/O ESD Protection for SerDes In Advanced SOI, BiCMOS and Fin...
Optimized Local I/O ESD Protection for SerDes In Advanced SOI, BiCMOS and Fin...Optimized Local I/O ESD Protection for SerDes In Advanced SOI, BiCMOS and Fin...
Optimized Local I/O ESD Protection for SerDes In Advanced SOI, BiCMOS and Fin...
 
VLSI routing
VLSI routingVLSI routing
VLSI routing
 
Semiconductor Industry Tutorial
Semiconductor Industry TutorialSemiconductor Industry Tutorial
Semiconductor Industry Tutorial
 
Finfet Technology
Finfet TechnologyFinfet Technology
Finfet Technology
 
42 PPT-5 BOUNDARY SCAN....pptx
42 PPT-5 BOUNDARY SCAN....pptx42 PPT-5 BOUNDARY SCAN....pptx
42 PPT-5 BOUNDARY SCAN....pptx
 
Vlsi design and fabrication ppt
Vlsi design and fabrication  pptVlsi design and fabrication  ppt
Vlsi design and fabrication ppt
 
io and pad ring.pdf
io and pad ring.pdfio and pad ring.pdf
io and pad ring.pdf
 
PCB Fabrication Process by Sierra Assembly
PCB Fabrication Process by Sierra Assembly PCB Fabrication Process by Sierra Assembly
PCB Fabrication Process by Sierra Assembly
 
Library Characterization Flow
Library Characterization FlowLibrary Characterization Flow
Library Characterization Flow
 
Design of CMOS operational Amplifiers using CADENCE
Design of CMOS operational Amplifiers using CADENCEDesign of CMOS operational Amplifiers using CADENCE
Design of CMOS operational Amplifiers using CADENCE
 

Destacado

Data compression huffman coding algoritham
Data compression huffman coding algorithamData compression huffman coding algoritham
Data compression huffman coding algoritham
Rahul Khanwani
 
Disk scheduling algorithm.52
Disk scheduling algorithm.52Disk scheduling algorithm.52
Disk scheduling algorithm.52
myrajendra
 

Destacado (20)

Huffman coding
Huffman codingHuffman coding
Huffman coding
 
Dijksatra
DijksatraDijksatra
Dijksatra
 
Disk scheduling algorithms
Disk scheduling algorithms Disk scheduling algorithms
Disk scheduling algorithms
 
Disk scheduling
Disk schedulingDisk scheduling
Disk scheduling
 
4.4 external hashing
4.4 external hashing4.4 external hashing
4.4 external hashing
 
9 cm402.18
9 cm402.189 cm402.18
9 cm402.18
 
Arrays and addressing modes
Arrays and addressing modesArrays and addressing modes
Arrays and addressing modes
 
Minimum spanning tree
Minimum spanning treeMinimum spanning tree
Minimum spanning tree
 
Filehandlinging cp2
Filehandlinging cp2Filehandlinging cp2
Filehandlinging cp2
 
Implicit objects advance Java
Implicit objects advance JavaImplicit objects advance Java
Implicit objects advance Java
 
Translation Lookaside Buffer & Inverted Page Table
Translation Lookaside Buffer & Inverted Page TableTranslation Lookaside Buffer & Inverted Page Table
Translation Lookaside Buffer & Inverted Page Table
 
Projection of line inclined to both the planes
Projection of line inclined to both the planesProjection of line inclined to both the planes
Projection of line inclined to both the planes
 
Design Concept software engineering
Design Concept software engineeringDesign Concept software engineering
Design Concept software engineering
 
Data compression huffman coding algoritham
Data compression huffman coding algorithamData compression huffman coding algoritham
Data compression huffman coding algoritham
 
Page replacement
Page replacementPage replacement
Page replacement
 
Disk scheduling algorithm.52
Disk scheduling algorithm.52Disk scheduling algorithm.52
Disk scheduling algorithm.52
 
Longest Common Subsequence (LCS) Algorithm
Longest Common Subsequence (LCS) AlgorithmLongest Common Subsequence (LCS) Algorithm
Longest Common Subsequence (LCS) Algorithm
 
Addressing mode of 8051
Addressing mode of 8051Addressing mode of 8051
Addressing mode of 8051
 
Hashing Techniques in Data Structures Part2
Hashing Techniques in Data Structures Part2Hashing Techniques in Data Structures Part2
Hashing Techniques in Data Structures Part2
 
Huffman Coding
Huffman CodingHuffman Coding
Huffman Coding
 

Similar a Huffman

Huffman > Data Structures & Algorithums
Huffman > Data Structures & AlgorithumsHuffman > Data Structures & Algorithums
Huffman > Data Structures & Algorithums
Ain-ul-Moiz Khawaja
 
huffman algoritm upload for understand.ppt
huffman algoritm upload for understand.ppthuffman algoritm upload for understand.ppt
huffman algoritm upload for understand.ppt
SadiaSharmin40
 

Similar a Huffman (20)

CS-102 Data Structures huffman coding.pdf
CS-102 Data Structures huffman coding.pdfCS-102 Data Structures huffman coding.pdf
CS-102 Data Structures huffman coding.pdf
 
CS-102 Data Structures huffman coding.pdf
CS-102 Data Structures huffman coding.pdfCS-102 Data Structures huffman coding.pdf
CS-102 Data Structures huffman coding.pdf
 
Huffman Tree And Its Application
Huffman Tree And Its ApplicationHuffman Tree And Its Application
Huffman Tree And Its Application
 
Huffman1
Huffman1Huffman1
Huffman1
 
Huffman 2
Huffman 2Huffman 2
Huffman 2
 
Huffman
HuffmanHuffman
Huffman
 
Data Structure and Algorithms Huffman Coding Algorithm
Data Structure and Algorithms Huffman Coding AlgorithmData Structure and Algorithms Huffman Coding Algorithm
Data Structure and Algorithms Huffman Coding Algorithm
 
Huffman > Data Structures & Algorithums
Huffman > Data Structures & AlgorithumsHuffman > Data Structures & Algorithums
Huffman > Data Structures & Algorithums
 
Data compession
Data compession Data compession
Data compession
 
huffman algoritm upload for understand.ppt
huffman algoritm upload for understand.ppthuffman algoritm upload for understand.ppt
huffman algoritm upload for understand.ppt
 
huffman codes and algorithm
huffman codes and algorithm huffman codes and algorithm
huffman codes and algorithm
 
huffman_nyu.ppt ghgghtttjghh hhhhhhhhhhh
huffman_nyu.ppt ghgghtttjghh hhhhhhhhhhhhuffman_nyu.ppt ghgghtttjghh hhhhhhhhhhh
huffman_nyu.ppt ghgghtttjghh hhhhhhhhhhh
 
Huffman
HuffmanHuffman
Huffman
 
Huffman.pptx
Huffman.pptxHuffman.pptx
Huffman.pptx
 
Farhana shaikh webinar_huffman coding
Farhana shaikh webinar_huffman codingFarhana shaikh webinar_huffman coding
Farhana shaikh webinar_huffman coding
 
Cassandra for the ops dos and donts
Cassandra for the ops   dos and dontsCassandra for the ops   dos and donts
Cassandra for the ops dos and donts
 
Huffman tree
Huffman tree Huffman tree
Huffman tree
 
Cassandra introduction mars jug
Cassandra introduction mars jugCassandra introduction mars jug
Cassandra introduction mars jug
 
Huffman Codes
Huffman CodesHuffman Codes
Huffman Codes
 
Creating a Fibonacci Generator in Assembly - by Willem van Ketwich
Creating a Fibonacci Generator in Assembly - by Willem van KetwichCreating a Fibonacci Generator in Assembly - by Willem van Ketwich
Creating a Fibonacci Generator in Assembly - by Willem van Ketwich
 

Último

Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Kandungan 087776558899
 
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
Health
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 

Último (20)

Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
Rums floating Omkareshwar FSPV IM_16112021.pdf
Rums floating Omkareshwar FSPV IM_16112021.pdfRums floating Omkareshwar FSPV IM_16112021.pdf
Rums floating Omkareshwar FSPV IM_16112021.pdf
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
+97470301568>> buy weed in qatar,buy thc oil qatar,buy weed and vape oil in d...
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
Air Compressor reciprocating single stage
Air Compressor reciprocating single stageAir Compressor reciprocating single stage
Air Compressor reciprocating single stage
 
Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086Minimum and Maximum Modes of microprocessor 8086
Minimum and Maximum Modes of microprocessor 8086
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
Bhubaneswar🌹Call Girls Bhubaneswar ❤Komal 9777949614 💟 Full Trusted CALL GIRL...
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
 

Huffman

  • 1. C#ODE Studio CS 102 Huffman Coding: An Application of Binary Trees and Priority Queues
  • 2. C#ODE Studio CS 102 Encoding and Compression of Data • Fax Machines • ASCII • Variations on ASCII – min number of bits needed – cost of savings – patterns – modifications
  • 3. C#ODE Studio CS 102 Purpose of Huffman Coding • Proposed by Dr. David A. Huffman in 1952 – “A Method for the Construction of Minimum Redundancy Codes” • Applicable to many forms of data transmission – Our example: text files
  • 4. C#ODE Studio CS 102 The Basic Algorithm • Huffman coding is a form of statistical coding • Not all characters occur with the same frequency! • Yet all characters are allocated the same amount of space – 1 char = 1 byte, be it e or x
  • 5. C#ODE Studio CS 102 The Basic Algorithm • Any savings in tailoring codes to frequency of character? • Code word lengths are no longer fixed like ASCII. • Code word lengths vary and will be shorter for the more frequently used characters.
  • 6. C#ODE Studio CS 102 The (Real) Basic Algorithm 1. Scan text to be compressed and tally occurrence of all characters. 2. Sort or prioritize characters based on number of occurrences in text. 3. Build Huffman code tree based on prioritized list. 4. Perform a traversal of tree to determine all code words. 5. Scan text again and create new file using the Huffman codes.
  • 7. C#ODE Studio CS 102 Building a Tree Scan the original text • Consider the following short text: Eerie eyes seen near lake. • Count up the occurrences of all characters in the text
  • 8. C#ODE Studio CS 102 Building a Tree Scan the original text Eerie eyes seen near lake. • What characters are present? E e r i space y s n a r l k .
  • 9. C#ODE Studio CS 102 Building a Tree Scan the original text Eerie eyes seen near lake. • What is the frequency of each character in the text? Char Freq. Char Freq. Char Freq. E 1 y 1 k 1 e 8 s 2 . 1 r 2 n 2 i 1 a 2 space 4 l 1
  • 10. C#ODE Studio CS 102 Building a Tree Prioritize characters • Create binary tree nodes with character and frequency of each character • Place nodes in a priority queue – The lower the occurrence, the higher the priority in the queue
  • 11. C#ODE Studio CS 102 Building a Tree Prioritize characters • Uses binary tree nodes public class HuffNode { public char myChar; public int myFrequency; public HuffNode myLeft, myRight; } priorityQueue myQueue;
  • 12. C#ODE Studio CS 102 Building a Tree • The queue after inserting all nodes • Null Pointers are not shown E 1 i 1 y 1 l 1 k 1 . 1 r 2 s 2 n 2 a 2 sp 4 e 8
  • 13. C#ODE Studio CS 102 Building a Tree • While priority queue contains two or more nodes – Create new node – Dequeue node and make it left subtree – Dequeue next node and make it right subtree – Frequency of new node equals sum of frequency of left and right children – Enqueue new node back into queue
  • 14. C#ODE Studio CS 102 Building a Tree E 1 i 1 y 1 l 1 k 1 . 1 r 2 s 2 n 2 a 2 sp 4 e 8
  • 15. C#ODE Studio CS 102 Building a Tree E i y 1 l 1 k 1 . 1 r 2 s 2 n 2 a 2 sp 4 e 8 2
  • 16. C#ODE Studio CS 102 Building a Tree E i y 1 l 1 k 1 . 1 r 2 s 2 n 2 a 2 sp 4 e 8
  • 17. C#ODE Studio CS 102 Building a Tree E i k 1 . 1 r 2 s 2 n 2 a 2 sp 4 e 8 y l
  • 18. C#ODE Studio CS 102 Building a Tree E i k 1 . 1 r 2 s 2 n 2 a 2 sp 4 e 8 2 y l 2
  • 19. C#ODE Studio CS 102 Building a Tree E i r 2 s 2 n 2 a 2 sp 4 e 8 2 y l 2 k . 2
  • 20. C#ODE Studio CS 102 Building a Tree E i r 2 s 2 n 2 a 2 sp 4 e 8 2 y l 2 k . 2
  • 21. C#ODE Studio CS 102 Building a Tree E i n 2 a 2 sp 4 e 8 2 y l 2 k . 2 r s 4
  • 22. C#ODE Studio CS 102 Building a Tree E i n 2 a 2 sp 4 e 8 2 y l 2 k . 2 r s 4
  • 23. C#ODE Studio CS 102 Building a Tree E i sp 4 e 8 2 y l 2 k . 2 r s 4 n a 4
  • 24. C#ODE Studio CS 102 Building a Tree E i sp 4 e 8 2 y l 2 k . 2 r s 4 n a 4
  • 25. C#ODE Studio CS 102 Building a Tree E i sp 4 e 8 2 y l 2 k . 2 r s 4 n a 4 4
  • 26. C#ODE Studio CS 102 Building a Tree E i sp 4 e 82 y l 2 k . 2 r s 4 n a 4 4
  • 27. C#ODE Studio CS 102 Building a Tree E i sp e 82 y l 2 k . 2 r s 4 n a 4 4 6
  • 28. C#ODE Studio CS 102 Building a Tree E i sp e 8 2 y l 2 k . 2 r s 4 n a 4 4 6 What is happening to the characters with a low number of occurrences?
  • 29. C#ODE Studio CS 102 Building a Tree E i sp e 82 y l 2 k . 2 r s 4 n a 4 4 6 8
  • 30. C#ODE Studio CS 102 Building a Tree E i sp e 82 y l 2 k . 2 r s 4 n a 4 4 6 8
  • 31. C#ODE Studio CS 102 Building a Tree E i sp e 8 2 y l 2 k . 2 r s 4 n a 4 4 6 8 10
  • 32. C#ODE Studio CS 102 Building a Tree E i sp e 8 2 y l 2 k . 2r s 4 n a 4 4 6 8 10
  • 33. C#ODE Studio CS 102 Building a Tree E i sp e2 y l 2 k . 2 r s 4 n a 4 4 6 8 10 16
  • 34. C#ODE Studio CS 102 Building a Tree E i sp e 2 y l 2 k . 2 r s 4 n a 4 4 6 8 10 16
  • 35. C#ODE Studio CS 102 Building a Tree E i sp e 2 y l 2 k . 2 r s 4 n a 4 4 6 8 10 16 26
  • 36. C#ODE Studio CS 102 Building a Tree E i sp e 2 y l 2 k . 2 r s 4 n a 4 4 6 8 10 16 26 •After enqueueing this node there is only one node left in priority queue.
  • 37. C#ODE Studio CS 102 Building a Tree Dequeue the single node left in the queue. This tree contains the new code words for each character. Frequency of root node should equal number of characters in text. E i sp e 2 y l 2 k . 2 r s 4 n a 4 4 6 8 10 16 26 Eerie eyes seen near lake. 26 characters
  • 38. C#ODE Studio CS 102 Encoding the File Traverse Tree for Codes • Perform a traversal of the tree to obtain new code words • Going left is a 0 going right is a 1 • code word is only completed when a leaf node is reached E i sp e 2 y l 2 k . 2 r s 4 n a 4 4 6 8 10 16 26
  • 39. C#ODE Studio CS 102 Encoding the File Traverse Tree for Codes Char Code E 0000 i 0001 y 0010 l 0011 k 0100 . 0101 space 011 e 10 r 1100 s 1101 n 1110 a 1111 E i sp e 2 y l 2 k . 2 r s 4 n a 4 4 6 8 10 16 26
  • 40. C#ODE Studio CS 102 Encoding the File • Rescan text and encode file using new code words Eerie eyes seen near lake. Char Code E 0000 i 0001 y 0010 l 0011 k 0100 . 0101 space 011 e 10 r 1100 s 1101 n 1110 a 1111 0000101100000110011 1000101011011010011 1110101111110001100 1111110100100101 • Why is there no need for a separator character? .
  • 41. C#ODE Studio CS 102 Encoding the File Results • Have we made things any better? • 73 bits to encode the text • ASCII would take 8 * 26 = 208 bits 0000101100000110011 1000101011011010011 1110101111110001100 1111110100100101 If modified code used 4 bits per character are needed. Total bits 4 * 26 = 104. Savings not as great.
  • 42. C#ODE Studio CS 102 Decoding the File • How does receiver know what the codes are? • Tree constructed for each text file. – Considers frequency for each file – Big hit on compression, especially for smaller files • Tree predetermined – based on statistical analysis of text files or file types • Data transmission is bit based versus byte based
  • 43. C#ODE Studio CS 102 Decoding the File • Once receiver has tree it scans incoming bit stream • 0 ⇒ go left • 1 ⇒ go right E i sp e 2 y l 2 k . 2 r s 4 n a 4 4 6 8 10 16 26 101000110111101111 01111110000110101
  • 44. C#ODE Studio CS 102 Summary • Huffman coding is a technique used to compress files for transmission • Uses statistical coding – more frequently used symbols have shorter code words • Works well for text and fax transmissions • An application that uses several data structures