Data Compression Lec (3) - Coding Methods
1-Run-Length Encoding
The idea behind this approach to data compression is this: If a data item
d occurs n consecutive times in the input stream, replace the n
occurrences with the single pair nd. The n consecutive occurrences of a
data item are called a run length of n, and this approach to data
compression is called run-length encoding or RLE. We apply this idea first
to text compression and then to image compression.
• RLE Text Compression
Just replacing 2._all_is_too_well with 2._a2_is_t2_we2 will not
work, because the decompressor cannot tell that the first 2 is part of
the text while the other 2s are repetition factors. Even the string
2._a2l_is_t2o_we2l does not solve this problem.
One way to solve this problem is to precede each repetition with a
special escape character. If we use the character @ as the escape
character, then the string 2._a@2l_is_t@2o_we@2l can be
decompressed unambiguously. However, this string is longer than the
original string, because it replaces two consecutive letters with three
characters. We therefore adopt the convention that only three or more
repetitions of the same character are replaced with a repetition
factor. The main problems with this method are the following:
1. In English text there are not many repetitions. There are many
“doubles” but a “triple” is rare.
2. The character “@” may be part of the text in the input stream,
in which case a different escape character must be chosen.
Sometimes the input stream may contain every possible
character in the alphabet.
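The escape-character convention above can be sketched in Python as follows. This is a minimal illustration, not a standard format: the function names, the single-digit repetition factor (runs longer than 9 are split so that the count never collides with digits in the text), and the choice to reject input that already contains the escape character are all assumptions made for this example.

```python
def rle_text_encode(text: str, escape: str = "@", min_run: int = 3) -> str:
    """Replace runs of min_run or more equal characters with escape + count + character."""
    if escape in text:
        # Problem 2 from the list above: a different escape character is needed.
        raise ValueError("escape character occurs in the input; choose another one")
    out = []
    i = 0
    while i < len(text):
        j = i
        # Extend the run, but cap it at 9 so the count fits in one digit (assumption).
        while j < len(text) and text[j] == text[i] and j - i < 9:
            j += 1
        run = j - i
        if run >= min_run:
            out.append(f"{escape}{run}{text[i]}")
        else:
            out.append(text[i] * run)  # doubles and singles are copied unchanged
        i = j
    return "".join(out)


def rle_text_decode(data: str, escape: str = "@") -> str:
    """Invert rle_text_encode."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == escape:
            count = int(data[i + 1])
            out.append(data[i + 2] * count)
            i += 3
        else:
            out.append(data[i])
            i += 1
    return "".join(out)
```

With these assumptions, the sentence "2. all is too well" passes through unchanged, since it contains doubles but no triples, which illustrates problem 1 above.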
• RLE Image Compression
RLE can be used to compress grayscale images. Each run of pixels of
the same intensity (gray level) is encoded as a pair (run length, pixel
value). The run length usually occupies one byte, allowing for runs of
up to 255 pixels. The pixel value occupies several bits, depending on
the number of gray levels (typically between 4 and 8 bits).
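As a rough sketch of the (run length, pixel value) pairing described above, the following Python code groups equal neighbouring pixels and splits any run longer than 255 so that each count fits in one byte. The function names and the use of a plain list of integers for the pixel row are assumptions for illustration.

```python
from itertools import groupby


def rle_pairs(pixels, max_run=255):
    """Encode a sequence of pixel values as (run length, pixel value) pairs."""
    pairs = []
    for value, group in groupby(pixels):
        run = len(list(group))
        # A one-byte count holds at most max_run, so longer runs are split.
        while run > max_run:
            pairs.append((max_run, value))
            run -= max_run
        pairs.append((run, value))
    return pairs


def rle_unpairs(pairs):
    """Expand (run length, pixel value) pairs back into the pixel sequence."""
    return [value for run, value in pairs for _ in range(run)]
```

Note that in this plain form every isolated pixel costs a full pair, so an image with few runs can grow rather than shrink.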
Example 3.1: An 8-bit deep grayscale bitmap that starts with
12, 12, 12, 12, 12, 12, 12, 12, 12, 35, 76, 112, 67, 87, 87, 87,
5, 5, 5, 5, 5, 5, 1, . . .
is compressed into 9, 12, 35, 76, 112, 67, 3, 87, 6, 5, 1, . . . , where
the numbers 9, 3, and 6 are counts. The problem is to distinguish
between a byte containing a grayscale value (such as 12) and one
containing a count (such as 9). Here are some solutions:
1. If the image is limited to just 128 grayscales, we can devote
one bit in each byte to indicate whether the byte contains a
grayscale value or a count.
2. If the number of grayscales is 256, it can be reduced to 255
with one value reserved as a flag to precede every byte with a
count. If the flag is, say, 255, then the sequence above becomes
255, 9, 12, 35, 76, 112, 67, 255, 3, 87, 255, 6, 5, 1, . . . .
3. Again, one bit is devoted to each byte to indicate whether the byte
contains a grayscale value or a count. This time, however, these extra
bits are accumulated in groups of 8, and each group is written on the
output stream preceding (or following) the 8 bytes it “corresponds to.”
Example: the sequence 9, 12, 35, 76, 112, 67, 3, 87, 6, 5, 1, . . .
becomes
10000010, 9, 12, 35, 76, 112, 67, 3, 87, 100....., 6, 5, 1, . . . ,
where a 1 bit marks a count and a 0 bit marks a grayscale value.
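Solution 2 (a reserved flag value in front of every count) can be sketched as below. The names, the minimum run length of 3, and the requirement that input pixels already lie in 0..254 are assumptions of this example; the source only says that the grayscales are reduced to 255 and that 255 serves as the flag.

```python
FLAG = 255  # reserved value that announces "the next byte is a count"


def rle_flag_encode(pixels, min_run=3):
    """Emit FLAG, count, value for runs of at least min_run; copy other pixels as-is."""
    out = []
    i = 0
    while i < len(pixels):
        value = pixels[i]
        if not 0 <= value < FLAG:
            raise ValueError("pixel values must be in 0..254; 255 is reserved as the flag")
        j = i
        # Cap the run at 255 so the count fits in one byte.
        while j < len(pixels) and pixels[j] == value and j - i < 255:
            j += 1
        run = j - i
        if run >= min_run:
            out += [FLAG, run, value]
        else:
            out += [value] * run
        i = j
    return out


def rle_flag_decode(data):
    """Invert rle_flag_encode: FLAG is always followed by a count and a value."""
    out = []
    i = 0
    while i < len(data):
        if data[i] == FLAG:
            out += [data[i + 2]] * data[i + 1]
            i += 3
        else:
            out.append(data[i])
            i += 1
    return out
```

Applied to the bitmap of Example 3.1, this sketch produces 255, 9, 12, 35, 76, 112, 67, 255, 3, 87, 255, 6, 5, 1, the same sequence as shown for solution 2.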
2-Move-to-Front Coding
The basic idea of this method is to maintain the alphabet A of
symbols as a list where frequently occurring symbols are located near
the front. A symbol s is encoded as the number of symbols that
precede it in this list, and s is then moved to the front of the list.
Example 3.2
Here is an example that illustrates the move-to-front idea. The
alphabet is A = (a, b, c, d, m, n, o, p).
The input stream abcddcbamnopponm is encoded as
C = (0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3).
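The example can be reproduced with a short Python sketch. It assumes that, after a symbol is encoded by its position, the symbol is moved to the front of the list (the step that gives the method its name); the function names are illustrative only.

```python
def mtf_encode(stream, alphabet):
    """Encode each symbol as its current list position, then move it to the front."""
    symbols = list(alphabet)
    codes = []
    for s in stream:
        index = symbols.index(s)  # number of symbols that precede s
        codes.append(index)
        symbols.insert(0, symbols.pop(index))
    return codes


def mtf_decode(codes, alphabet):
    """Invert mtf_encode by replaying the same list updates."""
    symbols = list(alphabet)
    out = []
    for index in codes:
        s = symbols.pop(index)
        out.append(s)
        symbols.insert(0, s)
    return "".join(out)
```

Calling mtf_encode("abcddcbamnopponm", "abcdmnop") yields the code sequence C of Example 3.2. Recently used symbols get small numbers, which is why the second half of the output repeats 0, 1, 2, 3 even though it uses different letters.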
3-Huffman coding
Huffman encoding is a way to assign binary codes to symbols that
reduces the overall number of bits used to encode a typical string of
those symbols.
For example, if you use letters as symbols and have details of the
frequency of occurrence of those letters in typical strings, then you
could just encode each letter with a fixed number of bits, such as in
ASCII codes. You can do better than this by encoding more
frequently occurring letters, such as e and a, with shorter bit strings,
and less frequently occurring letters, such as q and x, with longer bit
strings.
Any string of letters will be encoded as a string of bits that are no
longer of the same length per letter. To successfully decode such a
string, the shorter codes assigned to letters such as 'e' cannot occur
as a prefix in the longer codes such as that for 'x'.
If you were to assign a code 01 for 'e' and code 011 for 'x', then if
the bits to decode started as 011... then you would not know if you
should decode an 'e' or an 'x'.
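The prefix requirement can be checked mechanically. The following sketch (the function name and the dictionary layout are assumptions for illustration) tests whether any code word is a prefix of another one:

```python
def is_prefix_free(codes):
    """Return True if no code word is a prefix of a different code word."""
    words = list(codes.values())
    for i, a in enumerate(words):
        for j, b in enumerate(words):
            if i != j and b.startswith(a):
                return False
    return True
```

For the assignment discussed above, is_prefix_free({"e": "01", "x": "011"}) is False, because 01 is a prefix of 011.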
The Huffman coding scheme takes each symbol and its weight (or
frequency of occurrence), and generates proper encodings for each
symbol taking account of the weights of each symbol, so that higher
weighted symbols have fewer bits in their encoding. (See the WP article
for more information).
A Huffman encoding can be computed by first creating a tree of
nodes:
Algorithm Huffman coding
1- Create a leaf node for each symbol and add it to the
priority queue.
2- While there is more than one node in the queue:
a. Remove the node of highest priority (lowest
probability) twice to get two nodes.
b. Create a new internal node with these two nodes as
children and with probability equal to the sum of the
two nodes' probabilities.
c. Add the new node to the queue.
3- The remaining node is the root node and the tree is
complete.
Traverse the constructed binary tree from root to leaves
assigning and accumulating a '0' for one branch and a '1' for
the other at each node. The accumulated zeroes and ones at
each leaf constitute a Huffman encoding for those symbols and
weights:
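The steps above can be sketched in Python with the standard heapq module as the priority queue. This is one possible rendering, not the only one: the function name, the tuple representation of internal nodes (which assumes the symbols themselves are not tuples), and the counter used to break ties between equal weights are choices made for this example.

```python
import heapq
from itertools import count


def huffman_codes(weights):
    """Build a Huffman code table from a {symbol: weight} mapping."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    # Step 1: one leaf per symbol, all placed in the priority queue.
    heap = [(w, next(tiebreak), sym) for sym, w in weights.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # a lone symbol still needs one bit
    # Step 2: repeatedly merge the two nodes of lowest weight.
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    # Step 3: the remaining node is the root; walk it to collect the codes.
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):  # internal node: (left child, right child)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes
```

Which child receives '0' and which receives '1' is arbitrary, so the bit patterns from this sketch may differ from a hand-built tree, while the code lengths for a given set of weights should agree whenever the merges are forced.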
Example: build a codebook for the following symbols.

Symbol       A    B    C    D
Probability  0.2  0.3  0.1  0.4

Sort the symbols by decreasing probability and repeatedly merge the two
lowest probabilities. A and C merge into 0.3, which then merges with B
into 0.6; the last column shows the two remaining entries re-sorted.

D  0.4  0.4  0.4  0.6
B  0.3  0.3  0.6  0.4
A  0.2  0.3
C  0.1

The resulting tree, with a 0 or 1 assigned to each branch:

1.0 (root)
  0: 0.6
    0: B (0.3)
    1: 0.3
      0: A (0.2)
      1: C (0.1)
  1: D (0.4)

Reading the branch labels from the root to each leaf gives the codebook
(all codes are binary):

Symbol  Probability  Natural Code  Huffman Code
A       0.2          00            010
B       0.3          01            00
C       0.1          10            011
D       0.4          11            1
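To see what the codebook buys, the sketch below encodes and decodes with the codes from this example (A = 010, B = 00, C = 011, D = 1) and computes the expected code length from the probabilities in the problem statement. The names CODEBOOK, PROBABILITY, encode, decode and average_length are illustrative assumptions.

```python
CODEBOOK = {"A": "010", "B": "00", "C": "011", "D": "1"}
PROBABILITY = {"A": 0.2, "B": 0.3, "C": 0.1, "D": 0.4}


def encode(text):
    """Concatenate the Huffman code of every symbol."""
    return "".join(CODEBOOK[s] for s in text)


def decode(bits):
    """Read bits until they match a code word; the prefix property makes this unambiguous."""
    reverse = {code: sym for sym, code in CODEBOOK.items()}
    out = []
    current = ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code word")
    return "".join(out)


def average_length():
    """Expected number of bits per symbol under the given probabilities."""
    return sum(PROBABILITY[s] * len(CODEBOOK[s]) for s in CODEBOOK)
```

The expected length works out to 0.2·3 + 0.3·2 + 0.1·3 + 0.4·1 = 1.9 bits per symbol (up to floating-point rounding), compared with 2 bits per symbol for the fixed-length natural code.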
