SlideShare una empresa de Scribd logo
1 de 40
Descargar para leer sin conexión
CHAPTER 6

Compression Techniques




Objectives:

 Able to perform data compression
 Able to use different compression
 techniques




                                     1
Introduction
  What is Compression?
     Data compression requires the identification and
     extraction of source redundancy.
     In other words, data compression seeks to
     reduce the number of bits used to store or
     transmit information.
     There are a wide range of compression methods
     which can be so unlike one another that they
     have little in common except that they compress
     data.




The Need For Compression
In terms of storage, the capacity of a storage
device can be effectively increased with
methods that compresses a body of data on its
way to a storage device and decompresses it
when it is retrieved.
In terms of communications, the bandwidth of a
digital communication link can be effectively
increased by compressing data at the sending
end and decompressing data at the receiving
end.




                                                        2
A Brief History of Data
Compression
   The late 40's were the early years of
   Information Theory, the idea of developing
   efficient new coding methods was just
   starting to be fleshed out. Ideas of entropy,
   information content and redundancy were
   explored.
   One popular notion held that if the
   probability of symbols in a message were
   known, there ought to be a way to code the
   symbols so that the message will take up
   less space.




A Brief History of Data
Compression
The first well-known method for compressing
digital signals is now known as Shannon- Fano
coding. Shannon and Fano [~1948]
simultaneously developed this algorithm which
assigns binary codewords to unique symbols
that appear within a given data file.

While Shannon-Fano coding was a great leap
forward, it had the unfortunate luck to be quickly
superseded by an even more efficient coding
system : Huffman Coding.




                                                     3
A Brief History of Data
Compression

 Huffman coding [1952] shares most
 characteristics of Shannon-Fano coding.
 Huffman coding could perform effective data
 compression by reducing the amount of
 redundancy in the coding of symbols.
 It has been proven to be the most efficient
 fixed-length coding method available.




A Brief History of Data
Compression
 In the last fifteen years, Huffman coding has
 been replaced by arithmetic coding.
 Arithmetic coding bypasses the idea of
 replacing an input symbol with a specific
 code.
 It replaces a stream of input symbols with a
 single floating-point output number.
 More bits are needed in the output number
 for longer, complex messages.




                                                 4
A Brief History of Data
  Compression




  Terminology
• Compressor–Software (or hardware) device that
  compresses data
• Decompressor–Software (or hardware) device
  that decompresses data
• Codec–Software (or hardware) device that
  compresses and decompresses data
• Algorithm–The logic that governs the
  compression/decompression process




                                                  5
Compression can be
    categorized in two broad ways:
 Lossless compression
     recover the exact original data after
     compression.
     mainly use for compressing database
     records, spreadsheets or word
     processing files, where exact
     replication of the original is essential.




   Compression can be
   categorized in two broad ways:
Lossy compression
  will result in a certain loss of accuracy in
  exchange for a substantial increase in
  compression.
  more effective when used to compress graphic
  images and digitised voice where losses outside
  visual or aural perception can be tolerated.
  Most lossy compression techniques can be
  adjusted to different quality levels, gaining higher
  accuracy in exchange for less effective
  compression.




                                                         6
Lossless Compression
Algorithms:




Lossless Compression
Algorithms:
 Dictionary-based compression algorithms
 Repetitive Sequence Suppression
 Run-length Encoding*
 Pattern Substitution
 Entropy Encoding*
   The Shannon-Fano Algorithm
   Huffman Coding*
   Arithmetic Coding*




                                           7
Dictionary-based
compression algorithms
 Dictionary-based compression algorithms
 use a completely different method to
 compress data.
 They encode variable-length strings of
 symbols as single tokens.
 The token forms an index to a phrase
 dictionary.
 If the tokens are smaller than the phrases,
 they replace the phrases and compression
 occurs.




Dictionary-based
compression algorithms
 Suppose we want to encode the Oxford
 Concise English dictionary which contains
 about 159,000 entries. Why not just transmit
 each word as an 18 bit number?
 Problems:
    Too many bits,
    everyone needs a dictionary,
    only works for English text.
 Solution: Find a way to build the dictionary
 adaptively.




                                                8
Dictionary-based
compression algorithms
Two dictionary- based compression
techniques called LZ77 and LZ78 have been
developed.
LZ77 is a "sliding window" technique in which
the dictionary consists of a set of fixed- length
phrases found in a "window" into the
previously seen text.
LZ78 takes a completely different approach to
building a dictionary. Instead of using
fixedlength phrases from a window into the
text, LZ78 builds phrases up one symbol at a
time, adding a new symbol to an existing
phrase when a match occurs.




Dictionary-based
compression algorithms

  The LZW Compression Algorithm can
  summarised as follows:




                                                    9
Example




Dictionary-based
compression algorithms
 The LZW decompression Algorithm can
 summarised as follows:




                                       10
Example:




Dictionary-based compression
algorithms Problem
  What if we run out of dictionary space?
    Solution 1: Keep track of unused entries
    and use LRU
    Solution 2: Monitor compression
    performance and flush dictionary when
    performance is poor.




                                               11
Repetitive
Sequence Suppression

 Fairly straight forward to understand
 and implement.
 Simplicity is their downfall: NOT best
 compression ratios.
 Some methods have their applications,
 e.g.Component of JPEG, Silence
 Suppression.




Repetitive Sequence Suppression
 If a sequence a series on & successive
 tokens appears
   Replace series with a token and a count number
   of occurrences.
   Usually need to have a special flag to denote
   when the repeated token appears
 Example
 89400000000000000000000000000000000
 we can replace with 894f32, where f is the
 flag for zero.




                                                    12
Repetitive Sequence Suppression

  How Much Compression?
    Compression savings depend on the content of
    the data.
  Applications of this simple compression
  technique include:
    Suppression of zero’s in a file (Zero Length
    Suppression)
    Silence in audio data, Pauses in conversation
    etc.
    Bitmaps
    Blanks in text or program source files
    Backgrounds in images
    Other regular image or data tokens




Run-length Encoding




                                                    13
Run-length Encoding




Run-length Encoding




                      14
Run-length Encoding




   Run-length Encoding
 Uncompress
Blue White White White White White White Blue
White Blue White White White White White Blue etc.

 Compress
1XBlue 6XWhite 1XBlue
1XWhite 1XBlue 4Xwhite 1XBlue 1XWhite
etc.




                                                     15
Run-length Encoding




Pattern Substitution




                       16
Entropy Encoding




The Shannon-Fano Coding
 To create a code tree according to Shannon
 and Fano an ordered table is required
 providing the frequency of any symbol.
 Each part of the table will be divided into two
 segments.
 The algorithm has to ensure that either the
 upper and the lower part of the segment
 have nearly the same sum of frequencies.
 This procedure will be repeated until only
 single symbols are left.




                                                   17
The Shannon-Fano
Algorithm




The Shannon-Fano
Algorithm




                   18
The Shannon-Fano
Algorithm




The Shannon-Fano
Algorithm




                   19
The Shannon-Fano
Algorithm




  Example Shannon-Fano
         Coding




                         20
Example Shannon-Fano
                Coding
                  STEP 1       STEP 2         STEP 3
SYMBOL   FREQ   SUM   CODE   SUM   CODE   SUM    CODE




                                    11    8

                                    11




         Example Shannon-Fano
                Coding




                                                        21
Example Shannon-Fano
         Coding




Huffman Coding




                         22
Huffman Coding




Huffman Coding




                 23
Huffman Coding




Huffman Coding




                 24
Huffman Coding




Huffman Coding




                 25
Huffman Coding




Huffman Coding




                 26
Huffman Code: Example




Huffman Code: Example




                        27
Huffman Code: Example




Huffman Coding




                        28
Huffman Coding




Huffman Coding




                 29
Arithmetic Coding




Arithmetic Coding




                    30
Arithmetic Coding




Arithmetic Coding




                    31
Arithmetic Coding




Arithmetic Coding




                    32
Arithmetic Coding




Arithmetic Coding




                    33
Arithmetic Coding




Arithmetic Coding




                    34
Arithmetic Coding




Arithmetic Coding




                    35
Arithmetic Coding




How to translate range to bit

     Example:
1.   BACA
     low = 0.59375, high = 0.60937.
2.   CAEE$
     low = 0.33184, high = 0.3322.




                                      36
Decimal
            1      2      3        4       5
0.12345 =     1
                + 2+ 3+ 4+ 5
           10 10 10 10 10
        = 1× 10 −1 + 2 × 10 − 2 + 3 × 10 −3 + 4 × 10 − 4 + 5 × 10 −5

     0.12345
                                x   10-5
                                x   10-4
                                x   10-3
                                x   10-2
                                x   10-1




    Binary
    0 . 01010101
       0    1  0    1   0   1      0 1
    =    + 2 + 3 + 4 + 5 + 6 + 7 + 8
      21 2     2    2   2   2    2   2
          −2     −4      −6     −8
    = 1× 2 + 1× 2 + 1× 2 + 1× 2
               0.01010
                                           x 2-5
                                           x 2-4
                                           x 2-3
                                            x 2-2
                                            x 2-1




                                                                       37
Binary to decimal

0.12 = 0.510                What is a value of
                             0.010101012 in
0.012 = 0.2510                   decimal?
0.0012 = 0.12510
0.00012 = 0.062510
0.000012 = 0.0312510

                                       0.033203125




   Generating codeword for
   encoder

BEGIN                     [0.33184,0.33220]
  code=0;
  k=1;
  while( value(code) < low )
  {
      assign 1 to the k-th binary fraction bit;
      if ( value(code) > high)
           replace the k-th bit by 0;
      k = k + 1;
  }
END




                                                     38
Example1
   Range (0.33184,0.33220)

BEGIN            Binary
  code=0;                    Decimal
  k=1;
  while( value(code) < 0.33184 )
  {
      assign 1 to the k-th binary fraction bit;
      if ( value(code) > 0.33220 )
           replace the k-th bit by 0;
      k = k + 1;
  }
END




   Example1
   Range (0.33184,0.33220)

   1.   Assign 1 to the first fraction
        (codeword=0.12) and compare with low
        (0.3318410)
          value(0.12=0.510)> 0.3318410 -> out of range
          Hence, we assign 0 for the first bit.
          value(0.02)< 0.3318410 -> while loop continue

   2.   Assign 1 to the second fraction (0.012)
        =0.2510 which is less then high (0.33220)




                                                          39
Example1
Range (0.33184,0.33220)

3.   Assign 1 to the third fraction (0.0112)
     =0.2510+ 0.12510 = 0.37510 which is bigger
     then high (0.33220), so replace the kbit by
     0. Now the codeword = 0.0102
4.   Assign 1 to the fourth fraction (0.01012) =
     0.2510 + 0.062510 =0.312510 which is less
     then high (0.33220). Now the codeword =
     0.01012
5.   Continue…




Example1
Range (0.33184,0.33220)

     Eventually, the binary codeword
     generate is 0.01010101 which
     0.033203125
     8 bit binary represent CAEE$




                                                   40

Más contenido relacionado

La actualidad más candente

Edge Detection and Segmentation
Edge Detection and SegmentationEdge Detection and Segmentation
Edge Detection and SegmentationA B Shinde
 
Arithmetic coding
Arithmetic codingArithmetic coding
Arithmetic codingVikas Goyal
 
Multimedia image compression standards
Multimedia image compression standardsMultimedia image compression standards
Multimedia image compression standardsMazin Alwaaly
 
Fundamentals and image compression models
Fundamentals and image compression modelsFundamentals and image compression models
Fundamentals and image compression modelslavanya marichamy
 
Homomorphic filtering
Homomorphic filteringHomomorphic filtering
Homomorphic filteringGautam Saxena
 
Predictive coding
Predictive codingPredictive coding
Predictive codingp_ayal
 
Presentation of Lossy compression
Presentation of Lossy compressionPresentation of Lossy compression
Presentation of Lossy compressionOmar Ghazi
 
digital image processing
digital image processingdigital image processing
digital image processingAbinaya B
 
Image compression in digital image processing
Image compression in digital image processingImage compression in digital image processing
Image compression in digital image processingDHIVYADEVAKI
 
Data Compression (Lossy and Lossless)
Data Compression (Lossy and Lossless)Data Compression (Lossy and Lossless)
Data Compression (Lossy and Lossless)Project Student
 
Introduction for Data Compression
Introduction for Data Compression Introduction for Data Compression
Introduction for Data Compression MANISH T I
 
Raster scan displays ppt
Raster scan displays pptRaster scan displays ppt
Raster scan displays pptABHISHEK KUMAR
 

La actualidad más candente (20)

Edge Detection and Segmentation
Edge Detection and SegmentationEdge Detection and Segmentation
Edge Detection and Segmentation
 
Arithmetic coding
Arithmetic codingArithmetic coding
Arithmetic coding
 
Multimedia image compression standards
Multimedia image compression standardsMultimedia image compression standards
Multimedia image compression standards
 
Fundamentals and image compression models
Fundamentals and image compression modelsFundamentals and image compression models
Fundamentals and image compression models
 
Lzw coding technique for image compression
Lzw coding technique for image compressionLzw coding technique for image compression
Lzw coding technique for image compression
 
Homomorphic filtering
Homomorphic filteringHomomorphic filtering
Homomorphic filtering
 
Predictive coding
Predictive codingPredictive coding
Predictive coding
 
Presentation of Lossy compression
Presentation of Lossy compressionPresentation of Lossy compression
Presentation of Lossy compression
 
Huffman Coding
Huffman CodingHuffman Coding
Huffman Coding
 
digital image processing
digital image processingdigital image processing
digital image processing
 
Image compression in digital image processing
Image compression in digital image processingImage compression in digital image processing
Image compression in digital image processing
 
Image compression models
Image compression modelsImage compression models
Image compression models
 
Jpeg compression
Jpeg compressionJpeg compression
Jpeg compression
 
Text compression
Text compressionText compression
Text compression
 
Data Compression (Lossy and Lossless)
Data Compression (Lossy and Lossless)Data Compression (Lossy and Lossless)
Data Compression (Lossy and Lossless)
 
Audio compression
Audio compressionAudio compression
Audio compression
 
Introduction for Data Compression
Introduction for Data Compression Introduction for Data Compression
Introduction for Data Compression
 
JPEG Image Compression
JPEG Image CompressionJPEG Image Compression
JPEG Image Compression
 
Image compression
Image compressionImage compression
Image compression
 
Raster scan displays ppt
Raster scan displays pptRaster scan displays ppt
Raster scan displays ppt
 

Similar a Dictionary Based Compression

111111111111111111111111111111111789.ppt
111111111111111111111111111111111789.ppt111111111111111111111111111111111789.ppt
111111111111111111111111111111111789.pptAllamJayaPrakash
 
111111111111111111111111111111111789.ppt
111111111111111111111111111111111789.ppt111111111111111111111111111111111789.ppt
111111111111111111111111111111111789.pptAllamJayaPrakash
 
Sunzip user tool for data reduction using huffman algorithm
Sunzip user tool for data reduction using huffman algorithmSunzip user tool for data reduction using huffman algorithm
Sunzip user tool for data reduction using huffman algorithmDr Sandeep Kumar Poonia
 
2.3 unit-ii-text-compression-a-outline-compression-techniques-run-length-codi...
2.3 unit-ii-text-compression-a-outline-compression-techniques-run-length-codi...2.3 unit-ii-text-compression-a-outline-compression-techniques-run-length-codi...
2.3 unit-ii-text-compression-a-outline-compression-techniques-run-length-codi...Helan4
 
Chapter%202%20 %20 Text%20compression(2)
Chapter%202%20 %20 Text%20compression(2)Chapter%202%20 %20 Text%20compression(2)
Chapter%202%20 %20 Text%20compression(2)nes
 
Lossless image compression.(1)
Lossless image compression.(1)Lossless image compression.(1)
Lossless image compression.(1)MohnishSatidasani
 
A research paper_on_lossless_data_compre
A research paper_on_lossless_data_compreA research paper_on_lossless_data_compre
A research paper_on_lossless_data_compreLuisa Francisco
 
Huffman coding.ppt
Huffman coding.pptHuffman coding.ppt
Huffman coding.pptvace1
 
A Lossless FBAR Compressor
A Lossless FBAR CompressorA Lossless FBAR Compressor
A Lossless FBAR CompressorPhilip Alipour
 
Teknik Pengkodean (2).pptx
Teknik Pengkodean (2).pptxTeknik Pengkodean (2).pptx
Teknik Pengkodean (2).pptxzulhelmanz
 
FINAL PROJECT REPORT
FINAL PROJECT REPORTFINAL PROJECT REPORT
FINAL PROJECT REPORTDhrumil Shah
 
Lecture 16 17 code-generation
Lecture 16 17 code-generationLecture 16 17 code-generation
Lecture 16 17 code-generationIffat Anjum
 
Comparison Study of Lossless Data Compression Algorithms for Text Data
Comparison Study of Lossless Data Compression Algorithms for Text DataComparison Study of Lossless Data Compression Algorithms for Text Data
Comparison Study of Lossless Data Compression Algorithms for Text DataIOSR Journals
 

Similar a Dictionary Based Compression (20)

111111111111111111111111111111111789.ppt
111111111111111111111111111111111789.ppt111111111111111111111111111111111789.ppt
111111111111111111111111111111111789.ppt
 
111111111111111111111111111111111789.ppt
111111111111111111111111111111111789.ppt111111111111111111111111111111111789.ppt
111111111111111111111111111111111789.ppt
 
Sunzip user tool for data reduction using huffman algorithm
Sunzip user tool for data reduction using huffman algorithmSunzip user tool for data reduction using huffman algorithm
Sunzip user tool for data reduction using huffman algorithm
 
Compression techniques
Compression techniquesCompression techniques
Compression techniques
 
2.3 unit-ii-text-compression-a-outline-compression-techniques-run-length-codi...
2.3 unit-ii-text-compression-a-outline-compression-techniques-run-length-codi...2.3 unit-ii-text-compression-a-outline-compression-techniques-run-length-codi...
2.3 unit-ii-text-compression-a-outline-compression-techniques-run-length-codi...
 
Chapter%202%20 %20 Text%20compression(2)
Chapter%202%20 %20 Text%20compression(2)Chapter%202%20 %20 Text%20compression(2)
Chapter%202%20 %20 Text%20compression(2)
 
Data Compression
Data CompressionData Compression
Data Compression
 
Lossless image compression.(1)
Lossless image compression.(1)Lossless image compression.(1)
Lossless image compression.(1)
 
A research paper_on_lossless_data_compre
A research paper_on_lossless_data_compreA research paper_on_lossless_data_compre
A research paper_on_lossless_data_compre
 
Lossless
LosslessLossless
Lossless
 
Lossless
LosslessLossless
Lossless
 
Arithmetic Coding
Arithmetic CodingArithmetic Coding
Arithmetic Coding
 
Huffman Coding
Huffman CodingHuffman Coding
Huffman Coding
 
Huffman coding.ppt
Huffman coding.pptHuffman coding.ppt
Huffman coding.ppt
 
A Lossless FBAR Compressor
A Lossless FBAR CompressorA Lossless FBAR Compressor
A Lossless FBAR Compressor
 
Teknik Pengkodean (2).pptx
Teknik Pengkodean (2).pptxTeknik Pengkodean (2).pptx
Teknik Pengkodean (2).pptx
 
FINAL PROJECT REPORT
FINAL PROJECT REPORTFINAL PROJECT REPORT
FINAL PROJECT REPORT
 
Lecture 16 17 code-generation
Lecture 16 17 code-generationLecture 16 17 code-generation
Lecture 16 17 code-generation
 
Comparison Study of Lossless Data Compression Algorithms for Text Data
Comparison Study of Lossless Data Compression Algorithms for Text DataComparison Study of Lossless Data Compression Algorithms for Text Data
Comparison Study of Lossless Data Compression Algorithms for Text Data
 
Topics:LZ77 & LZ78
Topics:LZ77 & LZ78Topics:LZ77 & LZ78
Topics:LZ77 & LZ78
 

Más de anithabalaprabhu (20)

Shannon Fano
Shannon FanoShannon Fano
Shannon Fano
 
Ch 04 Arithmetic Coding ( P P T)
Ch 04  Arithmetic  Coding ( P P T)Ch 04  Arithmetic  Coding ( P P T)
Ch 04 Arithmetic Coding ( P P T)
 
Compression
CompressionCompression
Compression
 
Datacompression1
Datacompression1Datacompression1
Datacompression1
 
Speech Compression
Speech CompressionSpeech Compression
Speech Compression
 
Z24 4 Speech Compression
Z24   4   Speech CompressionZ24   4   Speech Compression
Z24 4 Speech Compression
 
Dictor
DictorDictor
Dictor
 
Module 4 Arithmetic Coding
Module 4 Arithmetic CodingModule 4 Arithmetic Coding
Module 4 Arithmetic Coding
 
Ch 04 Arithmetic Coding (Ppt)
Ch 04 Arithmetic Coding (Ppt)Ch 04 Arithmetic Coding (Ppt)
Ch 04 Arithmetic Coding (Ppt)
 
Compression Ii
Compression IiCompression Ii
Compression Ii
 
06 Arithmetic 1
06 Arithmetic 106 Arithmetic 1
06 Arithmetic 1
 
Lassy
LassyLassy
Lassy
 
Compression Ii
Compression IiCompression Ii
Compression Ii
 
Lossy
LossyLossy
Lossy
 
Planning
PlanningPlanning
Planning
 
Losseless
LosselessLosseless
Losseless
 
Lec32
Lec32Lec32
Lec32
 
Lec5 Compression
Lec5 CompressionLec5 Compression
Lec5 Compression
 
Huffman Student
Huffman StudentHuffman Student
Huffman Student
 
Huffman Encoding Pr
Huffman Encoding PrHuffman Encoding Pr
Huffman Encoding Pr
 

Último

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 

Último (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

Dictionary Based Compression

  • 1. CHAPTER 6 Compression Techniques Objectives: Able to perform data compression Able to use different compression techniques 1
  • 2. Introduction What is Compression? Data compression requires the identification and extraction of source redundancy. In other words, data compression seeks to reduce the number of bits used to store or transmit information. There are a wide range of compression methods which can be so unlike one another that they have little in common except that they compress data. The Need For Compression In terms of storage, the capacity of a storage device can be effectively increased with methods that compresses a body of data on its way to a storage device and decompresses it when it is retrieved. In terms of communications, the bandwidth of a digital communication link can be effectively increased by compressing data at the sending end and decompressing data at the receiving end. 2
  • 3. A Brief History of Data Compression The late 40's were the early years of Information Theory, the idea of developing efficient new coding methods was just starting to be fleshed out. Ideas of entropy, information content and redundancy were explored. One popular notion held that if the probability of symbols in a message were known, there ought to be a way to code the symbols so that the message will take up less space. A Brief History of Data Compression The first well-known method for compressing digital signals is now known as Shannon- Fano coding. Shannon and Fano [~1948] simultaneously developed this algorithm which assigns binary codewords to unique symbols that appear within a given data file. While Shannon-Fano coding was a great leap forward, it had the unfortunate luck to be quickly superseded by an even more efficient coding system : Huffman Coding. 3
  • 4. A Brief History of Data Compression Huffman coding [1952] shares most characteristics of Shannon-Fano coding. Huffman coding could perform effective data compression by reducing the amount of redundancy in the coding of symbols. It has been proven to be the most efficient fixed-length coding method available. A Brief History of Data Compression In the last fifteen years, Huffman coding has been replaced by arithmetic coding. Arithmetic coding bypasses the idea of replacing an input symbol with a specific code. It replaces a stream of input symbols with a single floating-point output number. More bits are needed in the output number for longer, complex messages. 4
  • 5. A Brief History of Data Compression Terminology • Compressor–Software (or hardware) device that compresses data • Decompressor–Software (or hardware) device that decompresses data • Codec–Software (or hardware) device that compresses and decompresses data • Algorithm–The logic that governs the compression/decompression process 5
  • 6. Compression can be categorized in two broad ways: Lossless compression recover the exact original data after compression. mainly use for compressing database records, spreadsheets or word processing files, where exact replication of the original is essential. Compression can be categorized in two broad ways: Lossy compression will result in a certain loss of accuracy in exchange for a substantial increase in compression. more effective when used to compress graphic images and digitised voice where losses outside visual or aural perception can be tolerated. Most lossy compression techniques can be adjusted to different quality levels, gaining higher accuracy in exchange for less effective compression. 6
  • 7. Lossless Compression Algorithms: Lossless Compression Algorithms: Dictionary-based compression algorithms Repetitive Sequence Suppression Run-length Encoding* Pattern Substitution Entropy Encoding* The Shannon-Fano Algorithm Huffman Coding* Arithmetic Coding* 7
  • 8. Dictionary-based compression algorithms Dictionary-based compression algorithms use a completely different method to compress data. They encode variable-length strings of symbols as single tokens. The token forms an index to a phrase dictionary. If the tokens are smaller than the phrases, they replace the phrases and compression occurs. Dictionary-based compression algorithms Suppose we want to encode the Oxford Concise English dictionary which contains about 159,000 entries. Why not just transmit each word as an 18 bit number? Problems: Too many bits, everyone needs a dictionary, only works for English text. Solution: Find a way to build the dictionary adaptively. 8
  • 9. Dictionary-based compression algorithms Two dictionary- based compression techniques called LZ77 and LZ78 have been developed. LZ77 is a "sliding window" technique in which the dictionary consists of a set of fixed- length phrases found in a "window" into the previously seen text. LZ78 takes a completely different approach to building a dictionary. Instead of using fixedlength phrases from a window into the text, LZ78 builds phrases up one symbol at a time, adding a new symbol to an existing phrase when a match occurs. Dictionary-based compression algorithms The LZW Compression Algorithm can summarised as follows: 9
  • 10. Example Dictionary-based compression algorithms The LZW decompression Algorithm can summarised as follows: 10
  • 11. Example: Dictionary-based compression algorithms Problem What if we run out of dictionary space? Solution 1: Keep track of unused entries and use LRU Solution 2: Monitor compression performance and flush dictionary when performance is poor. 11
  • 12. Repetitive Sequence Suppression Fairly straight forward to understand and implement. Simplicity is their downfall: NOT best compression ratios. Some methods have their applications, e.g.Component of JPEG, Silence Suppression. Repetitive Sequence Suppression If a sequence a series on & successive tokens appears Replace series with a token and a count number of occurrences. Usually need to have a special flag to denote when the repeated token appears Example 89400000000000000000000000000000000 we can replace with 894f32, where f is the flag for zero. 12
  • 13. Repetitive Sequence Suppression How Much Compression? Compression savings depend on the content of the data. Applications of this simple compression technique include: Suppression of zero’s in a file (Zero Length Suppression) Silence in audio data, Pauses in conversation etc. Bitmaps Blanks in text or program source files Backgrounds in images Other regular image or data tokens Run-length Encoding 13
  • 15. Run-length Encoding Run-length Encoding Uncompress Blue White White White White White White Blue White Blue White White White White White Blue etc. Compress 1XBlue 6XWhite 1XBlue 1XWhite 1XBlue 4Xwhite 1XBlue 1XWhite etc. 15
  • 17. Entropy Encoding The Shannon-Fano Coding To create a code tree according to Shannon and Fano an ordered table is required providing the frequency of any symbol. Each part of the table will be divided into two segments. The algorithm has to ensure that either the upper and the lower part of the segment have nearly the same sum of frequencies. This procedure will be repeated until only single symbols are left. 17
  • 20. The Shannon-Fano Algorithm Example Shannon-Fano Coding 20
  • 21. Example Shannon-Fano Coding STEP 1 STEP 2 STEP 3 SYMBOL FREQ SUM CODE SUM CODE SUM CODE 11 8 11 Example Shannon-Fano Coding 21
  • 22. Example Shannon-Fano Coding Huffman Coding 22
  • 27. Huffman Code: Example Huffman Code: Example 27
  • 36. Arithmetic Coding How to translate range to bit Example: 1. BACA low = 0.59375, high = 0.60937. 2. CAEE$ low = 0.33184, high = 0.3322. 36
  • 37. Decimal 1 2 3 4 5 0.12345 = 1 + 2+ 3+ 4+ 5 10 10 10 10 10 = 1× 10 −1 + 2 × 10 − 2 + 3 × 10 −3 + 4 × 10 − 4 + 5 × 10 −5 0.12345 x 10-5 x 10-4 x 10-3 x 10-2 x 10-1 Binary 0 . 01010101 0 1 0 1 0 1 0 1 = + 2 + 3 + 4 + 5 + 6 + 7 + 8 21 2 2 2 2 2 2 2 −2 −4 −6 −8 = 1× 2 + 1× 2 + 1× 2 + 1× 2 0.01010 x 2-5 x 2-4 x 2-3 x 2-2 x 2-1 37
  • 38. Binary to decimal 0.12 = 0.510 What is a value of 0.010101012 in 0.012 = 0.2510 decimal? 0.0012 = 0.12510 0.00012 = 0.062510 0.000012 = 0.0312510 0.033203125 Generating codeword for encoder BEGIN [0.33184,0.33220] code=0; k=1; while( value(code) < low ) { assign 1 to the k-th binary fraction bit; if ( value(code) > high) replace the k-th bit by 0; k = k + 1; } END 38
  • 39. Example1 Range (0.33184,0.33220) BEGIN Binary code=0; Decimal k=1; while( value(code) < 0.33184 ) { assign 1 to the k-th binary fraction bit; if ( value(code) > 0.33220 ) replace the k-th bit by 0; k = k + 1; } END Example1 Range (0.33184,0.33220) 1. Assign 1 to the first fraction (codeword=0.12) and compare with low (0.3318410) value(0.12=0.510)> 0.3318410 -> out of range Hence, we assign 0 for the first bit. value(0.02)< 0.3318410 -> while loop continue 2. Assign 1 to the second fraction (0.012) =0.2510 which is less then high (0.33220) 39
  • 40. Example1 Range (0.33184,0.33220) 3. Assign 1 to the third fraction (0.0112) =0.2510+ 0.12510 = 0.37510 which is bigger then high (0.33220), so replace the kbit by 0. Now the codeword = 0.0102 4. Assign 1 to the fourth fraction (0.01012) = 0.2510 + 0.062510 =0.312510 which is less then high (0.33220). Now the codeword = 0.01012 5. Continue… Example1 Range (0.33184,0.33220) Eventually, the binary codeword generate is 0.01010101 which 0.033203125 8 bit binary represent CAEE$ 40