Performance evaluation of fast integer compression techniques over tables

Ikhtear Md. Sharif Bhuyan
Supervisors: Hazel Webb, Daniel Lemire, Owen Kaser
©Ikhtear Md. Sharif Bhuyan
Overview
• Introduction
• Compression in databases and issues
• Objectives
• Experimental Results
• Conclusion
• Future Work

Performance evaluation of fast integer compression techniques over tables

12/4/2013

Query processing
Figure: Processor, Cache, RAM, Disk

Compression in databases
• Reduces storage
• Speeds up query processing
• Saves I/O bandwidth
• Improves performance for I/O-bound operations

Selecting compression in databases
• Lossless
• Trade-off between compression ratio and the speed of compression and decompression

Objectives
• Examine and compare the performance of patched schemes against other methods with respect to compression ratio, compression speed and decompression speed.
• Assess the effect of factors such as row order.

Column-oriented database system

Example table:
ID      Name
104543  Peter
203456  Sam
234321  Maria

Row-oriented database (values stored row by row):
104543 Peter | 203456 Sam | 234321 Maria

Column-oriented database (values stored column by column):
104543 203456 234321 | Peter Sam Maria
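
The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration using the three example rows from the slide; the function names are hypothetical and nothing here reflects how a real database engine lays out pages.

```python
# Example rows from the slide: (ID, Name).
rows = [(104543, "Peter"), (203456, "Sam"), (234321, "Maria")]

def to_row_layout(table):
    """Serialize row by row: all fields of one row are adjacent."""
    return [field for row in table for field in row]

def to_column_layout(table):
    """Serialize column by column: all values of one column are adjacent."""
    return [list(column) for column in zip(*table)]

row_layout = to_row_layout(rows)
column_layout = to_column_layout(rows)

print(row_layout)
print(column_layout)
```

In the column layout the integer IDs sit next to each other, which is what makes integer compression schemes applicable to a whole column at once.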
Compression Algorithm
• Variable length output
o Byte-oriented compression: integers are coded in units of bytes, e.g., Variable-Byte
o Block-based compression: these schemes take a fixed number of input integers and output a variable number of bytes, e.g., FOR, NewPFD, FastPFOR
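
As an illustration of byte-oriented coding, here is a minimal Variable-Byte sketch in Python. It assumes one common convention (7 data bits per byte, least significant group first, high bit set on the last byte of each integer). Other implementations use the opposite flag convention, and this is not the code evaluated in the thesis.

```python
def vbyte_encode(values):
    """Encode non-negative integers, 7 data bits per byte."""
    out = bytearray()
    for v in values:
        if v < 0:
            raise ValueError("only non-negative integers are supported")
        while v >= 128:
            out.append(v & 0x7F)  # low 7 bits, high bit clear: more bytes follow
            v >>= 7
        out.append(v | 0x80)      # last byte of this integer: high bit set
    return bytes(out)

def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:           # terminator byte: the integer is complete
            values.append(current)
            current, shift = 0, 0
        else:
            shift += 7
    return values

sample = [1, 127, 128, 300, 16384]
encoded = vbyte_encode(sample)
decoded = vbyte_decode(encoded)
```

Small values take one byte and larger values take more, so the output length varies with the input values.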

Compression Algorithm (Contd …)
• Fixed length output
Each step takes a variable number of integers and produces a compressed form of those integers using a fixed number of bits as a unit, e.g., Simple9
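
A simplified Simple9 sketch in Python may make the fixed-length idea concrete. Each 32-bit output word holds a 4-bit selector and 28 data bits, and the selector picks one of nine layouts. Handling of the tail of the input is an assumption made here to keep the sketch short: a layout is used only when enough integers remain to fill it. Production implementations differ in such details.

```python
# (number of integers, bits per integer) for selectors 0..8.
SIMPLE9_CONFIGS = [(28, 1), (14, 2), (9, 3), (7, 4), (5, 5),
                   (4, 7), (3, 9), (2, 14), (1, 28)]

def simple9_encode(values):
    """Pack non-negative integers below 2**28 into 32-bit words."""
    words, pos = [], 0
    while pos < len(values):
        for selector, (count, bits) in enumerate(SIMPLE9_CONFIGS):
            chunk = values[pos:pos + count]
            if len(chunk) == count and all(0 <= v < (1 << bits) for v in chunk):
                word = selector << 28          # selector in the top 4 bits
                for i, v in enumerate(chunk):
                    word |= v << (i * bits)    # data in the low 28 bits
                words.append(word)
                pos += count
                break
        else:
            raise ValueError("value is negative or needs more than 28 bits")
    return words

def simple9_decode(words):
    """Inverse of simple9_encode."""
    values = []
    for word in words:
        count, bits = SIMPLE9_CONFIGS[word >> 28]
        mask = (1 << bits) - 1
        values.extend((word >> (i * bits)) & mask for i in range(count))
    return values

sample = [1] * 28 + [3] * 14 + [100, 200, 300] + [2 ** 27]
words = simple9_encode(sample)
restored = simple9_decode(words)
```

Here 46 integers fit in four 32-bit words: one word of 28 one-bit values, one of 14 two-bit values, one of three 9-bit values, and one holding a single 28-bit value.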

Binary packing
• Original Sequence
67 78 85 96 98
• The numbers range from 67 to 98.
• Compressed Sequence
0 11 18 29 31
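
The binary packing example can be reproduced with a short Python sketch: subtract the minimum (67), then store each offset with just enough bits for the largest offset (31 needs 5 bits). The function names and the choice to pack into a single Python integer are assumptions for illustration. Real implementations pack fixed-size blocks into 32-bit words.

```python
def binary_pack(values):
    """Return (base, width, count, packed) for a non-empty list of integers."""
    base = min(values)
    offsets = [v - base for v in values]
    width = max(offsets).bit_length()      # bits needed for the largest offset
    packed = 0
    for i, offset in enumerate(offsets):
        packed |= offset << (i * width)    # place each offset in its own slot
    return base, width, len(values), packed

def binary_unpack(base, width, count, packed):
    """Inverse of binary_pack."""
    mask = (1 << width) - 1
    return [base + ((packed >> (i * width)) & mask) for i in range(count)]

sequence = [67, 78, 85, 96, 98]
base, width, count, packed = binary_pack(sequence)
offsets = [v - base for v in sequence]
restored = binary_unpack(base, width, count, packed)
```

For this sequence the five values need 5 bits each, 25 bits in total plus the stored base and width, instead of five full-width integers.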
Patched Compression
• Original Sequence (values in binary)
11 1 10 … 11 11 11111 10 11
• The exception is 11111.
• Bit width b = 2 (non-exceptional values), maximum number of bits 5, number of exceptions 1, location of exception 125
• Compressed Sequence
11 1 10 … 11 11 11 10 11
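
The patching idea can be sketched as follows, under the assumption (consistent with the compressed sequence on the slide) that an exception keeps its low b bits in place while its position and remaining high bits are stored separately. The function names are hypothetical, the sequence is shortened to eight values so the exception sits at position 5 rather than 125, and the actual NewPFD, OptPFD and FastPFOR formats differ in how they store exceptions.

```python
def patched_encode(values, b):
    """Split values into b-bit low parts and a list of (position, high_bits)."""
    mask = (1 << b) - 1
    low_bits = [v & mask for v in values]
    exceptions = [(i, v >> b) for i, v in enumerate(values) if v > mask]
    return low_bits, exceptions

def patched_decode(low_bits, exceptions, b):
    """Rebuild the original values by patching the exceptions back in."""
    values = list(low_bits)
    for position, high_bits in exceptions:
        values[position] |= high_bits << b
    return values

# Shortened version of the slide's sequence, written in binary.
block = [0b11, 0b1, 0b10, 0b11, 0b11, 0b11111, 0b10, 0b11]
low_bits, exceptions = patched_encode(block, b=2)
restored = patched_decode(low_bits, exceptions, b=2)
```

Every slot uses 2 bits, and only the single exception pays for the extra 3 high bits and its position.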
Synthetic data experiments
• Compression ratio (figures): clustered data, uniform data
• Decompression speed (figures): clustered data, uniform data
Real Data Sets
• Census-Income
• Census1881
• Star Schema Benchmark

Column-wise compressed size
Figures: column-wise compressed size for the frequency-coded Census1881 file, with rows in original order, shuffled, sorted on a high-cardinality column (column 1), and sorted on a low-cardinality column (column 3).

Column-wise compression speed
Figures: column-wise compression speed for the frequency-coded Census1881 file.

Column-wise decompression speed
Figures: column-wise decompression speed for the frequency-coded Census1881 file.
Effect of Row Order

Histogram of compressed size (bits/int)
Conclusion
• Sorting columns reduces the compressed size.
• Sorted columns can be compressed and decompressed faster than shuffled columns.
• The choice of compression scheme depends on the nature of the database (OLTP/OLAP) and on the storage and data access speed requirements.
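
A toy Python experiment can illustrate why row order matters for block-based schemes. It uses a synthetic column (64 distinct codes, each occurring 16 times), not the thesis data sets, and a simplified cost model that counts only the packed offsets of each 32-value block and ignores per-block headers. The numbers it produces are therefore not comparable to the reported results.

```python
def block_packed_bits(column, block_size=32):
    """Bits used if each block stores offsets from its minimum at one fixed width."""
    total = 0
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        width = (max(block) - min(block)).bit_length()
        total += width * len(block)
    return total

# Synthetic column: codes 0..63 interleaved, so every block spans a wide range.
column = [i % 64 for i in range(1024)]

unsorted_bits = block_packed_bits(column)
sorted_bits = block_packed_bits(sorted(column))

print(unsorted_bits, sorted_bits)
```

After sorting, each block contains only two neighbouring codes, so its offsets need a single bit instead of five.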

Future Work
• Incorporating a query engine to assess real-world performance.
• Comparing on processor-level metrics.
• Using multiple threads in the compression algorithms.
• Querying data in compressed form.

Thank You

Backup

Key Issues
• Data access latency
The time between sending a request and locating the data on disk so that processing can start.
• Disk bandwidth
The amount of data that can be transferred from the disk per second.
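
A back-of-the-envelope model shows how latency and bandwidth interact with compression. The formula (latency plus transfer time plus optional decoding time) and every number below are illustrative assumptions, not measurements from the thesis.

```python
def scan_seconds(num_ints, bits_per_int, latency_s, bandwidth_bytes_per_s,
                 decode_ints_per_s=None):
    """Estimated time to read num_ints integers from disk and decode them."""
    transfer = num_ints * bits_per_int / 8 / bandwidth_bytes_per_s
    decode = num_ints / decode_ints_per_s if decode_ints_per_s else 0.0
    return latency_s + transfer + decode

# Hypothetical workload: 100 million integers, 10 ms latency, 100 MB/s disk.
uncompressed = scan_seconds(100_000_000, 32, 0.01, 100e6)
# Hypothetical codec: 12 bits per integer, decoding 500 million integers/s.
compressed = scan_seconds(100_000_000, 12, 0.01, 100e6, decode_ints_per_s=500e6)

print(round(uncompressed, 2), round(compressed, 2))
```

Under these assumed numbers the smaller transfer outweighs the decoding cost. With a much faster disk or a slower codec the balance can tip the other way, which is the trade-off between compression ratio and speed.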

Experimental Setup
• Hardware
o Intel Core i5-2400
o RAM: 8 GB
o Cache: 6 MB L3
o Memory Clock Speed: 1333 MHz
• Software
o Java SDK version 1.7.0
o https://github.com/lemire/JavaFastPFOR
o Single-threaded
• More Info
o http://hdl.handle.net/1882/45703

Compressed Size

Coding Scheme    Original  Shuffled  High Card.  Low Card.
Variable-Byte    15.00     15.00     15.00       15.00
Binary Packing   11.37     11.42     11.15       11.37
NewPFD           13.06     13.19     12.32       13.14
OptPFD           11.84     11.85     11.80       11.80
FastPFOR         11.27     11.29     11.06       11.24
Simple9          15.75     15.90     15.72       15.84

Compressed size (bits per integer) on SSB with frequency-coded file
Compression Speed

Coding Scheme    Original  Shuffled  High Card.  Low Card.
Variable-Byte    33        31        33          31
Binary Packing   729       711       746         732
NewPFD           52        36        40          34
OptPFD           6         3         5           4
FastPFOR         104       76        89          84
Simple9          78        60        69          64

Compression speed (mis, millions of integers per second) on Census1881 with frequency-coded file
Decompression Speed

Coding Scheme    Original  Shuffled  High Card.  Low Card.
Variable-Byte    165       197       214         186
Binary Packing   1151      1089      1151        1135
NewPFD           709       615       729         689
OptPFD           421       357       482         381
FastPFOR         776       707       763         730
Simple9          488       377       447         398

Decompression speed (mis, millions of integers per second) on Census1881 with frequency-coded file
Effect of CPU family on compression speed
Figure: compression speed (mis) on different processors

Effect of CPU family on decompression speed
Figure: decompression speed (mis) on different processors

Más contenido relacionado

Último

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 

Último (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Destacado

Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 

Destacado (20)

Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 

Performance evaluation of fast integer compression techniques over tables

  • 1. Performance evaluation of fast integer compression techniques over tables Ikhtear Md. Sharif Bhuyan Supervisors: Hazel Webb, Daniel Lemire, Owen Kaser ©Ikhtear Md. Sharif Bhuyan
  • 2. Overview • • • • • • Introduction Compression in databases and issues Objectives Experimental Results Conclusion Future Work Performance evaluation of fast integer compression techniques over tables 12/4/2013 2
  • 3. Query processing Processor Cache RAM Disk Performance evaluation of fast integer compression techniques over tables 12/4/2013 3
  • 4. Compression in databases • • • • Reduce storage Query processing speed Save I/O bandwidth Improve performance for I/O-bound operation Performance evaluation of fast integer compression techniques over tables 12/4/2013 4
  • 5. Selecting Compression in databases • Lossless • Trade off between compression ratio and speed of compression and decompression Performance evaluation of fast integer compression techniques over tables 12/4/2013 5
  • 6. Objective • Examining and comparing the performance of patched schemes with other methods with respect to compression ratio, decompression speed and compression speed. • Assessing the effect of different factors such as row order. Performance evaluation of fast integer compression techniques over tables 12/4/2013 6
  • 8. Compression Algorithm • Variable length output o Byte-oriented compression: Integers are coded in units of bytes. i.e., Variable-Byte o Block-based compression: These schemes use a fixed number of input integers and output a variable number of bytes. e.g., FOR, NewPFD, FastPFD Performance evaluation of fast integer compression techniques over tables 12/4/2013 8
  • 9. Compression Algorithm (Contd …) • Fixed length output Each step takes a variable number of integers and produces a compressed form of those integers using a fixed number of bits as a unit. i.e., Simple9 Performance evaluation of fast integer compression techniques over tables 12/4/2013 9
  • 10. Binary packing • Original Sequence 67 78 85 96 98 • the numbers range from 67 to 98. • Compressed Sequence 0 11 18 29 Performance evaluation of fast integer compression techniques over tables 31 12/4/2013 10
  • 11. Patched Compression • Original Sequence 11 1 10 … 11 11 11111 10 11 • The exception # 11111. • Base value b=2 (non-exceptional values), maximum number of bits 5, number of exception 1, location of exception 125 • Compressed Sequence 11 1 10 … 11 11 11 Performance evaluation of fast integer compression techniques over tables 10 11 12/4/2013 11
  • 12. Synthetic data experiments • Compression Ratio Clustered data Clustered Data Performance evaluation of fast integer compression techniques over tables 12/4/2013 12
  • 13. Synthetic data experiments (Contd …) • Compression Ratio Uniform data Uniform data Performance evaluation of fast integer compression techniques over tables 12/4/2013 13
  • 14. Synthetic data experiments(Contd …) • Decompression Speed: Clustered data Performance evaluation of fast integer compression techniques over tables 12/4/2013 14
  • 15. Synthetic data experiments(Contd …) Uniform Data Performance evaluation of fast integer compression techniques over tables 12/4/2013 15
  • 16. Real Data Sets • Census-Income • Census1881 • Star Schema Benchmark Performance evaluation of fast integer compression techniques over tables 12/4/2013 16
  • 17. Column wise Compressed size Original Shuffled Column-wise compressed size for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 17
  • 18. Column wise Compressed size (Contd …) Sort High Cardinality Column (column 1) Sort Low Cardinality Column(column 3) Column-wise compressed size for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 18
  • 19. Column wise Compression speed Column-wise compression speed for Census1881 of frequency coded file Performance evaluation of fast integer compression techniques over tables 12/4/2013 19
  • 20. Column-wise compression speed (Contd …) • Figure: Column-wise compression speed for Census1881 of frequency coded file
  • 21. Column-wise decompression speed • Figure: Column-wise decompression speed for Census1881 of frequency coded file
  • 22. Column-wise decompression speed (Contd …) • Figure: Column-wise decompression speed for Census1881 of frequency coded file
  • 23. Effect of Row Order • Figure: Histogram of compressed size (bits/int)
  • 24. Conclusion • Sorting columns yields a smaller compressed size. • Sorted columns can be compressed and decompressed faster than columns in shuffled order. • The choice of compression scheme depends on the nature of the database (OLTP/OLAP) and on the storage and data access speed requirements.
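The first conclusion can be illustrated with a small sketch of why row order matters for block-based schemes. It assumes a simplified cost model in which each block is charged the bit width of its value range times its length, with per-block headers ignored. The block length of 128, the function `packed_bits` and the synthetic column are hypothetical and unrelated to the Census1881 data.

```python
BLOCK = 128  # hypothetical block length


def packed_bits(column, block=BLOCK):
    """Total payload bits if each block is binary packed on its own."""
    total = 0
    for start in range(0, len(column), block):
        chunk = column[start:start + block]
        # Width needed for the largest offset from the block minimum.
        width = (max(chunk) - min(chunk)).bit_length()
        total += width * len(chunk)
    return total


# Synthetic column: values scattered over 0..999 in the original row order.
original = [(i * 37) % 1000 for i in range(1024)]
ordered = sorted(original)

print("original order:", packed_bits(original) / len(original), "bits/int")
print("sorted order:  ", packed_bits(ordered) / len(ordered), "bits/int")
```

In the sorted column every block covers a narrow range of values, so its bit width is smaller than in the original order, where each block spans almost the whole value range.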
  • 25. Future Work • Incorporating a query engine to assess real-world performance. • Comparing on processor-level metrics. • Using multiple threads in the compression algorithms. • Querying data in compressed form.
  • 26. Thank You
  • 27. Backup
  • 28. Key Issues • Data access latency: the time between sending a request and the data being found on disk so that processing can start. • Disk bandwidth: the amount of data that can be sent from the disk per second.
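How the two key issues combine can be sketched with a simple cost model: read time is approximately latency plus data size divided by bandwidth. All numbers below are hypothetical and chosen only for easy arithmetic; they are not measurements from this work, and decompression time is deliberately left out.

```python
def read_time_seconds(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Approximate time to fetch a column: one seek plus the transfer."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


LATENCY_S = 0.01           # hypothetical access latency
BANDWIDTH = 100_000_000    # hypothetical disk bandwidth, bytes per second
N_INTEGERS = 100_000_000   # hypothetical column length

raw_bytes = N_INTEGERS * 4         # uncompressed 32-bit integers
compressed_bytes = N_INTEGERS * 1  # assuming 8 bits per integer after coding

print(read_time_seconds(raw_bytes, LATENCY_S, BANDWIDTH))
print(read_time_seconds(compressed_bytes, LATENCY_S, BANDWIDTH))
```

Under these assumptions latency is paid once per request while the transfer term shrinks in proportion to the compressed size, which is why compression helps most for I/O-bound operations.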
  • 29. Experimental Setup • Hardware o Intel Core i5-2400 o RAM: 8 GB o Cache: 6 MB L3 o Memory Clock Speed: 1333 MHz • Software o Java SDK version 1.7.0 o https://github.com/lemire/JavaFastPFOR o Single-threaded • More Info o http://hdl.handle.net/1882/45703
  • 30. Compressed Size • Result of compression (bits per integer) on SSB with frequency coded file
      Coding Scheme    Original  Shuffled  High Card.  Low Card.
      Variable-Byte    15.00     15.00     15.00       15.00
      Binary Packing   11.37     11.42     11.15       11.37
      NewPFD           13.06     13.19     12.32       13.14
      OptPFD           11.84     11.85     11.80       11.80
      FastPFOR         11.27     11.29     11.06       11.24
      Simple9          15.75     15.90     15.72       15.84
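The bits-per-integer metric in the table can be sketched for the simplest scheme, Variable-Byte. The sketch assumes one common convention (7 payload bits per byte, high bit set when more bytes follow); real libraries, including JavaFastPFOR, may use a different flag convention. The function names and the input values are hypothetical.

```python
def vbyte_encode(values):
    """Encode non-negative integers with 7 payload bits per byte."""
    out = bytearray()
    for v in values:
        while v >= 0x80:
            out.append((v & 0x7F) | 0x80)  # high bit set: more bytes follow
            v >>= 7
        out.append(v)  # final byte has the high bit clear
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def bits_per_int(encoded, count):
    """Compressed size expressed as average bits per input integer."""
    return 8 * len(encoded) / count


values = [5, 300, 70000, 12]
encoded = vbyte_encode(values)
assert vbyte_decode(encoded) == values
print(bits_per_int(encoded, len(values)))  # 14.0
```

Because Variable-Byte always emits whole bytes, its bits-per-integer figure depends only on how many values cross the 7-bit, 14-bit and higher thresholds, not on row order, which is consistent with the identical Variable-Byte entries across the table's columns.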
  • 31. Compression Speed • Result of compression speed (mis, millions of integers per second) on Census1881 with frequency coded file
      Coding Scheme    Original  Shuffled  High Card.  Low Card.
      Variable-Byte    33        31        33          31
      Binary Packing   729       711       746         732
      NewPFD           52        36        40          34
      OptPFD           6         3         5           4
      FastPFOR         104       76        89          84
      Simple9          78        60        69          64
  • 32. Decompression Speed • Result of decompression speed (mis, millions of integers per second) on Census1881 with frequency coded file
      Coding Scheme    Original  Shuffled  High Card.  Low Card.
      Variable-Byte    165       197       214         186
      Binary Packing   1151      1089      1151        1135
      NewPFD           709       615       729         689
      OptPFD           421       357       482         381
      FastPFOR         776       707       763         730
      Simple9          488       377       447         398
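A speed figure in mis can be obtained roughly as sketched below. The sketch assumes mis means millions of integers processed per second and takes the best of a few timed runs. The helper names are hypothetical, the timed workload (`sorted`) is only a stand-in for a codec, and this is not the benchmarking harness used for the tables.

```python
import time


def throughput_mis(n_integers, seconds):
    """Millions of integers processed per second."""
    return n_integers / seconds / 1_000_000


def measure_mis(func, data, repeats=5):
    """Time func(data) several times and report the best run in mis."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading from a coarse timer.
    return throughput_mis(len(data), max(best, 1e-9))


print(throughput_mis(2_000_000, 0.5))  # 4.0
print(measure_mis(sorted, list(range(100_000))))
```

Taking the minimum over repeats reduces noise from other processes; results from an interpreted sketch like this are not comparable with the Java measurements in the tables.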
  • 33. Column-wise compressed size • Figure: Column-wise compressed size for Census1881 of frequency coded file (panels: Original; Shuffled)
  • 34. Column-wise compressed size • Figure: Column-wise compressed size for Census1881 of frequency coded file (panels: sorted on high-cardinality column, column 1; sorted on low-cardinality column, column 3)
  • 35. Column-wise compression speed • Figure: Column-wise compression speed for Census1881 of frequency coded file
  • 36. Column-wise decompression speed • Figure: Column-wise decompression speed for Census1881 of frequency coded file
  • 37. Effect of CPU family on compression speed • Figure: Compression speed (mis) on different processors
  • 38. Effect of CPU family on decompression speed • Figure: Decompression speed (mis) on different processors