Accelerating Foreign-Key Joins using
  Asymmetric Memory Channels

  Holger Pirk     Stefan Manegold     Martin Kersten
  holger@cwi.nl     manegold@cwi.nl      mk@cwi.nl
Why?
• Trivia: Joins are important
• But: Many Joins are (Indexed) Foreign Key Joins
• Important example: Data Warehouse Star Joins
  • Large Fact Table (many Gigabytes) with Foreign Keys on
  • Small Dimension Tables (hundreds of Megabytes)
  • Joins are Performed for Projection or Condition Evaluation
A History of In Memory Join Processing
(and incidentally of this talk)
• Foreign-Key Joins on CPUs
  • Naïve (Indexed Hash) Foreign-Key Joins
  • Clustered Foreign-Key Joins
• Foreign-Key Joins on GPGPUs
  • Clustered Foreign-Key Joins
  • Naïve (Asymmetric) Foreign-Key Joins
Premises for this talk

• Focus is on Indexed Foreign Key Joins
• We assume an unclustered Foreign Key Index Column
  • i.e., Pointers to Matching Tuples of the Target Table
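As a minimal sketch of what such an index column could look like (Python; the function name `build_fk_index` and the sample keys are hypothetical and not taken from the talk), each fact-table row stores the position of its matching tuple in the target table:

```python
def build_fk_index(fact_foreign_keys, target_primary_keys):
    """Return, for each fact row, the position of its matching target tuple."""
    # Map each primary key to its position in the target table.
    position_of = {key: pos for pos, key in enumerate(target_primary_keys)}
    # Fact-table order is kept, so the resulting column is unclustered.
    return [position_of[fk] for fk in fact_foreign_keys]


# Hypothetical data: three target tuples, four fact rows.
target_primary_keys = [10, 20, 30]
fact_foreign_keys = [30, 10, 10, 20]
fk_index = build_fk_index(fact_foreign_keys, target_primary_keys)
print(fk_index)  # [2, 0, 0, 1]
```

The positions follow fact-table order, which is what makes the column unclustered with respect to the target table.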
Foreign-Key Joins on CPUs
The Problem: Bandwidth Bound CPUs
[Diagram: Core 1, Core 2, ..., Core 4 share a Memory Controller that connects to three Memory modules over links labeled 8.5 GB/s]
• Bottleneck for almost all relational (Memory Resident) DBMS operations
• Valuable Target for Optimization
Naïve Foreign-Key Joins on CPUs
• Foreign-Key Index is scanned sequentially
• Target Table is randomly accessed
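The access pattern can be sketched as follows (Python; `naive_fk_join` and the sample data are hypothetical illustrations, not the authors' implementation). The loop walks the index in order, while every lookup may land anywhere in the target table:

```python
def naive_fk_join(fk_index, target_table):
    """Follow each foreign-key pointer into the target table."""
    result = []
    for position in fk_index:                   # sequential scan of the index
        result.append(target_table[position])  # random access into the target
    return result


# Hypothetical data: positions into a 4-value target table.
fk_index = [3, 0, 2, 0, 1]
target_table = ["a", "b", "c", "d"]
print(naive_fk_join(fk_index, target_table))  # ['d', 'a', 'c', 'a', 'b']
```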
Naïve Foreign-Key Joins on CPUs
• Optimization Targets:
  • Foreign-Key Index is scanned sequentially
  • Index Cache lines accessed once, i.e., not much to optimize
Naïve Foreign-Key Joins on CPUs
• Large Target Tables cause heavy Cache Thrashing
• Assume a (theoretical) 1-Slot Cache with 2-Value Cache Lines
• How many Cache Misses do you get?
[Figure: worked lookup example; the braces are annotated with the counts 6 and 6]
  • 80% of them are Capacity Misses
• Target Table Lookups are rewarding Optimization Targets
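The thought experiment can be replayed with a small simulation (Python). The lookup sequence below is a hypothetical example and not the one shown on the slide, so its counts need not match the slide's percentages; `count_cache_misses` is an illustrative name:

```python
def count_cache_misses(positions, values_per_line=2):
    """Count misses in a cache that holds exactly one cache line."""
    cached_line = None
    misses = 0
    for position in positions:
        line = position // values_per_line  # cache line holding this value
        if line != cached_line:
            misses += 1
            cached_line = line  # the single slot is overwritten
    return misses


# Hypothetical lookups into a 4-value target table (2 cache lines).
lookups = [0, 2, 1, 3, 0, 2]
total = count_cache_misses(lookups)
compulsory = len({p // 2 for p in lookups})  # first touch of each line
print(total, compulsory, total - compulsory)  # 6 2 4
```

In this sequence every lookup evicts the line the next lookup needs, so all six accesses miss even though the table spans only two cache lines.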
Clustered Foreign-Key Joins on CPUs
• Require a Value-partitioned Foreign-Key Index
• Foreign-Key Index Partitions are scanned
• Target Table Random Access Locality is improved
[Figure: bracket pairs mark the partitions of the Foreign-Key Index and of the Target Table]
Clustered Foreign-Key Joins on CPUs
• What is the number of Target Table Cache Misses?
[Figure: the partitioned lookup example; the braces are annotated with the counts 1 and 1]
• So Cache behavior is greatly improved
• Also: (disjoint) Partitions can be processed in parallel
  • Parallelism is generally good for GPUs
• But: The Index has to be Value (e.g., Radix)-partitioned first
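A minimal sketch of the partition-then-join idea (Python). It assumes positions fit in `key_bits` bits and partitions on the top `radix_bits` bits; the function names, parameters, and data are hypothetical and not taken from the talk:

```python
def radix_partition(fk_index, key_bits, radix_bits):
    """Split (row id, position) pairs by the top radix_bits of the position."""
    shift = key_bits - radix_bits
    partitions = [[] for _ in range(1 << radix_bits)]
    for row_id, position in enumerate(fk_index):
        partitions[position >> shift].append((row_id, position))
    return partitions


def clustered_fk_join(fk_index, target_table, key_bits, radix_bits):
    """Join partition by partition; each partition touches one target range."""
    result = [None] * len(fk_index)
    for partition in radix_partition(fk_index, key_bits, radix_bits):
        for row_id, position in partition:
            result[row_id] = target_table[position]
    return result


# Hypothetical data: 4 target values (2-bit positions), 2 partitions.
fk_index = [0, 2, 1, 3, 0, 2]
target_table = ["a", "b", "c", "d"]
print(radix_partition(fk_index, key_bits=2, radix_bits=1))
# [[(0, 0), (2, 1), (4, 0)], [(1, 2), (3, 3), (5, 2)]]
print(clustered_fk_join(fk_index, target_table, key_bits=2, radix_bits=1))
# ['a', 'c', 'b', 'd', 'a', 'c']
```

In this toy example each partition only touches positions within one 2-value range of the target table, and the partitions are disjoint, so they could be processed independently. The extra pass over the index in `radix_partition` is the clustering cost.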
Clustered Foreign-Key Joins on CPUs
[Figure: Processing Time in seconds (0 to 6.0) vs. Number of Partitions in Clustering Step (1 to 10000); Target Table Size: 64M. Series: Unpartitioned Join (CPU), Partition (CPU), Join (CPU), Radix Join (CPU). Annotation: about 30 % (consistent with [Blanas11])]
Clustered Foreign-Key Joins on CPUs
[Figure: bar chart of Partition time in Seconds (0 to 25.00) for CPU and GPU at 256 Clusters and 65536 Clusters; bar labels: 1.695271, 11.040, 6.492860, 20.512]
Foreign-Key Joins on GPUs
GPU Architecture Overview
[Diagram: Processing Unit 1 through Processing Unit 16, each with an Instruction Dispatcher, a 4 x 4 grid of Cores, and Local Memory; all units share a Global Memory]
• Expensive Atomic Operations
The Problem: Bandwidth Bound GPUs

[Diagram: Core 1, Core 2, ..., Core 4 share a Memory Controller with 8.5 GB/s links to Memory; QPI (25.6 GB/s) connects to an I/O Hub; PCI-E x 16 (8 GB/s) connects the I/O Hub to two GPUs (label: 177.4 GB/s), each with Graphics Memory]
• PCI-E Bottleneck is a big Problem for GPUs
Clustered Foreign-Key Joins on GPUs
• Port of the Radix-Join from CPU to GPU [He08]
• Improved Cache Performance is Important on GPUs too
• Allows efficient parallel processing of Partitions
Clustered Foreign-Key Joins on GPUs
[Figure: Processing Time in seconds (0 to 6.0) vs. Number of Partitions in Clustering Step (1 to 10000). Series: Join (GPU), Partition (CPU), Radix Join (CPU/GPU). Annotations: about 11 %, about 70 %]
Naïve Foreign-Key Joins on GPUs
• Primary Bottleneck is PCI-E, not GPU Memory
• Idea: Accelerate FK Joins by exploiting the Strength of GPUs:
  • Random Access to target table
• Limit the PCI-E traffic to a minimum:
  • Stream only the Index
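A back-of-the-envelope sketch of why the PCI-E link dominates (Python). The bandwidth figures are the ones labeled on the bandwidth slide; reading the 177.4 GB/s label as GPU memory bandwidth, as well as the index size, are assumptions made only for illustration:

```python
def transfer_seconds(num_bytes, gigabytes_per_second):
    """Time to move num_bytes over a link with the given bandwidth."""
    return num_bytes / (gigabytes_per_second * 1e9)


PCIE_GB_PER_S = 8.0          # PCI-E x 16 figure from the bandwidth slide
GPU_MEMORY_GB_PER_S = 177.4  # label next to the GPU (assumed: GPU memory)

# Hypothetical index: 256 million 4-byte entries.
index_bytes = 256_000_000 * 4
pcie_time = transfer_seconds(index_bytes, PCIE_GB_PER_S)
gpu_time = transfer_seconds(index_bytes, GPU_MEMORY_GB_PER_S)
print(f"PCI-E: {pcie_time:.3f} s, GPU memory: {gpu_time:.4f} s")
```

Under these assumptions, moving the index across PCI-E takes far longer than reading the same number of bytes from GPU memory, which is why only the index is streamed and the target table stays on the device.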
Asymmetric Foreign-Key Joins on GPUs
[Figure: Processing Time in seconds (0 to 6.0) vs. Number of Partitions in Clustering Step (1 to 10000). Series: Unpartitioned Join (GPU), Join (GPU), Partition (CPU), Radix Join (CPU/GPU). Annotation: about factor 3.5]
Asymmetric Foreign-Key Joins on GPUs
• Message: Clustering improves performance on GPUs
• But: Clustering Costs don't pay off in Cache Friendliness
• But what about parallel processing of Partitions?
But what about Parallelization?
[Figure: Processing time in Seconds (0 to 1.00) vs. Number of Utilized Cores per Processor (Work Group Size) (1 to 1000). Series: Total Costs, Data transfer Costs. Annotation: about 41 % (not enough)]
Foreign-Key Joins on GPUs vs CPUs
[Figure: Processing Time in Seconds (0 to 1.100) vs. Dimension Table Size (1K to 100000K). Series: Intel i7 860, ATI Radeon HD 5850, PCI-E Data Transfer Costs]
[Figure: same axes, with the GPU series labeled ATI Radeon HD 5850 (processing only)]
Conclusion
• Unclustered FK-Joins Cause Heavy Cache Thrashing
• Thrashing can be offloaded to fast GPU memory
• PCI-E Bandwidth can be saturated without Clustering
• Naïve FK-Joins on GPUs outperform Clustered Joins on CPUs
Thank you for Listening
Questions?   Feedback?

Accelerating Foreign-Key Joins using Asymmetric Memory Channels

  • 1. Accelerating Foreign-Key Joins using Asymmetric Memory Channels Holger Pirk Stefan Manegold Martin Kersten holger@cwi.nl manegold@cwi.nl mk@cwi.nl
  • 2. Why? • Trivia: Joins are important • But: Many Joins are (Indexed) Foreign Key Joins • Important example: Data Warehouse Star Joins • Large Fact Table (many Gigabytes) with Foreign Keys on • Small Dimension Tables (hundreds of Megabytes) • Joins are Performed for Projection or Condition Evaluation
  • 3. A History of In Memory Join Processing (and incidentally of this talk) • Foreign-Key Joins on CPUs • Naïve (Indexed Hash) Foreign-Key Joins • Clustered Foreign-Key Joins t • Foreign-Key Joins on GPGPUs • Clustered Foreign-Key Joins • Naïve (Asymmetric) Foreign-Key Joins
  • 4. Premises for this talk • Focus is on Indexed Foreign Key Joins • We assume an unclustered Foreign Key Index Column • i.e., Pointers to Matching Tuples of the Target Table
  • 6. The Problem: Bandwidth Bound CPUs Core 1 Core 2 ... Core 4 Memory Controller • Bottleneck for almost all relational (Memory 8.5 GB/s Resident) DBMS operations 8.5 GB/s • Valuable Target for Memory Memory Memory Optimization
  • 8. Naïve Foreign-Key Joins on CPUs •Foreign-Key Index is scanned sequentially •Target Table is randomly accessed
  • 9. Naïve Foreign-Key Joins on CPUs • Optimization Targets: • Foreign-Key Index is scanned sequentially • Index Cache lines accessed once, i.e., not much to optimize
  • 10. Naïve Foreign-Key Joins on CPUs • Large Target Tables cause heavy Cache Thrashing • Assume a (theoretical) 1-Slot Cache with 2-Value Cache Lines • How many Cache Misses do you get? [Figure: worked example, annotated with 6] • 80% of them are Capacity Misses • Target Table Lookups are rewarding Optimization Targets
  • 12. Clustered Foreign-Key Joins on CPUs • Require a Value-partitioned Foreign-Key Index • Foreign-Key Index Partitions are scanned • Target Table Random Access Locality is improved
  • 13. Clustered Foreign-Key Joins on CPUs • Require a Value-partitioned Foreign-Key Index • Foreign-Key Index Partitions are scanned • Target Table Random Access Locality is improved
  • 14. Clustered Foreign-Key Joins on CPUs • What is the number of Target Table Cache Misses? [Figure: worked example, annotated with 1] • So Cache behavior is greatly improved • Also: (disjoint) Partitions can be processed in parallel • Parallelism is generally good for GPUs • But: The Index has to be Value (e.g., Radix)-partitioned first
  • 15. Clustered Foreign-Key Joins on CPUs [Chart: Processing Time in seconds (0 to 6.0) vs. Number of Partitions in Clustering Step (1 to 10000), Target Table Size: 64M; series: Unpartitioned Join (CPU), Partition (CPU), Join (CPU), Radix Join (CPU); annotation: about 30 % (consistent with [Blanas11])]
  • 16. Clustered Foreign-Key Joins on CPUs [Chart: Partition time in Seconds (0 to 25.00), CPU vs. GPU, for 256 Clusters and 65536 Clusters; bar labels: 20.512, 11.040, 6.492860, 1.695271]
  • 18. GPU Architecture Overview [Diagram: Processing Unit 1 ... Processing Unit 16, each with an Instruction Dispatcher, Cores and Local Memory; shared Global Memory; label: Expensive Atomic Operations]
  • 19. The Problem: Bandwidth Bound GPUs [Diagram: Core 1, Core 2, ... Core 4; Memory Controller; Memory; I/O Hub; GPUs with Graphics Memory; link labels: PCI-E x 16 (8 GB/s), QPI (25.6 GB/s), 177.4 GB/s, 8.5 GB/s] • PCI-E Bottleneck is a big Problem for GPUs
  • 21. Clustered Foreign-Key Joins on GPUs • Port of the Radix-Join from CPU to GPU [He08] • Improved Cache Performance is Important on GPUs too • Allows efficient parallel processing of Partitions
  • 22. Clustered Foreign-Key Joins on GPUs [Chart: Processing Time in seconds (0 to 6.0) vs. Number of Partitions in Clustering Step (1 to 10000); series: Join (GPU), Partition (CPU), Radix Join (CPU/GPU); annotations: about 11 %, about 70 %]
  • 24. Naïve Foreign-Key Joins on GPUs • Primary Bottleneck is PCI-E, not GPU Memory • Idea: Accelerate FK Joins by exploiting the Strength of GPUs: • Random Access to target table • Limit the PCI-E traffic to a minimum: • Stream only the Index
  • 25. Asymmetric Foreign-Key Joins on GPUs • Primary Bottleneck is PCI-E, not GPU Memory • Idea: Accelerate FK Joins by exploiting the Strength of GPUs: • Random Access to target table • Limit the PCI-E traffic to a minimum: • Stream only the Index
  • 26. Asymmetric Foreign-Key Joins on GPUs [Chart: Processing Time in seconds (0 to 6.0) vs. Number of Partitions in Clustering Step (1 to 10000); series: Join (GPU), Partition (CPU), Radix Join (CPU/GPU)]
  • 27. Asymmetric Foreign-Key Joins on GPUs [Chart: Processing Time in seconds (0 to 6.0) vs. Number of Partitions in Clustering Step (1 to 10000); series: Unpartitioned Join (GPU), Join (GPU), Partition (CPU), Radix Join (CPU/GPU); annotation: about factor 3.5]
  • 28. Asymmetric Foreign-Key Joins on GPUs • Message: Clustering improves performance on GPUs • But: Cluster Costs don’t pay off in Cache Friendliness • But what about parallel processing of Partitions?
  • 29. But what about Parallelization [Chart: Processing time in Seconds (0 to 1.00) vs. Number of Utilized Cores per Processor (Work Group Size) (1 to 1000); series: Total Costs, Data transfer Costs; annotation: about 41 % (not enough)]
  • 30. Foreign-Key Joins on GPUs vs CPUs [Chart: Processing Time in Seconds (0 to 1.100) vs. Dimension Table Size (1K to 100000K); series: Intel i7 860, ATI Radeon HD 5850, PCI-E Data Transfer Costs]
  • 31. Foreign-Key Joins on GPUs vs CPUs [Chart: Processing Time in Seconds (0 to 1.100) vs. Dimension Table Size (1K to 100000K); series: Intel i7 860, ATI Radeon HD 5850 (processing only), PCI-E Data Transfer Costs]
  • 32. Conclusion • Unclustered FK-Joins Cause Heavy Cache Thrashing • Thrashing can be offloaded to fast GPU memory • PCI-E Bandwidth can be saturated without Clustering • Naïve FK-Joins on GPUs outperform Clustered Joins on CPUs
  • 33. Thank you for Listening
  • 34. Questions? Feedback?
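
The cache argument from slides 10 and 14 can be tried out with a small simulation. The sketch below is not from the talk: the function names (`naive_fk_join`, `count_cache_misses`, `radix_partition`) and the eight-row example data are hypothetical, and the miss counts it prints illustrate the effect rather than reproduce the slide figures. It assumes the theoretical cache described on slide 10, with one slot and two values per cache line, and a foreign-key index that stores positions in the target table.

```python
from typing import List, Optional, Sequence


def naive_fk_join(fk_index: Sequence[int], target: Sequence[str]) -> List[str]:
    """Scan the foreign-key index sequentially and fetch each target tuple by position."""
    return [target[pos] for pos in fk_index]


def count_cache_misses(fk_index: Sequence[int], line_size: int = 2) -> int:
    """Count target-table misses for a 1-slot cache holding one line of `line_size` values."""
    cached_line: Optional[int] = None  # the single slot starts empty
    misses = 0
    for pos in fk_index:
        line = pos // line_size
        if line != cached_line:  # different line: evict the slot and reload
            misses += 1
            cached_line = line
    return misses


def radix_partition(fk_index: Sequence[int], bits: int, shift: int) -> List[List[int]]:
    """Split the index into 2**bits partitions on `bits` bits of the value, starting at `shift`."""
    partitions: List[List[int]] = [[] for _ in range(1 << bits)]
    mask = (1 << bits) - 1
    for pos in fk_index:
        partitions[(pos >> shift) & mask].append(pos)
    return partitions


# Hypothetical data: 8 target tuples (4 cache lines) and an unclustered index.
target = ["a", "b", "c", "d", "e", "f", "g", "h"]
fk_index = [0, 5, 2, 7, 1, 4, 3, 6]

# Partition on the cache-line number (drop the lowest bit, keep the next two).
partitions = radix_partition(fk_index, bits=2, shift=1)
clustered_index = [pos for part in partitions for pos in part]

unclustered_misses = count_cache_misses(fk_index)
clustered_misses = count_cache_misses(clustered_index)
print("unclustered misses:", unclustered_misses)
print("clustered misses:  ", clustered_misses)
print("join result:", naive_fk_join(fk_index, target))
```

In this toy setup every unclustered lookup evicts the cached line, while the partitioned index touches each line once. Note that partitioning changes the output order of the join, and that the partitioning pass itself is the extra cost the slides weigh against the improved locality.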