Thread Clustering:
Sharing-Aware Thread Scheduling
on SMP-CMP-SMT Multiprocessors

David Tam, Reza Azimi, Michael Stumm

University of Toronto
{tamda, azimi, stumm}@eecg.toronto.edu

                                               Thread Clustering
Multiprocessors Today
Example: IBM Power 5 system
[Figure: IBM Power 5 system diagram with the SMP, CMP, and SMT levels and the SHARED CACHE labeled. Disparity in L2 latencies: the figure labels read 14 and 120.]

Operating Systems Today
CPU Schedulers:
● Ignore disparity in L2 latencies
● Ignore data sharing among threads
    ● Distribute threads poorly
● Cross-chip traffic
    ● Remote L2 cache accesses
● Causes performance problem

Our Goal: Sharing-Aware Scheduling
● Detect sharing patterns
● Cluster threads

Benefits:
    ● Decrease cross-chip traffic
    ● Increase on-chip cache locality
    ● Exploit shared L2 caches

Our Online Technique

STEPS (REPEAT):
1) Monitor remote cache access rate
2) Detect thread sharing patterns
3) Determine thread clusters
4) Migrate thread clusters

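The four repeated steps can be read as a control loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function `clustering_round`, its four callable parameters, and the `threshold` that gates steps 2 to 4 are hypothetical names and policies introduced only for illustration.

```python
from typing import Callable, Dict, List, Sequence


def clustering_round(
    remote_access_rate: Callable[[], float],
    detect_sharing: Callable[[], Dict[int, Sequence[int]]],
    determine_clusters: Callable[[Dict[int, Sequence[int]]], List[List[int]]],
    migrate: Callable[[List[List[int]]], None],
    threshold: float,
) -> bool:
    """Run one pass of steps 1-4; return True if clusters were migrated."""
    # Step 1: monitor the remote cache access rate.
    if remote_access_rate() < threshold:
        # Assumed policy (not stated on the slide): skip the remaining
        # steps while the rate is below the threshold.
        return False
    # Step 2: detect sharing patterns (thread id -> sharing vector).
    signatures = detect_sharing()
    # Step 3: determine thread clusters from the vectors.
    clusters = determine_clusters(signatures)
    # Step 4: migrate the clusters.
    migrate(clusters)
    return True
```

A scheduler would call `clustering_round` periodically; the callables stand in for the mechanisms described on the following slides.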
Sharing Detection
● To observe remote cache accesses:
    ● Exploit HPCs (hardware performance counters)
    ● Sample remote cache miss addresses
        ● Local cache misses satisfied by remote cache
        ● IBM Power 5 continuous data sampling
[Figure: animated diagram with steps labeled 1, 2, and 3.]

Sharing Signatures
● Construct for each thread
    ● Counts remote cache accesses

[Figure: conceptually, the virtual address space from 0 to 2^64 is divided into blocks, each with an 8-bit counter; the counter for block i is incremented (ctr_i++).]

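The conceptual signature (one 8-bit counter per block of the virtual address space) could be sketched as follows. This is an illustration only: the class name `ConceptualSignature`, the 128-byte `BLOCK_SIZE`, and the choice to saturate at 255 rather than wrap around are assumptions that the slide does not state.

```python
from collections import defaultdict

BLOCK_SIZE = 128   # assumed block size in bytes (not given on the slide)
COUNTER_MAX = 255  # largest value an 8-bit counter can hold


class ConceptualSignature:
    """Per-thread map from block number to an 8-bit remote-access counter."""

    def __init__(self) -> None:
        # Sparse storage: only blocks that were actually touched get an entry.
        self.counters = defaultdict(int)

    def record_remote_access(self, vaddr: int) -> None:
        block = vaddr // BLOCK_SIZE  # block i that contains this address
        # Assumption: the counter saturates instead of wrapping to 0.
        if self.counters[block] < COUNTER_MAX:
            self.counters[block] += 1  # ctr_i++
```

Storing a counter for every block is impractical for a 64-bit address space, which is the motivation for the sampling optimizations on the next slide.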
Optimizations
● CPU: Temporal Sampling
    ● Sample every Nth remote cache access
● Memory: Spatial Sampling
    ● 256-entry vector
    ● Hash function
    ● Block ID filter
● Vectors still effective at indicating sharing

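Temporal sampling can be illustrated with a small software counter. This is a sketch only: the class `TemporalSampler` is a hypothetical name, and the slides do not say how the every-Nth selection is implemented on the hardware counters.

```python
class TemporalSampler:
    """Select every Nth event and skip the rest."""

    def __init__(self, n: int) -> None:
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self.seen = 0  # events observed since the last sample

    def should_sample(self) -> bool:
        self.seen += 1
        if self.seen == self.n:
            self.seen = 0
            return True   # this is the Nth event: take a sample
        return False      # skip this event
```

With `n = 10`, one in ten remote cache accesses is recorded, which trades signature detail for lower CPU overhead.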
Spatial Sampling
● Hash collision & alias removal

[Figure: a Block ID is hashed into a filter with entries 0 to 255 (legend: Empty, Reserved), shown above a vector with entries 0 to 255. The animation shows three cases:
  EMPTY entry: First-Come-First-Reserved
  MATCH Block ID
  MISMATCH Block ID: ALIASING PREVENTED]

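The three filter cases can be sketched in a few lines. This is a hedged illustration, not the authors' code: `hash_block` (a simple modulo), the `BlockFilter` class, the `record` helper, and the idea that one filter is shared by all per-thread vectors are assumptions. The slides name a hash function and a block ID filter but do not specify either.

```python
from typing import List, Optional

VECTOR_SIZE = 256  # 256-entry vector, as on the Optimizations slide
COUNTER_MAX = 255  # 8-bit counters


def hash_block(block_id: int) -> int:
    """Hypothetical hash; the slides do not specify the function."""
    return block_id % VECTOR_SIZE


class BlockFilter:
    """Records which block ID has reserved each vector entry."""

    def __init__(self) -> None:
        # None marks an Empty entry; otherwise the entry is Reserved.
        self.reserved: List[Optional[int]] = [None] * VECTOR_SIZE

    def admit(self, block_id: int) -> Optional[int]:
        idx = hash_block(block_id)
        if self.reserved[idx] is None:
            self.reserved[idx] = block_id  # EMPTY: first come, first reserved
            return idx
        if self.reserved[idx] == block_id:
            return idx                     # MATCH: same block, count it
        return None                        # MISMATCH: aliasing prevented


def record(vector: List[int], flt: BlockFilter, block_id: int) -> bool:
    """Count a sampled access in `vector` if the filter admits the block."""
    idx = flt.admit(block_id)
    if idx is None:
        return False
    if vector[idx] < COUNTER_MAX:  # assumed saturating increment
        vector[idx] += 1
    return True
```

Sharing one `BlockFilter` among all thread vectors means entry i refers to the same block in every vector, which is what makes entry-wise comparison of two vectors meaningful.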
Automated Clustering
Clustering Heuristic:
● Simple, one-pass algorithm
● Compare vector against existing clusters
● If not similar, create a new cluster

Similarity Metric:
  $\sum_{i=0}^{N} V_1[i] \cdot V_2[i]$
● Shared blocks amplified
● Non-shared blocks nullified

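The similarity metric and the one-pass heuristic can be sketched together. This is a minimal sketch under assumptions: the `threshold` parameter, the use of a cluster's first vector as its representative, and joining the first sufficiently similar cluster are choices made for illustration; the slides only say that a vector is compared against existing clusters and that a new cluster is created if none is similar.

```python
from typing import Dict, List, Sequence


def similarity(v1: Sequence[int], v2: Sequence[int]) -> int:
    """Dot product: entries nonzero in both vectors are amplified,
    entries that are zero in either vector contribute nothing."""
    return sum(a * b for a, b in zip(v1, v2))


def cluster_threads(
    vectors: Dict[int, Sequence[int]], threshold: int
) -> List[List[int]]:
    """One pass over the threads; returns lists of thread ids."""
    clusters: List[List[int]] = []
    representatives: List[Sequence[int]] = []  # assumed: first member's vector
    for tid, vec in vectors.items():
        for members, rep in zip(clusters, representatives):
            if similarity(vec, rep) >= threshold:
                members.append(tid)  # similar enough: join this cluster
                break
        else:
            # Not similar to any existing cluster: create a new one.
            clusters.append([tid])
            representatives.append(vec)
    return clusters
```

For example, vectors `[9, 9, 0, 0]` and `[8, 7, 0, 0]` have similarity 9*8 + 9*7 = 135, while `[9, 9, 0, 0]` and `[0, 0, 5, 6]` have similarity 0, so with a threshold of 50 the first two threads share a cluster and the third starts a new one.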
Experimental Platform
● 8-way Power 5, 1.5GHz
● Linux 2.6
● IBM J2SE 5.0 JVM

[Figure: platform diagram; labels, each shown twice: 1.9MB L2, 36MB, 4 GB.]

Workloads
Microbenchmark
● expect 4 clusters
    ● 4 threads per cluster
SPECjbb2000 (modified)
● expect 2 clusters
    ● 2 warehouses, 8 threads per warehouse
RUBiS + MySQL
● expect 2 clusters
    ● 2 databases, 16 threads per database
VolanoMark chat server
● expect 2 clusters
    ● 2 rooms, 8 threads per room

Visualizing Clusters
● An example

[Figure: sharing vectors shaded by counter value (legend: 0, 64, 128, 255). Cluster A, 4 vectors; Cluster B, 4 vectors.]

Visualizing Clusters
● Microbenchmark
[Figure: sharing vectors; bracket label: 4 vectors.]

Visualizing Clusters
● Modified SPECjbb2000 (4 warehouses)
[Figure: sharing vectors; bracket label: 16 vectors.]

Visualizing Clusters
● RUBiS + MySQL (2 databases)
[Figure: sharing vectors; bracket label: 24 vectors.]

Visualizing Clusters
● VolanoMark (4 rooms)
[Figure: sharing vectors.]

Remote Cache Impact
● Normalized to default Linux

[Figure: bar chart; bar labels are not recoverable from this transcript. Values shown: 90, 70, 72, 43, 32, 22, 9, 2, -17.]

Performance Impact
● IPC: instructions per cycle
● Normalized to default Linux

[Figure: bar chart; bar labels are not recoverable from this transcript. Values shown: 7.4, 7.4, 7.1, 6.1, 6.1, 5.1, 5.0, 3.7, -0.8.]

Summary

[Figure: two diagrams side by side.
  BEFORE: Current Operating Systems
  AFTER: Operating System With Thread Clustering]

Conclusions
● HPCs can detect sharing
● Sharing signatures are effective
● Automated thread clustering:
    ● Reduces remote cache accesses by up to 70%
    ● Improves performance by up to 7%
● All with low overhead

Future Work:
    ● More workloads
    ● Improve clustering algorithm
    ● Integration with load-balancing aspects

Sampling Overhead
● Modified SPECjbb2000
[Figure: chart not recoverable from this transcript.]
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Último (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

thread-clustering

  • 1. Thread Clustering: Sharing-Aware Thread Scheduling on SMP-CMP-SMT Multiprocessors David Tam, Reza Azimi, Michael Stumm University of Toronto {tamda, azimi, stumm}@eecg.toronto.edu Thread Clustering
  • 2. Multiprocessors Today. Example: IBM Power 5 system.
  • 3. Multiprocessors Today. Example: IBM Power 5 system. Diagram labels: SMP, CMP, SMT, shared cache.
  • 4. Multiprocessors Today. Example: IBM Power 5 system. Disparity in L2 latencies (figure labels: 14 and 120).
  • 5. Operating Systems Today. CPU schedulers: ignore disparity in L2 latencies; ignore data sharing among threads (distribute threads poorly); cross-chip traffic (remote L2 cache accesses); causes performance problem.
  • 6. Our Goal: Sharing-Aware Scheduling. Detect sharing patterns; cluster threads. Benefits: decrease cross-chip traffic; increase on-chip cache locality; exploit shared L2 caches.
  • 7. Our Online Technique. Steps (repeated): 1) monitor remote cache access rate; 2) detect thread sharing patterns; 3) determine thread clusters; 4) migrate thread clusters.
  • 8-10. Sharing Detection. To observe remote cache accesses: exploit HPCs (hardware performance counters); sample remote cache miss addresses; local cache misses satisfied by remote cache; IBM Power 5 continuous data sampling.
  • 11-12. Sharing Signatures. Construct one for each thread; it counts remote cache accesses. Conceptually: the virtual address range (0 to 2^64) is divided into blocks, each with an 8-bit counter; a remote cache access to block i increments that block's counter (ctr_i++).
  • 13. Optimizations. CPU: temporal sampling (sample every Nth remote cache access). Memory: spatial sampling (256-entry vector; hash function; block ID filter). Vectors still effective at indicating sharing.
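
The signature and the two sampling optimizations on slides 11 to 13 can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the slides give only the 256-entry vector and the 8-bit counter, so the block size, the modulo hash, the sampling period, and the saturating behaviour are all assumptions, and the class and method names are hypothetical.

```python
VECTOR_SIZE = 256    # from the slides: 256-entry vector
COUNTER_MAX = 255    # 8-bit counter; saturating at 255 is an assumption
BLOCK_BYTES = 128    # assumed block granularity (not stated on the slides)
SAMPLE_EVERY = 10    # assumed temporal sampling period N


class SharingSignature:
    """Per-thread vector of counters indexed by a hashed block ID."""

    def __init__(self, sample_every=SAMPLE_EVERY):
        self.counters = [0] * VECTOR_SIZE
        self.sample_every = sample_every
        self._seen = 0  # remote cache accesses observed so far

    @staticmethod
    def slot_for(address):
        # Hypothetical hash: block ID modulo the vector size.
        return (address // BLOCK_BYTES) % VECTOR_SIZE

    def remote_access(self, address):
        """Record one remote cache access; return True if it was sampled."""
        self._seen += 1
        if self._seen % self.sample_every != 0:
            return False  # temporal sampling: only every Nth access counts
        slot = self.slot_for(address)
        if self.counters[slot] < COUNTER_MAX:
            self.counters[slot] += 1  # saturate instead of wrapping around
        return True
```

With `sample_every=2`, the first access to an address is skipped and the second increments the counter for its block. The sketch omits the block ID filter, so two blocks that hash to the same slot would share a counter.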
  • 14-18. Spatial Sampling: hash collision and alias removal. A block ID is hashed to a filter entry (entries 0 to 255), which is either empty or reserved. EMPTY: the entry is reserved for that block ID (first-come-first-reserved). MATCH: the entry is already reserved by the same block ID. MISMATCH: the entry is reserved by a different block ID, so aliasing is prevented.
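
The three filter cases on slides 14 to 18 can be sketched as a small lookup. This is an illustration under assumptions: the modulo hash is a stand-in for the unspecified hash function, the names are hypothetical, and whether the access that first reserves an entry is also counted is not stated on the slides (it is counted here).

```python
FILTER_SIZE = 256  # matches the 0..255 range drawn on the slides


def hash_block(block_id):
    # Hypothetical hash function; the slides do not specify one.
    return block_id % FILTER_SIZE


class BlockIdFilter:
    """Each entry is EMPTY (None) or reserved by exactly one block ID."""

    def __init__(self):
        self.entries = [None] * FILTER_SIZE

    def admit(self, block_id):
        """Return the vector slot to increment, or None if the sample is dropped."""
        slot = hash_block(block_id)
        owner = self.entries[slot]
        if owner is None:
            # EMPTY: first come, first reserved.
            self.entries[slot] = block_id
            return slot
        if owner == block_id:
            # MATCH: same block as the one that reserved the entry.
            return slot
        # MISMATCH: a different block hashed here; drop it to prevent aliasing.
        return None
```

A sample is counted in the signature vector only when `admit` returns a slot, so each counter reflects accesses to a single block.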
  • 19. Automated Clustering. Clustering heuristic: simple, one-pass algorithm; compare vector against existing clusters; if not similar, create a new cluster. Similarity metric: $\sum_{i=0}^{N} V_1[i] \cdot V_2[i]$. Shared blocks amplified; non-shared blocks nullified.
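
The heuristic and similarity metric on slide 19 can be sketched as below. The dot product follows the slide; the threshold value and the choice to represent each cluster by its first member's vector are assumptions made for illustration, and the function names are hypothetical.

```python
def similarity(v1, v2):
    """Dot product of two signature vectors, as on the slide.

    A block with a non-zero count in both vectors contributes a product
    (amplified); a block that is zero in either vector contributes nothing
    (nullified).
    """
    return sum(a * b for a, b in zip(v1, v2))


def cluster_threads(signatures, threshold):
    """One-pass clustering over a mapping of thread ID to signature vector.

    Assumption: each cluster is represented by the vector of its first member.
    """
    clusters = []         # list of lists of thread IDs
    representatives = []  # one vector per cluster, same order as clusters
    for thread_id, vector in signatures.items():
        for index, rep in enumerate(representatives):
            if similarity(vector, rep) >= threshold:
                clusters[index].append(thread_id)
                break
        else:
            # Not similar to any existing cluster: create a new one.
            representatives.append(vector)
            clusters.append([thread_id])
    return clusters


signatures = {
    "t0": [9, 0, 0, 0],
    "t1": [8, 0, 1, 0],
    "t2": [0, 0, 0, 7],
    "t3": [0, 1, 0, 9],
}
print(cluster_threads(signatures, threshold=10))
# prints [['t0', 't1'], ['t2', 't3']]
```

Here `t0` and `t1` both have high counts for block 0 (product 72), `t2` and `t3` for block 3 (product 63), and the cross-pair products are 0, so two clusters form.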
  • 20. Experimental Platform. 8-way Power 5, 1.5GHz; Linux 2.6; IBM J2SE 5.0 JVM. Diagram labels, each shown twice: 1.9MB L2, 36MB, 4 GB.
  • 21. Workloads. Microbenchmark: 4 threads per cluster (expect 4 clusters). SPECjbb2000 (modified): 2 warehouses, 8 threads per warehouse (expect 2 clusters). RUBiS + MySQL: 2 databases, 16 threads per database (expect 2 clusters). VolanoMark chat server: 2 rooms, 8 threads per room (expect 2 clusters).
  • 22-24. Visualizing Clusters. An example. [Figure: counter values on a scale of 0, 64, 128, 255; Cluster A, 4 vectors; Cluster B, 4 vectors.]
  • 25. Visualizing Clusters. Microbenchmark. [Figure: 4 vectors.]
  • 26. Visualizing Clusters. Modified SPECjbb2000 (4 warehouses). [Figure: 16 vectors.]
  • 27. Visualizing Clusters. RUBiS + MySQL (2 databases). [Figure: 24 vectors.]
  • 28. Visualizing Clusters. VolanoMark (4 rooms). [Figure.]
  • 29. Remote Cache Impact. Normalized to default Linux. [Bar chart; extracted data labels: 90, 70, 72, 43, 32, 22, 9, 2, -17.]
  • 30. Performance Impact. IPC (instructions per cycle), normalized to default Linux. [Bar chart; extracted data labels: 7.4, 7.4, 7.1, 6.1, 6.1, 5.1, 5.0, 3.7, -0.8.]
  • 31. Summary. BEFORE: current operating systems. AFTER: operating system with thread clustering.
  • 32. Conclusions. HPCs can detect sharing; sharing signatures are effective; automated thread clustering reduces remote cache access by up to 70% and improves performance by up to 7%, all with low overhead. Future work: more workloads; improve the clustering algorithm; integration with load-balancing aspects.
  • 34. Sampling Overhead. Modified SPECjbb2000. [Figure.]