Multi-core Parallelization in Clojure - a Case Study

     Johann M. Kraus and Hans A. Kestler

    AG Bioinformatics and Systems Biology
  Institute of Neural Information Processing
               University of Ulm

                 29.06.2009
Outline

1. Concepts of parallel programming
2. Short introduction to Clojure
3. Multi-core parallel K-means - the case study
4. Analysis and Results
5. Summary
Parallel Programming
Definition:
Parallel programming is a form of programming where many calculations
are performed simultaneously.




•   Physical constraints prevent further frequency scaling of processors

•   This has led to increasing interest in parallel hardware and parallel
    programming

•   Multi-core hardware is standard on desktop computers

•   Parallel software can use this hardware to its full capacity

•   Large problems are divided into smaller ones, and the sub-problems
    are solved simultaneously


•   The speedup S is limited by the fraction P of parallelizable code.
    Amdahl's law, with N the number of processors:

        $$ S = \frac{1}{(1 - P) + \frac{P}{N}} $$

[Figure: Amdahl's law. Speedup S (0 to 20) versus number of processors N (1 to 65536, doubling at each step), one curve per fraction of parallelizable code P = 0.95, 0.90, 0.75, 0.50]
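To make the formula concrete, here is a small Java sketch (the class name `Amdahl` is hypothetical) that evaluates Amdahl's law for the four fractions in the plot. It shows why the curves flatten: as N grows, S approaches 1 / (1 - P), i.e. 20 for P = 0.95.

```java
// Hypothetical helper class: evaluates Amdahl's law S = 1 / ((1 - P) + P / N).
final class Amdahl {
    // Predicted speedup for a parallelizable fraction p running on n processors.
    static double speedup(double p, int n) {
        if (p < 0.0 || p > 1.0 || n < 1) {
            throw new IllegalArgumentException("require 0 <= p <= 1 and n >= 1");
        }
        return 1.0 / ((1.0 - p) + p / n);
    }

    // Limit for n -> infinity: only the serial fraction (1 - p) remains.
    static double limit(double p) {
        return 1.0 / (1.0 - p);
    }

    public static void main(String[] args) {
        double[] fractions = {0.95, 0.90, 0.75, 0.50};
        int[] processors = {1, 2, 4, 8, 16, 1024, 65536};
        for (double p : fractions) {
            StringBuilder row = new StringBuilder(String.format("P = %.2f:", p));
            for (int n : processors) {
                row.append(String.format("  N=%d -> %.2f", n, speedup(p, n)));
            }
            row.append(String.format("  (limit %.1f)", limit(p)));
            System.out.println(row);
        }
    }
}
```

The P = 0.95 row levels off just below its limit of 20, which is the top curve of the plot: adding processors beyond a few hundred buys almost nothing.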
Concepts of Parallel Programming

              Explicit vs. implicit parallelization

•   Explicit parallelization: the programmer defines the communication
    and synchronization details for each task, e.g.:
        •   MPI
        •   Java Threads

•   Functional programming allows implicit parallelization:
        •   Functions can be processed in parallel
        •   Functions are free of side effects
        •   Data is immutable
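Java has no direct equivalent of the functional style described here, but parallel streams give a rough flavour of implicit parallelization: the programmer only states what to compute with a side-effect-free function, and the runtime decides how to split the work across cores. A minimal sketch, with hypothetical class and method names:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class illustrating implicit parallelism with Java parallel streams.
final class ImplicitParallel {
    // Pure function: the result depends only on the argument; no shared state is touched.
    static long square(long x) {
        return x * x;
    }

    // The stream framework splits the list across worker threads on its own;
    // collect() still returns the results in the original (encounter) order.
    static List<Long> squaresParallel(List<Long> xs) {
        return xs.parallelStream().map(ImplicitParallel::square).collect(Collectors.toList());
    }

    // Sequential reference version for comparison.
    static List<Long> squaresSequential(List<Long> xs) {
        return xs.stream().map(ImplicitParallel::square).collect(Collectors.toList());
    }
}
```

Because `square` has no side effects and the input list is never mutated, the parallel and sequential versions return the same list; with a side-effecting function this guarantee would be lost.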
Distributed vs. local hardware

•   Master-slave parallelization (e.g. Message Passing Interface):
    the master sends data to the slaves and receives their results

•   Shared memory parallelization (e.g. Open Multi-Processing):
    all CPUs read from and write to the same memory

[Figure: Left, a master node sends data to slaves 0 to 4 and receives their results. Right, CPUs 0 to 4 read from and write to one shared memory.]
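The master-slave pattern can be imitated on a single multi-core machine with a Java thread pool: the master splits the input into chunks ("send data"), each worker computes a partial result, and the master combines them ("send result"). This is a shared-memory sketch, not MPI; the class `MasterWorker` and the summing task are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical master/worker example: summing an array with a fixed pool of workers.
final class MasterWorker {
    static long parallelSum(long[] data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            int chunk = (data.length + workers - 1) / workers;   // ceiling division
            for (int start = 0; start < data.length; start += chunk) {
                final int lo = start;
                final int hi = Math.min(start + chunk, data.length);
                // "send data": each task only gets the bounds of its own chunk
                Callable<Long> task = () -> {
                    long sum = 0L;
                    for (int i = lo; i < hi; i++) {
                        sum += data[i];
                    }
                    return sum;
                };
                partials.add(pool.submit(task));
            }
            // "send result": the master collects and combines the partial sums
            long total = 0L;
            for (Future<Long> f : partials) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Unlike real MPI slaves, these workers read the array directly from shared memory; only the chunk bounds are "sent".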
Thread programming

•   Threads are lightweight units of execution within a process; they
    share the same memory and can run separately and simultaneously

•   Available in many languages, e.g. PThreads (C), Java Threads (Java),
    OpenMP threads (C, Fortran)

•   Execution of threads is handled by a scheduler that manages the
    available processing time

•   Communication between threads is faster than communication between
    processes

•   Creating threads is also faster than forking and joining processes

[Figure: Thread life cycle. new -(start)-> runnable -(schedule)-> running -(end)-> terminated; running -(block)-> waiting -(awake)-> runnable]
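A short Java sketch of two points on this slide: threads share their process's memory, and each thread passes through the states of a life cycle. The class and method names are hypothetical. Note that Java's `Thread.State` enum merges "runnable" and "running" into a single `RUNNABLE` state and splits waiting into `BLOCKED`, `WAITING`, and `TIMED_WAITING`.

```java
// Hypothetical example class: shared memory between threads and Thread.State.
final class ThreadBasics {
    // Two threads fill disjoint halves of one array. Both see the same memory;
    // no lock is needed because they never write the same element.
    static int[] fillSquaresInParallel(int n) {
        int[] shared = new int[n];
        Thread lower = new Thread(() -> {
            for (int i = 0; i < n / 2; i++) shared[i] = i * i;
        });
        Thread upper = new Thread(() -> {
            for (int i = n / 2; i < n; i++) shared[i] = i * i;
        });
        lower.start();
        upper.start();
        joinQuietly(lower);   // join() also makes the threads' writes visible here
        joinQuietly(upper);
        return shared;
    }

    // State of a thread before start() and after it has finished.
    static Thread.State[] statesBeforeAndAfter() {
        Thread t = new Thread(() -> { });
        Thread.State before = t.getState();   // NEW
        t.start();
        joinQuietly(t);
        Thread.State after = t.getState();    // TERMINATED
        return new Thread.State[] {before, after};
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```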
Concurrency control via locking and synchronizing

• Concurrency control ensures that threads can access shared memory
  without violating data integrity

• The most popular approach to concurrency control is locking and synchronizing:
               public class Counter {
                   private int value = 0;
                   // synchronized: only one thread at a time can run incr() on this object
                   public synchronized void incr() {
                       value = value + 1;
                   }
               }
               Counter counter = new Counter();
               counter.incr();

• Problems can occur when using too many locks, too few locks, the wrong
  locks, or locks in the wrong order

• Using locks is error-prone, and mistakes can be fatal, e.g. deadlocks
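One standard defence against the "locks in the wrong order" problem is a global lock order. In the Java sketch below (a hypothetical `Account` class, not part of the case study), two threads transfer money in opposite directions; because both always lock the account with the smaller id first, neither can hold one lock while waiting for the other, so the classic deadlock cannot occur.

```java
// Hypothetical example: avoiding deadlock by always acquiring locks in the same order.
final class Account {
    private final int id;        // assumed unique; defines the global lock order
    private long balance;

    Account(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    synchronized long balance() {
        return balance;
    }

    static void transfer(Account from, Account to, long amount) {
        // Lock the account with the smaller id first, regardless of direction.
        Account first = from.id < to.id ? from : to;
        Account second = from.id < to.id ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    // Two threads transfer in opposite directions; returns the final balances.
    static long[] runOpposingTransfers(int rounds) {
        Account a = new Account(1, 1000);
        Account b = new Account(2, 1000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < rounds; i++) transfer(b, a, 1); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] {a.balance(), b.balance()};
    }
}
```

If `transfer` instead locked `from` first and `to` second, the same two threads could each grab one lock and then wait forever for the other.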
Concurrency control via transactional memory


• Transactional memory offers a flexible alternative to lock-based
  concurrency control

• It works analogously to concurrency control in database management
  systems

• Transactions ensure the following properties:
  •   Atomicity: either all changes of a transaction occur or none do
  •   Consistency: only valid changes are committed
  •   Isolation: no transaction sees the intermediate effects of other transactions
  •   Durability: changes from committed transactions are persistent
• Software transactional memory (STM) applies the transactional model to
  concurrency control in parallel programming

[Figure: Sequence diagram over time between Transaction 0, Data, and Transaction 1. Both transactions get the data; Transaction 1 sends its modified data back [consistent data]; Transaction 0 sends modified data, gets the data again, and sends modified data [consistent data].]
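The retry behaviour in the diagram (read, compute, commit only if the data is still consistent, otherwise read again) can be sketched for a single memory location with Java's `AtomicReference.compareAndSet`. This is a simplified optimistic-update loop, not a full STM: it coordinates only one location, whereas Clojure's STM coordinates several refs in one transaction. Class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical example: optimistic, retrying update of one shared location.
final class OptimisticUpdate {
    // Read a snapshot, compute the new value from it, and commit only if no other
    // thread changed the location in the meantime; otherwise retry with fresh data.
    // f may run several times, so it must be free of side effects.
    static <T> T update(AtomicReference<T> ref, UnaryOperator<T> f) {
        while (true) {
            T snapshot = ref.get();
            T next = f.apply(snapshot);
            if (ref.compareAndSet(snapshot, next)) {
                return next;   // commit succeeded
            }
            // another thread committed first: loop and retry
        }
    }

    // Several threads increment one counter concurrently without any lock.
    static int concurrentIncrements(int threads, int perThread) {
        AtomicReference<Integer> counter = new AtomicReference<>(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    update(counter, x -> x + 1);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }
}
```

Java also ships this loop ready-made as `AtomicReference.updateAndGet`.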
Clojure


•   Functional programming language hosted on the JVM

•   Extends the code-as-data paradigm to maps and vectors

•   Based on immutable data structures

•   Provides built-in concurrency support via software transactional
    memory

•   Tightly integrated with Java, e.g. easy access to Java libraries

•   Platform independent
•   Java interaction
        (import '(cern.jet.random.sampling RandomSamplingAssistant))
        ;; draw k distinct numbers from 0 ... n-1 using the Colt library
        (defn sample
          [n k]
          (seq (RandomSamplingAssistant/sampleArray k (int-array (range n)))))


•   Dynamic typing and multi-methods

    •   An object is defined as the sum of what it can do (methods),
        rather than the sum of what it is (type hierarchy)

•   Add type hints to speed up code

        ;; element-wise sum of two double arrays; the ^doubles hints
        ;; avoid reflection on the array accesses
        (defn da+ [^doubles as ^doubles bs]
          (amap as i ret
                (+ (aget as i) (aget bs i))))
Transactional references and STM


•   Transactional references ensure safe, coordinated, synchronous
    changes to mutable storage locations

•   They are bound to a single storage location for their lifetime

•   They only allow mutation of that location to occur within transactions

•   Available operations are ref-set, alter, and commute

•   No explicit locking is required

                 (def counter (ref 0))
                 (dosync (alter counter inc))
Agents

•   Agents allow independent, asynchronous change of mutable
    locations

•   They are bound to a single storage location for their lifetime

•   They only allow mutation of that location to a new state to occur as
    a result of an action

•   Actions are functions that are asynchronously applied to the state
    of an agent

•   The return value of an action becomes the new state of the agent

•   Agents are integrated with the STM: actions sent inside a transaction
    are held until the transaction commits

                    (def counter (agent 0))
                    (send counter inc)
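For readers coming from Java, the core agent behaviour (queued actions applied asynchronously, one at a time, with each action's return value becoming the new state) can be approximated with a single-threaded executor. This `MiniAgent` class is a hypothetical simplification: real Clojure agents share a thread pool, integrate with the STM, and have error handling that this sketch omits.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.UnaryOperator;

// Hypothetical, simplified Java analogue of a Clojure agent.
final class MiniAgent<T> {
    private volatile T state;   // written only by the agent's own worker thread
    private final ExecutorService queue = Executors.newSingleThreadExecutor();

    MiniAgent(T initial) {
        state = initial;
    }

    // Current state, like deref / @ in Clojure.
    T deref() {
        return state;
    }

    // Like `send`: returns immediately; the action runs later, one action at a time,
    // and its return value becomes the new state.
    void send(UnaryOperator<T> action) {
        queue.execute(() -> state = action.apply(state));
    }

    // Wait for all queued actions to finish, then stop the worker thread.
    void shutdownAndAwait() {
        queue.shutdown();
        try {
            queue.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because a single worker applies the actions in order, no action ever sees a half-applied state, which is why `send` needs no locking on the caller's side.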
Cluster analysis

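•   Given a data set X, compute a partition of X into k disjoint clusters
    C_1, ..., C_k such that:

        (1)  $\bigcup_{i=1}^{k} C_i = X$
        (2)  $C_i \neq \emptyset$ and $C_i \cap C_j = \emptyset$ for $i \neq j$

•   How many clusters are in the data set?

[Figure: the same data set partitioned into 3 clusters and into 9 clusters]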
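The two conditions can be checked mechanically. A minimal Java sketch (the class name is hypothetical) that tests whether a list of clusters is a partition of X:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: checks conditions (1) and (2) for a candidate clustering.
final class PartitionCheck {
    static <T> boolean isPartition(Set<T> x, List<Set<T>> clusters) {
        Set<T> seen = new HashSet<>();
        for (Set<T> c : clusters) {
            if (c.isEmpty()) {
                return false;                 // (2): every C_i must be non-empty
            }
            for (T item : c) {
                if (!seen.add(item)) {
                    return false;             // (2): item already in another cluster
                }
            }
        }
        return seen.equals(x);                // (1): the union of all C_i equals X
    }
}
```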
Cluster algorithms
•   For all possible partitions, evaluate the objective function f and
    search for the optimum.

•   The cardinality of the set of all possible partitions of N data points
    into k clusters is given by the Stirling numbers of the second kind:

        $$ S_N^{(k)} = \frac{1}{k!} \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} i^N $$

[Figure: plot of runtime (nanosecond) against the number of clusters (0 to 35) and the number of data points (5 to 35)]

Cluster algorithms provide a heuristic for this search:

•   Partitional clustering (K-means, Neural Gas, SOM, Fuzzy C-means, ...)

•   Hierarchical clustering (Divisive/agglomerative, Complete linkage, ...)

•   Graph-based clustering (Spectral clustering, NMF, Affinity propagation, ...)

•   Model-based clustering, Biclustering, Semi-supervised clustering
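To see why exhaustive search is hopeless, the Stirling formula above can be evaluated exactly with Java's `BigInteger` (the class name is hypothetical). Already S(10, 3) = 9330, and for fixed k the count grows roughly like k^N / k! as N increases.

```java
import java.math.BigInteger;

// Hypothetical helper: Stirling numbers of the second kind via the explicit formula
// S(n, k) = (1/k!) * sum_{i=0..k} (-1)^(k-i) * C(k, i) * i^n.
final class Stirling2 {
    static BigInteger partitions(int n, int k) {
        BigInteger sum = BigInteger.ZERO;
        BigInteger binom = BigInteger.ONE;                  // C(k, 0)
        for (int i = 0; i <= k; i++) {
            BigInteger term = binom.multiply(BigInteger.valueOf(i).pow(n));
            sum = ((k - i) % 2 == 0) ? sum.add(term) : sum.subtract(term);
            // C(k, i + 1) = C(k, i) * (k - i) / (i + 1); the division is exact
            binom = binom.multiply(BigInteger.valueOf(k - i)).divide(BigInteger.valueOf(i + 1));
        }
        BigInteger kFactorial = BigInteger.ONE;
        for (int j = 2; j <= k; j++) {
            kFactorial = kFactorial.multiply(BigInteger.valueOf(j));
        }
        return sum.divide(kFactorial);                      // also exact
    }
}
```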
K-means algorithm
Function KMeans

 Input : X = {x_1, ..., x_n}   (data to be clustered)
         k                     (number of clusters)

Output : C = {c_1, ..., c_k}   (cluster centroids)
         m: X -> C             (cluster assignments)

Initialize C (e.g. random selection from X)
While C has changed
  For each x_i in X
    m(x_i) = argmin_j distance(x_i, c_j)
  End
  For each c_j in C
    c_j = centroid({x_i | m(x_i) = j})
  End
End
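A sequential Java translation of this pseudocode, as a reference point for the parallel version later (a sketch, not the authors' implementation). It assumes squared Euclidean distance for `distance`, takes the initial centroids as a parameter instead of choosing them randomly so that runs are reproducible, and keeps the old centroid if a cluster becomes empty, a case the pseudocode leaves open.

```java
import java.util.Arrays;

// Hypothetical sequential K-means (Lloyd's iteration) following the pseudocode.
final class KMeans {
    // Squared Euclidean distance; its argmin equals that of the Euclidean distance.
    static double dist2(double[] a, double[] b) {
        double s = 0.0;
        for (int t = 0; t < a.length; t++) {
            double diff = a[t] - b[t];
            s += diff * diff;
        }
        return s;
    }

    // argmin_j distance(x, c_j)
    static int nearest(double[] x, double[][] c) {
        int best = 0;
        for (int j = 1; j < c.length; j++) {
            if (dist2(x, c[j]) < dist2(x, c[best])) best = j;
        }
        return best;
    }

    // Returns the assignments m; stops when C no longer changes or after maxIter rounds.
    static int[] cluster(double[][] x, double[][] init, int maxIter) {
        int k = init.length;
        int d = init[0].length;
        double[][] c = new double[k][];
        for (int j = 0; j < k; j++) c[j] = init[j].clone();
        int[] m = new int[x.length];
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: m(x_i) = argmin_j distance(x_i, c_j)
            for (int i = 0; i < x.length; i++) m[i] = nearest(x[i], c);
            // Update step: c_j = centroid({x_i | m(x_i) = j})
            double[][] sum = new double[k][d];
            int[] count = new int[k];
            for (int i = 0; i < x.length; i++) {
                count[m[i]]++;
                for (int t = 0; t < d; t++) sum[m[i]][t] += x[i][t];
            }
            boolean changed = false;
            for (int j = 0; j < k; j++) {
                if (count[j] == 0) continue;           // empty cluster: keep old centroid
                for (int t = 0; t < d; t++) sum[j][t] /= count[j];
                if (!Arrays.equals(sum[j], c[j])) {
                    c[j] = sum[j];
                    changed = true;
                }
            }
            if (!changed) break;                       // "While C has changed"
        }
        return m;
    }
}
```

The assignment step touches every point independently, which is what makes it the natural target for parallelization.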
Cluster Validation
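•   Evaluation requires repeated runs of clustering, e.g. with:
       •   Resampled data sets
       •   Different parameters

•   MCA index: mean proportion of samples that are assigned consistently
    across different clusterings. For two clusterings A and B of n samples
    into k clusters, maximized over all matchings π of their cluster labels:

        $$ \mathrm{MCA} = \frac{1}{n} \max_{\pi} \sum_{i=1}^{k} |A_i \cap B_{\pi(i)}| $$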
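A direct Java translation of the MCA formula (the class name is hypothetical), assuming cluster labels are integers 0 to k-1. It builds the k x k overlap table |A_i ∩ B_j| and tries all k! matchings π, which is only practical for small k; for larger k an assignment-problem solver such as the Hungarian algorithm finds the same maximum.

```java
// Hypothetical helper: MCA index of two labelings a and b with labels 0..k-1.
final class Mca {
    static double mca(int[] a, int[] b, int k) {
        int n = a.length;
        int[][] overlap = new int[k][k];          // overlap[i][j] = |A_i intersect B_j|
        for (int s = 0; s < n; s++) {
            overlap[a[s]][b[s]]++;
        }
        return (double) bestMatching(overlap, 0, new boolean[k]) / n;
    }

    // Maximum of sum_i overlap[i][pi(i)] over all permutations pi (brute force).
    private static int bestMatching(int[][] overlap, int i, boolean[] used) {
        int k = overlap.length;
        if (i == k) {
            return 0;
        }
        int best = Integer.MIN_VALUE;
        for (int j = 0; j < k; j++) {
            if (used[j]) continue;
            used[j] = true;
            best = Math.max(best, overlap[i][j] + bestMatching(overlap, i + 1, used));
            used[j] = false;
        }
        return best;
    }
}
```

`mca` returns 1.0 whenever the two clusterings agree up to a renaming of the labels.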
Estimation of the expected value of a validation index

•   Random label: randomly assign each item to one of the k clusters

•   Random partition: choose a random partition

•   Random prototype: assign each item to its nearest prototype

[Figure: mean MCA index (0 to 1) of the three random baselines versus the number of clusters (0 to 50); mean value from 100 runs]
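Two of the three baselines are easy to sketch in Java (hypothetical class and method names). For the random-prototype baseline we assume the prototypes are k distinct data points drawn uniformly at random; the slide does not say how they are chosen. Sampling a uniformly random partition is more involved and is not shown.

```java
import java.util.Random;

// Hypothetical generators for two of the random baselines.
final class RandomBaselines {
    // Random label: every item independently gets one of the k labels, uniformly.
    static int[] randomLabel(int n, int k, Random rng) {
        int[] m = new int[n];
        for (int i = 0; i < n; i++) {
            m[i] = rng.nextInt(k);
        }
        return m;
    }

    // Random prototype (assumed variant): k distinct data points drawn at random serve
    // as prototypes, and every item goes to its nearest prototype.
    static int[] randomPrototype(double[][] x, int k, Random rng) {
        int n = x.length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("require 1 <= k <= number of items");
        }
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        // Partial Fisher-Yates shuffle: idx[0..k-1] becomes a uniform random k-subset.
        for (int i = 0; i < k; i++) {
            int j = i + rng.nextInt(n - i);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        int[] m = new int[n];
        for (int i = 0; i < n; i++) {
            double best = Double.POSITIVE_INFINITY;
            for (int j = 0; j < k; j++) {
                double[] proto = x[idx[j]];
                double d = 0.0;
                for (int t = 0; t < proto.length; t++) {
                    double diff = x[i][t] - proto[t];
                    d += diff * diff;
                }
                if (d < best) {
                    best = d;
                    m[i] = j;
                }
            }
        }
        return m;
    }
}
```

Averaging the MCA index of such baseline clusterings against a reference over many runs (100 on the slide) gives the kind of curves plotted here.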
Multi-core K-means with Clojure
•   Split the data set into smaller pieces that are handled by agents
    (data agents)

•   Represent each cluster by an agent (cluster agent)

•   Keep the list of members of each cluster in a transactional reference
    (member ref) that is updated with commute, to accelerate the centroid
    update step



[Figure: Architecture. Data agents 0 to n hold the data points; cluster agents 0 to k hold the clusters; member refs 0 to k collect the members of each cluster. Simultaneous read: every data agent reads the cluster agents (nearest-cluster). Simultaneous write: every data agent writes to the member refs (commute) and updates its own data points (assoc).]

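;; nearest-cluster, DataAgents and MemberRefs are defined elsewhere in the program.
;; Functions are defined before use, because Clojure resolves symbols at compile time.

(defn update-datapoint [datapoint]
  ;; read: index of the closest cluster
  (let [newass (nearest-cluster datapoint)]
    ;; write: add the point to that cluster's member list; commute avoids
    ;; conflicts between concurrent transactions that only add to the same ref
    (dosync (commute (nth MemberRefs newass)
                     conj (:data datapoint)))
    (assoc datapoint :assignment newass)))

(defn update-dataagent [datapoints]
  ;; mapv is eager, so the transactions run inside the agent action
  (mapv update-datapoint datapoints))

(defn assignment []
  ;; doseq forces the sends; a lazy map would not dispatch anything
  ;; unless its result were consumed
  (doseq [a DataAgents]
    (send a update-dataagent)))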
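For comparison, the same assignment step can be sketched in Java (a hypothetical analogue, not the authors' Clojure code): a thread pool plays the role of the data agents, each worker handles one chunk of the data, and a `ConcurrentLinkedQueue` per cluster plays the role of the commuted member refs. Appends from different workers never conflict, and because a centroid is a mean, the order in which members arrive does not matter, which is the same property that makes `commute` applicable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical Java analogue of the agent/ref-based assignment step.
final class ParallelAssignment {
    // Index of the closest centroid (squared Euclidean distance).
    static int nearest(double[] p, double[][] c) {
        int best = 0;
        double bestD = Double.POSITIVE_INFINITY;
        for (int j = 0; j < c.length; j++) {
            double d = 0.0;
            for (int t = 0; t < p.length; t++) d += (p[t] - c[j][t]) * (p[t] - c[j][t]);
            if (d < bestD) { bestD = d; best = j; }
        }
        return best;
    }

    // Assignment step: `chunks` workers (like data agents) each handle one slice of
    // the data and append point indices to the member list of the nearest centroid.
    static List<ConcurrentLinkedQueue<Integer>> assign(double[][] x, double[][] c, int chunks) {
        List<ConcurrentLinkedQueue<Integer>> members = new ArrayList<>();
        for (int j = 0; j < c.length; j++) members.add(new ConcurrentLinkedQueue<>());
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            List<Future<?>> pending = new ArrayList<>();
            int size = (x.length + chunks - 1) / chunks;
            for (int start = 0; start < x.length; start += size) {
                final int lo = start;
                final int hi = Math.min(start + size, x.length);
                pending.add(pool.submit(() -> {
                    for (int i = lo; i < hi; i++) members.get(nearest(x[i], c)).add(i);
                }));
            }
            for (Future<?> f : pending) f.get();   // wait until every slice is done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return members;
    }

    // Update step: mean of each member list; empty clusters keep their old centroid.
    static double[][] centroids(double[][] x, double[][] old, List<ConcurrentLinkedQueue<Integer>> members) {
        double[][] next = new double[old.length][];
        for (int j = 0; j < old.length; j++) {
            if (members.get(j).isEmpty()) {
                next[j] = old[j].clone();
                continue;
            }
            double[] sum = new double[old[j].length];
            for (int i : members.get(j)) {
                for (int t = 0; t < sum.length; t++) sum[t] += x[i][t];
            }
            for (int t = 0; t < sum.length; t++) sum[t] /= members.get(j).size();
            next[j] = sum;
        }
        return next;
    }
}
```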
Benchmark results
Large data sets (artificial):

•   Each data point is sampled from N(0,1)

•   Summary for 10 runs of K-means

[Figure: runtime of ParaKMeans, K-means R, and McKmeans on 10,000 cases with 100 dimensions and 20 clusters (seconds, axis 0 to 150), and of K-means R and McKmeans on 1,000,000 cases with 200 dimensions and 20 clusters (minutes, axis 0 to 450)]

•   Number of computer cores used

•   Number of data agents used

[Figure: runtime (seconds) on 100,000 x 500 data with 20 clusters, versus the number of computer cores (1, 4, 8; axis 0 to 1500) and versus the number of data agents (4, 6, 8, 10; axis 0 to 800)]
Large data sets with cluster structure


•   Data sampled from a multivariate normal distribution

•   100,000 samples, 200/500 dimensions, 10/20 clusters

[Figure: runtime (seconds, axis 0 to 2000) of K-means R and McKmeans for each combination of number of dimensions / number of clusters: 200/10, 200/20, 500/10, 500/20]
Accuracy compared to the known grouping of data


•   Measured with the MCA index

•   Red bars indicate the random-prototype baseline

[Figure: MCA index (0 to 1) of McKmeans and K-means R on 100,000 x 200 data with 10 and 20 clusters and on 100,000 x 500 data with 10 and 20 clusters, each with the random-prototype baseline marked]
Real world data set

•   Microarray data (radiation-induced changes in human gene expression)

•   22277 samples (genes) and 465 features (profiles)

[Figure: runtime (seconds, axis 0 to 350) of K-means R and McKmeans for 2, 5, 10, and 20 clusters]

    Smirnov D, Morley M, Shin E, Spielman R, Cheung V: Genetic analysis of radiation-induced changes in human gene expression. Nature 2009, 459:587–591
Application to Cluster Number Estimation
•   Repeated clustering with different subsets of the data

•   Repeated for different numbers of clusters k

•   The most stable clustering is obtained for the ‘real’ number of clusters

•   Jackknife resampling

•   Evaluation with the MCA index

•   Data set: 100000 samples, 100 features, 3 clusters

•   10 runs per cluster number

•   49.26 minutes on a dual quad-core 3.2 GHz machine

[Figure: MCA index (0 to 1) versus number of clusters (2 to 7)]
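The resampling step can be sketched in Java under explicit assumptions: each run clusters a delete-d jackknife subsample (a random subset of n - d items, where d is a hypothetical parameter the slide does not specify), and two runs are compared with the MCA index only on the items both subsamples contain. Class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helpers for a delete-d jackknife stability analysis.
final class Jackknife {
    // Random subset of n - d distinct indices from 0..n-1, returned in ascending order.
    static int[] subsample(int n, int d, Random rng) {
        if (d < 0 || d > n) {
            throw new IllegalArgumentException("require 0 <= d <= n");
        }
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        int keep = n - d;
        // Partial Fisher-Yates shuffle: the first `keep` entries form a uniform random subset.
        for (int i = 0; i < keep; i++) {
            int j = i + rng.nextInt(n - i);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        int[] subset = Arrays.copyOf(idx, keep);
        Arrays.sort(subset);
        return subset;
    }

    // Items contained in both sorted subsamples; only these can be compared across runs.
    static int[] common(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, m = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                out[m++] = a[i];
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, m);
    }
}
```

Repeating this for each candidate k and averaging the MCA values of the pairwise comparisons yields one stability score per cluster number, as in the plot.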
Java GUI
(import '(javax.swing JFrame JLabel JTextField JButton)
        '(java.awt.event ActionListener)
        '(java.awt GridLayout))

;; A window with a button; clicking it writes "Hello, World!" into the label.
(let [frame        (new JFrame "Hello, World!")
      hello-button (new JButton "Say hello")
      hello-label  (new JLabel "")]
  ;; proxy creates an ActionListener implementation on the fly
  (. hello-button
     (addActionListener
      (proxy [ActionListener] []
        (actionPerformed [evt]
          (. hello-label
             (setText "Hello, World!"))))))
  (doto frame
    (.setLayout (new GridLayout 1 1 3 3))
    (.add hello-button)
    (.add hello-label)
    (.setSize 300 80)
    (.setVisible true)))
Summary

•   Writing parallel programs usually requires careful software design
    and a deep knowledge of thread-safe programming

•   Concurrency control via transactional memory circumvents the
    problems of lock-based concurrency strategies

•   Immutable data structures play a key role in software transactional
    memory

•   Clojure combines Lisp, Java and a powerful STM system

•   This enables fast parallelization of algorithms, even for rapid
    prototyping

•   Our simulations show good performance of the parallelized code
Thank you for your attention.
Statistical computing library


• http://wiki.github.com/liebke/incanter
• Clojure-based statistical computing
• R-like semantics
• COLT library for numerical computation
• JFreeChart library for graphics

 
gevent at TellApart
gevent at TellApartgevent at TellApart
gevent at TellApart
 
Gevent at TellApart
Gevent at TellApartGevent at TellApart
Gevent at TellApart
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 
Lecture 2
Lecture 2Lecture 2
Lecture 2
 
Concept of thread
Concept of threadConcept of thread
Concept of thread
 
Simon Peyton Jones: Managing parallelism
Simon Peyton Jones: Managing parallelismSimon Peyton Jones: Managing parallelism
Simon Peyton Jones: Managing parallelism
 
Peyton jones-2011-parallel haskell-the_future
Peyton jones-2011-parallel haskell-the_futurePeyton jones-2011-parallel haskell-the_future
Peyton jones-2011-parallel haskell-the_future
 
Storm presentation
Storm presentationStorm presentation
Storm presentation
 
Building a Database for the End of the World
Building a Database for the End of the WorldBuilding a Database for the End of the World
Building a Database for the End of the World
 
Multi core programming 2
Multi core programming 2Multi core programming 2
Multi core programming 2
 
Scaling up java applications on windows
Scaling up java applications on windowsScaling up java applications on windows
Scaling up java applications on windows
 
VTU 6th Sem Elective CSE - Module 3 cloud computing
VTU 6th Sem Elective CSE - Module 3 cloud computingVTU 6th Sem Elective CSE - Module 3 cloud computing
VTU 6th Sem Elective CSE - Module 3 cloud computing
 
Scalability, Availability & Stability Patterns
Scalability, Availability & Stability PatternsScalability, Availability & Stability Patterns
Scalability, Availability & Stability Patterns
 
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedInJay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
Jay Kreps on Project Voldemort Scaling Simple Storage At LinkedIn
 
Real-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormReal-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and Storm
 
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
 
Distributed Model Validation with Epsilon
Distributed Model Validation with EpsilonDistributed Model Validation with Epsilon
Distributed Model Validation with Epsilon
 
Unit 5 Advanced Computer Architecture
Unit 5 Advanced Computer ArchitectureUnit 5 Advanced Computer Architecture
Unit 5 Advanced Computer Architecture
 

Más de elliando dias

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slideselliando dias
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScriptelliando dias
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structureselliando dias
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de containerelliando dias
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agilityelliando dias
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Librarieselliando dias
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!elliando dias
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Webelliando dias
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduinoelliando dias
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorceryelliando dias
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Designelliando dias
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makeselliando dias
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.elliando dias
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebookelliando dias
 
From Lisp to Clojure/Incanter and RAn Introduction
From Lisp to Clojure/Incanter and RAn IntroductionFrom Lisp to Clojure/Incanter and RAn Introduction
From Lisp to Clojure/Incanter and RAn Introductionelliando dias
 

Más de elliando dias (20)

Clojurescript slides
Clojurescript slidesClojurescript slides
Clojurescript slides
 
Why you should be excited about ClojureScript
Why you should be excited about ClojureScriptWhy you should be excited about ClojureScript
Why you should be excited about ClojureScript
 
Functional Programming with Immutable Data Structures
Functional Programming with Immutable Data StructuresFunctional Programming with Immutable Data Structures
Functional Programming with Immutable Data Structures
 
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de container
 
Geometria Projetiva
Geometria ProjetivaGeometria Projetiva
Geometria Projetiva
 
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agility
 
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Libraries
 
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!
 
Ragel talk
Ragel talkRagel talk
Ragel talk
 
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Web
 
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduino
 
Minicurso arduino
Minicurso arduinoMinicurso arduino
Minicurso arduino
 
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorcery
 
Rango
RangoRango
Rango
 
Fab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine DesignFab.in.a.box - Fab Academy: Machine Design
Fab.in.a.box - Fab Academy: Machine Design
 
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makes
 
Hadoop + Clojure
Hadoop + ClojureHadoop + Clojure
Hadoop + Clojure
 
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.
 
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebook
 
From Lisp to Clojure/Incanter and RAn Introduction
From Lisp to Clojure/Incanter and RAn IntroductionFrom Lisp to Clojure/Incanter and RAn Introduction
From Lisp to Clojure/Incanter and RAn Introduction
 

Último

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 

Último (20)

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 

Multi-core Parallelization in Clojure - a Case Study

  • 1. Multi-core Parallelization in Clojure - a Case Study. Johann M. Kraus and Hans A. Kestler, AG Bioinformatics and Systems Biology, Institute of Neural Information Processing, University of Ulm, 29.06.2009
  • 2. Outline 1. Concepts of parallel programming 2. Short introduction to Clojure 3. Multi-core parallel K-means - the case study 4. Analysis and Results 5. Summary
  • 3. Parallel Programming. Definition: parallel programming is a form of programming in which many calculations are performed simultaneously. • Physical constraints (power consumption and heat dissipation) prevent further frequency scaling of processors • This has led to increasing interest in parallel hardware and parallel programming • Multi-core hardware is standard on desktop computers • Parallel software is needed to use this hardware to its full capacity
  • 4. Large problems are divided into smaller ones, and the sub-problems are solved simultaneously • The speedup S is limited by the fraction P of parallelizable code • Amdahl's law: $S = \frac{1}{(1 - P) + \frac{P}{N}}$, where N is the number of processors [Figure: Amdahl's law; speedup versus number of processors (powers of two from 1 to 65536) for fractions of parallelizable code P = 0.95, 0.90, 0.75 and 0.50]
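To make the limit concrete, here is a small Java sketch that evaluates Amdahl's law for the four fractions shown on the slide. The class and method names are illustrative and not taken from the talk. As N grows, the speedup approaches 1/(1 - P). For P = 0.95 that bound is 20, which is where the top curve levels off.

```java
public class AmdahlSpeedup {
    // Amdahl's law: S = 1 / ((1 - p) + p / n)
    public static double speedup(double p, int n) {
        if (p < 0 || p > 1 || n < 1) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and n >= 1");
        }
        return 1.0 / ((1.0 - p) + p / n);
    }

    // Upper bound on the speedup for an unlimited number of processors: 1 / (1 - p)
    public static double limit(double p) {
        return 1.0 / (1.0 - p);
    }

    public static void main(String[] args) {
        double[] fractions = {0.95, 0.90, 0.75, 0.50};
        int[] processors = {1, 16, 1024, 65536};
        for (double p : fractions) {
            StringBuilder row = new StringBuilder(String.format("P = %.2f:", p));
            for (int n : processors) {
                row.append(String.format("  N=%d -> %.2f", n, speedup(p, n)));
            }
            row.append(String.format("  (limit %.1f)", limit(p)));
            System.out.println(row);
        }
    }
}
```

Even with P = 0.95, going from 1024 to 65536 processors gains very little, because the serial 5% dominates.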
  • 5. Concepts of Parallel Programming: explicit vs. implicit parallelization • Explicit: communication and synchronization details are defined for each task, e.g. with MPI or Java Threads • Implicit: functional programming allows functions to be processed in parallel without explicit synchronization, because functions are free of side effects and data is immutable
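As a rough illustration of implicit parallelization, the Java sketch below maps a side-effect-free function over a range using a parallel stream (names are illustrative). The programmer only requests parallelism. The stream library splits the work and combines the results. Because `square` touches no shared state, the parallel and sequential results must agree. In Clojure, `pmap` offers a comparable style.

```java
import java.util.stream.LongStream;

public class ImplicitParallel {
    // Side-effect-free: the result depends only on the argument; no shared state is touched.
    static long square(long x) {
        return x * x;
    }

    // The caller only asks for parallelism; the stream library splits the range,
    // applies square on several threads and combines the partial sums.
    public static long sumOfSquares(long n, boolean parallel) {
        LongStream numbers = LongStream.rangeClosed(1, n);
        if (parallel) {
            numbers = numbers.parallel();
        }
        return numbers.map(ImplicitParallel::square).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000_000, false) == sumOfSquares(1_000_000, true));
    }
}
```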
  • 6. Distributed vs. local hardware • Master-slave parallelization (e.g. Message Passing Interface) • Shared memory parallelization (e.g. Open Multi-Processing) [Figure: left, a master CPU 0 sends data to slave CPUs 1 to 4 and receives their results; right, CPUs 0 to 3 read from and write to a shared memory]
  • 7. Thread programming • Threads are independent flows of execution within a process; they share the process's memory and can be processed separately and simultaneously • Available in many languages, e.g. PThreads (C), Java Threads (Java), OpenMP threads (C, Fortran) • Execution of threads is handled by a scheduler that manages the available processing time • Communication between threads is faster than communication between processes • Creating threads is also faster than forking and joining processes [Figure: thread life cycle with the states new, runnable, running, waiting and terminated, connected by the transitions start, schedule, block, awake and end]
  • 8. Concurrency control via locking and synchronizing • Concurrency control ensures that threads can access shared memory without violating data integrity • The most popular approach to concurrency is locking and synchronizing:
      public class Counter {
          private int value = 0;
          public synchronized void incr() {
              value = value + 1;
          }
      }

      Counter counter = new Counter();
      counter.incr();
    • Problems can occur when using too many locks, too few locks, the wrong locks, or locks in the wrong order • Lock-based code is therefore error-prone; acquiring locks in inconsistent order, for example, can cause deadlocks
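The Java sketch below runs the synchronized `Counter` from several threads. It is an illustrative program, not code from the slides. Because `incr` is `synchronized`, each read-increment-write happens under the object's lock, so no update is lost. The `get` method is an addition, needed only to read the result.

```java
public class CounterDemo {
    // The Counter from the slide, plus a synchronized getter (an addition for reading the result).
    static class Counter {
        private int value = 0;

        public synchronized void incr() {
            value = value + 1;
        }

        public synchronized int get() {
            return value;
        }
    }

    // Starts nThreads threads that each call incr() perThread times, then waits for all of them.
    public static int run(int nThreads, int perThread) {
        Counter counter = new Counter();
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.incr();
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100_000)); // every increment is counted: 400000
    }
}
```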
  • 9. Concurrency control via transactional memory • Transactional memory offers a flexible alternative to lock-based concurrency control • It works analogously to the control of simultaneous access in database management systems • Transactions ensure the following properties: • Atomicity: either all changes of a transaction take effect or none do • Consistency: only valid changes are committed • Isolation: no transaction sees the intermediate effects of other transactions • Durability: changes from committed transactions are persistent (a database property; in-memory STM such as Clojure's provides only atomicity, consistency and isolation)
  • 10. Software transactional memory (STM) maps transactional memory onto concurrency control in parallel programming [Figure: timeline of Transaction 0 and Transaction 1 accessing shared Data; each transaction gets the data, works on consistent data and sends its modified data back, and one transaction gets the data again before sending its modification]
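A full STM coordinates many references at once. Its core optimistic idea can still be sketched for a single location with Java's `AtomicReference`:

1. Read a consistent snapshot.
2. Compute the new value without taking locks.
3. Commit only if nothing changed in the meantime, otherwise retry.

This is a simplified analogue for illustration, not how Clojure's STM is implemented, and the class and method names are hypothetical. Java's own `AtomicReference.updateAndGet` packages the same loop.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

public class OptimisticUpdate {
    // Read a snapshot, compute a new value, and commit only if no other thread
    // has committed in between; on conflict, retry with a fresh snapshot.
    public static <T> T update(AtomicReference<T> cell, UnaryOperator<T> f) {
        while (true) {
            T snapshot = cell.get();
            T proposed = f.apply(snapshot);   // f must be side-effect-free: it may run more than once
            if (cell.compareAndSet(snapshot, proposed)) {
                return proposed;              // commit succeeded
            }
            // conflict: another thread changed the value, so loop and retry
        }
    }

    // Increments a shared counter from several threads and returns the final value.
    public static int countConcurrently(int nThreads, int perThread) {
        AtomicReference<Integer> counter = new AtomicReference<>(0);
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    update(counter, x -> x + 1);
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(4, 10_000));
    }
}
```

The retry behaviour is why transaction bodies should be free of side effects: a conflicting transaction may execute its body several times before it commits.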
  • 11. Clojure • Functional programming language hosted on the JVM • Lisp dialect that extends the code-as-data paradigm to maps and vectors • Based on immutable data structures • Provides built-in concurrency support via software transactional memory • Tightly integrated with Java, e.g. easy access to Java libraries • Platform independent
  • 12. Java interaction
      (import '(cern.jet.random.sampling RandomSamplingAssistant))
      (defn sample [n k]
        (seq (. RandomSamplingAssistant (sampleArray k (int-array (range n))))))
    • Dynamic typing and multimethods • An object is defined as the sum of what it can do (methods), rather than the sum of what it is (type hierarchy) • Add type hints to speed up code:
      (defn da+ [#^doubles as #^doubles bs]
        (amap as i ret (+ (aget as i) (aget bs i))))
  • 13. Transactional references and STM • Transactional references (refs) ensure safe, coordinated, synchronous changes to mutable storage locations • They are bound to a single storage location for their lifetime • They only allow mutation of that location to occur within transactions • Available operations are ref-set, alter and commute • No explicit locking is required:
      (def counter (ref 0))
      (dosync (alter counter inc))
  • 14. Agents • Agents allow independent, asynchronous changes of mutable locations • They are bound to a single storage location for their lifetime • They only allow mutation of that location to a new state as the result of an action • Actions are functions that are asynchronously applied to the state of an agent • The return value of an action becomes the new state of the agent • Agents are integrated with the STM:
      (def counter (agent 0))
      (send counter inc)
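To show the mechanics, here is a hypothetical minimal agent in Java. `send` queues an action and returns immediately. A single worker thread applies the queued actions one at a time, and each action's return value becomes the new state.

This is a simplification of Clojure's agents:
- Clojure dispatches `send` actions to a shared thread pool, though still only one action at a time per agent.
- Clojure holds sends made inside a transaction until the transaction commits, which this sketch does not model.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.UnaryOperator;

public class MiniAgent<T> {
    private volatile T state;
    // A single worker thread applies actions one at a time, in the order they were sent.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    public MiniAgent(T initial) {
        this.state = initial;
    }

    // Like send: queue the action and return immediately.
    // The action's return value becomes the new state.
    public void send(UnaryOperator<T> action) {
        worker.execute(() -> state = action.apply(state));
    }

    // Reading the current state never blocks.
    public T deref() {
        return state;
    }

    // Block until every action sent so far has been applied (the worker queue is FIFO).
    public void await() {
        try {
            worker.submit(() -> { }).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public void shutdown() {
        worker.shutdown();
    }

    public static void main(String[] args) {
        MiniAgent<Integer> counter = new MiniAgent<>(0);
        for (int i = 0; i < 1000; i++) {
            counter.send(x -> x + 1);
        }
        counter.await();
        System.out.println(counter.deref()); // 1000
        counter.shutdown();
    }
}
```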
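  • 15. Cluster analysis • Given a data set X, compute a partition of X into k disjoint clusters C such that: (1) $\bigcup_{i=1}^{k} C_i = X$, (2) $C_i \neq \emptyset$ and $C_i \cap C_j = \emptyset$ for $i \neq j$ • How many clusters are in the data set? [Figure: the same data set grouped into 3 clusters and into 9 clusters]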
  • 16. Cluster algorithms • For all possible partitions, evaluate the objective function f and search for the optimum • The cardinality of the set of all possible partitions of N data points into k clusters is given by the Stirling numbers of the second kind: $S_N^{(k)} = \frac{1}{k!}\sum_{i=0}^{k}(-1)^{k-i}\binom{k}{i} i^N$ [Figure: runtime (nanoseconds) versus number of clusters for 30 data points] • Cluster algorithms provide a heuristic for this search: • Partitional clustering (K-means, Neural Gas, SOM, Fuzzy C-means, ...) • Hierarchical clustering (divisive/agglomerative, complete linkage, ...) • Graph-based clustering (spectral clustering, NMF, affinity propagation, ...) • Model-based clustering, biclustering, semi-supervised clustering
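The Stirling formula can be evaluated exactly with `BigInteger`. The Java sketch below (names are illustrative) shows how quickly the number of candidate partitions grows, which is why exhaustive search is infeasible. As a check, $S_N^{(2)} = 2^{N-1} - 1$, so 30 data points already allow 536,870,911 different splits into two clusters.

```java
import java.math.BigInteger;

public class StirlingSecondKind {
    // Binomial coefficient C(n, r), built up incrementally; every division is exact.
    static BigInteger binomial(int n, int r) {
        BigInteger result = BigInteger.ONE;
        for (int j = 1; j <= r; j++) {
            result = result.multiply(BigInteger.valueOf(n - r + j)).divide(BigInteger.valueOf(j));
        }
        return result;
    }

    static BigInteger factorial(int k) {
        BigInteger result = BigInteger.ONE;
        for (int j = 2; j <= k; j++) {
            result = result.multiply(BigInteger.valueOf(j));
        }
        return result;
    }

    // S(n, k) = 1/k! * sum_{i=0}^{k} (-1)^(k-i) * C(k, i) * i^n
    public static BigInteger partitions(int n, int k) {
        BigInteger sum = BigInteger.ZERO;
        for (int i = 0; i <= k; i++) {
            BigInteger term = binomial(k, i).multiply(BigInteger.valueOf(i).pow(n));
            sum = ((k - i) % 2 == 0) ? sum.add(term) : sum.subtract(term);
        }
        return sum.divide(factorial(k));
    }

    public static void main(String[] args) {
        for (int k = 1; k <= 5; k++) {
            System.out.println("S(30, " + k + ") = " + partitions(30, k));
        }
    }
}
```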
  • 17. K-means algorithm
      Function KMeans
        Input:  X = {x_1, ..., x_n}   (data to be clustered)
                k                    (number of clusters)
        Output: C = {c_1, ..., c_k}   (cluster centroids)
                m: X -> C            (cluster assignments)
        Initialize C (e.g. random selection from X)
        While C has changed
          For each x_i in X
            m(x_i) = argmin_j distance(x_i, c_j)
          End
          For each c_j in C
            c_j = centroid({x_i | m(x_i) = j})
          End
        End
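A minimal sequential Java version of this loop is sketched below. It makes several assumptions:
- Distance is squared Euclidean distance.
- The caller supplies the initial centroids, so runs are reproducible.
- An empty cluster simply keeps its previous centroid.
- The loop stops when no assignment changes, which implies the centroids no longer change.

Names are illustrative; this is not the authors' McKmeans code.

```java
import java.util.Arrays;

public class KMeansSketch {
    static double sqDist(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // m(x) = argmin_j distance(x, c_j)
    static int nearest(double[] x, double[][] centroids) {
        int best = 0;
        for (int j = 1; j < centroids.length; j++) {
            if (sqDist(x, centroids[j]) < sqDist(x, centroids[best])) best = j;
        }
        return best;
    }

    // Returns the cluster index of every point; stops when no assignment changes or after maxIter rounds.
    public static int[] cluster(double[][] data, double[][] initialCentroids, int maxIter) {
        int k = initialCentroids.length, dim = data[0].length;
        double[][] c = new double[k][];
        for (int j = 0; j < k; j++) c[j] = initialCentroids[j].clone();
        int[] m = new int[data.length];
        Arrays.fill(m, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            for (int i = 0; i < data.length; i++) {        // assignment step
                int j = nearest(data[i], c);
                if (j != m[i]) { m[i] = j; changed = true; }
            }
            if (!changed) break;
            double[][] sum = new double[k][dim];             // update step
            int[] count = new int[k];
            for (int i = 0; i < data.length; i++) {
                count[m[i]]++;
                for (int d = 0; d < dim; d++) sum[m[i]][d] += data[i][d];
            }
            for (int j = 0; j < k; j++) {
                if (count[j] == 0) continue;                 // empty cluster keeps its centroid
                for (int d = 0; d < dim; d++) c[j][d] = sum[j][d] / count[j];
            }
        }
        return m;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        System.out.println(Arrays.toString(cluster(data, new double[][]{{0, 0}, {0, 1}}, 100))); // [0, 0, 1, 1]
    }
}
```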
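  • 18. Cluster validation • Evaluation requires repeated runs of clustering, e.g. on resampled data sets or with different parameters • MCA index: mean proportion of samples that are consistent across two clusterings A = {A_1, ..., A_k} and B = {B_1, ..., B_k} of n samples, maximized over all permutations π of the cluster labels: $MCA = \frac{1}{n}\max_{\pi}\sum_{i=1}^{k}|A_i \cap B_{\pi(i)}|$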
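A direct Java translation of the MCA formula is sketched below. It assumes both clusterings label the n samples with integers 0 to k-1, and it builds the k x k overlap table $|A_i \cap B_j|$. It then tries all k! label permutations, which is only practical for small k; an assignment-problem solver would scale better. Names are illustrative.

```java
public class McaIndex {
    // MCA = (1/n) * max over permutations pi of sum_i |A_i ∩ B_pi(i)|
    // a[s] and b[s] are the cluster labels (0..k-1) of sample s in the two clusterings.
    public static double mca(int[] a, int[] b, int k) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("clusterings must label the same non-empty set of samples");
        }
        int[][] overlap = new int[k][k];   // overlap[i][j] = |A_i ∩ B_j|
        for (int s = 0; s < a.length; s++) {
            overlap[a[s]][b[s]]++;
        }
        int[] perm = new int[k];
        for (int i = 0; i < k; i++) perm[i] = i;
        return (double) bestMatch(overlap, perm, 0) / a.length;
    }

    // Enumerates all permutations of perm[pos..k-1] by swapping and returns the best total overlap.
    static int bestMatch(int[][] overlap, int[] perm, int pos) {
        if (pos == perm.length) {
            int total = 0;
            for (int i = 0; i < perm.length; i++) total += overlap[i][perm[i]];
            return total;
        }
        int best = 0;
        for (int j = pos; j < perm.length; j++) {
            swap(perm, pos, j);
            best = Math.max(best, bestMatch(overlap, perm, pos + 1));
            swap(perm, pos, j);
        }
        return best;
    }

    static void swap(int[] p, int i, int j) {
        int tmp = p[i];
        p[i] = p[j];
        p[j] = tmp;
    }

    public static void main(String[] args) {
        int[] a = {0, 0, 0, 1, 1, 2, 2, 2};
        int[] b = {2, 2, 1, 1, 1, 0, 0, 0};
        System.out.println(mca(a, b, 3)); // 0.875 (7 of 8 samples consistent)
    }
}
```

Relabeling alone does not change the index: two clusterings that differ only in their label names score 1.0.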
  • 19. Estimation of the expected value of a validation index • Random label: randomly assign each item to one of k clusters • Random partition: choose a random partition • Random prototype: assign each item to its nearest prototype [Figure: mean MCA index (mean value from 100 runs) versus number of clusters (up to 50) for the three baselines]
  • 20. Multi-core K-means with Clojure • Split the data set into smaller pieces that are handled by data agents • Represent each cluster by a cluster agent • Keep the members of each cluster in a transactional reference that is updated with the commutative commute operation, to accelerate the centroid update step [Figure: Data Agents 0 to n read the Cluster Agents 0 to k and write to the Member Refs 0 to k]
  • 21. [Figure: simultaneous read, Data Agents 0 to n read the Cluster Agents 0 to k at the same time; simultaneous write, Data Agents 0 to n write to the Member Refs at the same time]
  • 22. read: (nearest-cluster); write: (commute), (assoc)
      ;; nearest-cluster, MemberRefs and DataAgents are defined elsewhere in the program
      (defn update-datapoint [datapoint]
        (let [newass (nearest-cluster datapoint)]
          (dosync (commute (nth MemberRefs newass) conj (:data datapoint)))
          (assoc datapoint :assignment newass)))

      (defn update-dataagent [datapoints]
        ;; doall realizes the lazy map inside the agent action, so the refs are updated now
        (doall (map update-datapoint datapoints)))

      (defn assignment []
        ;; doseq, unlike lazy map, guarantees that every send is dispatched
        (doseq [data-agent DataAgents]
          (send data-agent update-dataagent)))
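For readers more familiar with Java, the same assignment step can be sketched with an executor. The data is split into chunks, and one task per chunk reads the shared centroids. Each task adds its points to a concurrent per-cluster member collection. As with `commute` and `conj`, the order in which tasks add members does not matter, so no coordination beyond the concurrent queue is needed. This sketch uses `ExecutorService` and `ConcurrentLinkedQueue` as stand-ins for agents and refs. It illustrates the idea under those assumptions and is not the McKmeans implementation; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelAssignment {
    static int nearest(double[] x, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < centroids.length; j++) {
            double dist = 0;
            for (int d = 0; d < x.length; d++) dist += (x[d] - centroids[j][d]) * (x[d] - centroids[j][d]);
            if (dist < bestDist) { bestDist = dist; best = j; }
        }
        return best;
    }

    // One concurrent member collection per cluster (the analogue of a Member Ref).
    public static List<Queue<double[]>> newMembers(int k) {
        List<Queue<double[]>> members = new ArrayList<>();
        for (int j = 0; j < k; j++) members.add(new ConcurrentLinkedQueue<>());
        return members;
    }

    // Assignment step: nTasks tasks each handle one contiguous chunk of the data.
    public static int[] assign(double[][] data, double[][] centroids, int nTasks, List<Queue<double[]>> members) {
        int[] assignment = new int[data.length];
        int chunk = (data.length + nTasks - 1) / nTasks;
        ExecutorService pool = Executors.newFixedThreadPool(nTasks);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int start = 0; start < data.length; start += chunk) {
                int from = start, to = Math.min(data.length, start + chunk);
                futures.add(pool.submit(() -> {
                    for (int i = from; i < to; i++) {
                        int j = nearest(data[i], centroids);  // read: shared centroids
                        assignment[i] = j;                     // each task writes only its own chunk
                        members.get(j).add(data[i]);           // write: order-independent add
                    }
                }));
            }
            for (Future<?> f : futures) f.get();  // wait; also makes the tasks' writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return assignment;
    }

    // Update step: each centroid becomes the mean of its members; empty clusters keep the old one.
    public static double[][] centroids(List<Queue<double[]>> members, double[][] previous) {
        double[][] result = new double[previous.length][];
        for (int j = 0; j < previous.length; j++) {
            Queue<double[]> m = members.get(j);
            int size = m.size();
            if (size == 0) { result[j] = previous[j].clone(); continue; }
            double[] sum = new double[previous[j].length];
            for (double[] x : m) for (int d = 0; d < sum.length; d++) sum[d] += x[d];
            for (int d = 0; d < sum.length; d++) sum[d] /= size;
            result[j] = sum;
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 0}, {0, 1}, {10, 10}, {10, 11}, {0, 2}, {10, 12}};
        double[][] c = {{0, 1}, {10, 11}};
        List<Queue<double[]>> members = newMembers(c.length);
        System.out.println(java.util.Arrays.toString(assign(data, c, 3, members)));
        System.out.println(java.util.Arrays.deepToString(centroids(members, c)));
    }
}
```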
  • 23. Benchmark results. Large data sets (artificial): • Each data point is sampled from N(0,1) • Summary of 10 runs of K-means [Figure: runtime of ParaKMeans, K-means (R) and McKmeans for 10,000 cases, 100 dimensions, 20 clusters (runtime in seconds), and of K-means (R) and McKmeans for 1,000,000 cases, 200 dimensions, 20 clusters (runtime in minutes)]
  • 24. Effect of the number of computer cores used and the number of data agents used [Figure: runtime (seconds) for 100,000 x 500 data and 20 clusters, plotted against the number of computer cores (1, 4, 8) and against the number of data agents (4 to 10)]
  • 25. Large data sets with cluster structure • Data sampled from a multivariate normal distribution • 100,000 samples, 200/500 dimensions, 10/20 clusters [Figure: runtime (seconds) of K-means (R) and McKmeans for each combination of number of dimensions / number of clusters: 200/10, 200/20, 500/10, 500/20]
  • 26. Accuracy compared to the known grouping of the data • Measured with the MCA index • Red bars indicate the random-prototype baseline [Figure: MCA index of McKmeans and K-means (R) for 100,000 x 200 and 100,000 x 500 data with 10 and 20 clusters each]
  • 27. Real-world data set • Microarray data (radiation-induced changes in human gene expression) • 22,277 samples (genes) and 465 features (profiles) [Figure: runtime (seconds) of K-means (R) and McKmeans for 2, 5, 10 and 20 clusters] Smirnov D, Morley M, Shin E, Spielman R, Cheung V: Genetic analysis of radiation-induced changes in human gene expression. Nature 2009, 459:587–591
  • 28. Application to cluster number estimation • Repeated clustering with different subsets of the data (jackknife resampling) • Repeated for different numbers of clusters k • The most stable clustering is produced for the 'real' cluster number • Evaluation with the MCA index • Data set: 100,000 samples, 100 features, 3 clusters • 10 runs per cluster number • 49.26 minutes on a dual quad-core 3.2 GHz machine [Figure: MCA index versus number of clusters (2 to 7)]
  • 29. Java GUI
      (import '(javax.swing JFrame JLabel JTextField JButton)
              '(java.awt.event ActionListener)
              '(java.awt GridLayout))
      (let [frame (new JFrame "Hello, World!")
            hello-button (new JButton "Say hello")
            hello-label (new JLabel "")]
        (. hello-button
           (addActionListener
             (proxy [ActionListener] []
               (actionPerformed [evt]
                 (. hello-label (setText "Hello, World!"))))))
        (doto frame
          (.setLayout (new GridLayout 1 1 3 3))
          (.add hello-button)
          (.add hello-label)
          (.setSize 300 80)
          (.setVisible true)))
  • 30.
  • 31. Summary • Writing parallel programs usually requires careful software design and deep knowledge of thread-safe programming • Concurrency control via transactional memory avoids the problems of lock-based concurrency strategies • Immutable data structures play a key role in software transactional memory • Clojure combines Lisp, Java and a powerful STM system • This enables fast parallelization of algorithms, even for rapid prototyping • Our simulations show good performance of the parallelized code
  • 32. Thank you for your attention.
  • 33. Statistical computing library • http://wiki.github.com/liebke/incanter • Clojure-based statistical computing • R-like semantics • COLT library for numerical computation • JFreeChart library for graphics