Multi-core Parallelization in Clojure -
            a Case Study

     Johann M. Kraus and Hans A. Kestler

    AG Bioinformatics and Systems Biology
  Institute of Neural Information Processing
               University of Ulm


1. Concepts of parallel programming

2. Short introduction to Clojure

3. Multi-core parallel K-means - the case study

4. Analysis and Results

5. Summary
Parallel Programming
Parallel programming is a form of programming where many calculations
are performed simultaneously.

•   Physical constraints prevent frequency scaling of processors

•   This led to an increasing interest in parallel hardware and parallel programming

•   Multi-core hardware is standard on desktop computers

•   Parallel software can use this hardware to the full capacity
•   Large problems are divided into smaller ones and the sub-problems are solved simultaneously

•   Speedup S is limited by the fraction of parallelizable code P

•   Amdahl's law, for N processors:

    $$S = \frac{1}{(1 - P) + \frac{P}{N}}$$

[Figure: Amdahl's law. Speedup versus number of processors (1 to 65536) for fractions of parallelizable code P = 0.95, 0.90, 0.75, 0.50]
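A minimal sketch of Amdahl's law, S = 1 / ((1 - P) + P / N), as a Clojure function. The names `amdahl-speedup` and `max-speedup` are illustrative and not part of the original slides.

```clojure
(defn amdahl-speedup
  "Theoretical speedup for a parallelizable fraction p (0 to 1)
   on n processors, according to Amdahl's law."
  [p n]
  (/ 1.0 (+ (- 1.0 p) (/ p n))))

(defn max-speedup
  "Limit of the speedup as the number of processors grows without bound."
  [p]
  (/ 1.0 (- 1.0 p)))

;; Print the speedup of 95 % parallelizable code for a growing
;; number of processors, followed by the upper bound.
(doseq [n [1 2 8 64 1024]]
  (println n (amdahl-speedup 0.95 n)))
(println "upper bound:" (max-speedup 0.95))
```

The bound shows why adding processors eventually stops helping: the sequential fraction 1 - P dominates the runtime.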
Concepts of Parallel Programming

              Explicit vs. implicit parallelization

•   Explicitly define communication and synchronization details for
    each task:
                •   MPI
                •   Java Threads

•   Functional programming allows implicit parallelization:

                •   Parallel processing of functions

                •   Functions are free of side-effects

                •   Data is immutable
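As a small illustration of implicit parallelization, the sketch below applies a side-effect-free function to immutable data, once with `map` and once with `pmap`. The function `squared-norm` and the data `rows` are hypothetical examples, not taken from the slides.

```clojure
(defn squared-norm
  "Pure function: sum of the squared components of a numeric vector."
  [v]
  (reduce + (map * v v)))

;; Immutable input data.
(def rows [[1 2] [3 4] [5 6]])

;; Sequential evaluation.
(def sequential (doall (map squared-norm rows)))

;; Parallel evaluation: pmap may process several elements at once.
;; No locks are needed because squared-norm has no side effects.
(def parallel (doall (pmap squared-norm rows)))

(println (= sequential parallel))
```

Because the function is pure, swapping `map` for `pmap` cannot change the result, only the way the work is scheduled.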
Distributed vs. local hardware

•   Master - Slave parallelization (e.g. Message Passing Interface)

•   Shared memory parallelization (e.g. Open Multi-Processing)

[Figure: left, a master exchanging data with slaves 0 to 4 (send data, send result); right, CPUs reading from and writing to a shared memory]
Thread programming

•   Threads are refinements of a process that share the same memory and
    can be processed separately and simultaneously

•   Available in many languages, e.g. PThreads (C), Java Threads (Java),
    OpenMP Threads (C, Fortran)

•   Execution of threads is handled by a scheduler that manages the available
    processing time

•   Communication between threads is faster than communication between
    processes

•   Invoking threads is also faster than fork/join

[Figure: thread life cycle with the labels start, runnable, running, block, end, and terminated]

Concurrency control via locking and synchronizing

• Concurrency control ensures that threads can access shared memory
 without violating data integrity

• The most popular approach to concurrency is locking and synchronizing
               public class Counter {
                   private int value = 0;
                   // synchronized: only one thread at a time may run incr()
                   public synchronized void incr() {
                       value = value + 1;
                   }
               }
               Counter counter = new Counter();
               counter.incr();

• Problems might occur when using too many locks, too few locks, wrong
  locks, or locks in the wrong order

• Using locks can be fatally error-prone, e.g. deadlocks
Concurrency control via transactional memory

• Transactional memory offers a flexible alternative to lock-based
  concurrency control

• Functionality is analogous to controlling simultaneous access to database
  management systems

• Transactions ensure properties:
  •   Atomicity: Either all changes of a transaction occur or none do

  •   Consistency: Only valid changes are committed

  •   Isolation: No transaction sees the effect of other transactions

  •   Durability: Changes from transactions will be persistent
• Software transactional memory maps transactional memory to
 concurrency control in parallel programming


[Figure: sequence diagram with the lifelines Transaction 0, Data, and Transaction 1. Each transaction gets data and sends modified data under the guard "consistent data"; Transaction 0 gets the data a second time before sending its modified data again]

•   Functional programming language hosted on the JVM

•   Extends the code-as-data paradigm to maps and vectors

•   Based on immutable data structures

•   Provides built-in concurrency support via software transactional memory

•   Completely symbiotic to Java, e.g. easy access to Java libraries

•   Platform independent
•   Java interaction
        (import '(cern.jet.random.sampling RandomSamplingAssistant))
        (defn sample
          [n k]
          (seq (. RandomSamplingAssistant
                  (sampleArray k (int-array (range n))))))

•   Dynamic typing and multi-methods

    •   An object is defined as the sum of what it can do (methods),
        rather than the sum of what it is (type hierarchy)

•   Add type hints to speed up code

        (defn da+ [^doubles as ^doubles bs]
          (amap as i ret
            (+ (aget as i) (aget bs i))))
Transactional references and STM

•   Transactional references ensure safe coordinated synchronous
    changes to mutable storage locations

•   Are bound to a single storage location for their lifetime

•   Only allow mutation of that location to occur within transactions

•   Available operations are ref-set, alter, and commute

•   No explicit locking is required

                 (def counter (ref 0))
                 (dosync (alter counter inc))
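A minimal sketch of a coordinated change to two refs. The balances `account-a` and `account-b` and the helper `transfer!` are hypothetical and not part of the original slides. Both `alter` calls run inside one `dosync`, so either both balances change or neither does.

```clojure
;; Two hypothetical storage locations that must change together.
(def account-a (ref 100))
(def account-b (ref 0))

(defn transfer!
  "Move amount from one ref to the other inside a single transaction.
   Throwing inside dosync aborts the transaction without any change."
  [from to amount]
  (dosync
    (when (< @from amount)
      (throw (IllegalStateException. "insufficient funds")))
    (alter from - amount)
    (alter to + amount)))

;; Ten concurrent transfers of 10 each; conflicting transactions are
;; retried automatically by the STM, no explicit locks are used.
(let [transfers (doall (for [_ (range 10)]
                         (future (transfer! account-a account-b 10))))]
  (doseq [t transfers] @t))

(println @account-a @account-b)
```

The sum of both balances is the same before and after, regardless of how the ten transactions interleave.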

•   Agents allow independent asynchronous change of mutable storage locations

•   Are bound to a single storage location for their lifetime

•   Only allow mutation of that location to a new state to occur as a
    result of an action

•   Actions are functions that are asynchronously applied to the state
    of an Agent

•   The return value of an action becomes new state of the Agent

•   Agents are integrated with the STM
                    (def counter (agent 0))
                    (send counter inc)
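The following sketch extends the counter example to show that actions are queued asynchronously and applied one after another. The number of actions and the extra `(send counter + 10)` call are illustrative choices.

```clojure
(def counter (agent 0))

;; Queue 1000 asynchronous increments; each send returns immediately.
(dotimes [_ 1000]
  (send counter inc))

;; Arguments after the action function are passed on to it,
;; so this action computes (+ state 10).
(send counter + 10)

;; Block until all actions sent so far from this thread have been applied.
(await counter)

;; State after 1000 increments and one addition of 10.
(println @counter)
```

Reading the agent with `@counter` never blocks; `await` is only needed when the caller has to see the effect of its own pending actions.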
Cluster analysis

•   Given a data set X compute a partition of X into k disjoint clusters C,
    such that:
                               (1) $\bigcup_{i=1}^{k} C_i = X$
                               (2) $C_i \neq \emptyset$ and $C_i \cap C_j = \emptyset$ for $i \neq j$

•   How many clusters are in the data set?

[Figure: example partitions with 3 clusters and with 9 clusters]
Cluster algorithms
•   For all possible partitions evaluate the objective function f and search the optimum.

•   The cardinality of the set of all possible partitions is given by the Stirling numbers of the second kind:

    $$S_N^{(k)} = \frac{1}{k!} \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} i^N$$

[Figure: runtime (nanoseconds) plotted against the number of clusters and the number of data points]
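To give a feeling for how quickly the number of partitions grows, here is a small sketch that evaluates the Stirling number of the second kind, (1/k!) Σ (-1)^(k-i) C(k,i) i^N for i = 0..k, with arbitrary-precision arithmetic. The helper names `factorial`, `binomial`, `pow`, and `stirling2` are illustrative.

```clojure
(defn factorial [n]
  (reduce *' 1 (range 1 (inc n))))

(defn binomial [n k]
  (/ (factorial n) (*' (factorial k) (factorial (- n k)))))

(defn pow
  "Integer power base^e for e >= 0, promoting to BigInt when needed."
  [base e]
  (reduce *' 1 (repeat e base)))

(defn stirling2
  "Number of ways to partition n items into k non-empty clusters."
  [n k]
  (/ (reduce +' (for [i (range (inc k))]
                  (*' (if (even? (- k i)) 1 -1)
                      (binomial k i)
                      (pow i n))))
     (factorial k)))

;; Number of partitions of 10 and of 100 data points into 3 clusters.
(println (stirling2 10 3))
(println (stirling2 100 3))
```

Even for 100 data points and 3 clusters the count has dozens of digits, which is why an exhaustive search over all partitions is not feasible.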

Cluster algorithms provide a heuristic for this search:

•   Partitional clustering (K-means, Neural gas, SOM, Fuzzy C-means, ...)

•   Hierarchical clustering (Divisive/agglomerative, Complete linkage, ...)

•   Graph-based clustering (Spectral clustering, NMF, Affinity propagation, ...)

•   Model-based clustering, Biclustering, Semi-supervised clustering
K-means algorithm
Function KMeans

 Input:  X = {x_1, ..., x_n} (Data to be clustered)
         k (Number of clusters)

Output:  C = {c_1, ..., c_k} (Cluster centroids)
         m: X -> C (Cluster assignments)

Initialize C (e.g. random selection from X)
While C has changed
  For each x_i in X
    m(x_i) = argmin_j distance(x_i, c_j)
  For each c_j in C
    c_j = centroid({x_i | m(x_i) = j})
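The pseudocode can be sketched as a sequential Clojure program. This is not the authors' McKmeans implementation; it assumes squared Euclidean distance, leaves a centroid unchanged when no point is assigned to it, and adds a hypothetical `max-iter` limit. All function names are illustrative.

```clojure
(defn distance
  "Squared Euclidean distance between two numeric vectors (an assumption;
   the pseudocode does not fix the distance function)."
  [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn nearest-centroid
  "Index j of the centroid closest to point x."
  [centroids x]
  (apply min-key #(distance x (nth centroids %)) (range (count centroids))))

(defn centroid
  "Component-wise mean of a non-empty collection of points."
  [points]
  (let [n (double (count points))]
    (mapv #(/ % n) (apply map + points))))

(defn kmeans-step
  "Assign every point to its nearest centroid, then recompute each centroid.
   A centroid without members is kept as it is."
  [centroids points]
  (let [groups (group-by #(nearest-centroid centroids %) points)]
    (vec (map-indexed (fn [j c]
                        (if-let [members (groups j)]
                          (centroid members)
                          c))
                      centroids))))

(defn kmeans
  "Repeat kmeans-step until the centroids no longer change
   or max-iter iterations have been run."
  [points initial-centroids max-iter]
  (loop [centroids initial-centroids, i 0]
    (let [updated (kmeans-step centroids points)]
      (if (or (= updated centroids) (>= i max-iter))
        updated
        (recur updated (inc i))))))

;; Toy data with two obvious groups.
(def points [[0.0 0.0] [0.0 1.0] [10.0 10.0] [10.0 11.0]])
(println (kmeans points [[0.0 0.0] [10.0 10.0]] 100))
```

For a random initialization as in the pseudocode, the initial centroids could be chosen with `(vec (take k (shuffle points)))`.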
Cluster Validation
•   Evaluation requires repeated runs of clustering, e.g.:
       •   Resampled data sets

       •   Different parameters

•   MCA-index: mean proportion of samples being consistent over
    different clusterings
                  $$\mathrm{MCA} = \frac{1}{n} \max_{\pi} \sum_{i=1}^{k} |A_i \cap B_{\pi(i)}|$$
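A brute-force sketch of the MCA index for two clusterings of the same samples. It assumes each clustering is given as a vector of k sets of sample ids and tries all k! matchings of clusters, which is only practical for small k. The names `permutations` and `mca-index` are illustrative.

```clojure
(require '[clojure.set :as set])

(defn permutations
  "All orderings of a collection of distinct, non-nil items."
  [xs]
  (if (empty? xs)
    [[]]
    (for [x xs
          p (permutations (remove #{x} xs))]
      (cons x p))))

(defn mca-index
  "Fraction of samples that agree between clusterings a and b under the
   best one-to-one matching of clusters. a and b are vectors of k sets."
  [a b]
  (let [n     (reduce + (map count a))
        k     (count a)
        score (fn [perm]
                ;; Overlap when cluster i of a is matched with cluster perm(i) of b.
                (reduce + (map (fn [ai j] (count (set/intersection ai (nth b j))))
                               a perm)))]
    (/ (apply max (map score (permutations (range k))))
       (double n))))

;; Two clusterings of the samples 1..6 that disagree on sample 6 only.
(def clustering-a [#{1 2 3} #{4 5 6}])
(def clustering-b [#{4 5} #{1 2 3 6}])
(println (mca-index clustering-a clustering-b))
```

The index does not depend on how the clusters are numbered, because the maximum is taken over all matchings.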
Estimation of the expected value of a validation index

Random label: randomly assign each item to a cluster k

Random partition: choose a random partition

Random prototype: assign each item to its nearest prototype

Mean value from 100 runs

[Figure: mean MCA index versus number of clusters (0 to 50)]
Multi-core K-means with Clojure
•   Split the data set into smaller pieces that are handled by agents

•   Each cluster is represented by an agent

•   Add a commutative list of cluster members within a transactional
    reference to accelerate the centroid update step

[Figure: architecture with data agents 0 to n, cluster agents 0 to k, and member refs 0 to k]

[Figure: simultaneous read. Data agents 0 to n read from cluster agents 0 to k via (nearest-cluster)]

[Figure: simultaneous write. Agents 0 to n write to the member refs via (commute)]

;; nearest-cluster, MemberRefs and DataAgents are defined elsewhere.

(defn update-datapoint [datapoint]
  (let [newass (nearest-cluster datapoint)]
    ;; commute lets concurrent transactions append without retrying
    (dosync (commute (nth MemberRefs newass)
                     conj (:data datapoint)))
    (assoc datapoint :assignment newass)))

(defn update-dataagent [datapoints]
  ;; doall forces the lazy map so every data point is processed in the action
  (doall (map update-datapoint datapoints)))

(defn assignment []
  ;; doall forces the lazy map so every agent actually receives the action
  (doall (map #(send % update-dataagent) DataAgents)))
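A self-contained miniature of the assignment step, to show how data agents and member refs interact. Everything in it is a simplifying assumption made for illustration: one-dimensional points, two fixed centroids held in a plain vector instead of cluster agents, and the names `centroids`, `member-refs`, and `data-agents`.

```clojure
;; Hypothetical toy setup: two fixed centroids and one member list per cluster.
(def centroids [0.0 10.0])
(def member-refs (mapv (fn [_] (ref [])) centroids))

;; Two data agents, each holding a piece of the data set.
(def data-agents
  [(agent [{:data 1.0} {:data 9.0}])
   (agent [{:data 2.0} {:data 11.0}])])

(defn nearest-cluster
  "Index of the centroid with the smallest squared distance to the point."
  [point]
  (apply min-key
         #(let [d (- (:data point) (nth centroids %))] (* d d))
         (range (count centroids))))

(defn update-datapoint [point]
  (let [j (nearest-cluster point)]
    ;; conj on a member list is order-independent for our purposes,
    ;; so commute is used instead of alter.
    (dosync (commute (nth member-refs j) conj (:data point)))
    (assoc point :assignment j)))

(defn update-dataagent [points]
  (mapv update-datapoint points))

;; Each agent processes its own piece asynchronously.
(doseq [a data-agents]
  (send a update-dataagent))
(apply await data-agents)

;; Members collected per cluster (sorted, since arrival order may vary).
(println (map (comp sort deref) member-refs))
```

After the agents finish, each member ref holds the points of one cluster, which is what the centroid update step needs.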
Benchmark results
Large data sets (artificial):

•   Each data point is sampled from N(0,1)

•   Summary for 10 runs of K-means

[Figure: runtime in seconds of ParaKMeans, K-means R, and McKmeans for 10,000 cases, 100 dimensions, 20 clusters; runtime in minutes of K-means R and McKmeans for 1,000,000 cases, 200 dimensions, 20 clusters]
•   Number of computer cores used

•   Number of data agents used

[Figure: runtime (seconds) versus number of computer cores (1, 4, 8) for 100,000 x 500 data, 20 clusters]

[Figure: runtime (seconds) versus number of data agents (4, 6, 8, 10) for 100,000 x 500 data, 20 clusters]
Large data sets with cluster structure

•   Data sampled from a multi-variate normal distribution

•   100000 samples, 200/500 dimensions, 10/20 clusters

[Figure: runtime (seconds) of K-means R and McKmeans for the settings 200 / 10, 200 / 20, 500 / 10, and 500 / 20 (dimensions / clusters)]
Accuracy compared to the known grouping of data

•   Measured with the MCA index

•   Red bars indicate the random-prototype baseline

[Figure: MCA index of McKmeans and K-means R for 100,000 x 200 data with 10 and 20 clusters and for 100,000 x 500 data with 10 and 20 clusters]
Real world data set

•   Microarray data (Radiation-induced changes in human gene expression)

•   22277 samples (genes) and 465 features (profiles)

[Figure: runtime (seconds) of K-means R and McKmeans for 2, 5, 10, and 20 clusters]
    Smirnov D, Morley M, Shin E, Spielman R, Cheung V: Genetic analysis of radiation-induced changes in human gene expression. Nature 2009, 459:587–591
Application to Cluster Number Estimation
•   Repeated clustering with different subsets of data

•   Repeated for different number of clusters k

•   Most stable clustering is produced for the ‘real’ cluster number

•   Jackknife resampling

•   Evaluation with MCA index

•   Data set: 100000 samples, 100 features, 3 clusters

•   10 runs per cluster number

•   49.26 minutes on dual quad-core 3.2 GHz

[Figure: MCA index versus number of clusters (2 to 7)]
Java GUI
(import '(javax.swing JFrame JLabel JTextField JButton)
        '(java.awt.event ActionListener)
        '(java.awt GridLayout))

(let [frame (new JFrame "Hello, World!")
      hello-button (new JButton "Say hello")
      hello-label (new JLabel "")]
  (. hello-button
     (addActionListener
       (proxy [ActionListener] []
         (actionPerformed [evt]
           (. hello-label
              (setText "Hello, World!"))))))
  (doto frame
    (.setLayout (new GridLayout 1 1 3 3))
    (.add hello-button)
    (.add hello-label)
    (.setSize 300 80)
    (.setVisible true)))

•   Writing parallel programs usually requires a careful software design
    and a deep knowledge about thread-safe programming

•   Concurrency control via transactional memory circumvents
    problems of lock-based concurrency strategies

•   Immutable data structures play a key role in software transactional memory

•   Clojure combines Lisp, Java and a powerful STM system

•   This enables fast parallelization of algorithms, even for rapid prototyping

•   Our simulations show a good performance of the parallelized code
Thank you for your attention.
Statistical computing library

• Clojure-based statistical computing
• R-like semantics
• COLT library for numerical computation
• JFreeChart library for graphics

Functional Programming with Immutable Data Structures
Nomenclatura e peças de container
Nomenclatura  e peças de containerNomenclatura  e peças de container
Nomenclatura e peças de container
Geometria Projetiva
Geometria ProjetivaGeometria Projetiva
Geometria Projetiva
Polyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better AgilityPolyglot and Poly-paradigm Programming for Better Agility
Polyglot and Poly-paradigm Programming for Better Agility
Javascript Libraries
Javascript LibrariesJavascript Libraries
Javascript Libraries
How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!How to Make an Eight Bit Computer and Save the World!
How to Make an Eight Bit Computer and Save the World!
Ragel talk
Ragel talkRagel talk
Ragel talk
A Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the WebA Practical Guide to Connecting Hardware to the Web
A Practical Guide to Connecting Hardware to the Web
Introdução ao Arduino
Introdução ao ArduinoIntrodução ao Arduino
Introdução ao Arduino
Minicurso arduino
Minicurso arduinoMinicurso arduino
Minicurso arduino
Incanter Data Sorcery
Incanter Data SorceryIncanter Data Sorcery
Incanter Data Sorcery
Rango - Fab Academy: Machine Design - Fab Academy: Machine - Fab Academy: Machine Design - Fab Academy: Machine Design
The Digital Revolution: Machines that makes
The Digital Revolution: Machines that makesThe Digital Revolution: Machines that makes
The Digital Revolution: Machines that makes
Hadoop + Clojure
Hadoop + ClojureHadoop + Clojure
Hadoop + Clojure
Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.Hadoop - Simple. Scalable.
Hadoop - Simple. Scalable.
Hadoop and Hive Development at Facebook
Hadoop and Hive Development at FacebookHadoop and Hive Development at Facebook
Hadoop and Hive Development at Facebook
From Lisp to Clojure/Incanter and RAn Introduction
From Lisp to Clojure/Incanter and RAn IntroductionFrom Lisp to Clojure/Incanter and RAn Introduction
From Lisp to Clojure/Incanter and RAn Introduction


Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey

Último (20)

Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024

Multi-core Parallelization in Clojure - a Case Study

  • 1. Multi-core Parallelization in Clojure - a Case Study Johann M. Kraus and Hans A. Kestler AG Bioinformatics and Systems Biology Institute of Neural Information Processing University of Ulm 29.06.2009
  • 2. Outline 1. Concepts of parallel programming 2. Short introduction to Clojure 3. Multi-core parallel K-means - the case study 4. Analysis and Results 5. Summary
  • 3. Parallel Programming Definition: Parallel programming is a form of programming where many calculations are performed simultaneously. • Physical constraints prevent frequency scaling of processors • This led to an increasing interest in parallel hardware and parallel programming • Multi-core hardware is standard on desktop computers • Parallel software can use this hardware to the full capacity
  • 4. Large problems are divided into smaller ones and the sub-problems are solved simultaneously • Speedup S is limited by the fraction of parallelizable code P • Amdahl's law: $S = \frac{1}{(1 - P) + \frac{P}{N}}$, where N is the number of processors [Figure: Amdahl's law. Speedup (0 to 20) versus number of processors (1 to 65536) for parallelizable fractions 0.95, 0.90, 0.75, and 0.50]
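As a rough illustration of Amdahl's law from slide 4, the following Java sketch evaluates the speedup for a given parallelizable fraction and processor count. The class and method names (`Amdahl`, `speedup`, `limit`) are illustrative assumptions and do not come from the slides.

```java
// Hypothetical helper, not part of the original slides.
final class Amdahl {
    /** Speedup S = 1 / ((1 - P) + P / N) for parallel fraction p and n processors. */
    static double speedup(double p, int n) {
        if (p < 0.0 || p > 1.0 || n < 1) {
            throw new IllegalArgumentException("need 0 <= p <= 1 and n >= 1");
        }
        return 1.0 / ((1.0 - p) + p / n);
    }

    /** Upper bound on the speedup as n grows without limit: 1 / (1 - P). */
    static double limit(double p) {
        return 1.0 / (1.0 - p);
    }

    public static void main(String[] args) {
        // Same fraction as the top curve of the figure (P = 0.95).
        for (int n : new int[] {1, 8, 64, 65536}) {
            System.out.printf("P=0.95, N=%d -> S=%.2f%n", n, speedup(0.95, n));
        }
        System.out.printf("limit for P=0.95: %.2f%n", limit(0.95));
    }
}
```

For P = 0.95 the bound 1 / (1 - P) is 20, which matches the upper end of the speedup axis in the figure.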
  • 5. Concepts of Parallel Programming Explicit vs. implicit parallelization • Explicitly define communication and synchronization details for each task: • MPI • Java Threads • Functional programming allows implicit parallelization: • Parallel processing of functions • Functions are free of side-effects • Data is immutable
  • 6. Distributed vs. local hardware • Master-slave parallelization (e.g. Message Passing Interface) • Shared memory parallelization (e.g. Open Multi-Processing) [Diagrams: a master sends data to several slaves and receives their results; several CPUs read from and write to a shared memory]
  • 7. Thread programming • Threads are refinements of a process that share the same memory and can be processed separately and simultaneously • Available in many languages, e.g. PThreads (C), Java Threads (Java), OpenMP Threads (C, Fortran) • Execution of threads is handled by a scheduler that manages the available processing time • Communication between threads is faster than communication between processes • Invoking threads is also faster than fork/join processes [Diagram: thread life cycle with the states new, runnable, running, waiting, and terminated and the transitions start, schedule, block, awake, and end]
  • 8. Concurrency control via locking and synchronizing • Concurrency control ensures that threads can access shared memory without violating data integrity • The most popular approach to concurrency is locking and synchronizing:
        public class Counter {
            private int value = 0;
            // synchronized: only one thread at a time may run incr() on the same Counter
            public synchronized void incr() { value = value + 1; }
        }
        Counter counter = new Counter();
        counter.incr();
    • Problems might occur when using too many locks, too few locks, wrong locks, or locks in the wrong order • Using locks is error-prone and the errors can be fatal, e.g. deadlocks
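The `Counter` from slide 8 can be exercised from several threads to see what the lock buys. This is a minimal sketch: the `get` method, the `CounterDemo` class, and the thread counts are assumptions added for illustration.

```java
// Counter as on the slide, plus a hypothetical synchronized getter for reading the result.
final class Counter {
    private int value = 0;

    public synchronized void incr() { value = value + 1; }

    public synchronized int get() { return value; }
}

// Hypothetical driver, not part of the original slides.
final class CounterDemo {
    /** Starts the given number of threads, each calling incr() the given number of times. */
    static int countWithThreads(int threads, int incrementsPerThread) {
        Counter counter = new Counter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.incr();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // wait until every worker has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 100_000));
    }
}
```

Because `incr` is synchronized, no increment is lost and the total equals threads times increments per thread. Without the keyword, the read-modify-write in `value = value + 1` could interleave between threads and drop updates.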
  • 9. Concurrency control via transactional memory • Transactional memory offers a flexible alternative to lock-based concurrency control • Functionality is analogous to controlling simultaneous access to database management systems • Transactions ensure properties: • Atomicity: Either all changes of a transaction occur or none do • Consistency: Only valid changes are committed • Isolation: No transaction sees the effect of other transactions • Durability: Changes from transactions will be persistent
  • 10. Software transactional memory maps transactional memory to concurrency control in parallel programming [Sequence diagram along a time axis with the participants Transaction 0, Data, and Transaction 1; the messages are "get data" and "send modified data", the latter guarded by "[consistent data]"]
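The read, modify, and commit-if-consistent cycle from slide 10 can be approximated in Java with a compare-and-set retry loop. This is not Clojure's STM, which coordinates changes across several references. It is a swapped-in, single-location sketch of the same optimistic idea, and the names `OptimisticCell` and `OptimisticDemo` are assumptions.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Hypothetical single-location analogue of an optimistic transaction.
final class OptimisticCell<T> {
    private final AtomicReference<T> state;

    OptimisticCell(T initial) { state = new AtomicReference<>(initial); }

    T get() { return state.get(); }

    /** Get data, compute the modified value, commit only if the data is unchanged; otherwise retry. */
    T update(UnaryOperator<T> change) {
        while (true) {
            T snapshot = state.get();                  // "get data"
            T modified = change.apply(snapshot);       // work on the snapshot, no lock held
            if (state.compareAndSet(snapshot, modified)) {
                return modified;                       // "send modified data" succeeded
            }
            // Another thread committed first: loop and retry on fresh data.
        }
    }
}

final class OptimisticDemo {
    static int incrementConcurrently(int threads, int perThread) {
        OptimisticCell<Integer> cell = new OptimisticCell<>(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    cell.update(v -> v + 1);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return cell.get();
    }
}
```

The function passed to `update` may run more than once, so it should be free of side effects. The same requirement applies to code inside a transaction. `AtomicReference.updateAndGet` in the JDK provides the same retry loop ready-made.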
  • 11. Clojure • Functional programming language hosted on the JVM • Extends the code-as-data paradigm to maps and vectors • Based on immutable data structures • Provides built-in concurrency support via software transactional memory • Completely symbiotic to Java, e.g. easy access to Java libraries • Platform independent
  • 12. Java interaction
        (import '(cern.jet.random.sampling RandomSamplingAssistant))
        (defn sample [n k]
          (seq (. RandomSamplingAssistant (sampleArray k (int-array (range n))))))
    • Dynamic typing and multi-methods • An object is defined as the sum of what it can do (methods), rather than the sum of what it is (type hierarchy) • Add type hints to speed up code
        (defn da+ [#^doubles as #^doubles bs]
          (amap as i ret (+ (aget as i) (aget bs i))))
  • 13. Transactional references and STM • Transactional references ensure safe coordinated synchronous changes to mutable storage locations • Are bound to a single storage location for their lifetime • Only allow mutation of that location to occur within transactions • Available operations are ref-set, alter, and commute • No explicit locking is required
        (def counter (ref 0))
        (dosync (alter counter inc))
  • 14. Agents • Agents allow independent asynchronous change of mutable locations • Are bound to a single storage location for their lifetime • Only allow mutation of that location to a new state to occur as a result of an action • Actions are functions that are asynchronously applied to the state of an Agent • The return value of an action becomes the new state of the Agent • Agents are integrated with the STM
        (def counter (agent 0))
        (send counter inc)
  • 15. Cluster analysis • Given a data set X compute a partition of X into k disjoint clusters C, such that: (1) $\bigcup_{i=1}^{k} C_i = X$ (2) $C_i \neq \emptyset$ and $C_i \cap C_j = \emptyset$ for $i \neq j$ • How many clusters are in the data set? [Figure: two panels titled "3 cluster" and "9 cluster"]
  • 16. Cluster algorithms • For all possible partitions evaluate the objective function f and search the optimum. • The cardinality of the set of all possible partitions is given by the Stirling numbers of the second kind: $S_N^{(k)} = \frac{1}{k!} \sum_{i=0}^{k} (-1)^{k-i} \binom{k}{i} i^N$ [Figure: plot with the axes "Number of clusters", "Number of data points", and "Runtime (nanosecond)"] Cluster algorithms provide a heuristic for this search: • Partitional clustering (K-means, Neuralgas, SOM, Fuzzy C-means, ...) • Hierarchical clustering (Divisive/agglomerative, Complete linkage, ...) • Graph-based clustering (Spectral clustering, NMF, Affinity propagation, ...) • Model-based clustering, Biclustering, Semi-supervised clustering
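To get a feeling for how quickly the number of partitions on slide 16 grows, the Stirling formula can be evaluated with arbitrary-precision integers. This is a small sketch and the class and method names are assumptions.

```java
import java.math.BigInteger;

// Hypothetical helper, not part of the original slides.
final class Stirling {
    /** Binomial coefficient C(k, i), built up so that every intermediate division is exact. */
    static BigInteger binomial(int k, int i) {
        BigInteger r = BigInteger.ONE;
        for (int j = 1; j <= i; j++) {
            r = r.multiply(BigInteger.valueOf(k - i + j)).divide(BigInteger.valueOf(j));
        }
        return r;
    }

    /** Number of ways to partition n items into k non-empty clusters. */
    static BigInteger secondKind(int n, int k) {
        if (n < 0 || k < 0) {
            throw new IllegalArgumentException("n and k must be non-negative");
        }
        BigInteger sum = BigInteger.ZERO;
        for (int i = 0; i <= k; i++) {
            BigInteger term = binomial(k, i).multiply(BigInteger.valueOf(i).pow(n));
            // The sign of each term is (-1)^(k - i).
            sum = ((k - i) % 2 == 0) ? sum.add(term) : sum.subtract(term);
        }
        BigInteger factorial = BigInteger.ONE;
        for (int j = 2; j <= k; j++) {
            factorial = factorial.multiply(BigInteger.valueOf(j));
        }
        return sum.divide(factorial); // the sum is always divisible by k!
    }

    public static void main(String[] args) {
        System.out.println(secondKind(10, 3));
        System.out.println(secondKind(100, 5));
    }
}
```

Even for a few dozen data points the count is far too large to enumerate, which is why heuristics such as K-means are used.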
  • 17. K-means algorithm
        Function KMeans
          Input:  X = {x_1, ..., x_n}  (Data to be clustered)
                  k                    (Number of clusters)
          Output: C = {c_1, ..., c_k}  (Cluster centroids)
                  m: X -> C            (Cluster assignments)
          Initialize C (e.g. random selection from X)
          While C has changed
            For each x_i in X
              m(x_i) = argmin_j distance(x_i, c_j)
            End
            For each c_j in C
              c_j = centroid({x_i | m(x_i) = j})
            End
          End
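The pseudocode on slide 17 translates almost line by line into a sequential Java sketch. This is not the authors' McKmeans implementation. The class name, the squared Euclidean distance, the `maxIterations` guard, and the rule that an empty cluster keeps its previous centroid are all assumptions made for the example.

```java
// Hypothetical sequential sketch of the K-means pseudocode.
final class KMeansSketch {
    static double squaredDistance(double[] a, double[] b) {
        double s = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            s += diff * diff;
        }
        return s;
    }

    /**
     * Clusters the rows of x. The caller supplies the initial centroids (e.g. rows
     * picked at random from x); they are updated in place. Returns the assignment
     * m, where m[i] is the index of the centroid nearest to x[i].
     */
    static int[] cluster(double[][] x, double[][] centroids, int maxIterations) {
        int n = x.length, k = centroids.length, dim = x[0].length;
        int[] m = new int[n];
        java.util.Arrays.fill(m, -1);
        boolean changed = true;
        // If no assignment changes, no centroid changes, so this is "While C has changed".
        for (int it = 0; it < maxIterations && changed; it++) {
            changed = false;
            // Assignment step: m(x_i) = argmin_j distance(x_i, c_j)
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int j = 1; j < k; j++) {
                    if (squaredDistance(x[i], centroids[j]) < squaredDistance(x[i], centroids[best])) {
                        best = j;
                    }
                }
                if (m[i] != best) {
                    m[i] = best;
                    changed = true;
                }
            }
            // Update step: c_j = centroid({x_i | m(x_i) = j})
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[m[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[m[i]][d] += x[i][d];
                }
            }
            for (int j = 0; j < k; j++) {
                if (counts[j] == 0) continue; // assumption: an empty cluster keeps its centroid
                for (int d = 0; d < dim; d++) {
                    centroids[j][d] = sums[j][d] / counts[j];
                }
            }
        }
        return m;
    }
}
```

The assignment step treats every data point independently, which makes it a natural candidate for splitting across workers.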
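  • 18. Cluster Validation • Evaluation requires repeated runs of clustering, e.g.: • Resampled data sets • Different parameters • MCA-index: mean proportion of samples being consistent over different clusterings: $\mathrm{MCA} = \frac{1}{n} \max_{\pi} \sum_{i=1}^{k} |A_i \cap B_{\pi(i)}|$, where A and B are the two clusterings, n is the number of samples, and π ranges over the permutations of the cluster labels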
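A direct way to evaluate the MCA index from slide 18 is to count the overlaps between the clusters of the two clusterings and try every matching of labels. The following sketch assumes that both clusterings are given as label arrays with values in 0..k-1. The class name and the brute-force search are illustrative choices, not the authors' code.

```java
// Hypothetical brute-force evaluation of the MCA index for small k.
final class McaIndex {
    /** a[s] and b[s] are the cluster labels (0..k-1) of sample s in the two clusterings. */
    static double mca(int[] a, int[] b, int k) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("label arrays must be non-empty and of equal length");
        }
        // overlap[i][j] = |A_i ∩ B_j|
        int[][] overlap = new int[k][k];
        for (int s = 0; s < a.length; s++) {
            overlap[a[s]][b[s]]++;
        }
        int best = bestMatch(overlap, 0, new boolean[k]);
        return (double) best / a.length;
    }

    /** Maximum total overlap when clusters i, i+1, ... of A are matched to unused clusters of B. */
    private static int bestMatch(int[][] overlap, int i, boolean[] used) {
        if (i == overlap.length) {
            return 0;
        }
        int best = 0;
        for (int j = 0; j < overlap.length; j++) {
            if (used[j]) continue;
            used[j] = true;
            best = Math.max(best, overlap[i][j] + bestMatch(overlap, i + 1, used));
            used[j] = false;
        }
        return best;
    }
}
```

The search visits all k! permutations, so it is only practical for small k. For larger k the same maximum can be found with an assignment-problem solver such as the Hungarian algorithm.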
  • 19. Estimation of the expected value of a validation index • Random label: randomly assign each item to a cluster k • Random partition: choose a random partition • Random prototype: assign each item to its nearest prototype [Figure: mean MCA index (0.0 to 1.0) versus number of clusters (0 to 50); mean value from 100 runs]
  • 20. Multi-core K-means with Clojure • Split the data set into smaller pieces that are handled by agents • Each cluster is represented by an agent • Add a commutative list of cluster members within a transactional reference to accelerate the centroid update step [Diagram: Data Agents 0 to n, Cluster Agents 0 to k, and Member Refs 0 to k, connected by read and write arrows]
  • 21. [Diagrams: simultaneous read, in which Data Agents 0 to n read from Cluster Agents 0 to k; simultaneous write, in which Data Agents 0 to n write to the Member Refs]
  • 22. read: (nearest-cluster) write: (commute) (assoc)
        ;; Send the update action to every data agent.
        ;; map is lazy: outside the REPL the result must be realized (e.g. with doall) for the sends to happen.
        (defn assignment []
          (map #(send % update-dataagent) DataAgents))
        ;; Agent action: reassign all data points held by one agent.
        (defn update-dataagent [datapoints]
          (map update-datapoint datapoints))
        ;; Find the nearest cluster, add the point to that cluster's member list, store the new assignment.
        (defn update-datapoint [datapoint]
          (let [newass (nearest-cluster datapoint)]
            (dosync (commute (nth MemberRefs newass) conj (:data datapoint)))
            (assoc datapoint :assignment newass)))
  • 23. Benchmark results Large data sets (artificial): • Each data point is sampled from N(0,1) • Summary for 10 runs of K-means [Figure: two runtime plots. Left: 10,000 cases, 100 dimensions, 20 clusters; runtime (seconds) for ParaKMeans, K-means R, and McKmeans. Right: 1,000,000 cases, 200 dimensions, 20 clusters; runtime (minutes) for K-means R and McKmeans]
  • 24. Number of computer cores used • Number of data agents used [Figure: runtime (seconds) for a 100,000 x 500 data set with 20 clusters; left panel versus number of computer cores (1, 4, 8), right panel versus number of data agents (4, 6, 8, 10)]
  • 25. Large data sets with cluster structure • Data sampled from a multi-variate normal distribution • 100,000 samples, 200/500 dimensions, 10/20 clusters [Figure: runtime (seconds) of K-means R and McKmeans for the settings 200 / 10, 200 / 20, 500 / 10, and 500 / 20; axis label "Number of samples / Number of clusters"]
  • 26. Accuracy compared to the known grouping of data • Measured with the MCA index • Red bars indicate the random-prototype baseline [Figure: MCA index (0.0 to 1.0) of McKmeans and K-means R in four panels: 100,000 x 200 with 10 and 20 clusters, 100,000 x 500 with 10 and 20 clusters]
  • 27. Real world data set • Microarray data (Radiation-induced changes in human gene expression) • 22277 samples (genes) and 465 features (profiles) [Figure: runtime (seconds) of K-means R and McKmeans for 2, 5, 10, and 20 clusters] Smirnov D, Morley M, Shin E, Spielman R, Cheung V: Genetic analysis of radiation-induced changes in human gene expression. Nature 2009, 459:587–591
  • 28. Application to Cluster Number Estimation • Repeated clustering with different subsets of data • Repeated for different number of clusters k • Most stable clustering is produced for the 'real' cluster number • Jackknife resampling • Evaluation with MCA index • Data set: 100,000 samples, 100 features, 3 clusters • 10 runs per cluster number • 49.26 minutes on dual-quad core 3.2 GHz [Figure: MCA index (0.0 to 1.0) versus number of clusters (2 to 7)]
  • 29. Java GUI
        (import '(javax.swing JFrame JLabel JTextField JButton)
                '(java.awt.event ActionListener)
                '(java.awt GridLayout))
        (let [frame (new JFrame "Hello, World!")
              hello-button (new JButton "Say hello")
              hello-label (new JLabel "")]
          (. hello-button
             (addActionListener
               (proxy [ActionListener] []
                 (actionPerformed [evt]
                   (. hello-label (setText "Hello, World!"))))))
          (doto frame
            (.setLayout (new GridLayout 1 1 3 3))
            (.add hello-button)
            (.add hello-label)
            (.setSize 300 80)
            (.setVisible true)))
  • 30.
  • 31. Summary • Writing parallel programs usually requires careful software design and deep knowledge of thread-safe programming • Concurrency control via transactional memory circumvents problems of lock-based concurrency strategies • Immutable data structures play a key role in software transactional memory • Clojure combines Lisp, Java and a powerful STM system • This enables fast parallelization of algorithms, even for rapid prototyping • Our simulations show a good performance of the parallelized code
  • 32. Thank you for your attention.
  • 33. Statistical computing library • Clojure-based statistical computing • R-like semantics • COLT library for numerical computation • JFreeChart library for graphics