Statistical analysis of network data and evolution on GPUs:
           High-performance statistical computing




                                                     Thomas Thorne & Michael P.H. Stumpf

                                                          Theoretical Systems Biology Group

                                                                       25/01/2012




      Network Analysis on GPUs   Thomas Thorne & Michael P.H. Stumpf                         1 of 22
Trends in High-Performance Computing




                         CUDA, OpenCL, MPI, Open MPI, PVM




(Hidden) Costs of Computing
Prototyping/Development Time is typically reduced for scripting
             languages such as R or Python.
  Run Time on single threads: C/C++ (or Fortran) have better
             performance characteristics, but for specialized tasks
             other languages, e.g. Haskell, can show good
             characteristics.
Energy Requirements: every Watt we use for computing we also have
             to extract again with air conditioning.

The role of GPUs
• GPUs can be accessed from many different programming
  languages (e.g. PyCUDA).
• GPUs have a comparatively small footprint and relatively modest
  energy requirements compared to clusters of CPUs.
• GPUs were designed for consumer electronics: computer gamers
  have different needs from the HPC community.
Evolving Networks

[Figure: cartoon of a growing network, with evolutionary events labelled α, δ and γ.]
Evolving Networks

Model-Based Evolutionary Analysis
• For sequence data we use models of nucleotide substitution in
  order to infer phylogenies in a likelihood or Bayesian framework.
• None of these models — even the general time-reversible model
  — is particularly realistic; but by allowing for complicating factors,
  e.g. rate variation, we capture much of the variability observed
  across a phylogenetic panel.
• Modes of network evolution will be even more complicated and
  exhibit high levels of contingency; moreover the structure and
  function of different parts of the network will be intricately linked.
• Nevertheless we believe that modelling the processes underlying
  the evolution of networks can provide useful insights; in particular
  we can study how functionality is distributed across groups of
  genes.


Network Evolution Models




             (a) Duplication attachment
             (b) Duplication attachment with complementarity
             (c) Linear preferential attachment
             (d) General scale-free (with node weights wi, wj)
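Models of this kind are cheap to simulate even when their likelihoods are intractable. As a minimal sketch (not the exact model of the talk — the parameters `delta` and `alpha` and their roles are illustrative), a duplication–attachment style simulator can be written as:

```python
import random

def duplication_attachment(n_final, delta=0.4, alpha=0.1, seed=0):
    """Grow a network by node duplication. Illustrative sketch only:
    delta = probability of retaining each copied edge,
    alpha = probability of linking the duplicate back to its parent."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}              # start from a single edge
    while len(adj) < n_final:
        parent = rng.choice(sorted(adj))
        new = len(adj)
        adj[new] = set()
        for v in adj[parent]:           # copy the parent's edges...
            if rng.random() < delta:    # ...each retained with prob. delta
                adj[new].add(v)
                adj[v].add(new)
        if rng.random() < alpha:        # attach duplicate to its parent
            adj[new].add(parent)
            adj[parent].add(new)
        if not adj[new]:                # drop duplicates that kept no edges
            del adj[new]
    return adj

net = duplication_attachment(50)
print(len(net), sum(len(v) for v in net.values()) // 2)  # nodes, edges
```

Variants such as linear preferential attachment only change how the target node is chosen.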
ABC on Networks

Summarizing Networks
• Data are noisy and incomplete.
• We can simulate models of network
  evolution, but this does not allow us to
  calculate likelihoods for all but very
  trivial models.
• There is also no sufficient statistic that
  would allow us to summarize networks,
  so ABC approaches require some
  thought.
• Many possible summary statistics of
  networks are expensive to calculate.
    Full likelihood: Wiuf et al., PNAS (2006).
             ABC: Ratmann et al., PLoS Comp. Biol. (2008).
                  Stumpf & Wiuf, J. Roy. Soc. Interface (2010).
Graph Spectrum
[Figure: an example graph on nodes a–e, with adjacency matrix]

             a b c d e
        a  ( 0 1 1 1 0 )
        b  ( 1 0 1 1 0 )
    A = c  ( 1 1 0 0 0 )
        d  ( 1 1 0 0 1 )
        e  ( 0 0 0 1 0 )

Graph Spectra
Given a graph G comprising a set of nodes N and edges (i, j) ∈ E
with i, j ∈ N, the adjacency matrix, A, of the graph is defined by

    a_{i,j} = 1 if (i, j) ∈ E,  0 otherwise.

The eigenvalues, λ, of this matrix provide one way of defining the
graph spectrum.
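The spectrum of the example graph above can be computed directly; a short NumPy illustration (not part of the original slides):

```python
import numpy as np

# Adjacency matrix of the 5-node example graph (nodes a..e)
A = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# For an undirected graph A is symmetric, so the symmetric
# eigensolver eigvalsh applies; sort descending for readability.
spectrum = np.sort(np.linalg.eigvalsh(A))[::-1]
print(np.round(spectrum, 3))
```

Since A has zero diagonal, the eigenvalues sum to trace(A) = 0, a quick sanity check on any computed spectrum.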
Spectral Distances
A simple distance measure between graphs having adjacency
matrices A and B, known as the edit distance, is to count the number
of edges that are not shared by both graphs,

    D(A, B) = Σ_{i,j} (a_{i,j} − b_{i,j})².

However for unlabelled graphs we require some mapping h from
i ∈ N_A to i ∈ N_B that minimizes the distance,

    D(A, B) = min_h D_h(A, B),  with  D_h(A, B) = Σ_{i,j} (a_{i,j} − b_{h(i),h(j)})².

Given the spectra (which are relatively cheap to compute) we have

    D(A, B) = Σ_l (λ_l^(α) − λ_l^(β))².
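The spectral distance is a few lines of NumPy. One caveat: for graphs of unequal size some padding convention is needed; the zero-padding below is an assumption of this sketch, not something the slides specify.

```python
import numpy as np

def spectral_distance(A, B):
    """Sum of squared differences between sorted adjacency spectra.
    Zero-padding the shorter spectrum (for graphs of unequal size)
    is one simple convention, assumed here for illustration."""
    la = np.sort(np.linalg.eigvalsh(A))
    lb = np.sort(np.linalg.eigvalsh(B))
    n = max(la.size, lb.size)
    la = np.pad(la, (n - la.size, 0))
    lb = np.pad(lb, (n - lb.size, 0))
    return float(np.sum((la - lb) ** 2))

# The spectrum is invariant under node relabelling, so no mapping h
# has to be searched for:
rng = np.random.default_rng(1)
A = np.triu((rng.random((6, 6)) < 0.4).astype(float), 1)
A = A + A.T                                  # random undirected graph
P = np.eye(6)[rng.permutation(6)]            # permutation matrix
print(spectral_distance(A, P @ A @ P.T))     # ~0 up to rounding
```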
ABC using Graph Spectra


For an observed network, N, and a simulated network, Sθ , we use the
distance between the spectra,

    D(N, Sθ) = Σ_l (λ_l^(N) − λ_l^(S))²,

in our ABC SMC procedure. Note that this distance is a close lower
bound on the distance between the raw data; we therefore do not
have to bother with summary statistics.
Also, calculating graph spectra costs as much as calculating other
O(N³) statistics (such as all shortest paths, the network diameter or
the within-reach distribution).
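The accept/reject step can be sketched as follows. This uses plain rejection ABC rather than the ABC SMC scheme of the talk, and an Erdős–Rényi simulator as a stand-in for the network evolution models, purely to keep the example self-contained; the tolerance `eps` is likewise illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_er(p, n=20):
    """Toy simulator: an Erdos-Renyi graph (stand-in for a network
    evolution model, only so the example is self-contained)."""
    A = np.triu((rng.random((n, n)) < p).astype(float), 1)
    return A + A.T

def spectral_dist(A, B):
    la = np.sort(np.linalg.eigvalsh(A))
    lb = np.sort(np.linalg.eigvalsh(B))
    return float(np.sum((la - lb) ** 2))

A_obs = simulate_er(0.3)   # pretend this is the observed network

# Rejection ABC: draw p from a U(0,1) prior and keep draws whose
# simulated spectrum lies within eps of the observed spectrum.
eps, posterior, tries = 15.0, [], 0
while len(posterior) < 100 and tries < 50_000:
    tries += 1
    p = rng.uniform()
    if spectral_dist(A_obs, simulate_er(p)) < eps:
        posterior.append(p)

print(f"accepted {len(posterior)} draws, posterior mean {np.mean(posterior):.2f}")
```

ABC SMC replaces the fixed tolerance with a decreasing schedule and reweights particles between rounds, but the distance computation is the same.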
Simulated Data

[Figure: ABC SMC posterior distributions of the model parameters δ, α, p and m, inferred from simulated data.]
Protein Interaction Network Data
   Species            Proteins   Interactions   Genome size   Sampling fraction
   S. cerevisiae          5035          22118          6532            0.77
   D. melanogaster        7506          22871         14076            0.53
   H. pylori               715           1423          1589            0.45
   E. coli                1888           7008          5416            0.35
[Figure: posterior model probabilities for the DA, DAC, LPA, SF, DACL and DACR models in S. cerevisiae, D. melanogaster, H. pylori and E. coli.]

Model Selection
• Inference here was based on all the data, not summary statistics.
• Duplication models receive the strongest support from the data.
• Several models receive support and no model is chosen unambiguously.
[Figure: posterior distributions of the parameters δ, α, p and m under the DA, DAC, DACL and DACR models, for S. cerevisiae, D. melanogaster, H. pylori and E. coli.]
GPUs in Computational Statistics
Performance
• With highly optimized libraries (e.g. BLAS/ATLAS) we can run
  numerically demanding jobs relatively straightforwardly.
• Whenever poor man's parallelization is possible, GPUs offer
  considerable advantages over multi-core CPU systems (at
  comparable cost and energy requirements).
• Performance depends crucially on our ability to map tasks onto the
  hardware.

Challenges
• GPU hardware was initially conceived for different purposes —
  computer gamers need fewer random numbers than MCMC or
  SMC procedures, and the range offered by single-precision
  numbers also suffices to kill zombies etc.
• Combining several GPUs requires additional work, using e.g. MPI.
Alternatives to GPUs: FPGAs
Field Programmable Gate Arrays are electronic circuits that are
partly (re)configurable: the hardware is adapted to the problem at
hand and encapsulates the programme in its programmable logic
arrays.

[Figure (from D.B. Thomas, W. Luk & M. Stumpf): raw swap-compare performance (MSwap-Compare/s) against graph size, comparing multiple instances in an xc4vlx60, a single instance in a Virtex-4, and quad-Opteron software.]
Alternatives to GPUs: CPUs (and MPI. . . )

CPUs with multiple cores are flexible, have large address spaces and
a wealth of flexible numerical routines. This makes implementation of
numerically demanding tasks relatively straightforward. In particular
there is less of an incentive to consider how a problem is best
implemented in software that takes advantage of hardware features.

[Figure: run time against problem size for CPU vs GPU; 6-core Xeon vs M2050 (448 cores), programmed in OpenCL.]
(Pseudo) Random Numbers
The Mersenne Twister is one of the standard random number
generators for simulation. MT19937 has a period of
2^19937 − 1 ≈ 4 × 10^6001.
But MT does not have cryptographic strength (once 624 iterates have
been observed, all future states are predictable), unlike
Blum-Blum-Shub,

        x_{n+1} = x_n^2 mod M,

where M = pq with p, q large prime numbers. But Blum-Blum-Shub is too
slow for simulations.

Parallel Random Number Generation
• Naively running RNGs in parallel does not produce reliable sets of
  random numbers.
• We need algorithms that produce large numbers of parallel streams of
  “good” random numbers.
• We also need better algorithms for weighted sampling.
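The Blum-Blum-Shub update is only a few lines of code. The sketch below is a toy illustration: p = 11 and q = 23 (primes ≡ 3 mod 4) are chosen so the arithmetic is visible by hand, whereas a cryptographic instance would use primes hundreds of digits long.

```python
def bbs_bits(seed, p=11, q=23, n=8):
    # Toy Blum-Blum-Shub: x_{n+1} = x_n^2 mod M with M = p*q.
    # p and q must be primes congruent to 3 (mod 4); these tiny
    # values are purely illustrative, NOT cryptographically safe.
    M = p * q
    x = seed
    bits = []
    for _ in range(n):
        x = (x * x) % M      # the quadratic update
        bits.append(x & 1)   # emit the least-significant bit
    return bits

print(bbs_bits(3))  # → [1, 1, 0, 0, 1, 0, 1, 0]
```

Each output bit costs a modular squaring of a very large integer, which is exactly why the slide dismisses it for simulation workloads.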
What GPUs are Good At
• GPUs are good for linear threads involving mathematical functions.
• We should avoid loops and branches in the code.
• In an optimal world we should aim for all threads to finish at the
  same time.
[Figure: execution cycle of parallel GPU threads.]
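One common way to act on the "avoid branches" advice is to replace a data-dependent branch with arithmetic selection, so that every thread executes the same instruction sequence. A minimal scalar sketch of the pattern (in Python for brevity; on a GPU the same blend would be done per thread, e.g. with a select or step intrinsic):

```python
def branchy(x):
    # Divergent form: threads taking different sides of the branch
    # would be serialised within a GPU warp.
    if x >= 0.0:
        return 2.0 * x
    else:
        return -x

def branchless(x):
    # Branch-free form: both sides are computed and blended with a
    # 0/1 mask, so every "thread" runs the same instructions.
    mask = 1.0 if x >= 0.0 else 0.0   # on a GPU: step(0.0, x)
    return mask * (2.0 * x) + (1.0 - mask) * (-x)

assert all(branchy(x) == branchless(x) for x in [-3.0, -0.5, 0.0, 1.5])
```

The branchless form does more arithmetic per thread, but all threads finish in lock-step, which is the stated goal above.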
Generating RNGs in Parallel
We assume that we have a function x_{n+1} = f(x_n) which generates a
stream of (pseudo) random numbers

        x_0, x_1, x_2, . . . , x_r, . . . , x_s, . . . , x_t, . . . , x_z

Parallel Streams
        x^(1)_0, x^(1)_1, . . .
        x^(2)_0, x^(2)_1, . . .
        x^(3)_0, x^(3)_1, . . .

Sub-Streams
        x_0, . . . , x_{r−1}
        x_r, . . . , x_{s−1}
        x_s, . . . , x_{t−1}
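The sub-stream scheme can be sketched as block-splitting: worker i jumps ahead to the start of its own block of the single underlying stream and consumes only that block. Purely for illustration, the sketch below uses a small linear congruential generator as f (not a recommendation) and a naive linear-time jump-ahead; practical generators such as MRG32k3a provide fast skip-ahead instead.

```python
def lcg(x):
    # Toy LCG step x_{n+1} = f(x_n); illustrative only, not a
    # recommended generator.
    return (1664525 * x + 1013904223) % 2**32

def substream(seed, start, length):
    # Jump ahead to this worker's block by repeated application of f
    # (a real generator would skip ahead in O(log n) or O(1)),
    # then emit the block.
    x = seed
    stream = []
    for _ in range(start + length):
        x = lcg(x)
        stream.append(x)
    return stream[start:start + length]

# Three workers, blocks of length 4: concatenating their sub-streams
# reproduces the single serial stream.
blocks = [substream(42, i * 4, 4) for i in range(3)]
assert sum(blocks, []) == substream(42, 0, 12)
```

Because the blocks partition one well-tested stream, the combined output inherits its statistical properties, unlike ad hoc per-thread seeding.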
Counter-Based RNGs

We can represent RNGs as state-space processes,

        y_{n+1} = f(y_n)                          with  f : Y → Y,
        x_{n+1} = g_{k, n mod J}(y_{⌊(n+1)/J⌋})   with  g : Y × K × Z_J → X,

where the x_n essentially behave as x ∼ U[0,1]; here K is the key
space and J the number of random numbers generated from each internal
state of the RNG.
Salmon et al. (SC11) propose to use a simple form for f(·) and

        g_{k,j} = h_k ◦ b_k.

For a sufficiently complex bijection b_k we can use simple updates
and leave the randomization to b_k. Here cryptographic routines come
in useful.
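The structure, a trivial counter update plus a keyed bijection b_k, can be sketched directly. The mixing function below is a made-up toy (loosely in the spirit of such round functions, but it is neither Philox nor Threefry and carries no statistical guarantees); it only shows why the scheme suits GPUs: thread n computes x_n from the counter alone, with no sequential state.

```python
MASK64 = 2**64 - 1

def toy_bijection(n, key, rounds=4):
    # Stand-in for b_k: a keyed bijection on 64-bit integers.  Each
    # step (add key, multiply by an odd constant, xor-shift) is
    # invertible mod 2^64, so the composition is a bijection -- but
    # this is NOT Philox, Threefry or any vetted generator.
    x = n & MASK64
    for r in range(rounds):
        x = (x + key + r) & MASK64
        x = (x * 0x9E3779B97F4A7C15) & MASK64   # odd multiplier
        x ^= x >> 29
    return x

def counter_rng(key, n):
    # x_n = b_k(n): stateless, so each GPU thread can evaluate its
    # own draw independently.  Use the top 53 bits so the float64
    # result is exact and strictly below 1.
    return (toy_bijection(n, key) >> 11) / 2.0**53

draws = [counter_rng(key=12345, n=i) for i in range(4)]
assert all(0.0 <= u < 1.0 for u in draws)
```

Because b_k is a bijection, distinct counters can never collide, which is the property the cryptographic round functions are borrowed for.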
RNG Performance on CPUs and GPUs
                        Method        Max.    Min.    Output    Intel CPU    Nvidia GPU      AMD GPU
                                      input   state    size    cpB GB/s      cpB    GB/s     cpB GB/s
 Counter-based, Cryptographic
                        AES(sw)    (1+0)×16   11×16     1×16   31.2    0.4      –       –        –       –
                        AES(hw)    (1+0)×16   11×16     1×16    1.7    7.2      –       –        –       –
     Threefish (Threefry-4×64-72)    (4+4)×8       0      4×8    7.3    1.7   51.8    15.3    302.8     4.5
 Counter-based, Crush-resistant
                      ARS-5(hw)    (1+1)×16       0     1×16   0.7    17.8      –        –      –        –
                      ARS-7(hw)    (1+1)×16       0     1×16   1.1    11.1      –        –      –        –
                Threefry-2×64-13    (2+2)×8       0      2×8   2.0     6.3   13.6     58.1   25.6     52.5
                Threefry-2×64-20    (2+2)×8       0      2×8   2.4     5.1   15.3     51.7   30.4     44.5
                Threefry-4×64-12    (4+4)×8       0      4×8   1.1    11.2    9.4     84.1   15.2     90.0
                Threefry-4×64-20    (4+4)×8       0      4×8   1.9     6.4   15.0     52.8   29.2     46.4
                Threefry-4×32-12    (4+4)×4       0      4×4   2.2     5.6    9.5     83.0   12.8    106.2
                Threefry-4×32-20    (4+4)×4       0      4×4   3.9     3.1   15.7     50.4   25.2     53.8
                   Philox2×64-6     (2+1)×8       0      2×8   2.1     5.9    8.8     90.0   37.2     36.4
                  Philox2×64-10     (2+1)×8       0      2×8   4.3     2.8   14.7     53.7   62.8     21.6
                   Philox4×64-7     (4+2)×8       0      4×8   2.0     6.0    8.6     92.4   36.4     37.2
                  Philox4×64-10     (4+2)×8       0      4×8   3.2     3.9   12.9     61.5   54.0     25.1
                   Philox4×32-7     (4+2)×4       0      4×4   2.4     5.0    3.9    201.6   12.0    113.1
                  Philox4×32-10     (4+2)×4       0      4×4   3.6     3.4    5.4   145.3    17.2    79.1
 Conventional, Crush-resistant
                     MRG32k3a             0    6×4    1000×4    3.8    3.2      –       –       –       –
                     MRG32k3a             0    6×4       4×4   20.3    0.6      –       –       –       –
                      MRGk5-93            0    5×4       1×4    7.6    1.6    9.2    85.5       –       –
 Conventional, Crushable
                Mersenne Twister          0   312×8      1×8    2.0    6.1   43.3    18.3        –       –
                      XORWOW              0     6×4      1×4    1.6    7.7    5.8   136.7     16.8    81.1

Table 2: Memory and performance characteristics for a variety of counter-based and conventional PRNGs.
Taken from Salmon et al. (see References). Maximum input is written as (c+k)×w, indicating a counter type
of width c×w bytes and a key type of width k×w bytes. Minimum state and output size are c×w bytes.
Counter-based PRNG performance is reported with the minimal number of rounds for Crush-resistance, and
also with extra rounds for a “safety margin.” Performance is shown in bold for recommended PRNGs that
have the best platform-specific performance.
Computational Challenges of GPUs

• Address spaces are a potential issue: using single precision we
  are prone to run out of numbers for challenging MCMC/SMC
  applications very quickly.
• Memory is an issue. In particular, registers/memory close to the
  computing cores are precious.
• Bandwidth is an issue — coordination between GPU and CPU is
  much more challenging in statistical applications than e.g. for
  texture mapping or graphics rendering tasks.
• Programming has to be much more hardware-aware than for CPUs
  — more precisely, non-hardware-adapted programming will be
  more obviously less efficient than for CPUs.
• There are some differences from conventional ANSI standards for
  mathematics (e.g. rounding).
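The single-precision point can be made concrete: a float32 significand has 24 bits, so above 2^24 consecutive integers are no longer representable, and long-running accumulators (iteration counters, weight sums over millions of particles) silently stop advancing. A small sketch, emulating float32 arithmetic on the CPU via the struct module:

```python
import struct

def f32(x):
    # Round a Python float to the nearest IEEE-754 single-precision
    # value (emulating GPU float32 arithmetic on the CPU).
    return struct.unpack('f', struct.pack('f', x))[0]

counter = f32(2.0**24)        # 16777216.0, exactly representable
bumped = f32(counter + 1.0)   # rounds straight back down
assert bumped == counter      # the increment is lost in float32

# In double precision the same increment survives:
assert 2.0**24 + 1.0 == 16777217.0
```

For an SMC run drawing billions of single-precision variates, this is exactly the "running out of numbers" failure mode mentioned above.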
References
GPUs and ABC

• Zhou, Liepe, Sheng, Stumpf, Barnes (2011) GPU accelerated
  biochemical network simulation. Bioinformatics 27:874-876.
• Liepe, Barnes, Cule, Erguler, Kirk, Toni, Stumpf (2010)
  ABC-SysBio: approximate Bayesian computation in Python with GPU
  support. Bioinformatics 26:1797-1799.

GPUs and Random Numbers
• Salmon, Moraes, Dror, Shaw (2011) Parallel Random Numbers: As Easy
  as 1, 2, 3. in Proceedings of 2011 International Conference for High
  Performance Computing, Networking, Storage and Analysis, ACM, New
  York; http://doi.acm.org/10.1145/2063384.2063405.
• Howes, Thomas (2007) Efficient Random Number Generation and
  Application Using CUDA. In GPU Gems 3, Addison-Wesley Professional,
  Boston; http://http.developer.nvidia.com/GPUGems3/gpugems3_ch37.html.
Acknowledgements




A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 

Statistical analysis of network data and evolution on GPUs: High-performance statistical computing

  • 1. Statistical analysis of network data and evolution on GPUs: High-performance statistical computing Thomas Thorne & Michael P.H. Stumpf Theoretical Systems Biology Group 25/01/2012 Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf 1 of 22
  • 2. Trends in High-Performance Computing CUDA OpenCL MPI OpenMPI PVM Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Introduction 2 of 22
  • 3. (Hidden) Costs of Computing Prototyping/Development Time is typically reduced for scripting languages such as R or Python. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Introduction 3 of 22
  • 4. (Hidden) Costs of Computing Prototyping/Development Time is typically reduced for scripting languages such as R or Python. Run Time on single threads C/C++ (or Fortran) have better performance characteristics. But for specialized tasks other languages, e.g. Haskell, can show good characteristics. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Introduction 3 of 22
  • 5. (Hidden) Costs of Computing Prototyping/Development Time is typically reduced for scripting languages such as R or Python. Run Time on single threads C/C++ (or Fortran) have better performance characteristics. But for specialized tasks other languages, e.g. Haskell, can show good characteristics. Energy Requirements Every Watt we use for computing we also have to extract with air conditioning. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Introduction 3 of 22
  • 6. (Hidden) Costs of Computing Prototyping/Development Time is typically reduced for scripting languages such as R or Python. Run Time on single threads C/C++ (or Fortran) have better performance characteristics. But for specialized tasks other languages, e.g. Haskell, can show good characteristics. Energy Requirements Every Watt we use for computing we also have to extract with air conditioning. The role of GPUs • GPUs can be accessed from many different programming languages (e.g. PyCUDA). • GPUs have a comparatively small footprint and relatively modest energy requirements compared to clusters of CPUs. • GPUs were designed for consumer electronics: computer gamers have different needs from the HPC community. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Introduction 3 of 22
  • 7. Evolving Networks [figure: a network evolving under events with rates α, γ and δ] Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Network Evolution 4 of 22
  • 8. Evolving Networks Model-Based Evolutionary Analysis • For sequence data we use models of nucleotide substitution in order to infer phylogenies in a likelihood or Bayesian framework. • None of these models — even the general time-reversible model — are particularly realistic; but by allowing for complicating factors e.g. rate variation we capture much of the variability observed across a phylogenetic panel. • Modes of network evolution will be even more complicated and exhibit high levels of contingency; moreover the structure and function of different parts of the network will be intricately linked. • Nevertheless we believe that modelling the processes underlying the evolution of networks can provide useful insights; in particular we can study how functionality is distributed across groups of genes. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Network Evolution 4 of 22
  • 9. Network Evolution Models (a) Duplication attachment (b) Duplication attachment with complementarity (c) Linear preferential attachment (d) General scale-free attachment [figure: schematics of the four models, with attachment weights w_i, w_j] Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Network Evolution 5 of 22
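To make the duplication-attachment idea concrete, here is a minimal Python sketch of a duplication model: a new node copies the edges of a randomly chosen parent, retaining each with some probability. The function name and the parameterization (`delta` as the edge-retention probability) are illustrative, not the talk's exact model definition.

```python
import random

def duplication_attachment(n, delta, seed=None):
    """Grow an undirected graph to n nodes by duplication-attachment.

    A sketch: each step duplicates a random node and keeps each of its
    edges independently with probability delta (illustrative notation).
    """
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}            # start from a single edge
    while len(adj) < n:
        parent = rng.choice(list(adj))
        child = len(adj)
        # copy each of the parent's edges independently with prob. delta
        kept = {v for v in adj[parent] if rng.random() < delta}
        if not kept:                  # avoid isolated nodes in this toy version
            kept = {parent}
        adj[child] = set(kept)
        for v in kept:
            adj[v].add(child)
    return adj

g = duplication_attachment(50, 0.5, seed=1)
print(len(g))  # 50 nodes
```

Such a simulator is cheap to run, which is exactly what the ABC approach on the following slides relies on.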
  • 10. ABC on Networks Summarizing Networks • Data are noisy and incomplete. • We can simulate models of network evolution, but this does not allow us to calculate likelihoods for all but very trivial models. • There is also no sufficient statistic that would allow us to summarize networks, so ABC approaches require some thought. • Many possible summary statistics of networks are expensive to calculate. Full likelihood: Wiuf et al., PNAS (2006). ABC: Ratmann et al., PLoS Comp. Biol. (2008). Stumpf & Wiuf, J. Roy. Soc. Interface (2010). Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Network Evolution 6 of 22
  • 11. Graph Spectrum Example graph on nodes {a, b, c, d, e} with adjacency matrix
            a b c d e
        a ( 0 1 1 1 0 )
        b ( 1 0 1 1 0 )
    A = c ( 1 1 0 0 0 )
        d ( 1 1 0 0 1 )
        e ( 0 0 0 1 0 )
  Graph Spectra Given a graph G comprised of a set of nodes N and edges (i, j) ∈ E with i, j ∈ N, the adjacency matrix A of the graph is defined by a_{i,j} = 1 if (i, j) ∈ E, and a_{i,j} = 0 otherwise. The eigenvalues λ of this matrix provide one way of defining the graph spectrum. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Network Evolution 7 of 22
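As an illustration, the spectrum of the slide's 5-node example can be computed directly with NumPy; `eigvalsh` applies because the adjacency matrix of an undirected graph is symmetric, so the eigenvalues are real.

```python
import numpy as np

# Adjacency matrix of the 5-node example graph (nodes a..e) from the slide.
A = np.array([
    [0, 1, 1, 1, 0],   # a
    [1, 0, 1, 1, 0],   # b
    [1, 1, 0, 0, 0],   # c
    [1, 1, 0, 0, 1],   # d
    [0, 0, 0, 1, 0],   # e
])

# eigvalsh returns the real eigenvalues in ascending order --
# one definition of the graph spectrum.
spectrum = np.linalg.eigvalsh(A)
print(np.round(spectrum, 3))
```

The eigenvalues sum to the trace of A, i.e. zero for a simple graph, which is a quick sanity check on the computation.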
  • 12. Spectral Distances A simple distance measure between graphs having adjacency matrices A and B, known as the edit distance, is to count the number of edges that are not shared by both graphs, D(A, B) = Σ_{i,j} (a_{i,j} − b_{i,j})². Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Network Evolution 8 of 22
  • 13. Spectral Distances A simple distance measure between graphs having adjacency matrices A and B, known as the edit distance, is to count the number of edges that are not shared by both graphs, D(A, B) = Σ_{i,j} (a_{i,j} − b_{i,j})². However for unlabelled graphs we require some mapping h from i ∈ N_A to i ∈ N_B that minimizes the distance: D_h(A, B) = Σ_{i,j} (a_{i,j} − b_{h(i),h(j)})². Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Network Evolution 8 of 22
  • 14. Spectral Distances A simple distance measure between graphs having adjacency matrices A and B, known as the edit distance, is to count the number of edges that are not shared by both graphs, D(A, B) = Σ_{i,j} (a_{i,j} − b_{i,j})². However for unlabelled graphs we require some mapping h from i ∈ N_A to i ∈ N_B that minimizes the distance: D_h(A, B) = Σ_{i,j} (a_{i,j} − b_{h(i),h(j)})². Given a spectrum (which is relatively cheap to compute) we have D(A, B) = Σ_l (λ_l^(α) − λ_l^(β))². Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Network Evolution 8 of 22
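A minimal sketch of the spectral distance in Python/NumPy. No explicit sorting is needed since `eigvalsh` returns eigenvalues in ascending order, which also makes the distance invariant under node relabelling (a key advantage over the edit distance for unlabelled graphs).

```python
import numpy as np

def spectral_distance(A, B):
    """Sum of squared differences between the sorted adjacency spectra.

    A sketch of the slide's distance; assumes both graphs have the same
    number of nodes (otherwise the smaller spectrum would need padding).
    """
    lam_a = np.linalg.eigvalsh(A)
    lam_b = np.linalg.eigvalsh(B)
    return float(np.sum((lam_a - lam_b) ** 2))

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])   # 3-node path
B = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])   # triangle
print(spectral_distance(A, A))  # 0.0 for identical graphs
print(spectral_distance(A, B))
```

Because the spectrum is unchanged by any permutation P of the node labels (P A Pᵀ is similar to A), the distance between a graph and a relabelled copy of itself is zero.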
  • 15. ABC using Graph Spectra For an observed network, N, and a simulated network, S_θ, we use the distance between the spectra, D(N, S_θ) = Σ_l (λ_l^(N) − λ_l^(S))², in our ABC SMC procedure. Note that this distance is a close lower bound on the distance between the raw data; we therefore do not have to bother with summary statistics. Also, calculating graph spectra costs as much as calculating other O(N³) statistics (such as all shortest paths, the network diameter or the within-reach distribution). Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Network Evolution 9 of 22
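The slide's procedure is ABC SMC; the sketch below shows the simpler ABC rejection variant of the same idea, with an Erdős–Rényi simulator standing in for the network-evolution models. All names, the stand-in simulator and the tolerance rule (keep the closest 5% of proposals) are illustrative choices, not the talk's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_er(n, p):
    """Stand-in simulator (Erdos-Renyi); the talk uses duplication models."""
    A = (rng.random((n, n)) < p).astype(int)
    A = np.triu(A, 1)
    return A + A.T                      # symmetric, zero diagonal

def spectral_distance(A, B):
    return float(np.sum((np.linalg.eigvalsh(A) - np.linalg.eigvalsh(B)) ** 2))

# "Observed" network, then ABC rejection over the edge probability p.
observed = simulate_er(30, 0.2)
proposals = rng.uniform(0, 1, 200)
dists = np.array([spectral_distance(observed, simulate_er(30, p))
                  for p in proposals])
eps = np.quantile(dists, 0.05)          # keep the closest 5% of proposals
posterior = proposals[dists <= eps]
print(len(posterior), posterior.mean())
```

ABC SMC refines this by propagating the accepted particles through a sequence of shrinking tolerances, but the distance computation is identical.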
  • 16. Simulated Data [figure: ABC SMC posterior densities for the model parameters δ, α, p and m over successive populations] Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Network Evolution 10 of 22
  • 17. Protein Interaction Network Data
    Species            Proteins   Interactions   Genome size   Sampling fraction
    S. cerevisiae          5035          22118          6532                0.77
    D. melanogaster        7506          22871         14076                0.53
    H. pylori               715           1423          1589                0.45
    E. coli                1888           7008          5416                0.35
  Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Network Evolution 11 of 22
  • 18. Protein Interaction Network Data
    Species            Proteins   Interactions   Genome size   Sampling fraction
    S. cerevisiae          5035          22118          6532                0.77
    D. melanogaster        7506          22871         14076                0.53
    H. pylori               715           1423          1589                0.45
    E. coli                1888           7008          5416                0.35
  [figure: posterior model probabilities for the models DA, DAC, LPA, SF, DACL and DACR in each organism] Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Network Evolution 11 of 22
  • 19. Protein Interaction Network Data
    Species            Proteins   Interactions   Genome size   Sampling fraction
    S. cerevisiae          5035          22118          6532                0.77
    D. melanogaster        7506          22871         14076                0.53
    H. pylori               715           1423          1589                0.45
    E. coli                1888           7008          5416                0.35
  Model Selection • Inference here was based on all the data, not summary statistics. • Duplication models receive the strongest support from the data. • Several models receive support and no model is chosen unambiguously. [figure: posterior model probabilities for the models DA, DAC, LPA, SF, DACL and DACR in each organism] Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Network Evolution 11 of 22
  • 20. Protein Interaction Network Data [figure: posterior densities of the parameters δ, α, p and m under the DA, DAC, DACL and DACR models for S. cerevisiae, D. melanogaster, H. pylori and E. coli] Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Network Evolution 11 of 22
  • 21. GPUs in Computational Statistics Performance • With highly optimized libraries (e.g. BLAS/ATLAS) we can run numerically demanding jobs relatively straightforwardly. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf GPU and the Alternatives 12 of 22
  • 22. GPUs in Computational Statistics Performance • With highly optimized libraries (e.g. BLAS/ATLAS) we can run numerically demanding jobs relatively straightforwardly. • Whenever poor-man’s parallelization is possible, GPUs offer considerable advantages over multi-core CPU systems (at comparable cost and energy requirements). Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf GPU and the Alternatives 12 of 22
  • 23. GPUs in Computational Statistics Performance • With highly optimized libraries (e.g. BLAS/ATLAS) we can run numerically demanding jobs relatively straightforwardly. • Whenever poor-man’s parallelization is possible, GPUs offer considerable advantages over multi-core CPU systems (at comparable cost and energy requirements). • Performance depends crucially on our ability to map tasks onto the hardware. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf GPU and the Alternatives 12 of 22
  • 24. GPUs in Computational Statistics Performance • With highly optimized libraries (e.g. BLAS/ATLAS) we can run numerically demanding jobs relatively straightforwardly. • Whenever poor-man’s parallelization is possible, GPUs offer considerable advantages over multi-core CPU systems (at comparable cost and energy requirements). • Performance depends crucially on our ability to map tasks onto the hardware. Challenges • GPU hardware was initially conceived for different purposes — computer gamers need fewer random numbers than MCMC or SMC procedures. The address space accessible to single-precision numbers also suffices to kill zombies etc. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf GPU and the Alternatives 12 of 22
  • 25. GPUs in Computational Statistics Performance • With highly optimized libraries (e.g. BLAS/ATLAS) we can run numerically demanding jobs relatively straightforwardly. • Whenever poor-man’s parallelization is possible, GPUs offer considerable advantages over multi-core CPU systems (at comparable cost and energy requirements). • Performance depends crucially on our ability to map tasks onto the hardware. Challenges • GPU hardware was initially conceived for different purposes — computer gamers need fewer random numbers than MCMC or SMC procedures. The address space accessible to single-precision numbers also suffices to kill zombies etc. • Combining several GPUs requires additional work, using e.g. MPI. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf GPU and the Alternatives 12 of 22
  • 26. Alternatives to GPUs: FPGAs Field Programmable Gate Arrays are electronic circuits that are partly (re)configurable. Here the hardware is adapted to the problem at hand and encapsulates the programme in its programmable logic arrays. [figure, from D.B. Thomas, W. Luk and M. Stumpf: raw swap-compare performance (MSwap-Compare/s) against graph size, comparing multiple instances in a Virtex-4 xc4vlx60, a single instance in a Virtex-4, and quad Opteron software] Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf GPU and the Alternatives 13 of 22
  • 27. Alternatives to GPUs: CPUs (and MPI...) CPUs with multiple cores are flexible, have large address spaces and a wealth of flexible numerical routines. This makes implementation of numerically demanding tasks relatively straightforward. In particular there is less of an incentive to consider how a problem is best implemented in software that takes advantage of hardware features. [figure: processor time against problem size for a 6-core Xeon vs an M2050 GPU (448 cores), programmed in OpenCL] Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf GPU and the Alternatives 14 of 22
  • 28. (Pseudo) Random Numbers The Mersenne Twister is one of the standard random number generators for simulation. MT19937 has a period of 2^19937 − 1 ≈ 4 × 10^6001. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Random Numbers on GPUs 15 of 22
  • 29. (Pseudo) Random Numbers The Mersenne Twister is one of the standard random number generators for simulation. MT19937 has a period of 2^19937 − 1 ≈ 4 × 10^6001. But MT does not have cryptographic strength (once 624 iterates have been observed, all future states are predictable), unlike Blum-Blum-Shub, x_{n+1} = x_n² mod M, where M = pq with p, q large prime numbers. But BBS is too slow for simulations. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Random Numbers on GPUs 15 of 22
  • 30. (Pseudo) Random Numbers The Mersenne Twister is one of the standard random number generators for simulation. MT19937 has a period of 2^19937 − 1 ≈ 4 × 10^6001. But MT does not have cryptographic strength (once 624 iterates have been observed, all future states are predictable), unlike Blum-Blum-Shub, x_{n+1} = x_n² mod M, where M = pq with p, q large prime numbers. But BBS is too slow for simulations. Parallel Random Number Generation • Running RNGs in parallel does not produce reliable sets of random numbers. • We need algorithms that produce large numbers of parallel streams of “good” random numbers. • We also need better algorithms for weighted sampling. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Random Numbers on GPUs 15 of 22
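The Blum-Blum-Shub recurrence is simple enough to run by hand. Below is a toy iteration with deliberately tiny primes so the arithmetic is visible; a real deployment needs large primes p, q ≡ 3 (mod 4) and a seed coprime to M, and even then it is far too slow for simulation work, as the slide notes.

```python
# Toy Blum-Blum-Shub (illustrative only -- not for real use).
p, q = 11, 23           # small primes, both ≡ 3 (mod 4)
M = p * q               # modulus M = pq = 253
x = 3                   # seed, coprime to M

bits = []
for _ in range(16):
    x = (x * x) % M     # x_{n+1} = x_n^2 mod M
    bits.append(x & 1)  # emit the least-significant bit of each state
print(bits)
```

Security rests on the hardness of factoring M; each squaring step costs a modular multiplication of full-width integers, which is why BBS loses badly to generators like MT19937 in throughput.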
  • 31. What GPUs are Good At • GPUs are good for linear threads involving mathematical functions. • We should avoid loops and branches in the code. • In an optimal world we should aim for all threads to finish at the same time. [figure: execution cycle] Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Random Numbers on GPUs 16 of 22
  • 32. Generating RNGs in Parallel We assume that we have a function x_{n+1} = f(x_n) which generates a stream of (pseudo) random numbers x_0, x_1, x_2, ..., x_r, ..., x_s, ..., x_t, ..., x_z Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Random Numbers on GPUs 17 of 22
  • 33. Generating RNGs in Parallel We assume that we have a function x_{n+1} = f(x_n) which generates a stream of (pseudo) random numbers x_0, x_1, x_2, ..., x_r, ..., x_s, ..., x_t, ..., x_z Parallel Streams: x_0, x_1, ... | x_0, x_1, ... | x_0, x_1, ... (one independent stream per seed/key) Sub-Streams: x_0, ..., x_{r−1} | x_r, ..., x_{s−1} | x_s, ..., x_{t−1} (one stream partitioned into non-overlapping blocks) Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Random Numbers on GPUs 17 of 22
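In NumPy, independent parallel streams of the kind sketched above can be obtained by spawning child seeds and giving each one its own Philox (counter-based) bit generator. This is one concrete realization, assuming NumPy ≥ 1.17; the seed value is arbitrary.

```python
import numpy as np

# SeedSequence.spawn derives statistically independent child seeds;
# each child drives its own Philox counter-based stream.
root = np.random.SeedSequence(20120125)
streams = [np.random.Generator(np.random.Philox(s)) for s in root.spawn(4)]

# Each worker/thread would own one generator and draw from it locally.
draws = [g.random(3) for g in streams]
print(draws)
```

This sidesteps the pitfall on the slide above: naively seeding four copies of the same RNG with nearby seeds gives correlated streams, whereas spawned seeds are designed to avoid overlap.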
  • 34. Counter-Based RNGs We can represent RNGs as state-space processes, y_{n+1} = f(y_n) with f : Y → Y, x_{n+1} = g_{k, n mod J}(y_{⌊(n+1)/J⌋}) with g : Y × K × Z_J → X, where the x essentially behave as x ∼ U[0,1]; here K is the key space and J the number of random numbers generated from the internal state of the RNG. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Random Numbers on GPUs 18 of 22
  • 35. Counter-Based RNGs We can represent RNGs as state-space processes, y_{n+1} = f(y_n) with f : Y → Y, x_{n+1} = g_{k, n mod J}(y_{⌊(n+1)/J⌋}) with g : Y × K × Z_J → X, where the x essentially behave as x ∼ U[0,1]; here K is the key space and J the number of random numbers generated from the internal state of the RNG. Salmon et al. (SC11) propose to use a simple form for f(·) and g_{k,j} = h_k ∘ b_k Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Random Numbers on GPUs 18 of 22
  • 36. Counter-Based RNGs We can represent RNGs as state-space processes, y_{n+1} = f(y_n) with f : Y → Y, x_{n+1} = g_{k, n mod J}(y_{⌊(n+1)/J⌋}) with g : Y × K × Z_J → X, where the x essentially behave as x ∼ U[0,1]; here K is the key space and J the number of random numbers generated from the internal state of the RNG. Salmon et al. (SC11) propose to use a simple form for f(·) and g_{k,j} = h_k ∘ b_k For a sufficiently complex bijection b_k we can use simple updates and leave the randomization to b_k. Here cryptographic routines come in useful. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Random Numbers on GPUs 18 of 22
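The idea can be illustrated with a toy counter-based generator: the n-th draw is a pure function of (counter, key), obtained by pushing the counter through a fixed bijection. The mixer here is a SplitMix64-style finalizer, chosen for brevity; this is an illustration of the principle, not Philox or Threefry themselves.

```python
MASK = (1 << 64) - 1

def mix(z):
    """SplitMix64-style finalizer: a cheap 64-bit bijection."""
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9 & MASK
    z = (z ^ (z >> 27)) * 0x94D049BB133111EB & MASK
    return z ^ (z >> 31)

def counter_rng(counter, key):
    """Toy counter-based generator: the n-th draw depends only on
    (counter, key), so threads can jump to any point of any stream
    in O(1), with no shared mutable state."""
    z = (counter + key * 0x9E3779B97F4A7C15) & MASK
    return mix(z) / 2**64          # uniform in [0, 1)

# Thread t can generate its n-th number directly from the pair (n, t):
print([round(counter_rng(n, key=7), 6) for n in range(4)])
```

This statelessness is exactly what makes counter-based designs attractive on GPUs: each thread derives its numbers from its own (counter, key) pair with trivial bookkeeping.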
  • 37. RNG Performance on CPUs and GPUs
                                   Max.       Min.    Output   Intel CPU     Nvidia GPU    AMD GPU
    Method                         input      state   size     cpB   GB/s    cpB   GB/s    cpB    GB/s
    Counter-based, Cryptographic
    AES(sw)                        (1+0)×16   11×16   1×16     31.2  0.4     –     –       –      –
    AES(hw)                        (1+0)×16   11×16   1×16     1.7   7.2     –     –       –      –
    Threefish (Threefry-4×64-72)   (4+4)×8    0       4×8      7.3   1.7     51.8  15.3    302.8  4.5
    Counter-based, Crush-resistant
    ARS-5(hw)                      (1+1)×16   0       1×16     0.7   17.8    –     –       –      –
    ARS-7(hw)                      (1+1)×16   0       1×16     1.1   11.1    –     –       –      –
    Threefry-2×64-13               (2+2)×8    0       2×8      2.0   6.3     13.6  58.1    25.6   52.5
    Threefry-2×64-20               (2+2)×8    0       2×8      2.4   5.1     15.3  51.7    30.4   44.5
    Threefry-4×64-12               (4+4)×8    0       4×8      1.1   11.2    9.4   84.1    15.2   90.0
    Threefry-4×64-20               (4+4)×8    0       4×8      1.9   6.4     15.0  52.8    29.2   46.4
    Threefry-4×32-12               (4+4)×4    0       4×4      2.2   5.6     9.5   83.0    12.8   106.2
    Threefry-4×32-20               (4+4)×4    0       4×4      3.9   3.1     15.7  50.4    25.2   53.8
    Philox2×64-6                   (2+1)×8    0       2×8      2.1   5.9     8.8   90.0    37.2   36.4
    Philox2×64-10                  (2+1)×8    0       2×8      4.3   2.8     14.7  53.7    62.8   21.6
    Philox4×64-7                   (4+2)×8    0       4×8      2.0   6.0     8.6   92.4    36.4   37.2
    Philox4×64-10                  (4+2)×8    0       4×8      3.2   3.9     12.9  61.5    54.0   25.1
    Philox4×32-7                   (4+2)×4    0       4×4      2.4   5.0     3.9   201.6   12.0   113.1
    Philox4×32-10                  (4+2)×4    0       4×4      3.6   3.4     5.4   145.3   17.2   79.1
    Conventional, Crush-resistant
    MRG32k3a                       0          6×4     1000×4   3.8   3.2     –     –       –      –
    MRG32k3a                       0          6×4     4×4      20.3  0.6     –     –       –      –
    MRGk5-93                       0          5×4     1×4      7.6   1.6     9.2   85.5    –      –
    Conventional, Crushable
    Mersenne Twister               0          312×8   1×8      2.0   6.1     43.3  18.3    –      –
    XORWOW                         0          6×4     1×4      1.6   7.7     5.8   136.7   16.8   81.1
  Table 2: Memory and performance characteristics for a variety of counter-based and conventional PRNGs. Taken from Salmon et al. (see References). Maximum input is written as (c+k)×w, indicating a counter type of width c×w bytes and a key type of width k×w bytes. Minimum state and output size are c×w bytes. Counter-based PRNG performance is reported with the minimal number of rounds for Crush-resistance, and also with extra rounds for a “safety margin.” Performance is shown in bold in the original for recommended PRNGs that have the best platform-specific performance.
  Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Random Numbers on GPUs 19 of 22
  • 38. Computational Challenges of GPUs • Address spaces are a potential issue: using single precision we are prone to run out of numbers for challenging MCMC/SMC applications very quickly. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Conclusions 20 of 22
  • 39. Computational Challenges of GPUs • Address spaces are a potential issue: using single precision we are prone to run out of numbers for challenging MCMC/SMC applications very quickly. • Memory is an issue. In particular, registers/memory close to the computing cores are precious. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Conclusions 20 of 22
  • 40. Computational Challenges of GPUs • Address spaces are a potential issue: using single precision we are prone to run out of numbers for challenging MCMC/SMC applications very quickly. • Memory is an issue. In particular, registers/memory close to the computing cores are precious. • Bandwidth is an issue — coordination between GPU and CPU is much more challenging in statistical applications than e.g. for texture mapping of graphics rendering tasks. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Conclusions 20 of 22
  • 41. Computational Challenges of GPUs • Address spaces are a potential issue: using single precision we are prone to run out of numbers for challenging MCMC/SMC applications very quickly. • Memory is an issue. In particular, registers/memory close to the computing cores are precious. • Bandwidth is an issue — coordination between GPU and CPU is much more challenging in statistical applications than e.g. for texture mapping of graphics rendering tasks. • Programming has to be much more hardware aware than for CPUs — more precisely, non-hardware adapted programming will be more obviously less efficient than for CPUs. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Conclusions 20 of 22
  • 42. Computational Challenges of GPUs • Address spaces are a potential issue: using single precision we are prone to run out of numbers for challenging MCMC/SMC applications very quickly. • Memory is an issue. In particular, registers/memory close to the computing cores are precious. • Bandwidth is an issue — coordination between GPU and CPU is much more challenging in statistical applications than e.g. for texture mapping of graphics rendering tasks. • Programming has to be much more hardware aware than for CPUs — more precisely, non-hardware adapted programming will be more obviously less efficient than for CPUs. • There are some differences from conventional ANSI standards for mathematics (e.g. rounding). Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Conclusions 20 of 22
  • 43. References GPUs and ABC • Zhou, Liepe, Sheng, Stumpf, Barnes (2011) GPU accelerated biochemical network simulation. Bioinformatics 27:874-876. • Liepe, Barnes, Cule, Erguler, Kirk, Toni, Stumpf (2010) ABC-SysBio: approximate Bayesian computation in Python with GPU support. Bioinformatics 26:1797-1799. GPUs and Random Numbers • Salmon, Moraes, Dror, Shaw (2011) Parallel Random Numbers: As Easy as 1, 2, 3. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, ACM, New York; http://doi.acm.org/10.1145/2063384.2063405. • Howes, Thomas (2007) Efficient Random Number Generation and Application Using CUDA. In GPU Gems 3, Addison-Wesley Professional, Boston; http://http.developer.nvidia.com/GPUGems3/gpugems3_ch37.html. Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Conclusions 21 of 22
  • 44. Acknowledgements Network Analysis on GPUs Thomas Thorne & Michael P.H. Stumpf Conclusions 22 of 22