Parallel External Memory Algorithms Applied to Generalized Linear Models

Lee E. Edlefsen, Ph.D.
Chief Scientist
JSM 2012
Introduction and overview                      Revolution Confidential

For the past several decades, the rising tide of technology has allowed the same data analysis code to handle the increase in sizes of typical data sets. That era is ending: the size of data sets is increasing much more rapidly than the speed of single cores, of RAM, and of hard drives.

To deal with this, statistical software must be able to use multiple cores and computers. Parallel external memory algorithms (PEMA's) provide a foundation for such software.
Introduction and overview – (2)

- External memory algorithms (EMA's) are those that do not require all data to be in RAM; they are widely available.
- Parallel implementations of EMA's allow them to run on multiple cores and computers, and to process an unlimited number of rows of data.
- This paper describes a general approach to efficiently parallelizing EMA's, using an R and C++ implementation of generalized linear models (GLM) as a detailed example.

Revolution R Enterprise
Introduction and overview – (3)

- This paper discusses:
  - the arrangement of code for "automatic" parallelization
  - the efficient use of cores
  - the efficient use of multiple computers (nodes)
- The approach presented is independent of the distributed computing platform (MPI, Hadoop, MPP database appliances).
- The paper includes billion-row benchmarks showing linear scaling with rows and nodes, and demonstrating that extremely high performance is achievable.
High Performance Computing vs. High Performance Analytics

- HPA is HPC + Data.
- High Performance Computing is CPU-centric:
  - lots of processing on small amounts of data
  - focus is on cores
- High Performance Analytics is data-centric:
  - less processing per amount of data
  - focus is on feeding data to the cores:
    - on-disk I/O and data locality
    - efficient threading and data management in RAM
High Performance Analytics in RevoScaleR

- Extremely high performance data management and data analysis
- Scales from small local data to huge distributed data
- Scales from laptop to cluster to cloud
- Based on a platform that "automatically" and efficiently parallelizes and distributes a broad class of predictive analytic algorithms
- This platform implements the approach to parallel external memory algorithms I will describe
External memory algorithms

- External memory algorithms are those that allow computations to be split into pieces so that not all data has to be in memory at one time.
- Such algorithms process data a "chunk" at a time, storing intermediate results from each chunk and combining them at the end.
- Each chunk must produce an intermediate result that can be combined with other intermediate results to give the final result.
- Such algorithms are widely available for data management and predictive analytics.
Parallel external memory algorithms (PEMA's)

- PEMA's are external memory algorithms that have been parallelized.
- Such algorithms process data a chunk at a time in parallel, storing intermediate results from each chunk and combining them at the end.
- External memory algorithms that are not "inherently sequential" can be parallelized:
  - results for one chunk of data cannot depend upon prior results
  - data dependence (lags, leads) is OK
Generalized Linear Models (GLM)

- The generalized linear model can be thought of as a generalization of linear regression.
- It extends linear regression to handle dependent variables generated from exponential family distributions, including Gaussian, Poisson, logistic, gamma, binomial, multinomial, and Tweedie.
- Generalized linear models are widely used in a variety of fields and industries.
GLM overview

- The dependent variable Y is generated from a distribution in the exponential family.
- The expected value of Y is related to a linear predictor of the data X and parameters β through the inverse of a "link" function g():
    E(Y) = mu = g^-1(Xβ)
- The variance of Y is typically a function V() of the mean mu:
    Var(Y) = varmu = V(mu)
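For concreteness, the logit case can be sketched as follows. This is an illustrative Python sketch (not the paper's R/C++ code), with names chosen to match the slides' notation:

```python
import math

# Illustrative sketch of the two pieces a logistic-regression
# "family" supplies: the inverse link g^-1 and the variance
# function V(mu).

def linkinv(eta):
    """Inverse logit link: mu = g^-1(eta) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def variance(mu):
    """Binomial variance function: V(mu) = mu * (1 - mu)."""
    return mu * (1.0 - mu)

mu = linkinv(0.0)        # a linear predictor of 0 gives mu = 0.5
varmu = variance(mu)     # V(0.5) = 0.25
```

Different families swap in different linkinv() and variance() functions; the estimation machinery below stays the same.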
GLM Estimation

- The parameters of GLM models can be estimated using maximum likelihood.
- Iteratively reweighted least squares (IRLS) is commonly used to obtain the maximum likelihood estimates.
- Each iteration of IRLS requires at least one pass through the data, generating a vector of weights and a "new" dependent variable and then doing a weighted least squares regression.
IRLS for GLM

- Given an estimate of the parameters β and the data X, IRLS requires the computation of a "weight" variable W and a "new" dependent variable Z:
    eta = Xβ
    mu = linkinv(eta)
    Z = (y - mu)/mu_eta, where mu_eta is the partial derivative of mu with respect to eta
    W = sqrt(mu_eta*mu_eta/varmu)
- The next β is then computed by regressing Z on X, weighted by W.
- If the estimation has not converged, the steps are repeated.
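The steps above can be sketched in a minimal, illustrative Python loop (not the paper's code) for an intercept-only logistic regression. Note that with the slides' Z = (y - mu)/mu_eta, the weighted regression of Z on X yields the Fisher-scoring increment, so β is updated by adding the fitted coefficient:

```python
import math

# Minimal IRLS sketch for an intercept-only logit model.
def irls_logit_intercept(y, iterations=25):
    beta = 0.0
    for _ in range(iterations):
        eta = beta                          # X is a column of ones
        mu = 1.0 / (1.0 + math.exp(-eta))   # linkinv(eta)
        mu_eta = mu * (1.0 - mu)            # d mu / d eta
        varmu = mu * (1.0 - mu)             # V(mu), binomial family
        w = mu_eta * mu_eta / varmu         # IRLS weight
        z = [(yi - mu) / mu_eta for yi in y]
        # Weighted least squares of z on a constant regressor
        # gives the Fisher-scoring increment:
        delta = sum(w * zi for zi in z) / (w * len(y))
        beta += delta
    return beta

# For an intercept-only logit, the MLE satisfies linkinv(beta) = mean(y):
beta = irls_logit_intercept([1, 1, 1, 0])   # converges to log(3)
```

Each pass of the loop body is exactly the per-iteration data pass the slide describes; the multi-parameter case replaces the scalar division with a weighted normal-equations solve.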
In-memory implementations

- The glm() function in R provides a beautiful and efficient in-memory implementation.
- However, nearly every computational line of code involves processing all rows of data.
- There is no easy way to directly convert an implementation like this into one that can handle data too big to fit into memory and that can use multiple cores and multiple computers.
- However, it can be accomplished by arranging the same computations into separate functions that accomplish separate tasks.
Example external memory algorithm for the mean of a variable

- Initialization function: total = 0, count = 0
- ProcessData function: for each block of x, total = sum(x), count = length(x)
- UpdateResults function: total12 = total1 + total2, count12 = count1 + count2
- ProcessResults function: mean = combined total / combined count
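The four functions above can be written out as a short Python sketch (function names follow the slides; the chunking here is illustrative):

```python
# Four-function decomposition of the mean, as an external memory
# algorithm: each chunk yields an intermediate result (IR), and
# the IRs are combined at the end.

def initialize():
    return {"total": 0.0, "count": 0}

def process_data(chunk):
    # Runs on one chunk only; must not assume it sees all the data.
    return {"total": sum(chunk), "count": len(chunk)}

def update_results(ir1, ir2):
    # Combine two intermediate results into one.
    return {"total": ir1["total"] + ir2["total"],
            "count": ir1["count"] + ir2["count"]}

def process_results(ir):
    # Convert the final intermediate result into the final result.
    return ir["total"] / ir["count"]

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
chunks = [data[i:i + 2] for i in range(0, len(data), 2)]

ir = initialize()
for chunk in chunks:
    ir = update_results(ir, process_data(chunk))
mean = process_results(ir)   # 3.5, the same answer as sum(data)/len(data)
```

Because update_results() is associative, the per-chunk calls to process_data() can run in any order, on any core or node.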
A formalization of PEMA's

Arrange the code into 4 functions:
1. Initialize(): does any necessary initialization.
2. ProcessData(): takes a chunk of data and produces an intermediate result (IR). This is the only function run in parallel; it must assume it does not have all the data, and it must produce no side effects.
3. UpdateResults(): takes two IR's and produces another IR, equivalent to the IR that would have been produced by combining the two corresponding chunks of data and calling ProcessData().
4. ProcessResults(): takes any given IR and converts it into a "final results" (FR) form.
An external memory algorithm for GLM

- Initialization function: set intermediate values to 0.
- ProcessData function: for a given β and chunk of data X, compute Z, W, and M, the weighted cross-products matrix of X and Z for this chunk:
    eta = Xβ, mu = linkinv(eta)
    Z = (y - mu)/mu_eta, W = sqrt(mu_eta*mu_eta/varmu)
    M = [X*W Z*W]'[X*W Z*W]
- UpdateResults function: M12 = M1 + M2
- ProcessResults function: β = Solve(M) (solves a set of linear equations)
- Check for convergence and repeat if necessary.
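The steps above can be sketched end to end. This is an illustrative numpy sketch (not RevoScaleR's implementation) of a full IRLS fit arranged as an external memory algorithm with a logit link; the toy data set and chunk size are made up:

```python
from functools import reduce
import numpy as np

# GLM-as-EMA sketch: each chunk contributes only its weighted
# cross-products matrix M; chunk results are combined by addition,
# and the solve runs once per IRLS iteration.

def process_data(X, y, beta):
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))          # linkinv(eta)
    mu_eta = mu * (1.0 - mu)                 # d mu / d eta
    varmu = mu * (1.0 - mu)                  # V(mu), binomial
    Z = (y - mu) / mu_eta
    W = np.sqrt(mu_eta * mu_eta / varmu)
    A = np.column_stack([X * W[:, None], Z * W])
    return A.T @ A                           # M = [X*W Z*W]'[X*W Z*W]

def update_results(M1, M2):
    return M1 + M2

def process_results(M, beta):
    k = M.shape[0] - 1
    # With the slides' Z, solving the weighted normal equations
    # gives the increment to beta (a Fisher-scoring step).
    return beta + np.linalg.solve(M[:k, :k], M[:k, k])

# Toy data: intercept plus a binary group indicator; 8 rows, chunks of 4.
X = np.column_stack([np.ones(8), np.repeat([0.0, 1.0], 4)])
y = np.array([1, 0, 0, 0, 1, 1, 1, 0], dtype=float)

beta = np.zeros(2)
for _ in range(10):                          # IRLS iterations
    Ms = [process_data(X[i:i + 4], y[i:i + 4], beta)
          for i in range(0, 8, 4)]           # one M per chunk
    beta = process_results(reduce(update_results, Ms), beta)
# beta converges to (log(1/3), log(9)): the fitted group means are
# 0.25 and 0.75, matching the observed group means.
```

Only β and the small M matrices cross chunk boundaries, so memory use is fixed regardless of how many rows the chunks contain.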
A C++ and R implementation of GLM

- C++ "analysis" objects:
  - have the 4 virtual PEMA methods, among others
  - have member variables for intermediate results and for maintaining local state
  - know how to copy themselves (including the ability to not copy some members, for efficiency)
  - have the ability to call into R during ProcessData()
- R "family" objects for glm:
  - contain methods for computing Z and W (eta, mu, etc.)
GLM in C++ and R: Multiple Cores

- On each computer, a master analysis object makes a copy of itself for all usable threads (cores) except one.
- The remaining thread is assigned to handle all I/O.
- In a master loop over the data, the I/O object reads a chunk of data.
- In parallel (after the first read), portions of the previously read chunk are (virtually) passed to the ProcessData() methods of the other objects.
GLM in C++ and R: Multiple Cores – (2)

- For each chunk of data, Z and W are computed (in R or C++; if in R, only one thread at a time is allowed); Xβ and M are computed in C++.
- After all data has been consumed, the master analysis object loops over all of the thread-specific objects and updates itself (using UpdateResults()), producing the intermediate results object that corresponds to all of the data processed on this computer.
- If other computers are being used, this computer sends its intermediate results to the "master" node.
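A toy version of this scheduling can be sketched with a thread pool. This Python sketch (not RevoScaleR's threading code) uses the mean as the computation, with one loop playing the role of the I/O thread:

```python
from concurrent.futures import ThreadPoolExecutor

# One loop plays the I/O thread, handing each chunk to a pool of
# compute threads that run process_data(); the master then folds
# the per-chunk intermediate results together with update_results().

def read_chunks():
    data = list(range(100))                  # stand-in for on-disk data
    for i in range(0, len(data), 25):        # the "I/O thread" reads chunks
        yield data[i:i + 25]

def process_data(chunk):                     # per-thread, no side effects
    return {"total": float(sum(chunk)), "count": len(chunk)}

def update_results(ir1, ir2):
    return {"total": ir1["total"] + ir2["total"],
            "count": ir1["count"] + ir2["count"]}

with ThreadPoolExecutor(max_workers=3) as pool:   # the compute threads
    futures = [pool.submit(process_data, c) for c in read_chunks()]
    ir = {"total": 0.0, "count": 0}
    for f in futures:
        ir = update_results(ir, f.result())

mean = ir["total"] / ir["count"]             # 49.5
```

Because ProcessData() has no side effects and UpdateResults() is associative, the final fold is correct no matter which thread finishes first.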
GLM in C++ and R: Multiple MPI Nodes

- A "master node" sends a copy of the analysis object, or instructions on how to create one, to each computer (node) on a cluster/grid, and the steps described above are carried out.
- Each node reads and processes its portion of the data (the more local the data, the better).
- Worker nodes do not communicate with each other.
- Worker nodes do not communicate with the master node except to send their results.
GLM in C++ and R: Multiple MPI Nodes – (2)

- When each node has its final IR object, it sends it to the master node.
- The master node gathers and combines all intermediate results using UpdateResults().
- When it has the final intermediate results, it calls ProcessResults() to get the next estimate of β.
- The master node checks for convergence, and repeats all of the steps if necessary.
Implementation in RevoScaleR

- The package RevoScaleR, which is part of Revolution R Enterprise, contains an implementation of GLM and other algorithms based on this approach.
- The algorithms are internally threaded.
- They can currently use MPI or RPC for inter-process communication.
- Platform LSF and HPC Server schedulers are supported.
- We are currently working on supporting Hadoop.
Some features of this implementation

- Handles an arbitrarily large number of rows in a fixed amount of memory
- Scales linearly with the number of rows
- Scales (approximately) linearly with the number of nodes
- Scales well with the number of cores per node
- Scales well with the number of parameters
- Works on commodity hardware
- Extremely high performance
Scalability of linear regression with rows

1 million to 1 billion rows, 443 betas, 4 cores.
[Chart: elapsed time in seconds (0–1200) versus millions of rows (0–1000); timing is approximately linear in rows, at roughly 1.1 million rows per second.]
Scalability of glm (logit) with rows

1 million to 1 billion rows, 443 betas, 4 cores.
[Chart: elapsed time in seconds (0–4000) versus millions of rows (0–1000); timing is approximately linear in rows.]
Scalability with nodes: glm (logit)

Big (1B rows) and small (124M rows) data; big (443 params) and small (7 params) models; 4 cores per node; 5 iterations per model.
[Chart: elapsed time versus number of nodes, with a linear-scaling reference line, for four combinations: Big Data, Big Model (super scaling); Big Data, Small Model; Small Data, Big Model; and Small Data, Small Model.]
Timing comparisons

- glm() in CRAN R vs. rxGlm in RevoScaleR
- SAS's new HPA functionality vs. rxGlm
HPA benchmarking comparison* – logistic regression

                    SAS HPA         RevoScaleR rxGlm
  Rows of data      1 billion       1 billion
  Parameters        "just a few"    7
  Time              80 seconds      44 seconds
  Data location     In memory       On disk
  Nodes             32              5
  Cores             384             20
  RAM               1,536 GB        80 GB

Revolution R is faster on the same amount of data, despite using approximately a 20th as many cores, a 20th as much RAM, a 6th as many nodes, and not pre-loading data into RAM.

*As published by SAS in HPC Wire, April 21, 2011
Conclusion

- PEMA's provide a systematic approach to scalable analytic algorithms.
- Algorithms implemented in this way can handle unlimited numbers of rows on a single core in a fixed amount of RAM.
- Such algorithms scale well with rows and nodes, and scale well with cores up to a point.
- They work on commodity hardware.
- They work on different distributed computing platforms.
- Extremely high performance is possible.
Thank you!

- R-Core Team
- R Package Developers
- R Community
- Revolution R Enterprise Customers and Beta Testers
- Colleagues at Revolution Analytics

Contact: lee@revolutionanalytics.com

Más contenido relacionado

La actualidad más candente

ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...Databricks
 
Terascale Learning
Terascale LearningTerascale Learning
Terascale Learningpauldix
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...MLconf
 
Wapid and wobust active online machine leawning with Vowpal Wabbit
Wapid and wobust active online machine leawning with Vowpal Wabbit Wapid and wobust active online machine leawning with Vowpal Wabbit
Wapid and wobust active online machine leawning with Vowpal Wabbit Antti Haapala
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Intel® Software
 
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...DB Tsai
 
Next generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labNext generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labImpetus Technologies
 
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using SparkEnterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using SparkAlpine Data
 
GraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBGraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBArangoDB Database
 
Dask glm-scipy2017-final
Dask glm-scipy2017-finalDask glm-scipy2017-final
Dask glm-scipy2017-finalHussain Sultan
 
Implementation of linear regression and logistic regression on Spark
Implementation of linear regression and logistic regression on SparkImplementation of linear regression and logistic regression on Spark
Implementation of linear regression and logistic regression on SparkDalei Li
 
Scalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduceScalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduceKyong-Ha Lee
 
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...Kyong-Ha Lee
 
Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...NECST Lab @ Politecnico di Milano
 
Chapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming LanguagesChapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming LanguagesHeman Pathak
 
Large Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentLarge Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentShaleen Kumar Gupta
 

La actualidad más candente (20)

ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
 
Terascale Learning
Terascale LearningTerascale Learning
Terascale Learning
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
 
Wapid and wobust active online machine leawning with Vowpal Wabbit
Wapid and wobust active online machine leawning with Vowpal Wabbit Wapid and wobust active online machine leawning with Vowpal Wabbit
Wapid and wobust active online machine leawning with Vowpal Wabbit
 
Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
 
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
 
Next generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labNext generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph lab
 
Hadoop map reduce concepts
Hadoop map reduce conceptsHadoop map reduce concepts
Hadoop map reduce concepts
 
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using SparkEnterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using Spark
 
GraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDBGraphSage vs Pinsage #InsideArangoDB
GraphSage vs Pinsage #InsideArangoDB
 
Dask glm-scipy2017-final
Dask glm-scipy2017-finalDask glm-scipy2017-final
Dask glm-scipy2017-final
 
Implementation of linear regression and logistic regression on Spark
Implementation of linear regression and logistic regression on SparkImplementation of linear regression and logistic regression on Spark
Implementation of linear regression and logistic regression on Spark
 
Scalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduceScalable and Adaptive Graph Querying with MapReduce
Scalable and Adaptive Graph Querying with MapReduce
 
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...
SASUM: A Sharing-based Approach to Fast Approximate Subgraph Matching for Lar...
 
Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...Pretzel: optimized Machine Learning framework for low-latency and high throug...
Pretzel: optimized Machine Learning framework for low-latency and high throug...
 
Chapter 3 pc
Chapter 3 pcChapter 3 pc
Chapter 3 pc
 
Chapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming LanguagesChapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming Languages
 
SLE2015: Distributed ATL
SLE2015: Distributed ATLSLE2015: Distributed ATL
SLE2015: Distributed ATL
 
Large Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentLarge Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate Descent
 

Similar a Parallel External Memory Algorithms Applied to Generalized Linear Models

IRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CLIRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CLIRJET Journal
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsZvi Avraham
 
Complier design
Complier design Complier design
Complier design shreeuva
 
My Postdoctoral Research
My Postdoctoral ResearchMy Postdoctoral Research
My Postdoctoral ResearchPo-Ting Wu
 
Intel Cluster Poisson Solver Library
Intel Cluster Poisson Solver LibraryIntel Cluster Poisson Solver Library
Intel Cluster Poisson Solver LibraryIlya Kryukov
 
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...Sunny Kr
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 
Swift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowSwift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowDaniel S. Katz
 
useR2011 - Edlefsen
useR2011 - EdlefsenuseR2011 - Edlefsen
useR2011 - Edlefsenrusersla
 
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Derryck Lamptey, MPhil, CISSP
 
Accelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous PlatformsAccelerating Real Time Applications on Heterogeneous Platforms
Accelerating Real Time Applications on Heterogeneous PlatformsIJMER
 
Scalable Data Analysis in R -- Lee Edlefsen
Scalable Data Analysis in R -- Lee EdlefsenScalable Data Analysis in R -- Lee Edlefsen
Scalable Data Analysis in R -- Lee EdlefsenRevolution Analytics
 
Hadoop mapreduce and yarn frame work- unit5
Hadoop mapreduce and yarn frame work-  unit5Hadoop mapreduce and yarn frame work-  unit5
Hadoop mapreduce and yarn frame work- unit5RojaT4
 
Simplified Data Processing On Large Cluster
Simplified Data Processing On Large ClusterSimplified Data Processing On Large Cluster
Simplified Data Processing On Large ClusterHarsh Kevadia
 
Understanding Parallelization of Machine Learning Algorithms in Apache Spark™
Understanding Parallelization of Machine Learning Algorithms in Apache Spark™Understanding Parallelization of Machine Learning Algorithms in Apache Spark™
Understanding Parallelization of Machine Learning Algorithms in Apache Spark™Databricks
 

Similar a Parallel External Memory Algorithms Applied to Generalized Linear Models (20)

IRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CLIRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CL
 
parallel-computation.pdf
parallel-computation.pdfparallel-computation.pdf
parallel-computation.pdf
 
Parallel computation
Parallel computationParallel computation
Parallel computation
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming Models
 
Complier design
Complier design Complier design
Complier design
 
20090720 smith
20090720 smith20090720 smith
20090720 smith
 
My Postdoctoral Research
My Postdoctoral ResearchMy Postdoctoral Research
My Postdoctoral Research
 
Intel Cluster Poisson Solver Library
Intel Cluster Poisson Solver LibraryIntel Cluster Poisson Solver Library
Intel Cluster Poisson Solver Library
 
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
 
Parallel Computing
Parallel ComputingParallel Computing
Parallel Computing
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
Swift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance WorkflowSwift Parallel Scripting for High-Performance Workflow
Swift Parallel Scripting for High-Performance Workflow
 

Parallel External Memory Algorithms Applied to Generalized Linear Models

  • 1. Parallel External Memory Algorithms Applied to Generalized Linear Models. Lee E. Edlefsen, Ph.D., Chief Scientist. JSM 2012.
  • 2. Introduction and overview (Revolution Confidential) – For the past several decades, the rising tide of technology has allowed the same data analysis code to handle the increase in the sizes of typical data sets. That era is ending: the size of data sets is increasing much more rapidly than the speed of single cores, of RAM, and of hard drives. To deal with this, statistical software must be able to use multiple cores and computers. Parallel external memory algorithms (PEMAs) provide a foundation for such software.
  • 3. Introduction and overview – (2) – External memory algorithms (EMAs) are those that do not require all data to be in RAM; they are widely available. – Parallel implementations of EMAs allow them to run on multiple cores and computers, and to process an unlimited number of rows of data. – This paper describes a general approach to efficiently parallelizing EMAs, using an R and C++ implementation of generalized linear models (GLM) as a detailed example.
  • 4. Introduction and overview – (3) – This paper discusses: the arrangement of code for “automatic” parallelization; the efficient use of cores; and the efficient use of multiple computers (nodes). – The approach presented is independent of the distributed computing platform (MPI, Hadoop, MPP database appliances). – The paper includes billion-row benchmarks showing linear scaling with rows and nodes, demonstrating that extremely high performance is achievable.
  • 5. High Performance Computing vs. High Performance Analytics – HPA is HPC + data. – High Performance Computing is CPU-centric: lots of processing on small amounts of data; the focus is on cores. – High Performance Analytics is data-centric: less processing per amount of data; the focus is on feeding data to the cores, on disk I/O and data locality, and on efficient threading and data management in RAM.
  • 6. High Performance Analytics in RevoScaleR – Extremely high-performance data management and data analysis. – Scales from small local data to huge distributed data, and from laptop to cluster to cloud. – Based on a platform that “automatically” and efficiently parallelizes and distributes a broad class of predictive analytic algorithms. – This platform implements the approach to parallel external memory algorithms I will describe.
  • 7. External memory algorithms – External memory algorithms are those that allow computations to be split into pieces so that not all data has to be in memory at one time. – Such algorithms process data a “chunk” at a time, storing intermediate results from each chunk and combining them at the end. – Each chunk must produce an intermediate result that can be combined with other intermediate results to give the final result. – Such algorithms are widely available for data management and predictive analytics.
  • 8. Parallel external memory algorithms (PEMAs) – PEMAs are external memory algorithms that have been parallelized. – Such algorithms process data a chunk at a time in parallel, storing intermediate results from each chunk and combining them at the end. – External memory algorithms that are not “inherently sequential” can be parallelized: results for one chunk of data cannot depend upon prior results, although data dependence (lags, leads) is OK.
  • 9. Generalized Linear Models (GLM) – The generalized linear model can be thought of as a generalization of linear regression. – It extends linear regression to handle dependent variables generated from distributions in the exponential family, supporting Gaussian, Poisson, logistic, gamma, binomial, multinomial, and Tweedie models. – Generalized linear models are widely used in a variety of fields and industries.
  • 10. GLM overview – The dependent variable Y is generated from a distribution in the exponential family. – The expected value of Y is related to a linear predictor of the data X and parameters β through the inverse of a “link” function g(): E(Y) = mu = g⁻¹(Xβ). – The variance of Y is typically a function V() of the mean mu: Var(Y) = varmu = V(mu).
  • 11. GLM Estimation – The parameters of GLM models can be estimated using maximum likelihood. – Iteratively reweighted least squares (IRLS) is commonly used to obtain the maximum likelihood estimates. – Each iteration of IRLS requires at least one pass through the data, generating a vector of weights and a “new” dependent variable, and then doing a weighted least squares regression.
  • 12. IRLS for GLM – Given an estimate of the parameters β and the data X, IRLS requires the computation of a “weight” variable W and a “new” dependent variable Z: eta = Xβ; mu = linkinv(eta); Z = (y − mu)/mu_eta, where mu_eta is the partial derivative of mu with respect to eta; W = sqrt(mu_eta*mu_eta/varmu). – The next β is then computed by regressing Z on X, weighted by W. – If the estimation has not converged, the steps are repeated.
  • 13. In-memory implementations – The glm() function in R provides a beautiful and efficient in-memory implementation. – However, nearly every computational line of code involves processing all rows of data, so there is no easy way to directly convert an implementation like this into one that can handle data too big to fit into memory and that can use multiple cores and multiple computers. – It can be accomplished, however, by arranging the same computations into separate functions that accomplish separate tasks.
  • 14. Example external memory algorithm for the mean of a variable – Initialize(): total = 0, count = 0. – ProcessData(): for each chunk of x: total = sum(x), count = length(x). – UpdateResults(): total12 = total1 + total2, count12 = count1 + count2. – ProcessResults(): mean = combined total / combined count.
  • 15. A formalization of PEMAs – Arrange the code into 4 functions: 1. Initialize(): does any necessary initialization. 2. ProcessData(): takes a chunk of data and produces an intermediate result (IR); this is the only function run in parallel; it must assume it does not have all the data, and it must produce no side effects. 3. UpdateResults(): takes two IRs and produces another IR, equivalent to the IR that would have been produced by combining the two corresponding chunks of data and calling ProcessData(). 4. ProcessResults(): takes any given IR and converts it into a “final results” (FR) form.
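The four functions above can be made concrete with the mean example of slide 14. This is an illustrative C++ sketch (C++ being the implementation language described later), not the actual RevoScaleR code; the name MeanIR and the free-function signatures are assumptions for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Intermediate result (IR) for one or more chunks.
// Initialize() corresponds to the zero defaults: total = 0, count = 0.
struct MeanIR {
    double total = 0.0;
    std::size_t count = 0;
};

// ProcessData(): consume one chunk of x and produce an IR (no side effects).
MeanIR ProcessData(const std::vector<double>& chunk) {
    MeanIR ir;
    for (double x : chunk) { ir.total += x; ++ir.count; }
    return ir;
}

// UpdateResults(): combine two IRs into the IR for the combined chunks.
MeanIR UpdateResults(const MeanIR& a, const MeanIR& b) {
    return MeanIR{a.total + b.total, a.count + b.count};
}

// ProcessResults(): convert any IR into the final result (FR), the mean.
double ProcessResults(const MeanIR& ir) {
    return ir.total / static_cast<double>(ir.count);
}
```

Because UpdateResults() is associative, the per-chunk IRs can be combined in any order, which is what lets ProcessData() run in parallel across threads and nodes.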
  • 16. An external memory algorithm for GLM – Initialize(): set intermediate values to 0. – ProcessData(): for a given β and chunk of data X, compute Z, W, and M, the weighted cross-products matrix of X and Z for this chunk: eta = Xβ; mu = linkinv(eta); Z = (y − mu)/mu_eta; W = sqrt(mu_eta*mu_eta/varmu); M = [X*W Z*W]’[X*W Z*W]. – UpdateResults(): M12 = M1 + M2. – ProcessResults(): β = Solve(M) (solves a set of linear equations). – Check for convergence and repeat if necessary.
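As a hypothetical, minimal instance of the algorithm above, the C++ sketch below performs one chunked IRLS step for a logistic model with an intercept and a single predictor. It is not the RevoScaleR code, and it simplifies in three labeled ways: it accumulates X′WX and X′WZ separately rather than the single matrix M; it uses the standard working response Z = eta + (y − mu)/mu_eta, so that ProcessResults() yields the next β directly; and it hard-codes the logistic link, for which mu_eta = varmu = mu(1 − mu), so the effective IRLS weight mu_eta²/varmu reduces to mu_eta.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Intermediate result: weighted cross-products for the chunks seen so far.
struct GlmIR {
    double xtwx[2][2] = {{0.0, 0.0}, {0.0, 0.0}};  // X' W X
    double xtwz[2]    = {0.0, 0.0};                // X' W Z
};

// ProcessData(): for the current beta and one chunk (x, y), accumulate the
// weighted cross-products. Only this function needs to see the data.
GlmIR ProcessData(const std::vector<double>& x,
                  const std::vector<double>& y,
                  const std::array<double, 2>& beta) {
    GlmIR ir;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double row[2] = {1.0, x[i]};            // intercept + predictor
        double eta = beta[0] + beta[1] * x[i];
        double mu = 1.0 / (1.0 + std::exp(-eta));     // logistic linkinv
        double mu_eta = mu * (1.0 - mu);              // d mu / d eta
        double w = mu_eta;                            // mu_eta^2/varmu here
        double z = eta + (y[i] - mu) / mu_eta;        // working response
        for (int j = 0; j < 2; ++j) {
            for (int k = 0; k < 2; ++k) ir.xtwx[j][k] += w * row[j] * row[k];
            ir.xtwz[j] += w * row[j] * z;
        }
    }
    return ir;
}

// UpdateResults(): M12 = M1 + M2, elementwise.
GlmIR UpdateResults(const GlmIR& a, const GlmIR& b) {
    GlmIR c;
    for (int j = 0; j < 2; ++j) {
        for (int k = 0; k < 2; ++k) c.xtwx[j][k] = a.xtwx[j][k] + b.xtwx[j][k];
        c.xtwz[j] = a.xtwz[j] + b.xtwz[j];
    }
    return c;
}

// ProcessResults(): solve the 2x2 normal equations for the next beta.
std::array<double, 2> ProcessResults(const GlmIR& ir) {
    double det = ir.xtwx[0][0] * ir.xtwx[1][1] - ir.xtwx[0][1] * ir.xtwx[1][0];
    return {(ir.xtwx[1][1] * ir.xtwz[0] - ir.xtwx[0][1] * ir.xtwz[1]) / det,
            (ir.xtwx[0][0] * ir.xtwz[1] - ir.xtwx[1][0] * ir.xtwz[0]) / det};
}
```

Repeating ProcessData() over the chunks, folding with UpdateResults(), and calling ProcessResults() until β stops changing reproduces the IRLS loop of slide 12, one data pass per iteration.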
  • 17. A C++ and R implementation of GLM – C++ “analysis” objects: have 4 virtual PEMA methods, among others; have member variables for intermediate results and for maintaining local state; know how to copy themselves (including the ability, for efficiency, to not copy some members); have the ability to call into R during ProcessData(). – R “family” objects for glm: contain methods for computing Z and W (eta, mu, etc.).
  • 18. GLM in C++ and R: Multiple Cores – On each computer, a master analysis object makes a copy of itself for all usable threads (cores) except one. – The remaining thread is assigned to handle all I/O. – In a master loop over the data, the I/O object reads a chunk of data. – In parallel (after the first read), portions of the previously read chunk are (virtually) passed to the ProcessData() methods of the other objects.
  • 19. GLM in C++ and R: Multiple Cores – (2) – For each chunk of data, Z and W are computed (in R or C++; if in R, only one thread at a time is allowed); Xβ and M are computed in C++. – After all data has been consumed, the master analysis object loops over all of the thread-specific objects and updates itself (using UpdateResults()), producing the intermediate results object corresponding to all of the data processed on this computer. – If other computers are being used, this computer sends its intermediate results to the “master” node.
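The per-computer threading pattern of the last two slides can be sketched as follows. This is an illustrative C++ std::thread sketch, not the actual RevoScaleR code: a simple sum stands in for the GLM cross-products, and the dedicated I/O thread and read-ahead of the next chunk are omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Stand-in intermediate result (a sum instead of the GLM matrix M).
struct IR { double total = 0.0; };

// ProcessData(): one thread's portion of the chunk -> an IR.
IR ProcessData(const double* begin, const double* end) {
    return IR{std::accumulate(begin, end, 0.0)};
}

// UpdateResults(): combine two IRs.
IR UpdateResults(const IR& a, const IR& b) { return IR{a.total + b.total}; }

// Master loop body: hand disjoint portions of the chunk to worker threads,
// each writing into its own per-thread IR (no shared mutable state), then
// join and fold the per-thread IRs together on the master.
IR ProcessChunkThreaded(const std::vector<double>& chunk, unsigned nThreads) {
    std::vector<IR> irs(nThreads);
    std::vector<std::thread> workers;
    const std::size_t per = chunk.size() / nThreads;
    for (unsigned t = 0; t < nThreads; ++t) {
        const double* b = chunk.data() + t * per;
        const double* e = (t + 1 == nThreads) ? chunk.data() + chunk.size()
                                              : b + per;
        workers.emplace_back([&irs, t, b, e] { irs[t] = ProcessData(b, e); });
    }
    for (auto& w : workers) w.join();
    IR combined;
    for (const IR& ir : irs) combined = UpdateResults(combined, ir);
    return combined;
}
```

Giving each thread its own IR is what makes the side-effect-free requirement on ProcessData() (slide 15) pay off: no locks are needed during the data pass, only at the cheap combine step.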
  • 20. GLM in C++ and R: Multiple MPI Nodes – A “master node” sends a copy of the analysis object, or instructions on how to create one, to each computer (node) on a cluster/grid, and the steps described above are carried out. – Each node reads and processes its portion of the data (the more local the data, the better). – Worker nodes do not communicate with each other. – Worker nodes do not communicate with the master node except to send their results.
  • 21. GLM in C++ and R: Multiple MPI Nodes – (2) – When each node has its final IR object, it sends it to the master node. – The master node gathers and combines all intermediate results using UpdateResults(). – When it has the final intermediate results, it calls ProcessResults() to get the next estimate of β. – The master node checks for convergence, and repeats all of the steps if necessary.
  • 22. Implementation in RevoScaleR – The package RevoScaleR, which is part of Revolution R Enterprise, contains an implementation of GLM and other algorithms based on this approach. – The algorithms are internally threaded. – They can currently use MPI or RPC for inter-process communication. – Supports Platform LSF and HPC Server schedulers. – We are currently working on supporting Hadoop.
  • 23. Some features of this implementation – Handles an arbitrarily large number of rows in a fixed amount of memory. – Scales linearly with the number of rows. – Scales (approximately) linearly with the number of nodes. – Scales well with the number of cores per node. – Scales well with the number of parameters. – Works on commodity hardware. – Extremely high performance.
  • 24. Scalability of linear regression with rows: 1 million – 1 billion rows, 443 betas, 4 cores. [Chart: elapsed time in seconds vs. number of rows, linear at roughly 1.1 million rows/second.]
  • 25. Scalability of glm (logit) with rows: 1 million – 1 billion rows, 443 betas, 4 cores. [Chart: elapsed time in seconds vs. number of rows, scaling linearly.]
  • 26. Scalability with nodes: glm (logit) – Big (1B rows) and small (124M rows) data; big (443 params) and small (7 params) models; 4 cores per node; 5 iterations per model. [Chart: scaling with number of nodes for Big Data/Big Model (super scaling), Big Data/Small Model, Small Data/Big Model, and Small Data/Small Model, against a linear-scaling reference.]
  • 27. Timing comparisons – glm() in CRAN R vs. rxGlm in RevoScaleR. – SAS’s new HPA functionality vs. rxGlm.
  • 29. HPA Benchmarking comparison* – logistic regression (SAS HPA vs. Revolution R): rows of data: 1 billion vs. 1 billion; parameters: “just a few” vs. 7; time: 80 seconds vs. 44 seconds; data location: in memory vs. on disk; nodes: 32 vs. 5; cores: 384 vs. 20; RAM: 1,536 GB vs. 80 GB. Revolution R is faster on the same amount of data, despite using approximately a 20th as many cores, a 20th as much RAM, a 6th as many nodes, and not pre-loading data into RAM. *As published by SAS in HPC Wire, April 21, 2011.
  • 30. Conclusion – PEMAs provide a systematic approach to scalable analytic algorithms. – Algorithms implemented in this way can handle unlimited numbers of rows on a single core in a fixed amount of RAM. – Such algorithms scale well with rows and nodes, and scale well with cores up to a point. – They work on commodity hardware and on different distributed computing platforms. – Extremely high performance is possible.
  • 31. Thank you! – R-Core Team – R Package Developers – R Community – Revolution R Enterprise Customers and Beta Testers – Colleagues at Revolution Analytics. Contact: lee@revolutionanalytics.com