4. Prostate Cancer Example
Data:
x = PSA (prostate-specific antigen)
y = CAPSULE (0 = no tumour, 1 = tumour)
Goal: predict y using x
5. Prostate Cancer Example
Linear Regression Fit
Data:
x = PSA (prostate-specific antigen)
y = CAPSULE (0 = no tumour, 1 = tumour)
Fit: least squares fit
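Worth noting (not stated on the slide): the least-squares line \hat{y} = \beta_0 + \beta_1 x is unbounded, so its fitted values can fall outside [0, 1] even though y is binary; this is the shortcoming the GLM on the following slides addresses.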
6. Generalized Linear Model
Generalizes linear regression by:
– adding a link function g to transform the output
z = g(y) – new response variable
– noise (i.e. variance) does not have to be constant
– fit is by maximum likelihood instead of least squares
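For the binomial family used on the next slide the link is the logit; writing it out (standard definitions, not spelled out on the slide):

g(\mu) = \log \frac{\mu}{1-\mu}, \qquad g^{-1}(\eta) = \frac{1}{1+e^{-\eta}}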
7. Prostate Cancer
Logistic Regression Fit
Data:
x = PSA (prostate-specific antigen)
y = CAPSULE (0 = no tumour, 1 = tumour)
GLM Fit:
– Binomial family
– Logit link
– Predict probability of CAPSULE=1
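Written out, the fitted model predicts (the standard logistic-regression form, made explicit here):

P(\mathrm{CAPSULE} = 1 \mid \mathrm{PSA}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \cdot \mathrm{PSA})}}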
8. Implementation - Solve GLM by IRLSM
Input:
– X: data matrix (N × P)
– Y: response vector (N rows)
– family, link function, α, λ (elastic-net penalty parameters)
OUTER LOOP:
While β changes, compute:
z_{k+1} = X\beta_k + (y - \mu_k)\,\frac{d\eta}{d\mu}
W_{k+1}^{-1} = \left(\frac{d\eta}{d\mu}\right)^{2} \mathrm{Var}(\mu_k)
XX = X^T W_{k+1} X
Xz = X^T W z_{k+1}
INNER LOOP:
Solve the elastic-net subproblem by ADMM (Boyd 2010, page 43):
\gamma^{l+1} = (X^T W X + \rho I)^{-1} \left( X^T W z + \rho(\beta^l - u^l) \right)
\beta^{l+1} = S_{\lambda/\rho}(\gamma^{l+1} + u^l)
u^{l+1} = u^l + \gamma^{l+1} - \beta^{l+1}
Output:
– β: vector of coefficients, solution to maximum likelihood
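Here S_{\lambda/\rho} is the soft-thresholding (shrinkage) operator; spelling out its standard definition (it matches the shrinkage function in the code on the next slide):

S_\kappa(x) = \max(0,\, x-\kappa) - \max(0,\, -x-\kappa) = \mathrm{sign}(x)\,\max(|x|-\kappa,\, 0)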
9. H2O Implementation
Outer Loop:
(Map Reduce Task)
public class SimpleGLM extends MRTask {
  double [][] res; // per-chunk partial Gram matrix X^T*W*X (the XX input of the inner loop)
  @Override public void map(Chunk c) {
    res = new double [p][p];
    for( double [] x : c.rows() ) {
      double eta, mu, var;
      eta = computeEta(x);
      mu = _link.linkInv(eta);
      // guard against zero variance (e.g. mu near 0 or 1 for the binomial family)
      var = Math.max(1e-5, _family.variance(mu));
      double dp = _link.linkInvDeriv(eta);
      double w = dp*dp/var; // IRLSM weight for this row
      for(int i = 0; i < x.length; ++i)
        for(int j = 0; j < x.length; ++j)
          res[i][j] += x[i]*x[j]*w;
    }
  }
  @Override public void reduce(SimpleGLM g) {
    // merge the partial Gram matrices computed on other chunks
    for(int i = 0; i < res.length; ++i)
      for(int j = 0; j < res[i].length; ++j)
        res[i][j] += g.res[i][j];
  }
}
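The map pass above accumulates only XX = X^T W X; the right-hand side Xz = X^T W z_{k+1} has to be accumulated in the same pass. A minimal sketch of that extra accumulation, assuming a response value y is available per row (the xz field and this placement are assumptions, not shown on the slide):

// ASSUMED extension, not on the slide: accumulate Xz = X^T*W*z alongside res
double [] xz; // partial X^T*W*z, merged in reduce() the same way as res
// inside map(), after w is computed for row x with response y:
double z = eta + (y - mu)/dp;   // working response z_{k+1}; note dp = dmu/deta
for(int i = 0; i < x.length; ++i)
  xz[i] += x[i]*w*z;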
Inner Loop:
(ADMM solver)
public double [] solve(Matrix xx, Matrix xy) {
  // ADMM LSM solve; xx = X^T*W*X + rho*I (slide 8), xy = X^T*W*z
  // z, u, kappa, rho and N are fields of the solver (setup omitted on the slide)
  CholeskyDecomposition lu; // cache decomp!
  lu = new CholeskyDecomposition(xx);
  for( int i = 0; i < 1000; ++i ) {
    // gamma update: xyPrime = xy + rho*(z - u), recomputed every iteration;
    // solve using the cached Cholesky decomposition!
    Matrix xm = lu.solve(xyPrime);
    // compute u and z update
    double x_norm = 0, u_norm = 0;
    for( int j = 0; j < N-1; ++j ) {
      double x_hat = xm.get(j, 0);
      x_norm += x_hat * x_hat;
      double zold = z[j]; // kept for the convergence test (not shown here)
      z[j] = shrinkage(x_hat + u[j], kappa);
      u[j] += x_hat - z[j];
      u_norm += u[j] * u[j];
    }
    // convergence test on the primal/dual residuals omitted on the slide
  }
  return z; // the beta vector of slide 8
}
// soft-thresholding operator S_k(x) from slide 8
double shrinkage(double x, double k) {
  return Math.max(0,x-k)-Math.max(0,-x-k);
}
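For orientation, here is how the two pieces might be driven together; a minimal sketch only, where runMapReduce, betaChanged, asMatrix and the xz field are illustrative assumptions, not H2O's actual API:

// minimal sketch; runMapReduce/betaChanged/asMatrix are illustrative helpers
double [] beta = new double[P];   // start from beta = 0
while( betaChanged(beta) ) {      // outer loop: "while beta changes" (slide 8)
  SimpleGLM g = runMapReduce();   // one data pass: accumulate XX (and Xz)
  beta = solve(asMatrix(g.res), asMatrix(g.xz)); // inner ADMM loop
}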
10. Regularization
Elastic Net (Zou, Hastie, 2005):
● Adds an L1 and an L2 penalty on β to:
– avoid overfitting, reduce variance
– obtain a sparse solution (L1 penalty)
– avoid problems with correlated covariates
There is no longer an analytical solution.
Options: LARS, ADMM, Generalized Gradient, ...

\beta = \arg\min_\beta \, (X\beta - y)^T (X\beta - y) + \alpha \|\beta\|_1 + (1-\alpha) \|\beta\|_2^2
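For orientation (standard elastic-net facts, not on the slide), the two extremes of α recover the classical penalties:

\alpha = 1 \Rightarrow \text{lasso (pure } L_1\text{)}, \qquad \alpha = 0 \Rightarrow \text{ridge (pure } L_2\text{)}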
11. Linear Regression
Least Squares Method
Find β by minimizing the sum of squared errors:

\beta = \arg\min_\beta \, (X\beta - y)^T (X\beta - y)

Analytical solution:

\beta = (X^T X)^{-1} X^T y = \left( \frac{1}{n} \sum_i x_i x_i^T \right)^{-1} \frac{1}{n} \sum_i x_i y_i

Easily parallelized if X^T X is reasonably small.
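The parallelization argument, spelled out (a standard observation, not on the slide): each node computes the partial sums \sum_i x_i x_i^T and \sum_i x_i y_i over its own rows, and the partial results are simply added; only the P × P matrix X^T X moves over the network, and the final solve costs O(P^3) regardless of N.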
12. Generalized Linear Model
● Generalizes linear regression by:
– adding a link function g to transform the response
z = g(y) – new response variable
η = Xβ – linear predictor
μ = g⁻¹(η)
– y has a distribution in the exponential family
– variance depends on μ
e.g. Var(μ) = μ(1−μ) for the Binomial family
– fit by maximizing the likelihood of the model
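For reference, the variance functions of the most common exponential-family choices (standard results, added for context):

Gaussian: Var(μ) = 1 (constant variance; the GLM reduces to ordinary least squares)
Poisson: Var(μ) = μ
Binomial: Var(μ) = μ(1 − μ)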