SlideShare a Scribd company logo
1 of 120
Download to read offline
Introduction to Machine Learning
with H2O and Python
Jo-fai (Joe) Chow
Data Scientist
joe@h2o.ai
@matlabulous
H2O Tutorial at Analyx
20th April, 2017
Slides and Code Examples:
bit.ly/joe_h2o_tutorials
2
About Me
• Civil (Water) Engineer
• 2010 – 2015
• Consultant (UK)
• Utilities
• Asset Management
• Constrained Optimization
• Industrial PhD (UK)
• Infrastructure Design Optimization
• Machine Learning +
Water Engineering
• Discovered H2O in 2014
• Data Scientist
• 2015
• Virgin Media (UK)
• Domino Data Lab (Silicon Valley,
US)
• 2016 – Present
• H2O.ai (Silicon Valley, US)
3
About Me
4
Side Project #1 – Crime Data Visualization
5
https://github.com/woobe/rApps/tree/master/crimemap
http://insidebigdata.com/2013/11/30/visualization-week-crimemap/
Side Project #2 – Data Visualization Contest
6
https://github.com/woobe/rugsmaps http://blog.revolutionanalytics.com/2014/08/winner-for-revolution-analytics-user-group-map-contest.html
Side Project #3
7
Developing R Packages for Fun
rPlotter (2014)
About Me
8
R + H2O + Domino for Kaggle
Guest Blog Post for Domino & H2O (2014)
• The Long Story
• bit.ly/joe_kaggle_story
Agenda
• About H2O.ai
• Company
• Machine Learning Platform
• Tutorial
• H2O Python Module
• Download & Install
• Step-by-Step Examples:
• Basic Data Import / Manipulation
• Regression & Classification (Basics)
• Regression & Classification (Advanced)
• Using H2O in the Cloud
9
Agenda
• About H2O.ai
• Company
• Machine Learning Platform
• Tutorial
• H2O Python Module
• Download & Install
• Step-by-Step Examples:
• Basic Data Import / Manipulation
• Regression & Classification (Basics)
• Regression & Classification (Advanced)
• Using H2O in the Cloud
10
Background Information
For beginners
As if I am working on
Kaggle competitions
Short Break
About H2O.ai
11
Company Overview
Founded 2011 Venture-backed, debuted in 2012
Products • H2O Open Source In-Memory AI Prediction Engine
• Sparkling Water
• Steam
Mission Operationalize Data Science, and provide a platform for users to build beautiful data products
Team 70 employees
• Distributed Systems Engineers doing Machine Learning
• World-class visualization designers
Headquarters Mountain View, CA
12
13
Our Team
Joe
Scientific Advisory Council
14
15
0
10000
20000
30000
40000
50000
60000
70000
1-Jan-15 1-Jul-15 1-Jan-16 1-Oct-16
# H2O Users
H2O Community Growth
Tremendous Momentum Globally
65,000+ users globally
(Sept 2016)
• 65,000+ users from
~8,000 companies in 140
countries. Top 5 from:
Large User Circle
* DATA FROM GOOGLE ANALYTICS EMBEDDED IN THE END USER PRODUCT
16
0
2000
4000
6000
8000
10000
1-Jan-15 1-Jul-15 1-Jan-16 1-Oct-16
# Companies Using H2O ~8,000+ companies
(Sept 2016)
+127%
+60%
#AroundTheWorldWithH2Oai
17
H2O for Kaggle Competitions
18
H2O for Academic Research
19
http://www.sciencedirect.com/science/article/pii/S0377221716308657
https://arxiv.org/abs/1509.01199
Users In Various Verticals Adore H2O
Financial Insurance MarketingTelecom Healthcare
20
21
Joe (2015)
http://www.h2o.ai/gartner-magic-quadrant/
22
Check
out our
website
h2o.ai
H2O Machine Learning Platform
23
24
25
H2O Overview
26
HDFS
S3
NFS
Distributed
In-Memory
Load Data
Loss-less
Compression
H2O Compute Engine
Production Scoring Environment
Exploratory &
Descriptive
Analysis
Feature
Engineering &
Selection
Supervised &
Unsupervised
Modeling
Model
Evaluation &
Selection
Predict
Data & Model
Storage
Model Export:
Plain Old Java Object
Your
Imagination
Data Prep Export:
Plain Old Java Object
Local
SQL
High Level Architecture
27
HDFS
S3
NFS
Distributed
In-Memory
Load Data
Loss-less
Compression
H2O Compute Engine
Production Scoring Environment
Exploratory &
Descriptive
Analysis
Feature
Engineering &
Selection
Supervised &
Unsupervised
Modeling
Model
Evaluation &
Selection
Predict
Data & Model
Storage
Model Export:
Plain Old Java Object
Your
Imagination
Data Prep Export:
Plain Old Java Object
Local
SQL
High Level Architecture
28
Import Data from
Multiple Sources
HDFS
S3
NFS
Distributed
In-Memory
Load Data
Loss-less
Compression
H2O Compute Engine
Production Scoring Environment
Exploratory &
Descriptive
Analysis
Feature
Engineering &
Selection
Supervised &
Unsupervised
Modeling
Model
Evaluation &
Selection
Predict
Data & Model
Storage
Model Export:
Plain Old Java Object
Your
Imagination
Data Prep Export:
Plain Old Java Object
Local
SQL
High Level Architecture
29
Fast, Scalable & Distributed
Compute Engine Written in
Java
HDFS
S3
NFS
Distributed
In-Memory
Load Data
Loss-less
Compression
H2O Compute Engine
Production Scoring Environment
Exploratory &
Descriptive
Analysis
Feature
Engineering &
Selection
Supervised &
Unsupervised
Modeling
Model
Evaluation &
Selection
Predict
Data & Model
Storage
Model Export:
Plain Old Java Object
Your
Imagination
Data Prep Export:
Plain Old Java Object
Local
SQL
High Level Architecture
30
Fast, Scalable & Distributed
Compute Engine Written in
Java
Supervised Learning
• Generalized Linear Models: Binomial,
Gaussian, Gamma, Poisson and Tweedie
• Naïve Bayes
Statistical
Analysis
Ensembles
• Distributed Random Forest: Classification
or regression models
• Gradient Boosting Machine: Produces an
ensemble of decision trees with increasing
refined approximations
Deep Neural
Networks
• Deep learning: Create multi-layer feed
forward neural networks starting with an
input layer followed by multiple layers of
nonlinear transformations
Algorithms Overview
Unsupervised Learning
• K-means: Partitions observations into k
clusters/groups of the same spatial size.
Automatically detect optimal k
Clustering
Dimensionality
Reduction
• Principal Component Analysis: Linearly transforms
correlated variables to independent components
• Generalized Low Rank Models: extend the idea of
PCA to handle arbitrary data consisting of numerical,
Boolean, categorical, and missing data
Anomaly
Detection
• Autoencoders: Find outliers using a
nonlinear dimensionality reduction using
deep learning
31
H2O Deep Learning in Action
32
HDFS
S3
NFS
Distributed
In-Memory
Load Data
Loss-less
Compression
H2O Compute Engine
Production Scoring Environment
Exploratory &
Descriptive
Analysis
Feature
Engineering &
Selection
Supervised &
Unsupervised
Modeling
Model
Evaluation &
Selection
Predict
Data & Model
Storage
Model Export:
Plain Old Java Object
Your
Imagination
Data Prep Export:
Plain Old Java Object
Local
SQL
High Level Architecture
33
Multiple Interfaces
H2O + Python
34
H2O + R
35
36
H2O Flow (Web) Interface
HDFS
S3
NFS
Distributed
In-Memory
Load Data
Loss-less
Compression
H2O Compute Engine
Production Scoring Environment
Exploratory &
Descriptive
Analysis
Feature
Engineering &
Selection
Supervised &
Unsupervised
Modeling
Model
Evaluation &
Selection
Predict
Data & Model
Storage
Model Export:
Plain Old Java Object
Your
Imagination
Data Prep Export:
Plain Old Java Object
Local
SQL
High Level Architecture
37
Export Standalone Models
for Production
38
docs.h2o.ai
H2O + Python Tutorial
39
Learning Objectives
• Start and connect to a local H2O cluster from Python.
• Import data from Python data frames, local files or web.
• Perform basic data transformation and exploration.
• Train regression and classification models using various H2O machine
learning algorithms.
• Evaluate models and make predictions.
• Improve performance by tuning and stacking.
• Connect to H2O cluster in the cloud.
40
41
Install H2O
h2o.ai -> Download -> Install in Python
42
43
Start and Connect to a
Local H2O Cluster
py_01_data_in_h2o.ipynb
44
Local H2O Cluster
45
Import H2O module
Start a local H2O cluster
nthreads = -1 means
using ALL CPU resources
46
Information of Cluster
Importing Data into H2O
py_01_data_in_h2o.ipynb
47
48
Import data into H2O cluster
(instead of Python’s memory)
49
Directly from data on the web
50
Convert from Pandas
to H2O data frame
Basic Data Transformation &
Exploration
py_02_data_manipulation.ipynb
(see notebooks)
51
52
The Classic
Titanic Dataset
53
Only two unique values
(0 or 1)
54
“enum” is the data type of
categorical data in Java
Convert numerical to
categorical values
55
Only three unique values
(1, 2 or 3)
Regression Models (Basics)
py_03a_regression_basics.ipynb
56
Supervised Learning
• Generalized Linear Models: Binomial,
Gaussian, Gamma, Poisson and Tweedie
• Naïve Bayes
Statistical
Analysis
Ensembles
• Distributed Random Forest: Classification
or regression models
• Gradient Boosting Machine: Produces an
ensemble of decision trees with increasing
refined approximations
Deep Neural
Networks
• Deep learning: Create multi-layer feed
forward neural networks starting with an
input layer followed by multiple layers of
nonlinear transformations
Algorithms Overview
Unsupervised Learning
• K-means: Partitions observations into k
clusters/groups of the same spatial size.
Automatically detect optimal k
Clustering
Dimensionality
Reduction
• Principal Component Analysis: Linearly transforms
correlated variables to independent components
• Generalized Low Rank Models: extend the idea of
PCA to handle arbitrary data consisting of numerical,
Boolean, categorical, and missing data
Anomaly
Detection
• Autoencoders: Find outliers using a
nonlinear dimensionality reduction using
deep learning
57
58
docs.h2o.ai
59
11 Numerical Features
Target
60
Define 11 Numerical
Features using their
Column Names
61
Split dataset so we can
measure out-of-bag
performance later
62
Basic H2O Usage for GLM
63
Regression Performance – MSE
Lower the better
64
Model Summary
65
Evaluate model performance
using test set
66
67
API for other ML
algorithms
68
API for other ML
algorithms
Classification Models (Basics)
py_04_classification_basics.ipynb
69
70
Target
71
Convert numerical to
categorical values
72
Define features manually
Split dataset so we can
measure out-of-bag
performance later
73
Basic H2O Usage for GLM
Classification Performance – Confusion Matrix
74
Confusion Matrix
75
76
Model Summary
77
Evaluate model performance
using test set
78
Predicted
Class
Probabilities of Each Class
79
API for other ML
algorithms
80
API for other ML
algorithms
End of Basics
Let’s have a break ☺
81
Regression Models (Tuning)
py_03b_regression_grid_search.ipynb
82
Improving Model Performance (Step-by-Step)
83
Model Settings MSE (CV) MSE (Test)
GBM with default settings N/A 0.4551
GBM with manual settings N/A 0.4433
Manual settings + cross-validation 0.4502 0.4433
Manual + CV + early stopping 0.4429 0.4287
CV + early stopping + full grid search 0.4378 0.4196
CV + early stopping + random grid search 0.4227 0.4047
Stacking models from random grid search N/A 0.3969
Lower Mean
Square Error
=
Better
Performance
84
Using same dataset and split
as previous tutorial
85
Baseline Model
Write down MSE on Test set
Improving Model Performance (Step-by-Step)
86
Model Settings MSE (CV) MSE (Test)
GBM with default settings N/A 0.4551
GBM with manual settings N/A 0.4433
Manual settings + cross-validation 0.4502 0.4433
Manual + CV + early stopping 0.4429 0.4287
CV + early stopping + full grid search 0.4378 0.4196
CV + early stopping + random grid search 0.4227 0.4047
Stacking models from random grid search N/A 0.3969
87
Manual settings
based on experience
Improving Model Performance (Step-by-Step)
88
Model Settings MSE (CV) MSE (Test)
GBM with default settings N/A 0.4551
GBM with manual settings N/A 0.4433
Manual settings + cross-validation 0.4502 0.4433
Manual + CV + early stopping 0.4429 0.4287
CV + early stopping + full grid search 0.4378 0.4196
CV + early stopping + random grid search 0.4227 0.4047
Stacking models from random grid search N/A 0.3969
Cross-Validation
89
90
Manual settings
based on experience
+
5-fold CV
91
Average MSE
from 5-fold CV
Improving Model Performance (Step-by-Step)
92
Model Settings MSE (CV) MSE (Test)
GBM with default settings N/A 0.4551
GBM with manual settings N/A 0.4433
Manual settings + cross-validation 0.4502 0.4433
Manual + CV + early stopping 0.4429 0.4287
CV + early stopping + full grid search 0.4378 0.4196
CV + early stopping + random grid search 0.4227 0.4047
Stacking models from random grid search N/A 0.3969
Early Stopping
93
94
Search for
lowest MSE
from 5-fold CV
Improving Model Performance (Step-by-Step)
95
Model Settings MSE (CV) MSE (Test)
GBM with default settings N/A 0.4551
GBM with manual settings N/A 0.4433
Manual settings + cross-validation 0.4502 0.4433
Manual + CV + early stopping 0.4429 0.4287
CV + early stopping + full grid search 0.4378 0.4196
CV + early stopping + random grid search 0.4227 0.4047
Stacking models from random grid search N/A 0.3969
Grid Search
96
Combination Parameter 1 Parameter 2
1 0.7 0.7
2 0.7 0.8
3 0.7 0.9
4 0.8 0.7
5 0.8 0.8
6 0.8 0.9
7 0.9 0.7
8 0.9 0.8
9 0.9 0.9
97
98
Sort Results by MSE
Best Model on Top
Lowest MSE
99
Stopped at 187 trees
(automatic)
Improving Model Performance (Step-by-Step)
100
Model Settings MSE (CV) MSE (Test)
GBM with default settings N/A 0.4551
GBM with manual settings N/A 0.4433
Manual settings + cross-validation 0.4502 0.4433
Manual + CV + early stopping 0.4429 0.4287
CV + early stopping + full grid search 0.4378 0.4196
CV + early stopping + random grid search 0.4227 0.4047
Stacking models from random grid search N/A 0.3969
101
Expand Search Space
Only search for 9
combinations
102
Sort Results by MSE
Best Model on Top
Lowest MSE
Improving Model Performance (Step-by-Step)
103
Model Settings MSE (CV) MSE (Test)
GBM with default settings N/A 0.4551
GBM with manual settings N/A 0.4433
Manual settings + cross-validation 0.4502 0.4433
Manual + CV + early stopping 0.4429 0.4287
CV + early stopping + full grid search 0.4378 0.4196
CV + early stopping + random grid search 0.4227 0.4047
Stacking models from random grid search N/A 0.3969
Regression Models (Ensembles)
py_03c_regression_ensembles.ipynb
104
105
https://github.com/h2oai/h2o-
meetups/blob/master/2017_02_23_
Metis_SF_Sacked_Ensembles_Deep_
Water/stacked_ensembles_in_h2o_fe
b2017.pdf
106
Keep the Best Model after
Random Grid Search
107
Keep the Best Model after
Random Grid Search
108
Keep the Best Model after
Random Grid Search
109
Lowest MSE =
Best Performance
API for Stacked Ensembles
Use the three models
from previous steps
Improving Model Performance (Step-by-Step)
110
Model Settings MSE (CV) MSE (Test)
GBM with default settings N/A 0.4551
GBM with manual settings N/A 0.4433
Manual settings + cross-validation 0.4502 0.4433
Manual + CV + early stopping 0.4429 0.4287
CV + early stopping + full grid search 0.4378 0.4196
CV + early stopping + random grid search 0.4227 0.4047
Stacking models from random grid search N/A 0.3969
Lowest MSE =
Best Performance
Classification Models (Ensembles)
py_04_classification_ensembles.ipynb
111
112
Highest AUC =
Best Performance
H2O in the Cloud
py_05_h2o_in_the_cloud.ipynb
113
114
115
Recap
116
Learning Objectives
• Start and connect to a local H2O cluster from Python.
• Import data from Python data frames, local files or web.
• Perform basic data transformation and exploration.
• Train regression and classification models using various H2O machine
learning algorithms.
• Evaluate models and make predictions.
• Improve performance by tuning and stacking.
• Connect to H2O cluster in the cloud.
117
Improving Model Performance (Step-by-Step)
118
Model Settings MSE (CV) MSE (Test)
GBM with default settings N/A 0.4551
GBM with manual settings N/A 0.4433
Manual settings + cross-validation 0.4502 0.4433
Manual + CV + early stopping 0.4429 0.4287
CV + early stopping + full grid search 0.4378 0.4196
CV + early stopping + random grid search 0.4227 0.4047
Stacking models from random grid search N/A 0.3969
Lowest MSE =
Best Performance
119
• Our Friends at
• Find us at Poznan R Meetup
• Today at 6:15 pm
• Uniwersytet Ekonomiczny w Poznaniu
Centrum Edukacyjne Usług
Elektronicznych
120
Thanks!
• Code, Slides & Documents
• bit.ly/h2o_meetups
• docs.h2o.ai
• Contact
• joe@h2o.ai
• @matlabulous
• github.com/woobe
• Please search/ask questions on
Stack Overflow
• Use the tag `h2o` (not H2 zero)

More Related Content

What's hot

Project "Deep Water"
Project "Deep Water"Project "Deep Water"
Project "Deep Water"Jo-fai Chow
 
Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonSri Ambati
 
H2O Big Join Slides
H2O Big Join SlidesH2O Big Join Slides
H2O Big Join SlidesSri Ambati
 
Intro to H2O in Python - Data Science LA
Intro to H2O in Python - Data Science LAIntro to H2O in Python - Data Science LA
Intro to H2O in Python - Data Science LASri Ambati
 
Automatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIMEAutomatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIMEJo-fai Chow
 
H2O Machine Learning Use Cases
H2O Machine Learning Use CasesH2O Machine Learning Use Cases
H2O Machine Learning Use CasesJo-fai Chow
 
ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217Sri Ambati
 
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...Sri Ambati
 
Automatic and Interpretable Machine Learning in R with H2O and LIME
Automatic and Interpretable Machine Learning in R with H2O and LIMEAutomatic and Interpretable Machine Learning in R with H2O and LIME
Automatic and Interpretable Machine Learning in R with H2O and LIMEJo-fai Chow
 
Introduction to Data Science with H2O- Mountain View
Introduction to Data Science with H2O- Mountain ViewIntroduction to Data Science with H2O- Mountain View
Introduction to Data Science with H2O- Mountain ViewSri Ambati
 
Intro to H2O Machine Learning in R at Santa Clara University
Intro to H2O Machine Learning in R at Santa Clara UniversityIntro to H2O Machine Learning in R at Santa Clara University
Intro to H2O Machine Learning in R at Santa Clara UniversitySri Ambati
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupPaco Nathan
 
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...Big Data Week
 
Sparkling Water 2.0 - Michal Malohlava
Sparkling Water 2.0 - Michal MalohlavaSparkling Water 2.0 - Michal Malohlava
Sparkling Water 2.0 - Michal MalohlavaSri Ambati
 
Training Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL LibraryTraining Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL LibraryNeo4j
 
H2O Random Grid Search - PyData Amsterdam
H2O Random Grid Search - PyData AmsterdamH2O Random Grid Search - PyData Amsterdam
H2O Random Grid Search - PyData AmsterdamSri Ambati
 
Engineering data quality
Engineering data qualityEngineering data quality
Engineering data qualityLars Albertsson
 
Chemical Databases and Open Chemistry on the Desktop
Chemical Databases and Open Chemistry on the DesktopChemical Databases and Open Chemistry on the Desktop
Chemical Databases and Open Chemistry on the DesktopMarcus Hanwell
 
H2O intro at Dallas Meetup
H2O intro at Dallas MeetupH2O intro at Dallas Meetup
H2O intro at Dallas MeetupSri Ambati
 

What's hot (19)

Project "Deep Water"
Project "Deep Water"Project "Deep Water"
Project "Deep Water"
 
Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python
 
H2O Big Join Slides
H2O Big Join SlidesH2O Big Join Slides
H2O Big Join Slides
 
Intro to H2O in Python - Data Science LA
Intro to H2O in Python - Data Science LAIntro to H2O in Python - Data Science LA
Intro to H2O in Python - Data Science LA
 
Automatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIMEAutomatic and Interpretable Machine Learning with H2O and LIME
Automatic and Interpretable Machine Learning with H2O and LIME
 
H2O Machine Learning Use Cases
H2O Machine Learning Use CasesH2O Machine Learning Use Cases
H2O Machine Learning Use Cases
 
ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217ArnoCandelAIFrontiers011217
ArnoCandelAIFrontiers011217
 
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...
Drive Away Fraudsters With Driverless AI - Venkatesh Ramanathan, Senior Data ...
 
Automatic and Interpretable Machine Learning in R with H2O and LIME
Automatic and Interpretable Machine Learning in R with H2O and LIMEAutomatic and Interpretable Machine Learning in R with H2O and LIME
Automatic and Interpretable Machine Learning in R with H2O and LIME
 
Introduction to Data Science with H2O- Mountain View
Introduction to Data Science with H2O- Mountain ViewIntroduction to Data Science with H2O- Mountain View
Introduction to Data Science with H2O- Mountain View
 
Intro to H2O Machine Learning in R at Santa Clara University
Intro to H2O Machine Learning in R at Santa Clara UniversityIntro to H2O Machine Learning in R at Santa Clara University
Intro to H2O Machine Learning in R at Santa Clara University
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
 
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
BDW Chicago 2016 - Jim Scott, Director, Enterprise Strategy & Architecture - ...
 
Sparkling Water 2.0 - Michal Malohlava
Sparkling Water 2.0 - Michal MalohlavaSparkling Water 2.0 - Michal Malohlava
Sparkling Water 2.0 - Michal Malohlava
 
Training Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL LibraryTraining Series: Build APIs with Neo4j GraphQL Library
Training Series: Build APIs with Neo4j GraphQL Library
 
H2O Random Grid Search - PyData Amsterdam
H2O Random Grid Search - PyData AmsterdamH2O Random Grid Search - PyData Amsterdam
H2O Random Grid Search - PyData Amsterdam
 
Engineering data quality
Engineering data qualityEngineering data quality
Engineering data quality
 
Chemical Databases and Open Chemistry on the Desktop
Chemical Databases and Open Chemistry on the DesktopChemical Databases and Open Chemistry on the Desktop
Chemical Databases and Open Chemistry on the Desktop
 
H2O intro at Dallas Meetup
H2O intro at Dallas MeetupH2O intro at Dallas Meetup
H2O intro at Dallas Meetup
 

Similar to Introduction to Machine Learning with H2O and Python

Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonJo-fai Chow
 
Introduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2O
Introduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2OIntroduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2O
Introduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2OData Science Milan
 
Latest Developments in H2O
Latest Developments in H2OLatest Developments in H2O
Latest Developments in H2OSri Ambati
 
Machine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupMachine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupSri Ambati
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...Ian Foster
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and PythonTravis Oliphant
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSSri Ambati
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe
 
Anaconda and PyData Solutions
Anaconda and PyData SolutionsAnaconda and PyData Solutions
Anaconda and PyData SolutionsTravis Oliphant
 
H2O Overview with Amy Wang at useR! Aalborg
H2O Overview with Amy Wang at useR! AalborgH2O Overview with Amy Wang at useR! Aalborg
H2O Overview with Amy Wang at useR! AalborgSri Ambati
 
Start Getting Your Feet Wet in Open Source Machine and Deep Learning
Start Getting Your Feet Wet in Open Source Machine and Deep Learning Start Getting Your Feet Wet in Open Source Machine and Deep Learning
Start Getting Your Feet Wet in Open Source Machine and Deep Learning Ian Gomez
 
Analytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data PlatformAnalytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data PlatformVMware Tanzu
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsSri Ambati
 
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseData Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseDataWorks Summit
 
Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...
Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...
Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...Sri Ambati
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Demi Ben-Ari
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Codemotion
 
Intro to H2O Machine Learning in Python - Galvanize Seattle
Intro to H2O Machine Learning in Python - Galvanize SeattleIntro to H2O Machine Learning in Python - Galvanize Seattle
Intro to H2O Machine Learning in Python - Galvanize SeattleSri Ambati
 
Belgrade R - Intro to H2O and Deep Water
Belgrade R - Intro to H2O and Deep WaterBelgrade R - Intro to H2O and Deep Water
Belgrade R - Intro to H2O and Deep WaterSri Ambati
 
CCI2018 - Real-time dashboard whatif analysis
CCI2018 - Real-time dashboard whatif analysisCCI2018 - Real-time dashboard whatif analysis
CCI2018 - Real-time dashboard whatif analysiswalk2talk srl
 

Similar to Introduction to Machine Learning with H2O and Python (20)

Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python
 
Introduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2O
Introduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2OIntroduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2O
Introduction to Machine Learning with H2O - Jo-Fai (Joe) Chow, H2O
 
Latest Developments in H2O
Latest Developments in H2OLatest Developments in H2O
Latest Developments in H2O
 
Machine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupMachine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville Meetup
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
Anaconda and PyData Solutions
Anaconda and PyData SolutionsAnaconda and PyData Solutions
Anaconda and PyData Solutions
 
H2O Overview with Amy Wang at useR! Aalborg
H2O Overview with Amy Wang at useR! AalborgH2O Overview with Amy Wang at useR! Aalborg
H2O Overview with Amy Wang at useR! Aalborg
 
Start Getting Your Feet Wet in Open Source Machine and Deep Learning
Start Getting Your Feet Wet in Open Source Machine and Deep Learning Start Getting Your Feet Wet in Open Source Machine and Deep Learning
Start Getting Your Feet Wet in Open Source Machine and Deep Learning
 
Analytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data PlatformAnalytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data Platform
 
Intro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data ScientistsIntro to Data Science for Non-Data Scientists
Intro to Data Science for Non-Data Scientists
 
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop WarehouseData Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
Data Driving Yahoo Mail Growth and Evolution with a 50 PB Hadoop Warehouse
 
Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...
Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...
Ruben Diaz, Vision Banco + Rafael Coss, H2O ai + Luis Armenta, IBM - AI journ...
 
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
Monitoring Big Data Systems Done "The Simple Way" - Codemotion Milan 2017 - D...
 
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
Demi Ben-Ari - Monitoring Big Data Systems Done "The Simple Way" - Codemotion...
 
Intro to H2O Machine Learning in Python - Galvanize Seattle
Intro to H2O Machine Learning in Python - Galvanize SeattleIntro to H2O Machine Learning in Python - Galvanize Seattle
Intro to H2O Machine Learning in Python - Galvanize Seattle
 
Belgrade R - Intro to H2O and Deep Water
Belgrade R - Intro to H2O and Deep WaterBelgrade R - Intro to H2O and Deep Water
Belgrade R - Intro to H2O and Deep Water
 
CCI2018 - Real-time dashboard whatif analysis
CCI2018 - Real-time dashboard whatif analysisCCI2018 - Real-time dashboard whatif analysis
CCI2018 - Real-time dashboard whatif analysis
 

More from Jo-fai Chow

Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and ShinyMaking Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and ShinyJo-fai Chow
 
Kaggle Competitions, New Friends, New Skills and New Opportunities
Kaggle Competitions, New Friends, New Skills and New OpportunitiesKaggle Competitions, New Friends, New Skills and New Opportunities
Kaggle Competitions, New Friends, New Skills and New OpportunitiesJo-fai Chow
 
Using H2O Random Grid Search for Hyper-parameters Optimization
Using H2O Random Grid Search for Hyper-parameters OptimizationUsing H2O Random Grid Search for Hyper-parameters Optimization
Using H2O Random Grid Search for Hyper-parameters OptimizationJo-fai Chow
 
Introduction to Generalised Low-Rank Model and Missing Values
Introduction to Generalised Low-Rank Model and Missing ValuesIntroduction to Generalised Low-Rank Model and Missing Values
Introduction to Generalised Low-Rank Model and Missing ValuesJo-fai Chow
 
Improving Model Predictions via Stacking and Hyper-parameters Tuning
Improving Model Predictions via Stacking and Hyper-parameters TuningImproving Model Predictions via Stacking and Hyper-parameters Tuning
Improving Model Predictions via Stacking and Hyper-parameters TuningJo-fai Chow
 
Kaggle competitions, new friends, new skills and new opportunities
Kaggle competitions, new friends, new skills and new opportunitiesKaggle competitions, new friends, new skills and new opportunities
Kaggle competitions, new friends, new skills and new opportunitiesJo-fai Chow
 
Deploying your Predictive Models as a Service via Domino
Deploying your Predictive Models as a Service via DominoDeploying your Predictive Models as a Service via Domino
Deploying your Predictive Models as a Service via DominoJo-fai Chow
 
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...Jo-fai Chow
 
Designing Sustainable Drainage Systems
Designing Sustainable Drainage SystemsDesigning Sustainable Drainage Systems
Designing Sustainable Drainage SystemsJo-fai Chow
 
Developing a New Decision Support System for SuDS
Developing a New Decision Support System for SuDSDeveloping a New Decision Support System for SuDS
Developing a New Decision Support System for SuDSJo-fai Chow
 
Udacity Statement (Introduction to Statistics, August 2012)
Udacity Statement (Introduction to Statistics, August 2012)Udacity Statement (Introduction to Statistics, August 2012)
Udacity Statement (Introduction to Statistics, August 2012)Jo-fai Chow
 
Coursera Statement (Computational Investing, Part I,
Coursera Statement (Computational Investing, Part I, Coursera Statement (Computational Investing, Part I,
Coursera Statement (Computational Investing, Part I, Jo-fai Chow
 
Coursera Statement (Computing for Data Analysis, Oct 2013)
Coursera Statement (Computing for Data Analysis, Oct 2013)Coursera Statement (Computing for Data Analysis, Oct 2013)
Coursera Statement (Computing for Data Analysis, Oct 2013)Jo-fai Chow
 
Coursera Statement (Data Analysis, Mar 2013)
Coursera Statement (Data Analysis, Mar 2013)Coursera Statement (Data Analysis, Mar 2013)
Coursera Statement (Data Analysis, Mar 2013)Jo-fai Chow
 
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...Jo-fai Chow
 

More from Jo-fai Chow (15)

Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and ShinyMaking Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
Making Multimillion-Dollar Baseball Decisions with H2O AutoML, LIME and Shiny
 
Kaggle Competitions, New Friends, New Skills and New Opportunities
Kaggle Competitions, New Friends, New Skills and New OpportunitiesKaggle Competitions, New Friends, New Skills and New Opportunities
Kaggle Competitions, New Friends, New Skills and New Opportunities
 
Using H2O Random Grid Search for Hyper-parameters Optimization
Using H2O Random Grid Search for Hyper-parameters OptimizationUsing H2O Random Grid Search for Hyper-parameters Optimization
Using H2O Random Grid Search for Hyper-parameters Optimization
 
Introduction to Generalised Low-Rank Model and Missing Values
Introduction to Generalised Low-Rank Model and Missing ValuesIntroduction to Generalised Low-Rank Model and Missing Values
Introduction to Generalised Low-Rank Model and Missing Values
 
Improving Model Predictions via Stacking and Hyper-parameters Tuning
Improving Model Predictions via Stacking and Hyper-parameters TuningImproving Model Predictions via Stacking and Hyper-parameters Tuning
Improving Model Predictions via Stacking and Hyper-parameters Tuning
 
Kaggle competitions, new friends, new skills and new opportunities
Kaggle competitions, new friends, new skills and new opportunitiesKaggle competitions, new friends, new skills and new opportunities
Kaggle competitions, new friends, new skills and new opportunities
 
Deploying your Predictive Models as a Service via Domino
Deploying your Predictive Models as a Service via DominoDeploying your Predictive Models as a Service via Domino
Deploying your Predictive Models as a Service via Domino
 
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
 
Designing Sustainable Drainage Systems
Designing Sustainable Drainage SystemsDesigning Sustainable Drainage Systems
Designing Sustainable Drainage Systems
 
Developing a New Decision Support System for SuDS
Developing a New Decision Support System for SuDSDeveloping a New Decision Support System for SuDS
Developing a New Decision Support System for SuDS
 
Udacity Statement (Introduction to Statistics, August 2012)
Udacity Statement (Introduction to Statistics, August 2012)Udacity Statement (Introduction to Statistics, August 2012)
Udacity Statement (Introduction to Statistics, August 2012)
 
Coursera Statement (Computational Investing, Part I,
Coursera Statement (Computational Investing, Part I, Coursera Statement (Computational Investing, Part I,
Coursera Statement (Computational Investing, Part I,
 
Coursera Statement (Computing for Data Analysis, Oct 2013)
Coursera Statement (Computing for Data Analysis, Oct 2013)Coursera Statement (Computing for Data Analysis, Oct 2013)
Coursera Statement (Computing for Data Analysis, Oct 2013)
 
Coursera Statement (Data Analysis, Mar 2013)
Coursera Statement (Data Analysis, Mar 2013)Coursera Statement (Data Analysis, Mar 2013)
Coursera Statement (Data Analysis, Mar 2013)
 
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
A Systematic, Multi-Criteria Decision Support Framework for Sustainable Drain...
 

Recently uploaded

Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineeringssuserb3a23b
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 

Recently uploaded (20)

Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineering
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 

Introduction to Machine Learning with H2O and Python

  • 1. Introduction to Machine Learning with H2O and Python Jo-fai (Joe) Chow Data Scientist joe@h2o.ai @matlabulous H2O Tutorial at Analyx 20th April, 2017
  • 2. Slides and Code Examples: bit.ly/joe_h2o_tutorials 2
  • 3. About Me • Civil (Water) Engineer • 2010 – 2015 • Consultant (UK) • Utilities • Asset Management • Constrained Optimization • Industrial PhD (UK) • Infrastructure Design Optimization • Machine Learning + Water Engineering • Discovered H2O in 2014 • Data Scientist • 2015 • Virgin Media (UK) • Domino Data Lab (Silicon Valley, US) • 2016 – Present • H2O.ai (Silicon Valley, US) 3
  • 5. Side Project #1 – Crime Data Visualization 5 https://github.com/woobe/rApps/tree/master/crimemap http://insidebigdata.com/2013/11/30/visualization-week-crimemap/
  • 6. Side Project #2 – Data Visualization Contest 6 https://github.com/woobe/rugsmaps http://blog.revolutionanalytics.com/2014/08/winner-for-revolution-analytics-user-group-map-contest.html
  • 7. Side Project #3 7 Developing R Packages for Fun rPlotter (2014)
  • 8. About Me 8 R + H2O + Domino for Kaggle Guest Blog Post for Domino & H2O (2014) • The Long Story • bit.ly/joe_kaggle_story
  • 9. Agenda • About H2O.ai • Company • Machine Learning Platform • Tutorial • H2O Python Module • Download & Install • Step-by-Step Examples: • Basic Data Import / Manipulation • Regression & Classification (Basics) • Regression & Classification (Advanced) • Using H2O in the Cloud 9
  • 10. Agenda • About H2O.ai • Company • Machine Learning Platform • Tutorial • H2O Python Module • Download & Install • Step-by-Step Examples: • Basic Data Import / Manipulation • Regression & Classification (Basics) • Regression & Classification (Advanced) • Using H2O in the Cloud 10 Background Information For beginners As if I am working on Kaggle competitions Short Break
  • 12. Company Overview Founded 2011 Venture-backed, debuted in 2012 Products • H2O Open Source In-Memory AI Prediction Engine • Sparkling Water • Steam Mission Operationalize Data Science, and provide a platform for users to build beautiful data products Team 70 employees • Distributed Systems Engineers doing Machine Learning • World-class visualization designers Headquarters Mountain View, CA 12
  • 15. 15
  • 16. 0 10000 20000 30000 40000 50000 60000 70000 1-Jan-15 1-Jul-15 1-Jan-16 1-Oct-16 # H2O Users H2O Community Growth Tremendous Momentum Globally 65,000+ users globally (Sept 2016) • 65,000+ users from ~8,000 companies in 140 countries. Top 5 from: Large User Circle * DATA FROM GOOGLE ANALYTICS EMBEDDED IN THE END USER PRODUCT 16 0 2000 4000 6000 8000 10000 1-Jan-15 1-Jul-15 1-Jan-16 1-Oct-16 # Companies Using H2O ~8,000+ companies (Sept 2016) +127% +60%
  • 18. H2O for Kaggle Competitions 18
  • 19. H2O for Academic Research 19 http://www.sciencedirect.com/science/article/pii/S0377221716308657 https://arxiv.org/abs/1509.01199
  • 20. Users In Various Verticals Adore H2O Financial Insurance MarketingTelecom Healthcare 20
  • 23. H2O Machine Learning Platform 23
  • 24. 24
  • 25. 25
  • 27. HDFS S3 NFS Distributed In-Memory Load Data Loss-less Compression H2O Compute Engine Production Scoring Environment Exploratory & Descriptive Analysis Feature Engineering & Selection Supervised & Unsupervised Modeling Model Evaluation & Selection Predict Data & Model Storage Model Export: Plain Old Java Object Your Imagination Data Prep Export: Plain Old Java Object Local SQL High Level Architecture 27
  • 28. HDFS S3 NFS Distributed In-Memory Load Data Loss-less Compression H2O Compute Engine Production Scoring Environment Exploratory & Descriptive Analysis Feature Engineering & Selection Supervised & Unsupervised Modeling Model Evaluation & Selection Predict Data & Model Storage Model Export: Plain Old Java Object Your Imagination Data Prep Export: Plain Old Java Object Local SQL High Level Architecture 28 Import Data from Multiple Sources
  • 29. HDFS S3 NFS Distributed In-Memory Load Data Loss-less Compression H2O Compute Engine Production Scoring Environment Exploratory & Descriptive Analysis Feature Engineering & Selection Supervised & Unsupervised Modeling Model Evaluation & Selection Predict Data & Model Storage Model Export: Plain Old Java Object Your Imagination Data Prep Export: Plain Old Java Object Local SQL High Level Architecture 29 Fast, Scalable & Distributed Compute Engine Written in Java
  • 30. HDFS S3 NFS Distributed In-Memory Load Data Loss-less Compression H2O Compute Engine Production Scoring Environment Exploratory & Descriptive Analysis Feature Engineering & Selection Supervised & Unsupervised Modeling Model Evaluation & Selection Predict Data & Model Storage Model Export: Plain Old Java Object Your Imagination Data Prep Export: Plain Old Java Object Local SQL High Level Architecture 30 Fast, Scalable & Distributed Compute Engine Written in Java
  • 31. Supervised Learning • Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie • Naïve Bayes Statistical Analysis Ensembles • Distributed Random Forest: Classification or regression models • Gradient Boosting Machine: Produces an ensemble of decision trees with increasing refined approximations Deep Neural Networks • Deep learning: Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations Algorithms Overview Unsupervised Learning • K-means: Partitions observations into k clusters/groups of the same spatial size. Automatically detect optimal k Clustering Dimensionality Reduction • Principal Component Analysis: Linearly transforms correlated variables to independent components • Generalized Low Rank Models: extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data Anomaly Detection • Autoencoders: Find outliers using a nonlinear dimensionality reduction using deep learning 31
  • 32. H2O Deep Learning in Action 32
  • 33. HDFS S3 NFS Distributed In-Memory Load Data Loss-less Compression H2O Compute Engine Production Scoring Environment Exploratory & Descriptive Analysis Feature Engineering & Selection Supervised & Unsupervised Modeling Model Evaluation & Selection Predict Data & Model Storage Model Export: Plain Old Java Object Your Imagination Data Prep Export: Plain Old Java Object Local SQL High Level Architecture 33 Multiple Interfaces
  • 36. 36 H2O Flow (Web) Interface
  • 37. HDFS S3 NFS Distributed In-Memory Load Data Loss-less Compression H2O Compute Engine Production Scoring Environment Exploratory & Descriptive Analysis Feature Engineering & Selection Supervised & Unsupervised Modeling Model Evaluation & Selection Predict Data & Model Storage Model Export: Plain Old Java Object Your Imagination Data Prep Export: Plain Old Java Object Local SQL High Level Architecture 37 Export Standalone Models for Production
  • 39. H2O + Python Tutorial 39
  • 40. Learning Objectives • Start and connect to a local H2O cluster from Python. • Import data from Python data frames, local files or web. • Perform basic data transformation and exploration. • Train regression and classification models using various H2O machine learning algorithms. • Evaluate models and make predictions. • Improve performance by tuning and stacking. • Connect to H2O cluster in the cloud. 40
  • 41. 41
  • 42. Install H2O h2o.ai -> Download -> Install in Python 42
  • 43. 43
  • 44. Start and Connect to a Local H2O Cluster py_01_data_in_h2o.ipynb 44
  • 45. Local H2O Cluster 45 Import H2O module Start a local H2O cluster nthreads = -1 means using ALL CPU resources
  • 47. Importing Data into H2O py_01_data_in_h2o.ipynb 47
  • 48. 48 Import data into H2O cluster (instead of Python’s memory)
  • 49. 49 Directly from data on the web
  • 50. 50 Convert from Pandas to H2O data frame
  • 51. Basic Data Transformation & Exploration py_02_data_manipulation.ipynb (see notebooks) 51
  • 53. 53 Only two unique values (0 or 1)
  • 54. 54 “enum” is the data type of categorical data in Java Convert numerical to categorical values
  • 55. 55 Only three unique values (1, 2 or 3)
  • 57. Supervised Learning • Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie • Naïve Bayes Statistical Analysis Ensembles • Distributed Random Forest: Classification or regression models • Gradient Boosting Machine: Produces an ensemble of decision trees with increasing refined approximations Deep Neural Networks • Deep learning: Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations Algorithms Overview Unsupervised Learning • K-means: Partitions observations into k clusters/groups of the same spatial size. Automatically detect optimal k Clustering Dimensionality Reduction • Principal Component Analysis: Linearly transforms correlated variables to independent components • Generalized Low Rank Models: extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data Anomaly Detection • Autoencoders: Find outliers using a nonlinear dimensionality reduction using deep learning 57
  • 60. 60 Define 11 Numerical Features using their Column Names
  • 61. 61 Split dataset so we can measure out-of-bag performance later
  • 63. 63 Regression Performance – MSE Lower the better
  • 66. 66
  • 67. 67 API for other ML algorithms
  • 68. 68 API for other ML algorithms
  • 72. 72 Define features manually Split dataset so we can measure out-of-bag performance later
  • 74. Classification Performance – Confusion Matrix 74
  • 79. 79 API for other ML algorithms
  • 80. 80 API for other ML algorithms
  • 81. End of Basics Let’s have a break ☺ 81
  • 83. Improving Model Performance (Step-by-Step) 83 Model Settings MSE (CV) MSE (Test) GBM with default settings N/A 0.4551 GBM with manual settings N/A 0.4433 Manual settings + cross-validation 0.4502 0.4433 Manual + CV + early stopping 0.4429 0.4287 CV + early stopping + full grid search 0.4378 0.4196 CV + early stopping + random grid search 0.4227 0.4047 Stacking models from random grid search N/A 0.3969 Lower Mean Square Error = Better Performance
  • 84. 84 Using same dataset and split as previous tutorial
  • 85. 85 Baseline Model Write down MSE on Test set
  • 86. Improving Model Performance (Step-by-Step) 86 Model Settings MSE (CV) MSE (Test) GBM with default settings N/A 0.4551 GBM with manual settings N/A 0.4433 Manual settings + cross-validation 0.4502 0.4433 Manual + CV + early stopping 0.4429 0.4287 CV + early stopping + full grid search 0.4378 0.4196 CV + early stopping + random grid search 0.4227 0.4047 Stacking models from random grid search N/A 0.3969
  • 88. Improving Model Performance (Step-by-Step) 88 Model Settings MSE (CV) MSE (Test) GBM with default settings N/A 0.4551 GBM with manual settings N/A 0.4433 Manual settings + cross-validation 0.4502 0.4433 Manual + CV + early stopping 0.4429 0.4287 CV + early stopping + full grid search 0.4378 0.4196 CV + early stopping + random grid search 0.4227 0.4047 Stacking models from random grid search N/A 0.3969
  • 90. 90 Manual settings based on experience + 5-fold CV
  • 92. Improving Model Performance (Step-by-Step) 92 Model Settings MSE (CV) MSE (Test) GBM with default settings N/A 0.4551 GBM with manual settings N/A 0.4433 Manual settings + cross-validation 0.4502 0.4433 Manual + CV + early stopping 0.4429 0.4287 CV + early stopping + full grid search 0.4378 0.4196 CV + early stopping + random grid search 0.4227 0.4047 Stacking models from random grid search N/A 0.3969
  • 95. Improving Model Performance (Step-by-Step) 95 Model Settings MSE (CV) MSE (Test) GBM with default settings N/A 0.4551 GBM with manual settings N/A 0.4433 Manual settings + cross-validation 0.4502 0.4433 Manual + CV + early stopping 0.4429 0.4287 CV + early stopping + full grid search 0.4378 0.4196 CV + early stopping + random grid search 0.4227 0.4047 Stacking models from random grid search N/A 0.3969
  • 96. Grid Search 96 Combination Parameter 1 Parameter 2 1 0.7 0.7 2 0.7 0.8 3 0.7 0.9 4 0.8 0.7 5 0.8 0.8 6 0.8 0.9 7 0.9 0.7 8 0.9 0.8 9 0.9 0.9
  • 97. 97
  • 98. 98 Sort Results by MSE Best Model on Top Lowest MSE
  • 99. 99 Stopped at 187 trees (automatic)
  • 100. Improving Model Performance (Step-by-Step) 100 Model Settings MSE (CV) MSE (Test) GBM with default settings N/A 0.4551 GBM with manual settings N/A 0.4433 Manual settings + cross-validation 0.4502 0.4433 Manual + CV + early stopping 0.4429 0.4287 CV + early stopping + full grid search 0.4378 0.4196 CV + early stopping + random grid search 0.4227 0.4047 Stacking models from random grid search N/A 0.3969
  • 101. 101 Expand Search Space Only search for 9 combinations
  • 102. 102 Sort Results by MSE Best Model on Top Lowest MSE
  • 103. Improving Model Performance (Step-by-Step) 103 Model Settings MSE (CV) MSE (Test) GBM with default settings N/A 0.4551 GBM with manual settings N/A 0.4433 Manual settings + cross-validation 0.4502 0.4433 Manual + CV + early stopping 0.4429 0.4287 CV + early stopping + full grid search 0.4378 0.4196 CV + early stopping + random grid search 0.4227 0.4047 Stacking models from random grid search N/A 0.3969
  • 106. 106 Keep the Best Model after Random Grid Search
  • 107. 107 Keep the Best Model after Random Grid Search
  • 108. 108 Keep the Best Model after Random Grid Search
  • 109. 109 Lowest MSE = Best Performance API for Stacked Ensembles Use the three models from previous steps
  • 110. Improving Model Performance (Step-by-Step) 110 Model Settings MSE (CV) MSE (Test) GBM with default settings N/A 0.4551 GBM with manual settings N/A 0.4433 Manual settings + cross-validation 0.4502 0.4433 Manual + CV + early stopping 0.4429 0.4287 CV + early stopping + full grid search 0.4378 0.4196 CV + early stopping + random grid search 0.4227 0.4047 Stacking models from random grid search N/A 0.3969 Lowest MSE = Best Performance
  • 112. 112 Highest AUC = Best Performance
  • 113. H2O in the Cloud py_05_h2o_in_the_cloud.ipynb 113
  • 114. 114
  • 115. 115
  • 117. Learning Objectives • Start and connect to a local H2O cluster from Python. • Import data from Python data frames, local files or web. • Perform basic data transformation and exploration. • Train regression and classification models using various H2O machine learning algorithms. • Evaluate models and make predictions. • Improve performance by tuning and stacking. • Connect to H2O cluster in the cloud. 117
  • 118. Improving Model Performance (Step-by-Step) 118 Model Settings MSE (CV) MSE (Test) GBM with default settings N/A 0.4551 GBM with manual settings N/A 0.4433 Manual settings + cross-validation 0.4502 0.4433 Manual + CV + early stopping 0.4429 0.4287 CV + early stopping + full grid search 0.4378 0.4196 CV + early stopping + random grid search 0.4227 0.4047 Stacking models from random grid search N/A 0.3969 Lowest MSE = Best Performance
  • 119. 119
  • 120. • Our Friends at • Find us at Poznan R Meetup • Today at 6:15 pm • Uniwersytet Ekonomiczny w Poznaniu Centrum Edukacyjne Usług Elektronicznych 120 Thanks! • Code, Slides & Documents • bit.ly/h2o_meetups • docs.h2o.ai • Contact • joe@h2o.ai • @matlabulous • github.com/woobe • Please search/ask questions on Stack Overflow • Use the tag `h2o` (not H2 zero)