Introduction to H2O Driverless AI
Arno Candel
CTO, H2O.ai
@arnocandel
• Founded in 2011
• 110+ employees
• Mountain View, CA
• VC funded
• Open-Source Culture
H2O.ai is a Leader in the Gartner Magic Quadrant
Shortage of Data Scientists
Data Processing
Feature Engineering
Model Tuning
Final Ensemble Training
Scoring Pipeline
H2O Driverless AI - From Data to Deployment
2 months for Grandmasters — 2 hours for Driverless AI
single run, fully automated: 2h on DGX Station! 6h on PC
Driverless AI: 10th place in private LB at Kaggle (out of 2926)
Driverless AI: top 10 in BNP Paribas Kaggle competition
Automatic Visualization
Scalable outlier detection
(no sampling)
Contains novel statistical algorithms to only show “relevant” aspects of the data
(coming soon: automated data cleaning)
MLI - Machine Learning Interpretability
Gain confidence in models before deploying them!
http://h2o.ai
21-day free trial
Easy installation:
Native and Dockerized deployment options
Secret Sauce: 1) Grandmaster Feature Engineering
Numerical/Categorical Interactions, Target Encoding, Clustering, Dimensionality Reduction, Weight of Evidence, etc.
Time-Series: Lags and historical aggregates with causality constraints
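As an illustration of one transformation named above, here is a minimal sketch of out-of-fold target encoding. It is not Driverless AI's implementation: the function name `out_of_fold_target_encode`, the smoothing scheme, and the fold assignment are assumptions chosen for the example. Each row is encoded with the smoothed mean target of its category computed on the other folds only, so a row never sees its own label.

```python
import numpy as np
import pandas as pd


def out_of_fold_target_encode(df, cat_col, target_col, n_folds=5, smoothing=10.0, seed=0):
    """Encode a categorical column by the smoothed mean target of the other folds.

    Hypothetical helper for illustration; not part of any H2O API.
    """
    rng = np.random.default_rng(seed)
    # Balanced random fold assignment: every row gets a fold id in [0, n_folds).
    fold_ids = rng.permutation(len(df)) % n_folds
    encoded = np.empty(len(df), dtype=float)
    for fold in range(n_folds):
        holdout = fold_ids == fold
        train = df[~holdout]
        # Global mean of the training folds, used as prior and as fallback.
        prior = train[target_col].mean()
        stats = train.groupby(cat_col)[target_col].agg(["sum", "count"])
        # Shrink rare categories toward the prior.
        smoothed = (stats["sum"] + smoothing * prior) / (stats["count"] + smoothing)
        encoded[holdout] = (
            df.loc[holdout, cat_col].map(smoothed).fillna(prior).to_numpy()
        )
    return pd.Series(encoded, index=df.index, name=f"{cat_col}_te")
```

Categories that do not occur in the training folds fall back to the prior, which keeps the encoding defined for unseen levels at scoring time.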
Secret Sauce: 2) Grandmaster Pipeline Tuning + Validation
19,000 features tested
1,000 models trained
reliable generalization estimates (overfitting avoidance)
Example: Driverless AI BNP Paribas on 3-GPU workstation
evolutionary strategies
DOI: 10.1126/science.aaa9375
MTV
1 final optimal scoring pipeline
massively parallel processing (multi-CPU, multi-GPU)
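The idea of testing many candidate feature sets with an evolutionary strategy can be sketched in a few lines. This is a toy under stated assumptions, not the product's algorithm: the names `validation_score` and `evolve_feature_mask`, the least-squares model, the bit-flip mutation, and all default parameters are hypothetical. Each candidate is a boolean mask over the columns, scored on a hold-out set; the better half survives and produces mutated children.

```python
import numpy as np


def validation_score(mask, X_train, y_train, X_valid, y_valid):
    """Negative hold-out MSE of a least-squares fit on the selected columns."""
    if not mask.any():
        return -np.inf  # an empty feature set cannot be fitted
    coef, *_ = np.linalg.lstsq(X_train[:, mask], y_train, rcond=None)
    residual = X_valid[:, mask] @ coef - y_valid
    return -float(np.mean(residual ** 2))


def evolve_feature_mask(X_train, y_train, X_valid, y_valid,
                        population=20, generations=15, mutation_rate=0.1, seed=0):
    """Evolve boolean feature masks; returns (best_mask, best_score)."""
    rng = np.random.default_rng(seed)
    n_features = X_train.shape[1]
    masks = rng.random((population, n_features)) < 0.5

    def score_all(candidates):
        return np.array([
            validation_score(m, X_train, y_train, X_valid, y_valid)
            for m in candidates
        ])

    for _ in range(generations):
        scores = score_all(masks)
        # Keep the better half unchanged (elitism).
        order = np.argsort(scores)[::-1]
        survivors = masks[order[: population // 2]]
        # Children are survivors with a few randomly flipped bits.
        flips = rng.random(survivors.shape) < mutation_rate
        children = survivors ^ flips
        masks = np.vstack([survivors, children])

    scores = score_all(masks)
    best = int(np.argmax(scores))
    return masks[best], float(scores[best])
```

Because the same hold-out set drives selection, its score is optimistic; a separate test set or nested validation would be needed for a reliable generalization estimate.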
Statistical Learning vs Deep Learning - We Do Both!
Statistical Learning (https://web.stanford.edu/~hastie/Papers/ESLII.pdf)
Typically better for structured data (CSV, SQL, Transactional)
GLM/CART/RF/GBM/XGBoost, K-Means/PCA/SVD
Deep Learning (http://www.deeplearningbook.org)
Typically better for unstructured data (Images, Video, Audio, Text)
TensorFlow Deep Learning
[Figure: time axis from 1 to 12 with Gap=1 and Forecast Horizon=2, marking invalid lag sizes (no information available) and valid lag sizes (information available); panels show the "tvs train" / "tvs valid" split and the train / test split, each separated by a [Gap]]
Time Series in Driverless AI
• Automatic Selection or Manual Control for:
  • Forecast Horizon
  • Gap between Training and Production
• Automatic handling of time groups (e.g. [time, store_id, department_id])
• Robust validation framework
  • Accounting for time gaps between train & test
  • Accounting for the length of the forecast horizon the user is interested in
• Comprehensive set of recipes for time-series-specific feature engineering
  • Date features like day of week, day of month, etc.
  • Optimal (target) lags taking account of detected time groups
  • Interactions of lagged features
  • Exponentially Weighted Moving Averages of n-th order differenced past information
  • Aggregation of past information (mean, std, sums, etc.) across time groups and for different time intervals (e.g. every week, every 2 weeks, etc.)
• Fully integrated into Driverless AI's optimization pipeline
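The lag and moving-average recipes listed above can be sketched with pandas. This is an illustration only, not Driverless AI code: the helpers `valid_lags` and `add_lag_features` are hypothetical, and the rule that the smallest usable lag equals gap + forecast horizon is an assumption read off the Gap=1 / Forecast Horizon=2 example. Lags are computed per time group so that values never leak from one store or department into another.

```python
import pandas as pd


def valid_lags(gap, horizon, n_lags):
    """Lag sizes that only reach back to rows known at prediction time.

    Assumed rule for this sketch: the smallest valid lag is gap + horizon.
    """
    first = gap + horizon
    return list(range(first, first + n_lags))


def add_lag_features(df, time_col, group_cols, target_col, lags, ewm_alpha=0.5):
    """Add per-group target lags and an EWMA of first-order differences.

    Assumes one row per time step within each group (regular spacing),
    so shifting by rows is the same as shifting by time steps.
    """
    out = df.sort_values(group_cols + [time_col]).copy()
    grouped = out.groupby(group_cols, sort=False)[target_col]
    for lag in lags:
        out[f"{target_col}_lag{lag}"] = grouped.shift(lag)
    # Only history that is at least min(lags) steps old feeds the aggregate.
    out["_known"] = grouped.shift(min(lags))
    out[f"{target_col}_ewm_diff"] = (
        out.groupby(group_cols, sort=False)["_known"]
        .transform(lambda s: s.diff().ewm(alpha=ewm_alpha).mean())
    )
    return out.drop(columns="_known")
```

For example, `add_lag_features(sales, "time", ["store_id", "department_id"], "target", valid_lags(1, 2, 3))` would add lags 3, 4 and 5 per store and department, where `sales` and its column names are placeholders.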
Text / Natural Language Processing in Driverless AI
i.i.d. and Time-Series Recipes
NLP Recipes: Statistical and Deep Learning
on roadmap
Recipes can mix & match
more information: https://blog.h2o.ai/2018/09/automatic-feature-engineering-text-analytics-latest-addition-kaggle-grandmasters-recipes/
AutoDoc - Automatic Documentation of Experiments
Every step is documented automatically (custom templates are supported)
Data Processing
Feature Engineering
Model Tuning
Final Ensemble Training
Scoring Pipeline
Feature   v1.0   v1.1   v1.2   v1.3   v1.4 (NOW)   v1.5   v2.0
Kaggle Grandmaster Recipes for i.i.d. data
Automatic Visualization
Machine Learning Interpretability
GBM (XGBoost) for high accuracy incl. stacked ensembles (CPU/GPU)
5-minute Install with Docker for Linux/Mac/Windows - Cloud/OnPrem
Standalone Python Scoring Pipeline
Hardware acceleration: NVIDIA GPUs (DGX-1 etc.)
User Management and Security (LDAP/Kerberos)
Data Connectors: NFS/HDFS/S3/GCS/BigQuery, CSV/Excel/Parquet/Feather
GLM (Linear models) for high interpretability (CPU/GPU)
Native Installer: RPM/DEB
Cloud Neutral: Amazon/Microsoft/Google
Kaggle Grandmaster Recipes for Time-Series
IBM Power8/Power9
AutoDoc
Deep Learning TensorFlow Models (CPU/GPU)
Standalone Java Scoring Pipeline (MOJO)
Deep Learning for NLP / Text (CPU/GPU)
LightGBM Models (CPU/GPU)
Improved Time-Series Recipes (Multiple Windows), MLI for Time-Series
Improved Final Ensemble
Local Feature Brain
C++ Scoring Pipeline
FTRL Models
Multi-Node Training and much more (based on customer demand)
Driverless AI Roadmap
Live Demo
16th (of 2926)

Get Behind the Wheel with H2O Driverless AI Hands-On Training
