Driverless AI is an automated machine learning platform created by H2O.ai that can complete an entire machine learning workflow, from data to deployment, in as little as 2 hours. It uses techniques developed by H2O's Kaggle Grandmasters, such as automated feature engineering, model tuning, and ensemble building, to generate high-performing models with little to no input from users. Driverless AI supports both structured and unstructured data, including text (NLP) and time series, and generates documentation of all modeling steps.
7. 2 months for Grandmasters — 2 hours for Driverless AI
single run, fully automated: 2h on DGX Station! 6h on PC
Driverless AI: 10th place in private LB at Kaggle (out of 2926)
Driverless AI: top 10 in BNP Paribas Kaggle competition
8. Automatic Visualization
Scalable outlier detection (no sampling)
Contains novel statistical algorithms to only show “relevant” aspects of the data
(coming soon: automated data cleaning)
9. MLI - Machine Learning Interpretation
Gain confidence in models before deploying them!
15. Time Series in Driverless AI
• Automatic handling of time groups (e.g. [time, store_id, department_id])
• Robust validation framework
  • Accounting for time gaps between train & test
  • Accounting for length of forecast horizon the user is interested in
• Comprehensive set of recipes for time series specific feature engineering
  • Date features like day of week, day of month etc.
  • Optimal (target)-lags taking account of detected time groups
  • Interactions of lagged features
  • Exponentially Weighted Moving Averages of n-th order differenced past information
  • Aggregation of past information (mean, std, sums, etc.) across time groups and for different time intervals (e.g. every week, every 2 weeks etc.)
• Fully integrated into Driverless AI's optimization pipeline
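The feature recipes and validation ideas listed on this slide can be illustrated with a small, standard-library Python sketch. It is not Driverless AI's implementation: the function names (`add_time_series_features`, `split_with_gap`), the column names (`date`, `store_id`, `target`), and the parameters (`lags`, `alpha`, `gap_days`, `horizon_days`) are all hypothetical, chosen only to show per-group date features, target lags, an exponentially weighted moving average of past values, and a train/validation split that respects a time gap and a forecast horizon.

```python
from collections import defaultdict
from datetime import date, timedelta


def add_time_series_features(rows, group_keys, lags=(1, 2), alpha=0.5):
    """Return copies of rows with date, lag and EWMA features per time group.

    rows: list of dicts holding a 'date' (datetime.date) and a numeric 'target'.
    group_keys: columns that identify one series, e.g. ["store_id"].
    """
    # One series per time group, e.g. one per store.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in group_keys)].append(row)

    out = []
    for series in groups.values():
        series.sort(key=lambda r: r["date"])
        ewma = None  # EWMA over strictly past targets of this group
        for i, row in enumerate(series):
            feats = dict(row)
            feats["day_of_week"] = row["date"].weekday()  # Monday = 0
            feats["day_of_month"] = row["date"].day
            for lag in lags:
                past = series[i - lag]["target"] if i >= lag else None
                feats[f"target_lag_{lag}"] = past
            feats["target_ewma"] = ewma
            out.append(feats)
            # Update after emitting, so the feature never sees the current target.
            t = row["target"]
            ewma = t if ewma is None else alpha * t + (1 - alpha) * ewma
    return out


def split_with_gap(rows, train_end, gap_days, horizon_days):
    """Time-based split: train up to train_end, skip a gap, validate on the horizon."""
    valid_start = train_end + timedelta(days=gap_days)
    valid_end = valid_start + timedelta(days=horizon_days)
    train = [r for r in rows if r["date"] <= train_end]
    valid = [r for r in rows if valid_start < r["date"] <= valid_end]
    return train, valid


# Toy data: two stores, five consecutive days each.
rows = [
    {"date": date(2018, 1, d), "store_id": s, "target": float(d * s)}
    for s in (1, 2)
    for d in range(1, 6)
]
features = add_time_series_features(rows, ["store_id"])
train, valid = split_with_gap(features, date(2018, 1, 2), gap_days=1, horizon_days=2)
```

Lags and the moving average are computed within each time group, so one store's history never leaks into another's. The gap between `train_end` and the validation window mimics the delay between the last training observation and the first date to be forecast.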
16. Text / Natural Language Processing in Driverless AI
i.i.d. and Time-Series Recipes
NLP Recipes: Statistical and Deep Learning
on roadmap
Recipes can mix & match
more information: https://blog.h2o.ai/2018/09/automatic-feature-engineering-text-analytics-latest-addition-kaggle-grandmasters-recipes/
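As an illustration of what a "statistical" NLP recipe can look like, the sketch below computes TF-IDF weights in plain Python. The slide does not say which statistical text features Driverless AI builds, so TF-IDF is an assumption made for illustration, and the function name `tfidf` and the whitespace tokenizer are hypothetical simplifications.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {token: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks)
        vectors.append({
            tok: (count / total) * math.log(n_docs / doc_freq[tok])
            for tok, count in counts.items()
        })
    return vectors


reviews = ["great product fast delivery", "slow delivery", "great price"]
vectors = tfidf(reviews)
```

Tokens that appear in few documents receive higher weights than tokens shared by many, which turns free text into numeric columns that a tabular model can consume alongside other features.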
25. Driverless AI Roadmap
Versions: v1.0, v1.1, v1.2, v1.3, v1.4 (NOW), v1.5, v2.0
Features:
Kaggle Grandmaster Recipes for i.i.d. data
Automatic Visualization
Machine Learning Interpretability
GBM (XGBoost) for high accuracy incl. stacked ensembles (CPU/GPU)
5-minute Install with Docker for Linux/Mac/Windows - Cloud/OnPrem
Standalone Python Scoring Pipeline
Hardware acceleration: NVIDIA GPUs (DGX-1 etc.)
User Management and Security (LDAP/Kerberos)
Data Connectors: NFS/HDFS/S3/GCS/BigQuery, CSV/Excel/Parquet/Feather
GLM (Linear models) for high interpretability (CPU/GPU)
Native Installer: RPM/DEB
Cloud Neutral: Amazon/Microsoft/Google
Kaggle Grandmaster Recipes for Time-Series
IBM Power8/Power9
AutoDoc
Deep Learning TensorFlow Models (CPU/GPU)
Standalone Java Scoring Pipeline (MOJO)
Deep Learning for NLP / Text (CPU/GPU)
LightGBM Models (CPU/GPU)
Improved Time-Series Recipes (Multiple Windows), MLI for Time-Series
Improved Final Ensemble
Local Feature Brain
C++ Scoring Pipeline
FTRL Models
Multi-Node Training and much more (based on customer demand)