H2O.ai
Machine Intelligence
Introduction
Statistician & Machine Learning Scientist at H2O.ai in
Mountain View, California, USA
Ph.D. in Biostatistics with Designated Emphasis in
Computational Science and Engineering from
UC Berkeley (focus on Machine Learning)
Worked as a data scientist at several startups
Written a handful of machine learning R packages
Agenda
What/who is H2O.ai?
H2O Machine Learning Software
H2O Architecture
H2O in R & Demo
Sparkling Water: H2O on Spark
Ensembles in H2O & Demo
H2O.ai
H2O Company
Team: 35. Founded in 2012, Mountain View, CA
Stanford Math & Systems Engineers
H2O.ai Founders
SriSatish Ambati
CEO and Co-founder at H2O.ai
Past: Platfora, Cassandra, DataStax, Azul Systems,
UC Berkeley
Scientific Advisory Council
Dr. Trevor Hastie
John A. Overdeck Professor of Mathematics, Stanford University
PhD in Statistics, Stanford University
Co-author, The Elements of Statistical Learning: Data Mining, Inference, and Prediction
Co-author with John Chambers, Statistical Models in S
Co-author, Generalized Additive Models
108,404 citations (via Google Scholar)
H2O Platform
Part 1 of 7
High Performance ML in R with H2O
H2O Software
H2O Overview
Speed Matters!
Time is valuable
In-memory is faster
Distributed is faster
High speed AND accuracy
Current Algorithm Overview
Statistical Analysis: Linear Models (GLM), Cox Proportional Hazards, Naïve Bayes
Ensembles: Random Forest, Distributed Trees, Gradient Boosting Machine, R Package - Super Learner Ensembles
Deep Neural Networks: Multi-layer Feed-Forward Neural Network, Auto-encoder, Anomaly Detection, Deep Features
Clustering: K-Means
Dimension Reduction: Principal Component Analysis, Generalized Low Rank Models
Solvers & Optimization: Generalized ADMM Solver, L-BFGS (Quasi Newton Method), Ordinary Least-Square Solver, Stochastic Gradient Descent
Data Munging: Integrated R-Environment, Slice, Log Transform
H2O Flow Interface
http://h2o.ai/download
https://github.com/h2oai/h2o-3
H2O Architecture
Part 2 of 7
High Performance ML in R with H2O
H2O Components
Distributed H2O Frame
Communication in H2O
Network Communication
H2O requires network communication to JVMs in unrelated process or machine memory spaces.
That network communication can be fast or slow, or may drop packets & sockets (even TCP can silently fail), and may need to be retried.
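The retry idea can be sketched in a few lines. This is only an illustration of retrying a failed call with exponential backoff, not H2O's actual RPC layer, which the slides do not describe. The names `TransientNetworkError`, `call_with_retry`, and `make_flaky_send` are hypothetical, and the sketch assumes the remote operation is safe to repeat.

```python
import random
import time


class TransientNetworkError(Exception):
    """Hypothetical stand-in for a dropped packet, closed socket, or timeout."""


def call_with_retry(send, payload, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call send(payload), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload)
        except TransientNetworkError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # Wait 1x, 2x, 4x, ... the base delay, plus random jitter
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay * (1 + random.random()))


def make_flaky_send(failures_before_success):
    """Build a fake send function that fails a fixed number of times, then echoes."""
    state = {"calls": 0}

    def send(payload):
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientNetworkError("simulated drop")
        return ("ack", payload)

    send.state = state  # exposed so callers can inspect the call count
    return send
```

Retrying is only safe when repeating the request cannot apply the same work twice, which is why the sketch treats the call as idempotent.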
Data Processing in H2O
Map Reduce
Map/Reduce is a nice way to write blatantly parallel code (although not the only way), and we support a particularly fast and efficient flavor.
Distributed fork/join and parallel map: within each node, classic fork/join.
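As a rough illustration of the map/reduce pattern, the sketch below computes a mean over chunked data: each chunk is mapped to a partial (sum, count), and the partials are reduced pairwise. It uses Python threads on one machine as a stand-in for H2O's distributed tasks; the function names and the chunk size are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(values, chunk_size):
    """Cut a list into consecutive chunks, mimicking a distributed column."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def map_chunk(chunk):
    """Map step: summarize one chunk as a (sum, count) pair."""
    return (sum(chunk), len(chunk))


def reduce_pair(a, b):
    """Reduce step: combine two partial results. The operation is associative,
    so partials can be merged in any grouping."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(values, chunk_size=4, workers=2):
    """Mean of values via a parallel map over chunks followed by a reduce."""
    chunks = split_into_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    total, count = reduce(reduce_pair, partials, (0, 0))
    if count == 0:
        raise ValueError("cannot take the mean of an empty column")
    return total / count
```

Because the reduce step is associative, the partial results could equally be merged in a tree across nodes rather than in one sequential pass.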
Group By
We have a GroupBy operator running at scale (called ddply in the R community).
GroupBy can handle millions of groups on billions of rows, and runs Map/Reduce tasks on the group members.
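A group-by can be phrased as the same map/reduce pattern: each chunk produces per-group partial aggregates, and partials are merged by key. The sketch below computes a per-group mean on in-memory chunks; it is a simplified illustration, not H2O's GroupBy operator, and the names `map_groups`, `merge_groups`, and `group_means` are hypothetical.

```python
from collections import defaultdict


def map_groups(rows):
    """Map step: per-chunk partial aggregates [sum, count], keyed by group."""
    partial = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        partial[key][0] += value
        partial[key][1] += 1
    return partial


def merge_groups(left, right):
    """Reduce step: merge two partial aggregate tables key by key."""
    merged = defaultdict(lambda: [0.0, 0])
    for part in (left, right):
        for key, (total, count) in part.items():
            merged[key][0] += total
            merged[key][1] += count
    return merged


def group_means(chunks):
    """Per-group mean over a list of chunks of (group, value) rows."""
    merged = defaultdict(lambda: [0.0, 0])
    for chunk in chunks:
        merged = merge_groups(merged, map_groups(chunk))
    return {key: total / count for key, (total, count) in merged.items()}
```

For example, `group_means([[("a", 1.0)], [("a", 3.0)]])` averages the two rows of group `"a"` even though they live in different chunks.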
Ease of Use
H2O has overloaded all the basic data frame manipulation functions in R and Python.
Tasks such as imputation and one-hot encoding of categoricals are performed inside the algorithms.
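For readers unfamiliar with the two preprocessing steps named above, here is a minimal sketch of what they do. Mean imputation is assumed only for illustration; the slides do not say which imputation strategy the algorithms use, and the function names are hypothetical.

```python
def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]


def one_hot(column):
    """Expand a categorical column into one 0/1 indicator per level.

    Returns the sorted levels and, for each row, a list of indicators
    in the same order as the levels.
    """
    levels = sorted(set(column))
    rows = [[1 if v == level else 0 for level in levels] for v in column]
    return levels, rows
```

When these steps happen inside the algorithm, the user passes the raw frame with categorical and missing values and does not build the expanded matrix by hand.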
H2O on Amazon
Part 3 of 7
High Performance ML in R with H2O
H2O on Amazon EC2
NERSC Supercomputers
h2o R package on CRAN
Start H2O Cluster from R
H2O in R: Load Data
H2O in R: Train & Test
H2O in R: Plotting
H2O in R: Grid Search
Live H2O Demo!
https://gist.github.com/ledell
Install H2O (stable): install_h2o_slater.R
Demo: h2o_higgs_simple_demo.R
H2O Ensemble
Part 5 of 7
High Performance ML in R with H2O
What is Ensemble Learning?
What it is:
Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms. (Wikipedia, 2015)
Random Forests and Gradient Boosting Machines
(GBM) are both ensembles of decision trees.
Stacking, or Super Learning, is a technique for
combining various learners into a single, powerful
learner using a second-level metalearning algorithm.
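To make the stacking idea concrete, here is a toy sketch in plain Python. Two base learners (a constant-mean predictor and a one-variable least-squares line) are cross-validated to produce out-of-fold predictions, and a metalearner then picks a convex weight between them that minimizes squared error on those predictions. The grid-searched weight is a deliberately simple stand-in chosen for this example; it is not the metalearner used by H2O Ensemble, and all function names are hypothetical.

```python
def fit_mean(xs, ys):
    """Base learner 1: always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """Base learner 2: simple least-squares line in one variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def cv_predictions(fit, xs, ys, k):
    """Level-one data: each point is predicted by a model that never saw it."""
    preds = [None] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            preds[i] = model(xs[i])
    return preds


def mse(preds, ys):
    """Mean squared error between predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def fit_stack(xs, ys, k=3, steps=20):
    """Return (weight on the mean learner, stacked prediction function)."""
    z_mean = cv_predictions(fit_mean, xs, ys, k)
    z_line = cv_predictions(fit_line, xs, ys, k)
    # Metalearner (assumed for illustration): grid search over convex weights,
    # scored on the cross-validated predictions only
    candidates = [s / steps for s in range(steps + 1)]
    w = min(candidates, key=lambda c: mse(
        [c * a + (1 - c) * b for a, b in zip(z_mean, z_line)], ys))
    # Refit both base learners on all of the data for the final ensemble
    m_mean, m_line = fit_mean(xs, ys), fit_line(xs, ys)
    return w, (lambda x: w * m_mean(x) + (1 - w) * m_line(x))
```

Training the metalearner on out-of-fold predictions rather than in-sample fits is what keeps it from simply rewarding the base learner that overfits the most.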
H2O Ensemble Overview
ML Tasks
Regression
Binary Classification / Ranking
Coming soon: Support for multi-class
Super Learner: The setup
Super Learner: The Algorithm
H2O Ensemble R Interface
H2O Ensemble R Interface
Live H2O Demo!
https://gist.github.com/ledell
Install H2O Ensemble: install_h2oEnsemble.R
Demo: lending_club_bad_loans_ensemble.R
Sparkling Water
Part 6 of 7
High Performance ML in R with H2O
Apache Spark and SparkR
H2O vs SparkR
H2O Sparkling Water
H2O in Action
Part 7 of 7
High Performance ML in R with H2O
Actual Customer Use Cases
H2O on Kaggle
H2O starter scripts available on Kaggle
H2O is used in many competitions on Kaggle
Mark Landry, H2O Data Scientist and Competitive Kaggler
https://www.kaggle.com/mlandry
Where to learn more?
H2O Online Training (free): http://learn.h2o.ai
H2O Slidedecks: http://www.slideshare.net/0xdata
H2O Video Presentations: https://www.youtube.com/user/0xdata
H2O Community Events & Meetups: http://h2o.ai/events
Machine Learning & Data Science courses: http://coursebuffet.com
H2O Booklets
https://github.com/h2oai/h2o-3/tree/master/h2o-docs/src/booklets/v2_2015/PDFs/online
35 Speakers
Training
2-Full Days
Nov. 9 - 11
http://world.h2o.ai
20% Discount code: h2ocommunity
Customers
Community
Evangelists