SlideShare una empresa de Scribd logo
1 de 76
Descargar para leer sin conexión
1 / 47
mlpack: or, How I Learned To Stop Worrying and
Love C++
Ryan R. Curtin
September 23, 2016
Introduction
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
2 / 47
Why are we here?
Introduction
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
2 / 47
Why are we here?
speed
Introduction
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
2 / 47
Why are we here?
speed programming languages
Introduction
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
2 / 47
Why are we here?
speed programming languages machine learning
Introduction
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
2 / 47
Why are we here?
speed programming languages machine learning
C++
Introduction
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
2 / 47
Why are we here?
speed programming languages machine learning
C++ mlpack
Introduction
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
2 / 47
Why are we here?
speed programming languages machine learning
C++ mlpack fast
Graph #1
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
3 / 47
Graph #1
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
3 / 47
low-level high-level
Graph #1
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
3 / 47
low-level high-level
ASM
Graph #1
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
3 / 47
low-level high-level
ASM
Graph #1
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
3 / 47
low-level high-level
ASM VB
Graph #1
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
3 / 47
low-level high-level
ASM VB
Graph #1
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
3 / 47
low-level high-level
ASM VB
C
C++
Java
Python
Scala
Ruby
INTERCAL
C#
PHP
JS
Go
Tcl
Lua
Note: this is not a scientific or particularly accurate representation.
Graph #1
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
3 / 47
low-level high-level
ASM VB
C
C++
Java
Python
Scala
Ruby
INTERCAL
C#
PHP
JS
Go
Tcl
Lua
Note: this is not a scientific or particularly accurate representation.
fast easy
Graph #1
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
3 / 47
low-level high-level
ASM VB
C
C++
Java
Python
Scala
Ruby
INTERCAL
C#
PHP
JS
Go
Tcl
Lua
Note: this is not a scientific or particularly accurate representation.
fast easy
specific portable
The Big Tradeoff
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
4 / 47
speed vs. portability and readability
The Big Tradeoff
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
4 / 47
speed vs. portability and readability
The Big Tradeoff
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
4 / 47
speed vs. portability and readability
If we’re careful, we can get speed, portability, and
readability by using C++.
So, mlpack.
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
5 / 47
What is it?
So, mlpack.
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
5 / 47
What is it?
● a fast general-purpose C++ machine learning library
● contains flexible implementations of common and
cutting-edge machine learning algorithms
● for fast or big runs on single workstations
● bindings are available for R, and are coming for other
languages
● commonly used inside the machine learning community
● 60+ developers from around the world
● frequent participation in the Google Summer of Code
program
R.R. Curtin, J.R. Cline, N.P. Slagle, W.B. March, P. Ram, N.A. Mehta, A.G. Gray, “mlpack: a
scalable C++ machine learning library”, in The Journal of Machine Learning Research, vol. 14,
p. 801–805, 2013.
So, mlpack.
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
5 / 47
What is it?
● a fast general-purpose C++ machine learning library
● contains flexible implementations of common and
cutting-edge machine learning algorithms
● for fast or big runs on single workstations
● bindings are available for R, and are coming for other
languages
● commonly used inside the machine learning community
● 60+ developers from around the world
● frequent participation in the Google Summer of Code
program
R.R. Curtin, J.R. Cline, N.P. Slagle, W.B. March, P. Ram, N.A. Mehta, A.G. Gray, “mlpack: a
scalable C++ machine learning library”, in The Journal of Machine Learning Research, vol. 14,
p. 801–805, 2013.
http://www.mlpack.org/
https://github.com/mlpack/mlpack/
Standard ML
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
6 / 47
mlpack implements a lot of standard machine learning
techniques.
● k-means and mean shift clustering
● nearest neighbor search and range search
● naive Bayes classifier
● regression: linear, logistic, LARS, softmax
● independent components analysis
● matrix factorization and collaborative filtering
● AdaBoost (plus some weak learners)
● GMMs and HMMs
● deep learning (still experimental)
● kernel PCA and regular PCA
● data preprocessing utilities
● numerous generic optimizers
Cutting-edge ML
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
7 / 47
mlpack also implements new, cutting-edge machine learning
techniques.
● several variants of the k-means algorithm
● density estimation trees
● Hoeffding (streaming) decision trees (and experimental
streaming forests)
● sparse coding and local coordinate coding
● dual-tree minimum spanning tree calculation
● dual-tree fast max-kernel search
● neighborhood components analysis
● rank-approximate nearest neighbor search
● automatically tuned locality-sensitive hashing (still
experimental)
● QUIC-SVD (fast approximate SVD)
Command-line programs
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
8 / 47
You don’t need to be a C++ expert.
Command-line programs
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
9 / 47
You don’t need to be a C++ expert.
# Train AdaBoost model.
$ mlpack_adaboost -t training_file.h5 -l training_labels.h5 
> -M trained_model.bin
# Predict with AdaBoost model.
$ mlpack_adaboost -m trained_model.bin -T test_set.csv 
> -o test_set_predictions.csv
Command-line programs
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
10 / 47
You don’t need to be a C++ expert.
# Train AdaBoost model.
$ mlpack_adaboost -t training_file.h5 -l training_labels.h5 
> -M trained_model.bin
# Predict with AdaBoost model.
$ mlpack_adaboost -m trained_model.bin -T test_set.csv 
> -o test_set_predictions.csv
# Use PCA to reduce to 5 dimensions.
$ mlpack_pca -i dataset.txt -d 5 -o new_dataset.csv
Command-line programs
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
11 / 47
You don’t need to be a C++ expert.
# Train AdaBoost model.
$ mlpack_adaboost -t training_file.h5 -l training_labels.h5 
> -M trained_model.bin
# Predict with AdaBoost model.
$ mlpack_adaboost -m trained_model.bin -T test_set.csv 
> -o test_set_predictions.csv
# Use PCA to reduce to 5 dimensions.
$ mlpack_pca -i dataset.txt -d 5 -o new_dataset.csv
# Impute missing values ("NULL") in the input dataset to the
# mean in that dimension.
$ mlpack_preprocess_imputer -i dataset.h5 -s mean -o imputed.h5
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
12 / 47
Why write an algorithm for one specific situation?
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
13 / 47
Why write an algorithm for one specific situation?
NearestNeighborSearch n(dataset);
n.Search(query_set, 3, neighbors, distances);
What if I don’t want the Euclidean distance?
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
14 / 47
Why write an algorithm for one specific situation?
// The numeric parameter is the value of p for the p-norm to
// use. 1 = Manhattan distance, 2 = Euclidean distance, etc.
NearestNeighborSearch n(dataset, 1);
n.Search(query_set, 3, neighbors, distances);
Ok, this is a little better!
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
15 / 47
Why write an algorithm for one specific situation?
// ManhattanDistance is a class with a method Evaluate().
NearestNeighborSearch<ManhattanDistance> n(dataset);
n.Search(query_set, 3, neighbors, distances);
This is much better! The user can specify whatever
distance metric they want, including one they write
themselves.
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
16 / 47
Why write an algorithm for one specific situation?
// This will _definitely_ get me best paper at ICML! I can
// feel it!
class MyStupidDistance
{
static double Evaluate(const arma::vec& a,
const arma::vec& b)
{
return 15.0 * std::abs(a[0] - b[0]);
}
};
// Now we can use it!
NearestNeighborSearch<MyStupidDistance> n(dataset);
n.Search(query_set, 3, neighbors, distances);
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
17 / 47
Why write an algorithm for one specific situation?
// We can also use sparse matrices instead!
NearestNeighborSearch<MyStupidDistance, arma::sp_mat>
n(sparse_dataset);
n.Search(sparse_query_set, 3, neighbors, distances);
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
18 / 47
Why write an algorithm for one specific situation?
// Nearest neighbor search with arbitrary types of trees!
NearestNeighborSearch<EuclideanDistance, arma::mat, KDTree> kn;
NearestNeighborSearch<EuclideanDistance, arma::sp_mat, CoverTree> cn;
NearestNeighborSearch<ManhattanDistance, arma::mat, Octree> on;
NearestNeighborSearch<ChebyshevDistance, arma::sp_mat, RPlusTree> rn;
NearestNeighborSearch<MahalanobisDistance, arma::mat, RPTree> rpn;
NearestNeighborSearch<EuclideanDistance, arma::mat, XTree> xn;
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
19 / 47
Why write an algorithm for one specific situation?
// Nearest neighbor search with arbitrary types of trees!
NearestNeighborSearch<EuclideanDistance, arma::mat, KDTree> kn;
NearestNeighborSearch<EuclideanDistance, arma::sp_mat, CoverTree> cn;
NearestNeighborSearch<ManhattanDistance, arma::mat, Octree> on;
NearestNeighborSearch<ChebyshevDistance, arma::sp_mat, RPlusTree> rn;
NearestNeighborSearch<MahalanobisDistance, arma::mat, RPTree> rpn;
NearestNeighborSearch<EuclideanDistance, arma::mat, XTree> xn;
R.R. Curtin, “Improving dual-tree algorithms”. PhD thesis, Georgia Institute of Technology, At-
lanta, GA, 8/2015.
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
20 / 47
Why write an algorithm for one specific situation?
// Nearest neighbor search with arbitrary types of trees!
NearestNeighborSearch<EuclideanDistance, arma::mat, KDTree> kn;
NearestNeighborSearch<EuclideanDistance, arma::sp_mat, CoverTree> cn;
NearestNeighborSearch<ManhattanDistance, arma::mat, Octree> on;
NearestNeighborSearch<ChebyshevDistance, arma::sp_mat, RPlusTree> rn;
NearestNeighborSearch<MahalanobisDistance, arma::mat, RPTree> rpn;
NearestNeighborSearch<EuclideanDistance, arma::mat, XTree> xn;
R.R. Curtin, “Improving dual-tree algorithms”. PhD thesis, Georgia Institute of Technology, At-
lanta, GA, 8/2015.
Genericity
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
21 / 47
Why write an algorithm for one specific situation?
// Nearest neighbor search with arbitrary types of trees!
NearestNeighborSearch<EuclideanDistance, arma::mat, KDTree> kn;
NearestNeighborSearch<EuclideanDistance, arma::sp_mat, CoverTree> cn;
NearestNeighborSearch<ManhattanDistance, arma::mat, Octree> on;
NearestNeighborSearch<ChebyshevDistance, arma::sp_mat, RPlusTree> rn;
NearestNeighborSearch<MahalanobisDistance, arma::mat, RPTree> rpn;
NearestNeighborSearch<EuclideanDistance, arma::mat, XTree> xn;
R.R. Curtin, “Improving dual-tree algorithms”. PhD thesis, Georgia Institute of Technology, At-
lanta, GA, 8/2015.
Pros of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
22 / 47
C++ is great!
Pros of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
22 / 47
C++ is great!
● Generic programming at compile time via templates.
Pros of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
22 / 47
C++ is great!
● Generic programming at compile time via templates.
● Low-level memory management.
Pros of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
22 / 47
C++ is great!
● Generic programming at compile time via templates.
● Low-level memory management.
● Little to no runtime overhead.
Pros of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
22 / 47
C++ is great!
● Generic programming at compile time via templates.
● Low-level memory management.
● Little to no runtime overhead.
● Well-known!
Pros of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
22 / 47
C++ is great!
● Generic programming at compile time via templates.
● Low-level memory management.
● Little to no runtime overhead.
● Well-known!
● The Armadillo library gives us good linear algebra
primitives.
Pros of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
23 / 47
C++ is great!
● Generic programming at compile time via templates.
● Low-level memory management.
● Little to no runtime overhead.
● Well-known!
● The Armadillo library gives us good linear algebra
primitives.
using namespace arma;
extern mat x, y;
mat z = (x + y) * chol(x) + 3 * chol(y.t());
Cons of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
24 / 47
C++ is not great!
Cons of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
24 / 47
C++ is not great!
● Templates can be hard to debug because of error
messages.
Cons of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
24 / 47
C++ is not great!
● Templates can be hard to debug because of error
messages.
● Memory bugs are easy to introduce.
Cons of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
24 / 47
C++ is not great!
● Templates can be hard to debug because of error
messages.
● Memory bugs are easy to introduce.
● The new language revisions are not making the language
any simpler...
Cons of C++
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
24 / 47
C++ is not great!
● Templates can be hard to debug because of error
messages.
● Memory bugs are easy to introduce.
● The new language revisions are not making the language
any simpler...
Why templates?
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
25 / 47
What about virtual inheritance?
Why templates?
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
26 / 47
What about virtual inheritance?
class MyStupidDistance : public Distance
{
virtual double Evaluate(const arma::vec& a,
const arma::vec& b)
{
return 15.0 * std::abs(a[0] - b[0]);
}
};
NearestNeighborSearch n(dataset, new MyStupidDistance());
n.Search(3, neighbors, distances);
Why templates?
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
27 / 47
What about virtual inheritance?
class MyStupidDistance : public Distance
{
virtual double Evaluate(const arma::vec& a,
const arma::vec& b)
{
return 15.0 * std::abs(a[0] - b[0]);
}
};
NearestNeighborSearch n(dataset, new MyStupidDistance());
n.Search(3, neighbors, distances);
vtable lookup penalty!
Why templates?
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
28 / 47
Using inheritance to call a function costs us instructions:
Distance* d =
new MyStupidDistance();
d->Evaluate(a, b);
MyStupidDistance::Evaluate(a, b);
Why templates?
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
29 / 47
Using inheritance to call a function costs us instructions:
Distance* d =
new MyStupidDistance();
d->Evaluate(a, b);
MyStupidDistance::Evaluate(a, b);
; push stack pointer
movq %rsp, %rdi
; get location of function
movq $_ZTV1A+16, (%rsp)
; call Evaluate()
call _ZN1A1aEd
; just call Evaluate()!
call _ZN1B1aEd.isra.0.constprop.1
Why templates?
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
30 / 47
Using inheritance to call a function costs us instructions:
Distance* d =
new MyStupidDistance();
d->Evaluate(a, b);
MyStupidDistance::Evaluate(a, b);
; push stack pointer
movq %rsp, %rdi
; get location of function
movq $_ZTV1A+16, (%rsp)
; call Evaluate()
call _ZN1A1aEd
; just call Evaluate()!
call _ZN1B1aEd.isra.0.constprop.1
Up to 10%+ performance penalty in some situations!
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
31 / 47
What about math? (Armadillo)
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
32 / 47
What about math? (Armadillo)
In C:
extern double** a, b, c, d, e;
extern int rows, cols;
// We want to do e = a + b + c + d.
mat_copy(e, a, rows, cols);
mat_add(e, b, rows, cols);
mat_add(e, c, rows, cols);
mat_add(e, d, rows, cols);
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
33 / 47
What about math? (Armadillo)
In C with a custom function:
extern double** a, b, c, d, e;
extern int rows, cols;
// We want to do e = a + b + c + d.
mat_add4_into(e, a, b, c, d, rows, cols);
Fastest! (one pass)
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
34 / 47
What about math? (Armadillo)
In MATLAB:
e = a + b + c + d
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
35 / 47
What about math? (Armadillo)
In MATLAB:
e = a + b + c + d
Beautiful!
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
36 / 47
What about math? (Armadillo)
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
37 / 47
What about math? (Armadillo)
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
38 / 47
What about math? (Armadillo)
In C++ (with Armadillo):
using namespace arma;
extern mat a, b, c, d;
mat e = a + b + c + d;
No temporaries, only one pass! Just as fast as the fastest C
implementation.
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
39 / 47
What about math? (Armadillo)
In C++ (with Armadillo):
using namespace arma;
extern mat a, b, c, d;
mat e = a + b + c + d;
● operator+(mat, mat) returns a placeholder type
op<mat, mat, add>
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
40 / 47
What about math? (Armadillo)
In C++ (with Armadillo):
using namespace arma;
extern mat a, b, c, d;
mat e = a + b + c + d;
● operator+(mat, mat) returns a placeholder type
op<mat, mat, add>
● operator+() is also overloaded for op<...> arguments
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
41 / 47
What about math? (Armadillo)
In C++ (with Armadillo):
using namespace arma;
extern mat a, b, c, d;
mat e = a + b + c + d;
● operator+(mat, mat) returns a placeholder type
op<mat, mat, add>
● operator+() is also overloaded for op<...> arguments
● So a + b + c + d returns
op<mat, op<mat, op<mat, mat, add>, add>, add>
Compile-time expressions
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
42 / 47
What about math? (Armadillo)
In C++ (with Armadillo):
using namespace arma;
extern mat a, b, c, d;
mat e = a + b + c + d;
● operator+(mat, mat) returns a placeholder type
op<mat, mat, add>
● operator+() is also overloaded for op<...> arguments
● So a + b + c + d returns
op<mat, op<mat, op<mat, mat, add>, add>, add>
● operator=() is overloaded to ‘unwrap’ op<> expressions
Take-home
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
43 / 47
● Templates give us generic code.
● Templates allow us to generate fast code.
kNN benchmarks
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
44 / 47
dataset d N mlpack mlpy matlab scikit shogun Weka
isolet 617 8k 15.65s 59.09s 50.88s 44.59s 59.56s 220.38s
corel 32 68k 17.70s 95.26s fail 63.32s fail 29.38s
covertype 54 581k 18.04s 27.68s >9000s 44.55s >9000s 42.34s
twitter 78 583k 1573.92s >9000s >9000s 4637.81s fail >9000s
mnist 784 70k 3129.46s >9000s fail 8494.24s 6040.16s >9000s
tinyImages 384 100k 4535.38s 9000s fail >9000s fail >9000s
kNN benchmarks
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
44 / 47
dataset d N mlpack mlpy matlab scikit shogun Weka
isolet 617 8k 15.65s 59.09s 50.88s 44.59s 59.56s 220.38s
corel 32 68k 17.70s 95.26s fail 63.32s fail 29.38s
covertype 54 581k 18.04s 27.68s >9000s 44.55s >9000s 42.34s
twitter 78 583k 1573.92s >9000s >9000s 4637.81s fail >9000s
mnist 784 70k 3129.46s >9000s fail 8494.24s 6040.16s >9000s
tinyImages 384 100k 4535.38s 9000s fail >9000s fail >9000s
vs. Spark
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
45 / 47
We can use mmap() for out-of-core learning because our
algorithms are generic!
vs. Spark
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
45 / 47
We can use mmap() for out-of-core learning because our
algorithms are generic!
D. Fang, P. Chau. M3: scaling up machine learning via memory mapping, SIGMOD/PODS 2016.
What’s coming?
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
46 / 47
● Bindings to other languages! (more than just R)
● Non-experimental support for deep learning
● Automatic tuning of machine learning algorithms
● Further compile-time speedups
Questions and comments?
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line
programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time
expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and
comments?
47 / 47
http://www.mlpack.org/
https://github.com/mlpack/mlpack/

Más contenido relacionado

La actualidad más candente

Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15MLconf
 
Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"
Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"
Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"Fwdays
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...MLconf
 
Fine tuning large LMs
Fine tuning large LMsFine tuning large LMs
Fine tuning large LMsSylvainGugger
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...MLconf
 
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
 Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D... Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...Databricks
 
Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Nikhil Garg
 
Learning to Translate with Joey NMT
Learning to Translate with Joey NMTLearning to Translate with Joey NMT
Learning to Translate with Joey NMTJulia Kreutzer
 
The road ahead for scientific computing with Python
The road ahead for scientific computing with PythonThe road ahead for scientific computing with Python
The road ahead for scientific computing with PythonRalf Gommers
 
HyperGraphDb
HyperGraphDbHyperGraphDb
HyperGraphDbborislav
 
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016MLconf
 
Open source ml systems that need to be built
Open source ml systems that need to be builtOpen source ml systems that need to be built
Open source ml systems that need to be builtNikhil Garg
 
Julia language: inside the corporation
Julia language: inside the corporationJulia language: inside the corporation
Julia language: inside the corporationAndre Pemmelaar
 
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...MLconf
 
Managing and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonManaging and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonSimon Frid
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016MLconf
 
MLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott ClarkMLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott ClarkSigOpt
 
Natural Language to Visualization by Neural Machine Translation
Natural Language to Visualization by Neural Machine TranslationNatural Language to Visualization by Neural Machine Translation
Natural Language to Visualization by Neural Machine Translationivaderivader
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerPyData
 

La actualidad más candente (20)

Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15
 
Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"
Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"
Oleksii Moskalenko "Continuous Delivery of ML Pipelines to Production"
 
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
Jean-François Puget, Distinguished Engineer, Machine Learning and Optimizatio...
 
Fine tuning large LMs
Fine tuning large LMsFine tuning large LMs
Fine tuning large LMs
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
 
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
 Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D... Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
 
Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)
 
Learning to Translate with Joey NMT
Learning to Translate with Joey NMTLearning to Translate with Joey NMT
Learning to Translate with Joey NMT
 
Recent nlp trends
Recent nlp trendsRecent nlp trends
Recent nlp trends
 
The road ahead for scientific computing with Python
The road ahead for scientific computing with PythonThe road ahead for scientific computing with Python
The road ahead for scientific computing with Python
 
HyperGraphDb
HyperGraphDbHyperGraphDb
HyperGraphDb
 
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
 
Open source ml systems that need to be built
Open source ml systems that need to be builtOpen source ml systems that need to be built
Open source ml systems that need to be built
 
Julia language: inside the corporation
Julia language: inside the corporationJulia language: inside the corporation
Julia language: inside the corporation
 
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
 
Managing and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in PythonManaging and Versioning Machine Learning Models in Python
Managing and Versioning Machine Learning Models in Python
 
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
Josh Patterson, Advisor, Skymind – Deep learning for Industry at MLconf ATL 2016
 
MLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott ClarkMLConf 2016 SigOpt Talk by Scott Clark
MLConf 2016 SigOpt Talk by Scott Clark
 
Natural Language to Visualization by Neural Machine Translation
Natural Language to Visualization by Neural Machine TranslationNatural Language to Visualization by Neural Machine Translation
Natural Language to Visualization by Neural Machine Translation
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne Bauer
 

Similar a Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016

Scala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala TaiwanScala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala TaiwanJimin Hsieh
 
How aspects clean your code
How aspects clean your codeHow aspects clean your code
How aspects clean your codeBarbara Fusinska
 
The Past, Present, and Future of OpenACC
The Past, Present, and Future of OpenACCThe Past, Present, and Future of OpenACC
The Past, Present, and Future of OpenACCinside-BigData.com
 
Death to project documentation with eXtreme Programming
Death to project documentation with eXtreme ProgrammingDeath to project documentation with eXtreme Programming
Death to project documentation with eXtreme ProgrammingAlex Fernandez
 
Your interactive computing
Your interactive computingYour interactive computing
Your interactive computingYung-Yu Chen
 
Jak aspekty uporządkują twój kod.
Jak aspekty uporządkują twój kod.Jak aspekty uporządkują twój kod.
Jak aspekty uporządkują twój kod.Future Processing
 
Simplifying training deep and serving learning models with big data in python...
Simplifying training deep and serving learning models with big data in python...Simplifying training deep and serving learning models with big data in python...
Simplifying training deep and serving learning models with big data in python...Holden Karau
 
RTL DESIGN IN ML WORLD_OBJECT AUTOMATION Inc
RTL DESIGN IN ML WORLD_OBJECT AUTOMATION IncRTL DESIGN IN ML WORLD_OBJECT AUTOMATION Inc
RTL DESIGN IN ML WORLD_OBJECT AUTOMATION IncObject Automation
 
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...Holden Karau
 
Object oriented slides
Object oriented slidesObject oriented slides
Object oriented slidesahad nadeem
 
Machine Learning on Your Hand - Introduction to Tensorflow Lite Preview
Machine Learning on Your Hand - Introduction to Tensorflow Lite PreviewMachine Learning on Your Hand - Introduction to Tensorflow Lite Preview
Machine Learning on Your Hand - Introduction to Tensorflow Lite PreviewModulabs
 
Big Data Beyond the JVM - Strata San Jose 2018
Big Data Beyond the JVM - Strata San Jose 2018Big Data Beyond the JVM - Strata San Jose 2018
Big Data Beyond the JVM - Strata San Jose 2018Holden Karau
 
Profiling and Optimizing for Xeon Phi with Allinea MAP
Profiling and Optimizing for Xeon Phi with Allinea MAPProfiling and Optimizing for Xeon Phi with Allinea MAP
Profiling and Optimizing for Xeon Phi with Allinea MAPIntel IT Center
 
Accelerating Big Data beyond the JVM - Fosdem 2018
Accelerating Big Data beyond the JVM - Fosdem 2018Accelerating Big Data beyond the JVM - Fosdem 2018
Accelerating Big Data beyond the JVM - Fosdem 2018Holden Karau
 
Harmonic Stack for Speed
Harmonic Stack for SpeedHarmonic Stack for Speed
Harmonic Stack for SpeedYung-Yu Chen
 
Real World Application of Development
Real World Application of DevelopmentReal World Application of Development
Real World Application of Developmentdjones101
 
GraphQL - hot or not? How to simplify API based services?
GraphQL - hot or not? How to simplify  API based services?GraphQL - hot or not? How to simplify  API based services?
GraphQL - hot or not? How to simplify API based services?Adam Klimczyk
 
Engineer Engineering Software
Engineer Engineering SoftwareEngineer Engineering Software
Engineer Engineering SoftwareYung-Yu Chen
 

Similar a Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016 (20)

Scala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala TaiwanScala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala Taiwan
 
How aspects clean your code
How aspects clean your codeHow aspects clean your code
How aspects clean your code
 
The Past, Present, and Future of OpenACC
The Past, Present, and Future of OpenACCThe Past, Present, and Future of OpenACC
The Past, Present, and Future of OpenACC
 
Death to project documentation with eXtreme Programming
Death to project documentation with eXtreme ProgrammingDeath to project documentation with eXtreme Programming
Death to project documentation with eXtreme Programming
 
Your interactive computing
Your interactive computingYour interactive computing
Your interactive computing
 
Jak aspekty uporządkują twój kod.
Jak aspekty uporządkują twój kod.Jak aspekty uporządkują twój kod.
Jak aspekty uporządkują twój kod.
 
Simplifying training deep and serving learning models with big data in python...
Simplifying training deep and serving learning models with big data in python...Simplifying training deep and serving learning models with big data in python...
Simplifying training deep and serving learning models with big data in python...
 
RTL DESIGN IN ML WORLD_OBJECT AUTOMATION Inc
RTL DESIGN IN ML WORLD_OBJECT AUTOMATION IncRTL DESIGN IN ML WORLD_OBJECT AUTOMATION Inc
RTL DESIGN IN ML WORLD_OBJECT AUTOMATION Inc
 
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
Powering Tensorflow with big data using Apache Beam, Flink, and Spark - OSCON...
 
Object oriented slides
Object oriented slidesObject oriented slides
Object oriented slides
 
Lecture 11
Lecture 11Lecture 11
Lecture 11
 
Machine Learning on Your Hand - Introduction to Tensorflow Lite Preview
Machine Learning on Your Hand - Introduction to Tensorflow Lite PreviewMachine Learning on Your Hand - Introduction to Tensorflow Lite Preview
Machine Learning on Your Hand - Introduction to Tensorflow Lite Preview
 
Big Data Beyond the JVM - Strata San Jose 2018
Big Data Beyond the JVM - Strata San Jose 2018Big Data Beyond the JVM - Strata San Jose 2018
Big Data Beyond the JVM - Strata San Jose 2018
 
Profiling and Optimizing for Xeon Phi with Allinea MAP
Profiling and Optimizing for Xeon Phi with Allinea MAPProfiling and Optimizing for Xeon Phi with Allinea MAP
Profiling and Optimizing for Xeon Phi with Allinea MAP
 
Accelerating Big Data beyond the JVM - Fosdem 2018
Accelerating Big Data beyond the JVM - Fosdem 2018Accelerating Big Data beyond the JVM - Fosdem 2018
Accelerating Big Data beyond the JVM - Fosdem 2018
 
Harmonic Stack for Speed
Harmonic Stack for SpeedHarmonic Stack for Speed
Harmonic Stack for Speed
 
Real World Application of Development
Real World Application of DevelopmentReal World Application of Development
Real World Application of Development
 
GraphQL - hot or not? How to simplify API based services?
GraphQL - hot or not? How to simplify  API based services?GraphQL - hot or not? How to simplify  API based services?
GraphQL - hot or not? How to simplify API based services?
 
Introduction to c++
Introduction to c++Introduction to c++
Introduction to c++
 
Engineer Engineering Software
Engineer Engineering SoftwareEngineer Engineering Software
Engineer Engineering Software
 

Más de MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceMLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLMLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

Más de MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Último

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Último (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016

  • 1. 1 / 47 mlpack: or, How I Learned To Stop Worrying and Love C++ Ryan R. Curtin September 23, 2016
  • 2. Introduction ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 2 / 47 Why are we here?
  • 3. Introduction ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 2 / 47 Why are we here? speed
  • 4. Introduction ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 2 / 47 Why are we here? speed programming languages
  • 5. Introduction ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 2 / 47 Why are we here? speed programming languages machine learning
  • 6. Introduction ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 2 / 47 Why are we here? speed programming languages machine learning C++
  • 7. Introduction ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 2 / 47 Why are we here? speed programming languages machine learning C++ mlpack
  • 8. Introduction ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 2 / 47 Why are we here? speed programming languages machine learning C++ mlpack fast
  • 9. Graph #1 ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 3 / 47
  • 10. Graph #1 ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 3 / 47 low-level high-level
  • 11. Graph #1 ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 3 / 47 low-level high-level ASM
  • 12. Graph #1 ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 3 / 47 low-level high-level ASM
  • 13. Graph #1 ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 3 / 47 low-level high-level ASM VB
  • 14. Graph #1 ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 3 / 47 low-level high-level ASM VB
  • 15. Graph #1 ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 3 / 47 low-level high-level ASM VB C C++ Java Python Scala Ruby INTERCAL C# PHP JS Go Tcl Lua Note: this is not a scientific or particularly accurate representation.
  • 16. Graph #1 ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 3 / 47 low-level high-level ASM VB C C++ Java Python Scala Ruby INTERCAL C# PHP JS Go Tcl Lua Note: this is not a scientific or particularly accurate representation. fast easy
  • 17. Graph #1 ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 3 / 47 low-level high-level ASM VB C C++ Java Python Scala Ruby INTERCAL C# PHP JS Go Tcl Lua Note: this is not a scientific or particularly accurate representation. fast easy specific portable
  • 18. The Big Tradeoff ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 4 / 47 speed vs. portability and readability
  • 19. The Big Tradeoff ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 4 / 47 speed vs. portability and readability
  • 20. The Big Tradeoff ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 4 / 47 speed vs. portability and readability If we’re careful, we can get speed, portability, and readability by using C++.
  • 21. So, mlpack. ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 5 / 47 What is it?
  • 22. So, mlpack. ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 5 / 47 What is it? ● a fast general-purpose C++ machine learning library ● contains flexible implementations of common and cutting-edge machine learning algorithms ● for fast or big runs on single workstations ● bindings are available for R, and are coming for other languages ● commonly used inside the machine learning community ● 60+ developers from around the world ● frequent participation in the Google Summer of Code program R.R. Curtin, J.R. Cline, N.P. Slagle, W.B. March, P. Ram, N.A. Mehta, A.G. Gray, “mlpack: a scalable C++ machine learning library”, in The Journal of Machine Learning Research, vol. 14, p. 801–805, 2013.
  • 23. So, mlpack. ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 5 / 47 What is it? ● a fast general-purpose C++ machine learning library ● contains flexible implementations of common and cutting-edge machine learning algorithms ● for fast or big runs on single workstations ● bindings are available for R, and are coming for other languages ● commonly used inside the machine learning community ● 60+ developers from around the world ● frequent participation in the Google Summer of Code program R.R. Curtin, J.R. Cline, N.P. Slagle, W.B. March, P. Ram, N.A. Mehta, A.G. Gray, “mlpack: a scalable C++ machine learning library”, in The Journal of Machine Learning Research, vol. 14, p. 801–805, 2013. http://www.mlpack.org/ https://github.com/mlpack/mlpack/
  • 24. Standard ML ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 6 / 47 mlpack implements a lot of standard machine learning techniques. ● k-means and mean shift clustering ● nearest neighbor search and range search ● naive Bayes classifier ● regression: linear, logistic, LARS, softmax ● independent components analysis ● matrix factorization and collaborative filtering ● AdaBoost (plus some weak learners) ● GMMs and HMMs ● deep learning (still experimental) ● kernel PCA and regular PCA ● data preprocessing utilities ● numerous generic optimizers
  • 25. Cutting-edge ML ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 7 / 47 mlpack also implements new, cutting-edge machine learning techniques. ● several variants of the k-means algorithm ● density estimation trees ● Hoeffding (streaming) decision trees (and experimental streaming forests) ● sparse coding and local coordinate coding ● dual-tree minimum spanning tree calculation ● dual-tree fast max-kernel search ● neighborhood components analysis ● rank-approximate nearest neighbor search ● automatically tuned locality-sensitive hashing (still experimental) ● QUIC-SVD (fast approximate SVD)
  • 26. Command-line programs ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 8 / 47 You don’t need to be a C++ expert.
  • 27. Command-line programs ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 9 / 47 You don’t need to be a C++ expert. # Train AdaBoost model. $ mlpack_adaboost -t training_file.h5 -l training_labels.h5 > -M trained_model.bin # Predict with AdaBoost model. $ mlpack_adaboost -m trained_model.bin -T test_set.csv > -o test_set_predictions.csv
  • 28. Command-line programs ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 10 / 47 You don’t need to be a C++ expert. # Train AdaBoost model. $ mlpack_adaboost -t training_file.h5 -l training_labels.h5 > -M trained_model.bin # Predict with AdaBoost model. $ mlpack_adaboost -m trained_model.bin -T test_set.csv > -o test_set_predictions.csv # Use PCA to reduce to 5 dimensions. $ mlpack_pca -i dataset.txt -d 5 -o new_dataset.csv
  • 29. Command-line programs ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 11 / 47 You don’t need to be a C++ expert. # Train AdaBoost model. $ mlpack_adaboost -t training_file.h5 -l training_labels.h5 > -M trained_model.bin # Predict with AdaBoost model. $ mlpack_adaboost -m trained_model.bin -T test_set.csv > -o test_set_predictions.csv # Use PCA to reduce to 5 dimensions. $ mlpack_pca -i dataset.txt -d 5 -o new_dataset.csv # Impute missing values ("NULL") in the input dataset to the # mean in that dimension. $ mlpack_preprocess_imputer -i dataset.h5 -s mean -o imputed.h5
  • 30. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 12 / 47 Why write an algorithm for one specific situation?
  • 31. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 13 / 47 Why write an algorithm for one specific situation? NearestNeighborSearch n(dataset); n.Search(query_set, 3, neighbors, distances); What if I don’t want the Euclidean distance?
  • 32. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 14 / 47 Why write an algorithm for one specific situation? // The numeric parameter is the value of p for the p-norm to // use. 1 = Manhattan distance, 2 = Euclidean distance, etc. NearestNeighborSearch n(dataset, 1); n.Search(query_set, 3, neighbors, distances); Ok, this is a little better!
  • 33. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 15 / 47 Why write an algorithm for one specific situation? // ManhattanDistance is a class with a method Evaluate(). NearestNeighborSearch<ManhattanDistance> n(dataset); n.Search(query_set, 3, neighbors, distances); This is much better! The user can specify whatever distance metric they want, including one they write themselves.
  • 34. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 16 / 47 Why write an algorithm for one specific situation? // This will _definitely_ get me best paper at ICML! I can // feel it! class MyStupidDistance { static double Evaluate(const arma::vec& a, const arma::vec& b) { return 15.0 * std::abs(a[0] - b[0]); } }; // Now we can use it! NearestNeighborSearch<MyStupidDistance> n(dataset); n.Search(query_set, 3, neighbors, distances);
  • 35. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 17 / 47 Why write an algorithm for one specific situation? // We can also use sparse matrices instead! NearestNeighborSearch<MyStupidDistance, arma::sp_mat> n(sparse_dataset); n.Search(sparse_query_set, 3, neighbors, distances);
  • 36. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 18 / 47 Why write an algorithm for one specific situation? // Nearest neighbor search with arbitrary types of trees! NearestNeighborSearch<EuclideanDistance, arma::mat, KDTree> kn; NearestNeighborSearch<EuclideanDistance, arma::sp_mat, CoverTree> cn; NearestNeighborSearch<ManhattanDistance, arma::mat, Octree> on; NearestNeighborSearch<ChebyshevDistance, arma::sp_mat, RPlusTree> rn; NearestNeighborSearch<MahalanobisDistance, arma::mat, RPTree> rpn; NearestNeighborSearch<EuclideanDistance, arma::mat, XTree> xn;
  • 37. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 19 / 47 Why write an algorithm for one specific situation? // Nearest neighbor search with arbitrary types of trees! NearestNeighborSearch<EuclideanDistance, arma::mat, KDTree> kn; NearestNeighborSearch<EuclideanDistance, arma::sp_mat, CoverTree> cn; NearestNeighborSearch<ManhattanDistance, arma::mat, Octree> on; NearestNeighborSearch<ChebyshevDistance, arma::sp_mat, RPlusTree> rn; NearestNeighborSearch<MahalanobisDistance, arma::mat, RPTree> rpn; NearestNeighborSearch<EuclideanDistance, arma::mat, XTree> xn; R.R. Curtin, “Improving dual-tree algorithms”. PhD thesis, Georgia Institute of Technology, At- lanta, GA, 8/2015.
  • 38. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 20 / 47 Why write an algorithm for one specific situation? // Nearest neighbor search with arbitrary types of trees! NearestNeighborSearch<EuclideanDistance, arma::mat, KDTree> kn; NearestNeighborSearch<EuclideanDistance, arma::sp_mat, CoverTree> cn; NearestNeighborSearch<ManhattanDistance, arma::mat, Octree> on; NearestNeighborSearch<ChebyshevDistance, arma::sp_mat, RPlusTree> rn; NearestNeighborSearch<MahalanobisDistance, arma::mat, RPTree> rpn; NearestNeighborSearch<EuclideanDistance, arma::mat, XTree> xn; R.R. Curtin, “Improving dual-tree algorithms”. PhD thesis, Georgia Institute of Technology, At- lanta, GA, 8/2015.
  • 39. Genericity ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 21 / 47 Why write an algorithm for one specific situation? // Nearest neighbor search with arbitrary types of trees! NearestNeighborSearch<EuclideanDistance, arma::mat, KDTree> kn; NearestNeighborSearch<EuclideanDistance, arma::sp_mat, CoverTree> cn; NearestNeighborSearch<ManhattanDistance, arma::mat, Octree> on; NearestNeighborSearch<ChebyshevDistance, arma::sp_mat, RPlusTree> rn; NearestNeighborSearch<MahalanobisDistance, arma::mat, RPTree> rpn; NearestNeighborSearch<EuclideanDistance, arma::mat, XTree> xn; R.R. Curtin, “Improving dual-tree algorithms”. PhD thesis, Georgia Institute of Technology, At- lanta, GA, 8/2015.
  • 40. Pros of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 22 / 47 C++ is great!
  • 41. Pros of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 22 / 47 C++ is great! ● Generic programming at compile time via templates.
  • 42. Pros of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 22 / 47 C++ is great! ● Generic programming at compile time via templates. ● Low-level memory management.
  • 43. Pros of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 22 / 47 C++ is great! ● Generic programming at compile time via templates. ● Low-level memory management. ● Little to no runtime overhead.
  • 44. Pros of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 22 / 47 C++ is great! ● Generic programming at compile time via templates. ● Low-level memory management. ● Little to no runtime overhead. ● Well-known!
  • 45. Pros of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 22 / 47 C++ is great! ● Generic programming at compile time via templates. ● Low-level memory management. ● Little to no runtime overhead. ● Well-known! ● The Armadillo library gives us good linear algebra primitives.
  • 46. Pros of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 23 / 47 C++ is great! ● Generic programming at compile time via templates. ● Low-level memory management. ● Little to no runtime overhead. ● Well-known! ● The Armadillo library gives us good linear algebra primitives. using namespace arma; extern mat x, y; mat z = (x + y) * chol(x) + 3 * chol(y.t());
  • 47. Cons of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 24 / 47 C++ is not great!
  • 48. Cons of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 24 / 47 C++ is not great! ● Templates can be hard to debug because of error messages.
  • 49. Cons of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 24 / 47 C++ is not great! ● Templates can be hard to debug because of error messages. ● Memory bugs are easy to introduce.
  • 50. Cons of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 24 / 47 C++ is not great! ● Templates can be hard to debug because of error messages. ● Memory bugs are easy to introduce. ● The new language revisions are not making the language any simpler...
  • 51. Cons of C++ ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 24 / 47 C++ is not great! ● Templates can be hard to debug because of error messages. ● Memory bugs are easy to introduce. ● The new language revisions are not making the language any simpler...
  • 52. Why templates? ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 25 / 47 What about virtual inheritance?
  • 53. Why templates? ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 26 / 47 What about virtual inheritance? class MyStupidDistance : public Distance { virtual double Evaluate(const arma::vec& a, const arma::vec& b) { return 15.0 * std::abs(a[0] - b[0]); } }; NearestNeighborSearch n(dataset, new MyStupidDistance()); n.Search(3, neighbors, distances);
  • 54. Why templates? ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 27 / 47 What about virtual inheritance? class MyStupidDistance : public Distance { virtual double Evaluate(const arma::vec& a, const arma::vec& b) { return 15.0 * std::abs(a[0] - b[0]); } }; NearestNeighborSearch n(dataset, new MyStupidDistance()); n.Search(3, neighbors, distances); vtable lookup penalty!
  • 55. Why templates? ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 28 / 47 Using inheritance to call a function costs us instructions: Distance* d = new MyStupidDistance(); d->Evaluate(a, b); MyStupidDistance::Evaluate(a, b);
  • 56. Why templates? ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 29 / 47 Using inheritance to call a function costs us instructions: Distance* d = new MyStupidDistance(); d->Evaluate(a, b); MyStupidDistance::Evaluate(a, b); ; push stack pointer movq %rsp, %rdi ; get location of function movq $_ZTV1A+16, (%rsp) ; call Evaluate() call _ZN1A1aEd ; just call Evaluate()! call _ZN1B1aEd.isra.0.constprop.1
  • 57. Why templates? ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 30 / 47 Using inheritance to call a function costs us instructions: Distance* d = new MyStupidDistance(); d->Evaluate(a, b); MyStupidDistance::Evaluate(a, b); ; push stack pointer movq %rsp, %rdi ; get location of function movq $_ZTV1A+16, (%rsp) ; call Evaluate() call _ZN1A1aEd ; just call Evaluate()! call _ZN1B1aEd.isra.0.constprop.1 Up to 10%+ performance penalty in some situations!
  • 58. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 31 / 47 What about math? (Armadillo)
  • 59. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 32 / 47 What about math? (Armadillo) In C: extern double** a, b, c, d, e; extern int rows, cols; // We want to do e = a + b + c + d. mat_copy(e, a, rows, cols); mat_add(e, b, rows, cols); mat_add(e, c, rows, cols); mat_add(e, d, rows, cols);
  • 60. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 33 / 47 What about math? (Armadillo) In C with a custom function: extern double** a, b, c, d, e; extern int rows, cols; // We want to do e = a + b + c + d. mat_add4_into(e, a, b, c, d, rows, cols); Fastest! (one pass)
  • 61. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 34 / 47 What about math? (Armadillo) In MATLAB: e = a + b + c + d
  • 62. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 35 / 47 What about math? (Armadillo) In MATLAB: e = a + b + c + d Beautiful!
  • 63. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 36 / 47 What about math? (Armadillo)
  • 64. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 37 / 47 What about math? (Armadillo)
  • 65. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 38 / 47 What about math? (Armadillo) In C++ (with Armadillo): using namespace arma; extern mat a, b, c, d; mat e = a + b + c + d; No temporaries, only one pass! Just as fast as the fastest C implementation.
  • 66. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 39 / 47 What about math? (Armadillo) In C++ (with Armadillo): using namespace arma; extern mat a, b, c, d; mat e = a + b + c + d; ● operator+(mat, mat) returns a placeholder type op<mat, mat, add>
  • 67. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 40 / 47 What about math? (Armadillo) In C++ (with Armadillo): using namespace arma; extern mat a, b, c, d; mat e = a + b + c + d; ● operator+(mat, mat) returns a placeholder type op<mat, mat, add> ● operator+() is also overloaded for op<...> arguments
  • 68. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 41 / 47 What about math? (Armadillo) In C++ (with Armadillo): using namespace arma; extern mat a, b, c, d; mat e = a + b + c + d; ● operator+(mat, mat) returns a placeholder type op<mat, mat, add> ● operator+() is also overloaded for op<...> arguments ● So a + b + c + d returns op<mat, op<mat, op<mat, mat, add>, add>, add>
  • 69. Compile-time expressions ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 42 / 47 What about math? (Armadillo) In C++ (with Armadillo): using namespace arma; extern mat a, b, c, d; mat e = a + b + c + d; ● operator+(mat, mat) returns a placeholder type op<mat, mat, add> ● operator+() is also overloaded for op<...> arguments ● So a + b + c + d returns op<mat, op<mat, op<mat, mat, add>, add>, add> ● operator=() is overloaded to ‘unwrap’ op<> expressions
  • 70. Take-home ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 43 / 47 ● Templates give us generic code. ● Templates allow us to generate fast code.
  • 71. kNN benchmarks ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 44 / 47 dataset d N mlpack mlpy matlab scikit shogun Weka isolet 617 8k 15.65s 59.09s 50.88s 44.59s 59.56s 220.38s corel 32 68k 17.70s 95.26s fail 63.32s fail 29.38s covertype 54 581k 18.04s 27.68s >9000s 44.55s >9000s 42.34s twitter 78 583k 1573.92s >9000s >9000s 4637.81s fail >9000s mnist 784 70k 3129.46s >9000s fail 8494.24s 6040.16s >9000s tinyImages 384 100k 4535.38s 9000s fail >9000s fail >9000s
  • 72. kNN benchmarks ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 44 / 47 dataset d N mlpack mlpy matlab scikit shogun Weka isolet 617 8k 15.65s 59.09s 50.88s 44.59s 59.56s 220.38s corel 32 68k 17.70s 95.26s fail 63.32s fail 29.38s covertype 54 581k 18.04s 27.68s >9000s 44.55s >9000s 42.34s twitter 78 583k 1573.92s >9000s >9000s 4637.81s fail >9000s mnist 784 70k 3129.46s >9000s fail 8494.24s 6040.16s >9000s tinyImages 384 100k 4535.38s 9000s fail >9000s fail >9000s
  • 73. vs. Spark ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 45 / 47 We can use mmap() for out-of-core learning because our algorithms are generic!
  • 74. vs. Spark ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 45 / 47 We can use mmap() for out-of-core learning because our algorithms are generic! D. Fang, P. Chau. M3: scaling up machine learning via memory mapping, SIGMOD/PODS 2016.
  • 75. What’s coming? ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 46 / 47 ● Bindings to other languages! (more than just R) ● Non-experimental support for deep learning ● Automatic tuning of machine learning algorithms ● Further compile-time speedups
  • 76. Questions and comments? ❖ Introduction ❖ Graph #1 ❖ The Big Tradeoff ❖ So, mlpack. ❖ Standard ML ❖ Cutting-edge ML ❖ Command-line programs ❖ Genericity ❖ Pros of C++ ❖ Cons of C++ ❖ Why templates? ❖ Compile-time expressions ❖ Take-home ❖ kNN benchmarks ❖ vs. Spark ❖ What’s coming? ❖ Questions and comments? 47 / 47 http://www.mlpack.org/ https://github.com/mlpack/mlpack/