mlpack: Or, How I Learned To Stop Worrying and Love C++
mlpack is a cutting-edge C++ machine learning library containing fast implementations of both standard and recently published machine learning algorithms. In this talk, I will introduce mlpack and its design philosophy, and discuss how C++ helps make implementations fast, along with the pros and cons of C++ as a language choice. I will briefly review the capabilities of mlpack, then focus on its flexibility by demonstrating the k-means clustering code (and perhaps some other algorithms, such as nearest neighbor search) and how it might be used in a production environment. The project website is http://www.mlpack.org/.
Ryan Curtin, Principal Research Scientist, Symantec at MLconf ATL 2016
1. mlpack: or, How I Learned To Stop Worrying and Love C++
Ryan R. Curtin
September 23, 2016
2. Introduction
❖ Introduction
❖ Graph #1
❖ The Big Tradeoff
❖ So, mlpack.
❖ Standard ML
❖ Cutting-edge ML
❖ Command-line programs
❖ Genericity
❖ Pros of C++
❖ Cons of C++
❖ Why templates?
❖ Compile-time expressions
❖ Take-home
❖ kNN benchmarks
❖ vs. Spark
❖ What’s coming?
❖ Questions and comments?
Why are we here?
speed programming languages machine learning
C++ mlpack fast
9. Graph #1
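[Figure: programming languages arranged along a spectrum. Axis labels: low-level / high-level, fast / easy, specific / portable. Languages shown: ASM, C, C++, Java, Python, Scala, Ruby, INTERCAL, C#, PHP, JS, Go, Tcl, Lua, VB.]
Note: this is not a scientific or particularly accurate representation.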
18. The Big Tradeoff
speed vs. portability and readability
If we’re careful, we can get speed, portability, and readability by using C++.
21. So, mlpack.
What is it?
● a fast general-purpose C++ machine learning library
● contains flexible implementations of common and cutting-edge machine learning algorithms
● for fast or big runs on single workstations
● bindings are available for R, and are coming for other languages
● commonly used inside the machine learning community
● 60+ developers from around the world
● frequent participation in the Google Summer of Code program
R.R. Curtin, J.R. Cline, N.P. Slagle, W.B. March, P. Ram, N.A. Mehta, A.G. Gray, “mlpack: a scalable C++ machine learning library”, in The Journal of Machine Learning Research, vol. 14, pp. 801–805, 2013.
http://www.mlpack.org/
https://github.com/mlpack/mlpack/
24. Standard ML
mlpack implements a lot of standard machine learning techniques.
● k-means and mean shift clustering
● nearest neighbor search and range search
● naive Bayes classifier
● regression: linear, logistic, LARS, softmax
● independent components analysis
● matrix factorization and collaborative filtering
● AdaBoost (plus some weak learners)
● GMMs and HMMs
● deep learning (still experimental)
● kernel PCA and regular PCA
● data preprocessing utilities
● numerous generic optimizers
25. Cutting-edge ML
mlpack also implements new, cutting-edge machine learning techniques.
● several variants of the k-means algorithm
● density estimation trees
● Hoeffding (streaming) decision trees (and experimental streaming forests)
● sparse coding and local coordinate coding
● dual-tree minimum spanning tree calculation
● dual-tree fast max-kernel search
● neighborhood components analysis
● rank-approximate nearest neighbor search
● automatically tuned locality-sensitive hashing (still experimental)
● QUIC-SVD (fast approximate SVD)
26. Command-line programs
You don’t need to be a C++ expert.
# Train AdaBoost model.
$ mlpack_adaboost -t training_file.h5 -l training_labels.h5 -M trained_model.bin
# Predict with AdaBoost model.
$ mlpack_adaboost -m trained_model.bin -T test_set.csv -o test_set_predictions.csv
# Use PCA to reduce to 5 dimensions.
$ mlpack_pca -i dataset.txt -d 5 -o new_dataset.csv
# Impute missing values ("NULL") in the input dataset to the
# mean in that dimension.
$ mlpack_preprocess_imputer -i dataset.h5 -s mean -o imputed.h5
30. Genericity
Why write an algorithm for one specific situation?
NearestNeighborSearch n(dataset);
n.Search(query_set, 3, neighbors, distances);
What if I don’t want the Euclidean distance?
32. Genericity
// The numeric parameter is the value of p for the p-norm to
// use. 1 = Manhattan distance, 2 = Euclidean distance, etc.
NearestNeighborSearch n(dataset, 1);
n.Search(query_set, 3, neighbors, distances);
Ok, this is a little better!
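The runtime-parameter idea on this slide can be sketched with the standard library alone. The helper name `PNormDistance` and the use of `std::vector<double>` for points are assumptions made for illustration; this is not mlpack's API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: the p-norm distance between two points of equal
// dimension.  p = 1 gives the Manhattan distance, p = 2 the Euclidean
// distance.
double PNormDistance(const std::vector<double>& a,
                     const std::vector<double>& b,
                     const double p)
{
  assert(a.size() == b.size());
  assert(p >= 1.0);

  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    sum += std::pow(std::abs(a[i] - b[i]), p);

  return std::pow(sum, 1.0 / p);
}
```

In this sketch the value of p is only known at run time, so every call goes through the general `std::pow` path, and only the p-norm family of metrics can be expressed.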
33. Genericity
// ManhattanDistance is a class with a method Evaluate().
NearestNeighborSearch<ManhattanDistance> n(dataset);
n.Search(query_set, 3, neighbors, distances);
This is much better! The user can specify whatever distance metric they want, including one they write themselves.
34. Genericity
// This will _definitely_ get me best paper at ICML! I can
// feel it!
class MyStupidDistance
{
 public:
  // Evaluate() must be public so that the search class can call it.
  static double Evaluate(const arma::vec& a,
                         const arma::vec& b)
  {
    return 15.0 * std::abs(a[0] - b[0]);
  }
};
// Now we can use it!
NearestNeighborSearch<MyStupidDistance> n(dataset);
n.Search(query_set, 3, neighbors, distances);
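A stripped-down sketch of how a search routine can take its metric as a template parameter. `Point`, `FirstCoordinateDistance`, and `BruteForceNearestNeighbor` are hypothetical names, and the linear scan is only a stand-in for mlpack's actual search code; the one thing taken from the slide is the pattern of a type with a static `Evaluate()`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// A metric is any type with a static Evaluate(a, b) method.
struct ManhattanDistance
{
  static double Evaluate(const Point& a, const Point& b)
  {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
      sum += std::abs(a[i] - b[i]);
    return sum;
  }
};

// A user-written metric in the spirit of the slide: only the first
// coordinate counts.
struct FirstCoordinateDistance
{
  static double Evaluate(const Point& a, const Point& b)
  {
    return 15.0 * std::abs(a[0] - b[0]);
  }
};

// Returns the index of the point in `dataset` closest to `query` under
// MetricType.  The metric is resolved at compile time.
template<typename MetricType>
std::size_t BruteForceNearestNeighbor(const std::vector<Point>& dataset,
                                      const Point& query)
{
  assert(!dataset.empty());

  std::size_t best = 0;
  double bestDistance = std::numeric_limits<double>::max();
  for (std::size_t i = 0; i < dataset.size(); ++i)
  {
    const double d = MetricType::Evaluate(dataset[i], query);
    if (d < bestDistance)
    {
      bestDistance = d;
      best = i;
    }
  }
  return best;
}
```

Swapping the metric is then a one-token change at the call site, for example `BruteForceNearestNeighbor<ManhattanDistance>(dataset, query)` versus `BruteForceNearestNeighbor<FirstCoordinateDistance>(dataset, query)`.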
35. Genericity
// We can also use sparse matrices instead!
NearestNeighborSearch<MyStupidDistance, arma::sp_mat>
n(sparse_dataset);
n.Search(sparse_query_set, 3, neighbors, distances);
36. Genericity
// Nearest neighbor search with arbitrary types of trees!
NearestNeighborSearch<EuclideanDistance, arma::mat, KDTree> kn;
NearestNeighborSearch<EuclideanDistance, arma::sp_mat, CoverTree> cn;
NearestNeighborSearch<ManhattanDistance, arma::mat, Octree> on;
NearestNeighborSearch<ChebyshevDistance, arma::sp_mat, RPlusTree> rn;
NearestNeighborSearch<MahalanobisDistance, arma::mat, RPTree> rpn;
NearestNeighborSearch<EuclideanDistance, arma::mat, XTree> xn;
R.R. Curtin, “Improving dual-tree algorithms”. PhD thesis, Georgia Institute of Technology, Atlanta, GA, 8/2015.
40. Pros of C++
C++ is great!
● Generic programming at compile time via templates.
● Low-level memory management.
● Little to no runtime overhead.
● Well-known!
● The Armadillo library gives us good linear algebra primitives.
using namespace arma;
extern mat x, y;
mat z = (x + y) * chol(x) + 3 * chol(y.t());
47. Cons of C++
C++ is not great!
● Templates can be hard to debug because of error messages.
● Memory bugs are easy to introduce.
● The new language revisions are not making the language any simpler...
52. Why templates?
What about inheritance with virtual functions?
class MyStupidDistance : public Distance
{
 public:
  virtual double Evaluate(const arma::vec& a,
                          const arma::vec& b)
  {
    return 15.0 * std::abs(a[0] - b[0]);
  }
};
NearestNeighborSearch n(dataset, new MyStupidDistance());
n.Search(3, neighbors, distances);
vtable lookup penalty!
55. Why templates?
Using inheritance to call a function costs us instructions:
With inheritance (virtual call):
Distance* d = new MyStupidDistance();
d->Evaluate(a, b);
; load stack pointer into %rdi
movq %rsp, %rdi
; get location of function
movq $_ZTV1A+16, (%rsp)
; call Evaluate()
call _ZN1A1aEd
With templates (direct call):
MyStupidDistance::Evaluate(a, b);
; just call Evaluate()!
call _ZN1B1aEd.isra.0.constprop.1
This can mean a performance penalty of 10% or more in some situations!
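The two dispatch styles can be put side by side in a small standard-library sketch. All names here (`Point`, `VirtualFirstCoordinateDistance`, `StaticFirstCoordinateDistance`, `SumDistancesVirtual`, `SumDistancesStatic`) are hypothetical and chosen for illustration; the sketch shows the pattern only and makes no claim about the size of the penalty on any particular compiler.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;

// Runtime polymorphism: Evaluate() is looked up through the vtable.
class Distance
{
 public:
  virtual ~Distance() = default;
  virtual double Evaluate(const Point& a, const Point& b) const = 0;
};

class VirtualFirstCoordinateDistance : public Distance
{
 public:
  double Evaluate(const Point& a, const Point& b) const override
  {
    assert(!a.empty() && !b.empty());
    return 15.0 * std::abs(a[0] - b[0]);
  }
};

// Compile-time polymorphism: the metric is a template parameter.
struct StaticFirstCoordinateDistance
{
  static double Evaluate(const Point& a, const Point& b)
  {
    assert(!a.empty() && !b.empty());
    return 15.0 * std::abs(a[0] - b[0]);
  }
};

// Sums the distances from `query` to every point, dispatching virtually.
double SumDistancesVirtual(const Distance& metric,
                           const std::vector<Point>& points,
                           const Point& query)
{
  double sum = 0.0;
  for (const Point& p : points)
    sum += metric.Evaluate(p, query);
  return sum;
}

// The same loop, but the callee is known at compile time, so the compiler
// is free to inline it.
template<typename MetricType>
double SumDistancesStatic(const std::vector<Point>& points,
                          const Point& query)
{
  double sum = 0.0;
  for (const Point& p : points)
    sum += MetricType::Evaluate(p, query);
  return sum;
}
```

Both functions compute the same value; the difference is only in how the call inside the loop is resolved. Compilers can sometimes devirtualize the first form, so the actual cost is worth measuring on the target workload.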
58. Compile-time expressions
What about math? (Armadillo)
In C:
extern double **a, **b, **c, **d, **e;
extern int rows, cols;
// We want to do e = a + b + c + d.
mat_copy(e, a, rows, cols);
mat_add(e, b, rows, cols);
mat_add(e, c, rows, cols);
mat_add(e, d, rows, cols);
60. Compile-time expressions
In C with a custom function:
extern double **a, **b, **c, **d, **e;
extern int rows, cols;
// We want to do e = a + b + c + d.
mat_add4_into(e, a, b, c, d, rows, cols);
Fastest! (one pass)
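The slides name `mat_copy`, `mat_add`, and `mat_add4_into` without showing their bodies. The sketch below is one possible way to write them, in C++ rather than C and with a flat `std::vector<double>` (the `Matrix` alias) instead of `double**` plus explicit dimensions; those choices and the simplified signatures are assumptions made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flat storage; the shape does not matter for element-wise addition.
using Matrix = std::vector<double>;

// Multi-pass building blocks: each call walks the whole output once.
void mat_copy(Matrix& out, const Matrix& a)
{
  out = a;
}

void mat_add(Matrix& out, const Matrix& b)
{
  assert(out.size() == b.size());
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] += b[i];
}

// Single-pass version: every output element is written exactly once.
void mat_add4_into(Matrix& out, const Matrix& a, const Matrix& b,
                   const Matrix& c, const Matrix& d)
{
  assert(a.size() == b.size() && b.size() == c.size() &&
         c.size() == d.size());
  out.resize(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    out[i] = a[i] + b[i] + c[i] + d[i];
}
```

Computing e = a + b + c + d with `mat_copy` followed by three `mat_add` calls traverses `e` four times, while `mat_add4_into` traverses it once. The price is a hand-written function for each expression shape.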
61. Compile-time expressions
In MATLAB:
e = a + b + c + d
Beautiful!
65. Compile-time expressions
What about math? (Armadillo)
In C++ (with Armadillo):
#include <armadillo>
using namespace arma;
extern mat a, b, c, d;
mat e = a + b + c + d;
No temporaries, only one pass! Just as fast as the fastest C
implementation.
69. Compile-time expressions
What about math? (Armadillo)
In C++ (with Armadillo):
#include <armadillo>
using namespace arma;
extern mat a, b, c, d;
mat e = a + b + c + d;
● operator+(mat, mat) returns a placeholder type
op<mat, mat, add>
● operator+() is also overloaded for op<...> arguments
● Since + is left-associative, a + b + c + d returns
op<op<op<mat, mat, add>, mat, add>, mat, add>
● The mat constructor and operator=() are overloaded to ‘unwrap’
op<> expressions in a single pass
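The mechanism in the bullets above can be sketched without Armadillo. This is a minimal illustration of the expression-template technique, not Armadillo's implementation: Mat, Op and Add are hypothetical stand-ins for the slide's mat, op and add (Armadillo's real placeholder types have different names, such as eGlue). Because + is left-associative, the placeholder types nest to the left, which the static_assert checks.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical tag naming the element-wise operation.
struct Add
{
  static double apply(double x, double y) { return x + y; }
};

// Placeholder for "lhs OP rhs". It stores references only and computes an
// element when asked, so no temporary matrix is ever created.
// Caveat: it must not outlive the full expression that created it.
template<typename L, typename R, typename OpTag>
struct Op
{
  const L& lhs;
  const R& rhs;

  double operator[](std::size_t i) const { return OpTag::apply(lhs[i], rhs[i]); }
  std::size_t size() const { return lhs.size(); }
};

// Tiny stand-in for a matrix: a flat vector of doubles.
struct Mat
{
  std::vector<double> data;

  explicit Mat(std::size_t n, double fill = 0.0) : data(n, fill) {}

  // "Unwrapping" constructor: one loop evaluates the whole expression.
  template<typename L, typename R, typename OpTag>
  Mat(const Op<L, R, OpTag>& expr) : data(expr.size())
  {
    for (std::size_t i = 0; i < data.size(); ++i)
      data[i] = expr[i];
  }

  // "Unwrapping" assignment, again a single pass.
  template<typename L, typename R, typename OpTag>
  Mat& operator=(const Op<L, R, OpTag>& expr)
  {
    data.resize(expr.size());
    for (std::size_t i = 0; i < data.size(); ++i)
      data[i] = expr[i];
    return *this;
  }

  double operator[](std::size_t i) const { return data[i]; }
  std::size_t size() const { return data.size(); }
};

// Mat + Mat returns a placeholder, not a result.
inline Op<Mat, Mat, Add> operator+(const Mat& l, const Mat& r)
{
  assert(l.size() == r.size());
  return {l, r};
}

// Placeholder + Mat wraps the existing placeholder in a bigger one.
// (Mat + placeholder and placeholder + placeholder are omitted for brevity.)
template<typename L, typename R, typename T>
Op<Op<L, R, T>, Mat, Add> operator+(const Op<L, R, T>& l, const Mat& r)
{
  assert(l.size() == r.size());
  return {l, r};
}

// The type of a + b + c + d, computed without evaluating anything.
using Sum4 = decltype(std::declval<const Mat&>() + std::declval<const Mat&>() +
                      std::declval<const Mat&>() + std::declval<const Mat&>());
static_assert(
    std::is_same<Sum4, Op<Op<Op<Mat, Mat, Add>, Mat, Add>, Mat, Add>>::value,
    "placeholder types nest to the left");
```

With these definitions, `Mat e = a + b + c + d;` compiles to a single loop that reads a[i], b[i], c[i] and d[i] and writes e[i], much like the hand-written single-pass C function.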
70. Take-home
● Templates give us generic code.
● Templates allow us to generate fast code.
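A small sketch of both points, under stated assumptions: SquaredEuclidean, Manhattan and NearestNeighbor are hypothetical names invented for illustration and are not mlpack's API. The search is written once for any metric (generic), and because the metric is a template parameter the call is resolved at compile time and can be inlined (fast).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical metric policies: any type with a static Evaluate() fits.
struct SquaredEuclidean
{
  static double Evaluate(const std::vector<double>& x,
                         const std::vector<double>& y)
  {
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
    {
      const double diff = x[i] - y[i];
      sum += diff * diff;
    }
    return sum;
  }
};

struct Manhattan
{
  static double Evaluate(const std::vector<double>& x,
                         const std::vector<double>& y)
  {
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
      sum += std::abs(x[i] - y[i]);
    return sum;
  }
};

// Brute-force nearest neighbor, generic over the metric.
// Assumes points is non-empty and all vectors have the same length.
// Returns the index of the point closest to query.
template<typename MetricType>
std::size_t NearestNeighbor(const std::vector<std::vector<double>>& points,
                            const std::vector<double>& query)
{
  std::size_t best = 0;
  double bestDistance = MetricType::Evaluate(points[0], query);
  for (std::size_t i = 1; i < points.size(); ++i)
  {
    const double distance = MetricType::Evaluate(points[i], query);
    if (distance < bestDistance)
    {
      bestDistance = distance;
      best = i;
    }
  }
  return best;
}
```

Switching metrics is a one-word change at the call site, for example `NearestNeighbor<Manhattan>(points, query)`, with no virtual dispatch in the inner loop.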
71. kNN benchmarks
kNN runtimes (d: dimensionality, N: number of points)
dataset     d    N     mlpack    mlpy     matlab   scikit    shogun    Weka
isolet      617  8k    15.65s    59.09s   50.88s   44.59s    59.56s    220.38s
corel       32   68k   17.70s    95.26s   fail     63.32s    fail      29.38s
covertype   54   581k  18.04s    27.68s   >9000s   44.55s    >9000s    42.34s
twitter     78   583k  1573.92s  >9000s   >9000s   4637.81s  fail      >9000s
mnist       784  70k   3129.46s  >9000s   fail     8494.24s  6040.16s  >9000s
tinyImages  384  100k  4535.38s  9000s    fail     >9000s    fail      >9000s
73. vs. Spark
We can use mmap() for out-of-core learning because our
algorithms are generic!
D. Fang, P. Chau. M3: scaling up machine learning via memory mapping, SIGMOD/PODS 2016.
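A rough sketch of the idea, assuming a POSIX system and a file of raw doubles; it is neither mlpack's nor M3's code, and Mean() and MeanOfFile() are hypothetical names. The algorithm is generic over where its data lives, so handing it a memory-mapped pointer lets the operating system page the file in on demand instead of loading it all up front.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Generic algorithm: works on any range of doubles, whether it lives on the
// heap or inside a memory-mapped file.
template<typename Iterator>
double Mean(Iterator begin, Iterator end)
{
  double sum = 0.0;
  std::size_t n = 0;
  for (; begin != end; ++begin, ++n)
    sum += *begin;
  return n == 0 ? 0.0 : sum / n;
}

// Maps a binary file of raw doubles read-only and runs Mean() over it.
// Returns false if the file is empty or cannot be opened or mapped.
bool MeanOfFile(const char* path, double& result)
{
  const int fd = open(path, O_RDONLY);
  if (fd < 0)
    return false;

  struct stat st;
  if (fstat(fd, &st) != 0 || st.st_size == 0)
  {
    close(fd);
    return false;
  }

  const std::size_t bytes = static_cast<std::size_t>(st.st_size);
  void* addr = mmap(nullptr, bytes, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd);  // The mapping stays valid after the descriptor is closed.
  if (addr == MAP_FAILED)
    return false;

  // Pages are read from disk lazily as the loop touches them.
  const double* data = static_cast<const double*>(addr);
  result = Mean(data, data + bytes / sizeof(double));

  munmap(addr, bytes);
  return true;
}
```

The same Mean() template also accepts iterators into a std::vector<double>, so the in-memory and out-of-core paths share one implementation.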
75. What’s coming?
● Bindings to other languages! (more than just R)
● Non-experimental support for deep learning
● Automatic tuning of machine learning algorithms
● Further compile-time speedups
76. Questions and comments?
http://www.mlpack.org/
https://github.com/mlpack/mlpack/