This presentation gives an overview of the Apache SystemML AI/ML project. It explains Apache SystemML in terms of its functionality and dependencies, and describes how SystemDS was forked from it to provide greater functionality.
Links for further information and connecting
http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
https://nz.linkedin.com/pub/mike-frampton/20/630/385
https://open-source-systems.blogspot.com/
1. What Is Apache SystemML ?
● A machine learning system
● Designed to scale to Spark / Hadoop clusters
● Open source / Apache 2 license
● Developed in Java
● Supports R-like and Python-like languages that are designed to scale into the big data range
● Automatic optimization at scale for data and cluster
2. SystemML Execution Modes
● SystemML supports multiple execution modes, including
– Standalone
– Spark Batch
– Spark MLContext
– Hadoop Batch
– Java Machine Learning Connector (JMLC)
3. SystemML Dependencies
● SystemDS was forked from SystemML 1.2
● Current dependencies
– Java 8+
– Scala 2.11+
– Python 2.7/3.5+
– Hadoop 2.6+
– Spark 2.1+
4. What Is Apache SystemDS ?
● Forked from Apache SystemML 1.2 in September 2018
● Supports linear algebra programs over matrices
● Replaces the underlying data model and compiler
● Substantially extends the supported functionalities
● Supports the whole data science lifecycle
– Data integration, cleaning
– Feature engineering
– Efficient local and distributed ML model training
– Deployment, serving
5. What Is Apache SystemDS ?
● R-like languages for
– The data-science life cycle stages
– Differing expertise levels
● High-level scripts are compiled into hybrid execution plans
– For local, in-memory CPU / GPU operations
– For distributed operations on Apache Spark
● The underlying data model is the DataTensor
– A tensor (multi-dimensional array) whose first dimension may have a heterogeneous and nested schema
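The idea of a hybrid execution plan can be illustrated with a toy sketch. This is plain Python, not SystemDS code: the names `MatrixInfo`, `choose_backend` and `plan_matmul`, the 2 GiB budget, and the dense worst-case size estimate are all assumptions made for illustration. It only shows the general principle of choosing a local in-memory operation when the operands fit into a memory budget and a distributed operation otherwise.

```python
from dataclasses import dataclass

BYTES_PER_DOUBLE = 8  # assumption: every cell is stored as a dense double


@dataclass
class MatrixInfo:
    """Hypothetical size metadata for one matrix operand."""
    rows: int
    cols: int

    def dense_bytes(self) -> int:
        # Worst-case estimate; a real compiler would also consider sparsity.
        return self.rows * self.cols * BYTES_PER_DOUBLE


def choose_backend(inputs, output, local_budget_bytes):
    """Return "local" if all operands fit into the budget, else "spark"."""
    needed = sum(m.dense_bytes() for m in inputs) + output.dense_bytes()
    return "local" if needed <= local_budget_bytes else "spark"


def plan_matmul(a, b, local_budget_bytes):
    """Pick a backend for the matrix product a %*% b."""
    if a.cols != b.rows:
        raise ValueError("inner dimensions do not match")
    out = MatrixInfo(a.rows, b.cols)
    return choose_backend([a, b], out, local_budget_bytes)


budget = 2 * 1024 ** 3  # hypothetical 2 GiB local memory budget

small = plan_matmul(MatrixInfo(1_000, 100), MatrixInfo(100, 1), budget)
large = plan_matmul(MatrixInfo(100_000_000, 100), MatrixInfo(100, 1), budget)
print(small, large)
```

The same script yields a different plan purely because the input sizes differ, which is the point of compiling high-level scripts rather than hard-coding the execution target.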
6. SystemDS Algorithms
● Descriptive Statistics
– Univariate Statistics
– Bivariate Statistics
– Stratified Bivariate Statistics
● Classification
– Multinomial Logistic Regression
– Support Vector Machines (binary-class and multi-class)
– Naive Bayes
– Decision Trees
– Random Forests
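As a reminder of what the descriptive statistics above compute, here is a minimal sketch in plain Python. It is not SystemDS code and does not use the SystemDS scripts; the function names and the sample data are made up for illustration. It covers two univariate statistics (mean, sample variance) and one bivariate statistic (Pearson correlation).

```python
import math


def mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up sample data: y is an exact linear function of x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

print("mean(x)     =", mean(x))
print("variance(x) =", sample_variance(x))
print("pearson     =", pearson(x, y))
```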
7. SystemDS Algorithms
● Clustering
– K-Means Clustering
● Regression
– Linear Regression
– Stepwise Linear Regression
– Generalized Linear Models
– Stepwise Generalized Linear Regression
– Regression Scoring and Prediction
● Matrix Factorization
– Principal Component Analysis
– Matrix Completion via Alternating Minimizations
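To make the regression entries concrete, the following sketch fits a one-feature linear regression with the closed-form least-squares solution and then scores a new point. It is plain Python written for illustration, not the SystemDS implementation; `fit_line`, `predict` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def predict(model, x):
    """Score a single input with a fitted (slope, intercept) model."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

model = fit_line(xs, ys)
print("slope, intercept:", model)
print("prediction at x = 10:", predict(model, 10.0))
```

For many features the same idea generalises to solving the normal equations over matrices, which is the kind of linear algebra program these systems are designed to scale.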
9. SystemDS Deep Neural Nets
● Use SystemDS to implement deep neural networks
– Specify the network in Keras format and invoke it with the Keras2DML API
– Specify the network in Caffe format and invoke it with the Caffe2DML API
– Use the DML-bodied SystemDS-NN library
● Ease training compute-resource constraints with
– Native BLAS (Basic Linear Algebra Subprograms)
– SystemDS GPU backend
10. Available Books
● See “Big Data Made Easy”
– Apress Jan 2015
● See “Mastering Apache Spark”
– Packt Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
– Apress Jan 2018
● Find the author on Amazon
– www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
11. Connect
● Feel free to connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
– open-source-systems.blogspot.com/
● I am always interested in
– New technology
– Opportunities
– Technology based issues
– Big data integration