SlideShare una empresa de Scribd logo
1 de 11
Descargar para leer sin conexión
What Is Apache SystemML ?
● A machine learning system
● Designed to scale to Spark / Hadoop clusters
● Open source / Apache 2 license
● Developed in Java
● Supports R-like and Python-like languages
● Which are designed to scale into the big data range
● Automatic optimization at scale for data and cluster
SystemML Execution Modes
● System ML supports multiple execution modes
● Including
– Standalone
– Spark Batch
– Spark MLContext
– Hadoop Batch
– Java Machine Learning Connector (JMLC)
SystemML Dependencies
● System DS forked from ML 1.2
● Current dependencies
– Java 8+
– Scala 2.11+
– Python 2.7/3.5+
– Hadoop 2.6+
– Spark 2.1+
What Is Apache SystemDS ?
● Forked from Apache SystemML 1.2 in September 2018
● Supports linear algebra programs over matrices
● Replaces the underlying data model and compiler
● Substantially extends the supported functionalities
● Supports the whole data science lifecycle
– Data integration, cleaning
– Feature engineering
– Model training
● Over efficient
● Local and distributed ML
– Deployment, serving
What Is Apache SystemDS ?
● R-like languages for
– The data-science life cycle stages
– Differing expertise levels
● High-level scripts are compiled into hybrid execution plans
– For local, in-memory CPU / GPU operations
– For distributed operations on Apache Spark
● Underlying data model are DataTensors
– Tensors (multi-dimensional arrays) whose first dimension
– May have a heterogeneous and nested schema
SystemDS Algorithms
● Descriptive Statistics
– Univariate Statistics
– Bivariate Statistics
– Stratified Bivariate Statistics
● Classification
– Multinomial Logistic Regression
– Support Vector Machines
● Binary-Class Support Vector Machines
● Multi-Class Support Vector Machines
– Naive Bayes
– Decision Trees
– Random Forests
SystemDS Algorithms
● Clustering
– K-Means Clustering
● Regression
– Linear Regression
– Stepwise Linear Regression
– Generalized Linear Models
– Stepwise Generalized Linear Regression
– Regression Scoring and Prediction
● Matrix Factorization
– Principal Component Analysis
– Matrix Completion via Alternating Minimizations
SystemDS Algorithms
● Survival Analysis
– Kaplan-Meier Survival Analysis
– Cox Proportional Hazard Regression Model
●Factorization Machines
– Factorization Machine
SystemDS Deep Neural Nets
●
Use SystemDS to implement deep neural networks
– Specifying network in Keras format / invoke with Keras2DML API
– Specifying network in Caffe format / invoke with Caffe2DML API
– Use DML-bodied SystemDS-NN library
●
Ease training compute resource issues with
– Native BLAS (Basic Linear Algebra Subprograms)
– SystemDS GPU backend
Available Books
● See “Big Data Made Easy”
– Apress Jan 2015
●
See “Mastering Apache Spark”
– Packt Oct 2015
●
See “Complete Guide to Open Source Big Data Stack
– “Apress Jan 2018”
● Find the author on Amazon
– www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
●
Connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
Connect
● Feel free to connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
– open-source-systems.blogspot.com/
● I am always interested in
– New technology
– Opportunities
– Technology based issues
– Big data integration

Más contenido relacionado

Más de Mike Frampton

An introduction to Apache Mesos
An introduction to Apache MesosAn introduction to Apache Mesos
An introduction to Apache MesosMike Frampton
 
An introduction to Pentaho
An introduction to PentahoAn introduction to Pentaho
An introduction to PentahoMike Frampton
 
An introduction to Apache Thrift
An introduction to Apache ThriftAn introduction to Apache Thrift
An introduction to Apache ThriftMike Frampton
 
An introduction to Apache Cassandra
An introduction to Apache CassandraAn introduction to Apache Cassandra
An introduction to Apache CassandraMike Frampton
 
An example Hadoop Install
An example Hadoop InstallAn example Hadoop Install
An example Hadoop InstallMike Frampton
 
An Introduction to Apache Hadoop Yarn
An Introduction to Apache Hadoop YarnAn Introduction to Apache Hadoop Yarn
An Introduction to Apache Hadoop YarnMike Frampton
 
An Introduction to Cloud Computing
An Introduction to Cloud ComputingAn Introduction to Cloud Computing
An Introduction to Cloud ComputingMike Frampton
 
An Introduction to Hadoop Hue Gui
An Introduction to Hadoop Hue GuiAn Introduction to Hadoop Hue Gui
An Introduction to Hadoop Hue GuiMike Frampton
 
An introduction to Apache Hadoop Hive
An introduction to Apache Hadoop HiveAn introduction to Apache Hadoop Hive
An introduction to Apache Hadoop HiveMike Frampton
 
Introdution to Apache Hadoop
Introdution to Apache HadoopIntrodution to Apache Hadoop
Introdution to Apache HadoopMike Frampton
 

Más de Mike Frampton (20)

Apache Tephra
Apache TephraApache Tephra
Apache Tephra
 
Apache Kudu
Apache KuduApache Kudu
Apache Kudu
 
Apache Bahir
Apache BahirApache Bahir
Apache Bahir
 
Apache Arrow
Apache ArrowApache Arrow
Apache Arrow
 
JanusGraph DB
JanusGraph DBJanusGraph DB
JanusGraph DB
 
Apache Ignite
Apache IgniteApache Ignite
Apache Ignite
 
Apache Samza
Apache SamzaApache Samza
Apache Samza
 
Apache Flink
Apache FlinkApache Flink
Apache Flink
 
Apache Edgent
Apache EdgentApache Edgent
Apache Edgent
 
Apache CouchDB
Apache CouchDBApache CouchDB
Apache CouchDB
 
An introduction to Apache Mesos
An introduction to Apache MesosAn introduction to Apache Mesos
An introduction to Apache Mesos
 
An introduction to Pentaho
An introduction to PentahoAn introduction to Pentaho
An introduction to Pentaho
 
An introduction to Apache Thrift
An introduction to Apache ThriftAn introduction to Apache Thrift
An introduction to Apache Thrift
 
An introduction to Apache Cassandra
An introduction to Apache CassandraAn introduction to Apache Cassandra
An introduction to Apache Cassandra
 
An example Hadoop Install
An example Hadoop InstallAn example Hadoop Install
An example Hadoop Install
 
An Introduction to Apache Hadoop Yarn
An Introduction to Apache Hadoop YarnAn Introduction to Apache Hadoop Yarn
An Introduction to Apache Hadoop Yarn
 
An Introduction to Cloud Computing
An Introduction to Cloud ComputingAn Introduction to Cloud Computing
An Introduction to Cloud Computing
 
An Introduction to Hadoop Hue Gui
An Introduction to Hadoop Hue GuiAn Introduction to Hadoop Hue Gui
An Introduction to Hadoop Hue Gui
 
An introduction to Apache Hadoop Hive
An introduction to Apache Hadoop HiveAn introduction to Apache Hadoop Hive
An introduction to Apache Hadoop Hive
 
Introdution to Apache Hadoop
Introdution to Apache HadoopIntrodution to Apache Hadoop
Introdution to Apache Hadoop
 

Último

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Último (20)

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

Apache SystemML AI/ML

  • 1. What Is Apache SystemML ? ● A machine learning system ● Designed to scale to Spark / Hadoop clusters ● Open source / Apache 2 license ● Developed in Java ● Supports R-like and Python-like languages ● Which are designed to scale into the big data range ● Automatic optimization at scale for data and cluster
  • 2. SystemML Execution Modes ● System ML supports multiple execution modes ● Including – Standalone – Spark Batch – Spark MLContext – Hadoop Batch – Java Machine Learning Connector (JMLC)
  • 3. SystemML Dependencies ● System DS forked from ML 1.2 ● Current dependencies – Java 8+ – Scala 2.11+ – Python 2.7/3.5+ – Hadoop 2.6+ – Spark 2.1+
  • 4. What Is Apache SystemDS ? ● Forked from Apache SystemML 1.2 in September 2018 ● Supports linear algebra programs over matrices ● Replaces the underlying data model and compiler ● Substantially extends the supported functionalities ● Supports the whole data science lifecycle – Data integration, cleaning – Feature engineering – Model training ● Over efficient ● Local and distributed ML – Deployment, serving
  • 5. What Is Apache SystemDS ? ● R-like languages for – The data-science life cycle stages – Differing expertise levels ● High-level scripts are compiled into hybrid execution plans – For local, in-memory CPU / GPU operations – For distributed operations on Apache Spark ● Underlying data model are DataTensors – Tensors (multi-dimensional arrays) whose first dimension – May have a heterogeneous and nested schema
  • 6. SystemDS Algorithms ● Descriptive Statistics – Univariate Statistics – Bivariate Statistics – Stratified Bivariate Statistics ● Classification – Multinomial Logistic Regression – Support Vector Machines ● Binary-Class Support Vector Machines ● Multi-Class Support Vector Machines – Naive Bayes – Decision Trees – Random Forests
  • 7. SystemDS Algorithms ● Clustering – K-Means Clustering ● Regression – Linear Regression – Stepwise Linear Regression – Generalized Linear Models – Stepwise Generalized Linear Regression – Regression Scoring and Prediction ● Matrix Factorization – Principal Component Analysis – Matrix Completion via Alternating Minimizations
  • 8. SystemDS Algorithms ● Survival Analysis – Kaplan-Meier Survival Analysis – Cox Proportional Hazard Regression Model ●Factorization Machines – Factorization Machine
  • 9. SystemDS Deep Neural Nets ● Use SystemDS to implement deep neural networks – Specifying network in Keras format / invoke with Keras2DML API – Specifying network in Caffe format / invoke with Caffe2DML API – Use DML-bodied SystemDS-NN library ● Ease training compute resource issues with – Native BLAS (Basic Linear Algebra Subprograms) – SystemDS GPU backend
  • 10. Available Books ● See “Big Data Made Easy” – Apress Jan 2015 ● See “Mastering Apache Spark” – Packt Oct 2015 ● See “Complete Guide to Open Source Big Data Stack – “Apress Jan 2018” ● Find the author on Amazon – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/ ● Connect on LinkedIn – www.linkedin.com/in/mike-frampton-38563020
  • 11. Connect ● Feel free to connect on LinkedIn – www.linkedin.com/in/mike-frampton-38563020 ● See my open source blog at – open-source-systems.blogspot.com/ ● I am always interested in – New technology – Opportunities – Technology based issues – Big data integration