Declarative Machine Learning: Bring your own Syntax, Algorithm, Data and Infrastructure
Turi, Inc.
Presenter: Shivakumar Vaithyanathan, IBM Chief Scientist & Sr. Manager, IBM Research
1.
Declarative Machine Learning: Bring your Own Algorithm, Data, Syntax and Infrastructure
Shivakumar Vaithyanathan, IBM Fellow, Watson & IBM Research
© 2015 IBM Corporation
2.
IBM Research © 2012 IBM Corporation
Credit Risk Scoring Application at a Large Financial Institution
Credit risk scoring inputs: Payment History, Amount Owed, Length of Credit History, New Credit, Types of Credit Used
Problem size: 300 million rows, 1500 features (reduced set: 500 features)
Data size on disk: 3.6 TB uncompressed (even for the reduced set: 1.2 TB)
Algorithm of interest: Regression …
To execute on one machine (with a hypothetical statistical package/engine):
• 3.6 TB of RAM required (underestimate); reduced set: 1.2 TB of RAM (underestimate)
• In practice more RAM is required: outputs and intermediates also need to be stored along with the input
Prototypical of problems in other industries, ranging from automotive to insurance to transportation
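The 3.6 TB and 1.2 TB figures are consistent with a dense matrix whose cells are 8-byte double-precision values. That cell width is an assumption, since the slide does not state it. A minimal Java sketch of the arithmetic, with class and method names that are illustrative only:

```java
// Back-of-the-envelope memory estimate for a dense numeric matrix.
// Assumption (not stated on the slide): every cell is an 8-byte double.
final class DenseFootprint {
    static final long BYTES_PER_DOUBLE = 8L;

    /** Bytes needed to hold rows x cols doubles, ignoring headers and intermediates. */
    static long bytes(long rows, long cols) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(Math.multiplyExact(rows, cols), BYTES_PER_DOUBLE);
    }

    /** Same estimate in decimal terabytes (1 TB = 10^12 bytes). */
    static double terabytes(long rows, long cols) {
        return bytes(rows, cols) / 1e12;
    }

    public static void main(String[] args) {
        long rows = 300_000_000L; // 300 million rows, as on the slide
        System.out.println("full set (1500 features):  " + terabytes(rows, 1500) + " TB");
        System.out.println("reduced set (500 features): " + terabytes(rows, 500) + " TB");
    }
}
```

Under that assumption the full set comes to 3.6 x 10^12 bytes and the reduced set to 1.2 x 10^12 bytes, which matches the slide. The estimate covers the input only, which is why the slide calls both numbers underestimates.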
3.
Insurance Big Data Analytics Use Cases
Problem description: consumer risk modeling
• Consumer data with ~300 M rows and ~500 attributes
Problem description: predict customer monetary loss
• Multi-million observations, 95 features; evaluate several hundred models for the optimal subset of features
Problem description: customer satisfaction
• Multi-million cars with few reacquired cars
• Feature expansion from ~250 to ~21,800
[Slide labels: Automotive; DaaS (Retail Finance); RISK]
4.
A Day in the Life of a Data Scientist …
[Diagram labels: original data; data sample; data characteristics; data scientist; develop new algorithm or modify existing algorithm; custom syntax; algorithms: Bayesian networks, neural networks, random forests, support vector machines, …]
5.
Bottleneck: Moving the Algorithm onto Big Data Infrastructure
[Diagram labels: data scientist; Hadoop programmer; Spark programmer; MPI programmer]
6.
What If …
[Diagram labels: data scientist; compiler; optimizer; Hadoop programmer; Spark programmer; MPI programmer]
7.
Simplified view of what we want to build …
The What (language, tooling)
• High-level language
• Write any algorithm
The How (compiler, optimizer)
• Adapt to different data and program characteristics
• Support different backend architectures and configurations
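One way to picture the split between the "what" and the "how" is a planner that picks an execution backend from a data characteristic, while the algorithm itself stays unchanged. The Java sketch below is purely illustrative: the class, the enum, the single size-based rule and the 64 GiB budget are all assumptions made for this example, not SystemML's API or its actual optimizer.

```java
// Hypothetical sketch of separating the "what" from the "how".
// Nothing here is SystemML's API; every name is made up for illustration.
final class BackendPlanner {

    /** Execution targets the planner can pick from. */
    enum Backend { SINGLE_NODE, CLUSTER }

    /**
     * The "how": pick a backend from a data characteristic (estimated input size)
     * and a configuration value (memory available on one node).
     */
    static Backend choose(long estimatedInputBytes, long memoryBudgetBytes) {
        if (estimatedInputBytes < 0 || memoryBudgetBytes < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        return estimatedInputBytes <= memoryBudgetBytes ? Backend.SINGLE_NODE : Backend.CLUSTER;
    }

    public static void main(String[] args) {
        long budget = 64L * 1024 * 1024 * 1024; // assumed 64 GiB of RAM on one node
        long sample = 800_000_000L;             // e.g. a 1M x 100 sample of 8-byte values
        long full = 3_600_000_000_000L;         // the 3.6 TB data set from the talk
        System.out.println("sample -> " + choose(sample, budget));
        System.out.println("full   -> " + choose(full, budget));
    }
}
```

With these numbers the sample fits on one node and the full data set does not, so the same algorithm would be routed to different backends without the data scientist rewriting it. A real optimizer would weigh far more than input size, for example intermediates, sparsity and cluster configuration.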
8.
SystemML: IBM Research project will soon be in open source
• IBM Research project started 6 years ago
• More than 10 papers in major conferences
• In beta for more than a year and used in multiple applications
What
• R-like, Python-like syntax, …
• Rich set of statistical functions
• User-defined and external functions
How
• Single-node, embeddable, and Hadoop & Spark
• Dense / sparse matrix representation
• Library of more than 15 algorithms
[Architecture diagram labels: R-parser; Python-parser; Higher Ops (HOP); Lower Ops (LOP); In-Memory Single Node; Hadoop / Spark]
Writing a Python-syntax parser took less than 2 man-months
9.
How should the “What” work?
package gnmf;

import java.io.IOException;
import java.net.URISyntaxException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class MatrixGNMF {
    public static void main(String[] args) throws IOException, URISyntaxException {
        if(args.length < 10) {
            System.out.println("missing parameters");
            System.out.println("expected parameters: [directory of v] [directory of w] [directory of h] " +
                    "[k] [num mappers] [num reducers] [replication] [working directory] " +
                    "[final directory of w] [final directory of h]");
            System.exit(1);
        }

        String vDir = args[0];
        String wDir = args[1];
        String hDir = args[2];
        int k = Integer.parseInt(args[3]);
        int numMappers = Integer.parseInt(args[4]);
        int numReducers = Integer.parseInt(args[5]);
        int replication = Integer.parseInt(args[6]);
        String outputDir = args[7];
        String wFinalDir = args[8];
        String hFinalDir = args[9];

        JobConf mainJob = new JobConf(MatrixGNMF.class);
        String vDirectory;
        String wDirectory;
        String hDirectory;
        FileSystem.get(mainJob).delete(new Path(outputDir));
        vDirectory = vDir;
        hDirectory = hDir;
        wDirectory = wDir;
        String workingDirectory;
        String resultDirectoryX;
        String resultDirectoryY;

        long start = System.currentTimeMillis();
        System.gc();
        System.out.println("starting calculation");

        System.out.print("calculating X = WT * V... ");
        workingDirectory = UpdateWHStep1.runJob(numMappers, numReducers, replication,
                UpdateWHStep1.UPDATE_TYPE_H, vDirectory, wDirectory, outputDir, k);
        resultDirectoryX = UpdateWHStep2.runJob(numMappers, numReducers, replication,
                workingDirectory, outputDir);
        FileSystem.get(mainJob).delete(new Path(workingDirectory));
        System.out.println("done");

        System.out.print("calculating Y = WT * W * H... ");
        workingDirectory = UpdateWHStep3.runJob(numMappers, numReducers, replication,
                wDirectory, outputDir);
        resultDirectoryY = UpdateWHStep4.runJob(numMappers, replication, workingDirectory,
                UpdateWHStep4.UPDATE_TYPE_H, hDirectory, outputDir);
        FileSystem.get(mainJob).delete(new Path(workingDirectory));
        System.out.println("done");

        System.out.print("calculating H = H .* X ./ Y... ");
        workingDirectory = UpdateWHStep5.runJob(numMappers, numReducers, replication,
                hDirectory, resultDirectoryX, resultDirectoryY, hFinalDir, k);
        System.out.println("done");
        FileSystem.get(mainJob).delete(new Path(resultDirectoryX));
        FileSystem.get(mainJob).delete(new Path(resultDirectoryY));

        System.out.print("storing back H... ");
        FileSystem.get(mainJob).delete(new Path(hDirectory));
        hDirectory = workingDirectory;
        System.out.println("done");

        System.out.print("calculating X = V * HT... ");
        workingDirectory = UpdateWHStep1.runJob(numMappers, numReducers, replication,
                UpdateWHStep1.UPDATE_TYPE_W, vDirectory, hDirectory, outputDir, k);
        resultDirectoryX = UpdateWHStep2.runJob(numMappers, numReducers, replication,
                workingDirectory, outputDir);
        FileSystem.get(mainJob).delete(new Path(workingDirectory));
        System.out.println("done");

        System.out.print("calculating Y = W * H * HT... ");
        workingDirectory = UpdateWHStep3.runJob(numMappers, numReducers, replication,
                hDirectory, outputDir);
        resultDirectoryY = UpdateWHStep4.runJob(numMappers, replication, workingDirectory,
                UpdateWHStep4.UPDATE_TYPE_W, wDirectory, outputDir);
        FileSystem.get(mainJob).delete(new Path(workingDirectory));
        System.out.println("done");

        System.out.print("calculating W = W .* X ./ Y... ");
        workingDirectory = UpdateWHStep5.runJob(numMappers, numReducers, replication,
                wDirectory, resultDirectoryX, resultDirectoryY, wFinalDir, k);
        System.out.println("done");
        FileSystem.get(mainJob).delete(new Path(resultDirectoryX));
        FileSystem.get(mainJob).delete(new Path(resultDirectoryY));

        System.out.print("storing back W... ");
        FileSystem.get(mainJob).delete(new Path(wDirectory));
        wDirectory = workingDirectory;
        System.out.println("done");

        long requiredTime = System.currentTimeMillis() - start;
        long requiredTimeMilliseconds = requiredTime % 1000;
        requiredTime -= requiredTimeMilliseconds;
        requiredTime /= 1000;
        long requiredTimeSeconds = requiredTime % 60;
        requiredTime -= requiredTimeSeconds;
        requiredTime /= 60;
        long requiredTimeMinutes = requiredTime % 60;
        requiredTime -= requiredTimeMinutes;
        requiredTime /= 60;
        long requiredTimeHours = requiredTime;
    }
}

package gnmf;

import gnmf.io.MatrixObject;
import gnmf.io.MatrixVector;
import gnmf.io.TaggedIndex;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class UpdateWHStep2 {
    static class UpdateWHStep2Mapper extends MapReduceBase
            implements Mapper<TaggedIndex, MatrixVector, TaggedIndex, MatrixVector> {
        @Override
        public void map(TaggedIndex key, MatrixVector value,
                OutputCollector<TaggedIndex, MatrixVector> out, Reporter reporter)
                throws IOException {
            out.collect(key, value);
        }
    }

    static class UpdateWHStep2Reducer extends MapReduceBase
            implements Reducer<TaggedIndex, MatrixVector, TaggedIndex, MatrixObject> {
        @Override
        public void reduce(TaggedIndex key, Iterator<MatrixVector> values,
                OutputCollector<TaggedIndex, MatrixObject> out, Reporter reporter)
                throws IOException {
            MatrixVector result = null;
            while(values.hasNext()) {
                MatrixVector current = values.next();
                if(result == null) {
                    result
= current.getCopy(); } else { result.addVector(current); } } if(result != null) { out.collect(new TaggedIndex(key.getIndex(), TaggedIndex.TYPE_VECTOR_X), new MatrixObject(result)); } } } public static String runJob(int numMappers, int numReducers, int replication, String inputDir, String outputDir) throws IOException { String workingDirectory = outputDir + System.currentTimeMillis() + "- UpdateWHStep2/"; JobConf job = new JobConf(UpdateWHStep2.class); job.setJobName("MatrixGNMFUpdateWHStep2"); job.setInputFormat(SequenceFileInputFormat.class); FileInputFormat.setInputPaths(job, new Path(inputDir)); job.setOutputFormat(SequenceFileOutputFormat.class); FileOutputFormat.setOutputPath(job, new Path(workingDirectory)); job.setNumMapTasks(numMappers); job.setMapperClass(UpdateWHStep2Mapper.class); job.setMapOutputKeyClass(TaggedIndex.class); job.setMapOutputValueClass(MatrixVector.class); job.setNumReduceTasks(numReducers); job.setReducerClass(UpdateWHStep2Reducer.class); job.setOutputKeyClass(TaggedIndex.class); job.setOutputValueClass(MatrixObject.class); JobClient.runJob(job); return workingDirectory; } } package gnmf; import gnmf.io.MatrixCell; import gnmf.io.MatrixFormats; import gnmf.io.MatrixObject; import gnmf.io.MatrixVector; import gnmf.io.TaggedIndex; import java.io.IOException; import java.util.Iterator; import org.apache.hadoop.filecache.DistributedCache; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.Path; import org.apache.hadoop.mapred.FileInputFormat; import org.apache.hadoop.mapred.FileOutputFormat; import org.apache.hadoop.mapred.JobClient; import org.apache.hadoop.mapred.JobConf; import org.apache.hadoop.mapred.MapReduceBase; import org.apache.hadoop.mapred.Mapper; import org.apache.hadoop.mapred.OutputCollector; import org.apache.hadoop.mapred.Reducer; import org.apache.hadoop.mapred.Reporter; import org.apache.hadoop.mapred.SequenceFileInputFormat; import org.apache.hadoop.mapred.SequenceFileOutputFormat; public class 
UpdateWHStep1 { public static final int UPDATE_TYPE_H = 0; public static final int UPDATE_TYPE_W = 1; static class UpdateWHStep1Mapper extends MapReduceBase implements Mapper<TaggedIndex, MatrixObject, TaggedIndex, MatrixObject> { private int updateType; @Override public void map(TaggedIndex key, MatrixObject value, OutputCollector<TaggedIndex, MatrixObject> out, Reporter reporter) throws IOException { if(updateType == UPDATE_TYPE_W && key.getType() == TaggedIndex.TYPE_CELL) { MatrixCell current = (MatrixCell) value.getObject(); out.collect(new TaggedIndex(current.getColumn(), TaggedIndex.TYPE_CELL), new MatrixObject(new MatrixCell(key.getIndex(), current.getValue()))); } else { out.collect(key, value); } } @Override public void configure(JobConf job) { updateType = job.getInt("gnmf.updateType", 0); } } static class UpdateWHStep1Reducer extends MapReduceBase implements Reducer<TaggedIndex, MatrixObject, TaggedIndex, MatrixVector> { private double[] baseVector = null; private int vectorSizeK; @Override public void reduce(TaggedIndex key, Iterator<MatrixObject> values, OutputCollector<TaggedIndex, MatrixVector> out, Reporter reporter) throws IOException { if(key.getType() == TaggedIndex.TYPE_VECTOR) { if(!values.hasNext()) throw new RuntimeException("expected vector"); MatrixFormats current = values.next().getObject(); if(!(current instanceof MatrixVector)) throw new RuntimeException("expected vector"); baseVector = ((MatrixVector) current).getValues(); } else { while(values.hasNext()) { MatrixCell current = (MatrixCell) values.next().getObject(); if(baseVector == null) { out.collect(new TaggedIndex(current.getColumn(), TaggedIndex.TYPE_VECTOR), new MatrixVector(vectorSizeK)); } else { if(baseVector.length == 0) throw new RuntimeException("base vector is corrupted"); MatrixVector resultingVector = new MatrixVector(baseVector); resultingVector.multiplyWithScalar(current.getValue()); if(resultingVector.getValues().length == 0) throw new RuntimeException("multiplying 
with scalar failed"); out.collect(new TaggedIndex(current.getColumn(), TaggedIndex.TYPE_VECTOR), resultingVector); } } baseVector = null; } } @Override public void configure(JobConf job) { vectorSizeK = job.getInt("dml.matrix.gnmf.k", 0); if(vectorSizeK == 0) throw new RuntimeException("invalid k specified"); } } public static String runJob(int numMappers, int numReducers, int replication, int updateType, String matrixInputDir, String whInputDir, String outputDir, int k) throws IOException { R syntax (10 lines of code) Python syntax (10 lines of code) A factor of 7 – 10 advantage in man- months over multiple algorithms
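For contrast with the MapReduce listing on this slide, the same multiplicative updates that the driver prints as it runs (H = H .* X ./ Y with X = WT * V and Y = WT * W * H, then the matching update for W) can be sketched in a few lines on a single machine. This is only an illustrative pure-Python sketch, not the DML script from the talk: the helper names (`transpose`, `matmul`, `scale`, `gnmf_step`, `squared_error`) and the small input matrices are assumptions made for the example.

```python
# Single-machine sketch of the GNMF multiplicative updates that the Java
# driver on the slide schedules as MapReduce jobs. All helper names and the
# toy matrices are illustrative assumptions, not part of the original deck.

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def scale(m, x, y):
    """Element-wise m .* x ./ y (assumes every entry of y is non-zero)."""
    return [[mv * xv / yv for mv, xv, yv in zip(rm, rx, ry)]
            for rm, rx, ry in zip(m, x, y)]

def gnmf_step(v, w, h):
    wt = transpose(w)
    # H = H .* (WT * V) ./ (WT * W * H)
    h = scale(h, matmul(wt, v), matmul(matmul(wt, w), h))
    ht = transpose(h)
    # W = W .* (V * HT) ./ (W * H * HT)
    w = scale(w, matmul(v, ht), matmul(w, matmul(h, ht)))
    return w, h

def squared_error(v, w, h):
    wh = matmul(w, h)
    return sum((a - b) ** 2 for rv, rwh in zip(v, wh) for a, b in zip(rv, rwh))

# Toy non-negative input V (4 x 3) and strictly positive starting factors, k = 2.
v = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 1.0, 2.0], [1.0, 1.0, 1.0]]
w = [[0.5, 0.4], [0.3, 0.9], [0.8, 0.2], [0.6, 0.7]]
h = [[0.7, 0.2, 0.5], [0.1, 0.9, 0.4]]

errors = [squared_error(v, w, h)]
for _ in range(20):
    w, h = gnmf_step(v, w, h)
    errors.append(squared_error(v, w, h))

print("error before:", errors[0], "after 20 steps:", errors[-1])
```

The two commented lines in `gnmf_step` are the whole algorithm; everything else in the Java listing is job setup, temporary-directory handling, and data movement.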
10.
Scalability and Performance – GNMF Example
All operations execute on a single machine: 0 MR jobs
Hybrid execution (majority of operations execute on a single machine): 4 MR jobs
Hybrid execution (majority of operations execute in MapReduce): 6 MR jobs
11.
What does the “How” do?
12.
What does the “How” do?
Original data: X is 300M × 500, y is 300M × 1.
Change in data characteristics: X has 3 times more columns (X is 300M × 1500, y is 300M × 1); X has 2 times more rows (X is 600M × 500, y is 600M × 1).
Change in cluster configuration: Map Task JVM and In-Mem Master JVM sizes (the slide shows 2.5 GB and 7 GB).
Execution plans shown in the figure:
X’X and X’y job, solve
X’y job1, X’y job2, X’X job, solve
X’X job1, X’X job2, X’y job, solve
3X faster!
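Every plan on this slide ends in the same three pieces: an X’X product, an X’y product, and a solve. A minimal single-machine sketch of that computation (linear regression through the normal equations) is given here to make those pieces concrete. The helper names (`transpose`, `matmul`, `solve`) and the toy data are assumptions for illustration, and nothing here reflects how the optimizer actually chooses between plans.

```python
# Normal-equations sketch: beta solves (X'X) beta = X'y.
# Helper names and the toy data are illustrative assumptions.

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

# Toy data generated from y = 2*x1 + 3*x2 (no noise).
X = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [3.0, 5.0]]
y = [5.0, 8.0, 7.0, 21.0]

Xt = transpose(X)
xtx = matmul(Xt, X)                                       # the "X'X job"
xty = [row[0] for row in matmul(Xt, [[v] for v in y])]    # the "X'y job"
beta = solve(xtx, xty)                                    # the "solve" step
print(beta)
```

Assuming dense 8-byte values, X at 300M × 500 is on the order of 1.2 TB while X’X is only 500 × 500, which is why the products are the expensive, distributable part and the solve is small.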
13.
Compilation Chain Overview with Example
Example statement: Q = y * (X %*% (b + sb))
Figure panels: Parse Tree, HOPs DAG, LOPs DAG (operators +, %*%, *; operands b, sb, X, y; result Q).
If dimensions are unknown at compile time, validate will pass through and additional checks will be made at run time.
Runtime Instructions:
CP: b+sb -> _mvar1
MR-Job: [map = X %*% _mvar1 -> _mvar2]
CP: y*_mvar2 -> _mvar3
14.
Some Performance Numbers for Spark / Hadoop

In-Memory Data Set (160GB)
Data fits in aggregated memory: SystemML optimizations give ~10X over Hadoop

ML Program   MR Backend         Spark Backend      Spark Backend
             (All ML optims)    (All ML optims)    (Limited ML optims)
LinregDS     479s               342s               456s
LinregCG     954s               188s               243s
L2SVM        1,517s             237s               531s
GLM          1,989s             205s               318s

Large-Scale Data Set (1.6TB)
Data larger than aggregated memory: SystemML optimizations give ~2X

ML Program   MR Backend         Spark Backend
             (All ML optims)    (All ML optims)
LinregDS     5,429s             6,779s
LinregCG     12,469s            10,014s
L2SVM        24,360s            12,795s
GLM          32,521s            17,301s
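To make the numbers on this slide easier to compare, the sketch below computes runtime ratios between the columns. The timings are copied from the slide; the dictionary layout and the `ratio` helper are assumptions for illustration. The slide does not say which comparison its ~10X and ~2X headlines refer to, so these ratios are offered only as arithmetic on the reported figures.

```python
# Runtime ratios computed from the timings reported on the slide (seconds).
# The data layout and the helper name are illustrative assumptions.

in_memory = {  # 160GB data set
    "LinregDS": {"mr": 479, "spark_all": 342, "spark_limited": 456},
    "LinregCG": {"mr": 954, "spark_all": 188, "spark_limited": 243},
    "L2SVM": {"mr": 1517, "spark_all": 237, "spark_limited": 531},
    "GLM": {"mr": 1989, "spark_all": 205, "spark_limited": 318},
}

large_scale = {  # 1.6TB data set
    "LinregDS": {"mr": 5429, "spark_all": 6779},
    "LinregCG": {"mr": 12469, "spark_all": 10014},
    "L2SVM": {"mr": 24360, "spark_all": 12795},
    "GLM": {"mr": 32521, "spark_all": 17301},
}

def ratio(table, slower, faster):
    """Return time(slower) / time(faster) per program, rounded to 2 places."""
    return {prog: round(t[slower] / t[faster], 2) for prog, t in table.items()}

mr_vs_spark_in_memory = ratio(in_memory, "mr", "spark_all")
limited_vs_all_optims = ratio(in_memory, "spark_limited", "spark_all")
mr_vs_spark_large = ratio(large_scale, "mr", "spark_all")

for name, result in [
    ("MR / Spark (160GB)", mr_vs_spark_in_memory),
    ("Spark limited / Spark all (160GB)", limited_vs_all_optims),
    ("MR / Spark (1.6TB)", mr_vs_spark_large),
]:
    print(name, result)
```

A ratio below 1 means the first backend was faster: on the 1.6TB data set the MR backend finishes LinregDS sooner than the Spark backend (5,429s versus 6,779s), while Spark is ahead on the other three programs.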
15.
Thank You