Data Science Company 
Machine Learning in Practice 
An InfoFarm Seminar 
Veldkant 33A, Kontich ● info@infofarm.be ● www.infofarm.be
Data Science
Big Data
Identifying, extracting and using data of all types 
and origins; exploring, correlating and using it in new 
and innovative ways in order to extract meaning 
and business value from it.
2 Data Scientists
4 Big Data Consultants
1 Infrastructure Specialist
Java 
PHP 
E-Commerce 
Mobile 
Web Development
Agenda 
• 13:00 What is Machine Learning? 
• 13:30 Techniques 
• 14:30 Tools 
• 15:00 Practical examples 
• 16:00 Wrap up 
What is Machine Learning? 
Magic?
Machine Learning is a subfield of computer science and statistics that deals with systems that can learn from data, instead of following explicitly programmed instructions.
Machine Learning vs Data Science vs Big Data 
• You don’t need Big Data to benefit from machine learning, but more training data generally makes for a better model
• Data Science can help you to get the most 
out of Machine Learning 
• Machine Learning can help you to get the 
most out of Data Science 
Terminology 
Terminology 
Weight (g)   Wingspan (cm)   Webbed feet?   Back color   Species
1000.1       125.0           No             Brown        Buteo jamaicensis
3000.7       200.0           No             Gray         Sagittarius serpentarius
3300.0       220.3           No             Gray         Sagittarius serpentarius
4100.0       136.0           Yes            Black        Gavia immer
3.0          11.0            No             Green        Calothorax lucifer
570.0        75.0            No             Black        Campephilus principalis
• Features / attributes 
• Instance / data point 
• Label / target variable 
• Categorical (nominal) versus numeric versus binary data
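To make the terminology concrete, here is a minimal Python sketch that turns the bird table into feature vectors (one per instance) and labels. Encoding "Webbed feet?" as 0/1 and the back color with one-hot encoding are common choices assumed here, not something the slides prescribe; the names `encode`, `COLORS`, `X` and `y` are hypothetical.

```python
# Each row is an instance; all columns except the last are features,
# and the last column (the species) is the label / target variable.
birds = [
    # (weight_g, wingspan_cm, webbed_feet, back_color, species)
    (1000.1, 125.0, "No", "Brown", "Buteo jamaicensis"),
    (3000.7, 200.0, "No", "Gray", "Sagittarius serpentarius"),
    (3300.0, 220.3, "No", "Gray", "Sagittarius serpentarius"),
    (4100.0, 136.0, "Yes", "Black", "Gavia immer"),
    (3.0, 11.0, "No", "Green", "Calothorax lucifer"),
    (570.0, 75.0, "No", "Black", "Campephilus principalis"),
]

# The distinct categories of the categorical feature, in a fixed order
COLORS = sorted({row[3] for row in birds})

def encode(row):
    """Turn one instance into a purely numeric feature vector."""
    weight, wingspan, webbed, color, _species = row
    numeric = [weight, wingspan]                             # numeric: used as-is
    binary = [1.0 if webbed == "Yes" else 0.0]               # binary: 0 or 1
    one_hot = [1.0 if color == c else 0.0 for c in COLORS]   # categorical: one-hot
    return numeric + binary + one_hot

X = [encode(row) for row in birds]  # feature vectors
y = [row[4] for row in birds]       # labels

print(COLORS)  # ['Black', 'Brown', 'Gray', 'Green']
print(X[3])    # [4100.0, 136.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```

Most algorithms only accept numbers, which is why the categorical and binary columns are converted before training.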
Learning 
• Supervised Learning 
• Unsupervised Learning
Techniques 
Machine Learning
• Clustering
• Classification
• Association Rules
• Regression
• Information Extraction
Classification 
• Predict a category for a given instance 
• Mostly supervised learning. 
• Algorithms 
– Naïve Bayes 
– Support Vector Machine 
– Decision Trees 
– Neural Networks
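As an illustration of the first algorithm in the list, here is a minimal multinomial Naïve Bayes sketch in plain Python, applied to the incoming-mail-routing use case. The departments and training mails are made up, and add-one (Laplace) smoothing is an assumed design choice; a production system would use a tested library instead.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Count classes and words per class from tokenized training documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    vocab = set()
    for words, label in zip(docs, labels):
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict_naive_bayes(model, words):
    """Return the label with the highest log P(label) + sum log P(word | label)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # prior
        n_words = sum(word_counts[label].values())
        for w in words:
            # Add-one smoothing so unseen words do not zero out the probability
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training mails, each labelled with the department it belongs to
mails = [["invoice", "payment", "due"], ["refund", "payment"],
         ["password", "login", "error"], ["login", "reset"]]
departments = ["billing", "billing", "support", "support"]
model = train_naive_bayes(mails, departments)
print(predict_naive_bayes(model, ["payment", "overdue"]))  # billing
```

Working in log space avoids numeric underflow when many word probabilities are multiplied.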
Classification: Use Cases 
• Incoming mail redirection 
• Sentiment analysis 
• Order picking optimization 
Clustering
• Find groups (clusters) of similar instances in unlabeled data
• Unsupervised learning 
• Algorithms: K-Means
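A minimal K-Means sketch (Lloyd's iteration) in plain Python, assuming instances are numeric tuples and using Euclidean distance. Initializing with k randomly chosen instances is one simple option among several; the function names are hypothetical.

```python
import math
import random

def assign(points, centroids):
    """Assignment step: put every point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, max_iterations=100, seed=0):
    """Plain K-Means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen instances
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its old centroid)
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
```

Note that k has to be chosen up front and that the result can depend on the initialization, so in practice K-Means is often run several times with different seeds.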
Clustering: Use cases 
• Customer profiling 
• Grouping of shopping items 
• Recommendation systems 
• Fraud detection 
Association Rule Learning 
• Find interesting relations 
• Find frequently occurring patterns
• Algorithms 
– Apriori 
– Singular Value Decomposition (a matrix factorization, often used for recommendations)
– FP-growth
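The Apriori idea can be sketched in a few lines of plain Python: grow itemsets one item at a time and prune every candidate that has an infrequent subset. The shopping baskets and the 0.5 support threshold are made-up values, and the function names are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join frequent (k-1)-itemsets into candidate k-itemsets ...
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # ... and drop candidates with an infrequent (k-1)-subset (Apriori property)
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

def rule_confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
frequent = apriori(baskets, min_support=0.5)
print(rule_confidence(frequent, {"butter"}, {"bread"}))  # 1.0
```

FP-growth finds the same frequent itemsets without generating candidates, which is why it tends to scale better on large data sets.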
Association Rule Learning: Use Cases 
• Recommendations 
• Data exploration 
• Find connections between seemingly unrelated events
• Frequent pattern mining
Regression 
• Prediction of a quantity 
• Algorithms: 
– Linear regression 
– Logistic regression (predicts a probability; mostly used for classification)
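For a single feature, linear regression has a closed-form least-squares solution. The sketch below assumes one input variable (e.g. a week number) and uses made-up order quantities, in the spirit of the order-quantity use case.

```python
def fit_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

weeks = [1, 2, 3, 4, 5]
orders = [12, 14, 16, 18, 20]  # made-up order quantities
intercept, slope = fit_linear_regression(weeks, orders)
print(intercept + slope * 6)   # 22.0, the predicted quantity for week 6
```

With several features the same idea generalizes to solving the normal equations, which libraries handle for you.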
Regression: Use Cases 
• Order Quantity Prediction 
• Lag analysis 
• Trend estimation
Information Extraction 
• Extract structured variables from unstructured data such as text
• Named Entity Extraction 
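The slides do not say how the entities are extracted. Learned named-entity extractors are trained on annotated text; as a much simpler rule-based baseline, a lookup list of known names (a gazetteer) can already tag entities. The gazetteer entries and entity types below are hypothetical.

```python
import re

# Hypothetical gazetteer: known entity names mapped to their type
GAZETTEER = {
    "InfoFarm": "ORGANIZATION",
    "Kontich": "LOCATION",
    "Apache Spark": "PRODUCT",
}

def extract_entities(text):
    """Return (entity, type, start offset) for each gazetteer entry found in text."""
    found = []
    for name, kind in GAZETTEER.items():
        # \b keeps us from matching inside longer words
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((name, kind, match.start()))
    return sorted(found, key=lambda entity: entity[2])

print(extract_entities("InfoFarm in Kontich uses Apache Spark."))
# [('InfoFarm', 'ORGANIZATION', 0), ('Kontich', 'LOCATION', 12), ('Apache Spark', 'PRODUCT', 25)]
```

A lookup list cannot recognize names it has never seen, which is exactly where a trained model pays off.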
Tools
Apache Mahout
Pro
• Relatively stable
• Built on Hadoop – scales well
• Command-line access for most algorithms
• All important algorithms are available
Contra
• Poor documentation
• Mahout is currently migrating from Apache Hadoop to Apache Spark. Development is slow, and Apache Spark has already built a machine learning library of its own… Instant legacy?
• Kind of slow for smaller use cases
Weka
Pro
• A lot of algorithms are available
• Graphical user interface for prototyping and experimenting
• Available as a Java library
Contra
• Not ‘Big Data’ ready
• Requires a custom data format – ARFF files
• Optimized for academic use cases
Apache Spark: MLlib
Pro
• Based on Apache Spark – very fast and scalable
• Very fast development cycle; new features roll out every couple of months
• New and refreshing API, easy integration with other components of Apache Spark
Contra
• Based on Apache Spark – requires knowledge of Spark and Scala
• Relatively new, so a small choice of algorithms. But the essential ones are there.
R
Pro
• A lot of algorithms are available
• Well documented
• Lots of existing packages that are easily available
Contra
• Can run on Hadoop/Spark, but requires a lot of knowledge of both platforms
• Must learn a new language
Noteworthy 
Java 
• DeepLearning4J 
• Mallet 
• MOA 
Python 
• NLTK 
• Theano 
• PyBrain 
• scikit-learn
Lua 
• Torch 
General 
• LIBSVM
• LIBLINEAR
Integration with Software Development 
Development Cycle
Collect → Analyze → Extract → Train → Test → Use
Feature extraction 
• Describe an instance in a form that an algorithm can use
• Example: recognize handwritten digits by converting each 32×32 image into 32 lines of 1’s and 0’s
00000000000001000000000000000000 
00000000000001110000000000000000 
00000000000011110000000000000000 
00000000001111100000000000000000 
00000000001111000000000000000000 
00000000000111100000000000000000 
00000000001111100000000000000000 
00000000011111000000000000000000 
00000000011110000000000000000000 
00000000111110000000000000000000 
00000000011111000000000000000000 
00000000111111000000000000000000 
00000000111110000000000000000000 
00000000111100000000000000000000 
00000000011110000000000000000000 
00000000111110000111000000000000 
00000001111111111111111100000000 
00000001111111111111111110000000 
00000001111111111111111110000000 
00000000111111111111111111100000 
00000001111111110000011111100000 
00000001111100000000000111100000 
00000000111100000000000111100000 
00000000011110000000000011110000 
00000000011111000000000011110000 
00000000011111100000001111110000 
00000000011111111111111111110000 
00000000011111111111111111100000 
00000000000111111111111111100000 
00000000000011111111111111100000 
00000000000000111111111000000000 
00000000000000001111110000000000 
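The bitmap above still has to become a single feature vector before an algorithm can use it. A minimal sketch, assuming one string of '0'/'1' characters per image row (the function name is hypothetical); a 32×32 image yields 1024 binary features.

```python
def bitmap_to_vector(lines):
    """Flatten a text bitmap (one '0'/'1' string per row) into a list of ints."""
    vector = []
    for row in lines:
        # strip() drops trailing spaces or newlines left over from the text file
        vector.extend(int(ch) for ch in row.strip())
    return vector

digit = ["010", "111", "010"]   # a tiny 3x3 example instead of 32x32
print(bitmap_to_vector(digit))  # [0, 1, 0, 1, 1, 1, 0, 1, 0]
```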
Training an algorithm
1. Collect your data as a collection of instances
2. Split your data set into a training set and a test set
3. Train the algorithm with the training set
4. Validate the results using the test set
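The four steps can be sketched in plain Python. A majority-class baseline stands in for a real algorithm here, and the 25% test fraction and fixed random seed are assumptions; all names are hypothetical.

```python
import random
from collections import Counter

def train_test_split(instances, labels, test_fraction=0.25, seed=42):
    """Shuffle the data set and split it into a training set and a test set."""
    indices = list(range(len(instances)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [instances[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    X_test = [instances[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return X_train, y_train, X_test, y_test

def train_majority(y_train):
    """Trivial 'model': always predict the most common training label."""
    return Counter(y_train).most_common(1)[0][0]

def accuracy(predictions, truth):
    """Fraction of test instances whose predicted label is correct."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)

# 1. collect
instances = list(range(20))
labels = ["small" if x < 10 else "large" for x in instances]
# 2. split
X_train, y_train, X_test, y_test = train_test_split(instances, labels)
# 3. train
model = train_majority(y_train)
# 4. validate on data the model has never seen
print(len(X_train), len(X_test))  # 15 5
print(accuracy([model] * len(y_test), y_test))
```

Any real classifier should beat such a baseline on the test set; if it does not, it has learned little.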
Runtime model 
• During training, most algorithms generate a mathematical runtime model.
• The model should be retrained on a regular basis
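One way to separate training from use is to store the runtime model's parameters and reload them in the application. This sketch assumes a simple linear model serialized as JSON; the file name and function names are hypothetical.

```python
import json
import os
import tempfile

def save_model(params, path):
    """Write the learned parameters to disk once training has finished."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(params, f)

def load_model(path):
    """Read the parameters back in the process that serves predictions."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def predict(params, x):
    """Apply a simple linear model y = intercept + slope * x."""
    return params["intercept"] + params["slope"] * x

# The training job writes the model; the application reloads it.
model_path = os.path.join(tempfile.mkdtemp(), "model.json")
save_model({"intercept": 10.0, "slope": 2.0}, model_path)
runtime_model = load_model(model_path)
print(predict(runtime_model, 6))  # 22.0
```

Retraining on fresh data and overwriting the stored parameters is one straightforward way to keep the runtime model up to date.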
A / B Testing 
• Integrate the machine into the main system gradually.
• Once the machine is certain (enough), it can take over
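A gradual rollout can be sketched as a routing function: only a random share of the traffic (the experimental group) is eligible for automatic handling, and only when the model's confidence clears a threshold; everything else stays with a human. The threshold, traffic share and names are hypothetical parameters.

```python
import random

def route(instance_id, prediction, confidence, threshold=0.9, machine_share=0.1):
    """Decide whether the machine or a human handles this instance."""
    # Seeding with the instance id keeps the group assignment stable per instance
    rng = random.Random(instance_id)
    in_machine_group = rng.random() < machine_share
    if in_machine_group and confidence >= threshold:
        return ("machine", prediction)
    return ("human", None)

print(route("mail-0001", "billing", confidence=0.95, machine_share=1.0))
# ('machine', 'billing')
```

Raising `machine_share` step by step while comparing both groups' results is the "slow integration" the slide describes.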
Hands-on 
Demo 
• K-Nearest Neighbour Classifier 
• Clustering using Weka 
• Named-Entity Extraction 
• Classification of tweets
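The idea behind the first demo fits in a few lines: a k-nearest-neighbour classifier stores the training set and labels a new instance by majority vote among the k closest training instances. This sketch assumes numeric feature vectors, Euclidean distance and k = 3; the toy data is made up, and bitmap vectors like the digit example would work the same way.

```python
import math
from collections import Counter

def knn_classify(query, training_set, k=3):
    """Label query by majority vote among its k nearest (Euclidean) neighbours.

    training_set is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(training_set, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

training = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
            ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_classify((1.5, 1.5), training))  # A
```

There is no real training step, but every prediction scans the whole training set, so kNN gets slow as the data grows.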
What’s in it for you? 
Benefits of using machine learning 
• Automate repetitive tasks 
• Can be a solution for problems that are 
difficult to automate 
• Gain insights about your business 
• Optimize business decisions by taking the computer’s predictions into account
Questions? 
Wrap-up 

SAP Build Work Zone - Overview L2-L3.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Machine learning

  • 1. Data Science Company. Machine Learning in Practice: an InfoFarm seminar. Veldkant 33A, Kontich ● info@infofarm.be ● www.infofarm.be
  • 2. Data Science / Big Data: identifying, extracting and using data of all types and origins; exploring, correlating and using it in new and innovative ways to extract meaning and business value from it.
  • 3. 2 Data Scientists, 4 Big Data Consultants, 1 Infrastructure Specialist
  • 4. Java, PHP, E-Commerce, Mobile, Web Development
  • 6. Agenda • 13:00 What is Machine Learning? • 13:30 Techniques • 14:30 Tools • 15:00 Practical examples • 16:00 Wrap up
  • 7. What is Machine Learning?
  • 8. Magic?
  • 9. Machine Learning is a subfield of computer science and statistics that deals with systems that can learn from data, instead of following explicitly programmed instructions.
  • 10. Machine Learning vs Data Science vs Big Data • You don't need Big Data to benefit from machine learning, but more training data usually makes a better machine • Data Science can help you get the most out of Machine Learning • Machine Learning can help you get the most out of Data Science
  • 11. Terminology
  • 12. Terminology
      Weight (g)   Wingspan (cm)   Webbed feet?   Back color   Species
      1000.1       125.0           No             Brown        Buteo jamaicensis
      3000.7       200.0           No             Gray         Sagittarius serpentarius
      3300.0       220.3           No             Gray         Sagittarius serpentarius
      4100.0       136.0           Yes            Black        Gavia immer
      3.0          11.0            No             Green        Calothorax lucifer
      570.0        75.0            No             Black        Campephilus principalis
      • Features / attributes • Instance / data point • Label / target variable • Categorical (factor) versus numeric versus binary data
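A minimal sketch of how the bird table maps onto this terminology, assuming a plain Python representation. The `Instance` class, its field names and the `features_and_label` helper are hypothetical, chosen only for illustration: each row is an instance, the first four columns are its features, and the species column is the label.

```python
from dataclasses import dataclass

@dataclass
class Instance:
    """One row of the table: one instance (data point)."""
    weight_g: float      # numeric feature
    wingspan_cm: float   # numeric feature
    webbed_feet: bool    # binary feature
    back_color: str      # categorical (factor) feature
    species: str         # label / target variable

def features_and_label(instance):
    """Separate the features an algorithm learns from and the label it should predict."""
    features = (instance.weight_g, instance.wingspan_cm, instance.webbed_feet, instance.back_color)
    return features, instance.species

birds = [
    Instance(1000.1, 125.0, False, "Brown", "Buteo jamaicensis"),
    Instance(3000.7, 200.0, False, "Gray", "Sagittarius serpentarius"),
    Instance(3300.0, 220.3, False, "Gray", "Sagittarius serpentarius"),
    Instance(4100.0, 136.0, True, "Black", "Gavia immer"),
    Instance(3.0, 11.0, False, "Green", "Calothorax lucifer"),
    Instance(570.0, 75.0, False, "Black", "Campephilus principalis"),
]
```

In supervised learning the algorithm sees both parts of each instance; in unsupervised learning only the features are used.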
  • 13. Learning • Supervised learning: learn from instances whose label is known • Unsupervised learning: find structure in instances that have no label
  • 14. Techniques
  • 15. Machine Learning: Clustering • Classification • Association Rules • Regression • Information extraction
  • 16. Classification • Predict a category for a given instance • Mostly supervised learning • Algorithms – Naïve Bayes – Support Vector Machine – Decision Trees – Neural Networks
  • 17. Classification: Use Cases • Incoming mail redirection • Sentiment analysis • Order picking optimization
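To make the incoming-mail use case concrete, here is a minimal multinomial Naïve Bayes sketch with add-one (Laplace) smoothing, written in plain Python. The function names `train_nb` and `predict_nb`, the departments and the example mails are all hypothetical; a real system would use a library implementation and far more training data.

```python
import math
from collections import Counter, defaultdict

def train_nb(documents):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter(label for _, label in documents)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in documents:
        words = text.lower().split()
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary

def predict_nb(model, text):
    """Return the label with the highest log posterior probability."""
    label_counts, word_counts, vocabulary = model
    total_documents = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_documents in label_counts.items():
        score = math.log(n_documents / total_documents)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing keeps an unseen word from zeroing out the probability
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training mails, each labelled with the department it should go to
mails = [
    ("invoice payment overdue", "accounting"),
    ("please pay the attached invoice", "accounting"),
    ("my order has not arrived", "support"),
    ("the product arrived broken", "support"),
]
model = train_nb(mails)
print(predict_nb(model, "invoice not paid"))  # accounting
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.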
  • 18. Clustering • Find groups (clusters) of similar instances in unlabelled data • Unsupervised learning • Algorithms: K-Means
  • 19. Clustering: Use cases • Customer profiling • Grouping of shopping items • Recommendation systems • Fraud detection
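A minimal K-Means (Lloyd's algorithm) sketch for the customer-profiling use case, in plain Python. The customer data (two made-up features per customer) is hypothetical, and seeding the centroids with the first k points is a simplification for a deterministic example; libraries usually use random or k-means++ initialisation.

```python
import math

def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means; the first k points seed the centroids (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: every point joins the cluster of its nearest centroid
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignment

# Hypothetical customers: (annual spend, visits per month)
customers = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 1.5), (8.5, 9.0)]
centroids, clusters = kmeans(customers, k=2)
print(clusters)  # [0, 0, 1, 1, 0, 1]
```

No labels are involved: the algorithm only groups instances that lie close together, and it is up to a person to interpret each cluster as a customer profile.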
  • 20. Association Rule Learning • Find interesting relations • Find frequently occurring patterns • Algorithms – Apriori – Singular Value Decomposition – FP-growth
  • 21. Association Rule Learning: Use Cases • Recommendations • Data exploration • Find connections between seemingly unrelated events • Frequent pattern mining
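A minimal sketch of the Apriori idea restricted to item pairs, in plain Python: infrequent single items are pruned first (a pair can only be frequent if both of its items are), then pairs are counted and turned into rules "A → B" with support and confidence. The function name `pair_rules`, the thresholds and the shopping baskets are hypothetical; a full Apriori implementation keeps growing itemsets level by level.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support, min_confidence):
    """Return (A, B, support, confidence) for every rule A -> B between two items."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {item for item, count in item_counts.items() if count / n >= min_support}
    # Apriori pruning: only pairs made of frequent items are counted
    pair_counts = Counter(
        pair
        for t in transactions
        for pair in combinations(sorted(set(t) & frequent_items), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # support(A and B) / support(A)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

# Hypothetical shopping baskets
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "coffee"],
]
print(pair_rules(baskets, min_support=0.5, min_confidence=0.8))
# [('butter', 'bread', 0.5, 1.0)]
```

Note that the rule only holds in one direction here: every basket with butter also has bread, but not every basket with bread has butter.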
  • 22. Regression • Prediction of a quantity • Algorithms: – Linear regression – Logistic regression (despite its name, it predicts class probabilities and is mostly used for classification)
  • 23. Regression: Use Cases • Order quantity prediction • Lag analysis • Trend estimation
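A minimal sketch of linear regression for order quantity prediction, assuming a single feature (the week number) and ordinary least squares in plain Python. The `fit_line` name and the weekly order figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope

weeks = [1, 2, 3, 4]
orders = [11, 15, 15, 19]  # hypothetical weekly order quantities
intercept, slope = fit_line(weeks, orders)
print(round(intercept + slope * 5, 2))  # 21.0, the forecast for week 5
```

With more features the same idea generalises to solving the least-squares problem for a vector of coefficients, which is what library implementations do.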
  • 24. Information Extraction • Extract structured variables from unstructured data such as text • Named-entity extraction
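Statistical named-entity extractors (such as those built with NLTK, listed among the tools) learn from labelled text. As a much simpler stand-in, the sketch below uses a dictionary (gazetteer) lookup to show what the output of entity extraction looks like. The `GAZETTEER` entries, entity types and `extract_entities` function are hypothetical.

```python
import re

# Hypothetical gazetteer: known entity names mapped to their type
GAZETTEER = {
    "InfoFarm": "ORGANIZATION",
    "Kontich": "LOCATION",
    "Apache Spark": "PRODUCT",
}

def extract_entities(text, gazetteer=GAZETTEER):
    """Return (entity, type, start offset) for each known name found as a whole word."""
    found = []
    for name, entity_type in gazetteer.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((name, entity_type, match.start()))
    return sorted(found, key=lambda entity: entity[2])  # order of appearance

print(extract_entities("InfoFarm is based in Kontich and uses Apache Spark."))
```

A dictionary lookup cannot recognise names it has never seen; that gap is exactly what a trained extractor is meant to close.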
  • 25. Tools
  • 27. Apache Mahout
      Pro: Relatively stable • Built on Hadoop, so it scales well • Command-line access for most algorithms • All important algorithms are available
      Contra: Poor documentation • Mahout is currently migrating from Apache Hadoop to Apache Spark; development is slow and Apache Spark already built a machine learning library of its own (instant legacy?) • Kind of slow for smaller use cases
  • 29. Weka
      Pro: A lot of algorithms are available • Graphical user interface for prototyping and experimenting • Available as a Java library
      Contra: Not 'Big Data' ready • Requires a custom data format (ARFF files) • Optimized for academic use cases
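To show what Weka's ARFF format looks like, here is a minimal Python sketch that writes bird rows as ARFF text: a `@RELATION` line, one `@ATTRIBUTE` line per column (`NUMERIC` or a set of nominal values) and the rows after `@DATA`. The `to_arff` helper is hypothetical, and as a simplification it only quotes values that contain spaces; values with commas or quotes would need more escaping.

```python
def to_arff(relation, attributes, rows):
    """attributes: list of (name, "NUMERIC") or (name, [nominal values])."""
    def quote(value):
        value = str(value)
        return f"'{value}'" if " " in value else value  # simplification: spaces only

    lines = [f"@RELATION {relation}", ""]
    for name, values in attributes:
        if values == "NUMERIC":
            lines.append(f"@ATTRIBUTE {name} NUMERIC")
        else:
            lines.append(f"@ATTRIBUTE {name} {{{','.join(quote(v) for v in values)}}}")
    lines += ["", "@DATA"]
    lines += [",".join(quote(v) for v in row) for row in rows]
    return "\n".join(lines) + "\n"

attributes = [
    ("weight_g", "NUMERIC"),
    ("wingspan_cm", "NUMERIC"),
    ("webbed_feet", ["No", "Yes"]),
    ("species", ["Gavia immer", "Calothorax lucifer"]),
]
rows = [(4100.0, 136.0, "Yes", "Gavia immer"), (3.0, 11.0, "No", "Calothorax lucifer")]
print(to_arff("birds", attributes, rows))
```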
  • 31. Apache Spark: MLlib
      Pro: Based on Apache Spark, so very fast and scalable • Very fast development cycle; new features roll out every couple of months • New and refreshing API, easy integration with other components of Apache Spark
      Contra: Based on Apache Spark, so it requires knowledge of Spark and Scala • Relatively new, so a small choice of algorithms, but the essential ones are there
  • 33. R
      Pro: A lot of algorithms are available • Well documented • Lots of existing packages that are easily available
      Contra: Can run on Hadoop/Spark, but requires a lot of knowledge of both platforms • Must learn a new language
  • 34. Noteworthy
      Java: Deeplearning4j, Mallet, MOA
      Python: NLTK, Theano, PyBrain, scikit-learn
      Lua: Torch
      General: LIBSVM, LIBLINEAR
  • 35. Integration with Software Development
  • 36. Development Cycle: Collect → Analyze → Extract → Train → Test → Use
  • 37. Feature extraction • Describe each instance as a set of features that an algorithm can use • Example: recognize hand-written digits by converting each image into lines of 1s and 0s:
      00000000000001000000000000000000
      00000000000001110000000000000000
      00000000000011110000000000000000
      00000000001111100000000000000000
      00000000001111000000000000000000
      00000000000111100000000000000000
      00000000001111100000000000000000
      00000000011111000000000000000000
      00000000011110000000000000000000
      00000000111110000000000000000000
      00000000011111000000000000000000
      00000000111111000000000000000000
      00000000111110000000000000000000
      00000000111100000000000000000000
      00000000011110000000000000000000
      00000000111110000111000000000000
      00000001111111111111111100000000
      00000001111111111111111110000000
      00000001111111111111111110000000
      00000000111111111111111111100000
      00000001111111110000011111100000
      00000001111100000000000111100000
      00000000111100000000000111100000
      00000000011110000000000011110000
      00000000011111000000000011110000
      00000000011111100000001111110000
      00000000011111111111111111110000
      00000000011111111111111111100000
      00000000000111111111111111100000
      00000000000011111111111111100000
      00000000000000111111111000000000
      00000000000000001111110000000000
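A minimal sketch of this kind of feature extraction, assuming the bitmap is available as lines of text: the rows are simply concatenated into one vector of 0/1 features, so a 32 × 32 image becomes 1,024 features. The `bitmap_to_vector` name and the tiny 4 × 4 bitmap are hypothetical.

```python
def bitmap_to_vector(rows):
    """Flatten a text bitmap (rows of '0'/'1' characters) into one list of integer features."""
    vector = []
    for row in rows:
        row = row.strip()
        if row:  # skip blank lines
            vector.extend(int(ch) for ch in row)
    return vector

digit = ["0110", "1001", "1001", "0110"]  # hypothetical 4x4 bitmap
features = bitmap_to_vector(digit)
print(len(features))  # 16
```

Every image must produce a vector of the same length, in the same pixel order, so that feature i always means the same pixel for the algorithm.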
  • 38. Training an algorithm 1. Collect your data as a collection of instances 2. Split your data set into a training set and a testing set 3. Train the algorithm with the training set 4. Validate the results using the testing set
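The four steps above can be sketched in plain Python. The split and accuracy functions follow the slide; the learner itself is a deliberately trivial, hypothetical one (a threshold halfway between the mean values of two classes on one numeric feature), and the data set is made up.

```python
import random

def train_test_split(instances, test_fraction=0.25, seed=42):
    """Shuffle a copy of the (feature, label) instances and cut off a testing set."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (training set, testing set)

def train_threshold(training_set):
    """Toy learner for exactly two classes: threshold halfway between the class means."""
    labels = {label for _, label in training_set}
    means = {}
    for lab in labels:
        values = [x for x, y in training_set if y == lab]
        means[lab] = sum(values) / len(values)
    low, high = sorted(labels, key=lambda lab: means[lab])
    threshold = (means[low] + means[high]) / 2
    return lambda x: low if x < threshold else high

def accuracy(model, testing_set):
    """Fraction of test instances whose predicted label matches the true label."""
    correct = sum(1 for x, label in testing_set if model(x) == label)
    return correct / len(testing_set)

# 1. Collect (hypothetical data: one numeric feature, two classes)
data = [(w, "small") for w in range(1, 9)] + [(w, "large") for w in range(20, 28)]
# 2. Split
training_set, testing_set = train_test_split(data)
# 3. Train
model = train_threshold(training_set)
# 4. Validate on instances the model has never seen
print(accuracy(model, testing_set))  # 1.0
```

Measuring accuracy on the held-out testing set, not on the training set, is what tells you whether the model generalises.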
  • 39. Runtime model • During training, most algorithms generate a mathematical runtime model • The model should be updated (retrained on new data) on a regular basis
  • 40. A / B Testing • Integrate gradually into the main system • If the machine is certain (enough), it can take over
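One way to let the machine take over only when it is certain enough is a confidence threshold: confident predictions are handled automatically, the rest go to a person. This is a minimal sketch under that assumption; the `route` function, the threshold of 0.9 and the example predictions are hypothetical, and the threshold would be raised or lowered as the A/B results come in.

```python
def route(predictions, min_confidence=0.9):
    """Split (item, label, confidence) predictions into automated and manual queues."""
    automated, manual = [], []
    for item, label, confidence in predictions:
        if confidence >= min_confidence:
            automated.append((item, label))  # the machine is certain enough
        else:
            manual.append(item)  # a person decides
    return automated, manual

predictions = [("mail-1", "accounting", 0.97), ("mail-2", "support", 0.55)]
automated, manual = route(predictions)
print(automated, manual)
```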
  • 41. Hands-on
  • 42. Demo • K-Nearest Neighbour Classifier • Clustering using Weka • Named-Entity Extraction • Classification of tweets
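A minimal K-Nearest Neighbour sketch in plain Python, using weight and wingspan from the bird table as features. The `knn_predict` name and the query bird are hypothetical, and this is not the code used in the demo. Because the features are not scaled here, weight (in grams) dominates the Euclidean distance; in practice features are usually normalised first.

```python
import math
from collections import Counter

def knn_predict(training_set, query, k=3):
    """Majority vote among the k training instances closest to `query` (Euclidean distance)."""
    nearest = sorted(training_set, key=lambda inst: math.dist(inst[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# (weight in g, wingspan in cm) -> species, from the Terminology table
birds = [
    ((1000.1, 125.0), "Buteo jamaicensis"),
    ((3000.7, 200.0), "Sagittarius serpentarius"),
    ((3300.0, 220.3), "Sagittarius serpentarius"),
    ((4100.0, 136.0), "Gavia immer"),
    ((3.0, 11.0), "Calothorax lucifer"),
    ((570.0, 75.0), "Campephilus principalis"),
]
print(knn_predict(birds, (3200.0, 210.0)))  # Sagittarius serpentarius
```

K-NN has no real training phase: the "model" is the stored training set, and all the work happens at prediction time.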
  • 43. What’s in it for you?
  • 44. Benefits of using machine learning • Automate repetitive tasks • Solve problems that are difficult to automate with explicitly programmed rules • Gain insights about your business • Optimize business decisions by using the computer's predictions
  • 45. Questions?
  • 46. Wrap-up