Choosing a Machine Learning technique to solve your need
RALUCA APOSTOL
Nestor
Co-founder & Chief Product Officer
Passion for data - science, analysis & interpretation
Our chat today :)
• What is Machine Learning?
• What are some popular approaches?
• How to think?
• How to build your model?
• What are the resources?
• Let’s get to action.
Artificial Intelligence
• Anything that is not natural and is created by humans is artificial.
• Intelligence means the ability to understand, reason, plan, adapt, etc.
• So any code, technology, or algorithm that enables a machine to mimic,
develop, or demonstrate human cognition or behavior is AI.
Applications for AI
AI vs. ML
Connecting the dots …
So what is Machine Learning?
• Machine learning is the field of study that gives computers the ability to
learn without being explicitly programmed.
• In simple terms, machine learning means making predictions based on data.
The ML Mindset
“ML changes the way you think about a problem. The focus shifts
from a mathematical science to a natural science, running
experiments and using statistics, not logic, to analyze its results.”
– Peter Norvig
Google Research Director
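In traditional programming, the function is given and we compute the result:
F(a,b) = a + b
a = 1; b = 2
F(a,b) = 3
In machine learning, the inputs and the result are given and we look for the parameters:
F(a,b) = x*a + y*b
a = 1; b = 2
F(a,b) = 3
x = ?; y = ?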
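A minimal sketch of how x and y could be found from data, assuming several (a, b, result) examples are available. The single example on the slide is not enough to pin down two unknowns. The function name and the example values are hypothetical, and ordinary least squares is used here as one possible fitting method, not necessarily the one the talk had in mind.

```python
def fit_two_weights(examples):
    """Least-squares estimate of x and y in F(a, b) = x*a + y*b.

    `examples` is a list of (a, b, result) tuples.
    """
    # Accumulate the entries of the 2x2 normal equations.
    saa = sab = sbb = saf = sbf = 0.0
    for a, b, f in examples:
        saa += a * a
        sab += a * b
        sbb += b * b
        saf += a * f
        sbf += b * f

    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        # Happens, for instance, when only one example is given.
        raise ValueError("not enough independent examples to determine x and y")

    # Solve the 2x2 system with Cramer's rule.
    x = (saf * sbb - sab * sbf) / det
    y = (saa * sbf - sab * saf) / det
    return x, y


# Hypothetical observations, all generated by F(a, b) = a + b.
EXAMPLES = [(1, 2, 3), (2, 1, 3), (3, 5, 8), (4, 0, 4)]

x, y = fit_two_weights(EXAMPLES)
print(f"x = {x:.3f}, y = {y:.3f}")
```

With these examples the fit recovers weights of about 1 for both x and y, which is the original F(a,b) = a + b.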
The steps
• Machine learning is part art and part science. :)
• There is no one solution or one approach that fits all.
• There are several factors that can affect your decision to choose a
machine learning algorithm.
Know your data
Before you start looking at different ML algorithms, you need to have a
clear picture of your data, your problem and your constraints.
1. Look at summary statistics
• Percentiles can help identify the range for most of the data
• Averages and medians can describe central tendency
• Correlations can indicate strong relationships
2. Visualize the data
• Box plots can identify outliers
• Density plots and histograms show the spread of data
• Scatter plots can describe bivariate relationships
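The summary statistics above can be sketched with Python's standard statistics module. The column names and values below are made up for illustration, and the helper functions are hypothetical.

```python
import math
import statistics


def summarize(values):
    """Return basic summary statistics for one numeric column."""
    # quantiles(n=10) returns the nine cut points between deciles.
    deciles = statistics.quantiles(values, n=10)
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "p10": deciles[0],   # 10th percentile
        "p90": deciles[-1],  # 90th percentile
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long columns."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: income grows linearly with age in this toy example.
ages = [22, 25, 31, 35, 41, 48, 52, 60]
incomes = [1000 * age + 5000 for age in ages]

age_summary = summarize(ages)
correlation = pearson(ages, incomes)
print(age_summary)
print(f"correlation(age, income) = {correlation:.2f}")
```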
Clean your data
Since the collected data may be in an undesired format, unorganized,
or extremely large, further steps are needed to improve its quality. The
three common steps for preprocessing data are formatting, cleaning,
and sampling.
1. How do I deal with missing values?
• Missing data affects some models more than others.
• Even models that handle missing data can be sensitive to it (missing data
for certain variables can result in poor predictions).
2. Does the data need to be aggregated?
3. What do I do with outliers?
• Outliers can be very common in multidimensional data.
• Some models are less sensitive to outliers than others.
Tree models are usually less sensitive to the presence of
outliers. However, regression models, or any model that
fits an equation to the data, can be strongly affected by
outliers.
• Outliers can be the result of bad data collection, or they can
be legitimate extreme values.
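A minimal sketch of two common cleaning steps, assuming missing values are stored as None. Median imputation and the 1.5 x IQR rule are only one possible choice each; the slides do not prescribe a method. The function names and sample readings are hypothetical.

```python
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sensor readings with one missing value and one extreme value.
readings = [12, 14, None, 13, 15, 14, 120, 13]

cleaned = impute_median(readings)
suspicious = iqr_outliers(cleaned)
print(cleaned)
print(suspicious)
```

Whether a flagged value such as 120 is dropped or kept depends on whether it is a collection error or a legitimate extreme value.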
Sampling
Augment your data
1. Feature engineering is the process of going from raw
data to data that is ready for modeling. It can serve
multiple purposes:
• Make the models easier to interpret (e.g. binning)
• Capture more complex relationships (e.g. NNs)
• Reduce data redundancy and dimensionality (e.g. PCA)
• Rescale variables (e.g. standardizing or normalizing)
2. Different models may have different feature engineering
requirements.
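A short sketch of three of the transformations named above: standardizing, normalizing, and binning. The function names, bin edges, and labels are assumptions made for illustration.

```python
import statistics


def standardize(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max(values):
    """Rescale linearly so the smallest value is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def bin_value(value, edges, labels):
    """Map a number to a label; labels must have one more entry than edges."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


values = [10, 20, 30]
z_scores = standardize(values)
scaled = min_max(values)
# Hypothetical age bins: below 18, 18 to 64, 65 and above.
age_group = bin_value(35, edges=[18, 65], labels=["minor", "adult", "senior"])
print(z_scores, scaled, age_group)
```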
Categorize the problem
1. Categorize by input:
• If you have labelled data, it’s a supervised learning problem.
• If you have unlabelled data and want to find structure, it’s an unsupervised
learning problem.
• If you want to optimize an objective function by interacting with an environment,
it’s a reinforcement learning problem.
2. Categorize by output
• If the output of your model is a number, it’s a regression problem.
• If the output of your model is a class, it’s a classification problem.
• If the output of your model is a set of input groups, it’s a clustering problem.
• If you want to detect anomalies, it’s an anomaly detection problem.
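The two categorizations can be written down as a small lookup helper. This is only a restatement of the rules above; the function names and the keyword strings are hypothetical.

```python
def categorize_by_input(has_labels, interacts_with_environment=False):
    """Pick the learning type from what the input data looks like."""
    if interacts_with_environment:
        return "reinforcement learning"
    if has_labels:
        return "supervised learning"
    return "unsupervised learning"


# Hypothetical keywords for the kind of output the model should produce.
OUTPUT_TO_PROBLEM = {
    "number": "regression",
    "class": "classification",
    "groups": "clustering",
    "anomaly": "anomaly detection",
}


def categorize_by_output(output_kind):
    """Pick the problem type from the kind of output the model produces."""
    try:
        return OUTPUT_TO_PROBLEM[output_kind]
    except KeyError:
        raise ValueError(f"unknown output kind: {output_kind!r}") from None


# Example: predicting next month's sales from labelled history.
print(categorize_by_input(has_labels=True), "/", categorize_by_output("number"))
```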
Understand your constraints
• What is your data storage capacity? Depending on the storage capacity of your
system, you might not be able to store gigabytes of classification/regression
models or gigabytes of data to cluster. This is the case, for instance, for
embedded systems.
• Does the prediction have to be fast? In real time applications, it is obviously very
important to have a prediction as fast as possible. For instance, in autonomous
driving, it’s important that the classification of road signs be as fast as possible to
avoid accidents.
• Does the learning have to be fast? In some circumstances, training models quickly
is necessary: sometimes, you need to rapidly update, on the fly, your model with a
different dataset.
Choose the algorithm
Identify the algorithms that are applicable and practical to implement using the tools
at your disposal.
Some of the factors affecting the choice of a model are:
• Whether the model meets the business goals
• How much preprocessing the model needs
• How accurate the model is
• How explainable the model is
• How fast the model is: How long does it take to build a model, and how long does
the model take to make predictions.
• How scalable the model is
Be careful with complexity
Making the same algorithm more complex increases the chance of overfitting.
A model is more complex when:
• It relies on more features to learn and predict (e.g. using two features vs ten
features to predict a target)
• It relies on more complex feature engineering (e.g. using polynomial terms,
interactions, or principal components)
• It has more computational overhead (e.g. a single decision tree vs. a random
forest of 100 trees).
ML Algorithms
Linear Regression
Regression algorithms are used when you want to predict
a continuous value, as opposed to classification, where
the output is categorical.
• Travel time from one location to another
• Predicting sales of a particular product next month
• Impact of blood alcohol content on coordination
• Predicting monthly gift card sales and improving yearly
revenue projections
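A minimal sketch of simple linear regression with one feature, fitted with the closed-form least-squares formulas. The monthly sales figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical gift card sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [110, 120, 130, 140, 150]

slope, intercept = fit_line(months, sales)
next_month = slope * 6 + intercept  # prediction for month 6
print(f"sales = {slope:.1f} * month + {intercept:.1f}; month 6: {next_month:.1f}")
```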
Logistic Regression
Logistic regression performs binary classification, so the
label outputs are binary. It takes a linear combination of
the features and applies a non-linear function (sigmoid) to it,
so it can be seen as a very small neural network.
• Predicting customer churn
• Credit scoring & fraud detection
• Measuring the effectiveness of marketing
campaigns
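The "linear combination plus sigmoid" idea can be sketched for a single feature as follows. Plain gradient descent is assumed as the training method, and the learning rate, step count, and toy data are arbitrary illustrative choices.

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, steps=2000):
    """Fit weight w and bias b of p(y=1 | x) = sigmoid(w*x + b)."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # gradient of the log loss w.r.t. z
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Hypothetical data: number of support tickets vs. churned (1) or stayed (0).
tickets = [0, 1, 2, 3]
churned = [0, 0, 1, 1]

w, b = train_logistic(tickets, churned)
p_low = sigmoid(w * 0 + b)   # churn probability with 0 tickets
p_high = sigmoid(w * 3 + b)  # churn probability with 3 tickets
print(f"p(churn | 0 tickets) = {p_low:.2f}, p(churn | 3 tickets) = {p_high:.2f}")
```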
Decision trees
Single trees are rarely used on their own, but combined with
many others they form very effective algorithms such as
Random Forest or Gradient Tree Boosting.
• Investment decisions
• Customer churn
• Bank loan defaulters
• Build vs. buy decisions
• Sales lead qualification
K-means
Sometimes you don’t know any labels and your goal is to
assign labels according to the features of objects. This is
called a clustering task.
If your problem statement contains questions such as how
the data is organized, how to group items, or how to focus
on particular groups, you should go with clustering.
• When there is a large group of users and you want to
divide them into particular groups based on some
common attributes.
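A compact sketch of the k-means loop: assign each point to its nearest centroid, then move each centroid to the mean of its points. The starting centroids are passed in explicitly here to keep the example deterministic; real implementations usually pick them randomly. The user data is made up.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def k_means(points, centroids, iterations=10):
    """Return (labels, centroids) after at most `iterations` update rounds."""
    labels = []
    for _ in range(iterations):
        # Assignment step: label each point with its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: nothing moved
        centroids = updated
    return labels, centroids


# Hypothetical users described by (visits per week, purchases per month).
users = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
labels, centers = k_means(users, centroids=[(1.0, 1.0), (8.0, 8.0)])
print(labels)
print(centers)
```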
Principal component analysis
Principal component analysis provides dimensionality
reduction. Sometimes you have a wide range of features,
probably highly correlated with each other, and
models can easily overfit on so many features. Then
you can apply PCA.
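A minimal sketch of PCA for two correlated features, reducing them to one. Power iteration on the covariance matrix is used here as a simple way to find the first principal component; it assumes the starting vector (1, 1) is not orthogonal to that component. Library implementations typically use a matrix decomposition instead. The data is invented.

```python
import math


def first_principal_component(points, iterations=100):
    """Return (direction, projections) for 2-D points.

    direction is the unit vector of largest variance; projections are the
    1-D coordinates of the centered points along that direction.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix.
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)

    # Power iteration: repeatedly multiply by the matrix and renormalize.
    vx, vy = 1.0, 1.0
    for _ in range(iterations):
        wx = cxx * vx + cxy * vy
        wy = cxy * vx + cyy * vy
        norm = math.hypot(wx, wy)
        vx, vy = wx / norm, wy / norm

    projections = [x * vx + y * vy for x, y in centered]
    return (vx, vy), projections


# Hypothetical features where the second is exactly twice the first.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, reduced = first_principal_component(data)
print(direction)
print(reduced)
```

Because the two features are perfectly correlated in this toy data, the single projected coordinate keeps all of the variance.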
Support Vector Machines
Support Vector Machine (SVM) is a supervised
machine learning technique that is widely used in
pattern recognition and classification problems. In its
basic form it separates exactly two classes; problems
with more classes are handled by combining several
binary SVMs.
• detecting persons with common diseases
such as diabetes
• hand-written character recognition
• text categorization, e.g. news articles by topic
• stock market price prediction
Neural networks
Neural networks learn the weights of the connections
between neurons.
Extremely complex models can be trained and they
can be used as a kind of black box, without performing
elaborate feature engineering before training the
model.
Object recognition has recently been improved
enormously by deep neural networks. Applied to
unsupervised learning tasks, such as feature
extraction, deep learning also extracts features from
raw images or speech with much less human
intervention.
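To make "weights of connections between neurons" concrete, here is a tiny network with two hidden neurons that computes XOR, something a single linear model cannot do. The weights are hand-picked for illustration, not learned; in practice they would be found by training.

```python
def relu(z):
    """Rectified linear unit: pass positive values, clamp negatives to 0."""
    return max(0.0, z)


def tiny_network(a, b):
    """Forward pass of a 2-input, 2-hidden-neuron, 1-output network."""
    # Hidden layer: each neuron is a weighted sum plus bias, then ReLU.
    h1 = relu(1.0 * a + 1.0 * b + 0.0)
    h2 = relu(1.0 * a + 1.0 * b - 1.0)
    # Output layer: weighted sum of the hidden activations.
    return 1.0 * h1 - 2.0 * h2


outputs = [tiny_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)
```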
Training & Evaluation
Parameter Tuning
Prediction
What do you need to know?
You need to learn a programming language, preferably Python, along with the
required analytical and mathematical knowledge. You can touch on:
• Linear algebra for data analysis: Scalars, Vectors, Matrices, and Tensors
• Mathematical Analysis: Derivatives and Gradients
• Probability theory and statistics
• Multivariate Calculus
• Algorithms and Complex Optimizations
Programming Languages
Python is the most widely used programming language for
machine learning applications.
Other programming languages that can be used for machine
learning applications are R, C++, JavaScript, Java, C#, Julia,
Shell, TypeScript, and Scala.
Cloud Support
Test
Interactive:
• http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
Test with Amazon SageMaker steps:
1. Create an AWS account
2. Launch a SageMaker Instance
3. Create an S3 bucket
4. Launch the Jupyter Notebook
5. Explore examples & tutorials
Libraries
• https://towardsdatascience.com/best-python-libraries-for-machine-learning-and-deep-learning-b0bd40c7e8c (Python)
• https://blog.bitsrc.io/11-javascript-machine-learning-libraries-to-use-in-your-app-c49772cca46c (JavaScript)
• https://www.baeldung.com/java-ai (Java)
• https://www.geeksforgeeks.org/machine-learning-in-c/ (C++)
Datasets
• https://towardsai.net/p/machine-learning/best-free-datasets-for-machine-learning-and-data-science/stanfordai/3451/
• https://towardsdatascience.com/free-data-sets-for-machine-learning-73e74554cc21
• https://www.ubuntupit.com/best-machine-learning-datasets-for-practicing-applied-ml/
Nestor
Co-founder & Chief Product Officer
Raluca Apostol
raluca@nestorup.com

Choosing a Machine Learning technique to solve your need

  • 1. Choosing a ML technique to solve your need RALUCA APOSTOL
  • 2. Nestor Co-founder & Chief Product Officer
  • 3. Passion for data - science, analysis & interpretation
  • 4. Our chat today :) • What is Machine Learning? • What are some popular approaches? • How to think? • How to build your model? • What are the resources? • Let’s get to action.
  • 5. Artificial Intelligence • Anything which is not natural and created by humans is artificial. • Intelligence means ability to understand, reason, plan, adapt, etc. • So any code, tech or algorithm that enable machine to mimic, develop or demonstrate the human cognition or behavior is AI
  • 11. So what is Machine Learning? • Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed. • In simple term, Machine Learning means making prediction based on data.
  • 12. The ML Mindset “ML changes the way you think about a problem. The focus shifts from a mathematical science to a natural science, running experiments and using statistics, not logic, to analyze its results." – Peter Norvig Google Research Director
  • 14. The ML Mindset F(a,b) = a + b a = 1; b = 2; 3
  • 15. The ML Mindset F(a,b) = a + b a = 1; b = 2; 3 F(a,b) = x*a + y*b a = 1; b = 2; 3 x = ?; y = ?
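The shift from F(a,b) = a + b to F(a,b) = x*a + y*b can be sketched in a few lines: instead of writing the rule, we estimate x and y from examples. This is a minimal illustration using ordinary least squares on two unknowns; the function name `fit_weights` and the training examples are hypothetical, not part of the slides.

```python
# Learn the weights x and y in F(a, b) = x*a + y*b from examples,
# by solving the 2x2 least-squares normal equations (no intercept term).

def fit_weights(examples):
    """examples: list of ((a, b), target). Returns (x, y) minimizing squared error."""
    saa = sab = sbb = sat = sbt = 0.0
    for (a, b), t in examples:
        saa += a * a
        sab += a * b
        sbb += b * b
        sat += a * t
        sbt += b * t
    det = saa * sbb - sab * sab
    if det == 0:
        # e.g. a single example such as (1, 2) -> 3 has infinitely many solutions
        raise ValueError("examples do not determine x and y uniquely")
    x = (sat * sbb - sbt * sab) / det
    y = (saa * sbt - sab * sat) / det
    return x, y

# Hypothetical training data produced by the "hidden" rule a + b.
examples = [((1, 2), 3), ((2, 1), 3), ((3, 5), 8), ((4, 0), 4)]
x, y = fit_weights(examples)
print(x, y)  # both weights come out as 1.0, recovering F(a, b) = a + b
```

Note that the single example from the slide (a = 1, b = 2, result 3) is not enough on its own: many pairs (x, y) satisfy x + 2y = 3, which is why the sketch needs several examples.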
  • 16. The steps • Machine learning is part art and part science. :) • There is no one solution or one approach that fits all. • There are several factors that can affect your decision to choose a machine learning algorithm.
  • 18. Know your data Before you start looking at different ML algorithms, you need to have a clear picture of your data, your problem and your constraints.
  • 20. Know your data 1. Look at summary statistics and visualizations • Percentiles can help identify the range for most of the data • Averages and medians can describe central tendency • Correlations can indicate strong relationships 2. Visualize the data • Box plots can identify outliers • Density plots and histograms show the spread of data • Scatter plots can describe bivariate relationships
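As a rough sketch of the summary statistics mentioned above, the snippet below computes quartiles, mean, median and a Pearson correlation with Python's standard `statistics` module. The column names `spend` and `sales` and their values are made up for illustration; in practice a data-frame library would typically do this for you.

```python
import statistics

def summarize(values):
    """Return a few summary statistics for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
    }

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical columns: advertising spend and sales.
spend = [10, 20, 30, 40, 50]
sales = [12, 24, 33, 46, 55]

stats = summarize(sales)
corr = pearson(spend, sales)
print(stats)
print(round(corr, 3))  # close to 1, i.e. a strong linear relationship
```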
  • 21. Clean your data Since the collected data may be in an undesired format, unorganized, or extremely large, further steps are needed to enhance its quality. The three common steps for preprocessing data are formatting, cleaning, and sampling.
  • 22. Clean your data 1. How do I deal with missing values? • Missing data affects some models more than others. • Even models that handle missing data can be sensitive to it (missing data for certain variables can result in poor predictions). 2. Does the data need to be aggregated?
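One common way to deal with missing values is to fill them with the median of the observed values. The sketch below shows that single strategy only; it is not the only option, and the `ages` column and the helper name `impute_median` are hypothetical.

```python
import statistics

def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = statistics.median(observed)
    return [fill if v is None else v for v in column]

# Hypothetical column with two missing entries.
ages = [25, None, 31, 40, None, 28]
missing = sum(v is None for v in ages)
filled = impute_median(ages)
print(missing, filled)
```

Whether imputation, dropping rows, or a model that tolerates missing data is the better choice depends on how much is missing and why.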
  • 25. Clean your data 3. What do I do with outliers? • Outliers can be very common in multidimensional data. • Some models are less sensitive to outliers than others. Tree models are usually less sensitive to the presence of outliers, whereas regression models, or any model that fits an equation to the data, can be strongly affected by them. • Outliers can be the result of bad data collection, or they can be legitimate extreme values.
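A simple way to flag candidate outliers is the interquartile-range (IQR) rule that box plots use. The sketch below assumes the conventional multiplier of 1.5; the `readings` data and the function name are illustrative only, and flagged values still need a human decision about whether they are errors or legitimate extremes.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(readings))  # flags 95
```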
  • 27. Augment your data 1. Feature engineering is the process of going from raw data to data that is ready for modeling. It can serve multiple purposes: • Make the models easier to interpret (e.g. binning) • Capture more complex relationships (e.g. NNs) • Reduce data redundancy and dimensionality (e.g. PCA) • Rescale variables (e.g. standardizing or normalizing) 2. Different models may have different feature engineering requirements.
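Two of the feature engineering steps listed above, rescaling and binning, can be sketched with the standard library alone. The helper names, the bin edges (18 and 65) and the sample values are assumptions made for this example.

```python
import statistics

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mean) / std for v in values]

def bin_value(value, edges, labels):
    """Map a number to a label; labels must have len(edges) + 1 entries."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]

# Hypothetical raw features.
incomes = [20, 40, 60, 80, 100]
ages = [10, 30, 70]

z_scores = standardize(incomes)
age_groups = [bin_value(a, [18, 65], ["minor", "adult", "senior"]) for a in ages]
print([round(z, 3) for z in z_scores])
print(age_groups)
```

Binning trades precision for interpretability, while standardizing matters mostly for models that are sensitive to feature scale.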
  • 28. Categorize the problem 1. Categorize by input: • If you have labelled data, it’s a supervised learning problem. • If you have unlabelled data and want to find structure, it’s an unsupervised learning problem. • If you want to optimize an objective function by interacting with an environment, it’s a reinforcement learning problem. 2. Categorize by output: • If the output of your model is a number, it’s a regression problem. • If the output of your model is a class, it’s a classification problem. • If the output of your model is a set of input groups, it’s a clustering problem. • Do you want to detect an anomaly? That’s anomaly detection.
  • 30. Understand your constraints • What is your data storage capacity? Depending on the storage capacity of your system, you might not be able to store gigabytes of classification/regression models or gigabytes of data to cluster. This is the case, for instance, for embedded systems. • Does the prediction have to be fast? In real-time applications, it is very important to have a prediction as fast as possible. For instance, in autonomous driving, the classification of road signs must be fast enough to avoid accidents. • Does the learning have to be fast? In some circumstances, training models quickly is necessary: sometimes you need to rapidly update your model, on the fly, with a different dataset.
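The two categorization rules above amount to a small lookup. The sketch below encodes them as plain functions; the function names and the string keys (`"number"`, `"class"`, `"groups"`, `"anomaly"`) are invented for this illustration.

```python
def categorize_by_input(has_labels, interacts_with_environment=False):
    """Pick the learning paradigm from what the data looks like."""
    if interacts_with_environment:
        return "reinforcement learning"
    return "supervised learning" if has_labels else "unsupervised learning"

# Mapping from the kind of output you want to the problem type.
OUTPUT_KINDS = {
    "number": "regression",
    "class": "classification",
    "groups": "clustering",
    "anomaly": "anomaly detection",
}

def categorize_by_output(output_kind):
    """Pick the problem type from the desired model output."""
    return OUTPUT_KINDS[output_kind]

# Example: labelled history of monthly sales, predicting next month's figure.
print(categorize_by_input(has_labels=True), "/", categorize_by_output("number"))
```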
  • 31. Choose the algorithm Identify the algorithms that are applicable and practical to implement using the tools at your disposal. Some of the factors affecting the choice of a model are: • Whether the model meets the business goals • How much preprocessing the model needs • How accurate the model is • How explainable the model is • How fast the model is: How long does it take to build a model, and how long does the model take to make predictions. • How scalable the model is
  • 32. Take care with complexity Making the same algorithm more complex increases the chance of overfitting. A model is more complex when: • It relies on more features to learn and predict (e.g. using two features vs ten features to predict a target) • It relies on more complex feature engineering (e.g. using polynomial terms, interactions, or principal components) • It has more computational overhead (e.g. a single decision tree vs. a random forest of 100 trees).
  • 35. Linear Regression Regression algorithms can be used, for example, when you want to compute a continuous value, as opposed to classification, where the output is categorical. • Time to go from one location to another • Predicting sales of a particular product next month • Impact of blood alcohol content on coordination • Predicting monthly gift card sales and improving yearly revenue projections
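To make the "predicting sales next month" use case concrete, here is a minimal sketch of simple linear regression with one feature, fitted with the closed-form least-squares formulas. The monthly sales figures are invented and perfectly linear, which real data will not be.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: month number -> units sold.
months = [1, 2, 3, 4, 5]
sales = [100, 120, 140, 160, 180]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # prediction for month 6
print(slope, intercept, forecast)
```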
  • 36. Logistic Regression Logistic regression performs binary classification, so the label outputs are binary. It takes a linear combination of features and applies a non-linear function (sigmoid) to it, so it can be seen as a very small neural network. • Predicting customer churn • Credit scoring & fraud detection • Measuring the effectiveness of marketing campaigns
  • 37. Decision trees Single trees are used very rarely, but in composition with many others they build very efficient algorithms such as Random Forest or Gradient Tree Boosting. • Investment decisions • Customer churn • Bank loan defaulters • Build vs Buy decisions • Sales lead qualification
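The description above (a linear combination of features passed through a sigmoid) can be written out directly. The sketch below trains the weights with plain stochastic gradient descent; the learning rate, epoch count and the tiny one-feature dataset are arbitrary choices for illustration, not recommendations.

```python
import math

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, features):
    """Probability of the positive class: sigmoid of a linear combination."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return sigmoid(z)

def train(rows, labels, lr=0.1, epochs=1000):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(rows, labels):
            error = predict_proba(weights, bias, features) - label
            weights = [w - lr * error * f for w, f in zip(weights, features)]
            bias -= lr * error
    return weights, bias

# Hypothetical one-feature dataset: negative values -> class 0, positive -> class 1.
rows = [[-2.0], [-1.0], [1.0], [2.0]]
labels = [0, 0, 1, 1]

weights, bias = train(rows, labels)
print(predict_proba(weights, bias, [2.0]) > 0.5)   # True: predicted class 1
print(predict_proba(weights, bias, [-2.0]) > 0.5)  # False: predicted class 0
```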
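The building block of a decision tree is a single split chosen to make the resulting groups as pure as possible. The sketch below finds the best threshold on one feature using Gini impurity, which is one common criterion among several; the loan data and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split."""
    best = None
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical loan data: yearly income (in thousands) and whether the loan defaulted.
incomes = [20, 25, 30, 60, 70, 80]
defaulted = ["yes", "yes", "yes", "no", "no", "no"]

threshold, impurity = best_split(incomes, defaulted)
print(threshold, impurity)  # splitting at income <= 30 separates the classes
```

A full tree repeats this search recursively on each side of the split, and ensembles such as Random Forest combine many such trees.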
  • 38. K-means Sometimes you don’t have any labels and your goal is to assign labels according to the features of the objects. This is called a clustering task. If your problem statement asks how something is organized, how to group items, or how to focus on particular groups, you should go with clustering. • When there is a large group of users and you want to divide them into particular groups based on some common attributes.
  • 39. Principal component analysis Principal component analysis provides dimensionality reduction. Sometimes you have a wide range of features, probably highly correlated with each other, and models can easily overfit on a huge amount of data. Then, you can apply PCA.
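A bare-bones version of k-means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The sketch below works on 2-D points and takes the starting centroids as an argument; the `users` data (two attributes per user) is invented, and real implementations choose starting centroids more carefully.

```python
def assign(points, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each point."""
    labels = []
    for x, y in points:
        distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
        labels.append(distances.index(min(distances)))
    return labels

def kmeans(points, centroids, iterations=10):
    """Run a fixed number of k-means iterations; returns (centroids, labels)."""
    centroids = list(centroids)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, assign(points, centroids)

# Hypothetical users described by two attributes each.
users = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = kmeans(users, centroids=[users[0], users[3]])
print(labels)  # the first three users share one group, the last three the other
```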
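For two correlated features, the first principal component is the direction along which the data varies most. The sketch below finds it with power iteration on the 2x2 covariance matrix and projects each point onto it, reducing two features to one. It is a toy version under stated assumptions: only two features, a fixed starting vector (which fails if it happens to be orthogonal to the component), and made-up data points.

```python
def first_principal_component(points, iterations=100):
    """Return (direction, projections) for 2-D points using power iteration."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix.
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)

    # Power iteration: repeatedly multiply by the covariance matrix and normalize.
    vx, vy = 1.0, 0.0  # assumed starting vector
    for _ in range(iterations):
        wx = cxx * vx + cxy * vy
        wy = cxy * vx + cyy * vy
        norm = (wx * wx + wy * wy) ** 0.5
        vx, vy = wx / norm, wy / norm

    projections = [x * vx + y * vy for x, y in centered]
    return (vx, vy), projections

# Hypothetical pair of highly correlated features.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]
direction, reduced = first_principal_component(points)
print(direction)  # roughly the diagonal direction, since the features move together
print(reduced)    # one number per point instead of two
```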
  • 40. Support Vector Machines Support Vector Machine (SVM) is a supervised machine learning technique that is widely used in pattern recognition and classification problems, in its basic form when your data has exactly two classes. • Detecting persons with common diseases such as diabetes • Hand-written character recognition • Text categorization: news articles by topic • Stock market price prediction
  • 41. Neural networks Neural networks learn the weights of the connections between neurons. Extremely complex models can be trained, and they can be used as a kind of black box, without performing complex feature engineering before training the model. Object recognition has recently been greatly improved using Deep Neural Networks. Applied to unsupervised learning tasks, such as feature extraction, deep learning also extracts features from raw images or speech with much less human intervention.
  • 46. What do you need to know? It is mandatory to learn a programming language, preferably Python, along with the required analytical and mathematical knowledge. You can touch on: • Linear algebra for data analysis: Scalars, Vectors, Matrices, and Tensors • Mathematical Analysis: Derivatives and Gradients • Probability theory and statistics • Multivariate Calculus • Algorithms and Complex Optimizations
  • 47. Programming Languages Python is hands down the best programming language for Machine Learning applications. Other programming languages that can be used for Machine Learning applications are R, C++, JavaScript, Java, C#, Julia, Shell, TypeScript, and Scala.
  • 49. Test Interactive: • http://www.r2d3.us/visual-intro-to-machine-learning-part-1/ Test with Amazon SageMaker steps: 1. Create an AWS account 2. Launch a SageMaker instance 3. Create an S3 bucket 4. Launch the Jupyter Notebook 5. Explore examples & tutorials
  • 50. Libraries • https://towardsdatascience.com/best-python-libraries-for-machine-learning-and-deep-learning-b0bd40c7e8c (Python) • https://blog.bitsrc.io/11-javascript-machine-learning-libraries-to-use-in-your-app-c49772cca46c (JavaScript) • https://www.baeldung.com/java-ai (Java) • https://www.geeksforgeeks.org/machine-learning-in-c/ (C++)
  • 52. Nestor Co-founder & Chief Product Officer Raluca Apostol raluca@nestorup.com