Guiding through a typical
Machine Learning Pipeline
ML Pipeline
The standard machine learning pipeline is derived from the CRISP-DM model.
[Figure: pipeline flow diagram. Datasets; (1) Data Retrieval; (2) Data Preparation & Feature Engineering, comprising Data Processing & Wrangling, Feature Extraction & Engineering, and Feature Scaling & Selection; (3) Modeling with an ML Algorithm; (4) Model Evaluation & Tuning, with the decision "Satisfactory Performance?" (No / Yes); (5) Deployment & Monitoring]
Source: Practical Machine Learning with Python
Data Retrieval
Raw Data Set
Data Retrieval is mainly data collection,
extraction and acquisition from various data
sources and data stores.
Data Sources or Formats, e.g.:
• CSV
• JSON
• XML
• SQL
• SQLite
• Web Scraping (DOM, HTML)
Data Descriptions:
• Numeric
• Text
• Categorical (Nominal, Ordinal)
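As a minimal sketch of data retrieval, the following Python snippet pulls records from three of the formats listed above (CSV, JSON and SQLite) using only the standard library, then typecasts them to one shared schema. The sample records, the column names (`id`, `color`, `price`) and the helper function names are assumptions made up for illustration; they do not come from the slides.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample records, inlined so the sketch needs no external files.
CSV_TEXT = "id,color,price\n1,red,9.5\n2,blue,12.0\n"
JSON_TEXT = '[{"id": 3, "color": "green", "price": 7.25}]'


def rows_from_csv(text):
    """Parse CSV text into a list of dicts (all values arrive as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def rows_from_sqlite(connection, query):
    """Run a SQL query and return each row as a dict keyed by column name."""
    connection.row_factory = sqlite3.Row
    return [dict(row) for row in connection.execute(query)]


# Build a throwaway in-memory SQLite database to query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, color TEXT, price REAL)")
conn.execute("INSERT INTO items VALUES (4, 'red', 3.0)")

raw = (
    rows_from_csv(CSV_TEXT)
    + rows_from_json(JSON_TEXT)
    + rows_from_sqlite(conn, "SELECT id, color, price FROM items")
)

# Typecast so every source yields the same schema: numeric id and price,
# categorical (nominal) color.
dataset = [
    {"id": int(r["id"]), "color": str(r["color"]), "price": float(r["price"])}
    for r in raw
]
print(dataset)
```

Whatever the source, the goal of this step is one uniform raw data set that the preparation stage can work on.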
“More data beats clever algorithms, but better data beats more data.”
Peter Norvig
Data Preparation & Feature Engineering
[Figure: example dataset showing data outcome labels, dataset features, and a feature set with categorical variables]
• In this step the data is pre-processed by cleaning, wrangling (munging) and manipulation as needed.
• Initial exploratory data analysis is also carried out.
• Data Wrangling
  • Data Understanding
  • Filtering
  • Typecasting
  • Data Transformation
  • Imputing Missing Values
  • Handling Duplicates
  • Handling Categorical Data
  • Normalizing Values
  • String Manipulations
• Data Summarization
• Data Visualization
• Feature Engineering, Scaling, Selection
• Dimensionality Reduction
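A few of the wrangling steps listed above can be sketched in plain Python: handling duplicates, imputing missing values, handling categorical data (one-hot encoding) and normalizing values (min-max scaling). The records and column names are hypothetical, and mean imputation and min-max scaling are only one possible choice for each step.

```python
from statistics import mean

# Hypothetical raw records: one duplicate row and one missing price (None).
records = [
    {"color": "red", "price": 10.0},
    {"color": "blue", "price": None},
    {"color": "red", "price": 10.0},   # exact duplicate of the first row
    {"color": "green", "price": 30.0},
]

# Handling duplicates: keep the first occurrence of each identical row.
seen, unique = set(), []
for row in records:
    key = (row["color"], row["price"])
    if key not in seen:
        seen.add(key)
        unique.append(dict(row))

# Imputing missing values: replace a missing price with the mean of known ones.
known = [r["price"] for r in unique if r["price"] is not None]
fill = mean(known)
for r in unique:
    if r["price"] is None:
        r["price"] = fill

# Handling categorical data: one-hot encode the nominal "color" column.
categories = sorted({r["color"] for r in unique})

# Normalizing values: min-max scale price into the range [0, 1].
lo = min(r["price"] for r in unique)
hi = max(r["price"] for r in unique)

# Each feature vector is [one-hot color columns..., scaled price].
features = [
    [1.0 if r["color"] == c else 0.0 for c in categories]
    + [(r["price"] - lo) / (hi - lo)]
    for r in unique
]
print(categories, features)
```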
Modeling
During modeling, the data features are fed to an ML method or algorithm that trains the model, typically by optimizing a specific cost function, with the objective of reducing errors and generalizing the representations learned from the data.
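To make "optimizing a cost function" concrete, here is a minimal sketch that fits a straight line by gradient descent on the mean squared error. The training data, the learning rate and the iteration count are assumptions chosen for illustration, and gradient descent is just one of many possible optimizers.

```python
# Hypothetical noise-free training data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def mse(w, b):
    """Cost function: mean squared error of the line y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


w, b = 0.0, 0.0          # initial parameters
learning_rate = 0.05     # assumed step size
initial_cost = mse(w, b)

for _ in range(5000):
    # Gradients of the MSE with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # Step against the gradient to reduce the cost.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b, mse(w, b))
```

Each iteration moves the parameters a small step in the direction that reduces the error, so the learned slope and intercept approach the values that generated the data.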
Model Types
• Linear models
  • Logistic Regression
  • Naïve Bayes
  • Support Vector Machines
• Non-parametric models
  • K-Nearest Neighbors
• Tree-based models
  • Decision tree
• Ensemble methods
  • Random forests
  • Gradient Boosted Machines
• Neural Networks
  • Dense Neural Networks (DNN)
  • Convolutional Neural Networks (CNN)
  • Recurrent Neural Networks (RNN)
Regression models
• Simple linear regression
• Multiple linear regression
• Nonlinear regression
Clustering models
• Partition-based clustering
• Hierarchical clustering
• Density-based clustering
Classification models
• Binary Classification
• Multi-Class Classification
• Multi-Label Classification
Modelling Procedure:
• Activation function
• Initializing parameters
• Cost function and metric definition
• Train with a number of epochs
• Evaluate model with test data
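The procedure steps named above (activation function, parameter initialization, cost function and metric, training for a number of epochs, evaluation on test data) can be walked through with a single logistic unit. This is only a sketch: the data, the zero initialization, the learning rate and the epoch count are all illustrative assumptions.

```python
import math

# Hypothetical 1-D binary data: the label is 1 when x is positive.
train = [(-2.0, 0), (-1.5, 0), (-1.0, 0), (-0.5, 0),
         (0.5, 1), (1.0, 1), (1.5, 1), (2.0, 1)]
test = [(-1.2, 0), (-0.3, 0), (0.4, 1), (1.7, 1)]


def sigmoid(z):
    """Activation function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w, b, data):
    """Cost function: mean binary cross-entropy."""
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def accuracy(w, b, data):
    """Metric: fraction of examples classified correctly at threshold 0.5."""
    hits = sum((sigmoid(w * x + b) >= 0.5) == bool(y) for x, y in data)
    return hits / len(data)


w, b = 0.0, 0.0                     # initializing parameters
epochs, learning_rate = 200, 0.5    # assumed training settings
start_loss = log_loss(w, b, train)

for _ in range(epochs):             # train for a fixed number of epochs
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in train) / len(train)
    grad_b = sum((sigmoid(w * x + b) - y) for x, y in train) / len(train)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

test_accuracy = accuracy(w, b, test)   # evaluate on held-out test data
print(log_loss(w, b, train), test_accuracy)
```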
Evaluation & Tuning Methods [1]
Models have various parameters that are tuned in a process called hyperparameter optimization to obtain the models with the best results.
[Figures: 3-fold cross-validation; ROC curve for binary and multi-class model evaluation]
Classification models can be evaluated and tested on validation datasets (k-fold cross-validation) based on metrics like:
• Accuracy
• Confusion matrix, ROC
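The sketch below shows how 3-fold cross-validation, accuracy and a confusion matrix fit together. The nearest-centroid classifier and the 1-D data set are assumptions picked to keep the example short; any classifier could take their place.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val


def fit_centroids(xs, ys):
    """Toy classifier: remember the mean feature value of each class."""
    return {c: sum(x for x, y in zip(xs, ys) if y == c) / ys.count(c)
            for c in set(ys)}


def predict(centroids, x):
    """Assign x to the class with the closest centroid."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))


# Hypothetical 1-D data set with two classes (0 and 1).
X = [1.0, 1.2, 0.8, 1.1, 0.9, 3.0, 3.2, 2.8, 3.1, 1.3, 2.9, 3.3]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]   # the sample at x=1.3 is a hard case

# confusion[actual][predicted], accumulated over all validation folds.
confusion = [[0, 0], [0, 0]]
for train_idx, val_idx in k_fold_indices(len(X), k=3):
    model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
    for i in val_idx:
        confusion[y[i]][predict(model, X[i])] += 1

# Accuracy is the share of samples on the diagonal of the confusion matrix.
accuracy = (confusion[0][0] + confusion[1][1]) / len(X)
print(confusion, accuracy)
```

Because every sample lands in exactly one validation fold, the accumulated confusion matrix covers the whole data set while no prediction is made by a model that saw that sample during training.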
Regression models can be evaluated by:
• Coefficient of Determination, R²
• Mean Squared Error
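Both regression metrics follow directly from their definitions: the mean squared error averages the squared residuals, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The example values below are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical ground truth and model predictions.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]

print(mean_squared_error(y_true, y_pred), r2_score(y_true, y_pred))
```

An R² of 1 means the predictions explain all of the variance in the target; a value of 0 means the model does no better than always predicting the mean.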
Clustering Models can be validated by:
• Homogeneity
• Completeness
• V-measure (harmonic mean of homogeneity and completeness)
• Silhouette Coefficient
• Calinski-Harabasz Index
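Of the clustering metrics above, the silhouette coefficient is simple to compute by hand: for each point, compare the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b) and take (b - a) / max(a, b). The sketch below assumes at least two clusters and at least two points per cluster; the points and labelings are hypothetical.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean silhouette value over all points.

    Assumes at least two clusters and at least two points in every cluster.
    """
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Sum of distances from p to every other point, grouped by cluster.
        sums, counts = {}, {}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if i == j:
                continue
            sums[lab] = sums.get(lab, 0.0) + math.dist(p, q)
            counts[lab] = counts.get(lab, 0) + 1
        a = sums[own] / counts[own]                 # cohesion
        b = min(sums[lab] / counts[lab]             # separation
                for lab in sums if lab != own)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Two well-separated groups of points (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

good = silhouette_coefficient(points, [0, 0, 1, 1])   # matches the groups
bad = silhouette_coefficient(points, [0, 1, 0, 1])    # cuts across the groups
print(good, bad)
```

Values near 1 indicate compact, well-separated clusters, while negative values suggest that points sit closer to another cluster than to their own.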
Evaluation & Tuning Methods [2]
Bias Variance Trade-Off
• The goal is to find the best balance between bias and variance errors.
• Bias error is the difference between the model estimator's expected prediction and the true value. It arises when the model's assumptions fail to capture the underlying data and patterns.
• Variance error arises from the model's sensitivity to outliers and random noise in the training data.
Underfitting
• Underfitting occurs when the model configuration results in low variance and high bias.
Overfitting
• Overfitting occurs when the model configuration results in high variance and low bias.
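One way to see both failure modes is k-nearest-neighbour regression on noisy data, where the single setting k controls model flexibility. In this sketch (the data-generating function, noise level and sample sizes are assumptions), k=1 memorizes the training set, while k equal to the training-set size always predicts the global mean.

```python
import random

random.seed(0)


def true_fn(x):
    """Hypothetical underlying pattern the model should recover."""
    return x * x


def sample(n):
    """Draw n noisy observations of true_fn on the interval [-3, 3]."""
    data = []
    for _ in range(n):
        x = random.uniform(-3, 3)
        data.append((x, true_fn(x) + random.gauss(0, 1.0)))
    return data


train, test = sample(40), sample(40)


def knn_predict(x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda pt: abs(pt[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(data, k):
    """Mean squared error of the k-NN regressor on the given data."""
    return sum((knn_predict(x, k) - y) ** 2 for x, y in data) / len(data)


# k=1: low bias, high variance (overfits); k=40: high bias, low variance
# (underfits); k=5: a compromise between the two.
for k in (1, 5, len(train)):
    print(k, round(mse(train, k), 3), round(mse(test, k), 3))
```

With k=1 the training error is zero because every training point is its own nearest neighbour, yet the test error is not, which is the signature of overfitting. With k=40 the training error itself stays high, which is the signature of underfitting.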
Grid Search
The simplest hyperparameter optimization method. It tries every combination in a predefined grid of hyperparameter values to find the best one.
Randomized Search
A modification of grid search that evaluates only a random sample of hyperparameter settings to find the best one.
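The difference between the two search strategies can be sketched in a few lines. The hyperparameter names, their candidate values and the `score` function are hypothetical; in a real pipeline `score` would train a model and return its cross-validated performance.

```python
import itertools
import random

# Hypothetical search space; names and values are illustrative only.
grid = {"learning_rate": [0.01, 0.1, 1.0], "max_depth": [1, 3, 5]}


def score(params):
    """Stand-in for a cross-validated score (higher is better).

    This toy function simply peaks at learning_rate=0.1, max_depth=3.
    """
    return -((params["learning_rate"] - 0.1) ** 2
             + (params["max_depth"] - 3) ** 2)


def grid_search(grid):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    candidates = [dict(zip(names, values))
                  for values in itertools.product(*(grid[n] for n in names))]
    return max(candidates, key=score)


def randomized_search(grid, n_iter, seed=0):
    """Evaluate only n_iter randomly drawn settings and return the best one."""
    rng = random.Random(seed)
    candidates = [{name: rng.choice(values) for name, values in grid.items()}
                  for _ in range(n_iter)]
    return max(candidates, key=score)


best_grid = grid_search(grid)                     # 9 evaluations
best_random = randomized_search(grid, n_iter=4)   # 4 evaluations
print(best_grid, best_random)
```

Grid search is guaranteed to find the best point on the grid, but its cost grows multiplicatively with every added hyperparameter. Randomized search trades that guarantee for a fixed evaluation budget.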
Deployment & Monitoring
Selected models are deployed in
production and are constantly
monitored based on their predictions
and results.
Model Persistence
Model persistence is the simplest way of deploying a model. The final model is persisted on permanent media such as a hard drive. A new program must then route real-world data to the persisted model, which produces the predicted output.
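A minimal sketch of model persistence with Python's `pickle` module is shown below. The "model" is a hypothetical dictionary of linear-model parameters and the file name is arbitrary; a real project would persist whatever object its ML library produces.

```python
import os
import pickle
import tempfile

# Hypothetical "final model": learned parameters of a linear model.
model = {"weights": [2.0, -1.0], "bias": 0.5}


def predict(model, features):
    """Linear prediction: dot product of weights and features, plus bias."""
    return sum(w * x for w, x in zip(model["weights"], features)) + model["bias"]


# Training program: persist the final model on permanent media.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Separate serving program: load the persisted model and route new data to it.
with open(path, "rb") as f:
    loaded = pickle.load(f)

prediction = predict(loaded, [3.0, 1.0])
print(prediction)
```

Unpickling can execute arbitrary code, so persisted model files should only be loaded from trusted sources.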
Custom Development
Another option is to develop the implementation of the model's prediction method separately. The output of training is then just the values of the learned parameters. This approach belongs to the software development domain.
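In the custom-development approach, only the learned parameter values cross the boundary between training and production. A sketch, assuming a linear model whose parameters are exported as JSON (the class name, field names and values are hypothetical):

```python
import json

# Output of training: nothing but the learned parameter values.
exported = json.dumps({"weights": [0.8, -0.4], "bias": 0.1})


class LinearScorer:
    """Hand-written prediction method, developed separately from training."""

    def __init__(self, weights, bias):
        self.weights = list(weights)
        self.bias = float(bias)

    @classmethod
    def from_json(cls, text):
        """Build the scorer from exported parameter values."""
        params = json.loads(text)
        return cls(params["weights"], params["bias"])

    def predict(self, features):
        """Weighted sum of the features plus the bias term."""
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


scorer = LinearScorer.from_json(exported)
result = scorer.predict([1.0, 2.0])
print(result)
```

Because the exchange format is plain JSON, the prediction side could just as well be written in another language, with no dependency on the training library.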
In-House Model Deployment
For data protection reasons, many enterprises do not want to expose the data on which models need to be built and deployed. Models can be integrated internally through web development frameworks, APIs or microservices built on top of the prediction models.
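As one possible shape for such an internal prediction API, the sketch below wraps a model in a minimal WSGI application using only the standard library. The parameters, the request format (`{"features": [...]}`) and the helper `call` are assumptions for illustration; a production service would add authentication, logging and input validation.

```python
import io
import json

WEIGHTS, BIAS = [2.0, -1.0], 0.5   # hypothetical learned parameters


def predict(features):
    """Linear prediction from the hypothetical parameters above."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def app(environ, start_response):
    """Minimal WSGI application: POST a JSON body like {"features": [...]}."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        body = {"prediction": predict(payload["features"])}
        status = "200 OK"
    except (ValueError, KeyError, TypeError):
        body = {"error": "expected a JSON object with a 'features' list"}
        status = "400 Bad Request"
    data = json.dumps(body).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(data)))])
    return [data]


def call(application, payload):
    """Invoke the app in-process, the way a WSGI server would."""
    raw = json.dumps(payload).encode("utf-8")
    environ = {"REQUEST_METHOD": "POST",
               "CONTENT_LENGTH": str(len(raw)),
               "wsgi.input": io.BytesIO(raw)}
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], json.loads(body)


status, response = call(app, {"features": [3.0, 1.0]})
print(status, response)
```

To serve it over HTTP inside the company network, the same `app` object can be passed to any WSGI server, for example `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()`, where the host and port are placeholders.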
Model Deployment as a Service
The model is openly accessible and can be integrated via a cloud-based API request.
Contact
Michael Gerke
Detecon International GmbH
Sternengasse 14-16
50676 Cologne (Germany)
Phone: +49 221 91611138
Mobile: +49 160 6907433
Email: Michael.Gerke@detecon.com
Special Thanks to the author team:
• Dipanjan Sarkar
• Raghav Bali
• Tushar Sharma
