2nd edition
#MLSEV 2
Anomaly Detectors
Practical Examples with BigML
Guillem Vidal
Machine Learning Engineer, BigML
Outline
1 Anomaly Detection Recap
2 Demo 1: Removing Outliers
3 Demo 2: Fraud Detection
4 Demo 3: Novel Categories Discovery
Anomaly Detection Recap
Anomaly Detection
date  customer  account  auth  class    zip    amount
Mon   Bob       3421     pin   clothes  46140  135
Tue   Bob       3421     sign  food     46140  401
Tue   Alice     2456     pin   food     12222  234
Wed   Sally     6788     pin   gas      26339  94
Wed   Bob       3421     pin   tech     21350  2459
Wed   Bob       3421     pin   gas      46140  83
Thr   Sally     6788     sign  food     26339  51
An anomaly detector is an unsupervised algorithm that looks for unusual instances in a dataset. It assigns an anomaly score to each instance: the higher the score, the more unusual the instance. Example (Bob's Wednesday “tech” transaction):
• Amount $2,459 is higher than all other transactions
• Only transaction in zip 21350
• Only transaction for the purchase class “tech”
Graphical Example
Which object appears more unusual within this group?
[Figure: objects annotated “Round”, “Skinny”, and “Corners”. One is “Skinny” but not “smooth”. The most unusual object has no “Corners” and is not “Round”.]
Isolation Forest
Isolation Forest: grow random decision trees, with random features and splits, until each instance is in its own leaf. Unusual instances are “easy” to isolate (shallow depth); usual instances are “hard” to isolate (greater depth).
Now repeat the process several times and use the average depth to compute the anomaly score, from 0 (similar) to 1 (dissimilar).
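The idea above can be sketched in plain Python. This is a minimal illustration, not BigML's implementation: the normalization by c(n) follows the Isolation Forest paper linked on the next slide, the function names are hypothetical, and only the amount column of the recap table is used as a feature.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """c(n): average path length of an unsuccessful binary search tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def grow_tree(rows, rng):
    """Split on a random feature and value until each row sits in its own leaf."""
    if len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    low = min(row[feature] for row in rows)
    high = max(row[feature] for row in rows)
    split = rng.uniform(low, high)
    left = [row for row in rows if row[feature] < split]
    right = [row for row in rows if row[feature] >= split]
    if not left or not right:
        return {"size": len(rows)}  # rows could not be separated on this draw
    return {"feature": feature, "split": split,
            "left": grow_tree(left, rng), "right": grow_tree(right, rng)}


def path_length(tree, row, depth=0):
    """Depth at which the row ends up, adjusted for leaves holding several rows."""
    if "feature" not in tree:
        return depth + average_path_length(tree["size"])
    branch = "left" if row[tree["feature"]] < tree["split"] else "right"
    return path_length(tree[branch], row, depth + 1)


def anomaly_scores(rows, n_trees=100, seed=0):
    """Score each row in (0, 1): shallow average depth gives a high score."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    trees = [grow_tree(rows, rng) for _ in range(n_trees)]
    norm = average_path_length(len(rows))
    scores = []
    for row in rows:
        mean_depth = sum(path_length(tree, row) for tree in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores


# Amounts from the recap transactions table, one feature per row.
amounts = [[135], [401], [234], [94], [2459], [83], [51]]
scores = anomaly_scores(amounts, n_trees=200, seed=42)
for (amount,), s in zip(amounts, scores):
    print(f"amount {amount:>5}: score {s:.2f}")
```

The $2,459 transaction is isolated after very few splits in most trees, so it should receive the highest score of the seven.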
Isolation Forest Splits
https://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/icdm08b.pdf
[Figure panels: Anomaly; Usual data point]
Removing Outliers
https://towardsdatascience.com/outlier-detection-with-isolation-forest-3d190448d45e
Outliers
• Data points that differ significantly from other observations
• Outliers can cause serious problems in statistical analyses
• Examples:
Regression: [plot of Price (100k €) vs. Square Meters]
Classification: [plot of Price (100k €) vs. Square Meters, points marked Sold or Unsold]
Removing Outliers
• Anomaly detectors can be used to remove outliers
• With this methodology, the effect of outlier removal can be tested
Workflow:
ORIGINAL DATASET → TRAIN SET + TEST SET
TRAIN SET → ALL MODEL → ALL EVALUATION (on TEST SET)
TRAIN SET → ANOMALY DETECTOR → REJECT MOST ANOMALOUS → CLEAN DATASET → CLEAN MODEL → CLEAN EVALUATION (on TEST SET)
ALL EVALUATION vs. CLEAN EVALUATION → COMPARE EVALUATIONS
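A rough sketch of this comparison in plain Python, under loud assumptions: the data points are made up, the model is a simple least-squares line (echoing the price vs. square meters example), and `stand_in_scores` is a crude placeholder for a real anomaly detector, not an Isolation Forest.

```python
from statistics import median


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, points):
    """Evaluation: average absolute prediction error on the given points."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)


def stand_in_scores(points):
    """Placeholder detector: deviation of price per square meter from the median."""
    ratios = [y / x for x, y in points]
    mid = median(ratios)
    return [abs(r - mid) for r in ratios]


def reject_most_anomalous(points, scores, top_n):
    """Drop the top_n highest-scoring points and keep the rest."""
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    rejected = set(ranked[:top_n])
    return [p for i, p in enumerate(points) if i not in rejected]


# Hypothetical data: (square meters, price in 100k EUR); one train point is off.
train_set = [(20, 1.0), (30, 1.5), (40, 2.0), (50, 2.5), (60, 3.0), (80, 0.5)]
test_set = [(25, 1.25), (45, 2.25), (70, 3.5)]

all_model = fit_line(train_set)
clean_dataset = reject_most_anomalous(train_set, stand_in_scores(train_set), top_n=1)
clean_model = fit_line(clean_dataset)

all_evaluation = mean_abs_error(all_model, test_set)
clean_evaluation = mean_abs_error(clean_model, test_set)
print(f"ALL: {all_evaluation:.3f}  CLEAN: {clean_evaluation:.3f}")
```

Both models are evaluated on the same untouched test set, so any difference comes only from the rejected training rows.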
Outliers Demo
Diabetes dataset (BigML Gallery)
• Predict whether patients are diabetic or not
pregnancies  plasma glucose  blood pressure  triceps skin thickness  insulin  bmi   diabetes pedigree  age  diabetes
6            148             72              35                      0        33.6  627                50   TRUE
1            85              66              29                      0        26.6  351                31   FALSE
8            183             64              0                       0        23.3  672                32   TRUE
1            89              66              23                      94       28.1  167                21   FALSE
0            137             40              35                      168      43.1  2.288              33   TRUE
5            116             74              0                       0        25.6  201                30   FALSE
3            78              50              32                      88       31.0  248                26   TRUE
10           115             0               0                       0        35.3  134                29   FALSE
2            197             70              45                      543      30.5  158                53   TRUE
8            125             96              0                       0        0.0   232                54   TRUE
4            110             92              0                       0        37.6  191                30   FALSE
10           168             74              0                       0        38.0  537                34   TRUE
Summary
• An anomaly detector improved a classifier's performance by removing the top 10 anomalies as outliers
• Removing anomalies with a score over 60% usually works
Fraud Detection
• Use Machine Learning to detect fraudulent financial transactions
• Fraudulent transactions, being unusual, can be detected with an anomaly detector
Workflow:
HISTORIC NON-FRAUD TRANSACTIONS → ANOMALY DETECTOR
NEW TRANSACTION(S) → ANOMALY DETECTOR → ANOMALY SCORE → KEEP HIGH SCORES → SUSPICIOUS TRANSACTION(S) → FRAUD ANALYST
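The scoring step of this workflow might look like the sketch below. Everything here is illustrative: the transactions, the threshold, and the function names are hypothetical, and the detector is a simple z-score stand-in trained on non-fraud history rather than the detector used in the demo.

```python
from statistics import mean, pstdev


def fit_stand_in_detector(history):
    """Learn per-feature mean and spread from historic non-fraud transactions.

    Stand-in for a real anomaly detector: returns a function mapping a
    transaction to a score in [0, 1), where higher means more unusual.
    """
    columns = list(zip(*history))
    centers = [mean(col) for col in columns]
    spreads = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero

    def score(transaction):
        z = max(abs(v - c) / s for v, c, s in zip(transaction, centers, spreads))
        return z / (1.0 + z)  # squash the largest z-score into [0, 1)

    return score


def keep_high_scores(transactions, score, threshold):
    """Return (transaction, score) pairs at or above the threshold, highest first."""
    scored = [(t, score(t)) for t in transactions]
    suspicious = [pair for pair in scored if pair[1] >= threshold]
    return sorted(suspicious, key=lambda pair: pair[1], reverse=True)


# Hypothetical transactions as (amount, hour of day).
historic_non_fraud = [(135, 10), (401, 12), (234, 13), (94, 9), (83, 18), (51, 11)]
new_transactions = [(120, 11), (2459, 3), (300, 14)]

detector = fit_stand_in_detector(historic_non_fraud)
suspicious = keep_high_scores(new_transactions, detector, threshold=0.8)
for transaction, s in suspicious:
    print(f"send to fraud analyst: {transaction} (score {s:.2f})")
```

Training only on non-fraud history means the detector learns what normal looks like, so the threshold controls how many transactions reach the analyst.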
Fraud Detection Demo
Credit card transactions dataset
• Anonymized credit card transactions with a fraud label
• Very unbalanced
Time  V1       V2       V3       V4       …  V27      V28      Amount  Class
0     -1.3598  -0.0727  2.5363   1.3781   …  0.1335   -0.0210  149.62  0
0     1.1918   0.2661   0.1664   0.4481   …  -0.0089  0.0147   2.69    0
1     -1.3583  -1.3401  1.7732   0.3797   …  -0.0553  -0.0597  378.66  0
1     -0.9662  -0.1852  1.7929   -0.8632  …  0.0627   0.0614   123.5   0
2     -1.1582  0.8777   1.5487   0.4030   …  0.2194   0.2151   69.99   0
2     -0.4259  0.9605   1.1411   -0.1682  …  0.2538   0.0810   3.67    0
4     1.2296   0.1410   0.0453   1.2026   …  0.0345   0.0051   4.99    0
7     -0.6442  1.4179   1.0743   -0.4921  …  -1.2069  -1.0853  40.8    1
7     -0.8942  0.2861   -0.1131  -0.2715  …  0.0117   0.1424   93.2    0
9     -0.3382  1.1195   1.0443   -0.2221  …  0.2462   0.0830   3.68    0
10    1.4490   -1.1763  0.9138   -1.3756  …  0.0428   0.0162   7.8     0
https://www.kaggle.com/mlg-ulb/creditcardfraud
Summary
• Anomaly detectors can be an unsupervised alternative to classifiers
in extremely unbalanced datasets
• Fraud detection is an example. A similar approach can be used for other
use cases such as predictive maintenance or network intrusion
detection
• With this approach, the most challenging aspect is finding the features
that work
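When a fraud label is available, as in the demo dataset, one way to check whether the chosen features work is to compare flagged transactions against the label. On very unbalanced data plain accuracy is misleading, so this sketch uses precision and recall instead. The scores and labels below are invented for illustration.

```python
def precision_recall(flagged, is_fraud):
    """Precision and recall for parallel lists of booleans."""
    true_pos = sum(1 for f, y in zip(flagged, is_fraud) if f and y)
    n_flagged = sum(flagged)
    n_fraud = sum(is_fraud)
    precision = true_pos / n_flagged if n_flagged else 0.0
    recall = true_pos / n_fraud if n_fraud else 0.0
    return precision, recall


# Hypothetical anomaly scores and fraud labels: 2 frauds among 10 transactions.
scores = [0.31, 0.35, 0.82, 0.40, 0.28, 0.66, 0.37, 0.33, 0.71, 0.30]
labels = [False, False, True, False, False, False, False, False, True, False]

# Predicting "never fraud" already looks accurate while finding nothing.
accuracy_all_normal = labels.count(False) / len(labels)
print(f"accuracy of flagging nothing: {accuracy_all_normal:.2f}")

for threshold in (0.6, 0.7):
    flagged = [s >= threshold for s in scores]
    precision, recall = precision_recall(flagged, labels)
    print(f"threshold {threshold}: precision {precision:.2f}, recall {recall:.2f}")
```

Trying a few thresholds this way shows the trade-off between how many frauds are caught and how many alerts the analyst has to review.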
Novel Categories Discovery
Novel Categories
• A classification model's performance can degrade over time in production as real data evolves
• Model degradation can be addressed by retraining with new data
• What if new data is not labeled?
• What if new data contains novel categories?
• Anomaly detectors can be used to spot model degradation and to
discover novel categories
Novel Categories Discovery
Workflow:
ORIGINAL DATASET → CLASSIFICATION MODEL + ANOMALY DETECTOR
NEW INSTANCES → ANOMALY DETECTOR → REJECT HIGH ANOMALY SCORES
SIMILAR INSTANCES → CLASSIFICATION MODEL → PREDICTION
HIGH SCORED INSTANCES, POTENTIAL NOVEL CATEGORIES → DATA ANALYST → LABEL/RETRAIN
MODEL ALERT WHEN CUMULATED ANOMALY SCORE
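The routing in this diagram could be sketched as follows. The thresholds, the toy one-dimensional instances, and the stand-in scoring and classification functions are all hypothetical. Raising the model alert when the summed score of a batch passes a limit is an assumption about what the cumulated anomaly score means here.

```python
def route_new_instances(instances, score, classify, score_threshold, alert_threshold):
    """Split new instances by anomaly score and track the cumulated score.

    Returns (predictions, for_analyst, model_alert).
    """
    predictions, for_analyst, cumulated = [], [], 0.0
    for instance in instances:
        s = score(instance)
        cumulated += s
        if s >= score_threshold:
            for_analyst.append(instance)  # potential novel category
        else:
            predictions.append((instance, classify(instance)))
    # Assumption: alert when the batch's summed anomaly score reaches the limit.
    return predictions, for_analyst, cumulated >= alert_threshold


def stand_in_score(x):
    """Placeholder detector: distance from a training center of 5.0, capped at 1."""
    return min(1.0, abs(x - 5.0) / 10.0)


def stand_in_classify(x):
    """Placeholder classifier with the two categories seen during training."""
    return "small" if x < 5.0 else "large"


new_instances = [4.0, 6.5, 14.0, 5.5, 15.0]
predictions, for_analyst, model_alert = route_new_instances(
    new_instances, stand_in_score, stand_in_classify,
    score_threshold=0.6, alert_threshold=2.0,
)
print("predictions:", predictions)
print("for the data analyst:", for_analyst)
print("model alert:", model_alert)
```

Instances sent to the analyst are never classified, so the model is not forced to assign a known category to something it has not seen before.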
Novel Categories Demo
Steel plates faults dataset
• Each instance represents a faulty steel plate with fault type label
• Objective: predict fault type given a faulty steel plate
X_Min  X_Max  Y_Min    Y_Max    Pixels Areas  X_Perim  Y_Perim  …  Orientation_Index  Luminosity_Index  SigmoidOfAreas  Fault
42     50     270900   270944   267           17       44       …  0.8182             -0.2913           0.5822          Pastry
645    651    2538079  2538108  108           10       30       …  0.7931             -0.1756           0.2984          Bumps
829    835    1553913  1553931  71            8        19       …  0.6667             -0.1228           215             Bumps
853    860    369370   369415   176           13       45       …  0.8444             -0.1568           0.5212          Dirty
1289   1306   498078   498335   2409          60       260      …  0.9338             -0.1992           1.0             Stains
430    441    100250   100337   630           20       87       …  0.8736             -0.2267           0.9874          Pastry
413    446    138468   138883   9052          230      432      …  0.9205             0.2791            1.0             Stains
190    200    210936   210956   132           11       20       …  0.5                0.1841            0.3359          Bumps
330    343    429227   429253   264           15       26       …  0.5                -0.1197           0.5593          Pastry
74     90     779144   779308   1506          46       167      …  0.9024             -0.0651           1.0             Pastry
106    118    813452   813500   442           13       48       …  0.75               -0.1093           0.8612          Pastry
28 fields total
BigML Gallery
Summary
• Novel steel plate fault categories could be spotted with this method
• Model degradation in general can be monitored with anomaly detectors
MLSEV Virtual. Anomaly Detection Examples

Más contenido relacionado

Similar a MLSEV Virtual. Anomaly Detection Examples

BSSML17 - Anomaly Detection
BSSML17 - Anomaly DetectionBSSML17 - Anomaly Detection
BSSML17 - Anomaly DetectionBigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionBigML, Inc
 
Anomaly detection Workshop slides
Anomaly detection Workshop slidesAnomaly detection Workshop slides
Anomaly detection Workshop slidesQuantUniversity
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering odsc
 
Data_Preparation.pptx
Data_Preparation.pptxData_Preparation.pptx
Data_Preparation.pptxImXaib
 
14. Statistical Process Control.pptx
14. Statistical Process Control.pptx14. Statistical Process Control.pptx
14. Statistical Process Control.pptxSarthakGupta856447
 
Automating Data Exploration SciPy 2016
Automating Data Exploration SciPy 2016Automating Data Exploration SciPy 2016
Automating Data Exploration SciPy 2016Gramener
 
IMCSummit 2015 - Day 2 Developer Track - Catch Them in the Act - Fraud Detect...
IMCSummit 2015 - Day 2 Developer Track - Catch Them in the Act - Fraud Detect...IMCSummit 2015 - Day 2 Developer Track - Catch Them in the Act - Fraud Detect...
IMCSummit 2015 - Day 2 Developer Track - Catch Them in the Act - Fraud Detect...In-Memory Computing Summit
 
Mathematics of anomalies
Mathematics of anomaliesMathematics of anomalies
Mathematics of anomaliesCSIRO
 
HYDSPIN Dec14 visual story telling
HYDSPIN Dec14 visual story tellingHYDSPIN Dec14 visual story telling
HYDSPIN Dec14 visual story tellingGramener
 
Database Marketing - Dominick's stores in Chicago distric
Database Marketing - Dominick's stores in Chicago districDatabase Marketing - Dominick's stores in Chicago distric
Database Marketing - Dominick's stores in Chicago districDemin Wang
 
Detecting Malicious Websites using Machine Learning
Detecting Malicious Websites using Machine LearningDetecting Malicious Websites using Machine Learning
Detecting Malicious Websites using Machine LearningAndrew Beard
 
Online Detection of Shutdown Periods in Chemical Plants: A Case Study
Online Detection of Shutdown Periods in Chemical Plants: A Case StudyOnline Detection of Shutdown Periods in Chemical Plants: A Case Study
Online Detection of Shutdown Periods in Chemical Plants: A Case StudyManuel Martín
 
SPC Training by D&H Engineers
SPC Training by D&H EngineersSPC Training by D&H Engineers
SPC Training by D&H EngineersD&H Engineers
 
Example-Dependent Cost-Sensitive Credit Card Fraud Detection
Example-Dependent Cost-Sensitive Credit Card Fraud DetectionExample-Dependent Cost-Sensitive Credit Card Fraud Detection
Example-Dependent Cost-Sensitive Credit Card Fraud DetectionAlejandro Correa Bahnsen, PhD
 

Similar a MLSEV Virtual. Anomaly Detection Examples (20)

BSSML17 - Anomaly Detection
BSSML17 - Anomaly DetectionBSSML17 - Anomaly Detection
BSSML17 - Anomaly Detection
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly Detection
 
Anomaly detection Workshop slides
Anomaly detection Workshop slidesAnomaly detection Workshop slides
Anomaly detection Workshop slides
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering
 
Data_Preparation.pptx
Data_Preparation.pptxData_Preparation.pptx
Data_Preparation.pptx
 
14. Statistical Process Control.pptx
14. Statistical Process Control.pptx14. Statistical Process Control.pptx
14. Statistical Process Control.pptx
 
Automating Data Exploration SciPy 2016
Automating Data Exploration SciPy 2016Automating Data Exploration SciPy 2016
Automating Data Exploration SciPy 2016
 
Quantity and unit
Quantity and unitQuantity and unit
Quantity and unit
 
IMCSummit 2015 - Day 2 Developer Track - Catch Them in the Act - Fraud Detect...
IMCSummit 2015 - Day 2 Developer Track - Catch Them in the Act - Fraud Detect...IMCSummit 2015 - Day 2 Developer Track - Catch Them in the Act - Fraud Detect...
IMCSummit 2015 - Day 2 Developer Track - Catch Them in the Act - Fraud Detect...
 
Mathematics of anomalies
Mathematics of anomaliesMathematics of anomalies
Mathematics of anomalies
 
Multivariate Analysis
Multivariate AnalysisMultivariate Analysis
Multivariate Analysis
 
Multivariate Analysis.ppt
Multivariate Analysis.pptMultivariate Analysis.ppt
Multivariate Analysis.ppt
 
Multivariate analysis
Multivariate analysisMultivariate analysis
Multivariate analysis
 
HYDSPIN Dec14 visual story telling
HYDSPIN Dec14 visual story tellingHYDSPIN Dec14 visual story telling
HYDSPIN Dec14 visual story telling
 
Database Marketing - Dominick's stores in Chicago distric
Database Marketing - Dominick's stores in Chicago districDatabase Marketing - Dominick's stores in Chicago distric
Database Marketing - Dominick's stores in Chicago distric
 
Detecting Malicious Websites using Machine Learning
Detecting Malicious Websites using Machine LearningDetecting Malicious Websites using Machine Learning
Detecting Malicious Websites using Machine Learning
 
Online Detection of Shutdown Periods in Chemical Plants: A Case Study
Online Detection of Shutdown Periods in Chemical Plants: A Case StudyOnline Detection of Shutdown Periods in Chemical Plants: A Case Study
Online Detection of Shutdown Periods in Chemical Plants: A Case Study
 
SPC Training by D&H Engineers
SPC Training by D&H EngineersSPC Training by D&H Engineers
SPC Training by D&H Engineers
 
Example-Dependent Cost-Sensitive Credit Card Fraud Detection
Example-Dependent Cost-Sensitive Credit Card Fraud DetectionExample-Dependent Cost-Sensitive Credit Card Fraud Detection
Example-Dependent Cost-Sensitive Credit Card Fraud Detection
 
Graphs, pareto
Graphs, paretoGraphs, pareto
Graphs, pareto
 

Más de BigML, Inc

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingBigML, Inc
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationBigML, Inc
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceBigML, Inc
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesBigML, Inc
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector BigML, Inc
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLBigML, Inc
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLBigML, Inc
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyBigML, Inc
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorBigML, Inc
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsBigML, Inc
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsBigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleBigML, Inc
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIBigML, Inc
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object DetectionBigML, Inc
 
BigML Release: Image Processing
BigML Release: Image ProcessingBigML Release: Image Processing
BigML Release: Image ProcessingBigML, Inc
 
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureMachine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureBigML, Inc
 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorBigML, Inc
 
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotBigML, Inc
 
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...BigML, Inc
 
ML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and ComplianceML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and ComplianceBigML, Inc
 

Más de BigML, Inc (20)

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in Manufacturing
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - Automation
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML Compliance
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective Anomalies
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in ML
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End ML
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven Company
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal Sector
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe Stadiums
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at Scale
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AI
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object Detection
 
BigML Release: Image Processing
BigML Release: Image ProcessingBigML Release: Image Processing
BigML Release: Image Processing
 
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureMachine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail Sector
 
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
 
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
 
ML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and ComplianceML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
ML in GRC: Cybersecurity versus Governance, Risk Management, and Compliance
 

Último

➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Pooja Nehwal
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...amitlee9823
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 

Último (20)

Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

MLSEV Virtual. Anomaly Detection Examples

  • 2. #MLSEV 2 Anomaly Detectors Practical Examples with BigML Guillem Vidal Machine Learning Engineer, BigML
  • 3. #MLSEV 3 Outline 2 Demo 1: Removing Outliers 3 Demo 2: Fraud Detection 4 Demo 3: Novel Categories Discovery 1 Anomaly Detection Recap
  • 5. #MLSEV 5 Anomaly Detection date customer account auth class zip amount Mon Bob 3421 pin clothes 46140 135 Tue Bob 3421 sign food 46140 401 Tue Alice 2456 pin food 12222 234 Wed Sally 6788 pin gas 26339 94 Wed Bob 3421 pin tech 21350 2459 Wed Bob 3421 pin gas 46140 83 Thr Sally 6788 sign food 26339 51 An unsupervised algorithm that looks for unusual instances in a dataset. Anomaly detectors provide an anomaly score to each instance, the higher is the score the most unusual is the instance. Example: • Amount $2,459 is higher than all other transactions • Only transaction • In zip 21350 • For the purchase class “tech"
  • 6. Graphical Example: Which object appears most unusual within this group?
  • 7. Graphical Example (figure labels): “Round”; “Skinny”; “Corners”; “Skinny” but not “smooth”; no “Corners”; not “Round”; most unusual
  • 8. Isolation Forest: grow random decision trees, with random features and random splits, until each instance is in its own leaf. Unusual instances are “easy” to isolate and end up at a shallow depth; usual instances are “hard” to isolate and end up deeper. Repeat the process several times and use the average depth to compute the anomaly score, from 0 (similar) to 1 (dissimilar).
  • 9. Isolation Forest Splits (figure contrasting an anomaly with a usual data point). Source: https://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/icdm08b.pdf
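The isolation forest idea from slide 8 can be sketched in plain Python. This is a minimal illustration, not BigML's implementation: the depth cap and the normalisation term `c_factor` follow the paper linked on slide 9, while the function names, the toy data and the parameter defaults are assumptions made for the example. It needs at least two rows.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Split on a random feature at a random value until rows are isolated."""
    if len(rows) <= 1 or depth >= max_depth:
        return {"size": len(rows)}
    # Only features that still vary can separate the remaining rows.
    varying = [f for f in range(len(rows[0]))
               if min(r[f] for r in rows) < max(r[f] for r in rows)]
    if not varying:
        return {"size": len(rows)}
    feature = rng.choice(varying)
    low = min(r[feature] for r in rows)
    high = max(r[feature] for r in rows)
    split = rng.uniform(low, high)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {"feature": feature, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(tree, row, depth=0):
    """Depth at which the row lands, plus a correction for leaves left unsplit."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if row[tree["feature"]] < tree["split"] else "right"
    return path_length(tree[branch], row, depth + 1)


def fit_forest(rows, n_trees=100, sample_size=256, seed=0):
    """Grow n_trees random trees, each on a random subsample of the rows."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))  # depth cap from the paper
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, sample_size


def anomaly_score(forest, row):
    """Map the average depth to (0, 1): shallow (easy to isolate) means close to 1."""
    trees, sample_size = forest
    mean_depth = sum(path_length(t, row) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c_factor(sample_size))


# Toy data (synthetic): a tight cluster plus one obvious outlier.
data_rng = random.Random(42)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(50)]
points.append((10.0, 10.0))

forest = fit_forest(points)
scores = [anomaly_score(forest, p) for p in points]
```

The last entry of `scores` belongs to the far-away point; because it is isolated after very few splits, its average depth is small and its score is the highest in the list.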
  • 12. Outliers • Data points that differ significantly from other observations • Outliers can cause serious problems in statistical analyses • Examples: [Figure: two scatter plots of Price (100k €) against Square Meters, one for Regression and one for Classification (Unsold vs. Sold)]
  • 14. Removing Outliers • Anomaly detectors can be used to remove outliers • With this methodology, outlier removal can be tested. Workflow: ORIGINAL DATASET -> TRAIN SET and TEST SET; TRAIN SET -> ALL MODEL -> ALL EVALUATION; TRAIN SET -> ANOMALY DETECTOR -> REJECT MOST ANOMALOUS -> CLEAN DATASET -> CLEAN MODEL -> CLEAN EVALUATION; COMPARE EVALUATIONS
  • 15. Outliers Demo: Diabetes dataset (BigML Gallery) • Predict whether patients are diabetic or not
      pregnancies | plasma glucose | blood pressure | triceps skin thickness | insulin | bmi  | diabetes pedigree | age | diabetes
      6           | 148            | 72             | 35                     | 0       | 33.6 | 627               | 50  | TRUE
      1           | 85             | 66             | 29                     | 0       | 26.6 | 351               | 31  | FALSE
      8           | 183            | 64             | 0                      | 0       | 23.3 | 672               | 32  | TRUE
      1           | 89             | 66             | 23                     | 94      | 28.1 | 167               | 21  | FALSE
      0           | 137            | 40             | 35                     | 168     | 43.1 | 2.288             | 33  | TRUE
      5           | 116            | 74             | 0                      | 0       | 25.6 | 201               | 30  | FALSE
      3           | 78             | 50             | 32                     | 88      | 31.0 | 248               | 26  | TRUE
      10          | 115            | 0              | 0                      | 0       | 35.3 | 134               | 29  | FALSE
      2           | 197            | 70             | 45                     | 543     | 30.5 | 158               | 53  | TRUE
      8           | 125            | 96             | 0                      | 0       | 0.0  | 232               | 54  | TRUE
      4           | 110            | 92             | 0                      | 0       | 37.6 | 191               | 30  | FALSE
      10          | 168            | 74             | 0                      | 0       | 38.0 | 537               | 34  | TRUE
  • 16. Summary • An anomaly detector improved a classifier's performance by removing the top 10 anomalies as outliers • Removing anomalies with a score over 60% usually works
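The outlier-removal methodology from slide 14 (train one model on all the data and one on the cleaned data, then compare both on the same test set) can be sketched as follows. Everything here is an assumption made for illustration: the data is synthetic, the model is a one-variable least-squares line echoing the price vs. square meters example, and `toy_anomaly_score` is a simple robust z-score standing in for a real anomaly detector, not an isolation forest.

```python
import statistics


def fit_line(rows):
    """Ordinary least squares for y = a + b * x on (x, y) pairs."""
    mean_x = statistics.mean(x for x, _ in rows)
    mean_y = statistics.mean(y for _, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_abs_error(model, rows):
    """Evaluation metric: average absolute prediction error."""
    intercept, slope = model
    return statistics.mean(abs(y - (intercept + slope * x)) for x, y in rows)


def toy_anomaly_score(rows):
    """Stand-in scorer: robust z-score of price per square meter."""
    ratios = [y / x for x, y in rows]
    median = statistics.median(ratios)
    mad = statistics.median(abs(r - median) for r in ratios) or 1e-9
    return [abs(r - median) / mad for r in ratios]


def compare_evaluations(train, test, reject_top):
    """Return (ALL EVALUATION, CLEAN EVALUATION) measured on the same test set."""
    all_model = fit_line(train)
    scores = toy_anomaly_score(train)
    ranked = sorted(zip(scores, train), key=lambda pair: pair[0], reverse=True)
    clean_dataset = [row for _, row in ranked[reject_top:]]  # reject most anomalous
    clean_model = fit_line(clean_dataset)
    return mean_abs_error(all_model, test), mean_abs_error(clean_model, test)


# Synthetic (square meters, price in 100k EUR) pairs; the last one is an outlier.
train_set = [(20, 1.1), (30, 1.4), (40, 2.1), (50, 2.4),
             (60, 3.1), (70, 3.4), (80, 4.1), (85, 0.5)]
test_set = [(25, 1.25), (45, 2.25), (65, 3.25)]

all_error, clean_error = compare_evaluations(train_set, test_set, reject_top=1)
```

On this toy data `clean_error` is much smaller than `all_error`, because the single outlier drags the regression line down. Keeping the test set untouched is what makes the two evaluations comparable.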
  • 18. Fraud Detection • Use Machine Learning to detect fraudulent financial transactions • Fraudulent transactions, being unusual, can be detected with an anomaly detector. Workflow: HISTORIC NON-FRAUD TRANSACTIONS -> ANOMALY DETECTOR; NEW TRANSACTION(S) -> ANOMALY DETECTOR -> ANOMALY SCORE -> KEEP HIGH SCORES -> SUSPICIOUS TRANSACTION(S) -> FRAUD ANALYST
  • 19. Fraud Detection Demo: Credit card transactions dataset (https://www.kaggle.com/mlg-ulb/creditcardfraud) • Anonymized credit card transactions with a fraud label • Very unbalanced
      Time | V1      | V2      | V3      | V4      | … | V27     | V28     | Amount | Class
      0    | -1.3598 | -0.0727 | 2.5363  | 1.3781  | … | 0.1335  | -0.0210 | 149.62 | 0
      0    | 1.1918  | 0.2661  | 0.1664  | 0.4481  | … | -0.0089 | 0.0147  | 2.69   | 0
      1    | -1.3583 | -1.3401 | 1.7732  | 0.3797  | … | -0.0553 | -0.0597 | 378.66 | 0
      1    | -0.9662 | -0.1852 | 1.7929  | -0.8632 | … | 0.0627  | 0.0614  | 123.5  | 0
      2    | -1.1582 | 0.8777  | 1.5487  | 0.4030  | … | 0.2194  | 0.2151  | 69.99  | 0
      2    | -0.4259 | 0.9605  | 1.1411  | -0.1682 | … | 0.2538  | 0.0810  | 3.67   | 0
      4    | 1.2296  | 0.1410  | 0.0453  | 1.2026  | … | 0.0345  | 0.0051  | 4.99   | 0
      7    | -0.6442 | 1.4179  | 1.0743  | -0.4921 | … | -1.2069 | -1.0853 | 40.8   | 1
      7    | -0.8942 | 0.2861  | -0.1131 | -0.2715 | … | 0.0117  | 0.1424  | 93.2   | 0
      9    | -0.3382 | 1.1195  | 1.0443  | -0.2221 | … | 0.2462  | 0.0830  | 3.68   | 0
      10   | 1.4490  | -1.1763 | 0.9138  | -1.3756 | … | 0.0428  | 0.0162  | 7.8    | 0
  • 20. Summary • Anomaly detectors can be an unsupervised alternative to classifiers in extremely unbalanced datasets • Fraud detection is an example. A similar approach can be used for other use cases such as predictive maintenance or network intrusion detection • With this approach, the most challenging aspect is finding the features that work
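The fraud detection flow from slide 18 (learn what "usual" looks like from historic non-fraud transactions, score new transactions, keep the high scores for an analyst) can be sketched as below. This is only an illustration under stated assumptions: the scorer is a one-feature z-score on the amount, squashed into [0, 1), rather than a real anomaly detector; the history reuses the six non-anomalous amounts from the slide 5 table; the new transactions, their ids and the 0.6 threshold are hypothetical.

```python
import statistics


def fit_amount_detector(history):
    """Learn a scoring function from non-fraud amounts only (stand-in detector)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)

    def score(amount):
        z = abs(amount - mean) / stdev
        return z / (1.0 + z)  # 0 for a typical amount, approaching 1 for extreme ones

    return score


def keep_high_scores(transactions, score, threshold=0.6):
    """Return (score, transaction) pairs at or above the threshold, highest first."""
    scored = [(score(t["amount"]), t) for t in transactions]
    flagged = [pair for pair in scored if pair[0] >= threshold]
    return sorted(flagged, key=lambda pair: pair[0], reverse=True)


# Historic amounts assumed to be non-fraud (taken from the slide 5 table).
historic_amounts = [135, 401, 234, 94, 83, 51]
score = fit_amount_detector(historic_amounts)

# Hypothetical incoming transactions.
new_transactions = [
    {"id": "t1", "amount": 120},
    {"id": "t2", "amount": 2459},
    {"id": "t3", "amount": 310},
]

suspicious = keep_high_scores(new_transactions, score)
```

Only `t2` ends up in `suspicious`, which is the list a fraud analyst would review. In practice the detector would use many features, and, as the summary notes, choosing those features is the hard part.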
  • 22. Novel Categories • A classification model's performance can degrade in production as real data evolves over time • Model degradation can be addressed by retraining with new data • What if new data is not labeled? • What if new data contains novel categories? • Anomaly detectors can be used to spot model degradation and to discover novel categories
  • 23. Novel Categories Discovery. Workflow: ORIGINAL DATASET -> CLASSIFICATION MODEL and ANOMALY DETECTOR; NEW INSTANCES -> ANOMALY DETECTOR -> REJECT HIGH ANOMALY SCORES; SIMILAR INSTANCES -> CLASSIFICATION MODEL -> PREDICTION; HIGH SCORED INSTANCES, POTENTIAL NOVEL CATEGORIES -> DATA ANALYST -> LABEL/RETRAIN MODEL; ALERT WHEN CUMULATED ANOMALY SCORE
  • 24. Novel Categories Demo: Steel plates faults dataset (BigML Gallery), 28 fields total • Each instance represents a faulty steel plate with fault type label • Objective: predict fault type given a faulty steel plate
      X_Min | X_Max | Y_Min   | Y_Max   | Pixels Areas | X_Perim | Y_Perim | … | Orientation_Index | Luminosity_Index | SigmoidOfAreas | Fault
      42    | 50    | 270900  | 270944  | 267          | 17      | 44      | … | 0.8182            | -0.2913          | 0.5822         | Pastry
      645   | 651   | 2538079 | 2538108 | 108          | 10      | 30      | … | 0.7931            | -0.1756          | 0.2984         | Bumps
      829   | 835   | 1553913 | 1553931 | 71           | 8       | 19      | … | 0.6667            | -0.1228          | 215            | Bumps
      853   | 860   | 369370  | 369415  | 176          | 13      | 45      | … | 0.8444            | -0.1568          | 0.5212         | Dirty
      1289  | 1306  | 498078  | 498335  | 2409         | 60      | 260     | … | 0.9338            | -0.1992          | 1.0            | Stains
      430   | 441   | 100250  | 100337  | 630          | 20      | 87      | … | 0.8736            | -0.2267          | 0.9874         | Pastry
      413   | 446   | 138468  | 138883  | 9052         | 230     | 432     | … | 0.9205            | 0.2791           | 1.0            | Stains
      190   | 200   | 210936  | 210956  | 132          | 11      | 20      | … | 0.5               | 0.1841           | 0.3359         | Bumps
      330   | 343   | 429227  | 429253  | 264          | 15      | 26      | … | 0.5               | -0.1197          | 0.5593         | Pastry
      74    | 90    | 779144  | 779308  | 1506         | 46      | 167     | … | 0.9024            | -0.0651          | 1.0            | Pastry
      106   | 118   | 813452  | 813500  | 442          | 13      | 48      | … | 0.75              | -0.1093          | 0.8612         | Pastry
  • 25. Summary • Novel plate-fault categories could be spotted with this method • Model degradation in general can be monitored with anomaly detectors
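The routing logic from slide 23 can be sketched as a small function: instances with a high anomaly score are held back for a data analyst as potential novel categories, the remaining ones go to the classifier, and an alert is raised from the accumulated score. The alert condition on the slide is cut off, so the rule used here (alert when the sum of scores in a batch exceeds `alert_above`) is an assumption, as are both thresholds, the precomputed scores and the stub classifier.

```python
def route_instances(instances, score, predict, reject_above=0.6, alert_above=1.5):
    """Split a batch into analyst review vs. model prediction and check for drift."""
    to_analyst = []   # high scored instances, potential novel categories
    predicted = []    # similar instances, safe to send to the classifier
    cumulated = 0.0
    for instance in instances:
        s = score(instance)
        cumulated += s
        if s > reject_above:
            to_analyst.append(instance)
        else:
            predicted.append((instance, predict(instance)))
    return {
        "to_analyst": to_analyst,
        "predicted": predicted,
        "alert": cumulated > alert_above,  # assumed alert rule, see lead-in
    }


# Hypothetical batch: the anomaly scores are precomputed for brevity.
batch = [
    {"id": 1, "anomaly_score": 0.35},
    {"id": 2, "anomaly_score": 0.42},
    {"id": 3, "anomaly_score": 0.81},
    {"id": 4, "anomaly_score": 0.77},
]

result = route_instances(
    batch,
    score=lambda inst: inst["anomaly_score"],
    predict=lambda inst: "Bumps",  # stub standing in for the classification model
)
```

Here instances 3 and 4 are routed to the analyst, instances 1 and 2 receive a prediction, and `result["alert"]` is true because the batch as a whole looks unusual. Once the analyst labels the held-back instances, they can be added to the training data and the model retrained.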