Inconsistency and Outliers: Active Learning by Outlier Detection
Inconsistency Robustness Symposium 2011
Neil Rubens, Assistant Professor
University of Electro-Communications, Tokyo, Japan
Outline
Inconsistency robustness is a multidisciplinary issue. We discuss some of its aspects from the perspective of machine learning:
- What is inconsistency?
- Can inconsistency be useful?
- Measuring inconsistency
Inconsistency / Outlier
An inconsistency (outlier) is data that does not agree with the model.
Outlier Types
- Spatial outlier: unlabeled data
- Model outlier: labeled data (our focus)
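A minimal Python sketch of the two outlier types, assuming NumPy is available. The function names and the scoring choices (mean distance to the k nearest neighbours for spatial outliers, absolute residual under a least-squares polynomial fit for model outliers) are illustrative assumptions, not the method from the talk. The point is that a spatial outlier can be scored from the inputs alone, while a model outlier needs labels and a fitted model.

```python
import numpy as np

def spatial_outlier_scores(X, k=3):
    """Score unlabeled inputs by mean distance to their k nearest neighbours (labels not used)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D array as n points with one feature
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude each point's distance to itself
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)

def model_outlier_scores(x, y, degree=1):
    """Score labeled points by how far they fall from a least-squares polynomial fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)
    return np.abs(y - np.polyval(coeffs, x))

# Spatial: the input 10.0 is far from every other input.
print(np.argmax(spatial_outlier_scores([0, 1, 2, 3, 4, 10], k=2)))  # 5
# Model: the label at x = 4 does not follow y = 2x + 1.
x = np.arange(10, dtype=float)
y = 2 * x + 1
y[4] += 10
print(np.argmax(model_outlier_scores(x, y)))  # 4
```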
Causes of Outliers
- Faulty data: entry error, malfunction, etc.
- Chance / deviation
- Incorrect model (our focus)
Image: http://www.dkimages.com/discover/previews/852/20223083.JPG
Typical Treatment of Outliers
Assume that the learned model is correct and discard the points that do not agree with it.
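The typical treatment can be sketched as follows, again assuming NumPy and a polynomial model; the function name and the two-standard-deviation cut-off `z=2.0` are illustrative assumptions. The model is fitted, points whose residual is too large are treated as errors and dropped, and the model is refitted on what remains.

```python
import numpy as np

def discard_model_outliers(x, y, degree=1, z=2.0):
    """Trust the model: drop points whose residual exceeds z standard deviations, then refit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    keep = np.abs(residuals) <= z * residuals.std()
    # Refit the same model on the points that "agree" with it.
    return np.polyfit(x[keep], y[keep], degree), keep

x = np.arange(10, dtype=float)
y = 2 * x + 1
y[4] += 10  # one label disagrees with the linear model
coeffs, keep = discard_model_outliers(x, y)
print(keep.sum())  # 9
```

This treatment never questions the model: if the discarded point was in fact signalling that the model is wrong, that information is lost.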
Atypical Treatment of Outliers (our focus)
Assume that the data is correct and that the model is wrong.
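One way to act on the atypical treatment, sketched under the assumption of a family of polynomial models of increasing degree: treat the data as correct and move to a richer model until no point disagrees by more than a tolerance. The function name, `tol`, and `max_degree` are illustrative assumptions. Because a sufficiently flexible model can fit any finite set of points, the search should be capped and, in practice, checked against held-out data.

```python
import numpy as np

def revise_model(x, y, max_degree=5, tol=1e-6):
    """Trust the data: return the lowest polynomial degree whose fit leaves no residual above tol."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    for degree in range(1, max_degree + 1):
        coeffs = np.polyfit(x, y, degree)
        if np.all(np.abs(y - np.polyval(coeffs, x)) <= tol):
            return degree, coeffs
    return None, None  # no candidate model explains every point

x = np.arange(-3, 4, dtype=float)
y = x ** 2  # a linear model disagrees with several of these points
degree, coeffs = revise_model(x, y)
print(degree)  # 2
```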
Rubens et al., AJS 2011
If there is no inconsistency between the training and test data, the most complex model will tend to be selected.
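A small experiment illustrating this claim, assuming NumPy, polynomial models, and synthetic data drawn from a noisy line (all assumptions for illustration). Training error never increases as the degree grows, because each polynomial family contains the previous one, so scoring models on the very data they were fitted to favours the most complex candidate. Scoring on held-out points, which can disagree with an over-complex fit, removes that bias.

```python
import numpy as np

def select_degree(x_fit, y_fit, x_eval, y_eval, degrees):
    """Return the polynomial degree with the lowest mean squared error on (x_eval, y_eval)."""
    errors = []
    for d in degrees:
        coeffs = np.polyfit(x_fit, y_fit, d)
        errors.append(np.mean((y_eval - np.polyval(coeffs, x_eval)) ** 2))
    return degrees[int(np.argmin(errors))]

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = 2.0 * x + 1.0 + rng.normal(scale=0.3, size=x.size)
train, test = np.arange(0, 40, 2), np.arange(1, 40, 2)
degrees = list(range(1, 8))

# Evaluating on the training data itself: no train/test inconsistency is possible.
print(select_degree(x[train], y[train], x[train], y[train], degrees))  # 7, the highest degree
# Evaluating on held-out points exposes the inconsistency of over-complex fits.
print(select_degree(x[train], y[train], x[test], y[test], degrees))
```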
Change Detection / Model Correction
Is an inconsistency caused by noise (or minor factors), or by a change in the underlying model?
Applications: medical diagnostics, intrusion detection, network analysis, finance.
Image sources:
http://www.skyboximaging.com/solutions/application/change-detection
http://www.satimagingcorp.com/galleryimages/high-resolution-landsat-satellite-imagery-oman.jpg
http://www.lucieer.net/research/heard.html
http://www.ittvis.com/portals/0/images/ChangeDetection_3window.jpg
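To make the noise-versus-change question concrete, here is a sketch using CUSUM (cumulative sum), a standard change-detection technique chosen here for illustration; the slides do not name a specific method. It assumes the inputs are residuals of an existing model, expected to have mean `target` while the model is still valid; `slack` and `threshold` are illustrative tuning parameters. An isolated large residual decays away, whereas a sustained shift accumulates until an alarm is raised.

```python
def cusum_alarm(residuals, target=0.0, slack=0.5, threshold=4.0):
    """Two-sided CUSUM: return the index of the first alarm, or None if no change is detected."""
    upper = lower = 0.0
    for i, r in enumerate(residuals):
        # Accumulate evidence of an upward or downward shift beyond the allowed slack.
        upper = max(0.0, upper + (r - target - slack))
        lower = max(0.0, lower + (target - r - slack))
        if upper > threshold or lower > threshold:
            return i
    return None

noise_spike = [0.0] * 5 + [3.0] + [0.0] * 10  # one isolated outlier
model_change = [0.1, -0.2, 0.0, 0.3, -0.1] + [2.0] * 10  # persistent shift from index 5
print(cusum_alarm(noise_spike))   # None
print(cusum_alarm(model_change))  # 7
```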
Conclusion
Inconsistency could be useful for:
- Hypothesis learning
- Model selection
- Model correction
Neil Rubens, Assistant Professor
Active Intelligence Group, Laboratory for Knowledge Computing
University of Electro-Communications, Tokyo, Japan
http://ActiveIntelligence.org

Inconsistent Outliers

  • 1. Inconsistency and Outliers: Active Learning by Outlier Detection. Inconsistency Robustness Symposium 2011. Neil Rubens, Assistant Professor, University of Electro-Communications, Tokyo, Japan
  • 2. Outline: Inconsistency Robustness is a multidisciplinary issue. We discuss some aspects of Inconsistency Robustness from the perspective of Machine Learning: What is inconsistency? Can inconsistency be useful? Measuring inconsistency.
  • 3. Inconsistency/Outlier: data that does not agree with the model.
  • 4. Outlier Types: Spatial Outlier (unlabeled data); Model Outlier (labeled data), our focus.
  • 5. Causes of Outliers: faulty data (entry error, malfunction, etc.); chance/deviation; incorrect model (our focus). Image: http://www.dkimages.com/discover/previews/852/20223083.JPG
  • 6. Typical Treatment of Outliers: assume that the learned model is correct and discard points that don’t agree with it.
  • 7. Atypical Treatment of Outliers (our focus): assume that the data is right and that the model is wrong.
  • 8.
  • 9.
  • 10.
  • 11. Rubens et al., AJS 2011
  • 12.
  • 13. If there is no inconsistency between the training and testing data, then the most complex model would tend to be selected.
  • 14. Change Detection / Model Correction: Is inconsistency caused by noise (or minor factors) or by changes in the underlying model? Applications: medical diagnostics, intrusion detection, network analysis, finance. Images: http://www.skyboximaging.com/solutions/application/change-detection http://www.satimagingcorp.com/galleryimages/high-resolution-landsat-satellite-imagery-oman.jpg http://www.lucieer.net/research/heard.html http://www.ittvis.com/portals/0/images/ChangeDetection_3window.jpg
  • 15. Conclusion: Inconsistency could be useful for hypothesis learning, model selection, and model correction. Neil Rubens, Assistant Professor, Active Intelligence Group, Laboratory for Knowledge Computing, University of Electro-Communications, Tokyo, Japan. http://ActiveIntelligence.org

Speaker Notes

  1. Hello. First of all, I would like to apologize for not being here in person, but I hope to join the discussions about Inconsistency Robustness online. In my presentation I would like to talk about the relationship between inconsistency and outliers.
  2. As can be seen from the symposium’s program, Inconsistency Robustness is a rather multidisciplinary issue. Let me discuss some of its aspects from the Machine Learning perspective. More specifically, I would like to share my views on what inconsistency is, whether it can be useful, and how it can be measured.
  3. In Machine Learning we typically refer to inconsistent points as outliers. We try to construct a model that fits the data we have well, and the points that do not fit the model are considered outliers. I think this cartoon captures the essence of outliers very well: the outlier point says that our model (or theory) is not correct. We, on the other hand, tend to regard outliers as erroneous or atypical data and discard them.
  4. We can separate outliers into two classes. A spatial outlier is a point that is distant from the other points. A model outlier is a point whose label differs from the model’s expectations. In this talk we will focus on model outliers.
  5. Outliers can occur for a variety of reasons. An outlier could be faulty data caused by a data-entry error or a measurement malfunction. Then there are outliers that occur by chance, due to some natural deviation. Finally, outliers may be due to incorrect assumptions that we make about the underlying model.
  6. When encountering an outlier, it is often assumed that the current hypothesis/model is reasonably accurate for most of the points and is inaccurate only for a few outliers. Using outliers is therefore considered to lead the learning process astray, tuning the model to incorrect or uncommon cases and making it less accurate for the majority of the points. So outliers are typically discarded. We often get attached to our models/theories and tend to downplay or disregard data that does not agree with them.
  7. But we must also consider the other possibility: that the data is right and the model is wrong, in which case the model needs to be changed and corrected.
  8. Let us discuss a setting in which outlier points could be very useful for learning. Suppose we have many points and we want to learn which points are orange and which are blue. This could be the problem of predicting which movies you like, whether a webpage is relevant to your query, which treatment should be prescribed, etc. The typical approach is simply to get a lot of data and then learn from it. However, in many settings obtaining data is costly: e.g., to discover an effective treatment for a disease we may have to try out many compounds, which costs a lot of money and effort. If I want to learn your movie preferences, I need to ask you which movies you like and which ones you don’t; but that takes time and effort, and many people are able to provide only a few ratings. Since data is costly, we want to obtain the data that is most informative and useful.
  9. So to learn the underlying coloring we can obtain a few samples; that is, we select the points we are interested in and their color is revealed. Let’s say we have already obtained a couple of points. There could be a number of hypotheses/decision boundaries (shown by dashed lines) that are consistent with these points, i.e., points on one side of the line are blue and on the other side are orange. When predicting the color of the remaining points, we have to select one of these hypotheses and hope that it is the correct one.
  10. Let’s say we are now allowed to get another sample. We can choose a sample that is consistent with all of the hypotheses, i.e., all of the hypotheses assign the same color to it. Not surprisingly, when the color of the point is revealed, it is blue. This might seem like a good thing, but unfortunately it does not reduce the number of hypotheses, so it does not help us find the correct one. Alternatively, we can choose an inconsistent point, one to which some of the hypotheses assign blue and the others orange. After the color of the point is revealed, we can get rid of the hypotheses that got it wrong and get closer to finding the right hypothesis.
  11. I would like to make another argument in support of outliers being informative. There is a very interesting phrase by Gregory Bateson that defines information as “a difference that makes a difference.” Outliers fit this view of information very well: they are different from the rest of the points by definition, and including outliers in the learning process will make a difference to the model’s predictions. The intuition behind this principle is that the only way a model’s predictions can improve is if they change. However, not all changes are good, so the tricky part is to determine when a change is for the better and when it is not.
  12. Let me briefly mention the relation between inconsistency and model complexity. As the number of training points increases, more complex models tend to fit the data better. E.g., when we have just two points, a linear model fits the data perfectly; if we add another point, a linear model may no longer be complex enough to fit the data, so we may need a polynomial model of order 2; and as we add more points, increasingly complex models may be needed. An important implication is that as we learn more and more, the underlying model is likely to change and become increasingly complex.
  13. The problem with simply increasing the model’s complexity is that a model that is too complex may start overfitting the data, e.g., learning the noise rather than the signal. So allowing for some inconsistency could be good; models that do exceptionally well on some data may actually start to memorize it instead of learning from it. Having some inconsistency between the training and testing data could therefore prevent us from making the model more complex than necessary.
  14. The initially learned model could be accurate, but as time progresses the underlying process may start to change; e.g., we saw some drastic changes in stock pricing models these past two weeks. So when we encounter inconsistent data, we should not discard it as noise, but try to see whether it indicates that our current model is incorrect and, if possible, try to correct it.
  15. In conclusion, I hope I was able to show that inconsistency can sometimes be rather useful, for such things as hypothesis learning, model selection, and model correction. Thank you.
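The two outlier types in note 4 can be contrasted in a few lines of Python. This is a minimal sketch, not the method from the talk. The k-nearest-neighbour distance rule for spatial outliers, the `k` and `threshold` values, and the toy classifier `linear_model` are all illustrative assumptions.

```python
import math

def spatial_outliers(points, k=2, threshold=2.0):
    """Indices of points whose mean distance to their k nearest neighbours
    exceeds `threshold`. Labels are ignored (the unlabeled-data notion)."""
    flagged = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        if sum(dists[:k]) / k > threshold:
            flagged.append(i)
    return flagged

def model_outliers(points, labels, model):
    """Indices of labeled points whose label disagrees with the model's prediction."""
    return [i for i, (p, y) in enumerate(zip(points, labels)) if model(p) != y]

# Hypothetical toy model: a linear decision boundary x + y = 3.
def linear_model(p):
    return "blue" if p[0] + p[1] < 3 else "orange"

points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
labels = ["blue", "blue", "blue", "orange", "orange"]

print(spatial_outliers(points))                      # [4]: far away, but the model agrees with it
print(model_outliers(points, labels, linear_model))  # [3]: close to the others, but its label disagrees
```

The two notions flag different points. The distant point (10, 10) is consistent with the model. The nearby point (1, 1) is the inconsistent one the talk focuses on.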
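The "typical treatment" in note 6 works in three steps: fit, discard the points that disagree, and refit. It can be sketched as follows. The sketch assumes a one-dimensional least-squares line and a fixed residual cutoff `max_residual`. Both are illustrative choices that the talk does not prescribe.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) * (x - mx) for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def discard_and_refit(xs, ys, max_residual):
    """Fit, drop points whose absolute residual exceeds max_residual, then refit.
    Returns the refitted (a, b) and the indices that were discarded."""
    a, b = fit_line(xs, ys)
    discarded = [i for i, (x, y) in enumerate(zip(xs, ys))
                 if abs(y - (a + b * x)) > max_residual]
    kept = [i for i in range(len(xs)) if i not in discarded]
    return fit_line([xs[i] for i in kept], [ys[i] for i in kept]), discarded

xs = list(range(10))
ys = [float(x) for x in xs]
ys[5] = 25.0  # a single point that disagrees with y = x

(a, b), discarded = discard_and_refit(xs, ys, max_residual=5.0)
print(discarded)  # [5]
print(a, b)       # approximately 0.0 and 1.0
```

This procedure only ever asks whether the points fit the line. It never asks whether a line is the right model, and that is exactly the assumption note 7 questions.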
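Notes 9 and 10 describe what the active-learning literature calls a version space, together with disagreement-based query selection in the spirit of query-by-committee. The sketch below simplifies the talk's two-dimensional picture to one dimension, under these assumptions:

- Each hypothesis is a hypothetical threshold `t` that colors `x < t` blue and everything else orange.
- `oracle` stands in for revealing a point's true color.
- All names and numbers are illustrative.

```python
def predict(t, x):
    """Hypothesis with threshold t: blue below t, orange otherwise."""
    return "blue" if x < t else "orange"

def version_space(hypotheses, labeled):
    """Hypotheses consistent with every labeled (x, color) pair."""
    return [t for t in hypotheses if all(predict(t, x) == y for x, y in labeled)]

def disagreement(hypotheses, x):
    """Size of the minority vote on x; 0 means every hypothesis agrees."""
    blue = sum(1 for t in hypotheses if predict(t, x) == "blue")
    return min(blue, len(hypotheses) - blue)

def learn(hypotheses, labeled, pool, oracle):
    """Repeatedly query the most inconsistent pool point until no point splits the version space."""
    labeled, pool, queries = list(labeled), list(pool), []
    vs = version_space(hypotheses, labeled)
    while len(vs) > 1 and pool:
        x = max(pool, key=lambda p: disagreement(vs, p))
        if disagreement(vs, x) == 0:
            break  # every remaining point is consistent with all hypotheses
        pool.remove(x)
        queries.append(x)
        labeled.append((x, oracle(x)))
        vs = version_space(vs, labeled)  # discard the hypotheses that got it wrong
    return vs, queries

def oracle(x):
    """Hidden 'true' coloring, assumed here to be the threshold 6.5."""
    return predict(6.5, x)

hypotheses = [i + 0.5 for i in range(10)]  # thresholds 0.5, 1.5, ..., 9.5
labeled = [(1, "blue"), (9, "orange")]     # the couple of points already obtained
pool = [0, 2, 3, 4, 5, 6, 7, 8, 10]

print(len(version_space(hypotheses, labeled)))              # 8 hypotheses still consistent
print(disagreement(version_space(hypotheses, labeled), 0))  # 0: querying x = 0 rules nothing out
vs, queries = learn(hypotheses, labeled, pool, oracle)
print(queries, vs)  # [5, 7, 6] [6.5]
```

Querying the consistent point x = 0 would leave all eight hypotheses in place. By contrast, each inconsistent query roughly halves the version space, so three queries isolate the correct threshold.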
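Note 11's "difference that makes a difference" can be quantified in one simple way: measure how much the model's predictions move when a point is added. The sketch below rests on two illustrative choices rather than the talk's definition. The model is a least-squares line, and the measure is the mean absolute change in predictions on a set of evaluation inputs. As the note warns, a large change does not by itself say whether the change is for the better.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) * (x - mx) for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def prediction_change(xs, ys, new_point, eval_xs):
    """Mean absolute change in the fitted line's predictions on eval_xs
    when new_point = (x, y) is added to the training data."""
    a0, b0 = fit_line(xs, ys)
    a1, b1 = fit_line(xs + [new_point[0]], ys + [new_point[1]])
    return sum(abs((a1 + b1 * x) - (a0 + b0 * x)) for x in eval_xs) / len(eval_xs)

xs, ys = [0, 1, 2, 3], [0.0, 1.0, 2.0, 3.0]
grid = [0, 1, 2, 3, 4]

consistent = prediction_change(xs, ys, (4, 4.0), grid)     # agrees with y = x
inconsistent = prediction_change(xs, ys, (4, 10.0), grid)  # disagrees with y = x
print(consistent, inconsistent)  # approximately 0.0 and 1.68
```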
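Notes 12 and 13 can be illustrated with polynomial fits of increasing degree. The sketch is pure Python with a small normal-equations solver and uses these illustrative data:

- Training data follow the line y = 2x + 1 plus an alternating ±0.3 perturbation standing in for noise.
- A separate set of held-out points lies exactly on the line.

The data, degrees, and error measure are all assumptions. Training error cannot increase as the degree grows, so selecting by training error alone picks the most complex (interpolating) model. The inconsistency between training and held-out data is what keeps the selected degree lower.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def polyfit(xs, ys, degree):
    """Least-squares coefficients w (w[i] multiplies x**i) via the normal equations."""
    k = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(A, b)

def mse(w, xs, ys):
    """Mean squared error of the polynomial with coefficients w."""
    return sum((sum(c * x ** i for i, c in enumerate(w)) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

train_xs = [0, 1, 2, 3, 4, 5]
train_ys = [2 * x + 1 + (0.3 if x % 2 == 0 else -0.3) for x in train_xs]
held_xs = [0.5, 1.5, 2.5, 3.5, 4.5]
held_ys = [2 * x + 1 for x in held_xs]

degrees = range(6)  # degree 5 interpolates the 6 training points exactly
fits = [polyfit(train_xs, train_ys, d) for d in degrees]
train_err = [mse(w, train_xs, train_ys) for w in fits]
held_err = [mse(w, held_xs, held_ys) for w in fits]

best_by_train = min(degrees, key=lambda d: train_err[d])
best_by_held = min(degrees, key=lambda d: held_err[d])
print(best_by_train, best_by_held)
```

The degree-5 fit reproduces every training point, including the perturbation. It then swings far from the line between those points, so it scores worst among degrees 1 to 5 on the held-out data.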
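Note 14 asks whether inconsistency comes from noise or from a change in the underlying process. The talk does not name a detection method, so the sketch below swaps in a standard one: Page's CUSUM test, run two-sided on the model's residuals. An isolated large residual decays away. A sustained run of residuals in one direction accumulates until it raises an alarm. The `drift` and `threshold` values and the residual sequences are illustrative assumptions.

```python
def cusum_alarm(residuals, drift=0.5, threshold=4.0):
    """Two-sided CUSUM on model residuals.
    Returns the index at which a persistent shift is first detected, or None."""
    pos = neg = 0.0
    for t, r in enumerate(residuals):
        pos = max(0.0, pos + r - drift)  # accumulates evidence of an upward shift
        neg = max(0.0, neg - r - drift)  # accumulates evidence of a downward shift
        if pos > threshold or neg > threshold:
            return t
    return None

isolated_spike = [0.1, -0.2, 0.0, 3.0, 0.1, -0.1, 0.0, 0.2]  # one outlier, then back to normal
sustained_shift = [0.1, -0.2, 0.0, 2.0, 2.0, 2.0, 2.0, 2.0]  # the process itself has changed

print(cusum_alarm(isolated_spike))   # None: consistent with noise
print(cusum_alarm(sustained_shift))  # 5: the model likely needs correcting
```

The `drift` term sets how large a residual must be before it counts as evidence, and `threshold` trades detection delay against false alarms.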