1. Inconsistency and Outliers: Active Learning by Outlier Detection. Inconsistency Robustness Symposium 2011. Neil Rubens, Assistant Professor, University of Electro-Communications, Tokyo, Japan
2. Outline. Inconsistency Robustness is a multi-disciplinary issue. We discuss some of its aspects from the perspective of Machine Learning: What is Inconsistency? Can Inconsistency be Useful? Measuring Inconsistency
5. Causes of Outliers: Faulty data (entry error, malfunction, etc.); Chance/Deviation; Incorrect Model (Our Focus)
6. Typical Treatment of Outliers. Assume that the learned model is correct, and discard points that don’t agree with the model.
7. Atypical Treatment of Outliers. Assume that the data is right, and that the model is wrong (Our Focus).
13. If there is no inconsistency between the training and testing data, then the most complex model would tend to be selected.
14. Change Detection / Model Correction. Is inconsistency caused by noise (or minor factors), or by changes in the underlying model? Applications: medical diagnostics, intrusion detection, network analysis, finance
15. Conclusion. Inconsistency could be useful for: Hypothesis Learning, Model Selection, Model Correction. Neil Rubens, Assistant Professor, Active Intelligence Group, Laboratory for Knowledge Computing, University of Electro-Communications, Tokyo, Japan. http://ActiveIntelligence.org
Editor's notes
Hello. First of all, I would like to apologize for not being here in person, but I hope to join the discussions about Inconsistency Robustness through online means. In my presentation I would like to talk about the relations between Inconsistency and Outliers.
As can be seen from the symposium’s program, the issue of Inconsistency Robustness is rather multi-disciplinary. Let me discuss some of its aspects from the Machine Learning perspective. More specifically, I would like to express my views about what inconsistency is, whether it can be useful, and how it can be measured.
In Machine Learning we typically refer to inconsistent points as outliers. Typically, we try to construct a model that fits the data we have well; the points that do not fit the model are considered to be outliers. I think this cartoon captures the essence of outliers very well. The outlier point says that our model or theory is not correct. On the other hand, we consider outliers to be erroneous or atypical data and tend to discard them.
We can separate outliers into two classes. In the case of a spatial outlier, a point is considered an outlier if it is distant from other points. In the case of a model outlier, an outlier is a point whose label differs from the model’s expectations. In this talk we will focus on model outliers.
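The two classes can be sketched in code. A minimal, illustrative example (the toy dataset, the thresholds, and the model y = 2x are my own assumptions, not from the talk): a spatial outlier is far from the other points in input space, while a model outlier has a typical input but a label that disagrees with the model.

```python
import statistics

# Toy 1-D dataset: (x, y) pairs that roughly follow y = 2x,
# plus one spatial outlier (x far from the other x's) and one
# model outlier (x is typical, but y disagrees with the trend).
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0), (20, 40.0), (2.5, 15.0)]

# Spatial outlier: a point whose x is far from the other x's
# (more than two standard deviations from the mean).
xs = [x for x, _ in points]
x_mean, x_sd = statistics.mean(xs), statistics.stdev(xs)
spatial_outliers = [p for p in points if abs(p[0] - x_mean) > 2 * x_sd]

# Model outlier: a point whose label y disagrees with the model y = 2x
# (residual larger than an assumed threshold of 3.0).
model = lambda x: 2 * x
model_outliers = [p for p in points if abs(p[1] - model(p[0])) > 3.0]
```

Note that (20, 40.0) fits the model perfectly yet is a spatial outlier, while (2.5, 15.0) sits among typical inputs yet is a model outlier; the two notions are independent.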
Outliers can occur due to a variety of causes. An outlier could be faulty data caused by a data entry error or a measurement malfunction. Then there are outliers that occur by chance, due to some natural deviation. Finally, outliers may be due to incorrect assumptions that we make about the underlying model.
When encountering an outlier, it is often assumed that the current hypothesis/model is reasonably accurate for most of the points, and inaccurate for just a few outliers. Using outliers is therefore considered to lead the learning process astray, tuning the model for some incorrect or uncommon cases and making it less accurate for the majority of points. So outliers are typically discarded. We often get attached to our models and theories, and tend to downplay or disregard data that does not agree with them.
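The typical treatment can be sketched as: fit a model to all of the points, then discard the ones whose residuals are large. A minimal sketch (the least-squares line fit, the toy data, and the residual threshold are all assumptions for illustration):

```python
# Fit a line y = a*x + b by ordinary least squares, then discard
# points whose residual exceeds a threshold -- the "typical"
# treatment of outliers described above.
def fit_line(pts):
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Seven points on y = x, plus one point that disagrees with the line.
data = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 3.0),
        (4, 4.0), (5, 5.0), (6, 6.0), (7, 14.0)]
a, b = fit_line(data)
kept = [(x, y) for x, y in data if abs(y - (a * x + b)) <= 3.0]
```

Note the assumption baked into this procedure: the fitted model is trusted, and whatever disagrees with it is dropped. That is exactly the attitude the next slide questions.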
But we must also consider the other possibility: that the data is right and the model is wrong, in which case the model needs to be changed and corrected.
Let us discuss a setting in which outlier points could be very useful for learning. Consider that we have many points and we want to learn which points are orange and which are blue. This could be the problem of predicting which movies you like, whether a webpage is relevant to your query, which treatment should be prescribed, etc. The typical approach is simply to get a lot of data and then learn from it. However, in many settings obtaining data could be costly; e.g. if we want to discover an effective treatment for a disease, we may have to try out many compounds, and that costs a lot of money and effort. If I want to learn about your preferences for movies, I would need to ask you which movies you like and which ones you don’t; but that takes time and effort, and many people are able to provide only a few ratings. So since data is costly, we want to obtain data that is most informative and useful.
So, to learn the underlying coloring, we can obtain a few samples; that is, we select the points that we are interested in, and their color is revealed. Let’s say we have obtained a couple of points already. There could be a number of hypotheses/decision boundaries (shown by dashed lines) that are consistent with these points; i.e. points on one side of the line are blue and on the other side are orange. Then, when predicting the color of the points, we have to select one of the hypotheses and hope that it is the correct one.
Let’s consider that we are now allowed to get another sample. We could choose a sample that is consistent with all of the hypotheses, i.e. all of the hypotheses assign the same color to it. Not surprisingly, when the color of the point is revealed, it is blue. This might seem like a good thing, but unfortunately it does not allow us to reduce the number of hypotheses so that we can find the correct one. On the other hand, we can choose an inconsistent point, for which some of the hypotheses assign blue and the others orange. After the color of the point is revealed, we can get rid of the hypotheses that got it wrong, and get closer to finding the right hypothesis.
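The idea of querying inconsistent points can be sketched with a tiny version space. In this illustrative sketch (the threshold classifiers, the candidate thresholds, and the revealed label are my own assumptions), a point on which the surviving hypotheses disagree is exactly the kind of point whose revealed label eliminates some of them:

```python
# Hypotheses are threshold classifiers on the line:
# "blue" below threshold t, "orange" at or above t.
def label(t, x):
    return "blue" if x < t else "orange"

hypotheses = [2, 4, 6, 8]               # candidate thresholds t
labeled = [(1, "blue"), (9, "orange")]  # samples seen so far

# All four thresholds are consistent with the samples so far.
version_space = [t for t in hypotheses
                 if all(label(t, x) == y for x, y in labeled)]

# A consistent query (e.g. x = 1.5) gets the same label from every
# hypothesis, so nothing can be eliminated.  The inconsistent query
# x = 5 splits the version space: thresholds 2 and 4 say "orange",
# while 6 and 8 say "blue".
predictions = {t: label(t, 5) for t in version_space}

# Suppose the true label of x = 5 is revealed to be "orange":
# the hypotheses that got it wrong are eliminated.
version_space = [t for t in version_space if label(t, 5) == "orange"]
```

One inconsistent query here halves the version space, whereas any number of consistent queries would leave it untouched.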
I would like to make another argument in support of outliers being informative. There is a very interesting phrase by Gregory Bateson that defines information as "a difference that makes a difference." Outliers fit this view of information very well. Outliers are different from the rest of the points by definition, and including outliers in the learning process will make a difference in the model’s predictions. The intuition behind this principle is that the only way the model’s predictions will improve is if they change. However, not all changes are good; so the tricky part is to determine when the change is for the better and when it is not.
Let me briefly mention the relation between inconsistency and model complexity. As the number of training points increases, more complex models tend to fit the data better. E.g. when we have just two points, a linear model fits the data very well; if we add another point, a linear model may no longer be complex enough to fit the data, so we may need to use a polynomial model of order 2; and then, as we add more points, increasingly complex models may be needed. An important implication is that, as we learn more and more, the underlying model is likely to change and become increasingly complex.
The problem with simply increasing the model’s complexity is that a model that is too complex may start overfitting to the data, e.g. learning the noise and not the signal. So allowing for some inconsistency could be good; models that do exceptionally well on some data may actually start to memorize it instead of learning from it. Having some inconsistency between training and testing data could thus prevent us from making the model more complex than necessary.
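The memorization point can be illustrated with an extreme case: a lookup-table "model" that is perfectly consistent with the training data but useless on held-out data, versus a simple mean predictor that tolerates a little training inconsistency. All of the data and both models here are toy assumptions of mine:

```python
# A memorizer achieves zero training error but cannot generalize;
# the mean predictor accepts some training inconsistency and does
# far better on held-out data.
train = [(1, 2.0), (2, 2.1), (3, 1.9)]
test = [(4, 2.0), (5, 2.05)]

table = dict(train)                        # memorizer: exact lookup
mean = sum(y for _, y in train) / len(train)

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

memorizer = lambda x: table.get(x, 0.0)    # clueless on unseen x
mean_model = lambda x: mean

train_errs = (mse(memorizer, train), mse(mean_model, train))
test_errs = (mse(memorizer, test), mse(mean_model, test))
```

The memorizer is perfectly consistent with the training data and catastrophically inconsistent with the test data; the mean model's small residual training inconsistency is the sign that it is learning a pattern rather than memorizing points.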
The initially learned model could be accurate; but as time progresses, the underlying process may start to change; e.g. we saw some drastic changes in stock pricing models these past two weeks. So when we encounter inconsistent data, we should not discard it as noise, but try to see whether it could be indicative of our current model being incorrect, and if possible try to correct it.
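One simple way to ask whether inconsistent data signals a changed underlying process is to compare recent observations against the history so far. A minimal change-detection sketch (the sliding window, the threshold, and the price series are all illustrative assumptions, not a method from the talk):

```python
# Flag the first window start where the mean of the next `window`
# observations shifts from the mean of everything before it by more
# than `threshold`.  Returns None if no such shift is found.
def detect_change(series, window=3, threshold=4.0):
    for t in range(window, len(series) - window + 1):
        before = series[:t]
        after = series[t:t + window]
        mu_before = sum(before) / len(before)
        mu_after = sum(after) / len(after)
        if abs(mu_after - mu_before) > threshold:
            return t
    return None

# A price-like series whose level jumps at index 4.
prices = [10.0, 10.2, 9.9, 10.1, 15.0, 15.3, 14.9]
change_at = detect_change(prices)
```

A model-outlier view of the same data would flag the points after the jump as "noise" to discard; the change detector instead treats their persistence as evidence that the model itself needs correcting.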
In conclusion, I hope that I was able to show that inconsistency could sometimes actually be rather useful for such things as Hypothesis Learning, Model Selection, and Model Correction. Thank you.