1. Inconsistency and Outliers: Active Learning by Outlier Detection. Inconsistency Robustness Symposium 2011. Neil Rubens, Assistant Professor, University of Electro-Communications, Tokyo, Japan
2. Outline. Inconsistency Robustness is a multi-disciplinary issue. We discuss some of its aspects from the perspective of Machine Learning: What is Inconsistency? Can Inconsistency be Useful? Measuring Inconsistency
5. Causes of Outliers: Faulty data (entry error, malfunction, etc.); Chance/Deviation; Incorrect Model (Our Focus)
6. Typical Treatment of Outliers. Assume that the learned model is correct, and discard points that don’t agree with the model.
7. Atypical Treatment of Outliers. Assume that the data is right, and that the model is wrong (Our Focus).
13. If there is no inconsistency between the training and testing data, then the most complex model would tend to be selected.
14. Change Detection / Model Correction. Is inconsistency caused by noise (or minor factors), or by changes in the underlying model? Applications: medical diagnostics, intrusion detection, network analysis, finance
15. Conclusion. Inconsistency could be useful for: Hypothesis Learning, Model Selection, Model Correction. Neil Rubens, Assistant Professor, Active Intelligence Group, Laboratory for Knowledge Computing, University of Electro-Communications, Tokyo, Japan. http://ActiveIntelligence.org
Editor's notes
Hello. First of all, I would like to apologize for not being here in person, but I hope to join the discussions about Inconsistency Robustness through online means. In my presentation I would like to talk about the relations between Inconsistency and Outliers.
As can be seen from the symposium’s program, the issue of Inconsistency Robustness is rather multi-disciplinary. Let me discuss some of its aspects from the Machine Learning perspective. More specifically, I would like to express my views about what inconsistency is, whether it can be useful, and how it can be measured.
In Machine Learning we typically refer to inconsistent points as outliers. Typically, we try to construct a model that fits the data we have well; the points that do not fit the model are considered to be outliers. I think this cartoon captures the essence of outliers very well. The outlier point says that our model or theory is not correct. On the other hand, we consider outliers to be erroneous or atypical data and tend to discard them.
We can separate outliers into two classes. In the case of a spatial outlier, a point is considered an outlier if it is distant from other points. In the case of a model outlier, an outlier is a point whose label differs from the model’s expectations. In this talk we will focus on model outliers.
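The two classes can be sketched in code. A minimal, illustrative example (the toy dataset, the thresholds, and the model y = 2x are my own assumptions, not from the talk): a spatial outlier is far from the other points in input space, while a model outlier has a typical input but a label that disagrees with the model.

```python
import statistics

# Toy 1-D dataset: (x, y) pairs that roughly follow y = 2x,
# plus one spatial outlier (x far from the other x's) and one
# model outlier (x is typical, but y disagrees with the trend).
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.0), (20, 40.0), (2.5, 15.0)]

# Spatial outlier: a point whose x is far from the other x's
# (more than two standard deviations from the mean).
xs = [x for x, _ in points]
x_mean, x_sd = statistics.mean(xs), statistics.stdev(xs)
spatial_outliers = [p for p in points if abs(p[0] - x_mean) > 2 * x_sd]

# Model outlier: a point whose label y disagrees with the model y = 2x
# (residual larger than an assumed threshold of 3.0).
model = lambda x: 2 * x
model_outliers = [p for p in points if abs(p[1] - model(p[0])) > 3.0]
```

Note that (20, 40.0) fits the model perfectly yet is a spatial outlier, while (2.5, 15.0) sits among typical inputs yet is a model outlier; the two notions are independent.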
Outliers can occur due to a variety of causes. An outlier could be faulty data caused by a data entry error or a measurement malfunction. Then there are outliers that occur by chance, due to some natural deviation. Finally, outliers may be due to incorrect assumptions that we make about the underlying model.
When encountering an outlier, it is often assumed that the current hypothesis/model is reasonably accurate for most of the points, and inaccurate for just a few outliers. Using outliers is therefore considered to lead the learning process astray, tuning the model for some incorrect or uncommon cases and making it less accurate for the majority of points. So outliers are typically discarded. We often get attached to our models and theories, and tend to downplay or disregard data that does not agree with them.
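The typical treatment can be sketched as: fit a model to all of the points, then discard the ones whose residuals are large. A minimal sketch (the least-squares line fit, the toy data, and the residual threshold are all assumptions for illustration):

```python
# Fit a line y = a*x + b by ordinary least squares, then discard
# points whose residual exceeds a threshold -- the "typical"
# treatment of outliers described above.
def fit_line(pts):
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxx = sum(x * x for x, _ in pts)
    sxy = sum(x * y for x, y in pts)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Seven points on y = x, plus one point that disagrees with the line.
data = [(0, 0.0), (1, 1.0), (2, 2.0), (3, 3.0),
        (4, 4.0), (5, 5.0), (6, 6.0), (7, 14.0)]
a, b = fit_line(data)
kept = [(x, y) for x, y in data if abs(y - (a * x + b)) <= 3.0]
```

Note the assumption baked into this procedure: the fitted model is trusted, and whatever disagrees with it is dropped. That is exactly the attitude the next slide questions.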
But we must also consider the other possibility: that the data is right and the model is wrong, in which case the model needs to be changed and corrected.
Let us discuss a setting in which outlier points could be very useful for learning. Consider that we have many points and we want to learn which points are orange and which are blue. This could be the problem of predicting which movies you like, whether a webpage is relevant to your query, which treatment should be prescribed, etc. The typical approach is simply to get a lot of data and then learn from it. However, in many settings obtaining data could be costly; e.g. if we want to discover an effective treatment for a disease, we may have to try out many compounds, and that costs a lot of money and effort. If I want to learn about your preferences for movies, I would need to ask you which movies you like and which ones you don’t; but that takes time and effort, and many people are able to provide only a few ratings. So since data is costly, we want to obtain data that is most informative and useful.
So, to learn the underlying coloring, we can obtain a few samples; that is, we select the points that we are interested in, and their color is revealed. Let’s say we have obtained a couple of points already. There could be a number of hypotheses/decision boundaries (shown by dashed lines) that are consistent with these points; i.e. points on one side of the line are blue and on the other side are orange. Then, when predicting the color of the points, we have to select one of the hypotheses and hope that it is the correct one.
Let’s consider that we are now allowed to get another sample. We could choose a sample that is consistent with all of the hypotheses, i.e. all of the hypotheses assign the same color to it. Not surprisingly, when the color of the point is revealed, it is blue. This might seem like a good thing, but unfortunately it does not allow us to reduce the number of hypotheses so that we can find the correct one. On the other hand, we can choose an inconsistent point, for which some of the hypotheses assign blue and the others orange. After the color of the point is revealed, we can get rid of the hypotheses that got it wrong, and get closer to finding the right hypothesis.
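The idea of querying inconsistent points can be sketched with a tiny version space. In this illustrative sketch (the threshold classifiers, the candidate thresholds, and the revealed label are my own assumptions), a point on which the surviving hypotheses disagree is exactly the kind of point whose revealed label eliminates some of them:

```python
# Hypotheses are threshold classifiers on the line:
# "blue" below threshold t, "orange" at or above t.
def label(t, x):
    return "blue" if x < t else "orange"

hypotheses = [2, 4, 6, 8]               # candidate thresholds t
labeled = [(1, "blue"), (9, "orange")]  # samples seen so far

# All four thresholds are consistent with the samples so far.
version_space = [t for t in hypotheses
                 if all(label(t, x) == y for x, y in labeled)]

# A consistent query (e.g. x = 1.5) gets the same label from every
# hypothesis, so nothing can be eliminated.  The inconsistent query
# x = 5 splits the version space: thresholds 2 and 4 say "orange",
# while 6 and 8 say "blue".
predictions = {t: label(t, 5) for t in version_space}

# Suppose the true label of x = 5 is revealed to be "orange":
# the hypotheses that got it wrong are eliminated.
version_space = [t for t in version_space if label(t, 5) == "orange"]
```

One inconsistent query here halves the version space, whereas any number of consistent queries would leave it untouched.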
I would like to make another argument in support of outliers being informative. There is a very interesting phrase by Gregory Bateson that defines information as "a difference that makes a difference." Outliers fit this view of information very well. Outliers are different from the rest of the points by definition, and including outliers in the learning process will make a difference in the model’s predictions. The intuition behind this principle is that the only way the model’s predictions will improve is if they change. However, not all changes are good; so the tricky part is to determine when the change is for the better and when it is not.
Let me briefly mention the relation between inconsistency and model complexity. As the number of training points increases, more complex models tend to fit the data better. E.g. when we have just two points, a linear model fits the data very well; if we add another point, a linear model may no longer be complex enough to fit the data, so we may need to use a polynomial model of order 2; and then, as we add more points, increasingly complex models may be needed. An important implication is that, as we learn more and more, the underlying model is likely to change and become increasingly complex.
The problem with simply increasing the model’s complexity is that a model that is too complex may start overfitting to the data, e.g. learning the noise and not the signal. So allowing for some inconsistency could be good; models that do exceptionally well on some data may actually start to memorize it instead of learning from it. Having some inconsistency between training and testing data could thus prevent us from making the model more complex than necessary.
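The memorization point can be illustrated with an extreme case: a lookup-table "model" that is perfectly consistent with the training data but useless on held-out data, versus a simple mean predictor that tolerates a little training inconsistency. All of the data and both models here are toy assumptions of mine:

```python
# A memorizer achieves zero training error but cannot generalize;
# the mean predictor accepts some training inconsistency and does
# far better on held-out data.
train = [(1, 2.0), (2, 2.1), (3, 1.9)]
test = [(4, 2.0), (5, 2.05)]

table = dict(train)                        # memorizer: exact lookup
mean = sum(y for _, y in train) / len(train)

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

memorizer = lambda x: table.get(x, 0.0)    # clueless on unseen x
mean_model = lambda x: mean

train_errs = (mse(memorizer, train), mse(mean_model, train))
test_errs = (mse(memorizer, test), mse(mean_model, test))
```

The memorizer is perfectly consistent with the training data and catastrophically inconsistent with the test data; the mean model's small residual training inconsistency is the sign that it is learning a pattern rather than memorizing points.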
The initially learned model could be accurate; but as time progresses, the underlying process may start to change; e.g. we saw some drastic changes in stock pricing models these past two weeks. So when we encounter inconsistent data, we should not discard it as noise, but try to see whether it could be indicative of our current model being incorrect, and if possible try to correct it.
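One simple way to ask whether inconsistent data signals a changed underlying process is to compare recent observations against the history so far. A minimal change-detection sketch (the sliding window, the threshold, and the price series are all illustrative assumptions, not a method from the talk):

```python
# Flag the first window start where the mean of the next `window`
# observations shifts from the mean of everything before it by more
# than `threshold`.  Returns None if no such shift is found.
def detect_change(series, window=3, threshold=4.0):
    for t in range(window, len(series) - window + 1):
        before = series[:t]
        after = series[t:t + window]
        mu_before = sum(before) / len(before)
        mu_after = sum(after) / len(after)
        if abs(mu_after - mu_before) > threshold:
            return t
    return None

# A price-like series whose level jumps at index 4.
prices = [10.0, 10.2, 9.9, 10.1, 15.0, 15.3, 14.9]
change_at = detect_change(prices)
```

A model-outlier view of the same data would flag the points after the jump as "noise" to discard; the change detector instead treats their persistence as evidence that the model itself needs correcting.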
In conclusion, I hope that I was able to show that inconsistency could sometimes actually be rather useful for such things as Hypothesis Learning, Model Selection, and Model Correction. Thank you.