Reaction Paper Discussing Articles in Fields of Outlier Detection &
Sentiment Analysis
Khan Mostafa
Graduate Student, Computer Science, Stony Brook University, NY 11794, USA
Student ID# 109365509

This reaction paper is submitted as an assignment to critique and brainstorm upon
reading a few papers.

In this article I discuss four papers, two in the field of outlier detection and two in the field of
sentiment analysis. The first publication I discuss introduced LOF (Local Outlier Factor), a
density-based approach to detecting outliers. LOF is a useful and widely employed approach,
although it does not work well in high dimensions. The second article proposes using angular
measures to detect outliers in high-dimensional data. The next two articles address sentiment
analysis: one examines appraisal taxonomies for sentiment analysis and the other uses Twitter as
a corpus for sentiment analysis.

2.1 Outlier detection
Two of the discussed papers address outlier detection. An outlier is an object that is significantly
different from the rest (the normal objects). In terms of clusters, an outlier is a point that does not
fit into any cluster. Outliers are of interest in many settings; detecting anomalies is especially
interesting. Anomalous events or objects generally cannot be detected using supervised learning,
because the nature of the anomalies is unknown in advance. Thus an unsupervised method can be
suitable. Outlier detection can also be used before clustering a dataset: removing outlying objects
helps produce better clusters. Sometimes, outliers are the outstanding or crucial points of a system.

2.2 Sentiment Analysis
Identifying sentiment is important to many. In particular, corporations, politicians and banks want
to know how people feel about a certain product, campaign or thing. Sentiment is generally an
expression of emotion or feeling regarding some object.
Generally, a human can identify sentiment by reading text. But to understand public opinion,
applications need to extract sentiment from massive amounts of text. To do this, approaches
are taken from fields spanning data mining, natural language processing and statistics.
The trend in sentiment analysis is to identify whether some text is subjective and whether it
conveys positive or negative sentiment.

Reaction Paper submitted for CSE590 Networks and Data Mining Techniques on 22/10/2013

In this section, the methods presented in each paper are briefly outlined and then reacted upon.
3.1 Local Outlier Factor for Outlier Detection
Earlier approaches to outlier detection considered outliers globally. However, a more appropriate
way of measuring outlierness is to measure how far a point lies from the cluster it would belong
to if it were not an outlier. Outlierness should be calculated locally, based on how much a point
deviates from its neighbors. One early approach was by Knorr and Ng (Knorr and
Ng, Finding Intensional Knowledge of Distance-Based Outliers 1999) (Knorr and Ng, Algorithms
for Mining Distance-Based Outliers in Large Datasets 1998), who proposed the notion of
distance-based outlier detection. A more efficient algorithm considers the distance to the k-th
nearest neighbor (Ramaswamy, Rastogi and Shim 2000). However, distance is not an appropriate
measure when the density of clusters varies. The work being examined (Breunig, et al. 2000)
advances the local approach by introducing a density-based concept, the Local Outlier Factor (LOF).
The authors posit that “being outlying is not a binary property”. Hence, for each point a score, LOF,
is calculated which estimates the degree to which the point is an outlier. For this, they calculate the
reachability distance of each point and then estimate its local reachability density. The local
reachability density is the inverse of the average reachability distance with respect to the
k (= MinPts) nearest neighbors. Then the LOF of a point p is
calculated as “the average of the ratio of the local reachability density of p and those of p’s MinPts-nearest
neighbors”. LOF is higher when the point is farther from its nearest neighbors. A point deep in a
cluster (i.e. a point that is not outlying) will have a reachability density similar to that of
its neighbors. Hence, it has been shown that the LOF of non-outlying points is approximately 1.
The estimate of LOF is strongly influenced by the parameter MinPts. MinPts is the number of
neighboring points in terms of which the reachability density is measured. If MinPts is larger than
the number of points in some cluster C, then all points in C will have a LOF much larger than 1.
Conversely, if MinPts is too small, outliers that lie close to a group of at least MinPts other
outlying points may receive a LOF score of approximately 1. Therefore, the authors suggest
computing LOF over a range of MinPts values rather than relying on a single setting.
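The computation described above can be sketched in a few lines. This is a minimal, brute-force illustration under stated assumptions, not the authors' implementation: the function names `k_nearest` and `lof_scores` are mine, Euclidean distance is assumed, and duplicate points (which would give an infinite density) are not handled.

```python
import math

def k_nearest(points, i, k):
    """Return (k-distance of point i, indices of its k-distance neighborhood)."""
    dists = sorted((math.dist(points[i], points[j]), j)
                   for j in range(len(points)) if j != i)
    k_dist = dists[k - 1][0]
    # Keep ties at the k-distance, so the neighborhood may hold more than k points.
    neighbors = [j for d, j in dists if d <= k_dist]
    return k_dist, neighbors

def lof_scores(points, min_pts):
    """Brute-force LOF for every point; costs O(n^2) distance computations."""
    n = len(points)
    knn = [k_nearest(points, i, min_pts) for i in range(n)]

    def reach_dist(p, o):
        # reach-dist(p, o) = max(k-distance(o), d(p, o))
        return max(knn[o][0], math.dist(points[p], points[o]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for p in range(n):
        neighbors = knn[p][1]
        mean_reach = sum(reach_dist(p, o) for o in neighbors) / len(neighbors)
        lrd.append(1.0 / mean_reach)

    # LOF: mean ratio of the neighbors' densities to the point's own density.
    return [sum(lrd[o] for o in knn[p][1]) / (lrd[p] * len(knn[p][1]))
            for p in range(n)]
```

On a regular grid of points with one distant point added, the interior grid points should score close to 1 while the distant point scores far above 1.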
After being proposed, LOF gained a lot of attention and has been widely studied over the last decade.
As LOF depends on the parameter MinPts, another approach was suggested (Papadimitriou, et al.
2003) that calculates the Local Correlation Integral (LOCI). Here, a sampling neighborhood of radius
r and a counting neighborhood of radius αr are considered. For a point p, the points {…, p_i, …} that
lie in its sampling neighborhood are taken. Then, for each such point p_i, the number of points within
its counting neighborhood is counted. These counts, combined with their mean and standard
deviation, are used to decide whether p is outlying. The authors also introduced the concept of the
LOCI plot. However, the estimates of LOCI depend on the input parameter α.
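As a rough illustration of the counting step, the sketch below computes the multi-granularity deviation factor (MDEF) of LOCI for a single radius r. The function names and the default α are my assumptions, and the full method scans a range of radii rather than one.

```python
import math
import statistics

def count_within(points, center, radius):
    """Number of points (the center itself included) within `radius` of `center`."""
    return sum(1 for q in points if math.dist(center, q) <= radius)

def mdef(points, p, r, alpha=0.5):
    """Return (MDEF, sigma_MDEF) of point p for sampling radius r.

    p is assumed to be a member of `points`.
    """
    # Sampling neighborhood: every point within r of p (p included).
    sampling = [q for q in points if math.dist(p, q) <= r]
    # Counting neighborhood size (radius alpha * r) of each sampled point.
    counts = [count_within(points, q, alpha * r) for q in sampling]
    mean_count = statistics.fmean(counts)
    own_count = count_within(points, p, alpha * r)
    # MDEF compares p's own count with the average count of its neighbors.
    return 1.0 - own_count / mean_count, statistics.pstdev(counts) / mean_count
```

In LOCI a point is flagged when its MDEF exceeds a multiple (three in the paper) of sigma_MDEF at some radius; an MDEF near or below zero means the point is at least as densely surrounded as its neighbors.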
Some other variations and extensions that have been studied are:
Outlier Detection using In-degree Number (ODIN) (Hautamaki, Karkkainen and Franti 2004)
Connectivity-based Outlier Factor (COF) (Tang, et al. 2002)
Using probabilistic suffix tree (PST) for detecting nearest neighbors
An approach that enhances efficiency by creating micro-clusters (Jin, Tung and Han 2001)
Another study covering the case where clusters of different densities lie close together

Many other studies have been done, covering nearly every variation and extension that comes to
mind; I would rather not survey them all.
LOF is a very useful measure as it can identify outliers in a local domain. It also covers global
outliers, as every global outlier is also a local outlier. A major weakness, however, is its
computation cost, which is O(n²). To reduce this complexity, one approach might be to use some kind
of locality hashing, where a prior pass hashes each point into a bucket consisting of the points
neighboring it. A grid-based approach can also be employed. For a point, k (= MinPts)
points are chosen randomly from the bucket (or grid cell) it belongs to. If there are fewer
than k points in a grid cell, nearby cells can be included. Another improvement can be to calculate
the reachability distance for each grid cell a priori while building the grid.
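A minimal sketch of the grid idea follows; it is my own illustration, not a published algorithm. The names `build_grid` and `approx_knn` and the parameter `cell_size` are assumptions, and instead of drawing candidates at random the sketch keeps the k closest candidates found in the home cell and the rings of cells around it. Because the number of surrounding cells grows exponentially with the dimension, this only makes sense for low-dimensional data.

```python
import math
from collections import defaultdict
from itertools import product

def build_grid(points, cell_size):
    """Map each grid cell (a tuple of integer coordinates) to the indices of its points."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[tuple(math.floor(c / cell_size) for c in p)].append(idx)
    return grid

def approx_knn(points, grid, idx, cell_size, k):
    """Approximate k nearest neighbors of points[idx] using nearby grid cells only."""
    home = tuple(math.floor(c / cell_size) for c in points[idx])
    needed = min(k, len(points) - 1)
    ring = 0
    while True:
        # All cells whose coordinates differ from the home cell by at most `ring`.
        offsets = product(range(-ring, ring + 1), repeat=len(home))
        candidates = [j for off in offsets
                      for j in grid.get(tuple(h + o for h, o in zip(home, off)), [])
                      if j != idx]
        if len(candidates) >= needed:
            break
        ring += 1  # too few candidates: widen the search to the next ring of cells
    candidates.sort(key=lambda j: math.dist(points[idx], points[j]))
    return candidates[:k]
```

The result is approximate: a true neighbor just across a cell boundary can be missed when the home cell alone already holds k candidates.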
Some normal points can have very few neighbors; in such cases, LOF might yield a high score
for them, marking them as outlying.
LOF is density based, and density is defined in terms of distance. In higher dimensions, the
distances between points become almost equal (the curse of dimensionality). In such cases, LOF
cannot be directly employed. However, feature bagging is often suggested.
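The concentration of distances can be observed with a small experiment. The sketch below is an illustration under my own assumptions (uniform random data in the unit cube, a single random query point, a hypothetical helper named `relative_contrast`); it measures how much the farthest point differs from the nearest one, relative to the nearest.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min over the distances from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)

if __name__ == "__main__":
    for dim in (2, 10, 100, 500):
        print(dim, round(relative_contrast(dim), 3))
```

The contrast should shrink sharply as the dimension grows, which is why a density estimate built on raw distances loses its discriminating power.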
In high dimensions, when there is a need to select a few features, LOF can be used. In such a case, a
few features can be used at a time to estimate LOFs. Those sets of features which yield less diverse
LOFs (i.e. yield a high LOF for fewer points) can be potentially good feature approximations.
LOF is a spatial algorithm. Hence, it cannot be used in situations where no distance measure is defined.
LOFs can also be used to cluster points. In this case, hierarchical clustering can be employed.
When a point is calculated to have a LOF of approximately 1, it can be assigned to the cluster to
which its neighbors belong.
LOF can be used to identify anomalies within clusters. Say a small portion within a class is
significantly more or less dense. Points in that portion will yield LOF scores that differ
from the LOF scores of the other points of the cluster.

3.2 Angle Based Outlier Detection in Higher Dimension
In higher dimensions, distances become nearly uniform. However, Kriegel et al. (Kriegel, Schubert
and Zimek 2008) posed the assumption that angles are more stable and that outliers reside on the
periphery. Their method (ABOD) calculates the angles under which other points are seen from the
point in question. A point is said to be outlying if most points lie on one side of it, i.e. if the
angles from this point to the other points are similar, whereas normal points see the other points
under widely varying angles.
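The idea can be sketched as the variance of angles seen from a point. This is a simplified illustration, not the authors' exact factor: the paper additionally weights each angle by the distances to the two points involved, whereas the hypothetical `angle_variance` below uses the plain cosine.

```python
import math
import statistics
from itertools import combinations

def angle_variance(points, i):
    """Variance of the cosine of the angle at points[i] over all pairs of other points."""
    p = points[i]
    # Difference vectors from p to every other point.
    vectors = [[c - pc for c, pc in zip(q, p)]
               for j, q in enumerate(points) if j != i]
    cosines = []
    for a, b in combinations(vectors, 2):
        norm = math.hypot(*a) * math.hypot(*b)
        if norm > 0:  # skip exact duplicates of p, which have no direction
            cosines.append(sum(x * y for x, y in zip(a, b)) / norm)
    return statistics.pvariance(cosines)
```

A low variance indicates that all other points lie in roughly the same direction, which is the signature of an outlier on the periphery. Evaluating every pair for every point costs O(n³) overall.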
ABOD falls short in that it has a very high computational complexity. This can be
reduced by selecting random points: the angles seen from an outlying point will be similar
even if the points are chosen at random. Another way can be to divide the features into subspaces
and calculate the angles in these subspaces; these measures can then be used collectively.
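The random-selection idea can be sketched as follows. It is my own illustration of the suggestion above, with assumed names (`sampled_angle_variance`, `sample_size`, `seed`) and an unweighted cosine in place of the distance-weighted factor of the paper. Only pairs drawn from a random sample of the other points are evaluated, so the cost per point drops from quadratic in n to quadratic in the sample size.

```python
import math
import random
import statistics
from itertools import combinations

def sampled_angle_variance(points, i, sample_size, seed=0):
    """Angle variance at points[i], estimated from a random sample of other points."""
    rng = random.Random(seed)
    others = [q for j, q in enumerate(points) if j != i]
    sample = rng.sample(others, min(sample_size, len(others)))
    p = points[i]
    cosines = []
    for a, b in combinations(sample, 2):
        va = [x - y for x, y in zip(a, p)]
        vb = [x - y for x, y in zip(b, p)]
        norm = math.hypot(*va) * math.hypot(*vb)
        if norm > 0:  # skip exact duplicates of p
            cosines.append(sum(x * y for x, y in zip(va, vb)) / norm)
    return statistics.pvariance(cosines)
```

For a clear outlier the estimate stays small whichever points are drawn, since all of them lie in nearly the same direction; for borderline points a small sample gives a noisy estimate.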

3.3 Appraisal Taxonomies in Sentiment Analysis
Sentiment analysis has been heavily investigated for more than a decade, instigated by Pang, et al.
(Pang, Lee and Vaithyanathan 2002), who attempted to solve the problem of sentiment
classification as a case of topic-based categorization. Sentiment, however, at any level of
granularity (viz. article, paragraph and sentence) is generally perceived by humans as appraisal.
Many words and phrases are used to praise, and many others to express negative comments about
things. This case was investigated by Whitelaw, et al. (Whitelaw, Garg and Argamon 2005).
They identified the need for semantic analysis of attitude expressions and also hypothesized that
the atomic units of sentiment expression are not individual words but rather appraisal groups
(described by Attitude, Orientation, Graduation and Polarity) [see Appraisal Theory (Martin and
White 2005)]. Based on WordNet and two other thesauri, they constructed a lexicon. They used a
coarse ranking of relevance to list candidate terms. However, the final set of terms was produced
by manual examination. Then they tested several feature sets and found that the union of
bag-of-words and appraisal groups by attitude & orientation (BoW + GAO) yields the best result.
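To make the combined feature set concrete, here is a toy sketch of a feature extractor that joins bag-of-words counts with counts of appraisal groups keyed by attitude and orientation. Everything in it is an assumption for illustration: the five-word lexicon, the negator list and the rule that a directly preceding negator flips the orientation stand in for the much larger, hand-checked lexicon and group extraction of the paper.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> (attitude type, orientation).
APPRAISAL_LEXICON = {
    "happy": ("affect", "positive"),
    "sad": ("affect", "negative"),
    "brilliant": ("appreciation", "positive"),
    "boring": ("appreciation", "negative"),
    "honest": ("judgment", "positive"),
}
NEGATORS = {"not", "never"}

def extract_features(text):
    """Union of bag-of-words features and attitude/orientation group features."""
    tokens = text.lower().split()
    features = Counter(f"bow:{t}" for t in tokens)
    for pos, tok in enumerate(tokens):
        if tok in APPRAISAL_LEXICON:
            attitude, orientation = APPRAISAL_LEXICON[tok]
            # Assumed rule: a directly preceding negator flips the orientation.
            if pos > 0 and tokens[pos - 1] in NEGATORS:
                orientation = "negative" if orientation == "positive" else "positive"
            features[f"gao:{attitude}:{orientation}"] += 1
    return features
```

The resulting counter can be turned into a vector and passed to any standard classifier; the point of the union is that "not happy" contributes a negative affect feature that the bag of words alone would miss.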
The proposed approach is not very scalable, as it requires a lot of manual labor and cannot create
absolute appraisal estimates for many words. It also employs a rather computation-intensive
classification technique. Still, this investigation brings to light that appraisal is not
done with adjectives alone. Other parts of speech in a sentence are also responsible for the
sentiment in it. Several studies have tried to employ adverbs and verbs along with adjectives to
estimate sentiment.
Overall, it can be said that sentiment is expressed through the tone of the sentence, and different
parts of speech (POS) occur differently in positive and negative statements. Hence, the subjectivity
of statements can be scored by tagging parts of speech and then estimating with some classifier.
Furthermore, nouns and names can also embody polarity, especially when comparative phrases
are used. The same word can also express different feelings in different contexts. One approach can
be to list appraisal scores of words along with their contexts. Yet some problems may remain, since
qualifiers may indicate opposite feelings when used with different words (viz. fast access as
opposed to fast heating in a PC RAM description).

3.4 Sentiment Analysis in Twitter
Twitter is a widely used microblogging platform where people often convey sentiment. Several
studies have tried to analyze sentiment on such platforms. One of them is by A. Pak & P. Paroubek
(Pak and Paroubek 2010), in which the authors build a sentiment corpus from tweets. They exploited
the fact that users put emoticons in tweets and used these to build a sentiment lexicon. Along with
that, they trained a classifier based on part-of-speech tagging. They then estimated the sentiment
of tweets with the help of n-gram and POS classifiers.
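The emoticon trick can be sketched as below. This is a toy illustration under my own assumptions: the emoticon sets, the whitespace tokenizer and the function names are hypothetical, only the positive and negative classes are modeled (the objective class and the POS features of the paper are left out), and the classifier is a plain multinomial Naive Bayes with add-one smoothing over unigrams and bigrams.

```python
import math
from collections import Counter, defaultdict

POSITIVE_EMOTICONS = {":)", ":-)", ":D", "=)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}
EMOTICONS = POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS

def label_by_emoticon(tweet):
    """Return (label, words without emoticons), or None if unlabeled or ambiguous."""
    tokens = tweet.split()
    has_pos = any(t in POSITIVE_EMOTICONS for t in tokens)
    has_neg = any(t in NEGATIVE_EMOTICONS for t in tokens)
    if has_pos == has_neg:
        return None
    words = [t.lower() for t in tokens if t not in EMOTICONS]
    return ("positive" if has_pos else "negative"), words

def ngrams(words, n=2):
    """All word n-grams of length 1 up to n."""
    return [" ".join(words[i:i + size])
            for size in range(1, n + 1)
            for i in range(len(words) - size + 1)]

def train(tweets):
    """Count n-grams per emoticon-derived label; unlabeled tweets are skipped."""
    counts, docs = defaultdict(Counter), Counter()
    for tweet in tweets:
        labeled = label_by_emoticon(tweet)
        if labeled:
            label, words = labeled
            docs[label] += 1
            counts[label].update(ngrams(words))
    return counts, docs

def classify(text, counts, docs):
    """Pick the label with the highest smoothed Naive Bayes log-probability."""
    vocab = {g for c in counts.values() for g in c}
    best, best_score = None, -math.inf
    for label in docs:
        total = sum(counts[label].values())
        score = math.log(docs[label] / sum(docs.values()))
        for g in ngrams(text.lower().split()):
            score += math.log((counts[label][g] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best
```

The per-label log-probabilities computed inside `classify` could also be reported directly when a graded score is preferred over a hard label.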
Their work is elegant, as it can estimate sentiment in real time. However, an extension to their
work can be to use a sentiment score in place of strict classification (negative, positive, objective).
This approach cannot deal with contextual sentiment dependency. One approach to deal with this can
be to first extract keywords from the tweet, then associate appraisal keywords with objective key
terms to estimate how a subjective word expresses sentiment for that particular key term. Key
terms can even be used to identify the context category first.
Sentiment analysis struggles when sarcastic and ironic speech is used. Although there are some
studies that try to solve this, it requires more investigation, and more rigorous language
processing and logic techniques may be needed to estimate irony more effectively. Hence, a perfect
sentiment analysis tool is yet to emerge.

In this paper, four articles are discussed. The first pair of papers, which are about outlier
detection, address different portions of the same problem. LOF is widely studied and used; hence
there is a multitude of approaches to enhance and extend it, as well as to reduce its computational
complexity. The second article (ABOD) is also motivated by LOF. At present, not much connection is
found between sentiment analysis and outlier detection. However, outlier detection can be useful in
opinion mining of mass data. When opinion about some entity is mined, the first step is to pull
statements about that entity. These pulled statements might also include some statements which are
actually not about that very entity. Such outlying statements can be filtered out to better reflect
sentiment about the entity.

Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. 2000. "LOF:
Identifying Density-Based Local Outliers." International Conference on Management of Data SIGMOD. ACM. 93-104.
Kriegel, Hans-Peter, Matthias Schubert, and Arthur Zimek. 2008. "Angle-based outlier detection in
high-dimensional data." Knowledge Discovery and Data Mining - KDD. 444-452.
Hautamaki, Ville, Ismo Karkkainen, and Pasi Franti. 2004. "Outlier detection using k-nearest
neighbour graph." Proceedings of the 17th International Conference on Pattern Recognition, ICPR
2004. IEEE. 430-433.
Jin, Wen, Anthony K. H. Tung, and Jiawei Han. 2001. "Mining top-n local outliers in large
databases." Knowledge Discovery and Data Mining - KDD. 293-298.
Knorr, Edwin M., and Raymond T. Ng. 1998. "Algorithms for Mining Distance-Based Outliers in
Large Datasets." Very Large Data Bases - VLDB. 392-403.
Knorr, Edwin M., and Raymond T. Ng. 1999. "Finding Intensional Knowledge of Distance-Based Outliers." Very Large Data Bases - VLDB. 211-222.

Martin, J. R., and P. R. R. White. 2005. Language of Evaluation: Appraisal in English. London: Palgrave.
Pak, Alexander, and Patrick Paroubek. 2010. "Twitter as a Corpus for Sentiment Analysis and
Opinion Mining." Language Resources and Evaluation. 1320-1326.
Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. 2002. "Thumbs up? Sentiment Classification
using Machine Learning Techniques." Proceedings of the ACL-02 conference on Empirical methods
in natural language processing. Philadelphia, PA, USA: Association for Computational
Linguistics. 79-86.
Papadimitriou, Spiros, Hiroyuki Kitagawa, Phillip B. Gibbons, and Christos Faloutsos. 2003.
"Loci: Fast outlier detection using the local correlation integral." Proceedings. 19th
International Conference on Data Engineering. IEEE. 315-326.
Ramaswamy, Sridhar, Rajeev Rastogi, and Kyuseok Shim. 2000. "Efficient Algorithms for Mining
Outliers from Large Data Sets." Proc. ACM SIGMOD Int. Conf. on Management of Data.
ACM. 427-438.
Tang, Jian, Zhixiang Chen, Ada Wai-Chee Fu, and David W. Cheung. 2002. "Enhancing
effectiveness of outlier detections for low density patterns." In Advances in Knowledge
Discovery and Data Mining, 535-548. Springer Berlin Heidelberg.
Whitelaw, Casey, Navendu Garg, and Shlomo Argamon. 2005. "Using appraisal groups for
sentiment analysis." Proceedings of the 14th ACM international conference on Information and
knowledge management. ACM. 625-631.


An Approach To Emerge Web 3.0


WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu

Último (20)

WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate AgentsRyan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
Ryan Mahoney - Will Artificial Intelligence Replace Real Estate Agents
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms

Reaction Paper Discussing Articles in Fields of Outlier Detection & Sentiment Analysis

Khan Mostafa
Graduate Student, Computer Science, Stony Brook University, NY 11794, USA
Student ID# 109365509

ABSTRACT
This reaction paper is submitted as an assignment to critique and brainstorm on a few papers.

1 INTRODUCTION
In this article I discuss four papers, two in the field of outlier detection and two in the field of sentiment analysis. The first publication introduced LOF (Local Outlier Factor), a density-based approach to detecting outliers. LOF is useful and widely employed, although it does not work well in high dimensions. The second article proposes angular measures to detect outliers in high-dimensional data. The next two articles address sentiment analysis: one examines appraisal taxonomies for sentiment analysis and the other uses Twitter as a corpus for sentiment analysis.

2 RELATED TERMS

2.1 Outlier detection
Two of the discussed papers address outlier detection. An outlier is a point significantly different from the rest (a.k.a. the normal points). In clustered data, an outlier is a point which does not fit into any cluster. Outliers are of interest in many settings, especially for detecting anomalies. Anomalous events or objects cannot be detected using supervised learning because the nature of the anomalies is unknown, so an unsupervised method is more suitable. Outlier detection can also be used before clustering a dataset: removing outlying objects helps produce better clusters. Sometimes outliers are the outstanding or crucial points of a system.

2.2 Sentiment Analysis
Identifying sentiment is important to many. Corporations, politicians and banks in particular want to know how people feel about a certain product, campaign or thing. Sentiment is generally an expression of emotion or feeling regarding some object. A human can generally identify sentiment by reading text.
But to understand public opinion, applications need to extract sentiment from massive amounts of text. To do this, approaches are taken from fields spanning data mining, natural language processing and statistics. The trend in sentiment analysis is to identify whether a text is subjective and whether it conveys positive or negative sentiment.

Reaction Paper submitted for CSE590 Networks and Data Mining Techniques on 22/10/2013

3 REACTIONS AND OUTLINES
In this section the methods presented in each paper are briefly outlined and then reacted upon.

3.1 Local Outlier Factor for Outlier Detection
Earlier approaches to outlier detection considered outliers globally. A more appropriate measure, however, is how far a point lies from the cluster it would belong to if it were not an outlier. Outlierness should therefore be calculated locally, based on how deviant a point is from its neighbors. One early approach to consider outliers locally was by Knorr and Ng (Knorr and Ng, Finding Intensional Knowledge of Distance-Based Outliers 1999) (Knorr and Ng, Algorithms for Mining Distance-Based Outliers in Large Datasets 1998), where they proposed the notion of distance-based outlier detection. A more efficient algorithm considered the distance to the k nearest neighbors (Ramaswamy, Rastogi and Shim 2000). However, distance is not an appropriate measure when the density of clusters varies. The work being examined (Breunig, et al. 2000) advanced the local approach by introducing a density-based concept, the Local Outlier Factor (LOF). The authors posit that “being outlying is not a binary property”. Hence, for each point a score, the LOF, is calculated which estimates its degree of being an outlier. For this, they calculate the reachability distance of each point and then estimate its reachability density. Reachability density is the inverse of the average reachability distance to the k (= MinPts) nearest neighbors. The LOF of a point p is then calculated as “the average of the ratio of the local reachability density of p and those of p’s MinPts-nearest neighbors”. LOF is higher the farther a point lies from its nearest neighbors relative to how closely those neighbors are packed.
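The definitions above can be sketched in a few lines of Python. This is a minimal brute-force illustration, assuming Euclidean distance and no duplicate points; the function name `lof_scores` and the toy data are made up for this example and are not taken from the paper.

```python
import math

def lof_scores(points, k):
    """Return one LOF score per point, with k playing the role of MinPts.

    Assumes Euclidean distance and no duplicate points (duplicates can
    make the mean reachability distance zero and break the division).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_dist = []     # distance from each point to its k-th nearest neighbor
    neighbors = []  # indices of all points within that distance (ties kept)
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]
        k_dist.append(kd)
        neighbors.append([j for d, j in others if d <= kd])

    def reach_dist(i, j):
        # Reachability distance of point i with respect to point j.
        return max(k_dist[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbors[i]) / len(neighbors[i])
        lrd.append(1.0 / mean_reach)

    # LOF: mean ratio of the neighbors' densities to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]

# Toy data: a 3x3 grid of points plus one isolated point.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
scores = lof_scores(grid + [(10.0, 10.0)], k=3)
print(scores[-1], max(scores[:-1]))
```

On this toy data the isolated point receives by far the largest score, while the grid points stay close to 1. The all-pairs distance table is what makes the cost quadratic in the number of points.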
The reachability density of a point deep in a cluster (a point that is not outlying) will be similar to that of its neighbors. Hence, it has been shown that the LOF of non-outlying points is approximately one. The estimate of LOF is largely influenced by the parameter MinPts, the number of neighboring points over which the reachability density is measured. If MinPts is larger than the number of points in some cluster C, then all points in C will have an LOF much larger than 1. Again, if MinPts is much smaller, then outliers that are neighbors to n<MinPts outlying points may have an LOF score of approximately one. Therefore, MinPts is suggested to be estimated heuristically.

Since being proposed, LOF has gained a lot of attention and has been widely studied over the last decade. As LOF depends on the parameter MinPts, another approach was suggested (Papadimitriou, et al. 2003) which calculates the Local Correlation Integral (LOCI). Here, a sampling neighborhood of radius r and a counting neighborhood of radius αr are considered. For a point p, the points {…, p_i, …} that lie in its sampling neighborhood are taken. Then, for each such point p_i, the number of points within its counting neighborhood is counted. These counts, combined with their mean and standard deviation, are used to score p. They also introduced the concept of the LOCI plot. However, LOCI estimates depend on the input parameter α. Some other variations and extensions studied are:

  • Outlier Detection using In-degree Number (ODIN) (Hautamaki, Karkkainen and Franti 2004)
  • Connectivity-based Outlier Factor (COF) (Tang, et al. 2002)
  • Using a probabilistic suffix tree (PST) for detecting nearest neighbors
  • An approach to enhance efficiency by creating micro-clusters first (Jin, Tung and Han 2001)
  • Another study covering the case where clusters of different densities lie close together

Many other studies cover further variations and extensions; I do not survey them all here.

LOF is a very useful measure as it can identify outliers in a local domain. It also covers global outliers, as every global outlier is also a local outlier. A major weakness, however, is its computational cost, which is O(n²). To reduce complexity, one approach might be some kind of locality hashing, where a prior pass hashes each point into a bucket consisting of points neighboring it. A grid-based approach can also be employed. For a point, k (= MinPts) points are chosen randomly from the bucket (or grid cell) it belongs to. If a grid cell holds fewer than k points, nearby cells can be used. Another improvement can be to calculate the reachability distance for each grid cell a priori while building the grid.

Some normal points can have very few neighbors; in such cases, LOF might yield a high score for them, marking them as outlying. LOF is density based, and density is defined in terms of distance. In higher dimensions, the distances between points become almost the same (curse of dimensionality). In such cases, LOF cannot be directly employed. However, feature bagging is often suggested. When there is a need to select a few features in high dimensions, LOF can be used: a few features are used each time to estimate LOFs. Those sets of features which yield less diverse LOFs (i.e. yield a high LOF for fewer points) can potentially be good feature approximations.

LOF is a spatial algorithm. Hence, it cannot be used in situations where there is no distance measure. LOF can also be used to cluster points. In this case, hierarchical clustering can be employed: when a point is calculated to have an LOF of approximately 1, it can be assigned to the cluster to which its neighbors belong. LOF can also be used to identify anomalies within clusters. Say a small portion within a class is significantly more or less dense; points in it will have LOF scores that differ from those of the other points of the cluster.

3.2 Angle Based Outlier Detection in Higher Dimension
In higher dimensions, distances become nearly uniform. An assumption was posed by (Kriegel, Schubert and Zimek 2008) that angles are more stable and that outliers reside at the periphery. This method (ABOD) calculates the angular distances of other points from the point in question. A point is said to be outlying if most points lie on one side of it, i.e. if the angular distances from this point are similar, while normal points will have widely varying angular distances.
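The angular idea can be illustrated with a short Python sketch. It is a deliberate simplification: it takes the plain variance of the cosines of the angles that pairs of other points subtend at the query point, whereas the published ABOD factor also weights each angle by the distances involved. The function name `angle_variance` and the toy data are assumptions made for this example, and duplicate points are assumed absent.

```python
import math
from itertools import combinations
from statistics import pvariance

def angle_variance(points, i):
    """Variance of cos(angle) at points[i] over all pairs of other points.

    Simplified stand-in for an angle-based outlier factor: a low value
    means the rest of the data is seen within a narrow cone, which
    suggests an outlier. Assumes no point duplicates points[i].
    """
    a = points[i]
    others = [p for j, p in enumerate(points) if j != i]
    cosines = []
    for b, c in combinations(others, 2):
        ab = [x - y for x, y in zip(b, a)]
        ac = [x - y for x, y in zip(c, a)]
        dot = sum(u * v for u, v in zip(ab, ac))
        cosines.append(dot / (math.hypot(*ab) * math.hypot(*ac)))
    return pvariance(cosines)

# Toy data: a 3x3 grid of points plus one isolated point.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
data = grid + [(10.0, 10.0)]
variances = [angle_variance(data, i) for i in range(len(data))]
print(variances.index(min(variances)))
```

In this toy set the isolated point has the smallest variance, since every other point lies in nearly the same direction from it. Looping over all pairs for every point is cubic in the number of points.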
ABOD falls short in that it has very high computational complexity. This can be reduced by selecting random points: angular distances will be similar from an outlying point even if the points are chosen at random. Another way can be to further divide the features into subspaces and calculate angular distances in these subspaces; these measures can then be used collectively.

3.3 Appraisal Taxonomies in Sentiment Analysis
Sentiment analysis has been heavily investigated for more than a decade, instigated by Pang, et al. (Pang, Lee and Vaithyanathan 2002) when they attempted to solve the problem of sentiment classification as a case of topic-based categorization. Sentiment, however, at any granularity level (viz. article, paragraph and sentence) is generally perceived by humans as appraisal. Many words and phrases are used to praise, and many to express negative comment about things. This case was investigated by Whitelaw, et al. (Whitelaw, Garg and Argamon 2005). They identified the need for semantic analysis of attitude expressions and also hypothesized that the atomic units of sentiment expression are not individual words but rather appraisal groups (Attitude, Orientation, Graduation and Polarity). [See Appraisal Theory (Martin and White 2005)]. Based on WordNet and two other thesauri, they constructed a lexicon. They used a coarse ranking of relevance to list such terms; however, the final set of terms was produced by manual examination. They then tested several feature sets and found that the union of bag-of-words and appraisal groups by attitude & orientation (BoW + GAO) yields the best result.

The proposed approach is not very scalable, as it requires a lot of manual labor and cannot create absolute appraisal estimates for many words. It also employs a computation-intensive classification technique. Nevertheless, this investigation brings to light that appraisal is not done with adjectives alone. Other parts of speech in a sentence are also responsible for its sentiment. Several studies have tried to employ adverbs and verbs along with adjectives to estimate sentiment. Overall, it can be said that sentiment is expressed through the tone of the sentence, and that different POS occur differently in positive and negative statements. Hence, the subjectivity of statements can be scored by tagging parts of speech and then estimating with some classifier. Furthermore, nouns and names can also embody polarity, especially when comparative phrases are used. The same word can also express different feelings in different contexts. One approach can be to list appraisal scores of words along with their contexts. Yet some problems may remain when qualifiers indicate opposite feelings with different words (viz. fast access as opposed to fast heating in a PC RAM description).

3.4 Sentiment Analysis in Twitter
Twitter is a widely used blogosphere where people often convey sentiment. Several studies have tried to analyze sentiment on such platforms. One of them is by A. Pak & P. Paroubek (Pak and Paroubek 2010), where the authors build a sentiment corpus from tweets. They exploited the emoticons that users put in tweets to build a sentiment lexicon. Along with that, they trained a classifier based on parts-of-speech tagging. They then used these to estimate the sentiment of tweets with the help of n-gram and POS classifiers.
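The emoticon idea can be sketched as follows. This is only an illustration under stated assumptions: the emoticon sets, the helper names and the example tweets are invented here, unigram counts with add-one-smoothed Naive Bayes stand in for the classifier, and POS features are left out entirely. It is not a reproduction of the authors' pipeline.

```python
import math
from collections import Counter, defaultdict

# Hypothetical emoticon sets; the paper's actual lists are not reproduced here.
POSITIVE = {":)", ":-)", ":D", "=)"}
NEGATIVE = {":(", ":-(", ":'("}

def emoticon_label(tweet):
    """Return 'pos', 'neg', or None when emoticons are absent or mixed."""
    tokens = set(tweet.split())
    has_pos = bool(tokens & POSITIVE)
    has_neg = bool(tokens & NEGATIVE)
    if has_pos == has_neg:
        return None
    return "pos" if has_pos else "neg"

def words(tweet):
    """Lowercased unigrams with emoticons removed, so labels cannot leak."""
    return [t.lower() for t in tweet.split() if t not in POSITIVE | NEGATIVE]

def train(tweets):
    """Count unigrams per emoticon-derived label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for tweet in tweets:
        label = emoticon_label(tweet)
        if label is not None:
            label_counts[label] += 1
            word_counts[label].update(words(tweet))
    return word_counts, label_counts

def classify(tweet, word_counts, label_counts):
    """Pick the label with the highest add-one-smoothed log score."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        denom = sum(counts.values()) + len(vocab)
        score = math.log(label_counts[label] / total)
        for w in words(tweet):
            score += math.log((counts[w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up tweets, labelled only through their emoticons.
corpus = [
    "love this phone :)",
    "great battery :D",
    "hate the screen :(",
    "terrible battery life :(",
]
model = train(corpus)
print(classify("love great phone", *model))
```

The appeal of this kind of labelling is that no manual annotation is needed: the emoticon serves as a noisy label, and it is stripped from the features so the classifier has to rely on the remaining words.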
Their work is an elegant one, as it can estimate sentiment in real time. However, an extension to their work could be to use a sentiment score in place of strict classification (negative, positive, objective). This approach cannot deal with contextual sentiment dependency. One way to deal with it is to first extract keywords from the tweet, then associate appraisal keywords with objective key terms to estimate how each subjective word expresses sentiment for that particular key term. Key terms can even be used to identify the context category first. Sentiment analysis struggles when sarcastic and ironic speech is used. Although there are some studies addressing this, it requires more investigation, and more rigorous language processing and logic techniques may be needed to estimate irony more effectively. Hence, a perfect sentiment analysis tool is yet to emerge.

4 CONCLUSION
In this paper, four articles are discussed. The first pair of papers, which are about outlier detection, address different portions of the same problem. LOF is widely studied and used; hence there is a multitude of approaches to enhance and extend it as well as to reduce its computational complexity. The second article (ABOD) is also motivated by LOF. At present, not much connection is found between sentiment analysis and outlier detection. However, outlier detection can be useful in opinion mining of mass data. When opinion about some entity is mined, the first step is to pull statements about that entity. These pulled statements might also include some that are not actually about that entity. These outlying statements can be filtered out to better reflect sentiment about the entity.

5 BIBLIOGRAPHY
Breunig, Markus M., Hans-Peter Kriegel, Raymond T. Ng, and Jörg Sander. 2000. "LOF: Identifying Density-Based Local Outliers." International Conference on Management of Data - SIGMOD. ACM. 93-104.
Hautamaki, Ville, Ismo Karkkainen, and Pasi Franti. 2004. "Outlier detection using k-nearest neighbour graph." Proceedings of the 17th International Conference on Pattern Recognition, ICPR 2004. IEEE. 430-433.
Jin, Wen, Anthony K. H. Tung, and Jiawei Han. 2001. "Mining top-n local outliers in large databases." Knowledge Discovery and Data Mining - KDD. 293-298.
Knorr, Edwin M., and Raymond T. Ng. 1998. "Algorithms for Mining Distance-Based Outliers in Large Datasets." Very Large Data Bases - VLDB. 392-403.
Knorr, Edwin M., and Raymond T. Ng. 1999. "Finding Intensional Knowledge of Distance-Based Outliers." Very Large Data Bases - VLDB. 211-222.
Kriegel, Hans-Peter, Matthias Schubert, and Arthur Zimek. 2008. "Angle-based outlier detection in high-dimensional data." Knowledge Discovery and Data Mining - KDD. 444-452.
Martin, J. R., and P. R. R. White. 2005. Language of Evaluation: Appraisal in English. London: Palgrave.
Pak, Alexander, and Patrick Paroubek. 2010. "Twitter as a Corpus for Sentiment Analysis and Opinion Mining." Language Resources and Evaluation. 1320-1326.
Pang, Bo, Lillian Lee, and Shivakumar Vaithyanathan. 2002. "Thumbs up? Sentiment Classification using Machine Learning Techniques." Proceedings of the ACL-02 conference on Empirical methods in natural language processing. Philadelphia, PA, USA: Association for Computational Linguistics. 79-86.
Papadimitriou, Spiros, Hiroyuki Kitagawa, Phillip B. Gibbons, and Christos Faloutsos. 2003. "Loci: Fast outlier detection using the local correlation integral." Proceedings. 19th International Conference on Data Engineering. IEEE. 315-326.
Ramaswamy, Sridhar, Rajeev Rastogi, and Kyuseok Shim. 2000. "Efficient Algorithms for Mining Outliers from Large Data Sets." Proc. ACM SIGMOD Int. Conf. on Management of Data. ACM. 427-438.
Tang, Jian, Zhixiang Chen, Ada Wai-Chee Fu, and David W. Cheung. 2002. "Enhancing effectiveness of outlier detections for low density patterns." In Advances in Knowledge Discovery and Data Mining, 535-548. Springer Berlin Heidelberg.
Whitelaw, Casey, Navendu Garg, and Shlomo Argamon. 2005. "Using appraisal groups for sentiment analysis." Proceedings of the 14th ACM international conference on Information and knowledge management. ACM. 625-631.