This demonstration will be presented by Surender Reddy Yerva at the 35th Annual SIGIR (Special Interest Group on Information Retrieval) Conference, held in Portland, Oregon, USA, August 12–16, 2012.
TweetSpector: Entity-based retrieval of Tweets [Demo]
Surender Reddy Yerva, Zoltán Miklós, Flavia Grosan, Alexandru Tandrau, Karl Aberer
Swiss Federal Institute of Technology (EPFL)
Lausanne, Switzerland
{surenderreddy.yerva,zoltan.miklos,flavia.grosan,alexandru.tandrau,karl.aberer}@epfl.ch
Categories and Subject Descriptors
H.3.1 [Information Systems Applications]: Content Analysis and Indexing; H.3.5 [Information Systems Applications]: Online Information Services
Keywords
Entity, Disambiguation, Profiles, Twitter
1. EXTENDED ABSTRACT
People readily express their opinions about various products, companies, TV shows, etc., on Twitter¹. These tweet messages are thus a rich source of information that can be exploited to understand the sentiment about the products or services concerned. Retrieving the tweets related to a given entity is, however, a challenging task, as entity names are often (deliberately) ambiguous, e.g., Apple, Blackberry, Friends. Nevertheless, identifying the relevant entities is an essential first step toward reliable sentiment analysis, and it is not addressed by existing systems such as TweetFeel² and TwitterSentiment³.
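The claim that entity identification must precede sentiment analysis can be illustrated with a minimal sketch. It assumes a toy list of made-up tweets, a trivial word-list sentiment scorer, and a hypothetical context-keyword check (`is_relevant`) standing in for an entity-relevance classifier; none of these are TweetSpector's actual components. Matching on the ambiguous name "Apple" alone pulls fruit tweets into the company's sentiment average, while filtering for entity relevance first changes the result.

```python
# Hypothetical sentiment word lists (an assumption; real systems use trained models).
POSITIVE = {"love", "great", "delicious"}
NEGATIVE = {"crashing", "broken", "hate"}

# Hypothetical context keywords standing in for an entity-relevance classifier
# for the company Apple Inc. (not TweetSpector's method).
APPLE_INC_CONTEXT = {"iphone", "ipad", "mac", "ios", "itunes"}


def tokens(text: str) -> set[str]:
    """Lower-cased whitespace tokens with surrounding punctuation stripped."""
    return {w.strip(".,!?:;").lower() for w in text.split()}


def sentiment(text: str) -> int:
    """Score +1 per positive word and -1 per negative word."""
    toks = tokens(text)
    return len(toks & POSITIVE) - len(toks & NEGATIVE)


def mentions_name(text: str, name: str) -> bool:
    """Naive retrieval: does the tweet contain the (ambiguous) entity name?"""
    return name.lower() in tokens(text)


def is_relevant(text: str) -> bool:
    """Hypothetical relevance check: any company-specific context word present."""
    return bool(tokens(text) & APPLE_INC_CONTEXT)


def average_sentiment(tweets: list[str]) -> float:
    """Mean sentiment score, or 0.0 for an empty list."""
    return sum(sentiment(t) for t in tweets) / len(tweets) if tweets else 0.0


tweets = [
    "I love apple pie, delicious!",
    "My Apple iPhone keeps crashing",
    "Great new Mac from Apple",
    "Picked an apple from the tree",
]

# Step 1 only: keyword match on the ambiguous name.
keyword_hits = [t for t in tweets if mentions_name(t, "apple")]
# Step 1 + entity filtering: keep only tweets about the company.
entity_hits = [t for t in keyword_hits if is_relevant(t)]

print(len(keyword_hits), average_sentiment(keyword_hits))
print(len(entity_hits), average_sentiment(entity_hits))
```

On these four made-up tweets, the keyword-only average is 0.5 (mildly positive, driven by the apple-pie tweet), whereas the two tweets that pass the relevance filter average 0.0.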
While there are a number of techniques for identifying named entities in unstructured text, they are often not directly applicable in this case, as tweet messages are very short (at most 140 characters). This demonstration introduces TweetSpector, a tool that addresses this retrieval task and links tweet messages to a given entity. Our retrieval methods rely on classification techniques that exploit concise descriptions of entity-relevant information, which we call entity profiles.
The demonstration presents the following features of TweetSpector:
- Entity Profile Creation: TweetSpector can create profiles automatically. Given a relevant Web page for an entity, it applies named-entity recognition, NLTK, WordNet and Web data extraction techniques to construct the entity's profile. TweetSpector also supports manual profile construction, where users can build arbitrary entity profiles, as well as manual and automatic updates to initially constructed profiles (the profiles are thus dynamic). The profiles can also be visualized as word clouds.
- Tweet Classification: TweetSpector displays the classification results in real time (see Figure 1). For example, it displays a stream of tweets and indicates whether or not each message is related to the company Apple Inc. The classification techniques are substantially extended versions of our earlier work [1].
- User Feedback: Users can indicate whether the proposed classification is correct or not, and the algorithms take this feedback into account. TweetSpector can also collect human input through crowdsourcing, via an interface to Amazon Mechanical Turk.
- Dashboard: TweetSpector can display performance metrics and statistical information related to the entity on a dashboard.
Figure 1: TweetSpector: Various Features
¹ http://www.twitter.com
² http://www.tweetfeel.com
³ http://twittersentiment.appspot.com
2. ACKNOWLEDGEMENTS
This work was partly funded by the NisB project (FP7-ICT-256955) and by the European Commission in the PlanetData NoE (contract nr. 257641).
3. REFERENCES
[1] Surender Reddy Yerva, Zoltán Miklós, and Karl Aberer. Entity-based Classification of Twitter Messages. International Journal of Computer Science & Applications, 9(1):88–115, 2012.
Copyright is held by the author/owner(s).
SIGIR'12, August 12–16, 2012, Portland, Oregon, USA.
ACM 978-1-4503-1472-5/12/08.