Aspect Miner
   Fine-grained feature level opinion mining
          from rated review corpora

              MSc Thesis Defense | February 2012



                     Stelios Karabasakis
       Dept. of Informatics and Telecommunications
       National and Kapodistrian University of Athens

in association with the Knowledge Discovery in Databases Laboratory
                           kddlab.di.uoa.gr
INTRODUCTION


                       Opinion Mining: an overview
 What is it? The task of recognizing and classifying the
  opinions and sentiments expressed in unstructured text.

 Use cases                                 Opinion sources
  • product comparison                      • news
  • opinion summarization                   • blogs
  • opinion-aware recommendation systems    • reviews  ← our focus in this work
  • opinion-aware online advertising        • user comments
  • reputation management                   • social networks
  • business intelligence                   • forums
  • government intelligence                 • discussion groups

Stelios Karabasakis   Aspect Miner: Fine-grained feature-level opinion mining from rated review corpora   Feb 2012   2
INTRODUCTION


                      Reviews

 • Popular form of user-generated content
     » consumers use them to make informed choices
     » businesses use them to gauge and monitor consumer sentiment

 • Covering many distinct domains, such as movies, books,
   hotels, restaurants, goods and services

INTRODUCTION


                                                    Ratings
 • Every online review typically
   carries a rating
     » picked by the review author
     » summarizes the sentiment of
         the text

 • Corpora of rated reviews are
     » abundant on the web
     » potentially useful for
       supervised opinion mining
     » largely ignored in the literature!


INTRODUCTION


                      Opinion Mining is challenging
 Not as simple as counting positive vs. negative words
     It is pointless to discuss why Hitchcock was a genius.

 Distinct opinions about different topics in the same sentence
     The top-notch production values are not enough to distract from a
     clichéd story that lacks heart and soul.

 Semantics of subjective expressions are domain-dependent
     unpredictable plot twist, gloomy atmosphere                                                 (movies)
     unpredictable service quality, gloomy room                                                  (hotels)


INTRODUCTION


          Opinion Mining is a text classification problem

 classification dimensions
     • subjectivity: factual vs. subjective statements
     • polarity: positive vs. negative sentiment
     • intensity: weak vs. strong sentiment

 classification granularity
     • binary
     • multiclass

 ? Motivating question
 How can we train a system to distinguish
 among multiple degrees of sentiment?




INTRODUCTION


                                     Classification levels
                      document level
 In “Game of Thrones” (2011), the transition from book to screen is
 remarkably successful. The carefully chosen location and cast, the
 top-notch cinematography and the seamlessness of its narrative come
 together brilliantly. The new HBO show offers compelling drama,
 even when rehashing old fantasy themes.                    → positive




INTRODUCTION


                                    Classification levels
                      sentence level
 In “Game of Thrones” (2011), the transition from
 book to screen is remarkably successful.                   → positive
 The carefully chosen location and cast, the top-notch
 cinematography and the seamlessness of its narrative
 come together brilliantly.                                 → positive
 The new HBO show offers compelling drama, even when
 rehashing old fantasy themes.                              → positive




INTRODUCTION


                                    Classification levels
                      feature level
         features = domain-specific ratable properties
 In “Game of Thrones” (2011), the transition from           → adaptation: positive
 book to screen is remarkably successful.
 The carefully chosen location and cast, the top-notch      → production: positive
 cinematography and the seamlessness of its narrative          cast: positive
 come together brilliantly.                                    direction: positive
                                                               plot: positive
 The new HBO show offers compelling drama, even when        → serialization: positive
 rehashing old fantasy themes.                                 subject: negative

                                                                  ? Motivating question
                                                                  How can we identify feature terms
                                                                  and the features they refer to?
INTRODUCTION


                                       Problem description

  Produce rich, fine-grained, feature-oriented review summaries
        by analyzing reviews at the sentence level and aggregating the results


                                  Sample summary
  “Avatar” (2009)                            aggregated summary of 90 reviews
   aspect       mentions   sentiment mean           sentiment dispersion
   direction      217      9/10 STRONGLY POSITIVE   17% UNANIMOUS AGREEMENT
   story          152      8/10 POSITIVE            32% GENERAL AGREEMENT
   acting         177      4/10 WEAKLY NEGATIVE     56% MIXED REACTION



INTRODUCTION


                                 Solution components
 a sentiment lexicon,                term            prior sentiment
 multiclass and adapted             masterpiece     10 (very strongly positive)
 to the target domain               good             8 (positive)
                                    mediocre         5 (very weakly negative)
                                    terrible         2 (strongly negative)

 a feature lexicon                  feature term     feature
 for the target domain              protagonist      CAST
                                    performance      CAST
                                    deliver          CAST
                                    camera           DIRECTION
                                    cinematography   DIRECTION
                                    dialogue         WRITING
                                    script           WRITING


        and a set of linguistic rules for sentence classification
INTRODUCTION


                                The Aspect Miner system
                      (a proof-of-concept implementation of our approach)

 Training subsystem:
   Training corpus (rated reviews) → Lexical Analyzer → Index of terms
   Index of terms → Feature identifier → Feature lexicon
   Index of terms → Term classifier   → Sentiment lexicon

 Classification:
   Text to classify → Sentence classifier (consults the feature and
   sentiment lexicons) → Result: Feature-level sentiments

             Key features: modular architecture, unsupervised,
                           domain agnostic, configurable granularity
INTRODUCTION


                      Aspect Miner implementation*
 • Implemented in Java with
     » NekoHTML for scraping
     » JDBC/MySQL for dataset storage
     » Lucene as a lexical analysis API and for indexing
     » Wordnet & JWNL for lemmatization
     » Stanford Parser for POS-tagging & typed dependency parsing
     » Mallet’s LDA implementation for topic modeling
     » GraphViz for visualizations


                                                                 * source code (MIT-licensed) available from
                                                                 github.com/skarabasakis/ImdbAspectMiner

INTRODUCTION


                                        Training dataset*
 107,646 movie reviews from IMDB.com, rated 1-10 stars
                                                  *available as an SQL dump from http://db.tt/vAthzJRL



            [histogram: number of reviews by review length in words;
             mean = 291 words, median = 228 words]


Sentiment Lexicon Construction
Designing a fine-grained term classifier
SENTIMENT LEXICON


                                                           Terms

A term is a (base form, part of speech) tuple
     » part of speech                    {VERB, NOUN, ADJECTIVE, ADVERB}

     » a term represents all inflected forms and spellings of a word
          e.g. {choose, chooses, chose, chosen, …} → [choose VERB]
               {localise, localize, …} → [localize VERB]

     » terms can be compound
         e.g. [work out VERB]                                      [common sense NOUN]
              [meet up with VERB]                                  [as a matter of fact ADVERB]




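This term representation can be sketched as follows (Python here for brevity; the actual system is in Java and resolves base forms through WordNet/JWNL, so the tiny irregular-form and spelling tables below are purely illustrative stand-ins):

```python
# Minimal sketch: normalize surface forms into (base form, POS) term tuples.
# The hand-rolled tables below are hypothetical; the real lookup uses WordNet.

IRREGULAR = {"chose": "choose", "chosen": "choose", "chooses": "choose"}
SPELLING = {"localise": "localize"}

def to_term(word, pos):
    """Map an inflected or variant spelling to its (base form, POS) term."""
    w = word.lower()
    w = IRREGULAR.get(w, w)   # collapse inflected forms
    w = SPELLING.get(w, w)    # collapse spelling variants
    return (w, pos)
```

All inflected forms and spellings of a word thus index the same lexicon entry.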
SENTIMENT LEXICON


                       Lexical analyzer
 Purpose: to extract terms from texts

 Pipeline (applied to each review in the training corpus):
   Tokenization → POS tagging → Named Entity identification →
   Lemmatization → Comparatives annotation → Negation scope
   resolution → Stop word removal → Open-class word filtering
   → Bags of terms (one per document)

 » Identifies the base form of words & compounds
     • Uses Wordnet to look up base forms

 » Eliminates non-subjective words
     • Stop words, including very common terms (be, have, …)
     • Named Entities (i.e. proper nouns)
     • all articles, pronouns, prepositions etc.

 » Eliminates words that would be misleading for sentiment classification
     • Comparatives & superlatives
     • Words within a negation scope


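The filtering stages at the end of the pipeline can be sketched as a single pass over tagged tokens. This is a simplified Python sketch, not the Java implementation: the (lemma, POS, in-negation-scope) triple format and the tiny stop list are assumptions made for illustration.

```python
# Hypothetical stop list; the real analyzer uses a much larger one.
STOP = {"be", "have", "the", "a", "not", "do", "until", "to", "in"}

def bag_of_terms(tagged_tokens):
    """tagged_tokens: (lemma, POS, in_negation_scope) triples.
    Returns a bag (term -> count) of indexable terms."""
    bag = {}
    for lemma, pos, negated in tagged_tokens:
        if negated:
            continue                      # drop words inside a negation scope
        if pos not in {"VERB", "NOUN", "ADJECTIVE", "ADVERB"}:
            continue                      # open-class word filtering
        if lemma in STOP:
            continue                      # stop word removal
        term = (lemma, pos)
        bag[term] = bag.get(term, 0) + 1
    return bag
```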
SENTIMENT LEXICON


                                  Lexical analysis example
                The most dramatic moment in the Sixth Sense does not occur until the
                final minutes and the jaw dropping twist Shyamalan has been building up to.

                       ↓ Lemmatize
                       ↓ Eliminate
                       ↓ Get indexable terms

                [the intermediate output of each step is shown as a figure
                 in the original slide]




SENTIMENT LEXICON


             Previous approaches to term classification
Lexicon-based approach
• Prior sentiment inferred from lexical associations
  (synonyms, antonyms, hypernyms etc.) in a dictionary
• High accuracy, limited coverage
• Notable example: Sentiwordnet (Esuli & Sebastiani 2006)
Corpus-based approach
• Prior sentiment inferred from correlation patterns
  (and, or, either…or, but etc.) in a training corpus
• Extended coverage, lower accuracy
• Notable examples: Hatzivassiloglou & McKeown 1997, Turney & Littman 2003,
     Popescu & Etzioni 2005, Ding, Liu & Yu 2008


SENTIMENT LEXICON


                         Ratings-based term classification
Our proposal: a ratings-based approach

• Requires a training set of rated reviews

• Prior sentiment inferred from the distribution of ratings
  among all the reviews where a term occurs, i.e. the
  rating histogram of the term

  [sample histogram shapes shown in the original slide:
   positive term, negative term, neutral term, polysemous term]

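Building rating histograms from a rated corpus is straightforward to sketch (Python for brevity; the input format — (rating, terms) pairs — is an assumption of this sketch):

```python
from collections import defaultdict

def rating_histograms(reviews):
    """reviews: iterable of (rating, terms-in-review) pairs.
    Returns term -> {rating: number of reviews with that rating
    in which the term occurs}."""
    hist = defaultdict(lambda: defaultdict(int))
    for rating, terms in reviews:
        for term in set(terms):          # count each term once per review
            hist[term][rating] += 1
    return hist
```

A term's histogram is then the per-rating row of this index, ready for weighting and classification.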
SENTIMENT LEXICON


                       IMDB dataset: Ratings distribution
                [dual-axis bar chart: number of reviews and number of
                 terms per rating value, 1–10]

                         Caution: Ratings are not evenly distributed
                                  across the training corpus.
SENTIMENT LEXICON


                             Rating frequency weighting

Why? Weighting is necessary to
     » eliminate training set biases
     » make rating frequencies comparable to each other

How? Multiply every rating frequency in a histogram
 by that rating’s weight w_r, calculated as follows:
     » c_r := cumulative term count of all reviews with rating r
     » We pick w_r in such a way that the products w_r · c_r are equal for all r
         • The most predominant rating in the dataset has w_r = 1
           (equivalently, w_r = c_max / c_r)
         • The less frequent the rating, the higher its weight

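The weighting rule follows directly from the two constraints above: if w_r · c_r must be equal for all ratings and the most frequent rating gets weight 1, then w_r = c_max / c_r. A sketch (Python for brevity):

```python
def rating_weights(term_counts):
    """term_counts: rating -> cumulative term count of all reviews with
    that rating.  Picks w_r so that w_r * c_r is equal for every rating,
    which gives the most frequent rating weight 1."""
    c_max = max(term_counts.values())
    return {r: c_max / c for r, c in term_counts.items()}

def weighted_histogram(hist, weights):
    """Multiply every rating frequency by that rating's weight."""
    return {r: f * weights[r] for r, f in hist.items()}
```

Less frequent ratings receive proportionally larger weights, as the slide states.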
SENTIMENT LEXICON


                                Some sample histograms
                                       extracted from the IMDB dataset




Stelios Karabasakis         Aspect Miner: Fine-grained feature-level opinion mining from rated review corpora   Feb 2012   23
SENTIMENT LEXICON


                                Designing a term classifier
                       input: weighted rating histogram for a term
                      output: one or more* sets of significant ratings
                                                                                  * if term is polysemous


                                                                      A weighted mean function can
                                                                      condense a set of significant
                                                                      ratings into a single rating.

                                                                      This rating indicates the term’s
                                                                      sentiment.


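The condensation step is a frequency-weighted mean over the histogram, which can be sketched in a few lines (Python for brevity):

```python
def mean_rating(hist):
    """Condense a (weighted) rating histogram into one representative
    rating: the frequency-weighted mean of the ratings."""
    total = sum(hist.values())
    return sum(r * f for r, f in hist.items()) / total
```

For example, a term seen once at rating 9, once at 7 and twice at 8 condenses to 8.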
SENTIMENT LEXICON


                                         Neutrality criterion
           For a term to be neutral, its rating histogram must
                   approximate a uniform distribution



                                         [uniformity is tested against a tolerance
                                          threshold θ, where 0 < θ ≤ 1]




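One way to operationalize "approximately uniform with tolerance θ" is to require the least frequent rating to reach at least θ times the most frequent one. This min/max formulation is an assumption of the sketch, not necessarily the thesis' exact criterion:

```python
def is_neutral(hist, theta=0.5, scale=range(1, 11)):
    """Heuristic uniformity test with tolerance 0 < theta <= 1.
    A rating absent from the histogram rules out uniformity."""
    freqs = [hist.get(r, 0) for r in scale]
    if min(freqs) == 0:
        return False
    return min(freqs) / max(freqs) >= theta
```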
SENTIMENT LEXICON


                             Term classification schemes
   Scheme 1: Peak Classifier
    Picks the histogram’s
     peak rating as the only
     significant rating




   Pros Simplest classifier possible. Useful as a comparison baseline.
        Surprisingly capable at classifying polarity (almost 2/3 accurate)
   Cons Can’t detect polysemy
        Poor at classifying intensity
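The Peak classifier is simple enough to state in one line (Python sketch):

```python
def peak_classifier(hist):
    """Return the histogram's peak rating as the only significant rating."""
    return {max(hist, key=hist.get)}
```

With no window or cutoff logic, it can only ever return one rating class, which is why it cannot detect polysemy.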
SENTIMENT LEXICON


                             Term classification schemes
   Scheme 2: Positive/Negative Area Classifier (PN)
    All ratings above a cutoff
     frequency are significant
        Cutoff frequency should
           be set a little bit above
           the frequency average.
    Returns separate sets for
       positive and negative
       ratings


   Pros Better at classifying intensity
        Makes an attempt at detecting polysemy
   Cons Weak terms can be mistaken for polysemous
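A PN sketch in Python: the cutoff sits "a little bit above" the average frequency, modeled here by a margin factor; the 10% margin and the 1–10 scale midpoint of 5.5 are assumptions of the sketch.

```python
def pn_classifier(hist, margin=1.1, scale=range(1, 11), neutral=5.5):
    """Positive/Negative Area classifier: every rating whose (weighted)
    frequency exceeds a cutoff slightly above the average is significant.
    Returns the (positive ratings, negative ratings) sets."""
    freqs = {r: hist.get(r, 0) for r in scale}
    cutoff = margin * sum(freqs.values()) / len(freqs)
    significant = {r for r, f in freqs.items() if f > cutoff}
    return ({r for r in significant if r > neutral},
            {r for r in significant if r < neutral})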
SENTIMENT LEXICON


                             Term classification schemes
   Scheme 3: Widest Window Classifier (WW)
    Looks for windows of
     consecutive significant ratings
    Ratings are added to windows
     from most to least frequent
    Significant rating windows must
     satisfy 2 constraints
        minimum coverage: windows must
         contain at least a minimum share of the samples
        be as wide as possible
    Returns as many rating classes
       as the windows it detects

   Pros Avoids detecting false polysemy
        Avoids biases exhibited by the other classification schemes
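The WW scheme can be sketched as a greedy pass over ratings in descending frequency order. The 75% coverage threshold and the exact stopping policy are assumptions of this sketch (the slide's coverage figure is not reproduced here):

```python
def ww_classifier(hist, min_coverage=0.75, scale=range(1, 11)):
    """Widest Window sketch: ratings are added from most to least
    frequent; a rating extends a window it is adjacent to, otherwise
    it opens a new window.  Growth stops once the windows jointly
    cover min_coverage of the samples."""
    total = sum(hist.get(r, 0) for r in scale) or 1
    windows = []                          # list of [lo, hi] inclusive ranges
    covered = 0.0
    for r in sorted(scale, key=lambda r: hist.get(r, 0), reverse=True):
        if covered / total >= min_coverage:
            break
        placed = False
        for w in windows:
            if r == w[0] - 1:
                w[0] = r; placed = True; break
            if r == w[1] + 1:
                w[1] = r; placed = True; break
        if not placed:
            windows.append([r, r])        # seed a new window
        covered += hist.get(r, 0)
    return [set(range(lo, hi + 1)) for lo, hi in windows]
```

A unimodal histogram yields one window (one rating class); a bimodal one yields two, i.e. a polysemy candidate.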
SENTIMENT LEXICON


               Classifier evaluation: Ratings Distribution
                                   We classified 33,000 terms
                          that appear ≥5 times in the IMDB dataset.
      Conclusion: WW classifier distributes rating classes more evenly

                [stacked bar chart: distribution of primary rating classes
                 for the PEAK, PN and WW classifiers]

SENTIMENT LEXICON


                            Classifier evaluation: Polarity
      We evaluate against a reference lexicon of 5272 terms
                 based on the MPQA and General Inquirer subjectivity lexicons.


   Classifier      Accuracy   Class      Precision   Recall   F1-Score
   PEAK            63.6%      POSITIVE   55.5%       44.2%    49.2%
                              NEGATIVE   67.3%       65.3%    66.3%
   PN              66.2%      POSITIVE   62.4%       58.4%    60.4%
                              NEGATIVE   68.4%       72.3%    70.3%
   WW              70.1%      POSITIVE   70.4%       86.2%    77.5%
                              NEGATIVE   69.6%       60.5%    64.8%
   SentiWordnet    73.2%      POSITIVE   63.6%       61.3%    62.4%
                              NEGATIVE   83.6%       48.3%    61.3%

    WW is the most accurate of the 3 proposed classifiers
    But not as accurate as SentiWordnet
    However, WW is more accurate for domain-specific terms



SENTIMENT LEXICON


                                              Classifier evaluation: Intensity
         We evaluate against a test set of 443 strong + 323 weak terms
                                          based on the General Inquirer subjectivity lexicon.

                              [bar chart: % of terms in the WW lexicon per
                               intensity class (1–5), for weak vs. strong
                               reference terms]

                              Using the WW classifier to classify intensity:
                               78% of strong terms are classified 3 and above
                               83% of weak terms are classified 3 and below


SENTIMENT LEXICON


                      The Aspect Miner sentiment lexicon*




          A reusable sentiment lexicon for the movie review domain
                                                                    * downloadable from
           github.com/skarabasakis/ImdbAspectMiner/blob/master/imdb_sentiment_lexicon.xls
Feature Identification
Using topic models for feature discovery
FEATURE IDENTIFICATION


                      Approaches to feature identification
The traditional approach: discovery through heuristics
• frequency: commonly occurring noun phrases are often features
     (Hu & Liu 2004)
• co-occurrence: terms commonly found near subjective expressions
     may be features (Kim & Hovy 2006, Qiu et al. 2011)
• language patterns: in phrases such as 'F of P' or 'P has F', P is a
     product and F is a feature (Popescu & Etzioni 2005)
• background knowledge: user annotations, ontologies, search
     engine results, Wikipedia data…
An up-and-coming approach: topic modeling

FEATURE IDENTIFICATION


                                        Topic Modeling
                  Probabilistic Topic Models can model the
                abstract topics that occur in a set of documents


                                                                                          documents are
                                                                                          mixtures of topics




                                                                                          topics are
                                                                                          distributions over words



FEATURE IDENTIFICATION


                                        Topic Modeling
Probabilistic topic models
• require that the user specifies a number of topics
     » Topics are just numbers – their semantic interpretation is not the model’s concern

• make an assumption about the probability distribution of topics
• define a probabilistic procedure for generating documents from topics
     » by inverting this procedure, we can infer topics from documents


A popular topic model: Latent Dirichlet Allocation (LDA)
• assumes that topics follow a Dirichlet prior distribution
     » i.e. each document is associated with just a small number of topics


FEATURE IDENTIFICATION


                                   Topics vs. Features
   ? Motivating question                                          Here are a few sample topics we
   Features are a form of topics. Can we                          got from running LDA on the
   use topic models to discover features?                         IMDB dataset

   ROLE               SCRIPT                      WAR                          POLICE                     CAR
   ACTOR              IDEA                        HERO                         CASE                       CHASE
   PERFORMANCE        DIALOGUE                    ATTACK                       MYSTERY                    SHOOT
   PLAY               WRITE                       GROUP                        VICTIM                     VEHICLE
   LEAD               PLOT                        AIRPLANE                     SOLVE                      COP
   CAST               SCREENPLAY                  BUNCH                        MURDER                     DRIVE
   SUPPORT            COME UP                     SOLDIER                      OFFICER                    KILL
   ACTRESS            CRAFT                       KILL                         SUSPECT                    STREET
   SHINE              EXPLAIN                     BOMB                         DETECTIVE                  BULLET
   STAR               HOLE                        ENEMY                        CRIME                      ROBBERY

       These topics are features.                                    These topics are themes.
           They are useful to us.                                We are not interested in them.
FEATURE IDENTIFICATION


                      Feature identification with LDA
Problem. Topics are global, features are local

Solution. Train topic model on shorter segments (e.g. sentences) rather
 than full documents.

Problem. Running LDA on such short segments produces noisy topics

Solution. Implement a bootstrap aggregation scheme to filter the noise:
   1. Train N topic models from different subsets of dataset
   2. Merge similar topics across models to produce a single meta-model
   » Intuition: Valid feature-topics should occur in >1 models and share many common top
       terms. Noisy topics should be isolated to specific models

FEATURE IDENTIFICATION


                                          Merging topics

                 COMEDY    0.200          COMEDY     0.180          COMEDY     0.380
                 JOKE      0.099          PARODY     0.168          PARODY     0.168
                 LAUGH     0.096    +     SATIRE     0.099    =     JOKE       0.160
                 FUN       0.088          JOKE       0.061          LAUGH      0.096
                 FORMULA   0.025          RIDICULE   0.054          SATIRE     0.099
                                                                    FUN        0.088
                                                                    RIDICULE   0.054   ← discarded
                                                                    FORMULA    0.025




       Topic Similarity for topics Tm, Tn
       » More common terms with higher
            probabilities  higher similarity


FEATURE IDENTIFICATION


                               Merging topic models
To merge two topic sets
• Merge every topic of set A into the most similar topic from set B
     » but only if that similarity is above the average pairwise similarity

To merge N topic sets
• Merge the first two, then merge the result with the third, and so on.
• At the end
     » discard topics with a low merging degree
     » if the same term ends up in more than one topic, keep it only in the
       topic where it has the highest probability
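The pairwise merge step above can be sketched as follows, with topics as term→probability dicts and a stand-in similarity measure. How unmatched topics are carried over is my assumption; the slide only specifies the "above average similarity" condition:

```python
def similarity(a, b):
    # Stand-in for the thesis's topic-similarity measure (assumption):
    # shared terms contribute the smaller of their probabilities.
    return sum(min(a[t], b[t]) for t in set(a) & set(b))


def merge_topics(a, b):
    # Probabilities of shared terms are summed, matching the slide's
    # example (COMEDY 0.200 + 0.180 -> 0.380); unique terms are kept.
    merged = dict(a)
    for term, p in b.items():
        merged[term] = merged.get(term, 0.0) + p
    return merged


def merge_topic_sets(set_a, set_b):
    """Merge every topic of set_a into its most similar topic of set_b,
    but only when that similarity is above the average pairwise similarity."""
    pairs = [similarity(a, b) for a in set_a for b in set_b]
    avg = sum(pairs) / len(pairs)
    result = list(set_b)
    for a in set_a:
        best, j = max((similarity(a, b), j) for j, b in enumerate(set_b))
        if best > avg:
            result[j] = merge_topics(a, set_b[j])
        else:
            result.append(a)  # unmatched topics carried over (assumption)
    return result


set_a = [{"comedy": 0.200, "joke": 0.099, "laugh": 0.096},
         {"soldier": 0.150, "bomb": 0.120}]
set_b = [{"comedy": 0.180, "parody": 0.168, "joke": 0.061},
         {"soldier": 0.140, "enemy": 0.110}]
merged = merge_topic_sets(set_a, set_b)
print(merged[0]["comedy"])  # ~0.380
```

Merging N sets then reduces to folding: merge the first two sets, merge that result with the third, and so on.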


FEATURE IDENTIFICATION


                                Movie feature lexicon
                      56 topics, manually labeled with 18 labels




Sentence classification
Utilizing language structure
for contextual sentiment estimation
and feature targeting
SENTENCE-LEVEL ANALYSIS


                                                    Sentiment
                      Sentiment: a (polarity, intensity) tuple, where
                        » polarity ∈ {+, −}
                        » intensity ∈ {1, 2, …, n}
                      giving 2n sentiment classes in total

   mbinary: R10 S1                        m3: R10 S3                            m5: R10 S5

   1                                1                                          1                     -5
   2                          We define a
                                    2          -3                              2                     -4
   3                   -1           3
                              mapping function                                 3                     -3
                                               -2
   4                                4                                          4                     -2
   5
                              to convert ratings to
                                    5          -1                              5                     -1
   6                          sentiment classes
                                    6          +1                              6                     +1
   7                                7                                          7                     +2
                              (preferably 1:1) +2
   8                   +1           8                                          8                     +3
   9                                9          +3                              9                     +4
  10                               10                                         10                     +5
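The three mappings can be captured by one closed-form function. The formula below is my reconstruction (it reproduces the m_binary, m3 and m5 tables), not notation from the thesis:

```python
import math

def rating_to_sentiment(rating, n):
    """Map a 1-10 review rating onto 2n sentiment classes
    {-n, ..., -1, +1, ..., +n}; there is no neutral class.

    Ratings 1-5 map to negative classes and 6-10 to positive ones,
    spreading each half evenly over the n intensity levels.
    """
    if rating <= 5:
        return -math.ceil((6 - rating) * n / 5)
    return math.ceil((rating - 5) * n / 5)

# m5 is the preferred one-to-one mapping:
print([rating_to_sentiment(r, 5) for r in range(1, 11)])
# [-5, -4, -3, -2, -1, 1, 2, 3, 4, 5]
```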

SENTENCE-LEVEL ANALYSIS


                                    Typed Dependencies
                                                                        Natalie Portman comes off as very believable,
Typed dependencies are binary                                               gaining empathy from the audience.
grammatical relations between
word pairs in a sentence
(de Marneffe et al., 2006)

    amod(relations, binary)
    type(governor,  dependent)


Typed dependency trees are
• semantically richer than syntax trees
• easier to process, because content words are connected directly
  rather than through function words
SENTENCE-LEVEL ANALYSIS


                                    Dependency types




                      Stanford Parser’s representation defines a
                          hierarchy of 48 dependency types
SENTENCE-LEVEL ANALYSIS


                      Contextual sentiment estimation
                      ? Motivating question
                      What is the contextual sentiment of a dependency,
                      given the prior sentiment of its constituents?

Examples

    "It is best to avoid watching          infmod(best/+2, avoid/−4) → −4
     any of the increasingly               xcomp(avoid/−4, watching/+2) → −2
     disappointing sequels."               advmod(disappointing/−2, increasingly/+3) → −3

Our model. We empirically developed and formally defined
• 6 outcome functions that model types of word interactions
• 42 dependency rules that cover all possible dependency patterns

SENTENCE-LEVEL ANALYSIS


                                   Outcome functions

     UNCHANGED: models an interaction where the base term imposes the sentiment
                       "It seems that they ran out of budget."

       STRONGER: the stronger term imposes the sentiment
                       "a mighty talent wasted in mass-produced rom-coms"

            AVG: both terms contribute equally to the sentiment
                       "intelligent and ambitious"

SENTENCE-LEVEL ANALYSIS


                                   Outcome functions

      INTENSIFY: models an interaction where the modifier increases the intensity of the base
                       "increasingly disappointing sequels"

        REFLECT: the modifier overrides the polarity and increases or decreases the intensity of the base
                       "impossible to enjoy unless you lower your expectations"

            NEG: the modifier diminishes or negates the base
                       "not a masterpiece, but not bad either"

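With sentiments encoded as signed integers on the ±5 scale, the outcome functions can be given a concrete form. REFLECT and INTENSIFY below are chosen to reproduce the slide examples; the exact formulas for AVG, STRONGER and NEG (in particular the shifted-negation constant) are my assumptions rather than the thesis's definitions:

```python
N = 5  # intensity scale, matching the R10 -> S5 rating mapping

def sign(s):
    return 1 if s > 0 else -1

def unchanged(base, other):
    # Base term imposes the sentiment; the other term is ignored.
    return base

def stronger(a, b):
    # The term with the higher intensity imposes its sentiment.
    return a if abs(a) >= abs(b) else b

def avg(a, b):
    # Both terms contribute equally; intensity never drops below 1.
    mean = (a + b) / 2
    return sign(mean or a) * max(1, round(abs(mean)))

def intensify(base, modifier):
    # Modifier bumps the base one intensity level, as in
    # "increasingly disappointing": -2 -> -3.
    return sign(base) * min(N, abs(base) + 1)

def reflect(base, modifier):
    # sign(base) * modifier reproduces both slide examples:
    #   infmod(best/+2, avoid/-4)    -> -4
    #   xcomp(avoid/-4, watching/+2) -> -2
    return sign(base) * modifier

def neg(base):
    # Shifted negation: "not a masterpiece" (+4) -> mildly negative (-1),
    # "not bad" (-3) -> mildly positive (+2). The constant is an assumption.
    return -sign(base) * max(1, N - abs(base))

print(reflect(2, -4), reflect(-4, 2), intensify(-2, 3), neg(4))  # -4 -2 -3 -1
```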
SENTENCE-LEVEL ANALYSIS


                        Dependency Rules: General form

                          td(pgov, pdep)  outcome_base

type label                    term patterns                                outcome function                     base specifier
                      A pattern may specify:                 one of the following:                               GOV or DEP
                      • a list of allowed parts of speech UNCHANGED NEGATED
                      • a white list of specific terms    STRONGER         AVG
                                                          INTENSIFY        REFLECT
                                                          POSITIVE         NEGATIVE

              Examples conj(*,*)  AVG_DEP
                                   advmod({n,a,r},*)  INTENSIFY_GOV
                                   amod(*,{too})  NEGATIVE_GOV
SENTENCE-LEVEL ANALYSIS


                                      Aspect Miner dependency rule set
1. Negation
   1.1        neg(*, *) → NEGATE_GOV
   1.2.1–6    det | prt | advmod | dobj | nsubj | dep (*, negTerms¹) → NEGATE_GOV
   1.3        pobj(negTerms¹, *) → NEGATE_DEP
   1.4        aux(*, negAux²) → NEGATE_GOV

2. Subjects
   2.1.1–2    nsubj | nsubjpass (*, *) → INTENSIFY_GOV
   2.2.1–2    csubj | csubjpass (*, *) → REFLECT_GOV

3. Objects
   3.1.1      dobj(negVerbs³, *) → NEGATE_DEP
   3.1.2      dobj(*, *) → REFLECT_GOV
   3.2        iobj(*, *) → UNCHANGED_GOV
   3.3        pobj(*, *) → UNCHANGED_DEP

4. Modifiers
   4.1.1–2    advmod | amod (*, {enough}) → POSITIVE_GOV
   4.2.1–2    advmod | amod (*, {too}) → NEGATIVE_GOV
   4.3        advmod({v}, *) → REFLECT_GOV
   4.4        advmod({n,a,r}, *) → INTENSIFY_GOV
   4.5        amod(*, *) → REFLECT_GOV
   4.6        infmod({a}, *) → REFLECT_GOV
   4.7        infmod({v,n,r}, *) → INTENSIFY_DEP
   4.8        partmod({a}, *) → REFLECT_DEP
   4.9        partmod({v,n,r}, *) → STRONGER_DEP
   4.10       quantmod(*, *) → INTENSIFY_GOV
   4.11       prt(*, *) → STRONGER_GOV
   4.12       prep(*, *) → REFLECT_GOV
   4.13       prep(*, {like}) → UNCHANGED_GOV

5. Clausal modifiers
   5.1        advcl({a}, *) → REFLECT_DEP
   5.2        advcl({v,n,r}, *) → UNCHANGED_DEP
   5.3        purpcl(*, *) → UNCHANGED_DEP

6. Clausal complements
   6.1.1–3    ccomp | xcomp | acomp (*, *) → REFLECT_GOV
   6.2.1–3    conj | appos | parataxis (*, *) → AVG_GOV
   6.3        dep(*, *) → STRONGER_DEP

¹ negTerms = {n't, no, not, never, none, nothing, nobody, noone, nowhere, without,
  hardly, barely, rarely, seldom, against, minus, sans}
² negAux = {should, could, would, might, ought}
³ negVerbs = {avoid, cease, decline, forget, fail, miss, neglect, refrain, refuse, stop}

SENTENCE-LEVEL ANALYSIS


                      Sentence classification algorithm
Initialization
• Generate dependency tree from sentence
• Annotate subjective terms with prior polarities from sentiment lexicon
• Annotate feature terms with labels from feature lexicon
Sentiment estimation
• Apply the closest-matching rule to every dependency relation in the tree
     » The resulting sentiment of the dependency replaces the previous sentiment of the governor node
     » Dependencies are processed in reverse postfix order (bottom to top and right to left)

Feature targeting
• The scope of a feature term is a subtree that contains the term and goes
     » all the way down to the leaves
     » all the way up to the closest clausal dependency
• the sentiment at the root of the subtree gets assigned to the feature
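A toy version of the bottom-up pass: each node carries a prior sentiment, each edge carries the outcome assigned by its matched rule, and every dependency's result replaces the governor's previous sentiment. The tree encoding is hypothetical; the real system works on full Stanford dependency trees with the 42-rule set:

```python
def sign(s):
    return 1 if s > 0 else -1

# Outcome functions, with REFLECT and INTENSIFY as on the earlier slides.
OUTCOMES = {
    "REFLECT":   lambda gov, dep: sign(gov) * dep,
    "INTENSIFY": lambda gov, dep: sign(gov) * min(5, abs(gov) + 1),
    "UNCHANGED": lambda gov, dep: gov,
}

def classify(node):
    """node = (prior_sentiment, [(outcome, child_node), ...]).

    Children are folded into the governor bottom-up and right-to-left
    (reverse postfix order); each dependency's result replaces the
    governor's previous sentiment.
    """
    sentiment, edges = node
    for outcome, child in reversed(edges):
        sentiment = OUTCOMES[outcome](sentiment, classify(child))
    return sentiment

# "It is best to avoid watching ...":
#   best/+2 --REFLECT--> avoid/-4 --REFLECT--> watching/+2
sentence = (2, [("REFLECT", (-4, [("REFLECT", (2, []))]))])
print(classify(sentence))  # -2: the sentence comes out negative
```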
SENTENCE-LEVEL ANALYSIS


                      Sentence classification example




SENTENCE-LEVEL ANALYSIS


                           Sentence polarity evaluation
     Test set: Sentence polarity dataset by Pang & Lee, 2005
                  (5331 positive + 5331 negative sentences from movie reviews)

Results
Polarity classification is accurate for
         71.5% of positive sentences
         76.9% of negative sentences
         74.2% of all sentences
Analysis of error causes
         39.0%        inaccurate dependency rule
         28.5%        misclassified term (or we picked the wrong sense)
         21.5%        erroneous sentence parsing
          8.5%        ambiguous sentence
          2.5%        dependency rules applied in the wrong order
SENTENCE-LEVEL ANALYSIS


                                       Comparative evaluation
                      Reference                              Method                                      Accuracy

                                                      Linguistic methods

                      Nakagawa, Inui & Kurohashi, 2010       majority voting                             62.9%
                      Ikeda & Takamura, 2008                 majority voting with negations              65.8%
                      Aspect Miner                           dependency rules                            74.2%

                                                   Learning-based methods

                      Andreevskaia & Bergler, 2008           naïve Bayes                                 69.0%
                      Nakagawa, Inui & Kurohashi, 2010       SVM (bag-of-features)                       76.4%
                      Arora, Mayfield et al., 2010           genetic programming                         76.9%
                      Ikeda & Takamura, 2008                 SVM (sentence-wise learning with
                                                             polarity shifting + n-grams)                77.0%
                      Nakagawa, Inui & Kurohashi, 2010       dependency tree CRFs                        77.3%


         Conclusion: Our method fares well among linguistic techniques,
          but does not match the accuracy of learning-based methods
Conclusions
Putting it all together
CONCLUSIONS

Training subsystem

• Lexical Analyzer: tokenization, POS tagging, named-entity identification,
  lemmatization, comparatives annotation, negation scope resolution,
  stop word removal and open-class word filtering turn the training corpus
  (rated reviews) into bags of terms (one per document).
• Term classifier: corpus statistics collection and term indexing feed
  term histogram generation; the PEAK, PN and WW classifiers produce
  the sentiment lexicon.
• Feature identifier: the training set is partitioned; LDA is trained on
  each partition, yielding topic models TM1 … TMN, which are aggregated
  and labeled with assistance to produce the feature lexicon.

Classification subsystem (sentence classifier)

• Text to classify → dependency parsing → dependency tree(s) →
  sentence & feature classification, driven by the sentiment lexicon,
  the feature lexicon and the dependency rule set.
• Result: feature–sentiment pairs.
CONCLUSIONS


                         Summary of contributions
• We showed the feasibility of granular prior polarity classification
  using review ratings
     » and developed a classifier that achieved at least 70% accuracy
       on the training dataset

• We suggested a bagging-inspired meta-algorithm for discovering
  feature topics with LDA

• We developed a reusable sentiment lexicon and feature lexicon
  for the movie review domain

• We created a set of linguistic rules and developed a methodology
  capable of fine-grained feature-level classification of sentences
     » and achieved 74.2% accuracy for polarity classification
       on our test dataset.



CONCLUSIONS


                          Suggested Improvements
Term classification
• Assigning a special class to intensifier terms
• Per-feature polysemy resolution

Feature identification
• Named entities as features
• Applying multi-grain topic models for the discovery of local topics,
  e.g. MG-LDA (Titov & McDonald, 2008)

Sentence-level classification
• Supervised learning of rules: replace the manually-built rule set with
  a set of rules inferred from frequent dependency patterns.

CONCLUSIONS


                                                            References
             For a complete list of references, see the full report (in Greek):
                                      http://j.mp/AspectMiner

B. Liu, "Sentiment analysis and subjectivity," Handbook of Natural Language Processing, 2010.
B. Pang and L. Lee, "Opinion mining and sentiment analysis," Foundations and Trends in Information
      Retrieval, vol. 2, no. 1–2, pp. 1–135, 2008.
A. Esuli and F. Sebastiani, "SentiWordNet: A publicly available lexical resource for opinion mining,"
      in Proceedings of LREC, 2006, vol. 6, pp. 417–422.
V. Hatzivassiloglou and K. R. McKeown, "Predicting the semantic orientation of adjectives," in
      Proceedings of the eighth conference of the European chapter of the Association for
      Computational Linguistics, 1997, pp. 174–181.
P. Turney, M. L. Littman, et al., "Measuring praise and criticism: Inference of semantic orientation
      from association," ACM Transactions on Information Systems (TOIS), 2003.
A. M. Popescu and O. Etzioni, "Extracting product features and opinions from reviews," in
      Proceedings of the conference on Human Language Technology and Empirical Methods in
      Natural Language Processing, 2005, pp. 339–346.
M. Hu and B. Liu, "Mining and summarizing customer reviews," in Proceedings of the tenth ACM
      SIGKDD international conference on Knowledge Discovery and Data Mining, 2004, pp. 168–177.
X. Ding, B. Liu, and P. S. Yu, "A holistic lexicon-based approach to opinion mining," in Proceedings
      of the international conference on Web Search and Web Data Mining, 2008, pp. 231–240.
I. Titov and R. McDonald, "Modeling online reviews with multi-grain topic models," in Proceedings
      of the 17th international conference on World Wide Web, 2008, pp. 111–120.
T. Nakagawa, K. Inui, and S. Kurohashi, "Dependency tree-based sentiment classification using
      CRFs with hidden variables," in Human Language Technologies: The 2010 Annual Conference
      of the North American Chapter of the Association for Computational Linguistics, 2010,
      pp. 786–794.
A. Andreevskaia and S. Bergler, "When specialists and generalists work together: Overcoming
      domain dependence in sentiment tagging," ACL-08: HLT, 2008.
D. Ikeda and H. Takamura, "Learning to shift the polarity of words for sentiment classification,"
      Computational Intelligence, vol. 25, no. 1, pp. 296–303, 2008.


Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 

Último (20)

FILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipinoFILIPINO PSYCHology sikolohiyang pilipino
FILIPINO PSYCHology sikolohiyang pilipino
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)Global Lehigh Strategic Initiatives (without descriptions)
Global Lehigh Strategic Initiatives (without descriptions)
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 

Destacado

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

Destacado (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Aspect Miner: Fine-grained, feature-level opinion mining from rated review corpora

  • 1. Aspect Miner: Fine-grained feature-level opinion mining from rated review corpora. MSc Thesis Defense | February 2012. Stelios Karabasakis, Dept. of Informatics and Telecommunications, National and Kapodistrian University of Athens, in association with the Knowledge Discovery in Databases Laboratory, kddlab.di.uoa.gr
  • 2. INTRODUCTION: Opinion Mining, an overview. What is it? The task of recognizing and classifying the opinions and sentiments expressed in unstructured text. Use cases: product comparison; opinion summarization; opinion-aware recommendation systems; opinion-aware online advertising; reputation management; business intelligence; government intelligence. Opinion sources: news; blogs; reviews (our focus in this work); user comments; social networks; forums; discussion groups. Stelios Karabasakis, Aspect Miner: Fine-grained feature-level opinion mining from rated review corpora, Feb 2012.
  • 3. INTRODUCTION: Reviews. A popular form of user-generated content: consumers use them to make informed choices, and businesses use them to gauge and monitor consumer sentiment. Reviews cover many distinct domains, such as movies, books, hotels, restaurants, goods, and services.
  • 4. INTRODUCTION: Ratings. Every online review typically carries a rating, picked by the review author, that summarizes the sentiment of the text. Corpora of rated reviews are abundant on the web, potentially useful for supervised opinion mining, and largely ignored in the literature!
  • 5. INTRODUCTION: Opinion Mining is challenging. It is not as simple as counting positive vs. negative words: "It is pointless to discuss why Hitchcock was a genius." Distinct opinions about different topics can appear in the same sentence: "The top-notch production values are not enough to distract from a clichéd story that lacks heart and soul." And the semantics of subjective expressions are domain-dependent: "unpredictable plot twist", "gloomy atmosphere" (movies) vs. "unpredictable service quality", "gloomy room" (hotels).
  • 6. INTRODUCTION: Opinion Mining is a text classification problem. Classification dimensions: subjectivity (factual vs. subjective statements); polarity (positive vs. negative sentiment); intensity (weak vs. strong sentiment). Classification granularity: binary or multiclass. Motivating question: how can we train a system to distinguish among multiple degrees of sentiment?
  • 7. INTRODUCTION: Classification levels, document level. Example review, classified positive as a whole: "In 'Game of Thrones' (2011), the transition from book to screen is remarkably successful. The carefully chosen location and cast, the top-notch cinematography and the seamlessness of its narrative come together brilliantly. The new HBO show offers compelling drama, even when rehashing old fantasy themes."
  • 8. INTRODUCTION: Classification levels, sentence level. The same example, with each sentence classified individually: all three sentences are classified positive.
  • 9. INTRODUCTION: Classification levels, feature level. Features = domain-specific ratable properties. In the same example: adaptation positive; production positive; cast positive; direction positive; plot positive; serialization positive; subject negative. Motivating question: how can we identify feature terms and the features they refer to?
  • 10. INTRODUCTION: Problem description. Produce rich, fine-grained, feature-oriented review summaries by analyzing reviews at the sentence level and aggregating the results. Sample summary for "Avatar" (2009), aggregated from 90 reviews (aspect / mentions / mean sentiment / sentiment dispersion): direction, 217 mentions, 9/10 strongly positive, 17% (unanimous agreement); story, 152 mentions, 8/10 positive, 32% (general agreement); acting, 177 mentions, 4/10 weakly negative, 56% (mixed reaction).
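The aggregation step behind such a summary can be sketched in a few lines. This is an illustrative sketch, not the thesis code: the `summarize` function, the input shape (a list of 1-10 ratings per feature), and the dispersion measure (population standard deviation normalized by its maximum on a 1-10 scale) are all assumptions made for the example.

```python
from statistics import mean, pstdev

def summarize(feature_sentiments):
    """Aggregate per-mention feature sentiments (1-10 ratings collected at
    the sentence level) into a per-feature summary: mention count, mean
    sentiment, and a dispersion percentage. The dispersion formula here is
    an assumed stand-in for the measure shown on the slide."""
    summary = {}
    for feature, ratings in feature_sentiments.items():
        summary[feature] = {
            "mentions": len(ratings),
            "mean_sentiment": round(mean(ratings), 1),
            # pstdev on a 1-10 scale is at most 4.5, so this yields 0-100%
            "dispersion": round(100 * pstdev(ratings) / 4.5),
        }
    return summary
```

Low dispersion then reads as "unanimous agreement", high dispersion as "mixed reaction".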
  • 11. INTRODUCTION: Solution components. (1) A sentiment lexicon, multiclass and adapted to the target domain, mapping terms to prior sentiments, e.g. masterpiece 10 (very strongly positive), good 8 (positive), mediocre 5 (very weakly negative), terrible 2 (strongly negative). (2) A feature lexicon for the target domain, mapping feature terms to features, e.g. protagonist, performance, deliver to CAST; camera, cinematography to DIRECTION; dialogue, script to WRITING. (3) A set of linguistic rules for sentence classification.
  • 12. INTRODUCTION: The Aspect Miner system, a proof-of-concept implementation of our approach. In the training subsystem, a training corpus of rated reviews passes through the Lexical Analyzer into an index of terms, which feeds a Feature identifier (producing the feature lexicon) and a Term classifier (producing the sentiment lexicon). Both lexicons drive the Sentence classifier, which turns text to classify into feature-level sentiments. Key features: modular architecture, unsupervised, domain agnostic, configurable granularity.
  • 13. INTRODUCTION: Aspect Miner implementation. Implemented in Java with NekoHTML for scraping; JDBC/MySQL for dataset storage; Lucene as a lexical analysis API and for indexing; Wordnet & JWNL for lemmatization; Stanford Parser for POS-tagging & typed dependency parsing; Mallet's LDA implementation for topic modeling; GraphViz for visualizations. Source code (MIT-licensed) available from github.com/skarabasakis/ImdbAspectMiner
  • 14. INTRODUCTION: Training dataset. 107,646 movie reviews from IMDB.com, rated 1-10 stars, available as an SQL dump from http://db.tt/vAthzJRL. Review length: mean = 291 words, median = 228 words.
  • 15. Sentiment Lexicon Construction Designing a fine-grained term classifier
  • 16. SENTIMENT LEXICON: Terms. A term is a (base form, part of speech) tuple, where the part of speech is one of {VERB, NOUN, ADJECTIVE, ADVERB}. A term represents all inflected forms and spellings of a word, e.g. {choose, chooses, chose, chosen, ...} maps to [choose VERB] and {localise, localize, ...} maps to [localize VERB]. Terms can be compound, e.g. [work out VERB], [common sense NOUN], [meet up with VERB], [as a matter of fact ADVERB].
  • 17. SENTIMENT LEXICON: Lexical analyzer. Purpose: to extract terms from texts. Pipeline stages: tokenization, POS tagging, Named Entity identification, lemmatization, comparatives annotation, negation scope resolution, stop word removal, open-class word filtering. The analyzer identifies the base form of words and compounds (using Wordnet to look up base forms); eliminates non-subjective words (stop words, including very common terms such as be and have; Named Entities, i.e. proper nouns; all articles, pronouns, prepositions etc.); and eliminates words that would be misleading for sentiment classification (comparatives and superlatives; words within a negation scope). Input: a training corpus of rated reviews. Output: bags of terms, one per document.
  • 18. SENTIMENT LEXICON: Lexical analysis example. "The most dramatic moment in the Sixth Sense does not occur until the final minutes and the jaw dropping twist Shyamalan has been building up to." The pipeline lemmatizes the words, eliminates non-subjective and misleading ones, and outputs the remaining indexable terms.
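A toy version of this pipeline can be sketched as follows. The tiny lemma table and stop list are illustrative stand-ins for the Wordnet lookups and filtering stages the real Java analyzer performs, and the capitalization test is only a crude approximation of Named Entity identification.

```python
# Illustrative lemma table and stop list (assumptions, not the real lexicon).
LEMMAS = {"chooses": "choose", "chose": "choose", "chosen": "choose",
          "dropping": "drop", "minutes": "minute"}
STOP_WORDS = {"the", "a", "an", "in", "of", "to", "and", "has", "been",
              "does", "not", "until", "most", "be", "have", "up"}

def analyze(sentence):
    """Tokenize, drop proper nouns (approximated as capitalized
    mid-sentence words), lemmatize via the toy table, filter stop words,
    and return the bag of indexable terms."""
    terms = []
    for i, tok in enumerate(sentence.split()):
        if i > 0 and tok[0].isupper():   # crude Named Entity filter
            continue
        word = LEMMAS.get(tok.strip(".,!?").lower(), tok.strip(".,!?").lower())
        if word and word not in STOP_WORDS:
            terms.append(word)
    return terms
```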
  • 19. SENTIMENT LEXICON: Previous approaches to term classification. Lexicon-based approach: prior sentiment inferred from lexical associations (synonyms, antonyms, hypernyms etc.) in a dictionary; high accuracy, limited coverage; notable example: SentiWordnet (Esuli & Sebastiani 2006). Corpus-based approach: prior sentiment inferred from correlation patterns (and, or, either...or, but etc.) in a training corpus; extended coverage, lower accuracy; notable examples: Hatzivassiloglou & McKeown 1997, Turney & Littman 2003, Popescu & Etzioni 2005, Ding, Liu & Yu 2008.
  • 20. SENTIMENT LEXICON: Ratings-based term classification. Our proposal: a ratings-based approach. It requires a training set of rated reviews; the prior sentiment of a term is inferred from the distribution of ratings among all the reviews where the term occurs, i.e. the rating histogram of the term. The shape of the histogram distinguishes positive, negative, neutral and polysemous terms.
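Collecting the rating histogram of each term is straightforward once the lexical analyzer has produced a bag of terms per review. A minimal sketch, where the input and output data shapes are assumptions:

```python
from collections import defaultdict

def term_rating_histograms(rated_reviews):
    """Build one rating histogram per term from a corpus of rated reviews.

    rated_reviews: iterable of (rating, bag_of_terms) pairs, e.g. the
    output of the lexical analyzer. Returns {term: {rating: frequency}}.
    """
    histograms = defaultdict(lambda: defaultdict(int))
    for rating, terms in rated_reviews:
        for term in terms:
            histograms[term][rating] += 1
    return histograms
```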
  • 21. SENTIMENT LEXICON: IMDB dataset, ratings distribution. Caution: ratings are not evenly distributed across the training corpus; both the number of reviews and the number of terms vary widely from rating to rating.
  • 22. SENTIMENT LEXICON: Rating frequency weighting. Why? Weighting is necessary to eliminate training set biases and to make rating frequencies comparable to each other. How? Multiply every rating frequency in a histogram with that rating's weight w_r, calculated as follows: let n_r be the cumulative term count of all reviews with rating r; we pick w_r in such a way that the products w_r * n_r are equal for all r. The most predominant rating in the dataset has w_r = 1; the less frequent a rating, the higher its weight.
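The weighting scheme follows directly from the definition: choosing w_r = max_r' n_r' / n_r makes every product w_r * n_r equal, gives the predominant rating a weight of 1, and gives rarer ratings proportionally higher weights. A sketch (function names are illustrative):

```python
def rating_weights(term_counts):
    """Compute a weight per rating so that weighted totals are equal.

    term_counts: dict mapping rating -> cumulative term count n_r over all
    reviews with that rating. The predominant rating gets weight 1; rarer
    ratings get proportionally higher weights, so w_r * n_r is the same
    for every rating r."""
    n_max = max(term_counts.values())
    return {r: n_max / n for r, n in term_counts.items()}

def weighted_histogram(raw_freqs, weights):
    """Multiply each raw rating frequency by that rating's weight."""
    return {r: raw_freqs.get(r, 0) * weights[r] for r in weights}
```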
  • 23. SENTIMENT LEXICON: Some sample histograms extracted from the IMDB dataset.
  • 24. SENTIMENT LEXICON: Designing a term classifier. Input: the weighted rating histogram of a term. Output: one or more sets of significant ratings (more than one if the term is polysemous). A weighted mean function can condense a set of significant ratings into a single rating, which indicates the term's sentiment.
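The condensation step can be a plain frequency-weighted mean over the significant ratings, rounded back onto the rating scale. This is a sketch; the exact weighting function is not fixed on the slide.

```python
def condense(significant):
    """Condense a set of significant ratings (rating -> weighted
    frequency) into a single sentiment rating via a frequency-weighted
    mean, rounded to the nearest rating on the 1-10 scale."""
    total = sum(significant.values())
    return round(sum(r * f for r, f in significant.items()) / total)
```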
  • 25. SENTIMENT LEXICON: Neutrality criterion. For a term to be neutral, its rating histogram must approximate a uniform distribution, i.e. every normalized rating frequency must lie close to the uniform value 1/10, within a tolerance parameter ε, where 0 < ε ≤ 1.
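The neutrality test amounts to a uniformity check on the weighted histogram. The formula on the slide did not survive extraction, so the tolerance form below is a cautious reconstruction: every relative frequency must stay within a factor ε (0 < ε ≤ 1) of the uniform value 1/10, and the default ε is an assumption.

```python
def is_neutral(hist, epsilon=0.5, n_ratings=10):
    """Neutrality criterion sketch: the normalized rating histogram must
    approximate the uniform distribution, i.e. every rating's relative
    frequency must lie within epsilon * (1/n_ratings) of 1/n_ratings."""
    total = sum(hist.values())
    uniform = 1.0 / n_ratings
    return all(
        abs(hist.get(r, 0) / total - uniform) <= epsilon * uniform
        for r in range(1, n_ratings + 1)
    )
```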
  • 26. SENTIMENT LEXICON: Term classification schemes. Scheme 1: Peak Classifier. Picks the histogram's peak rating as the only significant rating. Pros: the simplest classifier possible; useful as a comparison baseline; surprisingly capable at classifying polarity (almost 2/3 accurate). Cons: can't detect polysemy; poor at classifying intensity.
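The baseline scheme is essentially a one-liner over the weighted histogram:

```python
def peak_classifier(hist):
    """Peak Classifier: picks the histogram's peak rating as the only
    significant rating and returns it as a one-element set.
    hist maps rating -> weighted frequency."""
    return {max(hist, key=hist.get)}
```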
  • 27. SENTIMENT LEXICON: Term classification schemes. Scheme 2: Positive/Negative Area Classifier (PN). All ratings above a cutoff frequency are significant; the cutoff frequency should be set a little bit above the frequency average. Returns separate sets for positive and negative ratings. Pros: better at classifying intensity; makes an attempt at detecting polysemy. Cons: weak terms can be mistaken for polysemous.
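A sketch of the PN scheme. The exact cutoff ("a little bit above the frequency average") is left open on the slide, so the 10% margin below is an assumption, as is splitting the 1-10 scale between 5 and 6.

```python
def pn_area_classifier(hist, margin=1.1):
    """Positive/Negative Area classifier sketch: every rating whose
    weighted frequency exceeds the average frequency by a small margin is
    significant; significant ratings are split into a negative set (<= 5)
    and a positive set (>= 6)."""
    cutoff = margin * sum(hist.values()) / len(hist)
    significant = {r for r, f in hist.items() if f > cutoff}
    return ({r for r in significant if r > 5},   # positive ratings
            {r for r in significant if r <= 5})  # negative ratings
```

A weak term whose frequencies straddle the middle of the scale can end up with both sets non-empty, which is exactly the "mistaken for polysemous" failure mode the slide mentions.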
  • 28. SENTIMENT LEXICON: Term classification schemes. Scheme 3: Widest Window Classifier (WW). Looks for windows of consecutive significant ratings; ratings are added to windows from most to least frequent. Significant rating windows must satisfy two constraints: minimum coverage (a window must contain at least a minimum share of the samples) and maximal width (windows should be as wide as possible). Returns as many rating classes as the windows it detects. Pros: avoids detecting false polysemy; avoids biases exhibited by the other classification schemes.
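A simplified sketch of the WW scheme. The window-growing details (what happens when two windows meet, tie-breaking among equal frequencies) and the 25% coverage threshold are assumptions; the slide only fixes the two constraints.

```python
def widest_window_classifier(hist, min_coverage=0.25):
    """Widest Window classifier sketch: grow windows of consecutive
    significant ratings, adding ratings from most to least frequent, and
    keep every window covering at least `min_coverage` of all samples.
    Returns a list of (low, high) rating windows, one per detected sense
    of the term."""
    total = sum(hist.values())
    windows = []  # list of sets of consecutive ratings
    for r in sorted(hist, key=hist.get, reverse=True):
        for w in windows:
            if r + 1 in w or r - 1 in w:  # extends an existing window
                w.add(r)
                break
        else:
            windows.append({r})
    qualified = [w for w in windows
                 if sum(hist.get(r, 0) for r in w) >= min_coverage * total]
    return [(min(w), max(w)) for w in qualified]
```

On a bimodal histogram this yields two windows, i.e. a polysemous term with one negative and one positive sense.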
  • 29. SENTIMENT LEXICON: Classifier evaluation, ratings distribution. We classified 33,000 terms that appear ≥5 times in the IMDB dataset. Conclusion: the WW classifier distributes primary rating classes more evenly than PEAK and PN.
  • 30. SENTIMENT LEXICON: Classifier evaluation, polarity. We evaluate against a reference lexicon of 5272 terms based on the MPQA and General Inquirer subjectivity lexicons. Results (accuracy; precision/recall/F1 per class): PEAK 63.6% (positive 55.5/44.2/49.2, negative 67.3/65.3/66.3); PN 66.2% (positive 62.4/58.4/60.4, negative 68.4/72.3/70.3); WW 70.1% (positive 70.4/86.2/77.5, negative 69.6/60.5/64.8); SentiWordnet 73.2% (positive 63.6/61.3/62.4, negative 83.6/48.3/61.3). WW is the most accurate of the 3 proposed classifiers, though not as accurate as SentiWordnet overall; however, WW is more accurate for domain-specific terms.
  • 31. SENTIMENT LEXICON: Classifier evaluation, intensity. We evaluate against a test set of 443 strong + 323 weak terms based on the General Inquirer subjectivity lexicon. Using the WW classifier to classify intensity: 78% of strong terms are classified at intensity 3 and above, and 83% of weak terms at intensity 3 and below.
  • 32. SENTIMENT LEXICON: The Aspect Miner sentiment lexicon, a reusable sentiment lexicon for the movie review domain, downloadable from github.com/skarabasakis/ImdbAspectMiner/blob/master/imdb_sentiment_lexicon.xls
  • 33. Feature Identification Using topic models for feature discovery
  • 34. FEATURE IDENTIFICATION: Approaches to feature identification. The traditional approach: discovery through heuristics. Frequency: commonly occurring noun phrases are often features (Hu & Liu 2004). Co-occurrence: terms commonly found near subjective expressions may be features (Kim & Hovy 2006, Qiu et al. 2011). Language patterns: in phrases such as 'F of P' or 'P has F', P is a product and F is a feature (Popescu & Etzioni 2005). Background knowledge: user annotations, ontologies, search engine results, Wikipedia data. An up-and-coming approach: topic modeling.
  • 35. FEATURE IDENTIFICATION: Topic Modeling. Probabilistic topic models can model the abstract topics that occur in a set of documents: documents are mixtures of topics, and topics are distributions over words.
  • 36. FEATURE IDENTIFICATION: Topic Modeling. Probabilistic topic models require that the user specifies a number of topics (topics are just numbers; their semantic interpretation is not the model's concern), make an assumption about the probability distribution of topics, and define a probabilistic procedure for generating documents from topics; by inverting this procedure, we can infer topics from documents. A popular topic model: Latent Dirichlet Allocation (LDA), which assumes that topics follow a Dirichlet prior distribution, i.e. each document is associated with just a small number of topics.
• 37. FEATURE IDENTIFICATION: Topics vs. Features
  Motivating question: features are a form of topics. Can we use topic models to discover features?
  Here are a few sample topics we got from running LDA on the IMDB dataset:
  » ROLE, ACTOR, PERFORMANCE, PLAY, LEAD, CAST, SUPPORT, ACTRESS, SHINE, STAR
  » SCRIPT, IDEA, DIALOGUE, WRITE, PLOT, SCREENPLAY, COME UP, CRAFT, EXPLAIN, HOLE
  » WAR, HERO, ATTACK, GROUP, AIRPLANE, BUNCH, SOLDIER, KILL, BOMB, ENEMY
  » POLICE, CASE, MYSTERY, VICTIM, SOLVE, MURDER, OFFICER, SUSPECT, DETECTIVE, CRIME
  » CAR, CHASE, SHOOT, VEHICLE, COP, DRIVE, KILL, STREET, BULLET, ROBBERY
  The first two topics are features; they are useful to us. The last three are themes; we are not interested in them.
• 38. FEATURE IDENTIFICATION: Feature identification with LDA
  Problem: topics are global, features are local.
  Solution: train the topic model on shorter segments (e.g. sentences) rather than full documents.
  Problem: running LDA on such short segments produces noisy topics.
  Solution: implement a bootstrap aggregation scheme to filter the noise:
  1. Train N topic models from different subsets of the dataset
  2. Merge similar topics across models to produce a single meta-model
  » Intuition: valid feature-topics should occur in more than one model and share many common top terms. Noisy topics should be isolated to specific models.
• 39. FEATURE IDENTIFICATION: Merging topics
  Example (terms with probabilities):
    Topic A: COMEDY 0.200, JOKE 0.099, LAUGH 0.096, FUN 0.088, FORMULA 0.025
  + Topic B: COMEDY 0.180, PARODY 0.168, SATIRE 0.099, JOKE 0.061, RIDICULE 0.054
  = Merged: COMEDY 0.380, PARODY 0.168, JOKE 0.160, SATIRE 0.099, LAUGH 0.096, FUN 0.088, RIDICULE 0.054 (FORMULA 0.025 discarded)
  Topic similarity for topics Tm, Tn:
  » more common terms with higher probabilities → higher similarity
• 40. FEATURE IDENTIFICATION: Merging topic models
  To merge 2 topic sets
  » merge every topic of set A into the most similar topic from set B, but only if that similarity is above the average similarity
  To merge N topic sets
  » merge the first two, then merge the result with the third, and so on
  » at the end, discard topics with a low merging degree
  » if the same term ends up in more than one topic, only keep it in the topic where it has the highest probability
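  The two merging steps above can be sketched as follows. The exact similarity formula on the slide is not fully legible, so `topic_similarity` here is a stand-in (weighted term overlap) consistent with the stated intuition; summing the probabilities of shared terms and dropping the low-weight tail reproduces the COMEDY example.

```python
def topic_similarity(a, b):
    # Stand-in for the thesis formula: shared terms with high
    # probabilities in both topics yield high similarity
    return sum(min(a[t], b[t]) for t in a.keys() & b.keys())

def merge_topics(a, b, keep=7):
    # Sum probabilities of shared terms, carry the rest over,
    # then discard the lowest-weight tail
    merged = dict(a)
    for term, p in b.items():
        merged[term] = merged.get(term, 0.0) + p
    return dict(sorted(merged.items(), key=lambda kv: -kv[1])[:keep])

def merge_topic_sets(set_a, set_b):
    # Merge each topic of A into its most similar topic in B,
    # but only when similarity exceeds the average pairwise similarity
    sims = [(i, j, topic_similarity(a, b))
            for i, a in enumerate(set_a) for j, b in enumerate(set_b)]
    avg = sum(s for _, _, s in sims) / len(sims)
    result = [dict(t) for t in set_b]
    for i, a in enumerate(set_a):
        j, s = max(((j, s) for i2, j, s in sims if i2 == i),
                   key=lambda js: js[1])
        if s > avg:
            result[j] = merge_topics(result[j], a)
        else:
            result.append(dict(a))     # no good match: keep topic as-is
    return result

comedy_a = {"comedy": 0.200, "joke": 0.099, "laugh": 0.096,
            "fun": 0.088, "formula": 0.025}
comedy_b = {"comedy": 0.180, "parody": 0.168, "satire": 0.099,
            "joke": 0.061, "ridicule": 0.054}

merged = merge_topics(comedy_a, comedy_b)
# shared terms add up, e.g. comedy: 0.200 + 0.180 = 0.380
```

  Folding N topic sets is then just repeated application of `merge_topic_sets`, as the slide describes.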
• 41. FEATURE IDENTIFICATION: Movie feature lexicon
  56 topics, manually labeled with 18 labels
  • 42. Sentence classification Utilizing language structure for contextual sentiment estimation and feature targeting
• 43. SENTENCE-LEVEL ANALYSIS: Sentiment
  Sentiment: a (polarity, intensity) tuple, where polarity ∈ {+, −} and intensity ∈ {1, 2, …, n}, giving 2n classes.
  We define a mapping function (preferably 1:1) to convert ratings to sentiment classes:
  » m_binary: R10 → S1: 1–5 → −1; 6–10 → +1
  » m3: R10 → S3: 1–2 → −3; 3–4 → −2; 5 → −1; 6 → +1; 7–8 → +2; 9–10 → +3
  » m5: R10 → S5: 1 → −5; 2 → −4; 3 → −3; 4 → −2; 5 → −1; 6 → +1; 7 → +2; 8 → +3; 9 → +4; 10 → +5
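  All three mappings above are instances of one closed form. The thesis presents them as tables; the formula below is my reconstruction and reproduces every value shown on the slide.

```python
import math

def rating_to_sentiment(rating, n):
    """Map a 1..10 rating onto 2n sentiment classes {-n..-1, +1..+n}."""
    if not 1 <= rating <= 10:
        raise ValueError("rating must be in 1..10")
    if rating <= 5:
        # negative half: spread ratings 1..5 over intensities n..1
        return -math.ceil((6 - rating) * n / 5)
    # positive half: spread ratings 6..10 over intensities 1..n
    return math.ceil((rating - 5) * n / 5)

# m5 is the 1:1 case:
m5 = [rating_to_sentiment(r, 5) for r in range(1, 11)]
# [-5, -4, -3, -2, -1, 1, 2, 3, 4, 5]
```

  With n=1 this collapses to m_binary, and with n=3 it yields the grouped m3 mapping.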
• 44. SENTENCE-LEVEL ANALYSIS: Typed Dependencies
  Typed dependencies are binary grammatical relations between word pairs in a sentence (de Marneffe et al., 2006), written as type(governor, dependent), e.g. amod(relations, binary).
  Example sentence: "Natalie Portman comes off as very believable, gaining empathy from the audience."
  Typed dependency trees are
  » semantically richer than syntax trees
  » easier to process, because content words are connected directly rather than through function words
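  A typed dependency is naturally a small record, and a sentence's dependency tree is just an index of dependents under their governors. The sketch below is illustrative; the two relations are hand-written for the example fragment, not output of the Stanford Parser.

```python
from collections import defaultdict
from typing import NamedTuple

class Dependency(NamedTuple):
    type: str        # relation label, e.g. "amod"
    governor: str
    dependent: str

# A plausible hand-written fragment for "...comes off as very believable":
deps = [
    Dependency("advmod", "believable", "very"),
    Dependency("acomp", "comes", "believable"),
]

def children(deps):
    # Index dependents under their governor: content words are linked
    # directly, which makes bottom-up processing straightforward
    tree = defaultdict(list)
    for d in deps:
        tree[d.governor].append((d.type, d.dependent))
    return dict(tree)

tree = children(deps)
# tree["believable"] == [("advmod", "very")]
```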
• 45. SENTENCE-LEVEL ANALYSIS: Dependency types
  The Stanford Parser's representation defines a hierarchy of 48 dependency types.
• 46. SENTENCE-LEVEL ANALYSIS: Contextual sentiment estimation
  Motivating question: what is the contextual sentiment of a dependency, given the prior sentiment of its constituents?
  Examples, from "It is best to avoid watching any of the increasingly disappointing sequels.":
  » infmod(best/+2, avoid/−4) → −4
  » xcomp(avoid/−4, watching/+2) → −2
  » advmod(disappointing/−2, increasingly/+3) → −3
  Our model: we empirically developed and formally defined
  » 6 outcome functions that model types of word interactions
  » 42 dependency rules that cover all possible dependency patterns
• 47. SENTENCE-LEVEL ANALYSIS: Outcome functions
  » UNCHANGED: the base term imposes the sentiment ("It seems that they ran out of budget.")
  » STRONGER: the stronger term imposes the sentiment ("a mighty talent wasted in mass produced rom-coms")
  » AVG: both terms contribute equally to the sentiment ("intelligent and ambitious")
• 48. SENTENCE-LEVEL ANALYSIS: Outcome functions
  » INTENSIFY: the modifier increases the intensity of the base ("increasingly disappointing sequels")
  » REFLECT: the modifier overrides polarity, increases or decreases the intensity of the base ("impossible to enjoy unless you lower your expectations")
  » NEG: the modifier diminishes or negates the base ("not a masterpiece, but not bad either")
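  Representing sentiments as signed integers in {−5..−1, +1..+5}, the six outcome functions can be sketched as below. The exact arithmetic of the thesis functions is not reproduced on the slides, so these bodies are plausible stand-ins; `intensify` does, however, match the slide's worked example advmod(disappointing/−2, increasingly/+3) → −3.

```python
N = 5  # intensity levels: sentiments are ints in {-N..-1, +1..+N}

def sign(s):
    return 1 if s > 0 else -1

def unchanged(base, mod):
    # base term imposes the sentiment
    return base

def stronger(base, mod):
    # the term with the higher intensity imposes the sentiment
    return base if abs(base) >= abs(mod) else mod

def avg(base, mod):
    # both terms contribute equally; result never collapses to 0
    m = (base + mod) / 2
    s = sign(m) if m != 0 else sign(base)
    return s * max(1, round(abs(m)))

def intensify(base, mod):
    # modifier bumps the base's intensity one step, keeping its polarity
    return sign(base) * min(N, abs(base) + 1)

def reflect(base, mod):
    # modifier overrides polarity; intensity blends both terms
    return sign(mod) * min(N, max(1, (abs(base) + abs(mod)) // 2))

def negate(base, mod=None):
    # diminish and flip: "not bad" ends up mildly positive
    return -sign(base) * max(1, abs(base) - 2)

assert intensify(-2, +3) == -3   # "increasingly disappointing"
```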
• 49. SENTENCE-LEVEL ANALYSIS: Dependency rules, general form
  td(pgov, pdep) → outcome_base
  » td: the dependency type label
  » pgov, pdep: term patterns; a pattern may specify a list of allowed parts of speech and/or a white list of specific terms
  » outcome: one of the outcome functions UNCHANGED, NEGATE, STRONGER, AVG, INTENSIFY, REFLECT, POSITIVE, NEGATIVE
  » base specifier: GOV or DEP
  Examples:
  » conj(*, *) → AVG_DEP
  » advmod({n,a,r}, *) → INTENSIFY_GOV
  » amod(*, {too}) → NEGATIVE_GOV
• 50. SENTENCE-LEVEL ANALYSIS: The Aspect Miner dependency rule set
  (notation: td(gov pattern, dep pattern) → outcome, base)
  1. Negation
  » 1.1 neg(*, *) → NEGATE GOV
  » 1.2 det/prt/advmod/dobj/nsubj/dep(*, negTerms¹) → NEGATE GOV
  » 1.3 pobj(negTerms¹, *) → NEGATE DEP
  » 1.4 aux(*, negAux²) → NEGATE GOV
  2. Subjects
  » 2.1 nsubj/nsubjpass(*, *) → INTENSIFY GOV
  » 2.2 csubj/csubjpass(*, *) → REFLECT GOV
  3. Objects
  » 3.1.1 dobj(negVerbs³, *) → NEGATE DEP
  » 3.1.2 dobj(*, *) → REFLECT GOV
  » 3.2 iobj(*, *) → UNCHANGED GOV
  » 3.3 pobj(*, *) → UNCHANGED DEP
  4. Modifiers
  » 4.1 advmod/amod(*, {enough}) → POSITIVE GOV
  » 4.2 advmod/amod(*, {too}) → NEGATIVE GOV
  » 4.3 advmod({v}, *) → REFLECT GOV
  » 4.4 advmod({n,a,r}, *) → INTENSIFY GOV
  » 4.5 amod(*, *) → REFLECT GOV
  » 4.6 infmod({a}, *) → REFLECT GOV
  » 4.7 infmod({v,n,r}, *) → INTENSIFY DEP
  » 4.8 partmod({a}, *) → REFLECT DEP
  » 4.9 partmod({v,n,r}, *) → STRONGER DEP
  » 4.10 quantmod(*, *) → INTENSIFY GOV
  » 4.11 prt(*, *) → STRONGER GOV
  » 4.12 prep(*, *) → REFLECT GOV
  » 4.13 prep(*, {like}) → UNCHANGED GOV
  5. Clausal modifiers
  » 5.1 advcl({a}, *) → REFLECT DEP
  » 5.2 advcl({v,n,r}, *) → UNCHANGED DEP
  » 5.3 purpcl(*, *) → UNCHANGED DEP
  6. Clausal complements
  » 6.1 ccomp/xcomp/acomp(*, *) → REFLECT GOV
  » 6.2 conj/appos/parataxis(*, *) → AVG GOV
  » 6.3 dep(*, *) → STRONGER DEP
  ¹ negTerms = {n't, no, not, never, none, nothing, nobody, noone, nowhere, without, hardly, barely, rarely, seldom, against, minus, sans}
  ² negAux = {should, could, would, might, ought}
  ³ negVerbs = {avoid, cease, decline, forget, fail, miss, neglect, refrain, refuse, stop}
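  Rule lookup amounts to first-match search over patterns ordered from most to least specific. The sketch below encodes a small excerpt of the rule set (not all 42 rules); the tuple layout, the truncated word lists, and the fallback rule are my own simplifications for illustration.

```python
# A rule: (type, gov_pos, gov_words, dep_pos, dep_words) -> (outcome, base)
# None stands for "*" (matches anything).
NEG_TERMS = {"n't", "no", "not", "never", "hardly", "barely", "without"}

RULES = [
    (("neg",    None, None, None, None),            ("NEGATE", "GOV")),
    (("advmod", None, None, None, {"too"}),         ("NEGATIVE", "GOV")),
    (("advmod", None, None, None, NEG_TERMS),       ("NEGATE", "GOV")),
    (("advmod", {"v"}, None, None, None),           ("REFLECT", "GOV")),
    (("advmod", {"n", "a", "r"}, None, None, None), ("INTENSIFY", "GOV")),
    (("conj",   None, None, None, None),            ("AVG", "DEP")),
]

def match(rule, td_type, gov_pos, gov, dep_pos, dep):
    r_type, r_gpos, r_gw, r_dpos, r_dw = rule
    return (r_type == td_type
            and (r_gpos is None or gov_pos in r_gpos)
            and (r_gw   is None or gov in r_gw)
            and (r_dpos is None or dep_pos in r_dpos)
            and (r_dw   is None or dep in r_dw))

def lookup(td_type, gov_pos, gov, dep_pos, dep):
    # Rules are ordered most-specific first, so the first hit
    # is the closest matching rule
    for rule, outcome in RULES:
        if match(rule, td_type, gov_pos, gov, dep_pos, dep):
            return outcome
    return ("UNCHANGED", "GOV")  # hypothetical fallback
```

  For example, advmod over an adjective governor falls through the word-list rules to rule 4.4 (INTENSIFY GOV), while an advmod whose dependent is a negation term is caught earlier by the negation rule.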
• 51. SENTENCE-LEVEL ANALYSIS: Sentence classification algorithm
  Initialization
  » generate the dependency tree from the sentence
  » annotate subjective terms with prior polarities from the sentiment lexicon
  » annotate feature terms with labels from the feature lexicon
  Sentiment estimation
  » apply the closest matching rule to every dependency relation in the tree
  » the sentiment of the dependency replaces the previous sentiment of the governor node
  » dependencies are processed in reverse postfix order (bottom to top and right to left)
  Feature targeting
  » the scope of a feature term is a subtree that contains the term and extends all the way down to the leaves and all the way up to the closest clausal dependency
  » the sentiment at the root of the subtree gets assigned to the feature
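  The bottom-up pass can be sketched as a depth-first fold over the dependency tree. This is a minimal stand-alone sketch: the full rule lookup is collapsed into one toy outcome (the modifier intensifies its governor), and the tiny tree and priors are hand-written from the slide 46 example, not parser output.

```python
def sign(s):
    return 1 if s > 0 else -1

def apply_dep(gov_sent, dep_sent):
    # Toy contextual rule: a subjective dependent intensifies its
    # governor one step (real rules pick among six outcome functions)
    if gov_sent is None:
        return dep_sent
    if dep_sent is None:
        return gov_sent
    return sign(gov_sent) * min(5, abs(gov_sent) + 1)

def classify(tree, priors, node):
    # Reverse postfix order: children are fully resolved, right to
    # left, before their sentiment is folded into the governor
    sent = priors.get(node)
    for child in reversed(tree.get(node, [])):
        sent = apply_dep(sent, classify(tree, priors, child))
    return sent

# "increasingly disappointing sequels":
# advmod(disappointing, increasingly), amod(sequels, disappointing)
tree = {"sequels": ["disappointing"], "disappointing": ["increasingly"]}
priors = {"disappointing": -2, "increasingly": +3}
```

  Here `classify` first resolves "disappointing" to −3 (intensified by "increasingly"), and the feature term "sequels", having no prior of its own, inherits that −3 from its subtree, illustrating feature targeting.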
• 52. SENTENCE-LEVEL ANALYSIS: Sentence classification example (figure)
• 53. SENTENCE-LEVEL ANALYSIS: Sentence polarity evaluation
  Test set: sentence polarity dataset by Pang & Lee, 2005 (5331 positive + 5331 negative sentences from movie reviews)
  Results: polarity classification is accurate for
  » 71.5% of positive sentences
  » 76.9% of negative sentences
  » 74.2% of all sentences
  Analysis of error causes:
  » 39.0% inaccurate dependency rule
  » 28.5% misclassified term (or we picked the wrong sense)
  » 21.5% erroneous sentence parsing
  » 8.5% ambiguous sentence
  » 2.5% dependency rules applied in the wrong order
• 54. SENTENCE-LEVEL ANALYSIS: Comparative evaluation
  Linguistic methods
  » majority voting (Nakagawa, Inui & Kurohashi, 2010): 62.9%
  » majority voting with negations (Ikeda & Takamura, 2008): 65.8%
  » dependency rules (Aspect Miner): 74.2%
  Learning-based methods
  » naïve Bayes (Andreevskaia & Bergler, 2008): 69.0%
  » SVM, bag-of-features (Nakagawa, Inui & Kurohashi, 2010): 76.4%
  » genetic programming (Arora, Mayfield et al., 2010): 76.9%
  » SVM, sentence-wise learning with polarity shifting + n-grams (Ikeda & Takamura, 2008): 77.0%
  » dependency tree CRFs (Nakagawa, Inui & Kurohashi, 2010): 77.3%
  Conclusion: our method fares well among linguistic techniques, but does not match the accuracy of learning-based methods.
• 56. CONCLUSIONS: System architecture (diagram)
  Training subsystem: a lexical analyzer (tokenization, POS tagging, named entity identification, lemmatization, comparatives annotation, negation scope resolution, stop word removal, open-class word filtering) indexes the training corpus of rated reviews. From the corpus statistics, a term classifier (PEAK, PN and WW classifiers over collected term histograms) produces the sentiment lexicon. A feature identifier partitions the training set, trains topic models TM1…TMN with LDA on bags of open-class terms, aggregates them, and, with assisted labeling, produces the feature lexicon.
  Sentence classifier: dependency parsing plus the dependency rule set classify input text against both lexicons, yielding feature-sentiment pairs as the result.
• 57. CONCLUSIONS: Summary of contributions
  » We showed the feasibility of granular prior polarity classification using review ratings, and developed a classifier that achieved at least 70% accuracy on the training dataset.
  » We developed a reusable sentiment lexicon and feature lexicon for the movie review domain.
  » We suggested a bagging-inspired meta-algorithm for discovering feature topics with LDA.
  » We created a set of linguistic rules and developed a methodology that is capable of fine-grained feature-level classification of sentences, achieving 74.2% accuracy for polarity classification on our test dataset.
• 58. CONCLUSIONS: Suggested Improvements
  Term classification
  » assigning a special class to intensifier terms
  » per-feature polysemy resolution
  Feature identification
  » named entities as features
  » applying multi-grain topic models for discovery of local topics, e.g. MG-LDA (Titov & McDonald, 2008)
  Sentence-level classification
  » supervised learning of rules: replace the manually crafted rule set with a set of rules inferred from frequent dependency patterns
• 59. CONCLUSIONS: References
  For a complete list of references, see the full report (in Greek): http://j.mp/AspectMiner
  B. Liu, "Sentiment analysis and subjectivity," in Handbook of Natural Language Processing, 2nd ed., 2010.
  B. Pang and L. Lee, "Opinion mining and sentiment analysis," Foundations and Trends in Information Retrieval, vol. 2, no. 1-2, pp. 1-135, 2008.
  A. Esuli and F. Sebastiani, "SentiWordNet: A publicly available lexical resource for opinion mining," in Proceedings of LREC, 2006, vol. 6, pp. 417-422.
  V. Hatzivassiloglou and K. R. McKeown, "Predicting the semantic orientation of adjectives," in Proceedings of the eighth conference of the European chapter of the Association for Computational Linguistics, 1997, pp. 174-181.
  P. Turney, M. L. Littman, and others, "Measuring praise and criticism: Inference of semantic orientation from association," ACM Transactions on Information Systems (TOIS), 2003.
  M. Hu and B. Liu, "Mining and summarizing customer reviews," in Proceedings of the tenth ACM SIGKDD international conference on Knowledge Discovery and Data Mining, 2004, pp. 168-177.
  X. Ding, B. Liu, and P. S. Yu, "A holistic lexicon-based approach to opinion mining," in Proceedings of the international conference on Web Search and Web Data Mining, 2008, pp. 231-240.
  I. Titov and R. McDonald, "Modeling online reviews with multi-grain topic models," in Proceedings of the 17th international conference on World Wide Web, 2008, pp. 111-120.
  T. Nakagawa, K. Inui, and S. Kurohashi, "Dependency tree-based sentiment classification using CRFs with hidden variables," in Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2010, pp. 786-794.
  A. Andreevskaia and S. Bergler, "When specialists and generalists work together: Overcoming domain dependence in sentiment tagging," in ACL-08: HLT, 2008.
  A. M. Popescu and O. Etzioni, "Extracting product features and opinions from reviews," in Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, 2005, pp. 339-346.
  D. Ikeda and H. Takamura, "Learning to shift the polarity of words for sentiment classification," Computational Intelligence, vol. 25, no. 1, pp. 296-303, 2008.

Editor's notes

  1. experimental opinion mining system for user reviews
  2. Identifying compound terms brings us some of the benefits of n-grams, without the increased costs and noise
  3. If polysemous, the RC set with the highest frequency sum indicates the term’s primary sentiment.