2. DEFINITION:
• Document-level sentiment classification classifies an opinion document
as expressing a positive or negative opinion (sentiment).
• It treats the whole document as the basic information unit.
3. PROBLEM DEFINITION
Given an opinion document d evaluating an entity, determine the overall
sentiment s of the opinion holder about the entity, i.e., determine s
expressed on aspect GENERAL in the quintuple
(_, GENERAL, s, _, _),
where the entity e, opinion holder h, and time of opinion t are assumed
to be known or irrelevant (do not care).
• If s takes categorical values, e.g., positive and negative, then it is a
classification problem.
• If s takes numeric values or ordinal scores within a given range, e.g.,
1 to 5, the problem becomes regression.
ASSUMPTION
“The opinion document d expresses opinions on a single entity e
and contains opinions from a single opinion holder h.”
4. Sentiment Classification Using Supervised Learning
• Usually formulated as a two-class classification problem: positive vs. negative
• If ratings are used (1-5 stars):
1-2 stars (negative), 4-5 stars (positive), 3 stars (neutral)
• Essentially a text classification problem
• Any supervised learning technique can be applied, e.g., naïve Bayes
classification and support vector machines (SVM)
Key features used in sentiment classification
• Terms and their frequency
• Part of speech (POS)
• Sentiment words and phrases
• Rules of opinions
• Sentiment shifters
• Syntactic dependency
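The supervised setup above can be sketched with a tiny multinomial naïve Bayes classifier that uses only term frequencies as features. This is a minimal illustration, not a reference implementation: the helper names (`stars_to_label`, `train_naive_bayes`, `classify`), the whitespace tokenizer, the add-one smoothing, and the five toy reviews are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def stars_to_label(stars):
    """Map a 1-5 star rating to a class; 3 stars is treated as neutral."""
    if stars <= 2:
        return "negative"
    if stars >= 4:
        return "positive"
    return "neutral"

def train_naive_bayes(docs):
    """docs: list of (text, label). Collect per-class document and term counts."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        class_counts[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab

def classify(text, model):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n_docs in class_counts.items():
        logp = math.log(n_docs / total_docs)
        total_terms = sum(term_counts[label].values())
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # ignore terms never seen in training
            logp += math.log((term_counts[label][tok] + 1) / (total_terms + len(vocab)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label

# Hypothetical toy reviews with star ratings (illustration only).
reviews = [
    ("great sound and great build", 5),
    ("love this piano wonderful tone", 4),
    ("terrible keys and poor sound", 1),
    ("poor build broke quickly", 2),
    ("it is okay", 3),
]
# Two-class setting: drop the neutral (3-star) reviews before training.
training = [(text, stars_to_label(s)) for text, s in reviews
            if stars_to_label(s) != "neutral"]
model = train_naive_bayes(training)
print(classify("great tone", model))
print(classify("poor keys", model))
```

The other feature types listed above (POS tags, sentiment words, sentiment shifters, dependencies) would enter such a classifier as additional feature counts; they are left out here to keep the sketch short.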
5. Algorithm
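• Two consecutive words are extracted if their POS tags conform to
any of a set of predefined patterns.
Example: This piano produces beautiful sounds
         DT   NN    VBZ      JJ        NNS
Here "beautiful sounds" is extracted because an adjective (JJ) is
followed by a noun (NNS).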
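The extraction step can be sketched as follows. The slide does not list the patterns, so the table below is an assumption: it follows the five two-word patterns commonly attributed to Turney's algorithm (Penn Treebank tags, with a constraint on the third word for some patterns). The input is assumed to be already POS-tagged, and `extract_phrases` is a hypothetical helper name.

```python
ADJ = {"JJ"}
NOUN = {"NN", "NNS"}
ADV = {"RB", "RBR", "RBS"}
VERB = {"VB", "VBD", "VBN", "VBG"}

# (tags allowed for word 1, tags allowed for word 2, third word must NOT be a noun)
# Assumed pattern table; the slide itself does not spell it out.
PATTERNS = [
    (ADJ, NOUN, False),
    (ADV, ADJ, True),
    (ADJ, ADJ, True),
    (NOUN, ADJ, True),
    (ADV, VERB, False),
]

def extract_phrases(tagged):
    """tagged: list of (word, pos_tag). Return the two-word phrases that match a pattern."""
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        # Tag of the following word, or None at the end of the sentence.
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        for first, second, third_not_noun in PATTERNS:
            blocked = third_not_noun and t3 in NOUN
            if t1 in first and t2 in second and not blocked:
                phrases.append(f"{w1} {w2}")
                break
    return phrases

tagged = [("This", "DT"), ("piano", "NN"), ("produces", "VBZ"),
          ("beautiful", "JJ"), ("sounds", "NNS")]
print(extract_phrases(tagged))  # ['beautiful sounds']
```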
Sentiment Classification Using Unsupervised Learning
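6. • Estimates the sentiment orientation (SO) of the extracted phrases
using the pointwise mutual information (PMI) measure:
$$\mathrm{PMI}(term_1, term_2) = \log_2 \frac{\Pr(term_1 \wedge term_2)}{\Pr(term_1)\,\Pr(term_2)}$$
PMI measures the degree of statistical dependence between two terms.
Pr(term1 ∧ term2) is the actual co-occurrence probability of term1
and term2.
Pr(term1)Pr(term2) is the co-occurrence probability of the two terms
if they are statistically independent.
The SO of a phrase is its association with the positive reference word
"excellent" minus its association with the negative reference word "poor":
$$\mathrm{SO}(phrase) = \mathrm{PMI}(phrase, \text{"excellent"}) - \mathrm{PMI}(phrase, \text{"poor"})$$
Estimating the probabilities from search-engine hit counts gives:
$$\mathrm{SO}(phrase) = \log_2 \frac{\text{hits}(phrase \text{ NEAR "excellent"}) \cdot \text{hits}(\text{"poor"})}{\text{hits}(phrase \text{ NEAR "poor"}) \cdot \text{hits}(\text{"excellent"})}$$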
7. • Given a review, the algorithm computes the average SO of all
phrases in the review and classifies the review as positive if the
average SO is positive and negative otherwise.
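The SO computation and the averaging rule can be sketched as below. All hit counts are made-up numbers for illustration, the function names are hypothetical, and the small `smoothing` constant added to the NEAR counts (to avoid division by zero) is an assumption rather than something stated on the slides.

```python
import math

def semantic_orientation(near_excellent, near_poor,
                         hits_excellent, hits_poor, smoothing=0.01):
    """SO(phrase) from hit counts; smoothing guards against zero counts."""
    return math.log2(((near_excellent + smoothing) * hits_poor) /
                     ((near_poor + smoothing) * hits_excellent))

def classify_review(phrase_hits, hits_excellent, hits_poor):
    """phrase_hits: phrase -> (hits NEAR "excellent", hits NEAR "poor")."""
    scores = [semantic_orientation(ne, npoor, hits_excellent, hits_poor)
              for ne, npoor in phrase_hits.values()]
    average = sum(scores) / len(scores) if scores else 0.0
    # Positive only if the average SO is strictly positive, negative otherwise.
    return ("positive" if average > 0 else "negative"), average

# Hypothetical hit counts for the phrases extracted from one review.
phrase_hits = {
    "beautiful sounds": (80, 10),
    "solid build": (60, 15),
    "poor quality": (5, 40),
}
label, average = classify_review(phrase_hits, hits_excellent=1000, hits_poor=1000)
print(label, round(average, 2))
```

With these counts two phrases lean positive and one leans negative, so the average SO is above zero and the review is labeled positive.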
8. We modeled rating prediction as a graph-based semi-supervised
learning problem, which used
• labeled (with ratings) reviews
• unlabeled (without ratings) reviews.
The unlabeled reviews were also the test reviews whose ratings needed
to be predicted.
In the graph,
• each node is a document (review) and
• the link between two nodes is the similarity value between the two
documents.
The algorithm assumes that a separate learner has already predicted
the numerical ratings of the unlabeled documents.
The graph-based method only refines these predictions: it solves an
optimization problem that forces the ratings to be smooth throughout
the graph with regard to both the ratings and the link weights.
Sentiment Rating Prediction
(Regression Problem)
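One way to picture the smoothing step is the sketch below. It is not the exact objective of the method described above; it assumes a simple quadratic objective in which each unlabeled rating stays close to its initial prediction (weight `anchor_weight`) while neighboring ratings are pulled together in proportion to their similarity. Labeled ratings are kept fixed. The function name, parameters, and toy graph are hypothetical.

```python
def smooth_ratings(initial, labeled, edges, anchor_weight=1.0, iterations=200):
    """
    initial: node -> rating predicted by a separate learner (unlabeled reviews)
    labeled: node -> known rating (kept fixed)
    edges:   (node_a, node_b) -> similarity weight (undirected)
    Minimizes  sum_i anchor_weight * (f_i - initial_i)^2
             + sum_(i,j) w_ij * (f_i - f_j)^2
    over the unlabeled ratings by repeated coordinate updates.
    """
    neighbors = {}
    for (a, b), w in edges.items():
        neighbors.setdefault(a, []).append((b, w))
        neighbors.setdefault(b, []).append((a, w))
    ratings = {**initial, **labeled}
    for _ in range(iterations):
        for node in initial:
            if node in labeled:
                continue
            nbrs = neighbors.get(node, [])
            # Closed-form minimizer for this node with all others held fixed.
            numerator = anchor_weight * initial[node] + sum(w * ratings[j] for j, w in nbrs)
            denominator = anchor_weight + sum(w for _, w in nbrs)
            ratings[node] = numerator / denominator
    return ratings

# Toy graph: r1 is labeled; r2 and r3 carry initial predictions.
labeled = {"r1": 5.0}
initial = {"r2": 1.0, "r3": 2.0}
edges = {("r1", "r2"): 1.0, ("r2", "r3"): 0.5}
smoothed = smooth_ratings(initial, labeled, edges)
print({k: round(v, 3) for k, v in smoothed.items()})
```

In the toy graph, r2 is strongly similar to the 5-star labeled review, so its rating is pulled upward from its initial prediction, and r3 follows to a lesser degree through its weaker link to r2.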
9. Sentiment classification is highly sensitive to the domain from
which the training data is extracted.
Two types of domains
Source domain: the original domain with labeled training data
Target domain: the new domain used for testing
Four Strategies
1. Training on a mixture of labeled reviews from other domains where
such data are available and testing on the target domain
2. Training a classifier as above, but limiting the set of features to
those only observed in the target domain
3. Using ensembles of classifiers from domains with available labeled
data and testing on the target domain
4. Combining small amounts of labeled data with large amounts of
unlabeled data in the target domain.
Cross Domain Sentiment Classification
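Strategy 2 (limiting the features to those observed in the target domain) can be sketched as a vocabulary filter applied before training. The helper names and the toy movie/phone reviews are assumptions for illustration; any classifier could then be trained on the filtered counts.

```python
from collections import Counter

def target_vocabulary(target_docs):
    """Collect every term that occurs in the (unlabeled) target-domain documents."""
    vocab = set()
    for text in target_docs:
        vocab.update(text.lower().split())
    return vocab

def featurize(text, allowed_vocab):
    """Term-frequency features restricted to terms seen in the target domain."""
    return Counter(tok for tok in text.lower().split() if tok in allowed_vocab)

# Hypothetical source-domain (movies) labeled reviews and target-domain (phones) reviews.
source_labeled = [
    ("gripping plot and great acting", "positive"),
    ("poor plot and dull acting", "negative"),
]
target_unlabeled = ["great battery life", "poor screen and dull colors"]

vocab = target_vocabulary(target_unlabeled)
training_features = [(featurize(text, vocab), label) for text, label in source_labeled]
for features, label in training_features:
    print(label, dict(features))
```

Domain-specific source terms such as "plot" and "acting" are dropped, so the classifier can only rely on terms that also appear in the target domain.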
10. Cross-language sentiment classification means performing sentiment
classification of opinion documents in multiple languages.
Example: to classify Chinese reviews using sentiment resources in English,
the following algorithm can be used:
• Translate each Chinese review into English using multiple translators, which
produce different English versions.
• Classify each translated English version with a lexicon-based approach.
The lexicon consists of a set of positive terms, a set of negative terms,
a set of negation terms, and a set of intensifiers.
• Sum up the sentiment scores of the terms in the review, taking
negations and intensifiers into account.
• If the final score is less than 0, the review is negative; otherwise it is positive.
• For the final classification of each review, combine the scores of the different
translated versions using an ensemble method, e.g., average, max,
weighted average, or voting.
Cross Language Sentiment Classification
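The scoring and ensemble steps can be sketched as follows, starting from translations that are assumed to be already available. The tiny lexicon, the intensifier weights, the rule that a negation or intensifier applies to the next sentiment word, and the reading of "max" as "take the largest score" are all assumptions made for this example.

```python
# Hypothetical miniature lexicon (illustration only).
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very": 2.0, "extremely": 3.0}

def score_text(text):
    """Sum term scores; a negation flips and an intensifier scales the next sentiment word."""
    score, negate, boost = 0.0, False, 1.0
    for tok in text.lower().split():
        if tok in NEGATIONS:
            negate = True
        elif tok in INTENSIFIERS:
            boost *= INTENSIFIERS[tok]
        elif tok in POSITIVE or tok in NEGATIVE:
            polarity = 1.0 if tok in POSITIVE else -1.0
            score += (-polarity if negate else polarity) * boost
            negate, boost = False, 1.0  # modifiers are consumed by this word
    return score

def label(score):
    """A score below 0 is negative; anything else is positive."""
    return "negative" if score < 0 else "positive"

def ensemble(translations, method="average"):
    """Combine the scores of several translated versions of one review."""
    scores = [score_text(t) for t in translations]
    if method == "average":
        return label(sum(scores) / len(scores))
    if method == "max":
        return label(max(scores))
    if method == "voting":
        votes = [label(s) for s in scores]
        positive_votes = votes.count("positive")
        return "positive" if positive_votes >= len(votes) - positive_votes else "negative"
    raise ValueError(f"unknown ensemble method: {method}")

# Three hypothetical English translations of the same review.
translations = [
    "the screen is not good",
    "the screen is very bad",
    "the display is good",
]
for method in ("average", "max", "voting"):
    print(method, ensemble(translations, method))
```

The three translations score -1, -2, and +1, so averaging and voting label the review negative while the max rule labels it positive, which shows why the choice of ensemble method matters.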
12. INTRODUCTION
Sentences are short documents. Sentence-level analysis classifies the
sentiment expressed in each sentence.
ASSUMPTION
One assumption that researchers often make is that a sentence usually
contains a single opinion.
PROBLEM DEFINITION
Given a sentence x, determine whether x expresses a positive, negative,
or neutral (or no) opinion.
SENTENCE SENTIMENT CLASSIFICATION CAN BE SOLVED AS
• Two separate classification problems:
1. Classify whether the sentence expresses an opinion or not (subjectivity
classification)
2. Classify the opinion sentences into positive and negative classes
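The two-step formulation can be sketched as a small pipeline that takes the two classifiers as arguments. The keyword-based stand-ins below are hypothetical placeholders, used only so the example runs; in practice each step would be a trained classifier.

```python
def classify_sentence(sentence, is_subjective, polarity):
    """Step 1: subjectivity classification. Step 2: polarity of opinion sentences."""
    if not is_subjective(sentence):
        return "neutral"  # no opinion expressed
    return polarity(sentence)

# Hypothetical stand-in classifiers (keyword lookups, illustration only).
OPINION_WORDS = {"love", "great", "hate", "awful"}
POSITIVE_WORDS = {"love", "great"}

def toy_is_subjective(sentence):
    return any(tok in OPINION_WORDS for tok in sentence.lower().split())

def toy_polarity(sentence):
    tokens = sentence.lower().split()
    return "positive" if any(tok in POSITIVE_WORDS for tok in tokens) else "negative"

for s in ["I love this camera", "The camera weighs 300 grams", "I hate the menu"]:
    print(s, "->", classify_sentence(s, toy_is_subjective, toy_polarity))
```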
13. Sentences are classified into two types
• Subjective (express personal views and opinions)
• Objective (present factual information)
• Subjectivity classification is based on supervised learning
• Gradability is a semantic property that enables a word to appear in a
comparative construct and to accept modifying expressions that act
as intensifiers or diminishers.
Example: A small planet is usually much larger than a large house.
• Sentence similarity was measured based on shared words and phrases
SUBJECTIVITY CLASSIFICATION
14. One of the bottlenecks in applying supervised learning is the manual
effort involved in annotating a large number of training examples.
Solution:
A bootstrapping approach was proposed to label training data
automatically.
• The algorithm works by first using two high-precision classifiers to
automatically identify some subjective and objective sentences.
• The high-precision classifiers use lists of lexical items (single words
or n-grams) that are good subjectivity clues.
• HP-Subj classifies a sentence as subjective if it contains two or
more strong subjective clues.
• HP-Obj classifies a sentence as objective if there are no strong
subjective clues.
• The extracted sentences are then added to the training data to
learn patterns.
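The first bootstrapping step can be sketched as below. The clue list is a hypothetical handful of single words (the approach above also allows n-grams, which this sketch omits), and the function names are made up for the example. Sentences that neither classifier is confident about stay unlabeled.

```python
# Hypothetical list of strong subjectivity clues (single words only).
STRONG_CLUES = {"love", "hate", "terrible", "wonderful", "outrageous"}

def count_strong_clues(sentence):
    """Count tokens that are strong subjectivity clues, ignoring trailing punctuation."""
    return sum(1 for tok in sentence.lower().split()
               if tok.strip(".,!?") in STRONG_CLUES)

def hp_subj(sentence):
    """High-precision subjective classifier: two or more strong clues."""
    return count_strong_clues(sentence) >= 2

def hp_obj(sentence):
    """High-precision objective classifier: no strong clues at all."""
    return count_strong_clues(sentence) == 0

def bootstrap_seed_labels(sentences):
    """Label only the sentences one of the two classifiers is confident about."""
    labeled, unlabeled = [], []
    for s in sentences:
        if hp_subj(s):
            labeled.append((s, "subjective"))
        elif hp_obj(s):
            labeled.append((s, "objective"))
        else:
            unlabeled.append(s)
    return labeled, unlabeled

sentences = [
    "I love this wonderful phone.",
    "The phone ships with a charger.",
    "The battery is terrible.",
]
labeled, unlabeled = bootstrap_seed_labels(sentences)
print(labeled)
print(unlabeled)
```

The automatically labeled sentences would then serve as training data for learning further patterns, as described in the last bullet above.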
15. ASSUMPTION
A sentence expresses a single sentiment from a single opinion
holder.
METHOD
• For sentiment classification of subjective sentences, we use a large
set of seed adjectives.
• A modified log-likelihood ratio is used to determine the positive or
negative orientation of each adjective, adverb, noun, and verb.
• Each sentence is assigned an orientation based on the average
log-likelihood score of its words.
• Two thresholds are chosen using the training data and applied to
determine whether the sentence has a positive, negative, or neutral
orientation.
SENTENCE SENTIMENT CLASSIFICATION
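The averaging and thresholding steps can be sketched as follows. The sketch does not compute the modified log-likelihood ratios themselves; it assumes word scores are already available. The scores and the two thresholds are made-up values (in the method above the thresholds are chosen using training data), and `sentence_orientation` is a hypothetical name.

```python
def sentence_orientation(sentence, word_scores, pos_threshold, neg_threshold):
    """Average the scores of the scored words and compare against two thresholds."""
    scores = [word_scores[tok] for tok in sentence.lower().split() if tok in word_scores]
    if not scores:
        return "neutral"  # no scored words, so no evidence either way
    average = sum(scores) / len(scores)
    if average > pos_threshold:
        return "positive"
    if average < neg_threshold:
        return "negative"
    return "neutral"

# Hypothetical word orientation scores (positive values lean positive).
word_scores = {
    "excellent": 2.0, "reliable": 1.2,
    "poor": -1.8, "slow": -0.9,
    "camera": 0.1, "battery": 0.0,
}

for s in ["excellent reliable camera", "poor slow battery", "camera battery"]:
    print(s, "->", sentence_orientation(s, word_scores, 0.5, -0.5))
```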
16. DEALING WITH CONDITIONAL SENTENCES
• Conditional sentences are sentences that describe implications or
hypothetical situations and their consequences.
Such a sentence typically contains two clauses that are dependent on
each other:
• the condition clause
• the consequent clause
Their relationship has a significant impact on whether the sentence
expresses a positive or negative sentiment.
• EXAMPLE:
“If someone makes a reliable car, I will buy it.”
Although "reliable" is a positive sentiment word, this sentence expresses
no sentiment toward any particular car.
17. • Translate test sentences in the target language into the source
language and classify them using a source language classifier.
• Translate a source language training corpus into the target
language and build a corpus-based classifier in the target
language.
• Translate a sentiment or subjectivity lexicon in the source
language to the target language and build a lexicon-based
classifier in the target language.
CROSS LANGUAGE SUBJECTIVITY
CLASSIFICATION