3. Literature Review / Background
The Web is a huge database of opinions on hotels
Commercial Possibilities / Business Intelligence
“What others think” is an important element in decision
making
Opinion Mining / Sentiment Analysis
4. Far From a Solved Problem
Impossible for a human to read every single opinion
Machines can be trained to do this
People often express more than one opinion in a single review
Use of sarcasm and negation
Sentiments are expressed differently across topics and domains
e.g. "big": positive when the swimming pool is big enough to swim in,
negative when it describes a long queue
5. How to train a machine to analyze
sentiments
Natural Language Processing (NLP)
Transform opinions into a format the machine understands
Artificial Intelligence
Machines can use the information provided by NLP, together with a
lot of math, to analyze sentiments
Make the machine distinguish facts from opinions the way a
human reader does
6. Problems of Machine
Subjectivity and Sentiment
Analyze polarity
Opinion rating
Sentiment intensity
Different domains / topic context
Facts Vs Opinion
7. Examples of ambiguity for a machine
“The swimming pool is better than the tennis court”.
Comparisons are hard to classify
“This hotel is very boleh lah”
Use of Slang and cultural communication
“This breakfast is as good as none”
Negativity is not obvious to a machine
“The weather is hot”
In different contexts, the statement has a different polarity
10. Review and aspects extraction process
Extract important datasets from review websites
Word handling to refine datasets
Use part-of-speech tagging to label the text and extract aspects,
which are nouns
Determine aspects / features that people are concerned
about from these reviews by occurrence and context
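The word handling and occurrence counting steps above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: the stop-word list, the sample reviews, and the `nouns` set (standing in for the output of a part-of-speech tagger) are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "but", "very", "to", "of", "in", "it"}

def clean_tokens(review):
    """Lowercase the review, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def candidate_aspects(reviews, nouns, min_count=2):
    """Keep nouns that occur at least `min_count` times across all reviews."""
    counts = Counter()
    for review in reviews:
        counts.update(t for t in clean_tokens(review) if t in nouns)
    return {word: n for word, n in counts.items() if n >= min_count}

# Made-up reviews for illustration only.
reviews = [
    "The pool was clean and the breakfast was great",
    "Breakfast is good but the pool is small",
    "The room was noisy",
]
nouns = {"pool", "breakfast", "room"}  # assumed output of a POS tagger

aspects = candidate_aspects(reviews, nouns)
print(aspects)
```

Here "room" is dropped because it appears only once; the `min_count` threshold is an arbitrary choice for the sketch.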
11. Part of Speech Tagging
Assign a grammatical label (noun, verb, adjective, ...) to every
word in the text so that the machine can process it
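To make the idea concrete, here is a toy tagger that looks each word up in a hand-written lexicon. The lexicon, the simplified tag names, and the function names are hypothetical; a real tagger learns tags from annotated text and uses the surrounding words to resolve ambiguity.

```python
# Hypothetical hand-written lexicon with a simplified tag set.
TAG_LEXICON = {
    "the": "DET", "is": "VERB", "was": "VERB",
    "pool": "NOUN", "breakfast": "NOUN", "hotel": "NOUN",
    "big": "ADJ", "good": "ADJ", "very": "ADV",
}

def tag(sentence):
    """Return (word, tag) pairs; words missing from the lexicon get 'UNK'."""
    return [(w, TAG_LEXICON.get(w, "UNK")) for w in sentence.lower().split()]

def extract_nouns(tagged):
    """Nouns are the candidate aspects."""
    return [word for word, label in tagged if label == "NOUN"]

tagged = tag("The pool is very big")
print(tagged)
print(extract_nouns(tagged))
```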
15. Classifying Sentiments using some
existing methods
Naïve Bayes
To determine polarity of sentiments
Maximum Entropy
Using probability distributions on the basis of partial knowledge
Support Vector Machine
Analyze patterns and classify sentiments
16. Naïve Bayes Classifier
To determine polarity of sentiments
P(X | Y) = P(X)P(Y | X) / P(Y)
Probability that a sentiment is positive or negative, given its
contents
Probability of a word occurring given a positive or negative
sentiment
Assumption: words are independent of each other given the sentiment
P(sentiment | sentence) = P(sentiment)P(sentence | sentiment) /
P(sentence)
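A minimal sketch of the formula above, assuming a tiny made-up training set and add-one smoothing (the smoothing is an addition for the example, not something stated on the slide). P(sentence) is left out because it is the same for every sentiment and does not change which one scores highest.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (sentence, label). Returns the counts needed to predict."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sentence, label in examples:
        label_counts[label] += 1
        for word in sentence.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def predict(sentence, model):
    """Return the label maximizing P(sentiment) * product of P(word | sentiment)."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total)  # log P(sentiment)
        n_words = sum(word_counts[label].values())
        for word in sentence.lower().split():
            # log P(word | sentiment), add-one smoothed so unseen words are not zero
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Made-up training sentences for illustration only.
model = train([
    ("the room is clean and great", "positive"),
    ("great breakfast and friendly staff", "positive"),
    ("the room is dirty and noisy", "negative"),
    ("terrible breakfast and rude staff", "negative"),
])
print(predict("great staff", model))
print(predict("dirty room", model))
```

Working in log space avoids multiplying many small probabilities together, which would underflow on longer reviews.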
17. Problem with Naïve Bayes
Polarity does not change with domain
Words within sentiments have no relationship with each
other
Words not found in the lexicon might be missed by Naïve Bayes,
resulting in inaccurate polarity
No opinion rating to determine which sentiment is more
polar
18. Solution to Naïve Bayes
Establish domain sentiment relations
Establish domain aspects relations
Establish aspects sentiments relations
Estimate polarity for unseeded sentiments
Estimate strength of polarity on sentiments
19. Establishing relations
Establish domains by categorizing the aspects found into
domains such as food, location and security
Finding occurrence of aspects / sentiments within sentences
for a particular domain
Finding polarity of sentences, aspects and sentiments and
establishing relations
[Diagram: Domain, Sentiments, Aspects]
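One simple way to find such occurrences is to count which aspects and sentiment words share a sentence within each domain. The sketch below assumes hypothetical seed lists for domains, aspects, and sentiment words; it illustrates co-occurrence counting only and is not the project's actual relation model.

```python
from collections import Counter

# Hypothetical seed lists for illustration only.
DOMAIN_ASPECTS = {
    "food": {"breakfast", "buffet"},
    "location": {"beach", "station"},
}
SENTIMENT_WORDS = {"good", "bad", "far", "close", "cold"}

def cooccurrences(sentences):
    """Count (domain, aspect, sentiment) triples that appear in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        words = set(sentence.lower().split())
        for domain, aspects in DOMAIN_ASPECTS.items():
            for aspect in aspects & words:
                for sentiment in SENTIMENT_WORDS & words:
                    counts[(domain, aspect, sentiment)] += 1
    return counts

# Made-up sentences for illustration only.
relations = cooccurrences([
    "The breakfast was good",
    "Breakfast buffet was cold",
    "The beach is close",
    "Good breakfast",
])
for triple, n in sorted(relations.items()):
    print(triple, n)
```

Each counted triple can serve as a weighted edge linking a domain, an aspect, and a sentiment.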
20. Finding polarity for unseeded sentiments
After establishing relations, we have a graph of nodes
(Sentiments / Aspects)
Some nodes have no polarity after Naïve Bayes, but their
connected nodes might have polarity
Determine the probability that the node is positive or
negative given its surrounding nodes
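As a rough sketch of this step, the probability can be approximated by the share of a node's labeled neighbors that are positive. This estimator, the graph, and the labels below are assumptions chosen for illustration; the slides do not specify the exact formula.

```python
def neighbor_polarity(node, graph, polarity):
    """Estimate P(positive) for `node` from its neighbors that already have a polarity.

    Returns None when no neighbor is labeled.
    """
    labeled = [polarity[n] for n in graph[node] if n in polarity]
    if not labeled:
        return None
    return sum(1 for p in labeled if p == "positive") / len(labeled)

# Hypothetical graph: "spacious" is unseeded, its neighbors are seeded.
graph = {
    "spacious": ["pool", "room", "clean"],
    "pool": ["spacious"],
    "room": ["spacious"],
    "clean": ["spacious"],
}
polarity = {"pool": "positive", "clean": "positive", "room": "negative"}

p_positive = neighbor_polarity("spacious", graph, polarity)
print(p_positive)
```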
21. Estimating the strength of polarity
Determine the strength of the polarity of an unseeded node
from the number of hops needed to reach it from the surrounding
nodes that have polarity
Find the shortest path to each unseeded node; together these
paths form a spanning tree
The path length determines the strength of polarity
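The shortest-path idea can be sketched with a breadth-first search started from all seeded nodes at once. The `decay` factor, which halves the strength at every hop, is an assumption for the example; the slides only say that strength depends on the path length.

```python
from collections import deque

def polarity_strength(graph, seeds, decay=0.5):
    """Multi-source BFS from the seeded nodes.

    Each node's strength is decay ** (hops from the nearest seed).
    Nodes that cannot be reached from any seed are left out.
    """
    distance = {node: 0 for node in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in distance:  # first visit is the shortest path
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return {node: decay ** hops for node, hops in distance.items()}

# Hypothetical chain: "good" is seeded, the other two nodes are not.
graph = {
    "good": ["tasty"],
    "tasty": ["good", "yummy"],
    "yummy": ["tasty"],
}
strengths = polarity_strength(graph, ["good"])
print(strengths)
```

The edges along which each node is first reached form the spanning tree mentioned above.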
26. Prototyping
Refining parameters to come up with a prototype mainly to
solve the following problems:
Analyze polarity
Opinion rating
Sentiment intensity
Different domains / topic context
Manually analyze reviews and compare them with the prototype's
output to check its effectiveness and improve accuracy
27. Prototype testing
Enlarging the dataset from various hotel review sites
Merging results to find correlations between sentiment
expressions on different sites
Testing on different domains such as food to get domain-
dependent results