MINING USERS’
OPINIONS ON HOTELS
BRIEF RECAP ON CA1
Literature Review / Background

 Web is a huge database of opinions on hotels

 Commercial Possibilities / Business Intelligence

 “What others think” is an important element in decision
  making

 Opinion Mining / Sentiment Analysis
Far From a Solved Problem

 It is impossible for a human to read every single opinion
   Machines can be trained to do this

 People always express more than one opinion

 Use of Sarcasm and Negation

 Sentiments are expressed differently across topics and domains
   e.g. “big”: positive when the swimming pool is big enough to swim in,
     negative when the queue is long
How to train a machine to analyze
sentiments

 Natural Language Processing (NLP)
    Transform opinions into a format the machine understands

 Artificial Intelligence
    Machines use the information given by NLP and a lot of
     math to analyze sentiments
    Make the machine distinguish facts from opinions the way
     a human reader does
Problems for the Machine

 Subjectivity and Sentiment

 Analyze polarity

 Opinion rating

 Sentiment intensity

 Different domains / topic context

 Facts Vs Opinion
Examples of ambiguity for a machine

 “The swimming pool is better than the tennis court”.
    Comparisons are hard to classify

 “This hotel is very boleh lah”
    Use of Slang and cultural communication

 “This breakfast is as good as none”
    Negativity not obvious to machine

 “The weather is hot”
    In different context, the statement has different polarity
WHAT IS DONE IN CA1
EXTRACTION – Preparing
 machine to analyze data
Review and aspects extraction process

 Extract important datasets from review websites

 Word handling to refine datasets

 Use part of speech tagging to label text to extract aspects
   which are nouns

 Determine aspects / features that people are concerned
   about from these reviews by occurrence and context
Part of Speech Tagging

 Assigning a part-of-speech label (noun, verb, adjective, ...) to every
  word in the text so the machine can process it, e.g. pick out nouns as aspects
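
As a rough illustration of tagging followed by noun extraction, here is a minimal sketch. It assumes a tiny hand-written lexicon (`TOY_LEXICON`) in place of a trained tagger; the lexicon, the tag names and the sample reviews are all hypothetical and are not taken from the slides.

```python
from collections import Counter

# Hypothetical hand-written lexicon; a real system would use a trained tagger.
TOY_LEXICON = {
    "the": "DET", "a": "DET",
    "pool": "NOUN", "breakfast": "NOUN", "room": "NOUN", "staff": "NOUN",
    "is": "VERB", "was": "VERB",
    "big": "ADJ", "clean": "ADJ", "cold": "ADJ", "friendly": "ADJ",
    "and": "CONJ", "very": "ADV",
}

def tag(sentence):
    """Label every token with a part of speech ('UNK' if not in the lexicon)."""
    tokens = sentence.lower().replace(".", " ").replace(",", " ").split()
    return [(tok, TOY_LEXICON.get(tok, "UNK")) for tok in tokens]

def extract_aspects(reviews):
    """Count noun occurrences across reviews; frequent nouns are aspect candidates."""
    counts = Counter()
    for review in reviews:
        counts.update(tok for tok, label in tag(review) if label == "NOUN")
    return counts

# Made-up sample reviews, for illustration only.
reviews = [
    "The pool is big and the room is clean.",
    "The breakfast was cold.",
    "The room was very clean, the staff was friendly.",
]
aspects = extract_aspects(reviews)
```

Here `aspects.most_common()` ranks the candidate aspects by occurrence, which covers only the "occurrence" half of the selection criterion; context is not modelled in this sketch.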
Word Handling

 Dictionary / Spelling Correction

 Slang Check

 Foreign language check

 Singular / Plural conversion

 Duplicate check
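
The word-handling steps could look roughly like the sketch below. The slang and spelling tables are hypothetical stand-ins for real dictionaries, the singular/plural rule is deliberately crude, and the foreign-language check is left out because the slides do not say how it is done.

```python
import re

# Hypothetical lookup tables; real ones would come from dictionaries.
SLANG = {"gr8": "great", "awsm": "awesome"}
SPELLING = {"recieve": "receive", "breakfeast": "breakfast"}

def singularize(word):
    """Very rough plural-to-singular rule; irregular nouns are not handled."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if (word.endswith("s") and len(word) > 3
            and not word.endswith(("ss", "is", "us"))):
        return word[:-1]
    return word

def normalize(review):
    """Lowercase, tokenize, then apply slang, spelling and plural handling."""
    tokens = re.findall(r"[a-z0-9']+", review.lower())
    out = []
    for tok in tokens:
        tok = SLANG.get(tok, tok)
        tok = SPELLING.get(tok, tok)
        out.append(singularize(tok))
    return out

def deduplicate(reviews):
    """Keep the first review of each group that normalizes to the same tokens."""
    seen, unique = set(), []
    for review in reviews:
        key = tuple(normalize(review))
        if key not in seen:
            seen.add(key)
            unique.append(review)
    return unique
```

For example, `normalize("The rooms were gr8")` yields the tokens `the`, `room`, `were`, `great`.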
END OF CA1
CA2 : Data Processing
Classifying Sentiments using some
existing methods

 Naïve Bayes
   To determine polarity of sentiments

 Maximum Entropy
   Using probability distributions on the basis of partial knowledge

 Support Vector Machine
   Analyze patterns and classify sentiments
Naïve Bayes Classifier
 To determine polarity of sentiments

 P(X | Y) = P(X)P(Y | X) / P(Y)

 Probability that a sentiment is positive or negative, given its
   contents

 Probability of a word occurring given a positive or negative
   sentiment

 Assumption: words are independent of each other (no link between words)

 P(sentiment | sentence) = P(sentiment)P(sentence | sentiment) /
   P(sentence)
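
A minimal sketch of the formula above as a word-count classifier. Under the independence assumption, P(sentence | sentiment) becomes a product of per-word probabilities, and P(sentence) can be dropped because it is the same for every sentiment. The add-one smoothing, the class name `NaiveBayes` and the training sentences are assumptions made for illustration; the slides do not specify them.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    def __init__(self):
        self.class_counts = Counter()            # number of sentences per sentiment
        self.word_counts = defaultdict(Counter)  # word frequencies per sentiment
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.class_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def log_posteriors(self, text):
        """log P(sentiment) + sum of log P(word | sentiment), up to a constant."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                # Add-one smoothing so unseen words do not zero out the product.
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_posteriors(text)
        return max(scores, key=scores.get)

# Made-up training sentences, for illustration only.
model = NaiveBayes()
model.train([
    ("clean room friendly staff", "pos"),
    ("great pool great breakfast", "pos"),
    ("dirty room rude staff", "neg"),
    ("cold breakfast noisy room", "neg"),
])
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.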
Problem with Naïve Bayes

 Polarity does not change with domain

 Words within sentiments have no relationship with each
  other

 Words not found in lexicon might be missed by Naïve Bayes
  resulting in inaccuracy of polarity

 No opinion rating to determine which sentiment is more
  polar
Solution to Naïve Bayes

 Establish domain sentiment relations

 Establish domain aspects relations

 Establish aspects sentiments relations

 Estimate polarity for unseeded sentiments

 Estimate strength of polarity on sentiments
Establishing relations

 Establish domains by categorizing the aspects found into
  domains such as food, location and security

 Finding occurrence of aspects / sentiments within sentences
  for a particular domain

 Finding polarity of sentences, aspects and sentiments and
  establishing relations

[Diagram: Domain, Sentiments and Aspects, linked to one another]
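
One way to picture these relations is as counted edges between domains, aspects and sentiment words that occur in the same sentence. The sketch below is only an assumption about how such a graph might be built; `DOMAINS`, `SENTIMENT_WORDS` and the sample sentences are hypothetical.

```python
from collections import defaultdict

# Hypothetical domain -> aspects table and sentiment word list.
DOMAINS = {"food": {"breakfast", "buffet"}, "location": {"beach", "station"}}
SENTIMENT_WORDS = {"good", "cold", "near", "far"}

def build_relations(sentences):
    """Count domain-aspect, aspect-sentiment and domain-sentiment co-occurrences."""
    aspect_to_domain = {a: d for d, aspects in DOMAINS.items() for a in aspects}
    edges = defaultdict(int)
    for sentence in sentences:
        words = set(sentence.lower().split())
        aspects = words & set(aspect_to_domain)
        sentiments = words & SENTIMENT_WORDS
        for aspect in aspects:
            domain = aspect_to_domain[aspect]
            edges[(domain, aspect)] += 1
            for sentiment in sentiments:
                edges[(aspect, sentiment)] += 1
                edges[(domain, sentiment)] += 1
    return dict(edges)

# Made-up sentences, for illustration only.
relations = build_relations([
    "breakfast was cold",
    "the buffet was good",
    "beach is near",
    "breakfast was good",
])
```

The edge counts can later serve as weights when polarity is propagated through the graph.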
Finding polarity for unseeded sentiments

 After establishing relations, we have a graph of nodes
  (Sentiments / Aspects)

 Some nodes have no polarity after Naïve Bayes, but their
  connected nodes might have polarity

 Determine the probability that the node is positive or
  negative given its surrounding nodes
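
The slides do not say how this probability is computed. As one possible reading, the sketch below takes a weighted average of P(positive) over the neighbours that already have a polarity; the function name, the graph layout and the numbers are all hypothetical.

```python
def estimate_polarity(node, graph, polarity):
    """Estimate P(positive) for an unseeded node from its seeded neighbours.

    graph:    node -> {neighbour: edge weight}
    polarity: node -> P(positive), for seeded nodes only
    Returns None when no neighbour carries a polarity.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for neighbour, weight in graph.get(node, {}).items():
        if neighbour in polarity:
            weighted_sum += weight * polarity[neighbour]
            total_weight += weight
    if total_weight == 0:
        return None
    return weighted_sum / total_weight

# Made-up example: "spacious" is unseeded, "lobby" has no polarity either.
graph = {"spacious": {"room": 3, "pool": 1, "lobby": 2}}
polarity = {"room": 0.9, "pool": 0.5}
p_spacious = estimate_polarity("spacious", graph, polarity)
```

In this example the estimate for "spacious" is pulled towards "room", because that edge has the largest weight among the seeded neighbours.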
Estimating the strength of polarity

 Determine the strength of the polarity of an unseeded node
   from the number of traversal steps needed to reach it from
   surrounding nodes that have polarity

 Find the shortest path to reach each unseeded node; together
   these paths form a spanning tree

 This will determine the strength of polarity
Implementation

 Using Dijkstra's algorithm to find the spanning tree
Implementation

 Find the cost to get from surrounding nodes to an unseeded
  node
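
The two implementation slides can be sketched as follows: Dijkstra's algorithm run from the seeded nodes gives the cost of reaching every other node, and the recorded parents form the shortest-path tree. The edge costs, the `strength` function and its `decay` parameter are assumptions for illustration; the slides do not state how cost is turned into strength.

```python
import heapq

def dijkstra(graph, sources):
    """Shortest-path cost from the nearest source to every reachable node.

    graph: node -> {neighbour: non-negative cost}
    Returns (dist, parent); parent links form the shortest-path tree.
    """
    dist = {s: 0.0 for s in sources}
    parent = {s: None for s in sources}
    heap = [(0.0, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, cost in graph.get(node, {}).items():
            new_dist = d + cost
            if new_dist < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_dist
                parent[neighbour] = node
                heapq.heappush(heap, (new_dist, neighbour))
    return dist, parent

def strength(distance, decay=0.5):
    """Hypothetical rule: polarity strength halves with every unit of cost."""
    return decay ** distance

# Made-up graph: "clean" is seeded, "spacious" is unseeded.
graph = {
    "clean": {"room": 1, "bathroom": 2},
    "room": {"clean": 1, "spacious": 1},
    "bathroom": {"clean": 2, "spacious": 1},
    "spacious": {"room": 1, "bathroom": 1},
}
dist, parent = dijkstra(graph, ["clean"])
```

With this graph, "spacious" is reached through "room" at cost 2, so `strength(dist["spacious"])` is 0.25 under the assumed decay.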
END OF CA2
What is going to happen
                 in CA3?
Prototyping

 Refining parameters to come up with a prototype mainly to
  solve the following problems:
   Analyze polarity
   Opinion rating
   Sentiment intensity
   Different domains / topic context

 Manually analyze reviews and check the prototype against them
  for effectiveness, seeking to improve accuracy
Prototype testing

 Enlarging the dataset from various hotel review sites

 Merging results to find correlations between sentiment
  expressions on different sites

 Testing on different domains such as food to get
  domain-dependent results
