Mining Users’ Opinions in Hotel Reviews

TEY JUN HONG
U095074X
National University of Singapore
Contents
1. Background
2. Formulating the problem
3. Data Mining Process
4. Techniques
5. Analysis
What is Data Mining?
• Extraction of meaningful, useful or interesting
  patterns from large volumes of data
• In this project, the source is a large
  volume of WEB HOTEL REVIEWS data
• Data mining is one of the top ten emerging
  technologies

MIT’s TECHNOLOGY REVIEW 2004
What is Data Mining?
• Process of exploration and analysis
• By automatic or semi-automatic means
• With little or no human interaction
• To discover meaningful patterns and rules

MASTERING DATA MINING BY BERRY AND LINOFF, 2000
Users’ Opinions on Hotels
• Growth in social media and web users
• Growth in valuable, opinion-oriented hotel data
  as the web expands
• Users identify a suitable hotel to stay in by
  looking at its aspects
• Overall sentiment about hotels is widely sought
  on the web, which motivates sentiment analysis
What can Data Mining do?
 • Identify the best prospects
   (ASPECTS) and retain customers
 • Predict which ASPECTS
   customers like and promote them
   accordingly
 • Learn the parameters influencing
   trends in sales and margins
 • Identify opinions for
   customers

   Sentiment Analysis !!!
What are the problems?
• Exponential growth of users’
  opinions
• Limitations of human analysis
• Accuracy of human analysis

Machines can be trained to take
over this analysis using advanced
computer technology, at LOW COST
Some Limitations of Machines
   • Unable to read like a human
   • No emotions
   • Cannot detect sarcasm
   • Sentiment is expressed differently
     across topics and domains
   • Polarity analysis
   • Distinguishing facts from opinions
Some Machine Limitation Examples
• “The service is as good as none.”
  Negation is not obvious to a machine.

• “Swimming pool is big enough to
  swim with comfort”, “There is a
  big crowd at the counter
  complaining”. Polarity can
  change with context.

• “The room is warmer than the
  lobby”. Comparisons are hard to
  classify.
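A minimal sketch of why the first example fails: a scorer that only sums word polarities from a lexicon has no notion of negation or context. The toy lexicon and the `naive_score` function are hypothetical, assumed here for illustration; they are not the project's lexicon or code.

```python
import re

# Hypothetical toy lexicon (assumed for illustration): +1 positive, -1 negative, 0 neutral.
TOY_LEXICON = {"good": 1, "comfort": 1, "complaining": -1, "big": 0, "warmer": 0}

def naive_score(sentence: str) -> int:
    """Sum the lexicon polarities of the words; negation and context are ignored."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(TOY_LEXICON.get(word, 0) for word in words)

# "as good as none" is negative, but only "good" is in the lexicon, so the score is +1.
print(naive_score("The service is as good as none."))  # 1
```

Because nothing links "none" back to "good", the sentence is scored as positive, and "big" gets the same neutral score whether it describes a pool or a crowd.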
Sentiment Analysis
•   Machine learning
•   Pattern recognition
•   Statistics
•   Databases
Machine Learning
• A tool for data mining and intelligent decision
  support
• Application of computer algorithms that
  improve automatically through experience

MASTERING DATA MINING BY BERRY AND LINOFF, 2000
Types of Machine Learning
 • Supervised learning
   • A training set (data with correct
     answers) is provided and used to
     mine for known patterns
 • Unsupervised learning
   • Data are provided with no prior
     knowledge of the hidden
     patterns they contain
 • Semi-supervised learning
Supervised Learning techniques
    • Rule Mining and Rule learning
    • Bayesian Networks
    • Support Vector Machine
Project Objectives
• Prediction of sentence polarity
• Classification of polarity for the sentiment
  lexicon
• Detection of relations
Prerequisites
• Large data set
• Relevant prior knowledge of the
  domain, in our case the hotel
  domain
  • E.g. ratings
• Sentiment lexicon for sentiment
  analysis
• Data selection for reliability and
  standards
Data Mining Process
Cleaning the “Dirty” Data (60% of effort)
 Frequent problem: data inconsistencies
 •   Duplicate data
 •   Spelling errors (not simply trimmed from the data)
 •   Foreign accents and characters
 •   Singular / plural conversion
 •   Punctuation removal / replacement
 •   Noise and incomplete data
 •   Misused naming conventions: same name but
     different meaning
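Several of these cleaning steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function names, the `KEEP_AS_IS` list and the crude plural rule are hypothetical, spelling correction and noise handling are left out, and a real pipeline would use a proper lemmatizer.

```python
import re
import unicodedata

# Words ending in "s" that the crude plural rule must not touch (hypothetical list).
KEEP_AS_IS = {"this", "thus", "always", "news"}

def strip_accents(text: str) -> str:
    """Decompose accented characters (e.g. 'é' -> 'e' + combining mark) and drop the marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def naive_singular(word: str) -> str:
    """Crude plural-to-singular rule; a real system would use a lemmatizer."""
    if word in KEEP_AS_IS or len(word) <= 3:
        return word
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def clean_reviews(reviews: list[str]) -> list[str]:
    """Lowercase, strip accents, replace punctuation, singularize, and drop duplicates."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for review in reviews:
        text = strip_accents(review.lower())
        text = re.sub(r"[^\w\s]", " ", text)  # punctuation -> space
        result = " ".join(naive_singular(tok) for tok in text.split())
        if result and result not in seen:  # skip empty and duplicate reviews
            seen.add(result)
            cleaned.append(result)
    return cleaned

print(clean_reviews(["Great rooms!", "Great rooms!", "Café was nice."]))
# ['great room', 'cafe was nice']
```

Deduplication runs after normalization, so reviews that differ only in case, accents or punctuation are treated as duplicates.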
Data Preprocessing (Laundering)
•   Part-of-speech (POS) tagging using the Brill
    Tagger
•   Polarity tagging using a sentiment lexicon
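The polarity-tagging step can be pictured as a lookup over the POS tagger's (word, tag) output. This sketch assumes Penn Treebank tags (JJ, JJR, JJS for adjectives) and a hypothetical toy lexicon; words missing from the lexicon come back with `None` so they can be counted later.

```python
from typing import Optional

# Hypothetical toy lexicon; the project's actual sentiment lexicon is not reproduced here.
LEXICON = {"clean": "positive", "dirty": "negative", "noisy": "negative", "big": "neutral"}

# Penn Treebank adjective tags (assumed tag set of the POS tagger output).
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}

def tag_polarity(tagged: list[tuple[str, str]]) -> list[tuple[str, Optional[str]]]:
    """Return (adjective, polarity) pairs; polarity is None when the word is not in the lexicon."""
    return [
        (word.lower(), LEXICON.get(word.lower()))
        for word, pos in tagged
        if pos in ADJECTIVE_TAGS
    ]

tagged = [("The", "DT"), ("room", "NN"), ("is", "VBZ"), ("clean", "JJ"),
          ("and", "CC"), ("spacious", "JJ")]
print(tag_polarity(tagged))  # [('clean', 'positive'), ('spacious', None)]
```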
Findings
•    Part-of-speech (POS) tagging using the Brill
     Tagger – NO PROBLEM
    - About 95% POS tagging accuracy after data
      cleaning
Findings
• Polarity tagging using a sentiment lexicon –
  BIG PROBLEM
 - 40% of sentiment words were not found in the
   sentiment lexicon
 - 10% of sentiment words with a positive or
   negative polarity were found in the neutral
   section of the sentiment lexicon
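The two percentages above can be computed by comparing human-labelled sentiment words against the lexicon. In this sketch the gold labels, the lexicon, the function name and the use of the total number of sentiment words as the denominator for both figures are assumptions; the toy data are made up and do not reproduce the project's results.

```python
def lexicon_coverage(gold: dict[str, str], lexicon: dict[str, str]) -> tuple[float, float]:
    """Return (share of gold sentiment words missing from the lexicon,
    share that are positive/negative in the gold labels but neutral in the lexicon).

    Both shares use the number of gold sentiment words as denominator (an assumption).
    """
    if not gold:
        return 0.0, 0.0
    total = len(gold)
    missing = sum(1 for word in gold if word not in lexicon)
    neutral_but_polar = sum(
        1
        for word, polarity in gold.items()
        if polarity in ("positive", "negative") and lexicon.get(word) == "neutral"
    )
    return missing / total, neutral_but_polar / total

# Made-up toy data: human labels for words seen in reviews, and a small lexicon.
gold = {"clean": "positive", "spacious": "positive", "noisy": "negative", "big": "positive"}
lexicon = {"clean": "positive", "noisy": "negative", "big": "neutral"}
print(lexicon_coverage(gold, lexicon))  # (0.25, 0.25)
```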
Problems
•   The sentiment lexicon is not comprehensive enough
    for the machine learning technique adopted
•   Domain-dependent sentiment words are found in the
    neutral section of the sentiment lexicon
•   The polarity of domain-dependent sentiment words
    can also change within the domain

     EXPANSION OF LEXICON !!!
Solution
• Classify the polarity of unlabeled sentiment
  words using rule-based mining
• Classify domain-dependent sentiment words
• Establish word relations between labeled and
  unlabeled sentiment words
Data Processing
•    Rule-based mining using conjunctions and
     punctuation

     Polarity Assignment   Rule
     Same                  Adj – AND/OR – Adj
     Opposite              Neg Adj – AND/OR – Adj  /  Adj – AND/OR – Neg Adj
     Same                  Neg Adj – AND/OR – Neg Adj
     Opposite              Adj – BUT/NOR – Adj
     Same                  Neg Adj – BUT/NOR – Adj  /  Adj – BUT/NOR – Neg Adj
     Opposite              Neg Adj – BUT/NOR – Neg Adj
     Same                  Adj , Adj
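The table follows a compact pattern: AND/OR and commas keep polarity, BUT/NOR reverse it, and each negated adjective flips the relation once. A minimal sketch of that reading, with a hypothetical `infer_polarity` function and polarities encoded as +1/-1; extending the negation flip to the comma pattern is an assumption the table does not cover.

```python
# Connectives from the rule table: AND/OR and "," keep polarity; BUT/NOR reverse it.
SAME_CONNECTIVES = {"and", "or", ","}
OPPOSITE_CONNECTIVES = {"but", "nor"}

def infer_polarity(known: int, connective: str, neg_known: bool, neg_unknown: bool) -> int:
    """Infer the unlabeled adjective's polarity (+1/-1) from a labeled one.

    neg_known / neg_unknown say whether each adjective is preceded by a negation.
    Exactly one negation flips the relation; two negations cancel out.
    """
    conn = connective.lower()
    if conn in SAME_CONNECTIVES:
        same = True
    elif conn in OPPOSITE_CONNECTIVES:
        same = False
    else:
        raise ValueError(f"unsupported connective: {connective!r}")
    if neg_known != neg_unknown:  # exactly one side negated
        same = not same
    return known if same else -known

# "clean but noisy", with "clean" known to be positive:
print(infer_polarity(1, "but", False, False))  # -1, so "noisy" is labelled negative
```

Encoding the table as one XOR on the negation flags reproduces all seven rows, which makes it easy to add further connectives later.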
Data Processing
•   Relation Network – Aspect – Sentiment word
    pair
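One simple way to build such a network is to link each sentiment adjective to a nearby aspect noun. The pairing heuristic below (closest preceding noun in the same sentence), the function name and the Penn Treebank tags are assumptions for illustration, not necessarily the method used in this project; it misses patterns such as “clean room”, where the adjective precedes its noun.

```python
from collections import defaultdict

NOUN_TAGS = {"NN", "NNS"}        # Penn Treebank noun tags (assumed)
ADJ_TAGS = {"JJ", "JJR", "JJS"}  # Penn Treebank adjective tags (assumed)

def build_relation_network(sentences: list[list[tuple[str, str]]]) -> dict[str, set[str]]:
    """Map each aspect noun to the sentiment adjectives linked to it.

    Assumed heuristic: an adjective is linked to the closest preceding noun in its sentence.
    """
    network = defaultdict(set)
    for tagged in sentences:
        last_noun = None  # reset for every sentence
        for word, pos in tagged:
            if pos in NOUN_TAGS:
                last_noun = word.lower()
            elif pos in ADJ_TAGS and last_noun is not None:
                network[last_noun].add(word.lower())
    return dict(network)

sentences = [
    [("The", "DT"), ("room", "NN"), ("was", "VBD"), ("clean", "JJ"), ("but", "CC"), ("small", "JJ")],
    [("Staff", "NNS"), ("were", "VBD"), ("friendly", "JJ")],
]
print(build_relation_network(sentences))
```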
Analysis
• Using the expanded sentiment lexicon, we
  determine sentiment polarity by looking up
  sentiment words and applying a Bayesian network
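Bayesian Classification
•   To determine the polarity of sentiments, apply Bayes’ theorem:

           P(X | Y) = P(X) P(Y | X) / P(Y)

•   This gives the probability that a sentence’s sentiment is
    positive or negative, given its contents
•   Assumption: words are independent of each other given the
    sentiment, so P(sentence | sentiment) is the product of
    P(word | sentiment) over the words in the sentence
•   P(sentiment | sentence) =
    P(sentiment) P(sentence | sentiment) /
    P(sentence)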
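A minimal multinomial naive Bayes sketch of the formula above. It assumes tokenised sentences with positive/negative labels, adds add-one (Laplace) smoothing, which the slide does not mention, and works in log space to avoid underflow; P(sentence) is dropped because it is the same for every label. The function names and training data are hypothetical.

```python
import math
from collections import Counter

def train_naive_bayes(docs: list[tuple[list[str], str]]):
    """Estimate P(label) and per-label word counts from (tokens, label) pairs."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {label: n / len(docs) for label, n in label_counts.items()}
    return priors, word_counts, vocab

def classify(tokens: list[str], priors, word_counts, vocab) -> str:
    """Pick the label maximising log P(label) + sum of log P(word | label)."""
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        score = math.log(prior)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                # Add-one smoothing so unseen (word, label) pairs do not give log(0).
                score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

docs = [(["clean", "friendly"], "positive"), (["spacious", "clean"], "positive"),
        (["dirty", "noisy"], "negative"), (["rude", "dirty"], "negative")]
model = train_naive_bayes(docs)
print(classify(["clean"], *model))  # positive
```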
Validation
• Precision = N (agree & found) / N (found)
• High precision means most of the sentiment
  words found by the system are correctly labeled
• Recall = N (agree & found) / N (agree)
• High recall means most of the correct sentiment
  words are found by the system
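The two formulas translate directly into set operations. This sketch assumes each system output and each judge label is an (aspect, word, polarity) tuple, so a found pair only counts as agreeing when its polarity matches; the function name, tuple layout and toy data are hypothetical.

```python
def precision_recall(found: set, agreed: set) -> tuple[float, float]:
    """Precision = N(agree & found) / N(found); recall = N(agree & found) / N(agree)."""
    hits = len(found & agreed)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(agreed) if agreed else 0.0
    return precision, recall

# Toy example: the system labels 3 pairs, the judges agree on 4.
found = {("room", "clean", "pos"), ("staff", "rude", "neg"), ("pool", "big", "neg")}
agreed = {("room", "clean", "pos"), ("staff", "rude", "neg"),
          ("pool", "big", "pos"), ("bed", "soft", "pos")}
print(precision_recall(found, agreed))  # 2 hits: precision 2/3, recall 2/4
```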
Validation Results
•   Of the 350 aspect–unlabelled sentiment word pairs,
    only 194 were found by the method.
    Thus, the precision is about 57%.
•   The recall is also not very high: only 126
    words were correctly labelled by the
    system, which is about 63%.
Discussion
•   The results would likely improve if more rules were
    applied, such as treating more adverbs (e.g.
    “excessively”) as negation words.
•   There might not be enough data for the system to
    work on: there are only 350 aspect–unlabelled
    sentiment word pairs for the application to work with.
•   Collecting more data, however, requires more human
    judges to validate it.
Conclusion
• A comprehensive sentiment lexicon is a
  simple yet effective solution for sentiment
  analysis, as it does not require prior training
• The current sentiment lexicon does not capture
  the domain and context sensitivities of
  sentiment expressions
• This leads to poor coverage
• Thus, we advocate expanding the general sentiment
  lexicon to capture the domain and context
  sensitivities of sentiment expressions
Questions?

DEMO

Más contenido relacionado

La actualidad más candente

NLP with Deep Learning
NLP with Deep LearningNLP with Deep Learning
NLP with Deep Learningfmguler
 
Question Answering System using machine learning approach
Question Answering System using machine learning approachQuestion Answering System using machine learning approach
Question Answering System using machine learning approachGarima Nanda
 
Answer Selection and Validation for Arabic Questions
Answer Selection and Validation for Arabic QuestionsAnswer Selection and Validation for Arabic Questions
Answer Selection and Validation for Arabic QuestionsAhmed Magdy Ezzeldin, MSc.
 
Opinion Mining or Sentiment Analysis
Opinion Mining or Sentiment AnalysisOpinion Mining or Sentiment Analysis
Opinion Mining or Sentiment AnalysisRachna Raveendran
 
An overview of text mining and sentiment analysis for Decision Support System
An overview of text mining and sentiment analysis for Decision Support SystemAn overview of text mining and sentiment analysis for Decision Support System
An overview of text mining and sentiment analysis for Decision Support SystemGan Keng Hoon
 
Rigourous evaluation of nlp models in real world deployment
Rigourous evaluation of nlp models in real world deploymentRigourous evaluation of nlp models in real world deployment
Rigourous evaluation of nlp models in real world deploymentSandy Man
 
Intro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringIntro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringTraian Rebedea
 

La actualidad más candente (10)

NLP with Deep Learning
NLP with Deep LearningNLP with Deep Learning
NLP with Deep Learning
 
Question Answering System using machine learning approach
Question Answering System using machine learning approachQuestion Answering System using machine learning approach
Question Answering System using machine learning approach
 
Answer Selection and Validation for Arabic Questions
Answer Selection and Validation for Arabic QuestionsAnswer Selection and Validation for Arabic Questions
Answer Selection and Validation for Arabic Questions
 
Final deck
Final deckFinal deck
Final deck
 
Opinion Mining or Sentiment Analysis
Opinion Mining or Sentiment AnalysisOpinion Mining or Sentiment Analysis
Opinion Mining or Sentiment Analysis
 
An overview of text mining and sentiment analysis for Decision Support System
An overview of text mining and sentiment analysis for Decision Support SystemAn overview of text mining and sentiment analysis for Decision Support System
An overview of text mining and sentiment analysis for Decision Support System
 
Abstractive Review Summarization
Abstractive Review SummarizationAbstractive Review Summarization
Abstractive Review Summarization
 
Arabic question answering ‫‬
Arabic question answering ‫‬Arabic question answering ‫‬
Arabic question answering ‫‬
 
Rigourous evaluation of nlp models in real world deployment
Rigourous evaluation of nlp models in real world deploymentRigourous evaluation of nlp models in real world deployment
Rigourous evaluation of nlp models in real world deployment
 
Intro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringIntro to Deep Learning for Question Answering
Intro to Deep Learning for Question Answering
 

Destacado

3 largest urban area in each continent (1)
3 largest urban area in each continent (1)3 largest urban area in each continent (1)
3 largest urban area in each continent (1)proudyproud
 
urban area picture
urban area pictureurban area picture
urban area pictureproudyproud
 

Destacado (8)

Ee3702
Ee3702Ee3702
Ee3702
 
Fypca5
Fypca5Fypca5
Fypca5
 
Proffice Denmark
Proffice DenmarkProffice Denmark
Proffice Denmark
 
3 largest urban area in each continent (1)
3 largest urban area in each continent (1)3 largest urban area in each continent (1)
3 largest urban area in each continent (1)
 
Ee3702
Ee3702Ee3702
Ee3702
 
urban area picture
urban area pictureurban area picture
urban area picture
 
Fypca4
Fypca4Fypca4
Fypca4
 
Fypca5
Fypca5Fypca5
Fypca5
 

Similar a Fypca4

Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...hajinouha0
 
II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...
II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...
II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...Dr. Haxel Consult
 
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comEnhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comSimon Hughes
 
Big Data + Sentiment Analysis = Awesome
Big Data + Sentiment Analysis = AwesomeBig Data + Sentiment Analysis = Awesome
Big Data + Sentiment Analysis = AwesomeAdel Rahimi
 
Correlation does not mean causation
Correlation does not mean causationCorrelation does not mean causation
Correlation does not mean causationPeter Varhol
 
20211115 jsai international_symposia_slide
20211115 jsai international_symposia_slide20211115 jsai international_symposia_slide
20211115 jsai international_symposia_slideSatoshi Kawamoto
 
Design Recommender systems from scratch
Design Recommender systems from scratchDesign Recommender systems from scratch
Design Recommender systems from scratchDr. Amit Sachan
 
Recommender Systems in a nutshell
Recommender Systems in a nutshellRecommender Systems in a nutshell
Recommender Systems in a nutshellKonstantin Savenkov
 
Not fair! testing ai bias and organizational values
Not fair! testing ai bias and organizational valuesNot fair! testing ai bias and organizational values
Not fair! testing ai bias and organizational valuesPeter Varhol
 
Not fair! testing AI bias and organizational values
Not fair! testing AI bias and organizational valuesNot fair! testing AI bias and organizational values
Not fair! testing AI bias and organizational valuesPeter Varhol
 
Recommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic AlgorithmRecommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic AlgorithmVaibhav Varshney
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introductionananth
 
Umm, how did you get that number? Managing Data Integrity throughout the Data...
Umm, how did you get that number? Managing Data Integrity throughout the Data...Umm, how did you get that number? Managing Data Integrity throughout the Data...
Umm, how did you get that number? Managing Data Integrity throughout the Data...John Kinmonth
 
Using AI to Build Fair and Equitable Workplaces
Using AI to Build Fair and Equitable WorkplacesUsing AI to Build Fair and Equitable Workplaces
Using AI to Build Fair and Equitable WorkplacesData Con LA
 
Semi supervised approach for word sense disambiguation
Semi supervised approach for word sense disambiguationSemi supervised approach for word sense disambiguation
Semi supervised approach for word sense disambiguationkokanechandrakant
 
Machine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersMachine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersAlbert Y. C. Chen
 
Advanced topics research
Advanced topics researchAdvanced topics research
Advanced topics researchkieran122
 

Similar a Fypca4 (20)

Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
Sld-Natural-Language-Processing-for-large-volumes-of-human-text-data-Sozzi-Br...
 
II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...
II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...
II-SDV 2017: Localizing International Content for Search, Data Mining and Ana...
 
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comEnhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
 
Collective sensing
Collective sensingCollective sensing
Collective sensing
 
Big Data + Sentiment Analysis = Awesome
Big Data + Sentiment Analysis = AwesomeBig Data + Sentiment Analysis = Awesome
Big Data + Sentiment Analysis = Awesome
 
Correlation does not mean causation
Correlation does not mean causationCorrelation does not mean causation
Correlation does not mean causation
 
20211115 jsai international_symposia_slide
20211115 jsai international_symposia_slide20211115 jsai international_symposia_slide
20211115 jsai international_symposia_slide
 
Design Recommender systems from scratch
Design Recommender systems from scratchDesign Recommender systems from scratch
Design Recommender systems from scratch
 
Recommender Systems in a nutshell
Recommender Systems in a nutshellRecommender Systems in a nutshell
Recommender Systems in a nutshell
 
Better Search Engine Testing
Better Search Engine TestingBetter Search Engine Testing
Better Search Engine Testing
 
Not fair! testing ai bias and organizational values
Not fair! testing ai bias and organizational valuesNot fair! testing ai bias and organizational values
Not fair! testing ai bias and organizational values
 
Not fair! testing AI bias and organizational values
Not fair! testing AI bias and organizational valuesNot fair! testing AI bias and organizational values
Not fair! testing AI bias and organizational values
 
Recommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic AlgorithmRecommendation engine Using Genetic Algorithm
Recommendation engine Using Genetic Algorithm
 
Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introduction
 
Umm, how did you get that number? Managing Data Integrity throughout the Data...
Umm, how did you get that number? Managing Data Integrity throughout the Data...Umm, how did you get that number? Managing Data Integrity throughout the Data...
Umm, how did you get that number? Managing Data Integrity throughout the Data...
 
Using AI to Build Fair and Equitable Workplaces
Using AI to Build Fair and Equitable WorkplacesUsing AI to Build Fair and Equitable Workplaces
Using AI to Build Fair and Equitable Workplaces
 
SciBite
SciBiteSciBite
SciBite
 
Semi supervised approach for word sense disambiguation
Semi supervised approach for word sense disambiguationSemi supervised approach for word sense disambiguation
Semi supervised approach for word sense disambiguation
 
Machine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersMachine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional Managers
 
Advanced topics research
Advanced topics researchAdvanced topics research
Advanced topics research
 

Último

20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-pyJamie (Taka) Wang
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDELiveplex
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 

Último (20)

20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
20150722 - AGV
20150722 - AGV20150722 - AGV
20150722 - AGV
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 

Fypca4

  • 1. Mining User’s Opinions in Hotel TEY JUN HONG U095074X National University Of Singapore
  • 2. Content 1. Background 2. Formulating the problem 3. Data Mining Process 4. Techniques 5. Analysis 01
  • 3. What is Data Mining? • Extraction of meaningful / useful / Interesting patterns from a large volume of data sources • In this project, the source will be large volume of WEB HOTEL REVIEWS data • Data mining is one of the top ten emerging technology MIT’s TECHNOLOGY REVIEW 2004
  • 4. What is Data • Mining? Process of exploration and analysis • By automatic / semi automatic means • With little or no human interactions • To discover meaningful patterns and rules MASTERING DATA MINING BY BERRY AND LINOFF, 2000
  • 5. User’s Opinions in Hotel • Increase in social media and web user • Increase in valuable opinion oriented data in Hotel due to web expansion • Identify potential hotel to stay by looking at the aspects • Overall Sentiments on hotel are greatly sought on the web for Sentiment Analysis
  • 6. What can Data Mining do? • Identify best prospects (ASPECTS), and retain customers • Predict what ASPECTS customers like and promote accordingly • Learn parameters influencing trends in sales and margins • Identification of opinions for customers Sentiment Analysis !!!
  • 7. What are the problems? • Exponential growth of user’s opinions • Limitations of human analysis • Accuracy of human analysis Machines can be trained to take over human analysis with advanced computer technology and it is done with LOW COST
  • 8. Some Limitations of machines • Unable to read like a human • No emotions • Cannot detect sarcasm • Expression of sentiments in different topic and domain • Polarity analysis • Facts Vs Opinion
  • 9. Some machine limitation • “The service is as good as none”. examples Negation not obvious to machine • “Swimming pool is big enough to swim with comfort” , “There is a big crowd at the counter complaining”. Polarity might change with context. • “The room is warmer than the lobby”. Comparisons are hard to classify
  • 10. Sentiment • Analysis Machine learning • Pattern recognition • Statistics • Databases
  • 11. Machine Learning • A tool for data mining and intelligent decision support • Application of computer algorithms that improve automatically through experience MASTERING DATA MINING BY BERRY AND LINOFF, 2000
  • 12. Types of Machine learning • Supervised Learning • A training set is provided (data with correct answers) which is used to mine for known pattern • Unsupervised Learning • Data are provided with no prior knowledge of the hidden patterns that they contain. • Semi Supervised Learning
  • 13. Supervised Learning Techniques • Rule mining and rule learning • Bayesian networks • Support vector machines
  • 14. Project Objectives • Prediction of sentence polarity • Polarity classification for the sentiment lexicon • Detection of relations
  • 15. Prerequisites • A large data set • Relevant prior knowledge of the domain (in our case, the hotel domain), e.g. ratings • A sentiment lexicon for sentiment analysis • Data selection for reliability and standards
  • 17. Cleaning the “Dirty” Data (60% of the effort) • Frequent problem: data inconsistencies • Duplicate data • Spelling errors (do not simply trim them from the data) • Foreign accents and characters • Singular/plural conversion • Punctuation removal/replacement • Noise and incomplete data • Misused naming conventions: same name but different meaning
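Several of the cleaning steps listed above can be sketched in a few lines of Python. This is an illustrative sketch, not the project's pipeline: `clean_reviews`, `strip_accents` and the crude plural rule in `naive_singular` are hypothetical helpers. Accents are folded with the standard `unicodedata` module, punctuation is replaced with spaces, and duplicates are dropped after normalization; spelling correction and naming-convention problems are not handled.

```python
import string
import unicodedata
from typing import List


def strip_accents(text: str) -> str:
    # Decompose accented characters ("é" -> "e" + combining mark) and drop the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_singular(word: str) -> str:
    # Very rough plural -> singular rule (hypothetical); a real pipeline would lemmatize.
    if len(word) > 3 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def clean_reviews(reviews: List[str]) -> List[str]:
    # Map every punctuation character to a space.
    table = str.maketrans({p: " " for p in string.punctuation})
    seen = set()
    cleaned = []
    for review in reviews:
        text = strip_accents(review).lower().translate(table)
        normalized = " ".join(naive_singular(t) for t in text.split())
        if normalized and normalized not in seen:  # drop empty and duplicate reviews
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned


print(clean_reviews(["Great café, friendly staffs!", "great cafe friendly staffs", "Rooms were tiny."]))
```

The call above returns `["great cafe friendly staff", "room were tiny"]`: the second review is dropped as a duplicate once accents, case and punctuation are normalized. The suffix rule is deliberately crude (it would turn “this” into “thi”), which is why a real pipeline would use a proper lemmatizer.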
  • 18. Data Preprocessing (Laundering) • Part-of-speech (POS) tagging using the Brill tagger • Polarity tagging using a sentiment lexicon
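The polarity-tagging step can be sketched as a dictionary lookup over the output of a POS tagger. In this hypothetical Python sketch, the input is assumed to be (word, tag) pairs using Penn Treebank tags (the tag set the Brill tagger is typically trained on), only adjectives are looked up, and the three-word lexicon is made up for illustration. Adjectives missing from the lexicon are marked `unknown`, the kind of gap reported in the findings slides.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical three-word lexicon; a real lexicon has thousands of entries.
LEXICON: Dict[str, str] = {"clean": "positive", "dirty": "negative", "large": "neutral"}


def tag_polarity(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str, Optional[str]]]:
    """Attach a lexicon polarity to each adjective (Penn Treebank tags JJ, JJR, JJS)."""
    result = []
    for word, pos in tagged:
        if pos.startswith("JJ"):
            polarity: Optional[str] = LEXICON.get(word.lower(), "unknown")
        else:
            polarity = None  # only adjectives are treated as sentiment words here
        result.append((word, pos, polarity))
    return result


sentence = [("The", "DT"), ("room", "NN"), ("was", "VBD"),
            ("large", "JJ"), ("and", "CC"), ("spotless", "JJ")]
for word, pos, polarity in tag_polarity(sentence):
    print(word, pos, polarity)
```

Here “large” comes back as `neutral` and “spotless” as `unknown`, while non-adjectives are left as `None`.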
  • 19. Findings • Part-of-speech (POS) tagging using the Brill tagger: NO PROBLEM • 95% accuracy in POS tagging words after data cleaning
  • 20. Findings • Polarity tagging using the sentiment lexicon: BIG PROBLEM • 40% of sentiment words were not found in the sentiment lexicon • 10% of sentiment words with a positive or negative polarity were found in the neutral section of the sentiment lexicon
  • 21. Problems • The sentiment lexicon is not comprehensive enough for the machine learning technique adopted • Domain-dependent sentiment words are found in the neutral section of the sentiment lexicon • The polarity of domain-dependent sentiment words can also change within the domain, depending on context • EXPANSION OF LEXICON!!!
  • 22. Solution • Classify the polarity of unlabeled sentiment words using rule-based mining • Classify domain-dependent sentiment words • Establish word relations between labeled and unlabeled sentiment words
  • 23. Data Processing • Rule-based mining using conjunctions and punctuation • Polarity assignment rules (pattern: resulting polarity of the two adjectives):
      Adj AND/OR Adj: same
      Neg-Adj AND/OR Adj, or Adj AND/OR Neg-Adj: opposite
      Neg-Adj AND/OR Neg-Adj: same
      Adj BUT/NOR Adj: opposite
      Neg-Adj BUT/NOR Adj, or Adj BUT/NOR Neg-Adj: same
      Neg-Adj BUT/NOR Neg-Adj: opposite
      Adj, Adj (comma): same
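The conjunction rules on slide 23 follow a compact pattern: AND, OR and the comma keep polarity, BUT and NOR flip it, and each negated adjective flips it once more. A minimal Python sketch of that reading, with hypothetical function and parameter names, reproduces every row of the rule table:

```python
from typing import Optional

SAME_POLARITY = {"and", "or", ","}   # connectives that keep polarity
OPPOSITE_POLARITY = {"but", "nor"}   # connectives that flip polarity


def infer_polarity(known: int, known_negated: bool, conj: str, other_negated: bool) -> Optional[int]:
    """Polarity (+1 / -1) of an unlabelled adjective joined to a labelled one.

    known: polarity of the labelled adjective; *_negated: whether each adjective
    is preceded by a negation word. Returns None if no rule covers `conj`.
    """
    conj = conj.lower()
    if conj in SAME_POLARITY:
        flips = 0
    elif conj in OPPOSITE_POLARITY:
        flips = 1
    else:
        return None
    # Every negation flips the relation once more (Neg-Adj AND Adj -> opposite, etc.).
    flips += int(known_negated) + int(other_negated)
    return known if flips % 2 == 0 else -known


print(infer_polarity(1, False, "and", False))  # 1: "clean and spacious"
print(infer_polarity(1, False, "but", False))  # -1: "clean but noisy"
print(infer_polarity(1, False, "but", True))   # 1: "clean but not spacious"
```

The last call matches the Adj BUT/NOR Neg-Adj row: contrasting a positive “clean” with “not spacious” implies that “spacious” itself is positive.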
  • 24. Data Processing • Relation network: aspect–sentiment word pairs
  • 25. Data Processing • Relation network: aspect–sentiment word pairs
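The speaker notes describe how the relation network is used: unlabelled aspect–sentiment word pairs are visited starting with the most frequent, each is labelled from the probability that it is positive given its labelled neighbours, and the process repeats until the unlabelled pairs have a polarity. The Python sketch below is one possible reading of that procedure, not the author's implementation: the graph representation, the `propagate` function, and the majority-vote estimate of that probability are all assumptions. Edges carry +1 for a same-polarity relation and -1 for an opposite-polarity relation (for example, as derived from the conjunction rules), and the loop stops early if no further pair can be labelled.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (aspect, sentiment word)


def propagate(
    labels: Dict[Pair, int],
    relations: List[Tuple[Pair, Pair, int]],
    frequency: Dict[Pair, int],
) -> Dict[Pair, int]:
    """Label unlabelled (aspect, word) pairs from their labelled neighbours.

    labels: known polarity (+1 / -1) per pair.
    relations: (pair_a, pair_b, sign), sign +1 = same polarity, -1 = opposite.
    frequency: occurrence count per pair; unlabelled pairs are visited most frequent first.
    """
    neighbours: Dict[Pair, List[Tuple[Pair, int]]] = defaultdict(list)
    for a, b, sign in relations:
        neighbours[a].append((b, sign))
        neighbours[b].append((a, sign))

    labels = dict(labels)  # do not mutate the caller's dictionary
    pending = sorted((p for p in frequency if p not in labels), key=lambda p: -frequency[p])

    progress = True
    while pending and progress:
        progress = False
        for pair in list(pending):
            # Each labelled neighbour votes for a polarity via the edge sign.
            votes = [labels[n] * sign for n, sign in neighbours[pair] if n in labels]
            if not votes:
                continue  # nothing to go on yet; retry on the next pass
            p_positive = sum(1 for v in votes if v > 0) / len(votes)
            if p_positive == 0.5:
                continue  # tie: leave unlabelled
            labels[pair] = 1 if p_positive > 0.5 else -1
            pending.remove(pair)
            progress = True
    return labels


seed = {("room", "clean"): 1}
edges = [
    (("room", "clean"), ("room", "spacious"), 1),   # "clean and spacious"
    (("room", "spacious"), ("room", "noisy"), -1),  # "spacious but noisy"
]
counts = {("room", "clean"): 5, ("room", "spacious"): 3, ("room", "noisy"): 2}
print(propagate(seed, edges, counts))
# {('room', 'clean'): 1, ('room', 'spacious'): 1, ('room', 'noisy'): -1}
```

Keying labels by (aspect, word) rather than by word alone lets the same word carry different polarities for different aspects, as with “large” for screen size versus phone size.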
  • 26. Analysis • Using the expanded sentiment lexicon, we analyze sentiment polarity by performing a sentiment lookup with a Bayesian network
  • 27. Bayesian • Used to determine the polarity of sentiments: P(X | Y) = P(X) P(Y | X) / P(Y) • The probability that a sentiment is positive or negative, given its contents • Assumption: there is no link between words (they are conditionally independent given the sentiment) • P(sentiment | sentence) = P(sentiment) P(sentence | sentiment) / P(sentence)
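Under the “no link between words” assumption, P(sentence | sentiment) factorizes into a product of per-word probabilities P(word | sentiment), which is the naive Bayes classifier: a Bayesian network in which the sentiment node is the only parent of every word node. P(sentence) is the same for every sentiment, so it can be dropped when choosing the most likely one. A minimal Python sketch, assuming tokenized training sentences with known labels and add-one smoothing (a choice not specified on the slide), with hypothetical names `train` and `classify`:

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def train(docs: List[Tuple[List[str], str]]) -> Tuple[Counter, Dict[str, Counter], Set[str]]:
    """Count sentiment priors and per-sentiment word frequencies."""
    priors = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in priors}
    for words, label in docs:
        word_counts[label].update(words)
    vocab = {w for words, _ in docs for w in words}
    return priors, word_counts, vocab


def classify(words: List[str], priors: Counter, word_counts: Dict[str, Counter], vocab: Set[str]) -> str:
    """Return argmax over sentiments of log P(sentiment) + sum of log P(word | sentiment)."""
    total = sum(priors.values())
    best_label, best_score = "", -math.inf
    for label, count in priors.items():
        n_words = sum(word_counts[label].values())
        score = math.log(count / total)
        for w in words:
            # Add-one smoothing keeps unseen words from zeroing out the product.
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    (["clean", "spacious", "room"], "positive"),
    (["friendly", "staff"], "positive"),
    (["dirty", "room"], "negative"),
    (["rude", "staff", "noisy"], "negative"),
]
model = train(training)
print(classify(["clean", "staff"], *model))  # positive
print(classify(["dirty", "noisy"], *model))  # negative
```

Working in log space avoids numeric underflow when many small per-word probabilities are multiplied.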
  • 28. Validation • Precision = N(agree & found) / N(found) • High precision means most of the sentiment words found by the system are correctly labeled • Recall = N(agree & found) / N(agree) • High recall means most of the correct sentiment words are found by the system
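The two validation formulas can be computed directly once “agree” and “found” are represented as sets. In this hypothetical Python sketch, `agree` is assumed to be the (word, polarity) labels agreed on by human judges and `found` the labels produced by the system; a pair counts toward N(agree & found) only if the system found the word and gave it the agreed polarity.

```python
from typing import Set, Tuple

Label = Tuple[str, str]  # (sentiment word, polarity)


def precision_recall(found: Set[Label], agree: Set[Label]) -> Tuple[float, float]:
    """Precision = N(agree & found) / N(found); recall = N(agree & found) / N(agree)."""
    correct = found & agree  # labels the system got right
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(agree) if agree else 0.0
    return precision, recall


agree = {("spacious", "positive"), ("noisy", "negative"), ("tiny", "negative"), ("quiet", "positive")}
found = {("spacious", "positive"), ("noisy", "positive"), ("tiny", "negative")}
print(precision_recall(found, agree))  # (0.6666666666666666, 0.5)
```

With four judged labels and three system labels, of which two match, precision is 2/3 and recall is 2/4.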
  • 29. Validation Results • Out of the 350 aspect–unlabelled sentiment word pairs, only 194 are found by the method (about 55%). • Only 126 of the found words are correctly labelled by the system (about 65%).
  • 30. Discussion • The results should improve if more rules are applied, such as including more adverbs like “excessively” as negation words. • The data set may not be large enough for the system: there are only 350 aspect–unlabelled sentiment word pairs for the application to work with. • A larger data set, however, requires more human judges to validate the data.
  • 31. Conclusion • A comprehensive sentiment lexicon is a simple yet effective solution to sentiment analysis, as it does not require prior training • Current sentiment lexicons do not capture the domain and context sensitivities of sentiment expressions
  • 32. Conclusion • This leads to poor coverage • Thus, expanding general sentiment lexicons to capture the domain and context sensitivities of sentiment expressions is advocated
  • 33. Questions? • DEMO

Speaker Notes

  1. What can we infer from users’ opinions of hotels?
  2. What can data mining do in the hotel domain? In other words, learn the market.
  3. It is impossible for humans to read every single opinion, and humans are biased toward reading certain opinions. Machines allow fast access to vast amounts of data and allow computationally intensive algorithms and statistical methods.
  4. It is impossible for humans to read every single opinion, and humans are biased toward reading certain opinions. Machines allow fast access to vast amounts of data and allow computationally intensive algorithms and statistical methods.
  5. Data mining spans many fields; in this project we focus on these four.
  6. Growing data volume, the limitations of humans, and low cost compared with human analysis.
  7. The goal of unsupervised learning is to discover these patterns. Semi-supervised: knowledge gained from one data collection is applied in order to mine, classify, analyze, and interpret a related data collection.
  8. Some of the problems to be solved by data mining: prediction of sentence polarity; polarity classification for the sentiment lexicon; detection of relations.
  9. Data inconsistencies: e.g. a review says “good” in the title but “bad” in the body.
  10. Assigning a label to every word in the text so that the machine can do something with it.
  11. POS tagging goes wrong for words such as “heart” that can receive two different tags.
  12. For example, in the domain of handheld devices, the word “large” can express positivity for screen size but negativity for phone size.
  13. Assigning a label to every word in the text so that the machine can do something with it.
  14. After establishing relations, we have a graph of nodes (sentiments/aspects). Determine the probability that a node is positive or negative given its surrounding nodes. Start with a high-frequency unlabelled sentiment word–aspect pair; then, based on the aspect and its labelled sentiment pairs, determine the polarity of the unlabelled word. This process iterates until all unlabelled words have found their polarity.
  15. After establishing relations, we have a graph of nodes (sentiments/aspects). Determine the probability that a node is positive or negative given its surrounding nodes. Start with a high-frequency unlabelled sentiment word–aspect pair; then, based on the aspect and its labelled sentiment pairs, determine the polarity of the unlabelled word. This process iterates until all unlabelled words have found their polarity.
  16. Assigning a label to every word in the text so that the machine can do something with it.
  17. A comprehensive sentiment lexicon can provide a simple yet effective solution to sentiment analysis, because it is general and does not require prior training. Therefore, attention and effort have been paid to the construction of such lexicons. However, a significant challenge to this approach is that the polarity of many words is domain and context dependent. For example, ‘long’ is positive in ‘long battery life’ and negative in ‘long shutter lag.’ Current sentiment lexicons do not capture such domain and context sensitivities of sentiment expressions. They either exclude such domain- and context-dependent sentiment expressions or tag them with an overall polarity tendency based on statistics gathered from a certain corpus, such as the World Wide Web. While excluding such expressions leads to poor coverage, simply tagging them with a polarity tendency leads to poor precision.
  18. They either exclude such domain- and context-dependent sentiment expressions or tag them with an overall polarity tendency based on statistics gathered from a certain corpus, such as the World Wide Web. While excluding such expressions leads to poor coverage, simply tagging them with a polarity tendency leads to poor precision.