Building Large Arabic Multi-Domain Resources for Sentiment Analysis
1. Building Large Arabic Multi-Domain Resources for Sentiment Analysis
Hady ElSahar and Samhaa R. El-Beltagy
Center for Informatics Science, Nile University
CICLing 2015 – April 19, 2015
hadyelsahar@gmail.com
2. Agenda
• Problem Statement
• Building Multi-Domain Datasets for Sentiment Analysis
• Building Multi-Domain lexicons
• Experiments and Evaluation
• Mining experiments results
4. Problem Statement
Current resources for Arabic sentiment analysis suffer from several deficiencies:
• Small size
• Domain specificity
• Not publicly available
• Insufficient coverage of different Arabic dialects and non-standard terms
5. Problem Statement
Related work: sentiment datasets

Author | Dataset name | Size | Multi-domain | Publicly available
Rushdi-Saleh et al. | OCA | 500 | No | Yes
Abdul-Mageed & Diab | AWATIF | < 10K | Yes | No
Aly, M. & Atiya, A. | LABR | 63K | No | Yes
Eshrag Refaee et al. | Twitter Corpus | 8,868 | N/A | Yes
6. Problem Statement
Related work: sentiment lexicons

Author | Size | MSA / Dialect | Multi-domain | Publicly available
El-Beltagy et al. | 4K | MSA + Dialect | N/A | Yes
Abdul-Mageed & Diab (SANA) | 225K | MSA + Dialect | Yes | No
Badaro et al. | 150K | MSA only | N/A | Yes
7. Proposed Solution
Build large Arabic datasets and lexicons for sentiment analysis that are:
• Large in size
• Multi-domain
• Covering Arabic dialects
• Well documented and tested for sentiment classification
• Publicly available for everyone to use
10. Building Datasets
• Lack of Arabic reviewing content on the Internet:
• Fewer Arabic-based e-commerce and reviewing websites
• Arabic speakers often write their reviews in English
English ***, do you speak it?!
11. Building Datasets
Scraping Arabic reviewing content on the Internet. Domains and reviewing websites scraped:
• Hotel reviews
• Restaurant reviews
• Product reviews
• Movie reviews
12. Building Datasets
• Normalize the different rating systems into (positive, negative, neutral) classes using heuristics.
• Automatically label reviews from their ratings.
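As a sketch of the rating-normalization heuristic, the following maps a numeric review rating onto one of the three polarity classes. The thresholds are illustrative assumptions, not the paper's exact cut-offs:

```python
def rating_to_polarity(rating, max_rating=5):
    """Map a numeric review rating onto a polarity class.

    Illustrative heuristic: the top ratings count as positive,
    the bottom ratings as negative, and the middle as neutral.
    """
    normalized = rating / max_rating      # scale to [0, 1]
    if normalized >= 0.8:                 # e.g. 4-5 stars out of 5
        return "positive"
    if normalized <= 0.4:                 # e.g. 1-2 stars out of 5
        return "negative"
    return "neutral"                      # e.g. 3 stars out of 5
```

The same function handles 10-point scales via `max_rating=10`, which is how different sites' rating systems end up in a single label set.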
13. Building Datasets
• Remove redundant and spam reviews
• Remove contradicting reviews (similar text, different polarity)
• Remove duplicate reviews
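The cleaning steps above can be sketched as follows; the lowercase/whitespace normalization used as the similarity key is an assumption, and the real pipeline may use a stronger text-similarity measure:

```python
def clean_reviews(reviews):
    """Drop duplicate and contradicting labeled reviews.

    `reviews` is a list of (text, polarity) pairs. Two reviews
    contradict when the same (lightly normalized) text carries
    different polarity labels; all such copies are discarded.
    Exact duplicates keep only their first occurrence.
    """
    # Collect the set of polarities observed for each normalized text.
    polarities = {}
    for text, polarity in reviews:
        key = " ".join(text.lower().split())
        polarities.setdefault(key, set()).add(polarity)

    seen = set()
    cleaned = []
    for text, polarity in reviews:
        key = " ".join(text.lower().split())
        if len(polarities[key]) > 1:    # contradicting copies: drop all
            continue
        if key in seen:                 # duplicate: keep first only
            continue
        seen.add(key)
        cleaned.append((text, polarity))
    return cleaned
```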
18. Building multi-domain lexicons
• Manually hand-crafting sentiment lexicons is a tedious task
• The proposed approach utilizes the feature selection and ranking behaviour of Support Vector Machines (SVMs)
• An SVM with an L1 regularization penalty results in sparse coefficient vectors
Coefficient vector of a trained support vector machine (example):

Term | Translation | Coefficient
لا يستحق | doesn't deserve | -0.532
سئ | bad | -0.52
فشل | failure | -0.4
… | … | …
حصل | happen | 0
مشهد | scene | 0
… | … | …
افضل | better | 0.270
رائع | wonderful | 0.272
ممتع | enjoyable | 0.357
19. Building multi-domain lexicons
Pipeline: Datasets → train L1-norm SVM → select top features → manual verification → multi-domain lexicons
• Train an SVM classifier on each of the generated datasets using a unigram + bigram model
• Omit features corresponding to zero coefficients
• Label features with positive coefficient values as positive lexicon entries
• Label features with negative coefficient values as negative lexicon entries
• Manually filter and verify the resulting lexicon (a lot easier than building one from scratch!)
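The feature-to-lexicon step can be sketched as below. It assumes the trained classifier's weights are already available as a term-to-coefficient mapping (e.g. from scikit-learn's `LinearSVC(penalty='l1')` via `coef_` and the vectorizer vocabulary); `top_k` is an illustrative knob, not the paper's exact selection rule:

```python
def coefficients_to_lexicon(coef, top_k=None):
    """Turn a sparse coefficient vector into a sentiment lexicon.

    `coef` maps each n-gram feature to its weight from an
    L1-regularized linear classifier (most weights are exactly 0
    under the L1 penalty). Positive weights become positive lexicon
    entries, negative weights negative entries; zeros are dropped.
    """
    positive = sorted(
        (term for term, w in coef.items() if w > 0),
        key=lambda t: -coef[t])          # strongest positive first
    negative = sorted(
        (term for term, w in coef.items() if w < 0),
        key=lambda t: coef[t])           # strongest negative first
    if top_k is not None:                # optionally truncate before
        positive = positive[:top_k]      # manual verification
        negative = negative[:top_k]
    return positive, negative
```

Using the example coefficients from the figure, "enjoyable" (0.357) heads the positive list and "doesn't deserve" (-0.532) the negative one, while zero-weight terms like "scene" never enter the lexicon.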
20. Building multi-domain lexicons from the datasets
Size of the built multi-domain lexicons before and after manual filtration:

 | Hotels | Restaurants | Movies | Products | LABR / Books | All
# non-zero coef. features | 556 | 1413 | 526 | 661 | 3552 | 6708
# manually filtered | 218 | 734 | 87 | 369 | 874 | 1913
21. Building multi-domain lexicons from the datasets
Selected examples from the generated lexicons:

• Hotels: لن أعود (not coming back), المياه ضعيفة (low water pressure)
• Restaurants: بارد (cold), يشبع (enough portions)
• Movies: يستحق المشاهدة (worth watching), برافو (bravo)
23. Experiments and benchmarking the datasets
Benchmarking the datasets for the task of sentiment analysis:
• Verify the viability of using the datasets for sentiment analysis
• Test the effectiveness of the generated lexicons
• Publish the results of all experiments for further analysis
• Provide an easy benchmarking framework for future sentiment classifiers
24. Experiments and benchmarking the datasets
Dataset setups:
• 2-class sentiment classification (positive or negative)
• 3-class sentiment classification (positive, negative, or mixed/neutral)
• Balanced / unbalanced setups
• 20%-80% splits (testing the generated lexicons on unseen data)
• Cross-validation
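A minimal sketch of building one balanced train/test setup, assuming an 80/20 split and simple per-class downsampling (the benchmark's exact ratios and sampling strategy may differ):

```python
import random

def balanced_split(reviews, test_ratio=0.2, seed=0):
    """Build one balanced train/test benchmark setup.

    Downsample every polarity class to the size of the smallest
    class, then hold out `test_ratio` of each class for testing.
    """
    rng = random.Random(seed)
    by_class = {}
    for text, polarity in reviews:
        by_class.setdefault(polarity, []).append((text, polarity))

    smallest = min(len(items) for items in by_class.values())
    train, test = [], []
    for items in by_class.values():
        rng.shuffle(items)
        items = items[:smallest]                  # balance classes
        cut = int(smallest * (1 - test_ratio))    # e.g. 80% train
        train.extend(items[:cut])
        test.extend(items[cut:])
    return train, test
```

The unbalanced setups skip the downsampling step; the cross-validation setups replace the single split with rotating folds.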
25. Experiments and benchmarking the datasets
Feature building methods:
• Standard feature building methods:
• Count, TF-IDF, Delta TF-IDF
• Features built from the generated lexicons:
• Term existence, term count, weighted count
• Domain-specific lexicon, domain-general lexicon
• Merging lexicon-based features with other features
Classifiers: Linear SVM, Logistic Regression, Bernoulli Naive Bayes (BNB), KNN, and SGD
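The three lexicon-based feature variants can be sketched as below for single-word lexicon entries (multi-word entries would need phrase matching; the names and exact encoding are assumptions):

```python
def lexicon_features(tokens, lexicon):
    """Build the three lexicon-based feature variants for a document.

    `lexicon` maps terms to polarity weights (e.g. +1/-1, or the
    SVM coefficients). Per lexicon term, returns:
      existence -- 1 if the term occurs in the document, else 0
      count     -- number of occurrences
      weighted  -- occurrence count times the term's polarity weight
    """
    features = {}
    for term, weight in lexicon.items():
        count = tokens.count(term)
        features[term] = {
            "existence": int(count > 0),
            "count": count,
            "weighted": count * weight,
        }
    return features
```

In the benchmark, one of the three variants is chosen per experiment, and the lexicon is either the domain-specific one or the merged domain-general one.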
26. Experiments and benchmarking the datasets
• 3075 experiments, resulting from all combinations of classifiers, features, and dataset setups.
• Results are publicly available for further analysis and as benchmarks.
28. Mining experiments results
Mining the experiment results to answer questions such as:
• What are the top performing classifier and feature combinations?
• Can we rely only on lexicons for sentiment analysis?
• What is the effect of combining lexicon-based features with other features?
• Are shorter documents easier to classify?
• Are documents richer in subjective words easier to classify?
29. Can we rely only on lexicon-based features for sentiment classification?
Can features generated from lexicons provide adequate accuracy relative to other feature building methods?
30. Mining experiments results
Average accuracy, 2-class setup:

Features | Number of features | Average accuracy
Lex-domain | ~500 | 0.768
Lex-all | 1913 | 0.782
Count | ~50K | 0.783
31. Mining experiments results
Average accuracy, 3-class setup:

Features | Number of features | Average accuracy
Lex-domain | ~500 | 0.549
Lex-all | 1913 | 0.554
Count | ~50K | 0.570
32. Effect of merging lexicon-based features with other features?
Does merging features generated from lexicons with other feature types improve accuracy?
36. Mining experiments results
Storyline: Patch Adams was desperate and attempted suicide many times, until he was sent to a mental hospital…
……..
Then he unintentionally started helping others by socializing with them, until they became better.
37. Mining experiments results
• Document length: number of tokens per document (log scale)
• Subjectivity score: sum of the polarities of the words that appear in the document (using the generated lexicons)
• Error rate: number of misclassified documents in a given group (by document length and subjectivity score)
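A minimal sketch of computing the two grouping axes for a single document, assuming natural log for the length axis (the slides only say "log scale"):

```python
import math

def document_profile(tokens, lexicon):
    """Compute the two axes used to group documents in the error analysis.

    `lexicon` maps words to polarity scores from the generated
    lexicons. Returns the log-scale document length and the
    subjectivity score (sum of the polarities of the document's
    words; words outside the lexicon contribute 0).
    """
    length = math.log(len(tokens)) if tokens else 0.0
    subjectivity = sum(lexicon.get(tok, 0.0) for tok in tokens)
    return length, subjectivity
```

Binning documents by these two values and counting misclassifications per bin yields the error-rate heatmap on the next slide.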
38. Mining experiments results
The error rate for various document length and subjectivity score groups (the darker, the worse)
39. Conclusion
• Built large multi-domain datasets for sentiment analysis (33K reviews)
• Proposed an approach for semi-automatically learning multi-domain lexicons (~2K entries)
• Everything is publicly available:
• Datasets (raw + processed)
• Lexicons
• Web scrapers (can be rerun to collect more recent reviews)
• Experiments code and results