Twitter sentiment analysis

IRE Project presentation on Twitter Sentiment analysis.

Published in: Education

  1. Twitter Sentiment Analysis
     Akhil Batra, Avinash Kalivarapu, Sunil Kandari
  2. What Is Sentiment Analysis?
     • Sentiment analysis is the process of determining whether a piece of writing is positive, negative, or neutral.
     • It is also referred to as opinion mining. Here, the goal is to determine whether a tweet is positive, negative, or neutral.
  3. Why Is Sentiment Analysis Important?
     It helps answer questions about public opinion, for example:
     • Is this product review positive or negative?
     • Does this customer email express satisfaction or dissatisfaction?
     • Based on a sample of tweets, how are people responding to this ad campaign, product release, or news item?
     • How have bloggers' attitudes about the president changed since the election?
  4. Why Twitter Data for Sentiment Analysis?
     • Popular microblogging site
     • Short text messages of 140 characters
     • 240+ million active users
     • 500 million tweets generated every day
     • Audience ranges from ordinary users to celebrities
     • Users often discuss current affairs and share personal views on various subjects
     • Tweets are short and hence relatively unambiguous
  5. Problem Statement
     Given a message, decide whether it expresses positive, negative, or neutral sentiment. For messages conveying both positive and negative sentiment, choose whichever sentiment is stronger.
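The decision rule in the problem statement can be illustrated with a minimal sketch. It assumes some earlier step has already produced a positive and a negative strength score for the message; the function name `overall_sentiment`, the score scale, and the `neutral_threshold` parameter are hypothetical and do not come from the slides.

```python
def overall_sentiment(pos_strength: float, neg_strength: float,
                      neutral_threshold: float = 0.1) -> str:
    """Pick one label for a message that may carry both sentiments."""
    # Neither sentiment is strong enough: call the message neutral.
    if max(pos_strength, neg_strength) < neutral_threshold:
        return "neutral"
    # Exact ties are treated as neutral (an assumption of this sketch).
    if pos_strength == neg_strength:
        return "neutral"
    # Otherwise the stronger sentiment wins, as the problem statement asks.
    return "positive" if pos_strength > neg_strength else "negative"


print(overall_sentiment(0.7, 0.2))
print(overall_sentiment(0.3, 0.8))
```

How the two strengths are computed is left open here; in the deck that job falls to the classifier.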
  6. Challenges
     • People express opinions in complex ways
     • In opinion texts, lexical content alone can be misleading
     • Intra-textual and sub-sentential reversals, negation, and topic changes are common
     • Rhetorical devices and modes such as sarcasm, irony, and implication
     • Unstructured and non-grammatical text
     • Lexical variation
     • Out-of-vocabulary words
     • Extensive use of acronyms such as asap, lol, afaik
  7. Process Flow
     Twitter Dataset → Preprocessing → Tokenizer → Feature Extraction (Word + Senti Features) → Classification (unigram/bigram, SVM/Bayes)
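Slide 7 names a preprocessing step and a tokenizer but does not say what they do. The following is a minimal sketch of what such steps commonly look like for tweets, not the authors' implementation. The normalisation choices (lowercasing, the `URL` and `USER` placeholders, squeezing elongated letters) and the token pattern are all assumptions.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Three or more repeats of a letter ("looooove") are squeezed to two ("loove").
REPEAT_RE = re.compile(r"([a-z])\1{2,}")
# Tokens, tried in order: simple emoticons, hashtags, words (with an optional
# apostrophe part such as "don't"), then any other single symbol.
TOKEN_RE = re.compile(r"[:;=][-']?[)(DPdp](?!\w)|#\w+|\w+(?:'\w+)?|[^\w\s]")


def preprocess(tweet: str) -> str:
    """Lowercase the tweet and replace URLs and mentions with placeholders."""
    text = tweet.lower()
    text = URL_RE.sub("URL", text)
    text = MENTION_RE.sub("USER", text)
    return REPEAT_RE.sub(r"\1\1", text)


def tokenize(text: str) -> list[str]:
    """Split preprocessed text into tokens, keeping emoticons and hashtags whole."""
    return TOKEN_RE.findall(text)


tokens = tokenize(preprocess("@bob I looooove this!!! http://t.co/x :) #happy"))
print(tokens)
```

Keeping emoticons and hashtags as single tokens matters because the feature list on slide 10 counts them directly.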
  8. Training
  9. Testing
  10. Extracted Features
      • Word features
      • Word polarity score using WordNet
      • Positive/negative hashtags
      • Positive, negative, extremely positive, and extremely negative emoticons
      • Negations
      • POS tag polarity score (noun, preposition, adjective)
      • Special characters
      • Count of repetition words
      • Count of non-English words
      • Count of acronyms
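Several of the count-based features on slide 10 can be sketched with small lookup sets. This is an illustration under stated assumptions: every lexicon below is a tiny hypothetical stand-in (only the acronyms asap, lol, and afaik appear in the slides), "repetition words" is interpreted here as words with a letter repeated three or more times, and the WordNet and POS polarity scores are omitted because they need external resources.

```python
import re

# Hypothetical mini-lexicons; a real system would load much larger lists.
POSITIVE_EMOTICONS = {":)", ":-)", ":d", ";)"}
NEGATIVE_EMOTICONS = {":(", ":-("}
POSITIVE_HASHTAGS = {"#happy", "#love", "#win"}
NEGATIVE_HASHTAGS = {"#fail", "#sad"}
NEGATIONS = {"not", "no", "never", "cannot", "don't", "can't", "won't"}
ACRONYMS = {"asap", "lol", "afaik"}  # the examples given on slide 6
SPECIAL_CHARS = {"!", "?"}
# A letter repeated three or more times, as in "looove" (assumed meaning
# of "repetition words").
ELONGATED_RE = re.compile(r"([a-z])\1{2,}")


def extract_senti_features(tokens: list[str]) -> dict[str, int]:
    """Count sentiment cues in a tokenized tweet."""
    tokens = [t.lower() for t in tokens]

    def count_in(lexicon: set[str]) -> int:
        return sum(1 for t in tokens if t in lexicon)

    return {
        "pos_emoticons": count_in(POSITIVE_EMOTICONS),
        "neg_emoticons": count_in(NEGATIVE_EMOTICONS),
        "pos_hashtags": count_in(POSITIVE_HASHTAGS),
        "neg_hashtags": count_in(NEGATIVE_HASHTAGS),
        "negations": count_in(NEGATIONS),
        "acronyms": count_in(ACRONYMS),
        "special_chars": count_in(SPECIAL_CHARS),
        "elongated_words": sum(1 for t in tokens if ELONGATED_RE.search(t)),
    }


features = extract_senti_features(
    ["i", "don't", "looove", "this", ":(", "#fail", "lol", "!", "!"]
)
print(features)
```

The resulting dictionary can be appended to the word (n-gram) features before classification, which is how the "Unigram + Senti-Feature" rows in the results are presumably built.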
  11. Classifiers
      • Naive Bayes classifier
      • SVM
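To make the Naive Bayes option concrete, here is a minimal multinomial Naive Bayes sketch over unigram or bigram features, written with the standard library only. It is not the authors' code: the class name, the Laplace smoothing parameter `alpha`, the choice to skip unseen n-grams, and the five toy training tweets are all assumptions made for illustration. The SVM is not sketched, since it would normally come from an external library.

```python
import math
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with Laplace smoothing."""

    def __init__(self, n=1, alpha=1.0):
        self.n = n              # 1 = unigrams, 2 = bigrams
        self.alpha = alpha      # smoothing constant (assumed value)
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_counts[label] += 1
            for gram in ngrams(tokens, self.n):
                self.feature_counts[label][gram] += 1
                self.vocab.add(gram)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for gram in ngrams(tokens, self.n):
                if gram in self.vocab:  # n-grams never seen in training are skipped
                    score += math.log((counts[gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented purely for this example.
train_docs = [
    ["i", "love", "this", "phone"],
    ["what", "a", "great", "day"],
    ["i", "hate", "this", "phone"],
    ["what", "a", "terrible", "day"],
    ["the", "phone", "arrived", "today"],
]
train_labels = ["positive", "positive", "negative", "negative", "neutral"]

unigram_model = NaiveBayes(n=1).fit(train_docs, train_labels)
bigram_model = NaiveBayes(n=2).fit(train_docs, train_labels)
print(unigram_model.predict(["i", "love", "it"]))
print(bigram_model.predict(["i", "hate", "this"]))
```

Switching `n` from 1 to 2 is the only change needed to move between the unigram and bigram configurations compared in the results slide.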
  12. Analysis and Results
      Classifier                                       % Accuracy
      Unigram + Bayes classification function          50*
      Bigram + Bayes classification function           54*
      Unigram + SVM                                    65*
      Unigram + Senti-Feature + SVM                    66*
      Unigram + Senti-Feature + POS polarity + SVM     68*
  13. Conclusion
      • Combining extracted features with POS tagging of tweets gives the best result with the SVM classifier.
      • There is always scope to improve accuracy by extracting more features that are relevant to sentiment.
      • Increasing the n-gram size beyond 2 does not necessarily increase accuracy.