Sentiment analysis of tweets using Neural Networks
Adrián Palacios
Universidad Politécnica de Valencia
June 6th, 2013
Introduction
The objectives of this work are:
• To use Neural Networks (NNs), built with the April toolkit, for the polarity
classification of tweets.
• To check how NNs behave when different preprocessing techniques are
applied to the data.
• To experiment with these techniques, rather than to obtain good results.
Preprocessing of tweets
Before training the NNs, we need a feature-vector representation of the
samples (tweets).
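A minimal sketch of this step in Python, assuming lowercasing plus whitespace tokenization (the slides do not specify the tokenizer): it builds a vocabulary over all tweets and maps each tweet to a vector of token counts. The function names and example tweets are hypothetical.

```python
from collections import Counter


def tokenize(tweet: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return tweet.lower().split()


def build_vocabulary(token_lists: list[list[str]]) -> dict[str, int]:
    # Give each distinct token a column index, in order of first appearance.
    vocab: dict[str, int] = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def to_feature_vector(tokens: list[str], vocab: dict[str, int]) -> list[int]:
    # Bag of words: count each token; tokens outside the vocabulary are ignored.
    vector = [0] * len(vocab)
    for tok, count in Counter(tokens).items():
        idx = vocab.get(tok)
        if idx is not None:
            vector[idx] = count
    return vector


tweets = ["good day", "not a good day good"]
token_lists = [tokenize(t) for t in tweets]
vocab = build_vocabulary(token_lists)
vectors = [to_feature_vector(toks, vocab) for toks in token_lists]
```

Each position in a vector corresponds to one vocabulary entry, so every tweet gets the same dimensionality regardless of its length.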
Preprocessing techniques
To achieve this, we create a bag of words after applying one of the
following preprocessing techniques:
1. Unigrams.
2. Bigrams.
3. Stemming.
4. Lemmatization.
5. Part-of-Speech tagging.
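Unigram and bigram features differ only in which tokens go into the bag. The sketch below uses a hypothetical `ngrams` helper and an arbitrary underscore separator; whether the original bigram features also kept the unigrams is not stated in the slides.

```python
def ngrams(tokens: list[str], n: int) -> list[str]:
    # Slide a window of size n over the tokens and join each window into one feature.
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["not", "good", "at", "all"]
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
```

A bigram such as `not_good` preserves a negation that the separate unigrams `not` and `good` lose.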
Stemming
Stemming: A process that chops off the suffixes of a given word
following some predefined rules.
Examples:
• Stem(run): run.
• Stem(ran): ran.
• Stem(running): run.
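To make the examples concrete, here is a deliberately crude rule-based suffix stripper. It is a hypothetical toy loosely modeled on the first step of the Porter stemmer, not the stemmer used in the experiments (NLTK ships real ones, such as `nltk.stem.PorterStemmer`). No suffix rule can turn `ran` into `run`, which is the limitation lemmatization addresses.

```python
def toy_stem(word: str) -> str:
    # Crude suffix stripper loosely inspired by Porter's step 1; NOT the full algorithm.
    word = word.lower()
    for suffix in ("ing", "ed"):
        # Only strip when at least 3 characters remain ("sing" stays "sing").
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[: -len(suffix)]
            # Undo consonant doubling ("runn" -> "run"), except for l, s, z ("fall").
            if stem[-1] == stem[-2] and stem[-1] not in "aeioulsz":
                stem = stem[:-1]
            return stem
    # Strip a plural/third-person "s", but not a double "ss" ("miss").
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word
```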
Lemmatization
Lemmatization: A process that determines the lemma (canonical
form of the lexeme) of a given word.
Examples:
• Lemma(run): run.
• Lemma(ran): run.
• Lemma(running): run.
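By contrast, a lemmatizer looks words up rather than cutting them. The sketch below uses a tiny hypothetical lexicon; a real lemmatizer relies on a full morphological dictionary and, often, on the word's PoS tag to choose among candidate lemmas.

```python
# Hypothetical toy lexicon mapping inflected forms to their lemmas.
LEXICON = {
    "ran": "run",
    "running": "run",
    "runs": "run",
    "went": "go",
}


def toy_lemma(word: str) -> str:
    # Look the word up; unknown words are returned unchanged.
    word = word.lower()
    return LEXICON.get(word, word)
```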
PoS tagging
PoS tagging: The assignment of Part-of-Speech tags to the words of a
given sentence.
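The slides do not say how the tags entered the bag of words. One possible representation, assumed here, joins each word with its tag so that the same word used as a noun and as a verb becomes two different features. The lexicon-lookup tagger and the Penn Treebank-style tag names are hypothetical; real taggers (such as those in NLTK or Freeling) are statistical and use context to disambiguate.

```python
# Hypothetical toy lexicon; a real tagger would use the surrounding context.
TAG_LEXICON = {"i": "PRP", "love": "VBP", "this": "DT", "movie": "NN", "not": "RB"}


def toy_pos_tag(tokens: list[str]) -> list[tuple[str, str]]:
    # Assign each token its lexicon tag, or "UNK" when the word is unknown.
    return [(tok, TAG_LEXICON.get(tok.lower(), "UNK")) for tok in tokens]


def tagged_features(tagged: list[tuple[str, str]]) -> list[str]:
    # One possible bag-of-words feature: the word joined with its tag.
    return [f"{word}/{tag}" for word, tag in tagged]


tagged = toy_pos_tag(["I", "love", "this", "movie"])
features = tagged_features(tagged)
```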
Learning techniques
The polarity classification is performed:
• using a Multilayer Perceptron (MLP) with a single hidden layer,
• trained with 5-fold cross-validation,
• and combining the resulting MLPs into an ensemble.
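The experiments use the April toolkit for the networks; the Python below is only a hypothetical illustration of what an MLP computes at prediction time, reading "a single layer" as one hidden layer. The tanh hidden activation, softmax output, layer sizes and weights are assumptions, and training (backpropagation) is omitted.

```python
import math


def softmax(z: list[float]) -> list[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(x: list[float], w1: list[list[float]], b1: list[float],
                w2: list[list[float]], b2: list[float]) -> list[float]:
    """One tanh hidden layer followed by a softmax output layer.

    w1 has one row per hidden unit; w2 has one row per output class.
    """
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    return softmax(logits)


# Toy example: 4 bag-of-words features, 2 hidden units, 3 polarity classes.
x = [1.0, 0.0, 2.0, 1.0]
w1 = [[0.1, -0.2, 0.3, 0.0], [-0.3, 0.1, 0.0, 0.2]]
b1 = [0.0, 0.1]
w2 = [[0.5, -0.5], [0.0, 0.0], [-0.5, 0.5]]
b2 = [0.0, 0.0, 0.0]
probs = mlp_forward(x, w1, b1, w2, b2)
# The predicted polarity is the class with the highest probability.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```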
Hyper-parameter search
We will perform a random search for hyper-parameter optimization
instead of a grid search.
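A sketch of random search in Python. The search space (hidden units, learning rate, momentum, weight decay) and its ranges are hypothetical, since the slides do not list them; `evaluate` stands for whatever routine trains and validates the MLPs for one configuration. Unlike a grid, every trial draws a fresh value for each hyper-parameter, so the same budget explores more distinct values of whichever hyper-parameters matter most.

```python
import math
import random


def sample_configuration(rng: random.Random) -> dict:
    # Hypothetical search space; the original ranges are not given in the slides.
    return {
        "hidden_units": rng.randint(8, 256),
        # Learning rate and weight decay are sampled log-uniformly.
        "learning_rate": 10 ** rng.uniform(-4, -1),
        "momentum": rng.uniform(0.0, 0.9),
        "weight_decay": 10 ** rng.uniform(-6, -3),
    }


def random_search(evaluate, n_trials: int, seed: int = 0):
    # Draw n_trials independent configurations and keep the best-scoring one.
    rng = random.Random(seed)
    best_config, best_score = None, -math.inf
    for _ in range(n_trials):
        config = sample_configuration(rng)
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Usage with a stand-in objective (a real one would train and validate the MLPs).
best, score = random_search(lambda c: -abs(math.log10(c["learning_rate"]) + 2), n_trials=20)
```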
Ensemble methods
Because we use 5-fold cross-validation, training yields 5 MLPs for each set
of hyper-parameters.
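The following sketch shows where the 5 MLPs come from: each fold holds out one block of samples for validation and trains on the rest. Contiguous folds and the use of the 7219 training tweets are assumptions for illustration; the data should be shuffled beforehand if it is ordered.

```python
def k_fold_indices(n_samples: int, k: int = 5):
    """Yield (train_indices, validation_indices) for each of the k folds.

    Folds are contiguous blocks whose sizes differ by at most one sample.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


# Assuming cross-validation over the 7219 training tweets: one MLP per fold, 5 in total.
folds = list(k_fold_indices(7219, k=5))
```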
To be consistent, we merge these 5 classifiers into a single one by
equal-weight voting, i.e. the aggregation step of bootstrap aggregating
(bagging); the ensemble members come from the cross-validation folds rather
than from bootstrap samples.
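Equal-weight voting can be sketched in a few lines. The label names (`P`, `N`, `NEU`) are hypothetical placeholders for the polarity classes.

```python
from collections import Counter


def majority_vote(predictions: list[str]) -> str:
    """Combine one predicted label per model with equal-weight voting.

    Ties go to the label encountered first (Counter keeps first-seen order).
    """
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(per_model_predictions: list[list[str]]) -> list[str]:
    # per_model_predictions[m][i] is model m's label for sample i; vote per sample.
    return [majority_vote(list(votes)) for votes in zip(*per_model_predictions)]
```

`Counter.most_common` orders equal counts by first occurrence, so a tie goes to the label predicted by the lowest-numbered model; other tie-breaking rules would be equally valid.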
Corpus
We will work with the corpus provided at the 2012 edition of the
Workshop on Sentiment Analysis at SEPLN.
              Training   Test
Samples         7219     60798
Training results
Validation-set accuracy (%) for 3 and 5 polarity levels:
                  3 levels   5 levels
Unigrams             54.44      45.62
Bigrams              54.09      39.99
Stemming             62.34      47.49
Lemmatization        61.60      46.75
PoS tagging          52.58      38.40
Test results
Test-set accuracy (%) for 3 and 5 polarity levels, average and ensemble.
Average:
                  3 levels   5 levels
Unigrams             32.13      26.12
Bigrams              32.39      28.21
Stemming             32.34      26.81
Lemmatization        31.84      26.18
PoS tagging          35.22      35.22

Ensemble:
                  3 levels   5 levels
Unigrams             32.16      26.52
Bigrams              32.32      29.32
Stemming             32.23      27.16
Lemmatization        31.80      26.49
PoS tagging          35.22      35.22
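The two test figures can be computed as sketched below, assuming "average" means the mean accuracy of the 5 individual MLPs and "ensemble" means the accuracy of their equal-weight vote; the slides do not define them further, and the labels and predictions here are made up.

```python
from collections import Counter


def accuracy(predicted: list[str], gold: list[str]) -> float:
    # Percentage of samples whose predicted label matches the reference label.
    return 100.0 * sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def average_and_ensemble_accuracy(per_model: list[list[str]],
                                  gold: list[str]) -> tuple[float, float]:
    # "Average": mean of the individual models' accuracies.
    average = sum(accuracy(preds, gold) for preds in per_model) / len(per_model)
    # "Ensemble": accuracy of the equal-weight majority vote, sample by sample.
    voted = [Counter(votes).most_common(1)[0][0] for votes in zip(*per_model)]
    return average, accuracy(voted, gold)


gold = ["P", "N", "NEU", "P"]
per_model = [["P", "N", "P", "P"], ["P", "P", "NEU", "P"], ["N", "N", "NEU", "N"]]
avg, ens = average_and_ensemble_accuracy(per_model, gold)
```

In this toy case the vote corrects every individual error, so the ensemble beats the average; in the tables above the two figures are much closer.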
Conclusions
The results are poor, but they could be improved by:
• Using more complex preprocessing techniques.
• Using more complex learning models.
• Exploring more configurations in the random hyper-parameter search.
• Exploiting the PoS-tagged tweets in a different way during learning.
Questions?
The tools used for the experiments can be found at:
• The NLTK: nltk.org
• Freeling: nlp.lsi.upc.edu/freeling
• The April toolkit: github.com/pakozm/april-ann