SlideShare una empresa de Scribd logo
1 de 24
Descargar para leer sin conexión
Introduction
Development of the classification system
Application and experimental results
Conclusions
Sentiment Analysis for italian language
microblogging: development and valutation of an
automatic system
Corrado Monti
22-04-2013
Corrado Monti Sentiment Analysis for italian microblogging 1/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
Twitter
Main microblogging platform
instant pubblication of short textual contents
Tweet → 140 characters
Widespread in Italy too
Italian internet
users
Hearsay
knowledge of
Twitter
Weekly
Twitter users
Dayly
Twitter
users
28.6 M people
100%
25.3 M people
88.6% of internet users
1.24 M people
4.4% of internet users
4.7 M people
16.5% of internet users
december 2012
Corrado Monti Sentiment Analysis for italian microblogging 2/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
Textual classification
We would like to classify millions of texts
Rule-based approach: definitions of rules (keywords, regular
expressions) together with field experts
Inaccurate, hard to define
Supervised machine learning
They need a training set to learn how to classify new texts
Every text is transformed in a sequence of numerical features;
the algorithm learns what these numbers mean
Recent problem: Sentiment Analysis → recognize the
sentiment expressed by the author of the text
Many applications, both industrial and academic
Corrado Monti Sentiment Analysis for italian microblogging 3/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
Measuring collective feelings
One of the first works is O’Connor et al. (2010), who
measured economic trust through Sentiment Analysis on
Twitter
SentimentRatio
1.52.02.53.03.54.0
GallupEconomicConfidence
−60−50−40−30−20
2008−01
2008−02
2008−03
2008−04
2008−05
2008−06
2008−07
2008−08
2008−09
2008−10
2008−11
2008−12
2009−01
2009−02
2009−03
2009−04
2009−05
2009−06
2009−07
2009−08
2009−09
2009−10
2009−11
Dates Dates
For which phenomena is this possible?
How much are these measures valid? Do they work in Italy
too?
Corrado Monti Sentiment Analysis for italian microblogging 4/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
Goals of this work
Hypothesis: can political disaffection be measured through
massive tweet classification?
It is a relevant phenomenon, especially in Italy
Lot of interest, academic (sociology) and not academic
We’d also like to build reusable classifiers
Corrado Monti Sentiment Analysis for italian microblogging 5/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
Political disaffection
How to define a “disaffected” tweet?
According to domain experts, it must
1. have a politic-related topic → topic detection
2. have negative sentiment → sentiment analysis
3. be directed to all politicians
Tweet Is political?
Discarded
No
Yes
Is negative?
Discarded
No
Yes
Is general?
Discarded
No
Yes
Politically
Disaffected
Tweet
Corrado Monti Sentiment Analysis for italian microblogging 6/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
Training Set
28 340 tweet labelled by 40 political science students
Between april and june 2012
3 labellers for every text
We keep in the dataset only tweet with unanimous “politic”
label
For other labels we measure agreement with Krippendorff α:
“negative” → 0.78 → reliable labels
“generic” → 0.41 → much noise on labels ×
⇓
negative non negative
politic 7 965 4 544
not politic 15 831
Corrado Monti Sentiment Analysis for italian microblogging 7/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
1 – Topic Detection: politics
Time-robust classifier
Training Set extension: 17 388 news titles (January-October
2012)
Best feature extraction:
5-grams of characters → uniformity from different spacings
tf-idf
Discard terms with less than 4 occourences
Corrado Monti Sentiment Analysis for italian microblogging 8/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
45 728 points with 78 642 features
SMO SVM-solver, k-Nearest Neighbor, Random Forest, kernel are too
resource-hungry
We preferred online algorithms
ALMA, Passive-Aggressive, PEGASOS, OIPCAC gave good results
Classifier Accuracy F-Measure Global time
ALMA 0.88 ± 0.01 0.87 ± 0.01 13.5 ± 1
PA 0.89 ± 0.01 0.89 ± 0.01 10.6 ± 0.1
PEGASOS 0.88 ± 0.01 0.88 ± 0.01 1103 ± 10
OIPCAC 0.89 ± 0.001 0.89 ± 0.01 5911 ± 52
Other algorithms were tested, but with worst results (i.e. Na¨ıve
Bayes)
We selected Passive-Aggressive: good results, low costs
Corrado Monti Sentiment Analysis for italian microblogging 9/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
2 – Sentiment Analysis: feature extraction
Different ways were tested:
n-grams of characters, words, n-grams of words
boolean presence, term frequency, tf-idf, fuzzy match between
n-grams
Best: term frequency of single words
Adjustments:
Fraction of uppercase words as a feature
Features of synonyms were joined together
Other adjustments did not gave better results:
Consider as different words in beginning or in the end of the
tweet
Corrado Monti Sentiment Analysis for italian microblogging 10/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
2 – Removal of the target of sentiment
We do not want that the sentiment could be guessed basing
on specific entities (PD, Berlusconi. . . ). It should be based on
the surrounding words.
Training Set prejudices ⇒ Overfitting
Problem sometimes ignored, especially in Italian language Sentiment
Analysis
How can we decide which words must be removed?
Proposed solution: ontology-based filter
DBpedia
query
SPARQL
fi rst
objects
list
text from
online
newspapers
list of
objects
to remove
appeared?
Discarded
No
Yes
Corrado Monti Sentiment Analysis for italian microblogging 11/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
2 – Sentiment Analysis: classification
12 476 points with 2 383 features
Classifier Accuracy F-Measure Global time
ALMA 0.70 ± 0.03 0.75 ± 0.03 0.82 ± 0.3
PA 0.67 ± 0.06 0.71 ± 0.12 0.9 ± 10−3
PEGASOS 0.69 ± 0.03 0.73 ± 0.05 76 ± 0.1
OIPCAC 0.71 ± 0.03 0.75 ± 0.02 121 ± 25
RF 0.72 ± 0.03 0.78 ± 0.03 2173 ± 48
We select ALMA: good results, low costs
Corrado Monti Sentiment Analysis for italian microblogging 12/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
3 – Genericity: rule-based approach
Problem: select tweets directed to all politicians
Unhelpful training set labels → we simplify the problem and
decide to use a rule-based approach
Regole usate:
Presence of keywords
defined with field experts
(based on the most “secure”
part of the training set)
∨
Entities from at least three
different political areas
(identified through
DBpedia ontologies)
Corrado Monti Sentiment Analysis for italian microblogging 13/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
Applicazione del sistema di classificazione
Hypothesis:
Applying this classification system to tweet of April-October
2012 we can obtain a good measure of the diffusion of
political disaffection in society?
Corrado Monti Sentiment Analysis for italian microblogging 14/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
Surveys
Surveys are a curtesy of IPSOS
We compute two indexes:
1. Inefficacy: fraction of italians that for every
party say that their propensity to vote them
is 1 in a 1 to 10 scale
Sociologists say that this capture the
sentiment of inefficacy perceived by citizens,
symptom of political disaffection
2. Non-vote: fraction of italians that will not
vote
It measures a behaviour
Influenced by other factors: election proximity,
“moral” perception of abstention
...
...
We have a survey every ∼ 10 days in April-October
Corrado Monti Sentiment Analysis for italian microblogging 15/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
Tweet sample
We gather a set of user active in October and we
sample also their followers
We obtain 261 313 users active in October
∼ 5% of italian users and ∼ 10% of active italian
users in October 2012
We select only those active in April → 167 557
utenti
Scraping of their tweets in this time period:
35 882 423 tweet
Lorem ipsum dolor sit
amet, consectetur
adipiscing elit. In
rhoncus diam a urna
pulvinar convallis.
Lorem ipsum dolor sit
amet, consectetur
adipiscing elit. In
rhoncus diam a urna
pulvinar convallis.
Lorem ipsum dolor sit
amet, consectetur
adipiscing elit. In
rhoncus diam a urna
pulvinar convallis.
Lorem ipsum dolor sit
amet, consectetur
adipiscing elit. In
rhoncus diam a urna
pulvinar convallis.
Lorem ipsum dolor sit
amet, consectetur
adipiscing elit. In
rhoncus diam a urna
pulvinar convallis.
Lorem ipsum dolor sit
amet, consectetur
adipiscing elit. In
rhoncus diam a urna
pulvinar convallis.
Corrado Monti Sentiment Analysis for italian microblogging 16/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
Comparison
For every survey date, we compute the il ratio between
disaffected tweet volume and political tweet volume in a
certain time window. We consider:
∆14
1 ∆14
7 ∆7
1
Giugno 2012
L M M G V S D
1 2
3 4 5 6 7 8 9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30
Giugno 2012
L M M G V S D
1 2
3 4 5 6 7 8 9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30
Giugno 2012
L M M G V S D
1 2
3 4 5 6 7 8 9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30
Last two weeks Between 14 and 7 days before Last week
We compare these ratios with polls through Pearson
correlation index
Corrado Monti Sentiment Analysis for italian microblogging 17/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
Inefficacy:
∆max
min ρ 95%Confidenceinterval P-Value for ρ > 0
∆14
1 0.7860 0.476-0.922 0.031%
∆14
7 0.7749 0.454-0.917 0.042%
∆7
1 0.6880 0.310-0.878 0.226%
Non-voto:
∆max
min ρ 95%Confidenceinterval P-Value for ρ > 0
∆14
1 0.5579 0.190-0.788 0.567%
∆14
7 0.5920 0.248-0.803 0.231%
∆7
1 0.4433 0.049-0.718 3.00%
Corrado Monti Sentiment Analysis for italian microblogging 18/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
Apr 15 May 01 May 15 Jun 01 Jun 15 Jul 01 Jul 15 Aug 01 Aug 15 Sep 01 Sep 15 Oct 01
0.00
0.05
0.10
0.15
0.20
0.25
0.004
0.005
0.006
0.007
0.008
0.009
0.01
0.011
Time
Inefficacyindicator
Twitterdisaffectionratio
Twitter disaffection ratio
Inefficacy indicator
Corrado Monti Sentiment Analysis for italian microblogging 19/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
Interpretation
Data seem to indicate a quite strong correlation between
disaffected tweet and diffusion of the phenomena in society
This does not mean that Twitter is a representative sample
We can guess that the quantity of discussion about this
pheonomenon is connected with how much it will spread
Corrado Monti Sentiment Analysis for italian microblogging 20/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
Further investigation
With this big tweet sample, we can study causes of
oscillations with text mining techinques
We compute the ratio day by day and we found out when we
have disaffection peaks
For every peak
1. we take news of that day
2. we compare them with the mass of tweet of the peak
3. which is the most similar news?
Corrado Monti Sentiment Analysis for italian microblogging 21/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
Apr 01 Apr 15 May 01 May 15 Jun 01 Jun 15 Jul 01 Jul 15 Aug 01 Aug 15 Sep 01 Sep 15 Oct 01
0.000
0.005
0.010
0.015
0.020
0.
0.025
0.05
0.075
0.1
0.125
0.15
0.175
0.2
Time
Twitterdisaffectionratio
Inefficacyindicator
Twitter disaffection ratio
Inefficacy indicator
Rapporto di tweet
Sondaggi (Inefficacia)
Tempo
S
o
n
d
a
g
g
i
S
o
n
d
a
g
g
i
T
w
e
e
t


(Teorie del complotto su
stragismo di Stato)
Scandalo Lega
Amministrative
Attentato di Brindisi
Scandalo
Fiorito
Corrado Monti Sentiment Analysis for italian microblogging 22/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
Conclusions
We build classifiers for political messages
Topic and sentiment classifers are reusable
We showed a correlation between quantity of discussion on
Twitter and diffusion of a phenomenon
Future works
Classifying users as nodes of a social network
Confirm or denial of models that could explain our data
Corrado Monti Sentiment Analysis for italian microblogging 23/23
Introduction
Development of the classification system
Application and experimental results
Conclusions
Thanks
Corrado Monti Sentiment Analysis for italian microblogging 24/23

Más contenido relacionado

Destacado

Vendor/Supplier rating
Vendor/Supplier ratingVendor/Supplier rating
Vendor/Supplier ratingsuresh t
 
Social media & sentiment analysis splunk conf2012
Social media & sentiment analysis   splunk conf2012Social media & sentiment analysis   splunk conf2012
Social media & sentiment analysis splunk conf2012Michael Wilde
 
Sentiment Analysis and Social Media: How and Why
Sentiment Analysis and Social Media: How and WhySentiment Analysis and Social Media: How and Why
Sentiment Analysis and Social Media: How and WhyDavide Feltoni Gurini
 
How to evaluate and select vendors with TransparentChoice software
How to evaluate and select vendors with TransparentChoice softwareHow to evaluate and select vendors with TransparentChoice software
How to evaluate and select vendors with TransparentChoice softwareTransparentChoice
 
Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Rachit Goel
 

Destacado (7)

Vendor/Supplier rating
Vendor/Supplier ratingVendor/Supplier rating
Vendor/Supplier rating
 
Social media & sentiment analysis splunk conf2012
Social media & sentiment analysis   splunk conf2012Social media & sentiment analysis   splunk conf2012
Social media & sentiment analysis splunk conf2012
 
Sentiment Analysis and Social Media: How and Why
Sentiment Analysis and Social Media: How and WhySentiment Analysis and Social Media: How and Why
Sentiment Analysis and Social Media: How and Why
 
How to evaluate and select vendors with TransparentChoice software
How to evaluate and select vendors with TransparentChoice softwareHow to evaluate and select vendors with TransparentChoice software
How to evaluate and select vendors with TransparentChoice software
 
Vendor rating
Vendor ratingVendor rating
Vendor rating
 
Vendor rating system
Vendor rating systemVendor rating system
Vendor rating system
 
Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14Twitter sentiment-analysis Jiit2013-14
Twitter sentiment-analysis Jiit2013-14
 

Similar a Sentiment Analysis and Political Disaffection in Italy

D. Zardetto, Using Twitter data for the Social Mood on Economy Index
D. Zardetto, Using Twitter data for the Social Mood on Economy Index D. Zardetto, Using Twitter data for the Social Mood on Economy Index
D. Zardetto, Using Twitter data for the Social Mood on Economy Index Istituto nazionale di statistica
 
Twitter Data Analysis
Twitter Data Analysis Twitter Data Analysis
Twitter Data Analysis Manan Gadhiya
 
Sentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes AlgorithmSentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes AlgorithmKhushboo Gupta
 
Sentiment Analysis on Twitter data using Machine Learning
Sentiment Analysis on Twitter data using Machine LearningSentiment Analysis on Twitter data using Machine Learning
Sentiment Analysis on Twitter data using Machine LearningIRJET Journal
 
Easybib Essay Checker
Easybib Essay CheckerEasybib Essay Checker
Easybib Essay CheckerDebbie White
 
The new @Buzzdetector services presentation June 2015
The new @Buzzdetector services presentation June 2015The new @Buzzdetector services presentation June 2015
The new @Buzzdetector services presentation June 2015gianandrea facchini
 
Knime social media_white_paper
Knime social media_white_paperKnime social media_white_paper
Knime social media_white_paperFiras Husseini
 
Sentiment Analysis on Twitter Data
Sentiment Analysis on Twitter DataSentiment Analysis on Twitter Data
Sentiment Analysis on Twitter DataIRJET Journal
 
Detecting insults in social media conversations
Detecting insults in social media conversationsDetecting insults in social media conversations
Detecting insults in social media conversationsraj
 
Twitter sentiment analysis using Azure NLP
Twitter sentiment analysis using Azure NLP Twitter sentiment analysis using Azure NLP
Twitter sentiment analysis using Azure NLP Olusola Amusan
 
Believe it or not - keynote CAS 2015
Believe it or not - keynote CAS 2015Believe it or not - keynote CAS 2015
Believe it or not - keynote CAS 2015lantoli
 
D1: The NMC Methodology
D1: The NMC MethodologyD1: The NMC Methodology
D1: The NMC Methodologylisbk
 
Combining Lexicon based and Machine Learning based Methods for Twitter Sentim...
Combining Lexicon based and Machine Learning based Methods for Twitter Sentim...Combining Lexicon based and Machine Learning based Methods for Twitter Sentim...
Combining Lexicon based and Machine Learning based Methods for Twitter Sentim...IRJET Journal
 
QuestionSubmitted by robrts6 on Mon, 2015-12-07 0900due o.docx
QuestionSubmitted by robrts6 on Mon, 2015-12-07 0900due o.docxQuestionSubmitted by robrts6 on Mon, 2015-12-07 0900due o.docx
QuestionSubmitted by robrts6 on Mon, 2015-12-07 0900due o.docxsleeperharwell
 
Hate Speech Identification Using Machine Learning
Hate Speech Identification Using Machine LearningHate Speech Identification Using Machine Learning
Hate Speech Identification Using Machine LearningIRJET Journal
 

Similar a Sentiment Analysis and Political Disaffection in Italy (20)

D. Zardetto, Using Twitter data for the Social Mood on Economy Index
D. Zardetto, Using Twitter data for the Social Mood on Economy Index D. Zardetto, Using Twitter data for the Social Mood on Economy Index
D. Zardetto, Using Twitter data for the Social Mood on Economy Index
 
Twitter Data Analysis
Twitter Data Analysis Twitter Data Analysis
Twitter Data Analysis
 
Project report
Project reportProject report
Project report
 
Sentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes AlgorithmSentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes Algorithm
 
Sentiment Analysis on Twitter data using Machine Learning
Sentiment Analysis on Twitter data using Machine LearningSentiment Analysis on Twitter data using Machine Learning
Sentiment Analysis on Twitter data using Machine Learning
 
Monitoring opinion on esop through social media and clustering its polarity
Monitoring opinion on esop through social media and clustering its polarityMonitoring opinion on esop through social media and clustering its polarity
Monitoring opinion on esop through social media and clustering its polarity
 
Easybib Essay Checker
Easybib Essay CheckerEasybib Essay Checker
Easybib Essay Checker
 
European Project Design
European Project DesignEuropean Project Design
European Project Design
 
Expertise Social Media Research - eng- out 2013
Expertise   Social Media Research - eng- out 2013Expertise   Social Media Research - eng- out 2013
Expertise Social Media Research - eng- out 2013
 
The new @Buzzdetector services presentation June 2015
The new @Buzzdetector services presentation June 2015The new @Buzzdetector services presentation June 2015
The new @Buzzdetector services presentation June 2015
 
Knime social media_white_paper
Knime social media_white_paperKnime social media_white_paper
Knime social media_white_paper
 
Sentiment Analysis on Twitter Data
Sentiment Analysis on Twitter DataSentiment Analysis on Twitter Data
Sentiment Analysis on Twitter Data
 
Detecting insults in social media conversations
Detecting insults in social media conversationsDetecting insults in social media conversations
Detecting insults in social media conversations
 
Twitter sentiment analysis using Azure NLP
Twitter sentiment analysis using Azure NLP Twitter sentiment analysis using Azure NLP
Twitter sentiment analysis using Azure NLP
 
Believe it or not - keynote CAS 2015
Believe it or not - keynote CAS 2015Believe it or not - keynote CAS 2015
Believe it or not - keynote CAS 2015
 
D1: The NMC Methodology
D1: The NMC MethodologyD1: The NMC Methodology
D1: The NMC Methodology
 
Combining Lexicon based and Machine Learning based Methods for Twitter Sentim...
Combining Lexicon based and Machine Learning based Methods for Twitter Sentim...Combining Lexicon based and Machine Learning based Methods for Twitter Sentim...
Combining Lexicon based and Machine Learning based Methods for Twitter Sentim...
 
QuestionSubmitted by robrts6 on Mon, 2015-12-07 0900due o.docx
QuestionSubmitted by robrts6 on Mon, 2015-12-07 0900due o.docxQuestionSubmitted by robrts6 on Mon, 2015-12-07 0900due o.docx
QuestionSubmitted by robrts6 on Mon, 2015-12-07 0900due o.docx
 
Hate Speech Identification Using Machine Learning
Hate Speech Identification Using Machine LearningHate Speech Identification Using Machine Learning
Hate Speech Identification Using Machine Learning
 
Monitoring Nepal
Monitoring NepalMonitoring Nepal
Monitoring Nepal
 

Último

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Último (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Sentiment Analysis and Political Disaffection in Italy

  • 1. Introduction Development of the classification system Application and experimental results Conclusions Sentiment Analysis for italian language microblogging: development and valutation of an automatic system Corrado Monti 22-04-2013 Corrado Monti Sentiment Analysis for italian microblogging 1/23
  • 2. Introduction Development of the classification system Application and experimental results Conclusions Twitter Main microblogging platform instant pubblication of short textual contents Tweet → 140 characters Widespread in Italy too Italian internet users Hearsay knowledge of Twitter Weekly Twitter users Dayly Twitter users 28.6 M people 100% 25.3 M people 88.6% of internet users 1.24 M people 4.4% of internet users 4.7 M people 16.5% of internet users december 2012 Corrado Monti Sentiment Analysis for italian microblogging 2/23
  • 3. Introduction Development of the classification system Application and experimental results Conclusions Textual classification We would like to classify millions of texts Rule-based approach: definitions of rules (keywords, regular expressions) together with field experts Inaccurate, hard to define Supervised machine learning They need a training set to learn how to classify new texts Every text is transformed in a sequence of numerical features; the algorithm learns what these numbers mean Recent problem: Sentiment Analysis → recognize the sentiment expressed by the author of the text Many applications, both industrial and academic Corrado Monti Sentiment Analysis for italian microblogging 3/23
  • 4. Introduction Development of the classification system Application and experimental results Conclusions Measuring collective feelings One of the first works is O’Connor et al. (2010), who measured economic trust through Sentiment Analysis on Twitter SentimentRatio 1.52.02.53.03.54.0 GallupEconomicConfidence −60−50−40−30−20 2008−01 2008−02 2008−03 2008−04 2008−05 2008−06 2008−07 2008−08 2008−09 2008−10 2008−11 2008−12 2009−01 2009−02 2009−03 2009−04 2009−05 2009−06 2009−07 2009−08 2009−09 2009−10 2009−11 Dates Dates For which phenomena is this possible? How much are these measures valid? Do they work in Italy too? Corrado Monti Sentiment Analysis for italian microblogging 4/23
  • 5. Introduction Development of the classification system Application and experimental results Conclusions Goals of this work Hypothesis: can political disaffection be measured through massive tweet classification? It is a relevant phenomenon, especially in Italy Lot of interest, academic (sociology) and not academic We’d also like to build reusable classifiers Corrado Monti Sentiment Analysis for italian microblogging 5/23
  • 6. Introduction Development of the classification system Application and experimental results Conclusions Political disaffection How to define a “disaffected” tweet? According to domain experts, it must 1. have a politic-related topic → topic detection 2. have negative sentiment → sentiment analysis 3. be directed to all politicians Tweet Is political? Discarded No Yes Is negative? Discarded No Yes Is general? Discarded No Yes Politically Disaffected Tweet Corrado Monti Sentiment Analysis for italian microblogging 6/23
  • 7. Introduction Development of the classification system Application and experimental results Conclusions Training Set 28 340 tweet labelled by 40 political science students Between april and june 2012 3 labellers for every text We keep in the dataset only tweet with unanimous “politic” label For other labels we measure agreement with Krippendorff α: “negative” → 0.78 → reliable labels “generic” → 0.41 → much noise on labels × ⇓ negative non negative politic 7 965 4 544 not politic 15 831 Corrado Monti Sentiment Analysis for italian microblogging 7/23
  • 8. Introduction Development of the classification system Application and experimental results Conclusions 1 – Topic Detection: politics Time-robust classifier Training Set extension: 17 388 news titles (January-October 2012) Best feature extraction: 5-grams of characters → uniformity from different spacings tf-idf Discard terms with less than 4 occourences Corrado Monti Sentiment Analysis for italian microblogging 8/23
  • 9. Introduction Development of the classification system Application and experimental results Conclusions 45 728 points with 78 642 features SMO SVM-solver, k-Nearest Neighbor, Random Forest, kernel are too resource-hungry We preferred online algorithms ALMA, Passive-Aggressive, PEGASOS, OIPCAC gave good results Classifier Accuracy F-Measure Global time ALMA 0.88 ± 0.01 0.87 ± 0.01 13.5 ± 1 PA 0.89 ± 0.01 0.89 ± 0.01 10.6 ± 0.1 PEGASOS 0.88 ± 0.01 0.88 ± 0.01 1103 ± 10 OIPCAC 0.89 ± 0.001 0.89 ± 0.01 5911 ± 52 Other algorithms were tested, but with worst results (i.e. Na¨ıve Bayes) We selected Passive-Aggressive: good results, low costs Corrado Monti Sentiment Analysis for italian microblogging 9/23
  • 10. Introduction Development of the classification system Application and experimental results Conclusions 2 – Sentiment Analysis: feature extraction Different ways were tested: n-grams of characters, words, n-grams of words boolean presence, term frequency, tf-idf, fuzzy match between n-grams Best: term frequency of single words Adjustments: Fraction of uppercase words as a feature Features of synonyms were joined together Other adjustments did not gave better results: Consider as different words in beginning or in the end of the tweet Corrado Monti Sentiment Analysis for italian microblogging 10/23
  • 11. Introduction Development of the classification system Application and experimental results Conclusions 2 – Removal of the target of sentiment We do not want that the sentiment could be guessed basing on specific entities (PD, Berlusconi. . . ). It should be based on the surrounding words. Training Set prejudices ⇒ Overfitting Problem sometimes ignored, especially in Italian language Sentiment Analysis How can we decide which words must be removed? Proposed solution: ontology-based filter DBpedia query SPARQL fi rst objects list text from online newspapers list of objects to remove appeared? Discarded No Yes Corrado Monti Sentiment Analysis for italian microblogging 11/23
  • 12. Introduction Development of the classification system Application and experimental results Conclusions 2 – Sentiment Analysis: classification 12 476 points with 2 383 features Classifier Accuracy F-Measure Global time ALMA 0.70 ± 0.03 0.75 ± 0.03 0.82 ± 0.3 PA 0.67 ± 0.06 0.71 ± 0.12 0.9 ± 10−3 PEGASOS 0.69 ± 0.03 0.73 ± 0.05 76 ± 0.1 OIPCAC 0.71 ± 0.03 0.75 ± 0.02 121 ± 25 RF 0.72 ± 0.03 0.78 ± 0.03 2173 ± 48 We select ALMA: good results, low costs Corrado Monti Sentiment Analysis for italian microblogging 12/23
  • 13. Introduction Development of the classification system Application and experimental results Conclusions 3 – Genericity: rule-based approach Problem: select tweets directed to all politicians Unhelpful training set labels → we simplify the problem and decide to use a rule-based approach Regole usate: Presence of keywords defined with field experts (based on the most “secure” part of the training set) ∨ Entities from at least three different political areas (identified through DBpedia ontologies) Corrado Monti Sentiment Analysis for italian microblogging 13/23
  • 14. Introduction Development of the classification system Application and experimental results Conclusions Applicazione del sistema di classificazione Hypothesis: Applying this classification system to tweet of April-October 2012 we can obtain a good measure of the diffusion of political disaffection in society? Corrado Monti Sentiment Analysis for italian microblogging 14/23
  • 15. Introduction Development of the classification system Application and experimental results Conclusions Surveys Surveys are a curtesy of IPSOS We compute two indexes: 1. Inefficacy: fraction of italians that for every party say that their propensity to vote them is 1 in a 1 to 10 scale Sociologists say that this capture the sentiment of inefficacy perceived by citizens, symptom of political disaffection 2. Non-vote: fraction of italians that will not vote It measures a behaviour Influenced by other factors: election proximity, “moral” perception of abstention ... ... We have a survey every ∼ 10 days in April-October Corrado Monti Sentiment Analysis for italian microblogging 15/23
  • 16. Introduction Development of the classification system Application and experimental results Conclusions Tweet sample We gather a set of user active in October and we sample also their followers We obtain 261 313 users active in October ∼ 5% of italian users and ∼ 10% of active italian users in October 2012 We select only those active in April → 167 557 utenti Scraping of their tweets in this time period: 35 882 423 tweet Lorem ipsum dolor sit amet, consectetur adipiscing elit. In rhoncus diam a urna pulvinar convallis. Lorem ipsum dolor sit amet, consectetur adipiscing elit. In rhoncus diam a urna pulvinar convallis. Lorem ipsum dolor sit amet, consectetur adipiscing elit. In rhoncus diam a urna pulvinar convallis. Lorem ipsum dolor sit amet, consectetur adipiscing elit. In rhoncus diam a urna pulvinar convallis. Lorem ipsum dolor sit amet, consectetur adipiscing elit. In rhoncus diam a urna pulvinar convallis. Lorem ipsum dolor sit amet, consectetur adipiscing elit. In rhoncus diam a urna pulvinar convallis. Corrado Monti Sentiment Analysis for italian microblogging 16/23
  • 17. Introduction Development of the classification system Application and experimental results Conclusions Comparison For every survey date, we compute the il ratio between disaffected tweet volume and political tweet volume in a certain time window. We consider: ∆14 1 ∆14 7 ∆7 1 Giugno 2012 L M M G V S D 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Giugno 2012 L M M G V S D 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Giugno 2012 L M M G V S D 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Last two weeks Between 14 and 7 days before Last week We compare these ratios with polls through Pearson correlation index Corrado Monti Sentiment Analysis for italian microblogging 17/23
  • 18. Introduction Development of the classification system Application and experimental results Conclusions Inefficacy: ∆max min ρ 95%Confidenceinterval P-Value for ρ > 0 ∆14 1 0.7860 0.476-0.922 0.031% ∆14 7 0.7749 0.454-0.917 0.042% ∆7 1 0.6880 0.310-0.878 0.226% Non-voto: ∆max min ρ 95%Confidenceinterval P-Value for ρ > 0 ∆14 1 0.5579 0.190-0.788 0.567% ∆14 7 0.5920 0.248-0.803 0.231% ∆7 1 0.4433 0.049-0.718 3.00% Corrado Monti Sentiment Analysis for italian microblogging 18/23
  • 19. Introduction Development of the classification system Application and experimental results Conclusions Apr 15 May 01 May 15 Jun 01 Jun 15 Jul 01 Jul 15 Aug 01 Aug 15 Sep 01 Sep 15 Oct 01 0.00 0.05 0.10 0.15 0.20 0.25 0.004 0.005 0.006 0.007 0.008 0.009 0.01 0.011 Time Inefficacyindicator Twitterdisaffectionratio Twitter disaffection ratio Inefficacy indicator Corrado Monti Sentiment Analysis for italian microblogging 19/23
  • 20. Introduction Development of the classification system Application and experimental results Conclusions Interpretation Data seem to indicate a quite strong correlation between disaffected tweet and diffusion of the phenomena in society This does not mean that Twitter is a representative sample We can guess that the quantity of discussion about this pheonomenon is connected with how much it will spread Corrado Monti Sentiment Analysis for italian microblogging 20/23
  • 21. Introduction Development of the classification system Application and experimental results Conclusions Further investigation With this big tweet sample, we can study causes of oscillations with text mining techinques We compute the ratio day by day and we found out when we have disaffection peaks For every peak 1. we take news of that day 2. we compare them with the mass of tweet of the peak 3. which is the most similar news? Corrado Monti Sentiment Analysis for italian microblogging 21/23
  • 22. Introduction Development of the classification system Application and experimental results Conclusions Apr 01 Apr 15 May 01 May 15 Jun 01 Jun 15 Jul 01 Jul 15 Aug 01 Aug 15 Sep 01 Sep 15 Oct 01 0.000 0.005 0.010 0.015 0.020 0. 0.025 0.05 0.075 0.1 0.125 0.15 0.175 0.2 Time Twitterdisaffectionratio Inefficacyindicator Twitter disaffection ratio Inefficacy indicator Rapporto di tweet Sondaggi (Inefficacia) Tempo S o n d a g g i S o n d a g g i T w e e t (Teorie del complotto su stragismo di Stato) Scandalo Lega Amministrative Attentato di Brindisi Scandalo Fiorito Corrado Monti Sentiment Analysis for italian microblogging 22/23
  • 23. Introduction Development of the classification system Application and experimental results Conclusions Conclusions We build classifiers for political messages Topic and sentiment classifers are reusable We showed a correlation between quantity of discussion on Twitter and diffusion of a phenomenon Future works Classifying users as nodes of a social network Confirm or denial of models that could explain our data Corrado Monti Sentiment Analysis for italian microblogging 23/23
  • 24. Introduction Development of the classification system Application and experimental results Conclusions Thanks Corrado Monti Sentiment Analysis for italian microblogging 24/23