Towards the study of sentiment in the public opinion of science in Spanish
PhD Student: Patricia Sánchez-Holgado
Director: Carlos Arcila-Calderón
Context
Twitter as a tool for scientific communication in Spain
A relevant network because of its user volume, the free generation of content, and real-time information.
Advantage: immediacy | Disadvantage: saturation
• It has enormous potential and is beginning to take a leading role, but it also requires efficient use.
• Twitter is the network most used by science journalists.
• Science communicators increasingly use digital technology and social networks.
• The first details of a scientific or technical scoop are already being made public on Twitter.
• The opinion expressed on Twitter is directly linked to national and international science news.
RQ1 - Can we analyze part of the public data available on the social network Twitter
to learn the attitudes, opinions and sentiments toward the science communication
topics that are shared?
Objectives
Main Objective:
Develop and evaluate a classifier, based on machine learning techniques, for real-time sentiment
analysis of Spanish-language messages on scientific topics on the social network Twitter.
Secondary Objectives:
1. Create a specific corpus of texts labeled with positive or negative sentiment.
2. Develop a prototype for real-time sentiment analysis of scientific messages on Twitter.
3. Test the prototype.
Expected Results
A corpus of Spanish-language texts on scientific topics,
labeled with positive or negative sentiment.
The "OPSCIENCE" prototype, Spanish version.
Methodology
Machine Learning
Data Mining: selection, preprocessing, transformation, modeling, interpretation, evaluation.
Machine Learning: patterns in large volumes of data.
• Supervised: establishes a correspondence between the desired inputs and outputs of the system.
Natural Language Processing (NLP): uses algorithms and statistics to understand, learn and reproduce human language.
• Probabilistic models based on data.
Sentiment Analysis: the computational study of the sentiments expressed in texts.
• Polarity: positive or negative.
The goal of supervised machine learning is to create a function that, once trained
(here, once the sentiment classifier has been trained on labeled tweets),
can predict the value of a new input element (here, its sentiment).
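Here is a minimal sketch of such a learned function. The plain Python 3 code below trains a multinomial Naive Bayes model with add-one smoothing on a few hypothetical labeled tweets, then returns a `predict` function. The prototype itself used Python 2.7. The function names and example texts are illustrative assumptions, not the OPScience implementation.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Learn a multinomial Naive Bayes model from (tokens, label) pairs.

    Returns predict(tokens): the function produced by supervised learning.
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)  # label -> token frequencies
    vocabulary = set()
    for tokens, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)

    total_docs = sum(label_counts.values())
    totals = {label: sum(word_counts[label].values()) for label in label_counts}

    def predict(tokens):
        best_label, best_score = None, -math.inf
        for label in label_counts:
            # log prior + log likelihoods with add-one (Laplace) smoothing
            score = math.log(label_counts[label] / total_docs)
            for token in tokens:
                if token not in vocabulary:
                    continue  # words never seen in training are ignored
                count = word_counts[label][token]
                score += math.log((count + 1) / (totals[label] + len(vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return predict

# Hypothetical labeled training tweets (already tokenized).
training = [
    ("gran avance científico".split(), "pos"),
    ("excelente descubrimiento en medicina".split(), "pos"),
    ("recortes terribles en ciencia".split(), "neg"),
    ("mala noticia para la investigación".split(), "neg"),
]
predict = train_naive_bayes(training)
print(predict("excelente avance".split()))    # pos
print(predict("terribles recortes".split()))  # neg
```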
OPScience classifier
It lets users analyze, locally and in real time, the tone of scientific tweets:
- Using freely available resources such as Python (version 2.7) and the Twitter Application Programming
Interface (API) (REST and Streaming).
- Based on the NLTK and scikit-learn libraries for Python.
- Training a supervised machine learning model with 6 classification algorithms (original Naive
Bayes, Naive Bayes for multinomial models, Naive Bayes for multivariate Bernoulli
models, Logistic Regression, Linear Support Vector Classification, and linear classifiers with
stochastic gradient descent (SGD) training).
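NLTK classifiers are trained on lists of `(featureset, label)` pairs. A featureset is a dictionary that maps feature names to values. scikit-learn estimators can be plugged into that interface through NLTK's `SklearnClassifier` wrapper. The sketch below builds word-presence featuresets in plain Python 3. The vocabulary size (`top_n`) and the helper names `build_vocabulary` and `to_featureset` are hypothetical; they are not settings reported for OPScience.

```python
from collections import Counter

def build_vocabulary(tokenized_texts, top_n=3000):
    """Return the top_n most frequent tokens across the training texts."""
    counts = Counter(token for tokens in tokenized_texts for token in tokens)
    return [token for token, _ in counts.most_common(top_n)]

def to_featureset(tokens, vocabulary):
    """Map a tweet to an NLTK-style dict: {word: is the word present?}."""
    present = set(tokens)
    return {word: (word in present) for word in vocabulary}

# Hypothetical tokenized tweets and their labels.
texts = [["vacuna", "eficaz"], ["vacuna", "peligrosa"], ["ciencia", "eficaz"]]
labels = ["pos", "neg", "pos"]
vocabulary = build_vocabulary(texts, top_n=4)
train_set = [(to_featureset(tokens, vocabulary), label)
             for tokens, label in zip(texts, labels)]
print(vocabulary)
print(train_set[0])
```

With NLTK and scikit-learn installed, a list like `train_set` can be passed to `nltk.NaiveBayesClassifier.train(train_set)` or to `SklearnClassifier(MultinomialNB()).train(train_set)`. The second form wraps one of the scikit-learn models listed above.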
Development of the project
STEP 1:
Creation of a corpus of scientific texts in Spanish
that will serve to train a machine learning model.
STEP 2:
Supervised machine learning model
trained with 6 classification algorithms
STEP 3:
Real-time classifier test
Connecting to the Twitter streaming API
STEP 1. Creation of a corpus of scientific texts in Spanish
1.1 Data acquisition
• Downloading data from Twitter
• Creating an app
• Data obtained
• Script for data download
Characteristics of the full dataset
Language                          Spanish
Tweets downloaded via Streaming   171,459
Tweets downloaded via REST         37,292
Total tweets downloaded           208,751
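The sketch below shows one possible shape for the "script for data download" step, once tweets have been saved as JSON lines (one tweet object per line). It assumes Twitter API v1.1-style tweet objects with `id_str`, `text` and `lang` fields, and it uses hypothetical sample lines. It does not call the Twitter API. It only keeps tweets tagged as Spanish (`lang == "es"`) and tallies them per source.

```python
import json
from collections import Counter

def load_spanish_tweets(lines, source):
    """Parse JSON-lines tweets and keep those that Twitter tagged as Spanish."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        if tweet.get("lang") == "es":
            rows.append({"id": tweet["id_str"], "text": tweet["text"], "source": source})
    return rows

# Hypothetical samples standing in for the saved Streaming and REST downloads.
streaming_lines = [
    '{"id_str": "1", "text": "Nuevo avance en física", "lang": "es"}',
    '{"id_str": "2", "text": "New physics result", "lang": "en"}',
]
rest_lines = ['{"id_str": "3", "text": "La ciencia avanza", "lang": "es"}']

tweets = (load_spanish_tweets(streaming_lines, "streaming")
          + load_spanish_tweets(rest_lines, "rest"))
per_source = Counter(row["source"] for row in tweets)
print(per_source["streaming"], per_source["rest"], len(tweets))  # 1 1 2
```

In the dataset table the two per-source counts add up exactly to the total (171,459 + 37,292 = 208,751). This suggests that any overlap between the two downloads was handled later, in the duplicate-removal step of preprocessing.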
STEP 1. Creation of a corpus of scientific texts in Spanish
1.2 Data preprocessing:
• Store the tweets as CSV text.
• UTF / ANSI encodings
• Spanish language
• Texts in lowercase
• Retweets
• Removal of possible
duplicates with R
• Tokenization
• Other preprocessing
• Manual labeling of the
sentiment of each text
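The Python sketch below suggests what the listed preprocessing steps might look like. None of its choices are confirmed by the slides:
- Retweets are recognized by a leading "rt @" after lowercasing, and they are discarded.
- Exact duplicate texts are dropped. The authors report doing this step in R.
- A simple regex tokenizer keeps hashtags and mentions.

The output is written as CSV, as in the first preprocessing step.

```python
import csv
import io
import re

# Words, #hashtags and @mentions; \w also matches accented letters such as ñ or ó.
TOKEN_RE = re.compile(r"[#@]?\w+")

def preprocess(texts):
    """Lowercase, drop retweets and exact duplicates, and tokenize the tweets."""
    seen = set()
    rows = []
    for text in texts:
        lowered = text.lower().strip()
        if lowered.startswith("rt @"):
            continue  # assumption: retweets are removed from the corpus
        if lowered in seen:
            continue  # exact duplicate after lowercasing
        seen.add(lowered)
        rows.append((lowered, TOKEN_RE.findall(lowered)))
    return rows

# Hypothetical raw tweets: one original, one case-only duplicate, one retweet.
raw = [
    "La #Ciencia española avanza",
    "la #ciencia española avanza",
    "RT @usuario: La #Ciencia española avanza",
]
rows = preprocess(raw)

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["text", "tokens"])
for text, tokens in rows:
    writer.writerow([text, " ".join(tokens)])
print(buffer.getvalue())
```

Manual sentiment labeling would then add a label column to each remaining row.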
STEP 1. Creation of a corpus of scientific texts in Spanish
Corpus of texts:
10,000 elements
• 5,000 messages labeled as positive
• 5,000 messages labeled as negative
STEP 2. Supervised machine learning model
Learning: the classifier is trained on the corpus of positive and negative scientific
tweets in Spanish, split 70% for training and 30% for test/validation.
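The following sketch shows a stratified 70/30 split of a balanced corpus. The seed, the per-class stratification and the function name `split_train_test` are assumptions: the slides only state the proportions. The corpus here is a placeholder with the same shape as the one described (5,000 positive + 5,000 negative).

```python
import random
from collections import Counter

def split_train_test(labeled, train_fraction=0.7, seed=42):
    """Shuffle each class separately and split it train_fraction / rest."""
    rng = random.Random(seed)
    by_label = {}
    for example in labeled:
        by_label.setdefault(example[1], []).append(example)
    train, test = [], []
    for label in sorted(by_label):
        items = list(by_label[label])
        rng.shuffle(items)
        cut = round(len(items) * train_fraction)
        train.extend(items[:cut])
        test.extend(items[cut:])
    rng.shuffle(train)  # mix the classes instead of keeping them in blocks
    rng.shuffle(test)
    return train, test

# Placeholder corpus: 5,000 positive + 5,000 negative (text, label) pairs.
corpus = [(f"tweet positivo {i}", "pos") for i in range(5000)]
corpus += [(f"tweet negativo {i}", "neg") for i in range(5000)]
train, test = split_train_test(corpus)
print(len(train), len(test))                              # 7000 3000
print(sorted(Counter(label for _, label in test).items()))  # [('neg', 1500), ('pos', 1500)]
```

Stratifying keeps both halves balanced, so a trivial majority-class guess stays at 50% accuracy on the test set.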
6 algorithms used:
– Original Naive Bayes,
– Naive Bayes for multinomial models,
– Naive Bayes for multivariate Bernoulli models,
– Logistic Regression,
– Linear Support Vector Classification (SVC), and
– Linear classifiers with stochastic gradient descent (SGD) training.
Combination of classification algorithms: voting by feature intervals.
A voting system is created in which each algorithm has one vote, and the class that
receives the most votes is the one chosen.
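The one-vote-per-algorithm combination can be sketched as a plain majority vote over the labels predicted by the six classifiers. The slides do not say how ties or confidence are computed. This sketch assumes that confidence is the share of votes won by the winning label. On a tie (possible with six voters, e.g. 3 to 3), `Counter.most_common` returns the label encountered first. The function name `vote` is hypothetical.

```python
from collections import Counter

def vote(predictions):
    """Majority vote in which each classifier's label counts once.

    Returns (label, confidence), where confidence is the share of votes won.
    On a tie, Counter.most_common keeps the label encountered first.
    """
    counts = Counter(predictions)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(predictions)

# Hypothetical labels from the six classifiers for one tweet.
predictions = ["pos", "pos", "neg", "pos", "pos", "pos"]
label, confidence = vote(predictions)
print(label, round(confidence, 3))  # pos 0.833
```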
STEP 3. Real-time classifier test
Validation of the Model
• Using these predictive models, the classifier connects to Twitter's real-time data stream
(through the available Streaming API),
• filters Spanish-language tweets about science by keywords or hashtags, and predicts
the sentiment of each incoming tweet,
• and automatically plots, with the Matplotlib library, the tweets classified with high
confidence (> 0.80).
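This sketch covers the filtering and thresholding part of the real-time test. It does not cover the Streaming API connection or the Matplotlib plot. It assumes that the classifier returns a `(label, confidence)` pair. The keyword set, the sample stream and `fake_classify` are hypothetical stand-ins.

```python
import re

TOKEN_RE = re.compile(r"#?\w+")  # words and #hashtags

def matches_keywords(text, keywords):
    """True if any lowercase keyword or hashtag appears as a token of the tweet."""
    return not keywords.isdisjoint(TOKEN_RE.findall(text.lower()))

def confident_predictions(tweets, classify, keywords, threshold=0.80):
    """Yield (text, label, confidence) for matching tweets above the threshold."""
    for text in tweets:
        if not matches_keywords(text, keywords):
            continue
        label, confidence = classify(text)
        if confidence > threshold:
            yield text, label, confidence

# Hypothetical stand-ins for the live stream and the voted classifier.
keywords = {"ciencia", "#ciencia"}
stream = ["Gran día para la #ciencia", "Hoy llueve en Madrid", "La ciencia no avanza"]

def fake_classify(text):
    return ("pos", 5 / 6) if "gran" in text.lower() else ("neg", 4 / 6)

for text, label, confidence in confident_predictions(stream, fake_classify, keywords):
    print(label, round(confidence, 3), text)  # pos 0.833 Gran día para la #ciencia
```

Suppose confidence is the vote share of six classifiers. Then the > 0.80 threshold keeps only tweets on which at least five of the six agree: 5/6 ≈ 0.83, while 4/6 ≈ 0.67.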
Results
Classifier Results
Accuracy = correct predictions / total predictions
Typical accuracy for this type of model: about 70%
Example: the TASS project reports around 72% (Cumbreras et al., 2016).
Algorithm                        Accuracy (%)
Original Naive Bayes Algo        72.64
MNB_classifier                   72.24
BernoulliNB_classifier           72.80
LogisticRegression_classifier    71.88
LinearSVC_classifier             70.45
SGDClassifier                    71.15
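Combination of classifiers
voted_classifier: accuracy 72.31%
Confusion matrix (layout)
              Predicted Pos   Predicted Neg
Actual Pos    TP              FN
Actual Neg    FP              TN
Confusion matrix (counts; the diagonal holds the correct predictions)
              Predicted Pos   Predicted Neg
Actual Pos    1158            342
Actual Neg    465             1047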
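Accuracy can be recomputed from the confusion-matrix counts, together with per-class precision and recall, which the slides do not report. The helper name `metrics` is hypothetical. With these counts there are 1,500 actual positives and 1,512 actual negatives, for 3,012 test tweets. Accuracy comes out at 2,205 / 3,012 ≈ 73.21%, which differs from the 72.31% reported for voted_classifier. The slides do not say whether both figures come from the same run.

```python
def metrics(tp, fn, fp, tn):
    """Accuracy plus per-class precision and recall from a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision_pos": tp / (tp + fp),
        "recall_pos": tp / (tp + fn),
        "precision_neg": tn / (tn + fn),
        "recall_neg": tn / (tn + fp),
    }

# Counts from the confusion matrix above (rows: actual, columns: predicted).
m = metrics(tp=1158, fn=342, fp=465, tn=1047)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```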
Conclusions
• Microblogging and Twitter as a communication tool for science.
• Preparation of a specific corpus of scientific texts in Spanish.
• Training of a model: algorithms and parameters used.
• Evaluation of the results obtained: accuracy of about 72%.
• Real-time testing.
Future lines of research
• This study can support science communication strategies.
• Testing and studying the individual results of each classification algorithm.
• Enlarging the corpus and labeling it with more classes (positive,
negative and neutral) to include informative messages.
• Evaluating the models at the end of each preprocessing phase, in
order to assess the relative importance of each phase.
• Large-scale, real-time studies with distributed computing.
Future lines of research
Continue
RQ1 - Can we analyze part of the public data available on the social network Twitter to
learn the attitudes, opinions and sentiments toward the science communication topics that
are shared?
with
and move towards predicting future trends in science topics?