1) The document discusses analyzing emotions expressed on Twitter related to COVID-19 in India. It describes clustering tweets using unsupervised learning techniques like K-means, NMF, and LDA to group tweets by emotion (fear, joy, sadness, anger).
2) Supervised learning with a bidirectional LSTM model was also used, finding similar results of four emotion clusters.
3) The analysis found more negative tweets relating to fear, sadness, and anger than positive tweets related to joy, indicating people are not feeling optimistic about the COVID-19 situation.
2. Introduction
7.31M people were affected by COVID-19 in India, which ranks 2nd in the world in cases.
Confirmed Cases: 7 239 389
Confirmed Deaths: 110 586
Cases per 100 000 people: 5245.92
COVID-19 has changed our lives forever. The world we knew until now has been transformed and
nowadays we live in a completely new scenario in a perpetual restructuring transition, in which
the way we live, relate, and communicate with others has been altered permanently. Within this
context, risk communication is playing a decisive role when informing, transmitting, and
channeling the flow of information in society. COVID-19 has posed a real pandemic risk
management challenge in terms of impact, preparedness, response, and mitigation by
governments, health organizations, non-governmental organizations (NGOs), mass media, and
stakeholders. In this study, we examine how COVID-19 has affected risk communication in
uncertain contexts and its impact on the emotions and sentiments derived from Twitter during
the COVID-19 pandemic.
Source: World Health Organization - India
3. Problem Statement
The COVID-19 pandemic has severely affected countries around the world, and its
intensity is increasing very fast in India, where the number of new cases rises
every day. In a span of six months, the total number of cases crossed 50 lakh
and the total number of deaths is almost 1 lakh. Sudden outbreaks of such
pandemics have been observed to affect public mental states and emotions, and
they result in either constructive or destructive behavioural changes among
people. Anger, sadness, and fear are the most common emotions witnessed among
people during several pandemics. Social media platforms such as Twitter are
rich sources of information from people.
Description
• To study the Twitter data by performing Exploratory Data Analysis
• To generate a summary from the data and draw conclusions
• To determine the types of emotions from among “Anger”, “Fear”, “Sadness”, and “Joy”
4. Data Analysis
Word Cloud
• The dataset has 2 million tweets and four attributes: text, location,
date and time.
• In primary data analysis, we found that 32% of the dataset consists of null
values and repeated entries, leading to redundancy. After removing null
values and redundant data, only 9730 tweets remain.
• We consider these 9730 tweets for our analysis.
• The tweets in the dataset are dated from 13th September, 2020 to 22nd
September, 2020.
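The cleaning step described above can be sketched as follows. This is a minimal illustration only, assuming the tweets are loaded into a pandas DataFrame with a `text` column (the column names follow the attributes listed above). The function name `clean_tweets` and the sample rows are hypothetical, and treating only a missing `text` as a null row is an assumption.

```python
import pandas as pd


def clean_tweets(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing tweet text, then drop repeated tweets."""
    cleaned = df.dropna(subset=["text"])                # remove null tweets
    cleaned = cleaned.drop_duplicates(subset=["text"])  # remove repeated tweets
    return cleaned.reset_index(drop=True)


# Made-up rows for illustration only (not taken from the dataset).
sample = pd.DataFrame({
    "text": ["stay home stay safe", None, "stay home stay safe", "cases are rising"],
    "location": ["Delhi", "Mumbai", "Delhi", None],
})
cleaned_sample = clean_tweets(sample)
print(len(sample), "->", len(cleaned_sample))
```

Whether rows with a missing location or date should also be dropped depends on the analysis; the sketch keeps them because only the text is clustered.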
7. Unsupervised Clustering Analysis
Determining clusters using Elbow Method
The dataset does not have any labels or category attributes, so it is
difficult to classify the tweets. This problem can be solved by unsupervised
learning. Therefore, before classifying the tweets into sentiments, we analyse
the clustering of tweets using unsupervised learning algorithms.
K-Means Clustering
• The Elbow method is an effective technique for choosing the number of
clusters. It plots the cost function against the number of clusters and
identifies the breakpoints.
• If adding more clusters does not reduce the variance significantly, then we
should stop adding more clusters.
• This method gives insight into the cluster structure, which makes it an
essential data mining step before clustering begins.
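The elbow procedure described above can be sketched with scikit-learn. This is an illustration under stated assumptions, not the authors' code: the slides do not say how tweets were vectorized, so TF-IDF is assumed here, and the function name `elbow_costs` and the toy tweets are hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def elbow_costs(texts, k_values, ngram_range=(1, 1), seed=0):
    """Return {k: inertia}: the K-Means cost (within-cluster sum of squares) for each k."""
    # Assumption: TF-IDF vectors; the slides only mention "word vectors".
    vectors = TfidfVectorizer(ngram_range=ngram_range).fit_transform(texts)
    costs = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(vectors)
        costs[k] = model.inertia_
    return costs


# Made-up tweets for illustration only.
toy_tweets = [
    "afraid of getting tested",
    "scared to go outside",
    "so sad about job losses",
    "heartbroken by the death toll",
    "angry at the mismanagement",
    "furious about the policy failure",
    "enjoying cooking at home",
    "happy that cases are falling",
]
costs = elbow_costs(toy_tweets, k_values=range(1, 6))
for k, cost in costs.items():
    print(k, round(cost, 3))
```

Plotting `costs` against k and looking for the point where the curve flattens gives the elbow. Passing `ngram_range=(2, 2)` would repeat the analysis on bigram features.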
8. Unsupervised Clustering Analysis
Top frequency words from K-Means
K-Means Clustering
Fig: Monogram-based K-Means Clustering
Fig: Bigram-based K-Means Clustering
• Fig 2.2 uses monogram-based word vectors to find the number of clusters
in the tweets. An elbow is seen at 3 and 4 clusters.
• Fig 2.3 uses bigram-based word vectors to find the number of clusters in
the tweets. An elbow is seen at 2, 3 and 5 clusters.
• We can conclude from the elbow analysis that the tweets are grouped into
3 or 4 clusters if we consider the monogram-based word vector
representation.
9. Unsupervised Clustering Analysis
Top frequency words from LDA
Latent Dirichlet Allocation (LDA)
Fig: Monogram-based NMF Clustering
Fig: Bigram-based NMF Clustering
Top frequency words from NMF Clustering
NMF Clustering
• Nonnegative Matrix Factorization (NMF) is a popular dimension reduction
technique for clustering that extracts latent features from
high-dimensional data and is widely used for text mining.
• The resulting lower-dimensional vectors are non-negative, which means
their coefficients are also non-negative. NMF gives us two matrices: one
relating the documents to the topics and one holding the weights of the
words for each topic.
10. Unsupervised Clustering Analysis
Top frequency words from LDA
Latent Dirichlet Allocation
Fig: LDA Clustering result
• LDA is also widely used for dimensionality reduction. It is a method for
clustering a large corpus of text documents and is also used in topic
modelling.
• In this method, word vectors are formed and updated iteratively using
Gibbs sampling; afterwards, the probability of each word is sampled
according to the clusters formed.
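A small LDA example is sketched below. Note the swap: the slides describe Gibbs sampling, whereas scikit-learn's `LatentDirichletAllocation`, used here for convenience, fits the model with variational Bayes. The function name `lda_topic_mix` and the toy tweets are hypothetical.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer


def lda_topic_mix(texts, n_topics, seed=0):
    """Return a (documents x topics) matrix; each row is a topic distribution."""
    counts = CountVectorizer().fit_transform(texts)  # LDA works on raw word counts
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    return lda.fit_transform(counts)


# Made-up tweets for illustration only.
toy_tweets = [
    "fear of testing positive",
    "fear of going outside",
    "sad about job losses",
    "sad about the death toll",
    "enjoying music at home",
    "enjoying movies at home",
]
doc_topics = lda_topic_mix(toy_tweets, n_topics=3)
cluster_ids = doc_topics.argmax(axis=1)  # most probable topic per tweet
print(cluster_ids)
```

Taking the most probable topic per tweet is one simple way to read the topic model as a clustering.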
12. Supervised Learning
Bi-directional LSTM
Fig: Model Architecture
• Following unsupervised learning, we adopted transfer learning for
supervised learning.
• For that, we adopted the dataset and tokenizers of “HuggingFace”[2], a
corpus of 20,000 sentences. The sentences were cleaned, tokenized with the
HuggingFace tokenizers, pad
13. Supervised Learning
Results from analysis of tweets
• From the overall results of the tweets, we found that joy is the highest, followed by
sadness, fear and anger.
• At the same time, the combined value of the negative sentiments, i.e. anger, fear and
sadness, exceeds that of joy.
• Love and Surprise make no contribution.
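The comparison above, where joy is the largest single class but the negative classes outweigh it together, can be checked with a simple count over predicted labels. The sketch below is illustrative: the function name `summarize_emotions` is hypothetical and the toy labels are made up to mirror the pattern, not the actual results.

```python
from collections import Counter

NEGATIVE = {"anger", "fear", "sadness"}


def summarize_emotions(labels):
    """Count each emotion and compare the combined negative total with joy."""
    counts = Counter(labels)
    negative_total = sum(counts[emotion] for emotion in NEGATIVE)
    return counts, negative_total, negative_total > counts["joy"]


# Made-up predictions for illustration only (not the study's results).
toy_labels = ["joy"] * 4 + ["sadness"] * 3 + ["fear"] * 2 + ["anger"] * 1
counts, negative_total, negative_exceeds_joy = summarize_emotions(toy_labels)
print(counts.most_common(), negative_total, negative_exceeds_joy)
```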
14. Associated emotion words
Sad, Fear, Anger, Joy
• Joy: people enjoying movies, tweeting about music, reading articles, making memes, trying different dishes at home,
and enjoying the news that cases of COVID-19 are decreasing daily, etc. Some tweets depend on the person's point of
view: some welcome strict fines for not wearing a mask while others do not.
• Sadness: tweets are mainly associated with mortality, criticizing the government, GDP rates, sharing horrible experiences,
racism, the opening of schools, protests, rebellion, etc.
• Fear: the tweets are rebellious or express fear of testing, fear of going outside and meeting people, panic, fear of
testing positive, suffering from COVID, etc. For Joy and Love there is no contribution.
15. Comparing Unsupervised Learning with Supervised Learning
Cluster 0 correlates with fear
Cluster 1 correlates with sadness
Cluster 2 correlates with joy
Cluster 3 correlates with anger
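The slides do not say how each cluster was matched to an emotion. One plausible way, sketched below as an assumption, is a majority vote: for every cluster, take the most common emotion that the supervised model predicts for its tweets. The function name `map_clusters_to_emotions` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def map_clusters_to_emotions(cluster_ids, emotion_labels):
    """Map each cluster id to the emotion predicted most often for its tweets."""
    votes = defaultdict(Counter)
    for cluster, emotion in zip(cluster_ids, emotion_labels):
        votes[cluster][emotion] += 1
    return {cluster: tally.most_common(1)[0][0] for cluster, tally in sorted(votes.items())}


# Made-up cluster assignments and predicted emotions, for illustration only.
toy_clusters = [0, 0, 0, 1, 1, 2, 2, 3, 3, 3]
toy_emotions = ["fear", "fear", "joy", "sadness", "sadness",
                "joy", "joy", "anger", "anger", "fear"]
mapping = map_clusters_to_emotions(toy_clusters, toy_emotions)
print(mapping)
```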
16. Conclusion from World Map
European countries, North America and South Asia reported more tweets.
These regions have more total cases in the world; India has the 2nd-highest
number of total reported corona cases, after the USA. This suggests that the
number of tweets is directly proportional to the number of cases, because
people in those countries share their views of their lives on Twitter.
World-wide COVID Tweets analysis
18. Conclusion
COVID-19 Emotion Analysis
The above analysis concludes that an unsupervised approach over the Twitter emotion dataset gives
the same result as the supervised approach, with 4 clusters corresponding to fear, joy, sadness,
and anger. The combined result of anger, fear, and sadness is more than joy, indicating that people
are not feeling optimistic. Anger is produced by mismanagement; fear is produced by the COVID-19
virus; sadness is due to loss of jobs and wages and loss of loved ones; joy is there because people
in lockdown are enjoying themselves and having fun while staying at home.