Spotle AI-thon Top 10 Showcase - Analysing Mental Health Of India - Team Elite - IIM Indore Aegis Vardhaman College

AI-THON
Mental Health Of INDIA
During COVID-19
TEAM : THE ELITE

Associate Partners:
  Chennai Mathematical Institute (Academia Partner)
  SPJIMR (Entrepreneurship Ecosystem Partner)
  Nasscom Community (Community Partner)
Media Partners:
  PRMoment (Media Partner)
  So You Wanna Be In TV?, London (Community Media Partner)
  Bloggers Alliance (Blogging Partner)
Hiring Partners:
  L&T Infotech
  Lifevitae Singapore
  EKO Informatics

Team : The Elite
  Priyank Jha
PGP in Data Science @ Aegis
https://spotle.ai/Priyankjha1
  Devleena Banerjee
Business Analytics @ IIM Indore
https://spotle.ai/DevleenaBanerjee
  Vidhya Subramaniam
Business Analytics @ IIM Indore
https://spotle.ai/VidhyaSubramaniam
  Chiranjeevi Karthik
Student @ Vardhaman College
https://spotle.ai/Karthikchiranjeevi

Table of Contents
1. Introduction
      Problem | Objective
2. Our Approach
        Methodology | Solution
3. The Outcome
        Results | Conclusion
4. Productionization
        Limitations | Prototype

Problem Statement
Can we analyze a person's mental health based on their Twitter usage?
If yes, what are the factors that determine this?
To what extent has COVID-19 affected people's mental health?

Objective
To come up with an effective methodology to analyze mental health based on tweets.
To understand what determines the emotion conveyed in a tweet.
To gather insights on how COVID-19 affected mental health based on tweets.

Methodology
1. Dataset
      Tweet | Location | Date & Time | ...
2. Feature Extraction & Data Pre-Processing
      Result: cleaned dataset with extracted features
      Emojis | Emojis_in_words | Hashtags | Time of the day | ...
3. Try to label tweets using hashtags & emojis
4. Try to label tweets using emotion lexicons
5. Check which approach is more reliable
6. Create a model for each emotion
For our analysis we are considering 6 emotions (Ekman's emotions).
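
The feature-extraction step can be sketched roughly as follows. This is only an illustration, not the team's code: the function names, the sample row and the hour boundaries used for the time-of-day buckets are all assumptions, since the slides do not state them. Emoji extraction is left out because it needs a Unicode emoji table or a third-party package.

```python
import re
from datetime import datetime

# Matches a '#' followed by word characters, e.g. "#lockdown".
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(tweet):
    """Return the hashtags in a tweet, lower-cased and without the '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(tweet)]


def time_of_day(timestamp):
    """Map a datetime to a coarse bucket (boundaries are an assumption)."""
    hour = timestamp.hour
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 21:
        return "evening"
    return "night"


# Hypothetical row, for illustration only.
row = {
    "tweet": "Stuck at home again #lockdown #StaySafe",
    "posted_at": datetime(2020, 4, 12, 23, 15),
}
features = {
    "hashtags": extract_hashtags(row["tweet"]),
    "time_of_day": time_of_day(row["posted_at"]),
}
print(features)
```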

Observations : Labelling tweets
✘ Labelling tweets using hashtags & emojis: Failed
Only a small percentage of tweets contain emojis or hashtags that correspond to an emotion.
There is little correlation between the emotions extracted from emojis and hashtags and the polarity of the tweets.
✓ Labelling tweets using lexicons: Worked
For every tweet, we could extract whether a particular emotion is present in the tweet or not.
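
One way to run the correlation check mentioned above is a Pearson coefficient between an emoji-derived emotion flag and the polarity score of each tweet. The sketch below is an assumption about how such a check could look; the slides do not say which correlation measure was used, and the sample values are invented purely to show the call.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence has zero variance; the coefficient is
    undefined in that case, so this is only a convenience convention.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: 1 if an emoji in the tweet maps to "joy", else 0,
# paired with a polarity score assumed to lie in [-1, 1].
emoji_joy = [1, 0, 1, 0, 0, 1]
polarity = [0.1, 0.3, -0.2, 0.0, 0.4, 0.2]
r = pearson(emoji_joy, polarity)
print(round(r, 3))
```

A coefficient close to zero would be consistent with the observation that emoji-based labels say little about the polarity of the text.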

Our Strategy
No external dataset was used
The unsupervised approach using emotion lexicons is relatively fast. A single scan of the dataset is enough to label each tweet with a particular emotion.
Multiple emotions in a single tweet
Thanks to the emotion lexicons, we could label each tweet with multiple emotions.

Solution
STEP - I
For every tweet in the dataset, initialise an empty vector:
0 0 0 0 0 0
STEP - II
Split the tweet into a list of words. For all the words in a tweet, check if the word exists in the lexicon database. If yes, then add the word's vector to the initialised vector.
STEP - III
Store the resultant vector (the embedding) in the database along with the tweet.

Lexicon database
Word      Anger  Disgust  Fear  Joy  Sadness  Surprise
aback     0      0        0     0    0        0
abandon   0      0        1     0    1        0
...       ...    ...      ...   ...  ...      ...
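
The three steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the toy lexicon holds only the two rows shown on the slide, and the lower-casing and punctuation stripping are guesses at the pre-processing, which the slides do not spell out. The names embed_tweet and present_emotions are hypothetical.

```python
EMOTIONS = ("anger", "disgust", "fear", "joy", "sadness", "surprise")

# Toy lexicon with only the two rows visible on the slide.
LEXICON = {
    "aback":   (0, 0, 0, 0, 0, 0),
    "abandon": (0, 0, 1, 0, 1, 0),
}


def embed_tweet(tweet, lexicon=LEXICON):
    """Sum the lexicon vectors of every known word in the tweet."""
    vector = [0] * len(EMOTIONS)                 # Step I: empty vector
    for word in tweet.lower().split():           # Step II: split into words
        word = word.strip(".,!?;:\"'()")         # assumed clean-up
        scores = lexicon.get(word)
        if scores is not None:                   # word exists in lexicon
            vector = [a + b for a, b in zip(vector, scores)]
    return vector


def present_emotions(vector):
    """Names of the emotions with a non-zero count; empty means neutral."""
    return [name for name, count in zip(EMOTIONS, vector) if count > 0]


# Step III: keep each tweet together with its embedding.
tweets = ["They abandon hope, then abandon plans.", "hello world"]
records = [(tweet, embed_tweet(tweet)) for tweet in tweets]
for tweet, embedding in records:
    print(tweet, embedding, present_emotions(embedding))
```

Because a single word can carry several emotions, one tweet can end up with several non-zero entries, and a tweet with no lexicon words keeps the all-zero vector.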

Modelling
We have built 6 binary classification models, where each model corresponds to a particular emotion.
To build these models, logistic regression was used over the vector embeddings extracted using the lexicon database.
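
A rough sketch of the six-model setup follows. The slides name neither the library nor the way training labels were produced, so this example makes two assumptions: logistic regression is written from scratch with plain gradient descent to keep the snippet dependency-free, and a tweet is labelled positive for an emotion when its lexicon count for that emotion is above zero. The training rows are invented for illustration.

```python
import math

EMOTIONS = ("anger", "disgust", "fear", "joy", "sadness", "surprise")


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, x):
    """True when the predicted probability is at least 0.5."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5


# Hypothetical lexicon embeddings, one row per tweet.
X = [
    [0, 0, 2, 0, 2, 0],
    [0, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 0, 2, 0, 0],
]
# Assumed labelling rule: emotion present when its count is positive.
labels = {e: [1 if row[i] > 0 else 0 for row in X] for i, e in enumerate(EMOTIONS)}
# One binary model per emotion.
models = {e: train_logistic(X, labels[e]) for e in EMOTIONS}

new_tweet = [0, 0, 3, 0, 0, 0]
print([e for e in EMOTIONS if predict(models[e], new_tweet)])
```

Training one independent binary model per emotion is what lets a single tweet be flagged with several emotions at once.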

Conclusion
Yes, we can determine the mental health of a person from their Twitter usage.
The overall emotions in a tweet are decided by the emotions of individual words and not by hashtags or emojis.
COVID-19 has definitely taken a toll on people's mental health, as fear and sadness seem to be dominating their emotional state.

Limitations
The predictions of our model are accurate only for tweets that use
vocabulary similar to that of our training set.
If none of the words in a tweet are part of our training set vocabulary,
then it is implicitly labelled as neutral.
To overcome the above limitations, we can train on a larger dataset.

Prototype
https://tweet-emotion-detection.herokuapp.com/

Complete Analysis

[Figure: Time of the day the tweets were posted]

[Figure: Number of tweets belonging to each emotion]

[Figure: Correlation between emotions extracted from emojis v/s polarity of the tweet]

