Spotle AI-thon, the AI Global Challenge, drew 7000+ participants from top campuses in India and Singapore, who worked on addressing the mental health challenge with AI. The top 10 teams, from IIT Roorkee, CMI, NIT, IIM Indore, Charotar University and DIAT, made it to the final round. This is a showcase Top 10 presentation from Team Elite (IIM Indore, Aegis, Vardhaman College).
2. Associate Partners:
Chennai Mathematical Institute (Academia Partner)
SPJIMR (Entrepreneurship Ecosystem Partner)
Nasscom Community (Community Partner)
Media Partners:
PRMoment (Media Partner)
So You Wanna Be In TV?, London (Community Media Partner)
Bloggers Alliance (Blogging Partner)
Hiring Partners:
L&T Infotech
Lifevitae Singapore
EKO Informatics
3. Team : The Elite
Priyank Jha
PGP in Data Science @ Aegis
https://spotle.ai/Priyankjha1
Devleena Banerjee
Business Analytics @ IIM Indore
https://spotle.ai/DevleenaBanerjee
Vidhya Subramaniam
Business Analytics @ IIM Indore
https://spotle.ai/VidhyaSubramaniam
Chiranjeevi Karthik
Student @ Vardhaman College
https://spotle.ai/Karthikchiranjeevi
4. Table of Contents
1. Introduction
Problem | Objective
2. Our Approach
Methodology | Solution
3. The Outcome
Results | Conclusion
4. Productionization
Limitations | Prototype
5. Problem Statement
Can we analyze a person's mental health based on their Twitter usage?
If yes, what are the factors that determine this?
To what extent has COVID-19 affected people's mental health?
6. Objective
To come up with an effective methodology to analyze mental health based on tweets.
To understand what determines the emotion conveyed in a tweet.
To gather insights on how COVID-19 affected mental health based on tweets.
7. Methodology
Pipeline (flowchart):
Dataset
-> Feature Extraction & Data Pre-Processing
-> Cleaned dataset with extracted features (columns: Tweet, Location, Date & Time, ..., Emojis, Emojis_in_words, Hashtags, Time of the day)
-> Try to label tweets using hashtags & emojis; try to label tweets using emotion lexicons
-> Check which approach is more reliable
-> Create a model for each emotion
For our analysis we consider 6 emotions (Ekman's emotions).
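The feature-extraction step above can be illustrated with a minimal sketch. The slides do not show the team's code, so everything here is an assumption: the function names, the emoji code point ranges, and the time-of-day buckets are hypothetical choices made for illustration, using only the Python standard library.

```python
import re
import unicodedata
from datetime import datetime

HASHTAG_RE = re.compile(r"#(\w+)")

def is_emoji(ch):
    # Rough heuristic (assumption): treat two common emoji code point
    # ranges as emojis. A real pipeline would need a fuller definition.
    cp = ord(ch)
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF

def time_of_day(ts):
    # Hypothetical buckets; the slides do not define "Time of the day".
    if 5 <= ts.hour < 12:
        return "morning"
    if 12 <= ts.hour < 17:
        return "afternoon"
    if 17 <= ts.hour < 21:
        return "evening"
    return "night"

def extract_features(text, created_at):
    """Return the hashtag, emoji and time features for one tweet."""
    emojis = [ch for ch in text if is_emoji(ch)]
    return {
        "hashtags": [tag.lower() for tag in HASHTAG_RE.findall(text)],
        "emojis": emojis,
        # Unicode character names stand in for "Emojis_in_words".
        "emojis_in_words": [unicodedata.name(ch, "unknown").lower() for ch in emojis],
        "time_of_day": time_of_day(created_at),
    }

# Made-up example tweet; \U0001F622 is the "crying face" emoji.
features = extract_features(
    "Stuck at home again \U0001F622 #Lockdown #covid19",
    datetime(2020, 4, 3, 22, 15),
)
print(features)
```

Each returned key corresponds to one of the extracted columns named in the pipeline; the tweet text, location and raw timestamp would be carried over unchanged.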
8. Observations : Labelling tweets
✘ Labelling tweets using hashtags & emojis: Failed
Only a small percentage of tweets contain emojis or hashtags that correspond to an emotion.
There is little correlation between the emotions extracted from emojis and hashtags and the polarity of the tweets.
✓ Labelling tweets using lexicons: Worked
For every tweet, we could extract whether a particular emotion is present in the tweet or not.
9. Our Strategy
No external dataset was used
The unsupervised approach using emotion lexicons is relatively fast: a single scan of the dataset is enough to label each tweet with a particular emotion.
Multiple emotions in a single tweet
Thanks to the emotion lexicons, we could label each tweet with multiple emotions.
10. Solution
STEP - I: For every tweet in the dataset, initialise an empty vector (0 0 0 0 0 0) and split the tweet into a list of words.
STEP - II: For all the words in a tweet, check if the word exists in the lexicon database. If yes, add the word's vector to the initialised vector.
STEP - III: Store the resultant vector (the embedding) in the database along with the tweet.

Lexicon database (excerpt):
Word     Anger  Disgust  Fear  Joy  Sadness  Surprise
aback    0      0        0     0    0        0
abandon  0      0        1     0    1        0
...      ...    ...      ...   ...  ...      ...
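The three steps on this slide can be sketched as follows. This is an illustrative reading of the flowchart, not the team's actual code: the "aback" and "abandon" rows come from the slide, while the "happy" row, the function names and the punctuation stripping are assumptions added for the example.

```python
EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]

# Toy lexicon database. "aback" and "abandon" are taken from the slide;
# "happy" is a hypothetical row added for illustration.
LEXICON = {
    "aback":   [0, 0, 0, 0, 0, 0],
    "abandon": [0, 0, 1, 0, 1, 0],
    "happy":   [0, 0, 0, 1, 0, 0],
}

def embed_tweet(tweet, lexicon=LEXICON):
    """Sum the emotion vectors of every lexicon word found in the tweet."""
    vector = [0] * len(EMOTIONS)              # STEP I: initialise an empty vector
    for word in tweet.lower().split():        # STEP I: split the tweet into words
        word = word.strip(".,!?#@\"'")        # assumption: drop surrounding punctuation
        scores = lexicon.get(word)            # STEP II: look the word up in the lexicon
        if scores is not None:
            vector = [v + s for v, s in zip(vector, scores)]
    return vector                             # STEP III: caller stores it with the tweet

def emotions_present(vector):
    """Multi-label view: every emotion with a non-zero count."""
    return [name for name, count in zip(EMOTIONS, vector) if count > 0]

vec = embed_tweet("Do not abandon me, abandon hope!")
print(vec, emotions_present(vec))
```

Because the vector is a sum over words, a single tweet can carry several emotions at once, and a tweet with no lexicon words keeps the all-zero vector.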
11. Modelling
We have built 6 binary classification models, where each model corresponds to a particular emotion.
To build these models, logistic regression was used over the vector embeddings extracted using the lexicon database.
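A minimal sketch of the "one binary model per emotion" setup follows. The slides do not say which library or training settings were used, so this example swaps in a small hand-written gradient-descent logistic regression to stay dependency-free; in practice a library implementation would be the usual choice. The training rows, learning rate and epoch count are made up for illustration, and the labels are derived from the lexicon vectors themselves, as the labelling slide describes.

```python
import math

EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5

# Hypothetical lexicon embeddings for six tweets (one row per tweet).
X = [
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 2, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0],
    [0, 0, 0, 1, 0, 1],
]
# Lexicon-derived labels: an emotion is present if its count is non-zero.
Y = [[1 if v > 0 else 0 for v in row] for row in X]

# One independent binary classifier per emotion.
models = {
    name: train_logistic(X, [labels[k] for labels in Y])
    for k, name in enumerate(EMOTIONS)
}

def predict_emotions(x):
    return [name for name in EMOTIONS if predict(models[name], x)]

print(predict_emotions([0, 0, 1, 0, 1, 0]))
print(predict_emotions([0, 0, 0, 0, 0, 0]))
```

Keeping the six models independent is what allows a tweet to be assigned several emotions, or none at all.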
14. Conclusion
Yes, we can determine the mental health of a person from their Twitter usage.
The overall emotions in a tweet are decided by the emotions of individual words and not by hashtags or emojis.
COVID-19 has definitely taken a toll on people's mental health, as fear and sadness seem to be dominating their emotional state.
15. Limitations
The predictions of our model are accurate only for tweets that use vocabulary similar to that of our training set.
If none of the words in a tweet are part of our training set vocabulary, then it is implicitly labelled as neutral.
To overcome the above limitations, we can train on a larger dataset.
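One way to gauge how much the vocabulary limitation matters is to measure what share of tweets contain at least one known word. The sketch below is only an illustration under stated assumptions: the function name, the toy vocabulary and the example tweets are hypothetical, and the punctuation handling is a simplification.

```python
def vocabulary_coverage(tweets, vocabulary):
    """Fraction of tweets that contain at least one known word."""
    if not tweets:
        return 0.0
    covered = 0
    for tweet in tweets:
        words = {w.strip(".,!?#@\"'") for w in tweet.lower().split()}
        if words & vocabulary:
            covered += 1
    return covered / len(tweets)

# Hypothetical vocabulary and tweets, for illustration only.
vocabulary = {"abandon", "fear", "happy"}
tweets = [
    "I fear the worst",
    "So happy today!",
    "Working from home again",
    "lol ok",
]
coverage = vocabulary_coverage(tweets, vocabulary)
print(f"{coverage:.0%} covered, {1 - coverage:.0%} implicitly neutral")
```

Tweets outside the covered share are the ones that fall back to the implicit neutral label, so a low coverage value would signal that a larger training vocabulary is needed.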