Identifying Prominent
Life Events on Twitter
SPEAKER – TOM DICKINSON
AUTHORS – TOM DICKINSON (OU - KMI), MIRIAM FERNANDEZ (OU - KMI), LISA
THOMAS (NORTHUMBRIA), PAUL MULHOLLAND (OU - KMI), PAM BRIGGS
(NORTHUMBRIA) AND HARITH ALANI (OU - KMI)
Quick Overview
Some Background
What we did
Discussion
Some Background
Why are we doing this?
As content creators, we post a lot of stuff on social media
This content can range from silly cats, to important life events that have happened to us
However, as users, we effectively lose access to this information about ourselves and forget
what’s there
By mining this data and presenting it back to users, we can give them a tool to aid
self-reflection on their own online digital presence
So what are we doing?
As part of the Reellives project, we are looking at making short “Reels” from a user's social media
content
These reels are intended as mini documentaries about a user's life on social media
This presents two main problems for us to solve:
◦ R.Q. 1) How can we extract meaningful events about ourselves from our social media data?
◦ R.Q. 2) How can we present these events in a cohesive narrative?
R.Q. 1 is being tackled by KMI, where we are looking at event extraction.
R.Q. 2 is being tackled by Edinburgh, who take our output as their input to
construct narratives.
So what are we doing?
[Diagram labels: Social Media, Storage, Extraction, Events, Story Generation, “Reel”; Life Event Detection; Fabula, Story, Narrative]
What is a life event?
There is already a large body of research on event detection on social media.
However, little of it has focused on life or personal events.
Semantically, they are no different:
◦ Both types will have a time and a location
◦ An action occurs
◦ The event is experienced by one or more agents
However:
◦ With general events we care more about the broader social and political significance
◦ With life events we care more about the personal significance
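The shared semantics above (time, location, action, agents) can be sketched as a small data structure. This is only an illustration: the class name, the field names and the `significance` flag are our own assumptions, not a schema from the project.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Event:
    """Shape shared by general events and life events (illustrative names)."""
    action: str                      # what happened, e.g. "got married"
    agents: List[str]                # who experienced the event
    time: Optional[datetime] = None  # when it happened, if known
    location: Optional[str] = None   # where it happened, if known
    # The only distinguishing part: whose significance we care about.
    significance: str = "personal"   # "personal" or "social/political"


def is_life_event(event: Event) -> bool:
    """A life event is an ordinary event judged by its personal significance."""
    return event.significance == "personal"


wedding = Event(action="got married", agents=["author", "partner"],
                time=datetime(2015, 6, 20), location="Las Vegas")
election = Event(action="election held", agents=["electorate"],
                 significance="social/political")
```

Both objects have the same fields; only the kind of significance separates them, which is the point the slide makes.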
What is a Life Event?
We can also get some intuition for life events from Autobiographical
Memory.
Autobiographical memory is a type of memory system that deals with
specific events that happened to us
◦ This is opposed to semantic memory, which is our knowledge of things
It can be modelled with three separate layers
◦ Lifetime periods
◦ When I was at school I had my first kiss
◦ General Events
◦ I got married
◦ Event-Specific Knowledge
◦ My tie was red at the wedding
In our work, we can consider the event-specific knowledge to be
reflected in social media posts
What We Did
Types of life events
To start off our research, we looked at identifying a finite number of life events.
The types of life events we chose are inspired by work done in Autobiographical Memory
◦ S. M. Janssen and D. C. Rubin. Age effects in cultural life scripts. Applied Cognitive Psychology
Their research showed a common consensus, amongst different age groups, of 48 life events that
would happen to a fictional child over the course of their life.
From this study, we selected 5 of the top events mentioned in the paper
◦ Getting Married
◦ Having Children
◦ Starting School
◦ A Parent's Death
◦ First Love
We also looked at combining all positive “about an event” examples into one training set to create a more
general “Is this about an event” classifier.
What we did – Data Collection
We chose Twitter due to ease of use for extracting large datasets.
Our selection methodology was based around a simple keyword search, where we considered
the root concepts for each of our events, and enhanced them with synonyms from WordNet.
We extracted Tweets from Twitter’s front-end search, as opposed to their API
◦ This is due to their API having a 7-day limit
◦ Twitter now indexes every tweet, making it available to scrape from their front-end search application
Additional details were extracted for each Tweet, using their Lookup API with the extracted
Tweet ID.
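The keyword selection step can be sketched roughly as follows. The synonym table is a hand-written, hypothetical stand-in for a WordNet lookup, and the function names and example terms are our own assumptions rather than the keyword lists actually used.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for a WordNet lookup: root concept -> synonyms.
SYNONYMS: Dict[str, List[str]] = {
    "marriage": ["wedding", "matrimony", "nuptials"],
    "birth": ["childbirth", "delivery"],
}


def expand_keywords(roots: List[str], synonyms: Dict[str, List[str]]) -> List[str]:
    """Return each root concept followed by its synonyms, without duplicates."""
    seen: Set[str] = set()
    expanded: List[str] = []
    for root in roots:
        for term in [root] + synonyms.get(root, []):
            key = term.lower()
            if key not in seen:
                seen.add(key)
                expanded.append(key)
    return expanded


def build_query(terms: List[str]) -> str:
    """Join terms into one OR query; multi-word terms are quoted."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


marriage_terms = expand_keywords(["marriage"], SYNONYMS)
marriage_query = build_query(marriage_terms)
```

One query per life event keeps track of which keyword cluster each tweet came from, which is what the later "topic theme" annotation question relies on.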
What we did - Annotations
To annotate our dataset, we turned to CrowdFlower
To start with, we ran several small trials of annotation exercises on CrowdFlower to make sure
our questions were satisfactory
We initially had 7 questions:
◦ Is this tweet about Getting Married?
◦ Is this tweet about an event?
◦ Was the tweet before, during, or after the event?
◦ Is the author of the tweet experiencing the event?
◦ Is anyone else experiencing the event with the author?
◦ Is anyone else named in the tweet experiencing the event?
◦ Did the event happen where it was tweeted?
This did not prove too popular, as we had a large number of quiz failures
What we did - Annotations
The obvious failings of this initial test run were too many questions and possible subjectivity in our
given definition of an event.
After another trial, we finally settled on only asking two questions:
◦ Q1 - Is this tweet related to a particular topic theme? (Topic theme is the cluster we extracted from)
◦ Q2 - Is this tweet about an important life event?
We also provided users a list of the 46 life events that Janssen and Rubin identified, as a way to
help them understand what we were after.
This ran much better, and our final agreement ratings were 89.5% and 87.17% respectively
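As a rough illustration of what an agreement figure measures, the sketch below takes the judgements for each tweet, picks the majority label, and averages the share of annotators who chose it. This is a plain majority-share measure on made-up judgements; CrowdFlower's own agreement score may be computed differently, so treat it as an assumption rather than the method behind the numbers above.

```python
from collections import Counter
from typing import Dict, List, Tuple


def majority_label(judgements: List[str]) -> Tuple[str, float]:
    """Return the most common label and the share of annotators who chose it."""
    counts = Counter(judgements)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(judgements)


def mean_agreement(per_tweet: Dict[str, List[str]]) -> float:
    """Average the majority share over all annotated tweets."""
    shares = [majority_label(j)[1] for j in per_tweet.values()]
    return sum(shares) / len(shares)


# Made-up judgements for three tweets (tweet IDs are placeholders).
judgements = {
    "tweet-1": ["yes", "yes", "yes"],
    "tweet-2": ["yes", "no", "yes"],
    "tweet-3": ["no", "no", "no", "yes"],
}
agreement = mean_agreement(judgements)
```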
What we did – Feature Sets
Our feature sets were divided into several groups:
◦ User features
◦ H1) Certain types of users may be more prone to share life events on Twitter
◦ Content Features
◦ H2) Posts written in a certain way may be related to life events
◦ Semantic Features
◦ H3) Posts about life events might be semantically associated with certain entities or concepts
◦ Interaction Features
◦ H4) Users who do not normally talk with the poster might start interacting for certain types of life events
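To make the four groups concrete, here is a minimal sketch of how they could be flattened into one feature vector per tweet. The field names (`followers`, `replies`, `concepts`) and the feature prefixes are hypothetical; the slides do not list the exact features in each group.

```python
import re
from collections import Counter
from typing import Any, Dict

# Lower-case word tokens, keeping hashtags and mentions as single tokens.
TOKEN = re.compile(r"[a-z0-9#@']+")


def unigram_features(text: str) -> Dict[str, float]:
    """Content features: one count per unigram in the tweet text."""
    counts = Counter(TOKEN.findall(text.lower()))
    return {f"content:unigram={tok}": float(n) for tok, n in counts.items()}


def flat_vector(tweet: Dict[str, Any]) -> Dict[str, float]:
    """Merge content, user, interaction and semantic features into one dict."""
    features = unigram_features(tweet["text"])
    features["user:followers"] = float(tweet["followers"])      # user group
    features["interaction:replies"] = float(tweet["replies"])   # interaction group
    for concept in tweet["concepts"]:                           # semantic group
        features[f"semantic:concept={concept}"] = 1.0
    return features


example = {
    "text": "We got married today! #wedding",
    "followers": 120,
    "replies": 4,
    "concepts": ["Wedding"],
}
vector = flat_vector(example)
```

Prefixing each feature with its group name makes it easy to train on one group at a time and compare them, as the hypotheses H1 to H4 require.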
What we did - Classifiers
We ended up just testing two classifiers, as other work had already tested a number of different
classifiers on similar datasets:
◦ J48
◦ Naïve Bayes
We did try SVMs as well, but due to poor performance, omitted them from our results.
To evaluate we used 10-fold cross validation, reporting standard classification performance
measures of Precision, Recall, and F1 scores.
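The evaluation loop can be sketched in plain Python as below: a small multinomial Naive Bayes over unigrams, evaluated with 10-fold cross validation and scored with Precision, Recall and F1. This is a sketch under our own assumptions (toy data, Laplace smoothing, predictions pooled across folds before scoring); it is not the toolkit or configuration used for the reported results, and it does not cover J48.

```python
import math
import random
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Example = Tuple[List[str], int]  # (unigram tokens, label); 1 = about the event
Model = Tuple[Dict[int, float], Dict[int, Counter], Set[str]]


def train_nb(data: List[Example]) -> Model:
    """Fit multinomial Naive Bayes: log priors plus per-class unigram counts."""
    class_counts = Counter(label for _, label in data)
    word_counts: Dict[int, Counter] = defaultdict(Counter)
    vocab: Set[str] = set()
    for tokens, label in data:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: math.log(n / len(data)) for c, n in class_counts.items()}
    return priors, word_counts, vocab


def predict_nb(model: Model, tokens: List[str]) -> int:
    """Return the class with the highest log posterior (Laplace smoothing)."""
    priors, word_counts, vocab = model
    scores = {}
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        score = prior
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


def precision_recall_f1(gold: List[int], pred: List[int]) -> Tuple[float, float, float]:
    """Score the positive class (label 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def cross_validate(data: List[Example], k: int = 10, seed: int = 0) -> Tuple[float, float, float]:
    """k-fold cross validation; predictions are pooled over folds, then scored."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    gold, pred = [], []
    for i, test_fold in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = train_nb(train)
        for tokens, label in test_fold:
            gold.append(label)
            pred.append(predict_nb(model, tokens))
    return precision_recall_f1(gold, pred)


# Toy, keyword-separable data; real tweets are far noisier.
toy_data = (
    [(["got", "married", "wedding", f"p{i}"], 1) for i in range(10)]
    + [(["watching", "football", "tonight", f"n{i}"], 0) for i in range(10)]
)
precision, recall, f1 = cross_validate(toy_data, k=10)
```

The toy data is deliberately separable by keyword, which also shows why a unigram classifier trained on keyword-sampled tweets can look stronger than it would on unfiltered timelines.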
What we did - Results
https://
Discussion
Why the dominance of content features?
Unigrams outperformed all other feature sets.
This matches the findings of similar papers, and is slightly disappointing.
While the classifiers were biased towards the keywords chosen, it is disappointing that the other
feature sets did not perform well.
In the case of interaction features this might be because:
◦ We were limited in what types of interaction features we could obtain, due to the limits of Twitter's API
◦ The dataset might have been annotated incorrectly
◦ For example, stories of other people are annotated, rather than people declaring an event about themselves
◦ Due to the nature of Twitter and its followers, interaction features might just not be a good
discriminator.
◦ Sites like Facebook though, which tend to be private, might have better performance in this area
Choice of targeting specific life events
Targeting only five specific life events limits what we can actually extract from social media
Our binary classifier worked reasonably well, but:
◦ Due to its dependency on unigrams, it will probably not perform very well outside of these 5 events
This is no silver bullet for solving our research question
Collecting the dataset
The collected dataset was biased towards certain words because of the keyword search
A better way to collect these datasets would be to randomly sample Twitter profiles, and
annotate their timelines
However, it is likely that only a small number of tweets in a user's timeline are actually about
these types of events
To achieve a decent training set, we would need to annotate lots of tweets, which is very costly
Twitter and the annotation process
Using CrowdFlower is a great way to gain lots of annotations fast
However, with Twitter data we think the annotation is flawed for these types of questions
Lack of context
◦ Is a text string of at most 140 characters enough context to annotate these types of events?
◦ Example: Is “MadJacks Forever Memories” about getting married?
◦ MadJacks is a wedding venue in Las Vegas, so it might be
First vs third party annotation
◦ While lack of context for a third party is an issue, if the owner of the tweet annotated it, would we get
better results?
Extracting useful interaction features is difficult
◦ There is no API to get conversations for tweets. Mining this manually is possible, but annoying.
◦ You can’t get access to which users have favourited a tweet
Facebook would be better…
…but it has heavy privacy controls on access to user data
While this is great for users, it’s annoying for researchers
All content retrieval from Facebook has to be done within an application
◦ These days, a User ID is hashed with your application ID
◦ If you have a standard user ID, you can’t access the Facebook graph API to retrieve information about it
Asking people to just give us their Facebook data through a single sign-on isn’t the best
approach either
◦ Users are reluctant to just give researchers their private data
◦ What do they get out of it? (besides the results of the research)
Is Instagram the middle ground?
Like Twitter, there are a lot of open Instagram accounts
◦ Sites like websta.me index large numbers of users and offer tag-based search
Like Twitter, it is (currently) easy to extract Instagram data
◦ While the API, like Twitter, is limited, it is possible to extract full user profiles
◦ Instagram works with a REST-based architecture, returning user posts in JSON feeds that can be
paginated, allowing full extraction of posts
◦ Using the API each post can be augmented with additional information not available in the media
stream
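Walking a paginated JSON feed to extract every post can be sketched as a simple cursor loop. Everything here is hypothetical: the in-memory `FAKE_FEED`, the `items` and `next_cursor` keys, and the `fetch` callable stand in for whatever the real service returns, and no real endpoint is called.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]

# Hypothetical in-memory feed: cursor -> one page of posts plus the next cursor.
FAKE_FEED: Dict[Optional[str], Page] = {
    None: {"items": [{"id": "1", "caption": "first day at school"}], "next_cursor": "a"},
    "a": {"items": [{"id": "2", "caption": "we said yes"}], "next_cursor": "b"},
    "b": {"items": [{"id": "3", "caption": "welcome to the world"}], "next_cursor": None},
}


def fake_fetch(cursor: Optional[str]) -> Page:
    """Stand-in for an HTTP request that returns one page of a user's posts."""
    return FAKE_FEED[cursor]


def iter_posts(fetch: Callable[[Optional[str]], Page]) -> Iterator[Dict[str, str]]:
    """Yield every post by following the cursor until the feed is exhausted."""
    cursor: Optional[str] = None
    while True:
        page = fetch(cursor)
        yield from page["items"]
        cursor = page["next_cursor"]
        if cursor is None:
            break


all_posts = list(iter_posts(fake_fetch))
```

Passing the fetch function in as a parameter keeps the loop independent of any particular API, so the same code works if the request details change.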
While we think of Instagram as only photos, most photos have short captions of a similar length
to tweets
◦ Comments can also provide semantic context
Future Work
We are currently looking at collecting Instagram and Facebook data for future experiments
◦ Facebook data is being collected with a trivial app that users can use
Unsupervised life event detection
◦ As opposed to targeting specific events, being able to extract any type would be of more value
◦ Currently we are looking at knowledge-based approaches using ConceptNet to achieve this
Graph Classification of Posts
◦ So far we have employed fairly flat vectors when considering feature sets
◦ An alternative is to treat posts as graphs, looking at semantic relationships
(ConceptNet, DBpedia, etc.), interactions, and dependency parses
◦ Frequent graph pattern mining might identify new feature sets that we can use

Identifying Prominent Life Events on Twitter - K-Cap 2015

Editor's notes

  1. Provide a quick overview of what I’m talking about. There are three sections: an overview of the background for why we’re doing this, and some terminology; what we actually did with the experiment; and a discussion about our results and what we learnt from it.
  2. Suggest dementia research maybe?
  3. Anyone familiar with narratology? R.Q.1 is effectively the creation of a fabula. R.Q.2 is effectively the story generation.
  4. Now before we go any further, it’d be useful to discuss what we mean by a life event
  5. Chat about Autobiographical memory for a minute
  6. As a way to kickstart our research, we focus on simple event classifiers. These are initially to be used as a filter for our system, helping to select posts that are more likely to be about life events. We can then look at clustering similar posts together to help the performance of our narrative generation component.
  7. We admit this isn’t the best, but it is definitely the easiest to obtain data from
  8. Mention CrowdFlower quizzes and how they test users before they can do your job
  9. Talk from Chris Welty earlier highlighted some of our issues with annotations from Twitter.
  10. Most of our event-specific classifiers had an F1 score of about 0.9 (falling in love was 0.841). Our binary classifier performed at 0.753. In almost all cases, content features (unigrams) were far superior to other feature sets. This is likely due to the bias of a keyword search, and the short nature of a tweet. Occasionally semantic features helped boost the performance slightly. Other feature sets like interaction and user did not perform so well. Semantic features
  11. Talk by Chris earlier highlighted some of our issues with annotations from Twitter. We did sample some of our annotations, and found a number had been mislabelled. For example, news stories were included in our annotated dataset.