Based on the paper "Understanding and Classifying Image Tweets"
ACM-MM 2013
Disclaimer: I am not an author of this paper. I used it as the basis for my course project proposal.
1. Investigating Images Related to Twitter Trending Topics
MUSTAFA ILKER SARAC
20801528
UNDERSTANDING AND CLASSIFYING IMAGE TWEETS
ACM-MM 2013
CS531 - Mustafa Ilker SARAC
1/13/2014
2. Content
Introduction
Motivation
Image-Tweets
Image and Text Relation
Visual/Non-Visual Classification
Experiments
Initial Results
3. Introduction
Image-tweets
Correlation between a tweet's image and its text
50% of all posts are image-tweets
Image-tweets are retweeted more and survive longer
4. Motivation
Questions to ask
What types of images do users embed?
Do the images differ distinctly from images on photo-sharing websites like Flickr?
Do the textual contents of image-tweets differ from those of text-only posts?
Contributions
Corpus
Annotated subset
A classifier that distinguishes two subclasses of image-tweets:
Visual
Non-Visual
5. Image-Tweets
Corpus
Text-only and image-tweets from Weibo
7 months in 2012
~57M tweets
Manually annotated ~5K subset
6. Image-Tweets
Image Characteristics
Images are post-processed by Weibo
45.1% of the corpus are image-tweets
Images vary by quality and topics
70% of the annotated corpus are natural photographs
7. Image-Tweets
Image-tweets vs. text-only: When? What? Why?
When: more image-tweets are posted during daytime
What: LDA applied to a ~1M-tweet subset of the corpus
k=50 latent topics are learned
Why: daily chatter or information sharing
8. Image and Text Relation
99% of image tweets have text.
Status (event, time, location)
Logico-semantic
9. Image and Text Relation
Visually-relevant image-tweets
At least one noun or verb corresponds to part of the image
Non-visual image-tweets
Image and text have no visual correspondence
Hard to distinguish by looking at the images alone
May exhibit emotional relevance
10. Visual/Non-Visual Classification
Dataset Construction
Crowdsourcing to label a random subset of the image-tweets
Visual
Non-visual
Each image is annotated by 3 different subjects
4811 image-tweets annotated
3206 (2/3) visual
1605 (1/3) non-visual
3 major types of features are used
Text
Image
Context
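The slide states that each image was annotated by 3 subjects but not how the three votes were merged. A minimal sketch, assuming a simple majority vote (an assumption, not confirmed by the slides), with hypothetical tweet IDs:

```python
from collections import Counter

def majority_label(votes):
    """Return the label chosen by most annotators.

    Assumes an odd number of votes over two labels, so ties cannot occur.
    Majority voting itself is an assumption; the slides do not say how
    the three annotations were merged.
    """
    label, _count = Counter(votes).most_common(1)[0]
    return label

# Hypothetical annotations: three crowd workers per image-tweet.
annotations = {
    "tweet_a": ["visual", "visual", "non-visual"],
    "tweet_b": ["non-visual", "non-visual", "visual"],
    "tweet_c": ["visual", "visual", "visual"],
}
labels = {tweet_id: majority_label(v) for tweet_id, v in annotations.items()}

# Class balance reported on the slide: 3206 visual + 1605 non-visual = 4811.
visual_share = 3206 / 4811
print(labels)
print(f"visual share: {visual_share:.3f}")
```

With three annotators and two labels, every image-tweet gets a clear majority, which is one plausible reason for using an odd number of subjects.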
11. Visual/Non-Visual Classification
Text Features
Binary word features
Previously learned topics from LDA
Part-of-speech (POS) density features
Named Entities
Microblog specific features
@mentions
#hashtags
Geolocation
URLs
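The binary word features and the microblog-specific features listed above could be extracted roughly as follows. This is a sketch under stated assumptions: Twitter-style `@mention` and `#hashtag` syntax, simple regex tokenization (Weibo text would need a Chinese word segmenter), and a hypothetical `vocabulary` list. The function and feature names are illustrative, not taken from the paper.

```python
import re

MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")

def text_features(text, vocabulary, has_geolocation=False):
    """Build a dict of binary features for one tweet.

    `vocabulary` is a hypothetical list of lowercase words to test for;
    `has_geolocation` would come from tweet metadata, not from the text.
    """
    # Strip URLs first so their fragments do not become word tokens.
    without_urls = URL_RE.sub(" ", text)
    tokens = set(re.findall(r"\w+", without_urls.lower()))

    features = {f"word={w}": int(w in tokens) for w in vocabulary}
    features["has_mention"] = int(bool(MENTION_RE.search(without_urls)))
    features["has_hashtag"] = int(bool(HASHTAG_RE.search(without_urls)))
    features["has_url"] = int(bool(URL_RE.search(text)))
    features["has_geolocation"] = int(has_geolocation)
    return features

example = text_features(
    "Sunset at the beach with @alice #sunset http://example.com/p",
    vocabulary=["sunset", "cat"],
)
print(example)
```

Topic, POS-density, and named-entity features need trained models or taggers and are left out of this sketch.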
12. Visual/Non-Visual Classification
Image features
Face detection
SIFT features with bag of visual words representation
Applied LDA with k=35
Context Features
Retweets
Comments
Follower Ratio
Posting Time etc.
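The bag-of-visual-words step can be illustrated as follows: each local descriptor is assigned to its nearest codebook entry, and the image is represented by the resulting histogram. This sketch assumes the codebook has already been learned elsewhere (for example by clustering), and uses toy 2-D descriptors in place of real 128-D SIFT vectors. All names are hypothetical.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook centroid closest to the descriptor (squared Euclidean)."""
    def sq_dist(centroid):
        return sum((d - c) ** 2 for d, c in zip(descriptor, centroid))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))

def bovw_histogram(descriptors, codebook):
    """Normalized histogram of visual-word occurrences for one image."""
    counts = [0] * len(codebook)
    for desc in descriptors:
        counts[nearest_word(desc, codebook)] += 1
    total = sum(counts)
    # An image with no descriptors keeps an all-zero histogram.
    return [c / total for c in counts] if total else counts

# Toy data: two visual words, four descriptors.
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0], [9.0, 9.0]]
histogram = bovw_histogram(descriptors, codebook)
print(histogram)
```

Visual-word counts of this kind are what a topic model such as LDA takes as input, treating each image as a "document" of visual words.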
13. Experiment
10-fold cross-validation with Naïve Bayes is performed
Macro-averaged F1 score is computed
Baseline uses only word features
F1 = 64.8
Each feature is added individually to observe its impact
When all features with a positive impact are combined
F1 = 70.5
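For reference, macro-averaged F1 is the unweighted mean of the per-class F1 scores, so the smaller non-visual class counts as much as the visual class. A minimal sketch with made-up labels (not the paper's data):

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 score treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0  # no true positives: F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    classes = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)

# Made-up example: 4 visual and 2 non-visual gold labels.
y_true = ["visual", "visual", "visual", "visual", "non-visual", "non-visual"]
y_pred = ["visual", "visual", "visual", "non-visual", "non-visual", "visual"]
score = macro_f1(y_true, y_pred)
print(score)  # visual F1 = 0.75, non-visual F1 = 0.5, macro = 0.625
```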
15. Proposed Work
Re-rank images of image-tweets returned by Twitter search
Select good images to represent Trending Topics
Twitter was scraped and some initial results were obtained using:
Retweets and favorites as contextual features
SIFT as image features to compare images
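One way the contextual part of the re-ranking could look is sketched below. The scoring formula (log-damped retweet and favorite counts with equal weights) is purely an assumption for illustration, as the slides do not specify how the features are combined. The record fields are hypothetical, and the SIFT-based image comparison is not covered here.

```python
import math

def engagement_score(retweets, favorites, w_retweet=1.0, w_favorite=1.0):
    """Hypothetical score: log-damped counts so a single viral tweet does not dominate."""
    return w_retweet * math.log1p(retweets) + w_favorite * math.log1p(favorites)

def rerank(results):
    """Sort search results by descending engagement score."""
    return sorted(
        results,
        key=lambda r: engagement_score(r["retweets"], r["favorites"]),
        reverse=True,
    )

# Hypothetical search results for one trending topic.
results = [
    {"id": "a", "retweets": 2, "favorites": 1},
    {"id": "b", "retweets": 100, "favorites": 50},
    {"id": "c", "retweets": 10, "favorites": 0},
]
ranked_ids = [r["id"] for r in rerank(results)]
print(ranked_ids)
```

The weights `w_retweet` and `w_favorite` are placeholders that would need tuning against whatever notion of a "good" representative image the project adopts.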