Brief description of the paper "Large-scale visual sentiment ontology and detectors using adjective noun pairs", presented at ACM Multimedia 2013 as a full paper.
4. Basic strategy
• Adjective noun pairs (ANPs)
– Adjectives play a significant role in conveying
sentiment, but are visually inconsistent.
– Combined phrases make the concepts more
detectable than a single adjective or noun alone.
• cf. Recognition using visual phrases
[CVPR11]
5. Contributions
• Automatically construct a large-scale Visual
Sentiment Ontology (VSO) with 3000 ANPs
– With the help of psychological theories and web
mining techniques
• Propose SentiBank: a visual concept detector
library to detect the presence of 1200 ANPs
– Useful for sentiment analysis of visual contents as
attributes
6. Framework
1. Select 24 fundamental words representing emotion
2. Retrieve images with each of the words as a query
3. Tags associated with the images are extracted to
build ANPs ( = strong sentiment ADJs + all Ns)
4. Train ANP detectors and keep only detectors with
reasonable performance to form SentiBank
9. 24 basic words for emotions (cont.)
• 8 basic emotions × 3 degrees = 24 words
11. Sentiment word discovery
• Web mining strategy
– Retrieve images & videos from Flickr & YouTube
with each of 24 basic words as a query
– Extract their associated tags by Lookapp tool
[Borth+ ICMR11]
12. Sentiment word discovery (cont.)
• Exploits various NLP techniques & resources
– Post-processing
• Remove stop words, perform stemming
• Top 100 tags are selected for each emotion
– Sentiment value computation (from -1, negative, to +1, positive)
• SentiWordNet [Esuli+ 2006] SentiStrength [Thelwall+ 2010]
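The post-processing step above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the stop-word list and the `crude_stem` helper are hypothetical stand-ins for a real stop-word list and stemmer, and the sentiment look-up (SentiWordNet, SentiStrength) is left out.

```python
from collections import Counter

# Hypothetical, tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "of", "and", "in"}

def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer (assumption)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def top_tags(tag_lists, k=100):
    """Count stemmed, non-stop-word tags over all retrieved items
    and return the k most frequent ones."""
    counts = Counter()
    for tags in tag_lists:
        for tag in tags:
            tag = tag.lower().strip()
            if tag and tag not in STOP_WORDS:
                counts[crude_stem(tag)] += 1
    return [tag for tag, _ in counts.most_common(k)]
```

Calling `top_tags(tag_lists, k=100)` once per emotion word would give the "top 100 tags" list for that emotion.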
14. ANP construction
• Consider all (ADJ, N) pairs
– Remove pairs whose meaning changes when combined
(e.g. “hot” + “dog” → a generic named entity)
• Fuse sentiment values
– Simple sum model: s(ANP) = s(ADJ) + s(N)
• If sgn(s(ADJ)) != sgn(s(N)), then s(ANP) = s(ADJ).
• Rank ANPs by their frequency
– Remove all ANPs with no images
– Resulting in 47K ANP candidates
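The sentiment fusion rule on this slide can be written as a short function. This is a sketch of the rule exactly as stated above; the function names are illustrative, and treating a score of 0 as its own sign is an assumption.

```python
def sign(x):
    """Return -1, 0 or +1 depending on the sign of x."""
    return (x > 0) - (x < 0)

def anp_sentiment(s_adj, s_noun):
    """Fuse adjective and noun sentiment values into an ANP value.

    If the signs disagree, the adjective's value is used on its own;
    otherwise the two values are summed.
    """
    if sign(s_adj) != sign(s_noun):
        return s_adj
    return s_adj + s_noun
```

Since each input lies in [-1, +1], the fused value can range over [-2, +2].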
15. ANP construction (cont.)
• Ontology sampling
– Partition candidates into individual ADJ sets
– Sample a subset from each ADJ set
– Take ANPs with sufficient (>125) images
• Linking back to emotions
– For each ANP, count the images whose metadata contains both the ANP and
each of the 24 basic words, giving a 24-dim histogram
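A rough sketch of the "linking back to emotions" histogram is given below. It assumes each image is represented by a set of lower-cased tags and that an image "contains the ANP" when both the adjective and the noun appear among its tags; both are simplifications made for illustration.

```python
def emotion_histogram(anp, image_tags, emotion_words):
    """Count, for each emotion word, the images tagged with both the ANP
    and that word.

    anp:           (adjective, noun) tuple
    image_tags:    iterable of tag sets, one set per image
    emotion_words: list of basic emotion words (24 in the slides)
    """
    adj, noun = anp
    hist = [0] * len(emotion_words)
    for tags in image_tags:
        if adj in tags and noun in tags:
            for i, word in enumerate(emotion_words):
                if word in tags:
                    hist[i] += 1
    return hist
```

With the full list of 24 basic words as `emotion_words`, the returned list is the 24-dimensional histogram for that ANP.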
16. How reliable are ANP labels?
• Web annotation may not be reliable
– Using Flickr tags as pseudo ANP labels might incur
false positives
• Manual (=AMT) validation
– Randomly sample images of 200 ANPs
– Each image is validated by 3 Turkers, treated as
correct only if >= 2 Turkers agree
– Results: 97% correct
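The majority-vote validation described above amounts to the following. This is a minimal sketch; the vote format (one boolean per Turker) is an assumption.

```python
def is_correct(votes, min_agree=2):
    """An image label counts as correct if at least min_agree Turkers accept it."""
    return sum(1 for v in votes if v) >= min_agree

def label_accuracy(all_votes):
    """Fraction of images whose pseudo label passes the majority vote."""
    judged = [is_correct(votes) for votes in all_votes]
    return sum(judged) / len(judged)
```

For example, four images with votes `[T, T, F]`, `[T, F, F]`, `[T, T, T]` and `[F, T, T]` would give an accuracy of 3 out of 4.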
19. Training ANP detectors
• Various visual features
– Color histogram (3 colors x 256 dim), GIST (512 dim),
LBP (53 dim), BoW with spatial pyramid and max
pooling (1000 dim x 2 layers), attributes [Yu+ CVPR13]
(2000 dim)
• Training a linear SVM for every ANP
– Parameter tuning by cross validation (AP@20-based)
– Measure performance by AP@20, AUC & F-score.
• Several feature fusions
– Early fusion, late fusion, weighted early/late fusion
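For reference, one common way to compute AP@20 is sketched below. Definitions of AP@k vary in how they normalise, so this variant (averaging precision over the positives found within the top k) is an assumption and may differ from the one used in the paper.

```python
def average_precision_at_k(scores, labels, k=20):
    """Average precision over the top-k ranked items.

    scores: detector scores, higher means more confident
    labels: 1/True for positive items, 0/False for negative items
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)[:k]
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0
```

A detector that ranks no positive image in its top 20 gets AP@20 = 0 under this definition.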
20. Detector performance
• Comparing visual features (left)
– 1st: attributes, 2nd: BoWs
• Comparing feature fusions (right)
– 1st: Weighted late fusion, but not dominant
– Adopt early fusion for implementation simplicity
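The difference between early and late fusion can be illustrated with a small sketch. The function names and the weighted-average form of late fusion are assumptions for illustration; the slides do not specify how the weights were chosen.

```python
def early_fusion(feature_vectors):
    """Concatenate per-feature vectors into one vector before training a single detector."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused

def late_fusion(scores, weights=None):
    """Combine the scores of per-feature detectors by a (weighted) average."""
    if weights is None:
        weights = [1.0] * len(scores)
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total
```

Early fusion needs only one classifier per ANP, which is consistent with the slide's remark about implementation simplicity.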
22. Detectability issues
• Select only ANPs with good detection accuracy
– 1200 ANPs with AP@20>0 & F-score>0.6
• No correlation between detectability & occurrence
– Difficulty in detecting ANPs depends on the
content diversity and the level of abstraction
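The selection rule above (AP@20 > 0 and F-score > 0.6) can be sketched as a simple filter. The metric dictionary layout and the ANP names in the usage example are hypothetical.

```python
def select_detectors(metrics, min_f_score=0.6):
    """Keep only ANPs whose detector has non-zero AP@20 and F-score above the threshold.

    metrics: dict mapping ANP name -> {"ap20": float, "f_score": float}
    """
    return sorted(
        name
        for name, m in metrics.items()
        if m["ap20"] > 0 and m["f_score"] > min_f_score
    )

# Hypothetical evaluation results, for illustration only.
example_metrics = {
    "cute dog": {"ap20": 0.8, "f_score": 0.7},
    "bad day": {"ap20": 0.0, "f_score": 0.9},
    "nice view": {"ap20": 0.5, "f_score": 0.5},
}
selected = select_detectors(example_metrics)
```

Here only the first entry passes both thresholds.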
23. Other issues
• Special visual features improve detectors
– ObjectBank [Li+ NIPS2010], facial features, aesthetic
features [Bhattacharya+ ACMMM13]
• Ontology structure
– Interactive process to combine 1200 ANPs into distinct
groups: 6 levels, with 15 nodes at the top
• N: standard “is-a” relations
• ADJ: exclusive (“sad” vs “happy”) & strength (“nice”, “great”,
“awesome”)
– 41% of nouns are not covered by ImageNet
• Related to abstract concepts (e.g. “violence”, “religion”)
25. SentiBank applications
• Sentiment prediction in image tweets
– Sentiment analysis relies on text-based tools
– 140 characters (in English) are too short
– Use SentiBank to complement and augment texts
• Emotion classification
– Demonstrate the performance against an emotion
dataset of art photos [Machajdik+ ACMMM10]
26. Sentiment prediction in tweets
• Data collection
– Gather tweets with images & popular hashtags
• #nuclearpower, #election, #championsleague, #cairo …
– AMT to obtain sentiment ground-truth
• 3 Turkers for every tweet: labels mostly agreed
http://www.ee.columbia.edu/ln/dvmm/vso/download/twitter_dataset.html
27. Sentiment prediction in tweets (cont.)
• Visual-based classifier
– Use SentiBank as a mid-level representation
• Use ANP responses as an input feature
• Employ a linear classifier for the final output
– Compare SentiBank with low-level features
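The mid-level representation idea can be sketched as two stacked linear stages: ANP detectors turn low-level features into a response vector, and a linear classifier maps that vector to a sentiment label. All weights below are hypothetical placeholders, and the two-class output with a threshold at zero is an assumption.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def anp_responses(low_level_features, detectors):
    """Apply each linear ANP detector (weights, bias) to the image features.

    The result has one entry per ANP and serves as the mid-level feature vector.
    """
    return [dot(w, low_level_features) + b for w, b in detectors]

def predict_sentiment(responses, clf_weights, clf_bias=0.0):
    """Linear classifier on top of the ANP responses."""
    score = dot(clf_weights, responses) + clf_bias
    return "positive" if score >= 0 else "negative"

# Two hypothetical ANP detectors over 2-dim low-level features.
detectors = [([1.0, 0.0], 0.0), ([0.0, 1.0], -1.0)]
responses = anp_responses([2.0, 0.5], detectors)
label = predict_sentiment(responses, clf_weights=[1.0, -1.0])
```

With 1200 detectors, `responses` would be a 1200-dimensional vector, one entry per ANP in SentiBank.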