Kento Terao, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Shin'ichi Satoh, Which visual questions are difficult to answer? Analysis with Entropy of Answer Distributions, Visual Question Answering and Dialog Workshop at CVPR 2020, June 14
https://youtu.be/g24WtI3vS1Y
1. Which visual questions are
difficult to answer?
Analysis with entropy of answer distributions
Kento Terao Toru Tamaki Bisser Raytchev Kazufumi Kaneda Shin’ichi Satoh
Hiroshima University NII
Visual Question Answering and Dialog Workshop
at CVPR 2020, June 14
github https://github.com/tttamaki/vqd
arXiv https://arxiv.org/abs/2004.05595
2. What is this item?
Is the catcher wearing the safety gear?
WHICH IS DIFFICULT TO ANSWER?
3. Some questions are easy, some are difficult
[Figure: each question is fed to a VQA model, which outputs an answer distribution]
Q: What is the player’s position behind the batter?
Candidate answers: yes, no, catcher, 1, ...
Q: How many people?
Candidate answers: many, over100, 50, 100, ...
Motivation
• Finding which questions are difficult
Application
• Using the difficulty for developing new VQA models
Contribution
• Providing a practical and surprisingly simple way to assess the difficulty
• Finding question clusters that are difficult for any VQA model
Difficult question
Easy question
4. Related works and our key contribution
Related works: analyzing distributions of answers by humans
• Estimating # unique answers [Gurari and Grauman, CHI’17]
• Predicting reasons why they disagree [Bhattacharya+, ICCV2019]
• Predicting entropy as annotation diversity [Yang+, HCOMP 2018]
Q: How many people can fit in the 2 buses?
Ground truth answers (cloud workers): 40, 80, 100, 100, 100, 100, 200, many, many, lot
# unique answers: 6
Entropy: 1.61
Ours: analyzing distributions of multiple VQA models
• No annotation of difficulty
• Estimating difficulty of visual questions, even in the test set
[Figure: VQA models 1, 2, and 3 each output an answer distribution over: many, over100, 50, 100, ...]
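The "# unique answers: 6" and "Entropy: 1.61" values on this slide can be reproduced from the ten ground truth answers to the bus question. The sketch below assumes the entropy is the Shannon entropy with natural logarithm over the empirical answer frequencies (this choice matches the value on the slide); the function name `answer_entropy` is illustrative and not taken from the authors' code.

```python
import math
from collections import Counter

def answer_entropy(answers):
    """Shannon entropy (natural log) of the empirical answer distribution."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Ground truth answers for "How many people can fit in the 2 buses?"
answers = ["40", "80", "100", "100", "100", "100", "200", "many", "many", "lot"]

num_unique = len(set(answers))
entropy = answer_entropy(answers)
print(num_unique, round(entropy, 2))  # prints: 6 1.61
```

The same function applies unchanged to a model's predicted distribution if the relative frequencies are replaced by softmax probabilities.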
5. Proposed method
Step 1. Computing 3D entropy vectors
Model I: image only
Model Q: question only
Model Q+I: image and question
Q+I baseline: Pythia v0.1 [CVPR2018]
Answer distribution of each model: dim = 3,129
Entropy vector: dim = 3
Step 2. K-means clustering on 3D entropy vectors
Step 3. Analyzing accuracy of VQA models for each cluster
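A minimal sketch of how one 3D entropy vector might be computed for a single visual question: each of the three models (I, Q, Q+I) yields a distribution over the 3,129 candidate answers, and the entropy of each distribution becomes one coordinate. The component order (I, Q, Q+I), the natural logarithm, and the toy distributions below are assumptions made for illustration; real distributions would come from the trained models.

```python
import math

NUM_ANSWERS = 3129  # size of the answer vocabulary stated on the slide

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    """Shannon entropy (natural log); zero-probability entries contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_vector(dist_i, dist_q, dist_qi):
    """Stack the entropies of the I, Q, and Q+I answer distributions."""
    return (entropy(dist_i), entropy(dist_q), entropy(dist_qi))

# Toy stand-ins for model outputs (hypothetical, not real predictions):
# the I and Q models are maximally unsure, the Q+I model is confident.
uniform = [1.0 / NUM_ANSWERS] * NUM_ANSWERS
peaked = softmax([20.0] + [0.0] * (NUM_ANSWERS - 1))

vec = entropy_vector(uniform, uniform, peaked)
print(vec)
```

A uniform distribution gives the maximum possible entropy, ln(3129), while a sharply peaked one gives a value close to 0, so each coordinate can be read as how unsure that model is about the question.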
6. Experiments
Dataset
• VQA v2 [Goyal+, CVPR2017]
• Training set for training VQA models
• Validation set for clustering and analysis
Protocol
• Training I, Q, and Q+I models on the training set
• For each model
  • predicting the answer distribution of each visual question in the validation set
  • computing entropy values
• Performing k-means clustering (k=10) on the validation set
• Computing statistics for each cluster, and sorting clusters in order of entropy
• Assigning questions in the test set to clusters
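The clustering part of the protocol can be sketched as follows. This is a plain-Python k-means written for illustration, not the authors' implementation. It makes several assumptions: clusters are sorted by the sum of the three centroid entropies, test questions are assigned to the nearest centroid, and the entropy vectors are toy data. The demo uses k=2 to stay small, whereas the slides use k=10.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))

def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm; returns the list of k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        new_centroids = []
        for j, members in enumerate(groups):
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return centroids

# Hypothetical 3D entropy vectors for validation questions (toy data).
validation = [(0.1, 0.2, 0.1), (0.2, 0.1, 0.3), (0.3, 0.2, 0.2),
              (5.0, 4.0, 3.0), (5.5, 4.2, 3.1), (4.8, 3.9, 2.9)]

# Cluster, then sort so that cluster 0 has the lowest entropy (assumed criterion).
centroids = sorted(kmeans(validation, k=2), key=sum)

# Assign hypothetical test-set questions to the nearest sorted cluster.
test_vectors = [(0.15, 0.15, 0.2), (5.1, 4.1, 3.0)]
test_clusters = [nearest(v, centroids) for v in test_vectors]
print(test_clusters)
```

Because assignment only needs the entropy vector of a question, it requires no ground truth answers, which is why questions in the test set can also receive a cluster.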
Comparisons
• Predicting with state-of-the-art VQA models (trained on the training set)
• BUTD [CVPR2018]
• MFB [EMNLP2016]
• MFH [TNNLS2018]
• BAN-4/8 [NeurIPS2018]
• MCAN-small/large [CVPR2019]
• Pythia v0.3 [CVPR2019]
7. Results and observations
• All methods perform poorly on the most difficult cluster (about 10% accuracy)
• Cluster entropy is highly correlated with cluster accuracy: entropy values increase while accuracy decreases from cluster 0 to 9
• As the cluster difficulty increases, the answers predicted by the different methods begin to differ
8. Examples in cluster 0
Annotations agree
VQA models agree and answer correctly (about 85% accuracy)
9. Examples in cluster 9
Visual questions are difficult to answer, even when annotations agree (about 10% accuracy)
10. Check it out!
Github
• Clustering results and visualization code available
Visual Question Difficulty (VQD)
https://github.com/tttamaki/vqd
Paper
• More in-depth discussions can be found on arXiv
https://arxiv.org/abs/2004.05595
You may use the difficulty in your model for questions in the training, validation, and test sets