Pie chart or pizza: identifying chart types and their virality on Twitter

Elena Simperl
@esimperl
University of Bristol
January 13, 2021

“One of the interpretations of the EU referendum result and
the rise of Donald Trump in the US is that we are now living in
a post-truth society - a world in which anecdotes shared on
social media and invented numbers thrown on the sides of
buses are more trusted and influential than official statistics,
extensive research, and proven expertise. In this world,
scientists, statisticians, analysts, and journalists must find new
ways to bring hard, factual data to citizens.”

“Data must entertain as well as inform, excite as well as
educate. It must be built with social media sharing in mind,
and become part of our everyday activities and digital
interactions with others.”

DATA STORIES
DATASTORIES.CO.UK
Data Stories developed frameworks and technology to bring data closer to people through art, games, and storytelling.
We examined the impact of varying levels of localisation, topicalisation, participation, and shareability on public engagement with factual evidence.
We delivered tools and guidance to help artists, designers, statisticians, analysts, and journalists communicate through data in inspiring, informative ways.

Theme 1: Find, make sense, use data

Theme 2: Entertain and inform with data

STORYTELLING THROUGH GAMES AND ART

VIRAL CHARTS?
Data visualisations are widely used by experts to communicate quantitative information to the public.
News agencies have Twitter accounts that specialise in disseminating information through charts.
Brands use infographics and other visual means in their campaigns.
Research has studied information diffusion in social networks for text, images, and video, but not for charts.

PIE CHART OR PIZZA?
A data-driven approach that identifies whether an image posted in a tweet displays a chart. If it does, the approach predicts:
• its exact chart type; and
• its potential to go viral (i.e. its like and retweet counts).
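A minimal sketch of the gating step described above, assuming a single classifier that outputs one probability per class, with "no chart" treated as an extra class alongside the 11 ReVision+ chart types. The label names, their order, and the function `identify_chart` are hypothetical illustrations, not the authors' code.

```python
from typing import Optional, Sequence

# Hypothetical label set: the 11 ReVision+ chart types plus a "no chart" class.
LABELS = ["area", "bar", "box", "line", "map", "pie",
          "pareto", "radar", "scatter", "table", "venn", "no_chart"]


def identify_chart(probs: Sequence[float]) -> Optional[str]:
    """Return the predicted chart type, or None if the image is not a chart."""
    if len(probs) != len(LABELS):
        raise ValueError("expected one probability per label")
    # Pick the most probable class.
    best = max(range(len(probs)), key=lambda i: probs[i])
    label = LABELS[best]
    # Images classified as "no chart" are excluded from further processing.
    return None if label == "no_chart" else label


probs = [0.01] * 12
probs[5] = 0.89  # index 5 is "pie"
print(identify_chart(probs))  # pie
```

Under this design, only images that pass the gate (a non-`None` result) would be handed on to the virality model.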

REALITY VS BENCHMARK DATASETS
Top: benchmark data
Bottom: actual charts shared on Twitter

CONVNET FOR CHART IDENTIFICATION
• Adaptation of the VGGNet system (Simonyan and Zisserman, 2015), tuned to the requirements of our task
• 2.4M parameters (excluding the final fully connected layer), around 129M fewer than VGGNet's "A" configuration
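For scale, the parameter count of VGGNet's configuration "A" can be worked out from the layer sizes: a k×k convolution from c_in to c_out channels has k·k·c_in·c_out weights plus c_out biases, and a fully connected layer has n_in·n_out weights plus n_out biases. The sketch below assumes the standard ImageNet setup (224×224 RGB input, 1,000 classes); it does not reproduce the adapted network, whose exact layers the slide does not give.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# VGG configuration "A": 3x3 convolutions; "M" is a 2x2 max-pool (no parameters).
VGG_A = [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"]


def vgg_params(config, in_channels=3, image_size=224, num_classes=1000) -> int:
    total, channels, size = 0, in_channels, image_size
    for layer in config:
        if layer == "M":
            size //= 2  # pooling halves the spatial resolution
        else:
            total += conv_params(3, channels, layer)
            channels = layer
    flat = channels * size * size  # 512 * 7 * 7 = 25,088 inputs to the first FC layer
    total += dense_params(flat, 4096)
    total += dense_params(4096, 4096)
    total += dense_params(4096, num_classes)
    return total


print(f"{vgg_params(VGG_A):,}")  # 132,863,336
```

Most of the roughly 133M parameters sit in the first fully connected layer, so subtracting 2.4M leaves roughly 130M, close to the slide's figure; the exact gap depends on which layers are counted.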

THE REVISION+ CORPUS
ReVision corpus: introduced by Savva et al. (2011); 10 chart types, 2965 images.
We extended it to ReVision+: 1 new chart type, 1 extended chart type, and 3.6k images with no charts.

Chart type                   Samples
Area chart                        90
Bar (+ column chart)       169 (362)
Box plot                         150
Line graph                       317
Map                              249
Pie chart                        210
Pareto chart                     168
Radar plot                       137
Scatter plot                     371
Table                            263
Venn diagram                     108
No chart (ILSVRC-2012)          3636
Total                           6061
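The table can be checked by summing its rows. A short sketch, assuming the bar class is counted together with its column charts (362 rather than 169), which is the reading under which the rows add up to the reported total of 6,061.

```python
# Sample counts from the ReVision+ table; bar includes column charts (362).
REVISION_PLUS = {
    "Area chart": 90,
    "Bar (+ column chart)": 362,
    "Box plot": 150,
    "Line graph": 317,
    "Map": 249,
    "Pie chart": 210,
    "Pareto chart": 168,
    "Radar plot": 137,
    "Scatter plot": 371,
    "Table": 263,
    "Venn diagram": 108,
    "No chart (ILSVRC-2012)": 3636,
}

chart_images = sum(n for name, n in REVISION_PLUS.items()
                   if not name.startswith("No chart"))
total = sum(REVISION_PLUS.values())
print(chart_images, total)  # 2425 6061
```

With 169 for the bar class, the total would instead be 5,868.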

CHART IDENTIFICATION EVALUATION
[Figure: results for 10 chart classes, and for 11 chart classes plus a no-chart class]

BUILDING A REALISTIC DATASET
We collected a set of 34,491 images from Twitter accounts dedicated to data journalism.
We split this corpus into two parts: 3,000 images for chart identification (DataTweet+) and 31,491 images for virality prediction (DataTweet).

THE DATATWEET+ CORPUS
We hand-labelled 3,000 images using the crowdsourcing platform Figure Eight.
Quality assurance:
• 80%+ on gold-standard questions (50 images, manually labelled by us);
• inter-annotator agreement (Fleiss' kappa) of 60%+ (0.8741).
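Fleiss' kappa compares the observed agreement among a fixed number of annotators per item with the agreement expected by chance from the overall label proportions. A minimal sketch of the standard formula; the input layout (a count matrix of items by categories) is an assumption, since the slide does not say how the crowdsourced labels were aggregated.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """counts[i][j] = number of annotators who assigned item i to category j.

    Every item must have been labelled by the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number of ratings")
    n_categories = len(counts[0])

    # Observed agreement: share of agreeing annotator pairs per item, averaged.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Chance agreement from the overall proportion of each category.
    total = n_items * n_raters
    p_e = sum((sum(row[j] for row in counts) / total) ** 2
              for j in range(n_categories))

    return (p_bar - p_e) / (1 - p_e)


# Two images, three annotators each, two categories.
print(round(fleiss_kappa([[3, 0], [1, 2]]), 4))  # 0.25
```

The formula is undefined when every annotation falls into a single category (chance agreement equals 1).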

DISTRIBUTION OF CHART TYPES IN DATATWEET+

FINE-TUNING THE CONVNET
We ran two sets of experiments on DataTweet+: one with the ConvNet trained on ReVision+, and one after fine-tuning it on DataTweet+.
We "froze" the parameters of the convolutional layers and tuned only the fully connected layers.
We set the learning rate to half of its original value; all other training details are identical to those of the original model.
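The freezing scheme above can be sketched as a single update step. This is an illustration under stated assumptions: plain SGD stands in for the original optimiser, and the convention that convolutional parameter names start with `conv` is hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def finetune_step(params: Params, grads: Params, base_lr: float) -> Params:
    """One SGD step that freezes convolutional layers and updates only
    the fully connected layers, at half the original learning rate.

    Assumption: convolutional parameters are named "conv..." (hypothetical).
    """
    lr = base_lr / 2  # learning rate halved for fine-tuning
    updated: Params = {}
    for name, values in params.items():
        if name.startswith("conv"):
            # Frozen: copied unchanged, gradients ignored.
            updated[name] = list(values)
        else:
            # Fully connected layers: standard gradient descent update.
            updated[name] = [w - lr * g for w, g in zip(values, grads[name])]
    return updated


params = {"conv1": [1.0, 2.0], "fc1": [1.0, -1.0]}
grads = {"conv1": [10.0, 10.0], "fc1": [0.5, 0.5]}
print(finetune_step(params, grads, base_lr=0.2))
```

In a deep learning framework, the same effect is usually achieved by disabling gradients for the convolutional parameters and passing only the fully connected ones to the optimiser.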

CHART IDENTIFICATION EVALUATION
[Figure: results on the original, clean chart dataset (extended), and on 3,000 charts from Twitter]

MULTI-MODAL NEURAL ARCHITECTURE FOR VIRALITY PREDICTION

JOINTLY LEARNING TO PREDICT LIKES AND RETWEETS
Modelled as a regression task. During training, our model minimises a joint loss over retweets and likes:
[Equation: joint loss over predicted vs. target retweets and predicted vs. target likes]
Target values are transformed to a logarithmic scale because of the large variation in their expected values.
We evaluate using Root Mean Square Error (RMSE) and Spearman's rank correlation (ρ).
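A sketch of the training objective and evaluation metrics, under stated assumptions: the exact loss is not recoverable from the slide, so it is assumed here to be the sum of the mean squared errors on log-scaled retweets and likes, and `log1p` (log of 1 + count) is assumed so that posts with zero retweets or likes are defined. RMSE and Spearman's ρ (with tied values sharing their average rank) follow their standard definitions.

```python
import math
from typing import List, Sequence


def log_target(count: int) -> float:
    # Assumption: log(1 + x), so that zero counts remain defined.
    return math.log1p(count)


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


def joint_loss(pred_rt, target_rt, pred_likes, target_likes) -> float:
    """Assumed form: MSE on retweets plus MSE on likes, both in log space."""
    return mse(pred_rt, target_rt) + mse(pred_likes, target_likes)


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    return math.sqrt(mse(pred, target))


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


targets = [log_target(c) for c in [0, 3, 120, 15]]
preds = [0.2, 1.1, 4.0, 3.1]
print(rmse(preds, targets), spearman(preds, targets))
```

Because ρ depends only on ranks, it rewards getting the relative order of posts right even when the absolute counts are off, which complements RMSE.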

FINDINGS
Best performance is achieved when all features are included.
Despite much lower computational complexity, the systems equipped with the mDataTweet+ features perform better in both retweet and like prediction than those equipped with mILSVRC.
Using the fine-tuned mDataTweet+ features results in a lower average RMSE than using the mReVision+ ones.
The most decisive prediction features are author-related.

CONCLUSIONS AND FUTURE WORK (1)
This is the first attempt to estimate how much a chart-driven Twitter post will be shared, by jointly learning to predict the number of times a chart message will be retweeted and liked.
Our system outperforms competing systems on ReVision, and is additionally capable of excluding images that do not contain charts.
Using crowdsourcing, we introduced a new dataset of realistic data visualisations, available at: https://github.com/pvougiou/Pie-Chart-or-Pizza.
The models trained on the DataTweet+ corpus are relevant to ongoing research on chart ranking or recommendation with neural networks, which has identified a series of quality metrics for creating large training datasets automatically.

CONCLUSIONS AND FUTURE WORK (2)
Such metrics could be used to generate larger, synthetic chart corpora in which we can control for various chart design elements, to see whether they make a difference on social media.
We did not consider images with more than one chart. The model did not perform well on dashboards and embellished charts.
We did not consider time in our shareability predictions: 95% of the posts we analysed were older than a year, so we predicted cumulative retweets and likes rather than time-sensitive results.
The system was robust across chart types and author profiles, and could be extended to other tasks such as visual question answering for charts, used, e.g., in fact checking.

PUBLICATIONS
Talking Datasets: Understanding Data Sensemaking Behaviours. L Koesten, K Gregory, P Groth, E Simperl. Under review at the International Journal of Human-Computer Studies, 2020.
Everything You Always Wanted to Know about a Dataset: Studies in Data Summarisation. L Koesten, E Simperl, E Kacprzak, T Blount, J Tennison. International Journal of Human-Computer Studies, 2019.
Collaborative Practices with Structured Data: Do Tools Support what Users Need? L Koesten, E Kacprzak, E Simperl, J Tennison. ACM CHI Conference on Human Factors in Computing Systems (CHI 2019).
Dataset Search: A Survey. A Chapman, E Simperl, L Koesten, G Konstantinidis, LD Ibáñez, E Kacprzak, P Groth. The International Journal on Very Large Data Bases, 2019.
Characterising Dataset Search: An Analysis of Search Logs and Data Requests. E Kacprzak, L Koesten, LD Ibáñez, T Blount, J Tennison, E Simperl. Journal of Web Semantics, 2018.
The Trials and Tribulations of Working with Structured Data: A Study on Information Seeking Behaviour. L Koesten, E Kacprzak, J Tennison, E Simperl. Proceedings of the ACM CHI Conference on Human Factors in Computing Systems (CHI 2017).
Dataset Reuse: Toward Translating Principles to Practice. L Koesten, P Vougiouklis, E Simperl, P Groth. Patterns, 2020.
Pie Chart or Pizza: Identifying Chart Types and Their Virality on Twitter. P Vougiouklis, L Carr, E Simperl. Proceedings of the International AAAI Conference on Web and Social Media, 2020.