Assignment 1a: Developing empathy and forming
ideas within different problem contexts
Introduction
Consider the curiosities, circumstances, purpose and ideas potentially involved
in the challenge of creating a visualisation/infographic in each of the following
made-up scenarios.
Compile a detailed briefing document outlining your assumptions, definitions
and ideas about the context and vision for each scenario.
0.1 Data source
Website of the department of criminal justice of the state of Texas:
https://www.tdcj.state.tx.us/death_row/dr_executed_offenders.html
Date of oldest execution of dataset: 12/07/1982
1 Scenario 1
A pro-capital punishment US newspaper reporting on the milestone of the 500th
execution (pretend it is 2013).
Assumptions. We consider a medium-size daily newspaper in Texas. Quite
popular among the conservative/rural part of the population mostly through its
print edition, it has managed in recent years to attract a growing base of
younger readers through its online edition thanks to a new policy of making a
few selected articles freely available on its website to non-subscribers. This
younger readership, with less strongly defined political opinions, comes from a
more mixed background but still represents only a small minority of the readers.
The newspaper is still majority-owned and managed by a few members of
a conservative Texan family. There is a long-standing tradition of
operating the journal with profit only as a secondary objective, and the owners
have a strong influence on the content of the articles, generally shaped to match
the (Republican) political views of the family.
I am the youngest employee of a small team in charge of providing various
graphics and illustrations for the articles. I have recently been hired as part of
an initiative to modernize the visual elements of the journal and make better
use of data-backed visualisations. As the journal is still experimenting with such
methods, I work alone but with the support of the rest of the team for advice
and to provide me with elements I might need, such as drawings.
1.1 Context: The reason
Outline what you think might be the essence of the curiosity: What question(s)
do you think the potential audience might need answering/find interesting that
the visualisation would ultimately present?
Stakeholders Intrigue. The family owning the journal has been personally
very involved with the passing of a few pro-capital punishment bills in Texas.
It wants to use the milestone of this 500th execution as an opportunity to
report on the status of the death penalty in general. I am to create a double-page
infographic, which the owners hope will illustrate with numbers the impact that
capital punishment has had on the safety of Texans and show the benefits of
such laws.
Audience Intrigue. Most of the audience is likely to hold a similar position
to the newspaper, i.e. be supportive of capital punishment, in which case the
readers will be particularly looking for information that confirms and supports
their views.
The purpose of this newspaper is still to attract a fairly broad public rather than
a highly specific political audience. Therefore, a fraction of the readers (notably
the younger readership of the online edition) might be relatively new to the
question and looking for clues that will help them take a position in the capital
punishment debate. Even though these readers might be looking for a more
neutral perspective, the newspaper will want to influence their position towards
supporting capital punishment. Therefore, for this audience, the visualisation
also needs to answer the question of why capital punishment is a good policy.
The existence of this second type of audience, however, will force the newspaper
and the visualisation to look a bit more objective.
As is typically done when celebrating a milestone, the readers might expect
to find information ”aggregating/summarising” the situation since the
introduction of capital punishment rather than specifics regarding the story of
each sentenced criminal. The kind of questions readers might have include:
• How many people were sentenced each year? Has this figure been changing
over the years? Is it appropriate, should it be higher/lower?
• Where did these crimes happen? Is it next to where I live? Am I con-
cerned?
• What is the typical profile (age, race, type of crime) of the sentenced
criminal? Has this profile been changing over the years?
• What are the main arguments, metrics and studies supporting my views?
What numbers can I quote to make my opinion more credible when
discussing the topic with others?
• Who are the important people (politicians, celebrities, . . . ) who have the
same opinion as me?
• How fundamentally bad are these people? What kind of atrocities
have they generally committed to (rightfully) deserve capital punishment?
What makes me fundamentally different from them?
1.2 Context: The circumstances
Work through the list (of 10 main headings) and describe your critical thinking
about any assumptions, definitions or self-imposed factors you think might be
relevant/existent. If any require no definition or are not relevant to that scenario,
explain why not?
1.2.1 People: Stakeholders
As explained earlier, the managers of the journal are heavily involved in this
article. Even though I am supposed to submit all my work to my direct supervisor
for approval, it has been informally agreed for this project that the owners of
the journal will be the ones taking the final decision on whether and how to
include my infographic. Therefore, for the duration of this project, I am de facto
reporting directly to them without consulting my supervisor.
Their expectations have been clearly phrased. They hope that the visualisa-
tion will help to convey their political message. They specifically asked for an
infographic that would be ”powerful” and make them and the readers ”proud
to be American, proud to be Texan, proud of the constitution”. They want
the visualisation to be ”fact-based” and ”scientific looking”, but insisted that it
doesn’t include any ”metrics made up by liberals to spread doubt in the minds
of true Americans”. By this last remark, they were referring to recently published
economic papers finding no causal relation between the crime rate and
the existence of capital punishment.
1.2.2 People: Audience
As explained earlier, the typical audience is Texan, conservative and rural. A
small fraction of the readers live in other states, and less than 1% of the readers
are international. The median reader of the print edition is 54 years old and
the median reader of the online edition is 29 years old. The online edition only
represents 4% of total revenue, but is estimated to account for about 20%
of all readers.
From a previous survey, we know that the typical reader has a high school
education and possibly ”some university”, but most readers’ professional
activity doesn’t include ”interacting with numbers on a regular basis”. Previous
attempts to use data-based infographics have shown that the readers have a
strong preference for intuitive visualisations that can be quickly grasped.
A small but significant fraction of the readers is strongly supportive of capital
punishment and already expects the journal to run a special article for this
occasion. These readers would be very disappointed if the subject were not
covered extensively or if the coverage contained inaccuracies.
1.2.3 Constraints: Pressures
The 500th execution is scheduled to take place in two days. I was informed
this morning that I am assigned to this project and have to deliver the final
visualisation in two days by 10 p.m., which leaves room for last-minute changes.
It has been made clear to me that my performance on this work will be
assessed and used as a reflection of my skills in general. As I am a new employee,
the managers still haven’t received much feedback regarding my performance,
and this will be an excellent opportunity to distinguish myself. The managers
know that and thus expect me to put in some extra hours on the project over the
next two nights.
For a double-page visualisation, the policy of the journal is to allocate a $400
budget to purchase rights for any necessary illustration. As for every project, I
am also free to request any help I would deem necessary from the other members
of the design team who are not currently assigned to a particular project.
1.2.4 Constraints: Rules
The topic is expected to be the day after tomorrow’s headline. In addition to
the front page, the article will occupy a total of 5 pages. My infographic will
occupy a double page (pages 2 and 3 of the article), followed by two more pages
of text (pages 4 and 5 of the article). The newspaper has a tabloid format, i.e.
each page measures 430 mm × 280 mm. There is no limitation regarding
colours, but it would be preferable to respect the general colour palette
of the journal (mostly variations of red, similar to those of the Republican Party).
The policy of the newspaper is to design all visualisations with the print edition
as the only edition in mind. The infographic might be adapted for the web later
only if the format allows.
1.2.5 Consumption: Frequency
This is a one-off visualisation and will only be used for this one edition.
1.2.6 Consumption: Setting
The newspaper will be distributed through the regular channels. This will be a
Thursday edition and thus won’t be distributed over the week-end. It is very rare
for readers to read an old edition, so the infographic should only be ”consumed”
on Thursday. If the web version gets popular, however, it is possible that a few
people keep visiting the article page for a few more days.
1.2.7 Deliverables: Size
I am to deliver a PDF version of a ready-to-print infographic by the deadline, and
store in the digital archive of the journal all the files and documents necessary
to reproduce or adapt my work. I can, if I wish, submit alternative versions of
the infographic for the editors to choose from, but all submitted work needs to
be final and ready to print. I must be ready to accommodate any last-minute
changes.
1.2.8 Deliverables: Format
The standard format for the deliverables is PDF. I am not to worry about the
web adaptation, generally a simple cut from the raw PDF file.
1.2.9 Resources: Creators
As explained earlier, I am to work alone, with the support of the rest of the
design team if necessary.
1.2.10 Resources: Technical
The newspaper has licenses for most popular design software, including the
complete Adobe suite. I am free to use any software I want. If the purchase of
additional software is required, I can either use the budget allocated to the
project (up to $100 per software package) or make a request to my supervisor.
I have been instructed to use as my main source of data the list of executed
offenders of the Texas Department of Criminal Justice for this infographic, but am free to
use any other reliable source. The newspaper subscribes to a few premium-
access databases that I can use to obtain additional data if necessary.
1.3 Vision: The purpose map
Describe and reason what the possible aim of this work would be in terms of what
experience (the Exs) it would facilitate and through what tone of voice (Read vs.
Feel)
The infographic will consist of many numbers (since there are a lot of differ-
ent and important statistics to display) but be mostly visual and intuitive, i.e.
the meaning of these figures should be easily understandable by the context in
which they are displayed. This would put this visualisation at the left end of
”exhibitory” on the purpose map, since most of the conclusions from the data
are directly exposed to the reader. But the explanations should remain short,
and the total amount of text at a minimum level, so the infographic couldn’t be
qualified as ”explanatory”.
These numbers have strong implications for human lives and security, so readers
might feel very strongly after reading the infographic. The raw data has
potential for a lot of ”reading” and ”quantitative analysis”, but treating
a topic as sensitive as executions too ”scientifically” would likely offend many readers.
On the other hand, totally excluding logic and making it a purely emotional
topic would be ineffective and against the policy of the journal. Thus, the
infographic should be classified at the top of the ”feeling” category, close to the
line with ”reading”.
We would then have the following purpose map:
1.4 Vision: Your ideas
Sketch out roughly what you think this work could look like: what colours, what
keywords, rough drawing, any other work out there that you can be inspired/in-
fluenced by? (not a test of artistry, just map out ideas)
The two pages could be merged together to form only one visualisation. For
the sake of readability, however, no word should be split across the two pages. If
necessary, a large graph can be split across the two pages, but this should be avoided.
The main kinds of elements that could be contained include:
• A bar chart showing the count of executions over time (in either one-year
or five-year periods).
• Other time series printed in parallel with the previous bar chart, to show
potential correlation between executions and another crime metric. Could
also show the impact of new bills on the number of executions per year.
• A timeline recalling the key historical dates (such as the introduction
of new legislation).
• A map of Texas to show where most of the executions/crimes happen.
Some points of the map can potentially be emphasized if they can help
explain the crime rate (e.g.: border with Mexico?).
• A vertical bar chart showing the most frequent words or themes
in the last words of the criminals (e.g.: god, pardon, love,...).
• Featured profiles of executed criminals (preferably authors of atrocious
crimes with little empathy for the victims).
• Small text boxes next to relevant charts to provide additional insights and
anecdotes. E.g.: What to think of this map? What to remember from
this graph?
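As a rough illustration of how the data behind the two bar charts listed above could be prepared, here is a minimal Python sketch. It assumes the execution dates have already been copied from the source table as MM/DD/YYYY strings and that the last statements are available as plain text. The function names, the stop-word list and the sample values are hypothetical placeholders, not taken from the dataset (only 12/07/1982 is the date quoted in the data source section).

```python
from collections import Counter
from datetime import datetime
import re


def executions_per_period(dates, period_years=1):
    """Count executions per period; each key is the first year of its period."""
    counts = Counter()
    for text in dates:
        year = datetime.strptime(text, "%m/%d/%Y").year  # assumed date format
        counts[year - year % period_years] += 1
    return dict(sorted(counts.items()))


def top_words(statements, stop_words=frozenset(), n=10):
    """Return the n most frequent words across all statements."""
    words = []
    for statement in statements:
        tokens = re.findall(r"[a-z']+", statement.lower())
        words += [w for w in tokens if w not in stop_words]
    return Counter(words).most_common(n)


# Placeholder values for illustration only (apart from the first date).
sample_dates = ["12/07/1982", "01/01/1990", "06/15/1990", "02/02/1993"]
sample_statements = ["Love you all", "I love my family"]

print(executions_per_period(sample_dates))      # one-year periods
print(executions_per_period(sample_dates, 5))   # five-year periods
print(top_words(sample_statements, stop_words={"i", "my", "you", "all"}, n=3))
```

Switching between the one-year and five-year versions of the chart then only requires changing `period_years`, which makes it easy to compare both layouts before the deadline.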
As for the background, it could be made of something patriotic such as a
partially transparent zoomed-in American/Texas flag or symbol.
Overall, the infographic would look something like this:
2 Scenario 2
Analysts at the Texas Department of Criminal Justice staff reporting to senior
management at the Texas Department of Criminal Justice.
2.1 Context: The reason
Outline what you think might be the essence of the curiosity: What question(s)
do you think the potential audience might need answering/find interesting that
the visualisation would ultimately present?
We are a team of 4 Junior Analysts (including one summer intern) and we
have been assigned to a two-day project to prepare all the visual elements of
the biennial public report on the death penalty in Texas (part of a legal obligation of
transparency of the Texas Department of Criminal Justice). We are supervised
by two senior analysts who will be responsible for handing the final report to senior
management for final review and approval.
The senior analysts have provided us with a detailed list of all the main components
of the report. The previous report was 83 pages long (including appendix),
with a total of 34 tables and graphs. The responsibility of the graphs has been
split across the team of junior analysts and I have been assigned to prepare 8
of the 34 visuals.
Stakeholders Intrigue. We are to report directly to the senior analysts, who
will, in turn, report to the senior management. The senior analysts will be in
charge of writing most of the core of the report. The content of the report will
be highly influenced by the data, which will be taken in part from the tables and
graphics we are to prepare. Thus, the senior analysts (our direct supervisors)
hope that these graphical elements will be structured in a way that facilitates
their writing and inspires them for this year’s report. They will also want
to make sure that the graphs and tables are similar to those of previous
reports to avoid any trouble with the senior management.
The kind of questions they might have include:
• Can I find the figure I want easily?
• Can I compare it to last period’s figure easily?
• Is the format the same as what I am used to?
• Are there any mistakes?
• Do I need to ask for an additional graph?
The senior management will be the ones directly responsible for the publication
of the report, and will thus be held accountable for any positive or negative
consequence that might result from it. They will first have a global look at the report
to check for quality, then read it more attentively to verify that they agree with
the content and, hopefully, gain additional knowledge (or fill their knowledge
gaps) on the topic. If the quality doesn’t meet the standards they will have to
send it back to the senior analysts or correct it themselves. They hope to avoid
this step to reduce their workload.
But fundamentally, the report is more a formality for the senior management
than a real source of insights or potential for career progression, so their biggest
intrigue really is ”Does this draft look satisfactory enough to allow me to move
on to more interesting projects?”. Internally, this report is really viewed as
a waste of time and as an added cost of the death penalty.
Audience Intrigue. As this report is part of a legal obligation of transparency
of the Department of Criminal Justice, it doesn’t have a particular
target audience; its readership will be very diverse by nature. It will include
journalists, researchers, NGOs, writers, curious members of the public, etc.
They might have any kind of
question and might want to find the answer in the report. They might even
have found the report totally randomly through a search engine.
Most of this audience will not need to read the entire report but only to find
the data or piece of information they need. Sometimes, they won’t even need
to find it, but just need to be sure the information is contained in the report in
order to use it as a reference in another (formal and lengthy) paper or report.
Many of these readers will be used to the format of previous editions, and might
take a dim view of any significant change from the usual format.
The questions of the audience might include:
• What was the profile of the executed criminals over the last two years?
• What is the opinion of the Department of Criminal Justice on the evolution
of the situation?
• Does this report speak about the overall cost of executions so that I can
add it to my ”references” slide at the end of my ”Development of Criminal
Justice in Early Communities” lecture in my ”SOC 101 - Introduction to
Sociology” class?
However, not all of the audience’s potential questions are relevant to the
design of the visualisations (and of the report), since the main objective of
that document is to meet the legal obligations of the department rather than
to satisfy ”customers”. For that reason, the most important question of the
audience that the Department of Criminal Justice will want to answer is
”Can this information be found somewhere in the report?”.
2.2 Context: The circumstances
Work through the list (of 10 main headings) and describe your critical thinking
about any assumptions, definitions or self-imposed factors you think might be
relevant/existent. If any require no definition or are not relevant to that scenario,
explain why not?
2.2.1 People: Stakeholders
As explained earlier, the main stakeholders for me are my two direct supervisors,
the senior analysts. The production of these visualisations is a routine task for
everybody, me included, and it cannot lead to any promotion or bonus. The
end stakeholders are the senior management, but unless we (Junior Analysts)
produce work of such low quality that the senior management needs to find
someone responsible to fire, we will not have any interaction with them.
Fundamentally, the main stakeholder is the United States Constitution and
the rule of law: we need to produce graphs that meet the requirements of our
legal obligation, not for us, but as part of our duty to serve the best interests
of the country.
2.2.2 People: Audience
As explained earlier, the audience is very diverse.
We estimate that the online version of the document (available on the De-
partment of Criminal Justice’s website) represents the majority of the readers.
From Google Analytics statistics, we know that 65% of the downloads of pre-
vious reports have been made by people (or computers) located in Texas. 54%
of the total ”pageviews” on the report come from organic search (mostly from
Google), 23% come from access from an internal link of the department’s web-
site, 8% come from direct access (generally people who copy pasted the link of
the report in their web browser) and 15% come from links from other websites
pointing to the report. Less than 1% of the downloads come from social
networks.
The print version of the report (around 400 copies) circulates mostly among
institutions and libraries, but can also be ordered by anyone through the
department’s website ($15 administrative fee). In most cases, these institutions
are interested in owning a physical copy of the report mostly for archiving purposes
or for the sake of completeness.
The members of the direct audience are generally very sophisticated, hold a
bachelor’s or master’s degree or a PhD, and are less sensitive to the form of
the report than to its content.
The report also has an indirect audience, that is, people who will read extracts
of it or content inspired by the report but rephrased to be more accessible, for
example in newspapers, magazines or other media. Generally, these people will
not even be aware that the content they are reading originally comes from this
report. The profile of this indirect audience is very different from the profile of
the direct audience: its members will be less educated and less captivated by the topic.
2.2.3 Constraints: Pressures
The graphics must be handed in tomorrow by 5 pm. No extension will be granted
(or ”should be required”, according to the senior analysts). No budget is allo-
cated to this project.
There is no particular pressure on the project. Everybody expects it to be a routine
task.
2.2.4 Constraints: Rules
The layout must be close or identical to that of the previous report. Minor
changes are allowed if necessary. The visual appearance can be changed
compared to the first year, but only if time permits and if all four junior analysts
agree on the modifications (as all the visuals must obey the same design
guidelines).
The original report is in colour but it is not uncommon that people photo-
copy parts of it in black and white. In particular, the charts are often used in
academic contexts as teaching material and handed out to students. Therefore,
the black and white outcome must always be kept in mind when designing the
different parts of the report.
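One way to keep the black and white outcome in mind is to estimate how each chart colour will look once photocopied. The short Python sketch below approximates the grey level of a colour with the Rec. 601 luma weights and flags colours that would end up too close to each other. The function names, the sample palette and the `min_gap` threshold are illustrative assumptions, not department guidelines, and a real photocopier will not reproduce these values exactly.

```python
def luma(hex_color):
    """Approximate grey level (0-255) of an '#rrggbb' colour (Rec. 601 weights)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b


def too_close_in_grey(palette, min_gap=40):
    """Return neighbouring pairs (in grey-level order) closer than min_gap."""
    levels = sorted((luma(c), c) for c in palette)
    return [
        (darker[1], lighter[1])
        for darker, lighter in zip(levels, levels[1:])
        if lighter[0] - darker[0] < min_gap
    ]


# Hypothetical palette: pure red, green and blue.
palette = ["#ff0000", "#00ff00", "#0000ff"]
print(too_close_in_grey(palette))              # pairs under the default threshold
print(too_close_in_grey(palette, min_gap=50))  # stricter threshold
```

If a pair is flagged, the two series can be separated by adjusting lightness or by adding patterns or direct labels, so that the chart stays readable in a photocopied handout.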
The font used in the appendix section of the report is Times New Roman 11pt.
The report is printed on A4 pages with 1” margins. Each page contains a
discreet page number as well as the name of the department of justice at the
bottom of the page. The top of the page also includes the name of the current
section of the document. The formatting of the report will only take place once
all the elements have been completed.
All the tables and numeric values of the appendix have to be written in pure
text (no images) when possible so as to make them searchable (by the user or
the search engine).
2.2.5 Consumption: Frequency
A new report is published every two years. Most of the visuals and tables can
be replicated from the previous report with only a limited amount of changes
necessary. Every time, however, the structure is slightly altered and some ta-
bles/visuals are dropped or added.
2.2.6 Consumption: Setting
The report is most popular the weeks following its publication. However, it is
supposed to remain ”valid” for two years, so figures and conclusions that only
have very short term importance should preferably be avoided.
As explained earlier, it is distributed through different channels. The main one
is the digital version freely available on the website of the department of justice
(and indexable by search engines). The print run was only around 400
copies for the previous edition, but this number is now quite flexible as the department
has decided to move to Print-On-Demand (POD) technology for this year,
through Amazon Createspace, to save costs and gain flexibility.
2.2.7 Deliverables: Size
34 visuals in total, but I am only in charge of 8. If I finish early, I might get
assigned to some of the visuals of my three colleagues.
More specifically, the eight visuals consist of:
• Three full page tables, mostly containing numbers
• One half-page map with legend
• One full page set of bar charts
• A full page vertical timeline
• Two half-page combo line/pie/bar charts (new for this year)
The two half-page combo charts should take the most time as I will have
to design them from scratch. Two of the three tables require actively searching
for information in a diverse set of sources. The other four visuals consist mostly of
updating information and won’t represent more than a few hours of work.
2.2.8 Deliverables: Format
The eight visuals must be delivered in eight different .docx Microsoft Word
documents, in the format specified above. As explained above, output must
be optimized for both print (including black and white photocopies) and web
(including text-only search engines).
Another person will be in charge of adapting the Word documents to PDF
format for web distribution and POD through Amazon Createspace.
2.2.9 Resources: Creators
I work as part of the team, but I am in charge of my own visuals. I have a good
relationship with my teammates, so it is likely that I will ask for their advice
in case I have any doubt. The four junior analysts share the same office and
collaboration is frequent.
2.2.10 Resources: Technical
All the employees are equipped with one basic Windows desktop computer. The
Microsoft Office and Adobe suites are installed by default on all the computers,
as well as a few additional programs. We are free to download additional
software if necessary, but no budget is allocated. However, we are expected to only
deliver files that can be read with the default software of the department. In
particular, all the visuals must be contained within .docx documents.
2.3 Vision: The purpose map
Describe and reason what the possible aim of this work would be in terms of what
experience (the Exs) it would facilitate and through what tone of voice (Read vs.
Feel)
The visuals and tables clearly aim to be as objective and informative as pos-
sible. The focus is on the quantity and the accuracy rather than the ease of
understanding. Clearly, these visualisations should occupy the top left corner
of the purpose map:
2.4 Vision: Your ideas
Sketch out roughly what you think this work could look like: what colours, what
keywords, rough drawing, any other work out there that you can be inspired/in-
fluenced by? (not a test of artistry, just map out ideas)
For most of the visuals, no creativity is required since it will only be about
updating the data with more recent information. Most of the data required for
the eight visuals can be found directly from the list of executed offenders on the
department’s website.
Roughly, the eight figures would look a bit like this:
The inspiration for the new combo charts comes directly from the other
already existing similar combo slides in the rest of the appendix. Most of the
text boxes are simply explaining the content of the tables and graphs. The
vertical timeline will be identical to that of the last report, except that two
new points must be added with a small text paragraph (the margin has to be
reduced to accommodate the new data points).
3 Scenario 3
A campaign group looking to help influence a debate about the ending of capital
punishment.
3.1 Context: The reason
Outline what you think might be the essence of the curiosity: What question(s)
do you think the potential audience might need answering/find interesting that
the visualisation would ultimately present?
I am part of a small campus association fighting against the death penalty, made
up of a few friends with strong convictions. A major conference on the topic of
the death penalty will be hosted at our university in a few weeks. The organising
committee is made up of a few university officials with strong ties to the Republican
Party and most of the speakers expected to be present at the conference are well
known for their systematic conservative political bias.
Due to the importance of the debate, which is expected to attract up to a
few thousand students and will be broadcast live on television, we decided to take
action and protest against the structure in place. The first step of our plan of
action is to raise awareness through a strong campaign both on campus and on
social networks.
We will start our online campaign by publishing a few viral infographics on
the university Facebook pages.
Stakeholders Intrigue. The group is strongly committed to relying on facts for
all its points. But we know that our ”opponents” will mostly rely on emotions
and ”shock” arguments, so we decided to adapt our approach to the field in
which we fight.
The stakeholders are the 14 group members (including myself). We
are already extremely aware of all the facts, evidence and data surrounding the
subject so we don’t expect to learn anything new from this infographic. How-
ever, we have always found that it was very difficult to convey all the points of
our position to an uninformed individual, so we hope that this visualisation will
help us come up with better ways to convince others of our points. So the kind of
questions that we hope this visualisation would help us answer include:
• In which order should we bring our arguments?
• What data do we need to show to effectively convince others?
• What is the most intuitive way to visualise our strong feelings?
• How does it look from outside?
• How many points do we actually have? How much space does it take on
paper?
• Are rationality and fact-based approaches compatible with ”coolness” for
this topic? Can we make it interesting while still being objective?
Audience Intrigue. The audience will be very diverse in nature and relatively
uninformed about the topic. As the date of the conference approaches,
the topic will become more ”trendy” and students will gain interest, possibly
looking to form their own opinion before the debate. Thus, we expect
people to be more attentive to our claims than usual.
As our university tends to be more conservative in general, we tend to receive
less support and more hostility from our peers. But most of the students
are not politically involved and remain quite open to ideas from any political
origin. They are looking for any information that could help them take a stance
in the debate. Their questions might include:
• What are the arguments of both parties?
• What are the main ”things to know” about the topic?
• Are there any fun figures that I can quote in a discussion with my friends?
What can make me sound more clever?
• Why should I care?
• What is the position of the rest of the students and of the country on the
matter? Where do I stand compared to them?
For the students who already have a strong opinion on the matter, their
intrigue would probably be closer to:
• Is this infographic trying to speak about something I don’t want to hear?
Does it come from people with a political opinion different from mine that
I should just ignore?
• What can I find that confirms that the other party has weaker claims than
mine? How strong is my position? What are the main arguments of the
opposition / our best arguments?
But the targeted audience is mostly the first group, the relatively uninformed
people, as our influence group largely considers the informed students to be either
”gained causes” or ”lost causes” whose cost of conversion is too high.
3.2 Context: The circumstances
Work through the list (of 10 main headings) and describe your critical thinking
about any assumptions, definitions or self-imposed factors you think might be
relevant/existent. If any require no definition or are relevant to that scenario,
explain why not?
3.2.1 People: Stakeholders
As explained earlier, the stakeholders are the 14 students who are part of the
group. The group is mostly informal: it is officially registered as a campus
organisation, but most of the decisions are taken individually according to per-
sonal variations of opinion and motivation. On some occasions, however, we try
to coordinate our efforts to reach an objective. This campaign is one of these
occasions, so we decided to only publish things on social networks after having
informed and received tacit approval from the other 13 members. There is no
formal decision process: we have a WhatsApp group in which we send our suggestions.
If no one has rejected an idea after a few hours, it is generally assumed
that it is fine to proceed.
The 14 students are all undergraduate students in the university, coming from
different majors, with a focus on social sciences.
3.2.2 People: Audience
The university counts 21 235 students, 81% of whom are full-time undergraduates.
As explained earlier, the main target consists of the non-politically affiliated
students, but we aim to maximize the total exposure of our campaign so we also
hope to reach a few individuals from other categories, including staff, lecturers,
university officials and possibly the general population. At the same time,
we hope to attract a few students to join our organisation.
3.2.3 Constraints: Pressures
Our funds are very limited so members mostly use their own resources. The
time pressure is quite strong since the conference will take place in a few weeks
only and our classes keep us busy most of the time. However, no specific dead-
line is set for my own work and it is quite flexible.
In the past months, a few students who are members of various campus groups have
received administrative sanctions (going up to expulsion in the most extreme
cases) following a protest on diverse issues. The purpose of these sanctions was
to send a clear message to student organisations: disobedience to college rules
will not be tolerated. This has created a lot of tension and puts a lot of pressure
on many associations, since most students want to stay out of trouble and prefer
to remain unnoticed by the administration. In particular, we could face legal
action for libel if the elements we publish directly incriminate university officials.
Another source of pressure is the ranking algorithm of social networks, Facebook
in particular. Since the infographic must go viral on Facebook to reach
the diffusion objectives, certain rules must be observed, such as publishing the
infographic at the right time of day, having the right keywords in the text
description, having the right dimensions for the profile picture update, adopting
the tone generally expected by Facebook users, etc.
3.2.4 Constraints: Rules
Most of the rules are dictated by the practical necessities of publishing on Face-
book. The infographic must be in a format that allows for comfortable reading
on Facebook. The readers should not have to leave Facebook to see the vi-
sualisation on an external website, so this has to be taken into account. We
must also remember that Facebook highly compresses images, so no zooming
should be required. A single Facebook post can, however, contain multiple images:
this can be used to work around the low resolution allowed for individual
images. Animated images such as GIFs are not allowed, nor are most
non-standard image formats; the Facebook documentation should be consulted
to check technical feasibility before starting to develop any innovative,
non-standard visualisation.
The logo and the name of the campaign group must be clearly shown on all
the publications. It is important that the message be conveyed together with
the name.
3.2.5 Consumption: Frequency
This is a targeted one-off campaign, but the most popular infographics might
be republished later or on other social networks, or re-used.
3.2.6 Consumption: Setting
Rapid (hopefully viral) consumption on social networks, mostly Facebook. The
timing of the publication is very important and is planned in advance to
correspond to the peaks of social activity of the students.
The infographics are generally published through the Facebook page of the
campaign group, then shared by each of the group members and by a set of
supporters and partner student associations. They are also published directly in all
the public and private Facebook groups of the university to which the student
members have access. The publication is often associated with a few prizes to
win, awarded to people randomly drawn from the list of people who ”liked”
the post, to encourage students to like each publication and increase virality.
3.2.7 Deliverables: Size
I have to prepare one set of images that will be published together in a Facebook
post. These images are generally accompanied by a short text description to
spark interest and by a link to the (newly created) website of the group.
The total number of images is entirely flexible but should remain reasonable
to avoid the risk of boring the reader. The first few images should be very
simple, very visual and easy to understand to grab the attention of the read-
ers. The following images will only be seen by users who have already started
reading the post and can, thus, be more sophisticated. Therefore, the first few
images will require more ”artistic” and ”creative” work, whereas the last ones
are likely to require more research and descriptive work.
3.2.8 Deliverables: Format
All the images should be designed with Facebook publication as the main ob-
jective. In a second step, those images can be adapted for publishing in other
social networks or contexts, such as the group website or print distribution. From
time to time, the group is invited to take part in small talks and re-uses some
of the infographics published on Facebook to illustrate some of its slides.
3.2.9 Resources: Creators
The association is very collaborative and people are always glad to help each
other. Skills are very diverse, so when I don’t know how to do something, I
generally ask for help and receive it very quickly.
I am used to working with two very close friends of mine who are also
part of the group and have complementary skills. They are less active than me
in the association and are not currently working on any project, but I know I
can rely on them to help me with the technical part of the creation. The first
is an art student, very good at hand drawings of any sort, and the second is a
CS major who is very agile with design software. I generally plan all the
details and create the easiest elements myself, and I rely on them for all the
more complex graphical components of the visualisation.
3.2.10 Resources: Technical
As students, we have free access to most of the popular design or technical software.
This includes the whole Microsoft Office and Adobe suites.
Each member of the group has their own habits for the design of these visualisations,
so there is no standard among us.
3.3 Vision: The purpose map
Describe and reason what the possible aim of this work would be in terms of what
experience (the Exs) it would facilitate and through what tone of voice (Read vs.
Feel)
The campaign’s tactic consists of three phases:
1. Spark the interest of the reader with shocking facts and numbers
2. Convince him that our side is the right one with objective facts and reliable
data
3. Make him emotionally involved so that he doesn’t immediately forget
what he read after closing the computer and takes action to bring
about change
So there is a mix of feeling and reading, with a slight tendency towards feeling.
No interaction with the graphs is possible, but the user has
to click on the ”Next” button to see the next image of the post, and social
interaction is encouraged through the comments. Still, this is not enough for the
visualisation to qualify as ”exploratory”. It couldn’t be qualified as explanatory
either, as the amount of reading and detailed information is moderate. Overall,
the visualisation tries to be ”exhibitory” and to simply ”show things as they
are”, letting the reader draw the most obvious conclusions by himself.
Thus, on the purpose map, the visualisation would be around here:
3.4 Vision: Your ideas
Sketch out roughly what you think this work could look like: what colours, what
keywords, rough drawing, any other work out there that you can be inspired/in-
fluenced by? (not a test of artistry, just map out ideas)
My original idea was to make a ”Top 10”, a format generally known to
attract a lot of ”views” on social media. It could be called, for example, ”TOP
10 reasons why Death Penalty is bad”. It would be made of 10 images, starting
with the tenth and going up to the first. Normally, the reader would expect the
reasons to become increasingly important, and this incites him to continue reading.
Since the arguments are quite subjective anyway, this doesn’t necessarily
have to be true, but finishing with a memorable point is always a good thing.
As explained earlier, the last points can be more developed than the first and
could potentially encourage the reader to learn more about the topic by reading
other recommended sources.
The images and elements of the top 10 could look a bit like that on Facebook:
As shown above, the first image shown could use the picture and some personal
details about a specific criminal to make readers feel the scale of the costs
and bring it back to an individual level. Making the criminal look more like a
person also heightens the emotions, since people then no longer see ”criminality”
as a society-wide issue but ”criminals” as bad persons who don’t deserve so
much attention. Here, I think the main strength of this database of convicted
offenders is that it gives us a lot of very personal information that we can
display on selected felons (at the risk of appearing unethical by doing so).
The other seven images could make use of the other attributes of the database,
most specifically the last statements. For example, a word cloud could probably
illustrate one of the images (depending on which words come as the most visi-
ble). The frequency of a specific topic in the last statements can also be brought
as an argument and be shown visually as part of a bar chart. But overall, the
executed offenders database would probably only represent a small part of all
the data used for the visualisation.
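As a rough illustration of the word-cloud idea, the word counts could be obtained with a few lines of Python. This is only a minimal sketch: the stop-word list is a short, hypothetical one, and the sample statements are made up for the example and are not taken from the dataset.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "and", "to", "of", "i", "you", "my", "am",
              "for", "in", "is", "it", "that"}


def word_frequencies(statements, top_n=20):
    """Return the top_n most frequent words across all statements, ignoring stop words."""
    counts = Counter()
    for text in statements:
        # Lower-case the text and keep alphabetic tokens only.
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts.most_common(top_n)


# Made-up statements for illustration; they are not taken from the dataset.
sample = [
    "I love my family and I am sorry.",
    "Tell my family I love them.",
]
print(word_frequencies(sample, top_n=3))
```

The resulting (word, count) pairs could then feed either a word cloud or the bar chart on topic frequency mentioned above.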
4 Scenario 4
What potentially intrigues you about this data? What might you undertake if
you had the chance?
4.1 Context: The reason
Outline what you think might be the essence of the curiosity: What question(s)
do you think the potential audience might need answering/find interesting that
the visualisation would ultimately present?
Personal Intrigue. I am a student at Imperial College London, studying in
the new MSc in Business Analytics programme. During our Visualisation course,
we explore some innovative ways to visually represent and understand data. In
one of the classes, we briefly look at the ”Executed offenders” list on the Texas
Department of Criminal Justice’s website.
At home, after the class, curious, I go back to the website and realise that the
data has a lot of potential and decide to explore it a bit further. In particular,
I am interested in the individuals’ last statements: what are the main topics/subjects
that an individual wants to speak about/mention when he knows he
is about to die? Is it religion, love, family, friendship, politics, justice, regrets,
justification, etc.? Have these topics changed over time? Do they differ based
on gender, based on age, based on the type of crime committed?
I think that the data would be easier to understand, and would fit well, within a fully
interactive visualisation where parameters can be changed in one click. I
realise this would be a good opportunity to brush up on my JavaScript/jQuery skills
and potentially try a few JS/HTML5 visualisation libraries, so I decide to make it
web-based.
Audience Intrigue. Of course, this is a personal project and the primary
questions I try to answer are mine. But since I will do the work anyway, why
not share the results?
As a first step, I decide to publish my work on an ad hoc page on my personal
website. Depending on my findings, I might dedicate a website to the
project.
The audience might have the same questions as me, and perhaps even have
some additional ideas. A comment section on the website could potentially lead
to some very interesting suggestions of extensions to this project.
4.2 Context: The circumstances
Work through the list (of 10 main headings) and describe your critical thinking
about any assumptions, definitions or self-imposed factors you think might be
relevant/existent. If any require no definition or are relevant to that scenario,
explain why not?
4.2.1 People: Stakeholders
Me, that’s all. I am only doing this as a side project, for my personal interest.
4.2.2 People: Audience
In the beginning, no one except me. Later, once published on my website, just
the people I give the link to or potentially a few visitors coming from search
engines. If I like my findings and dedicate an entire website to the project, then
possibly the audience could become much larger. But that’s not the priority
right now.
4.2.3 Constraints: Pressures
I have a few weeks until the end of semester projects start. As this is still the
beginning of the semester, I have a lot more free time. Hopefully, I can finish
the core of the project by the end of the semester and start thinking about and working
on the ”publishing details” during the vacations.
I want to get inspiration from existing interactive visualisations. I need to
allow a few days to explore the web and see what has been done in the past
(not necessarily relating to death penalty). In particular, I want to know what
the main visualisation libraries are, what they can do and what they can’t.
4.2.4 Constraints: Rules
I want my visualisation to be compatible with the main web browsers (except
older versions of Internet Explorer) and smartphones (Android and iPhone).
It has to run on almost any screen size/resolution and shouldn’t require any
external plugin. Ideally, the loading time should be short (so not too many files
to download) so that smartphones on slow mobile connections can access the page
without a Wi-Fi connection.
4.2.5 Consumption: Frequency
Normally a one-off project, but I could update the data as it becomes available
(i.e. as more criminals receive capital punishment, or using data from other
states) or improve it based on the feedback I receive. In any case, if I publish
the results, they should remain public on the website forever; I generally try to
keep all my websites and publications online, even when they become obsolete,
for archive purposes.
4.2.6 Consumption: Setting
My own consumption: Live, as I work on the project.
Consumption from the audience: If published, prolonged, with most likely a
peak when I first release the results on social media. Potential peaks are also
possible from search engines when a major capital-punishment-related event
happens, if my project’s page has good SEO (Search Engine Optimisation).
In general, I don’t expect a very high level of traffic on the page.
4.2.7 Deliverables: Size
One static web-page only. However, I see two main sources of work:
• Working on the data. Categorising the different last statements ap-
propriately. What should be the relevant ”categories” of last statements?
What level of detail should we have (should we just look for ”religion” in
general, or distinguish between different types of religions?)? How to de-
fine whether one last statement includes a link to ”religion” or not? What
should we consider as ”religion”? How to understand last statements that
are not clearly phrased?
• Working on the visualisation per se. How to communicate the data
in an explicit way? How to visually express the strength of correlations?
How to make it look like an objective approach? How to make it intuitive,
how to invite the visitors to try changing the different parameters of the
interactive visualisation?
The first part of the work will be a bit more ”mechanical” but must be done
carefully or the study won’t be valid. Most likely, every one of the hundreds of
statements will have to be MANUALLY categorised. A clustering algorithm
would probably give poor results given the complexity of the data and the rather
small size of the population, and would be overkill.
Hopefully, I could finish this first step by the end of the semester.
Once this first part is done, it will already be possible to compute a few statistics
and see whether there are some interesting trends. If it shows that there
is value in the data and it is worth exploring more, then I could move to the
second step. The time requirement for this second part is likely to be more
variable and less predictable, as it is the ”creative” part and could be polished for
months if wanted.
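To give an idea of the kind of statistics meant here, the sketch below computes, for each profile group, the share of statements that were tagged with each category. It assumes the manual categorisation has already been done; the field names ("age_band", "categories") and the sample rows are hypothetical and only serve the example.

```python
from collections import defaultdict


def category_shares(rows, group_key):
    """For each value of group_key, return the share of statements tagged with each category."""
    totals = defaultdict(int)
    tagged = defaultdict(lambda: defaultdict(int))
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        # One statement can carry several categories (e.g. family AND religion).
        for category in row["categories"]:
            tagged[group][category] += 1
    return {
        group: {category: n / totals[group] for category, n in categories.items()}
        for group, categories in tagged.items()
    }


# Hypothetical hand-labelled rows, not real data.
rows = [
    {"age_band": "under 40", "categories": {"family", "religion"}},
    {"age_band": "under 40", "categories": {"family"}},
    {"age_band": "40 and over", "categories": {"regrets"}},
]
shares = category_shares(rows, "age_band")
print(shares)
```

The same function could be called with another grouping column (gender, type of crime, decade) to check whether the topics differ between profiles.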
4.2.8 Deliverables: Format
One web page, made of multiple files (html, js, png, jpg, etc.). As explained
before, the output should be static and therefore not require any interaction
with the server once the page is loaded. This means that the user can explore
the whole dataset offline and look at the source code freely. The data won’t be
stored in a database but directly within the page.
4.2.9 Resources: Creators
Individual project. Some other contributors could join in later steps, but none are
expected for now.
4.2.10 Resources: Technical
I have access to most software and can generally easily download any additional
software I might need. I have my own dedicated server to host the website and
the project, which I already use for my other websites. I am already familiar with
JavaScript and HTML web development, so I can use advanced features easily
if necessary.
The data classification will most likely be done in Excel to simplify things.
4.3 Vision: The purpose map
Describe and reason what the possible aim of this work would be in terms of what
experience (the Exs) it would facilitate and through what tone of voice (Read vs.
Feel)
This visualisation will clearly be exploratory, as the user can play with the
parameters. However, I wouldn’t put it too far on the right in the purpose map
as it doesn’t have any complex interaction possibilities: you can only change
the values of a few fields (and post a comment at the bottom of the page); you
can’t really contribute to the dataset itself or radically influence what you see
on the screen.
The visualisation is closer to ”feeling” than to ”reading”: even though the
numbers will be shown, the main purpose is to give an approximate idea of the
distributions rather than exact values. The last statements being classified on
partially subjective criteria anyway, trying to look too ”serious” and ”scientific”
would reduce the credibility and interest of this visualisation.
The resulting purpose map looks like this:
4.4 Vision: Your ideas
Sketch out roughly what you think this work could look like: what colours, what
keywords, rough drawing, any other work out there that you can be inspired/in-
fluenced by? (not a test of artistry, just map out ideas)
The core idea behind this visualisation is two easy steps: 1) You pick a pro-
file. 2) We show you what the last words of a criminal with this profile typically
are about.
When I want to give a general idea of the skills/traits of an individual, I like
to use the radial ”star chart”:
I believe it would fit our needs perfectly here too, since it gives a general
impression of the last statements rather than a specific value.
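For reference, the geometry of such a radial chart is simple: each category gets its own axis, and the distance from the centre is proportional to the share of statements in that category. The sketch below computes the vertices of the polygon; the function name, the default radius and the example shares are assumptions made for the illustration, and the actual drawing would be left to whichever visualisation library is chosen.

```python
import math


def star_chart_points(shares, radius=100.0):
    """Return (label, x, y) vertices for a radial chart, one axis per category.

    `shares` maps a category label to a value between 0 and 1.
    """
    labels = list(shares)
    points = []
    for i, label in enumerate(labels):
        # Spread the axes evenly, starting at 12 o'clock and going clockwise
        # (in a coordinate system where y points upwards).
        angle = math.pi / 2 - 2 * math.pi * i / len(labels)
        r = radius * shares[label]
        points.append((label, r * math.cos(angle), r * math.sin(angle)))
    return points


# Hypothetical shares for one profile, not real results.
example = {"family": 1.0, "religion": 0.5, "regrets": 0.0, "justice": 0.25}
for label, x, y in star_chart_points(example):
    print(f"{label}: ({x:.1f}, {y:.1f})")
```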
I would also like to give the user the possibility to explore the data at the
individual level. So I would probably add a rectangle with all the pictures of
the convicted offenders, and the user could pass the mouse over the pictures to
see the details about that individual, including how his or her last statement has been
categorised.
Possibly, we could add a word cloud at the bottom of the visualisation to give
a broad overview of the words used in the statements.
Overall, our visualisation would look similar to this:
Assignment 1b: Developing intimacy with data
and establishing editorial perspectives
Introduction
You are working at a broadsheet newspaper as a graphics editor and preparing
ideas for your assignments editor about possible visual work you could undertake
ahead of the Rio Olympics.
Compile a report that details your deep examination, proposed transformations,
explorations and editorial ideas based on the data provided (and data you could
reasonably obtain).
You are provided with two contrasting worksheets (in Excel) showing medallists
from the Summer Olympic Games.
5 Examination
Articulate the meaning of the data (representativeness and phenomenon) and
thoroughly assess and describe the physical properties (type, size, condition).
Compare what the two datasets offer and contrast their differences.
5.1 Dataset 1
Number of observations (rows): 4093
Each observation represents: an Olympic medal won (possibly by multiple
athletes together)
Observations that aren’t included? Only rows for medals of 4 sports (Ath-
letics, Canoeing, Rowing and Swimming). Only includes years 1896 to 2012.
Columns per row: 10
Columns details and structure:
• Games: Integer, contains the year (4 digits) of the games. Range: 1896-
2012. Most frequent value: 2012, appears 267 times. Least frequent value:
1896, appears 20 times.
• Sport: String, 4 unique values (Athletics, Canoeing, Rowing and Swim-
ming). For each year (variable ”Games”), each of the sports should nor-
mally appear, but in the older years (e.g. 1896), some of the sports weren’t
introduced yet. Most frequent value: Athletics, appears 1550 times. Least
frequent value: Canoeing, appears 491 times.
• Event: String, 93 unique values, contains the ”discipline” as well as the
parameters of the discipline (such as distance) and the information about
the ”gender” of the competition. For each year, each of the 93 events
should normally appear, but in the older years (e.g. 1896), some of the
events weren’t introduced yet. Each event is always grouped with the
same sport. Most frequent value: 100m Men, appears 85 times. Least
frequent value: 3000m Steeplechase Women, appears 3 times.
• Athlete(s): String, 3211 unique values. Format: first name followed by
last name, with the last name written in upper case characters. Note that
names with special characters are incorrectly stored (have question marks
instead of the true character). For competitions by teams, athletes are
separated by commas. Most frequent value: Michael PHELPS, appears
13 times.
• CountryCode: String, 3 upper case characters, contains the country
code of the country that the athlete represents. 102 unique values. Most
frequent value: USA, appears 982 times. Note: Countries can appear/dis-
appear/merge over time depending on political context; example: GDR
(German Democratic Republic) and FRG (West Germany) became GER
(Germany).
• CountryName: String, full country name. 96 unique values. Mostly
redundant with ”CountryCode” with a few exceptions; for example: Ger-
many can have country code DEU or GER depending on the year.
• Medal: String, contains the type of medal won. 3 unique values (Gold,
Silver, Bronze). Most frequent value: Gold, appears 1368 times. Least
frequent value: Bronze, appears 1358 times.
• Result: String, contains the performance of the athlete/country. The
formatting is different for each discipline, as the result can be a distance,
a time or another metric. This column has to be read together with the
next one (Unit). Note: a few rows have the value ”No result”.
• Unit: String, contains the key of how to interpret the ”Result” variable.
4 unique values (M:S:DD, H:MM:SS, M:SS:DD, #:DD)
• ResultInSeconds: Integer, redundant with the column ”Result” but
here the information is stored in seconds.
Note: One row has incorrectly formatted values due to disqualification.
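To illustrate how ”Result” and ”Unit” have to be read together, here is a minimal parsing sketch. It rests on several assumptions that would have to be checked against the actual worksheet: that the fields of ”Result” are separated by colons exactly as in the ”Unit” key, that DD stands for hundredths, and that ”#:DD” is a plain number with two decimals (which may be a distance rather than a time). The function name and the sample values are made up.

```python
def parse_result(result, unit):
    """Convert a Result string to a number, guided by the Unit key.

    Assumed conventions (to be verified on the real data):
      H:MM:SS          -> hours, minutes, seconds      (returned in seconds)
      M:S:DD, M:SS:DD  -> minutes, seconds, hundredths (returned in seconds)
      #:DD             -> whole part, hundredths       (returned as a plain number)
    """
    if result == "No result":
        return None
    parts = [int(part) for part in result.split(":")]
    if unit == "H:MM:SS":
        hours, minutes, seconds = parts
        return hours * 3600 + minutes * 60 + seconds
    if unit in ("M:S:DD", "M:SS:DD"):
        minutes, seconds, hundredths = parts
        return minutes * 60 + seconds + hundredths / 100
    if unit == "#:DD":
        whole, hundredths = parts
        return whole + hundredths / 100
    raise ValueError(f"Unknown unit: {unit}")


# Made-up values, only to show the call.
print(parse_result("2:08:01", "H:MM:SS"))
print(parse_result("1:43:94", "M:SS:DD"))
```

Comparing the output of such a function with ”ResultInSeconds” would be one way to confirm that the two columns are really redundant before dropping one of them.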
5.2 Dataset 2
Number of observations (rows): 26398
Each observation represents: an athlete who won an Olympic medal (if the same
medal was won by multiple people, 1 row per person)
Observations that aren’t included? Only includes years 1920 to 2008.
Columns per row: 10
Columns details and structure:
• City: String, contains the city where the Olympic Games of this year were
hosted. 20 unique values. Most frequent value: Los Angeles, appears 2074
times. Least frequent value: Amsterdam, appears 710 times.
• Edition: Integer, contains the year (4 digits) of the games. 21 unique
values. Range: 1920-2008. For each year, the city is always the same.
Most frequent value: 2008, appears 2042 times. Least frequent value:
1932, appears 615 times.
• Sport: String, 33 unique values. For each year (variable ”Edition”), each
of the sports should normally appear, but in the older years, some of the
sports weren’t introduced yet.
• Discipline: String, 47 unique values. For each year (variable ”Edition”),
each of the disciplines should normally appear, but in the older years,
some of the disciplines weren’t introduced yet. For each discipline, the
sport is always the same.
• Athlete(s): String, 19356 unique values. Format: last name in capital
characters, followed by a comma and by the first names of the athlete.
• NOC: 3 characters string, contains the country code of the country to
which the athlete belongs. 134 unique values.
• Gender: String, 2 unique values (Men and Women), indicates whether
the athlete is male or female. Number of rows with ”Men”: 18967. Number
of rows with ”Women”: 7427. Percentage of rows with women: 28.14%
(7427 out of 18967 + 7427 = 26394 rows; 39.16% is the ratio of women to men).
Note that this doesn’t indicate that 28.14% of the participants are women,
but that 28.14% of the medal holders are women. Women’s participation is
actually much lower.
• Event: String, 442 unique values, generally contains the ”discipline” as
well as the parameters of the discipline (such as distance) but doesn’t
contain the gender requirements of the event. Sometimes discipline is
omitted and only parameters are shown.
• Event gender: 1 character string, 3 unique values (M, W and X), indi-
cates whether the competition requires participants to be men, women, or
mixed (X).
• Medal: String, contains the type of medal won. 3 unique values (Gold,
Silver, Bronze).
5.3 Difference between the two datasets
The biggest differences are:
• Size: The second dataset contains many more rows because it covers many
more categories of sports (33 instead of 4). However, the first dataset
covers more years since it goes up to 2012 (compared to 2008) and down
to 1896 (compared to 1920).
• Meaning of a row: In the first dataset, one row is one medal, whereas
in the second, one row is one athlete. In most cases, this doesn’t make any
difference, but it creates more rows for team events such as relays.
• Columns: The two datasets have different columns. The first dataset has
information about the results (time), which the second doesn’t have. The
second has information about the city and the discipline, which the first
doesn’t have. In addition, the second has the ”gender” information stored
separately from the event, with a column for the gender of the athlete (not
found at all in dataset 1) and a column for the gender requirements of the
competition (found in dataset 1 within the column ”Event”).
6 Transform the data
What could you do/would you need to do to clean or enhance the data? What
other data could you reasonably source in order to consolidate the data? You
may optionally do this but you are only expected to write about what you would
do and why.
6.1 Dataset 1 cleaning
I recommend the following operations:
• Harmonise column ”CountryName” so that it is constant over time. We
are not trying to analyse political history but sport performance, so it is
preferable to simplify the data and focus on the geographical aspects. For
example, replace all the variations of Germany such as ”German Democratic
Republic” by ”Germany”.
• Drop column ”CountryCode”, which is now fully redundant with ”CountryName”
thanks to the harmonisation.
• Drop columns ”Result” and ”Unit”, which are redundant with ”ResultInSeconds”.
The interpretation and calculations are made much easier if
everything is already in numeric format.
• Drop the disqualified athlete row.
• Investigate the ”No result” rows, and possibly drop them as well.
• Manually repair the athlete names when they have question marks, using
information found on the internet.
• Create a new column for whether an event is ”men” or ”women” only.
Then update the ”Event” column to remove all the ”Men” and ”Women”
text since it is now stored in a different column.
• Create a column for ”group performance” that is worth either 0 or 1 depending
on whether multiple athletes won the medal together (for example
with relay disciplines). This makes it easier to identify the rows that
contain multiple athletes.
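Some of the operations above can be sketched in a few lines of Python, treating each row as a dictionary keyed by column name. This is only an illustration under assumptions: the alias table is hypothetical (only the Germany example comes from the list above), the new column names ”EventGender” and ”GroupPerformance” are mine, and the sample row is made up.

```python
# Hypothetical alias table; only the Germany example comes from the text above.
COUNTRY_ALIASES = {"German Democratic Republic": "Germany"}


def clean_row(row):
    """Return a cleaned copy of one Dataset 1 row (a dict keyed by column name)."""
    cleaned = dict(row)
    # Harmonise the country name so that it is constant over time.
    cleaned["CountryName"] = COUNTRY_ALIASES.get(row["CountryName"], row["CountryName"])
    # Move the trailing "Men"/"Women" marker of the event into its own column.
    event = row["Event"]
    cleaned["EventGender"] = None
    for gender in ("Women", "Men"):
        suffix = " " + gender
        if event.endswith(suffix):
            cleaned["EventGender"] = gender
            cleaned["Event"] = event[: -len(suffix)]
            break
    # Team medals list several athletes separated by commas.
    cleaned["GroupPerformance"] = 1 if "," in row["Athlete(s)"] else 0
    return cleaned


# Made-up row, only to show the call.
example = {
    "Event": "3000m Steeplechase Women",
    "Athlete(s)": "Jane DOE",
    "CountryName": "German Democratic Republic",
}
print(clean_row(example))
```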
6.2 Dataset 2
I recommend the following operations:
• Create two new columns: one for the first name of the athlete, one for the
last name, and then drop the old column.
• Harmonise column ”NOC” so that it is constant over time. We are not
trying to analyse political history but sport performance, so it is preferable
to simplify the data and focus on the geographical aspects. For example,
replace all the variations of Germany such as ”GDR” by ”GER”.
• Create a new binary column to indicate that an athlete won as part of a
team.
• Create an integer column that contains the number of members in the
team of the athlete if the athlete was part of a team, and 1 otherwise.
Necessary to avoid giving the impression that some countries have won
more medals when they were just better at team sports.
Overall, the data is quite clean and much better structured than the first
dataset.
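The name split and the team columns for dataset 2 could be derived as in the sketch below. It assumes that all rows sharing the same year, discipline, event, event gender, country and medal belong to one team; this is my assumption, not something stated in the data, and it would miscount the rare case of two identical medals won by the same country in one event. The new column names and the sample rows are hypothetical.

```python
from collections import Counter

# Columns assumed to identify one medal shared by a team (an assumption of this sketch).
TEAM_KEY = ("Edition", "Discipline", "Event", "Event gender", "NOC", "Medal")


def add_name_and_team_columns(rows):
    """Split the athlete name and add team size / team flag columns to Dataset 2 rows."""
    def medal_key(row):
        return tuple(row[column] for column in TEAM_KEY)

    # Number of athlete rows sharing the same medal.
    team_sizes = Counter(medal_key(row) for row in rows)
    enriched = []
    for row in rows:
        # Names are stored as "LASTNAME, First names".
        last_name, _, first_name = row["Athlete(s)"].partition(",")
        size = team_sizes[medal_key(row)]
        new_row = {key: value for key, value in row.items() if key != "Athlete(s)"}
        new_row["LastName"] = last_name.strip()
        new_row["FirstName"] = first_name.strip()
        new_row["TeamSize"] = size
        new_row["IsTeam"] = 1 if size > 1 else 0
        enriched.append(new_row)
    return enriched


# Made-up rows, only to show the call.
relay = {"Edition": 2008, "Discipline": "Athletics", "Event": "4x100m relay",
         "Event gender": "M", "NOC": "AAA", "Medal": "Gold"}
rows = [
    {**relay, "Athlete(s)": "DOE, John"},
    {**relay, "Athlete(s)": "ROE, Jim"},
    {**relay, "Event": "100m", "Athlete(s)": "DOE, John"},
]
enriched = add_name_and_team_columns(rows)
print(enriched[0])
```

Weighting each row by 1 / TeamSize when counting medals would then avoid favouring countries that are simply better at team sports.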
6.3 Data enhancement
The biggest enhancement that I recommend is to merge the two datasets together
to combine the advantages of each.
Concretely, here is how I would proceed:
• Start from dataset 2
• Add a column ”ResultInSeconds”
• Fill in ”ResultInSeconds” for each row of dataset 2 using the
corresponding row from dataset 1
• Add all the rows from dataset 1 for year 2012 and years 1896-1916 to
dataset 2, since dataset 2 doesn’t have any information on these years
Our new dataset now combines the information of the two datasets.
However, this has also produced a lot of missing values,
notably in the column ”ResultInSeconds” and for all the rows that we added
from dataset 1 (as well as no information on many disciplines for the year 2012).
These missing values might create some technical problems when generating the
different visualisations, and it might not be worth the effort. Depending on what
we want to visualise, we might decide to either use this new merged dataset, or
only dataset 1, or only dataset 2.
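The core of the merge (filling ”ResultInSeconds” in dataset 2 from dataset 1) could look like the sketch below. It is a simplification under strong assumptions: that the gender has already been split out of the dataset 1 event name into a hypothetical ”EventGender” column, that event names are spelled identically in both worksheets, and that year, event, gender, country code and medal together identify a medal. Appending the extra years from dataset 1 is not shown. Rows without a match simply get a missing value, which is where the missing values discussed above come from.

```python
def merge_results(dataset2_rows, dataset1_rows):
    """Add ResultInSeconds from Dataset 1 to each Dataset 2 row (None when no match)."""
    gender_code = {"Men": "M", "Women": "W"}
    # Index Dataset 1 by the assumed medal key.
    lookup = {}
    for row in dataset1_rows:
        key = (row["Games"], row["Event"], gender_code.get(row["EventGender"]),
               row["CountryCode"], row["Medal"])
        lookup[key] = row["ResultInSeconds"]
    merged = []
    for row in dataset2_rows:
        key = (row["Edition"], row["Event"], row["Event gender"], row["NOC"], row["Medal"])
        merged.append({**row, "ResultInSeconds": lookup.get(key)})
    return merged


# Made-up rows, only to show the call.
dataset1 = [{"Games": 2008, "Event": "100m", "EventGender": "Men",
             "CountryCode": "AAA", "Medal": "Gold", "ResultInSeconds": 10}]
dataset2 = [
    {"Edition": 2008, "Event": "100m", "Event gender": "M", "NOC": "AAA", "Medal": "Gold"},
    {"Edition": 2008, "Event": "foil individual", "Event gender": "W", "NOC": "BBB", "Medal": "Silver"},
]
merged = merge_results(dataset2, dataset1)
print(merged)
```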
Ideally, I would look on the internet for a dataset similar to dataset 2 but that also includes
the missing years and the result in seconds. If we find some more
complete datasets that also include personal information on the athletes (such
as weight, education, ethnicity, etc.) it could be very interesting for visualisa-
tions. I would also look to complete this data with information related to the
games themselves, such as the costs, the number of visitors, etc.
7 Exploration
Use Excel/Tableau to visually explore the two datasets in order to deepen your
appreciation of their physical properties and to help you brainstorm potentially
interesting angles of analysis.
I experimented with different visual formats (histograms, bar charts, lines,
pie charts, maps, etc.) and explored how each value is best represented
(colour? label? size?). I tried to identify which subgroups have the most data
and thus would be interesting subjects for a visualisation (e.g. the USA,
since it has won a lot of medals), which variables have the most or the least
variation (e.g. the proportion of gold medals overall is the same each year,
but it varies a lot if we only take the proportion of gold medals for a
specific country) and which variables take too many possible values to be
represented entirely (for example, there are too many disciplines to represent
them all in a single pie chart).
A few of the graphs I have generated through my exploration are shown be-
low.
7.1 Dataset 1
Top 25 countries with the most medals:
Bubble map by number of medals:
8 Editorial
Rationalise a list of at least 5 distinct, interesting editorial perspectives: artic-
ulate the angle and the framing/focus applied.
Following the insights gained through the exploration of the data, I came
up with a few angles that I thought would be interesting to develop and would
fit well in a journal, for example.
For the following perspectives, unless otherwise specified, I consider the merged
dataset (combination of dataset 1 and dataset 2) proposed above.
8.1 Perspective 1
Angle. Which countries have historically been the most important in the
Olympic Games?
Framing. Take all the years from 1896 to 2012, all the sports, all the
disciplines and all the events. Count each team medal only once, not once per
team member.
Focus. Focus on the top 20 countries, and show the composition of the medals
(i.e. the proportion of gold, silver and bronze) through a stacked bar chart.
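As an illustration of this framing, the sketch below counts each team medal
once and computes the gold/silver/bronze proportions for the top countries.
It is a plain-Python sketch under assumptions: the field names and sample
rows are made up, and a medal is identified by (year, event, country, medal),
so two compatriots tying for the same medal would be collapsed into one.

```python
from collections import Counter, defaultdict

# Made-up sample rows; field names are assumptions.
rows = [
    {"year": 2008, "event": "Relay", "country": "AAA", "medal": "Gold", "athlete": "a1"},
    {"year": 2008, "event": "Relay", "country": "AAA", "medal": "Gold", "athlete": "a2"},
    {"year": 2008, "event": "100m", "country": "AAA", "medal": "Silver", "athlete": "a1"},
    {"year": 2008, "event": "100m", "country": "BBB", "medal": "Gold", "athlete": "b1"},
]


def medal_composition(rows, top_n=20):
    # A set keeps one entry per medal, so team members collapse into one medal.
    unique_medals = {(r["year"], r["event"], r["country"], r["medal"]) for r in rows}
    counts = defaultdict(Counter)
    for _, _, country, medal in unique_medals:
        counts[country][medal] += 1
    # Rank by total medals (ties broken alphabetically) and keep the top N.
    ranked = sorted(counts.items(), key=lambda kv: (-sum(kv[1].values()), kv[0]))[:top_n]
    return [
        (country, {m: c[m] / sum(c.values()) for m in ("Gold", "Silver", "Bronze")})
        for country, c in ranked
    ]


composition = medal_composition(rows)
```

Each entry of composition holds the three proportions that make up one
stacked bar.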
8.2 Perspective 2
Angle. Has the performance required to win a medal in athletics changed
over the years?
Framing. Use dataset 1, for years 1896 to 2012. Only consider rows that have
a result in seconds stored. Exclude team competitions.
Focus. Group by categories (distance and gender) and show one time series
line for each category. Years on the x axis and result on the y axis. Draw the
men's and women's lines in two different colours. Draw multiple graphs with
different y scales, or use a logarithmic y scale, to avoid losing accuracy on
some categories (e.g. sprint vs marathon, where the events can last a hundred
times longer).
8.3 Perspective 3
Angle. How big is the gap between women's and men's results in the 100m?
Has it decreased over the years?
Framing. Use dataset 1, for years 1928 (introduction of the women's 100m
event) to 2012. Take only gold medals.
Focus. Focus on the difference in seconds and plot it as a bar chart (which
could take negative values if a woman gets a better result in a given year),
with years on the x axis and result difference on the y axis.
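The value plotted for each year could be computed as in the sketch below. It
is written in plain Python and rests on assumptions: the rows are taken to be
already filtered to 100m gold medals, the field names (year, gender,
ResultsInSeconds) are hypothetical, and the times are made up.

```python
# Made-up 100m gold medal rows; field names and times are assumptions.
rows = [
    {"year": 1896, "gender": "Men", "ResultsInSeconds": 12.0},
    {"year": 1928, "gender": "Men", "ResultsInSeconds": 10.8},
    {"year": 1928, "gender": "Women", "ResultsInSeconds": 12.2},
    {"year": 2012, "gender": "Men", "ResultsInSeconds": 9.6},
    {"year": 2012, "gender": "Women", "ResultsInSeconds": 10.8},
]


def gender_gap(rows):
    # Index the winning time by (year, gender).
    best = {(r["year"], r["gender"]): r["ResultsInSeconds"] for r in rows}
    years = sorted({year for year, _ in best})
    # Women's time minus men's time: positive when the men's winner was faster,
    # negative when the women's winner was faster. Years lacking either are skipped.
    return {
        y: round(best[(y, "Women")] - best[(y, "Men")], 2)
        for y in years
        if (y, "Women") in best and (y, "Men") in best
    }


gap = gender_gap(rows)
```

The resulting mapping from year to gap in seconds corresponds directly to the
bars of the chart.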
8.4 Perspective 4
Angle. Are new categories still regularly added to the games?
Framing. Use dataset 2, for years 1980 to 2008 (recent years only). Count
every new event as a new category, including the introduction of a new gender-
specific category. If a category is dropped from the games, also reduce the count
by 1.
Focus. Focus on the difference and plot it as a bar chart (which could take
negative values if a year has fewer categories than the previous one), with
years on the x axis and difference of categories on the y axis. Show the total
number of categories on top of each bar, as well as the total number of sports
in parentheses. If the new categories are part of a new sport, draw this
portion of the bar in a different colour (i.e. like a stacked bar chart).
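The numbers behind this chart (net difference, total number of categories and
number of sports per edition of the games) could be derived as sketched
below. This is a plain-Python sketch under assumptions: a category is
identified by (sport, event, gender), the field names are hypothetical and
the sample rows are made up.

```python
# Made-up sample rows; field names are assumptions.
rows = [
    {"year": 1980, "sport": "Athletics", "event": "100m", "gender": "Men"},
    {"year": 1980, "sport": "Athletics", "event": "100m", "gender": "Women"},
    {"year": 1984, "sport": "Athletics", "event": "100m", "gender": "Men"},
    {"year": 1984, "sport": "Athletics", "event": "100m", "gender": "Women"},
    {"year": 1984, "sport": "Athletics", "event": "Marathon", "gender": "Women"},
    {"year": 1984, "sport": "Cycling", "event": "Road race", "gender": "Women"},
    {"year": 1988, "sport": "Athletics", "event": "100m", "gender": "Men"},
    {"year": 1988, "sport": "Athletics", "event": "100m", "gender": "Women"},
    {"year": 1988, "sport": "Athletics", "event": "Marathon", "gender": "Women"},
]


def category_changes(rows):
    # Collect the distinct categories held in each year.
    by_year = {}
    for r in rows:
        by_year.setdefault(r["year"], set()).add((r["sport"], r["event"], r["gender"]))
    years = sorted(by_year)
    changes = []
    # Compare each edition with the previous one: additions minus removals.
    for prev, cur in zip(years, years[1:]):
        changes.append({
            "year": cur,
            "difference": len(by_year[cur]) - len(by_year[prev]),
            "total": len(by_year[cur]),
            "sports": len({sport for sport, _, _ in by_year[cur]}),
        })
    return changes


changes = category_changes(rows)
```

The first year in the data has no previous edition to compare with, so it does
not get a bar in this sketch.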
8.5 Perspective 5
Angle. How difficult is it for an individual to win multiple medals?
Framing. Consider all individuals who won a medal, individually or as part
of a team, from 1896 to 2012, for any discipline in any category. Exclude
individuals who only won one medal.
Focus. Only show the count for each number of medals. Use a bar chart with
the total number of medals won on the y axis (excluding 1) and the number
of athletes in that position (frequency) as the length of the bar on the x axis.
Presumably, there are going to be very few people with multiple medals, and
only one or two at the top. Focus on the few outliers who have won more medals
than anyone else in history and show a few personal details on these people in
an "infobox" under the bar chart so that the readers can better understand how
it is possible to have so many victories.
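The frequency table behind this bar chart could be built as in the sketch
below. It is a plain-Python sketch that assumes one row per medal won and a
hypothetical athlete field that identifies each individual uniquely (in
practice, homonyms would need to be disambiguated); the sample rows are made
up.

```python
from collections import Counter

# Made-up sample: one row per medal won; the field name is an assumption.
rows = [
    {"athlete": "a"}, {"athlete": "a"}, {"athlete": "a"},
    {"athlete": "b"}, {"athlete": "b"},
    {"athlete": "c"}, {"athlete": "c"},
    {"athlete": "d"},
]


def medal_frequency(rows):
    # Medals per athlete, then athletes per medal count (single medals excluded).
    medals_per_athlete = Counter(r["athlete"] for r in rows)
    frequency = Counter(n for n in medals_per_athlete.values() if n > 1)
    return dict(sorted(frequency.items()))


frequency = medal_frequency(rows)
```

The keys of frequency are the medal counts on the y axis and the values are
the bar lengths; the largest keys point to the outliers for the infobox.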