Visualisation
Imperial College London, MSc Business Analytics
Homework 1
Jonathan Zimmermann
14-02-2016
Assignment 1a: Developing empathy and forming
ideas within different problem contexts
Introduction
Consider the curiosities, circumstances, purpose and ideas potentially involved
in the challenge of creating a visualisation/infographic in each of the following
made-up scenarios.
Compile a detailed briefing document outlining your assumptions, definitions
and ideas about the context and vision for each scenario.
0.1 Data source
Website of the department of criminal justice of the state of Texas:
https://www.tdcj.state.tx.us/death_row/dr_executed_offenders.html
Date of oldest execution of dataset: 12/07/1982
0.2 Data sample
1 Scenario 1
A pro-capital punishment US newspaper reporting on the milestone of the 500th
execution (pretend it is 2013).
Assumptions. We consider a medium-size daily newspaper in Texas. Quite
popular among the conservative/rural part of the population mostly through its
print edition, it has managed over the recent years to attract a growing base of
younger readers through its online edition thanks to a new policy of making a
few selected articles freely available on its website for non-subscribers. This
younger readership, with less strongly defined political opinions, comes from a
more mixed background but still represents only a small minority of the readers.
The newspaper is still majority-owned and managed by a few members of
a conservative Texan family. There is a long-standing tradition of
operating the journal with profit only as a secondary objective, and the owners
have a strong influence on the content of the articles, which is generally shaped to match
the (Republican) political views of the family.
I am the youngest employee of a small team in charge of providing various
graphics and illustrations for the articles. I have recently been hired as part of
an initiative to modernize the visual elements of the journal and make better
use of data-backed visualisations. As the journal is still experimenting with such
methods, I work alone but with the support of the rest of the team for advice
and to provide me with elements I might need, such as drawings.
1.1 Context: The reason
Outline what you think might be the essence of the curiosity: What question(s)
do you think the potential audience might need answering/find interesting that
the visualisation would ultimately present?
Stakeholders Intrigue. The family owning the journal has been personally
very involved with the passing of a few pro-capital punishment bills in Texas.
It wants to use the milestone of this 500th execution as an opportunity to
report on the status of death penalty in general. I am to create a double-page
infographic, which the owners hope will illustrate with numbers the impact that
capital punishment has had on the safety of Texans and show the benefits of
such laws.
Audience Intrigue. Most of the audience is likely to hold a similar position
to the newspaper, i.e. be supportive of capital punishment, in which case the
readers will be particularly looking for information that confirms and supports
their views.
The purpose of this newspaper is still to attract a fairly broad public rather than
a highly specific political audience. Therefore, a fraction of the readers (notably
the younger readership of the online edition) might be relatively new to the
question and looking for clues that will help them take a position in the capital
punishment debate. Even though these readers might be looking for a more
neutral perspective, the newspaper will want to influence their position towards
supporting capital punishment. Therefore, for this audience, the visualisation
also needs to answer the question of why capital punishment is a good policy.
The existence of this second type of audience, however, will force the newspaper
and the visualisation to look a bit more objective.
As is typical when celebrating a milestone, readers might expect
to find information "aggregating/summarising" the situation since the
introduction of capital punishment rather than specifics regarding the story of
each sentenced criminal. The kinds of questions readers might have include:
• How many people were sentenced each year? Has this figure been changing
over the years? Is it appropriate, should it be higher/lower?
• Where did these crimes happen? Is it next to where I live? Am I con-
cerned?
• What is the typical profile (age, race, type of crime) of the sentenced
criminal? Has this profile been changing over the years?
• What are the main arguments, metrics and studies supporting my views?
What numbers can I quote to make my opinion more credible when
discussing the topic with others?
• Who are the important people (politicians, celebrities, . . . ) who have the
same opinion as me?
• How fundamentally bad are these people? What kind of atrocities
have they generally committed to (rightfully) deserve capital
punishment? What makes me fundamentally different from them?
1.2 Context: The circumstances
Work through the list (of 10 main headings) and describe your critical thinking
about any assumptions, definitions or self-imposed factors you think might be
relevant/existent. If any require no definition or are not relevant to that scenario,
explain why not?
1.2.1 People: Stakeholders
As explained earlier, the managers of the journal are very involved in this
article. Even though I am supposed to submit all my work to my direct supervisor
for approval, it has been informally defined for this project that the owners of
the journal would be the ones taking the final decision of whether and how to
include my infographic. Therefore, for the length of this project, I am de facto
reporting directly to them without consulting my supervisor.
Their expectations have been clearly phrased. They hope that the visualisa-
tion will help to convey their political message. They specifically asked for an
infographic that would be "powerful" and make them and the readers "proud
to be American, proud to be Texan, proud of the constitution". They want
the visualisation to be "fact-based" and "scientific looking", but insisted that it
must not include any "metrics made up by liberals to spread doubt in the minds
of true Americans". By this last phrase, they were referring to recently
published economic papers finding no causal relation between the crime rate and
the existence of capital punishment.
1.2.2 People: Audience
As explained earlier, the typical audience is Texan, conservative and rural. A
small fraction of the readers live in other states, and fewer than 1% of the readers
are international. The median reader of the print edition is 54 years old and
the median reader of the online edition is 29 years old. The online edition only
represents 4% of the total revenues, but is estimated to account for about 20%
of all the readers.
From a previous survey, we know that the typical reader has a high school
education and possibly "some university", but most readers' professional
activity doesn't include "interacting with numbers on a regular basis". Previous
attempts to use data-based infographics have shown that the readers have a
strong preference for intuitive visualisations that can be quickly grasped.
A small but significant fraction of the readers is strongly supportive of capital
punishment and already expects the journal to publish a special article for this
occasion. These readers would be very disappointed if the subject were not
covered extensively or if the coverage contained inaccuracies.
1.2.3 Constraints: Pressures
The 500th execution is scheduled to take place in two days. I was informed
this morning that I am assigned to this project and have to deliver the final
visualisation in two days, by 10 p.m., so that there is time for last-minute changes.
It has been made clear to me that my performance on this work will be
assessed and used as a reflection of my skills in general. As I am a new employee,
the managers have not yet received much feedback on my performance,
and this will be an excellent opportunity to distinguish myself. The managers
know that and thus expect me to put in some extra hours on the project over the
next two nights.
For a double-page visualisation, the policy of the journal is to allocate a $400
budget to purchase rights for any necessary illustrations. As for every project, I
am also free to request any help I would deem necessary from the other members
of the design team who are not currently assigned to a particular project.
1.2.4 Constraints: Rules
The topic is expected to be the day after tomorrow’s headline. In addition to
the front page, the article will occupy a total of 5 pages. My infographic will
occupy a double page (pages 2 and 3 of the article), followed by two more pages
of text (pages 4 and 5 of the article). The newspaper has a tabloid format, i.e.
each page has dimensions of 430 mm × 280 mm. There is no limitation regarding
colours, but it would be preferable for the infographic to respect the general colour palette
of the journal (mostly variations of red, similar to those of the Republican Party).
The policy of the newspaper is to design all visualisations with the print edition
as the only edition in mind. The infographic might be adapted for the web later
only if the format allows.
1.2.5 Consumption: Frequency
This is a one-off visualisation and will only be used for this one edition.
1.2.6 Consumption: Setting
The newspaper will be distributed through the regular channels. This will be a
Thursday edition and thus won’t be distributed over the week-end. It is very rare
for readers to read an old edition, so the infographic should only be ”consumed”
on Thursday. If the web version gets popular, however, it is possible that a few
people keep visiting the article page for a few more days.
1.2.7 Deliverables: Size
I am to deliver a PDF version of a ready-to-print infographic by the deadline, and
store in the digital archive of the journal all the files and documents necessary
to reproduce or adapt my work. I can, if I wish, submit alternative versions of
the infographic for the editors to choose from, but all submitted work needs to
be final and ready to print. I must be ready to accommodate any last-minute
change.
1.2.8 Deliverables: Format
The standard format for the deliverables is PDF. I am not to worry about the
web adaptation, generally a simple cut from the raw PDF file.
1.2.9 Resources: Creators
As explained earlier, I am to work alone, with the support of the rest of the
design team if necessary.
1.2.10 Resources: Technical
The newspaper has licenses for most popular design software, including the
complete Adobe suite. I am free to use any software I want. If the purchase of
additional software is required, I can either use the budget allocated to the
project (up to $100 per software package) or make a request to my supervisor.
I have been instructed to use the list of executed offenders of the Texas
Department of Criminal Justice as my main source of data for this infographic, but am free to
use any other reliable source. The newspaper subscribes to a few premium-access
databases that I can use to obtain additional data if necessary.
1.3 Vision: The purpose map
Describe and reason what the possible aim of this work would be in terms of what
experience (the Exs) it would facilitate and through what tone of voice (Read vs.
Feel)
The infographic will consist of many numbers (since there are a lot of differ-
ent and important statistics to display) but be mostly visual and intuitive, i.e.
the meaning of these figures should be easily understandable by the context in
which they are displayed. This would put this visualisation in the left end of
”exhibitory” on the purpose map, since most of the conclusions from the data
are directly exposed to the reader. But the explanations should remain short,
and the total amount of text at a minimum level, so the infographic couldn’t be
qualified as ”explanatory”.
These numbers have strong implications for human lives and security, so
readers might feel very strongly after reading the infographic. The raw data has
potential for a lot of "reading" and "quantitative analysis", but making a topic
as sensitive as executions too "scientific" would likely offend many readers.
On the other hand, totally excluding logic and making it a purely emotional
topic would be neither effective nor consistent with the policy of the journal. Thus, the
infographic should be placed at the top of the "feeling" category, close to the
line with "reading".
We would then have the following purpose map:
1.4 Vision: Your ideas
Sketch out roughly what you think this work could look like: what colours, what
keywords, rough drawing, any other work out there that you can be inspired/in-
fluenced by? (not a test of artistry, just map out ideas)
The two pages could be merged together to form only one visualisation. For
the sake of readability, however, no word should be split across the two pages. If
necessary, a large graph can be split across the two pages, but this should be avoided.
The main kinds of elements that could be contained include:
• A bar chart showing the count of executions across time (either one year
periods or five year periods).
• Other time series plotted in parallel with the previous bar chart, to show
potential correlation between executions and another crime metric. Could
also show the impact of new bills on the number of executions per year.
• A timeline recalling the key historical dates (such as the introduction
of new legislation).
• A map of Texas to show where most of the executions/crimes happen.
Some points of the map can potentially be emphasized if they can help
explain the crime rate (e.g. the border with Mexico?).
• A vertical bar chart showing the most frequent words in the criminals'
last words (e.g. god, pardon, love, ...).
• Featured profiles of executed criminals (preferably perpetrators of atrocious
crimes who showed little empathy for the victims).
• Small text boxes next to relevant charts to provide additional insights and
anecdotes. E.g.: What to think of this map? What to remember from
this graph?
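Two of the elements listed above, the bar chart of executions per period and the chart of frequent words in last statements, both come down to simple counting. The following is only a minimal sketch, assuming the execution dates are available as MM/DD/YYYY strings (the format of the dataset's oldest date, 12/07/1982) and the last statements as plain strings. The function names, the stop-word list and every sample value other than that first date are hypothetical and do not come from the dataset.

```python
from collections import Counter
from datetime import datetime
import re

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"i", "the", "to", "and", "you", "my", "for", "a", "of", "all"}


def count_by_period(dates, years_per_bin=1):
    """Count executions per period of `years_per_bin` years.

    `dates` are assumed to be MM/DD/YYYY strings. Each key is the first
    year of its period, e.g. 1980 for 1980-1984 when years_per_bin=5.
    """
    counts = Counter()
    for text in dates:
        year = datetime.strptime(text, "%m/%d/%Y").year
        counts[year - year % years_per_bin] += 1
    return dict(sorted(counts.items()))


def top_words(statements, n=5):
    """Return the n most frequent words that are not stop words."""
    words = Counter()
    for statement in statements:
        for word in re.findall(r"[a-z']+", statement.lower()):
            if word not in STOP_WORDS:
                words[word] += 1
    return words.most_common(n)


if __name__ == "__main__":
    # Only the first date comes from the source; the others are made up.
    sample_dates = ["12/07/1982", "03/14/1984", "10/30/1984", "01/16/1985"]
    print(count_by_period(sample_dates))      # one-year periods
    print(count_by_period(sample_dates, 5))   # five-year periods

    # Made-up statements, for illustration only.
    sample_statements = ["I love you all", "God, forgive me", "love and peace"]
    print(top_words(sample_statements, 3))
```

The resulting counts could then be passed to whichever charting or design tool is used for the final infographic; the sketch deliberately stops before any plotting.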
As for the background, it could be made of something patriotic such as a
partially transparent zoomed-in American/Texas flag or symbol.
Overall, the infographic would look like something similar to this:
2 Scenario 2
Analysts at the Texas Department of Criminal Justice staff reporting to senior
management at the Texas Department of Criminal Justice.
2.1 Context: The reason
Outline what you think might be the essence of the curiosity: What question(s)
do you think the potential audience might need answering/find interesting that
the visualisation would ultimately present?
We are a team of 4 Junior Analysts (including one summer intern) and we
have been assigned to a two-day project to prepare all the visual elements of
the biennial public report on death penalty in Texas (part of a legal obligation of
transparency of the Texas Department of Criminal Justice). We are supervised
by two senior analysts who will be responsible for handing the final report to senior
management for final review and approval.
The senior analysts have provided us with a detailed list of all the main
components of the report. The previous report was 83 pages long (including appendix),
with a total of 34 tables and graphs. The responsibility of the graphs has been
split across the team of junior analysts and I have been assigned to prepare 8
of the 34 visuals.
Stakeholders Intrigue. We are to report directly to the senior analysts, who
will, in turn, report to the senior management. The senior analysts will be in
charge of writing most of the core of the report. The content of the report will
be highly influenced by the data, which will be taken in part from the tables and
graphics we are to prepare. Thus, the senior analysts (our direct supervisors)
hope that these graphical elements will be structured in a way that facilitates
their writing and inspires them for this year's report. They will also want
to make sure that the graphs and tables are similar to those of previous
reports to avoid any trouble with the senior management.
The kind of questions they might have include:
• Can I find the figure I want easily?
• Can I compare it to last period’s figure easily?
• Is the format the same as what I am used to?
• Are there any mistakes?
• Do I need to ask for an additional graph?
The senior management will be the ones directly responsible for the publication
of the report, thus will be held accountable for any positive or negative conse-
quence that might result from it. They will first have a global look at the report
to check for quality, then read it more attentively to verify that they agree with
the content and, hopefully, gain additional knowledge (or fill their knowledge
gaps) on the topic. If the quality doesn’t meet the standards they will have to
send it back to the senior analysts or correct it themselves. They hope to avoid
this step to reduce their workload.
But fundamentally, the report is more a formality for the senior management
than a real source of insights or potential for career progression, so their biggest
intrigue really is "Does this draft look satisfactory enough to allow me to move
on to more interesting projects?". Internally, this report is really viewed as
a waste of time and an added cost of the death penalty.
Audience Intrigue. As this report is part of a legal obligation of trans-
parency of the Department of Criminal Justice, it doesn’t have a particular
target audience, which will be by nature very diverse. It will include journalists,
researchers, NGOs, writers, curious members of the public, etc. They might have any kind of
question and might want to find the answer in the report. They might even
have found the report totally randomly through a search engine.
Most of this audience will not need to read the entire report but only to find
the data or piece of information they need. Sometimes, they won’t even need
to find it, but just need to be sure the information is contained in the report in
order to use it as a reference in another (formal and lengthy) paper or report.
Many of these readers will be used to the format of previous editions, and might
take a dim view of any significant change from the usual format.
The questions of the audience might include:
• What was the profile of the criminals executed over the last two years?
• What is the opinion of the Department of Criminal Justice on the evolution
of the situation?
• Does this report speak about the overall cost of executions so that I can
add it to my "references" slide at the end of my "Development of Criminal
Justice in Early Communities" lecture in my "SOC 101 - Introduction to
Sociology" class?
However, not all of the audience’s potential questions are relevant to the
design of the visualisations (and of the report), since the main objective of
that document is to meet the legal obligations of the department rather than
to satisfy ”customers”. For that reason, the most important question of the
audience that the Department of Criminal Justice will want to answer is
”Can this information be found somewhere in the report?”.
2.2 Context: The circumstances
Work through the list (of 10 main headings) and describe your critical thinking
about any assumptions, definitions or self-imposed factors you think might be
relevant/existent. If any require no definition or are not relevant to that scenario,
explain why not?
2.2.1 People: Stakeholders
As explained earlier, the main stakeholders for me are my two direct supervisors,
the senior analysts. The production of these visualisations is a routine task for
everybody, me included, and it cannot lead to any promotion or bonus. The
end stakeholders are the senior management, but unless we (Junior Analysts)
produce work of such low quality that the senior management needs to find
someone responsible to fire, we will not have any interaction with them.
Fundamentally, the main stakeholder is the United States Constitution and
the rule of law: we need to produce graphs that meet the requirements of our
legal obligation, not for us, but as part of our duty to serve the best interests
of the country.
2.2.2 People: Audience
As explained earlier, the audience is very diverse.
We estimate that the online version of the document (available on the De-
partment of Criminal Justice’s website) represents the majority of the readers.
From Google Analytics statistics, we know that 65% of the downloads of pre-
vious reports have been made by people (or computers) located in Texas. 54%
of the total ”pageviews” on the report come from organic search (mostly from
Google), 23% come from an internal link on the department's website,
8% come from direct access (generally people who copied and pasted the link of
the report into their web browser) and 15% come from links from other websites
pointing to the report. Less than 1% of the downloads come from social
networks.
The print version of the report (around 400 copies) circulates mostly among
institutions and libraries, but can also be ordered by anyone through the
department's website ($15 administrative fee). In most cases, these institutions
are interested in owning a physical copy of the report mostly for archiving purposes
or for the sake of completeness.
The direct audience is generally very sophisticated: its members hold a bachelor's
degree, master's degree or PhD and are less sensitive to the form of the report than to its content.
The report also has an indirect audience, that is, people who will read extracts
of it or content inspired by the report but rephrased to be more accessible, for
example in newspapers, magazines or other media. Generally, these people will
not even be aware that the content they are reading originally comes from this
report. The profile of this indirect audience is very different from that of
the direct audience: it will be less educated and less captivated by the topic.
2.2.3 Constraints: Pressures
The graphics must be handed in by 5 pm tomorrow. No extension will be granted
(or ”should be required”, according to the senior analysts). No budget is allo-
cated to this project.
There is no particular pressure on the project. Everybody expects it to be a routine
task.
2.2.4 Constraints: Rules
Layout must be close or identical to that of the previous year. Allowed to make
some minor changes if necessary. The visual appearance can be changed com-
pared to the first year, but only if time permits and if all the four junior analysts
agree on the modifications (as all the visuals must obey the same design guide-
lines).
The original report is in colour but it is not uncommon that people photo-
copy parts of it in black and white. In particular, the charts are often used in
academic contexts as teaching material and handed out to students. Therefore,
the black and white outcome must always be kept in mind when designing the
different parts of the report.
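One way to keep the black-and-white outcome in mind is to estimate how each chart colour would look once photocopied. The following is a rough sketch only, assuming colours are given as 8-bit RGB triples and approximating a photocopy with the Rec. 601 luma weights (0.299, 0.587, 0.114). The palette, the function names and the threshold of 40 grey levels are hypothetical choices, not department guidelines.

```python
from itertools import combinations


def grey_level(rgb):
    """Approximate grey value (0-255) of an 8-bit RGB colour (Rec. 601 luma)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def confusable_pairs(palette, min_gap=40):
    """Return pairs of colour names whose grey levels differ by less than min_gap."""
    clashes = []
    for (name_a, rgb_a), (name_b, rgb_b) in combinations(palette.items(), 2):
        if abs(grey_level(rgb_a) - grey_level(rgb_b)) < min_gap:
            clashes.append((name_a, name_b))
    return clashes


if __name__ == "__main__":
    # Hypothetical palette: red and mid-green look distinct in colour
    # but land on almost the same grey value.
    palette = {
        "red": (255, 0, 0),
        "green": (0, 128, 0),
        "yellow": (255, 255, 0),
    }
    print(confusable_pairs(palette))
```

Any pair flagged this way would need a second cue that survives photocopying, such as hatching, direct labels or different line styles.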
The font used in the appendix section of the report is Times New Roman 11pt.
The report is printed on A4 pages with 1” margins. Each page contains a
discreet page number as well as the name of the department of justice at the
bottom of the page. The top of the page also includes the name of the current
section of the document. The formatting of the report will only take place once
all the elements have been completed.
All the tables and numeric values of the appendix have to be written in pure
text (no images) when possible so as to make them searchable (by the user or
the search engine).
2.2.5 Consumption: Frequency
A new report is published every two years. Most of the visuals and tables can
be replicated from the previous report with only a limited amount of changes
necessary. Every time, however, the structure is slightly altered and some ta-
bles/visuals are dropped or added.
2.2.6 Consumption: Setting
The report is most popular in the weeks following its publication. However, it is
supposed to remain ”valid” for two years, so figures and conclusions that only
have very short term importance should preferably be avoided.
As explained earlier, it is distributed through different channels. The main one
is the digital version freely available on the website of the department of justice
(and indexable by search engines). The print run was only around 400
copies for the previous report, but this number is now quite flexible as the department
has decided to move to Print-On-Demand (POD) technology this year,
through Amazon CreateSpace, to save costs and gain flexibility.
2.2.7 Deliverables: Size
34 visuals in total, but I am only in charge of 8. If I finish early, I might get
assigned to some of the visuals of my three colleagues.
More specifically, the eight visuals consist of:
• Three full page tables, mostly containing numbers
• One half-page map with legend
• One full page set of bar charts
• A full page vertical timeline
• Two half-page combo line/pie/bar charts (new for this year)
The two half-page combo charts should take the most time as I will have
to design them from scratch. Two of the three tables require actively searching
for information across a diverse set of sources. The other five visuals consist mostly of
updating information and won’t represent more than a few hours of work.
2.2.8 Deliverables: Format
The eight visuals must be delivered in eight different .docx Microsoft Word
documents, in the format specified above. As explained above, output must
be optimized for both print (including black and white photocopies) and web
(including text-only search engines).
Another person will be in charge of adapting the Word documents to PDF
format for web distribution and POD through Amazon CreateSpace.
2.2.9 Resources: Creators
I work as part of the team, but I am in charge of my own visuals. I have a good
relationship with my teammates, so it is likely that I will ask for their advice
in case I have any doubt. The four junior analysts share the same office and
collaboration is frequent.
2.2.10 Resources: Technical
All the employees are equipped with a basic Windows desktop computer. The
Microsoft Office and Adobe suites are installed by default on all the computers,
as well as a few additional programs. We are free to download additional
software if necessary, but no budget is allocated. However, we are expected to only
deliver files that can be read with the department's default software. In
particular, all the visuals must be contained within .docx documents.
2.3 Vision: The purpose map
Describe and reason what the possible aim of this work would be in terms of what
experience (the Exs) it would facilitate and through what tone of voice (Read vs.
Feel)
The visuals and tables clearly aim to be as objective and informative as pos-
sible. The focus is on the quantity and the accuracy rather than the ease of
understanding. Clearly, these visualisations should occupy the top left corner
of the purpose map:
2.4 Vision: Your ideas
Sketch out roughly what you think this work could look like: what colours, what
keywords, rough drawing, any other work out there that you can be inspired/in-
fluenced by? (not a test of artistry, just map out ideas)
For most of the visuals, no creativity is required since it will only be about
updating the data with more recent information. Most of the data required for
the eight visuals can be found directly from the list of executed offenders on the
department’s website.
Roughly, the eight figures would look a bit like this:
The inspiration for the new combo charts comes directly from the other
already existing similar combo slides in the rest of the appendix. Most of the
text boxes are simply explaining the content of the tables and graphs. The
vertical timeline will be identical to that of the last report, except that two
new points must be added with a small text paragraph (the margin has to be
reduced to accommodate the new data points).
3 Scenario 3
A campaign group looking to help influence a debate about the ending of capital
punishment.
3.1 Context: The reason
Outline what you think might be the essence of the curiosity: What question(s)
do you think the potential audience might need answering/find interesting that
the visualisation would ultimately present?
I am part of a small campus association fighting against the death penalty, made
up of a few friends with strong convictions. A major conference on the topic of
the death penalty will be hosted at our university in a few weeks. The organising
committee is made up of a few university officials with strong ties to the Republican
Party, and most of the speakers expected at the conference are well
known for their systematic conservative political bias.
Due to the importance of the debate, which is expected to attract up to a
few thousand students and will be broadcast live on television, we decided to take
action and protest against the structure in place. The first step of our plan of
action is to raise awareness through a strong campaign both on campus and on
social networks.
We will start our online campaign by publishing a few viral infographics on
the university's Facebook pages.
Stakeholders Intrigue. The group is strongly committed to relying on facts for
all its points. But we know that our "opponents" will mostly rely on emotions
and ”shock” arguments, so we decided to adapt our approach to the field in
which we fight.
The stakeholders are made of the 14 group members (including myself). We
are already extremely aware of all the facts, evidence and data surrounding the
subject so we don’t expect to learn anything new from this infographic. How-
ever, we have always found that it was very difficult to convey all the points of
our position to an uninformed individual, so we hope that this visualisation will
help us come up with better ways to convince others of our points. So the kinds of
questions that we hope this visualisation will help us answer include:
• In which order should we bring our arguments?
• What data do we need to show to effectively convince others?
• What is the most intuitive way to visualise our strong feelings?
• How does it look from outside?
• How many points do we actually have? How much space does it take on
paper?
• Are rationality and fact-based approaches compatible with ”coolness” for
this topic? Can we make it interesting while still being objective?
Audience Intrigue. The audience will be very diverse in nature and
relatively uninformed about the topic. As the date of the conference approaches,
the topic will become more ”trendy” and students will gain interest, possibly
looking to forge their own opinion before the debate. Thus, we expect the peo-
ple to be more attentive to our claims than usual.
As our university tends to be more conservative in general, we tend to receive
less support and more hostility from our peers. But most of the students
are not politically involved and remain quite open to ideas from any political
origin. They are looking for any information that could help them take a stance
in the debate. Their questions might include:
• What are the arguments of both parties?
• What are the main ”things to know” about the topic?
• Are there any fun figures that I can quote in a discussion with my friends?
What can make me sound more clever?
• Why should I care?
• What is the position of the rest of the students and of the country on the
matter? Where do I stand compared to them?
For the students who already have a strong opinion on the matter, their
intrigue would probably be closer to:
• Is this infographic trying to speak about something I don’t want to hear?
Does it come from people with a political opinion different from mine that
I should just ignore?
• What can I find that confirms that the other party has weaker claims than
mine? How strong is my position? What are the main arguments of the
opposition / our best arguments?
But the targeted audience is mostly the first group, the relatively uninformed
people, as our influence group largely considers the informed students as ”gained
causes” or ”lost causes” whose cost of conversion is too high.
3.2 Context: The circumstances
Work through the list (of 10 main headings) and describe your critical thinking
about any assumptions, definitions or self-imposed factors you think might be
relevant/existent. If any require no definition or are not relevant to that scenario,
explain why not.
3.2.1 People: Stakeholders
As explained earlier, the stakeholders are the 14 students who are part of the
group. The group is mostly informal: it is officially registered as a campus
organisation, but most of the decisions are taken individually according to per-
sonal variations of opinion and motivation. On some occasions, however, we try
to coordinate our efforts to reach an objective. This campaign is one of these
occasions, so we decided to only publish things on social networks after having
informed the other 13 members and received their tacit approval. There is no
formal decision process: we have a WhatsApp group in which we send our
suggestions. If no one has rejected an idea after a few hours, it is generally assumed
that it is fine to proceed.
The 14 students are all undergraduate students in the university, coming from
different majors, with a focus on social sciences.
3.2.2 People: Audience
The university counts 21 235 students, 81% of whom are full-time undergraduates.
As explained earlier, the main target consists of the non-politically affiliated
students, but we aim to maximise the total exposure of our campaign so we also
hope to reach a few individuals from other categories, including staff, lecturers,
university officials and possibly the general population. At the same time,
we hope to attract a few students to join our organisation.
3.2.3 Constraints: Pressures
Our funds are very limited so members mostly use their own resources. The
time pressure is quite strong since the conference will take place in a few weeks
only and our classes keep us busy most of the time. However, no specific dead-
line is set for my own work and it is quite flexible.
In the past months, a few student members of various campus groups have
received administrative sanctions (going up to expulsion in the most extreme
cases) following a protest on diverse issues. The purpose of these sanctions was
to send a clear message to student organisations: disobedience to college rules
will not be tolerated. This has created a lot of tension and puts a lot of pressure
on many associations, since most students want to stay out of trouble and prefer
to remain unnoticed by the administration. In particular, we could face legal
action for libel if the elements we publish directly incriminate university officials.
Another source of pressure is the ranking algorithm of social networks, Facebook
in particular. Since the infographic must go viral on Facebook to reach
the diffusion objectives, certain rules must be observed, such as publishing the
infographic at the right time of day, having the right keywords in the text
description, having the right dimensions for the profile picture update, adopting
the tone generally expected by Facebook users, etc.
3.2.4 Constraints: Rules
Most of the rules are dictated by the practical necessities of publishing on Face-
book. The infographic must be in a format that allows for comfortable reading
on Facebook. The readers should not have to leave Facebook to see the vi-
sualisation on an external website, so this has to be taken into account. We
must also remember that Facebook highly compresses images, so no zooming
should be required. If needed, one Facebook post can contain multiple images:
this can be used as a solution to the problem of low allowed resolution for
individual images. Interactive images such as GIFs are not allowed, nor are most
non-standard image formats; the Facebook documentation should be consulted
to check for technical feasibility before starting to develop any innovative
non-standard visualisation.
The logo and the name of the campaign group must be clearly shown on all
the publications. It is important that the message be conveyed together with
the name.
3.2.5 Consumption: Frequency
This is a targeted one-off campaign, but the most popular infographics might
be republished later or in other social networks, or re-used.
3.2.6 Consumption: Setting
Rapid (hopefully viral) consumption on social networks, mostly Facebook. The
timing of the publication is very important and is planned in advance to
correspond to the peaks of social activity of the students.
The infographics are generally published through the Facebook page of the
campaign group, and shared by each of the group members and by a set of
supporters and partner student associations. They are also published directly on all
the public and private Facebook groups of the university to which the student
members have access. The publication is often associated with a few prizes to
win, attributed to people randomly drawn from the list of people who ”liked”
the post, to encourage students to like each publication and increase virality.
3.2.7 Deliverables: Size
I have to prepare one set of images that will be published together in a Facebook
post. These images are generally accompanied by a small text description to
spark interest and by a link to the (newly created) website of the group.
The total number of images is entirely flexible but should remain reasonable
to avoid the risk of boring the reader. The first few images should be very
simple, very visual and easy to understand to grab the attention of the read-
ers. The following images will only be seen by users who have already started
reading the post and can, thus, be more sophisticated. Therefore, the first few
images will require more ”artistic” and ”creative” work, whereas the last ones
are likely to require more research and descriptive work.
3.2.8 Deliverables: Format
All the images should be designed with Facebook publication as the main ob-
jective. In a second step, those images can be adapted for publishing in other
social networks or contexts, such as the group website or print diffusion. From
time to time, the group is invited to take part in small talks and re-uses some
of the infographics published on Facebook to illustrate some of its slides.
3.2.9 Resources: Creators
The association is very collaborative and people are always glad to help each
other. Skills are very diverse so when I don’t know how to do something, I
generally ask for help and receive it very quickly.
I am used to working with two very close friends of mine who are also
part of the group and have complementary skills. They are less active than me
in the association and are not currently working on any project, but I know I
can rely on them to help me with the technical part of the creation. The first
is an art student, very good at hand drawings of any sort, and the second is a
CS major who is very agile with design software. I generally plan all the
details and create the easiest elements myself, and I rely on them for all the
more complex graphical components of the visualisation.
3.2.10 Resources: Technical
As students, we have free access to most of the popular design or technical
software. This includes the whole Microsoft Office and Adobe suites.
Each member of the group has their own habits for the design of these
visualisations, so there is no standard among us.
3.3 Vision: The purpose map
Describe and reason what the possible aim of this work would be in terms of what
experience (the Exs) it would facilitate and through what tone of voice (Read vs.
Feel)
The campaign’s tactic is made of three phases:
1. Spark the interest of the reader with shocking facts and numbers
2. Convince him that our side is the right one with objective facts and reliable
data
3. Make him emotionally involved so that he doesn’t immediately forget
what he read after closing the computer and takes action to bring
about changes
So there is a mix of feeling and reading, with a slight tendency towards feeling.
No interaction with the graphs is possible, but the user is required
to click on the ”Next” button to see the next image of the post, and social
interaction is encouraged through the comments. Still, this is not enough for the
visualisation to qualify as ”exploratory”. It couldn’t be qualified as explanatory
either as the amount of reading and detailed information is moderate. Overall,
the visualisation tries to be ”exhibitory” and to simply ”show things as they
are”, by letting the reader draw the most obvious conclusions by himself.
Thus, on the purpose map, the visualisation would be around here:
3.4 Vision: Your ideas
Sketch out roughly what you think this work could look like: what colours, what
keywords, rough drawing, any other work out there that you can be inspired/in-
fluenced by? (not a test of artistry, just map out ideas)
My original idea was to make a ”Top 10”, a format generally known to
attract a lot of ”views” on social media. It could be called for example ”TOP
10 reasons why Death Penalty is bad”. It would be made of 10 images, starting
with the tenth, going up to the first. Normally, the reader would expect the
reasons to become increasingly important and this incites him to continue
reading. Since the arguments are quite subjective anyway, this doesn’t necessarily
have to be true, but finishing with a memorable point is always a good thing.
As explained earlier, the last points can be more developed than the first and
could potentially encourage the reader to learn more about the topic by reading
other recommended sources.
The images and elements of the top 10 could look a bit like that on Facebook:
As shown above, the first image shown could use the picture and some per-
sonal details about a specific criminal to make readers feel the scale of the costs
and bring it back to an individual level. Making the criminal look more like a
person also increases the emotions, since people then no longer see ”criminal-
ity” as a society-wide issue but ”criminals” as bad persons who don’t deserve so
much attention. Here, I think the main strength of this database of convicted
offenders is that it gives us a lot of very personal information we can use to
display on selected felons (at the risk of appearing unethical by doing so).
The other seven images could make use of the other attributes of the database,
most specifically the last statements. For example, a word cloud could probably
illustrate one of the images (depending on which words come out as the most
visible). The frequency of a specific topic in the last statements can also be used
as an argument and be shown visually as part of a bar chart. But overall, the
executed offenders database would probably only represent a small part of all
the data used for the visualisation.
4 Scenario 4
What potentially intrigues you about this data? What might you undertake if
you had the chance?
4.1 Context: The reason
Outline what you think might be the essence of the curiosity: What question(s)
do you think the potential audience might need answering/find interesting that
the visualisation would ultimately present?
Personal Intrigue. I am a student at Imperial College London, studying in
the new MSc in Business Analytics programme. During our Visualisation course,
we explore some innovative ways to visually represent and understand data. In
one of the classes, we briefly look at the ”Executed offenders” list on the Texas
Department of Criminal Justice’s website.
At home, after the class, curious, I go back to the website and realise that the
data has a lot of potential and decide to explore it a bit further. In particular,
I am interested in the individuals’ last statements: What are the main
topics/subjects that an individual wants to speak about/mention when he knows he
is about to die? Is it religion, love, family, friendship, politics, justice, regrets,
justification, etc.? Have these topics changed over time? Do they differ based
on gender, based on age, based on the type of crime committed?
I think that the data would be easier to understand and fit well within a fully
interactive visualisation where parameters can be changed in one click. I realise
this would be a good opportunity to brush up my JavaScript/jQuery skills
and potentially try a few JS/HTML5 visualisation libraries, so I decide to make it
web-based.
Audience Intrigue. Of course, this is a personal project and the primary
questions I try to answer are mine. But since I will do the work anyway, why
not share the results?
As a first step, I decide to publish my work on an ad hoc page on my
personal website. Depending on my findings, I might dedicate a website to the
project.
The audience might have the same questions as me, and perhaps even have
some additional ideas. A comment section on the website could potentially lead
to some very interesting suggestions of extensions to this project.
4.2 Context: The circumstances
Work through the list (of 10 main headings) and describe your critical thinking
about any assumptions, definitions or self-imposed factors you think might be
relevant/existent. If any require no definition or are not relevant to that scenario,
explain why not.
4.2.1 People: Stakeholders
Me, that’s all. I am only doing this as a side project, for my personal interest.
4.2.2 People: Audience
In the beginning, no one except me. Later, once published on my website, just
the people I give the link to or potentially a few visitors coming from search
engines. If I like my findings and dedicate an entire website to the project, then
possibly the audience could become much larger. But that’s not the priority
right now.
4.2.3 Constraints: Pressures
I have a few weeks until the end-of-semester projects start. As this is still the
beginning of the semester, I have a lot more free time. Hopefully, I can finish
the core of the project by the end of the semester and start thinking about and
working on the ”publishing details” during the vacations.
I want to get inspiration from existing interactive visualisations. I need to
allow a few days to explore the web and see what has been done in the past
(not necessarily relating to the death penalty). In particular, I want to know what
the main visualisation libraries are, what they can do and what they can’t.
4.2.4 Constraints: Rules
I want my visualisation to be compatible with the main web browsers (except
older versions of Internet Explorer) and smartphones (Android and iPhone).
It has to run on almost any screen size/resolution and shouldn’t require any
external plugin. Ideally, the loading time should be short (so not too many files
to download) to allow low-speed smartphones to access the page without
a Wi-Fi connection.
4.2.5 Consumption: Frequency
Normally a one-off project, but I could update the data as it becomes available
(i.e. as more criminals receive capital punishment, or using data from other
states) or improve it based on the feedback I receive. In any case, if I publish
the results, they should remain public on the website forever; I generally try to
keep all my websites and publications online, even when they become obsolete,
for archive purposes.
4.2.6 Consumption: Setting
My own consumption: Live, as I work on the project.
Consumption from the audience: If published, prolonged, with most likely a
peak when I first release the results on social media. Potential peaks are also
possible from search engines when a major capital-punishment-related event
happens, if my project’s page has good SEO (Search Engine Optimisation).
In general, I don’t expect a very high level of traffic on the page.
4.2.7 Deliverables: Size
One static web-page only. However, I see two main sources of work:
• Working on the data. Categorising the different last statements ap-
propriately. What should be the relevant ”categories” of last statements?
What level of detail should we have (should we just look for ”religion” in
general, or distinguish between different types of religions?)? How to de-
fine whether one last statement includes a link to ”religion” or not? What
should we consider as ”religion”? How to understand last statements that
are not clearly phrased?
• Working on the visualisation per se. How to communicate the data
in an explicit way? How to visually express the strength of correlations?
How to make it look like an objective approach? How to make it intuitive,
how to invite the visitors to try changing the different parameters of the
interactive visualisation?
The first part of the work will be a bit more ”mechanical” but must be done
carefully or the study won’t be valid. Most likely, every one of the hundreds of
statements will have to be manually categorised. A clustering algorithm
would probably give poor results given the complexity of the data and the rather
small size of the population, and would be overkill.
Hopefully, I could finish this first step by the end of the semester.
Once this first part is done, it will already be possible to compute a few
statistics and see whether there are some interesting trends. If it shows that there
is value in the data and it is worth exploring more, then I could move to the
second step. The time requirements for this second part are likely to be more
variable and less predictable, as it is the ”creative” part and could be polished for
months if desired.
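To make the first step more concrete, a keyword-based pass could pre-sort the statements before the manual review. The sketch below is in Python and only illustrates the idea: the category names and keyword lists are assumptions made up for this example, not derived from the data, and every suggestion would still have to be checked by hand.

```python
import re

# Hypothetical categories and keywords; the real list would come out of
# reading the statements, not from this sketch.
KEYWORDS = {
    "religion": {"god", "lord", "jesus", "allah", "pray", "heaven"},
    "family": {"mother", "father", "family", "son", "daughter", "wife"},
    "regret": {"sorry", "forgive", "apologize", "regret"},
}


def suggest_categories(statement):
    """Return the sorted categories whose keywords appear in the statement."""
    words = set(re.findall(r"[a-z']+", statement.lower()))
    return sorted(cat for cat, keys in KEYWORDS.items() if words & keys)


def needs_manual_review(statement):
    """Flag statements with no keyword hit so they are read by hand first."""
    return not suggest_categories(statement)
```

Statements that match no keyword, or that match several, are the ones where the manual reading matters most, which is why the sketch only suggests categories instead of assigning them.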
4.2.8 Deliverables: Format
One web page, made of multiple files (html, js, png, jpg, etc.). As explained
before, the output should be static and therefore not require any interaction
with the server once the page is loaded. This means that the user can explore
the whole dataset offline and look at the source code freely. The data won’t be
stored in a database but directly within the page.
4.2.9 Resources: Creators
Individual project. Some other contributors could join in later steps, but not
expected for now.
4.2.10 Resources: Technical
I have access to most software and can generally easily download any additional
software I might need. I have my own dedicated server to host the website and
the project, which I already use for my other websites. I am already familiar with
JavaScript and HTML web development, so I can use advanced features easily
if necessary.
The data classification will most likely be done in Excel to simplify things.
4.3 Vision: The purpose map
Describe and reason what the possible aim of this work would be in terms of what
experience (the Exs) it would facilitate and through what tone of voice (Read vs.
Feel)
This visualisation will clearly be exploratory, as the user can play with the
parameters. However, I wouldn’t put it too far on the right in the purpose map
as it doesn’t have any complex interaction possibilities: you can only change
the values of a few fields (and post a comment at the bottom of the page), you
can’t really contribute to the dataset itself or radically influence what you see
on the screen.
The visualisation is closer to ”feeling” than to ”reading”: even though the
numbers will be shown, the main purpose is to give an approximate idea of the
distributions rather than exact values. The last statements being classified on
partially subjective criteria anyway, trying to look too ”serious” and ”scientific”
would reduce the credibility and interest of this visualisation.
The resulting purpose map looks like this:
4.4 Vision: Your ideas
Sketch out roughly what you think this work could look like: what colours, what
keywords, rough drawing, any other work out there that you can be inspired/in-
fluenced by? (not a test of artistry, just map out ideas)
The core idea behind this visualisation is two easy steps: 1) You pick a pro-
file. 2) We show you what the last words of a criminal with this profile typically
are about.
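The two steps can be sketched in Python as a filter followed by a count. This is a minimal illustration under stated assumptions: the record layout and the field names (”age_group”, ”topics”) are hypothetical and presuppose that the statements have already been categorised.

```python
from collections import Counter

# Hypothetical records: one dict per offender after categorisation.
# The field names are assumptions for illustration only.
RECORDS = [
    {"age_group": "under 30", "topics": ["religion", "family"]},
    {"age_group": "under 30", "topics": ["family"]},
    {"age_group": "30 and over", "topics": ["regret"]},
    {"age_group": "under 30", "topics": []},
]


def topic_shares(records, **profile):
    """Share of matching offenders whose statement touches each topic."""
    matching = [r for r in records
                if all(r.get(k) == v for k, v in profile.items())]
    if not matching:
        return {}
    # set() ensures a topic is counted at most once per statement.
    counts = Counter(t for r in matching for t in set(r["topics"]))
    return {topic: n / len(matching) for topic, n in counts.items()}
```

Calling `topic_shares(RECORDS, age_group="under 30")` returns one share per topic; each share could then be plotted as one axis of the visualisation.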
When I want to give a general idea of the skills/traits of an individual, I like
to use the radial ”star chart”:
I believe it would fit our needs perfectly here too, since it gives a general
impression of the last statements rather than specific values.
I would also like to give the user the possibility to explore the data at the
individual level. So I would probably add a rectangle with all the pictures of
the convicted offenders, and the user could hover the mouse over the pictures to
see the details about that individual, including how his or her last statement has been
categorised.
Possibly, we could add a word cloud at the bottom of the visualisation to give
a broad overview of the words used in the statements.
Overall, our visualisation would look similar to this:
Assignment 1b: Developing intimacy with data
and establishing editorial perspectives
Introduction
You are working at a broadsheet newspaper as a graphics editor and preparing
ideas for your assignments editor about possible visual work you could undertake
ahead of the Rio Olympics.
Compile a report that details your deep examination, proposed transformations,
explorations and editorial ideas based on the data provided (and data you could
reasonably obtain).
You are provided with two contrasting worksheets (in Excel) showing
medallists from the Summer Olympic Games.
5 Examination
Articulate the meaning of the data (representativeness and phenomenon) and
thoroughly assess and describe the physical properties (type, size, condition).
Compare what the two datasets offer and contrast their differences.
5.1 Dataset 1
Number of observations (rows): 4093
Each observation represents: an Olympic medal won (can be won by multiple
athletes together)
Observations that aren’t included? Only rows for medals in 4 sports
(Athletics, Canoeing, Rowing and Swimming). Only includes years 1896 to 2012.
Columns per row: 10
Columns details and structure:
• Games: Integer, contains the year (4 digits) of the games. Range: 1896-
2012. Most frequent value: 2012, appears 267 times. Least frequent value:
1896, appears 20 times.
• Sport: String, 4 unique values (Athletics, Canoeing, Rowing and Swim-
ming). For each year (variable ”Games”), each of the sports should nor-
mally appear, but in the older years (e.g. 1896), some of the sports weren’t
introduced yet. Most frequent value: Athletics, appears 1550 times. Least
frequent value: Canoeing, appears 491 times.
• Event: String, 93 unique values, contains the ”discipline” as well as the
parameters of the discipline (such as distance) and the information about
the ”gender” of the competition. For each year, each of the 93 events
should normally appear, but in the older years (e.g. 1896), some of the
events weren’t introduced yet. Each event is always grouped with the
same sport. Most frequent value: 100m Men, appears 85 times. Least
frequent value: 3000m Steeplechase Women, appears 3 times.
• Athlete(s): String, 3211 unique values. Format: first name followed by
last name, with the last name written in upper case characters. Note that
names with special characters are incorrectly stored (have question marks
instead of the true character). For competitions by teams, athletes are
separated by commas. Most frequent value: Michael PHELPS, appears
13 times.
• CountryCode: String, 3 upper case characters, contains the country
code of the country that the athlete represents. 102 unique values. Most
frequent value: USA, appears 982 times. Note: Countries can appear/dis-
appear/merge over time depending on political context; example: GDR
(German Democratic Republic) and FRG (West Germany) became GER
(Germany).
• CountryName: String, full country name. 96 unique values. Mostly
redundant with ”CountryCode” with a few exceptions; for example: Ger-
many can have country code DEU or GER depending on the year.
• Medal: String, contains the type of medal won. 3 unique values (Gold,
Silver, Bronze). Most frequent value: Gold, appears 1368 times. Least
frequent value: Bronze, appears 1358 times.
• Result: String, contains the performance of the athlete/country. The
formatting is different for each discipline, as the result can be a distance,
a time or another metric. This column has to be read together with the
next one (Unit). Note: a few rows have the value ”No result”.
• Unit: String, contains the key of how to interpret the ”Result” variable.
4 unique values (M:S:DD, H:MM:SS, M:SS:DD, #:DD)
• ResultInSeconds: Integer, redundant with the column ”Result” but
here the information is stored in seconds.
Note: One row has incorrectly formatted values due to disqualification.
5.2 Dataset 2
Number of observations (rows): 26398
Each observation represents: an athlete who won an Olympic medal (if the same
medal was won by multiple persons, 1 row per person)
Observations that aren’t included? Only includes years 1920 to 2008.
Columns per row: 10
Columns details and structure:
• City: String, contains the city where the olympic games of this year were
hosted. 20 unique values. Most frequent value: Los Angeles, appears 2074
times. Least frequent value: Amsterdam, appears 710 times.
• Edition: Integer, contains the year (4 digits) of the games. 21 unique
values. Range: 1920-2008. For each year, the city is always the same.
Most frequent value: 2008, appears 2042 times. Least frequent value:
1932, appears 615 times.
• Sport: String, 33 unique values. For each year (variable ”Edition”), each
of the sports should normally appear, but in the older years, some of the
sports weren’t introduced yet.
• Discipline: String, 47 unique values. For each year (variable ”Edition”),
each of the disciplines should normally appear, but in the older years,
some of the disciplines weren’t introduced yet. For each discipline, the
sport is always the same.
• Athlete(s): String, 19356 unique values. Format: last name in capital
characters, followed by a comma and by the first names of the athlete.
• NOC: 3-character string, contains the country code of the country to
which the athlete belongs. 134 unique values.
• Gender: String, 2 unique values (Men and Women), indicates whether
the athlete is male or female. Number of rows with ”Men”: 18967. Number
of rows with ”Women”: 7427. Percentage of rows with women: 28.14%,
i.e. 7427 out of 18967 + 7427 = 26394 rows (note that this doesn’t indicate
that 28.14% of the participants are women, but that 28.14% of the medal
holders are women. Women’s participation is actually much lower).
• Event: String, 442 unique values, generally contains the ”discipline” as
well as the parameters of the discipline (such as distance) but doesn’t
contain the gender requirements of the event. Sometimes discipline is
omitted and only parameters are shown.
• Event gender: 1 character string, 3 unique values (M, W and X), indi-
cates whether the competition requires participants to be men, women, or
mixed (X).
• Medal: String, contains the type of medal won. 3 unique values (Gold,
Silver, Bronze).
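The two gender row counts reported above allow a quick arithmetic check. The minimal Python sketch below (variable names are illustrative) separates the share of women among all rows with a gender value from the ratio of women rows to men rows, which are easy to confuse:

```python
men_rows = 18967
women_rows = 7427

gendered_rows = men_rows + women_rows      # rows with a Gender value
women_share = women_rows / gendered_rows   # share of medal rows held by women
women_to_men = women_rows / men_rows       # ratio of women rows to men rows

print(f"{gendered_rows} rows, women share {women_share:.2%}, "
      f"women-to-men ratio {women_to_men:.2%}")
# prints "26394 rows, women share 28.14%, women-to-men ratio 39.16%"
```

Note that the two counts add up to 26394, four fewer than the 26398 rows reported for the dataset, so a handful of rows would be worth inspecting for a missing or unexpected gender value.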
5.3 Difference between the two datasets
The biggest differences are:
• Size: The second dataset contains many more rows because it covers many
more categories of sports (33 instead of 4). However, the first dataset
covers more years since it goes up to 2012 (compared to 2008) and down
to 1896 (compared to 1920).
• Meaning of a row: In the first dataset, one row is one medal, whereas
in the second one row is one athlete. In most cases, this doesn’t make any
difference, but it creates more rows for group events such as relays.
• Columns: Both datasets have different columns. The first dataset has
information about the results (time), which the second doesn’t have. The
second has information about the city and the discipline, which the first
doesn’t have. In addition, the second has the ”gender” information stored
separately from the event with a column for the gender of the athlete (not
found at all in dataset 1) and a column for the gender requirements of the
competition (found in dataset 1 under the column ”Event”).
6 Transform the data
What could you do/would you need to do to clean or enhance the data? What
other data could you reasonably source in order to consolidate the data? You
may optionally do this but you are only expected to write about what you would
do and why.
6.1 Dataset 1 cleaning
I recommend the following operations:
• Standardise column ”CountryName” so that it is constant over time. We
are not trying to analyse political history but sport performance so it is
preferable to simplify the data and focus on the geographical aspects. For
example, replace all the variations of Germany such as ”German
Democratic Republic” by ”Germany”.
• Drop column ”CountryCode”, which is now fully redundant with
”CountryName” thanks to the standardisation.
• Drop columns ”Result” and ”Unit”, which are redundant with
”ResultInSeconds”. The interpretation and calculations are made much easier if
everything is already in numeric format.
• Drop the disqualified athlete row.
• Investigate the ”No result” rows, and possibly drop them as well.
• Manually repair the athlete names when they have question marks, using
information found online.
• Create a new column indicating whether an event is ”men” or ”women” only.
Then update the ”Event” column to remove all the ”Men” and ”Women”
text since it is now stored in a different column.
• Create a column for ”group performance” that is worth either 0 or 1 de-
pending on whether multiple athletes won the medal together (for example
with relay disciplines). This can help to identify more easily the rows that
contain multiple athletes.
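Several of the operations above can be sketched as a single row-level function. This is a minimal Python illustration, assuming each row is a dict keyed by the column names listed in section 5.1; the alias table and the new column names ”EventGender” and ”GroupPerformance” are hypothetical choices made for this example.

```python
# Hypothetical mapping; which historical names to fold together is an
# editorial choice, not something the dataset dictates.
COUNTRY_ALIASES = {
    "German Democratic Republic": "Germany",
    "West Germany": "Germany",
}


def clean_row(row):
    """Return a cleaned copy of one dataset 1 row (a dict keyed by column)."""
    cleaned = dict(row)
    cleaned["CountryName"] = COUNTRY_ALIASES.get(
        row["CountryName"], row["CountryName"])

    # Split a trailing "Men"/"Women" off the event name into its own column.
    event, _, last_word = row["Event"].rpartition(" ")
    if last_word in ("Men", "Women"):
        cleaned["Event"], cleaned["EventGender"] = event, last_word
    else:
        cleaned["EventGender"] = None

    # Team medals list several athletes separated by commas.
    cleaned["GroupPerformance"] = 1 if "," in row["Athlete(s)"] else 0

    # CountryCode, Result and Unit become redundant after the steps above.
    for column in ("CountryCode", "Result", "Unit"):
        cleaned.pop(column, None)
    return cleaned
```

Returning a copy instead of modifying the row in place keeps the raw data available, which helps when investigating the ”No result” rows or repairing names later.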
6.2 Dataset 2 cleaning
I recommend the following operations:
• Create two new columns: one for the first name of the athlete, one for the
last name, and then drop the old column.
• Standardise column ”NOC” so that it is constant over time. We are not
trying to analyse political history but sport performance so it is preferable
to simplify the data and focus on the geographical aspects. For example,
replace all the variations of Germany such as ”GDR” by ”GER”.
• Create a new binary column to indicate that an athlete won as part of a
team.
• Create an integer column that contains the number of members in the
team of the athlete if the athlete was part of a team, and 1 otherwise.
Necessary to avoid giving the impression that some countries have won
more medals when they were just better at team sports.
Overall, the data is quite clean and much better structured than the first
dataset.
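The name split and the team-size correction can be sketched as follows. This is a minimal Python illustration, assuming each row is a dict keyed by the column names of section 5.2; the new column names ”TeamSize” and ”IsTeam” are hypothetical, and so is the assumption that rows sharing edition, discipline, event, event gender, country and medal belong to the same team (two athletes from one country tying for the same individual medal would be miscounted).

```python
from collections import Counter


def split_name(athlete):
    """Split 'LAST, First' into (first_name, last_name)."""
    last, _, first = athlete.partition(",")
    return first.strip(), last.strip()


def team_key(row):
    """Grouping key assumed to identify one medal-winning team."""
    return (row["Edition"], row["Discipline"], row["Event"],
            row["Event gender"], row["NOC"], row["Medal"])


def add_team_columns(rows):
    """Add TeamSize and IsTeam to each row, in place."""
    sizes = Counter(team_key(r) for r in rows)
    for r in rows:
        r["TeamSize"] = sizes[team_key(r)]
        r["IsTeam"] = 1 if r["TeamSize"] > 1 else 0
    return rows


def medals_per_country(rows):
    """Count each medal once by weighting every athlete row by 1 / TeamSize."""
    totals = Counter()
    for r in rows:
        totals[r["NOC"]] += 1 / r["TeamSize"]
    return totals
```

With this weighting, a four-person relay gold contributes one medal to the country total instead of four, which addresses the bias towards team sports mentioned above.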
6.3 Data enhancement
The biggest enhancement that I recommend is to merge the two datasets
together to combine the advantages of each dataset.
Concretely, here is how I would proceed:
• Start from dataset 2
• Add a column ”ResultInSeconds”
• Add the ”ResultInSeconds” values to each row of dataset 2 using the
corresponding element from dataset 1
• Add all the rows from dataset 1 for year 2012 and years 1896-1916 to
dataset 2, since dataset 2 doesn’t have any information on these years
Our new dataset is now a perfect combination of the two datasets in terms of
information potential. However, this has also produced a lot of missing values,
notably in the column ”ResultInSeconds” and for all the rows that we added
from dataset 1 (as well as no information on many disciplines for the year 2012).
These missing values might create some technical problems when generating the
different visualisations, and it might not be worth the effort. Depending on what
we want to visualise, we might decide to either use this new merged dataset, or
only dataset 1, or only dataset 2.
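The merge itself can be sketched as a left join from dataset 2 onto dataset 1. The Python below is only an illustration under stated assumptions: rows are dicts keyed by the column names described in section 5, the join key (year, event, country code, medal) is a guess that would break for tied medals or mismatched country codes, and `event_name_map` is a hypothetical hand-built table translating dataset 2 event labels into dataset 1 labels.

```python
def merge_results(dataset2_rows, dataset1_rows, event_name_map):
    """Left-join ResultInSeconds from dataset 1 onto dataset 2 rows.

    event_name_map translates a dataset 2 (Event, Event gender) pair into
    the dataset 1 event label; building it is manual work assumed here.
    """
    results = {
        (r["Games"], r["Event"], r["CountryCode"], r["Medal"]): r["ResultInSeconds"]
        for r in dataset1_rows
    }
    merged = []
    for row in dataset2_rows:
        event1 = event_name_map.get((row["Event"], row["Event gender"]))
        key = (row["Edition"], event1, row["NOC"], row["Medal"])
        # Rows without a match keep None, which makes the gaps explicit.
        merged.append({**row, "ResultInSeconds": results.get(key)})
    return merged


def missing_share(rows, column="ResultInSeconds"):
    """Fraction of rows where the column is empty."""
    return sum(r[column] is None for r in rows) / len(rows) if rows else 0.0
```

Measuring `missing_share` on the merged rows gives a concrete basis for deciding whether the merged dataset is worth using for a given visualisation, or whether dataset 1 or dataset 2 alone is the better choice.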
Ideally, I would look online for a dataset similar to dataset 2 but that also includes
the missing years and the results in seconds. If we find some more
complete datasets that also include personal information on the athletes (such
as weight, education, ethnicity, etc.) it could be very interesting for
visualisations. I would also look to complete this data with information related to the
games themselves, such as the costs, the number of visitors, etc.
7 Exploration
Use Excel/Tableau to visually explore the two datasets in order to deepen your
appreciation of their physical properties and to help you brainstorm potentially
interesting angles of analysis.
I experimented with different visual formats (histograms, bar charts, lines,
pie charts, maps, etc.) and explored how each value is best represented
(color? label? size?). I tried to identify which subgroups have the most data
and thus would be interesting subjects for a visualisation (e.g. the USA,
since it has won a lot of medals), which variables have the most or the
least variation (e.g. the proportion of gold medals overall is the same each
year, but it varies a lot if we only take the proportion of gold medals for a
specific country) and which variables take too many possible values to be
represented entirely (for example, there are too many disciplines to represent
them all in a single pie chart).
A few of the graphs I have generated through my exploration are shown
below.
7.1 Dataset 1
Top 25 countries with the most medals:
Bubble map by number of medals:
Map representation by number of medals:
Average length of a competition, in seconds, by sport:
Total number of medals over the years:
7.2 Dataset 2
Proportion of male/female medals:
Proportion of male/female medals over time:
USA medals by sport over time:
USA medals by type (bronze, silver, gold) over time:
Sports by number of disciplines and categories of events:
8 Editorial
Rationalise a list of at least 5 distinct, interesting editorial perspectives:
articulate the angle and the framing/focus applied.
Following the insights gained through the exploration of the data, I came
up with a few angles that I thought would be interesting to develop and that
would fit well in a newspaper, for example.
For the following perspectives, unless otherwise specified, I consider the merged
dataset (combination of dataset 1 and dataset 2) proposed above.
8.1 Perspective 1
Angle. Which countries have historically been the most prominent in the
Olympic Games?
Framing. Take all the years from 1896 to 2012, all the sports, all the
disciplines and all the events. Count each team medal once only, not once per
team member.
Focus. Focus on the top 20 countries, and show the composition of the medals
(i.e. proportion of gold, silver and bronze) through a stacked bar chart.
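The counting rule of this perspective (teams counted once, then a ranking with the medal composition) can be sketched in Python as below. The column names and the sample rows are assumptions for illustration; the deduplication key (year, event, country, medal) is a heuristic and not necessarily how teams are identified in the real data.

```python
from collections import defaultdict


def medal_table(rows, top_n=20):
    """Rank countries by total medals, counting each team medal once."""
    # A team medal appears once per athlete, so deduplicate first.
    unique = {(r["Year"], r["Event"], r["Country"], r["Medal"]) for r in rows}

    table = defaultdict(lambda: {"Gold": 0, "Silver": 0, "Bronze": 0})
    for _, _, country, medal in unique:
        table[country][medal] += 1

    # Sort by total medals (descending), then by country code for stable ties.
    ranked = sorted(table.items(), key=lambda kv: (-sum(kv[1].values()), kv[0]))
    return ranked[:top_n]


# Made-up sample rows, for illustration only.
rows = [
    {"Year": 2008, "Event": "4x100m relay", "Country": "USA", "Athlete": "A", "Medal": "Gold"},
    {"Year": 2008, "Event": "4x100m relay", "Country": "USA", "Athlete": "B", "Medal": "Gold"},
    {"Year": 2008, "Event": "100m", "Country": "USA", "Athlete": "C", "Medal": "Silver"},
    {"Year": 2008, "Event": "200m", "Country": "GER", "Athlete": "D", "Medal": "Gold"},
]
ranked = medal_table(rows)
```

Each entry of the result holds the gold, silver and bronze counts of one country, which is the shape a stacked bar chart needs.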
8.2 Perspective 2
Angle. Has the performance required to win a medal in athletics changed
over the years?
Framing. Use dataset 1, for years 1896 to 2012. Only consider rows that have
a result in seconds stored. Exclude team competitions.
Focus. Group by category (distance and gender) and show one time series
line for each category, with years on the x axis and results on the y axis.
Draw the men’s and women’s lines in two different colors. Draw multiple graphs
with different y scales, or use a logarithmic y scale, to avoid losing accuracy
on some categories (e.g. sprint vs marathon, where one event can last more than
a hundred times longer than the other).
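To illustrate why the choice of y scale matters here, the short Python sketch below compares the same relative improvement (10%) in a short and a long event. The times are hypothetical and not taken from the datasets. On a linear axis the marathon change dwarfs the sprint change, whereas on a logarithmic axis both appear as the same vertical distance.

```python
import math

# Hypothetical winning times in seconds (first and last year of a series).
sprint = [12.0, 10.8]
marathon = [12000.0, 10800.0]


def linear_drop(series):
    """Vertical distance between first and last point on a linear axis."""
    return series[0] - series[-1]


def log_drop(series):
    """Vertical distance between first and last point on a log10 axis."""
    return math.log10(series[0]) - math.log10(series[-1])


linear_ratio = linear_drop(marathon) / linear_drop(sprint)
log_difference = abs(log_drop(marathon) - log_drop(sprint))
```

Here linear_ratio is about 1000, while log_difference is zero up to floating-point error, so a single logarithmic chart can show both categories without flattening the sprint line.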
8.3 Perspective 3
Angle. How big is the gap between women’s results and men’s results in the
100m? Has it decreased over the years?
Framing. Use dataset 1, for years 1928 (introduction of the women’s 100m
event) to 2012. Take only gold medals.
Focus. Focus on the difference in seconds and plot it as a bar chart (which
could take negative values if a woman gets a better result in a given year),
with years on the x axis and result difference on the y axis.
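The difference plotted in this perspective could be computed along the following lines. This is a minimal Python sketch: the column names (Year, Event, Gender, Medal, ResultsInSeconds), the gender labels and the times are assumptions made for the example, not values from the dataset.

```python
def gender_gap_by_year(rows, event="100m"):
    """Women's minus men's gold-medal time per year, in seconds.

    A negative value would mean the women's winner was faster that year.
    """
    best = {}
    for r in rows:
        if (r["Event"] == event and r["Medal"] == "Gold"
                and r["ResultsInSeconds"] is not None):
            best[(r["Year"], r["Gender"])] = r["ResultsInSeconds"]

    years = sorted({year for year, _ in best})
    # Keep only the years where both results are available.
    return {
        y: round(best[(y, "Women")] - best[(y, "Men")], 2)
        for y in years
        if (y, "Women") in best and (y, "Men") in best
    }


# Made-up sample rows, for illustration only.
rows = [
    {"Year": 1928, "Event": "100m", "Gender": "Men", "Medal": "Gold", "ResultsInSeconds": 10.8},
    {"Year": 1928, "Event": "100m", "Gender": "Women", "Medal": "Gold", "ResultsInSeconds": 12.2},
    {"Year": 2012, "Event": "100m", "Gender": "Men", "Medal": "Gold", "ResultsInSeconds": 9.6},
    {"Year": 2012, "Event": "100m", "Gender": "Women", "Medal": "Gold", "ResultsInSeconds": 10.8},
    {"Year": 1924, "Event": "100m", "Gender": "Men", "Medal": "Gold", "ResultsInSeconds": 10.6},
]
gaps = gender_gap_by_year(rows)
```

Years with only one of the two results (1924 in the sample) are skipped rather than plotted as zero, which avoids suggesting a gap that was never measured.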
8.4 Perspective 4
Angle. Are new categories still regularly added to the games?
Framing. Use dataset 2, for years 1980 to 2008 (recent years only). Count
every new event as a new category, including the introduction of a new gender-
specific category. If a category is dropped from the games, also reduce the count
by 1.
Focus. Focus on the difference and plot it as a bar chart (which could take
negative values if a year has fewer categories than the previous one), with
years on the x axis and difference of categories on the y axis. Show the total
number of categories on top of each bar, as well as the total number of sports
in parentheses. If some of the new categories are part of a new sport, draw
this portion of the bar in a different color (i.e. like a stacked bar chart).
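The net change described in this perspective can be sketched in Python with set differences between consecutive games. The representation of a category as a (sport, event, gender) tuple and the sample data are assumptions made for the example.

```python
def net_event_changes(events_by_year):
    """For each games (except the first), compare its events with the previous one."""
    years = sorted(events_by_year)
    changes = {}
    for prev, cur in zip(years, years[1:]):
        added = events_by_year[cur] - events_by_year[prev]
        dropped = events_by_year[prev] - events_by_year[cur]
        prev_sports = {sport for sport, _, _ in events_by_year[prev]}
        changes[cur] = {
            "added": len(added),
            "dropped": len(dropped),
            "net": len(added) - len(dropped),           # height of the bar
            "added_in_new_sport": sum(1 for sport, _, _ in added
                                      if sport not in prev_sports),
            "total": len(events_by_year[cur]),          # label on top of the bar
            "sports": len({sport for sport, _, _ in events_by_year[cur]}),
        }
    return changes


# Made-up sample data: each category is a (sport, event, gender) tuple.
events_by_year = {
    1980: {("Sport A", "Event 1", "Men"), ("Sport A", "Event 1", "Women"),
           ("Sport A", "Event 2", "Men")},
    1984: {("Sport A", "Event 1", "Men"), ("Sport A", "Event 1", "Women"),
           ("Sport A", "Event 2", "Men"), ("Sport A", "Event 2", "Women")},
    1988: {("Sport A", "Event 1", "Men"), ("Sport A", "Event 1", "Women"),
           ("Sport A", "Event 2", "Women"),
           ("Sport B", "Event 3", "Men"), ("Sport B", "Event 3", "Women")},
}
changes = net_event_changes(events_by_year)
```

In the sample, 1988 gains two categories from a new sport and loses one existing category, so its bar would have a net height of 1 with the new-sport share drawn in a different color.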
8.5 Perspective 5
Angle. How difficult is it for an individual to obtain multiple medals?
Framing. Consider all individuals who won a medal, individually or by team,
from 1896 to 2012, for any discipline in any category. Exclude individuals who
only won one medal.
Focus. Only show the count for each number of medals. Use a bar chart with
the total number of medals won on the y axis (excluding 1) and the number
of athletes with that total (frequency) as the length of the bar on the x axis.
Presumably, there are going to be very few people with multiple medals, and
only one or two at the top. Focus on the few outliers who have won more medals
than anyone else in history and show a few personal details on these people in
an ”infobox” under the bar chart so that the readers can better understand how
it is possible to have so many victories.
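The frequency count behind this bar chart can be sketched in a few lines of Python. The sketch assumes one row per medal per athlete and that the hypothetical Athlete column identifies each person uniquely, which would need checking in the real data (e.g. for homonyms).

```python
from collections import Counter


def medal_count_distribution(rows):
    """Map each medal total (2 or more) to the number of athletes with that total."""
    medals_per_athlete = Counter(r["Athlete"] for r in rows)
    distribution = Counter(n for n in medals_per_athlete.values() if n > 1)
    return dict(sorted(distribution.items()))


# Made-up sample rows, for illustration only.
rows = (
    [{"Athlete": "A", "Medal": "Gold"}] * 3
    + [{"Athlete": "B", "Medal": "Silver"}] * 2
    + [{"Athlete": "C", "Medal": "Bronze"}] * 2
    + [{"Athlete": "D", "Medal": "Gold"}]
)
distribution = medal_count_distribution(rows)
```

In the sample, two athletes have two medals and one has three, while athlete D (a single medal) is excluded as required by the framing.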
Visualisation - Homework 1 - Msc Business Analytics - Imperial College London

  • 1. Visualisation Imperial College London, Msc Business Analytics Homework 1 Jonathan Zimmermann 14-02-2016
  • 2. Assignment 1a: Developing empathy and forming ideas within different problem contexts Introduction Consider the curiosities, circumstances, purpose and ideas potentially involved in the challenge of creating a visualisation/infographic in each of the following made-up scenarios. Compile a detailed briefing document outlining your assumptions, definitions and ideas about the context and vision for each scenario. 0.1 Data source Website of the department of criminal justice of the state of Texas: https://www.tdcj.state.tx.us/death row/dr executed offenders.html Date of oldest execution of dataset: 12/07/1982 1
  • 4. 3
  • 5. 1 Scenario 1 A pro-capital punishment US newspaper reporting on the milestone of the 500th execution (pretend it is 2013). Assumptions. We consider a medium-size daily newspaper in Texas. Quite popular among the conservative/rural part of the population mostly through its print edition, it has managed over the recent years to attract a growing base of younger readers through its online edition thanks to a new policy of making a few selected article freely available on its website for the non-subscribers. This younger readership, with less strongly defined political opinions, comes from a more mixed background but still represents only a small minority of the readers. The newspaper is still majoritively owned and managed by a few members of a conservative texan family. There has been a very old and strong tradition of operating the journal with profit only as a secondary objective, and the owners have a strong influence on the content of the articles, generally shaped to match the (republican) political view of the family. I am the youngest employee of a small team in charge of providing various graphics and illustrations for the articles. I have recently been hired as part of an initiative to modernize the visual elements of the journal and make better use of data-backed visualisations. As the journal is still experimenting with such methods, I work alone but with the support of the rest of the team for advice and to provide me with elements I might need, such as drawings. 1.1 Context: The reason Outline what you think might be the essence of the curiosity: What question(s) do you think the potential audience might need answering/find interesting that the visualisation would ultimately present? Stakeholders Intrigue. The family owning the journal has been personally very involved with the passing of a few pro-capital punishment bills in Texas. 
It wants to use the milestone of this 500th execution as an opportunity to report on the status of death penalty in general. I am to create a double-page infographic, which the owners hope will illustrate with numbers the impact that capital punishments has had on the safety of texans and show the benefits of such laws. Audience Intrigue. Most of the audience is likely to hold a similar position to the newspaper, i.e. be supportive of capital punishment, in which case the readers will be particularly looking for information that confirms and supports their views. 4
  • 6. The purpose of this newspaper is still to attract a quite large public rather than a highly specific political audience. Therefore, a fraction of the readers (notably the younger readership of the online edition) might be relatively new to the question and looking for clues that will help them take a position in the capital punishment debate. Even though these readers might be looking for a more neutral perspective, the newspaper will want to influence their position towards supporting capital punishment. Therefore, for this audience, the visualisation also needs to answer the question of why capital punishment is a good policy. The existence of this second type of audience, however, will force the newspaper and the visualisation to look a bit more objective. As this is typically done when celebrating a milestone, the readers might ex- epect to find information ”aggregating/summarising” the situation since the introduction of capital punishment rather than specifics regarding the story of each sentenced criminal. The kind of questions readers might have include: • How many people were sentenced each year? Has this figure been changing over the years? Is it appropriate, should it be higher/lower? • Where did these crimes happen? Is it next to where I live? Am I con- cerned? • What is the typical profile (age, race, type of crime) of the sentenced criminal? Has this profile been changing over the years? • What are the main arguments, metrics and studies supporting my views? What numbers can I quote to make my opinion more credible when dis- cussing about the topic with others? • Who are the important people (politicians, celebrities, . . . ) who have the same opinion as me? • How fundamentally bad are these people? What is the kind of atrocities they have generally committed to (rightfully) deserve the capital punish- ment? What makes me fundamentally different from them? 
1.2 Context: The circumstances Work through the list (of 10 main headings) and describe your critical thinking about any assumptions, definitions or self-imposed factors you think might be relevant/existent. If any require no definition or are relevant to that scenario, explain why not? 1.2.1 People: Stakeholders As explained earlier, the managers of the journal are very implicated in this ar- ticle. Even though I am supposed to submit all my work to my direct supervisor for approval, it has been informally defined for this project that the owners of 5
  • 7. the journal would be the ones taking the final decision of whether and how to include my infographic. Therefore, for the length of this project, I am de facto reporting directly to them without consulting my supervisor. Their expectations have been clearly phrased. They hope that the visualisa- tion will help to convey their political message. They specifically asked for an infographic that would be ”powerful” and make them and the readers ”proud to be American, proud to be texan, proud of the constitution”. They want the visualisation to be ”fact-based” and ”scientific looking”, but insisted that it doesn’t include any ”metrics made-up by liberals to spread doubt in the minds of true Americans”. By this last sentence, they were referring to recently pub- lished economic papers finding no causal relation between the crime rate and the existence of a capital punishment. 1.2.2 People: Audience As explained earlier, the typical audience is texan, conservative and rural. A small fraction of the readers live in other states, less than 1% of the readers are international. The median reader of the print edition is 54 years old and the median reader of the online edition is 29 years old. The online edition only represents 4% of the total revenues, but is estimated to amount for about 20% of all the readers. From a previous survey, we known that the typical reader has a high school education and possible ”some university”, but most readers’ professional ac- tivity doesn’t include ”interacting with numbers on a regular basis”. Previous attempts to use data-based infographics have shown that the readers have a strong preference for intuitive visualisations that can be quickly grasped. A small but significant fraction of the readers is strongly supportive capital punishment and already expects the journal to make a special article for this occasion. These readers would be very disappointed if the subject wasn’t cov- ered extensively or contained inaccuracies. 
1.2.3 Constraints: Pressures
The 500th execution is scheduled to happen in two days. I was informed this morning that I am assigned to this project and have to deliver the final visualisation in two days by 10 p.m., which leaves room for last-minute changes.
It has been made clear to me that my performance on this work will be assessed and used as a reflection of my skills in general. As I am a new employee, the managers still haven’t received much feedback regarding my performance, and this will be an excellent opportunity to distinguish myself. The managers know that and thus expect me to put in some extra hours on the project over the next two nights.
For a double-page visualisation, the policy of the journal is to allocate a $400 budget to purchase rights for any necessary illustrations. As for every project, I am also free to request any help I deem necessary from the other members of the design team who are not currently assigned to a particular project.
1.2.4 Constraints: Rules
The topic is expected to be the headline of the day after tomorrow. In addition to the front page, the article will occupy a total of 5 pages. My infographic will occupy a double page (pages 2 and 3 of the article), followed by two more pages of text (pages 4 and 5 of the article). The newspaper has a tabloid format, i.e. each page measures 430 mm × 280 mm. There is no limitation regarding colours, but it would be preferable to respect the general colour palette of the journal (mostly variations of red, similar to those of the Republican Party).
The policy of the newspaper is to design all visualisations with the print edition as the only edition in mind. The infographic might be adapted for the web later only if the format allows.
1.2.5 Consumption: Frequency
This is a one-off visualisation and will only be used for this one edition.
1.2.6 Consumption: Setting
The newspaper will be distributed through the regular channels. This will be a Thursday edition and thus won’t be distributed over the weekend. It is very rare for readers to read an old edition, so the infographic should only be "consumed" on Thursday. If the web version gets popular, however, it is possible that a few people will keep visiting the article page for a few more days.
1.2.7 Deliverables: Size
I am to deliver a PDF version of a ready-to-print infographic by the deadline, and store in the digital archive of the journal all the files and documents necessary to reproduce or adapt my work. I can, if I wish, submit alternative versions of the infographic for the editors to choose from, but all submitted work needs to be final and ready to print.
I must be ready to accommodate any last-minute change.
1.2.8 Deliverables: Format
The standard format for the deliverables is PDF. I do not have to worry about the web adaptation, which is generally a simple cut from the raw PDF file.
1.2.9 Resources: Creators
As explained earlier, I am to work alone, with the support of the rest of the design team if necessary.
1.2.10 Resources: Technical
The newspaper has licenses for most popular design software, including the complete Adobe suite. I am free to use any software I want. If the purchase of additional software is required, I can either use the budget allocated to the project (up to $100 per software product) or make a request to my supervisor.
I have been instructed to use the list of executed offenders of the Texas Department of Criminal Justice as my main source of data for this infographic, but I am free to use any other reliable source. The newspaper subscribes to a few premium-access databases that I can use to obtain additional data if necessary.
1.3 Vision: The purpose map
Describe and reason what the possible aim of this work would be in terms of what experience (the Exs) it would facilitate and through what tone of voice (Read vs. Feel)
The infographic will contain many numbers (since there are a lot of different and important statistics to display) but be mostly visual and intuitive, i.e. the meaning of these figures should be easily understandable from the context in which they are displayed. This would put this visualisation at the left end of "exhibitory" on the purpose map, since most of the conclusions from the data are directly exposed to the reader. But the explanations should remain short, and the total amount of text kept to a minimum, so the infographic couldn’t be qualified as "explanatory".
These numbers have strong implications for human lives and security, so readers might feel very strongly after reading the infographic. The raw data has potential for a lot of "reading" and "quantitative analysis", but making a topic as sensitive as executions too "scientific" would likely offend many readers.
On the other hand, totally excluding logic and making it a purely emotional topic wouldn’t be effective and would go against the policy of the journal. Thus, the infographic should be placed at the top of the "feeling" category, close to the line with "reading". We would then have the following purpose map:
1.4 Vision: Your ideas
Sketch out roughly what you think this work could look like: what colours, what keywords, rough drawing, any other work out there that you can be inspired/influenced by? (not a test of artistry, just map out ideas)
The two pages could be merged together to form a single visualisation. For the sake of readability, however, no word should be split across the two pages. If necessary, a large graph can be split across the two pages, but this should be avoided.
The main kinds of elements that could be included are:
• A bar chart showing the count of executions over time (in either one-year or five-year periods).
• Other time series printed in parallel with the previous bar chart, to show potential correlation between executions and another crime metric. These could also show the impact of new bills on the number of executions per year.
• A timeline recalling the key historical dates (such as the introduction of new legislation).
• A map of Texas to show where most of the executions/crimes happen. Some points of the map can potentially be emphasized if they help explain the crime rate (e.g. the border with Mexico?).
• A vertical bar chart showing the most frequent words in the last statements of the criminals (e.g. god, pardon, love, ...).
• Featured profiles of executed criminals (preferably authors of atrocious crimes with little empathy for the victims).
• Small text boxes next to relevant charts to provide additional insights and anecdotes, e.g. "What to think of this map?" or "What to remember from this graph?"
As for the background, it could be something patriotic such as a partially transparent zoomed-in American/Texas flag or symbol.
Overall, the infographic would look something like this:
2 Scenario 2
Analysts at the Texas Department of Criminal Justice staff reporting to senior management at the Texas Department of Criminal Justice.
2.1 Context: The reason
Outline what you think might be the essence of the curiosity: What question(s) do you think the potential audience might need answering/find interesting that the visualisation would ultimately present?
We are a team of 4 Junior Analysts (including one summer intern) and we have been assigned to a two-day project to prepare all the visual elements of
the biennial public report on the death penalty in Texas (part of a legal obligation of transparency of the Texas Department of Criminal Justice). We are supervised by two senior analysts who will be responsible for handing the final report to senior management for final review and approval.
The senior analysts have provided us with a detailed list of all the main components of the report. Last year’s report was 83 pages long (including appendix), with a total of 34 tables and graphs. Responsibility for the graphs has been split across the team of junior analysts and I have been assigned to prepare 8 of the 34 visuals.
Stakeholders Intrigue. We are to report directly to the senior analysts, who will, in turn, report to the senior management.
The senior analysts will be in charge of writing most of the core of the report. The content of the report will be highly influenced by the data, which will be taken in part from the tables and graphics we are to prepare. Thus, the senior analysts (our direct supervisors) hope that these graphical elements will be structured in a way that facilitates their writing and inspires them for this year’s report. They will also want to make sure that the graphs and tables are similar to those of previous years’ reports to avoid any trouble with the senior management. The kind of questions they might have include:
• Can I find the figure I want easily?
• Can I compare it to last period’s figure easily?
• Is the format the same as what I am used to?
• Are there any mistakes?
• Do I need to ask for an additional graph?
The senior management will be the ones directly responsible for the publication of the report, and thus will be held accountable for any positive or negative consequence that might result from it.
They will first take a global look at the report to check for quality, then read it more attentively to verify that they agree with the content and, hopefully, gain additional knowledge (or fill their knowledge gaps) on the topic. If the quality doesn’t meet the standards, they will have to send it back to the senior analysts or correct it themselves. They hope to avoid this step to reduce their workload. But fundamentally, the report is more a formality for the senior management than a real source of insights or potential for career progression, so their biggest intrigue really is "Does this draft look satisfactory enough to allow me to move on to more interesting projects?". Internally, this report is really viewed as a waste of time and an increase in the costs of the death penalty.
Audience Intrigue. As this report is part of a legal obligation of transparency of the Department of Criminal Justice, it doesn’t have a particular target audience, and its readership will be by nature very diverse. It will include journalists, researchers, NGOs, writers, curious members of the public, etc. They might have any kind of question and might want to find the answer in the report. They might even have found the report totally randomly through a search engine. Most of this audience will not need to read the entire report but only to find the data or piece of information they need. Sometimes, they won’t even need to find it, but just need to be sure the information is contained in the report in order to use it as a reference in another (formal and lengthy) paper or report. Many of these readers will be used to the format of previous editions, and might take a dim view of any significant change from the usual format. The questions of the audience might include:
• What was the profile of the executed criminals over the last two years?
• What is the opinion of the Department of Criminal Justice on the evolution of the situation?
• Does this report speak about the overall cost of executions so that I can add it to my "references" slide at the end of my "Development of Criminal Justice in Early Communities" lecture in my "SOC 101 - Introduction to Sociology" class?
However, not all of the audience’s potential questions are relevant to the design of the visualisations (and of the report), since the main objective of that document is to meet the legal obligations of the department rather than to satisfy "customers". For that reason, the most important audience question that the Department of Criminal Justice will want to answer is "Can this information be found somewhere in the report?".
2.2 Context: The circumstances
Work through the list (of 10 main headings) and describe your critical thinking about any assumptions, definitions or self-imposed factors you think might be relevant/existent. If any require no definition or are not relevant to that scenario, explain why not.
2.2.1 People: Stakeholders
As explained earlier, the main stakeholders for me are my two direct supervisors, the senior analysts. The production of these visualisations is a routine task for everybody, me included, and it cannot lead to any promotion or bonus.
The end stakeholders are the senior management, but unless we (Junior Analysts) produce work of such low quality that the senior management needs to find someone responsible to fire, we will not have any interaction with them.
Fundamentally, the main stakeholder is the United States Constitution and the rule of law: we need to produce graphs that meet the requirements of our legal obligation, not for us, but as part of our duty to serve the best interests of the country.
2.2.2 People: Audience
As explained earlier, the audience is very diverse.
We estimate that the online version of the document (available on the Department of Criminal Justice’s website) accounts for the majority of the readers. From Google Analytics statistics, we know that 65% of the downloads of previous reports have been made by people (or computers) located in Texas. 54% of the total "pageviews" on the report come from organic search (mostly from Google), 23% come from internal links on the department’s website, 8% come from direct access (generally people who copy-pasted the link of the report into their web browser) and 15% come from links on other websites pointing to the report. Less than 1% of the downloads come from social networks.
The print version of the report (around 400 copies) circulates mostly among institutions and libraries, but can also be ordered by anyone through the department’s website ($15 administrative fee). In most cases, these institutions want to own a physical copy of the report mostly for archiving purposes or for the sake of completeness.
The direct audience is generally very sophisticated, holds a bachelor’s/master’s degree or PhD and is less sensitive to the form of the report than to its content.
The report also has an indirect audience, that is, people who will read extracts of it or content inspired by the report but rephrased to be more accessible, for example in newspapers, magazines or other media. Generally, these people will not even be aware that the content they are reading originally comes from this report.
The profile of this indirect audience is very different from that of the direct audience: it will be less educated and less captivated by the topic.
2.2.3 Constraints: Pressures
The graphics must be handed in tomorrow by 5 pm. No extension will be granted (or "should be required", according to the senior analysts). No budget is allocated to this project. There is no particular pressure on the project: everybody expects it to be a routine task.
2.2.4 Constraints: Rules
The layout must be close or identical to that of the previous year. We are allowed to make some minor changes if necessary. The visual appearance can be changed compared to the first year, but only if time permits and if all four junior analysts agree on the modifications (as all the visuals must obey the same design guidelines).
The original report is in colour but it is not uncommon for people to photocopy parts of it in black and white. In particular, the charts are often used in academic contexts as teaching material and handed out to students. Therefore, the black and white outcome must always be kept in mind when designing the different parts of the report.
The font used in the appendix section of the report is Times New Roman 11pt. The report is printed on A4 pages with 1" margins. Each page contains a discreet page number as well as the name of the Department of Criminal Justice at the bottom of the page. The top of the page also includes the name of the current section of the document. The formatting of the report will only take place once all the elements have been completed.
All the tables and numeric values of the appendix have to be written in pure text (no images) when possible so as to make them searchable (by the user or the search engine).
2.2.5 Consumption: Frequency
A new report is published every two years. Most of the visuals and tables can be replicated from the previous report with only a limited number of changes. Every time, however, the structure is slightly altered and some tables/visuals are dropped or added.
2.2.6 Consumption: Setting
The report is most popular in the weeks following its publication. However, it is supposed to remain "valid" for two years, so figures and conclusions that only have very short-term importance should preferably be avoided.
As explained earlier, it is distributed through different channels.
The main one is the digital version freely available on the website of the Department of Criminal Justice (and indexable by search engines). The print run was only around 400 copies the previous year, but this number is now quite flexible as the department has decided to move to Print-On-Demand (POD) technology this year, through Amazon CreateSpace, to save costs and gain flexibility.
2.2.7 Deliverables: Size
There are 34 visuals in total, but I am only in charge of 8. If I finish early, I might get assigned some of the visuals of my three colleagues.
More specifically, the eight visuals consist of:
• Three full-page tables, mostly containing numbers
• One half-page map with legend
• One full-page set of bar charts
• A full-page vertical timeline
• Two half-page combo line/pie/bar charts (new for this year)
The two half-page combo charts should take the most time as I will have to design them from scratch. Two of the three tables require actively searching for information in a diverse set of sources. The other five visuals consist mostly of updating information and won’t represent more than a few hours of work.
2.2.8 Deliverables: Format
The eight visuals must be delivered in eight different .docx Microsoft Word documents, in the format specified above. As explained above, output must be optimized for both print (including black and white photocopies) and web (including text-only search engines).
Another person will be in charge of converting the Word documents to PDF format for web distribution and POD through Amazon CreateSpace.
2.2.9 Resources: Creators
I work as part of the team, but I am in charge of my own visuals. I have a good relationship with my teammates, so it is likely that I will ask for their advice in case I have any doubt. The four junior analysts share the same office and collaboration is frequent.
2.2.10 Resources: Technical
All the employees are equipped with one basic Windows desktop computer. The Microsoft Office and Adobe suites are installed by default on all the computers, as well as a few additional programs. We are free to download additional software if necessary, but no budget is allocated. However, we are expected to only deliver files that can be read with the default software of the department. In particular, all the visuals must be contained within .docx documents.
2.3 Vision: The purpose map
Describe and reason what the possible aim of this work would be in terms of what experience (the Exs) it would facilitate and through what tone of voice (Read vs. Feel)
The visuals and tables clearly aim to be as objective and informative as possible. The focus is on quantity and accuracy rather than ease of understanding. Clearly, these visualisations should occupy the top left corner of the purpose map:
2.4 Vision: Your ideas
Sketch out roughly what you think this work could look like: what colours, what keywords, rough drawing, any other work out there that you can be inspired/influenced by? (not a test of artistry, just map out ideas)
For most of the visuals, no creativity is required since the work will only consist of updating the data with more recent information. Most of the data required for the eight visuals can be found directly in the list of executed offenders on the department’s website.
Roughly, the eight figures would look a bit like this:
The inspiration for the new combo charts comes directly from the other similar combo slides that already exist in the rest of the appendix. Most of the text boxes simply explain the content of the tables and graphs. The vertical timeline will be identical to that of the last report, except that two new points must be added with a small text paragraph (the margin has to be reduced to accommodate the new data points).
3 Scenario 3
A campaign group looking to help influence a debate about the ending of capital punishment.
3.1 Context: The reason
Outline what you think might be the essence of the curiosity: What question(s) do you think the potential audience might need answering/find interesting that the visualisation would ultimately present?
I am part of a small campus association fighting against the death penalty, made up of a few friends with strong convictions. A major conference on the topic of the death penalty will be hosted at our university in a few weeks. The organising committee is made up of a few university officials with strong ties to the Republican Party, and most of the speakers expected to be present at the conference are well known for their systematic conservative political bias.
Due to the importance of the debate, which is expected to attract up to a few thousand students and will be broadcast live on television, we decided to take action and protest against the structure in place. The first step of our plan of action is to raise awareness through a strong campaign both on campus and on social networks. We will start our online campaign by publishing a few viral infographics on the university Facebook pages.
Stakeholders Intrigue. The group is strongly committed to relying on facts for all its points. But we know that our "opponents" will mostly rely on emotions and "shock" arguments, so we decided to adapt our approach to the field in which we fight.
The stakeholders are the 14 group members (including myself). We are already extremely aware of all the facts, evidence and data surrounding the subject so we don’t expect to learn anything new from this infographic. However, we have always found it very difficult to convey all the points of our position to an uninformed individual, so we hope that this visualisation will help us come up with better ways to convince others of our points. So the kind of questions that we hope this visualisation would help us answer includes:
• In which order should we bring our arguments?
• What data do we need to show to effectively convince others?
• What is the most intuitive way to visualise our strong feelings?
• How does it look from outside?
• How many points do we actually have? How much space does it take on paper?
• Are rationality and fact-based approaches compatible with "coolness" for this topic? Can we make it interesting while still being objective?
Audience Intrigue. The audience will be very diverse in nature and relatively uninformed about the topic. As the date of the conference approaches, the topic will become more "trendy" and students will gain interest, possibly looking to forge their own opinion before the debate. Thus, we expect people to be more attentive to our claims than usual.
As our university tends to be more conservative in general, we tend to receive less support and more hostility from our peers. But most of the students are not politically involved and remain quite open to ideas from any political origin. They are looking for any information that could help them take a stance in the debate. Their questions might include:
• What are the arguments of both parties?
• What are the main "things to know" about the topic?
• Are there any fun figures that I can quote in a discussion with my friends? What can make me sound more clever?
• Why should I care?
• What is the position of the rest of the students and of the country on the matter? Where do I stand compared to them?
For the students who already have a strong opinion on the matter, their intrigue would probably be closer to:
• Is this infographic trying to speak about something I don’t want to hear? Does it come from people with a political opinion different from mine that I should just ignore?
• What can I find that confirms that the other party has weaker claims than mine? How strong is my position? What are the main arguments of the opposition / our best arguments?
But the targeted audience is mostly the first group, the relatively uninformed people, as our influence group largely considers the informed students as "gained causes" or "lost causes" whose cost of conversion is too high.
3.2 Context: The circumstances
Work through the list (of 10 main headings) and describe your critical thinking about any assumptions, definitions or self-imposed factors you think might be relevant/existent. If any require no definition or are not relevant to that scenario, explain why not.
3.2.1 People: Stakeholders
As explained earlier, the stakeholders are the 14 students who are part of the group. The group is mostly informal: it is officially registered as a campus organisation, but most of the decisions are taken individually according to personal variations of opinion and motivation. On some occasions, however, we try to coordinate our efforts to reach an objective. This campaign is one of these occasions, so we decided to only publish things on social networks after having informed the other 13 members and received their tacit approval. There is no formal decision process: we have a WhatsApp group in which we send our suggestions. If after a few hours no one has rejected the idea, it is generally assumed that it is fine to proceed.
The 14 students are all undergraduates at the university, coming from different majors, with a focus on social sciences.
3.2.2 People: Audience
The university counts 21 235 students, 81% of whom are full-time undergraduates. As explained earlier, the main target consists of the non-politically affiliated students, but we aim to maximize the total exposure of our campaign so we also hope to reach a few individuals from other categories, including staff, lecturers, university officials and possibly the general population. On the same occasion, we hope to attract a few students to join our organisation.
3.2.3 Constraints: Pressures
Our funds are very limited so members mostly use their own resources. The time pressure is quite strong since the conference will take place in only a few weeks and our classes keep us busy most of the time. However, no specific deadline is set for my own work and it is quite flexible.
In the past months, a few students who are members of various campus groups have received administrative sanctions (going up to expulsion in the most extreme cases) following a protest on diverse issues. The purpose of these sanctions was to send a clear message to student organisations: disobedience to college rules will not be tolerated. This has created a lot of tension and puts a lot of pressure on many associations, since most students want to stay out of trouble and prefer to remain unnoticed by the administration. In particular, we could face legal action for libel if the elements we publish directly incriminate university officials.
Another source of pressure is the ranking algorithm of social networks, Facebook in particular. Since the infographic must become viral on Facebook to reach the diffusion objectives, certain rules must be observed, such as publishing the infographic at the right time of the day, having the right keywords in the text description, having the right dimensions for the picture profile update, adopting the tone generally expected by Facebook users, etc.
3.2.4 Constraints: Rules
Most of the rules are dictated by the practical necessities of publishing on Facebook. The infographic must be in a format that allows for comfortable reading on Facebook. The readers should not have to leave Facebook to see the visualisation on an external website, so this has to be taken into account. We must also remember that Facebook highly compresses images, so no zooming should be required. One Facebook post can contain multiple images: this can be used as a solution to the problem of the low resolution allowed for individual images. Interactive images such as GIFs are not allowed, nor are most non-standard image formats; the Facebook documentation should be consulted to check for technical feasibility before starting to develop any innovative non-standard visualisation.
The logo and the name of the campaign group must be clearly shown on all
the publications. It is important that the message be conveyed together with the name.
3.2.5 Consumption: Frequency
This is a targeted one-off campaign, but the most popular infographics might be republished later or on other social networks, or re-used.
3.2.6 Consumption: Setting
Rapid (hopefully viral) consumption on social networks, mostly Facebook. The timing of the publication is very important and is planned in advance to correspond to the peaks of social activity of the students.
The infographics are generally published through the Facebook page of the campaign group, then shared by each of the group members and by a set of supporters and partner student associations. They are also published directly in all the public and private Facebook groups of the university to which the student members have access.
The publication is often associated with a few prizes to win, awarded to people randomly drawn from the list of those who "liked" the post, to encourage students to like each of the publications and increase virality.
3.2.7 Deliverables: Size
I have to prepare one set of images that will be published together in a Facebook post. These images are generally accompanied by a short text description to spark interest and by a link to the (newly created) website of the group.
The total number of images is entirely flexible but should remain reasonable to avoid the risk of boring the reader. The first few images should be very simple, very visual and easy to understand to grab the attention of the readers. The following images will only be seen by users who have already started reading the post and can, thus, be more sophisticated. Therefore, the first few images will require more "artistic" and "creative" work, whereas the last ones are likely to require more research and descriptive work.
3.2.8 Deliverables: Format
All the images should be designed with Facebook publication as the main objective. In a second step, those images can be adapted for publishing on other social networks or in other contexts, such as the group website or print diffusion. From time to time, the group is invited to take part in small talks and re-uses some of the infographics published on Facebook to illustrate some of its slides.
3.2.9 Resources: Creators
The association is very collaborative and people are always glad to help each other. Skills are very diverse so when I don’t know how to do something, I generally ask for help and receive it very quickly.
I am used to working together with two very close friends of mine who are also part of the group and have complementary skills. They are less active than me in the association and are not currently working on any project, but I know I can rely on them to help me with the technical part of the creation. The first is an art student, very good at hand drawings of any sort, and the second is a CS major who is very agile with design software. I generally plan all the details and create the easiest elements myself, and I rely on them for all the more complex graphical components of the visualisation.
3.2.10 Resources: Technical
As students, we have free access to most of the popular design or technical software. This includes the whole Microsoft Office and Adobe suites.
Each member of the group has their own habits for the design of these visualisations, so there is no standard among us.
3.3 Vision: The purpose map
Describe and reason what the possible aim of this work would be in terms of what experience (the Exs) it would facilitate and through what tone of voice (Read vs. Feel)
The campaign’s tactic is made of three phases:
1. Spark the interest of the reader with shocking facts and numbers
2. Convince him that our side is the good one with objective facts and reliable data
3. Make him emotionally involved so that he doesn’t immediately forget what he read after closing the computer and takes action to bring about change
So there is a mix of feeling and reading, with a slight tendency towards feeling.
No interaction with the graphs is possible, but the user has to click on the "Next" button to see the next image of the post, and social interaction is encouraged through the comments.
Still, this is not enough for the visualisation to qualify as "exploratory". It cannot be called "explanatory" either, as the amount of reading and detailed information is moderate. Overall, the visualisation tries to be "exhibitory" and to simply "show things as they are", letting the reader draw the most obvious conclusions by himself.
Thus, on the purpose map, the visualisation would be around here:
[Figure: position of the visualisation on the purpose map]
3.4 Vision: Your ideas
Sketch out roughly what you think this work could look like: what colours, what keywords, rough drawing, any other work out there that you can be inspired/influenced by? (not a test of artistry, just map out ideas)
My original idea was to make a "Top 10", a format generally known to attract a lot of "views" on social media. It could be called, for example, "TOP 10 reasons why Death Penalty is bad". It would be made of 10 images, starting with the tenth and going up to the first. Normally, the reader would expect the reasons to become increasingly important, which encourages him to continue reading. Since the arguments are quite subjective anyway, this doesn't necessarily have to be true, but finishing with a memorable point is always a good thing. As explained earlier, the last points can be more developed than the first and could encourage the reader to learn more about the topic by reading other recommended sources.
The images and elements of the top 10 could look a bit like this on Facebook:
[Figure: mock-up of the Top 10 images as a Facebook post]
As shown above, the first image could use the picture and some personal details of a specific criminal to make readers feel the scale of the costs and bring them back to an individual level. Making the criminal look more like a person also heightens emotions, since people then no longer see "criminality" as a society-wide issue but "criminals" as bad people who don't deserve so much attention. Here, I think the main strength of this database of convicted offenders is that it gives us a lot of very personal information that we can display about selected felons (at the risk of appearing unethical by doing so).
The other seven images could make use of the other attributes of the database, most specifically the last statements. For example, a word cloud could illustrate one of the images (depending on which words come out as the most visible). The frequency of a specific topic in the last statements can also be used as an argument and shown visually in a bar chart. But overall, the executed offenders database would probably represent only a small part of all the data used for the visualisation.
4 Scenario 4
What potentially intrigues you about this data? What might you undertake if you had the chance?
4.1 Context: The reason
Outline what you think might be the essence of the curiosity: What question(s) do you think the potential audience might need answering/find interesting that the visualisation would ultimately present?
Personal Intrigue. I am a student at Imperial College London, studying in the new MSc in Business Analytics programme. During our Visualisation course, we explore some innovative ways to visually represent and understand data. In one of the classes, we briefly look at the "Executed offenders" list on the website of the Texas Department of Criminal Justice. At home, after the class, curious, I go back to the website, realise that the data has a lot of potential and decide to explore it a bit further.
In particular, I am interested in the individuals' last statements: what are the main topics that an individual wants to mention when he knows he is about to die? Is it religion, love, family, friendship, politics, justice, regrets, justification, etc.? Have these topics changed over time? Do they differ by gender, by age or by the type of crime committed?
I think that the data would be easier to understand in a fully interactive visualisation where parameters can be changed in one click, and that it would fit well in such a format. I realise this would be a good opportunity to brush up my JavaScript/jQuery skills and potentially try a few JS/HTML5 visualisation libraries, so I decide to make it web-based.
Audience Intrigue. Of course, this is a personal project and the primary questions I try to answer are mine. But since I will do the work anyway, why not share the results?
As a first step, I decide to publish my work on an ad hoc page of my personal website. Depending on my findings, I might dedicate a website to the project. The audience might have the same questions as me, and perhaps even some additional ideas. A comment section on the website could lead to some very interesting suggestions for extending this project.
4.2 Context: The circumstances
Work through the list (of 10 main headings) and describe your critical thinking about any assumptions, definitions or self-imposed factors you think might be relevant/existent. If any require no definition or are relevant to that scenario, explain why not?
4.2.1 People: Stakeholders
Me, that's all. I only do this as a side project, for my personal interest.
4.2.2 People: Audience
In the beginning, no one except me. Later, once published on my website, just the people I give the link to and potentially a few visitors coming from search engines. If I like my findings and dedicate an entire website to the project, then the audience could possibly become much larger. But that's not the priority right now.
4.2.3 Constraints: Pressures
I have a few weeks until the end-of-semester projects start. As this is still the beginning of the semester, I have a lot more free time. Hopefully, I can finish the core of the project by the end of the semester and start thinking about and working on the "publishing details" during the vacation.
I want to get inspiration from existing interactive visualisations.
I need to allow a few days to explore the web and see what has been done in the past (not necessarily relating to the death penalty). In particular, I want to know what the main visualisation libraries are, what they can do and what they can't.
4.2.4 Constraints: Rules
I want my visualisation to be compatible with the main web browsers (except older versions of Internet Explorer) and smartphones (Android and iPhone). It has to run on almost any screen size/resolution and shouldn't require any external plugin. Ideally, the loading time should be short (so not too many files to download) so that slow smartphones can access the page without a Wi-Fi connection.
4.2.5 Consumption: Frequency
Normally a one-off project, but I could update the data as it becomes available (i.e. as more criminals receive capital punishment, or using data from other states) or improve it based on the feedback I receive. In any case, if I publish the results, they should remain public on the website indefinitely; I generally try to keep all my websites and publications online, even when they become obsolete, for archive purposes.
4.2.6 Consumption: Setting
My own consumption: live, as I work on the project.
Consumption by the audience: if published, prolonged, most likely with a peak when I first release the results on social media. Further peaks from search engines are also possible when a major capital-punishment-related event happens, if my project's page has good SEO (Search Engine Optimisation). In general, I don't expect a very high level of traffic on the page.
4.2.7 Deliverables: Size
One static web page only. However, I see two main sources of work:
• Working on the data. Categorising the different last statements appropriately. What should the relevant "categories" of last statements be? What level of detail should we have (should we just look for "religion" in general, or distinguish between different religions)? How do we decide whether a last statement includes a link to "religion" or not? What should we consider as "religion"? How should we interpret last statements that are not clearly phrased?
• Working on the visualisation per se. How to communicate the data in an explicit way? How to visually express the strength of correlations? How to make it look like an objective approach? How to make it intuitive, and how to invite visitors to try changing the different parameters of the interactive visualisation?
The first part of the work will be a bit more "mechanical" but must be done carefully or the study won't be valid. Most likely, each of the hundreds of statements will have to be MANUALLY categorised. A clustering algorithm would probably give poor results given the complexity of the data and the rather small size of the population, and would be a bit overkill. Hopefully, I could finish this first step by the end of the semester.
Once this first part is done, it will already be possible to compute a few statistics and see whether there are some interesting trends. If they show that there is value in the data and that it is worth exploring further, then I could move to the second step. The time required for this second part is likely to be more variable and less predictable, as it is the "creative" part and could be polished for months if wanted.
4.2.8 Deliverables: Format
One web page, made of multiple files (html, js, png, jpg, etc.). As explained before, the output should be static and therefore not require any interaction with the server once the page is loaded. This means that the user can explore the whole dataset offline and look at the source code freely. The data won't be stored in a database but directly within the page.
4.2.9 Resources: Creators
Individual project. Some other contributors could join at later stages, but none are expected for now.
4.2.10 Resources: Technical
I have access to most software and can generally download any additional software I need. I have my own dedicated server to host the website and the project, which I already use for my other websites. I am already familiar with JavaScript and HTML web development, so I can use advanced features easily if necessary. The data classification will most likely be done in Excel to simplify things.
4.3 Vision: The purpose map
Describe and reason what the possible aim of this work would be in terms of what experience (the Exs) it would facilitate and through what tone of voice (Read vs. Feel)
This visualisation will clearly be exploratory, as the user can play with the parameters.
However, I wouldn't put it too far to the right on the purpose map, as it doesn't offer any complex interaction: you can only change the values of a few fields (and post a comment at the bottom of the page); you can't really contribute to the dataset itself or radically influence what you see on the screen.
The visualisation is closer to "feeling" than to "reading": even though the numbers will be shown, the main purpose is to give an approximate idea of the distributions rather than exact values. Since the last statements are classified on partially subjective criteria anyway, trying to look too "serious" and "scientific" would reduce the credibility and interest of this visualisation.
The resulting purpose map looks like this:
[Figure: position of the visualisation on the purpose map]
4.4 Vision: Your ideas
Sketch out roughly what you think this work could look like: what colours, what keywords, rough drawing, any other work out there that you can be inspired/influenced by? (not a test of artistry, just map out ideas)
The core idea behind this visualisation is two easy steps: 1) You pick a profile. 2) We show you what the last words of a criminal with this profile are typically about.
When I want to give a general idea of the skills/traits of an individual, I like to use the radial "star chart":
[Figure: example of a radial star chart]
I believe it would fit our needs perfectly here too, since it gives a general impression of the last statements rather than specific values.
I would also like to give the user the possibility to explore the data at the individual level. So I would probably add a rectangle with all the pictures of the convicted offenders, and the user could hover the mouse over a picture to see the details of that individual, including how their last statement has been categorised.
Possibly, we could add a word cloud at the bottom of the visualisation to give a broad overview of the words used in the statements.
Overall, our visualisation would look similar to this:
[Figure: sketch of the proposed interactive visualisation]
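The two-step idea of Scenario 4 (pick a profile, then show what the last statements of that profile are typically about) can be sketched as a small computation that would feed the star chart. This is only an illustrative sketch in Python: the record layout, the field name age_group and the topic labels are assumptions standing in for the manually categorised statements, not the actual data.

```python
from collections import Counter

# Hypothetical records: each last statement has been manually tagged with topics.
STATEMENTS = [
    {"age_group": "18-29", "topics": {"family", "religion"}},
    {"age_group": "18-29", "topics": {"family"}},
    {"age_group": "30-44", "topics": {"religion", "regrets"}},
    {"age_group": "30-44", "topics": set()},  # no topic identified
]


def topic_shares(records, **profile):
    """Return, per topic, the share of records matching the profile that mention it."""
    matching = [r for r in records
                if all(r.get(key) == value for key, value in profile.items())]
    if not matching:
        return {}
    counts = Counter(topic for r in matching for topic in r["topics"])
    return {topic: n / len(matching) for topic, n in counts.items()}


# One value per axis of the star chart for the selected profile.
shares = topic_shares(STATEMENTS, age_group="18-29")
```

Each share lies between 0 and 1, so it can be plotted directly as the length of one branch of the star chart, which matches the aim of giving an approximate impression rather than exact values.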
Assignment 1b: Developing intimacy with data and establishing editorial perspectives
Introduction
You are working at a broadsheet newspaper as a graphics editor and preparing ideas for your assignments editor about possible visual work you could undertake ahead of the Rio Olympics.
Compile a report that details your deep examination, proposed transformations, explorations and editorial ideas based on the data provided (and data you could reasonably obtain).
You are provided with two contrasting worksheets (in Excel) showing medallists from the Summer Olympic Games
5 Examination
Articulate the meaning of the data (representativeness and phenomenon) and thoroughly assess and describe the physical properties (type, size, condition). Compare what the two dataset offer and contrast their differences.
5.1 Dataset 1
Number of observations (rows): 4093
Each observation represents: an Olympic medal won (it can be won by multiple athletes together)
Observations that aren't included? Only rows for medals in 4 sports (Athletics, Canoeing, Rowing and Swimming). Only includes years 1896 to 2012.
Columns per row: 10
Column details and structure:
• Games: Integer, contains the year (4 digits) of the games. Range: 1896-2012. Most frequent value: 2012, appears 267 times. Least frequent value: 1896, appears 20 times.
• Sport: String, 4 unique values (Athletics, Canoeing, Rowing and Swimming). For each year (variable "Games"), each of the sports should normally appear, but in the older years (e.g. 1896), some of the sports had not been introduced yet. Most frequent value: Athletics, appears 1550 times. Least frequent value: Canoeing, appears 491 times.
• Event: String, 93 unique values, contains the "discipline" as well as the parameters of the discipline (such as distance) and the "gender" of the competition. For each year, each of the 93 events should normally appear, but in the older years (e.g. 1896), some of the events had not been introduced yet. Each event is always grouped with the same sport. Most frequent value: 100m Men, appears 85 times. Least frequent value: 3000m Steeplechase Women, appears 3 times.
• Athlete(s): String, 3211 unique values. Format: first name followed by last name, with the last name written in upper-case characters. Note that names with special characters are incorrectly stored (they have question marks instead of the true characters). For team competitions, athletes are separated by commas. Most frequent value: Michael PHELPS, appears 13 times.
• CountryCode: String, 3 upper-case characters, contains the code of the country that the athlete represents. 102 unique values. Most frequent value: USA, appears 982 times. Note: countries can appear, disappear or merge over time depending on the political context; example: GDR (German Democratic Republic) and FRG (West Germany) became GER (Germany).
• CountryName: String, full country name. 96 unique values. Mostly redundant with "CountryCode", with a few exceptions; for example, Germany can have country code DEU or GER depending on the year.
• Medal: String, contains the type of medal won. 3 unique values (Gold, Silver, Bronze). Most frequent value: Gold, appears 1368 times. Least frequent value: Bronze, appears 1358 times.
• Result: String, contains the performance of the athlete/country. The formatting is different for each discipline, as the result can be a distance, a time or another metric. This column has to be read together with the next one (Unit). Note: a few rows have the value "No result".
• Unit: String, contains the key for interpreting the "Result" variable. 4 unique values (M:S:DD, H:MM:SS, M:SS:DD, #:DD)
• ResultInSeconds: Integer, redundant with the column "Result", but here the information is stored in seconds. Note: one row has incorrectly formatted values due to a disqualification.
5.2 Dataset 2
Number of observations (rows): 26398
Each observation represents: an athlete who won an Olympic medal (if the same medal was won by multiple people, 1 row per person)
Observations that aren't included? Only includes years 1920 to 2008.
Columns per row: 10
Column details and structure:
• City: String, contains the city where the Olympic games of this year were hosted. 20 unique values. Most frequent value: Los Angeles, appears 2074 times. Least frequent value: Amsterdam, appears 710 times.
• Edition: Integer, contains the year (4 digits) of the games. 21 unique values. Range: 1920-2008. For each year, the city is always the same. Most frequent value: 2008, appears 2042 times. Least frequent value: 1932, appears 615 times.
• Sport: String, 33 unique values. For each year (variable "Edition"), each of the sports should normally appear, but in the older years, some of the sports had not been introduced yet.
• Discipline: String, 47 unique values. For each year (variable "Edition"), each of the disciplines should normally appear, but in the older years, some of the disciplines had not been introduced yet. For each discipline, the sport is always the same.
• Athlete(s): String, 19356 unique values. Format: last name in capital characters, followed by a comma and by the first names of the athlete.
• NOC: 3-character string, contains the code of the country to which the athlete belongs. 134 unique values.
• Gender: String, 2 unique values (Men and Women), indicates whether the athlete is male or female. Number of rows with "Men": 18967. Number of rows with "Women": 7427. Percentage of rows with women: 28.14% (7427 of the 26394 rows counted here; the ratio of women rows to men rows is 39.16%). Note that this doesn't indicate that 28.14% of the participants are women, but that 28.14% of the medal holders are women. Women's participation is actually much lower.
• Event: String, 442 unique values, generally contains the "discipline" as well as the parameters of the discipline (such as distance) but doesn't contain the gender requirements of the event. Sometimes the discipline is omitted and only the parameters are shown.
• Event gender: 1-character string, 3 unique values (M, W and X), indicates whether the competition requires participants to be men, women, or mixed (X).
• Medal: String, contains the type of medal won. 3 unique values (Gold, Silver, Bronze).
5.3 Difference between the two datasets
The biggest differences are:
• Size: The second dataset contains many more rows because it covers many more sports (33 instead of 4). However, the first dataset covers more years since it goes up to 2012 (compared to 2008) and back to 1896 (compared to 1920).
• Meaning of a row: In the first dataset, one row is one medal, whereas in the second, one row is one athlete. In most cases, this doesn't make any difference, but it creates more rows for team events such as relays.
• Columns: The two datasets have different columns. The first dataset has information about the results (time), which the second doesn't have. The second has information about the city and the discipline, which the first doesn't have. In addition, the second stores the "gender" information separately from the event, with a column for the gender of the athlete (not found at all in dataset 1) and a column for the gender requirements of the competition (found in dataset 1 within the column "Event").
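The difference in the meaning of a row can be illustrated with a short Python sketch that collapses athlete-level rows (dataset 2 style) into medal-level rows (dataset 1 style). The sample rows and athlete names are made up, and the choice of grouping key is an assumption: it would wrongly merge two athletes of the same country who share the same medal type in the same event (a tie).

```python
# Hypothetical rows shaped like dataset 2: one row per medal-winning athlete.
ATHLETE_ROWS = [
    {"Edition": 2008, "Discipline": "Athletics", "Event": "4x100m relay",
     "Event gender": "M", "NOC": "AAA", "Medal": "Gold", "Athlete": "ONE, Runner"},
    {"Edition": 2008, "Discipline": "Athletics", "Event": "4x100m relay",
     "Event gender": "M", "NOC": "AAA", "Medal": "Gold", "Athlete": "TWO, Runner"},
    {"Edition": 2008, "Discipline": "Athletics", "Event": "100m",
     "Event gender": "W", "NOC": "BBB", "Medal": "Silver", "Athlete": "THREE, Runner"},
]


def medal_key(row):
    """Columns that together identify one medal (assumed; ties are not handled)."""
    return (row["Edition"], row["Discipline"], row["Event"],
            row["Event gender"], row["NOC"], row["Medal"])


def count_medals(rows):
    """Count medals rather than athletes: team members share a single key."""
    return len({medal_key(row) for row in rows})


athlete_row_count = len(ATHLETE_ROWS)
medal_count = count_medals(ATHLETE_ROWS)
```

In this toy example the two relay runners produce two athlete rows but only one medal, which is exactly the inflation that team events cause in dataset 2.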
6 Transform the data
What could you do/would you need to do to clean or enhance the data? What other data could you reasonably source in order to consolidate the data? You may optionally do this but you are only expected to write about what you would do and why.
6.1 Dataset 1 cleaning
I recommend the following operations:
• Standardise the column "CountryName" so that it is constant over time. We are not trying to analyse political history but sporting performance, so it is preferable to simplify the data and focus on the geographical aspects. For example, replace all the variations of Germany, such as "German Democratic Republic", with "Germany".
• Drop the column "CountryCode", which is now fully redundant with "CountryName" thanks to the standardisation.
• Drop the columns "Result" and "Unit", which are redundant with "ResultInSeconds". The interpretation and calculations are made much easier if everything is already in numeric format.
• Drop the row of the disqualified athlete.
• Investigate the "No result" rows, and possibly drop them as well.
• Manually repair the athlete names that contain question marks, using information found on the internet.
• Create a new column indicating whether an event is "men" or "women" only. Then update the "Event" column to remove the "Men" and "Women" text, since it is now stored in a separate column.
• Create a column for "group performance" that is worth either 0 or 1 depending on whether multiple athletes won the medal together (for example in relay disciplines). This makes it easier to identify the rows that contain multiple athletes.
6.2 Dataset 2
I recommend the following operations:
• Create two new columns, one for the first name of the athlete and one for the last name, and then drop the old column.
• Standardise the column "NOC" so that it is constant over time. We are not trying to analyse political history but sporting performance, so it is preferable to simplify the data and focus on the geographical aspects. For example, replace all the variations of Germany, such as "GDR", with "GER".
• Create a new binary column to indicate that an athlete won as part of a team.
• Create an integer column that contains the number of members in the athlete's team if the athlete was part of a team, and 1 otherwise. This is necessary to avoid giving the impression that some countries have won more medals when they were just better at team sports.
Overall, the data is quite clean and much better structured than the first dataset.
6.3 Data enhancement
The biggest enhancement that I recommend is to merge the two datasets in order to combine the advantages of each. Concretely, here is how I would proceed:
• Start from dataset 2
• Add a column "ResultInSeconds"
• Fill "ResultInSeconds" for each row of dataset 2 using the corresponding row of dataset 1
• Add all the rows from dataset 1 for year 2012 and years 1896-1916 to dataset 2, since dataset 2 doesn't have any information on these years
Our new dataset is now a perfect combination of the two datasets in terms of information potential. However, this has also produced a lot of missing values, notably in the column "ResultInSeconds" and in all the rows that we added from dataset 1 (as well as no information on many disciplines for the year 2012). These missing values might create some technical problems when generating the different visualisations, and the merge might not be worth the effort. Depending on what we want to visualise, we might decide to use this new merged dataset, only dataset 1, or only dataset 2.
Ideally, I would look on the internet for a dataset similar to dataset 2 that also includes the missing years and the results in seconds. If we find more complete datasets that also include personal information on the athletes (such as weight, education, ethnicity, etc.), it could be very interesting for visualisations. I would also look to complement this data with information related to the games themselves, such as the costs, the number of visitors, etc.
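The merge steps listed above can be sketched in Python as follows. This is a minimal illustration under several assumptions: the sample rows and athlete names are invented, the alias table for country codes is only a starting point, and the two datasets are assumed to name events identically once the gender suffix is removed from dataset 1 (in practice the event names would need further normalisation).

```python
# Assumed aliases for historical country codes; a real mapping would be longer.
CODE_ALIASES = {"GDR": "GER", "FRG": "GER"}


def norm_code(code):
    """Map a historical country code to its present-day equivalent."""
    return CODE_ALIASES.get(code, code)


def split_event(event_with_gender):
    """Split a dataset 1 event such as '800m Men' into ('800m', 'M')."""
    event, _, gender = event_with_gender.rpartition(" ")
    return event, gender[0].upper()


def key_from_dataset1(row):
    event, gender = split_event(row["Event"])
    return (row["Games"], event.lower(), gender,
            norm_code(row["CountryCode"]), row["Medal"])


def key_from_dataset2(row):
    return (row["Edition"], row["Event"].lower(), row["Event gender"],
            norm_code(row["NOC"]), row["Medal"])


def merge(dataset1, dataset2):
    """Add ResultInSeconds to dataset 2 rows, then append the missing years."""
    results = {key_from_dataset1(r): r["ResultInSeconds"] for r in dataset1}
    # Rows without a match keep ResultInSeconds = None (the missing values).
    merged = [dict(r, ResultInSeconds=results.get(key_from_dataset2(r)))
              for r in dataset2]
    covered_years = {r["Edition"] for r in dataset2}
    for r in dataset1:
        if r["Games"] not in covered_years:
            event, gender = split_event(r["Event"])
            merged.append({
                "Edition": r["Games"], "Event": event, "Event gender": gender,
                "NOC": norm_code(r["CountryCode"]), "Medal": r["Medal"],
                "Athlete": r["Athlete(s)"],
                "ResultInSeconds": r["ResultInSeconds"],
            })
    return merged


# Invented sample rows, for illustration only.
DATASET1 = [
    {"Games": 2008, "Event": "800m Men", "Athlete(s)": "Sam EXAMPLE",
     "CountryCode": "USA", "Medal": "Gold", "ResultInSeconds": 105},
    {"Games": 2012, "Event": "800m Women", "Athlete(s)": "Kim SAMPLE",
     "CountryCode": "GER", "Medal": "Silver", "ResultInSeconds": 118},
]
DATASET2 = [
    {"Edition": 2008, "Event": "800m", "Event gender": "M", "NOC": "USA",
     "Medal": "Gold", "Athlete": "EXAMPLE, Sam"},
    {"Edition": 2008, "Event": "marathon", "Event gender": "W", "NOC": "FRG",
     "Medal": "Bronze", "Athlete": "PERSON, Alex"},
]
MERGED = merge(DATASET1, DATASET2)
```

The unmatched marathon row shows where the missing values mentioned above come from: any dataset 2 row outside the four sports of dataset 1 has no result to inherit.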
7 Exploration
Use Excel/Tableau to visually explore the two datasets in order to deepen your appreciation of their physical properties and to help you brainstorm potentially interesting angles of analysis.
I experimented with different visual formats (histograms, bar charts, lines, pie charts, maps, etc.) and explored how each value is best represented (colour? label? size?). I tried to identify which subgroups have the most data and would thus be interesting subjects for a visualisation (e.g. the USA, since it has won a lot of medals), which variables have the most or the least variation (e.g. the proportion of gold medals overall is the same each year, but it varies a lot if we only take the proportion of gold medals for a specific country) and which variables take too many possible values to be represented entirely (for example, there are too many disciplines to represent them all in a single pie chart).
A few of the graphs I generated during my exploration are shown below.
7.1 Dataset 1
[Figure: Top 25 countries with the most medals]
[Figure: Bubble map by number of medals]
[Figure: Map representation by number of medals]
[Figure: Average length of a competition, in seconds, by sport]
[Figure: Total number of medals over the years]
7.2 Dataset 2
[Figure: Proportion of male/female medals]
[Figure: Proportion of male/female medals over time]
[Figure: USA medals by sport over time]
[Figure: USA medals by type (bronze, silver, gold) over time]
[Figure: Sports by number of disciplines and categories of events]
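The exploration itself was done in Excel/Tableau, but the aggregation behind a chart such as the proportion of male/female medals over time can also be sketched in a few lines of Python. The rows below are invented and only reuse the "Edition" and "Gender" column names of dataset 2.

```python
from collections import defaultdict

# Invented rows using the "Edition" and "Gender" columns of dataset 2.
ROWS = [
    {"Edition": 1920, "Gender": "Men"},
    {"Edition": 1920, "Gender": "Men"},
    {"Edition": 1920, "Gender": "Men"},
    {"Edition": 1920, "Gender": "Women"},
    {"Edition": 2008, "Gender": "Men"},
    {"Edition": 2008, "Gender": "Women"},
]


def women_share_by_edition(rows):
    """Return, for each edition, the share of medal rows held by women."""
    totals = defaultdict(int)
    women = defaultdict(int)
    for row in rows:
        totals[row["Edition"]] += 1
        if row["Gender"] == "Women":
            women[row["Edition"]] += 1
    return {year: women[year] / totals[year] for year in sorted(totals)}


share_by_year = women_share_by_edition(ROWS)
```

The share is computed over all rows of an edition (women divided by men plus women), not as the ratio of women rows to men rows.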
8 Editorial
Rationalise a list of at least 5 distinct, interesting editorial perspectives: articulate the angle and the framing/focus applied.
Following the insights gained through the exploration of the data, I came up with a few angles that I thought would be interesting to develop and would fit well in a newspaper, for example.
For the following perspectives, unless otherwise specified, I consider the merged dataset (combination of dataset 1 and dataset 2) proposed above.
8.1 Perspective 1
Angle. Which countries have historically been the most important in the Olympic games?
Framing. Take all the years from 1896 to 2012, all the sports, all the disciplines and all the events. Count teams once only, not once per individual.
Focus. Focus on the top 20 countries, and show the composition of the medals (i.e. proportion of gold, silver, bronze) through a stacked bar chart.
8.2 Perspective 2
Angle. Has the performance required to obtain a medal in athletics changed over the years?
Framing. Use dataset 1, for years 1896 to 2012. Only consider rows that have a result in seconds stored. Exclude team competitions.
Focus. Group by category (distance and gender) and show one time series line for each category, with years on the x axis and result on the y axis. Draw the men's and women's lines in two different colours. Draw multiple graphs with different y scales or use a logarithmic y scale to avoid losing accuracy on some categories (e.g. sprint vs marathon, where one competition can last a hundred times longer than the other).
8.3 Perspective 3
Angle. How big is the gap between women's and men's results in the 100m? Has it decreased over the years?
Framing. Use dataset 1, for years 1928 (introduction of the women's 100m event) to 2012. Take only gold medals.
Focus. Focus on the difference in seconds and plot it as a bar chart (which could take negative values if a woman gets a better result in a given year), with years on the x axis and result difference on the y axis.
8.4 Perspective 4
Angle. Are new categories still regularly added to the games?
Framing. Use dataset 2, for years 1980 to 2008 (recent years only). Count every new event as a new category, including the introduction of a new gender-specific category. If a category is dropped from the games, also reduce the count by 1.
Focus. Focus on the difference and plot it as a bar chart (which could take negative values if a year has fewer categories than the previous one), with years on the x axis and difference in the number of categories on the y axis. Show the total number of categories on top of each bar, as well as the total number of sports in parentheses. If the new categories are part of a new sport, draw this portion of the bar in a different colour (i.e. like a stacked bar chart).
8.5 Perspective 5
Angle. How difficult is it for an individual to obtain multiple medals?
Framing. Consider all individuals who won a medal, individually or as part of a team, from 1896 to 2012, for any discipline in any category. Exclude individuals who only won one medal.
Focus. Only show the count for each number of medals. Use a bar chart with the total number of medals won on the y axis (excluding 1) and the number of athletes in that position (frequency) as the length of the bar on the x axis. Presumably, there are going to be very few people with multiple medals, and only one or two at the top. Focus on the few outliers who have won more medals than anyone else in history and show a few personal details about these people in an "infobox" under the bar chart so that readers can better understand how it is possible to have so many victories.
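The count behind the bar chart of Perspective 5 can be sketched in Python as a "count of counts". The athlete names below are placeholders, and identifying athletes by name alone is an assumption that would merge homonyms; a real analysis would probably combine the name with the country code.

```python
from collections import Counter

# Placeholder rows: one row per medal won by an athlete (dataset 2 style).
MEDAL_ROWS = [
    {"Athlete": "ALPHA, A"}, {"Athlete": "ALPHA, A"}, {"Athlete": "ALPHA, A"},
    {"Athlete": "BETA, B"}, {"Athlete": "BETA, B"},
    {"Athlete": "GAMMA, C"}, {"Athlete": "GAMMA, C"},
    {"Athlete": "DELTA, D"},
]


def medal_count_distribution(rows):
    """Map each medal total (2 or more) to the number of athletes reaching it."""
    medals_per_athlete = Counter(row["Athlete"] for row in rows)
    athletes_per_total = Counter(medals_per_athlete.values())
    # Athletes with a single medal are excluded, as in the framing above.
    return {total: athletes_per_total[total]
            for total in sorted(athletes_per_total) if total > 1}


distribution = medal_count_distribution(MEDAL_ROWS)
```

Each key of the result is one bar of the chart (a medal total) and each value is the length of that bar (the number of athletes with that total).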