1. Data Analytics process in
Learning and Academic
Analytics projects
Day 4: Data visualization
Alex Rayón Jerez
alex.rayon@deusto.es
DeustoTech Learning – Deusto Institute of Technology – University of Deusto
Avda. Universidades 24, 48007 Bilbao, Spain
www.deusto.es
2. “Perfection is achieved not
when there is nothing more to
add, but when there is nothing
left to take away”
Antoine de Saint-Exupéry
4. “[...] people almost universally use story
narratives to represent, reason about, and make
sense of contexts involving multiple interacting
agents, using motivations and goals to explain
both observed and possible future actions. With
regard to learning analytics, I’m seeing this as how
it can contribute to the retrospective
understanding and sharing of what transpired
within the operational contexts”
[Zachary2013]
5. Objectives
● Know the foundations
○ Learn the principles of information visualization
● Learn about existing techniques and systems
○ Effectiveness
○ Develop the knowledge to select appropriate
visualization techniques for particular tasks
● Build
○ Build your own visualizations
○ Apply theoretical foundations
6. Table of contents
● Introduction
● History
● Concept
● Process
● Mistakes in visualization
● Tools
● Designing a Dashboard
8. Introduction
● Danger of getting lost in data, which may be:
○ Irrelevant to the current task in hand
○ Processed in an inappropriate way
○ Presented in an inappropriate way
Source: http://www.planetminecraft.com/server/padlens-maze/
10. Introduction (III)
● Good graphics...
○ Point out relationships, trends or patterns
○ Explore data to infer new things
○ Make something easy to understand
○ Show a reality from different viewpoints
○ Make an idea memorable
11. Introduction (IV)
● It is a form of expression
○ Like maths, music, drawing or writing
● So, it has rules that must be respected
Source: http://powerlisting.wikia.com/wiki/Mathematics_Manipulation
13. History
Definition and characteristics
18th Century: Joseph Priestley, William Playfair
19th Century: John Snow, Charles J. Minard, F. Nightingale
20th Century: Jacques Bertin, John Tukey, Edward Tufte, Leland Wilkinson
14. History
18th Century: Joseph Priestley
Source: http://en.wikipedia.org/wiki/A_New_Chart_of_History#mediaviewer/File:A_New_Chart_of_History_color.jpg
15. History
18th Century: Joseph Priestley (II)
● Lectures on History and General
Policy (1788)
○ A Chart of Biography (1765)
○ A New Chart of History (1769)
● Beautiful metaphors that translate an
imprecise and abstract
dimension (time) into a
concrete one (space)
○ Thinking about time consumes cognitive
resources
27. Concepts
Introduction (II)
● Cognitive tools: extending human perception
and learning
○ Were invented and developed by our ancestors for
making sense of the world and acting more
effectively within it
■ Stories that helped people to remember things by
making knowledge more engaging
■ Metaphors that enabled people to understand one
thing by seeing it in terms of another
■ Binary oppositions like good/bad that helped
people to organize and categorize knowledge
30. Concepts
Data visualization
The use of computer-supported,
interactive, visual
representations of abstract
elements to amplify cognition
[Card1999]
31. Concepts
Information visualization
● Also known as InfoVis
● Focuses on visualizing non-physical, abstract
data such as financial data, business
information, document collections and
abstract conceptions
● However, it inadequately supports decision
making [AmarStasko2005]
○ Limited affordances
○ Predetermined representations
○ Decline of determinism in decision-making
32. Concepts
Geovisualization
● Geo-spatial data is special since it describes
objects or phenomena that are related to a
specific location in the real world
Source: http://www.boostlabs.com/why-geovisualization-geographic-visualization-works/
35. Concepts
Visual Analytics (III)
[Keim2006]
“Visual analytics is more than just visualization and
can rather be seen as an integrated approach
combining visualization, human factors and data
analysis. [...] integrates methodology from information
analytics, geospatial analytics, and scientific analytics.
Especially human factors (e.g., interaction, cognition,
perception, collaboration, presentation, and
dissemination) play a key role in the communication
between human and computer, as well as in the
decision-making process.”
36. Concepts
Visual Analytics (IV)
● [Shneiderman2002] suggests combining
computational analysis approaches such as
data mining with information visualization
● People use visual analytics tools and
techniques to
○ Synthesize information and derive insight from
massive, dynamic, ambiguous and often conflicting
data
○ Detect the expected and discover the unexpected
○ Provide timely, defensible, and understandable
assessments
○ Communicate assessment effectively for action
38. Concepts
Visual Analytics (VI)
● Combine strengths of both human and
electronic data processing [Keim2008]
○ Gives a semi-automated analytical process
○ Use strengths from each
47. Process
Introduction
The purpose of analytical displays of evidence is to assist thinking.
Consequently, in constructing displays of evidence, the first question
is, “What are the thinking tasks that these displays are supposed
to serve?” The central claim of the book is that effective analytic
designs entail turning thinking principles into seeing principles. So, if
the thinking task is to understand causality, the task calls for a design
principle: “Show causality.” If a thinking task is to answer a question
and compare it with alternatives, the design principle is: “Show
comparisons.” The point is that analytical designs are not to be
decided on their convenience to the user or necessarily their
readability or what psychologists or decorators think about them;
rather, design architectures should be decided on how the
architecture assists analytical thinking about evidence.
Edward T. Tufte in an interview
49. Process
1) Data transformation
● Encoding of value
○ Univariate data
○ Bivariate data
○ Multivariate data
● Encoding of relation
○ Lines
○ Maps and diagrams
57. Process
1) Data transformation (IX)
● Relation
○ A logical or natural association between two or more
things
○ Relevance of one to another
○ Connection
58. Process
1) Data transformation (X)
Source: http://www.digitaltrainingacademy.com/socialmedia/2009/06/social_networking_map.php
Social network
Lines indicate
relationship
65. Process
2) Visual mapping (II)
● Two researchers at AT&T Bell Labs,
William S. Cleveland and Robert McGill,
published a seminal article in the Journal of the
American Statistical Association
● The title was: “Graphical perception: theory,
experimentation, and application to the
development of graphical methods”
● It proposes a guide to the most suitable visual
representation depending on the objective of
each graph
66. Process
2) Visual mapping (III)
“A graphical form that involves
elementary perceptual tasks that lead to
more accurate judgements than another
graphical form (with the same
quantitative information) will result in a
better organization and increase the
chances of a correct perception of
patterns and behavior.”
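The idea in the quote above can be sketched in a few lines of Python. The ranking follows the ordering of elementary perceptual tasks as it is commonly summarized from Cleveland and McGill's paper. The mapping from chart type to perceptual task is an illustrative assumption, not something taken from the slides.

```python
# Elementary perceptual tasks, ranked from most (1) to least (6) accurate,
# as commonly summarized from Cleveland and McGill's paper.
ACCURACY_RANK = {
    "position_common_scale": 1,
    "position_nonaligned_scales": 2,
    "length": 3, "direction": 3, "angle": 3,
    "area": 4,
    "volume": 5, "curvature": 5,
    "shading": 6, "color_saturation": 6,
}

# Assumption for illustration: the main perceptual task each chart relies on.
CHART_TASK = {
    "bar chart": "position_common_scale",
    "pie chart": "angle",
    "bubble chart": "area",
    "choropleth map": "color_saturation",
}

def rank_charts(charts):
    """Order charts from most to least accurately readable."""
    return sorted(charts, key=lambda chart: ACCURACY_RANK[CHART_TASK[chart]])

best_first = rank_charts(["pie chart", "bubble chart", "bar chart"])
print(best_first)
```

With the same quantitative information, the chart that relies on a higher-ranked task comes first.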
67. Process
2) Visual mapping (IV)
Source: http://www.businessinsider.com/pie-charts-are-the-worst-2013-6
“Save the pies for
dessert”
(Stephen Few)
68. Process
2) Visual mapping (V)
Source: http://blogs.elpais.com/.a/6a00d8341bfb1653ef0167631df6f7970b-550wi
69. Process
2) Visual mapping (VI)
Source: http://blogs.elpais.com/.a/6a00d8341bfb1653ef016302299aa9970d-550wi
In some representations,
accuracy is not the
objective; what matters is the
perception of general
patterns, concentrations,
aggregations, trends, etc.
The shapes in the lower
part of the list can still be
quite useful
72. Process
2) Visual mapping (IX)
● Maria Kozhevnikov states that not
everybody understands statistical graphs
easily
○ It depends on some activation patterns within the
brain
● In one of her studies, she showed how artists,
architects and scientists interpret graphs in
different ways
○ The same happens with regular readers
77. Process
2) Visual mapping (XIV)
Variation of a magnitude over time?
A line chart
(Source: http://en.wikipedia.org/wiki/Line_graph)
78. Process
2) Visual mapping (XV)
Correlation between two variables?
A scatter plot
(Source: http://en.wikipedia.org/wiki/Scatter_plot)
79. Process
2) Visual mapping (XVI)
Difference between two variables?
As Cleveland and McGill state, our brain has trouble comparing angles,
curves and directions → if we want to show the difference, we must plot
the difference directly
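A minimal sketch of "plot the difference directly": instead of asking the reader to compare two curves, compute the difference series first and chart that single series. The month labels and values below are hypothetical and only serve the illustration.

```python
# Hypothetical monthly values for two series (illustrative numbers only).
months = ["Jan", "Feb", "Mar", "Apr"]
series_a = [120, 135, 150, 160]
series_b = [100, 125, 155, 140]

def difference_series(a, b):
    """Return the element-wise difference a - b, ready to be plotted on its own."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return [x - y for x, y in zip(a, b)]

diff = difference_series(series_a, series_b)
for month, d in zip(months, diff):
    print(f"{month}: {d:+d}")
```

The resulting series can be drawn as a single line or as bars around zero, so the reader judges position on a common scale rather than the gap between two curves.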
80. Process
2) Visual mapping (XVII)
Source: http://www.excelcharts.com/blog/uncommon-knowledge-about-pie-charts/#prettyPhoto[gallery]/0/
82. Process
2) Visual mapping (XIX)
Source: http://blogs.elpais.com/.a/6a00d8341bfb1653ef0153903da6ba970b-550wi
A map
Graphics
Numeric
table
83. Process
2) Visual mapping (XX)
Source: http://blogs.elpais.com/.a/6a00d8341bfb1653ef0153903da6ba970b-550wi
Different
visualization
configurations
Filters (zoom, search
tool, select data by
continent and size)
Depth search (click in
the bubbles and show
more data, etc.)
84. Process
2) Visual mapping (XXI)
Source: http://www.stonesc.com/Vis08_Workshop/DVD/Reijner_submission.pdf
89. Process
Principles
● Summary of Tufte’s principles
○ Tell the truth
■ Graphical integrity
○ Do it effectively with clarity, precision, etc.
■ Design aesthetics
“The success of a visualization is based on deep
knowledge and care about the substance, and the
quality, relevance and integrity of the content”
[Tufte1983]
90. Process
Principles (II)
● Design aesthetics: five principles
○ Above all else show the data
○ Maximize the data-ink ratio, within reason
○ Erase non-data ink, within reason
○ Erase redundant data-ink
○ Revise and edit
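For reference, the data-ink ratio mentioned above is usually stated, following Tufte, as:

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
\]
```

A ratio close to 1 means that almost all the ink on the page carries data.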
91. Process
Principles (III)
● Preattentive attributes
○ Color
○ Size
○ Orientation
○ Placement on page
Source: http://www.storytellingwithdata.com/2011/10/google-example-preattentive-attributes.html
95. Mistakes in visualization
Some mistakes (II)
● Multidimensionality
● Lack of context and
understanding
○ Are the numbers
relevant?
○ What do they mean?
○ How do they affect
me?
An onion with just one layer
96. Mistakes in visualization
Some mistakes (III)
Problems?
Try to identify:
1) The biggest donor in 2008
2) The smallest donor in 2009
3) The variation between
2008 and 2009
4) Which region received the
biggest amount of money
Source: http://blogs.elpais.com/.a/6a00d8341bfb1653ef0153903125d9970b-550wi
97. Mistakes in visualization
Some mistakes (IV)
● A map is not the best
way to represent that
data
● To answer the
previously stated
questions, I must search
for the relevant figures,
memorize them and
then compare
Source: http://blogs.elpais.com/.a/6a00d8341bfb1653ef0153903125d9970b-550wi
98. Mistakes in visualization
Some mistakes (V)
Problems?
The graph tries to reveal the
size of the UK’s deficit (the black
box on the right side)
Does the graph help with
contextualization?
Can we analyze the data in more depth?
How can we compare?
How can we see the differences?
Source: http://blogs.elpais.com/.a/6a00d8341bfb1653ef015390a96894970b-550wi
99. Mistakes in visualization
Some mistakes (VI)
Source: http://blogs.elpais.com/.a/6a00d8341bfb1653ef015390a98d8a970b-550wi
Solution
100. Mistakes in visualization
Some mistakes (VII)
Problems?
Bar values should start at zero
Source: http://www.qualitydigest.com/inside/quality-insider-article/asci-customer-satisfaction-airlines-remains-low.html
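Why the baseline matters can be shown with a little arithmetic. The sketch below compares how large the tallest bar looks relative to the shortest one, first with the axis starting at zero and then with a truncated axis. The scores and the baseline of 55 are hypothetical values chosen for the illustration.

```python
def bar_heights(values, baseline=0.0):
    """Heights of the drawn bars when the value axis starts at `baseline`."""
    if baseline >= min(values):
        raise ValueError("baseline must be below the smallest value")
    return [v - baseline for v in values]

def apparent_ratio(values, baseline=0.0):
    """Ratio between the tallest and the shortest drawn bar."""
    heights = bar_heights(values, baseline)
    return max(heights) / min(heights)

scores = [60.0, 65.0]                      # hypothetical satisfaction scores
honest = apparent_ratio(scores)            # axis starts at zero
truncated = apparent_ratio(scores, 55.0)   # axis starts at 55

print(f"zero baseline: tallest bar is {honest:.2f}x the shortest")
print(f"baseline 55:   tallest bar is {truncated:.2f}x the shortest")
```

With the truncated axis, a difference of about 8% in the data is drawn as one bar twice as tall as the other.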
109. Tools
ggplot2 in R
An implementation of the Grammar of Graphics
by Leland Wilkinson
“In brief, the grammar tells us that a statistical
graphic is a mapping from data to aesthetic
attributes (color, shape, size) of geometric objects
(points, lines, bars). The plot may also contain
statistical transformations of the data and is
drawn on a specific coordinate system”
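The definition above can be illustrated without any plotting library. The following Python sketch is not ggplot2's API: the `Plot` class, `linear_scale` and `build_layer` are hypothetical names, and the student data is made up. It only shows the core idea of mapping data columns to aesthetic attributes of geometric objects on a coordinate system.

```python
from dataclasses import dataclass

@dataclass
class Plot:
    data: list     # rows as dictionaries
    mapping: dict  # aesthetic name -> column name
    geom: str      # geometric object, e.g. "point"

def linear_scale(value, domain, out_range):
    """Map a value from the data domain to a position on the canvas."""
    d0, d1 = domain
    r0, r1 = out_range
    return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

def build_layer(plot, width=100.0, height=100.0):
    """Turn each data row into one mark with x, y and color aesthetics."""
    xs = [row[plot.mapping["x"]] for row in plot.data]
    ys = [row[plot.mapping["y"]] for row in plot.data]
    x_domain, y_domain = (min(xs), max(xs)), (min(ys), max(ys))
    return [
        {
            "geom": plot.geom,
            "x": linear_scale(row[plot.mapping["x"]], x_domain, (0.0, width)),
            "y": linear_scale(row[plot.mapping["y"]], y_domain, (0.0, height)),
            # The category is passed through unchanged; a real system
            # would map it to an actual color with a color scale.
            "color": row[plot.mapping["color"]],
        }
        for row in plot.data
    ]

# Hypothetical data: study hours and grades for three students.
students = [
    {"hours": 2, "grade": 5.0, "group": "A"},
    {"hours": 4, "grade": 7.0, "group": "B"},
    {"hours": 6, "grade": 9.0, "group": "A"},
]
plot = Plot(students, {"x": "hours", "y": "grade", "color": "group"}, "point")
marks = build_layer(plot)
print(marks)
```

Changing only the `mapping` or the `geom` yields a different graphic from the same data, which is the point of describing plots with a grammar. The sketch assumes each mapped numeric column has at least two distinct values.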
127. Dashboard
Definition
“A dashboard is a visual display of the
most important information needed to
achieve one or more objectives;
consolidated and arranged on a single
screen so the information can be
monitored at a glance”
[Few2007]
128. Dashboard
Characteristics
● Visual displays
● Display information needed to achieve specific
objectives
● Fits on a single computer screen
● Are used to monitor information at a glance
● Have small, concise, clear, intuitive display
mechanisms
● Are customized
129. Dashboard
Categories
Role: Strategic, Operational, Analytical
Type of data: Quantitative, Non-quantitative
Data domain: Sales, Finance, Marketing, Manufacturing, Human Resources, Learning, etc.
Type of measures: Balanced Scorecard, Six Sigma, Non-performance
Span of data: Enterprise-wide, Departmental, Individual
Update frequency: Monthly, Weekly, Daily, Hourly, Real-time
Interactivity: Static display, Interactive display
Mechanisms of display: Primarily graphical, Primarily text, Integration of graphics and text
Portal functionality: Conduit to additional data, No portal functionality
130. Dashboard
Common mistakes
1) Exceeding the boundaries of a single
screen
● Information that appears on dashboards is
often fragmented in one of two ways:
○ Separated into discrete screens to which one must
navigate
○ Separated into different instances of a single screen
that are accessed through some form of interaction
131. Dashboard
Common mistakes (II)
2) Supplying inadequate context for the data
● Fail to provide adequate context to make the
measures meaningful
3) Displaying excessive detail or precision
● Show unnecessary detail
4) Choosing a deficient measure
● Use of measures that fail to directly express
the intended message
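Mistake 3 can be made concrete with a small formatting helper. This is a minimal sketch: the function name `compact`, the suffixes and the default of one decimal are assumptions chosen for the illustration, and values that sit right at a threshold boundary may need extra care.

```python
def compact(value, decimals=1):
    """Format a number with only as much precision as a dashboard needs."""
    for threshold, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if abs(value) >= threshold:
            return f"{value / threshold:.{decimals}f}{suffix}"
    return f"{value:.{decimals}f}"

revenue = 3848305.93  # hypothetical figure
print(f"{revenue:,.2f}")   # full precision, hard to read at a glance
print(compact(revenue))    # reduced precision for monitoring
```

For at-a-glance monitoring, the shorter form usually carries the message; the exact figure can remain available on request.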
132. Dashboard
Common mistakes (III)
5) Choosing inappropriate display media
● Common problem with pie charts ;-)
6) Introducing meaningless variety
● Exhibit unnecessary variety of display media
133. Dashboard
Common mistakes (IV)
7) Using poorly designed display media
● A legend was used to label and assign values to the slices
of the pie. This forces our eyes to bounce back and forth
between the graph and the legend to glean meaning,
which is a waste of time and effort when the slices could
have been labeled directly.
● The order of the slices and the corresponding labels
appears random. Ordering them by size would have
provided useful information that could have been
assimilated instantly.
● The bright colors of the pie slices produce sensory
overkill. Bright colors ought to be reserved for specific
data that should stand out from the rest.
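Two of the remedies above, ordering by size and labeling directly, can be sketched as a plain-text bar chart. The categories and values are hypothetical, and the function `text_bar_chart` is an illustrative helper rather than part of any tool discussed in these slides.

```python
def text_bar_chart(shares, width=40):
    """Return one line per category, largest first, with the label next to its bar."""
    largest = max(shares.values())
    label_width = max(len(name) for name in shares)
    lines = []
    for name, value in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(value / largest * width)
        lines.append(f"{name:<{label_width}} {bar} {value}")
    return lines

# Hypothetical share of activity per resource type.
shares = {"Forum": 15, "Video": 45, "Quiz": 30, "Wiki": 10}
chart = text_bar_chart(shares)
print("\n".join(chart))
```

Because each label sits beside its bar and the bars are sorted, no legend is needed and the ranking can be read instantly.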
134. Dashboard
Common mistakes (V)
8) Encoding quantitative data inaccurately
9) Arranging the data poorly
● The most important data ought to be
prominent
● Data that require immediate attention ought
to stand out
● Data that should be compared ought to be
arranged and visually designed to encourage
comparisons
135. Dashboard
Common mistakes (VI)
10) Highlighting important data ineffectively
or not at all
● Fail to differentiate data by its importance
○ Giving relatively equal prominence to everything on
the screen
11) Cluttering the display with useless
decoration
● Trying to make the dashboard look like something it is not
● This results in useless and distracting decoration
136. Dashboard
Common mistakes (VII)
12) Misusing or overusing color
● Too much color undermines its power
13) Designing an unattractive visual display
● The fundamental challenge of dashboard
design is to effectively display a great deal of
often disparate data in a small amount of
space
137. Dashboard
Buzz words
● Dashboards
○ Present information in a way that is easy to read and
interpret
● Key Performance Indicator
○ A measure of success, or of the steps leading to the success, of a goal
138. Dashboard
Exploratory Analytics Requirements
● The tool ideally exhibits the following
characteristics:
○ Provides every analytical display, interaction, and
function that might be needed by those who use it for
their analytical tasks
○ Grounds the entire analytical experience in a single,
central workspace, with all displays, interactions, and
functions within easy reach from there
139. Dashboard
Exploratory Analytics Requirements (II)
● The tool ideally exhibits the following
characteristics:
○ Supports efficient, seamless transitions from one step
to the next of the analytical process, even though the
sequence and nature of those steps cannot be
anticipated
○ Doesn’t require a lot of fiddling with things to whip
them into shape to support your analytical needs
(such as having to take time to carefully position and
size graphs on the screen)
143. Dashboard
Interactive data visualizations (II)
Graphic
design
Data analysis
Interactive
design
Exploratory
Data analysis
Interactive
visualization
User
interface
design
Static
visualization
144. Dashboard
Interactive data visualizations (III)
● When is static representation not enough?
○ Scale
■ Too many data points
■ Too many different dimensions
○ Storytelling
○ Exploration
○ Learning
147. Dashboard
Interactive data visualizations (VI)
Pick a detail from a larger dataset to keep track of it
Source: http://en.wikipedia.org/wiki/Closest_pair_of_points_problem
163. Dashboard
Interaction framework (IV)
Passive interaction
Two important aspects of passive interaction:
1) During typical use of a visualization tool, most
of the user’s time is spent on passive interaction
– often involving eye movement
2) Passive interaction does not imply a static
representation
167. References
[AmarStasko2005] Amar, R. A., & Stasko, J. T. (2005). Knowledge precepts for design and evaluation of information visualizations. Visualization and
Computer Graphics, IEEE Transactions on, 11(4), 432-442.
[Cairo] Alberto Cairo [Online]. URL: https://twitter.com/albertocairo
[Chi2000] Chi, Ed H. "A taxonomy of visualization techniques using the data state reference model." Information Visualization, 2000. InfoVis 2000.
IEEE Symposium on. IEEE, 2000.
[ClevelandMcGill1985] Cleveland, William S., and Robert McGill. "Graphical perception and graphical methods for analyzing scientific data." Science
229.4716 (1985): 828-833.
[Few2004] Few, Stephen. "Show me the numbers." Analytics Press (2004).
[Few2007] Few, Stephen. "Dashboard confusion revisited." Perceptual Edge (2007).
[Fry] Ben Fry [Online]. URL: http://benfry.com/
[Jarvinen2013] Data visualization [Online]. URL: http://lib.tkk.fi/Lic/2013/urn100763.pdf
[Keim2006] Keim, D.A.; Mansmann, F. and Schneidewind, J. and Ziegler, H., Challenges in Visual Data Analysis, Proceedings of Information
Visualization (IV 2006), IEEE, p. 9-16, 2006.
[Kosslyn] Kosslyn Laboratory [Online]. URL: http://isites.harvard.edu/icb/icb.do?keyword=kosslynlab&pageid=icb.page250946
[Malamed] Visual Language for Designers: Principles for Creating Graphics that People Understand [Online]. URL: http://www.amazon.com/Visual-Language-Designers-Principles-Understand/dp/1592535151
[Shneiderman1996] Shneiderman, Ben. "The eyes have it: A task by data type taxonomy for information visualizations." Visual Languages, 1996.
Proceedings., IEEE Symposium on. IEEE, 1996.
[Shneiderman2002] Shneiderman, B. (2002). Inventing discovery tools: combining information visualization with data mining. Information
visualization, 1(1), 5-12.
[ThomasCook2005] J.J. Thomas and K.A. Cook, "A Visual Analytics Agenda," IEEE Computer Graphics & Applications, vol. 26, pp. 10-13, 2006.
[Verbert2014a] Visual Analytics [Online]. URL: http://www.slideshare.net/kverbert/in-34471961
[Yau] Nathan Yau [Online]. URL: http://flowingdata.com/about-nathan/
[Zachary2013] Zachary, W., Rosoff, A., Miller, L. C., & Read, S. J. (2013). Context as a Cognitive Process: An Integrative Framework for Supporting
Decision Making. Paper presented at the STIDS.
168. Courses
KU Leuven [Online]. URL: http://ariadne.cs.kuleuven.be/wiki/index.php/MM-Course1314
Berkeley [Online]. URL: http://blogs.ischool.berkeley.edu/i247s13/
Columbia university [Online]. URL: http://columbiadataviz.wordpress.com/student-work/
Information Visualization MOOC [Online]. URL: http://ivmooc.cns.iu.edu/