IndicThreads Pune 12: Grammar of Graphics, a New Approach to Visualization (Karan)
1. Grammar of Graphics:
A New Approach To Visualization
Karanbir Singh Gujral
IBM, India Software Labs
2. Why visualization?
The “Visualization & Data Discovery” market is the fastest
growing segment of Business Analytics. Our customers
need a solution that provides:
• High definition (HD) visualizations
• In-market flexibility – novel visualizations without a
new release
• Portable (across the full mobile landscape)
• Scalable
• Extensible
• Interactive
4. Dealing with DRIP: Visualization
The human visual system has evolved over time to spot patterns,
outliers and trends
Gain insight by visually assessing the data first, then perform deeper
analysis afterward
Visualization is not just about reporting and
“business graphics”
Anscombe's Quartet
Visualization is the ‘face’ of analysis & knowledge
Visualization is a force multiplier, not a stand-alone technology
“A great visualization is worth a million data points”
5. Visualization by example
• DATA: The basic functionality of any system for analyzing
data is to filter, slice and dice to create the view of the
data you want
• TABLE: Presenting the data in the simplest form
• CHART: Standard recommendation: To compare two
categories of counts, use a clustered bar chart
Month Y1998 Y2008
0 13880 17308
1 10484 20596
2 9847 16183
3 6952 10355
4 9393 6229
5 12870 10931
6 9330 10598
7 14726 9835
8 11893 9913
9 7815 3249
10 6419 4458
11 9900 17779
6. Visualization by example
• Adapt the layout to the data:
Months are cyclical; use a polar
axis. This allows the user to
spot seasonal effects more
easily
• Bars are not good for
comparisons: Change to
aligned points. This allows the
years to be compared directly
• Engage the user: Use a
custom symbol appropriate for
the domain
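The polar layout suggested above can be sketched without any plotting library. This is only an illustration: the counts are copied from the table on slide 5, while the helper names `month_angle` and `polar_point` are hypothetical and not part of any product described in the talk.

```python
import math

# Monthly cancelled-flight counts copied from the table on slide 5 (months 0..11).
Y1998 = [13880, 10484, 9847, 6952, 9393, 12870, 9330, 14726, 11893, 7815, 6419, 9900]
Y2008 = [17308, 20596, 16183, 10355, 6229, 10931, 10598, 9835, 9913, 3249, 4458, 17779]


def month_angle(month, months_per_year=12):
    """Angle in radians for a 0-based month index on a cyclical (polar) axis."""
    return 2 * math.pi * (month % months_per_year) / months_per_year


def polar_point(month, count):
    """Cartesian (x, y) of a point at the month's angle with radius equal to the count."""
    theta = month_angle(month)
    return (count * math.cos(theta), count * math.sin(theta))


# Aligned points: both years share the same angle for a given month,
# so the two radii can be compared directly along one spoke.
points_1998 = [polar_point(m, c) for m, c in enumerate(Y1998)]
points_2008 = [polar_point(m, c) for m, c in enumerate(Y2008)]
```

Because month 12 wraps back to angle 0, December and January end up as neighbours, which is what makes seasonal effects easier to spot than on a linear axis.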
7. Grammar of Graphics
Grammar not Types
Not a prescribed “Library of Charts”
A highly adaptive framework that allows each integrator to quickly create
and customize their own library of interactive visualizations
Language is flexible enough to:
• describe our known chart types
• describe unknown chart types
Diagram labels: Visualization “Description”, Common Visualization Framework, Platform native visualizations
8. Old Way: Charts are Types
Fixed Set of “supported charts”
• If it isn’t in the list, you can’t have it
Expensive and slow to innovate
• Each new chart is a new development effort
“Ad hoc” features tightly coupled to type
• E.g. “Animation only implemented for Hans Rosling-style
bubble charts, not for all charts”
Adding a new feature to 20 charts is a large effort
Kills creativity
9. New Way: Grammar of Graphics
A language-based specification of a chart
In terms of features, not “types”, e.g.
• “bar chart” = basic 2D coordinates, categorical x numeric
displayed with intervals dropped from locations
• “line chart” = basic 2D coordinates, any x numeric displayed
with lines connecting locations
• “histogram” = basic 2D coordinates, numeric x statistic binned
counts, displayed with
Orthogonal set of features describes all common charts,
virtually all uncommon charts, and most cutting-edge research
charts
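As a rough sketch of "features, not types", the bar and line definitions above can be written as plain data. The `ChartSpec` class and its field names are hypothetical, chosen only to mirror the wording of the slide; they are not the VizJSON schema.

```python
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ChartSpec:
    """A chart described by orthogonal features (hypothetical field names)."""
    coordinates: str                 # e.g. "2D"
    x: str                           # kind of data on x
    y: str                           # kind of data on y
    element: str                     # how locations are displayed
    statistic: Optional[str] = None  # optional statistical operation


# "bar chart" and "line chart" as described on the slide.
BAR = ChartSpec(coordinates="2D", x="categorical", y="numeric", element="interval")
LINE = ChartSpec(coordinates="2D", x="any", y="numeric", element="line")


def differing_features(a, b):
    """Return {feature: (value_in_a, value_in_b)} for every feature that differs."""
    da, db = asdict(a), asdict(b)
    return {key: (da[key], db[key]) for key in da if da[key] != db[key]}


# Swapping one feature yields another chart without defining a new "type".
DOT = replace(BAR, element="point")
```

Under these assumptions a bar chart and a line chart differ in only two features, which is the sense in which a small orthogonal set can cover many charts.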
15. Where is it available?
Books
• The Grammar of Graphics by Leland Wilkinson
Open Source
• JavaScript libraries: Protovis and D3
• ggplot2 in R: Statistical computing
• Bokeh in Python
Commercial
• IBM RAVE (Rapidly Adaptive Visualization Engine)
• Tableau software
16. GoG: Composable Set of Chart Features
Element Type
• Point, line, area, interval (bar), polygon, schema, text
• Each element can be used with any data (numeric, category, time …)
• As many elements on a chart as you like
Guides
• Simple Axis
• Nested Axis
• Facet Axis
• Legend
Aesthetics
• Map Data to Graphic Attributes.
• Works on all elements
• Color (exterior, interior, works with gradients)
• Size (width/height/both)
• Symbol
• Dashing, General Styles
• Label, Tooltip, Meta
Faceting
• Chart-in-chart
• Paneling
Coordinates
• Any number of dimensions can be defined, with a chain of transformations
• Clustering and stacking
• Polar
• Transpose
• Map projections
Layouts
• Graph Layouts (Network, Treelike)
• Treemaps
• Custom Layouts
This CFO Dashboard Visualization uses chart-in-chart faceting:
• The outer chart uses a graph layout, with an integrator-designed schema element for the nodes and standard edge element links.
The schema element has multiple parts, and five different aesthetics set symbol type and color for each part.
• The inner chart uses an interval element with 2D coordinates and two axes: a standard bar chart.
17. The Grammar of a Bar Chart
"grammar": [ {
  "coordinates": {
    "dimensions": [ {"axis": {}}, {"axis": {}} ],
    "transforms": [ {"type": "transpose"} ]
  },
  "elements": [ {
    "type": "interval",
    "position": [
      {"field": {"$ref": "pop2010"}},
      {"field": {"$ref": "name"}} ],
    "color": [ {"field": {"$ref": "pop1960"}} ],
    "style": {"stroke": {"width": 0.25}}
  } ],
  "style": {"fill": "#bbf", "padding": 5}
} ]
This is the complete grammar VizJSON
• Coordinates: 2D, chart flipped (transpose) - bars run horizontally
• Position (how we place elements in the coordinate system) shows state names by current population
• Guides: "axis" for both x and y dimensions
• Aesthetics (how to color it): Color uses the population data for 1960
• Element Types (go inside the data area): Uses a single interval (e.g. bar)
• Layouts, Faceting - None
• Style for a thin border (e.g. 0.25 width)
• pop2010, pop1960, name are references to parts of the data
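To make the "references to parts of the data" concrete, here is a minimal sketch that holds the bar-chart grammar as a Python dictionary and lists every `$ref` it uses. The dictionary mirrors the slide; the helper `collect_refs` is a hypothetical name introduced only for this illustration and is not a RAVE API.

```python
import json

# The bar-chart grammar from the slide, as a plain Python dictionary.
bar_grammar = {
    "coordinates": {
        "dimensions": [{"axis": {}}, {"axis": {}}],
        "transforms": [{"type": "transpose"}],
    },
    "elements": [{
        "type": "interval",
        "position": [{"field": {"$ref": "pop2010"}}, {"field": {"$ref": "name"}}],
        "color": [{"field": {"$ref": "pop1960"}}],
        "style": {"stroke": {"width": 0.25}},
    }],
    "style": {"fill": "#bbf", "padding": 5},
}


def collect_refs(node):
    """Return every "$ref" value found in a nested dict/list, in document order."""
    refs = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                refs.append(value)
            else:
                refs.extend(collect_refs(value))
    elif isinstance(node, list):
        for item in node:
            refs.extend(collect_refs(item))
    return refs


# Serialize back to JSON text, wrapped in the "grammar" array shown on the slide.
spec_text = json.dumps({"grammar": [bar_grammar]}, indent=2)
```

Listing the references this way shows which data fields a specification depends on before anything is rendered.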
18. Simple Changes: Power of Composition
Add a position field to make it a range chart with start at 1960, end at 2010
Before:
{
  "type": "interval",
  "position": [
    {"field": {"$ref": "pop2010"}},
    {"field": {"$ref": "name"}} ],
After:
{
  "type": "interval",
  "position": [
    {"field": {"$ref": "pop1960"}},
    {"field": {"$ref": "pop2010"}},
    {"field": {"$ref":
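The before/after change can be expressed as a small transformation on the element. This is a sketch under assumptions: `to_range_chart` is a hypothetical helper, and the third position field (`name`) is taken from the interval element shown on the following slide.

```python
import copy


def to_range_chart(element, start_field):
    """Return a copy of `element` with an extra leading position field.

    The original element is left untouched, so the "before" chart stays valid.
    """
    changed = copy.deepcopy(element)
    changed["position"].insert(0, {"field": {"$ref": start_field}})
    return changed


# "Before": a plain bar (interval) positioned by pop2010 and name.
bar = {
    "type": "interval",
    "position": [
        {"field": {"$ref": "pop2010"}},
        {"field": {"$ref": "name"}},
    ],
}

# "After": the interval now runs from pop1960 to pop2010 for each name.
range_bar = to_range_chart(bar, "pop1960")
position_refs = [p["field"]["$ref"] for p in range_bar["position"]]
```

The point of the slide carries over: the chart changes character through one added field, not through a different chart type.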
19. Simple Changes: Power of Composition
Add a point element for 1980
populations
{
  "type": "interval",
  "position": [
    {"field": {"$ref": "pop1960"}},
    {"field": {"$ref": "pop2010"}},
    {"field": {"$ref": "name"}} ],
  "color": [ {"field": {"$ref": "pop1960"}} ],
  "style": {"stroke": {"width": 0.25}} },
{
  "type": "point",
  "position": [
    {"field": {"$ref":
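Adding a second element is again just composition. In the sketch below, `add_element` is a hypothetical helper, and the point element's position fields are assumptions: the slide text is cut off, so the field name `pop1980` and the pairing with `name` are guesses made only to keep the example complete.

```python
def add_element(grammar, element):
    """Return a new grammar dict with `element` appended; the input is not modified."""
    return {**grammar, "elements": [*grammar.get("elements", []), element]}


# The range chart from the slide: one interval element.
chart = {
    "elements": [{
        "type": "interval",
        "position": [
            {"field": {"$ref": "pop1960"}},
            {"field": {"$ref": "pop2010"}},
            {"field": {"$ref": "name"}},
        ],
        "color": [{"field": {"$ref": "pop1960"}}],
        "style": {"stroke": {"width": 0.25}},
    }],
}

# Assumed point element: "pop1980" is a hypothetical field name, not from the slide.
point_1980 = {
    "type": "point",
    "position": [
        {"field": {"$ref": "pop1980"}},
        {"field": {"$ref": "name"}},
    ],
}

combined = add_element(chart, point_1980)
element_types = [e["type"] for e in combined["elements"]]
```

Both elements share the same coordinates, so the points are drawn over the range bars without any chart-specific code.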
20. Maps are just another element
{
  "coordinates": {
    "dimensions": [ {}, {} ],
    "transforms": [
      { "type": "projection",
        "projectionParams": {"name": "mercator"} } ] },
  "elements": [ {
    "type": "polygon",
    "label": [ {"content": [ {"$ref": "abbr"} ]} ],
    "color": [ {"field": {"$ref": "pop2010"}} ],
    "style": {"stroke": {"width": 0.25}}
  } ],
  "style": {"fill": "#bbf", "padding": 5} }
• Use a projection type of transform, specifically with a Mercator coordinate system.
• Data set already has the geographic
• Aesthetics, labels, style are all as usual.
• Map layers correspond well to elements.
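For readers unfamiliar with what a "mercator" projection transform does to coordinates, the standard spherical Mercator formula can be sketched as follows. This shows the textbook mathematics only; it is not taken from the RAVE implementation, and the function name `mercator` is an assumption.

```python
import math


def mercator(lon_deg, lat_deg):
    """Spherical Mercator on a unit sphere: (longitude, latitude) in degrees -> (x, y).

    x is the longitude in radians; y = ln(tan(pi/4 + latitude/2)).
    Latitudes of +/-90 degrees map to infinity, so they are rejected.
    """
    if not -90.0 < lat_deg < 90.0:
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    return (lam, math.log(math.tan(math.pi / 4 + phi / 2)))


# The equator maps to y close to 0; higher latitudes are stretched vertically.
x45, y45 = mercator(0.0, 45.0)
```

In a grammar-based system such a function is just one more link in the chain of coordinate transformations, applied to polygon vertices the same way transpose is applied to bars.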
21. Demo: Showcase each feature
Guides
Aesthetics
Element Type
Faceting
Layouts
Coordinates
23. Thank You
Slide authors: Greg Adams (IBM)
Graham Wills (IBM)
Karan Gujral (IBM)
Editor's Notes
Computer and network technology has made it trivial to capture, store, manipulate and disseminate ever increasing amounts of data. The amount of information in the world has been growing more than exponentially.
The data set is taken from a public source released by the FAA. It contains every commercial flight in the US over several decades (the last year of data is 2008), with many details. Here, we have rolled up to get monthly counts of CANCELLED flights for two years, a decade apart. These are standard recommended tables and charts from Excel.
Note the “cloud” of outliers for 2008, with delays being much worse than in 1998 in winter.
Don't describe charts by type (bar chart, line chart, histogram etc.) but by mapping:
• “bar chart” = basic 2D coordinates, categorical x numeric displayed with intervals dropped from locations
• “line chart” = basic 2D coordinates, any x numeric displayed with lines connecting locations
• Statistical operations (sum, count), styling (color) etc.
Grammar-based approach means flexibility: new charts or chart attributes can be added without a new product binary
• Field team will be able to build a customer-specific chart
• Customers will be able to add that one extra customization they need
• Research will be able to rapidly build cutting edge visualizations
Declarative language for visualizations (charts, interactivity, events, etc.), x-IBM standard
Draw an analogy with writing individual connectors for each database versus working with SQL. Discuss the reduced cost of working with SQL, standardization, future proofing and so on. Consider each type of visualization to be the equivalent of a database, and the GoG approach to be an abstraction on which each chart can be based.
Charts in the top line are ones that you could get in a traditional graphing package, but would look different in different environments. Charts in the middle line are ones where you would probably need a specialist solution: each chart would require a different OEM solution. Charts in the bottom line are ones that may not be possible anywhere else without a dedicated graphics programmer.