Strata Conference 2012

O’Reilly Strata Conference
Making Data Work

Feb 28 - Mar 1
Santa Clara, CA

Michelle Li

Conference Overview

• 3 days of workshops, lectures, keynotes, startup showcase and a mini
Maker Faire
• Developers, data scientists, data analysts, and other data professionals
including researchers, designers, journalists
• 5 different session tracks: Data Science, Deep Data, Business &
Industry, Hadoop and Big Data (Applied & Tech), Domain Data, and
Visualization & Interface

© 2011 Oculus Info Inc.

Evening events included:
• Mini Maker Faire – showcase of innovative data-related hardware, apps, and robots
• Data Crush: Where Wine and Data Meet – wine tasting event where participants provided
feedback data that was compiled and analyzed to extrapolate behavioural trends and the
factors influencing their responses
• Startup Showcase – live demo program and competition for 10 finalist startups and
early-stage companies to demonstrate their innovations to judges, investors,
entrepreneurs, and journalists


Who goes to a conference about data?

Over 2000 attendees from various organizations:

Microsoft, Google, Apple, Netflix, IBM, Oracle, LinkedIn, Facebook, Twitter, Amazon,
Digg, Groupon, PayPal, Infochimps, Tableau, VMware, Guardian News, The Seattle Times,
MIT Media Lab, …


Data is the New Oil

Source: http://www.house.gov/apps/list/press/tx08_brady/71509_hc_chart.html


Materialize Data Into New Services


Google Insights: “Infographic”


Google Insights: “Infographic” vs “Big Data”


Session Overviews

o Data visualization
• how we communicate information
• visual analysis and principles for designing effective
data views
• design process and visualization tools for presenting
data
o Data Journalism
• creating data stories to share information socially
o Democratization of Data
• data for the common good


Noah Iliinsky, Complex Diagrams
Jock Mackinlay, Tableau

DESIGNING DATA VISUALIZATIONS


Data Visualization

The representation and
presentation of data that
exploits our visual
perception abilities in order
to amplify cognition


Science of Visualization

o Humans are slow at mental math (e.g. 34 x 72),
but we’re faster when using the
world around us
o Human perception is powerful, but
perception can be aided and
augmented by visual prompts
o Finding patterns is key to
information visualization
• We have a flexible pattern
finder coupled with an
adaptive decision-making
mechanism


Visualization Makes Data Accessible

Allows us to easily see trends and patterns


Leverage the Amazing Abilities of Our Eyes and Brain

Preattentive features:
length, width, size, colour, closure, number, intersection, contrast, tilt,
curvature, etc.


Faster Access to Actionable Insights
Difficult to compare 15+ tire models with different characteristics
• rim diameters, various widths, various features, price, special features
Source: http://www.rivbike.com/Tires-Pumps-Patches-s/52.htm

Chart allows customer to focus on appropriate tires based on 3 axes of data:
• desired rim size
• tire width
• toughness/quickness
Source: http://complexdiagrams.com/2009/03/tire-chart/


Allows Access to Huge Amounts of Data

GapMinder: public health data on a massive global scale
Understand data through stories

Source: gapminder.org


Visualization for Exploration

LinkedIn Maps


Visualization for Explanation


Visualizing Data

Data has properties
• categorical, quantifiable, geographic, binary
• continuous, non-continuous, ordered
• timeline


Define Knowledge Before Structure
Donut charts: aesthetically pleasing but not very functional in these cases.
Good: an individual donut gives an at-a-glance view of relative share of the total market

Chart #1

• Comparing series of donut charts is
meaningless
• Shows time series data over 7 donuts

Chart #2

• Too many wedges
• Many of the wedges are similarly sized
• Non-standard sort

Source: http://litmus.com/blog/email-client-market-share-infograph/email-client-market-stats-1000


Use Defaults

Time series data is
usually best shown in a
line graph

Shows sequential
changes more easily
than comparing wedges
between donuts

Line graph shows trends
more clearly


Simple bar graph, but it’s much easier
to extract knowledge from it


Unless your data is periodic, don’t put your data in a periodic table

Chronological timeline
Family tree
Influence of different controllers
Meaningful context


Encoding Well

Position is everything.
Colour is hard.
/Moritz Stefaner


Position is Everything


Colour is Difficult
Colour can be used effectively
in information display
• Naturally codes attributes of
objects
• Not naturally ordered in our
brain

Excellent for labelling and
categorization
• Works well for heat
maps/temperature and
categorization

Poor for displaying
shape, rank, order, detail or
space
• Not effective for
quantitative data


Colour is Difficult


Retinal Properties

o Jacques Bertin identified that every
visualization is made up of basic
components
o Each component has different expressive
power
o Each works best only in some conditions
o 6 basic variables:
size, value, texture, colour, orientation, shape

o Jock Mackinlay applied these same
principles to automatically construct
visualizations out of data

Four dimensions of data shown effectively in a traditional scatter plot generated by
computers

Diagram shows how each visual component works best in each case and how to use them.
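
The idea that each visual variable suits some kinds of data better than others can be sketched as a small lookup. This is only an illustration: the rankings below are assumptions drawn loosely from the guidance on these slides (position first, colour for categories rather than quantities), not Mackinlay's published tables, and the names `ENCODING_PREFERENCE` and `assign_encodings`, along with the sample field names, are hypothetical.

```python
# Illustrative ranking of visual encodings per data type. This ordering is an
# assumption for the sketch, not a published ranking.
ENCODING_PREFERENCE = {
    "quantitative": ["position", "size", "value", "orientation"],
    "ordered": ["position", "value", "size", "texture"],
    "categorical": ["position", "colour", "shape", "texture"],
}


def assign_encodings(fields):
    """Greedily give each (name, data_type) field its best unused encoding.

    Position can be used twice (x and y axes); every other encoding once.
    """
    remaining = {"position": 2}
    assignment = {}
    for name, data_type in fields:
        for encoding in ENCODING_PREFERENCE[data_type]:
            if remaining.get(encoding, 1) > 0:
                remaining[encoding] = remaining.get(encoding, 1) - 1
                assignment[name] = encoding
                break
        else:
            raise ValueError(f"no encoding left for field {name!r}")
    return assignment


# Four dimensions in one scatter plot (hypothetical field names).
fields = [
    ("income", "quantitative"),
    ("life_expectancy", "quantitative"),
    ("population", "quantitative"),
    ("region", "categorical"),
]
print(assign_encodings(fields))
```

With these assumed rankings, the two most important quantities take the x and y positions, the third falls back to size, and the category is left with colour, which matches the "position is everything, colour is hard" advice.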


Appropriate Encodings

http://complexdiagrams.com/properties


Fabien Girardin, Near Future Laboratory

SKETCHING WITH DATA


Napoleon

“Un bon croquis vaut mieux
qu’un long discours.”
(“A good sketch is better than a long speech.”)


Network Data


Urban Demos

‘Urban demos’ reveal how the city
lives through its data. The City of
Geneva visualized digital traces
created from cellular network
activity.

They reflect mobility in a city or a
street and reveal insights about a
city that are of importance from
an economic and political
perspective.


Digital Traces


Process


Innovate With Data - Sketch


Sketching With Data

Sketch: to think, to make an idea
tangible (and observe its different
dimensions and implications), to tell
stories, to share discoveries

A rough version of a creative
work, made to assist in reaching
a coherent result

Key values of sketching:
• share common language
• qualify results
• explore ideas


Sketch To Share A Common Language

Sets a common language among the different actors of the project for how they understand
the data and how the data can be used

Project: explore novel services for mobile phone operators using aggregated cellular
network activity

Network Engineer’s view of the data
Product Manager’s view of the same data


Sketch To Share A Common Language

This is an early sketch to show the
data they were trying to
transform, which reveals the
quality of the data to measure
mobility and density of activity on
the network

Sketch To Qualify Results

Project: Controlling hyper-congestion at the Louvre to create an enjoyable
visiting experience

Hyper-congestion refers to the situation in which the quantity of visitors in a space
negatively affects the quality of their visiting experience and their security.


Sketch To Qualify Results
o Used network of sensors over 10 days around critical areas to collect empirical data on flows
and densities of visitors in key areas
o Measured occupancy levels, visiting times, and centrality of trails
o Field experts (security guards) helped contextualize data and early results through sketches
o These results can influence the remodeling of areas and the deployment of information kiosks
and help evaluate strategies and policies to control hyper-congestion


Defining Measures of Hyper-Congestion

• Measures provided insights and revealed symptoms of hyper-congestion, but
were insufficient to describe the cause of the issue
• how to qualify how people walk, etc.
• Sketches were produced after each data collection period: visualized
information about visiting sequences, travel times, and how long visitors stayed in
each room
• Used sketches to discuss with people in the field, who provided qualitative
evidence to contextualize and qualify results and explain detected irregularities


Defining Measures of Hyper-Congestion

Network data tells A
story, not THE story


Sketch To Explore Ideas
Project: Explore the role of the retail bank BBVA in smart cities in the near future
Explored opportunities for innovative services to exploit data in the domains of
distribution strategies, audience profiling and social navigation


Sketch To Explore Ideas

Created multiple prototypes to
explore opportunities for innovative
BBVA internal and external services

Project participants were able to
explore and interrogate the data
from multiple perspectives

Use of the dashboard helped
participants develop specific
scenarios involving services and
products that a bank could take
advantage of


Interactive Sketching Tool: Quadrigram

Data manipulation and visualization
environment using a visual
programming language

Modular, node-based interface for
designing data flows, linking data
resources to operators, controls and
visualizations

WYSIWYG interface designed for iterative
exploration and explanation, allowing
us to generate new questions and
provide answers with data


Access, Manipulate, Analyze and Visualize

Real-time traffic information

Five representations of a single data set:
1. Table visualizer (rows & columns)
2. Network visualization to see
relationships between points
3. Geodata to view points on a map to
see context
4. Data in real-time visualizes traffic
moving at different velocities
5. Temporal data


Access, Manipulate, Analyze and Visualize

Data as living material

OPEN DATA & DATA JOURNALISM


Data Journalism

• Data is changing journalism in several ways
• New ways of visualizing complexity
• Provide real answers, based on evidence rather than assertion
• Democratization of tools and data platforms to help people understand
information and share stories
• Bigger datasets about really small things
o Allows you to search data
o Make complex maps really quickly
• Crowdsourcing
o Aggregated input from the public is powerful for disaster response
o Accurately depicts dynamic situations
• Open data means open data journalism
• Governments are increasingly publishing their data repositories for
other people to access and use


Japanese Geiger Maps

Using Pachube to aggregate geiger counter
readings from various data sources

• Geiger counter readings for the
tsunami-hit Fukushima facility
• Government was releasing information only
once per day in PDF format – only numerals;
nothing about what they mean
• Pachube community created tutorials, then
collected and aggregated measurements
from various sources and hooked them up to
the web
• Suddenly 2000 feeds/minute across Japan
• People took data and built applications to
represent data in terms of health
consequences and change from background
radiation
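
Aggregating readings from several feeds and expressing them relative to background could look roughly like the sketch below. Everything in it is an assumption for illustration: the tuple layout, the station ids, the sample values, the units and the background level are made up, and it does not use Pachube's actual API.

```python
def latest_by_station(readings):
    """Keep only the most recent reading per station across all feeds."""
    latest = {}
    for station, timestamp, value in readings:
        if station not in latest or timestamp > latest[station][0]:
            latest[station] = (timestamp, value)
    return {station: value for station, (_, value) in latest.items()}


def relative_to_background(levels, background):
    """Express each level as a multiple of the assumed background level."""
    return {station: value / background for station, value in levels.items()}


# Hypothetical readings pooled from several feeds:
# (station id, ISO timestamp, level in microsieverts per hour).
readings = [
    ("station-a", "2011-03-20T09:00", 0.20),
    ("station-a", "2011-03-20T10:00", 0.40),
    ("station-b", "2011-03-20T09:30", 0.10),
]
ASSUMED_BACKGROUND = 0.10  # placeholder value, not a measured figure

levels = latest_by_station(readings)
print(relative_to_background(levels, ASSUMED_BACKGROUND))
```

Reporting a multiple of background rather than a bare number is one way to address the complaint above that the official figures said nothing about what they meant.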

http://japan.failedrobot.com/


Winds of Fukushima

Android App: took your geolocation, wind direction and nearby radiation monitors to
infer where radiation may peak next

Android app: Winds of Fukushima


After the tsunami and
earthquakes, Toyota
and Honda shared their
data to map out usable
roads


Crowdsourcing Datasets

Understand trends of the
data set

Help find anomalies

People measured things that
might not be measured by
the official network

Public visibility and
accountability: get people
with different domain
expertise to talk about the
data


Simon Rogers, Guardian

THE CRAFT OF DATA JOURNALISM


Behind the Scenes at the Guardian Datablog

Datablog started off as a small blog offering full datasets
behind their stories and now publishes hundreds of raw
datasets, data visualizations and data analyses

Process
o Locate the data or receive it from various sources (e.g.
breaking news stories, government data, journalists’
research)
o Examine the data: transform for quality/purpose, tidy
up, consolidate
o Perform calculations and statistical inquiries to see
whether there is a story
o Output a story, graphic or visualization
• Excel/Google charts for small line graphs and pies
• Google Fusion Tables for maps
• Internal dev team produce the more
sophisticated graphics
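
The examine, calculate and output steps above could be sketched as follows for a small spreadsheet export. The column names, the sample figures and the `tidy_rows` and `total_by_department` helpers are hypothetical, invented for illustration; this is not the Guardian's actual tooling.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw export: inconsistent spacing, case and number formatting.
RAW = """department,spend_gbp
 Health ,"1,200"
health,300
Education,  450
Defence,n/a
"""


def tidy_rows(text):
    """Examine step: normalise names, parse numbers, skip unusable rows."""
    for row in csv.DictReader(io.StringIO(text)):
        name = row["department"].strip().title()
        amount = row["spend_gbp"].strip().replace(",", "")
        if not amount.isdigit():
            continue  # drop rows with no usable figure
        yield name, int(amount)


def total_by_department(rows):
    """Calculate step: consolidate spending per department."""
    totals = defaultdict(int)
    for name, amount in rows:
        totals[name] += amount
    return dict(totals)


totals = total_by_department(tidy_rows(RAW))
# Output step: a sorted table ready to chart or publish.
for name, amount in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{name}: {amount}")
```

Keeping the tidy-up separate from the calculation means the cleaned rows can be published as a dataset in their own right, alongside any story built on the totals.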


The First Guardian Data Journalism:
May 5, 1821

• Contained a table of data: a list of
schools in Manchester and
Salford, with the number of students
at each and the average annual
spending
i.e. how many pupils received free
education and how many poor
children there were in the city
• Official statistics were collected by only
4 clergymen, which resulted in
inaccurate and faulty data
• Leaked by a source identified as
“NH”, the data caused a huge
sensation
• Revealed that 25 000 children were
receiving free education instead of the
8 000 that was officially estimated

• Using data to show the true state of
affairs to help fight for a decent
education system

Public spending by the UK’s central government departments 2010-2011


Becoming Data Providers


Exploring the Data

170 spreadsheets of
government spending data

Guardian created a
spending data explorer
application

Designed to make it easier
for people to search and
download key data

Simple analysis has already
been done: combined
spending for each
department into single
spreadsheets


Wikileaks Afghanistan War Logs

Wikileaks log of every IED attack
with co-ordinates from 2004-2009

Soldiers are good at entering data:
locations of where soldiers died in
Afghanistan, including date, what
happened, number of
casualties, and summaries


Bigger Datasets Of Smaller Things:
Every IED attack from 2004-2009


Crowdsourcing Experiment: MP Expense Scandal
• Big release of MPs’ documented expense claims – 458,000+ documents
• The Guardian developed a crowdsourcing application in 5 days
• Within 10 minutes of the launch, 323 people were using the application to go through the
documents
• In the first half hour, more than 2000 pages had been reviewed
• Each receipt filed by an MP was converted into an image for the public to review
• Users reviewing were asked to determine and detail what entries there were on a page and flag
them as unimportant, interesting, “interesting but known” or worthy of investigation
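
The review step could be modelled as a tally of flags per page, surfacing pages that several reviewers independently marked for investigation. This is a minimal sketch under assumptions: the function names, page ids, sample reviews and the `min_votes` threshold are hypothetical and are not taken from the Guardian's application.

```python
from collections import Counter, defaultdict

# The four flags named on the slide.
FLAGS = {
    "unimportant",
    "interesting",
    "interesting but known",
    "worthy of investigation",
}


def tally_reviews(reviews):
    """Count how many reviewers gave each flag to each page."""
    tallies = defaultdict(Counter)
    for page_id, flag in reviews:
        if flag not in FLAGS:
            raise ValueError(f"unknown flag: {flag!r}")
        tallies[page_id][flag] += 1
    return tallies


def pages_to_investigate(tallies, min_votes=2):
    """Pages that at least `min_votes` reviewers flagged for investigation."""
    return sorted(
        page_id
        for page_id, counts in tallies.items()
        if counts["worthy of investigation"] >= min_votes
    )


# Hypothetical reviews: (page id, flag chosen by one reviewer).
reviews = [
    ("page-1", "unimportant"),
    ("page-2", "worthy of investigation"),
    ("page-2", "worthy of investigation"),
    ("page-3", "interesting"),
    ("page-3", "worthy of investigation"),
]
print(pages_to_investigate(tally_reviews(reviews)))
```

Requiring agreement between reviewers is one simple way to keep a single mistaken click from sending journalists after an unremarkable receipt.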

http://mps-expenses.guardian.co.uk/


What Was Revealed…

• Douglas Hogg, Conservative MP for
Sleaford and North Hykeham, charged
£2,115 to have the moat cleared at his
Lincolnshire estate and claimed bills for
a "mole man".
• Sir Peter Viggers, Tory MP for
Gosport, claimed £1,645 for a floating
"duck island" in the garden of his
Hampshire home as part of £32,000 of
gardening expenses over three years.
• Jacqui Smith, the former home
secretary, claimed £10 for two adult
films which were accessed by her
husband at her constituency home.
• Tony Blair claimed almost £7000 for
roof repairs two days before leaving
office and standing down as MP.


London Riots

Instant data journalism: filling the hole of knowledge for anyone
wanting to know what was happening where

• Collected key reported incidents from as many sources as possible
• Compiled a list of every incident where there was a verified
report, then mapped it with Google Fusion Tables
• Allowed people to download the data behind it – possibly
the simplest but most popular thing they did


Reading the Riots

o Project took a look at the riots as
experienced by those who were
there
o A specially-recruited team
interviewed around 270 people
about the riots and why they had
been involved


England Riots: Was Poverty A Factor?


‘Riot Commute’

• Data from 1,100 individuals’
magistrates’ court records that included
postcodes for defendants’ home and
offence locations

• 70% of those accused of riot-related
crimes travelled from outside their area

• Riots occurred in the city centre, but
accused rioters lived in outer districts

• Travelled an average of 2.2 miles from
home to the riot offence site

• Transport mapping specialists modelled
the most likely routes from home to
offence
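
An average home-to-offence distance like the one quoted above can be approximated with the haversine formula once postcodes have been converted to coordinates. This is a sketch under assumptions: the geocoding step is taken as already done, the coordinates are made up, and the result is a straight-line distance, not the modelled travel routes the specialists produced. The function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def average_commute(records):
    """Mean straight-line distance over (home, offence) coordinate pairs."""
    distances = [haversine_miles(*home, *offence) for home, offence in records]
    return sum(distances) / len(distances)


# Made-up (home, offence) coordinates standing in for geocoded postcodes.
records = [
    ((51.5400, -0.1000), (51.5100, -0.1200)),
    ((51.4600, -0.0700), (51.4700, -0.1100)),
]
print(f"{average_commute(records):.1f} miles")
```

Straight-line distance understates the real journey, which is presumably why the most likely routes were modelled separately.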


How Riot Rumours Spread On Twitter

• Many people, including the PM and acting head of the Metropolitan police, blamed
Twitter for spreading the disorder
• Analysis of 2.6 million riot-related tweets suggested a different conclusion: the
network was able to collectively dispel and clarify false information
• Picked a subset of more than 10 000 tweets concerning 7 key rumours that emerged
during the riots

