SlideShare una empresa de Scribd logo
1 de 58
Descargar para leer sin conexión
Visualizing
“Big” Data
Sean Kandel & Jeffrey Heer
Trifacta Inc. @trifacta
How can we visualize and
interact with billion+ record
databases in real-time?
Two Challenges:
1. Effective visual encoding
2. Real-time interaction
Perceptual and interactive
scalability should be limited
by the chosen resolution of
the visualized data, not the
number of records.
Perception
Data

Sampling

Binning

Modeling
Google Fusion Tables (Sampling)
imMens (Binned Aggregation)
Bin > Aggregate (> Smooth) > Plot
1. Bin Divide data domain into discrete “buckets”
Categories: Already discrete (but check cardinality)
Numbers: Choose bin intervals (uniform, quantile, ...)
Time: Choose time unit: Hour, Day, Month, etc.
Geo: Bin x, y coordinates after cartographic projection
Number of Bins?
Hexagonal or Rectangular Bins?

100,000 Data Points

Hexagonal Bins

Rectangular Bins

Hex bins better estimate density for 2D plots,
but the improvement is marginal [Scott 92], while
rectangles support reuse and query processing.
Bin > Aggregate (> Smooth) > Plot
1. Bin Divide data domain into discrete “buckets”
Categories: Already discrete (but check cardinality)
Numbers: Choose bin intervals (uniform, quantile, ...)
Time: Choose time unit: Hour, Day, Month, etc.
Geo: Bin x, y coordinates after cartographic projection

2. Aggregate Count, Sum, Average, Min, Max, ...
Bin > Aggregate (> Smooth) > Plot
1. Bin Divide data domain into discrete “buckets”
Categories: Already discrete (but check cardinality)
Numbers: Choose bin intervals (uniform, quantile, ...)
Time: Choose time unit: Hour, Day, Month, etc.
Geo: Bin x, y coordinates after cartographic projection

2. Aggregate Count, Sum, Average, Min, Max, ...
(3. Smooth Optional: smooth aggregates [Wickham ’13])
[1] Wickham 2013
Bin > Aggregate (> Smooth) > Plot
1. Bin Divide data domain into discrete “buckets”
Categories: Already discrete (but check cardinality)
Numbers: Choose bin intervals (uniform, quantile, ...)
Time: Choose time unit: Hour, Day, Month, etc.
Geo: Bin x, y coordinates after cartographic projection

2. Aggregate Count, Sum, Average, Min, Max, ...
(3. Smooth Optional: smooth aggregates [Wickham ’13])
4. Plot Visualize the aggregate summary values
Plot: Visual Encoding
Choose Most Effective Encoding [Cleveland & McGill ’84]
1D Plot -> Position or Length Encoding
Histograms, line charts, etc.

2D Plot -> Area or Color Encoding
Spatial dimensions (x, y) already allocated.
While less effective than area for magnitude
estimation, color can be used at the per-pixel level
and provides an overall “gestalt”
Standard Color Ramp
Counts near zero are white.
-> Outliers are missed

Add Discontinuity after Zero
Counts near zero remain visible.
-> Outliers can be seen
Linear Alpha Interpolation
is not perceptually linear.

Cube-Root Alpha Interpolation
approximates perceptual linearity.
Color Encoding
Min. Non-Zero Intensity (α=0.15) [1]

Perceptual Scaling (γ=1/3) [2]

Luminance (in range 0-1) User-Adjustable Min/Max Values [3]

[1] Keep small non-zero values visible (outliers!)
[2] Match color ramp to perceptual distances
[3] Enable exploration across value ranges
Design Space of Binned Plots
Interaction
Interaction Techniques?
1. Select
Detail-on-Demand
2. Navigate Pan & Zoom
3. Query Brush & Link
Y
512

…

1023

5-D Data Cube
Month, Day, Hour, X, Y
767

…
X
Month …11

11
…

0
23
…

0
23
…

11
…
0
23
…

1

1

1

0

0

0

Hour

0 1

… 30

0 1

…

30

0 1

256

… 30

Day

12 x 31 x 24 x 512 x 512 = ~2.3 billion cells
Y
512

…

1023

Brushing January
Month, Day, Hour, X, Y
767

…
X
Month …11

11
…

0
23
…

0
23
…

11
…
0
23
…

1

1

1

0

0

0

Hour

0 1

… 30

0 1

…

30

0 1

256

… 30

Day

31 x 24 x 512 x 512 = ~195 million cells
Multivariate Data Tiles
1. Send data, not pixels
2. Embed multi-dim data
Full 5-D Cube

Σ

Σ

Σ

Σ

For any pair of 1D or 2D binned plots, the
maximum number of dimensions needed
to support brushing & linking is four.
Y : 512 bins

X : 512 bins
~2.3B bins

Full 5-D Cube

Σ

Σ

Σ

13 3-D Data Tiles

Σ

~17.6M bins
(in 352KB!)
Query & Render on GPU via WebGL

Pack data tiles as PNG image files,
bind to WebGL as image textures.
Query & Render on GPU via WebGL

Σ
Invoke program for each output bin.
Executes in parallel on GPU.
Query & Render on GPU via WebGL

Σ
Performance Benchmarks
Simulate interaction:
brushing & linking
across binned plots.
- imMens vs. Profiler
- 4x4 and 5x5 plots
- 10 to 50 bins
Measure time from
selection to render.
Test setup:
2.3 GHz MacBook Pro (4-core)
NVIDIA GeForce GT 650M
Google Chrome v.23.0
5 dimensions x 50 bins/dim x 25 plots

imMens

~50fps querying of visual
summaries of 1B data points.

In-Memory Data Cube

Number of Data Points
NanoCubes

[1] Lins et. al. Infovis 2013
[2] Sismanis et. al. SIGMOD 2002
NanoCubes

[1] Lins et. al. Infovis 2013
Resources
imMens
Tableau Public
BigVis (R)
Nanocubes
BlinkDB
MapD

vis.stanford.edu/projects/immens
tableausoftware.com/public
github.com/hadley/bigvis
nanocubes.net
blinkdb.org
geops.csail.mit.edu/docs/
Acknowledgments
Zhicheng “Leo” Liu
Biye Jiang
Visualizing
“Big” Data
Sean Kandel & Jeffrey Heer
Trifacta Inc. @trifacta

Más contenido relacionado

La actualidad más candente

Foss4 g 2017-kansai-ryoo-kim
Foss4 g 2017-kansai-ryoo-kimFoss4 g 2017-kansai-ryoo-kim
Foss4 g 2017-kansai-ryoo-kim
OSgeo Japan
 
ข้อมูลและสารสนเทศ
ข้อมูลและสารสนเทศข้อมูลและสารสนเทศ
ข้อมูลและสารสนเทศ
Surapong Jakang
 

La actualidad más candente (20)

Chapter - 4 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 4 Data Mining Concepts and Techniques 2nd Ed slides Han & KamberChapter - 4 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
Chapter - 4 Data Mining Concepts and Techniques 2nd Ed slides Han & Kamber
 
02 Geographic scripting in uDig - halfway between user and developer
02 Geographic scripting in uDig - halfway between user and developer02 Geographic scripting in uDig - halfway between user and developer
02 Geographic scripting in uDig - halfway between user and developer
 
Blur Filter - Hanpo
Blur Filter - HanpoBlur Filter - Hanpo
Blur Filter - Hanpo
 
Gaussian Image Blurring in CUDA C++
Gaussian Image Blurring in CUDA C++Gaussian Image Blurring in CUDA C++
Gaussian Image Blurring in CUDA C++
 
Lec02 03 rasterization
Lec02 03 rasterizationLec02 03 rasterization
Lec02 03 rasterization
 
Tutor 41 Big Numbers
Tutor 41 Big NumbersTutor 41 Big Numbers
Tutor 41 Big Numbers
 
Drawing Fonts
Drawing FontsDrawing Fonts
Drawing Fonts
 
Math in the News: Issue 87
Math in the News: Issue 87Math in the News: Issue 87
Math in the News: Issue 87
 
Deep learning
Deep learningDeep learning
Deep learning
 
신뢰 전파 기법을 이용한 스테레오 정합(Stereo matching using belief propagation algorithm)
신뢰 전파 기법을 이용한 스테레오 정합(Stereo matching using belief propagation algorithm)신뢰 전파 기법을 이용한 스테레오 정합(Stereo matching using belief propagation algorithm)
신뢰 전파 기법을 이용한 스테레오 정합(Stereo matching using belief propagation algorithm)
 
Opensource gis development - part 4
Opensource gis development - part 4Opensource gis development - part 4
Opensource gis development - part 4
 
Foss4 g 2017-kansai-ryoo-kim
Foss4 g 2017-kansai-ryoo-kimFoss4 g 2017-kansai-ryoo-kim
Foss4 g 2017-kansai-ryoo-kim
 
ข้อมูลและสารสนเทศ
ข้อมูลและสารสนเทศข้อมูลและสารสนเทศ
ข้อมูลและสารสนเทศ
 
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning MachineFast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
 
Y Tiles
Y TilesY Tiles
Y Tiles
 
Cell calculation
Cell calculationCell calculation
Cell calculation
 
Spatial Analysis with R - the Good, the Bad, and the Pretty
Spatial Analysis with R - the Good, the Bad, and the PrettySpatial Analysis with R - the Good, the Bad, and the Pretty
Spatial Analysis with R - the Good, the Bad, and the Pretty
 
Genome Browser based on Google Maps API
Genome Browser based on Google Maps APIGenome Browser based on Google Maps API
Genome Browser based on Google Maps API
 
Image sampling and quantization
Image sampling and quantizationImage sampling and quantization
Image sampling and quantization
 
Image transforms
Image transformsImage transforms
Image transforms
 

Similar a 2013.10.24 big datavisualization

Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit
 
Geo & capped collections with MongoDB
Geo & capped collections  with MongoDBGeo & capped collections  with MongoDB
Geo & capped collections with MongoDB
Rainforest QA
 
Cahall Final Intern Presentation
Cahall Final Intern PresentationCahall Final Intern Presentation
Cahall Final Intern Presentation
Daniel Cahall
 

Similar a 2013.10.24 big datavisualization (20)

Interactive Latency in Big Data Visualization
Interactive Latency in Big Data VisualizationInteractive Latency in Big Data Visualization
Interactive Latency in Big Data Visualization
 
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 5
Data Mining:  Concepts and Techniques (3rd ed.)— Chapter 5 Data Mining:  Concepts and Techniques (3rd ed.)— Chapter 5
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 5
 
Ics2311 l02 Graphics fundamentals
Ics2311 l02 Graphics fundamentalsIcs2311 l02 Graphics fundamentals
Ics2311 l02 Graphics fundamentals
 
2 olap operaciones
2 olap operaciones2 olap operaciones
2 olap operaciones
 
Datascape Introduction
Datascape IntroductionDatascape Introduction
Datascape Introduction
 
Making BIG DATA smaller
Making BIG DATA smallerMaking BIG DATA smaller
Making BIG DATA smaller
 
TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural Networks
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017
 
Introduction to Applied Machine Learning
Introduction to Applied Machine LearningIntroduction to Applied Machine Learning
Introduction to Applied Machine Learning
 
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
 
05 cubetech
05 cubetech05 cubetech
05 cubetech
 
Specialized indexing for NoSQL Databases like Accumulo and HBase
Specialized indexing for NoSQL Databases like Accumulo and HBaseSpecialized indexing for NoSQL Databases like Accumulo and HBase
Specialized indexing for NoSQL Databases like Accumulo and HBase
 
Beginning direct3d gameprogramming01_thehistoryofdirect3dgraphics_20160407_ji...
Beginning direct3d gameprogramming01_thehistoryofdirect3dgraphics_20160407_ji...Beginning direct3d gameprogramming01_thehistoryofdirect3dgraphics_20160407_ji...
Beginning direct3d gameprogramming01_thehistoryofdirect3dgraphics_20160407_ji...
 
Geo & capped collections with MongoDB
Geo & capped collections  with MongoDBGeo & capped collections  with MongoDB
Geo & capped collections with MongoDB
 
Cahall Final Intern Presentation
Cahall Final Intern PresentationCahall Final Intern Presentation
Cahall Final Intern Presentation
 
An Efficient Method of Partitioning High Volumes of Multidimensional Data for...
An Efficient Method of Partitioning High Volumes of Multidimensional Data for...An Efficient Method of Partitioning High Volumes of Multidimensional Data for...
An Efficient Method of Partitioning High Volumes of Multidimensional Data for...
 
Regularised Cross-Modal Hashing (SIGIR'15 Poster)
Regularised Cross-Modal Hashing (SIGIR'15 Poster)Regularised Cross-Modal Hashing (SIGIR'15 Poster)
Regularised Cross-Modal Hashing (SIGIR'15 Poster)
 
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
 

Último

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Último (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

2013.10.24 big datavisualization