SlideShare a Scribd company logo
1 of 21
Download to read offline
Amit Kapoor
@amitkaps
Visualising
Big Data
Visualise Million
Data Points
x <- rnorm(1000000, mean=0, sd=2)
y <- rnorm(1000000, mean=0, sd=2)
xy <- data.frame(x,y)
Same order as the
Number of Pixels
on my MacBook Air
1400 x 900
Data
Data Sample
Sampling can be
effective (with
overweighting
unusual values)
Require multiple
plots or careful
tuning parameters
Data Sample
Model
Models are great as
they scale nicely.
But, visualisation is
required as
“I don’t know, what I
don’t know.”
Data Sample
ModelBinning
Binning can solve a
lot of these
challenges
“Bin - Summarize -
Smooth: A framework
for visualising big data” -
Hadley Wickam (2013)
“Visualising big data
is the process of creating
generalized histograms”
Approach
BIN : fixed size bins = (x-origin)/width
SUMMARIZE : summary stats = count, mean, stdev
SMOOTH : smoothing e.g. kernel mean, regression
VISUALISE : visualise using standard plots
Bigvis Package in R
Aim: To plot 100 million points in under 5 seconds.
Approach:
- Plotting using standard R libraries
- Processing done in (fast) compiled C++ code, using
Rcpp package
- Outlier removal in big data
- Smoothing to highlight trends & suppress noise
Diamonds dataset
ggplot(diamonds) + aes(carat, price)
+ geom_point(alpha = 0.2, colour =
“orange”)
50k observations e.g. price, carat of diamonds
Condense (bin + summarise)
library(bigvis)
library(ggplot2)
Nbin <- 20
BinData <- with(diamonds, condense(
bin(carat, find_width(carat,Nbin)),
bin(price, find_width(price,Nbin)))
Plotting the Condense
p <- ggplot(BinData) + aes(carat,
price, fill=.count) + geom_tile()
Create bins = 20 and summarized using count
Both Points & Condensed
q <- p + geom_point(data = diamonds,
aes(fill = NULL), alpha = 0.2, colour
= "orange")
Create bins = 20, summarized using count & added base data
Movies dataset
ggplot(movies) + aes(length, rating)
+ geom_point(alpha = 0.2, colour =
“orange”)
130k observations e.g. length, rating of movies on IMDB
Let us see the outliers
title length rating
1 Matrjoschka 5700 8.5
2 The Cure for Insomnia 5220 5.9
3 The Longest Most Meaningless Movie in the World 2880 7.3
4 The Hazards of Helen 1428 6.6
5 **** 1100 6.9
Condense (bin + summarise)
library(bigvis)
library(ggplot2)
Nbin <- 1e4
BinData <- with(movies, condense(
bin(length, find_width(length,Nbin)),
bin(rating, find_width(rating,Nbin)))
Condesed Plot
p <- ggplot(BinData) + aes(length,
rating, fill=.count) + geom_tile()
Create bins = 10000 and summarized using count
Remove Outliers
p %>% peel(BinData)
Create bins = 10000, summarize count & peel 1% outlier
Smoothing
smoothBinData <- smooth(peel
(binData), h=c(20, 1))
autoplot(smoothBinData)
Create bins = 20, summarize count, peel 1% outlier & smooth
Big Data Visualisation
● Approach: Bin - Summarize - Smooth - Visualise
● “Interactively” plot nearly 100 millions data point in-
memory for EDA in R
● Can be extend to in-database e.g. for binning
● Can be parallelised e.g. summarize on count, mean
Amit Kapoor
@amitkaps
amitkaps.com
narrativeviz.com
Data
Visual
Story
*

More Related Content

What's hot

peRm R group. Review of packages for r for market data downloading and analysis
peRm R group. Review of packages for r for market data downloading and analysispeRm R group. Review of packages for r for market data downloading and analysis
peRm R group. Review of packages for r for market data downloading and analysis
Vyacheslav Arbuzov
 
Pointer Events in Canvas
Pointer Events in CanvasPointer Events in Canvas
Pointer Events in Canvas
deanhudson
 

What's hot (20)

Real life XNA
Real life XNAReal life XNA
Real life XNA
 
Introducing the Microsoft Virtual Earth Silverlight Map Control CTP
Introducing the Microsoft Virtual Earth Silverlight Map Control CTPIntroducing the Microsoft Virtual Earth Silverlight Map Control CTP
Introducing the Microsoft Virtual Earth Silverlight Map Control CTP
 
Surface3d in R and rgl package.
Surface3d in R and rgl package.Surface3d in R and rgl package.
Surface3d in R and rgl package.
 
Fun with D3.js: Data Visualization Eye Candy with Streaming JSON
Fun with D3.js: Data Visualization Eye Candy with Streaming JSONFun with D3.js: Data Visualization Eye Candy with Streaming JSON
Fun with D3.js: Data Visualization Eye Candy with Streaming JSON
 
Introduction to graphics programming in c
Introduction to graphics programming in cIntroduction to graphics programming in c
Introduction to graphics programming in c
 
LSGAN - SIMPle(Simple Idea Meaningful Performance Level up)
LSGAN - SIMPle(Simple Idea Meaningful Performance Level up)LSGAN - SIMPle(Simple Idea Meaningful Performance Level up)
LSGAN - SIMPle(Simple Idea Meaningful Performance Level up)
 
peRm R group. Review of packages for r for market data downloading and analysis
peRm R group. Review of packages for r for market data downloading and analysispeRm R group. Review of packages for r for market data downloading and analysis
peRm R group. Review of packages for r for market data downloading and analysis
 
12. Map | WeakMap | ES6 | JavaScript | Typescript
12. Map | WeakMap | ES6 | JavaScript | Typescript12. Map | WeakMap | ES6 | JavaScript | Typescript
12. Map | WeakMap | ES6 | JavaScript | Typescript
 
CLUSTERGRAM
CLUSTERGRAMCLUSTERGRAM
CLUSTERGRAM
 
Data Visualization with R.ggplot2 and its extensions examples.
Data Visualization with R.ggplot2 and its extensions examples.Data Visualization with R.ggplot2 and its extensions examples.
Data Visualization with R.ggplot2 and its extensions examples.
 
Ggplot2 cheatsheet-2.1
Ggplot2 cheatsheet-2.1Ggplot2 cheatsheet-2.1
Ggplot2 cheatsheet-2.1
 
C Graphics Functions
C Graphics FunctionsC Graphics Functions
C Graphics Functions
 
Numpy python cheat_sheet
Numpy python cheat_sheetNumpy python cheat_sheet
Numpy python cheat_sheet
 
CS 354 Transformation, Clipping, and Culling
CS 354 Transformation, Clipping, and CullingCS 354 Transformation, Clipping, and Culling
CS 354 Transformation, Clipping, and Culling
 
Juggle: Hybrid Large-Scale Music Recommendation
Juggle: Hybrid Large-Scale Music RecommendationJuggle: Hybrid Large-Scale Music Recommendation
Juggle: Hybrid Large-Scale Music Recommendation
 
Pointer Events in Canvas
Pointer Events in CanvasPointer Events in Canvas
Pointer Events in Canvas
 
Intro to ggplot2 - Sheffield R Users Group, Feb 2015
Intro to ggplot2 - Sheffield R Users Group, Feb 2015Intro to ggplot2 - Sheffield R Users Group, Feb 2015
Intro to ggplot2 - Sheffield R Users Group, Feb 2015
 
Scaling up data science applications
Scaling up data science applicationsScaling up data science applications
Scaling up data science applications
 
Kwp2 091217
Kwp2 091217Kwp2 091217
Kwp2 091217
 
Seminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mmeSeminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mme
 

Viewers also liked

Five Things I Wish I Knew the First Day I Used Tableau
Five Things I Wish I Knew the First Day I Used TableauFive Things I Wish I Knew the First Day I Used Tableau
Five Things I Wish I Knew the First Day I Used Tableau
Ryan Sleeper
 
Storytelling with Data - Approach | Skills
Storytelling with Data - Approach | SkillsStorytelling with Data - Approach | Skills
Storytelling with Data - Approach | Skills
Amit Kapoor
 

Viewers also liked (20)

Interent of Things (IoT) & Data Science Contextual Reference Models
Interent of Things (IoT) & Data Science Contextual Reference ModelsInterent of Things (IoT) & Data Science Contextual Reference Models
Interent of Things (IoT) & Data Science Contextual Reference Models
 
2016 04-07 презентация
2016 04-07 презентация2016 04-07 презентация
2016 04-07 презентация
 
Telling Stories with Data - Using Story Spine
Telling Stories with Data - Using Story SpineTelling Stories with Data - Using Story Spine
Telling Stories with Data - Using Story Spine
 
Five Things I Wish I Knew the First Day I Used Tableau
Five Things I Wish I Knew the First Day I Used TableauFive Things I Wish I Knew the First Day I Used Tableau
Five Things I Wish I Knew the First Day I Used Tableau
 
The Power of Ensembles in Machine Learning
The Power of Ensembles in Machine LearningThe Power of Ensembles in Machine Learning
The Power of Ensembles in Machine Learning
 
Data driven storytelling tips from an iron viz champion ryan sleeper
Data driven storytelling tips from an iron viz champion   ryan sleeperData driven storytelling tips from an iron viz champion   ryan sleeper
Data driven storytelling tips from an iron viz champion ryan sleeper
 
Crafting Visual Stories with Data
Crafting Visual Stories with DataCrafting Visual Stories with Data
Crafting Visual Stories with Data
 
Python Visualisation for Data Science
Python Visualisation for Data SciencePython Visualisation for Data Science
Python Visualisation for Data Science
 
Learning the Craft of Data Visualisation
Learning the Craft of Data VisualisationLearning the Craft of Data Visualisation
Learning the Craft of Data Visualisation
 
Data Visualisation Literacy - Learning to See
Data Visualisation Literacy - Learning to SeeData Visualisation Literacy - Learning to See
Data Visualisation Literacy - Learning to See
 
Deep Learning for NLP
Deep Learning for NLPDeep Learning for NLP
Deep Learning for NLP
 
Embedding with Tableau Server
Embedding with Tableau ServerEmbedding with Tableau Server
Embedding with Tableau Server
 
Data Storytelling: The only way to unlock true insight from your data
Data Storytelling: The only way to unlock true insight from your dataData Storytelling: The only way to unlock true insight from your data
Data Storytelling: The only way to unlock true insight from your data
 
Storytelling with Data - Approach | Skills
Storytelling with Data - Approach | SkillsStorytelling with Data - Approach | Skills
Storytelling with Data - Approach | Skills
 
Nonprofit Marketing Plan Template - Summary
Nonprofit Marketing Plan Template - SummaryNonprofit Marketing Plan Template - Summary
Nonprofit Marketing Plan Template - Summary
 
Marketing Plan Template - Small Business
Marketing Plan Template - Small BusinessMarketing Plan Template - Small Business
Marketing Plan Template - Small Business
 
Data stories - how to combine the power storytelling with effective data visu...
Data stories - how to combine the power storytelling with effective data visu...Data stories - how to combine the power storytelling with effective data visu...
Data stories - how to combine the power storytelling with effective data visu...
 
Сценарий для рисованной истории
Сценарий для рисованной историиСценарий для рисованной истории
Сценарий для рисованной истории
 
The 8 Hats of Data Visualisation
The 8 Hats of Data VisualisationThe 8 Hats of Data Visualisation
The 8 Hats of Data Visualisation
 
5 Secrets to Better Presentation Charts and Graphs
5 Secrets to Better Presentation Charts and Graphs5 Secrets to Better Presentation Charts and Graphs
5 Secrets to Better Presentation Charts and Graphs
 

Similar to Visualising Big Data

Intelligent Ruby + Machine Learning
Intelligent Ruby + Machine LearningIntelligent Ruby + Machine Learning
Intelligent Ruby + Machine Learning
Ilya Grigorik
 

Similar to Visualising Big Data (20)

Informatics Practices (new) solution CBSE 2021, Compartment, improvement ex...
Informatics Practices (new) solution CBSE  2021, Compartment,  improvement ex...Informatics Practices (new) solution CBSE  2021, Compartment,  improvement ex...
Informatics Practices (new) solution CBSE 2021, Compartment, improvement ex...
 
Learning Predictive Modeling with TSA and Kaggle
Learning Predictive Modeling with TSA and KaggleLearning Predictive Modeling with TSA and Kaggle
Learning Predictive Modeling with TSA and Kaggle
 
The Ring programming language version 1.2 book - Part 35 of 84
The Ring programming language version 1.2 book - Part 35 of 84The Ring programming language version 1.2 book - Part 35 of 84
The Ring programming language version 1.2 book - Part 35 of 84
 
Dynamic C++ Silicon Valley Code Camp 2012
Dynamic C++ Silicon Valley Code Camp 2012Dynamic C++ Silicon Valley Code Camp 2012
Dynamic C++ Silicon Valley Code Camp 2012
 
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
NCCU: Statistics in the Criminal Justice System, R basics and Simulation - Pr...
 
Accelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACCAccelerating HPC Applications on NVIDIA GPUs with OpenACC
Accelerating HPC Applications on NVIDIA GPUs with OpenACC
 
2021 05-04-u2-net
2021 05-04-u2-net2021 05-04-u2-net
2021 05-04-u2-net
 
Making BIG DATA smaller
Making BIG DATA smallerMaking BIG DATA smaller
Making BIG DATA smaller
 
Introduction of DiscoGAN
Introduction of DiscoGANIntroduction of DiscoGAN
Introduction of DiscoGAN
 
Machine Learning with Microsoft Azure
Machine Learning with Microsoft AzureMachine Learning with Microsoft Azure
Machine Learning with Microsoft Azure
 
ML .pptx
ML .pptxML .pptx
ML .pptx
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
 
imager package in R and examples..
imager package in R and examples..imager package in R and examples..
imager package in R and examples..
 
Decision Tree.pptx
Decision Tree.pptxDecision Tree.pptx
Decision Tree.pptx
 
Interactive Latency in Big Data Visualization
Interactive Latency in Big Data VisualizationInteractive Latency in Big Data Visualization
Interactive Latency in Big Data Visualization
 
집단지성 프로그래밍 08-가격모델링
집단지성 프로그래밍 08-가격모델링집단지성 프로그래밍 08-가격모델링
집단지성 프로그래밍 08-가격모델링
 
Applying Linear Optimization Using GLPK
Applying Linear Optimization Using GLPKApplying Linear Optimization Using GLPK
Applying Linear Optimization Using GLPK
 
"Wix Engineering Media AI Photo Studio", Mykola Mykhailych
"Wix Engineering Media AI Photo Studio", Mykola Mykhailych"Wix Engineering Media AI Photo Studio", Mykola Mykhailych
"Wix Engineering Media AI Photo Studio", Mykola Mykhailych
 
Intelligent Ruby + Machine Learning
Intelligent Ruby + Machine LearningIntelligent Ruby + Machine Learning
Intelligent Ruby + Machine Learning
 
Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache Calcite
 

More from Amit Kapoor

Model Visualisation
Model VisualisationModel Visualisation
Model Visualisation
Amit Kapoor
 
Storytelling - Gutenberg
Storytelling - GutenbergStorytelling - Gutenberg
Storytelling - Gutenberg
Amit Kapoor
 

More from Amit Kapoor (11)

Model Visualisation
Model VisualisationModel Visualisation
Model Visualisation
 
Fifth Elephant 2014 talk - Crafting Visual Stories with Data
Fifth Elephant 2014 talk - Crafting Visual Stories with DataFifth Elephant 2014 talk - Crafting Visual Stories with Data
Fifth Elephant 2014 talk - Crafting Visual Stories with Data
 
Storytelling with Data - See | Show | Tell | Engage
Storytelling with Data - See | Show | Tell | EngageStorytelling with Data - See | Show | Tell | Engage
Storytelling with Data - See | Show | Tell | Engage
 
Business Process Improvement - A Strategic and Supply Chain Perspective
Business Process Improvement - A Strategic and Supply Chain Perspective Business Process Improvement - A Strategic and Supply Chain Perspective
Business Process Improvement - A Strategic and Supply Chain Perspective
 
What makes a data-story work?
What makes a data-story work?What makes a data-story work?
What makes a data-story work?
 
What is Strategy - Thinking like a Strategist
What is Strategy - Thinking like a StrategistWhat is Strategy - Thinking like a Strategist
What is Strategy - Thinking like a Strategist
 
Story Structure and Modern Storytelling
Story Structure and Modern StorytellingStory Structure and Modern Storytelling
Story Structure and Modern Storytelling
 
Targeting the Moment of Truth - Using Big Data in Retail
Targeting the Moment of Truth - Using Big Data in RetailTargeting the Moment of Truth - Using Big Data in Retail
Targeting the Moment of Truth - Using Big Data in Retail
 
Storytelling - Gutenberg
Storytelling - GutenbergStorytelling - Gutenberg
Storytelling - Gutenberg
 
Analytics in Consulting
Analytics in ConsultingAnalytics in Consulting
Analytics in Consulting
 
Retail Pricing Perspective
Retail Pricing PerspectiveRetail Pricing Perspective
Retail Pricing Perspective
 

Recently uploaded

Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 

Recently uploaded (20)

Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 

Visualising Big Data

  • 2. Visualise Million Data Points x <- rnorm(1000000, mean=0, sd=2) y <- rnorm(1000000, mean=0, sd=2) xy <- data.frame(x,y) Same order as the Number of Pixels on my MacBook Air 1400 x 900 Data
  • 3. Data Sample Sampling can be effective (with overweighting unusual values) Require multiple plots or careful tuning parameters
  • 4. Data Sample Model Models are great as they scale nicely. But, visualisation is required as “I don’t know, what I don’t know.”
  • 5. Data Sample ModelBinning Binning can solve a lot of these challenges “Bin - Summarize - Smooth: A framework for visualising big data” - Hadley Wickam (2013)
  • 6.
  • 7. “Visualising big data is the process of creating generalized histograms”
  • 8. Approach BIN : fixed size bins = (x-origin)/width SUMMARIZE : summary stats = count, mean, stdev SMOOTH : smoothing e.g. kernel mean, regression VISUALISE : visualise using standard plots
  • 9. Bigvis Package in R Aim: To plot 100 million points in under 5 seconds. Approach: - Plotting using standard R libraries - Processing done in (fast) compiled C++ code, using Rcpp package - Outlier removal in big data - Smoothing to highlight trends & suppress noise
  • 10. Diamonds dataset ggplot(diamonds) + aes(carat, price) + geom_point(alpha = 0.2, colour = “orange”) 50k observations e.g. price, carat of diamonds
  • 11. Condense (bin + summarise) library(bigvis) library(ggplot2) Nbin <- 20 BinData <- with(diamonds, condense( bin(carat, find_width(carat,Nbin)), bin(price, find_width(price,Nbin)))
  • 12. Plotting the Condense p <- ggplot(BinData) + aes(carat, price, fill=.count) + geom_tile() Create bins = 20 and summarized using count
  • 13. Both Points & Condensed q <- p + geom_point(data = diamonds, aes(fill = NULL), alpha = 0.2, colour = "orange") Create bins = 20, summarized using count & added base data
  • 14. Movies dataset ggplot(movies) + aes(length, rating) + geom_point(alpha = 0.2, colour = “orange”) 130k observations e.g. length, rating of movies on IMDB
  • 15. Let us see the outliers title length rating 1 Matrjoschka 5700 8.5 2 The Cure for Insomnia 5220 5.9 3 The Longest Most Meaningless Movie in the World 2880 7.3 4 The Hazards of Helen 1428 6.6 5 **** 1100 6.9
  • 16. Condense (bin + summarise) library(bigvis) library(ggplot2) Nbin <- 1e4 BinData <- with(movies, condense( bin(length, find_width(length,Nbin)), bin(rating, find_width(rating,Nbin)))
  • 17. Condesed Plot p <- ggplot(BinData) + aes(length, rating, fill=.count) + geom_tile() Create bins = 10000 and summarized using count
  • 18. Remove Outliers p %>% peel(BinData) Create bins = 10000, summarize count & peel 1% outlier
  • 19. Smoothing smoothBinData <- smooth(peel (binData), h=c(20, 1)) autoplot(smoothBinData) Create bins = 20, summarize count, peel 1% outlier & smooth
  • 20. Big Data Visualisation ● Approach: Bin - Summarize - Smooth - Visualise ● “Interactively” plot nearly 100 millions data point in- memory for EDA in R ● Can be extend to in-database e.g. for binning ● Can be parallelised e.g. summarize on count, mean