hadoopsphere
Future of Data Visualization
HadoopSphere Virtual Conclave
August 2015
2
Commonly understood components of data visualization
• Graphs, maps, tables, shapes
• WYSIWYG editors
• Dashboards
• HTML5 views
• Infographics
3
Defining data visualization
• Data visualization is the presentation of data in a pictorial or
graphical format. - Wikipedia
• Data visualization is a visual representation of the insights
gained from your analysis. - Datameer
4
Emerging Trends
• New channels
– Mobile, VR devices
• More interactive charts
– Redraw, filter, annotations
• Multidimensional visuals
– VR, GL
• Network visualization
– Social, linkages
• Collaboration
– Share, review, workflow
• And we may have ‘audiolizations’ as well
– Audio narrations
5
Process of data visualization
Prepare
Explore
Design
Deliver
6
Challenges
Access to data
Parse data
Central data access
Fast queries
Complex visual types
Linked Views
Data mining
Collaboration
Workflow
7
Introducing Apache Zeppelin
HDFS/ Data Store
Operations
Governance/Security
YARN
Spark / Flink / Tajo …
• Apache Zeppelin is a web-based multi-purpose notebook for interactive data
analysis.
• It is a 100% open source incubator project of the Apache Software Foundation.
• According to HadoopSphere, Apache Zeppelin is going to influence big data
visualization tools for the next 2 years or more.
8
Zeppelin Notebook
• A web-based notebook that
enables interactive data
analytics.
• You can type in code in SQL,
Scala and more in the
notebook.
• Run the commands directly
from the notebook.
Source for this slide and subsequent slides:
(1) http://zeppelin.apache.org
(2) Lee Moon Soo, Introduction to Zeppelin, ApacheCon 2015
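A notebook that accepts several languages needs some way to decide which backend runs each piece of code. In Zeppelin, a paragraph can begin with a directive such as %sql to select an interpreter. The Python sketch below only mimics that routing idea; the helper name split_directive and the default name "spark" are assumptions made for illustration and are not part of Zeppelin's API.

```python
def split_directive(paragraph, default="spark"):
    """Split a notebook paragraph into (interpreter name, code body).

    Hypothetical helper: it mimics the idea of a leading %name directive
    and is not Zeppelin's actual implementation.
    """
    text = paragraph.lstrip()
    if text.startswith("%"):
        # The first whitespace-delimited token is the directive, e.g. "%sql".
        parts = text.split(None, 1)
        name = parts[0][1:]
        body = parts[1] if len(parts) > 1 else ""
        return name, body
    # No directive: fall back to the assumed default interpreter.
    return default, text


name, body = split_directive("%sql select count(*) from bank")
```

Here name is "sql" and body holds the query text, so a dispatcher could hand the body to whichever backend is registered under that name.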
9
Zeppelin user interface
10
Behind the scenes
• Java-based backend
• Active development community
• Built-in Apache Spark integration
• Uses AngularJS, D3.js
• Tested on Mac OS X, Ubuntu 14.x, CentOS 6.x
11
Zeppelin features - Visualization
• Some basic charts are
currently included in Zeppelin
and more will be added in
the future.
• Visualizations are not limited
to Spark SQL queries:
relational output from many
other language backends can
be recognized and visualized.
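One way a backend can hand relational output to the notebook is plain text: Zeppelin's display system recognizes output that starts with %table, with columns separated by tabs and rows by newlines. The Python sketch below builds such a string; the function name to_table_output and the sample rows are made up for illustration, and the exact rendering rules should be checked against the Zeppelin documentation.

```python
def to_table_output(header, rows):
    """Render a header and rows as tab-separated text behind a %table marker."""
    lines = ["\t".join(str(h) for h in header)]
    for row in rows:
        lines.append("\t".join(str(cell) for cell in row))
    return "%table " + "\n".join(lines)


# Illustrative data only.
output = to_table_output(["name", "size"], [["sun", 100], ["moon", 10]])
print(output)
```

Because the format is just text, any interpreter that can print a string can produce output the notebook is able to chart.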
12
Zeppelin features - Pivots
• With simple drag and drop,
Zeppelin aggregates the
values and displays them in
a pivot chart.
• You can easily create a chart
with multiple aggregated
values, including sum, count,
average, min and max.
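The aggregation behind a pivot chart can be sketched in a few lines of Python. This is not Zeppelin's implementation, only an illustration of grouping rows by one field and computing the five aggregates named above; the pivot function and the sales data are hypothetical.

```python
from collections import defaultdict


def pivot(rows, key, value):
    """Group rows by `key` and compute sum, count, avg, min and max of `value`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        k: {
            "sum": sum(v),
            "count": len(v),
            "avg": sum(v) / len(v),
            "min": min(v),
            "max": max(v),
        }
        for k, v in groups.items()
    }


# Illustrative data only.
sales = [
    {"region": "east", "amount": 10},
    {"region": "east", "amount": 30},
    {"region": "west", "amount": 5},
]
result = pivot(sales, "region", "amount")
```

In the notebook, dragging a field onto the chart's keys or values plays the role of the key and value arguments here.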
13
Zeppelin features – Dynamic forms
• Zeppelin can dynamically take
inputs in forms as part of the
notebook.
• These dynamic forms can be
used to show input-based results
or render charts.
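Dynamic forms can be pictured as template substitution: Zeppelin lets a paragraph contain a placeholder such as ${maxAge=30}, shows an input box for it, and re-runs the paragraph with the value entered. The Python sketch below imitates only the substitution step for a simplified subset of that syntax; fill_form, the regular expression and the bank query are assumptions for illustration.

```python
import re

# Matches ${name} or ${name=default}; a simplified, assumed subset of the syntax.
PLACEHOLDER = re.compile(r"\$\{(\w+)(?:=([^}]*))?\}")


def fill_form(template, inputs):
    """Replace each placeholder with the user's input, else its default."""
    def replace(match):
        name, default = match.group(1), match.group(2)
        if name in inputs:
            return str(inputs[name])
        return default if default is not None else ""
    return PLACEHOLDER.sub(replace, template)


# Illustrative query only.
query = "select * from bank where age < ${maxAge=30}"
with_default = fill_form(query, {})
with_input = fill_form(query, {"maxAge": 45})
```

Each new form value yields a new query string, which is what lets a chart update as the input changes.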
14
Zeppelin features – Collaboration and publishing
• A notebook URL can be shared
among collaborators. Zeppelin
then broadcasts any changes
in real time, much like
collaboration in Google Docs.
• Zeppelin also provides a URL that
displays only the results, which can
easily be embedded as an
iframe inside a web page.
15
Zeppelin interpreter architecture
• A Zeppelin interpreter is a connector between Zeppelin and a backend data processing
system. For example, to use Scala code in Zeppelin, you need the Scala interpreter.
• Every interpreter belongs to an InterpreterGroup, which is the unit in which interpreters
are started and stopped. Interpreters in the same InterpreterGroup can reference each other.
For example, SparkSqlInterpreter can reference SparkInterpreter to get the
SparkContext from it because they are in the same group.
[Diagram: ZeppelinServer connects via Thrift to an InterpreterGroup running as a separate JVM process. The Spark group holds the Spark, PySpark, SparkSQL and Dep interpreters, which share a single SparkDriver attached to the Spark cluster; Dep loads libraries from a Maven repository.]
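The grouping rule described on this slide can be sketched as a small object model. The Python below is not Zeppelin's code (the real classes are Java); the class names, the dictionary standing in for a SparkContext and the lookup by name are all assumptions chosen to show one interpreter reusing a sibling's context within the same group.

```python
class InterpreterGroup:
    """Hypothetical stand-in: interpreters that start and stop together."""

    def __init__(self):
        self.interpreters = {}
        self.running = False

    def add(self, interpreter):
        interpreter.group = self
        self.interpreters[interpreter.name] = interpreter

    def start(self):
        self.running = True

    def stop(self):
        self.running = False


class SparkLikeInterpreter:
    name = "spark"

    def __init__(self):
        self.group = None
        # Placeholder object standing in for a shared SparkContext.
        self.context = {"app": "demo"}


class SqlLikeInterpreter:
    name = "sql"

    def __init__(self):
        self.group = None

    def get_context(self):
        # Look up the sibling interpreter in the same group and reuse its context.
        return self.group.interpreters["spark"].context


group = InterpreterGroup()
spark = SparkLikeInterpreter()
sql = SqlLikeInterpreter()
group.add(spark)
group.add(sql)
group.start()
shared = sql.get_context()
```

Both interpreters end up holding the same context object, which mirrors the single shared SparkDriver noted for the Spark group.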
16
Zeppelin interaction ecosystem
* includes future roadmap components
17
Getting involved with Zeppelin
• http://zeppelin.apache.org/
• http://github.com/apache/incubator-zeppelin
Installation reference:
• http://hortonworks.com/blog/introduction-to-data-science-with-apache-spark/
• http://nflabs.github.io/z-manager/
Mailing List
• users@zeppelin.incubator.apache.org
18
Other Notebook options
• IPython Notebook
• Beaker
• Spark-Notebook
• Databricks Cloud Notebook
19
Thank you
scale@hadoopsphere.com
Twitter: @hadoopsphere
