Future of data visualization

hadoopsphere
Future of Data Visualization
HadoopSphere Virtual Conclave
August 2015

2
Commonly understood components of data visualization
• Graphs, maps, tables, shapes
• WYSIWYG editors
• Dashboards
• HTML5 views
• Infographics

3
Defining data visualization
• Data visualization is the presentation of data in a pictorial or
graphical format. -Wikipedia
• Data visualization is a visual representation of the insights
gained from your analysis. - Datameer

4
Emerging Trends
• New Channels
– Mobile, VR devices
• More interactive charts
– Redraw, filter, annotations
• Multidimensional visuals
– VR, GL
• Network visualization
– Social, Linkages
• Collaboration
– Share, Review, Workflow
• And we may have ‘audiolizations’ as well
– Audio narrations

5
Process of data visualization
Prepare
Explore
Design
Deliver

6
Challenges
Access to data
Parse data
Central data access
Fast queries
Complex visual types
Linked Views
Data mining
Collaboration
Workflow

7
Introducing Apache Zeppelin
HDFS / Data Store
Operations
Governance / Security
YARN
Spark / Flink / Tajo …
• Apache Zeppelin is a web-based, multi-purpose notebook for interactive data analysis.
• It is a 100% open source incubator project of the Apache Software Foundation.
• According to HadoopSphere, Apache Zeppelin is going to influence big data visualization tools for the next 2 years or more.

8
Zeppelin Notebook
• A web-based notebook that enables interactive data analytics.
• You can type code in SQL, Scala and more in the notebook.
• Run the commands directly from the notebook.
Source for this slide and subsequent slides:
(1) http://zeppelin.apache.org
(2) Lee Moon Soo, Introduction to Zeppelin, ApacheCon 2015

10
Behind the scenes
• Java-based backend
• Active development community
• Built-in Apache Spark integration
• Uses AngularJS, D3.js
• Tested on Mac OS X, Ubuntu 14.x, CentOS 6.x

11
Zeppelin features – Visualization
• Some basic charts are currently included in Zeppelin, and more will be added in the future.
• Visualizations are not limited to Spark SQL queries: relational output from many other language backends can be recognized and visualized.
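As a rough illustration of how relational output from any backend can be picked up for charting: Zeppelin's display system documents a `%table` prefix, after which tab-separated columns and newline-separated rows are rendered as a table. The sketch below is plain Python, not Zeppelin code; the helper name `to_zeppelin_table` and the sample data are hypothetical.

```python
def to_zeppelin_table(header, rows):
    """Format rows as %table text: tab-separated columns, one row per line."""
    lines = ["\t".join(str(h) for h in header)]
    for row in rows:
        lines.append("\t".join(str(cell) for cell in row))
    return "%table " + "\n".join(lines)


# Hypothetical sample data, for illustration only.
sample = to_zeppelin_table(["age", "count"], [(30, 12), (31, 7)])
print(sample)
```

Printing a string of this shape from a notebook paragraph is what lets the chart controls appear on output that did not come from a SQL query.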

12
Zeppelin features – Pivots
• With simple drag and drop, Zeppelin aggregates the values and displays them in a pivot chart.
• You can easily create a chart with multiple aggregated values, including sum, count, average, min and max.
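The aggregation behind a pivot chart can be sketched in a few lines of Python. This is only an illustration of grouping and aggregating with the five functions named above, not Zeppelin's implementation; the function name `pivot` and the sample rows are hypothetical.

```python
from collections import defaultdict

# Aggregations named on the slide; "average" is computed as sum / count.
AGGREGATIONS = {
    "sum": sum,
    "count": len,
    "average": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
}


def pivot(rows, key, value, how):
    """Group rows (dicts) by `key` and aggregate the `value` column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    aggregate = AGGREGATIONS[how]
    return {group: aggregate(values) for group, values in groups.items()}


# Hypothetical sample rows, for illustration only.
sales = [
    {"region": "east", "amount": 10},
    {"region": "east", "amount": 30},
    {"region": "west", "amount": 5},
]
totals = pivot(sales, "region", "amount", "sum")
averages = pivot(sales, "region", "amount", "average")
```

Dragging a field onto the chart's keys or values corresponds to choosing `key`, `value` and `how` here.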

13
Zeppelin features – Dynamic forms
• Zeppelin can dynamically take inputs through forms that are part of the notebook.
• These dynamic forms can be used to see input-based results or to render charts.
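To make the idea concrete: Zeppelin's documentation describes a `${formName=defaultValue}` template for text inputs inside a paragraph. The Python sketch below imitates that substitution for a small subset of the syntax; it is not Zeppelin's parser, and the function name `fill_forms` plus the table and column names in the query are hypothetical.

```python
import re

# Matches ${name} or ${name=default}; an illustrative subset of the form syntax.
FORM_PATTERN = re.compile(r"\$\{(\w+)(?:=([^}]*))?\}")


def fill_forms(template, inputs):
    """Replace each form placeholder with the user input, else its default."""
    def substitute(match):
        name, default = match.group(1), match.group(2)
        return str(inputs.get(name, default if default is not None else ""))
    return FORM_PATTERN.sub(substitute, template)


# Hypothetical query text; table and column names are made up.
query = "SELECT age, count(1) FROM bank WHERE age < ${maxAge=30} GROUP BY age"
with_default = fill_forms(query, {})
with_input = fill_forms(query, {"maxAge": 45})
```

Each new value typed into the form yields a new query string, which is what drives the refreshed result or chart.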

14
Zeppelin features – Collaboration and publishing
• A notebook URL can be shared among collaborators. Zeppelin then broadcasts any changes in real time, much like collaboration in Google Docs.
• Zeppelin provides a URL that displays only the results and can easily be embedded as an iframe inside a web page.
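Embedding then amounts to wrapping that results-only URL in an iframe tag. A minimal Python sketch, assuming a placeholder URL (the real link is whatever Zeppelin provides for the paragraph) and a hypothetical helper name `iframe_snippet`:

```python
from html import escape


def iframe_snippet(result_url, width=600, height=400):
    """Build an <iframe> tag that embeds a results-only URL in a web page."""
    return '<iframe src="{}" width="{}" height="{}"></iframe>'.format(
        escape(result_url, quote=True), width, height
    )


# Placeholder URL: the real link is the one Zeppelin provides for a paragraph.
snippet = iframe_snippet("http://localhost:8080/example-results-only-link")
```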

15
Zeppelin interpreter architecture
• A Zeppelin Interpreter is a connector between Zeppelin and a backend data processing system. For example, to use Scala code in Zeppelin, you need the Scala interpreter.
• Every Interpreter belongs to an InterpreterGroup, which is the unit in which interpreters are started and stopped. Interpreters in the same InterpreterGroup can reference each other. For example, SparkSqlInterpreter can reference SparkInterpreter to get the SparkContext from it, as long as they are in the same group.
Diagram: ZeppelinServer talks over Thrift to an InterpreterGroup running in a separate JVM process. The Spark group holds the Spark, PySpark, SparkSQL and Dep interpreters, which share a single SparkDriver connected to the Spark cluster; Dep loads libraries from a Maven repository.
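The grouping idea can be modelled with a few small classes. This is a simplified Python sketch of the relationship described on this slide, not Zeppelin's Java API: the class and method names are illustrative, and a plain dictionary stands in for the SparkContext.

```python
class InterpreterGroup:
    """A set of interpreters that start and stop together and share state."""

    def __init__(self, name):
        self.name = name
        self.interpreters = {}
        self.running = False

    def register(self, interpreter):
        interpreter.group = self
        self.interpreters[interpreter.name] = interpreter

    def start(self):
        self.running = True

    def stop(self):
        self.running = False


class SparkInterpreter:
    name = "spark"

    def __init__(self):
        self.group = None
        # Stand-in for a real SparkContext; just a labelled object here.
        self.context = {"app": "shared-context"}


class SparkSqlInterpreter:
    name = "sql"

    def __init__(self):
        self.group = None

    def get_context(self):
        # Look up the sibling interpreter in the same group and reuse its context.
        return self.group.interpreters["spark"].context


group = InterpreterGroup("spark")
spark, sql = SparkInterpreter(), SparkSqlInterpreter()
group.register(spark)
group.register(sql)
group.start()
```

Because both interpreters are registered in one group, `sql.get_context()` returns the very object held by `spark`, which mirrors the "share single SparkDriver" note in the diagram.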

16
Zeppelin interaction ecosystem
* includes future roadmap components

17
Getting involved with Zeppelin
• http://zeppelin.apache.org/
• http://github.com/apache/incubator-zeppelin
Installation reference:
• http://hortonworks.com/blog/introduction-to-data-science-with-apache-spark/
• http://nflabs.github.io/z-manager/
Mailing List
• users@zeppelin.incubator.apache.org

18
Other Notebook options
• IPython Notebook
• Beaker
• Spark-Notebook
• Databricks Cloud Notebook

19
Thank you
scale@hadoopsphere.com
Twitter: @hadoopsphere
