Apache Zeppelin is an emerging open-source tool for interactive data analytics and visualization. It provides a web-based notebook in which users write and run code in languages such as SQL and Scala. It offers built-in visualizations, pivot tables, dynamic forms, and collaboration features. Zeppelin works with backends such as Apache Spark and uses interpreters to connect to different data processing systems. HadoopSphere predicts that it will influence big data visualization tools for the next two years or more.
Defining data visualization
• Data visualization is the presentation of data in a pictorial or graphical format. (Wikipedia)
• Data visualization is a visual representation of the insights gained from your analysis. (Datameer)
Emerging Trends
• New channels
– Mobile, VR devices
• More interactive charts
– Redraw, filter, annotations
• Multidimensional visuals
– VR, GL
• Network visualization
– Social, linkages
• Collaboration
– Share, review, workflow
• We may also see ‘audiolizations’
– Audio narrations
Challenges
• Access to data
• Parse data
• Central data access
• Fast queries
• Complex visual types
• Linked views
• Data mining
• Collaboration
• Workflow
Introducing Apache Zeppelin
[Figure: big data stack showing HDFS / data store, YARN, Spark / Flink / Tajo …, operations, and governance/security]
• Apache Zeppelin is a web-based, multi-purpose notebook for interactive data analysis.
• It is a 100% open-source incubator project of the Apache Software Foundation.
• According to HadoopSphere, Apache Zeppelin is expected to influence big data visualization tools for the next two years or more.
Zeppelin Notebook
• A web-based notebook that enables interactive data analytics.
• You can write code in SQL, Scala, and other languages in the notebook.
• You can run the commands directly from the notebook.
Sources for this slide and subsequent slides:
(1) http://zeppelin.apache.org
(2) Lee Moon Soo, Introduction to Zeppelin, ApacheCon 2015
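In a Zeppelin notebook, each paragraph typically starts with an interpreter directive such as `%sql` or `%spark`, and that directive chooses the backend that runs the code. The toy Python sketch below illustrates only this routing idea. The `parse_paragraph` and `run_paragraph` helpers, the default binding, and the stand-in backends are hypothetical and are not Zeppelin's implementation.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in backends; real Zeppelin forwards code to interpreter processes.
BACKENDS: Dict[str, Callable[[str], str]] = {
    "sql": lambda code: f"[sql backend] would run: {code}",
    "md": lambda code: f"[markdown backend] would render: {code}",
}

DEFAULT_INTERPRETER = "sql"  # assumption: the note's default interpreter binding


def parse_paragraph(text: str) -> Tuple[str, str]:
    """Split paragraph text into (interpreter name, code body)."""
    stripped = text.lstrip()
    if not stripped.startswith("%"):
        # No directive: fall back to the default interpreter.
        return DEFAULT_INTERPRETER, stripped
    # The directive name ends at the first whitespace (space or newline).
    parts = stripped[1:].split(maxsplit=1)
    if not parts:
        raise ValueError("empty interpreter directive")
    return parts[0], parts[1] if len(parts) > 1 else ""


def run_paragraph(text: str) -> str:
    """Route the paragraph body to the backend named by its directive."""
    name, code = parse_paragraph(text)
    if name not in BACKENDS:
        raise KeyError(f"no interpreter bound for %{name}")
    return BACKENDS[name](code)
```

For example, `run_paragraph("%md # Title")` returns the markdown stand-in's message for `# Title`.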
Behind the scenes
• Java-based backend
• Active development community
– Built-in Apache Spark integration
– Uses AngularJS, D3.js
– Tested on Mac OS X, Ubuntu 14.x, CentOS 6.x
Zeppelin features – Visualization
• Some basic charts are currently included in Zeppelin, and more will be added in the future.
• Visualizations are not limited to Spark SQL queries. Relational output from many other language backends can be recognized and visualized.
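Zeppelin's display system is what allows non-SQL backends to produce chartable output. When a paragraph's output begins with `%table`, Zeppelin treats the rest as a table: tabs separate columns and newlines separate rows. It then offers the same chart options it shows for SQL results. The helper below, `to_zeppelin_table`, is a hypothetical convenience function for building that text. Only the `%table` format itself comes from Zeppelin.

```python
from typing import Iterable, Sequence


def to_zeppelin_table(header: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Render rows as Zeppelin %table text: tab-separated columns, newline-separated rows."""

    def clean(value: object) -> str:
        # A tab or newline inside a cell would break the layout, so replace it with a space.
        return str(value).replace("\t", " ").replace("\n", " ")

    lines = ["\t".join(clean(h) for h in header)]
    lines.extend("\t".join(clean(v) for v in row) for row in rows)
    return "%table " + "\n".join(lines)
```

Printing the result from a Python-family paragraph, for example `print(to_zeppelin_table(["name", "size"], [["sun", 100], ["moon", 10]]))`, should let Zeppelin render it as a table.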
Zeppelin features – Pivots
• With simple drag and drop, Zeppelin aggregates the values and displays them in a pivot chart.
• You can easily create charts with multiple aggregated values, including sum, count, average, min, and max.
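Conceptually, a pivot groups rows by the field placed in the key area and then applies the chosen aggregate to the value field. The plain-Python sketch below mirrors that computation for the five aggregates listed above. It illustrates the idea only and is not Zeppelin's charting code. The `pivot` function and the sample rows are invented for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# The five aggregates named on the slide, each reducing a list of numbers.
AGGREGATES: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "count": len,
    "avg": lambda values: sum(values) / len(values),
    "min": min,
    "max": max,
}


def pivot(rows: Sequence[Tuple[str, float]], how: str) -> Dict[str, float]:
    """Group (key, value) rows by key and reduce each group with one aggregate."""
    if how not in AGGREGATES:
        raise ValueError(f"unsupported aggregate: {how}")
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: AGGREGATES[how](values) for key, values in groups.items()}
```

With the made-up rows `[("married", 30), ("single", 25), ("married", 40)]`, `pivot(rows, "avg")` gives 35.0 for `married` and 25.0 for `single`.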
Zeppelin features – Dynamic forms
• Zeppelin can dynamically take inputs through forms embedded in the notebook.
• These dynamic forms can be used to show input-dependent results or to render charts.
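Zeppelin declares dynamic forms with template placeholders in the paragraph text. For example, `${maxAge=30}` renders a text box whose default value is 30. The sketch below shows only the substitution step for the simple `${name}` and `${name=default}` forms. It leaves the select-box variant with options unchanged. `render_template` is a hypothetical helper and is not part of Zeppelin.

```python
import re
from typing import Mapping

# Matches ${name} or ${name=default}; ${name=default,a|b} (select box) does not match.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?:=([^},]*))?\}")


def render_template(text: str, inputs: Mapping[str, str]) -> str:
    """Replace each placeholder with the user's input, falling back to its default."""

    def substitute(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        if name in inputs:
            return str(inputs[name])
        if default is not None:
            return default
        raise KeyError(f"no value supplied for form field '{name}'")

    return PLACEHOLDER.sub(substitute, text)
```

For the query `select age, count(1) value from bank where age < ${maxAge=30} group by age`, an empty input mapping yields `age < 30`, and `{"maxAge": "50"}` yields `age < 50`.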
Zeppelin features – Collaboration and publishing
• A notebook URL can be shared among collaborators. Zeppelin then broadcasts any changes in real time, much like collaborative editing in Google Docs.
• Zeppelin provides a results-only URL that can easily be embedded as an iframe inside a web page.
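Real-time sharing means that when one collaborator edits a paragraph, every browser viewing the same note is notified. Zeppelin's web UI talks to the server over a WebSocket for this. The toy in-memory model below sketches the underlying publish/subscribe idea. The `NoteHub` class and its methods are hypothetical, and plain callbacks stand in for WebSocket connections.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Optional

Listener = Callable[[str, str], None]  # receives (paragraph_id, new_text)


class NoteHub:
    """Toy hub: stores paragraph text per note and pushes every edit to all subscribers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Listener]] = defaultdict(list)
        self._paragraphs: DefaultDict[str, Dict[str, str]] = defaultdict(dict)

    def subscribe(self, note_id: str, listener: Listener) -> None:
        """Register a viewer (a stand-in for an open browser session) on a note."""
        self._subscribers[note_id].append(listener)

    def update_paragraph(self, note_id: str, paragraph_id: str, text: str) -> None:
        """Save an edit, then broadcast it to everyone viewing the note, editor included."""
        self._paragraphs[note_id][paragraph_id] = text
        for listener in self._subscribers[note_id]:
            listener(paragraph_id, text)

    def get_paragraph(self, note_id: str, paragraph_id: str) -> Optional[str]:
        return self._paragraphs[note_id].get(paragraph_id)
```

This design pushes each edit to every subscriber as soon as it is saved, so viewers do not have to poll for changes.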
Zeppelin interpreter architecture
• A Zeppelin interpreter is a connector between Zeppelin and a backend data processing system. For example, to use Scala code in Zeppelin, you need a Scala interpreter.
• Every interpreter belongs to an InterpreterGroup, which is the unit in which interpreters are started and stopped. Interpreters in the same InterpreterGroup can reference each other. For example, SparkSqlInterpreter can reference SparkInterpreter to get the SparkContext from it, because they are in the same group.
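The grouping rule can be illustrated with a toy model. In Zeppelin itself, interpreters are Java classes and each group runs in its own JVM process. The Python classes below (`InterpreterGroup`, `SparkInterpreter`, `SparkSqlInterpreter`, and the `FakeSparkContext` stand-in) are hypothetical simplifications. They show only how one interpreter can look up a sibling in its group and reuse that sibling's context.

```python
from typing import List, Optional, Type, TypeVar

T = TypeVar("T", bound="Interpreter")


class FakeSparkContext:
    """Stand-in for a real SparkContext so the sketch runs without Spark."""

    def __init__(self, app_name: str) -> None:
        self.app_name = app_name


class Interpreter:
    def __init__(self) -> None:
        self.group: Optional["InterpreterGroup"] = None

    def interpret(self, code: str) -> str:
        raise NotImplementedError


class InterpreterGroup:
    """Unit of start/stop; interpreters added here can find each other by type."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.interpreters: List[Interpreter] = []

    def add(self, interpreter: Interpreter) -> None:
        interpreter.group = self
        self.interpreters.append(interpreter)

    def find(self, cls: Type[T]) -> T:
        for interpreter in self.interpreters:
            if isinstance(interpreter, cls):
                return interpreter
        raise LookupError(f"{cls.__name__} is not in group '{self.name}'")


class SparkInterpreter(Interpreter):
    def __init__(self) -> None:
        super().__init__()
        self.spark_context = FakeSparkContext("zeppelin")  # one context per group

    def interpret(self, code: str) -> str:
        return f"scala on {self.spark_context.app_name}: {code}"


class SparkSqlInterpreter(Interpreter):
    def interpret(self, code: str) -> str:
        if self.group is None:
            raise RuntimeError("interpreter must belong to a group")
        # Reuse the sibling SparkInterpreter's context instead of creating a second one.
        spark = self.group.find(SparkInterpreter)
        return f"sql on {spark.spark_context.app_name}: {code}"
```

In this model, a SparkSqlInterpreter placed in a group without a SparkInterpreter raises `LookupError`. This mirrors why the two must share a group.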
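[Figure: ZeppelinServer communicates over Thrift with an InterpreterGroup running in a separate JVM process. The Spark InterpreterGroup contains the Spark, PySpark, SparkSQL, and Dep interpreters, which share a single SparkDriver connected to the Spark cluster; Dep loads libraries from a Maven repository.]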