Spark Kernel
IBM Emerging Internet Technologies
Outline
• Scenario
• Problem
• How do you enable interactive applications against Apache Spark?
• Solution
• Spark Kernel
• Architecture
• Memory issue
• Comm API
• Livesheets (line of business tool)
• RESTful Server (query interface)
• Extending the Spark Kernel
• Summary & Questions
Scenario
• Livesheets prototype
• Needs to be able to build computations on the fly
• Needs to be able to perform computations on static (historical) data as well as dynamic (streaming) data
• Needs to be responsive (order of seconds instead of minutes)
Problem
How do you enable interactive applications?
• Spark Submit for job submission to Apache Spark
• JDBC and other offerings available for Spark SQL
• RESTful interfaces available to submit jars
• Spark Shell offers code snippet support to execute against a Spark cluster
Problem
How do you enable interactive applications?
• Used Spark Submit
• Bundled up Spark-based computations into a jar
• Started an external process to run the Spark Submit script against the jar
Problem
How do you enable interactive applications?
• What was wrong?
• Had to rebundle the jar every time a computation changed
• Not easy to attach to an existing Spark job
• Getting results involved writing to a data store and then reading them back out
• Very slow turnaround
Solution: Spark Kernel
• Scala application that can do the following:
• Define and execute raw Scala source code
• Define and run Spark tasks via code snippets or jars
• Collect results directly from a Spark cluster
• Benefits
• Avoid friction of shipping jars and reading results from peripheral systems
• Well-defined API (IPython/Jupyter)
• Acts as a proxy for Spark applications such that they can run remotely away from Spark
• Provides a client library for application development
[Diagram: IPython, App 1, and App 2 (the applications using the Kernel Client library) talk to the Kernel over ZeroMQ with the IPython message protocol; the Kernel connects to the Spark Cluster (one Master, four Workers).]
Kernel Architecture
[Diagram: the Kernel sits in front of the Spark Cluster (one Master, four Workers). Inside the Kernel: ØMQ sockets (Heartbeat, Shell, Control, StdIn, IOPub); Akka (Message Parsing and Validation, Routing, Message Handling); Scala Interpreter (Class Server, Spark Context).]
Kernel Architecture
• Why ZeroMQ?
• Used by IPython
• Responsiveness
• Socket types are building blocks with built-in behavior
• For example, a publisher sends messages to all subscribers
Kernel Architecture
• Why Akka?
• Concurrency
• Code isolation
• Fault tolerance
• Scalability
• IPython Protocol
• Specifies incoming and outgoing messages handled by the kernel
• Defines the purposes of the five channels of communication
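As a rough illustration of what the protocol specifies, the sketch below builds an `execute_request` message the way an IPython-style client might. The field names follow the published IPython/Jupyter messaging specification; the helper name `make_execute_request`, the username, and the version string are assumptions made for this example and are not taken from the Spark Kernel source.

```python
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="demo-user"):
    """Build an execute_request message as a plain dict (hypothetical helper)."""
    header = {
        "msg_id": str(uuid.uuid4()),      # unique per message
        "session": session_id,            # unique per client session
        "username": username,             # assumed value
        "msg_type": "execute_request",
        "version": "5.0",                 # assumed protocol version
        "date": datetime.now(timezone.utc).isoformat(),
    }
    content = {
        "code": code,                     # the snippet the kernel should run
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
    }
    # A message has four dictionaries: header, parent_header, metadata, content.
    return {
        "header": header,
        "parent_header": {},
        "metadata": {},
        "content": content,
    }


request = make_execute_request("sc.parallelize(1 to 10).sum()", str(uuid.uuid4()))
print(request["header"]["msg_type"])
```

A reply from the kernel would carry the request's header in its `parent_header`, which is how a client matches results to the code it submitted.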
Channels of Communication
[Diagram: the Kernel's ØMQ layer with its five sockets: Heartbeat, Shell, Control, StdIn, IOPub.]
• ZeroMQ API
• Uses ZeroMQ for socket communication via the five defined ports
• Uses ZMTP as the wire protocol
• Heartbeat
• Used to indicate that the kernel is still alive
• Echoes received messages back to client
• Primarily used by IPython
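In IPython, the five ports are conventionally listed in a JSON connection file. Assuming that layout, the sketch below turns such a file into one endpoint per channel. The port numbers and key are invented for illustration, and the slides do not say how a given Spark Kernel version receives this information.

```python
import json

# Example connection info; the port numbers and key are made up.
CONNECTION_JSON = """
{
  "transport": "tcp",
  "ip": "127.0.0.1",
  "hb_port": 50001,
  "shell_port": 50002,
  "control_port": 50003,
  "stdin_port": 50004,
  "iopub_port": 50005,
  "signature_scheme": "hmac-sha256",
  "key": "not-a-real-key"
}
"""

# Channel name -> key that holds its port in the connection info.
CHANNEL_PORT_KEYS = {
    "heartbeat": "hb_port",
    "shell": "shell_port",
    "control": "control_port",
    "stdin": "stdin_port",
    "iopub": "iopub_port",
}


def channel_endpoints(connection):
    """Return a ZeroMQ-style endpoint string for each of the five channels."""
    base = f"{connection['transport']}://{connection['ip']}"
    return {
        name: f"{base}:{connection[key]}"
        for name, key in CHANNEL_PORT_KEYS.items()
    }


endpoints = channel_endpoints(json.loads(CONNECTION_JSON))
for name, endpoint in endpoints.items():
    print(f"{name:9s} {endpoint}")
```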
Channels of Communication
• Shell
• Used to communicate requests from a client to the kernel
• Main purposes are code execution and Comm messages from a client
• Control
• Serves as a higher priority shell channel
• Typically used to receive shutdown signals
Channels of Communication
• StdIn
• Used to communicate requests from the kernel to the client(s)
• Primarily used by IPython as a form of communication for users through the UI
• IOPub
• Broadcasts messages to all listening clients
• Used to communicate side effects (standard out/error) as well as Comm messages
Processing Messages
[Diagram: the Kernel's Akka layer, made up of Message Parsing and Validation, Routing, and Message Handling.]
• Message Parsing and Validation
• Uses Akka actors wrapping JeroMQ as an abstraction to parse messages
• Calculates an HMAC (keyed-hash message authentication code) using SHA-256 and a secret key to validate against a signature in a message
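The signature check described above can be sketched with Python's standard `hmac` module. In the IPython/Jupyter protocol the digest is computed over the serialized header, parent header, metadata, and content, in that order. The key and the abbreviated message parts below are made up, and the function names are hypothetical rather than the kernel's own.

```python
import hashlib
import hmac
import json


def sign(key, frames):
    """Return the hex HMAC-SHA256 digest over the message frames, in order."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for frame in frames:
        mac.update(frame)
    return mac.hexdigest()


def is_valid(key, signature, frames):
    """Compare in constant time so the check does not leak timing details."""
    return hmac.compare_digest(sign(key, frames), signature)


key = b"not-a-real-key"  # made-up secret shared by client and kernel
frames = [
    json.dumps(part).encode("utf-8")
    for part in (
        {"msg_type": "execute_request"},  # header (abbreviated)
        {},                               # parent_header
        {},                               # metadata
        {"code": "1 + 1"},                # content
    )
]

signature = sign(key, frames)
tampered = frames[:-1] + [b'{"code": "x"}']

print(is_valid(key, signature, frames))    # True
print(is_valid(key, signature, tampered))  # False
```

A message whose signature does not match would be rejected before it reaches routing.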
Processing Messages
• Routing
• Incoming messages are routed by message type to associated message handler actors
• Outgoing messages are routed by message type to associated channels
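A minimal sketch of type-based routing, using plain functions and dictionaries in place of Akka actors. The handler names are hypothetical; the pairing of outgoing message types with channels follows the IPython/Jupyter messaging specification.

```python
# Hypothetical handlers; in the kernel these would be Akka actors.
def handle_execute(message):
    return f"executing: {message['content']['code']}"


def handle_kernel_info(message):
    return "kernel info requested"


# Incoming message type -> handler.
INCOMING_HANDLERS = {
    "execute_request": handle_execute,
    "kernel_info_request": handle_kernel_info,
}

# Outgoing message type -> channel that carries it.
OUTGOING_CHANNELS = {
    "execute_reply": "shell",
    "kernel_info_reply": "shell",
    "stream": "iopub",
    "status": "iopub",
    "input_request": "stdin",
}


def route_incoming(message):
    """Dispatch an incoming message to the handler for its type."""
    msg_type = message["header"]["msg_type"]
    handler = INCOMING_HANDLERS.get(msg_type)
    if handler is None:
        raise KeyError(f"no handler registered for {msg_type!r}")
    return handler(message)


def route_outgoing(message):
    """Pick the channel an outgoing message should be sent on."""
    return OUTGOING_CHANNELS[message["header"]["msg_type"]]


incoming = {"header": {"msg_type": "execute_request"}, "content": {"code": "1 + 1"}}
outgoing = {"header": {"msg_type": "stream"}, "content": {"name": "stdout", "text": "2"}}

print(route_incoming(incoming))
print(route_outgoing(outgoing))
```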
Processing Messages
• Message Handling
• Each message type has an associated Akka actor to handle the request
• Some handlers use child actors to perform tasks, protecting the state of the handler by following Erlang’s Error Kernel Pattern as well as reducing strain on the handler
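The idea behind the Error Kernel Pattern can be shown with a loose analogy that uses a thread pool instead of actors: the handler keeps the state worth protecting and pushes risky work into short-lived children, so a failing child cannot corrupt the handler. The class and method names below are hypothetical, and this is not how Akka supervision is implemented.

```python
from concurrent.futures import ThreadPoolExecutor


class ExecuteHandler:
    """Parent keeps the important state; risky work runs in child tasks."""

    def __init__(self):
        self.execution_count = 0  # state that must survive child failures
        self._pool = ThreadPoolExecutor(max_workers=2)

    def handle(self, task):
        self.execution_count += 1
        future = self._pool.submit(task)  # the child performs the risky work
        try:
            return ("ok", future.result())
        except Exception as error:
            # The child failed, but the handler and its state are untouched.
            return ("error", str(error))

    def close(self):
        self._pool.shutdown()


handler = ExecuteHandler()
ok_result = handler.handle(lambda: 1 + 1)
failed_result = handler.handle(lambda: 1 / 0)
count_after = handler.execution_count
handler.close()

print(ok_result)      # ('ok', 2)
print(failed_result)  # ('error', 'division by zero')
print(count_after)    # 2
```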
Scala Interpreter
[Diagram: the Kernel's Scala Interpreter, together with the Class Server and the Spark Context.]
• Scala Interpreter
• Uses the Spark REPL API to execute Scala code
• Contains zero modifications to Spark’s REPL
• Contains injected variables to provide Spark APIs and kernel APIs including magics and Comm communication
Scala Interpreter
• Class Server
• Exposes generated REPL classes to the Spark cluster
• In Spark’s Scala 2.10 implementation of the REPL, this is created for us
• Spark Context
• Standard Scala-based Spark Context
• Exposed as a variable named ‘sc’ for user-submitted code
Kernel Client Architecture
[Diagram: Application, API, and Kernel Client (Akka: Message Parsing and Validation, Routing, Message Handling; ØMQ) connected to the Kernel through the Heartbeat, Shell, Control, StdIn, and IOPub sockets.]
• API: expose public methods accessible from Scala and Java
• Client sockets mirror and communicate with kernel sockets
• Actor system for client shares codebase with kernel
Kernel Client Example
Memory Issue
• Scala REPL (and therefore Spark Shell)
• Generates new classes with each code snippet compiled (leads to PermGen space issues on the JVM)
• Instantiates a new Request class instance per execution to hold state (leads to an OutOfMemoryError)
Memory Issue
Comm API to the rescue!
Comm API
[Diagram: the Frontend (Client) and the Backend (Kernel) exchange open, msg, and close messages.]
• Flexibility
• Bidirectional communication
• Ability to programmatically define messages and their interactions
• Performance
• Avoid recompiling code
• Does not keep execution state
• Simplicity
• Start (open) communication
• Send data (msg)
• Stop (close) communication
Comm API
• Comm Open Request
• Establishes a new link between the frontend and backend
• Can contain data needed for initialization
{
  "comm_id" : "u-u-i-d",
  "target_name" : "my_comm",
  "data" : {}
}
Comm API
• Comm Msg Request
• Primary form of communication
• Contains data relevant to the request
{
  "comm_id" : "u-u-i-d",
  "data" : {}
}
Comm API
• Comm Close Request
• Removes the link between the frontend and backend components
• Can contain data needed for teardown
{
  "comm_id" : "u-u-i-d",
  "data" : {}
}
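The open, msg, and close requests can be tied together with a small registry sketch. It uses the same `comm_id`, `target_name`, and `data` fields shown in the JSON payloads; the `CommRegistry` class and its method names are hypothetical and are not the Spark Kernel's Comm API. Note that handling a msg only calls an already-registered callback, so no code is compiled per message.

```python
class CommRegistry:
    """Tracks open comms and passes msg data to a per-target callback."""

    def __init__(self):
        self._targets = {}  # target_name -> callback(comm_id, data)
        self._open = {}     # comm_id -> target_name

    def register(self, target_name, callback):
        self._targets[target_name] = callback

    def comm_open(self, content):
        target = content["target_name"]
        if target not in self._targets:
            raise KeyError(f"unknown target {target!r}")
        self._open[content["comm_id"]] = target

    def comm_msg(self, content):
        target = self._open[content["comm_id"]]  # KeyError if never opened
        return self._targets[target](content["comm_id"], content["data"])

    def comm_close(self, content):
        self._open.pop(content["comm_id"], None)

    def is_open(self, comm_id):
        return comm_id in self._open


registry = CommRegistry()
registry.register("my_comm", lambda comm_id, data: {"echo": data})

registry.comm_open({"comm_id": "u-u-i-d", "target_name": "my_comm", "data": {}})
reply = registry.comm_msg({"comm_id": "u-u-i-d", "data": {"x": 1}})
registry.comm_close({"comm_id": "u-u-i-d", "data": {}})

print(reply)
print(registry.is_open("u-u-i-d"))
```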
Livesheets
RESTful Server
Extending the Spark Kernel
• PySpark support
• Zeppelin integration
Summary
• Goal was to provide an API to enable interactive Spark applications
• Kernel provides a responsive API to use Apache Spark
• Submit code snippets in same fashion as Spark Shell
• Use Comm API for programmatically-defined messages
• Kernel implements IPython message protocol
• Able to use with IPython notebooks out of the box
• Repository: https://github.com/ibm-et/spark-kernel
Questions?
Contact info:
rcsenkbe@us.ibm.com, fallside@us.ibm.com

W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 

Spark Kernel Enables Interactive Apache Spark Apps

  • 1. Spark Kernel IBM Emerging Internet Technologies
  • 2. Outline • Scenario • Problem • How do you enable interactive applications against Apache Spark? • Solution • Spark Kernel • Architecture • Memory issue • Comm API • Livesheets (line of business tool) • RESTful Server (query interface) • Extending the Spark Kernel • Summary & Questions
  • 3. Scenario • Livesheets prototype • Needs to be able to build computations on the fly • Needs to be able to perform computations on static (historical) data as well as dynamic (streaming) data • Needs to be responsive (order of seconds instead of minutes)
  • 4. Problem How do you enable interactive applications? • Spark Submit for job submission to Apache Spark • JDBC and other offerings available for Spark SQL • RESTful interfaces available to submit jars • Spark Shell offers code snippet support to execute against a Spark cluster
  • 5. Problem How do you enable interactive applications? • Used Spark Submit • Bundled up Spark-based computations into a jar • Started an external process to run the Spark Submit script against the jar
  • 6. Problem How do you enable interactive applications? • What was wrong? • Rebundle the jar every time a computation changed • Not easy to attach to an existing Spark job • Getting results involved writing to a data store and then reading back out • Very slow turnaround
  • 7. Solution: Spark Kernel • Scala application that can do the following: • Define and execute raw Scala source code • Define and run Spark tasks via code snippets or jars • Collect results directly from a Spark cluster • Benefits • Avoid friction of shipping jars and reading results from peripheral systems • Well-defined API (IPython/Jupyter) • Acts as a proxy for Spark applications such that they can run remotely away from Spark • Provides a client library for application development • [Diagram: IPython and two applications (App 1, App 2) that use the kernel client library talk to the kernel over ZeroMQ with the IPython message protocol; the kernel talks to the Spark cluster (master and four workers)]
  • 8. Kernel Architecture • [Diagram: the kernel sits in front of the Spark cluster (master and four workers) and is layered as ØMQ (Heartbeat, Shell, Control, StdIn, and IOPub channels), Akka (message parsing and validation, routing, message handling), and the Scala interpreter, class server, and Spark context]
  • 9. Kernel Architecture • Why ZeroMQ? • Used by IPython • Responsiveness • Building blocks have behavior • Publisher sends messages to all subscribers
  • 10. Kernel Architecture • Why Akka? • Concurrency • Code isolation • Fault tolerance • Scalability
  • 11. Channels of Communication • IPython Protocol • Specifies incoming and outgoing messages handled by the kernel • Defines the purposes of the five channels of communication • ZeroMQ API • Uses ZeroMQ for socket communication via the five defined ports • Uses ZMTP as the wire protocol • [Diagram: the kernel's ØMQ layer exposing the Heartbeat, Shell, Control, StdIn, and IOPub channels]
  • 12. Channels of Communication • Heartbeat • Used to indicate that the kernel is still alive • Echoes received messages back to client • Primarily used by IPython • Shell • Used to communicate requests from a client to the kernel • Main purposes are code execution and Comm messages from a client
  • 13. Channels of Communication • Control • Serves as a higher priority shell channel • Typically used to receive shutdown signals • StdIn • Used to communicate requests from the kernel to the client(s) • Primarily used by IPython as a form of communication for users through the UI
  • 14. Channels of Communication • IOPub • Broadcasts messages to all listening clients • Used to communicate side effects (standard out/error) as well as Comm messages
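The five channels above each map to one port. A minimal sketch of how a client might derive the five ZeroMQ endpoints, assuming the kernel is described by an IPython/Jupyter-style connection file: the field names follow that convention, the port numbers and key are invented for illustration, and the socket types shown are the ones the IPython messaging documentation lists for a kernel (the slides do not state which types the Spark Kernel uses).

```python
import json

# Hypothetical connection info. Field names follow the IPython/Jupyter
# connection-file convention; ports and key are made up for this example.
CONNECTION_JSON = """
{
  "transport": "tcp",
  "ip": "127.0.0.1",
  "hb_port": 50001,
  "shell_port": 50002,
  "control_port": 50003,
  "stdin_port": 50004,
  "iopub_port": 50005,
  "signature_scheme": "hmac-sha256",
  "key": "example-secret"
}
"""

# Channel name -> (connection-file field, kernel-side ZeroMQ socket type).
# Socket types are an assumption taken from the IPython messaging docs.
CHANNELS = {
    "Heartbeat": ("hb_port", "REP"),
    "Shell": ("shell_port", "ROUTER"),
    "Control": ("control_port", "ROUTER"),
    "StdIn": ("stdin_port", "ROUTER"),
    "IOPub": ("iopub_port", "PUB"),
}


def channel_endpoints(info):
    """Build one ZeroMQ endpoint string per channel from connection info."""
    base = f"{info['transport']}://{info['ip']}"
    return {name: f"{base}:{info[field]}" for name, (field, _) in CHANNELS.items()}


endpoints = channel_endpoints(json.loads(CONNECTION_JSON))
for name, endpoint in endpoints.items():
    print(f"{name:<9} {CHANNELS[name][1]:<6} {endpoint}")
```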
  • 15. Processing Messages • Message Parsing and Validation • Uses Akka actors wrapping JeroMQ as an abstraction to parse messages • Calculates an HMAC (keyed-hash message authentication code) using SHA-256 and a secret key to validate against a signature in a message • [Diagram: the kernel's Akka layer, made up of message parsing and validation, routing, and message handling]
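The signature check on this slide can be sketched with Python's standard library. This assumes the scheme described in the IPython/Jupyter messaging documentation, where the HMAC-SHA256 hex digest is computed over the serialized header, parent header, metadata, and content in that order; the key and message values are placeholders, and the Spark Kernel's own Scala implementation is not shown in the slides.

```python
import hashlib
import hmac
import json


def sign(key, frames):
    """Return the HMAC-SHA256 hex digest over the message frames, in order."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for frame in frames:
        mac.update(frame)
    return mac.hexdigest()


def is_valid(key, signature, frames):
    """Compare the received signature with a freshly computed one."""
    # compare_digest avoids leaking timing information during comparison.
    return hmac.compare_digest(sign(key, frames), signature)


key = b"example-secret"  # placeholder; normally shared via the connection info
frames = [
    json.dumps(part).encode("utf-8")
    for part in (
        {"msg_id": "1", "msg_type": "execute_request"},  # header
        {},                                              # parent_header
        {},                                              # metadata
        {"code": "1 + 1"},                               # content
    )
]
signature = sign(key, frames)

print(is_valid(key, signature, frames))
tampered = frames[:3] + [b'{"code": "2 + 2"}']
print(is_valid(key, signature, tampered))
```

A message whose content was altered after signing fails the check, which is what lets the kernel reject requests from clients that do not hold the secret key.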
  • 16. Processing Messages • Routing • Incoming messages are routed by message type to associated message handler actors • Outgoing messages are routed by message type to associated channels
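Routing by message type amounts to two lookup tables. The sketch below uses plain Python functions in place of Akka actors, so it illustrates only the dispatch idea, not the kernel's actor code; the handler names and the reply contents are hypothetical, while the message type names come from the IPython message protocol.

```python
def handle_kernel_info(message):
    """Hypothetical handler: reply with basic kernel information."""
    return {"header": {"msg_type": "kernel_info_reply"},
            "content": {"language": "scala"}}


def handle_execute(message):
    """Hypothetical handler: pretend the submitted code ran successfully."""
    return {"header": {"msg_type": "execute_reply"},
            "content": {"status": "ok"}}


# Incoming: message type -> handler (an actor in the real kernel).
INCOMING_ROUTES = {
    "kernel_info_request": handle_kernel_info,
    "execute_request": handle_execute,
}

# Outgoing: message type -> channel the message is sent on.
OUTGOING_ROUTES = {
    "kernel_info_reply": "Shell",
    "execute_reply": "Shell",
    "stream": "IOPub",
    "status": "IOPub",
    "input_request": "StdIn",
}


def route_incoming(message):
    """Dispatch an incoming message to the handler for its type."""
    msg_type = message["header"]["msg_type"]
    handler = INCOMING_ROUTES.get(msg_type)
    if handler is None:
        raise KeyError(f"no handler registered for {msg_type!r}")
    return handler(message)


def route_outgoing(message):
    """Pick the channel for an outgoing message from its type."""
    return OUTGOING_ROUTES[message["header"]["msg_type"]]


request = {"header": {"msg_type": "execute_request"}, "content": {"code": "1 + 1"}}
reply = route_incoming(request)
print(reply["header"]["msg_type"], "->", route_outgoing(reply))
```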
  • 17. Processing Messages • Message Handling • Each message type has an associated Akka actor to handle the request • Some handlers use child actors to perform tasks, protecting the state of the handler by following Erlang’s Error Kernel Pattern as well as reducing strain on the handler
  • 18. Scala Interpreter • [Diagram: the kernel's Scala interpreter, class server, and Spark context] • Scala Interpreter • Uses the Spark REPL API to execute Scala code • Contains zero modifications to Spark’s REPL • Contains injected variables to provide Spark APIs and kernel APIs including magics and Comm communication
  • 19. Scala Interpreter • Class Server • Exposes generated REPL classes to the Spark cluster • In Spark’s Scala 2.10 implementation of the REPL, this is created for us • Spark Context • Standard Scala-based Spark Context • Exposed as a variable named ‘sc’ for user submitted code
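Because `sc` is injected into the interpreter, a client can submit a Scala snippet that uses it without any setup code. The following sketch builds such an `execute_request` as a Python dictionary. The field layout follows the IPython/Jupyter messaging documentation; the `username`, the protocol `version` value, and the Scala snippet are assumptions for illustration, and signing and sending over the Shell channel are left out.

```python
import json
import uuid
from datetime import datetime, timezone


def execute_request(code, session):
    """Build an execute_request message (unsigned, not yet serialized)."""
    return {
        "header": {
            "msg_id": str(uuid.uuid4()),
            "session": session,
            "username": "example-user",   # placeholder
            "msg_type": "execute_request",
            "date": datetime.now(timezone.utc).isoformat(),
            "version": "5.0",             # assumed protocol version
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
        },
    }


# Scala code to run inside the kernel; `sc` is the injected Spark context.
scala_snippet = "sc.parallelize(1 to 100).reduce(_ + _)"
message = execute_request(scala_snippet, session=str(uuid.uuid4()))
print(json.dumps(message, indent=2))
```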
  • 20. Kernel Client Architecture • Expose public methods accessible from Scala and Java • Client sockets mirror and communicate with kernel sockets • Actor system for client shares codebase with kernel • [Diagram: the kernel's Heartbeat, Shell, Control, StdIn, and IOPub sockets connect to a kernel client layered as ØMQ, Akka (message parsing and validation, routing, message handling), and an API used by the application]
  • 23. Memory Issue • Scala REPL (therefore Spark Shell) • Generates new classes with each code snippet compiled (leads to PermGen space issues on JVM) • Instantiates a new Request class instance per execution to hold state (leads to OutOfMemory exception)
  • 24. Memory Issue Comm API to the rescue!
  • 25. Comm API • Flexibility • Bidirectional communication • Ability to programmatically define messages and their interactions • Performance • Avoid recompiling code • Does not keep execution state • Simplicity • Start (open) communication • Send data (msg) • Stop (close) communication • [Diagram: open, msg, and close messages exchanged between the frontend (client) and the backend (kernel)]
  • 26. Comm API • Comm Open Request • Establishes a new link between the frontend and backend • Can contain data needed for initialization • Example: { "comm_id" : "u-u-i-d", "target_name" : "my_comm", "data" : {} }
  • 27. Comm API • Comm Msg Request • Primary form of communication • Contains data relevant to the request • Example: { "comm_id" : "u-u-i-d", "data" : {} }
  • 28. Comm API • Comm Close Request • Removes the link between the frontend and backend components • Can contain data needed for teardown • Example: { "comm_id" : "u-u-i-d", "data" : {} }
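The open, msg, and close requests above form a small lifecycle keyed by `comm_id`. A minimal sketch of the kernel-side bookkeeping, using the message shapes from the slides: the `CommRegistry` class and its method names are hypothetical and are not the Spark Kernel's API. It also shows why this path avoids the memory issue: a registered callback handles each message, so no new code is compiled per request.

```python
import uuid


class CommRegistry:
    """Illustrative kernel-side tracking of open Comm links."""

    def __init__(self):
        self._targets = {}  # target_name -> callback(comm_id, data)
        self._open = {}     # comm_id -> target_name

    def register(self, target_name, on_msg):
        """Declare a target that frontends may open a Comm against."""
        self._targets[target_name] = on_msg

    def comm_open(self, content):
        """Establish a link between a comm_id and a registered target."""
        target = content["target_name"]
        if target not in self._targets:
            raise KeyError(f"unknown Comm target {target!r}")
        self._open[content["comm_id"]] = target

    def comm_msg(self, content):
        """Deliver data to the callback of the target behind this comm_id."""
        target = self._open[content["comm_id"]]
        return self._targets[target](content["comm_id"], content["data"])

    def comm_close(self, content):
        """Remove the link; later messages with this comm_id are rejected."""
        self._open.pop(content["comm_id"], None)

    def is_open(self, comm_id):
        return comm_id in self._open


registry = CommRegistry()
registry.register("my_comm", lambda comm_id, data: {"echo": data})

comm_id = str(uuid.uuid4())
registry.comm_open({"comm_id": comm_id, "target_name": "my_comm", "data": {}})
reply = registry.comm_msg({"comm_id": comm_id, "data": {"x": 1}})
registry.comm_close({"comm_id": comm_id, "data": {}})
print(reply, registry.is_open(comm_id))
```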
  • 39. Extending the Spark Kernel • PySpark support • Zeppelin integration
  • 41. Summary • Goal was to provide an API to enable interactive Spark applications • Kernel provides a responsive API to use Apache Spark • Submit code snippets in same fashion as Spark Shell • Use Comm API for programmatically-defined messages • Kernel implements IPython message protocol • Able to use with IPython notebooks out of the box • Repository: https://github.com/ibm-et/spark-kernel