JDG 7 & Spark Integration
Infinispan Spark connector
tedwon
Agenda
● Brief Introduction to JBoss Data Grid 7
● Brief Introduction to Apache Spark
● Features of Infinispan Spark connector
● Demo
What is JBoss Data Grid
● Distributed cache
● In-memory NoSQL
● Key/value data store
● Built from the Infinispan community project
● In architecture, there is no master node
● Peer-to-peer membership clustering
● Its greatest strength is very high availability and scalability for an in-memory store
○ In my personal opinion
● Fits well with real-time data processing frameworks for data services
● As a metaphor: an in-memory HDFS for real-time processing
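The masterless, peer-to-peer design above can be illustrated with a toy model in plain Python. It is only a sketch: the function names, the segment count, and the round-robin segment assignment are assumptions made for illustration, and Infinispan's real consistent-hash implementation differs. The point it shows is that every peer can compute the owner of a key locally, so no master node is needed for lookups.

```python
import hashlib

def segment_of(key, num_segments=256):
    """Map a key to a hash segment (hypothetical helper, not a JDG API)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_segments

def owner_of(key, nodes, num_segments=256):
    """Return the node that owns a key in this toy model.

    Segments are dealt round-robin over the sorted member list, so any
    peer holding the same member list computes the same owner.
    """
    members = sorted(nodes)
    return members[segment_of(key, num_segments) % len(members)]

nodes = ["node-a", "node-b", "node-c"]
owner = owner_of("user:42", nodes)
```

Because the result depends only on the key and the member list, two peers that see the members in a different order still agree on the owner.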
What is JBoss Data Grid 7
● JDG 7 was released in July 2016
● JDG 7 is based on Infinispan 8.3.0 and EAP 7.0.0.GA
○ The previous version, JDG 6.6.0, was based on Infinispan 6.4.0
● JDG 7 is a much more powerful and competitive software product than its predecessor
● There are new features and enhancements
○ New GUI Admin console
○ Easy to install and configure clustering and cache
○ Easy to monitor and change configuration
○ Easy to create a new cache through the admin console, even at runtime
○ Provides API for integration with Apache Spark
JDG 7 - New Features and Enhancements
1. DISTRIBUTED STREAMS
2. REMOTE TASK EXECUTION
3. APACHE SPARK INTEGRATION
4. APACHE HADOOP INTEGRATION
5. NEW ADMINISTRATION CONSOLE FOR SERVER DEPLOYMENTS
6. CONTROLLED SHUTDOWN AND RESTART OF CLUSTER
7. NODE.JS (JAVASCRIPT) HOT ROD CLIENT
8. CASSANDRA CACHE STORE
9. HOT ROD C++ ENHANCEMENTS
10. HOT ROD C# ENHANCEMENTS
What is Apache Spark
● General-purpose, lightning-fast cluster computing framework
● General-purpose
○ One common data processing engine
○ Batch, SQL, Streaming, MLlib, Graph
○ One platform to rule them all
● Lightning-fast
○ RDD - Resilient Distributed Dataset
○ Read-only multiset of data items distributed over a cluster of machines
● Works with Hadoop HDFS and processes its data in parallel
○ Distributed file system
● MapReduce-style interfaces in a functional programming style
○ No need to chain MapReduce jobs as in a Hadoop workflow
What is Apache Spark
Apache Spark - One platform to rule them all
What is Apache Spark RDD
● An RDD consists of multiple data partitions spread over a cluster of machines
● An RDD holds the intermediate data of a Spark data processing job
● An RDD is distributed across the memory of the worker node JVMs
● A data processing job is a series of transformations on RDDs
● RDDs disappear once the job finishes
● RDDs cannot be shared between Spark jobs
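The partition-and-transform idea above can be modelled in a few lines of plain Python. This is a sketch, not the Spark API: the helper names (partition, map_partitions, filter_partitions, collect) are hypothetical, and everything runs in one process, whereas Spark would place each partition in the memory of a different worker JVM.

```python
def partition(items, num_partitions):
    """Split items into num_partitions lists, round-robin."""
    parts = [[] for _ in range(num_partitions)]
    for i, item in enumerate(items):
        parts[i % num_partitions].append(item)
    return parts

def map_partitions(parts, fn):
    """Apply fn to every element, partition by partition."""
    return [[fn(x) for x in part] for part in parts]

def filter_partitions(parts, pred):
    """Keep only the elements that satisfy pred, partition by partition."""
    return [[x for x in part if pred(x)] for part in parts]

def collect(parts):
    """Gather all partitions back into one list."""
    return [x for part in parts for x in part]

words = ["spark", "jdg", "cache", "rdd", "stream"]
parts = partition(words, 2)
lengths = map_partitions(parts, len)                     # transformation 1
long_only = filter_partitions(lengths, lambda n: n > 3)  # transformation 2
result = collect(long_only)
```

Each transformation produces a new partitioned dataset from the previous one. Nothing in this model outlives the script, which mirrors the point that RDDs are gone once the job finishes unless they are written to an external store.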
What is Apache Spark RDD
Infinispan Spark connector
JDG 7’s new feature
What is Infinispan Spark connector
● https://github.com/infinispan/infinispan-spark
● JDG 7 Document https://goo.gl/9BXp98
● RDD and DStream integration with Apache Spark 1.6
○ DStream is a continuous sequence of RDDs for real-time stream processing
● Use JDG as a data source for Spark
● Easy to read & write cache data in a Spark job
● Provides a seamless functional programming style and syntactic sugar
● Useful for sharing RDD data with other Spark jobs
Supported Configurations
Features of Infinispan Spark connector
1. Create a Spark RDD from JDG cache data
○ Read cache data from a Spark job
2. Write any key/value based RDD to a JDG cache
○ Write intermediate or final data to cache
3. Create a Spark DStream from cache-level events
○ Insert, modify, and delete events in a cache
4. Write any key/value DStream to JDG
○ Write any DStream to cache
5. Use JDG server-side filters to create a cache-based RDD
○ Using Infinispan Query DSL
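Feature 3 turns cache-level events into a DStream, that is, a sequence of small batches. The plain-Python sketch below models only that grouping step. It does not use the connector or Spark Streaming: the to_batches function, the event tuple layout, and the event names are assumptions chosen for illustration.

```python
from collections import defaultdict

def to_batches(events, batch_seconds):
    """Group (timestamp, kind, key, value) events into time-ordered batches.

    Each batch stands in for one RDD of a DStream. Intervals in which no
    event occurred are skipped in this simplified model.
    """
    batches = defaultdict(list)
    for ts, kind, key, value in events:
        batches[int(ts // batch_seconds)].append((kind, key, value))
    return [batches[i] for i in sorted(batches)]

# Hypothetical cache events: an insert, a modify, and a delete of one key.
events = [
    (0.2, "insert", "a", 1),
    (0.9, "modify", "a", 2),
    (1.5, "delete", "a", None),
]
stream = to_batches(events, batch_seconds=1)
```

With a one-second batch interval, the insert and the modify land in the first batch and the delete in the second, so a downstream job sees the changes to the cache in order, one batch at a time.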
Prerequisite & Version compatibility
● JDK 8, Scala 2.10.4, Spark 1.6
● JBoss Data Grid 7.0.0 Server
○ Infinispan Server 8.2.4.Final
● Run JDG in Remote Client-Server mode
Demo
https://github.com/tedwon/infinispan-spark-connector-examples
Performance Considerations
● The number of Spark workers should be greater than the number of JDG nodes
○ To take advantage of the parallelism
● Data locality is supported when a Spark worker is co-located on the same node as a JDG server
○ With the connector, the Spark worker then processes only the data held on its local node
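The locality point above can be sketched as a simple placement rule. The plain-Python model below is an assumption-laden illustration, not the connector's scheduler: assign_partitions and the host names are hypothetical, and the fallback for partitions without a co-located worker is an arbitrary round-robin choice.

```python
def assign_partitions(partition_hosts, worker_hosts):
    """Map each partition index to (worker host, is_local).

    partition_hosts[i] is the JDG node holding partition i. A worker on
    the same host is preferred; otherwise a worker is picked round-robin
    and the data has to travel over the network.
    """
    if not worker_hosts:
        raise ValueError("at least one Spark worker is required")
    workers = sorted(worker_hosts)
    assignment = {}
    for i, host in enumerate(partition_hosts):
        if host in workers:
            assignment[i] = (host, True)
        else:
            assignment[i] = (workers[i % len(workers)], False)
    return assignment

# Three JDG nodes, but Spark workers on only two of them.
plan = assign_partitions(["n1", "n2", "n3"], ["n1", "n2"])
```

In this example the partitions on n1 and n2 are processed locally, while the partition on n3 is read remotely, which is the case co-location is meant to avoid.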
Thank you
References
1. https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_Data_Grid/7.0/html-single/Developer_Guide/index.html#Integration_with_Apache_Spark
2. http://spark.apache.org/docs/1.6.2/programming-guide.html
3. https://github.com/infinispan/infinispan-spark
4. https://github.com/infinispan/infinispan
5. https://github.com/tedwon/infinispan-spark-connector-examples
6. https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_Data_Grid/7.0/html-single/7.0.0_Release_Notes/index.html#chap-New_Features_and_Enhancements
7. https://en.wikipedia.org/wiki/Apache_Spark
8. https://hub.docker.com/r/gustavonalle/infinispan-spark/

Más contenido relacionado

La actualidad más candente

Infinispan, transactional key value data grid and nosql database
Infinispan, transactional key value data grid and nosql databaseInfinispan, transactional key value data grid and nosql database
Infinispan, transactional key value data grid and nosql databaseAlexander Petrov
 
ClustrixDB at Samsung Cloud
ClustrixDB at Samsung CloudClustrixDB at Samsung Cloud
ClustrixDB at Samsung CloudMariaDB plc
 
MongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of viewMongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of viewPierre Baillet
 
MongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin HansonMongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin Hansonhungarianhc
 
GridFS: The Perfect Solution for Media Storage
GridFS: The Perfect Solution for Media StorageGridFS: The Perfect Solution for Media Storage
GridFS: The Perfect Solution for Media StorageMongoDB
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8MongoDB
 
Conquering Data Migration from Oracle to Postgres
Conquering Data Migration from Oracle to PostgresConquering Data Migration from Oracle to Postgres
Conquering Data Migration from Oracle to PostgresEDB
 
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep DiveMongoDB
 
Summer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpointSummer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpointChristopher Dubois
 
WiredTiger Overview
WiredTiger OverviewWiredTiger Overview
WiredTiger OverviewWiredTiger
 
FOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applicationsFOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applicationsAshnikbiz
 
Azure document db/Cosmos DB
Azure document db/Cosmos DBAzure document db/Cosmos DB
Azure document db/Cosmos DBMohit Chhabra
 
Stream processing at Hotstar
Stream processing at HotstarStream processing at Hotstar
Stream processing at HotstarKafkaZone
 
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...HBaseCon
 
An Overview of Apache Spark
An Overview of Apache SparkAn Overview of Apache Spark
An Overview of Apache SparkYasoda Jayaweera
 
Introduction To MongoDB
Introduction To MongoDBIntroduction To MongoDB
Introduction To MongoDBElieHannouch
 
MongoDB Administration 101
MongoDB Administration 101MongoDB Administration 101
MongoDB Administration 101MongoDB
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep divelucenerevolution
 

La actualidad más candente (20)

Infinispan, transactional key value data grid and nosql database
Infinispan, transactional key value data grid and nosql databaseInfinispan, transactional key value data grid and nosql database
Infinispan, transactional key value data grid and nosql database
 
ClustrixDB at Samsung Cloud
ClustrixDB at Samsung CloudClustrixDB at Samsung Cloud
ClustrixDB at Samsung Cloud
 
MongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of viewMongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of view
 
MongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin HansonMongoDB Administration ~ Kevin Hanson
MongoDB Administration ~ Kevin Hanson
 
Cosmosdb graph
Cosmosdb graphCosmosdb graph
Cosmosdb graph
 
GridFS: The Perfect Solution for Media Storage
GridFS: The Perfect Solution for Media StorageGridFS: The Perfect Solution for Media Storage
GridFS: The Perfect Solution for Media Storage
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8
 
Conquering Data Migration from Oracle to Postgres
Conquering Data Migration from Oracle to PostgresConquering Data Migration from Oracle to Postgres
Conquering Data Migration from Oracle to Postgres
 
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local London 2019: MongoDB Atlas Data Lake Technical Deep Dive
 
Summer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpointSummer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpoint
 
WiredTiger Overview
WiredTiger OverviewWiredTiger Overview
WiredTiger Overview
 
FOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applicationsFOSSASIA 2016 - 7 Tips to design web centric high-performance applications
FOSSASIA 2016 - 7 Tips to design web centric high-performance applications
 
Spark Core
Spark CoreSpark Core
Spark Core
 
Azure document db/Cosmos DB
Azure document db/Cosmos DBAzure document db/Cosmos DB
Azure document db/Cosmos DB
 
Stream processing at Hotstar
Stream processing at HotstarStream processing at Hotstar
Stream processing at Hotstar
 
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
HBaseCon2017 Splice Machine as a Service: Multi-tenant HBase using DCOS (Meso...
 
An Overview of Apache Spark
An Overview of Apache SparkAn Overview of Apache Spark
An Overview of Apache Spark
 
Introduction To MongoDB
Introduction To MongoDBIntroduction To MongoDB
Introduction To MongoDB
 
MongoDB Administration 101
MongoDB Administration 101MongoDB Administration 101
MongoDB Administration 101
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep dive
 

Destacado

Complex Event Processing with Esper
Complex Event Processing with EsperComplex Event Processing with Esper
Complex Event Processing with EsperTed Won
 
RHQ 공감 Seminar 6th
RHQ 공감 Seminar 6thRHQ 공감 Seminar 6th
RHQ 공감 Seminar 6thTed Won
 
JBoss Community's Application Monitoring Platform
JBoss Community's Application Monitoring PlatformJBoss Community's Application Monitoring Platform
JBoss Community's Application Monitoring PlatformTed Won
 
JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기
JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기
JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기Ted Won
 
Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...
Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...
Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...Ted Won
 
Nara - Personalized Web Recommendation Service Quick Review
Nara - Personalized Web Recommendation Service Quick ReviewNara - Personalized Web Recommendation Service Quick Review
Nara - Personalized Web Recommendation Service Quick ReviewTed Won
 
Real-time Big Data Analytics Practice with Unstructured Data
Real-time Big Data Analytics Practice with Unstructured DataReal-time Big Data Analytics Practice with Unstructured Data
Real-time Big Data Analytics Practice with Unstructured DataTed Won
 
Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Cloudera, Inc.
 
Complex Event Processing with Esper
Complex Event Processing with EsperComplex Event Processing with Esper
Complex Event Processing with EsperTed Won
 
How to Avoid Problems with Lump-sum Relocation Allowances
How to Avoid Problems with Lump-sum Relocation AllowancesHow to Avoid Problems with Lump-sum Relocation Allowances
How to Avoid Problems with Lump-sum Relocation AllowancesParsifal Corporation
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingHari Shreedharan
 
AddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based ProgrammingAddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based ProgrammingSamuel Lampa
 
지금 핫한 Real-time In-memory Stream Processing 이야기
지금 핫한 Real-time In-memory Stream Processing 이야기지금 핫한 Real-time In-memory Stream Processing 이야기
지금 핫한 Real-time In-memory Stream Processing 이야기Ted Won
 
Tools For jQuery Application Architecture (Extended Slides)
Tools For jQuery Application Architecture (Extended Slides)Tools For jQuery Application Architecture (Extended Slides)
Tools For jQuery Application Architecture (Extended Slides)Addy Osmani
 
SimplifyStreamingArchitecture
SimplifyStreamingArchitectureSimplifyStreamingArchitecture
SimplifyStreamingArchitectureMaheedhar Gunturu
 
Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
Apache Flink Meetup:  Sanjar Akhmedov - Joining Infinity – Windowless Stream ...Apache Flink Meetup:  Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream ...Ververica
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfCharles Givre
 
Going bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data typesGoing bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data typesPawel Szulc
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiLev Brailovskiy
 

Destacado (20)

Complex Event Processing with Esper
Complex Event Processing with EsperComplex Event Processing with Esper
Complex Event Processing with Esper
 
RHQ 공감 Seminar 6th
RHQ 공감 Seminar 6thRHQ 공감 Seminar 6th
RHQ 공감 Seminar 6th
 
JBoss Community's Application Monitoring Platform
JBoss Community's Application Monitoring PlatformJBoss Community's Application Monitoring Platform
JBoss Community's Application Monitoring Platform
 
JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기
JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기
JCO 11th 클라우드 환경에서 Java EE 운영 환경 구축하기
 
Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...
Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...
Red Hat Forum 2012 - JBoss RHQ - Java Application Monitoring & Management Pla...
 
Nara - Personalized Web Recommendation Service Quick Review
Nara - Personalized Web Recommendation Service Quick ReviewNara - Personalized Web Recommendation Service Quick Review
Nara - Personalized Web Recommendation Service Quick Review
 
Real-time Big Data Analytics Practice with Unstructured Data
Real-time Big Data Analytics Practice with Unstructured DataReal-time Big Data Analytics Practice with Unstructured Data
Real-time Big Data Analytics Practice with Unstructured Data
 
Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5Hadoop for the Data Scientist: Spark in Cloudera 5.5
Hadoop for the Data Scientist: Spark in Cloudera 5.5
 
Complex Event Processing with Esper
Complex Event Processing with EsperComplex Event Processing with Esper
Complex Event Processing with Esper
 
How to Avoid Problems with Lump-sum Relocation Allowances
How to Avoid Problems with Lump-sum Relocation AllowancesHow to Avoid Problems with Lump-sum Relocation Allowances
How to Avoid Problems with Lump-sum Relocation Allowances
 
Real Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark StreamingReal Time Data Processing Using Spark Streaming
Real Time Data Processing Using Spark Streaming
 
AddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based ProgrammingAddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based Programming
 
지금 핫한 Real-time In-memory Stream Processing 이야기
지금 핫한 Real-time In-memory Stream Processing 이야기지금 핫한 Real-time In-memory Stream Processing 이야기
지금 핫한 Real-time In-memory Stream Processing 이야기
 
Tools For jQuery Application Architecture (Extended Slides)
Tools For jQuery Application Architecture (Extended Slides)Tools For jQuery Application Architecture (Extended Slides)
Tools For jQuery Application Architecture (Extended Slides)
 
SimplifyStreamingArchitecture
SimplifyStreamingArchitectureSimplifyStreamingArchitecture
SimplifyStreamingArchitecture
 
Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
Apache Flink Meetup:  Sanjar Akhmedov - Joining Infinity – Windowless Stream ...Apache Flink Meetup:  Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
 
Spark+flume seattle
Spark+flume seattleSpark+flume seattle
Spark+flume seattle
 
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfApache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard Of
 
Going bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data typesGoing bananas with recursion schemes for fixed point data types
Going bananas with recursion schemes for fixed point data types
 
Data ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFiData ingestion and distribution with apache NiFi
Data ingestion and distribution with apache NiFi
 

Similar a JDG 7 & Spark Integration

Spark Driven Big Data Analytics
Spark Driven Big Data AnalyticsSpark Driven Big Data Analytics
Spark Driven Big Data Analyticsinoshg
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Sparkdatamantra
 
Apache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource ManagerApache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource Managerharidasnss
 
Retour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenantRetour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenantSwiss Data Forum Swiss Data Forum
 
Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009fschupp
 
Apache Spark e AWS Glue
Apache Spark e AWS GlueApache Spark e AWS Glue
Apache Spark e AWS GlueLaercio Serra
 
Zeppelin and spark sql demystified
Zeppelin and spark sql demystifiedZeppelin and spark sql demystified
Zeppelin and spark sql demystifiedOmid Vahdaty
 
Apache Spark for Beginners
Apache Spark for BeginnersApache Spark for Beginners
Apache Spark for BeginnersAnirudh
 
Putting the Spark into Functional Fashion Tech Analystics
Putting the Spark into Functional Fashion Tech AnalysticsPutting the Spark into Functional Fashion Tech Analystics
Putting the Spark into Functional Fashion Tech AnalysticsGareth Rogers
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark Juan Pedro Moreno
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015Robbie Strickland
 
Spark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingSpark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingPetr Zapletal
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...Databricks
 

Similar a JDG 7 & Spark Integration (20)

Spark Driven Big Data Analytics
Spark Driven Big Data AnalyticsSpark Driven Big Data Analytics
Spark Driven Big Data Analytics
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Apache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource ManagerApache spark on Hadoop Yarn Resource Manager
Apache spark on Hadoop Yarn Resource Manager
 
Retour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenantRetour d'expérience d'un environnement base de données multitenant
Retour d'expérience d'un environnement base de données multitenant
 
Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009
 
Apache Spark e AWS Glue
Apache Spark e AWS GlueApache Spark e AWS Glue
Apache Spark e AWS Glue
 
Zeppelin and spark sql demystified
Zeppelin and spark sql demystifiedZeppelin and spark sql demystified
Zeppelin and spark sql demystified
 
Apache Spark for Beginners
Apache Spark for BeginnersApache Spark for Beginners
Apache Spark for Beginners
 
Spark Worshop
Spark WorshopSpark Worshop
Spark Worshop
 
Putting the Spark into Functional Fashion Tech Analystics
Putting the Spark into Functional Fashion Tech AnalysticsPutting the Spark into Functional Fashion Tech Analystics
Putting the Spark into Functional Fashion Tech Analystics
 
Apache Spark
Apache SparkApache Spark
Apache Spark
 
Spark
SparkSpark
Spark
 
Introduction to Apache Spark
Introduction to Apache Spark Introduction to Apache Spark
Introduction to Apache Spark
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
 
Spark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, StreamingSpark Concepts - Spark SQL, Graphx, Streaming
Spark Concepts - Spark SQL, Graphx, Streaming
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 

Más de Ted Won

Undertow RequestBufferingHandler 소개
Undertow RequestBufferingHandler 소개Undertow RequestBufferingHandler 소개
Undertow RequestBufferingHandler 소개Ted Won
 
JBoss EAP 7 & JDG 7 최신 기술 소개
JBoss EAP 7 & JDG 7 최신 기술 소개JBoss EAP 7 & JDG 7 최신 기술 소개
JBoss EAP 7 & JDG 7 최신 기술 소개Ted Won
 
JBoss Modules Internal
JBoss Modules InternalJBoss Modules Internal
JBoss Modules InternalTed Won
 
오픈 소스 컨트리뷰션 가이드
오픈 소스 컨트리뷰션 가이드오픈 소스 컨트리뷰션 가이드
오픈 소스 컨트리뷰션 가이드Ted Won
 
Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...
Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...
Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...Ted Won
 
Jenkins X - automated CI/CD solution for cloud native applications on Kubernetes
Jenkins X - automated CI/CD solution for cloud native applications on KubernetesJenkins X - automated CI/CD solution for cloud native applications on Kubernetes
Jenkins X - automated CI/CD solution for cloud native applications on KubernetesTed Won
 
Hawkular overview
Hawkular overviewHawkular overview
Hawkular overviewTed Won
 
Building Real-time CEP Application with Open Source Projects
Building Real-time CEP Application with Open Source Projects Building Real-time CEP Application with Open Source Projects
Building Real-time CEP Application with Open Source Projects Ted Won
 
JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링
JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링
JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링Ted Won
 

Más de Ted Won (9)

Undertow RequestBufferingHandler 소개
Undertow RequestBufferingHandler 소개Undertow RequestBufferingHandler 소개
Undertow RequestBufferingHandler 소개
 
JBoss EAP 7 & JDG 7 최신 기술 소개
JBoss EAP 7 & JDG 7 최신 기술 소개JBoss EAP 7 & JDG 7 최신 기술 소개
JBoss EAP 7 & JDG 7 최신 기술 소개
 
JBoss Modules Internal
JBoss Modules InternalJBoss Modules Internal
JBoss Modules Internal
 
오픈 소스 컨트리뷰션 가이드
오픈 소스 컨트리뷰션 가이드오픈 소스 컨트리뷰션 가이드
오픈 소스 컨트리뷰션 가이드
 
Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...
Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...
Jenkins X Hands-on - automated CI/CD solution for cloud native applications o...
 
Jenkins X - automated CI/CD solution for cloud native applications on Kubernetes
Jenkins X - automated CI/CD solution for cloud native applications on KubernetesJenkins X - automated CI/CD solution for cloud native applications on Kubernetes
Jenkins X - automated CI/CD solution for cloud native applications on Kubernetes
 
Hawkular overview
Hawkular overviewHawkular overview
Hawkular overview
 
Building Real-time CEP Application with Open Source Projects
Building Real-time CEP Application with Open Source Projects Building Real-time CEP Application with Open Source Projects
Building Real-time CEP Application with Open Source Projects
 
JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링
JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링
JBoss RHQ와 Byteman을 이용한 오픈소스 자바 애플리케이션 모니터링
 

Último

Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
JDG 7 & Spark Integration

  • 1. JDG 7 & Spark Integration: Infinispan Spark connector (tedwon)
  • 2. Agenda
      ● Brief Introduction to JBoss Data Grid 7
      ● Brief Introduction to Apache Spark
      ● Features of Infinispan Spark connector
      ● Demo
  • 3. What is JBoss Data Grid
      ● Distributed cache
      ● In-memory NoSQL
      ● Key/value data store
      ● Built from the Infinispan community project
      ● Architecturally, there is no master node
      ● Peer-to-peer cluster membership
      ● Its greatest strength is very high availability and scalability as an in-memory store
        ○ In my personal opinion
      ● Fits well with real-time data processing frameworks for data services
      ● As a metaphor: an in-memory HDFS for real-time processing
  • 4. What is JBoss Data Grid 7
      ● JDG 7 was released in July 2016
      ● JDG 7 is based on Infinispan 8.3.0 and EAP 7.0.0.GA
        ○ The previous version, JDG 6.6.0, is based on Infinispan 6.4.0
      ● JDG 7 is now a much more powerful and competitive software product
      ● There are new features and enhancements
        ○ New GUI admin console
        ○ Easy to install and to configure clustering and caches
        ○ Easy to monitor and change configuration
        ○ Easy to create a new cache through the admin console, even at runtime
        ○ Provides an API for integration with Apache Spark
  • 5. JDG 7 - New Features and Enhancements
      1. DISTRIBUTED STREAMS
      2. REMOTE TASK EXECUTION
      3. APACHE SPARK INTEGRATION
      4. APACHE HADOOP INTEGRATION
      5. NEW ADMINISTRATION CONSOLE FOR SERVER DEPLOYMENTS
      6. CONTROLLED SHUTDOWN AND RESTART OF CLUSTER
      7. NODE.JS (JAVASCRIPT) HOT ROD CLIENT
      8. CASSANDRA CACHE STORE
      9. HOT ROD C++ ENHANCEMENTS
      10. HOT ROD C# ENHANCEMENTS
  • 7. What is Apache Spark
      ● General-purpose, lightning-fast cluster computing framework
      ● General-purpose
        ○ One common data processing engine
        ○ Batch, SQL, Streaming, MLlib, Graph
        ○ One platform to rule them all
      ● Lightning-fast
        ○ RDD - Resilient Distributed Dataset
        ○ Read-only multiset of data items distributed over a cluster of machines
      ● Works with Hadoop HDFS and processes that data in parallel
        ○ Distributed file system
      ● MapReduce-style interfaces with a functional programming style
        ○ No need for chained MR job workflows as in Hadoop
  • 8. Apache Spark - One platform to rule them all
  • 9. What is Apache Spark RDD
      ● An RDD consists of multiple data partitions spread over a cluster of machines
      ● An RDD is intermediate data during a Spark data processing job
      ● An RDD is distributed across the memory of each worker node JVM
      ● A data processing job is a series of transformations of RDDs
      ● RDDs disappear after a job finishes
      ● RDDs cannot be shared between Spark jobs
  • 10. What is Apache Spark RDD
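The RDD properties listed on slide 9 can be illustrated with a small single-process toy model. This is only a sketch in plain Python, not Spark's API: the class `ToyRDD` and its methods are hypothetical names chosen for illustration. It shows data split into partitions, transformations that return new read-only datasets, and an action that runs the transformation chain on each partition independently.

```python
from typing import Callable, Iterable, List, Optional


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's API):
    a list of partitions plus a chain of pending transformations."""

    def __init__(self, partitions: List[list], ops: Optional[List[Callable]] = None):
        self._partitions = partitions
        self._ops = ops or []  # pending per-partition transformations

    @classmethod
    def parallelize(cls, data: Iterable, num_partitions: int) -> "ToyRDD":
        items = list(data)
        # Round-robin the items into num_partitions slices.
        return cls([items[i::num_partitions] for i in range(num_partitions)])

    def map(self, fn: Callable) -> "ToyRDD":
        # A transformation returns a new dataset; nothing is computed yet.
        return ToyRDD(self._partitions, self._ops + [lambda part: [fn(x) for x in part]])

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(self._partitions, self._ops + [lambda part: [x for x in part if pred(x)]])

    def collect(self) -> list:
        # The action: apply the whole chain to each partition independently,
        # as a worker would, then gather the results.
        out = []
        for part in self._partitions:
            for op in self._ops:
                part = op(part)
            out.extend(part)
        return out


numbers = ToyRDD.parallelize(range(1, 11), num_partitions=3)
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(sorted(even_squares.collect()))
```

In this model the intermediate datasets exist only inside the running process, which mirrors the slide's point that RDDs disappear once a job finishes and cannot be shared between jobs.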
  • 11. Infinispan Spark connector: JDG 7's new feature
  • 12. What is Infinispan Spark connector
      ● https://github.com/infinispan/infinispan-spark
      ● JDG 7 documentation: https://goo.gl/9BXp98
      ● RDD and DStream integration with Apache Spark 1.6
        ○ A DStream is a continuous sequence of RDDs for real-time stream processing
      ● Use JDG as a data source for Spark
      ● Easy to read and write cache data in a Spark job
      ● Provides a seamless functional programming style and syntactic sugar
      ● Good for sharing RDD data with other Spark jobs
  • 14. Features of Infinispan Spark connector
      1. Create a Spark RDD from JDG cache data
        ○ Read cache data from a Spark job
      2. Write any key/value-based RDD to a JDG cache
        ○ Write intermediate or final data to the cache
      3. Create a Spark DStream from cache-level events
        ○ Insert, modify and delete events in a cache
      4. Write any key/value DStream to JDG
        ○ Write any DStream to a cache
      5. Use JDG server-side filters to create a cache-based RDD
        ○ Using the Infinispan Query DSL
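Features 1 and 2 above amount to a simple pattern: one job writes its key/value results to a cache, and a later job reads them back as its input. The following is a minimal sketch of that pattern in plain Python, under loud assumptions: the dict `cache` is only a stand-in for a remote JDG cache, and `write_pairs`, `read_pairs`, `word_count_job` and `top_words_job` are hypothetical names, not the connector's API.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for a remote JDG cache: a plain in-process dict.
cache: Dict[str, int] = {}


def write_pairs(pairs: Iterable[Tuple[str, int]], target: Dict[str, int]) -> None:
    """Feature 2 in miniature: store every (key, value) pair in the cache."""
    for key, value in pairs:
        target[key] = value


def read_pairs(source: Dict[str, int]) -> List[Tuple[str, int]]:
    """Feature 1 in miniature: expose cache entries as (key, value) pairs."""
    return list(source.items())


def word_count_job(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """First job: count words and return (word, count) pairs."""
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.lower().split():
            counts[word] = counts.get(word, 0) + 1
    return list(counts.items())


def top_words_job(pairs: Iterable[Tuple[str, int]], n: int) -> List[Tuple[str, int]]:
    """Second job: rank pairs by count (descending), then by word."""
    return sorted(pairs, key=lambda kv: (-kv[1], kv[0]))[:n]


# Job 1 writes its result to the cache; job 2 starts from the cache contents.
write_pairs(word_count_job(["spark reads the cache", "spark writes the cache"]), cache)
top = top_words_job(read_pairs(cache), 2)
print(top)
```

Because the second job takes its input from the cache rather than from the first job's in-memory data, the two jobs do not need to run in the same process or at the same time, which is the sharing benefit the slides attribute to the connector.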
  • 15. Prerequisite & Version compatibility
      ● JDK 8, Scala 2.10.4, Spark 1.6
      ● JBoss Data Grid 7.0.0 Server
        ○ Infinispan Server 8.2.4.Final
      ● Run JDG in Remote Client-Server mode
  • 17. Performance Considerations
      ● The number of Spark workers should be greater than the number of JDG nodes
        ○ To take advantage of parallelism
      ● Supports data locality when a Spark worker is co-located on the same node as a JDG node
        ○ With the connector, the Spark worker only processes data held on its local node
  • 19. References
      1. https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_Data_Grid/7.0/html-single/Developer_Guide/index.html#Integration_with_Apache_Spark
      2. http://spark.apache.org/docs/1.6.2/programming-guide.html
      3. https://github.com/infinispan/infinispan-spark
      4. https://github.com/infinispan/infinispan
      5. https://github.com/tedwon/infinispan-spark-connector-examples
      6. https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_Data_Grid/7.0/html-single/7.0.0_Release_Notes/index.html#chap-New_Features_and_Enhancements
      7. https://en.wikipedia.org/wiki/Apache_Spark
      8. https://hub.docker.com/r/gustavonalle/infinispan-spark/