SlideShare a Scribd company logo
1 of 15
Download to read offline
Christos Erotocritou, GridGain Systems
Fast Data with Apache
Ignite & Apache Spark
#EUstr10
© 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
…is a distributed, memory-centric data platform
with powerful & flexible processing APIs
© 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
Apache Ignite Memory-Centric Data Platform
Ignite Memory-Centric Storage
Ignite Native Persistence
(Flash, SSD, Intel 3D XPoint)
Third-Party Persistence
(RDBMS, HDFS, NoSQL)
SQL Transactions Compute IgniteRDD MLStreamingKey/Value
IoTFinancial
Services
Pharma &
Healthcare
E-CommerceTravel &
Logistics
Telco
Applications
© 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
Memory-Centric Storage
© 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
Pure Ignite Deployment
Front-End APIs
SQL TXCompute
Ignite
RDD
Key /
Value
Payments SecuritiesRisk Trading Clients
Ignite Cluster
DURABLE MEMORY DURABLE MEMORY DURABLE MEMORY
Data Caches / Tables
Applications in
Java, .NET & C++
Wide Range of
Data Access and
Processing APIs
Shared Storage
across Apps &
Support for Multi-
Tenancy
Disk & Memory
Data Storage
© 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
Durable Memory
Ignite Server Cluster
Off-heap Removes
noticeable GC pauses
Automatic
Defragmentation
Stores Superset
of Data
Predictable memory
consumption
Fully Transactional
(Write-Ahead Log)
DURABLE MEMORY DURABLE MEMORY DURABLE MEMORY
Server Node Server Node Server Node
Memory-Centric Storage
Instantaneous
Restarts
© 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
Apache Ignite Features
JCache Compute Transactions
Scan & Text
QueriesSQL JDBC &
ODBC
StreamingServices
Java .NET C++ PHP BI ToolsMemcached REST
DURABLE MEMORY DURABLE MEMORY DURABLE MEMORY DURABLE MEMORY
Distributed Memory-Centric Storage
Dynamic
Scaling
Server Nodes
© 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
1.Initial Request
2.Fetch data from remote nodes
3.Process the entire data-set
1.Initial request
2.Co-locate processing with data
3.Reduce multiple results into one
Client-Server Processing Co-located Processing
2
1
Data & Processing
Node
Data & Processing
Node
Client Node
33
Data 1
Data NodeData 2
Data Node
Processing Node
1
2
© 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
Hadoop, Spark & Ignite Deployment
SQL &
Compute APIDB
File Exports
Ignite
Clients
Kafka Data
Streamer
Ignite Data
Streamer
Spark App
Hadoop
Data Node
Spark App
Hadoop
Data Node
Spark App
Hadoop
Data Node
Spark Clients
Server Nodes
IgniteRDD IgniteRDD IgniteRDD
© 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
Apache Ignite Spark Integration
Spark Application
Spark Worker
Spark
Job
Spark
Job
Yarn Mesos Docker HDFS
Spark Worker
Spark
Job
Spark
Job
Spark Worker
Spark
Job
Spark
Job
In-Memory Shared RDD or DataFrame
Share RDD
across jobs on
the host
In-Memory
Indexes
SQL on top of
RDDs
Share RDD
Globally
Ignite Node Ignite Node Ignite Node
© 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
• IgniteContext is the main entry point to Spark-Ignite integration:
val igniteContext = new IgniteContext[Integer, Integer]
(sparkContext, () => new IgniteConfiguration())
val cache = igniteContext.fromCache("myRdd")
val result = cache.filter(_._2.contains("Ignite")).collect()
val cacheRdd = igniteContext.fromCache("myRdd")
cacheRdd.savePairs(sparkContext.parallelize(1 to 10000, 10).map(i => (i, i)))
• Saving values to Ignite:
• Running SQL queries against Ignite Cache:
val cacheRdd = igniteContext.fromCache("myRdd")
val result = cacheRdd.sql
("select _val from Integer where val > ? and val < ?", 10, 100)
• Reading values from Ignite:
Working with IgniteRDD
© 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
val companyCacheIgnite = new IgniteContext[Int, String](sc, () =>
new IgniteConfiguration()).fromCache("CompanyCache")
val dfCompany = sqlContext.createDataFrame(companyCacheIgnite.map(p=>
Company(p._1, p._2)))
dfCompany.registerTempTable("company")
Working with DataFrame API
• Create an IgniteRDD
• Create a “Company” DataFrame
• Register DataFrame as a table
© 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
– Ingests data from HDFS or
another distributed file system
– Inclined towards analytics (OLAP)
and focused on MR-specific
payloads
– Requires the creation of RDD and
data and processing operations
are governed by it
– Basic disk-based SQL support
– Strong ML libraries
– Big community
– Data source agnostic
– Fully fledged compute engine and
durable storage
– OLAP & OLTP
– Zero-deployment
– In-Memory SQL support
– Fully ACID transactions across
memory and disk
– Less focused on Hadoop
– Early ML Support
– Growing Community
© 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
• What is GridGain?
• Binary build of Apache Ignite™
• Added enterprise features for enterprise deployments
• Earlier features and bug fixes by a few weeks
• Fully certified & tested releases
“We develop and support the worlds leading In-Memory Computing Platform”
© 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
Thank you for joining us. Follow the conversation.
http://ignite.apache.org
Any Questions?

More Related Content

What's hot

Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkDongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of Flink
Flink Forward
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Databricks
 
Extending Apache Spark SQL Data Source APIs with Join Push Down with Ioana De...
Extending Apache Spark SQL Data Source APIs with Join Push Down with Ioana De...Extending Apache Spark SQL Data Source APIs with Join Push Down with Ioana De...
Extending Apache Spark SQL Data Source APIs with Join Push Down with Ioana De...
Databricks
 

What's hot (20)

Designing Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things RightDesigning Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things Right
 
Beyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFramesBeyond SQL: Speeding up Spark with DataFrames
Beyond SQL: Speeding up Spark with DataFrames
 
Debugging PySpark: Spark Summit East talk by Holden Karau
Debugging PySpark: Spark Summit East talk by Holden KarauDebugging PySpark: Spark Summit East talk by Holden Karau
Debugging PySpark: Spark Summit East talk by Holden Karau
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Dongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of FlinkDongwon Kim – A Comparative Performance Evaluation of Flink
Dongwon Kim – A Comparative Performance Evaluation of Flink
 
PySpark dataframe
PySpark dataframePySpark dataframe
PySpark dataframe
 
Photon Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think VectorizedPhoton Technical Deep Dive: How to Think Vectorized
Photon Technical Deep Dive: How to Think Vectorized
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
From HDFS to S3: Migrate Pinterest Apache Spark Clusters
From HDFS to S3: Migrate Pinterest Apache Spark ClustersFrom HDFS to S3: Migrate Pinterest Apache Spark Clusters
From HDFS to S3: Migrate Pinterest Apache Spark Clusters
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQL
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
 
Extending Apache Spark SQL Data Source APIs with Join Push Down with Ioana De...
Extending Apache Spark SQL Data Source APIs with Join Push Down with Ioana De...Extending Apache Spark SQL Data Source APIs with Join Push Down with Ioana De...
Extending Apache Spark SQL Data Source APIs with Join Push Down with Ioana De...
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!
 
Making Structured Streaming Ready for Production
Making Structured Streaming Ready for ProductionMaking Structured Streaming Ready for Production
Making Structured Streaming Ready for Production
 
Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 

Viewers also liked

Spark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene PangSpark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Summit
 
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Spark Summit
 
Optimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma TangOptimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Databricks
 

Viewers also liked (12)

Storage Engine Considerations for Your Apache Spark Applications with Mladen ...
Storage Engine Considerations for Your Apache Spark Applications with Mladen ...Storage Engine Considerations for Your Apache Spark Applications with Mladen ...
Storage Engine Considerations for Your Apache Spark Applications with Mladen ...
 
Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...
Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...
Supporting Highly Multitenant Spark Notebook Workloads with Craig Ingram and ...
 
Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...
Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...
Accelerating Shuffle: A Tailor-Made RDMA Solution for Apache Spark with Yuval...
 
Spark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene PangSpark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene Pang
 
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
Running Spark Inside Containers with Haohai Ma and Khalid Ahmed
 
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
Apache Spark Performance Troubleshooting at Scale, Challenges, Tools, and Met...
 
Optimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma TangOptimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
Optimal Strategies for Large Scale Batch ETL Jobs with Emma Tang
 
Best Practices for Using Alluxio with Apache Spark with Gene Pang
Best Practices for Using Alluxio with Apache Spark with Gene PangBest Practices for Using Alluxio with Apache Spark with Gene Pang
Best Practices for Using Alluxio with Apache Spark with Gene Pang
 
An Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai Yu
An Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai YuAn Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai Yu
An Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai Yu
 
Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei ZahariaDeep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
Deep Learning and Streaming in Apache Spark 2.x with Matei Zaharia
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
 
Spark Streaming Programming Techniques You Should Know with Gerard Maas
Spark Streaming Programming Techniques You Should Know with Gerard MaasSpark Streaming Programming Techniques You Should Know with Gerard Maas
Spark Streaming Programming Techniques You Should Know with Gerard Maas
 

Similar to Fast Data with Apache Ignite and Apache Spark with Christos Erotocritou

Ingesting streaming data for analysis in apache ignite (stream sets theme)
Ingesting streaming data for analysis in apache ignite (stream sets theme)Ingesting streaming data for analysis in apache ignite (stream sets theme)
Ingesting streaming data for analysis in apache ignite (stream sets theme)
Tom Diederich
 

Similar to Fast Data with Apache Ignite and Apache Spark with Christos Erotocritou (20)

Improving Apache Spark™ In-Memory Computing with Apache Ignite™
 Improving Apache Spark™ In-Memory Computing with Apache Ignite™ Improving Apache Spark™ In-Memory Computing with Apache Ignite™
Improving Apache Spark™ In-Memory Computing with Apache Ignite™
 
Apache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoTApache Spark and Apache Ignite: Where Fast Data Meets IoT
Apache Spark and Apache Ignite: Where Fast Data Meets IoT
 
Apache Ignite - Distributed Database Orchestration
Apache Ignite - Distributed Database OrchestrationApache Ignite - Distributed Database Orchestration
Apache Ignite - Distributed Database Orchestration
 
Spark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos ErotocritouSpark Summit EU talk by Christos Erotocritou
Spark Summit EU talk by Christos Erotocritou
 
Ingesting streaming data for analysis in apache ignite (stream sets theme)
Ingesting streaming data for analysis in apache ignite (stream sets theme)Ingesting streaming data for analysis in apache ignite (stream sets theme)
Ingesting streaming data for analysis in apache ignite (stream sets theme)
 
[NEW LAUNCH!] Introducing Amazon Elastic Inference: Reduce Deep Learning Infe...
[NEW LAUNCH!] Introducing Amazon Elastic Inference: Reduce Deep Learning Infe...[NEW LAUNCH!] Introducing Amazon Elastic Inference: Reduce Deep Learning Infe...
[NEW LAUNCH!] Introducing Amazon Elastic Inference: Reduce Deep Learning Infe...
 
Apache Ignite: In-Memory Hammer for Your Data Science Toolkit
Apache Ignite: In-Memory Hammer for Your Data Science ToolkitApache Ignite: In-Memory Hammer for Your Data Science Toolkit
Apache Ignite: In-Memory Hammer for Your Data Science Toolkit
 
Apache Ignite: In-Memory Hammer for Your Data Science Toolkit
Apache Ignite: In-Memory Hammer for Your Data Science ToolkitApache Ignite: In-Memory Hammer for Your Data Science Toolkit
Apache Ignite: In-Memory Hammer for Your Data Science Toolkit
 
Introduction to Scalable Deep Learning on AWS with Apache MXNet
Introduction to Scalable Deep Learning on AWS with Apache MXNetIntroduction to Scalable Deep Learning on AWS with Apache MXNet
Introduction to Scalable Deep Learning on AWS with Apache MXNet
 
In-Memory Computing Essentials for Architects and Engineers
In-Memory Computing Essentials for Architects and EngineersIn-Memory Computing Essentials for Architects and Engineers
In-Memory Computing Essentials for Architects and Engineers
 
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-AirflowPyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
PyconZA19-Distributed-workloads-challenges-with-PySpark-and-Airflow
 
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
Melbourne: Certus Data 2.0 Vault Meetup with Snowflake - Data Vault In The Cl...
 
MCL303-Deep Learning with Apache MXNet and Gluon
MCL303-Deep Learning with Apache MXNet and GluonMCL303-Deep Learning with Apache MXNet and Gluon
MCL303-Deep Learning with Apache MXNet and Gluon
 
Data Summer Conf 2018, “Apache Ignite + Apache Spark RDDs and DataFrames inte...
Data Summer Conf 2018, “Apache Ignite + Apache Spark RDDs and DataFrames inte...Data Summer Conf 2018, “Apache Ignite + Apache Spark RDDs and DataFrames inte...
Data Summer Conf 2018, “Apache Ignite + Apache Spark RDDs and DataFrames inte...
 
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT
Apache Spark and Apache Ignite: Where Fast Data Meets the IoTApache Spark and Apache Ignite: Where Fast Data Meets the IoT
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT
 
Running Spark In Production in the Cloud is Not Easy with Nayur Khan
Running Spark In Production in the Cloud is Not Easy with Nayur KhanRunning Spark In Production in the Cloud is Not Easy with Nayur Khan
Running Spark In Production in the Cloud is Not Easy with Nayur Khan
 
Data relay introduction to big data clusters
Data relay introduction to big data clustersData relay introduction to big data clusters
Data relay introduction to big data clusters
 
Deep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech Talks
Deep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech TalksDeep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech Talks
Deep Dive on Amazon EC2 Accelerated Computing - AWS Online Tech Talks
 
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on ...
 
Rightsizing Your Silicon Design Environment: Elastic Clusters for EDA Workloa...
Rightsizing Your Silicon Design Environment: Elastic Clusters for EDA Workloa...Rightsizing Your Silicon Design Environment: Elastic Clusters for EDA Workloa...
Rightsizing Your Silicon Design Environment: Elastic Clusters for EDA Workloa...
 

More from Spark Summit

Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
 

More from Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Recently uploaded

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 

Recently uploaded (20)

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 

Fast Data with Apache Ignite and Apache Spark with Christos Erotocritou

  • 1. Christos Erotocritou, GridGain Systems Fast Data with Apache Ignite & Apache Spark #EUstr10
  • 2. © 2017 GridGain Systems, Inc. #EUstr10SPARK SUMMIT …is a distributed, memory-centric data platform with powerful & flexible processing APIs
  • 3. © 2017 GridGain Systems, Inc. #EUstr10SPARK SUMMIT Apache Ignite Memory-Centric Data Platform Ignite Memory-Centric Storage Ignite Native Persistence (Flash, SSD, Intel 3D XPoint) Third-Party Persistence (RDBMS, HDFS, NoSQL) SQL Transactions Compute IgniteRDD MLStreamingKey/Value IoTFinancial Services Pharma & Healthcare E-CommerceTravel & Logistics Telco Applications
  • 4. © 2017 GridGain Systems, Inc. #EUstr10SPARK SUMMIT Memory-Centric Storage
  • 5. © 2017 GridGain Systems, Inc. #EUstr10SPARK SUMMIT Pure Ignite Deployment Front-End APIs SQL TXCompute Ignite RDD Key / Value Payments SecuritiesRisk Trading Clients Ignite Cluster DURABLE MEMORY DURABLE MEMORY DURABLE MEMORY Data Caches / Tables Applications in Java, .NET & C++ Wide Range of Data Access and Processing APIs Shared Storage across Apps & Support for Multi- Tenancy Disk & Memory Data Storage
  • 6. © 2017 GridGain Systems, Inc. #EUstr10SPARK SUMMIT Durable Memory Ignite Server Cluster Off-heap Removes noticeable GC pauses Automatic Defragmentation Stores Superset of Data Predictable memory consumption Fully Transactional (Write-Ahead Log) DURABLE MEMORY DURABLE MEMORY DURABLE MEMORY Server Node Server Node Server Node Memory-Centric Storage Instantaneous Restarts
  • 7. © 2017 GridGain Systems, Inc. #EUstr10SPARK SUMMIT Apache Ignite Features JCache Compute Transactions Scan & Text QueriesSQL JDBC & ODBC StreamingServices Java .NET C++ PHP BI ToolsMemcached REST DURABLE MEMORY DURABLE MEMORY DURABLE MEMORY DURABLE MEMORY Distributed Memory-Centric Storage Dynamic Scaling Server Nodes
  • 8. © 2017 GridGain Systems, Inc. #EUstr10SPARK SUMMIT 1.Initial Request 2.Fetch data from remote nodes 3.Process the entire data-set 1.Initial request 2.Co-locate processing with data 3.Reduce multiple results into one Client-Server Processing Co-located Processing 2 1 Data & Processing Node Data & Processing Node Client Node 33 Data 1 Data NodeData 2 Data Node Processing Node 1 2
  • 9. © 2017 GridGain Systems, Inc. #EUstr10SPARK SUMMIT Hadoop, Spark & Ignite Deployment SQL & Compute APIDB File Exports Ignite Clients Kafka Data Streamer Ignite Data Streamer Spark App Hadoop Data Node Spark App Hadoop Data Node Spark App Hadoop Data Node Spark Clients Server Nodes IgniteRDD IgniteRDD IgniteRDD
  • 10. © 2017 GridGain Systems, Inc. #EUstr10SPARK SUMMIT Apache Ignite Spark Integration Spark Application Spark Worker Spark Job Spark Job Yarn Mesos Docker HDFS Spark Worker Spark Job Spark Job Spark Worker Spark Job Spark Job In-Memory Shared RDD or DataFrame Share RDD across jobs on the host In-Memory Indexes SQL on top of RDDs Share RDD Globally Ignite Node Ignite Node Ignite Node
  • 11. © 2017 GridGain Systems, Inc. #EUstr10SPARK SUMMIT • IgniteContext is the main entry point to Spark-Ignite integration: val igniteContext = new IgniteContext[Integer, Integer] (sparkContext, () => new IgniteConfiguration()) val cache = igniteContext.fromCache("myRdd") val result = cache.filter(_._2.contains("Ignite")).collect() val cacheRdd = igniteContext.fromCache("myRdd") cacheRdd.savePairs(sparkContext.parallelize(1 to 10000, 10).map(i => (i, i))) • Saving values to Ignite: • Running SQL queries against Ignite Cache: val cacheRdd = igniteContext.fromCache("myRdd") val result = cacheRdd.sql ("select _val from Integer where val > ? and val < ?", 10, 100) • Reading values from Ignite: Working with IgniteRDD
  • 12. © 2017 GridGain Systems, Inc. #EUstr10SPARK SUMMIT val companyCacheIgnite = new IgniteContext[Int, String](sc, () => new IgniteConfiguration()).fromCache("CompanyCache") val dfCompany = sqlContext.createDataFrame(companyCacheIgnite.map(p=> Company(p._1, p._2))) dfCompany.registerTempTable("company") Working with DataFrame API • Create an IgniteRDD • Create a “Company” DataFrame • Register DataFrame as a table
  • 13. © 2017 GridGain Systems, Inc. #EUstr10SPARK SUMMIT – Ingests data from HDFS or another distributed file system – Inclined towards analytics (OLAP) and focused on MR-specific payloads – Requires the creation of RDD and data and processing operations are governed by it – Basic disk-based SQL support – Strong ML libraries – Big community – Data source agnostic – Fully fledged compute engine and durable storage – OLAP & OLTP – Zero-deployment – In-Memory SQL support – Fully ACID transactions across memory and disk – Less focused on Hadoop – Early ML Support – Growing Community
  • 14. © 2017 GridGain Systems, Inc. #EUstr10SPARK SUMMIT • What is GridGain? • Binary build of Apache Ignite™ • Added enterprise features for enterprise deployments • Earlier features and bug fixes by a few weeks • Fully certified & tested releases “We develop and support the worlds leading In-Memory Computing Platform”
  • 15. © 2017 GridGain Systems, Inc. #EUstr10SPARK SUMMIT Thank you for joining us. Follow the conversation. http://ignite.apache.org Any Questions?