More Related Content Similar to Fast Data with Apache Ignite and Apache Spark with Christos Erotocritou (20) More from Spark Summit (20) Fast Data with Apache Ignite and Apache Spark with Christos Erotocritou2. © 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
…is a distributed, memory-centric data platform
with powerful & flexible processing APIs
3. © 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
Apache Ignite Memory-Centric Data Platform
Ignite Memory-Centric Storage
Ignite Native Persistence
(Flash, SSD, Intel 3D XPoint)
Third-Party Persistence
(RDBMS, HDFS, NoSQL)
SQL Transactions Compute IgniteRDD MLStreamingKey/Value
IoTFinancial
Services
Pharma &
Healthcare
E-CommerceTravel &
Logistics
Telco
Applications
4. © 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
Memory-Centric Storage
5. © 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
Pure Ignite Deployment
Front-End APIs
SQL TXCompute
Ignite
RDD
Key /
Value
Payments SecuritiesRisk Trading Clients
Ignite Cluster
DURABLE MEMORY DURABLE MEMORY DURABLE MEMORY
Data Caches / Tables
Applications in
Java, .NET & C++
Wide Range of
Data Access and
Processing APIs
Shared Storage
across Apps &
Support for Multi-
Tenancy
Disk & Memory
Data Storage
6. © 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
Durable Memory
Ignite Server Cluster
Off-heap Removes
noticeable GC pauses
Automatic
Defragmentation
Stores Superset
of Data
Predictable memory
consumption
Fully Transactional
(Write-Ahead Log)
DURABLE MEMORY DURABLE MEMORY DURABLE MEMORY
Server Node Server Node Server Node
Memory-Centric Storage
Instantaneous
Restarts
7. © 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
Apache Ignite Features
JCache Compute Transactions
Scan & Text
QueriesSQL JDBC &
ODBC
StreamingServices
Java .NET C++ PHP BI ToolsMemcached REST
DURABLE MEMORY DURABLE MEMORY DURABLE MEMORY DURABLE MEMORY
Distributed Memory-Centric Storage
Dynamic
Scaling
Server Nodes
8. © 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
1.Initial Request
2.Fetch data from remote nodes
3.Process the entire data-set
1.Initial request
2.Co-locate processing with data
3.Reduce multiple results into one
Client-Server Processing Co-located Processing
2
1
Data & Processing
Node
Data & Processing
Node
Client Node
33
Data 1
Data NodeData 2
Data Node
Processing Node
1
2
9. © 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
Hadoop, Spark & Ignite Deployment
SQL &
Compute APIDB
File Exports
Ignite
Clients
Kafka Data
Streamer
Ignite Data
Streamer
Spark App
Hadoop
Data Node
Spark App
Hadoop
Data Node
Spark App
Hadoop
Data Node
Spark Clients
Server Nodes
IgniteRDD IgniteRDD IgniteRDD
10. © 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
Apache Ignite Spark Integration
Spark Application
Spark Worker
Spark
Job
Spark
Job
Yarn Mesos Docker HDFS
Spark Worker
Spark
Job
Spark
Job
Spark Worker
Spark
Job
Spark
Job
In-Memory Shared RDD or DataFrame
Share RDD
across jobs on
the host
In-Memory
Indexes
SQL on top of
RDDs
Share RDD
Globally
Ignite Node Ignite Node Ignite Node
11. © 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
• IgniteContext is the main entry point to Spark-Ignite integration:
val igniteContext = new IgniteContext[Integer, Integer]
(sparkContext, () => new IgniteConfiguration())
val cache = igniteContext.fromCache("myRdd")
val result = cache.filter(_._2.contains("Ignite")).collect()
val cacheRdd = igniteContext.fromCache("myRdd")
cacheRdd.savePairs(sparkContext.parallelize(1 to 10000, 10).map(i => (i, i)))
• Saving values to Ignite:
• Running SQL queries against Ignite Cache:
val cacheRdd = igniteContext.fromCache("myRdd")
val result = cacheRdd.sql
("select _val from Integer where val > ? and val < ?", 10, 100)
• Reading values from Ignite:
Working with IgniteRDD
12. © 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
val companyCacheIgnite = new IgniteContext[Int, String](sc, () =>
new IgniteConfiguration()).fromCache("CompanyCache")
val dfCompany = sqlContext.createDataFrame(companyCacheIgnite.map(p=>
Company(p._1, p._2)))
dfCompany.registerTempTable("company")
Working with DataFrame API
• Create an IgniteRDD
• Create a “Company” DataFrame
• Register DataFrame as a table
13. © 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
– Ingests data from HDFS or
another distributed file system
– Inclined towards analytics (OLAP)
and focused on MR-specific
payloads
– Requires the creation of RDD and
data and processing operations
are governed by it
– Basic disk-based SQL support
– Strong ML libraries
– Big community
– Data source agnostic
– Fully fledged compute engine and
durable storage
– OLAP & OLTP
– Zero-deployment
– In-Memory SQL support
– Fully ACID transactions across
memory and disk
– Less focused on Hadoop
– Early ML Support
– Growing Community
14. © 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
• What is GridGain?
• Binary build of Apache Ignite™
• Added enterprise features for enterprise deployments
• Earlier features and bug fixes by a few weeks
• Fully certified & tested releases
“We develop and support the worlds leading In-Memory Computing Platform”
15. © 2017 GridGain Systems, Inc.
#EUstr10SPARK SUMMIT
Thank you for joining us. Follow the conversation.
http://ignite.apache.org
Any Questions?