Real-time connectivity of databases and systems is critical for enterprises adopting digital transformation, because it supports the super-fast decisioning that drives applications like fraud detection, digital payments, and recommendation engines. This talk will focus on the many functions that database streaming serves with Kafka, Spark and Aerospike. We will explore how to eliminate the wall between transaction processing and analytics by combining streaming data with system-of-record data to gain key insights in real time.
Distributed Data Storage & Streaming for Real-time Decisioning Using Kafka, Spark and Aerospike (Kiran Matty, Aerospike) Kafka Summit 2020
1. 1
Distributed Data Storage and Streaming for Real-time Decisioning using Kafka, Spark and Aerospike
Kiran Matty
August 27, 2020
2. 2
Who Am I?
▪ Director of PM for Ecosystem @ Aerospike
▪ Domain experience spans Big Data Infrastructure and Data Security @ Visa, Hortonworks, HPE, and Cisco
▪ Interests include large-scale distributed systems and AI/ML
▪ Lego builder in spare time
3. 3
About Aerospike
Aerospike Delivers Superior Reliability and Performance at the Lowest TCO
[Figure: charts of TCO ($) and performance vs. scale (TB), comparing Aerospike TCO with alternative TCO; labeled regions: "Large, growing, unmet need" and "Significant functional overlap - Commodity DB problem set"]
Strategic Operational Apps:
▪ Superior Uptime & Resiliency
▪ Transactional
▪ Low latency
4. 4
The Aerospike Difference
Aerospike Hybrid Memory Architecture™ (patented)
▪ Patented flash-optimized storage layer
– Significantly higher performance & IOPS
▪ Multi-threaded, massively parallel
– 'Scale up' and 'scale out'
▪ Self-healing clusters
– Superior uptime, availability and reliability
▪ Storage indices in DRAM, data on optimized SSDs
– Single hop to data
– Predictable performance regardless of scale
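The idea of keeping indices in DRAM and data on SSD can be illustrated with a toy key-value store. An in-memory dict maps each key to an (offset, length) pair in an append-only file, so a read costs one index lookup plus one positioned read. This is a hypothetical sketch under simplifying assumptions. `ToyHybridStore` is our own name and not an Aerospike API. The sketch ignores defragmentation, replication, and Aerospike's actual on-device format.

```python
from __future__ import annotations

import os
import tempfile


class ToyHybridStore:
    """Hypothetical toy: primary index in memory (DRAM), record data in a file ("SSD")."""

    def __init__(self, path: str) -> None:
        # In-memory index: key -> (offset, length) of the newest copy on the device.
        self._index: dict[str, tuple[int, int]] = {}
        # "a+b": writes always append; reads can seek anywhere.
        self._fh = open(path, "a+b")

    def put(self, key: str, value: bytes) -> None:
        # Append the new copy (no in-place overwrite), then repoint the index.
        self._fh.seek(0, os.SEEK_END)
        offset = self._fh.tell()
        self._fh.write(value)
        self._fh.flush()
        self._index[key] = (offset, len(value))

    def get(self, key: str) -> bytes | None:
        # One in-memory lookup, then a single positioned read: the "single hop".
        location = self._index.get(key)
        if location is None:
            return None
        offset, length = location
        self._fh.seek(offset)
        return self._fh.read(length)

    def close(self) -> None:
        self._fh.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = ToyHybridStore(os.path.join(tmp, "device.bin"))
        store.put("user:1", b"alice")
        store.put("user:2", b"bob")
        store.put("user:1", b"alice-v2")  # update appends; old bytes become garbage
        print(store.get("user:1"))  # b'alice-v2'
        store.close()
```

Appending updates instead of overwriting them follows a log-structured approach commonly used on flash. This toy omits reclaiming the stale bytes.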
6. 6
Aerospike Connect for Real-time Streaming Use Cases
▪ Real-time monitoring
▪ Fintech
▪ IIoT/Predictive maintenance
▪ AI
▪ Personalization/360° profile
▪ Fraud detection
7. 7
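What is Aerospike Connect for Kafka?
[Diagram: Outbound* path, in which Aerospike change notifications are published to Kafka, and Inbound* path, in which Kafka producers such as IoT/edge devices write into Aerospike; a panel lists the supported formats]
*Works on both Apache Kafka and Confluent Platform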
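A toy in-process model can make the two directions concrete. The outbound path publishes a change-notification event for every write to the database. The inbound path consumes producer messages and writes them into the database. Plain Python lists stand in for Kafka topics, and a dict stands in for Aerospike. The message fields (`namespace`, `set`, `key`, `bins`) and all class and function names are hypothetical. They do not reproduce the connector's actual configuration or wire formats.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class ToyTopic:
    """Stand-in for a Kafka topic: an append-only list of serialized messages."""
    messages: list[str] = field(default_factory=list)

    def produce(self, payload: dict) -> None:
        self.messages.append(json.dumps(payload, sort_keys=True))


@dataclass
class ToyDatabase:
    """Stand-in for Aerospike: records keyed by (namespace, set, key)."""
    outbound: ToyTopic  # change notifications are published here
    records: dict[tuple[str, str, str], dict] = field(default_factory=dict)

    def put(self, namespace: str, set_name: str, key: str, bins: dict) -> None:
        record = self.records.setdefault((namespace, set_name, key), {})
        record.update(bins)
        # Outbound: every write emits a change notification carrying the full record.
        self.outbound.produce({"namespace": namespace, "set": set_name,
                               "key": key, "bins": dict(record)})


def run_inbound(topic: ToyTopic, db: ToyDatabase, start: int = 0) -> int:
    """Inbound: apply messages from `topic` (starting at `start`) to `db`; return next offset."""
    for raw in topic.messages[start:]:
        msg = json.loads(raw)
        db.put(msg["namespace"], msg["set"], msg["key"], msg["bins"])
    return len(topic.messages)


if __name__ == "__main__":
    iot_topic = ToyTopic()  # inbound source, e.g. IoT/edge devices
    changes = ToyTopic()    # outbound change-notification topic
    db = ToyDatabase(outbound=changes)
    iot_topic.produce({"namespace": "test", "set": "sensors",
                       "key": "dev-7", "bins": {"temp": 21.5}})
    next_offset = run_inbound(iot_topic, db)
    print(db.records[("test", "sensors", "dev-7")])  # {'temp': 21.5}
    print(len(changes.messages))                     # 1
```

Returning the next offset mimics how a consumer tracks its position in a topic. A second call with that offset processes only new messages.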
9. 9
Aerospike Connect @ Scale
[Diagram: Aerospike Database clusters as the system of record, connected through XDR, APIs, a streaming API, the Kafka Connector, and the Spark Connector API to edge systems, a data warehouse (legacy RDBMS), a data lake (HDFS-based), a filesystem, 3rd-party data, and a Spark cluster (300 nodes) used for training and inference]
▪ 33-node Aerospike cluster used
▪ 4,096 Aerospike partitions mapped to up to 2^15 (32,768) Spark partitions per namespace to achieve massive parallelization
▪ Max 32 namespaces supported per cluster
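The parallelism figures on this slide follow from powers of two. There are 4,096 = 2^12 Aerospike partitions and up to 2^15 = 32,768 Spark partitions per namespace, so each Aerospike partition can be covered by up to 2^15 / 2^12 = 8 Spark partitions. The sketch below assumes the simplest scheme. With fewer Spark partitions, each one scans a contiguous block of Aerospike partitions. With more, each Aerospike partition is split into equal sub-ranges. `plan_scan` and the work-item tuple are hypothetical names, and the real Spark connector's assignment logic may differ.

```python
from __future__ import annotations

AEROSPIKE_PARTITIONS = 4096     # 2**12 per namespace (from the slide)
MAX_SPARK_PARTITIONS = 2 ** 15  # 32,768 (from the slide)


def plan_scan(num_spark_partitions: int) -> list[list[tuple[int, int, int]]]:
    """Return, for each Spark partition, the work items it scans.

    A work item is (aerospike_partition_id, sub_range_index, sub_ranges_per_partition).
    Hypothetical scheme: <= 4096 Spark partitions -> contiguous blocks of Aerospike
    partitions; > 4096 -> each Aerospike partition split into equal sub-ranges.
    """
    if not 1 <= num_spark_partitions <= MAX_SPARK_PARTITIONS:
        raise ValueError("num_spark_partitions must be in [1, 32768]")
    plan: list[list[tuple[int, int, int]]] = [[] for _ in range(num_spark_partitions)]
    if num_spark_partitions <= AEROSPIKE_PARTITIONS:
        for pid in range(AEROSPIKE_PARTITIONS):
            # Spread partition ids as evenly as possible (block sizes differ by at most 1).
            plan[pid * num_spark_partitions // AEROSPIKE_PARTITIONS].append((pid, 0, 1))
    else:
        if num_spark_partitions % AEROSPIKE_PARTITIONS:
            raise ValueError("above 4096, use a multiple of 4096 for equal splits")
        splits = num_spark_partitions // AEROSPIKE_PARTITIONS  # at most 8
        for pid in range(AEROSPIKE_PARTITIONS):
            for sub in range(splits):
                plan[pid * splits + sub].append((pid, sub, splits))
    return plan


if __name__ == "__main__":
    for n in (1024, 4096, 32768):
        sizes = [len(items) for items in plan_scan(n)]
        print(n, "Spark partitions; work items each:", min(sizes), "to", max(sizes))
```

At the 32,768 maximum, every Spark task scans exactly one eighth of one Aerospike partition. That is the "massive parallelization" the slide refers to.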
10. 10
Real-Time Processing of Trading Data w/ Aerospike
[Diagram: real-time stock ticker data processed with two Aerospike Database instances]
Note: Conceptual view
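The following toy model shows the per-event update pattern behind processing stock ticker data in real time. The sketch keeps the latest tick per symbol, the kind of per-key record a system of record would hold. It also keeps a rolling volume-weighted average price (VWAP) over each symbol's last N trades. The `Tick` shape, the window size, and the VWAP metric are our own illustrative assumptions. Nothing here uses Aerospike's API.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Tick:
    symbol: str
    price: float
    volume: int


class TickerState:
    """Hypothetical per-symbol state, updated once per incoming tick."""

    def __init__(self, window: int = 3) -> None:
        if window < 1:
            raise ValueError("window must be >= 1")
        self.window = window
        self.latest: dict[str, Tick] = {}  # latest tick per symbol
        self._trades: dict[str, deque[Tick]] = {}
        # Running sums make each update O(1) instead of rescanning the window
        # (over very long streams they can accumulate floating-point error).
        self._pv: dict[str, float] = {}
        self._vol: dict[str, int] = {}

    def on_tick(self, tick: Tick) -> float:
        """Apply one tick and return the symbol's rolling VWAP."""
        if tick.volume <= 0:
            raise ValueError("volume must be positive")
        s = tick.symbol
        self.latest[s] = tick
        trades = self._trades.setdefault(s, deque())
        trades.append(tick)
        self._pv[s] = self._pv.get(s, 0.0) + tick.price * tick.volume
        self._vol[s] = self._vol.get(s, 0) + tick.volume
        if len(trades) > self.window:
            oldest = trades.popleft()  # evict the trade that left the window
            self._pv[s] -= oldest.price * oldest.volume
            self._vol[s] -= oldest.volume
        return self._pv[s] / self._vol[s]


if __name__ == "__main__":
    state = TickerState(window=2)
    for t in [Tick("ACME", 10.0, 100), Tick("ACME", 12.0, 100), Tick("ACME", 14.0, 200)]:
        print(t.symbol, round(state.on_tick(t), 4))
```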
11. 11
HPE COVID-19 Response w/ Aerospike
Solution: Schema-less Data Mining Architecture for Rapid COVID-19 Knowledge Discovery: "How quickly is COVID-19 spreading?", "How likely is its recurrence after recovery?", etc.
Requirements for Aerospike:
1. Sub-millisecond read/write latency
2. Millions of IOPS
3. Low TCO
4. Strong consistency
Results: Aerospike was successfully used for high-velocity ingest to enable real-time analytics downstream.
Source: Theresa Melvin, Chief Architect of AI Driven Big Data Solutions, HPE
[Diagram components: Aerospike Python Client; Aerospike Flink Connector*]
*Not a GA'd product
12. 12
Why Aerospike Connect for Apache Kafka?
▪ Lower TCO
– Combines cost efficiencies of Kafka and Aerospike
▪ Reduce time to insight
– Combines the speed and parallelism of Kafka and Aerospike
▪ Deploy Anywhere