Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply

Rachel Pedreschi, Field Engineering Director, Imply Data
Josh Treichel, Partner Solutions Architect, Confluent

Josh Treichel
Partner Solutions Architect, Confluent
As a software engineer, Josh has spent over 10 years building, integrating and supporting complex systems. He previously worked on Confluent’s Customer Operations Team supporting some of the largest Kafka and Confluent deployments in the world.
Rachel Pedreschi
Field Engineering Director, Imply Data
A "big data geek-ette," Rachel is no stranger to the world of big data, fast data and everything in between. She is a Vertica-, Informix-, and Redbrick-certified database administrator on top of her work with Apache Cassandra™, Apache® Ignite™ and Apache Druid (incubating). She has more than 20 years of high-performance database experience. Rachel has an MBA from San Francisco State University.

Session Overview
● This session will be one hour
● The last 10-15 minutes will consist of Q&A
● Submit questions by entering them into the GoToWebinar panel
● The slides and recording will be available

https://tinyurl.com/confluentimply

● Founded by the creators of Apache Kafka
● Technology developed while at LinkedIn
● Largest contributor and tester of Apache Kafka
● Founded in 2014
● Raised $84M from Benchmark, Index, Sequoia
● 350+ Employees
● Transacting in 20 countries
● Hundreds of enterprise subscription customers
● Commercial entities in US, UK, Germany, Australia

Business Digitization Trends are Revolutionizing your Data Flow
● Trends: Mobile, Cloud, Microservices, Internet of Things, Machine Learning
● Massive volumes of new data generated every day
● Distributed across apps, devices, datacenters, clouds
● Structured, unstructured

Legacy Data Infrastructure Solutions Have Architectural Flaws
[Diagram: Data Flow between Apps, DBs, Transactional Databases, Analytics Databases and a DWH via MOM, ETL and ESB]
These solutions can be:
● Batch-oriented, instead of event-oriented in real time
● Complex to scale at high throughput
● Connected point-to-point, instead of publish / subscribe
● Lacking data persistence and retention
● Incapable of in-flight message processing

Modern Architectures are Adapting to New Data Requirements
But how do we revolutionize data flow in a world of exploding, distributed and ever-changing data?
[Diagram: Data Flow between Apps, DBs, Transactional Databases, Analytics Databases and a DWH via MOM, ETL and ESB, with NoSQL DBs and Big Data Analytics added]

The Solution is a Streaming Platform for Real-Time Data Processing
A Streaming Platform provides a single source of truth about your data to everyone in your organization
[Diagram: a Streaming Platform at the center of the Data Flow, connecting Apps, DBs, Transactional Databases, Analytics Databases, the DWH, NoSQL DBs and Big Data Analytics]

Kafka: Next Generation Messaging

Over 35% of the Fortune 500 Already Trust Kafka for Mission-Critical Apps
● 6 of top 10 Travel
● 7 of top 10 Global banks
● 8 of top 10 Insurance
● 9 of top 10 Telecom

Confluent Platform
The streaming platform built by the creators of Apache Kafka
● Pub-sub messaging in real-time at scale
● Connectivity for all producers and consumers
● Data persistence with infinite retention
● Stream processing without coding
● Distributed architecture for global deployment

Confluent Delivers a Mission-Critical Streaming Platform
Confluent Platform
● Apache Kafka®: Core | Connect API | Streams API
● Development & Connectivity: Clients | Connectors | REST Proxy | KSQL
● Data Compatibility: Schema Registry
● Management & Monitoring: Control Center | Security
● Enterprise Operations: Replicator | Auto Data Balancer | Connectors | MQTT Proxy | Operator
Event sources: Database Changes, Log Events, IoT Data, Web Events, other events
DATA INTEGRATION: Hadoop, Database, Data Warehouse, CRM, other
REAL-TIME APPLICATIONS: Transformations, Custom Apps, Analytics, Monitoring, other
Deployment: Datacenter and Public Cloud (CUSTOMER SELF-MANAGED), Confluent Cloud (CONFLUENT FULLY-MANAGED)
[Diagram legend: OPEN SOURCE FEATURES, COMMERCIAL FEATURES]

KSQL: Enable Stream Processing using SQL-like Semantics
Example Use Cases
• Streaming ETL
• Anomaly detection
• Event monitoring
Leverage Kafka Streams API without any coding required
[Diagram: KSQL server (Engine that runs queries, REST API) between the Kafka Cluster and its Clients (CLI, Confluent Control Center GUI)]
Use any programming language
Connect via Control Center UI, CLI, REST or headless
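As a rough illustration of the "REST" connection option, the sketch below builds a request for the KSQL server's `/ksql` endpoint using only the Python standard library. The server address (`http://localhost:8088`), the helper name `build_ksql_request` and the `LIST STREAMS;` statement are assumptions chosen for the example, not something shown in the slides. The request is only constructed, not sent, so the snippet runs without a live server.

```python
import json
import urllib.request

# Assumption: a KSQL server reachable on localhost at port 8088.
KSQL_SERVER_URL = "http://localhost:8088"


def build_ksql_request(statement: str,
                       server_url: str = KSQL_SERVER_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the KSQL server's /ksql endpoint.

    build_ksql_request is a hypothetical helper written for this sketch.
    """
    payload = {"ksql": statement, "streamsProperties": {}}
    return urllib.request.Request(
        url=f"{server_url}/ksql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


request = build_ksql_request("LIST STREAMS;")
print(request.get_method(), request.full_url)

# To run the statement against a live server (not executed here):
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response))
```

Because the interface is plain HTTP and JSON, the same request could be issued from any language with an HTTP client, which is the point of the "Use any programming language" line on the slide.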

KSQL: the Simplest Way to Do Stream Processing
Streaming ETL
CREATE STREAM enriched_clickstream AS
  SELECT userid, status, request, ip, users.city
  FROM clickstream c
  LEFT JOIN web_users users ON c.userid = users.user_id;
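To make the semantics of the query above concrete, here is a minimal in-memory Python sketch of a stream-table LEFT JOIN: each click event is looked up against a table of users and emitted with the user's city, or with no city when the user is unknown. The sample records and the `enrich` helper are hypothetical. In KSQL the join runs continuously over Kafka topics and the table is kept up to date, which this static example does not attempt to model.

```python
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical sample data; the field names follow the KSQL query above.
web_users: Dict[int, Dict[str, str]] = {
    1: {"city": "London"},
    2: {"city": "San Francisco"},
}

clickstream = [
    {"userid": 1, "status": 200, "request": "GET /index.html", "ip": "10.0.0.1"},
    {"userid": 3, "status": 404, "request": "GET /missing", "ip": "10.0.0.2"},
]


def enrich(events: Iterable[dict], users: Dict[int, Dict[str, str]]) -> Iterator[dict]:
    """Left-join each event to the users table on userid = user_id."""
    for event in events:
        user: Optional[Dict[str, str]] = users.get(event["userid"])
        yield {
            "userid": event["userid"],
            "status": event["status"],
            "request": event["request"],
            "ip": event["ip"],
            # LEFT JOIN: keep the event even when no matching user exists.
            "city": user["city"] if user is not None else None,
        }


enriched_clickstream = list(enrich(clickstream, web_users))
for row in enriched_clickstream:
    print(row)
```

The second event has no matching user, so it still appears in the output with `city` set to `None`, mirroring how a LEFT JOIN keeps unmatched rows from the left side.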

● Founded by the creators of Apache Druid (incubating) and D3
● Technology developed while at Metamarkets
● Largest contributor and tester of Apache Druid
● Founded in 2015
● Raised $13M from Andreessen Horowitz, Khosla
● 1000s of open source implementations
● Hundreds of enterprise subscription customers
● End-to-end streaming analytics platform built on Apache Druid

Old World - Data Warehouses and Data Marts
Images copied from: https://panoply.io/data-warehouse-guide/data-warehouse-architecture-traditional-vs-cloud/

Less Old World - Data Lakes

New World - Data Rivers!

Imply is the only end-to-end solution for streaming analytics built on Apache Druid

Putting KSQL to work with Imply: Monitoring for real-time business alerts
● User click-stream data flows through Kafka and is enriched with KSQL
● Data is easily visualized and explored in Imply
[Diagram labels: Your website; Click-stream; with KSQL; Exploratory Analytics; Dashboards; Machine Learning / AI; Real-time insights]

Resources and Next Steps
https://confluent.io
http://cnfl.io/ksql
http://cnfl.io/slack
#ksql
@confluentinc
