Presenters: Rachel Pedreschi, Senior Director, Solutions Engineering, Imply.io + Josh Treichel, Partner Solutions Architect, Confluent
Analytic pipelines running purely on batch processing systems can suffer from hours of data lag, resulting in accuracy issues with analysis and overall decision-making. Join us for a demo to learn how easy it is to integrate your Apache Kafka® streams in Apache Druid (incubating) to provide real-time insights into the data.
In this online talk, you’ll hear about ingesting your Kafka streams into Imply’s scalable analytic engine and gaining real-time insights via a modern user interface.
Register now to learn about:
- The benefits of combining a real-time streaming platform with a comprehensive analytics stack
- Building an analytics pipeline by integrating Confluent Platform and Imply
- How KSQL, streaming SQL for Kafka, can easily transform and filter streams of data in real time
- Querying and visualizing streaming data in Imply
- Practical ways to implement Confluent Platform and Imply to address common use cases such as analyzing network flows, collecting and monitoring IoT data and visualizing clickstream data
Confluent Platform, developed by the creators of Kafka, enables the ingestion and processing of massive amounts of real-time event data. Imply, the complete analytics stack built on Druid, can ingest, store, query and visualize streaming data from Confluent Platform, enabling end-to-end real-time analytics. Together, Confluent and Imply can provide low-latency data delivery, transformation, and querying capabilities to power a range of use cases.
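The ingestion step described above is typically configured in Druid through a Kafka supervisor spec. The following is a minimal sketch of such a spec built as a Python dict, not the configuration used in the talk. The topic name, datasource name, column names and broker address are hypothetical, and the field layout follows recent Druid documentation; older releases, including those current when this talk was given, used a different (parser-based) layout, so check it against your version.

```python
import json


def kafka_supervisor_spec(topic, datasource, bootstrap_servers, dimensions,
                          timestamp_column="time"):
    """Build a Druid Kafka supervisor spec as a plain dict (illustrative only)."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Column that carries the event time in each Kafka message.
                "timestampSpec": {"column": timestamp_column, "format": "auto"},
                "dimensionsSpec": {"dimensions": list(dimensions)},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "NONE",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "kafka",
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


# Hypothetical names: a topic of enriched click events and a local broker.
spec = kafka_supervisor_spec(
    topic="enriched_clickstream",
    datasource="enriched_clickstream",
    bootstrap_servers="localhost:9092",
    dimensions=["userid", "status", "request", "ip", "city"],
)
payload = json.dumps(spec, indent=2)
```

The resulting JSON would be submitted to the supervisor endpoint of the Druid Overlord (POST /druid/indexer/v1/supervisor); the sketch only builds the payload and does not send it.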
Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
1. Achieve Sub-Second Analytics on Apache Kafka with Confluent and Imply
Rachel Pedreschi, Field Engineering Director, Imply Data
Josh Treichel, Partner Solutions Architect, Confluent
2. Josh Treichel
Partner Solutions Architect, Confluent
As a software engineer, Josh has spent over 10 years building, integrating and supporting complex systems. He previously worked on Confluent’s Customer Operations Team supporting some of the largest Kafka and Confluent deployments in the world.
Rachel Pedreschi
Field Engineering Director, Imply Data
A "big data geek-ette," Rachel is no stranger to the world of big data, fast data and everything in between. She is a Vertica-, Informix-, and Redbrick-certified database administrator on top of her work with Apache Cassandra™, Apache® Ignite™ and Apache Druid (incubating). She has more than 20 years of high-performance database experience. Rachel has an MBA from San Francisco State University.
3. Session Overview
● This session will be one hour
● The last 10-15 minutes will consist of Q&A
● Submit questions by entering them into the GoToWebinar panel
● The slides and recording will be available
5. Founded by the creators of Apache Kafka
Technology Developed while at LinkedIn
Largest Contributor and tester of Apache Kafka
● Founded in 2014
● Raised $84M from Benchmark, Index, Sequoia
● 350+ Employees
● Transacting in 20 countries
● Hundreds of enterprise subscription customers
● Commercial entities in US, UK, Germany, Australia
6. Business Digitization Trends are Revolutionizing your Data Flow
Mobile, Cloud, Microservices, Internet of Things, Machine Learning
● Massive volumes of new data generated every day
● Distributed across apps, devices, datacenters, clouds
● Structured, unstructured
7. Legacy Data Infrastructure Solutions Have Architectural Flaws
[Diagram labels: App, DB, DWH, Transactional Databases, Analytics Databases, Data Flow, MOM, ETL, ESB]
These solutions can be:
● Batch-oriented, instead of event-oriented in real time
● Complex to scale at high throughput
● Connected point-to-point, instead of publish / subscribe
● Lacking data persistence and retention
● Incapable of in-flight message processing
8. Modern Architectures are Adapting to New Data Requirements
[Diagram labels: NoSQL DBs, Big Data Analytics, App, DB, DWH, Transactional Databases, Analytics Databases, Data Flow, MOM, ETL, ESB]
But how do we revolutionize data flow in a world of exploding, distributed and ever-changing data?
9. The Solution is a Streaming Platform for Real-Time Data Processing
A Streaming Platform provides a single source of truth about your data to everyone in your organization
[Diagram labels: NoSQL DBs, Big Data Analytics, App, DB, DWH, Transactional Databases, Analytics Databases, Data Flow, Streaming Platform]
11. Over 35% of Fortune 500’s Already Trust Kafka for Mission-Critical Apps
● 6 of top 10 Travel
● 7 of top 10 Global banks
● 8 of top 10 Insurance
● 9 of top 10 Telecom
12. Confluent Platform
The streaming platform built by the creators of Apache Kafka
● Pub-sub messaging in real-time at scale
● Connectivity for all producers and consumers
● Data persistence with infinite retention
● Stream processing without coding
● Distributed architecture for global deployment
13. Confluent Delivers a Mission-Critical Streaming Platform
Database Changes | Log Events | IoT Data | Web Events | other events
DATA INTEGRATION: Hadoop | Database | Data Warehouse | CRM | other
REAL-TIME APPLICATIONS: Transformations | Custom Apps | Analytics | Monitoring | other
Confluent Platform
Apache Kafka®: Core | Connect API | Streams API
Development & Connectivity: Clients | Connectors | REST Proxy | KSQL
Data Compatibility: Schema Registry
Management & Monitoring: Control Center | Security
Enterprise Operations: Replicator | Auto Data Balancer | Connectors | MQTT Proxy | Operator
OPEN SOURCE FEATURES | COMMERCIAL FEATURES
Datacenter | Public Cloud | Confluent Cloud
CUSTOMER SELF-MANAGED | CONFLUENT FULLY-MANAGED
14. KSQL: Enable Stream Processing using SQL-like Semantics
Leverage Kafka Streams API without any coding required
Example Use Cases
• Streaming ETL
• Anomaly detection
• Event monitoring
[Diagram labels: KSQL server with Engine (runs queries) and REST API; CLI; Clients; Confluent Control Center GUI; Kafka Cluster]
Use any programming language
Connect via Control Center UI, CLI, REST or headless
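As an illustration of the REST option listed above, here is a minimal sketch of how a client in any language might prepare a statement for the KSQL server's /ksql endpoint. The server address is an assumption (a local server on port 8088), and the request is only built, not sent.

```python
import json
import urllib.request

# Assumed address of a locally running KSQL server; adjust for your deployment.
KSQL_URL = "http://localhost:8088/ksql"


def build_ksql_request(statement, url=KSQL_URL):
    """Wrap a KSQL statement in a POST request for the server's REST API."""
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


request = build_ksql_request("LIST STREAMS;")
# To send it against a running server: urllib.request.urlopen(request)
```

Because the interface is plain HTTP and JSON, the same request can be issued from any language with an HTTP client, which is the point of the "use any programming language" line on this slide.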
15. KSQL: the Simplest Way to Do Stream Processing
Streaming ETL
CREATE STREAM enriched_clickstream AS
  SELECT userid, status, request, ip, users.city
  FROM clickstream c
  LEFT JOIN web_users users ON c.userid = users.user_id;
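To make the semantics of the join on this slide concrete, the following is a small plain-Python sketch of what a stream-to-table LEFT JOIN does to each click event. It is an illustration only, not how KSQL is implemented: it assumes web_users acts as a lookup table keyed by user_id, loads that table up front rather than updating it continuously, and uses made-up sample rows.

```python
def enrich_clickstream(clicks, web_users):
    """Emulate the LEFT JOIN: every click is kept, city is None when no user matches."""
    users_by_id = {}
    for user in web_users:
        # Later rows overwrite earlier ones, like the latest value per key in a table.
        users_by_id[user["user_id"]] = user

    for click in clicks:
        user = users_by_id.get(click["userid"])
        yield {
            "userid": click["userid"],
            "status": click["status"],
            "request": click["request"],
            "ip": click["ip"],
            "city": user["city"] if user else None,
        }


# Hypothetical sample data for illustration.
clicks = [
    {"userid": 1, "status": 200, "request": "GET /index.html", "ip": "10.0.0.1"},
    {"userid": 2, "status": 404, "request": "GET /missing", "ip": "10.0.0.2"},
]
web_users = [{"user_id": 1, "city": "London"}]

enriched = list(enrich_clickstream(clicks, web_users))
```

The second click has no matching user, so it still appears in the output with an empty city, which is the behavior that distinguishes a LEFT JOIN from an inner join.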
16. Founded by the creators of Apache Druid (incubating) and D3
Technology Developed while at Metamarkets
Largest Contributor and tester of Apache Druid
● Founded in 2015
● Raised $13M from Andreessen Horowitz, Khosla
● 1000s of open source implementations
● Hundreds of enterprise subscription customers
● End to end streaming analytics platform built on Apache Druid
17. Old World - Data Warehouses and Data Marts
images copied from: https://panoply.io/data-warehouse-guide/data-warehouse-architecture-traditional-vs-cloud/
18. Less Old World - Data Lakes
images copied from: https://panoply.io/data-warehouse-guide/data-warehouse-architecture-traditional-vs-cloud/
19. New World - Data Rivers!
images copied from: https://panoply.io/data-warehouse-guide/data-warehouse-architecture-traditional-vs-cloud/
26. Putting KSQL to work with Imply: Monitoring for real-time business alerts
User click-stream data flows through Kafka and is enriched with KSQL
Data is easily visualized and explored in Imply
[Diagram labels: Your website; Click-stream; With KSQL; Exploratory Analytics; Dashboards; Machine Learning / AI; Real-time insights]
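Once the enriched click-stream is in Druid, a monitoring check like the one this slide describes could be expressed as a Druid SQL query sent over HTTP. The sketch below is an assumption-laden illustration: it presumes a datasource named enriched_clickstream with a numeric status column and a city column, and a Druid router reachable at localhost:8888. It builds the request without sending it.

```python
import json
import urllib.request

# Assumed address of a Druid router; the SQL endpoint path is /druid/v2/sql.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"

# Hypothetical alert query: error responses per city over the last five minutes.
ERRORS_BY_CITY = """
SELECT city, COUNT(*) AS error_count
FROM enriched_clickstream
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '5' MINUTE
  AND "status" >= 400
GROUP BY city
ORDER BY error_count DESC
LIMIT 10
"""


def build_druid_sql_request(sql, url=DRUID_SQL_URL):
    """Wrap a SQL string in the JSON body expected by Druid's SQL endpoint."""
    body = json.dumps({"query": sql.strip()}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_druid_sql_request(ERRORS_BY_CITY)
# To run it against a live cluster: urllib.request.urlopen(request)
```

The same query could back a dashboard tile or a scheduled alert; the threshold, time window and grouping column are placeholders to adapt to the data actually ingested.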