This webinar discusses five early challenges of building streaming fast data applications: 1) choosing among alternative streaming frameworks like Kafka Streams, Spark Streaming, and Flink; 2) integrating microservices with streaming services; 3) understanding operational challenges of streaming services; 4) gaining competitive advantage through machine learning on fast data; and 5) optimizing resource utilization across large clusters running many components. The webinar promotes Lightbend's Fast Data Platform as providing an easy on-ramp and complete solution for these challenges.
Powering Real-Time Decisions with Continuous Data Streams
Five Early Challenges Of Building Streaming Fast Data Applications
1. WEBINAR: Five Early Challenges Of Building Streaming Fast Data Applications
Craig Blitz, Senior Product Director at Lightbend
2. Why does fast matter?
Recommendation Engines, Automation, Competitive Advantage
3. Fast Data: Opportunity Meets Necessity
[Figure: timeline with axis marks at 2005, 2008, 2011, 2014, and 2017, showing: early use of MapReduce and Hadoop; Apache Hadoop 1.0; Spark 0.5; Spark 1.0; Spark 2.0; Structured Streaming; MLlib; Akka Streams; Flink 1.0; Kafka Streams; Apache Beam 0.6; Apache Beam 2.0 (!!). Overlaid chart: Growth in Mobile Data Traffic 2009-2020. Source: Carrier & Public Wi-Fi, July 2015, Mobile Experts LLC.]
4. Growth in Streaming Traffic Coincides with Microservices and Cloud Native Apps
[Chart: Microservices interest over time (2004-2017)]
5. What is an integrated Fast Data Platform?
• A solution that ties together fast data components, microservices, cluster management, application/service lifecycle management, and support.
[Diagram: Messaging, Microservices, Streaming Services, Persistence, Management, Monitoring]
6. Lots of Innovation, but Maturity Lags
• Innovation within components
• Solution comprises many components
• Components supported by different companies
• Aspects of SDLC remain tricky
7. Survey Says….
A currently open survey by Lightbend looks at Fast Data and related topics.
Preliminary results of 1200 initial respondents:
• 86% said they are dealing with more data compared to the past.
• More than half are scrambling to process data more quickly.
• The majority today process data once daily or intraday.
• The majority are in production or pilot for production with microservices.
• What's tough about Fast Data: technology choice, implementation and scale.
9. Hadoop
• Excels at low-cost, scalable batch analytics
• Data Warehouse Replacement
• Less suitable for real-time (streaming)
10. Streaming Engines – So many to choose from!
Kafka Streams
• Kafka Library
• Consume, Produce Kafka Topics
• Pull model instead of Async + Backpressure
• Useful for stateful stream processing
Akka Streams
• Low-latency Complex Event Processing
• Integration with data sources/sinks
• Iterative, pipelined processing
• Integration with microservices
Spark Streaming
• Mini-batch
• Machine Learning
• Longer running jobs like Training Models
• Supports batch and near-real time
• Run SQL jobs
Apache Flink
• High-Volume, Low Latency
• True streaming
• Iterative, pipelined processing
• Excellent Apache Beam Support
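The difference between "mini-batch" and "true streaming" on this slide can be illustrated with a small sketch. It uses plain Python and no engine API; the function names `per_record` and `mini_batch` are hypothetical, and batching by count is a simplification (real mini-batch engines typically cut batches by time interval).

```python
from typing import Iterable, Iterator, List


def per_record(events: Iterable[int]) -> Iterator[int]:
    """Record-at-a-time: emit an updated running total after every event."""
    total = 0
    for value in events:
        total += value
        yield total


def mini_batch(events: Iterable[int], batch_size: int) -> Iterator[int]:
    """Mini-batch: buffer events and emit one updated total per batch."""
    total = 0
    batch: List[int] = []
    for value in events:
        batch.append(value)
        if len(batch) == batch_size:
            total += sum(batch)
            batch.clear()
            yield total
    if batch:  # flush the final, partially filled batch
        total += sum(batch)
        yield total


events = [3, 1, 4, 1, 5]
print(list(per_record(events)))     # [3, 4, 8, 9, 14]
print(list(mini_batch(events, 2)))  # [4, 9, 14]
```

Both variants reach the same final total; the mini-batch version simply produces results less often, which is the latency trade-off the slide alludes to.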
11. So That’s Perfectly Clear
• Choices are not always obvious
• Trade-offs among speed, memory, choice of libraries, …
• Application may require multiple engines
14. Streaming Services are Part of your Application
• A streaming service should appear as just another service in your architecture
• Must be reactive: elastic, resilient, responsive, and message-driven
• Unlike Hadoop systems, which can serve results to a service when ready
• We shouldn’t care how a service is implemented
[Diagram: Service A, Service B, Service C]
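As a rough illustration of "message-driven" services, the sketch below connects two hypothetical services through a bounded queue, using only the Python standard library. The names `producer`, `consumer`, `run_pipeline`, and the `SENTINEL` end-of-stream convention are assumptions for illustration, not part of any streaming engine. The bounded queue gives a simple form of backpressure: a fast producer blocks when the consumer falls behind.

```python
import queue
import threading
from typing import List

SENTINEL = None  # illustrative end-of-stream marker


def producer(out: "queue.Queue", n: int) -> None:
    """Upstream service: sends n messages, then signals end of stream."""
    for i in range(n):
        out.put(i)  # blocks while the queue is full (backpressure)
    out.put(SENTINEL)


def consumer(inp: "queue.Queue", results: List[int]) -> None:
    """Downstream service: doubles each message until end of stream."""
    while True:
        msg = inp.get()
        if msg is SENTINEL:
            break
        results.append(msg * 2)


def run_pipeline(n: int, capacity: int) -> List[int]:
    channel: "queue.Queue" = queue.Queue(maxsize=capacity)
    results: List[int] = []
    workers = [
        threading.Thread(target=producer, args=(channel, n)),
        threading.Thread(target=consumer, args=(channel, results)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results


print(run_pipeline(5, capacity=2))  # [0, 2, 4, 6, 8]
```

Neither side knows how the other is implemented; they share only the message channel, which is the property the slide asks of a streaming service.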
17. Stream Engines Do Not Always Meet Microservices Goals
• How do they manage state?
• How do you scale them?
• How do you version/upgrade them?
In most cases, the operator needs to know too much about the underlying component or service
19. What Do We Mean By Machine Learning?
• Branch of artificial intelligence
• Recognize patterns in data
• Build models to predict outcomes
• Recommend actions based on predicted outcomes vs stated goals
21. How Can Businesses Identify Machine Learning Opportunities?
• Example Use Cases
• Fraud and Anomaly Detection
• Recommendation Engines / Marketing Personalization
• Financial Trading
• Smart Cars
• Natural Language Processing
• Automation
Stop! Ask yourself: Where do I have hard-coded models or rules?
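One way to picture replacing a hard-coded rule with a learned one is the sketch below: rather than a fixed cutoff such as "flag amounts over 100", it learns the mean and spread of a stream as values arrive and flags values far from what it has seen. This is a minimal illustration only; the class name `OnlineAnomalyDetector`, the z-score threshold of 3, and the warm-up length are all assumptions, and the running statistics use Welford's update.

```python
import math


class OnlineAnomalyDetector:
    """Flags values far from the running mean of a stream (illustrative)."""

    def __init__(self, z_threshold: float = 3.0, warmup: int = 5) -> None:
        self.z_threshold = z_threshold
        self.warmup = warmup  # observations required before flagging
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def observe(self, value: float) -> bool:
        """Return True if value looks anomalous, then learn from it."""
        anomalous = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) / std > self.z_threshold:
                anomalous = True
        # Welford's incremental update of mean and squared deviations
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return anomalous


detector = OnlineAnomalyDetector()
amounts = [10, 12, 11, 9, 10, 11, 500, 10]
flags = [detector.observe(a) for a in amounts]
print([i for i, flagged in enumerate(flags) if flagged])  # [6]
```

Note that this simple version also learns from the outlier itself, which inflates the spread afterwards; a production detector would likely handle flagged values differently.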
23. First Generation Resource Optimization
• Clusters Can Get Quite Large with Many Moving Components
• Interaction Between Components Quite Complex
• First Generation Auto-Scalers Naïve
“Scale when CPU reaches 80%”
“Scale when Queue Length > 10”
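The two quoted rules can be written down directly, which also shows how little they know about the application. The sketch below is hypothetical: `Metrics` and `naive_scale_decision` are illustrative names, and adding one replica per trigger is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    cpu_percent: float  # current CPU utilization of the component
    queue_length: int   # number of messages waiting


def naive_scale_decision(m: Metrics, replicas: int) -> int:
    """First-generation rule: add one replica when a fixed threshold trips."""
    if m.cpu_percent >= 80.0 or m.queue_length > 10:
        return replicas + 1
    return replicas


print(naive_scale_decision(Metrics(cpu_percent=85.0, queue_length=3), 2))  # 3
print(naive_scale_decision(Metrics(cpu_percent=40.0, queue_length=3), 2))  # 2
```

The rule looks at one component's local metrics only; it has no notion of which component is the actual bottleneck or whether users are affected.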
24. But Hard To Tie These Rules Back to Business Objectives
• Clusters Can Get Quite Large with Many Moving Components
• Interaction Between Components Quite Complex
• Bottlenecks shift over time as application and infrastructure changes
25. What You Really Want
“Scale what you need to scale to continue to meet service-level objectives”
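A very rough sketch of what scaling against service-level objectives might look like: each service declares a latency objective, and only services that miss it are scaled. Everything here is an assumption for illustration, including the names `ServiceStatus` and `plan_scaling`, the use of p99 latency as the objective, and the crude model that latency falls in proportion to added replicas. Real systems would need a learned or measured model of how each component responds to scaling.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ServiceStatus:
    name: str
    slo_p99_ms: float       # objective: p99 latency must stay at or below this
    observed_p99_ms: float  # currently measured p99 latency
    replicas: int


def plan_scaling(services: List[ServiceStatus]) -> Dict[str, int]:
    """Return proposed replica counts for services that miss their SLO."""
    plan: Dict[str, int] = {}
    for s in services:
        if s.observed_p99_ms > s.slo_p99_ms:
            # Crude assumption: latency scales inversely with replica count.
            ratio = s.observed_p99_ms / s.slo_p99_ms
            plan[s.name] = math.ceil(s.replicas * ratio)
    return plan


services = [
    ServiceStatus("ingest", slo_p99_ms=100.0, observed_p99_ms=250.0, replicas=2),
    ServiceStatus("scoring", slo_p99_ms=200.0, observed_p99_ms=150.0, replicas=4),
]
print(plan_scaling(services))  # {'ingest': 5}
```

Unlike a fixed CPU or queue threshold, the trigger here is stated in terms of the outcome the business cares about, and services that already meet their objective are left alone.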
26. Good News
• On-Line Machine Learning Can Help
• Specify Service Level Objectives per service or application
• But Challenges Remain….
• Hard to Build
• Need knowledge of how to scale components
• “Operator Model”
28. Lightbend Fast Data Platform
• Easy on-ramp for getting started
• Curated choice of components
• Complete monitoring and intelligent management
• Support across entire platform