As companies continue to invest in big data, their focus is shifting from predictive analytics for reporting and business dashboards to machine learning & AI for real-time intelligent decision-making embedded in software. Many organizations are testing, exploring and piloting applications that automatically promote trending products, adjust prices or respond to alerts raised by intelligent real-time systems.
In her talk, Ms. Victoria Livschitz, founder and CTO of Grid Dynamics, will discuss common business drivers of real-time analytics applications and the emerging platforms for building such applications.
Open Blueprint for Real-Time Analytics with In-Stream Processing
1. Open Blueprint for Real-Time Analytics with In-Stream Processing
Victoria Livschitz, Founder & CTO, Grid Dynamics
09/28/2017
2. About the speaker: Victoria Livschitz
• CTO @Grid Dynamics: present
• Founder and CEO @Grid Dynamics: 2006 – 2013
• Principal engineer @Sun: 1997 – 2006
About Grid Dynamics:
• Engineering IT services company focused on digital transformation through cloud, big data & open source for Fortune 500 clients.
• Pioneer in real-time processing from the company’s inception in 2006.
• Architected 3 of the top-10 busiest e-commerce sites. Never had a production outage in peak season.
• Frequent contributor to open source projects: Hadoop, Solr, Lucene, Storm, others.
3. Agenda
• What is “real-time” in analytics, and why it matters
• In-Stream Processing: emerging platform for real-time processing
• Open ISP blueprint: reference architecture, reference implementation
• Take ISP for a spin: reference demo of real-time Twitter sentiment analysis
5. What is “real-time” in analytics, machine learning, data sciences & AI?
[Diagram: event cycle. Analysis: Event → Receive event → Analyze event → Act on event → Response. Learning: Augment model.]
• How long is the cycle?
• What is done online vs. offline?
6. What is “real-time” in analytics, machine learning, data sciences & AI?
[Diagram: event cycle. Analysis: Event → Receive event → Analyze event → Act on event → Response. Learning: Augment model. Cycle-time scale: Weeks, Days, Hours, Seconds.]
• How long is the cycle?
• What is done online vs. offline?
8. Value of “real-time”
1. Offline learning/analytics, online response
[Diagram: Event → Response cycle with steps Receive event, Analyze event, Act on event and Augment model; timing labels “A few seconds” and “A day”.]
2. Offline learning, real-time analytics, online response
[Diagram: Event → Response cycle with steps Receive event, Analyze event, Act on event, Modify reaction and Augment model; timing labels “A few seconds” and “A day”.]
9. Value of “real-time”
1. Offline learning/analytics, online response
2. Offline learning, real-time analytics, online response
3. Real-time learning/analytics, online response
[Diagram for 3: Event → Response cycle with steps Receive event, Analyze event, Act on event and Augment model; single timing label “A few seconds”. The diagrams for 1 and 2 carry the timing labels “A few seconds” and “A day”.]
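The third mode, where learning happens inside the same few-second cycle as the response, can be sketched as follows. This is a minimal illustration and not part of the ISP blueprint: the `RunningStats` model, the `process_event` function, the threshold, and the sample order counts are all assumptions made for the example. The "model" is just a running mean and variance (Welford's method) that is augmented after every event instead of in a daily batch.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical 'model': running mean/variance updated per event (Welford's method)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def stddev(self) -> float:
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0


def process_event(model: RunningStats, value: float, threshold: float = 3.0) -> str:
    """Receive -> analyze -> act -> augment model, all within one event cycle."""
    std = model.stddev()
    # Analyze: compare the event with what the model has learned so far.
    is_unusual = (
        model.count >= 5 and std > 0 and abs(value - model.mean) > threshold * std
    )
    # Act: placeholder responses standing in for a real action.
    response = "alert" if is_unusual else "ok"
    # Augment: learn from this event immediately, not in an offline batch.
    model.update(value)
    return response


model = RunningStats()
orders_per_minute = [10, 12, 11, 9, 10, 11, 48, 10]  # made-up sample stream
responses = [process_event(model, v) for v in orders_per_minute]
print(responses)
```

In modes 1 and 2 the `model.update` call would instead run in a separate offline job, so the online path would only read a model that is up to a day old.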
11. Classification of retail use cases relative to “real-timeness”
Example: Personalized Offers
• Level 1: Segmented historic context: data on what happened to all such customers before. From time to time, send a coupon based on a segment.
• Level 2: Individualized historic context: 360-degree view across personal data. On a birthday, offer a coupon based on personal history.
• Level 3: Situational context: where the customer is, what she wants, or might buy, right now. Right now, offer a product based on what’s in her hands.
• Level 4: Supply chain dynamics: demand surge, product availability, competitive pricing. During a storm, deliver a trending umbrella/poncho combo.
12. Classification of retail use cases relative to “real-timeness”
Suited for offline ML:
• Level 1: Segmented historic context: data on what happened to all such customers before. Data: historic aggregated data.
• Level 2: Individualized historic context: 360-degree view across individual’s data. Data: historic individual’s data.
Requires real-time ML:
• Level 3: Situational context: where the customer is, what she wants, or might buy, right now. Data: real-time individual’s data.
• Level 4: Supply/demand dynamics: impact of demand surge, shortage, competitive actions... Data: real-time everything.
13. Top 6 drivers of real-time applications in retail
#1. Personalized search
Augment search hits and relevancy ranking based on personal context & history
#2. Personalized offers
Motivate “buy now” behavior by offering deals based on personal context & history
#3. Dynamic pricing
Determine “right price” for products based on availability, trending, personal context & competitive price
#4. Dynamic inventory
Predict inventory needs & re-stock products in stores based on fluctuations in inventory & demand
#5. Intelligent sourcing
Determine what order to source from what store to optimize delivery SLAs & shipment costs
#6. Real-time alerts
Detect unusual patterns: fraud, surge in demand, weather changes, shift in brand sentiment. Respond right away
16. …In-Stream Processing (ISP) service is an approach to building real-time extensions of Big Data applications (today’s focus)
17. ISP is ideal for:
• Real-time data ingress to replace batch ETLs
• Real-time identification of one-in-a-million “actionable insights”
• Real-time response to actionable insights
• Real-time learning from new data
23. Blueprint goals
• Scalable to 100,000+/second
• Real-time streaming; real-time ML
• Cloud-ready
• Proven for mission-critical use
• Open source (and built 100% with open source)
• Production-ready
• Portable across clouds
• Extendable
25. Designed as a complete platform
• No single points of failure
• No bottlenecks
• Built-in scaling
• Dockerized
• Deployable to any cloud
• Reference implementation for AWS (open source)
• Reference demo: real-time Twitter sentiment analytics for new movie reviews
27. How to achieve cloud portability?
• Phase 1: bootstrap management cluster
  • [manual] Choose a cloud. Get a set of 6 VMs to host the management cluster
  • [automated] Deploy & configure Mesos/Marathon cluster on available VMs
• Phase 2: use management cluster to provision ISP environments
  • [automated] Deploy all ISP components as Docker containers
  • [automated] Deploy analytics application components (like Twitter API)
  • [automated] Configure all dependencies
  • [automated] Scale on-demand
  • [automated] Shut down when done
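As an illustration of the Phase 2 step "deploy all ISP components as Docker containers", the sketch below builds Marathon app definitions as JSON. It is an assumption-laden example rather than the blueprint's actual deployment code: the helper `marathon_app`, the app ids, the image names, and the resource sizes are all hypothetical. Only the general shape of the app definition (`id`, `instances`, `cpus`, `mem`, and a `container` of type `DOCKER`) follows Marathon's documented format.

```python
import json


def marathon_app(app_id: str, image: str, instances: int = 1,
                 cpus: float = 0.5, mem_mb: int = 512) -> dict:
    """Build a Marathon app definition that runs a single Docker image."""
    return {
        "id": app_id,
        "instances": instances,
        "cpus": cpus,
        "mem": mem_mb,
        "container": {"type": "DOCKER", "docker": {"image": image}},
    }


# Hypothetical component names and images, for illustration only.
components = [
    marathon_app("/isp/ingest", "example/isp-ingest:latest", instances=3),
    marathon_app("/isp/stream-processor", "example/isp-processor:latest",
                 instances=2, cpus=1.0, mem_mb=2048),
]

# Serialize each definition; nothing is sent over the network here.
payloads = [json.dumps(app, indent=2) for app in components]
print(payloads[0])
```

Each payload could then be submitted to the management cluster through Marathon's `/v2/apps` REST endpoint. Scaling on demand would amount to changing `instances` for an existing app, and shutting down to deleting it.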
30. Real-time demo, a.k.a. “Data Science Kitchen”
• Provide reference example on how to use ISP platform…
• ... and learn the basics of data science along the way
• Gets actual Twitter data via streaming API
• Analyses & visualizes what people think about latest movies
• Exposes data science “kitchen”: models, training sets, dictionaries
• Provides nice web UI to play with data
• Uses our ISP RI (reference implementation)
• Demo is running on AWS as a public service
• Everything is open sourced
• Documentation on our Tech Blog
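To give a feel for what a dictionary-driven sentiment step might look like, here is a deliberately tiny sketch. It is not the demo's actual model: the dictionary contents, the negation rule, and the function names `score_tweet` and `classify` are assumptions for illustration, and the input is plain strings rather than the Twitter streaming API.

```python
import re

# Tiny hypothetical dictionary; a real one would hold many weighted terms.
SENTIMENT_DICTIONARY = {
    "great": 1, "love": 1, "amazing": 1,
    "boring": -1, "awful": -1, "hate": -1,
}
NEGATIONS = {"not", "never", "no"}


def score_tweet(text: str) -> int:
    """Sum dictionary weights, flipping a word's sign if a negation precedes it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        weight = SENTIMENT_DICTIONARY.get(token, 0)
        if weight and i > 0 and tokens[i - 1] in NEGATIONS:
            weight = -weight
        score += weight
    return score


def classify(text: str) -> str:
    """Map a numeric score to a coarse sentiment label."""
    score = score_tweet(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for tweet in ["I love this movie, amazing cast",
              "Not great, honestly boring",
              "Saw the movie yesterday"]:
    print(classify(tweet), "<-", tweet)
```

A trained classifier would normally replace the hand-written dictionary; the sketch only shows where dictionaries and scoring fit between receiving a tweet and visualizing the result.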
34. Where to learn more
1. Read our blog: blog.griddynamics.com
• 7-part blog series on ISP
• 7-part blog series on Data Science Kitchen
2. Play with our demo
• http://apps.griddynamics.com/realtime-twitter-sentiment-analysis-example
3. Connect
• Twitter: @griddynamics
• Subscribe to our blog
• Drop us an email: info@griddynamics.com