© 2016 MapR Technologies, MapR Confidential
CEP - A Simplified Enterprise Architecture
for Real-time Stream Processing
Mathieu Dumoulin, Data Engineer (mdumoulin@mapr.com, @lordxar)
Mathieu Dumoulin
• Living in Tokyo, Japan for the last 3 years
• Data Engineer for MapR Professional Services
• Other jobs: Data Scientist, Search Engineer
• Connect with me:
–Read my blog posts:
https://www.mapr.com/blog/author/mathieu-dumoulin
–Twitter: @Lordxar
–Email: mdumoulin@mapr.com
Content Summary
1. Complex Event Processing
2. Streaming Architecture
3. Rules Engines for CEP
4. Simplified Hadoop-based CEP Architecture
5. Live Demo
6. Does it scale?
7. Conclusion
Complex Event Processing (CEP)
Some terminology:
• Event: Data with a timestamp (a log event, a transaction, ...)
• Event processing: Track and analyze streaming event data
• Complex event processing: Identify meaningful events and
respond to them as quickly as possible, usually over a sliding
window on the stream of event data
CEP is just a fancy way to do
business rules on streaming data
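To make the "business rule over a sliding window" idea concrete, here is a minimal sketch in plain Python. It is not taken from the talk: the `SlidingTimeWindow` class, the `high_temperature_rule` function, and the temperature thresholds are all hypothetical names and values chosen for illustration, and events are assumed to arrive in timestamp order.

```python
from collections import deque


class SlidingTimeWindow:
    """Keeps only the events whose timestamp falls within the last `span` seconds."""

    def __init__(self, span):
        self.span = span
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Record an event (assumed to arrive in timestamp order) and evict expired ones."""
        self.events.append((timestamp, value))
        while self.events and self.events[0][0] <= timestamp - self.span:
            self.events.popleft()

    def values(self):
        return [v for _, v in self.events]


def high_temperature_rule(window, limit=3, threshold=30.0):
    """Hypothetical business rule: fire when `limit`+ readings in the window exceed `threshold`."""
    return sum(1 for v in window.values() if v > threshold) >= limit
```

The rule itself stays a one-liner; the window is what turns a stream of individual readings into something a rule can reason about.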
IoT: Needs some CEP in There Somewhere
CEP in Action
The power of CEP comes from being able to detect complex
situations that could not be detected from any individual piece
of data directly.
Window opened
Motion Sensor
Light turned on
Door opened
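As a rough illustration of this slide, the sketch below raises an alert only when all four sensor events occur close together, something no single event could reveal. The event names follow the slide; the `Event` type, the `detect_composite` function, the 60-second window, and the "all four kinds required" rule are assumptions invented for this example, not part of the talk or of any CEP product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str         # e.g. "door_opened"
    timestamp: float  # seconds since some reference point


# Hypothetical rule: every kind below must be seen within WINDOW_SECONDS.
REQUIRED = {"window_opened", "motion_detected", "light_on", "door_opened"}
WINDOW_SECONDS = 60.0


def detect_composite(events, required=REQUIRED, window=WINDOW_SECONDS):
    """Return the timestamps at which every required kind was seen within `window`."""
    last_seen = {}  # kind -> most recent timestamp for that kind
    alerts = []
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.kind not in required:
            continue
        last_seen[ev.kind] = ev.timestamp
        if len(last_seen) == len(required):
            oldest = min(last_seen.values())
            if ev.timestamp - oldest <= window:
                alerts.append(ev.timestamp)
    return alerts
```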
Actually, CEP Has Been Around For a While
Taken from the March 2010 issue of the Dutch Java Magazine (source)
Technology Has Been Holding Rule Engines Back
• Rule engines are not new
– First papers from the 1990s, many implementations in the early 2000s
• Engine runs in memory on a single node
– A few GB of memory (or less) was a severe limitation
– A single-core CPU can only do so much
• Need modern stream messaging (Kafka, MapR Streams)
– Need persistence
– Need speed
• No standard, no dominant sponsor
– 1990s and early 2000s dominated by Microsoft
– OSS had not come of age in enterprise IT
CEP in a Modern Enterprise Data Pipeline
Source: Oracle / Rittman Mead Information Management Reference Architecture
Modern Streaming Architecture
• Build flexible systems
– More efficient and easier to build
– Decouples dependencies
• Better model the way business processes take place.
• More value now
– Aggregates data from many sources once
– Serves data to one or many projects immediately
• More value later
– Run batch analytics on the data later
– Reprocess the data with different algorithms later
Kafka-esque Messaging for Rule Engines
• Stream Persistence is a key feature
• CEP is only one use case
– Support batch analytics and ad hoc analysis from the same data
stream
• Compensate for current rule engine limitations
– Enables hot replacement for fault tolerance
– Enables simple horizontal scaling by partitioning data and rules
• Convergence
– Run this use case on your existing, standard, big data technology
– Use OSS frameworks and Open APIs
"Most CEP in IoT [...] is custom coded [...] rather than
[using a] general purpose stream platform."
Roy Schulte, vice president, Gartner
See: Complex Event Processing and The Future Of Business Decisions
by David Luckham and W. Roy Schulte
Custom Coded CEP: The Good and The Bad
The Good:
• Made to order with a modern framework
• “No limit” to potential for performance and scalability
• Fit-for-purpose technology
The Bad:
• Engineers aren’t business domain experts
• Lots of work to build from scratch every time
• Changes to logic are a pain point (from the business side)
• Lack of available talent/organizational capability
Declarative Makes Sense For Business
Manage complex behavior through simple rules
working together, executed by a rules engine.
An Open Source Rule Engine: Drools
Drools is a business rule management system (BRMS) with a
forward- and backward-chaining, inference-based rules engine.
• Project homepage: http://www.drools.org/
• Developer: Red Hat
• Enterprise-supported version available
– JBoss Enterprise BRMS
• Enhanced implementation of the Rete algorithm
– A state-of-the-art algorithm for rules engines
• Has a GUI rules editor: Workbench
An Open Source Rule Engine: Drools
[Diagram: Domain Expert, Rules Editor, Production Memory (Rules),
Working Memory (Facts), Pattern Matcher, Agenda, Actions]
CEP in Drools: Stateful Session + Sliding Window
• Stateless session
– Rule: Is the ball red?
• Stateful session
– Rule: Are there 2+ red balls in the last 4 balls I’ve seen?
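The difference between the two rules can be sketched in plain Python. This is not Drools syntax, and Drools expresses such rules declaratively; the `is_red` function and the `RedBallWindow` class are hypothetical names that only mimic the behavior. The first rule needs nothing but the current ball, while the second must remember the last four balls, which is why it needs a stateful session with a length-based sliding window.

```python
from collections import deque


def is_red(ball):
    """Stateless rule: the answer depends only on the ball being evaluated."""
    return ball == "red"


class RedBallWindow:
    """Stateful rule: remembers the last `size` balls (a length-based sliding window)."""

    def __init__(self, size=4, threshold=2):
        self.window = deque(maxlen=size)  # the oldest ball drops out automatically
        self.threshold = threshold

    def insert(self, ball):
        """Add a ball and report whether the window now holds `threshold`+ red balls."""
        self.window.append(ball)
        return sum(1 for b in self.window if is_red(b)) >= self.threshold
```

Feeding the sequence red, blue, blue, blue, red, green, red into one `RedBallWindow` fires only on the last ball: by then the first red ball has slid out of the window, so the match comes from the two most recent red balls.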
© 2016 MapR Technologies 19© 2016 MapR Technologies 19MapR Confidential
Streaming Architecture for CEP
[Diagram: Sensors (real-time data) -> Producer -> Distributed
Cluster (Kafka, MapR) -> Consumer Server (edge node, cluster node)
-> Integrate with other systems]
© 2016 MapR Technologies 20© 2016 MapR Technologies 20MapR Confidential
The Case for CEP on Streaming Architecture
• Decouple rules maintenance from code and infrastructure
– Manage the cluster separately
– The application code may need only minimal maintenance
• Rules maintenance in the hands of the business domain experts
– Easily supports multiple projects & teams
• Data is persisted in the stream (input and output)
– Open to new use cases
• Send data back to the stream
– Integrate with other downstream use cases
© 2016 MapR Technologies 21© 2016 MapR Technologies 21MapR Confidential
But Does It Scale? Yes, But Only to a Point
• Drools and other rule engines are in-memory and the memory is not distributed
– This is only a technical limitation that can be overcome (e.g., Alluxio, Apache Ignite)
• Streams make it easy to provide reasonable fault tolerance and quick disaster recovery
• Run multiple servers, split rules logically, fan out data into multiple topics
• A single session can handle 100K+ events/sec. How much scale is needed?
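One way to picture "fan out data into multiple topics" is a stable key-to-topic mapping, so that every event for a given key always reaches the same rule-engine instance. The sketch below is an assumption-laden illustration: it uses no real Kafka or MapR Streams client, and the `topic_for` and `fan_out` helpers, the `sensor-events` base name, and the count of four topics are all made up for the example.

```python
import zlib


def topic_for(key, base="sensor-events", partitions=4):
    """Map a key (e.g. a sensor or district id) to one of `partitions` topic names."""
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    index = zlib.crc32(key.encode("utf-8")) % partitions
    return f"{base}-{index}"


def fan_out(events, partitions=4):
    """Group (key, payload) events by their destination topic."""
    topics = {}
    for key, payload in events:
        topics.setdefault(topic_for(key, partitions=partitions), []).append(payload)
    return topics
```

Each topic could then feed its own server running only the rules relevant to that slice of the data. Because the mapping is deterministic, a replacement server can resume from the same topic after a failure.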
© 2016 MapR Technologies 22© 2016 MapR Technologies 22MapR Confidential
Live Demo: Smart City Traffic Management
© 2016 MapR Technologies 23© 2016 MapR Technologies 23MapR Confidential
● Try out integration with Spark Streaming and Flink
● Run serious performance benchmarks
● Deploy into production
© 2016 MapR Technologies 24© 2016 MapR Technologies 24MapR Confidential
Recap
• It’s not Rule Engine vs. Spark and Flink Stream processing
– It’s Rules + Stream Processing
– Spark, Flink, and Java are just implementation choices
• Focus on business value from applying rules to data
– Think of benefits of SQL vs. Java, C++, Scala, …
• Great use case for a Streaming Architecture and microservices
An in-depth blog post on this talk's topic will be available on the
MapR blog: https://www.mapr.com/blog/author/mathieu-dumoulin
© 2016 MapR Technologies 25© 2016 MapR Technologies 25MapR Confidential
Suggested Reading
● Get Ted & Ellen’s book and many more for free:
○ https://www.mapr.com/ebooks/
● More great blog content about CEP and IoT applications
○ Eric Bruno on LinkedIn
○ Karzel et al. on InfoQ
© 2016 MapR Technologies 26© 2016 MapR Technologies 26MapR Confidential
Q & A
@mapr
mdumoulin@mapr.com
@lordxar
Engage with us!
mapr-technologies

08448380779 Call Girls In Civil Lines Women Seeking Men
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

CEP - simplified streaming architecture - Strata Singapore 2016

  • 1. CEP - A Simplified Enterprise Architecture for Real-time Stream Processing. Mathieu Dumoulin, Data Engineer (mdumoulin@mapr.com, @lordxar)
  • 2. Mathieu Dumoulin • Living in Tokyo, Japan for the last 3 years • Data Engineer for MapR Professional Services • Other jobs: Data Scientist, Search Engineer • Connect with me: – Read my blog posts: https://www.mapr.com/blog/author/mathieu-dumoulin – Twitter: @Lordxar – Email: mdumoulin@mapr.com
  • 3. Content Summary: 1. Complex Event Processing 2. Streaming Architecture 3. Rules Engines for CEP 4. Simplified Hadoop-based CEP Architecture 5. Live Demo 6. Does it scale? 7. Conclusion
  • 4. Complex Event Processing (CEP). Some terminology: • Event: Data with a timestamp (a log event, a transaction, ...) • Event processing: Track and analyze streaming event data • Complex event processing: Identify meaningful events and respond to them as quickly as possible, usually over a sliding window on the stream of event data. CEP is just a fancy way to do business rules on streaming data.
  • 5. IoT: Needs some CEP in There Somewhere
  • 6. CEP in Action. The power of CEP comes from being able to detect complex situations that could not be detected from any individual piece of data directly. (Diagram labels: Window opened, Motion Sensor, Light turned on, Door opened)
  • 7. Actually, CEP Has Been Around For a While. Taken from March 2010 issue of the Dutch Java Magazine (source)
  • 8. Technology Has Been Holding Rule Engines Back • Rule engines are not new – First papers from the 90’s, many implementations in early 2000’s • Engine is running in-memory on single node – A few GB of memory (or less) was a severe limitation – Single core CPU can only do so much • Need modern stream messaging (Kafka, MapR Streams) – Need persistence – Need speed • No standard, no dominant sponsor – 90’s and early 2000’s dominated by Microsoft – OSS had not come of age in enterprise IT
  • 9. CEP in a Modern Enterprise Data Pipeline. Source: Oracle / Rittman Mead Information Management Reference Architecture
  • 10. Modern Streaming Architecture • Build flexible systems – More efficient and easier to build – Decouples dependencies • Better model the way business processes take place • More value now – Aggregates data from many sources once – Serves data to one or many projects immediately • More value later – Run batch analytics on the data later – Reprocess the data with different algorithms later
  • 11. Kafka-esque Messaging for Rule Engines • Stream persistence is a key feature • CEP is only one use case – Support batch analytics and ad-hoc analysis from the same data stream • Compensate for current rule engine limitations – Enables hot replacement for fault-tolerance – Enables simple horizontal scaling by partitioning data and rules • Convergence – Run this use case on your existing, standard, big data technology – Use OSS frameworks and open APIs
  • 12. “Most CEP in IoT [...] is custom coded [...] rather than [using a] general purpose stream platform.” Roy Schulte, vice president, Gartner. See: Complex Event Processing and The Future Of Business Decisions by David Luckham and W. Roy Schulte
  • 13. Custom Coded CEP: The Good and The Bad. The Good: • Made to order with a modern framework • “No limit” to potential for performance and scalability • Fit to purpose technology. The Bad: • Engineers aren’t business domain experts • Lots of work to build from scratch every time • Changes to logic are a pain point (from the business side) • Lack of available talent/organizational capability
  • 14. Declarative Makes Sense For Business. Manage complex behavior through simple rules working together, executed by a rules engine.
  • 15. An Open Source Rule Engine: Drools is a business rule management system (BRMS) with a forward and backward chaining inference based rules engine. • Project homepage: http://www.drools.org/ • Developer: Red Hat • Enterprise supported version available – JBoss Enterprise BRMS • Enhanced implementation of the Rete algorithm – A state of the art algorithm for rules engines • Has a GUI Rules Editor: Workbench
  • 16. An Open Source Rule Engine. (Diagram labels: Production Memory (Rules), Working Memory (Facts), Pattern Matcher, Agenda, Domain Expert, Rules Editor, Actions)
  • 17. CEP in Drools: Stateful Session and Sliding Window. STATELESS Session. Rule: Is the ball red? Rule: Are there 2+ red balls in the last 4 balls I’ve seen?
  • 18. CEP in Drools: Stateful Session + Sliding Window. STATELESS Session rule: Is the ball red? STATEFUL Session rule: Are there 2+ red balls in the last 4 balls I’ve seen?
  • 19. Streaming Architecture for CEP: Sensors - Real-time Data Producer Distributed Cluster (Kafka, MapR) Consumer Server (Edge node, cluster node) Integrate with other systems
  • 20. The Case for CEP on Streaming Architecture • Decouple rules maintenance from code and infrastructure – Manage the cluster separately – The application code may need only minimal maintenance • Rules maintenance in the hands of the business domain experts – Easily supports multiple projects & teams • Data is persisted in the stream (input and output) – Open to new use cases • Send data back to the stream – Integrate with other downstream use cases
  • 21. But Does It Scale? Yes, But Only to a Point • Drools and other rule engines are in-memory and the memory is not distributed – This is only a technical limitation that can be overcome (Ex: Alluxio, Apache Ignite) • Streams make it easy to provide reasonable fault-tolerance and quick disaster recovery • Run multiple servers, split rules logically, fan out data into multiple topics • A single session can handle 100K+ events/sec. How much scale is needed?
  • 22. Live Demo: Smart City Traffic Management
  • 23. ● Try out integration with Spark Streaming and Flink ● Run serious performance benchmarks ● Deploy into production
  • 24. Recap • It’s not Rule Engine vs. Spark and Flink stream processing – It’s Rules + Stream Processing – Spark, Flink, Java are just an implementation choice • Focus on business value from applying rules to data – Think of benefits of SQL vs. Java, C++, Scala, … • Great use case for a Streaming Architecture and microservices. An in-depth blog post on this talk topic will be available on the MapR blog: https://www.mapr.com/blog/author/mathieu-dumoulin
  • 25. Suggested Reading ● Get Ted & Ellen’s book and many more for free: ○ https://www.mapr.com/ebooks/ ● More great blog content about CEP and IoT applications ○ Eric Bruno on LinkedIn ○ Karzel et al. on InfoQ
  • 26. Q & A. @mapr, mdumoulin@mapr.com, @lordxar. Engage with us! mapr-technologies
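
The stateless versus stateful distinction on slides 17 and 18 can be sketched in a few lines. This is a plain-Python illustration, not Drools code: the names `is_red` and `SlidingWindowRule` are hypothetical, and the window is counted in events (the last 4 balls), as in the slide's example.

```python
from collections import deque


def is_red(ball):
    """Stateless rule: answered from the single fact being evaluated."""
    return ball == "red"


class SlidingWindowRule:
    """Stateful rule: remembers the last `size` balls and fires when
    at least `threshold` of them are red. (Hypothetical helper, not a Drools API.)"""

    def __init__(self, size=4, threshold=2):
        self.window = deque(maxlen=size)  # oldest ball drops out automatically
        self.threshold = threshold

    def insert(self, ball):
        self.window.append(ball)
        red_count = sum(1 for b in self.window if is_red(b))
        return red_count >= self.threshold


def run(balls):
    """Feed balls one at a time and record whether the rule fired on each."""
    rule = SlidingWindowRule()
    return [rule.insert(ball) for ball in balls]


stream = ["red", "blue", "blue", "red", "blue", "blue", "blue", "red"]
results = run(stream)
print(results)
```

The stateless rule needs nothing but the current ball, so it can run anywhere. The windowed rule only works if the same session sees consecutive events, which is why session state matters for scaling and recovery.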

Editor's Notes

  1. It’s just not true that ML solves all problems. ML seeks to make predictions, which is very useful. But most business processes don’t need prediction every step of the way; they are more like a series of steps with conditionals arranged in a DAG.
  2. Rules need to be: independent; easily updated (add, change, delete); applied to only the minimum set of relevant data; open to contributions from business domain experts.
  3. Integrate Flink/Spark Streaming with Drools. Performance and scalability testing. Flink brings lots of benefits “for free”: state is saved automatically by checkpoints; fault recovery for Drools state is simplified; record-at-a-time processing is a good model for adding data to a KieSession.
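
The recovery idea in note 3 (checkpointed state plus a persisted stream) can be illustrated with a small sketch. Everything here is an assumption for illustration: `WindowSession` is a hypothetical stand-in for a stateful rules session, the Python list `log` stands in for a persisted topic, and the checkpoint is a plain dictionary. It uses neither the Flink nor the Drools API.

```python
from collections import deque


class WindowSession:
    """Hypothetical stand-in for a stateful rules session (not the KieSession API).
    Fires when at least `threshold` of the last `size` events are "red"."""

    def __init__(self, size=4, threshold=2):
        self.size = size
        self.threshold = threshold
        self.window = deque(maxlen=size)

    def insert(self, event):
        self.window.append(event)
        return sum(1 for e in self.window if e == "red") >= self.threshold

    def snapshot(self):
        """Copy out the state that a checkpoint would need to save."""
        return list(self.window)

    @classmethod
    def restore(cls, state, size=4, threshold=2):
        """Rebuild a session from a saved snapshot."""
        session = cls(size, threshold)
        session.window.extend(state)
        return session


def process(log, session, start, stop):
    """Feed log[start:stop] into the session one record at a time."""
    return [session.insert(event) for event in log[start:stop]]


# The persisted stream: events stay available for replay after a failure.
log = ["red", "blue", "red", "blue", "blue", "red", "red", "blue"]

# Reference: one session processes the whole stream without interruption.
reference = process(log, WindowSession(), 0, len(log))

# Failure scenario: checkpoint at offset 3, keep going, then lose the server.
primary = WindowSession()
before = process(log, primary, 0, 3)
checkpoint = {"offset": 3, "state": primary.snapshot()}
process(log, primary, 3, 5)  # work done after the checkpoint is lost with the server

# A replacement restores the checkpoint and replays from the saved offset.
replacement = WindowSession.restore(checkpoint["state"])
after = process(log, replacement, checkpoint["offset"], len(log))
recovered = before + after
print(recovered == reference)
```

Because the stream keeps the events, the replacement only needs the last saved state and offset to reach the same decisions. Results for records between the checkpoint and the failure are emitted again on replay, so downstream consumers in a design like this would need to tolerate duplicates.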