Simultaneous Analysis of Massive Data
Streams in Real-Time and Batch
Anjana Fernando
Technical Lead
WSO2
Agenda
• How massive data streams are created
• How to receive
• How to store
• How to analyze, batch vs real-time
• WSO2 Big Data solution
• Demo
Massive Data Streams -> Data Streams with Big Data
What is Big Data?
❏ The 3 Vs
❏ Velocity
❏ Volume
❏ Variety
Where does it originate from?
• Machine logs
• Social media
• Archives
• Traffic information
• Weather data
• Sensor data (IoT)
What do I do with it?
Create intelligence..
• Should I take an umbrella to work today?
• What is the best route to go back home?
• What are the current market trends?
• Are my servers running healthily?
Protocols used to publish data..
• HTTP
• MQTT
• Zigbee
• Thrift
• Avro
• ProtoBuf
How to store the data?
• Relational databases
• Block data stores
-> HDFS
• Column oriented
-> HBase
-> Cassandra
• Document based
-> MongoDB
-> CouchDB
• In-Memory
-> VoltDB
[Figure: CAP triangle with corners labelled C, A and P]
How to analyse data?
• Two options:
-> Batch processing: Schedule data processing jobs
and receive the processed data later
-> Real-time processing: The queries are executed
and the results are retrieved instantly
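To make the two options concrete, here is a minimal Python sketch (not from the talk) that computes the same per-brand quantity totals both ways. The field names `brand` and `quantity` follow the phone-sales example in this deck; the sample events and function names are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical sample events; values are made up for illustration.
EVENTS = [
    {"brand": "A", "quantity": 2},
    {"brand": "B", "quantity": 1},
    {"brand": "A", "quantity": 3},
]

def batch_totals(stored_events):
    """Batch: run once over the complete stored data set, read the result afterwards."""
    totals = defaultdict(int)
    for event in stored_events:
        totals[event["brand"]] += event["quantity"]
    return dict(totals)

def streaming_totals(event_stream):
    """Real-time: update running totals and emit a result for every incoming event."""
    totals = defaultdict(int)
    for event in event_stream:
        totals[event["brand"]] += event["quantity"]
        yield event["brand"], totals[event["brand"]]

if __name__ == "__main__":
    print(batch_totals(EVENTS))
    for brand, running_total in streaming_totals(iter(EVENTS)):
        print(brand, running_total)
```

The batch function only answers once all data has been read, while the streaming function has an up-to-date answer after each event.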
Analysing data..
• Batch processing
-> Apache Hadoop: Map/Reduce processing system
and a distributed file system
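The Map/Reduce model itself can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle and reduce phases only; it does not use Hadoop's API or a distributed file system, and the record fields are assumed.

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit (key, value) pairs for one input record."""
    yield record["brand"], record["quantity"]

def shuffle(pairs):
    """Shuffle: group all emitted values by key between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: fold all values for one key into a single result."""
    return key, sum(values)

def run_job(records):
    """Run map, shuffle and reduce in sequence over an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster the map and reduce calls run in parallel on different machines; here they run one after another, which keeps the data flow easy to follow.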
Analysing data..
• Batch processing - Data Warehouse
-> Apache Hive - Hadoop based framework for working
on large scale data stores with SQL-like queries
INSERT OVERWRITE TABLE UserTable
SELECT userName, COUNT(DISTINCT orderID), SUM(quantity)
FROM PhoneSalesTable
WHERE version = "1.0.0"
GROUP BY userName;
Analysing data..
• Batch processing - In-Memory Computing
-> Apache Spark - Functional programming model,
in-memory computing, claims to be 10x - 100x faster than
Hadoop MapReduce
Analysing data..
• Real-time processing - Stream Processing
-> Apache Storm - Distributed and fault-tolerant
[Figure: Storm topology of spouts and bolts]
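The spout and bolt roles can be illustrated with chained Python generators. This is a single-process sketch of the data flow only, not Storm's API; the function names, the filter threshold and the event fields are hypothetical.

```python
def sales_spout(events):
    """Spout: the source of the stream; emits tuples one at a time."""
    for event in events:
        yield event

def filter_bolt(stream, min_quantity):
    """Bolt: drops tuples whose quantity is below the threshold."""
    for event in stream:
        if event["quantity"] >= min_quantity:
            yield event

def count_bolt(stream):
    """Bolt: keeps a running count per brand and emits the updated count."""
    counts = {}
    for event in stream:
        counts[event["brand"]] = counts.get(event["brand"], 0) + 1
        yield event["brand"], counts[event["brand"]]

def run_topology(events, min_quantity=2):
    """Wire spout -> filter bolt -> count bolt and collect the output."""
    return list(count_bolt(filter_bolt(sales_spout(events), min_quantity)))
```

Each stage processes one tuple at a time and passes it on, which is the property that lets a real topology spread the stages across machines.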
Analysing data..
• Real-time processing - Complex Event Processing
-> WSO2 Siddhi:
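As a rough illustration of the kind of rule a complex event processing engine evaluates, the Python sketch below raises an alert when the total quantity for a brand within a sliding time window exceeds a threshold. It is not Siddhi's query language or API; the class name, window length and threshold are assumptions, and events are assumed to arrive in timestamp order.

```python
from collections import deque

class SlidingWindowAlert:
    """Alert when the summed quantity per brand within the window exceeds the threshold."""

    def __init__(self, window_seconds, threshold):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.windows = {}  # brand -> deque of (timestamp, quantity)

    def on_event(self, timestamp, brand, quantity):
        """Process one event and return True if the rule fires for its brand."""
        window = self.windows.setdefault(brand, deque())
        window.append((timestamp, quantity))
        # Expire events that have fallen out of the time window.
        while window and window[0][0] <= timestamp - self.window_seconds:
            window.popleft()
        total = sum(q for _, q in window)
        return total > self.threshold
```

The state kept per brand is only the events still inside the window, which is what allows this style of processing to stay in memory.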
Big Data Architecture with WSO2..
• Data Streams
{
'name':'phone.retail.shop',
'version':'1.0.0',
'nickName': 'Phone_Retail_Shop',
'description': 'Phone Sales',
'metaData':[
{'name':'clientType','type':'STRING'}
],
'payloadData':[
{'name':'brand','type':'STRING'},
{'name':'quantity','type':'INT'},
{'name':'total','type':'INT'},
{'name':'user','type':'STRING'}
]
}
The common stream format used in both CEP and BAM. The stream
definition contains the stream name, version and the other attributes that
make up the stream.
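One way to see what the definition buys you is to check an event against it. The Python sketch below copies the name, version, metaData and payloadData from the definition above; the event shape (a dict with `metaData` and `payloadData` dicts), the type mapping and the function names are assumptions for illustration, not the WSO2 data publisher API.

```python
STREAM_DEFINITION = {
    "name": "phone.retail.shop",
    "version": "1.0.0",
    "metaData": [{"name": "clientType", "type": "STRING"}],
    "payloadData": [
        {"name": "brand", "type": "STRING"},
        {"name": "quantity", "type": "INT"},
        {"name": "total", "type": "INT"},
        {"name": "user", "type": "STRING"},
    ],
}

# Assumed mapping from the definition's type names to Python types.
PYTHON_TYPES = {"STRING": str, "INT": int}

def validate(values, attributes):
    """Return a list of problems found when checking values against attributes."""
    problems = []
    for attribute in attributes:
        name, expected = attribute["name"], PYTHON_TYPES[attribute["type"]]
        if name not in values:
            problems.append(f"missing attribute: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(values[name], bool) or not isinstance(values[name], expected):
            problems.append(f"wrong type for {name}: expected {attribute['type']}")
    return problems

def validate_event(event, definition=STREAM_DEFINITION):
    """Check both the metadata and the payload of a hypothetical event dict."""
    return (validate(event.get("metaData", {}), definition["metaData"])
            + validate(event.get("payloadData", {}), definition["payloadData"]))
```

An empty list means the event matches the definition; otherwise each entry names the attribute that is missing or mistyped.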
Big Data Architecture with WSO2..
• WSO2 BAM
-> Data Receiver - High performance binary format data
publishing with Apache Thrift, shared with WSO2 CEP
-> Data Storage - Cassandra as a highly scalable data store
-> Data Analyzer - Hive based batch processing
Big Data Architecture with WSO2..
• WSO2 BAM..
-> Activity Monitoring: Implemented using a custom indexing
mechanism to instantly search for events of a specific activity in
the system
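The talk does not describe how the custom index is built. As a generic illustration of why an index makes such lookups fast, the Python sketch below maps an activity ID to the positions of its events, so a search does not scan every stored event. The class and method names are hypothetical.

```python
from collections import defaultdict

class ActivityIndex:
    """Stores events and indexes them by activity ID for direct lookup."""

    def __init__(self):
        self.events = []
        self.by_activity = defaultdict(list)  # activity_id -> positions in self.events

    def add(self, activity_id, event):
        """Append the event and record its position under the activity ID."""
        self.by_activity[activity_id].append(len(self.events))
        self.events.append(event)

    def find(self, activity_id):
        """Return all events of one activity, in arrival order."""
        return [self.events[i] for i in self.by_activity.get(activity_id, [])]
```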
Big Data Architecture with WSO2..
• WSO2 BAM..
-> Incremental Data Processing - Customized Hive to support
incremental data processing:
@Incremental (name="salesAnalysis" , tables="PhoneSalesTable")
SELECT brandname,
Count(DISTINCT orderid),
Sum(quantity)
FROM phonesalestable
WHERE version = "1.0.0"
GROUP BY brandname;
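The idea behind incremental processing can be sketched in Python as a job that remembers how far it has read and only processes rows added since the last run. How the customized Hive tracks processed data is not described in the talk; the cursor, the append-only table and the row fields below are assumptions for illustration.

```python
class IncrementalSales:
    """Keeps per-brand aggregates and a cursor so each run reads only new rows."""

    def __init__(self):
        self.cursor = 0       # number of rows already processed
        self.quantity = {}    # brand -> summed quantity
        self.order_ids = {}   # brand -> set of distinct order IDs seen so far

    def run(self, table):
        """Process rows appended since the last run; return how many were read."""
        new_rows = table[self.cursor:]
        for row in new_rows:
            brand = row["brand"]
            self.quantity[brand] = self.quantity.get(brand, 0) + row["quantity"]
            self.order_ids.setdefault(brand, set()).add(row["orderID"])
        self.cursor = len(table)
        return len(new_rows)

    def result(self):
        """Return brand -> (distinct order count, total quantity)."""
        return {b: (len(self.order_ids[b]), self.quantity[b]) for b in self.quantity}
```

Note that a sum can be updated from the previous total alone, while a distinct count needs the set of IDs already seen, so the job has to keep that set between runs.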
Big Data Architecture with WSO2..
• WSO2 CEP
-> Same data receiver as BAM: the receiver is the point where each
event is sent to both servers, BAM for batch processing and CEP
for real-time processing of the same data streams
-> Real-time in-memory processing, based on the WSO2 Siddhi
engine, with data adapters for receiving and sending events in
different data formats and over different transports, e.g. XML, JSON,
Text, HTTP, JMS, SMTP
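The fan-out at the receiver can be sketched in Python as one publisher handing every event to two consumers, one that stores it for later batch jobs and one that reacts immediately. This is an in-process illustration, not the WSO2 receiver; all names and the large-order rule are hypothetical.

```python
class Receiver:
    """Hands every incoming event to all registered consumers."""

    def __init__(self):
        self.consumers = []

    def subscribe(self, consumer):
        self.consumers.append(consumer)

    def publish(self, event):
        for consumer in self.consumers:
            consumer(event)

batch_store = []  # stands in for the store that batch jobs read later
alerts = []       # stands in for the real-time output

def store_for_batch(event):
    """Batch side: keep the raw event for later analysis."""
    batch_store.append(event)

def detect_large_order(event):
    """Real-time side: react to the event as soon as it arrives."""
    if event["quantity"] >= 10:
        alerts.append(f"large order: {event['brand']}")
```

Because both consumers see the same event, the batch results and the real-time results are derived from identical input.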
Demo
Questions?
Thank you!

 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 

Simultaneous analysis of massive data streams in real time and batch

  • 1. Simultaneous Analysis of Massive Data Streams in Real-Time and Batch Anjana Fernando Technical Lead WSO2
  • 2. Agenda • How massive data streams are created • How to receive • How to store • How to analyze, batch vs real-time • WSO2 Big Data solution • Demo
  • 3. Massive Data Streams -> Data Streams with Big Data
  • 4. What is Big Data? ❏ The 3 Vs ❏ Velocity ❏ Volume ❏ Variety
  • 5. Where does it originate from? • Machine logs • Social media • Archives • Traffic information • Weather data • Sensor data (IoT)
  • 6. What do I do with it? Create intelligence.. • Should I take an umbrella to work today? • What is the best route to go back home? • What are the current market trends? • Are my servers running healthily?
  • 7. Protocols used to publish data.. • HTTP • MQTT • Zigbee • Thrift • Avro • ProtoBuf
  • 8. How to store the data? • Relational databases • Block data stores -> HDFS • Column oriented -> HBase -> Cassandra • Document based -> MongoDB -> CouchDB • In-Memory -> VoltDB [Figure: diagram labelled A, C, P]
  • 9. How to analyse data? • Two options: -> Batch processing: Schedule data processing jobs and receive the processed data later -> Real-time processing: The queries are executed and the results are retrieved instantly
  • 10. Analysing data.. • Batch processing -> Apache Hadoop: Map/Reduce processing system and a distributed file system
  • 11. Analysing data.. • Batch processing - Data Warehouse -> Apache Hive - Hadoop-based framework for working on large scale data stores with SQL-like queries INSERT OVERWRITE TABLE UserTable SELECT userName, COUNT(DISTINCT orderID), SUM(quantity) FROM PhoneSalesTable WHERE version = "1.0.0" GROUP BY userName;
  • 12. Analysing data.. • Batch processing - In-Memory Computing -> Apache Spark - Functional programming model, in-memory computing, claims to be 10x to 100x faster than Hadoop
  • 13. Analysing data.. • Real-time processing - Stream Processing -> Apache Storm - Distributed and fault-tolerant [Figure: Spouts and Bolts]
  • 14. Analysing data.. • Real-time processing - Complex Event Processing -> WSO2 Siddhi:
  • 15.
  • 16. Big Data Architecture with WSO2.. • Data Streams: the common stream format used in both CEP and BAM. The stream definition contains the stream name, version and other attributes that make up the stream.
      {
        'name':'phone.retail.shop',
        'version':'1.0.0',
        'nickName': 'Phone_Retail_Shop',
        'description': 'Phone Sales',
        'metaData':[
          {'name':'clientType','type':'STRING'}
        ],
        'payloadData':[
          {'name':'brand','type':'STRING'},
          {'name':'quantity','type':'INT'},
          {'name':'total','type':'INT'},
          {'name':'user','type':'STRING'}
        ]
      }
  • 17. Big Data Architecture with WSO2.. • WSO2 BAM -> Data Receiver - High-performance binary format data publishing with Apache Thrift, shared with WSO2 CEP -> Data Storage - Cassandra as a highly scalable data store -> Data Analyzer - Hive-based batch processing
  • 18. Big Data Architecture with WSO2.. • WSO2 BAM.. -> Activity Monitoring: Implemented using a custom indexing mechanism to instantly search for events of a specific activity in the system
  • 19. Big Data Architecture with WSO2.. • WSO2 BAM.. -> Incremental Data Processing - Customized Hive to support incremental data processing: @Incremental (name="salesAnalysis" , tables="PhoneSalesTable") SELECT brandname, Count(DISTINCT orderid), Sum(quantity) FROM phonesalestable WHERE version = "1.0.0" GROUP BY brandname;
  • 20. Big Data Architecture with WSO2.. • WSO2 CEP -> Same data receiver as BAM; this is the point where the same event is sent to both servers, with BAM handling batch processing and CEP handling real-time processing of the same data streams -> Real-time in-memory processing, based on the WSO2 Siddhi engine, with data adapters for receiving and sending events with different data types and transports, e.g. XML, JSON, Text, HTTP, JMS, SMTP
  • 21. Demo
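
The architecture on slides 17 to 20 (one receiver feeding both a batch path and a real-time path, with incremental batch processing) can be illustrated with a minimal Python sketch. This is not the WSO2 BAM, CEP or Siddhi API. The class names `BatchStore`, `SlidingWindowAlert` and `Receiver`, the window size, the threshold and the sample brands are all assumptions made up for illustration. Only the event fields (`brand`, `quantity`, `total`, `user`) come from the stream definition on slide 16.

```python
from collections import defaultdict, deque


class BatchStore:
    """Hypothetical stand-in for the batch side (event store plus scheduled job)."""

    def __init__(self):
        self.events = []                 # every event received so far
        self.cursor = 0                  # index of the first unprocessed event
        self.totals = defaultdict(int)   # running quantity per brand

    def append(self, event):
        self.events.append(event)

    def run_incremental_job(self):
        """Aggregate only the events that arrived since the previous run."""
        for event in self.events[self.cursor:]:
            self.totals[event["brand"]] += event["quantity"]
        self.cursor = len(self.events)
        return dict(self.totals)


class SlidingWindowAlert:
    """Hypothetical stand-in for the real-time side: a query over the last N events."""

    def __init__(self, window_size, threshold):
        self.window = deque(maxlen=window_size)  # oldest event drops out automatically
        self.threshold = threshold
        self.alerts = []

    def on_event(self, event):
        self.window.append(event)
        brand = event["brand"]
        total = sum(e["quantity"] for e in self.window if e["brand"] == brand)
        if total >= self.threshold:
            self.alerts.append((brand, total))


class Receiver:
    """Single entry point that hands the same event to both paths."""

    def __init__(self, store, realtime):
        self.store = store
        self.realtime = realtime

    def publish(self, event):
        self.store.append(event)       # batch path: persist for later jobs
        self.realtime.on_event(event)  # real-time path: evaluate immediately


# Sample events; brand and user values are invented for this sketch.
store = BatchStore()
realtime = SlidingWindowAlert(window_size=3, threshold=5)
receiver = Receiver(store, realtime)

first_events = [
    {"brand": "Acme", "quantity": 2, "total": 200, "user": "u1"},
    {"brand": "Globex", "quantity": 1, "total": 150, "user": "u2"},
    {"brand": "Acme", "quantity": 4, "total": 400, "user": "u3"},
]
for e in first_events:
    receiver.publish(e)

first_run = store.run_incremental_job()   # processes three events
receiver.publish({"brand": "Acme", "quantity": 1, "total": 100, "user": "u1"})
second_run = store.run_incremental_job()  # processes only the one new event

print(first_run, second_run, realtime.alerts)
```

The real-time path reacts as each event arrives, while the batch path produces totals only when its job runs. The cursor plays a role similar to the incremental processing described on slide 19: the second run touches only the event that arrived after the first run.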