xGem Data Stream Processing
Agenda
• Overview
• What is Streaming Data?
• Streaming Data Pipeline
• Streaming Platform components
• What is Stishovite?
Overview
• Monitoring: real-time monitoring of events
• Alerts: sending alerts based on the detection of event patterns in data streams
• Dashboards: real-time operational dashboards
• Search: full-text querying, aggregations, and geo data in near real time
• Analytics: analyzing large volumes of data quickly and in near real time
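To make "detection of event patterns" concrete, the following Python sketch raises an alert when the same user produces three failed logins within 60 seconds. The event type names, the threshold, and the window size are hypothetical choices for this example, and events are assumed to arrive in timestamp order; it is an illustration of the idea, not how any particular alerting product works.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    user: str
    kind: str   # hypothetical event types, e.g. "login_failed" or "login_ok"
    ts: float   # event time in seconds


def detect_bursts(events, kind="login_failed", threshold=3, window_s=60.0):
    """Yield (user, ts) when `threshold` events of `kind` from one user fall
    within `window_s` seconds. Assumes events arrive in timestamp order."""
    recent = defaultdict(deque)  # user -> timestamps of matching events still in the window
    for ev in events:
        if ev.kind != kind:
            continue
        window = recent[ev.user]
        window.append(ev.ts)
        # Drop timestamps older than the window that ends at this event.
        while window and ev.ts - window[0] > window_s:
            window.popleft()
        if len(window) >= threshold:
            yield ev.user, ev.ts
            window.clear()  # start over so one burst produces one alert


stream = [
    Event("alice", "login_failed", 0),
    Event("bob", "login_failed", 5),
    Event("alice", "login_failed", 20),
    Event("alice", "login_ok", 25),
    Event("alice", "login_failed", 50),
    Event("bob", "login_failed", 200),
]
alerts = list(detect_bursts(stream))
print(alerts)  # [('alice', 50)]
```

Only the per-user timestamps inside the current window are kept, so memory stays bounded by the window rather than by the length of the stream.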
Streaming Data
Streaming data is data that is generated continuously by thousands of data sources, which typically send data records simultaneously and in small sizes (on the order of kilobytes). This data needs to be processed sequentially and incrementally, on a record-by-record basis or over sliding time windows, and used for a wide variety of analytics, including correlations, aggregations, filtering, and sampling.
Stream processing has become the de facto standard for building real-time ETL and stream analytics applications. We see batch workloads moving to stream processing to act on the data and derive insights faster. With the explosion of data such as IoT and machine-generated data, stream processing combined with predictive analytics is driving significant business value.
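Record-by-record processing over a sliding time window can be sketched in a few lines of Python. This is a minimal sketch, assuming timestamps arrive in increasing order; the class name and the 10-second window are illustrative and not taken from any particular framework.

```python
from collections import deque


class SlidingWindowAverage:
    """Keeps the average of the values seen in the last `window_s` seconds,
    updated incrementally as each record arrives."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self.items = deque()  # (timestamp, value) pairs currently inside the window
        self.total = 0.0

    def add(self, ts: float, value: float) -> float:
        # Process one record: add it to the window and the running total...
        self.items.append((ts, value))
        self.total += value
        # ...then evict records that are window_s or more seconds older than it.
        while self.items and ts - self.items[0][0] >= self.window_s:
            _, old_value = self.items.popleft()
            self.total -= old_value
        return self.total / len(self.items)


win = SlidingWindowAverage(window_s=10)
readings = [(0, 10.0), (4, 20.0), (8, 30.0), (12, 40.0)]  # (timestamp, sensor value)
averages = [win.add(ts, v) for ts, v in readings]
print(averages)  # [10.0, 15.0, 20.0, 30.0]
```

Each record is appended once and evicted at most once, so the work per record is constant on average, which is what lets this kind of aggregation keep up with a continuous stream.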
Streaming Data Examples
Streaming data examples include:
• Website, network, and application monitoring
• Fraud detection
• Advertising
• Internet of Things: sensors (trucks, transportation vehicles, industrial equipment)
• Machine-generated data
• Social analytics
• Private searching
• Others
Key Requirements for Streaming Data
• Persistence
• Performance
• Scale
• Parallel & Partitioned
• Messaging
• Processing
• Storage
State of Stream Processing
Stateless
• Filter
• Map
Stateful
• Aggregate
• Join
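The difference between stateless and stateful operators can be illustrated with plain Python generators over a hypothetical order stream. Filter and map look at one record at a time; aggregate and join must remember something across records. The join here is a simplification: it reads the (small) customer stream completely before processing orders, which approximates a stream-table join; a true stream-stream join would need windowed state on both sides.

```python
from collections import defaultdict

# Hypothetical sample streams.
orders = [
    {"order_id": 1, "customer": "c1", "amount": 30},
    {"order_id": 2, "customer": "c2", "amount": 5},
    {"order_id": 3, "customer": "c1", "amount": 15},
]
customers = [
    {"customer": "c1", "country": "AR"},
    {"customer": "c2", "country": "UY"},
]


# Stateless operators: each record is handled on its own; nothing is remembered.
def filter_min_amount(stream, min_amount):
    return (o for o in stream if o["amount"] >= min_amount)


def add_cents(stream):
    return ({**o, "amount_cents": o["amount"] * 100} for o in stream)


# Stateful operators: results depend on state kept across records.
def running_totals(stream):
    totals = defaultdict(int)  # state: customer -> total amount so far
    for o in stream:
        totals[o["customer"]] += o["amount"]
        yield o["customer"], totals[o["customer"]]


def join_customers(order_stream, customer_stream):
    # State: a lookup table built from the customer stream (stream-table join).
    table = {c["customer"]: c for c in customer_stream}
    for o in order_stream:
        if o["customer"] in table:
            yield {**o, "country": table[o["customer"]]["country"]}


large = list(add_cents(filter_min_amount(orders, 10)))
totals = list(running_totals(orders))
countries = [o["country"] for o in join_customers(orders, customers)]
print(totals)  # [('c1', 30), ('c2', 5), ('c1', 45)]
```

Because stateless operators keep nothing, they are easy to parallelize and restart; stateful ones need their state partitioned by key and made fault-tolerant, which is a large part of what stream processing frameworks provide.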
Typical Streaming Workflow
[Diagram: Producers → Streaming Platform → Stream Processing → Persistence → Consumer]
Streaming Data Pipeline
We need to collect the data, process it, store it, and finally serve it for analysis, search, machine learning, and dashboards.
[Diagram: Data Sources → Collect & Ingest Data → Process Data → Store Data → Serve Data, with the tool for each of the four stages still marked "?"]
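The four pipeline stages can be sketched as chained Python generators, so each record flows through collect, process, store, and serve one at a time. The JSON field names, the Celsius-to-Fahrenheit step, and the in-memory list used as the data store are all hypothetical stand-ins for the real tools a pipeline would use at each stage.

```python
import json


def collect(raw_lines):
    """Collect & ingest: parse raw JSON lines coming from a source."""
    for line in raw_lines:
        yield json.loads(line)


def process(records):
    """Process: drop readings without a value and convert Celsius to Fahrenheit."""
    for r in records:
        if r.get("temp_c") is not None:
            yield {"sensor": r["sensor"], "temp_f": r["temp_c"] * 9 / 5 + 32}


def store(records, sink):
    """Store: append each processed record to a sink (a list standing in for a data store)."""
    for r in records:
        sink.append(r)


def serve(sink):
    """Serve: answer a dashboard-style query (latest value per sensor)."""
    latest = {}
    for r in sink:
        latest[r["sensor"]] = r["temp_f"]
    return latest


raw = [
    '{"sensor": "s1", "temp_c": 20}',
    '{"sensor": "s2", "temp_c": null}',
    '{"sensor": "s1", "temp_c": 25}',
]
sink = []
store(process(collect(raw)), sink)  # records flow through the stages one at a time
print(serve(sink))  # {'s1': 77.0}
```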
Collect Data
We need to collect data from a wide array of inputs and write it to a wide array of outputs in real time.
• Pull-based
• Push-based
Change Data Capture (CDC)
• Database changefeeds
Collectors
• Custom collectors: Java, Python
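The difference between pull-based and push-based collection can be shown with a hypothetical in-memory source. A pull-based collector remembers its position and asks for new records when it polls; a push-based collector registers a callback and receives each record as soon as it is emitted. The class and method names are invented for this sketch.

```python
from typing import Callable, List


class Source:
    """Hypothetical data source that supports both collection styles."""

    def __init__(self):
        self.log: List[dict] = []  # every record emitted so far
        self.subscribers: List[Callable[[dict], None]] = []

    def emit(self, record: dict) -> None:
        self.log.append(record)            # stays available for pull-based collectors
        for callback in self.subscribers:  # delivered at once to push-based collectors
            callback(record)

    def read_from(self, position: int) -> List[dict]:
        """Pull-based: the collector asks for the records after `position`."""
        return self.log[position:]

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        """Push-based: the source calls `callback` for every new record."""
        self.subscribers.append(callback)


class PullCollector:
    def __init__(self, source: Source):
        self.source = source
        self.position = 0  # how far this collector has read
        self.collected: List[dict] = []

    def poll(self) -> int:
        new = self.source.read_from(self.position)
        self.collected.extend(new)
        self.position += len(new)
        return len(new)


src = Source()
pushed: List[dict] = []
src.subscribe(pushed.append)  # push-based collector: just a list append here
puller = PullCollector(src)

src.emit({"id": 1})
src.emit({"id": 2})
print(len(pushed), len(puller.collected))  # 2 0  (the pull collector has not polled yet)
print(puller.poll())  # 2
src.emit({"id": 3})
print(puller.poll())  # 1
```

Pull-based collection lets the collector control its own pace and resume from a saved position; push-based collection has lower latency but requires the receiver to keep up with the source.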
Streaming Data Ingestion
When data is ingested in real time, each data item is imported as soon as the source emits it. An effective data ingestion process begins by prioritizing data sources, validating individual files, and routing data items to the correct destination.
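The three ingestion steps (prioritize, validate, route) can be sketched for one micro-batch of items. The required fields, the source priorities, the destination names, and the "dead-letter" destination for rejected items are all hypothetical configuration chosen for this example.

```python
import heapq
from collections import defaultdict

# Hypothetical configuration for this sketch.
REQUIRED_FIELDS = {"source", "type", "payload"}
SOURCE_PRIORITY = {"payments": 0, "clickstream": 1}  # lower number = handled first
ROUTES = {"metric": "metrics-topic", "log": "logs-topic"}
DEAD_LETTER = "dead-letter"


def ingest(batch):
    """Prioritize, validate, and route one micro-batch of items."""
    heap = []
    for seq, item in enumerate(batch):
        # Unknown sources get the lowest priority.
        priority = SOURCE_PRIORITY.get(item.get("source"), len(SOURCE_PRIORITY))
        heapq.heappush(heap, (priority, seq, item))  # seq keeps arrival order within a priority

    routed = defaultdict(list)
    while heap:
        _, _, item = heapq.heappop(heap)
        if not REQUIRED_FIELDS.issubset(item):
            routed[DEAD_LETTER].append(item)  # invalid: missing required fields
        else:
            # Valid items go to the destination for their type; unknown types are rejected too.
            routed[ROUTES.get(item["type"], DEAD_LETTER)].append(item)
    return dict(routed)


batch = [
    {"source": "clickstream", "type": "log", "payload": "GET /"},
    {"source": "payments", "type": "metric", "payload": 42},
    {"source": "payments", "type": "metric"},  # missing payload
    {"source": "iot", "type": "metric", "payload": 7},
]
result = ingest(batch)
print({dest: len(items) for dest, items in result.items()})
# {'metrics-topic': 2, 'dead-letter': 1, 'logs-topic': 1}
```

Sending invalid items to a separate destination instead of dropping them keeps the pipeline running while preserving the bad records for later inspection.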
Apache Kafka
Apache Kafka is a distributed system designed for streams. It is built to be fault-tolerant, high-throughput, and horizontally scalable, and it allows data streams and stream processing applications to be distributed geographically.
[Diagram: Kafka Topics]
How Kafka Works
Kafka’s system design can be thought of as that of a distributed commit log, where incoming data is written sequentially to disk. There are four main components involved in moving data in and out of Kafka:
• Topics
• Producers
• Consumers
• Brokers
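The commit-log model and the four components can be illustrated with a toy, single-process Python model. This is not Kafka's API: there is no disk, replication, or network, Kafka's default partitioner uses a murmur2 hash rather than the `zlib.crc32` used here, and real consumer groups commit their offsets to the cluster rather than keeping them in memory. It only shows the core ideas: topics split into append-only partitions, producers choosing a partition by key, and consumers reading independently by offset.

```python
import zlib
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (key, value)


class Broker:
    """Toy stand-in for a broker: each topic is a list of partitions,
    and each partition is an append-only list (a commit log)."""

    def __init__(self):
        self.topics: Dict[str, List[List[Record]]] = {}

    def create_topic(self, name: str, partitions: int) -> None:
        self.topics[name] = [[] for _ in range(partitions)]

    def append(self, topic: str, partition: int, key: str, value: str) -> int:
        log = self.topics[topic][partition]
        log.append((key, value))
        return len(log) - 1  # offset of the new record

    def read(self, topic: str, partition: int, offset: int) -> List[Record]:
        return self.topics[topic][partition][offset:]


class Producer:
    def __init__(self, broker: Broker):
        self.broker = broker

    def send(self, topic: str, key: str, value: str) -> Tuple[int, int]:
        n = len(self.broker.topics[topic])
        # Same key -> same partition, so records for one key stay in order.
        partition = zlib.crc32(key.encode()) % n
        return partition, self.broker.append(topic, partition, key, value)


class Consumer:
    """Tracks its own offset per partition, so several consumers can read
    the same log independently (and could replay from an earlier offset)."""

    def __init__(self, broker: Broker, topic: str):
        self.broker, self.topic = broker, topic
        self.offsets = [0] * len(broker.topics[topic])

    def poll(self) -> List[Record]:
        records: List[Record] = []
        for p, offset in enumerate(self.offsets):
            new = self.broker.read(self.topic, p, offset)
            records.extend(new)
            self.offsets[p] += len(new)
        return records


broker = Broker()
broker.create_topic("events", partitions=3)
producer = Producer(broker)
for i in range(4):
    producer.send("events", key="sensor-1", value=f"reading-{i}")
producer.send("events", key="sensor-2", value="reading-x")

first, second = Consumer(broker, "events"), Consumer(broker, "events")
records = first.poll()
print(len(records), len(first.poll()), len(second.poll()))  # 5 0 5
```

Reading never removes data from the log; each consumer simply advances its own offsets, which is why a second consumer still sees all five records.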
[Diagram: Streaming Data Pipeline, with the Kafka Streaming Platform in the Collect & Ingest Data stage; Process Data, Store Data, and Serve Data still marked "?"]
Data Stream Processing
There is a wide variety of technologies, frameworks, and libraries for building applications that process streams of data. Frameworks such as Flink, Storm, Samza, and Spark can all process streams of data in real time, with application code written in languages such as Java, Python, or Scala, and they do an excellent job. But if you are looking for something simpler to build data pipelines with minimal data processing, you should try Apache NiFi.
Apache NiFi
Apache NiFi is an integrated data platform that automates the flow of data between systems. It provides real-time control that makes it easy to manage the movement of data between any source and any destination. Apache NiFi helps move and track data.
Apache NiFi is used for:
• Reliable and secure transfer of data between systems
• Delivery of data from sources to analytic platforms
• Enrichment and preparation of data:
  • Conversion between formats
  • Extraction/Parsing/Splitting/Aggregation
  • Schema translation
  • Routing decisions
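In NiFi these steps are configured as processors in its web UI rather than written as code. To show what the data preparation steps do, the following plain-Python sketch (not NiFi's API) converts CSV rows to JSON, renames fields as a simple schema translation, and makes a content-based routing decision. The field mapping, the destination names, and the threshold are hypothetical.

```python
import csv
import io
import json

# Hypothetical schema translation: source column -> target field name.
FIELD_MAP = {"ts": "timestamp", "dev": "device_id", "val": "value"}


def csv_to_json(csv_text: str):
    """Conversion between formats plus schema translation: CSV rows -> JSON documents."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        doc = {FIELD_MAP.get(col, col): val for col, val in row.items()}
        doc["value"] = float(doc["value"])  # assumes every row has a numeric value
        yield json.dumps(doc)


def route(doc: str, threshold: float = 100.0) -> str:
    """Routing decision based on record content (threshold is illustrative)."""
    return "alerts" if json.loads(doc)["value"] > threshold else "archive"


data = "ts,dev,val\n1,a,42.5\n2,b,150\n"
for doc in csv_to_json(data):
    print(route(doc), doc)
```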
[Diagram: Streaming Data Pipeline, with Data Stream Processing in the Process Data stage; Store Data and Serve Data still marked "?"]
Storing Streaming Data
For storing large volumes of streaming data, we need a data store that supports fast writes and scales with the data volume.
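One common way to keep writes cheap is to make them appends into time-partitioned buckets. The following in-memory Python sketch is a hypothetical illustration of that idea, not the design of any specific data store: a write touches only one bucket, and a time-range query reads only the buckets that overlap the requested interval. Because buckets are independent, a real system could spread them across nodes or expire old ones as a unit.

```python
from collections import defaultdict


class TimeBucketedStore:
    """Append-only store that groups records into fixed-size time buckets.
    A write is an append to one bucket; a range query reads only the buckets
    that overlap the requested interval."""

    def __init__(self, bucket_s: int = 3600):
        self.bucket_s = bucket_s
        self.buckets = defaultdict(list)  # bucket id -> [(ts, record), ...] in arrival order

    def write(self, ts: int, record: dict) -> None:
        self.buckets[ts // self.bucket_s].append((ts, record))

    def query(self, start: int, end: int) -> list:
        """Records with start <= ts < end, grouped by bucket, arrival order within a bucket."""
        result = []
        for bucket in range(start // self.bucket_s, (end - 1) // self.bucket_s + 1):
            result.extend(rec for ts, rec in self.buckets.get(bucket, []) if start <= ts < end)
        return result


store = TimeBucketedStore(bucket_s=60)
for ts in [5, 30, 65, 130, 170]:
    store.write(ts, {"ts": ts})
print([r["ts"] for r in store.query(30, 131)])  # [30, 65, 130]
```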
[Diagram: Streaming Data Pipeline, with Storing Streaming Data in the Store Data stage; Serve Data still marked "?"]
Serving the Data
The last stage serves the processed event data to end applications such as dashboards, business intelligence tools, and other applications that use it.
[Diagram: Streaming Data Pipeline, complete workflow of streaming data: Data Sources → Collect & Ingest Data → Process Data → Store Data → Serve Data]
What is Stishovite?
Stishovite is a centralized console for managing the entire pipeline of the xGem Streaming Platform.
The xGem Streaming Platform integrates several open source products.
https://gitlab.com/xgem/stishovite
Thanks!
Jorge Hirtz
@jahtux


xGem Data Stream Processing

  • 3. Agenda: Overview; What is Streaming Data?; Streaming Data Pipeline; Streaming Platform Components; What is Stishovite?
  • 4. Overview: monitoring events in real time. Monitoring alerts: sending alerts based on the detection of event patterns in data streams. Dashboards: real-time operational dashboards. Search: full-text querying, aggregations, and geo data in near real time. Analytics: analyzing large volumes of data quickly and in near real time.
  • 5. Streaming Data: streaming data is data that is generated continuously by thousands of data sources, which typically send data records simultaneously and in small sizes (on the order of kilobytes). This data needs to be processed sequentially and incrementally, either record by record or over sliding time windows, and is used for a wide variety of analytics, including correlations, aggregations, filtering, and sampling. Stream processing has become the de facto standard for building real-time ETL and stream analytics applications. We see batch workloads moving to stream processing in order to act on the data and derive insights faster. With the explosion of data such as IoT and machine-generated data, stream processing combined with predictive analytics is driving tremendous business value.
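To make "incrementally, record by record, over sliding time windows" concrete, here is a minimal sketch of a sliding-window average that updates its result as each record arrives instead of recomputing over the whole dataset. It assumes records arrive in timestamp order; the `Reading` and `SlidingWindowAverage` names are hypothetical and not part of any framework mentioned in this deck.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Reading:
    timestamp: float  # seconds; assumed to arrive in timestamp order
    value: float


class SlidingWindowAverage:
    """Average of the readings seen in the last `window_seconds`,
    updated incrementally as each record arrives."""

    def __init__(self, window_seconds: float) -> None:
        if window_seconds <= 0:
            raise ValueError("window_seconds must be positive")
        self.window_seconds = window_seconds
        self._records: deque[Reading] = deque()
        self._total = 0.0

    def add(self, reading: Reading) -> float:
        # Record by record: fold the new value into the running total.
        self._records.append(reading)
        self._total += reading.value
        # Evict readings that have fallen out of the time window.
        cutoff = reading.timestamp - self.window_seconds
        while self._records and self._records[0].timestamp <= cutoff:
            self._total -= self._records.popleft().value
        return self._total / len(self._records)
```

For example, with a 10-second window, readings of 10, 20, and 30 at t = 0, 5, and 12 seconds produce averages of 10.0, 15.0, and 25.0, because the first reading leaves the window at t = 12. Each update costs time proportional only to the records evicted, not to the full history.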
  • 6. Streaming Data Examples: streaming data examples include website, network, and application monitoring; fraud detection; advertising; Internet of Things sensors (trucks, transportation vehicles, industrial equipment); machine-generated data; social analytics; private searching; and others.
  • 7. Key Requirements for Streaming Data: persistence, performance, scale, parallelism and partitioning, messaging, processing, and storage.
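The "parallel & partitioned" requirement is commonly met by assigning each record to a partition based on its key, so that records with the same key stay together (and in order) while different partitions are processed in parallel. The sketch below is a generic illustration, not any specific platform's partitioner; the function names are hypothetical, and CRC32 is used only because, unlike Python's built-in `hash()` on strings, it is stable across processes. Real systems use their own hash functions.

```python
import zlib
from typing import List, Sequence, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition index deterministically."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_records(
    records: Sequence[Tuple[str, int]], num_partitions: int
) -> List[List[Tuple[str, int]]]:
    """Split (key, value) records into partitions, preserving per-key order."""
    partitions: List[List[Tuple[str, int]]] = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions
```

Because the mapping depends only on the key, every record from, say, `"truck-1"` lands in the same partition, so a consumer of that partition sees that truck's events in the order they were produced.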
  • 8. State in Stream Processing: stateless operations such as filter and map; stateful operations such as aggregate and join.
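The stateless/stateful distinction can be illustrated with plain Python generators: filter and map look at one event at a time, while an aggregate must remember earlier events. This is a minimal sketch; the `(sensor_id, reading)` event shape and the function names are hypothetical, and a production stream processor would also persist the aggregate's state so it survives restarts.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, int]  # (sensor_id, reading), a hypothetical event shape


def only_positive(events: Iterable[Event]) -> Iterator[Event]:
    # Stateless filter: each event is kept or dropped on its own.
    return (e for e in events if e[1] > 0)


def doubled(events: Iterable[Event]) -> Iterator[Event]:
    # Stateless map: each event is transformed independently.
    return ((sensor, value * 2) for sensor, value in events)


def running_totals(events: Iterable[Event]) -> Iterator[Tuple[str, int]]:
    # Stateful aggregate: each output depends on all earlier events with
    # the same key, so the operator must keep per-key state.
    totals: Dict[str, int] = defaultdict(int)
    for sensor, value in events:
        totals[sensor] += value
        yield sensor, totals[sensor]
```

Operators compose like a pipeline, for example `running_totals(only_positive(events))`. A stream join is stateful for the same reason as the aggregate: it has to buffer events from one side until matching events arrive on the other.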
  • 10. Streaming Data Pipeline: we need to collect the data, process it, store it, and finally serve it for analysis, search, machine learning, and dashboards. [Diagram: Data Sources → Collect & Ingest Data → Process Data → Store Data → Serve Data]
  • 11. Collect Data: we need to collect data from a wide array of inputs and write it to a wide array of outputs in real time. Collectors can be pull-based or push-based; options include Change Data Capture (CDC), database changefeeds, and custom collectors written in Java or Python.
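The difference between pull-based and push-based collection is mainly about who initiates the transfer. In this minimal sketch (the class names are hypothetical, and a Python list stands in for a real table, log, or API), the pull collector polls the source and tracks a read position so nothing is read twice, while the push collector is called by the source and forwards each record to its subscribers.

```python
from typing import Callable, List, Sequence


class PullCollector:
    """Pull-based: the collector asks the source for new records and keeps
    a cursor so that each record is read exactly once."""

    def __init__(self, source: Sequence[dict]) -> None:
        self.source = source  # stand-in for a table, log, or HTTP API
        self.position = 0

    def poll(self, max_records: int = 100) -> List[dict]:
        batch = list(self.source[self.position:self.position + max_records])
        self.position += len(batch)
        return batch


class PushCollector:
    """Push-based: the source calls on_record() whenever it emits a record,
    and the collector forwards it to every registered sink."""

    def __init__(self) -> None:
        self.sinks: List[Callable[[dict], None]] = []

    def subscribe(self, sink: Callable[[dict], None]) -> None:
        self.sinks.append(sink)

    def on_record(self, record: dict) -> None:
        for sink in self.sinks:
            sink(record)
```

Pull-based collection lets the collector control its own pace (and retry from a saved position), whereas push-based collection delivers records with less delay but requires the collector to keep up with the source.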
  • 12. Streaming Data Ingestion: when data is ingested in real time, each data item is imported as soon as the source emits it. An effective data ingestion process begins by prioritizing data sources, validating individual files, and routing data items to the correct destination. [Diagram: Kafka Topics]
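The validate-then-route step can be sketched as follows. This is an illustration under stated assumptions, not the platform's implementation: the topic names, the required fields, and the dead-letter destination are all hypothetical, and a dictionary of lists stands in for real Kafka topics. Prioritizing sources is not shown.

```python
import json
from typing import Dict, List, Optional

# Hypothetical topic names; a real deployment defines its own.
ROUTES = {"click": "web-clicks", "sensor": "iot-sensors"}
DEAD_LETTER = "invalid-records"
REQUIRED_FIELDS = ("type", "source", "ts")


def validate(raw: bytes) -> Optional[dict]:
    """Return the parsed record, or None if it is not a JSON object
    containing every required field."""
    try:
        record = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return None
    if not isinstance(record, dict) or any(f not in record for f in REQUIRED_FIELDS):
        return None
    return record


def route(raw: bytes, topics: Dict[str, List[dict]]) -> str:
    """Validate one incoming item and append it to its destination topic."""
    record = validate(raw)
    if record is None:
        topic = DEAD_LETTER
        record = {"raw": raw.decode("utf-8", errors="replace")}
    else:
        # Unknown record types also go to the dead-letter topic.
        topic = ROUTES.get(record["type"], DEAD_LETTER)
    topics.setdefault(topic, []).append(record)
    return topic
```

Sending invalid items to a separate destination instead of discarding them keeps ingestion running while preserving the bad input for later inspection.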
  • 13. Apache Kafka: Apache Kafka is a distributed system designed for streams. It is built to be fault-tolerant, high-throughput, and horizontally scalable, and it allows data streams and stream processing applications to be distributed geographically.
  • 14. How Kafka Works: Kafka's system design can be thought of as a distributed commit log, where incoming data is written sequentially to disk. Four main components are involved in moving data in and out of Kafka: topics, producers, consumers, and brokers.
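The commit-log idea can be modeled in a few lines. This toy, in-memory sketch is not Kafka's API: `CommitLog` plays the role of one broker holding a single topic's partitions (with no disk, replication, or networking), producers call `append`, and each `Consumer` tracks its own offset. It shows the two properties that matter: writes only append, and reads do not remove data, so independent consumers can read the same messages at their own pace.

```python
from typing import Dict, List, Tuple


class CommitLog:
    """Toy model of a topic: a list of append-only partitions."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[bytes]] = [[] for _ in range(num_partitions)]

    def append(self, partition: int, message: bytes) -> int:
        # Producers only append; the returned offset is the message's
        # fixed position within its partition.
        log = self.partitions[partition]
        log.append(message)
        return len(log) - 1

    def read(self, partition: int, offset: int, max_messages: int = 10) -> List[bytes]:
        # Reading never deletes, so many consumers can read the same data.
        return self.partitions[partition][offset:offset + max_messages]


class Consumer:
    """Each consumer remembers, per partition, the next offset to read."""

    def __init__(self, log: CommitLog) -> None:
        self.log = log
        self.offsets: Dict[int, int] = {}

    def poll(self, partition: int) -> List[Tuple[int, bytes]]:
        start = self.offsets.get(partition, 0)
        messages = self.log.read(partition, start)
        self.offsets[partition] = start + len(messages)
        return [(start + i, m) for i, m in enumerate(messages)]
```

Because consumers hold their own offsets, a slow consumer does not block a fast one, and a consumer can replay data by resetting its offset.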
  • 16. Streaming Data Pipeline (progress): Collect & Ingest Data is covered; Process Data, Store Data, and Serve Data remain.
  • 17. Data Stream Processing: there is a wide variety of technologies, frameworks, and libraries for building applications that process streams of data. Frameworks such as Flink, Storm, Samza, and Spark can all process streams of data in real time, with application code written in Java, Python, or Scala, and they do an excellent job. But if you are looking for something simpler to build data pipelines that need only minimal data processing, you should try Apache NiFi.
  • 18. Apache NiFi: Apache NiFi is an integrated data platform that automates the flow of data between systems. It provides real-time control that makes it easy to manage the movement of data between any source and any destination, and it helps move and track data. Apache NiFi is used for: reliable and secure transfer of data between systems; delivery of data from sources to analytic platforms; and enrichment and preparation of data (conversion between formats; extraction, parsing, splitting, and aggregation; schema translation; routing decisions).
  • 19. Streaming Data Pipeline (progress): Collect & Ingest Data and Data Stream Processing are covered; Store Data and Serve Data remain.
  • 20. Storing Streaming Data: to store large volumes of streaming data, we need a data store that supports fast writes and scales.
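The deck does not name a specific data store. One common technique for keeping write throughput high, whatever the store, is to buffer incoming records and write them in batches, trading a little latency for far fewer transactions. The sketch below uses Python's built-in `sqlite3` purely as a stand-in store; the `readings` table schema and the `BatchingWriter` class are hypothetical.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[float, str, float]  # (timestamp, sensor_id, value), hypothetical schema


class BatchingWriter:
    """Buffers records and inserts them one batch per transaction."""

    def __init__(self, conn: sqlite3.Connection, batch_size: int = 500) -> None:
        self.conn = conn
        self.batch_size = batch_size
        self.buffer: List[Row] = []
        conn.execute(
            "CREATE TABLE IF NOT EXISTS readings (ts REAL, sensor TEXT, value REAL)"
        )

    def write(self, row: Row) -> None:
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Call flush() on shutdown too, or a partial batch stays buffered.
        if not self.buffer:
            return
        with self.conn:  # commits the whole batch as one transaction
            self.conn.executemany(
                "INSERT INTO readings (ts, sensor, value) VALUES (?, ?, ?)",
                self.buffer,
            )
        self.buffer.clear()
```

The batch size is the tuning knob: larger batches mean fewer round trips and commits, but records wait longer in memory and more are at risk if the process crashes before a flush. Production writers usually also flush on a timer.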
  • 21. Streaming Data Pipeline (progress): Collect & Ingest Data, Data Stream Processing, and Storing Streaming Data are covered; Serve Data remains.
  • 22. Serving the Data: the processed event data is served to end applications such as dashboards, business intelligence tools, and other applications that use it.
  • 23. Streaming Data Pipeline, complete workflow of streaming data: Data Sources → Collect & Ingest Data → Process Data → Store Data → Serve Data.
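The complete workflow can be wired together end to end as a toy example. In this sketch, each stage is a small function: collect parses raw JSON lines, process keeps only error events, store counts errors per service in a `Counter` standing in for a database, and serve answers a dashboard-style "top services by errors" query. The event fields (`service`, `level`) and all function names are hypothetical; in the platform described here, each stage would be a separate system such as Kafka or NiFi.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Iterator


def collect(lines: Iterable[str]) -> Iterator[dict]:
    # Collect & ingest: parse raw JSON lines and drop malformed input.
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict):
            yield record


def process(events: Iterable[dict]) -> Iterator[dict]:
    # Process: keep only error events (a stateless filter).
    return (e for e in events if e.get("level") == "error")


def store(events: Iterable[dict], table: Counter) -> None:
    # Store: count errors per service (stand-in for a real data store).
    for e in events:
        table[e.get("service", "unknown")] += 1


def serve(table: Counter, top: int = 3) -> Dict[str, int]:
    # Serve: answer the kind of query a dashboard might issue.
    return dict(table.most_common(top))
```

Because the stages are generators, each record flows through collect and process one at a time as `store` consumes it, which mirrors the record-by-record model of a streaming pipeline rather than a batch job that loads everything first.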
  • 24. What is Stishovite? Stishovite is a centralized console for managing the entire pipeline of the xGem Streaming Platform. The xGem Streaming Platform integrates different open source products. https://gitlab.com/xgem/stishovite