CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3
18 September 2012

•    What is Streaming Data?
•    Why Kafka?
•    Kafka Architecture
•    Use Case: Prospective Search




Overview
•  Spin-off of MeMo News AG, the leading provider for Social Media
   Monitoring & Analytics in Switzerland
•  Big Data expert, focused on Hadoop,
   HBase and Solr
•  Objective: Transforming data into
   insights




About Sentric
CC 2.0 by audreyjm529 | http://flic.kr/p/mNMtL
•  Website Activity Data
      •  User activity
      •  Server activity
•  Social Media Data
•  News Data
•  …

•  How to Analyze in Real-Time?

What is Streaming Data?

Data Streams
[Figure: time axis t with a "now" marker; labels: Offline (Hadoop/MR) and Online (Kafka)]

What is Streaming Data?

Offline vs. Online
CC 2.0 by Tom Hilton | http://flic.kr/p/54KSXy	
  
•  Message Queues (RabbitMQ, ActiveMQ)
     •       do not scale well / persistence is not built for large message backlogs
•  Flume / Scribe
     •       Log-Aggregation only, high throughput and
             scalable, push model
     •       Focus on offline consumption
•  Kafka
     •       High throughput and scalable, pull model
     •       Different consumption profiles


Why Kafka?

Streaming Systems
Source: http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf


Why Kafka?

Consumer Performance
CC 2.0 by Presidente | http://flic.kr/p/2ptSZ	
  
•      Messaging System
•      Publish-Subscribe
•      Persistent
•      High-Throughput




Kafka Architecture

Key Concepts
[Figure: Producers --Push--> Broker <--Pull-- Consumers, with ZooKeeper shown above the Broker]


Kafka Architecture

Messaging
[Figure: Topics (logs, …, page-views); each topic holds messages (Msg) that are read by Consumers]


Kafka Architecture

Publish-Subscribe
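
The publish-subscribe picture above can be sketched in a few lines of Python. This is only an illustrative in-memory toy, not Kafka's client API: the names `Broker`, `Consumer`, `publish`, `fetch` and `poll` are made up for this sketch, and the topic names are taken from the slide. Producers push messages into a topic; each consumer pulls at its own pace and keeps its own read position.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory broker (hypothetical): one append-only list per topic."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, message):
        # Producers push: the message is appended to the topic's list.
        self.topics[topic].append(message)

    def fetch(self, topic, position, max_messages=10):
        # Consumers pull: return messages starting at the given position.
        return self.topics[topic][position:position + max_messages]


class Consumer:
    """Pulls from one topic and remembers its own read position."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.position = 0

    def poll(self):
        messages = self.broker.fetch(self.topic, self.position)
        self.position += len(messages)
        return messages


broker = Broker()
broker.publish("logs", "GET /index.html 200")
broker.publish("page-views", "user=42 page=/home")
broker.publish("page-views", "user=7 page=/news")

alerts = Consumer(broker, "page-views")
archive = Consumer(broker, "page-views")

first_poll = alerts.poll()     # both page-view messages
second_poll = alerts.poll()    # nothing new since the last poll
archive_poll = archive.poll()  # the same two messages, read independently
```

Because every consumer tracks its own position, a slow consumer never holds back a fast one, which is one way to read "different consumption profiles".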
•  Persists messages to disk
      •     Topic is the base abstraction
      •     Binary write-ahead log
      •     No explicit message ID
      •     Messages are addressed by their offset (byte position in the log)
•  Messages are retained for a configurable period
      •  Default is 7 days




Kafka Architecture

Persistent
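
A minimal sketch of the "offset as message ID" idea, assuming a simple length-prefixed record layout. The layout and the `MessageLog` class are assumptions made for illustration and are not Kafka's actual on-disk format; an in-memory `bytearray` stands in for the log file.

```python
import struct


class MessageLog:
    """Append-only binary log; a message is addressed by its byte offset."""

    HEADER = struct.Struct(">I")  # assumed layout: 4-byte big-endian length

    def __init__(self):
        self.data = bytearray()  # stands in for a file on disk

    def append(self, payload: bytes) -> int:
        """Append one message and return the byte offset it was written at."""
        offset = len(self.data)
        self.data += self.HEADER.pack(len(payload)) + payload
        return offset

    def read(self, offset: int):
        """Return (payload, next_offset) for the message starting at offset."""
        (length,) = self.HEADER.unpack_from(self.data, offset)
        start = offset + self.HEADER.size
        payload = bytes(self.data[start:start + length])
        return payload, start + length


log = MessageLog()
first = log.append(b"hello")
second = log.append(b"kafka")

payload, next_offset = log.read(first)
```

No separate ID index is needed: the offset returned by `append` is all a reader needs, and reading one message yields the offset of the next.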
•  API simplicity
      •  Append message
      •  Fetch messages from a given byte position
•  Batching
•  Stateless broker
•  O(1) disk access (no seeks)
•  Use of operating system features



Kafka Architecture

High-Throughput
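
How batching and a stateless broker fit together can be sketched as follows. This is a toy under stated assumptions, not Kafka's protocol: `StatelessBroker` and `consume` are hypothetical names, and messages are newline-delimited text, so a message must not itself contain a newline. The broker only appends bytes and returns a byte range; the consumer decides where to read and carries its offset between calls.

```python
class StatelessBroker:
    """Holds only the log bytes; it keeps no per-consumer state."""

    def __init__(self):
        self.log = bytearray()

    def append_batch(self, messages):
        # Producer-side batching: many messages arrive in one call.
        for message in messages:
            self.log += message.encode("utf-8") + b"\n"

    def fetch(self, offset, max_bytes):
        # Return raw bytes from the given byte position; may end mid-message.
        return bytes(self.log[offset:offset + max_bytes])


def consume(broker, offset, max_bytes=64):
    """Fetch one chunk and return (complete messages, new offset)."""
    chunk = broker.fetch(offset, max_bytes)
    end = chunk.rfind(b"\n") + 1  # cut off a trailing partial message
    complete = chunk[:end]
    messages = [m.decode("utf-8") for m in complete.split(b"\n") if m]
    return messages, offset + end


broker = StatelessBroker()
broker.append_batch(["a1", "b22", "c333"])

# The consumer, not the broker, remembers how far it has read.
batch1, offset = consume(broker, 0, max_bytes=8)
batch2, offset2 = consume(broker, offset, max_bytes=8)
```

A consumer that crashes can resume from any stored offset, or rewind to an older one, without the broker having to know about it.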
CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf
[Figure: solution architecture with the components n News Agents, Kafka, REST, RT Alerts, Web-UI, HBase, MySQL and Solr. Icons by http://dryicons.com]


Prospective Search

Solution Architecture
[Figure: labels Kafka Consumer, Pull (Batch), Processing, Prospective Search and RT Alerts. Icons by http://dryicons.com]


Prospective Search

Prospective Search with Kafka
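
Prospective search inverts the usual search flow: queries are stored first, and each incoming document is matched against them. The sketch below assumes a deliberately simple matching rule (a query matches when all of its terms occur in the document) and uses hypothetical names (`ProspectiveSearch`, `register`, `match`, `process_batch`); it is not the matching logic of the system shown on the slides. A batch of documents, as a Kafka consumer might pull it, is turned into a list of alerts.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class ProspectiveSearch:
    """Stores queries up front and matches incoming documents against them."""

    def __init__(self):
        self.queries = {}  # query id -> set of required terms

    def register(self, query_id, text):
        self.queries[query_id] = set(tokenize(text))

    def match(self, document):
        """Return sorted ids of stored queries whose terms all occur in the document."""
        terms = set(tokenize(document))
        return sorted(qid for qid, required in self.queries.items()
                      if required <= terms)


def process_batch(search, batch):
    """Turn one pulled batch of documents into (query id, document) alerts."""
    alerts = []
    for document in batch:
        for query_id in search.match(document):
            alerts.append((query_id, document))
    return alerts


search = ProspectiveSearch()
search.register("kafka-news", "Kafka release")
search.register("hbase", "HBase")

batch = [
    "Apache Kafka announces a new release",
    "Solr tips and tricks",
    "HBase and Kafka release notes",
]
alerts = process_batch(search, batch)
```

In this toy the first document triggers one alert, the second none, and the third triggers both stored queries.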
•  http://incubator.apache.org/kafka/
•  http://sites.computer.org/debull/A12june/A12JUN-CD.pdf




Resources to get started
                            Questions?
           Christian Gügi, christian.guegi@sentric.ch




Swiss Big Data User Group

Thank you!

Online Media Data Stream Processing with Kafka

  • 1. CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3
  • 2. Overview (18 September 2012): What is Streaming Data?; Why Kafka?; Kafka Architecture; Use Case: Prospective Search
  • 3. About Sentric: Spin-off of MeMo News AG, the leading provider for Social Media Monitoring & Analytics in Switzerland; Big Data expert, focused on Hadoop, HBase and Solr; Objective: Transforming data into insights
  • 4. CC 2.0 by audreyjm529| http://flic.kr/p/mNMtL  
  • 5. What is Streaming Data? / Data Streams: Website Activity Data (user activity, server activity); Social Media Data; News Data; …; How to analyze in real time?
  • 6. What is Streaming Data? / Offline vs. Online: [Figure: time axis t with a "now" marker; Offline (Hadoop/MR) vs. Online (Kafka)]
  • 7. CC 2.0 by Tom Hilton | http://flic.kr/p/54KSXy  
  • 8. Why Kafka? / Streaming Systems: Message Queues (RabbitMQ, ActiveMQ): do not scale / have no persistence; Flume / Scribe: log aggregation only, high throughput and scalable, push model, focus on offline consumption; Kafka: high throughput and scalable, pull model, different consumption profiles
  • 9. Why Kafka? / Consumer Performance: [Figure] Source: http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
  • 10. CC 2.0 by Presidente | http://flic.kr/p/2ptSZ  
  • 11. Kafka Architecture / Key Concepts: Messaging System; Publish-Subscribe; Persistent; High-Throughput
  • 12. Kafka Architecture / Messaging: [Diagram: Producers push to the Broker; Consumers pull from the Broker; ZooKeeper]
  • 13. Kafka Architecture / Publish-Subscribe: [Diagram: Topics (logs, …, page-views); messages (Msg) delivered to Consumers]
  • 14. Kafka Architecture / Persistent: Persists messages to disk; Topic is the base abstraction; Binary write-ahead log; No message ID; Message offset as ID (byte position); Messages are retained for a specific time (default is 7 days)
  • 15. Kafka Architecture / High-Throughput: API simplicity (append message; fetch messages from a given byte position); Batching; Stateless broker; O(1) disk access (no seeks); Use of operating system features
  • 16. CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf
  • 17. Prospective Search / Solution Architecture: [Diagram: n News Agents, Kafka, REST, RT Alerts, Web-UI, HBase, MySQL, Solr] Icons by http://dryicons.com
  • 18. Prospective Search / Prospective Search with Kafka: [Diagram: Kafka Consumer, Pull (Batch), Processing, Prospective Search, RT Alerts] Icons by http://dryicons.com
  • 19. Resources to get started: http://incubator.apache.org/kafka/; http://sites.computer.org/debull/A12june/A12JUN-CD.pdf
  • 20. Thank you! Questions? Christian Gügi, christian.guegi@sentric.ch; Swiss Big Data User Group
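
The persistence and high-throughput slides (14 and 15) describe a log where a message is identified only by its byte position, the API is limited to append and fetch-from-offset, and the broker keeps no per-consumer state. A minimal in-memory sketch of that idea in Python follows. The class name `ByteOffsetLog`, the 4-byte length prefix, and the `max_bytes` parameter are assumptions made for illustration; this is not Kafka's actual storage format or client API.

```python
import struct


class ByteOffsetLog:
    """Toy in-memory stand-in for one topic's append-only log (hypothetical)."""

    def __init__(self):
        self._buf = bytearray()

    def append(self, payload: bytes) -> int:
        """Append one message and return the byte offset at which it starts."""
        offset = len(self._buf)
        # Assumed framing: 4-byte big-endian length, then the payload.
        self._buf += struct.pack(">I", len(payload)) + payload
        return offset

    def fetch(self, offset: int, max_bytes: int = 1024):
        """Return (messages, next_offset) for whole messages from `offset`."""
        messages = []
        end = min(len(self._buf), offset + max_bytes)
        pos = offset
        while pos + 4 <= end:
            (size,) = struct.unpack_from(">I", self._buf, pos)
            if pos + 4 + size > end:
                break  # message does not fit in this batch
            messages.append(bytes(self._buf[pos + 4 : pos + 4 + size]))
            pos += 4 + size
        return messages, pos


log = ByteOffsetLog()
first = log.append(b"page-view:/home")
second = log.append(b"page-view:/about")

# The consumer, not the log, remembers how far it has read.
consumer_offset = 0
batch, consumer_offset = log.fetch(consumer_offset)
```

Because the consumer holds the offset, it can re-read old messages simply by fetching from an earlier position, and a single fetch naturally returns a batch of messages rather than one at a time.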