SlideShare una empresa de Scribd logo
1 de 20
CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3
18.
                                    Septem
                                      ber
                                     2012



•    What is Streaming Data?         2


•    Why Kafka?
•    Kafka Architecture
•    Use Case: Prospective Search




Overview
18.
                                           Septem
                                             ber
                                            2012



•  Spin-off of MeMo News AG, the             3

   leading provider for Social Media
   Monitoring & Analytics in Switzerland
•  Big Data expert, focused on Hadoop,
   HBase and Solr
•  Objective: Transforming data into
   insights




About Sentric
CC 2.0 by audreyjm529| http://flic.kr/p/mNMtL	
  
18.
                                  Septem
                                    ber
                                   2012



•  Website Activity Data           5

      •  User activity
      •  Server activity
•  Social Media Data
•  News Data
•  …

•  How to Analyze in Real-Time?

What is Streaming Data?

Data Streams
18.
                                                                        Septem
                                                                          ber
                                                                         2012

                                                                         6


                                                              now	
  


           t	
  




                   Offline	
  (Hadoop/MR)	
     Online	
  (Ka5a)	
  




What is Streaming Data?

Offline vs. Online
CC 2.0 by Tom Hilton | http://flic.kr/p/54KSXy	
  
18.
                                                         Septem
                                                           ber
                                                          2012


•  Message Queues (RabbitMQ, ActiveMQ)                    8

     •       do not scale / have no persistence
•  Flume / Scribe
     •       Log-Aggregation only, high throughput and
             scalable, push model
     •       Focus on offline consumption
•  Kafka
     •       High throughput and scalable, pull model
     •       Different consumption profiles


Why Kafka?

Streaming Systems
18.
                                                                                                                   Septem
                                                                                                                     ber
                                                                                                                    2012

                                                                                                                    9




Source:	
  h<p://research.microso@.com/en-­‐us/um/people/srikanth/netdb11/netdb11papers/netdb11-­‐final12.pdf	
  


Why Kafka?

Consumer Performance
CC 2.0 by Presidente | http://flic.kr/p/2ptSZ	
  
18.
                           Septem
                             ber
                            2012



•      Messaging System     11


•      Publish-Subscribe
•      Persistent
•      High-Throughput




Kafka Architecture

Key Concepts
18.
                                                          Septem
                                                            ber
                                                           2012

                                                          12

                            ZooKeeper
          Producer                             Consumer



          Producer
                             Broker            Consumer


          Producer
                     Push               Pull
                                               Consumer

          Producer



Kafka Architecture

Messaging
18.
                                                     Septem
                                                       ber
                                                      2012


                           Topics                    13



        logs                  …         page-views



                Msg               Msg         Msg




Consumer        Consumer                 Consumer



Kafka Architecture

Publish-Subscribe
18.
                                               Septem
                                                 ber
                                                2012



•  Persists messages to disc                   14

      •     Topic is base abstraction
      •     Binary write ahead log
      •     No message ID
      •     Message offset ID (byte position)
•  Messages retained a specific time
      •  Default is 7 days




Kafka Architecture

Persistent
18.
                                                  Septem
                                                    ber
                                                   2012



•  API Simplicity                                 15

      •  Append message
      •  Fetch message from given byte position
•      Batching
•      Stateless Broker
•      O(1) disc access (no seeks)
•      Use of operating system features



Kafka Architecture

High-Throughput
CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf
18.
                                                                             Septem
                                                                               ber
                                                                              2012


                                        n News Agents                         17

                                      Kafka




                     REST

                                       RT Alerts

          Web-UI
                              HBase




          MySQL        Solr
                                              Icons by http://dryicons.com

Prospective Search

Solution Architecture
18.
                                                                                       Septem
                                                                                         ber
                                                                                        2012

                                                                                       18




                                                  Processing

                     Pull (Batch)




                                    Prospective
                                      Search
                                                        RT Alerts
             Kafka Consumer


                                                        Icons by http://dryicons.com

Prospective Search

Prospective Search with Kafka
18.
                                        Septem
                                          ber
                                         2012



•  http://incubator.apache.org/kafka/   19


•  http://sites.computer.org/debull/
   A12june/A12JUN-CD.pdf




Resources to get started
18.
                                                        Septem
                                                          ber
                                                         2012

                                                        20




                            Questions?
           Christian Gügi, christian.guegi@sentric.ch




Swiss Big Data User Group

Thank you!

Más contenido relacionado

Similar a Online Media Data Stream Processing with Kafka

How to Run Containerized Enterprise SQL Applications in the Cloud with NuoDB ...
How to Run Containerized Enterprise SQL Applications in the Cloud with NuoDB ...How to Run Containerized Enterprise SQL Applications in the Cloud with NuoDB ...
How to Run Containerized Enterprise SQL Applications in the Cloud with NuoDB ...MayaData Inc
 
NuoDB + MayaData: How to Run Containerized Enterprise SQL Applications in the...
NuoDB + MayaData: How to Run Containerized Enterprise SQL Applications in the...NuoDB + MayaData: How to Run Containerized Enterprise SQL Applications in the...
NuoDB + MayaData: How to Run Containerized Enterprise SQL Applications in the...NuoDB
 
Cloud Computing Principles and Paradigms: 4 the enterprise cloud computing pa...
Cloud Computing Principles and Paradigms: 4 the enterprise cloud computing pa...Cloud Computing Principles and Paradigms: 4 the enterprise cloud computing pa...
Cloud Computing Principles and Paradigms: 4 the enterprise cloud computing pa...Majid Hajibaba
 
Building businesspost.ie using Node.js
Building businesspost.ie using Node.jsBuilding businesspost.ie using Node.js
Building businesspost.ie using Node.jsRichard Rodger
 
A short introduction to Spark and its benefits
A short introduction to Spark and its benefitsA short introduction to Spark and its benefits
A short introduction to Spark and its benefitsJohan Picard
 
Accelerate Digital Transformation with IBM Cloud Private
Accelerate Digital Transformation with IBM Cloud PrivateAccelerate Digital Transformation with IBM Cloud Private
Accelerate Digital Transformation with IBM Cloud PrivateMichael Elder
 
AWS Meetup Paris - Short URL project by Pernod Ricard
AWS Meetup Paris - Short URL project by Pernod RicardAWS Meetup Paris - Short URL project by Pernod Ricard
AWS Meetup Paris - Short URL project by Pernod RicardCharles Rapp
 
Of metacello, git, scripting and things
Of metacello, git, scripting and thingsOf metacello, git, scripting and things
Of metacello, git, scripting and thingsESUG
 
Yieldbot Tech Talk, Sept 20, 2012
Yieldbot Tech Talk, Sept 20, 2012Yieldbot Tech Talk, Sept 20, 2012
Yieldbot Tech Talk, Sept 20, 2012yieldbot
 
#OSSPARIS17 - Développeurs, urbanisez la consommation de vos Clouds et APIs a...
#OSSPARIS17 - Développeurs, urbanisez la consommation de vos Clouds et APIs a...#OSSPARIS17 - Développeurs, urbanisez la consommation de vos Clouds et APIs a...
#OSSPARIS17 - Développeurs, urbanisez la consommation de vos Clouds et APIs a...Paris Open Source Summit
 
The ins and outs of SAP Cloud Platform's environments
The ins and outs of SAP Cloud Platform's environmentsThe ins and outs of SAP Cloud Platform's environments
The ins and outs of SAP Cloud Platform's environmentsMorten Wittrock
 
Presentation of OCCIware, a standard, extensible Cloud consumer platform at P...
Presentation of OCCIware, a standard, extensible Cloud consumer platform at P...Presentation of OCCIware, a standard, extensible Cloud consumer platform at P...
Presentation of OCCIware, a standard, extensible Cloud consumer platform at P...OCCIware
 
OCCIware @ Paris Open Source Summit 2017 - a standard, extensible Cloud consu...
OCCIware @ Paris Open Source Summit 2017 - a standard, extensible Cloud consu...OCCIware @ Paris Open Source Summit 2017 - a standard, extensible Cloud consu...
OCCIware @ Paris Open Source Summit 2017 - a standard, extensible Cloud consu...Marc Dutoo
 
Cloud Foundry, the Open Platform as a Service - Oscon - July 2012
Cloud Foundry, the Open Platform as a Service - Oscon - July 2012Cloud Foundry, the Open Platform as a Service - Oscon - July 2012
Cloud Foundry, the Open Platform as a Service - Oscon - July 2012Patrick Chanezon
 
Client Server 3.0 - 6 Ways JavaScript is Revolutionizing the Client/Server Re...
Client Server 3.0 - 6 Ways JavaScript is Revolutionizing the Client/Server Re...Client Server 3.0 - 6 Ways JavaScript is Revolutionizing the Client/Server Re...
Client Server 3.0 - 6 Ways JavaScript is Revolutionizing the Client/Server Re...Jesse Cravens
 
Workshop About Software Engineering Skills 2019
Workshop About Software Engineering Skills 2019Workshop About Software Engineering Skills 2019
Workshop About Software Engineering Skills 2019PhuocNT (Fresher.VN)
 

Similar a Online Media Data Stream Processing with Kafka (20)

How to Run Containerized Enterprise SQL Applications in the Cloud with NuoDB ...
How to Run Containerized Enterprise SQL Applications in the Cloud with NuoDB ...How to Run Containerized Enterprise SQL Applications in the Cloud with NuoDB ...
How to Run Containerized Enterprise SQL Applications in the Cloud with NuoDB ...
 
NuoDB + MayaData: How to Run Containerized Enterprise SQL Applications in the...
NuoDB + MayaData: How to Run Containerized Enterprise SQL Applications in the...NuoDB + MayaData: How to Run Containerized Enterprise SQL Applications in the...
NuoDB + MayaData: How to Run Containerized Enterprise SQL Applications in the...
 
Cloud Computing Principles and Paradigms: 4 the enterprise cloud computing pa...
Cloud Computing Principles and Paradigms: 4 the enterprise cloud computing pa...Cloud Computing Principles and Paradigms: 4 the enterprise cloud computing pa...
Cloud Computing Principles and Paradigms: 4 the enterprise cloud computing pa...
 
Building businesspost.ie using Node.js
Building businesspost.ie using Node.jsBuilding businesspost.ie using Node.js
Building businesspost.ie using Node.js
 
A short introduction to Spark and its benefits
A short introduction to Spark and its benefitsA short introduction to Spark and its benefits
A short introduction to Spark and its benefits
 
Accelerate Digital Transformation with IBM Cloud Private
Accelerate Digital Transformation with IBM Cloud PrivateAccelerate Digital Transformation with IBM Cloud Private
Accelerate Digital Transformation with IBM Cloud Private
 
AWS Meetup Paris - Short URL project by Pernod Ricard
AWS Meetup Paris - Short URL project by Pernod RicardAWS Meetup Paris - Short URL project by Pernod Ricard
AWS Meetup Paris - Short URL project by Pernod Ricard
 
CDMI For Swift
CDMI For SwiftCDMI For Swift
CDMI For Swift
 
NoSQL Overview
NoSQL OverviewNoSQL Overview
NoSQL Overview
 
Of metacello, git, scripting and things
Of metacello, git, scripting and thingsOf metacello, git, scripting and things
Of metacello, git, scripting and things
 
Yieldbot Tech Talk, Sept 20, 2012
Yieldbot Tech Talk, Sept 20, 2012Yieldbot Tech Talk, Sept 20, 2012
Yieldbot Tech Talk, Sept 20, 2012
 
#OSSPARIS17 - Développeurs, urbanisez la consommation de vos Clouds et APIs a...
#OSSPARIS17 - Développeurs, urbanisez la consommation de vos Clouds et APIs a...#OSSPARIS17 - Développeurs, urbanisez la consommation de vos Clouds et APIs a...
#OSSPARIS17 - Développeurs, urbanisez la consommation de vos Clouds et APIs a...
 
The ins and outs of SAP Cloud Platform's environments
The ins and outs of SAP Cloud Platform's environmentsThe ins and outs of SAP Cloud Platform's environments
The ins and outs of SAP Cloud Platform's environments
 
Presentation of OCCIware, a standard, extensible Cloud consumer platform at P...
Presentation of OCCIware, a standard, extensible Cloud consumer platform at P...Presentation of OCCIware, a standard, extensible Cloud consumer platform at P...
Presentation of OCCIware, a standard, extensible Cloud consumer platform at P...
 
OCCIware @ Paris Open Source Summit 2017 - a standard, extensible Cloud consu...
OCCIware @ Paris Open Source Summit 2017 - a standard, extensible Cloud consu...OCCIware @ Paris Open Source Summit 2017 - a standard, extensible Cloud consu...
OCCIware @ Paris Open Source Summit 2017 - a standard, extensible Cloud consu...
 
Cloud foundry and openstackcloud
Cloud foundry and openstackcloudCloud foundry and openstackcloud
Cloud foundry and openstackcloud
 
Cloud Foundry, the Open Platform as a Service - Oscon - July 2012
Cloud Foundry, the Open Platform as a Service - Oscon - July 2012Cloud Foundry, the Open Platform as a Service - Oscon - July 2012
Cloud Foundry, the Open Platform as a Service - Oscon - July 2012
 
EVOLVE'15 | Enhance | Richard Gatewood | Integrating SFDC & AEM
EVOLVE'15 | Enhance | Richard Gatewood | Integrating SFDC & AEM EVOLVE'15 | Enhance | Richard Gatewood | Integrating SFDC & AEM
EVOLVE'15 | Enhance | Richard Gatewood | Integrating SFDC & AEM
 
Client Server 3.0 - 6 Ways JavaScript is Revolutionizing the Client/Server Re...
Client Server 3.0 - 6 Ways JavaScript is Revolutionizing the Client/Server Re...Client Server 3.0 - 6 Ways JavaScript is Revolutionizing the Client/Server Re...
Client Server 3.0 - 6 Ways JavaScript is Revolutionizing the Client/Server Re...
 
Workshop About Software Engineering Skills 2019
Workshop About Software Engineering Skills 2019Workshop About Software Engineering Skills 2019
Workshop About Software Engineering Skills 2019
 

Más de Swiss Big Data User Group

Making Hadoop based analytics simple for everyone to use
Making Hadoop based analytics simple for everyone to useMaking Hadoop based analytics simple for everyone to use
Making Hadoop based analytics simple for everyone to useSwiss Big Data User Group
 
A real life project using Cassandra at a large Swiss Telco operator
A real life project using Cassandra at a large Swiss Telco operatorA real life project using Cassandra at a large Swiss Telco operator
A real life project using Cassandra at a large Swiss Telco operatorSwiss Big Data User Group
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaSwiss Big Data User Group
 
Closing The Loop for Evaluating Big Data Analysis
Closing The Loop for Evaluating Big Data AnalysisClosing The Loop for Evaluating Big Data Analysis
Closing The Loop for Evaluating Big Data AnalysisSwiss Big Data User Group
 
Big Data and Data Science for traditional Swiss companies
Big Data and Data Science for traditional Swiss companiesBig Data and Data Science for traditional Swiss companies
Big Data and Data Science for traditional Swiss companiesSwiss Big Data User Group
 
Design Patterns for Large-Scale Real-Time Learning
Design Patterns for Large-Scale Real-Time LearningDesign Patterns for Large-Scale Real-Time Learning
Design Patterns for Large-Scale Real-Time LearningSwiss Big Data User Group
 
Unleash the power of Big Data in your existing Data Warehouse
Unleash the power of Big Data in your existing Data WarehouseUnleash the power of Big Data in your existing Data Warehouse
Unleash the power of Big Data in your existing Data WarehouseSwiss Big Data User Group
 
Project "Babelfish" - A data warehouse to attack complexity
 Project "Babelfish" - A data warehouse to attack complexity Project "Babelfish" - A data warehouse to attack complexity
Project "Babelfish" - A data warehouse to attack complexitySwiss Big Data User Group
 
Brainserve Datacenter: the High-Density Choice
Brainserve Datacenter: the High-Density ChoiceBrainserve Datacenter: the High-Density Choice
Brainserve Datacenter: the High-Density ChoiceSwiss Big Data User Group
 
Urturn on AWS: scaling infra, cost and time to maket
Urturn on AWS: scaling infra, cost and time to maketUrturn on AWS: scaling infra, cost and time to maket
Urturn on AWS: scaling infra, cost and time to maketSwiss Big Data User Group
 
The World Wide Distributed Computing Architecture of the LHC Datagrid
The World Wide Distributed Computing Architecture of the LHC DatagridThe World Wide Distributed Computing Architecture of the LHC Datagrid
The World Wide Distributed Computing Architecture of the LHC DatagridSwiss Big Data User Group
 
New opportunities for connected data : Neo4j the graph database
New opportunities for connected data : Neo4j the graph databaseNew opportunities for connected data : Neo4j the graph database
New opportunities for connected data : Neo4j the graph databaseSwiss Big Data User Group
 
Technology Outlook - The new Era of computing
Technology Outlook - The new Era of computingTechnology Outlook - The new Era of computing
Technology Outlook - The new Era of computingSwiss Big Data User Group
 

Más de Swiss Big Data User Group (20)

Making Hadoop based analytics simple for everyone to use
Making Hadoop based analytics simple for everyone to useMaking Hadoop based analytics simple for everyone to use
Making Hadoop based analytics simple for everyone to use
 
A real life project using Cassandra at a large Swiss Telco operator
A real life project using Cassandra at a large Swiss Telco operatorA real life project using Cassandra at a large Swiss Telco operator
A real life project using Cassandra at a large Swiss Telco operator
 
Data Analytics – B2B vs. B2C
Data Analytics – B2B vs. B2CData Analytics – B2B vs. B2C
Data Analytics – B2B vs. B2C
 
SQL on Hadoop
SQL on HadoopSQL on Hadoop
SQL on Hadoop
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Closing The Loop for Evaluating Big Data Analysis
Closing The Loop for Evaluating Big Data AnalysisClosing The Loop for Evaluating Big Data Analysis
Closing The Loop for Evaluating Big Data Analysis
 
Big Data and Data Science for traditional Swiss companies
Big Data and Data Science for traditional Swiss companiesBig Data and Data Science for traditional Swiss companies
Big Data and Data Science for traditional Swiss companies
 
Design Patterns for Large-Scale Real-Time Learning
Design Patterns for Large-Scale Real-Time LearningDesign Patterns for Large-Scale Real-Time Learning
Design Patterns for Large-Scale Real-Time Learning
 
Educating Data Scientists of the Future
Educating Data Scientists of the FutureEducating Data Scientists of the Future
Educating Data Scientists of the Future
 
Unleash the power of Big Data in your existing Data Warehouse
Unleash the power of Big Data in your existing Data WarehouseUnleash the power of Big Data in your existing Data Warehouse
Unleash the power of Big Data in your existing Data Warehouse
 
Big data for Telco: opportunity or threat?
Big data for Telco: opportunity or threat?Big data for Telco: opportunity or threat?
Big data for Telco: opportunity or threat?
 
Project "Babelfish" - A data warehouse to attack complexity
 Project "Babelfish" - A data warehouse to attack complexity Project "Babelfish" - A data warehouse to attack complexity
Project "Babelfish" - A data warehouse to attack complexity
 
Brainserve Datacenter: the High-Density Choice
Brainserve Datacenter: the High-Density ChoiceBrainserve Datacenter: the High-Density Choice
Brainserve Datacenter: the High-Density Choice
 
Urturn on AWS: scaling infra, cost and time to maket
Urturn on AWS: scaling infra, cost and time to maketUrturn on AWS: scaling infra, cost and time to maket
Urturn on AWS: scaling infra, cost and time to maket
 
The World Wide Distributed Computing Architecture of the LHC Datagrid
The World Wide Distributed Computing Architecture of the LHC DatagridThe World Wide Distributed Computing Architecture of the LHC Datagrid
The World Wide Distributed Computing Architecture of the LHC Datagrid
 
New opportunities for connected data : Neo4j the graph database
New opportunities for connected data : Neo4j the graph databaseNew opportunities for connected data : Neo4j the graph database
New opportunities for connected data : Neo4j the graph database
 
Technology Outlook - The new Era of computing
Technology Outlook - The new Era of computingTechnology Outlook - The new Era of computing
Technology Outlook - The new Era of computing
 
In-Store Analysis with Hadoop
In-Store Analysis with HadoopIn-Store Analysis with Hadoop
In-Store Analysis with Hadoop
 
Big Data Visualization With ParaView
Big Data Visualization With ParaViewBig Data Visualization With ParaView
Big Data Visualization With ParaView
 
Introduction to Apache Drill
Introduction to Apache DrillIntroduction to Apache Drill
Introduction to Apache Drill
 

Último

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Último (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Online Media Data Stream Processing with Kafka

  • 1. CC 2.0 by William Brawley | http://flic.kr/p/7PdUP3
  • 2. 18. Septem ber 2012 •  What is Streaming Data? 2 •  Why Kafka? •  Kafka Architecture •  Use Case: Prospective Search Overview
  • 3. 18. Septem ber 2012 •  Spin-off of MeMo News AG, the 3 leading provider for Social Media Monitoring & Analytics in Switzerland •  Big Data expert, focused on Hadoop, HBase and Solr •  Objective: Transforming data into insights About Sentric
  • 4. CC 2.0 by audreyjm529| http://flic.kr/p/mNMtL  
  • 5. 18. Septem ber 2012 •  Website Activity Data 5 •  User activity •  Server activity •  Social Media Data •  News Data •  … •  How to Analyze in Real-Time? What is Streaming Data? Data Streams
  • 6. 18. Septem ber 2012 6 now   t   Offline  (Hadoop/MR)   Online  (Ka5a)   What is Streaming Data? Offline vs. Online
  • 7. CC 2.0 by Tom Hilton | http://flic.kr/p/54KSXy  
  • 8. 18. Septem ber 2012 •  Message Queues (RabbitMQ, ActiveMQ) 8 •  do not scale / have no persistence •  Flume / Scribe •  Log-Aggregation only, high throughput and scalable, push model •  Focus on offline consumption •  Kafka •  High throughput and scalable, pull model •  Different consumption profiles Why Kafka? Streaming Systems
  • 9. 18. Septem ber 2012 9 Source:  h<p://research.microso@.com/en-­‐us/um/people/srikanth/netdb11/netdb11papers/netdb11-­‐final12.pdf   Why Kafka? Consumer Performance
  • 10. CC 2.0 by Presidente | http://flic.kr/p/2ptSZ  
  • 11. 18. Septem ber 2012 •  Messaging System 11 •  Publish-Subscribe •  Persistent •  High-Throughput Kafka Architecture Key Concepts
  • 12. 18. Septem ber 2012 12 ZooKeeper Producer Consumer Producer Broker Consumer Producer Push Pull Consumer Producer Kafka Architecture Messaging
  • 13. 18. Septem ber 2012 Topics 13 logs … page-views Msg Msg Msg Consumer Consumer Consumer Kafka Architecture Publish-Subscribe
  • 14. 18. Septem ber 2012 •  Persists messages to disc 14 •  Topic is base abstraction •  Binary write ahead log •  No message ID •  Message offset ID (byte position) •  Messages retained a specific time •  Default is 7 days Kafka Architecture Persistent
  • 15. 18. Septem ber 2012 •  API Simplicity 15 •  Append message •  Fetch message from given byte position •  Batching •  Stateless Broker •  O(1) disc access (no seeks) •  Use of operating system features Kafka Architecture High-Throughput
  • 16. CC 2.0 by nolifebeforecoffee | http://flic.kr/p/c1UTf
  • 17. 18. Septem ber 2012 n News Agents 17 Kafka REST RT Alerts Web-UI HBase MySQL Solr Icons by http://dryicons.com Prospective Search Solution Architecture
  • 18. 18. Septem ber 2012 18 Processing Pull (Batch) Prospective Search RT Alerts Kafka Consumer Icons by http://dryicons.com Prospective Search Prospective Search with Kafka
  • 19. 18. Septem ber 2012 •  http://incubator.apache.org/kafka/ 19 •  http://sites.computer.org/debull/ A12june/A12JUN-CD.pdf Resources to get started
  • 20. 18. Septem ber 2012 20 Questions? Christian Gügi, christian.guegi@sentric.ch Swiss Big Data User Group Thank you!