SlideShare una empresa de Scribd logo
1 de 8
Descargar para leer sin conexión
Storm @ Visual Revenue   (an Outbrain
Company)




Alex Poon
VP of Engineering
Who are we?
What we do?
                                          Customer
Traffic

•  14B page views per month

•  At peak, 8000-10000 per sec              Web
Servers


•  Deployed Storm to production ~ 1
                                                                  Ka=a

month ago                                 Data
Transform/
                                           Aggrega8on

•  Storm cluster of ~50 instances on                              Storm

AWS
                                             Databases




                                       Dashboard
         Algo




                                            Automa8on

Before Storm
•  Built our own distributed data processing
    •  ZMQ
    •  Batch based process
    •  Hashing processing by customers
•  Advantages
    •  Simple in-house system built from very basic components
    •  Well understood
•  Disadvantages
    •  Hard to scale, constant battle for keeping up with pings
    •  Machine management was clumsy
    •  Uneven distribution of traffic
    •  Multiple processes doing similar work, wasting resources
Why Kafka/Storm?
•  Kafka
    •  open-sourced, distributed publish-subscribe messaging system
•  Storm
    •  open-sourced, real-time computation system for continuous
    computation
•  They are awesome
    •  Distributed, highly scalable, and fault tolerance
    •  High throughput
    •  Reliable
    •  Real-time
    •  Great at in-memory analytics, and real-time decision support
DataAggregation
                      Customer

                          15s





                       Position
   Front Page

                          15s
         15s

URL
     Aggregate

             15s

                      Aggregate
   Arrangement

                          5m
           5m





Spout
                 Tweet
       @Handle

Bolt
                     15s
         15s

Learning / Ideas
1. Kafka + zookeeper is extremely scalable and easy to setup.
Check out the Brod library if you are doing Python

2. Use the Storm UI (Ganglia based) to monitor your cluster

3. Shell Bolts were inefficient and hard to debug (at least for us)

4. Upgrade to at least Storm version 0.8.2 which gives you capacity
metrics on top of other goodies

5. Storm’s anchoring/replay capability is awesome but comes with a
visible overhead

6. Use a good framework to manage your cluster, we use Salt Stack

7. Our unit tests are built in Junit. Most built in unit tests for Storm
are only available in Clojure for now
Thank You

 Alex Poon
 @alexpoon06
 @Outbrain

  Yes, it is true. We are
  Hiring!! 

     www.visualrevenue.com/jobs


Más contenido relacionado

La actualidad más candente

Multi-master, multi-region MySQL deployment in Amazon AWS
Multi-master, multi-region MySQL deployment in Amazon AWSMulti-master, multi-region MySQL deployment in Amazon AWS
Multi-master, multi-region MySQL deployment in Amazon AWSContinuent
 
Aws 12 Month Free Tier for Web Designers and Developers
Aws 12 Month Free Tier for Web Designers and DevelopersAws 12 Month Free Tier for Web Designers and Developers
Aws 12 Month Free Tier for Web Designers and DevelopersDylan Burris
 
MONITORING THE UNKNOWN, 1000*100 SERIES A DAY - DEVOXX MOROCCO 2017
MONITORING THE UNKNOWN, 1000*100 SERIES A DAY - DEVOXX MOROCCO 2017MONITORING THE UNKNOWN, 1000*100 SERIES A DAY - DEVOXX MOROCCO 2017
MONITORING THE UNKNOWN, 1000*100 SERIES A DAY - DEVOXX MOROCCO 2017Quentin Adam
 
Problems you’ll face in the Microservices World: Configuration, Authenticatio...
Problems you’ll face in the Microservices World: Configuration, Authenticatio...Problems you’ll face in the Microservices World: Configuration, Authenticatio...
Problems you’ll face in the Microservices World: Configuration, Authenticatio...Quentin Adam
 
5 simple steps to migrate to AWS
5 simple steps to migrate to AWS5 simple steps to migrate to AWS
5 simple steps to migrate to AWSAmazon Web Services
 
Azure Site Recovery Loves Business Continuity
Azure Site Recovery Loves Business ContinuityAzure Site Recovery Loves Business Continuity
Azure Site Recovery Loves Business ContinuityMichael Frank
 
#lspe Q1 2013 dynamically scaling netflix in the cloud
#lspe Q1 2013   dynamically scaling netflix in the cloud#lspe Q1 2013   dynamically scaling netflix in the cloud
#lspe Q1 2013 dynamically scaling netflix in the cloudCoburn Watson
 
AWS Customer Presentation - JovianDATA
AWS Customer Presentation - JovianDATAAWS Customer Presentation - JovianDATA
AWS Customer Presentation - JovianDATAAmazon Web Services
 
Building big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and KubernetesBuilding big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and KubernetesVenu Ryali
 
Meetup #3: Migrate a fast scale system to AWS
Meetup #3: Migrate a fast scale system to AWSMeetup #3: Migrate a fast scale system to AWS
Meetup #3: Migrate a fast scale system to AWSAWS Vietnam Community
 
Taming the cost of your first cloud - CCCEU 2014
Taming the cost of your first cloud - CCCEU 2014Taming the cost of your first cloud - CCCEU 2014
Taming the cost of your first cloud - CCCEU 2014Tim Mackey
 
SCCM ConfigMgr Intune Architecture Decision Maker
SCCM ConfigMgr Intune Architecture Decision MakerSCCM ConfigMgr Intune Architecture Decision Maker
SCCM ConfigMgr Intune Architecture Decision MakerAnoop Nair
 
Azure Nights August2017
Azure Nights August2017Azure Nights August2017
Azure Nights August2017Michael Frank
 
Green / Blue Deployment with Immutable Servers
Green / Blue Deployment with Immutable ServersGreen / Blue Deployment with Immutable Servers
Green / Blue Deployment with Immutable ServersSimon Dittlmann
 
Sina App Engine - a distributed web solution on cloud
Sina App Engine - a distributed web solution on cloudSina App Engine - a distributed web solution on cloud
Sina App Engine - a distributed web solution on cloudcong lei
 
Faas With Kata Container
Faas With Kata ContainerFaas With Kata Container
Faas With Kata ContainerMadhuri Kumari
 
Reliable, Scalable Kubernetes on AWS
Reliable, Scalable Kubernetes on AWSReliable, Scalable Kubernetes on AWS
Reliable, Scalable Kubernetes on AWSApplatix
 
Cloud - High Availability @ Low Cost - Workshop - Gurpreet ahuja
Cloud - High Availability @ Low Cost - Workshop - Gurpreet ahujaCloud - High Availability @ Low Cost - Workshop - Gurpreet ahuja
Cloud - High Availability @ Low Cost - Workshop - Gurpreet ahujaResellerClub
 

La actualidad más candente (20)

Multi-master, multi-region MySQL deployment in Amazon AWS
Multi-master, multi-region MySQL deployment in Amazon AWSMulti-master, multi-region MySQL deployment in Amazon AWS
Multi-master, multi-region MySQL deployment in Amazon AWS
 
Aws 12 Month Free Tier for Web Designers and Developers
Aws 12 Month Free Tier for Web Designers and DevelopersAws 12 Month Free Tier for Web Designers and Developers
Aws 12 Month Free Tier for Web Designers and Developers
 
MONITORING THE UNKNOWN, 1000*100 SERIES A DAY - DEVOXX MOROCCO 2017
MONITORING THE UNKNOWN, 1000*100 SERIES A DAY - DEVOXX MOROCCO 2017MONITORING THE UNKNOWN, 1000*100 SERIES A DAY - DEVOXX MOROCCO 2017
MONITORING THE UNKNOWN, 1000*100 SERIES A DAY - DEVOXX MOROCCO 2017
 
Problems you’ll face in the Microservices World: Configuration, Authenticatio...
Problems you’ll face in the Microservices World: Configuration, Authenticatio...Problems you’ll face in the Microservices World: Configuration, Authenticatio...
Problems you’ll face in the Microservices World: Configuration, Authenticatio...
 
5 simple steps to migrate to AWS
5 simple steps to migrate to AWS5 simple steps to migrate to AWS
5 simple steps to migrate to AWS
 
Azure Site Recovery Loves Business Continuity
Azure Site Recovery Loves Business ContinuityAzure Site Recovery Loves Business Continuity
Azure Site Recovery Loves Business Continuity
 
#lspe Q1 2013 dynamically scaling netflix in the cloud
#lspe Q1 2013   dynamically scaling netflix in the cloud#lspe Q1 2013   dynamically scaling netflix in the cloud
#lspe Q1 2013 dynamically scaling netflix in the cloud
 
AWS Customer Presentation - JovianDATA
AWS Customer Presentation - JovianDATAAWS Customer Presentation - JovianDATA
AWS Customer Presentation - JovianDATA
 
Building big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and KubernetesBuilding big data pipelines with Kafka and Kubernetes
Building big data pipelines with Kafka and Kubernetes
 
Meetup #3: Migrate a fast scale system to AWS
Meetup #3: Migrate a fast scale system to AWSMeetup #3: Migrate a fast scale system to AWS
Meetup #3: Migrate a fast scale system to AWS
 
Taming the cost of your first cloud - CCCEU 2014
Taming the cost of your first cloud - CCCEU 2014Taming the cost of your first cloud - CCCEU 2014
Taming the cost of your first cloud - CCCEU 2014
 
SCCM ConfigMgr Intune Architecture Decision Maker
SCCM ConfigMgr Intune Architecture Decision MakerSCCM ConfigMgr Intune Architecture Decision Maker
SCCM ConfigMgr Intune Architecture Decision Maker
 
Azure Nights August2017
Azure Nights August2017Azure Nights August2017
Azure Nights August2017
 
Green / Blue Deployment with Immutable Servers
Green / Blue Deployment with Immutable ServersGreen / Blue Deployment with Immutable Servers
Green / Blue Deployment with Immutable Servers
 
Sina App Engine - a distributed web solution on cloud
Sina App Engine - a distributed web solution on cloudSina App Engine - a distributed web solution on cloud
Sina App Engine - a distributed web solution on cloud
 
Faas With Kata Container
Faas With Kata ContainerFaas With Kata Container
Faas With Kata Container
 
Terraform
TerraformTerraform
Terraform
 
Reliable, Scalable Kubernetes on AWS
Reliable, Scalable Kubernetes on AWSReliable, Scalable Kubernetes on AWS
Reliable, Scalable Kubernetes on AWS
 
Blue green deployment
Blue green deploymentBlue green deployment
Blue green deployment
 
Cloud - High Availability @ Low Cost - Workshop - Gurpreet ahuja
Cloud - High Availability @ Low Cost - Workshop - Gurpreet ahujaCloud - High Availability @ Low Cost - Workshop - Gurpreet ahuja
Cloud - High Availability @ Low Cost - Workshop - Gurpreet ahuja
 

Similar a Open analytics meetup alex poon (1)

A scalable server environment for your applications
A scalable server environment for your applicationsA scalable server environment for your applications
A scalable server environment for your applicationsGigaSpaces
 
Stream Computing (The Engineer's Perspective)
Stream Computing (The Engineer's Perspective)Stream Computing (The Engineer's Perspective)
Stream Computing (The Engineer's Perspective)Ilya Ganelin
 
Palringo : a startup's journey from a data center to the cloud
Palringo : a startup's journey from a data center to the cloudPalringo : a startup's journey from a data center to the cloud
Palringo : a startup's journey from a data center to the cloudPhilipBasford
 
Cloud Computing with .Net
Cloud Computing with .NetCloud Computing with .Net
Cloud Computing with .NetWesley Faler
 
Accelerating Analytics for the Future of Genomics
Accelerating Analytics for the Future of GenomicsAccelerating Analytics for the Future of Genomics
Accelerating Analytics for the Future of GenomicsAmazon Web Services
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Amazon Web Services
 
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst ITThings You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst ITOpenStack
 
Five Years of EC2 Distilled
Five Years of EC2 DistilledFive Years of EC2 Distilled
Five Years of EC2 DistilledGrig Gheorghiu
 
Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...
Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...
Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...Flink Forward
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)Spark Summit
 
Architecture Best Practices on Windows Azure
Architecture Best Practices on Windows AzureArchitecture Best Practices on Windows Azure
Architecture Best Practices on Windows AzureNuno Godinho
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark StreamingP. Taylor Goetz
 
NICTA, Disaster Recovery Using OpenStack
NICTA, Disaster Recovery Using OpenStackNICTA, Disaster Recovery Using OpenStack
NICTA, Disaster Recovery Using OpenStacklaurabeckcahoon
 
Leaving the Ivory Tower: Research in the Real World
Leaving the Ivory Tower: Research in the Real WorldLeaving the Ivory Tower: Research in the Real World
Leaving the Ivory Tower: Research in the Real WorldArmonDadgar
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudyJohn Adams
 
Your Guide to Streaming - The Engineer's Perspective
Your Guide to Streaming - The Engineer's PerspectiveYour Guide to Streaming - The Engineer's Perspective
Your Guide to Streaming - The Engineer's PerspectiveIlya Ganelin
 
Azug - successfully breeding rabits
Azug - successfully breeding rabitsAzug - successfully breeding rabits
Azug - successfully breeding rabitsYves Goeleven
 
IEEE Cloud 2012: Clouds Hands-On Tutorial
IEEE Cloud 2012: Clouds Hands-On TutorialIEEE Cloud 2012: Clouds Hands-On Tutorial
IEEE Cloud 2012: Clouds Hands-On TutorialSrinath Perera
 
Quilt - Distributed Load Simulation from AWS
Quilt - Distributed Load Simulation from AWSQuilt - Distributed Load Simulation from AWS
Quilt - Distributed Load Simulation from AWSAjith Jose
 
Oracle in the Cloud
Oracle in the CloudOracle in the Cloud
Oracle in the Cloudzain1425
 

Similar a Open analytics meetup alex poon (1) (20)

A scalable server environment for your applications
A scalable server environment for your applicationsA scalable server environment for your applications
A scalable server environment for your applications
 
Stream Computing (The Engineer's Perspective)
Stream Computing (The Engineer's Perspective)Stream Computing (The Engineer's Perspective)
Stream Computing (The Engineer's Perspective)
 
Palringo : a startup's journey from a data center to the cloud
Palringo : a startup's journey from a data center to the cloudPalringo : a startup's journey from a data center to the cloud
Palringo : a startup's journey from a data center to the cloud
 
Cloud Computing with .Net
Cloud Computing with .NetCloud Computing with .Net
Cloud Computing with .Net
 
Accelerating Analytics for the Future of Genomics
Accelerating Analytics for the Future of GenomicsAccelerating Analytics for the Future of Genomics
Accelerating Analytics for the Future of Genomics
 
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
Infrastructure at Scale: Apache Kafka, Twitter Storm & Elastic Search (ARC303...
 
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst ITThings You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
Things You MUST Know Before Deploying OpenStack: Bruno Lago, Catalyst IT
 
Five Years of EC2 Distilled
Five Years of EC2 DistilledFive Years of EC2 Distilled
Five Years of EC2 Distilled
 
Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...
Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...
Virtual Flink Forward 2020: How Streaming Helps Your Staging Environment and ...
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
 
Architecture Best Practices on Windows Azure
Architecture Best Practices on Windows AzureArchitecture Best Practices on Windows Azure
Architecture Best Practices on Windows Azure
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
 
NICTA, Disaster Recovery Using OpenStack
NICTA, Disaster Recovery Using OpenStackNICTA, Disaster Recovery Using OpenStack
NICTA, Disaster Recovery Using OpenStack
 
Leaving the Ivory Tower: Research in the Real World
Leaving the Ivory Tower: Research in the Real WorldLeaving the Ivory Tower: Research in the Real World
Leaving the Ivory Tower: Research in the Real World
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
Your Guide to Streaming - The Engineer's Perspective
Your Guide to Streaming - The Engineer's PerspectiveYour Guide to Streaming - The Engineer's Perspective
Your Guide to Streaming - The Engineer's Perspective
 
Azug - successfully breeding rabits
Azug - successfully breeding rabitsAzug - successfully breeding rabits
Azug - successfully breeding rabits
 
IEEE Cloud 2012: Clouds Hands-On Tutorial
IEEE Cloud 2012: Clouds Hands-On TutorialIEEE Cloud 2012: Clouds Hands-On Tutorial
IEEE Cloud 2012: Clouds Hands-On Tutorial
 
Quilt - Distributed Load Simulation from AWS
Quilt - Distributed Load Simulation from AWSQuilt - Distributed Load Simulation from AWS
Quilt - Distributed Load Simulation from AWS
 
Oracle in the Cloud
Oracle in the CloudOracle in the Cloud
Oracle in the Cloud
 

Más de Open Analytics

Cyber after Snowden (OA Cyber Summit)
Cyber after Snowden (OA Cyber Summit)Cyber after Snowden (OA Cyber Summit)
Cyber after Snowden (OA Cyber Summit)Open Analytics
 
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)Open Analytics
 
CDM….Where do you start? (OA Cyber Summit)
CDM….Where do you start? (OA Cyber Summit)CDM….Where do you start? (OA Cyber Summit)
CDM….Where do you start? (OA Cyber Summit)Open Analytics
 
An Immigrant’s view of Cyberspace (OA Cyber Summit)
An Immigrant’s view of Cyberspace (OA Cyber Summit)An Immigrant’s view of Cyberspace (OA Cyber Summit)
An Immigrant’s view of Cyberspace (OA Cyber Summit)Open Analytics
 
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)MOLOCH: Search for Full Packet Capture (OA Cyber Summit)
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)Open Analytics
 
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...Open Analytics
 
Using Real-Time Data to Drive Optimization & Personalization
Using Real-Time Data to Drive Optimization & PersonalizationUsing Real-Time Data to Drive Optimization & Personalization
Using Real-Time Data to Drive Optimization & PersonalizationOpen Analytics
 
M&A Trends in Telco Analytics
M&A Trends in Telco AnalyticsM&A Trends in Telco Analytics
M&A Trends in Telco AnalyticsOpen Analytics
 
Competing in the Digital Economy
Competing in the Digital EconomyCompeting in the Digital Economy
Competing in the Digital EconomyOpen Analytics
 
Piwik: An Analytics Alternative (Chicago Summit)
Piwik: An Analytics Alternative (Chicago Summit)Piwik: An Analytics Alternative (Chicago Summit)
Piwik: An Analytics Alternative (Chicago Summit)Open Analytics
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Open Analytics
 
Crossing the Chasm (Ikanow - Chicago Summit)
Crossing the Chasm (Ikanow - Chicago Summit)Crossing the Chasm (Ikanow - Chicago Summit)
Crossing the Chasm (Ikanow - Chicago Summit)Open Analytics
 
On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...
On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...
On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...Open Analytics
 
Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...
Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...
Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...Open Analytics
 
Characterizing Risk in your Supply Chain (nContext - Chicago Summit)
Characterizing Risk in your Supply Chain (nContext - Chicago Summit)Characterizing Risk in your Supply Chain (nContext - Chicago Summit)
Characterizing Risk in your Supply Chain (nContext - Chicago Summit)Open Analytics
 
From Insight to Impact (Chicago Summit - Keynote)
From Insight to Impact (Chicago Summit - Keynote)From Insight to Impact (Chicago Summit - Keynote)
From Insight to Impact (Chicago Summit - Keynote)Open Analytics
 
Easybib Open Analytics NYC
Easybib Open Analytics NYCEasybib Open Analytics NYC
Easybib Open Analytics NYCOpen Analytics
 
MarkLogic - Open Analytics Meetup
MarkLogic - Open Analytics MeetupMarkLogic - Open Analytics Meetup
MarkLogic - Open Analytics MeetupOpen Analytics
 
The caprate presentation_july2013_open analytics dc meetup
The caprate presentation_july2013_open analytics dc meetupThe caprate presentation_july2013_open analytics dc meetup
The caprate presentation_july2013_open analytics dc meetupOpen Analytics
 
Verifeed open analytics_3min deck_071713_final
Verifeed open analytics_3min deck_071713_finalVerifeed open analytics_3min deck_071713_final
Verifeed open analytics_3min deck_071713_finalOpen Analytics
 

Más de Open Analytics (20)

Cyber after Snowden (OA Cyber Summit)
Cyber after Snowden (OA Cyber Summit)Cyber after Snowden (OA Cyber Summit)
Cyber after Snowden (OA Cyber Summit)
 
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)
Utilizing cyber intelligence to combat cyber adversaries (OA Cyber Summit)
 
CDM….Where do you start? (OA Cyber Summit)
CDM….Where do you start? (OA Cyber Summit)CDM….Where do you start? (OA Cyber Summit)
CDM….Where do you start? (OA Cyber Summit)
 
An Immigrant’s view of Cyberspace (OA Cyber Summit)
An Immigrant’s view of Cyberspace (OA Cyber Summit)An Immigrant’s view of Cyberspace (OA Cyber Summit)
An Immigrant’s view of Cyberspace (OA Cyber Summit)
 
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)MOLOCH: Search for Full Packet Capture (OA Cyber Summit)
MOLOCH: Search for Full Packet Capture (OA Cyber Summit)
 
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...
Observations on CFR.org Website Traffic Surge Due to Chechnya Terrorism Scare...
 
Using Real-Time Data to Drive Optimization & Personalization
Using Real-Time Data to Drive Optimization & PersonalizationUsing Real-Time Data to Drive Optimization & Personalization
Using Real-Time Data to Drive Optimization & Personalization
 
M&A Trends in Telco Analytics
M&A Trends in Telco AnalyticsM&A Trends in Telco Analytics
M&A Trends in Telco Analytics
 
Competing in the Digital Economy
Competing in the Digital EconomyCompeting in the Digital Economy
Competing in the Digital Economy
 
Piwik: An Analytics Alternative (Chicago Summit)
Piwik: An Analytics Alternative (Chicago Summit)Piwik: An Analytics Alternative (Chicago Summit)
Piwik: An Analytics Alternative (Chicago Summit)
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
 
Crossing the Chasm (Ikanow - Chicago Summit)
Crossing the Chasm (Ikanow - Chicago Summit)Crossing the Chasm (Ikanow - Chicago Summit)
Crossing the Chasm (Ikanow - Chicago Summit)
 
On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...
On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...
On the “Moneyball” – Building the Team, Product, and Service to Rival (Pegged...
 
Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...
Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...
Data evolutions in media, marketing, and retail (Business Adv Group - Chicago...
 
Characterizing Risk in your Supply Chain (nContext - Chicago Summit)
Characterizing Risk in your Supply Chain (nContext - Chicago Summit)Characterizing Risk in your Supply Chain (nContext - Chicago Summit)
Characterizing Risk in your Supply Chain (nContext - Chicago Summit)
 
From Insight to Impact (Chicago Summit - Keynote)
From Insight to Impact (Chicago Summit - Keynote)From Insight to Impact (Chicago Summit - Keynote)
From Insight to Impact (Chicago Summit - Keynote)
 
Easybib Open Analytics NYC
Easybib Open Analytics NYCEasybib Open Analytics NYC
Easybib Open Analytics NYC
 
MarkLogic - Open Analytics Meetup
MarkLogic - Open Analytics MeetupMarkLogic - Open Analytics Meetup
MarkLogic - Open Analytics Meetup
 
The caprate presentation_july2013_open analytics dc meetup
The caprate presentation_july2013_open analytics dc meetupThe caprate presentation_july2013_open analytics dc meetup
The caprate presentation_july2013_open analytics dc meetup
 
Verifeed open analytics_3min deck_071713_final
Verifeed open analytics_3min deck_071713_finalVerifeed open analytics_3min deck_071713_final
Verifeed open analytics_3min deck_071713_final
 

Open analytics meetup alex poon (1)

  • 1. Storm @ Visual Revenue (an Outbrain Company) Alex Poon VP of Engineering
  • 3. What we do? Customer
Traffic
 •  14B page views per month •  At peak, 8000-10000 per sec Web
Servers
 •  Deployed Storm to production ~ 1 Ka=a
 month ago Data
Transform/ Aggrega8on
 •  Storm cluster of ~50 instances on Storm
 AWS Databases
 Dashboard
 Algo
 Automa8on

  • 4. Before Storm •  Built our own distributed data processing •  ZMQ •  Batch based process •  Hashing processing by customers •  Advantages •  Simple in-house system built from very basic components •  Well understood •  Disadvantages •  Hard to scale, constant battle for keeping up with pings •  Machine management was clumsy •  Uneven distribution of traffic •  Multiple processes doing similar work, wasting resources
  • 5. Why Kafka/Storm? •  Kafka •  open-sourced, distributed publish-subscribe messaging system •  Storm •  open-sourced, real-time computation system for continuous computation •  They are awesome •  Distributed, highly scalable, and fault tolerance •  High throughput •  Reliable •  Real-time •  Great at in-memory analytics, and real-time decision support
  • 6. DataAggregation Customer
 15s
 Position
 Front Page
 15s
 15s
 URL
 Aggregate
 15s
 Aggregate
 Arrangement
 5m
 5m
 Spout
 Tweet
 @Handle
 Bolt
 15s
 15s

  • 7. Learning / Ideas 1. Kafka + zookeeper is extremely scalable and easy to setup. Check out the Brod library if you are doing Python 2. Use the Storm UI (Ganglia based) to monitor your cluster 3. Shell Bolts were inefficient and hard to debug (at least for us) 4. Upgrade to at least Storm version 0.8.2 which gives you capacity metrics on top of other goodies 5. Storm’s anchoring/replay capability is awesome but comes with a visible overhead 6. Use a good framework to manage your cluster, we use Salt Stack 7. Our unit tests are built in Junit. Most built in unit tests for Storm are only available in Clojure for now
  • 8. Thank You Alex Poon @alexpoon06 @Outbrain Yes, it is true. We are Hiring!! 
 www.visualrevenue.com/jobs