SlideShare una empresa de Scribd logo
1 de 19
Descargar para leer sin conexión
Storm at
Rob Doherty
Senior Backend Engineer
rdoherty@outbrain.com
@robdoherty2
What is Outbrain?
Before Storm
● Custom distributed processing system
● Python and ZMQ
● Advantages:
○ Simple components
○ Well-understood
● Disadvantages:
○ Did not scale
○ Batch-processing
Kafka + Storm
● Kafka: high-throughput distributed messaging
● Storm: distributed, real-time computation
Why Kafka?
● Need method to buffer clicks into “stream”
● Kafka + Storm common pattern for click tracking
Why Storm?
● “Real time” (15s latency requirements)
● Fault tolerance
● Easy to manage parallelism
● Stream grouping
● Active community
● Open-source project
Nginx Servers
Kafka Cluster
Storm Topology
Elastic Load Balancer
Customer
Traffic
AWS
MongoDB
Redis
Algo
API
Architecture
Kafka Cluster
● 40 Producers (8 m1.large instances)
○ Python brod
● 4 Brokers (4 m1.large instances)
10k Clicks per second (peak)
14B Clicks per month
Kafka v0.7.2
Storm Topology
● 40 Supervisors (c1.xlarge instances)
● 35 Bolts, 1 Kafka spout
● 250+ Executors (worker threads)
160k+ tuples executed per second
Storm v0.82
Leiningen v1.7
Customer
Traffic
Kafka Spout Aggregate 15s
Aggregate 5m
Position
Customer
Social
Arrangement
Front Page
@Handle
Storm Topology
Challenges
● Shell Bolts
● Anchor Bolts/Replaying Stream
● Acking Tuples
● Monitoring
Monitoring
● Scribe Logging
● Munin + Nagios
● JMX-JMXTrans + Ganglia
● Storm UI
● Thrift interface into Nimbus + D3
Monitoring
● Scribe Logging
● Munin + Nagios
● JMX-JMXTrans + Ganglia
● Storm UI
● Thrift interface into Nimbus + D3
Future Plans
● Load testing
● Break topology into smaller pieces
● Move from AWS to private data center
Thank you
Rob Doherty
Senior Backend Engineer
rdoherty@outbrain.com
@robdoherty2

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Pywps
PywpsPywps
Pywps
 
What's new with serverless on google cloud
What's new with serverless on google cloud What's new with serverless on google cloud
What's new with serverless on google cloud
 
InfluxDB and Grafana: An Introduction to Time-Based Data Storage and Visualiz...
InfluxDB and Grafana: An Introduction to Time-Based Data Storage and Visualiz...InfluxDB and Grafana: An Introduction to Time-Based Data Storage and Visualiz...
InfluxDB and Grafana: An Introduction to Time-Based Data Storage and Visualiz...
 
Infrastructure as Code with Terraform: Koombea TechTalks
Infrastructure as Code with Terraform: Koombea TechTalksInfrastructure as Code with Terraform: Koombea TechTalks
Infrastructure as Code with Terraform: Koombea TechTalks
 
Breaking the RpiDocker challenge
Breaking the RpiDocker challenge Breaking the RpiDocker challenge
Breaking the RpiDocker challenge
 
Jumbo the Hadoop cluster bootstrapper
Jumbo the Hadoop cluster bootstrapperJumbo the Hadoop cluster bootstrapper
Jumbo the Hadoop cluster bootstrapper
 
Clojure News Feed Performance Testing
Clojure News Feed Performance TestingClojure News Feed Performance Testing
Clojure News Feed Performance Testing
 
Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka Streams
 
Kafka y python
Kafka y pythonKafka y python
Kafka y python
 
Kafka practical experience
Kafka practical experienceKafka practical experience
Kafka practical experience
 
Kafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetupKafka Practices @ Uber - Seattle Apache Kafka meetup
Kafka Practices @ Uber - Seattle Apache Kafka meetup
 
Grafana 7.0
Grafana 7.0Grafana 7.0
Grafana 7.0
 
Apache Flink Training Workshop @ HadoopCon2016 - #4 Advanced Stream Processing
Apache Flink Training Workshop @ HadoopCon2016 - #4 Advanced Stream ProcessingApache Flink Training Workshop @ HadoopCon2016 - #4 Advanced Stream Processing
Apache Flink Training Workshop @ HadoopCon2016 - #4 Advanced Stream Processing
 
Solving some of the scalability problems at booking.com
Solving some of the scalability problems at booking.comSolving some of the scalability problems at booking.com
Solving some of the scalability problems at booking.com
 
The Dark Side Of Go -- Go runtime related problems in TiDB in production
The Dark Side Of Go -- Go runtime related problems in TiDB  in productionThe Dark Side Of Go -- Go runtime related problems in TiDB  in production
The Dark Side Of Go -- Go runtime related problems in TiDB in production
 
Yipit - AWS Start-Up Customer
Yipit - AWS Start-Up Customer Yipit - AWS Start-Up Customer
Yipit - AWS Start-Up Customer
 
Order from chaos: automating monitoring configuration
Order from chaos: automating monitoring configurationOrder from chaos: automating monitoring configuration
Order from chaos: automating monitoring configuration
 
Serverless for the Cloud Native Era with Fission
Serverless for the Cloud Native Era with FissionServerless for the Cloud Native Era with Fission
Serverless for the Cloud Native Era with Fission
 
Cassandra in Docker at Yelp: Opportunities and Challenges
Cassandra in Docker at Yelp: Opportunities and ChallengesCassandra in Docker at Yelp: Opportunities and Challenges
Cassandra in Docker at Yelp: Opportunities and Challenges
 
Cloud Firestore – From JSON Deserialization to Object Document Mapping (ODM)
Cloud Firestore – From JSON Deserialization to Object Document Mapping (ODM)Cloud Firestore – From JSON Deserialization to Object Document Mapping (ODM)
Cloud Firestore – From JSON Deserialization to Object Document Mapping (ODM)
 

Similar a Nyc storm meetup_robdoherty

Similar a Nyc storm meetup_robdoherty (20)

A Tour of Apache Kafka
A Tour of Apache KafkaA Tour of Apache Kafka
A Tour of Apache Kafka
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
 
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/SecNetflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
Netflix Keystone - How Netflix Handles Data Streams up to 11M Events/Sec
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipeline
 
Akka gRPC quick-guide
Akka gRPC quick-guideAkka gRPC quick-guide
Akka gRPC quick-guide
 
Akka gRPC quick-guide
Akka gRPC quick-guideAkka gRPC quick-guide
Akka gRPC quick-guide
 
Kubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the DatacenterKubernetes @ Squarespace: Kubernetes in the Datacenter
Kubernetes @ Squarespace: Kubernetes in the Datacenter
 
Netty training
Netty trainingNetty training
Netty training
 
How Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per dayHow Uber scaled its Real Time Infrastructure to Trillion events per day
How Uber scaled its Real Time Infrastructure to Trillion events per day
 
Uber Real Time Data Analytics
Uber Real Time Data AnalyticsUber Real Time Data Analytics
Uber Real Time Data Analytics
 
Netty training
Netty trainingNetty training
Netty training
 
umeng analytical arch
umeng analytical archumeng analytical arch
umeng analytical arch
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
 
QCON 2015: Gearpump, Realtime Streaming on Akka
QCON 2015: Gearpump, Realtime Streaming on AkkaQCON 2015: Gearpump, Realtime Streaming on Akka
QCON 2015: Gearpump, Realtime Streaming on Akka
 
Eugene Bova "Dapr (Distributed Application Runtime) in a Microservices Archit...
Eugene Bova "Dapr (Distributed Application Runtime) in a Microservices Archit...Eugene Bova "Dapr (Distributed Application Runtime) in a Microservices Archit...
Eugene Bova "Dapr (Distributed Application Runtime) in a Microservices Archit...
 
Build real time stream processing applications using Apache Kafka
Build real time stream processing applications using Apache KafkaBuild real time stream processing applications using Apache Kafka
Build real time stream processing applications using Apache Kafka
 
From Three Nines to Five Nines - A Kafka Journey
From Three Nines to Five Nines - A Kafka JourneyFrom Three Nines to Five Nines - A Kafka Journey
From Three Nines to Five Nines - A Kafka Journey
 

Último

Último (20)

Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 

Nyc storm meetup_robdoherty