3. Before Storm
● Custom distributed processing system
● Python and ZMQ
● Advantages:
○ Simple components
○ Well-understood
● Disadvantages:
○ Did not scale
○ Batch-processing
4. Kafka + Storm
● Kafka: high-throughput distributed messaging
● Storm: distributed, real-time computation
5. Why Kafka?
● Need a method to buffer clicks into a “stream”
● Kafka + Storm is a common pattern for click tracking
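The buffering idea on this slide can be sketched in a few lines. This is only an in-memory illustration and not Kafka's API: `ClickLog`, `append`, and `read_from` are hypothetical names. The point is that producers append to a log while the consumer keeps its own offset and reads at its own pace.

```python
class ClickLog:
    """Toy append-only log standing in for a Kafka topic (hypothetical, in-memory only)."""

    def __init__(self):
        self._entries = []

    def append(self, click):
        """Producer side: add a click and return its offset in the log."""
        self._entries.append(click)
        return len(self._entries) - 1

    def read_from(self, offset, max_items=100):
        """Consumer side: return up to max_items clicks starting at offset."""
        return self._entries[offset:offset + max_items]


log = ClickLog()
for url in ["a.com", "b.com", "c.com"]:
    log.append({"url": url})

# The consumer tracks its own position, so a burst of clicks simply waits in the log.
offset = 0
batch = log.read_from(offset, max_items=2)
offset += len(batch)
```

After this runs, the third click is still in the log and is returned by the next `read_from(offset)` call.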
6. Why Storm?
● “Real time” (15s latency requirements)
● Fault tolerance
● Easy to manage parallelism
● Stream grouping
● Active community
● Open-source project
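Stream grouping is the least self-explanatory item in this list, so here is a minimal sketch of the idea behind a fields grouping: tuples with the same key are always routed to the same task. The function name `fields_grouping`, the sample domains, and the use of `zlib.crc32` are assumptions made for illustration; Storm's own hashing is not shown here.

```python
import zlib


def fields_grouping(key, num_tasks):
    """Pick a task index for a key so that equal keys always land on the same task."""
    # crc32 is used because it is stable across runs; this is an illustrative
    # choice, not the hash Storm itself uses.
    return zlib.crc32(key.encode("utf-8")) % num_tasks


clicks = ["example.com", "other.org", "example.com", "third.net", "other.org"]
num_tasks = 4

# Group the sample clicks by the task they would be routed to.
assignments = {}
for domain in clicks:
    assignments.setdefault(fields_grouping(domain, num_tasks), []).append(domain)
```

Because routing depends only on the key, a bolt that counts clicks per domain can keep its counters in local memory without coordinating with other tasks.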
8. Kafka Cluster
● 40 Producers (8 m1.large instances)
○ Python brod
● 4 Brokers (4 m1.large instances)
● 10k clicks per second (peak)
● 14B clicks per month
● Kafka v0.7.2
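As a quick sanity check on the figures on this slide, the monthly volume can be converted to an average rate and compared with the stated peak. The 30-day month is an assumption, since the slide does not say how a month is counted.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumption: the slide does not define "month"

clicks_per_month = 14_000_000_000  # from the slide
peak_clicks_per_second = 10_000    # from the slide

# Average rate implied by the monthly total, and how far the peak sits above it.
average_clicks_per_second = clicks_per_month / (DAYS_PER_MONTH * SECONDS_PER_DAY)
peak_to_average = peak_clicks_per_second / average_clicks_per_second
```

Under that assumption the average works out to roughly 5,400 clicks per second, so the stated peak is a little under twice the average.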
9. Storm Topology
● 40 Supervisors (c1.xlarge instances)
● 35 Bolts, 1 Kafka spout
● 250+ Executors (worker threads)
● 160k+ tuples executed per second
● Storm v0.8.2
● Leiningen v1.7
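The topology numbers can be related to the click rate with some rough arithmetic. This sketch assumes the 160k tuples per second and the 10k clicks per second were measured under comparable (peak) load, which the slides do not state, and it treats the "+" figures as lower bounds.

```python
peak_clicks_per_second = 10_000  # from the Kafka Cluster slide
tuples_per_second = 160_000      # from this slide ("160k+", so a lower bound)
supervisors = 40                 # from this slide
executors = 250                  # from this slide ("250+", so a lower bound)

# How many tuples the topology executes for each incoming click.
tuples_per_click = tuples_per_second / peak_clicks_per_second

# How the work spreads across the cluster.
executors_per_supervisor = executors / supervisors
tuples_per_executor = tuples_per_second / executors
```

On these assumptions each click fans out into about 16 executed tuples, which is plausible for a topology where a click passes through several of the 35 bolts.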
18. Future Plans
● Load testing
● Break topology into smaller pieces
● Move from AWS to private data center