Storm: overview
distributed and fault-tolerant realtime
computation.
Backend Web Berlin
Storm
www.storm-project.net
Storm is a free and open source distributed
realtime computation system.
September BWB Meetup
Use cases
• stream processing
• continuous computation
• distributed RPC
Overview
• free and open source
• integrates with any queuing and
database system
• distributed and scalable
• fault-tolerant
• supports multiple languages
Scalable
Storm topologies are inherently parallel and run across a cluster of machines.
Different parts of the topology can be scaled individually by tweaking their
parallelism.
The "rebalance" command of the "storm" command line client can adjust the
parallelism of running topologies on the fly.
Fault tolerant
When workers die, Storm will automatically restart them.
If a node dies, the worker will be restarted on another node.
The Storm daemons, Nimbus and the Supervisors, are designed to be stateless
and fail-fast.
Guarantees data processing
Storm guarantees every tuple will be fully processed. One of Storm's core
mechanisms is the ability to track the lineage of a tuple as it makes its way
through the topology in an extremely efficient way.
Messages are only replayed when there are failures. Storm's basic abstractions
provide an at-least-once processing guarantee, the same guarantee you get
when using a queueing system.
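The lineage tracking mentioned above can be sketched in a few lines. Storm's documentation describes an acker that keeps one 64-bit value per spout tuple and XORs tuple ids into it, once when a tuple is created and once when it is acked; the value returns to zero when the whole tuple tree is done. The class below is a simplified model of that idea, not Storm's actual code; the class name, method names, and ids are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of XOR-based tuple-tree tracking (illustrative, not Storm's acker). */
final class AckTrackerSketch {
    // One 64-bit checksum per spout tuple, keyed by the spout tuple's root id.
    private final Map<Long, Long> pending = new HashMap<>();

    /** Called once when a tuple is created (anchored) and once when it is acked. */
    void xor(long rootId, long tupleId) {
        pending.merge(rootId, tupleId, (a, b) -> a ^ b);
    }

    /** The tree is complete when every tuple id has been XORed in exactly twice. */
    boolean isFullyProcessed(long rootId) {
        Long value = pending.get(rootId);
        return value != null && value == 0L;
    }

    public static void main(String[] args) {
        AckTrackerSketch tracker = new AckTrackerSketch();
        long root = 1L;
        tracker.xor(root, 0xA1L); // spout emits tuple A1
        tracker.xor(root, 0xB2L); // a bolt emits B2 anchored to A1
        tracker.xor(root, 0xA1L); // the bolt acks A1
        System.out.println(tracker.isFullyProcessed(root)); // false: B2 is still outstanding
        tracker.xor(root, 0xB2L); // a downstream bolt acks B2
        System.out.println(tracker.isFullyProcessed(root)); // true
    }
}
```

The memory cost is constant per spout tuple regardless of how large the tuple tree grows, which is why the tracking is cheap.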
Use with many languages
Storm was designed from the ground up to be usable with any programming
language.
Spouts and bolts can be defined in any language. Non-JVM spouts and bolts
communicate with Storm through a JSON-based protocol over stdin/stdout.
Adapters that implement this protocol exist for Ruby, Python, JavaScript, Perl,
and PHP.
How Storm works: Storm cluster
[Diagram: a Storm cluster with one Nimbus node, three ZooKeeper nodes, and five Supervisor nodes]
How Storm works: Basic concepts
Topology
Topology is a graph of computation. A topology runs forever, or until you kill it.
Stream
Stream is an unbounded sequence of tuples.
Spout
Spout is a source of streams.
Bolt
Bolt is the place where calculations are done. Bolts can do anything: run
functions, filter tuples, do streaming aggregations and joins, talk to databases, etc.
How Storm works: Basic concepts
Worker process
A worker process executes a subset of a topology. A worker process belongs to
a specific topology and may run one or more executors for one or more
components (spouts or bolts) of this topology.
Executor (thread)
Executor is a thread that is spawned by a worker process. It may run 1+ tasks
for the same component. It always has 1 thread that it uses for all of its tasks.
Task
Task performs the actual data processing – each spout or bolt that you implement in
your code executes as many tasks across the cluster. The number of tasks for a
component is always the same throughout the lifetime of a topology.
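To make the executor/task relationship concrete, here is a minimal sketch of how a fixed number of tasks can be spread over a smaller number of executor threads, for example a bolt with 2 executors and 3 tasks. The even, contiguous split is an assumption made for illustration and is not taken from Storm's scheduler; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative split of task ids over executors (not Storm's scheduler code). */
final class TaskDistributionSketch {
    /** Splits task ids 1..numTasks into numExecutors contiguous groups of near-equal size. */
    static List<List<Integer>> distribute(int numTasks, int numExecutors) {
        List<List<Integer>> executors = new ArrayList<>();
        int base = numTasks / numExecutors;   // every executor gets at least this many tasks
        int extra = numTasks % numExecutors;  // the first 'extra' executors get one more
        int nextTask = 1;
        for (int e = 0; e < numExecutors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> tasks = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                tasks.add(nextTask++);
            }
            executors.add(tasks);
        }
        return executors;
    }

    public static void main(String[] args) {
        System.out.println(distribute(3, 2)); // [[1, 2], [3]]
        System.out.println(distribute(6, 2)); // [[1, 2, 3], [4, 5, 6]]
    }
}
```

Because the task count stays fixed for the lifetime of the topology, changing parallelism at runtime only changes how many executors share those tasks.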
How Storm works: Basic concepts
Tasks per component (from the diagram):
• Spout: 2 tasks
• BoltA: 3 tasks
• BoltB: 2 tasks
• BoltC: 6 tasks
• BoltD: 3 tasks
• BoltE: 2 tasks
• BoltF: 1 task
How Storm works: Topology Example
// Imports use the Storm 0.x packages (backtype.storm); Storm 1.0+ uses org.apache.storm.
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;
public class DemoTopology {
    public static void main(String[] args) throws Exception {
        Config conf = new Config();
        TopologyBuilder builder = new TopologyBuilder();
        // Output streams cannot be declared on the builder; each component declares
        // them in its own declareOutputFields(). The comments list what is expected.
        // Spout: default stream ("uid", "item") and stream "item_copy" ("uid", "item")
        builder.setSpout("Spout", new DemoSpout(), 2).setNumTasks(2);
        builder.setBolt("BoltA", new BoltA(), 2).setNumTasks(3).shuffleGrouping("Spout", "item_copy");
        // BoltB: default stream ("uid", "fromB")
        builder.setBolt("BoltB", new BoltB(), 2).setNumTasks(2).shuffleGrouping("Spout");
        // BoltC: default stream ("uid", "fromC")
        builder.setBolt("BoltC", new BoltC(), 2).setNumTasks(6).shuffleGrouping("BoltA");
        // BoltD: streams "forE" ("uid", "text") and "forF" ("uid", "text", "ne")
        builder.setBolt("BoltD", new BoltD(), 3).setNumTasks(3).shuffleGrouping("BoltC")
               .fieldsGrouping("BoltC", new Fields("uid")).fieldsGrouping("BoltB", new Fields("uid"));
        builder.setBolt("BoltE", new BoltE(), 1).setNumTasks(2).shuffleGrouping("BoltD", "forE");
        builder.setBolt("BoltF", new BoltF(), 1).setNumTasks(1).shuffleGrouping("BoltD", "forF");
        StormSubmitter.submitTopology("demoTopology", conf, builder.createTopology());
    }
}
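The fieldsGrouping calls in the topology example route tuples by their "uid" field, so that all tuples with the same uid reach the same task of the receiving bolt. A minimal sketch of that idea follows, assuming a single grouping field and a plain hash-modulo choice; Storm hashes the list of selected field values, and the names below are hypothetical.

```java
import java.util.Objects;

/** Illustrative fields grouping: equal field values always map to the same task. */
final class FieldsGroupingSketch {
    /** Picks a target task index in [0, numTasks) from the grouping field value. */
    static int chooseTask(Object fieldValue, int numTasks) {
        // floorMod keeps the index non-negative even for negative hash codes
        return Math.floorMod(Objects.hashCode(fieldValue), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 3; // for example, a bolt that runs 3 tasks
        for (String uid : new String[] {"user-1", "user-2", "user-1"}) {
            System.out.println(uid + " -> task " + chooseTask(uid, numTasks));
        }
    }
}
```

With shuffleGrouping, by contrast, tuples are spread across the target tasks without regard to their content, which suits stateless steps.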
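How Storm works: Spout Example
// Imports use the Storm 0.x packages (backtype.storm); Storm 1.0+ uses org.apache.storm.
import java.util.Map;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
public class DemoSpout extends BaseRichSpout {
    // …. (elided on the slide; the _collector and _queue fields are presumably declared here)
    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _collector = collector;
        _queue = new MyFavoritQueue<String>(); // placeholder queue class from the slide
    }
    @Override
    public void nextTuple() {
        String nextItem = _queue.poll();
        if (nextItem == null) {
            return; // nothing queued; do not emit a null value
        }
        _collector.emit(new Values(nextItem));
    }
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("item"));
    }
}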
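How Storm works: Bolt Example
// Imports use the Storm 0.x packages (backtype.storm); Storm 1.0+ uses org.apache.storm.
import java.util.Map;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
public class BoltA extends BaseRichBolt {
    private OutputCollector _collector;
    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        _collector = collector; // keep the collector for use in execute()
    }
    @Override
    public void execute(Tuple tuple) {
        Object obj = tuple.getValue(0);
        String capitalizedItem = capitalize((String) obj); // capitalize() is a helper not shown on the slide
        _collector.emit(tuple, new Values(capitalizedItem)); // anchor the new tuple to the input tuple
        _collector.ack(tuple); // mark the input tuple as processed
    }
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("item"));
    }
}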
Storm UI
Read More about Storm
• Storm
http://storm-project.net/
• Example Storm Topologies
https://github.com/nathanmarz/storm-starter
• Implementing Real-Time Trending Topics With a Distributed Rolling Count Algorithm
http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/
• Understanding the Internal Message Buffers of Storm
http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/
• Understanding the Parallelism of a Storm Topology
http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
Storm in our company
ferret-go.com
Ferret go GmbH
Trend & Media Analytics
Our data flow (simplified)
[Diagram]
Sources: Twitter, Facebook, Google+, blogs, comments, online media, offline media, reviews
Stages: processing, classification, analyzing
Storage: ElasticSearch
Problem overview
• we have a number of streams that spout items
• for every item we do different calculations
• at the end of the calculations we save the item to
storage (ElasticSearch, PostgreSQL, etc.)
• if processing fails because of an environment
issue, we want to re-queue the item easily
• some of our calculations can be done in parallel
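The re-queue requirement maps naturally onto the ack and fail callbacks that a Storm spout receives for each message id. The sketch below models that pattern in plain Java with no Storm dependency: an in-memory queue stands in for the real queue, and the class and method names are hypothetical, not the code used in production.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Illustrative re-queue-on-failure pattern, mirroring a spout's ack/fail callbacks. */
final class RequeueSketch {
    private final Queue<String> queue = new ArrayDeque<>();    // stands in for the real queue
    private final Map<Long, String> pending = new HashMap<>(); // emitted but not yet acked
    private long nextId = 0;

    void offer(String item) {
        queue.add(item);
    }

    /** Like nextTuple(): takes one item and remembers it under a new message id (-1 if empty). */
    long emitNext() {
        String item = queue.poll();
        if (item == null) {
            return -1;
        }
        long id = nextId++;
        pending.put(id, item);
        return id;
    }

    /** Like ack(msgId): the item was fully processed, so forget it. */
    void ack(long id) {
        pending.remove(id);
    }

    /** Like fail(msgId): put the item back so that it is emitted again. */
    void fail(long id) {
        String item = pending.remove(id);
        if (item != null) {
            queue.add(item);
        }
    }

    int queued() { return queue.size(); }
    int inFlight() { return pending.size(); }

    public static void main(String[] args) {
        RequeueSketch spout = new RequeueSketch();
        spout.offer("item-1");
        long id = spout.emitNext();
        spout.fail(id);                     // processing failed downstream
        System.out.println(spout.queued()); // 1: the item is waiting to be replayed
    }
}
```

Since a failed item is emitted again, downstream writes should tolerate duplicates, in line with the at-least-once guarantee.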
Solution
• Redis-based queues for spouting
• 1-2 spouts per topology
• 1 bulk bolt for storage writing per worker
• Storm cluster with 2 nodes:
32 GB RAM, 4-core i7 CPU, Java 7, Ubuntu 12.04
• ~ 20 items per sec (could be increased)
• 3 slots per worker, 198 tasks, 68 executors
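The "bulk bolt" idea is to collect items and write them to storage in batches instead of one request per item. A minimal sketch of such a buffer follows, written as plain Java so that it runs without Storm; the class name, the batch size, and the writer callback are assumptions for illustration and do not describe the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Illustrative batching buffer of the kind a bulk-writing bolt could wrap. */
final class BulkBufferSketch<T> {
    private final int batchSize;
    private final Consumer<List<T>> writer; // for example, one bulk request to the storage
    private final List<T> buffer = new ArrayList<>();

    BulkBufferSketch(int batchSize, Consumer<List<T>> writer) {
        this.batchSize = batchSize;
        this.writer = writer;
    }

    /** Buffers one item and writes the whole batch once it is full. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Writes whatever is buffered; also useful on a timer or at shutdown. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        writer.accept(new ArrayList<>(buffer)); // hand over a copy, then reuse the buffer
        buffer.clear();
    }

    public static void main(String[] args) {
        BulkBufferSketch<String> bulk =
                new BulkBufferSketch<>(2, batch -> System.out.println("write " + batch));
        bulk.add("a");
        bulk.add("b"); // prints: write [a, b]
        bulk.add("c");
        bulk.flush();  // prints: write [c]
    }
}
```

In a real bolt the input tuples of a batch would be acked only after the bulk write succeeds, so that a failed write leads to a replay.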
Thank you!
30.09.2013
September BWB Meetup
Andrii Gakhov

More Related Content

What's hot

Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormMd. Shamsur Rahim
 
Storm Real Time Computation
Storm Real Time ComputationStorm Real Time Computation
Storm Real Time ComputationSonal Raj
 
Improved Reliable Streaming Processing: Apache Storm as example
Improved Reliable Streaming Processing: Apache Storm as exampleImproved Reliable Streaming Processing: Apache Storm as example
Improved Reliable Streaming Processing: Apache Storm as exampleDataWorks Summit/Hadoop Summit
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceP. Taylor Goetz
 
Introduction to Twitter Storm
Introduction to Twitter StormIntroduction to Twitter Storm
Introduction to Twitter StormUwe Printz
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureP. Taylor Goetz
 
Apache Storm Concepts
Apache Storm ConceptsApache Storm Concepts
Apache Storm ConceptsAndré Dias
 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignMichael Noll
 
Realtime processing with storm presentation
Realtime processing with storm presentationRealtime processing with storm presentation
Realtime processing with storm presentationGabriel Eisbruch
 
Storm: The Real-Time Layer - GlueCon 2012
Storm: The Real-Time Layer  - GlueCon 2012Storm: The Real-Time Layer  - GlueCon 2012
Storm: The Real-Time Layer - GlueCon 2012Dan Lynn
 
Developing Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache StormDeveloping Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache StormLester Martin
 

What's hot (19)

Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache Storm
 
Storm Real Time Computation
Storm Real Time ComputationStorm Real Time Computation
Storm Real Time Computation
 
Apache Storm Tutorial
Apache Storm TutorialApache Storm Tutorial
Apache Storm Tutorial
 
Improved Reliable Streaming Processing: Apache Storm as example
Improved Reliable Streaming Processing: Apache Storm as exampleImproved Reliable Streaming Processing: Apache Storm as example
Improved Reliable Streaming Processing: Apache Storm as example
 
Apache Storm Internals
Apache Storm InternalsApache Storm Internals
Apache Storm Internals
 
STORM
STORMSTORM
STORM
 
Cassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market SceinceCassandra and Storm at Health Market Sceince
Cassandra and Storm at Health Market Sceince
 
Introduction to Twitter Storm
Introduction to Twitter StormIntroduction to Twitter Storm
Introduction to Twitter Storm
 
Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Resource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache StormResource Aware Scheduling in Apache Storm
Resource Aware Scheduling in Apache Storm
 
Hadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm ArchitectureHadoop Summit Europe 2014: Apache Storm Architecture
Hadoop Summit Europe 2014: Apache Storm Architecture
 
Apache Storm Concepts
Apache Storm ConceptsApache Storm Concepts
Apache Storm Concepts
 
Storm and Cassandra
Storm and Cassandra Storm and Cassandra
Storm and Cassandra
 
Apache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - VerisignApache Storm 0.9 basic training - Verisign
Apache Storm 0.9 basic training - Verisign
 
Twitter Big Data
Twitter Big DataTwitter Big Data
Twitter Big Data
 
Storm
StormStorm
Storm
 
Realtime processing with storm presentation
Realtime processing with storm presentationRealtime processing with storm presentation
Realtime processing with storm presentation
 
Storm: The Real-Time Layer - GlueCon 2012
Storm: The Real-Time Layer  - GlueCon 2012Storm: The Real-Time Layer  - GlueCon 2012
Storm: The Real-Time Layer - GlueCon 2012
 
Developing Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache StormDeveloping Java Streaming Applications with Apache Storm
Developing Java Streaming Applications with Apache Storm
 

Viewers also liked

Тонкости email-маркетинга на крупных проектах
Тонкости email-маркетинга на крупных проектахТонкости email-маркетинга на крупных проектах
Тонкости email-маркетинга на крупных проектахPromodo
 
歡迎來到妖精們的根據地
歡迎來到妖精們的根據地歡迎來到妖精們的根據地
歡迎來到妖精們的根據地coporo
 
Definition of News from William Hearst
Definition of News from William HearstDefinition of News from William Hearst
Definition of News from William HearstClive Dickens
 
Create Your LinkedIn Group
Create Your LinkedIn GroupCreate Your LinkedIn Group
Create Your LinkedIn GroupTariq Ahmad
 
A Hole in One: Courtney Love Charms #CannesLions #OgilvyCannes
A Hole in One: Courtney Love Charms #CannesLions #OgilvyCannes A Hole in One: Courtney Love Charms #CannesLions #OgilvyCannes
A Hole in One: Courtney Love Charms #CannesLions #OgilvyCannes Ogilvy
 
Information Literacy in Higher Education
Information Literacy in Higher EducationInformation Literacy in Higher Education
Information Literacy in Higher EducationHELIGLIASA
 
Eye Tracking the User Experience of Mobile: What You Need to Know
Eye Tracking the User Experience of Mobile: What You Need to KnowEye Tracking the User Experience of Mobile: What You Need to Know
Eye Tracking the User Experience of Mobile: What You Need to KnowJennifer Romano Bergstrom
 
Sirje Tamm - Muutused haridusmaastikul õpetaja pilgu läbi
Sirje Tamm - Muutused haridusmaastikul õpetaja pilgu läbiSirje Tamm - Muutused haridusmaastikul õpetaja pilgu läbi
Sirje Tamm - Muutused haridusmaastikul õpetaja pilgu läbilepakas
 
Sirje Aigro - Millised muutused ootavad ees meie kooli
Sirje Aigro - Millised muutused ootavad ees meie kooliSirje Aigro - Millised muutused ootavad ees meie kooli
Sirje Aigro - Millised muutused ootavad ees meie koolilepakas
 
Optimizing task completion times
Optimizing task completion times Optimizing task completion times
Optimizing task completion times Kiran Badam
 
Using Social Media For Commodity Affiliate Programs
Using Social Media For Commodity Affiliate ProgramsUsing Social Media For Commodity Affiliate Programs
Using Social Media For Commodity Affiliate ProgramsAffiliate Summit
 
Steam Eye Mask help you solve eye problem
Steam Eye Mask help you solve eye problemSteam Eye Mask help you solve eye problem
Steam Eye Mask help you solve eye problemSelina Li
 
Презентация проекта Аргус-М
Презентация проекта Аргус-МПрезентация проекта Аргус-М
Презентация проекта Аргус-Мkulibin
 

Viewers also liked (18)

Тонкости email-маркетинга на крупных проектах
Тонкости email-маркетинга на крупных проектахТонкости email-маркетинга на крупных проектах
Тонкости email-маркетинга на крупных проектах
 
歡迎來到妖精們的根據地
歡迎來到妖精們的根據地歡迎來到妖精們的根據地
歡迎來到妖精們的根據地
 
Definition of News from William Hearst
Definition of News from William HearstDefinition of News from William Hearst
Definition of News from William Hearst
 
Evaluation001
Evaluation001Evaluation001
Evaluation001
 
Zaragoza turismo 239
Zaragoza turismo 239Zaragoza turismo 239
Zaragoza turismo 239
 
Create Your LinkedIn Group
Create Your LinkedIn GroupCreate Your LinkedIn Group
Create Your LinkedIn Group
 
A Hole in One: Courtney Love Charms #CannesLions #OgilvyCannes
A Hole in One: Courtney Love Charms #CannesLions #OgilvyCannes A Hole in One: Courtney Love Charms #CannesLions #OgilvyCannes
A Hole in One: Courtney Love Charms #CannesLions #OgilvyCannes
 
میوه آرایی
میوه آراییمیوه آرایی
میوه آرایی
 
Information Literacy in Higher Education
Information Literacy in Higher EducationInformation Literacy in Higher Education
Information Literacy in Higher Education
 
Eye Tracking the User Experience of Mobile: What You Need to Know
Eye Tracking the User Experience of Mobile: What You Need to KnowEye Tracking the User Experience of Mobile: What You Need to Know
Eye Tracking the User Experience of Mobile: What You Need to Know
 
Alliance 0
Alliance 0Alliance 0
Alliance 0
 
Sirje Tamm - Muutused haridusmaastikul õpetaja pilgu läbi
Sirje Tamm - Muutused haridusmaastikul õpetaja pilgu läbiSirje Tamm - Muutused haridusmaastikul õpetaja pilgu läbi
Sirje Tamm - Muutused haridusmaastikul õpetaja pilgu läbi
 
Sirje Aigro - Millised muutused ootavad ees meie kooli
Sirje Aigro - Millised muutused ootavad ees meie kooliSirje Aigro - Millised muutused ootavad ees meie kooli
Sirje Aigro - Millised muutused ootavad ees meie kooli
 
Optimizing task completion times
Optimizing task completion times Optimizing task completion times
Optimizing task completion times
 
Zaragoza turismo 231
Zaragoza turismo 231Zaragoza turismo 231
Zaragoza turismo 231
 
Using Social Media For Commodity Affiliate Programs
Using Social Media For Commodity Affiliate ProgramsUsing Social Media For Commodity Affiliate Programs
Using Social Media For Commodity Affiliate Programs
 
Steam Eye Mask help you solve eye problem
Steam Eye Mask help you solve eye problemSteam Eye Mask help you solve eye problem
Steam Eye Mask help you solve eye problem
 
Презентация проекта Аргус-М
Презентация проекта Аргус-МПрезентация проекта Аргус-М
Презентация проекта Аргус-М
 

Similar to BWB Meetup: Storm - distributed realtime computation system

Cleveland HUG - Storm
Cleveland HUG - StormCleveland HUG - Storm
Cleveland HUG - Stormjustinjleet
 
storm-170531123446.dotx.pptx
storm-170531123446.dotx.pptxstorm-170531123446.dotx.pptx
storm-170531123446.dotx.pptxIbrahimBenhadhria
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormDavorin Vukelic
 
Distributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache StormDistributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache Stormthe100rabh
 
Real time stream processing presentation at General Assemb.ly
Real time stream processing presentation at General Assemb.lyReal time stream processing presentation at General Assemb.ly
Real time stream processing presentation at General Assemb.lyVarun Vijayaraghavan
 
Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinaloscon2007
 
Os Reindersfinal
Os ReindersfinalOs Reindersfinal
Os Reindersfinaloscon2007
 
Sinfonier: How I turned my grandmother into a data analyst - Fran J. Gomez - ...
Sinfonier: How I turned my grandmother into a data analyst - Fran J. Gomez - ...Sinfonier: How I turned my grandmother into a data analyst - Fran J. Gomez - ...
Sinfonier: How I turned my grandmother into a data analyst - Fran J. Gomez - ...Codemotion
 
Recursion & Erlang, FunctionalConf 14, Bangalore
Recursion & Erlang, FunctionalConf 14, BangaloreRecursion & Erlang, FunctionalConf 14, Bangalore
Recursion & Erlang, FunctionalConf 14, BangaloreBhasker Kode
 

BWB Meetup: Storm - distributed realtime computation system

  • 1. Storm: overview distributed and fault-tolerant realtime computation. Backend Web Berlin
  • 2. Storm www.storm-project.net Storm is a free and open source distributed realtime computation system. September BWB Meetup
  • 3. Use cases: distributed RPC, continuous computations, stream processing
  • 4. Overview
      • free and open source
      • integrates with any queuing and database system
      • distributed and scalable
      • fault-tolerant
      • supports multiple languages
  • 5. Scalable Storm topologies are inherently parallel and run across a cluster of machines. Different parts of the topology can be scaled individually by tweaking their parallelism. The "rebalance" command of the "storm" command line client can adjust the parallelism of running topologies on the fly.
  • 6. Fault tolerant When workers die, Storm will automatically restart them. If a node dies, the worker will be restarted on another node. The Storm daemons, Nimbus and the Supervisors, are designed to be stateless and fail-fast.
  • 7. Guarantees data processing Storm guarantees every tuple will be fully processed. One of Storm's core mechanisms is the ability to track the lineage of a tuple as it makes its way through the topology in an extremely efficient way. Messages are only replayed when there are failures. Storm's basic abstractions provide an at-least-once processing guarantee, the same guarantee you get when using a queueing system.
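The lineage tracking mentioned on slide 7 can be illustrated with a toy model. Storm's documentation describes an acker that keeps one 64-bit value per spout tuple, XOR-ing in every tuple id when it is emitted and again when it is acked; the value returns to zero once the whole tuple tree has been processed. The sketch below assumes that scheme and is not Storm's actual code; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of XOR-based lineage tracking (hypothetical names, not Storm's acker). */
class AckTrackerSketch {
    // Per spout tuple: XOR of all tuple ids emitted or acked so far.
    private final Map<Long, Long> ackVals = new HashMap<>();

    /** Registers a spout tuple; real ids would be random 64-bit values. */
    void track(long rootId, long tupleId) {
        ackVals.put(rootId, tupleId);
    }

    /**
     * A bolt acks inputId after emitting the given anchored child tuples.
     * Returns true once every tuple in the tree has been acked.
     * Assumes rootId was registered with track() first.
     */
    boolean ack(long rootId, long inputId, long... childIds) {
        long val = ackVals.get(rootId) ^ inputId;
        for (long child : childIds) {
            val ^= child; // each child id is XOR-ed in now and out again when it is acked
        }
        if (val == 0L) {
            ackVals.remove(rootId);
            return true;
        }
        ackVals.put(rootId, val);
        return false;
    }

    boolean isPending(long rootId) {
        return ackVals.containsKey(rootId);
    }
}
```

The memory cost per spout tuple stays constant however large the tuple tree grows, which is one way to read "extremely efficient" on the slide. A spout tuple whose value never reaches zero before a timeout would be the one that gets replayed.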
  • 8. Use with many languages Storm was designed from the ground up to be usable with any programming language. Spouts and bolts can be defined in any language. Non-JVM spouts and bolts communicate with Storm through a JSON-based protocol over stdin/stdout. Adapters that implement this protocol exist for Ruby, Python, JavaScript, Perl, and PHP.
  • 9. How Storm works: Storm cluster [diagram: one Nimbus, three Zookeeper nodes, five Supervisors]
  • 10. How Storm works: Basic concepts
      • Topology: a graph of computation. A topology runs forever, or until you kill it.
      • Stream: an unbounded sequence of tuples.
      • Spout: a source of streams.
      • Bolt: the place where calculations are done. Bolts can run functions, filter tuples, do streaming aggregations and joins, talk to databases, and so on.
  • 11. How Storm works: Basic concepts
      • Worker process: executes a subset of a topology. A worker process belongs to a specific topology and may run one or more executors for one or more components (spouts or bolts) of this topology.
      • Executor (thread): a thread spawned by a worker process. It may run one or more tasks for the same component, and it always uses a single thread for all of its tasks.
      • Task: performs the actual data processing. Each spout or bolt that you implement runs as a number of tasks across the cluster. The number of tasks for a component stays the same throughout the lifetime of a topology.
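To make the executor/task relationship from slide 11 concrete, here is a small sketch of how a fixed number of tasks might be spread over a component's executors. The contiguous, as-even-as-possible split is an assumption made for illustration, and `TaskLayoutSketch` is a hypothetical name; Storm's scheduler may place tasks differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative split of task indices over executors (assumed layout, not Storm's scheduler). */
class TaskLayoutSketch {
    /** Splits task indices 0..numTasks-1 into numExecutors groups whose sizes differ by at most one. */
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numTasks < numExecutors) {
            throw new IllegalArgumentException("need at least one task per executor");
        }
        List<List<Integer>> executors = new ArrayList<>();
        int next = 0;
        for (int e = 0; e < numExecutors; e++) {
            // The first (numTasks % numExecutors) executors take one extra task.
            int size = numTasks / numExecutors + (e < numTasks % numExecutors ? 1 : 0);
            List<Integer> tasks = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                tasks.add(next++);
            }
            executors.add(tasks);
        }
        return executors;
    }
}
```

With 6 tasks and 2 executors, each executor thread runs three tasks one after another. Because the task count is fixed for the lifetime of the topology, it acts as an upper bound on how many executors a later rebalance could use for that component.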
  • 12. How Storm works: Basic concepts [diagram: tasks per component] Spout: 2 tasks; BoltA: 3 tasks; BoltB: 2 tasks; BoltC: 6 tasks; BoltD: 3 tasks; BoltE: 2 tasks; BoltF: 1 task
  • 13. How Storm works: Topology example
        // Package names are those of Storm 0.8/0.9 (org.apache.storm.* in Storm 1.0 and later).
        import backtype.storm.Config;
        import backtype.storm.StormSubmitter;
        import backtype.storm.topology.TopologyBuilder;
        import backtype.storm.tuple.Fields;

        class DemoTopology {
            public static void main(String[] args) throws Exception {
                Config conf = new Config();
                TopologyBuilder builder = new TopologyBuilder();
                // Arguments: component id, instance, parallelism hint (executors); setNumTasks sets the task count.
                // Output streams are declared in each component's declareOutputFields(), not on the builder:
                //   Spout: default stream ("uid", "item") and stream "item_copy" ("uid", "item")
                //   BoltB: default stream ("uid", "fromB"); BoltC: default stream ("uid", "fromC")
                //   BoltD: stream "forE" ("uid", "text") and stream "forF" ("uid", "text", "ne")
                builder.setSpout("Spout", new DemoSpout(), 2).setNumTasks(2);
                builder.setBolt("BoltA", new BoltA(), 2).setNumTasks(3).shuffleGrouping("Spout", "item_copy");
                builder.setBolt("BoltB", new BoltB(), 2).setNumTasks(2).shuffleGrouping("Spout");
                builder.setBolt("BoltC", new BoltC(), 2).setNumTasks(6).shuffleGrouping("BoltA");
                builder.setBolt("BoltD", new BoltD(), 3).setNumTasks(3).shuffleGrouping("BoltC")
                        .fieldsGrouping("BoltC", new Fields("uid")).fieldsGrouping("BoltB", new Fields("uid"));
                builder.setBolt("BoltE", new BoltE(), 1).setNumTasks(2).shuffleGrouping("BoltD", "forE");
                builder.setBolt("BoltF", new BoltF(), 1).setNumTasks(1).shuffleGrouping("BoltD", "forF");
                StormSubmitter.submitTopology("demoTopology", conf, builder.createTopology());
            }
        }
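The topology example routes tuples to BoltD with a fields grouping on "uid", which means tuples with the same uid always reach the same BoltD task. A minimal sketch of that idea follows, assuming the target task is picked by hashing the grouping values modulo the task count. `FieldsGroupingSketch` and `chooseTask` are hypothetical names, and Storm's own implementation may differ in detail.

```java
import java.util.List;

/** Illustrative hash partitioning for a fields grouping (hypothetical helper, not Storm's API). */
class FieldsGroupingSketch {
    /** Picks a task index in [0, numTasks) from the values of the grouping fields. */
    static int chooseTask(List<?> groupingValues, int numTasks) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }
}
```

A shuffle grouping, by contrast, spreads tuples over the target tasks without looking at their contents, so it suits stateless steps, while a fields grouping suits per-key state such as counts per uid.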
  • 14. How Storm works: Spout example
        public class DemoSpout extends BaseRichSpout {
            private SpoutOutputCollector _collector;
            private MyFavoritQueue<String> _queue;  // queue class named on the slide; its implementation is not shown
            ….
            @Override
            public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
                _collector = collector;
                _queue = new MyFavoritQueue<String>();
            }
            @Override
            public void nextTuple() {
                String nextItem = _queue.poll();
                if (nextItem != null) {  // assumes poll() returns null when the queue is empty
                    _collector.emit(new Values(nextItem));
                }
            }
            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("item"));
            }
        }
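  • 15. How Storm works: Bolt example
        public class BoltA extends BaseRichBolt {
            private OutputCollector _collector;
            @Override
            public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
                _collector = collector;  // the slide omits prepare(); without it _collector is never set
            }
            @Override
            public void execute(Tuple tuple) {
                Object obj = tuple.getValue(0);
                String capitalizedItem = capitalize((String) obj);  // capitalize() is a helper not shown on the slide
                _collector.emit(tuple, new Values(capitalizedItem));  // anchor the new tuple to its input
                _collector.ack(tuple);
            }
            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("item"));
            }
        }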
  • 17. Read more about Storm
      • Storm: http://storm-project.net/
      • Example Storm Topologies: https://github.com/nathanmarz/storm-starter
      • Implementing Real-Time Trending Topics With a Distributed Rolling Count Algorithm: http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/
      • Understanding the Internal Message Buffers of Storm: http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/
      • Understanding the Parallelism of a Storm Topology: http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-storm-topology/
  • 18. Storm in our company ferret-go.com
  • 19. Ferret go GmbH Trend & Media Analytics ferret-go.com
  • 20. Our data flow (simplified) [diagram] Sources: Twitter, Facebook, Google+, blogs, comments, online media, offline media, reviews. Steps: processing, classification, analyzing. Storage: ElasticSearch (shown three times).
  • 21. Problem overview [diagram: Google+, Twitter, Facebook]
      • we have a number of streams that spout items
      • for every item we do different calculations
      • at the end of the calculations we save the item into storage(s): ElasticSearch, PostgreSQL etc.
      • if processing fails because of environment issues, we want to re-queue the item easily
      • some of our calculations can be done in parallel
  • 22. Solution
      • Redis-based queues for spouting
      • 1-2 spouts per topology
      • 1 bulk bolt for storage writing per worker
      • Storm cluster with 2 nodes: 32 GB, CPU 4C-i7, Java 7, Ubuntu 12.04
      • ~20 items per second (could be increased)
      • 3 slots per worker, 198 tasks, 68 executors
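The re-queue requirement from the problem overview fits Storm's spout callbacks: a spout that emits each tuple with a message id is told later whether that tuple was acked or failed. The sketch below models only that bookkeeping in plain Java. An in-memory queue stands in for the Redis-based queue, and `RequeueSketch` with all its method names is hypothetical; the deck does not show the actual spout.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Bookkeeping a queue-backed spout might keep (hypothetical sketch, in-memory stand-in for Redis). */
class RequeueSketch {
    private final Queue<String> queue = new ArrayDeque<>();     // stand-in for the Redis-based queue
    private final Map<Long, String> pending = new HashMap<>();  // emitted but not yet acked or failed
    private long nextId = 0;

    void enqueue(String item) {
        queue.add(item);
    }

    /** Mirrors nextTuple(): returns the message id of the emitted item, or -1 if the queue is empty. */
    long emitNext() {
        String item = queue.poll();
        if (item == null) {
            return -1;
        }
        long id = nextId++;
        pending.put(id, item);
        return id;
    }

    /** Mirrors ack(msgId): the item was fully processed and can be forgotten. */
    void ack(long id) {
        pending.remove(id);
    }

    /** Mirrors fail(msgId): put the item back on the queue so it is processed again. */
    void fail(long id) {
        String item = pending.remove(id);
        if (item != null) {
            queue.add(item);
        }
    }

    int queued() {
        return queue.size();
    }

    int inFlight() {
        return pending.size();
    }
}
```

Since a failed item is processed again, the storage-writing step at the end of the topology would need to tolerate duplicates, which matches the at-least-once guarantee described on slide 7.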
  • 23. Thank you! 30.09.2013 September BWB Meetup Andrii Gakhov
  • 24. Storm: overview distributed and fault-tolerant realtime computation. Backend Web Berlin
  • 25. Storm www.storm-project.net Storm is a free and open source distributed realtime computation system. September BWB Meetup
  • 26. Use cases distributed RPC continuous computationsstream processing
  • 27. Overview • free and open source • integrates with any queuing and database system • distributed and scalable • fault-tolerant • supports multiple languages
  • 28. Scalable Storm topologies are inherently parallel and run across a cluster of machines. Different parts of the topology can be scaled individually by tweaking their parallelism. The "rebalance" command of the "storm" command line client can adjust the parallelism of running topologies on the fly.
  • 29. Fault tolerant When workers die, Storm will automatically restart them. If a node dies, the worker will be restarted on another node. The Storm daemons, Nimbus and the Supervisors, are designed to be stateless and fail-fast.
  • 30. Guarantees data processing Storm guarantees every tuple will be fully processed. One of Storm's core mechanisms is the ability to track the lineage of a tuple as it makes its way through the topology in an extremely efficient way. Messages are only replayed when there are failures. Storm's basic abstractions provide an at-least-once processing guarantee, the same guarantee you get when using a queueing system.
  • 31. Use with many languages Storm was designed from the ground up to be usable with any programming language. Similarly, spouts and bolts can be defined in any language. Non-JVM spouts and bolts communicate to Storm over a JSON-based protocol over stdin/stdout. Adapters that implement this protocol exist for Ruby, Python, Javascript, Perl, and PHP.
  • 32. How Storm works? Storm cluster Zookeeper Zookeeper Zookeeper Supervisor Supervisor Supervisor Supervisor Supervisor Nimbus
  • 33. How Storm works? Basic concepts Topology Topology is a graph of computation. A topology runs forever, or until you kill it. Stream Stream is an unbounded sequence of tuples. Spout Spout is a source of streams. Bolt Bolt is the place where calculations are done. Bolts can do anything from run functions, filter tuples, do streaming aggregations, joins, talk to databases etc.
  • 34. How Storm works? Basic concepts Worker process A worker process executes a subset of a topology. A worker process belongs to a specific topology and may run one or more executors for one or more components (spouts or bolts) of this topology. Executor (thread) Executor is a thread that is spawned by a worker process. It may run 1+ tasks for the same component. It always has 1 thread that it uses for all of its tasks. Task Task performs the actual data processing – each spout or bolt that you implement in your code executes as many tasks across the cluster. The number of tasks for a component is always the same throughout the lifetime of a topology.
  • 35. How Storm works? Basic concepts Spout Task1 Task2 BoltA Task1 Task2 Task3 BoltB Task1 Task2 BoltC Task1 Task2 Task3 Task4 Task5 Task6 BoltD Task1 Task2 Task3 BoltE Task1 Task2 BoltF Task1
  • 36. How Storm works? Topology Example class DemoTopology { TopologyBuilder builder = new TopologyBuilder(); builder.setSpout(“Spout", new DemoSpout(), 2).setNumTasks(2) .declareDefaultStream("uid", "item").declareStream(“item_copy", “uid”, "item"); builder.setBolt(“BoltA", new BoltA(), 2).setNumTasks(3).shuffleGrouping(“Spout“, “item_copy”); builder.setBolt(“BoltB", new BoltB(), 2).setNumTasks(2).shuffleGrouping(“Spout") .declareDefaultStream("uid", “fromB"); builder.setBolt(“BoltC", new BoltC(), 2).setNumTasks(6).shuffleGrouping(“BoltA") .declareDefaultStream("uid", “fromC"); builder.setBolt(“BoltD", new BoltD(), 3).setNumTasks(3).shuffleGrouping(“BoltC") .fieldsGrouping( “BoltC", new Fields("uid")).fieldsGrouping( “BoltB", new Fields("uid")) .declareStream("forD", "uid", "text").declareStream("forF", "uid", "text", "ne"); builder.setBolt(“BoltE", new BoltE(), 1).setNumTasks(2).shuffleGrouping(“BoltD“, “forE”); builder.setBolt(“BoltF", new BoltF(), 1).setNumTasks(1).shuffleGrouping(“BoltD“, “forF”); StormSubmitter.submitTopology(“demoTopology”, conf, builder.createTopology()); }
  • 37. How Storm works? Spout Example public class DemoSpout extends BaseRichSpout { …. @Override public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) { _collector = collector; _queue = new MyFavoritQueue<string>(); } @Override public void nextTuple() { String nextItem = queue.poll(); _collector.emit(new Values(nextItem)); } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields(“item")); } }
  • 38. How Storm works? Bolt Example public class BoltA extends BaseRichBolt { private OutputCollector _collector; @Override public void execute(Tuple tuple) { Object obj = tuple.getValue(0); String capitalizedItem = capitalize((String)obj); _collector.emit(tuple, new Value(capitalizedItem)); _collector.ack(tuple); } @Override public void declareOutputFields(OutputFieldsDeclarer declarer) { declarer.declare(new Fields(“item")); } }
  • 40. Read More about Storm • Storm http://storm-project.net/ • Example Storm Topologies https://github.com/nathanmarz/storm-starter • Implementing Real-Time Trending Topics With a Distributed Rolling Count Algorithm http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending- topics-in-storm/ • Understanding the Internal Message Buffers of Storm http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal- message-buffers/ • Understanding the Parallelism of a Storm Topology http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of- a-storm-topology/
  • 41. Storm in our company ferret-go.com
  • 42. Ferret go GmbH Trend & Media Analytics ferret-go.com
  • 43. Our data flow (simplified) Twitter Facebook Google+ Blogs Comments Online media Offline media Reviews ElasticSearch ElasticSearch ElasticSearch processing classification analyzing
  • 44. Problem overview • we have a number of streams that spout items • for every item we do different calculations • at the end of calculations we save item into storage(s) – ElasticSearch, PostgreSQL etc. • if processing fails because of some environment issues, we want to re-queue item easily • some of our calculations can be done in parallel Google+ Twitter Facebook
  • 45. Solution • Redis-based queues for spouting • 1-2 spouts per topology • 1 bulk bolt for storage writing per worker • Storm cluster with 2 nodes: 32 Gb, CPU 4C-i7, Java 7, Ubuntu 12.04 • ~ 20 items per sec (could be increased) • 3 slots per worker, 198 tasks, 68 executors
  • 46. Thank you! 30.09.2013 September BWB Meetup Andrii Gakhov
  • 47. Storm: overview distributed and fault-tolerant realtime computation. Backend Web Berlin
  • 48. Storm www.storm-project.net Storm is a free and open source distributed realtime computation system. September BWB Meetup
  • 49. Use cases distributed RPC continuous computationsstream processing
  • 50. Overview • free and open source • integrates with any queuing and database system • distributed and scalable • fault-tolerant • supports multiple languages
  • 51. Scalable Storm topologies are inherently parallel and run across a cluster of machines. Different parts of the topology can be scaled individually by tweaking their parallelism. The "rebalance" command of the "storm" command line client can adjust the parallelism of running topologies on the fly.
  • 52. Fault tolerant When workers die, Storm will automatically restart them. If a node dies, the worker will be restarted on another node. The Storm daemons, Nimbus and the Supervisors, are designed to be stateless and fail-fast.
  • 53. Guarantees data processing Storm guarantees every tuple will be fully processed. One of Storm's core mechanisms is the ability to track the lineage of a tuple as it makes its way through the topology in an extremely efficient way. Messages are only replayed when there are failures. Storm's basic abstractions provide an at-least-once processing guarantee, the same guarantee you get when using a queueing system.
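
The at-least-once guarantee on this slide can be illustrated with a toy model. This is a minimal sketch that does not use the Storm API: the class `AtLeastOnceSketch` and its `emit`, `ack`, and `fail` methods are hypothetical names that only mimic the idea of a spout keeping emitted messages pending until they are acknowledged. In the run below, item "b" is processed but its acknowledgement is treated as lost, so it is replayed and processed a second time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Toy model of at-least-once delivery (hypothetical, not Storm code). */
class AtLeastOnceSketch {
    private final Queue<String> queue = new ArrayDeque<>();
    private final Map<Long, String> pending = new HashMap<>(); // emitted, not yet acked
    private long nextId = 0;

    AtLeastOnceSketch(List<String> items) {
        queue.addAll(items);
    }

    /** Emits the next message and returns its id, or -1 if the queue is empty. */
    long emit() {
        String item = queue.poll();
        if (item == null) {
            return -1;
        }
        long id = nextId++;
        pending.put(id, item);
        return id;
    }

    String payload(long id) {
        return pending.get(id);
    }

    /** Fully processed: forget the message. */
    void ack(long id) {
        pending.remove(id);
    }

    /** Processing failed: re-queue the message so it is emitted again. */
    void fail(long id) {
        String item = pending.remove(id);
        if (item != null) {
            queue.add(item);
        }
    }

    /** Processes "a", "b", "c"; the first ack for "b" is lost, so "b" is replayed. */
    static List<String> run() {
        AtLeastOnceSketch spout = new AtLeastOnceSketch(Arrays.asList("a", "b", "c"));
        List<String> processed = new ArrayList<>();
        boolean ackLost = false;
        long id;
        while ((id = spout.emit()) != -1) {
            String item = spout.payload(id);
            processed.add(item); // the side effect happens before the ack
            if (item.equals("b") && !ackLost) {
                ackLost = true;
                spout.fail(id);
            } else {
                spout.ack(id);
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [a, b, c, b]
    }
}
```

Because "b" shows up twice in the result, any storage write at the end of a pipeline like this should tolerate duplicates, for example by being idempotent.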
  • 54. Use with many languages Storm was designed from the ground up to be usable with any programming language. Similarly, spouts and bolts can be defined in any language. Non-JVM spouts and bolts communicate with Storm through a JSON-based protocol over stdin/stdout. Adapters that implement this protocol exist for Ruby, Python, JavaScript, Perl, and PHP.
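
To make the stdin/stdout protocol more concrete, here is a rough sketch of message framing. It assumes the convention described in Storm's multilang documentation, where each JSON message is followed by a line containing only `end`; treat that, and the exact JSON fields, as assumptions to verify against the documentation. The class `MultilangFramingSketch` and its methods are hypothetical, and the JSON is built by hand without escaping, so ids are assumed to contain no quotes.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of "JSON message + end line" framing (hypothetical helper, not Storm code). */
class MultilangFramingSketch {
    /** Appends the assumed terminator line to one JSON message. */
    static String frame(String json) {
        return json + "\nend\n";
    }

    /** Builds a framed ack command for the given tuple id. */
    static String ackMessage(String tupleId) {
        return frame("{\"command\":\"ack\",\"id\":\"" + tupleId + "\"}");
    }

    /** Splits a stream of framed text back into individual JSON messages. */
    static List<String> readMessages(String input) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : input.split("\n")) {
            if (line.equals("end")) {
                messages.add(current.toString());
                current.setLength(0);
            } else {
                if (current.length() > 0) {
                    current.append('\n');
                }
                current.append(line);
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        String wire = ackMessage("42") + frame("{\"command\":\"emit\",\"tuple\":[\"hello\"]}");
        for (String message : readMessages(wire)) {
            System.out.println(message);
        }
    }
}
```

A real adapter would use a JSON library and read from `System.in` line by line; the sketch only shows where one message ends and the next begins.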
  • 55. How Storm works? Storm cluster (diagram) • one Nimbus node • three ZooKeeper nodes • five Supervisor nodes
  • 56. How Storm works? Basic concepts • Topology: a graph of computation. A topology runs forever, or until you kill it. • Stream: an unbounded sequence of tuples. • Spout: a source of streams. • Bolt: the place where calculations are done. Bolts can do anything: run functions, filter tuples, do streaming aggregations and joins, talk to databases, etc.
  • 57. How Storm works? Basic concepts • Worker process: executes a subset of a topology. A worker process belongs to a specific topology and may run one or more executors for one or more components (spouts or bolts) of this topology. • Executor (thread): a thread spawned by a worker process. It may run one or more tasks for the same component, and it always uses a single thread for all of its tasks. • Task: performs the actual data processing. Each spout or bolt that you implement in your code executes as many tasks across the cluster. The number of tasks for a component is always the same throughout the lifetime of a topology.
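
The relationship between tasks and executors can be worked through with a little arithmetic. The sketch below assumes that tasks are split as evenly as possible among a component's executors and that there are never more executors than tasks; both are assumptions about the scheduler, and `ParallelismSketch` is a hypothetical helper rather than part of Storm.

```java
/** Sketch: how many tasks each executor of one component runs (hypothetical helper). */
class ParallelismSketch {
    /**
     * Distributes numTasks over numExecutors as evenly as possible.
     * Assumes the effective executor count is capped at the task count.
     */
    static int[] tasksPerExecutor(int numTasks, int numExecutors) {
        if (numTasks < 1 || numExecutors < 1) {
            throw new IllegalArgumentException("numTasks and numExecutors must be >= 1");
        }
        int executors = Math.min(numTasks, numExecutors);
        int[] counts = new int[executors];
        for (int task = 0; task < numTasks; task++) {
            counts[task % executors]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 3 tasks on 2 executors: one executor runs 2 tasks, the other runs 1
        System.out.println(java.util.Arrays.toString(tasksPerExecutor(3, 2))); // prints [2, 1]
        // 6 tasks on 2 executors: 3 tasks each
        System.out.println(java.util.Arrays.toString(tasksPerExecutor(6, 2))); // prints [3, 3]
    }
}
```

Under these assumptions, raising the executor count of a running topology spreads the same fixed set of tasks over more threads; it does not create new tasks.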
  • 58. How Storm works? Basic concepts (diagram: tasks per component) • Spout: Task1, Task2 • BoltA: Task1, Task2, Task3 • BoltB: Task1, Task2 • BoltC: Task1, Task2, Task3, Task4, Task5, Task6 • BoltD: Task1, Task2, Task3 • BoltE: Task1, Task2 • BoltF: Task1
  • 59. How Storm works? Topology Example
      import backtype.storm.Config;
      import backtype.storm.StormSubmitter;
      import backtype.storm.topology.TopologyBuilder;
      import backtype.storm.tuple.Fields;

      public class DemoTopology {
          public static void main(String[] args) throws Exception {
              TopologyBuilder builder = new TopologyBuilder();
              // Output streams are declared in each component's declareOutputFields():
              //   Spout: default stream ("uid", "item") and "item_copy" ("uid", "item")
              //   BoltB: default stream ("uid", "fromB"); BoltC: default stream ("uid", "fromC")
              //   BoltD: "forE" ("uid", "text") and "forF" ("uid", "text", "ne")
              builder.setSpout("Spout", new DemoSpout(), 2).setNumTasks(2);
              builder.setBolt("BoltA", new BoltA(), 2).setNumTasks(3).shuffleGrouping("Spout", "item_copy");
              builder.setBolt("BoltB", new BoltB(), 2).setNumTasks(2).shuffleGrouping("Spout");
              builder.setBolt("BoltC", new BoltC(), 2).setNumTasks(6).shuffleGrouping("BoltA");
              // BoltD receives tuples with the same uid from BoltB and BoltC in the same task
              builder.setBolt("BoltD", new BoltD(), 3).setNumTasks(3)
                     .fieldsGrouping("BoltC", new Fields("uid"))
                     .fieldsGrouping("BoltB", new Fields("uid"));
              builder.setBolt("BoltE", new BoltE(), 1).setNumTasks(2).shuffleGrouping("BoltD", "forE");
              builder.setBolt("BoltF", new BoltF(), 1).setNumTasks(1).shuffleGrouping("BoltD", "forF");

              Config conf = new Config();
              StormSubmitter.submitTopology("demoTopology", conf, builder.createTopology());
          }
      }
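
The topology example routes tuples to BoltD with a fields grouping on "uid". The idea behind such a grouping can be sketched as hashing the grouping values and taking the result modulo the number of target tasks, so that equal values always land on the same task. The exact hash Storm uses is an implementation detail and is not reproduced here; `FieldsGroupingSketch` and `targetTask` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of hash-based routing for a fields grouping (hypothetical, not Storm code). */
class FieldsGroupingSketch {
    /** Maps the grouping values of a tuple to a task index in [0, numTasks). */
    static int targetTask(List<?> groupingValues, int numTasks) {
        if (numTasks < 1) {
            throw new IllegalArgumentException("numTasks must be >= 1");
        }
        // floorMod keeps the index non-negative even for negative hash codes
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 3; // BoltD has 3 tasks in the example topology
        for (String uid : Arrays.asList("user-1", "user-2", "user-1")) {
            System.out.println(uid + " -> task " + targetTask(Arrays.asList(uid), numTasks));
        }
    }
}
```

Both occurrences of "user-1" map to the same task, which is what lets a bolt such as BoltD combine the results from BoltB and BoltC for one uid. A shuffle grouping, by contrast, makes no such promise.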
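  • 60. How Storm works? Spout Example
      import java.util.Map;
      import backtype.storm.spout.SpoutOutputCollector;
      import backtype.storm.task.TopologyContext;
      import backtype.storm.topology.OutputFieldsDeclarer;
      import backtype.storm.topology.base.BaseRichSpout;
      import backtype.storm.tuple.Fields;
      import backtype.storm.tuple.Values;

      public class DemoSpout extends BaseRichSpout {
          // …

          @Override
          public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
              _collector = collector;
              _queue = new MyFavoritQueue<String>();
          }

          @Override
          public void nextTuple() {
              String nextItem = _queue.poll();
              // poll() is assumed to return null when the queue is empty
              if (nextItem != null) {
                  _collector.emit(new Values(nextItem));
              }
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("item"));
          }
      }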
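  • 61. How Storm works? Bolt Example
      import java.util.Map;
      import backtype.storm.task.OutputCollector;
      import backtype.storm.task.TopologyContext;
      import backtype.storm.topology.OutputFieldsDeclarer;
      import backtype.storm.topology.base.BaseRichBolt;
      import backtype.storm.tuple.Fields;
      import backtype.storm.tuple.Tuple;
      import backtype.storm.tuple.Values;

      public class BoltA extends BaseRichBolt {
          private OutputCollector _collector;

          @Override
          public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
              _collector = collector; // keep the collector for use in execute()
          }

          @Override
          public void execute(Tuple tuple) {
              Object obj = tuple.getValue(0);
              // capitalize() is a helper defined elsewhere
              String capitalizedItem = capitalize((String) obj);
              _collector.emit(tuple, new Values(capitalizedItem)); // anchored to the input tuple
              _collector.ack(tuple);
          }

          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("item"));
          }
      }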
