A brief introduction to Apache Storm. Talk given at the October Toronto Java User Group meeting; video available at https://www.youtube.com/watch?v=CWyH4-SOGm8
2. What Is Storm
● Distributed
● Stream Oriented
● Real Time*
● Scalable
● Reliable**
3. Topologies
● Storm’s equivalent of an application
○ Consists of Components (Spouts and Bolts)
■ Each component runs with a configurable parallelism (number of instances)
■ Data represented as streams of tuples
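A tuple pairs the field names a component declares with the positional values it emits. The plain-Java sketch below models that idea with a hypothetical `NamedTuple` class. It is not Storm's `Tuple` API, only an illustration of why reading position 0 and reading a declared field such as `"word"` can refer to the same value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a tuple: an ordered list of values paired with
// the field names its producer declared. Not Storm's API.
final class NamedTuple {
    private final List<String> fields;
    private final List<Object> values;

    NamedTuple(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("field/value count mismatch");
        }
        this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
        this.values = Collections.unmodifiableList(new ArrayList<>(values));
    }

    // Positional access, similar in spirit to Storm's tuple.getValue(i).
    Object get(int index) {
        return values.get(index);
    }

    // Named access, similar in spirit to Storm's tuple.getValueByField(name).
    Object get(String field) {
        int index = fields.indexOf(field);
        if (index < 0) {
            throw new IllegalArgumentException("unknown field: " + field);
        }
        return values.get(index);
    }
}
```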
4. Spouts
● Spouts
○ Expose an external input source
■ An unbounded stream of data
○ Are polled by Storm for their next Tuple
○ Produce one or more streams of Tuples
○ Notified when a Tuple is completely processed
○ Notified when a Tuple fails to be processed
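The ack/fail notifications above are what let a spout replay lost data. The following plain-Java sketch shows that bookkeeping with hypothetical names (`ReplayingSource`, `nextMessage`, `ack`, `fail`) that only mirror the shape of Storm's `nextTuple`/`ack`/`fail` callbacks. Each emitted message gets an ID and stays pending until it is acknowledged; a failure puts it back at the front of the queue. As a simplification, this sketch assigns a fresh ID when a message is replayed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, Storm-free model of a reliable spout's bookkeeping.
final class ReplayingSource {
    static final class Emitted {
        final long id;
        final String message;

        Emitted(long id, String message) {
            this.id = id;
            this.message = message;
        }
    }

    private final Deque<String> queue = new ArrayDeque<>();    // waiting to be emitted
    private final Map<Long, String> pending = new HashMap<>(); // emitted, not yet acked/failed
    private long nextId = 0;

    void offer(String message) {
        queue.addLast(message);
    }

    // One "poll": emit the next message, or return null if there is nothing to send.
    Emitted nextMessage() {
        String message = queue.pollFirst();
        if (message == null) {
            return null;
        }
        long id = nextId++;
        pending.put(id, message);
        return new Emitted(id, message);
    }

    // Fully processed: forget the message.
    void ack(long id) {
        pending.remove(id);
    }

    // Failed: re-queue the message so it is emitted again next.
    void fail(long id) {
        String message = pending.remove(id);
        if (message != null) {
            queue.addFirst(message);
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```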
5. Bolts
● Bolts
○ Subscribe to one or more Streams
■ Grouped to all instances, randomly (shuffle), or by Field values
○ Produce zero or more Streams
○ Single-threaded execution per instance
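The groupings listed above differ only in which consuming bolt instance (task) receives each tuple. Here is a simplified plain-Java sketch of that choice. The `Groupings` class and its methods are hypothetical, and Storm's own fields grouping hashes the grouping values in its own way, so treat the hash-mod rule as the idea rather than the exact implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.Random;

// Hypothetical illustration of stream groupings; not Storm's implementation.
final class Groupings {
    private Groupings() {}

    // Shuffle grouping: any task, chosen at random (even load on average).
    static int shuffle(Random random, int numTasks) {
        return random.nextInt(numTasks);
    }

    // Fields grouping: equal field values always map to the same task.
    static int fields(Object fieldValue, int numTasks) {
        return Math.floorMod(Objects.hashCode(fieldValue), numTasks);
    }

    // All grouping: every task receives a copy of the tuple.
    static List<Integer> all(int numTasks) {
        List<Integer> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(i);
        }
        return tasks;
    }
}
```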
6. An Example
Classic word count:
Let’s assume we have an unbounded incoming stream of
sentences and we want to keep a constantly updated count
of the words found in the text.
Code:
https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/storm/starter/WordCountTopology.java
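Before distributing anything, it can help to see the intended behaviour in a single process. This plain-Java sketch involves no Storm; `LocalWordCount` and `process` are hypothetical names. It keeps a running count and, like the topology, produces an updated `word=count` pair every time a word is seen, rather than one final total.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-process reference for what the topology computes; not Storm code.
final class LocalWordCount {
    private final Map<String, Integer> counts = new HashMap<>();

    // Splits one sentence and returns the updated "word=count" pairs,
    // in the order the words appeared.
    List<String> process(String sentence) {
        List<String> updates = new ArrayList<>();
        for (String word : sentence.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // a blank sentence splits into a single empty token
            }
            int count = counts.merge(word, 1, Integer::sum);
            updates.add(word + "=" + count);
        }
        return updates;
    }

    int countOf(String word) {
        return counts.getOrDefault(word, 0);
    }
}
```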
7. Building a topology
We describe the topology to be deployed using the
fluent TopologyBuilder class.
TopologyBuilder builder = new TopologyBuilder();
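"Fluent" here means each call returns an object you can keep configuring, so a bolt's inputs are declared on the same line that adds it. The sketch below imitates that shape in plain Java with a hypothetical `MiniTopologyBuilder` that only records the wiring as strings. It is not Storm's `TopologyBuilder`, whose `setBolt` returns a declarer with grouping methods such as `shuffleGrouping` and `fieldsGrouping`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a fluent topology builder; records wiring only.
final class MiniTopologyBuilder {
    private final Map<String, Integer> parallelism = new LinkedHashMap<>();
    private final List<String> edges = new ArrayList<>();

    MiniTopologyBuilder setSpout(String id, int parallelismHint) {
        parallelism.put(id, parallelismHint);
        return this;
    }

    // Returning a declarer lets the caller chain the grouping onto this call.
    Declarer setBolt(String id, int parallelismHint) {
        parallelism.put(id, parallelismHint);
        return new Declarer(id);
    }

    final class Declarer {
        private final String boltId;

        private Declarer(String boltId) {
            this.boltId = boltId;
        }

        Declarer shuffleGrouping(String sourceId) {
            edges.add(sourceId + " -shuffle-> " + boltId);
            return this;
        }

        Declarer fieldsGrouping(String sourceId, String field) {
            edges.add(sourceId + " -fields(" + field + ")-> " + boltId);
            return this;
        }
    }

    List<String> edges() {
        return edges;
    }

    Map<String, Integer> parallelism() {
        return parallelism;
    }
}
```

Usage mirrors the slides: `b.setBolt("split", 8).shuffleGrouping("spout");`.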
8. Spout
First we’ll create a spout that generates random
sentences, and add it to the topology with a parallelism hint of 5.
public class RandomSentenceSpout extends BaseRichSpout {
…
// Called repeatedly by Storm; each call emits one randomly chosen sentence
public void nextTuple() {
String[] sentences = new String[]{ "the cow jumped over the moon", "an apple a day keeps the doctor away",
"four score and seven years ago", "snow white and the seven dwarfs", "i am at two with nature" };
String sentence = sentences[_rand.nextInt(sentences.length)];
_collector.emit(new Values(sentence));
}
…
}
builder.setSpout("spout", new RandomSentenceSpout(), 5);
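`nextTuple` is not a loop the spout runs itself: Storm calls it repeatedly, and it should return promptly rather than block. The plain-Java sketch below imitates that polling with a hypothetical `SentenceSource` and a `poll` method standing in for Storm. It also allocates the sentence array once and takes the `Random` as a constructor argument so a fixed seed makes runs repeatable; both are choices of this sketch, not of the original spout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical, Storm-free imitation of RandomSentenceSpout's nextTuple().
final class SentenceSource {
    // Allocated once instead of on every call.
    static final String[] SENTENCES = {
        "the cow jumped over the moon",
        "an apple a day keeps the doctor away",
        "four score and seven years ago",
        "snow white and the seven dwarfs",
        "i am at two with nature"
    };

    private final Random rand;

    SentenceSource(Random rand) {
        this.rand = rand;
    }

    // One "poll": pick a random sentence and hand it back.
    String nextTuple() {
        return SENTENCES[rand.nextInt(SENTENCES.length)];
    }

    // Stands in for Storm's polling: call nextTuple() a fixed number of times.
    static List<String> poll(SentenceSource source, int times) {
        List<String> emitted = new ArrayList<>();
        for (int i = 0; i < times; i++) {
            emitted.add(source.nextTuple());
        }
        return emitted;
    }
}
```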
9. Split Messages
Next we’ll split each sentence into words.
The original storm-starter example implements this bolt in Python; here it is in Java:
public static class SplitSentence extends BaseBasicBolt {
...
public void execute(Tuple tuple, BasicOutputCollector collector) {
String sentence = tuple.getString(0);
// Split on runs of whitespace and emit one tuple per word
for (String word : sentence.split("\\s+")) {
collector.emit(new Values(word));
}
}
...
}
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
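Splitting a sentence with the regular expression `\\s+` treats any run of whitespace as one separator, but a leading space still produces an empty first token. This small plain-Java check, outside Storm and using a hypothetical `Splitter` helper, trims first and skips empty tokens so that no empty "word" would be emitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper showing the split step on its own, without Storm.
final class Splitter {
    private Splitter() {}

    static List<String> words(String sentence) {
        List<String> out = new ArrayList<>();
        // trim() removes leading/trailing whitespace so split() yields no empty first token.
        for (String word : sentence.trim().split("\\s+")) {
            if (!word.isEmpty()) { // "" (a blank sentence) splits into one empty token
                out.add(word);
            }
        }
        return out;
    }
}
```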
10. Count Words
Now we can count the words, keeping a running total for each word:
public static class WordCount extends BaseBasicBolt {
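// Running totals for the words routed to this bolt instance
Map<String, Integer> counts = new HashMap<String, Integer>();
public void execute(Tuple tuple, BasicOutputCollector collector) {
String word = tuple.getString(0);
Integer count = counts.get(word);
if (count == null) {
count = 0; // first time this instance has seen the word
}
count++;
counts.put(word, count);
// Emit the updated (word, count) pair downstream
collector.emit(new Values(word, count));
}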
...
}
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
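The fields grouping on `"word"` is what makes a per-instance `HashMap` safe: every occurrence of a given word reaches the same `WordCount` instance, so that instance holds the complete count. The plain-Java simulation below (hypothetical `PartitionedCounts`; the hash-mod routing simplifies Storm's fields grouping) spreads words over 12 counters both ways. With fields-style routing each word lives in exactly one counter; with random routing a word's total ends up split across several counters, and typically none of them holds the full total.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Random;

// Hypothetical simulation of N WordCount instances, each with its own map.
final class PartitionedCounts {
    final List<Map<String, Integer>> tasks = new ArrayList<>();

    PartitionedCounts(int numTasks) {
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new HashMap<>());
        }
    }

    // Fields-style routing: the word's hash decides the task.
    void countByField(String word) {
        int task = Math.floorMod(Objects.hashCode(word), tasks.size());
        tasks.get(task).merge(word, 1, Integer::sum);
    }

    // Shuffle-style routing: any task may receive the word.
    void countShuffled(String word, Random random) {
        tasks.get(random.nextInt(tasks.size())).merge(word, 1, Integer::sum);
    }

    // How many task-local maps hold a (possibly partial) count for this word.
    int tasksHolding(String word) {
        int n = 0;
        for (Map<String, Integer> counts : tasks) {
            if (counts.containsKey(word)) {
                n++;
            }
        }
        return n;
    }

    // The largest count any single task reports for this word.
    int maxLocalCount(String word) {
        int max = 0;
        for (Map<String, Integer> counts : tasks) {
            max = Math.max(max, counts.getOrDefault(word, 0));
        }
        return max;
    }
}
```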