Sankt Augustin
24-25.08.2013
Introduction to Twitter Storm
uweseiler
About me
• Big Data Nerd
• Travelpirate
• Photography Enthusiast
• Hadoop Trainer
• MongoDB Author
About us
is a bunch of…
• Big Data Nerds
• Agile Ninjas
• Continuous Delivery Gurus
• Enterprise Java Specialists
• Performance Geeks
Join us!
Agenda
• Why Twitter Storm?
• What is Twitter Storm?
• What to do with Twitter Storm?
The 3 V’s of Big Data
• Volume
• Variety
• Velocity
Velocity
Why Twitter Storm?
Batch vs. Real-Time processing
• Batch processing
– Gathering of data and processing as a group at one time.
• Real-time processing
– Processing of data that takes place as the information is being entered.
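The difference between the two definitions can be sketched in plain Java. This is only an illustration without Storm or Hadoop; the class name `BatchVsRealTimeSketch`, the nested `RunningCounter`, and the sample events are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BatchVsRealTimeSketch {

    // Batch: gather all events first, then process them as one group.
    static Map<String, Integer> countBatch(List<String> events) {
        Map<String, Integer> counts = new HashMap<>();
        for (String event : events) {
            counts.merge(event, 1, Integer::sum);
        }
        return counts;
    }

    // Real-time: update the result while each event is being entered.
    static final class RunningCounter {
        private final Map<String, Integer> counts = new HashMap<>();

        void onEvent(String event) {
            counts.merge(event, 1, Integer::sum);
        }

        // Copy of the current view; it is valid after every single event.
        Map<String, Integer> snapshot() {
            return new HashMap<>(counts);
        }
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("click", "view", "click");

        // One result, available only after all data has been gathered.
        System.out.println("batch: " + countBatch(events));

        // An up-to-date result after every event.
        RunningCounter counter = new RunningCounter();
        for (String event : events) {
            counter.onEvent(event);
            System.out.println("running: " + counter.snapshot());
        }
    }
}
```

Both approaches end with the same counts; the real-time variant simply has a usable view at every point in between.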
Lambda architecture
Bridging the gap…
• A batch workflow is too slow
• Views are out of date
[Figure: timeline up to "Now". Older data is absorbed into batch views; just a few hours of data are not absorbed.]
Storm vs. Hadoop
Storm:
• Real-time processing
• Topologies run forever
• No SPOF
• Stateless nodes
Hadoop:
• Batch processing
• Jobs run to completion
• NameNode is SPOF
• Stateful nodes
Both:
• Scalable
• Guarantee no data loss
• Open Source
Stream Processing
Stream processing is a technical paradigm for processing
large volumes of unbounded sequences of tuples in real time
Source Stream Processing
• Algorithmic trading
• Sensor data monitoring
• Continuous analytics
Example: Stream of tweets
https://github.com/colinsurprenant/tweitgeist
Agenda
• Why Twitter Storm?
• What is Twitter Storm?
• What to do with Twitter Storm?
Welcome, Twitter Storm!
• Created by Nathan Marz @ BackType
– Analyze tweets, links, users on Twitter
• Open sourced on 19th September, 2011
– Eclipse Public License 1.0
– Storm v0.5.2
• Latest Updates
– Current stable release v0.8.2 released on 11th January, 2013
– Major core improvements planned for v0.9.0
– Storm will be an Apache Project [soon..]
Storm under the hood
• Java & Clojure
• Apache Thrift
– Cross-language bridge, RPC, framework to build services
• ZeroMQ
– Asynchronous message transport layer
• Kryo
– Serialization framework
• Jetty
– Embedded web server
Conceptual view
• Spout: Source of streams
• Bolt: Consumer of streams, processing of tuples, possibly emits new tuples
• Tuple: List of name-value pairs
• Stream: Unbounded sequence of tuples
• Topology: Network of Spouts & Bolts as the nodes and streams as the edges
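The "list of name-value pairs" idea can be illustrated with a tiny stand-in class. This is a hypothetical model for illustration, not Storm's own `Tuple` class; the name `TupleSketch` and its methods are assumptions. It mirrors how a component declares field names once and then emits only values.

```java
import java.util.Arrays;
import java.util.List;

class TupleSketch {
    private final List<String> fields;   // names, declared once per stream
    private final List<Object> values;   // values, one per declared field

    TupleSketch(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException("Need exactly one value per field");
        }
        this.fields = fields;
        this.values = values;
    }

    // Positional access, comparable to input.getString(0) in the bolts of this deck.
    Object getValue(int index) {
        return values.get(index);
    }

    // Access by field name.
    Object getValueByField(String name) {
        int index = fields.indexOf(name);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown field: " + name);
        }
        return values.get(index);
    }

    public static void main(String[] args) {
        TupleSketch tuple = new TupleSketch(
                Arrays.asList("word", "count"),
                Arrays.<Object>asList("storm", 3));
        System.out.println(tuple.getValueByField("word") + " / " + tuple.getValue(1));
    }
}
```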
Physical view
• Nimbus: Master daemon process, responsible for
– distributing code
– assigning tasks
– monitoring failures
• ZooKeeper: Storing operational cluster state
• Supervisor: Worker daemon process listening for work assigned to its node
• Worker (worker process, runs on a worker node): Java process executing a subset of a topology
• Executor: Java thread spawned by a worker, runs one or more tasks of the same component
• Task: Component (Spout/Bolt) instance, performs the actual data processing
A simple example: WordCount
• Input: shakespeare.txt
• FileReaderSpout emits "line" tuples to WordSplitBolt
• WordSplitBolt emits "word" tuples to WordCountBolt
• WordCountBolt prints a sorted list:
of: 18126
to: 18763
i: 19540
and: 26099
the: 27730
FileReaderSpout I
package de.codecentric.storm.wordcount.spouts;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Map;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class FileReaderSpout extends BaseRichSpout {

    private SpoutOutputCollector collector;
    private FileReader fileReader;
    private boolean completed = false;

    public void ack(Object msgId) {
        System.out.println("OK:" + msgId);
    }

    public void fail(Object msgId) {
        System.out.println("FAIL:" + msgId);
    }
FileReaderSpout II
    /**
     * Declare the output field "line"
     */
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("line"));
    }

    /**
     * Open the file and keep the collector object
     */
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        try {
            this.fileReader = new FileReader(conf.get("wordsFile").toString());
        } catch (FileNotFoundException e) {
            // Same config key as above ("wordsFile")
            throw new RuntimeException("Error reading file ["
                    + conf.get("wordsFile") + "]");
        }
        this.collector = collector;
    }

    public void close() {
    }
FileReaderSpout III
    /**
     * Emit each line of the file as a tuple
     */
    public void nextTuple() {
        // nextTuple() is called forever, so once the file has been read
        // we wait briefly and then return
        if (completed) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return;
        }
        String str;
        // Open the reader
        BufferedReader reader = new BufferedReader(fileReader);
        try {
            // Read all lines
            while ((str = reader.readLine()) != null) {
                // Emit each line as a value; the line itself is the message id
                this.collector.emit(new Values(str), str);
            }
        } catch (Exception e) {
            throw new RuntimeException("Error reading tuple", e);
        } finally {
            completed = true;
        }
    }
}
WordSplitBolt I
package de.codecentric.storm.wordcount.bolts;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class WordSplitBolt extends BaseBasicBolt {

    public void cleanup() {}

    /**
     * The bolt will only emit the field "word"
     */
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
WordSplitBolt II
    /**
     * The bolt receives a line from the
     * words file and splits it into words
     */
    public void execute(Tuple input, BasicOutputCollector collector) {
        String sentence = input.getString(0);
        String[] words = sentence.split(" ");
        for (String word : words) {
            word = word.trim();
            if (!word.isEmpty()) {
                word = word.toLowerCase();
                collector.emit(new Values(word));
            }
        }
    }
}
WordCountBolt I
package de.codecentric.storm.wordcount.bolts;

import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

import backtype.storm.task.TopologyContext;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class WordCountBolt extends BaseBasicBolt {

    private static final long serialVersionUID = 1L;

    Integer id;
    String name;
    Map<String, Integer> counters;
WordCountBolt II
    /**
     * On create
     */
    @Override
    public void prepare(Map stormConf, TopologyContext context) {
        this.counters = new HashMap<String, Integer>();
        this.name = context.getThisComponentId();
        this.id = context.getThisTaskId();
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String str = input.getString(0);
        // If the word is not in the map yet, add it with count 1;
        // otherwise increment its count
        if (!counters.containsKey(str)) {
            counters.put(str, 1);
        } else {
            Integer c = counters.get(str) + 1;
            counters.put(str, c);
        }
    }
WordCountBolt III
    /**
     * When the topology is stopped (i.e. the cluster is shut down),
     * print the word counters
     */
    @Override
    public void cleanup() {
        // Sort map
        SortedSet<Map.Entry<String, Integer>> sortedCounts = entriesSortedByValues(counters);
        System.out.println("-- Word Counter [" + name + "-" + id + "] --");
        for (Map.Entry<String, Integer> entry : sortedCounts) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
    …
}
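The slide elides the `entriesSortedByValues` helper. The following is only a guess at what such a helper could look like, written as a standalone class; the class name `SortByValueSketch` and the tie-break on the key are assumptions, not the author's code. The tie-break matters because a `TreeSet` silently drops elements its comparator considers equal, so words with the same count would otherwise be lost.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class SortByValueSketch {

    // Returns the map entries in ascending order of value, ties broken by key.
    static <K extends Comparable<K>, V extends Comparable<V>>
            SortedSet<Map.Entry<K, V>> entriesSortedByValues(Map<K, V> map) {
        Comparator<Map.Entry<K, V>> byValueThenKey = (a, b) -> {
            int byValue = a.getValue().compareTo(b.getValue());
            return byValue != 0 ? byValue : a.getKey().compareTo(b.getKey());
        };
        SortedSet<Map.Entry<K, V>> sorted = new TreeSet<>(byValueThenKey);
        sorted.addAll(map.entrySet());
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Integer> counters = new HashMap<>();
        counters.put("the", 3);
        counters.put("and", 2);
        counters.put("of", 2);
        for (Map.Entry<String, Integer> entry : entriesSortedByValues(counters)) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

Ascending order matches the sorted list shown on the WordCount slide, where the most frequent word comes last.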
WordCountTopology
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;
import backtype.storm.utils.Utils;

import de.codecentric.storm.wordcount.bolts.WordCountBolt;
import de.codecentric.storm.wordcount.bolts.WordSplitBolt;
import de.codecentric.storm.wordcount.spouts.FileReaderSpout;

public class WordCountTopology {

    public static void main(String[] args) throws InterruptedException {
        // Topology definition
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("word-reader", new FileReaderSpout());
        builder.setBolt("word-normalizer", new WordSplitBolt())
                .shuffleGrouping("word-reader");
        builder.setBolt("word-counter", new WordCountBolt(), 1)
                .fieldsGrouping("word-normalizer", new Fields("word"));

        // Configuration
        Config conf = new Config();
        conf.put("wordsFile", args[0]);
        conf.setDebug(false);

        // Run Topology
        conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("word-count-topology", conf, builder.createTopology());

        // You don't do this on a regular topology
        Utils.sleep(10000);
        cluster.killTopology("word-count-topology");
        cluster.shutdown();
    }
}
Stream Grouping
• Each Spout or Bolt might be running n instances in parallel
• Groupings decide which task of the subscribing bolt a tuple is sent to
• Possible Groupings:
– Shuffle: Random grouping
– Fields: Grouped by value such that equal values result in the same task
– All: Replicates to all tasks
– Global: Makes all tuples go to one task
– None: Makes the Bolt run in the same thread as the Bolt / Spout it subscribes to
– Direct: Producer (task that emits) controls which Consumer will receive
– Local or shuffle: If the target bolt has one or more tasks in the same worker process, tuples will be shuffled to just those in-process tasks
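The property behind the fields grouping (equal values always reach the same task) can be sketched with a hash modulo the number of tasks. This is an assumption about the general idea only; Storm's actual task selection is not reproduced here, and `FieldsGroupingSketch` and `chooseTask` are hypothetical names.

```java
class FieldsGroupingSketch {

    // Map a grouping-field value to one of numTasks task indexes.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int chooseTask(String fieldValue, int numTasks) {
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        String[] words = {"the", "and", "the", "of", "and"};
        for (String word : words) {
            // Every occurrence of the same word is routed to the same task index,
            // so a single task sees all counts for that word.
            System.out.println(word + " -> task " + chooseTask(word, numTasks));
        }
    }
}
```

This is why the WordCount topology connects the counting bolt with a fields grouping on "word": with a shuffle grouping, counts for one word would be spread over several tasks.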
Key features of Twitter Storm
Storm is
• Fast & scalable
• Fault-tolerant
• Guaranteeing message processing
• Easy to setup & operate
• Free & Open Source
Extremely performant
Parallelism
• Number of worker nodes = 2
• Number of worker slots per node = 4
• Number of topology workers = 4
• FileReaderSpout: parallelism hint = 2, number of tasks not specified = same as parallelism hint (2)
• WordSplitBolt: parallelism hint = 4, number of tasks = 8
• WordCountBolt: parallelism hint = 6, number of tasks not specified = 6
• Number of component instances = 2 + 8 + 6 = 16
• Number of executor threads = 2 + 4 + 6 = 12
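The arithmetic on this slide can be reproduced with a short calculation. The sketch below is plain Java without Storm; the `Component` holder and the method names are hypothetical. It encodes the rule from the slide that an unspecified number of tasks defaults to the parallelism hint.

```java
class ParallelismSketch {

    static final class Component {
        final String name;
        final int parallelismHint; // number of executor threads
        final int numTasks;        // number of component instances

        // Pass null for numTasks when it is not specified.
        Component(String name, int parallelismHint, Integer numTasks) {
            this.name = name;
            this.parallelismHint = parallelismHint;
            this.numTasks = (numTasks != null) ? numTasks : parallelismHint;
        }
    }

    // The three components with the values given on the slide.
    static Component[] slideExample() {
        return new Component[] {
            new Component("FileReaderSpout", 2, null),
            new Component("WordSplitBolt", 4, 8),
            new Component("WordCountBolt", 6, null)
        };
    }

    static int totalExecutors(Component[] components) {
        int sum = 0;
        for (Component c : components) {
            sum += c.parallelismHint;
        }
        return sum;
    }

    static int totalTasks(Component[] components) {
        int sum = 0;
        for (Component c : components) {
            sum += c.numTasks;
        }
        return sum;
    }

    public static void main(String[] args) {
        Component[] components = slideExample();
        System.out.println("executor threads: " + totalExecutors(components));
        System.out.println("component instances: " + totalTasks(components));
    }
}
```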
Message passing
[Figure: worker process with receive thread, executors, and transfer thread; labels: receiver queue, internal transfer queue, transfer queue, from other workers, to other workers]
• Interprocess communication is mediated by ZeroMQ
• Outside transfer is done with Kryo serialization
• Local communication is mediated by LMAX Disruptor
• Inside transfer is done with no serialization
Key features of Twitter Storm
Storm is
• Fast & scalable
• Fault-tolerant
• Guaranteeing message processing
• Easy to setup & operate
• Free & Open Source
Fault tolerance
Cluster works normally
• Nimbus: Monitoring cluster state
• Supervisor: Synchronizing assignment, sending heartbeat, reading worker heartbeat from local file system
• Worker: Sending executor heartbeat
Fault tolerance
Nimbus goes down
• Processing will still continue. But topology lifecycle operations and reassignment facility are lost
Fault tolerance
Worker node goes down
• Nimbus will reassign the tasks to other machines and the processing will continue
Fault tolerance
Supervisor goes down
• Processing will still continue. But assignment is never synchronized
Fault tolerance
Worker process goes down
• Supervisor will restart the worker process and the processing will continue
Key features of Twitter Storm
Storm is
• Fast & scalable
• Fault-tolerant
• Guaranteeing message processing
• Easy to setup & operate
• Free & Open Source
Reliability API
public class FileReaderSpout extends BaseRichSpout {
    public void nextTuple() {
        …;
        UUID msgId = getMsgID();
        // Emitting tuple with message ID
        collector.emit(new Values(line), msgId);
    }
    public void ack(Object msgId) {
        // Do something with the acked message id
    }
    public void fail(Object msgId) {
        // Do something with the failed message id
    }
}
// Explicit anchoring and acking need an OutputCollector, so this bolt
// extends BaseRichBolt; collector is the OutputCollector handed to prepare().
// (A BaseBasicBolt anchors and acks automatically.)
public class WordSplitBolt extends BaseRichBolt {
    public void execute(Tuple input) {
        for (String s : input.getString(0).split("\\s")) {
            // Anchoring incoming tuple to outgoing tuples
            collector.emit(input, new Values(s));
        }
        // Sending ack
        collector.ack(input);
    }
}
Tuple tree: the spout tuple “This is a line” is the root; each word tuple emitted by WordSplitBolt is anchored to it.
ACKing Framework
• FileReaderSpout sends "ACKer init" to the implicit ACKer bolt; WordSplitBolt and WordCountBolt send "ACKer ack" or "ACKer fail"
• Tuples in the example tree: Tuple A, Tuple B, Tuple C
• The implicit ACKer bolt keeps one entry per spout tuple: Spout Tuple ID, Spout Task ID, ACK val (64 bit)
• Emitted tuple A, XOR tuple A id with ack val
• Emitted tuple B, XOR tuple B id with ack val
• Emitted tuple C, XOR tuple C id with ack val
• Acked tuple A, XOR tuple A id with ack val
• Acked tuple B, XOR tuple B id with ack val
• Acked tuple C, XOR tuple C id with ack val
• ACK val has become 0, the implicit ACKer bolt knows the tuple tree has been completed
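The XOR bookkeeping can be sketched in a few lines of plain Java. This is a simplified illustration, not Storm's acker implementation: the class `AckerSketch`, its method, and the fixed tuple ids are assumptions (Storm uses random 64-bit ids and also handles timeouts and failures, which are left out here). Because `x ^ x == 0`, every id that is XORed in once on emit and once on ack cancels out.

```java
import java.util.HashMap;
import java.util.Map;

class AckerSketch {

    // Spout tuple id -> current ack val (64 bit).
    private final Map<Long, Long> ackVals = new HashMap<>();

    // Called once when a tuple is emitted and once when it is acked.
    // Returns true when the ack val has become 0, i.e. the tree is complete.
    boolean xor(long spoutTupleId, long tupleId) {
        long ackVal = ackVals.getOrDefault(spoutTupleId, 0L) ^ tupleId;
        if (ackVal == 0L) {
            ackVals.remove(spoutTupleId);
            return true;
        }
        ackVals.put(spoutTupleId, ackVal);
        return false;
    }

    public static void main(String[] args) {
        AckerSketch acker = new AckerSketch();
        long spoutTupleId = 1L;
        long a = 0x1111L, b = 0x2222L, c = 0x4444L; // illustrative tuple ids

        // Emitted tuples A, B, C
        acker.xor(spoutTupleId, a);
        acker.xor(spoutTupleId, b);
        acker.xor(spoutTupleId, c);

        // Acked tuples A, B, C
        acker.xor(spoutTupleId, a);
        acker.xor(spoutTupleId, b);
        boolean complete = acker.xor(spoutTupleId, c);

        System.out.println("tuple tree completed: " + complete);
    }
}
```

The appeal of this scheme is that the state per spout tuple stays at a single 64-bit value, no matter how large the tuple tree grows.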
Key features of Twitter Storm
Storm is
• Fast & scalable
• Fault-tolerant
• Guaranteeing message processing
• Easy to setup & operate
• Free & Open Source
Cluster Setup
• Set up a ZooKeeper cluster
• Install dependencies on Nimbus and worker machines
– ZeroMQ 2.1.7 and JZMQ
– Java 6 and Python 2.6.6
– unzip
• Download and extract a Storm release to Nimbus and worker machines
• Fill in the mandatory configuration in storm.yaml
• Launch daemons under supervision using the storm scripts
• Start a topology:
– storm jar <path_topology_jar> <main_class> <arg1>…<argN>
Cluster Summary
Topology Summary
Component Summary
Key features of Twitter Storm
Storm is
• Fast & scalable
• Fault-tolerant
• Guaranteeing message processing
• Easy to setup & operate
• Free & Open Source
Basic resources
• Storm is available at
– http://storm-project.net/
– https://github.com/nathanmarz/storm
under Eclipse Public License 1.0
• Get help on
– http://groups.google.com/group/storm-user
– #storm-user freenode room
• Follow
@stormprocessor and @nathanmarz
Many contributions
• Community repository for modules to use Storm at
– https://github.com/nathanmarz/storm-contrib
– including integration with Redis, Kafka, MongoDB, HBase, JMS, Amazon SQS, …
• Good articles for understanding Storm internals
– http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-stormtopology/
– http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-messagebuffers/
• Good slides for understanding real-life examples
– http://www.slideshare.net/DanLynn1/storm-as-deep-into-realtime-data-processing-as-youcan-get-in-30-minutes
– http://www.slideshare.net/KrishnaGade2/storm-at-twitter
Coming next…
• Current release: 0.8.2
• Work in progress (newest): 0.9.0-wip21
– SLF4J and Logback
– Pluggable tuple serialization and Blowfish encryption
– Pluggable interprocess messaging and Netty implementation
– Some bug fixes
– And more
• Storm on YARN
Agenda
• Why Twitter Storm?
• What is Twitter Storm?
• What to do with Twitter Storm?
One example: Webshop
• Webtracking component
• No defined page impression
• Identifying page impressions using Varnish logs of the click stream data
• Page consists of different fragments
– Body
– Article description
– Recommendation box, …
• Session data also of interest
One example: Webshop
• Custom solution using J2EE and MongoDB
• Export into Comscore DAx and Enterprise DWH
• Solution is currently working but not scalable
• What about performance?
Topology Architecture

Más contenido relacionado

La actualidad más candente

Distributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache StormDistributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache Stormthe100rabh
 
Storm Real Time Computation
Storm Real Time ComputationStorm Real Time Computation
Storm Real Time ComputationSonal Raj
 
Learning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormLearning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormEugene Dvorkin
 
Multi-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop GridMulti-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop GridDataWorks Summit
 
PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.DECK36
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Robert Evans
 
Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013
Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013
Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013Sonal Raj
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopDataWorks Summit
 
Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormMd. Shamsur Rahim
 
Improved Reliable Streaming Processing: Apache Storm as example
Improved Reliable Streaming Processing: Apache Storm as exampleImproved Reliable Streaming Processing: Apache Storm as example
Improved Reliable Streaming Processing: Apache Storm as exampleDataWorks Summit/Hadoop Summit
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014P. Taylor Goetz
 
Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and howPetr Zapletal
 
Storm: The Real-Time Layer - GlueCon 2012
Storm: The Real-Time Layer  - GlueCon 2012Storm: The Real-Time Layer  - GlueCon 2012
Storm: The Real-Time Layer - GlueCon 2012Dan Lynn
 

La actualidad más candente (19)

Storm
StormStorm
Storm
 
Introduction to Apache Storm
Introduction to Apache StormIntroduction to Apache Storm
Introduction to Apache Storm
 
Storm and Cassandra
Storm and Cassandra Storm and Cassandra
Storm and Cassandra
 
Distributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache StormDistributed Realtime Computation using Apache Storm
Distributed Realtime Computation using Apache Storm
 
Storm Real Time Computation
Storm Real Time ComputationStorm Real Time Computation
Storm Real Time Computation
 
Learning Stream Processing with Apache Storm
Learning Stream Processing with Apache StormLearning Stream Processing with Apache Storm
Learning Stream Processing with Apache Storm
 
Multi-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop GridMulti-Tenant Storm Service on Hadoop Grid
Multi-Tenant Storm Service on Hadoop Grid
 
PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.PHP Backends for Real-Time User Interaction using Apache Storm.
PHP Backends for Real-Time User Interaction using Apache Storm.
 
Introduction to Storm
Introduction to StormIntroduction to Storm
Introduction to Storm
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)
 
Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013
Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013
Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013
 
Apache Storm Internals
Apache Storm InternalsApache Storm Internals
Apache Storm Internals
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and Hadoop
 
Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache Storm
 
Improved Reliable Streaming Processing: Apache Storm as example
Improved Reliable Streaming Processing: Apache Storm as exampleImproved Reliable Streaming Processing: Apache Storm as example
Improved Reliable Streaming Processing: Apache Storm as example
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014
 
Apache Storm Tutorial
Apache Storm TutorialApache Storm Tutorial
Apache Storm Tutorial
 
Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and how
 
Storm: The Real-Time Layer - GlueCon 2012
Storm: The Real-Time Layer  - GlueCon 2012Storm: The Real-Time Layer  - GlueCon 2012
Storm: The Real-Time Layer - GlueCon 2012
 

Similar a Introduction to Twitter Storm

10 things I’ve learnt In the clouds
10 things I’ve learnt In the clouds10 things I’ve learnt In the clouds
10 things I’ve learnt In the cloudsStuart Lodge
 
BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at...
BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at...BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at...
BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at...Big Data Spain
 
Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)Uwe Printz
 
PigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big DataPigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big DataAlexander Schätzle
 
Workers of the web - BrazilJS 2013
Workers of the web - BrazilJS 2013Workers of the web - BrazilJS 2013
Workers of the web - BrazilJS 2013Thibault Imbert
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Robbie Strickland
 
Go and Uber’s time series database m3
Go and Uber’s time series database m3Go and Uber’s time series database m3
Go and Uber’s time series database m3Rob Skillington
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormDavorin Vukelic
 
Real time stream processing presentation at General Assemb.ly
Real time stream processing presentation at General Assemb.lyReal time stream processing presentation at General Assemb.ly
Real time stream processing presentation at General Assemb.lyVarun Vijayaraghavan
 
Bjarne Stroustrup - The Essence of C++: With Examples in C++84, C++98, C++11,...
Bjarne Stroustrup - The Essence of C++: With Examples in C++84, C++98, C++11,...Bjarne Stroustrup - The Essence of C++: With Examples in C++84, C++98, C++11,...
Bjarne Stroustrup - The Essence of C++: With Examples in C++84, C++98, C++11,...Complement Verb
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomyDongmin Yu
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Flink Forward
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScyllaDB
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기NAVER D2
 
Using akka streams to access s3 objects
Using akka streams to access s3 objectsUsing akka streams to access s3 objects
Using akka streams to access s3 objectsMikhail Girkin
 
Concurrent talk
Concurrent talkConcurrent talk
Concurrent talkrahulrevo
 

Similar a Introduction to Twitter Storm (20)

10 things I’ve learnt In the clouds
10 things I’ve learnt In the clouds10 things I’ve learnt In the clouds
10 things I’ve learnt In the clouds
 
Streams in Node.js
Streams in Node.jsStreams in Node.js
Streams in Node.js
 
BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at...
BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at...BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at...
BigQuery JavaScript User-Defined Functions by THOMAS PARK and FELIPE HOFFA at...
 
Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)Introduction to the Hadoop Ecosystem (FrOSCon Edition)
Introduction to the Hadoop Ecosystem (FrOSCon Edition)
 
PigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big DataPigSPARQL: A SPARQL Query Processing Baseline for Big Data
PigSPARQL: A SPARQL Query Processing Baseline for Big Data
 
Workers of the web - BrazilJS 2013
Workers of the web - BrazilJS 2013Workers of the web - BrazilJS 2013
Workers of the web - BrazilJS 2013
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
 
Go and Uber’s time series database m3
Go and Uber’s time series database m3Go and Uber’s time series database m3
Go and Uber’s time series database m3
 
Real-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache StormReal-Time Streaming with Apache Spark Streaming and Apache Storm
Real-Time Streaming with Apache Spark Streaming and Apache Storm
 
Thread
ThreadThread
Thread
 
Real time stream processing presentation at General Assemb.ly
Real time stream processing presentation at General Assemb.lyReal time stream processing presentation at General Assemb.ly
Real time stream processing presentation at General Assemb.ly
 
Bjarne Stroustrup - The Essence of C++: With Examples in C++84, C++98, C++11,...
Bjarne Stroustrup - The Essence of C++: With Examples in C++84, C++98, C++11,...Bjarne Stroustrup - The Essence of C++: With Examples in C++84, C++98, C++11,...
Bjarne Stroustrup - The Essence of C++: With Examples in C++84, C++98, C++11,...
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced
 
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by ScyllaScylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
Scylla Summit 2016: Analytics Show Time - Spark and Presto Powered by Scylla
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기
 
Using akka streams to access s3 objects
Using akka streams to access s3 objectsUsing akka streams to access s3 objects
Using akka streams to access s3 objects
 
Java Concurrency
Java ConcurrencyJava Concurrency
Java Concurrency
 
Concurrent talk
Concurrent talkConcurrent talk
Concurrent talk
 
Network
NetworkNetwork
Network
 

Introduction to Twitter Storm

  • 2. Sankt Augustin 24-25.08.2013 About me: Big Data Nerd, Travelpirate, Photography Enthusiast, Hadoop Trainer, MongoDB Author
  • 3. About us: is a bunch of… Big Data Nerds, Agile Ninjas, Continuous Delivery Gurus, Enterprise Java Specialists, Performance Geeks. Join us!
  • 4. Agenda
      • Why Twitter Storm?
      • What is Twitter Storm?
      • What to do with Twitter Storm?
  • 5. The 3 V’s of Big Data: Variety, Volume, Velocity
  • 8. Batch vs. Real-Time processing
      • Batch processing
          – Gathering of data and processing as a group at one time.
      • Real-time processing
          – Processing of data that takes place as the information is being entered.
  • 10. Bridging the gap…
      • A batch workflow is too slow
      • Views are out of date
      • Timeline figure: older data is absorbed into batch views; just the last few hours of data, up to now, are not absorbed
  • 11. Storm vs. Hadoop
      • Storm
          – Real-time processing
          – Topologies run forever
          – No SPOF
          – Stateless nodes
      • Hadoop
          – Batch processing
          – Jobs run to completion
          – NameNode is SPOF
          – Stateful nodes
      • Both
          – Scalable
          – Guarantees no data loss
          – Open Source
  • 12. Stream Processing
      • Stream processing is a technical paradigm for processing large volumes of unbounded sequences of tuples in real time
      • Figure: Source → Stream Processing
      • Examples
          – Algorithmic trading
          – Sensor data monitoring
          – Continuous analytics
  • 13. Example: Stream of tweets https://github.com/colinsurprenant/tweitgeist
  • 15. Welcome, Twitter Storm!
      • Created by Nathan Marz @ BackType
          – Analyze tweets, links, users on Twitter
      • Open sourced on 19th September, 2011
          – Eclipse Public License 1.0
          – Storm v0.5.2
      • Latest Updates
          – Current stable release v0.8.2 released on 11th January, 2013
          – Major core improvements planned for v0.9.0
          – Storm will be an Apache Project [soon..]
  • 16. Storm under the hood
      • Java & Clojure
      • Apache Thrift
          – Cross language bridge, RPC, Framework to build services
      • ZeroMQ
          – Asynchronous message transport layer
      • Kryo
          – Serialization framework
      • Jetty
          – Embedded web server
  • 17. Conceptual view
      • Spout: Source of streams
      • Bolt: Consumer of streams, processing of tuples, possibly emits new tuples
      • Tuple: List of name-value pairs
      • Stream: Unbounded sequence of tuples
      • Topology: Network of Spouts & Bolts as the nodes and streams as the edges
  • 18. Physical view
      • Nimbus: Master daemon process, responsible for
          – distributing code
          – assigning tasks
          – monitoring failures
      • ZooKeeper: Storing operational cluster state
      • Supervisor: Worker daemon process listening for work assigned to its node
      • Worker: Java process executing a subset of a topology
      • Executor: Java thread spawned by a worker, runs one or more tasks of the same component
      • Task: Component (Spout/Bolt) instance, performs the actual data processing
  • 19. A simple example: WordCount
      • Pipeline: shakespeare.txt → FileReaderSpout → (line) → WordSplitBolt → (word) → WordCountBolt → sorted list
      • Sorted list (excerpt)
          – of: 18126
          – to: 18763
          – i: 19540
          – and: 26099
          – the: 27730
  • 20. FileReaderSpout I
      package de.codecentric.storm.wordcount.spouts;

      import java.io.BufferedReader;
      import java.io.FileNotFoundException;
      import java.io.FileReader;
      import java.util.Map;

      import backtype.storm.spout.SpoutOutputCollector;
      import backtype.storm.task.TopologyContext;
      import backtype.storm.topology.OutputFieldsDeclarer;
      import backtype.storm.topology.base.BaseRichSpout;
      import backtype.storm.tuple.Fields;
      import backtype.storm.tuple.Values;

      public class FileReaderSpout extends BaseRichSpout {

          private SpoutOutputCollector collector;
          private FileReader fileReader;
          private boolean completed = false;

          // Called when a tuple emitted with this message ID was fully processed
          public void ack(Object msgId) {
              System.out.println("OK:" + msgId);
          }

          // Called when processing of the tuple with this message ID failed
          public void fail(Object msgId) {
              System.out.println("FAIL:" + msgId);
          }
  • 21. FileReaderSpout II
          /**
           * Declare the output field "line"
           */
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("line"));
          }

          /**
           * Open the file and keep a reference to the collector object
           */
          public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
              try {
                  this.fileReader = new FileReader(conf.get("wordsFile").toString());
              } catch (FileNotFoundException e) {
                  throw new RuntimeException("Error reading file [" + conf.get("wordsFile") + "]", e);
              }
              this.collector = collector;
          }

          public void close() {
          }
  • 22. FileReaderSpout III
          /**
           * Emits each line of the file as one tuple
           */
          public void nextTuple() {
              /**
               * nextTuple() is called forever, so once the file has been read
               * we wait a moment and then return
               */
              if (completed) {
                  try {
                      Thread.sleep(1000);
                  } catch (InterruptedException e) {
                      Thread.currentThread().interrupt();
                  }
                  return;
              }
              String str;
              // Open the reader
              BufferedReader reader = new BufferedReader(fileReader);
              try {
                  // Read all lines
                  while ((str = reader.readLine()) != null) {
                      /**
                       * Emit each line as a value, using the line itself as message ID
                       */
                      this.collector.emit(new Values(str), str);
                  }
              } catch (Exception e) {
                  throw new RuntimeException("Error reading tuple", e);
              } finally {
                  completed = true;
              }
          }
      }
  • 23. WordSplitBolt I
      package de.codecentric.storm.wordcount.bolts;

      import backtype.storm.topology.BasicOutputCollector;
      import backtype.storm.topology.OutputFieldsDeclarer;
      import backtype.storm.topology.base.BaseBasicBolt;
      import backtype.storm.tuple.Fields;
      import backtype.storm.tuple.Tuple;
      import backtype.storm.tuple.Values;

      public class WordSplitBolt extends BaseBasicBolt {

          public void cleanup() {}

          /**
           * The bolt will only emit the field "word"
           */
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
              declarer.declare(new Fields("word"));
          }
  • 24. WordSplitBolt II
          /**
           * The bolt receives a line from the words file
           * and splits it into words
           */
          public void execute(Tuple input, BasicOutputCollector collector) {
              String sentence = input.getString(0);
              String[] words = sentence.split(" ");
              for (String word : words) {
                  word = word.trim();
                  if (!word.isEmpty()) {
                      // Normalize to lower case before emitting
                      word = word.toLowerCase();
                      collector.emit(new Values(word));
                  }
              }
          }
      }
  • 25. WordCountBolt I
      package de.codecentric.storm.wordcount.bolts;

      import java.util.Comparator;
      import java.util.HashMap;
      import java.util.Map;
      import java.util.SortedSet;
      import java.util.TreeSet;

      import backtype.storm.task.TopologyContext;
      import backtype.storm.topology.BasicOutputCollector;
      import backtype.storm.topology.OutputFieldsDeclarer;
      import backtype.storm.topology.base.BaseBasicBolt;
      import backtype.storm.tuple.Tuple;

      public class WordCountBolt extends BaseBasicBolt {

          private static final long serialVersionUID = 1L;

          Integer id;
          String name;
          Map<String, Integer> counters;
  • 26. WordCountBolt II
          /**
           * On create
           */
          @Override
          public void prepare(Map stormConf, TopologyContext context) {
              this.counters = new HashMap<String, Integer>();
              this.name = context.getThisComponentId();
              this.id = context.getThisTaskId();
          }

          // This bolt is the end of the topology and emits nothing
          @Override
          public void declareOutputFields(OutputFieldsDeclarer declarer) {
          }

          @Override
          public void execute(Tuple input, BasicOutputCollector collector) {
              String str = input.getString(0);
              /**
               * If the word is not in the map yet, add it with a count of 1;
               * otherwise increment its count
               */
              if (!counters.containsKey(str)) {
                  counters.put(str, 1);
              } else {
                  Integer c = counters.get(str) + 1;
                  counters.put(str, c);
              }
          }
  • 27. WordCountBolt III
          /**
           * When the topology ends (i.e. when the cluster is shut down)
           * we print the word counters
           */
          @Override
          public void cleanup() {
              // Sort map
              SortedSet<Map.Entry<String, Integer>> sortedCounts = entriesSortedByValues(counters);
              System.out.println("-- Word Counter [" + name + "-" + id + "] --");
              for (Map.Entry<String, Integer> entry : sortedCounts) {
                  System.out.println(entry.getKey() + ": " + entry.getValue());
              }
          }
          …
      }
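The slide elides the `entriesSortedByValues` helper that `cleanup()` calls. The following is only a sketch of what such a helper could look like, not the author's implementation: the class name `SortByValueSketch` and the tie-breaking on the key are assumptions, and it uses Java 8 lambdas although the deck targets Java 6. The sort order is ascending by count, matching the sample output on the WordCount slide.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical stand-in for the helper elided on the slide.
final class SortByValueSketch {

    // Returns the entries of the map ordered by ascending value.
    static <K extends Comparable<K>, V extends Comparable<V>>
            SortedSet<Map.Entry<K, V>> entriesSortedByValues(Map<K, V> map) {
        Comparator<Map.Entry<K, V>> byValueThenKey = (a, b) -> {
            int byValue = a.getValue().compareTo(b.getValue());
            // Fall back to the key: a TreeSet treats "compare == 0" as a
            // duplicate, so words with equal counts would otherwise be dropped.
            return byValue != 0 ? byValue : a.getKey().compareTo(b.getKey());
        };
        SortedSet<Map.Entry<K, V>> sorted = new TreeSet<>(byValueThenKey);
        sorted.addAll(map.entrySet());
        return sorted;
    }

    public static void main(String[] args) {
        Map<String, Integer> counters = new HashMap<>();
        counters.put("the", 27730);
        counters.put("and", 26099);
        counters.put("of", 18126);
        // Prints the words from the least to the most frequent.
        for (Map.Entry<String, Integer> e : entriesSortedByValues(counters)) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Comparing on the value alone is a common pitfall here, because two words with the same count would then collapse into a single entry.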
  • 28. WordCountTopology
      import backtype.storm.Config;
      import backtype.storm.LocalCluster;
      import backtype.storm.topology.TopologyBuilder;
      import backtype.storm.tuple.Fields;
      import backtype.storm.utils.Utils;

      import de.codecentric.storm.wordcount.bolts.WordCountBolt;
      import de.codecentric.storm.wordcount.bolts.WordSplitBolt;
      import de.codecentric.storm.wordcount.spouts.FileReaderSpout;

      public class WordCountTopology {

          public static void main(String[] args) throws InterruptedException {
              // Topology definition
              TopologyBuilder builder = new TopologyBuilder();
              builder.setSpout("word-reader", new FileReaderSpout());
              builder.setBolt("word-normalizer", new WordSplitBolt())
                     .shuffleGrouping("word-reader");
              builder.setBolt("word-counter", new WordCountBolt(), 1)
                     .fieldsGrouping("word-normalizer", new Fields("word"));

              // Configuration
              Config conf = new Config();
              conf.put("wordsFile", args[0]);
              conf.setDebug(false);
              conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);

              // Run topology in local mode
              LocalCluster cluster = new LocalCluster();
              cluster.submitTopology("word-count-topology", conf, builder.createTopology());

              // You don't do this on a regular topology
              Utils.sleep(10000);
              cluster.killTopology("word-count-topology");
              cluster.shutdown();
          }
      }
  • 29. Stream Grouping
      • Each Spout or Bolt might be running n instances in parallel
      • Groupings decide which task of the subscribing bolt a tuple is sent to
      • Possible groupings (Grouping: Feature)
          – Shuffle: Random grouping
          – Fields: Grouped by value such that equal values result in the same task
          – All: Replicates to all tasks
          – Global: Makes all tuples go to one task
          – None: Makes the Bolt run in the same thread as the Bolt / Spout it subscribes to
          – Direct: Producer (task that emits) controls which Consumer will receive
          – Local: If the target bolt has one or more tasks in the same worker process, tuples will be shuffled to just those in-process tasks
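To make the difference between the first two groupings concrete, here is a minimal plain-Java sketch that does not use the Storm API. It assumes a fields grouping can be modelled as "hash of the field value modulo the number of tasks" and a shuffle grouping as a uniformly random pick; the class and method names are hypothetical, and Storm's actual implementation may differ in detail.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Illustrative model of two groupings; not Storm code.
final class GroupingSketch {

    // Fields grouping (assumed model): equal field values always map to the
    // same task index, because the index depends only on the value's hash.
    static int fieldsGrouping(String fieldValue, int numTasks) {
        return Math.floorMod(fieldValue.hashCode(), numTasks);
    }

    // Shuffle grouping (assumed model): any task, chosen at random.
    static int shuffleGrouping(Random random, int numTasks) {
        return random.nextInt(numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        Random random = new Random();
        List<String> words = Arrays.asList("the", "and", "the", "to", "the");
        for (String word : words) {
            System.out.println(word
                    + " -> fields task " + fieldsGrouping(word, numTasks)
                    + ", shuffle task " + shuffleGrouping(random, numTasks));
        }
    }
}
```

This is why the WordCount topology uses a fields grouping on "word" in front of the counting bolt: every occurrence of a given word reaches the same task, so each task can keep its own counter map without coordination.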
  • 30. Key features of Twitter Storm: Storm is
      • Fast & scalable
      • Fault-tolerant
      • Guaranteeing message processing
      • Easy to set up & operate
      • Free & Open Source
  • 33. Parallelism
      • Number of worker nodes = 2
      • Number of worker slots per node = 4
      • Number of topology workers = 4
      • FileReaderSpout: parallelism hint = 2, number of tasks not specified (same as parallelism hint) = 2
      • WordSplitBolt: parallelism hint = 4, number of tasks = 8
      • WordCountBolt: parallelism hint = 6, number of tasks not specified = 6
      • Number of component instances = 2 + 8 + 6 = 16
      • Number of executor threads = 2 + 4 + 6 = 12
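The arithmetic on this slide can be reproduced with a small plain-Java sketch. It is not Storm code: the `Component` class and its fields are hypothetical, and the only rule it encodes is the one stated on the slide, namely that an unspecified number of tasks defaults to the parallelism hint.

```java
// Illustrative model of the parallelism numbers on the slide; not Storm code.
final class ParallelismSketch {

    // Parallelism settings of one spout or bolt.
    static final class Component {
        final String name;
        final int parallelismHint; // number of executor threads
        final int numTasks;        // 0 means "not specified"

        Component(String name, int parallelismHint, int numTasks) {
            this.name = name;
            this.parallelismHint = parallelismHint;
            this.numTasks = numTasks;
        }

        int executors() {
            return parallelismHint;
        }

        // Unspecified task count falls back to the parallelism hint.
        int tasks() {
            return numTasks > 0 ? numTasks : parallelismHint;
        }
    }

    static int totalExecutors(Component... components) {
        int sum = 0;
        for (Component c : components) {
            sum += c.executors();
        }
        return sum;
    }

    static int totalTasks(Component... components) {
        int sum = 0;
        for (Component c : components) {
            sum += c.tasks();
        }
        return sum;
    }

    public static void main(String[] args) {
        Component spout = new Component("FileReaderSpout", 2, 0);
        Component split = new Component("WordSplitBolt", 4, 8);
        Component count = new Component("WordCountBolt", 6, 0);
        int workers = 4;

        int tasks = totalTasks(spout, split, count);         // 2 + 8 + 6
        int executors = totalExecutors(spout, split, count); // 2 + 4 + 6
        System.out.println("component instances (tasks): " + tasks);
        System.out.println("executor threads: " + executors);
        System.out.println("executors per worker: " + executors / workers);
    }
}
```

With 12 executors spread over 4 topology workers, each worker process would run 3 executor threads, and the 4 executors of WordSplitBolt would each run 2 of its 8 tasks.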
  • 34. Message passing
      • Figure (inside a worker): Receive Thread with receiver queue (from other workers), Executors, internal transfer queue, transfer queue, Transfer Thread (to other workers)
      • Interprocess communication is mediated by ZeroMQ
      • Outside transfer is done with Kryo serialization
      • Local communication is mediated by LMAX Disruptor
      • Inside transfer is done with no serialization
  • 36. Fault tolerance: Cluster works normally
      • Components: Nimbus, ZooKeeper, Supervisor, Worker
      • Monitoring cluster state
      • Synchronizing assignment
      • Sending heartbeat
      • Reading worker heartbeat from local file system
      • Sending executor heartbeat
  • 37. Fault tolerance: Nimbus goes down. Processing will still continue, but topology lifecycle operations and the reassignment facility are lost.
  • 38. Fault tolerance: Worker node goes down. Nimbus will reassign the tasks to other machines and the processing will continue.
  • 39. Fault tolerance: Supervisor goes down. Processing will still continue, but assignment is never synchronized.
  • 40. Fault tolerance: Worker process goes down. Supervisor will restart the worker process and the processing will continue.
  • 42. Reliability API
      public class FileReaderSpout extends BaseRichSpout {

          public void nextTuple() {
              …;
              UUID msgId = getMsgID();
              // Emitting tuple with message ID
              collector.emit(new Values(line), msgId);
          }

          public void ack(Object msgId) {
              // Do something with acked message id
          }

          public void fail(Object msgId) {
              // Do something with failed message id
          }
      }

      // Explicit anchoring and acking need an OutputCollector, i.e. BaseRichBolt;
      // BaseBasicBolt anchors and acks automatically.
      public class WordSplitBolt extends BaseRichBolt {

          // collector is the OutputCollector handed to prepare() (omitted here)
          public void execute(Tuple input) {
              for (String s : input.getString(0).split("\\s")) {
                  // Anchoring incoming tuple to outgoing tuples
                  collector.emit(input, new Values(s));
              }
              // Sending ack
              collector.ack(input);
          }
      }

      Tuple tree figure: the line tuple “This is a line” is the root, and each word tuple emitted from it (e.g. “This”) is anchored to it.
  • 43. ACKing Framework
      • Components: FileReaderSpout, WordSplitBolt, WordCountBolt, ACKer implicit bolt
      • Messages to the ACKer: ACKer init, ACKer ack, ACKer fail
      • Tuples in the example: Tuple A, Tuple B, Tuple C
      • Steps
          – Emitted tuple A, XOR tuple A id with ack val
          – Emitted tuple B, XOR tuple B id with ack val
          – Emitted tuple C, XOR tuple C id with ack val
          – Acked tuple A, XOR tuple A id with ack val
          – Acked tuple B, XOR tuple B id with ack val
          – Acked tuple C, XOR tuple C id with ack val
      • Entry kept by the ACKer implicit bolt: Spout Tuple ID, Spout Task ID, ACK val (64 bit)
      • ACK val has become 0, ACKer implicit bolt knows the tuple tree has been completed
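The XOR bookkeeping listed on this slide can be traced with a few lines of plain Java. This is a sketch of the idea only, not Storm's acker: the class `AckerSketch` and its method names are hypothetical, and it tracks a single tuple tree with a single 64-bit ack val.

```java
import java.util.Random;

// Illustrative model of the ack val for one tuple tree; not Storm code.
final class AckerSketch {

    private long ackVal = 0L;

    // A tuple was emitted into the tree: XOR its id into the ack val.
    void emitted(long tupleId) {
        ackVal ^= tupleId;
    }

    // A tuple was acked: XOR the same id again, which cancels it out.
    void acked(long tupleId) {
        ackVal ^= tupleId;
    }

    // The tree is complete once every emitted id has been cancelled.
    boolean treeCompleted() {
        return ackVal == 0L;
    }

    public static void main(String[] args) {
        Random random = new Random();
        long a = random.nextLong();
        long b = random.nextLong();
        long c = random.nextLong();

        AckerSketch acker = new AckerSketch();
        acker.emitted(a);
        acker.emitted(b);
        acker.emitted(c);
        acker.acked(a);
        acker.acked(b);
        System.out.println("after acking A and B: completed = " + acker.treeCompleted());
        acker.acked(c);
        System.out.println("after acking C: completed = " + acker.treeCompleted());
    }
}
```

Because x XOR x = 0 and XOR is commutative, the order of emits and acks does not matter, and the acker needs only a constant amount of memory per spout tuple regardless of how large the tuple tree grows.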
  • 45. Cluster Setup
      • Set up a ZooKeeper cluster
      • Install dependencies on Nimbus and worker machines
          – ZeroMQ 2.1.7 and JZMQ
          – Java 6 and Python 2.6.6
          – unzip
      • Download and extract a Storm release to Nimbus and worker machines
      • Fill in the mandatory configuration in storm.yaml
      • Launch daemons under supervision using the storm scripts
      • Start a topology:
          – storm jar <path_topology_jar> <main_class> <arg1>…<argN>
  • 50. Basic resources
      • Storm is available at
          – http://storm-project.net/
          – https://github.com/nathanmarz/storm
        under Eclipse Public License 1.0
      • Get help on
          – http://groups.google.com/group/storm-user
          – #storm-user freenode room
      • Follow @stormprocessor and @nathanmarz
  • 51. Many contributions
      • Community repository for modules to use Storm at
          – https://github.com/nathanmarz/storm-contrib
          – including integration with Redis, Kafka, MongoDB, HBase, JMS, Amazon SQS, …
      • Good articles for understanding Storm internals
          – http://www.michael-noll.com/blog/2012/10/16/understanding-the-parallelism-of-a-stormtopology/
          – http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-messagebuffers/
      • Good slides for understanding real-life examples
          – http://www.slideshare.net/DanLynn1/storm-as-deep-into-realtime-data-processing-as-youcan-get-in-30-minutes
          – http://www.slideshare.net/KrishnaGade2/storm-at-twitter
  • 52. Coming next…
      • Current release: 0.8.2
      • Work in progress (newest): 0.9.0-wip21
          – SLF4J and Logback
          – Pluggable tuple serialization and blowfish encryption
          – Pluggable interprocess messaging and Netty implementation
          – Some bug fixes
          – And more
      • Storm on YARN
  • 54. One example: Webshop
      • Webtracking component
      • No defined page impression
      • Identifying page impressions using Varnish logs of the click stream data
      • Page consists of different fragments
          – Body
          – Article description
          – Recommendation box, …
      • Session data also of interest
  • 55. One example: Webshop
      • Custom solution using J2EE and MongoDB
      • Export into Comscore DAx and Enterprise DWH
      • Solution is currently working but not scalable
      • What about performance?