Small Pieces Loosely Joined
#cgn13
or...
A practical example of processing real-time data with a distributed agent network
(Warning: does not contain real code)
Red Gate
12th October 2011
eMail Marketing
Mailchimp webhook
"type": "subscribe",
"fired_at": "2009-03-26 21:35:57",
"data[id]": "8a25ff1d98",
"data[list_id]": "a6b5da1054",
"data[email]": "api@mailchimp.com",
"data[email_type]": "html",
"data[merges][EMAIL]": "api@mailchimp.com",
"data[merges][FNAME]": "MailChimp",
"data[merges][LNAME]": "API",
"data[merges][INTERESTS]": "Group1,Group2",
"data[ip_opt]": "10.20.10.30",
"data[ip_signup]": "10.20.10.30"
Pump the callbacks into a message bus...
Messaging
mailchimp-pump.php


// Assumes $channel is an open php-amqplib channel and a 'mailchimp' topic exchange exists
$json = json_encode($_POST);
$msg = new AMQPMessage($json);
$channel->basic_publish($msg, 'mailchimp', "morat.campaign.mailchimp." . $_POST['type']);
I’d like to watch the stream on IRC...
Valve

Subscribe to mailchimp exchange morat.campaign.mailchimp.#
Translate to plain English for IRC
Inject into irc exchange with routing key morat.irc.[channel]
mailchimp-irc-valve.rb

case record['type']
when 'subscribe'
  output :irc, "'#{record['data']['merges']['FNAME']} #{record['data']['merges']['LNAME']}' has joined the list"
when 'unsubscribe'
  output :irc, "'#{record['data']['merges']['FNAME']} #{record['data']['merges']['LNAME']}' has left the list"
...
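
The output helper these valves call isn't shown in the deck. A minimal sketch of what it might look like, assuming a Bunny session held in @broker and a topic exchange per destination (all names here are assumptions, not the deck's code):

# Hypothetical sketch only -- the real helper is not in the slides.
def output(destination, message, opts = {})
  channel  = @broker.create_channel
  exchange = channel.topic(destination.to_s, :durable => true)
  key      = opts[:routing_key] || "morat.#{destination}.#{@campaign}"
  exchange.publish(message, :routing_key => key)
end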
Create a Sink to send the messages to IRC...
irc-sink.pl

$q = $amq->channel(1)->queue('morat.irc.' . $channel, {
    passive => 0, durable => 0, auto_delete => 1, exclusive => 0,
})->subscribe( sub {
    my ($payload, $meta) = @_;
    # pull the IRC channel name off the end of the queue name
    my ($channel) = $meta->{'queue'} =~ /\.([^.]+)$/;
    $irc->yield('privmsg', '#'.$channel, GREEN.$payload);
});
Where have we got to?

Pump: Mailchimp webhook (HTTP POST) > morat.[campaign].mailchimp.[type] (JSON)
Valve: morat.campaign.mailchimp.[type] (JSON) > morat.irc.[campaign] (Text)
Sink: morat.irc.[campaign] (Text) > IRC server
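
Each stage follows the same shape: bind a queue to one routing key, transform the message, publish it to the next exchange. A minimal, hypothetical valve skeleton using the Bunny gem (exchange names and keys are illustrative, not from the deck):

require 'bunny'
require 'json'

# Illustrative skeleton only -- exchange and key names are assumptions.
broker  = Bunny.new.start
channel = broker.create_channel
source  = channel.topic('mailchimp', :durable => true)
sink    = channel.topic('irc', :durable => true)

queue = channel.queue('', :exclusive => true)
queue.bind(source, :routing_key => 'morat.campaign.mailchimp.#')

queue.subscribe(:block => true) do |delivery, properties, body|
  record = JSON.parse(body)                      # JSON in...
  text   = "#{record['type']} event received"    # ...plain text out
  sink.publish(text, :routing_key => 'morat.irc.campaign')
end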
That’s cool, but hey, it would be great to see #campaign tweets as well...
twitter-search-pump.rb
TweetStream::Client.new.track(keywords.split(',')) do |status|
  keywords.split(',').each do |searchterm|
    if status.text.match(searchterm)
      # strip spaces and '#' so the term can be used as a routing key segment
      searchterm.sub!(' ', '')
      searchterm.sub!('#', '')
      log.debug "Sending: #{status.user.screen_name} :: #{status.text} :: morat.twitter.search.#{searchterm}"
      broker.exchange.publish JSON.generate(status), :routing_key => "morat.twitter.search.#{searchterm}"
    end
  end
end
twitter-irc-valve.rb

case routing_key
when 'morat.twitter.@neildavidson.list.redgaters'
  output :irc, "RG chatter: #{record['user']['screen_name']} tweeted: #{record['text']}", :routing_key => "morat.irc.redgaters"
else
  searchterm = routing_key.match(/morat\.twitter\.search\.(.+)/)[1]
  output :irc, "#{record['user']['screen_name']} tweeted: #{record['text']}", :routing_key => "morat.irc.#{searchterm}"
end
I feel the urge to graph...
Thanks @garethr
Valve
Subscribe to mailchimp exchange morat.[campaign].mailchimp.#
Translate to Graphite format: [value] [timestamp]
Inject into graphite exchange with routing key based on sample window: 10sec.[campaign].mailchimp.[action].count
But let’s make it cool...
Complex Event Processing
mailchimp-graphite-valve.rb

%w{ subscribe unsubscribe campaign }.each do |action|
  [ '10 sec', '1 min', '5 min', '15 min' ].each do |window|
    valve.register "SELECT count(*) from MailchimpEvent(type='#{action}').win:time_batch(#{window})", (
      Listener.new(valve) do |agent, event|
        valve.output :graphite, "#{event.get('count(*)')}", :routing_key => window.delete(' ') + ".morat.#{valve.application}.mailchimp.#{action}"
      end
    )
  end
end
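
Listener isn't defined in the slides. If the valve runs under JRuby with the Esper jar on the classpath, a plausible sketch is a thin wrapper around Esper's UpdateListener interface (an assumption about how the helper works, not the deck's code):

# Hypothetical sketch -- assumes JRuby and the Esper jar on the classpath.
require 'java'

class Listener
  include com.espertech.esper.client.UpdateListener

  def initialize(valve, &block)
    @valve = valve
    @block = block
  end

  # Esper calls update with batches of new and expired events
  def update(new_events, old_events)
    return if new_events.nil?
    new_events.each { |event| @block.call(@valve, event) }
  end
end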
Why use CEP?
# find the sum of retweets of last 5 tweets which saw more than 10 retweets
SELECT sum(retweets) from TweetEvent(retweets >= 10).win:length(5)

# find max, min and average number of retweets for a sliding 60 second window of time
SELECT max(retweets), min(retweets), avg(retweets) FROM TweetEvent.win:time(60 sec)

# compute number of retweets for all tweets in 10 second batches
SELECT sum(retweets) from TweetEvent.win:time_batch(10 sec)

# number of retweets, grouped by timezone, buffered in 10 second increments
SELECT timezone, sum(retweets) from TweetEvent.win:time_batch(10 sec) group by timezone

# compute the sum of retweets in sliding 60 second window, and emit count every 30 events
SELECT sum(retweets) from TweetEvent.win:time(60 sec) output snapshot every 30 events

# every 10 seconds, report timezones which accumulated more than 10 retweets
SELECT timezone, sum(retweets) from TweetEvent.win:time_batch(10 sec) group by timezone having sum(retweets) > 10

       Courtesy @igrigorik http://www.igvita.com/2011/05/27/streamsql-event-processing-with-esper/
Is there really a correlation?
Statistical Computing
Valve

Grab raw data for window from Graphite via REST
Create scatter graph using R and calculate correlation
Inject correlation into graphite exchange
twitter-correlation-valve.rb
require 'rsruby'
...

r.jpeg(filename)                       # write the scatter plot to a JPEG
r.assign('xs', data[1])
r.assign('ys', data[2])
fit = r.lm('ys ~ xs')                  # linear model for the regression line
r.plot({
  'x' => data[1],
  'y' => data[2],
  'xlab' => label[1],
  'ylab' => label[2]
})
cor = r.cor(data[1], data[2]).to_s
r.title("Correlation: " + cor)
r.abline(fit['coefficients']['(Intercept)'], fit['coefficients']['xs'])
r.eval_R("dev.off()")
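
The "..." at the top hides the step that pulls the raw series out of Graphite before they reach R. A rough sketch of that fetch using Graphite's render API, with a hypothetical host and illustrative target names (assumptions, not from the deck):

require 'net/http'
require 'json'
require 'uri'

# Hypothetical helper -- host and target names are illustrative.
def fetch_datapoints(target, from = '-1h')
  uri = URI("http://graphite.local/render?target=#{target}&from=#{from}&format=json")
  series = JSON.parse(Net::HTTP.get(uri)).first
  series['datapoints'].map { |value, _timestamp| value || 0 }
end

data = {}
data[1] = fetch_datapoints('10sec.morat.cgn13.mailchimp.subscribe')
data[2] = fetch_datapoints('10sec.morat.cgn13.twitter.search.cgn13')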
Let's add some realtime visualisation...
Websockets
Valve
Subscribe to twitter exchange morat.twitter.search.[keyword]
Extract adjectives using EngTagger
Inject adjectives into twitter exchange with routing key morat.twitter.search.[keyword].adjectives as: [adjective] [count]
twitter-sentiment-valve.rb
require 'engtagger'
...

log.debug "Received tweet from #{record['user']['screen_name']} on #{routing_key}"

# tag the tweet text and pull out the adjectives
adjectives = @parser.add_tags(record['text']).scan(EngTagger::ADJ).map do |n|
  @parser.strip_tags(n)
end

# count occurrences of each stemmed adjective, skipping blanks
ret = Hash.new(0)
adjectives.each do |n|
  n = @parser.stem(n)
  ret[n] += 1 unless n =~ /\A\s*\z/
end
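
The slide stops before the publish of the [adjective] [count] pairs; a hypothetical continuation, assuming the same output helper and routing-key scheme as the other valves:

# Hypothetical continuation -- not shown in the slides.
ret.each do |adjective, count|
  output :twitter, "#{adjective} #{count}", :routing_key => "#{routing_key}.adjectives"
end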
Sink


Subscribe to twitter exchange morat.twitter.search.[keyword].adjectives
Use node.js and Socket.IO to send data to the web client via WebSockets
Visualise with processing.js in the web browser
twitter-sentiment-sink.js
io.sockets.on('connection', function (socket) {
    amqp_connection.on('ready', function () {
        var queue = amqp_connection.queue('');
        exchange = amqp_connection.exchange('twitter', { type: 'topic', passive: false, durable: true, autoDelete: true }, function (exchange) {
            queue.bind(exchange, routing_key);
            queue.subscribe(function (message) {
                socket.emit('data', { text: message.data.toString() });
            });
        });
    });
});
twitter-sentiment-sink.html
<h1>Twitter Sentiment</h1>
<div id="container">
  <canvas id="twitter-sentiment-sink" data-processing-sources="twitter-sentiment-sink.pde" width="800" height="600"></canvas>
</div>
<script src="/socket.io/socket.io.js"></script>
<script type="text/javascript">
  var socket = io.connect('http://localhost');
  socket.on('data', function (data) {
    var pjs = Processing.getInstanceById('twitter-sentiment-sink');
    pjs.addDatum(data.text.split(' ')[0]);
  });
</script>
@ennui2342

www.morat.co.uk
 polis.ecafe.org

Speaker notes

Take the adjectives and store a running total in Redis to create long timeline tag clouds.
Pull out @replies and RTs and throw them into Neo4j, a graph database, for post-competition analysis.
Hook an Arduino up to IRC to receive Mailchimp subscriptions and create a physical visualisation in the office (e.g. a glow ball).