“If everything seems
  under control, you're
not going fast enough.”
realtime analysis of #debate hashtag




                  Davide Palmisano @dpalmisano
when size matters: the
  4Vs of   big data

   Volume, Velocity, Variety,
   and Veracity
let’s focus on   Velocity
during peak time, ~35
people/second top up
their Oyster card*




http://www.tfl.gov.uk/corporate/modesoftransport/londonunderground/1608.aspx
every second, ~58 new
pictures are uploaded to
Instagram*




http://www.digitalbuzzblog.com/infographic-instagram-stats/
on the night of the first
#debate, 2615 tweets
per second were
recorded*


http://www.nbcnews.com/technology/technolog/presidential-debate-sets-twitter-record-6281796
What were the most
influential URLs?
What were the implicit
concepts underlying the
conversation?
How did these concepts
evolve during the
discussion?
every single tweet
potentially contains some
   hidden information
extract such information,
   making it explicit,
     analysing it
 and doing it at a rate of
   ~2000 tweets/sec?
real-time analytics

Storm,       a free and open source
   distributed realtime computation
   system. Storm makes it easy to
        reliably process unbounded
streams of data, doing for realtime
   processing what Hadoop did for
                    batch processing.
batch analyses

The Apache Hadoop software library is a
framework that allows for the distributed
  processing of large data sets across
   clusters of computers using simple
          programming models.

         + hdfs, a distributed FS
data gathering from the Social Web




    crunching the Social Web, in real-time.



formerly known as          Beancounter
beancounter.io is a SaaS
  platform to profile your
users from their activities on
      the Social Web
now powering part
of the Italian public
broadcaster's
#socialtv
environment
(a quick parenthesis)

                                                                                or ...

   “how a butterfly flapping
     its wings in Asia might
   cause a hurricane in the
                    Atlantic”                                                       *
http://www.amazon.com/Strategic-Thinking-New-Science-Complexity/dp/0684842688
beancounter.io uses Twitter
OAuth authorisation to
perform check-ins to
social TV events
while beancounter.io was
handling more than ~100
check-ins per minute,

at 13:32 UTC-8 Twitter had
an outage*
https://status.io.watchmouse.com/7617/125017//statuses/home_timeline-(OAuth-1.0a)
Facebook and Twitter check-in rate
[chart: check-ins over time; x-axis from 2012-11-06T20:45 to 2012-11-06T23:30, y-axis from 0 to 200; the Twitter service disruption of Nov 6, 2012 13:32 UTC-8 is marked]
Facebook and Twitter overall comments
[chart: comments over time, one series each for Facebook and Twitter; x-axis from 2012-11-06T20:45 to 2012-11-06T23:00, y-axis from 0 to 1500; the Twitter service disruption of Nov 6, 2012 13:32 UTC-8 is marked]
lesson learnt: the real-time
Web is a hyper-connected
graph of a myriad of different
live systems


always mind the butterflies,
even if you can’t see them
back to #debate
<timestamp, <c0...cn>>

concepts are extracted using NLP
  technologies for each tweet
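
A minimal sketch of what one such `<timestamp, <c0...cn>>` snapshot could look like in Python. The `Snapshot` type, the `extract_concepts` helper and the keyword list are hypothetical: the deck does not show the actual NLP pipeline, so a naive substring match stands in for it here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple

# Hypothetical vocabulary; a stand-in for a real NLP concept extractor.
KNOWN_CONCEPTS = ("Iran", "Israel", "Russia", "Middle East")


@dataclass(frozen=True)
class Snapshot:
    """One <timestamp, <c0...cn>> record produced for a single tweet."""
    timestamp: datetime
    concepts: Tuple[str, ...]


def extract_concepts(text: str) -> Tuple[str, ...]:
    """Return the known concepts mentioned in the text (case-insensitive)."""
    lowered = text.lower()
    return tuple(c for c in KNOWN_CONCEPTS if c.lower() in lowered)


def to_snapshot(text: str, created_at: datetime) -> Snapshot:
    """Turn one tweet into a snapshot keyed by its creation time."""
    return Snapshot(timestamp=created_at, concepts=extract_concepts(text))


# Arbitrary example tweet and timestamp, not taken from the real data set.
snap = to_snapshot(
    "Tension between Iran and Israel dominates the #debate",
    datetime(2012, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(snap.concepts)
```

Keeping the record immutable and keyed by time makes it straightforward to group snapshots into time windows later on.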
we’ve tied together beancounter.io,
              Storm and Hadoop


please note, this was only
10% of the firehose


[architecture diagram: Storm for real-time analytics; hdfs, distributed FS, for batch analytics]
more than ~ 500k tweets
processed in 2h for an average
      rate of ~70 t/sec

    each tweet produced a
  snapshot (~10k each) for an
 overall size of 4.6GB of data
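
The figures above can be sanity-checked with a few lines of arithmetic. This sketch assumes "~10k each" means roughly 10 kB per snapshot, which the slide does not state explicitly.

```python
TWEETS = 500_000                # "more than ~500k tweets"
WINDOW_SECONDS = 2 * 60 * 60    # processed in 2h
SNAPSHOT_BYTES = 10_000         # assumption: "~10k each" read as ~10 kB

avg_rate = TWEETS / WINDOW_SECONDS
total_gb = TWEETS * SNAPSHOT_BYTES / 1e9

print(f"average rate: {avg_rate:.1f} tweets/sec")
print(f"estimated size: {total_gb:.1f} GB")
```

Under that assumption the average rate comes out at about 69 tweets/sec, matching the ~70 t/sec on the slide. The reported 4.6GB corresponds to an average snapshot of about 9.2 kB, slightly under the rounded 10 kB.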
more than ~18k
different URLs shared


 highest peak: 253 tweets/sec


5 Amazon EC2 x-large instances
+ 2 mid-sized for HDFS
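
A minimal sketch of how shared URLs could be ranked, assuming "influential" is approximated by the number of tweets that contain a URL. That metric, the `top_urls` helper and the sample tweets are assumptions for illustration; the deck does not describe how influence was measured.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

URL_PATTERN = re.compile(r"https?://\S+")


def top_urls(tweets: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Rank URLs by the number of tweets that mention them."""
    counts: Counter = Counter()
    for text in tweets:
        # A set counts each URL once per tweet; trailing punctuation is stripped.
        urls = {u.rstrip(".,;:!?)") for u in URL_PATTERN.findall(text)}
        counts.update(urls)
    return counts.most_common(n)


# Made-up tweets using a placeholder domain.
tweets = [
    "Watch this http://example.com/a #debate",
    "RT http://example.com/a",
    "Fact check: http://example.com/b.",
    "http://example.com/a http://example.com/a",
]
print(top_urls(tweets, 2))
```

On real tweets, shortened links would probably need to be resolved to their targets first so that different short forms of the same page are counted together.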
recurring concepts
[bar chart: occurrences per concept, y-axis from 0 to 70000; concepts shown: Osama Bin Laden, Iran, Israel, Middle East, Pakistan, Iraq, Afghanistan, Russia]
most frequently co-occurring concepts

Iran - Israel 35.356%
Russia - Middle East 24.7%
...
...
Wikileaks - Richard Nixon 93.5%
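
The deck does not say how these percentages were computed. One plausible reading, sketched here purely as an assumption, is the share of snapshots mentioning either concept that mention both (a Jaccard-style ratio). The `cooccurrence_share` helper and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Sequence


def cooccurrence_share(
    snapshots: Iterable[Sequence[str]],
) -> Dict[FrozenSet[str], float]:
    """Percentage of snapshots mentioning either concept that mention both."""
    single: Counter = Counter()
    pair: Counter = Counter()
    for concepts in snapshots:
        unique = sorted(set(concepts))  # ignore repeats within one tweet
        single.update(unique)
        pair.update(frozenset(p) for p in combinations(unique, 2))

    shares: Dict[FrozenSet[str], float] = {}
    for key, both in pair.items():
        a, b = tuple(key)
        either = single[a] + single[b] - both  # inclusion-exclusion
        shares[key] = 100.0 * both / either
    return shares


# Made-up concept lists, one per tweet snapshot.
sample = [
    ["Iran", "Israel"],
    ["Iran"],
    ["Israel", "Iran"],
    ["Russia", "Middle East"],
    ["Russia"],
]
shares = cooccurrence_share(sample)
for key, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(" - ".join(sorted(key)), f"{pct:.1f}%")
```

A ratio of this kind can be very high for a pair that appears rarely but almost always together, so it is worth reading next to the raw counts.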
[figure: three values, 5321, 17284 and 6960; labels not recovered]
facts
data viz is a completely different job


mining data requires science skills; it’s not
just about technology: it’s about math

forget about controlling everything when data
flows at that speed: make reasoned
approximations
?
Davide Palmisano
@dpalmisano
http://davidepalmisano.com
