Acunu Analytics
                           Simple, powerful, real-time


                                 Andrew Byde
                               Principal Scientist




Tuesday, 27 March 2012
Making big data useful
                                    How do we turn this ...
                            time                   page                 session id     duration

                              ...                    ...                    ...           ...

                         14:58:03.234           /index.html             248.180.3.40     898

                         14:58:03.234   /csi/csi/council/freedom.html   248.180.3.40     1234

                         14:58:03.234    /docs/access/chapter8.txt      99.1.10.178       52

                              ...                    ...                     ...          ...




                                            x billions

       ... into this ...

       ... or this ...

       ... or this ...

       [three slide images not preserved]

• SQL + materialised views
                         ... would be nice if it scaled

• Hadoop/Map-Reduce can do anything
                         Not real-time
                         Inefficient re-computation
                             (100TB on a 100 node cluster is > 3 hours)

• Cassandra counters are pretty cool
                         but the query semantics is spartan
                         => DIY solutions

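The "> 3 hours" figure for a full Hadoop pass can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes the data is spread evenly and that each node reads its share sequentially at about 90 MB/s; that throughput is an illustrative assumption, not a number from the talk.

```python
def full_scan_hours(total_bytes: float, nodes: int, node_bytes_per_sec: float) -> float:
    """Hours for every node to read its equal share of the data once."""
    per_node_bytes = total_bytes / nodes
    return per_node_bytes / node_bytes_per_sec / 3600


TB = 10**12
MB = 10**6

# 100 TB over 100 nodes is 1 TB per node.
# 90 MB/s sequential read per node is an assumed figure.
hours = full_scan_hours(100 * TB, nodes=100, node_bytes_per_sec=90 * MB)
print(f"{hours:.2f} hours")
```

Under these assumptions the read alone takes a little over three hours, before any shuffle or reduce work, and the whole pass has to be repeated whenever fresh results are needed.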
Acunu Analytics
                   • Simple, real-time, incremental analytics
                   • push processing into ingest phase

                         event  →  AA  →  counter updates  →  Cassandra

               • Event template, e.g.,
                          select : ["COUNT", "AVG(loadTime)"],
                          type : {
                             time : [TIME(HOUR; MIN; SEC), ?, 0],
                             page : PATH(/),
                             loadTime : [LONG, 0, 0]
                          }

               • specifies “blow-up” strategy according to
                         supported queries
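One way to picture the "blow-up" is as the set of buckets a single event falls into. The sketch below is a guess at what TIME(HOUR; MIN; SEC) and PATH(/) might mean: each time prefix and each path ancestor becomes a bucket, and the event touches every combination. The function names and the exact bucketing rules are hypothetical, not Acunu's API.

```python
from itertools import product


def time_buckets(hms: str) -> list[str]:
    """'14:58:03' -> ['14', '14:58', '14:58:03'] (hour, minute, second prefixes)."""
    parts = hms.split(":")
    return [":".join(parts[: i + 1]) for i in range(len(parts))]


def path_buckets(path: str, sep: str = "/") -> list[str]:
    """'/docs/a.txt' -> ['/', '/docs', '/docs/a.txt'] (every ancestor prefix)."""
    parts = [p for p in path.split(sep) if p]
    return [sep] + [sep + sep.join(parts[: i + 1]) for i in range(len(parts))]


def blow_up(event: dict) -> list[tuple[str, str]]:
    """All (time bucket, page bucket) pairs whose aggregates this event touches."""
    return list(product(time_buckets(event["time"]), path_buckets(event["page"])))


# Illustrative event; field names follow the template, values are made up.
event = {"time": "14:58:03", "page": "/docs/access/chapter8.txt", "loadTime": 52}
for key in blow_up(event):
    print(key)
```

Three time buckets times four path buckets gives twelve aggregate keys for this one event, which is why the template should only enable the dimensions that queries actually need.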


 type : {
    time : TIME(HOUR; MIN),
    category : STRING,
    user : STRING
 }

                                 21:00        all→1345     :00→45      :01→62      :02→87     ...
                                 22:00        all→3221     :00→22      :01→19      :02→104    ...
                                   ...
                                 click        all→228    user01→1    user14→12   user99→7    ...
                                 open         all→354    user01→4    user04→8    user56→17   ...
                                   ...
                              click, 22:00   all→1904       ...
                                   ∅          all→87314   click→238   open→354       ...

   new event:  (22:02, “click”, user01)

   after event (22:02, “click”, user01):

                                 21:00        all→1345     :00→45      :01→62      :02→87     ...
                                 22:00        all→3222     :00→22      :01→19      :02→105    ...
                                   ...
                                 click        all→229    user01→2    user14→12   user99→7    ...
                                 open         all→354    user01→4    user04→8    user56→17   ...
                                   ...
                              click, 22:00   all→1905       ...
                                   ∅          all→87315   click→239   open→354       ...

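The before/after tables amount to a handful of counter increments per event. The sketch below replays the example event against an in-memory stand-in for the counter rows; the row and column layout is copied from the slides, while the `ingest` function and the dictionary store are hypothetical and say nothing about how Acunu Analytics or Cassandra implement it.

```python
from collections import defaultdict

# counters[row][column] -> count; rows and columns follow the slide's layout.
counters: dict = defaultdict(lambda: defaultdict(int))


def ingest(time_hm: str, category: str, user: str) -> None:
    """Apply one event as a fixed set of counter increments."""
    hour, minute = time_hm.split(":")
    hour_row = f"{hour}:00"
    updates = [
        (hour_row, "all"), (hour_row, f":{minute}"),  # hour total, minute bucket
        (category, "all"), (category, user),          # category total, per user
        ((category, hour_row), "all"),                # category within the hour
        ("∅", "all"), ("∅", category),                # global total, per category
    ]
    for row, column in updates:
        counters[row][column] += 1


# Seed a few values from the "before" table, then apply the example event.
counters["22:00"].update({"all": 3221, ":02": 104})
counters["click"].update({"all": 228, "user01": 1})
counters[("click", "22:00")]["all"] = 1904
counters["∅"].update({"all": 87314, "click": 238})

ingest("22:02", "click", "user01")
print(counters["22:00"]["all"], counters["22:00"][":02"])  # 3222 105
```

Each event costs seven increments here regardless of how much data has already been stored, which is the point of pushing the work into the ingest phase.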
Acunu Analytics
     Pre-assembled queries, e.g. ...
                                21:00        all→1345     :00→45      :01→62      :02→87     ...

  for 22:00-23:00,              22:00        all→3222     :00→22      :00→19     :02→105     ...

    group by minute                ...                                                       ...

                                 click        all→229    user01→2    user14→12   user99→7    ...
  group all by user,             open         all→354    user01→4    user04→8    user56→17   ...
   where category=click            ...


           count all          click, 22:00   all→1905       ...

                                  ∅          all→87315   click→239   open→355       ...

     group all by category
Tuesday, 27 March 2012
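Because the counters are laid out per query, answering a query reduces to reading one row. The sketch below shows that idea on a small snapshot holding a few of the values from the slide; `group_by` and `count_all` are hypothetical helper names, not the product's query language.

```python
# A few counter values taken from the slide, keyed by row then column.
counters = {
    "22:00": {"all": 3222, ":00": 22, ":02": 105},
    "click": {"all": 229, "user01": 2, "user14": 12, "user99": 7},
    "∅": {"all": 87315, "click": 239},
}


def group_by(row: str) -> dict:
    """Return every bucket in a row except its running total."""
    return {k: v for k, v in counters[row].items() if k != "all"}


def count_all(row: str = "∅") -> int:
    """Return the running total kept in a row."""
    return counters[row]["all"]


print(group_by("22:00"))  # for 22:00-23:00, group by minute
print(group_by("click"))  # group all by user, where category=click
print(count_all())        # count all
print(group_by("∅"))      # group all by category
```

No query scans raw events; the cost depends on the number of buckets in the row, not on the number of events ingested.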
Summary
                   • Simple, real-time, incremental analytics
                   • work done on ingest
                   • sum, count, distinct, avg, stddev, min-max etc
                   • time + hierarchy bucketing
                   • efficient ‘group’ semantics
                   • works with Apache Cassandra
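Several of the listed aggregates can be kept up to date with a constant amount of work per event. The sketch below uses the standard trick of storing count, sum and sum of squares, from which avg and stddev follow; it is a generic illustration, not a description of Acunu's implementation, and it leaves out `distinct`, which needs a different structure. The `RunningStats` class is a hypothetical name.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Aggregates maintained with one cheap update per event."""
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, x: float) -> None:
        self.count += 1
        self.total += x
        self.total_sq += x * x
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    @property
    def avg(self) -> float:
        return self.total / self.count

    @property
    def stddev(self) -> float:
        # Population standard deviation: sqrt(E[x^2] - E[x]^2).
        variance = self.total_sq / self.count - self.avg ** 2
        return math.sqrt(max(variance, 0.0))


stats = RunningStats()
for duration in (898, 1234, 52):  # durations from the sample log rows
    stats.add(duration)
print(stats.count, stats.avg, stats.stddev, stats.minimum, stats.maximum)
```

Count, sum and sum of squares are purely additive, so they map naturally onto counter increments; min and max need a compare-and-set style update instead.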
Early Access Program

                           analytics@acunu.com

count
count distinct (session)
count
avg(duration)

                         grouped by ...
                             day
                             ... geography
                             ... browser
