Real-time “OLAP” for Big Data (+ use cases)
     Cosmin Lehene | Adobe
     #bigdataro - 30 January 2013




© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
What we needed … and built


      OLAP Semantics
      Low Latency Ingestion
      High Throughput
      Real-time Query API




“Physical” Building Blocks




Logical Building Blocks


      Dimensions, Metrics
      Aggregations
      Roll-up, drill-down, slicing and dicing, sorting




OLAP 101 – Queries example




      Date         Country   City      OS        Browser   Sale
      2012-05-21   USA       NY        Windows   FF         0.0
      2012-05-21   USA       NY        Windows   FF        10.0
      2012-05-22   USA       SF        OSX       Chrome    25.0
      2012-05-22   Canada    Ontario   Linux     Chrome     0.0
      2012-05-23   USA       Chicago   OSX       Safari    15.0

      Totals per column:
            Date:     5 visits, 3 days
            Country:  2 countries (USA: 4, Canada: 1)
            City:     4 cities (NY: 2, SF: 1)
            OS:       3 OSes (Win: 2, OSX: 2)
            Browser:  3 browsers (FF: 2, Chrome: 2)
            Sale:     50.0 in 3 sales



OLAP 101 – Queries example

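      Rolling up to country level:
  -- assumes the sample rows live in a table named visits
  SELECT country, COUNT(*) AS visits, SUM(sale) AS sales
  FROM visits
  GROUP BY country

            Country   visits   sales
            USA       4        $50
            Canada    1        $0

      “Slice” by browser:
  SELECT country, COUNT(*) AS visits, SUM(sale) AS sales
  FROM visits
  WHERE browser = 'FF'
  GROUP BY country

            Country   visits   sales
            USA       2        $10
            Canada    0        $0
            (Canada has no FF visits; a strict SQL filter would omit that row.)

      Top browsers by sales:
  SELECT browser, SUM(sale) AS sales, COUNT(*) AS visits
  FROM visits
  GROUP BY browser
  ORDER BY sales DESC

            Browser   sales   visits
            Chrome    $25     2
            Safari    $15     1
            FF        $10     2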
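
The three queries above can be reproduced over the five sample visits with a few lines of Python. This is only a sketch for checking the numbers on the slide; the `aggregate` helper and its parameters are illustrative names, not part of SaasBase.

```python
from collections import defaultdict

# The five sample visits from the table above:
# (date, country, city, os, browser, sale)
COLUMNS = ("date", "country", "city", "os", "browser", "sale")
VISITS = [
    ("2012-05-21", "USA", "NY", "Windows", "FF", 0.0),
    ("2012-05-21", "USA", "NY", "Windows", "FF", 10.0),
    ("2012-05-22", "USA", "SF", "OSX", "Chrome", 25.0),
    ("2012-05-22", "Canada", "Ontario", "Linux", "Chrome", 0.0),
    ("2012-05-23", "USA", "Chicago", "OSX", "Safari", 15.0),
]


def aggregate(rows, group_by, where=None):
    """Group rows by one dimension and return {value: (visits, sales)}."""
    key_idx = COLUMNS.index(group_by)
    sale_idx = COLUMNS.index("sale")
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        # Optional filter, applied before grouping (the "slice").
        if where is not None and not where(dict(zip(COLUMNS, row))):
            continue
        cell = totals[row[key_idx]]
        cell[0] += 1               # COUNT(*)
        cell[1] += row[sale_idx]   # SUM(sale)
    return {key: (cell[0], cell[1]) for key, cell in totals.items()}


# Roll up to country level.
by_country = aggregate(VISITS, "country")

# Slice by browser = FF, then roll up to country level.
ff_by_country = aggregate(VISITS, "country", where=lambda r: r["browser"] == "FF")

# Top browsers by sales (descending).
top_browsers = sorted(
    aggregate(VISITS, "browser").items(),
    key=lambda item: item[1][1],
    reverse=True,
)
```

Note that the filtered roll-up contains no Canada entry at all, which is the behavior of a plain filter; a reporting layer would have to fill in the zero row shown on the slide.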

OLAP – Runtime Aggregation vs. Pre-aggregation


      Aggregate at runtime
            Most flexible
            Fast – scatter gather
            Space efficient
      But
            I/O, CPU intensive
            Slow for larger data
            Low throughput

      Pre-aggregate
            Fast
            Efficient – O(1)
            High throughput
      But
            More effort to process (latency)
            Combinatorial explosion (space)
            No flexibility
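
The pre-aggregation trade-off can be made concrete with a small sketch. It assumes a cube stored as a dictionary keyed by (dimension names, dimension values); `build_cube` and the three chosen dimensions are hypothetical and only illustrate the idea. Every subset of the dimensions is materialized, so n dimensions produce 2**n groupings (the combinatorial explosion), while each query becomes a single dictionary lookup.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical choice of dimensions; values taken from the sample visits.
DIMENSIONS = ("country", "os", "browser")
EVENTS = [
    {"country": "USA", "os": "Windows", "browser": "FF", "sale": 0.0},
    {"country": "USA", "os": "Windows", "browser": "FF", "sale": 10.0},
    {"country": "USA", "os": "OSX", "browser": "Chrome", "sale": 25.0},
    {"country": "Canada", "os": "Linux", "browser": "Chrome", "sale": 0.0},
    {"country": "USA", "os": "OSX", "browser": "Safari", "sale": 15.0},
]


def build_cube(events, dimensions):
    """Pre-aggregate (visits, sales) for every subset of the dimensions."""
    cube = defaultdict(lambda: [0, 0.0])
    for event in events:
        for size in range(len(dimensions) + 1):
            for dims in combinations(dimensions, size):
                key = (dims, tuple(event[d] for d in dims))
                cell = cube[key]
                cell[0] += 1
                cell[1] += event["sale"]
    return {key: (cell[0], cell[1]) for key, cell in cube.items()}


cube = build_cube(EVENTS, DIMENSIONS)

# Queries are now constant-time lookups instead of scans over raw events.
usa_visits, usa_sales = cube[(("country",), ("USA",))]
groupings = {dims for dims, _ in cube}  # 2**3 = 8 distinct group-bys
```

Each incoming event touches 2**n cells, which is the extra processing effort listed above; a query outside the materialized groupings cannot be answered at all, which is the loss of flexibility.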




SaasBase Map




SaasBase Domain Model Mapping




SaasBase - Domain Model Mapping




SaasBase - Ingestion, Processing, Indexing, Querying




SaasBase - Ingestion, Processing, Indexing, Querying




Ingestion




Ingestion(ETL) throughput vs. latency


      Historical data (large batches)
            Optimize for throughput
      Increments (latest data, smaller)
            Optimize for latency




Processing




Processing



      Processing involves reading the input (files, tables, events),
       pre-aggregating it (reducing cardinality), and generating cubes that
       can be queried in real time


      “Super Processor” code running in Storm, Map-Reduce, HBase
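
A minimal sketch of the processing step, under stated assumptions: events carry an ISO timestamp and a country, cardinality is reduced by truncating the timestamp to a day, and partial cubes are merged by adding counts. The function names (`to_cell_key`, `process`, `merge`) and the timestamps are hypothetical. Because the merge is a plain addition, the same processing function gives the same cube whether it runs over one large batch or over small increments, which is what would let one piece of code run in different engines.

```python
from collections import Counter

# Hypothetical raw events; the timestamps are made up for illustration.
EVENTS = [
    {"timestamp": "2012-05-21T09:00:00", "country": "USA"},
    {"timestamp": "2012-05-21T17:30:00", "country": "USA"},
    {"timestamp": "2012-05-22T08:15:00", "country": "USA"},
    {"timestamp": "2012-05-22T11:45:00", "country": "Canada"},
    {"timestamp": "2012-05-23T20:05:00", "country": "USA"},
]


def to_cell_key(event):
    """Reduce cardinality: keep the day and the chosen dimension only."""
    day = event["timestamp"][:10]  # '2012-05-21T09:00:00' -> '2012-05-21'
    return (day, event["country"])


def process(events):
    """Pre-aggregate one batch of raw events into a partial cube."""
    visits = Counter()
    for event in events:
        visits[to_cell_key(event)] += 1
    return visits


def merge(cube, partial):
    """Fold a partial cube into an existing one (Counter.update adds counts)."""
    cube.update(partial)
    return cube


# One large batch (throughput-oriented) ...
batch_cube = process(EVENTS)

# ... or two small increments (latency-oriented) give the same cube.
incremental_cube = merge(merge(Counter(), process(EVENTS[:3])), process(EVENTS[3:]))
```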




Processing for OLAP semantics

            GROUP BY (process, query)
            COUNT, SUM, AVG, etc. (process, query)
            SORT (process, query)
            HAVING (mostly query, can define pre-process constraints)
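
One way to read the "(process, query)" split, sketched under assumptions: metrics such as AVG are not stored as a final number at processing time, because averages of partial results cannot be merged correctly. Instead the processor keeps a mergeable state (sum and count) and the query step derives the value, then applies HAVING-style filters on the aggregated cells. The `AvgMetric` class is a hypothetical illustration, not the SaasBase metric API.

```python
from dataclasses import dataclass


@dataclass
class AvgMetric:
    """Mergeable state for AVG: keep the sum and the count, not the average."""
    total: float = 0.0
    count: int = 0

    def add(self, value):
        self.total += value
        self.count += 1

    def merge(self, other):
        return AvgMetric(self.total + other.total, self.count + other.count)

    def value(self):
        return self.total / self.count if self.count else 0.0


# Processing time: two partial aggregates for USA sales, built separately.
usa_part1 = AvgMetric()
for sale in (0.0, 10.0, 25.0):
    usa_part1.add(sale)
usa_part2 = AvgMetric()
usa_part2.add(15.0)

canada = AvgMetric()
canada.add(0.0)

# Query time: merge the partial states, then derive the average.
usa = usa_part1.merge(usa_part2)
naive_avg_of_avgs = (usa_part1.value() + usa_part2.value()) / 2  # incorrect

# Query time: a HAVING-style filter on aggregated cells (count >= 2).
cells = {"USA": usa, "Canada": canada}
having_result = {k: m.value() for k, m in cells.items() if m.count >= 2}
```

Merging the states gives the true average over all four sales, while averaging the two partial averages gives a different, wrong number because the partitions differ in size.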




SaasBase vs. SQL Views Comparison




Query Engine

      Always reads indexed, compact data
      Query parsing
      Scan strategy
            Single vs. multiple scans
            Start/stop rows (prefixes, index positions, etc.)
            Index selection (volatile indexes with incremental processing)
      Deserialization
      Post-aggregation, sorting, fuzzy-sorting etc.
      Paging
      Custom dimension/metric class loading
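
The scan strategy and paging steps can be sketched with an in-memory stand-in for a sorted key-value store. Everything here is an assumption made for illustration: the row-key layout "<cube>|<date>|<country>", the sample counts, and the `prefix_scan` helper. The point is only that, with keys kept in sorted order, a prefix turns into a start row and a stop row, so a query reads one contiguous range instead of the whole table.

```python
import bisect

# Hypothetical pre-aggregated rows, sorted by row key as in a sorted store.
ROWS = sorted([
    ("visits_by_country|2012-05-21|USA", 2),
    ("visits_by_country|2012-05-22|Canada", 1),
    ("visits_by_country|2012-05-22|USA", 1),
    ("visits_by_country|2012-05-23|USA", 1),
    ("visits_by_browser|2012-05-21|FF", 2),
])
KEYS = [key for key, _ in ROWS]


def prefix_scan(prefix, offset=0, limit=None):
    """Return the rows whose key starts with prefix, with optional paging."""
    # Start row: first key >= prefix.
    start = bisect.bisect_left(KEYS, prefix)
    # Stop row: first key past the prefix range. Appending a very high
    # character works here because the sample keys are plain ASCII.
    stop = bisect.bisect_left(KEYS, prefix + "\uffff")
    rows = ROWS[start:stop]
    end = None if limit is None else offset + limit
    return rows[offset:end]


# Single scan: all countries for one day.
one_day = prefix_scan("visits_by_country|2012-05-22|")

# Paging: second and third rows of the whole cube.
page = prefix_scan("visits_by_country|", offset=1, limit=2)
```

A real engine would push the offset and limit into the scan itself rather than slicing a materialized list; the slice is kept here to keep the sketch short.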




Adobe Business Catalyst

      Online business presence: e-commerce, marketing, web analytics etc.
      Use case: Web Analytics (visitors, channels, content,
       e-commerce, campaigns, etc.)




BC - Workflow




Adobe Business Catalyst - Stats

      3 active datacenters
      Raw data ~6TB (from ~1TB 18 months ago)
      Visits table: ~1TB each (compressed)
      OLAP cubes (stats): 49GB – 64GB (compressed)


      ~30 minutes latency (from actual pageview/sale to chart in UI)
      10s – 100s of milliseconds latency for queries
      ~3000/s max concurrent OLAP queries (actual traffic is much lower)




Adobe Pass for TV Everywhere

      Authentication & Authorization
      Single sign-on to Programmer content (e.g. Turner, NBC, Hulu, MTV, etc.)
       with Cable operator credentials (e.g. Comcast, Dish, etc.)




Adobe Pass – Use Case

      Analytics use case: Operational metrics (users, devices, latencies, etc.)
      Real-time ingestion in HBase
      High Frequency Map Reduce jobs (every 2 minutes)




Adobe Pass - Stats (London Olympics 2012)

      67M streams ~ 5.3M hours
      1.5M concurrent streams
      > 7M unique users


      1 Technical & Engineering Emmy Award ;)




Adobe Primetime – Real-time Video Analytics

      Unified video platform (acquisition, transcoding, broadcast, ads,
       analytics)




Adobe Primetime – Use Case


      Use Cases:
            Audience metrics – latency of minutes is acceptable
            Ads metrics – latency of seconds to minutes is acceptable
            Streaming QoS metrics – latency of seconds is required


      Requirements:
            Massive throughput (millions of streams, multiple heartbeats
             every 10 seconds)
            Low latency (end-to-end)


Conclusions

      OLAP semantics on a simple data model
            Data as first class citizen
            Domain Specific “Language” for Dimensions, Metrics, Aggregations
      Framework for vertical analytics systems
      Tunable performance, resource allocation




Thank you!
                                                            Cosmin Lehene @clehene

                                                            http://hstack.org



Related

  http://www.hbasecon.com/sessions/low-latency-olap-with-hbase/
  http://www.slideshare.net/clehene/low-latency-olap-with-hbase-hbasecon-2012




Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

Real-time OLAP Big Data Use Cases

  • 1. Real-time “OLAP” for Big Data (+ use cases). Cosmin Lehene | Adobe. #bigdataro - 30 January 2013. © 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
  • 2. What we needed … and built
      - OLAP Semantics
      - Low Latency Ingestion
      - High Throughput
      - Real-time Query API
  • 3. “Physical” Building Blocks
  • 4. Logical Building Blocks
      - Dimensions, Metrics
      - Aggregations
      - Roll-up, drill-down, slicing and dicing, sorting
  • 5. OLAP 101 – Queries example

        Date        Country  City     OS       Browser  Sale
        2012-05-21  USA      NY       Windows  FF        0.0
        2012-05-21  USA      NY       Windows  FF       10.0
        2012-05-22  USA      SF       OSX      Chrome   25.0
        2012-05-22  Canada   Ontario  Linux    Chrome    0.0
        2012-05-23  USA      Chicago  OSX      Safari   15.0

      Summary: 5 visits over 3 days; 2 countries (USA: 4, Canada: 1); 4 cities (NY: 2, SF: 1); 3 OS-es (Win: 2, OSX: 2); 3 browsers (FF: 2, Chrome: 2); sales total 50.0 (3 sales).
  • 6. OLAP 101 – Queries example
      - Rolling up to country level:
          SELECT COUNT(visits), SUM(sales) GROUP BY country

          Country  visits  sales
          USA      4       $50
          Canada   1       0
      - “Slice” by browser:
          SELECT COUNT(visits), SUM(sales) GROUP BY country HAVING browser = “FF”

          Country  visits  sales
          USA      2       $10
          Canada   0       0
      - Top browsers by sales:
          SELECT SUM(sales), COUNT(visits) GROUP BY browser ORDER BY sales

          Browser  sales  visits
          Chrome   $25    2
          Safari   $15    1
          FF       $10    2
  • 7. OLAP – Runtime Aggregation vs. Pre-aggregation
      - Aggregate at runtime
          - Most flexible
          - Fast – scatter gather
          - Space efficient
          - But: I/O, CPU intensive; slow for larger data; low throughput
      - Pre-aggregate
          - Fast
          - Efficient – O(1)
          - High throughput
          - But: more effort to process (latency); combinatorial explosion (space); no flexibility
  • 8. SaasBase Map
  • 9. SaasBase Domain Model Mapping
  • 10. SaasBase - Domain Model Mapping
  • 11. SaasBase - Ingestion, Processing, Indexing, Querying
  • 12. SaasBase - Ingestion, Processing, Indexing, Querying
  • 13. Ingestion
  • 14. Ingestion (ETL) throughput vs. latency
      - Historical data (large batches): optimize for throughput
      - Increments (latest data, smaller): optimize for latency
  • 15. Processing
  • 16. Processing
      - Processing involves reading the input (files, tables, events), pre-aggregating it (reducing cardinality) and generating cubes that can be queried in real-time
      - “Super Processor” code running in Storm, Map-Reduce, HBase
  • 17. Processing for OLAP semantics
      - GROUP BY (process, query)
      - COUNT, SUM, AVG, etc. (process, query)
      - SORT (process, query)
      - HAVING (mostly query, can define pre-process constraints)
  • 18. SaasBase vs. SQL Views Comparison
  • 19. Query Engine
      - Always reads indexed, compact data
      - Query parsing
      - Scan strategy
          - Single vs. multiple scans
          - Start/stop rows (prefixes, index positions, etc.)
          - Index selection (volatile indexes with incremental processing)
      - Deserialization
      - Post-aggregation, sorting, fuzzy-sorting etc.
      - Paging
      - Custom dimension/metric class loading
  • 20. Adobe Business Catalyst
      - Online business presence: e-commerce, marketing, web analytics etc.
      - Use case: Web Analytics (visitors, channels, content, e-commerce, campaigns, etc.)
  • 21. BC - Workflow
  • 22. Adobe Business Catalyst - Stats
      - 3 active datacenters
      - Raw data ~6TB (from ~1TB 18 months ago)
      - Visits table: ~1TB each (compressed)
      - OLAP cubes (stats): 49GB – 64GB (compressed)
      - ~30 minutes latency (from actual pageview/sale to chart in UI)
      - 10s – 100s of milliseconds latency for queries
      - ~3000/s max concurrent OLAP queries (actual traffic is much lower)
  • 23. Adobe Pass for TV Everywhere
      - Authentication & Authorization
      - Single sign-on to Programmer content (e.g. Turner, NBC, Hulu, MTV, etc.) with Cable operator credentials (e.g. Comcast, Dish, etc.)
  • 24. Adobe Pass – Use Case
      - Analytics use case: Operational metrics (users, devices, latencies, etc.)
      - Real-time ingestion in HBase
      - High Frequency Map Reduce jobs (every 2 minutes)
  • 25. Adobe Pass - Stats (London Olympics 2012)
      - 67M streams ~ 5.3M hours
      - 1.5M concurrent streams
      - > 7M unique users
      - 1 Technical & Engineering Emmy Award ;)
  • 26. Adobe Primetime – Real-time Video Analytics
      - Unified video platform (acquisition, transcoding, broadcast, ads, analytics)
  • 27. Adobe Primetime – Use Case
      - Use Cases:
          - Audience metrics – minutes latency ok
          - Ads metrics – seconds to minutes ok
          - Streaming QoS metrics – seconds must
      - Requirements:
          - Massive throughput (millions of streams, multiple heartbeats every 10 seconds)
          - Low latency (end-to-end)
  • 29. Conclusions
      - OLAP semantics on a simple data model
      - Data as first class citizen
      - Domain Specific “Language” for Dimensions, Metrics, Aggregations
      - Framework for vertical analytics systems
      - Tunable performance, resource allocation
  • 30. Thank you! Cosmin Lehene, @clehene, http://hstack.org
  • 31. Related
      - http://www.hbasecon.com/sessions/low-latency-olap-with-hbase/
      - http://www.slideshare.net/clehene/low-latency-olap-with-hbase-hbasecon-2012
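
The three queries on slide 6 (roll-up, slice, top-N) can be reproduced over the five sample visits from slide 5. What follows is only a minimal Python sketch of runtime aggregation, not SaasBase code: the `VISITS` list, the `aggregate` helper and its `where` parameter are names invented for this illustration.

```python
from collections import defaultdict

COLUMNS = ("date", "country", "city", "os", "browser", "sale")

# The five sample visits from slide 5.
VISITS = [dict(zip(COLUMNS, row)) for row in [
    ("2012-05-21", "USA", "NY", "Windows", "FF", 0.0),
    ("2012-05-21", "USA", "NY", "Windows", "FF", 10.0),
    ("2012-05-22", "USA", "SF", "OSX", "Chrome", 25.0),
    ("2012-05-22", "Canada", "Ontario", "Linux", "Chrome", 0.0),
    ("2012-05-23", "USA", "Chicago", "OSX", "Safari", 15.0),
]]


def aggregate(rows, group_by, where=None):
    """Hypothetical helper: return {group value: (visit count, sales sum)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        # Optional filter, playing the role of the slide's HAVING clause.
        if where is not None and not where(row):
            continue
        cell = totals[row[group_by]]
        cell[0] += 1
        cell[1] += row["sale"]
    return {key: (visits, sales) for key, (visits, sales) in totals.items()}


# Roll-up to country level.
by_country = aggregate(VISITS, "country")

# "Slice" by browser: only FF visits, grouped by country.
ff_by_country = aggregate(VISITS, "country",
                          where=lambda r: r["browser"] == "FF")

# Top browsers by sales, highest first.
top_browsers = sorted(aggregate(VISITS, "browser").items(),
                      key=lambda item: item[1][1], reverse=True)
```

One difference from the slide: in this sketch a group with no matching rows (Canada in the FF slice) is simply absent from the result rather than reported as 0.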

Editor's Notes

  1. How many HBase users?
  2. Data as first class citizen
  3. Add the real building blocks: HDFS, MapReduce, HBase, Storm
  4. Add the real building blocks: HDFS, MapReduce, HBase, Storm
  5. Check contrast on projector
  6. Two approaches: RDBMS / OLAP
  7. Dimensions – read/transform/serialize/deserialize data attributes
     Metrics – read/transform/aggregate/serialize
     Constraints: ingestion filtering
     Report: instrument dimension groups + metrics with aggregations, sorting
  8. QUERY ENGINE -> INDEX (always realtime)
     What’s the difference between this and HIVE/PIG/Impala?
  9. Process = aggregate, generate indexes (natural)
     Query = uses indexes, can do extra aggregation
  10. LEFT: report definition, NOT a QUERY
      LIKE A VIEW - CREATED - THEN QUERIED
  11. >100K/sec/thread
      REALTIME
  12. ~12 hours to reprocess everything from scratch
  13. 2 datacenters (active-failover) on US West and East coasts (2NN + 19DN, 0.5PB total, 456 cores, 1.1TB RAM)
  14. Meeting Notes (1/29/13 18:09): Olympics. Same SaasBase codebase running in Storm instead of Hadoop. Simpler aggregations, but strict latency requirements.
  15. Meeting Notes (1/29/13 18:12): draw line between player and chart
  16. Data analysts work with familiar concepts. Meeting Notes (1/29/13 18:12): Future:
  17. …….
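
The split in note 9 (process = aggregate and generate indexes; query = use the indexes) and the pre-aggregation trade-off from slide 7 can be illustrated with a small sketch. It is written in Python under stated assumptions and does not show how SaasBase lays out its cubes: `DIMENSIONS`, `ROWS`, `process` and `query` are hypothetical names, and an in-memory dict stands in for the indexed store.

```python
from itertools import combinations

# Assumed for illustration: two dimensions and two metrics (visits, sales).
DIMENSIONS = ("country", "browser")

ROWS = [
    {"country": "USA", "browser": "FF", "sale": 0.0},
    {"country": "USA", "browser": "FF", "sale": 10.0},
    {"country": "USA", "browser": "Chrome", "sale": 25.0},
    {"country": "Canada", "browser": "Chrome", "sale": 0.0},
    {"country": "USA", "browser": "Safari", "sale": 15.0},
]


def process(rows):
    """Pre-aggregate: one cell per combination of dimension values.

    Every row updates a cell for each subset of DIMENSIONS, which is
    where the combinatorial explosion in space comes from.
    """
    cube = {}
    for row in rows:
        for size in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, size):
                key = tuple((d, row[d]) for d in dims)
                visits, sales = cube.get(key, (0, 0.0))
                cube[key] = (visits + 1, sales + row["sale"])
    return cube


def query(cube, **filters):
    """Answer a query with a single lookup; no rows are scanned."""
    key = tuple((d, filters[d]) for d in DIMENSIONS if d in filters)
    return cube.get(key, (0, 0.0))


cube = process(ROWS)
usa = query(cube, country="USA")                   # (visits, sales) for USA
usa_ff = query(cube, country="USA", browser="FF")  # USA visits using FF
```

Each query is a constant-time lookup, but five input rows already produce ten cells, and anything not materialized by `process` cannot be answered without reprocessing. That is the "no flexibility" cost listed on slide 7.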