Low Latency “OLAP” with HBase - HBaseCon 2012
Cosmin Lehene
1.
Low Latency “OLAP” with HBase
Cosmin Lehene | Adobe
© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
2.
What we needed … and built
OLAP Semantics
Low Latency Ingestion
High Throughput
Real-time Query API
Not hardcoded to web analytics or x-, y-, z- analytics, but extensible
3.
Building Blocks
Dimensions, Metrics
Aggregations
Roll-up, drill-down, slicing and dicing, sorting
4.
OLAP 101 – Queries example
Date        Country  City     OS       Browser  Sale
2012-05-21  USA      NY       Windows  FF       0.0
2012-05-21  USA      NY       Windows  FF       10.0
2012-05-22  USA      SF       OSX      Chrome   25.0
2012-05-22  Canada   Ontario  Linux    Chrome   0.0
2012-05-23  USA      Chicago  OSX      Safari   15.0
Totals: 5 visits over 3 days; 2 countries (USA: 4, Canada: 1); 4 cities (NY: 2, SF: 1); 3 OS-es (Win: 2, OSX: 2); 3 browsers (FF: 2, Chrome: 2); 3 sales totaling 50.0
5.
OLAP 101 – Queries example
Rolling up to country level:
  SELECT COUNT(visits), SUM(sales) GROUP BY country
  Country  visits  sales
  USA      4       $50
  Canada   1       0
“Slicing” by browser:
  SELECT COUNT(visits), SUM(sales) GROUP BY country HAVING browser = “FF”
  Country  visits  sales
  USA      2       $10
  Canada   0       0
Top browsers by sales:
  SELECT SUM(sales), COUNT(visits) GROUP BY browser ORDER BY sales
  Browser  sales  visits
  Chrome   $25    2
  Safari   $15    1
  FF       $10    2
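The three queries above can be reproduced over the five sample rows with a few lines of Python. This is only an illustrative sketch: the `aggregate` helper and the column names are assumptions made for the example, not part of SaasBase.

```python
from collections import defaultdict

# Sample rows from the slide: (date, country, city, os, browser, sale).
COLUMNS = ("date", "country", "city", "os", "browser", "sale")
ROWS = [
    ("2012-05-21", "USA", "NY", "Windows", "FF", 0.0),
    ("2012-05-21", "USA", "NY", "Windows", "FF", 10.0),
    ("2012-05-22", "USA", "SF", "OSX", "Chrome", 25.0),
    ("2012-05-22", "Canada", "Ontario", "Linux", "Chrome", 0.0),
    ("2012-05-23", "USA", "Chicago", "OSX", "Safari", 15.0),
]


def aggregate(rows, group_by, where=None):
    """Return {group value: [visits, sales]} (hypothetical helper)."""
    idx = COLUMNS.index(group_by)
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        if where is not None and not where(dict(zip(COLUMNS, row))):
            continue  # row filtered out by the slice
        totals[row[idx]][0] += 1       # COUNT(visits)
        totals[row[idx]][1] += row[5]  # SUM(sales)
    return dict(totals)


# Roll-up to country level.
rollup = aggregate(ROWS, "country")
# Slice: only visits made with the FF browser.
sliced = aggregate(ROWS, "country", where=lambda r: r["browser"] == "FF")
# Top browsers by sales, highest first.
top_browsers = sorted(aggregate(ROWS, "browser").items(),
                      key=lambda kv: kv[1][1], reverse=True)
```

Unlike the slide's table, the sliced result here simply omits Canada instead of listing it with zeros, because no Canadian row passes the filter.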
6.
OLAP – Runtime Aggregation vs. Pre-aggregation
Aggregate at runtime
  Most flexible
  Fast – scatter gather
  Space efficient
  But:
    I/O, CPU intensive
    Slow for larger data
    Low throughput
Pre-aggregate
  Fast
  Efficient – O(1)
  High throughput
  But:
    More effort to process (latency)
    Combinatorial explosion (space)
    No flexibility
7.
Pre-aggregation
Data needs to be summarized
  Can’t visualize 1B data points (no, not even with Retina display)
  Difficult to comprehend correlations among more than 3 dimensions
Not all dimension groups are relevant
  Index on an as-needed basis (view selection problem)
Runtime aggregation == TeraSort for every query?
  Pre-aggregate to reduce cardinality
8.
SaasBase
We tune the pre-aggregation level vs. runtime post-aggregation
  (ingestion speed + space) vs. (query speed)
Think materialized views from RDBMS
9.
SaasBase Domain Model Mapping
10.
SaasBase - Domain Model Mapping
11.
SaasBase - Ingestion, Processing, Indexing, Querying
12.
SaasBase - Ingestion, Processing, Indexing, Querying
13.
Ingestion
14.
Ingestion throughput vs. latency
Historical data (large batches)
  Optimize for throughput
Increments (latest data, smaller)
  Optimize for latency
15.
Large, granular input strategies
Slow listing in HDFS
  Archive processed files
Filtering input
  FileDateFilter (log name patterns: log-YYYY-MM-dd-HH.log)
  TableInputFormat start/stop row
  File Index in HBase (track processed/new files)
Map tasks overhead - stitching input splits
  400K files => 400K map tasks => overhead, slow reduce copy
  CombineFileInputFormat – 2GB-splits => 500 splits for 1TB
  FixedMappersTableInputFormat (e.g. 5-region splits)
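Two of the ideas above, filtering input files by the date in their name and stitching many small files into a few large splits, can be sketched in plain Python. This is not the FileDateFilter or CombineFileInputFormat code; the function names, the greedy packing, and the uniform 2.5 MB file size (decimal units) are assumptions chosen so the numbers match the slide.

```python
import re
from datetime import datetime

# Matches the log name pattern from the slide: log-YYYY-MM-dd-HH.log
LOG_NAME = re.compile(r"^log-(\d{4}-\d{2}-\d{2}-\d{2})\.log$")


def in_date_range(name, start, end):
    """True if the hour encoded in the file name falls in [start, end)."""
    match = LOG_NAME.match(name)
    if not match:
        return False  # not a log file we know how to date
    stamp = datetime.strptime(match.group(1), "%Y-%m-%d-%H")
    return start <= stamp < end


def combine_splits(files, max_split_bytes):
    """Greedily pack (name, size) pairs, in order, into splits no larger
    than max_split_bytes (a single oversized file gets its own split)."""
    splits, current, current_size = [], [], 0
    for name, size in files:
        if current and current_size + size > max_split_bytes:
            splits.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        splits.append(current)
    return splits


# Assumed workload: 400K files of 2.5 MB each, i.e. 1 TB in total.
files = [(f"file-{i}", 2_500_000) for i in range(400_000)]
splits = combine_splits(files, 2_000_000_000)  # 2 GB per split
```

Under these assumptions each split holds 800 files, so the job runs 500 map tasks instead of 400K.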
16.
Ingestion – Bulk Import
HFileOutputFormat (HFOF)
  100s X faster than HBase API
  No need to recover from failed jobs
  No unnecessary load on machines
* No shuffle - global reduce order required!
  e.g. first reduce key needs to be in the first region, last one in the last region
  Watch for uneven partitions
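The global-order requirement means each row key must go to the reducer that owns its region. A minimal sketch of that routing, assuming string row keys and made-up split points (both hypothetical, and not the HBase partitioner itself), also shows how to spot uneven partitions before running the job:

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical region start keys; the first region's empty start key is implied.
SPLIT_POINTS = ["2011-01-01", "2012-01-01"]


def region_for(row_key, split_points=SPLIT_POINTS):
    """Index of the region (and reducer) whose [start, end) range holds row_key.

    A key equal to a split point belongs to the region that starts there,
    which is why bisect_right is used instead of bisect_left.
    """
    return bisect_right(split_points, row_key)


def partition_sizes(row_keys, split_points=SPLIT_POINTS):
    """Count keys per partition to reveal skew."""
    return Counter(region_for(key, split_points) for key in row_keys)


keys = ["2010-12-04/a", "2011-01-01", "2012-05-22/a", "2012-05-22/b", "2012-05-23/c"]
sizes = partition_sizes(keys)
```

Here the last partition receives three of the five keys, the kind of imbalance the slide warns about.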
17.
HFOF – FileSizeDatePartitioner
1 partition (reduce) / day for initial import
Uneven reduce (partitions) due to data growth over time
  Reduce k: 2010-12-04 = 500MB
  Reduce n: 2012-05-22 = 5GB => slow and will result in a 5GB region
Balance reduce buckets based on input file sizes and the reduce key
Generate sub-partitions based on predefined size (e.g. 1GB)
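The balancing step can be illustrated by deriving, from each day's input size, how many reducers that day should get. This is a sketch of the idea only, not the FileSizeDatePartitioner code; the function name, the binary size units, and the returned layout are assumptions.

```python
import math

MB = 1 << 20
GB = 1 << 30


def plan_sub_partitions(day_sizes, target_bytes=GB):
    """Assign each day a contiguous block of reducers sized by its input.

    Returns ({day: (first_reducer, reducer_count)}, total_reducers).
    Days are processed in sorted order so reducer indexes follow key order,
    which keeps the global ordering a bulk import needs.
    """
    plan = {}
    next_reducer = 0
    for day in sorted(day_sizes):
        count = max(1, math.ceil(day_sizes[day] / target_bytes))
        plan[day] = (next_reducer, count)
        next_reducer += count
    return plan, next_reducer


# The two days from the slide: 500MB early on, 5GB after the data has grown.
plan, total_reducers = plan_sub_partitions({
    "2010-12-04": 500 * MB,
    "2012-05-22": 5 * GB,
})
```

With a 1GB target, the small day keeps a single reducer while the 5GB day is spread over five. How keys are divided among a day's sub-partitions is left out here.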
18.
Processing
19.
Processing
Processing involves reading the input (files, tables, events), pre-aggregating it (reducing cardinality) and generating tables that can be queried in real time
1 year: 1B events => 100B data points indexed
Query => scan 365 data points (e.g. daily page views)
Processing could be either MR or real-time (e.g. Storm)
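As a toy illustration of why pre-aggregation makes queries cheap, the sketch below collapses raw events into one counter per day, so a yearly page-view query touches at most 365 values regardless of the event count. The event shape and function names are assumptions for the example.

```python
from collections import Counter


def pre_aggregate(events):
    """Collapse raw (timestamp, page) events into one counter per day."""
    daily = Counter()
    for timestamp, _page in events:
        daily[timestamp[:10]] += 1  # the YYYY-MM-DD prefix is the group key
    return daily


def page_views(daily, start_day, end_day):
    """Sum the pre-aggregated counters for days in [start_day, end_day]."""
    return sum(count for day, count in daily.items()
               if start_day <= day <= end_day)


events = [
    ("2012-05-21T10:00", "/home"),
    ("2012-05-21T10:05", "/buy"),
    ("2012-05-22T09:30", "/home"),
    ("2012-05-23T18:45", "/home"),
]
daily = pre_aggregate(events)
views = page_views(daily, "2012-05-21", "2012-05-22")
```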
20.
Processing for OLAP semantics
GROUP BY (process, query)
COUNT, SUM, AVG, etc. (process, query)
SORT (process, query)
HAVING (mostly query, can define pre-process constraints)
21.
SaasBase vs. SQL Views Comparison
22.
reports.json entities definition
23.
Processing Performance
read, map, partition, combine, copy, sort, reduce, write
Read:
  Scan.setCaching() (I/O ~ buffer)
  Scan.setBatch() (avoid timeouts for abnormal input, e.g. 1M hits/visit)
  Even region distribution across cluster (distributes CPU, I/O)
Map:
  No unnecessary transformations: Bytes.toString(bytes) + Bytes.toBytes(string) (CPU)
  Avoid GC: new X() (CPU, Memory)
  Avoid system calls (context switching)
  Stripping unnecessary data (I/O)
24.
Processing Performance
Hot (in memory) vs. Cold (on disk, on network) data
  Minimize I/O from disk/network
Single shot MR job: SuperProcessor
  Emit all groups from one map() call
Incremental processing
  Data format
  YYYY-MM-DD prefixed rowkey (HH:mm for more granularity)
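The single-shot idea, reading each input record once and emitting a pair for every dimension group, might look roughly like this. It is a sketch, not the SuperProcessor implementation; the group list, the event fields, and the use of the group tuple to keep reports apart are all assumptions.

```python
def map_event(event, groups):
    """Yield one ((group, rowkey), (visits, sales)) pair per dimension group.

    The rowkey is prefixed with the YYYY-MM-DD date, as on the slide, and
    the group tuple stands in for the report the row belongs to.
    """
    for group in groups:
        row_key = "/".join([event["date"]] + [event[dim] for dim in group])
        yield (group, row_key), (1, event["sale"])


def reduce_pairs(pairs):
    """Sum visits and sales per key, as a combiner or reducer would."""
    totals = {}
    for key, (visits, sales) in pairs:
        old_visits, old_sales = totals.get(key, (0, 0.0))
        totals[key] = (old_visits + visits, old_sales + sales)
    return totals


GROUPS = [("country",), ("country", "city"), ("browser",)]
EVENTS = [
    {"date": "2012-05-21", "country": "USA", "city": "NY", "browser": "FF", "sale": 0.0},
    {"date": "2012-05-21", "country": "USA", "city": "NY", "browser": "FF", "sale": 10.0},
    {"date": "2012-05-22", "country": "USA", "city": "SF", "browser": "Chrome", "sale": 25.0},
]
totals = reduce_pairs(pair for event in EVENTS for pair in map_event(event, GROUPS))
```

Each event is read once but feeds all three reports, which is the I/O saving the slide is after.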
25.
Indexing
26.
HBase natural order: hierarchical representation
27.
Indexing - Why
Example: top 10 cities
  ~50K [country, city] combinations per day
  Top 10 cities for 1 year => 365 (days) X 50K ~= 15M data points scanned
  If you add gender => 30M
  If you add Device, OS, Browser …
Might compress well, but think about the environment
  How much energy would you spend for just top 10 cities?
* Image from: http://my.neutralexistence.com/images/Green-Earth.jpg
28.
Indexing with HBase
Lexicographic sorting: “10” < “2”
GROUP BY year, month, country, city ORDER BY visits DESC LIMIT 10
  2012/05/USA/0000000000/
  2012/05/USA/4294961296/San Francisco = 1000 visits*
  2012/05/USA/4294961396/New York = 900 visits*
  . . .
  2012/05/USA/9999999999/
scan 't', {STARTROW => '2012/05/USA/', LIMIT => 10}
* Padding numbers for lexicographic sorting: 1000 -> Long.MAX_VALUE – 1000 = 4294961296
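The trick of storing an inverted, zero-padded metric in the row key so that a plain forward scan returns the top entries first can be sketched as follows. The 10-digit ceiling `MAX_COUNT` is an assumption made for this example (it is not `Long.MAX_VALUE`), so the generated keys differ from the numbers on the slide; the visit counts other than 1000 and 900 are made up.

```python
from bisect import bisect_left

MAX_COUNT = 9_999_999_999  # assumed 10-digit ceiling for this sketch


def index_key(prefix, visits, city):
    """Build a key whose lexicographic order is descending by visits."""
    inverted = MAX_COUNT - visits
    return f"{prefix}{inverted:010d}/{city}"


def scan(sorted_keys, start_row, limit):
    """Emulate a prefix scan over sorted row keys with a row limit."""
    i = bisect_left(sorted_keys, start_row)
    rows = []
    while (i < len(sorted_keys) and len(rows) < limit
           and sorted_keys[i].startswith(start_row)):
        rows.append(sorted_keys[i])
        i += 1
    return rows


table = sorted([
    index_key("2012/05/USA/", 900, "New York"),
    index_key("2012/05/USA/", 50, "Chicago"),
    index_key("2012/05/USA/", 1000, "San Francisco"),
    index_key("2012/05/Canada/", 700, "Ontario"),
])
top2 = scan(table, "2012/05/USA/", limit=2)
```

Because strings compare character by character, `"10" < "2"` holds in Python as well, which is why the fixed-width padding is needed.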
29.
Query Engine
Always reads indexed, compact data
Query parsing
Scan strategy
  Single vs. multiple scans
  Start/stop rows (prefixes, index positions, etc.)
  Index selection (volatile indexes with incremental processing)
Deserialization
Post-aggregation, sorting, fuzzy-sorting etc.
Paging
Custom dimension/metric class loading
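For prefix-based start/stop rows, one common way to derive the stop row is to increment the last byte of the prefix. The helper below is a generic sketch of that technique with a hypothetical name, not code from the query engine described here.

```python
def prefix_stop_row(prefix: bytes) -> bytes:
    """Return the smallest key greater than every key starting with prefix.

    Trailing 0xFF bytes cannot be incremented, so they are dropped first.
    An empty result means there is no upper bound: scan to the end.
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return b""
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


start_row = b"2012/05/USA/"
stop_row = prefix_stop_row(start_row)
```

A scan over [start_row, stop_row) then covers exactly the keys that begin with the prefix.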
30.
Conclusions
OLAP semantics on a simple data model
Data as first class citizen
Domain Specific “Language” for Dimensions, Metrics, Aggregations
Tunable performance, resource allocation
Framework for vertical analytics systems
31.
Thank you!
Cosmin Lehene
@clehene
http://hstack.org
Credits: Andrei Dragomir, Adrian Muraru, Andrei Dulvac, Raluca Podiuc, Tudor Scurtu, Bogdan Dragu, Bogdan Drutu
33.
OLAP 101 - Rollup
Country  Visits  Sale
USA      4       $50
Canada   1       $0
Rollup: SELECT COUNT(visits), SUM(sales) GROUP BY country
34.
OLAP 101 - Slicing
Date        Country  City     OS       Browser  Sale
2012-03-02  USA      NY       Windows  FF       0.0
2012-03-02  USA      NY       Windows  FF       10.0
2012-03-03  USA      SF       OSX      Chrome   25.0
2012-03-03  Canada   Ontario  Linux    Chrome   0.0
2012-03-04  USA      Chicago  OSX      Safari   15.0
Totals: 5 visits over 3 days; 2 countries (USA: 4, Canada: 1); 4 cities (NY: 2, SF: 1); 3 OS-es (Win: 2, OSX: 2); 3 browsers (FF: 2, Chrome: 2); 3 sales totaling 50.0
Filter or Segment or Slice (WHERE or HAVING)
35.
OLAP 101 – Sorting, TOP n
Date  Country  City  OS  Browser  Sale
Browser  Sale
Chrome   $25
Safari   $15
Firefox  $10
SELECT SUM(sales) as total GROUP BY browser ORDER BY total
Editor's notes
How many HBase users?
Data as first class citizen
Check contrast on projector
Just like speed vs. space in general CS/algo
Queries always hit indexes
Dimensions – read/transform/serialize/deserialize data attributes
Metrics – read/transform/aggregate/serialize
Constraints: ingestion filtering
Report: instrument dimension groups + metrics with aggregations, sorting
QUERY ENGINE -> INDEX (always realtime)
Initial import/process and NEW reports (not covered) on historical data
18K regions, upgrade to 0.92
Diagram
HARD TO DIGEST (TOO MUCH INFO, TOO CONDENSED)
Process = aggregate, generate indexes (natural)
Query = uses indexes, can do extra aggregation
LEFT: report definition, NOT a QUERY
LIKE A VIEW - CREATED - THEN QUERIED
Inconsistent
Rowkey = dimensions group -> metrics (right)
GO BACK to EXPLAIN
>100K/sec/thread
REALTIME
Data analysts work with familiar concepts