Hongbin Ma, Luke Han
Kyligence Inc.
Apache Kylin’s
Performance Boost from
Apache HBase
About us
Hongbin Ma | 马洪宾
• PMC member of Apache Kylin
• Technical partner of Kyligence Inc.
• mahongbin@apache.org
Luke Han | 韩卿
• Co-creator & VP of Apache Kylin
• ASF Member
• Co-founder & CEO at Kyligence Inc.
• lukehan@apache.org
Kyligence Inc.
• Kyligence is a leading data intelligence company focused on Big Data technologies and innovation, offering an intelligent platform and products powered by Apache Kylin™ for enterprise-ready business analytics.
Apache Kylin aerial view
[Diagram labels: BI Tools, Web App…; ANSI SQL; Kylin; MapReduce/Spark]
What is Apache Kylin
• Apache Kylin is an open source distributed analytics engine that provides a SQL interface for multi-dimensional analysis on Hadoop
• Works well with extremely large datasets
• Provides REST API, ODBC and JDBC as user interfaces
• Widely adopted by companies such as eBay, JD, Baidu, NetEase, VIP.com, etc.
Apache Kylin Global Adoptions
What is Apache Kylin (continued)
• Apache Kylin pre-calculates OLAP cubes with a horizontally scalable computation framework (MapReduce, Spark, etc.) and stores the cubes in a reliable and scalable data store (HBase, Cassandra, etc.)
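To make the idea of pre-calculation concrete, here is a minimal sketch in plain Python. It is not Kylin code: the fact rows, the dimension names and the helper functions `build_cuboid` and `build_cube` are all hypothetical, and a real build runs as distributed MapReduce or Spark jobs rather than in memory.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (country, seller_id, date, price).
ROWS = [
    ("US", "s1", "2017-01-01", 10.0),
    ("US", "s2", "2017-01-01", 5.0),
    ("CN", "s1", "2017-01-02", 7.0),
    ("CN", "s1", "2017-01-01", 3.0),
]
DIMENSIONS = ("country", "seller_id", "date")


def build_cuboid(rows, dim_indexes):
    """Aggregate SUM(price) grouped by the chosen dimension columns."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in dim_indexes)
        totals[key] += row[-1]
    return dict(totals)


def build_cube(rows, n_dims):
    """Pre-compute one cuboid per subset of dimensions (2**n_dims in total)."""
    cube = {}
    for size in range(n_dims + 1):
        for dim_indexes in combinations(range(n_dims), size):
            cube[dim_indexes] = build_cuboid(rows, dim_indexes)
    return cube


cube = build_cube(ROWS, len(DIMENSIONS))
# A "GROUP BY country" query becomes a lookup in the (country,) cuboid.
by_country = cube[(0,)]
print(by_country)
```

With three dimensions the sketch materializes eight cuboids, so a query only has to read the already aggregated rows of the one cuboid that matches its GROUP BY columns.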
Architecture Design
[Diagram components]
• Clients: 3rd Party App (Web App, Mobile…) via REST API; SQL-Based Tool (BI Tools: Tableau…) via JDBC/ODBC
• Kylin: REST Server, Query Engine, Routing, Metadata, Cube Builder (MapReduce, Spark, etc.)
• Abstraction layers: DataSource Abstraction, Engine Abstraction, Storage Abstraction
• Hadoop / Hive: Star Schema Data
• OLAP Cubes (HBase): Key Value Data
• SQL queries: Low Latency, Seconds
• Legend: Online Analysis Data Flow, Offline Data Flow
Notes on the slide:
• Clients/users interact with Kylin via SQL
• The OLAP cube is transparent to users
Cube data explained
[Figure: dimensions, cuboids, and the cuboid lattice]
Cubes stored in HBase
Let’s take a look at cuboid (D1,D3,D5), where the full set of dimensions is (D1,D2,D3,D4,D5).
This cuboid is denoted as “cuboid 00010101”.
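The cuboid name on this slide reads as a bitmask over the dimension list: one bit per dimension, set when the dimension belongs to the cuboid. The sketch below reproduces that encoding under two assumptions: the first dimension maps to the highest of the five bits, and the value is padded to eight binary digits to match the slide. The function name `cuboid_id` is hypothetical and not taken from Kylin.

```python
def cuboid_id(all_dims, cuboid_dims):
    """Return the bitmask of a cuboid; the first dimension is the highest bit."""
    unknown = set(cuboid_dims) - set(all_dims)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    mask = 0
    for dim in all_dims:
        # Shift in one bit per dimension: 1 if the cuboid contains it.
        mask = (mask << 1) | (1 if dim in cuboid_dims else 0)
    return mask


ALL_DIMS = ["D1", "D2", "D3", "D4", "D5"]
mask = cuboid_id(ALL_DIMS, {"D1", "D3", "D5"})
print(format(mask, "08b"))  # 00010101
```

D1, D3 and D5 set bits 10101, which is 21 in decimal and prints as 00010101 when padded to eight digits.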
Why HBase as the first choice?
• Well integrated with Hadoop
• Block encoding to reduce storage footprint
• Good at both seeking and scanning
• Coprocessors to move computation to data
• Scalable and flexible as a data store
How Kylin queries HBase
[Diagram: the Kylin Query Server calls the coprocessor of each region on the region servers]
Row layout: CuboidID | SellerID | Date | Country | Metrics…
1. Filter/Aggregation push down
2. Scan with Fuzzy Key Filter
3. Half-baked results
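The fuzzy key scan in step 2 can be pictured as matching row keys against a pattern in which some byte positions are fixed and others are wildcards. The sketch below is a plain Python simulation, not the HBase API. The mask convention (0 = must match, 1 = any byte) follows the one used by HBase's FuzzyRowFilter, while the fixed-width key layout, the field widths and the helper names `make_key` and `fuzzy_match` are assumptions made for illustration.

```python
def fuzzy_match(row_key: bytes, pattern: bytes, mask: bytes) -> bool:
    """Mask byte 0 = position must equal the pattern, 1 = any byte is accepted."""
    if not (len(pattern) == len(mask) <= len(row_key)):
        return False
    return all(m == 1 or k == p for k, p, m in zip(row_key, pattern, mask))


def make_key(cuboid: int, seller: str, date: str, country: str) -> bytes:
    """Hypothetical layout: 1-byte cuboid, 2-byte seller, 8-byte date, 2-byte country."""
    return bytes([cuboid]) + seller.encode() + date.encode() + country.encode()


keys = [
    make_key(21, "s1", "20170101", "US"),
    make_key(21, "s2", "20170101", "CN"),
    make_key(21, "s1", "20170102", "US"),
    make_key(7, "s1", "20170101", "US"),
]

# Fix the cuboid and the date; leave seller and country as wildcards.
pattern = make_key(21, "??", "20170101", "??")
mask = bytes([0]) + bytes([1] * 2) + bytes([0] * 8) + bytes([1] * 2)

matches = [k for k in keys if fuzzy_match(k, pattern, mask)]
print(len(matches))
```

Only the first two keys pass: the third differs in the date and the fourth belongs to another cuboid. In HBase the same kind of test runs inside the region server, so non-matching rows are skipped during the scan instead of being shipped to the query server.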
May still be slow when
• The cuboid is large because it contains a very large number of dimension value combinations
• The cuboid layout is not query-friendly, e.g. the query filters on suffix dimensions while grouping by prefix dimensions
• The filter in the query is huge and complex
• Regions return too many half-baked results
Solution: Cube + MPP
[Diagram: Kylin Query Server]
Novelty
• Compared with “pure” MPP solutions
  • Cube data is more query-friendly because it is pre-aggregated and sorted:
    • Faster
    • Less CPU consumption
    • Fewer storage reads
  • Able to leverage columnar storage and inverted indexes, just like a typical MPP system
• Compared with “pure” cubing technologies
  • Overcomes the bottleneck in cube size
  • Overcomes the bottleneck in cube access speed
Problem
• The sizes of different cuboids in the same cube may vary
• Too much parallelism for small cuboids is harmful
• An RPC is required for each shard, and we don’t want to abuse network/CPU resources
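Solution: Shard Circle
[Diagram: regions 0 to 9 arranged in a circle]
Given the estimated size of each cuboid, S_i, and the expected size of each region, S_r (specified by the modeler):
\[ \mathit{regionNum} = \frac{\sum_i S_i}{S_r} \]
\[ \mathit{cuboidRegionNum}_i = \frac{S_i \cdot \mathit{factor}}{S_r} \]
\[ \mathit{cuboidCircleStart}_i = \mathit{hash}(i) \bmod \mathit{regionNum} \]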
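The shard circle can be sketched as follows. This is an illustration under stated assumptions, not Kylin's implementation: the region count is taken as the total cuboid size divided by the region size, fractions are rounded up, every cuboid gets at least one region, CRC32 stands in for the unspecified hash, and the cuboid sizes are made up.

```python
import math
import zlib


def region_num(cuboid_sizes, region_size):
    """Total region count: sum of cuboid sizes over the target region size."""
    return max(1, math.ceil(sum(cuboid_sizes.values()) / region_size))


def cuboid_shards(cuboid_id, size, region_size, factor, total_regions):
    """Consecutive regions on the circle assigned to one cuboid."""
    count = math.ceil(size * factor / region_size)
    count = min(max(count, 1), total_regions)  # at least 1, at most all regions
    # CRC32 is an assumed stand-in for hash(i).
    start = zlib.crc32(str(cuboid_id).encode()) % total_regions
    return [(start + k) % total_regions for k in range(count)]


SIZES = {21: 600.0, 7: 40.0, 1: 360.0}  # hypothetical cuboid sizes in MB
REGION_SIZE = 100.0
total = region_num(SIZES, REGION_SIZE)
layout = {
    cid: cuboid_shards(cid, size, REGION_SIZE, 1.0, total)
    for cid, size in SIZES.items()
}
print(total, {cid: len(shards) for cid, shards in layout.items()})
```

With these sizes the cube gets 10 regions. The large cuboid spreads over six consecutive regions on the circle, while the small one stays in a single region and therefore costs a single RPC per query.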
Salted Cuboid Rows
• ShardID at the beginning of the row key
• Configurable policies for computing ShardID
  • From the hash of the remaining row key: randomizes row distribution
  • From specific dimension values: improves runtime performance
Row layout: ShardID | CuboidID | SellerID | Date | Country | Metrics…
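A minimal sketch of the two ShardID policies, assuming a one-byte ShardID, CRC32 as the hash, a shard count of 10 and a made-up encoding for the rest of the row key. None of these details come from the slides or from Kylin's code.

```python
import zlib

SHARD_COUNT = 10  # hypothetical number of shards


def shard_from_hash(rest_of_key: bytes) -> int:
    """Policy 1: hash the remaining row key to spread rows evenly."""
    return zlib.crc32(rest_of_key) % SHARD_COUNT


def shard_from_dimension(value: str) -> int:
    """Policy 2: hash one dimension value so equal values share a shard."""
    return zlib.crc32(value.encode()) % SHARD_COUNT


def salted_key(shard_id: int, rest_of_key: bytes) -> bytes:
    """Prefix the row key with a one-byte ShardID."""
    return bytes([shard_id]) + rest_of_key


def rest(cuboid: int, seller: str, date: str, country: str) -> bytes:
    """Hypothetical encoding of CuboidID, SellerID, Date and Country."""
    return bytes([cuboid]) + f"{seller}|{date}|{country}".encode()


row_a = rest(21, "s1", "20170101", "US")
row_b = rest(21, "s1", "20170102", "CN")

# Under policy 2, both rows of seller s1 land in the same shard.
key_a = salted_key(shard_from_dimension("s1"), row_a)
key_b = salted_key(shard_from_dimension("s1"), row_b)
print(key_a[0] == key_b[0])
```

The first policy balances data across shards regardless of content. The second gives up some balance so that all rows sharing a dimension value are co-located in one shard.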
Compute ShardID from SellerID
• For queries that group by SellerID
  • Each shard aggregates a disjoint subset of SellerIDs
  • No further aggregation is needed on the merge side
• For queries that filter by SellerID
  • The pushed-down SellerID filter can be trimmed so that each shard receives only the SellerIDs it may contain
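The two benefits of sharding by SellerID can be sketched as follows. The shard count, the CRC32 hash and all helper names are assumptions for illustration. The point is only that disjoint seller sets make the merge a plain union, and that a per-shard filter can drop sellers that cannot live in that shard.

```python
import zlib
from collections import defaultdict

SHARD_COUNT = 4  # hypothetical number of shards


def shard_of(seller_id: str) -> int:
    """Assumed policy: the ShardID is a hash of the SellerID."""
    return zlib.crc32(seller_id.encode()) % SHARD_COUNT


def shard_rows(rows):
    """Distribute (seller_id, amount) rows to shards by SellerID."""
    shards = defaultdict(list)
    for seller_id, amount in rows:
        shards[shard_of(seller_id)].append((seller_id, amount))
    return shards


def aggregate_shard(rows):
    """Per-shard GROUP BY seller_id with SUM(amount)."""
    totals = defaultdict(float)
    for seller_id, amount in rows:
        totals[seller_id] += amount
    return dict(totals)


def trim_filter(wanted_sellers, shard_id):
    """Keep only the SellerIDs that can live in this shard."""
    return {s for s in wanted_sellers if shard_of(s) == shard_id}


ROWS = [("s1", 10.0), ("s2", 5.0), ("s1", 7.0), ("s3", 3.0), ("s2", 1.0)]
shards = shard_rows(ROWS)
partials = [aggregate_shard(rows) for rows in shards.values()]

# Seller sets are disjoint across shards, so merging is a plain union.
merged = {}
for partial in partials:
    merged.update(partial)
print(merged)
```

Because no seller appears in two shards, the per-shard results are already final and the query server only concatenates them. For a filter such as SellerID IN (s1, s3), `trim_filter` gives each shard the subset it can actually match.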
Experimental results
Small cuboids getting fewer shards
[Bar chart; legend: 13 regions, 23 regions; y-axis from 0 to 1.2]
Query   Data label
SQL 1   1.005586592
SQL 2   0.625
SQL 3   0.625
SQL 4   0.678571429
SQL 5   0.794117647
Q & A
To get more information about Apache Kylin:
• Apache Kylin website: http://kylin.apache.org
• Kyligence website: http://kyligence.io
• Twitter: @ApacheKylin
• Mailing list: dev@kylin.apache.org


Editor's notes

  1. Depiction of cube data
  2. Kylin architecture