SlideShare una empresa de Scribd logo
1 de 31
Descargar para leer sin conexión
Apache HBase at DiDi
Kang Yuan
Agenda
1. About Us
2. DiDi HBase Platform
3. Application and Solution
4. Challenges and future
About Us
• DiDi
• In <Silicon valley>
About Us
• DiDi
• the world’s leading mobile transportation platform
• 20 million rides on a daily basis
Express Mini Bus Bus
Luxury Car Car pool Taxi
Premier Designated
driver
Trial run
Rental Car Taxi For
Aged
ofo sharing
Bike
About Us
• Mission
To Redefine the Future of Mobility
• Vision
To become a global leader in smart transportation and automotive
technology, the world’s largest operator of vehicle networks and a global
leader in smart transportation systems
About Us
• HBase Team:4 Developers
• Kang Yuan
• Yang Li
• Hanzhi Zhang
• Jingyi Yao
• Attached to BigData Architecture Department
• Cooperate with Hadoop/Hive/Spark/Flink/Druid Team closely
DiDi HBase Platform
• Cluster(3)
• Storage Cluster – location A
• Compute Cluster – location B
• Storage Cluster – location B
• Location A: the same place with the hadoop Cluster
• Location B: for online business or streaming
• Application(50+ business and 160+ tables)
• Batching Job result storage
• Online writing/reading
• Persistence for Streaming Jobs
• 99.95% available
DiDi HBase Platform
• HBase Version
• Based on 0.98.21
• Region Group patch HBASE-6721
• Thrift2 patch
• Multi-tenant Problem
• A bad table can put down a cluster
• We don’t know who the tables belong to
DiDi HBase Platform
• Region Group
DiDi HBase Platform
• Region Group
• Isolate important use cases from others
• Easy to manage(web ui, user group, operation tools)
• Elastic to assign resources
• Different Configuration in one Cluster(for different machine types,
business, testing etc)
• Easy to compute the cost of the business
• Easy to upgrade the regionserver in one Group before do it in the whole
cluster
• Cost(share pool) = TableSize*x x = cost/GB
• Cost(specific group ) = Rscount*y y = cost/RS
DiDi HBase Platform
• Improvement
• Web UI to show group
• MoveTables bug fix
• CreateTableHandler fix
DiDi HBase Platform
DiDi HBase Platform
• DiDi HBase Service
DiDi HBase Platform
Create Project and Tables
DiDi HBase Platform
workflow
DiDi HBase Platform
Monitor your tables
Get your bill, user must care for their cost
DiDi HBase Platform
• Phoenix
• Advantage
• Easy to use for RDBMS User(jdbc、sql)
• Auto salting table for performance and hot spot avoiding
• Like a Big Mysql(One sentence to explain to our users)
• Disadvantage
• Some bug like ordering vector item
• Unstable statistic info caching
• No good in Join case
• So many other hotter system :Presto/Impala/Spark SQL/Kylin
• Successful Use
• Row Timestamp
• Multidimensional Table Schema
• Phoenix(more customers recently)
• Row Timestamp for metrics
• Monitoring table write/read/storage
• Easy to compute avg, max, min for metrics
• Quick to query recently data
• Multi-dimension Table Schema
• MR/Spark Job to compute BI reporting data
• Many demission combination result like city, gender, age, business type
• Primary Key: JobID, date, dimission1, dimission2, dimission3…
• Value: dimNameArray, valueArray
• This can fit nearly all the Multi-dimension reporting business
DiDi HBase Platform
DiDi HBase Platform
• Client Access
• Multiple Languages Clients
• C++, Go, Python, PHP
• Thritf2, QueryServer
• Security(ACL)
Application and Solution(Hadoop Monitor)
• Hadoop Monitor
• Help hadoop to query their fsimage and jobhistory
• BI for Hadoop manager
• Store data in phoenix
Application and Solution
Application and Solution(Gis Query)
• GPS
Application and Solution
• GPS
• Query Model
• Rowkey:ID+Timestamp
• Rowkey:Reversed GeoHash+timestamp+ID
• GeoHash
• A index in two dimensions
• Fit HBase rowkey prefect
• Point 1: same prefix code result in a nearby place
• Point 2: query rowkey prefix can location a region whose area decided by
prefix length
Application and Solution
• GPS
• GeoHash
Application and Solution(Online Machine Learning)
• ETA(Estimated Time of Arrival)
• Origin data collection
• ETL
• Feature extraction
• Storage
• Model Training
Application and Solution
• ETA(Estimated Time of Arrival)
• Training data by spark, every 30 minutes
• Pick up data by city from HBase in 5 minutes
• Compute ETA in 25 minutes
• Rowkey: Salting+CityId+Type0+Type1+Type2+Timestamp
• Columns: Order,Feature
• Every day HBase data will be dumped into HDFS for offline training
Application and Solution(Image)
• Traffic in Cloud
• High Volume Throughput, Little Read
• Read via 8 Thrift nodes
• Road traffic info
• POI data
• Heat-map
• Write with Spark Job
Application and Solution
• Architecture
Challenges and future
• More connect with other bigdata framework
• Hive Phoenix/HBase Handler
• Integration with hive
• Use Hive sql to query phoenix
• Easy to load data from hadoop cluster
• Join with Hive table
• Spark Phoenix/HBase Handler
Challenges and future
• More stable
• hbase1.x upgrade
• OLAP extends
• Kylin
• TPC-H Compare
• Thrift load balancer
• Auto Group balancer
• More powerful DHS
Any Question?

Más contenido relacionado

La actualidad más candente

HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
Michael Stack
 
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
Michael Stack
 
RedisConf17 - Home Depot - Turbo charging existing applications with Redis
RedisConf17 - Home Depot - Turbo charging existing applications with RedisRedisConf17 - Home Depot - Turbo charging existing applications with Redis
RedisConf17 - Home Depot - Turbo charging existing applications with Redis
Redis Labs
 

La actualidad más candente (20)

HBaseConAsia2018 Track3-6: HBase at Meituan
HBaseConAsia2018 Track3-6: HBase at MeituanHBaseConAsia2018 Track3-6: HBase at Meituan
HBaseConAsia2018 Track3-6: HBase at Meituan
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
 
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
HBaseConAsia2018 Track2-3: Bringing MySQL Compatibility to HBase using Databa...
 
HBaseConAsia2018 Track3-3: HBase at China Life Insurance
HBaseConAsia2018 Track3-3: HBase at China Life InsuranceHBaseConAsia2018 Track3-3: HBase at China Life Insurance
HBaseConAsia2018 Track3-3: HBase at China Life Insurance
 
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huaweihbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
 
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
 
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC timeHBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
HBaseConAsia2018 Track1-1: Use CCSMap to improve HBase YGC time
 
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big DataHBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data
 
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon...
 
HBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial Industry
 
HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ Flipboard
 
HBaseConAsia2018 Keynote1: Apache HBase Project Status
HBaseConAsia2018 Keynote1: Apache HBase Project StatusHBaseConAsia2018 Keynote1: Apache HBase Project Status
HBaseConAsia2018 Keynote1: Apache HBase Project Status
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
25 snowflake
25 snowflake25 snowflake
25 snowflake
 
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBaseHBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
HBaseCon 2015 General Session: Zen - A Graph Data Model on HBase
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & Developers
 
HBaseConAsia2018 Track3-5: HBase Practice at Lianjia
HBaseConAsia2018 Track3-5: HBase Practice at LianjiaHBaseConAsia2018 Track3-5: HBase Practice at Lianjia
HBaseConAsia2018 Track3-5: HBase Practice at Lianjia
 
RedisConf17 - Home Depot - Turbo charging existing applications with Redis
RedisConf17 - Home Depot - Turbo charging existing applications with RedisRedisConf17 - Home Depot - Turbo charging existing applications with Redis
RedisConf17 - Home Depot - Turbo charging existing applications with Redis
 
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsightHBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight
 

Similar a HBaseCon2017 Apache HBase at Didi

Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Andrew Brust
 
A Scalable Data Transformation Framework using Hadoop Ecosystem
A Scalable Data Transformation Framework using Hadoop EcosystemA Scalable Data Transformation Framework using Hadoop Ecosystem
A Scalable Data Transformation Framework using Hadoop Ecosystem
DataWorks Summit
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
Jesus Rodriguez
 
Getting Started with Hadoop
Getting Started with HadoopGetting Started with Hadoop
Getting Started with Hadoop
Cloudera, Inc.
 

Similar a HBaseCon2017 Apache HBase at Didi (20)

Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
 
Microsoft's Big Play for Big Data
Microsoft's Big Play for Big DataMicrosoft's Big Play for Big Data
Microsoft's Big Play for Big Data
 
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
Microsoft's Big Play for Big Data- Visual Studio Live! NY 2012
 
A Scalable Data Transformation Framework using Hadoop Ecosystem
A Scalable Data Transformation Framework using Hadoop EcosystemA Scalable Data Transformation Framework using Hadoop Ecosystem
A Scalable Data Transformation Framework using Hadoop Ecosystem
 
A Scalable Data Transformation Framework using the Hadoop Ecosystem
A Scalable Data Transformation Framework using the Hadoop EcosystemA Scalable Data Transformation Framework using the Hadoop Ecosystem
A Scalable Data Transformation Framework using the Hadoop Ecosystem
 
Big Data in the Microsoft Platform
Big Data in the Microsoft PlatformBig Data in the Microsoft Platform
Big Data in the Microsoft Platform
 
Modern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewModern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An Overview
 
Bigdata and Hadoop with Docker
Bigdata and Hadoop with DockerBigdata and Hadoop with Docker
Bigdata and Hadoop with Docker
 
Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem
Things Every Oracle DBA Needs to Know about the Hadoop EcosystemThings Every Oracle DBA Needs to Know about the Hadoop Ecosystem
Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem
 
Big Data Processing
Big Data ProcessingBig Data Processing
Big Data Processing
 
Practical introduction to hadoop
Practical introduction to hadoopPractical introduction to hadoop
Practical introduction to hadoop
 
Intro to Big Data
Intro to Big DataIntro to Big Data
Intro to Big Data
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
Getting Started with Hadoop
Getting Started with HadoopGetting Started with Hadoop
Getting Started with Hadoop
 

Más de HBaseCon

HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
HBaseCon
 

Más de HBaseCon (20)

hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
 
hbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beamhbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beam
 
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinteresthbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
 
hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Netease
 
hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践
 
hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台
 
hbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.comhbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.com
 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecture
 
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huaweihbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
 
hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
 
HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
 
HBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBase
 
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase Client
 
HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environment
 
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
 
HBaseCon2017 HBase at Xiaomi
HBaseCon2017 HBase at XiaomiHBaseCon2017 HBase at Xiaomi
HBaseCon2017 HBase at Xiaomi
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

HBaseCon2017 Apache HBase at Didi

  • 1. Apache HBase at DiDi Kang Yuan
  • 2. Agenda 1. About Us 2. DiDi HBase Platform 3. Application and Solution 4. Challenges and future
  • 3. About Us • DiDi • In <Silicon valley>
  • 4. About Us • DiDi • the world’s leading mobile transportation platform • 20 million rides on a daily basis Express Mini Bus Bus Luxury Car Car pool Taxi Premier Designated driver Trial run Rental Car Taxi For Aged ofo sharing Bike
  • 5. About Us • Mission To Redefine the Future of Mobility • Vision To become a global leader in smart transportation and automotive technology, the world’s largest operator of vehicle networks and a global leader in smart transportation systems
  • 6. About Us • HBase Team:4 Developers • Kang Yuan • Yang Li • Hanzhi Zhang • Jingyi Yao • Attached to BigData Architecture Department • Cooperate with Hadoop/Hive/Spark/Flink/Druid Team closely
  • 7. DiDi HBase Platform • Cluster(3) • Storage Cluster – location A • Compute Cluster – location B • Storage Cluster – location B • Location A: the same place with the hadoop Cluster • Location B: for online business or streaming • Application(50+ business and 160+ tables) • Batching Job result storage • Online writing/reading • Persistence for Streaming Jobs • 99.95% available
  • 8. DiDi HBase Platform • HBase Version • Based on 0.98.21 • Region Group patch HBASE-6721 • Thrift2 patch • Multi-tenant Problem • A bad table can put down a cluster • We don’t know who the tables belong to
  • 9. DiDi HBase Platform • Region Group
  • 10. DiDi HBase Platform • Region Group • Isolate important use cases from others • Easy to manage(web ui, user group, operation tools) • Elastic to assign resources • Different Configuration in one Cluster(for different machine types, business, testing etc) • Easy to compute the cost of the business • Easy to upgrade the regionserver in one Group before do it in the whole cluster • Cost(share pool) = TableSize*x x = cost/GB • Cost(specific group ) = Rscount*y y = cost/RS
  • 11. DiDi HBase Platform • Improvement • Web UI to show group • MoveTables bug fix • CreateTableHandler fix
  • 13. DiDi HBase Platform • DiDi HBase Service
  • 14. DiDi HBase Platform Create Project and Tables
  • 16. DiDi HBase Platform Monitor your tables Get your bill, user must care for their cost
  • 17. DiDi HBase Platform • Phoenix • Advantage • Easy to use for RDBMS User(jdbc、sql) • Auto salting table for performance and hot spot avoiding • Like a Big Mysql(One sentence to explain to our users) • Disadvantage • Some bug like ordering vector item • Unstable statistic info caching • No good in Join case • So many other hotter system :Presto/Impala/Spark SQL/Kylin • Successful Use • Row Timestamp • Multidimensional Table Schema
  • 18. • Phoenix(more customers recently) • Row Timestamp for metrics • Monitoring table write/read/storage • Easy to compute avg, max, min for metrics • Quick to query recently data • Multi-dimension Table Schema • MR/Spark Job to compute BI reporting data • Many demission combination result like city, gender, age, business type • Primary Key: JobID, date, dimission1, dimission2, dimission3… • Value: dimNameArray, valueArray • This can fit nearly all the Multi-dimension reporting business DiDi HBase Platform
  • 19. DiDi HBase Platform • Client Access • Multiple Languages Clients • C++, Go, Python, PHP • Thritf2, QueryServer • Security(ACL)
  • 20. Application and Solution(Hadoop Monitor) • Hadoop Monitor • Help hadoop to query their fsimage and jobhistory • BI for Hadoop manager • Store data in phoenix
  • 23. Application and Solution • GPS • Query Model • Rowkey:ID+Timestamp • Rowkey:Reversed GeoHash+timestamp+ID • GeoHash • A index in two dimensions • Fit HBase rowkey prefect • Point 1: same prefix code result in a nearby place • Point 2: query rowkey prefix can location a region whose area decided by prefix length
  • 24. Application and Solution • GPS • GeoHash
  • 25. Application and Solution(Online Machine Learning) • ETA(Estimated Time of Arrival) • Origin data collection • ETL • Feature extraction • Storage • Model Training
  • 26. Application and Solution • ETA(Estimated Time of Arrival) • Training data by spark, every 30 minutes • Pick up data by city from HBase in 5 minutes • Compute ETA in 25 minutes • Rowkey: Salting+CityId+Type0+Type1+Type2+Timestamp • Columns: Order,Feature • Every day HBase data will be dumped into HDFS for offline training
  • 27. Application and Solution(Image) • Traffic in Cloud • High Volume Throughput, Little Read • Read via 8 Thrift nodes • Road traffic info • POI data • Heat-map • Write with Spark Job
  • 29. Challenges and future • More connect with other bigdata framework • Hive Phoenix/HBase Handler • Integration with hive • Use Hive sql to query phoenix • Easy to load data from hadoop cluster • Join with Hive table • Spark Phoenix/HBase Handler
  • 30. Challenges and future • More stable • hbase1.x upgrade • OLAP extends • Kylin • TPC-H Compare • Thrift load balancer • Auto Group balancer • More powerful DHS