SlideShare una empresa de Scribd logo
1 de 20
Big Data Schema Design

               Deepak
Overview
•   Schema design is vital for performance.
•   Keywords : Non-relational, NOSQL, Distributed
•   Underlying File system : GFS, HDFS
•   Examples : Hadoop, GFS, Hbase, Big Tables etc
•   Example implementations : Facebook, Wallmart
    etc.
When to use
• Typically with systems having >=100’s of
  millions/billions rows
• Records of the order of 100’s or 1000’s of
  TB’s
• No advanced Query Language needed
• Typed columns or other RDBMS features not
  needed
Hadoop Architecture
Hadoop Ecosystem
HBase Architecture
Overview
• HBase runs on top of HDFS
• HDFS was chosen because of its fault tolerance,
  check summing, failover properties
• Java Native client or REST API
• Manager manages cluster, Region Servers
  manages data
HBase Data Model
• Table: design-time namespace, has many rows.
• Row: atomic key/value container, with one row
  key
• Column Family: divide columns into physical files
• Column: a key in the k/v container inside a row
• Timestamp: long milliseconds, sorted descending
• Value: a time-versioned value in the k/v container
Distribution
More distribution
Thoughts on the logical view
• Unit of scalability is Region.
• The rows are not tied to a server. They maybe
  moved around for load balancing.
• Add nodes so that we do not have too many
  regions per node
• Too many regions per node will work against
  distribution
Column Family
• Each Column Family represents a Physical storage
  unit ( A Directory)
• Data that are queried together should be stored
  together.
• Features such as compression can be enabled per
  Column Family
Bloom Filter
• Generated automatically when an HFile is
  flushed to disk
• Available in primary memory
• Contains Row keys
• CK can be stored as part of RK, but that
  might overload the memory.
• Can filter based on what is stored.
Physical View
Key Cardinality
Tall vs Fat Tables
• Fat tables with large amounts of data in each
  column.
• Tall tables with large amounts of rows.
• Tall is good for search or scans
• Fat is good for fetches or gets
• Rows don’t split
• Atomicity is only at row level, having compound
  keys, atomicity is not guaranteed
Key Design
• Sequential keys : Example timestamp as key
• With Sequential keys you keep hot spotting on a
  region.
• Salting to distribute the records
• Field promotion
• Random keys
Key Design Performance
Summary
• Think twice before you decide on NOSQL
  technologies
• Avoid hotspots
• Store values at appropriate places
• Choose the right keys
• Store inferences into RDBMS if necessary
Visit us:

   Facebook: http://www.facebook.com/QBurst
        Twitter: http://twitter.com/qburst
 Google+: https://plus.google.com/+qburst/posts
LinkedIn: http://www.linkedin.com/company/qburst
YouTube: http://www.youtube.com/QBurstVideos


                www.qburst.com

Más contenido relacionado

La actualidad más candente

Hadoop Infrastructure (Oct. 3rd, 2012)
Hadoop Infrastructure (Oct. 3rd, 2012)Hadoop Infrastructure (Oct. 3rd, 2012)
Hadoop Infrastructure (Oct. 3rd, 2012)
John Dougherty
 
MySQL Storage Engines
MySQL Storage EnginesMySQL Storage Engines
MySQL Storage Engines
Karthik .P.R
 
Using flash on the server side
Using flash on the server sideUsing flash on the server side
Using flash on the server side
Howard Marks
 

La actualidad más candente (20)

Supercharge your RDBMS with Elasticsearch
Supercharge your RDBMS with ElasticsearchSupercharge your RDBMS with Elasticsearch
Supercharge your RDBMS with Elasticsearch
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
 
Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018Introduction to CosmosDB - Azure Bootcamp 2018
Introduction to CosmosDB - Azure Bootcamp 2018
 
Indexing with solr search server and hadoop framework
Indexing with solr search server and hadoop frameworkIndexing with solr search server and hadoop framework
Indexing with solr search server and hadoop framework
 
HBase
HBaseHBase
HBase
 
Hadoop Infrastructure (Oct. 3rd, 2012)
Hadoop Infrastructure (Oct. 3rd, 2012)Hadoop Infrastructure (Oct. 3rd, 2012)
Hadoop Infrastructure (Oct. 3rd, 2012)
 
Road to cloud-iaas
Road to cloud-iaasRoad to cloud-iaas
Road to cloud-iaas
 
MySQL Storage Engines
MySQL Storage EnginesMySQL Storage Engines
MySQL Storage Engines
 
CosmosDB for DBAs & Developers
CosmosDB for DBAs & DevelopersCosmosDB for DBAs & Developers
CosmosDB for DBAs & Developers
 
Share point 2013 on azure
Share point 2013 on azureShare point 2013 on azure
Share point 2013 on azure
 
Using flash on the server side
Using flash on the server sideUsing flash on the server side
Using flash on the server side
 
Postgres Open
Postgres OpenPostgres Open
Postgres Open
 
Hive big-data meetup
Hive big-data meetupHive big-data meetup
Hive big-data meetup
 
Storage for VDI
Storage for VDIStorage for VDI
Storage for VDI
 
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
Scaling RDBMS on AWS- ClustrixDB @AWS Meetup 20160711
 
Short introduction to Redis
Short introduction to RedisShort introduction to Redis
Short introduction to Redis
 
Building your data warehouse with Redshift
Building your data warehouse with RedshiftBuilding your data warehouse with Redshift
Building your data warehouse with Redshift
 
Barcamp Macau 2014 - Introduction to AWS
Barcamp Macau 2014 - Introduction to AWSBarcamp Macau 2014 - Introduction to AWS
Barcamp Macau 2014 - Introduction to AWS
 
Keynote: The Future of Apache HBase
Keynote: The Future of Apache HBaseKeynote: The Future of Apache HBase
Keynote: The Future of Apache HBase
 
CosmosDb for beginners
CosmosDb for beginnersCosmosDb for beginners
CosmosDb for beginners
 

Similar a Schema Design

UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
Rahul Borate
 
Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012
Chris Huang
 

Similar a Schema Design (20)

HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
HBase in Practice
HBase in PracticeHBase in Practice
HBase in Practice
 
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012HBase Advanced Schema Design - Berlin Buzzwords - June 2012
HBase Advanced Schema Design - Berlin Buzzwords - June 2012
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
 
UNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptxUNIT I Introduction to NoSQL.pptx
UNIT I Introduction to NoSQL.pptx
 
NoSql - mayank singh
NoSql - mayank singhNoSql - mayank singh
NoSql - mayank singh
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
Apache HBase Workshop
Apache HBase WorkshopApache HBase Workshop
Apache HBase Workshop
 
Apache hive
Apache hiveApache hive
Apache hive
 
Comparative study of modern databases
Comparative study of modern databasesComparative study of modern databases
Comparative study of modern databases
 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars George
 
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
SE2016 Java Valerii Moisieienko "Apache HBase Workshop"
 
Cassandra an overview
Cassandra an overviewCassandra an overview
Cassandra an overview
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars George
 
L6.sp17.pptx
L6.sp17.pptxL6.sp17.pptx
L6.sp17.pptx
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012
 
NoSql
NoSqlNoSql
NoSql
 
Database Technologies
Database TechnologiesDatabase Technologies
Database Technologies
 

Más de QBurst

Más de QBurst (9)

Frontend Optimization - Tips for Improving the Performance of Single Page App...
Frontend Optimization - Tips for Improving the Performance of Single Page App...Frontend Optimization - Tips for Improving the Performance of Single Page App...
Frontend Optimization - Tips for Improving the Performance of Single Page App...
 
Best Practices for Building Cloud-Native Apps
Best Practices for Building Cloud-Native AppsBest Practices for Building Cloud-Native Apps
Best Practices for Building Cloud-Native Apps
 
Project Tracking Application
Project Tracking ApplicationProject Tracking Application
Project Tracking Application
 
DevOps Transformation: Learnings and Best Practices
DevOps Transformation: Learnings and Best PracticesDevOps Transformation: Learnings and Best Practices
DevOps Transformation: Learnings and Best Practices
 
Cloud Migration Strategy and Best Practices
Cloud Migration Strategy and Best PracticesCloud Migration Strategy and Best Practices
Cloud Migration Strategy and Best Practices
 
Implementing AMP on WP Blog
Implementing AMP on WP Blog Implementing AMP on WP Blog
Implementing AMP on WP Blog
 
HTTPS Impact on SEO
HTTPS Impact on SEOHTTPS Impact on SEO
HTTPS Impact on SEO
 
How to Secure Your WordPress Site
How to Secure Your WordPress SiteHow to Secure Your WordPress Site
How to Secure Your WordPress Site
 
QBurst Big Data Expertise - Infographic
 QBurst Big Data Expertise - Infographic  QBurst Big Data Expertise - Infographic
QBurst Big Data Expertise - Infographic
 

Último

Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
Sheetaleventcompany
 
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
daisycvs
 
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
lizamodels9
 
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
dollysharma2066
 
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
Renandantas16
 

Último (20)

Monthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptxMonthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptx
 
Pharma Works Profile of Karan Communications
Pharma Works Profile of Karan CommunicationsPharma Works Profile of Karan Communications
Pharma Works Profile of Karan Communications
 
Value Proposition canvas- Customer needs and pains
Value Proposition canvas- Customer needs and painsValue Proposition canvas- Customer needs and pains
Value Proposition canvas- Customer needs and pains
 
Dr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdfDr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdf
 
Falcon's Invoice Discounting: Your Path to Prosperity
Falcon's Invoice Discounting: Your Path to ProsperityFalcon's Invoice Discounting: Your Path to Prosperity
Falcon's Invoice Discounting: Your Path to Prosperity
 
Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
 
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLMONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Service
 
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
 
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
 
RSA Conference Exhibitor List 2024 - Exhibitors Data
RSA Conference Exhibitor List 2024 - Exhibitors DataRSA Conference Exhibitor List 2024 - Exhibitors Data
RSA Conference Exhibitor List 2024 - Exhibitors Data
 
Organizational Transformation Lead with Culture
Organizational Transformation Lead with CultureOrganizational Transformation Lead with Culture
Organizational Transformation Lead with Culture
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
 
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
 
Falcon Invoice Discounting platform in india
Falcon Invoice Discounting platform in indiaFalcon Invoice Discounting platform in india
Falcon Invoice Discounting platform in india
 
Cracking the Cultural Competence Code.pptx
Cracking the Cultural Competence Code.pptxCracking the Cultural Competence Code.pptx
Cracking the Cultural Competence Code.pptx
 
Famous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st CenturyFamous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st Century
 
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
 

Schema Design

  • 1. Big Data Schema Design Deepak
  • 2. Overview • Schema design is vital for performance. • Keywords : Non-relational, NOSQL, Distributed • Underlying File system : GFS, HDFS • Examples : Hadoop, GFS, Hbase, Big Tables etc • Example implementations : Facebook, Wallmart etc.
  • 3. When to use • Typically with systems having >=100’s of millions/billions rows • Records of the order of 100’s or 1000’s of TB’s • No advanced Query Language needed • Typed columns or other RDBMS features not needed
  • 7. Overview • HBase runs on top of HDFS • HDFS was chosen because of its fault tolerance, check summing, failover properties • Java Native client or REST API • Manager manages cluster, Region Servers manages data
  • 8. HBase Data Model • Table: design-time namespace, has many rows. • Row: atomic key/value container, with one row key • Column Family: divide columns into physical files • Column: a key in the k/v container inside a row • Timestamp: long milliseconds, sorted descending • Value: a time-versioned value in the k/v container
  • 11. Thoughts on the logical view • Unit of scalability is Region. • The rows are not tied to a server. They maybe moved around for load balancing. • Add nodes so that we do not have too many regions per node • Too many regions per node will work against distribution
  • 12. Column Family • Each Column Family represents a Physical storage unit ( A Directory) • Data that are queried together should be stored together. • Features such as compression can be enabled per Column Family
  • 13. Bloom Filter • Generated automatically when an HFile is flushed to disk • Available in primary memory • Contains Row keys • CK can be stored as part of RK, but that might overload the memory. • Can filter based on what is stored.
  • 16. Tall vs Fat Tables • Fat tables with large amounts of data in each column. • Tall tables with large amounts of rows. • Tall is good for search or scans • Fat is good for fetches or gets • Rows don’t split • Atomicity is only at row level, having compound keys, atomicity is not guaranteed
  • 17. Key Design • Sequential keys : Example timestamp as key • With Sequential keys you keep hot spotting on a region. • Salting to distribute the records • Field promotion • Random keys
  • 19. Summary • Think twice before you decide on NOSQL technologies • Avoid hotspots • Store values at appropriate places • Choose the right keys • Store inferences into RDBMS if necessary
  • 20. Visit us: Facebook: http://www.facebook.com/QBurst Twitter: http://twitter.com/qburst Google+: https://plus.google.com/+qburst/posts LinkedIn: http://www.linkedin.com/company/qburst YouTube: http://www.youtube.com/QBurstVideos www.qburst.com

Notas del editor

  1. Activity 1   - Study Make a conscious effort to improve attention to detail everywhere. Wherever you go, look for things to recall later. When you're shopping look for three things to study. Take 15 to 20 seconds to study each object. After returning home, write down specific things about the objects. Make notes of the size, the shape, the color.   Activity 2     - Recollection           People tend to get careless about the things in which they are familiar. Complacency especially during routine actions does not exercise the mind. Make a point to look for details and notice things as often as possible. Have you noticed the number of steps you need to climb from the ground to reach 3rd floor and 4th floor  at QBurst.63,85)