Hypertable  
  
An  Open  Source,  
High  Performance,  
Massively  Scalable  Database  
  	
Doug Judd
CEO Hypertable Inc.
Three  Reasons  to  
Choose  Hypertable	
•  High Performance
•  Open Source
•  Future Direction SQL
Introduction
Highlights	
•  Modeled after Google’s Bigtable database
•  High Performance Implementation (C++)
•  Apache Thrift interface for all popular languages
(Java, PHP, Ruby, Python, Perl, etc)
•  Broad Hadoop distribution support
o  Apache 2
o  Cloudera CDH3, CDH4, CDH5
o  IBM BigInsights 3
o  Hortonworks HDP2
o  MapR
•  Actively developed for 8 years
Open  Source	
•  Licensed under the GPL
•  Hosted on GitHub
o  git://github.com/hypertable/hypertable.git
o  https://github.com/hypertable/hypertable.git
•  Online source documentation
•  Mailing Lists
o  groups.google.com/group/hypertable-user
o  groups.google.com/group/hypertable-dev
Bigtable	
•  Google’s most successful scalable database
•  Bigtable underpins 100+ Google services
•  YouTube, Blogger, Google Earth, Google Maps,
Orkut, Gmail, Google Analytics, Google Book
Search, Google Code, Crawl Database …
•  Data is physically ordered by primary key – it’s not a
distributed hash table
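Because rows are kept in key order, finding the range that holds a row, or all rows sharing a prefix, is a binary search rather than a hash lookup. A minimal Python sketch of that idea follows. The boundary rows and server names are hypothetical, the "end row is inclusive" convention is an assumption, and nothing here models Hypertable's actual metadata lookup.

```python
import bisect

# Hypothetical range boundaries: each range ends at (and includes) its end row.
RANGE_END_ROWS = ["g", "p", "\uffff"]
RANGE_SERVERS = ["rs1", "rs2", "rs3"]  # illustrative server names


def locate(row: str) -> str:
    """Return the server whose range contains `row` (binary search on end rows)."""
    index = bisect.bisect_left(RANGE_END_ROWS, row)
    return RANGE_SERVERS[index]


def scan_prefix(sorted_rows, prefix):
    """Return all rows starting with `prefix`; they are contiguous in sorted order."""
    start = bisect.bisect_left(sorted_rows, prefix)
    matches = []
    for row in sorted_rows[start:]:
        if not row.startswith(prefix):
            break
        matches.append(row)
    return matches


ROWS = sorted(["0307743659", "0321321928", "0321776402",
               "B00002VWE0", "B000Q66J1M", "B002VWNIDG"])
```

With a hash-partitioned store the prefix scan would have to visit every partition; with ordered keys it touches one contiguous slice.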
How  Hypertable  Differs  From  
A  Traditional  RDBMS	
•  Horizontally Scalable
•  Sparse Table Structure
o  Variable number of columns per-row
o  Rows can have billions of columns
•  Cells can have multiple time stamped versions
Database  Model	
•  Sparse, two-dimensional tables
•  Cells can have multiple versions
•  Cells addressed by 4-part key
o  Row
o  Column family
o  Column qualifier
o  Timestamp
Conceptual  Table  
Representation
Actual  Table  
Representation
Anatomy  of  a  Key	
•  Column Family is 8-bit
•  Timestamp and Revision are 64-bit integers
(nanoseconds since the Epoch)
•  Simple byte-wise comparison
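One way to see why a simple byte-wise comparison is enough is to serialize the key parts so that the byte order matches the desired sort order. The sketch below is illustrative only and is not Hypertable's on-disk key format: the NUL terminators, the field order, and the inverted timestamp (so that newer versions sort first) are all assumptions made for the example.

```python
import struct

MAX_U64 = 0xFFFFFFFFFFFFFFFF


def encode_key(row: bytes, family_id: int, qualifier: bytes, timestamp_ns: int) -> bytes:
    """Pack a 4-part cell key so that plain byte comparison orders cells by
    row, then column family, then qualifier, then newest timestamp first."""
    if not 0 <= family_id <= 0xFF:
        raise ValueError("column family id must fit in 8 bits")
    if not 0 <= timestamp_ns <= MAX_U64:
        raise ValueError("timestamp must fit in 64 bits")
    if b"\x00" in row or b"\x00" in qualifier:
        raise ValueError("NUL is reserved as a terminator in this sketch")
    inverted = MAX_U64 - timestamp_ns  # larger timestamp -> smaller bytes
    return (row + b"\x00" + bytes([family_id]) + qualifier + b"\x00"
            + struct.pack(">Q", inverted))


older = encode_key(b"row1", 1, b"q", 100)
newer = encode_key(b"row1", 1, b"q", 200)
```

Sorting a list of such byte strings with the default comparison groups all versions of a cell together, most recent first.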
Architecture
Table  Growth  Process
How  Scaling  Works
High  Level  Architecture
RangeServer  
Insert  Handling
RangeServer  
Query  Handling
Cellstore  Format
Bloom  Filter
Request  Routing
Administration
Cluster  Task  Automation  Tool
•  ht_cluster
•  Modeled after Capistrano
•  Role
o  Designates a function or service and the set of machines that will perform
that function or service
o  Examples: Hyperspace, Master, Slave (RangeServer), ThriftBroker
o  Machines can belong to one or more roles
•  Task
o  Script written for specific roles and used to manage the associated
function or service
o  Examples: start_hyperspace, stop_hyperspace
cluster.def	
INSTALL_PREFIX=/opt/hypertable
HYPERTABLE_VERSION=0.9.8.2
PACKAGE_FILE=/tmp/hypertable-0.9.8.2-linux-x86_64.tar.gz
FS=hadoop
HADOOP_DISTRO=cdh4
ORIGIN_CONFIG_FILE=/root/hypertable.cfg
PROMPT_CLEAN=true
role: source test00
role: master test[00-02]
role: hyperspace test[00-02]
role: slave test[03-99] - test37
role: thriftbroker
role: spare
include: "core.tasks"
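The role lines use a compact host notation such as `test[03-99] - test37`. A small Python sketch of how such a specification could be expanded is shown here. It is not the ht_cluster parser: the reading of `[lo-hi]` as an inclusive, zero-padded numeric range and of ` - ` as "remove these hosts" is an assumption based on the example above, and `expand_hosts` is a hypothetical helper name.

```python
import re

_RANGE = re.compile(r"(.*)\[(\d+)-(\d+)\](.*)")


def _expand_pattern(pattern: str):
    """Expand 'test[00-02]' into ['test00', 'test01', 'test02']."""
    match = _RANGE.fullmatch(pattern)
    if not match:
        return [pattern]
    prefix, low, high, suffix = match.groups()
    width = len(low)  # keep the zero padding used in the spec
    return [f"{prefix}{i:0{width}d}{suffix}" for i in range(int(low), int(high) + 1)]


def expand_hosts(spec: str):
    """Expand a host spec, subtracting any patterns listed after ' - '."""
    include, *excludes = [part.strip() for part in spec.split(" - ")]
    removed = {host for pattern in excludes for host in _expand_pattern(pattern)}
    return [host for host in _expand_pattern(include) if host not in removed]
```

Under these assumptions, `expand_hosts("test[03-99] - test37")` yields the 96 slave hosts test03 through test99 with test37 left out.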
Common  Tasks	
ht cluster start
ht cluster stop
ht cluster push_config
ht cluster install_package
ht cluster upgrade
Monitoring
Ganglia  Metrics
Thrift  Broker  Metrics
Metric              Units
Connections         count
Requests            requests/s
Errors              errors/s
Virtual Memory      GB
Resident Memory     GB
Heap Size           GB
Heap Slack Bytes    GB
CPU user            percentage
CPU sys             percentage
Version             string
Range  Server  Metrics
Metric                  Units
Scans                   scans/s
Updates                 updates/s
Bytes Returned          bytes/s
Bytes Scanned           bytes/s
Byte Scan Yield         percentage
Bytes Written           bytes/s
Cells Returned          cells/s
Cells Scanned           cells/s
Cell Scan Yield         percentage
Outstanding Scanners    count
Request Backlog         count
Major Compactions       count
Minor Compactions       count
Merging Compactions     count
GC Compactions          count
Virtual Memory          GB
Resident Memory         GB
Heap Size               GB
Heap Slack Bytes        GB
Tracked Memory          GB
CPU user                percentage
CPU sys                 percentage
Range  Server  Metrics
Metric                  Units
Ranges                  count
CellStores              count
Block Cache Hits        percentage
Block Cache Memory      GB
Block Cache Fill        GB
Query Cache Hits        percentage
Query Cache Memory      GB
Query Cache Fill        GB
Version                 string
FS  Broker  Metrics
Metric              Units
Read Throughput     MB/s
Write Throughput    MB/s
Syncs               syncs/s
Sync Latency        milliseconds
Errors              count
JVM GCs             count
JVM GC Time         milliseconds
JVM Heap Size       GB
Virtual Memory      GB
Resident Memory     GB
Heap Size           GB
Heap Slack Bytes    GB
CPU user            percentage
CPU sys             percentage
Version             string
Master  and  Hyperspace  Metrics
Master
Metric              Units
Operations          operations/s
Virtual Memory      GB
Resident Memory     GB
Heap Size           GB
Heap Slack Bytes    GB
CPU user            percentage
CPU sys             percentage
Version             string
Hyperspace
Metric              Units
Requests            requests/s
Virtual Memory      GB
Resident Memory     GB
Heap Size           GB
Heap Slack Bytes    GB
CPU user            percentage
CPU sys             percentage
Version             string
Slow  Query  Log	
•  ThriftBroker feature
•  Logs queries that take longer than 10 seconds
•  Log line format
o  End time (seconds)
o  Start time (seconds)
o  Function called
o  Client IP/port
o  Latency (milliseconds)
o  Sub-scanner count
o  Bytes Returned
o  Bytes Scanned
o  Disk read
o  Servers contacted
o  Namespace
o  HQL representation of query
Features
Namespaces
Namespaces	
USE '/';
CREATE NAMESPACE foo;
USE foo;
CREATE NAMESPACE bar;
CREATE TABLE mytable (a, b, c);
GET LISTING;
(bar) namespace
mytable
Atomic  Counters	
•  Column option:
CREATE TABLE counts (
url COUNTER
);
•  Modified via existing API using specially
formatted values:
Value Format    Description
[+]n            Increment counter by n
-n              Decrement counter by n
=n              Reset counter to n
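The value formats above can be read as a tiny instruction set applied to the stored counter. The following Python sketch interprets them on the client side purely for illustration. The function name `apply_counter_update` is hypothetical, and in Hypertable the interpretation happens inside the database, not in application code.

```python
def apply_counter_update(current: int, value: str) -> int:
    """Apply a counter-formatted value ('[+]n', '-n', '=n') to `current`."""
    value = value.strip()
    if value.startswith("="):
        return int(value[1:])            # reset counter to n
    if value.startswith("-"):
        return current - int(value[1:])  # decrement by n
    if value.startswith("+"):
        return current + int(value[1:])  # increment by n (explicit sign)
    return current + int(value)          # increment by n (sign omitted)


def replay(updates, start: int = 0) -> int:
    """Fold a sequence of counter updates into a final value."""
    total = start
    for update in updates:
        total = apply_counter_update(total, update)
    return total
```

For example, replaying `["+5", "3", "-4", "=10", "+1"]` from zero ends at 11, because the `=10` discards everything before it.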
Secondary  Indexes	
Total Cells Inserted:                1 billion
Total Time Taken:                    45 minutes
Aggregate Throughput (inserts/s):    372,362
Aggregate Throughput (bytes/s):      14,763,300
•  Six test machines
o  Dual Six-core Opteron HE Processors
o  24 GB RAM
o  4X 2TB SATA drives
•  Single Indexed column
o  Key: randomly generated 20-byte integer
o  Value: two randomly chosen words from /usr/share/dict/words
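The reported figures can be cross-checked with a little arithmetic. The Python lines below only rearrange the numbers quoted on the slide; treating 1 billion as exactly 10^9 cells is an assumption.

```python
total_cells = 1_000_000_000   # "1 billion", taken as exactly 10**9
inserts_per_s = 372_362       # aggregate throughput from the slide
bytes_per_s = 14_763_300      # aggregate throughput from the slide

# Time implied by the insert rate; should land close to the quoted 45 minutes.
elapsed_minutes = total_cells / inserts_per_s / 60

# Average payload per inserted cell implied by the two throughput figures.
bytes_per_cell = bytes_per_s / inserts_per_s
```

The insert rate implies a run of just under 45 minutes, consistent with the quoted total, and the two throughput figures together imply roughly 40 bytes per cell.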
Secondary  Indexes  (HQL)	
CREATE TABLE products (
title,
section,
info,
category,
INDEX section,
INDEX info,
QUALIFIER INDEX info,
QUALIFIER INDEX category
);
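Conceptually, a value index maps a column's values back to row keys, while a qualifier index records which rows contain a given qualifier at all. The Python sketch below models both with in-memory dictionaries, using cells that appear in the query results later in this deck. It illustrates the idea only and says nothing about how Hypertable stores its index tables; all names are hypothetical.

```python
from collections import defaultdict

# (row key, column family, column qualifier, value)
CELLS = [
    ("B00002VWE0", "info", "actor", "Karen Black"),
    ("B00002VWE0", "info", "actor", "Jack Nicholson"),
    ("B002VWNIDG", "info", "actor", "Shelley Duvall"),
    ("B002VWNIDG", "info", "actor", "Jack Nicholson"),
    ("0307743659", "info", "author", "Stephen King"),
]

value_index = defaultdict(set)      # like INDEX info
qualifier_index = defaultdict(set)  # like QUALIFIER INDEX info
for row, family, qualifier, value in CELLS:
    value_index[(family, qualifier, value)].add(row)
    qualifier_index[(family, qualifier)].add(row)


def rows_where_equals(family, qualifier, value):
    """Rows with family:qualifier = value, found without scanning the table."""
    return sorted(value_index.get((family, qualifier, value), ()))


def rows_where_exists(family, qualifier):
    """Rows that contain family:qualifier with any value."""
    return sorted(qualifier_index.get((family, qualifier), ()))
```

Both lookups cost a dictionary access instead of a full pass over `CELLS`, which is the trade the index makes: extra writes at insert time for cheaper selective reads.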
Secondary  Indexes	
SELECT title
FROM products
WHERE info:actor = "Jack Nicholson";
B00002VWE0 title Five Easy Pieces (1970)
B002VWNIDG title The Shining (1980)
Secondary  Indexes	
SELECT title, info:author
FROM products
WHERE info:author =~ /^Stephen [PK]/;
0307743659 title The Shining Mass Market Paperback
0307743659 info:author Stephen King
0321776402 title C++ Primer Plus (6th Edition) (Developer's Library)
0321776402 info:author Stephen Prata
Secondary  Indexes	
SELECT title
FROM products
WHERE Exists(info:studio);
B00002VWE0 title Five Easy Pieces (1970)
B000Q66J1M title 2001: A Space Odyssey [Blu-ray]
B002VWNIDG title The Shining (1980)
Secondary  Indexes	
SELECT title
FROM products
WHERE info:author =~ /^Stephen P/ OR
info:publisher =~ /^Anchor/;
0307743659 title The Shining Mass Market Paperback
0321776402 title C++ Primer Plus (6th Edition) (Developer's Library)
Secondary  Indexes	
SELECT title
FROM products
WHERE info:author =~ /^Stephen [PK]/ AND
info:publisher =~ /^Anchor/;
0307743659 title The Shining Mass Market Paperback
Secondary  Indexes	
SELECT title
FROM products
WHERE ROW =^ 'B' AND
info:actor = 'Jack Nicholson';
B00002VWE0 title Five Easy Pieces (1970)
B002VWNIDG title The Shining (1980)
Regex  Filtering	
•  Google’s RE2 regular expression engine
o  Extremely fast (up to 50X Java regex)
o  Searches run in time linear in the size of the
input
o  Searches constrained to a fixed amount of
memory
•  Supported Searches:
o  Row key
o  Column qualifier
o  Value
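To make the matching semantics concrete, the Python sketch below applies unanchored regular expressions to row keys and values taken from the example results in this deck. Note the assumptions: Python's `re` module is a backtracking engine, not RE2, so it shows only what the patterns match and carries none of RE2's linear-time or bounded-memory guarantees; `filter_matching` is a hypothetical helper.

```python
import re

ROW_KEYS = ["0307743659", "0321321928", "0321776402",
            "B00002VWE0", "B000Q66J1M", "B002VWNIDG"]
AUTHORS = ["Stephen King", "Stephen C. Dewhurst", "Stephen Prata"]


def filter_matching(pattern: str, items):
    """Keep items in which `pattern` matches anywhere (unanchored search)."""
    compiled = re.compile(pattern)
    return [item for item in items if compiled.search(item)]


rows_with_2 = filter_matching("2", ROW_KEYS)
authors_p_or_k = filter_matching(r"^Stephen [PK]", AUTHORS)
```

A bare pattern such as `2` matches anywhere in the key, while `^` anchors the match to the start, which is why the second filter drops "Stephen C. Dewhurst".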
Regex  Filtering	
SELECT info:/^a/ FROM products;
0307743659 info:author Stephen King
0321321928 info:author Stephen C. Dewhurst
0321776402 info:author Stephen Prata
B00002VWE0 info:actor Karen Black
B00002VWE0 info:actor Jack Nicholson
B000Q66J1M info:actor Gary Lockwood
B000Q66J1M info:actor Keir Dullea
B002VWNIDG info:actor Shelley Duvall
B002VWNIDG info:actor Jack Nicholson
Regex  Filtering	
SELECT title
FROM products
WHERE ROW REGEXP "2";
0321321928 title C++ Common Knowledge: Essential Intermediate Programming [Paperback]
0321776402 title C++ Primer Plus (6th Edition) (Developer's Library)
B00002VWE0 title Five Easy Pieces (1970)
B002VWNIDG title The Shining (1980)
Regex  Filtering	
SELECT title
FROM products
WHERE VALUE REGEXP "\(";
0321776402 title C++ Primer Plus (6th Edition) (Developer's Library)
B00002VWE0 title Five Easy Pieces (1970)
B002VWNIDG title The Shining (1980)
Hadoop  MapReduce	
•  MapReduce Input/Output formats
o  Normal (mapreduce)
o  Streaming (mapred)
•  Load data from HT to Hive and vice-versa
•  Use Hive types
•  Use Hive QL (joins, aggregations)
•  Low latency data warehousing
•  Uses Hypertable’s native MapReduce Input/Output
format
Column  Family  Options	
•  TTL=<t>
o  “time to live”
o  Remove cells that are older than <t>
•  MAX_VERSIONS=<n>
o  Keep only most recent <n> cell versions
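The effect of the two options on a single cell's version list can be sketched in a few lines of Python. This is an illustration of the retention rules as stated above, not Hypertable's compaction code: `retain` is a hypothetical function, timestamps and the TTL are assumed to share one unit, and applying the TTL before the version limit is an assumption.

```python
def retain(versions, now, ttl=None, max_versions=None):
    """Return the versions that survive, newest first.

    versions: list of (timestamp, value) pairs for one cell.
    ttl: drop versions whose age (now - timestamp) exceeds this value.
    max_versions: keep at most this many of the most recent versions.
    """
    kept = sorted(versions, key=lambda version: version[0], reverse=True)
    if ttl is not None:
        kept = [version for version in kept if now - version[0] <= ttl]
    if max_versions is not None:
        kept = kept[:max_versions]
    return kept


VERSIONS = [(100, "a"), (200, "b"), (300, "c")]
```

With `now=310`, a TTL of 150 keeps the versions written at 300 and 200, while `max_versions=1` keeps only the one written at 300.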
Access  Groups	
CREATE TABLE User (
name,
address,
photo,
profile,
ACCESS GROUP default (name, address, photo),
ACCESS GROUP profile (profile)
);
Adaptive  
Memory  Allocation
Group  Commit	
•  Supports highly concurrent updates
•  Trades average latency for better throughput
•  By default, commit log writes are auto-coalesced
•  Commit log write interval can be statically
configured per-table:
CREATE TABLE counts (
url,
domain
) GROUP_COMMIT_INTERVAL=100;
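The latency-for-throughput trade can be illustrated by grouping updates that arrive within the same interval into a single commit log write. The Python sketch below is a simplified model: it assumes the interval is in milliseconds and that flushes happen on fixed window boundaries, and `group_commits` is a hypothetical name rather than anything in Hypertable.

```python
def group_commits(arrival_times_ms, interval_ms):
    """Group update arrival times into batches, one batch per interval window.

    Each batch stands for one commit log write covering every update that
    arrived during that window.
    """
    batches = {}
    for arrival in arrival_times_ms:
        window = arrival // interval_ms
        batches.setdefault(window, []).append(arrival)
    return [batches[window] for window in sorted(batches)]


ARRIVALS = [5, 40, 99, 100, 250]
BATCHES = group_commits(ARRIVALS, interval_ms=100)
```

Five updates become three log writes here; an update arriving at 5 ms waits for the end of its window, which is the added latency paid for fewer, larger writes.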
Caching	
•  Block Cache
o  Caches CellStore blocks
o  Can be configured to store blocks compressed or
uncompressed (default = compressed)
o  Dynamically adjusted size based on workload
•  Query Cache
o  Caches query results
o  Caches single row queries only
Compression	
•  Cell Store blocks are compressed
•  Commit Log updates are compressed
•  Supported Compression Schemes:
bmz, lzo, quicklz, snappy, zlib, none
•  Quicklz performance numbers:
Language    Compression Speed (MB/s)    Decompression Speed (MB/s)
C++         308                         358
Java        127                         95
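Of the listed schemes, zlib ships with the Python standard library, so a block-level round trip is easy to sketch. The example below compresses a synthetic, highly repetitive block and restores it. The block contents and helper names are made up for illustration, and nothing here reflects the CellStore block layout or the quicklz figures above.

```python
import zlib


def compress_block(block: bytes, level: int = 6) -> bytes:
    """Compress one block with zlib at the given level (1 = fast, 9 = small)."""
    return zlib.compress(block, level)


def decompress_block(data: bytes) -> bytes:
    """Restore the original block bytes."""
    return zlib.decompress(data)


# Synthetic block: sorted, repetitive cell data tends to compress well.
BLOCK = b"info:actor\tJack Nicholson\n" * 1000
COMPRESSED = compress_block(BLOCK)
RATIO = len(BLOCK) / len(COMPRESSED)
```

Real data will compress far less than this synthetic block; the point is only that the round trip is lossless and that the choice of scheme trades CPU time against stored bytes.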
Performance  Study
Hypertable  vs.  HBase	
•  Modeled after test described in Bigtable paper
•  Hypertable 0.9.5.5 vs. HBase 0.90.4
•  16-node Cluster
o  CPU: 2X AMD C32 Six-core model 4170 HE 2.1GHz
o  RAM: 24GB
o  Disk: 4X 2TB SATA
•  Tests Run
o  Random Write
o  Scan
o  Random Read Zipfian
o  Random Read Uniform
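The two random read tests differ in how keys are chosen: a uniform workload touches every key equally often, while a Zipfian one concentrates reads on a few hot keys, which favors caching. The Python sketch below generates both kinds of key sequence. The key count, sample count, exponent `s`, and seed are assumptions for illustration, since the slides do not state the parameters used in the study.

```python
import random


def zipfian_keys(n_keys: int, n_samples: int, s: float = 1.0, seed: int = 0):
    """Sample key indexes where the key of rank r has weight 1 / r**s."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** s) for rank in range(1, n_keys + 1)]
    return rng.choices(range(n_keys), weights=weights, k=n_samples)


def uniform_keys(n_keys: int, n_samples: int, seed: int = 0):
    """Sample key indexes with equal probability."""
    rng = random.Random(seed)
    return [rng.randrange(n_keys) for _ in range(n_samples)]


ZIPF_SAMPLE = zipfian_keys(n_keys=1000, n_samples=10_000)
UNIFORM_SAMPLE = uniform_keys(n_keys=1000, n_samples=10_000)
```

Counting occurrences in `ZIPF_SAMPLE` shows key 0 requested far more often than key 999, whereas the uniform sample spreads requests evenly.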
Random  Write
Scan
Random  Read  Zipfian
Case  Studies
Case  Study:  Noah  System
•  Operational Data Store
•  System metrics
o  CPU
o  Memory
o  IO
o  Network
•  Application metrics
o  Web
o  DB
o  Caches
•  Business metrics
o  Usage
o  Revenue
System  Requirements
•  Storage Capacity
o  Up to 100TB
o  Up to 1 trillion records
•  Automatic Sharding
o  Irregular data growth patterns
•  Heavy Writes
o  ~30K inserts/s
•  Fast Reads of Recent Data
•  Table Scans
Architecture  Diagram
Case  Study:  Rediff
•  2nd Largest Indian Internet Portal
•  Rediffmail
o  One of the world’s largest email services
o  Over 100 Million registered users
•  Active Deployments
o  Rediffmail
o  Email SPAM classification
o  News Crawl Database
o  Recommendation System
Architectural  Overview
Query  Latency
Summary	
•  High Performance
•  Open Source
•  Future Direction SQL
The  End

Ranking and Scoring Exercises for Research
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 

Hypertable - massively scalable nosql database

  • 1. Hypertable     An  Open  Source,   High  Performance,   Massively  Scalable  Database     Doug Judd CEO Hypertable Inc.
  • 2. Three  Reasons  to   Choose  Hypertable •  High Performance •  Open Source •  Future Direction SQL
  • 4. Highlights •  Modeled after Google’s Bigtable database •  High Performance Implementation (C++) •  Apache Thrift interface for all popular languages (Java, PHP, Ruby, Python, Perl, etc) •  Broad Hadoop distribution support o  Apache 2 o  Cloudera CDH3, CDH4, CDH5 o  IBM BigInsights 3 o  Hortonworks HDP2 o  MapR •  Actively developed for 8 years
  • 5. Open  Source •  Licensed under the GPL •  Hosted on GitHub o  git://github.com/hypertable/hypertable.git o  https://github.com/hypertable/hypertable.git •  Online source documentation •  Mailing Lists o  groups.google.com/group/hypertable-user o  groups.google.com/group/hypertable-dev
  • 6. Bigtable •  Google’s most successful scalable database •  Bigtable underpins 100+ Google services •  YouTube, Blogger, Google Earth, Google Maps, Orkut, Gmail, Google Analytics, Google Book Search, Google Code, Crawl Database, Google Code … •  Data is physically ordered by primary key – it’s not a distributed hash table
  • 7. How  Hypertable  Differs  From   A  Traditional  RDBMS •  Horizontally Scalable •  Sparse Table Structure o  Variable number of columns per-row o  Rows can have billions of columns •  Cells can have multiple time stamped versions
  • 8. Database  Model •  Sparse, two-dimensional tables •  Cells can have multiple versions •  Cells addressed by 4-part key o  Row o  Column family o  Column qualifier o  Timestamp
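The four-part addressing described on slide 8 can be pictured with a toy in-memory model. This is only a sketch for intuition: the class name `SparseTable` and its methods are hypothetical and are not part of Hypertable's API.

```python
class SparseTable:
    """Toy model: every cell is addressed by (row, family, qualifier, timestamp)."""

    def __init__(self):
        # Only cells that were actually written take up space (sparse).
        self._cells = {}

    def put(self, row, family, qualifier, timestamp, value):
        self._cells[(row, family, qualifier, timestamp)] = value

    def get(self, row, family, qualifier):
        """Return every stored version of one cell, newest first."""
        versions = [
            (ts, value)
            for (r, f, q, ts), value in self._cells.items()
            if (r, f, q) == (row, family, qualifier)
        ]
        return sorted(versions, key=lambda tv: tv[0], reverse=True)

    def columns(self, row):
        """Return the distinct (family, qualifier) pairs present in one row."""
        return sorted({(f, q) for (r, f, q, _ts) in self._cells if r == row})


table = SparseTable()
table.put("B002VWNIDG", "title", "", 1, "The Shining")
table.put("B002VWNIDG", "title", "", 2, "The Shining (1980)")
table.put("B002VWNIDG", "info", "actor", 1, "Jack Nicholson")
table.put("0307743659", "info", "author", 1, "Stephen King")
```

Each row carries only the columns that were written to it, and a repeated write to the same cell with a new timestamp adds a version instead of overwriting.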
  • 11. Anatomy  of  a  Key •  Column Family is 8-bit •  Timestamp and Revision are 64-bit integer nanoseconds since Epoch •  Simple byte-wise comparison
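To illustrate why a simple byte-wise comparison is enough to order keys, here is a minimal sketch of one possible serialization. The exact layout (NUL-terminated row and qualifier, big-endian fields) is an assumption made for illustration and is not taken from the slides; only the 8-bit column family and 64-bit timestamp widths come from the source.

```python
import struct


def encode_key(row: str, family: int, qualifier: str, timestamp_ns: int) -> bytes:
    """Serialize a key so that comparing the bytes orders the fields in sequence.

    Assumed layout (hypothetical): row + NUL, 1-byte family, qualifier + NUL,
    8-byte big-endian timestamp. Row and qualifier must not contain NUL.
    """
    if not 0 <= family <= 0xFF:
        raise ValueError("column family must fit in 8 bits")
    if "\x00" in row or "\x00" in qualifier:
        raise ValueError("row and qualifier must not contain NUL")
    return (
        row.encode("utf-8") + b"\x00"
        + struct.pack(">B", family)
        + qualifier.encode("utf-8") + b"\x00"
        + struct.pack(">Q", timestamp_ns)
    )


keys = [
    ("b", 1, "x", 5),
    ("a", 2, "y", 1),
    ("a", 1, "z", 9),
    ("a", 1, "z", 10),
    ("ab", 0, "", 0),
]
# Sorting the raw bytes gives the same order as sorting the logical tuples.
by_bytes = sorted(keys, key=lambda k: encode_key(*k))
```

Big-endian integers and terminated strings are what make the plain byte order agree with the field-by-field order, so no field-aware comparator is needed.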
  • 29. Cluster Task Automation Tool •  ht_cluster •  Modeled after Capistrano •  Role o  Designates a function or service and the set of machines that will perform that function or service o  Examples: Hyperspace, Master, Slave (RangeServer), ThriftBroker o  Machines can belong to one or more roles •  Task o  Script written for specific roles and used to manage the associated function or service o  Examples: start_hyperspace, stop_hyperspace
  • 31. Common  Tasks ht cluster start ht cluster stop ht cluster push_config ht cluster install_package ht cluster upgrade
  • 34. Thrift Broker Metrics (metric: units) •  Connections: count •  Requests: requests/s •  Errors: errors/s •  Virtual Memory: GB •  Resident Memory: GB •  Heap Size: GB •  Heap Slack Bytes: GB •  CPU user: percentage •  CPU sys: percentage •  Version: string
  • 35. Range Server Metrics (metric: units) •  Scans: scans/s •  Updates: updates/s •  Bytes Returned: bytes/s •  Bytes Scanned: bytes/s •  Byte Scan Yield: percentage •  Bytes Written: bytes/s •  Cells Returned: cells/s •  Cells Scanned: cells/s •  Cell Scan Yield: percentage •  Outstanding Scanners: count •  Request Backlog: count •  Major Compactions: count •  Minor Compactions: count •  Merging Compactions: count •  GC Compactions: count •  Virtual Memory: GB •  Resident Memory: GB •  Heap Size: GB •  Heap Slack Bytes: GB •  Tracked Memory: GB •  CPU user: percentage •  CPU sys: percentage
  • 36. Range Server Metrics, continued (metric: units) •  Ranges: count •  CellStores: count •  Block Cache Hits: percentage •  Block Cache Memory: GB •  Block Cache Fill: GB •  Query Cache Hits: percentage •  Query Cache Memory: GB •  Query Cache Fill: GB •  Version: string
  • 37. FS Broker Metrics (metric: units) •  Read Throughput: MB/s •  Write Throughput: MB/s •  Syncs: syncs/s •  Sync Latency: milliseconds •  Errors: count •  JVM GCs: count •  JVM GC Time: milliseconds •  JVM Heap Size: GB •  Virtual Memory: GB •  Resident Memory: GB •  Heap Size: GB •  Heap Slack Bytes: GB •  CPU user: percentage •  CPU sys: percentage •  Version: string
  • 38. Master and Hyperspace Metrics (metric: units) •  Master o  Operations: operations/s o  Virtual Memory: GB o  Resident Memory: GB o  Heap Size: GB o  Heap Slack Bytes: GB o  CPU user: percentage o  CPU sys: percentage o  Version: string •  Hyperspace o  Requests: requests/s o  Virtual Memory: GB o  Resident Memory: GB o  Heap Size: GB o  Heap Slack Bytes: GB o  CPU user: percentage o  CPU sys: percentage o  Version: string
  • 39. Slow  Query  Log •  ThriftBroker feature •  Logs queries that take longer than 10 seconds •  Log line format o  End time (seconds) o  Start time (seconds) o  Function called o  Client IP/port o  Latency (milliseconds) o  Sub-scanner count o  Bytes Returned o  Bytes Scanned o  Disk read o  Servers contacted o  Namespace o  HQL representation of query
  • 42. Namespaces USE '/'; CREATE NAMESPACE foo; USE foo; CREATE NAMESPACE bar; CREATE TABLE mytable (a, b, c); GET LISTING; (bar) namespace mytable
  • 43. Atomic Counters •  Column option: CREATE TABLE counts ( url COUNTER ); •  Modified via existing API using specially formatted values (value format: description) o  [+]n: increment counter by n o  -n: decrement counter by n o  =n: reset counter to n
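The three counter value formats on slide 43 can be sketched as a small interpreter. This only models the semantics listed on the slide; the function name `apply_counter_value` is hypothetical, and how the server actually parses and applies these values is not described in the source.

```python
import re

# Optional leading "+", "-" or "=", followed by a non-negative integer n.
_COUNTER_VALUE = re.compile(r"^([=+-])?(\d+)$")


def apply_counter_value(current: int, value: str) -> int:
    """Return the new counter value after applying one formatted update."""
    match = _COUNTER_VALUE.match(value.strip())
    if match is None:
        raise ValueError(f"not a counter value: {value!r}")
    op, n = match.group(1), int(match.group(2))
    if op == "=":
        return n            # "=n" resets the counter to n
    if op == "-":
        return current - n  # "-n" decrements by n
    return current + n      # "+n" or bare "n" increments by n


counter = 0
for update in ["+5", "3", "-2", "=10", "1"]:
    counter = apply_counter_value(counter, update)
```

Because the operation is carried inside the value, a client can use the ordinary insert path to adjust a counter without first reading it.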
  • 44. Secondary Indexes •  Total cells inserted: 1 billion •  Total time taken: 45 minutes •  Aggregate throughput: 372,362 inserts/s (14,763,300 bytes/s) •  Six test machines o  Dual six-core Opteron HE processors o  24 GB RAM o  4X 2TB SATA drives •  Single indexed column o  Key: randomly generated 20-byte integer o  Value: two randomly chosen words from /usr/share/dict/words
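As a quick consistency check on the slide 44 figures, the arithmetic below derives the elapsed time and the average payload per insert from the two stated throughput numbers. The variable and function names are illustrative; only the three input figures come from the slide.

```python
TOTAL_CELLS = 1_000_000_000   # "1 billion" cells inserted
INSERTS_PER_SECOND = 372_362  # aggregate throughput (inserts/s)
BYTES_PER_SECOND = 14_763_300 # aggregate throughput (bytes/s)


def elapsed_minutes(total_cells: int, inserts_per_second: float) -> float:
    """Time needed to insert total_cells at a constant rate, in minutes."""
    return total_cells / inserts_per_second / 60


def bytes_per_insert(bytes_per_second: float, inserts_per_second: float) -> float:
    """Average number of bytes carried by each insert."""
    return bytes_per_second / inserts_per_second


minutes = elapsed_minutes(TOTAL_CELLS, INSERTS_PER_SECOND)
payload = bytes_per_insert(BYTES_PER_SECOND, INSERTS_PER_SECOND)
print(f"{minutes:.1f} minutes, {payload:.1f} bytes per insert")
```

The result is a little under 45 minutes, in line with the rounded time on the slide, and just under 40 bytes per insert, which is plausible for a 20-byte key plus two dictionary words.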
  • 45. Secondary  Indexes  (HQL) CREATE TABLE products ( title, section, info, category, INDEX section, INDEX info, QUALIFIER INDEX info, QUALIFIER INDEX category );
  • 46. Secondary Indexes SELECT title FROM products WHERE info:actor = "Jack Nicholson"; B00002VWE0 title Five Easy Pieces (1970) B002VWNIDG title The Shining (1980)
  • 47. Secondary  Indexes SELECT title, info:author FROM products WHERE info:author =~ /^Stephen [PK]/; 0307743659 title The Shining Mass Market Paperback 0307743659 info:author Stephen King 0321776402 title C++ Primer Plus (6th Edition) (Developer's Library) 0321776402 info:author Stephen Prata
  • 48. Secondary  Indexes SELECT title FROM products WHERE Exists(info:studio); B00002VWE0 title Five Easy Pieces (1970) B000Q66J1M title 2001: A Space Odyssey [Blu-ray] B002VWNIDG title The Shining (1980)
  • 49. Secondary  Indexes SELECT title FROM products WHERE info:author =~ /^Stephen P/ OR info:publisher =~ /^Anchor/; 0307743659 title The Shining Mass Market Paperback 0321776402 title C++ Primer Plus (6th Edition) (Developer's Library)
  • 50. Secondary  Indexes SELECT title FROM products WHERE info:author =~ /^Stephen [PK]/ AND info:publisher =~ /^Anchor/; 0307743659 title The Shining Mass Market Paperback
  • 51. Secondary  Indexes SELECT title FROM products WHERE ROW =^ 'B' AND info:actor = 'Jack Nicholson'; B00002VWE0 title Five Easy Pieces (1970) B002VWNIDG title The Shining (1980)
  • 52. Regex  Filtering •  Google’s RE2 regular expression engine o  Extremely fast (up to 50X Java regex) o  Searches run in time linear in the size of the input o  Searches constrained to a fixed amount of memory •  Supported Searches: o  Row key o  Column qualifier o  Value
  • 53. Regex  Filtering SELECT info:/^a/ FROM products; 0307743659 info:author Stephen King 0321321928 info:author Stephen C. Dewhurst 0321776402 info:author Stephen Prata B00002VWE0 info:actor Karen Black B00002VWE0 info:actor Jack Nicholson B000Q66J1M info:actor Gary Lockwood B000Q66J1M info:actor Keir Dullea B002VWNIDG info:actor Shelley Duvall B002VWNIDG info:actor Jack Nicholson
  • 54. Regex  Filtering SELECT title FROM products WHERE ROW REGEXP "2"; 0321321928 title C++ Common Knowledge: Essential Intermediate Programming [Paperback] 0321776402 title C++ Primer Plus (6th Edition) (Developer's Library) B00002VWE0 title Five Easy Pieces (1970) B002VWNIDG title The Shining (1980)
  • 55. Regex  Filtering SELECT title FROM products WHERE VALUE REGEXP "("; 0321776402 title C++ Primer Plus (6th Edition) (Developer's Library) B00002VWE0 title Five Easy Pieces (1970) B002VWNIDG title The Shining (1980)
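The row-key and column-qualifier filters on slides 53 and 54 can be mimicked on a small sample using the row keys and qualifiers that appear in the slides. Note the swap: this sketch uses Python's standard `re` module, which is a backtracking engine and does not offer RE2's linear-time guarantee; it only shows which items the patterns select. The helper name `filter_matching` is hypothetical.

```python
import re

# Row keys taken from the query results shown on slides 53 and 54.
ROW_KEYS = [
    "0307743659",
    "0321321928",
    "0321776402",
    "B00002VWE0",
    "B000Q66J1M",
    "B002VWNIDG",
]

# Qualifiers of the "info" column family that appear in the slides.
INFO_QUALIFIERS = ["author", "actor", "publisher", "studio"]


def filter_matching(items, pattern):
    """Keep the items in which the pattern matches anywhere (unanchored search)."""
    compiled = re.compile(pattern)
    return [item for item in items if compiled.search(item)]


# Analogue of: SELECT title FROM products WHERE ROW REGEXP "2";
rows_with_2 = filter_matching(ROW_KEYS, "2")

# Analogue of: SELECT info:/^a/ FROM products;
a_qualifiers = filter_matching(INFO_QUALIFIERS, "^a")
```

The row filter keeps the same four keys listed in the slide 54 result, and the qualifier filter keeps `author` and `actor`, matching the columns returned on slide 53.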
  • 56. Hadoop  MapReduce •  MapReduce Input/Output formats o  Normal (mapreduce) o  Streaming (mapred)
  • 57. •  Load data from HT to Hive and vice-versa •  Use Hive types •  Use Hive QL (joins, aggregations) •  Low latency data warehousing •  Uses Hypertable’s native MapReduce Input/Output format
  • 58. Column  Family  Options •  TTL=<t> o  “time to live” o  Remove cells that are older than <t> •  MAX_VERSIONS=<n> o  Keep only most recent <n> cell versions
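The effect of the two column family options on slide 58 can be sketched as a pruning step over the versions of one cell. This is an illustration of the stated semantics only: the function name is hypothetical, timestamps and TTL are assumed to share one unit (seconds here), and when the real system performs this pruning is not stated in the source.

```python
def apply_family_options(versions, now, ttl=None, max_versions=None):
    """Return the cell versions that survive TTL and MAX_VERSIONS, newest first.

    versions: list of (timestamp, value) pairs for a single cell.
    ttl: drop versions older than this many time units (None = keep all).
    max_versions: keep only this many most recent versions (None = keep all).
    """
    kept = sorted(versions, key=lambda tv: tv[0], reverse=True)
    if ttl is not None:
        # "Remove cells that are older than <t>"
        kept = [(ts, value) for ts, value in kept if now - ts <= ttl]
    if max_versions is not None:
        # "Keep only most recent <n> cell versions"
        kept = kept[:max_versions]
    return kept


cell = [(100, "v1"), (200, "v2"), (300, "v3"), (400, "v4")]
recent = apply_family_options(cell, now=450, ttl=200)
latest_two = apply_family_options(cell, now=450, max_versions=2)
both = apply_family_options(cell, now=450, ttl=200, max_versions=1)
```

With `now=450` and `ttl=200`, the versions written at 100 and 200 are older than the limit and are dropped; `max_versions` then trims whatever remains from the oldest end.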
  • 59. Access  Groups CREATE TABLE User ( name, address, photo, profile, ACCESS GROUP default (name, address, photo), ACCESS GROUP profile (profile) );
  • 61. Group  Commit •  Supports highly concurrent updates •  Trades average latency for better throughput •  By default, commit log writes are auto-coalesced •  Commit log write interval can be statically configured per-table: CREATE TABLE counts ( url, domain ) GROUP_COMMIT_INTERVAL=100;
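The latency-for-throughput trade on slide 61 can be shown with a toy commit log that coalesces updates arriving within one interval into a single write. The class `GroupCommitLog` is hypothetical, the interval is assumed to be in milliseconds, and the clock is passed in explicitly to keep the sketch deterministic; none of this reflects Hypertable's actual implementation.

```python
class GroupCommitLog:
    """Buffers updates and writes them to the log at most once per interval."""

    def __init__(self, interval_ms: int):
        self.interval_ms = interval_ms
        self.pending = []        # updates waiting for the next log write
        self.writes = []         # each entry is one coalesced log write
        self._last_flush_ms = 0

    def append(self, now_ms: int, update: str) -> None:
        """Queue an update; flush if the commit interval has elapsed."""
        self.pending.append(update)
        if now_ms - self._last_flush_ms >= self.interval_ms:
            self.flush(now_ms)

    def flush(self, now_ms: int) -> None:
        """Write all pending updates as a single log write."""
        if self.pending:
            self.writes.append(list(self.pending))
            self.pending.clear()
        self._last_flush_ms = now_ms


log = GroupCommitLog(interval_ms=100)
for now_ms, update in [(10, "a"), (50, "b"), (120, "c"), (130, "d"), (230, "e")]:
    log.append(now_ms, update)
```

Five updates result in two log writes here. Updates "a" and "b" wait for the interval to elapse before they become durable, which is the added latency the slide refers to.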
  • 62. Caching •  Block Cache o  Caches CellStore blocks o  Can be configured to store blocks compressed or uncompressed (default = compressed) o  Dynamically adjusted size based on workload •  Query Cache o  Caches query results o  Caches single row queries only
  • 63. Compression •  Cell Store blocks are compressed •  Commit Log updates are compressed •  Supported Compression Schemes: bmz, lzo, quicklz, snappy, zlib, none •  Quicklz performance numbers o  C++: 308 MB/s compression, 358 MB/s decompression o  Java: 127 MB/s compression, 95 MB/s decompression
  • 65. Hypertable  vs.  HBase •  Modeled after test described in Bigtable paper •  Hypertable 0.9.5.5 vs. HBase 0.90.4 •  16-node Cluster o  CPU: 2X AMD C32 Six-core model 4170 HE 2.1GHz o  RAM: 24GB o  Disk: 4X 2TB SATA •  Tests Run o  Random Write o  Scan o  Random Read Zipfian o  Random Read Uniform
  • 67. Scan
  • 70. Case Study: Noah System •  Operational Data Store •  System metrics o  CPU o  Memory o  IO o  Network •  Application metrics o  Web o  DB o  Caches •  Business metrics o  Usage o  Revenue
  • 71. System Requirements •  Storage Capacity o  Up to 100TB o  Up to 1 trillion records •  Automatic Sharding o  Irregular data growth patterns •  Heavy Writes o  ~30K inserts/s •  Fast Reads of Recent Data •  Table Scans
  • 73. Case Study: Rediff •  2nd Largest Indian Internet Portal •  Rediffmail o  One of the world's largest email services o  Over 100 Million registered users •  Active Deployments o  Rediffmail o  Email SPAM classification o  News Crawl Database o  Recommendation System
  • 76. Summary •  High Performance •  Open Source •  Future Direction SQL