4. Requirements
➡ OLTP optimized
➡ Constant response time is more
important than low latency
➡ Expand capacity online
➡ Ability to move records
➡ No re-sharding
5. 2 years ago
Clustered: Oracle RAC, MS SQL Server, MySQL Cluster, Teradata (OLAP)
Non-relational: (none)
Unstructured: Hadoop, MogileFS, S3
6. Today
Clustered: Oracle RAC, MS SQL Server, MySQL Cluster, DB2, Teradata (OLAP), HiveDB
Non-relational: Hypertable, HBase, CouchDB, SimpleDB
Unstructured: Hadoop, MogileFS, S3
7.
System         Storage         Interface  Partitioning         Expansion             Node Types  Maturity
Oracle RAC     Shared          SQL        Transparent          No downtime           Identical   7 years
MySQL Cluster  Memory / Local  SQL        Transparent          Requires restart      Mixed       3 years
Hypertable     Local           HQL        Transparent          No downtime           Mixed       ? (released 2/08)
DB2            Local           SQL        Fixed hash           Degraded performance  Identical   3 years (25 years total)
HiveDB         Local           SQL        Key-based directory  No downtime           Mixed       18 months (+13 years!)
24. Directory
➡ No broadcasting
➡ No re-partitioning
➡ Easy to relocate records
➡ Easy to add capacity
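The directory idea above can be sketched in a few lines. This is a minimal illustration, not HiveDB's actual API: the `Directory` class, its method names, the node names, and the "fewest keys" placement rule are all assumptions made for the example.

```python
class Directory:
    """Maps each partition key to the node that currently holds its records."""

    def __init__(self, nodes):
        self.nodes = list(nodes)   # names of the data nodes (hypothetical)
        self.location = {}         # partition key -> node name

    def add_node(self, node):
        # Adding capacity only registers a new node; existing keys stay put,
        # so nothing is re-partitioned.
        self.nodes.append(node)

    def assign(self, key):
        # Assumed placement rule: put a new key on the node holding the fewest keys.
        counts = {node: 0 for node in self.nodes}
        for node in self.location.values():
            counts[node] += 1
        node = min(self.nodes, key=lambda n: counts[n])
        self.location[key] = node
        return node

    def lookup(self, key):
        # One directory read finds the node, so no broadcast is needed.
        return self.location[key]

    def relocate(self, key, node):
        # Moving records means copying the rows (not shown here) and then
        # updating a single directory entry.
        if node not in self.nodes:
            raise ValueError(f"unknown node: {node}")
        self.location[key] = node


directory = Directory(["node_a", "node_b"])
for user_id in (1, 2, 3, 4):
    directory.assign(user_id)

directory.add_node("node_c")      # expand capacity online
directory.relocate(1, "node_c")   # move one key's records to the new node
print(directory.lookup(1))        # node_c
```

Because placement lives in a lookup table rather than in a hash function, adding `node_c` changes nothing for keys that already exist; only keys that are explicitly relocated move.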
26. Disadvantages
➡ Intelligence moved from the
database to the application
(Queries have to be planned and
indexed)
➡ Can’t join across partitions
➡ NO OLAP! (We consider this a
benefit.)
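To make the "can't join across partitions" point concrete, here is a hedged sketch of what the application has to do instead. Two in-memory SQLite databases stand in for partitions; the `users` and `follows` tables, the node names, and the plain dictionary used as a directory are all hypothetical.

```python
import sqlite3


def make_partition(users, follows):
    """Create one stand-in partition with its own copy of the schema."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE follows (user_id INTEGER, followed_id INTEGER)")
    db.executemany("INSERT INTO users VALUES (?, ?)", users)
    db.executemany("INSERT INTO follows VALUES (?, ?)", follows)
    return db


# User 1 lives on node_a; the users they follow live on node_b.
partitions = {
    "node_a": make_partition([(1, "ann")], [(1, 2), (1, 3)]),
    "node_b": make_partition([(2, "bob"), (3, "cy")], []),
}
directory = {1: "node_a", 2: "node_b", 3: "node_b"}


def followed_names(user_id):
    """Application-side replacement for `follows JOIN users`."""
    home = partitions[directory[user_id]]
    rows = home.execute(
        "SELECT followed_id FROM follows WHERE user_id = ? ORDER BY followed_id",
        (user_id,),
    ).fetchall()
    names = []
    for (followed_id,) in rows:
        # Each followed user may live on a different node, so the
        # application plans one lookup per key.
        node = partitions[directory[followed_id]]
        row = node.execute(
            "SELECT name FROM users WHERE id = ?", (followed_id,)
        ).fetchone()
        names.append(row[0])
    return names


print(followed_names(1))  # ['bob', 'cy']
```

The query plan that a single database would choose for the join is now written by hand in `followed_names`, which is the sense in which intelligence moves from the database to the application.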
46. Case Study: Performance Results
➡ Large object reads: 1500/s
Actual result: 2250/s
➡ Large object writes: 200/s
Actual result: 300/s
➡ Response time: 100ms
Actual result: 8ms
56. Roadmap
➡ Memcached directory
➡ Simplified Hibernate support
➡ Management web service
➡ Command-line tools
➡ Framework Adapters
57. Call for developers!
➡ Grails (GORM)
➡ ActiveRecord (JRuby on Rails)
➡ Ambition (JRuby or web service)
➡ iBatis
➡ JPA
➡ ... Postgres?
58. Contributing
➡ Post to the mailing list
http://groups.google.com/group/hivedb-dev
➡ Comment on our site
http://www.hivedb.org
➡ Submit a patch / pull request
git clone git://github.com/britt/hivedb.git
60. Blobject
➡ Gets around the problem of ALTER
statements. No data set of this size
can be transformed synchronously.
➡ The hive can contain multiple
versions of a serialized record.
➡ Compression
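The three points above can be illustrated together in one small sketch. The slides do not say how records are serialized, so JSON, zlib, the version envelope, and the example schema change (splitting `name` into two fields) are all assumptions chosen for illustration.

```python
import json
import zlib

CURRENT_VERSION = 2  # hypothetical schema version


def to_blob(record, version=CURRENT_VERSION):
    """Serialize a record with its schema version, then compress it."""
    payload = json.dumps({"v": version, "data": record}).encode("utf-8")
    return zlib.compress(payload)


def upgrade(version, data):
    """Convert an older record layout to the current one at read time."""
    if version == 1:
        # Assumed change: version 1 stored a single "name" field.
        first, _, last = data["name"].partition(" ")
        data = {"first_name": first, "last_name": last}
    return data


def from_blob(blob):
    """Decompress, deserialize, and upgrade a stored blob."""
    envelope = json.loads(zlib.decompress(blob).decode("utf-8"))
    return upgrade(envelope["v"], envelope["data"])


# Old and new versions can sit side by side; no ALTER statement is needed.
old_blob = to_blob({"name": "Ada Lovelace"}, version=1)
new_blob = to_blob({"first_name": "Alan", "last_name": "Turing"})

print(from_blob(old_blob))  # {'first_name': 'Ada', 'last_name': 'Lovelace'}
print(from_blob(new_blob))  # {'first_name': 'Alan', 'last_name': 'Turing'}
```

Readers always receive the current layout, while old blobs stay as written until something rewrites them, so the schema change never has to touch every row at once.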
61. Hadoop!
➡ Query and transform blobbed data
asynchronously.