Cloudcon East Presentation

The Problem
• Hundreds of millions of products

• Millions of new products every week

• Accelerating growth

Scaling up is no
longer an option.

Requirements
➡ OLTP optimized
➡ Constant response time is more
important than low latency
➡ Expand capacity online
➡ Ability to move records
➡ No re-sharding

2 years ago
Clustered         Non-relational  Unstructured
Oracle RAC                        Hadoop
MS SQL Server                     MogileFS
MySQL Cluster                     S3
Teradata (OLAP)

Today
Clustered         Non-relational  Unstructured
Oracle RAC        Hypertable      Hadoop
MS SQL Server     HBase           MogileFS
MySQL Cluster     CouchDB         S3
DB2               SimpleDB
Teradata (OLAP)
HiveDB

System         Storage         Interface  Partitioning         Expansion             Node Types  Maturity
Oracle RAC     Shared          SQL        Transparent          No downtime           Identical   7 years
MySQL Cluster  Memory / Local  SQL        Transparent          Requires Restart      Mixed       3 years
Hypertable     Local           HQL        Transparent          No downtime           Mixed       ? (released 2/08)
DB2            Local           SQL        Fixed Hash           Degraded Performance  Identical   3 years (25 years total)
HiveDB         Local           SQL        Key-based Directory  No downtime           Mixed       18 months (+13 years!)

Operating a
distributed system
is hard.

Non-functional
concerns quickly
dominate functional
concerns.

Replication
Fail over
Monitoring

Our approach:
minimize cleverness

Build atop solid,
easy to manage
technologies.

So we can
get back
to selling
t-shirts.

The simplest thing
➡ HiveDB is a JDBC Gatekeeper for
applications.
➡ Lightest integration
➡ Smallest possible implementation

The eBay approach
➡ Pro: Internally consistent data
➡ Con: Re-sharding of data required
➡ Con: Re-sharding is a DBA
operation

Partition by key
A simple bucketed partitioning scheme
expressed in code.
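
A bucketed scheme of this kind can be sketched in a few lines. The class below is a hypothetical illustration, not HiveDB or eBay code: the name BucketPartitioner, the choice of CRC32 as the hash, and the bucket and partition counts are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical sketch of bucketed partitioning by key (not taken from HiveDB). */
final class BucketPartitioner {
    private final int bucketCount;    // fixed number of logical buckets
    private final int partitionCount; // number of physical databases

    BucketPartitioner(int bucketCount, int partitionCount) {
        if (bucketCount <= 0 || partitionCount <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        this.bucketCount = bucketCount;
        this.partitionCount = partitionCount;
    }

    /** Maps a key to a logical bucket using a hash that is stable across JVMs. */
    int bucketFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % bucketCount); // getValue() is non-negative
    }

    /** Maps the key's bucket to a physical partition. */
    int partitionFor(String key) {
        return bucketFor(key) % partitionCount;
    }
}
```

Because the mapping lives in code, changing partitionCount changes where existing keys resolve, so the stored rows have to be moved to match. That is the re-sharding cost listed under the eBay approach.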

Directory
➡ No broadcasting
➡ No re-partitioning
➡ Easy to relocate records
➡ Easy to add capacity
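
The directory idea can be illustrated with an in-memory stand-in. This is a minimal sketch under stated assumptions: the class PartitionDirectory and its methods are hypothetical names, and a real directory would be persistent rather than a map held in one process.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a key-to-partition directory. */
final class PartitionDirectory {
    private final Map<String, Integer> keyToPartition = new ConcurrentHashMap<>();

    /** Records which partition holds the given key. */
    void insert(String key, int partitionId) {
        keyToPartition.put(key, partitionId);
    }

    /** Returns the one partition to query, so no broadcast is needed. */
    int lookup(String key) {
        Integer partitionId = keyToPartition.get(key);
        if (partitionId == null) {
            throw new IllegalArgumentException("Unknown key: " + key);
        }
        return partitionId;
    }

    /** Points an existing key at a new partition; other keys are untouched. */
    void relocate(String key, int newPartitionId) {
        if (keyToPartition.replace(key, newPartitionId) == null) {
            throw new IllegalArgumentException("Unknown key: " + key);
        }
    }
}
```

Relocating a record is a single directory update (after the row itself has been copied), and a new partition only needs to start receiving new keys, which is why no re-partitioning step appears.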

Disadvantages
➡ Intelligence moved from the
database to the application
(Queries have to be planned and
indexed)
➡ Can’t join across partitions
➡ NO OLAP! (We consider this a
benefit.)

Hibernate Shards
Partitioned Hibernate from Google

Why did we build
this thing again?

Oh wait, you have
to tell it where
things are.

Benefits of Shards
➡ Unified data access layer
➡ Result set aggregation across
partitions
➡ Everyone in the Java world knows
Hibernate.
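
Result set aggregation across partitions amounts to running the same query everywhere and merging the rows. The sketch below shows only that shape. It does not use the Hibernate Shards API; FanOutQuery and queryAll are hypothetical names, and the per-partition query is passed in as a plain function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of fan-out querying; not the Hibernate Shards API. */
final class FanOutQuery {
    private FanOutQuery() {}

    /** Runs the same query against every partition and concatenates the rows. */
    static <P, R> List<R> queryAll(List<P> partitions, Function<P, List<R>> query) {
        List<R> merged = new ArrayList<>();
        for (P partition : partitions) {
            merged.addAll(query.apply(partition)); // one round trip per partition
        }
        return merged;
    }
}
```

The merge here is a simple concatenation. Sorting, limits, or aggregates over the combined rows would have to be applied by the caller after the merge.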

Working on
simplifying it even
further.

Case Study:
CafePress
➡ Leader in User-generated
Commerce
➡ More products than eBay
(>250,000,000)
➡ 24/7/365

Case Study:
Performance Requirements
➡ Thousands of queries/sec
➡ 10 : 1 read/write ratio
➡ Geographically distributed

Case Study:
Test Environment
➡ Real schema
➡ Production-class hardware
➡ Partial data (~40M records)

CafePress HiveDB 2007
Performance Test Environment
➡ Command & control (100MBit switch):
Measurement Workstation, JMeter (1 thread), client.jar;
Test Controller Workstation, JMeter (no threads), client.jar
➡ Load generators: 2 x Dell 2950 / 2x2 Xeon, 16GB, 6x72GB 15k,
each running JMeter (100s of threads), client.jar
➡ Network: 48GB backplane non-blocking gigabit switch
➡ Web service (hivedb): Hardware LB;
3 x Dell 1950 / 2x2 Xeon, Tomcat 5.5
➡ Databases (mysql): Directory, Partition 0, Partition 1,
each on a Dell 2950 / 2x2 Xeon, 16GB, 6x72GB 15k

jmccarthy@cafepress.com Modified on April 09 2007

Case Study:
Performance Goals
➡ Large object reads: 1500/s

➡ Large object writes: 200/s

➡ Response time: 100ms

Case Study:
Performance Results
➡ Large object reads: 1500/s
Actual result: 2250/s
➡ Large object writes: 200/s
➡ Response time: 100ms
Actual result: 8ms

Case Study:
Performance Results
➡ Max read throughput
(CPU limited in Java layer;
MySQL <25% utilized)

Case Study:
Results
➡ Billions of queries served
➡ Highest DB uptime at CafePress
➡ Hundreds of millions of updates
performed

High Availability
➡ We don’t specify a fail over
strategy.
➡ We don’t specify a backup or
replication strategy.

Fail Over Options
➡ Load balancer
➡ MySQL Proxy
➡ Flipper / MMM
➡ HAProxy

Replication
➡ You should use MySQL replication.
➡ It really works.
➡ You can add in LVM snapshots for
even more disastrous disaster
recovery.

The Bottleneck
There is one problem with HiveDB. The
directory is a scaling bottleneck.

There’s no reason
it has to be a
database.

MemcacheD
➡ Hive directory implemented atop
memcached

Roadmap
➡ MemcacheD directory
➡ Simplified Hibernate support
➡ Management web service
➡ Command-line tools
➡ Framework Adapters

Call for developers!
➡ Grails (GORM)
➡ ActiveRecord (JRuby on Rails)
➡ Ambition (JRuby or web service)
➡ iBatis
➡ JPA
➡ ... Postgres?

Contributing
➡ Post to the mailing list
http://groups.google.com/group/hivedb-dev

➡ Comment on our site
http://www.hivedb.org

➡ Submit a patch / pull request
git clone git://github.com/britt/hivedb.git

Photo Credits
➡ http://www.flickr.com/photos/7362313@N07/1240245941/sizes/o

➡ http://www.flickr.com/photos/99287245@N00/2229322675/sizes/o

➡ http://www.flickr.com/photos/51035555243@N01/26006138/

➡ http://www.flickr.com/photos/vgm8383/2176897085/sizes/o/

➡ http://t-shirts.cafepress.com/item/money-organic-cotton-tee/73439294

Blobject
➡ Gets around the problem of ALTER
statements. No data set of this size
can be transformed synchronously.
➡ The hive can contain multiple
versions of a serialized record.
➡ Compression
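
One way to picture a blobject is a compressed payload tagged with the schema version it was written under. The class below is a hypothetical sketch, not HiveDB's implementation: the field names, the use of GZIP, and the string payload are assumptions chosen to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical versioned, compressed record blob (illustration only). */
final class Blobject {
    final int schemaVersion;        // version of the record layout used when writing
    final byte[] compressedPayload; // GZIP-compressed serialized record

    private Blobject(int schemaVersion, byte[] compressedPayload) {
        this.schemaVersion = schemaVersion;
        this.compressedPayload = compressedPayload;
    }

    /** Compresses an already-serialized record and tags it with its version. */
    static Blobject pack(int schemaVersion, String serialized) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(serialized.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Blobject(schemaVersion, buffer.toByteArray());
    }

    /** Decompresses the payload; the caller picks a parser by schemaVersion. */
    String unpack() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressedPayload))) {
            byte[] chunk = new byte[4096];
            int read;
            while ((read = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Because each blob carries its own version, old and new layouts can coexist in the same table and be upgraded lazily, rather than through a synchronous ALTER over the whole data set.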

Hadoop!
➡ Query and transform blobbed data
asynchronously.
