2. Scalable table stores are critical systems
Swapnil Patil, CMU 2
• For data processing & analysis (e.g. Pregel, Hive)
• For systems services (e.g., Google Colossus metadata)
3. Evolution of scalable table stores
Simple, lightweight → complex, feature-rich stores
• Supports a broader range of applications and services
• Hard to debug and understand performance problems
• Complex behavior and interaction of various components
[Figure: Growing set of HBase features across releases, 2008-2011+ — range row filters, batch updates, bulk load tools, RegEx filtering, scan optimizations, co-processors, access control]
4. YCSB++ FUNCTIONALITY
ZooKeeper-based distributed and coordinated testing
API and extensions for the new Apache ACCUMULO DB
Fine-grained, correlated monitoring using OTUS
Features tested using YCSB++: batch writing, table pre-splitting, bulk loading, weak consistency, server-side filtering, fine-grained security
Tool released at http://www.pdl.cmu.edu/ycsb++
Need richer tools for understanding
advanced features in table stores …
5. Outline
• Problem
• YCSB++ design
• Illustrative examples
• Ongoing work and summary
6. Yahoo Cloud Serving Benchmark [Cooper2010]
• For CRUD (create-read-update-delete) benchmarking
• Single-node system with an extensible API
[Figure: YCSB architecture — command-line parameters and a workload parameter file configure a workload executor, whose threads drive DB clients (HBASE, other DBs) against the storage servers and collect stats]
7. YCSB++: New extensions
Added support for the new Apache ACCUMULO DB
− New parameters and workload executors
[Figure: YCSB architecture extended with an ACCUMULO DB client and new workload/parameter extensions]
8. YCSB++: Distributed & parallel tests
Multi-client, multi-phase coordination using ZooKeeper
− Enables testing at large scales and testing asymmetric features
[Figure: YCSB architecture with multiple YCSB clients running multi-phase workloads under a shared coordination service]
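The multi-client, multi-phase coordination can be sketched in miniature. YCSB++ uses ZooKeeper barriers across machines; in this illustrative single-process sketch a `threading.Barrier` stands in for the ZooKeeper barrier, and the phase names and client count are invented:

```python
import threading

# Sketch of multi-phase, multi-client coordination (illustration only).
# YCSB++ coordinates real clients via ZooKeeper; here threading.Barrier
# stands in for the distributed barrier so the idea runs in one process.

PHASES = ["load", "read/update"]   # invented phase names
NUM_CLIENTS = 3
barrier = threading.Barrier(NUM_CLIENTS)
log = []
log_lock = threading.Lock()

def client(cid):
    for phase in PHASES:
        # ... run this client's share of the phase's workload here ...
        with log_lock:
            log.append((phase, cid))
        barrier.wait()  # no client starts the next phase until all finish

threads = [threading.Thread(target=client, args=(i,)) for i in range(NUM_CLIENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Because of the barrier, every "load" entry precedes every "read/update" entry.
phases_in_order = [p for p, _ in log]
print(phases_in_order)
```

The barrier is what makes asymmetric tests possible: all writers can be forced to finish a load phase before any reader begins measuring.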
9. YCSB++: Collective monitoring
OTUS monitor built on Ganglia [Ren2011]
− Collects information from YCSB, table stores, HDFS and the OS
[Figure: YCSB architecture with OTUS monitoring collecting metrics from the YCSB clients, table stores, and storage servers]
10. Example of YCSB++ debugging
OTUS collects fine-grained information
− Both the HDFS process and the TabletServer process on the same node
[Figure: Monitoring resource usage and table-store metrics over time — HDFS DataNode CPU usage (%), Accumulo TabletServer CPU usage (%), and Accumulo avg. number of StoreFiles per tablet]
11. Outline
• Problem
• YCSB++ design
• Illustrative examples
− YCSB++ on HBASE and ACCUMULO (Bigtable-like stores)
• Ongoing work and summary
12. Recap of Bigtable-like table stores
[Figure: Tablet server internals — (1) data insertion appends to a write-ahead log and an in-memory Memtable, (2) minor compaction flushes Memtables to sorted, indexed files on HDFS nodes, (3) major compaction merges them into fewer sorted, indexed files]
Write-path: in-memory buffering & async FS writes
1) Mutations logged in memory tables (unsorted order)
2) Minor compaction: Memtables -> sorted, indexed files in HDFS
3) Major compaction: LSM-tree based file merging in background
Read-path: lookup both memtables and on-disk files
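The three-step write path and the read path above can be condensed into a toy sketch. This is an illustration of the LSM idea only, not HBase or Accumulo code; the class name, flush threshold, and in-memory "runs" standing in for on-disk files are all invented:

```python
import bisect

# Minimal LSM-style store sketch. The memtable buffers mutations (step 1),
# "minor compaction" flushes it to a sorted immutable run (step 2), and
# "major compaction" merges runs in the background (step 3).

class TinyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}        # in-memory buffer
        self.runs = []            # stand-ins for sorted, indexed files
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def minor_compaction(self):   # flush memtable to a sorted run
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def major_compaction(self):   # merge runs; newer runs overwrite older
        merged = {}
        for run in self.runs:     # runs are appended oldest-first
            merged.update(dict(run))
        self.runs = [sorted(merged.items())]

    def get(self, key):           # read path: memtable first, then runs
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):          # newest run first
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

db = TinyLSM()
for i in range(10):
    db.put(f"k{i}", i)
db.put("k0", 99)          # overwrite lands in the memtable
db.major_compaction()     # collapses the two flushed runs into one
print(db.get("k0"), db.get("k9"), len(db.runs))
```

Note the read path must consult newest-to-oldest data so an unflushed overwrite in the memtable wins over stale on-disk copies, which is exactly why compactions (fewer files) speed up reads.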
13. Apache ACCUMULO
Started at NSA; now an Apache Incubator project
− Designed for high-speed ingest and scan workloads
− http://incubator.apache.org/projects/accumulo.html
New features in ACCUMULO
− Iterator framework for user-specified programs placed in
between different stages of the DB pipeline
E.g., Support joins and stream processing using iterators
− Also supports fine-grained cell-level access control
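The iterator framework can be pictured as a stack of user programs layered over the sorted key-value stream. The following Python analogy is not Accumulo's actual Java `SortedKeyValueIterator` API; it only mirrors the spirit of stacking a filtering iterator under a combining iterator (the summing behavior is loosely modeled on Accumulo's SummingCombiner):

```python
# Python analogy of Accumulo-style iterators stacked over a sorted
# key-value stream at scan/compaction time (illustration only).

def source(entries):
    # Stand-in for the underlying sorted on-disk/memtable data.
    yield from sorted(entries)

def filter_iter(src, predicate):
    # A filtering iterator layered on another iterator
    # (server-side filtering, e.g. by a predicate on the key).
    return (kv for kv in src if predicate(kv[0]))

def summing_iter(src):
    # A combining iterator: sums values of consecutive equal keys.
    prev_key, total = None, 0
    for key, value in src:
        if key == prev_key:
            total += value
        else:
            if prev_key is not None:
                yield (prev_key, total)
            prev_key, total = key, value
    if prev_key is not None:
        yield (prev_key, total)

entries = [("app:a", 1), ("sys:x", 7), ("app:a", 2), ("app:b", 5)]
result = list(summing_iter(filter_iter(source(entries),
                                       lambda k: k.startswith("app:"))))
print(result)   # [('app:a', 3), ('app:b', 5)]
```

Because each layer consumes a sorted stream and produces one, the same user code can run at scan time, at minor compaction, or at major compaction, which is what enables join-like and streaming computation inside the server.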
15. Client-side batch writing
Feature: clients batch inserts, delaying writes to the server
• Improves insert throughput and latency
• Newly inserted data may not be immediately visible to
other clients
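A minimal sketch of this behavior shows why a second client can miss a fresh write. The classes below are invented for illustration, not the real HBase write buffer or Accumulo BatchWriter API:

```python
# Sketch of client-side batch writing (illustration only). Writes
# accumulate in a client-local buffer and reach the server only on
# flush, so another client may not see a freshly written key yet.

class Server:
    def __init__(self):
        self.table = {}

class BatchingClient:
    def __init__(self, server, batch_size=3):
        self.server = server
        self.buffer = {}
        self.batch_size = batch_size

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # One round trip pushes the whole batch to the server.
        self.server.table.update(self.buffer)
        self.buffer = {}

class ReadingClient:
    def __init__(self, server):
        self.server = server
    def get(self, key):
        return self.server.table.get(key)

server = Server()
writer = BatchingClient(server, batch_size=3)
reader = ReadingClient(server)

writer.put("a", 1)
visible_before_flush = reader.get("a")   # None: still in writer's buffer
writer.put("b", 2)
writer.put("c", 3)                       # buffer full -> implicit flush
visible_after_flush = reader.get("a")
print(visible_before_flush, visible_after_flush)
```

The throughput benefit comes from amortizing one round trip over the whole batch; the visibility gap between the two reads is the weak consistency the next slides measure.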
[Figure: Two YCSB++ store clients with client-side batch buffers in front of the table store servers, coordinated through ZooKeeper; client #2 issues Read{K} for a key written by client #1]
16. Batch writing improves throughput
6 clients creating 9 million 1-Kbyte records on 6 servers
− Small batches - high client CPU utilization, limits throughput
− Large batches - saturate servers, limited benefit from batching
[Figure: Insert throughput (1000s of inserts per second) vs. batch size (10 KB, 100 KB, 1 MB, 10 MB) for HBase and Accumulo]
17. Batch writing causes weak consistency
Test setup: ZooKeeper-based client coordination
• Share producer-consumer queue between readers/writers
• R-W lag = delay before C2 can read C1’s most recent write
[Figure: Test setup — (1) client #1 inserts {K:V} (10^6 records) and (2) enqueues K into a ZooKeeper queue (sampling 1% of records); (3) client #2 polls and dequeues K, then (4) reads {K} until it is visible]
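The measurement loop can be sketched deterministically with a simulated clock. This is an illustration only: the real test uses live clients, a ZooKeeper producer-consumer queue, and wall-clock time, and the insert rate and flush interval below are invented parameters:

```python
import collections

# Sketch of the read-after-write lag measurement with a simulated
# millisecond clock (illustration only).

clock = 0                      # simulated milliseconds
store = {}                     # server-side table
buffer = {}                    # writer's client-side batch buffer
queue = collections.deque()    # stands in for the ZooKeeper queue
lags = []

FLUSH_EVERY = 50               # writer flushes its buffer every 50 "ms"

def writer_insert(key, value):
    buffer[key] = value
    queue.append((key, clock))         # enqueue key + write timestamp

def maybe_flush():
    if clock % FLUSH_EVERY == 0 and buffer:
        store.update(buffer)           # batch becomes visible at once
        buffer.clear()

def reader_poll():
    # Re-read the oldest enqueued key until it becomes visible,
    # then record the lag between write time and first visibility.
    if queue and queue[0][0] in store:
        key, t_write = queue.popleft()
        lags.append(clock - t_write)

for clock in range(1, 201):
    if clock % 10 == 0:                # writer inserts every 10 "ms"
        writer_insert(f"k{clock}", clock)
    maybe_flush()
    reader_poll()

print(lags)
```

Even in this toy model, the lag distribution is shaped by the flush interval: keys written just after a flush wait nearly the whole interval, keys written just before it become visible almost immediately.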
18. Batch writing causes weak consistency
Deferred write wins, but lag can be ~100 seconds
− (N%) = fraction of requests that needed multiple read()s
− Implementation of batching affects the median latency
[Figure: CDFs of read-after-write time lag (ms, log scale) for different buffer sizes — (a) HBase: 10 KB (<1%), 100 KB (7.4%), 1 MB (17%), 10 MB (23%); (b) Accumulo: 10 KB (<1%), 100 KB (1.2%), 1 MB (14%), 10 MB (33%)]
20. Features for high-speed insertions
Most table stores have high-speed ingest features
− Periodically insert large amounts of data or migrate old
data in bulk
− Classic relational DB techniques applied to new stores
Two features: bulk loading and table pre-splitting
• Less data migration during inserts
• Engages more tablet servers immediately
• Need careful tuning and configuration [Shasha2002]
21. 8-phase test setup: table bulk loading
Bulk loading involves two steps
− Hadoop-based data formatting
− Importing store files into table store
Pre-load phase (1 and 2)
− Bulk load 6M rows in an empty table
− Goal: parallelism by engaging all servers
Load phase (4 and 5)
− Load 48M new rows
− Goal: study rebalancing during ingest
R/U measurements (3, 6 and 8)
− Correlate latency with rebalancing work
Phases: (1) Pre-Load (re-formatting), (2) Pre-Load (importing), (3) Read/Update workload, (4) Load (re-formatting), (5) Load (importing), (6) Read/Update workload, (7) Sleep, (8) Read/Update workload
22. Read latency affected by rebalancing work
[Figure: Accumulo read latency (ms, log scale) vs. measurement-phase running time (seconds) for R/U 1 (Phase 3), R/U 2 (Phase 6), and R/U 3 (Phase 8)]
• High latency after high insertion periods that
cause servers to rebalance (compactions)
• Latency drops after store is in a steady state
25. Extending to table pre-splitting
[Figure: Test phases side by side — the bulk-loading test (pre-load re-formatting/importing, R/U, load re-formatting/importing, R/U, sleep, R/U) next to the table pre-splitting test, which replaces the pre-load steps with a pre-split into N ranges followed by pre-load and load phases]
Pre-split a key range into N partitions to avoid splitting during insertion
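The pre-split step reduces to computing N-1 boundary keys up front. The sketch below assumes fixed-width, zero-padded numeric row keys drawn uniformly from a known key space; the function names, widths, and counts are all illustrative (real split APIs, such as passing split keys at table creation, consume boundary keys like these):

```python
import bisect

# Sketch of table pre-splitting: pick N-1 split points so a uniform
# key range maps evenly onto N tablets/regions (illustration only).

def presplit_points(key_space, num_partitions, width=8):
    """Return num_partitions - 1 boundary keys splitting the range evenly."""
    step = key_space // num_partitions
    return [str(i * step).zfill(width) for i in range(1, num_partitions)]

def partition_of(key, splits):
    """Index of the tablet/region a key lands in, given sorted split points."""
    return bisect.bisect_right(splits, key)

# E.g., 9M uniform keys over 6 servers: one tablet per server from the start.
splits = presplit_points(key_space=9_000_000, num_partitions=6)
print(splits)
print(partition_of("00000000", splits), partition_of("08999999", splits))
```

Because every tablet exists before the first insert, all servers take writes immediately and no mid-ingest splitting or data migration is needed, which is the effect the test above isolates.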
26. Outline
• Problem
• YCSB++ design
• Illustrative examples
• Ongoing work and summary
27. Things not covered in this talk
More features: function shipping to servers
− Data filtering at the servers
− Fine-grained, cell-level access control
More details in the ACM SOCC 2011 paper
Ongoing work
− Analyze more table stores: Cassandra, CouchDB, MongoDB
− Continue research through the new Intel Science and
Technology Center for Cloud Computing at CMU (with GaTech)
28. Summary: YCSB++ tool
• Tool for performance debugging and benchmarking advanced features using new extensions to YCSB
• Two case studies: Apache HBASE and ACCUMULO
• Tool available at http://www.pdl.cmu.edu/ycsb++
Features and extensions: weak consistency semantics; fast insertions (pre-splits & bulk loads); server-side filtering; fine-grained access control; distributed clients using ZooKeeper; multi-phase testing (with Hadoop); new workload generators and database client API extensions