SlideShare una empresa de Scribd logo
1 de 28
Descargar para leer sin conexión
YCSB++ BenchmarkingTool
PerformanceDebuggingAdvanced
FeaturesofScalableTableStores
Swapnil Patil
Milo Polte,WittawatTantisiriroj, Kai Ren, Lin Xiao, Julio Lopez, Garth Gibson, Adam Fuchs *, Billie Rinaldi *
CarnegieMellonUniversity *NationalSecurityAgency
Open Cirrus Summit, October 2011, Atlanta GA
Scalable table stores are critical systems
Swapnil Patil, CMU 2
•  For data processing & analysis (e.g. Pregel, Hive)
•  For systems services (e.g., Google Colossus metadata)
Evolution of scalable table stores
Simple, lightweight  complex, feature-rich stores
Supports a broader range of applications and services
Hard to debug and understand performance problems
Complex behavior and interaction of various components
GrowingsetofHBasefeatures
2008 2009 2010 2011+
RangeRowFilters
Batch updates
Bulk load tools
RegEx filtering
Scan optimizations
HBASE
release
Co-processors
Access Control
⏏

⏏

⏏

3Swapnil Patil, CMU
YCSB++ FUNCTIONALITY
ZooKeeper-based distributed and coordinated testing
API and extensions the new Apache ACCUMULO DB
Fine-grained, correlated monitoring usingOTUS
FEATURES TESTED USING YCSB++
Batch writing Table pre-splitting  Bulk loading
Weak consistency  Server-side filtering  Fine-grained security
Tool released at http://www.pdl.cmu.edu/ycsb++
Swapnil Patil, CMU 4
Need richer tools for understanding
advanced features in table stores …
Outline
•  Problem
•  YCSB++ design
•  Illustrative examples
•  Ongoing work and summary
Swapnil Patil, CMU 5
Yahoo Cloud Serving Benchmark [Cooper2010]
Swapnil Patil, CMU 6
•  For CRUD (create-read-update-delete) benchmarking
•  Single-node system with an extensible API
Storage Servers
HBASE
OTHER
DBS
Workload
Executor
Threads
Stats
DBClients
Command-line
Parameters
Workload
Parameter
File
YCSB++: New extensions
Swapnil Patil, CMU 7
Added support for the new Apache ACCUMULO DB
− New parameters and workload executors
Storage Servers
HBASE
OTHER
DBS
Workload
Executor
Threads
Stats
DBClients
Workload
Parameter
File
Command-line
Parameters
EXTENSIONSEXTENSIONS ACCUMULO
YCSB++: Distributed & parallel tests
Swapnil Patil, CMU 8
Multi-client, multi-phase coordination using ZooKeeper
− Enables testing at large scales and testing asymmetric features
Storage Servers
HBASE
OTHER
DBS
Workload
Executor
Threads
Stats
DBClients
Workload
Parameter
File
Command-line
Parameters
EXTENSIONSEXTENSIONS
MULTI-PHASE
ACCUMULO
YCSB clients
COORDINATION
YCSB++: Collective monitoring
Swapnil Patil, CMU 9
OTUS monitor built on Ganglia [Ren2011]
− Collects information fromYCSB, table stores, HDFS and OS
Storage Servers
HBASE
OTHER
DBS
Workload
Executor
Threads
Stats
DBClients
Workload
Parameter
File
Command-line
Parameters
EXTENSIONSEXTENSIONS
MULTI-PHASE
ACCUMULO
YCSB clients
COORDINATION OTUS MONITORING
Example ofYCSB++ debugging
Swapnil Patil, CMU 10
OTUS collects fine-grained information
− Both HDFS process andTabletServer process on same node
0
20
40
60
80
100
00:00 04:00 08:00 12:00 16:00 20:00 00:00 04:00
0
8
16
24
32
40
CPUUsage(%)
AvgNumberofStoreFilesPerTablet
Time (Minutes)
Monitoring Resource Usage and TableStore Metrics
Accumulo Avg. StoreFiles per Tablet
HDFS DataNode CPU Usage
Accumulo TabletServer CPU Usage
Outline
•  Problem
•  YCSB++ design
•  Illustrative examples
− YCSB++ on HBASE and ACCUMULO (Bigtable-like stores)
•  Ongoing work and summary
Swapnil Patil, CMU 11
Tablet Servers
Recap of Bigtable-like table stores
Swapnil Patil, CMU 12
HDFS nodes
TabletTN
Memtable
(Fewer)
Sorted
Indexed
Files
Sorted
Indexed
Files
MINOR
COMPACTION
MAJOR
COMPACTION
Write
Ahead
Log
Data
Insertion
1 2
3
Write-path: in-memory buffering & async FS writes
1) Mutations logged in memory tables (unsorted order)
2) Minor compaction: Memtables -> sorted, indexed files in HDFS
3) Major compaction: LSM-tree based file merging in background
Read-path: lookup both memtables and on-disk files
Apache ACCUMULO
Started at NSA; now an Apache Incubator project
− Designed for for high-speed ingest and scan workloads
− http://incubator.apache.org/projects/accumulo.html
New features in ACCUMULO
− Iterator framework for user-specified programs placed in
between different stages of the DB pipeline
  E.g., Support joins and stream processing using iterators
− Also supports fine-grained cell-level access control
Swapnil Patil, CMU 13
ILLUSTRATIVE EXAMPLE #1
Analyzing the
fast inserts vs. weak consistency
tradeoff usingYCSB++
Swapnil Patil, CMU 14
Client-side batch writing
Feature: clients batch inserts, delay writes to server
•  Improves insert throughput and latency
•  Newly inserted data may not be immediately visible to
other clients
Swapnil Patil, CMU 15
⏏

⏏

Table store servers
ZooKeeper
Cluster
Manager
YCSB++
Store client
Batch
YCSB++
Store client
CLIENT #1 CLIENT #2
Read{K}
Batch writing improves throughput
6 clients creating 9 million 1-Kbyte records on 6 servers
− Small batches - high client CPU utilization, limits throughput
− Large batches - saturate servers, limited benefit from batching
Swapnil Patil, CMU 16
0
10
20
30
40
50
60
10 KB 100 KB 1 MB 10 MB
Insertspersecond(1000s)
Batch size
Hbase Accumulo
Table store servers
ZooKeeper
Batch writing causes weak consistency
Swapnil Patil, CMU 17
Test setup: ZooKeeper-based client coordination
•  Share producer-consumer queue between readers/writers
•  R-W lag = delay before C2 can read C1’s most recent write
YCSB++
Store client
Batch
YCSB++
Store client
1
2 3
4
CLIENT #1 CLIENT #2
Insert
{K:V}
(106 records)
EnqueueK
(sample 1% records)
Polland
dequeueK
Read{K}
Batch writing causes weak consistency
Deferred write wins, but lag can be ~100 seconds
− (N%) = fraction of requests that needed multiple read()s
− Implementation of batching affects the median latency
Swapnil Patil, CMU 18
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1 10 100 1000 10000 100000
Fractionofrequests
read-after-write time lag (ms)
(a) HBase: Time lag for different buffer sizes
10 KB ( <1%)
100 KB (7.4%)
1 MB ( 17%)
10 MB ( 23%)
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1 10 100 1000 10000 100000
Fractionofrequests
read-after-write time lag (ms)
(b) Accumulo: Time lag for different buffer sizes
10 KB ( <1%)
100 KB (1.2%)
1 MB ( 14%)
10 MB ( 33%)
ILLUSTRATIVE EXAMPLE #2
Benchmarking
high-speed ingest features
usingYCSB++
Swapnil Patil, CMU 19
Features for high-speed insertions
Most table stores have high-speed ingest features
− Periodically insert large amounts of data or migrate old
data in bulk
− Classic relational DB techniques applied to new stores
Two features: bulk loading and table pre-splitting
•  Less data migration during inserts
•  Engages more tablet servers immediately
•  Need careful tuning and configuration [Sasha2002]
Swapnil Patil, CMU 20
⏏

⏏

⏏

8-phase test setup: table bulk loading
Bulk loading involves two steps
− Hadoop-based data formatting
− Importing store files into table store
Pre-load phase (1 and 2)
− Bulk load 6M rows in an empty table
− Goal: parallelism by engaging all servers
Load phase (4 and 5)
− Load 48M new rows
− Goal: study rebalancing during ingest
R/U measurements (3, 6 and 7)
− Correlate latency with rebalancing work
Swapnil Patil, CMU 21
Load (importing)
Read/Update workload
Load (re-formatting)
Read/Update workload
Sleep
Read/Update workload
Pre-Load (importing)
Pre-Load (re-formatting)
Phases
1
2
3
4
5
6
7
8
Read latency affected by rebalancing work
Swapnil Patil, CMU 22
Load (importing)
Read/Update workload
Load (re-formatting)
Read/Update workload
Sleep
Read/Update workload
Pre-Load (importing)
Pre-Load (re-formatting)
Phases
1
2
3
4
5
6
7
8
1
10
100
1000
0 60 120 180 240 300
AccumuloReadLatency(ms)
Measurement Phase RunningTime (Seconds)
R/U 1 (Phase 3) R/U 2 (Phase 6) R/U 3 (Phase 8)
•  High latency after high insertion periods that
cause servers to rebalance (compactions)
•  Latency drops after store is in a steady state
Rebalancing on ACCUMULO servers
Swapnil Patil, CMU 23
Load (importing)
Read/Update workload
Load (re-formatting)
Read/Update workload
Sleep
Read/Update workload
Pre-Load (importing)
Pre-Load (re-formatting)
Phases
1
2
3
4
5
6
7
8
•  OTUS monitor shows the server-side
compactions during post-ingest
measurement phases
1
10
100
1000
0 300 600 900 1200 1500 1800
Experiment RunningTime (sec)
StoreFiles
Tablets
Compactions
HBASE is slower: Different compaction policies
Swapnil Patil, CMU 24
1
10
100
1000
10000
0 60 120 180 240 300
AccumuloReadLatency(ms)
Measurement Phase RunningTime (Seconds)
R/U 1 (Phase 3) R/U 2 (Phase 6) R/U 3 (Phase 8)
1
10
100
1000
0 300 600 900 1200 1500 1800
Accumulo Experiment RunningTime (sec)
StoreFiles
Tablets
Compactions
1
10
100
1000
10000
0 60 120 180 240 300
HBaseReadLatency(ms)
Measurement Phase RunningTime (Seconds)
1
10
100
1000
0 300 600 900 1200 1500 1800
HBase Experiment RunningTime (sec)
Extending to table pre-splitting
Swapnil Patil, CMU 25
Tablepre-splittingtest
Load
Pre-load
Pre-split into N ranges
Read/Update workload
Sleep
Read/Update workload
Load (importing)
Read/Update workload
Load (re-formatting)
Read/Update workload
Sleep
Read/Update workload
Pre-Load (importing)
Pre-Load (re-formatting)
Bulkloadingtest
Pre-split a key range into N partitions to avoid splitting during insertion
Outline
•  Problem
•  YCSB++ design
•  Illustrative examples
•  Ongoing work and summary
Swapnil Patil, CMU 26
Things not covered in this talk
More features: function shipping to servers
− Data filtering at the servers
− Fine-grained, cell-level access control
MoredetailsintheACMSOCC2011paper
Ongoing work
− Analyze more table stores: Cassandra,CouchDB, MongoDB
− Continue research through the new Intel Science and
Technology Center for Cloud Computing at CMU (withGaTech)
Swapnil Patil, CMU 27
Summary:YCSB++ tool
•  Tool for performance debugging and benchmarking
advanced features using new extensions toYCSB
•  Two case-studies: Apache HBASE and ACCUMULO
•  Tool available at http://www.pdl.cmu.edu/ycsb++
Weak consistency semantics Distributed clients using ZooKeeper
Fast insertions (pre-splits & bulk loads) Multi-phase testing (with Hadoop)
Server-side filtering New workload generators and
database client API extensionsFine-grained access control
28Swapnil Patil, CMU

Más contenido relacionado

La actualidad más candente

Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...DataStax
 
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?Accumulo Summit
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0Joe Stein
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value StoreSantal Li
 
The Google File System (GFS)
The Google File System (GFS)The Google File System (GFS)
The Google File System (GFS)Romain Jacotin
 
Cassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + DynamoCassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + Dynamojbellis
 
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - ClouderaHBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - ClouderaCloudera, Inc.
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... CassandraInstaclustr
 
Oracle Real Application Cluster ( RAC )
Oracle Real Application Cluster ( RAC )Oracle Real Application Cluster ( RAC )
Oracle Real Application Cluster ( RAC )varasteh65
 
Cost and performance comparison for OpenStack compute and storage infrastructure
Cost and performance comparison for OpenStack compute and storage infrastructureCost and performance comparison for OpenStack compute and storage infrastructure
Cost and performance comparison for OpenStack compute and storage infrastructurePrincipled Technologies
 
Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction HBaseCon
 
ClustrixDB 7.5 Announcement
ClustrixDB 7.5 AnnouncementClustrixDB 7.5 Announcement
ClustrixDB 7.5 AnnouncementClustrix
 
Performance of persistent apps on Container-Native Storage for Red Hat OpenSh...
Performance of persistent apps on Container-Native Storage for Red Hat OpenSh...Performance of persistent apps on Container-Native Storage for Red Hat OpenSh...
Performance of persistent apps on Container-Native Storage for Red Hat OpenSh...Principled Technologies
 
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...DataStax
 
006 performance tuningandclusteradmin
006 performance tuningandclusteradmin006 performance tuningandclusteradmin
006 performance tuningandclusteradminScott Miao
 
Hug Hbase Presentation.
Hug Hbase Presentation.Hug Hbase Presentation.
Hug Hbase Presentation.Jack Levin
 

La actualidad más candente (20)

Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
 
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value Store
 
The Google File System (GFS)
The Google File System (GFS)The Google File System (GFS)
The Google File System (GFS)
 
Cassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + DynamoCassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + Dynamo
 
Cassandra useful features
Cassandra useful featuresCassandra useful features
Cassandra useful features
 
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - ClouderaHBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... Cassandra
 
Oracle Real Application Cluster ( RAC )
Oracle Real Application Cluster ( RAC )Oracle Real Application Cluster ( RAC )
Oracle Real Application Cluster ( RAC )
 
Apache Cassandra at Macys
Apache Cassandra at MacysApache Cassandra at Macys
Apache Cassandra at Macys
 
Bigtable and Dynamo
Bigtable and DynamoBigtable and Dynamo
Bigtable and Dynamo
 
Cost and performance comparison for OpenStack compute and storage infrastructure
Cost and performance comparison for OpenStack compute and storage infrastructureCost and performance comparison for OpenStack compute and storage infrastructure
Cost and performance comparison for OpenStack compute and storage infrastructure
 
Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction
 
ClustrixDB 7.5 Announcement
ClustrixDB 7.5 AnnouncementClustrixDB 7.5 Announcement
ClustrixDB 7.5 Announcement
 
Performance of persistent apps on Container-Native Storage for Red Hat OpenSh...
Performance of persistent apps on Container-Native Storage for Red Hat OpenSh...Performance of persistent apps on Container-Native Storage for Red Hat OpenSh...
Performance of persistent apps on Container-Native Storage for Red Hat OpenSh...
 
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
 
006 performance tuningandclusteradmin
006 performance tuningandclusteradmin006 performance tuningandclusteradmin
006 performance tuningandclusteradmin
 
Hug Hbase Presentation.
Hug Hbase Presentation.Hug Hbase Presentation.
Hug Hbase Presentation.
 
Exch2007 sp1 win2008
Exch2007 sp1 win2008Exch2007 sp1 win2008
Exch2007 sp1 win2008
 

Similar a Ycsb benchmarking

Adventures in RDS Load Testing
Adventures in RDS Load TestingAdventures in RDS Load Testing
Adventures in RDS Load TestingMike Harnish
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...MLconf
 
Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...
Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...
Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...Accumulo Summit
 
Thomas+Niewel+ +Oracletuning
Thomas+Niewel+ +OracletuningThomas+Niewel+ +Oracletuning
Thomas+Niewel+ +Oracletuningafa reg
 
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New FeaturesAmazon Web Services
 
How should I monitor my idaa
How should I monitor my idaaHow should I monitor my idaa
How should I monitor my idaaCuneyt Goksu
 
C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris WolfC* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris WolfDataStax Academy
 
SQLIO - measuring storage performance
SQLIO - measuring storage performanceSQLIO - measuring storage performance
SQLIO - measuring storage performancevalerian_ceaus
 
Troubleshooting SQL Server
Troubleshooting SQL ServerTroubleshooting SQL Server
Troubleshooting SQL ServerStephen Rose
 
HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini Cloudera, Inc.
 
GECon2017_High-volume data streaming in azure_ Aliaksandr Laisha
GECon2017_High-volume data streaming in azure_ Aliaksandr LaishaGECon2017_High-volume data streaming in azure_ Aliaksandr Laisha
GECon2017_High-volume data streaming in azure_ Aliaksandr LaishaGECon_Org Team
 
HeroLympics Eng V03 Henk Vd Valk
HeroLympics  Eng V03 Henk Vd ValkHeroLympics  Eng V03 Henk Vd Valk
HeroLympics Eng V03 Henk Vd Valkhvdvalk
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCoburn Watson
 
Top 10 tips for Oracle performance
Top 10 tips for Oracle performanceTop 10 tips for Oracle performance
Top 10 tips for Oracle performanceGuy Harrison
 
Nosql series-part-3-hypertable
Nosql series-part-3-hypertableNosql series-part-3-hypertable
Nosql series-part-3-hypertablehypertable
 
Data on the Move: Transitioning from a Legacy Architecture to a Big Data Plat...
Data on the Move: Transitioning from a Legacy Architecture to a Big Data Plat...Data on the Move: Transitioning from a Legacy Architecture to a Big Data Plat...
Data on the Move: Transitioning from a Legacy Architecture to a Big Data Plat...MapR Technologies
 

Similar a Ycsb benchmarking (20)

Adventures in RDS Load Testing
Adventures in RDS Load TestingAdventures in RDS Load Testing
Adventures in RDS Load Testing
 
Using AWR for IO Subsystem Analysis
Using AWR for IO Subsystem AnalysisUsing AWR for IO Subsystem Analysis
Using AWR for IO Subsystem Analysis
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
 
Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...
Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...
Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...
 
Thomas+Niewel+ +Oracletuning
Thomas+Niewel+ +OracletuningThomas+Niewel+ +Oracletuning
Thomas+Niewel+ +Oracletuning
 
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
 
How should I monitor my idaa
How should I monitor my idaaHow should I monitor my idaa
How should I monitor my idaa
 
C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris WolfC* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
 
Using Statspack and AWR for Memory Monitoring and Tuning
Using Statspack and AWR for Memory Monitoring and TuningUsing Statspack and AWR for Memory Monitoring and Tuning
Using Statspack and AWR for Memory Monitoring and Tuning
 
SQLIO - measuring storage performance
SQLIO - measuring storage performanceSQLIO - measuring storage performance
SQLIO - measuring storage performance
 
Troubleshooting SQL Server
Troubleshooting SQL ServerTroubleshooting SQL Server
Troubleshooting SQL Server
 
HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini
 
GECon2017_High-volume data streaming in azure_ Aliaksandr Laisha
GECon2017_High-volume data streaming in azure_ Aliaksandr LaishaGECon2017_High-volume data streaming in azure_ Aliaksandr Laisha
GECon2017_High-volume data streaming in azure_ Aliaksandr Laisha
 
HeroLympics Eng V03 Henk Vd Valk
HeroLympics  Eng V03 Henk Vd ValkHeroLympics  Eng V03 Henk Vd Valk
HeroLympics Eng V03 Henk Vd Valk
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
 
Dba tuning
Dba tuningDba tuning
Dba tuning
 
Top 10 tips for Oracle performance
Top 10 tips for Oracle performanceTop 10 tips for Oracle performance
Top 10 tips for Oracle performance
 
Nosql series-part-3-hypertable
Nosql series-part-3-hypertableNosql series-part-3-hypertable
Nosql series-part-3-hypertable
 
Data on the Move: Transitioning from a Legacy Architecture to a Big Data Plat...
Data on the Move: Transitioning from a Legacy Architecture to a Big Data Plat...Data on the Move: Transitioning from a Legacy Architecture to a Big Data Plat...
Data on the Move: Transitioning from a Legacy Architecture to a Big Data Plat...
 
Exchange Server 2013 Database and Store Changes
Exchange Server 2013 Database and Store ChangesExchange Server 2013 Database and Store Changes
Exchange Server 2013 Database and Store Changes
 

Más de Sqrrl

Transitioning Government Technology
Transitioning Government TechnologyTransitioning Government Technology
Transitioning Government TechnologySqrrl
 
Leveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your HuntsLeveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your HuntsSqrrl
 
How to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your NetworkHow to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your NetworkSqrrl
 
Machine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedMachine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedSqrrl
 
Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)Sqrrl
 
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior GraphUser and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior GraphSqrrl
 
Threat Hunting Platforms (Collaboration with SANS Institute)
Threat Hunting Platforms (Collaboration with SANS Institute)Threat Hunting Platforms (Collaboration with SANS Institute)
Threat Hunting Platforms (Collaboration with SANS Institute)Sqrrl
 
Sqrrl and IBM: Threat Hunting for QRadar Users
Sqrrl and IBM: Threat Hunting for QRadar UsersSqrrl and IBM: Threat Hunting for QRadar Users
Sqrrl and IBM: Threat Hunting for QRadar UsersSqrrl
 
Threat Hunting for Command and Control Activity
Threat Hunting for Command and Control ActivityThreat Hunting for Command and Control Activity
Threat Hunting for Command and Control ActivitySqrrl
 
Modernizing Your SOC: A CISO-led Training
Modernizing Your SOC: A CISO-led TrainingModernizing Your SOC: A CISO-led Training
Modernizing Your SOC: A CISO-led TrainingSqrrl
 
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together Sqrrl
 
Leveraging DNS to Surface Attacker Activity
Leveraging DNS to Surface Attacker ActivityLeveraging DNS to Surface Attacker Activity
Leveraging DNS to Surface Attacker ActivitySqrrl
 
The Art and Science of Alert Triage
The Art and Science of Alert TriageThe Art and Science of Alert Triage
The Art and Science of Alert TriageSqrrl
 
Reducing Mean Time to Know
Reducing Mean Time to KnowReducing Mean Time to Know
Reducing Mean Time to KnowSqrrl
 
Sqrrl Enterprise: Big Data Security Analytics Use Case
Sqrrl Enterprise: Big Data Security Analytics Use CaseSqrrl Enterprise: Big Data Security Analytics Use Case
Sqrrl Enterprise: Big Data Security Analytics Use CaseSqrrl
 
The Linked Data Advantage
The Linked Data AdvantageThe Linked Data Advantage
The Linked Data AdvantageSqrrl
 
Sqrrl Enterprise: Integrate, Explore, Analyze
Sqrrl Enterprise: Integrate, Explore, AnalyzeSqrrl Enterprise: Integrate, Explore, Analyze
Sqrrl Enterprise: Integrate, Explore, AnalyzeSqrrl
 
Sqrrl Datasheet: Cyber Hunting
Sqrrl Datasheet: Cyber HuntingSqrrl Datasheet: Cyber Hunting
Sqrrl Datasheet: Cyber HuntingSqrrl
 
Benchmarking The Apache Accumulo Distributed Key–Value Store
Benchmarking The Apache Accumulo Distributed Key–Value StoreBenchmarking The Apache Accumulo Distributed Key–Value Store
Benchmarking The Apache Accumulo Distributed Key–Value StoreSqrrl
 
Scalable Graph Clustering with Pregel
Scalable Graph Clustering with PregelScalable Graph Clustering with Pregel
Scalable Graph Clustering with PregelSqrrl
 

Más de Sqrrl (20)

Transitioning Government Technology
Transitioning Government TechnologyTransitioning Government Technology
Transitioning Government Technology
 
Leveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your HuntsLeveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your Hunts
 
How to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your NetworkHow to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your Network
 
Machine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedMachine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting Started
 
Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)
 
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior GraphUser and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
 
Threat Hunting Platforms (Collaboration with SANS Institute)
Threat Hunting Platforms (Collaboration with SANS Institute)Threat Hunting Platforms (Collaboration with SANS Institute)
Threat Hunting Platforms (Collaboration with SANS Institute)
 
Sqrrl and IBM: Threat Hunting for QRadar Users
Sqrrl and IBM: Threat Hunting for QRadar UsersSqrrl and IBM: Threat Hunting for QRadar Users
Sqrrl and IBM: Threat Hunting for QRadar Users
 
Threat Hunting for Command and Control Activity
Threat Hunting for Command and Control ActivityThreat Hunting for Command and Control Activity
Threat Hunting for Command and Control Activity
 
Modernizing Your SOC: A CISO-led Training
Modernizing Your SOC: A CISO-led TrainingModernizing Your SOC: A CISO-led Training
Modernizing Your SOC: A CISO-led Training
 
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
 
Leveraging DNS to Surface Attacker Activity
Leveraging DNS to Surface Attacker ActivityLeveraging DNS to Surface Attacker Activity
Leveraging DNS to Surface Attacker Activity
 
The Art and Science of Alert Triage
The Art and Science of Alert TriageThe Art and Science of Alert Triage
The Art and Science of Alert Triage
 
Reducing Mean Time to Know
Reducing Mean Time to KnowReducing Mean Time to Know
Reducing Mean Time to Know
 
Sqrrl Enterprise: Big Data Security Analytics Use Case
Sqrrl Enterprise: Big Data Security Analytics Use CaseSqrrl Enterprise: Big Data Security Analytics Use Case
Sqrrl Enterprise: Big Data Security Analytics Use Case
 
The Linked Data Advantage
The Linked Data AdvantageThe Linked Data Advantage
The Linked Data Advantage
 
Sqrrl Enterprise: Integrate, Explore, Analyze
Sqrrl Enterprise: Integrate, Explore, AnalyzeSqrrl Enterprise: Integrate, Explore, Analyze
Sqrrl Enterprise: Integrate, Explore, Analyze
 
Sqrrl Datasheet: Cyber Hunting
Sqrrl Datasheet: Cyber HuntingSqrrl Datasheet: Cyber Hunting
Sqrrl Datasheet: Cyber Hunting
 
Benchmarking The Apache Accumulo Distributed Key–Value Store
Benchmarking The Apache Accumulo Distributed Key–Value StoreBenchmarking The Apache Accumulo Distributed Key–Value Store
Benchmarking The Apache Accumulo Distributed Key–Value Store
 
Scalable Graph Clustering with Pregel
Scalable Graph Clustering with PregelScalable Graph Clustering with Pregel
Scalable Graph Clustering with Pregel
 

Último

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

Ycsb benchmarking

  • 1. YCSB++ BenchmarkingTool PerformanceDebuggingAdvanced FeaturesofScalableTableStores Swapnil Patil Milo Polte,WittawatTantisiriroj, Kai Ren, Lin Xiao, Julio Lopez, Garth Gibson, Adam Fuchs *, Billie Rinaldi * CarnegieMellonUniversity *NationalSecurityAgency Open Cirrus Summit, October 2011, Atlanta GA
  • 2. Scalable table stores are critical systems Swapnil Patil, CMU 2 •  For data processing & analysis (e.g. Pregel, Hive) •  For systems services (e.g., Google Colossus metadata)
  • 3. Evolution of scalable table stores Simple, lightweight  complex, feature-rich stores Supports a broader range of applications and services Hard to debug and understand performance problems Complex behavior and interaction of various components GrowingsetofHBasefeatures 2008 2009 2010 2011+ RangeRowFilters Batch updates Bulk load tools RegEx filtering Scan optimizations HBASE release Co-processors Access Control ⏏
 ⏏
 ⏏
 3Swapnil Patil, CMU
  • 4. YCSB++ FUNCTIONALITY ZooKeeper-based distributed and coordinated testing API and extensions the new Apache ACCUMULO DB Fine-grained, correlated monitoring usingOTUS FEATURES TESTED USING YCSB++ Batch writing Table pre-splitting  Bulk loading Weak consistency  Server-side filtering  Fine-grained security Tool released at http://www.pdl.cmu.edu/ycsb++ Swapnil Patil, CMU 4 Need richer tools for understanding advanced features in table stores …
  • 5. Outline •  Problem •  YCSB++ design •  Illustrative examples •  Ongoing work and summary Swapnil Patil, CMU 5
  • 6. Yahoo Cloud Serving Benchmark [Cooper2010] Swapnil Patil, CMU 6 •  For CRUD (create-read-update-delete) benchmarking •  Single-node system with an extensible API Storage Servers HBASE OTHER DBS Workload Executor Threads Stats DBClients Command-line Parameters Workload Parameter File
  • 7. YCSB++: New extensions Swapnil Patil, CMU 7 Added support for the new Apache ACCUMULO DB − New parameters and workload executors Storage Servers HBASE OTHER DBS Workload Executor Threads Stats DBClients Workload Parameter File Command-line Parameters EXTENSIONSEXTENSIONS ACCUMULO
  • 8. YCSB++: Distributed & parallel tests Swapnil Patil, CMU 8 Multi-client, multi-phase coordination using ZooKeeper − Enables testing at large scales and testing asymmetric features Storage Servers HBASE OTHER DBS Workload Executor Threads Stats DBClients Workload Parameter File Command-line Parameters EXTENSIONSEXTENSIONS MULTI-PHASE ACCUMULO YCSB clients COORDINATION
  • 9. YCSB++: Collective monitoring Swapnil Patil, CMU 9 OTUS monitor built on Ganglia [Ren2011] − Collects information fromYCSB, table stores, HDFS and OS Storage Servers HBASE OTHER DBS Workload Executor Threads Stats DBClients Workload Parameter File Command-line Parameters EXTENSIONSEXTENSIONS MULTI-PHASE ACCUMULO YCSB clients COORDINATION OTUS MONITORING
  • 10. Example ofYCSB++ debugging Swapnil Patil, CMU 10 OTUS collects fine-grained information − Both HDFS process andTabletServer process on same node 0 20 40 60 80 100 00:00 04:00 08:00 12:00 16:00 20:00 00:00 04:00 0 8 16 24 32 40 CPUUsage(%) AvgNumberofStoreFilesPerTablet Time (Minutes) Monitoring Resource Usage and TableStore Metrics Accumulo Avg. StoreFiles per Tablet HDFS DataNode CPU Usage Accumulo TabletServer CPU Usage
  • 11. Outline •  Problem •  YCSB++ design •  Illustrative examples − YCSB++ on HBASE and ACCUMULO (Bigtable-like stores) •  Ongoing work and summary Swapnil Patil, CMU 11
  • 12. Tablet Servers Recap of Bigtable-like table stores Swapnil Patil, CMU 12 HDFS nodes TabletTN Memtable (Fewer) Sorted Indexed Files Sorted Indexed Files MINOR COMPACTION MAJOR COMPACTION Write Ahead Log Data Insertion 1 2 3 Write-path: in-memory buffering & async FS writes 1) Mutations logged in memory tables (unsorted order) 2) Minor compaction: Memtables -> sorted, indexed files in HDFS 3) Major compaction: LSM-tree based file merging in background Read-path: lookup both memtables and on-disk files
  • 13. Apache ACCUMULO Started at NSA; now an Apache Incubator project − Designed for for high-speed ingest and scan workloads − http://incubator.apache.org/projects/accumulo.html New features in ACCUMULO − Iterator framework for user-specified programs placed in between different stages of the DB pipeline   E.g., Support joins and stream processing using iterators − Also supports fine-grained cell-level access control Swapnil Patil, CMU 13
  • 14. ILLUSTRATIVE EXAMPLE #1 Analyzing the fast inserts vs. weak consistency tradeoff usingYCSB++ Swapnil Patil, CMU 14
  • 15. Client-side batch writing Feature: clients batch inserts, delay writes to server •  Improves insert throughput and latency •  Newly inserted data may not be immediately visible to other clients Swapnil Patil, CMU 15 ⏏
 ⏏
 Table store servers ZooKeeper Cluster Manager YCSB++ Store client Batch YCSB++ Store client CLIENT #1 CLIENT #2 Read{K}
  • 16. Batch writing improves throughput 6 clients creating 9 million 1-Kbyte records on 6 servers − Small batches - high client CPU utilization, limits throughput − Large batches - saturate servers, limited benefit from batching Swapnil Patil, CMU 16 0 10 20 30 40 50 60 10 KB 100 KB 1 MB 10 MB Insertspersecond(1000s) Batch size Hbase Accumulo
  • 17. Table store servers ZooKeeper Batch writing causes weak consistency Swapnil Patil, CMU 17 Test setup: ZooKeeper-based client coordination •  Share producer-consumer queue between readers/writers •  R-W lag = delay before C2 can read C1’s most recent write YCSB++ Store client Batch YCSB++ Store client 1 2 3 4 CLIENT #1 CLIENT #2 Insert {K:V} (106 records) EnqueueK (sample 1% records) Polland dequeueK Read{K}
  • 18. Batch writing causes weak consistency Deferred write wins, but lag can be ~100 seconds − (N%) = fraction of requests that needed multiple read()s − Implementation of batching affects the median latency Swapnil Patil, CMU 18 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 10 100 1000 10000 100000 Fractionofrequests read-after-write time lag (ms) (a) HBase: Time lag for different buffer sizes 10 KB ( <1%) 100 KB (7.4%) 1 MB ( 17%) 10 MB ( 23%) 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 10 100 1000 10000 100000 Fractionofrequests read-after-write time lag (ms) (b) Accumulo: Time lag for different buffer sizes 10 KB ( <1%) 100 KB (1.2%) 1 MB ( 14%) 10 MB ( 33%)
  • 19. ILLUSTRATIVE EXAMPLE #2 Benchmarking high-speed ingest features usingYCSB++ Swapnil Patil, CMU 19
  • 20. Features for high-speed insertions Most table stores have high-speed ingest features − Periodically insert large amounts of data or migrate old data in bulk − Classic relational DB techniques applied to new stores Two features: bulk loading and table pre-splitting •  Less data migration during inserts •  Engages more tablet servers immediately •  Need careful tuning and configuration [Sasha2002] Swapnil Patil, CMU 20 ⏏
 ⏏
 ⏏

  • 21. 8-phase test setup: table bulk loading Bulk loading involves two steps − Hadoop-based data formatting − Importing store files into table store Pre-load phase (1 and 2) − Bulk load 6M rows in an empty table − Goal: parallelism by engaging all servers Load phase (4 and 5) − Load 48M new rows − Goal: study rebalancing during ingest R/U measurements (3, 6 and 7) − Correlate latency with rebalancing work Swapnil Patil, CMU 21 Load (importing) Read/Update workload Load (re-formatting) Read/Update workload Sleep Read/Update workload Pre-Load (importing) Pre-Load (re-formatting) Phases 1 2 3 4 5 6 7 8
  • 22. Read latency affected by rebalancing work Swapnil Patil, CMU 22 Load (importing) Read/Update workload Load (re-formatting) Read/Update workload Sleep Read/Update workload Pre-Load (importing) Pre-Load (re-formatting) Phases 1 2 3 4 5 6 7 8 1 10 100 1000 0 60 120 180 240 300 AccumuloReadLatency(ms) Measurement Phase RunningTime (Seconds) R/U 1 (Phase 3) R/U 2 (Phase 6) R/U 3 (Phase 8) •  High latency after high insertion periods that cause servers to rebalance (compactions) •  Latency drops after store is in a steady state
  • 23. Rebalancing on ACCUMULO servers Swapnil Patil, CMU 23 Load (importing) Read/Update workload Load (re-formatting) Read/Update workload Sleep Read/Update workload Pre-Load (importing) Pre-Load (re-formatting) Phases 1 2 3 4 5 6 7 8 •  OTUS monitor shows the server-side compactions during post-ingest measurement phases 1 10 100 1000 0 300 600 900 1200 1500 1800 Experiment RunningTime (sec) StoreFiles Tablets Compactions
  • 24. HBASE is slower: Different compaction policies Swapnil Patil, CMU 24 1 10 100 1000 10000 0 60 120 180 240 300 AccumuloReadLatency(ms) Measurement Phase RunningTime (Seconds) R/U 1 (Phase 3) R/U 2 (Phase 6) R/U 3 (Phase 8) 1 10 100 1000 0 300 600 900 1200 1500 1800 Accumulo Experiment RunningTime (sec) StoreFiles Tablets Compactions 1 10 100 1000 10000 0 60 120 180 240 300 HBaseReadLatency(ms) Measurement Phase RunningTime (Seconds) 1 10 100 1000 0 300 600 900 1200 1500 1800 HBase Experiment RunningTime (sec)
  • 25. Extending to table pre-splitting Swapnil Patil, CMU 25 Tablepre-splittingtest Load Pre-load Pre-split into N ranges Read/Update workload Sleep Read/Update workload Load (importing) Read/Update workload Load (re-formatting) Read/Update workload Sleep Read/Update workload Pre-Load (importing) Pre-Load (re-formatting) Bulkloadingtest Pre-split a key range into N partitions to avoid splitting during insertion
  • 26. Outline •  Problem •  YCSB++ design •  Illustrative examples •  Ongoing work and summary Swapnil Patil, CMU 26
  • 27. Things not covered in this talk More features: function shipping to servers − Data filtering at the servers − Fine-grained, cell-level access control MoredetailsintheACMSOCC2011paper Ongoing work − Analyze more table stores: Cassandra,CouchDB, MongoDB − Continue research through the new Intel Science and Technology Center for Cloud Computing at CMU (withGaTech) Swapnil Patil, CMU 27
  • 28. Summary:YCSB++ tool •  Tool for performance debugging and benchmarking advanced features using new extensions toYCSB •  Two case-studies: Apache HBASE and ACCUMULO •  Tool available at http://www.pdl.cmu.edu/ycsb++ Weak consistency semantics Distributed clients using ZooKeeper Fast insertions (pre-splits & bulk loads) Multi-phase testing (with Hadoop) Server-side filtering New workload generators and database client API extensionsFine-grained access control 28Swapnil Patil, CMU