Benchmarking data warehouse systems in the cloud: new requirements & new metrics
1. Data Warehouse Systems in the Cloud: new requirements and new challenges
Rim Moussa
LaTICE Lab. -University of Tunis
ESTI -University of Carthage
rim.moussa@esti.rnu.tn
10th Intl. Conference on Computer Systems and Applications
(AICCSA), Fez, Kingdom of Morocco
30th May 2013 Keynote @ Intl. Conference on Computing, Networking and Communications, Hammamet, Tunisia
30th May 2013, DWS in the Cloud, AICCSA'13, Fez
4. Outline
1. Cloud Computing
2. Data Warehouse Systems
3. Overview of DWS Benchmarks
4. New Requirements for DWS in the Cloud
5. Related Work
6. Conclusion
7. Research Perspectives
5. Cloud Computing
● NIST Definition
  – cloud computing as a pay-per-use model for enabling available, convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, servers, storage, applications, services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
● Opportunities
  – Performance
    ● Faster data analysis through usage of up-to-date hardware infrastructure made available by Cloud Service Providers,
  – More Economical
    ● Organizations no longer need to expend capital upfront for hardware and software purchases, with services provided on a pay-per-use basis,
6. Cloud Computing
--Market share
● Market Share
  – Forrester Research expects the global cloud computing market to reach $241 billion in 2020,
  – Gartner: the public cloud services market is forecast to grow 18.5% in 2013 to total $131 billion worldwide, up from $111 billion in 2012,
  – Gartner: the public cloud services market in the Middle East and North Africa (MENA) is expected to increase by 24.5% in 2013,
  – Gartner: the public cloud services market in India is forecast to grow 36% in 2013 to total $443 million, up from $326 million in 2012,
8. Data Warehouse Systems
--Technologies
● Traditional Relational DBMSs & OLAP Servers
  – Mature
  – Do not scale linearly
● NoSQL solutions
  – Adopted by Google, Facebook, Amazon, ...
  – Dynamic horizontal scale-up
  – Nodes are added without bringing the cluster down
    ● Shared-nothing architecture
    ● Independent computing and storage nodes interconnected via a high speed network
    ● MapReduce distributed programming framework
10. Data Warehouse Systems
--Common Optimizations: Hardware Storage Tech.
● DRAM: in-memory data processing (very expensive)
● SSD (Solid State Drives): a non-volatile type of memory.
● An SSD does not have a mechanical arm to read and write data

                       SSD                HDD
Cost/GB                $1/GB              $0.075/GB
Typical size           512GB              Up to 2TB
Failure rate (MTBF)    2 million hours    1.5 million hours
Read/Write speed       200-500 MBps       120 MBps
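To make the SSD/HDD comparison concrete, the sketch below estimates the purchase cost and the sequential full-scan time of a warehouse stored on each device type. It uses only the per-GB cost and throughput figures quoted above. The 1 TB example size, the decimal unit conversion (1 GB = 1000 MB), the choice of the lower SSD bound (200 MBps) and the function names are assumptions made for illustration.

```python
# Per-device figures copied from the SSD/HDD comparison above.
# SSD throughput uses the lower bound of the quoted 200-500 MBps range.
DEVICES = {
    "SSD": {"cost_per_gb": 1.0, "mb_per_s": 200.0},
    "HDD": {"cost_per_gb": 0.075, "mb_per_s": 120.0},
}


def storage_cost(size_gb: float, device: str) -> float:
    """Purchase cost in dollars for size_gb of raw capacity."""
    return size_gb * DEVICES[device]["cost_per_gb"]


def full_scan_seconds(size_gb: float, device: str) -> float:
    """Time to read size_gb sequentially, assuming 1 GB = 1000 MB."""
    return size_gb * 1000.0 / DEVICES[device]["mb_per_s"]


if __name__ == "__main__":
    for name in DEVICES:
        # Hypothetical 1 TB (1000 GB) warehouse.
        print(name, storage_cost(1000, name), round(full_scan_seconds(1000, name)))
```

Under these assumptions a 1 TB scan takes about 5000 s on SSD against about 8333 s on HDD, while the SSD capacity costs roughly 13 times more.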
11. Data Warehouse Systems
--Common Optimizations: Columnar Storage Principle
● Row-oriented storage
  – Read pages containing all columns
  [Figure: records with columns Date, Customer, Product, Price, Quantity stored row by row]
● Column-oriented storage
  – Read only columns needed for query processing
  [Figure: the same columns Date, Customer, Product, Price, Quantity stored column by column]
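The difference between the two layouts can be sketched with a toy in-memory table. The example below counts how many field values a revenue query (sum of Price x Quantity) has to read under each layout. The sample rows and function names are hypothetical, and counting values is only a stand-in for the page I/O a real engine would perform.

```python
from typing import Dict, List, Tuple

COLUMNS = ("Date", "Customer", "Product", "Price", "Quantity")
Row = Tuple[str, str, str, float, int]

# Hypothetical sample data, one tuple per row (row-oriented layout).
ROWS: List[Row] = [
    ("2013-05-28", "C1", "P1", 10.0, 2),
    ("2013-05-29", "C2", "P1", 10.0, 1),
    ("2013-05-30", "C1", "P2", 4.5, 4),
]


def to_columnar(rows: List[Row]) -> Dict[str, list]:
    """Pivot the rows into one list per column (column-oriented layout)."""
    return {name: [r[i] for r in rows] for i, name in enumerate(COLUMNS)}


def revenue_row_store(rows: List[Row]) -> Tuple[float, int]:
    """Row store: every field of every row is read, even unused ones."""
    values_read = sum(len(r) for r in rows)
    total = sum(r[3] * r[4] for r in rows)
    return total, values_read


def revenue_column_store(cols: Dict[str, list]) -> Tuple[float, int]:
    """Column store: only the Price and Quantity columns are read."""
    price, quantity = cols["Price"], cols["Quantity"]
    values_read = len(price) + len(quantity)
    total = sum(p * q for p, q in zip(price, quantity))
    return total, values_read


if __name__ == "__main__":
    print(revenue_row_store(ROWS))
    print(revenue_column_store(to_columnar(ROWS)))
```

Both functions return the same revenue, but the column layout touches 6 values instead of 15 for this table.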
12. Data Warehouse Systems
--Common Optimizations: Columnar Storage Benefits
● Allows the best data compression rate, since data values are redundant within a single column,
● Eliminates unnecessary I/O through the retrieval of only relevant data
● Vectorwise is in the TPC-H Top Ten Performance Results (14-Jun-2013)
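One way to see why redundancy within a column helps compression is run-length encoding (RLE). The slide does not name a compression scheme, so RLE is an assumption chosen here because it is simple; the column values are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(column: List[str]) -> List[Tuple[str, int]]:
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def rle_decode(runs: List[Tuple[str, int]]) -> List[str]:
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


if __name__ == "__main__":
    # Hypothetical Product column, sorted so equal values are adjacent.
    product = ["P1", "P1", "P1", "P2", "P2"]
    print(rle_encode(product))
```

A row-oriented page interleaves values from different columns, so runs like these rarely occur there.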
13. Data Warehouse Systems
--Common Optimizations: Derived Data
● Derived Data:
  – Indexes,
  – Derived Attributes,
  – Aggregate tables
● Pros:
  – High Performance
● Cons:
  – Maintenance: refresh is expensive
  – Storage cost
14. Data Warehouse Systems
--DWS Benchmarks
● APB-1 OLAP Benchmark --obsolete
  – Released by the OLAP Council (www.olapcouncil.org) in 1998
  – A simple star schema data model
● TPC DSS Benchmark
  – Released by the Transaction Processing Performance Council (www.tpc.org)
  – Examine large volumes of data (from 10GB to 100TB)
  – Complex relational data model
  – TPC-H
    ● Workload composed of 22 ad-hoc complex SQL statements
    ● The most prominent DSS benchmark
  – TPC-DS -successor of TPC-H
    ● Workload composed of 99 SQL business questions
    ● Same metrics as TPC-H
15. Data Warehouse Systems
--TPC-H Benchmark Metrics (same for TPC-DS)
● Query-per-hour Performance Metric
  – For a given scale factor (warehouse data volume)
  – Concurrent users
● Price-Performance Metric
  – Ratio of Priced System (cost of ownership: hardware, software, maintenance, and cost of everything needed to run the TPC-H workload) to Query Performance Metric
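The two metrics above can be illustrated with a deliberately simplified calculation. The official TPC-H composite metric combines a power test and a throughput test as defined in the TPC-H specification; the sketch below does not reproduce that formula. It only shows the shape of a queries-per-hour figure and of the price/performance ratio, with hypothetical function names and inputs.

```python
def queries_per_hour(num_queries: int, elapsed_seconds: float) -> float:
    """Simplified throughput: queries completed, normalised to one hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_queries * 3600.0 / elapsed_seconds


def price_performance(priced_system_cost: float, qph: float) -> float:
    """Dollars of priced system per unit of query-per-hour performance."""
    if qph <= 0:
        raise ValueError("qph must be positive")
    return priced_system_cost / qph


if __name__ == "__main__":
    # Hypothetical run: the 22 TPC-H queries finish in 30 minutes
    # on a system priced at $88,000.
    qph = queries_per_hour(22, 1800.0)
    print(qph, price_performance(88_000.0, qph))
```

A lower price/performance value is better: it means less money spent per unit of throughput.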
16. Data Warehouse Systems
--TPC-H mismatches Cloud Rationale
● TPC-H does not represent BI suites
  – Integration services
  – Analytics services (Multi-dimensional eXpressions Language, Mining Structures)
  – Reporting services
● TPC-H Workload Processing Metric
  – Qph@Size defines the number of queries processed per hour
  – The workload is assumed static, which is not realistic!
  – The benchmark should assess the scalability of the SUT (system under test) under variable and evolving workload and data volumes
17. Data Warehouse Systems
--TPC-H mismatches Cloud Rationale (ctnd.1)
● TPC-H Cost-Performance Metric
  – $/Qph@Size, where the cost relates to all of hardware, software and HR required for running the workload (3yrs)
  – The cost model in the cloud is different, and does not relate to the cost of ownership
● TPC-H does not report a Cost-Effectiveness Metric
● TPC-H implementation vs. CAP theorem
  – CAP theorem: a distributed system cannot fulfill all three of Consistency (same view of data), Availability (query response) and Partition Tolerance (cope with hardware crash).
  – Since DWS deployments are onto shared-nothing architectures, benchmarks should be CA-, CP- or AP-compliant.
18. New Requirements & New Metrics
19. High Performance Requirement
--Data Transfer IN/OUT CSP
● Data Transfer Characteristics
  – Huge data volumes transfer IN and OUT the Cloud Service Provider
  – Resulting in Network-bound DWS
  – Usually, the cost model adopted by CSPs is:
    ● Data upload IN the CSP is free of charge
    ● Data download OUT the CSP is priced
● Data Transfer Metrics in the Cloud
  – Time and cost for data upload
  – Time and cost for data download
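The data transfer metrics above (time and cost, for upload and for download) can be computed as in the following sketch. The prices, the bandwidth and all names are hypothetical placeholders, not the tariff of any real provider; the only assumption taken from the slide is that upload is free and download is priced.

```python
from dataclasses import dataclass


@dataclass
class TransferPricing:
    """Hypothetical CSP tariff in dollars per GB."""
    upload_per_gb: float = 0.0     # upload IN the CSP is usually free
    download_per_gb: float = 0.10  # placeholder price for download OUT


def transfer_seconds(size_gb: float, bandwidth_mbit_s: float) -> float:
    """Transfer time, assuming 1 GB = 8000 Mbit and a steady link."""
    if bandwidth_mbit_s <= 0:
        raise ValueError("bandwidth_mbit_s must be positive")
    return size_gb * 8000.0 / bandwidth_mbit_s


def transfer_cost(size_gb: float, price_per_gb: float) -> float:
    """Transfer cost in dollars for size_gb at the given per-GB price."""
    return size_gb * price_per_gb


if __name__ == "__main__":
    pricing = TransferPricing()
    size_gb, link = 100.0, 100.0  # hypothetical 100 GB over a 100 Mbit/s link
    print("upload  :", transfer_seconds(size_gb, link), transfer_cost(size_gb, pricing.upload_per_gb))
    print("download:", transfer_seconds(size_gb, link), transfer_cost(size_gb, pricing.download_per_gb))
```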
20. High Performance Requirement
--Workload Processing (ctnd. 1)
● Workload Processing Characteristics
  – Both I/O-bound and CPU-bound business questions
  – Intra-query processing combined with virtual partitioning or physical processing
● Performance across Cluster Size
  – For each business question, there is a cluster size that gives the optimum response time; performance degrades for both larger and smaller clusters
  – Proved for both SQL and NoSQL technologies
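Given response times measured for one business question at several cluster sizes, the optimum described above can be located as in this sketch. The measurements are invented for illustration and are not results from the talk; the function names are hypothetical.

```python
from typing import Dict


def optimal_cluster_size(elapsed_by_size: Dict[int, float]) -> int:
    """Return the cluster size with the smallest elapsed time."""
    if not elapsed_by_size:
        raise ValueError("no measurements given")
    return min(elapsed_by_size, key=elapsed_by_size.get)


def slowdown_vs_optimum(elapsed_by_size: Dict[int, float]) -> Dict[int, float]:
    """Elapsed time at each size divided by the best elapsed time."""
    best = elapsed_by_size[optimal_cluster_size(elapsed_by_size)]
    return {size: t / best for size, t in elapsed_by_size.items()}


if __name__ == "__main__":
    # Hypothetical elapsed times (seconds) for one query, keyed by node count.
    measurements = {1: 600.0, 2: 340.0, 4: 210.0, 8: 180.0, 16: 230.0}
    print(optimal_cluster_size(measurements))
    print(slowdown_vs_optimum(measurements))
```

In this invented data the query is fastest on 8 nodes; adding nodes beyond that point makes it slower, which is the pattern the slide describes.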
21. High Performance Requirement
--Workload Processing (ctnd. 2)
● TPC-H benchmarking of Apache Hadoop/Pig Latin on GRID5000 - Bordeaux Site [Moussa, ICCIT'12] (SF=10)
22. High Performance Requirement
--Workload Processing (ctnd. 3)
● Workload Processing Metrics
  – Elapsed times for running business questions,
  – Slope: performance - cost
23. Scalability Requirement
● Definition
  – Scalability is the ability of a system to increase total throughput under an increased load when hardware resources are added.
● Scalability Metric
  – Query Performance Metric under
    ● Ever increasing workload
    ● Different query frequencies
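One possible way to turn the definition above into a number is a scaling-efficiency ratio: the throughput gain divided by the resource gain. This particular ratio is an assumption of this sketch, not a metric defined in the talk, and the figures are hypothetical.

```python
def throughput_qph(queries_completed: int, elapsed_seconds: float) -> float:
    """Queries per hour completed under a given load."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return queries_completed * 3600.0 / elapsed_seconds


def scaling_efficiency(base_nodes: int, base_qph: float,
                       nodes: int, qph: float) -> float:
    """Throughput gain divided by resource gain; 1.0 means linear scaling."""
    return (qph / base_qph) / (nodes / base_nodes)


if __name__ == "__main__":
    # Hypothetical: doubling from 2 to 4 nodes raises throughput from 100 to 180 qph.
    print(scaling_efficiency(2, 100.0, 4, 180.0))
```

Repeating the measurement with a growing workload and with different query mixes would cover the two conditions listed on the slide.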
24. Elasticity Requirement
● Definition
  – Elasticity adjusts the system capacity at runtime by adding and removing resources without service interruption in order to handle the workload variation.
● Elasticity Metric
  – Capacity to add/remove resources: (0|1)
  – Scaling Latency: elapsed time to scale-down and scale-up
  – Impact on SUT performances during scale-up and scale-down
  – Scale-up cost (+$)
  – Scale-down gain (-$)
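The elasticity metrics listed above could be collected in a single record per benchmark run, as in this sketch. The field names, the units and the way performance impact is computed (fractional throughput drop while scaling) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ElasticityReport:
    """One record per elasticity experiment (all names are hypothetical)."""
    can_resize: bool                  # capacity to add/remove resources (0|1)
    scale_up_latency_s: float         # elapsed time to scale up
    scale_down_latency_s: float       # elapsed time to scale down
    qph_steady: float                 # throughput before scaling starts
    qph_during_scaling: float         # throughput while scaling is in progress
    scale_up_cost_per_hour: float     # extra dollars per hour after scale-up
    scale_down_gain_per_hour: float   # dollars saved per hour after scale-down

    def performance_impact(self) -> float:
        """Fractional throughput drop observed while scaling."""
        return 1.0 - self.qph_during_scaling / self.qph_steady


if __name__ == "__main__":
    report = ElasticityReport(True, 120.0, 45.0, 100.0, 80.0, 0.50, 0.50)
    print(round(report.performance_impact(), 2))
```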
25. High Availability Requirement
--Redundancy Strategies
● Redundancy Strategies
  – Replication (a.k.a. mirroring)
  – Erasure-Resilient Codes
● Redundancy Strategies vs. Workload Type
  – Replication suits OLTP workload
  – Erasure-resilient codes suit OLAP workload
● Comparison [Litwin et al., ACM TODS'05]
  – Data storage cost
  – Computation cost
  – Communication cost
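The data storage cost of the two strategies can be compared with a short calculation. To tolerate k failures, replication keeps k + 1 full copies, whereas a generic erasure code with m data fragments and k parity fragments stores (m + k) / m times the data. This is a generic sketch and not the specific scheme analysed in the cited paper; the parameter values are hypothetical.

```python
def replication_storage_factor(k: int) -> float:
    """Storage multiplier when k extra full copies tolerate k failures."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return float(k + 1)


def erasure_code_storage_factor(m: int, k: int) -> float:
    """Storage multiplier for m data fragments plus k parity fragments."""
    if m <= 0 or k < 0:
        raise ValueError("m must be positive and k non-negative")
    return (m + k) / m


if __name__ == "__main__":
    # Hypothetical target: tolerate 2 failures, with groups of 4 data fragments.
    print(replication_storage_factor(2))      # 3 copies of everything
    print(erasure_code_storage_factor(4, 2))  # 6 fragments for 4 fragments of data
```

The lower storage factor of erasure codes comes at the price of extra computation and communication when data must be reconstructed, which is why the comparison lists all three costs.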
27. High Availability Requirement
--Metrics for the Cloud (ctnd.2)
● High Availability Metrics
  – $@k: Cost of different targeted levels of availability (1-available, . . . , k-available, where k is the number of failures the system can tolerate).
  – Cost of recovery, expressed as
    ● Time to get system back
    ● Decreased system productivity caused by the hardware failure ($) from customer perspective
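A minimal sketch of the two availability metrics above follows. It assumes that k-availability is obtained by plain replication (k + 1 copies) and that lost productivity is proportional to downtime; both are simplifying assumptions, and the prices are hypothetical.

```python
def cost_at_k(data_gb: float, price_per_gb_month: float, k: int) -> float:
    """$@k: monthly storage cost for k-availability via k + 1 replicas."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return data_gb * price_per_gb_month * (k + 1)


def recovery_cost(downtime_hours: float, loss_per_hour: float) -> float:
    """Customer-side cost of a failure: downtime times hourly productivity loss."""
    return downtime_hours * loss_per_hour


if __name__ == "__main__":
    # Hypothetical: 1000 GB at $0.10 per GB-month, for k = 0, 1, 2.
    for k in range(3):
        print(k, round(cost_at_k(1000.0, 0.10, k), 2))
    print(recovery_cost(2.0, 500.0))
```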
28. Cost Management Requirement
● CSP pricing model
  – Different cloud service price models (IaaS, PaaS, SaaS)
  – e.g.
    ● CPU cost for IaaS: Instance based (Amazon, MS Azure) or CPU-cycles based (Cloud Sites, Google App Engine)
    ● Query processing by Google BigQuery is based on retrieved bytes (columnar storage)
● Cost-Performance Ratio
● Cost-Effectiveness ratio
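The two pricing styles mentioned above lead to different cost formulas for the same workload. The sketch below contrasts an instance-hour model with a bytes-retrieved model. All prices and workload figures are hypothetical placeholders and do not reflect the actual tariffs of Amazon, Azure or BigQuery.

```python
def instance_based_cost(instances: int, hours: float,
                        price_per_instance_hour: float) -> float:
    """Cost when the CSP bills per running instance per hour."""
    return instances * hours * price_per_instance_hour


def bytes_retrieved_cost(bytes_retrieved: float, price_per_tb: float) -> float:
    """Cost when the CSP bills per byte read by queries (1 TB = 10**12 bytes)."""
    return bytes_retrieved / 1e12 * price_per_tb


def cost_performance(cost: float, queries_per_hour: float) -> float:
    """Dollars spent per unit of query-per-hour throughput."""
    if queries_per_hour <= 0:
        raise ValueError("queries_per_hour must be positive")
    return cost / queries_per_hour


if __name__ == "__main__":
    # Hypothetical workload: 4 instances for 2 hours at $0.50 per instance-hour,
    # or 2 TB retrieved at $5 per TB.
    print(instance_based_cost(4, 2.0, 0.50))
    print(bytes_retrieved_cost(2e12, 5.0))
```

Under a bytes-retrieved model, columnar storage directly lowers the bill because queries read only the columns they need.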
29. Related Work
● Benchmarking in the cloud
  – [Gray, MS'08]: TeraSort Benchmark for data sort evaluations,
  – [Cooper et al., SoCC'10]: Yahoo! Cloud Serving Benchmark (YCSB) for evaluating the performance of "key-value" and "cloud" serving stores.
  – [Sobel et al., ICCSA'08]: CloudStone Benchmark for Web 2.0 applications
  – [Bennet et al., KDD'10]: MalStone Benchmark for data mining in the cloud
  – [Ang et al., USENIX'10]: CloudCMP project for CSP comparison
  – [Binnig et al., DBTest'09], [Kossmann et al., SIGMOD'10]: Benchmarking OLTP systems in the cloud
30. Related Work (ctnd.1)
● NoSQL and SQL Technologies Assessment in the cloud
  – [Pavlo et al., SIGMOD'09],
  – [Floratou et al., TPC-TC'11],
● More Specific Issues
  – [Forrester, 2011]: Storage on-premises vs. in the cloud
  – [Nguyen et al., EDBT Workshops'12]: Materialized Views Selection
  – [Moussa, IJWA'12]: OLAP Scenarios in the Cloud and OLAP Workload Taxonomy
31. Conclusion & Future Work
● Keynote scope
  – Overview of DWS
  – Insight into new requirements and new metrics to be considered for benchmarking DWS in the cloud [Moussa, AICCSA'13]
● Research Perspectives
  – Assessment of OLAP systems in the cloud
    ● Amazon RDS
    ● Google BigQuery
    ● MS Azure
    ● ...
32. Research Perspectives
--New OLTP Systems
● Classical Workload Taxonomy
  – OLTP: Transactions, ACID properties
  – OLAP: complex queries, star-joins, grouping, aggregations...
● New OLTP Workload features:
  – OLTP
  – Big Data
  – Real-time analytics
● Examples of systems: Google Spanner, Clustrix, NuoDB and TransLattice
33. Thank you for Your Attention
Q&A?
Rim Moussa
Data Warehouse Systems in the Cloud
N2C'2013, Hammamet, 30th May 2013
15th June 2013