4. A cloud database is a database that typically runs on a cloud
computing platform.
1. Virtual machine image
2. Database as a service (DBaaS)
Cloud Database
4Relational Cloud
5. Relational Cloud 5
Moving tasks from database user to service operator:
• Configuration
• Scalability
• Performance tuning
• Backup
• Privacy
• Access control
• Licensing
• Pay-per-use
What is DBaaS?
6. Some Cloud Databases
• Amazon RDS
• Microsoft SQL Azure (MSSQL)
• Google Cloud SQL (MySQL)
• EnterpriseDB (PostgreSQL)
• Garantia Data (NoSQL)
• MongoLab (MongoDB)
• StormDB
• Xeround
10 Most Useful Cloud Databases
7. Amazon RDS is a web service that makes it easy to set up, operate,
and scale a relational database in the cloud. It provides cost-
efficient and resizable capacity while managing time-consuming
database management tasks, freeing you up to focus on your
applications and business.
• MySQL (2009)
• Oracle (2011)
• SQL Server (2012)
• PostgreSQL (2013)
Amazon Relational Database Service (RDS)
8. • A cloud-based service offering data-storage capabilities
• Based on Microsoft SQL Server
• High availability
• Elastic scale
• Rapid provisioning
• Pay-per-use
Microsoft SQL Azure
11. Relational Cloud 12
Goal: minimize the number of machines required, while meeting
application-level query performance goals
1st approach: DB-in-VM
• Each database on a single VM
• Multiple VMs on a single physical machine
• Requires 2x to 3x more machines
• Delivers 6x to 12x lower performance
Efficient Multi-tenancy
12. Relational Cloud 13
2nd approach:
• Single database server on each machine
• Multiple logical databases on each server
• Relational Cloud periodically determines which databases should
be placed on which machines
• Using a non-linear optimization formulation
• Estimates the resource utilization of multiple databases
Efficient Multi-tenancy
13. Relational Cloud 14
• When a database workload exceeds the capacity of a single
machine
• Query processing (and the corresponding data) is partitioned
amongst multiple nodes
• Workload-aware partitioner
• Automatically analyze complex query workloads
• Map data items to nodes
• Minimize the number of multi-node transactions/statements
Elastic Scalability
14. Relational Cloud 15
• CryptDB
• Prevents administrators from seeing a user's data
• Adjustable security
• Different encryption levels for different types of data
• Only a 22.5% reduction in throughput
Privacy
15. Relational Cloud 16
• Existing unmodified DBMS engines in the back-end nodes
• Each tenant of the system can load one or more databases
• Applications communicate with Relational Cloud using a
standard connectivity layer such as JDBC.
System Design
17. Relational Cloud 18
• Partition each database into one or more pieces when the load
on a database exceeds the capacity of a single machine
• Place the database partitions on the back-end machines
• Minimize the number of machines
• Balance load
• Migrate the partitions as needed without causing downtime
• Replicate the data for availability
• Secure the data and process the queries so that they can run on
untrusted back-ends over encrypted data
Role of front-end nodes
18. Relational Cloud 19
• Goals:
• To scale a single database to multiple nodes
• To enable more granular placement and load balance
• Current strategy is well-suited to OLTP and Web workloads
• OLTP vs. OLAP
• Minimizes the number of multi-node transactions
• Workload-aware partitioning strategy
• Front-end node periodically analyzes query execution traces to
identify sets of tuples that are accessed together
Database Partitioning
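The goal of minimizing multi-node transactions can be made concrete with a small measurement. The sketch below is illustrative only: the trace format (one set of tuple ids per transaction), the function name, and the two candidate assignments are assumptions, not part of Relational Cloud.

```python
from typing import Dict, Iterable, Set


def multi_node_fraction(transactions: Iterable[Set[str]],
                        assignment: Dict[str, int]) -> float:
    """Fraction of transactions whose tuples live on more than one node."""
    txns = list(transactions)
    if not txns:
        return 0.0
    distributed = sum(
        1 for touched in txns
        if len({assignment[tuple_id] for tuple_id in touched}) > 1
    )
    return distributed / len(txns)


# Hypothetical trace: each set holds the tuple ids one transaction touched.
trace = [{"a", "b"}, {"a", "b"}, {"c", "d"}, {"b", "c"}]

# Two hypothetical tuple-to-node assignments over two nodes.
round_robin = {"a": 0, "b": 1, "c": 0, "d": 1}
co_located = {"a": 0, "b": 0, "c": 1, "d": 1}

print(multi_node_fraction(trace, round_robin))  # 1.0
print(multi_node_fraction(trace, co_located))   # 0.25
```

Placing tuples that are accessed together on the same node turns three of the four transactions in this toy trace into single-node transactions.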
19. Relational Cloud 21
• Execution graph (weighted)
• Each node is a tuple or collection of tuples
• An edge is drawn between any two nodes whose tuples are touched
within a single transaction
• G. Karypis and V. Kumar, A fast and high quality multilevel
scheme for partitioning irregular graphs, SIAM J. Sci. Comput.,
20(1), 1998
• Output of the partitioner is an assignment of individual tuples to
logical partitions
Database Partitioning
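A minimal sketch of how such an execution graph might be built from a trace. It assumes the edge weight counts the transactions that touch both tuples; the slide only says the graph is weighted, so that choice, like the trace format and function name, is an assumption. The graph partitioner itself is not shown.

```python
from collections import Counter
from itertools import combinations


def build_coaccess_graph(transactions):
    """Map each unordered tuple pair to the number of transactions touching both."""
    edges = Counter()
    for touched in transactions:
        # Sort so that (u, v) and (v, u) are counted as the same edge.
        for u, v in combinations(sorted(set(touched)), 2):
            edges[(u, v)] += 1
    return edges


# Hypothetical trace: the tuple ids touched by each transaction.
trace = [["t1", "t2"], ["t1", "t2", "t3"], ["t3", "t4"]]
graph = build_coaccess_graph(trace)

for (u, v), weight in sorted(graph.items()):
    print(u, v, weight)
```

Heavy edges mark tuples that a partitioner should try to keep in the same logical partition.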
20. Relational Cloud 22
• Where to dispatch each query?
• Classification problem (Decision Tree)
• Features: the tuple attributes
• Target field: Partition label for each tuple
• Independence from schema layout & foreign key information
• Discover correlations hidden in the data
Database Partitioning
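The classification step can be illustrated with a depth-one decision tree (a decision stump), used here as a stand-in for a full decision-tree learner. The attribute names, sample rows, and partition labels are hypothetical.

```python
from collections import Counter


def majority(labels):
    """Most common label, or None for an empty list."""
    return Counter(labels).most_common(1)[0][0] if labels else None


def fit_stump(rows, labels):
    """Pick the single (attribute, threshold) split with the fewest label errors."""
    best = None
    for attr in rows[0]:
        for threshold in sorted({row[attr] for row in rows}):
            left = [l for row, l in zip(rows, labels) if row[attr] <= threshold]
            right = [l for row, l in zip(rows, labels) if row[attr] > threshold]
            left_label, right_label = majority(left), majority(right)
            errors = (sum(l != left_label for l in left)
                      + sum(l != right_label for l in right))
            if best is None or errors < best[0]:
                best = (errors, attr, threshold, left_label, right_label)
    return best[1:]


def route(stump, row):
    """Predict the partition that holds a tuple with these attribute values."""
    attr, threshold, left_label, right_label = stump
    return left_label if row[attr] <= threshold else right_label


# Features: tuple attributes. Target: the partition label from the partitioner.
rows = [{"w_id": 1, "c_id": 7}, {"w_id": 2, "c_id": 3},
        {"w_id": 3, "c_id": 9}, {"w_id": 4, "c_id": 1}]
labels = [0, 0, 1, 1]

stump = fit_stump(rows, labels)
print(stump)                                 # ('w_id', 2, 0, 1)
print(route(stump, {"w_id": 3, "c_id": 5}))  # 1
```

The learned rule uses only attribute values, so it needs no schema layout or foreign key information: here it discovers that w_id alone explains the partition labels.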
21. Relational Cloud 23
• Big graph problem!
• Database with N tuples
• N nodes
• N² edges
• Existing graph partitioning implementations scale only to a few
tens of millions of nodes
• Heuristic methods:
• Blanket statement removal
• Sampling tuples and transactions
Database Partitioning
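Both heuristics shrink the graph before partitioning. The sketch below shows one possible reading; the 50% cutoff for a "blanket" statement, the function names, and the trace format (one set of tuple ids per transaction) are assumptions.

```python
import random


def drop_blanket_statements(transactions, table_size, max_fraction=0.5):
    """Exclude transactions that touch more than max_fraction of all tuples."""
    return [t for t in transactions if len(t) <= max_fraction * table_size]


def sample_transactions(transactions, rate, seed=0):
    """Keep each transaction independently with probability `rate`."""
    rng = random.Random(seed)
    return [t for t in transactions if rng.random() < rate]


def sample_tuples(transactions, keep_ids):
    """Restrict transactions to a sampled set of tuple ids.

    Transactions left with fewer than two tuples contribute no edges,
    so they are dropped.
    """
    reduced = [t & keep_ids for t in transactions]
    return [t for t in reduced if len(t) >= 2]


# Hypothetical trace over a six-tuple table; the second entry is a full scan.
trace = [{"a", "b"}, {"a", "b", "c", "d", "e", "f"}, {"c", "d"}]

filtered = drop_blanket_statements(trace, table_size=6)
print(filtered)                                   # the full scan is gone
print(sample_tuples(filtered, {"a", "b", "c"}))   # only the a-b transaction remains
```

A scan that touches most of the table would otherwise add an edge between nearly every pair of tuples, which is where much of the N² growth comes from.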
23. Relational Cloud 25
• Monitoring the resource requirements of each workload
• Predicting the load multiple workloads will generate when run
together on a server
• Assigning workloads to physical servers
• Migrating them between physical nodes
Monitoring and consolidation engine: Kairos
Kairos input: existing (non-consolidated) collection of workloads,
and a set of target physical machines
Placement & Migration
24. Relational Cloud 26
1. Resource Monitor
• Through an automated statistics collection process, the resource monitor
captures a number of DBMS and OS statistics
2. Combined Load Predictor
• Developed a non-linear model of CPU, RAM, and disk
• To predict the combined resource requirements when multiple workloads are
consolidated onto a single physical server
• Accuracy at predicting the combined disk requirements of multiple workloads
is up to 30x better than simply assuming that disk I/O combines additively
3. Consolidation Engine
• Kairos uses non-linear optimization techniques to place database partitions on
back-end nodes
Kairos components
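To make the placement problem concrete, here is a deliberately simplified sketch. It swaps in a greedy first-fit-decreasing heuristic for Kairos's non-linear optimization, and it assumes that CPU and RAM demands add up linearly, which is exactly the kind of assumption the combined load predictor is meant to improve on. Workload names, units, and capacities are hypothetical.

```python
from typing import Dict, List, Tuple

Load = Tuple[float, float]  # (CPU cores, RAM in GB), hypothetical units


def first_fit_decreasing(workloads: Dict[str, Load],
                         capacity: Load) -> List[List[str]]:
    """Greedy placement: biggest workloads first, each on the first machine that fits."""
    def size(name: str) -> float:
        cpu, ram = workloads[name]
        return max(cpu / capacity[0], ram / capacity[1])

    machines: List[List[str]] = []   # workload names per machine
    used: List[List[float]] = []     # [cpu, ram] already claimed per machine
    for name in sorted(workloads, key=size, reverse=True):
        cpu, ram = workloads[name]
        if cpu > capacity[0] or ram > capacity[1]:
            raise ValueError(f"{name} does not fit on a single machine")
        for i, (used_cpu, used_ram) in enumerate(used):
            # Simplifying assumption: loads combine additively.
            if used_cpu + cpu <= capacity[0] and used_ram + ram <= capacity[1]:
                machines[i].append(name)
                used[i][0] += cpu
                used[i][1] += ram
                break
        else:
            machines.append([name])
            used.append([cpu, ram])
    return machines


workloads = {"a": (4, 8), "b": (4, 8), "c": (2, 20), "d": (1, 4), "e": (6, 10)}
placement = first_fit_decreasing(workloads, capacity=(8, 32))
print(placement)  # [['e', 'c'], ['a', 'b'], ['d']]
```

Five workloads end up on three machines. A workload too large for any one machine raises an error here; in Relational Cloud that case is handled by partitioning the database first.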
25. Relational Cloud 27
• Relocate database partitions across physical nodes
• Why migration?
1. For scheduled maintenance and administration tasks
2. To respond to load changes
• Live migration: without downtime or reducing performance
• Currently developing and testing a cache-like approach
Live Migration
27. Relational Cloud 29
• Encrypt each value of each row independently into an onion
• Back-end DBMS unable to answer queries
• A design that will allow DBAs to perform tuning tasks without
having any visibility into the actual stored data
• Adjustable Security
CryptDB
30. Relational Cloud 32
• Start the database with all data encrypted with the most private
scheme, RND.
• JDBC client has access to the keys for all onion layers of every
ciphertext stored on the server (by computing them based on a
single master key).
• When the JDBC client driver receives SQL queries from the
application, it computes the onion keys needed by the server to
decrypt certain columns to the maximum privacy level that will
allow the query to execute on the server.
CryptDB
31. Relational Cloud 33
• Security level dynamically adapts based on the queries that
applications make.
• For simplicity, CryptDB encrypts all data items in a column using
the same set of keys.
• Each layer of the onion has a different key (different from any
other column).
CryptDB
32. Relational Cloud 34
• The encryption algorithms are symmetric; in order for the server
to remove a layer, the server must receive the symmetric onion
key for that layer from the JDBC client.
• Once the entire column has been decrypted, the original onion
ciphertext is discarded, since inner onion layers can support a
superset of queries compared to outer layers.
• Key factor in performance: ciphertext expansion
CryptDB
33. Relational Cloud 35
• SELECT i_price, ... FROM item WHERE i_id=N
• Initially each column in the database is separately encrypted in
several layers of encryption, with RND as the outermost layer.
• JDBC client will decrypt the i_id column to DET level 4 by
sending the appropriate decryption key to the server.
• The query will return RND-encrypted ciphertexts to the JDBC
client, which will decrypt them for the application.
CryptDB Example
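The equality lookup above can be sketched with a two-layer toy onion. This is not CryptDB's code and it is not secure: the ciphers are XOR keystreams built from HMAC-SHA256 so the example runs with the standard library only, and the key-derivation format, function names, and sample values are all assumptions.

```python
import hashlib
import hmac
import os


def derive_key(master: bytes, column: str, layer: str) -> bytes:
    """One key per (column, onion layer), all computed from a single master key."""
    return hmac.new(master, f"{column}:{layer}".encode(), hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return out[:length]


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def det_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy DET layer: equal plaintexts give equal ciphertexts."""
    return _xor(plaintext, _keystream(key, b"", len(plaintext)))


det_decrypt = det_encrypt  # XOR with the same keystream undoes itself


def rnd_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy RND layer: a fresh random nonce hides equality."""
    nonce = os.urandom(16)
    return nonce + _xor(plaintext, _keystream(key, nonce, len(plaintext)))


def rnd_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:16], ciphertext[16:]
    return _xor(body, _keystream(key, nonce, len(body)))


def onion_encrypt(master: bytes, column: str, value: bytes) -> bytes:
    """Wrap a value in DET (inner layer), then RND (outer layer)."""
    inner = det_encrypt(derive_key(master, column, "DET"), value)
    return rnd_encrypt(derive_key(master, column, "RND"), inner)


master = os.urandom(32)
# Stored i_id column: RND on the outside, so equal ids look different.
stored = [onion_encrypt(master, "i_id", v) for v in (b"17", b"42", b"42")]

# The client releases only the RND key; the server peels the column to DET.
rnd_key = derive_key(master, "i_id", "RND")
peeled = [rnd_decrypt(rnd_key, c) for c in stored]

# The client sends the DET-encrypted constant N; the server compares ciphertexts.
needle = det_encrypt(derive_key(master, "i_id", "DET"), b"42")
print([i for i, c in enumerate(peeled) if c == needle])  # [1, 2]
```

The server never sees the DET key or the master key, so it can match rows on i_id without learning the ids themselves.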
34. Relational Cloud 36
• SELECT c_discount, w_tax, ... FROM customer,
warehouse WHERE w_id=c_w_id AND c_id=N
• JDBC client needs to decrypt the w_id and c_w_id columns to
DET level 2.
• JDBC client needs to decrypt c_id column to DET level 4 and
send the DET-encrypted value N to the server.
CryptDB Example
35. Relational Cloud 37
• SELECT SUM(ol_amount) FROM order_line WHERE
ol_o_id=N
• Server needs the keys to adjust the encryption of the
ol_amount field to HOM.
CryptDB Example
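The slides do not name the scheme behind HOM. The sketch below assumes an additively homomorphic cipher in the style of Paillier, where multiplying ciphertexts adds the underlying plaintexts. The primes are far too small for real use, amounts are integers (for example, cents), and all names are hypothetical. It needs Python 3.8 or later for the modular inverse via pow.

```python
import math
import secrets

# Toy key material: two small, well-known primes. Not secure.
P, Q = 104729, 1299709
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is fixed to N + 1


def encrypt(m: int) -> int:
    """Paillier-style encryption with generator g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + m * N) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def encrypted_sum(ciphertexts) -> int:
    """Server side: multiply ciphertexts to add the hidden plaintexts."""
    total = 1  # an encryption of 0
    for c in ciphertexts:
        total = (total * c) % N2
    return total


# Hypothetical ol_amount values for one order, in cents.
amounts = [1250, 399, 4200]
column = [encrypt(a) for a in amounts]   # what the server stores
result = encrypted_sum(column)           # computed without any key
print(decrypt(result))                   # 5849, decrypted on the client
```

The server computes SUM(ol_amount) over ciphertexts only; the client decrypts the single aggregated value.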
39. Relational Cloud 41
• Measured the time to process 100,000 statements
(selects/updates)
• Client-side overhead: an average of 25.6 ms per statement
• Server-side overhead:
Experiments: CryptDB Performance
Virtual machine image: cloud platforms allow users to purchase virtual machine instances for a limited time
Users can either upload their own machine image with a database installed on it, or use ready-made machine images that already include an optimized installation of a database
Outsourcing
Efficient multi-tenancy. Given a set of databases and workloads, what is the best way to serve them from a given set of machines?
Elastic scalability. A good DBaaS must support databases and workloads of different sizes. The challenge arises when a database workload exceeds the capacity of a single machine
Privacy. A significant barrier to deploying databases in the cloud is the perceived lack of privacy, which in turn reduces the degree of trust users are willing to place in the system
Each VM contains a separate copy of the OS and database, and each database has its own buffer pool, forces its own log to disk, etc.
JDBC (Java Database Connectivity) is a Java API from Oracle Corporation.
It defines how a client may access a database and provides methods for querying and updating data in a database.
(1) Blanket statement removal, i.e., the exclusion from the graph of occasional statements that scan large portions of the database
Consolidate: combine (a number of things) into a single more effective or logical whole
2: The reason is that two combined workloads perform far fewer I/Os than the sum of their individual I/Os: when combined, workloads share a single log and buffer, and can both benefit from group commit. Moreover, DBMSs perform a substantial amount of non-essential I/O during idle periods.
3: (1) minimize the number of machines required to support a given workload mix, and (2) balance load across the back-end machines
After the client issues a few queries, the server removes any unneeded onion layers of encryption, and from then on, it does not perform any more cryptographic operations.