SlideShare una empresa de Scribd logo
1 de 319
Descargar para leer sin conexión
Felix Gessert
(29.04.2014)
Slides: baqend.com/nosql.pdf
About me
 PhD student (database group university of hamburg)
About me
 PhD student (database group university of hamburg)
Research Project for PhD
About me
 PhD student (database group university of hamburg)
Research Project for PhD
Cloud Database Startup
Outline
• Categories of Cloud
Databases
• Properties
What are Cloud
Databases?
Cloud Databases in
the wild
Research Perspectives
Wrap-up and
literature
2003: GFS (Google)
2006: BigTable (Google)
2007: Dynamo (Amazon)
2007: S3 (Amazon)
2008: SimpleDB (Amazon)
2010: SQL Azure
2011: Parse
2012: DynamoDB (Amazon)
2013: Redshift (Amazon)
Context: What kinds of Cloud
databases are there?
DBaaS
Infrastructure-as-a-Service
Platform-as-a-Service
…Database-as-a-Service
Cloud SQL
Amazon RDS
SQL Azure
Cloudant
MongoHQ
Parse
Orestes
Google F1
DynamoDB BigQuery
EMR GCS
S3
DBaaS
Infrastructure-as-a-Service
Platform-as-a-Service
…Database-as-a-Service
Cloud SQL
Amazon RDS
SQL Azure
Cloudant
MongoHQ
Parse
Orestes
Google F1
DynamoDB
Managed
RDBMSs
BigQuery
EMR GCS
S3
DBaaS
Infrastructure-as-a-Service
Platform-as-a-Service
…Database-as-a-Service
Cloud SQL
Amazon RDS
SQL Azure
Cloudant
MongoHQ
Parse
Orestes
Google F1
DynamoDB
Managed
RDBMSs
Cloud-Deployment
of DBMSs
BigQuery
EMR GCS
S3
DBaaS
Infrastructure-as-a-Service
Platform-as-a-Service
…Database-as-a-Service
Cloud SQL
Amazon RDS
SQL Azure
Cloudant
MongoHQ
Parse
Orestes
Google F1
DynamoDB
Managed
NoSQL DBs
Managed
RDBMSs
Cloud-Deployment
of DBMSs
BigQuery
EMR GCS
S3
DBaaS
Infrastructure-as-a-Service
Platform-as-a-Service
…Database-as-a-Service
Cloud SQL
Amazon RDS
SQL Azure
Cloudant
MongoHQ
Parse
Orestes
Google F1
DynamoDB
Managed
NoSQL DBs
Managed
RDBMSs
Cloud-Deployment
of DBMSs
Cloud-only
DBaaS-Systems
BigQuery
EMR GCS
S3
DBaaS
Infrastructure-as-a-Service
Platform-as-a-Service
…Database-as-a-Service
Cloud SQL
Amazon RDS
SQL Azure
Cloudant
MongoHQ
Parse
Orestes
Google F1
DynamoDB
Managed
NoSQL DBs
Managed
RDBMSs
Cloud-Deployment
of DBMSs
Cloud-only
DBaaS-Systems
BigQuery
EMR
Analytics-as-
a-Service
GCS
S3
DBaaS
Infrastructure-as-a-Service
Platform-as-a-Service
…Database-as-a-Service
Cloud SQL
Amazon RDS
SQL Azure
Cloudant
MongoHQ
Parse
Orestes
Google F1
DynamoDB
Managed
NoSQL DBs
Managed
RDBMSs
Cloud-Deployment
of DBMSs
Cloud-only
DBaaS-Systems
BigQuery
EMR
Analytics-as-
a-Service
GCS
S3
Object
Stores
DBaaS
Infrastructure-as-a-Service
Platform-as-a-Service
…Database-as-a-Service
Cloud SQL
Amazon RDS
SQL Azure
Cloudant
MongoHQ
Parse
Orestes
Google F1
DynamoDB
Managed
NoSQL DBs
Managed
RDBMSs
Storage APIs
Cloud-Deployment
of DBMSs
Cloud-only
DBaaS-Systems
BigQuery
EMR
Analytics-as-
a-Service
GCS
S3
Object
Stores
DBaaS
Infrastructure-as-a-Service
Platform-as-a-Service
…Database-as-a-Service
Cloud SQL
Amazon RDS
SQL Azure
Cloudant
MongoHQ
Parse
Orestes
Google F1
DynamoDB
Managed
NoSQL DBs
Managed
RDBMSs
Backend-as-a-
Service
Storage APIs
Cloud-Deployment
of DBMSs
Cloud-only
DBaaS-Systems
BigQuery
EMR
Analytics-as-
a-Service
GCS
S3
Object
Stores
Cloud-deployed
database
Data-Analytics-
as-a-Service
Database-as-a-
Service
SQL
Managed
RDBMS
Managed
DWH
NoSQL
Managed
NoSQL DB
Backend-as-a-
Service
Proprietary or
Polyglot
Files Object Store
Cloud Databases
Cloud-deployed
database
Data-Analytics-
as-a-Service
Database-as-a-
Service
SQL
Managed
RDBMS
Managed
DWH
NoSQL
Managed
NoSQL DB
Backend-as-a-
Service
Proprietary or
Polyglot
Files Object Store
Cloud Databases
Standard
Interface
Cloud-deployed
database
Data-Analytics-
as-a-Service
Database-as-a-
Service
SQL
Managed
RDBMS
Managed
DWH
NoSQL
Managed
NoSQL DB
Backend-as-a-
Service
Proprietary or
Polyglot
Files Object Store
Cloud Databases
Standard
Interface
Vendor-
specific
Interface
Application Architecture <-> Cloud Database Category
Architectures
Applications
Data
Warehouse
Operative
Database
Reporting Data MiningAnalytics
DataManagementDataAnalytics
Application Architecture <-> Cloud Database Category
Architectures
Applications
Data
Warehouse
Operative
Database
Reporting Data MiningAnalytics
DataManagementDataAnalytics
DBaaS
Application Architecture <-> Cloud Database Category
Architectures
Applications
Data
Warehouse
Operative
Database
Reporting Data MiningAnalytics
DataManagementDataAnalytics
DBaaSProbaby not.
Cloud-Deployed Database
IaaS-Cloud
Cloud-Deployed Database
IaaS-Cloud
Cloud-deploy your
favourite database system
Cloud-Deployed Database
IaaS-Cloud
Cloud-deploy your
favourite database system
Does not solve:
Provisioning, Backups, Security,
Scaling, Elasticity, Performance
Tuning, Failover, Replication, ...
Managed RDBMS/DWH/NoSQL DB
IaaS-Cloud
DBaaS-Provider
Managed RDBMS/DWH/NoSQL DB
IaaS-Cloud
RDBMS DWH NoSQL DB
DBaaS-Provider
Provisioning, Backups, Security,
Scaling, Elasticity, Performance
Tuning, Failover, Replication, ...
Managed RDBMS/DWH/NoSQL DB
IaaS-Cloud
RDBMS DWH NoSQL DB
DBaaS-Provider
Managed RDBMS/DWH/NoSQL DB
IaaS-Cloud
RDBMS DWH NoSQL DB
DBaaS-Provider
SQL Azure
Google
Cloud SQL
RDBMS
Managed RDBMS/DWH/NoSQL DB
IaaS-Cloud
RDBMS DWH NoSQL DB
DBaaS-Provider
SQL Azure
Google
Cloud SQL
RDBMSNoSQLDB
Managed RDBMS/DWH/NoSQL DB
IaaS-Cloud
RDBMS DWH NoSQL DB
DBaaS-Provider
Amazon Redshift
SQL Azure
Google
Cloud SQL
RDBMSNoSQLDBDWH
Proprietary DB/Object Store
Cloud
Black-Box Database
or file system
Managed by
Cloud Provider
Provider‘sAPI
Proprietary DB/Object Store
Cloud
Black-Box Database
or file system
Managed by
Cloud Provider
Provider‘sAPI
Amazon
SimpleDB
Google Cloud
Datastore
Azure Tables
Database.com
BigTable, Megastore, Spanner, F1, Dynamo,
PNuts, Relational Cloud, …
ProprietaryDB
Proprietary DB/Object Store
Cloud
Black-Box Database
or file system
Managed by
Cloud Provider
Provider‘sAPI
Amazon
SimpleDB
Google Cloud
Storage
Azure Blob
Storage
Google Cloud
Datastore
Azure Tables
Openstack
Swift
Database.com
BigTable, Megastore, Spanner, F1, Dynamo,
PNuts, Relational Cloud, …
ProprietaryDBObjectStore
Backend-as-a-Service, Polyglot Persistence
Service
IaaS-Cloud
Backend API
Service-Layer
Data API
Backend-as-a-Service, Polyglot Persistence
Service
IaaS-Cloud
Backend API
Service-Layer
Data API
Authentication,
Users, Validation,etc.
Maps to (different)
databases
Backend-as-a-Service, Polyglot Persistence
Service
IaaS-Cloud
Backend API
Service-Layer
Data API
Realtime
BaaS
Backend-as-a-Service, Polyglot Persistence
Service
IaaS-Cloud
Backend API
Service-Layer
Data API
Realtime
BaaS
BaaS
AppCelerator
Cloud
Analytics-as-a-Service
Cloud
Analytics Cluster
Provisioning,
Data Ingest
Analytics-as-a-Service
Cloud
Analytics Cluster
Provisioning,
Data Ingest
Azure
HDInsight
Amazon Elastic
MapReduce
Hadoop
Analytics-as-a-Service
Cloud
Analytics Cluster
Provisioning,
Data Ingest
Azure
HDInsight
Google
BigQuery
Google
Prediction API
Amazon Elastic
MapReduce
HadoopCustom
DBaaS: Common Aspects
#1 Metric: Total Cost
Daniela Florescu and Donald Kossmann “Rethinking cost and
performance of database systems”, SIGMOD Rec. 2009.
DBaaS: Common Aspects
#1 Metric: Total Cost
Maximum utilization of
available hardware:
Multi-Tenancy
Daniela Florescu and Donald Kossmann “Rethinking cost and
performance of database systems”, SIGMOD Rec. 2009.
DBaaS: Common Aspects
 Multi-Tenancy - four common approaches:
T. Kiefer, W. Lehner “Private table database virtualization for dbaas”
UCC, 2011
Private OS Private Process/DB Private Schema Shared Schema
DBaaS: Common Aspects
 Multi-Tenancy - four common approaches:
T. Kiefer, W. Lehner “Private table database virtualization for dbaas”
UCC, 2011
Private OS
VM
Hardware Resources
Database Process
Database
Schema
Private Process/DB Private Schema Shared Schema
e.g. Amazon RDS
DBaaS: Common Aspects
 Multi-Tenancy - four common approaches:
T. Kiefer, W. Lehner “Private table database virtualization for dbaas”
UCC, 2011
Private OS
VM
Hardware Resources
Database Process
Database
Schema
Private Process/DB Private Schema
VM
Hardware Resources
Database Process
Database
Schema
Shared Schema
e.g. Amazon RDS e.g. MongoHQ
DBaaS: Common Aspects
 Multi-Tenancy - four common approaches:
T. Kiefer, W. Lehner “Private table database virtualization for dbaas”
UCC, 2011
Private OS
VM
Hardware Resources
Database Process
Database
Schema
Private Process/DB Private Schema
VM
Hardware Resources
Database Process
Database
Schema
VM
Hardware Resources
Database Process
Database
Schema
Shared Schema
e.g. Amazon RDS e.g. MongoHQ e.g. Google DataStore
DBaaS: Common Aspects
 Multi-Tenancy - four common approaches:
T. Kiefer, W. Lehner “Private table database virtualization for dbaas”
UCC, 2011
Private OS
VM
Hardware Resources
Database Process
Database
Schema
Private Process/DB Private Schema
VM
Hardware Resources
Database Process
Database
Schema
VM
Hardware Resources
Database Process
Database
Schema
Shared Schema
VM
Hardware Resources
Database Process
Database
Schema
Virtual Schema
e.g. Amazon RDS e.g. MongoHQ e.g. Google DataStore Most SaaS Apps
DBaaS: Common Aspects
 Multi-Tenancy - four common approaches:
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud”
Springer, 2013
Private OS
Private
Process/DB
Private Schema
Shared Schema
App.
indep.
Isolation
Ressource
Util.
Maintenance,
Provisioning
 Billing Models:
DBaaS: Common Aspects
Usage
Account
 Billing Models:
DBaaS: Common Aspects
Usage
Account
Pay-per-use
Parameters: Network, Bandwidth,
Storage, CPU, Requests, etc.
Payment: Pre-Paid, Post-Paid
Variants: On-Demand, Auction, Reserved
e.g. DynamoDB
 Billing Models:
DBaaS: Common Aspects
Usage
Account
End of
month
Plan-based
Parameters: Allocated Plan (e.g.
2 instances + X GB storage)
e.g. MongoHQ
 Billing Models:
DBaaS: Common Aspects
Usage
Account
End of
month
Plan-based
Parameters: Allocated Plan (e.g.
2 instances + X GB storage)
Free Tier: free plan or free initial
account credit
e.g. MongoHQ
DBaaS: Common Aspects
Database-a-
a-Service
Authentication
Authorization
API
DBaaS: Common Aspects
Database-a-
a-Service
Authentication
Authorization
API
Authenticate
DBaaS: Common Aspects
Internal Schemes External Identity
Provider
Federated Identity
(SSO)
e.g. Amazon IAM e.g. OpenID e.g. SAML
Database-a-
a-Service
Authentication
Authorization
API
Authenticate
DBaaS: Common Aspects
Internal Schemes External Identity
Provider
Federated Identity
(SSO)
e.g. Amazon IAM e.g. OpenID e.g. SAML
Used extensively
Database-a-
a-Service
Authentication
Authorization
API
Authenticate
DBaaS: Common Aspects
Internal Schemes External Identity
Provider
Federated Identity
(SSO)
e.g. Amazon IAM e.g. OpenID e.g. SAML
Used extensively
Database-a-
a-Service
Authentication
Authorization
API
Authenticate
Token
DBaaS: Common Aspects
Internal Schemes External Identity
Provider
Federated Identity
(SSO)
e.g. Amazon IAM e.g. OpenID e.g. SAML
Used extensively
Database-a-
a-Service
Authentication
Authorization
API
Authenticate
Token
Authenticated Request
DBaaS: Common Aspects
Internal Schemes External Identity
Provider
Federated Identity
(SSO)
e.g. Amazon IAM e.g. OpenID e.g. SAML
Used extensively
User-based Access
Control
Role-based Access
Control
Policies
e.g. Amazon S3 ACLs e.g. Amazon IAM e.g. XACML
Database-a-
a-Service
Authentication
Authorization
API
Authenticate
Token
Authenticated Request
DBaaS: Common Aspects
Internal Schemes External Identity
Provider
Federated Identity
(SSO)
e.g. Amazon IAM e.g. OpenID e.g. SAML
Used extensively
User-based Access
Control
Role-based Access
Control
Policies
e.g. Amazon S3 ACLs e.g. Amazon IAM e.g. XACML
Database-a-
a-Service
Authentication
Authorization
API
Authenticate
Token
Authenticated Request
Response
DBaaS: Common Aspects
Internal Schemes External Identity
Provider
Federated Identity
(SSO)
e.g. Amazon IAM e.g. OpenID e.g. SAML
Used extensively
User-based Access
Control
Role-based Access
Control
Policies
e.g. Amazon S3 ACLs e.g. Amazon IAM e.g. XACML
Database-a-
a-Service
Authentication
Authorization
API
Authenticate
Token
Authenticated Request
Response
DBaaS: Common Aspects
Internal Schemes External Identity
Provider
Federated Identity
(SSO)
e.g. Amazon IAM e.g. OpenID e.g. SAML
Used extensively
User-based Access
Control
Role-based Access
Control
Policies
e.g. Amazon S3 ACLs e.g. Amazon IAM e.g. XACML
Database-a-
a-Service
Authentication
Authorization
API
Authenticate
Token
Authenticated Request
Response
Federated ACLs
M. Decat, B. Lagaisse, et al. “Toward efficient and confidentiality-aware
federation of access control policies “, DOA Trusted Cloud 2013
• Imagine ACL: „Patient data can only be
accessed by treating physician“
• Idea: decompose policy and evaluate parts near
data owner
 Service Level Agreements
DBaaS: Common Aspects
SLA
 Service Level Agreements
DBaaS: Common Aspects
SLA
Legal Part
1. Fees
2. Penalties
Technical Part
1. SLO
2. SLO
3. SLO
 Service Level Agreements
DBaaS: Common Aspects
SLA
Legal Part
1. Fees
2. Penalties
Technical Part
1. SLO
2. SLO
3. SLO
Service Level Objectives:
• Availability
• Durability
• Consistency/Staleness
• Query Response Time
 SLAs – achieved through Workload Management
DBaaS: Common Aspects
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud”
Springer, 2013
 SLAs – achieved through Workload Management
DBaaS: Common Aspects
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud”
Springer, 2013
 SLAs – achieved through Workload Management
DBaaS: Common Aspects
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud”
Springer, 2013
 SLAs – achieved through Workload Management
DBaaS: Common Aspects
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud”
Springer, 2013
 SLAs – achieved through Workload Management
DBaaS: Common Aspects
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud”
Springer, 2013
Maximize:
 SLAs – achieved through Workload Management
DBaaS: Common Aspects
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud”
Springer, 2013
 SLAs – achieved through Workload Management
DBaaS: Common Aspects
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud”
Springer, 2013
QOS for NoSQL DBs
Y. Zhu et al. “Scheduling with Freshness and Performance Guarantees for
Web Applications in the Cloud“, CRPIT
Old:
Workload management in RDBMs (DB2 and
Oracle)
New:
Use well-known scheduling algorithms for queries
in replicated DBs
 Resource Provisionig
 Goal: Resources ⇔ SLAs
DBaaS: Common Aspects
T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for
Elastic Applications in Cloud Environments”. Technical Report, 2013
Resources
Time
 Resource Provisionig
 Goal: Resources ⇔ SLAs
DBaaS: Common Aspects
T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for
Elastic Applications in Cloud Environments”. Technical Report, 2013
Resources
Time
Expected
Load
 Resource Provisionig
 Goal: Resources ⇔ SLAs
DBaaS: Common Aspects
T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for
Elastic Applications in Cloud Environments”. Technical Report, 2013
Resources
Time
Expected
Load
Provisioned Resources:
• #No of Shard- or Replica
servers
• Computing, Storage,
Network Capacities
 Resource Provisionig
 Goal: Resources ⇔ SLAs
DBaaS: Common Aspects
T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for
Elastic Applications in Cloud Environments”. Technical Report, 2013
Resources
Time
Actual
Load
 Resource Provisionig
 Goal: Resources ⇔ SLAs
DBaaS: Common Aspects
T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for
Elastic Applications in Cloud Environments”. Technical Report, 2013
Resources
Time
Actual
Load
Overprovisioning:
• SLAs met
• Excess Capacities
 Resource Provisionig
 Goal: Resources ⇔ SLAs
DBaaS: Common Aspects
T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for
Elastic Applications in Cloud Environments”. Technical Report, 2013
Resources
Time
Actual
Load
Overprovisioning:
• SLAs met
• Excess Capacities
Underprovisioning:
• SLAs violated
• Usage maximized
 Resource Provisionig
 Goal: Resources ⇔ SLAs
DBaaS: Common Aspects
T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for
Elastic Applications in Cloud Environments”. Technical Report, 2013
Resources
Time
Actual
Load
Overprovisioning:
• SLAs met
• Excess Capacities
Underprovisioning:
• SLAs violated
• Usage maximized
SmartSLA
P. Xiong: “Intelligent management of virtualized resources for database
systems in cloud environment”, ICDE 2011
Solution: machine learning (regression + boosting)
for prediction  choose allocation that minimizes
SLA penalties
Resource
allocation
Database
performance
Learn
Mapping
Functional
Requirements
Scan-Querys
Conditional Updates
Transactions
Query by Example
Joins
Analytics
Elasticity
Consistency
Read-Latency
Write-Latency
Write-Throughput
Scalability of Data Volume
Read Scalability
Read-Availability
Write-Availability
Non-Functional
Requirements
Durability
Write Scalability
DBaaS: General Considerations
Functional
Requirements
Scan-Querys
Conditional Updates
Transactions
Query by Example
Joins
Analytics
Elasticity
Consistency
Read-Latency
Write-Latency
Write-Throughput
Scalability of Data Volume
Read Scalability
Read-Availability
Write-Availability
Non-Functional
Requirements
Durability
Write Scalability
DBaaS: General Considerations
Functional
Requirements
Scan-Querys
Conditional Updates
Transactions
Query by Example
Joins
Analytics
Elasticity
Consistency
Read-Latency
Write-Latency
Write-Throughput
Scalability of Data Volume
Read Scalability
Read-Availability
Write-Availability
Non-Functional
Requirements
Durability
Write Scalability
DBaaS: General Considerations
Functional
Requirements
Scan-Querys
Conditional Updates
Transactions
Query by Example
Joins
Analytics
Elasticity
Consistency
Read-Latency
Write-Latency
Write-Throughput
Scalability of Data Volume
Read Scalability
Read-Availability
Write-Availability
Non-Functional
Requirements
Durability
Write Scalability
DBaaS: General Considerations
aaS
Functional
Requirements
Scan-Querys
Conditional Updates
Transactions
Query by Example
Joins
Analytics
Elasticity
Consistency
Read-Latency
Write-Latency
Write-Throughput
Scalability of Data Volume
Read Scalability
Read-Availability
Write-Availability
Non-Functional
Requirements
Durability
Write Scalability
DBaaS: General Considerations
aaS
Questions to ask:
• Which requirements are met by the DB?
• Which are met by the provider?  SLAs
Outline
Examples of different cloud
database systems:
• Cloud-deployed
• Managed DBMS
• SQL
• NoSQL
• Proprietary
• BaaS
What are Cloud
Databases?
Cloud Databases in
the wild
Research Perspectives
Wrap-up and
literature
 Idea: Run (mostly) unmodified DB on IaaS
Cloud-Deployed DB
 Method I: DIY
 Method II: Deployment Tools
 Method III: Marketplaces
 Idea: Run (mostly) unmodified DB on IaaS
Cloud-Deployed DB
 Method I: DIY
 Method II: Deployment Tools
 Method III: Marketplaces
1. Provision VM(s)
 Idea: Run (mostly) unmodified DB on IaaS
Cloud-Deployed DB
 Method I: DIY
 Method II: Deployment Tools
 Method III: Marketplaces
1. Provision VM(s) 2. Install DBMS (manual, script,
Chef, Puppet)
 Idea: Run (mostly) unmodified DB on IaaS
Cloud-Deployed DB
 Method I: DIY
 Method II: Deployment Tools
 Method III: Marketplaces
> whirr launch-cluster --config
hbase.properties
Login, cluster-size etc. Amazon EC2
1. Provision VM(s) 2. Install DBMS (manual, script,
Chef, Puppet)
 Idea: Run (mostly) unmodified DB on IaaS
Cloud-Deployed DB
 Method I: DIY
 Method II: Deployment Tools
 Method III: Marketplaces
> whirr launch-cluster --config
hbase.properties
Login, cluster-size etc. Amazon EC2
1. Provision VM(s) 2. Install DBMS (manual, script,
Chef, Puppet)
 Idea: Run preconfigured DB on IaaS
AWS Marketplace AWS
Marketplace
Model:
Cloud-Deployed
Pricing:
Instance +
Volume +
License
Underlying DB:
Choosable
API:
DB-specific
 Idea: Run preconfigured DB on IaaS
AWS Marketplace AWS
Marketplace
Model:
Cloud-Deployed
Pricing:
Instance +
Volume +
License
Underlying DB:
Choosable
API:
DB-specific
 Idea: Run preconfigured DB on IaaS
AWS Marketplace AWS
Marketplace
Model:
Cloud-Deployed
Pricing:
Instance +
Volume +
License
Underlying DB:
Choosable
API:
DB-specific
 Idea: Run preconfigured DB on IaaS
AWS Marketplace AWS
Marketplace
Model:
Cloud-Deployed
Pricing:
Instance +
Volume +
License
Underlying DB:
Choosable
API:
DB-specific
 Idea: Run preconfigured DB on IaaS
AWS Marketplace AWS
Marketplace
Model:
Cloud-Deployed
Pricing:
Instance +
Volume +
License
Underlying DB:
Choosable
API:
DB-specific
 Idea: Run preconfigured DB on IaaS
AWS Marketplace AWS
Marketplace
Model:
Cloud-Deployed
Pricing:
Instance +
Volume +
License
Underlying DB:
Choosable
API:
DB-specific
 Idea: Run preconfigured DB on IaaS
AWS Marketplace AWS
Marketplace
Model:
Cloud-Deployed
Pricing:
Instance +
Volume +
License
Underlying DB:
Choosable
API:
DB-specific
 Idea: Run preconfigured DB on IaaS
AWS Marketplace AWS
Marketplace
Model:
Cloud-Deployed
Pricing:
Instance +
Volume +
License
Underlying DB:
Choosable
API:
DB-specific
Bad:
• No Clusters
• Not managed (automatic Updates, Snapshots, etc.)
• Private OS Multi-Tenancy  Bad Resource Usage
Good:
• Easy to get started
Amazon Elastic MapReduce EMR
Model:
Analytics-aaS
Pricing:
Infrastructure
API:
Hadoop
Amazon Elastic
MapReduce
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud”
Springer, 2013
Amazon Elastic MapReduce EMR
Model:
Analytics-aaS
Pricing:
Infrastructure
API:
Hadoop
Amazon Elastic
MapReduce
Provisions
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud”
Springer, 2013
Amazon Elastic MapReduce EMR
Model:
Analytics-aaS
Pricing:
Infrastructure
API:
Hadoop
Amazon Elastic
MapReduce
Job Tracker
Task Tracker +
HDFS Data Node
Task Tracker
Provisions
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud”
Springer, 2013
Amazon Elastic MapReduce EMR
Model:
Analytics-aaS
Pricing:
Infrastructure
API:
Hadoop
Amazon Elastic
MapReduce
Data Source
and Sink
Job Tracker
Task Tracker +
HDFS Data Node
Task Tracker
Provisions
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud”
Springer, 2013
Amazon Elastic MapReduce EMR
Model:
Analytics-aaS
Pricing:
Infrastructure
API:
Hadoop
Amazon Elastic
MapReduce
Submits Hadoop Jobs as:
• JAR
• Streaming
• Cascading
• Pig
• Hive
• Impala
Data Source
and Sink
Job Tracker
Task Tracker +
HDFS Data Node
Task Tracker
Provisions
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud”
Springer, 2013
Amazon Elastic MapReduce EMR
Model:
Analytics-aaS
Pricing:
Infrastructure
API:
Hadoop
Amazon Elastic
MapReduce
Submits Hadoop Jobs as:
• JAR
• Streaming
• Cascading
• Pig
• Hive
• Impala
Data Source
and Sink
Job Tracker
Task Tracker +
HDFS Data Node
Task Tracker
Provisions
W. Lehner, U. Sattler “Web-scale Data Management for the Cloud”
Springer, 2013
• No data locality with S3
• AWS Import/Export: send your HDD
• HBase Integration
• Compatible with Spot and Reserved Instances
• Similar: Azure HDInsight
 Idea: Web-scale analysis of nested data
Google BigQuery BigQuery
Model:
Analytics-aaS
Pricing:
Storage + GBs
Processed
API:
REST
Google
BigQuery
 Idea: Web-scale analysis of nested data
Google BigQuery BigQuery
Model:
Analytics-aaS
Pricing:
Storage + GBs
Processed
API:
REST
Google
BigQuery
 Idea: Web-scale analysis of nested data
Google BigQuery BigQuery
Model:
Analytics-aaS
Pricing:
Storage + GBs
Processed
API:
REST
Google
BigQuery
Dremel
Melnik et al. “Dremel: Interactive analysis
of web-scale datasets”, VLDB 2010
Idea:
Multi-Level execution tree on
nested columnar data format
(≥100 nodes)
 Idea: Web-scale analysis of nested data
Google BigQuery BigQuery
Model:
Analytics-aaS
Pricing:
Storage + GBs
Processed
API:
REST
Google
BigQuery
Dremel
Melnik et al. “Dremel: Interactive analysis
of web-scale datasets”, VLDB 2010
Idea:
Multi-Level execution tree on
nested columnar data format
(≥100 nodes)
• SLA: 99.9% uptime / month
• Fundamentally different from relational DWHs
and MapReduce
• Design copied by Apache Drill, Impala, Shark
 Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
 Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
 Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
• Synchronous Replication
• Automatic Failover
 Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
• Synchronous Replication
• Automatic Failover
99,95% uptime SLA
 Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
• Synchronous Replication
• Automatic Failover
99,95% uptime SLA
Provisioned IOPS: access to
EBS volumes network-
optimized (up to 4000 IOPS)
 Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
 Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
EC2 instances: Up to 32
Cores, 244 GB RAM, 10 GbE
 Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
EC2 instances: Up to 32
Cores, 244 GB RAM, 10 GbE
Minor Version Upgrades are
performed without downtime
 Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
 Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
Backups are automated and
scheduled
 Relational Database Service
Amazon RDS RDS
Model:
Managed RDBMS
Pricing:
Instance + Volume
+ License
Underlying DB:
MySQL, Postgres,
MSSQL, Oracle
API:
DB-specific
Backups are automated and
scheduled
• Support for (asynchronous) Read Replicas
• Administration: Web-based or SDKs
• Only RDBMSs
• “Analytic Brother“ of RDS: RedShift (PDWH)
 Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
 Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
 Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
 Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
 Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
 Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
Cloud SQL Server
P. Bernstein et al. “Adapting Microsoft SQL server
for cloud computing”, ICDE 2011
• Multi-Tenant MSSQL
• Paxos-like commit protocol for
consistent replication
 Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
Keyless Table Group: regular
database
Keyed Table Group: partitioned
by row key
Cloud SQL Server
P. Bernstein et al. “Adapting Microsoft SQL server
for cloud computing”, ICDE 2011
• Multi-Tenant MSSQL
• Paxos-like commit protocol for
consistent replication
 Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
Keyless Table Group: regular
database
Keyed Table Group: partitioned
by row key
Consistency unit
(ACID boundary)
Cloud SQL Server
P. Bernstein et al. “Adapting Microsoft SQL server
for cloud computing”, ICDE 2011
• Multi-Tenant MSSQL
• Paxos-like commit protocol for
consistent replication
 Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
Keyless Table Group: regular
database
Keyed Table Group: partitioned
by row key
Consistency unit
(ACID boundary)
Automatic Partitioning
for Keyed Table Groups
Cloud SQL Server
P. Bernstein et al. “Adapting Microsoft SQL server
for cloud computing”, ICDE 2011
• Multi-Tenant MSSQL
• Paxos-like commit protocol for
consistent replication
 Similar to RDS
Microsoft SQL Azure SQL Azure
Model:
Managed RDBMS
Pricing:
Database size
Underlying DB:
MSSQL Server
API:
T-SQL/TDS
SQL Azure
Keyless Table Group: regular
database
Keyed Table Group: partitioned
by row key
Consistency unit
(ACID boundary)
Automatic Partitioning
for Keyed Table Groups
Cloud SQL Server
P. Bernstein et al. “Adapting Microsoft SQL server
for cloud computing”, ICDE 2011
• Multi-Tenant MSSQL
• Paxos-like commit protocol for
consistent replication
• SLA: 99.9% uptime / month
• Usually Cheaper than RDS (Multi-Tenancy)
• Smaller Databases (max. 150 GB)
• Rich MSSQL server tooling
• Keyed Table Group internal feature only
 MySQL for Google App Engine PaaS
 Support for: patching, replication, backup
 SLA: 99,95 % uptime / month






Other RDBMS services
Google
Cloud SQL Google Cloud SQL
Pricing:
Database size
Underlying DB:
MySQL
 MySQL for Google App Engine PaaS
 Support for: patching, replication, backup
 SLA: 99,95 % uptime / month
 Postgres for Heroku PaaS
 Hosted on EC2
 No SLAs



Other RDBMS services
Google
Cloud SQL Google Cloud SQL
Pricing:
Database size
Underlying DB:
MySQL
Heroku Postgres
Pricing:
Plan based
Underlying DB:
Postgres
 MySQL for Google App Engine PaaS
 Support for: patching, replication, backup
 SLA: 99,95 % uptime / month
 Postgres for Heroku PaaS
 Hosted on EC2
 No SLAs
 MySQL for OpenStack (Icehouse)
 Under development (HP driven)
 VM ⇔ Tenant
Other RDBMS services
Google
Cloud SQL Google Cloud SQL
Pricing:
Database size
Underlying DB:
MySQL
Heroku Postgres
Pricing:
Plan based
Underlying DB:
Postgres
Trove
Pricing:
Own Hardware
Underlying DB:
MySQL
Trove
 MySQL for Google App Engine PaaS
 Support for: patching, replication, backup
 SLA: 99,95 % uptime / month
 Postgres for Heroku PaaS
 Hosted on EC2
 No SLAs
 MySQL for OpenStack (Icehouse)
 Under development (HP driven)
 VM ⇔ Tenant
Other RDBMS services
Google
Cloud SQL Google Cloud SQL
Pricing:
Database size
Underlying DB:
MySQL
Heroku Postgres
Pricing:
Plan based
Underlying DB:
Postgres
Trove
Pricing:
Own Hardware
Underlying DB:
MySQL
Trove
Evaluation of Cloud RDBMSs
D. Kossmann,T. Kraska: An evaluation of alternative architectures for transaction processing in the cloud”, Sigmod 2010
TPC-W Benchmark (Online Shop), 2010, Emulated Browsers / RPS:
HBase Wide-
Column
CP Over
Row Key
~700 1/4 Apache
(EMR)
MongoDB Doc-
ument
CP yes >100
<500
4/4 GPL
Riak Key-
Value
AP ~60 3/4 Apache
(Softlayer)
Cassandra Wide-
Column
AP With
Comp.
Index
>300
<1000
2/4 Apache
Redis Key-
Value
CA Through
Lists,
etc.
manual N/A 4/4 BSD
Managed NoSQL services
Model CAP Scans
Sec.
Indices
Largest
Cluster
Lic.
Lear-
ning DBaaS
HBase Wide-
Column
CP Over
Row Key
~700 1/4 Apache
(EMR)
MongoDB Doc-
ument
CP yes >100
<500
4/4 GPL
Riak Key-
Value
AP ~60 3/4 Apache
(Softlayer)
Cassandra Wide-
Column
AP With
Comp.
Index
>300
<1000
2/4 Apache
Redis Key-
Value
CA Through
Lists,
etc.
manual N/A 4/4 BSD
Managed NoSQL services
Model CAP Scans
Sec.
Indices
Largest
Cluster
Lic.
Lear-
ning DBaaS
And there are many more:
• CouchDB (e.g. Cloudant)
• CouchBase (e.g. KuroBase Beta)
• ElasticSearch(e.g. Bonsai)
• Solr (e.g. WebSolr)
• …
MongoHQ MongoHQ
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
MongoDB
API:
Mongo, REST (beta)
 EC2-based MongoDB-as-a-Serivce

MongoHQ MongoHQ
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
MongoDB
API:
Mongo, REST (beta)
 EC2-based MongoDB-as-a-Serivce

MongoHQ MongoHQ
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
MongoDB
API:
Mongo, REST (beta)
 EC2-based MongoDB-as-a-Serivce

MongoHQ MongoHQ
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
MongoDB
API:
Mongo, REST (beta)
 EC2-based MongoDB-as-a-Serivce

MongoHQ MongoHQ
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
MongoDB
API:
Mongo, REST (beta)
 EC2-based MongoDB-as-a-Serivce

MongoHQ MongoHQ
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
MongoDB
API:
Mongo, REST (beta)
 EC2-based MongoDB-as-a-Serivce

Private Process Multi-
Tenancy
Scale-Up Strategy: RAM,
IOPs and CPU increased at
runtime
Maximum Size: 1TB
Free Tier
MongoHQ MongoHQ
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
MongoDB
API:
Mongo, REST (beta)
 EC2-based MongoDB-as-a-Serivce

Private Process Multi-
Tenancy
Scale-Up Strategy: RAM,
IOPs and CPU increased at
runtime
Maximum Size: 1TB
Free Tier
VM-Deployment (EC2):
M1.large on EC2: $128.10
on EC2 with 1-yr-Res.: $30
With MongoHQ: $637
MongoHQ MongoHQ
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
MongoDB
API:
Mongo, REST (beta)
 EC2-based MongoDB-as-a-Serivce
 #1 thing that should never happen:
MongoHQ
 Problem: Scalability
Client
Client
configconfigconfig
mongos
Replica Set
Master
Slave
Slave
mongos
MongoHQ
 Problem: Scalability
Client
Client
configconfigconfig
mongos
Replica Set
Master
Slave
Slave
mongos
What if Writes / second or
data volume become
bottleneck?
MongoHQ
 Problem: Scalability
Client
Client
configconfigconfig
mongos
Replica Set
Replica Set
Master
Slave
Slave
Master
Slave
Slave
mongos
Sharding (Scale Out)
• Dynamic Scaling: Tenant adds
Replica Set
• Elastic Scaling: Provider
adds/removes Replica Set
MongoHQ: Only manual sharding
on “contact us” basis
MongoHQ
 Problem: Scalability
Client
Client
configconfigconfig
mongos
Replica Set
Replica Set
Master
Slave
Slave
Master
Slave
Slave
mongos
Sharding (Scale Out)
• Dynamic Scaling: Tenant adds
Replica Set
• Elastic Scaling: Provider
adds/removes Replica Set
MongoHQ: Only manual sharding
on “contact us” basis
• Bad: no SLAs, no horizontal scaling
• Good: Solid Dashboard, Backups, Replication,
integration with Heroku
• Competitors: MongoLab, ObjectRocket,
MongoSoup
ElastiCache ElastiCache
Model:
Managed NoSQL
Pricing:
Infrastructure
Underlying DB:
Memcache, Redis
API:
DB, REST
(management)
 „RDS for Memcache and Redis“
ElastiCache ElastiCache
Model:
Managed NoSQL
Pricing:
Infrastructure
Underlying DB:
Memcache, Redis
API:
DB, REST
(management)
 „RDS for Memcache and Redis“
Memcache or RedisMemcache can be run
as a cluster (client-
side sharding)
Limited choice of
instance types
ElastiCache ElastiCache
Model:
Managed NoSQL
Pricing:
Infrastructure
Underlying DB:
Memcache, Redis
API:
DB, REST
(management)
 „RDS for Memcache and Redis“
Announced last
Saturday: Snapshots
ElastiCache ElastiCache
Model:
Managed NoSQL
Pricing:
Infrastructure
Underlying DB:
Memcache, Redis
API:
DB, REST
(management)
 „RDS for Memcache and Redis“
ElastiCache ElastiCache
Model:
Managed NoSQL
Pricing:
Infrastructure
Underlying DB:
Memcache, Redis
API:
DB, REST
(management)
 „RDS for Memcache and Redis“
$ elasticache-modify-cache-parameter-group xy
AWS CLI tools and SDKs
offer a superset of
Dashboard functions
 Many Hosted NoSQL
DbaaS Providers
represented

Heroku DBaaS Addons
 Many Hosted NoSQL
DbaaS Providers
represented
 And Search
Heroku DBaaS Addons
Heroku Redis2Go example Redis2Go
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
Redis
API:
Redis
Create Heroku App:
Heroku Redis2Go example Redis2Go
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
Redis
API:
Redis
Create Heroku App:
Add Redis2Go Addon:
Heroku Redis2Go example Redis2Go
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
Redis
API:
Redis
Create Heroku App:
Add Redis2Go Addon:
Use Connection URL (environment variable):
Heroku Redis2Go example Redis2Go
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
Redis
API:
Redis
Create Heroku App:
Add Redis2Go Addon:
Use Connection URL (environment variable):
Deploy:
Heroku Redis2Go example Redis2Go
Model:
Managed NoSQL
Pricing:
Plan-based
Underlying DB:
Redis
API:
Redis
Create Heroku App:
Add Redis2Go Addon:
Use Connection URL (environment variable):
Deploy:
• Very simple
• Only suited for small to medium
applications (no SLAs, limited control)
SimpleDB Table-
Store
CP Yes (as
queries)
Auto-
matic
SQL-like
(no joins,
groups, …)
REST +
SDKs
Dynamo-
DB
Table-
Store
CP By range
key /
index
Local Sec.
Global
Sec.
Key+Cond.
On Range
Key(s)
REST +
SDKs
Automatic
over Prim.
Key
Azure
Tables
Table-
Store
CP By range
key
Key+Cond.
On Range
Key
REST +
SDKs
Automatic
over Part.
Key
99.9%
uptime
AE/Cloud
DataStore
Entity-
Group
CP Yes (as
queries)
Auto-
matic
Conjunct.
of Eq.
Predicates
REST/
SDK,
JDO,JPA
Automatic
over Entity
Groups
S3, Az.
Blob, GCS
Blob-
Store
AP REST +
SDKs
Automatic
over key
99.9%
uptime
(S3)
Proprietary Database services
Model CAP Scans
Sec.
Indices
Queries API SLA
Scale-
out
SimpleDB Table-
Store
CP Yes (as
queries)
Auto-
matic
SQL-like
(no joins,
groups, …)
REST +
SDKs
Dynamo-
DB
Table-
Store
CP By range
key /
index
Local Sec.
Global
Sec.
Key+Cond.
On Range
Key(s)
REST +
SDKs
Automatic
over Prim.
Key
Azure
Tables
Table-
Store
CP By range
key
Key+Cond.
On Range
Key
REST +
SDKs
Automatic
over Part.
Key
99.9%
uptime
AE/Cloud
DataStore
Entity-
Group
CP Yes (as
queries)
Auto-
matic
Conjunct.
of Eq.
Predicates
REST/
SDK,
JDO,JPA
Automatic
over Entity
Groups
S3, Az.
Blob, GCS
Blob-
Store
AP REST +
SDKs
Automatic
over key
99.9%
uptime
(S3)
Proprietary Database services
Model CAP Scans
Sec.
Indices
Queries API SLA
Scale-
out
There are many more object stores (HP, Rackspace,
etc.)
…but no comparable Table Stores
Azure Storage
Load-Balancing-System
Application
HTTP/HTTPS
Worker-Role
Web-Role
IIS-Webserver
Virtual Machines
Call
Other
Services
Not Allowed
Virtual Machines
Azure Storage
Load-Balancing-System
Application
HTTP/HTTPS
Worker-Role
Web-Role
IIS-Webserver
Virtual Machines
Call
Other
Services
Not Allowed
Virtual Machines
Tables
Blobs
Queues

Table Service example: Azure Tables
Partition
Key
Row Key
(sortiert)
Timestamp
(autom.)
Property1 Propertyn
intro.pdf v1.1 14/6/2013 … …
intro.pdf v1.2 15/6/2013 …
präs.pptx v0.0 11/6/2013 …
Partition
Partition
RESTAPI

Table Service example: Azure Tables
Partition
Key
Row Key
(sortiert)
Timestamp
(autom.)
Property1 Propertyn
intro.pdf v1.1 14/6/2013 … …
intro.pdf v1.2 15/6/2013 …
präs.pptx v0.0 11/6/2013 …
Partition
Partition
RESTAPI
SparseHash-distributed to
parition servers
No Index: Lookup only (!) by full table scan
Atomic "Entity-
Group Batch
Transaction" possible
 Similar to Amazon SimpleDB and DynamoDB
Table Service example: Azure Tables
Partition
Key
Row Key
(sortiert)
Timestamp
(autom.)
Property1 Propertyn
intro.pdf v1.1 14/6/2013 … …
intro.pdf v1.2 15/6/2013 …
präs.pptx v0.0 11/6/2013 …
Partition
Partition
RESTAPI
• Indexes all attributes
• Rich(er) queries
• Many Limits (size, RPS, etc.)
• Provisioned Throughput
• On SSDs („single digit latency“)
• Optional Indexes
Azure Table Storage Azure Tables
Model:
Propriertary
Pricing:
Requests + Storage
+ Network
Underlying DB:
Custom System
API:
REST
Challenges:
 Single partition and range key (modelling)
 Very “basic“ queries






Azure Tables
Azure Table Storage Azure Tables
Model:
Propriertary
Pricing:
Requests + Storage
+ Network
Underlying DB:
Custom System
API:
REST
Challenges:
 Single partition and range key (modelling)
 Very “basic“ queries






Azure Tables
Azure Table Storage Azure Tables
Model:
Propriertary
Pricing:
Requests + Storage
+ Network
Underlying DB:
Custom System
API:
REST
Challenges:
 Single partition and range key (modelling)
 Very “basic“ queries
Good:
Automatic Distribution
Replicated 3x locally + 1x async. geo-replica



Azure Tables
Azure Table Storage Azure Tables
Model:
Propriertary
Pricing:
Requests + Storage
+ Network
Underlying DB:
Custom System
API:
REST
Challenges:
 Single partition and range key (modelling)
 Very “basic“ queries
Good:
Automatic Distribution
Replicated 3x locally + 1x async. geo-replica
Very good:
SLA (99.9% uptime)
Internal architecture published
Azure Tables
Azure Table Storage Azure Tables
Model:
Propriertary
Pricing:
Requests + Storage
+ Network
Underlying DB:
Custom System
API:
REST
Challenges:
 Single partition and range key (modelling)
 Very “basic“ queries
Good:
Automatic Distribution
Replicated 3x locally + 1x async. geo-replica
Very good:
SLA (99.9% uptime)
Internal architecture published
Azure Tables
Windows Azure Storage
B. Calder, et al. "Windows Azure Storage: a highly available cloud storage service with strong consistency." , SOSP 2011
Idea:
• Layered storage infrastructure for Blobs and Tables
• Use research results (GFS, BigTable, Paxos, LSM, Erasure Coding)
Azure Table Storage Azure Tables
Model:
Propriertary
Pricing:
Requests + Storage
+ Network
Underlying DB:
Custom System
API:
REST
Challenges:
 Single partition and range key (modelling)
 Very “basic“ queries
Good:
Automatic Distribution
Replicated 3x locally + 1x async. geo-replica
Very good:
SLA (99.9% uptime)
Internal architecture published
Azure Tables
Windows Azure Storage
B. Calder, et al. "Windows Azure Storage: a highly available cloud storage service with strong consistency." , SOSP 2011
Idea:
• Layered storage infrastructure for Blobs and Tables
• Use research results (GFS, BigTable, Paxos, LSM, Erasure Coding)
Think: GFS/HDFS
Think: BigTable/HBase
Think: Chubby/ZooKeeper
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
 Successor to SimpleDB
Limitations:
• Slow (~50-100 RPS)
• 10 GB per Domain
• Query result max. 2500 records or
1 MB
• Max. 1K-sized attributes
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
 Successor to SimpleDB
dom:com.cnn content : "<html>…"
Primary Key Attribute
(scalar or set)
page:index
Range Key
Item:
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
 Successor to SimpleDB
dom:com.cnn content : "<html>…"
Primary Key Attribute
(scalar or set)
Querying Options:
• GetItem: Key Lookup
• Query: Primary Key + Condition on Range Key
• Scan: Full Table Scan with filter
• EMR: Hive queries (for analytics)
page:index
Range Key
Item:
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
 Successor to SimpleDB
 Consistency:
 Strongly (2x price) or Eventually Consistent Reads
 Atomic (Conditional) Updates per Item
 Indexing Options:
◦ Local Sec. Index: consistent additional Range Key
◦ Global Sec. Index: eventually consistent index-table (Primary Key)
dom:com.cnn content : "<html>…"
Primary Key Attribute
(scalar or set)
page:index
Range Key
Item:
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
 Successor to SimpleDB
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
 Successor to SimpleDB
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
 Successor to SimpleDB
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
 Successor to SimpleDB
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
 Successor to SimpleDB
Unit of Billing
DynamoDB DynamoDB
Model:
Propriertary
Pricing:
Provisioned
Throughput +
Network
Underlying DB:
Custom System
API:
REST
 Successor to SimpleDB
Unit of Billing
Good:
• Low Latency (SSD)
• Data partitioning and AZ-replication
Bad:
• Scaling not elastic (Capacity Units)
• No SLAs, no internals published (≠ Dynamo!)
• No built-in backups ( AWS data pipeline)
• Vendor Lock-in
AE/Cloud DataStore DataStore
Model:
Propriertary
Pricing:
CPU + Storage +
Network
Underlying DB:
MegaStore
API:
SDK, JPA, JDO
Google Cloud
Datastore
 Structured Storage System for App Engine
 Based on:
◦ Megastore BigTable  Colossus

AE/Cloud DataStore DataStore
Model:
Propriertary
Pricing:
CPU + Storage +
Network
Underlying DB:
MegaStore
API:
SDK, JPA, JDO
Google Cloud
Datastore
 Structured Storage System for App Engine
 Based on:
◦ Megastore BigTable  Colossus

AE/Cloud DataStore DataStore
Model:
Propriertary
Pricing:
CPU + Storage +
Network
Underlying DB:
MegaStore
API:
SDK, JPA, JDO
Google Cloud
Datastore
 Structured Storage System for App Engine
 Based on:
◦ Megastore BigTable  Colossus

AE/Cloud DataStore DataStore
Model:
Propriertary
Pricing:
CPU + Storage +
Network
Underlying DB:
MegaStore
API:
SDK, JPA, JDO
Google Cloud
Datastore
 Structured Storage System for App Engine
 Based on:
◦ Megastore BigTable  Colossus
 Schemafree Entity Group (EG) data model:
User
ID
Name
Photo
ID
User
URL
Root Table Child Table
1
n
AE/Cloud DataStore DataStore
Model:
Propriertary
Pricing:
CPU + Storage +
Network
Underlying DB:
MegaStore
API:
SDK, JPA, JDO
Google Cloud
Datastore
 Structured Storage System for App Engine
 Based on:
◦ Megastore BigTable  Colossus
 Schemafree Entity Group (EG) data model:
User
ID
Name
Photo
ID
User
URL
Root Table Child Table
1
n
EG: User + n Photos
• Unit of ACID transactions/
consistency
• Fields autoindexed
(eventually consistent)
AE/Cloud DataStore DataStore
Model:
Propriertary
Pricing:
CPU + Storage +
Network
Underlying DB:
MegaStore
API:
SDK, JPA, JDO
Google Cloud
Datastore
 Structured Storage System for App Engine
 Based on:
◦ Megastore BigTable  Colossus
 Schemafree Entity Group (EG) data model:
User
ID
Name
Photo
ID
User
URL
Root Table Child Table
1
n
EG: User + n Photos
• Unit of ACID transactions/
consistency
• Fields autoindexed
(eventually consistent)
SELECT * FROM photos
WHERE ANCESTOR IS :34 AND name = „sunset“
ORDER BY date ASC
LIMIT 10
OFFSET 10
AE/Cloud DataStore
 Internally:
Entity Groups
define partitions
Synchronous
Paxos-based
replication
ACID per EG. Maximum of
1 Write/s to an EG.
Eventual Consistency
across groups
Stored in
BigTable
AE/Cloud DataStore
 Internally:
Entity Groups
define partitions
Synchronous
Paxos-based
replication
ACID per EG. Maximum of
1 Write/s to an EG.
Eventual Consistency
across groups
Stored in
BigTable
AE/Cloud DataStore
 Internally:
MegaStore
J. Baker, et al. "Megastore: Providing Scalable,
Highly Available Storage for Interactive Services."
CIDR 2011.
• Paxos-based replication and
transactions
• 100 Google applications
Problems: Slow Writes, Predefined
Entity Groups
Entity Groups
define partitions
Synchronous
Paxos-based
replication
ACID per EG. Maximum of
1 Write/s to an EG.
Eventual Consistency
across groups
Stored in
BigTable
AE/Cloud DataStore
 Internally:
MegaStore
J. Baker, et al. "Megastore: Providing Scalable,
Highly Available Storage for Interactive Services."
CIDR 2011.
• Paxos-based replication and
transactions
• 100 Google applications
Problems: Slow Writes, Predefined
Entity Groups
Entity Groups
define partitions
Synchronous
Paxos-based
replication
ACID per EG. Maximum of
1 Write/s to an EG.
Eventual Consistency
across groups
Stored in
BigTable
Spanner
J. Corbett et al. "Spanner: Google’s globally
distributed database." TOCS 2013
Idea:
• Autosharded Entity Groups
• Not based on BigTable
Implementation:
• TrueTime API (GPS + atomic
clocks)  commit timestamps
of 2PL-SI transactions
• Paxos-replication per Shard
AE/Cloud DataStore
 Internally:
MegaStore
J. Baker, et al. "Megastore: Providing Scalable,
Highly Available Storage for Interactive Services."
CIDR 2011.
• Paxos-based replication and
transactions
• 100 Google applications
Problems: Slow Writes, Predefined
Entity Groups
Entity Groups
define partitions
Synchronous
Paxos-based
replication
ACID per EG. Maximum of
1 Write/s to an EG.
Eventual Consistency
across groups
Stored in
BigTable
Spanner
J. Corbett et al. "Spanner: Google’s globally
distributed database." TOCS 2013
Idea:
• Autosharded Entity Groups
• Not based on BigTable
Implementation:
• TrueTime API (GPS + atomic
clocks)  commit timestamps
of 2PL-SI transactions
• Paxos-replication per Shard
F1
J. Shute, et al. "F1: A distributed SQL database that scales.“, VLDB
2013
Idea:
• Full SQL relational database built on
Spanner, powers AdWords
Implementation:
• 5-way replication
• Data Model: relational + hierarchy
(customercampaignAdGroup)
• Transactions: Snapshot-Read-Only,
Pessimistic (Spanner), optimistic
• Distributed SQL Engine
AE/Cloud DataStore
 Internally:
MegaStore
J. Baker, et al. "Megastore: Providing Scalable,
Highly Available Storage for Interactive Services."
CIDR 2011.
• Paxos-based replication and
transactions
• 100 Google applications
Problems: Slow Writes, Predefined
Entity Groups
Entity Groups
define partitions
Synchronous
Paxos-based
replication
ACID per EG. Maximum of
1 Write/s to an EG.
Eventual Consistency
across groups
Stored in
BigTable
Spanner
J. Corbett et al. "Spanner: Google’s globally
distributed database." TOCS 2013
Idea:
• Autosharded Entity Groups
• Not based on BigTable
Implementation:
• TrueTime API (GPS + atomic
clocks)  commit timestamps
of 2PL-SI transactions
• Paxos-replication per Shard
F1
J. Shute, et al. "F1: A distributed SQL database that scales.“, VLDB
2013
Idea:
• Full SQL relational database built on
Spanner, powers AdWords
Implementation:
• 5-way replication
• Data Model: relational + hierarchy
(customercampaignAdGroup)
• Transactions: Snapshot-Read-Only,
Pessimistic (Spanner), optimistic
• Distributed SQL Engine
Good:
• Transactions (though limited)
• Good scalability of data volume
Bad:
• Entity Groups hard to define
• Bad Scale-Out for write and reads (Kossmann et
al.)  no advantage over RDBMS for small data
volumes
A Spanner/F1 based DBaaS?
 Idea: Blobs with RESTful CRUD Interface
 Distribution: 𝐵𝑢𝑐𝑘𝑒𝑡, 𝐾𝑒𝑦 → 𝑆𝑒𝑟𝑣𝑒𝑟 + 𝑅𝑒𝑝𝑙𝑖𝑐𝑎𝑠
 CDN Integration (Azure CDN, Cloudfront)
 S3: > 2 Trillion (2 ⋅ 1012) objects
Azure Blobs, Amazon S3, Google Cloud
Storage
Container
Blobcontains
Block 1 (up to 4 MB)
Block 2
Block n
REST-Request
Azure Blobs:
Block Blob: 4MB blocks  large files
Page Blob: 512B blocks  random IO
 Automatic Versioning
 Pricing: per request (10.000 ~ 1c) and storage
(1GB/month ~ 3c) and network (1GB ~ 10c)
Azure Blobs, Amazon S3, Google Cloud
Storage
S3 Blob
DELETE /puppy.jpg HTTP/1.1
Host: mybucket.s3.amazonaws.com
Authorization: AWS AKIAIO...
AWS Ireland DC
Replicas
Reduced Redundancy: only 1 replica
Glacier: tape-disk archivalAmazon S3:
 Automatic Versioning
 Pricing: per request (10.000 ~ 1c) and storage
(1GB/month ~ 3c) and network (1GB ~ 10c)
Azure Blobs, Amazon S3, Google Cloud
Storage
S3 Blob
DELETE /puppy.jpg HTTP/1.1
Host: mybucket.s3.amazonaws.com
Authorization: AWS AKIAIO...
AWS Ireland DC
Replicas
Reduced Redundancy: only 1 replica
Glacier: tape-disk archivalAmazon S3:
S3 Consistency
D. Bermbach,, S. Tai. "Eventual consistency:
How soon is eventual?”, MW4SOC 11
Findings:
• Inconsistency window varies
from 2-11 seconds
• Monotonic Read Consistency
is often violated
 Automatic Versioning
 Pricing: per request (10.000 ~ 1c) and storage
(1GB/month ~ 3c) and network (1GB ~ 10c)
Azure Blobs, Amazon S3, Google Cloud
Storage
S3 Blob
DELETE /puppy.jpg HTTP/1.1
Host: mybucket.s3.amazonaws.com
Authorization: AWS AKIAIO...
AWS Ireland DC
Replicas
Reduced Redundancy: only 1 replica
Glacier: tape-disk archivalAmazon S3:
S3 Consistency
D. Bermbach,, S. Tai. "Eventual consistency:
How soon is eventual?”, MW4SOC 11
Findings:
• Inconsistency window varies
from 2-11 seconds
• Monotonic Read Consistency
is often violated
Building a database on S3
M. Brantner, et al. "Building a database on S3." Sigmod 2008
Idea:
• Use S3 as the persistent storage of a
database
Implementation:
• Buffer Pool and Log Manager on S3
• No transaction or query support
 Founded June, 2011
 Acquired by Facebook April, 2013
 Pricing:
◦ Free
◦ Pro (199$)
◦ Enterprise
Parse - MBaaS Parse
Model:
Backend-aas
Pricing:
Plan-based
Underlying DB:
Mainly MongoDB
API:
SDKs, REST
 Founded June, 2011
 Acquired by Facebook April, 2013
 Pricing:
◦ Free
◦ Pro (199$)
◦ Enterprise
Parse - MBaaS Parse
Model:
Backend-aas
Pricing:
Plan-based
Underlying DB:
Mainly MongoDB
API:
SDKs, REST
Parse Core
 Founded June, 2011
 Acquired by Facebook April, 2013
 Pricing:
◦ Free
◦ Pro (199$)
◦ Enterprise
Parse - MBaaS Parse
Model:
Backend-aas
Pricing:
Plan-based
Underlying DB:
Mainly MongoDB
API:
SDKs, REST
Parse Core Parse Analytics
 Founded June, 2011
 Acquired by Facebook April, 2013
 Pricing:
◦ Free
◦ Pro (199$)
◦ Enterprise
Parse - MBaaS Parse
Model:
Backend-aas
Pricing:
Plan-based
Underlying DB:
Mainly MongoDB
API:
SDKs, REST
Parse Core Parse Analytics Parse Push
Parse - MBaaS
Authentication
User + Password
OAuth: Facebook, Twitter
Parse - MBaaS
Authentication
User + Password
OAuth: Facebook, Twitter
Parse - MBaaS
Query Cinemas
(new Parse.Query('Cinemas'))
.withinKilometers(...)
.fetch()
Authentication
User + Password
OAuth: Facebook, Twitter
Parse - MBaaS
Query Cinemas
(new Parse.Query('Cinemas'))
.withinKilometers(...)
.fetch()
Query Movies
(new Parse.Query('Movies'))
.greaterThan('startAt‘, now)
.notEqualTo('cinemas', cId)
.fetch()
Meteor Meteor
Model:
Backend-aaS
Pricing:
No yet revealed
Underlying DB:
MongoDB
API:
WebSockets
 Idea: Full-Stack JavaScript with Node.js,
MongoDB and WebSockets
Web Browser Node.js + MongoDB
(Single Server)
WebSocket
Meteor Meteor
Model:
Backend-aaS
Pricing:
No yet revealed
Underlying DB:
MongoDB
API:
WebSockets
 Idea: Full-Stack JavaScript with Node.js,
MongoDB and WebSockets
<div class="player {{selected}}">
<span class="name">{{name}}</span>
<span class="score">{{score}}</span>
</div>
Web Browser Node.js + MongoDB
(Single Server)
WebSocket
Meteor Meteor
Model:
Backend-aaS
Pricing:
No yet revealed
Underlying DB:
MongoDB
API:
WebSockets
 Idea: Full-Stack JavaScript with Node.js,
MongoDB and WebSockets
Players = new Meteor.
Collection("players");
if (Meteor.isServer) {
//…
<div class="player {{selected}}">
<span class="name">{{name}}</span>
<span class="score">{{score}}</span>
</div>
Web Browser Node.js + MongoDB
(Single Server)
WebSocket
Meteor Meteor
Model:
Backend-aaS
Pricing:
No yet revealed
Underlying DB:
MongoDB
API:
WebSockets
 Idea: Full-Stack JavaScript with Node.js,
MongoDB and WebSockets
Players = new Meteor.
Collection("players");
if (Meteor.isServer) {
//…
if (Meteor.isClient) {
Template.leaderboard.players = function () {
return Players.find({},
{sort: {score: -1, name: 1}});
};
<div class="player {{selected}}">
<span class="name">{{name}}</span>
<span class="score">{{score}}</span>
</div>
Web Browser Node.js + MongoDB
(Single Server)
WebSocket
Meteor Meteor
Model:
Backend-aaS
Pricing:
No yet revealed
Underlying DB:
MongoDB
API:
WebSockets
 Idea: Full-Stack JavaScript with Node.js,
MongoDB and WebSockets
Players = new Meteor.
Collection("players");
if (Meteor.isServer) {
//…
if (Meteor.isClient) {
Template.leaderboard.players = function () {
return Players.find({},
{sort: {score: -1, name: 1}});
};
<div class="player {{selected}}">
<span class="name">{{name}}</span>
<span class="score">{{score}}</span>
</div>
Web Browser Node.js + MongoDB
(Single Server)
WebSocket
$ meteor deploy abc.meteor.com
Meteor Meteor
Model:
Backend-aaS
Pricing:
No yet revealed
Underlying DB:
MongoDB
API:
WebSockets
 Idea: Full-Stack JavaScript with Node.js,
MongoDB and WebSockets
Players = new Meteor.
Collection("players");
if (Meteor.isServer) {
//…
if (Meteor.isClient) {
Template.leaderboard.players = function () {
return Players.find({},
{sort: {score: -1, name: 1}});
};
<div class="player {{selected}}">
<span class="name">{{name}}</span>
<span class="score">{{score}}</span>
</div>
Web Browser Node.js + MongoDB
(Single Server)
WebSocket
$ meteor deploy abc.meteor.com
Very productive for very small projects
Fundamentally limited scalability:
• Server tails Mongo‘s oplog
• And holds the fetched data state of
every client
Outline
• Hot Topics
• Orestes: a scalable, low-
latency architecture
• Baqend: putting it into
practice
What are Cloud
Databases?
Cloud Databases in
the wild
Research Perspectives
Wrap-up and
literature
 Example: CryptDB
 Idea: Only decrypt as much as neccessary
Encrypted Databases: Research
RDBMS
SQL-Proxy
Encrypts and decrypts, rewrites queries
 Example: CryptDB
 Idea: Only decrypt as much as neccessary
Encrypted Databases: Research
RDBMS
SQL-Proxy
Encrypts and decrypts, rewrites queries
 Example: CryptDB
 Idea: Only decrypt as much as neccessary
Encrypted Databases: Research
RDBMS
SQL-Proxy
Encrypts and decrypts, rewrites queries
Relational Cloud
C. Curino, et al. "Relational cloud: A database-as-a-service
for the cloud.“, CIDR 2011
DBaaS Architecture:
• Encrypted with CryptDB
• Multi-Tenancy through live
migration
• Workload-aware partitioning
(graph-based)
 Example: CryptDB
 Idea: Only decrypt as much as neccessary
Encrypted Databases: Research
RDBMS
SQL-Proxy
Encrypts and decrypts, rewrites queries
Relational Cloud
C. Curino, et al. "Relational cloud: A database-as-a-service
for the cloud.“, CIDR 2011
DBaaS Architecture:
• Encrypted with CryptDB
• Multi-Tenancy through live
migration
• Workload-aware partitioning
(graph-based)
• Early approach
• Not adopted in practice, yet
Dream solution:
Full Homorphic Encryption
Transactions/Consistency: Research
Dynamo Eventual None 1 RT -
Yahoo PNuts Timeline per key Single Key 1 RT possible
COPS Causality Multi-Record 1 RT possible
MySQL (async) Serializable Static Partition 1 RT possible
Megastore Serializable Static Partition 2 RT -
Spanner/F1 Snapshot Isolation Partition 2 RT -
MDCC Read-Commited Multi-Record 1 RT -
Consistency Transactional Unit
Commit
Latency
Data
Loss?
Transactions/Consistency: Research
Dynamo Eventual None 1 RT -
Yahoo PNuts Timeline per key Single Key 1 RT possible
COPS Causality Multi-Record 1 RT possible
MySQL (async) Serializable Static Partition 1 RT possible
Megastore Serializable Static Partition 2 RT -
Spanner/F1 Snapshot Isolation Partition 2 RT -
MDCC Read-Commited Multi-Record 1 RT -
Consistency Transactional Unit
Commit
Latency
Data
Loss?
Multi-Data Center Consistency
T. Kraska et al. "MDCC: Multi-data center consistency." EuroSys, 2013.
Idea:
• Multi-Data center commit protocol with
single round-trip
Implementation:
• Optimistic Commit Protocol
• Fast, Generalized Multi-Paxos
Result: almost as fast as Dynamo-style
Transactions/Consistency: Research
Dynamo Eventual None 1 RT -
Yahoo PNuts Timeline per key Single Key 1 RT possible
COPS Causality Multi-Record 1 RT possible
MySQL (async) Serializable Static Partition 1 RT possible
Megastore Serializable Static Partition 2 RT -
Spanner/F1 Snapshot Isolation Partition 2 RT -
MDCC Read-Commited Multi-Record 1 RT -
Consistency Transactional Unit
Commit
Latency
Data
Loss?
Multi-Data Center Consistency
T. Kraska et al. "MDCC: Multi-data center consistency." EuroSys, 2013.
Idea:
• Multi-Data center commit protocol with
single round-trip
Implementation:
• Optimistic Commit Protocol
• Fast, Generalized Multi-Paxos
Result: almost as fast as Dynamo-style
Currently no NoSQL DB implements
consistent Multi-DC replication
 YCSB (Yahoo Cloud Serving Benchmark)
Benchmarking: Research
Data Store
 YCSB (Yahoo Cloud Serving Benchmark)
Benchmarking: Research
Client
WorkloadGenerator
PluggableDBinterface
Data Store
Threads
Stats
 YCSB (Yahoo Cloud Serving Benchmark)
Benchmarking: Research
Client
WorkloadGenerator
PluggableDBinterface
Workload:
1. Operation Mix
2. Record Size
3. Popularity Distribution
Runtime Parameters:
DB host name,
threads, etc.
Data Store
Threads
Stats
 YCSB (Yahoo Cloud Serving Benchmark)
Benchmarking: Research
Client
WorkloadGenerator
PluggableDBinterface
Workload:
1. Operation Mix
2. Record Size
3. Popularity Distribution
Runtime Parameters:
DB host name,
threads, etc.
Read()
Insert()
Update()
Delete()
Scan()
Data Store
Threads
Stats
DB protocol
 YCSB (Yahoo Cloud Serving Benchmark)
Benchmarking: Research
Client
WorkloadGenerator
PluggableDBinterface
Workload:
1. Operation Mix
2. Record Size
3. Popularity Distribution
Runtime Parameters:
DB host name,
threads, etc.
Read()
Insert()
Update()
Delete()
Scan()
Data Store
Threads
Stats
DB protocol
Workload Operation Mix Distribution Example
A – Update Heavy Read: 50%
Update: 50%
Zipfian Session Store
B – Read Heavy Read: 95%
Update: 5%
Zipfian Photo Tagging
C – Read Only Read: 100% Zipfian User Profile Cache
D – Read Latest Read: 95%
Insert: 5%
Latest User Status Updates
E – Short Ranges Scan: 95%
Insert: 5%
Zipfian/
Uniform
Threaded Conversations
 Example Result
(Read Heavy):
Benchmarking: Research
 Example Result
(Read Heavy):
Benchmarking: Research
Weaknesses:
• Single client can be a
bottleneck
• No consistency &
availability measurement
 Example Result
(Read Heavy):
Benchmarking: Research
YCSB++
S. Patil, M. Polte, et al.„Ycsb++: benchmarking and
performance debugging advanced features in scalable
table stores“, SOCC 2011
• Clients coordinate through
Zookeeper
• Simple Read-After-Write Checks
• Evaluation: Hbase & Accumulo
Weaknesses:
• Single client can be a
bottleneck
• No consistency &
availability measurement
 Example Result
(Read Heavy):
Benchmarking: Research
YCSB++
S. Patil, M. Polte, et al.„Ycsb++: benchmarking and
performance debugging advanced features in scalable
table stores“, SOCC 2011
• Clients coordinate through
Zookeeper
• Simple Read-After-Write Checks
• Evaluation: Hbase & Accumulo
Weaknesses:
• Single client can be a
bottleneck
• No consistency &
availability measurement
• No Transaction Support
YCSB+T
A. Dey et al. “YCSB+T: Benchmarking Web-Scale
Transactional Databases”, CloudDB 2014
• New workload: Transactional
Bank Account
• Simple anomaly detection for
Lost Updates
• No comparison of systems
 Example Result
(Read Heavy):
Benchmarking: Research
YCSB++
S. Patil, M. Polte, et al.„Ycsb++: benchmarking and
performance debugging advanced features in scalable
table stores“, SOCC 2011
• Clients coordinate through
Zookeeper
• Simple Read-After-Write Checks
• Evaluation: Hbase & Accumulo
Weaknesses:
• Single client can be a
bottleneck
• No consistency &
availability measurement
• No Transaction Support
YCSB+T
A. Dey et al. “YCSB+T: Benchmarking Web-Scale
Transactional Databases”, CloudDB 2014
• New workload: Transactional
Bank Account
• Simple anomaly detection for
Lost Updates
• No comparison of systems
No specific application
CloudStone, CARE, TPC
extensions?
Benchmarking: state of the art
Benchmarking: state of the art
Tests latency and throughput
of IaaS-Providers, CDNs and
Object Stores
Benchmarking: state of the art
Tests latency and throughput
of IaaS-Providers, CDNs and
Object Stores
Does not test: Cloud Databases
Vision
YCSB Harmony
Idea:
Vision
YCSB Harmony
YCSB
Client
YCSB
Client YCSB
Client
YCSB
Client
Idea:
1. Periodically
launch clients
Vision
YCSB Harmony
YCSB
Client
YCSB
Client YCSB
Client
YCSB
Client
Idea:
1. Periodically
launch clients
2. Benchmark
Cloud DB
3. Publish results
Vision
YCSB Harmony
System Availablity Reads/s Writes/s Avg. Latency 95th perc.
Latency
Plot
DynamoDB 99.9% 23411 34534 3.2 ms 9 ms
RDS 99.8% 2342 2455 30 ms 80 ms
Azure Table 99.5% 22343 23442 12 ms 20 ms
Google
DataStore
99.5% 3000 2000 30 ms 300 ms
Database research at the
University of Hamburg
Motivation
Classic 3-Tier-ArchitectureThree-Tier Architecture
Motivation
ClientApplicationDatabase
Web
Server
Web
Server
Classic 3-Tier-ArchitectureThree-Tier Architecture
Motivation
ClientApplicationDatabase
Web
Server
Web
Server
Classic 3-Tier-ArchitectureThree-Tier Architecture
Motivation
ClientApplicationDatabase
Web
Server
Web
Server
High Latency
Classic 3-Tier-ArchitectureThree-Tier Architecture
Motivation
ClientApplicationDatabase
Web
Server
Web
Server
Average (2014):
90 HTTP Requests per
page load
High Latency
Classic 3-Tier-ArchitectureThree-Tier Architecture
Motivation
ClientApplicationDatabase
Web
Server
Web
Server
Average (2014):
90 HTTP Requests per
page load
High Latency
Classic 3-Tier-ArchitectureThree-Tier ArchitectureWith every 100ms of additional page
load time, revenue decreases by 1%.
Study by Amazon
Motivation
ClientApplicationDatabase
Web
Server
Web
Server
Average (2014):
90 HTTP Requests per
page load
High Latency
Classic 3-Tier-ArchitectureThree-Tier ArchitectureWith every 100ms of additional page
load time, revenue decreases by 1%.
Study by Amazon
When increasing load time of search
results by 500ms, traffic decreases by
20%.
Study by Google
Motivation
ClientApplicationDatabase
Web
Server
Web
Server
Average (2014):
90 HTTP Requests per
page load
High Latency
Classic 3-Tier-ArchitectureThree-Tier ArchitectureWith every 100ms of additional page
load time, revenue decreases by 1%.
Study by Amazon
When increasing load time of search
results by 500ms, traffic decreases by
20%.
Study by Google
SPAs
Rich ClientApplicationDatabase
Web
Server
Web
Server
Single-Page Applications
High Latency
(Rich Client, Smart Client)
Data (e.g. JSON)
(AngularJS, Backbone,
Ember.JS, etc.)
ORESTES
Rich ClientOrestesDB-Cluster
REST
Server
Low Latency
Cloud
REST
Server
ORESTES
Rich ClientOrestesDB-Cluster
REST
Server
Web-
Caches Low Latency
Cloud
REST
Server
Cacheable
Database Objects
ORESTES
Rich ClientOrestesDB-Cluster
REST
Server
Web-
Caches Low Latency
Scalable
NoSQL DBs
Cloud
REST
Server
Cacheable
Database Objects
ORESTES
Rich ClientOrestesDB-Cluster
REST
Server
Web-
Caches Low Latency
Scalable
NoSQL DBs
Cloud
Exposes the DB via
REST API and handles:
• Scaling
• Cache Consistency
• Transactions
• Schema
REST
Server
Cacheable
Database Objects
 Middleware:
◦ Scalable REST API for aggregate-oriented
persistence, queries, schema management and
transactions.
 Caching:
◦ Consistent web caching of database objects for
read scalability and latency reduction.
 Transactions:
◦ Optimistic cache-aware transaction model.
ORESTES: Components
Java/JDO
persist
find
createQuery
JavaScript/JPA Port
REST/HTTP API
others
Application
Server
Browser or
Mobile Device
Application
Layer
Persistence
API
Data
Store
Java/JDO
persist
find
createQuery
JavaScript/JPA Port
REST/HTTP API
others
Application
Server
Browser or
Mobile Device
Application
Layer
Persistence
API
Data
Store
HTTP Server
Trans-
actions
Querys
Object
Persist.
Schema
Key-
Value
Doc-
uments
DBaaS & BaaS Layer
Transaction Validation
HTTP Server
Config-
uration
Partial
Updates
Access Control
Multi-Tenancy Schema Management
Workload Management
Cache Coherence
Autoscaling
Database-
independent
Concerns
Database -
specific
Wrappers
SLAs
HTTP Server
Java/JDO
persist
find
createQuery
JavaScript/JPA Port
REST/HTTP API
others
Application
Server
Browser or
Mobile Device
Application
Layer
Persistence
API
Data
Store
ISP
Forward-Proxy Caches
ISP Caches
Reverse-Proxy Caches and
Load Balancers
CDN Caches
Content Delivery
Networks
Purge
Scale
HTTP Server
Trans-
actions
Querys
Object
Persist.
Schema
Key-
Value
Doc-
uments
DBaaS & BaaS Layer
Transaction Validation
HTTP Server
Config-
uration
Partial
Updates
Access Control
Multi-Tenancy Schema Management
Workload Management
Cache Coherence
Autoscaling
Database-
independent
Concerns
Database -
specific
Wrappers
SLAs
HTTP Server
Java/JDO
persist
find
createQuery
JavaScript/JPA Port
REST/HTTP API
others
Application
Server
Browser or
Mobile Device
Application
Layer
Persistence
API
Data
Store
ISP
Forward-Proxy Caches
ISP Caches
Reverse-Proxy Caches and
Load Balancers
CDN Caches
Content Delivery
Networks
Purge
Scale
HTTP Server
Trans-
actions
Querys
Object
Persist.
Schema
Key-
Value
Doc-
uments
DBaaS & BaaS Layer
Transaction Validation
HTTP Server
Config-
uration
Partial
Updates
Access Control
Multi-Tenancy Schema Management
Workload Management
Cache Coherence
Autoscaling
Database-
independent
Concerns
Database -
specific
Wrappers
SLAs
HTTP Server
Redis (Replicated)
10201040
10101010Counting
Bloom Filter
add
delete
Node.JS (local to Server)
Stored Procedures
Custom Validation
Java/JDO
persist
find
createQuery
JavaScript/JPA Port
REST/HTTP API
others
Application
Server
Browser or
Mobile Device
Application
Layer
Persistence
API
Data
Store
ISP
Forward-Proxy Caches
ISP Caches
Reverse-Proxy Caches and
Load Balancers
CDN Caches
Content Delivery
Networks
Purge
Scale
HTTP Server
Trans-
actions
Querys
Object
Persist.
Schema
Key-
Value
Doc-
uments
DBaaS & BaaS Layer
Transaction Validation
HTTP Server
Config-
uration
Partial
Updates
Access Control
Multi-Tenancy Schema Management
Workload Management
Cache Coherence
Autoscaling
Database-
independent
Concerns
Database -
specific
Wrappers
SLAs
HTTP Server
Redis (Replicated)
10201040
10101010Counting
Bloom Filter
add
delete
Node.JS (local to Server)
Stored Procedures
Custom Validation
GET /db/{bucket}/{class}/{id}
200 OK
Cache-Control: public, max-age=6000
ETag: "3"
JSON Object
Caching
Orestes
Client-
(Browser-)
Cache
Proxy
Caches
ISP
Caches
CDN
Caches
Reverse-
Proxy
Caches
Miss
Hit
Miss
Miss
Miss
Miss
100%
50%
0%
P(Cache-Hit)
0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms
Caching
Orestes
Client-
(Browser-)
Cache
Proxy
Caches
ISP
Caches
CDN
Caches
Reverse-
Proxy
Caches
Miss
Hit
Miss
Miss
Miss
Miss
100%
50%
0%
P(Cache-Hit)
0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms
em.find(id) JavaScript
Caching
Orestes
Client-
(Browser-)
Cache
Proxy
Caches
ISP
Caches
CDN
Caches
Reverse-
Proxy
Caches
Miss
Hit
Miss
Miss
Miss
Miss
100%
50%
0%
P(Cache-Hit)
0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms
GET /db/posts/{id} HTTP
Caching
Orestes
Client-
(Browser-)
Cache
Proxy
Caches
ISP
Caches
CDN
Caches
Reverse-
Proxy
Caches
Miss
Hit
Miss
Miss
Miss
Miss
100%
50%
0%
P(Cache-Hit)
0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms
Cache-Hit: Return Object
Cache-Miss: Forward Request
Caching
Orestes
Client-
(Browser-)
Cache
Proxy
Caches
ISP
Caches
CDN
Caches
Reverse-
Proxy
Caches
Miss
Hit
Miss
Miss
Miss
Miss
100%
50%
0%
P(Cache-Hit)
0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms
Fetch object from DB and return
it with caching information
Caching
Orestes
Client-
(Browser-)
Cache
Proxy
Caches
ISP
Caches
CDN
Caches
Reverse-
Proxy
Caches
Miss
Hit
Miss
Miss
Miss
Miss
100%
50%
0%
P(Cache-Hit)
0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms
Scalability and Cache-Hits
Caching
Orestes
Client-
(Browser-)
Cache
Proxy
Caches
ISP
Caches
CDN
Caches
Reverse-
Proxy
Caches
Miss
Hit
Miss
Miss
Miss
Miss
100%
50%
0%
P(Cache-Hit)
0 ms 1 ms 10 ms 20 ms 50-500 ms 50-500 ms
Scalability and Cache-Hits
Latency Benefit
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
How to prevent stale
reads (inconsistency)?
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
purge(obj)
hashB(oid)hashA(oid)
13
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
purge(obj)
131 1 110
Flat(Counting Bloomfilter)
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
purge(obj)
131 1 110
hashB(oid)hashA(oid)
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
purge(obj)
131 1 110
hashB(oid)hashA(oid)
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
purge(obj)
131 1 110
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
hashB(oid)hashA(oid)
1 1 110
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
hashB(oid)hashA(oid)
1 1 110
𝑓 ≈ 1 − 𝑒−
𝑘𝑛
𝑚
𝑘
𝑘 = ln 2 ⋅ (
𝑛
𝑚
)
False-Positive
Rate:
Hash-
Functions:
With 10.000 Updates per 10 minutes and 1% error rate: 12 KByte
App
App
BFB: Bloom Filter Bounded Staleness
Cache
1 4 020
hashB(oid)hashA(oid)
1 1 110
SCOT: Scalable Cache-Aware Optimistic Transactions
Cache
Cache
Cache
REST-Server
REST-Server
REST-Server
DB
Coordinator
Client
SCOT: Scalable Cache-Aware Optimistic Transactions
Cache
Cache
Cache
REST-Server
REST-Server
REST-Server
DB
Coordinator
Client
Begin Transaction
Bloom Filter
1
SCOT: Scalable Cache-Aware Optimistic Transactions
Cache
Cache
Cache
REST-Server
REST-Server
REST-Server
DB
Coordinator
Client
Begin Transaction
Bloom Filter
1
Reads
Writes
2
Writes
(Hidden)
SCOT: Scalable Cache-Aware Optimistic Transactions
Cache
Cache
Cache
REST-Server
REST-Server
REST-Server
DB
Coordinator
Client
Begin Transaction
Bloom Filter
1
Reads
Writes
2
Commit: read- & write-set versions
Committed OR aborted + stale objects
3
Writes
(Hidden)
SCOT: Scalable Cache-Aware Optimistic Transactions
Cache
Cache
Cache
REST-Server
REST-Server
REST-Server
DB
Coordinator
Client
Begin Transaction
Bloom Filter
1
Reads
Writes
2
Commit: read- & write-set versions
Committed OR aborted + stale objects
3
Writes
(Hidden)
validation 4
E.g. Redis or ZooKeeper
prevent conflicting
validations
SCOT: Scalable Cache-Aware Optimistic Transactions
Cache
Cache
Cache
REST-Server
REST-Server
REST-Server
DB
Coordinator
Client
Begin Transaction
Bloom Filter
1
Reads
Writes
2
Commit: read- & write-set versions
Committed OR aborted + stale objects
3
Writes
(Hidden)
validation 4
E.g. Redis or ZooKeeper
5Writes (Public)
Read all
prevent conflicting
validations
SCOT: Scalable Cache-Aware Optimistic Transactions
Cache
Cache
Cache
REST-Server
REST-Server
REST-Server
DB
Coordinator
Client
Begin Transaction
Bloom Filter
1
Reads
Writes
2
Commit: read- & write-set versions
Committed OR aborted + stale objects
3
Writes
(Hidden)
validation 4
E.g. Redis or ZooKeeper
5Writes (Public)
Read all
prevent conflicting
validations
Caching → Shorter transaction
duration → less aborts
Polyglot
Persistence
application
Orestes servers
REST/HTTP
protocol
Redis MongoDB db4o
meta data contains SLA
parse SLA
& route data
manage
materialisation
resolve mapping
Polyglot Persistence
Mediator
Polyglot
Persistence
application
Orestes servers
REST/HTTP
protocol
Redis MongoDB db4o
meta data contains SLA
parse SLA
& route data
manage
materialisation
resolve mapping
Polyglot Persistence
Mediator
Polyglot
Persistence
application
Orestes servers
REST/HTTP
protocol
Redis MongoDB db4o
meta data contains SLA
parse SLA
& route data
manage
materialisation
resolve mapping
Polyglot Persistence
Mediator
Polyglot
Persistence
application
Orestes servers
REST/HTTP
protocol
Redis MongoDB db4o
meta data contains SLA
parse SLA
& route data
manage
materialisation
resolve mapping
Polyglot Persistence
Mediator
Results:
Article-Objects with Impression Count
Article
ID
Title
…
Imp.
Imp.
ID
MongoDB Redis Sorted Set
Speedup with PPM:
• 50-1000%
• 66% performance of Varnish
Cloud Evaluation of ORESTES
Client Machine
50
...
Web
Cache
Orestes
Server
Versant
DB
Amazon EC2 Ireland EC2 USA165 ms
Client Machine
Client Machine
Cloud Evaluation of ORESTES
Client Machine
50
...
Web
Cache
Orestes
Server
Versant
DB
Amazon EC2 Ireland EC2 USA165 ms
Client Machine
Client Machine
30 000 Objekte
500 Anfragen/
Client
30 000 Objects
500 Req./Clients
10/1 Read/Write
Cloud Evaluation of ORESTES
Client Machine
50
...
Web
Cache
Orestes
Server
Versant
DB
Amazon EC2 Ireland EC2 USA165 ms
Client Machine
Client Machine
 Stateless Scale-out REST middleware for DBaaS
 Generic Schema, Authentication, Multi-Tenancy, etc.
 SCOT (Scalable Optimistic Cache-Aware Transactions):
◦ Optimistic Database-indepdent ACID transactions
 BFB (Bloom Filter Bounded Staleness):
◦ Allows static caching with tunable consistency guarantee
 PPM (Polyglot Persistence Mediator):
◦ SLAs  appropriate database
Orestes: Summary
From Research to
Practice
 Orestes as a startup
Baqend
 Orestes as a startup
Baqend
Internet
Seoxy
REST-API Transactions Schema Management Cache Consistency
Auto-Scaling Multi-Tenancy Security and Access Control Provisioning
 Orestes as a startup
Baqend
 Orestes as a startup
Baqend
 Orestes as a startup
Baqend
 Orestes as a startup
Baqend
 Orestes as a startup
Baqend
Baqend in Action
Baqend in Action
GET /app.html
Baqend in Action
GET /app.html
db.find(Menu, 'main')
.done(...);
db.find(Page, 'hero')
.done(...);
db.query(Page, 'top3')
.done(...);
Baqend in Action
GET /app.html
db.find(Menu, 'main')
.done(...);
db.find(Page, 'hero')
.done(...);
db.query(Page, 'top3')
.done(...);
GET /img/pic005.jpg
GET /img/pic017.jpg
GET /img/pic022.jpg
More: Baqend.com
Wrap-up & book recommendations
How to choose a cloud database:
Wrap-up
Managed
RDBMS
Managed
DWH
Managed
NoSQL DB
Backend-as-
a-Service
Proprietary
Serivce
Object Store
How to choose a cloud database:
Wrap-up
Define your functional
requirements
Define your non-functional
requirements
Managed
RDBMS
Managed
DWH
Managed
NoSQL DB
Backend-as-
a-Service
Proprietary
Serivce
Object Store
How to choose a cloud database:
Wrap-up
Define your functional
requirements
Define your non-functional
requirements
Managed
RDBMS
Managed
DWH
Managed
NoSQL DB
Backend-as-
a-Service
Proprietary
Serivce
Object Store
1. Underyling DB
2. Docs & books
3. Your own tests
Evaluate by:
How to choose a cloud database:
Wrap-up
Define your functional
requirements
Define your non-functional
requirements
Managed
RDBMS
Managed
DWH
Managed
NoSQL DB
Backend-as-
a-Service
Proprietary
Serivce
Object Store
1. Underyling DB
2. Docs & books
3. Your own tests
4. SLAs
5. Docs & books
6. Experience Reports
Evaluate by:
How to choose a cloud database:
Wrap-up
Define your functional
requirements
Define your non-functional
requirements
Managed
RDBMS
Managed
DWH
Managed
NoSQL DB
Backend-as-
a-Service
Proprietary
Serivce
Object Store
1. Underyling DB
2. Docs & books
3. Your own tests
4. SLAs
5. Docs & books
6. Experience Reports
Evaluate by:
Try it
Book recommendations
Book recommendations
• (Non-scientific) literature is rare
• Some books cover specific DBaaS and
cloud platforms
Blogs
http://nosql.mypopescu.com/
http://www.dzone.com/mz/nosql
http://www.dbms2.com/
http://hackingdistributed.com/
http://highscalability.com/
http://www.nosqlweekly.com/
VLDB (Very Large Databases)
SIGMOD (Special Interest Group on Management of Data)
ICDE (International Conference on Data Engineering)
CIDR (Conference on Innovative Data Systems Research)
SOCC (Symposium on Cloud Computing)
OSDI/SOSP (Operating Systems Design and
Implementation/ Symposium on Operating System Principles)
EuroSys
Top Scientific Conferences
Database
Research
Distributed
Systems
Research
VLDB (Very Large Databases)
SIGMOD (Special Interest Group on Management of Data)
ICDE (International Conference on Data Engineering)
CIDR (Conference on Innovative Data Systems Research)
SOCC (Symposium on Cloud Computing)
OSDI/SOSP (Operating Systems Design and
Implementation/ Symposium on Operating System Principles)
EuroSys
Top Scientific Conferences
Database
Research
Distributed
Systems
Research
This year probably in Washington D.C.
Learn more: scdm2013.com
Thank you. Queries?
Contact:
felix.gessert@baqend.com
http://baqend.com
http://orestes.info
http://scdm2013.com

Más contenido relacionado

La actualidad más candente

NoSQL on MySQL - MySQL Document Store by Vadim Tkachenko
NoSQL on MySQL - MySQL Document Store by Vadim TkachenkoNoSQL on MySQL - MySQL Document Store by Vadim Tkachenko
NoSQL on MySQL - MySQL Document Store by Vadim TkachenkoData Con LA
 
Web Performance
Web PerformanceWeb Performance
Web PerformanceBaqend
 
Building a Real-Time Gaming Analytics Service with Apache Druid
Building a Real-Time Gaming Analytics Service with Apache DruidBuilding a Real-Time Gaming Analytics Service with Apache Druid
Building a Real-Time Gaming Analytics Service with Apache DruidImply
 
Benchmarking Apache Druid
Benchmarking Apache Druid Benchmarking Apache Druid
Benchmarking Apache Druid Matt Sarrel
 
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018Charles Allen
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8MongoDB
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...DataStax
 
Apache Druid®: A Dance of Distributed Processes
 Apache Druid®: A Dance of Distributed Processes Apache Druid®: A Dance of Distributed Processes
Apache Druid®: A Dance of Distributed ProcessesImply
 
How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...
How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...
How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...MongoDB
 
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it too
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it tooQuerying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it too
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it tooAll Things Open
 
MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...
MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...
MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...Prasoon Kumar
 
Analytics over Terabytes of Data at Twitter
Analytics over Terabytes of Data at TwitterAnalytics over Terabytes of Data at Twitter
Analytics over Terabytes of Data at TwitterImply
 
August meetup - All about Apache Druid
August meetup - All about Apache Druid August meetup - All about Apache Druid
August meetup - All about Apache Druid Imply
 
Mongo db eveningschemadesign
Mongo db eveningschemadesignMongo db eveningschemadesign
Mongo db eveningschemadesignMongoDB APAC
 
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...Imply
 
A Presentation on MongoDB Introduction - Habilelabs
A Presentation on MongoDB Introduction - HabilelabsA Presentation on MongoDB Introduction - Habilelabs
A Presentation on MongoDB Introduction - HabilelabsHabilelabs
 
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...✔ Eric David Benari, PMP
 
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible APIIntroducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible APIScyllaDB
 

La actualidad más candente (20)

MongoDB on Azure
MongoDB on AzureMongoDB on Azure
MongoDB on Azure
 
NoSQL on MySQL - MySQL Document Store by Vadim Tkachenko
NoSQL on MySQL - MySQL Document Store by Vadim TkachenkoNoSQL on MySQL - MySQL Document Store by Vadim Tkachenko
NoSQL on MySQL - MySQL Document Store by Vadim Tkachenko
 
Web Performance
Web PerformanceWeb Performance
Web Performance
 
Building a Real-Time Gaming Analytics Service with Apache Druid
Building a Real-Time Gaming Analytics Service with Apache DruidBuilding a Real-Time Gaming Analytics Service with Apache Druid
Building a Real-Time Gaming Analytics Service with Apache Druid
 
Benchmarking Apache Druid
Benchmarking Apache Druid Benchmarking Apache Druid
Benchmarking Apache Druid
 
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
Data Analytics and Processing at Snap - Druid Meetup LA - September 2018
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
Apache Druid®: A Dance of Distributed Processes
 Apache Druid®: A Dance of Distributed Processes Apache Druid®: A Dance of Distributed Processes
Apache Druid®: A Dance of Distributed Processes
 
How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...
How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...
How Thermo Fisher is Reducing Data Analysis Times from Days to Minutes with M...
 
What's new in MongoDB 2.6
What's new in MongoDB 2.6What's new in MongoDB 2.6
What's new in MongoDB 2.6
 
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it too
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it tooQuerying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it too
Querying NoSQL with SQL: HAVING Your JSON Cake and SELECTing it too
 
MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...
MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...
MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...
 
Analytics over Terabytes of Data at Twitter
Analytics over Terabytes of Data at TwitterAnalytics over Terabytes of Data at Twitter
Analytics over Terabytes of Data at Twitter
 
August meetup - All about Apache Druid
August meetup - All about Apache Druid August meetup - All about Apache Druid
August meetup - All about Apache Druid
 
Mongo db eveningschemadesign
Mongo db eveningschemadesignMongo db eveningschemadesign
Mongo db eveningschemadesign
 
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experi...
 
A Presentation on MongoDB Introduction - Habilelabs
A Presentation on MongoDB Introduction - HabilelabsA Presentation on MongoDB Introduction - Habilelabs
A Presentation on MongoDB Introduction - Habilelabs
 
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
Making MySQL Flexible with ParElastic Database Scalability, Amrith Kumar, Fou...
 
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible APIIntroducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
Introducing Project Alternator - Scylla’s Open-Source DynamoDB-compatible API
 

Destacado

Building a Global-Scale Multi-Tenant Cloud Platform on AWS and Docker: Lesson...
Building a Global-Scale Multi-Tenant Cloud Platform on AWS and Docker: Lesson...Building a Global-Scale Multi-Tenant Cloud Platform on AWS and Docker: Lesson...
Building a Global-Scale Multi-Tenant Cloud Platform on AWS and Docker: Lesson...Felix Gessert
 
Building an Angular 2 App
Building an Angular 2 AppBuilding an Angular 2 App
Building an Angular 2 AppFelix Gessert
 
Pitch auf den Hamburger IT-Strategietagen
Pitch auf den Hamburger IT-StrategietagenPitch auf den Hamburger IT-Strategietagen
Pitch auf den Hamburger IT-StrategietagenFelix Gessert
 
High availability solution database mirroring
High availability solution database mirroringHigh availability solution database mirroring
High availability solution database mirroringMustafa EL-Masry
 
Evaluating Cloud Database Offerings
Evaluating Cloud Database OfferingsEvaluating Cloud Database Offerings
Evaluating Cloud Database OfferingsChristopher Foot
 
[SSA] 03.newsql database (2014.02.05)
[SSA] 03.newsql database (2014.02.05)[SSA] 03.newsql database (2014.02.05)
[SSA] 03.newsql database (2014.02.05)Steve Min
 
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013Amazon Web Services
 
Storage Devices And Backup Media
Storage Devices And Backup MediaStorage Devices And Backup Media
Storage Devices And Backup MediaTyrone Turner
 
DBaaS - The Next generation of database infrastructure
DBaaS - The Next generation of database infrastructureDBaaS - The Next generation of database infrastructure
DBaaS - The Next generation of database infrastructureEmiliano Fusaglia
 
reliability based design optimization for cloud migration
reliability based design optimization for cloud migrationreliability based design optimization for cloud migration
reliability based design optimization for cloud migrationNishmitha B
 
Innovation with Open Source: The New South Wales Judicial Commission experience
Innovation with Open Source: The New South Wales Judicial Commission experienceInnovation with Open Source: The New South Wales Judicial Commission experience
Innovation with Open Source: The New South Wales Judicial Commission experienceLinuxmalaysia Malaysia
 
The Path To Cloud - an Infograph on Cloud Migration
The Path To Cloud - an Infograph on Cloud MigrationThe Path To Cloud - an Infograph on Cloud Migration
The Path To Cloud - an Infograph on Cloud MigrationInApp
 
Aims2011 slacc-presentation final-version
Aims2011 slacc-presentation final-versionAims2011 slacc-presentation final-version
Aims2011 slacc-presentation final-versionictseserv
 
Data SLA in the public cloud
Data SLA in the public cloudData SLA in the public cloud
Data SLA in the public cloudLiran Zelkha
 
Database Architecture & Scaling Strategies, in the Cloud & on the Rack
Database Architecture & Scaling Strategies, in the Cloud & on the Rack Database Architecture & Scaling Strategies, in the Cloud & on the Rack
Database Architecture & Scaling Strategies, in the Cloud & on the Rack Clustrix
 

Destacado (20)

Databases in the Cloud
Databases in the CloudDatabases in the Cloud
Databases in the Cloud
 
Building a Global-Scale Multi-Tenant Cloud Platform on AWS and Docker: Lesson...
Building a Global-Scale Multi-Tenant Cloud Platform on AWS and Docker: Lesson...Building a Global-Scale Multi-Tenant Cloud Platform on AWS and Docker: Lesson...
Building a Global-Scale Multi-Tenant Cloud Platform on AWS and Docker: Lesson...
 
Intro to Baqend
Intro to BaqendIntro to Baqend
Intro to Baqend
 
Building an Angular 2 App
Building an Angular 2 AppBuilding an Angular 2 App
Building an Angular 2 App
 
Pitch auf den Hamburger IT-Strategietagen
Pitch auf den Hamburger IT-StrategietagenPitch auf den Hamburger IT-Strategietagen
Pitch auf den Hamburger IT-Strategietagen
 
High availability solution database mirroring
High availability solution database mirroringHigh availability solution database mirroring
High availability solution database mirroring
 
Evaluating Cloud Database Offerings
Evaluating Cloud Database OfferingsEvaluating Cloud Database Offerings
Evaluating Cloud Database Offerings
 
[SSA] 03.newsql database (2014.02.05)
[SSA] 03.newsql database (2014.02.05)[SSA] 03.newsql database (2014.02.05)
[SSA] 03.newsql database (2014.02.05)
 
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
 
Storage Devices And Backup Media
Storage Devices And Backup MediaStorage Devices And Backup Media
Storage Devices And Backup Media
 
DBaaS - The Next generation of database infrastructure
DBaaS - The Next generation of database infrastructureDBaaS - The Next generation of database infrastructure
DBaaS - The Next generation of database infrastructure
 
B090827
B090827B090827
B090827
 
Cloud-Powered Social Gaming
Cloud-Powered Social GamingCloud-Powered Social Gaming
Cloud-Powered Social Gaming
 
reliability based design optimization for cloud migration
reliability based design optimization for cloud migrationreliability based design optimization for cloud migration
reliability based design optimization for cloud migration
 
Metrics
MetricsMetrics
Metrics
 
Innovation with Open Source: The New South Wales Judicial Commission experience
Innovation with Open Source: The New South Wales Judicial Commission experienceInnovation with Open Source: The New South Wales Judicial Commission experience
Innovation with Open Source: The New South Wales Judicial Commission experience
 
The Path To Cloud - an Infograph on Cloud Migration
The Path To Cloud - an Infograph on Cloud MigrationThe Path To Cloud - an Infograph on Cloud Migration
The Path To Cloud - an Infograph on Cloud Migration
 
Aims2011 slacc-presentation final-version
Aims2011 slacc-presentation final-versionAims2011 slacc-presentation final-version
Aims2011 slacc-presentation final-version
 
Data SLA in the public cloud
Data SLA in the public cloudData SLA in the public cloud
Data SLA in the public cloud
 
Database Architecture & Scaling Strategies, in the Cloud & on the Rack
Database Architecture & Scaling Strategies, in the Cloud & on the Rack Database Architecture & Scaling Strategies, in the Cloud & on the Rack
Database Architecture & Scaling Strategies, in the Cloud & on the Rack
 

Similar a Cloud Databases in Research and Practice

Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startuphuguk
 
Database Choices
Database ChoicesDatabase Choices
Database ChoicesLynn Langit
 
Scaling the Platform for Your Startup
Scaling the Platform for Your StartupScaling the Platform for Your Startup
Scaling the Platform for Your StartupAmazon Web Services
 
Securing_Dbs_in_Cloud_v12
Securing_Dbs_in_Cloud_v12Securing_Dbs_in_Cloud_v12
Securing_Dbs_in_Cloud_v12Steve Markey
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's includedJames Serra
 
How to Scale to Millions of Users with AWS
How to Scale to Millions of Users with AWSHow to Scale to Millions of Users with AWS
How to Scale to Millions of Users with AWSAmazon Web Services
 
Scaling the Platform for Your Startup
Scaling the Platform for Your StartupScaling the Platform for Your Startup
Scaling the Platform for Your StartupAmazon Web Services
 
Not only SQL - Database Choices
Not only SQL - Database ChoicesNot only SQL - Database Choices
Not only SQL - Database ChoicesLynn Langit
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoRakuten Group, Inc.
 
Data Replication Options in AWS
Data Replication Options in AWSData Replication Options in AWS
Data Replication Options in AWSIrawan Soetomo
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Martin Bém
 
Scaling the Platform for Your Startup - Startup Talks June 2015
Scaling the Platform for Your Startup - Startup Talks June 2015Scaling the Platform for Your Startup - Startup Talks June 2015
Scaling the Platform for Your Startup - Startup Talks June 2015Amazon Web Services
 
PASS_Summit_2019_Azure_Storage_Options_for_Analytics
PASS_Summit_2019_Azure_Storage_Options_for_AnalyticsPASS_Summit_2019_Azure_Storage_Options_for_Analytics
PASS_Summit_2019_Azure_Storage_Options_for_AnalyticsDustin Vannoy
 
Samedi SQL Québec - La plateforme data de Azure
Samedi SQL Québec - La plateforme data de AzureSamedi SQL Québec - La plateforme data de Azure
Samedi SQL Québec - La plateforme data de AzureMSDEVMTL
 
NoSQL: Why, When, and How
NoSQL: Why, When, and HowNoSQL: Why, When, and How
NoSQL: Why, When, and HowBigBlueHat
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)James Serra
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceBlueData, Inc.
 

Similar a Cloud Databases in Research and Practice (20)

Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startup
 
Database Choices
Database ChoicesDatabase Choices
Database Choices
 
Scaling the Platform for Your Startup
Scaling the Platform for Your StartupScaling the Platform for Your Startup
Scaling the Platform for Your Startup
 
How and when to use NoSQL
How and when to use NoSQLHow and when to use NoSQL
How and when to use NoSQL
 
Securing_Dbs_in_Cloud_v12
Securing_Dbs_in_Cloud_v12Securing_Dbs_in_Cloud_v12
Securing_Dbs_in_Cloud_v12
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
How to Scale to Millions of Users with AWS
How to Scale to Millions of Users with AWSHow to Scale to Millions of Users with AWS
How to Scale to Millions of Users with AWS
 
Scaling the Platform for Your Startup
Scaling the Platform for Your StartupScaling the Platform for Your Startup
Scaling the Platform for Your Startup
 
Not only SQL - Database Choices
Not only SQL - Database ChoicesNot only SQL - Database Choices
Not only SQL - Database Choices
 
Nosql seminar
Nosql seminarNosql seminar
Nosql seminar
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech info
 
Data Replication Options in AWS
Data Replication Options in AWSData Replication Options in AWS
Data Replication Options in AWS
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Scaling the Platform for Your Startup - Startup Talks June 2015
Scaling the Platform for Your Startup - Startup Talks June 2015Scaling the Platform for Your Startup - Startup Talks June 2015
Scaling the Platform for Your Startup - Startup Talks June 2015
 
PASS_Summit_2019_Azure_Storage_Options_for_Analytics
PASS_Summit_2019_Azure_Storage_Options_for_AnalyticsPASS_Summit_2019_Azure_Storage_Options_for_Analytics
PASS_Summit_2019_Azure_Storage_Options_for_Analytics
 
Samedi SQL Québec - La plateforme data de Azure
Samedi SQL Québec - La plateforme data de AzureSamedi SQL Québec - La plateforme data de Azure
Samedi SQL Québec - La plateforme data de Azure
 
NoSQL: Why, When, and How
NoSQL: Why, When, and HowNoSQL: Why, When, and How
NoSQL: Why, When, and How
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
The Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-ServiceThe Time Has Come for Big-Data-as-a-Service
The Time Has Come for Big-Data-as-a-Service
 

Último

STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
trihybrid cross , test cross chi squares
trihybrid cross , test cross chi squarestrihybrid cross , test cross chi squares
trihybrid cross , test cross chi squaresusmanzain586
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫qfactory1
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxmaryFF1
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptJoemSTuliba
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalMAESTRELLAMesa2
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayupadhyaymani499
 
Ai in communication electronicss[1].pptx
Ai in communication electronicss[1].pptxAi in communication electronicss[1].pptx
Ai in communication electronicss[1].pptxsubscribeus100
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests GlycosidesNandakishor Bhaurao Deshmukh
 
Organic farming with special reference to vermiculture
Organic farming with special reference to vermicultureOrganic farming with special reference to vermiculture
Organic farming with special reference to vermicultureTakeleZike1
 

Último (20)

STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
trihybrid cross , test cross chi squares
trihybrid cross , test cross chi squarestrihybrid cross , test cross chi squares
trihybrid cross , test cross chi squares
 
AZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTXAZOTOBACTER AS BIOFERILIZER.PPTX
AZOTOBACTER AS BIOFERILIZER.PPTX
 
Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫Manassas R - Parkside Middle School 🌎🏫
Manassas R - Parkside Middle School 🌎🏫
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.ppt
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
PROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and VerticalPROJECTILE MOTION-Horizontal and Vertical
PROJECTILE MOTION-Horizontal and Vertical
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
Ai in communication electronicss[1].pptx
Ai in communication electronicss[1].pptxAi in communication electronicss[1].pptx
Ai in communication electronicss[1].pptx
 
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests GlycosidesGLYCOSIDES Classification Of GLYCOSIDES  Chemical Tests Glycosides
GLYCOSIDES Classification Of GLYCOSIDES Chemical Tests Glycosides
 
Organic farming with special reference to vermiculture
Organic farming with special reference to vermicultureOrganic farming with special reference to vermiculture
Organic farming with special reference to vermiculture
 

Cloud Databases in Research and Practice

  • 2. About me  PhD student (database group university of hamburg)
  • 3. About me  PhD student (database group university of hamburg) Research Project for PhD
  • 4. About me  PhD student (database group university of hamburg) Research Project for PhD Cloud Database Startup
  • 5. Outline • Categories of Cloud Databases • Properties What are Cloud Databases? Cloud Databases in the wild Research Perspectives Wrap-up and literature
  • 6. 2003: GFS (Google) 2006: BigTable (Google) 2007: Dynamo (Amazon) 2007: S3 (Amazon) 2008: SimpleDB (Amazon) 2010: SQL Azure 2011: Parse 2012: DynamoDB (Amazon) 2013: Redshift (Amazon)
  • 7. Context: What kinds of Cloud databases are there?
  • 8. DBaaS Infrastructure-as-a-Service Platform-as-a-Service …Database-as-a-Service Cloud SQL Amazon RDS SQL Azure Cloudant MongoHQ Parse Orestes Google F1 DynamoDB BigQuery EMR GCS S3
  • 9. DBaaS Infrastructure-as-a-Service Platform-as-a-Service …Database-as-a-Service Cloud SQL Amazon RDS SQL Azure Cloudant MongoHQ Parse Orestes Google F1 DynamoDB Managed RDBMSs BigQuery EMR GCS S3
  • 10. DBaaS Infrastructure-as-a-Service Platform-as-a-Service …Database-as-a-Service Cloud SQL Amazon RDS SQL Azure Cloudant MongoHQ Parse Orestes Google F1 DynamoDB Managed RDBMSs Cloud-Deployment of DBMSs BigQuery EMR GCS S3
  • 11. DBaaS Infrastructure-as-a-Service Platform-as-a-Service …Database-as-a-Service Cloud SQL Amazon RDS SQL Azure Cloudant MongoHQ Parse Orestes Google F1 DynamoDB Managed NoSQL DBs Managed RDBMSs Cloud-Deployment of DBMSs BigQuery EMR GCS S3
  • 12. DBaaS Infrastructure-as-a-Service Platform-as-a-Service …Database-as-a-Service Cloud SQL Amazon RDS SQL Azure Cloudant MongoHQ Parse Orestes Google F1 DynamoDB Managed NoSQL DBs Managed RDBMSs Cloud-Deployment of DBMSs Cloud-only DBaaS-Systems BigQuery EMR GCS S3
  • 13. DBaaS Infrastructure-as-a-Service Platform-as-a-Service …Database-as-a-Service Cloud SQL Amazon RDS SQL Azure Cloudant MongoHQ Parse Orestes Google F1 DynamoDB Managed NoSQL DBs Managed RDBMSs Cloud-Deployment of DBMSs Cloud-only DBaaS-Systems BigQuery EMR Analytics-as- a-Service GCS S3
  • 14. DBaaS Infrastructure-as-a-Service Platform-as-a-Service …Database-as-a-Service Cloud SQL Amazon RDS SQL Azure Cloudant MongoHQ Parse Orestes Google F1 DynamoDB Managed NoSQL DBs Managed RDBMSs Cloud-Deployment of DBMSs Cloud-only DBaaS-Systems BigQuery EMR Analytics-as- a-Service GCS S3 Object Stores
  • 15. DBaaS Infrastructure-as-a-Service Platform-as-a-Service …Database-as-a-Service Cloud SQL Amazon RDS SQL Azure Cloudant MongoHQ Parse Orestes Google F1 DynamoDB Managed NoSQL DBs Managed RDBMSs Storage APIs Cloud-Deployment of DBMSs Cloud-only DBaaS-Systems BigQuery EMR Analytics-as- a-Service GCS S3 Object Stores
  • 16. DBaaS Infrastructure-as-a-Service Platform-as-a-Service …Database-as-a-Service Cloud SQL Amazon RDS SQL Azure Cloudant MongoHQ Parse Orestes Google F1 DynamoDB Managed NoSQL DBs Managed RDBMSs Backend-as-a- Service Storage APIs Cloud-Deployment of DBMSs Cloud-only DBaaS-Systems BigQuery EMR Analytics-as- a-Service GCS S3 Object Stores
  • 20. Application Architecture <-> Cloud Database Category Architectures Applications Data Warehouse Operative Database Reporting Data MiningAnalytics DataManagementDataAnalytics
  • 21. Application Architecture <-> Cloud Database Category Architectures Applications Data Warehouse Operative Database Reporting Data MiningAnalytics DataManagementDataAnalytics DBaaS
  • 22. Application Architecture <-> Cloud Database Category Architectures Applications Data Warehouse Operative Database Reporting Data MiningAnalytics DataManagementDataAnalytics DBaaSProbaby not.
  • 25. Cloud-Deployed Database IaaS-Cloud Cloud-deploy your favourite database system Does not solve: Provisioning, Backups, Security, Scaling, Elasticity, Performance Tuning, Failover, Replication, ...
  • 27. Managed RDBMS/DWH/NoSQL DB IaaS-Cloud RDBMS DWH NoSQL DB DBaaS-Provider Provisioning, Backups, Security, Scaling, Elasticity, Performance Tuning, Failover, Replication, ...
  • 28. Managed RDBMS/DWH/NoSQL DB IaaS-Cloud RDBMS DWH NoSQL DB DBaaS-Provider
  • 29. Managed RDBMS/DWH/NoSQL DB IaaS-Cloud RDBMS DWH NoSQL DB DBaaS-Provider SQL Azure Google Cloud SQL RDBMS
  • 30. Managed RDBMS/DWH/NoSQL DB IaaS-Cloud RDBMS DWH NoSQL DB DBaaS-Provider SQL Azure Google Cloud SQL RDBMSNoSQLDB
  • 31. Managed RDBMS/DWH/NoSQL DB IaaS-Cloud RDBMS DWH NoSQL DB DBaaS-Provider Amazon Redshift SQL Azure Google Cloud SQL RDBMSNoSQLDBDWH
  • 32. Proprietary DB/Object Store Cloud Black-Box Database or file system Managed by Cloud Provider Provider‘sAPI
  • 33. Proprietary DB/Object Store Cloud Black-Box Database or file system Managed by Cloud Provider Provider‘sAPI Amazon SimpleDB Google Cloud Datastore Azure Tables Database.com BigTable, Megastore, Spanner, F1, Dynamo, PNuts, Relational Cloud, … ProprietaryDB
  • 34. Proprietary DB/Object Store Cloud Black-Box Database or file system Managed by Cloud Provider Provider‘sAPI Amazon SimpleDB Google Cloud Storage Azure Blob Storage Google Cloud Datastore Azure Tables Openstack Swift Database.com BigTable, Megastore, Spanner, F1, Dynamo, PNuts, Relational Cloud, … ProprietaryDBObjectStore
  • 36. Backend-as-a-Service, Polyglot Persistence Service IaaS-Cloud Backend API Service-Layer Data API Authentication, Users, Validation,etc. Maps to (different) databases
  • 38. Backend-as-a-Service, Polyglot Persistence Service IaaS-Cloud Backend API Service-Layer Data API Realtime BaaS BaaS AppCelerator Cloud
  • 42. DBaaS: Common Aspects #1 Metric: Total Cost Daniela Florescu and Donald Kossmann “Rethinking cost and performance of database systems”, SIGMOD Rec. 2009.
  • 43. DBaaS: Common Aspects #1 Metric: Total Cost Maximum utilization of available hardware: Multi-Tenancy Daniela Florescu and Donald Kossmann “Rethinking cost and performance of database systems”, SIGMOD Rec. 2009.
  • 44. DBaaS: Common Aspects  Multi-Tenancy - four common approaches: T. Kiefer, W. Lehner “Private table database virtualization for dbaas” UCC, 2011 Private OS Private Process/DB Private Schema Shared Schema
  • 45. DBaaS: Common Aspects  Multi-Tenancy - four common approaches: T. Kiefer, W. Lehner “Private table database virtualization for dbaas” UCC, 2011 Private OS VM Hardware Resources Database Process Database Schema Private Process/DB Private Schema Shared Schema e.g. Amazon RDS
  • 46. DBaaS: Common Aspects  Multi-Tenancy - four common approaches: T. Kiefer, W. Lehner “Private table database virtualization for dbaas” UCC, 2011 Private OS VM Hardware Resources Database Process Database Schema Private Process/DB Private Schema VM Hardware Resources Database Process Database Schema Shared Schema e.g. Amazon RDS e.g. MongoHQ
  • 47. DBaaS: Common Aspects  Multi-Tenancy - four common approaches: T. Kiefer, W. Lehner “Private table database virtualization for dbaas” UCC, 2011 Private OS VM Hardware Resources Database Process Database Schema Private Process/DB Private Schema VM Hardware Resources Database Process Database Schema VM Hardware Resources Database Process Database Schema Shared Schema e.g. Amazon RDS e.g. MongoHQ e.g. Google DataStore
  • 48. DBaaS: Common Aspects  Multi-Tenancy - four common approaches: T. Kiefer, W. Lehner “Private table database virtualization for dbaas” UCC, 2011 Private OS VM Hardware Resources Database Process Database Schema Private Process/DB Private Schema VM Hardware Resources Database Process Database Schema VM Hardware Resources Database Process Database Schema Shared Schema VM Hardware Resources Database Process Database Schema Virtual Schema e.g. Amazon RDS e.g. MongoHQ e.g. Google DataStore Most SaaS Apps
  • 49. DBaaS: Common Aspects  Multi-Tenancy - four common approaches: W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013 Private OS Private Process/DB Private Schema Shared Schema App. indep. Isolation Ressource Util. Maintenance, Provisioning
  • 50.  Billing Models: DBaaS: Common Aspects Usage Account
  • 51.  Billing Models: DBaaS: Common Aspects Usage Account Pay-per-use Parameters: Network, Bandwidth, Storage, CPU, Requests, etc. Payment: Pre-Paid, Post-Paid Variants: On-Demand, Auction, Reserved e.g. DynamoDB
  • 52.  Billing Models: DBaaS: Common Aspects Usage Account End of month Plan-based Parameters: Allocated Plan (e.g. 2 instances + X GB storage) e.g. MongoHQ
  • 53.  Billing Models: DBaaS: Common Aspects Usage Account End of month Plan-based Parameters: Allocated Plan (e.g. 2 instances + X GB storage) Free Tier: free plan or free initial account credit e.g. MongoHQ
  • 56. DBaaS: Common Aspects Internal Schemes External Identity Provider Federated Identity (SSO) e.g. Amazon IAM e.g. OpenID e.g. SAML Database-a- a-Service Authentication Authorization API Authenticate
  • 57. DBaaS: Common Aspects Internal Schemes External Identity Provider Federated Identity (SSO) e.g. Amazon IAM e.g. OpenID e.g. SAML Used extensively Database-a- a-Service Authentication Authorization API Authenticate
  • 58. DBaaS: Common Aspects Internal Schemes External Identity Provider Federated Identity (SSO) e.g. Amazon IAM e.g. OpenID e.g. SAML Used extensively Database-a- a-Service Authentication Authorization API Authenticate Token
  • 59. DBaaS: Common Aspects Internal Schemes External Identity Provider Federated Identity (SSO) e.g. Amazon IAM e.g. OpenID e.g. SAML Used extensively Database-a- a-Service Authentication Authorization API Authenticate Token Authenticated Request
  • 60. DBaaS: Common Aspects Internal Schemes External Identity Provider Federated Identity (SSO) e.g. Amazon IAM e.g. OpenID e.g. SAML Used extensively User-based Access Control Role-based Access Control Policies e.g. Amazon S3 ACLs e.g. Amazon IAM e.g. XACML Database-a- a-Service Authentication Authorization API Authenticate Token Authenticated Request
  • 61. DBaaS: Common Aspects Internal Schemes External Identity Provider Federated Identity (SSO) e.g. Amazon IAM e.g. OpenID e.g. SAML Used extensively User-based Access Control Role-based Access Control Policies e.g. Amazon S3 ACLs e.g. Amazon IAM e.g. XACML Database-a- a-Service Authentication Authorization API Authenticate Token Authenticated Request Response
  • 62. DBaaS: Common Aspects Internal Schemes External Identity Provider Federated Identity (SSO) e.g. Amazon IAM e.g. OpenID e.g. SAML Used extensively User-based Access Control Role-based Access Control Policies e.g. Amazon S3 ACLs e.g. Amazon IAM e.g. XACML Database-a- a-Service Authentication Authorization API Authenticate Token Authenticated Request Response
  • 63. DBaaS: Common Aspects Internal Schemes External Identity Provider Federated Identity (SSO) e.g. Amazon IAM e.g. OpenID e.g. SAML Used extensively User-based Access Control Role-based Access Control Policies e.g. Amazon S3 ACLs e.g. Amazon IAM e.g. XACML Database-a- a-Service Authentication Authorization API Authenticate Token Authenticated Request Response Federated ACLs M. Decat, B. Lagaisse, et al. “Toward efficient and confidentiality-aware federation of access control policies “, DOA Trusted Cloud 2013 • Imagine ACL: „Patient data can only be accessed by treating physician“ • Idea: decompose policy and evaluate parts near data owner
  • 64.  Service Level Agreements DBaaS: Common Aspects SLA
  • 65.  Service Level Agreements DBaaS: Common Aspects SLA Legal Part 1. Fees 2. Penalties Technical Part 1. SLO 2. SLO 3. SLO
  • 66.  Service Level Agreements DBaaS: Common Aspects SLA Legal Part 1. Fees 2. Penalties Technical Part 1. SLO 2. SLO 3. SLO Service Level Objectives: • Availability • Durability • Consistency/Staleness • Query Response Time
  • 67.  SLAs – achieved through Workload Management DBaaS: Common Aspects W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 68.  SLAs – achieved through Workload Management DBaaS: Common Aspects W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 69.  SLAs – achieved through Workload Management DBaaS: Common Aspects W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 70.  SLAs – achieved through Workload Management DBaaS: Common Aspects W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 71.  SLAs – achieved through Workload Management DBaaS: Common Aspects W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013 Maximize:
  • 72.  SLAs – achieved through Workload Management DBaaS: Common Aspects W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 73.  SLAs – achieved through Workload Management DBaaS: Common Aspects W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013 QOS for NoSQL DBs Y. Zhu et al. “Scheduling with Freshness and Performance Guarantees for Web Applications in the Cloud“, CRPIT Old: Workload management in RDBMs (DB2 and Oracle) New: Use well-known scheduling algorithms for queries in replicated DBs
  • 74.  Resource Provisionig  Goal: Resources ⇔ SLAs DBaaS: Common Aspects T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for Elastic Applications in Cloud Environments”. Technical Report, 2013 Resources Time
  • 75.  Resource Provisionig  Goal: Resources ⇔ SLAs DBaaS: Common Aspects T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for Elastic Applications in Cloud Environments”. Technical Report, 2013 Resources Time Expected Load
  • 76.  Resource Provisionig  Goal: Resources ⇔ SLAs DBaaS: Common Aspects T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for Elastic Applications in Cloud Environments”. Technical Report, 2013 Resources Time Expected Load Provisioned Resources: • #No of Shard- or Replica servers • Computing, Storage, Network Capacities
  • 77.  Resource Provisionig  Goal: Resources ⇔ SLAs DBaaS: Common Aspects T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for Elastic Applications in Cloud Environments”. Technical Report, 2013 Resources Time Actual Load
  • 78.  Resource Provisionig  Goal: Resources ⇔ SLAs DBaaS: Common Aspects T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for Elastic Applications in Cloud Environments”. Technical Report, 2013 Resources Time Actual Load Overprovisioning: • SLAs met • Excess Capacities
  • 79.  Resource Provisionig  Goal: Resources ⇔ SLAs DBaaS: Common Aspects T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for Elastic Applications in Cloud Environments”. Technical Report, 2013 Resources Time Actual Load Overprovisioning: • SLAs met • Excess Capacities Underprovisioning: • SLAs violated • Usage maximized
  • 80.  Resource Provisionig  Goal: Resources ⇔ SLAs DBaaS: Common Aspects T. Lorido-Botran, J. Miguel-Alonso et al.: “Auto-scaling Techniques for Elastic Applications in Cloud Environments”. Technical Report, 2013 Resources Time Actual Load Overprovisioning: • SLAs met • Excess Capacities Underprovisioning: • SLAs violated • Usage maximized SmartSLA P. Xiong: “Intelligent management of virtualized resources for database systems in cloud environment”, ICDE 2011 Solution: machine learning (regression + boosting) for prediction  choose allocation that minimizes SLA penalties Resource allocation Database performance Learn Mapping
  • 81. Functional Requirements Scan-Querys Conditional Updates Transactions Query by Example Joins Analytics Elasticity Consistency Read-Latency Write-Latency Write-Throughput Scalability of Data Volume Read Scalability Read-Availability Write-Availability Non-Functional Requirements Durability Write Scalability DBaaS: General Considerations
  • 82. Functional Requirements Scan-Querys Conditional Updates Transactions Query by Example Joins Analytics Elasticity Consistency Read-Latency Write-Latency Write-Throughput Scalability of Data Volume Read Scalability Read-Availability Write-Availability Non-Functional Requirements Durability Write Scalability DBaaS: General Considerations
  • 83. Functional Requirements Scan-Querys Conditional Updates Transactions Query by Example Joins Analytics Elasticity Consistency Read-Latency Write-Latency Write-Throughput Scalability of Data Volume Read Scalability Read-Availability Write-Availability Non-Functional Requirements Durability Write Scalability DBaaS: General Considerations
  • 84. Functional Requirements Scan-Querys Conditional Updates Transactions Query by Example Joins Analytics Elasticity Consistency Read-Latency Write-Latency Write-Throughput Scalability of Data Volume Read Scalability Read-Availability Write-Availability Non-Functional Requirements Durability Write Scalability DBaaS: General Considerations aaS
  • 85. Functional Requirements Scan-Querys Conditional Updates Transactions Query by Example Joins Analytics Elasticity Consistency Read-Latency Write-Latency Write-Throughput Scalability of Data Volume Read Scalability Read-Availability Write-Availability Non-Functional Requirements Durability Write Scalability DBaaS: General Considerations aaS Questions to ask: • Which requirements are met by the DB? • Which are met by the provider?  SLAs
  • 86. Outline Examples of different cloud database systems: • Cloud-deployed • Managed DBMS • SQL • NoSQL • Proprietary • BaaS What are Cloud Databases? Cloud Databases in the wild Research Perspectives Wrap-up and literature
  • 87.  Idea: Run (mostly) unmodified DB on IaaS Cloud-Deployed DB  Method I: DIY  Method II: Deployment Tools  Method III: Marketplaces
  • 88.  Idea: Run (mostly) unmodified DB on IaaS Cloud-Deployed DB  Method I: DIY  Method II: Deployment Tools  Method III: Marketplaces 1. Provision VM(s)
  • 89.  Idea: Run (mostly) unmodified DB on IaaS Cloud-Deployed DB  Method I: DIY  Method II: Deployment Tools  Method III: Marketplaces 1. Provision VM(s) 2. Install DBMS (manual, script, Chef, Puppet)
  • 90.  Idea: Run (mostly) unmodified DB on IaaS Cloud-Deployed DB  Method I: DIY  Method II: Deployment Tools  Method III: Marketplaces > whirr launch-cluster --config hbase.properties Login, cluster-size etc. Amazon EC2 1. Provision VM(s) 2. Install DBMS (manual, script, Chef, Puppet)
  • 91.  Idea: Run (mostly) unmodified DB on IaaS Cloud-Deployed DB  Method I: DIY  Method II: Deployment Tools  Method III: Marketplaces > whirr launch-cluster --config hbase.properties Login, cluster-size etc. Amazon EC2 1. Provision VM(s) 2. Install DBMS (manual, script, Chef, Puppet)
  • 92.  Idea: Run preconfigured DB on IaaS AWS Marketplace AWS Marketplace Model: Cloud-Deployed Pricing: Instance + Volume + License Underlying DB: Choosable API: DB-specific
  • 93.  Idea: Run preconfigured DB on IaaS AWS Marketplace AWS Marketplace Model: Cloud-Deployed Pricing: Instance + Volume + License Underlying DB: Choosable API: DB-specific
  • 94.  Idea: Run preconfigured DB on IaaS AWS Marketplace AWS Marketplace Model: Cloud-Deployed Pricing: Instance + Volume + License Underlying DB: Choosable API: DB-specific
  • 95.  Idea: Run preconfigured DB on IaaS AWS Marketplace AWS Marketplace Model: Cloud-Deployed Pricing: Instance + Volume + License Underlying DB: Choosable API: DB-specific
  • 96.  Idea: Run preconfigured DB on IaaS AWS Marketplace AWS Marketplace Model: Cloud-Deployed Pricing: Instance + Volume + License Underlying DB: Choosable API: DB-specific
  • 97.  Idea: Run preconfigured DB on IaaS AWS Marketplace AWS Marketplace Model: Cloud-Deployed Pricing: Instance + Volume + License Underlying DB: Choosable API: DB-specific
  • 98.  Idea: Run preconfigured DB on IaaS AWS Marketplace AWS Marketplace Model: Cloud-Deployed Pricing: Instance + Volume + License Underlying DB: Choosable API: DB-specific
  • 99.  Idea: Run preconfigured DB on IaaS AWS Marketplace AWS Marketplace Model: Cloud-Deployed Pricing: Instance + Volume + License Underlying DB: Choosable API: DB-specific Bad: • No Clusters • Not managed (automatic Updates, Snapshots, etc.) • Private OS Multi-Tenancy  Bad Resource Usage Good: • Easy to get started
  • 100. Amazon Elastic MapReduce EMR Model: Analytics-aaS Pricing: Infrastructure API: Hadoop Amazon Elastic MapReduce W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 101. Amazon Elastic MapReduce EMR Model: Analytics-aaS Pricing: Infrastructure API: Hadoop Amazon Elastic MapReduce Provisions W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 102. Amazon Elastic MapReduce EMR Model: Analytics-aaS Pricing: Infrastructure API: Hadoop Amazon Elastic MapReduce Job Tracker Task Tracker + HDFS Data Node Task Tracker Provisions W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 103. Amazon Elastic MapReduce EMR Model: Analytics-aaS Pricing: Infrastructure API: Hadoop Amazon Elastic MapReduce Data Source and Sink Job Tracker Task Tracker + HDFS Data Node Task Tracker Provisions W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 104. Amazon Elastic MapReduce EMR Model: Analytics-aaS Pricing: Infrastructure API: Hadoop Amazon Elastic MapReduce Submits Hadoop Jobs as: • JAR • Streaming • Cascading • Pig • Hive • Impala Data Source and Sink Job Tracker Task Tracker + HDFS Data Node Task Tracker Provisions W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013
  • 105. Amazon Elastic MapReduce EMR Model: Analytics-aaS Pricing: Infrastructure API: Hadoop Amazon Elastic MapReduce Submits Hadoop Jobs as: • JAR • Streaming • Cascading • Pig • Hive • Impala Data Source and Sink Job Tracker Task Tracker + HDFS Data Node Task Tracker Provisions W. Lehner, U. Sattler “Web-scale Data Management for the Cloud” Springer, 2013 • No data locality with S3 • AWS Import/Export: send your HDD • HBase Integration • Compatible with Spot and Reserved Instances • Similar: Azure HDInsight
  • 106.  Idea: Web-scale analysis of nested data Google BigQuery BigQuery Model: Analytics-aaS Pricing: Storage + GBs Processed API: REST Google BigQuery
  • 107.  Idea: Web-scale analysis of nested data Google BigQuery BigQuery Model: Analytics-aaS Pricing: Storage + GBs Processed API: REST Google BigQuery
  • 108.  Idea: Web-scale analysis of nested data Google BigQuery BigQuery Model: Analytics-aaS Pricing: Storage + GBs Processed API: REST Google BigQuery Dremel Melnik et al. “Dremel: Interactive analysis of web-scale datasets”, VLDB 2010 Idea: Multi-Level execution tree on nested columnar data format (≥100 nodes)
  • 109.  Idea: Web-scale analysis of nested data Google BigQuery BigQuery Model: Analytics-aaS Pricing: Storage + GBs Processed API: REST Google BigQuery Dremel Melnik et al. “Dremel: Interactive analysis of web-scale datasets”, VLDB 2010 Idea: Multi-Level execution tree on nested columnar data format (≥100 nodes) • SLA: 99.9% uptime / month • Fundamentally different from relational DWHs and MapReduce • Design copied by Apache Drill, Impala, Shark
  • 110.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific
  • 111.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific
  • 112.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific • Synchronous Replication • Automatic Failover
  • 113.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific • Synchronous Replication • Automatic Failover 99,95% uptime SLA
  • 114.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific • Synchronous Replication • Automatic Failover 99,95% uptime SLA Provisioned IOPS: access to EBS volumes network- optimized (up to 4000 IOPS)
  • 115.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific
  • 116.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific EC2 instances: Up to 32 Cores, 244 GB RAM, 10 GbE
  • 117.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific EC2 instances: Up to 32 Cores, 244 GB RAM, 10 GbE Minor Version Upgrades are performed without downtime
  • 118.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific
  • 119.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific Backups are automated and scheduled
  • 120.  Relational Database Service Amazon RDS RDS Model: Managed RDBMS Pricing: Instance + Volume + License Underlying DB: MySQL, Postgres, MSSQL, Oracle API: DB-specific Backups are automated and scheduled • Support for (asynchronous) Read Replicas • Administration: Web-based or SDKs • Only RDBMSs • “Analytic Brother“ of RDS: RedShift (PDWH)
  • 121.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure
  • 122.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure
  • 123.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure
  • 124.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure
  • 125.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure
  • 126.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure Cloud SQL Server P. Bernstein et al. “Adapting Microsoft SQL server for cloud computing”, ICDE 2011 • Multi-Tenant MSSQL • Paxos-like commit protocol for consistent replication
  • 127.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure Keyless Table Group: regular database Keyed Table Group: partitioned by row key Cloud SQL Server P. Bernstein et al. “Adapting Microsoft SQL server for cloud computing”, ICDE 2011 • Multi-Tenant MSSQL • Paxos-like commit protocol for consistent replication
  • 128.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure Keyless Table Group: regular database Keyed Table Group: partitioned by row key Consistency unit (ACID boundary) Cloud SQL Server P. Bernstein et al. “Adapting Microsoft SQL server for cloud computing”, ICDE 2011 • Multi-Tenant MSSQL • Paxos-like commit protocol for consistent replication
  • 129.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure Keyless Table Group: regular database Keyed Table Group: partitioned by row key Consistency unit (ACID boundary) Automatic Partitioning for Keyed Table Groups Cloud SQL Server P. Bernstein et al. “Adapting Microsoft SQL server for cloud computing”, ICDE 2011 • Multi-Tenant MSSQL • Paxos-like commit protocol for consistent replication
  • 130.  Similar to RDS Microsoft SQL Azure SQL Azure Model: Managed RDBMS Pricing: Database size Underlying DB: MSSQL Server API: T-SQL/TDS SQL Azure Keyless Table Group: regular database Keyed Table Group: partitioned by row key Consistency unit (ACID boundary) Automatic Partitioning for Keyed Table Groups Cloud SQL Server P. Bernstein et al. “Adapting Microsoft SQL server for cloud computing”, ICDE 2011 • Multi-Tenant MSSQL • Paxos-like commit protocol for consistent replication • SLA: 99.9% uptime / month • Usually Cheaper than RDS (Multi-Tenancy) • Smaller Databases (max. 150 GB) • Rich MSSQL server tooling • Keyed Table Group internal feature only
  • 131.  MySQL for Google App Engine PaaS  Support for: patching, replication, backup  SLA: 99,95 % uptime / month       Other RDBMS services Google Cloud SQL Google Cloud SQL Pricing: Database size Underlying DB: MySQL
  • 132.  MySQL for Google App Engine PaaS  Support for: patching, replication, backup  SLA: 99,95 % uptime / month  Postgres for Heroku PaaS  Hosted on EC2  No SLAs    Other RDBMS services Google Cloud SQL Google Cloud SQL Pricing: Database size Underlying DB: MySQL Heroku Postgres Pricing: Plan based Underlying DB: Postgres
  • 133.  MySQL for Google App Engine PaaS  Support for: patching, replication, backup  SLA: 99,95 % uptime / month  Postgres for Heroku PaaS  Hosted on EC2  No SLAs  MySQL for OpenStack (Icehouse)  Under development (HP driven)  VM ⇔ Tenant Other RDBMS services Google Cloud SQL Google Cloud SQL Pricing: Database size Underlying DB: MySQL Heroku Postgres Pricing: Plan based Underlying DB: Postgres Trove Pricing: Own Hardware Underlying DB: MySQL Trove
  • 134.  MySQL for Google App Engine PaaS  Support for: patching, replication, backup  SLA: 99,95 % uptime / month  Postgres for Heroku PaaS  Hosted on EC2  No SLAs  MySQL for OpenStack (Icehouse)  Under development (HP driven)  VM ⇔ Tenant Other RDBMS services Google Cloud SQL Google Cloud SQL Pricing: Database size Underlying DB: MySQL Heroku Postgres Pricing: Plan based Underlying DB: Postgres Trove Pricing: Own Hardware Underlying DB: MySQL Trove Evaluation of Cloud RDBMSs D. Kossmann,T. Kraska: An evaluation of alternative architectures for transaction processing in the cloud”, Sigmod 2010 TPC-W Benchmark (Online Shop), 2010, Emulated Browsers / RPS:
  • 135. HBase Wide- Column CP Over Row Key ~700 1/4 Apache (EMR) MongoDB Doc- ument CP yes >100 <500 4/4 GPL Riak Key- Value AP ~60 3/4 Apache (Softlayer) Cassandra Wide- Column AP With Comp. Index >300 <1000 2/4 Apache Redis Key- Value CA Through Lists, etc. manual N/A 4/4 BSD Managed NoSQL services Model CAP Scans Sec. Indices Largest Cluster Lic. Lear- ning DBaaS
  • 136. HBase Wide- Column CP Over Row Key ~700 1/4 Apache (EMR) MongoDB Doc- ument CP yes >100 <500 4/4 GPL Riak Key- Value AP ~60 3/4 Apache (Softlayer) Cassandra Wide- Column AP With Comp. Index >300 <1000 2/4 Apache Redis Key- Value CA Through Lists, etc. manual N/A 4/4 BSD Managed NoSQL services Model CAP Scans Sec. Indices Largest Cluster Lic. Lear- ning DBaaS And there are many more: • CouchDB (e.g. Cloudant) • CouchBase (e.g. KuroBase Beta) • ElasticSearch(e.g. Bonsai) • Solr (e.g. WebSolr) • …
  • 137. MongoHQ MongoHQ Model: Managed NoSQL Pricing: Plan-based Underlying DB: MongoDB API: Mongo, REST (beta)  EC2-based MongoDB-as-a-Serivce 
  • 138. MongoHQ MongoHQ Model: Managed NoSQL Pricing: Plan-based Underlying DB: MongoDB API: Mongo, REST (beta)  EC2-based MongoDB-as-a-Serivce 
  • 139. MongoHQ MongoHQ Model: Managed NoSQL Pricing: Plan-based Underlying DB: MongoDB API: Mongo, REST (beta)  EC2-based MongoDB-as-a-Serivce 
  • 140. MongoHQ MongoHQ Model: Managed NoSQL Pricing: Plan-based Underlying DB: MongoDB API: Mongo, REST (beta)  EC2-based MongoDB-as-a-Serivce 
  • 141. MongoHQ MongoHQ Model: Managed NoSQL Pricing: Plan-based Underlying DB: MongoDB API: Mongo, REST (beta)  EC2-based MongoDB-as-a-Serivce 
  • 142. MongoHQ MongoHQ Model: Managed NoSQL Pricing: Plan-based Underlying DB: MongoDB API: Mongo, REST (beta)  EC2-based MongoDB-as-a-Serivce  Private Process Multi- Tenancy Scale-Up Strategy: RAM, IOPs and CPU increased at runtime Maximum Size: 1TB Free Tier
  • 143. MongoHQ MongoHQ Model: Managed NoSQL Pricing: Plan-based Underlying DB: MongoDB API: Mongo, REST (beta)  EC2-based MongoDB-as-a-Serivce  Private Process Multi- Tenancy Scale-Up Strategy: RAM, IOPs and CPU increased at runtime Maximum Size: 1TB Free Tier VM-Deployment (EC2): M1.large on EC2: $128.10 on EC2 with 1-yr-Res.: $30 With MongoHQ: $637
  • 144. MongoHQ MongoHQ Model: Managed NoSQL Pricing: Plan-based Underlying DB: MongoDB API: Mongo, REST (beta)  EC2-based MongoDB-as-a-Serivce  #1 thing that should never happen:
  • 146. MongoHQ  Problem: Scalability Client Client configconfigconfig mongos Replica Set Master Slave Slave mongos What if Writes / second or data volume become bottleneck?
  • 147. MongoHQ  Problem: Scalability Client Client configconfigconfig mongos Replica Set Replica Set Master Slave Slave Master Slave Slave mongos Sharding (Scale Out) • Dynamic Scaling: Tenant adds Replica Set • Elastic Scaling: Provider adds/removes Replica Set MongoHQ: Only manual sharding on “contact us” basis
  • 148. MongoHQ  Problem: Scalability Client Client configconfigconfig mongos Replica Set Replica Set Master Slave Slave Master Slave Slave mongos Sharding (Scale Out) • Dynamic Scaling: Tenant adds Replica Set • Elastic Scaling: Provider adds/removes Replica Set MongoHQ: Only manual sharding on “contact us” basis • Bad: no SLAs, no horizontal scaling • Good: Solid Dashboard, Backups, Replication, integration with Heroku • Competitors: MongoLab, ObjectRocket, MongoSoup
  • 149. ElastiCache ElastiCache Model: Managed NoSQL Pricing: Infrastructure Underlying DB: Memcache, Redis API: DB, REST (management)  „RDS for Memcache and Redis“
  • 150. ElastiCache ElastiCache Model: Managed NoSQL Pricing: Infrastructure Underlying DB: Memcache, Redis API: DB, REST (management)  „RDS for Memcache and Redis“ Memcache or RedisMemcache can be run as a cluster (client- side sharding) Limited choice of instance types
  • 151. ElastiCache ElastiCache Model: Managed NoSQL Pricing: Infrastructure Underlying DB: Memcache, Redis API: DB, REST (management)  „RDS for Memcache and Redis“ Announced last Saturday: Snapshots
  • 152. ElastiCache ElastiCache Model: Managed NoSQL Pricing: Infrastructure Underlying DB: Memcache, Redis API: DB, REST (management)  „RDS for Memcache and Redis“
  • 153. ElastiCache ElastiCache Model: Managed NoSQL Pricing: Infrastructure Underlying DB: Memcache, Redis API: DB, REST (management)  „RDS for Memcache and Redis“ $ elasticache-modify-cache-parameter-group xy AWS CLI tools and SDKs offer a superset of Dashboard functions
  • 154.  Many Hosted NoSQL DbaaS Providers represented  Heroku DBaaS Addons
  • 155.  Many Hosted NoSQL DbaaS Providers represented  And Search Heroku DBaaS Addons
  • 156. Heroku Redis2Go example Redis2Go Model: Managed NoSQL Pricing: Plan-based Underlying DB: Redis API: Redis Create Heroku App:
  • 157. Heroku Redis2Go example Redis2Go Model: Managed NoSQL Pricing: Plan-based Underlying DB: Redis API: Redis Create Heroku App: Add Redis2Go Addon:
  • 158. Heroku Redis2Go example Redis2Go Model: Managed NoSQL Pricing: Plan-based Underlying DB: Redis API: Redis Create Heroku App: Add Redis2Go Addon: Use Connection URL (environment variable):
  • 159. Heroku Redis2Go example Redis2Go Model: Managed NoSQL Pricing: Plan-based Underlying DB: Redis API: Redis Create Heroku App: Add Redis2Go Addon: Use Connection URL (environment variable): Deploy:
  • 160. Heroku Redis2Go example Redis2Go Model: Managed NoSQL Pricing: Plan-based Underlying DB: Redis API: Redis Create Heroku App: Add Redis2Go Addon: Use Connection URL (environment variable): Deploy: • Very simple • Only suited for small to medium applications (no SLAs, limited control)
  • 161. SimpleDB Table- Store CP Yes (as queries) Auto- matic SQL-like (no joins, groups, …) REST + SDKs Dynamo- DB Table- Store CP By range key / index Local Sec. Global Sec. Key+Cond. On Range Key(s) REST + SDKs Automatic over Prim. Key Azure Tables Table- Store CP By range key Key+Cond. On Range Key REST + SDKs Automatic over Part. Key 99.9% uptime AE/Cloud DataStore Entity- Group CP Yes (as queries) Auto- matic Conjunct. of Eq. Predicates REST/ SDK, JDO,JPA Automatic over Entity Groups S3, Az. Blob, GCS Blob- Store AP REST + SDKs Automatic over key 99.9% uptime (S3) Proprietary Database services Model CAP Scans Sec. Indices Queries API SLA Scale- out
  • 162. SimpleDB Table- Store CP Yes (as queries) Auto- matic SQL-like (no joins, groups, …) REST + SDKs Dynamo- DB Table- Store CP By range key / index Local Sec. Global Sec. Key+Cond. On Range Key(s) REST + SDKs Automatic over Prim. Key Azure Tables Table- Store CP By range key Key+Cond. On Range Key REST + SDKs Automatic over Part. Key 99.9% uptime AE/Cloud DataStore Entity- Group CP Yes (as queries) Auto- matic Conjunct. of Eq. Predicates REST/ SDK, JDO,JPA Automatic over Entity Groups S3, Az. Blob, GCS Blob- Store AP REST + SDKs Automatic over key 99.9% uptime (S3) Proprietary Database services Model CAP Scans Sec. Indices Queries API SLA Scale- out There are many more object stores (HP, Rackspace, etc.) …but no comparable Table Stores
  • 165.  Table Service example: Azure Tables Partition Key Row Key (sortiert) Timestamp (autom.) Property1 Propertyn intro.pdf v1.1 14/6/2013 … … intro.pdf v1.2 15/6/2013 … präs.pptx v0.0 11/6/2013 … Partition Partition RESTAPI
  • 166.  Table Service example: Azure Tables Partition Key Row Key (sortiert) Timestamp (autom.) Property1 Propertyn intro.pdf v1.1 14/6/2013 … … intro.pdf v1.2 15/6/2013 … präs.pptx v0.0 11/6/2013 … Partition Partition RESTAPI SparseHash-distributed to parition servers No Index: Lookup only (!) by full table scan Atomic "Entity- Group Batch Transaction" possible
  • 167.  Similar to Amazon SimpleDB and DynamoDB Table Service example: Azure Tables Partition Key Row Key (sortiert) Timestamp (autom.) Property1 Propertyn intro.pdf v1.1 14/6/2013 … … intro.pdf v1.2 15/6/2013 … präs.pptx v0.0 11/6/2013 … Partition Partition RESTAPI • Indexes all attributes • Rich(er) queries • Many Limits (size, RPS, etc.) • Provisioned Throughput • On SSDs („single digit latency“) • Optional Indexes
  • 168. Azure Table Storage Azure Tables Model: Propriertary Pricing: Requests + Storage + Network Underlying DB: Custom System API: REST Challenges:  Single partition and range key (modelling)  Very “basic“ queries       Azure Tables
  • 169. Azure Table Storage Azure Tables Model: Propriertary Pricing: Requests + Storage + Network Underlying DB: Custom System API: REST Challenges:  Single partition and range key (modelling)  Very “basic“ queries       Azure Tables
  • 170. Azure Table Storage Azure Tables Model: Propriertary Pricing: Requests + Storage + Network Underlying DB: Custom System API: REST Challenges:  Single partition and range key (modelling)  Very “basic“ queries Good: Automatic Distribution Replicated 3x locally + 1x async. geo-replica    Azure Tables
  • 171. Azure Table Storage Azure Tables Model: Propriertary Pricing: Requests + Storage + Network Underlying DB: Custom System API: REST Challenges:  Single partition and range key (modelling)  Very “basic“ queries Good: Automatic Distribution Replicated 3x locally + 1x async. geo-replica Very good: SLA (99.9% uptime) Internal architecture published Azure Tables
  • 172. Azure Table Storage Azure Tables Model: Propriertary Pricing: Requests + Storage + Network Underlying DB: Custom System API: REST Challenges:  Single partition and range key (modelling)  Very “basic“ queries Good: Automatic Distribution Replicated 3x locally + 1x async. geo-replica Very good: SLA (99.9% uptime) Internal architecture published Azure Tables Windows Azure Storage B. Calder, et al. "Windows Azure Storage: a highly available cloud storage service with strong consistency." , SOSP 2011 Idea: • Layered storage infrastructure for Blobs and Tables • Use research results (GFS, BigTable, Paxos, LSM, Erasure Coding)
  • 173. Azure Table Storage Azure Tables Model: Propriertary Pricing: Requests + Storage + Network Underlying DB: Custom System API: REST Challenges:  Single partition and range key (modelling)  Very “basic“ queries Good: Automatic Distribution Replicated 3x locally + 1x async. geo-replica Very good: SLA (99.9% uptime) Internal architecture published Azure Tables Windows Azure Storage B. Calder, et al. "Windows Azure Storage: a highly available cloud storage service with strong consistency." , SOSP 2011 Idea: • Layered storage infrastructure for Blobs and Tables • Use research results (GFS, BigTable, Paxos, LSM, Erasure Coding) Think: GFS/HDFS Think: BigTable/HBase Think: Chubby/ZooKeeper
  • 174. DynamoDB DynamoDB Model: Propriertary Pricing: Provisioned Throughput + Network Underlying DB: Custom System API: REST  Successor to SimpleDB Limitations: • Slow (~50-100 RPS) • 10 GB per Domain • Query result max. 2500 records or 1 MB • Max. 1K-sized attributes
  • 175. DynamoDB DynamoDB Model: Propriertary Pricing: Provisioned Throughput + Network Underlying DB: Custom System API: REST  Successor to SimpleDB dom:com.cnn content : "<html>…" Primary Key Attribute (scalar or set) page:index Range Key Item:
  • 176. DynamoDB DynamoDB Model: Propriertary Pricing: Provisioned Throughput + Network Underlying DB: Custom System API: REST  Successor to SimpleDB dom:com.cnn content : "<html>…" Primary Key Attribute (scalar or set) Querying Options: • GetItem: Key Lookup • Query: Primary Key + Condition on Range Key • Scan: Full Table Scan with filter • EMR: Hive queries (for analytics) page:index Range Key Item:
  • 177. DynamoDB DynamoDB Model: Propriertary Pricing: Provisioned Throughput + Network Underlying DB: Custom System API: REST  Successor to SimpleDB  Consistency:  Strongly (2x price) or Eventually Consistent Reads  Atomic (Conditional) Updates per Item  Indexing Options: ◦ Local Sec. Index: consistent additional Range Key ◦ Global Sec. Index: eventually consistent index-table (Primary Key) dom:com.cnn content : "<html>…" Primary Key Attribute (scalar or set) page:index Range Key Item:
  • 182. DynamoDB DynamoDB Model: Propriertary Pricing: Provisioned Throughput + Network Underlying DB: Custom System API: REST  Successor to SimpleDB Unit of Billing
  • 183. DynamoDB DynamoDB Model: Propriertary Pricing: Provisioned Throughput + Network Underlying DB: Custom System API: REST  Successor to SimpleDB Unit of Billing Good: • Low Latency (SSD) • Data partitioning and AZ-replication Bad: • Scaling not elastic (Capacity Units) • No SLAs, no internals published (≠ Dynamo!) • No built-in backups ( AWS data pipeline) • Vendor Lock-in
  • 184. AE/Cloud DataStore DataStore Model: Propriertary Pricing: CPU + Storage + Network Underlying DB: MegaStore API: SDK, JPA, JDO Google Cloud Datastore  Structured Storage System for App Engine  Based on: ◦ Megastore BigTable  Colossus 
  • 185. AE/Cloud DataStore DataStore Model: Propriertary Pricing: CPU + Storage + Network Underlying DB: MegaStore API: SDK, JPA, JDO Google Cloud Datastore  Structured Storage System for App Engine  Based on: ◦ Megastore BigTable  Colossus 
  • 186. AE/Cloud DataStore DataStore Model: Propriertary Pricing: CPU + Storage + Network Underlying DB: MegaStore API: SDK, JPA, JDO Google Cloud Datastore  Structured Storage System for App Engine  Based on: ◦ Megastore BigTable  Colossus 
  • 187. AE/Cloud DataStore DataStore Model: Propriertary Pricing: CPU + Storage + Network Underlying DB: MegaStore API: SDK, JPA, JDO Google Cloud Datastore  Structured Storage System for App Engine  Based on: ◦ Megastore BigTable  Colossus  Schemafree Entity Group (EG) data model: User ID Name Photo ID User URL Root Table Child Table 1 n
  • 188. AE/Cloud DataStore DataStore Model: Propriertary Pricing: CPU + Storage + Network Underlying DB: MegaStore API: SDK, JPA, JDO Google Cloud Datastore  Structured Storage System for App Engine  Based on: ◦ Megastore BigTable  Colossus  Schemafree Entity Group (EG) data model: User ID Name Photo ID User URL Root Table Child Table 1 n EG: User + n Photos • Unit of ACID transactions/ consistency • Fields autoindexed (eventually consistent)
  • 189. AE/Cloud DataStore DataStore Model: Propriertary Pricing: CPU + Storage + Network Underlying DB: MegaStore API: SDK, JPA, JDO Google Cloud Datastore  Structured Storage System for App Engine  Based on: ◦ Megastore BigTable  Colossus  Schemafree Entity Group (EG) data model: User ID Name Photo ID User URL Root Table Child Table 1 n EG: User + n Photos • Unit of ACID transactions/ consistency • Fields autoindexed (eventually consistent) SELECT * FROM photos WHERE ANCESTOR IS :34 AND name = „sunset“ ORDER BY date ASC LIMIT 10 OFFSET 10
  • 190. AE/Cloud DataStore  Internally: Entity Groups define partitions Synchronous Paxos-based replication ACID per EG. Maximum of 1 Write/s to an EG. Eventual Consistency across groups Stored in BigTable
  • 191. AE/Cloud DataStore  Internally: Entity Groups define partitions Synchronous Paxos-based replication ACID per EG. Maximum of 1 Write/s to an EG. Eventual Consistency across groups Stored in BigTable
  • 192. AE/Cloud DataStore  Internally: MegaStore J. Baker, et al. "Megastore: Providing Scalable, Highly Available Storage for Interactive Services." CIDR 2011. • Paxos-based replication and transactions • 100 Google applications Problems: Slow Writes, Predefined Entity Groups Entity Groups define partitions Synchronous Paxos-based replication ACID per EG. Maximum of 1 Write/s to an EG. Eventual Consistency across groups Stored in BigTable
  • 193. AE/Cloud DataStore  Internally: MegaStore J. Baker, et al. "Megastore: Providing Scalable, Highly Available Storage for Interactive Services." CIDR 2011. • Paxos-based replication and transactions • 100 Google applications Problems: Slow Writes, Predefined Entity Groups Entity Groups define partitions Synchronous Paxos-based replication ACID per EG. Maximum of 1 Write/s to an EG. Eventual Consistency across groups Stored in BigTable Spanner J. Corbett et al. "Spanner: Google’s globally distributed database." TOCS 2013 Idea: • Autosharded Entity Groups • Not based on BigTable Implementation: • TrueTime API (GPS + atomic clocks)  commit timestamps of 2PL-SI transactions • Paxos-replication per Shard
  • 194. AE/Cloud DataStore  Internally: MegaStore J. Baker, et al. "Megastore: Providing Scalable, Highly Available Storage for Interactive Services." CIDR 2011. • Paxos-based replication and transactions • 100 Google applications Problems: Slow Writes, Predefined Entity Groups Entity Groups define partitions Synchronous Paxos-based replication ACID per EG. Maximum of 1 Write/s to an EG. Eventual Consistency across groups Stored in BigTable Spanner J. Corbett et al. "Spanner: Google’s globally distributed database." TOCS 2013 Idea: • Autosharded Entity Groups • Not based on BigTable Implementation: • TrueTime API (GPS + atomic clocks)  commit timestamps of 2PL-SI transactions • Paxos-replication per Shard F1 J. Shute, et al. "F1: A distributed SQL database that scales.“, VLDB 2013 Idea: • Full SQL relational database built on Spanner, powers AdWords Implementation: • 5-way replication • Data Model: relational + hierarchy (customercampaignAdGroup) • Transactions: Snapshot-Read-Only, Pessimistic (Spanner), optimistic • Distributed SQL Engine
  • 195. AE/Cloud DataStore  Internally: MegaStore J. Baker, et al. "Megastore: Providing Scalable, Highly Available Storage for Interactive Services." CIDR 2011. • Paxos-based replication and transactions • 100 Google applications Problems: Slow Writes, Predefined Entity Groups Entity Groups define partitions Synchronous Paxos-based replication ACID per EG. Maximum of 1 Write/s to an EG. Eventual Consistency across groups Stored in BigTable Spanner J. Corbett et al. "Spanner: Google’s globally distributed database." TOCS 2013 Idea: • Autosharded Entity Groups • Not based on BigTable Implementation: • TrueTime API (GPS + atomic clocks)  commit timestamps of 2PL-SI transactions • Paxos-replication per Shard F1 J. Shute, et al. "F1: A distributed SQL database that scales.“, VLDB 2013 Idea: • Full SQL relational database built on Spanner, powers AdWords Implementation: • 5-way replication • Data Model: relational + hierarchy (customercampaignAdGroup) • Transactions: Snapshot-Read-Only, Pessimistic (Spanner), optimistic • Distributed SQL Engine Good: • Transactions (though limited) • Good scalability of data volume Bad: • Entity Groups hard to define • Bad Scale-Out for write and reads (Kossmann et al.)  no advantage over RDBMS for small data volumes A Spanner/F1 based DBaaS?
  • 196.  Idea: Blobs with RESTful CRUD Interface  Distribution: 𝐵𝑢𝑐𝑘𝑒𝑡, 𝐾𝑒𝑦 → 𝑆𝑒𝑟𝑣𝑒𝑟 + 𝑅𝑒𝑝𝑙𝑖𝑐𝑎𝑠  CDN Integration (Azure CDN, Cloudfront)  S3: > 2 Trillion (2 ⋅ 1012) objects Azure Blobs, Amazon S3, Google Cloud Storage Container Blobcontains Block 1 (up to 4 MB) Block 2 Block n REST-Request Azure Blobs: Block Blob: 4MB blocks  large files Page Blob: 512B blocks  random IO
  • 197.  Automatic Versioning  Pricing: per request (10.000 ~ 1c) and storage (1GB/month ~ 3c) and network (1GB ~ 10c) Azure Blobs, Amazon S3, Google Cloud Storage S3 Blob DELETE /puppy.jpg HTTP/1.1 Host: mybucket.s3.amazonaws.com Authorization: AWS AKIAIO... AWS Ireland DC Replicas Reduced Redundancy: only 1 replica Glacier: tape-disk archivalAmazon S3:
  • 198.  Automatic Versioning  Pricing: per request (10.000 ~ 1c) and storage (1GB/month ~ 3c) and network (1GB ~ 10c) Azure Blobs, Amazon S3, Google Cloud Storage S3 Blob DELETE /puppy.jpg HTTP/1.1 Host: mybucket.s3.amazonaws.com Authorization: AWS AKIAIO... AWS Ireland DC Replicas Reduced Redundancy: only 1 replica Glacier: tape-disk archivalAmazon S3: S3 Consistency D. Bermbach,, S. Tai. "Eventual consistency: How soon is eventual?”, MW4SOC 11 Findings: • Inconsistency window varies from 2-11 seconds • Monotonic Read Consistency is often violated
  • 199.  Automatic Versioning  Pricing: per request (10.000 ~ 1c) and storage (1GB/month ~ 3c) and network (1GB ~ 10c) Azure Blobs, Amazon S3, Google Cloud Storage S3 Blob DELETE /puppy.jpg HTTP/1.1 Host: mybucket.s3.amazonaws.com Authorization: AWS AKIAIO... AWS Ireland DC Replicas Reduced Redundancy: only 1 replica Glacier: tape-disk archivalAmazon S3: S3 Consistency D. Bermbach,, S. Tai. "Eventual consistency: How soon is eventual?”, MW4SOC 11 Findings: • Inconsistency window varies from 2-11 seconds • Monotonic Read Consistency is often violated Building a database on S3 M. Brantner, et al. "Building a database on S3." Sigmod 2008 Idea: • Use S3 as the persistent storage of a database Implementation: • Buffer Pool and Log Manager on S3 • No transaction or query support
  • 200.  Founded June, 2011  Acquired by Facebook April, 2013  Pricing: ◦ Free ◦ Pro (199$) ◦ Enterprise Parse - MBaaS Parse Model: Backend-aas Pricing: Plan-based Underlying DB: Mainly MongoDB API: SDKs, REST
  • 201.  Founded June, 2011  Acquired by Facebook April, 2013  Pricing: ◦ Free ◦ Pro (199$) ◦ Enterprise Parse - MBaaS Parse Model: Backend-aas Pricing: Plan-based Underlying DB: Mainly MongoDB API: SDKs, REST Parse Core
  • 202.  Founded June, 2011  Acquired by Facebook April, 2013  Pricing: ◦ Free ◦ Pro (199$) ◦ Enterprise Parse - MBaaS Parse Model: Backend-aas Pricing: Plan-based Underlying DB: Mainly MongoDB API: SDKs, REST Parse Core Parse Analytics
  • 203.  Founded June, 2011  Acquired by Facebook April, 2013  Pricing: ◦ Free ◦ Pro (199$) ◦ Enterprise Parse - MBaaS Parse Model: Backend-aas Pricing: Plan-based Underlying DB: Mainly MongoDB API: SDKs, REST Parse Core Parse Analytics Parse Push
  • 205. Authentication User + Password OAuth: Facebook, Twitter Parse - MBaaS
  • 206. Authentication User + Password OAuth: Facebook, Twitter Parse - MBaaS Query Cinemas (new Parse.Query('Cinemas')) .withinKilometers(...) .fetch()
  • 207. Authentication User + Password OAuth: Facebook, Twitter Parse - MBaaS Query Cinemas (new Parse.Query('Cinemas')) .withinKilometers(...) .fetch() Query Movies (new Parse.Query('Movies')) .greaterThan('startAt‘, now) .notEqualTo('cinemas', cId) .fetch()
  • 208. Meteor Meteor Model: Backend-aaS Pricing: No yet revealed Underlying DB: MongoDB API: WebSockets  Idea: Full-Stack JavaScript with Node.js, MongoDB and WebSockets Web Browser Node.js + MongoDB (Single Server) WebSocket
  • 209. Meteor Meteor Model: Backend-aaS Pricing: No yet revealed Underlying DB: MongoDB API: WebSockets  Idea: Full-Stack JavaScript with Node.js, MongoDB and WebSockets <div class="player {{selected}}"> <span class="name">{{name}}</span> <span class="score">{{score}}</span> </div> Web Browser Node.js + MongoDB (Single Server) WebSocket
  • 210. Meteor Meteor Model: Backend-aaS Pricing: No yet revealed Underlying DB: MongoDB API: WebSockets  Idea: Full-Stack JavaScript with Node.js, MongoDB and WebSockets Players = new Meteor. Collection("players"); if (Meteor.isServer) { //… <div class="player {{selected}}"> <span class="name">{{name}}</span> <span class="score">{{score}}</span> </div> Web Browser Node.js + MongoDB (Single Server) WebSocket
  • 211. Meteor Meteor Model: Backend-aaS Pricing: No yet revealed Underlying DB: MongoDB API: WebSockets  Idea: Full-Stack JavaScript with Node.js, MongoDB and WebSockets Players = new Meteor. Collection("players"); if (Meteor.isServer) { //… if (Meteor.isClient) { Template.leaderboard.players = function () { return Players.find({}, {sort: {score: -1, name: 1}}); }; <div class="player {{selected}}"> <span class="name">{{name}}</span> <span class="score">{{score}}</span> </div> Web Browser Node.js + MongoDB (Single Server) WebSocket
  • 212. Meteor Meteor Model: Backend-aaS Pricing: No yet revealed Underlying DB: MongoDB API: WebSockets  Idea: Full-Stack JavaScript with Node.js, MongoDB and WebSockets Players = new Meteor. Collection("players"); if (Meteor.isServer) { //… if (Meteor.isClient) { Template.leaderboard.players = function () { return Players.find({}, {sort: {score: -1, name: 1}}); }; <div class="player {{selected}}"> <span class="name">{{name}}</span> <span class="score">{{score}}</span> </div> Web Browser Node.js + MongoDB (Single Server) WebSocket $ meteor deploy abc.meteor.com
  • 213. Meteor Meteor Model: Backend-aaS Pricing: No yet revealed Underlying DB: MongoDB API: WebSockets  Idea: Full-Stack JavaScript with Node.js, MongoDB and WebSockets Players = new Meteor. Collection("players"); if (Meteor.isServer) { //… if (Meteor.isClient) { Template.leaderboard.players = function () { return Players.find({}, {sort: {score: -1, name: 1}}); }; <div class="player {{selected}}"> <span class="name">{{name}}</span> <span class="score">{{score}}</span> </div> Web Browser Node.js + MongoDB (Single Server) WebSocket $ meteor deploy abc.meteor.com Very productive for very small projects Fundamentally limited scalability: • Server tails Mongo‘s oplog • And holds the fetched data state of every client
  • 214. Outline • Hot Topics • Orestes: a scalable, low- latency architecture • Baqend: putting it into practice What are Cloud Databases? Cloud Databases in the wild Research Perspectives Wrap-up and literature
  • 215.
  • 216.  Example: CryptDB  Idea: Only decrypt as much as neccessary Encrypted Databases: Research RDBMS SQL-Proxy Encrypts and decrypts, rewrites queries
  • 217.  Example: CryptDB  Idea: Only decrypt as much as neccessary Encrypted Databases: Research RDBMS SQL-Proxy Encrypts and decrypts, rewrites queries
  • 218.  Example: CryptDB  Idea: Only decrypt as much as neccessary Encrypted Databases: Research RDBMS SQL-Proxy Encrypts and decrypts, rewrites queries Relational Cloud C. Curino, et al. "Relational cloud: A database-as-a-service for the cloud.“, CIDR 2011 DBaaS Architecture: • Encrypted with CryptDB • Multi-Tenancy through live migration • Workload-aware partitioning (graph-based)
  • 219.  Example: CryptDB  Idea: Only decrypt as much as neccessary Encrypted Databases: Research RDBMS SQL-Proxy Encrypts and decrypts, rewrites queries Relational Cloud C. Curino, et al. "Relational cloud: A database-as-a-service for the cloud.“, CIDR 2011 DBaaS Architecture: • Encrypted with CryptDB • Multi-Tenancy through live migration • Workload-aware partitioning (graph-based) • Early approach • Not adopted in practice, yet Dream solution: Full Homorphic Encryption
  • 220. Transactions/Consistency: Research Dynamo Eventual None 1 RT - Yahoo PNuts Timeline per key Single Key 1 RT possible COPS Causality Multi-Record 1 RT possible MySQL (async) Serializable Static Partition 1 RT possible Megastore Serializable Static Partition 2 RT - Spanner/F1 Snapshot Isolation Partition 2 RT - MDCC Read-Commited Multi-Record 1 RT - Consistency Transactional Unit Commit Latency Data Loss?
  • 221. Transactions/Consistency: Research Dynamo Eventual None 1 RT - Yahoo PNuts Timeline per key Single Key 1 RT possible COPS Causality Multi-Record 1 RT possible MySQL (async) Serializable Static Partition 1 RT possible Megastore Serializable Static Partition 2 RT - Spanner/F1 Snapshot Isolation Partition 2 RT - MDCC Read-Commited Multi-Record 1 RT - Consistency Transactional Unit Commit Latency Data Loss? Multi-Data Center Consistency T. Kraska et al. "MDCC: Multi-data center consistency." EuroSys, 2013. Idea: • Multi-Data center commit protocol with single round-trip Implementation: • Optimistic Commit Protocol • Fast, Generalized Multi-Paxos Result: almost as fast as Dynamo-style
  • 222. Transactions/Consistency: Research Dynamo Eventual None 1 RT - Yahoo PNuts Timeline per key Single Key 1 RT possible COPS Causality Multi-Record 1 RT possible MySQL (async) Serializable Static Partition 1 RT possible Megastore Serializable Static Partition 2 RT - Spanner/F1 Snapshot Isolation Partition 2 RT - MDCC Read-Commited Multi-Record 1 RT - Consistency Transactional Unit Commit Latency Data Loss? Multi-Data Center Consistency T. Kraska et al. "MDCC: Multi-data center consistency." EuroSys, 2013. Idea: • Multi-Data center commit protocol with single round-trip Implementation: • Optimistic Commit Protocol • Fast, Generalized Multi-Paxos Result: almost as fast as Dynamo-style Currently no NoSQL DB implements consistent Multi-DC replication
  • 223.  YCSB (Yahoo Cloud Serving Benchmark) Benchmarking: Research Data Store
  • 224.  YCSB (Yahoo Cloud Serving Benchmark) Benchmarking: Research Client WorkloadGenerator PluggableDBinterface Data Store Threads Stats
  • 225.  YCSB (Yahoo Cloud Serving Benchmark) Benchmarking: Research Client WorkloadGenerator PluggableDBinterface Workload: 1. Operation Mix 2. Record Size 3. Popularity Distribution Runtime Parameters: DB host name, threads, etc. Data Store Threads Stats
  • 226.  YCSB (Yahoo Cloud Serving Benchmark) Benchmarking: Research Client WorkloadGenerator PluggableDBinterface Workload: 1. Operation Mix 2. Record Size 3. Popularity Distribution Runtime Parameters: DB host name, threads, etc. Read() Insert() Update() Delete() Scan() Data Store Threads Stats DB protocol
  • 227.  YCSB (Yahoo Cloud Serving Benchmark) Benchmarking: Research Client WorkloadGenerator PluggableDBinterface Workload: 1. Operation Mix 2. Record Size 3. Popularity Distribution Runtime Parameters: DB host name, threads, etc. Read() Insert() Update() Delete() Scan() Data Store Threads Stats DB protocol Workload Operation Mix Distribution Example A – Update Heavy Read: 50% Update: 50% Zipfian Session Store B – Read Heavy Read: 95% Update: 5% Zipfian Photo Tagging C – Read Only Read: 100% Zipfian User Profile Cache D – Read Latest Read: 95% Insert: 5% Latest User Status Updates E – Short Ranges Scan: 95% Insert: 5% Zipfian/ Uniform Threaded Conversations
  • 228.  Example Result (Read Heavy): Benchmarking: Research
  • 229.  Example Result (Read Heavy): Benchmarking: Research Weaknesses: • Single client can be a bottleneck • No consistency & availability measurement
  • 230.  Example Result (Read Heavy): Benchmarking: Research YCSB++ S. Patil, M. Polte, et al.„Ycsb++: benchmarking and performance debugging advanced features in scalable table stores“, SOCC 2011 • Clients coordinate through Zookeeper • Simple Read-After-Write Checks • Evaluation: Hbase & Accumulo Weaknesses: • Single client can be a bottleneck • No consistency & availability measurement
  • 231.  Example Result (Read Heavy): Benchmarking: Research YCSB++ S. Patil, M. Polte, et al.„Ycsb++: benchmarking and performance debugging advanced features in scalable table stores“, SOCC 2011 • Clients coordinate through Zookeeper • Simple Read-After-Write Checks • Evaluation: Hbase & Accumulo Weaknesses: • Single client can be a bottleneck • No consistency & availability measurement • No Transaction Support YCSB+T A. Dey et al. “YCSB+T: Benchmarking Web-Scale Transactional Databases”, CloudDB 2014 • New workload: Transactional Bank Account • Simple anomaly detection for Lost Updates • No comparison of systems
  • 232.  Example Result (Read Heavy): Benchmarking: Research YCSB++ S. Patil, M. Polte, et al.„Ycsb++: benchmarking and performance debugging advanced features in scalable table stores“, SOCC 2011 • Clients coordinate through Zookeeper • Simple Read-After-Write Checks • Evaluation: Hbase & Accumulo Weaknesses: • Single client can be a bottleneck • No consistency & availability measurement • No Transaction Support YCSB+T A. Dey et al. “YCSB+T: Benchmarking Web-Scale Transactional Databases”, CloudDB 2014 • New workload: Transactional Bank Account • Simple anomaly detection for Lost Updates • No comparison of systems No specific application CloudStone, CARE, TPC extensions?
  • 234. Benchmarking: state of the art Tests latency and throughput of IaaS-Providers, CDNs and Object Stores
  • 235. Benchmarking: state of the art Tests latency and throughput of IaaS-Providers, CDNs and Object Stores Does not test: Cloud Databases
  • 238. Vision YCSB Harmony YCSB Client YCSB Client YCSB Client YCSB Client Idea: 1. Periodically launch clients 2. Benchmark Cloud DB 3. Publish results
  • 239. Vision YCSB Harmony System Availablity Reads/s Writes/s Avg. Latency 95th perc. Latency Plot DynamoDB 99.9% 23411 34534 3.2 ms 9 ms RDS 99.8% 2342 2455 30 ms 80 ms Azure Table 99.5% 22343 23442 12 ms 20 ms Google DataStore 99.5% 3000 2000 30 ms 300 ms
  • 240. Database research at the University of Hamburg
  • 245. Motivation ClientApplicationDatabase Web Server Web Server Average (2014): 90 HTTP Requests per page load High Latency Classic 3-Tier-ArchitectureThree-Tier Architecture
  • 246. Motivation ClientApplicationDatabase Web Server Web Server Average (2014): 90 HTTP Requests per page load High Latency Classic 3-Tier-ArchitectureThree-Tier ArchitectureWith every 100ms of additional page load time, revenue decreases by 1%. Study by Amazon
  • 247. Motivation ClientApplicationDatabase Web Server Web Server Average (2014): 90 HTTP Requests per page load High Latency Classic 3-Tier-ArchitectureThree-Tier ArchitectureWith every 100ms of additional page load time, revenue decreases by 1%. Study by Amazon When increasing load time of search results by 500ms, traffic decreases by 20%. Study by Google
  • 248. Motivation ClientApplicationDatabase Web Server Web Server Average (2014): 90 HTTP Requests per page load High Latency Classic 3-Tier-ArchitectureThree-Tier ArchitectureWith every 100ms of additional page load time, revenue decreases by 1%. Study by Amazon When increasing load time of search results by 500ms, traffic decreases by 20%. Study by Google
  • 249. SPAs Rich ClientApplicationDatabase Web Server Web Server Single-Page Applications High Latency (Rich Client, Smart Client) Data (e.g. JSON) (AngularJS, Backbone, Ember.JS, etc.)
  • 251. ORESTES Rich ClientOrestesDB-Cluster REST Server Web- Caches Low Latency Cloud REST Server Cacheable Database Objects
  • 252. ORESTES Rich ClientOrestesDB-Cluster REST Server Web- Caches Low Latency Scalable NoSQL DBs Cloud REST Server Cacheable Database Objects
  • 253. ORESTES Rich ClientOrestesDB-Cluster REST Server Web- Caches Low Latency Scalable NoSQL DBs Cloud Exposes the DB via REST API and handles: • Scaling • Cache Consistency • Transactions • Schema REST Server Cacheable Database Objects
  • 254.  Middleware: ◦ Scalable REST API for aggregate-oriented persistence, queries, schema management and transactions.  Caching: ◦ Consistent web caching of database objects for read scalability and latency reduction.  Transactions: ◦ Optimistic cache-aware transaction model. ORESTES: Components
  • 256. Java/JDO persist find createQuery JavaScript/JPA Port REST/HTTP API others Application Server Browser or Mobile Device Application Layer Persistence API Data Store HTTP Server Trans- actions Querys Object Persist. Schema Key- Value Doc- uments DBaaS & BaaS Layer Transaction Validation HTTP Server Config- uration Partial Updates Access Control Multi-Tenancy Schema Management Workload Management Cache Coherence Autoscaling Database- independent Concerns Database - specific Wrappers SLAs HTTP Server
  • 257. Java/JDO persist find createQuery JavaScript/JPA Port REST/HTTP API others Application Server Browser or Mobile Device Application Layer Persistence API Data Store ISP Forward-Proxy Caches ISP Caches Reverse-Proxy Caches and Load Balancers CDN Caches Content Delivery Networks Purge Scale HTTP Server Trans- actions Querys Object Persist. Schema Key- Value Doc- uments DBaaS & BaaS Layer Transaction Validation HTTP Server Config- uration Partial Updates Access Control Multi-Tenancy Schema Management Workload Management Cache Coherence Autoscaling Database- independent Concerns Database - specific Wrappers SLAs HTTP Server
  • 258. Java/JDO persist find createQuery JavaScript/JPA Port REST/HTTP API others Application Server Browser or Mobile Device Application Layer Persistence API Data Store ISP Forward-Proxy Caches ISP Caches Reverse-Proxy Caches and Load Balancers CDN Caches Content Delivery Networks Purge Scale HTTP Server Trans- actions Querys Object Persist. Schema Key- Value Doc- uments DBaaS & BaaS Layer Transaction Validation HTTP Server Config- uration Partial Updates Access Control Multi-Tenancy Schema Management Workload Management Cache Coherence Autoscaling Database- independent Concerns Database - specific Wrappers SLAs HTTP Server Redis (Replicated) 10201040 10101010Counting Bloom Filter add delete Node.JS (local to Server) Stored Procedures Custom Validation
  • 259. Java/JDO persist find createQuery JavaScript/JPA Port REST/HTTP API others Application Server Browser or Mobile Device Application Layer Persistence API Data Store ISP Forward-Proxy Caches ISP Caches Reverse-Proxy Caches and Load Balancers CDN Caches Content Delivery Networks Purge Scale HTTP Server Trans- actions Querys Object Persist. Schema Key- Value Doc- uments DBaaS & BaaS Layer Transaction Validation HTTP Server Config- uration Partial Updates Access Control Multi-Tenancy Schema Management Workload Management Cache Coherence Autoscaling Database- independent Concerns Database - specific Wrappers SLAs HTTP Server Redis (Replicated) 10201040 10101010Counting Bloom Filter add delete Node.JS (local to Server) Stored Procedures Custom Validation GET /db/{bucket}/{class}/{id} 200 OK Cache-Control: public, max-age=6000 ETag: "3" JSON Object
  • 267. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020
  • 268. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020 How to prevent stale reads (inconsistency)?
  • 269. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020
  • 270. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020
  • 271. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020
  • 272. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020 purge(obj) hashB(oid)hashA(oid) 13
  • 273. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020 purge(obj) 131 1 110 Flat(Counting Bloomfilter)
  • 274. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020 purge(obj) 131 1 110 hashB(oid)hashA(oid)
  • 275. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020 purge(obj) 131 1 110 hashB(oid)hashA(oid)
  • 276. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020 purge(obj) 131 1 110
  • 277. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020 hashB(oid)hashA(oid) 1 1 110
  • 278. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020 hashB(oid)hashA(oid) 1 1 110 𝑓 ≈ 1 − 𝑒− 𝑘𝑛 𝑚 𝑘 𝑘 = ln 2 ⋅ ( 𝑛 𝑚 ) False-Positive Rate: Hash- Functions: With 10.000 Updates per 10 minutes and 1% error rate: 12 KByte
  • 279. App App BFB: Bloom Filter Bounded Staleness Cache 1 4 020 hashB(oid)hashA(oid) 1 1 110
  • 280. SCOT: Scalable Cache-Aware Optimistic Transactions Cache Cache Cache REST-Server REST-Server REST-Server DB Coordinator Client
  • 281. SCOT: Scalable Cache-Aware Optimistic Transactions Cache Cache Cache REST-Server REST-Server REST-Server DB Coordinator Client Begin Transaction Bloom Filter 1
  • 282. SCOT: Scalable Cache-Aware Optimistic Transactions Cache Cache Cache REST-Server REST-Server REST-Server DB Coordinator Client Begin Transaction Bloom Filter 1 Reads Writes 2 Writes (Hidden)
  • 283. SCOT: Scalable Cache-Aware Optimistic Transactions Cache Cache Cache REST-Server REST-Server REST-Server DB Coordinator Client Begin Transaction Bloom Filter 1 Reads Writes 2 Commit: read- & write-set versions Committed OR aborted + stale objects 3 Writes (Hidden)
  • 284. SCOT: Scalable Cache-Aware Optimistic Transactions Cache Cache Cache REST-Server REST-Server REST-Server DB Coordinator Client Begin Transaction Bloom Filter 1 Reads Writes 2 Commit: read- & write-set versions Committed OR aborted + stale objects 3 Writes (Hidden) validation 4 E.g. Redis or ZooKeeper prevent conflicting validations
  • 285. SCOT: Scalable Cache-Aware Optimistic Transactions Cache Cache Cache REST-Server REST-Server REST-Server DB Coordinator Client Begin Transaction Bloom Filter 1 Reads Writes 2 Commit: read- & write-set versions Committed OR aborted + stale objects 3 Writes (Hidden) validation 4 E.g. Redis or ZooKeeper 5Writes (Public) Read all prevent conflicting validations
  • 286. SCOT: Scalable Cache-Aware Optimistic Transactions Cache Cache Cache REST-Server REST-Server REST-Server DB Coordinator Client Begin Transaction Bloom Filter 1 Reads Writes 2 Commit: read- & write-set versions Committed OR aborted + stale objects 3 Writes (Hidden) validation 4 E.g. Redis or ZooKeeper 5Writes (Public) Read all prevent conflicting validations Caching → Shorter transaction duration → less aborts
  • 287. Polyglot Persistence application Orestes servers REST/HTTP protocol Redis MongoDB db4o meta data contains SLA parse SLA & route data manage materialisation resolve mapping Polyglot Persistence Mediator
  • 288. Polyglot Persistence application Orestes servers REST/HTTP protocol Redis MongoDB db4o meta data contains SLA parse SLA & route data manage materialisation resolve mapping Polyglot Persistence Mediator
  • 289. Polyglot Persistence application Orestes servers REST/HTTP protocol Redis MongoDB db4o meta data contains SLA parse SLA & route data manage materialisation resolve mapping Polyglot Persistence Mediator
  • 290. Polyglot Persistence application Orestes servers REST/HTTP protocol Redis MongoDB db4o meta data contains SLA parse SLA & route data manage materialisation resolve mapping Polyglot Persistence Mediator Results: Article-Objects with Impression Count Article ID Title … Imp. Imp. ID MongoDB Redis Sorted Set Speedup with PPM: • 50-1000% • 66% performance of Varnish
  • 291. Cloud Evaluation of ORESTES Client Machine 50 ... Web Cache Orestes Server Versant DB Amazon EC2 Ireland EC2 USA165 ms Client Machine Client Machine
  • 292. Cloud Evaluation of ORESTES Client Machine 50 ... Web Cache Orestes Server Versant DB Amazon EC2 Ireland EC2 USA165 ms Client Machine Client Machine 30 000 Objekte 500 Anfragen/ Client 30 000 Objects 500 Req./Clients 10/1 Read/Write
  • 293. Cloud Evaluation of ORESTES Client Machine 50 ... Web Cache Orestes Server Versant DB Amazon EC2 Ireland EC2 USA165 ms Client Machine Client Machine
  • 294.  Stateless Scale-out REST middleware for DBaaS  Generic Schema, Authentication, Multi-Tenancy, etc.  SCOT (Scalable Optimistic Cache-Aware Transactions): ◦ Optimistic Database-indepdent ACID transactions  BFB (Bloom Filter Bounded Staleness): ◦ Allows static caching with tunable consistency guarantee  PPM (Polyglot Persistence Mediator): ◦ SLAs  appropriate database Orestes: Summary
  • 296.  Orestes as a startup Baqend
  • 297.  Orestes as a startup Baqend Internet Seoxy REST-API Transactions Schema Management Cache Consistency Auto-Scaling Multi-Tenancy Security and Access Control Provisioning
  • 298.  Orestes as a startup Baqend
  • 299.  Orestes as a startup Baqend
  • 300.  Orestes as a startup Baqend
  • 301.  Orestes as a startup Baqend
  • 302.  Orestes as a startup Baqend
  • 304. Baqend in Action GET /app.html
  • 305. Baqend in Action GET /app.html db.find(Menu, 'main') .done(...); db.find(Page, 'hero') .done(...); db.query(Page, 'top3') .done(...);
  • 306. Baqend in Action GET /app.html db.find(Menu, 'main') .done(...); db.find(Page, 'hero') .done(...); db.query(Page, 'top3') .done(...); GET /img/pic005.jpg GET /img/pic017.jpg GET /img/pic022.jpg
  • 308. Wrap-up & book recommendations
  • 309. How to choose a cloud database: Wrap-up Managed RDBMS Managed DWH Managed NoSQL DB Backend-as- a-Service Proprietary Serivce Object Store
  • 310. How to choose a cloud database: Wrap-up Define your functional requirements Define your non-functional requirements Managed RDBMS Managed DWH Managed NoSQL DB Backend-as- a-Service Proprietary Serivce Object Store
  • 311. How to choose a cloud database: Wrap-up Define your functional requirements Define your non-functional requirements Managed RDBMS Managed DWH Managed NoSQL DB Backend-as- a-Service Proprietary Serivce Object Store 1. Underyling DB 2. Docs & books 3. Your own tests Evaluate by:
  • 312. How to choose a cloud database: Wrap-up Define your functional requirements Define your non-functional requirements Managed RDBMS Managed DWH Managed NoSQL DB Backend-as- a-Service Proprietary Serivce Object Store 1. Underyling DB 2. Docs & books 3. Your own tests 4. SLAs 5. Docs & books 6. Experience Reports Evaluate by:
  • 313. How to choose a cloud database: Wrap-up Define your functional requirements Define your non-functional requirements Managed RDBMS Managed DWH Managed NoSQL DB Backend-as- a-Service Proprietary Serivce Object Store 1. Underyling DB 2. Docs & books 3. Your own tests 4. SLAs 5. Docs & books 6. Experience Reports Evaluate by: Try it
  • 315. Book recommendations • (Non-scientific) literature is rare • Some books cover specific DBaaS and cloud platforms
  • 317. VLDB (Very Large Databases) SIGMOD (Special Interest Group on Management of Data) ICDE (International Conference on Data Engineering) CIDR (Conference on Innovative Data Systems Research) SOCC (Symposium on Cloud Computing) OSDI/SOSP (Operating Systems Design and Implementation/ Symposium on Operating System Principles) EuroSys Top Scientific Conferences Database Research Distributed Systems Research
  • 318. VLDB (Very Large Databases) SIGMOD (Special Interest Group on Management of Data) ICDE (International Conference on Data Engineering) CIDR (Conference on Innovative Data Systems Research) SOCC (Symposium on Cloud Computing) OSDI/SOSP (Operating Systems Design and Implementation/ Symposium on Operating System Principles) EuroSys Top Scientific Conferences Database Research Distributed Systems Research This year probably in Washington D.C. Learn more: scdm2013.com