Amazon SimpleDB

Sean Collins

www.coreitpro.com
contact@coreitpro.com

Tale of Two Cities

• Relational
• “Non-Relational”

Tale of Two Cities

• Relational

Relational Model
[Slide image: first page of E. F. Codd, “A Relational Model of Data for Large Shared Data Banks,” IBM Research Laboratory, San Jose, California. Communications of the ACM, Volume 13, Number 6, June 1970, p. 377.]

• A Relational Model of Data for Large
Shared Data Banks

• E. F. Codd
• IBM Research Laboratory, San Jose,
California
• CACM June 1970

• Data as Relations
• “In many commercial,
governmental, and scientific data
banks ... some of the relations are of
quite high degree... Accordingly, we
propose that users deal, not with
relations which are domain-ordered,
but with relationships”

Relationships

• Customer to Order
• Order to Items
• And So Forth
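A minimal sketch of the Customer → Order → Items relationships using Python's built-in sqlite3 module. The table and column names (customer, orders, order_item) are hypothetical. The point is that the relationships are stored as matching key values and reassembled at query time with JOINs.

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id)
);
CREATE TABLE order_item (
    order_id INTEGER NOT NULL REFERENCES orders(id),
    sku TEXT NOT NULL,
    qty INTEGER NOT NULL
);
INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1), (11, 2);
INSERT INTO order_item VALUES (10, 'WIDGET', 2), (10, 'GADGET', 1), (11, 'WIDGET', 5);
""")

# Follow customer -> orders -> order_item by joining on the key columns.
rows = conn.execute("""
    SELECT c.name, o.id, i.sku, i.qty
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.id
    JOIN order_item AS i ON i.order_id = o.id
    ORDER BY c.name, i.sku
""").fetchall()

for row in rows:
    print(row)
```

The "No JOINS" point made later in the deck is about avoiding exactly this query-time reassembly.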

Relational
• Provides an SQL interface to developers
• ACID
• Atomicity
• Consistency
• Isolation
• Durability
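To make atomicity and consistency concrete, here is a small sketch, again with sqlite3. The hypothetical account table has a CHECK constraint that forbids negative balances. The transfer function runs two updates inside one transaction, so they either both commit or both roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the second update; the first was rolled back.
        return False
    return True

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so nothing changes
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```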

Tale of Two Cities

• “Non-Relational”

CAP Theorem

• Consistency
• Availability
• Partition-tolerance

Non-Relational

• Less structured
• “Schema-less”
• Key-value storage
• Implement only parts of ACID

WHY?

• Speed
• Flexibility
• Scale

Speed

• No JOINs
• No special column types
• Concurrent operations

Flexibility

• No table definition
• Store whatever you want
• Wherever you want
• Adjust on the fly

Scalability

• Eventual consistency
• Writes propagate across nodes
• Propagation time is not constant

Amazon SimpleDB

• Amazon AWS
• “Structured Data” Storage
• Notable users include Netflix

SimpleDB Data Model

• Domain
• Item
• Name
• Attributes

SimpleDB Data Model

• All data stored as Strings
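A toy in-memory model of the Domain → Item → (Name, Attributes) hierarchy may help fix the ideas. The Domain class and its methods are hypothetical, not boto's API. The sketch captures three properties: values must be strings, an attribute can hold several values, and a `replace` set overwrites old values. That last behavior is loosely modeled on the `Attribute.N.Replace` flag in the PutAttributes request shown later in the deck.

```python
class Domain:
    """Toy in-memory model of a SimpleDB domain (hypothetical class, not boto's)."""

    def __init__(self, name):
        self.name = name
        self.items = {}  # item name -> {attribute name: set of string values}

    def put_attributes(self, item_name, pairs, replace=()):
        """Add (attribute, value) pairs to an item, creating the item if needed.

        Attributes named in `replace` lose their old values first; for all
        other attributes the new value is added alongside existing ones.
        """
        for attr, value in pairs:  # validate everything before changing anything
            if not isinstance(value, str):
                raise TypeError(f"{attr}: only strings can be stored, got {type(value).__name__}")
        item = self.items.setdefault(item_name, {})
        for attr in replace:
            item.pop(attr, None)
        for attr, value in pairs:
            item.setdefault(attr, set()).add(value)

    def get_attributes(self, item_name):
        """Return {attribute: sorted list of values} for one item."""
        return {attr: sorted(vals) for attr, vals in self.items.get(item_name, {}).items()}


d = Domain("MyDomain")
d.put_attributes("Item123", [("Color", "Blue"), ("Size", "Med"), ("Price", "0014.99")])
d.put_attributes("Item123", [("Color", "Red")])                        # Color now has two values
d.put_attributes("Item123", [("Price", "0012.99")], replace={"Price"})  # Price is overwritten
try:
    d.put_attributes("Item123", [("Weight", 1.5)])  # a float is rejected
except TypeError as exc:
    print("rejected:", exc)
print(d.get_attributes("Item123"))
```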

SimpleDB Features

Eventually Consistent Read     Consistent Read
Stale Reads Possible           No Stale Reads
Lowest read latency            Potential higher read latency
Highest read throughput        Potential lower read throughput
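The trade-off in this table can be simulated with a toy replicated store. The model is purely illustrative, not SimpleDB's internal design:
- A write becomes visible on the accepting node at once and on every other node after a random delay.
- An eventually consistent read asks a random replica and may get stale data.
- A "consistent" read simply returns the latest acknowledged write.

In the REST API the choice corresponds to the `ConsistentRead=true` parameter on GetAttributes and Select. ReplicatedStore and its methods are hypothetical names.

```python
import random

class ReplicatedStore:
    """Toy model of one attribute value replicated across several nodes.

    The node that accepts a write sees it immediately; every other node sees it
    after its own random delay, so propagation time is not constant.
    """

    def __init__(self, replicas=3, max_delay=5, seed=0):
        self.rng = random.Random(seed)
        self.replicas = replicas
        self.max_delay = max_delay
        self.now = 0
        self.writes = []  # (value, {node: time at which the write becomes visible})

    def put(self, value):
        visible_at = {n: self.now + self.rng.randint(1, self.max_delay) for n in range(self.replicas)}
        visible_at[0] = self.now  # node 0 accepted the write
        self.writes.append((value, visible_at))

    def tick(self, steps=1):
        self.now += steps

    def read(self, consistent=False):
        if not self.writes:
            return None
        if consistent:
            # Consistent read: reflect every write acknowledged before the read.
            return self.writes[-1][0]
        # Eventually consistent read: any replica may answer, possibly with stale data.
        node = self.rng.randrange(self.replicas)
        visible = [value for value, at in self.writes if at[node] <= self.now]
        return visible[-1] if visible else None


store = ReplicatedStore()
store.put("v1")
store.tick(10)                              # v1 has reached every replica by now
store.put("v2")
reads = [store.read() for _ in range(20)]   # may mix stale "v1" with "v2"
latest = store.read(consistent=True)        # always "v2"
store.tick(10)                              # v2 has now propagated everywhere
final_reads = {store.read() for _ in range(20)}
```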

SimpleDB Features

• Conditional Transactions
• PUT/DELETE
• At the Item Level
• Based on Item Attributes
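Conditional writes support optimistic concurrency. A writer states what it expects an attribute to be, and the write is applied only if that still holds. In the REST API the expectation is carried by the `Expected.1.Name`, `Expected.1.Value` and `Expected.1.Exists` parameters. The sketch below mimics those semantics on a plain dict for a single attribute. conditional_put and ConditionalCheckFailed are hypothetical names.

```python
class ConditionalCheckFailed(Exception):
    """Raised when the expected attribute state does not match (hypothetical name)."""

def conditional_put(items, item_name, updates, expected_name, expected_value=None, expected_exists=True):
    """Apply `updates` to one item only if the expectation on one attribute holds.

    expected_exists=True:  the attribute must currently equal expected_value.
    expected_exists=False: the attribute must currently be absent.
    """
    item = items.get(item_name, {})
    current = item.get(expected_name)
    ok = (current == expected_value) if expected_exists else (expected_name not in item)
    if not ok:
        raise ConditionalCheckFailed(f"{item_name}.{expected_name} is {current!r}")
    items.setdefault(item_name, {}).update(updates)


items = {}
# Create the item only if it has no "version" attribute yet.
conditional_put(items, "order42", {"status": "new", "version": "1"}, "version", expected_exists=False)
# Two writers both read version "1"; the first update wins...
conditional_put(items, "order42", {"status": "paid", "version": "2"}, "version", "1")
# ...and the second is rejected instead of silently overwriting it.
try:
    conditional_put(items, "order42", {"status": "cancelled", "version": "2"}, "version", "1")
except ConditionalCheckFailed as exc:
    print("rejected:", exc)
print(items["order42"])
```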

Using SimpleDB

• Operations are issued as HTTP GET
requests (REST)
• Responses are XML
• Supports an SQL-like syntax for
fetching items from the domain


• SELECT <specification> FROM <domain> WHERE
<condition>

• Specifications
• * (all attributes)
• itemName()
• count(*)
• Specific attributes
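A small helper that assembles the SELECT form above. The quoting rules it applies are taken to be those in the SimpleDB Select documentation:
- Domain and attribute names go in backticks, with embedded backticks doubled.
- Values go in single quotes, with embedded single quotes doubled.

build_select, quote_name and quote_value are hypothetical names. Only exact-match conditions joined with AND are covered.

```python
def quote_name(name):
    """Backtick-quote a domain or attribute name, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"

def quote_value(value):
    """Single-quote a value, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"

def build_select(spec, domain, conditions=None):
    """Build "SELECT spec FROM `domain` [WHERE `a` = 'v' AND ...]".

    `spec` is passed through as-is (e.g. "*", "itemName()", "count(*)");
    `conditions` maps attribute names to exact-match string values.
    """
    expr = f"SELECT {spec} FROM {quote_name(domain)}"
    if conditions:
        clauses = [f"{quote_name(attr)} = {quote_value(val)}" for attr, val in conditions.items()]
        expr += " WHERE " + " AND ".join(clauses)
    return expr


print(build_select("count(*)", "zinc_13"))
print(build_select("*", "MyDomain", {"Color": "Blue", "Maker": "O'Brien"}))
```

All values are strings, so any comparison other than equality in a real query is lexicographic.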

https://sdb.amazonaws.com/
?Action=PutAttributes
&Attribute.1.Name=Color
&Attribute.1.Value=Blue
&Attribute.2.Name=Size
&Attribute.2.Value=Med
&Attribute.3.Name=Price
&Attribute.3.Value=0014.99
&Attribute.3.Replace=true
&AWSAccessKeyId=[valid access key id]
&DomainName=MyDomain
&ItemName=Item123
&SignatureVersion=2
&SignatureMethod=HmacSHA256
&Timestamp=2010-01-25T15%3A03%3A05-07%3A00
&Version=2009-04-15
&Signature=[valid signature]
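The `[valid signature]` in this request comes from AWS Signature Version 2 with HmacSHA256. Below is a standard-library sketch of that step. It assumes the usual Signature Version 2 string-to-sign: HTTP verb, lowercase host, request path and the sorted, RFC 3986-encoded query string, each on its own line. The access key and secret are placeholders, and sign_v2 is a hypothetical helper name.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def sign_v2(params, secret_key, host="sdb.amazonaws.com", path="/", verb="GET"):
    """Return (canonical query string, string to sign, base64 HmacSHA256 signature)."""
    def enc(s):
        # RFC 3986: only A-Z a-z 0-9 - _ . ~ stay unescaped (":" becomes "%3A").
        return quote(str(s), safe="-_.~")

    # Parameters sorted by name in byte order, then joined as name=value pairs.
    canonical = "&".join(f"{enc(k)}={enc(v)}" for k, v in sorted(params.items()))
    string_to_sign = "\n".join([verb, host.lower(), path, canonical])
    digest = hmac.new(secret_key.encode("utf-8"), string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return canonical, string_to_sign, base64.b64encode(digest).decode("ascii")


params = {
    "Action": "PutAttributes",
    "DomainName": "MyDomain",
    "ItemName": "Item123",
    "Attribute.1.Name": "Color", "Attribute.1.Value": "Blue",
    "Attribute.2.Name": "Size", "Attribute.2.Value": "Med",
    "Attribute.3.Name": "Price", "Attribute.3.Value": "0014.99",
    "Attribute.3.Replace": "true",
    "AWSAccessKeyId": "EXAMPLEKEYID",  # placeholder, not a real key
    "SignatureVersion": "2",
    "SignatureMethod": "HmacSHA256",
    "Timestamp": "2010-01-25T15:03:05-07:00",
    "Version": "2009-04-15",
}
canonical, to_sign, signature = sign_v2(params, "EXAMPLE-SECRET")  # placeholder secret
url = "https://sdb.amazonaws.com/?" + canonical + "&Signature=" + quote(signature, safe="")
print(url)
```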

Case Study

• ZINC Database
• Commercially available compounds
• Virtual Screening
• Clean “Drug Like” (#13)
• Approx. 3,751,744 compounds

Data Model

• Item

• Name = ZINC_ID
• Attributes

• Molecular Weight
• Charge

• SMILES
• “Simplified molecular input line entry
specification”

Boto
• Provides a library for accessing
Amazon AWS services

• Encapsulates SimpleDB data in
Python objects

• Dictionaries
• Iterators
• etc.

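import boto

# boto 2: credentials come from the environment or the boto config file.
conn = boto.connect_sdb()
domain = conn.get_domain("zinc_13")

for item in domain.select("SELECT * FROM zinc_13"):
    print(item.name)      # item name, i.e. the ZINC_ID
    print(item.keys())    # attribute names
    print(item.values())  # attribute values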

Some Tips
• Aggregate your operations
• At most 25 items per BatchPutAttributes request
• Shard your data across Domains
• Handling Numerical Data
• Zero Padding
• Offsets for Negative Numbers
• Dates
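The tips above can be sketched in a few helpers:
- Batches of at most 25 items.
- A hash-based choice of domain.
- Zero padding plus an offset, so that string order matches numeric order even for negative values.
- ISO 8601 UTC timestamps, which sort chronologically as text.

The MD5 sharding scheme, the domain names and the ZINC-style item names are hypothetical. The offset and width must be chosen to cover the full range of values actually stored.

```python
import hashlib
from datetime import datetime, timedelta, timezone

def chunks(pairs, size=25):
    """Group (item name, attributes) pairs into lists of at most `size` items,
    one list per BatchPutAttributes call."""
    return [pairs[i:i + size] for i in range(0, len(pairs), size)]

def shard_for(item_name, domains):
    """Choose a domain by hashing the item name, so an item always maps to the same shard."""
    digest = hashlib.md5(item_name.encode("utf-8")).hexdigest()
    return domains[int(digest, 16) % len(domains)]

def encode_number(x, offset=0, width=10, decimals=2):
    """Zero-pad a number so lexicographic (string) order matches numeric order.

    `offset` is added first so negative values become non-negative; it must be
    larger than the magnitude of the most negative value stored. Values whose
    padded form exceeds `width` characters would break the ordering.
    """
    shifted = x + offset
    if shifted < 0:
        raise ValueError(f"offset {offset} is too small for {x}")
    return f"{shifted:0{width}.{decimals}f}"

def encode_date(dt):
    """Render an aware datetime as ISO 8601 in UTC, which sorts chronologically as text."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Usage with made-up values.
price = encode_number(14.99, width=7)  # same form as the Price value in the PutAttributes example

values = [-3, 1.5, 250, -0.5]
encoded = [encode_number(v, offset=1000) for v in values]
by_string = [v for _, v in sorted(zip(encoded, values))]  # the order a string comparison gives

stamp = encode_date(datetime(2010, 1, 25, 15, 3, 5, tzinfo=timezone(timedelta(hours=-7))))

pairs = [(f"ZINC{i:08d}", {"charge": encode_number(-1, offset=10, width=5, decimals=0)}) for i in range(60)]
batches = chunks(pairs)
shards = ["zinc_13_a", "zinc_13_b", "zinc_13_c"]  # hypothetical domain names
placement = {name: shard_for(name, shards) for name, _ in pairs}
print(price, by_string, stamp, [len(b) for b in batches])
```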

Advantages

• Faster development times
• (No) Administration
• No Hardware!
• Scale-as-you-go
• Pay-as-you-go

Pricing
• 1GB Free Storage
• $0.25/GB/mo Thereafter
• $0.10/GB Transfer In
• $0.15/GB Transfer Out
• 25 Machine Hours Free/month
• $0.14/hr Thereafter
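The arithmetic behind these prices, as a quick estimator using the rates listed on this slide (early 2010; current AWS pricing will differ). The free storage and machine-hour allowances are subtracted before the per-unit rates apply. The slide lists no free transfer allowance, so the sketch applies none. monthly_cost is a hypothetical helper.

```python
# Rates as listed on the slide; check current AWS pricing before relying on them.
FREE_STORAGE_GB = 1
STORAGE_PER_GB_MONTH = 0.25
TRANSFER_IN_PER_GB = 0.10
TRANSFER_OUT_PER_GB = 0.15
FREE_MACHINE_HOURS = 25
MACHINE_HOUR = 0.14

def monthly_cost(storage_gb, in_gb, out_gb, machine_hours):
    """Estimate one month's bill in dollars, rounded to cents."""
    storage = max(0.0, storage_gb - FREE_STORAGE_GB) * STORAGE_PER_GB_MONTH
    transfer = in_gb * TRANSFER_IN_PER_GB + out_gb * TRANSFER_OUT_PER_GB
    compute = max(0.0, machine_hours - FREE_MACHINE_HOURS) * MACHINE_HOUR
    return round(storage + transfer + compute, 2)


# 11 GB stored, 5 GB in, 2 GB out, 125 machine hours:
# 10 x 0.25 + (5 x 0.10 + 2 x 0.15) + 100 x 0.14 = 2.50 + 0.80 + 14.00
print(monthly_cost(11, 5, 2, 125))
```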

Limitations

• Fewer Features = More Work for the
Developer
• Dates
• Numerical Data
• Data Consistency

Limits

Limitations
Following is a table that describes current limits within Amazon SimpleDB.

Parameter                                    Restriction
Domain size                                  10 GB per domain
Domain size                                  1 billion attributes per domain
Domain name                                  3-255 characters (a-z, A-Z, 0-9, '_', '-', and '.')
Domains per account                          100
Attribute name-value pairs per item          256
Attribute name length                        1024 bytes
Attribute value length                       1024 bytes
Item name length                             1024 bytes
Attribute name, attribute value, and item    All UTF-8 characters that are valid in XML documents.
name allowed characters                      Control characters and any sequences that are not valid
                                             in XML are returned Base64-encoded. For more information,
                                             see Working with XML-Restricted Characters.
Attributes per PutAttributes operation       256
Attributes requested per Select operation    256
Items per BatchPutAttributes operation       25
Maximum items in Select response             2500
Maximum query execution time                 5 seconds
Maximum number of unique attributes per      20
Select expression
Maximum number of comparisons per            20
Select expression
Maximum response size for Select             1 MB
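Several of these limits can be checked on the client before a request is sent. The sketch below copies the numbers from the table. It measures lengths in UTF-8 bytes because the limits are stated in bytes. It checks only the pairs passed in, not pairs already stored on the item. check_domain_name and check_item are hypothetical helpers.

```python
import re

# Limits copied from the table above; they may have changed since.
MAX_PAIRS_PER_ITEM = 256
MAX_NAME_BYTES = 1024
MAX_VALUE_BYTES = 1024
DOMAIN_NAME_RE = re.compile(r"[a-zA-Z0-9_.-]{3,255}")

def check_domain_name(name):
    """True if the name uses only allowed characters and is 3-255 characters long."""
    return DOMAIN_NAME_RE.fullmatch(name) is not None

def check_item(item_name, pairs):
    """Return a list of limit violations for one item (an empty list means OK).

    `pairs` is a list of (attribute name, value) tuples.
    """
    problems = []
    if len(item_name.encode("utf-8")) > MAX_NAME_BYTES:
        problems.append(f"item name longer than {MAX_NAME_BYTES} bytes")
    if len(pairs) > MAX_PAIRS_PER_ITEM:
        problems.append(f"{len(pairs)} name-value pairs (max {MAX_PAIRS_PER_ITEM})")
    for name, value in pairs:
        if len(name.encode("utf-8")) > MAX_NAME_BYTES:
            problems.append(f"attribute name {name[:20]!r}... longer than {MAX_NAME_BYTES} bytes")
        if len(value.encode("utf-8")) > MAX_VALUE_BYTES:
            problems.append(f"value of {name!r} longer than {MAX_VALUE_BYTES} bytes")
    return problems


print(check_domain_name("zinc_13"), check_domain_name("my domain"))
print(check_item("ZINC00000001", [("smiles", "é" * 513)]))  # 513 two-byte characters = 1026 bytes
```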


Editorial
• NoSQL vs. SQL
• Coder vs. Architect
• Business Requirements
• Time vs. Features
• “The Nightmare Scenario”
• “Race to the Bottom”
• “Me Too Syndrome”

Editorial

• Relational Databases Need to Catch Up
• Meet/Exceed developer
expectations

• Netflix wouldn’t have fork-lifted ~1
billion rows out of Oracle “just for
fun”
