SlideShare una empresa de Scribd logo
1 de 38
Descargar para leer sin conexión
Luca Cabibbo
Luca Cabibbo
NoSQL Database Design
for Next-Generation Web
Applications
Joint work with Francesca Bugiotti,
Paolo Atzeni, and Riccardo Torlone
NoSQL Database Design 1
Luca Cabibbo
 NoSQL datastores are a new generation of distributed database
systems – they have been designed to manage large data sets
distributed over many servers
 a promise of NoSQL database technology is to support the
development of next-generation web applications
 in this context, we are interested in NoSQL database design
 we also present an abstract data model for NoSQL databases
NoSQL Database Design 2
- Introduction
Luca Cabibbo
 NoSQL database systems
 NoAM – an abstract data model for NoSQL databases
 NoSQL database design for next-generation web applications
 The NoAM approach to NoSQL database design
 overview
 aggregates and aggregate design
 aggregate partitioning
 a language for data representations
 implementation
 conclusion
 ONDM (Object-NoSQL Datastore Mapper)
 architecture
 conclusion
 A case study in NoSQL database design
NoSQL Database Design 3
- Outline
Luca Cabibbo
 NoSQL datastores are a new generation of distributed database
systems
 they have been designed to support the needs of an increasing
number of modern applications – for which traditional database
technology is unsatisfactory
 a main requirement for these systems is the ability to manage
large data sets distributed over many servers – whereas
relational DBMSs are not designed to be run on clusters
NoSQL Database Design 5
* NoSQL database systems
Luca Cabibbo
 The NoSQL landscape is characterized by a high heterogeneity
 http://nosql-database.org/ lists 150 non-relational databases
 they have different data models and different APIs to access
the data – as well as different consistency and durability
guarantees
 We focus here on three main categories of NoSQL databases
 key-value stores
 a database is a collection of key-value pairs
 document stores
 a database is a collection of documents
 extensible record stores
 data is organized as tables of extensible records
 these categories include more than 70 systems
NoSQL Database Design 7
NoSQL and heterogeneity
Luca Cabibbo
 In a key-value store, a database is a schema-less collection of
key-value pairs
 values are usually binary strings, opaque to the datastore –
even if some systems have interpreted values, such as
counters, lists, or hashes
 programmer-defined keys are either binary strings or structured
keys – in some systems, part of the key is used to control data
distribution
 simple data access operations – put, get, and delete – over an
individual key-value pair or a group of related key-value pairs
NoSQL Database Design 8
Key-value stores
Luca Cabibbo
Fare clic per modificare lo stile del
titolo
Luca Cabibbo NoSQL Database Design 9
Key-value stores: examples
key value
Player:mary
username : mary
firstName : Mary
lastName : Wilson
games : { … }
Player:rick
username : rick
firstName : Ricky
lastName : Doe
score : 42
games : { ... }
a hash value
key value
/Player/mary/- { “username” : “mary”, “firstName” : “Mary”, “lastName” : “Wilson”, “games” : […] }
/Player/rick/- { “username” : “rick”, “firstName” : “Ricky”, lastName : “Doe”, “score” : “42”, “games” : […] }
key value
/Player/mary/-/username mary
/Player/mary/-/firstName Mary
/Player/mary/-/lastName Wilson
/Player/mary/-/games [ … ]
Luca Cabibbo
 In a document store, a database is a set of documents
 each document has a complex value and an identifier
 documents are composed of fields, which are dynamically
defined for each document at runtime – each field can be a
scalar value, a list, or a document itself
 documents are organized in collections
 the structure of documents is not opaque to datastores – they
create indexes on documents and support content-based
querying
NoSQL Database Design 10
Document stores
Luca Cabibbo
Fare clic per modificare lo stile del
titolo
Luca Cabibbo NoSQL Database Design 11
Document stores: example
id document
mary
{
“_id” : “mary”,
“username” : “mary”,
“firstName” : “Mary”,
“lastName” : “Wilson”,
“games” : [ { “game” : “Game:2345”, “opponent” : “Player:rick” },
{ “game” : “Game:2611”, “opponent” : “Player:ann” } ]
}
rick
{
“_id” : “rick”,
“username” : “rick”,
“firstName” : “Ricky”,
“lastName” : “Doe”,
“score” : “42”,
“games” : [ { “game” : “Game:2345”, “opponent” : “Player:mary” },
{ “game” : “Game:7425”, “opponent” : “Player:ann” },
{ “game” : “Game:1241”, “opponent” : “Player:johnny” } ]
}
collection Player
Luca Cabibbo
 An extensible record (or column-family) store organizes data
around tables, records/rows, and columns
 a relaxation of the relational model, in which databases are
mostly schema-less – since each row can have its own set of
columns
 each table designates a primary key – which comprises the
only mandatory attributes of the table – in some systems, part
of the primary key is used to control data distribution
NoSQL Database Design 12
Extensible record stores
Luca Cabibbo
Fare clic per modificare lo stile del
titolo
Luca Cabibbo NoSQL Database Design 13
Extensible record stores: example
username firstName lastName score games[0] games[1] games[2] …
mary Mary Wilson {…} {…}
rick Ricky Doe 42 {…} {…} {…}
table Player
Luca Cabibbo
 The NoSQL landscape is characterized by a high heterogeneity
 however, “the availability of a high-level representation of the
data at hand, be it logical or conceptual, remains a fundamental
tool for developers and users, since it makes understanding,
managing, accessing, and integrating information sources
much easier, independently of the technologies used”
 To this end, we propose NoAM (NoSQL Abstract Model) – an
abstract and system independent data model for NoSQL
databases
 NoAM aims at exploiting the commonalities of their various data
models – but it also introduces abstractions to balance their
differences and variations
NoSQL Database Design 14
* NoAM – an abstract data model
for NoSQL databases
Luca Cabibbo
 Main commonalities of their various NoSQL data models
 NoSQL datastores share the common provision of having a
modeling data element that is a distribution, access and
manipulation unit (DAM unit)
 a data access unit
 more precisely, a maximal unit of consistency/atomic data
access and manipulation
 a unit of distribution
 each DAM unit is located on a single node of the cluster –
but in general different DAM units are distributed among
the nodes of the cluster
 in the various systems, a DAM unit can be
 a record/row – a document – a group of key-value pairs
sharing part of the key
 in NoAM, a DAM unit is called a block
NoSQL Database Design 15
The NoAM abstract data model
Luca Cabibbo
 Main commonalities of their various NoSQL data models
 NoSQL datastores also offer the ability to access just some
parts of a DAM unit – they have a modeling data element that is
a “smaller” data access unit (SDA unit)
 in the various systems, an SDA unit can be
 a column – a field – an individual key-value pair
 in NoAM, a SDA unit is called an entry
 moreover, many datastores provide a notion of collection of
data access units
 in the various systems, a collection can be
 a table – a document collection
 in NoAM, a collection of DAM units (blocks) is called a
collection
NoSQL Database Design 16
The NoAM abstract data model
Luca Cabibbo
 The NoAM abstract data model
 a database is a set of collections – each collection has a
distinct name
 a collection is a set of blocks – each block is identified in its
collection by a block key
 a block is a non-empty set of entries
 each entry is a pair (ek,ev)
 ek is the entry key – unique within its block
 ev is a value (either a scalar or a complex value), called the
entry value
NoSQL Database Design 17
The NoAM abstract data model
Luca Cabibbo
Fare clic per modificare lo stile del
titolo
Luca Cabibbo NoSQL Database Design 18
Example: a NoAM database
username “mary”
firstName “Mary”
lastName “Wilson”
games[0]  game : Game:2345, opponent : Player:rick 
games[1]  game : Game:2611, opponent : Player:ann 
mary
username “rick”
firstName “Ricky”
lastName “Doe”
score 42
games[0]  game : Game:2345, opponent : Player:mary 
games[1]  game : Game:7425, opponent : Player:ann 
games[2]  game : Game:1241, opponent : Player:johnny 
rick
Player
id 2345
firstPlayer Player:mary
secondPlayer Player:rick
rounds[0]  moves : … , comments : … 
rounds[1]  moves : … , actions : … , spell : … 
2345
Game
Luca Cabibbo
 We consider here NoSQL database design – the problem of
representing persistent data of an application in a target NoSQL
database
 NoSQL databases are claimed to be “schema-less”
 however, the data of interest do show some structure, and
decisions on the organization of data are required
 specifically, to map application data to the modeling
elements (collections, tables, documents, key-value pairs)
available in the target datastore
NoSQL Database Design 19
* NoSQL database design for
next-generation web applications
Luca Cabibbo
 Consider a fictious online, web 2.0 game – e.g., some variant of
Ruzzle – which should manage various application objects,
including players, games, rounds, and moves
NoSQL Database Design 20
A running example
mary : Player
username = "mary"
firstName = "Mary"
lastName = "Wilson"
rick : Player
username = "rick"
firstName = "Ricky"
lastName = "Doe"
score = 42
2345 : Game
id = 2345
firstPlayer secondPlayer
: GameInfo
games[0]
gameopponent
: GameInfo
games[0]
game opponent
: Round : Round
rounds[0] rounds[1]
: Move : Move
moves[0] moves[1]
: Move
moves[0]
: GameInfo
games[2]
: GameInfo
games[1]
: GameInfo
games[1]...
...
...
...
...
...
Luca Cabibbo
 Consider a fictious online, web 2.0 game – e.g., some variant of
Ruzzle – which should manage various application objects,
including players, games, rounds, and moves
 assume for example that the target database is an extensible
record store
 what records (and tables) should we use?
 a distinct record for each different application object?
 or should we use each record to represent a group of related
objects? what is the grouping criterion?
 what columns should we use?
 a distinct column for each object field?
 or should we use each column to represent a group of
related fields? what is the grouping criterion?
NoSQL Database Design 21
A running example
Luca Cabibbo
 In NoSQL database design
 decisions on the organization of data are required, in any case
 these decisions are significant – as the data representation
affects major quality requirements – such as scalability,
performance, and consistency
 a randomly chosen data representation may not satisfy the
needed qualities
 how should we make design decisions to indeed support the
qualities of next-generation web applications?
NoSQL Database Design 22
NoSQL database design
Luca Cabibbo
 We focus here on database design for next-generation web
applications – also called scalable web applications, or scalable
simple OLTP-style applications
 foremost requirements of these applications
 data of interest are large and have a flexible structure
 data access is based on simple read-write operations
 horizontal scalability – data should be distributed over a
cluster of many servers
 high availability and good response time
 relaxed consistency guarantees – general ACID transactions
are typically unnecessary – however, a certain degree of
consistency is required, to easy application development –
e.g., eventual consistency or BASE
NoSQL Database Design 23
Next-generation web applications
Luca Cabibbo
 State-of-the-art in NoSQL database design
 a lot of best practices and guidelines
 but usually related to a specific datastore or class of
datastores
 neither a systematic methodology nor a high-level data model
 as in the case of relational database design
NoSQL Database Design 24
State of the art
Luca Cabibbo
 We propose the NoAM approach to NoSQL database design
 tailored to the requirements of next-generation web applications
 based on the NoAM abstract data model for NoSQL databases
 a high level/system independent approach – the initial design
activities are independent of any specific target systems
 a NoAM abstract database is first used to represent the
application data
 the intermediate representation is then implemented in a
target NoSQL datastore, taking into account its specific
features
NoSQL Database Design 25
* The NoAM approach to
NoSQL database design
Luca Cabibbo
 The NoAM approach to NoSQL database design is based on the
following main phases
 aggregate design – to identify the various classes of aggregate
objects needed in the application
 this activity is driven by use cases (functional requirements)
and scalability and consistency needs
 aggregate partitioning – aggregates are partitioned into smaller
data elements
 driven by use cases and performance requirements
 high-level NoSQL database design – aggregate are mapped to
the NoAM intermediate data model
 implementation – to map the intermediate representation to the
specific modeling elements of the target datastore
NoSQL Database Design 26
- Overview
Luca Cabibbo
 We start by considering application data objects…
NoSQL Database Design 27
Application data
mary : Player
username = "mary"
firstName = "Mary"
lastName = "Wilson"
rick : Player
username = "rick"
firstName = "Ricky"
lastName = "Doe"
score = 42
2345 : Game
id = 2345
firstPlayer secondPlayer
: GameInfo
games[0]
gameopponent
: GameInfo
games[0]
game opponent
: Round : Round
rounds[0] rounds[1]
: Move : Move
moves[0] moves[1]
: Move
moves[0]
: GameInfo
games[2]
: GameInfo
games[1]
: GameInfo
games[1]...
...
...
...
...
...
Luca Cabibbo
 … we group them in aggregates (decisions needed!) …
NoSQL Database Design 28
Aggregates
mary : Player
username = "mary"
firstName = "Mary"
lastName = "Wilson"
rick : Player
username = "rick"
firstName = "Ricky"
lastName = "Doe"
score = 42
2345 : Game
id = 2345
firstPlayer secondPlayer
: GameInfo
games[0]
gameopponent
: GameInfo
games[0]
game opponent
: Round : Round
rounds[0] rounds[1]
: Move : Move
moves[0] moves[1]
: Move
moves[0]
: GameInfo
games[2]
: GameInfo
games[1]
: GameInfo
games[1]...
...
...
...
...
...
Luca Cabibbo
 … we consider aggregates as complex-value objects…
NoSQL Database Design 29
Aggregates as complex-value objects
Player:mary : 
username : "mary",
firstName : "Mary",
lastName : "Wilson",
games : {
 game : Game:2345, opponent : Player:rick ,
 game : Game:2611, opponent : Player:ann 
}

Player:rick : 
username : "rick",
firstName : "Ricky",
lastName : "Doe",
score : 42,
games : {
 game : Game:2345, opponent : Player:mary ,
 game : Game:7425, opponent : Player:ann ,
 game : Game:1241, opponent : Player:johnny 
}

Game:2345 : 
id : "2345",
firstPlayer : Player:mary,
secondPlayer : Player:rick,
rounds : {
 moves : … , comments : … ,
 moves : … , actions : … , spell : … 
}

Luca Cabibbo
 … we partition these complex values (decisions needed!) …
NoSQL Database Design 30
Aggregate partitioning
Player:mary : 
username : "mary",
firstName : "Mary",
lastName : "Wilson",
games : {
 game : Game:2345, opponent : Player:rick ,
 game : Game:2611, opponent : Player:ann 
}

Player:rick : 
username : "rick",
firstName : "Ricky",
lastName : "Doe",
score : 42,
games : {
 game : Game:2345, opponent : Player:mary ,
 game : Game:7425, opponent : Player:ann ,
 game : Game:1241, opponent : Player:johnny 
}

Game:2345 : 
id : "2345",
firstPlayer : Player:mary,
secondPlayer : Player:rick,
rounds : {
 moves : … , comments : … ,
 moves : … , actions : … , spell : … 
}

Luca Cabibbo
 … and represent them into an abstract data model for NoSQL
databases (consequence of decisions) …
NoSQL Database Design 31
Data representation in NoAM
username “mary”
firstName “Mary”
lastName “Wilson”
games[0]  game : Game:2345, opponent : Player:rick 
games[1]  game : Game:2611, opponent : Player:ann 
mary username “rick”
firstName “Ricky”
lastName “Doe”
score 42
games[0]  game : Game:2345, opponent : Player:mary 
games[1]  game : Game:7425, opponent : Player:ann 
games[2]  game : Game:1241, opponent : Player:johnny 
rick
Player
id 2345
firstPlayer Player:mary
secondPlayer Player:rick
rounds[0]  moves : … , comments : … 
rounds[1]  moves : … , actions : … , spell : … 
2345
Game
Luca Cabibbo
 … and finally we map the intermediate representation to the data
structures of the target datastore (the approach specifies how)
NoSQL Database Design 32
Implementation
username firstName lastName score games[0] games[1] games[2] …
mary Mary Wilson {…} {…}
rick Ricky Doe 42 {…} {…} {…}
table Player
id firstPlayer secondPlayer rounds[0] rounds[1] rounds[2] …
2345 Player:mary Player:rick {…} {…}
table Game
Luca Cabibbo
 … and finally we map the intermediate representation to the data
structures of the target datastore (the approach specifies how)
NoSQL Database Design 33
Implementation
key value
/Player/mary/-/username mary
/Player/mary/-/firstName Mary
/Player/mary/-/lastName Wilson
/Player/mary/-/games[0] { “game” : “Game:2345”, “opponent” : “Player:rick” }
/Player/mary/-/games[1] { “game” : “Game:2611”, “opponent” : “Player:ann” }
… …
/Games/2345/-/id 2345
/Games/2345/-/firstPlayer Player:mary
/Games/2345/-/secondPlayer Player:rick
/Games/2345/-/rounds[0] { … }
/Games/2345/-/rounds[1] { … }
… …
Luca Cabibbo
 In our approach, we consider application data arranged in
aggregates
 the notion of aggregate comes from Domain-Driven Design
(DDD) – a popular object-oriented design methodology – and
from principles in the design of scalable applications
 aggregate design affects scalability and the scope of atomic
operations – and therefore, the ability to support relevant
integrity constraints
NoSQL Database Design 34
- Aggregates and aggregate design
Luca Cabibbo
 The design of scalable applications is discussed in a seminal
paper by Pat Helland – Life beyond distributed transactions: an
apostate’s opinion, CIDR 2007
 data should be organized as a set of complex-value objects
with unique identifiers – called entities or aggregates – each
aggregate is a “chunk” of related data, and is intended to be a
unit of data access and manipulation
 aggregates should govern data distribution – aggregates are
distributed among the nodes of the cluster, but each aggregate
is located on a single node
 atomic transactions can not span multiple aggregates – to avoid
the coordination overhead required by distributed transactions
 operations spanning multiple aggregates should be
implemented as multiple operations, each over a single
aggregate – using asynchronous messages and eventual
consistency
NoSQL Database Design 35
Design of scalable applications
Luca Cabibbo
 Domain-Driven Design (DDD, Eric Evans, 2003) is also based on
a similar notion of aggregate – DDD gives us other insights on
aggregate design
 each aggregate is a group of application objects (entities and
value objects) rooted in an entity
 an entity is a persistence object that has independent
existence and a unique identifier
 a value object is a persistent object, without an own identifier
 aggregate boundaries govern distribution and transactions
 each aggregate should be
 large enough, to accommodate all the data involved by some
integrity constraints or other business rules
 as small as possible, to reduce concurrency collisions – to
support performance and scalability requirements
NoSQL Database Design 36
Aggregates in Domain-driven design
Luca Cabibbo
 Aggregates in our running example are players and games
 but rounds are not – to support an integrity constraint of the
game
NoSQL Database Design 37
Example
mary : Player
username = "mary"
firstName = "Mary"
lastName = "Wilson"
rick : Player
username = "rick"
firstName = "Ricky"
lastName = "Doe"
score = 42
2345 : Game
id = 2345
firstPlayer secondPlayer
: GameInfo
games[0]
gameopponent
: GameInfo
games[0]
game opponent
: Round : Round
rounds[0] rounds[1]
: Move : Move
moves[0] moves[1]
: Move
moves[0]
: GameInfo
games[2]
: GameInfo
games[1]
: GameInfo
games[1]...
...
...
...
...
...
Luca Cabibbo
 At the application level, data is organized in aggregates
 each aggregate object is can be considered a complex-value
object, with a unique identifier
 a set of aggregate objects is a class
 a set of class form an application dataset
NoSQL Database Design 38
Application data model
Player:mary : 
username : "mary",
firstName : "Mary",
lastName : "Wilson",
games : {
 game : Game:2345, opponent : Player:rick ,
 game : Game:2611, opponent : Player:ann 
}

Player:rick : 
username : "rick",
firstName : "Ricky",
lastName : "Doe",
score : 42,
games : {
 game : Game:2345, opponent : Player:mary ,
 game : Game:7425, opponent : Player:ann ,
 game : Game:1241, opponent : Player:johnny 
}

Game:2345 : 
id : "2345",
firstPlayer : Player:mary,
secondPlayer : Player:rick,
rounds : {
 moves : … , comments : … ,
 moves : … , actions : … , spell : … 
}

Luca Cabibbo
 To summarize, aggregates have the following characteristics
 an aggregate is a complex-value object
 each aggregate is a unit of data access and atomic
manipulation
 aggregates govern data distribution
 In NoSQL database design, we should map each aggregate to a
data modeling element having analogous features
NoSQL Database Design 39
Aggregates in NoSQL db design
Luca Cabibbo
 In NoSQL database design, we should map each aggregate to a
data modeling element having analogous features
 each aggregate should be mapped to a unit of data access,
atomic manipulation, and distribution
 therefore, a record/row, a document, or a group of related
key-value pairs – that is, a NoAM block
 classes of aggregates can then be mapped to NoAM collections
 the role for columns, document fields, or individual key-value
pairs (i.e., NoAM entries) has to be discussed
 we would like to abstract from the features of specific
datastores – NoAM enables us to do so
NoSQL Database Design 40
Aggregates in NoSQL db design
Luca Cabibbo
 An application dataset can be represented in NoAM as follows
 the application dataset is represented by a NoAM database
 each class of aggregates is represented by a collection
 the class name is used as collection name
 each aggregate object is represented by a block
 the aggregate identifier is used as block key
 each aggregate object is represented by one or more entries in
the corresponding block
 the complex value of the aggregate object is partitioned into
one or more entry values
NoSQL Database Design 41
Representing aggregates in NoAM
Luca Cabibbo
Fare clic per modificare lo stile del
titolo
Luca Cabibbo NoSQL Database Design 42
Representing aggregates in NoAM
Player:mary : 
username : "mary",
firstName : "Mary",
lastName : "Wilson",
games : {
 game : Game:2345, opponent : Player:rick ,
 game : Game:2611, opponent : Player:ann 
}

Player:rick : 
username : "rick",
firstName : "Ricky",
lastName : "Doe",
score : 42,
games : {
 game : Game:2345, opponent : Player:mary ,
 game : Game:7425, opponent : Player:ann ,
 game : Game:1241, opponent : Player:johnny 
}

Game:2345 : 
id : "2345",
firstPlayer : Player:mary,
secondPlayer : Player:rick,
rounds : {
 moves : … , comments : … ,
 moves : … , actions : … , spell : … 
}

… …
… …
… …
mary
… …
… …
… …
… …
rick
Player
… …
… …
… …
2345
Game
Luca Cabibbo
Fare clic per modificare lo stile del
titolo
Luca Cabibbo NoSQL Database Design 43
Example: partitioning of aggregates
Player:mary : 
username : "mary",
firstName : "Mary",
lastName : "Wilson",
games : {
 game : Game:2345, opponent : Player:rick ,
 game : Game:2611, opponent : Player:ann 
}

Player:rick : 
username : "rick",
firstName : "Ricky",
lastName : "Doe",
score : 42,
games : {
 game : Game:2345, opponent : Player:mary ,
 game : Game:7425, opponent : Player:ann ,
 game : Game:1241, opponent : Player:johnny 
}
 Game:2345 : 
id : "2345",
firstPlayer : Player:mary,
secondPlayer : Player:rick,
rounds : {
 moves : … , comments : … ,
 moves : … , actions : … , spell : … 
}

Luca Cabibbo
Fare clic per modificare lo stile del
titolo
Luca Cabibbo NoSQL Database Design 44
Example: aggregates in NoAM
username “mary”
firstName “Mary”
lastName “Wilson”
games[0]  game : Game:2345, opponent : Player:rick 
games[1]  game : Game:2611, opponent : Player:ann 
mary
username “rick”
firstName “Ricky”
lastName “Doe”
score 42
games[0]  game : Game:2345, opponent : Player:mary 
games[1]  game : Game:7425, opponent : Player:ann 
games[2]  game : Game:1241, opponent : Player:johnny 
rick
Player
id 2345
firstPlayer Player:mary
secondPlayer Player:rick
rounds[0]  moves : … , comments : … 
rounds[1]  moves : … , actions : … , spell : … 
2345
Game
Luca Cabibbo
 In representing an aggregate object in NoAM, we use one or more
entries – to partition the complex value of the aggregate
 aggregate partitioning affects performance of data access and
manipulation operations
 this partitioning can be based on
 basic (predefined) data representation strategies
 custom data representations
NoSQL Database Design 45
- Aggregate partitioning
Luca Cabibbo
 Entry per Aggregate Object (EAO)
 an aggregate object is represented by a single entry
 the entry value is the whole complex value – the entry key is
empty
NoSQL Database Design 46
Entry per Aggregate Object (EAO)


username : "mary",
firstName : "Mary",
lastName : "Wilson",
games : {
 game : Game:2345, opponent : Player:rick ,
 game : Game:2611, opponent : Player:ann 
}

mary
Luca Cabibbo
 Entry per Top-level Field (ETF)
 an aggregate object is represented by multiple entries – a
distinct entry for each top-level field of the complex value
 the entry value is the field value – the entry key is the field
name
NoSQL Database Design 47
Entry per Top-level Field (ETF)
username “mary”
firstName “Mary”
lastName “Wilson”
games
{
 game : Game:2345, opponent : Player:rick ,
 game : Game:2611, opponent : Player:ann 
}
mary
Luca Cabibbo
 Entry per Atomic Value (EAV)
 an aggregate object is represented by multiple entries – a
distinct entry for each atomic value in the complex value
 the entry value is the atomic value – the entry key is the
“access path” to the atomic value
NoSQL Database Design 48
Entry per Atomic Value (EAV)
username “mary”
firstName “Mary”
lastName “Wilson”
games[0].game Game:2345
games[0].opponent Player:rick
games[1].game Game:2611
games[1].opponent Player:ann
mary
Luca Cabibbo
 The basic data representation strategies can be suited in some
cases – but we often need to partition aggregates in custom ways
 aggregate partitioning can be driven by data access operations
– since it affects the performance of database operations
 each element of a partition (i.e., an entry) can represent either a
scalar value or a complex value – the usage of “entries” with a
complex value is a common practice in NoSQL datastores –
e.g., Protocol Buffers, Avro schemas
NoSQL Database Design 49
Custom aggregate partitioning
Luca Cabibbo
 Guidelines for aggregate partitioning – adapted from Conceptual
Database Design (Batini, Ceri, Navathe, 1992)
 if an aggregate is small in size, or all or most of its data are
accessed or modified together – then it should be represented
by a single entry
 if an aggregate is large in size, and there are operations that
frequently access or modify only specific portions of the
aggregate – then it should be represented by multiple entries
 if two or more data elements are frequently accessed or
modified together – then they should belong to the same entry
 if two or more data elements are usually accessed or modified
separately – then they should belong to distinct entries
NoSQL Database Design 50
Guidelines for aggregate partitioning
Luca Cabibbo
 Operations for our online game
1. when a player connects to the application – the aggregate for
the player should be retrieved
2. when a player selects a game to continue – the aggregate for
the game should be retrieved
3. when a player completes a round for a game – the aggregate
for the game should be updated, by adding the new round
4. when a player invites a friend for playing a new game – an
aggregate for a new game should be created, and the
aggregate for the opponent players should be updated, by
adding the new game
 For example, what does operation 3 suggest?
 each round should be represented using a distinct entry of the
corresponding game aggregate
NoSQL Database Design 51
Aggregate partitioning: Example
Luca Cabibbo
Fare clic per modificare lo stile del
titolo
Luca Cabibbo NoSQL Database Design 52
Aggregate partitioning: Example
username “mary”
firstName “Mary”
lastName “Wilson”
games[0]  game : Game:2345, opponent : Player:rick 
games[1]  game : Game:2611, opponent : Player:ann 
mary
Player
id 2345
firstPlayer Player:mary
secondPlayer Player:rick
rounds[0]  moves : … , comments : … 
rounds[1]  moves : … , actions : … , spell : … 
2345
Game
Luca Cabibbo
 NoAM defines a language to specify aggregate partitioning – and
therefore, data representations
 the language can be used to describe or document a certain
aggregate partitioning
 more importantly, it can be used in a mapping system
 the database designer uses the language to specify a data
representation – in a system-independent way
 the mapping framework interprets the specification – to
represent aggregates in the specific target datastore and to
handle operations over them
 The language has an XPath-like syntax – and we illustrate it by
means of examples
NoSQL Database Design 53
- A language for data representations
Luca Cabibbo
 Rule /*/* specifies strategy Entry per Aggregate Object (EAO)
 the first * matches with aggregate classes
 the second * matches with aggregate identifiers
 the rule means “use an entry for each distinct aggregate class
and distinct aggregate identifier”
NoSQL Database Design 54
The language – by examples


username : "mary",
firstName : "Mary",
lastName : "Wilson",
games : {
 game : Game:2345, opponent : Player:rick ,
 game : Game:2611, opponent : Player:ann 
}

mary
Luca Cabibbo
 Rule /*/*/* specifies strategy Entry per Top-level Field (ETF)
 the third * matches with top-level fields of aggregates
 the rule means “use an entry for each distinct aggregate class,
aggregate identifier, and top-level field”
NoSQL Database Design 55
The language – by examples
username “mary”
firstName “Mary”
lastName “Wilson”
games
{
 game : Game:2345, opponent : Player:rick ,
 game : Game:2611, opponent : Player:ann 
}
mary
Luca Cabibbo
 A data representation is specified by a sequence of rules
 /Player/*/* – “use ETF for players”
 /Game/* – “use EAO for games”
NoSQL Database Design 56
The language – by examples
username “mary”
firstName “Mary”
lastName “Wilson”
games
{
 game : Game:2345, opponent : Player:rick ,
 game : Game:2611, opponent : Player:ann 
}
mary
Player


id : 2345,
firstPlayer : Player:mary,
secondPlayer : Player:rick,
rounds : {
 moves : … , comments : … ,
 moves : … , actions : … , spell : … 
}

2345
Game
Luca Cabibbo
 It is possible to have more rules over a same aggregate class
 /Player/*/games[*] – “use an entry for each game played by a
player”
 /Player/*/* – “use ETF for the remaining data of each player”
NoSQL Database Design 57
The language – by examples
username “mary”
firstName “Mary”
lastName “Wilson”
games[0]  game : Game:2345, opponent : Player:rick 
games[1]  game : Game:2611, opponent : Player:ann 
mary
Luca Cabibbo
 It is possible to have more rules over a same aggregate class
 /Player/*/games[*] – “use an entry for each game played by a
player”
 /Player/* – “use EAO for the remaining data of each player”
NoSQL Database Design 58
The language – by examples


username : “mary”,
firstName : “Mary”,
lastName : “Wilson”

games[0]  game : Game:2345, opponent : Player:rick 
games[1]  game : Game:2611, opponent : Player:ann 
mary
Luca Cabibbo
 In the implementation phase, we map the intermediate data
representation to the specific data modeling elements of the target
NoSQL datastore
 given that the NoAM data model generalizes the features of the
various systems, while keeping their major aspects, it is rather
straightforward to perform this activity
 Please note that the implementation takes also care of mapping
operations – specifically, CRUD operations (create, read, update,
delete) over aggregate objects to specific data access operations
 we do not discuss this issue here
 please find more details in the references
NoSQL Database Design 59
- Implementation
Luca Cabibbo
 Oracle NoSQL is a key-value store – a database is a collection of
key-value pairs
 values are binary strings, opaque to the datastore
 a key is composed of two parts
 the major key is a non-empty sequence of strings
 the minor key is a (possibly-empty) sequence of strings
 e.g, /Player/mary/-/username
 the major key controls data distributions – key-value pairs
having the same major key are allocated in a same node
 atomic operations on individual key-value pairs – but also on
groups of key-value pairs having the same major key
NoSQL Database Design 60
Oracle NoSQL: Implementation
Luca Cabibbo
 Mapping from NoAM to Oracle NoSQL
 a key-value pair for each entry
 the major key is composed of
 the collection name
 the block key (i.e., the aggregate identifier)
 the minor key represents the entry key (i.e., an access path)
 the value represents the entry value
 it can be either a simple value, or
 the serialization of a complex value – e.g., in JSON
NoSQL Database Design 61
Oracle NoSQL: Implementation
Luca Cabibbo
Fare clic per modificare lo stile del
titolo
Luca Cabibbo NoSQL Database Design 62
Oracle NoSQL: Implementation
key value
/Player/mary/- { “username” : “mary”, “firstName” : “Mary”, “lastName” : “Wilson”, “games” : […] }
/Player/rick/- { “username” : “rick”, “firstName” : “Ricky”, lastName : “Doe”, “score” : “42”, “games” : […] }
/Game/2345/- { “id” : “2345”, “firstPlayer” : “Player:mary”, “secondPlayer” : “Player:rick”, “rounds” : […] }


username : "mary",
firstName : "Mary",
lastName : "Wilson",
games : {
 game : Game:2345, opponent : Player:rick ,
 game : Game:2611, opponent : Player:ann 
}

mary
Player
Luca Cabibbo
Fare clic per modificare lo stile del
titolo
Luca Cabibbo NoSQL Database Design 63
Oracle NoSQL: Implementation
key value
/Player/mary/-/username mary
/Player/mary/-/firstName Mary
/Player/mary/-/lastName Wilson
/Player/mary/-/games[0] { “game” : “Game:2345”, “opponent” : “Player:rick” }
/Player/mary/-/games[1] { “game” : “Game:2611”, “opponent” : “Player:ann” }
… …
username “mary”
firstName “Mary”
lastName “Wilson”
games[0]  game : Game:2345, opponent : Player:rick 
games[1]  game : Game:2611, opponent : Player:ann 
mary
Player
Luca Cabibbo
 MongoDB is a document store – a database is a set of documents
 each document has a complex value and an identifier,
and documents are organized in collections
 Mapping from NoAM to MongoDB
 a document collection for each NoAM collection (aggregate
class)
 a main document for each block (aggregate)
 a top-level field for each entry
 the special _id field for the block key (aggregate identifier)
 atomic operations on individual documents – or on their fields
NoSQL Database Design 64
MongoDB: Implementation
Luca Cabibbo
Fare clic per modificare lo stile del
titolo
Luca Cabibbo NoSQL Database Design 65
MongoDB: Implementation
username “mary”
firstName “Mary”
lastName “Wilson”
games[0]  game : Game:2345, opponent : Player:rick 
games[1]  game : Game:2611, opponent : Player:ann 
mary
Player
id document
mary
{
“_id” : “mary”,
“username” : “mary”,
“firstName” : “Mary”,
“lastName” : “Wilson”,
“games[0]” : { “game” : “Game:2345”, “opponent” : “Player:rick” },
“games[1]” : { “game” : “Game:2611”, “opponent” : “Player:ann” }
}
collection Player
Luca Cabibbo
 A different implementation
 reconstruct structure of complex values
NoSQL Database Design 66
MongoDB: Alternative implementation
username “mary”
firstName “Mary”
lastName “Wilson”
games[0]  game : Game:2345, opponent : Player:rick 
games[1]  game : Game:2611, opponent : Player:ann 
mary
Player
id document
mary
{
“_id” : “mary”,
“username” : “mary”,
“firstName” : “Mary”,
“lastName” : “Wilson”,
“games” : [ { “game” : “Game:2345”, “opponent” : “Player:rick” },
{ “game” : “Game:2611”, “opponent” : “Player:ann” } ]
}
collection Player
Luca Cabibbo
 Amazon DynamoDB is an extensible record store
 a database is a set of tables
 each table is a set of items
 each item contains a set of attributes,
each with a name and a value
 each table has a primary key – composed of a hash partition
attribute and an optional range attribute
 the partition attribute controls distribution of items
 atomic operations on individual items – or on their columns
 Mapping from NoAM to DynamoDB
 a table for each collection (aggregate class)
 an item for each block (aggregate) – whose primary key is the
block key (aggregate identifier)
 an attribute for each entry
NoSQL Database Design 67
DynamoDB: Implementation
Luca Cabibbo
Fare clic per modificare lo stile del
titolo
Luca Cabibbo NoSQL Database Design 68
DynamoDB: Implementation
username “mary”
firstName “Mary”
lastName “Wilson”
games[0]  game : Game:2345, opponent : Player:rick 
games[1]  game : Game:2611, opponent : Player:ann 
mary
Player
username firstName lastName score games[0] games[1] games[2] …
mary Mary Wilson {…} {…}
rick Ricky Doe 42 {…} {…} {…}
table Player
id firstPlayer secondPlayer rounds[0] rounds[1] rounds[2] …
2345 Player:mary Player:rick {…} {…}
table Game
Luca Cabibbo
 NoAM (NoSQL Abstract Model) is a high-level approach to NoSQL
database design for next-generation web applications
 a high-level approach
 initial design activities are independent of any specific target
systems
 it is based on NoAM
 NoAM is an intermediate, abstract data model for NoSQL
databases – which exploits the commonalities of their
various data models – but also introduces abstractions to
balance their differences and variations
NoSQL Database Design 69
- Conclusion (NoAM)
Luca Cabibbo
 Open issues
 NoAM data model
 other abstractions are needed to represent further data
modeling elements available in NoSQL datastores
 further abstractions related to relevant metadata – e.g.,
versions and timestamps, to support concurrency control
and consistency management
 derived data and materialized views
 so far, we have assumed that data is represented in a non-
redundant way – some redundancy is usually suggested in
NoSQL databases, to improve performance – but note that
view maintenance could affect consistency negatively
 support to multi-aggregate transactions is required
NoSQL Database Design 70
Conclusion (NoAM data model)
Luca Cabibbo
 Open issues
 NoAM approach
 the proposed guidelines can propose conflicting suggestions
– therefore, the application of the approach might result in a
number of candidate data representations, rather than to a
single one
 tools can help the designer to assess a preferred solution
 NoSQL database design for different settings
 for example, to support query-intensive applications and
analytical queries
NoSQL Database Design 71
Conclusion (NoAM approach)
Luca Cabibbo
 ONDM (Object-NoSQL Datastore Mapper) is a framework that
provides application developers with
 a uniform access towards a variety of NoSQL datastores
 the ability to map application data to different data
representations, in a flexible way
 Main features of ONDM
 object-oriented API, based on Java Persistence API (JPA)
 transparent access to various NoSQL datastores – such as
Oracle NoSQL, Redis, MongoDB, CouchBase, and Cassandra
 internal representation based on NoAM
 flexible data representations – based on the NoAM language
for data representations
NoSQL Database Design 72
* ONDM
(Object-NoSQL Datastore Mapper)
Luca Cabibbo
 A layered architecture
 API – based on JPA, offers
CRUD operations to
manipulate aggregates
 internal aggregate manager
– conversion between
aggregate objects and an
internal representation
(JSON) – cache mgmt
 data representation manager
– in NoAM, wrt the specified
data representation
 datastore adapters –
conversion between NoAM
and specific data structures
and operations
Application
create, read, update, delete 
(of application objects)
Internal 
Representation
{
"id" : "2345",
"roundCount" : "2",
"firstPlayer" : "Player:mary",
"secondPlayer" : "Player:rick",
"rounds" : [...]
}
Game:2345
JSON
Java
put, get, delete 
(of complex‐value objects)
Data Representation
NoAM
Game : 2345
id : 2345
roundCount : 2
firstPlayer : Player:mary
secondPlayer : Player:rick
rounds[0] :  moves:…, comments:... 
rounds[1] :  moves:…, comments:... 
Oracle NoSQL
Datastore
Oracle NoSQL
Datastore Adapter
Oracle NoSQL
API operations
DynamoDB
Datastore
DynamoDB
Datastore Adapter
DynamoDB
API operations
id rc firstPlayer secondPlayer rounds[0]
2345 2 Player:mary Player:rick ...
rounds[1]
...
Game/2345/‐/id
Game/2345/‐/roundCount
Game/2345/‐/firstPlayer
Game/2345/‐/secondPlayer
Game/2345/‐/rounds[0]
Game/2345/‐/rounds[1]
2345
2
Player:mary
Player:rick
{ moves:[…], comments:[…] }
{ moves:[…], comments:[…] }
...
/Player/*/games[*]
/Player/*
/Game/*/rounds[*]
/Game/*/*
Data Representation 
Language
Cache
put, get, delete, 
multiPut, multiGet, multiDelete
(of entries and blocks)
NoSQL Database Design 73
- Architecture of ONDM
Luca Cabibbo
 The database design activity can result in a number of candidate
data representations – rather than to a single one
 consider again the operations for our online game
2. when a player selects a game to continue – the aggregate for
the game should be retrieved
3. when a player completes a round for a game – the aggregate
for the game should be updated, by adding the new round
 operations 2 and 3 suggest different choices for the
representation of rounds – (i) all together in a single entry or
(ii) using a distinct entry for each round
 In this case, experiments are needed to assess the most suitable
design solution – and ONDM can help in performing them
 an important feature is the ability to select a desired data
representation in a declarative way – using the NoAM language
for data representations
NoSQL Database Design 82
* A case study in NoSQL db design
Luca Cabibbo
 To decide between the various candidate representations, a few
experiments can help
 the target datastore is Oracle NoSQL (single node)
 three candidate representations
 an entry for a whole game – EAO
 /Game/*/rounds[*] + /Game/*/* – Rounds+ETF
 /Game/*/rounds[*] + /Game/* – Rounds+EAO
 various workloads
 game retrieval
 round addition
 mixed – 80% game retrievals + 20% round additions
 each game is 8kb, each round is 0.5kb
 database size is in GB, timings are ms per operation
NoSQL Database Design 83
A case study in NoSQL db design
Luca Cabibbo
Fare clic per modificare lo stile del
titolo
Luca Cabibbo NoSQL Database Design 84
A case study in NoSQL db design
0
1
2
3
4
5
6
7
8
2 4 6 8 10 12 14 16
Game Retrieval
EAO Rounds+ETF Rounds+EAO
0
0,5
1
1,5
2
2,5
3
3,5
4
2 4 6 8 10 12 14 16
Round Addition
EAO Rounds+ETF Rounds+EAO
0
1
2
3
4
5
6
7
8
2 4 6 8 10 12 14 16
Mixed Load (80/20)
EAO Rounds+ETF Rounds+EAO
Luca Cabibbo
 The experiments show that aggregate partitioning has indeed
impact on the performance of the various operations
 in general, when using a NoSQL database, decisions on the
organization of data are required
 these decisions are significant – as the data representation
affects major quality requirements – such as scalability,
performance, and consistency
NoSQL Database Design 85
- Conclusion (case study)

Más contenido relacionado

La actualidad más candente

Non relational databases-no sql
Non relational databases-no sqlNon relational databases-no sql
Non relational databases-no sqlRam kumar
 
NoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyNoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyGuillaume Lefranc
 
To SQL or NoSQL, that is the question
To SQL or NoSQL, that is the questionTo SQL or NoSQL, that is the question
To SQL or NoSQL, that is the questionKrishnakumar S
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and UsesSuvradeep Rudra
 
NoSQL databases pros and cons
NoSQL databases pros and consNoSQL databases pros and cons
NoSQL databases pros and consFabio Fumarola
 
Introduction to Oracle Database
Introduction to Oracle DatabaseIntroduction to Oracle Database
Introduction to Oracle Databasepuja_dhar
 
NoSQL - 05March2014 Seminar
NoSQL - 05March2014 SeminarNoSQL - 05March2014 Seminar
NoSQL - 05March2014 SeminarJainul Musani
 
Overview of oracle database
Overview of oracle databaseOverview of oracle database
Overview of oracle databaseSamar Prasad
 
SQLCAT: Tier-1 BI in the World of Big Data
SQLCAT: Tier-1 BI in the World of Big DataSQLCAT: Tier-1 BI in the World of Big Data
SQLCAT: Tier-1 BI in the World of Big DataDenny Lee
 
NoSQL Architecture Overview
NoSQL Architecture OverviewNoSQL Architecture Overview
NoSQL Architecture OverviewChristopher Foot
 
2008 2086 Gangler
2008 2086 Gangler2008 2086 Gangler
2008 2086 GanglerSecure-24
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented DatabasesFabio Fumarola
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL DatabasesDerek Stainer
 

La actualidad más candente (20)

Non relational databases-no sql
Non relational databases-no sqlNon relational databases-no sql
Non relational databases-no sql
 
NoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyNoSQL Solutions - a comparative study
NoSQL Solutions - a comparative study
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
To SQL or NoSQL, that is the question
To SQL or NoSQL, that is the questionTo SQL or NoSQL, that is the question
To SQL or NoSQL, that is the question
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and Uses
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 
NoSQL databases pros and cons
NoSQL databases pros and consNoSQL databases pros and cons
NoSQL databases pros and cons
 
Unit 3 MongDB
Unit 3 MongDBUnit 3 MongDB
Unit 3 MongDB
 
Introduction to Oracle Database
Introduction to Oracle DatabaseIntroduction to Oracle Database
Introduction to Oracle Database
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
NoSQL - 05March2014 Seminar
NoSQL - 05March2014 SeminarNoSQL - 05March2014 Seminar
NoSQL - 05March2014 Seminar
 
Mysql database
Mysql databaseMysql database
Mysql database
 
Overview of oracle database
Overview of oracle databaseOverview of oracle database
Overview of oracle database
 
SQLCAT: Tier-1 BI in the World of Big Data
SQLCAT: Tier-1 BI in the World of Big DataSQLCAT: Tier-1 BI in the World of Big Data
SQLCAT: Tier-1 BI in the World of Big Data
 
Relational vs. Non-Relational
Relational vs. Non-RelationalRelational vs. Non-Relational
Relational vs. Non-Relational
 
NoSQL Architecture Overview
NoSQL Architecture OverviewNoSQL Architecture Overview
NoSQL Architecture Overview
 
2008 2086 Gangler
2008 2086 Gangler2008 2086 Gangler
2008 2086 Gangler
 
9. Document Oriented Databases
9. Document Oriented Databases9. Document Oriented Databases
9. Document Oriented Databases
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
 

Similar a Seminar by Luca Cabibbo : Nosql db design-20140110

NoSQL Data Architecture Patterns
NoSQL Data ArchitecturePatternsNoSQL Data ArchitecturePatterns
NoSQL Data Architecture PatternsMaynooth University
 
Making MySQL Agile-ish
Making MySQL Agile-ishMaking MySQL Agile-ish
Making MySQL Agile-ishDave Stokes
 
NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introductionPooyan Mehrparvar
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep divelucenerevolution
 
Considerations for using NoSQL technology on your next IT project - Akmal Cha...
Considerations for using NoSQL technology on your next IT project - Akmal Cha...Considerations for using NoSQL technology on your next IT project - Akmal Cha...
Considerations for using NoSQL technology on your next IT project - Akmal Cha...jaxconf
 
Teoria unidad 2 Base de Datos No Relacionales
Teoria unidad 2 Base de Datos No Relacionales Teoria unidad 2 Base de Datos No Relacionales
Teoria unidad 2 Base de Datos No Relacionales Estefani Managu
 
Mongo Bb - NoSQL tutorial
Mongo Bb - NoSQL tutorialMongo Bb - NoSQL tutorial
Mongo Bb - NoSQL tutorialMohan Rathour
 
Python Utilities for Managing MySQL Databases
Python Utilities for Managing MySQL DatabasesPython Utilities for Managing MySQL Databases
Python Utilities for Managing MySQL DatabasesMats Kindahl
 
Couchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data DemystifiedCouchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data DemystifiedOmid Vahdaty
 
Neo4j tms
Neo4j tmsNeo4j tms
Neo4j tms_mdev_
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandraBrian Enochson
 
MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019
MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019
MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019Dave Stokes
 
Vskills Apache Cassandra sample material
Vskills Apache Cassandra sample materialVskills Apache Cassandra sample material
Vskills Apache Cassandra sample materialVskills
 
MySQL NDB Cluster 8.0
MySQL NDB Cluster 8.0MySQL NDB Cluster 8.0
MySQL NDB Cluster 8.0Ted Wennmark
 
DMDW Extra Lesson - NoSql and MongoDB
DMDW  Extra Lesson - NoSql and MongoDBDMDW  Extra Lesson - NoSql and MongoDB
DMDW Extra Lesson - NoSql and MongoDBJohannes Hoppe
 

Similar a Seminar by Luca Cabibbo : Nosql db design-20140110 (20)

Nosql
NosqlNosql
Nosql
 
No sql
No sqlNo sql
No sql
 
NoSQL Data Architecture Patterns
NoSQL Data ArchitecturePatternsNoSQL Data ArchitecturePatterns
NoSQL Data Architecture Patterns
 
Making MySQL Agile-ish
Making MySQL Agile-ishMaking MySQL Agile-ish
Making MySQL Agile-ish
 
NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introduction
 
Solr cloud the 'search first' nosql database extended deep dive
Solr cloud the 'search first' nosql database   extended deep diveSolr cloud the 'search first' nosql database   extended deep dive
Solr cloud the 'search first' nosql database extended deep dive
 
Considerations for using NoSQL technology on your next IT project - Akmal Cha...
Considerations for using NoSQL technology on your next IT project - Akmal Cha...Considerations for using NoSQL technology on your next IT project - Akmal Cha...
Considerations for using NoSQL technology on your next IT project - Akmal Cha...
 
Teoria unidad 2 Base de Datos No Relacionales
Teoria unidad 2 Base de Datos No Relacionales Teoria unidad 2 Base de Datos No Relacionales
Teoria unidad 2 Base de Datos No Relacionales
 
2. Lecture2_NOSQL_KeyValue.ppt
2. Lecture2_NOSQL_KeyValue.ppt2. Lecture2_NOSQL_KeyValue.ppt
2. Lecture2_NOSQL_KeyValue.ppt
 
Mongo Bb - NoSQL tutorial
Mongo Bb - NoSQL tutorialMongo Bb - NoSQL tutorial
Mongo Bb - NoSQL tutorial
 
Python Utilities for Managing MySQL Databases
Python Utilities for Managing MySQL DatabasesPython Utilities for Managing MySQL Databases
Python Utilities for Managing MySQL Databases
 
Couchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data DemystifiedCouchbase Data Platform | Big Data Demystified
Couchbase Data Platform | Big Data Demystified
 
Neo4j tms
Neo4j tmsNeo4j tms
Neo4j tms
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandra
 
MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019
MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019
MySQL Baics - Texas Linxufest beginners tutorial May 31st, 2019
 
Vskills Apache Cassandra sample material
Vskills Apache Cassandra sample materialVskills Apache Cassandra sample material
Vskills Apache Cassandra sample material
 
MySQL NDB Cluster 8.0
MySQL NDB Cluster 8.0MySQL NDB Cluster 8.0
MySQL NDB Cluster 8.0
 
2018 05 08_biological_databases_no_sql
2018 05 08_biological_databases_no_sql2018 05 08_biological_databases_no_sql
2018 05 08_biological_databases_no_sql
 
NoSQL Basics and MongDB
NoSQL Basics and  MongDBNoSQL Basics and  MongDB
NoSQL Basics and MongDB
 
DMDW Extra Lesson - NoSql and MongoDB
DMDW  Extra Lesson - NoSql and MongoDBDMDW  Extra Lesson - NoSql and MongoDB
DMDW Extra Lesson - NoSql and MongoDB
 

Más de INRIA-OAK

Change Management in the Traditional and Semantic Web
Change Management in the Traditional and Semantic WebChange Management in the Traditional and Semantic Web
Change Management in the Traditional and Semantic WebINRIA-OAK
 
A Network-Aware Approach for Searching As-You-Type in Social Media
A Network-Aware Approach for Searching As-You-Type in Social MediaA Network-Aware Approach for Searching As-You-Type in Social Media
A Network-Aware Approach for Searching As-You-Type in Social MediaINRIA-OAK
 
Speeding up information extraction programs: a holistic optimizer and a learn...
Speeding up information extraction programs: a holistic optimizer and a learn...Speeding up information extraction programs: a holistic optimizer and a learn...
Speeding up information extraction programs: a holistic optimizer and a learn...INRIA-OAK
 
Querying incomplete data
Querying incomplete dataQuerying incomplete data
Querying incomplete dataINRIA-OAK
 
ANGIE in wonderland
ANGIE in wonderlandANGIE in wonderland
ANGIE in wonderlandINRIA-OAK
 
On building more human query answering systems
On building more human query answering systemsOn building more human query answering systems
On building more human query answering systemsINRIA-OAK
 
Dynamically Optimizing Queries over Large Scale Data Platforms
Dynamically Optimizing Queries over Large Scale Data PlatformsDynamically Optimizing Queries over Large Scale Data Platforms
Dynamically Optimizing Queries over Large Scale Data PlatformsINRIA-OAK
 
Web Data Management in RDF Age
Web Data Management in RDF AgeWeb Data Management in RDF Age
Web Data Management in RDF AgeINRIA-OAK
 
Oak meeting 18/09/2014
Oak meeting 18/09/2014Oak meeting 18/09/2014
Oak meeting 18/09/2014INRIA-OAK
 
Rdf saturator
Rdf saturatorRdf saturator
Rdf saturatorINRIA-OAK
 
Rdf generator
Rdf generatorRdf generator
Rdf generatorINRIA-OAK
 
Rdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimationRdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimationINRIA-OAK
 
rdf query reformulation
rdf query reformulationrdf query reformulation
rdf query reformulationINRIA-OAK
 
postgres loader
postgres loaderpostgres loader
postgres loaderINRIA-OAK
 

Más de INRIA-OAK (20)

Change Management in the Traditional and Semantic Web
Change Management in the Traditional and Semantic WebChange Management in the Traditional and Semantic Web
Change Management in the Traditional and Semantic Web
 
A Network-Aware Approach for Searching As-You-Type in Social Media
A Network-Aware Approach for Searching As-You-Type in Social MediaA Network-Aware Approach for Searching As-You-Type in Social Media
A Network-Aware Approach for Searching As-You-Type in Social Media
 
Speeding up information extraction programs: a holistic optimizer and a learn...
Speeding up information extraction programs: a holistic optimizer and a learn...Speeding up information extraction programs: a holistic optimizer and a learn...
Speeding up information extraction programs: a holistic optimizer and a learn...
 
Querying incomplete data
Querying incomplete dataQuerying incomplete data
Querying incomplete data
 
ANGIE in wonderland
ANGIE in wonderlandANGIE in wonderland
ANGIE in wonderland
 
On building more human query answering systems
On building more human query answering systemsOn building more human query answering systems
On building more human query answering systems
 
Dynamically Optimizing Queries over Large Scale Data Platforms
Dynamically Optimizing Queries over Large Scale Data PlatformsDynamically Optimizing Queries over Large Scale Data Platforms
Dynamically Optimizing Queries over Large Scale Data Platforms
 
Web Data Management in RDF Age
Web Data Management in RDF AgeWeb Data Management in RDF Age
Web Data Management in RDF Age
 
Oak meeting 18/09/2014
Oak meeting 18/09/2014Oak meeting 18/09/2014
Oak meeting 18/09/2014
 
Nautilus
NautilusNautilus
Nautilus
 
Warg
WargWarg
Warg
 
Vip2p
Vip2pVip2p
Vip2p
 
S4
S4S4
S4
 
Rdf saturator
Rdf saturatorRdf saturator
Rdf saturator
 
Rdf generator
Rdf generatorRdf generator
Rdf generator
 
Rdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimationRdf conjunctive query selectivity estimation
Rdf conjunctive query selectivity estimation
 
rdf query reformulation
rdf query reformulationrdf query reformulation
rdf query reformulation
 
postgres loader
postgres loaderpostgres loader
postgres loader
 
Plreuse
PlreusePlreuse
Plreuse
 
Paxquery
PaxqueryPaxquery
Paxquery
 

Último

OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024innovationoecd
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxkumarsanjai28051
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxmaryFF1
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingNetHelix
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》rnrncn29
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXDole Philippines School
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 

Último (20)

OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024OECD bibliometric indicators: Selected highlights, April 2024
OECD bibliometric indicators: Selected highlights, April 2024
 
Forensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptxForensic limnology of diatoms by Sanjai.pptx
Forensic limnology of diatoms by Sanjai.pptx
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptxECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
ECG Graph Monitoring with AD8232 ECG Sensor & Arduino.pptx
 
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editingBase editing, prime editing, Cas13 & RNA editing and organelle base editing
Base editing, prime editing, Cas13 & RNA editing and organelle base editing
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
Pests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdfPests of castor_Binomics_Identification_Dr.UPR.pdf
Pests of castor_Binomics_Identification_Dr.UPR.pdf
 
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》《Queensland毕业文凭-昆士兰大学毕业证成绩单》
《Queensland毕业文凭-昆士兰大学毕业证成绩单》
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTXALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
ALL ABOUT MIXTURES IN GRADE 7 CLASS PPTX
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 

Seminar by Luca Cabibbo : Nosql db design-20140110

  • 1. Luca Cabibbo Luca Cabibbo NoSQL Database Design for Next-Generation Web Applications Joint work with Francesca Bugiotti, Paolo Atzeni, and Riccardo Torlone NoSQL Database Design 1 Luca Cabibbo  NoSQL datastores are a new generation of distributed database systems – they have been designed to manage large data sets distributed over many servers  a promise of NoSQL database technology is to support the development of next-generation web applications  in this context, we are interested in NoSQL database design  we also present an abstract data model for NoSQL databases NoSQL Database Design 2 - Introduction
  • 2. Luca Cabibbo  NoSQL database systems  NoAM – an abstract data model for NoSQL databases  NoSQL database design for next-generation web applications  The NoAM approach to NoSQL database design  overview  aggregates and aggregate design  aggregate partitioning  a language for data representations  implementation  conclusion  ONDM (Object-NoSQL Datastore Mapper)  architecture  conclusion  A case study in NoSQL database design NoSQL Database Design 3 - Outline Luca Cabibbo  NoSQL datastores are a new generation of distributed database systems  they have been designed to support the needs of an increasing number of modern applications – for which traditional database technology is unsatisfactory  a main requirement for these systems is the ability to manage large data sets distributed over many servers – whereas relational DBMSs are not designed to be run on clusters NoSQL Database Design 5 * NoSQL database systems
  • 3. Luca Cabibbo  The NoSQL landscape is characterized by a high heterogeneity  http://nosql-database.org/ lists 150 non-relational databases  they have different data models and different APIs to access the data – as well as different consistency and durability guarantees  We focus here on three main categories of NoSQL databases  key-value stores  a database is a collection of key-value pairs  document stores  a database is a collection of documents  extensible record stores  data is organized as tables of extensible records  these categories include more than 70 systems NoSQL Database Design 7 NoSQL and heterogeneity Luca Cabibbo  In a key-value store, a database is a schema-less collection of key-value pairs  values are usually binary strings, opaque to the datastore – even if some systems have interpreted values, such as counters, lists, or hashes  programmer-defined keys are either binary strings or structured keys – in some systems, part of the key is used to control data distribution  simple data access operations – put, get, and delete – over an individual key-value pair or a group of related key-value pairs NoSQL Database Design 8 Key-value stores
  • 4. Luca Cabibbo Fare clic per modificare lo stile del titolo Luca Cabibbo NoSQL Database Design 9 Key-value stores: examples key value Player:mary username : mary firstName : Mary lastName : Wilson games : { … } Player:rick username : rick firstName : Ricky lastName : Doe score : 42 games : { ... } a hash value key value /Player/mary/- { “username” : “mary”, “firstName” : “Mary”, “lastName” : “Wilson”, “games” : […] } /Player/rick/- { “username” : “rick”, “firstName” : “Ricky”, lastName : “Doe”, “score” : “42”, “games” : […] } key value /Player/mary/-/username mary /Player/mary/-/firstName Mary /Player/mary/-/lastName Wilson /Player/mary/-/games [ … ] Luca Cabibbo  In a document store, a database is a set of documents  each document has a complex value and an identifier  documents are composed of fields, which are dynamically defined for each document at runtime – each field can be a scalar value, a list, or a document itself  documents are organized in collections  the structure of documents is not opaque to datastores – they create indexes on documents and support content-based querying NoSQL Database Design 10 Document stores
  • 5. Luca Cabibbo Fare clic per modificare lo stile del titolo Luca Cabibbo NoSQL Database Design 11 Document stores: example id document mary { “_id” : “mary”, “username” : “mary”, “firstName” : “Mary”, “lastName” : “Wilson”, “games” : [ { “game” : “Game:2345”, “opponent” : “Player:rick” }, { “game” : “Game:2611”, “opponent” : “Player:ann” } ] } rick { “_id” : “rick”, “username” : “rick”, “firstName” : “Ricky”, “lastName” : “Doe”, “score” : “42”, “games” : [ { “game” : “Game:2345”, “opponent” : “Player:mary” }, { “game” : “Game:7425”, “opponent” : “Player:ann” }, { “game” : “Game:1241”, “opponent” : “Player:johnny” } ] } collection Player Luca Cabibbo  An extensible record (or column-family) store organizes data around tables, records/rows, and columns  a relaxation of the relational model, in which databases are mostly schema-less – since each row can have its own set of columns  each table designates a primary key – which comprises the only mandatory attributes of the table – in some systems, part of the primary key is used to control data distribution NoSQL Database Design 12 Extensible record stores
  • 6. Luca Cabibbo Fare clic per modificare lo stile del titolo Luca Cabibbo NoSQL Database Design 13 Extensible record stores: example username firstName lastName score games[0] games[1] games[2] … mary Mary Wilson {…} {…} rick Ricky Doe 42 {…} {…} {…} table Player Luca Cabibbo  The NoSQL landscape is characterized by a high heterogeneity  however, “the availability of a high-level representation of the data at hand, be it logical or conceptual, remains a fundamental tool for developers and users, since it makes understanding, managing, accessing, and integrating information sources much easier, independently of the technologies used”  To this end, we propose NoAM (NoSQL Abstract Model) – an abstract and system independent data model for NoSQL databases  NoAM aims at exploiting the commonalities of their various data models – but it also introduces abstractions to balance their differences and variations NoSQL Database Design 14 * NoAM – an abstract data model for NoSQL databases
  • 7. Luca Cabibbo  Main commonalities of their various NoSQL data models  NoSQL datastores share the common provision of having a modeling data element that is a distribution, access and manipulation unit (DAM unit)  a data access unit  more precisely, a maximal unit of consistency/atomic data access and manipulation  a unit of distribution  each DAM unit is located on a single node of the cluster – but in general different DAM units are distributed among the nodes of the cluster  in the various systems, a DAM unit can be  a record/row – a document – a group of key-value pairs sharing part of the key  in NoAM, a DAM unit is called a block NoSQL Database Design 15 The NoAM abstract data model Luca Cabibbo  Main commonalities of their various NoSQL data models  NoSQL datastores also offer the ability to access just some parts of a DAM unit – they have a modeling data element that is a “smaller” data access unit (SDA unit)  in the various systems, an SDA unit can be  a column – a field – an individual key-value pair  in NoAM, a SDA unit is called an entry  moreover, many datastores provide a notion of collection of data access units  in the various systems, a collection can be  a table – a document collection  in NoAM, a collection of DAM units (blocks) is called a collection NoSQL Database Design 16 The NoAM abstract data model
  • 8. Luca Cabibbo  The NoAM abstract data model  a database is a set of collections – each collection has a distinct name  a collection is a set of blocks – each block is identified in its collection by a block key  a block is a non-empty set of entries  each entry is a pair (ek,ev)  ek is the entry key – unique within its block  ev is a value (either a scalar or a complex value), called the entry value NoSQL Database Design 17 The NoAM abstract data model Luca Cabibbo Fare clic per modificare lo stile del titolo Luca Cabibbo NoSQL Database Design 18 Example: a NoAM database username “mary” firstName “Mary” lastName “Wilson” games[0]  game : Game:2345, opponent : Player:rick  games[1]  game : Game:2611, opponent : Player:ann  mary username “rick” firstName “Ricky” lastName “Doe” score 42 games[0]  game : Game:2345, opponent : Player:mary  games[1]  game : Game:7425, opponent : Player:ann  games[2]  game : Game:1241, opponent : Player:johnny  rick Player id 2345 firstPlayer Player:mary secondPlayer Player:rick rounds[0]  moves : … , comments : …  rounds[1]  moves : … , actions : … , spell : …  2345 Game
  • 9. Luca Cabibbo  We consider here NoSQL database design – the problem of representing persistent data of an application in a target NoSQL database  NoSQL databases are claimed to be “schema-less”  however, the data of interest do show some structure, and decisions on the organization of data are required  specifically, to map application data to the modeling elements (collections, tables, documents, key-value pairs) available in the target datastore NoSQL Database Design 19 * NoSQL database design for next-generation web applications Luca Cabibbo  Consider a fictious online, web 2.0 game – e.g., some variant of Ruzzle – which should manage various application objects, including players, games, rounds, and moves NoSQL Database Design 20 A running example mary : Player username = "mary" firstName = "Mary" lastName = "Wilson" rick : Player username = "rick" firstName = "Ricky" lastName = "Doe" score = 42 2345 : Game id = 2345 firstPlayer secondPlayer : GameInfo games[0] gameopponent : GameInfo games[0] game opponent : Round : Round rounds[0] rounds[1] : Move : Move moves[0] moves[1] : Move moves[0] : GameInfo games[2] : GameInfo games[1] : GameInfo games[1]... ... ... ... ... ...
  • 10. Luca Cabibbo  Consider a fictious online, web 2.0 game – e.g., some variant of Ruzzle – which should manage various application objects, including players, games, rounds, and moves  assume for example that the target database is an extensible record store  what records (and tables) should we use?  a distinct record for each different application object?  or should we use each record to represent a group of related objects? what is the grouping criterion?  what columns should we use?  a distinct column for each object field?  or should we use each column to represent a group of related fields? what is the grouping criterion? NoSQL Database Design 21 A running example Luca Cabibbo  In NoSQL database design  decisions on the organization of data are required, in any case  these decisions are significant – as the data representation affects major quality requirements – such as scalability, performance, and consistency  a randomly chosen data representation may not satisfy the needed qualities  how should we make design decisions to indeed support the qualities of next-generation web applications? NoSQL Database Design 22 NoSQL database design
  • 11. Luca Cabibbo  We focus here on database design for next-generation web applications – also called scalable web applications, or scalable simple OLTP-style applications  foremost requirements of these applications  data of interest are large and have a flexible structure  data access is based on simple read-write operations  horizontal scalability – data should be distributed over a cluster of many servers  high availability and good response time  relaxed consistency guarantees – general ACID transactions are typically unnecessary – however, a certain degree of consistency is required, to easy application development – e.g., eventual consistency or BASE NoSQL Database Design 23 Next-generation web applications Luca Cabibbo  State-of-the-art in NoSQL database design  a lot of best practices and guidelines  but usually related to a specific datastore or class of datastores  neither a systematic methodology nor a high-level data model  as in the case of relational database design NoSQL Database Design 24 State of the art
  • 12. Luca Cabibbo  We propose the NoAM approach to NoSQL database design  tailored to the requirements of next-generation web applications  based on the NoAM abstract data model for NoSQL databases  a high level/system independent approach – the initial design activities are independent of any specific target systems  a NoAM abstract database is first used to represent the application data  the intermediate representation is then implemented in a target NoSQL datastore, taking into account its specific features NoSQL Database Design 25 * The NoAM approach to NoSQL database design Luca Cabibbo  The NoAM approach to NoSQL database design is based on the following main phases  aggregate design – to identify the various classes of aggregate objects needed in the application  this activity is driven by use cases (functional requirements) and scalability and consistency needs  aggregate partitioning – aggregates are partitioned into smaller data elements  driven by use cases and performance requirements  high-level NoSQL database design – aggregate are mapped to the NoAM intermediate data model  implementation – to map the intermediate representation to the specific modeling elements of the target datastore NoSQL Database Design 26 - Overview
  • 13. Luca Cabibbo  We start by considering application data objects… NoSQL Database Design 27 Application data mary : Player username = "mary" firstName = "Mary" lastName = "Wilson" rick : Player username = "rick" firstName = "Ricky" lastName = "Doe" score = 42 2345 : Game id = 2345 firstPlayer secondPlayer : GameInfo games[0] gameopponent : GameInfo games[0] game opponent : Round : Round rounds[0] rounds[1] : Move : Move moves[0] moves[1] : Move moves[0] : GameInfo games[2] : GameInfo games[1] : GameInfo games[1]... ... ... ... ... ... Luca Cabibbo  … we group them in aggregates (decisions needed!) … NoSQL Database Design 28 Aggregates mary : Player username = "mary" firstName = "Mary" lastName = "Wilson" rick : Player username = "rick" firstName = "Ricky" lastName = "Doe" score = 42 2345 : Game id = 2345 firstPlayer secondPlayer : GameInfo games[0] gameopponent : GameInfo games[0] game opponent : Round : Round rounds[0] rounds[1] : Move : Move moves[0] moves[1] : Move moves[0] : GameInfo games[2] : GameInfo games[1] : GameInfo games[1]... ... ... ... ... ...
  • 14. Luca Cabibbo  … we consider aggregates as complex-value objects… NoSQL Database Design 29 Aggregates as complex-value objects Player:mary :  username : "mary", firstName : "Mary", lastName : "Wilson", games : {  game : Game:2345, opponent : Player:rick ,  game : Game:2611, opponent : Player:ann  }  Player:rick :  username : "rick", firstName : "Ricky", lastName : "Doe", score : 42, games : {  game : Game:2345, opponent : Player:mary ,  game : Game:7425, opponent : Player:ann ,  game : Game:1241, opponent : Player:johnny  }  Game:2345 :  id : "2345", firstPlayer : Player:mary, secondPlayer : Player:rick, rounds : {  moves : … , comments : … ,  moves : … , actions : … , spell : …  }  Luca Cabibbo  … we partition these complex values (decisions needed!) … NoSQL Database Design 30 Aggregate partitioning Player:mary :  username : "mary", firstName : "Mary", lastName : "Wilson", games : {  game : Game:2345, opponent : Player:rick ,  game : Game:2611, opponent : Player:ann  }  Player:rick :  username : "rick", firstName : "Ricky", lastName : "Doe", score : 42, games : {  game : Game:2345, opponent : Player:mary ,  game : Game:7425, opponent : Player:ann ,  game : Game:1241, opponent : Player:johnny  }  Game:2345 :  id : "2345", firstPlayer : Player:mary, secondPlayer : Player:rick, rounds : {  moves : … , comments : … ,  moves : … , actions : … , spell : …  } 
  • 15. Luca Cabibbo  … and represent them into an abstract data model for NoSQL databases (consequence of decisions) … NoSQL Database Design 31 Data representation in NoAM username “mary” firstName “Mary” lastName “Wilson” games[0]  game : Game:2345, opponent : Player:rick  games[1]  game : Game:2611, opponent : Player:ann  mary username “rick” firstName “Ricky” lastName “Doe” score 42 games[0]  game : Game:2345, opponent : Player:mary  games[1]  game : Game:7425, opponent : Player:ann  games[2]  game : Game:1241, opponent : Player:johnny  rick Player id 2345 firstPlayer Player:mary secondPlayer Player:rick rounds[0]  moves : … , comments : …  rounds[1]  moves : … , actions : … , spell : …  2345 Game Luca Cabibbo  … and finally we map the intermediate representation to the data structures of the target datastore (the approach specifies how) NoSQL Database Design 32 Implementation username firstName lastName score games[0] games[1] games[2] … mary Mary Wilson {…} {…} rick Ricky Doe 42 {…} {…} {…} table Player id firstPlayer secondPlayer rounds[0] rounds[1] rounds[2] … 2345 Player:mary Player:rick {…} {…} table Game
  • 16. Luca Cabibbo  … and finally we map the intermediate representation to the data structures of the target datastore (the approach specifies how) NoSQL Database Design 33 Implementation key value /Player/mary/-/username mary /Player/mary/-/firstName Mary /Player/mary/-/lastName Wilson /Player/mary/-/games[0] { “game” : “Game:2345”, “opponent” : “Player:rick” } /Player/mary/-/games[1] { “game” : “Game:2611”, “opponent” : “Player:ann” } … … /Games/2345/-/id 2345 /Games/2345/-/firstPlayer Player:mary /Games/2345/-/secondPlayer Player:rick /Games/2345/-/rounds[0] { … } /Games/2345/-/rounds[1] { … } … … Luca Cabibbo  In our approach, we consider application data arranged in aggregates  the notion of aggregate comes from Domain-Driven Design (DDD) – a popular object-oriented design methodology – and from principles in the design of scalable applications  aggregate design affects scalability and the scope of atomic operations – and therefore, the ability to support relevant integrity constraints NoSQL Database Design 34 - Aggregates and aggregate design
  • 17. Luca Cabibbo  The design of scalable applications is discussed in a seminal paper by Pat Helland – Life beyond distributed transactions: an apostate’s opinion, CIDR 2007  data should be organized as a set of complex-value objects with unique identifiers – called entities or aggregates – each aggregate is a “chunk” of related data, and is intended to be a unit of data access and manipulation  aggregates should govern data distribution – aggregates are distributed among the nodes of the cluster, but each aggregate is located on a single node  atomic transactions can not span multiple aggregates – to avoid the coordination overhead required by distributed transactions  operations spanning multiple aggregates should be implemented as multiple operations, each over a single aggregate – using asynchronous messages and eventual consistency NoSQL Database Design 35 Design of scalable applications Luca Cabibbo  Domain-Driven Design (DDD, Eric Evans, 2003) is also based on a similar notion of aggregate – DDD gives us other insights on aggregate design  each aggregate is a group of application objects (entities and value objects) rooted in an entity  an entity is a persistence object that has independent existence and a unique identifier  a value object is a persistent object, without an own identifier  aggregate boundaries govern distribution and transactions  each aggregate should be  large enough, to accommodate all the data involved by some integrity constraints or other business rules  as small as possible, to reduce concurrency collisions – to support performance and scalability requirements NoSQL Database Design 36 Aggregates in Domain-driven design
  • 18. Luca Cabibbo  Aggregates in our running example are players and games  but rounds are not – to support an integrity constraint of the game NoSQL Database Design 37 Example mary : Player username = "mary" firstName = "Mary" lastName = "Wilson" rick : Player username = "rick" firstName = "Ricky" lastName = "Doe" score = 42 2345 : Game id = 2345 firstPlayer secondPlayer : GameInfo games[0] gameopponent : GameInfo games[0] game opponent : Round : Round rounds[0] rounds[1] : Move : Move moves[0] moves[1] : Move moves[0] : GameInfo games[2] : GameInfo games[1] : GameInfo games[1]... ... ... ... ... ... Luca Cabibbo  At the application level, data is organized in aggregates  each aggregate object is can be considered a complex-value object, with a unique identifier  a set of aggregate objects is a class  a set of class form an application dataset NoSQL Database Design 38 Application data model Player:mary :  username : "mary", firstName : "Mary", lastName : "Wilson", games : {  game : Game:2345, opponent : Player:rick ,  game : Game:2611, opponent : Player:ann  }  Player:rick :  username : "rick", firstName : "Ricky", lastName : "Doe", score : 42, games : {  game : Game:2345, opponent : Player:mary ,  game : Game:7425, opponent : Player:ann ,  game : Game:1241, opponent : Player:johnny  }  Game:2345 :  id : "2345", firstPlayer : Player:mary, secondPlayer : Player:rick, rounds : {  moves : … , comments : … ,  moves : … , actions : … , spell : …  } 
  • 19. Luca Cabibbo  To summarize, aggregates have the following characteristics  an aggregate is a complex-value object  each aggregate is a unit of data access and atomic manipulation  aggregates govern data distribution  In NoSQL database design, we should map each aggregate to a data modeling element having analogous features NoSQL Database Design 39 Aggregates in NoSQL db design Luca Cabibbo  In NoSQL database design, we should map each aggregate to a data modeling element having analogous features  each aggregate should be mapped to a unit of data access, atomic manipulation, and distribution  therefore, a record/row, a document, or a group of related key-value pairs – that is, a NoAM block  classes of aggregates can then be mapped to NoAM collections  the role for columns, document fields, or individual key-value pairs (i.e., NoAM entries) has to be discussed  we would like to abstract from the features of specific datastores – NoAM enables us to do so NoSQL Database Design 40 Aggregates in NoSQL db design
  • 20. Luca Cabibbo  An application dataset can be represented in NoAM as follows  the application dataset is represented by a NoAM database  each class of aggregates is represented by a collection  the class name is used as collection name  each aggregate object is represented by a block  the aggregate identifier is used as block key  each aggregate object is represented by one or more entries in the corresponding block  the complex value of the aggregate object is partitioned into one or more entry values NoSQL Database Design 41 Representing aggregates in NoAM Luca Cabibbo Fare clic per modificare lo stile del titolo Luca Cabibbo NoSQL Database Design 42 Representing aggregates in NoAM Player:mary :  username : "mary", firstName : "Mary", lastName : "Wilson", games : {  game : Game:2345, opponent : Player:rick ,  game : Game:2611, opponent : Player:ann  }  Player:rick :  username : "rick", firstName : "Ricky", lastName : "Doe", score : 42, games : {  game : Game:2345, opponent : Player:mary ,  game : Game:7425, opponent : Player:ann ,  game : Game:1241, opponent : Player:johnny  }  Game:2345 :  id : "2345", firstPlayer : Player:mary, secondPlayer : Player:rick, rounds : {  moves : … , comments : … ,  moves : … , actions : … , spell : …  }  … … … … … … mary … … … … … … … … rick Player … … … … … … 2345 Game
  • 21. Luca Cabibbo Fare clic per modificare lo stile del titolo Luca Cabibbo NoSQL Database Design 43 Example: partitioning of aggregates Player:mary :  username : "mary", firstName : "Mary", lastName : "Wilson", games : {  game : Game:2345, opponent : Player:rick ,  game : Game:2611, opponent : Player:ann  }  Player:rick :  username : "rick", firstName : "Ricky", lastName : "Doe", score : 42, games : {  game : Game:2345, opponent : Player:mary ,  game : Game:7425, opponent : Player:ann ,  game : Game:1241, opponent : Player:johnny  }  Game:2345 :  id : "2345", firstPlayer : Player:mary, secondPlayer : Player:rick, rounds : {  moves : … , comments : … ,  moves : … , actions : … , spell : …  }  Luca Cabibbo Fare clic per modificare lo stile del titolo Luca Cabibbo NoSQL Database Design 44 Example: aggregates in NoAM username “mary” firstName “Mary” lastName “Wilson” games[0]  game : Game:2345, opponent : Player:rick  games[1]  game : Game:2611, opponent : Player:ann  mary username “rick” firstName “Ricky” lastName “Doe” score 42 games[0]  game : Game:2345, opponent : Player:mary  games[1]  game : Game:7425, opponent : Player:ann  games[2]  game : Game:1241, opponent : Player:johnny  rick Player id 2345 firstPlayer Player:mary secondPlayer Player:rick rounds[0]  moves : … , comments : …  rounds[1]  moves : … , actions : … , spell : …  2345 Game
  • 22. Luca Cabibbo  In representing an aggregate object in NoAM, we use one or more entries – to partition the complex value of the aggregate  aggregate partitioning affects performance of data access and manipulation operations  this partitioning can be based on  basic (predefined) data representation strategies  custom data representations NoSQL Database Design 45 - Aggregate partitioning Luca Cabibbo  Entry per Aggregate Object (EAO)  an aggregate object is represented by a single entry  the entry value is the whole complex value – the entry key is empty NoSQL Database Design 46 Entry per Aggregate Object (EAO)   username : "mary", firstName : "Mary", lastName : "Wilson", games : {  game : Game:2345, opponent : Player:rick ,  game : Game:2611, opponent : Player:ann  }  mary
  • 23. Luca Cabibbo  Entry per Top-level Field (ETF)  an aggregate object is represented by multiple entries – a distinct entry for each top-level field of the complex value  the entry value is the field value – the entry key is the field name NoSQL Database Design 47 Entry per Top-level Field (ETF) username “mary” firstName “Mary” lastName “Wilson” games {  game : Game:2345, opponent : Player:rick ,  game : Game:2611, opponent : Player:ann  } mary Luca Cabibbo  Entry per Atomic Value (EAV)  an aggregate object is represented by multiple entries – a distinct entry for each atomic value in the complex value  the entry value is the atomic value – the entry key is the “access path” to the atomic value NoSQL Database Design 48 Entry per Atomic Value (EAV) username “mary” firstName “Mary” lastName “Wilson” games[0].game Game:2345 games[0].opponent Player:rick games[1].game Game:2611 games[1].opponent Player:ann mary
  • 24. Luca Cabibbo  The basic data representation strategies can be suited in some cases – but we often need to partition aggregates in custom ways  aggregate partitioning can be driven by data access operations – since it affects the performance of database operations  each element of a partition (i.e., an entry) can represent either a scalar value or a complex value – the usage of “entries” with a complex value is a common practice in NoSQL datastores – e.g., Protocol Buffers, Avro schemas NoSQL Database Design 49 Custom aggregate partitioning Luca Cabibbo  Guidelines for aggregate partitioning – adapted from Conceptual Database Design (Batini, Ceri, Navathe, 1992)  if an aggregate is small in size, or all or most of its data are accessed or modified together – then it should be represented by a single entry  if an aggregate is large in size, and there are operations that frequently access or modify only specific portions of the aggregate – then it should be represented by multiple entries  if two or more data elements are frequently accessed or modified together – then they should belong to the same entry  if two or more data elements are usually accessed or modified separately – then they should belong to distinct entries NoSQL Database Design 50 Guidelines for aggregate partitioning
  • 25. Luca Cabibbo  Operations for our online game 1. when a player connects to the application – the aggregate for the player should be retrieved 2. when a player selects a game to continue – the aggregate for the game should be retrieved 3. when a player completes a round for a game – the aggregate for the game should be updated, by adding the new round 4. when a player invites a friend for playing a new game – an aggregate for a new game should be created, and the aggregate for the opponent players should be updated, by adding the new game  For example, what does operation 3 suggest?  each round should be represented using a distinct entry of the corresponding game aggregate NoSQL Database Design 51 Aggregate partitioning: Example Luca Cabibbo Fare clic per modificare lo stile del titolo Luca Cabibbo NoSQL Database Design 52 Aggregate partitioning: Example username “mary” firstName “Mary” lastName “Wilson” games[0]  game : Game:2345, opponent : Player:rick  games[1]  game : Game:2611, opponent : Player:ann  mary Player id 2345 firstPlayer Player:mary secondPlayer Player:rick rounds[0]  moves : … , comments : …  rounds[1]  moves : … , actions : … , spell : …  2345 Game
  • 26. Luca Cabibbo  NoAM defines a language to specify aggregate partitioning – and therefore, data representations  the language can be used to describe or document a certain aggregate partitioning  more importantly, it can be used in a mapping system  the database designer uses the language to specify a data representation – in a system-independent way  the mapping framework interprets the specification – to represent aggregates in the specific target datastore and to handle operations over them  The language has an XPath-like syntax – and we illustrate it by means of examples NoSQL Database Design 53 - A language for data representations Luca Cabibbo  Rule /*/* specifies strategy Entry per Aggregate Object (EAO)  the first * matches with aggregate classes  the second * matches with aggregate identifiers  the rule means “use an entry for each distinct aggregate class and distinct aggregate identifier” NoSQL Database Design 54 The language – by examples   username : "mary", firstName : "Mary", lastName : "Wilson", games : {  game : Game:2345, opponent : Player:rick ,  game : Game:2611, opponent : Player:ann  }  mary
  • 27. Luca Cabibbo  Rule /*/*/* specifies strategy Entry per Top-level Field (ETF)  the third * matches with top-level fields of aggregates  the rule means “use an entry for each distinct aggregate class, aggregate identifier, and top-level field” NoSQL Database Design 55 The language – by examples username “mary” firstName “Mary” lastName “Wilson” games {  game : Game:2345, opponent : Player:rick ,  game : Game:2611, opponent : Player:ann  } mary Luca Cabibbo  A data representation is specified by a sequence of rules  /Player/*/* – “use ETF for players”  /Game/* – “use EAO for games” NoSQL Database Design 56 The language – by examples username “mary” firstName “Mary” lastName “Wilson” games {  game : Game:2345, opponent : Player:rick ,  game : Game:2611, opponent : Player:ann  } mary Player   id : 2345, firstPlayer : Player:mary, secondPlayer : Player:rick, rounds : {  moves : … , comments : … ,  moves : … , actions : … , spell : …  }  2345 Game
  • 28. Luca Cabibbo  It is possible to have more rules over a same aggregate class  /Player/*/games[*] – “use an entry for each game played by a player”  /Player/*/* – “use ETF for the remaining data of each player” NoSQL Database Design 57 The language – by examples username “mary” firstName “Mary” lastName “Wilson” games[0]  game : Game:2345, opponent : Player:rick  games[1]  game : Game:2611, opponent : Player:ann  mary Luca Cabibbo  It is possible to have more rules over a same aggregate class  /Player/*/games[*] – “use an entry for each game played by a player”  /Player/* – “use EAO for the remaining data of each player” NoSQL Database Design 58 The language – by examples   username : “mary”, firstName : “Mary”, lastName : “Wilson”  games[0]  game : Game:2345, opponent : Player:rick  games[1]  game : Game:2611, opponent : Player:ann  mary
  • 29. Luca Cabibbo  In the implementation phase, we map the intermediate data representation to the specific data modeling elements of the target NoSQL datastore  given that the NoAM data model generalizes the features of the various systems, while keeping their major aspects, it is rather straightforward to perform this activity  Please note that the implementation takes also care of mapping operations – specifically, CRUD operations (create, read, update, delete) over aggregate objects to specific data access operations  we do not discuss this issue here  please find more details in the references NoSQL Database Design 59 - Implementation Luca Cabibbo  Oracle NoSQL is a key-value store – a database is a collection of key-value pairs  values are binary strings, opaque to the datastore  a key is composed of two parts  the major key is a non-empty sequence of strings  the minor key is a (possibly-empty) sequence of strings  e.g, /Player/mary/-/username  the major key controls data distributions – key-value pairs having the same major key are allocated in a same node  atomic operations on individual key-value pairs – but also on groups of key-value pairs having the same major key NoSQL Database Design 60 Oracle NoSQL: Implementation
  • 30. Luca Cabibbo  Mapping from NoAM to Oracle NoSQL  a key-value pair for each entry  the major key is composed of  the collection name  the block key (i.e., the aggregate identifier)  the minor key represents the entry key (i.e., an access path)  the value represents the entry value  it can be either a simple value, or  the serialization of a complex value – e.g., in JSON NoSQL Database Design 61 Oracle NoSQL: Implementation Luca Cabibbo Fare clic per modificare lo stile del titolo Luca Cabibbo NoSQL Database Design 62 Oracle NoSQL: Implementation key value /Player/mary/- { “username” : “mary”, “firstName” : “Mary”, “lastName” : “Wilson”, “games” : […] } /Player/rick/- { “username” : “rick”, “firstName” : “Ricky”, lastName : “Doe”, “score” : “42”, “games” : […] } /Game/2345/- { “id” : “2345”, “firstPlayer” : “Player:mary”, “secondPlayer” : “Player:rick”, “rounds” : […] }   username : "mary", firstName : "Mary", lastName : "Wilson", games : {  game : Game:2345, opponent : Player:rick ,  game : Game:2611, opponent : Player:ann  }  mary Player
  • 31. Luca Cabibbo Fare clic per modificare lo stile del titolo Luca Cabibbo NoSQL Database Design 63 Oracle NoSQL: Implementation key value /Player/mary/-/username mary /Player/mary/-/firstName Mary /Player/mary/-/lastName Wilson /Player/mary/-/games[0] { “game” : “Game:2345”, “opponent” : “Player:rick” } /Player/mary/-/games[1] { “game” : “Game:2611”, “opponent” : “Player:ann” } … … username “mary” firstName “Mary” lastName “Wilson” games[0]  game : Game:2345, opponent : Player:rick  games[1]  game : Game:2611, opponent : Player:ann  mary Player Luca Cabibbo  MongoDB is a document store – a database is a set of documents  each document has a complex value and an identifier, and documents are organized in collections  Mapping from NoAM to MongoDB  a document collection for each NoAM collection (aggregate class)  a main document for each block (aggregate)  a top-level field for each entry  the special _id field for the block key (aggregate identifier)  atomic operations on individual documents – or on their fields NoSQL Database Design 64 MongoDB: Implementation
  • 32. Luca Cabibbo Fare clic per modificare lo stile del titolo Luca Cabibbo NoSQL Database Design 65 MongoDB: Implementation username “mary” firstName “Mary” lastName “Wilson” games[0]  game : Game:2345, opponent : Player:rick  games[1]  game : Game:2611, opponent : Player:ann  mary Player id document mary { “_id” : “mary”, “username” : “mary”, “firstName” : “Mary”, “lastName” : “Wilson”, “games[0]” : { “game” : “Game:2345”, “opponent” : “Player:rick” }, “games[1]” : { “game” : “Game:2611”, “opponent” : “Player:ann” } } collection Player Luca Cabibbo  A different implementation  reconstruct structure of complex values NoSQL Database Design 66 MongoDB: Alternative implementation username “mary” firstName “Mary” lastName “Wilson” games[0]  game : Game:2345, opponent : Player:rick  games[1]  game : Game:2611, opponent : Player:ann  mary Player id document mary { “_id” : “mary”, “username” : “mary”, “firstName” : “Mary”, “lastName” : “Wilson”, “games” : [ { “game” : “Game:2345”, “opponent” : “Player:rick” }, { “game” : “Game:2611”, “opponent” : “Player:ann” } ] } collection Player
  • 33. Luca Cabibbo  Amazon DynamoDB is an extensible record store  a database is a set of tables  each table is a set of items  each item contains a set of attributes, each with a name and a value  each table has a primary key – composed of a hash partition attribute and an optional range attribute  the partition attribute controls distribution of items  atomic operations on individual items – or on their columns  Mapping from NoAM to DynamoDB  a table for each collection (aggregate class)  an item for each block (aggregate) – whose primary key is the block key (aggregate identifier)  an attribute for each entry NoSQL Database Design 67 DynamoDB: Implementation Luca Cabibbo Fare clic per modificare lo stile del titolo Luca Cabibbo NoSQL Database Design 68 DynamoDB: Implementation username “mary” firstName “Mary” lastName “Wilson” games[0]  game : Game:2345, opponent : Player:rick  games[1]  game : Game:2611, opponent : Player:ann  mary Player username firstName lastName score games[0] games[1] games[2] … mary Mary Wilson {…} {…} rick Ricky Doe 42 {…} {…} {…} table Player id firstPlayer secondPlayer rounds[0] rounds[1] rounds[2] … 2345 Player:mary Player:rick {…} {…} table Game
  • 34. Luca Cabibbo  NoAM (NoSQL Abstract Model) is a high-level approach to NoSQL database design for next-generation web applications  a high-level approach  initial design activities are independent of any specific target systems  it is based on NoAM  NoAM is an intermediate, abstract data model for NoSQL databases – which exploits the commonalities of their various data models – but also introduces abstractions to balance their differences and variations NoSQL Database Design 69 - Conclusion (NoAM) Luca Cabibbo  Open issues  NoAM data model  other abstractions are needed to represent further data modeling elements available in NoSQL datastores  further abstractions related to relevant metadata – e.g., versions and timestamps, to support concurrency control and consistency management  derived data and materialized views  so far, we have assumed that data is represented in a non- redundant way – some redundancy is usually suggested in NoSQL databases, to improve performance – but note that view maintenance could affect consistency negatively  support to multi-aggregate transactions is required NoSQL Database Design 70 Conclusion (NoAM data model)
  • 35. Luca Cabibbo  Open issues  NoAM approach  the proposed guidelines can propose conflicting suggestions – therefore, the application of the approach might result in a number of candidate data representations, rather than to a single one  tools can help the designer to assess a preferred solution  NoSQL database design for different settings  for example, to support query-intensive applications and analytical queries NoSQL Database Design 71 Conclusion (NoAM approach) Luca Cabibbo  ONDM (Object-NoSQL Datastore Mapper) is a framework that provides application developers with  a uniform access towards a variety of NoSQL datastores  the ability to map application data to different data representations, in a flexible way  Main features of ONDM  object-oriented API, based on Java Persistence API (JPA)  transparent access to various NoSQL datastores – such as Oracle NoSQL, Redis, MongoDB, CouchBase, and Cassandra  internal representation based on NoAM  flexible data representations – based on the NoAM language for data representations NoSQL Database Design 72 * ONDM (Object-NoSQL Datastore Mapper)
  • 36. Luca Cabibbo  A layered architecture  API – based on JPA, offers CRUD operations to manipulate aggregates  internal aggregate manager – conversion between aggregate objects and an internal representation (JSON) – cache mgmt  data representation manager – in NoAM, wrt the specified data representation  datastore adapters – conversion between NoAM and specific data structures and operations Application create, read, update, delete  (of application objects) Internal  Representation { "id" : "2345", "roundCount" : "2", "firstPlayer" : "Player:mary", "secondPlayer" : "Player:rick", "rounds" : [...] } Game:2345 JSON Java put, get, delete  (of complex‐value objects) Data Representation NoAM Game : 2345 id : 2345 roundCount : 2 firstPlayer : Player:mary secondPlayer : Player:rick rounds[0] :  moves:…, comments:...  rounds[1] :  moves:…, comments:...  Oracle NoSQL Datastore Oracle NoSQL Datastore Adapter Oracle NoSQL API operations DynamoDB Datastore DynamoDB Datastore Adapter DynamoDB API operations id rc firstPlayer secondPlayer rounds[0] 2345 2 Player:mary Player:rick ... rounds[1] ... Game/2345/‐/id Game/2345/‐/roundCount Game/2345/‐/firstPlayer Game/2345/‐/secondPlayer Game/2345/‐/rounds[0] Game/2345/‐/rounds[1] 2345 2 Player:mary Player:rick { moves:[…], comments:[…] } { moves:[…], comments:[…] } ... /Player/*/games[*] /Player/* /Game/*/rounds[*] /Game/*/* Data Representation  Language Cache put, get, delete,  multiPut, multiGet, multiDelete (of entries and blocks) NoSQL Database Design 73 - Architecture of ONDM Luca Cabibbo  The database design activity can result in a number of candidate data representations – rather than to a single one  consider again the operations for our online game 2. when a player selects a game to continue – the aggregate for the game should be retrieved 3. when a player completes a round for a game – the aggregate for the game should be updated, by adding the new round  operations 2 and 3 suggest different choices for the representation of rounds – (i) all together in a single entry or (ii) using a distinct entry for each round  In this case, experiments are needed to assess the most suitable design solution – and ONDM can help in performing them  an important feature is the ability to select a desired data representation in a declarative way – using the NoAM language for data representations NoSQL Database Design 82 * A case study in NoSQL db design
  • 37. Luca Cabibbo  To decide between the various candidate representations, a few experiments can help  the target datastore is Oracle NoSQL (single node)  three candidate representations  an entry for a whole game – EAO  /Game/*/rounds[*] + /Game/*/* – Rounds+ETF  /Game/*/rounds[*] + /Game/* – Rounds+EAO  various workloads  game retrieval  round addition  mixed – 80% game retrievals + 20% round additions  each game is 8kb, each round is 0.5kb  database size is in GB, timings are ms per operation NoSQL Database Design 83 A case study in NoSQL db design Luca Cabibbo Fare clic per modificare lo stile del titolo Luca Cabibbo NoSQL Database Design 84 A case study in NoSQL db design 0 1 2 3 4 5 6 7 8 2 4 6 8 10 12 14 16 Game Retrieval EAO Rounds+ETF Rounds+EAO 0 0,5 1 1,5 2 2,5 3 3,5 4 2 4 6 8 10 12 14 16 Round Addition EAO Rounds+ETF Rounds+EAO 0 1 2 3 4 5 6 7 8 2 4 6 8 10 12 14 16 Mixed Load (80/20) EAO Rounds+ETF Rounds+EAO
  • 38. Luca Cabibbo  The experiments show that aggregate partitioning has indeed impact on the performance of the various operations  in general, when using a NoSQL database, decisions on the organization of data are required  these decisions are significant – as the data representation affects major quality requirements – such as scalability, performance, and consistency NoSQL Database Design 85 - Conclusion (case study)