10: Taxonomy of Data and Storage
Zubair Nabi
zubair.nabi@itu.edu.pk
April 20, 2013
Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 1 / 27
Outline
1 Datasets
2 Storage
3 Beyond RDBMS
4 NoSQL Taxonomy
5 NewSQL
Introduction
Data is everywhere and is the driving force behind our lives
The address book on your phone is data
So is the newspaper that you read every morning
Everything you see around you is a potential source of data which
might be useful for a certain application
We use this data to share information and make a more informed
decision about different events
Datasets can easily be classified on the basis of their structure
1 Structured
2 Unstructured
3 Semi-structured
Structured Data
Formatted in a universally understandable and identifiable way
In most cases, structured data is formally specified by a schema
Your phone's address book is structured because it has a schema
consisting of name, phone number, address, email address, etc.
Most traditional databases contain structured data revolving around
data laid out across columns and rows
Each field also has an associated type
Possible to search for items based on their data types
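As a minimal sketch of a schema with typed fields, the address-book example could be written as a Python dataclass. The class name `Contact`, its field list (including `age`), and the helper `fields_of_type` are illustrative assumptions, not part of the lecture.

```python
from dataclasses import dataclass, fields


@dataclass
class Contact:
    """Hypothetical address-book schema: every record has the same typed fields."""
    name: str
    phone: str
    address: str
    email: str
    age: int


def fields_of_type(schema, wanted):
    """Return the names of all fields in `schema` declared with type `wanted`."""
    return [f.name for f in fields(schema) if f.type is wanted]
```

Because every field carries a declared type, a lookup such as `fields_of_type(Contact, int)` can be answered from the schema alone, without inspecting any record.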
Unstructured Data
Data without any conceptual definition or type
Can vary from raw text to binary data
Processing unstructured data requires parsing and tagging on the fly
In most cases, consists of simple log files
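Parsing and tagging on the fly can be sketched with a regular expression applied to each raw log line. The log format and the name `tag_line` are assumptions made for illustration; real logs vary widely.

```python
import re

# Hypothetical log format: "<date> <time> <LEVEL> <free-form message>"
LOG_LINE = re.compile(r"^(\S+) (\S+) ([A-Z]+) (.*)$")


def tag_line(line):
    """Parse one raw log line into tagged fields, or return None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    date, time, level, message = match.groups()
    return {"date": date, "time": time, "level": level, "message": message}
```

The structure exists only in the parser: a different application could tag the same line with a different pattern.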
Semi-structured Data
Occupies the middle of the spectrum between structured and unstructured
data
For instance, while binary data has no structure, audio and video files
have meta-data which has structure, such as author, time of creation,
etc.
Can also be labelled as self-describing structure
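One way to picture self-describing structure is a pair of JSON metadata records, each of which names its own fields. The records and the helper `common_fields` below are made up for illustration.

```python
import json

# Two hypothetical media metadata records. Each carries its own field names
# (self-describing), and the records need not share the same set of fields.
records = [
    '{"author": "A. Khan", "created": "2013-04-20", "codec": "mp3"}',
    '{"author": "B. Ali", "duration_s": 95}',
]


def common_fields(raw_records):
    """Return the sorted field names present in every record."""
    parsed = [json.loads(r) for r in raw_records]
    shared = set(parsed[0])
    for rec in parsed[1:]:
        shared &= set(rec)
    return sorted(shared)
```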
Database Management Systems (DBMS)
Used to store and manage data
Support for large amounts of data
Ensure concurrency, sharing, and locking
Security is also needed, to enable fine-grained access control
Ability to keep working in the face of failure
Relational Database Management Systems (RDBMS)
The most popular and predominant storage system in use
Data in different files is connected by using a key field
Data is laid out in different tables, with a key field that identifies each
row
The same key field is used to connect one table to another
For instance, a relation might have customer ID as key and her details
as data; another table might have the same key but different data, say
her purchases; yet another table with the same key might have a
breakdown of her preferences
Examples include Oracle Database, MS SQL Server, MySQL, IBM
DB2, and Teradata
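The customer example can be sketched with Python's built-in `sqlite3` module. The table names, column names, and sample rows are assumptions chosen for illustration; SQLite stands in for any of the systems listed above.

```python
import sqlite3


def build_demo_db():
    """Create two in-memory tables that share the customer_id key field."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE purchases (customer_id INTEGER, item TEXT);
        INSERT INTO customers VALUES (1, 'Ayesha');
        INSERT INTO customers VALUES (2, 'Sara');
        INSERT INTO purchases VALUES (1, 'laptop');
        INSERT INTO purchases VALUES (1, 'mouse');
        INSERT INTO purchases VALUES (2, 'phone');
    """)
    return conn


def purchases_by_customer(conn, customer_id):
    """Connect the two tables through the shared key and list one customer's items."""
    return conn.execute(
        "SELECT c.name, p.item FROM customers AS c "
        "JOIN purchases AS p ON p.customer_id = c.customer_id "
        "WHERE c.customer_id = ? ORDER BY p.item",
        (customer_id,),
    ).fetchall()
```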
Structured Query Language (SQL)
Non-procedural language used for data retrieval and manipulation in
RDBMS
Adds a layer of abstraction over relational algebra, which enables set
operations, selections, etc.
Due to its declarative nature, users operate in terms of their expected
output while the underlying system decides the actual query execution
plan
Instructions consist of a specific SQL statement and additional
parameters and operands
For instance, the SELECT statement retrieves certain records, INSERT
adds a record, and so on
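A small sketch of the two statements named above, again using `sqlite3` as a stand-in engine. The `contacts` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, city TEXT)")

# INSERT adds records; the "?" placeholders are the statement's parameters.
conn.executemany(
    "INSERT INTO contacts (name, city) VALUES (?, ?)",
    [("Ali", "Lahore"), ("Sana", "Karachi"), ("Omar", "Lahore")],
)

# SELECT states what is wanted (names in Lahore, sorted by name);
# the engine decides how to scan, filter, and sort.
lahore_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM contacts WHERE city = ? ORDER BY name", ("Lahore",)
    )
]
```

The query never says which access path to use, which is the declarative property described above.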
RDBMS and Structured Data
As structured data follows a predefined schema, it naturally maps on to
a relational database system
The schema defines the type and structure of the data and its relations
Schema design is an arduous process and needs to be done before
the database can be populated
Another consequence of a strict schema is that it is non-trivial to
extend it
For instance, adding a new attribute to an existing row necessitates
adding a new column to the entire table
Extremely suboptimal in tables with millions of rows
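The effect of extending a strict schema can be sketched as follows. The `loyalty_tier` attribute is hypothetical, and the actual cost of `ALTER TABLE` depends on the engine; the sketch only shows that the new column applies to every row, not just the one that needs it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Ali",), ("Sana",)])

# Only one customer has a loyalty tier, but the column is added to the whole table.
conn.execute("ALTER TABLE customers ADD COLUMN loyalty_tier TEXT")
conn.execute("UPDATE customers SET loyalty_tier = 'gold' WHERE name = 'Ali'")

rows = conn.execute(
    "SELECT name, loyalty_tier FROM customers ORDER BY customer_id"
).fetchall()
```

Every row that lacks the attribute now carries a NULL in the new column.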
RDBMS and Semi- and Un-structured Data
Unstructured data has no notion of schema while semi-structured data
only has a weak one
Data within such datasets also has an associated type
In fact, types are application-centric: It might be possible to interpret a
field as a float in one application and as a string in another
While it is possible, with human intervention, to glean structure from
unstructured data, it is an extremely expensive task
Structureless data generated by real-time sources can change the
number of attributes and their types on the fly
An RDBMS would require a schema change, or even a new table, each time
such a change takes place
Therefore, unstructured and semi-structured data does not fit the
relational model
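The point about application-centric types can be illustrated with one raw field read by two hypothetical applications. The value and both function names are assumptions.

```python
raw_field = "3.50"  # hypothetical value as it arrives from an untyped source


def as_price(value):
    """One application interprets the field as a floating-point number."""
    return float(value)


def as_version_label(value):
    """Another application keeps the same characters as an opaque string."""
    return "v" + value
```

Neither reading is wrong; the type lives in the application rather than in the data.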
Motivation
Different semantics:
RDBMS provide ACID semantics:
1 Atomic: A transaction either succeeds entirely or fails entirely
2 Consistent: Data within the database remains consistent after each
transaction
3 Isolated: Transactions are sandboxed from each other
4 Durable: Transactions are persistent across failures and restarts
Overkill in case of most user-facing applications
Most such applications care more about availability and are willing to
relax strict consistency in favour of eventual consistency
This basically available, soft state, eventually consistent (BASE) model
enables applications to function even in the face of partial failure
High Throughput: Most NoSQL databases sacrifice consistency for
availability, leading to higher throughput (in some cases by an order of
magnitude)
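Atomicity, the first ACID property above, can be sketched with a `sqlite3` transaction. The `accounts` table, the `CHECK` constraint, and the `transfer` helper are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {name: balance} for all accounts."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

When the debit would drive a balance negative, the constraint fails and the earlier credit is rolled back with it, so no partial transfer is ever visible.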
Motivation (2)
Horizontal Scalability: To cater for more data, NoSQL stores can be
scaled out by just adding more machines; the underlying system
automatically re-distributes the data
Commodity Hardware: A large number of RDBMS require specialized
and proprietary hardware for operation. In contrast, NoSQL databases
function over commodity off-the-shelf hardware
Programming Language Support: Over the years programming
languages have started providing abstractions for database support
(LINQ, etc.) while bypassing SQL. NoSQL databases provide
abstractions that directly map onto the language abstractions leading
to tighter coupling
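The redistribution behind horizontal scalability can be sketched with simple hash partitioning. This is an assumption-laden toy: it uses modulo placement for brevity, whereas production systems commonly use consistent hashing or range partitioning so that fewer keys move when a node is added. The names `node_for` and `placement` are hypothetical.

```python
import hashlib


def node_for(key, num_nodes):
    """Map a key to a node index with a stable hash (modulo placement)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def placement(keys, num_nodes):
    """Return {node_index: [keys]} for a cluster of the given size."""
    layout = {n: [] for n in range(num_nodes)}
    for key in keys:
        layout[node_for(key, num_nodes)].append(key)
    return layout
```

Calling `placement(keys, 3)` and then `placement(keys, 4)` shows the same keys spread over one more machine, with the system, not the application, deciding where each key lives.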
Motivation (3)
The Rise of Cloud Computing: Cloud Computing applications require
horizontal scalability and low administration overhead. Both
requirements are naturally satisfied by NoSQL stores
NoSQL Taxonomy: Introduction
NoSQL databases can be classified on the basis of:
1 Data Model: How data is represented
2 Scalability: How scalable the system is
3 Query Model: What type of API it exposes
4 Persistence: How persistent the data is
Classification by Data Model
Based on the data model, NoSQL databases can roughly be divided
into three categories:
1 Key/value Stores: A map/dictionary allowing put/get semantics per
key
2 Document Stores: Complex data structures to encapsulate document
key/value pairs
3 Column-Oriented Stores: Data laid out by column
Key/value Stores
Data is stored within a large hash map
Simple get/put API
Favour scalability over consistency
Limit on the size of the key
Examples include Amazon’s Dynamo, LinkedIn’s Voldemort, Redis,
and Memcached
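The get/put interface and the key-size limit can be sketched as a toy in-memory class. The class name and the 64-byte default limit are assumptions; this is not the API of any of the systems named above.

```python
class KeyValueStore:
    """Toy in-memory key/value store with a get/put API and a key-size limit."""

    def __init__(self, max_key_bytes=64):  # the limit value is an assumption
        self._data = {}  # a plain dict plays the role of the large hash map
        self._max_key_bytes = max_key_bytes

    def put(self, key, value):
        """Store `value` under `key`, rejecting keys that exceed the size limit."""
        if len(key.encode("utf-8")) > self._max_key_bytes:
            raise ValueError("key too long")
        self._data[key] = value

    def get(self, key, default=None):
        """Return the value stored under `key`, or `default` if it is absent."""
        return self._data.get(key, default)
```

The store never looks inside a value, which is what keeps the interface this small.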
Document Stores
Key/value semantics but based on documents
A document encapsulates data in a standard format, such as JSON,
XML, PDF, etc.
Documents themselves can be heterogeneous
Documents can also be retrieved based on their content
Examples include Apache CouchDB and MongoDB
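A toy document store shows how it differs from a plain key/value store: the documents are parsed, so they can be heterogeneous and still be retrieved by content. The class and its `find` method are illustrative assumptions and do not mirror the CouchDB or MongoDB APIs.

```python
import json


class DocumentStore:
    """Toy document store: key/value access plus lookup by document content."""

    def __init__(self):
        self._docs = {}

    def put(self, doc_id, json_text):
        """Parse and store a JSON object (assumed to be an object, not a list)."""
        self._docs[doc_id] = json.loads(json_text)

    def get(self, doc_id):
        """Return the parsed document for `doc_id`, or None if it is absent."""
        return self._docs.get(doc_id)

    def find(self, field, value):
        """Return sorted ids of documents whose top-level `field` equals `value`."""
        return sorted(
            doc_id
            for doc_id, doc in self._docs.items()
            if doc.get(field) == value
        )
```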
Column-Oriented Stores
Data is stored and processed by column
Useful for read-mostly and read-intensive data
Data within the same column is of the same type enabling
opportunities for efficient compression
Columns are stored separately so they can be loaded in parallel
Examples include Google’s Bigtable (Apache HBase is its open source
clone) and Apache Cassandra (originally developed at Facebook)
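The compression opportunity in a column layout can be sketched by transposing a few rows and run-length encoding one column. The sample rows are made up, and run-length encoding is just one simple scheme chosen for illustration.

```python
from itertools import groupby

# Hypothetical row-oriented data: (name, city)
rows = [
    ("Ali", "Lahore"),
    ("Sana", "Lahore"),
    ("Omar", "Lahore"),
    ("Hina", "Karachi"),
]


def to_columns(records):
    """Transpose row tuples into one list per column."""
    return [list(col) for col in zip(*records)]


def run_length_encode(column):
    """Compress consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]
```

Values of the same type sit next to each other in a column, so repeats collapse well; in a row layout the same repeats are separated by the other fields.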
NewSQL: Introduction
A hybrid of traditional RDBMS and NoSQL
Scalability and performance of NoSQL and ACID guarantees of RDBMS
Use SQL as the primary language
Ability to scale out and run over commodity hardware
Classified into:
1 New Databases: Designed from scratch
2 New MySQL Storage Engines: Keep MySQL as interface but replace
the storage engine
3 Transparent Clustering: Add pluggable features to existing databases
to ensure scalability
New Databases
1 Query Distribution:
Each node holds a subset of the data
Queries are split and shipped to the nodes that own the data
Examples include Google’s Spanner and NuoDB
2 Pull Data:
A central node (possibly replicated) holds all data
A set of processing nodes receives queries and pulls in the required data from the central node
Examples include VMware’s SQLFire
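The two architectures above can be contrasted with a toy in-memory sketch. It is an illustration under stated assumptions, not how Spanner, NuoDB, or SQLFire are implemented: the modulo placement rule in `owner`, the class names, and the integer keys are all hypothetical.

```python
from collections import defaultdict


def owner(key, num_nodes):
    """Hypothetical placement rule: an integer key lives on node key % num_nodes."""
    return key % num_nodes


class QueryDistribution:
    """Each node holds a subset of the data; lookups are shipped to the owners."""

    def __init__(self, rows, num_nodes):
        self.num_nodes = num_nodes
        self.nodes = [dict() for _ in range(num_nodes)]
        for key, value in rows.items():
            self.nodes[owner(key, num_nodes)][key] = value

    def lookup(self, keys):
        # Split the query into one sub-query per owning node.
        plan = defaultdict(list)
        for key in keys:
            plan[owner(key, self.num_nodes)].append(key)
        # Each node answers only for the keys it owns; results are merged.
        result = {}
        for node_id, node_keys in plan.items():
            shard = self.nodes[node_id]
            result.update({k: shard[k] for k in node_keys if k in shard})
        return result, sorted(plan)  # merged rows and the nodes contacted


class PullData:
    """A central store holds all data; a processing node pulls what it needs."""

    def __init__(self, rows):
        self.central = dict(rows)
        self.pulls = 0  # number of rows fetched from the central node

    def lookup(self, keys):
        result = {}
        for key in keys:
            if key in self.central:
                self.pulls += 1
                result[key] = self.central[key]
        return result
```

In the first class the work moves to where the data lives, so a query touches only the owning nodes. In the second the data moves to where the work is done, so every row a query needs crosses the network from the central node.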
References
1 NoSQL Databases: https://oak.cs.ucla.edu/cs144/handouts/nosqldbs.pdf
2 NewSQL – The New Way to Handle Big Data: http://www.linuxforu.com/2012/01/newsql-handle-big-data/

Topic 10: Taxonomy of Data and Storage

  • 1. 10: Taxonomy of Data and Storage Zubair Nabi zubair.nabi@itu.edu.pk April 20, 2013 Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 1 / 27
  • 2. Outline 1 Datasets 2 Storage 3 Beyond RDBMS 4 NoSQL Taxonomy 5 NewSQL Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 2 / 27
  • 3. Outline 1 Datasets 2 Storage 3 Beyond RDBMS 4 NoSQL Taxonomy 5 NewSQL Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 3 / 27
  • 4. Introduction Data is everywhere and is the driving force behind our lives Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 5. Introduction Data is everywhere and is the driving force behind our lives The address book on your phone is data Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 6. Introduction Data is everywhere and is the driving force behind our lives The address book on your phone is data So is the newspaper that you read every morning Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 7. Introduction Data is everywhere and is the driving force behind our lives The address book on your phone is data So is the newspaper that you read every morning Everything you see around you is a potential source of data which might be useful for a certain application Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 8. Introduction Data is everywhere and is the driving force behind our lives The address book on your phone is data So is the newspaper that you read every morning Everything you see around you is a potential source of data which might be useful for a certain application We use this data to share information and make a more informed decision about different events Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 9. Introduction Data is everywhere and is the driving force behind our lives The address book on your phone is data So is the newspaper that you read every morning Everything you see around you is a potential source of data which might be useful for a certain application We use this data to share information and make a more informed decision about different events Datasets can easily be classified on the basis of their structure Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 10. Introduction Data is everywhere and is the driving force behind our lives The address book on your phone is data So is the newspaper that you read every morning Everything you see around you is a potential source of data which might be useful for a certain application We use this data to share information and make a more informed decision about different events Datasets can easily be classified on the basis of their structure 1 Structured 2 Unstructured 3 Semi-structured Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 4 / 27
  • 11. Structured Data Formatted in a universally understandable and identifiable way Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 5 / 27
  • 12. Structured Data Formatted in a universally understandable and identifiable way In most cases, structured data is formally specified by a schema Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 5 / 27
  • 13. Structured Data Formatted in a universally understandable and identifiable way In most cases, structured data is formally specified by a schema Your phone address phone is structured because it has a schema consisting of name, phone number, address, email address, etc. Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 5 / 27
  • 14. Structured Data Formatted in a universally understandable and identifiable way In most cases, structured data is formally specified by a schema Your phone address phone is structured because it has a schema consisting of name, phone number, address, email address, etc. Most traditional databases contain structured data revolving around data laid out across columns and rows Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 5 / 27
  • 15. Structured Data Formatted in a universally understandable and identifiable way In most cases, structured data is formally specified by a schema Your phone address phone is structured because it has a schema consisting of name, phone number, address, email address, etc. Most traditional databases contain structured data revolving around data laid out across columns and rows Each field also has an associated type Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 5 / 27
  • 16. Structured Data Formatted in a universally understandable and identifiable way In most cases, structured data is formally specified by a schema Your phone address phone is structured because it has a schema consisting of name, phone number, address, email address, etc. Most traditional databases contain structured data revolving around data laid out across columns and rows Each field also has an associated type Possible to search for items based on their data types Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 5 / 27
  • 17. Unstructured Data Data without any conceptual definition or type Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 6 / 27
  • 18. Unstructured Data Data without any conceptual definition or type Can vary from raw text to binary data Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 6 / 27
  • 19. Unstructured Data Data without any conceptual definition or type Can vary from raw text to binary data Processing unstructured data requires parsing and tagging on the fly Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 6 / 27
  • 20. Unstructured Data Data without any conceptual definition or type Can vary from raw text to binary data Processing unstructured data requires parsing and tagging on the fly In most cases, consists of simple log files Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 6 / 27
  • 21. Semi-structured Data Occupies the space between the structured and unstructured data spectrum Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 7 / 27
  • 22. Semi-structured Data Occupies the space between the structured and unstructured data spectrum For instance, while binary data has no structure, audio and video files have meta-data which has structure, such as author, time of creation, etc. Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 7 / 27
  • 23. Semi-structured Data Occupies the space between the structured and unstructured data spectrum For instance, while binary data has no structure, audio and video files have meta-data which has structure, such as author, time of creation, etc. Can also be labelled as self-describing structure Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 7 / 27
  • 24. Outline 1 Datasets 2 Storage 3 Beyond RDBMS 4 NoSQL Taxonomy 5 NewSQL Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 8 / 27
  • 25. Database Management Systems (DBMS) Used to store and manage data Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 9 / 27
  • 26. Database Management Systems (DBMS) Used to store and manage data Support for large amounts of data Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 9 / 27
  • 27. Database Management Systems (DBMS) Used to store and manage data Support for large amounts of data Ensure concurrency, sharing, and locking Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 9 / 27
  • 28. Database Management Systems (DBMS) Used to store and manage data Support for large amounts of data Ensure concurrency, sharing, and locking Security is useful too; to enable fine-grained access control Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 9 / 27
  • 29. Database Management Systems (DBMS) Used to store and manage data Support for large amounts of data Ensure concurrency, sharing, and locking Security is useful too; to enable fine-grained access control Ability to keep working in the face of failure Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 9 / 27
  • 30. Relational Database Management Systems (RDBMS) The most popular and predominant storage system in use Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 10 / 27
  • 31. Relational Database Management Systems (RDBMS) The most popular and predominant storage system in use Data in different files is connected by using a key field Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 10 / 27
  • 32. Relational Database Management Systems (RDBMS) The most popular and predominant storage system in use Data in different files is connected by using a key field Data is laid out in different tables, with a key field that identifies each row Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 10 / 27
  • 33. Relational Database Management Systems (RDBMS) The most popular and predominant storage system in use Data in different files is connected by using a key field Data is laid out in different tables, with a key field that identifies each row The same key field is used to connect one table to another Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 10 / 27
  • 34. Relational Database Management Systems (RDBMS) The most popular and predominant storage system in use Data in different files is connected by using a key field Data is laid out in different tables, with a key field that identifies each row The same key field is used to connect one table to another For instance, a relation might have customer ID as key and her details as data; another table might have the same key but different data, say her purchases; yet another table with the same key might have a breakdown of her preferences Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 10 / 27
  • 35. Relational Database Management Systems (RDBMS) The most popular and predominant storage system in use Data in different files is connected by using a key field Data is laid out in different tables, with a key field that identifies each row The same key field is used to connect one table to another For instance, a relation might have customer ID as key and her details as data; another table might have the same key but different data, say her purchases; yet another table with the same key might have a breakdown of her preferences Examples include Oracle Database, MS SQL Server, MySQL, IBM DB2, and Teradata Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 10 / 27
  • 36. Structured Query Language (SQL) Non-procedural language used for data retrieval and manipulation in RDBMS Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 11 / 27
  • 37. Structured Query Language (SQL) Non-procedural language used for data retrieval and manipulation in RDBMS Adds a layer of abstraction over relational algebra, which enables set operations, selections, etc. Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 11 / 27
  • 38. Structured Query Language (SQL) Non-procedural language used for data retrieval and manipulation in RDBMS Adds a layer of abstraction over relational algebra, which enables set operations, selections, etc. Due to its declarative nature, users operate in terms of their expected output while the underlying system decides the actual query execution plan Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 11 / 27
  • 39. Structured Query Language (SQL) Non-procedural language used for data retrieval and manipulation in RDBMS Adds a layer of abstraction over relational algebra, which enables set operations, selections, etc. Due to its declarative nature, users operate in terms of their expected output while the underlying system decides the actual query execution plan Instructions consist of a specific SQL statement and additional parameters and operands Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 11 / 27
  • 40. Structured Query Language (SQL) Non-procedural language used for data retrieval and manipulation in RDBMS Adds a layer of abstraction over relational algebra, which enables set operations, selections, etc. Due to its declarative nature, users operate in terms of their expected output while the underlying system decides the actual query execution plan Instructions consist of a specific SQL statement and additional parameters and operands For instance, the SELECT operator retrieves certain records, INSERT adds a record, and so on Zubair Nabi 10: Taxonomy of Data and Storage April 20, 2013 11 / 27
  • 46. RDBMS and Structured Data
      • As structured data follows a predefined schema, it maps naturally onto a relational database system
      • The schema defines the type and structure of the data and its relations
      • Schema design is an arduous process and must be completed before the database can be populated
      • Another consequence of a strict schema is that extending it is non-trivial
      • For instance, adding a new attribute to a single existing row necessitates adding a new column to the entire table
      • This is extremely suboptimal in tables with millions of rows
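A minimal sketch of the schema-extension problem, again using `sqlite3`. The `users` table and the `twitter` attribute are hypothetical. The sketch only shows the logical effect (every row gains the column); how expensive the change is physically depends on the database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)", [("Ali",), ("Hina",), ("Omar",)]
)

# Only one user has the new attribute, yet the column is added to the whole table.
conn.execute("ALTER TABLE users ADD COLUMN twitter TEXT")
conn.execute("UPDATE users SET twitter = ? WHERE name = ?", ("@hina", "Hina"))

# Every other row now carries a NULL for the new column.
rows = conn.execute("SELECT name, twitter FROM users ORDER BY id").fetchall()
print(rows)
```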
  • 53. RDBMS and Semi- and Un-structured Data
      • Unstructured data has no notion of a schema, while semi-structured data has only a weak one
      • Data within such datasets also has an associated type
      • In fact, types are application-centric: one application might interpret a field as a float and another as a string
      • While it is possible, with human intervention, to glean structure from unstructured data, doing so is extremely expensive
      • Structureless data generated by real-time sources can change the number of attributes and their types on the fly
      • An RDBMS would require the creation of a new table each time such a change takes place
      • Therefore, unstructured and semi-structured data do not fit the relational model
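The two points about application-centric types and attributes changing on the fly can be illustrated with a few JSON records. The feed contents and field names below are invented for the example.

```python
import json

# Hypothetical records from a real-time source: the set of attributes
# and the type of "reading" differ from one record to the next.
feed = [
    '{"sensor": "a1", "reading": "21.5"}',
    '{"sensor": "a2", "reading": "19.0", "unit": "C"}',
    '{"sensor": "a3", "reading": 22, "location": {"lat": 31.5, "lon": 74.3}}',
]
records = [json.loads(line) for line in feed]

# One application interprets the field as a float ...
as_float = [float(r["reading"]) for r in records]
# ... while another treats the same field as a string.
as_text = [str(r["reading"]) for r in records]

# The attribute set varies per record, so no single fixed table schema fits.
attribute_sets = [sorted(r) for r in records]
print(as_float, as_text, attribute_sets)
```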
  • 62. Motivation
      • Different semantics: RDBMSs provide ACID semantics:
          1 Atomic: The entire transaction either succeeds or fails
          2 Consistent: Data within the database remains consistent after each transaction
          3 Isolated: Transactions are sandboxed from each other
          4 Durable: Committed transactions persist across failures and restarts
      • These guarantees are overkill for most user-facing applications
      • Most such applications care more about availability and are willing to sacrifice strict consistency, settling for eventual consistency
      • This basically available, soft state, eventually consistent (BASE) model enables applications to function even in the face of partial failure
      • High throughput: Most NoSQL databases sacrifice consistency for availability, leading to higher throughput (in some cases by an order of magnitude)
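Atomicity, the first of the ACID properties above, can be sketched with `sqlite3`. The `accounts` table, the `transfer` helper, and the balances are hypothetical. The example relies on the connection's context manager, which commits on success and rolls back when an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates apply or neither does."""
    try:
        with conn:  # commit on success, roll back on any exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the second call the credit to `bob` runs first and succeeds, but the debit violates the constraint, so the whole transaction is undone and the balances stay as they were after the first transfer.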
  • 65. Motivation (2)
      • Horizontal scalability: To cater for more data, NoSQL stores can be scaled out by just adding more machines; the underlying system automatically redistributes the data
      • Commodity hardware: A large number of RDBMSs require specialized and proprietary hardware for operation. In contrast, NoSQL databases run on commodity off-the-shelf hardware
      • Programming language support: Over the years, programming languages have started providing abstractions for database access (LINQ, etc.) that bypass SQL. NoSQL databases provide abstractions that map directly onto these language abstractions, leading to tighter coupling
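The slides do not say how data is redistributed when machines are added. One common technique, used for example by Dynamo, is consistent hashing. The sketch below assumes that technique; the `Ring` class, node names, and virtual-node count are all hypothetical.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic hash so that placement is stable across runs."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.points = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.points, (stable_hash(f"{node}#{i}"), node))

    def owner(self, key):
        # First point clockwise from the key's hash, wrapping around the ring.
        idx = bisect.bisect_left(self.points, (stable_hash(key), ""))
        return self.points[idx % len(self.points)][1]


keys = [f"user:{i}" for i in range(1000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}

ring.add("node-d")  # scale out by adding one machine
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```

The point of this design is that adding a machine relocates only the keys the new machine takes over; the remaining keys stay where they were.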
  • 66. Motivation (3)
      • The rise of cloud computing: Cloud computing applications require horizontal scalability and low administration overhead. NoSQL stores naturally satisfy both requirements
  • 71. Introduction
      • NoSQL databases can be classified on the basis of:
          1 Data Model: How data is represented
          2 Scalability: How scalable the system is
          3 Query Model: What type of API it exposes
          4 Persistence: How persistent the data is
  • 74. Classification by Data Model
      • Based on the data model, NoSQL databases can roughly be divided into three categories:
          1 Key/value Stores: A map/dictionary allowing put/get semantics per key
          2 Document Stores: Complex data structures that encapsulate a document's key/value pairs
          3 Column-Oriented Stores: Data laid out by column
  • 79. Key/value Stores
      • Data is stored within a large hash map
      • Simple get/put API
      • Favour scalability over consistency
      • Typically impose a limit on the size of the key
      • Examples include Amazon’s Dynamo, LinkedIn’s Voldemort, Redis, and Memcached
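The get/put interface and the key-size limit can be sketched as a thin wrapper around a dictionary. This is a single-process toy, not how any of the listed systems is implemented; the class name and the 256-byte limit are assumptions chosen for illustration.

```python
class KeyValueStore:
    """Toy in-memory key/value store with put/get semantics."""

    MAX_KEY_BYTES = 256  # illustrative limit; real systems choose their own

    def __init__(self):
        self._data = {}  # the "large hash map"

    def put(self, key: str, value: bytes) -> None:
        if len(key.encode("utf-8")) > self.MAX_KEY_BYTES:
            raise ValueError("key exceeds the maximum key size")
        # The value is opaque to the store: it is never parsed or indexed.
        self._data[key] = value

    def get(self, key: str, default=None):
        return self._data.get(key, default)


store = KeyValueStore()
store.put("session:42", b'{"user": "zubair"}')
hit = store.get("session:42")
miss = store.get("session:99")
print(hit, miss)
```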
  • 84. Document Stores
      • Key/value semantics but based on documents
      • A document encapsulates data in a standard format, such as JSON, XML, PDF, etc.
      • Documents themselves can be heterogeneous
      • Documents can also be retrieved based on their content
      • Examples include Apache CouchDB and MongoDB
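A minimal sketch of the document-store idea, assuming JSON as the document format. The `DocumentStore` class and its `find` method are hypothetical and do not mirror the CouchDB or MongoDB APIs; they only show heterogeneous documents and retrieval by content.

```python
import json


class DocumentStore:
    """Toy document store: key/value access plus lookup by content."""

    def __init__(self):
        self._docs = {}  # document id -> serialized JSON text

    def put(self, doc_id: str, document: dict) -> None:
        self._docs[doc_id] = json.dumps(document)

    def get(self, doc_id: str):
        raw = self._docs.get(doc_id)
        return None if raw is None else json.loads(raw)

    def find(self, **criteria):
        """Return ids of documents whose top-level fields match all criteria."""
        matches = []
        for doc_id, raw in self._docs.items():
            doc = json.loads(raw)
            if all(doc.get(field) == value for field, value in criteria.items()):
                matches.append(doc_id)
        return sorted(matches)


store = DocumentStore()
# Heterogeneous documents: each one has its own set of fields.
store.put("p1", {"type": "post", "author": "zubair", "tags": ["nosql"]})
store.put("u1", {"type": "user", "name": "zubair", "city": "Lahore"})
store.put("p2", {"type": "post", "author": "sana", "title": "NewSQL"})

posts = store.find(type="post")
by_author = store.find(type="post", author="zubair")
print(posts, by_author)
```

Unlike the plain key/value case, the store here looks inside the value, which is what makes content-based retrieval possible.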
  • 89. Column-Oriented Stores
      • Data is stored and processed by column
      • Useful for read-mostly and read-intensive data
      • Data within the same column is of the same type, which creates opportunities for efficient compression
      • Columns are stored separately, so they can be loaded in parallel
      • Examples include Google’s BigTable (Apache HBase is its open source clone) and Cassandra (originally developed at Facebook)
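The column layout and its compression benefit can be sketched in a few lines. The sample rows are invented, and run-length encoding is used here only as one simple example of a compression scheme that benefits from same-typed, repetitive column data.

```python
from itertools import groupby

# Row-oriented view: each tuple is one record (hypothetical click log).
rows = [
    ("2013-04-20", "PK", 3),
    ("2013-04-20", "PK", 5),
    ("2013-04-20", "US", 2),
    ("2013-04-21", "US", 7),
]

# Column-oriented view: one homogeneous list per attribute.
names = ("day", "country", "clicks")
columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Values of one type that repeat often compress well.
encoded_day = run_length_encode(columns["day"])

# An aggregate over one attribute touches only that column.
total_clicks = sum(columns["clicks"])
print(encoded_day, total_clicks)
```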
  • 97. Introduction
      • NewSQL is a hybrid of traditional RDBMS and NoSQL
      • Combines the scalability and performance of NoSQL with the ACID guarantees of an RDBMS
      • Uses SQL as the primary language
      • Able to scale out and run on commodity hardware
      • Classified into:
          1 New Databases: Designed from scratch
          2 New MySQL Storage Engines: Keep MySQL as the interface but replace the storage engine
          3 Transparent Clustering: Add pluggable features to existing databases to ensure scalability
  • 103. New Databases
      1 Query Distribution: Each node holds a subset of the data
          • Queries are split and shipped to the nodes that own the data
          • Examples include Google’s Spanner and NuoDB
      2 Pull Data: A central node (possibly replicated) holds all data
          • A set of processing nodes receives queries and pulls in the required data from the central node
          • Examples include VMware’s SQLFire
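The query-distribution approach can be sketched as a scatter-gather step: a sub-query runs on each node against the rows it owns, and a coordinator merges the partial results. The shard layout, node names, and helper functions are hypothetical, and threads stand in for separate machines; this is not how Spanner or NuoDB is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

# Each (hypothetical) node owns a subset of the customer totals.
shards = {
    "node-1": [("alice", 120), ("bob", 40)],
    "node-2": [("carol", 300)],
    "node-3": [("dave", 75), ("erin", 210)],
}


def local_query(rows, min_total):
    """Sub-query executed on the node that owns these rows."""
    return [(name, total) for name, total in rows if total >= min_total]


def distributed_query(min_total):
    """Ship the sub-query to every shard, then merge the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(
            pool.map(lambda rows: local_query(rows, min_total), shards.values())
        )
    merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda row: row[1], reverse=True)


result = distributed_query(100)
print(result)
```

In the pull-data approach the direction is reversed: processing nodes would fetch the rows they need from a central store and evaluate the filter themselves.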
  • 104. References
      1 NoSQL Databases: https://oak.cs.ucla.edu/cs144/handouts/nosqldbs.pdf
      2 NewSQL – The New Way to Handle Big Data: http://www.linuxforu.com/2012/01/newsql-handle-big-data/