2. Non-relational
Flexible schema
Query languages other than, or in addition to, SQL
Distributed – horizontal scaling
Less structured data
Supports big data
INTRODUCTION TO NOSQL
3. Compared with relational databases, NoSQL databases are generally easier to scale out and
can deliver better performance for many workloads. Their data model also addresses several issues that the relational
model was not designed to address:
◦ Geographically distributed architecture instead of expensive,
monolithic architecture
◦ Large volumes of rapidly changing structured, semi-structured, and
unstructured data
◦ Agile sprints, quick schema iteration, and frequent code pushes
◦ Object-oriented programming that is easy to use and flexible
4. It's not "No SQL"; it's "Not Only SQL".
It's not even a replacement for RDBMS.
Compared with the good old days, we are
saving more and more data.
The connections between data are also growing,
so we need an architecture that addresses
these two key issues.
5. MongoDB is a cross-platform, document-oriented
database that provides
High performance.
High availability.
Easy scalability.
MongoDB works on the concepts of collections and
documents.
7. Use it when your requirements have these properties:
You absolutely must store unstructured data: for
example, data coming from a third-party API you don't
control, logs whose format may change at any
minute, or user-entered metadata, while still wanting
indexes on a subset of it.
You need to handle more reads/writes than a
single server can deal with, and a master-slave
architecture won't work for you.
You change your schema very often on a large
dataset.
8. NoSQL: "No SQL" or "Not Only SQL"?
A class of non-relational data storage
systems
E.g. BigTable, Dynamo, PNUTS/Sherpa, ...
Usually do not require a fixed table schema,
nor do they use the concept of joins
Distributed data storage systems
NoSQL offerings typically relax one or more of
the ACID properties (we will discuss the CAP
theorem)
9. Basic API access:
get(key) -- Extract the value given a key
put(key, value) -- Create or update the
value given its key
delete(key) -- Remove the key and its
associated value
execute(key, operation, parameters) --
Invoke an operation on the value (given
its key) when the value is a special data structure
(e.g. List, Set, Map, etc.).
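The four operations above can be illustrated with a toy in-memory store. This is a minimal sketch that uses a Python dict as the "big hash table". The class name `KeyValueStore` is hypothetical, and so is dispatching `execute` by method name. Neither is the API of any particular NoSQL product.

```python
from typing import Any


class KeyValueStore:
    """Hypothetical toy key/value store backed by a dict (illustration only)."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def get(self, key: str) -> Any:
        # Extract the value for a key; returns None if the key is absent.
        return self._data.get(key)

    def put(self, key: str, value: Any) -> None:
        # Create or update the value for a key.
        self._data[key] = value

    def delete(self, key: str) -> None:
        # Remove the key and its value; silently ignores missing keys.
        self._data.pop(key, None)

    def execute(self, key: str, operation: str, *parameters: Any) -> Any:
        # Invoke a method on a stored data structure (list, set, dict, ...),
        # e.g. execute("tags", "add", "nosql") calls value.add("nosql").
        value = self._data[key]
        return getattr(value, operation)(*parameters)


store = KeyValueStore()
store.put("user:1", "Tom")
store.put("tags", set())
store.execute("tags", "add", "nosql")
print(store.get("user:1"))  # Tom
print(store.get("tags"))    # {'nosql'}
store.delete("user:1")
print(store.get("user:1"))  # None
```

Because the store never interprets plain values, only `execute` needs to know what kind of structure is stored under a key.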
10. NoSQL Data Storage: Classification
Uninterpreted key/value or 'the big hash
table'
Amazon S3 (Dynamo)
Flexible schema
BigTable, Cassandra, HBase (ordered keys, semi-structured data)
Sherpa/PNUTS (unordered keys, JSON)
MongoDB (based on JSON)
CouchDB (name/value in text)
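The "ordered keys" versus "unordered keys" distinction above matters for range queries. Sorted keys let a store answer "all keys between A and B" without scanning everything, which a plain hash table cannot do. The sketch below is a hypothetical illustration using Python's `bisect` module. It is not how BigTable, Cassandra or HBase are implemented internally.

```python
import bisect


class OrderedKeyStore:
    """Hypothetical store that keeps keys sorted so range scans are cheap."""

    def __init__(self) -> None:
        self._keys: list[str] = []         # always kept in sorted order
        self._values: dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)  # insert while preserving order
        self._values[key] = value

    def scan(self, start: str, end: str) -> list[tuple[str, str]]:
        # Return (key, value) pairs with start <= key < end, in key order.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = OrderedKeyStore()
for key in ["user:003", "user:001", "order:17", "user:002"]:
    store.put(key, key.upper())
print(store.scan("user:001", "user:003"))
# [('user:001', 'USER:001'), ('user:002', 'USER:002')]
```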
11. Cheap, easy to implement (open source)
Data are replicated to multiple nodes (keeping
identical copies for fault tolerance) and can be
partitioned
When data is written, the latest version is on at least
one node and then replicated to other nodes
No single point of failure
Easy to distribute
Don't require a schema
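The write path described above can be simulated with a toy model. A write first lands on one node and is queued, and it reaches the other replicas only when replication runs, so a read from another node may briefly return stale data. This is a sketch under simplifying assumptions. "Node 0 takes every write", the single global version counter, and the class names are all hypothetical choices made for illustration.

```python
from typing import Optional


class Node:
    """One replica; maps key -> (version, value)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, tuple[int, str]] = {}


class ReplicatedStore:
    """Hypothetical toy store: writes hit one node, then replicate later."""

    def __init__(self, node_names: list[str]) -> None:
        self.nodes = [Node(n) for n in node_names]
        self.pending: list[tuple[str, int, str]] = []  # replication queue
        self.version = 0  # simple global version counter (an assumption)

    def write(self, key: str, value: str) -> None:
        # The latest version is stored on at least one node (here node 0) ...
        self.version += 1
        self.nodes[0].data[key] = (self.version, value)
        # ... and queued for replication to the remaining nodes.
        self.pending.append((key, self.version, value))

    def replicate(self) -> None:
        # Push queued writes to other nodes, never overwriting a newer version.
        for key, version, value in self.pending:
            for node in self.nodes[1:]:
                current = node.data.get(key)
                if current is None or current[0] < version:
                    node.data[key] = (version, value)
        self.pending.clear()

    def read(self, node_index: int, key: str) -> Optional[str]:
        entry = self.nodes[node_index].data.get(key)
        return None if entry is None else entry[1]


store = ReplicatedStore(["n1", "n2", "n3"])
store.write("x", "v1")
print(store.read(0, "x"))  # v1
print(store.read(2, "x"))  # None: not replicated yet
store.replicate()
print(store.read(2, "x"))  # v1
```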
12. What does NoSQL Not Provide?
Joins
Group by
But PNUTS provides an interesting
materialized-view approach to
joins/aggregation.
ACID transactions
SQL
Integration with applications that are
based on SQL
17. Map-reduce has two phases, plus an optional third:
A map stage that processes each document
and emits one or more objects for each input document
A reduce phase that combines the output of the map
operation.
An optional finalize stage for final modifications to the
result
Uses custom JavaScript functions
Provides greater flexibility but is less efficient and
more complex than the aggregation pipeline
Can have output sets that exceed the 16 megabyte
output limitation of the aggregation pipeline.
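The map, reduce and optional finalize stages above can be mirrored in plain Python. MongoDB itself runs these stages as JavaScript functions on the server. This sketch only shows the data flow: emit (key, value) pairs per document, group them by key, reduce each group, and then optionally finalize. The order data and the `large` flag are hypothetical.

```python
from collections import defaultdict

orders = [  # hypothetical input documents
    {"cust_id": "A", "amount": 500},
    {"cust_id": "A", "amount": 250},
    {"cust_id": "B", "amount": 200},
]


def map_fn(doc):
    # Map: emit one (key, value) pair for each input document.
    yield doc["cust_id"], doc["amount"]


def reduce_fn(key, values):
    # Reduce: combine all values emitted for the same key.
    return sum(values)


def finalize_fn(key, reduced):
    # Optional finalize: a last modification of each key's result.
    return {"total": reduced, "large": reduced > 300}


def map_reduce(docs, mapper, reducer, finalizer=None):
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    results = {}
    for key, values in groups.items():
        reduced = reducer(key, values)
        results[key] = finalizer(key, reduced) if finalizer else reduced
    return results


print(map_reduce(orders, map_fn, reduce_fn, finalize_fn))
# {'A': {'total': 750, 'large': True}, 'B': {'total': 200, 'large': False}}
```

In MongoDB the reduce function may be re-applied to partial results for the same key. That is one reason the sketch uses an associative operation (a sum) whose output has the same type as the emitted values.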
19. Types of NoSQL databases:
◦ Key-value pair: DynamoDB, Azure Table Storage (ATS)
Example (key, value) pairs: (Name, Tom), (Age, 25), (Role, Student), (University, CU)
◦ Document-based: MongoDB, Amazon SimpleDB, CouchDB
Example document:
[
  {
    "Name": "Tom",
    "Age": 25,
    "Role": "Student",
    "University": "CU"
  }
]
◦ Graph database: Neo4j, InfoGrid
Example nodes: Tom, Student, CU, 25, Masters, Ottawa Location
◦ Column-oriented database: Bigtable (Google), HBase
Example: Row Id 1 with columns Name = Tom, Age = 25, Role = Student
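The four data models above can store the same student record, Tom. The sketch below shows each shape using plain Python structures. These are stand-ins, not client code for DynamoDB, MongoDB, Bigtable or Neo4j. The graph edge labels are hypothetical, because the slide's figure did not preserve them.

```python
import json

# Key-value model: independent (key, value) pairs, looked up by key only.
kv_pairs = [("Name", "Tom"), ("Age", 25), ("Role", "Student"), ("University", "CU")]
kv_store = dict(kv_pairs)

# Document model: the whole record is one JSON document in a collection.
collection_text = json.dumps([
    {"Name": "Tom", "Age": 25, "Role": "Student", "University": "CU"}
])

# Column-oriented model: row id -> {column name: value}.
column_table = {1: {"Name": "Tom", "Age": 25, "Role": "Student"}}

# Graph model: nodes connected by labelled edges (labels are hypothetical).
edges = [
    ("Tom", "IS_A", "Student"),
    ("Tom", "STUDIES_AT", "CU"),
    ("Tom", "HAS_AGE", "25"),
    ("Tom", "ENROLLED_IN", "Masters"),
    ("CU", "LOCATED_IN", "Ottawa"),
]


def neighbours(node):
    """Nodes reachable from `node` by one outgoing edge."""
    return {dst for src, _, dst in edges if src == node}


print(kv_store["Name"])                              # Tom
print(json.loads(collection_text)[0]["University"])  # CU
print(column_table[1]["Age"])                        # 25
print(sorted(neighbours("Tom")))                     # ['25', 'CU', 'Masters', 'Student']
```

Each model favours a different access pattern: a key lookup, a whole-document fetch, reading the columns of a row, or following relationships.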
21. Modern applications deal with huge amounts of data.
MongoDB makes development easier.
Flexibility in deployment.
Rich queries.
Older database systems may not be compatible with
this design.
It is document-oriented storage: data is stored as
JSON-style documents.
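JSON-style documents can nest objects and arrays, and rich queries can reach into them. The sketch below parses such a document with Python's `json` module and applies a hypothetical query-by-example matcher. The matcher is loosely modelled on two behaviours of MongoDB filters: dotted paths into embedded documents, and a scalar matching any element of an array field. It is an illustration, not MongoDB's query engine, and the document contents are invented.

```python
import json

# A JSON-style document with a nested object and an array (hypothetical data).
doc_text = """
{"name": "Tom", "role": "Student",
 "address": {"city": "Ottawa"},
 "courses": ["Databases", "Distributed Systems"]}
"""
student = json.loads(doc_text)


def matches(doc, query):
    """Hypothetical query-by-example: every field in `query` must match.

    Dotted keys walk into nested objects; a scalar matches an array
    field if the array contains it.
    """
    for path, expected in query.items():
        value = doc
        for part in path.split("."):
            if not isinstance(value, dict) or part not in value:
                return False
            value = value[part]
        if isinstance(value, list) and not isinstance(expected, list):
            if expected not in value:
                return False
        elif value != expected:
            return False
    return True


print(matches(student, {"address.city": "Ottawa"}))  # True
print(matches(student, {"courses": "Databases"}))     # True
print(matches(student, {"role": "Professor"}))        # False
```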
23. XML vs JSON
◦ XML is a markup language; JSON is a way of representing objects.
◦ XML is more verbose than JSON; JSON uses fewer words.
◦ XML is used to describe structured data; JSON is used to describe unstructured data, including arrays.
◦ JavaScript functions like eval() and JSON.parse() don't work on XML; applying eval() to JSON returns the described object.
Example:
<car>
<company>Volkswagen</company>
{
"company": "Volkswagen",
"name": "Vento",
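The car example above is cut off. Assume a car with only a `company` and a `name` field. The sketch below parses the XML and JSON forms with Python's standard library to show the difference in effort. XML needs an element tree and a walk over its child tags. JSON maps directly onto a dictionary.

```python
import json
import xml.etree.ElementTree as ET

# Assumed complete versions of the truncated example (company and name only).
xml_text = "<car><company>Volkswagen</company><name>Vento</name></car>"
json_text = '{"company": "Volkswagen", "name": "Vento"}'

# XML: parse into an element tree, then collect each child's tag and text.
root = ET.fromstring(xml_text)
car_from_xml = {child.tag: child.text for child in root}

# JSON: json.loads maps the text straight onto a Python dict.
car_from_json = json.loads(json_text)

print(car_from_xml == car_from_json)  # True
print(len(xml_text), len(json_text))  # 58 42: XML needs more characters
```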
24. What is it?
How does it work?
Hadoop
Tools
Architecture
25. Distributed database management system
Designed for big data
Scalable
Fault tolerant
No single point of failure
Has an SQL-like query language
NoSQL
26. Organises data into tables
Uses Cassandra Query Language (CQL)
Does not allow subqueries or joins
Supports Hadoop Map Reduce
Uses asynchronous masterless replication
◦ Gives low latency
Allows indexing
Allows batch analysis via Hadoop
27. How does Cassandra integrate with Hadoop?
Support for Map Reduce
Integration with
◦ Apache Pig
◦ Apache Hive
Can also act as a back end for Solr!
29. A peer-to-peer cluster
No single point of failure
Tunable consistency
◦ Is performance or accuracy more important?
Query by key or key range
Row oriented data storage
Rows can hold up to 2 billion columns
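"Tunable consistency" is often reasoned about with a quorum rule of thumb. Suppose each key has n replicas, a write waits for w acknowledgements, and a read consults r replicas. Every read set then overlaps every write set when r + w > n, so a read contacts at least one replica holding the latest acknowledged write. The sketch encodes that rule and Cassandra's majority size, floor(n / 2) + 1. It is a simplification that ignores failures, clock skew and repair.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas, floor(n / 2) + 1 (the size of Cassandra's QUORUM)."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read replica set must overlap every write replica set.

    n: replicas per key; w: replicas that must acknowledge a write;
    r: replicas consulted on a read. Overlap is guaranteed when r + w > n.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


n = 3
print(reads_see_latest_write(n, r=1, w=1))                  # False: fast, eventually consistent
print(reads_see_latest_write(n, r=quorum(n), w=quorum(n)))  # True: 2 + 2 > 3
print(reads_see_latest_write(n, r=1, w=n))                  # True: write to all, read one
```

Lower r and w favour performance, and higher values favour accuracy. That is the trade-off posed in the question above.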