On Storing Big Data
Ilias Flaounas
Intelligent Systems Lab
30 October 2012
I. Flaounas (Intelligent Systems Lab) 30 October 2012 1 / 16
Storing Big Data
Data are playing an increasingly important role in business and science.
Storing, searching, sharing, analysing and visualising big data has become a challenge.
Storage in particular is often overlooked as an issue.
Note that sometimes a MySQL database is not enough.
Hadoop offers an out-of-the-box distributed filesystem (HDFS) for storing data files. The challenge arises when one needs DB capabilities, frequent updates or real-time processing.
The Problems
Traditional relational databases can reach their performance limits.
Data keep arriving at high velocity, in high volume, and with high variety.
Common ways to increase performance stop working after a while: buying a faster server, adding more RAM, using materialised views, fine-tuning queries...
Furthermore, “ALTER TABLE” doesn't really work with lots of data: changing the schema of a large table can require rewriting it.
Backups and data availability become an issue.
NoSQL Movement
The term is too broad and too new to define precisely.
Wikipedia: “NoSQL (Not only SQL) DB systems are often highly
optimized for retrieve and append operations and often offer little
functionality beyond record storage.”
No schema
No joins between tables
No common query language (like SQL)
No ACID guarantees (atomicity, consistency, isolation, durability)
On the other hand, you gain horizontal scalability and high performance.
Also, most NoSQL systems are Map/Reduce ready and/or integrate with Hadoop.
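A minimal, single-process sketch of the Map/Reduce pattern: a map phase emits key/value pairs from each document, a shuffle groups them by key, and a reduce phase combines each group. The sample documents and the function names (map_tags, shuffle, reduce_counts, map_reduce) are hypothetical; systems such as Hadoop run the same phases distributed across many machines.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical sample documents, loosely modelled on a news-article record.
DOCS = [
    {"_id": 1, "Tags": ["International", "News"]},
    {"_id": 2, "Tags": ["News", "Scotland"]},
    {"_id": 3, "Tags": ["News"]},
]


def map_tags(doc: dict) -> Iterator[tuple[str, int]]:
    """Map phase: emit (tag, 1) for every tag in one document."""
    for tag in doc.get("Tags", []):
        yield tag, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle phase: group the emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: list[int]) -> int:
    """Reduce phase: combine all values emitted for one key."""
    return sum(values)


def map_reduce(docs: Iterable[dict]) -> dict[str, int]:
    emitted = (pair for doc in docs for pair in map_tags(doc))
    return {key: reduce_counts(key, vals) for key, vals in shuffle(emitted).items()}


print(map_reduce(DOCS))  # → {'International': 1, 'News': 3, 'Scotland': 1}
```

Because each map call sees only one document and each reduce call sees only one key, both phases can be split across machines without coordination beyond the shuffle.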
NoSQL DBs
There are many different systems under the NoSQL ‘umbrella’. Each one is optimised for different application scenarios and makes different trade-offs.
Document based: CouchDB, MongoDB,...
Key-value: Cassandra, Dynamo, Riak,...
Tabular based: BigTable, HBase,...
Memory based: Memcached, Redis, others optimised for solid-state disks...
Specialised for graphs: Neo4j, InfiniteGraph,...
Specialised for full-text search: Lucene, Solr...
Understand your requirements and then make a choice.
Oracle response
May 2011: Oracle publishes a white paper titled “Debunking the NoSQL Hype”.
The conclusion:
“Go for the tried and true path. Don’t be risking your data on NoSQL
databases.”
October 2011: Oracle releases the “Oracle NoSQL Database”. The white
paper is now reachable only via Google archives.
Example: MongoDB
MongoDB (from “humongous”) is an open source, high-performance,
schema-free, document-oriented database.
Document-Oriented storage
No predefined schema
High Performance
Easy to add new “columns” (fields) to documents
No joins between collections
Easy to scale horizontally: Auto-Sharding
Automatic fail-over: invisible to applications
Full Index Support
Map/Reduce ready; can integrate with Hadoop
Eventually consistent when reading from secondaries (replicas)
Open source, but developed and maintained by the company 10gen
Document based DB
A document is represented in a JSON-like format (MongoDB stores documents as BSON, a binary form of JSON):
{
  "_id" : 12345678,
  "Link" : "http://news.scotsman.com/abc.html",
  "Title" : "Blah blah blah",
  "Content" : "More blah blah",
  "OutletID" : 14,
  "Date" : ISODate("2011-11-17T20:33:15.097Z"),
  "_Hash" : 550973592,
  "Tags" : [ "International", "News", "Scotland" ]
}
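The same article can be sketched in Python as a plain dict, the form that drivers such as PyMongo accept. The second document, the in-memory collection list, the to_json helper and the "Sentiment" field are hypothetical; the point is that a new field can be added to one document without touching the others, the document-store counterpart of running “ALTER TABLE” on a large table.

```python
from __future__ import annotations

import json
from datetime import datetime, timezone

# The article document as a Python dict; datetime stands in for the
# mongo shell's ISODate(...) notation.
article = {
    "_id": 12345678,
    "Link": "http://news.scotsman.com/abc.html",
    "Title": "Blah blah blah",
    "Content": "More blah blah",
    "OutletID": 14,
    "Date": datetime(2011, 11, 17, 20, 33, 15, 97000, tzinfo=timezone.utc),
    "_Hash": 550973592,
    "Tags": ["International", "News", "Scotland"],
}

# A second (hypothetical) document with fewer fields: no schema forbids it.
other = {"_id": 12345679, "Title": "Another story", "Tags": ["News"]}

collection = [article, other]  # hypothetical stand-in for a MongoDB collection

# Adding a new "column" touches only the document that needs it;
# there is no ALTER TABLE step. "Sentiment" is a hypothetical field.
article["Sentiment"] = 0.42


def to_json(doc: dict) -> str:
    """Serialise a document to JSON, writing datetimes as ISO-8601 strings."""
    return json.dumps(
        doc, default=lambda v: v.isoformat() if isinstance(v, datetime) else str(v)
    )


for doc in collection:
    print(sorted(doc))  # documents in one collection can have different fields
```

A driver would store the datetime value as a BSON date, which is what ISODate(...) denotes in the shell.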
Single Server
A single machine stores the DB, e.g. MySQL.
Master/Slave
Two machines in Master/Slave configuration.
MongoDB - Replication
Automatic fail-over: the master (primary) is elected among the servers.
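A simplified sketch of the majority rule behind automatic fail-over, assuming that a member can become master only if a strict majority of the replica set is reachable, and that the most up-to-date healthy member wins. The Member class, its last_op field and elect_master are hypothetical; MongoDB's real election protocol also takes member priorities, vote settings and arbiters into account.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Member:
    name: str
    healthy: bool  # reachable by the rest of the set
    last_op: int   # hypothetical: position of the latest write this member applied


def elect_master(members: list[Member]) -> Member | None:
    """Return the new master, or None if no strict majority is reachable."""
    healthy = [m for m in members if m.healthy]
    # Require a strict majority of the whole set, so the two sides of a
    # network partition can never both elect a master.
    if len(healthy) <= len(members) // 2:
        return None
    # Prefer the most up-to-date member to minimise lost writes;
    # ties are broken by name so the result is deterministic.
    return max(healthy, key=lambda m: (m.last_op, m.name))


members = [Member("a", False, 100), Member("b", True, 98), Member("c", True, 99)]
print(elect_master(members).name)  # → c
```

The old master "a" is down, the two remaining members form a majority of three, and "c" has applied the most recent write, so it takes over without the application having to change which server it talks to.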
MongoDB - Sharding
Data is spread horizontally across several machines (shards).
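Sharding can be sketched as follows, assuming range-based partitioning on a shard key, as MongoDB used at the time of this talk: the key space is cut into contiguous chunks, each owned by one shard, and a router sends every read or write to the shard that owns the key's chunk. The RangeRouter class, the split points and the shard names are hypothetical.

```python
from __future__ import annotations

import bisect


class RangeRouter:
    """Route shard-key values to shards using sorted chunk boundaries (simplified)."""

    def __init__(self, split_points: list[int], owners: list[str]):
        # Chunk 0 covers (-inf, split_points[0]), chunk i covers
        # [split_points[i-1], split_points[i]), and the last chunk covers
        # [split_points[-1], +inf), so there is one more chunk than split points.
        if len(owners) != len(split_points) + 1:
            raise ValueError("need exactly one owner per chunk")
        if split_points != sorted(split_points):
            raise ValueError("split points must be sorted")
        self.split_points = split_points
        self.owners = owners

    def shard_for(self, key: int) -> str:
        # bisect_right returns the index of the chunk that contains key.
        return self.owners[bisect.bisect_right(self.split_points, key)]


# Three chunks on two shards: a shard may own several chunks.
router = RangeRouter([1000, 2000], ["shard0", "shard1", "shard0"])
print(router.shard_for(1500))  # → shard1
```

Range partitioning keeps neighbouring keys on the same shard, which suits range queries; the cost is that monotonically increasing keys (such as timestamps) all land in the last chunk, which is one reason choosing the shard key is a real design decision.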
MongoDB
If a new shard is added, data is rebalanced automatically.
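Automatic rebalancing can be pictured with a greedy sketch, assuming the balancer repeatedly migrates one chunk from the shard holding the most chunks to the shard holding the fewest, until their counts differ by at most a threshold. The rebalance function, its threshold default and the chunk names are hypothetical; MongoDB's real balancer runs in the background, uses its own migration thresholds and copies chunk data between servers rather than moving list entries.

```python
from __future__ import annotations


def rebalance(
    chunks_by_shard: dict[str, list[str]], threshold: int = 1
) -> list[tuple[str, str, str]]:
    """Move chunks until max and min chunk counts differ by at most threshold.

    Mutates chunks_by_shard in place and returns the migrations performed
    as (chunk, from_shard, to_shard) tuples.
    """
    if threshold < 1:
        # With threshold 0, two shards differing by one chunk would swap forever.
        raise ValueError("threshold must be at least 1")
    migrations: list[tuple[str, str, str]] = []
    while True:
        donor = max(chunks_by_shard, key=lambda s: len(chunks_by_shard[s]))
        recipient = min(chunks_by_shard, key=lambda s: len(chunks_by_shard[s]))
        if len(chunks_by_shard[donor]) - len(chunks_by_shard[recipient]) <= threshold:
            return migrations
        chunk = chunks_by_shard[donor].pop()
        chunks_by_shard[recipient].append(chunk)
        migrations.append((chunk, donor, recipient))


# A new, empty shard2 joins a cluster whose chunks sit on shard0 and shard1.
shards = {"shard0": ["c1", "c2", "c3", "c4"], "shard1": ["c5", "c6"], "shard2": []}
print(rebalance(shards))  # → [('c4', 'shard0', 'shard2'), ('c3', 'shard0', 'shard2')]
```

Moving whole chunks, rather than individual documents, keeps the number of migrations small and lets the router's chunk-to-shard table stay the only thing that changes from the application's point of view.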
MongoDB
No single point of failure; distributed reads/writes.
Big Data come with Big Problems
Maintenance of infrastructure: it is easier to manage one server than ten
Need to adapt legacy software
Training people on the new technologies
Designing the DB: splitting data among machines for maximum I/O
Bugs; ‘simple’ features may be missing; new versions come out too often...
Security
Thank you!
Ranking and Scoring Exercises for Research
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
Gulbai Tekra * Cheap Call Girls In Ahmedabad Phone No 8005736733 Elite Escort...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 

On Storing Big Data

  • 1. On Storing Big Data Ilias Flaounas Intelligent Systems Lab 30 October 2012 I. Flaounas (Intelligent Systems Lab) 30 October 2012 1 / 16
  • 2. Storing Big Data. Data are playing an increasingly important role in business and science. Storing, searching, sharing, analysing and visualising big data has become a challenge. Storage in particular is often overlooked as an issue, and sometimes a MySQL database is simply not enough. Hadoop offers an out-of-the-box distributed filesystem (HDFS) for storing data files; the challenge arises when you need database capabilities, frequent updates or real-time processing.
  • 3. The Problems. Traditional relational databases can reach their performance limits. Data keep arriving at high velocity, in high volume and with high variety. Common practices for increasing performance eventually stop working: buying a faster server, adding more RAM, using materialised views, fine-tuning queries... Furthermore, schema changes with “ALTER TABLE” become impractically slow on very large tables. Backups and data availability also become issues.
  • 4. NoSQL Movement. The term is too broad and too new to define precisely. Wikipedia: “NoSQL (Not only SQL) DB systems are often highly optimized for retrieve and append operations and often offer little functionality beyond record storage.” Typical characteristics: no schema; no joins between tables; no common query language (like SQL); no ACID guarantees (atomicity, consistency, isolation, durability). On the other hand, you gain horizontal scalability and high performance. Also, most NoSQL systems are Map/Reduce ready and/or integrate with Hadoop.
  • 5. NoSQL DBs. There are many different systems under the NoSQL ‘umbrella’, each optimised for different application scenarios and making different trade-offs. Document based: CouchDB, MongoDB, ...; key-value: Cassandra, Dynamo, Riak, ...; tabular: BigTable, HBase, ...; memory based: Memcached, Redis, and others optimised for solid-state disks...; specialised for graphs: Neo4j, InfiniteGraph, ...; specialised for full-text search: Lucene, Solr, ... Understand your requirements first, then make a choice.
  • 6. Oracle response. May 2011: Oracle issues a white paper titled “Debunking the NoSQL Hype”. Its conclusion: “Go for the tried and true path. Don’t be risking your data on NoSQL databases.” October 2011: Oracle releases the “Oracle NoSQL Database”. The white paper is now reachable only via Google archives.
  • 7. Example: MongoDB. MongoDB (from “humongous”) is an open-source, high-performance, schema-free, document-oriented database. Document-oriented storage; no predefined schema; high performance; easy to add new “columns” to data rows; no joins between tables; easy horizontal scaling through auto-sharding; automatic fail-over, invisible to applications; full index support; Map/Reduce ready, can bind with Hadoop; eventually consistent when reading from secondaries; open source, but developed and maintained by the company 10gen.
  • 8. Document-based DB. A document is represented in a JSON-like format (MongoDB stores it as BSON, and ISODate is MongoDB shell notation for a date): { "_id": 12345678, "Link": "http://news.scotsman.com/abc.html", "Title": "Blah blah blah", "Content": "More blah blah", "OutletID": 14, "Date": ISODate("2011-11-17T20:33:15.097Z"), "Hash": 550973592, "Tags": ["International", "News", "Scotland"] }
  • 9. Single Server. A single machine stores the DB, e.g. MySQL.
  • 10. Master/Slave. Two machines in a master/slave configuration.
  • 11. MongoDB - Replication. Automatic fail-over: the master is elected among the servers.
  • 12. MongoDB - Sharding. Data are spread horizontally across servers.
  • 13. MongoDB. When a new shard is added, data are rebalanced automatically.
  • 14. MongoDB. No single point of failure; reads and writes are distributed.
  • 15. Big Data come with Big Problems. Maintenance of infrastructure: it is easier to manage one server than ten. Legacy software needs to be adapted. People need training on the new technologies. Designing the DB: splitting data among machines for maximum I/O. Bugs; ‘simple’ features may be missing; new versions come out too often... Security.
  • 16. Thank you!
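The "Document-based DB" and "Example: MongoDB" slides claim that documents need no predefined schema and that new "columns" are easy to add. The sketch below illustrates that idea with a hypothetical in-memory collection (`TinyCollection`, `insert`, `find` and `set_field` are illustrative names, not MongoDB's API); it assumes only that every document carries a unique `"_id"`, as MongoDB requires.

```python
from typing import Any


class TinyCollection:
    """Hypothetical in-memory collection illustrating schema-free storage.

    Not MongoDB's API: just a dict of documents keyed by "_id".
    """

    def __init__(self) -> None:
        self._docs: dict[Any, dict[str, Any]] = {}

    def insert(self, doc: dict[str, Any]) -> None:
        # Each document needs a unique "_id" (MongoDB has the same rule).
        if "_id" not in doc:
            raise ValueError("document needs an _id field")
        self._docs[doc["_id"]] = dict(doc)

    def find(self, **criteria: Any) -> list[dict[str, Any]]:
        # Equality match on every given field; a document that lacks
        # one of the fields simply does not match (no NULL columns).
        return [
            d for d in self._docs.values()
            if all(k in d and d[k] == v for k, v in criteria.items())
        ]

    def set_field(self, doc_id: Any, key: str, value: Any) -> None:
        # "Adding a column" touches only this document: no ALTER TABLE.
        self._docs[doc_id][key] = value


articles = TinyCollection()
articles.insert({"_id": 12345678, "Title": "Blah blah blah", "OutletID": 14,
                 "Tags": ["International", "News", "Scotland"]})
articles.insert({"_id": 12345679, "Title": "Another story", "OutletID": 14})
articles.set_field(12345679, "Language", "en")

print(len(articles.find(OutletID=14)))         # 2
print(articles.find(Language="en")[0]["_id"])  # 12345679
```

Two documents with different sets of fields live side by side, and giving one of them a new field costs nothing for the other, which is the contrast with the "ALTER TABLE" problem on "The Problems" slide.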
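The "NoSQL Movement" and "Example: MongoDB" slides note that most NoSQL systems are Map/Reduce ready. As a rough, single-process illustration of the map, shuffle and reduce phases, the sketch below counts tags over article documents shaped like the "Document-based DB" example. The function names are hypothetical; this is neither MongoDB's mapReduce command nor Hadoop's API, and in a real cluster the map phase would run on each shard in parallel.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_tags(doc: dict) -> Iterator[tuple[str, int]]:
    # Map phase: emit (tag, 1) for every tag on a document.
    for tag in doc.get("Tags", []):
        yield tag, 1


def reduce_counts(key: str, values: list[int]) -> int:
    # Reduce phase: combine all partial counts emitted for one key.
    return sum(values)


def map_reduce(docs: Iterable[dict]) -> dict[str, int]:
    grouped: dict[str, list[int]] = defaultdict(list)
    for doc in docs:
        for key, value in map_tags(doc):
            grouped[key].append(value)  # shuffle: group emitted values by key
    return {k: reduce_counts(k, v) for k, v in grouped.items()}


docs = [
    {"_id": 1, "Tags": ["International", "News", "Scotland"]},
    {"_id": 2, "Tags": ["News", "Sport"]},
    {"_id": 3},  # schema-free: this document has no Tags field at all
]
print(map_reduce(docs))
# {'International': 1, 'News': 2, 'Scotland': 1, 'Sport': 1}
```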
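The "MongoDB - Sharding" slide says data are spread horizontally. MongoDB does this by splitting the range of a shard key into chunks and assigning each chunk to a shard, with a router (`mongos`) forwarding each operation. The sketch below mimics that routing step under assumed split points and chunk owners (`SPLIT_POINTS`, `CHUNK_OWNER`, `route` and `partition` are hypothetical); it is a simplified illustration, not MongoDB's implementation.

```python
import bisect
from collections import defaultdict

# Hypothetical cluster layout: split points cut the shard-key space into
# four chunks, (-inf, 1000), [1000, 2000), [2000, 3000), [3000, +inf).
SPLIT_POINTS = [1000, 2000, 3000]
CHUNK_OWNER = ["shardA", "shardB", "shardA", "shardC"]  # one owner per chunk


def chunk_for(key: int) -> int:
    # bisect_right returns how many split points are <= key,
    # which is exactly the index of the chunk containing the key.
    return bisect.bisect_right(SPLIT_POINTS, key)


def route(key: int) -> str:
    # A router (mongos plays this role in MongoDB) forwards the
    # operation to whichever shard owns the key's chunk.
    return CHUNK_OWNER[chunk_for(key)]


def partition(keys: list[int]) -> dict[str, list[int]]:
    # Group document keys by the shard that would store them.
    placement: dict[str, list[int]] = defaultdict(list)
    for key in keys:
        placement[route(key)].append(key)
    return dict(placement)


print(partition([5, 1500, 2500, 3100, 999]))
# {'shardA': [5, 2500, 999], 'shardB': [1500], 'shardC': [3100]}
```

Range-based chunks keep neighbouring keys together, which suits range queries; the price is that a monotonically increasing key sends all new inserts to the last chunk.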
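The slide stating that data are rebalanced automatically when a new shard is added refers to MongoDB's balancer, which migrates chunks between shards. The sketch below is a deliberately simplified, hypothetical balancer (`balance` and its `threshold` parameter are illustrative assumptions; MongoDB uses its own migration thresholds): it repeatedly moves one chunk from the most loaded shard to the least loaded one until their chunk counts differ by at most `threshold`.

```python
def balance(assignment: dict[str, list[int]],
            threshold: int = 1) -> list[tuple[int, str, str]]:
    """Move chunks until shard chunk counts differ by at most `threshold`.

    Hypothetical, simplified balancer: MongoDB's real balancer uses its own
    migration thresholds and copies data between live servers.
    """
    if threshold < 1:
        # With threshold 0, any uneven split would make a chunk bounce forever.
        raise ValueError("threshold must be at least 1")
    moves: list[tuple[int, str, str]] = []
    while True:
        most = max(assignment, key=lambda s: len(assignment[s]))
        least = min(assignment, key=lambda s: len(assignment[s]))
        if len(assignment[most]) - len(assignment[least]) <= threshold:
            return moves
        chunk = assignment[most].pop()      # pick a chunk to migrate
        assignment[least].append(chunk)
        moves.append((chunk, most, least))  # (chunk id, from, to)


cluster = {"shardA": list(range(0, 6)), "shardB": list(range(6, 12))}
cluster["shardC"] = []                          # a new, empty shard joins
moves = balance(cluster)
print({s: len(c) for s, c in cluster.items()})  # {'shardA': 4, 'shardB': 4, 'shardC': 4}
print(len(moves))                               # 4
```

Only four of the twelve chunks move, and applications keep addressing data by shard key throughout; the router simply learns the new chunk owners.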
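The "MongoDB - Replication" slide says the master is elected among the servers, so fail-over needs no manual intervention. The sketch below captures the core rule in simplified form: a new primary can only be chosen if a strict majority of the replica set is reachable, and the most up-to-date eligible member wins. `Member`, `optime`, `priority` and `elect_primary` are hypothetical names, and this is not MongoDB's actual election protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    name: str
    optime: int        # position of the last write this member applied
    priority: int = 1  # 0 means "may vote but never become primary"
    up: bool = True


def elect_primary(members: list[Member]) -> Optional[Member]:
    """Pick a new primary, or None if no majority is reachable.

    Simplified sketch, not MongoDB's actual election protocol.
    """
    reachable = [m for m in members if m.up]
    # A strict majority of the *whole* set must be reachable, so two
    # halves of a network partition can never both elect a primary.
    if 2 * len(reachable) <= len(members):
        return None
    candidates = [m for m in reachable if m.priority > 0]
    if not candidates:
        return None
    # Prefer the most up-to-date member; break ties by priority.
    return max(candidates, key=lambda m: (m.optime, m.priority))


replica_set = [
    Member("db1", optime=120, up=False),   # the old primary has failed
    Member("db2", optime=118),
    Member("db3", optime=120, priority=2),
]
print(elect_primary(replica_set).name)  # db3
replica_set[2].up = False               # now only db2 is reachable
print(elect_primary(replica_set))       # None
```

The majority rule is why replica sets are usually deployed with an odd number of members: with two servers, losing either one leaves no majority.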
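The "Master/Slave" slide and the "eventually consistent" point on the "Example: MongoDB" slide both rest on asynchronous replication: the primary applies a write, records it in an operation log (MongoDB calls this the oplog), and secondaries replay that log later. The sketch below simulates this with hypothetical `Primary` and `Secondary` classes to show how a read from a secondary can be stale until it catches up.

```python
from typing import Optional


class Primary:
    """Accepts all writes and records them in an ordered operation log."""

    def __init__(self) -> None:
        self.data: dict[str, int] = {}
        self.oplog: list[tuple[str, int]] = []

    def write(self, key: str, value: int) -> None:
        self.data[key] = value
        self.oplog.append((key, value))


class Secondary:
    """Replays the primary's log asynchronously, so it may lag behind."""

    def __init__(self) -> None:
        self.data: dict[str, int] = {}
        self.applied = 0  # number of log entries replayed so far

    def sync(self, primary: Primary, max_ops: Optional[int] = None) -> None:
        pending = primary.oplog[self.applied:]
        if max_ops is not None:
            pending = pending[:max_ops]  # simulate a partial batch
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)

    def read(self, key: str) -> Optional[int]:
        return self.data.get(key)


primary, secondary = Primary(), Secondary()
primary.write("views", 1)
primary.write("views", 2)
print(secondary.read("views"))   # None: replication has not run yet
secondary.sync(primary, max_ops=1)
print(secondary.read("views"))   # 1: stale, one write behind
secondary.sync(primary)
print(secondary.read("views"))   # 2: converged with the primary
```

Reads sent to the primary always see the latest write; only reads served by secondaries trade freshness for spreading the load.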