Modern Database Systems
@spf13 AKA Steve Francia

Chief Evangelist @
responsible for drivers,
integrations, web & docs
What’s the Point?
๏   Goal: Discover & identify ideal
    storage solution for our needs
๏   History is important
๏   Many options today
๏   Document databases are good
    for Genealogy
History of the
    World
Over 5500 years ago




     2 People
1804
1 Billion People
1927
2 Billion People
World Population Growth (last ~200 years, in billions)
๏   1804: 1
๏   1927: 2
๏   1960: 3
๏   1974: 4
๏   1987: 5
๏   1999: 6
๏   2012: 7
Really Big Data
In the last 50 years...

over 4% of the world's people
were born...

in less than 1% of the time
History of
Databases
1970

๏ Oracle creates the relational database
๏ Everyone happily uses it for the next 43 years
What really
 happened
Let’s start at
the beginning
It’s a story about...

Storing & Retrieving
    Information
Modern Database Systems (for Genealogy)
Even today we still use
the same mediums for
     data storage
With the advent of
the computer things
   really took off
1960 : DBMS Emerges
๏   Ordered set of fixed length fields
๏   Low level pointer operations (flat
    files)
๏   Most popular was IMS (created at
    IBM)
๏   Shockingly still in use today at IBM &
    American Airlines
Lots of Problems
๏   Complex and inflexible
๏   User had to know physical structure of the
    DB in order to query for information
๏   Adding a field to the DB required rewriting
    the underlying access/modification scheme
๏   Records isolated (no relations)
๏   Emphasis on records to be processed, not
    overall structure
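The problems listed above can be illustrated with a small sketch. It is not IMS's actual format; the fixed-width layouts, field names, and the `readField` helper are all hypothetical, chosen only to show why a reader had to know the physical structure and why adding a field broke existing access code.

```javascript
// Hypothetical fixed-width layout: name (10 chars), birth year (4 chars).
const LAYOUT_V1 = { name: [0, 10], born: [10, 14] };

// One flat-file record per entry, padded to the fixed field widths.
const records = [
  "John".padEnd(10) + "1928",
  "Mary".padEnd(10) + "1931",
];

// The caller must know the physical offsets to read a field.
function readField(record, layout, field) {
  const [start, end] = layout[field];
  return record.slice(start, end).trim();
}

const bornYears = records.map((r) => readField(r, LAYOUT_V1, "born"));

// Adding a "city" field shifts every later offset...
const LAYOUT_V2 = { name: [0, 10], city: [10, 20], born: [20, 24] };
const v2Record = "John".padEnd(10) + "Pensacola".padEnd(10) + "1928";

// ...so code still using the old layout silently reads the wrong bytes.
const wrong = readField(v2Record, LAYOUT_V1, "born");
const right = readField(v2Record, LAYOUT_V2, "born");
console.log(bornYears, wrong, right);
```

Every program that touches the file carries its own copy of the layout, which is why a schema change meant rewriting the access code.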
1970 : Relational DB
๏   Edgar Frank “Ted” Codd
๏   Relational Database
    theory
๏   Codd’s 13 rules
    (aka 12 rules)
3 HUGE Advantages
๏   Data independence from hardware
    and storage implementation
๏   Ability to process more than one
    record at a time with a single
    operation
๏   Establishing a relationship
    between records
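As a rough illustration of the last two advantages, the sketch below treats tables as plain arrays. The `people` and `births` tables and the `select` and `join` helpers are hypothetical stand-ins, not any real database API.

```javascript
// Hypothetical in-memory "tables": arrays of rows with named columns.
const people = [
  { id: 1, name: "John", country: "usa" },
  { id: 2, name: "Mary", country: "usa" },
  { id: 3, name: "Olav", country: "norway" },
];
const births = [
  { personId: 1, year: 1928 },
  { personId: 3, year: 1902 },
];

// Set-at-a-time selection: one operation over every matching row,
// with no reference to how the rows are physically stored.
function select(table, predicate) {
  return table.filter(predicate);
}

// A relationship between records, expressed by matching key values.
function join(left, right, leftKey, rightKey) {
  const out = [];
  for (const l of left) {
    for (const r of right) {
      if (l[leftKey] === r[rightKey]) out.push({ ...l, ...r });
    }
  }
  return out;
}

const usPeople = select(people, (row) => row.country === "usa");
const peopleWithBirths = join(people, births, "id", "personId");
console.log(usPeople.length, peopleWithBirths.map((r) => r.name));
```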
IBM vs Codd
๏ IBM bets on IMS
๏ Codd bets on relational DB
๏ Eventually 2 relational prototypes emerge
Ingres

๏ Built at UC Berkeley
๏ Uses QUEL
๏ Inspires Sybase & MSSQL
System R
๏   Built at IBM
๏   Leads to SEQUEL... later SQL
๏   Evolved into SQL/DS which
    evolved into DB2
๏   Project concludes that relational
    model is viable
Oracle
๏   Larry Ellison watches IBM
๏   Starts Relational Software Inc.
๏   Oracle 1st commercial RDBMS
    released in 1979
๏   Beats IBM by 2 years to market
Entity Relationship
๏   Proposed by Peter
    Chen in 1976
๏   Focuses on data use
    and not logical table
    structure
1980s
๏ RDBMS dominates
๏ Some fields (medicine, physics, multimedia) need more than RDBMS offers
๏ Object Databases emerge
Object Databases
๏   Inspired by Entity Relationship
๏   More flexible than relational permits
๏   Tightly coupled with OO
    programming language (c++, later
    Java)
๏   Full object: data & methods stored
1990s
๏ Internet emerges
๏ Data demand spikes
๏ Databases used for archiving historical data
Early 2000s
๏ Internet booms
๏ RDBMS fails to scale
๏ In desperation we take a step backwards
MemcacheD
๏ 1 dimensional
๏ No persistence
๏ No ACI or D
๏ but...
... FAST
2005 ish
๏   Relational + MemcacheD
    broken (and we didn’t know it)
๏   Scale redefined with high
    volume & social
๏   Infrastructure reinvented with
    cloud computing & SSDs
Alternatives Emerge

๏ Dynamo / Key Value
๏ Document
๏ Graph
Modern Data
  Storage
A lot going on
Easiest to define databases in
broad terms
• What is a record?
 (data model)
• CAP : CA, AP, CP ?
 (infrastructure model)
Data Storage Structure
๏   1D: a single key maps to a single value
๏   2D: each record is a flat set of key / value pairs
๏   nD: keys map to value(s), which can themselves hold nested keys and value(s)
Database structure
๏   1D: Key Value, Dynamo, Graph
๏   2D: Relational
๏   nD: Document
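One way to picture the three shapes is to write the same person in each of them. This is only a sketch: the sample person, the key naming scheme, and the `depth` helper are assumptions made for illustration.

```javascript
// 1D: key value. The store sees one opaque value per key.
const keyValue = new Map([
  ["person:1", '{"first":"john","last":"smith"}'],
]);

// 2D: relational. A row is a flat set of column / value pairs.
const row = { id: 1, first: "john", last: "smith" };

// nD: document. Values may be arrays or nested documents.
const doc = {
  _id: 1,
  name: { first: ["john", "johannes"], last: ["smith", "sandvik"] },
};

// Nesting depth of a value: 0 for scalars, +1 per level of object or array.
function depth(value) {
  if (value === null || typeof value !== "object") return 0;
  const children = Object.values(value).map(depth);
  return 1 + Math.max(0, ...children);
}

console.log(depth(keyValue.get("person:1")), depth(row), depth(doc));
```

The key value store holds a string it cannot look inside, the row is one level of named fields, and the document nests as deeply as the data requires.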
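CAP Theorem
[Figure: triangle with corners Availability, Partition Tolerance, Consistency]
CAP Theorem
[Figure: an App connected to two Nodes, with the link between the Nodes broken]
CAP Theorem
๏   Availability + Partition Tolerance (inconsistent): Dynamo, Key Value, NoSQLs
๏   Availability + Consistency (partition intolerant): RDBMS
๏   Consistency + Partition Tolerance (unavailable): MongoDB, BigTable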
Key Value
๏   1 Dimensional storage (tuple)
๏   Query key only
๏   Bucket index (range) on keys
๏   Records cannot be updated, only replaced
๏   Often MultiMaster... meaning availability over consistency
๏   Partitioning easy thanks to single value

Cassandra, Redis, MemcacheD, Riak, DynamoDB
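The access pattern described on this slide can be sketched with a tiny in-memory store. `KeyValueStore` and its methods are hypothetical and do not mirror the API of any product named above; the point is only that lookups go through the key and that a change replaces the whole value.

```javascript
// Minimal in-memory key value store: lookups by key only.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  // put() always replaces the whole value stored under the key.
  put(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key);
  }
  // Range ("bucket") scan over keys; values are never inspected.
  keysBetween(start, end) {
    return [...this.data.keys()].filter((k) => k >= start && k <= end).sort();
  }
}

const store = new KeyValueStore();
store.put("person:1", JSON.stringify({ first: "john", last: "smith" }));
store.put("person:2", JSON.stringify({ first: "mary", last: "jones" }));
store.put("record:1", "birth certificate");

// Changing one field means read, modify, and replace the whole value.
const person = JSON.parse(store.get("person:1"));
person.last = "sandvik";
store.put("person:1", JSON.stringify(person));

const personKeys = store.keysBetween("person:", "person:~");
console.log(personKeys);
```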
Relational
๏   2 Dimensional storage (map)
๏   Query any field
๏   BTree Indexes
๏   Single master meaning consistency > availability
๏   Partitioning hard due to transactions & joins

Oracle, MSSQL, MySQL, PostgreSQL, DB2
Document
๏   n Dimensional storage (hash w/ nesting)
๏   Query any field at any level
๏   BTree Indexes
๏   Single master meaning consistency > availability
๏   Partitioning easy thanks to richer data model

MongoDB, CouchDB, RethinkDB
Graph
 ๏   1 Dimensional storage... but grouped to appear
     2D
 ๏   Differentiated by indexes
 ๏   Large indexes cover many relationships
 ๏   Query time depends on # records returned,
     not distance to get them
 ๏   Doesn’t require traversing to determine
     relationship

Neo4j, about 20 more... nobody talks much about
MongoDB for
 Genealogy
Right Data
  Model
Types of genealogy data
๏   Events (birth, death, etc)
๏   Official records
๏   Census
๏   Names
๏   Relationships
๏   Photographs
๏   Diaries & letters
๏   Ship passenger list
๏   Occupation
๏   and more
Challenges of genealogy data
๏   Lots of possible data points... need flexible schema
๏   Multiple versions of same data point
    (3 different dates for death date, 4 variations on name).
๏   Lots of data associated with physical records
๏   Multiple versions of same nodes
    (intelligent nondestructive merge needed)
๏   Need to have meta data associated
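A nondestructive merge can be as simple as keeping every distinct value from both versions. The sketch below assumes names are stored as lists per name part; `mergeNames` is a hypothetical helper, and a real merge would also need to track who contributed each value.

```javascript
// Merge two versions of a name without discarding any variation.
function mergeNames(a, b) {
  const merged = {};
  for (const part of ["first", "middle", "last"]) {
    // concat() accepts either a single string or an array for each side.
    const values = [].concat(a[part] ?? [], b[part] ?? []);
    // A Set drops exact duplicates while keeping first-seen order.
    merged[part] = [...new Set(values)];
  }
  return merged;
}

const versionA = { first: ["john"], middle: "peter", last: ["smith"] };
const versionB = { first: ["johannes", "john"], last: ["sandvik"] };

const mergedName = mergeNames(versionA, versionB);
console.log(mergedName);
```

Because nothing is overwritten, a later review can still see that one source said "john" and another said "johannes".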
Individual
• AFN
• Modification Date

Name
• First[]
• Middle[]
• Last[]

Events[]
• type
• date
• contributor[]
• record[]

Location
• city
• state
• county
• country
• coordinates[]

User
• Name
• Email Address
• Password
• Individual_id

Record
• contributor
• type
• thumbnail
• content
• description
• tags[]
Individual
individual = {
   _id : ObjectId("4f2978dfaa999d9db02618ce"),
   AFN : '1XYK-KQJ',
   name: {
      first: ['john', 'johannes'],
      middle: 'peter',
      last: ['smith', 'sandvik']
    }
}

// Dotted field names must be quoted
db.individual.find(
{'name.first' : 'john', 'name.middle' : 'peter'})
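The query above relies on dot notation reaching into nested fields, and on a match succeeding when any element of an array equals the value. The following is a simplified sketch of that matching rule in plain JavaScript, not MongoDB's implementation; `valuesAtPath` and `matches` are hypothetical names, and only equality is handled.

```javascript
// Collect every value reachable by a dotted path, descending into arrays.
function valuesAtPath(value, path) {
  const parts = Array.isArray(path) ? path : path.split(".");
  if (Array.isArray(value)) {
    return value.flatMap((item) => valuesAtPath(item, parts));
  }
  if (parts.length === 0) return [value];
  if (value === null || typeof value !== "object") return [];
  const [head, ...rest] = parts;
  if (!(head in value)) return [];
  return valuesAtPath(value[head], rest);
}

// Dates are compared by timestamp, everything else by strict equality.
function sameValue(a, b) {
  if (a instanceof Date && b instanceof Date) return a.getTime() === b.getTime();
  return a === b;
}

// A document matches when every path in the query has at least one equal value.
function matches(doc, query) {
  return Object.entries(query).every(([path, expected]) =>
    valuesAtPath(doc, path).some((v) => sameValue(v, expected))
  );
}

const individual = {
  AFN: "1XYK-KQJ",
  name: { first: ["john", "johannes"], middle: "peter", last: ["smith", "sandvik"] },
};

console.log(matches(individual, { "name.first": "john", "name.middle": "peter" }));
```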
Individual.Events
events : [
    { death : {
       date : ISODate('1989-07-14'),
       location : {
           city: 'pensacola',
           state: 'fl',
           county: 'escambia',
           country: 'usa',
           coordinates : [30.26,87.12]},
       contributor : ObjectId("4eeac...691")}}]

db.individual.find(
{'events.death.date' : ISODate('1989-07-14')})

// $near needs a 2d index on the coordinates field
db.individual.find(
{'events.death.location.coordinates' : { $near:[30,90]}})
Event Versions
events : [
   { birth : [ {
        date : ISODate('1928-04-06'),
        location : {
           city: 'brattleboro',
           state: 'vt',
           county: 'windham',
           country: 'usa',
           coordinates : [42.51,72.34]},
        contributor : ObjectId("4ee...00000"),
        records: ObjectId("4ed8a...7b000000")
   },
   {
        date : ISODate('1928-04-16'),
        location : {
           city: 'brattleboro',
           state: 'vt',
           county: 'windham',
           country: 'usa',
           coordinates : [42.51,72.34]},
        contributor : ObjectId("4ee...37bb"),
        records: ObjectId("4eea...0000c8")
    }] }
]
Query with Versioned Events
events : [
   { birth : [
      { date : ISODate('1928-04-06')},
      { date : ISODate('1928-04-16')}
   ] }
]

// Matches if any recorded version has this date
db.individual.find(
{'events.birth.date' : ISODate('1928-04-16')})
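Once a matching individual comes back, application code usually needs to see all recorded versions of the event side by side. The sketch below assumes `events` is an array of objects keyed by event type, each holding one version or a list of versions; the contributor strings and the helper names are hypothetical.

```javascript
const individual = {
  AFN: "1XYK-KQJ",
  events: [
    {
      birth: [
        { date: new Date("1928-04-06"), contributor: "contributor-a" },
        { date: new Date("1928-04-16"), contributor: "contributor-b" },
      ],
    },
  ],
};

// Return every stored version of one event type (single value or list).
function eventVersions(doc, type) {
  return doc.events
    .filter((e) => type in e)
    .flatMap((e) => [].concat(e[type]));
}

// True when at least one version falls on the given day (YYYY-MM-DD, UTC).
function hasVersionOn(doc, type, isoDay) {
  return eventVersions(doc, type).some(
    (v) => v.date.toISOString().slice(0, 10) === isoDay
  );
}

const birthVersions = eventVersions(individual, "birth");
console.log(birthVersions.map((v) => v.contributor));
```

Keeping the contributor on each version lets the application show where conflicting dates came from instead of silently picking one.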
Records
record1 = {
    _id : ObjectId("4ed8aea7d8562f7d7b"),
    contributor : ObjectId("4eeab...1537bb"),
    type : 'birth certificate',
    thumbnail : BinData(0,"/9j/4AAQSkZJ...."),
    content : BinData(0,"j6b/Id11lWqs..."),
    tags : ['NY', 'certified'],
    description : "John's birth certificate"
}
Right Scale
MongoDB: Scale built in
๏   Intelligent replication
๏   Automatic partitioning of data
    (user configurable)
๏   Horizontal Scale
๏   Targeted Queries
๏   Parallel Processing
Intelligent Replication
[Figure: Node 1 (Secondary) and Node 2 (Secondary) exchange a Heartbeat; both are linked by Replication to Node 3 (Primary)]
Scalable Architecture
[Figure: three App Servers, each above a Mongos; three Config Servers alongside; three Shards below]
High Availability in Shards
[Figure: a Shard is either a single Mongod, or one Primary with two Secondaries]
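Targeted Requests
[Figure: steps 1 to 4. The request reaches Mongos (1), goes to a single Shard (2), the Shard replies (3), and Mongos returns the result (4)]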
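Parallel processing
[Figure: steps 1 to 6. The request reaches Mongos (1), is sent to all three Shards (2), each Shard processes it (3) and replies (4), Mongos combines the replies (5) and returns the result (6)]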
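The difference between a targeted request and a parallel one comes down to whether the query includes the shard key. The sketch below is a toy router, not mongos itself: the shard names, the key ranges on a hypothetical `last` field, and the `route` function are all assumptions for illustration.

```javascript
// Hypothetical key ranges per shard, as half-open intervals [min, max).
const shards = [
  { name: "shardA", min: "a", max: "i" },
  { name: "shardB", min: "i", max: "r" },
  { name: "shardC", min: "r", max: "{" }, // "{" sorts just after "z"
];

// Decide which shards must receive a query.
function route(query, shardKey) {
  if (shardKey in query) {
    // Targeted: only the shard whose range holds the key value.
    const value = query[shardKey];
    return shards
      .filter((s) => value >= s.min && value < s.max)
      .map((s) => s.name);
  }
  // No shard key in the query: every shard works on it in parallel.
  return shards.map((s) => s.name);
}

const targeted = route({ last: "smith" }, "last");
const broadcast = route({ first: "john" }, "last");
console.log(targeted, broadcast);
```

A query on `last` touches one shard, while a query on `first` alone has to be sent to all of them and merged afterwards.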
Right Feature
     Set
Broad Feature Set
๏   Rich query language
๏   Native support for over 12 languages
๏   GeoSpatial
๏   Text search
๏   Aggregation & MapReduce
๏   GridFS
    (distributed & replicated file storage)
๏   Integration with Hadoop, Solr & more
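GridFS, listed above, stores a large file as a series of chunk documents that reference the file by `files_id` and carry a sequence number `n`. The sketch below mimics that layout in memory; `toChunks`, `fromChunks`, the file id, and the tiny chunk size are hypothetical and chosen only to keep the example readable.

```javascript
// Split a file's bytes into ordered chunk documents.
function toChunks(fileId, bytes, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += chunkSize, n++) {
    chunks.push({
      files_id: fileId,
      n,
      data: bytes.subarray(offset, offset + chunkSize),
    });
  }
  return chunks;
}

// Reassemble the file by sorting chunks on n and concatenating their data.
function fromChunks(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Buffer.concat(ordered.map((c) => c.data));
}

const file = Buffer.from("0123456789");
const chunks = toChunks("file-1", file, 4);

// Chunks may come back in any order; n restores the original sequence.
const restored = fromChunks([...chunks].reverse());
console.log(chunks.length, restored.toString());
```

Because each chunk is an ordinary document, a scanned record or photograph can be replicated and partitioned like any other data.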
Last Year I
presented
on Graph in
MongoDB



      http://j.mp/XvJ3dl
FamilySearch
presented in
December
2012




      http://j.mp/X03TXp
http://spf13.com
            http://github.com/spf13
            @spf13



Questions?
download at mongodb.org

Más contenido relacionado

La actualidad más candente

ELK Stack - Kibana操作實務
ELK Stack - Kibana操作實務ELK Stack - Kibana操作實務
ELK Stack - Kibana操作實務Kedy Chang
 
Database administration
Database administrationDatabase administration
Database administrationRanidm
 
OrientDB - the 2nd generation of (Multi-Model) NoSQL - Devoxx Belgium 2015
OrientDB - the 2nd generation  of  (Multi-Model) NoSQL - Devoxx Belgium 2015OrientDB - the 2nd generation  of  (Multi-Model) NoSQL - Devoxx Belgium 2015
OrientDB - the 2nd generation of (Multi-Model) NoSQL - Devoxx Belgium 2015Luigi Dell'Aquila
 
How to build massive service for advance
How to build massive service for advanceHow to build massive service for advance
How to build massive service for advanceDaeMyung Kang
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for ArchitectsNick Dimiduk
 
Redshift at Lightspeed: How to continuously optimize and modify Redshift sche...
Redshift at Lightspeed: How to continuously optimize and modify Redshift sche...Redshift at Lightspeed: How to continuously optimize and modify Redshift sche...
Redshift at Lightspeed: How to continuously optimize and modify Redshift sche...Amazon Web Services
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit
 
Slash n near real time indexing
Slash n   near real time indexingSlash n   near real time indexing
Slash n near real time indexingUmesh Prasad
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseDataWorks Summit/Hadoop Summit
 
Hive Data Modeling and Query Optimization
Hive Data Modeling and Query OptimizationHive Data Modeling and Query Optimization
Hive Data Modeling and Query OptimizationEyad Garelnabi
 
Transparent Data Encryption in PostgreSQL
Transparent Data Encryption in PostgreSQLTransparent Data Encryption in PostgreSQL
Transparent Data Encryption in PostgreSQLMasahiko Sawada
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat SheetHortonworks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Dremel: Interactive Analysis of Web-Scale Datasets
Dremel: Interactive Analysis of Web-Scale Datasets Dremel: Interactive Analysis of Web-Scale Datasets
Dremel: Interactive Analysis of Web-Scale Datasets robertlz
 
Efficient logging in multithreaded C++ server
Efficient logging in multithreaded C++ serverEfficient logging in multithreaded C++ server
Efficient logging in multithreaded C++ serverShuo Chen
 
Massive service basic
Massive service basicMassive service basic
Massive service basicDaeMyung Kang
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Databricks
 

La actualidad más candente (20)

ELK Stack - Kibana操作實務
ELK Stack - Kibana操作實務ELK Stack - Kibana操作實務
ELK Stack - Kibana操作實務
 
Database administration
Database administrationDatabase administration
Database administration
 
OrientDB - the 2nd generation of (Multi-Model) NoSQL - Devoxx Belgium 2015
OrientDB - the 2nd generation  of  (Multi-Model) NoSQL - Devoxx Belgium 2015OrientDB - the 2nd generation  of  (Multi-Model) NoSQL - Devoxx Belgium 2015
OrientDB - the 2nd generation of (Multi-Model) NoSQL - Devoxx Belgium 2015
 
How to build massive service for advance
How to build massive service for advanceHow to build massive service for advance
How to build massive service for advance
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
 
Redshift at Lightspeed: How to continuously optimize and modify Redshift sche...
Redshift at Lightspeed: How to continuously optimize and modify Redshift sche...Redshift at Lightspeed: How to continuously optimize and modify Redshift sche...
Redshift at Lightspeed: How to continuously optimize and modify Redshift sche...
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
Slash n near real time indexing
Slash n   near real time indexingSlash n   near real time indexing
Slash n near real time indexing
 
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBaseApache Phoenix and HBase: Past, Present and Future of SQL over HBase
Apache Phoenix and HBase: Past, Present and Future of SQL over HBase
 
Hive Data Modeling and Query Optimization
Hive Data Modeling and Query OptimizationHive Data Modeling and Query Optimization
Hive Data Modeling and Query Optimization
 
Transparent Data Encryption in PostgreSQL
Transparent Data Encryption in PostgreSQLTransparent Data Encryption in PostgreSQL
Transparent Data Encryption in PostgreSQL
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat Sheet
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Sql Server Basics
Sql Server BasicsSql Server Basics
Sql Server Basics
 
Dremel: Interactive Analysis of Web-Scale Datasets
Dremel: Interactive Analysis of Web-Scale Datasets Dremel: Interactive Analysis of Web-Scale Datasets
Dremel: Interactive Analysis of Web-Scale Datasets
 
Efficient logging in multithreaded C++ server
Efficient logging in multithreaded C++ serverEfficient logging in multithreaded C++ server
Efficient logging in multithreaded C++ server
 
Massive service basic
Massive service basicMassive service basic
Massive service basic
 
Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...Why you should care about data layout in the file system with Cheng Lian and ...
Why you should care about data layout in the file system with Cheng Lian and ...
 
Hive: Loading Data
Hive: Loading DataHive: Loading Data
Hive: Loading Data
 
Database Testing
Database TestingDatabase Testing
Database Testing
 

Similar a Modern Database Systems (for Genealogy)

Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQLYan Cui
 
An Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDBAn Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDBWilliam LaForest
 
Everything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBEverything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBjhugg
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Jon Haddad
 
NoSQL in the context of Social Web
NoSQL in the context of Social WebNoSQL in the context of Social Web
NoSQL in the context of Social WebBogdan Gaza
 
Plmce2012 scaling pinterest
Plmce2012 scaling pinterestPlmce2012 scaling pinterest
Plmce2012 scaling pinterestMohit Jain
 
Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?Saltmarch Media
 
SQL Server 2008 Overview
SQL Server 2008 OverviewSQL Server 2008 Overview
SQL Server 2008 OverviewDavid Chou
 
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxGetting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxData Con LA
 
Evolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/SpecialistEvolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/SpecialistTony Rogerson
 
Combine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quicklCombine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quicklNeo4j
 
SQL vs. NoSQL. It's always a hard choice.
SQL vs. NoSQL. It's always a hard choice.SQL vs. NoSQL. It's always a hard choice.
SQL vs. NoSQL. It's always a hard choice.Denis Reznik
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to CassandraJon Haddad
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Don Demcsak
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Chris Richardson
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLMongoDB
 
Back to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQLBack to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQLJoe Drumgoole
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQLDon Demcsak
 

Similar a Modern Database Systems (for Genealogy) (20)

Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
An Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDBAn Introduction to Big Data, NoSQL and MongoDB
An Introduction to Big Data, NoSQL and MongoDB
 
Everything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDBEverything We Learned About In-Memory Data Layout While Building VoltDB
Everything We Learned About In-Memory Data Layout While Building VoltDB
 
Nosql
NosqlNosql
Nosql
 
Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)Spark and cassandra (Hulu Talk)
Spark and cassandra (Hulu Talk)
 
NoSQL in the context of Social Web
NoSQL in the context of Social WebNoSQL in the context of Social Web
NoSQL in the context of Social Web
 
Plmce2012 scaling pinterest
Plmce2012 scaling pinterestPlmce2012 scaling pinterest
Plmce2012 scaling pinterest
 
Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?Is NoSQL The Future of Data Storage?
Is NoSQL The Future of Data Storage?
 
SQL Server 2008 Overview
SQL Server 2008 OverviewSQL Server 2008 Overview
SQL Server 2008 Overview
 
Getting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of DatastaxGetting started with Spark & Cassandra by Jon Haddad of Datastax
Getting started with Spark & Cassandra by Jon Haddad of Datastax
 
Evolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/SpecialistEvolution of the DBA to Data Platform Administrator/Specialist
Evolution of the DBA to Data Platform Administrator/Specialist
 
Combine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quicklCombine Spring Data Neo4j and Spring Boot to quickl
Combine Spring Data Neo4j and Spring Boot to quickl
 
SQL vs. NoSQL. It's always a hard choice.
SQL vs. NoSQL. It's always a hard choice.SQL vs. NoSQL. It's always a hard choice.
SQL vs. NoSQL. It's always a hard choice.
 
Intro to Cassandra
Intro to CassandraIntro to Cassandra
Intro to Cassandra
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)
 
iForum 2015: SQL vs. NoSQL
iForum 2015: SQL vs. NoSQLiForum 2015: SQL vs. NoSQL
iForum 2015: SQL vs. NoSQL
 
Back to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQLBack to Basics Webinar 1: Introduction to NoSQL
Back to Basics Webinar 1: Introduction to NoSQL
 
Back to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQLBack to Basics 2017 - Introduction to NoSQL
Back to Basics 2017 - Introduction to NoSQL
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
 

Más de Steven Francia

State of the Gopher Nation - Golang - August 2017
State of the Gopher Nation - Golang - August 2017State of the Gopher Nation - Golang - August 2017
State of the Gopher Nation - Golang - August 2017Steven Francia
 
Building Awesome CLI apps in Go
Building Awesome CLI apps in GoBuilding Awesome CLI apps in Go
Building Awesome CLI apps in GoSteven Francia
 
The Future of the Operating System - Keynote LinuxCon 2015
The Future of the Operating System -  Keynote LinuxCon 2015The Future of the Operating System -  Keynote LinuxCon 2015
The Future of the Operating System - Keynote LinuxCon 2015Steven Francia
 
7 Common Mistakes in Go (2015)
7 Common Mistakes in Go (2015)7 Common Mistakes in Go (2015)
7 Common Mistakes in Go (2015)Steven Francia
 
What every successful open source project needs
What every successful open source project needsWhat every successful open source project needs
What every successful open source project needsSteven Francia
 
7 Common mistakes in Go and when to avoid them
7 Common mistakes in Go and when to avoid them7 Common mistakes in Go and when to avoid them
7 Common mistakes in Go and when to avoid themSteven Francia
 
Go for Object Oriented Programmers or Object Oriented Programming without Obj...
Go for Object Oriented Programmers or Object Oriented Programming without Obj...Go for Object Oriented Programmers or Object Oriented Programming without Obj...
Go for Object Oriented Programmers or Object Oriented Programming without Obj...Steven Francia
 
Painless Data Storage with MongoDB & Go
Painless Data Storage with MongoDB & Go Painless Data Storage with MongoDB & Go
Painless Data Storage with MongoDB & Go Steven Francia
 
Getting Started with Go
Getting Started with GoGetting Started with Go
Getting Started with GoSteven Francia
 
Build your first MongoDB App in Ruby @ StrangeLoop 2013
Build your first MongoDB App in Ruby @ StrangeLoop 2013Build your first MongoDB App in Ruby @ StrangeLoop 2013
Build your first MongoDB App in Ruby @ StrangeLoop 2013Steven Francia
 
Introduction to MongoDB and Hadoop
Introduction to MongoDB and HadoopIntroduction to MongoDB and Hadoop
Introduction to MongoDB and HadoopSteven Francia
 
MongoDB, Hadoop and humongous data - MongoSV 2012
MongoDB, Hadoop and humongous data - MongoSV 2012MongoDB, Hadoop and humongous data - MongoSV 2012
MongoDB, Hadoop and humongous data - MongoSV 2012Steven Francia
 
Big data for the rest of us
Big data for the rest of usBig data for the rest of us
Big data for the rest of usSteven Francia
 
OSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB TutorialOSCON 2012 MongoDB Tutorial
OSCON 2012 MongoDB TutorialSteven Francia
 
Replication, Durability, and Disaster Recovery
Replication, Durability, and Disaster RecoveryReplication, Durability, and Disaster Recovery
Replication, Durability, and Disaster RecoverySteven Francia
 
Multi Data Center Strategies
Multi Data Center StrategiesMulti Data Center Strategies
Multi Data Center StrategiesSteven Francia
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big dataSteven Francia
 
MongoDB, Hadoop and Humongous Data
MongoDB, Hadoop and Humongous DataMongoDB, Hadoop and Humongous Data
MongoDB, Hadoop and Humongous DataSteven Francia
 

Más de Steven Francia (20)

State of the Gopher Nation - Golang - August 2017
State of the Gopher Nation - Golang - August 2017State of the Gopher Nation - Golang - August 2017
State of the Gopher Nation - Golang - August 2017
 
Building Awesome CLI apps in Go
Building Awesome CLI apps in GoBuilding Awesome CLI apps in Go
Building Awesome CLI apps in Go
 
The Future of the Operating System - Keynote LinuxCon 2015



Modern Database Systems (for Genealogy)

  • 1. Modern Database Systems
  • 2. @spf13 AKA Steve Francia Chief Evangelist @ responsible for drivers, integrations, web & docs
  • 3. What’s the Point? ๏ Goal: Discover & identify ideal storage solution for our needs ๏ History is important ๏ Many options today ๏ Document databases are good for Genealogy
  • 5. Over 5500 years ago 2 People
  • 9. World Population Growth (last ~200 years, in billions): 1 (1804), 2 (1927), 3 (1960), 4 (1974), 5 (1987), 6 (1999), 7 (2012)
  • 10. Really Big Data In the last 50 years... over 4 % of the world people were born... in less than 1 % of the time
  • 12. 1970 ๏ Oracle creates the relational database ๏ Everyone happily uses it for the next 43 years
  • 14. Let’s start at the beginning
  • 15. It’s a story about... Storing & Retrieving Information
  • 20. Even today we still use the same mediums for data storage
  • 23. With the advent of the computer things really took off
  • 24. 1960 : DBMS Emerges ๏ Ordered set of fixed length fields ๏ Low level pointer operations (flat files) ๏ Most popular was IMS (created at IBM) ๏ Shockingly still in use today at IBM & American Airlines
  • 25. Lots of Problems ๏ Complex and inflexible ๏ User had to know physical structure of the DB in order to query for information ๏ Adding a field to the DB required rewriting the underlying access/modification scheme ๏ Records isolated (no relations) ๏ Emphasis on records to be processed, not overall structure
  • 26. 1970 : Relational DB ๏ Edgar Frank “Ted” Codd ๏ Relational Database theory ๏ Codd’s 13 rules (aka 12 rules)
  • 27. 3 HUGE Advantages ๏ Data independence from hardware and storage implementation ๏ Ability to process more than one record at a time with a single operation ๏ Establishing a relationship between records
  • 28. IBM vs Codd ๏ IBM bet on IMS ๏ Codd bets on relational DB ๏ Eventually 2 relational prototypes emerge
  • 29. Ingres ๏ Built at UC Berkeley ๏ Uses QUEL ๏ Inspires Sybase & MSSQL
  • 30. System R ๏ Built at IBM ๏ Leads to SEQUEL... later SQL ๏ Evolved into SQL/DS which evolved into DB2 ๏ Project concludes that relational model is viable
  • 31. Oracle ๏ Larry Ellison watches IBM ๏ Starts Relational Software Inc. ๏ Oracle 1st commercial RDBMS released in 1979 ๏ Beats IBM by 2 years to market
  • 32. Entity Relationship ๏ Proposed by Peter Chen in 1976 ๏ Focuses on data use and not logical table structure
  • 33. 1980s ๏ RDBMS dominates ๏ Some fields (medicine, physics, multimedia) need more than RDBMS offers ๏ Object Databases emerge
  • 34. Object Databases ๏ Inspired by Entity Relationship ๏ More flexible than relational permits ๏ Tightly coupled with OO programming language (C++, later Java) ๏ Full object: data & methods stored
  • 35. 1990s ๏ Internet emerges ๏ Data demand spikes ๏ Databases used for archiving historical data
  • 36. Early 2000s ๏ Internet booms ๏ RDBMS fails to scale ๏ In desperation we take a step backwards
  • 37. MemcacheD ๏ 1 dimensional ๏ No persistence ๏ No ACI or D ๏ but...
  • 39. 2005 ish ๏ Relational + MemcacheD broken (and we didn’t know it) ๏ Scale redefined with high volume & social ๏ Infrastructure reinvented with cloud computing & SSDs
  • 40. Alternatives Emerge ๏ Dynamo / Key Value ๏ Document ๏ Graph
  • 41. Modern Data Storage
  • 42. A lot going on ๏ Easiest to define databases in broad terms ๏ What is a record? (data model) ๏ CAP: CA, AP, CP? (infrastructure model)
  • 43. Data Storage Structure [diagram: three layouts labeled 1D, 2D and nD, drawn as Key, Key/Value and Key/Value(s) cells]
  • 44. Database structure ๏ 1D: Key Value, Dynamo ๏ 2D: Relational, Graph ๏ nD: Document
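The 1D / 2D / nD distinction on the two slides above can be made concrete with one record stored three ways. This is only an illustrative sketch in plain JavaScript: the key `person:1`, the table names and the helper function are hypothetical, and the name values are borrowed from the Individual slide later in the deck.

```javascript
// 1D (key-value): the store sees one opaque value per key and can only look it up by key.
const keyValue = new Map([
  ["person:1", JSON.stringify({ first: ["john", "johannes"], middle: "peter" })],
]);

// 2D (relational): flat rows with fixed columns; a multi-valued attribute
// needs a second table and a join back to the parent row.
const individuals = [{ id: 1, middle: "peter" }];
const firstNames = [
  { individualId: 1, first: "john" },
  { individualId: 1, first: "johannes" },
];

// nD (document): nested fields and arrays live inside the record itself.
const doc = { _id: 1, name: { first: ["john", "johannes"], middle: "peter" } };

// Hypothetical helper: the "join" needed to rebuild the list from relational rows.
function firstNamesRelational(id) {
  return firstNames.filter((row) => row.individualId === id).map((row) => row.first);
}

console.log(JSON.parse(keyValue.get("person:1")).first); // whole value fetched by key
console.log(firstNamesRelational(individuals[0].id));    // rebuilt through a join
console.log(doc.name.first);                             // read directly from the document
```

All three reads return the same list of first names; what differs is how much work the application does to get it.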
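  • 45. CAP Theorem ๏ Consistency ๏ Availability ๏ Partitioning
  • 47. CAP Theorem [diagram: triangle with corners Availability, Consistency and Partition Tolerant; RDBMS sits between Availability and Consistency (Intolerant); Dynamo and Key Value NoSQLs sit between Availability and Partition Tolerant (Inconsistent); MongoDB and BigTable sit between Consistency and Partition Tolerant (Unavailable)]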
  • 48. Key Value ๏ 1 Dimensional storage (tuple) ๏ Query key only ๏ Bucket index (range) on keys ๏ Records cannot be updated, only value replaced ๏ Often MultiMaster... meaning availability over consistency ๏ Partitioning easy thanks to single value ๏ Examples: Cassandra, Redis, MemcacheD, Riak, DynamoDB
  • 49. Relational ๏ 2 Dimensional storage (map) ๏ Query any field ๏ BTree Indexes ๏ Single master meaning consistency > availability ๏ Partitioning hard due to transactions & joins ๏ Examples: Oracle, MSSQL, MySQL, PostgreSQL, DB2
  • 50. Document ๏ n Dimensional storage (hash w/ nesting) ๏ Query any field at any level ๏ BTree Indexes ๏ Single master meaning consistency > availability ๏ Partitioning easy thanks to richer data model ๏ Examples: MongoDB, CouchDB, RethinkDB
  • 51. Graph ๏ 1 Dimensional storage... but grouped to appear 2D ๏ Differentiated by indexes ๏ Large indexes cover many relationships ๏ Query time depends on # records returned, not distance to get them ๏ Doesn’t require traversing to determine relationship ๏ Examples: Neo4j, and about 20 more that nobody talks much about
  • 53. Right Data Model
  • 54. Types of genealogy data ๏ Events (birth, death, etc) ๏ Official records ๏ Census ๏ Names ๏ Relationships ๏ Photographs ๏ Diaries & letters ๏ Ship passenger list ๏ Occupation ๏ and more
  • 55. Challenges of genealogy data ๏ Lots of possible data points... need flexible schema ๏ Multiple versions of same data point (3 different dates for death date, 4 variations on name) ๏ Lots of data associated with physical records ๏ Multiple versions of same nodes (intelligent nondestructive merge needed) ๏ Need to have metadata associated
  • 56. ๏ Individual: AFN, Modification Date ๏ User: Name, Email Address, Password, Individual_id ๏ Events[]: type, date, contributor[], record[] ๏ Name: First[], Middle[], Last[] ๏ Location: city, state, county, country, coordinates[] ๏ Record: contributor, type, thumbnail, content, description, tags[]
  • 57. Individual individual = { _id : ObjectId("4f2978dfaa999d9db02618ce"), AFN : '1XYK-KQJ', name : { first : ['john', 'johannes'], middle : 'peter', last : ['smith', 'sandvik'] } } db.individual.find({ 'name.first' : 'john', 'name.middle' : 'peter' })
  • 58. Individual.Events events : [ { death : { date : ISODate('1989-07-14'), location : { city : 'pensacola', state : 'fl', county : 'escambia', country : 'usa', coordinates : [30.26,87.12] }, contributor : ObjectId("4eeac...691") } } ] db.individual.find({ 'events.death.date' : ISODate('1989-07-14') }) db.individual.find({ 'events.death.location.coordinates' : { $near : [30,90] } })
  • 59. Event Versions events : [ { birth : [ { date : ISODate('1928-04-06'), location : { city : 'brattleboro', state : 'vt', county : 'windham', country : 'usa', coordinates : [42.51,72.34] }, contributor : ObjectId("4ee...00000"), records : ObjectId("4ed8a...7b000000") }, { date : ISODate('1928-04-16'), location : { city : 'brattleboro', state : 'vt', county : 'windham', country : 'usa', coordinates : [42.51,72.34] }, contributor : ObjectId("4ee...37bb"), records : ObjectId("4eea...0000c8") } ] } ]
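The "nondestructive merge" asked for on the challenges slide can be sketched as an append-only operation on a list of versions like the one above. This is a minimal illustration in plain JavaScript, not MongoDB code: the function name `addVersion`, the string dates and the contributor ids are all hypothetical stand-ins.

```javascript
// Append a candidate version unless an identical one (same date, same contributor)
// is already recorded. Existing versions are never modified or dropped.
function addVersion(versions, candidate) {
  const duplicate = versions.some(
    (v) => v.date === candidate.date && v.contributor === candidate.contributor
  );
  return duplicate ? versions : [...versions, candidate];
}

// Hypothetical sample data: two contributors disagree on a birth date.
const birth = [{ date: "1928-04-06", contributor: "contributor-a" }];
const merged = addVersion(birth, { date: "1928-04-16", contributor: "contributor-b" });

console.log(birth.length);  // the original list is left untouched
console.log(merged.length); // both versions are kept side by side
```

Keeping every version and deciding later which one to display is what lets conflicting sources coexist in a single record.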
  • 60. Query with Versioned Events events : [ { birth : [ { date : ISODate('1928-04-06') }, { date : ISODate('1928-04-16') } ] } ] db.individual.find({ 'events.birth.date' : ISODate('1928-04-16') })
  • 61. Records record1 = { _id : ObjectId("4ed8aea7d8562f7d7b"), contributor : ObjectId("4eeab...1537bb"), type : 'birth certificate', thumbnail : BinData(0,"/9j/4AAQSkZJ...."), content : BinData(0,"j6b/Id11lWqs..."), tags : ['NY', 'certified'], description : "John's birth certificate" }
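Why does a single dotted path find a date no matter which version holds it? The sketch below is a simplified model of that matching in plain JavaScript, not MongoDB's actual implementation: `valuesAtPath` and `matches` are hypothetical helpers, dates are plain strings instead of `ISODate`, and only equality is handled.

```javascript
// Collect every value reachable through a dotted path.
// Arrays are transparent: the remaining path is applied to each element.
function valuesAtPath(node, parts) {
  if (Array.isArray(node)) {
    return node.flatMap((item) => valuesAtPath(item, parts));
  }
  if (parts.length === 0) return [node];
  if (node === null || typeof node !== "object") return [];
  const [head, ...rest] = parts;
  return head in node ? valuesAtPath(node[head], rest) : [];
}

// A document matches when any value found at the path equals the expected one.
function matches(doc, path, expected) {
  return valuesAtPath(doc, path.split(".")).some((value) => value === expected);
}

// Sample document shaped like the slides, with string dates for simplicity.
const individual = {
  name: { first: ["john", "johannes"], middle: "peter" },
  events: [{ birth: [{ date: "1928-04-06" }, { date: "1928-04-16" }] }],
};

console.log(matches(individual, "events.birth.date", "1928-04-16")); // true
console.log(matches(individual, "name.first", "johannes"));          // true
console.log(matches(individual, "events.birth.date", "1928-05-01")); // false
```

Because arrays are walked element by element, adding a third or fourth version of an event requires no change to the query.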
  • 63. MongoDB: Scale built in ๏ Intelligent replication ๏ Automatic partitioning of data (user configurable) ๏ Horizontal Scale ๏ Targeted Queries ๏ Parallel Processing
  • 64. Intelligent Replication [diagram: Node 1 (Secondary) and Node 2 (Secondary) linked by a heartbeat, each connected to Node 3 (Primary) by replication]
  • 65. Scalable Architecture [diagram: App Servers, each with a Mongos, in front of Config Servers and three Shards of replicated nodes]
  • 66. High Availability in Shards [diagram: each Shard is either a single Mongod or a Primary with Secondaries]
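  • 67. Targeted Requests [diagram: steps 1 to 4, a request passes through Mongos to a single Shard and back]
  • 68. Parallel processing [diagram: steps 1 to 6, Mongos sends a request to all three Shards in parallel and combines their results]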
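The difference between the targeted and parallel slides comes down to whether the query contains the key the data is partitioned on. Here is a rough sketch of that routing decision in plain JavaScript, assuming range-based partitioning on a string key. The chunk ranges, the shard names and the `route` function are hypothetical and are not how Mongos is actually implemented.

```javascript
// Hypothetical chunk map: each shard owns a half-open range [min, max) of the shard key.
const chunks = [
  { shard: "shardA", min: "", max: "h" },
  { shard: "shardB", min: "h", max: "p" },
  { shard: "shardC", min: "p", max: "\uffff" },
];

// Return the list of shards a query must visit.
function route(query, shardKey) {
  if (shardKey in query) {
    // Targeted request: the shard key value pins the query to one range.
    const value = query[shardKey];
    const owner = chunks.find((c) => c.min <= value && value < c.max);
    if (owner) return [owner.shard];
  }
  // No usable shard key: ask every shard in parallel and merge the results.
  return chunks.map((c) => c.shard);
}

console.log(route({ last: "smith" }, "last")); // one shard
console.log(route({ first: "john" }, "last")); // all shards
```

The choice of shard key therefore decides which queries stay cheap as the cluster grows: queries that include it touch one shard, everything else touches all of them.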
  • 70. Broad Feature Set ๏ Rich query language ๏ Native support for over 12 languages ๏ GeoSpatial ๏ Text search ๏ Aggregation & MapReduce ๏ GridFS (distributed & replicated file storage) ๏ Integration with Hadoop, Solr & more
  • 71. Last Year I presented on Graph in MongoDB http://j.mp/XvJ3dl
  • 76. http://spf13.com http://github.com/spf13 @spf13 Questions? download at mongodb.org