Modern Database
   Systems
@spf13

                  AKA
Steve Francia




Chief Evangelist @
responsible for drivers,
integrations, web & docs
What’s the Point?
๏   Goal: Discover & identify ideal
    storage solution for our needs
๏   History is important
๏   Many options today
๏   Document databases are good
    for Genealogy
History of the
    World
Over 5500 years ago




     2 People
1804
1 Billion People
1927
2 Billion People
World Population Growth
(last ~200 years, in billions)
[Bar chart: 1 billion in 1804, 2 in 1927, 3 in 1960, 4 in 1974, 5 in 1987, 6 in 1999, 7 in 2012]
Really Big Data
In the last 50 years...

over 4% of the world's people
were born...

in less than 1% of the time
History of
Databases
1970

๏ Oracle creates the relational database
๏ Everyone happily uses it for the next 43 years
What really
 happened
Let’s start at
the beginning
It’s a story about...

Storing & Retrieving
    Information
Even today we still use
the same mediums for
     data storage
With the advent of
the computer things
   really took off
1960 : DBMS Emerges
๏   Ordered set of fixed length fields
๏   Low level pointer operations (flat
    files)
๏   Most popular was IMS (created at
    IBM)
๏   Shockingly still in use today at IBM &
    American Airlines
Lots of Problems
๏   Complex and inflexible
๏   User had to know physical structure of the
    DB in order to query for information
๏   Adding a field to the DB required rewriting
    the underlying access/modification scheme
๏   Records isolated (no relations)
๏   Emphasis on records to be processed, not
    overall structure
1970 : Relational DB
๏   Edgar Frank “Ted” Codd
๏   Relational Database
    theory
๏   Codd’s 13 rules
    (aka 12 rules)
3 HUGE Advantages
๏   Data independence from hardware
    and storage implementation
๏   Ability to process more than one
    record at a time with a single
    operation
๏   Establishing a relationship
    between records
IBM vs Codd
๏ IBM bets on IMS
๏ Codd bets on relational DB
๏ Eventually 2 relational prototypes emerge
Ingres

๏ Built at UC Berkeley
๏ Uses QUEL
๏ Inspires Sybase & MSSQL
System R
๏   Built at IBM
๏   Leads to SEQUEL... later SQL
๏   Evolved into SQL/DS which
    evolved into DB2
๏   Project concludes that relational
    model is viable
Oracle
๏   Larry Ellison watches IBM
๏   Starts Relational Software Inc.
๏   Oracle 1st commercial RDBMS
    released in 1979
๏   Beats IBM by 2 years to market
Entity Relationship
๏   Proposed by Peter
    Chen in 1976
๏   Focuses on data use
    and not logical table
    structure
1980s
๏ RDBMS dominates
๏ Some fields (medicine, physics, multimedia) need more than RDBMS offers
๏ Object Databases emerge
Object Databases
๏   Inspired by Entity Relationship
๏   More flexible than the relational model permits
๏   Tightly coupled with OO
    programming languages (C++, later
    Java)
๏   Full object: data & methods stored
1990s
๏ Internet emerges
๏ Data demand spikes
๏ Databases used for archiving historical data
Early 2000s
๏ Internet booms
๏ RDBMS fails to scale
๏ In desperation we take a step backwards
MemcacheD
๏ 1 dimensional
๏ No persistence
๏ No ACI or D
๏ but...
... FAST
2005 ish
๏   Relational + MemcacheD
    broken (and we didn’t know it)
๏   Scale redefined with high
    volume & social
๏   Infrastructure reinvented with
    cloud computing & SSDs
Alternatives Emerge

๏ Dynamo   / Key Value
๏ Document

๏ Graph
Modern Data
  Storage
A lot going on
Easiest to define databases in
broad terms
• What is a record?
 (data model)
• CAP : CA, AP, CP ?
 (infrastructure model)
Data Storage Structure
๏ 1D: one Key, one Value
๏ 2D: a set of Keys, each with one Value
๏ nD: a set of Keys, each with one or more Value(s), where a Value can hold further nested Keys and Value(s)
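The three shapes can be sketched in plain JavaScript. The sample person and field names are made up for illustration, and the `depth` helper is a hypothetical way to count nesting levels (arrays count as a level); none of it is tied to a particular database.

```javascript
// 1D: one opaque value per key (key value store)
const oneD = new Map([['person:1', 'john smith']]);

// 2D: a flat set of key/value pairs, one value per key (relational row)
const twoD = { id: 1, first: 'john', last: 'smith' };

// nD: values can be lists or nested records (document)
const nD = {
  id: 1,
  name: { first: ['john', 'johannes'], last: ['smith', 'sandvik'] }
};

// Count nesting levels: a scalar is 0, a flat record is 1.
function depth(value) {
  if (value === null || typeof value !== 'object') return 0;
  const children = Object.values(value).map(depth);
  return 1 + Math.max(0, ...children);
}
```

Here `depth(twoD)` is 1 while `depth(nD)` is larger, which is roughly what the slide means by 2D versus nD.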
Database structure
๏ 1D: Key Value, Dynamo, Graph
๏ 2D: Relational
๏ nD: Document
CAP Theorem
[Diagram: triangle with corners Availability, Partitioning and Consistency]
CAP Theorem
[Diagram: an App and two Nodes, with the link between the Nodes crossed out]
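CAP Theorem
[Diagram: triangle with corners Availability, Partition Tolerant and Consistency; each edge is labeled with what is given up]
๏ Inconsistent (Availability + Partition Tolerant): Dynamo, Key Value, NoSQLs
๏ Intolerant (Availability + Consistency): RDBMS
๏ Unavailable (Partition Tolerant + Consistency): MongoDB, BigTable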
Key Value
๏ 1 Dimensional storage (tuple)
๏ Query key only
๏ Bucket index (range) on keys
๏ Records cannot be updated, only replaced
๏ Often MultiMaster... meaning availability over consistency
๏ Partitioning easy thanks to single value

Cassandra, Redis, MemcacheD, Riak, DynamoDB
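A minimal in-memory sketch of the key value traits listed above: lookup by key only, whole-value replacement, and a range scan over keys. The `KeyValueStore` class and the key naming scheme are hypothetical and do not reflect the API of any of the products named.

```javascript
// Hypothetical in-memory key value store, for illustration only.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  // put() replaces the whole value; there is no partial update.
  put(key, value) {
    this.data.set(key, value);
  }
  // Lookup is by key only; values are opaque to the store.
  get(key) {
    return this.data.get(key);
  }
  // Scan keys in [startKey, endKey), standing in for a bucket (range) index.
  range(startKey, endKey) {
    return [...this.data.keys()]
      .filter((k) => k >= startKey && k < endKey)
      .sort()
      .map((k) => [k, this.data.get(k)]);
  }
}

const store = new KeyValueStore();
store.put('person:1', { first: 'john' });
store.put('person:2', { first: 'mary' });
store.put('record:1', { type: 'birth certificate' });
// Changing one field means writing the whole value again.
store.put('person:1', { first: 'johannes' });
// ';' sorts right after ':', so this range covers every 'person:' key.
const people = store.range('person:', 'person;');
```

There is no way to ask for "everyone named mary" short of scanning every value, which is the trade-off behind "query key only".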
Relational
๏ 2 Dimensional storage (map)
๏ Query any field
๏ BTree Indexes
๏ Single master meaning consistency > availability
๏ Partitioning hard due to transactions & joins

Oracle, MSSQL, MySQL, PostgreSQL, DB2
Document
๏ n Dimensional storage (hash w/ nesting)
๏ Query any field at any level
๏ BTree Indexes
๏ Single master meaning consistency > availability
๏ Partitioning easy thanks to richer data model

MongoDB, CouchDB, RethinkDB
Graph
 ๏   1 Dimensional storage... but grouped to appear
     2D
 ๏   Differentiated by indexes
 ๏   Large indexes cover many relationships
 ๏   Query time depends on # records returned,
     not distance to get them
 ๏   Doesn’t require traversing to determine
     relationship

Neo4j, about 20 more... nobody talks much about
MongoDB for
 Genealogy
Right Data
  Model
Types of genealogy data
๏ Events (birth, death, etc)
๏ Official records
๏ Census
๏ Names
๏ Relationships
๏ Photographs
๏ Diaries & letters
๏ Ship passenger list
๏ Occupation
๏ and more
Challenges of genealogy data
๏ Lots of possible data points... need flexible schema
๏ Multiple versions of same data point (3 different dates for death date, 4 variations on name)
๏ Lots of data associated with physical records
๏ Multiple versions of same nodes (intelligent nondestructive merge needed)
๏ Need to have metadata associated
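One way to picture a nondestructive merge is sketched below: name variants are unioned and conflicting event versions are kept side by side with their contributor. The `mergeIndividuals` function, the object-keyed `events` layout and the contributor values are assumptions made for this example, not the schema or merge logic used in the talk.

```javascript
// Union of two lists, keeping first-seen order and dropping duplicates.
function union(a = [], b = []) {
  return [...new Set([...a, ...b])];
}

// Merge two versions of the same person without discarding either side.
// Inputs are not modified; conflicting event versions are all kept.
function mergeIndividuals(a, b) {
  const events = {};
  for (const source of [a.events || {}, b.events || {}]) {
    for (const [type, versions] of Object.entries(source)) {
      events[type] = (events[type] || []).concat(versions);
    }
  }
  return {
    name: {
      first: union(a.name.first, b.name.first),
      last: union(a.name.last, b.name.last)
    },
    events
  };
}

const v1 = {
  name: { first: ['john'], last: ['smith'] },
  events: { birth: [{ date: '1928-04-06', contributor: 'userA' }] }
};
const v2 = {
  name: { first: ['johannes', 'john'], last: ['sandvik'] },
  events: { birth: [{ date: '1928-04-16', contributor: 'userB' }] }
};
const merged = mergeIndividuals(v1, v2);
```

After the merge both birth dates survive, each tagged with who contributed it, so a later reviewer can still decide which one to trust.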
Individual
• AFN
• Modification Date

Name
• First[]
• Middle[]
• Last[]

Events[]
• type
• date
• contributor[]
• record[]

Location
• city
• state
• county
• country
• coordinates[]

User
• Name
• Email Address
• Password
• Individual_id

Record
• contributor
• type
• thumbnail
• content
• description
• tags[]
Individual
individual = {
   _id : ObjectId("4f2978dfaa999d9db02618ce"),
   AFN : '1XYK-KQJ',
   name: {
      first: ['john', 'johannes'],
      middle: 'peter',
      last: ['smith', 'sandvik']
    }
}


db.individual.find(
{'name.first' : 'john', 'name.middle' : 'peter'})
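The query above matches even though `name.first` holds an array, because a dotted path matches when any element equals the wanted value. The sketch below imitates that rule in memory. `valuesAt` and `matches` are hypothetical helpers, and they only cover equality on scalar values, which is a small part of what a real query engine does.

```javascript
// Collect every value reachable through a dotted path, descending into arrays.
function valuesAt(doc, path) {
  let current = [doc];
  for (const key of path.split('.')) {
    const next = [];
    for (const item of current) {
      const targets = Array.isArray(item) ? item : [item];
      for (const t of targets) {
        if (t !== null && typeof t === 'object' && key in t) next.push(t[key]);
      }
    }
    current = next;
  }
  // A final array matches if any of its elements matches.
  return current.flatMap((v) => (Array.isArray(v) ? v : [v]));
}

// True when every path in the query has at least one equal value in the doc.
function matches(doc, query) {
  return Object.entries(query).every(([path, wanted]) =>
    valuesAt(doc, path).includes(wanted));
}

const person = {
  AFN: '1XYK-KQJ',
  name: { first: ['john', 'johannes'], middle: 'peter', last: ['smith', 'sandvik'] }
};
const found = matches(person, { 'name.first': 'john', 'name.middle': 'peter' });
```

The same walk handles arrays of subdocuments, so a path can cross any number of nesting levels.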
Individual.Events
events : [
    { death : {
       date : ISODate('1989-07-14'),
       location : {
           city: 'pensacola',
           state: 'fl',
           county: 'escambia',
           country: 'usa',
           coordinates : [30.26,87.12]},
       contributor : ObjectId("4eeac...691")}}]

db.individual.find(
{'events.death.date' : ISODate('1989-07-14')})

// assumes a 2d index on 'events.death.location.coordinates'
db.individual.find(
{'events.death.location.coordinates' : { $near:[30,90]}})
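As a rough picture of what the `$near` query above returns, the sketch below orders documents by flat (planar) distance from a target point, which is how a legacy 2d index treats coordinate pairs. The `near` helper and the sample places are hypothetical, and the coordinates are copied from the slides as written.

```javascript
// Sort documents by planar distance between their point and the target.
// getPoint extracts a [x, y] pair from a document; the input is not modified.
function near(docs, getPoint, target) {
  const dist = (p) => Math.hypot(p[0] - target[0], p[1] - target[1]);
  return [...docs].sort((a, b) => dist(getPoint(a)) - dist(getPoint(b)));
}

const places = [
  { city: 'brattleboro', coordinates: [42.51, 72.34] },
  { city: 'pensacola', coordinates: [30.26, 87.12] }
];
const ordered = near(places, (d) => d.coordinates, [30, 90]);
```

A real index avoids computing the distance to every document; this version only shows the ordering.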
Event Versions
events : [
   { birth : [ {
        date : ISODate('1928-04-06'),
        location : {
           city: 'brattleboro',
           state: 'vt',
           county: 'windham',
           country: 'usa',
           coordinates : [42.51,72.34]},
        contributor : ObjectId("4ee...00000"),
        records: ObjectId("4ed8a...7b000000")
   },
   {
        date : ISODate('1928-04-16'),
        location : {
           city: 'brattleboro',
           state: 'vt',
           county: 'windham',
           country: 'usa',
           coordinates : [42.51,72.34]},
        contributor : ObjectId("4ee...37bb"),
        records: ObjectId("4eea...0000c8")
    }] }
]
Query with Versioned Events
events : [
   { birth : [
      { date : ISODate('1928-04-06')},
      { date : ISODate('1928-04-16')}
   ] }
]




db.individual.find(
{'events.birth.date' : ISODate('1928-04-16')})
Records
record1 = {
    _id : ObjectId("4ed8aea7d8562f7d7b"),
    contributor : ObjectId("4eeab...1537bb"),
    type : 'birth certificate',
    thumbnail : BinData(0,"/9j/4AAQSkZJ...."),
    content : BinData(0,"j6b/Id11lWqs..."),
    tags : ['NY', 'certified'],
    description : "John's birth certificate"
}
Right Scale
MongoDB: Scale built in
๏   Intelligent replication
๏   Automatic partitioning of data
    (user configurable)
๏   Horizontal Scale
๏   Targeted Queries
๏   Parallel Processing
Intelligent Replication
[Diagram: three nodes. Node 1 (Secondary) and Node 2 (Secondary) are linked by a Heartbeat, and each has a Replication link to Node 3 (Primary)]
Scalable Architecture
[Diagram: three App Servers, each above a Mongos; three Config Servers at the side; three Shards at the bottom]
High Availability in Shards
[Diagram: a Shard is either a single Mongod or a set of one Primary and two Secondaries]
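Targeted Requests
[Diagram: steps numbered 1 to 4. The request reaches Mongos (1), goes to a single Shard out of three (2), comes back from that Shard (3) and is returned by Mongos (4)]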
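Parallel processing
[Diagram: steps numbered 1 to 6. The request reaches Mongos (1), goes to all three Shards (2), is processed on each Shard (3), comes back from each Shard (4), is combined in Mongos (5) and is returned (6)]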
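The difference between a targeted request and parallel processing comes down to whether the query includes the shard key. The sketch below is a toy router under made-up assumptions: the `chunks` ranges, shard names and the `last` shard key are hypothetical, and a real Mongos reads its routing data from the config servers rather than from a hard-coded table.

```javascript
// Hypothetical chunk map: each shard owns a half-open range of the shard key.
const chunks = [
  { shard: 'shardA', min: '', max: 'h' },
  { shard: 'shardB', min: 'h', max: 'p' },
  { shard: 'shardC', min: 'p', max: '\uffff' }
];

// Pick the shards a query must visit.
// With the shard key present the request is targeted at one shard;
// without it the router has to ask every shard.
function route(query, shardKey) {
  if (shardKey in query) {
    const value = query[shardKey];
    const owner = chunks.find((c) => value >= c.min && value < c.max);
    return owner ? [owner.shard] : [];
  }
  return chunks.map((c) => c.shard);
}

// Run the query on each chosen shard and merge the replies.
function find(shards, query, shardKey) {
  return route(query, shardKey).flatMap((name) =>
    shards[name].filter((doc) =>
      Object.entries(query).every(([k, v]) => doc[k] === v)));
}

const shards = {
  shardA: [{ last: 'francia', first: 'steve' }],
  shardB: [{ last: 'jones', first: 'john' }],
  shardC: [{ last: 'smith', first: 'john' }]
};
const targeted = route({ last: 'smith' }, 'last');
const scattered = route({ first: 'john' }, 'last');
```

Here `targeted` names a single shard while `scattered` names all three, so choosing a shard key that appears in most queries keeps most requests on one shard.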
Right Feature
     Set
Broad Feature Set
๏   Rich query language
๏   Native support for over 12 languages
๏   GeoSpatial
๏   Text search
๏   Aggregation & MapReduce
๏   GridFS
    (distributed & replicated file storage)
๏   Integration with Hadoop, Solr & more
Last Year I
presented
on Graph in
MongoDB



      http://j.mp/XvJ3dl
FamilySearch
presented in
December
2012




      http://j.mp/X03TXp
http://spf13.com
            http://github.com/spf13
            @spf13



Questions?
download at mongodb.org

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 

Último (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 

Modern Database Systems (for Genealogy)

  • 1. Modern Database Systems
  • 2. @spf13 AKA Steve Francia Chief Evangelist @ responsible for drivers, integrations, web & docs
  • 3. What’s the Point? ๏ Goal: Discover & identify ideal storage solution for our needs ๏ History is important ๏ Many options today ๏ Document databases are good for Genealogy
  • 5. Over 5500 years ago 2 People
  • 9. World Population Growth (last ~200 years, in billions): 1 in 1804, 2 in 1927, 3 in 1960, 4 in 1974, 5 in 1987, 6 in 1999, 7 in 2012
  • 10. Really Big Data In the last 50 years... over 4% of the world's people were born... in less than 1% of the time
  • 12. 1970 ๏ Oracle creates the relational database ๏ Everyone happily uses it for the next 43 years
  • 14. Let’s start at the beginning
  • 15. It’s a story about... Storing & Retrieving Information
  • 20. Even today we still use the same mediums for data storage
  • 23. With the advent of the computer things really took off
  • 24. 1960 : DBMS Emerges ๏ Ordered set of fixed length fields ๏ Low level pointer operations (flat files) ๏ Most popular was IMS (created at IBM) ๏ Shockingly still in use today at IBM & American Airlines
  • 25. Lots of Problems ๏ Complex and inflexible ๏ User had to know physical structure of the DB in order to query for information ๏ Adding a field to the DB required rewriting the underlying access/modification scheme ๏ Records isolated (no relations) ๏ Emphasis on records to be processed, not overall structure
  • 26. 1970 : Relational DB ๏ Edgar Frank “Ted” Codd ๏ Relational Database theory ๏ Codd’s 13 rules (aka 12 rules)
  • 27. 3 HUGE Advantages ๏ Data independence from hardware and storage implementation ๏ Ability to process more than one record at a time with a single operation ๏ Establishing a relationship between records
  • 28. IBM vs Codd ๏ IBM bet on IMS ๏ Codd bets on relational DB ๏ Eventually 2 relational prototypes emerge
  • 29. Ingres ๏ Built at UC Berkeley ๏ Uses QUEL ๏ Inspires Sybase & MSSQL
  • 30. System R ๏ Built at IBM ๏ Leads to SEQUEL... later SQL ๏ Evolved into SQL/DS which evolved into DB2 ๏ Project concludes that relational model is viable
  • 31. Oracle ๏ Larry Ellison watches IBM ๏ Starts Relational Software Inc. ๏ Oracle 1st commercial RDBMS released in 1979 ๏ Beats IBM by 2 years to market
  • 32. Entity Relationship ๏ Proposed by Peter Chen in 1976 ๏ Focuses on data use and not logical table structure
  • 33. 1980s ๏ RDBMS dominates ๏ Some fields (medicine, physics, multimedia) need more than RDBMS offers ๏ Object Databases emerge
  • 34. Object Databases ๏ Inspired by Entity Relationship ๏ More flexible than relational permits ๏ Tightly coupled with OO programming language (C++, later Java) ๏ Full object: data & methods stored
  • 35. 1990s ๏ Internet emerges ๏ Data demand spikes ๏ Databases used for archiving historical data
  • 36. Early 2000s ๏ Internet booms ๏ RDBMS fails to scale ๏ In desperation we take a step backwards
  • 37. MemcacheD ๏ 1 dimensional ๏ No persistence ๏ No ACI or D ๏ but...
  • 39. 2005 ish ๏ Relational + MemcacheD broken (and we didn’t know it) ๏ Scale redefined with high volume & social ๏ Infrastructure reinvented with cloud computing & SSDs
  • 40. Alternatives Emerge ๏ Dynamo / Key Value ๏ Document ๏ Graph
  • 41. Modern Data Storage
  • 42. A lot going on Easiest to define databases in broad terms • What is a record? (data model) • CAP : CA, AP, CP ? (infrastructure model)
  • 43. Data Storage Structure (diagram): 1D = Key → Value; 2D = Key → Value(s); nD = Key → nested Keys → Value(s)
  • 44. Database structure 1D: Key Value, Dynamo, Graph; 2D: Relational; nD: Document
  • 45. CAP Theorem Availability Partitioning Consistency
  • 47. CAP Theorem (diagram) ๏ Availability + Partition Tolerant (Inconsistent): Dynamo, Key Value NoSQLs ๏ Consistency + Availability (Intolerant): RDBMS ๏ Consistency + Partition Tolerant (Unavailable): MongoDB, BigTable
  • 48. Key Value ๏ 1 Dimensional storage (tuple) ๏ Query key only ๏ Bucket index (range) on keys ๏ Records cannot be updated, only replaced ๏ Often MultiMaster... meaning availability over consistency ๏ Partitioning easy thanks to single value Examples: Cassandra, Redis, MemcacheD, Riak, DynamoDB
  • 49. Relational ๏ 2 Dimensional storage (map) ๏ Query any field ๏ BTree Indexes ๏ Single master meaning consistency > availability ๏ Partitioning hard due to transactions & joins Examples: Oracle, MSSQL, MySQL, PostgreSQL, DB2
  • 50. Document ๏ n Dimensional storage (hash w/ nesting) ๏ Query any field at any level ๏ BTree Indexes ๏ Single master meaning consistency > availability ๏ Partitioning easy thanks to richer data model Examples: MongoDB, CouchDB, RethinkDB
  • 51. Graph ๏ 1 Dimensional storage... but grouped to appear 2D ๏ Differentiated by indexes ๏ Large indexes cover many relationships ๏ Query time depends on # records returned, not distance to get them ๏ Doesn’t require traversing to determine relationship Neo4j, about 20 more... nobody talks much about
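As a rough illustration of the 1D / 2D / nD distinction on slides 43 to 50, the sketch below stores the same person three ways using plain JavaScript values. No database is involved; the variable names, the `person:1` key, and the sample rows are made up for illustration.

```javascript
// 1D (key value): the store sees only a key and an opaque value.
const keyValueStore = new Map();
keyValueStore.set("person:1", JSON.stringify({ first: "john", last: "smith" }));

// 2D (relational): flat rows; name variants need a second table.
const individuals = [{ id: 1, afn: "1XYK-KQJ" }];
const names = [
  { individualId: 1, part: "first", value: "john" },
  { individualId: 1, part: "first", value: "johannes" },
  { individualId: 1, part: "last", value: "smith" },
];

// nD (document): nesting keeps the variants inside one record.
const personDoc = {
  afn: "1XYK-KQJ",
  name: { first: ["john", "johannes"], last: ["smith"] },
};

// Key value: lookup by key only; the client decodes the value.
const fromKv = JSON.parse(keyValueStore.get("person:1"));

// Relational: variants come back through a join on individualId.
const firstNamesRelational = names
  .filter((n) => n.individualId === individuals[0].id && n.part === "first")
  .map((n) => n.value);

// Document: variants are read directly from the nested array.
const firstNamesDocument = personDoc.name.first;
```

The relational and document forms hold the same name variants; the difference is whether they are reassembled at query time or stored together.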
  • 53. Right Data Model
  • 54. Types of genealogy data ๏ Events (birth, death, etc) ๏ Official records ๏ Census ๏ Names ๏ Relationships ๏ Photographs ๏ Diaries & letters ๏ Ship passenger list ๏ Occupation ๏ and more
  • 55. Challenges of genealogy data ๏ Lots of possible data points... need flexible schema ๏ Multiple versions of same data point (3 different dates for death date, 4 variations on name). ๏ Lots of data associated with physical records ๏ Multiple versions of same nodes (intelligent nondestructive merge needed) ๏ Need to have meta data associated
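The "multiple versions" and "nondestructive merge" points on slide 55 could be sketched roughly as follows. `addEventVersion` is a hypothetical helper written for this example, not a MongoDB API, and dates and contributors are plain strings here to keep it self-contained.

```javascript
// Add a new version of a data point without overwriting earlier ones.
// Field names follow the slides' examples; the function itself is an
// assumption made for illustration.
function addEventVersion(individual, eventType, version) {
  const events = individual.events || {};
  const existing = events[eventType] || [];
  // Skip exact duplicates from the same contributor so repeated merges
  // do not pile up identical versions.
  const duplicate = existing.some(
    (v) => v.date === version.date && v.contributor === version.contributor
  );
  const versions = duplicate ? existing : [...existing, version];
  // Return a new object; the input record is left untouched.
  return { ...individual, events: { ...events, [eventType]: versions } };
}

const original = { AFN: "1XYK-KQJ", events: {} };
const once = addEventVersion(original, "birth", {
  date: "1928-04-06",
  contributor: "userA",
});
const twice = addEventVersion(once, "birth", {
  date: "1928-04-16",
  contributor: "userB",
});
// Merging the same version again changes nothing.
const again = addEventVersion(twice, "birth", {
  date: "1928-04-16",
  contributor: "userB",
});
```

Keeping the contributor on every version is what allows conflicting dates to coexist until someone decides between them.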
  • 56. Schema (diagram) ๏ Individual: AFN, Modification Date ๏ User: Name, Email Address, Password, Individual_id ๏ Events[]: type, date, contributor[], record[] ๏ Name: First[], Middle[], Last[] ๏ Location: city, state, county, country, coordinates[] ๏ Record: contributor, type, thumbnail, content, description, tags[]
  • 57. Individual individual = { _id : ObjectId("4f2978dfaa999d9db02618ce"), AFN : '1XYK-KQJ', name: { first: ['john', 'johannes'], middle: 'peter', last: ['smith', 'sandvik'] } } db.individual.find({ 'name.first' : 'john', 'name.middle' : 'peter' })
  • 58. Individual.Events events : { death : { date : ISODate('1989-07-14'), location : { city: 'pensacola', state: 'fl', county: 'escambia', country: 'usa', coordinates : [30.26,87.12] }, contributor : ObjectId("4eeac...691") } } db.individual.find({ 'events.death.date' : ISODate('1989-07-14') }) db.individual.find({ 'events.death.location.coordinates' : { $near: [30,90] } })
  • 59. Event Versions events : { birth : [ { date : ISODate('1928-04-06'), location : { city: 'brattleboro', state: 'vt', county: 'windham', country: 'usa', coordinates : [42.51,72.34] }, contributor : ObjectId("4ee...00000"), records: ObjectId("4ed8a...7b000000") }, { date : ISODate('1928-04-16'), location : { city: 'brattleboro', state: 'vt', county: 'windham', country: 'usa', coordinates : [42.51,72.34] }, contributor : ObjectId("4ee...37bb"), records: ObjectId("4eea...0000c8") } ] }
  • 60. Query with Versioned Events events : { birth : [ { date : ISODate('1928-04-06') }, { date : ISODate('1928-04-16') } ] } db.individual.find({ 'events.birth.date' : ISODate('1928-04-16') })
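The query on slide 60 relies on a dotted path reaching into an array of versions and matching if any one of them has the requested date. The sketch below is a simplified plain-JavaScript model of that behaviour, not MongoDB's implementation: `valuesAtPath` and `matchesEquality` are names invented here, dates are plain strings, and numeric array indexes in paths are not handled.

```javascript
// Collect every value reachable by following `path` through `doc`,
// descending into each element whenever an array is encountered.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const node of current) {
      const items = Array.isArray(node) ? node : [node];
      for (const item of items) {
        if (item !== null && typeof item === "object" && key in item) {
          next.push(item[key]);
        }
      }
    }
    current = next;
  }
  // A final array value matches if any of its elements match.
  return current.flatMap((v) => (Array.isArray(v) ? v : [v]));
}

// Equality match: true if any value at the path equals `expected`.
function matchesEquality(doc, path, expected) {
  return valuesAtPath(doc, path).some((v) => v === expected);
}

const person = {
  name: { first: ["john", "johannes"], middle: "peter" },
  events: { birth: [{ date: "1928-04-06" }, { date: "1928-04-16" }] },
};

const matchesSecondVersion = matchesEquality(person, "events.birth.date", "1928-04-16");
const matchesNameVariant = matchesEquality(person, "name.first", "johannes");
const matchesUnknownDate = matchesEquality(person, "events.birth.date", "1930-01-01");
```

Because any version can satisfy the query, conflicting dates for the same event stay searchable without choosing a winner.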
  • 61. Records record1 = { _id : ObjectId("4ed8aea7d8562f7d7b"), contributor : ObjectId("4eeab...1537bb"), type : 'birth certificate', thumbnail : BinData(0,"/9j/4AAQSkZJ...."), content : BinData(0,"j6b/Id11lWqs..."), tags : ['NY', 'certified'], description : "John's birth certificate" }
  • 63. MongoDB: Scale built in ๏ Intelligent replication ๏ Automatic partitioning of data (user configurable) ๏ Horizontal Scale ๏ Targeted Queries ๏ Parallel Processing
  • 64. Intelligent Replication (diagram): Node 1 (Secondary) and Node 2 (Secondary) exchange a heartbeat; Node 3 (Primary) replicates to both
  • 65. Scalable Architecture (diagram): three App Servers, each with a Mongos; three Config Servers; three Shards
  • 66. High Availability in Shards (diagram): each Shard is a single Mongod or a Primary with Secondaries
  • 67. Targeted Requests (diagram): Mongos routes the request to a single Shard (steps 1 to 4)
  • 68. Parallel processing (diagram): Mongos sends the request to all three Shards in parallel (steps 1 to 6)
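The difference between slide 67 (targeted) and slide 68 (parallel) can be sketched as a routing decision. This is a toy model under stated assumptions: the `chunks` table, the `lastName` shard key, and `shardsForQuery` are all hypothetical, and a real mongos reads its routing metadata from the config servers rather than from a hard-coded array.

```javascript
// Hypothetical chunk table: each shard owns a half-open range [min, max)
// of the shard key.
const chunks = [
  { shard: "shard0", min: "", max: "h" },
  { shard: "shard1", min: "h", max: "q" },
  { shard: "shard2", min: "q", max: "\uffff" },
];

// Return the shards a query must visit.
function shardsForQuery(query, shardKey) {
  if (shardKey in query) {
    // Targeted: the shard key value pins the query to one range.
    const value = query[shardKey];
    return chunks
      .filter((c) => value >= c.min && value < c.max)
      .map((c) => c.shard);
  }
  // No shard key in the query: every shard has to be asked.
  return chunks.map((c) => c.shard);
}

const targeted = shardsForQuery({ lastName: "smith" }, "lastName");
const scattered = shardsForQuery({ firstName: "john" }, "lastName");
```

Queries that include the shard key touch one shard; everything else fans out, which is why the choice of shard key matters for the most frequent queries.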
  • 70. Broad Feature Set ๏ Rich query language ๏ Native support for over 12 languages ๏ GeoSpatial ๏ Text search ๏ Aggregation & MapReduce ๏ GridFS (distributed & replicated file storage) ๏ Integration with Hadoop, Solr & more
  • 71. Last Year I presented on Graph in MongoDB http://j.mp/XvJ3dl
  • 76. http://spf13.com http://github.com/spf13 @spf13 Questions? download at mongodb.org