SlideShare una empresa de Scribd logo
1 de 38
Descargar para leer sin conexión
Graph Databases and the Future of Large-Scale
          Knowledge Management



                  Marko A. Rodriguez
           T-5, Center for Nonlinear Studies
           Los Alamos National Laboratory
              http://markorodriguez.com

                     April 8, 2009
Abstract

Modern day open source and commercial graph databases can store on the
order of 1 billion relationships with some databases reaching the 10 billion
mark. These developments are making the graph database practical for
applications that require large-scale knowledge structures. Moreover, with
the Web of Data standards set forth by the Linked Data community, it is
possible to interlink graph databases across the web into a giant global
knowledge structure. This talk will discuss graph databases, their
underlying data model, their querying mechanisms, and the benefits of the
graph data structure for modeling and analysis.




                                          Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Outline

• The Relational Database vs. the Graph Database

• The Web of Documents vs. the Web of Data




                                      Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Outline

• The Relational Database vs. the Graph Database

• The Web of Documents vs. the Web of Data




                                    Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Relational Database vs. the Graph Database

• A relational database’s (e.g. MySQL, PostgreSQL, Oracle) data model
  is a collection interlinked tables.

• A graph database’s (e.g. OpenSesame, AllegroGraph, Neo4j) data model
  is a multi-relational graph.


                Relational Database         Graph Database
                                               d

                                        c      a
                                                      a

                                                b
                    127.0.0.1                 127.0.0.2




                                      Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Types of Graphs
• Undirected single-relational graph: homogenous set of symmetric links.

• Directed single-relational graph: homogenous set of links.

• Directed multi-relational graph: heterogenous set of links.
                       undirected single-relational graph

                          x                                     z


                       directed single-relational graph

                          x                                     z


                       directed multi-relational graph

                          x                  y                  z




                                                     Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our Make Believe World - Phase 1

• Marko is a human and Fluffy is a dog.




                                   Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Relational Database - Phase 1

               ID      Name     Type        Legs       Fur

              0001     Marko    Human         2        false

              0002     Fluffy   Dog           4        true


            Object_Table




                                 Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Graph Database - Phase 1

               Human                                 Dog



                type                                 type


               0001                                 0002


               name                                 name
        legs           fur                 legs                fur


    2          Marko         false    4             Fluffy            true




                                     Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our Make Believe World - Phase 2

• Marko is a human and Fluffy is a dog.

• Marko and Fluffy are good friends.




                                         Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Relational Database - Phase 2

         ID      Name     Type    Legs     Fur              ID2            ID2

        0001     Marko    Human    2       false           0001           0002

        0002     Fluffy   Dog      4       true            0002           0001


      Object_Table                                     Friendship_Table




                                         Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Graph Database - Phase 2

               Human                                          Dog



                type                                          type

                                     friend
               0001                  friend                  0002


               name                                          name
        legs           fur                          legs                fur


    2          Marko         false             4             Fluffy            true




                                              Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our Make Believe World - Phase 3

• Marko is a human and Fluffy is a dog.

• Marko and Fluffy are good friends.

• Human and dog are a subclass of mammal.




                                         Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Relational Database - Phase 3

     ID      Name     Type    Legs   Fur       ID2          ID2           Type1       Type2

    0001     Marko    Human    2     false     0001        0002          Human       Mammal

    0002     Fluffy   Dog      4     true      0002        0001            Dog       Mammal


  Object_Table                               Friendship_Table          Subclass_Table




                                              Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Graph Database - Phase 3
                                          Mammal


                          subclassof                subclassof

                  Human                                          Dog



                   type                                          type

                                           friend
                  0001                     friend                0002


                  name                                           name
           legs             fur                          legs             fur


       2          Marko           false             4            Fluffy         true




                                                    Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our Make Believe World - Phase 4
• Marko is a human and Fluffy is a dog.

• Marko and Fluffy are good friends.

• Human and dog are a subclass of mammal.

• Fluffy peed on the carpet.




                                         Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Relational Database - Phase 4

      ID      Name     Type     Legs   Fur       ID2          ID2           Type1       Type2

     0001     Marko    Human     2     false     0001        0002          Human       Mammal

     0002     Fluffy    Dog      4     true      0002        0001            Dog       Mammal

     0003    My_Rug    Carpet   N/A    N/A
                                               Friendship_Table          Subclass_Table

   Object_Table                                  ID1          ID2

                                                 0002        0003

                                               Pee_Table




                                                Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Graph Database - Phase 4

                                       Mammal


                       subclassof                subclassof

               Human                                          Dog                       Carpet



                type                                          type                       type

                                        friend
               0001                     friend                0002       peedOn          0003


               name                                           name                       name
        legs             fur                         legs              fur


    2          Marko           false             4            Fluffy         true      My_Rug




                                                            Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our Make Believe World - Phase 5

• Marko is a human and Fluffy is a dog.

• Marko and Fluffy are good friends.

• Human and dog are a subclass of mammal.

• Fluffy peed on the carpet.

• Marko and Fluffy are both mammals.




                                         Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Relational Database - Phase 5
        ID      Name     Type     Legs   Fur       ID2         ID2         Type1       Type2

       0001     Marko    Human     2     false     0001       0002         Human      Mammal

       0002     Fluffy    Dog      4     true      0002       0001          Dog       Mammal

       0003    My_Rug    Carpet   N/A    N/A
                                                 Friendship_Table        Subclass_Table

     Object_Table                                  ID1         ID2           ID        Type

                                                   0002       0003          0001      Human

                                                 Pee_Table                  0002        Dog


                                                                            0003      Carpet

                                                                            0001      Mammal

                                                                            0002      Mammal


                                                                         Type_Table



                                                   Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our World Modeled in a Graph Database - Phase 5

                                        Mammal


                       subclassof                 subclassof

               Human                                           Dog                       Carpet

                         type                         type

                type                                           type                       type

                                         friend
               0001                      friend                0002       peedOn          0003


               name                                            name                       name
        legs             fur                          legs              fur


    2          Marko            false             4            Fluffy         true      My_Rug




                                                             Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Graph as the Natural World Model

• The world is inherently (or perceived as) object-oriented.

• The world is filled with objects and relations among them.

• The multi-relational graph is a very natural representation of the world.




                                         Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Graph as the Natural Programming Model

• High-level computer languages are object-oriented.

• Nearly no impedance mismatch between the multi-relational graph and
  the programming object.

• It is easy to go from graph database to in-memory object.

          Human marko = new Human();
          marko.name = "Marko";
          marko.addFriend(fluffy);
          marko.setHasFur(false);
          marko.setLegs(2);


                                        Risk Symposium – Santa Fe, New Mexico – April 8, 2009
SQL vs. SPARQL

SELECT OTY.Name FROM Object_Table AS OTX,
      Object_Table AS OTY, Friendship_Table WHERE
   OTX.Name = "Marko" AND
   Friendship_Table.ID1 = OTY.ID AND
   Friendship_Table.ID2 = OTX.ID;


SELECT ?z WHERE {
  ?x name "Marko" .
  ?y friend ?x .
  ?y name ?z }




                                 Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Outline

• The Relational Database vs. the Graph Database

• The Web of Documents vs. the Web of Data




                                      Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Internet Address Spaces

• The Uniform Resource Identifier (URI) is the superclass of the Uniform
  Resource Locator (URL) and Uniform Resource Name (URN).




                                       Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Uniform Resource Locator
• The set of all URLs is the address space of all resources that can be
  located and retrieved on the Web. URLs denote where a resource is.
    http://markorodriguez.com/index.html
    ∗ Domain name server (DNS): markorodriguez.com → 216.251.43.6
    ∗ http:// means GET at port 80,
    ∗ /index.html means the resource to get at that Internet location.

                                Web Server




                                  index.html




                             markorodriguez.com
                                216.251.43.6



                                               Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Uniform Resource Name

• The set of all URNs is the address space of all resources within the urn:
  namespace.
    urn:uuid:bd93def0-8026-11dd-842be54955baa12
    urn:issn:0892-3310
    urn:doi:10.1016/j.knosys.2008.03.030

• Named resources need not be retrievable through the Web.

• URNs denote what a resource is.




                                         Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Uniform Resource Identifier
• The URI address space is an infinite space for all Internet resources.
    http://markorodriguez.com/index.html
    urn:issn:0892-3310
    ftp://markorodriguez.com/private/markos_secrets.txt
    http://www.lanl.gov#fluffy

• Imporant: URIs can denote concepts, instances, and datum.




                         lanl:fluffy        lanl:fluffy_legs




                                         Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Web of Documents
• The World of Documents is primarily concerned with the Hyper-Text
  Transfer Protocol (HTTP) and with retrievable resources in the URL
  address space.

• These retrievable resources are files: HTML documents, images, audio,
  etc. The “web” is created when HTML documents contain URLs.
                            http://markorodriguez.com/


                                    index.html



                                        href


            Resume.html   href      Home.html          href       Research.html




                                                 Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Web of Data

• The Web of Data is primarily concerned with URIs.

• The Resource Description Framework (RDF) is the standard for
  representing the relationship between URIs and literals (e.g. float, string,
  date time, etc.).

                           subject                 predicate                object


                          lanl:marko               foaf:knows               lanl:fluffy


                           foaf:name                                       foaf:name


                "Marko A. Rodriguez"^^xsd:string                "Fluffy P. Everywhere"^^xsd:string




                                                                 Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Our Make Believe World in RDF
                                                      lanl:Mammal



                            rdfs:subClassOf                             rdfs:subClassOf


                   lanl:Human                                                          lanl:Dog

                                   rdf:type                                 rdf:type
                        rdf:type                                                       rdf:type



                   lanl:marko                          lanl:friend                     lanl:fluffy

                                                       lanl:friend
             lanl:fur        lanl:legs                                       lanl:fur      lanl:legs
                   foaf:name                                                        foaf:name

"false"^^xsd:boolean               "2"^^xsd:integer            "true"^^xsd:boolean                  "4"^^xsd:integer


        "Marko A. Rodriguez"^^xsd:string                                "Fluffy P. Everywhere"^^xsd:string




                                                                 Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Web of Data is a Distributed Database

• The URI address space is distributed.

• URIs can denote datum.

• RDF denotes the relationships URIs.

• The Web of Data’s foundational standard is RDF.

• Therefore, the Web of Data is a distributed database.




                                          Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Web of Documents vs. the Web of Data

        Web Server                             Web Server




             HTML          href                     HTML


         127.0.0.1                              127.0.0.2




       Graph Database                        Graph Database



                        lanl:friend



         127.0.0.1                              127.0.0.2




                                      Risk Symposium – Santa Fe, New Mexico – April 8, 2009
The Current Web of Data - March 2009
                               homologenekegg                     projectgutenberg
                            symbol
                                                                                   homologenekegg
                                                                              libris                                          projectgutenberg
                                                   cas                          symbol
                                                                                     bbcjohnpeel
                                                                                                                                          libris
                  unists                    diseasome dailymed                 w3cwordnet
                                   chebi
                                        hgnc     pubchem           eurostat
                          mgi
                                   geneid
                                            omim                      wikicompany         geospecies
                                                                                                                   cas                               bbcjohnpeel
                                                                                                       diseasome dailymed
                                               drugbank                        worldfactbook
                       reactome
                                   pubmed                    unists
                                                                 magnatune
                                                                              opencyc                                                          w3cwordnet
              uniparc                                     linkedct                       chebi
                                                                                            freebase

taxonomy
         uniref
                             uniprot
                        geneontology
                                       interpro                                                    hgnc       pubchem              eurostat
                                   pdb                                   yago      umbel
                                             pfam                      mgi
                                                                   dbpedia                             omim
                                                                                              bbclatertotpgovtrack                    wikicompany         geospecies
                                         prosite
                               prodom                                     flickrwrappr
                                                                                         geneid
                                                                                      opencalais

                                                                reactome
                                                                                                uscensusdata
                                                                                                          drugbank                             worldfactbook
                                                                      lingvoj linkedmdb
                                                                                           surgeradio
                                                                                                                                 magnatune
                                                                                          pubmed
                                                                                  virtuososponger                                             opencyc
                                                  rdfbookmashup
                                             uniparc                                                                                                        freebase
                                                   swconferencecorpus         geonames musicbrainz      myspacewrapper    linkedct
                                         dblpberlin                         uniprot pubguide
                      taxonomy                         revyu                                      interpro
                                    uniref                       geneontologyjamendo bbcplaycountdata
                                                                   rdfohloh
                                                                                         pdb                                                       umbel
                                                                                                                                         yago
                                                    semanticweborg          siocsites        riese
                                                                                                        pfam                       dbpedia                    bbclatertotp            govtrack
                                                                foafprofiles
                            dblphannover    openguides                         audioscrobbler       prosite bbcprogrammes
                                                                                prodom
                                                                                     crunchbase                                           flickrwrappropencalais
                                                                    doapspace                                                                                   uscensusdata
                                                            flickrexporter
                                                                                                                                                           surgeradio
              budapestbme                                               qdos
                                                                                                                                      lingvoj linkedmdb
                                                                          semwebcentral                                                           virtuososponger
            eurecom                    ecssouthampton

                   pisa
                          dblprkbexplorer
                                  newcastle                                                                            rdfbookmashup
                                                                                                                                                   geonames musicbrainz
                                      rae2001
                                  eprints
                                       irittoulouse
                    laascnrs acm citeseer
                                                                                                                         swconferencecorpus                                        myspacewrapper
                           ieee                                                                                dblpberlin                                            pubguide
                resex
                                ibm

                                                                                                                            revyu                               jamendo
                                                                                                                                       rdfohloh
                                                                                                                                                                               bbcplaycountdata
                                                                                                                        semanticweborg        siocsites        riese
                                                                                                                                  foafprofiles
                                                                                                                  openguides                     audioscrobbler            bbcprogrammes
                                                                                               dblphannover
                                                                                                                                                       crunchbase
                                                                                                                             Risk Symposium – Santa Fe, New Mexico – April 8, 2009
                                                                                                                                       doapspace


                                                                                                                                 flickrexporter
                                                                                                                                            qdos
The Current Web of Data - March 2009
data set           domain       data set           domain        data set              domain
audioscrobbler     music        govtrack           government    pubguide              books
bbclatertotp       music        homologene         biology       qdos                  social
bbcplaycountdata   music        ibm                computer      rae2001               computer
bbcprogrammes      media        ieee               computer      rdfbookmashup         books
budapestbme        computer     interpro           biology       rdfohloh              social
chebi              biology      jamendo            music         resex                 computer
crunchbase         business     laascnrs           computer      riese                 government
dailymed           medical      libris             books         semanticweborg        computer
dblpberlin         computer     lingvoj            reference     semwebcentral         social
dblphannover       computer     linkedct           medical       siocsites             social
dblprkbexplorer    computer     linkedmdb          movie         surgeradio            music
dbpedia            general      magnatune          music         swconferencecorpus    computer
doapspace          social       musicbrainz        music         taxonomy              reference
drugbank           medical      myspacewrapper     social        umbel                 general
eurecom            computer     opencalais         reference     uniref                biology
eurostat           government   opencyc            general       unists                biology
flickrexporter      images       openguides         reference     uscensusdata          government
flickrwrappr        images       pdb                biology       virtuososponger       reference
foafprofiles        social       pfam               biology       w3cwordnet            reference
freebase           general      pisa               computer      wikicompany           business
geneid             biology      prodom             biology       worldfactbook         government
geneontology       biology      projectgutenberg   books         yago                  general
geonames           geographic   prosite            biology       ...



                                                     Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Cultural Differences that are Leading to a New World of
          Large-Scale Knowledge Management

• Relational databases tend to not maintain public access points.

• Relational database users tend to not publish their schemas.

• Web of Data graph databases maintain public access points called
  SPARQL end-points.

• Web of Data graph databases tend to reuse and extend public schemas
  called ontologies.




                                        Risk Symposium – Santa Fe, New Mexico – April 8, 2009
Conclusion

• Thank you for your time...
    My homepage: http://markorodriguez.com
    Neno/Fhat: http://neno.lanl.gov
    Collective Decision Making Systems: http://cdms.lanl.gov
    Faith in the Algorithm: http://faithinthealgorithm.net
    MESUR: http://www.mesur.org




                                      Risk Symposium – Santa Fe, New Mexico – April 8, 2009

Más contenido relacionado

La actualidad más candente

Introduction to matlab
Introduction to matlabIntroduction to matlab
Introduction to matlabBilawalBaloch1
 
Moore and mealy machines
Moore and mealy machinesMoore and mealy machines
Moore and mealy machineslavishka_anuj
 
Greedy Algorithm - Knapsack Problem
Greedy Algorithm - Knapsack ProblemGreedy Algorithm - Knapsack Problem
Greedy Algorithm - Knapsack ProblemMadhu Bala
 
A simple approach of lexical analyzers
A simple approach of lexical analyzersA simple approach of lexical analyzers
A simple approach of lexical analyzersArchana Gopinath
 
6-Role of Parser, Construction of Parse Tree and Elimination of Ambiguity-06-...
6-Role of Parser, Construction of Parse Tree and Elimination of Ambiguity-06-...6-Role of Parser, Construction of Parse Tree and Elimination of Ambiguity-06-...
6-Role of Parser, Construction of Parse Tree and Elimination of Ambiguity-06-...movocode
 
Handling Missing Values for Machine Learning.pptx
Handling Missing Values for Machine Learning.pptxHandling Missing Values for Machine Learning.pptx
Handling Missing Values for Machine Learning.pptxShamimBhuiyan8
 
Dinive conquer algorithm
Dinive conquer algorithmDinive conquer algorithm
Dinive conquer algorithmMohd Arif
 
Import and Export Big Data using R Studio
Import and Export Big Data using R StudioImport and Export Big Data using R Studio
Import and Export Big Data using R StudioRupak Roy
 
Automata theory - CFG and normal forms
Automata theory - CFG and normal formsAutomata theory - CFG and normal forms
Automata theory - CFG and normal formsAkila Krishnamoorthy
 
Data analysis with R
Data analysis with RData analysis with R
Data analysis with RShareThis
 
2.5 backpropagation
2.5 backpropagation2.5 backpropagation
2.5 backpropagationKrish_ver2
 
Matlab ploting
Matlab plotingMatlab ploting
Matlab plotingAmeen San
 
Python Pandas
Python PandasPython Pandas
Python PandasSunil OS
 
Quick sort-Data Structure
Quick sort-Data StructureQuick sort-Data Structure
Quick sort-Data StructureJeanie Arnoco
 

La actualidad más candente (20)

Introduction to matlab
Introduction to matlabIntroduction to matlab
Introduction to matlab
 
Moore and mealy machines
Moore and mealy machinesMoore and mealy machines
Moore and mealy machines
 
B tree
B treeB tree
B tree
 
Greedy Algorithm - Knapsack Problem
Greedy Algorithm - Knapsack ProblemGreedy Algorithm - Knapsack Problem
Greedy Algorithm - Knapsack Problem
 
A simple approach of lexical analyzers
A simple approach of lexical analyzersA simple approach of lexical analyzers
A simple approach of lexical analyzers
 
6-Role of Parser, Construction of Parse Tree and Elimination of Ambiguity-06-...
6-Role of Parser, Construction of Parse Tree and Elimination of Ambiguity-06-...6-Role of Parser, Construction of Parse Tree and Elimination of Ambiguity-06-...
6-Role of Parser, Construction of Parse Tree and Elimination of Ambiguity-06-...
 
6. R data structures
6. R data structures6. R data structures
6. R data structures
 
Handling Missing Values for Machine Learning.pptx
Handling Missing Values for Machine Learning.pptxHandling Missing Values for Machine Learning.pptx
Handling Missing Values for Machine Learning.pptx
 
Dinive conquer algorithm
Dinive conquer algorithmDinive conquer algorithm
Dinive conquer algorithm
 
Regular expressions
Regular expressionsRegular expressions
Regular expressions
 
Import and Export Big Data using R Studio
Import and Export Big Data using R StudioImport and Export Big Data using R Studio
Import and Export Big Data using R Studio
 
Automata theory - CFG and normal forms
Automata theory - CFG and normal formsAutomata theory - CFG and normal forms
Automata theory - CFG and normal forms
 
Data analysis with R
Data analysis with RData analysis with R
Data analysis with R
 
Python Operators
Python OperatorsPython Operators
Python Operators
 
Quick sort
Quick sortQuick sort
Quick sort
 
2.5 backpropagation
2.5 backpropagation2.5 backpropagation
2.5 backpropagation
 
Matlab ploting
Matlab plotingMatlab ploting
Matlab ploting
 
Csc446: Pattern Recognition
Csc446: Pattern Recognition Csc446: Pattern Recognition
Csc446: Pattern Recognition
 
Python Pandas
Python PandasPython Pandas
Python Pandas
 
Quick sort-Data Structure
Quick sort-Data StructureQuick sort-Data Structure
Quick sort-Data Structure
 

Más de Marko Rodriguez

mm-ADT: A Virtual Machine/An Economic Machine
mm-ADT: A Virtual Machine/An Economic Machinemm-ADT: A Virtual Machine/An Economic Machine
mm-ADT: A Virtual Machine/An Economic MachineMarko Rodriguez
 
mm-ADT: A Multi-Model Abstract Data Type
mm-ADT: A Multi-Model Abstract Data Typemm-ADT: A Multi-Model Abstract Data Type
mm-ADT: A Multi-Model Abstract Data TypeMarko Rodriguez
 
Open Problems in the Universal Graph Theory
Open Problems in the Universal Graph TheoryOpen Problems in the Universal Graph Theory
Open Problems in the Universal Graph TheoryMarko Rodriguez
 
Gremlin 101.3 On Your FM Dial
Gremlin 101.3 On Your FM DialGremlin 101.3 On Your FM Dial
Gremlin 101.3 On Your FM DialMarko Rodriguez
 
Gremlin's Graph Traversal Machinery
Gremlin's Graph Traversal MachineryGremlin's Graph Traversal Machinery
Gremlin's Graph Traversal MachineryMarko Rodriguez
 
Quantum Processes in Graph Computing
Quantum Processes in Graph ComputingQuantum Processes in Graph Computing
Quantum Processes in Graph ComputingMarko Rodriguez
 
ACM DBPL Keynote: The Graph Traversal Machine and Language
ACM DBPL Keynote: The Graph Traversal Machine and LanguageACM DBPL Keynote: The Graph Traversal Machine and Language
ACM DBPL Keynote: The Graph Traversal Machine and LanguageMarko Rodriguez
 
The Gremlin Graph Traversal Language
The Gremlin Graph Traversal LanguageThe Gremlin Graph Traversal Language
The Gremlin Graph Traversal LanguageMarko Rodriguez
 
Faunus: Graph Analytics Engine
Faunus: Graph Analytics EngineFaunus: Graph Analytics Engine
Faunus: Graph Analytics EngineMarko Rodriguez
 
Solving Problems with Graphs
Solving Problems with GraphsSolving Problems with Graphs
Solving Problems with GraphsMarko Rodriguez
 
Titan: The Rise of Big Graph Data
Titan: The Rise of Big Graph DataTitan: The Rise of Big Graph Data
Titan: The Rise of Big Graph DataMarko Rodriguez
 
The Pathology of Graph Databases
The Pathology of Graph DatabasesThe Pathology of Graph Databases
The Pathology of Graph DatabasesMarko Rodriguez
 
Traversing Graph Databases with Gremlin
Traversing Graph Databases with GremlinTraversing Graph Databases with Gremlin
Traversing Graph Databases with GremlinMarko Rodriguez
 
The Path-o-Logical Gremlin
The Path-o-Logical GremlinThe Path-o-Logical Gremlin
The Path-o-Logical GremlinMarko Rodriguez
 
The Gremlin in the Graph
The Gremlin in the GraphThe Gremlin in the Graph
The Gremlin in the GraphMarko Rodriguez
 
Memoirs of a Graph Addict: Despair to Redemption
Memoirs of a Graph Addict: Despair to RedemptionMemoirs of a Graph Addict: Despair to Redemption
Memoirs of a Graph Addict: Despair to RedemptionMarko Rodriguez
 
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of DataGraph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of DataMarko Rodriguez
 
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...Marko Rodriguez
 
A Perspective on Graph Theory and Network Science
A Perspective on Graph Theory and Network ScienceA Perspective on Graph Theory and Network Science
A Perspective on Graph Theory and Network ScienceMarko Rodriguez
 

Más de Marko Rodriguez (20)

mm-ADT: A Virtual Machine/An Economic Machine
mm-ADT: A Virtual Machine/An Economic Machinemm-ADT: A Virtual Machine/An Economic Machine
mm-ADT: A Virtual Machine/An Economic Machine
 
mm-ADT: A Multi-Model Abstract Data Type
mm-ADT: A Multi-Model Abstract Data Typemm-ADT: A Multi-Model Abstract Data Type
mm-ADT: A Multi-Model Abstract Data Type
 
Open Problems in the Universal Graph Theory
Open Problems in the Universal Graph TheoryOpen Problems in the Universal Graph Theory
Open Problems in the Universal Graph Theory
 
Gremlin 101.3 On Your FM Dial
Gremlin 101.3 On Your FM DialGremlin 101.3 On Your FM Dial
Gremlin 101.3 On Your FM Dial
 
Gremlin's Graph Traversal Machinery
Gremlin's Graph Traversal MachineryGremlin's Graph Traversal Machinery
Gremlin's Graph Traversal Machinery
 
Quantum Processes in Graph Computing
Quantum Processes in Graph ComputingQuantum Processes in Graph Computing
Quantum Processes in Graph Computing
 
ACM DBPL Keynote: The Graph Traversal Machine and Language
ACM DBPL Keynote: The Graph Traversal Machine and LanguageACM DBPL Keynote: The Graph Traversal Machine and Language
ACM DBPL Keynote: The Graph Traversal Machine and Language
 
The Gremlin Graph Traversal Language
The Gremlin Graph Traversal LanguageThe Gremlin Graph Traversal Language
The Gremlin Graph Traversal Language
 
The Path Forward
The Path ForwardThe Path Forward
The Path Forward
 
Faunus: Graph Analytics Engine
Faunus: Graph Analytics EngineFaunus: Graph Analytics Engine
Faunus: Graph Analytics Engine
 
Solving Problems with Graphs
Solving Problems with GraphsSolving Problems with Graphs
Solving Problems with Graphs
 
Titan: The Rise of Big Graph Data
Titan: The Rise of Big Graph DataTitan: The Rise of Big Graph Data
Titan: The Rise of Big Graph Data
 
The Pathology of Graph Databases
The Pathology of Graph DatabasesThe Pathology of Graph Databases
The Pathology of Graph Databases
 
Traversing Graph Databases with Gremlin
Traversing Graph Databases with GremlinTraversing Graph Databases with Gremlin
Traversing Graph Databases with Gremlin
 
The Path-o-Logical Gremlin
The Path-o-Logical GremlinThe Path-o-Logical Gremlin
The Path-o-Logical Gremlin
 
The Gremlin in the Graph
The Gremlin in the GraphThe Gremlin in the Graph
The Gremlin in the Graph
 
Memoirs of a Graph Addict: Despair to Redemption
Memoirs of a Graph Addict: Despair to RedemptionMemoirs of a Graph Addict: Despair to Redemption
Memoirs of a Graph Addict: Despair to Redemption
 
Graph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of DataGraph Databases: Trends in the Web of Data
Graph Databases: Trends in the Web of Data
 
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...
Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Reco...
 
A Perspective on Graph Theory and Network Science
A Perspective on Graph Theory and Network ScienceA Perspective on Graph Theory and Network Science
A Perspective on Graph Theory and Network Science
 

Último

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 

Último (20)

SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 

Graph Databases and the Future of Large-Scale Knowledge Management

  • 1. Graph Databases and the Future of Large-Scale Knowledge Management Marko A. Rodriguez T-5, Center for Nonlinear Studies Los Alamos National Laboratory http://markorodriguez.com April 8, 2009
  • 2. Abstract Modern day open source and commercial graph databases can store on the order of 1 billion relationships with some databases reaching the 10 billion mark. These developments are making the graph database practical for applications that require large-scale knowledge structures. Moreover, with the Web of Data standards set forth by the Linked Data community, it is possible to interlink graph databases across the web into a giant global knowledge structure. This talk will discuss graph databases, their underlying data model, their querying mechanisms, and the benefits of the graph data structure for modeling and analysis. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 3. Outline • The Relational Database vs. the Graph Database • The Web of Documents vs. the Web of Data Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 4. Outline • The Relational Database vs. the Graph Database • The Web of Documents vs. the Web of Data Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 5. The Relational Database vs. the Graph Database • A relational database’s (e.g. MySQL, PostgreSQL, Oracle) data model is a collection interlinked tables. • A graph database’s (e.g. OpenSesame, AllegroGraph, Neo4j) data model is a multi-relational graph. Relational Database Graph Database d c a a b 127.0.0.1 127.0.0.2 Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 6. Types of Graphs • Undirected single-relational graph: homogenous set of symmetric links. • Directed single-relational graph: homogenous set of links. • Directed multi-relational graph: heterogenous set of links. undirected single-relational graph x z directed single-relational graph x z directed multi-relational graph x y z Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 7. Our Make Believe World - Phase 1 • Marko is a human and Fluffy is a dog. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 8. Our World Modeled in a Relational Database - Phase 1 ID Name Type Legs Fur 0001 Marko Human 2 false 0002 Fluffy Dog 4 true Object_Table Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 9. Our World Modeled in a Graph Database - Phase 1 Human Dog type type 0001 0002 name name legs fur legs fur 2 Marko false 4 Fluffy true Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 10. Our Make Believe World - Phase 2 • Marko is a human and Fluffy is a dog. • Marko and Fluffy are good friends. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 11. Our World Modeled in a Relational Database - Phase 2 ID Name Type Legs Fur ID2 ID2 0001 Marko Human 2 false 0001 0002 0002 Fluffy Dog 4 true 0002 0001 Object_Table Friendship_Table Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 12. Our World Modeled in a Graph Database - Phase 2 Human Dog type type friend 0001 friend 0002 name name legs fur legs fur 2 Marko false 4 Fluffy true Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 13. Our Make Believe World - Phase 3 • Marko is a human and Fluffy is a dog. • Marko and Fluffy are good friends. • Human and dog are a subclass of mammal. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 14. Our World Modeled in a Relational Database - Phase 3 ID Name Type Legs Fur ID2 ID2 Type1 Type2 0001 Marko Human 2 false 0001 0002 Human Mammal 0002 Fluffy Dog 4 true 0002 0001 Dog Mammal Object_Table Friendship_Table Subclass_Table Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 15. Our World Modeled in a Graph Database - Phase 3 Mammal subclassof subclassof Human Dog type type friend 0001 friend 0002 name name legs fur legs fur 2 Marko false 4 Fluffy true Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 16. Our Make Believe World - Phase 4 • Marko is a human and Fluffy is a dog. • Marko and Fluffy are good friends. • Human and dog are a subclass of mammal. • Fluffy peed on the carpet. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 17. Our World Modeled in a Relational Database - Phase 4 ID Name Type Legs Fur ID2 ID2 Type1 Type2 0001 Marko Human 2 false 0001 0002 Human Mammal 0002 Fluffy Dog 4 true 0002 0001 Dog Mammal 0003 My_Rug Carpet N/A N/A Friendship_Table Subclass_Table Object_Table ID1 ID2 0002 0003 Pee_Table Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 18. Our World Modeled in a Graph Database - Phase 4 Mammal subclassof subclassof Human Dog Carpet type type type friend 0001 friend 0002 peedOn 0003 name name name legs fur legs fur 2 Marko false 4 Fluffy true My_Rug Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 19. Our Make Believe World - Phase 5 • Marko is a human and Fluffy is a dog. • Marko and Fluffy are good friends. • Human and dog are a subclass of mammal. • Fluffy peed on the carpet. • Marko and Fluffy are both mammals. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 20. Our World Modeled in a Relational Database - Phase 5 ID Name Type Legs Fur ID2 ID2 Type1 Type2 0001 Marko Human 2 false 0001 0002 Human Mammal 0002 Fluffy Dog 4 true 0002 0001 Dog Mammal 0003 My_Rug Carpet N/A N/A Friendship_Table Subclass_Table Object_Table ID1 ID2 ID Type 0002 0003 0001 Human Pee_Table 0002 Dog 0003 Carpet 0001 Mammal 0002 Mammal Type_Table Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 21. Our World Modeled in a Graph Database - Phase 5 Mammal subclassof subclassof Human Dog Carpet type type type type type friend 0001 friend 0002 peedOn 0003 name name name legs fur legs fur 2 Marko false 4 Fluffy true My_Rug Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 22. The Graph as the Natural World Model • The world is inherently (or perceived as) object-oriented. • The world is filled with objects and relations among them. • The multi-relational graph is a very natural representation of the world. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 23. The Graph as the Natural Programming Model • High-level computer languages are object-oriented. • Nearly no impedance mismatch between the multi-relational graph and the programming object. • It is easy to go from graph database to in-memory object. Human marko = new Human(); marko.name = "Marko"; marko.addFriend(fluffy); marko.setHasFur(false); marko.setLegs(2); Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 24. SQL vs. SPARQL SELECT OTY.Name FROM Object_Table AS OTX, Object_Table AS OTY, Friendship_Table WHERE OTX.Name = "Marko" AND Friendship_Table.ID1 = OTY.ID AND Friendship_Table.ID2 = OTX.ID; SELECT ?z WHERE { ?x name "Marko" . ?y friend ?x . ?y name ?z } Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 25. Outline • The Relational Database vs. the Graph Database • The Web of Documents vs. the Web of Data Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 26. Internet Address Spaces • The Uniform Resource Identifier (URI) is the superclass of the Uniform Resource Locator (URL) and Uniform Resource Name (URN). Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 27. The Uniform Resource Locator • The set of all URLs is the address space of all resources that can be located and retrieved on the Web. URLs denote where a resource is. http://markorodriguez.com/index.html ∗ Domain name server (DNS): markorodriguez.com → 216.251.43.6 ∗ http:// means GET at port 80, ∗ /index.html means the resource to get at that Internet location. Web Server index.html markorodriguez.com 216.251.43.6 Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 28. The Uniform Resource Name • The set of all URNs is the address space of all resources within the urn: namespace. urn:uuid:bd93def0-8026-11dd-842be54955baa12 urn:issn:0892-3310 urn:doi:10.1016/j.knosys.2008.03.030 • Named resources need not be retrievable through the Web. • URNs denote what a resource is. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 29. The Uniform Resource Identifier • The URI address space is an infinite space for all Internet resources. http://markorodriguez.com/index.html urn:issn:0892-3310 ftp://markorodriguez.com/private/markos_secrets.txt http://www.lanl.gov#fluffy • Imporant: URIs can denote concepts, instances, and datum. lanl:fluffy lanl:fluffy_legs Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 30. The Web of Documents • The World of Documents is primarily concerned with the Hyper-Text Transfer Protocol (HTTP) and with retrievable resources in the URL address space. • These retrievable resources are files: HTML documents, images, audio, etc. The “web” is created when HTML documents contain URLs. http://markorodriguez.com/ index.html href Resume.html href Home.html href Research.html Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 31. The Web of Data • The Web of Data is primarily concerned with URIs. • The Resource Description Framework (RDF) is the standard for representing the relationship between URIs and literals (e.g. float, string, date time, etc.). subject predicate object lanl:marko foaf:knows lanl:fluffy foaf:name foaf:name "Marko A. Rodriguez"^^xsd:string "Fluffy P. Everywhere"^^xsd:string Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 32. Our Make Believe World in RDF lanl:Mammal rdfs:subClassOf rdfs:subClassOf lanl:Human lanl:Dog rdf:type rdf:type rdf:type rdf:type lanl:marko lanl:friend lanl:fluffy lanl:friend lanl:fur lanl:legs lanl:fur lanl:legs foaf:name foaf:name "false"^^xsd:boolean "2"^^xsd:integer "true"^^xsd:boolean "4"^^xsd:integer "Marko A. Rodriguez"^^xsd:string "Fluffy P. Everywhere"^^xsd:string Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 33. The Web of Data is a Distributed Database • The URI address space is distributed. • URIs can denote datum. • RDF denotes the relationships URIs. • The Web of Data’s foundational standard is RDF. • Therefore, the Web of Data is a distributed database. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 34. The Web of Documents vs. the Web of Data Web Server Web Server HTML href HTML 127.0.0.1 127.0.0.2 Graph Database Graph Database lanl:friend 127.0.0.1 127.0.0.2 Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 35. The Current Web of Data - March 2009 homologenekegg projectgutenberg symbol homologenekegg libris projectgutenberg cas symbol bbcjohnpeel libris unists diseasome dailymed w3cwordnet chebi hgnc pubchem eurostat mgi geneid omim wikicompany geospecies cas bbcjohnpeel diseasome dailymed drugbank worldfactbook reactome pubmed unists magnatune opencyc w3cwordnet uniparc linkedct chebi freebase taxonomy uniref uniprot geneontology interpro hgnc pubchem eurostat pdb yago umbel pfam mgi dbpedia omim bbclatertotpgovtrack wikicompany geospecies prosite prodom flickrwrappr geneid opencalais reactome uscensusdata drugbank worldfactbook lingvoj linkedmdb surgeradio magnatune pubmed virtuososponger opencyc rdfbookmashup uniparc freebase swconferencecorpus geonames musicbrainz myspacewrapper linkedct dblpberlin uniprot pubguide taxonomy revyu interpro uniref geneontologyjamendo bbcplaycountdata rdfohloh pdb umbel yago semanticweborg siocsites riese pfam dbpedia bbclatertotp govtrack foafprofiles dblphannover openguides audioscrobbler prosite bbcprogrammes prodom crunchbase flickrwrappropencalais doapspace uscensusdata flickrexporter surgeradio budapestbme qdos lingvoj linkedmdb semwebcentral virtuososponger eurecom ecssouthampton pisa dblprkbexplorer newcastle rdfbookmashup geonames musicbrainz rae2001 eprints irittoulouse laascnrs acm citeseer swconferencecorpus myspacewrapper ieee dblpberlin pubguide resex ibm revyu jamendo rdfohloh bbcplaycountdata semanticweborg siocsites riese foafprofiles openguides audioscrobbler bbcprogrammes dblphannover crunchbase Risk Symposium – Santa Fe, New Mexico – April 8, 2009 doapspace flickrexporter qdos
  • 36. The Current Web of Data - March 2009 data set domain data set domain data set domain audioscrobbler music govtrack government pubguide books bbclatertotp music homologene biology qdos social bbcplaycountdata music ibm computer rae2001 computer bbcprogrammes media ieee computer rdfbookmashup books budapestbme computer interpro biology rdfohloh social chebi biology jamendo music resex computer crunchbase business laascnrs computer riese government dailymed medical libris books semanticweborg computer dblpberlin computer lingvoj reference semwebcentral social dblphannover computer linkedct medical siocsites social dblprkbexplorer computer linkedmdb movie surgeradio music dbpedia general magnatune music swconferencecorpus computer doapspace social musicbrainz music taxonomy reference drugbank medical myspacewrapper social umbel general eurecom computer opencalais reference uniref biology eurostat government opencyc general unists biology flickrexporter images openguides reference uscensusdata government flickrwrappr images pdb biology virtuososponger reference foafprofiles social pfam biology w3cwordnet reference freebase general pisa computer wikicompany business geneid biology prodom biology worldfactbook government geneontology biology projectgutenberg books yago general geonames geographic prosite biology ... Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 37. Cultural Differences that are Leading to a New World of Large-Scale Knowledge Management • Relational databases tend to not maintain public access points. • Relational database users tend to not publish their schemas. • Web of Data graph databases maintain public access points called SPARQL end-points. • Web of Data graph databases tend to reuse and extend public schemas called ontologies. Risk Symposium – Santa Fe, New Mexico – April 8, 2009
  • 38. Conclusion • Thank you for your time... My homepage: http://markorodriguez.com Neno/Fhat: http://neno.lanl.gov Collective Decision Making Systems: http://cdms.lanl.gov Faith in the Algorithm: http://faithinthealgorithm.net MESUR: http://www.mesur.org Risk Symposium – Santa Fe, New Mexico – April 8, 2009