KNOWLEDGE
INFORMATION
DATA
Adding Value Through Graph Analysis

Matthias Broecheler, CTO
@mbroecheler               AURELIUS
March V, MMXIII            THINKAURELIUS.COM

Communities of Interest
Finding Influencers
Understanding Behavior

Information Integration
Recommendation
Question Answering

Fraud Detection
Risk Analysis
Market Valuation
[Figure: three levels, Data, Information and Knowledge, with an axis labeled Value]
Knowledge:    likes(Jane Joe, cute mammals):0.8

Information:  userid:3552  --clicked (timestamp: 93932342)-->  addid:9914

Data:         2013-03-03 18:52:48:112;
              12.123.211.192; ACCESS/TRR;
              http://adserve.domain.com/render.cgi?uid=F32282DA39B&flagtru&xls=trending ; ACTION=CLICK|DELAY=250|x=450|y=632
Graph Databases & Graph Analysis

[Figure: the Data / Information / Knowledge example from the previous slide, repeated]
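The step from Data to Information starts by giving the raw log line named fields. The following is a minimal sketch, assuming the semicolon-separated layout of the sample line. The field names (timestamp, ip, channel, uid, event) are labels chosen here for illustration and do not come from the slides. Resolving the uid to userid:3552, or the ad to addid:9914, would need lookups that the slides do not show.

```python
from urllib.parse import urlsplit, parse_qs

# Sample line from the slide, joined into one string.
RAW = ("2013-03-03 18:52:48:112; 12.123.211.192; ACCESS/TRR; "
       "http://adserve.domain.com/render.cgi?uid=F32282DA39B&flagtru&xls=trending ; "
       "ACTION=CLICK|DELAY=250|x=450|y=632")


def parse_log_line(line):
    """Split one raw log line into named fields (field names are assumptions)."""
    timestamp, ip, channel, url, action = [part.strip() for part in line.split(";")]
    # Query-string parameters of the ad-server URL.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    # The last field is a pipe-separated list of key=value pairs.
    event = dict(pair.split("=", 1) for pair in action.split("|"))
    return {
        "timestamp": timestamp,
        "ip": ip,
        "channel": channel,
        "uid": query["uid"][0],
        "event": event,
    }


record = parse_log_line(RAW)
print(record["uid"], record["event"]["ACTION"])
```

Once the line is structured like this, a record whose event ACTION is CLICK is the natural candidate for a "clicked" edge between a user vertex and an ad vertex.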
I.  Graph Foundation
Graph

[Figure: a property graph of Roman mythology, built up over four slides]
  Vertex and Property:
    name: Saturn, type: titan
    name: Jupiter, type: god
    name: Neptune, type: god
    name: Pluto, type: god
    name: Alcmene, type: god
    name: Hercules, type: demigod
    name: Cerberus, type: monster
  Edge and Edge Type (labels shown): father, father, mother, brother, brother, pet, battled
  Edge Property: time:12 (on the battled edge)

Path

[Figure: the same graph, illustrating a path]

Degree

[Figure: the same graph, illustrating vertex degree]
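The terms on these slides (vertex, property, edge, edge type, edge property, path, degree) can be sketched with a small in-memory property graph. This is an illustration only, not how Titan stores data. The edge endpoints and directions cannot be recovered from the slide text; the ones used below follow Titan's "Graph of the Gods" sample data set and are an assumption here. The class and method names are hypothetical.

```python
from collections import defaultdict


class PropertyGraph:
    """A toy property graph: vertices and edges both carry key/value properties."""

    def __init__(self):
        self.vertices = {}                    # vertex id -> property dict
        self.out_edges = defaultdict(list)    # vertex id -> [(label, target, props)]
        self.in_edges = defaultdict(list)     # vertex id -> [(label, source, props)]

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, src, label, dst, **props):
        self.out_edges[src].append((label, dst, props))
        self.in_edges[dst].append((label, src, props))

    def degree(self, vid):
        """Number of edges touching the vertex, in either direction."""
        return len(self.out_edges[vid]) + len(self.in_edges[vid])

    def is_path(self, vids):
        """True if every consecutive pair is joined by an outgoing edge."""
        for a, b in zip(vids, vids[1:]):
            if not any(dst == b for _, dst, _ in self.out_edges[a]):
                return False
        return True


g = PropertyGraph()
for name, kind in [("Saturn", "titan"), ("Jupiter", "god"), ("Neptune", "god"),
                   ("Pluto", "god"), ("Alcmene", "god"), ("Hercules", "demigod"),
                   ("Cerberus", "monster")]:
    g.add_vertex(name, name=name, type=kind)

# Assumed endpoints and directions (see the note above).
g.add_edge("Jupiter", "father", "Saturn")
g.add_edge("Hercules", "father", "Jupiter")
g.add_edge("Hercules", "mother", "Alcmene")
g.add_edge("Jupiter", "brother", "Neptune")
g.add_edge("Jupiter", "brother", "Pluto")
g.add_edge("Pluto", "pet", "Cerberus")
g.add_edge("Hercules", "battled", "Cerberus", time=12)

print(g.degree("Jupiter"))
print(g.is_path(["Hercules", "Jupiter", "Pluto", "Cerberus"]))
```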
Aurelius Graph Cluster (Apache 2)

  TITAN:    Stores a massive-scale property graph allowing real-time traversals and updates
  FAUNUS:   Batch processing of large graphs with Hadoop
  FULGORA:  Runs global graph algorithms on large, compressed, in-memory graphs

  Arrows between the components: Map/Reduce; Bulk Load; Load; Analysis results back into Titan
II.  Titan Graph Database
Titan Features
  Numerous Concurrent Users
  Many Short Transactions
    read/write
  Real-time Traversals (OLTP)
  High Availability
  Dynamic Scalability
  Variable Consistency Model
    ACID or eventual consistency
  Real-time Big Graph Data
Storage Backends
               Partitionability




Consistency
                       Availability
$ ./titan-0.2.0/bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = TitanFactory.open('/tmp/titan')
==>titangraph[local:/tmp/titan]
gremlin> v = g.V('name','Hercules')
==>v[4]
gremlin> v.out('father').out('brother').name
[Figure: the mythology graph from the Graph Foundation section, shown with the traversal]

gremlin> v.out('father').out('brother').name
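What the traversal does can be sketched in a few lines: start at Hercules, follow outgoing "father" edges, then outgoing "brother" edges, and read off the names. This is a toy model rather than Gremlin itself. The `Traversal` class is hypothetical, and the edge directions are assumed from Titan's "Graph of the Gods" sample data, since the slide text does not preserve them.

```python
# (source, label, target) triples; endpoints are an assumption (see above).
EDGES = [
    ("Jupiter", "father", "Saturn"),
    ("Hercules", "father", "Jupiter"),
    ("Hercules", "mother", "Alcmene"),
    ("Jupiter", "brother", "Neptune"),
    ("Jupiter", "brother", "Pluto"),
    ("Pluto", "pet", "Cerberus"),
    ("Hercules", "battled", "Cerberus"),
]


class Traversal:
    """A tiny chainable traversal over a list of edge triples."""

    def __init__(self, edges, current):
        self.edges = edges
        self.current = list(current)

    def out(self, label):
        # Move every current vertex along its outgoing edges with this label.
        nxt = [dst for v in self.current
               for (src, lab, dst) in self.edges
               if src == v and lab == label]
        return Traversal(self.edges, nxt)

    def names(self):
        return list(self.current)


uncles = Traversal(EDGES, ["Hercules"]).out("father").out("brother").names()
print(uncles)
```

Under these assumed edges the traversal reaches the brothers of Hercules' father.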
Vertex-Centric Indices
  Sort and index edges per vertex by primary key
    Primary key can be composite
  Enables efficient focused traversals
    Only retrieve edges that matter
  Uses push down predicates for quick, index-driven retrieval
v.query()
  edges of v: battled (time: 1), battled (time: 3), battled (time: 5), battled (time: 9), mother, father, fought, fought

v.query().direction(OUT)
  remaining: battled (time: 1), battled (time: 3), battled (time: 5), battled (time: 9), mother, father

v.query().direction(OUT).labels('battled')
  remaining: battled (time: 1), battled (time: 3), battled (time: 5), battled (time: 9)

v.query().direction(OUT).labels('battled').has('time',T.lt,5)
  remaining: battled (time: 1), battled (time: 3)
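The idea behind a vertex-centric index can be sketched as a per-vertex edge list kept sorted by (direction, label, sort key), so that a query such as direction OUT, label 'battled', time < 5 becomes a binary search for a contiguous slice instead of a scan over all edges. This is a minimal illustration of the technique and not Titan's storage layout. The class name, the neighbor names and the use of 0 as a sort key for edges without a time are all hypothetical.

```python
import bisect
from collections import defaultdict


class VertexCentricIndex:
    """Per-vertex edges, kept sorted by (direction, label, key, neighbor)."""

    def __init__(self):
        self._edges = defaultdict(list)

    def add_edge(self, vertex, direction, label, key, neighbor):
        # insort keeps the list ordered, so lookups can use binary search.
        bisect.insort(self._edges[vertex], (direction, label, key, neighbor))

    def query(self, vertex, direction, label, key_lt):
        """Edges of `vertex` with this direction and label and key < key_lt."""
        edges = self._edges[vertex]
        # A shorter tuple sorts before any longer tuple sharing its prefix,
        # so these two probes bracket exactly the matching slice.
        lo = bisect.bisect_left(edges, (direction, label))
        hi = bisect.bisect_left(edges, (direction, label, key_lt))
        return edges[lo:hi]


index = VertexCentricIndex()
for time in (9, 1, 5, 3):
    index.add_edge("v", "OUT", "battled", time, "monster-%d" % time)
index.add_edge("v", "OUT", "mother", 0, "m")
index.add_edge("v", "OUT", "father", 0, "f")
index.add_edge("v", "IN", "fought", 0, "a")
index.add_edge("v", "IN", "fought", 0, "b")

hits = index.query("v", "OUT", "battled", 5)
print([key for (_, _, key, _) in hits])
```

The predicate is pushed down into the index lookup: only the two matching edges are touched, however many other edges the vertex has.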
Titan Features

I.  Data Management




II.  Vertex-Centric
     Indices
Titan Features

III.  Graph
   Partitioning




IV.  Edge Compression
III.  TITAN 0.3.0 [-SNAPSHOT]
Titan Embedding
  Rexster RexPro
    lightweight Gremlin Server
    binary protocol
  Titan Gremlin Engine
  Embedded Storage Backend
    in-JVM method calls
  Native clients
    Java, Python, Clojure
Graph Indexing
  Vertex and Edge indexing
  Pluggable index provider
    ElasticSearch
    Lucene
  Full-text search
  Numeric range search
  Geographic search
[Figure: the mythology graph with additional properties]
  name: Neptune, age: 5200, title: God of the earth and ocean
  name: Alcmene, age: 3300
  name: Saturn, age: 5900
  name: Jupiter, age: 4800, title: God of the heaven and skies
  name: Hercules, title: Divine hero
  name: Pluto, age: 4900, title: God of the underworld
  name: Cerberus, title: Ugly beast of the underworld
  Edge labels: brother, brother, mother, father, father, pet, battled
  Properties on the battled edge: time:12, location: (38.071,23.745)
g.query().has('age',Cmp.GREATER_THAN,5000).vertices()
g.query().has('title',Txt.CONTAINS,'god').vertices()
g.query().has('age',Cmp.GREATER_THAN,5000).has('title',Txt.CONTAINS,'god').vertices()
g.query().has('location',Geo.WITHIN,Geoshape.circle(38,23,100)).edges()
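The three kinds of index lookup (numeric range, full-text, geographic) can be mimicked over the example data with plain predicates. This sketch scans every element, which is exactly what a real index provider such as ElasticSearch or Lucene avoids; it only shows what each query is expected to match. The assignment of ages and titles to names is read from the slide layout. Treating the text match as a case-insensitive token match, and the circle radius as kilometers, are assumptions. The helper names are hypothetical.

```python
import math

VERTICES = {
    "Neptune":  {"age": 5200, "title": "God of the earth and ocean"},
    "Alcmene":  {"age": 3300},
    "Saturn":   {"age": 5900},
    "Jupiter":  {"age": 4800, "title": "God of the heaven and skies"},
    "Hercules": {"title": "Divine hero"},
    "Pluto":    {"age": 4900, "title": "God of the underworld"},
    "Cerberus": {"title": "Ugly beast of the underworld"},
}
BATTLED = {"label": "battled", "time": 12, "location": (38.071, 23.745)}


def greater_than(props, key, threshold):
    """Numeric range predicate; elements without the key never match."""
    return key in props and props[key] > threshold


def contains(props, key, word):
    """Token match on a text property, ignoring case (assumed semantics)."""
    return word.lower() in props.get(key, "").lower().split()


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


older = sorted(n for n, p in VERTICES.items() if greater_than(p, "age", 5000))
gods = sorted(n for n, p in VERTICES.items() if contains(p, "title", "god"))
both = sorted(set(older) & set(gods))
within = haversine_km(BATTLED["location"], (38, 23)) <= 100

print(older, gods, both, within)
```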
IV.  Faunus Graph Analytics
Faunus Features
  Hadoop-based Graph Computing Framework
  Graph Analytics
  Breadth-first Traversals
  Global Graph Computations
  Batch Big Graph Data
Faunus Architecture

         g._()
Faunus Work Flow

g.V.out                        .out                   .count()




                                  hdfs://user/ubuntu/
                                      output/job-0/
                                      output/job-1/       graph*
                                      output/job-2/   {   sideeffect*
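A breadth-first traversal such as g.V.out.out.count() can be pictured as a sequence of rounds, each moving traverser counts one hop along the edges and summing them per target vertex, in the manner of a map phase followed by a reduce phase. The sketch below shows that idea only. It is not Faunus's implementation, and treating each step as exactly one round is an assumption. The edge list reuses the mythology graph with assumed directions.

```python
from collections import Counter

# (source, target) pairs; directions are an assumption.
EDGES = [
    ("Jupiter", "Saturn"), ("Hercules", "Jupiter"), ("Hercules", "Alcmene"),
    ("Jupiter", "Neptune"), ("Jupiter", "Pluto"), ("Pluto", "Cerberus"),
    ("Hercules", "Cerberus"),
]
VERTICES = {v for edge in EDGES for v in edge}


def out_step(edges, counts):
    """One round: emit each vertex's count along its out-edges, then sum per target."""
    nxt = Counter()
    for src, dst in edges:          # "map": one message per edge
        if counts[src]:
            nxt[dst] += counts[src]  # "reduce": sum messages by target vertex
    return nxt


def count_two_hop_paths(vertices, edges):
    counts = Counter({v: 1 for v in vertices})   # g.V
    counts = out_step(edges, counts)             # .out
    counts = out_step(edges, counts)             # .out
    return sum(counts.values())                  # .count()


print(count_two_hop_paths(VERTICES, EDGES))
```

Because only a count per vertex is carried between rounds, the intermediate state stays proportional to the number of vertices rather than the number of paths.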
Compressed HDFS Graphs
  stored in sequence files
  variable length encoding
  prefix compression
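Both compression techniques named on this slide are generic, and a small sketch may help. Variable-length encoding stores small integers in fewer bytes; prefix compression stores each sorted key as the length of the prefix it shares with its predecessor plus the remaining suffix. The code below shows common textbook forms of both (a 7-bits-per-byte varint and front coding). The exact encodings used in Faunus's sequence files are not given in the slides, so this is not a description of that format.

```python
import os


def encode_varint(n):
    """Encode a non-negative int using 7 data bits per byte, low bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos=0):
    """Decode one varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def prefix_compress(sorted_keys):
    """Front coding: (shared prefix length, suffix) relative to the previous key."""
    prev, out = "", []
    for key in sorted_keys:
        shared = len(os.path.commonprefix([prev, key]))
        out.append((shared, key[shared:]))
        prev = key
    return out


def prefix_decompress(pairs):
    prev, keys = "", []
    for shared, suffix in pairs:
        prev = prev[:shared] + suffix
        keys.append(prev)
    return keys


print(encode_varint(300))
print(prefix_compress(["battled", "brother", "brothers"]))
```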
What’s New
  Faunus 0.1 released
  Bulk Import / Export for Titan
    loaded graph into Titan
    loading derivations into Titan
    RDF support
  Many optimizations
    vertex compression
Faunus Setup


$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = FaunusFactory.open('bin/titan-hbase.properties')
==>faunusgraph[titanhbaseinputformat]
gremlin> g.getProperties()
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
==>faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=dbpedia
==>faunus.output.location.overwrite=true
gremlin> g._()
12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.IdentityMap.Map]
12/11/09 15:17:50 INFO mapred.JobClient: Running job: job_201211081058_0003
Build a Knowledge Graph
  Based on DBpedia
    Graph version of Wikipedia
    ~290 million edges (~1B triples)
1.  Bulk load RDF into Faunus
    6 m1.xlarge
2.  Convert to property graph
3.  Bulk load into Titan
    3 m1.xlarge with Cassandra
4.  OLTP+OLAP
    Total Time: ~ 2 hours
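Step 2 is not detailed in the slides. One common way to turn RDF into a property graph is to make every triple whose object is a literal into a vertex property, and every triple whose object is another resource into an edge. That rule would be consistent with roughly 1B triples yielding roughly 290 million edges, but the slides do not say which rule was used, so the sketch below is an assumption. The sample triples and predicate names are hypothetical.

```python
def triples_to_property_graph(triples):
    """Fold (subject, predicate, object, object_is_literal) triples into a property graph."""
    vertices = {}   # resource -> property dict
    edges = []      # (source, label, target)
    for subject, predicate, obj, is_literal in triples:
        vertices.setdefault(subject, {})
        if is_literal:
            # Literal object: store it as a property on the subject vertex.
            vertices[subject][predicate] = obj
        else:
            # Resource object: make sure the target exists and connect the two.
            vertices.setdefault(obj, {})
            edges.append((subject, predicate, obj))
    return vertices, edges


# Hypothetical sample; resource names echo the query output later in the deck.
SAMPLE = [
    ("Random_walker_algorithm", "label", "Random walker algorithm", True),
    ("Random_walker_algorithm", "wikiPageWikiLink", "Random_walk", False),
    ("Random_walker_algorithm", "wikiPageWikiLink", "Laplacian_matrix", False),
    ("Random_walk", "label", "Random walk", True),
]

vertices, edges = triples_to_property_graph(SAMPLE)
print(len(vertices), len(edges))
```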
Graph OLTP

gremlin> g = TitanFactory.open('bin/cassandra.local')
==>titangraph[cassandrathrift:10.176.213.110]

gremlin> g.V('name','Random_walker_algorithm').both.name
==>Random_walk
==>Segmentation_(image_processing)
==>Graph_(mathematics)
==>Laplacian_matrix
==>Graph
==>Laplacian_matrix
==>Electrical_network
==>Resistor
==>Electrical_resistance_and_conductance
==>Ground_(electricity)
==>Direct_current
==>Voltage_source
==>Precomputation
==>Category:Computer_vision
==>Random_Walker_(Computer_Vision)
==>List_of_algorithms
==>Segmentation_(image_processing)
==>Watershed_(image_processing)
==>Random_walker_(computer_vision)
==>Random_Walker_(computer_vision)
gremlin> g.V('name','Learning').out.out.out.out[0..10].name
==>Latium
==>Roman_Kingdom
==>Roman_Republic
==>Roman_Empire
==>Middle_Ages
==>Early_modern_Europe
==>Armenian_Kingdom_of_Cilicia
==>Lingua_franca
==>Vatican_City
==>Vulgar_Latin
==>Romance_languages
Aurelius Graph Cluster

aureliusgraphs@googlegroups.com
The Graph Landscape

[Figure: chart with axes Speed of Traversal/Process and Size of Graph. Illustration only, not to scale.]

TINKERPOP.COM
Thanks!

   Vadas Gintautas    @vadasg
   Marko Rodriguez    @twarko
   Stephen Mallette   @spmallette
   Daniel LaRocque
We are Hiring



   AURELIUS
  THINKAURELIUS.COM

Más contenido relacionado

Destacado

Recuperare dati da partizioni NTFS danneggiate
Recuperare dati da partizioni NTFS danneggiateRecuperare dati da partizioni NTFS danneggiate
Recuperare dati da partizioni NTFS danneggiateAndrea Lazzarotto
 
Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...
Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...
Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...Martin Junghanns
 
Ricostruzione forense di NTFS con metadati parzialmente danneggiati
Ricostruzione forense di NTFS con metadati parzialmente danneggiatiRicostruzione forense di NTFS con metadati parzialmente danneggiati
Ricostruzione forense di NTFS con metadati parzialmente danneggiatiAndrea Lazzarotto
 
TinkerPop and Titan from a Python State of Mind
TinkerPop and Titan from a  Python State of MindTinkerPop and Titan from a  Python State of Mind
TinkerPop and Titan from a Python State of MindDenise Gosnell, Ph.D.
 
C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...
C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...
C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...DataStax Academy
 
TinkerPop: a story of graphs, DBs, and graph DBs
TinkerPop: a story of graphs, DBs, and graph DBsTinkerPop: a story of graphs, DBs, and graph DBs
TinkerPop: a story of graphs, DBs, and graph DBsJoshua Shinavier
 
Come si creano le app Android
Come si creano le app AndroidCome si creano le app Android
Come si creano le app AndroidAndrea Lazzarotto
 
Building Knowledge Graphs in DIG
Building Knowledge Graphs in DIGBuilding Knowledge Graphs in DIG
Building Knowledge Graphs in DIGPalak Modi
 
Cassandra Summit - What's New In Apache TinkerPop?
Cassandra Summit - What's New In Apache TinkerPop?Cassandra Summit - What's New In Apache TinkerPop?
Cassandra Summit - What's New In Apache TinkerPop?Stephen Mallette
 
Graph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopGraph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopJason Plurad
 
DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...
DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...
DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...DataStax
 
Intro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
Intro to Graph Databases Using Tinkerpop, TitanDB, and GremlinIntro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
Intro to Graph Databases Using Tinkerpop, TitanDB, and GremlinCaleb Jones
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph DatabasesMax De Marzi
 

Destacado (15)

Recuperare dati da partizioni NTFS danneggiate
Recuperare dati da partizioni NTFS danneggiateRecuperare dati da partizioni NTFS danneggiate
Recuperare dati da partizioni NTFS danneggiate
 
Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...
Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...
Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...
 
Ricostruzione forense di NTFS con metadati parzialmente danneggiati
Ricostruzione forense di NTFS con metadati parzialmente danneggiatiRicostruzione forense di NTFS con metadati parzialmente danneggiati
Ricostruzione forense di NTFS con metadati parzialmente danneggiati
 
TinkerPop and Titan from a Python State of Mind
TinkerPop and Titan from a  Python State of MindTinkerPop and Titan from a  Python State of Mind
TinkerPop and Titan from a Python State of Mind
 
C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...
C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...
C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...
 
TinkerPop: a story of graphs, DBs, and graph DBs
TinkerPop: a story of graphs, DBs, and graph DBsTinkerPop: a story of graphs, DBs, and graph DBs
TinkerPop: a story of graphs, DBs, and graph DBs
 
Come si creano le app Android
Come si creano le app AndroidCome si creano le app Android
Come si creano le app Android
 
Building Knowledge Graphs in DIG
Building Knowledge Graphs in DIGBuilding Knowledge Graphs in DIG
Building Knowledge Graphs in DIG
 
PSL Overview
PSL OverviewPSL Overview
PSL Overview
 
Cassandra Summit - What's New In Apache TinkerPop?
Cassandra Summit - What's New In Apache TinkerPop?Cassandra Summit - What's New In Apache TinkerPop?
Cassandra Summit - What's New In Apache TinkerPop?
 
Graph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPopGraph Processing with Apache TinkerPop
Graph Processing with Apache TinkerPop
 
DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...
DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...
DataStax | Graph Computing with Apache TinkerPop (Marko Rodriguez) | Cassandr...
 
Intro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
Intro to Graph Databases Using Tinkerpop, TitanDB, and GremlinIntro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
Intro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin
 
Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
Slideshare ppt
Slideshare pptSlideshare ppt
Slideshare ppt
 

Más de Matthias Broecheler

Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3Matthias Broecheler
 
Graph Computing @ Strangeloop 2013
Graph Computing @ Strangeloop 2013Graph Computing @ Strangeloop 2013
Graph Computing @ Strangeloop 2013Matthias Broecheler
 
Titan - Graph Computing with Cassandra
Titan - Graph Computing with CassandraTitan - Graph Computing with Cassandra
Titan - Graph Computing with CassandraMatthias Broecheler
 
PMatch: Probabilistic Subgraph Matching on Huge Social Networks
PMatch: Probabilistic Subgraph Matching on Huge Social NetworksPMatch: Probabilistic Subgraph Matching on Huge Social Networks
PMatch: Probabilistic Subgraph Matching on Huge Social NetworksMatthias Broecheler
 
Budget-Match: Cost Effective Subgraph Matching on Large Networks
Budget-Match: Cost Effective Subgraph Matching on Large NetworksBudget-Match: Cost Effective Subgraph Matching on Large Networks
Budget-Match: Cost Effective Subgraph Matching on Large NetworksMatthias Broecheler
 
Computing Marginal in CCMRFs - NIPS 2010
Computing Marginal in CCMRFs - NIPS 2010Computing Marginal in CCMRFs - NIPS 2010
Computing Marginal in CCMRFs - NIPS 2010Matthias Broecheler
 
A Scalable Framework for Modeling Competitive Diffusion in Social Networks
A Scalable Framework for Modeling Competitive Diffusion in Social NetworksA Scalable Framework for Modeling Competitive Diffusion in Social Networks
A Scalable Framework for Modeling Competitive Diffusion in Social NetworksMatthias Broecheler
 
COSI: Cloud Oriented Subgraph Identification in Massive Social Networks
COSI: Cloud Oriented Subgraph Identification in Massive Social NetworksCOSI: Cloud Oriented Subgraph Identification in Massive Social Networks
COSI: Cloud Oriented Subgraph Identification in Massive Social NetworksMatthias Broecheler
 

Más de Matthias Broecheler (10)

Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3
 
Titan NYC Meetup March 2014
Titan NYC Meetup March 2014Titan NYC Meetup March 2014
Titan NYC Meetup March 2014
 
Graph Computing @ Strangeloop 2013
Graph Computing @ Strangeloop 2013Graph Computing @ Strangeloop 2013
Graph Computing @ Strangeloop 2013
 
Titan - Graph Computing with Cassandra
Titan - Graph Computing with CassandraTitan - Graph Computing with Cassandra
Titan - Graph Computing with Cassandra
 
PMatch: Probabilistic Subgraph Matching on Huge Social Networks
PMatch: Probabilistic Subgraph Matching on Huge Social NetworksPMatch: Probabilistic Subgraph Matching on Huge Social Networks
PMatch: Probabilistic Subgraph Matching on Huge Social Networks
 
Budget-Match: Cost Effective Subgraph Matching on Large Networks
Budget-Match: Cost Effective Subgraph Matching on Large NetworksBudget-Match: Cost Effective Subgraph Matching on Large Networks
Budget-Match: Cost Effective Subgraph Matching on Large Networks
 
Probabilistic Soft Logic
Probabilistic Soft LogicProbabilistic Soft Logic
Probabilistic Soft Logic
 
Computing Marginal in CCMRFs - NIPS 2010
Computing Marginal in CCMRFs - NIPS 2010Computing Marginal in CCMRFs - NIPS 2010
Computing Marginal in CCMRFs - NIPS 2010
 
A Scalable Framework for Modeling Competitive Diffusion in Social Networks
A Scalable Framework for Modeling Competitive Diffusion in Social NetworksA Scalable Framework for Modeling Competitive Diffusion in Social Networks
A Scalable Framework for Modeling Competitive Diffusion in Social Networks
 
COSI: Cloud Oriented Subgraph Identification in Massive Social Networks
COSI: Cloud Oriented Subgraph Identification in Massive Social NetworksCOSI: Cloud Oriented Subgraph Identification in Massive Social Networks
COSI: Cloud Oriented Subgraph Identification in Massive Social Networks
 

Adding Value through graph analysis using Titan and Faunus

  • 1. KNOWLEDGE INFORMATION DATA Adding Value Through Graph Analysis Matthias Broecheler, CTO @mbroecheler AURELIUS March V, MMXIII THINKAURELIUS.COM
  • 2. Communities of Interest; Finding Influencers; Understanding Behavior
  • 3. Information Integration; Recommendation; Question Answering
  • 4. Fraud Detection; Risk Analysis; Market Valuation
  • 5. Knowledge Value Information Data
  • 6. Knowledge: likes(Jane Joe, cute mammals):0.8. Information: userid:3552 clicked addid:9914 (timestamp: 93932342). Data: 2013-03-03 18:52:48:112; 12.123.211.192; ACCESS/TRR; http://adserve.domain.com/render.cgi?uid=F32282DA39B&flagtru&xls=trending; ACTION=CLICK|DELAY=250|x=450|y=632
  • 7. Graph Databases & Graph Analysis (same Data, Information, Knowledge example as slide 6)
  • 11. I Graph Foundation
  • 12. Graph: vertices with properties (labels: Vertex, Property). Neptune (type: god), Alcmene (type: god), Saturn (type: titan), Jupiter (type: god), Hercules (type: demigod), Pluto (type: god), Cerberus (type: monster).
  • 13. Graph: the same vertices connected by edges (labels: Edge, Edge Type, Edge Property). Edge types shown: brother, mother, father, father, battled (time: 12), brother, pet.
  • 14. Path (same graph as slide 13).
  • 15. Degree (same graph as slide 13).
  • 16. Aurelius Graph Cluster (Apache 2). TITAN: stores a massive-scale property graph allowing real-time traversals and updates. FAUNUS: batch processing of large graphs with Hadoop. FULGORA: runs global graph algorithms on large, compressed, in-memory graphs. Diagram labels: Map/Reduce Load, Bulk Load, Analysis results back into Titan.
  • 17. II Titan Graph Database
  • 18. Titan Features: Numerous Concurrent Users; Many Short Transactions (read/write); Real-time Traversals (OLTP); High Availability; Dynamic Scalability; Variable Consistency Model (ACID or eventual consistency); Real-time Big Graph Data
  • 19. Storage Backends: Partitionability, Consistency, Availability
  • 20. $ ./titan-0.2.0/bin/gremlin.sh
         \,,,/
         (o o)
      -----oOOo-(_)-oOOo-----
      gremlin> g = TitanFactory.open('/tmp/titan')
      ==>titangraph[local:/tmp/titan]
      gremlin> v = g.V('name','Hercules')
      ==>v[4]
      gremlin> v.out('father').out('brother').name
  • 21. (Same graph as slide 13.) gremlin> v.out('father').out('brother').name
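The two-step traversal on slides 20 and 21 can be mimicked in a few lines of plain Python. This is an illustrative sketch only, not Titan or Gremlin code. The edge directions are an assumption, since the flattened slide text does not preserve them, and `build_adjacency` and `out` are hypothetical helper names.

```python
from collections import defaultdict

# Edges of the example graph as (source, label, target).
# ASSUMPTION: the directions are not recoverable from the slide text.
EDGES = [
    ("Hercules", "father", "Jupiter"),
    ("Hercules", "mother", "Alcmene"),
    ("Hercules", "battled", "Cerberus"),
    ("Jupiter", "father", "Saturn"),
    ("Jupiter", "brother", "Neptune"),
    ("Jupiter", "brother", "Pluto"),
    ("Pluto", "pet", "Cerberus"),
]


def build_adjacency(edges):
    """Group outgoing edge targets by (source vertex, edge label)."""
    adjacency = defaultdict(list)
    for source, label, target in edges:
        adjacency[(source, label)].append(target)
    return adjacency


def out(adjacency, frontier, label):
    """One traversal step: follow every outgoing `label` edge from the frontier."""
    return [t for v in frontier for t in adjacency.get((v, label), [])]


adjacency = build_adjacency(EDGES)
# Analogue of v.out('father').out('brother').name starting at Hercules.
uncles = out(adjacency, out(adjacency, ["Hercules"], "father"), "brother")
print(sorted(uncles))  # prints ['Neptune', 'Pluto']
```

Each `out` call maps a set of current vertices to their neighbors along one edge label, which is the basic step that a chained traversal repeats.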
  • 22. Vertex-Centric Indices: Sort and index edges per vertex by primary key; Primary key can be composite; Enables efficient focused traversals; Only retrieve edges that matter; Uses push-down predicates for quick, index-driven retrieval
  • 23. Edges of vertex v: battled (time: 1), battled (time: 3), battled (time: 5), battled (time: 9), mother, father, fought, fought. Query: v.query()
  • 24. Remaining edges: battled (time: 1), battled (time: 3), battled (time: 5), battled (time: 9), mother, father. Query: v.query().direction(OUT)
  • 25. Remaining edges: battled (time: 1), battled (time: 3), battled (time: 5), battled (time: 9). Query: v.query().direction(OUT).labels('battled')
  • 26. Remaining edges: battled (time: 1), battled (time: 3). Query: v.query().direction(OUT).labels('battled').has('time',T.lt,5)
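The idea behind slides 22 to 26 is that a vertex keeps its edges sorted by a key, so a constrained query becomes a range lookup instead of a scan over all edges. The following Python sketch illustrates that idea under stated assumptions; it is not Titan's storage layout. The sort key (direction, label, time), the placeholder neighbor names, the time value 0 for edges without a time, and the function name `query` are all hypothetical.

```python
from bisect import bisect_left

# Edges incident to one vertex v, as (direction, label, time, neighbor).
# Neighbor names are placeholders; edges without a time use 0 (assumption).
edges_of_v = [
    ("OUT", "battled", 1, "a"), ("OUT", "battled", 3, "b"),
    ("OUT", "battled", 5, "c"), ("OUT", "battled", 9, "d"),
    ("OUT", "mother", 0, "e"), ("OUT", "father", 0, "f"),
    ("IN", "fought", 0, "g"), ("IN", "fought", 0, "h"),
]

# Build the per-vertex index once: edges sorted by the composite key.
index = sorted(edges_of_v)
keys = [(direction, label, time) for direction, label, time, _ in index]


def query(direction, label, time_lt):
    """Return edges with the given direction and label whose time < time_lt.

    Two binary searches delimit the matching slice, so only the edges
    that satisfy the predicate are touched.
    """
    start = bisect_left(keys, (direction, label, float("-inf")))
    stop = bisect_left(keys, (direction, label, time_lt))
    return index[start:stop]


# Analogue of v.query().direction(OUT).labels('battled').has('time',T.lt,5)
print([time for _, _, time, _ in query("OUT", "battled", 5)])  # prints [1, 3]
```

Because the predicate is pushed down into the index lookup, the cost depends on the number of matching edges rather than on the vertex's total degree.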
  • 27. Titan Features: I. Data Management; II. Vertex-Centric Indices
  • 28. Titan Features: III. Graph Partitioning; IV. Edge Compression
  • 29. III TITAN 0.3.0 [-SNAPSHOT]
  • 30. Titan Embedding: Rexster RexPro (lightweight Gremlin Server, binary protocol); Titan Gremlin Engine; Embedded Storage Backend (in-JVM method calls); Native clients (Java, Python, Clojure)
  • 31. Graph Indexing: Vertex and Edge indexing; Pluggable index provider (ElasticSearch, Lucene); Full-text search; Numeric range search; Geographic search
  • 32. Graph of slide 13 with additional properties. Vertex names: Neptune, Alcmene, Jupiter, Saturn, Hercules, Pluto, Cerberus. Age values shown: 5200, 3300, 4800, 5900, 4900. Titles shown: God of the earth and ocean; God of the heaven and skies; Divine hero; God of the underworld; Ugly beast of the underworld. Edges: brother, mother, father, father, battled (time: 12, location: (38.071,23.745)), brother, pet.
  • 33. (Same graph as slide 32.) g.query().has('age',Cmp.GREATER_THAN,5000).vertices()
  • 34. (Same graph as slide 32.) g.query().has('title',Txt.CONTAINS,'god').vertices()
  • 35. (Same graph as slide 32.) g.query().has('age',Cmp.GREATER_THAN,5000).has('title',Txt.CONTAINS,'god').vertices()
  • 36. (Same graph as slide 32.) g.query().has('location',Geo.WITHIN,Geoshape.circle(38,23,100)).edges()
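The geographic query on slide 36 selects edges whose location lies within a circle. A rough Python sketch of such a containment test follows. It assumes the circle is given as (latitude, longitude, radius in kilometers) and uses the haversine great-circle distance, which is not necessarily how an index backend evaluates the predicate. `haversine_km` and `within_circle` are hypothetical names.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within_circle(point, center, radius_km):
    """True if `point` lies inside the circle around `center`."""
    return haversine_km(point[0], point[1], center[0], center[1]) <= radius_km


# Location of the 'battled' edge from the slides against the circle (38, 23, 100).
battle_location = (38.071, 23.745)
print(within_circle(battle_location, (38.0, 23.0), 100.0))  # prints True
```

Under these assumptions the edge is roughly 66 km from the circle's center, so it falls inside the 100 km radius.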
  • 37. IV Faunus Graph Analytics
  • 38. Faunus Features: Hadoop-based Graph Computing Framework; Graph Analytics; Breadth-first Traversals; Global Graph Computations; Batch Big Graph Data
  • 40. Faunus Work Flow: g.V.out.out.count(); hdfs://user/ubuntu/ with output/job-0/, output/job-1/, output/job-2/ {graph*, sideeffect*}. Compressed HDFS Graphs: stored in sequence files; variable length encoding; prefix compression
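Slide 40 mentions variable length encoding for the compressed graph files. The sketch below shows one common form of it in Python: a 7-bits-per-byte integer encoding applied to the gaps between sorted neighbor ids. It is an assumption-laden illustration and not Faunus's actual file format; the gap encoding is a stand-in chosen here for brevity, not the prefix compression named on the slide, and all function names are hypothetical.

```python
def encode_varint(n):
    """Encode a non-negative int with 7 payload bits per byte.

    The high bit of each byte signals that more bytes follow.
    """
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varints(data):
    """Decode a byte string that holds a sequence of varints."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def encode_adjacency(sorted_ids):
    """Store gaps between sorted neighbor ids instead of the raw ids."""
    encoded, previous = bytearray(), 0
    for vertex_id in sorted_ids:
        encoded += encode_varint(vertex_id - previous)
        previous = vertex_id
    return bytes(encoded)


def decode_adjacency(data):
    """Invert encode_adjacency by summing the decoded gaps."""
    ids, total = [], 0
    for gap in decode_varints(data):
        total += gap
        ids.append(total)
    return ids


neighbors = [1000000, 1000003, 1000010]
packed = encode_adjacency(neighbors)
print(len(packed), decode_adjacency(packed) == neighbors)  # prints 5 True
```

Small numbers take a single byte, so storing gaps rather than full ids keeps long adjacency lists compact when neighbor ids are close together.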
  • 41. Aurelius Graph Cluster (Apache 2). TITAN: stores a massive-scale property graph allowing real-time traversals and updates. FAUNUS: batch processing of large graphs with Hadoop. FULGORA: runs global graph algorithms on large, compressed, in-memory graphs. Diagram labels: Map/Reduce Load, Bulk Load, Analysis results back into Titan.
  • 42. What’s New: Faunus 0.1 released; Bulk Import / Export for Titan (loaded graph into Titan, loading derivations into Titan); RDF support; Many optimizations (vertex compression)
  • 43. Faunus Setup
      $ bin/gremlin.sh
         \,,,/
         (o o)
      -----oOOo-(_)-oOOo-----
      gremlin> g = FaunusFactory.open('bin/titan-hbase.properties')
      ==>faunusgraph[titanhbaseinputformat]
      gremlin> g.getProperties()
      ==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
      ==>faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
      ==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
      ==>faunus.output.location=dbpedia
      ==>faunus.output.location.overwrite=true
      gremlin> g._()
      12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
      12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.IdentityMap.Map]
      12/11/09 15:17:50 INFO mapred.JobClient: Running job: job_201211081058_0003
  • 44. Build a Knowledge Graph: Based on DBPedia (graph version of Wikipedia); ~290 million edges (~1B triples). 1. Bulk load RDF into Faunus (6 m1.xlarge). 2. Convert to property graph. 3. Bulk load into Titan (3 m1.xlarge with Cassandra). 4. OLTP+OLAP. Total Time: ~2 hours
  • 45. Graph OLTP
      gremlin> g = TitanFactory.open('bin/cassandra.local')
      ==>titangraph[cassandrathrift:10.176.213.110]
      gremlin> g.V('name','Random_walker_algorithm').both.name
      ==>Random_walk
      ==>Segmentation_(image_processing)
      ==>Graph_(mathematics)
      ==>Laplacian_matrix
      ==>Graph
      ==>Laplacian_matrix
      ==>Electrical_network
      ==>Resistor
      ==>Electrical_resistance_and_conductance
      ==>Ground_(electricity)
      ==>Direct_current
      ==>Voltage_source
      ==>Precomputation
      ==>Category:Computer_vision
      ==>Random_Walker_(Computer_Vision)
      ==>List_of_algorithms
      ==>Segmentation_(image_processing)
      ==>Watershed_(image_processing)
      ==>Random_walker_(computer_vision)
      ==>Random_Walker_(computer_vision)
  • 47. Aurelius Graph Cluster (Apache 2). TITAN: stores a massive-scale property graph allowing real-time traversals and updates. FAUNUS: batch processing of large graphs with Hadoop. FULGORA: runs global graph algorithms on large, compressed, in-memory graphs. Diagram labels: Map/Reduce Load, Bulk Load, Analysis results back into Titan. aureliusgraphs@googlegroups.com
  • 48. The Graph Landscape (illustration only, not to scale). Axes: Speed of Traversal/Process, Size of Graph
  • 50. Thanks! Vadas Gintautas (@vadasg), Marko Rodriguez (@twarko), Stephen Mallette (@spmallette), Daniel LaRocque
  • 51. We are Hiring