An overview of Neo4j Internals

 Tobias Lindaaker
 Hacker @ Neo Technology

 tobias@neotechnology.com
 twitter: @thobe, #neo4j (@neo4j)
 web: neo4j.org neotechnology.com
 my web: thobe.org




Monday, May 21, 2012
Outline
       This is a rough structure of how the pieces of Neo4j fit together.

       This talk will not cover how disks and file systems work; we just assume they do.
       Nor will it cover the “Core API”; you are assumed to know it.

              Traversals      Core API       Cypher

              Node/Relationship Object cache      Thread local diffs

              FS Cache                            HA

              Record files                        Transaction log

                              Disk(s)



Outline
       Let’s start at the bottom: the on disk storage file layout.



Your graph on disk

       Simple sample graph. It all boils down to linked lists of fixed size records on disk.

       Properties are stored as a linked list of property records, each holding key+value.
       Each node/relationship references its first property record.
       Each node also references the first relationship in its relationship chain.
       Each relationship references its start and end node. It also references the
       prev/next relationship record for the start node and for the end node respectively.

       [Figure: sample graph with four nodes connected by five KNOWS relationships]
              Name: Alistair, Age: 34
              Name: Tobias, Age: 27, Nationality: Swedish
              Name: Ian, Age: 42
              Name: Jim, Age: 37, Stuff: good
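
The relationship chains described on this slide can be sketched in a few lines of Python. This is only an illustration, not Neo4j's code: the `Rel` class, the `NO_REL` sentinel, the node numbering and the three sample relationships are assumptions made for the example, and only the "next" pointers are modelled.

```python
from dataclasses import dataclass

NO_REL = -1  # sentinel for "end of chain"; chosen for this sketch only


@dataclass
class Rel:
    first_node: int            # start node id
    second_node: int           # end node id
    first_next: int = NO_REL   # next relationship in the start node's chain
    second_next: int = NO_REL  # next relationship in the end node's chain


def relationships_of(node_id, first_rel, rels):
    """Yield the ids of all relationships attached to node_id by walking its chain."""
    rel_id = first_rel[node_id]
    while rel_id != NO_REL:
        yield rel_id
        rel = rels[rel_id]
        # Each record is a member of two chains; pick the pointer for this node.
        rel_id = rel.first_next if rel.first_node == node_id else rel.second_next


# Made-up data: node 0 -> node 1, node 0 -> node 2, node 2 -> node 1.
rels = {
    0: Rel(first_node=0, second_node=1, first_next=1, second_next=2),
    1: Rel(first_node=0, second_node=2, second_next=2),
    2: Rel(first_node=2, second_node=1),
}
first_rel = {0: 0, 1: 0, 2: 1}  # each node references the first record of its chain

chains = {n: list(relationships_of(n, first_rel, rels)) for n in first_rel}
```

Every relationship record sits in two linked lists at once, one per endpoint, which is why a node record only needs a single reference to find all of its relationships.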

Your graph on disk

       [Figure: the same sample graph, with each of the five KNOWS relationships drawn as
       a record carrying four pointer fields labeled SP, EP, SN and EN. These appear to be
       the prev/next relationship references for the start node and the end node.]

Store files
                       ๏ Node store
                       ๏ Relationship store
                         • Relationship type store
                       ๏ Property store
                         • Property key store
                         • (long) String store
                         • (long) Array store

                       Short string and array values are inlined in the property store;
                       long values are stored in separate store files.


Neo4j Storage Record Layout

Byte ranges are zero-based and inclusive.

Node (9 bytes)
     field              bytes
     inUse              0
     nextRelId          1-4
     nextPropId         5-8

Relationship (33 bytes)
     field              bytes
     inUse              0
     firstNode          1-4
     secondNode         5-8
     relationshipType   9-12
     firstPrevRelId     13-16
     firstNextRelId     17-20
     secondPrevRelId    21-24
     secondNextRelId    25-28
     nextPropId         29-32

Relationship Type (5 bytes)
     field              bytes
     inUse              0
     typeBlockId        1-4

Property (33 bytes)
     field              bytes
     inUse              0
     type               1-2
     keyIndexId         3-4
     propBlock          5-28
     nextPropId         29-32

Property Index (9 bytes)
     field              bytes
     inUse              0
     propCount          1-4
     keyBlockId         5-8

Dynamic Store (125 bytes)
     field              bytes
     inUse              0
     next               1-4
     data               5-124

NeoStore (5 bytes)
     field              bytes
     inUse              0
     datum              1-4
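
Fixed size records mean a record can be located by multiplying its id by the record size. The sketch below decodes node records using Python's `struct` module. The slide gives only field names and offsets, so reading each 4-byte field as an unsigned big-endian integer, locating records at `id * size` with no file header, and the helper names are all assumptions of this example.

```python
import struct

# Layouts taken from the field sizes on the slide; ">" means big-endian, no padding.
NODE = struct.Struct(">BII")          # inUse, nextRelId, nextPropId
RELATIONSHIP = struct.Struct(">B8I")  # inUse followed by eight 4-byte fields


def record_offset(record_id, record_size):
    """Byte offset of a record in its store file (assumes no file header)."""
    return record_id * record_size


def read_node(store_bytes, node_id):
    """Decode one node record from an in-memory copy of the node store."""
    in_use, next_rel, next_prop = NODE.unpack_from(
        store_bytes, record_offset(node_id, NODE.size)
    )
    return {"inUse": bool(in_use), "nextRelId": next_rel, "nextPropId": next_prop}


# Two made-up node records written back to back.
store = NODE.pack(1, 7, 3) + NODE.pack(1, 0, 12)
node = read_node(store, 1)
```

The struct sizes come out as 9 and 33 bytes, matching the record sizes listed for Node and Relationship.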

Outline
     Next: The two levels of cache in Neo4j.
     The low level FS Cache for the record files,
     and the high level Object cache storing a structure
     more optimized for traversal.



The caches
        ๏ Filesystem cache:
                • Caches regions of the store files
                    (divides each store file into equally sized regions)
                • The cache holds a fixed number of regions for each file
                • Regions are evicted based on an LFU-like policy
                    (hit count vs. miss count, i.e. hit in non-cached region)
                • Default implementation of regions uses OS mmap
        ๏ Node/Relationship cache
                • Caches a version more optimized for traversals
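
One way to picture the filesystem cache's eviction policy is the toy model below. The slide only says that eviction compares hit counts with miss counts, so the exact rule used here (swap a region in once its miss count exceeds the hit count of the coldest cached region) is an assumption, as are the class and attribute names. No real mmap is involved.

```python
class RegionCache:
    """Toy LFU-like cache over equally sized regions of one store file."""

    def __init__(self, region_size, max_regions):
        self.region_size = region_size
        self.max_regions = max_regions  # fixed number of cached regions per file
        self.hits = {}                  # cached region index -> hit count
        self.misses = {}                # non-cached region index -> miss count

    def access(self, offset):
        """Record an access at a byte offset; return True if its region was cached."""
        region = offset // self.region_size
        if region in self.hits:
            self.hits[region] += 1
            return True
        self.misses[region] = self.misses.get(region, 0) + 1
        if len(self.hits) < self.max_regions:
            # Free slot: cache the region straight away.
            self.hits[region] = self.misses.pop(region)
        else:
            coldest = min(self.hits, key=self.hits.get)
            if self.misses[region] > self.hits[coldest]:
                # The missed region is now hotter than the coldest cached one: swap.
                self.misses[coldest] = self.hits.pop(coldest)
                self.hits[region] = self.misses.pop(region)
        return False


cache = RegionCache(region_size=1024, max_regions=2)
results = [cache.access(o) for o in (0, 10, 2000, 5000, 5001, 5002)]
```

In this run the region holding offset 2000 is evicted after the second miss on the region holding offset 5000, so the final access is a hit.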
What we put in cache

       The structure of the elements in the high level object cache.

       On disk most of the information is contained in the relationship records, with the
       nodes just referencing their first relationship. In the cache this is turned around:
       each node holds references to all of its relationships. The relationships are
       simple, holding only their properties.

       The relationships of each node are grouped by RelationshipType to allow fast
       traversal of a specific type.

       All references (dotted arrows) are by ID, and traversals do indirect lookup through
       the cache.

       [Figure: layout of the cached elements]
       Node
              ID
              Relationship ID refs (grouped by type)
                     type 1    in:  R1 R2 ... Rn       out: R1 R2 ... Rn
                     type 2    in:  R1 R2 R3 ... Rn    out: R1 ... Rn
                     ...
              Properties: Key 1 -> Val 1, Key 2 -> Val 2, ..., Key n -> Val n
       Relationship
              ID, start, end, type
              Properties: Key 1 -> Val 1, Key 2 -> Val 2, ..., Key n -> Val n
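
The cached structures on this slide could be modelled roughly as follows. This is a sketch under assumptions: the class names, the `add` and `neighbours` helpers, the ids and the direction of the sample KNOWS relationship are invented for illustration and are not Neo4j's actual classes.

```python
from dataclasses import dataclass, field


@dataclass
class CachedRelationship:
    id: int
    start: int  # start node id
    end: int    # end node id
    type: str
    properties: dict = field(default_factory=dict)


@dataclass
class CachedNode:
    id: int
    properties: dict = field(default_factory=dict)
    # relationship type -> {"in": [relationship ids], "out": [relationship ids]}
    rels: dict = field(default_factory=dict)

    def add(self, rel_type, direction, rel_id):
        by_direction = self.rels.setdefault(rel_type, {"in": [], "out": []})
        by_direction[direction].append(rel_id)


def neighbours(node_id, rel_type, nodes, rels):
    """Follow outgoing relationships of one type; every hop is a lookup by ID."""
    out_ids = nodes[node_id].rels.get(rel_type, {}).get("out", [])
    return [rels[rel_id].end for rel_id in out_ids]


nodes = {1: CachedNode(1, {"Name": "Alistair"}), 2: CachedNode(2, {"Name": "Tobias"})}
rels = {10: CachedRelationship(10, start=1, end=2, type="KNOWS")}
nodes[1].add("KNOWS", "out", 10)
nodes[2].add("KNOWS", "in", 10)

known_by_alistair = neighbours(1, "KNOWS", nodes, rels)
```

Grouping the ID lists by type and direction means a traversal that only cares about one relationship type never touches the others.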




Outline
     So how do traversals work...



Traversals - how do they work?
       The surface layer, the one you interact with.

       ๏ RelationshipExpanders: given (a path to) a node, returns
                 Relationships to continue traversing from that node
       ๏ Evaluators: given (a path to) a node, returns whether to:
                • Continue traversing on that branch (i.e. expand) or not
                • Include (the path to) the node in the result set or not
       ๏ Then a projection to Path, Node, or Relationship is applied to
                 each Path in the result set.
            ... but also:
       ๏ Uniqueness level: policy for when it is ok to revisit a node
                 that has already been visited
       ➁ Implemented on top of the Core API
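
The interplay of expanders, evaluators and uniqueness can be sketched as a small generator. This is not the Neo4j traversal framework: the function signatures are hypothetical, the expander returns neighbouring node names rather than Relationship objects to keep the example short, breadth-first order and "visit each node once" uniqueness are arbitrary choices, and the edges of the sample graph are made up.

```python
from collections import deque


def traverse(start, expander, evaluator):
    """Breadth-first traversal that visits each node at most once.

    expander(path)  -> iterable of nodes reachable from the end of the path
    evaluator(path) -> (include, keep_going) pair of booleans
    """
    visited = {start}
    queue = deque([(start,)])
    while queue:
        path = queue.popleft()
        include, keep_going = evaluator(path)
        if include:
            yield path
        if not keep_going:
            continue
        for nxt in expander(path):
            if nxt not in visited:  # uniqueness policy
                visited.add(nxt)
                queue.append(path + (nxt,))


graph = {"Alistair": ["Tobias", "Ian"], "Tobias": ["Jim"], "Ian": ["Jim"], "Jim": []}


def expander(path):
    return graph[path[-1]]


def evaluator(path):
    # Include everything except the start node; stop expanding at depth 2.
    return len(path) > 1, len(path) < 3


paths = list(traverse("Alistair", expander, evaluator))
end_nodes = [path[-1] for path in paths]  # projection from Path to Node
```

The path to Jim through Ian is not produced, because the uniqueness policy has already marked Jim as visited through Tobias.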
More on Traversals
        This is what happens under the hood.

        ๏ Fetch node data from cache - non-blocking access
                • If not in cache, retrieve from storage, into cache
                       ‣ If region is in FS cache: blocking but short duration access
                       ‣ If region is outside FS cache: blocking slower access
        ๏ Get relationships from cached node
                • If not fetched, retrieve from storage, by following chains
        ๏ Expand relationship(s) to end up on next node(s)
                • The relationship knows the node, no need to fetch it yet
        ๏ Evaluate
                • possibly emitting a Path into the result set
        ๏ Repeat
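
The first step, fetching through the cache and falling back to storage, is a read-through lookup. A minimal sketch, assuming a dictionary stands in for the record files and with all names hypothetical:

```python
class ObjectCache:
    """Read-through cache in front of a slower store."""

    def __init__(self, load_from_store):
        self._load = load_from_store  # called only on a cache miss (the blocking path)
        self._nodes = {}
        self.store_reads = 0          # counts how often storage was touched

    def get_node(self, node_id):
        node = self._nodes.get(node_id)
        if node is None:
            self.store_reads += 1
            node = self._load(node_id)
            self._nodes[node_id] = node  # retrieve from storage, into cache
        return node


store = {1: {"Name": "Alistair"}, 2: {"Name": "Tobias"}}  # stand-in for record files
cache = ObjectCache(store.__getitem__)

first = cache.get_node(1)   # miss: goes to the store
second = cache.get_node(1)  # hit: served from the cache
```

Only the first lookup pays the cost of going to storage; later lookups of the same node return the cached object.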

Outline
     How is Cypher different? And how does it work?



Cypher - Just convenient traversal descriptions?
       ๏ Builds on the same infrastructure as Traversals - Expanders
                • but not on the full Traversal system
       ๏ Uses graph pattern matching for traversing the graph
                • Recursive matching with backtracking

            START x=... MATCH x-->y, x-->z, y-->z, z-->a-->b, z-->b

       [Figure legend] Red: pattern graph, Blue: actual graph,
                       Green: start node, Purple: matches
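
Recursive matching with backtracking can be illustrated with a small matcher for the pattern on this slide. This is a sketch, not Cypher's implementation: it assumes the pattern edges are ordered so that each edge's source variable is already bound, it does not enforce any uniqueness between matched elements, and the five-node graph is made up.

```python
def match(pattern, graph, binding):
    """Yield every assignment of pattern variables to graph nodes.

    pattern: list of (source_var, target_var) edges still to be matched
    graph:   dict mapping node -> set of successor nodes
    binding: dict mapping variable -> node (partially filled in)
    """
    if not pattern:
        yield dict(binding)
        return
    (src, dst), rest = pattern[0], pattern[1:]
    if src not in binding:
        raise ValueError("pattern edges must start from a bound variable")
    for node in sorted(graph.get(binding[src], ())):
        if dst in binding:
            # Both ends are bound: the edge only has to exist.
            if binding[dst] == node:
                yield from match(rest, graph, binding)
        else:
            binding[dst] = node  # tentative choice
            yield from match(rest, graph, binding)
            del binding[dst]     # backtrack and try the next candidate


# x-->y, x-->z, y-->z, z-->a-->b, z-->b split into single edges.
pattern = [("x", "y"), ("x", "z"), ("y", "z"), ("z", "a"), ("a", "b"), ("z", "b")]
graph = {1: {2, 3}, 2: {3}, 3: {4, 5}, 4: {5}, 5: set()}

matches = list(match(pattern, graph, {"x": 1}))  # START x = node 1
```

Candidates that fail a later edge are undone one variable at a time, which is the backtracking the slide refers to.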




                                                                          14

Monday, May 21, 2012
Cypher - Just convenient traversal descriptions?
       ๏ Builds on the same infrastructure as Traversals - Expanders
                • but not on the full Traversal system
       ๏ Uses graph pattern matching for traversing the graph
              • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b
            START x=...
                          matching with
                                        x-->z, y-->z,


                                                                Red: pattern graph
                                                                Blue: actual graph
                                                                Green: start node
                                                                Purple: matches




                                                                          14

Monday, May 21, 2012
Cypher - Just convenient traversal descriptions?
       ๏ Builds on the same infrastructure as Traversals - Expanders
                • but not on the full Traversal system
       ๏ Uses graph pattern matching for traversing the graph
              • Recursive MATCH x-->y, backtracking z-->a-->b, z-->b
            START x=...
                          matching with
                                        x-->z, y-->z,


                                                                Red: pattern graph
                                                                Blue: actual graph
                                                                Green: start node
                                                                Purple: matches




                                                                          14

What about gremlin?
        ๏ gremlin is a third party language, built by Marko Rodriguez of
                   Tinkerpop (a group of people who like to hack on graphs)
        ๏ Originally based on the idea of using xpath to describe traversals:
                   ./HAS_CART/CONTAINS_ITEM/PURCHASED/PURCHASED
                   but bastardized to distinguish between nodes and relationships:
                   ./outE[label=HAS_CART]/inV
                    /outE[label=CONTAINS_ITEM]/inV
                    /inE[label=PURCHASED]/outV
                    /outE[label=PURCHASED]/inV

                   Traversals are close to xpath, which is why xpath-like
                   descriptions of traversals seemed like a good idea.

        ๏ xpath is not complete enough to express full algorithms, it needs a
                   host language; gremlin originally defined its own.
                   This later changed: gremlin adopted Groovy as a more complete
                   host language and abandoned xpath in favor of method chaining
                   [ replace ‘/’ with ‘.’ ]
                                                                          15

Gremlin compared to Cypher
        ๏ start me=node:people(name={myname})
              match me-[:HAS_CART]->cart-[:CONTAINS_ITEM]->item,
                    item<-[:PURCHASED]-user-[:PURCHASED]->recommendation
              return recommendation

        ๏ Cypher is declarative, describes what data to get - its shape
        ๏ Gremlin is imperative, prescribes how to get the data
        ๏ Cypher has more opportunities for optimization by the engine
        ๏ Gremlin can implement pagerank, Cypher can’t (yet?)



                                                                     16

Outline

       Transactions involve two parts:
       the (thread local) changes being done by an active transaction,
       and the transaction replay log for recovery.

                       Traversals      Core API       Cypher


                       Node/Relationship
                                             Thread local diffs
                         Object cache

                           FS Cache                         HA


                         Record files          Transaction log


                                       Disk(s)



                                                                               17

Transaction Isolation
        ๏ Mutating operations are not written when performed
        ๏ They are stored in a thread confined transaction state object
        ๏ This prevents other threads from seeing uncommitted changes
                   from the transactions of other threads
        ๏ When Transaction.finish() is invoked the transaction is either
                   committed or rolled back
        ๏ Rollback is simple: discard the transaction state object
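
The isolation scheme above can be sketched in a few lines of Python. This is only an illustration, not Neo4j's implementation: the `Store` class, its dict-backed data and its method names are all hypothetical. It shows mutations going to a thread-confined state object that is either applied or discarded on `finish`.

```python
import threading


class Store:
    """Hypothetical stand-in for the committed store: a dict of values."""

    def __init__(self):
        self.data = {}
        self._tx = threading.local()  # one transaction state object per thread

    def begin(self):
        self._tx.changes = {}

    def set(self, key, value):
        # Mutations are recorded in thread-confined state, not in the store.
        self._tx.changes[key] = value

    def get(self, key):
        # A thread sees its own uncommitted changes on top of committed data.
        changes = getattr(self._tx, "changes", {})
        if key in changes:
            return changes[key]
        return self.data.get(key)

    def finish(self, success):
        changes = getattr(self._tx, "changes", {})
        self._tx.changes = {}  # rollback is just discarding the state object
        if success:
            self.data.update(changes)
```

Because the pending changes live in `threading.local()`, another thread calling `get` only ever reads `data`, so it cannot observe uncommitted values.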



                                                                          18

Transactions & Durability
        ๏ Commit is:
                • Changes made in the transaction are collected as commands
                • Commands are sorted to get predictable update order
                       ‣This prevents concurrent readers from seeing inconsistent
                        data when the changes are applied to the store
                • Write changes (in sorted order) to the transaction log
                • Mark the transaction as committed in the log
                • Apply the changes (in sorted order) to the store files
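
A minimal sketch of the commit steps listed above, under stated assumptions: the `Command` type, the `(record_id, key)` sort key, and the list/dict used for the log and the store are all invented for illustration. The slides do not say what the real sort order or log format is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    record_id: int  # hypothetical: which record the command targets
    key: str
    value: object


def commit(tx_id, commands, log, store):
    # 1. Sort the collected commands to get a predictable update order.
    ordered = sorted(commands, key=lambda c: (c.record_id, c.key))
    # 2. Write the changes (in sorted order) to the transaction log.
    for command in ordered:
        log.append(("command", tx_id, command))
    # 3. Mark the transaction as committed in the log.
    log.append(("commit", tx_id))
    # 4. Apply the changes (in the same sorted order) to the store.
    for command in ordered:
        store[(command.record_id, command.key)] = command.value
    return ordered
```

The point of the ordering is that the log and the store both see the commands in the same sequence, and that the commit marker is in the log before any store file is touched.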
                                                                              19

Recovery
        ๏ Transaction commands dictate state, they don’t modify state
                • e.g. SET property "count" to 5
                • rather than ADD 1 to property "count"
        ๏ Thus: Applying the same command twice yields the same state
        ๏ Recovery simply replays all transactions since the last safe point
        ๏ If tx A mutates node1.name, and tx B then also mutates
                   node1.name, that doesn’t matter, because the database is not
                   recovered until all transactions have been replayed
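
To make the idempotence argument concrete, here is a small sketch. The log format (a list of `(key, value)` pairs) and the function names are assumptions made for the example. It contrasts a command that dictates state with one that modifies state, and replays a log over a store that was already partially updated.

```python
def apply_set(store, key, value):
    # Dictates state: applying it twice leaves the same result.
    store[key] = value


def apply_add(store, key, delta):
    # Modifies state: applying it twice gives a different result,
    # which is why commands of this kind are not replay-safe.
    store[key] = store.get(key, 0) + delta


def replay(store, log):
    # Recovery: re-apply every logged command since the last safe point, in order.
    for key, value in log:
        apply_set(store, key, value)
    return store


# tx A sets node1.name, tx B overwrites it, then "count" is set to 5.
log = [("node1.name", "A"), ("node1.name", "B"), ("count", 5)]
clean = replay({}, log)
# A store that had already applied part of the log before a crash
# ends up in the same state once the whole log has been replayed.
partially_applied = replay({"node1.name": "B"}, log)
```

During the replay `node1.name` is briefly set back to "A", but nobody reads the store until replay has finished, so only the final state matters.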


                                                                         20

Outline

       High Availability in Neo4j builds on top of the transaction replay

                       Traversals      Core API       Cypher


                       Node/Relationship
                                             Thread local diffs
                         Object cache

                           FS Cache                         HA


                         Record files          Transaction log


                                       Disk(s)



                                                                              21

Outline

       The transaction logs are shared between all instances in a
       High Availability setup; all other parts operate on the local
       data just like in the standalone case.

                               Traversals      Core API       Cypher


                       Local   Node/Relationship
                                                     Thread local diffs
                                 Object cache

                                   FS Cache                         HA


                                 Record files          Transaction log     Shared

                                               Disk(s)



                                                                                       22

HA - the parts to it:
        ๏ Based on streaming transactions between servers
        ๏ All transactions are committed through the master
                • Then (eventually) applied to the slaves
                • Eventuality defined by the update interval
                    or when synchronization is mandated by interaction
        ๏ When writing to a slave:
                • Locks coordinated through the master
                • Transaction data buffered on the slave,
                    applied first on the master to get a txid,
                    then applied with the same txid on the slave
                                                                             23

Creating new Nodes and Relationships
        ๏ New Nodes/Relationships don’t need locks, so they don’t need a
                   transaction synced with master until the transaction commit
        ๏ They do need an ID that is unique and equal among all instances
        ๏ Each instance allocates IDs in blocks from the master, then assigns
                   them to new Nodes/Relationships locally

                • This batch allocation can be seen in (Enterprise) WebAdmin as
                    Node/Relationship counts jumping in steps of 1000
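
Block allocation of IDs can be sketched as follows. The class names `IdMaster` and `Instance` are hypothetical, and the sketch ignores networking and thread safety. Only the block size of 1000 comes from the slide.

```python
class IdMaster:
    """Hypothetical master-side allocator that hands out ID ranges."""

    def __init__(self, block_size=1000):
        self.block_size = block_size
        self.next_free = 0

    def allocate_block(self):
        start = self.next_free
        self.next_free += self.block_size
        return range(start, start + self.block_size)


class Instance:
    """Hypothetical cluster member that assigns IDs locally from its block."""

    def __init__(self, master):
        self.master = master
        self.ids = iter(())  # no block yet; the first next_id() fetches one

    def next_id(self):
        try:
            return next(self.ids)
        except StopIteration:
            # Block exhausted: contact the master once for the next 1000 IDs.
            self.ids = iter(self.master.allocate_block())
            return next(self.ids)
```

An instance only talks to the master once per block, and two instances can never hand out the same ID because their blocks do not overlap.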




                                                                           24

HA synchronization points
        ๏ Transactions are uniquely identified by monotonically increasing ID
        ๏ All Requests from slave send the current latest txid on that slave
        ๏ Responses from master send back a stream of transactions that
                   have occurred since then, along with the actual result
        ๏ Transaction is replayed just like when committed / recovered
        ๏ Nodes/Relationships touched during this application are evicted
                   from cache to avoid cache staleness
        ๏ Transaction commands are only sorted when created, before
                   stored/transmitted, thus consistency is preserved during all
                   application phases
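
The request/response piggybacking described above might look roughly like this. It is a sketch under assumptions: `Master`, `Slave`, `handle` and `call` are invented names, a transaction is reduced to a dict of key/value changes, and the "request" is just a Python callable run on the master.

```python
class Master:
    def __init__(self):
        self.log = []    # (txid, changes), txid monotonically increasing from 1
        self.store = {}

    def commit(self, changes):
        txid = len(self.log) + 1
        self.log.append((txid, changes))
        self.store.update(changes)
        return txid

    def handle(self, slave_txid, request):
        result = request(self)
        # Send back everything the slave has not seen, along with the result.
        missing = [tx for tx in self.log if tx[0] > slave_txid]
        return missing, result


class Slave:
    def __init__(self, master):
        self.master = master
        self.last_txid = 0
        self.store = {}
        self.cache = {}

    def call(self, request):
        # Every request carries the latest txid applied on this slave.
        missing, result = self.master.handle(self.last_txid, request)
        for txid, changes in missing:
            self.store.update(changes)     # replayed just like a commit
            for key in changes:
                self.cache.pop(key, None)  # evict touched entries
            self.last_txid = txid
        return result
```

Any interaction with the master therefore doubles as a synchronization point: the slave catches up as a side effect of getting its answer.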
                                                                              25

Locking semantics of HA
        ๏ To be granted a lock the slave must have the latest version of the
                   Node/Relationship it is trying to lock

                • This ensures consistency
                • The implementation of “Latest version of the Node/
                    Relationship” is “Latest version of the entire graph”

                • The slave must thus sync transactions from the master


                                                                          26

Master election
        ๏ Each instance communicates/coordinates:
              • its latest transaction id (including the master id for that tx)
              • the id for that instance
              • (logical) clock value for when the txid was written
        ๏ Election chooses:
           1. The instance with highest txid
           2. IF multiple: The instance that was master for that tx
           3. IF unavailable: The instance with the lowest clock value
           4. IF multiple: The instance with the lowest id
        ๏ Election happens when the current master cannot be reached
              • Any instance can choose to re-elect
              • Each instance runs the election protocol individually
              • Notify others when election chooses new master
        ๏ When elected, the new master broadcasts to all instances,
                   forcing them to bind to the new master
                                                                          27
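
The four election rules can be written as a single deterministic function, which is what lets each instance run the protocol individually and still reach the same answer. The `InstanceState` record and its field names are assumptions for this sketch; how the values are actually exchanged between instances is not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceState:
    instance_id: int
    latest_txid: int
    master_for_tx: int  # id of the instance that was master for latest_txid
    clock: int          # logical clock value for when latest_txid was written


def elect(states):
    # 1. The instance with the highest txid.
    top = max(s.latest_txid for s in states)
    candidates = [s for s in states if s.latest_txid == top]
    if len(candidates) > 1:
        # 2. If multiple: the instance that was master for that tx.
        former = [s for s in candidates if s.instance_id == s.master_for_tx]
        if former:
            candidates = former
    # 3. If it is unavailable: the lowest clock value.
    # 4. If still tied: the lowest instance id.
    winner = min(candidates, key=lambda s: (s.clock, s.instance_id))
    return winner.instance_id
```

Since the function depends only on the reported states, any two instances that see the same set of states elect the same master.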
Thank you for listening!

                                    tobias@neotechnology.com
 Tobias Lindaaker                   twitter: @thobe, #neo4j (@neo4j)
                                    web: neo4j.org neotechnology.com
 Hacker @ Neo Technology            my web: thobe.org





Human Factors of XR: Using Human Factors to Design XR Systems
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Neo4j Internals Overview

  • 1. An overview of Neo4j Internals. Tobias Lindaaker, Hacker @ Neo Technology. tobias@neotechnology.com; twitter: @thobe, #neo4j (@neo4j); web: neo4j.org, neotechnology.com; my web: thobe.org. Monday, May 21, 2012
  • 2. Outline. This is a rough structure of how the pieces of Neo4j fit together. This talk will not cover how disks and file systems work; we just assume they do. Nor will it cover the “Core API”; you are assumed to know it. Layer diagram, top to bottom: Traversals, Core API, Cypher; Node/Relationship Object cache, Thread local diffs; FS Cache, HA; Record files, Transaction log; Disk(s).
  • 3. Outline (layer diagram as on slide 2). Let’s start at the bottom: the on-disk storage file layout.
  • 4. Your graph on disk. Simple sample graph. It all boils down to linked lists of fixed-size records on disk. Properties are stored as a linked list of property records, each holding key+value. Each node/relationship references its first property record. Each node also references the first relationship in its relationship chain. Each relationship references its start and end node. It also references the prev/next relationship record for the start and end node respectively. Sample graph: nodes Alistair (Age: 34), Tobias (Age: 27, Nationality: Swedish), Ian (Age: 42) and Jim (Age: 37), connected by KNOWS relationships, plus a Stuff: good property.
  • 5. Your graph on disk (same text as slide 4). The sample graph is now drawn with every property as its own key/value record (Name, Age, Nationality, Stuff) chained from its node or relationship.
  • 7. Your graph on disk (same text as slide 4). Each KNOWS relationship record is now drawn with its four chain pointers, labelled SP, EP, SN and EN (previous/next relationship for the start node and for the end node).
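
The relationship chains described on slides 4 to 7 can be sketched in a few lines of Python. This is only an illustration of the linked-list idea: `Rel`, `NO_REL`, `first_rel` and `relationships_of` are hypothetical names, the end-of-chain sentinel is an assumption, and the prev pointers and self-loops are left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterator

NO_REL = -1  # assumed "end of chain" marker; the slides do not give the real encoding


@dataclass
class Rel:
    first_node: int
    second_node: int
    first_next: int = NO_REL   # next relationship in first_node's chain
    second_next: int = NO_REL  # next relationship in second_node's chain


def relationships_of(node_id: int, first_rel: Dict[int, int], rels: Dict[int, Rel]) -> Iterator[int]:
    """Walk one node's relationship chain, starting from the node's first relationship."""
    rel_id = first_rel.get(node_id, NO_REL)
    while rel_id != NO_REL:
        rel = rels[rel_id]
        yield rel_id
        # Follow the pointer that belongs to the node whose chain we are walking.
        rel_id = rel.first_next if rel.first_node == node_id else rel.second_next


# Three nodes (0, 1, 2) and three relationships: 0->1, 0->2, 1->2.
rels = {
    0: Rel(0, 1, first_next=1, second_next=2),
    1: Rel(0, 2, second_next=2),
    2: Rel(1, 2),
}
first_rel = {0: 0, 1: 0, 2: 1}  # node id -> id of the first relationship in its chain
```

Note that a single relationship record sits in two chains at once, which is why it needs separate next pointers for its start and end node.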
  • 11. Store files ๏ Node store ๏ Relationship store • Relationship type store ๏ Property store • Property key store • (long) String store • (long) Array store. Short string and array values are inlined in the property store; long values are stored in separate store files.
  • 12. Neo4j Storage Record Layout (the number after each field is the byte offset at which that field ends)
      Node (9 bytes): inUse 1, nextRelId 5, nextPropId 9
      Relationship (33 bytes): inUse 1, firstNode 5, secondNode 9, relationshipType 13, firstPrevRelId 17, firstNextRelId 21, secondPrevRelId 25, secondNextRelId 29, nextPropId 33
      Relationship Type (5 bytes): inUse 1, typeBlockId 5
      Property (33 bytes): inUse 1, type 3, keyIndexId 5, propBlock 29, nextPropId 33
      Property Index (9 bytes): inUse 1, propCount 5, keyBlockId 9
      Dynamic Store (125 bytes): inUse 1, next 5, data (to the end of the record)
      NeoStore (5 bytes): inUse 1, datum 5
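
Because every record has a fixed size, finding a record is a multiplication rather than a search. The following Python sketch packs and reads the 9-byte node record from slide 12. The field widths come from the slide; the big-endian byte order, the unsigned field types and all function names are assumptions made for illustration.

```python
import struct

# inUse (1 byte), nextRelId (4 bytes), nextPropId (4 bytes) = 9 bytes, as on the slide.
# Byte order and signedness are assumptions of this sketch.
NODE_RECORD = struct.Struct(">BII")


def node_offset(node_id: int) -> int:
    """Byte offset of a node record inside the node store."""
    return node_id * NODE_RECORD.size


def write_node(store: bytearray, node_id: int, in_use: bool, next_rel: int, next_prop: int) -> None:
    NODE_RECORD.pack_into(store, node_offset(node_id), int(in_use), next_rel, next_prop)


def read_node(store: bytearray, node_id: int) -> dict:
    in_use, next_rel, next_prop = NODE_RECORD.unpack_from(store, node_offset(node_id))
    return {"in_use": bool(in_use), "next_rel": next_rel, "next_prop": next_prop}


store = bytearray(NODE_RECORD.size * 4)  # room for node ids 0..3
write_node(store, 2, True, next_rel=7, next_prop=11)
record = read_node(store, 2)
```

The node id never has to be stored in the record itself, since it is implied by the record's position in the file.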
  • 13. Outline (layer diagram as on slide 2). Next: the two levels of cache in Neo4j. The low-level FS Cache for the record files, and the high-level Object cache storing a structure more optimized for traversal.
  • 14. The caches ๏ Filesystem cache: • Caches regions of the store files (divides each store file into equally sized regions) • The cache holds a fixed number of regions for each file • Regions are evicted based on an LFU-like policy (hit count vs. miss count, i.e. hit in non-cached region) • Default implementation of regions uses OS mmap ๏ Node/Relationship cache • Cache a version more optimized for traversals
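
One way to read the "hit count vs. miss count" policy is sketched below in Python. It is not Neo4j's implementation: the class `RegionCache`, the `load_region` callback and the exact eviction rule (replace the least-hit cached region once an uncached region has missed more often than that region has hit) are all assumptions, and plain byte strings stand in for mmap'ed regions.

```python
class RegionCache:
    """Caches a fixed number of equally sized regions of one store file."""

    def __init__(self, region_size, max_regions, load_region):
        self.region_size = region_size
        self.max_regions = max_regions
        self.load_region = load_region  # callable: region index -> bytes
        self.cached = {}   # region index -> bytes
        self.hits = {}     # region index -> hits while cached
        self.misses = {}   # region index -> misses while not cached

    def read(self, position):
        region = position // self.region_size
        if region in self.cached:
            self.hits[region] += 1
            data = self.cached[region]
        else:
            self.misses[region] = self.misses.get(region, 0) + 1
            data = self.load_region(region)  # slow path: go to the file
            self._maybe_cache(region, data)
        return data[position % self.region_size]

    def _maybe_cache(self, region, data):
        if len(self.cached) < self.max_regions:
            self._admit(region, data)
            return
        coldest = min(self.cached, key=lambda r: self.hits[r])
        if self.misses[region] > self.hits[coldest]:
            del self.cached[coldest], self.hits[coldest]
            self._admit(region, data)

    def _admit(self, region, data):
        self.cached[region] = data
        self.hits[region] = 0
        self.misses.pop(region, None)


file_bytes = bytes(range(40))
cache = RegionCache(10, 2, lambda r: file_bytes[r * 10:(r + 1) * 10])
values = [cache.read(0), cache.read(1), cache.read(10), cache.read(25)]
```

In this run region 1 is evicted in favour of region 2, because region 0 has already collected a hit while region 1 has none.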
  • 15. What we put in cache. The structure of the elements in the high-level object cache. On disk most of the information is contained in the relationship records, with the nodes just referencing their first relationship. In the cache this is turned around: the nodes hold references to all their relationships. The relationships are simple, only holding their properties. The relationships for each node are grouped by RelationshipType to allow fast traversal of a specific type. All references (dotted arrows) are by ID, and traversals do indirect lookup through the cache. Diagram: Node: ID; relationship ID refs grouped by type (type 1: in R1 R2 ... Rn, out R1 R2 ... Rn; type 2: in R1 R2 R3 ... Rn, out R1 ... Rn; ...); properties Key 1 ... Key n with Val 1 ... Val n. Relationship: ID, start, end, type; properties Key 1 ... Key n with Val 1 ... Val n.
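
The cached structure from slide 15 could look roughly like this in Python. The class and function names are hypothetical; the point is only that a node holds relationship ids grouped by type and direction, and that every hop goes through the cache by id.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CachedRelationship:
    id: int
    start: int
    end: int
    type: str
    properties: dict = field(default_factory=dict)


@dataclass
class CachedNode:
    id: int
    properties: dict = field(default_factory=dict)
    # relationship type -> direction ("in" / "out") -> list of relationship ids
    rels: dict = field(default_factory=lambda: defaultdict(lambda: {"in": [], "out": []}))


def build_cache(relationships):
    """Turn relationship-centric records into node-centric cache entries."""
    nodes, rels = {}, {}
    for rel in relationships:
        rels[rel.id] = rel
        nodes.setdefault(rel.start, CachedNode(rel.start)).rels[rel.type]["out"].append(rel.id)
        nodes.setdefault(rel.end, CachedNode(rel.end)).rels[rel.type]["in"].append(rel.id)
    return nodes, rels


def neighbours(nodes, rels, node_id, rel_type, direction):
    """Ids of the nodes reached from node_id over one type and direction."""
    other = "end" if direction == "out" else "start"
    return [getattr(rels[r], other) for r in nodes[node_id].rels[rel_type][direction]]


nodes, rels = build_cache([
    CachedRelationship(1, start=1, end=2, type="KNOWS"),
    CachedRelationship(2, start=1, end=3, type="KNOWS"),
    CachedRelationship(3, start=3, end=1, type="LIKES"),
])
```

Grouping by type means a traversal that only follows KNOWS never touches the LIKES ids at all.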
  • 16. Outline (layer diagram as on slide 2). So how do traversals work...
  • 17. Traversals - how do they work? The surface layer, the part you interact with. ๏ RelationshipExpanders: given (a path to) a node, returns Relationships to continue traversing from that node ๏ Evaluators: given (a path to) a node, returns whether to: • Continue traversing on that branch (i.e. expand) or not • Include (the path to) the node in the result set or not ๏ Then a projection to Path, Node, or Relationship is applied to each Path in the result set. ... but also: ๏ Uniqueness level: policy for when it is ok to revisit a node that has already been visited ๏ Implemented on top of the Core API
  • 18. More on Traversals. This is what happens under the hood. ๏ Fetch node data from cache - non-blocking access • If not in cache, retrieve from storage, into cache ‣ If region is in FS cache: blocking but short duration access ‣ If region is outside FS cache: blocking slower access ๏ Get relationships from cached node • If not fetched, retrieve from storage, by following chains ๏ Expand relationship(s) to end up on next node(s) • The relationship knows the node, no need to fetch it yet ๏ Evaluate • possibly emitting a Path into the result set ๏ Repeat
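
The expand / evaluate / repeat loop from slides 17 and 18 can be sketched as a small Python generator. This is not the Neo4j traversal API: `traverse`, `expand` and `evaluate` are hypothetical names, the order is breadth first, and the uniqueness level is assumed to be "visit each node at most once".

```python
from collections import deque


def traverse(start, expand, evaluate):
    """Yield paths (tuples of nodes) selected by the evaluator.

    expand(path)   -> iterable of next nodes (the expander's job)
    evaluate(path) -> (include, keep_going) (the evaluator's job)
    """
    visited = {start}            # uniqueness: never revisit a node
    queue = deque([(start,)])
    while queue:
        path = queue.popleft()
        include, keep_going = evaluate(path)
        if include:
            yield path
        if not keep_going:
            continue
        for next_node in expand(path):
            if next_node not in visited:
                visited.add(next_node)
                queue.append(path + (next_node,))


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
paths = list(traverse(
    "a",
    expand=lambda path: graph[path[-1]],
    # Include everything except the start node, stop expanding at depth 2.
    evaluate=lambda path: (len(path) > 1, len(path) < 3),
))
```

Node `d` shows up only once, via `b`, because the uniqueness policy rejects the second route through `c`.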
  • 19. Outline (layer diagram as on slide 2). How is Cypher different, and how does it work?
  • 20. Cypher - Just convenient traversal descriptions? ๏ Builds on the same infrastructure as Traversals - Expanders • but not on the full Traversal system ๏ Uses graph pattern matching for traversing the graph • Recursive matching with backtracking. Example: START x=... MATCH x-->y, x-->z, y-->z, z-->a-->b, z-->b. Figure legend: Red: pattern graph; Blue: actual graph; Green: start node; Purple: matches.
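
Recursive matching with backtracking, as named on slide 20, can be illustrated with a short Python generator. This is a simplified sketch and not Cypher's matcher: the pattern is a list of directed edges between variables, the graph is a list of directed edges between node ids, and no check prevents one graph edge from being used for two pattern edges. The names `match`, `pattern` and `binding` are hypothetical.

```python
def match(pattern, graph, binding):
    """Yield every extension of `binding` (variable -> node) that satisfies all pattern edges."""
    if not pattern:
        yield dict(binding)
        return
    (src, dst), rest = pattern[0], pattern[1:]
    for a, b in graph:
        # Skip graph edges that contradict what is already bound.
        if binding.get(src, a) != a or binding.get(dst, b) != b:
            continue
        added = {v for v in (src, dst) if v not in binding}
        binding[src], binding[dst] = a, b
        yield from match(rest, graph, binding)  # recurse on the remaining pattern
        for v in added:                         # backtrack: undo this step's bindings
            del binding[v]


# First part of the slide's pattern: x-->y, x-->z, y-->z, started from x = 1.
pattern = [("x", "y"), ("x", "z"), ("y", "z")]
graph = [(1, 2), (1, 3), (2, 3), (3, 4)]
results = list(match(pattern, graph, {"x": 1}))
```

The start node plays the role of `START x=...`: it pins one variable so the search only explores the part of the graph reachable from it.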
  • 29. What about gremlin? ๏ gremlin is a third party language, built by Marko Rodriguez of Tinkerpop (a group of people who like to hack on graphs) ๏ Originally based on the idea of using xpath to describe traversals: ./HAS_CART/CONTAINS_ITEM/PURCHASED/PURCHASED but bastardized to distinguish between nodes and relationships: ./outE[label=HAS_CART]/inV/outE[label=CONTAINS_ITEM]/inV/inE[label=PURCHASED]/outV/outE[label=PURCHASED]/inV (Traversals are close to xpath, which is why xpath-like descriptions of traversals seemed like a good idea.) ๏ xpath is not complete enough to express full algorithms, it needs a host language; gremlin originally defined its own. This changed to Groovy as a more complete host language, and xpath was abandoned in favor of method chaining [ replace ‘/’ with ‘.’ ]
  • 30. Gremlin compared to Cypher ๏ start me=node:people(name={myname}) match me-[:HAS_CART]->cart-[:CONTAINS_ITEM]->item, item<-[:PURCHASED]-user-[:PURCHASED]->recommendation return recommendation ๏ Cypher is declarative, describes what data to get - its shape ๏ Gremlin is imperative, prescribes how to get the data ๏ Cypher has more opportunities for optimization by the engine ๏ Gremlin can implement pagerank, Cypher can’t (yet?)
  • 31. Outline (layer diagram as on slide 2). Transactions involve two parts: the (thread local) changes being done by an active transaction, and the transaction replay log for recovery.
  • 32. Transaction Isolation ๏ Mutating operations are not written to the store when performed ๏ They are stored in a thread-confined transaction state object ๏ This prevents threads from seeing uncommitted changes made by the transactions of other threads ๏ When Transaction.finish() is invoked the transaction is either committed or rolled back ๏ Rollback is simple: discard the transaction state object
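
The thread-confined transaction state from slide 32 can be sketched with Python's `threading.local`. The `Store` class and its methods are hypothetical stand-ins, not the Core API, and the sketch assumes a thread can read its own uncommitted changes.

```python
import threading


class Store:
    def __init__(self):
        self.committed = {}            # (node_id, key) -> value, visible to every thread
        self._tx = threading.local()   # per-thread transaction state

    def _changes(self):
        if not hasattr(self._tx, "changes"):
            self._tx.changes = {}
        return self._tx.changes

    def set_property(self, node_id, key, value):
        # Recorded in the calling thread's state only, not written to the store.
        self._changes()[(node_id, key)] = value

    def get_property(self, node_id, key):
        changes = self._changes()
        if (node_id, key) in changes:  # assumed: a thread sees its own changes
            return changes[(node_id, key)]
        return self.committed.get((node_id, key))

    def finish(self, success):
        changes = self._changes()
        if success:
            self.committed.update(changes)  # commit
        changes.clear()                     # rollback is just discarding the state


store = Store()
seen = []


def read_from_other_thread():
    reader = threading.Thread(target=lambda: seen.append(store.get_property(1, "name")))
    reader.start()
    reader.join()


store.set_property(1, "name", "Tobias")
read_from_other_thread()      # the other thread sees nothing yet
store.finish(success=True)
read_from_other_thread()      # now the committed value is visible
```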
  • 33. Transactions & Durability ๏ A commit consists of these steps: • Changes made in the transaction are collected as commands • Commands are sorted to get a predictable update order ‣ This prevents concurrent readers from seeing inconsistent data when the changes are applied to the store • Write changes (in sorted order) to the transaction log • Mark the transaction as committed in the log • Apply the changes (in sorted order) to the store files
  • 34. Recovery ๏ Transaction commands dictate state, they don’t modify state • i.e. SET property "count" to 5 • rather than ADD 1 to property "count" ๏ Thus: Applying the same command twice yields the same state ๏ Recovery simply replays all transactions since the last safe point ๏ If tx A mutates node1.name, then tx B also mutates node1.name, that doesn’t matter, because the database is not recovered until all transactions have been replayed
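
The commit steps from slide 33 and the replay idea from slide 34 fit in a few lines of Python. This is a sketch under assumptions: the log is an in-memory list, the "last safe point" is an empty store, and transactions without a commit mark are skipped during replay. All names are hypothetical.

```python
def apply(store, commands):
    """Commands dictate state (SET key to value), so applying them twice is harmless."""
    for key, value in commands:
        store[key] = value


def commit(log, store, tx_id, changes):
    commands = sorted(changes.items())         # predictable update order
    log.append(("commands", tx_id, commands))  # 1. write the changes to the log
    log.append(("committed", tx_id))           # 2. mark the transaction as committed
    apply(store, commands)                     # 3. apply to the store


def recover(log):
    """Rebuild the store by replaying every committed transaction in log order."""
    committed = {entry[1] for entry in log if entry[0] == "committed"}
    store = {}
    for entry in log:
        if entry[0] == "commands" and entry[1] in committed:
            apply(store, entry[2])
    return store


log, store = [], {}
commit(log, store, 1, {("node1", "name"): "A"})
commit(log, store, 2, {("node1", "name"): "B"})
# Simulated crash: tx 3 reached the log but was never marked as committed.
log.append(("commands", 3, [(("node1", "name"), "C")]))
recovered = recover(log)
```

Both tx 1 and tx 2 touch `node1.name`, yet the intermediate value never matters, because the store is only considered recovered once the whole log has been replayed.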
  • 35. Outline (layer diagram as on slide 2). High Availability in Neo4j builds on top of the transaction replay.
  • 36. Outline (layer diagram as on slide 2, with the Transaction log marked Shared and the other layers marked Local). The transaction logs are shared between all instances in a High Availability setup; all other parts operate on the local data just like in the standalone case.
  • 37. HA - the parts to it: ๏ Based on streaming transactions between servers ๏ All transactions are committed through the master • Then (eventually) applied to the slaves • Eventuality is defined by the update interval, or when synchronization is mandated by interaction ๏ When writing to a slave: • Locks coordinated through the master • Transaction data buffered on the slave, applied first on the master to get a txid, then applied with the same txid on the slave
  • 38. Creating new Nodes and Relationships ๏ New Nodes/Relationships don’t need locks, so they don’t need a transaction synced with master until the transaction commit ๏ They do need an ID that is unique and equal among all instances ๏ Each instance allocates IDs in blocks from the master, then assigns them to new Nodes/Relationships locally • This batch allocation can be seen in (Enterprise) WebAdmin as Node/Relationship counts jumping in steps of 1000
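
Block-wise ID allocation, as described on slide 38, might be sketched like this in Python. The block size of 1000 is taken from the slide's remark about counts jumping in steps of 1000; the classes `Master` and `Instance` and their methods are hypothetical.

```python
class Master:
    def __init__(self, block_size=1000):
        self.block_size = block_size
        self.next_free = 0

    def allocate_block(self):
        """Hand out the next unused range of ids; only this call needs the master."""
        start = self.next_free
        self.next_free += self.block_size
        return range(start, start + self.block_size)


class Instance:
    def __init__(self, master):
        self.master = master
        self.ids = iter(())  # no block allocated yet

    def next_id(self):
        for new_id in self.ids:  # take one id from the local block, if any is left
            return new_id
        self.ids = iter(self.master.allocate_block())
        return next(self.ids)


master = Master()
a, b = Instance(master), Instance(master)
allocated = [a.next_id(), b.next_id(), a.next_id(), b.next_id()]
```

Ids stay unique across instances without a round trip per node, at the price of gaps when an instance does not use up its block.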
  • 39. HA synchronization points ๏ Transactions are uniquely identified by a monotonically increasing ID ๏ All requests from a slave send the current latest txid on that slave ๏ Responses from the master send back a stream of the transactions that have occurred since then, along with the actual result ๏ Each transaction is replayed just like when committed / recovered ๏ Nodes/Relationships touched during this application are evicted from cache to avoid cache staleness ๏ Transaction commands are only sorted when created, before being stored/transmitted, thus consistency is preserved during all application phases
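
The request/response piggybacking from slide 39 can be modelled in memory with two small Python classes. This is an illustration only: there is no network, commands are simple (key, value) pairs, and the names `Master`, `Slave`, `handle` and `call` are hypothetical.

```python
class Master:
    def __init__(self):
        self.log = []  # log[i] holds the commands of txid i + 1

    def commit(self, commands):
        self.log.append(list(commands))
        return len(self.log)  # monotonically increasing txid

    def handle(self, slave_txid, request):
        """Run the request, then return every transaction the slave has not seen yet."""
        result = request(self)
        newer = list(enumerate(self.log[slave_txid:], start=slave_txid + 1))
        return newer, result


class Slave:
    def __init__(self):
        self.last_txid = 0
        self.store = {}
        self.cache = {}

    def call(self, master, request):
        newer, result = master.handle(self.last_txid, request)
        for txid, commands in newer:
            for key, value in commands:
                self.store[key] = value    # replay, as in recovery
                self.cache.pop(key, None)  # evict touched entries to avoid staleness
            self.last_txid = txid
        return result


master, slave = Master(), Slave()
master.commit([("node1.name", "A")])
slave.cache["node1.name"] = "stale"
reply = slave.call(master, lambda m: m.commit([("node1.age", 27)]))
```

A single round trip both performs the slave's request and brings the slave up to date with everything committed since its last contact.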
  • 40. Locking semantics of HA ๏ To be granted a lock the slave must have the latest version of the Node/Relationship it is trying to lock • This ensures consistency • The implementation of “Latest version of the Node/Relationship” is “Latest version of the entire graph” • The slave must thus sync transactions from the master
  • 41. Master election ๏ Each instance communicates/coordinates: • its latest transaction id (including the master id for that tx) • the id for that instance • (logical) clock value for when the txid was written ๏ Election chooses: 1. The instance with highest txid 2. IF multiple: The instance that was master for that tx 3. IF unavailable: The instance with the lowest clock value 4. IF multiple: The instance with the lowest id ๏ Election happens when the current master cannot be reached • Any instance can choose to re-elect • Each instance runs the election protocol individually • Notify others when election chooses new master ๏ When elected, the new master broadcasts to all instances, forcing them to bind to the new master
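
The four election rules from slide 41 translate into a short Python function. The `Candidate` fields mirror the three values each instance publishes; the class, the field names and `elect` itself are hypothetical, and only reachable instances are assumed to be in the candidate list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    instance_id: int
    last_txid: int
    master_for_last_tx: int  # id of the master that committed last_txid
    clock: int               # logical clock value when last_txid was written


def elect(candidates):
    # 1. Highest txid wins.
    top = max(c.last_txid for c in candidates)
    best = [c for c in candidates if c.last_txid == top]
    # 2. If several share it, prefer the instance that was master for that tx.
    was_master = [c for c in best if c.instance_id == c.master_for_last_tx]
    if was_master:
        best = was_master
    # 3. Otherwise lowest clock value, 4. then lowest instance id.
    return min(best, key=lambda c: (c.clock, c.instance_id))


old_master_present = [
    Candidate(1, last_txid=10, master_for_last_tx=2, clock=5),
    Candidate(2, last_txid=10, master_for_last_tx=2, clock=5),
    Candidate(3, last_txid=9, master_for_last_tx=2, clock=1),
]
old_master_gone = [
    Candidate(1, last_txid=10, master_for_last_tx=2, clock=7),
    Candidate(3, last_txid=10, master_for_last_tx=2, clock=4),
]
```

Because the rules are deterministic, every instance that runs them on the same published values arrives at the same master.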
  • 42. Thank you for listening! Tobias Lindaaker, Hacker @ Neo Technology. tobias@neotechnology.com; twitter: @thobe, #neo4j (@neo4j); web: neo4j.org, neotechnology.com; my web: thobe.org