Egeria and Graphs
Graham Wallis, August 2020
Graham Wallis is an open-source developer and maintainer on the ODPi Egeria project. He has worked with graph-
related technologies for about 5 years, so he doesn’t have all the answers but hopes you find this presentation
interesting and useful.
Metadata, Sharing & Automation
An example of open, standardized metadata
• In a commercial setting, metadata is used to describe:
• database records and schemas, files and file formats, documents, models, …
• systems, applications, processes such as ETL, archiving, analytics, …
• business concepts as glossaries of terms and their semantic assignments
• In typical commercial organizations:
• the data landscape is vast and distributed
• data is dispersed across multiple data lakes managed by different parts of an organization
• multiple tools from different vendors are used to load, access and manage the data
• multiple tools are used to analyze the data
Commercial metadata and governance
Today’s reality – separate tools, disjointed metadata
• Organizations need a business-friendly logical interface to the data landscape. This implies that
the organization must develop a common business vocabulary or glossary.
• Organizations need governance of data to be driven by the metadata, requiring that the metadata
is accurate and up-to-date.
• The maintenance of metadata must be automated to scale to the volumes and variety of data
involved in modern business.
• The metadata must be available across different tools and platforms so that processing engines
can build capability around it.
• Wherever possible, discovery and maintenance of metadata must be an integral part of tools that
access, change and move information.
• Metadata access must become open and remotely accessible so that tools from different vendors
can work with metadata located on different platforms.
• This implies unique identifiers for metadata elements, some level of standardization in the types
and formats for metadata and standard interfaces for accessing and manipulating metadata.
Commercial metadata and governance
The ODPi Egeria project
• ODPi Egeria is an open source project dedicated to making metadata open and automatically
exchanged between tools and data platforms
• Egeria provides an Apache 2.0 licensed platform to enable users and vendors to create an open
ecosystem for metadata
• Egeria arose from several years' work by Mandy Chessell (IBM), Ferd Scheepers (ING Bank) and others,
on data lakes, data governance & common information models
• Egeria is hosted by the Linux Foundation ODPi project (Open Data Platform Initiative): egeria.odpi.org
• The code is on GitHub: github.com/odpi/egeria
• The Egeria community includes IBM, ING Bank, Manta and SAS plus contributions and interest from
other organizations and individuals.
Egeria Project & Community
Egeria enables exchange of metadata between tools from different vendors
[Figure: open and unified metadata shared across Development, DevOps and Data Science tools]
Egeria Servers and Cohorts
[Figure: Egeria Servers joined into two Cohorts; some servers have an Egeria Repository, one is attached to an External Tool/Repository, and Applications connect to the servers]
A server may have a repository or may support a
given tool or external repository.
A server may join multiple cohorts.
Graphs in Metadata
[Figure: business metadata shown as a graph of terms (Employee, Work Location, Annual Salary, Job Title, Employee Id, Employee Name, Hourly Pay Rate, Manager, Compensation Plan) joined by HAS-A and IS-A links, with a Sensitive Data classification; alongside it, structural metadata for a data store: an EMPLOYEE RECORD with fields EMPNAME, EMPNO, JOBCODE and SALARY]
• The interconnected nature of metadata forms a graph
• The business concepts associated with the data form a graph of terms and classifications
Graphs in Metadata
• Different tools or databases give rise to graphs at both business and technical levels
Querying across graphs…
• Enterprise integration and queries require that we can query across graphs and
between business and technical metadata
Parallels between graphs…
• The graph of artifacts in a Discovery Analysis Report mirrors the graph of schema elements
• As seen from the foregoing examples (of different tools, business and technical metadata, discovery analysis
reports) there are many graph-like structures in metadata
• Egeria is therefore based on graphs and graph-like approaches; it includes a graph repository and graph-
based tooling
• The Open Metadata Types form graphs - an entity type inheritance graph and a graph of the possible
relationship types for an entity type
• We also see graphs in glossary structure (glossary, terms, categories) as well as in the semantic assignment of
glossary terms to metadata instances
• Metadata instances (entities, relationships and classifications) are organized as graphs and can be queried
using graph traversals
Graphs in Egeria
• Within the Egeria integration UI:
• The Type Explorer can be used to visualize entity type inheritance and entity type relationship graphs
• The Repository Explorer can be used to explore graphs of entities and relationships across repositories
• The Admin UI shows the deployed topology of Egeria platforms, servers and cohorts
Egeria UI graph visualizations
• Egeria can transparently federate metadata from multiple repositories, giving rise to a distributed graph
• Entities in different repositories can be related by a relationship in either repository or a further repository
• Entities and relationships in different repositories can be queried and traversed as if they were collocated
• Egeria’s federation capability avoids the need to move or copy metadata
• Ownership remains with the current owner
• There is no duplication, or risk of updates being applied to a copy of the metadata
• Egeria can create a local reference copy of a remote instance, as a locally cached copy, but ownership of the
metadata remains with the tool and repository that created it. Updates are only permitted on the owner’s
original, not on the copies
• When an Egeria user accesses a remote instance, the Egeria server will register interest in the remote
instance
• If the remote instance is modified or deleted, any registered Egeria servers receive events, delivered to the
access services that triggered the interest
• Ownership of an instance can be transferred if necessary
Egeria federation (a distributed graph)
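The ownership rule for reference copies can be sketched in a few lines of Python. This is a toy in-memory model, not Egeria code: the `Repository` and `Instance` classes, their method names and the `collection-1`/`collection-2` identifiers are all assumptions made for illustration. It shows only the rule stated above: a copy keeps the home repository's identifier, and updates are refused anywhere but on the owner's original.

```python
from dataclasses import dataclass, field


class ReferenceCopyError(Exception):
    """Raised when an update is attempted on a reference copy."""


@dataclass
class Instance:
    guid: str
    home_id: str  # identifier of the repository that owns the instance
    properties: dict = field(default_factory=dict)


@dataclass
class Repository:
    collection_id: str  # hypothetical stand-in for a metadataCollectionId
    store: dict = field(default_factory=dict)

    def add(self, guid, **properties):
        """Create an instance that is homed in (owned by) this repository."""
        self.store[guid] = Instance(guid, self.collection_id, dict(properties))
        return self.store[guid]

    def save_reference_copy(self, instance):
        """Cache a read-only copy of an instance that is homed elsewhere."""
        if instance.home_id == self.collection_id:
            raise ValueError("instance is homed here; no copy needed")
        self.store[instance.guid] = Instance(
            instance.guid, instance.home_id, dict(instance.properties))

    def update(self, guid, **properties):
        """Apply an update, but only to instances this repository owns."""
        instance = self.store[guid]
        if instance.home_id != self.collection_id:
            raise ReferenceCopyError(f"{guid} is owned by {instance.home_id}")
        instance.properties.update(properties)
        return instance


server1 = Repository("collection-1")
server2 = Repository("collection-2")
term = server2.add("term-1", name="Annual Salary")
server1.save_reference_copy(term)  # server1 now holds a read-only copy
```

In this sketch the copy on `server1` goes stale once the owner changes the original; in Egeria, the events described above are what allow interested servers to react to such changes.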
Egeria distributed graph model
[Figure: a Database Column entity on OMAG Server 1 and a Glossary Term entity on OMAG Server 2]
• A pair of entities may be stored in separate servers
Egeria distributed graph model – using reference copies
[Figure: OMAG Server 1 holds the Database Column and a Reference Copy of the Glossary Term, joined by a relationship labelled Meaning; OMAG Server 2 holds the original Glossary Term]
• One entity could be replicated to the other server, as a ‘reference copy’
• The original Glossary Term on OMAG Server 2 is still the authoritative instance; the copy cannot be updated
• A relationship could be defined between the local DB column and the reference copy of the Glossary Term
Egeria distributed graph model – using reference copies
[Figure: the Database Column on OMAG Server 1 and the Glossary Term on OMAG Server 2 are both copied to OMAG Server 3, where the two copies are joined by a relationship labelled Meaning]
• Alternatively, both entities could be replicated to a third server, as reference copies
• The originals are still the authoritative instances
• A relationship could be defined between the local reference copies
Egeria distributed graph model – using entity proxies
[Figure: OMAG Server 3 holds a relationship labelled Meaning between Entity Proxies for the Database Column on OMAG Server 1 and the Glossary Term on OMAG Server 2]
• Instead of replication, the third server could relate the original entities using entity proxies
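As a rough illustration of the proxy approach, the Python sketch below builds a relationship on a third server whose two ends are lightweight stubs for entities owned elsewhere. The class names, field names, `relate_by_proxy` helper and the type labels (taken from the slide labels rather than from the Egeria type system) are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityProxy:
    """A stub for an entity homed elsewhere: identity only, no full properties."""
    guid: str
    type_name: str
    home_id: str  # identifier of the repository that owns the real entity


@dataclass(frozen=True)
class Relationship:
    guid: str
    type_name: str
    end1: EntityProxy
    end2: EntityProxy
    home_id: str  # the relationship is owned by the server that created it


def relate_by_proxy(guid, type_name, entity1, entity2, local_id):
    """Build a locally owned relationship whose ends are proxies of remote entities."""
    ends = [EntityProxy(e["guid"], e["type"], e["home_id"]) for e in (entity1, entity2)]
    return Relationship(guid, type_name, ends[0], ends[1], local_id)


# Hypothetical entities owned by two different servers.
column = {"guid": "col-1", "type": "DatabaseColumn",
          "home_id": "collection-1", "name": "SALARY"}
term = {"guid": "term-1", "type": "GlossaryTerm",
        "home_id": "collection-2", "name": "Annual Salary"}

# The third server owns only the relationship; the entities stay where they are.
meaning = relate_by_proxy("rel-1", "Meaning", column, term, "collection-3")
```

Unlike a reference copy, the proxy in this sketch carries no copy of the entity's full properties, only enough to identify the entity and where it lives.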
The Egeria Graph Repository
Egeria OMRS Repositories
[Figure: Search, Open Metadata Access Services and Open Metadata Repository Services]
• Egeria includes a choice of metadata repositories, which can be used as additional metadata stores that can plug
functional gaps between other tools and repositories and can provide local access
• One of the Egeria repositories is a graph repository, which lends itself to the types of queries we saw earlier
Egeria Open Metadata Repository Services (OMRS)
• The OMRS defines a protocol and a set of connectors
• The Enterprise Connector performs cohort-wide operations, such as
issuing queries to the cohort. When metadata is replicated from another
server, it can use the local connector and repository to cache that
metadata for availability and performance
• The Local Connector performs local operations and provides a default
Event Mapper that enables events relating to local operations to be sent
to the cohort
• The Repository Connector interfaces to a specific repository – and
optionally, may be accompanied by a custom Event Mapper
• Egeria provides two built-in repositories and there are connectors to
other repositories
• The interface to a repository connector is the MetadataCollection API,
described on the next slide
[Figure: the OMRS Enterprise Connector connects to the Cohort and sits above the OMRS Local Connector & Event Mapper, which uses the MetadataCollection API of the OMRS Repository Connector in front of the Repository]
The OMRSMetadataCollection interface
• The interface to an Egeria repository is the OMRSMetadataCollection interface
• It includes groups of operations:
• Group 1: Identification of the metadata repository - metadataCollectionId
• Group 2: Type definitions (types, attributes) - add, find, get, remove, …
• Group 3: Find instances (entities, relationships) - get, find, graph-queries, …
• Group 4: Maintain instances (entities, relationships) - addEntity, deleteEntity, …
• Group 5: Change control information (entities, relationships) - reIdentify, reHome, …
• Group 6: Maintenance of reference (replica) copies – save, purge, refresh,…
Egeria Local Graph Repository
• The Egeria distribution includes a persistent repository and a non-persistent repository
• The persistent repository is a graph repository built on JanusGraph, an open-source graph database project, hosted by the
Linux Foundation
• http://janusgraph.org
• http://github.com/janusgraph/janusgraph
• The built-in graph repository provides an OMAG Server with a persistent metadata store and is built using Egeria’s ‘plugin’
pattern
• The graph repository can store instances of metadata owned by the local server
• It can also store reference copies of metadata instances replicated to the local server
• It also supports relationship instances that refer to entity proxy instances
• Other graph databases are available, and Egeria’s pluggable connector architecture enables the creation of repository
connectors for different databases.
• The Conformance Test Suite provides a set of automated tests that can be run against a repository to assess whether it
correctly implements the Egeria types and interfaces
Anatomy of the local graph repository
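[Figure: components of an OMAG Server using the graph repository: OMAS access services; the OMRS Enterprise Connector and the OMRS topics (in, out) linking to the Cohort; the OMRS Local Connector & Event Mapper; the OMRS Graph Connector, which uses Apache TinkerPop and JanusGraph Management; and the Graph Metadata Store, a JanusGraph database with persistence and search backends]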
Graph Repository configurations
• The first release of the Egeria Graph Repository used BerkeleyDB and Lucene as embedded persistence
and indexing backends. This provides a relatively simple quick-start configuration, especially good for
development and testing and sufficient for some production uses.
• In production it may be desirable (or essential) to use a different persistence backend (e.g. Cassandra) or
indexing backend (e.g. Elasticsearch).
• ING Bank extended the configuration of the Graph Repository to enable the use of (remote) Cassandra and
Elasticsearch services.
• Discussions have started about work to add a remote JanusGraph Server configuration in order to provide
an HA option.
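To make the two configurations concrete, here is a hedged sketch of the kind of JanusGraph settings involved, written as Python dictionaries. The keys are standard JanusGraph configuration options; the directory paths and hostnames are placeholders, and how Egeria itself passes such settings to the Graph Repository is not shown here and should be checked against the Egeria documentation.

```python
# Embedded quick-start: BerkeleyDB for persistence, Lucene for indexing.
# The directory paths are hypothetical placeholders.
EMBEDDED = {
    "storage.backend": "berkeleyje",
    "storage.directory": "./data/graph/berkeley",
    "index.search.backend": "lucene",
    "index.search.directory": "./data/graph/lucene",
}

# Remote services: Cassandra (via CQL) and Elasticsearch.
# The hostnames are hypothetical placeholders.
REMOTE = {
    "storage.backend": "cql",
    "storage.hostname": "cassandra.example.org",
    "index.search.backend": "elasticsearch",
    "index.search.hostname": "elastic.example.org",
}


def to_properties(settings):
    """Render settings in the key=value form used by JanusGraph properties files."""
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


print(to_properties(EMBEDDED))
```

Keeping the two variants side by side makes the difference plain: only the backend names and their locations change, not the graph repository code.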
Graph Repository components
• GraphOMRSRepositoryConnector - implements the open connector framework interface
• GraphOMRSRepositoryConnectorProvider – implements the mechanism for brokering a connector
• GraphOMRSMetadataCollection – top level interface supporting type and instance operations
• GraphOMRSMetadataStore – implements the MetadataCollection using a graph database
• GraphOMRSGraphFactory – creation, schema, indexing - encapsulates JanusGraph-specifics
• Mappers – convert between OMRS objects and graph vertices and edges
• GraphOMRSEntityMapper
• GraphOMRSRelationshipMapper
• GraphOMRSClassificationMapper
• Plus various utility classes – error codes, audit logging, constants and utility methods
https://github.com/odpi/egeria/
See open-metadata-implementation/adapters/open-connectors/repository-services-connectors/open-metadata-collection-store-connectors/graph-repository-connector
To use the Egeria Graph Repository
• Configure the OMAG Server with repository-mode = ‘local-graph-repository’
• e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/local-repository/mode/local-graph-repository
• Start the OMRS instance in the server
• e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/instance
• If using the embedded configuration of Berkeley DB for persistence and Lucene for indexing,
when OMRS starts, the graph repository auto-creates a JanusGraph database – including:
• Persistence backend
• Search backend
• Graph schema
• Search indexes
• If using alternative backends for persistence or indexing, ensure that they are correctly configured
and available before starting the OMAG Server.
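The two admin calls above could be scripted along the following lines. This is a minimal sketch using only the Python standard library: the URL paths come from the slide, while the helper names, the empty request body and the absence of any authentication or TLS handling are assumptions that may not match a real deployment.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

ADMIN_PATH = "/open-metadata/admin-services/users/{user}/servers/{server}"


def admin_urls(platform_root, user, server):
    """Return the URLs for (1) selecting the graph repository and (2) starting the server."""
    base = platform_root.rstrip("/") + ADMIN_PATH.format(
        user=quote(user, safe=""), server=quote(server, safe=""))
    return [
        base + "/local-repository/mode/local-graph-repository",
        base + "/instance",
    ]


def configure_and_start(platform_root, user, server):
    """POST each URL in turn; this needs a running OMAG Server Platform."""
    statuses = []
    for url in admin_urls(platform_root, user, server):
        request = Request(url, data=b"", method="POST")  # empty body is an assumption
        with urlopen(request) as response:
            statuses.append(response.status)
    return statuses


# Building the URLs does not contact any server.
for url in admin_urls("http://localhost:8080", "admin", "server1"):
    print("POST", url)
```

Calling `configure_and_start("http://localhost:8080", "admin", "server1")` would issue the requests in the order the slide gives: configure first, then start.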
Graph Schema
The MetadataCollection interface is the formal interface to an Egeria repository.
Whilst it is possible to look at the graph directly (e.g. using the Gremlin console), please don’t rely on the
schema: it is likely to evolve.
Type data:
• The Graph Repository does not store type definitions
• It delegates all type operations to the Repository Content Manager
Instance data:
• The Egeria Graph Repository stores instance data, using a JanusGraph schema that has:
• vertices for entities and classifications
• edges for relationships and classifiers
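The shape of that mapping can be illustrated with plain Python data structures. The sketch below is not the Graph Repository's mapper code and does not reflect its real property names or identifiers, which (as noted above) should not be relied upon; it only shows entities and classifications becoming vertices, and relationships and classifiers becoming edges.

```python
def map_to_graph(entities, relationships):
    """Map OMRS-style instances (plain dicts here) to labelled vertices and edges."""
    vertices, edges = [], []
    for entity in entities:
        vertices.append({"id": entity["guid"], "label": "entity",
                         "properties": entity.get("properties", {})})
        for classification in entity.get("classifications", []):
            # Hypothetical id scheme: one classification vertex per entity and name.
            vertex_id = f'{entity["guid"]}/{classification["name"]}'
            vertices.append({"id": vertex_id, "label": "classification",
                             "properties": classification.get("properties", {})})
            # A 'classifier' edge runs from the entity vertex to the classification vertex.
            edges.append({"label": "classifier",
                          "out": entity["guid"], "in": vertex_id})
    for relationship in relationships:
        edges.append({"id": relationship["guid"], "label": "relationship",
                      "out": relationship["end1"], "in": relationship["end2"],
                      "properties": relationship.get("properties", {})})
    return vertices, edges


entities = [
    {"guid": "col-1", "properties": {"name": "SALARY"},
     "classifications": [{"name": "SensitiveData"}]},
    {"guid": "term-1", "properties": {"name": "Annual Salary"}},
]
relationships = [{"guid": "rel-1", "end1": "col-1", "end2": "term-1"}]

vertices, edges = map_to_graph(entities, relationships)
```

For the sample data this yields three vertices (two entities and one classification) and two edges (one classifier and one relationship).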
Instance representations in the OMRS
[Figure: a Relationship Instance links two Entity Instances, each of which may carry Classification Instances; every instance has Attributes made up of Primitives, Enums and Collections]
Graph mapping – vertices and edges
[Figure: each OMRS instance representation maps to a graph schema element. An Entity Instance becomes a vertex with label “entity”; a Classification Instance becomes a vertex with label “classification”, joined to its entity by an edge with label “classifier”; a Relationship Instance becomes an edge with label “relationship”. Instance attributes are stored as Properties on the vertex or edge.]
Local instances, reference copies and proxies
• The graph contains one vertex per entity – whether the entity is local, a reference copy or a proxy
• If the entity has an associated classification, the classification is stored as a vertex, with an edge from the
entity vertex to the classification vertex
• The graph contains one edge per relationship – whether the relationship is local or a reference copy
• Reference Copies
• The metadataCollectionId core attribute is set to the ‘guid’ of the home repository
• Entity Proxy objects
• Each entity instance has a vertex property of type Boolean, to indicate whether the instance is a
proxy
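Given those two markers, telling the three kinds of entity vertex apart might look like the sketch below. The property keys `isProxy` and `metadataCollectionId` are hypothetical names chosen for illustration; the actual vertex property names used by the Graph Repository may differ.

```python
def instance_kind(vertex_properties, local_collection_id):
    """Classify an entity vertex as 'proxy', 'reference copy' or 'local'."""
    # Check the proxy flag first: a proxy is also homed in another repository.
    if vertex_properties.get("isProxy", False):
        return "proxy"
    if vertex_properties["metadataCollectionId"] != local_collection_id:
        return "reference copy"
    return "local"


LOCAL_ID = "collection-1"  # hypothetical id of the local repository
print(instance_kind({"metadataCollectionId": "collection-1"}, LOCAL_ID))
print(instance_kind({"metadataCollectionId": "collection-2"}, LOCAL_ID))
print(instance_kind({"metadataCollectionId": "collection-2", "isProxy": True}, LOCAL_ID))
```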
Metadata Collection ‘graph-query’ methods
• There are 4 sub-graph query methods:
• getRelatedEntities() - optional
• Returns the entity and its immediate neighbors
• getEntityNeighborhood() - optional
• Returns the entity and its neighbors up to the depth specified by
the ‘level’ parameter
• getLinkingEntities() - optional
• Returns the relationships and intermediate entities that connect
the specified pair of entities
• getRelationshipsForEntity() - mandatory
• Returns relationships associated with entity, optionally filtered
by relationship type and status
[Figure: an entity neighborhood traversal with level = 2]
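The idea behind two of these queries can be sketched as a breadth-first traversal over a list of relationships. This is an illustrative Python model, not the OMRS API: the function names, parameters and dictionary shapes are assumptions, and the real methods take further arguments (for example type and status filters) that are omitted here.

```python
from collections import deque


def relationships_for_entity(relationships, guid, type_name=None):
    """Relationships with the entity at either end, optionally filtered by type."""
    return [r for r in relationships
            if guid in (r["end1"], r["end2"])
            and (type_name is None or r["type"] == type_name)]


def entity_neighborhood(relationships, start_guid, level):
    """Entities and relationships reachable from start_guid within 'level' hops."""
    entity_guids = {start_guid}
    relationship_guids = set()
    frontier = deque([(start_guid, 0)])
    while frontier:
        guid, depth = frontier.popleft()
        if depth == level:
            continue  # do not expand beyond the requested depth
        for rel in relationships_for_entity(relationships, guid):
            relationship_guids.add(rel["guid"])
            other = rel["end2"] if rel["end1"] == guid else rel["end1"]
            if other not in entity_guids:
                entity_guids.add(other)
                frontier.append((other, depth + 1))
    return entity_guids, relationship_guids


# A simple chain a - b - c - d with hypothetical relationship types.
chain = [
    {"guid": "r1", "type": "Link", "end1": "a", "end2": "b"},
    {"guid": "r2", "type": "Link", "end1": "b", "end2": "c"},
    {"guid": "r3", "type": "Other", "end1": "c", "end2": "d"},
]
entity_guids, relationship_guids = entity_neighborhood(chain, "a", level=2)
```

With `level=2` the traversal from `a` reaches `b` and `c` but stops before `d`, which is three hops away.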
Graph Repository – supported functions
• The Graph Repository supports most of the OMRS MetadataCollection API, including:
• Save and purge of reference copies
• Use of entity proxies
• Delete and restore as well as purge – delete is a soft, restorable delete; purge is permanent
• Re-type of instances
• Re-identify of instances
• Re-home of instances
• The four ‘graph queries’ – described on the previous slide
• The ‘find’ methods – find..ByProperty, find..ByPropertyValue, findEntityByClassification
• The Graph Repository does not (yet) support:
• Historic queries – find methods that specify an asOfTime parameter
• Undo of previous instance updates
• Egeria project website: egeria.odpi.org
• GitHub: github.com/odpi/egeria
• Slack: https://slack.odpi.org/
More information…

Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 

Recently uploaded (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 

Egeria and graphs

  • 1. Egeria and Graphs. Graham Wallis, August 2020. Graham Wallis is an open-source developer and maintainer on the ODPi Egeria project. He has worked with graph-related technologies for about 5 years, so he doesn’t have all the answers but hopes you find this presentation interesting and useful.
  • 2. Metadata, Sharing & Automation
  • 3. An example of open, standardized metadata
  • 4. Commercial metadata and governance
      • In a commercial setting, metadata is used to describe:
          • database records and schemas, files and file formats, documents, models, …
          • systems, applications, processes such as ETL, archiving, analytics, …
          • business concepts as glossaries of terms and their semantic assignments
      • In typical commercial organizations:
          • the data landscape is vast and distributed
          • data is dispersed across multiple data lakes managed by different parts of an organization
          • multiple tools from different vendors are used to load, access and manage the data
          • multiple tools are used to analyze the data
  • 5. Today’s reality – separate tools, disjointed metadata
  • 6. Commercial metadata and governance
      • Organizations need a business-friendly logical interface to the data landscape. This implies that the organization develop a common business vocabulary or glossary.
      • Organizations need governance of data to be driven by the metadata, requiring that the metadata is accurate and up-to-date.
      • The maintenance of metadata must be automated to scale to the volumes and variety of data involved in modern business.
      • The metadata must be available across different tools and platforms so that processing engines can build capability around it.
      • Wherever possible, discovery and maintenance of metadata must be an integral part of tools that access, change and move information.
      • Metadata access must become open and remotely accessible so that tools from different vendors can work with metadata located on different platforms.
      • This implies unique identifiers for metadata elements, some level of standardization in the types and formats for metadata and standard interfaces for accessing and manipulating metadata.
  • 7. The ODPi Egeria project
  • 8. Egeria Project & Community
      • ODPi Egeria is an open source project dedicated to making metadata open and automatically exchanged between tools and data platforms
      • Egeria provides an Apache 2.0 licensed platform to enable users and vendors to create an open ecosystem for metadata
      • Egeria arose from several years' work by Mandy Chessell (IBM), Ferd Scheepers (ING Bank) and others, on data lakes, data governance & common information models
      • Egeria is hosted by the Linux Foundation ODPi project (Open Data Platform Initiative): egeria.odpi.org
      • The code is on GitHub: github.com/odpi/egeria
      • The Egeria community includes IBM, ING Bank, Manta and SAS, plus contributions and interest from other organizations and individuals.
  • 9. Today’s reality – separate tools, disjointed metadata
  • 10. Egeria enables exchange of metadata between tools from different vendors (diagram labels: Open and Unified Metadata; Development; DevOps; Data Science)
  • 11. Egeria Servers and Cohorts
      • A server may have a repository or may support a given tool or external repository.
      • A server may join multiple cohorts.
      • (diagram labels: Cohort; Egeria Server; Egeria Repository; External Tool/Repository; Applications)
  • 13. Graphs in Metadata
      • The interconnected nature of metadata forms a graph
      • The business concepts associated with the data form a graph of terms and classifications
      • (diagram, business metadata: Employee, Work Location, Annual Salary, Job Title, Employee Id, Employee Name, Hourly Pay Rate, Manager, Compensation Plan and Sensitive Data, linked by HAS-A and IS-A relationships)
      • (diagram, structural metadata for a data store: EMPLOYEE RECORD with EMPNAME, EMPNO, JOBCODE, SALARY)
  • 14. Graphs in Metadata • Different tools or databases give rise to graphs at both business and technical levels
  • 15. Querying across graphs… • Enterprise integration and queries require that we can query across graphs and between business and technical metadata
  • 16. Parallels between graphs… • The graph of artifacts in a Discovery Analysis Report mirrors the graph of schema elements
  • 17. Graphs in Egeria
      • As seen from the foregoing examples (of different tools, business and technical metadata, discovery analysis reports) there are many graph-like structures in metadata
      • Egeria is therefore based on graphs and graph-like approaches; it includes a graph repository and graph-based tooling
      • The Open Metadata Types form graphs: an entity type inheritance graph and a graph of the possible relationship types for an entity type
      • We also see graphs in glossary structure (glossary, terms, categories) as well as in the semantic assignment of glossary terms to metadata instances
      • Metadata instances (entities, relationships and classifications) are organized as graphs and can be queried using graph traversals
  • 18. Egeria UI graph visualizations
      • Within the Egeria integration UI:
          • The Type Explorer can be used to visualize entity type inheritance and entity type relationship graphs
          • The Repository Explorer can be used to explore graphs of entities and relationships across repositories
          • The Admin UI shows the deployed topology of Egeria platforms, servers and cohorts
  • 19. Egeria federation (a distributed graph)
      • Egeria can transparently federate metadata from multiple repositories, giving rise to a distributed graph
          • Entities in different repositories can be related by a relationship in either repository or a further repository
          • Entities and relationships in different repositories can be queried and traversed as if they were collocated
      • Egeria’s federation capability avoids the need to move or copy metadata
          • Ownership remains with the current owner
          • There is no duplication, or risk of updates being applied to a copy of the metadata
      • Egeria can create a local reference copy of a remote instance, as a locally cached copy, but ownership of the metadata remains with the tool and repository that created it. Updates are only permitted on the owner’s original, not on the copies
      • When an Egeria user accesses a remote instance, the Egeria server will register interest in the remote instance
      • If the remote instance is modified or deleted, any registered Egeria servers receive events, delivered to the access services that triggered the interest
      • Ownership of an instance can be transferred if necessary
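The ownership rule for reference copies on slide 19 can be illustrated with a small sketch. This is not Egeria code: the `Repository` and `Instance` classes and their method names are hypothetical, and the only behaviour taken from the slides is that a reference copy keeps the home repository's metadata collection id and that updates are refused anywhere except on the owner's original.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    guid: str
    home_collection_id: str  # id of the repository that owns the instance
    properties: dict = field(default_factory=dict)


class Repository:
    """Hypothetical in-memory stand-in for one server's metadata store."""

    def __init__(self, collection_id):
        self.collection_id = collection_id
        self.instances = {}

    def add_local(self, guid, properties):
        # A locally created instance is owned by this repository.
        inst = Instance(guid, self.collection_id, dict(properties))
        self.instances[guid] = inst
        return inst

    def save_reference_copy(self, original):
        # The copy keeps the owner's collection id, so it stays identifiable
        # as a copy rather than becoming a second original.
        copy = Instance(original.guid, original.home_collection_id,
                        dict(original.properties))
        self.instances[copy.guid] = copy
        return copy

    def update(self, guid, new_properties):
        inst = self.instances[guid]
        if inst.home_collection_id != self.collection_id:
            raise PermissionError(
                f"{guid} is a reference copy; update it in its home repository")
        inst.properties.update(new_properties)
        return inst
```

In this sketch the check is a simple comparison of collection ids; a real repository would also need to handle the transfer of ownership that the slide mentions, which is left out here.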
  • 20. Egeria distributed graph model
      • A pair of entities may be stored in separate servers
      • (diagram labels: Database Column; Glossary Term; OMAG Server 1; OMAG Server 2)
  • 21. Egeria distributed graph model – using reference copies
      • One entity could be replicated to the other server, as a ‘reference copy’
      • The original Glossary Term on OMAG Server 2 is still the authoritative instance; the copy cannot be updated
      • A relationship could be defined between the local DB column and the reference copy of the Glossary Term
      • (diagram labels: Database Column; Glossary Term; Glossary Term as Reference Copy; Meaning; OMAG Server 1; OMAG Server 2)
  • 22. Egeria distributed graph model – using reference copies
      • Alternatively, both entities could be replicated to a third server, as reference copies
      • The originals are still the authoritative instances
      • A relationship could be defined between the local reference copies
      • (diagram labels: Database Column; Glossary Term; Meaning; OMAG Server 1; OMAG Server 2; OMAG Server 3)
  • 23. Egeria distributed graph model – using entity proxies
      • Instead of replication, the third server could relate the original entities using entity proxies
      • (diagram labels: Database Column; Glossary Term; Meaning; Entity Proxy; OMAG Server 1; OMAG Server 2; OMAG Server 3)
  • 24. The Egeria Graph Repository
  • 25. Egeria OMRS Repositories
      • Egeria includes a choice of metadata repositories, which can be used as additional metadata stores that can plug functional gaps between other tools and repositories and can provide local access
      • One of the Egeria repositories is a graph repository, which lends itself to the types of queries we saw earlier
      • (diagram labels: Search; Open Metadata Access Services; Open Metadata Repository Services)
  • 26. Egeria Open Metadata Repository Services (OMRS)
      • The OMRS defines a protocol and a set of connectors
      • The Enterprise Connector performs cohort-wide operations. This includes issuing queries to the cohort; when metadata is replicated from another server, it can use the local connector and repository to cache it for availability and performance
      • The Local Connector performs local operations and provides a default Event Mapper that enables events relating to local operations to be sent to the cohort
      • The Repository Connector interfaces to a specific repository and, optionally, may be accompanied by a custom Event Mapper
      • Egeria provides two built-in repositories and there are connectors to other repositories
      • The interface to a repository connector is the MetadataCollection API, described on the next slide
      • (diagram labels: OMRS Enterprise Connector; OMRS Local Connector & Event Mapper; OMRS Repository Connector; Repository; Cohort; MetadataCollection API)
  • 27. The OMRSMetadataCollection interface
      • The interface to an Egeria repository is the OMRSMetadataCollection interface
      • It includes groups of operations:
          • Group 1: Identification of the metadata repository - metadataCollectionId
          • Group 2: Type definitions (types, attributes) - add, find, get, remove, …
          • Group 3: Find instances (entities, relationships) - get, find, graph-queries, …
          • Group 4: Maintain instances (entities, relationships) - addEntity, deleteEntity, …
          • Group 5: Change control information (entities, relationships) - reIdentify, reHome, …
          • Group 6: Maintenance of reference (replica) copies - save, purge, refresh, …
  • 28. Egeria Local Graph Repository
      • The Egeria distribution includes a persistent repository and a non-persistent repository
      • The persistent repository is a graph repository built on JanusGraph, an open-source graph database project, hosted by the Linux Foundation
          • http://janusgraph.org
          • http://github.com/janusgraph/janusgraph
      • The built-in graph repository provides an OMAG Server with a persistent metadata store and is built using Egeria’s ‘plugin’ pattern
      • The graph repository can store instances of metadata owned by the local server
          • It can also store reference copies of metadata instances replicated to the local server
          • It also supports relationship instances that refer to entity proxy instances
      • Other graph databases are available, and Egeria’s pluggable connector architecture enables the creation of repository connectors for different databases
      • The Conformance Test Suite provides a set of automated tests that can be run against a repository to assess whether it correctly implements the Egeria types and interfaces
  • 29. Anatomy of the local graph repository (diagram labels: Cohort; OMRS topics, in and out; OMAG Server; OMAS access services; OMRS Enterprise Connector; OMRS Local Connector & Event Mapper; OMRS Graph Connector; Apache TinkerPop; JanusGraph Management; JanusGraph; Graph Metadata Store with persistence and search)
  • 30. Graph Repository configurations
      • The first release of the Egeria Graph Repository used BerkeleyDB and Lucene as embedded persistence and indexing backends. This provides a relatively simple quick-start configuration, especially good for development and testing and sufficient for some production uses.
      • In production it may be desirable (or essential) to use a different persistence backend (e.g. Cassandra) or indexing backend (e.g. Elastic).
      • ING Bank added to the configuration of the Graph Repository to enable the use of (remote) Cassandra and Elastic services.
      • Discussions have started about work to add a remote JanusGraph Server configuration in order to provide an HA option.
  • 31. Graph Repository components
      • GraphOMRSRepositoryConnector - implements the open connector framework interface
      • GraphOMRSRepositoryConnectorProvider - implements the mechanism for brokering a connector
      • GraphOMRSMetadataCollection - top level interface supporting type and instance operations
      • GraphOMRSMetadataStore - implements the MetadataCollection using a graph database
      • GraphOMRSGraphFactory - creation, schema, indexing; encapsulates JanusGraph-specifics
      • Mappers - convert between OMRS objects and graph vertices and edges
          • GraphOMRSEntityMapper
          • GraphOMRSRelationshipMapper
          • GraphOMRSClassificationMapper
      • Plus various utility classes - error codes, audit logging, constants and utility methods
      • https://github.com/odpi/egeria/ See open-metadata-implementation/adapters/open-connectors/repository-services-connectors/open-metadata-collection-store-connectors/graph-repository-connector
  • 32. To use the Egeria Graph Repository
      • Configure the OMAG Server with repository-mode = ‘local-graph-repository’
          • e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/local-repository/mode/local-graph-repository
      • Start the OMRS instance in the server
          • e.g. HTTP POST http://localhost:8080/open-metadata/admin-services/users/{username}/servers/{servername}/instance
      • If using the embedded configuration of Berkeley DB for persistence and Lucene for indexing, when OMRS starts, the graph repository auto-creates a JanusGraph database, including:
          • Persistence backend
          • Search backend
          • Graph schema
          • Search indexes
      • If using alternative backends for persistence or indexing, ensure that they are correctly configured and available before starting the OMAG Server.
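The two admin requests on slide 32 could be scripted along the following lines. This is a minimal sketch: the URL paths are the ones shown on the slide, while the helper names, the empty request body and the absence of any authentication or TLS handling are assumptions that would need checking against the Egeria admin services documentation.

```python
import urllib.request
from urllib.parse import quote

ADMIN_ROOT = "/open-metadata/admin-services"  # path prefix taken from the slide


def _server_root(base_url, username, server_name):
    # Percent-encode the two path variables so unusual names stay valid.
    return (f"{base_url.rstrip('/')}{ADMIN_ROOT}"
            f"/users/{quote(username, safe='')}"
            f"/servers/{quote(server_name, safe='')}")


def repository_mode_url(base_url, username, server_name):
    """URL that sets the repository mode to the local graph repository."""
    return (_server_root(base_url, username, server_name)
            + "/local-repository/mode/local-graph-repository")


def start_instance_url(base_url, username, server_name):
    """URL that starts the server instance."""
    return _server_root(base_url, username, server_name) + "/instance"


def post(url):
    # Assumption: an empty body is accepted; returns the HTTP status code.
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status
```

With a platform running locally, calling `post(repository_mode_url("http://localhost:8080", "alice", "myserver"))` followed by `post(start_instance_url("http://localhost:8080", "alice", "myserver"))` would mirror the two steps on the slide; `alice` and `myserver` are placeholder values.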
  • 33. Graph Schema
      • The MetadataCollection interface is the formal interface to an Egeria repository. Whilst it is possible to look at the graph directly (e.g. using the Gremlin console), please don’t rely on the schema: it is likely to evolve
      • Type data:
          • The Graph Repository does not store type definitions
          • It delegates all type operations to the Repository Content Manager
      • Instance data:
          • The Egeria Graph Repository stores instance data, using a JanusGraph schema that has:
              • vertices for entities and classifications
              • edges for relationships and classifiers
  • 34. Instance representations in the OMRS (diagram: a Relationship Instance, two Entity Instances and two Classification Instances; each instance has Attributes, which are Primitives, Enums or Collections)
  • 35. Graph mapping – vertices and edges
      • OMRS instance representation → graph schema element:
          • Entity Instance → vertex, label : “entity”
          • Classification Instance → vertex, label : “classification”
          • Relationship Instance → edge, label : “relationship”
          • edge, label : “classifier”
      • Attributes (Primitives, Enums, Collections) are mapped to Properties
  • 36. Graph mapping – vertices and edges (diagram: the instances from slide 34 drawn as a graph of entity and classification vertices joined by relationship and classifier edges, each carrying Properties)
  • 37. Local instances, reference copies and proxies
      • The graph contains one vertex per entity, whether the entity is local, a reference copy or a proxy
      • If the entity has an associated classification, the classification is stored as a vertex, with an edge from the entity vertex to the classification vertex
      • The graph contains one edge per relationship, whether the relationship is local or a reference copy
      • Reference Copies
          • The metadataCollectionId core attribute is set to the ‘guid’ of the home repository
      • Entity Proxy objects
          • Each entity instance has a vertex property of type Boolean, to indicate whether the instance is a proxy
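The mapping described on slides 33 to 37 can be sketched with plain dictionaries in place of a graph database. The four labels ("entity", "classification", "relationship", "classifier") come from the slides; the class name, the property keys such as `isProxy` and the way classification vertex ids are built are hypothetical, and the real schema should not be relied upon, as slide 33 warns.

```python
class GraphSketch:
    """Hypothetical in-memory model of the vertex and edge layout."""

    def __init__(self):
        self.vertices = {}  # vertex id -> {"label": ..., "properties": {...}}
        self.edges = []     # each edge: {"label", "out", "in", "properties"}

    def add_entity(self, guid, collection_id, is_proxy=False, **attrs):
        # One vertex per entity, whether local, reference copy or proxy.
        self.vertices[guid] = {
            "label": "entity",
            "properties": {"guid": guid,
                           "metadataCollectionId": collection_id,
                           "isProxy": is_proxy, **attrs},
        }

    def add_classification(self, entity_guid, name, **attrs):
        # A classification is its own vertex, reached from the entity
        # vertex by a "classifier" edge.
        vertex_id = f"{entity_guid}<{name}>"
        self.vertices[vertex_id] = {
            "label": "classification",
            "properties": {"name": name, **attrs},
        }
        self.edges.append({"label": "classifier", "out": entity_guid,
                           "in": vertex_id, "properties": {}})

    def add_relationship(self, guid, collection_id, end1_guid, end2_guid, **attrs):
        # One edge per relationship, whether local or a reference copy.
        self.edges.append({
            "label": "relationship", "out": end1_guid, "in": end2_guid,
            "properties": {"guid": guid,
                           "metadataCollectionId": collection_id, **attrs},
        })
```

A local database column related to a proxy of a remote glossary term would then be two "entity" vertices, one with `isProxy` set, joined by a single "relationship" edge.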
  • 38. Metadata Collection ‘graph-query’ methods
      • There are 4 sub-graph query methods:
          • getRelatedEntities() - optional
              • Returns the entity and its immediate neighbors
          • getEntityNeighborhood() - optional
              • Returns the entity and its neighbors up to the depth specified by the ‘level’ parameter
          • getLinkingEntities() - optional
              • Returns the relationships and intermediate entities that connect the specified pair of entities
          • getRelationshipsForEntity() - mandatory
              • Returns relationships associated with entity, optionally filtered by relationship type and status
      • (diagram annotation: level = 2)
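The idea behind a neighborhood query bounded by a ‘level’ parameter can be shown with a breadth-first traversal over an in-memory list of relationships. This is only an illustration of the technique: the function name, the tuple layout and the decision to ignore relationship direction are assumptions, not the behaviour of the Egeria getEntityNeighborhood() method.

```python
from collections import deque


def entity_neighborhood(relationships, start_guid, level):
    """Return (entity guids, relationship guids) within `level` hops of start.

    `relationships` is an iterable of (relationship_guid, end1_guid, end2_guid).
    """
    # Build an undirected adjacency map: guid -> [(relationship guid, other end)].
    adjacency = {}
    for rel_guid, end1, end2 in relationships:
        adjacency.setdefault(end1, []).append((rel_guid, end2))
        adjacency.setdefault(end2, []).append((rel_guid, end1))

    entities = {start_guid}
    found_relationships = set()
    frontier = deque([(start_guid, 0)])
    while frontier:
        guid, depth = frontier.popleft()
        if depth == level:
            continue  # do not expand beyond the requested depth
        for rel_guid, other in adjacency.get(guid, []):
            found_relationships.add(rel_guid)
            if other not in entities:
                entities.add(other)
                frontier.append((other, depth + 1))
    return entities, found_relationships
```

One consequence of this particular design is that a relationship joining two entities that both sit exactly at the depth limit is not returned, because neither end is expanded.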
  • 39. Graph Repository – supported functions
      • The Graph Repository supports most of the OMRS MetadataCollection API, including:
          • Save and purge of reference copies
          • Use of entity proxies
          • Delete and restore as well as purge: delete is a soft, restorable delete; purge is permanent
          • Re-type of instances
          • Re-identify of instances
          • Re-home of instances
          • The four ‘graph queries’ described on the previous slide
          • The ‘find’ methods: find..ByProperty, find..ByPropertyValue, findEntityByClassification
      • The Graph Repository does not (yet) support:
          • Historic queries: find methods that specify an asOfTime parameter
          • Undo of previous instance updates
  • 40. More information…
      • Egeria project website: egeria.odpi.org
      • GitHub: github.com/odpi/egeria
      • Slack: https://slack.odpi.org/
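The difference between a soft delete, a restore and a purge, as listed on slide 39, can be sketched as follows. The `InstanceStore` class and its flag-based bookkeeping are hypothetical; the sketch also assumes that purge may be applied to any stored instance, which may be stricter in the real API.

```python
class InstanceStore:
    """Hypothetical store showing soft delete versus permanent purge."""

    def __init__(self):
        self._instances = {}  # guid -> {"properties": {...}, "deleted": bool}

    def add(self, guid, properties):
        self._instances[guid] = {"properties": dict(properties), "deleted": False}

    def get(self, guid):
        # Soft-deleted instances are hidden from normal reads.
        record = self._instances.get(guid)
        if record is None or record["deleted"]:
            return None
        return record["properties"]

    def delete(self, guid):
        # Soft delete: the instance is only flagged, so it can be restored.
        self._instances[guid]["deleted"] = True

    def restore(self, guid):
        # Raises KeyError if the instance has been purged.
        self._instances[guid]["deleted"] = False

    def purge(self, guid):
        # Permanent removal: nothing is left to restore.
        del self._instances[guid]
```

After `delete`, a call to `get` returns `None` until `restore` is called; after `purge`, `restore` fails because the record no longer exists.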