This document discusses how semantic web ontologies and knowledge graphs can help reduce high IT costs by providing a common schema and linking data across systems. It introduces AnzoGraph DB, a graph database built on semantic web standards that can perform both analytics and graph algorithms on large datasets. The document demonstrates how public flight delay data can be converted to a knowledge graph and analyzed using techniques like PageRank, shortest paths, and querying for delayed flights. Overall, it argues that semantic technologies can help address the problem of data integration costs by enabling linked and standardized data.
The Business Case for Semantic Web Ontology & Knowledge Graphs
1. The Business Case for Semantic
Web Ontology & Knowledge Graph
Mark Wallace
Senior Ontologist, Semantic Arts
Thomas Cook
Sales Director, AnzoGraph DB
2. About Cambridge Semantics
• Founded by senior team from IBM’s Advanced Internet
Technology Group
• Complemented by an MPP technology team that previously
founded Netezza & ParAccel (Amazon Redshift)
• Experienced executive team with proven track record of
success
• Based in Boston and San Diego
• 150+ Employees
LEADERSHIP TEAM – CSI
Anzo Data Fabric
• Enterprise scale data fabric
for automated data
management & analytics
AnzoGraph DB
• Graph database embedded
in the Anzo data fabric, but
also sold separately
3. About Semantic Arts
• We’re experts in Semantic
technology and Ontology
design
• We specialize in Semantic
strategy and Ontology
implementation, refining our
best practices since 2000
• We’re thought leaders who
speak at conferences, publish
articles, and author books on
Information Management
innovation and enrichment
across the enterprise
• Collectively, our team has
over 200 years of experience
• We’re investing in the
Ontology community and the
pursuit of sharing ideas
(Gist Council, Estes Park Group)
• We’re developing proprietary
tools for faster adoption
Gist
(Minimalist Upper Ontology)
• We observe international
W3C standards and
guidelines
4. Who is Mark Wallace?
• Ontologist
• Software architect/developer
• Decades of experience designing / building
data-centric systems
• Semantic Web since 2004
• Built Large-scale RDF applications
• Invited Semantic Web speaker since 2009
5. Agenda
• Why are IT costs so high?
• What changes are needed to solve this?
• Are Semantic Web Knowledge Graphs the fix?
• Intro to Ontology for Knowledge Graphs
• Some Myth-busting
• Demonstrations
• Q&A
6. The Problem:
• 40%-70% of most IT budgets are spent on integration [1]
• Reason: Local data models and KGs
• The implied scope of a relational system: enterprise
• (actually an application within an enterprise)
• no real mechanism for the identifiers to mean anything beyond that context
• The implied scope of LPG: single graph DB
• generally working on one data set or several that have been hand-harmonized
• no mechanism to reach beyond a single graph DB
• To fix this local data problem you need:
• "Global-er" IDs
• Shared schema with clear meanings
• Vendor agnostic
[1] The Data-Centric Revolution, Dave McComb, p. 63
7. Is Semantic Web the Fix?
• The explicit scope of RDF: web scale
• Global IDs (URIs)
• Shared schema with unambiguous meaning (OWL Ontology)
• Vendor agnostic (W3C Standards based)
• A Semantic Knowledge Graph is:
• an RDF graph with an OWL ontology as its schema
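A minimal sketch of why global IDs matter, in plain Python with no RDF library. Triples are modeled as 3-tuples of strings. The `http://example.com/` namespace, the `label` and `locatedIn` properties, and both sample datasets are hypothetical and only illustrate the idea.

```python
# Hypothetical namespace; any URI space the enterprise controls would do.
EX = "http://example.com/"

# Two systems describe the same airport independently, but both use the
# same global URI for it instead of a local row ID.
booking_system = {
    (EX + "airport/BOS", EX + "label", "Boston Logan"),
}
ops_system = {
    (EX + "airport/BOS", EX + "locatedIn", EX + "city/Boston"),
}

# Because the identifiers are global, integration is a plain set union:
# no key-mapping table between the two systems is needed.
merged = booking_system | ops_system


def describe(graph, subject):
    """Return all (predicate, object) pairs known about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


facts = describe(merged, EX + "airport/BOS")
for predicate, obj in facts:
    print(predicate, "->", obj)
```

With local identifiers (for example, row 17 in one system and row 42 in another) the same union would produce two unrelated airports, which is the integration cost the slides describe.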
8. What is an Enterprise OWL Ontology?
• Defines a common, core set of concepts
• Central to the enterprise
• Relatively stable
• Relatively small
• Identifies terminology conflicts
• Synchronizes different words for same concept
• Resolves same word use for different concepts
• Formal model, enables automation
• Detecting logical schema inconsistencies
• Discovering “new” facts in the data
[Diagram: Ontology Model; Data lake or hub; Apps; Other Applications with their own data models]
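To illustrate what discovering "new" facts can look like, here is a minimal sketch of subclass inference in plain Python. It is a simplification: a real OWL reasoner supports far more than subclass chains. The class names (`CargoFlight`, `Flight`, `Event`) and the node `flight/123` are hypothetical.

```python
def infer_types(types, subclass_of):
    """Add every superclass of each asserted type (subclass inference)."""
    inferred = {node: set(classes) for node, classes in types.items()}
    for classes in inferred.values():
        frontier = list(classes)
        while frontier:
            cls = frontier.pop()
            for parent in subclass_of.get(cls, ()):
                if parent not in classes:
                    classes.add(parent)
                    frontier.append(parent)
    return inferred


# Hypothetical mini-ontology: class -> direct superclasses.
subclass_of = {
    "CargoFlight": ["Flight"],
    "Flight": ["Event"],
}

# Asserted data: only the most specific type is stored.
types = {"flight/123": {"CargoFlight"}}

inferred = infer_types(types, subclass_of)
print(sorted(inferred["flight/123"]))  # ['CargoFlight', 'Event', 'Flight']
```

The data never states that `flight/123` is a `Flight` or an `Event`; those facts follow from the schema, which is what a formal model makes automatable.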
9. Traditional Pro/Con of Semantic Web KG
• Pros:
• Global URIs for linking data and reusing schema
• Ontology - ensures schema meaning is clear
"You cannot maintain what you do not understand" -Michael Uschold
• Vendor independence - a silo buster!
• Cons:
• No properties on relationships - leads to complex models
• Can't do graph algorithms
10. Myth Busting
• Myth: you have to pick either RDF or LPG
• RDF is for data harmonization & analytics
• LPG is for graph algorithms and "edge properties"
• Now Busted!
• RDF* gives properties on relationships
• AnzoGraph provides graph algorithms
• E.g., PageRank & Weighted Shortest Path
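The combination of edge properties and a weighted shortest path can be sketched in plain Python. This is not AnzoGraph's implementation or API: it models the RDF* idea as a dictionary from a base triple to properties about that triple, and runs a textbook Dijkstra search over it. The `hasRouteTo` predicate and the distances are illustrative assumptions, not measured values.

```python
import heapq

# Each key is a base triple; the value holds properties about that triple,
# which is the idea behind RDF*. Distances are illustrative only.
edges = {
    ("BOS", "hasRouteTo", "JFK"): {"distance": 187},
    ("JFK", "hasRouteTo", "ORD"): {"distance": 740},
    ("BOS", "hasRouteTo", "ORD"): {"distance": 1000},
    ("ORD", "hasRouteTo", "DEN"): {"distance": 888},
}


def shortest_path(edges, source, target, weight="distance"):
    """Dijkstra's algorithm over triples, weighted by an edge property."""
    adjacency = {}
    for (s, _p, o), props in edges.items():
        adjacency.setdefault(s, []).append((o, props[weight]))

    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, w in adjacency.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (cost + w, neighbor, path + [neighbor]))
    return None  # target is not reachable from source


cost, path = shortest_path(edges, "BOS", "DEN")
print(cost, path)  # 1815 ['BOS', 'JFK', 'ORD', 'DEN']
```

The weight lives on the relationship itself, so no intermediate "route" node is needed to carry the distance.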
14. Converting Rows and Columns to Triples
[Diagram labels: Flight, Departure, Arrival, Destination, Airport]
15. Converting Rows and Columns to Triples
[Diagram labels: Airport, Destination, Flight]
16. Nodes have types and properties
Type*: Flight
Properties:
• YEAR
• MONTH
• DAY
• DAY_OF_WEEK
• AIRLINE
• FLIGHT_NUMBER
• TAIL_NUMBER
• ORIGIN_AIRPORT
• DESTINATION_AIRPORT
• ….
*Note: Types can also be called Labels, as in Labeled
Property Graphs or LPG
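The conversion from rows and columns to triples can be sketched as follows, in plain Python. The column names come from the slide; the sample row, the `http://example.com/` namespace, the `departsFrom` and `arrivesAt` predicates, and the way the flight URI is built are all assumptions made for illustration, not the mapping used in the demonstration.

```python
import csv
import io

EX = "http://example.com/"  # hypothetical namespace

# One made-up row using a subset of the column names from the slide.
SAMPLE = """YEAR,MONTH,DAY,AIRLINE,FLIGHT_NUMBER,ORIGIN_AIRPORT,DESTINATION_AIRPORT
2015,1,1,AA,100,BOS,JFK
"""


def row_to_triples(row):
    """Turn one CSV row into triples: a typed Flight node, its literal
    properties, and links to typed Airport nodes."""
    flight = EX + "flight/" + "-".join(
        row[c] for c in ("YEAR", "MONTH", "DAY", "AIRLINE", "FLIGHT_NUMBER")
    )
    origin = EX + "airport/" + row["ORIGIN_AIRPORT"]
    destination = EX + "airport/" + row["DESTINATION_AIRPORT"]

    triples = [
        (flight, "rdf:type", EX + "Flight"),
        (origin, "rdf:type", EX + "Airport"),
        (destination, "rdf:type", EX + "Airport"),
        (flight, EX + "departsFrom", origin),
        (flight, EX + "arrivesAt", destination),
    ]
    # Remaining columns become literal-valued properties of the flight.
    for column in ("YEAR", "MONTH", "DAY", "AIRLINE", "FLIGHT_NUMBER"):
        triples.append((flight, EX + column.lower(), row[column]))
    return triples


triples = []
for row in csv.DictReader(io.StringIO(SAMPLE)):
    triples.extend(row_to_triples(row))

for triple in triples:
    print(triple)
```

Airport codes become nodes with their own URIs rather than string values, which is what lets every flight touching BOS link to the same node.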
18. Now we are ready to analyze the data:
BI-Style Analytics
#1 Longest flight segments by distance from Boston (BOS)
#2 Airports less than 400 mi from Boston (BOS) - Network Viewer output
#3 Longest distances between two airports
#4 Longest flights by elapsed time
#5 Airlines with the longest average delays
#6 Airlines with the most flights
#7 Longest 2 segments reachable from Boston and the distances of each segment
#8 Which segments have the longest average departure delays
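A BI-style question such as #5 (airlines with the longest average delays) amounts to a group-by and an average. The sketch below does this in plain Python over a handful of made-up records. The field names `airline` and `departure_delay` and all values are hypothetical, and the demonstration itself would express this as a query against the graph rather than as Python.

```python
from collections import defaultdict

# Made-up flight records; delays are in minutes.
flights = [
    {"airline": "AA", "departure_delay": 10},
    {"airline": "AA", "departure_delay": 30},
    {"airline": "DL", "departure_delay": 5},
    {"airline": "UA", "departure_delay": 45},
    {"airline": "UA", "departure_delay": 15},
]


def average_delay_by_airline(flights):
    """Group flights by airline and rank airlines by mean departure delay."""
    totals = defaultdict(lambda: [0, 0])  # airline -> [sum of delays, count]
    for flight in flights:
        entry = totals[flight["airline"]]
        entry[0] += flight["departure_delay"]
        entry[1] += 1
    averages = {airline: total / count for airline, (total, count) in totals.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


ranking = average_delay_by_airline(flights)
print(ranking)  # [('UA', 30.0), ('AA', 20.0), ('DL', 5.0)]
```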
Graph Algorithms
#9 PageRank - Graph Algorithm - Show the most well-connected airports based on the PageRank algorithm
#10 Shortest Path Graph Algorithm - show shortest path with weighted RDF* edge property
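To give a feel for what a PageRank run over airports computes, here is a small power-iteration sketch in plain Python. It is a textbook formulation, not AnzoGraph's implementation, and the route map is a made-up example. The damping factor of 0.85 and the iteration count are conventional defaults chosen here as assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> list of targets."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Made-up route map: airport -> airports it has flights to.
routes = {
    "BOS": ["ORD", "JFK"],
    "JFK": ["ORD"],
    "DEN": ["ORD"],
    "ORD": ["BOS", "JFK", "DEN"],
}

ranks = pagerank(routes)
top = max(ranks, key=ranks.get)
print(top)  # ORD
```

In this toy network every other airport routes into ORD, so it collects the most rank, which is the sense in which the demo calls an airport "well-connected".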
20. What Makes AnzoGraph DB Special?
BUILT ON STANDARDS
SPARQL/RDF, Cypher/BOLT,
SPARQL*/RDF*, OWL, RDFS+
ANALYTICS FOR MANY USES
Graph Algorithms, BI/DW Analytics, Inferencing,
Data Science/Feature Engineering, Geospatial
MAKE USE OF ANY DATA
Use all servers to load and persist data for ultimate
performance or GDI for virtualization
SCALABLE TO MEET ANY NEED
Linear horizontal scaling to handle billions
or trillions of triples
LOAD AND ANALYZE FAST
In-Memory MPP platform able to speed
through challenging analytics and loading
21. Recap
• Why are IT costs so high?
• What changes are needed to solve this?
• Are Semantic Web Knowledge Graphs the fix?
• Intro to Ontology for Knowledge Graphs
• Some Myth-busting
• Demonstrations