Bringing Machine Learning and Knowledge Graphs Together
Six Core Aspects of Semantic AI:
- Hybrid Approach
- Data Quality
- Data as a Service
- Structured Data Meets Text
- No Black-box
- Towards Self-optimizing Machines
1. Semantic AI: Smart Data for Smarter Discovery & Actions
Kirk Borne (@KirkDBorne)
Principal Data Scientist, Booz Allen Hamilton
http://www.boozallen.com/datascience
2. Six Core Aspects of Semantic AI
https://bit.ly/2Kxw8H5
•Hybrid Approach
•Data Quality
•Data as a Service
•Structured Data Meets Text
•No Black-box
•Towards Self-optimizing Machines
3. Ever since we first explored our world…
http://www.livescience.com/27663-seven-seas.html
4. …We have asked questions about everything around us.
https://atillakingthehun.wordpress.com/2014/08/07/atlantis-not-lost/
5. So, we have collected evidence (data) to answer our questions,
which leads to more questions, which leads to more data collection,
which leads to more questions, which leads to… BIG DATA!
https://www.linkedin.com/pulse/exponential-growth-isnt-cool-combinatorial-tor-bair
6. So, we have collected evidence (data) to answer our questions,
which leads to more questions, which leads to more data collection,
which leads to more questions, which leads to… BIG DATA!
y ~ 2 * x (linear growth)
y ~ 2 ^ x (exponential growth)
y ~ x! ≈ x ^ x → Combinatorial Growth!
(all possible interconnections, linkages, and interactions)
https://www.linkedin.com/pulse/exponential-growth-isnt-cool-combinatorial-tor-bair
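To make the three growth rates concrete, here is a minimal Python sketch that tabulates them side by side. The function names (growth_row, growth_table) are illustrative and not from the slides. Note that x! stays below x ^ x for x ≥ 2, so the "≈" on the slide is best read loosely: both outgrow 2 ^ x very quickly.

```python
from math import factorial


def growth_row(x: int) -> dict:
    """Return the three growth curves from the slide, evaluated at x."""
    return {
        "x": x,
        "linear": 2 * x,                # y ~ 2 * x
        "exponential": 2 ** x,          # y ~ 2 ^ x
        "combinatorial": factorial(x),  # y ~ x!
        "x_pow_x": x ** x,              # loose upper bound on x!
    }


def growth_table(xs):
    """Evaluate all curves for each x in xs."""
    return [growth_row(x) for x in xs]


if __name__ == "__main__":
    for row in growth_table([1, 2, 5, 10, 20]):
        print(row)
```

From x = 4 onward the factorial curve overtakes the exponential one, which is the point of the slide.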
3+1 V’s of Big Data:
Volume = most annoying V
Velocity = most challenging V
Variety = most rich V for discovery
Value = the most important V
7. “All the World is a Graph” – Shakespeare?
(Graphic by Cray, for Cray Graph Engine CGE)
http://www.cray.com/products/analytics/cray-graph-engine
8. Semantic, Meaning-filled Data:
• Ontologies (formal)
• Taxonomies (class hierarchies)
• Folksonomies (informal)
• Tagging / Annotation
– Automated (Machine Learning)
– Crowdsourced
– “Breadcrumbs” (user trails)
Broad, Enriched Data:
• Linked Data (RDF)
– All of those combinations!
• Graph Databases
• Machine Learning
• Cognitive Analytics
• Context
• The 360° view
Making Sense of the World with Smart Data
The Human Connectome Project: mapping and linking the major pathways in the brain.
http://www.humanconnectomeproject.org/
9. Semantic AI in the Internet of Things (IoT):
Internet of Everything
https://www.nsf.gov/news/news_images.jsp?cntn_id=122028
The Internet of Things (IoT) will be an interconnected network of Sensors and
Dynamic Data-Driven Application Systems (dddas.org) =>
Leading to a Combinatorial Explosive Growth of Smart Data!
IoT will power an “Internet of Context” – empowering smarter
actionable intelligence from contextual data everywhere!
10. 1) Class Discovery: Find the categories of objects
(population segments), events, and behaviors in your
data. + Learn the rules that constrain the class
boundaries (that uniquely distinguish them).
2) Correlation (Predictive and Prescriptive Power)
Discovery: Finding trends, patterns, dependencies in
data, which reveal the governing principles or behavioral
patterns (the object’s “DNA”).
3) Novelty (Surprise!) Discovery:
Finding new, rare, one-in-a-[million / billion / trillion]
objects, events, or behaviors.
4) Association (or Link) Discovery: (Graph and Network
Analytics) – Find the unusual (interesting) co-occurring
Make your data smarter with Machine Learning = generate semantic tags that describe discoveries
(Graphic by S. G. Djorgovski, Caltech)
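As a small illustration of the "generate semantic tags" idea, the sketch below covers only Novelty Discovery (item 3): it tags each reading as "novel" or "typical" using a modified z-score based on the median absolute deviation. The function name, the tag vocabulary, the 3.5 threshold, and the sample readings are all assumptions made for this example; they do not come from the slides.

```python
from statistics import median


def tag_novelties(values, threshold=3.5):
    """Return one semantic tag per value: 'novel' or 'typical'.

    Uses the modified z-score 0.6745 * |v - median| / MAD.
    The threshold of 3.5 is a common rule of thumb, assumed here.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Simplification: with no spread, nothing is flagged as novel.
        return ["typical"] * len(values)
    tags = []
    for v in values:
        score = 0.6745 * abs(v - med) / mad
        tags.append("novel" if score > threshold else "typical")
    return tags


# Hypothetical sensor readings with one obvious outlier.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 42.0, 10.2]
tags = tag_novelties(readings)

if __name__ == "__main__":
    for value, tag in zip(readings, tags):
        print(value, tag)
```

The resulting tags could then be stored next to the data so that later searches can ask for "novel" items directly.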
11. SEMANTIC AI USE CASE IN ENVIRONMENTAL SCIENCE:
From Data to Information to Knowledge to Understanding
12. SEMANTIC AI USE CASE IN ENVIRONMENTAL SCIENCE:
Semantic AI tags new discoveries for search, re-use, & building the knowledge graph!
13. Semantic AI creates a Smarter Data Narrative
• It is best when we understand our data’s context and meaning…
• … the Semantics! This is based on Ontologies.
• My students memorized the definition of an Ontology…
–“is_a formal, explicit specification of a shared conceptualization.”
from Tom Gruber (Stanford)
• Semantic “facts” can be expressed in a database as RDF triples:
{subject, predicate, object} = {noun, verb, noun}
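A minimal sketch of the {subject, predicate, object} idea in plain Python, without any RDF library. The Triple type, the match helper, and the example facts are illustrative choices for this sketch; a real system would use URIs and an RDF store.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Illustrative facts, written as {noun, verb, noun}.
FACTS = [
    Triple("Ontology", "is_a", "Specification"),
    Triple("Tim_Berners-Lee", "invented", "World_Wide_Web"),
    Triple("World_Wide_Web", "is_a", "Information_System"),
]


def match(facts: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return all triples that agree with the given fields (None = wildcard)."""
    return [
        t for t in facts
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


if __name__ == "__main__":
    # "Who invented the World Wide Web?" as a triple pattern.
    for t in match(FACTS, predicate="invented", obj="World_Wide_Web"):
        print(t.subject)
```

Leaving one field open turns a stored fact into a question, which is the basic pattern behind querying a knowledge graph.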
14. Get Smart (Data)!
• Collect, Create, Connect smart data across your repositories.
• Build Actionable Knowledge with Semantic AI, not databases!
… then Explore and Exploit Your Knowledge Graph.
http://ghostednotes.com/category/semantic-web
Figure labels: Chapters, Indexes, Covers, Tables of Contents
https://www.quora.com/What-is-the-main-goal-of-semantic-web
Query your data for Patterns & Knowledge
(Action)(Discovery)
15. Semantic AI
Bringing Machine Learning, NLP and Knowledge Graphs together
Andreas Blumauer
CEO & Managing Partner
Semantic Web Company / PoolParty Semantic Suite
16. Agenda
▸ A Quick Introduction to the Semantic Web
▹ Semantic Web in Use
▹ Reasoning
▹ The Linked Data Lifecycle
▸ Six Core Aspects of Semantic AI
▹ Data Quality
▹ Data as a Service
▹ No black-box
▹ Hybrid approach
▹ Structured data meets text
▹ Towards self-optimizing machines
17. A Quick Introduction to the Semantic Web
Benefiting from Knowledge Graphs and Semantic Web Standards
18. The Semantic Web
A standards-based graph of knowledge graphs
Linking Open Data cloud diagram 2017, by Andrejs Abele, John P. McCrae, Paul Buitelaar, Anja Jentzsch and Richard Cyganiak. http://lod-cloud.net/
19. Semantic Web in Use
Knowledge Graphs to support Search and Q&A engines
Knowledge Graphs (KG) can cover general knowledge (often also called cross-domain or encyclopedic knowledge), or provide knowledge about special domains such as biomedicine.
In most cases KGs are based on Semantic Web standards, and have been generated by a mixture of automatic extraction from text or structured data, and manual curation work.
Examples:
▸ DBpedia
▸ Google Knowledge Graph
▸ YAGO
▸ OpenCyc
▸ Wikidata
Who is the inventor of the World Wide Web?
20. Reasoning
Knowledge Graphs & Knowledge Extraction
Diagram (as triples):
{Perth, is a, City}
{Australia, is a, Country}
{Perth, is located in, Australia}
{Australia, is part of, Commonwealth of Nations}
{Commonwealth of Nations, is a, International Organisation}
Perth: Perth is one of the most isolated major cities in the world, with a population of 2,022,044 living in Greater Perth.
Australia: Australia is a member of the OECD, United Nations, G20, ANZUS, and the World Trade Organisation.
Avoid illogical answers: distance between
Support complex Q&A: Which cities located in the Commonwealth of Nations have a population of
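The reasoning on this slide can be sketched in a few lines of Python. The triples and the Perth population are taken from the slide. The predicate spellings, the helper names, the population thresholds, and the reading of "avoid illogical answers" as a type check on both entities are assumptions made for this sketch.

```python
# Facts from the slide, written as (subject, predicate, object).
TRIPLES = {
    ("Perth", "is_a", "City"),
    ("Australia", "is_a", "Country"),
    ("Perth", "is_located_in", "Australia"),
    ("Australia", "is_part_of", "Commonwealth of Nations"),
    ("Commonwealth of Nations", "is_a", "International Organisation"),
}
POPULATION = {"Perth": 2_022_044}  # Greater Perth, as stated on the slide


def containers(entity, triples):
    """Everything reachable from entity via is_located_in / is_part_of."""
    found, frontier = set(), [entity]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p in ("is_located_in", "is_part_of") and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def cities_in(region, triples, min_population=0):
    """Complex Q&A: cities inside region with at least min_population."""
    cities = [s for s, p, o in triples if p == "is_a" and o == "City"]
    return sorted(
        c for c in cities
        if region in containers(c, triples)
        and POPULATION.get(c, 0) >= min_population
    )


def distance_makes_sense(a, b, triples):
    """Assumed guard: only ask for a distance between two places."""
    places = {s for s, p, o in triples if p == "is_a" and o in ("City", "Country")}
    return a in places and b in places


if __name__ == "__main__":
    print(cities_in("Commonwealth of Nations", TRIPLES, min_population=1_000_000))
    print(distance_makes_sense("Perth", "Commonwealth of Nations", TRIPLES))
```

Perth is never stated to be in the Commonwealth of Nations; the answer follows from chaining "is located in" and "is part of", which is the kind of inference the slide points to.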
21. The Linked Data Life Cycle
Creating Semantic Data along the Data Life Cycle
Auer, S. et al. (2012). Managing the life-cycle of linked data with the LOD2 stack. In International Semantic Web Conference (pp. 1-16). Springer, Berlin, Heidelberg. https://link.springer.com/content/pdf/10.1007/978-3-642-35173-0_1.pdf
22. Six Core Aspects of Semantic AI
#SemanticAI: Bringing Machine Learning, NLP and Knowledge Graphs together
23. Six Core Aspects of Semantic AI
1. Data Quality: Semantically enriched data serves as a basis for better data quality and provides more options for feature extraction.
2. Data as a Service: Linked data based on W3C Standards can serve as an enterprise-wide data platform and helps to provide training data for machine learning in a more cost-efficient way.
3. No black-box: Semantic AI ultimately leads to AI governance that works on three layers: technical, ethical, and legal.
4. Hybrid approach: Semantic AI is the combination of methods derived from symbolic AI and statistical AI.
5. Structured data meets text: Most machine learning algorithms work well either with text or with structured data.
6. Towards self-optimizing machines: Machine learning can help to extend knowledge graphs, and in return, knowledge graphs can help to improve ML algorithms.
https://www.datasciencecentral.com/profiles/blogs/six-core-aspects-of-semantic-ai
24. 1. Data Quality
Benchmarking the PoolParty Semantic Classifier
Reegle thesaurus
A comprehensive SKOS taxonomy
for the clean energy sector
(http://data.reeep.org/thesaurus/guide)
● 3,420 concepts
● 7,280 labels (English version)
● 9,183 relations (broader/narrower + related)
Document Training Set
1,800 documents in 7 classes
Renewable Energy, District Heating Systems,
Cogeneration, Energy Efficiency, Energy (general),
Climate Protection, Rural Electrification
▸ Improvement of 5.2% (F1 score) compared to
traditional (term-based) SVM
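For readers unfamiliar with the metric, here is a minimal sketch of how an F1 score over several document classes can be computed. The slide does not say which averaging was used in the benchmark; macro-averaging is an assumption here, and the tiny label lists are made up for illustration (only the class names come from the slide).

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(f1_score(precision, recall))
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions for four documents.
y_true = ["Cogeneration", "Energy Efficiency", "Cogeneration", "Climate Protection"]
y_pred = ["Cogeneration", "Cogeneration", "Cogeneration", "Climate Protection"]

if __name__ == "__main__":
    print(round(macro_f1(y_true, y_pred), 3))
```

Comparing this number for a term-based classifier and a semantically enriched one is the kind of benchmark the slide reports.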
25. 1. Data Quality
PoolParty Semantic Classifier in a Nutshell
PoolParty Semantic Classifier combines machine learning algorithms
(SVM, Deep Learning, Naive Bayes, etc.) with Semantic Knowledge Graphs.
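One way to picture the combination of ML and a knowledge graph is semantic feature enrichment: plain term counts are extended with the concepts (and broader concepts) that the terms map to in a thesaurus. The sketch below is not PoolParty's actual implementation; the mini-thesaurus, the feature prefixes, and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical mini-thesaurus in the spirit of SKOS:
# label -> concept, and concept -> broader concept.
LABEL_TO_CONCEPT = {
    "solar": "Solar Energy",
    "photovoltaic": "Solar Energy",
    "wind": "Wind Energy",
    "chp": "Cogeneration",
}
BROADER = {
    "Solar Energy": "Renewable Energy",
    "Wind Energy": "Renewable Energy",
}


def semantic_features(text: str) -> Counter:
    """Term counts plus counts of matched concepts and their ancestors."""
    tokens = text.lower().split()
    features = Counter(f"term:{t}" for t in tokens)
    for token in tokens:
        concept = LABEL_TO_CONCEPT.get(token)
        while concept is not None:
            features[f"concept:{concept}"] += 1
            concept = BROADER.get(concept)  # walk up the hierarchy
    return features


if __name__ == "__main__":
    print(semantic_features("Photovoltaic and wind power"))
```

The resulting feature dictionary could be fed to any of the classifiers named on the slide. Two documents that share no terms ("photovoltaic" vs. "wind") still share the feature "concept:Renewable Energy".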
26. 2. Data as a Service
Diagram labels: Structured Data; Machine Learning; Cognitive Applications
27. 2. Data as a Service
Diagram labels: Unstructured Data; Structured Data; Machine Learning; Cognitive Applications
28. 2. Data as a Service
Diagram labels: Unstructured Data; Structured Data; Knowledge Graphs; Machine Learning; Cognitive Applications
29. 2. Data as a Service
Knowledge Graphs as a Data Model for Machine Learning
“Traditionally, when faced with heterogeneous knowledge in a machine learning context, data scientists preprocess the data and engineer feature vectors so they can be used as input for learning algorithms (e.g., for classification).”
Wilcke X, Bloem P, De Boer V. The Knowledge Graph as the Default Data Model for Machine Learning. Data Science. 2017 Oct 17;1-19. DOI: 10.3233/DS-170007
30. 3. No Black Box
Infrastructure to overcome information asymmetries between the developers of AI systems and other stakeholders
31. 3. No Black Box
Explainable AI
Classifiers based on ML algorithms such as Deep Learning perform better when training data is semantically enhanced. Additional features are derived from a controlled vocabulary, which also makes the features used more transparent to the Data Scientist.
32. 4. Hybrid Approach
Diagram labels: Artificial Intelligence; Symbolic AI; Sub-Symbolic AI; Statistical AI; ANN; KR & reasoning; NLP; Machine Learning; Word Embedding; Deep Learning; Natural Language Understanding; Entity Recognition & Linking; Knowledge Extraction; Semantic enhanced Text Classification
In Semantic AI, various methods from Symbolic AI are combined with machine learning methods and/or neural networks.
Examples:
● Semantic enrichment of text corpora to enhance word embeddings
● Extraction of semantic features from text to improve ML-based classification tasks
● Combine ML-based with Graph-based entity extraction
● Knowledge Graphs as a Data Model for Machine Learning
● ….
33. 5. Structured Data meets Text
Diagram labels: Purchase History; Social Media; Recommender; Personal Assistant; Prediction; Customer Retention; Classification; Intent Detection
35. 6. Towards self-optimizing machines
▸ Semantic AI is the next-generation Artificial Intelligence
▸ Machine learning can help to extend knowledge graphs (e.g., through ‘corpus-based ontology learning’ or through graph mapping based on ‘spreading activation’), and in return, knowledge graphs can help to improve ML algorithms (e.g., through ‘distant supervision’).
▸ This integrated approach ultimately leads to systems that work like self-optimizing machines after an initial setup phase, while remaining transparent about the underlying knowledge models.
▸ Graph Convolutional Networks (in progress) promise new insights
Mike Bergman: Knowledge-based Artificial Intelligence (2014) http://www.mkbergman.com/1816/knowledge-based-artificial-intelligence/
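To give a feel for ‘distant supervision’, the sketch below uses facts already in a knowledge graph to label raw sentences automatically, producing (noisy) training data for a relation classifier. It is a deliberately naive illustration: the function name, the "no_relation" label, the example sentences, and the substring matching are assumptions made for this sketch.

```python
# One known fact from the knowledge graph (Perth example from the deck).
KNOWN_FACTS = {("Perth", "is_located_in", "Australia")}


def distant_labels(sentences, facts):
    """Label a sentence with a relation if it mentions both ends of a fact.

    Naive substring matching: the labels are noisy by design, which is
    the usual trade-off of distant supervision.
    """
    examples = []
    for sentence in sentences:
        label = "no_relation"
        for subj, pred, obj in sorted(facts):
            if subj in sentence and obj in sentence:
                label = pred
                break
        examples.append((sentence, label))
    return examples


# Hypothetical unlabeled text.
SENTENCES = [
    "Perth is the capital of Western Australia.",
    "Perth is one of the most isolated major cities in the world.",
]

if __name__ == "__main__":
    for sentence, label in distant_labels(SENTENCES, KNOWN_FACTS):
        print(label, "|", sentence)
```

A model trained on such examples can then propose new facts for the graph, which closes the loop described in the second bullet.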
36. Why Data Scientists need Semantic Models
▸ To understand
▹ Content aboutness in a defined framework
▹ Data relationships and context within a unified organizational model
▹ Connections across disparate datasets
▸ To increase precision
▹ Hierarchical or other mapped relationships allow for recommending similar content when exact matches are not found
▹ Granularity allows for more specific recommendations
▹ Consistency across structure results in more precise analysis and predictions
Source: Suzanne Carroll, Data Science Product Director at XO Group