Building a successful Modern AI application often requires large volumes of data for training ML models, or data that has been organized into knowledge using taxonomies or ontologies to support specific vertical markets (healthcare, insurance, pharma, etc.) or horizontal functions (HR, legal, supply chain, etc.). While tools exist to help developers ingest and organize the required data into meaningful knowledge stores, using pre-built data or knowledge packages can make application development faster, more reliable, and less expensive than starting from scratch.
In this webinar we will look at trends and examples of specific proprietary and open-source data sets that offer prebuilt knowledge, representations, or models to serve these markets.
1. APRIL 12, 2018
Knowledge as a Service
An Introduction to the Emerging Pre-Built Knowledge Market
Adrian J Bowles, PhD
Founder, STORM Insights, Inc.
Lead Analyst, AI, Aragon Research
info@storminsights.com
3. Copyright (c) 2018 by STORM Insights Inc. All Rights Reserved.
WHY IS THE AVAILABILITY OF KNOWLEDGE & DATA SUCH AN ISSUE TODAY?
PERCEPTION
UNDERSTANDING
LEARNING
Big Data
Classic AI
Deep Learning
4. DATA IN THE MODERN AI LANDSCAPE
[Diagram labels: Model; Data Mgmt; Learn, Reason, Understand; Emotions, Meaning, Concepts, Intent, Context; Human and Machine Input/Output; Gestures, Emotions, Language; Narrative Generation, Visualization, Reports, Haptics; Sensors (IoT), Systems, Controls]
5. DATA MANAGEMENT IN THE MODERN AI LANDSCAPE
[Diagram labels: Model; Data Mgmt; Emotions, Meaning, Concepts, Intent, Context; Human and Machine Input/Output; Gestures, Emotions, Language; Narrative Generation, Visualization, Reports, Haptics; Sensors (IoT), Systems, Controls]
6. IDENTIFYING THE RIGHT DATA SOURCES IS INCREASING IN IMPORTANCE
DATA
More Data + Faster HW Make Deep Learning Practical
Deep Learning Success With Recognition Spurs Investment
ALGORITHMS & RULES
Caution for Applications Where Transparency Is Critical
Investment Leads to Investigation
Broaden the Scope of Applications
New “Explainability” Research Emerges
Hybrid Solutions to Augment Intelligence Will Thrive for Critical Applications
7. DATA REQUIREMENTS VARY WITH DOMAIN/TASK REQUIREMENTS
[Chart labels: axes Domain (General) and Task (General); Artificial General Intelligence, Intelligent Apps, Chatbot, Healthcare, Pharma, Customer Service, Reorder Rx, Speak to a Pharmacist]
8. THE SEMANTIC WEB:
ALL DATA SHOULD BE ASSOCIATED WITH SEMANTIC ATTRIBUTES (MEANING)
RDF - Resource Description Framework: a directed, labeled graph.
RDFS - RDF Schema, part of the RDF suite of W3C Recommendations (a language for representing RDF vocabularies).
SPARQL - SPARQL Protocol and RDF Query Language: a query language and protocol for RDF data.
OWL - The Web Ontology Language is a Semantic Web language designed to represent knowledge about things and relationships between things on the Web.
An OWL document is an ontology.
https://www.w3.org/2013/data/
BASICS OF THE W3C SEMANTIC WEB ONTOLOGY STACK
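To make "a directed, labeled graph" concrete, here is a minimal sketch in plain Python. It stores triples as (subject, predicate, object) tuples and answers pattern queries, loosely in the spirit of a SPARQL basic graph pattern. The `ex:` names and the `match` helper are hypothetical, chosen for illustration; a real project would normally use an RDF library and a SPARQL engine.

```python
from typing import Iterable, Iterator, Optional, Tuple

# One triple = one labeled, directed edge: subject --predicate--> object.
Triple = Tuple[str, str, str]

# Hypothetical example data; the "ex:" names are illustrative only.
TRIPLES = [
    ("ex:Aspirin", "rdf:type", "ex:Drug"),
    ("ex:Aspirin", "ex:treats", "ex:Headache"),
    ("ex:Ibuprofen", "rdf:type", "ex:Drug"),
    ("ex:Ibuprofen", "ex:treats", "ex:Inflammation"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None acts as a wildcard."""
    for subj, pred, obj in triples:
        if s is not None and subj != s:
            continue
        if p is not None and pred != p:
            continue
        if o is not None and obj != o:
            continue
        yield (subj, pred, obj)


# "Which resources are typed as ex:Drug?"
drugs = sorted(subj for subj, _, _ in match(TRIPLES, p="rdf:type", o="ex:Drug"))
print(drugs)
```

The wildcard pattern is the key idea: a query is itself a partial triple, and the answer is every edge in the graph that fits it.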
9. DEEP STRUCTURE REQUIRES STRONGER METHODS FOR ANALYSIS TO FIND CONCEPTS
Perception: obvious structure is easy to process… but most of the interesting stuff isn’t obvious to a computer.
10. START WITH A TAXONOMY
A taxonomy represents the formal structure of classes or types of objects within a domain.
•Taxonomies are generally hierarchical and provide a name for each class in the domain.
•They may also capture the membership properties of each object in relation to the other objects.
•The rules of a specific taxonomy are used to classify or categorize any object in the domain, so they must be complete, consistent, and unambiguous. This rigor in specification should ensure that any newly discovered object fits into one, and only one, category or object class.
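The "one, and only one, category" requirement can be checked mechanically. The sketch below assumes a tiny, hypothetical vehicle taxonomy; the class names, membership rules, and the `classify` and `lineage` helpers are illustrative, not taken from the webinar. A classification that matches zero classes (incomplete rules) or several classes (ambiguous rules) is reported as an error.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical hierarchy: class name -> parent class (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "Vehicle": None,
    "Car": "Vehicle",
    "Truck": "Vehicle",
}

# Hypothetical membership rules for the leaf classes.
RULES: Dict[str, Callable[[dict], bool]] = {
    "Car": lambda obj: obj["wheels"] == 4 and obj["payload_kg"] < 1000,
    "Truck": lambda obj: obj["wheels"] >= 4 and obj["payload_kg"] >= 1000,
}


def classify(obj: dict) -> str:
    """Return the single leaf class the object belongs to, or raise ValueError."""
    matches = [name for name, rule in RULES.items() if rule(obj)]
    if len(matches) != 1:
        # Zero matches: the rules are incomplete. Several: they are ambiguous.
        raise ValueError(f"expected exactly one class, got {matches}")
    return matches[0]


def lineage(name: str) -> List[str]:
    """Walk up the hierarchy from a class to the root."""
    path = [name]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


hatchback = {"wheels": 4, "payload_kg": 450}
print(classify(hatchback), lineage(classify(hatchback)))
```

Raising on zero or multiple matches turns the completeness and unambiguity requirements into something a test suite can enforce as the taxonomy grows.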
11. ONTOLOGIES
An ontology formalizes and specifies the names, definitions, and attributes of entities within a domain.
An accepted ontology may define the domain.
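As a rough sketch of what "names, definitions, and attributes" might look like in code, the example below models entity types with a dataclass and checks a record against them. The `Medication` and `Condition` types, their attributes, and the `validate` helper are assumptions made for illustration; production ontologies are usually expressed in a language such as OWL rather than in application code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EntityType:
    name: str                    # the agreed name of the entity
    definition: str              # a human-readable definition
    attributes: Dict[str, type]  # attribute name -> expected Python type


# Hypothetical two-entity healthcare ontology.
ONTOLOGY: Dict[str, EntityType] = {
    "Medication": EntityType(
        name="Medication",
        definition="A substance used to treat a condition.",
        attributes={"name": str, "dose_mg": float},
    ),
    "Condition": EntityType(
        name="Condition",
        definition="A diagnosable state of health.",
        attributes={"name": str},
    ),
}


def validate(entity_type: str, record: dict) -> List[str]:
    """Return a list of problems; an empty list means the record conforms."""
    spec = ONTOLOGY[entity_type]
    problems = []
    for attr, expected in spec.attributes.items():
        if attr not in record:
            problems.append(f"missing attribute: {attr}")
        elif not isinstance(record[attr], expected):
            problems.append(f"{attr} should be {expected.__name__}")
    for attr in record:
        if attr not in spec.attributes:
            problems.append(f"unknown attribute: {attr}")
    return problems


print(validate("Medication", {"name": "Aspirin", "dose_mg": 81.0}))
```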
12. ONTOLOGIES EVOLVE - SYSTEMS MUST BE FLEXIBLE
TRUTH VS BELIEF - DESIGN ACCORDINGLY
13. DATA SOURCE INTEGRATION IS A DESIGN CONSIDERATION
[Diagram labels: Data Sources; Integrate, Analyze, Visualize, Deliver; CRM, ERP, Enterprise Apps; Streaming, Historical/Static/Batch; Required, Optional; IoT Sensors, Social Media Streams, Log Data, Other Streams]
14. DETERMINE YOUR NEED OR IDENTIFY RESOURCES FIRST?
[Chart labels: axes Domain (General to Specific) and Rate of Change (Streaming to Static); Natural Language, Traffic, Stock Prices, Weather, APA Diagnostics, Disaster/Battlefield Monitoring, Twitterverse]
15. USE PRE-BUILT KNOWLEDGE RESOURCES, SAVE TIME (30 YEARS?)
17. PREBUILT CONTENT FOR FASTER DEPLOYMENT
Watson Conversation and Virtual Agent
Source: IBM
19. YAGO - YET ANOTHER GREAT ONTOLOGY
Semantic knowledge base derived from Wikipedia, WordNet, and GeoNames
s://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yag
Joint project of the Max Planck Institute for Informatics and Télécom ParisTech University
> 10M entities > 120M facts
Facts & Entities may have Temporal and Spatial Dimensions
Open Source: Available on GitHub (8/31/2017)
Graph Browser
20. Extracting “structured data” from Wikipedia.
The DBpedia data set describes > 4.58 million entities.
> 4 million are classified in a consistent ontology, including 1,445,000 persons, 735,000 places, 123,000 music albums, 87,000 films, 19,000 video games, 241,000 organizations, 251,000 species, and 6,000 diseases.
~50 million links to other RDF datasets, 80.9 million links to Wikipedia categories, and 41.2 million YAGO2 categories.
DBpedia uses the Resource Description Framework (RDF) to represent extracted information and consists of 3 billion RDF triples, of which 580 million were extracted from the English edition of Wikipedia and 2.46 billion from other language editions.
Derived from "DBpedia." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 17 Nov. 2017. Web. 12 Apr. 2018.
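The per-class counts above come from type statements in the RDF data. The sketch below shows the idea at toy scale: it parses a few N-Triples-style lines and tallies entities by class. The sample lines are written for illustration and are not an excerpt from a DBpedia dump. The parser is a deliberate simplification that handles only lines made of three IRIs; real dumps also contain literals and other terms and are better read with an RDF library.

```python
import re
from collections import Counter
from typing import List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Illustrative sample in N-Triples form (assumed data, not a DBpedia excerpt).
SAMPLE = """\
<http://dbpedia.org/resource/Boston> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Place> .
<http://dbpedia.org/resource/Alan_Turing> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Person> .
<http://dbpedia.org/resource/Ada_Lovelace> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Person> .
<http://dbpedia.org/resource/Alan_Turing> <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/London> .
"""

# Simplification: match only "<iri> <iri> <iri> ." lines.
IRI_TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.$")


def parse(text: str) -> List[Tuple[str, str, str]]:
    """Extract (subject, predicate, object) from each matching line."""
    triples = []
    for line in text.splitlines():
        m = IRI_TRIPLE.match(line.strip())
        if m:
            triples.append((m.group(1), m.group(2), m.group(3)))
    return triples


def count_by_class(triples: List[Tuple[str, str, str]]) -> Counter:
    """Count rdf:type statements, keyed by the last path segment of the class IRI."""
    return Counter(o.rsplit("/", 1)[-1] for _, p, o in triples if p == RDF_TYPE)


counts = count_by_class(parse(SAMPLE))
print(counts)
```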
21. NEED TO ASSOCIATE/RECOGNIZE/UNDERSTAND TO ORGANIZE/REPRESENT
WordNet(R): Princeton University. "About WordNet." Princeton University, 2010. <http://wordnet.princeton.edu>
25. CUSTOMERS CAN BE RICH DATA SOURCES
Type | Typical Current Use | Potential Use
Accelerometer/motion | Rotate screen, switch screen to landscape/portrait
Ambient Light | Adjust screen brightness
Barometer | Measure altitude
Geo-Location (wifi/cellular) | Location/Alerts
3-Axis Gyroscope | Rotation rate for games, VR…
Proximity | Turn off screen when phone is by your head
Touch ID fingerprint, Facial Recognition | Security
26. CUSTOMERS CAN BE RICH DATA SOURCES
Copyright (c) 2012-6 by Dark Sky Company LLC. All Rights Reserved.
28. PUBLIC USE OF OPT-IN DATA
https://www.boston.gov/departments/new-urban-mechanics/street-bump
Smarter Cities
Collaborative Intelligence
The Borg Lives!
29. DISTRIBUTED DATA AND INTELLIGENCE
Intelligence can be:
•Local to the device
•Distributed
•Aggregated
32. A WORD OF CAUTION
Sometimes the wisdom of crowds leads to unintended consequences.
33. BE CAREFUL USING COMMERCIALLY ACQUIRED DATA
According to MaxMind… This farm is home to 600,000,000 IP Addresses
Watch Out for Unintended Consequences, Especially With Big Data
34. “THAT’S A BIG HOUSE” - MEANING MAY BE DIFFERENT FOR DIFFERENT SPEAKERS
Bob Mary
Al (Capone)
Wikipedia contributors. "Alcatraz Federal Penitentiary." Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 30 Oct. 2017. Web. 8 Nov. 2017.
35. DRAW A QUARTER, TO SCALE - RESULTS DIFFER ACCORDING TO HIDDEN CONTEXT
36. IT’S (ALMOST) ALL OUT THERE
[Chart labels: axes Domain (General to Specific) and Rate of Change (Streaming to Static); Wikipedia, OpenCyc, Public OpenData, NOAA Data]
37. FIND NEW USES FOR EXISTING DATA SOURCES
Copyright (c) 2014 by Umbrellium Ltd.
No Shortage of Data
How will you create value?
38. adrian@storminsights.com
Twitter @ajbowles
Skype ajbowles
KEEP IN TOUCH
Upcoming SmartData Webinar Dates & Topics
May 10: Case Studies: Transforming Industries with AI (Manufacturing & Retail)
June 14: Natural Language Processing: From Chatbots to Artificial Understanding with Affective I/O
COMING SOON…
AGEOFREASONING.COM
BOOK, VIDEOS, PROFESSIONAL SERVICES
WWW.AGEOFREASONING.COM