The document discusses advances in digital archiving through the use of semantic web technologies like linked data and metadata annotation, as well as recent developments in artificial intelligence like deep learning that can help automate metadata generation and computer vision tasks. While semantic web approaches have seen some adoption in archiving communities, most applications have focused on search. New AI techniques now make it possible to reduce the cost of annotating archive collections at scale and develop interactive explorations of archive materials.
Digital Archiving, The Semantic Web, and Modern AI
1. Tetherless World Constellation, RPI
Jim Hendler
Tetherless World Professor of Computer, Web and Cognitive Sciences
Director, Institute for Data Exploration and Applications
Rensselaer Polytechnic Institute
http://www.cs.rpi.edu/~hendler
@jahendler (twitter)
Major talks at: http://www.slideshare.net/jahendler
2. Tetherless World Constellation, RPI
Not going to talk today about issues of
AI and society, personal data, unemployment, etc.
Wrote a book about those, happy to discuss w/people…
Today I will focus on archiving:
metadata, knowledge graphs, & new directions in AI
(or see SlideShare for “jahendler”, TEDx, …)
3. Tetherless World Constellation, RPI
The real challenge
• Today would be the 60th birthday of
my best friend growing up, Jack
Pressman (who passed away 20
years ago)
– How could we find a picture/image of
him?
• Not famous enough for Wikipedia
• Never made it into a YouTube video
• Common name (and not likely to have been
annotated)
4. Tetherless World Constellation, RPI
Finding Jack
• What would you do?
– (Class exercise)
• We’d learn what we could about him
– We know his age
– Where did he grow up
• Any of those locations have pictures with people
– Where did he go to school
• Any famous classmates he may be in picture with
– Any major accomplishments
• He wrote a well-respected book on the history of medicine (lobotomies)
• Essentially, we look for things that “link” him to
places, events, objects, times …
– This is how finding things in archives happens
• How can machines help?
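The linking strategy above can be sketched as a small graph search. This is only an illustration, not a real archive system: the triples, the identifiers (such as `person:JackPressman` and `img:001`), and the property names are all hypothetical placeholders. The idea is to start from what is known about a person, follow links to places, schools, and works, and collect any images annotated with those same things.

```python
from collections import deque

# Hypothetical annotations as (subject, property, object) triples.
# None of these identifiers come from a real collection.
TRIPLES = [
    ("person:JackPressman", "grewUpIn", "place:HometownX"),
    ("person:JackPressman", "attended", "school:SchoolY"),
    ("person:JackPressman", "wrote", "book:HistoryOfMedicineBook"),
    ("img:001", "depicts", "place:HometownX"),
    ("img:002", "depicts", "school:SchoolY"),
    ("img:003", "depicts", "place:Elsewhere"),
]


def candidate_images(start, triples, max_hops=2):
    """Breadth-first search over links, returning images within max_hops."""
    # Treat every link as traversable in both directions.
    neighbors = {}
    for s, _, o in triples:
        neighbors.setdefault(s, set()).add(o)
        neighbors.setdefault(o, set()).add(s)

    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if node.startswith("img:"):
            found.append(node)
            continue  # an image is a result, not a stepping stone
        if hops == max_hops:
            continue
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return found


images = candidate_images("person:JackPressman", TRIPLES)
print(images)
```

The images returned are candidates for a human to inspect, not matches: the search only says that a picture shares a place or institution with the person.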
5. Tetherless World Constellation, RPI
So we annotate images/videos
But the information is saved internal to the
system, generally for later search, not exposed
externally…
7. Tetherless World Constellation, RPI
On the Web -- links are critical!
HTML: Web page -> <a href="http://…"> -> Any Web Resource
RDF: URI -> URI -> URI
RDF is like the web!
8. Tetherless World Constellation, RPI
Links in the data
<mind:Person rdf:ID="Hendler">
  <mind:title rdf:resource="&jobs;Professor"/>
  <jobs:placeOfWork rdf:resource="http://www.cs.rpi.edu"/>
</mind:Person>
[Diagram: DOC1 "Hendler" is linked by mind:title to jobs:Professor and by jobs:placeOfWork to a Web page (http://www…)]
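The markup above reduces to a set of subject-predicate-object statements. A minimal sketch in plain Python, assuming a simple tuple representation rather than any particular RDF library; the `doc1:` prefix is a hypothetical stand-in for the document's own URI, and `rdf:type` is used here to express that Hendler is a `mind:Person`.

```python
# Each statement is a (subject, predicate, object) triple, as in RDF.
# "doc1:" is a hypothetical prefix standing in for the document's URI.
triples = {
    ("doc1:Hendler", "rdf:type", "mind:Person"),
    ("doc1:Hendler", "mind:title", "jobs:Professor"),
    ("doc1:Hendler", "jobs:placeOfWork", "http://www.cs.rpi.edu"),
}


def objects(subject, predicate, graph):
    """Return every object linked from `subject` by `predicate`, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def describe(subject, graph):
    """Return all (predicate, object) pairs stated about one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


print(objects("doc1:Hendler", "jobs:placeOfWork", triples))
for predicate, obj in describe("doc1:Hendler", triples):
    print(predicate, "->", obj)
```

Because the predicates and most objects are themselves names that other documents can reuse, the same lookup works across data from different sources.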
9. Tetherless World Constellation, RPI
Asserting Links in the data
<mind:Person rdf:ID="Hendler"> owl:sameAs <http://dbpedia.org/page/James_Hendler>
[Diagram: DOC2 "Hendler" is linked by mind:title to jobs:Professor and by jobs:placeOfWork to a Web page (http://www…); DOC2 "Hendler" is linked by owl:sameAs to dbpedia:Hendler, which is linked by dbpedia:occupation to dbpedia:ComputerScientist]
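One consequence of an `owl:sameAs` link is that facts stated about either name can be read as facts about both. The sketch below merges descriptions along such links in plain Python, using a small union-find rather than an OWL reasoner. The `doc2:` prefix is a hypothetical stand-in for the document's URI; the `dbpedia:` names follow the labels in the diagram.

```python
def merge_same_as(triples):
    """Group (predicate, object) facts under one representative per
    owl:sameAs cluster. Only subjects are merged in this sketch."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller name as representative.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    for s, p, o in triples:
        if p == "owl:sameAs":
            union(s, o)

    merged = {}
    for s, p, o in triples:
        if p != "owl:sameAs":
            merged.setdefault(find(s), set()).add((p, o))
    return merged


data = [
    ("doc2:Hendler", "mind:title", "jobs:Professor"),
    ("doc2:Hendler", "jobs:placeOfWork", "http://www.cs.rpi.edu"),
    ("doc2:Hendler", "owl:sameAs", "dbpedia:Hendler"),
    ("dbpedia:Hendler", "dbpedia:occupation", "dbpedia:ComputerScientist"),
]

merged = merge_same_as(data)
for subject, facts in merged.items():
    print(subject, sorted(facts))
```

After the merge, the occupation asserted by one source and the job title asserted by the other sit under a single entry, which is the practical payoff of asserting the link.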
10. Tetherless World Constellation, RPI
Led to Linked Data experimentation and growth
Billions of links in public cloud – across many sectors
15. Tetherless World Constellation, RPI
Extended to video markup (segments)
A particular scene from
a movie…
The story that ran on
NHK television from
0847-0903 on
2001-09-11 (GMT + 9)
2008
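A time span like the broadcast slot above can be turned into a linkable segment. The sketch below converts the local times (GMT+9) to UTC and builds a temporal fragment in the `#t=start,end` form used by the W3C Media Fragments URI syntax, with offsets in seconds. The base URI and the assumption that the recording begins at 08:00 local time are hypothetical, chosen only to make the arithmetic concrete.

```python
from datetime import datetime, timedelta, timezone

LOCAL = timezone(timedelta(hours=9))  # GMT+9, as stated on the slide


def segment_fragment(base_uri, recording_start, seg_start, seg_end):
    """Build a temporal media-fragment URI for one segment.

    Offsets are whole seconds from the start of the recording.
    """
    start_s = int((seg_start - recording_start).total_seconds())
    end_s = int((seg_end - recording_start).total_seconds())
    if not 0 <= start_s < end_s:
        raise ValueError("segment must lie after the recording start")
    return f"{base_uri}#t={start_s},{end_s}"


# Hypothetical: the archived file starts at 08:00 local time.
recording_start = datetime(2001, 9, 11, 8, 0, tzinfo=LOCAL)
seg_start = datetime(2001, 9, 11, 8, 47, tzinfo=LOCAL)
seg_end = datetime(2001, 9, 11, 9, 3, tzinfo=LOCAL)

# Hypothetical base URI for the archived broadcast.
uri = segment_fragment(
    "http://example.org/archive/broadcast.mp4",
    recording_start, seg_start, seg_end,
)
print(uri)
print(seg_start.astimezone(timezone.utc).isoformat())
```

Storing the segment as a URI means it can be the subject of annotations in the same way a whole video or a web page can.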
22. Tetherless World Constellation, RPI
Impressive results
Google finds embedded metadata on >30% of its crawl – Guha, 2015
Google “knowledge vault” reported to have over 5 billion “facts” (links)
27. Tetherless World Constellation, RPI
What about image/video archiving?
• Despite this growth, still mostly
“experimental” in the archiving
community
– Especially image/video
• Two main impediments
– High cost of annotating collections with
enhanced metadata
– How does doing the annotation increase
the “value” of a collection?
• Beyond search
28. Tetherless World Constellation, RPI
Recent major breakthrough in
automating computer vision
“phase transition” in capabilities of neural networks
w/machine power
34. Tetherless World Constellation, RPI
But recent “action” descriptions are doing better
than question answering
A very promising direction for
jumpstarting (semi)-automated
annotation
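One way to read "(semi)-automated" is as a triage step: labels proposed by a model are accepted automatically when confidence is high, sent to a human reviewer when it is middling, and dropped otherwise. The sketch assumes a hypothetical vision model has already produced `(label, confidence)` pairs; the thresholds and example labels are illustrative and not taken from the talk.

```python
def triage(predictions, accept_at=0.9, review_at=0.5):
    """Split model-proposed labels into auto-accepted and needs-review.

    `predictions` is an iterable of (label, confidence) pairs with
    confidence in [0, 1]. Labels below `review_at` are discarded.
    """
    if not 0.0 <= review_at <= accept_at <= 1.0:
        raise ValueError("expected 0 <= review_at <= accept_at <= 1")
    accepted, review = [], []
    for label, confidence in predictions:
        if confidence >= accept_at:
            accepted.append(label)
        elif confidence >= review_at:
            review.append(label)
    return accepted, review


# Hypothetical model output for one archive image.
proposed = [("person", 0.97), ("riding a bicycle", 0.70), ("dog", 0.20)]
accepted, review = triage(proposed)
print("accepted:", accepted)
print("for review:", review)
```

The thresholds set the trade-off between annotation cost and error rate: raising `accept_at` sends more items to people, lowering it trusts the model more.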
35. Tetherless World Constellation, RPI
Moving from search to exploration
(Mei Si, 2017)
Using “narrative” technology to turn our campus
archive into an interactive “story”
37. Tetherless World Constellation, RPI
Summary
Semantic Web (Linked Data) has been a small but growing
presence in the archiving world
- increasing use in library and museum communities
- increasing interest in collection management
- increasing interest in collection sharing
Semantic Technologies are being deployed at scale in the
larger Web world
- still primarily for search (ad match) and social
networking (ad match)
New AI technologies have the potential to overcome some of
the key problems
- reducing the cost of metadata generation/annotation
- making archives “alive” and explorable