From Text and Data to Knowledge: Via Semantic Wikis
The Social Semantic Web in the Small
Jesse Wang
The Bottleneck of AI is Knowledge Acquisition
Human Intelligence
Computer Intelligence
COMPUTER INTELLIGENCE IS IN THE CONNECTIONS
Connecting both Information and People
[Figure: timeline chart. Axis labels: "Connections between people" and "Connections between Information". Decades: 1980 - 1990, 1990 - 2000, 2000 - 2010, 2010 - 2020, 2020 - 2030. Era labels: PC Era, Web 1.0, Web 2.0, Web 3.0, Web 4.0, The PC, The Internet, The Web, Social Web, Semantic Web, Intelligent Web, Web OS.]
Technologies plotted: Email, Social Networking, Groupware, Javascript, Weblogs, Databases, File Systems, HTTP, Keyword Search, USENET, Wikis, Websites, Directory Portals, RSS, Widgets, PC’s, Office 2.0, XML, RDF, SPARQL, AJAX, FTP, IRC, SOAP, Mashups, File Servers, Social Media Sharing, Lightweight Collaboration, ATOM, Semantic Search, Semantic Databases, Distributed Search, Intelligent personal agents, Java, SaaS, Flash, OWL, HTML, SGML, SQL, Gopher, P2P, Windows, MacOS, SWRL, OpenID, BBS, MMO’s, VR
At Multiple Levels of Understanding
Signal entity (Words)
Signal form (Syntax)
Signal semantics (Concepts)
Categories (taxonomy)
Statements
Models
Decision-making
HOW DO WE CAPTURE ALL?
At least, the semantics?
Two Paths for Semantics (>>KB Construction)
 “Bottom-Up”
– Add semantic metadata to pages and databases all over the Web
• Alternatively, train models to extract this information (machine-assisted)
– Every website becomes semantic
• except for those that are untagged, not covered by a trained model, or extracted with errors
 “Top-Down”
– Experts build models and rules for semantics
– Create services that provide this as an overlay to the non-semantic Web
– Every website becomes semantic
• except for those not covered
-- Alex Iskold
Five Approaches to Semantics
 Tagging
 Statistics
 Linguistics
 Semantic Web
 Artificial Intelligence
The Tagging Approach
 Pros
– Easy for users to add and read tags
– Tags are just strings
– No algorithms or ontologies to deal with
– No technology to learn
 Cons
– Easy for users to add and read tags
– Tags are just strings
– No algorithms or ontologies to deal with
– No technology to learn
 Technorati
 Del.icio.us
 Flickr
 Wikipedia
 YouTube
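The slide lists the same properties under both Pros and Cons. One possible reading is sketched below: because tags are plain strings, lookup is trivial, but differently spelled tags for the same thing never meet. The item names, the tags, and `find_by_tag` are hypothetical and do not describe how any of the sites listed above work.

```python
# Hypothetical tagged items; tags are plain strings with no shared meaning.
ITEMS = {
    "photo1": {"nyc", "skyline"},
    "photo2": {"NewYork", "bridge"},
    "photo3": {"new-york-city", "skyline"},
}

def find_by_tag(tag, items):
    """Return sorted item names whose tag set contains exactly this string."""
    return sorted(name for name, tags in items.items() if tag in tags)

# Exact string match only: the three spellings of New York are unrelated.
NYC_HITS = find_by_tag("nyc", ITEMS)
SKYLINE_HITS = find_by_tag("skyline", ITEMS)
```

No algorithm or ontology is involved, which keeps the code short and also means nothing connects "nyc" to "NewYork".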
The Statistical Approach
 Pros:
– Pure mathematical algorithms
– Massively scalable with good training data
– Language independent
 Cons:
– No understanding of the content
– Hard to craft good queries
– Best for finding really popular things; not good at finding needles in haystacks
– Limited by data (esp. quality training data)
– Not great for sparse structured data with strong inherent semantics
 Google
 Lucene
 Autonomy
 Farecast (Bing Travel)
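As a rough illustration of the statistical approach, the sketch below ranks documents against a query with a simple TF-IDF score. The toy corpus, the function names, and the smoothing formula are assumptions made for illustration; none of the systems listed above is claimed to work this way.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; no linguistic analysis at all."""
    return text.lower().split()

def tfidf_rank(query, docs):
    """Return (doc_index, score) pairs sorted by descending TF-IDF score."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for i, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            # Smoothed inverse document frequency (an illustrative choice).
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            score += (tf[term] / len(tokens)) * idf
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy corpus.
DOCS = [
    "paris is the capital of france",
    "berlin is the capital of germany",
    "wikis are edited by many people",
]
RANKED = tfidf_rank("capital france", DOCS)
```

The score depends only on word overlap and word frequency. The code has no notion that Paris is a city or that "capital of" is a relationship, which matches the "no understanding of the content" point above.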
The Linguistic Approach
 Pros:
– Almost-true language understanding
– Extract knowledge from text
– Best for searching for particular facts or relationships
– More precise queries
 Cons:
– Computationally intensive
– Difficult to scale
– Many special cases and other errors
– Language-dependent
 Powerset
 Hakia
 Inxight, Attensity, and others…
The Semantic Web Approach
 Pros:
– More precise queries
– Smarter apps with less work
– Not as computationally intensive
– Share & link data between apps
– Works for both unstructured and structured data
 Cons:
– Lack of tools
– Difficult to scale
– Who makes all the metadata?
 Radar Networks
 DBpedia Project
 Metaweb (Freebase)
The Artificial Intelligence Approach
 Pros:
– Smart in narrow domains
– Answer questions intelligently
– Reasoning and learning
 Cons:
– Computationally intensive
– Difficult to scale
– Extremely hard to program
– Does not work well outside of narrow domains
– Training takes a lot of work
 Cycorp
 AURA (Project Halo)
The Approaches Compared
[Figure labels: Make the software smarter, Make the Data Smarter; approaches positioned: Statistics, Linguistics, Semantic Web, A.I., Tagging]
In Practice
[Figure labels: Tagging, Semantic Web, Statistics, Linguistics, Artificial intelligence]
From Tagging to AI
[Figure labels: Data Structure, Intelligence]
The Semantic Web is a Key Enabler
 Moves the “intelligence” out of applications, into the data
 Data need special structures
 Data becomes self-describing; the meaning of data becomes part of the data
 Apps can become smarter with less work, because the data carries knowledge about what it is and how to use it
 Data can be shared and linked more easily
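To make the idea of self-describing data concrete, here is a minimal sketch in which each value travels with its property name and datatype, so one generic function can format any record without application-specific knowledge. The statement layout, the datatype names, `FORMATTERS`, and `render` are hypothetical, and the values are placeholders; this is not RDF and not the API of any Semantic Web library.

```python
from datetime import date

# Each statement is (subject, property, value, datatype). The datatype tag is
# what makes the data "self-describing" in this toy model (hypothetical layout).
# The numeric and date values are placeholders, not verified facts.
STATEMENTS = [
    ("Berlin", "capital_of", "Germany", "page"),
    ("Berlin", "population", 3500000, "number"),
    ("Berlin", "founded", date(1237, 1, 1), "date"),
]

# Generic formatters keyed by datatype, not by application or by property.
FORMATTERS = {
    "page": lambda v: "[[" + v + "]]",
    "number": lambda v: format(v, ","),
    "date": lambda v: v.isoformat(),
}

def render(statements):
    """Format every statement using only the metadata carried by the data."""
    lines = []
    for subject, prop, value, datatype in statements:
        lines.append(subject + " " + prop + " " + FORMATTERS[datatype](value))
    return lines

RENDERED = render(STATEMENTS)
```

A second application could reuse the same statements with its own formatters, since nothing about the meaning of a value is hidden inside the first application's code.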
The Semantic Web = Open Database Layer for the Web
[Figure labels: User Profiles, Web Content, Data Records, Apps & Services, Ads & Listings; Open Data Mappings, Open Data Records, Open Rules, Open Ontologies, Open Query Interfaces]
And The Web IS the Database!
[Figure labels: Application A, Application B]
BUT THERE IS STILL SOMETHING MISSING
In Every Part or Layer of the Semantic Web, We Need
Now a Complete Web
Crowd Wisdom To Best Map Human Knowledge for Humans
Clear Semantics for Machines to Understand Knowledge
Semantic Wikis: the Social Semantic Web in Action!
What is a Wiki? A Key Feature of Wikis is
This distinguishes wikis from other publication tools
Consensus in Wikis Comes from
 Collaboration
– ~17 edits/page on average in Wikipedia (with high variance)
– Wikipedia’s Neutral Point of View
 Convention
– Users follow customs and conventions to engage with articles effectively
Software Support Makes Wikis Successful
 Trivial to edit by anyone
 Tracking of all changes, one-step rollback
 Every article has a “Talk” page for discussion
 Notification facility allows anyone to “watch” an article
 Sufficient security on pages, logins can be required
 A hierarchy of administrators, gardeners, and editors
 Software Bots recognize certain kinds of vandalism and auto-revert, or recognize articles that need work, and flag them for editors
Success of Wikis
[Figure: actual number of articles on en.wikipedia.org (thick blue line) compared with a Gompertz model that leads eventually to a maximum of about 4.4 million articles (thin green line)]
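For readers unfamiliar with the Gompertz model mentioned in the caption, the sketch below evaluates the standard Gompertz curve, a * exp(-b * exp(-c * t)), which rises slowly, accelerates, and then levels off toward the asymptote a. Only the asymptote of about 4.4 million comes from the caption; the shape parameters `B` and `C` and the time axis are hypothetical and are not fitted to Wikipedia data.

```python
import math

def gompertz(t, a, b, c):
    """Gompertz curve a * exp(-b * exp(-c * t)); it approaches a as t grows."""
    return a * math.exp(-b * math.exp(-c * t))

ASYMPTOTE = 4.4e6   # maximum article count quoted in the caption
B, C = 15.0, 0.4    # hypothetical shape parameters, NOT fitted to real data

# Evaluate the curve at 31 evenly spaced, unitless time steps.
CURVE = [gompertz(t, ASYMPTOTE, B, C) for t in range(0, 31)]
```

With positive `b` and `c` the curve never exceeds the asymptote, which is why such a model predicts that article growth eventually saturates.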
Summary: What Wiki Is Really About
Quick and Easy – No download
Layered Community Authoring
Interlinked Hierarchical Content
Revision Control
Notification
What is a Semantic Wiki
 A wiki that has an underlying model of the knowledge described in its pages.
 Allows users to make their knowledge explicit and formal
 Semantic Web Compatible
Combining Human Knowledge and Data Structures
Wikis for Metadata
Metadata for Wikis
Basics of Semantic Wikis
 Still a wiki, with regular wiki features
– E.g. Category/Tags, Namespaces, Title, Versioning, ...
 Typed Content
– E.g. Page/Card, Date, Number, URL/Email, String, …
 Typed Links
– E.g. “capital_of”, “contains”, “born_in”…
 Querying Interface Support
– E.g. “[[Category:Person]] [[Age::<30]]”
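The basics above (categories, typed content, typed links, and a query interface) can be sketched with a toy evaluator for queries written like the example on the slide. The page data, `matches`, and `ask` are hypothetical. The evaluator handles only category conditions and simple property comparisons, and it treats `<` as strictly less than, which is an assumption; it is not the implementation of any real semantic wiki engine.

```python
import re

# Hypothetical wiki pages: each has categories plus typed property values.
PAGES = {
    "Alice": {"categories": {"Person"}, "props": {"Age": 28, "born_in": "Paris"}},
    "Bob": {"categories": {"Person"}, "props": {"Age": 41, "born_in": "Berlin"}},
    "Paris": {"categories": {"City"}, "props": {"capital_of": "France"}},
}

# Extracts the text inside each [[...]] condition of a query string.
CONDITION = re.compile(r"\[\[(.+?)\]\]")

def matches(page, condition):
    """Check one [[...]] condition against a page (toy subset of the syntax)."""
    if condition.startswith("Category:"):
        return condition[len("Category:"):] in page["categories"]
    prop, _, wanted = condition.partition("::")
    if prop not in page["props"]:
        return False
    actual = page["props"][prop]
    if wanted.startswith("<"):
        return actual < float(wanted[1:])
    if wanted.startswith(">"):
        return actual > float(wanted[1:])
    return str(actual) == wanted

def ask(query, pages):
    """Return sorted titles of pages satisfying every condition in the query."""
    conditions = CONDITION.findall(query)
    return sorted(
        title for title, page in pages.items()
        if all(matches(page, c) for c in conditions)
    )

YOUNG_PEOPLE = ask("[[Category:Person]] [[Age::<30]]", PAGES)
BORN_IN_PARIS = ask("[[born_in::Paris]]", PAGES)
```

Because `Age` is stored as a number and `born_in` as a link to another page, the same query mechanism can compare values and follow relationships, which plain wiki text does not allow.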
Advanced Semantic Wiki Features
 Semantic forms or templates
 Auto-completion based on semantics
 Powerful visualizations based on semantics/structures/types
 Rules and reasoning support
 Advanced search and queries (faceted search, SPARQL, etc.)
 Semantic notifications (personalized information filtering)
 Import and Export of Semantic Data
 Data Integration: identification, disambiguation, merging, trust, security/privacy, …
Characteristics of Semantic Wikis
What is the Promise of Semantic Wikis?
 Semantic Wikis facilitate Consensus over Data (Knowledge)
 Combine low-expressivity data authorship with the best features of traditional wikis
 User-governed, user-maintained, user-defined
 Easy to use as an extension of text authoring
One Key Helpful Feature of Semantic Wikis
Semantic Wikis are “Schema-Last”
Databases require DBAs and schema design;
Semantic Wikis develop and maintain the schema in the wiki
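A minimal sketch of the "schema-last" idea, under the assumption that users simply annotate pages and the schema is derived from whatever annotations exist. The annotation tuples, the placeholder values, and `infer_schema` are hypothetical and do not describe any specific wiki engine.

```python
from collections import defaultdict

# Hypothetical annotations added by wiki users in whatever order they like:
# (page, property, value). No schema was declared beforehand.
ANNOTATIONS = [
    ("Alice", "Age", 28),
    ("Alice", "born_in", "Paris"),
    ("Bob", "Age", 41),
    ("Bob", "employer", "Acme"),
]

def infer_schema(annotations):
    """Derive a property -> set-of-type-names map from existing annotations."""
    schema = defaultdict(set)
    for _page, prop, value in annotations:
        schema[prop].add(type(value).__name__)
    return dict(schema)

SCHEMA = infer_schema(ANNOTATIONS)

# A new property appears simply because someone used it; the schema follows.
# The population figure is a placeholder value.
SCHEMA_LATER = infer_schema(
    ANNOTATIONS + [("Bob", "born_in", "Berlin"), ("Paris", "population", 2100000)]
)
```

In a schema-first database the `population` column would have to be designed before any row could use it; here the order is reversed, and the community can refine the derived schema afterwards.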
Great Candidate for Knowledge Acquisition
 Combining both unstructured and semi-structured data
 High connectivity on both information and social dimensions
 Collaboration with sophisticated software support
 Expected low cost for crowd-sourcing
 Evolving category and template systems
 But…
BUT – Plain Wikis Are Not Good Enough for Deep Knowledge Acquisition
Knowledge is represented MOSTLY in unstructured and semi-structured ways
• Plain text
• Templates
• Infoboxes
• Tables
• Section headers
• Links
• References
• Redirects
• …
Software/Feature Enhancements Are Needed
Quick and easy way to view and edit schema
Machine assistance (NLP, Auto-suggest…)
Better visualizations with structured data
More user layers for better KB construction
Better targeted (semantic) notifications
Best Bet for Knowledge Acquisition?
 Knowledge Acquisition (K.A.) is the well-known Artificial Intelligence problem
– AI authoring is too expensive, too slow, not scalable
 Three Possible Solutions
– Automatic Machine Parsing (e.g. NELL, ReVerb)
• Quality (depth) not good enough for textbook sentences
• Error rates are too high
• Still need humans in the loop for training data
– Crowd-Sourced Authoring (e.g. Amazon Mechanical Turk, AMT)
• Biology and Knowledge Engineering expertise is difficult to get
• Mechanical Turk uses individuals, but the Knowledge Entry tasks appear to require coordination, judgment, discussion, and working together
– Social Authoring and Crowdsourcing with Intelligent Software Assistance
• Wikipedia showed this could work for text
• Semantic Wiki software R&D is needed to make it work for more structured knowledge
With All These Features…
Effective Knowledge Acquisition via Semantic Wikis
Combine the strengths of humans and machines
Connecting Humans and Machines
High quality at low cost
Conclusion: To Bridge Machine and Human Intelligence
To Dive Into Social Semantic Web
THANK YOU!
Credits: some slides are originally from the following people, with little or no modification:
Nova Spivack
Denny Vrandecic
Mark Greaves
Bao Jie

Semantic Wiki, Great Candidate for Knowledge Acquisition
