A presentation about the state of the collaborative semantic web, including:
- What?
- Why?
- Where do we stand?
- A case study on Metaweb's Freebase project
3. Object‐Oriented Representation
“John Smith is a five‐foot‐tall male who weighs one‐hundred eighty pounds. He’s forty‐two years old and he loves dogs.”
String of characters ‐ useful to people
sortByHeight(users);
isADude("John Smith");
Structured data in object ‐ useful to computers
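A minimal Ruby sketch of the same contrast, holding the facts about John Smith as a structured object. The `User` struct, its field names and units, the second user, and the `sort_by_height` and `dude?` helpers are assumptions made for illustration; they stand in for the slide's `sortByHeight` and `isADude`, which the deck does not define.

```ruby
# Structured form: every fact is an addressable field.
# Field names and units are illustrative assumptions.
User = Struct.new(:name, :height_in, :sex, :weight_lb, :age, :likes, keyword_init: true)

john = User.new(name: "John Smith", height_in: 60, sex: :male,
                weight_lb: 180, age: 42, likes: ["dogs"])
# Hypothetical second user, added only so there is something to sort.
jane = User.new(name: "Jane Doe", height_in: 66, sex: :female,
                weight_lb: 140, age: 35, likes: ["cats"])

# Stand-in for the slide's sortByHeight(users): shortest first.
def sort_by_height(users)
  users.sort_by(&:height_in)
end

# Stand-in for the slide's isADude("John Smith"): look the user up by name.
def dude?(users, name)
  user = users.find { |u| u.name == name }
  !user.nil? && user.sex == :male
end

users = [jane, john]
puts sort_by_height(users).map(&:name).inspect
puts dude?(users, "John Smith")
```

Neither helper could be written against the sentence form without first parsing English.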
4. Object‐Oriented Representation
class Dog<Mammal
  @home, @age, @human
  def goHome
    go(@home);
  end
end

class Place
  @latlong, @elevation,
  @address, @name
  def isBelowSeaLevel
    @elevation < 0;
  end
end

class Human<Mammal
  @home, @age, @pets
  @spouse, @friends
  def divorce
    @spouse = nil;
  end
end

Diagram labels: Home; Pet; Home; Human; Spouse; Friends

O‐O instance variables and methods represent structured semantic relationships between pieces of information.
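The slide's classes are pseudocode. A runnable Ruby rendering might look like the sketch below. The `attr_accessor` declarations, the snake_case method names, the constructors, and the sample instances are editorial assumptions. The slide's `go(@home)` is not defined in the deck, so `go_home` here simply returns the home.

```ruby
class Mammal
  attr_accessor :home, :age
end

class Place
  attr_accessor :latlong, :elevation, :address, :name

  def initialize(name:, elevation:)
    @name = name
    @elevation = elevation
  end

  # Slide: isBelowSeaLevel
  def below_sea_level?
    @elevation < 0
  end
end

class Human < Mammal
  attr_accessor :pets, :spouse, :friends

  def initialize
    @pets = []
    @friends = []
  end

  def divorce
    @spouse = nil
  end
end

class Dog < Mammal
  attr_accessor :human

  # Slide: goHome. Returns the Place instead of calling the undefined go().
  def go_home
    @home
  end
end

# Hypothetical instances; the place name and elevation are made up.
valley = Place.new(name: "Example Valley", elevation: -10)
dan = Human.new
rex = Dog.new
dan.home = valley
dan.pets << rex
rex.human = dan
rex.home = dan.home

puts rex.go_home.name
puts rex.human.home.below_sea_level?
```

Following `rex.human.home` walks two of the relationships from the diagram (Human, then Home) without any text parsing.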
7. The (semantic) Web Tomorrow?
Diagram labels: Servers; User Computer; Structured Data (OWL/RDF); Document Data (HTML/CSS); Data Aggregator/Visualizer; Hyperlinks (old WWW links)
12. Why? Context‐Aware AI
Today’s AI is limited by the domain‐specificity of its input data.
Example: Image Analysis
Image labels: Trees; Sky
Context‐Aware: Mine the semantic graph for heuristics and clues
“You’re in the mountains near Aspen, CO. It is the fall. Your friends Taifur and Rachel are at the same location. Etc.”
People labels: Rachel; Taifur; Dan
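One way to picture "mining the semantic graph" is as lookups over subject/predicate/object triples. The Ruby sketch below is a toy under stated assumptions: the photo id, the predicate names, the month, and the extra friend "Sam" are all hypothetical, and a real system would query a shared graph instead of a hard-coded array.

```ruby
# A toy semantic graph as [subject, predicate, object] triples.
# Every fact and predicate name here is hypothetical.
GRAPH = [
  ["photo1", "takenBy", "Dan"],
  ["photo1", "takenAt", "Aspen, CO"],
  ["photo1", "takenIn", "October"],
  ["Aspen, CO", "terrain", "mountains"],
  ["October", "season", "fall"],
  ["Dan", "friendOf", "Taifur"],
  ["Dan", "friendOf", "Rachel"],
  ["Dan", "friendOf", "Sam"],
  ["Taifur", "locatedAt", "Aspen, CO"],
  ["Rachel", "locatedAt", "Aspen, CO"],
  ["Sam", "locatedAt", "Boston, MA"]
].freeze

# Every object o such that (subject, predicate, o) is in the graph.
def objects(subject, predicate)
  GRAPH.select { |s, pr, _| s == subject && pr == predicate }.map(&:last)
end

# Collect context clues an image analyzer could use alongside pixel labels.
def context_for(photo)
  owner = objects(photo, "takenBy").first
  place = objects(photo, "takenAt").first
  month = objects(photo, "takenIn").first
  nearby = objects(owner, "friendOf").select do |friend|
    objects(friend, "locatedAt").include?(place)
  end
  {
    place: place,
    terrain: objects(place, "terrain").first,
    season: objects(month, "season").first,
    friends_nearby: nearby
  }
end

p context_for("photo1")
```

The analyzer then only has to match faces against the short `friends_nearby` list, not against everyone.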
14. Challenges ‐ Motivation/Critical Mass
Value comes from ubiquity,
and ubiquity comes from value.
How do we encourage adoption of technology
that does not yet provide value?
More importantly, how will it provide value to content producers?
How is giving away your data’s meaning (your secret sauce)
instead of presenting it alongside ads valuable?
Should semantic feeds be monetized?
“Information As A Service”
20. Bottom‐Up: Publishing Standards
RDF and OWL standards
•W3C‐sponsored standards for defining semantic relationships and resources
•Powerful, but complex and hard for humans to read/create
•No motivation for developers to create
•No W3C‐sponsored universal ontologies
RDF Data Node
  <rdf:Description
    rdf:about="http://.../DansCar">
    <car:color>Red</car:color>
    <car:make>Honda</car:make>
    <car:year>1999</car:year>
  </rdf:Description>

OWL Ontology
  owl:Class rdf:ID="Car"
  rdfs:subClassOf rdf:resource="#Vehicle"
  rdfs:subClassOf [a owl:Restriction;
    owl:cardinality "4"^^xsd:nonNegativeInteger;
    owl:onProperty <#Wheel> ]

Diagram (RDF Data Node): Dan’s Car with properties color, make, year (1999)
Diagram (OWL Ontology): Car isA Vehicle; Car hasA Color; Car HasMultiple(4)
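Stripped of syntax, the data node is a set of triples and the ontology is a set of rules about them. The Ruby sketch below uses plain arrays and hashes and no RDF library. The `ex:` prefix, the `rdf:type` triple, and the four wheel resources are assumptions added for illustration. The cardinality check is a closed-world simplification: an OWL reasoner works under the open-world assumption and would not treat missing wheels as a validation failure.

```ruby
# The slide's data node as [subject, predicate, object] triples.
# The rdf:type triple and the wheel resources are illustrative additions.
triples = [
  ["ex:DansCar", "rdf:type", "ex:Car"],
  ["ex:DansCar", "car:color", "Red"],
  ["ex:DansCar", "car:make", "Honda"],
  ["ex:DansCar", "car:year", "1999"],
  ["ex:DansCar", "ex:Wheel", "ex:wheel1"],
  ["ex:DansCar", "ex:Wheel", "ex:wheel2"],
  ["ex:DansCar", "ex:Wheel", "ex:wheel3"],
  ["ex:DansCar", "ex:Wheel", "ex:wheel4"]
]

# Ontology fragment: Car is a subclass of Vehicle, with exactly 4 Wheel values.
SUBCLASS_OF = { "ex:Car" => "ex:Vehicle" }.freeze
CARDINALITY = { "ex:Car" => { "ex:Wheel" => 4 } }.freeze

# All classes of a resource, following subClassOf upward.
def classes_of(triples, resource)
  direct = triples.select { |s, pr, _| s == resource && pr == "rdf:type" }.map(&:last)
  direct.flat_map do |klass|
    chain = [klass]
    chain << SUBCLASS_OF[chain.last] while SUBCLASS_OF.key?(chain.last)
    chain
  end.uniq
end

# Closed-world check: does the resource have exactly the required number
# of values for every restricted property of its classes?
def satisfies_cardinality?(triples, resource)
  classes_of(triples, resource).all? do |klass|
    (CARDINALITY[klass] || {}).all? do |property, count|
      triples.count { |s, pr, _| s == resource && pr == property } == count
    end
  end
end

puts classes_of(triples, "ex:DansCar").inspect
puts satisfies_cardinality?(triples, "ex:DansCar")
```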
25. Metaweb Freebase Statistics
•5.3 million topics today
•Growing by ~15,000/month
•Pulled from public data
•By comparison, Wikipedia has 2.64 million English articles
•25,379 users today.
•Growing by 600‐800/month
•Freebase Launch, March 2007
•LLC Founded, July 2005
26. The Freebase Approach
Diagram labels: Creative Commons Attribution License; Content Publishers (WWW); Open API (MQL); RDF API; Node Editor GUI; App Developers; Data Modelers; Expert Users & Dataset Owners; Casual Collaborators; Existing Datasets (WWW)
29. Freebase Community
Community channels:
- App Developers List
- Data Modelers List
- IRC Chat (open to all)
- Discussion Threads on Individual Topics
Diagram labels: App Developers; Data Modelers; Expert Users & Dataset Owners; Casual Collaborators
•No Central Forum
•No Backlog of Mailing List
•No Friends
•No Private Messages
30. Freebase Community Tools
“Acre, the Freebase application development platform, lets anyone mashup Freebase data using Javascript and have it hosted for free.” ‐ Shawn Simister, developer
“We have been working hard recently to provide bulk import tools for Freebase. While such tools exist internally, the reconciliation process has thus far been too complicated for public release.” ‐ Brian Culbertson, Metaweb Engineer
“[Data Modeling is hard because new schemas are a slow process, and they can break users’ code. Let’s have a “Sloppy Freebase” that allows users to enter unstructured data until new schemas are defined.]” ‐ Jack Alves, former Metaweb Director of Engineering
“As a programmer, I feel that I'm most effective when I'm contributing large data sets … facilities built into Freebase that let users upload lists of topics are limited to specific situations.” ‐ Shawn Simister
Diagram labels: Employees; App Developers; Data Modelers; Expert Users & Dataset Owners; Casual Collaborators
31. “Sloppy” Data Modeling
Currently, Freebase users cannot submit data if there is not already a data structure + ontology built for that data type.
“Sloppy” data creation allows users to create their own data types, which will later be cleaned and standardized.
“[Data Modeling is hard because new schemas are a slow process, and they can break users’ code. Let’s have a “Sloppy Freebase” that allows users to enter unstructured data until new schemas are defined.]” ‐ Jack Alves, former Metaweb Director of Engineering
User‐Generated, Semi‐structured “sloppy” data (from Casual Collaborators):
- Bob Jones, major: English
- John Green, focus: Sociology
- Eric Bradley, studying: CS
Automated Data Cleaner‐Upper
Clean, structured data (RDF/OWL, to semantic web):
- Bob Jones, area of study: English
- John Green, area of study: Sociology
- Eric Bradley, area of study: CS
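The "Automated Data Cleaner‐Upper" step can be sketched in Ruby as a rename of user‐invented property names to one standardized name. The `CANONICAL` synonym table and the record layout are assumptions for illustration; the deck does not say how the mapping would be built, and a real cleaner would have to curate or learn it.

```ruby
# Hypothetical synonym table: sloppy property name => standardized name.
CANONICAL = {
  "major" => "area of study",
  "focus" => "area of study",
  "studying" => "area of study"
}.freeze

# The slide's three sloppy records, in an assumed layout.
sloppy = [
  { name: "Bob Jones", properties: { "major" => "English" } },
  { name: "John Green", properties: { "focus" => "Sociology" } },
  { name: "Eric Bradley", properties: { "studying" => "CS" } }
]

# Rename known synonyms; leave unknown property names untouched so that
# no user data is dropped before a schema exists for it.
def clean(records)
  records.map do |record|
    props = record[:properties].each_with_object({}) do |(key, value), out|
      out[CANONICAL.fetch(key.downcase, key)] = value
    end
    { name: record[:name], properties: props }
  end
end

clean(sloppy).each do |record|
  puts "#{record[:name]}: #{record[:properties].inspect}"
end
```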
32. Conclusion
We are here.
• Progress
  • RDF/OWL
  • Freebase ‐ 5.3 mil. topics!
  • Dapper, Yahoo! Pipes, other data abstractors
  • APIs out the wazoo
• Issues
  • Adoption
  • Privacy/Security
  • Intellectual Property
• We’re good at making content, but we suck at mining it and describing it.