Paper presentation at the UK Computational Intelligence workshop 2003, Bristol. This paper reviews the current state of the art of machine learning applied to the Semantic Web. It looks at the Semantic Web and its languages, including RDF and OWL, from a machine learning perspective. Trends in the Semantic Web are mentioned throughout and the relationship with Web Services is examined. Applications are discussed with recent examples and pointers to data sets. Finally, the emerging field of Semantic Web Mining is introduced.
A review of the state of the art in Machine Learning on the Semantic Web
1. A review of the state of the art in Machine Learning on the Semantic Web
Simon Price
University of Bristol
http://www.cs.bris.ac.uk/~price
2. Outline
• Introduction to the Semantic Web
– Semantic Web layers
– URI, RDF(S), OWL
– Web Services and the Semantic Web
• Applications of Machine Learning
– Creating the Semantic Web
– Using the Semantic Web
• Summary and pointers to further info.
4. Definition
"The Semantic Web is the representation of data
on the World Wide Web. It is a collaborative
effort led by W3C with participation from a large
number of researchers and industrial partners. It
is based on the Resource Description Framework
(RDF), which integrates a variety of applications
using XML for syntax and URIs for naming."
5. Uniform Resource Identifier (URI)
• URI addressing scheme
http://...
ftp://...
mailto:..., etc.
• Each URI points to a resource (or a specific point
within a resource)
• Typically, the resource is somewhere on the Web but
it may be a non-network retrievable entity
e.g.
- human beings, corporations, bound books in a library,
- concepts, topics, relations, ...
6. URIs - Good news. Bad news.
• Good news: decentralisation
– anyone can create a URI
– allows rapid growth of Web
• Bad news: decentralisation
– no centralised register or clearing house
– multiple URIs can refer to same entity
– testing for equality (or equivalence) poses interesting problems
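The equality problem can be illustrated with a minimal sketch. It assumes a purely syntactic normalisation (lower-casing the scheme and host, dropping a default port); the helper names `normalise_uri` and `same_uri` are hypothetical. Note that no amount of string normalisation can detect that two unrelated URIs name the same real-world entity, which is the harder half of the problem.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports that may be dropped without changing the resource identified.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def normalise_uri(uri):
    """Hypothetical helper: apply a few purely syntactic normalisations.

    Any user-info part of the URI is ignored in this sketch.
    """
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it differs from the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    # Treat an empty path on a network URI as "/".
    path = parts.path or ("/" if host else "")
    return urlunsplit((scheme, host, path, parts.query, parts.fragment))

def same_uri(a, b):
    """True when two URIs are identical after syntactic normalisation."""
    return normalise_uri(a) == normalise_uri(b)

a = "HTTP://www.Example.org:80/index.html"
b = "http://www.example.org/index.html"
c = "http://example.org/staff/jsmith"  # hypothetical second URI

print(same_uri(a, b))  # syntactic variants of one URI
print(same_uri(b, c))  # different strings; may or may not denote the same entity
```

Two URIs that differ after normalisation may still refer to the same entity, so a negative result here is inconclusive rather than a proof of difference.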
7. Resource Description Framework (RDF)
• A language of URI triples.
• An RDF statement has the form:
{ subject, predicate, object }
• e.g. "http://www.example.org/index.html has a creator whose
value is the literal John Smith" could be represented as a plain
text triple:
subject http://www.example.org/index.html
predicate http://purl.org/dc/elements/1.1/creator
object John Smith
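As a rough illustration, the statement above can be held in a three-field record and written out in the line-based N-Triples syntax. The `Triple` type and `to_ntriples` function are hypothetical names introduced for this sketch, and only the simplest literal escaping is handled.

```python
from collections import namedtuple

# A statement is just three fields: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

def to_ntriples(triple, object_is_literal=True):
    """Serialise one triple as an N-Triples line (sketch: minimal escaping only)."""
    if object_is_literal:
        escaped = triple.object.replace("\\", "\\\\").replace('"', '\\"')
        obj = '"%s"' % escaped
    else:
        # URI objects are wrapped in angle brackets, like subjects and predicates.
        obj = "<%s>" % triple.object
    return "<%s> <%s> %s ." % (triple.subject, triple.predicate, obj)

statement = Triple(
    subject="http://www.example.org/index.html",
    predicate="http://purl.org/dc/elements/1.1/creator",
    object="John Smith",
)

line = to_ntriples(statement)
print(line)
```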
8. Representing RDF
• Default syntax is XML (not human friendly)
• SQL triple stores commonly used
• RDF toolkits: Jena (HP) and Redland (Dave Beckett)
• Prolog: SWI-Prolog (40M triples per 100MB RAM)
e.g.
rdf( 'http://www.example.org/index.html',
'http://purl.org/dc/elements/1.1/creator',
'John Smith' ).
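The SQL triple store mentioned above can be sketched in the same spirit as the Prolog fact: one three-column table, with a query helper in which `None` plays the role of an unbound Prolog variable. This is a minimal sketch using Python's built-in `sqlite3` module; the table layout, the `match` helper and the second triple (`about.html`, "Jane Doe") are assumptions made for illustration, not a description of any particular toolkit.

```python
import sqlite3

DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# In-memory database with a single three-column table of statements.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("http://www.example.org/index.html", DC_CREATOR, "John Smith"),
        # Hypothetical second statement, added only to make the query interesting.
        ("http://www.example.org/about.html", DC_CREATOR, "Jane Doe"),
    ],
)

def match(conn, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None matches anything."""
    clauses, params = [], []
    for column, value in (("subject", subject), ("predicate", predicate), ("object", obj)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT subject, predicate, object FROM triples"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return conn.execute(sql, params).fetchall()

# Roughly equivalent to the Prolog query rdf(S, DcCreator, O).
for row in match(conn, predicate=DC_CREATOR):
    print(row)
```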
10. RDF Schema
• A language for describing properties and classes of
RDF resources
• Includes semantics for generalisation-hierarchies of
such properties and classes
• Simple data typing model:
– is-a relationships and properties
– some range and domain restriction
Notes:
1. RDF Schema recently renamed as "RDF Vocabulary Description Language"
2. In the literature, RDF + RDF Schema is often referred to as RDF(S)
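The semantics of generalisation hierarchies can be illustrated with a small forward-chaining sketch. It assumes just two inference rules (transitivity of `rdfs:subClassOf`, and inheritance of `rdf:type` along it); the `ex:` class and instance names are hypothetical, and prefixed names stand in for full URIs.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Hypothetical class hierarchy and one instance.
triples = {
    ("ex:Steel", SUBCLASS, "ex:Metal"),
    ("ex:Metal", SUBCLASS, "ex:Material"),
    ("ex:girder1", RDF_TYPE, "ex:Steel"),
}

def rdfs_closure(triples):
    """Repeatedly apply the two rules until no new triples appear."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                # Only subclass statements about the current object are relevant.
                if p2 != SUBCLASS or s2 != o:
                    continue
                if p == SUBCLASS:
                    new.add((s, SUBCLASS, o2))   # subclass transitivity
                elif p == RDF_TYPE:
                    new.add((s, RDF_TYPE, o2))   # type inheritance
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

closure = rdfs_closure(triples)
for t in sorted(closure - triples):
    print(t)
```

The nested loop is quadratic in the number of triples, which is acceptable for a sketch but not for a store of any real size.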
11. Ontology Vocabulary Layer
• Huge number of different ontologies:
– simple: thesauri, taxonomies
– complex: DAML+OIL, OWL
• OWL supersedes the older DAML+OIL
• OWL goes further than RDF Schema, adding:
– relations between classes
– cardinality
– equality
– richer typing
– characteristics of properties
– enumerated classes
12. Web Ontology Language (OWL)
• OWL Lite - hierarchical classification (ideal for
thesauri and other taxonomies).
• OWL DL - description logics (computationally
complete but inference services are restricted to
classification and subsumption).
• OWL Full - full syntactic freedom of RDF (no
computational guarantees).
13. Web Services and the Semantic Web
• Web Services
– XML-based interfaces to programs accessible via the Web
– Operating system neutral Remote Procedure Call (RPC) protocol
• Today's Web Services
– Business-orientated, simple, short transactional operations
– Domain-specific XML vocabularies (not RDF)
• Tomorrow's Web Services
– Combination of simple services to achieve complex operations
– Automated discovery, selection and pipelining of Web Services
• Semantic Web + Machine Learning may have an
important role to play in the future of Web Services
15. Activity
• Attempts to apply Machine Learning are being made
within each of the Semantic Web layers.
• Research activity within each layer can be divided
into two parts, the application of Machine Learning in:
– creating the Semantic Web
– using the Semantic Web
• Most activity to date is in creating the Semantic Web
16. Creating the Semantic Web
• Why can't people do this themselves?
– People are frequently unaware of metadata standards
– People are (usually) unwilling to spend time creating metadata
• May be no direct benefit (to them)
• Boring
– People are incapable of applying metadata consistently
• Consistency varies from person to person
• Consistency varies in the same person over time
– There's already a huge backlog of unlabelled data on the existing web!
– Also, someone else's metadata may not be what you want
• e.g. Site content rating from supplier may be unreliable
17. Automatic Generation of Metadata
• Paper describes examples of ML research that use:
– Inductive Logic Programming (on popular science articles)
• F-Score close to human expert. Precision between 0.7 and 1.0
– Hidden Markov Models (on marked-up MUC and MEDLINE texts)
• Reported as adequate but not able to scale, due to fragmentation of the
probability distribution. Portable across domains. SVMs suggested as an alternative.
– Association Analysis (using Web Directory for labelling examples)
• Work in progress; looks for terms in the text that indicate a directory path,
e.g. .../Manufacturing/Materials/Metals/Steel/..
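The precision and F-score figures quoted above are computed by comparing automatically generated metadata with labels assigned by a human expert. A minimal sketch of that calculation follows, assuming the balanced F1 variant; the label sets are made up for illustration and are not results from the reviewed work.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted labels with expert (gold) labels for one document."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical labels for one document.
gold = {"manufacturing", "materials", "metals", "steel"}
predicted = {"metals", "steel", "alloys"}

p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```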
18. Application of ML to Ontologies
• Ontology Vocabulary Layer is currently a popular
area of Semantic Web research
• Most ontologies hand-crafted
• Creating ontologies is far more complex than RDF
metadata extraction
– ILP has been used to revise and maintain, but not create
– Association rule learning has been used to partly automate ontology creation
– Regular expression (FSA) rewriting guided by Minimum Description
Length to create Document Type Definitions (DTDs) for XML docs.
• Ontology mapping
– Hard problem
– Some work using Naive Bayes
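One way Naive Bayes could be applied to ontology mapping is sketched below. This is an assumption about the general approach, not the method of any specific system reviewed: a multinomial Naive Bayes classifier with Laplace smoothing is trained on instance descriptions from a target ontology, the instances of a source concept are classified, and the source concept is mapped to the target concept that wins the most votes. All concept names and instance texts are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """examples: (text, concept) pairs drawn from the target ontology."""
    word_counts = defaultdict(Counter)  # concept -> word frequencies
    class_counts = Counter()            # concept -> number of instances
    vocab = set()
    for text, concept in examples:
        words = text.lower().split()
        word_counts[concept].update(words)
        class_counts[concept] += 1
        vocab.update(words)
    return word_counts, class_counts, vocab

def classify(text, model):
    """Return the most probable target concept for one instance description."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for concept in sorted(class_counts):
        score = math.log(class_counts[concept] / total)  # log prior
        denom = sum(word_counts[concept].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[concept][word] + 1) / denom)
        if score > best_score:
            best, best_score = concept, score
    return best

def map_concept(instances, model):
    """Map a source concept to the target concept most of its instances fall under."""
    votes = Counter(classify(text, model) for text in instances)
    return votes.most_common(1)[0][0]

# Hypothetical target ontology B and instances of a source concept A:Steel.
model = train([
    ("stainless steel sheet and coil", "B:Metal"),
    ("aluminium alloy bar", "B:Metal"),
    ("oak timber plank", "B:Wood"),
    ("pine wood board", "B:Wood"),
])
mapping = map_concept(["steel coil", "alloy steel bar"], model)
print("A:Steel ->", mapping)
```

Voting over instances is only one possible design; it relies on the two ontologies describing overlapping sets of instances, which is part of what makes mapping a hard problem.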
19. Using the Semantic Web
• Not much ML research in this area (yet)
• Datasets exist
– RSS newsfeeds and Weblogs/Blogs
– DAML repository
– Dave Beckett's RDF Resource Guide
• Locating suitable data can be a problem
• Semantic Web Mining has been conjectured
– combines Semantic Web with Web Mining
– Relational Data Mining (RDM) suggested to exploit structure in data
20. Summary
• Semantic Web is rapidly evolving
• Key languages:
– RDF
– vocabularies built on top of RDF
• Publicly available RDF datasets exist
– in applications like RSS
– and repositories like DAML
• RDF maps well to Prolog (and SQL)
• Machine Learning looks promising for both the
creation and use of the Semantic Web