1. Pelorus: A Semantic Web Application Platform
2010 Semantic Technology Conference
Michael Grove
Director of Software Development
Clark & Parsia, LLC.
mike@clarkparsia.com
http://clarkparsia.com -- http://www.twitter.com/candp
2. Who are we?
Clark & Parsia is a Semantic software startup founded in
2005
Offices in DC and Cambridge, MA
Software products for end-user and OEM use
Provides software development and integration services
Specializing in Semantic Web, web services, and
advanced AI technologies for federal and enterprise
customers.
3. Where do we start?
No, literally, where do we start?
Enterprise increasingly wants to utilize semweb tech to
manage information
Lack of in-house SemWeb expertise
So what's the first step in these cases?
It's hard to get a project off the ground without
expertise
In many cases, you just want to get a prototype
running ASAP to evaluate the approach
An integrated platform to rapidly prototype and assess
semweb tech, which also scales to production, is crucial
4. The Pelorus Platform
Pelorus Platform aims to ease this situation
It's a standards-based application development stack
geared toward enterprise information integration via RDF,
SPARQL and OWL.
Provides a collection of software designed to take you
from ontology (or data) to application
Based on years of customer engagements learning
what parts are the same for everyone, and what parts
are customized by everyone--and facilitating both.
Few or no human-in-the-loop steps are required to get
a barebones application running
From there, it's just UI customization
5. Ingredients
PelletServer
RESTful server-side component powered by Pellet
Provides:
Reasoning
Semantic Search
Integrity constraints
Query services
Machine Learning ... and Planning too!
Semantic ETL
Toolkit for transforming existing data into RDF
Support for most common formats: XML, CSV,
Excel, relational, etc.
Conversion driven from domain ontology
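As a rough illustration of ontology-driven conversion, the sketch below turns CSV rows into N-Triples using a column-to-property mapping. The `http://example.org/` namespace, the `MAPPING` table, and the function names are hypothetical and chosen only for this example; this is not the API of the Semantic ETL toolkit, and a real mapping would be derived from the domain ontology.

```python
import csv
import io

# Hypothetical namespace and mapping, used only for illustration.
BASE = "http://example.org/"
MAPPING = {
    "name": BASE + "ontology#name",
    "city": BASE + "ontology#city",
}

def escape_literal(value):
    """Escape a string for use as an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))

def csv_to_ntriples(text, id_column, rdf_class, mapping):
    """Convert CSV text to a list of N-Triples lines, one resource per row."""
    rdf_type = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        # Assumes the id column holds values that are safe inside an IRI.
        subject = f"<{BASE}resource/{row[id_column]}>"
        lines.append(f"{subject} <{rdf_type}> <{rdf_class}> .")
        for column, prop in mapping.items():
            value = row.get(column)
            if value:  # skip empty cells rather than emitting empty literals
                lines.append(f'{subject} <{prop}> "{escape_literal(value)}" .')
    return lines

legacy = "id,name,city\n1,Acme,Boston\n2,Initech,\n"
triples = csv_to_ntriples(legacy, "id", BASE + "ontology#Customer", MAPPING)
print("\n".join(triples))
```

Keeping the mapping as plain data rather than code is what would let a later step generate or learn it instead of writing it by hand.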
6. More Ingredients
Annex - A linked data server
Publishes your RDF as linked data
Works in-place against any RDF database
No files to parse or directory structure to fill out
Javascript module and pluggable template API for
rendering resources
CRUD workflow support for maintaining your data
7. More Ingredients
Machine Learning Suite
Bootstrap ontologies from existing data
Provides capabilities for learning ETL transformations
from existing data, decreasing by-hand mapping
burden
Automatically create Pelorus models for browsing
Analysis support: clustering, classification, and more.
Pelorus
Faceted browsing via SPARQL for RDF data.
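To make "faceted browsing via SPARQL" concrete, here is a minimal sketch of how a facet's value counts could be fetched with a single query. It assumes an endpoint that supports SPARQL 1.1 aggregates; the function name and the example IRIs are hypothetical, and the slides do not say how Pelorus itself builds its queries.

```python
def facet_query(rdf_class, facet_property, limit=20):
    """Build a SPARQL query that counts resources per value of one facet."""
    return (
        "SELECT ?value (COUNT(DISTINCT ?s) AS ?count)\n"
        "WHERE {\n"
        f"  ?s a <{rdf_class}> .\n"
        f"  ?s <{facet_property}> ?value .\n"
        "}\n"
        "GROUP BY ?value\n"
        "ORDER BY DESC(?count)\n"
        f"LIMIT {int(limit)}"
    )

# Hypothetical class and property IRIs.
print(facet_query("http://example.org/ontology#Customer",
                  "http://example.org/ontology#city",
                  limit=5))
```

Each row of the result would supply one facet value and the number of matching resources, which is what a facet sidebar typically displays.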
8. So What Now?
The intent of the platform is to take either your existing data or an
existing ontology as input and provide as output a
working skeleton application.
This is the Staples Easy button for the Semantic Web
Some minimal configuration and UI customization may be
required
The goal is to Just Add Data and get back a working, full-
service, modern app that's optimized for data integration
and analysis.
9. Getting Started
Legacy data in a series of databases, XML files, etc.
This is a maintenance nightmare
How do you search this data, analyze it, or verify its
correctness?
If we could get the data out of these legacy formats and
integrate them, then we could do something useful...
10. Step 1: Integrate Legacy Data
Ontology Bootstrapping via ML
We can learn the basic ontology from our existing data
Feed data to an ML process that will produce our
ontology
Semantic ETL
Using our ontology, and some additional ML, we can
generate mappings from the source data to the
ontology
Automatically convert our legacy data into RDF
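The bootstrapping idea can be illustrated with a deliberately simple stand-in: instead of machine learning, the sketch below uses a type-guessing heuristic to propose a datatype for each column of a CSV file, which is one small piece of what a learned ontology would contain. The function names and sample data are hypothetical, and the heuristic is an assumption for illustration, not the method the platform uses.

```python
import csv
import io

XSD = "http://www.w3.org/2001/XMLSchema#"

def guess_datatype(values):
    """Pick the narrowest XSD datatype that fits every value in the list."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if not values:
        return XSD + "string"
    if all_parse(int):
        return XSD + "integer"
    if all_parse(float):
        return XSD + "double"
    return XSD + "string"

def bootstrap_properties(text):
    """Return {column: datatype IRI} for each column of a CSV document."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = rows[0].keys() if rows else []
    return {
        # Empty cells are ignored when guessing a column's datatype.
        col: guess_datatype([r[col] for r in rows if r[col]])
        for col in columns
    }

sample = "name,employees,revenue\nAcme,120,1.5\nInitech,45,0.75\n"
print(bootstrap_properties(sample))
```

A proposed schema like this would still need review, but it gives the mapping step something to start from.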
11. Step 2: Publish Integrated Data
Now that we have RDF, we'd like to publish it as Linked
Data
Annex Linked Data server takes any RDF database
and exposes its contents as Linked Data.
Customizable template framework
Javascript API to access original RDF database
We'd also like to maintain our data
Using Empire, we can generate Java beans to
represent our domain ontology.
Annex provides generic CRUD templates driven from
standard Java beans, using JPA as a persistence
mechanism.
By virtue of simply having RDF in a database, we've got
publication as Linked Data, and maintenance via simple
CRUD pages for free.
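The core of Linked Data publication is that looking up a resource's URI returns the statements about it. The sketch below shows that lookup against an in-memory list of triples standing in for an RDF database; the `IRI` marker class, the `describe` function, and the example data are all hypothetical and are not part of Annex.

```python
class IRI(str):
    """Marks a string as an IRI rather than a plain literal."""

def term(value):
    """Serialize an object term in N-Triples syntax."""
    if isinstance(value, IRI):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def describe(triples, uri):
    """Return N-Triples lines for every statement whose subject is `uri`."""
    return [f"<{s}> <{p}> {term(o)} ." for s, p, o in triples if s == uri]

# Hypothetical data standing in for the contents of an RDF database.
EX = "http://example.org/"
STORE = [
    (EX + "resource/1", EX + "ontology#name", "Acme"),
    (EX + "resource/1", EX + "ontology#partner", IRI(EX + "resource/2")),
    (EX + "resource/2", EX + "ontology#name", "Initech"),
]

print("\n".join(describe(STORE, EX + "resource/1")))
```

A server would run a lookup like this for each requested URI and hand the result to a template for rendering.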
12. Step 3: Browse & Search & Query
We've published our RDF, but clicking around pages
looking for a particular resource is not ideal
Having a simple interface to browse the data would be
great.
Pelorus is served via Annex
Facet model is generated dynamically via more ML
Uses same Javascript template framework for custom
display of RDF content.
13. Step 4: Analyze & Plan & Act
We can use OWL reasoning via Pellet to learn new things
about the data; for example:
which products should we sell to which customers?
which products should we sell to which prospects?
why do we make these recommendations?
We can use Machine Learning to learn new things, too:
which customers are like others? (similarity)
which groups do our customers fall into? (clustering)
which employees are liaisons between parts of the
company? (social network analysis)
which employees are most likely to retire in the next
year? (classification)
We can use Automated Planning to:
build actionable plans/workflows based on these
analyses
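As one small example of the similarity question above, the sketch below ranks customers by the overlap of the products they have bought, using Jaccard similarity. The choice of Jaccard, the function names, and the sample purchases are assumptions made for illustration; the slides do not say which similarity measure the platform uses.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(purchases, customer):
    """Rank other customers by overlap with `customer`'s purchased products."""
    target = purchases[customer]
    scores = [
        (other, jaccard(target, items))
        for other, items in purchases.items()
        if other != customer
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical purchase history: customer -> set of products.
purchases = {
    "acme": {"widget", "gadget", "gizmo"},
    "initech": {"widget", "gadget"},
    "globex": {"sprocket"},
}

print(most_similar(purchases, "acme"))
```

Here "initech" ranks first because it shares two of the three distinct products bought by either customer, while "globex" shares none.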
15. What's the point?
Getting to step 4 (and beyond) is the point; that's where
the real ROI lives...
You want to get there sooner & cheaper
But many times steps 1-3 are a hurdle
If you've got limited time and/or budget to prove
value in step 4, you don't want to waste it on the
drudgery of getting off the ground
This is the key to semantic technology's value
proposition