1. Climbing the Learning Curve
with Linked Data
Open Government Data Camp 20-Oct-2011
Bernadette Hyland, CEO
bhyland@3roundstones.com
Twitter @BernHyland
Wednesday, October 19, 2011
Information overload, Impatient society, Change is the only constant
Software is not valued by its usefulness ... but by its expected future value
2. • Linked Data is about
publishing and
consuming data using
international data
standards
• Based on a 20-year-old
idea
• Goal is to solve
organizational issues
related to data silos,
requirements for faster
data integration and an
environment of reduced
IT budgets
Why am I speaking on Linked Data and sharing? I’m here in my role as co-chair of the W3C Government Linked Data Working Group (GLD WG).
I’m also a longtime entrepreneur in this space, having founded companies that led to several of the most
widely used Open Source projects for Linked Data, including Mulgara, OpenRDF/Sesame, PURLs
2.0 and Callimachus. I’ve authored chapters in two of these peer-reviewed books published by Springer,
which are available in hardcopy or for free via the Web.
3. There is a Process
Identify Model Name Describe Convert Publish
Maintain
Identify the data, model exemplar records -- what you are going to carry forward & what you
are going to leave behind.
Name all of the NOUNs. Turn the records into URIs.
Next, describe RESOURCES with vocabularies.
Write a script or process to convert from canonical form to RDF. Then publish. Maintain over
time.
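The name, describe and convert steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `http://example.org/id/` namespace, the sample record and the helper names are hypothetical, and FOAF is just one vocabulary that could describe a person.

```python
from urllib.parse import quote

# Hypothetical namespace for minted identifiers (an assumption, not from the talk).
BASE = "http://example.org/id/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"


def mint_uri(kind: str, key: str) -> str:
    """Name a real-world thing with a URI built from a stable record key."""
    return f"{BASE}{kind}/{quote(key, safe='')}"


def literal(value: str) -> str:
    """Serialize a plain string as an N-Triples literal with basic escaping."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def convert(record: dict) -> list:
    """Describe one record with FOAF terms and return N-Triples lines."""
    subject = mint_uri("person", record["id"])
    return [
        f"<{subject}> <{RDF_TYPE}> <{FOAF}Person> .",
        f"<{subject}> <{FOAF}name> {literal(record['name'])} .",
    ]


# Illustrative exemplar record; running the script again yields the same output.
record = {"id": "42", "name": "Ada Lovelace"}
triples = convert(record)
print("\n".join(triples))
```

Because the URI is derived from the record key rather than from row order or a timestamp, the conversion can be rerun on a fresh extract without renaming anything.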
4. Preparation
1. Leverage what exists
• Request a copy of the logical and physical model of the
database(s)
• Obtain data extracts (i.e., databases and/or spreadsheets)
or create data in a way that can be replicated.
Linked Data modelers typically model two or three exemplar objects to begin the process.
We figure out the relationships and identify how each object relates to the real world, initially
drawing on a large white board or collaborative wiki site.
5. Model the data
2. Model data without context to allow for
reuse and easier merging of data sets
• Traditional DBAs organize data for specified
Web services or applications.
• With LD, application logic does not drive the
data schema, concepts, etc.
LD domain experts model data without context versus traditional modelers who typically
organize data for specified Web services or applications.
Application logic does not drive the data schema.
Better enables data reuse and easier merging of data sets.
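One way to see why context-free modeling eases merging: when two sources name the same thing with the same URI, combining them is a set union of triples, with no schema mapping step. The sketch below assumes hypothetical URIs, a hypothetical `inspectedOn` property and made-up sample values.

```python
# A triple is (subject, predicate, object); all values here are illustrative.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SITE = "http://example.org/id/site/7"  # hypothetical shared identifier

source_a = {
    (SITE, RDFS_LABEL, "Mill Creek"),
    (SITE, "http://example.org/vocab/county", "Example County"),
}

source_b = {
    (SITE, RDFS_LABEL, "Mill Creek"),  # same statement as in source_a
    (SITE, "http://example.org/vocab/inspectedOn", "2011-09-30"),
}

# Merging is a set union: identical statements collapse, new ones are added.
merged = source_a | source_b

for subject, predicate, obj in sorted(merged):
    print(subject, predicate, obj)
```

Neither source had to know about the other's application or table layout; agreeing on the identifier was enough.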
6. Model the data
3. Look for real world objects of interest (e.g.,
people, places, things, locations, etc.) and
model them.
• Investigate how others are already modeling
similar or related data.
• Look for duplication and normalize the data
• Use common sense to decide whether or
not to make a link
Linked Data modeling experts typically model two or three exemplar objects to begin the
process. We figure out the relationships and identify how each object relates to the real
world, initially drawing on a large white board or collaborative wiki site.
7. Model the data ...
4. Connect data from different sources and
authoritative vocabularies (see list of popular
vocabularies below).
• Use URIs as names for your
objects
During the modeling process, don’t think about how an application will use your data.
Instead, focus on modeling real world things that are known about the data and how it is
related to other objects. Take the time to understand the data and how the objects
represented in the data are related to each other.
8. Model the data ...
• Put aside immediate needs of any
application
• Don’t think about how an application will
use your data
• Do think about time and how the data will
change over time.
Focus on modeling real world things that are known about the data and how it is related to
other objects.
Take the time to understand the data and how the objects represented in the data are related
to each other.
9. Convert, Publish & Maintain
5. Write a script or process to convert the
data set repeatedly
6. Publish to the Web and announce it! (more
details shortly)
7. Maintenance strategy (more details in the
social contract at the end)
1. Expect to be maintained in perpetuity
2. Do not encode the name of the department or agency currently defining and naming a
concept, as that may be re-assigned
3. Support a direct response, or redirect to department/agency servers
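The second and third points can be illustrated with a small redirect table in the style of a PURL resolver. This is a sketch under assumptions: the paths, host names and the choice of HTTP 303 are hypothetical and not taken from any particular deployment.

```python
from typing import Optional

# Public identifier path -> server currently responsible for the data.
# The public path carries no department or agency name.
redirects = {
    "/id/substance/50-00-0": "https://agency-a.example/records/50-00-0",
}


def resolve(path: str) -> tuple:
    """Return (HTTP status, Location) for a request to a persistent identifier."""
    target: Optional[str] = redirects.get(path)
    if target is None:
        return (404, None)
    return (303, target)  # 303 See Other is assumed here for illustration


before = resolve("/id/substance/50-00-0")

# Responsibility moves to another agency: only the table entry changes,
# while the published identifier stays exactly the same.
redirects["/id/substance/50-00-0"] = "https://agency-b.example/data/50-00-0"
after = resolve("/id/substance/50-00-0")

print(before)
print(after)
```

Links that other publishers made to the identifier keep working after the reassignment, which is the point of keeping organization names out of the URI.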
10. Take the plunge ... Be forgiving
• Simplistic data models can still be useful
• Better to make progress with something
rather than do nothing because we cannot
be comprehensive and complete
Science still doesn’t have a good understanding of a gene. We have gene therapy yet we
haven’t agreed on a definition of a gene.
We capture vast quantities of topographical data (USGS), yet scientists still debate the
meaning of topographical elements. From the time we are young children, we use
monosyllabic words to navigate trees and roads. If our parents said we cannot do anything
because we don’t have a perfect model of the world, we couldn’t have learned to navigate our
home as toddlers.
11. Take an iterative approach
1. Review of modeling decisions
2. Review vocabularies chosen and developed
3. Modify/update data conversion scripts
4. Do a maintenance walk-through with real use cases
5. Show how to explore data with SPARQL and
visualizations
6. Discuss a persistent identifier strategy (think PURLs)
Iterate on this process in short sprints, two weeks at a time. Don’t be afraid to review
modeling decisions with SMEs. Review vocabulary choices.
Do a maintenance walk-through with actual use cases and ensure the team can carry the work forward.
Show people their OWN DATA in visualization tools like Callimachus.
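Exploring data with SPARQL comes down to matching triple patterns in which some positions are variables. The toy matcher below mimics that idea in plain Python so it runs without a triple store; it is not a SPARQL engine, and the graph contents and the `match` helper are hypothetical.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Illustrative graph: a list of (subject, predicate, object) triples.
graph = [
    ("http://example.org/id/site/7", RDF_TYPE, "http://example.org/vocab/Site"),
    ("http://example.org/id/site/7", RDFS_LABEL, "Mill Creek"),
    ("http://example.org/id/site/8", RDF_TYPE, "http://example.org/vocab/Site"),
    ("http://example.org/id/site/8", RDFS_LABEL, "Stony Brook"),
]


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None plays the role of a variable."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly the pattern in: SELECT ?s ?label WHERE { ?s rdfs:label ?label }
labels = {s: o for s, _, o in match(graph, p=RDFS_LABEL)}
print(labels)
```

In a real review session the same question would be put to the team's SPARQL endpoint, so that subject matter experts see their own records in the results.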
16.
We used two common RDF vocabulary description languages in our modeling for SRS: RDF
Schema (RDFS) and Simple Knowledge Organization System (SKOS). RDFS is used to give
labels to objects, synonyms and substance lists. Human-readable comments were added
using the rdfs:comment property.
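A description along these lines might look as follows when serialized as N-Triples. The substance URI and the sample values are illustrative and not taken from the SRS data, and using `skos:altLabel` for synonyms is an assumption about how SKOS could be applied here.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SKOS = "http://www.w3.org/2004/02/skos/core#"


def describe(uri: str, label: str, synonyms: list, comment: str) -> list:
    """Build N-Triples lines for one resource.

    Values are assumed to contain no quotes, backslashes or newlines,
    so no literal escaping is done in this sketch.
    """
    lines = [f'<{uri}> <{RDFS}label> "{label}" .']
    # Assumption: synonyms are modeled as SKOS alternative labels.
    lines += [f'<{uri}> <{SKOS}altLabel> "{name}" .' for name in synonyms]
    lines.append(f'<{uri}> <{RDFS}comment> "{comment}" .')
    return lines


description = describe(
    "http://example.org/id/substance/50-00-0",  # hypothetical URI
    "Formaldehyde",
    ["Methanal"],
    "Illustrative record for demonstration only.",
)
print("\n".join(description))
```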
17. Possible Solutions for Data
Management
Roll your own three-tier
Content Management System
Wiki-based
Linked Data Management System
A few different possible solutions to the three challenges stated earlier
18. Content Management Systems
The big downside of a three-tier architecture is the upfront cost, as well as getting people to agree upfront on the
schema.
So we then looked at CMSs. These systems can be up and running the same day; however, they
are architected to work primarily with unstructured content.
19.
We have a strong heritage in FLOSS projects, starting with the first community-supported RDF
database in 2003. We offered a commercial version used primarily by the US defense community,
and in 2004 open sourced 80% of it into what became the Mulgara triple store, which is
used by institutions all over the world. OpenRDF and Sesame were led by Aduna.
20. Linked Data Management System
Callimachus (kəlĭm'əkəs) is a framework for data-driven
applications based on Linked Data principles.
Callimachus allows Web authors to quickly and easily create
semantically-enabled Web applications.
Wiki systems don't handle structured content well, nor do they propagate changes well.
A tool for Web 2.0 developers creating DATA RICH web sites was needed …
We created Callimachus, a triples up & down solution (no MySQL under the covers). HIGHLY SCALABLE for real world use.
Named for the father of bibliography, author of the Pinakes at the Great Library of Alexandria, who lived from 305 to c. 240 BCE.
He could not categorize his own work using Aristotle's hierarchical system. He was the first person to define the use case for Linked
Data.
21.
Callimachus uses RDFa as a query language; templates are parsed to build SPARQL from RDFa
markup and the query result set is returned to the Web page for a human to read, or a machine
to parse. This is very valuable and to our knowledge, there is no other solution available as
FLOSS or commercially that compares to Callimachus at this time.
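The general idea of deriving a query from RDFa markup can be shown with a toy parser. To be clear, this is not Callimachus code and does not follow its actual template syntax: the template, the `TemplateToSparql` class and the variable-naming rule are all hypothetical, and nested subjects are not handled.

```python
from html.parser import HTMLParser

PREFIXES = "PREFIX foaf: <http://xmlns.com/foaf/0.1/>"

# Hypothetical, simplified template: RDFa attributes mark what to fetch.
TEMPLATE = """
<div about="?person" typeof="foaf:Person">
  <span property="foaf:name"></span>
</div>
"""


class TemplateToSparql(HTMLParser):
    """Collect RDFa attributes and turn them into SPARQL triple patterns."""

    def __init__(self):
        super().__init__()
        self.subject = None
        self.variables = []
        self.patterns = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "about" in attributes:
            self.subject = attributes["about"]
            if self.subject not in self.variables:
                self.variables.append(self.subject)
        if "typeof" in attributes and self.subject:
            self.patterns.append(f"{self.subject} a {attributes['typeof']} .")
        if "property" in attributes and self.subject:
            # Name the variable after the property's local name, e.g. ?name.
            variable = "?" + attributes["property"].split(":")[-1]
            self.variables.append(variable)
            self.patterns.append(
                f"{self.subject} {attributes['property']} {variable} ."
            )

    def query(self) -> str:
        body = "\n".join(f"  {pattern}" for pattern in self.patterns)
        return f"{PREFIXES}\nSELECT {' '.join(self.variables)} WHERE {{\n{body}\n}}"


parser = TemplateToSparql()
parser.feed(TEMPLATE)
sparql = parser.query()
print(sparql)
```

The appeal of the approach is that the page author writes markup only once: the same attributes that say where values appear on the page also say which values to ask the triple store for.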
22.
Once we had the data modeled and validated with SMEs, we converted it and loaded it into Callimachus.
We spent about 1 hour creating templates to view the data in Callimachus. So here is the
power of LOD in action -- Within one hour, we could view the data, navigate through the data
and verify the contents without being a DBA or Java developer!
23.
Callimachus’ forms-driven interface allows authorized users to modify the underlying triples
in the database -- we are round-tripping create/modify/delete to a triple store via a Web
page!
29.
A history of changes is kept. Note the change to the name and the added comment, along with the time/date
and name of the user who made the edit.
30.
Callimachus view page of the SRS, created in less than an hour. Someone with HTML, CSS and
RDFa / SPARQL skills can create this type of page. No understanding of semantics or deep RDF
knowledge is required.
31.
Notice the wiki-like editing capabilities of a Callimachus page!
34. Web 2.0 developers can create data-driven applications
with templates in hours
Triples up & down (no MySQL under the covers)
Wiki editing of content
Access control
Collaboration via Web
Change tracking (history)
Page/form Templates
Callimachus is a great way to collaboratively manage your Linked Data
MediaWiki is to free text what Callimachus is to linked data
Callimachus uses a straightforward ACL for linked data
35. Join the Community
Callimachus has benefited from 2+ years of corporate support
We’re using it for real world Web applications in environmental
protection, finance and healthcare
We’d love to work with the publishing industry
Open Source project
Visit callimachusproject.org
Join the discussion
36. @BernHyland
Email: bhyland@3roundstones.com
37. Next talk today @ 14:00
Sala I - “Linked Open Government
Data Workshop”
WHY SHARE AND WHO
BENEFITS?
Bernadette Hyland, co-chair
W3C Government Linked Data Working Group
http://purl.org/net/bhyland/why-share-2011-10