20111120 warsaw learning curve by b hyland notes

Climbing the Learning Curve
with Linked Data
Open Government Data Camp 20-Oct-2011

Bernadette Hyland, CEO
bhyland@3roundstones.com
Twitter @BernHyland

Wednesday, October 19, 2011

Information overload, Impatient society, Change is the only constant
Software is not valued by its usefulness ... but by its expected future value

• Linked Data is about publishing and consuming data using international data standards
• Based on a 20-year-old idea
• Goal is to solve organizational issues related to data silos, requirements for faster data integration and an environment of reduced IT budgets


Why am I speaking on Linked Data and sharing? I'm here in my role as co-chair of the W3C Government Linked Data Working Group (GLD WG). I'm also a long-time entrepreneur in this space, having founded companies that led to several of the most widely used Open Source projects for Linked Data, including Mulgara, OpenRDF/Sesame, PURLs 2.0 and Callimachus. I've authored chapters in two peer-reviewed books published by Springer, which are available in hardcopy or for free via the Web.

There is a Process

Identify → Model → Name → Describe → Convert → Publish, then Maintain


Identify the data and model exemplar records: decide what you are going to carry forward and what you are going to leave behind. Name all of the nouns: turn the records into URIs. Next, describe the resources with vocabularies. Write a script or process to convert the data from its canonical form to RDF. Then publish, and maintain the data over time.
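The Name, Describe and Convert steps can be sketched as a small script. This is a minimal illustration, assuming the canonical form is a CSV extract with hypothetical `id`, `name` and `category` columns and a hypothetical base URI `http://example.org/id/`; only `rdf:type` and `rdfs:label` are standard terms. Each record becomes a URI (Name), gets facts stated about it with vocabulary terms (Describe), and is serialized as N-Triples (Convert).

```python
import csv
import io
from urllib.parse import quote

# Hypothetical base URI; the CSV column names are assumptions as well.
BASE = "http://example.org/id/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def literal(text):
    """Serialize a string as an N-Triples literal (escape backslashes first)."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def record_to_triples(row):
    """Name the record with a URI, then describe it with vocabulary terms."""
    category = quote(row["category"], safe="")
    subject = f"<{BASE}{category}/{quote(row['id'], safe='')}>"
    class_uri = f"<{BASE}class/{category.capitalize()}>"
    return [
        f"{subject} <{RDF_TYPE}> {class_uri} .",
        f"{subject} <{RDFS_LABEL}> {literal(row['name'])} .",
    ]


def convert(csv_text):
    """Convert every row of a CSV extract (the canonical form) to N-Triples."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.extend(record_to_triples(row))
    return "\n".join(lines) + "\n"


sample = "id,name,category\n42,Benzene,substance\n"
print(convert(sample), end="")
```

N-Triples is used here only because it is line-oriented and easy to generate without a library; any RDF serialization would do.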

Preparation
1. Leverage what exists
• Request a copy of the logical and physical model of the database(s)
• Obtain data extracts (e.g., databases and/or spreadsheets) or create data in a way that can be replicated


Linked Data modelers typically model two or three exemplar objects to begin the process. We figure out the relationships and identify how each object relates to the real world, initially drawing on a large whiteboard or a collaborative wiki site.

Model the data
2. Model data without context to allow for reuse and easier merging of data sets

• Traditional DBAs organize data for specific Web services or applications.

• With LD, application logic does not drive the data schema, concepts, etc.


Linked Data (LD) domain experts model data without application context, whereas traditional modelers typically organize data for specific Web services or applications. Because application logic does not drive the data schema, the data is easier to reuse and data sets are easier to merge.

Model the data
3. Look for real-world objects of interest (e.g., people, places, things, locations) and model them.
• Investigate how others are already modeling similar or related data.
• Look for duplication and normalize the data.
• Use common sense to decide whether or not to make a link.
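Looking for duplication can start with something as simple as grouping records whose names normalize to the same key. This is a minimal sketch with made-up records; the normalization rules (Unicode NFKC, case folding, collapsing punctuation and whitespace) are an assumption, and the output is only a list of candidates for a person to review, since deciding whether two records really describe the same thing is the common-sense call the slide mentions.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_key(name):
    """Reduce a name to a comparison key: NFKC, casefold, collapse punctuation/spaces."""
    text = unicodedata.normalize("NFKC", name).casefold()
    text = re.sub(r"[^\w]+", " ", text)  # any run of non-word characters -> one space
    return text.strip()


def find_duplicates(records):
    """Group record ids whose names share a normalized key."""
    groups = defaultdict(list)
    for rec_id, name in records:
        groups[normalize_key(name)].append(rec_id)
    # Only keys shared by more than one record are candidate duplicates.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical records from two overlapping extracts.
records = [
    ("r1", "Lake Michigan"),
    ("r2", "lake  michigan"),
    ("r3", "Lake Michigan."),
    ("r4", "Lake Erie"),
]
print(find_duplicates(records))  # {'lake michigan': ['r1', 'r2', 'r3']}
```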



Model the data ...
4. Connect data from different sources and authoritative vocabularies (see the list of popular vocabularies below).
• Use URIs as names for your objects.
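Once objects have URIs, connecting them to an authoritative source can be as simple as matching on a code that both data sets publish. This is a minimal sketch: the local URIs, the external vocabulary at `vocab.example.net` and the codes are all hypothetical, and only `owl:sameAs` is a standard term.

```python
# owl:sameAs is the standard OWL property; every other URI and code is hypothetical.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Local resources keyed by a code that an authoritative source also publishes.
local_index = {
    "http://example.org/id/substance/42": "A-001",
    "http://example.org/id/substance/43": "A-999",
}
# The authoritative vocabulary's URIs for the same codes.
external_index = {
    "A-001": "http://vocab.example.net/chem/A-001",
}


def link_by_shared_key(local, external):
    """Emit owl:sameAs N-Triples wherever a local key matches an external one."""
    links = []
    for local_uri, key in sorted(local.items()):
        target = external.get(key)
        if target is None:
            continue  # no authoritative match: leave the resource unlinked
        links.append(f"<{local_uri}> <{OWL_SAME_AS}> <{target}> .")
    return links


for line in link_by_shared_key(local_index, external_index):
    print(line)
```

`owl:sameAs` asserts that two URIs denote the same thing, which is a strong claim; when the correspondence is looser, SKOS mapping properties such as `skos:exactMatch` or `skos:closeMatch` may be the more cautious choice.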


During the modeling process, don't think about how an application will use your data. Instead, focus on modeling the real-world things that are known about the data and how they relate to other objects. Take the time to understand the data and how the objects it represents are related to each other.

Model the data ...

• Put aside the immediate needs of any application
• Don't think about how an application will use your data
• Do think about how the data will change over time



Convert, Publish & Maintain
5. Write a script or process to convert the data set repeatedly
6. Publish to the Web and announce it! (more details shortly)
7. Maintenance strategy (more details in the social contract at the end)
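Because the conversion is meant to be rerun, it helps if identical input always yields byte-identical output, so a new run can be compared with the last published one. A minimal sketch, assuming the converter emits N-Triples lines: duplicates are dropped, lines are sorted, and a SHA-256 fingerprint flags whether anything changed. Sorting lines is not a full RDF canonicalization; if the data contains blank nodes, their labels can differ between runs and defeat this check.

```python
import hashlib


def canonical_ntriples(triples):
    """Drop duplicate lines and sort, so identical data always yields identical text."""
    return "".join(line + "\n" for line in sorted(set(triples)))


def fingerprint(ntriples_text):
    """SHA-256 of the canonical text; a different hash means the data changed."""
    return hashlib.sha256(ntriples_text.encode("utf-8")).hexdigest()


# Hypothetical output lines from two runs of the same conversion.
A = '<http://example.org/id/a> <http://www.w3.org/2000/01/rdf-schema#label> "A" .'
B = '<http://example.org/id/b> <http://www.w3.org/2000/01/rdf-schema#label> "B" .'

run1 = canonical_ntriples([A, B])
run2 = canonical_ntriples([B, A, A])  # same data, different order, one duplicate
print(fingerprint(run1) == fingerprint(run2))  # True
```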


1. Expect URIs to be maintained in perpetuity.
2. Do not encode the name of the department or agency currently defining and naming a concept, as that responsibility may be reassigned.
3. Support a direct response, or redirect to department/agency servers.
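Rules 2 and 3 together suggest a PURL-style setup: persistent URIs that name no agency, resolved by a redirect table that is the only thing edited when responsibility moves. This is a minimal sketch with hypothetical paths and host names; the choice of `303 See Other` follows a common Linked Data convention for URIs that identify things, and a server could equally answer directly with `200` instead of redirecting.

```python
# Hypothetical mapping: persistent path prefix -> current hosting server.
# Only this table changes when a concept is reassigned to another agency.
REDIRECTS = {
    "/id/substance/": "https://data.agency-a.example.gov/substance/",
    "/id/facility/": "https://data.agency-b.example.gov/facility/",
}


def resolve(path):
    """Return (HTTP status, Location header) for a persistent URI path."""
    # Longest matching prefix wins, so more specific rules override general ones.
    for prefix in sorted(REDIRECTS, key=len, reverse=True):
        if path.startswith(prefix):
            return 303, REDIRECTS[prefix] + path[len(prefix):]
    return 404, None


print(resolve("/id/substance/42"))
```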

Take the plunge ... Be forgiving

• Simplistic data models can still be useful
• Better to make progress with something than to do nothing because we cannot be comprehensive and complete


Science still doesn't have a good understanding of a gene: we have gene therapy, yet we haven't agreed on a definition of a gene.

We capture vast quantities of topographical data (USGS), yet scientists still debate the meaning of topographical elements. From the time we are young children, we use monosyllabic words to navigate trees and roads. If our parents had said we couldn't do anything because we didn't have a perfect model of the world, we could never have learned to navigate our homes as toddlers.

Take an iterative approach
1. Review modeling decisions
2. Review vocabularies chosen and developed
3. Modify/update data conversion scripts
4. Do a maintenance walk-through with real use cases
5. Show how to explore data with SPARQL and visualizations
6. Discuss a persistent identifier strategy (think PURLs)


Iterate on this process in short sprints, two weeks at a time. Don't be afraid to review modeling decisions with subject matter experts (SMEs). Review vocabulary choices. Do a maintenance walk-through with actual use cases and ensure the team can carry the work forward. Show people their OWN DATA in visualization tools like Callimachus.
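When showing people how to explore their data with SPARQL, it can help to explain what a basic graph pattern does: each triple pattern, with `?variables` as wildcards, narrows the set of variable bindings that are consistent with the data. The sketch below is a toy matcher written for that explanation, not a SPARQL engine; the prefixed terms and the `ex:synonym` property are hypothetical, and plain strings are used for all terms.

```python
def match_pattern(pattern, triple, binding):
    """Extend `binding` so `pattern` matches `triple`, or return None if impossible."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):  # variable
            if p_term in new and new[p_term] != t_term:
                return None  # already bound to a different value
            new[p_term] = t_term
        elif p_term != t_term:  # constant that does not match
            return None
    return new


def query(graph, patterns):
    """Evaluate a conjunction of triple patterns, like a SPARQL basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


graph = [
    ("ex:s42", "rdfs:label", "Benzene"),
    ("ex:s42", "ex:synonym", "Benzol"),
    ("ex:s43", "rdfs:label", "Toluene"),
]
# Roughly: SELECT ?s ?syn WHERE { ?s rdfs:label "Benzene" . ?s ex:synonym ?syn }
print(query(graph, [("?s", "rdfs:label", "Benzene"), ("?s", "ex:synonym", "?syn")]))
```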


We used two common RDF vocabulary description languages in our modeling for SRS: RDF Schema (RDFS) and the Simple Knowledge Organization System (SKOS). RDFS was used to give labels to objects, synonyms and substance lists. Human-readable comments were added using the rdfs:comment property.
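As an illustration of what such a description might look like, the sketch below emits Turtle for one resource with an `rdfs:label`, an `rdfs:comment` and one `skos:altLabel` per synonym. Using `skos:altLabel` for synonyms is our assumption, not necessarily the property used for SRS; the `ex:` namespace and the example values are hypothetical.

```python
PREFIXES = (
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix ex:   <http://example.org/id/> .\n"
)


def turtle_string(text):
    """Quote a string as an English-tagged Turtle literal, escaping special characters."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@en'


def describe(local_name, label, synonyms, comment):
    """One resource: rdfs:label, a skos:altLabel per synonym, and an rdfs:comment."""
    predicates = [f"rdfs:label {turtle_string(label)}"]
    predicates += [f"skos:altLabel {turtle_string(s)}" for s in synonyms]
    predicates.append(f"rdfs:comment {turtle_string(comment)}")
    # Turtle separates predicate-object pairs for the same subject with ";".
    return PREFIXES + "\n" + f"ex:{local_name}\n    " + " ;\n    ".join(predicates) + " .\n"


print(describe("substance42", "Benzene", ["Benzol"], "An aromatic hydrocarbon."))
```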

Possible Solutions for Data Management
Roll your own three-tier

Content Management System

Wiki-based

Linked Data Management System


A few possible solutions to the three challenges stated earlier.

Content Management Systems


The big downside of a three-tier architecture is the upfront cost, as well as getting people to agree on the schema upfront.
So we then looked at content management systems (CMS). These systems can be up and running the same day; however, they are architected to work primarily with unstructured content.


We have a strong heritage in FLOSS projects, starting with the first community-supported RDF database in 2003. We offered a commercial version used primarily by the US defense community, and in 2004 open-sourced 80% of it into what became the Mulgara triple store, which is used by institutions all over the world. OpenRDF/Sesame was led by Aduna.

Linked Data Management System
Callimachus (kəlĭm'əkəs) is a framework for data-driven
applications based on Linked Data principles.

Callimachus allows Web authors to quickly and easily create
semantically-enabled Web applications.


Wiki systems neither handle structured content well nor propagate changes well.
A tool was needed for Web 2.0 developers creating DATA-RICH websites …
We created Callimachus, a triples up & down solution (no MySQL under the covers). It is HIGHLY SCALABLE for real-world use.
It is named for Callimachus, the father of bibliography (the Pinakes) at the Great Library of Alexandria, who lived from 305 to c. 240 BCE. He could not categorize his own work using Aristotle's hierarchical system. He was the first person to define the use case for Linked Data.


Callimachus uses RDFa as a query language: templates are parsed to build SPARQL from the RDFa markup, and the query result set is returned to the Web page for a human to read or a machine to parse. This is very valuable and, to our knowledge, there is no other solution available as FLOSS or commercially that compares to Callimachus at this time.
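The general idea of deriving a query from RDFa markup can be illustrated with a toy: collect the `property` attributes in a template and turn each into an optional triple pattern about the page's resource. This is a hypothetical sketch of the technique, not Callimachus's actual parser or query builder; it handles only a flat template, and the resource IRI and default `rdfs` prefix are assumptions.

```python
from html.parser import HTMLParser

RDFS = "http://www.w3.org/2000/01/rdf-schema#"


class PropertyCollector(HTMLParser):
    """Collect RDFa `property` attribute values from a flat HTML template."""

    def __init__(self):
        super().__init__()
        self.properties = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "property" and value:
                # RDFa allows several space-separated terms in one attribute.
                self.properties.extend(value.split())


def template_to_sparql(template, resource_iri, prefixes=None):
    """Build a SELECT with one OPTIONAL pattern per RDFa property in the template."""
    prefixes = prefixes or {"rdfs": RDFS}
    collector = PropertyCollector()
    collector.feed(template)
    collector.close()
    if not collector.properties:
        raise ValueError("template contains no RDFa property attributes")
    variables = [f"?v{i}" for i in range(len(collector.properties))]
    header = "".join(f"PREFIX {p}: <{iri}>\n" for p, iri in prefixes.items())
    body = "\n".join(
        f"  OPTIONAL {{ <{resource_iri}> {prop} {var} }}"
        for prop, var in zip(collector.properties, variables)
    )
    return f"{header}SELECT {' '.join(variables)} WHERE {{\n{body}\n}}"


template = '<div><h1 property="rdfs:label"></h1><p property="rdfs:comment"></p></div>'
print(template_to_sparql(template, "http://example.org/id/substance/42"))
```

Each OPTIONAL keeps the page renderable when a property has no value; a real template engine also has to handle nesting, links between resources and mapping result columns back into the markup.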


Once we had modeled the data and validated it with SMEs, we converted it and loaded it into Callimachus. We spent about one hour creating templates to view the data in Callimachus. So here is the power of Linked Open Data (LOD) in action: within one hour, we could view the data, navigate through it and verify its contents without being a DBA or Java developer!


Callimachus' forms-driven interface allows authorized users to modify the underlying triples in the database: we are round-tripping create/modify/delete operations to a triple store via a Web page!
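One way to round-trip a form edit to a triple store is to compare the field values before and after editing and emit a SPARQL 1.1 Update request that deletes the old triples and inserts the new ones. The sketch below illustrates that technique; it is not Callimachus's implementation. The resource URI and values are hypothetical, and it assumes plain literals without language tags (a `DELETE DATA` for `"Benzne"` would not remove `"Benzne"@en`).

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def nt_literal(text):
    """Quote a plain string as an N-Triples/SPARQL literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def form_edit_to_update(subject, before, after):
    """Turn one form submission into a SPARQL 1.1 Update request.

    `before` and `after` map predicate IRIs to the literal values the form
    showed before and after editing; unchanged fields produce no triples.
    """
    deletes, inserts = [], []
    for pred in sorted(set(before) | set(after)):
        old, new = before.get(pred), after.get(pred)
        if old == new:
            continue
        if old is not None:
            deletes.append(f"<{subject}> <{pred}> {nt_literal(old)} .")
        if new is not None:
            inserts.append(f"<{subject}> <{pred}> {nt_literal(new)} .")
    operations = []
    if deletes:
        operations.append("DELETE DATA {\n  " + "\n  ".join(deletes) + "\n}")
    if inserts:
        operations.append("INSERT DATA {\n  " + "\n  ".join(inserts) + "\n}")
    # Operations in one request are separated by ";" and run in order.
    return " ;\n".join(operations)


subject = "http://example.org/id/substance/42"
update = form_edit_to_update(
    subject,
    before={RDFS + "label": "Benzne"},
    after={RDFS + "label": "Benzene", RDFS + "comment": "Fixed the misspelled name."},
)
print(update)
```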


Note the fixed name and added comment.


A history of changes is kept. Note the change to the name and the added comment, along with the time/date
and name of the user who made the edit.
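A change history like this can be kept as an append-only log in which each entry records the user, a UTC timestamp, and the triples removed and added. The sketch below is a hypothetical in-memory data structure illustrating the idea, not how Callimachus stores its history; the prefixed terms are placeholders.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Change:
    """One recorded edit: who made it, when, and which triples it removed/added."""
    user: str
    timestamp: datetime
    removed: frozenset
    added: frozenset


@dataclass
class VersionedGraph:
    triples: set = field(default_factory=set)
    history: list = field(default_factory=list)

    def apply(self, user, removed=(), added=(), when=None):
        """Apply an edit to the current triples and append it to the history."""
        change = Change(user, when or datetime.now(timezone.utc),
                        frozenset(removed), frozenset(added))
        self.triples -= change.removed
        self.triples |= change.added
        self.history.append(change)  # entries are never modified or deleted
        return change

    def changes_to(self, subject):
        """History entries that touch a given subject, oldest first."""
        return [c for c in self.history
                if any(t[0] == subject for t in c.removed | c.added)]


g = VersionedGraph()
g.apply("alice", added=[("ex:s42", "rdfs:label", "Benzne")])
g.apply("bob",
        removed=[("ex:s42", "rdfs:label", "Benzne")],
        added=[("ex:s42", "rdfs:label", "Benzene"),
               ("ex:s42", "rdfs:comment", "Fixed the misspelled name.")])
for c in g.changes_to("ex:s42"):
    print(c.user, c.timestamp.isoformat())
```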


Callimachus view page of the SRS, created in less than an hour. Someone with HTML, CSS and RDFa/SPARQL skills can create this type of page. No understanding of semantics or deep RDF knowledge is required.


Notice the wiki-like editing capabilities of a Callimachus page!

Web 2.0 developers can create data-driven applications with templates in hours
Triples up & down (no MySQL under the covers)
Wiki editing of content
Access control
Collaboration via Web
Change tracking (history)
Page/form Templates


Callimachus is a great way to collaboratively manage your Linked Data.
MediaWiki is to free text what Callimachus is to Linked Data.
Callimachus uses a straightforward access control list (ACL) for Linked Data.
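A straightforward ACL for Linked Data can be as small as a table from resource URI prefixes to the groups allowed to read or write. The sketch below is hypothetical and is not Callimachus's ACL model: the prefixes, group names and the longest-prefix, deny-by-default rules are all assumptions chosen for illustration.

```python
# Hypothetical ACL: resource URI prefix -> operation -> groups allowed.
ACL = {
    "http://example.org/id/substance/": {"read": {"public", "editors"}, "write": {"editors"}},
    "http://example.org/id/internal/": {"read": {"staff"}, "write": {"admins"}},
}


def is_allowed(user_groups, resource, operation):
    """Grant access if the longest matching prefix lists one of the user's groups."""
    matches = [prefix for prefix in ACL if resource.startswith(prefix)]
    if not matches:
        return False  # deny by default when no rule covers the resource
    rule = ACL[max(matches, key=len)]
    return bool(rule.get(operation, set()) & set(user_groups))


print(is_allowed({"public"}, "http://example.org/id/substance/42", "read"))
```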

Join the Community
Callimachus has benefited from 2+ years of corporate support

We’re using it for real world Web applications in environmental
protection, finance and healthcare

We’d love to work with the publishing industry

Open Source project

Visit callimachusproject.org

Join the discussion


@BernHyland
Email: bhyland@3roundstones.com


Next talk today @ 14:00
Sala I - "Linked Open Government Data Workshop"
WHY SHARE AND WHO BENEFITS?
Bernadette Hyland, co-chair
W3C Government Linked Data Working Group

http://purl.org/net/bhyland/why-share-2011-10

