One-day workshop for Sentara Healthcare on using a Linked Data approach for enterprise architecture. Topics include Open Government Data initiatives; a demo of the Weather Health Web application; leveraging open data from NIH, NLM, NOAA, EPA and HHS; and Callimachus Enterprise, a Linked Data Management System for the enterprise.
Sentara Linked Data Workshop - Sept 10, 2012
1. Integrated Data for Improved
Personal Health Delivery
10-September-2012
Presenters: Bernadette Hyland, David Wood & Luke Ruth
Email: bhyland@3roundstones.com
Twitter: @BernHyland
This presentation: http://slideshare.net/3roundstones
2. Today’s Agenda
• 9.00-9.20 - (All) Introductions
• 9.20-9.45 - (Phil) Goals & objectives
• 9.45-10.30 - (Bernadette) Value proposition of Linked Data, update on
government data publishing initiatives, Health Datapalooza
• 10.30-11.10 - (David) Intro to enterprise linked data, a resource oriented
approach to interoperability
• 11.10-11.30 - Break
• 11.30-12noon - (Luke) Review of Weather Health app development
• 12.00-12.45 - lunch
• 12.45-1.30 (David) Web of data architecture, Callimachus
• 1.30-2.15 (All) Building support within Sentara, use cases for Weather
Health (Phase I), Q&A
3. Introductions ...
• Sentara team
• 3 Round Stones team
• Dave Wood, PhD - Enterprise Architect
• Bernadette Hyland - Sr. Solutions Architect
• Luke Ruth - Software Engineer
• ... All specialists in Web architecture & Linked Data
5. • Linked Data is about publishing and consuming data using
international data standards
• Based on a 20-year-old idea
• A system of linked information systems
Why am I speaking on Linked Data and sharing today? I’m here in my
role as the co-chair of the W3C Government Linked Data Working Group
(GLD WG).
I’m a serial entrepreneur in this space, having founded several companies
that led some of the most widely used Open Source projects for Linked
Data, including Mulgara, OpenRDF/Sesame, PURLs 2.0 and Callimachus.
I’ve authored a couple of peer-reviewed chapters in these books,
which are available in hardcopy or for free via the Web.
7. Jeff Pollock, Oracle
Businesses are in future shock
• Needs changing at faster pace
• Affordable Care Act,
new regulations, changes in
global economy accelerating
changes
• Information increasingly more
central to the operation of
any business
In a dynamic economy, we have to adapt quickly. We cannot change people or
hardware fast enough, so we have to take a new approach in software to deal
with this. This view comes from a director at Oracle.
Credits: (c) Random House
8. "If information systems are to
keep up with business,
we need to change more than
technology -
we need to change how people deal
with technology."
- Jeff Pollock
Of course, Jeff also said "Changes in behavior have to be well-
motivated and show some visible value immediately."
9. Goal for improved health delivery ...
• Harness larger & more complex datasets to
evaluate the potential for health impacts
• More accurately predict factors that
contribute to illness or diagnose disease
10. DATA
Drives every decision we make daily &
every decision others make on our behalf
What is happening to data? We are sharing it ...
The Web is a natural place to publish information for public dissemination.
The modern Web is an information system owned by no one and yet open to
vendors, governments and private citizens. The Web of documents has been a
great place to share HTML and PDF. However, we are now entering the Web of Data.
This is how we’ll share most open data in the next decade.
11. “We’re moving from managing
documents to managing discrete pieces of
open data and content which can be
tagged, shared, secured, mashed up and
presented in the way that is most useful
for the consumer of that information.”
-- Report on Digital Government: Building a 21st Century Platform to
Better Serve the American People
Governments around the world are defining detailed digital services plans that
are based on Open data and open APIs to deliver government and private digital
services.
At the highest level, government executives in the UK, EU, US, India, Brazil are
committed to managing open data and content in a way that is useful for the
consumer of that content. The question is HOW?
12. Sharing
Worldwide
We are sharing documents and data worldwide, routinely with people we don’t
know.
If achieved, open sharing will transform how governments interact with one
another and how they serve their citizens in the 21st century.
Using the Web to solicit input and inform decision making, and ultimately, to
create a more transparent and accessible government is a very, very worthwhile
goal.
13. Who is sharing their data ... ? Small and large commercial and government
organizations, NGOs, Non-profits ... plus many universities.
Governments in the last few years have been responding to Open Government
initiatives that mandate publishing open government data.
Some are careful, slow-moving entities who simply needed to find real solutions
to real problems.
18. Common business need ...
• The ability to integrate & manage large
amounts of data in a rigorous & transparent
manner
• Discovery through interaction of scientific
communities, including biomedical
informatics & evidence-based medicines
19. How many are doing it ...
the Web of Data
• No one vendor owns it
• It scales ... to Web-scale
• Doesn’t require a super model
• Based on International Data Exchange
Standards (RDF, SPARQL)
Scope: Bigger than any other deployed system
Infinitely adaptable: Changes piecemeal and allows for ad hoc
additions & changes.
Ownership: Nobody owns it
20. Let’s look at some ‘versions’ of the Web. It should be said here that Tim
Berners-Lee, the recognized “father” of the WWW, doesn’t like the idea
of versioning the Web. I happen to agree, but I understand why people do
it.
As we talk about these versions of the Web, you may want to think of
this as a continuum with significant waves; each with its own
benchmark technologies rather than specific versions with distinct
start and end points.
Nova Spivack of Radar Networks and Twine.com created this.
21. RDF is a lingua
franca for data
exchange
Not all Open Government content is Linked Data. A relatively small
percentage of open data is 4-5 star Linked Data, but that share is growing
quickly.
Use of structured data is actively promoted by international standards groups
like the W3C and major search engines, Google, Yahoo!, Bing, Yandex.
22. [Diagram labels: Semantic Technologies, Semantic Web, Linked Data]
Linked Open Data is a small, pragmatic portion of the greater body of
Semantic Technologies & international standards for data.
23. The 5 Stars of Open Linked Data
Guidance per Tim Berners-Lee, W3C
★ Make your stuff available on the web (any format)
★★ Make it available as structured data (e.g. Excel instead of
image scan of a table)
★★★ Use a non-proprietary format (e.g. CSV instead of Excel)
★★★★ Use URLs to identify things, so that people can point at your
stuff
★★★★★ Link your data to other people’s data to provide context
Credit: http://www.w3.org/DesignIssues/LinkedData.html
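One way to read the five stars is as a cumulative checklist: a dataset earns a star only if it also satisfies every earlier one. The sketch below is only an illustration of that reading; the `Dataset` fields and the `star_rating` function are hypothetical names, not part of any W3C tool.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical checklist, one flag per star on the slide."""
    on_web: bool            # 1 star: available on the Web in any format
    structured: bool        # 2 stars: machine-readable structured data
    open_format: bool       # 3 stars: non-proprietary format such as CSV
    uses_urls: bool         # 4 stars: URLs identify the things described
    linked_to_others: bool  # 5 stars: links out to other people's data


def star_rating(d: Dataset) -> int:
    """Count stars in order, stopping at the first unmet criterion."""
    stars = 0
    for met in (d.on_web, d.structured, d.open_format,
                d.uses_urls, d.linked_to_others):
        if not met:
            break
        stars += 1
    return stars


scanned_table = Dataset(True, False, False, False, False)
csv_download = Dataset(True, True, True, False, False)
print(star_rating(scanned_table))  # 1
print(star_rating(csv_download))   # 3
```

Because the count stops at the first gap, a dataset that links to others but ships only as a proprietary file still rates two stars under this reading.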
24. 5 Stars of Open Linked Vocabularies
Bernard Vatant (Mondeca) Guidance
★ Publish your vocabulary on the Web at a stable URI
★★ Provide human-readable documentation and basic metadata
(e.g. creator, publisher, date of creation, last modification,
version number)
★★★ Provide labels and descriptions, if possible in several
languages, to make your vocabulary usable in multiple
linguistic scopes
★★★★ Make your vocabulary available via its namespace URI, both
as a formal file and human-readable documentation, using
content negotiation
★★★★★ Link to other vocabularies by re-using elements rather than
re-inventing
Credit: http://blog.hubjects.com/2012/02/is-your-linked-data-vocabulary-5-star_9588.html
25. Why is RDF important?
• It is an international standard for publishing data on
the Web (public and private)
• Data exchange model
• It is the future of the Web
• ... because it is how we share and reuse data
Leading publishers, HCLS scientists, library scientists, new media,
old media, retailers have all committed to structured data for
improved search & access.
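To make the "data exchange model" point concrete, here is a minimal sketch in plain Python (no RDF library) of what an RDF statement looks like and why data from two publishers combines so easily. The `example.org` identifiers and the `Triple` and `to_ntriples` names are assumptions made up for illustration; the `rdfs:label` and `owl:sameAs` property URIs and the N-Triples line format are standard.

```python
from typing import NamedTuple

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
ASTHMA = "http://example.org/condition/asthma"  # hypothetical identifier


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool = False


def to_ntriples(triples) -> str:
    """Serialize triples as N-Triples lines (plain string literals only)."""
    lines = []
    for t in sorted(triples):
        if t.obj_is_literal:
            escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{t.obj}>"
        lines.append(f"<{t.subject}> <{t.predicate}> {obj} .")
    return "\n".join(lines)


# Two publishers describe the same resource independently.
internal = {Triple(ASTHMA, RDFS_LABEL, "Asthma", True)}
public = {Triple(ASTHMA, OWL_SAME_AS, "http://dbpedia.org/resource/Asthma")}

# Because identifiers are global URLs, merging is just a set union.
merged = internal | public
print(to_ntriples(merged))
```

The design point is that no shared schema or "super model" is needed before the merge; agreement on identifiers is enough.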
26. WE’VE SEEN THIS BEFORE
Like HTML and RDF, credit cards have a human-readable side and a machine-readable side.
27. Each HTML page is paired with a machine-readable data representation.
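The pairing of a human-readable page with a machine-readable representation is usually delivered through HTTP content negotiation: the client states what it prefers in the `Accept` header and the server picks a matching representation of the same resource. The sketch below shows the selection logic only, under simplifying assumptions: the file names and the `choose` and `parse_accept` helpers are hypothetical, and partial wildcards such as `text/*` are not handled.

```python
def parse_accept(header: str):
    """Return (media_type, q) pairs, most preferred first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # HTTP default when no q parameter is given
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


# Hypothetical representations of one resource.
REPRESENTATIONS = {"text/html": "asthma.html", "text/turtle": "asthma.ttl"}


def choose(header: str, available=REPRESENTATIONS, default="text/html") -> str:
    """Pick the media type to serve for a given Accept header."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return default
    return default  # assumption: fall back to HTML rather than send 406


print(choose("text/html,application/xhtml+xml,*/*;q=0.8"))  # text/html
print(choose("text/turtle;q=0.9, text/html;q=0.5"))         # text/turtle
```

A browser and a data client can therefore request the same URL and each receive the side of the "credit card" it can read.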
28. Open Government Data
3 brief years ...
• Starting in 2008, a few heads of state directed open
government data to be published on the Web
• In September 2011, Presidents Obama (USA) and
Rousseff (Brazil) endorsed the Open Government
Partnership
• 7 other nations launched their government’s National
Plans during the meeting of the UN General Assembly
Beginning in 2008, a couple of heads of state directed open
government data to be published on the Web. In September 2011,
President Obama and President Dilma Rousseff stood with other heads of
state to endorse the principles of the Open Government Partnership and
launch their governments’ Open Government National Plans during the
meeting of the UN General Assembly.
In addition to Brazil and the US, nations that have made commitments
include: Indonesia, Mexico, Norway, Philippines, South Africa, and the
UK.
29. What is next for Data?
• Structured data on the Web is rapidly becoming
mainstream
• Government authorities are funding more Linked
Open Data projects, especially for weather, human
health and scientific research
• In 2012 we’re seeing Apps Challenges, hack-a-thons,
funding ($1M-$200M)
What’s next? We are already seeing signs of the things to come.
Structured data on the Web is quickly becoming mainstream.
There have been many well-publicized triple challenges, hack-a-thons, apps challenges
-- they are popping up everywhere.
Organizations with mission critical applications based on relational technologies are
creating a layer above their traditional architectures and building Linked Data-driven
Web apps.
Web apps based on LD are beginning to replace data warehouses.
30. Publishing data in 2012 & beyond ...
• Good = Use Data Standards (RDF) to publish
metadata about data and models
• Better = Use a Linked Data approach to publish all
your open data on the Web
• Best = Link your data + models using a Linked Data
approach
• Web architecture, Web-scale
31.
32. [Diagram labels: Open Government Data (CDC, EPA, US Census); Linked Data Cloud (DBpedia, PubMed, NLM); Clinical Ontology; Business Ontology; Social Media Data (Facebook, Twitter); Internal Portal; Physicians; EMR; Services; Data Locations; Clinical Condition Specific]
33. Methodology
1. Define target population and clinical data from
electronic medical record
2. Identify sources of open government data related to
environmental, weather, and other variables related to
chronic pulmonary disease exacerbations
3. Combine open content from NLM, PubMed, Medline to
support education
4. Leverage a Linked Data approach, using Open Source
and international data exchange standards (RDF)
5. Alert patient of possible hazardous conditions and
recommend appropriate actions
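The five steps above can be sketched end to end in a few lines. This is a toy illustration under stated assumptions, not the Weather Health implementation: the `Patient` record, the ZIP codes, the sample readings and the `build_alerts` function are all made up, and the threshold of 101 is simply the point where the EPA Air Quality Index scale enters "Unhealthy for Sensitive Groups".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical, anonymized EMR extract (step 1)."""
    patient_id: str
    zip_code: str
    condition: str


# Assumed alert level: AQI 101+ is "Unhealthy for Sensitive Groups".
AQI_ALERT_THRESHOLD = 101


def build_alerts(patients, aqi_by_zip, threshold=AQI_ALERT_THRESHOLD):
    """Join patients to open air-quality data by ZIP code (steps 2-4)
    and return (patient_id, message) pairs for hazardous readings (step 5)."""
    alerts = []
    for p in patients:
        aqi = aqi_by_zip.get(p.zip_code)
        if aqi is not None and aqi >= threshold:
            message = (f"Air quality index is {aqi} in {p.zip_code}; "
                       "consider limiting outdoor activity.")
            alerts.append((p.patient_id, message))
    return alerts


patients = [
    Patient("p-001", "23507", "asthma"),
    Patient("p-002", "23451", "COPD"),
]
aqi_by_zip = {"23507": 135, "23451": 42}  # made-up sample readings

for patient_id, message in build_alerts(patients, aqi_by_zip):
    print(patient_id, message)
```

In a Linked Data setting the join key would be a shared URL for the location rather than a bare ZIP string, but the shape of the join is the same.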
34. Iterative Approach
• Initial POC delivered May 2012 (60 day sprint)
• EMR (anonymized)
• EPA air quality
• Doctors listing (spreadsheet)
• Demo’d at Health Datapalooza, Washington DC in June
35. Health Data Initiative Forum III: Health Datapalooza
Using EMR and Linked Open Data to Manage Chronic Asthma and COPD
36. Conceptual Model
Patients with chronic pulmonary disease that are educated and notified of
adverse environmental, weather, and geographic conditions are ... better
able to respond and proactively manage their condition.
37. Value Proposition Model
• Decrease in costly Emergency Department visits
• Reduce hospital re-admissions after treatment
• Improve self-care and medication compliance
• Awareness of triggers and disease management
38. Big data ecosystem includes
complex data
A phased approach to delivering a successful Weather Health Explorer
application starts with selecting data sources that are both available and
reliable. For these reasons, authoritative government sources, including
the National Library of Medicine (NLM), the National Oceanic and
Atmospheric Administration (NOAA) and the US Environmental Protection
Agency (EPA), have been selected for use in this project.
39. Leverage Linked Data, Open Source & Standards
[Diagram labels: Semantic Framework; Web of Data; sources CDC, DBpedia, EPA, PubMed, US Census, NLM, EMR; channels SMS, Email, Web]
Callimachus is a Linked Data Management platform that takes full advantage of RDF and
data-driven navigation. It was created with Web 2.0 developers in mind.
Typical uses: governments providing citizens with access to open government data;
corporations providing the public, customers, suppliers and regulators with timely
information on the corporation; research portals, etc.
40.
41. Today’s Asthma Forecast
[Screenshot labels: Current EPA Data; Anticipate and Prevent; Patient Admission Data by Date; Historic EPA Data at Admission]
42. Progress Update
• June - Sept 2012
• Designed Weather Health Web application
• Identified data sources (NIH, NOAA, EPA)
• Created a Web based application with live data feeds from
NIH, NOAA & EPA
• Hosted on the cloud using a linked data management
system, Callimachus
47. The NLM will function as the primary source for drug-related information.
The NLM publishes multiple APIs that could be of use to this project but
the most immediately beneficial will probably be one called DailyMed.
DailyMed is an API that offers access to current Structured Product Label
(SPL) information for drugs.
49. Drug information may also be taken from a service called MedlinePlus -
which is organized and distributed by the National Library of Medicine,
National Institutes of Health, and the Department of Health and Human
Services. Upgrades are currently being done to MedlinePlus which will
include the ability to return an XML document as opposed to a search
results page. This feature would be extremely useful and if fully functional,
may make MedlinePlus the logical choice for primary drug information.
50. Hosted on cloud
[Architecture diagram labels: Public users (HTTP/HTTPS); Callimachus (application) on an M2.2XLarge instance; Additional attached storage (EBS - 50 GB); Periodic snapshots (backup) (S3 - 50 GB); Off-site backups; Monitoring Service (application-level monitoring, system-level monitoring); SNS; Email/SMS notifications; Administration]
51. In summary, Weather Health ...
• Leverages internal and external structured data on the Web
• All data from authoritative sources
• Involves a combination of static and dynamic data
• Hosted on the cloud using AWS
• Created using a linked data management system
• Callimachus enables Web 2.0 internal or contract
developers to combine data sources & quickly build a
web UI for Web or mobile devices
The Weather Health application can also serve to warn patients of drug interactions or advise
them on dosage. There is also an opportunity for smaller modules within the application, such as pill
identification using imprint data. This application was built using Callimachus, a data platform
for data-driven applications. Callimachus allows Web 2.0 developers within Sentara or external
developers to combine multiple data sources and quickly build a Web UI.
The basic architecture for the Weather Health solution involves a combination of both static (or
pseudo-static) and dynamic data.
53. Web of Data
• Resource oriented approach to data interoperability
• Callimachus Overview
• Maturity of ecosystem
• Development environments, reporting tools, databases,
hosting, commercial support & training
• Next steps, an iterative approach
54. A History of Silos
• 1970s: A neat little package ($ cat foo.txt | grep blah | sort)
• 1980s: Client-Server
• 1990s: The Early Web
55. The Next Great Leap
[Diagram labels: Web of Data; Extending the Universal Client; Expanding the Universal Connection; Explaining the Logic; Providing the Universal Database; Ubiquitous, reusable applications]
57. Requirements of The Informatics Landscape
Maximal Agility
• Must span the entire drug development lifecycle
  ◦ and back (post-market surveillance to discovery)
• Must support large and very heterogeneous data
  ◦ single nucleotide polymorphisms to countries
• Will change as new science emerges & new regulations come into play
  ◦ Medline just under 1M articles/year
• Must be able to work with multiple, international regulatory bodies
  ◦ Emerging markets
• Partners, customers and collaborators will change
  ◦ and will have divergent technical aptitudes
• Must be able to interoperate with pre-competitive consortia
  ◦ Can they perform common tasks for the community
• Must be able to work with legacy data
  ◦ Lots of unmined gems here!
Slide credit: Tom Plaster, PhD, AstraZeneca R&D | RDI
58. Improving Internal Interoperability
Scientists, Clinicians, Informaticists can now freely interoperate as:
• The PURL server provides a central identity management authority for
resources that are of value (need to persist) across the enterprise. The
Persistent URLs are used to connect resources found in multiple locations
• The vocabulary server provides a way of harmonizing concepts across
different domains
  ◦ Where possible, public vocabularies are used
  ◦ Where not, they’re extended
  ◦ We don’t want to develop and maintain vocabularies
Slide credit: Tom Plaster, PhD, AstraZeneca
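The PURL idea on this slide can be illustrated with a small lookup table: clients only ever hold the persistent URL, and the server answers with an HTTP redirect to wherever the resource currently lives. The sketch below is a toy model, not the PURL server's actual interface; the paths, target URLs and the `resolve` and `move` helpers are hypothetical.

```python
# Persistent path -> (HTTP redirect status, current location).
# All entries are made-up examples.
purl_table = {
    "/vocab/asthma": (303, "http://vocab.example.org/respiratory/v1#Asthma"),
    "/data/air-quality": (302, "http://data.example.org/epa/aqi/2012"),
}


def resolve(table, path):
    """Return (status, location) for a persistent path, or (404, None)."""
    return table.get(path, (404, None))


def move(table, path, new_location):
    """Re-point an existing PURL; the persistent path itself never changes."""
    status, _old_location = table[path]
    table[path] = (status, new_location)


print(resolve(purl_table, "/vocab/asthma"))

# The vocabulary is republished elsewhere; links held by other systems
# keep working because they refer to the persistent path only.
move(purl_table, "/vocab/asthma",
     "http://vocab.example.org/respiratory/v2#Asthma")
print(resolve(purl_table, "/vocab/asthma"))
print(resolve(purl_table, "/unknown"))
```

This is the sense in which the PURL server acts as a central identity authority: the table is the single place that changes when a resource moves.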
59. • Callimachus is a framework for data-driven applications
based on Linked Data principles
• Callimachus allows Web developers to easily create data
driven applications for the Web
• It is Open Source (FLOSS)
• http://callimachusproject.org
60. Tools & best practices?
• Large and small vendors are involved in Linked Data
• From Oracle, IBM to 3 Round Stones
• Listing of active research projects & deployments, see
http://dir.w3.org/
• Best practices, see http://www.w3.org/2011/gld/charter
61. W3C HCLS
The mission of the Semantic Web Health Care and Life Sciences Interest
Group (HCLS IG) is to develop, advocate for, and support the use of
Semantic Web technologies across health care, life sciences, clinical
research and translational medicine
• Activities:
  ◦ Continue to develop high level (e.g. TMO) and architectural (e.g. SWAN)
vocabularies.
  ◦ Implement proof-of-concept demonstrations and industry-ready code.
  ◦ Document guidelines to accelerate the adoption of the technology.
  ◦ Disseminate information about the group's work at government, industry, academic
events and by participating in community initiatives.
• Use Cases/Domains
  ◦ Drug Discovery
  ◦ Electronic Lab Notebooks
  ◦ Comparator Arm Data
  ◦ Patient Data Ownership
  ◦ Biotech Acquisition
  ◦ Supply Chain Automation
  ◦ Web Integration
  ◦ Bio-surveillance
  ◦ Co-development
Reference: http://www.w3.org/blog/hcls/ Slide credit: Tom Plaster, PhD, AstraZeneca