Pushing back, standards and standard organizations in a Semantic Web enabled world
1. “Pushing Back”
Standards and
Standard Organizations
in a Semantic Web Enabled World
Kerstin Forsberg
Informatics Scientist
AstraZeneca
Mölndal, Sweden
Image: Flickr bitpuddle
(Twitter @eric_d_hancock)
2. Purpose
Encourage standard organisations to
“Use Standards for Standards”
Agenda
• Standards for Data and Semantics
• Examples of Standard Organizations
now looking into using the Semantic Web
• Provenance/Justification for Mappings
Kerstin Forsberg | SWAT4LS, Dec 10th 2013
AZIT | R&D Information
3. Kerstin Forsberg (@kerfors)
“Information architect, semantic web and linked data enthusiast
caring about clinical trial data.”
• “Volvo Web Wave Project” 1995-1997
W3C conferences 1996 & 1999, Dublin Core, RDF
• “Extensible use of RDF in a business context”
paper presented at the W3C WWW9 conference, 2000,
Amsterdam
• “Advancing translational research with the Semantic Web”
joint W3C HCLS paper in BMC Bioinformatics, 2007
• “Linked data, an opportunity to mitigate complexity in
pharmaceutical research and development”
Summary of experiences from LarKC and W3C HCLS,
2011, together with my colleague Bosse Andersson
4. About AstraZeneca
• Alongside our own R&D, we partner
with others, combining skills and
resources to broaden the potential
for successful innovation.
• We believe that only by working together with others
who have a part to play in improving healthcare can real
progress be made.
• We work closely with others in the healthcare
community, including physicians and those who pay for
healthcare, to understand their challenges and how we
can combine skills and resources to achieve a
common goal: improved health.
5. AstraZeneca’s view on “Semantics”
Enabling the hyperconnected enterprise
“We need to build a linked
data architecture enabling us
to ask questions and solve
business problems across a
heterogeneous information
landscape extending beyond
the traditional boundaries of
the enterprise.”
semanticsconnectsusall
6. Standards for Data and Semantics
Different types of standards
• Entity-based Ontologies
• Concept-based Terminologies/Code systems
• Code lists/Value sets/Term sets
• Data exchange (Tabulated data)
• Information Models
7. Standards for Data and Semantics
Examples
• Entity-based Ontologies
• Concept-based Terminologies/Code systems
• Code lists/Value sets/Term sets
• Data exchange (Tabulated data)
• Information Models
8. “Pushing back” – Use standards for standards
1. NCI (National Cancer Institute) Thesaurus
• Entity-based Ontologies
• Concept-based Terminologies/Code systems
• Code lists/Value sets/Term sets
• Data exchange (Tabulated data)
• Information Models
9. “Pushing back” – Use standards for standards
AZ Vocabulary Management team shared this with NCI EVS
• The NCI Thesaurus is an extensive medical
vocabulary published by the US National Institutes
of Health: http://ncit.nci.nih.gov/
• It is made available in several downloadable
formats: http://evs.nci.nih.gov/ftp1/NCI_Thesaurus
• In order for us to use the thesaurus in our system,
we need to convert it to RDF, following the SKOS
standard: http://www.w3.org/2004/02/skos/
Jim Morris, Informatics Scientist
AstraZeneca R&D Wilmington, USA
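The conversion described above can be sketched with the Python standard library. This is a minimal illustration, not the converter the AZ team used: it assumes a simplified tab-separated input of concept code, preferred name and "|"-separated parent codes (the real NCI Thesaurus download has more columns, so check the EVS documentation), and `NCIT_BASE` is a hypothetical placeholder rather than the official NCIt namespace.

```python
# Minimal sketch (not the AstraZeneca converter): render NCI Thesaurus-style
# rows as SKOS concepts in Turtle, using only the standard library.
# Assumptions: each input row is "code<TAB>preferred name<TAB>parent codes",
# with parents separated by "|"; NCIT_BASE is a hypothetical placeholder IRI.

NCIT_BASE = "http://example.org/ncit/"


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def ncit_rows_to_skos(tsv_text: str) -> str:
    """Emit one skos:Concept per row, with skos:prefLabel and skos:broader links."""
    out = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix ncit: <{NCIT_BASE}> .",
        "",
        "ncit:Thesaurus a skos:ConceptScheme .",
    ]
    for row in tsv_text.splitlines():
        if not row.strip():
            continue  # ignore blank lines
        parts = row.split("\t")
        code, name = parts[0], parts[1]
        parents = parts[2] if len(parts) > 2 else ""
        out.append(f"ncit:{code} a skos:Concept ;")
        out.append("    skos:inScheme ncit:Thesaurus ;")
        for parent in filter(None, parents.split("|")):
            out.append(f"    skos:broader ncit:{parent} ;")
        out.append(f"    skos:prefLabel {turtle_literal(name)}@en .")
    return "\n".join(out)


if __name__ == "__main__":
    # Made-up sample rows; real codes and labels come from the EVS download.
    sample = "C1\tNeoplasm\t\nC2\tBenign Neoplasm\tC1\n"
    print(ncit_rows_to_skos(sample))
```

Hand-writing Turtle is only reasonable for a sketch; an RDF library such as rdflib would take care of escaping and prefix handling.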
10. “Pushing back” – Use standards for standards
2. MedDRA (Medical Dictionary for Regulatory Activities)
• Entity-based Ontologies
• Concept-based Terminologies/Code systems
• Code lists/Value sets/Term sets
• Data exchange (Tabulated data)
• Information Models
11. “Pushing back” – Use standards for standards
AZ Vocabulary Management team shared this with MedDRA MSSO
A very simple SKOS rendering
of MedDRA:
• term: skos:Concept
• hierarchy level: skos:ConceptScheme
• SMQ: skos:Collection
The approach should be augmented with a
VoID representation of MedDRA
versions, and with term properties
distinguishing active from inactive
terms.
skos:Collection is likely not sufficient
to support SMQ versioning, nor the
context of terms in an SMQ (e.g.
weight).
Courtland Yockey, Informatics Scientist
AstraZeneca R&D Wilmington, USA
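The mapping on this slide (term to skos:Concept, hierarchy level to skos:ConceptScheme, SMQ to skos:Collection) can be sketched as below. The namespace, codes and names are hypothetical, rendering parent links as skos:broader is an assumption, and the VoID versioning and active/inactive flags the slide calls for are deliberately left out.

```python
from dataclasses import dataclass, field

# Minimal sketch of the SKOS rendering on this slide. The namespace, codes and
# names are hypothetical; versioning (VoID) and active/inactive status are not modelled.
MEDDRA_BASE = "http://example.org/meddra/"


@dataclass
class MeddraTerm:
    code: str
    name: str
    level: str  # hierarchy level: "SOC", "HLGT", "HLT", "PT" or "LLT"
    parents: list[str] = field(default_factory=list)


def quote(text: str) -> str:
    """Turtle string literal with backslashes and double quotes escaped."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def meddra_to_skos(terms: list[MeddraTerm], smqs: dict[str, list[str]]) -> str:
    """Term -> skos:Concept, hierarchy level -> skos:ConceptScheme, SMQ -> skos:Collection."""
    out = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix meddra: <{MEDDRA_BASE}> .",
        "",
    ]
    # One concept scheme per hierarchy level present in the input.
    for level in sorted({t.level for t in terms}):
        out.append(f"meddra:level_{level} a skos:ConceptScheme .")
    for t in terms:
        out.append(f"meddra:{t.code} a skos:Concept ;")
        out.append(f"    skos:inScheme meddra:level_{t.level} ;")
        for parent in t.parents:  # assumption: parent links rendered as skos:broader
            out.append(f"    skos:broader meddra:{parent} ;")
        out.append(f"    skos:prefLabel {quote(t.name)}@en .")
    # An SMQ becomes a plain collection; skos:member cannot carry a term weight.
    for smq_id, members in smqs.items():
        if members:
            member_list = ", ".join(f"meddra:{m}" for m in members)
            out.append(f"meddra:smq_{smq_id} a skos:Collection ;")
            out.append(f"    skos:member {member_list} .")
        else:
            out.append(f"meddra:smq_{smq_id} a skos:Collection .")
    return "\n".join(out)


if __name__ == "__main__":
    terms = [
        MeddraTerm("100", "Example HLT", "HLT"),
        MeddraTerm("200", "Example PT", "PT", parents=["100"]),
    ]
    print(meddra_to_skos(terms, {"1": ["200"]}))
```

Because skos:member is a bare link, the weight of a term within an SMQ has nowhere to go, which is the limitation noted on the slide.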
12. “Pushing back” – Use standards for standards
3. CDISC (Clinical Data Interchange Standards Consortium)
• Entity-based Ontologies
• Concept-based Terminologies / Code systems
• Code lists/Value sets/Term sets
• Data exchange (Tabulated data)
• Information Models
13. Standards for Data Exchange
Clinical trial data in standardized “containers”
Patient level
Trial Summary level
Submission standard SDTM:
“designed so [FDA] reviewers with
no tools other than perhaps the SAS
Viewer would be able to open a
dataset and browse it easily”.
14. Standards for Data Exchange
Documentation of standardized “containers”
Human-readable
documentation in 200+
page PDFs and Excel files (and
some in XML).
15. Standards for Data Exchange
Data in standardized “containers”
CDISC SDTM
Implementation
Guide (IG)
Humans can connect data to data
standards.
16. Standards for Data Exchange
Documentation of standard fragments
1. CDISC SDTM Model
2. CDISC SDTM Implementation Guide (IG)
3. CDISC SDTM Controlled Terminology
Humans can connect data to data
standards and connect the
different standard fragments to
each other.
17. Standards for Data Exchange
Linked Clinical Data Standards
• CDISC2RDF started as a cross-pharma precompetitive project with AstraZeneca, Roche,
W3C et al. to showcase Semantic Web
standards and Linked Data principles.
• It became part of the Semantic Technology
project, an FDA/PhUSE working group for
Emerging Technologies, with 30+ representatives
from FDA, CDISC, pharma companies, CROs
and software vendors.
• First phase: Representing existing
“container” standards (SDTM, CDASH,
SEND, ADaM) in RDF.
18. Standards for Data Exchange
Linked Clinical Data Standards
Human-readable documentation in
PDFs and Excel files (and some in XML)
Machine-processable linked
data structured as RDF triples
(160,000+)
Serializations of RDF triples
in Turtle and XML …
https://github.com/phuse-org/rdf.cdisc.org
19. Standards for Data Exchange
Linked Clinical Data Standards
Meta Model Schema (mms)
Based on the core ISO 11179 model
(metadata for data elements), plus
a few CDISC-specific classes and properties
Import files
Annotated Excel files from CDISC with
classes and properties from the schemas,
ready to transform to RDF triples
using an off-the-shelf tool
(TopQuadrant's TopBraid Composer)
Human-readable documentation in
PDFs and Excel files (and some in XML)
Machine-processable linked
data structured as RDF triples
(160,000+)
Serializations of RDF triples
in Turtle and XML …
https://github.com/phuse-org/rdf.cdisc.org
20. Standards for Data Exchange
Annotating existing standards
Meta Model Schema (mms)
Based on the core ISO 11179 model
(metadata for data elements), plus
a few CDISC-specific classes and properties
This turned out to be a good
way to help people
knowledgeable in CDISC but new
to RDF schemas understand the
process of “triplification”.
Import files
Annotated Excel files from CDISC with
classes and properties from the schemas,
ready to transform to RDF triples
using an off-the-shelf tool
(TopQuadrant's TopBraid Composer)
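The "triplification" of an annotated import file can be pictured with a small sketch: the header row carries the schema annotation (one RDF property per column) and each spreadsheet row becomes a set of triples. This is not how the off-the-shelf tool works internally; the column layout, the `sdtm:`/`mms:` terms and the prefix IRIs are hypothetical.

```python
import csv
import io

# Minimal sketch of "triplification": a spreadsheet whose header row has been
# annotated with schema terms (one property per column) is turned into triples.
# The column layout, the sdtm:/mms: terms and the prefix IRIs are hypothetical.
PREFIXES = """@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix mms: <http://example.org/mms#> .
@prefix sdtm: <http://example.org/sdtm#> .
"""

ANNOTATED_CSV = """id,rdf:type,mms:dataElementName,mms:dataElementLabel
sdtm:DM.SEX,mms:DataElement,SEX,Sex
sdtm:DM.AGE,mms:DataElement,AGE,Age
"""


def triplify(csv_text: str) -> list[tuple[str, str, str]]:
    """Return (subject, predicate, object) triples, one per non-empty cell."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = row.pop("id")
        for predicate, value in row.items():
            if not value:
                continue  # empty cells produce no triple
            if predicate == "rdf:type":
                obj = value  # class names are used as-is
            else:
                escaped = value.replace("\\", "\\\\").replace('"', '\\"')
                obj = f'"{escaped}"'  # all other cells become string literals
            triples.append((subject, predicate, obj))
    return triples


def to_turtle(triples: list[tuple[str, str, str]]) -> str:
    """Serialize the triples as Turtle, one statement per line."""
    return PREFIXES + "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_turtle(triplify(ANNOTATED_CSV)))
```

Keeping the annotation in the spreadsheet header is what lets people who know the CDISC content review the mapping without reading RDF.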
21. CDISC and NCIT
Value sets are an issue
• Concept-based Terminologies / Code systems
• Code lists/Value sets/Term sets
mms:PermissibleValue
mms:ValueDomain
• Data exchange (Tabulated data)
mms:DataElement
mms:Dataset
mms:DataCollectionForm
22. Standards for Data Exchange
Cross standard review and mappings
Data Elements [SDTM, ADaM, CDASH] “haveSame” Value Domain (CT)
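A cross-standard review of this kind amounts to grouping data elements by their value domain and keeping the domains used by more than one standard. Against the RDF this would typically be a SPARQL query; the Python sketch below uses in-memory tuples instead, and every element and codelist identifier in it is made up for illustration.

```python
from collections import defaultdict

# Minimal sketch of a cross-standard review: which value domains (controlled
# terminology codelists) are shared by data elements from different standards?
# In practice this would be a SPARQL query over the RDF; the identifiers below
# are made up for illustration.
ELEMENTS = [
    # (data element, standard, value domain)
    ("SDTM:DM.SEX", "SDTM", "CT:SEX"),
    ("CDASH:DM.SEX", "CDASH", "CT:SEX"),
    ("ADaM:ADSL.SEX", "ADaM", "CT:SEX"),
    ("SDTM:DM.AGEU", "SDTM", "CT:AGEU"),
]


def shared_value_domains(elements):
    """Map each value domain used by more than one standard to its (standard, element) pairs."""
    by_domain = defaultdict(list)
    for element, standard, domain in elements:
        by_domain[domain].append((standard, element))
    return {
        domain: sorted(users)
        for domain, users in by_domain.items()
        if len({standard for standard, _ in users}) > 1
    }


if __name__ == "__main__":
    for domain, users in shared_value_domains(ELEMENTS).items():
        print(domain, "is shared by", ", ".join(element for _, element in users))
```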
23. Provenance/Justification for Mappings
Example from EU project SALUS for Post Market Safety Studies
The example shows the hierarchy of cardiac disorders in both the MedDRA and
SNOMED-CT concept schemes, expressed using the skos:broader property. Mappings between
similar concepts in both concept schemes are stated using the skos:exactMatch property.
From: SALUS Harmonized Ontology for Post Market Safety Studies
24. Provenance/Justification for Mappings
Example from EU project SALUS for Post Market Safety Studies
MedDRA:10028596 skos:exactMatch SNOMEDCT:22298006
The example shows the hierarchy of cardiac disorders in both the MedDRA and
SNOMED-CT concept schemes, expressed using the skos:broader property. Mappings between
similar concepts in both concept schemes are stated using the skos:exactMatch property.
From: SALUS Harmonized Ontology for Post Market Safety Studies
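The structure described in the SALUS example (a skos:broader hierarchy in each concept scheme, joined by skos:exactMatch) can be explored with a short sketch that walks both hierarchies for a mapped pair. Only the two mapped codes come from the slide; the parent identifiers are hypothetical placeholders, and a dictionary of lists stands in for an RDF store.

```python
# Minimal sketch: follow skos:broader upwards in both concept schemes and look
# at a skos:exactMatch pair side by side. Only the two mapped codes come from
# the slide; the parent identifiers are hypothetical placeholders.
BROADER = {
    "MedDRA:10028596": ["MedDRA:hlt_example"],
    "MedDRA:hlt_example": ["MedDRA:soc_cardiac_disorders"],
    "SNOMEDCT:22298006": ["SNOMEDCT:parent_example"],
    "SNOMEDCT:parent_example": ["SNOMEDCT:heart_disease_example"],
}
EXACT_MATCH = [("MedDRA:10028596", "SNOMEDCT:22298006")]


def ancestors(concept, broader):
    """All concepts reachable via skos:broader (what skos:broaderTransitive would hold)."""
    seen, stack = [], list(broader.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(broader.get(parent, []))
    return seen


if __name__ == "__main__":
    for source, target in EXACT_MATCH:
        print(source, "->", ancestors(source, BROADER))
        print(target, "->", ancestors(target, BROADER))
```

Note that the exactMatch pair itself records nothing about who asserted it or on what grounds, which is the provenance gap this part of the talk addresses.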
25. Provenance/Justification for Mappings
Alternative: Mappings as LinkSets
The Dataset Descriptions for the Open
Pharmacological Space is a specification for
the metadata to describe datasets, and the
LinkSets that relate them.
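A set of mappings described as a LinkSet could look roughly like the Turtle generated below, where dataset-level metadata travels with the links. The `void:`, `skos:` and `dcterms:` terms come from the published VoID, SKOS and DCMI vocabularies; the `ex:` resources, the `ex:justification` property and the `meddra:`/`snomedct:` prefix IRIs are hypothetical, and the properties the Open PHACTS specification actually requires are not reproduced here.

```python
# Minimal sketch: describe a set of mappings as a VoID Linkset so that the
# links carry dataset-level provenance. void:, skos: and dcterms: terms are
# from the published vocabularies; ex: resources, ex:justification and the
# meddra:/snomedct: prefix IRIs are hypothetical.


def quote(text: str) -> str:
    """Turtle string literal with backslashes and double quotes escaped."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def linkset_turtle(links, creator, justification):
    """Return Turtle for a Linkset of skos:exactMatch links plus the links themselves."""
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix ex: <http://example.org/> .",
        "@prefix meddra: <http://example.org/meddra/> .",
        "@prefix snomedct: <http://example.org/snomedct/> .",
        "",
        "ex:meddra_snomedct_links a void:Linkset ;",
        "    void:subjectsTarget ex:meddra_dataset ;",
        "    void:objectsTarget ex:snomedct_dataset ;",
        "    void:linkPredicate skos:exactMatch ;",
        f"    void:triples {len(links)} ;",
        f"    dcterms:creator {creator} ;",
        f"    ex:justification {quote(justification)} .",
        "",
    ]
    lines += [f"{s} skos:exactMatch {o} ." for s, o in links]
    return "\n".join(lines)


if __name__ == "__main__":
    print(linkset_turtle(
        [("meddra:10028596", "snomedct:22298006")],
        creator="ex:mapping_team",
        justification="Hypothetical justification text",
    ))
```

In a real deployment the links would usually sit in their own file or named graph, so that the Linkset description can point at them as a unit.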
27. Summary
Encourage standard organisations to
“Use Standards for Standards”
for sustainability and trustability.
Think if …
semanticsconnectsusall
28. Acknowledgements
AZ’s Semantic Web Community of Practice members:
Tom Plasterer (lead), Jim Morris, Courtland Yockey, Sorana
Popa, Rob Hernandez, Mike Westaway, Rajan Desai, Simon
Rakov, Dana Crowley, Ian Dix, Johan Törnqvist
Collaborators and Advisors:
• Charlie Mead – IO Informatics
• Dean Allemang – Working Ontologist
• Frederik Malfait – IMOS consulting / Roche
• Phil Ashworth – TopQuadrant
Thank you! Kerstin.l.forsberg@astrazeneca.com