Towards Linked Vital Registration Data for Reconstituting Families and Creating Longitudinal Health Histories
Oya Beyan, Ciara Breathnach, Sandra Collins,
Christophe Debruyne, Stefan Decker, Dolores Grant,
Rebecca Grant, and Brian Gurrin
21st of July 2014 – KR4HC Workshop – Vienna, Austria
Irish Record Linkage, 1864-1913
• Developing a platform that applies semantic technologies to historical birth, death and marriage certificates.
• Answering questions such as: “How accurate are
historic maternal mortality rates (MMR) and
infant mortality rates (IMR) for Dublin?”
• Team consists of researchers (historians), digital
archivists, and knowledge engineers.
21/07/2014 2
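Both headline measures are simple ratios over live births. Linkage does not change the formula; it changes whether the counts that go into it are complete. The following is a minimal sketch that assumes per-year aggregate counts have already been extracted from the linked records. The `YearCounts` type and its fields are hypothetical. IMR is conventionally expressed per 1,000 live births, and the modern WHO definition of MMR uses 100,000 live births. Historical sources sometimes use other scales, so the scale is a parameter here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class YearCounts:
    """Hypothetical per-year aggregates derived from linked certificates."""
    live_births: int
    maternal_deaths: int
    infant_deaths: int  # deaths before the first birthday


def rate(events: int, live_births: int, per: int) -> float:
    """Return the number of events per `per` live births."""
    if live_births <= 0:
        raise ValueError("live_births must be positive")
    return events * per / live_births


def maternal_mortality_rate(c: YearCounts, per: int = 100_000) -> float:
    # WHO convention: maternal deaths per 100,000 live births.
    return rate(c.maternal_deaths, c.live_births, per)


def infant_mortality_rate(c: YearCounts, per: int = 1_000) -> float:
    # Conventional scale: infant deaths per 1,000 live births.
    return rate(c.infant_deaths, c.live_births, per)
```

Both rates share the same denominator. A missed maternal death or an unregistered birth therefore shifts the result directly, which is why the accuracy question depends on the quality of the record linkage.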
Data: General Register Office (GRO) Records
• Vital registration data
– Birth certificates
– Death certificates
– Marriage records
• Digitised TIFF images of
hardcopy indexes and
registers.
• 2 TB of data
• Database describing the
digitised records allowing
searches on some fields.
© General Register Office of Ireland 2014
Challenges
• Certified causes of death that can be attributed to maternal
death
– Death within 42 days after labour (earlier, in 1864, it was 12)
– Septicemia (blood poisoning), fever, …
– Is there a “corresponding” birth certificate?
• Death certificates with no corresponding birth certificate
• “Gaps” in sibship intervals for which no birth or death certificate can be found.
• Terminology used before 1900, e.g., “debile” to denote weakness or a failure to thrive.
• Capturing the socio-economic status of families through, for instance, the fathers’ professions and ranks.
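The “corresponding birth certificate” question can be framed as a window check. Given a woman's death certificate, we look for a birth she registered no more than a fixed number of days earlier. The following is a minimal sketch that assumes certificates have already been resolved to person identifiers, which is the hard part of the linkage. `BirthCert`, `DeathCert`, `mother_id` and `person_id` are hypothetical names. The window is a parameter because, as noted above, the definition changed over time.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BirthCert:
    cert_id: str
    mother_id: str  # hypothetical: person identifier produced by record linkage
    birth_date: date


@dataclass(frozen=True)
class DeathCert:
    cert_id: str
    person_id: str  # hypothetical: same identifier space as mother_id
    death_date: date


def within_window(delivery: date, death: date, window_days: int = 42) -> bool:
    """True if the death is on or after the delivery and at most window_days later."""
    return 0 <= (death - delivery).days <= window_days


def corresponding_births(death: DeathCert, births: list[BirthCert],
                         window_days: int = 42) -> list[BirthCert]:
    """Birth certificates that would make this death a maternal-death candidate."""
    return [b for b in births
            if b.mother_id == death.person_id
            and within_window(b.birth_date, death.death_date, window_days)]
```

A death with no match is not ruled out as maternal. The birth may be unregistered, or the link may simply have been missed.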
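Sibship gaps can be surfaced mechanically. Once a family's births are linked, we sort them and flag any interval between consecutive births that exceeds a threshold. Each flagged interval is a place where an unregistered or unlinked birth or death might lie. The following is a minimal sketch. The three-year default threshold is an arbitrary placeholder, not a value taken from the project.

```python
from datetime import date


def sibship_gaps(birth_dates: list[date],
                 max_interval_days: int = 3 * 365) -> list[tuple[date, date]]:
    """Return consecutive sibling births separated by more than max_interval_days."""
    ordered = sorted(birth_dates)
    return [(earlier, later)
            for earlier, later in zip(ordered, ordered[1:])
            if (later - earlier).days > max_interval_days]
```

The function only points to where to look. Deciding whether a gap reflects a missing record still requires the historians' judgement.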
Conceptual Architecture
[Diagram: conceptual architecture in two parts, Preservation and Data Analytics. Labels shown: Digital Archivist; Repository; Updater (updates); Linker (links); GRO records as RDF; triplestore; SPARQL endpoint / Linked Data Server; Analytics; Researcher.]
Links to external datasets, e.g., Logainm (a database of Irish historical and contemporary place names), provide additional context.
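A Linker has to match place names as transcribed from the certificates against a gazetteer such as Logainm. The following is a minimal sketch that uses plain string similarity from Python's `difflib`. It is a swapped-in technique for illustration, not the project's linking method. The gazetteer here is a hypothetical list of names. In practice, a match would resolve to a Logainm identifier rather than a bare string.

```python
import difflib
from typing import Optional


def link_place(transcribed: str, gazetteer: list[str],
               cutoff: float = 0.8) -> Optional[str]:
    """Return the closest gazetteer name to a transcribed place name, or None."""
    # Compare case-insensitively, but return the gazetteer's own spelling.
    by_key = {name.strip().casefold(): name for name in gazetteer}
    matches = difflib.get_close_matches(transcribed.strip().casefold(),
                                        list(by_key), n=1, cutoff=cutoff)
    return by_key[matches[0]] if matches else None
```

The `cutoff` trades recall for precision. A low value links more misspellings but also risks confusing neighbouring townlands with similar names.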
Development of 2 ontologies
[Diagram: separation of concerns between the GRO triplestore and a second triplestore for data analysis]
Given the sensitive nature of the data, data protection is key.
Transformation from one model to another:
• SPIN – SPARQL Inferencing Notation
• SWRL / RuleML
• SPARQL CONSTRUCT
• …
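SPARQL CONSTRUCT is one of the listed options for moving data from the record-level model to the analysis model. It matches a graph pattern and emits new triples. The following is a minimal sketch of the same idea in plain Python over string triples. Only `irl:BirthRecord` appears in the slides. `irl:child`, `irl:mother` and `ana:hasMother` are hypothetical terms chosen for illustration.

```python
Triple = tuple[str, str, str]


def construct_has_mother(graph: set[Triple]) -> set[Triple]:
    """Derive analysis-level (child, ana:hasMother, mother) triples.

    Roughly what this SPARQL CONSTRUCT would do (hypothetical vocabulary):
        CONSTRUCT { ?child ana:hasMother ?mother }
        WHERE { ?rec a irl:BirthRecord ;
                     irl:child ?child ;
                     irl:mother ?mother . }
    """
    births = {s for s, p, o in graph if p == "rdf:type" and o == "irl:BirthRecord"}
    result: set[Triple] = set()
    for rec in births:
        children = {o for s, p, o in graph if s == rec and p == "irl:child"}
        mothers = {o for s, p, o in graph if s == rec and p == "irl:mother"}
        # Join on the record node, as the basic graph pattern above does.
        result |= {(c, "ana:hasMother", m) for c in children for m in mothers}
    return result
```

Keeping the output in a separate graph mirrors the separation of concerns. The record-level triples remain untouched for preservation, and only derived facts reach the analysis store.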
Development of 2 ontologies
• Two ontologies were developed, following the principle of separation of concerns
• First ontology for describing the contents of records
– A shallow, “flat” OWL 2 ontology
• Second ontology for data analysis
– OWL 2 + rules
– Rules to capture background and domain knowledge
– Developed by having the historians formulate competency
questions (Grüninger and Fox)
– Captured graphically using Object Role Modelling (ORM)
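One simple kind of background knowledge is a glossary that maps period terminology for causes of death to modern glosses. The following is a minimal sketch in Python, not in the project's rule languages (the slides name SWRL/RuleML and SPIN). It uses only the two examples given on the Challenges slide. `HISTORIC_CAUSE_TERMS` and `normalise_cause` are hypothetical names.

```python
from typing import Optional

# Entries taken from the examples on the "Challenges" slide; a real
# glossary would be curated by the historians and be far larger.
HISTORIC_CAUSE_TERMS = {
    "debile": "weakness or failure to thrive",
    "septicemia": "blood poisoning",
    "septicaemia": "blood poisoning",  # spelling variant, assumed to occur
}


def normalise_cause(raw: str) -> Optional[str]:
    """Map a transcribed cause of death to a modern gloss, or None if unknown."""
    key = raw.strip().rstrip(".").casefold()
    return HISTORIC_CAUSE_TERMS.get(key)
```

Returning `None` for unknown terms means that unrecognised causes are surfaced for review rather than silently classified.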
Graphical Representation in ORM
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# Placeholder: the actual irl: namespace IRI is not given on the slide.
@prefix irl:  <http://example.org/irl#> .

irl:Record a owl:Class ;
  rdfs:label "Record" .
# Certificates are a kind of record.
irl:Certificate a owl:Class ;
  rdfs:label "Certificate" ;
  rdfs:subClassOf irl:Record .
irl:BirthRecord a owl:Class ;
  rdfs:label "Birth Record" ;
  rdfs:subClassOf irl:Certificate .
irl:DeathRecord a owl:Class ;
  rdfs:label "Death Record" ;
  rdfs:subClassOf irl:Certificate .
# Marriage records sit directly under irl:Record, not under irl:Certificate.
irl:MarriageRecord a owl:Class ;
  rdfs:label "Marriage Record" ;
  rdfs:subClassOf irl:Record .
irl:Return a owl:Class ;
  rdfs:label "Return" .
# …
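The `rdfs:subClassOf` statements in the excerpt form a small hierarchy. Under RDFS semantics, subclass is transitive, so a birth record is also a certificate and a record. The following is a minimal sketch of that closure in Python, using exactly the classes from the excerpt. The dictionary encoding and the function name are illustrative choices.

```python
# The class hierarchy from the Turtle excerpt, as child -> direct superclasses.
SUBCLASS_OF: dict[str, set[str]] = {
    "irl:Certificate": {"irl:Record"},
    "irl:BirthRecord": {"irl:Certificate"},
    "irl:DeathRecord": {"irl:Certificate"},
    "irl:MarriageRecord": {"irl:Record"},
}


def superclasses(cls: str) -> set[str]:
    """All direct and indirect superclasses of cls, following rdfs:subClassOf."""
    found: set[str] = set()
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found
```

A query for all `irl:Record` instances should therefore also return birth, death and marriage records. A triplestore with RDFS reasoning enabled performs this inference itself. Note that `irl:MarriageRecord` is not an `irl:Certificate` in this model.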
Conclusions
• Presented the problem and highlighted the challenges
• Developed two ontologies
– Encoding the contents of digitised GRO records for long-term digital preservation in the DRI (Digital Repository of Ireland)
– Data analytics to answer the researcher’s questions (in this case, a historian’s)
• Data exploration and annotation of the
records started on a subset of the dataset

