From EAD to Linked Data:
(still) a work in progress
Archives & Linked Data meeting,
  JISC London, 7 Feb 2012

Pete Johnston
  Technical Researcher, Eduserv
  pete.johnston@eduserv.org.uk
How?
•   Model our “world”
•   Design URI patterns
•   Select/create RDF vocabularies
•   Design mapping of existing data to RDF
•   Convert/transform data
•   Generate links
•   Publish/expose data
•   Maintain/sustain
[Diagram: data model (full).
 Entity types: Finding Aid; EAD Document; Repository (Agent); Place; Postcode Unit; Archival Resource; Biographical History; Level; Language; Creation; Temporal Entity; Extent; Object; Agent; Person; Family; Organisation; Concept; Concept Scheme; Book; Genre; Function; Birth; Death.
 Relationship labels: maintainedBy/maintains; administeredBy/administers; in; hasPart/partOf; encodedAs/encodes; accessProvidedBy/providesAccessTo; hasBiogHist/isBiogHistFor; topic/page; origination; associatedWith; level; language; extent; product of; at time; participates in; inScheme; representedBy; foaf:focus; is-a.]
[Diagram: data model (simplified).
 Entity types: Finding Aid; Repository (Agent); Place; Archival Resource; Agent; Person; Family; Organisation; Concept; Concept Scheme; Book; Genre; Function.
 Relationship labels: maintainedBy/maintains; administeredBy/administers; accessProvidedBy/providesAccessTo; topic/page; origination; hasPart/partOf; associatedWith; inScheme; foaf:focus; is-a.]
Design URI Patterns
Cool URIs for the Semantic Web
http://www.w3.org/TR/cooluris/
Designing URI Sets for the UK Public Sector
http://www.cabinetoffice.gov.uk/resource-library/designing-uri-sets-uk-public-sector

http://example.org/id/person/p123456
http://example.org/doc/person/p123456
http://example.org/doc/person/p123456.html
http://example.org/doc/person/p123456.rdf
Identifying the “things”: URI Patterns for the Hub Linked Data
http://blogs.ukoln.ac.uk/locah/2010/11/16/identifying-the-things-uri-patterns-for-the-hub-linked-data/
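The four example URIs above read as one pattern: an /id/ URI names the thing itself, a /doc/ URI names a document about it, and the .html and .rdf suffixes name specific representations of that document. A minimal sketch of that mapping follows. The function names and the media-type table are illustrative assumptions, and the slide does not say how a request for the /id/ URI is answered (the Cool URIs note describes a 303 redirect to the document as one option).

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table: which suffix serves which media type.
SUFFIX_BY_MEDIA_TYPE = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
}


def doc_uri_for(id_uri: str) -> str:
    """Map a 'thing' URI (/id/...) to its generic document URI (/doc/...)."""
    parts = urlsplit(id_uri)
    if not parts.path.startswith("/id/"):
        raise ValueError(f"not an /id/ URI: {id_uri}")
    doc_path = "/doc/" + parts.path[len("/id/"):]
    return urlunsplit((parts.scheme, parts.netloc, doc_path, "", ""))


def representation_uri(id_uri: str, media_type: str) -> str:
    """Pick the format-specific document URI for a requested media type."""
    return doc_uri_for(id_uri) + SUFFIX_BY_MEDIA_TYPE[media_type]


person = "http://example.org/id/person/p123456"
print(doc_uri_for(person))
print(representation_uri(person, "text/html"))
print(representation_uri(person, "application/rdf+xml"))
```

Keeping the mapping purely mechanical means the same identifier segment (here p123456) can be reused across the thing URI and every document URI.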
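[Diagram: processing architecture. Multiple EAD XML documents pass through Transform into a Triple Store. The store is reached through SPARQL/API and an Expose step that serves HTML, XHTML+RDFa, RDF/XML, SPARQL and other apps. An Enhance step connects the triple store with external data sets.]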
[Diagram: detail of the architecture. Multiple EAD XML documents, Transform, Triple Store.]
Transform

•   Transform EAD XML to RDF/XML using XSLT
•   Translate RDF/XML to N-Triples
•   Split N-Triples into chunks
•   Post to Triple Store

• Manage inputs
• Capture metadata about each step of process
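The "split into chunks" and "capture metadata" steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual scripts: the function names, the chunk size and the fields recorded per chunk are all hypothetical. It relies only on the fact that N-Triples holds one statement per line, so splitting on line boundaries never cuts a triple in half.

```python
import hashlib
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def chunk_ntriples(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` N-Triples statements.

    Blank lines and comment lines are skipped.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    stripped = (line.strip() for line in lines)
    statements = (s for s in stripped if s and not s.startswith("#"))
    while True:
        chunk = list(islice(statements, size))
        if not chunk:
            return
        yield chunk


def describe_chunk(source: str, index: int, chunk: List[str]) -> Dict[str, object]:
    """Record hypothetical per-chunk metadata: source, position, size, checksum."""
    body = "\n".join(chunk) + "\n"
    return {
        "source": source,
        "chunk": index,
        "triples": len(chunk),
        "sha256": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }


sample = [
    "# converted from an EAD document",
    "<http://example.org/id/a> <http://example.org/p> <http://example.org/id/b> .",
    "",
    "<http://example.org/id/a> <http://example.org/p> <http://example.org/id/c> .",
    "<http://example.org/id/a> <http://example.org/p> <http://example.org/id/d> .",
]
for i, chunk in enumerate(chunk_ntriples(sample, 2)):
    print(describe_chunk("example-ead.xml", i, chunk))
```

Each chunk would then be posted to the store; the checksum gives a cheap way to tell whether a re-run produced the same output for the same input.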
Challenges
• Archival description/Encoded Archival Description
   • Document v data
• Hub as aggregation
   • Messy data, from multiple sources
• Versioning
   • What happens when EAD doc X updated?
• Tracking triple/graph provenance
   • Graph/quad support in store
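One way to approach the versioning and provenance questions above, assuming the store supports quads, is to load the triples from each EAD document into their own named graph. The sketch below turns N-Triples lines into N-Quads by appending a graph label. The graph URI pattern and function names are hypothetical, and the code assumes each input line is a single statement with no trailing comment.

```python
from typing import Iterable, Iterator


def graph_uri_for(ead_id: str) -> str:
    """Hypothetical pattern: one named graph per source EAD document."""
    return f"http://example.org/graph/ead/{ead_id}"


def to_nquads(ntriples_lines: Iterable[str], graph_uri: str) -> Iterator[str]:
    """Append a graph label to each N-Triples statement, giving N-Quads."""
    for line in ntriples_lines:
        statement = line.strip()
        if not statement or statement.startswith("#"):
            continue
        if not statement.endswith("."):
            raise ValueError(f"not a complete statement: {statement!r}")
        # Drop the final '.', then re-add it after the graph label.
        yield f"{statement[:-1].rstrip()} <{graph_uri}> ."


triples = ['<http://example.org/id/a> <http://example.org/title> "Papers." .']
for quad in to_nquads(triples, graph_uri_for("doc-x")):
    print(quad)
```

With this layout, an update to EAD doc X could be handled by dropping and reloading only the graph for X, and statements about the graph itself (source, date, process) have something to attach to.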
[Diagram: detail of the architecture. Triple Store, SPARQL/API, Enhance, external data sets.]
Enhance
• Add supplementary data
   • Repository postcode data
   • Data about project (DOAP), dataset (VoID), etc.
• Internal links/consolidation
• Generate links to external resources
   • Ordnance Survey – trivial from postcode
   • VIAF – script to look up candidate matches
   • LCSH – script to look up, match
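The Ordnance Survey link is described above as trivial because a postcode unit URI can be built directly from the postcode string. A minimal sketch follows, assuming the URI base shown in the constant (postcode in upper case with spaces removed); that base, the function name and the loose validation pattern are assumptions to check against the Ordnance Survey linked data service before use.

```python
import re

# Assumed URI base for Ordnance Survey postcode units.
OS_POSTCODE_UNIT_BASE = "http://data.ordnancesurvey.co.uk/id/postcodeunit/"

# Loose shape check for a UK postcode with spaces removed; not a full validator.
_POSTCODE_SHAPE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]?[0-9][A-Z]{2}$")


def postcode_unit_uri(postcode: str) -> str:
    """Build a postcode unit URI from a repository's postcode."""
    compact = re.sub(r"\s+", "", postcode).upper()
    if not _POSTCODE_SHAPE.match(compact):
        raise ValueError(f"does not look like a UK postcode: {postcode!r}")
    return OS_POSTCODE_UNIT_BASE + compact


print(postcode_unit_uri("ba1 1jb"))
```

The VIAF and LCSH links are harder because they start from names and subject strings rather than from a key, which is why the slide describes scripts that look up candidate matches.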
Enhance
• Tools
   • Silk - pattern matching
   • Google Refine
• Use third-party links
   • e.g. get DBpedia link from VIAF
• Use aggregator services
   • e.g. sameas.org
• Capture metadata about each process
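Using third-party links amounts to following chains of equivalence: if a Hub person is linked to a VIAF record, and that record is linked to a DBpedia resource, a Hub-to-DBpedia link can be derived. The sketch below groups URIs connected by such pairs using a small union-find. The URIs are placeholders and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def same_as_clusters(pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group URIs that are connected by any chain of equivalence pairs."""
    parent: Dict[str, str] = {}

    def find(uri: str) -> str:
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    clusters = defaultdict(set)
    for uri in list(parent):
        clusters[find(uri)].add(uri)
    return sorted(sorted(members) for members in clusters.values())


links = [
    ("http://example.org/id/person/p123456", "http://viaf.example/record/1"),
    ("http://viaf.example/record/1", "http://dbpedia.example/resource/Some_Person"),
]
print(same_as_clusters(links))
```

Treating the links as transitive is convenient, but one wrong link merges two clusters, so it helps to keep the original pairs and their sources alongside anything derived from them.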
Challenges
• Various target interfaces for lookup
• Identity/similarity/“sameAs” issues, verification
• Workflow
   • Repeatability?
• Versioning
• Tracking triple/graph provenance
   • Graph/quad support in store
• Exposing triple provenance
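The identity and verification issue above can be made concrete with a sketch that scores candidate matches and separates those safe to accept from those needing human review. This is an illustration only: the thresholds, the normalisation and the use of difflib string similarity are assumptions, not the matching method used in the project.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple

Scored = Tuple[str, float]


def normalise(name: str) -> str:
    """Lower-case, drop commas and collapse whitespace before comparing."""
    return " ".join(name.lower().replace(",", " ").split())


def classify_candidates(
    label: str,
    candidates: Iterable[Tuple[str, str]],
    accept: float = 0.95,
    review: float = 0.75,
) -> Tuple[List[Scored], List[Scored]]:
    """Split (uri, label) candidates into accepted and needs-review lists."""
    target = normalise(label)
    accepted: List[Scored] = []
    to_review: List[Scored] = []
    for uri, candidate_label in candidates:
        score = SequenceMatcher(None, target, normalise(candidate_label)).ratio()
        if score >= accept:
            accepted.append((uri, score))
        elif score >= review:
            to_review.append((uri, score))
    return accepted, to_review


accepted, to_review = classify_candidates(
    "Webb, Beatrice",
    [("urn:example:1", "webb,  Beatrice"), ("urn:example:2", "Completely different")],
)
print(accepted, to_review)
```

Recording the score and the thresholds with each generated link is one way to make a linking run repeatable and its output open to later verification.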
[Diagram: licences attached to graphs. Labels: sources EAD 1 and EAD 2; graphs RDF i1, RDF i2 and RDF iX; graphs RDF o1, RDF o2 and RDF o3 (with RDF o3-1 and RDF o3-2); licences Lic A, Lic B and Lic C.]
[Diagram: metadata attached to graphs and their representations. Labels: RDF i1 (from Linked Archives Hub) and RDF i2 (from DBpedia); Meta A and Meta B; RDF o1, RDF o2 (with RDF o2-1) and RDF o3-2; HTML o1 and HTML o2 (with HTML o2-1 and HTML o2-2); Meta oA and Meta oB.]
Summary: challenges

•   Archival description/EAD
•   Data consistency, cleaning
•   Lookups, linking & identity
•   Time, versioning, persistence, workflows
•   Trust, provenance, graphs, metadata