Linked Humanities Data: The Next Frontier?
A Case-Study in Historical Census Data

Albert Meroño-Peñuela
Knowledge Representation & Reasoning Group
29-10-2012

The Dutch historical censuses (1795-1971)





• Population, Houses and Occupation censuses
• 507 Excel files
• 2,288 tables
• 33,283 annotated cells

Heterogeneity: structural




Heterogeneity: semantic
• Variable meaning
      – Plaatselijke indeling (local division) / Kom, buiten de kom (town centre, outside the town centre) + Wijk (district) + Naam (name) / Plaats (place)
      – Variable design (age bands 14-18, 19-20 vs. 14-15, 16-20)
• Variable values
      – RomschKatholik, RomsKatholic, VaticanChristelijk
      – Changes in municipalities and occupations



(Current) Harmonization
• Manually create a (more general) translation table using standard classification systems (CS)
      – Map occupation literals to HISCO codes
      – Map municipality literals to AC (Amsterdamse Code) codes
• Cons
      – Expensive
      – Loss of detail and specificity
      – Process is not repeatable
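
The translation-table approach above can be made repeatable by keeping the table as data and applying it with a script. The following is a minimal sketch under stated assumptions: the literals are illustrative, the codes `CODE-A` and `CODE-B` are placeholders rather than real HISCO codes, and the function names are hypothetical.

```python
import csv
import io

# Hypothetical translation table. The codes are placeholders, NOT real HISCO codes.
TRANSLATION_CSV = """literal,code
shoemaker,CODE-A
shoe maker,CODE-A
roof repairer,CODE-B
"""


def load_table(text):
    """Read literal -> code pairs, normalising case and surrounding whitespace."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["literal"].strip().lower(): row["code"].strip() for row in reader}


def harmonise(literals, table):
    """Pair every literal with its code, or None when the table has no entry.

    The original literal is kept next to the code, so the source detail
    is not overwritten by the more general classification.
    """
    return [(lit, table.get(lit.strip().lower())) for lit in literals]


table = load_table(TRANSLATION_CSV)
result = harmonise(["Shoemaker", "roof repairer", "blacksmith"], table)
for literal, code in result:
    print(literal, "->", code)
```

Literals without a table entry come back paired with `None`, which makes the remaining manual work visible instead of silently dropping rows.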

Additional requirements
• Errors: non-destructive update of values
• Provenance: record who did what, when and why
• Data model: do not commit to a specific one
• Linkage: enrich the dataset by linking it to others (e.g. labour strikes, book publications in NL)
• Publication: open data for researchers
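
The first two requirements (non-destructive updates and provenance) could be combined roughly as follows. This is only a sketch: the class and field names are hypothetical, the example values are invented, and the slides do not say that the project stores corrections this way.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Correction:
    """One proposed value plus its provenance: who, why and when."""
    value: int
    who: str
    why: str
    when: datetime


@dataclass
class Cell:
    """A census cell whose original value is never overwritten."""
    original: int
    corrections: List[Correction] = field(default_factory=list)

    def correct(self, value: int, who: str, why: str,
                when: Optional[datetime] = None) -> None:
        """Append a correction instead of replacing the original value."""
        when = when or datetime.now(timezone.utc)
        self.corrections.append(Correction(value, who, why, when))

    @property
    def current(self) -> int:
        """Latest corrected value, or the original if nothing was corrected."""
        return self.corrections[-1].value if self.corrections else self.original


# Invented example values.
cell = Cell(original=1203)
cell.correct(1230, who="editor-1", why="digits transposed in the source table")
print(cell.original, cell.current, len(cell.corrections))
```

Because corrections are appended, any earlier state of the cell can be reconstructed, and each change carries its own "who, when, why".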


Census RDF: architecture

• RDF Data Cube Vocabulary (cell data)
• D2S Vocabulary (layout data)
• Open Annotation Core Data Model (annotation data)
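
To illustrate the cell-data layer, the sketch below builds the triples for one census cell as an RDF Data Cube observation and prints them in N-Triples form. The `qb:` and `sdmx-measure:` namespaces are the published ones; the `http://example.org/census/` namespace, the identifiers, the dimension names and the count are all hypothetical, and the project's actual modelling may differ.

```python
QB = "http://purl.org/linked-data/cube#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
EX = "http://example.org/census/"  # hypothetical namespace, for illustration only


def observation_triples(cell_id, dataset_id, dimensions, count):
    """Describe one table cell as a qb:Observation with an integer measure."""
    obs = f"<{EX}obs/{cell_id}>"
    triples = [
        (obs, f"<{RDF_TYPE}>", f"<{QB}Observation>"),
        (obs, f"<{QB}dataSet>", f"<{EX}dataset/{dataset_id}>"),
    ]
    # One triple per dimension (e.g. the row and column headers of the cell).
    for dim, value in sorted(dimensions.items()):
        triples.append((obs, f"<{EX}dimension/{dim}>", f"<{EX}code/{value}>"))
    triples.append(
        (obs, f"<{SDMX_MEASURE}obsValue>", f'"{count}"^^<{XSD_INTEGER}>')
    )
    return triples


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Invented identifiers and count.
triples = observation_triples(
    "t1-r5-c3", "census-1899", {"sex": "female", "occupation": "shoemaker"}, 42
)
print(to_ntriples(triples))
```

Layout and annotation data would attach further statements to the same cell resource; they are left out here to keep the sketch short.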




Census RDF: cell data




Census RDF: layout data




Census RDF: annotation data




Querying the RDF’d census




Not ready-to-publish RDF
• Disconnected graphs (but 279,136 possible variable mappings!)
• Complex & non-homogeneous SPARQL queries
• Contradictory annotation statements
• Drifted concepts
      – Tile setter -> roof repairer
      – Shoemaker (works with leather) -> shoemaker (owns a company)



New challenges
• Dynamic ontologies
      – Different concept formalizations depending on the time frame
      – Subjective definitions (contested concepts)
• Partitions and counting
      – Cannot merge counts of non-aligned concepts
      – Infer individuals?
• Format round-tripping
      – On-demand XLS, CSV, RDF, RDB conversions with(out) data loss
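
The partition problem can be made concrete with the age bands from the semantic heterogeneity slide (14-18, 19-20 vs. 14-15, 16-20). The sketch below finds the finest bands to which both partitions can be aggregated; it assumes integer ages and contiguous, sorted bands over the same overall range, and the counts are invented for illustration.

```python
def common_bands(a, b):
    """Return the finest bands that both partitions can be aggregated to.

    a and b are lists of inclusive (low, high) integer bands, sorted and
    contiguous. A boundary is usable only if it ends a band in both lists.
    """
    if a[0][0] != b[0][0] or a[-1][1] != b[-1][1]:
        raise ValueError("partitions must cover the same overall range")
    shared_ends = sorted({hi for _, hi in a} & {hi for _, hi in b})
    bands, start = [], a[0][0]
    for end in shared_ends:
        bands.append((start, end))
        start = end + 1
    return bands


def aggregate(counts, bands):
    """Sum per-band counts into the coarser common bands."""
    totals = {band: 0 for band in bands}
    for (lo, hi), n in counts.items():
        for band in bands:
            if band[0] <= lo and hi <= band[1]:
                totals[band] += n
                break
    return totals


# Invented counts for two census years with different age-band designs.
census_a = {(14, 18): 120, (19, 20): 40}
census_b = {(14, 15): 50, (16, 20): 115}

bands = common_bands(list(census_a), list(census_b))
print(bands)
print(aggregate(census_a, bands), aggregate(census_b, bands))
```

For these two designs the only shared boundary is age 20, so the counts are comparable only for the single band 14-20; anything finer would require inferring individuals.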

Thank you!
Questions, suggestions?

http://cedar-project.nl/
http://www.data2semantics.org/
