Linked Humanities Data: The Next Frontier?
A Case-Study in Historical Census Data

Albert Meroño-Peñuela
Knowledge Representation & Reasoning Group
29-10-2012
The Dutch historical censuses (1795-1971)




29-10-2012           Linked Humanities Data: The Next Frontier?   2
The Dutch historical censuses (1795-1971)

• Population, Houses and Occupation censuses
• 507 Excel files
• 2,288 tables
• 33,283 annotated cells

Heterogeneity: structural




Heterogeneity: semantic
• Variable meaning
      – Plaatselijke indeling / Kom, buiten de kom + Wijk + Naam / Plaats (Dutch column labels: local division / centre, outside the centre + district + name / place)
      – Variable design (age 14-18, 19-20 vs. 14-15, 16-20)
• Variable values
      – RomschKatholik, RomsKatholic, VaticanChristelijk
      – Change in municipalities, occupations
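
The age-range example above can be made concrete with a minimal sketch. It assumes each census reports contiguous, inclusive age bins over the same overall range; the function name and the sample partitions are illustrative, not taken from the census tooling. It finds the finest bins that both designs can be summed into.

```python
def common_bins(*partitions):
    """Return the finest bins that every partition can be aggregated into.

    Each partition is a list of inclusive (low, high) age ranges.
    Assumption: all partitions are contiguous and cover the same range.
    """
    # A cut after age `high` is usable only if every partition ends a bin there.
    shared_ends = set.intersection(
        *({high for _, high in p} for p in partitions)
    )
    start = min(low for p in partitions for low, _ in p)
    bins = []
    for high in sorted(shared_ends):
        bins.append((start, high))
        start = high + 1
    return bins


census_a = [(14, 18), (19, 20)]   # variable design in one census year
census_b = [(14, 15), (16, 20)]   # variable design in another
print(common_bins(census_a, census_b))  # [(14, 20)]
```

For the two designs on the slide the only shared bin is 14-20, so comparing the censuses forces the counts to be merged into a single, coarser category.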



(Current) Harmonization
• Manually create a (more general) translation table using standard classification systems (CS)
      – Map occupation literals to HISCO codes
      – Map municipality literals to AC codes
• Cons
      – Expensive
      – Detail/specificity loss
      – Process is non-repeatable
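
A translation table of this kind can be sketched as a plain lookup. Everything in the sketch is an assumption made for illustration: the Dutch literals are invented examples, and the `OCC-…` codes are placeholders, not real HISCO entries.

```python
# Hypothetical translation table: source literal -> standard code.
# The codes are placeholders and do NOT correspond to real HISCO codes.
TRANSLATION = {
    "schoenmaker": "OCC-001",
    "schoenmakersknecht": "OCC-001",  # more specific literal, same code
    "dakdekker": "OCC-002",
}


def harmonize(literals):
    """Group source literals by standard code; collect the ones with no mapping."""
    mapped, unmapped = {}, []
    for literal in literals:
        key = literal.strip().lower()
        if key in TRANSLATION:
            mapped.setdefault(TRANSLATION[key], []).append(literal)
        else:
            unmapped.append(literal)
    return mapped, unmapped


mapped, unmapped = harmonize(["Schoenmaker", "schoenmakersknecht", "bakker"])
print(mapped)    # {'OCC-001': ['Schoenmaker', 'schoenmakersknecht']}
print(unmapped)  # ['bakker']
```

Two distinct literals collapse into one code, which is the detail/specificity loss listed under Cons; the unmapped list is the part that still needs manual (expensive) work.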

Additional requirements
• Errors: non-destructive update of values
• Provenance: record who did what, when, why
• Data model: do not commit to a specific one
• Linkage: enrich the dataset by linking it to others (e.g. labour strikes, book publications in NL)
• Publication: open data for researchers
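
The first two requirements (non-destructive updates and provenance) can be illustrated with a small sketch. The class and field names are hypothetical and only show the idea: a correction never overwrites the original cell value, and each correction records who made it, when, and why.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Correction:
    value: str
    author: str       # who
    reason: str       # why
    timestamp: datetime  # when


@dataclass
class Cell:
    original: str                      # value as found in the source table
    corrections: list = field(default_factory=list)

    def correct(self, value, author, reason, timestamp=None):
        """Append a correction; the original value is never modified."""
        ts = timestamp or datetime.now(timezone.utc)
        self.corrections.append(Correction(value, author, reason, ts))

    @property
    def current(self):
        """Latest corrected value, or the original if nothing was corrected."""
        return self.corrections[-1].value if self.corrections else self.original


cell = Cell("1O3")
cell.correct("103", author="editor-1", reason="letter O typed instead of zero")
print(cell.original, cell.current)  # 1O3 103
```

Because corrections are appended rather than applied in place, any earlier state of the cell can be reconstructed from the log.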


Census RDF: architecture

   • RDF Data Cube Vocabulary (cell data)
   • D2S Vocabulary (layout data)
   • Open Annotation Core Data Model (annotation data)
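
As a rough sketch of how the three layers might sit side by side for one spreadsheet cell, the following builds a handful of triples and prints them as N-Triples. The `qb:` terms come from the RDF Data Cube vocabulary; the `example.org` IRIs, the `population` measure, and the layout namespace are hypothetical stand-ins (the actual D2S vocabulary terms are not shown in the slides), and the Open Annotation namespace is assumed to be `http://www.w3.org/ns/oa#`.

```python
QB = "http://purl.org/linked-data/cube#"            # RDF Data Cube vocabulary
OA = "http://www.w3.org/ns/oa#"                     # assumed annotation namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"
EX = "http://example.org/census/"                   # hypothetical data namespace
LAYOUT = "http://example.org/layout#"               # stand-in for the layout vocabulary


def iri(value):
    return f"<{value}>"


def typed(value, datatype):
    return f'"{value}"^^<{datatype}>'


def cell_triples(table, cell, value):
    """Cell data, layout data, and one annotation for a single spreadsheet cell."""
    obs = iri(f"{EX}{table}/{cell}")
    note = iri(f"{EX}{table}/{cell}/annotation/1")
    return [
        # cell data (RDF Data Cube)
        (obs, iri(RDF_TYPE), iri(QB + "Observation")),
        (obs, iri(QB + "dataSet"), iri(EX + table)),
        (obs, iri(EX + "population"), typed(value, XSD_INT)),
        # layout data (hypothetical property)
        (obs, iri(LAYOUT + "cell"), f'"{cell}"'),
        # annotation data
        (note, iri(RDF_TYPE), iri(OA + "Annotation")),
        (note, iri(OA + "hasTarget"), obs),
    ]


def to_ntriples(triples):
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = cell_triples("table1", "B7", 42)
print(to_ntriples(triples))
```

Keeping the layers in separate vocabularies means a cell's value, its position in the original table, and the statements made about it can each be queried or replaced independently.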




Census RDF: cell data




Census RDF: layout data




Census RDF: annotation data




Querying the RDF’d census




Not ready-to-publish RDF
• Disconnected graphs (but 279,136 possible variable mappings!)
• Complex & non-homogeneous SPARQL queries
• Contradictory annotation statements
• Drifted concepts
      – Tile setter -> roof repairer
      – Shoemaker (works with leather) -> shoemaker (owns a company)
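
One way to surface contradictory annotation statements is to group them by target and property and flag any group with more than one distinct value. The sketch below assumes annotations have already been flattened to `(cell, property, value)` tuples; the identifiers and sample values are made up for illustration.

```python
from collections import defaultdict


def find_contradictions(annotations):
    """Return {(cell, property): [values]} where a cell has conflicting values.

    `annotations` is an iterable of (cell_id, property, value) statements.
    """
    seen = defaultdict(set)
    for cell, prop, value in annotations:
        seen[(cell, prop)].add(value)
    return {key: sorted(vals) for key, vals in seen.items() if len(vals) > 1}


# Hypothetical annotation statements on two cells.
annotations = [
    ("table1/B7", "unit", "persons"),
    ("table1/B7", "unit", "houses"),     # contradicts the statement above
    ("table1/C7", "unit", "persons"),
    ("table1/C7", "unit", "persons"),    # duplicate, not a contradiction
]
print(find_contradictions(annotations))
# {('table1/B7', 'unit'): ['houses', 'persons']}
```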



New challenges
• Dynamic ontologies
      – Different concept formalizations depending on the time frame
      – Subjective definitions (contested concepts)
• Partitions and counting
      – Cannot merge counts of non-aligned concepts
      – Infer individuals?
• Format round-tripping
      – On-demand XLS, CSV, RDF, RDB conversions with(out) data loss
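
The round-tripping point can be illustrated with the simplest case, CSV. The sketch below writes rows to an in-memory CSV and reads them back; the row contents are invented. The values survive, but their types do not, which is one concrete form of data loss a conversion pipeline has to account for.

```python
import csv
import io


def csv_round_trip(header, rows):
    """Write rows to an in-memory CSV, read them back, and return the data rows."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    buffer.seek(0)
    reader = csv.reader(buffer)
    next(reader)  # skip the header row
    return [tuple(row) for row in reader]


# Hypothetical table row: municipality label, census year, count.
rows = [("Municipality A", 1899, 1200)]
restored = csv_round_trip(["municipality", "year", "count"], rows)

print(restored)          # [('Municipality A', '1899', '1200')]
print(restored == rows)  # False: integers came back as strings
```

A lossless round trip therefore needs the type information carried somewhere outside the CSV itself, for example in a schema or in the RDF representation.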

Thank you!
Questions, suggestions?

     http://cedar-project.nl/
http://www.data2semantics.org/
