This document discusses leveraging the DDI (Data Documentation Initiative) model for linked statistical data in the social, behavioral, and economic sciences. It outlines how the DDI ontology was developed: experts formulated use cases to identify the most important elements to model, and existing DDI-XML documents are mapped to DDI-RDF. A key use case is discovering microdata connected with multiple studies based on dimensions such as time, country, and subject. The document lists example queries this ontology would support, such as finding the questions associated with a concept or the maximum value of a variable.
1. Leveraging the DDI Model for Linked Statistical Data
in the Social, Behavioural, and Economic Sciences
Workshop on Semantic Statistics
15.10.2012 – 19.10.2012
Thomas Bosch
M.Sc. (TUM)
postgraduate student
http://boschthomas.blogspot.com
GESIS - Leibniz Institute for the Social Sciences
3. Why DDI as Linked Data?
• Currently no such ontology available
• To increase visibility of data holdings using mainstream Web
technologies
• To open DDI to the Linked Data community
• To process DDI-RDF by RDF tools
• To link DDI-RDF to other RDF data
• To better identify opportunities for merging datasets
• To enable inferencing
• To research microdata within the LOD cloud
4. How was the DDI Ontology developed?
• DDI subset of the most important DDI elements
• Use cases
• Experts in the statistics domain formulated the use cases they consider
most significant for solving frequent problems
• Most important use case: discover microdata connected with multiple
studies
• Map existing DDI-XML documents to DDI-RDF automatically
• Direct mapping
• Generic mapping (Bosch and Mathiak, 2011)
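The idea of a direct mapping can be illustrated with a minimal sketch. It assumes a heavily simplified, hypothetical DDI-like XML fragment and a placeholder vocabulary under http://example.org/; the element names, URIs, and the `direct_mapping` helper are illustrative only and are neither the real DDI-XML schema, the DDI-RDF vocabulary, nor the generic mapping of Bosch and Mathiak (2011). Each element with an `id` becomes a resource, nesting becomes a link between resources, and text-only children become literal values.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified input. Real DDI-XML uses namespaces and a far richer schema.
DDI_XML = """
<Study id="study1">
  <Title>Example Survey 2010</Title>
  <Variable id="v1">
    <Name>AGE</Name>
    <Label>Age of respondent</Label>
  </Variable>
  <Variable id="v2">
    <Name>SEX</Name>
    <Label>Sex of respondent</Label>
  </Variable>
</Study>
"""

BASE = "http://example.org/"      # placeholder namespace for resources
EX = "http://example.org/vocab#"  # placeholder vocabulary, NOT the DDI-RDF vocabulary


def direct_mapping(xml_text):
    """Return a list of (subject, predicate, object) triples for the XML tree."""
    root = ET.fromstring(xml_text)
    triples = []

    def visit(element, parent_uri=None):
        # Every element carrying an id becomes a resource typed by its tag name.
        subject = BASE + element.get("id")
        triples.append((subject, "rdf:type", EX + element.tag))
        if parent_uri is not None:
            # Nesting in XML becomes a link from the parent resource.
            triples.append((parent_uri, EX + "has" + element.tag, subject))
        for child in element:
            if child.get("id") is not None:
                visit(child, subject)
            else:
                # Text-only children become literal-valued properties.
                literal = (child.text or "").strip()
                triples.append((subject, EX + child.tag.lower(), literal))

    visit(root)
    return triples


triples = direct_mapping(DDI_XML)
for triple in triples:
    print(triple)
```

A one-to-one translation like this is automatic but mirrors the XML structure; a curated ontology can instead expose only the subset of elements the use cases need.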
5. Discovery Use Case
• Which studies are connected with a specific coverage consisting of the 3
dimensions: time, country, and subject?
• What questions with a specific question text are contained in the study
questionnaire?
• What questions are connected with a concept with a specific label?
• What questions are associated with a variable whose coverage
consists of the 3 dimensions time, country, and subject?
• What concepts are linked to particular variables or questions?
• What representation does a specific variable have?
• What codes and what categories are part of this representation?
• What variable label does a variable with a particular variable name have?
• What's the maximum value of a certain variable?
• What are the absolute and relative frequencies of a specific code?
• What data files contain the entire dataset?
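A few of the questions above can be sketched as pattern matches over a small set of triples. This is only an illustration under stated assumptions: the identifiers (`q1`, `c1`, `v1`), the property names (`hasConcept`, `prefLabel`, `maximum`, and so on), and the sample values are hypothetical and do not come from the DDI ontology. Against real DDI-RDF data, such questions would more likely be expressed as SPARQL queries.

```python
# Hypothetical sample data: (subject, predicate, object) triples.
# Property names are invented for illustration and are not DDI-RDF terms.
TRIPLES = {
    ("q1", "questionText", "How old are you?"),
    ("q1", "hasConcept", "c1"),
    ("q2", "questionText", "In which year were you born?"),
    ("q2", "hasConcept", "c1"),
    ("q3", "questionText", "What is your sex?"),
    ("q3", "hasConcept", "c2"),
    ("c1", "prefLabel", "Age"),
    ("c2", "prefLabel", "Sex"),
    ("v1", "name", "AGE"),
    ("v1", "label", "Age of respondent"),
    ("v1", "hasQuestion", "q1"),
    ("v1", "maximum", 97),
}


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in TRIPLES
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


def questions_for_concept_label(label):
    """What questions are connected with a concept with a specific label?"""
    concepts = {s for s, _, _ in match(p="prefLabel", o=label)}
    return sorted(s for s, _, c in match(p="hasConcept") if c in concepts)


def variable_by_name(name):
    """Return the identifier of the variable with the given name, or None."""
    subjects = [s for s, _, _ in match(p="name", o=name)]
    return subjects[0] if subjects else None


def variable_property(name, prop):
    """Return one value of a property of the named variable, or None."""
    variable = variable_by_name(name)
    if variable is None:
        return None
    values = [o for _, _, o in match(s=variable, p=prop)]
    return values[0] if values else None


print(questions_for_concept_label("Age"))   # questions linked to the concept "Age"
print(variable_property("AGE", "label"))    # variable label for variable name AGE
print(variable_property("AGE", "maximum"))  # maximum value of variable AGE
```

Each function chains one or two triple patterns, which is the same shape a graph query would take: first resolve the label or name to a resource, then follow a property from or to that resource.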