e-Science, Research Data and Libraries
1. LIBER: e-Science Workshop
Rob Grim
e-Science Coordinator, Tilburg University
Executive manager Open Data Foundation (ODaF)
December 5th, Bristol 2011
2. e-Science, Research Data and Libraries
Overview of this presentation:
1. Open Data Foundation (ODaF)
2. e-Science
3. Research Data Life Cycle: Data Documentation Initiative (DDI 3)
4. Technology for Statistical Data and Metadata Exchange (SDMX)
5. Role of Libraries
Main issue of my talk:
• What kind of problems can be solved with metadata management?
• How and where can metadata management help libraries to support
research?
• What sort of data services could libraries develop?
LIBER e-Science Workshop 14-12-2011 2
3. What is ODaF?
The Open Data Foundation (ODaF) is a non-profit organization
promoting the adoption of global metadata standards and the development
of open-source solutions for the management and use of statistical data.
We focus on improving data and metadata accessibility and overall
quality in support of research, policy making, and transparency, in the
fields of Social, Behavioral and Economic sciences.
ODaF is heavily involved in developing and promoting SDMX and DDI 3
4. Why ODaF?
The Open Data Foundation (ODaF) was established to fill a gap in the
area of statistical data and metadata management in Social, Behavioral
and Economic sciences (SBE).
The adoption of metadata specifications (DC, DDI, SDMX, ISO/IEC
11179, ISO19115) has been impaired by the LACK OF TOOLS and agreed
guidelines for their use.
Building such tools requires the coordination of strong information
technology and cross-domain expertise that is NOT typically a function of
these agencies. This is not for lack of interest: it is simply not their
mandate, mission or responsibility.
5. What does ODaF do?
1. Support and coordinate the development of open-source
tools for management of statistical data and metadata
2. Provide technical assistance to agencies for the adoption of
metadata specifications, best practices in data
management, and capacity building
3. Provide access to public metadata collections and registries
4. Promote international cooperation and address global issues
5. Develop training resources and reference materials
6. Provide web-based facilities to foster the dialog between
various communities
6. Adopters/Interest in SDMX
1. European Central Bank (ECB)
2. International Monetary Fund (IMF)
3. United Nations (MDG, WHO, UNESCO)
4. World Bank (WB)
5. UNESCO (Education)
6. > 100 National Statistical Offices (NSOs)
Adopters/Interest in DDI3
1. Australian Bureau of Statistics
2. CESSDA partners
3. OECD
4. Research Data Centers (CentERdata)
7. e-Science and Research Data
1. e-Science is about
Digital Curation
Machine actionable!
Automated Capture
Tools Development
2. Three characteristics of the “Digital Revolution”:
More Data
Data Sharing
Data Life Cycle
3. Metadata management is a critical issue to all of these!
9. Structure of the General Statistical Business Process Model (GSBPM)
[Figure: GSBPM hierarchy of Process, Phases, and Sub-processes (Descriptions)]
Source: Steven Vale, UNECE, 2010
10. DDI 3 Use Cases
• Study design/survey instrumentation
• Questionnaire generation/data collection and processing
• Data recoding, aggregation and other processing
• Data dissemination/discovery
• Archival ingestion/metadata value-add
• Question/concept/variable banks
• DDI for use within a research project
• Capture of metadata regarding data use
• Metadata mining for comparison, etc.
• Generating instruction packages/presentations
11. DDI 3 Perspective
[Figure: stakeholder groups around the data: Producers, Users, Archivists, Media/Press, General Public, Academic, Policy Makers, Government, Sponsors, Business]
Source: Pascal Heus, ODaF
12. DDI 3 Technical Overview
• DDI 3 is composed of several schemas
• Use only what you need!
• Schemas represent modules, sub-modules
(substitutions), reusable, external schemas
• archive
• comparative
• conceptualcomponent
• datacollection
• dataset
• dcelements
• DDIprofile
• ddi-xhtml11
• ddi-xhtml11-model-1
• ddi-xhtml11-modules-1
• group
• inline_ncube_recordlayout
• instance
• logicalproduct
• ncube_recordlayout
• physicaldataproduct
• physicalinstance
• proprietary_record_layout (beta)
• reusable
• simpledc20021212
• studyunit
• tabular_ncube_recordlayout
• xml
• set of xml schemas to support xhtml
Source: Arofan Gregory/Wendy Thomas
13. Data Set Structure: Concepts
Stock/Flow
Country
Unit Multiplier
Unit
Time/Frequency
Topic
Computers need structure of data
• Concepts
• Code lists
• Data values
• How these fit together
14. Data Makes Sense
Q,ZA,B,1,1999-06-30=16547
Quarterly, South Africa,
Bank Loans, Stocks,
for 30 June 1999
16547
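The decoding on this slide can be sketched in a few lines of Python. This is a minimal illustration only, not an SDMX implementation: the comma-separated record is the slide's own shorthand rather than an official SDMX message format, the code lists contain just the four codes shown on the slide, and the dimension names and their order (frequency, country, topic, stock_flow) are assumptions inferred from the concepts on the previous slide.

```python
from dataclasses import dataclass

# Hypothetical code lists holding only the codes shown on the slide.
# A real code list maintained by an agency would be much larger.
CODE_LISTS = {
    "frequency": {"Q": "Quarterly"},
    "country": {"ZA": "South Africa"},
    "topic": {"B": "Bank Loans"},
    "stock_flow": {"1": "Stocks"},
}

# Assumed order of the dimensions in the key (the "how these fit together" part).
DIMENSIONS = ["frequency", "country", "topic", "stock_flow"]


@dataclass
class Observation:
    dimensions: dict  # concept name -> human-readable label
    period: str       # time period exactly as written in the key
    value: int        # the observed value


def decode(record: str) -> Observation:
    """Turn a record such as 'Q,ZA,B,1,1999-06-30=16547' into labelled data."""
    key, raw_value = record.split("=", 1)
    *codes, period = key.split(",")
    if len(codes) != len(DIMENSIONS):
        raise ValueError(f"expected {len(DIMENSIONS)} codes, got {len(codes)}")
    # Look each code up in the code list of its dimension.
    labels = {dim: CODE_LISTS[dim][code] for dim, code in zip(DIMENSIONS, codes)}
    return Observation(labels, period, int(raw_value))


obs = decode("Q,ZA,B,1,1999-06-30=16547")
print(obs.dimensions, obs.period, obs.value)
```

Without the code lists and the dimension order, the record is just a string of characters; with them, a program can label every part of it, which is the point of the slide.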
15. Libraries and Research Data Involvement
Four key areas of activity:
1. Data Availability
2. Data Discovery Services
3. Access and Accessibility
4. Delivery Services
16. Data Availability, Data Discovery, Access and Accessibility, Delivery
Data Availability:
• Registries
• Data Archiving (Repositories)
• Collection building (application of ontologies)
• Locally produced or reused research data (“mash ups”)
Data Discovery:
• Research data portals
• Subject repositories
• Resource Aggregation
• Metadata Mining
Access and Accessibility:
• Metadata management tools (distributed access, secured access to data structures)
• Research Data Warehousing
• Data Curation
• Data Security and Data Privacy, Digital Rights Management (DRM)
Delivery:
• Enhanced Publications
• Data Publications and Data Journals
• Supplementary materials + (Disciplinary) “Dark Archive Materials”
• Data Dissemination
17. Library and IT Services, Tilburg University
1. Research data services: registering, archiving, accessibility
2. Link publications, research data and supplementary materials
3. Data discovery services: subject portals (European Values Study)
4. Lobby to value research data as scientific output
5. Lobby for a generally adopted research data policy
18. Disclaimer
“No one, including NSF, is quite sure what is
meant by DATA MANAGEMENT or PLAN.”
Christine Borgman (DCC, Chicago, 2010)
Thanks for your attention!
Editor's notes
Two issues that are key to supporting research are the research data life cycle and the challenges and hindrances for research data sharing. I use the term e-Science interchangeably with e-Research. Jim Gray: e-Science is where IT meets Science.
Now let's see how this works. Provide these examples after 3:
More data: metadata content caching is used for optimizing data retrieval queries in wireless networks (Liu et al.).
Data sharing: the economic crises over the last decades have been the main stimulus for building a global infrastructure for the exchange of statistical data and for raising awareness of data quality. IMF DQAS.
Data life cycle: transparency, science as a social contract, but also relevant to verification, replication, documentation and reuse of data.
Reference: Liu, Zheng, Liu, Wang and Chen. Metadata guided evaluation of resource-constrained queries in content caching based wireless networks. Wireless Networks, 17:1833-1850. Springer.
DDI 3 takes the data life cycle as a starting point. --- MOST COMPLEX STANDARD OUT THERE ---
Do we need such complex standards?
Big Data, Cloud Computing? Capture? Curation? Tool development
Functional perspective on the services that libraries might be willing to provide. See also: E. Harold (IBM column).