This document summarizes recent reports calling for increased sharing of social science data and new forms of data. It discusses challenges with traditional and new data types. Specifically, it outlines recommendations from the Administrative Data Taskforce to improve access to and linkage of administrative data for research through new centers, legislation, and governance. Implications for libraries and journals include hosting secure remote access, improving data documentation, and building researcher capacity for using new administrative data resources.
New developments for research using administrative data
1. Data context: new developments for
research in the social sciences
Peter Elias
4th Luso-Brazilian Conference on Open Access,
University of Sao Paulo
9th October 2013
2. Structure of the presentation
• Recent reports - what’s going on?
• What constitutes data in the social sciences?
• What problems do we face with the more
traditional forms of data?
• New forms of data
• Challenges using new data types
• The report of the Administrative Data
Taskforce
• What does this mean for journals?
4. Science as an Open Enterprise
(Royal Society 2012)
The main thrust of this report was that transparency and
openness should characterise all scientific research. As a
major part of this, data sharing should be regarded as the
norm and researchers, their funders and research
institutions should adopt this stance in all their research
activities. An important recommendation relates to
situations where data hold personal information. In such
cases, appropriate safeguards should be put in place to
prevent disclosure of such details whilst facilitating data
sharing.
5. New Data for Understanding the
Human Condition: international
perspectives. (OECD 2013)
The focus of this report was on the need for global
collaboration over data sharing. This will require
improved incentives for researchers who agree to share
data, and the adoption of agreed standards and
protocols for data description. Additionally, the report
calls for an international approach to the use of ‘Big Data’
for research, covering collaboration over the exploration
of the research value of new forms of data, the
development of tools for their analysis and improved
access to administrative datasets on a cross-national
basis.
6. Report of the Administrative Data
Taskforce 2012 (ESRC, MRC, Wellcome
2012)
This cross-departmental Taskforce proposes a
major boost to the resources available for linkage
and sharing across administrative datasets with
the establishment of Administrative Data
Research Centres in each of the countries of the UK.
Additionally, all taskforce members are agreed
that new legislation is required in order to
overcome current legal obstacles to record-level
linkage between data held by different
administrative bodies.
7. Investing for Growth: Capital
infrastructure for the 21st Century
(RCUK 2012)
This report sets out priorities for capital
investment for research. A major theme
throughout is to improve UK capacity to harness
‘Big Data’, emphasising the key importance of
longitudinal data, of linking socioeconomic data
sources to other data, including administrative
records, private sector, and biomedical data, as
well as ensuring these resources are accessible
for social scientific research to benefit the
economy, health and other sectors.
8. What constitutes data in the social
sciences?
• Research interests focus upon people and organisations, their
interaction and their evolution, seeking to understand better the
behavioural relationships between them
• Data types of interest relate to people and organisations,
variously classified as
  - Aggregated/disaggregated
  - Spatially referenced/time-stamped
  - Longitudinal/cross-sectional
  - Quantitative/qualitative
  - Structured/unstructured
• Data structures include ‘rectangular’ datasets, hierarchical data,
textual, numerical, audio, video
9. What problems do we face with the
more traditional forms of data?
• Discovery (NESSTAR; CESSDA; Data
Management Plans)
• Documentation (DDI; SDMX)
• Access (DWB; IHSN)
• Reuse (CESSDA)
• Preservation (CESSDA)
10. New forms of data
Each broad category of data is listed with its detailed categories, followed by examples.
Category A: Government transactions
• Individual tax records: Income tax; tax credits
• Corporate tax records: Corporation tax; sales tax; value added tax
• Property tax records: Tax on sales of property; tax on value of property
• Social security payments: State pensions; hardship payments; unemployment benefits; child benefits
• Import/export records: Border control records; import/export licensing records
Category B: Government and other registration records
• Housing and land use registers: Registers of ownership
• Educational registers: School inspections; pupil results
• Criminal justice registers: Police records; court records
• Social security registers: Registers of eligible persons
• Electoral registers: Voter registration records
• Employment registers: Employer census records; registers of persons joining/leaving employment
• Population registers: Births; marriages; civil unions; deaths; immigration/emigration records; census records
• Health system registers: Personal medical records; hospital records
• Vehicle/driver registers: Driver licence registers; vehicle licence registers
• Membership registers: Political parties; charities; clubs
12. Challenges using new data types
• Provenance
• Replicability
• Durability
• Volume
• Ethics
• Confidentiality
• Legal issues
• Access may be strictly controlled
13. Focus from here on one particular data
type:
Administrative data – reuse for
research
14. What are administrative data?
Data which are the product of an administrative
system. They are generated by organisations for
operational purposes or as a legal requirement.
They might identify people and/or organisations,
may contain detailed spatial information and may be
time-stamped. They are produced by public and
private sector organisations. They are not
designed for research.
15. What is the research value of such data?
• They already exist. No additional data collection costs associated
with research use.
• They are typically large national datasets, permitting more
detailed research to be undertaken than would otherwise be the
case.
• They record a process, which can be documented and
understood.
• Linkage between data relating to different time periods can
create longitudinal resources.
• Linkage to other data sources (e.g. surveys) can enhance these
resources.
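As a rough illustration of the two linkage points above, the following Python sketch links hypothetical administrative extracts from two years on a shared pseudonymised identifier to form a small longitudinal resource, then attaches survey responses for the same people. The field names (pid, earnings, job_satisfaction), the identifiers and all values are invented for illustration and do not come from any real dataset; real record-level linkage would also have to deal with imperfect identifiers and access controls.

```python
from collections import defaultdict

# Hypothetical administrative extracts for two years (all values invented).
records_2011 = [
    {"pid": "a1", "year": 2011, "earnings": 18000},
    {"pid": "b2", "year": 2011, "earnings": 25000},
]
records_2012 = [
    {"pid": "a1", "year": 2012, "earnings": 19500},
    {"pid": "c3", "year": 2012, "earnings": 30000},
]
# Hypothetical survey responses keyed by the same pseudonymised identifier.
survey = {"a1": {"job_satisfaction": 4}}


def build_longitudinal(*extracts):
    """Group records from several periods by person, ordered by year."""
    by_person = defaultdict(list)
    for extract in extracts:
        for row in extract:
            by_person[row["pid"]].append(row)
    return {pid: sorted(rows, key=lambda r: r["year"])
            for pid, rows in by_person.items()}


def attach_survey(panel, survey_rows):
    """Add survey fields to each linked record; unmatched people are unchanged."""
    linked = {}
    for pid, rows in panel.items():
        extra = survey_rows.get(pid, {})
        linked[pid] = [{**row, **extra} for row in rows]
    return linked


panel = attach_survey(build_longitudinal(records_2011, records_2012), survey)
for pid, rows in sorted(panel.items()):
    print(pid, [(r["year"], r["earnings"]) for r in rows])
```

Only person a1 appears in both years, so only that person contributes a genuinely longitudinal record; the others stay as single observations, which mirrors the coverage gaps that linkage across periods often exposes.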
16. What are the problems associated with
their research use?
• Not designed for research. This may pose difficulties for their
use in specific research areas.
• They are not subject to statistical standards or statistical
quality controls.
• They may be difficult to access, and linkage may be prohibited
or may not be feasible.
• As the systems that generate them change, so might the data.
• Their preservation for research is not regarded as a
fundamental objective – may lead to problems with metadata.
17. Some of the problems currently faced by
researchers
• Inconsistent access conditions.
• Severe time delays in granting or refusing access.
• Lack of information about selection and/or linking of
administrative datasets.
• Restricted access to datasets – especially for addressing the
counterfactual.
• Data controller making unilateral decision about
appropriateness of data for research.
• Research permitted then publication denied.
18. Terms of reference for the Taskforce
• identification of potential risks and benefits from increased
research use of administrative data;
• identification of likely resource implications arising from
increased research use of administrative data;
• the development and introduction of common procedures to
provide more efficient access to administrative datasets;
• clarification of the legal situation governing the use of routine
data;
• clarification of when consent is required and what consent
procedures should be used;
• identification of possible need for legislative change to improve
access to administrative data for research.
19. What has the Taskforce recommended?
• Improved access and linkage procedures and arrangements
for their governance.
• A clearer legal environment for linkage between data held by
different departments.
• A common accreditation process for researchers applying for
access to and linkage between administrative datasets.
20. Where are we now?
• £34 million released by government.
• Four Administrative Data Centres commissioned.
• A new UK Administrative Data Service set up.
• A national governing authority is being established.
• New legislation under preparation.
• Now commissioning centres for local government and private
sector data.
21. What are the implications for libraries and
journals?
• Libraries as a home for secure remote access facilities.
• More attention to data documentation and discovery tools.
• Building up capacity within the research community to
facilitate research using the improved access and data linkage
arrangements.
• Subject knowledge of librarians to extend to administrative
datasets.
• To be solved – open access and access to administrative data