This is ICPSR's core workshop deck, designed to introduce ICPSR or refresh your knowledge of it. It contains four "tours" or sub-presentations describing ICPSR's general reason for being; its social and behavioral research data, complete with search strategies; its training, educational, and instructional resources; and its data management and curation services, data repository options, and support resources (content and budget estimates) for those writing grant proposals.
2. What’s Included – Four Tours in One
• What, Why, & Who of “ICPSR”
– Mission and usage of ICPSR
– ICPSR’s past & present
– Benefits of membership
• Finding Research Data for Analysis
– Scope & search strategies
– Data tools
• ICPSR in Education
– ICPSR Summer Program
– Teaching resources
– Student internships & research opportunities
• Sustainable Data Management & Curation
– Fulfilling grant requirements
– Deposit and curation options and resources
– Sharing restricted-use data
4. ICPSR’s Mission
ICPSR advances and expands social and behavioral research,
acting as a global leader in data stewardship and providing rich
data resources and responsive educational opportunities for
present and future generations.
Three Pillars for Implementing our Mission
1. Share data – maximize access to research
data for analysis and publications
2. Educate and train current & future
research methodologists & data scientists
(curators)
3. Provide data management & curation
services to fulfill grant requirements and
assure long-term viability of research data
5. What We Do – It’s About Data!
• Seek research data and pertinent
documents from researchers (PIs,
research agencies, government)
• Process, describe (tag), and preserve
the data and documents
• Disseminate (share) data
• Provide education, training, &
instructional resources
• Offer grant-writing and fulfillment
support and data management
services
6. Why People Use ICPSR
• Write articles, papers, or theses using real
research data – publish & pass!
• Conduct secondary research (analysis) to support
findings of current research or to generate new
findings
• Study or teach quantitative methods (data
analysis techniques)
• Study data curation and repository management
• Use as intro material in grant proposals
• Preserve/disseminate primary research data
– Fulfill data management plan (grant) and data
sharing requirements
8. ICPSR’s Past & Present
• One of the world’s oldest and largest social
& behavioral science data archives, est. 1962
• Based at the University of Michigan
• Data once distributed on punch cards, then
reel-to-reel tape, now:
– Data available on demand
– Over 9,200 studies with over 72,700 data sets
• Originally a membership organization among 22
universities, now:
– Currently about 750 members world-wide
– Federal funding of public-access collections
9. Present Volumes of Activity
• 7,951 studies: 65,892 datasets: 187,931 files
available for download on demand
– 1,280 restricted-use studies (6,826 datasets)
• ICPSR Catalog points to 9,370 studies
• FY 2015
– 1,002,511 datasets downloaded
– 45,647 active MyData accounts
– 478,016 website visits/303,286 unique visitors
– 983 Summer Program attendees
10. Most Popular Downloads this Past Year:
• National Survey on Drug Use and Health
• National Longitudinal Study of Adolescent Health
• General Social Surveys (1972-2012 Cumulative)
• National Survey of Midlife Development in the US (MIDUS)
• National Mental Health Services Survey (N-MHSS)
• Drug Abuse Warning Network (DAWN)
• Gender, Mental Illness, and Crime in the United States
• Children of Immigrants Longitudinal Study (CILS)
• India Human Development Survey
• Chinese Household Income Project
• Law Enforcement Management and Administrative Statistics
(LEMAS)
• How Couples Meet and Stay Together (HCMST)
• Police-Public Contact Survey
11. Benefits of Membership in ICPSR
• Data access: 5,108 studies associated with 28,873 curated datasets including:
– General Social Survey
– American National Election Studies
– Education Longitudinal Study
– New Family Structures Study
• Teaching resources (Data-Driven Learning Guides) available exclusively to
ICPSR members
• Discounted ICPSR Summer Program tuition
• Free data deposits to openICPSR – ICPSR’s public data access collection
• Menu of data usage reports across your institution immediately available
electronically
• Data management plan and budget estimate support for grant proposals
• Access to a global network of over 750 institutions of all sizes interested in
research data, data curation, and training
12. The Concept of “Data Curation”
• Curation, from the Latin "to care," is the process used to add value to
data, maximize access, and ensure long-term preservation
• Data curation is akin to work performed by an art or museum curator.
– Data are organized, described, cleaned, enhanced, and preserved for
public use, much like the work done on paintings or rare books to make
the works accessible to the public now and in the future
• Curation provides meaningful and enduring access to data
• Data curation is the foundation for effective, long-term data sharing
14. If you recall: Most Popular Downloads this Past Year:
• National Survey on Drug Use and Health
• National Longitudinal Study of Adolescent Health
• General Social Surveys (1972-2012 Cumulative)
• National Survey of Midlife Development in the US (MIDUS)
• National Mental Health Services Survey (N-MHSS)
• Drug Abuse Warning Network (DAWN)
• Gender, Mental Illness, and Crime in the United States
• Children of Immigrants Longitudinal Study (CILS)
• India Human Development Survey
• Chinese Household Income Project
• Law Enforcement Management and Administrative Statistics
(LEMAS)
• How Couples Meet and Stay Together (HCMST)
• Police-Public Contact Survey
15. What’s in a “Download?”
• Documentation files - pdfs
– Questionnaire
– Codebook
– Description & Citation
• Data in many forms!
– SPSS, SAS, Stata
– ASCII
16. How does One Download Data?
The MyData Account
• MyData account – operates as authentication and like a
shopping cart!
• Authenticate once every six months on campus and you
can carry it with you
17. Enter Our Front Office: ICPSR Website
http://www.icpsr.umich.edu/
18. The Challenge – Hoards of Data & Metadata
How does one make sense of:
• 9,230 studies
• 72,720 datasets
• 239,800 files
• Millions of variables
• 67,245 bibliographic citations
20. ICPSR’s Thematic Data Collections
– another search strategy
• ICPSR’s Thematic Collections are
archives organized around
specific topics
• Most collections are funded by
government agencies or
foundations and therefore data
are open to the public
• Data from all collections,
including the membership
archive, are searchable by using
the search found on ICPSR’s
Find & Analyze page
• Those desiring to search for
data only within a particular
collection should use the search
provided within that collection
22. Data Tools: Find Publications
The Bibliography of Data-related Literature
It’s really a searchable database
• Containing over 67,250 citations
of known published and
unpublished works resulting
from analyses of data archived
at ICPSR
• That can generate study
bibliographies associating each
study with the literature about
it
• Included in one integrated
search on the ICPSR website
23. Data Tools: Social Science Variables Database
Enables ICPSR users to:
• Search & Compare Variables across
datasets
• Assists in:
– Data discovery
– Comparison/harmonization projects
– Data harvesting & data analysis
– Question mining for designing new research
– Research methods & substantive courses
instruction
27. Supporting the Data
• Free user support
• The Get Help Page offers:
– User support (at ICPSR) email and phone contact
information
– Data User Help Center: Short Tutorials & Webinars
available 24/7 (via ICPSR’s YouTube channel)
– Local Support: Who to contact at your local institution
– Glossary of Terms
– Social Networks: Where you can find us on YouTube,
Facebook, Twitter, LinkedIn, Slideshare, and more
29. ICPSR Summer Program in Quantitative
Methods
• Instruction on the tools and practices needed to analyze data
• For those with math phobia and those with advanced analysis
skills
• 3-5 day workshops and 4-8 week courses
• Primarily held in Ann Arbor, MI,
on the campus of the University
of Michigan, with some courses
offered on other campuses
• http://www.icpsr.umich.edu/sumprog/
30. Teaching Resources to
Bring Data Into the Classroom
• Easy-to-use features of ICPSR’s website in classes
– Social Science Variables Database
– Bibliography of Data-Related Literature
– SDA – Online Analysis
• Additionally, in partnership with teaching faculty, ICPSR has
developed:
– Short Exercises – the DDLGs
– Online teaching modules
– Online tutorials
35. Student Internships & Research Opportunities
• Paid Student
Internships focusing
on investigating social
& behavioral sciences
research – an REU
• Research paper
competitions -- a
research journal
experience & cash
prizes!
37. First - The Concept of “Data Curation”
• Curation, from the Latin "to care," is the process used to add value to
data, maximize access, and ensure long-term preservation
• Data curation is akin to work performed by an art or museum curator.
– Data are organized, described, cleaned, enhanced, and preserved for
public use, much like the work done on paintings or rare books to make
the works accessible to the public now and in the future
• Curation provides meaningful and enduring access to data
• Data curation is the foundation for effective, long-term data sharing
38. Two ‘Recent’ Moments in Federal Data
Sharing History
• NSF: January 2011 – requirement of data
management plans
• OSTP: February 2013 – Memo with subject
“Increasing Access to the Results of Federally
Funded Scientific Research”
39. The details are still developing but the
focus for research data sharing includes:
1. Maximize public access (includes discoverability)
2. Protect confidentiality and privacy
3. Allow for inclusion of costs in proposals for federal funding of
scientific research
4. Appropriate evaluation of submitted data plans
5. Compliance mechanisms
6. Cooperation with the private sector
7. Appropriate attribution
8. Long term preservation and sustainability
40. What is good data sharing?
The goals are simple:
• Data gets used (maximizes taxpayer
investment & credits investigators)
• Available today and into the future
• Research respondent protection
41. ICPSR offers Three Sustainable Data
Sharing Models to Fulfill Requirements
• Fee-for-access model (membership archive)
• Agency model (agency or foundation funds
public access)
• Fee-for-deposit model (researcher writes fee
into grant and pays at deposit to fund public
access)
42. ICPSR’s Fee-for-Access Data Sharing
• Funding is maintained by annual membership (subscription) fees
charged to institutions; individuals at member institutions have
free (open) access to data
• Pooled (ongoing) fees are used to acquire, curate, and maintain
the service
• Datasets can be acquired by non-members for a fee
43. ICPSR’s Agency-funded Data Sharing
• Agency sponsors/funds (ongoing) data curation & sharing enabling the
public to access without charge
• The archive is hosted by ICPSR where the public can easily discover and
access data and restricted-use data can also be securely shared
• Agency directs data selection and compliance policies
44. ICPSR’s Fee-for-Deposit Data Sharing
- openICPSR -
• Depositor (individual or entity) pays for
data to be curated and stored – a fee at
deposit
• Deposit fees to be written into the grant
application
• Incoming deposit fees sustain the service
and the professionals behind it
• Deposits are either bit-level or fully
curated (recommended!)
• Bit-level deposits are free to individuals at
member institutions
45. openICPSR for Institutions and Journals
• Fully branded data-sharing repository
– Fully hosted and fully integrated in
ICPSR’s infrastructure
– No need for technical staff or downloads
(patches & upgrades)
• Meets government grant and journal
requirements for data sharing
– DOIs and data citations provided upon
deposit
• Capability to share restricted-use
(sensitive) data
46. Data Management & Curation Resources
http://www.icpsr.umich.edu/datamanagement/
47. Purpose of Data Management Plans
• Data management plans describe how researchers
will provide for long-term preservation of, and
access to, scientific data in digital formats.
• Data management plans provide opportunities for
researchers to manage and curate their data more
actively from project inception to completion.
50. And still more guidelines after the
project is awarded:
• Guide emphasizes
preparation for data
sharing throughout
the project
• Available online and
via download (pdf)
51. Sharing Restricted-Use Data
What is Restricted-Use Data?
• Data with disclosure risk –
potential to identify a research
subject
• Data with highly sensitive
personal information
52. Common Objection/Misperception:
“My data are too sensitive to share. . .”
• ICPSR has been sharing restricted-use data for
over a decade via three methods:
– Secure Download
– Virtual Data Enclave
– Physical Enclave
• ICPSR stores & shares over 6,400 restricted-
use datasets associated with over 2,000
‘active’ restricted-use data agreements
53. Reality: Restricted-use data can be
effectively shared with the public
• Through the use of a virtual data enclave where
the data never leave the server
• Where there is a process (and understanding!)
to garner IRB approval from the requesting
scientist’s university
• Where there is a system, technology, data
professionals, and collaboration space in place
to disseminate (expensive to build!)
• Because federal agencies do allow for an
incremental charge to the data requestor to
offset marginal costs
55. For More Information on ICPSR:
• Explore the website - www.icpsr.umich.edu
• Sign up for our email announcements -
www.icpsr.umich.edu/icpsrweb/membership/lists/index.jsp
• “Like” ICPSR on Facebook/follow ICPSR on Twitter
• Attend or view our webinars (open to the public!);
recordings & slides found on ICPSR’s YouTube
channel.
• Find our presentations on www.slideshare.net –
user: icpsr
• Contact user support – netmail@icpsr.umich.edu
Editor's notes
This is the front office for ICPSR employees.
ICPSR supports students, faculty, researchers, and policymakers.
It is sometimes easier to identify what ICPSR does not accept than what it does. ICPSR is not appropriate for the natural or hard sciences (bio-medical). It is also not appropriate for huge datasets – multiple GBs of data. Our metadata experts and our catalog are focused on a very broadly defined area known as the social and behavioral sciences.
For repositories outside ICPSR’s domain, see Stanford’s list: http://library.stanford.edu/research/data-management-services/share-and-preserve-research-data/domain-specific-data-repositories
As of September 2015, over 72,718 datasets (over 239,746 files) available for download associated with over 9,200 studies.
In addition, 1,280 restricted-use studies (6,820 datasets) available for analysis.
ICPSR catalog indexes over 9,370 catalog entries/studies (as of 9/2015).
To give a sense of download volume: total downloads for FY 2014 = over 683,200 datasets (428,900 studies) downloaded/accessed.
Also in FY2014, about 45,645 MyData accounts (20,415 members) were active, meaning they downloaded or accessed something.
As of September 2015
Downloads for the period July 2014 - June 2015 (FY2015)
Data Access: Members receive full access to all ICPSR data; all individuals are permitted to download data with no limitations on quantities (counts) of downloads. Datasets are curated by ICPSR professionals meaning the study and datasets are well-documented, tagged, researched for past publications, and rendered into current statistical software applications, online analysis software, and other data tools (where plausible). The cost to non-members is $600 per dataset.
Teaching Resources: Teaching faculty and students at member institutions receive full access to ICPSR’s teaching resources, developed to support quantitative literacy and to introduce data analysis into intro-level undergraduate courses with little preparation needed.
Discounted Summer Program Tuition: Individuals from member institutions may attend the Summer Program at half the rate of non-members: http://www.icpsr.umich.edu/icpsrweb/content/sumprog/2014/index.html
Free data deposits into openICPSR: Individuals from member institutions receive free or discounted fees for deposits into ICPSR, whether for self-deposits or fully curated deposits.
ICPSR supports the development of data management plans and budget estimates for those applying for grant or contract funding where data sharing is required or highly encouraged.
Members of ICPSR form a research data community with immediate access to experts around the world in diverse areas of data collection, data analysis, data management, data curation, data preservation, and training and education.
This is the front office for ICPSR employees.
Downloads for the period July 2014 - June 2015 (FY2015)
We keep talking about “the download.” What’s in a download anyway?
First: Many PDFs
A copy of the actual questionnaire – it’s not pretty!
A copy of the codebook – much more attractive & it contains frequencies!
Description & Citation: essentially, the data about the data (metadata) as well as the data citation you are to use when citing the dataset as a source.
Second: The actual data file(s)
System files for SAS, SPSS, & Stata
ASCII files – a straight-up data file and/or setup files for SAS, SPSS, Stata
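To make the ASCII option concrete, here is a minimal Python sketch of reading a fixed-width ASCII data file using column positions of the kind a setup file or codebook lists. The variable names, column positions, and sample lines are all made up for illustration; a real study's layout comes from its own documentation.

```python
import io

# Hypothetical column layout: (variable name, start column, end column),
# 1-indexed and inclusive, in the style a setup file or codebook lists them.
LAYOUT = [
    ("CASEID", 1, 4),
    ("AGE", 5, 6),
    ("SEX", 7, 7),
]


def parse_fixed_width(lines, layout):
    """Turn fixed-width text lines into a list of dicts keyed by variable name."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        record = {}
        for name, start, end in layout:
            # Convert 1-indexed inclusive columns to a Python slice.
            record[name] = line[start - 1:end].strip()
        records.append(record)
    return records


# Made-up sample lines standing in for an ASCII data file.
sample = io.StringIO("0001341\n0002272\n")
rows = parse_fixed_width(sample, LAYOUT)
print(rows[0])
```

The values come back as text; the codebook tells you which codes mean what (for example, which number stands for which response category).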
This is the front office for our customers!
FIND & ANALYZE DATA: this is the page where you can find several tools to help you find/browse data (you can also use the search box in the center of the Find Data page).
The Search/Compare Variables link enables you to examine and compare variables and questions across studies or series.
Find publications enables you to search (or submit!) citations for works that use ICPSR data as part of analysis.
Resources for students covers information on the research paper competition and our paid internships.
You’ll find links to our thematic collections (also known as our special topic archives or projects) in the left-hand pane.
MEMBERSHIP IN ICPSR: this page contains all the information about the consortium (history, mission, staff overview, careers, and contact information). This area contains the list of members and a list of partners. You’ll find our subscribed Email Lists here – important if you are interested in our webinars, summer program, or other news about the consortium. Need printed promotional materials on ICPSR or templates for workshops? – see the Promoting ICPSR link.
DEPOSIT DATA: this page is for those interested in depositing data at ICPSR or preparing data in good form for long term preservation. Also, links to discussion of protection of respondent confidentiality are found here.
ICPSR Summer Program: contains information on the ICPSR Summer Program in Quantitative Methods including course descriptions, fees, and registration.
RESOURCES FOR INSTRUCTORS: contains short data-driven exercises and modules, resources for students (careers/internships information), and links to other data-related teaching resources.
DATA MANAGEMENT: describes ICPSR's practices in selection and appraisal of data, ingest, access and dissemination, and disaster planning. Provides information on digital preservation, data management plans and data citations.
CONTACT US: information on contacting various ICPSR staff is found by clicking on the “Contact” link at the bottom of each page.
For this slide, we tend to conduct live demos, but a few notes here to get you started:
This is the front page of the data search. The search box, Find Data, works much like other search engines. Note, however, that this search does not accommodate or correct misspellings. Not receiving any results? Check your spelling. If using names, names like Bob and Tim will not bring up Robert or Timothy. Correct name references must be input.
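Because the search will not correct spelling for you, one option is to check a term on your own machine before typing it in. The sketch below is only an illustration using Python's standard `difflib` module; the word list and nickname table are hypothetical and are not part of any ICPSR tool.

```python
import difflib

# Hypothetical list of search terms you know are spelled correctly.
KNOWN_TERMS = ["marijuana", "adolescent", "longitudinal", "immigrants", "election"]

# Hypothetical nickname table; the search will not map these for you.
NAME_VARIANTS = {"bob": "robert", "tim": "timothy"}


def suggest_query(term):
    """Return a corrected search term, or the lowercased term if nothing better is found."""
    lowered = term.lower()
    if lowered in NAME_VARIANTS:
        return NAME_VARIANTS[lowered]
    if lowered in KNOWN_TERMS:
        return lowered
    # Pick the closest known term, but only if it is a near match.
    matches = difflib.get_close_matches(lowered, KNOWN_TERMS, n=1, cutoff=0.8)
    return matches[0] if matches else lowered


print(suggest_query("marijauna"))
print(suggest_query("Bob"))
```

The `cutoff` value is a judgment call: a higher value avoids wrong "corrections" at the cost of missing some misspellings.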
The page also offers several pre-programmed ways to obtain results – by topic, by geography, or by studies that have learning guides (teaching resources) associated with the study.
Link to the Thematic Collections page: http://www.icpsr.umich.edu/icpsrweb/content/membership/partners/archives.html
The Study Home Page is also a great “search” strategy. Click into any study, and you will find all the information we have been able to gather about the study.
Use the Summary for a quick review, then click into the “view details” to understand the full scope of the research – methodology, survey type, sampling, scope, geography, subject terms used to tag the dataset, PI, and much more. You’ll also find a link to all of the journal articles, reports, and presentations we’ve been able to link to the dataset (where the data was used as part of the analysis within the article). This is a great way to understand whether this data is for you.
What’s in the bibliography collection? Published & unpublished works . . .
using data in the ICPSR holdings as the primary data source
using ICPSR data in a comparison with the primary dataset investigated
"about" an ICPSR dataset or study series
The link to Find Publications is found from the Find & Analyze Data page or directly here: http://www.icpsr.umich.edu/icpsrweb/ICPSR/citations/index.jsp
Tool for teaching
Research Methods:
Concept operationalization
Effect of question wording, context, and answer categories on variable distributions
Substantive classes:
Cultural / social changes reflected in different question wordings, or elicited answers (longitudinal or time series data)
Current content:
Over 76 percent of ICPSR holdings
Approx. 4 million variables
Continues to grow by including
All new releases, if suitable
Retrofits as made available by small-scale projects
As of September 2015: 1040 studies available online.
View SDA studies here: http://www.icpsr.umich.edu/icpsrweb/ICPSR/access/sda.jsp
Gender variable by occasions smoked marijuana variable
As you have seen, ICPSR doesn’t just deliver data. We surround that data with tools and services that support its use and interpretation.
Instructional materials are another way to “share” research data – in addition to educating the next generation.
Exploring Data Through Research Literature
Designed to teach quantitative research methods to undergraduates in a different way.
Integrates the ICPSR bibliography of data-related literature into teaching students how to make their way from ideas to empirical work to literature and back.
Suitable for both research methods and other substantive courses requiring empirical research
http://www.icpsr.umich.edu/icpsrweb/EDRL/index.jsp
Investigating Community and Social Capital
Uses 3 data sets including the General Social Survey, DDB Needham Life Style Surveys, and State-level data to reproduce findings from Robert Putnam’s Bowling Alone
Teaches how to browse codebooks, devise and execute crosstabulations, and use summary statistics
Helps teach replication of scientific evidence
http://www.icpsr.umich.edu/ICSC/index.html
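For readers who have not run a crosstabulation before, the idea can be sketched in a few lines of Python. The responses and variable names below are made up for illustration and are not drawn from the module's datasets; the sketch counts each combination of two variables and then converts the counts to row percentages.

```python
from collections import Counter

# Made-up responses: (education level, member of a civic group?)
responses = [
    ("College", "Yes"), ("College", "Yes"), ("College", "No"),
    ("High school", "No"), ("High school", "Yes"), ("High school", "No"),
]


def crosstab(pairs):
    """Count how often each (row, column) combination occurs."""
    return Counter(pairs)


def row_percentages(table):
    """Convert counts to percentages within each row category."""
    row_totals = Counter()
    for (row, _col), count in table.items():
        row_totals[row] += count
    return {
        (row, col): 100.0 * count / row_totals[row]
        for (row, col), count in table.items()
    }


table = crosstab(responses)
percents = row_percentages(table)
for (row, col), pct in sorted(percents.items()):
    print(f"{row:12s} {col:4s} {pct:5.1f}%")
```

Row percentages answer "within each education level, what share are members?"; comparing those shares across rows is the basic reading of a crosstab.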
SETUPS
Uses the 2012 National Election Study to understand voting behavior (2008 & 2004 also available)
Provides substantive background, terms and descriptions, and embedded exercises to allow users to get through simple exploratory analyses of political behavior. Builds crosstabular exercises based on various questions about the 2012 Presidential elections.
www.icpsr.umich.edu/SETUPS2008
In January 2011, the National Science Foundation released a new requirement for proposal submissions regarding the management of data generated using NSF support. All proposals must now include a data management plan (DMP). (NIH has similar DMP requirements.)
The plan is to be short, no more than two pages, and is submitted as a supplementary document. The plan needs to address two main topics:
What data are generated by your research?
What is your plan for managing the data?
The OSTP Memo
This memo directed funding agencies with an annual R&D budget over $100 million to develop a public access plan for disseminating the results of their research.
Concern for investment: “Policies that mobilize these publications and data for re-use through preservation and broader public access also maximize the impact and accountability of the Federal research investment.”
Federal agencies with over $100 million annually in R&D expenditures are to develop plans to support increased public access to the results of research funded by the Federal Government.
These are ICPSR’s “Thematic Collections”
openICPSR is a unique public data-sharing service:
Where the deposit is reviewed by professional data curators who are experts in developing metadata (tags) for the social and behavioral sciences = discoverable
With an immediate distribution network of over 750 institutions looking for research data, that has powerful search tools, and a data catalog indexed by major search engines = usage
Sustained by a respected organization with over 50 years of experience in reliably protecting research data = sustainable
Prepared to accept and disseminate sensitive and/or restricted-use data in the public-access environment = protection of research subjects
A collection of resources (links) to assist in data management plans for grant proposals
Tools to prepare plans (templates & sample plans)
Contact information for plan advice
22 pages of guidelines and references even including a sample plan (boilerplate!) available for download.
Link to pdf document: http://www.icpsr.umich.edu/files/datamanagement/DataManagementPlans-All.pdf
Pdf link to the data prep guide: http://www.icpsr.umich.edu/files/deposit/dataprep.pdf
More information on data preparation for archiving: http://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/
Sensitive personal information isn’t about names, addresses, credit card numbers, or other direct identifying information. Research scientists should never, never, ever submit this type of information to any hosted service – ever. What we’re talking about is highly personal information (topics) within research data that may include past/present drug use, illegal activities, or perhaps sexual habits.
We’re currently adding about 50 new agreements each month.