With big data research all the rage, how are librarians being asked to engage with data? As big data research takes off across Business, Science, and the Humanities, librarians need to understand big data and the issues around its storage and curation. How can it be made accessible? What tools and resources are required to use and analyze big data? In this webinar, panelists Caroline Muglia and Jill Parchuck share how big data is being used on their campuses and how they, as librarians, are supporting the sourcing and storage of this data.
1. Los Angeles | London | New Delhi | Singapore | Washington DC | Melbourne
From Big Data to the Big Picture
#SAGETalks
2.
Caroline Muglia, Head of Resource Sharing and Collection Assessment Librarian at University of Southern California, manages the Interlibrary Loan and Document Delivery department and leads the collection assessment efforts for the Library. Before this position, Caroline worked at the Library of Congress and later as a Data Librarian for an educational technology company.
Jill Parchuck has been the Associate University Librarian for Science, Social Science and Medicine at Yale University since 2014. Other positions Jill has held at Yale include Director, Science and Social Science Libraries and Co-Director of the Center for Science and Social Science Information from 2010 to 2014 and Director of Social Science Libraries and Information Services from 2007 to 2010.
3.
Send us your questions!
Send in your questions via the Question Box on your screen.
Using Twitter? Use the hashtag #SAGETalks.
While we do our best to answer as many questions as we can, time constraints may not allow us to answer every question. Thank you for understanding.
4.
Introduction
• Big data initiatives are plentiful!
• Libraries can play an important role
• What steps can librarians take to contribute to big data projects?
• How can libraries add value to big data projects?
• How can libraries determine the needs for data support?
5.
Areas we will cover
I. Datasets (Homegrown and Purchased)
II. Licensing Data
III. Storage and Repositories
IV. Software and Tools
V. Looking Ahead
6.
What is Big Data?
• Volume: Amount of data being created and ingested. What qualifies as “big”?
• Variety: Number of types of data
• Velocity: Speed at which data is being created and processed
• Value: How data is being analyzed and utilized
7.
Homegrown Datasets
Guiding questions:
● Who is creating datasets and how?
● What are the uses of the datasets?
● Where are the datasets stored?
8.
USC
• Created by researchers
  • Spatial Sciences student projects
  • USC Neuroimaging and Informatics
• Created by partnerships
  • Big Data for Discovery Science
• Libraries
  • USC Shoah Digital Library (8 petabytes)
Yale
• Created by researchers
  • Institution for Social and Policy Studies (ISPS)
  • Yale Open Data Access Project (YODA)
  • Yale Proteomics Expression Database (YPEDS)
• Produced by administrative units of the institution
  • Yale Sustainability
9.
Purchased Datasets
Guiding questions:
● In what format does the library receive the datasets?
● Where are the datasets stored?
● What kind of access do users have?
● How can users discover the datasets?
10.
• Subject specialist receives request from researchers and places order
• Data librarian receives and manages data and places it on local server
• Cataloger creates records for the online discovery system
11.
Licensing Data
Guiding questions:
● What are the terms of use?
● Access vs. ownership?
12.
• Review criteria of license
• Ensure the widest possible use of content
• Ensure that a viable platform is available to provide access
• Ensure that metadata can be provided
• Can we retain a backup copy?
13.
Storage & Repositories
Guiding questions:
● How much storage space is needed?
● What do we need the repository to do?
14.
USC
• USC Digital Repository
• ICPSR
• Departments/Schools
  • Contract to repositories
  • Purchase server space from university
• Smaller data sets
  • External hard drives
Yale
• Registry of Research Data Repositories
• ICPSR - Inter-university Consortium for Political and Social Research
• Yale Social Science Data Archive - all in local discovery system
15.
Services for Analyzing Big Data
Guiding questions:
● Who owns the data? What rights management is needed?
● What do you need to do?
● Who will be using the data?
● What output options do you need to have?
16.
Support For Using Data
• Organizing data
• Statistical analysis
• Cleaning data
• Manipulating data
• Managing data
• Data Visualization
• Retaining data
17.
USC
• Libraries
  • Subject librarians cultivate different skills
  • Tableau license
• University-wide
  • SC-CTSI (Clinical Data Analysis)
  • Center for High Performance Computing (HPC)
Yale
• Yale Center for Science and Social Science Information (CSSSI) - data services
• Yale StatLab Consultants - statistical analysis
• CSSSI Research Data Management - guide
• Yale Research Data Consultation Group
18.
Looking Ahead
What can you do now?
19.
• Identify unique role that library can play
• Information management is a library service
• Data literacy or teaching with data; data education
• Expert trainers in Tableau, TDM tools
• Metadata expertise
• Store/make accessible other departments’ raw data
• Can libraries provide analytical services?
• Learn needs of the institution
• Digital humanities projects can be a starting point
• Identify stakeholders
• Vendors and librarians can act as research partners
• Unique relationship that other departments may not have
20.
Future considerations
What should you be prepared to handle in the near future?
21.
• Data management plans
• NSF data management requirements
• Open data
• Open Government
• Los Angeles Open City
• Open Science
• Data science
• More students trained in data science: increased knowledge on campus
• What is the library’s role (instruction, collection development) in meeting these research needs, and in capitalizing on them?
22.
Webinar recording, slides, and follow-up Q&A will be emailed to you and available on connection.sagepub.com.
Thank you!
Be sure to check our website for updates on our webinar series!