The Role of Libraries in Data Management and Curation, presented at the American Library Association conference in Las Vegas, NV, 07/29/14.
Abstract:
As increasing amounts of data are being generated, applying best practices in handling data is important, and librarians are well poised to assist users. During this session, we will discuss the role of libraries in assisting with data management, application of metadata, ontologies, data standards, and the publication of data in repositories and on the Semantic Web. This talk will describe best data practices and engage the attendees in interactive activities to demonstrate these principles.
5. Role of Libraries in Research Data Management
• Data management training
• Information Literacy
• Metadata, Archiving, Reporting, Open Access
• Host repositories
• Open Access Policies
6. 1 | How can we make science more reproducible?
2 | How can we educate researchers to make their data reusable and research reproducible?
3 | How can we use data to generate new hypotheses and make new connections?
8. Controlled vocabulary
Definition: Any closed, prescribed list of terms used for classifying data
Example (Wine): Chardonnay, Pinot Noir, Bordeaux, Red, Riesling
Key Features:
• List of terms
• Terms are defined
• Relationships between terms are defined
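A closed list of terms can be enforced in software with a simple membership check. The sketch below is illustrative only: it reuses the wine terms from the slide, while the helper names (`normalize`, `is_valid_term`, `classify`) and the title-case normalization rule are assumptions made for this example.

```python
# Closed list of allowed terms, taken from the wine example on the slide.
WINE_VOCABULARY = frozenset({"Chardonnay", "Pinot Noir", "Bordeaux", "Red", "Riesling"})


def normalize(term):
    """Collapse whitespace and use title case so 'pinot  noir' matches 'Pinot Noir'."""
    return " ".join(term.split()).title()


def is_valid_term(term):
    """Return True only if the normalized term is in the controlled vocabulary."""
    return normalize(term) in WINE_VOCABULARY


def classify(labels):
    """Split free-text labels into accepted vocabulary terms and rejected labels."""
    accepted, rejected = [], []
    for label in labels:
        if is_valid_term(label):
            accepted.append(normalize(label))
        else:
            rejected.append(label)
    return accepted, rejected
```

Calling `classify(["pinot  noir", "Merlot"])` would accept the first label and reject the second, because "Merlot" is not on the prescribed list.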
11. What is an Ontology?
A formal conceptualization of a specified domain of interest
Roz Chast 8/4/1986
1. Hierarchical terms are defined textually and logically
2. Relationships between the terms are defined
3. Expressed in a language that can be reasoned across by computers
4. Data can be reused and can be easily linked together
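To make "reasoned across by computers" concrete, here is a minimal sketch of a hierarchy with one inferred relationship. It is a plain-Python stand-in, not an ontology language such as OWL, and the `IS_A` hierarchy, the intermediate terms ("White wine", "Red wine"), and the function names are all hypothetical.

```python
# Hypothetical is_a hierarchy: each term maps to its direct parent term.
IS_A = {
    "Chardonnay": "White wine",
    "Riesling": "White wine",
    "Pinot Noir": "Red wine",
    "White wine": "Wine",
    "Red wine": "Wine",
}


def ancestors(term):
    """Follow is_a links upward and return every superclass, nearest first."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def is_a(term, candidate):
    """Infer whether `term` is a kind of `candidate`, directly or indirectly."""
    return candidate in ancestors(term)
```

Nothing in `IS_A` states that Chardonnay is a Wine; `is_a("Chardonnay", "Wine")` derives it by chaining the defined relationships, which is the kind of inference a reasoner performs over a real ontology.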
23. Resource Identification Initiative
Promoting use of Research Resource IDs (RRIDs) in the published literature
Resources:
• Antibodies
• Software & Tools
• Model Organisms
Pilot project ongoing through 2014
RRIDs should be:
• Machine Readable
• Consistent across publishers and journals
• Free to generate and access
24. Sample citation:
Polyclonal rabbit anti-MAPK3 antibody, Abgent, Cat# AP7251E, RRID:AB_2140114
Workflow
1. Researcher submits a manuscript for publication
2. Editor or Publisher asks for inclusion of RRID
3. Author goes to Resource Identification Portal to locate RRID
4. RRID is included in Methods section and as Keyword
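Because RRIDs are meant to be machine readable, a citation like the sample above can be found with a short pattern match. This is a rough sketch: the regular expression assumes the form "RRID:" followed by letters, an underscore, and an alphanumeric identifier, which fits the sample citation but may not cover every identifier in use.

```python
import re

# Assumed shape: "RRID:" + authority prefix + "_" + alphanumeric identifier.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9]+)")


def extract_rrids(text):
    """Return every RRID-like identifier found in a block of text."""
    return RRID_PATTERN.findall(text)


methods = (
    "Polyclonal rabbit anti-MAPK3 antibody, Abgent, "
    "Cat# AP7251E, RRID:AB_2140114"
)
print(extract_rrids(methods))
```

A publisher or text-mining tool could run a check like this over the Methods section to confirm that each reported resource carries an identifier.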
25. Outcomes
Demonstrate the need for …
• better reporting of materials and methods
• a cultural shift in the way we write and structure papers
• a cultural shift in the way we view the literature
27. Reproducibility Project: Cancer Biology
Attempting to independently replicate research in 50 major cancer studies
https://osf.io/e81xl/wiki/home/
28. On average, approximately 15% of the resources are unidentifiable
Resources reported in the 50 Reproducibility Initiative studies show similar results
Chart labels: Vasilevsky et al., 2013, PeerJ; Reproducibility Initiative
29. Treatment with peptide X and two of its isomers inhibits Leishmania growth
http://pt.wikipedia.org/wiki/Leishmania_infantum
30. The Reproducibility Initiative attempted to reproduce this study
• Tried to replicate the primary finding, not the other experiments (funding constraints)
• Experiment showed similar dose response, but at 10X concentration
• There was no negative control
• The Leishmania strain turned out to be a different one
• The peptides turned out to be amidated, but this was not described in the original publication
31. What does it mean to be reproducible?
• Compare study results statistically
• What is the primary conclusion being tested?
• Which experiments need to be reproduced?
• Is there an experimental effect? A lab effect?
– A synergy between the two?
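One possible reading of "compare study results statistically" is to ask whether the original and replication estimates differ by more than their uncertainty allows. The sketch below uses a two-sided z-test on the difference of two independent estimates; this choice of test, the function name, and its inputs are assumptions for illustration, not the method used by the Reproducibility Initiative.

```python
import math


def compare_estimates(est_original, se_original, est_replication, se_replication):
    """Two-sided z-test for the difference between two independent estimates.

    Each estimate comes with its standard error. Returns (z, p_value).
    """
    difference = est_original - est_replication
    se_difference = math.sqrt(se_original ** 2 + se_replication ** 2)
    z = difference / se_difference
    # Two-sided p-value under a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A small p-value suggests the two studies disagree beyond what sampling error explains; it does not say whether the cause is an experimental effect, a lab effect, or both.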
32. 2 | How can we educate researchers to make their data reusable and reproducible?
33. What would you do with $1k today to make research communication better that doesn’t involve building another tool?
38. Your Data: Gummy Bear Raw Data
Bounces   Amplitude   Color
15        4           blue
43        3           red
58        9           green
75        82          purple
Materials:
• Haribo Gummi Bears Sugar Free, 5 lb bag
• SpringOMatic 3000
http://laughingsquid.com/the-anatomy-of-a-gummy-bear-by-jason-freeny/
39. Fig. 1 Belly button of Haribo Sugar Free Gummi Bear
Panels: Group 1, Group 2, Group 3, Group 4
Results from each group varied
40. GUMMY BEARS TAUGHT US…
• People see the same data very
differently
• “Detailed” means different things…
• Metadata?!?
• File Management is Difficult
• Workflow
42. Initial findings…
• Researchers need assistance:
  • Finding and choosing the best standard for their data
  • File versioning
  • Applying metadata to facilitate data sharing
• Lack of awareness of services and expertise offered by the library
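File versioning and metadata can both start very small. The sketch below shows one possible naming convention and a bare metadata record for the gummy bear data set; the naming scheme, the date, the version number, and the field names are all hypothetical, and the `None` values mark information that the raw data on the slide does not supply.

```python
import json
from datetime import date


def versioned_name(project, description, day, version, ext="csv"):
    """Return a sortable file name: project_description_YYYYMMDD_vNN.ext."""
    return f"{project}_{description}_{day:%Y%m%d}_v{version:02d}.{ext}"


def metadata_record(file_name, columns):
    """Describe each column; None marks details the data collector still has to add."""
    return {
        "file": file_name,
        "columns": [
            {"name": name, "description": None, "unit": None} for name in columns
        ],
    }


# Hypothetical date and version, chosen only to show the naming pattern.
name = versioned_name("gummybear", "bounce", date(2014, 6, 29), 2)
record = metadata_record(name, ["Bounces", "Amplitude", "Color"])
print(json.dumps(record, indent=2))
```

Writing the empty fields out explicitly makes the gaps visible: nobody reading the raw table can tell what unit "Amplitude" is in until someone records it.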
43. 3 | How can we use data to generate new hypotheses and make new connections?
46. CTSAconnect Project
Connecting people and resources
Means
Goal is to create a semantic representation of researcher expertise
Publish linked data
vivoweb.org
47. VIVO Integrated Semantic Framework (VIVO-ISF) Ontology Suite
• Merge the eagle-i and VIVO ontologies into one single ontology suite (the VIVO-ISF)
• Extend their coverage to include representation of clinical encounters
• Modularize the VIVO-ISF such that it can be made available in a set of files that can be reused independently
Diagram labels: eagle-i (Resources); VIVO (People); Clinical activities; Semantic; Coordination
vivoweb.org
48. But Wait: What Does the VIVO-ISF Tell Us?
Potential Points of Connection | Dr. Sawyer | Dr. Finn | Connected?
Traditional Methods of Searching For Connections
University Appointments | A, B, D | C | No
Journals | Journal of Circles and Squares | Annals of Diamonds and Triangles | No
Co-Authors | Lennon, McCartney, Harrison, Starr | Jagger, Richards, Jones | No
MeSH Terms | Yada, Yada, Yada | Bada Big, Bada Bing | No
How the VIVO-ISF can help
Machines Used | Alpha, Gamma, Theta, Sigma | Gamma, Beta, Kappa, Theta | Yes!
Agents Used | Cyan, Orange, Green, Mauve, Beige | Cyan, Chartreuse, Green, Mauve, Taupe | Yes!
Genes Referenced | bz3d14.2, bz3d,98.1, bz3c13.1 | bz3c13.1 | Yes!
Proteins Referenced | Eng1a, Ntl, Ncdq | Ndrw, Eng1a, Brs | Yes!
People Affiliated | Harry, Ron, Hermione | Harry, Ron, Hermione | Yes!
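The comparison between the two researchers amounts to intersecting their attributes category by category. The sketch below does this with plain Python sets as a stand-in for linked data; the dictionaries copy a few rows from the slide, and the function name `shared_attributes` is an assumption for this example, not part of the VIVO-ISF.

```python
# A few rows from the slide, stored as one set of values per category.
SAWYER = {
    "Co-Authors": {"Lennon", "McCartney", "Harrison", "Starr"},
    "Machines Used": {"Alpha", "Gamma", "Theta", "Sigma"},
    "Agents Used": {"Cyan", "Orange", "Green", "Mauve", "Beige"},
    "Proteins Referenced": {"Eng1a", "Ntl", "Ncdq"},
}
FINN = {
    "Co-Authors": {"Jagger", "Richards", "Jones"},
    "Machines Used": {"Gamma", "Beta", "Kappa", "Theta"},
    "Agents Used": {"Cyan", "Chartreuse", "Green", "Mauve", "Taupe"},
    "Proteins Referenced": {"Ndrw", "Eng1a", "Brs"},
}


def shared_attributes(a, b):
    """Return, for each category both profiles have, the values they share."""
    shared = {}
    for category in a.keys() & b.keys():
        overlap = a[category] & b[category]
        if overlap:
            shared[category] = overlap
    return shared


connections = shared_attributes(SAWYER, FINN)
```

Here `connections` has no "Co-Authors" entry, matching the "No" on the slide, while "Machines Used" and "Proteins Referenced" surface overlaps that a co-author or journal search would miss.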
51. The undiagnosed patient
Is it a known disorder that we are not recognizing?
Is it a new disorder?
52. Genotype vs Phenotype
Phenotype = genotype + environment + life history + epigenetics
Genotype: genetic code of an organism
Phenotype: observable characteristics of an organism
54. 1 | How can we make science more reproducible?
2 | How can we educate researchers to make their data reusable and reproducible?
3 | How can we use data to generate new hypotheses and make new connections?
55. Acknowledgements
ODG
• Melissa Haendel
• Robin Champieux
• Matthew Brush
• Shahim Essaid
• Bryan Laraway
• Eric Segerdell
• Jeff Emch
• Mike Grove
Monarch Initiative
Participating institutions:
• OHSU
• LBNL
• UC San Diego
• University of Pittsburgh
• Sanger Institute
• Charité - Universitätsmedizin Berlin
• NIH UDP
Resource Identification Initiative
Participating institutions:
• OHSU
• University of California, San Diego
• International Neuroinformatics Coordinating Facility
• National Institutes of Health
• Publishers and Journals
CTSAconnect
Participating institutions:
• OHSU
• Cornell University
• Stony Brook University
• University of Florida
• Harvard University
• University at Buffalo
Collaborators
• Urban Lab, Carnegie Mellon
• Science Exchange
• Anita de Waard, Elsevier
• Michael Lauruhn, Elsevier
OHSU Library
• Chris Shaffer
• Jackie Wirz
• Todd Hannon
• Kyle Banerjee
How our library has developed examples of libraries making these research networks stronger and moving beyond the ways libraries often conceive of their role in data management and curation.