Talk at OHSU, September 25, 2013
1. Making Research Data Discoverable and Usable (It’s the metadata, stupid!)
Anita de Waard
VP Research Data Collaborations
a.dewaard@elsevier.com
http://researchdata.elsevier.com/
2. Research data is the ‘new hotness’…
Roles and needs wrt Research Data:
Roles: Gov; Funding bodies; University management; Researchers; Librarians; Data Repositories
§ Share research outputs
§ Demonstrate impact to public
§ Data availability drives growth
§ Demonstrate impact
§ Guarantee permanence, discoverability
§ Avoid fraud
§ Generate, track outputs
§ Comply with mandates
§ Ensure availability
§ Archive, track, curate
§ Support researcher/institution
§ Archive
§ Add curation
§ Allow reuse
§ Derive credit
§ Comply with mandates
§ Discover and use
§ Cite/acknowledge
Todd Vision, DataDryad, OAI8, 6/23/13: “We need to find a way to keep Dryad funded, and would love to hear your ideas about doing that.”
Phil Bourne, Associate Vice Chancellor, UCSD, 4/13: “We are thinking about the university as a digital enterprise.”
Mike Huerta, Ass. Director NLM O of Health Info at NIH, 6/13: “Today, the major public product of science are concepts, written down in papers. But tomorrow, data will be the main product of science…. We will require scientists to track and share their data at least as well, if not better, than they are sharing their ideas today.”
Mara Saule, Dean University Libraries/CIO, UVM, 5/13: “We need to do something about data.”
Nathan Urban, PI Urban Lab, CMU, 3/13: “If we can share our data, we can write a paper that will knock everybody’s socks off!”
Barbara Ransom, NSF Program Director Earth Sciences, 2/13: “We’re not going to spend any more money for you to go out and get more data! We want you first to show us how you’re going to use all the data we paid y’all to collect in the past!”
3. Research data management today:
Using antibodies and squishy bits, Grad Students experiment and enter details into their lab notebook.
The PI then tries to make sense of their slides, and writes a paper.
End of story.
4. Research today (in biology) is often quite insular:
[Diagram: the cycle Prepare, Observe, Analyze, Ponder, Communicate, shown twice]
5. But life is VERY complicated:
http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg
• Interspecies variability: A specimen is not a species
• Gene expression variability: Knowing genes is not knowing how they are expressed
• Microbiome: An animal is an ecosystem
• Systems biology: A whole is more than the sum of its parts
Reductionist science does not work for living systems!
6. What if the data were connected?
[Diagram: two workflows (Prepare, Analyze, Communicate) linked through shared Observations]
Across labs, experiments: track reagents and how they are used
7. What if the data were connected?
[Diagram: two workflows (Prepare, Analyze, Communicate) linked through shared Observations]
Compare outcome of interactions with these entities
8. What if the data were connected?
[Diagram: two workflows (Prepare, Analyze, Communicate) linked through shared Observations, with a ‘Think’ label]
Build a ‘virtual reagent spectrogram’ by comparing how different entities interacted in different experiments
9. Where research data goes now:
> 50 M papers; 2 M scientists; 2 M papers/year
Majority of data (90%?) is stored on local hard drives
Some data (8%?) stored in large, generic data repositories: Dryad: 7,631 files; Dataverse: 0.6 M; Institutional Repositories
A small portion of data (1-2%?) stored in small, topic-focused data repositories: MiRB: 25k; PetDB: 1.5k; TAIR: 72.1k; PDB: 88.3k; SedDB: 0.6k
1. How do we get researchers to curate, store and share their data?
2. How do we ensure long-term sustainability for high-end repositories?
3. What role do libraries/institutions play?
10. 1.1. An attempt to get researchers to curate (but only partially share!) their data:
de Waard, A., Burton, S. et al., 2013
11. 1.2. What to do in the meantime?
• In 220 publications only 40% of antibodies, 40% of cell lines and 25% of constructs can be manually identified (Vasilevsky et al, submitted)
• Proposal (with NIH/NIF and Force11 Group):
– Adding minimal data standards
– Tool extracts likely reagents / resources
– User interface asks author to confirm or select
[Figure labels: counts of 49, 193, 76, 214 and 210 publications]
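The proposal above (a tool extracts likely reagents, then a user interface asks the author to confirm or select) can be sketched roughly as follows. This is a minimal illustration, not the NIH/NIF or Force11 tool: the two regular expressions, the function names, the sample sentence and the vendor "Acme Labs" are all assumptions made for the example, and real reagent mentions are far more varied.

```python
import re

# Hypothetical patterns, chosen only for illustration.
ANTIBODY = re.compile(r"\banti-[A-Za-z0-9]+\b")
CATALOG = re.compile(r"\bcat(?:alog)?\.?\s*(?:no\.?|#)\s*([A-Za-z0-9-]+)", re.IGNORECASE)


def extract_candidates(text):
    """Return likely reagent mentions found in a methods paragraph."""
    candidates = []
    for match in ANTIBODY.finditer(text):
        candidates.append({"kind": "antibody", "mention": match.group(0)})
    for match in CATALOG.finditer(text):
        candidates.append({"kind": "catalog_number", "mention": match.group(1)})
    return candidates


def confirm(candidates, decide):
    """Keep only the candidates the author accepts.

    `decide` stands in for the user interface: it receives one candidate
    and returns True (confirm) or False (reject).
    """
    return [c for c in candidates if decide(c)]


# Made-up methods sentence; "Acme Labs" is a placeholder vendor.
methods = "Sections were stained with anti-GFAP (Acme Labs, cat. no. AB-1234)."
found = extract_candidates(methods)
accepted = confirm(found, lambda c: c["kind"] == "antibody")
```

The point of the split is that extraction may over-generate, because the author makes the final call in the confirmation step.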
12. 2.2 How can research databases become long-term sustainable?
Pilot project with IEDA:
– Build a database for lunar geochemistry
– Write joint report on building repository, curation, costs and challenges
13. 2.2 Cost recovery questionnaire:
With WDS/RDA WG:
• Planning survey of cost recovery models for research databases
• Input/inspiration: ICPSR Sloan-funded project ‘Sustaining Domain Repositories for Digital Data’
• Developing overarching funding model:
14. 2.3. A first stab at a model:
[Diagram: funding models for research databases, with flow of funds from the data producer or sponsor on one side and from the data user on the other]
Diagram labels, in order: Private store; Access: Closed; Data publication; Public; Service; Collaboration; Conclave; Limited; Subscription content; Commercial overlay; Limited; Academic Use/Limited
Examples: ICPSR, CERN-LHC; KEGG; GeoFacets, Reaxys; many small operations, e.g. try-db.org, plhdb.org; Dryad, arXiv, PDB; commercial and institutional storage
DRAFT - CC-BY-NC 2013, Todd Vision & Anita de Waard
15. 3.1. Where do institutional repositories fit in?
Repository | Advantages | Disadvantages
Local data repository | Easy! No one steals your data. | No one sees it. Not compliant with requirements
Generic data repository | Not very hard to do. Have complied! | Data can’t be easily reused. Credit?
Institutional Repository | Can use existing IR? Tracking and compliance checks. | Data can’t easily be reused. Credit?
Domain-specific data repository | Data can be reused. Credit! | Lot of work for curators. Long-term sustainable?
[Arrow labels: Effort, Reuse, Credit, Compliance; Habit, Ease, Privacy, Control; Higher quality metadata]
16. 3.2. The poor researcher:
Funding Agency; University; Collaborators; Domain of study
Domain-Specific Data Repository; Local Data Repository; Institutional Data Repository; Generic Data Repository
AND THEY ALL WANT DIFFERENT METADATA!!!!
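One way to picture the researcher's problem is a crosswalk that renames a single core record for each kind of repository. This is a rough sketch only: every field name, the sample record and the function `to_repository` are hypothetical, and no real repository schema is implied.

```python
# Hypothetical field names throughout: no real repository schema is implied.
CROSSWALKS = {
    "domain": {"title": "dataset_name", "creator": "investigator", "method": "protocol"},
    "institutional": {"title": "dc_title", "creator": "dc_creator"},
    "generic": {"title": "name", "creator": "author"},
}


def to_repository(record, target):
    """Rename the fields of one core record for a target repository.

    Fields without an entry in the target's crosswalk are dropped.
    """
    mapping = CROSSWALKS[target]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


# Made-up record, entered once by the researcher.
core = {"title": "Lunar basalt geochemistry", "creator": "A. Researcher", "method": "XRF"}
views = {target: to_repository(core, target) for target in CROSSWALKS}
```

In this sketch the researcher describes the data once, and each repository receives the subset and naming it asks for. The hard part in practice is agreeing on the crosswalks themselves.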
17. 3.3. Possible pilot project:
[Diagram: two domain repositories and an IR exchanging data and metadata. Metadata: what data was stored/viewed]
• Interview institutions
• Normalize reporting data
• Talking to:
– IQSS, Harvard
– ICPSR, U Mich
– DataDryad, UNC
– Pangaea, Germany
18. 3.4. Institutional Pilot study:
• Planning series of interviews at key institutions:
– What role do libraries/institutions play wrt research data management?
– What tools/metadata standards are used?
– What aspects of data deposition are the Research Office/IR/Institution interested in?
– How does this compare with what scientists want and do in their labs?
• Outcomes:
– Share knowledge (within institution);
– Write joint report (anonymised)
– Establish joint plan of action
19. Elsevier Research Data Services:
• 2013/2013: Series of pilots, reviews, and reports:
- With CMU: Data/metadata entry and sharing
- With IEDA: Repository creation: feasibility study & report
- With RDA: Cost of Data Repositories questionnaire
- With series of institutes: Interviews re. role of institution
• Main questions:
- What are key needs?
- Can we play a role: skillsets, partnerships?
- Is there a (transparent) business model for this?
• Principles:
– Collaboration is tailored to partner’s needs, using local resources;
– Collaboration plan is MoU/Service-Level Agreement;
– At all times, all data, reports and software are open and shared.
20. In summary:
1. If researchers start to curate and share their data…
2. And research databases become long-term sustainable…
3. And libraries, data repositories and grid infrastructures start to work together…
We might enable a knowledge infrastructure that allows us to jointly tackle the questions of life!
21. Many questions remain:
? What carrots and sticks will make researchers share their data?
? How do we create interoperable metadata layers?
? What role would the institution/library play?
? What are sustainable models, moving forward?
? Is there a place for publishers, in all this?
22. Thank you!
Collaborations and discussions gratefully acknowledged:
• CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Ed Hovy
• UCSD: Phil Bourne, Brian Shoettlander, David Minor, Declan Fleming, Ilya Zaslavsky
• NIF: Maryann Martone, Anita Bandrowski
• MSU: Brian Bothner
• OHSU: Melissa Haendel, Nicole Vasilevsky
• California Digital Library: Carly Strasser, John Kunze, Stephen Abrams
• Columbia/IEDA: Kerstin Lehnert, Leslie Hsu
• ICPSR: George Altman, Mary Vardigan
• CNI: Clifford Lynch
• Harvard: Michael Kurtz, Chris Erdmann
• MIT: Micah Altman
• UVM: Mara Saule
• RDA: Simon Hodson, Michael Diepenbroek
23. Your questions?
Anita de Waard
VP Research Data Collaborations, Elsevier Research Data Services (VT)
a.dewaard@elsevier.com
http://researchdata.elsevier.com/