Is a Biological Database Really Different than a Biological Journal?
1. In the Future Will a Biological
Database Really be Different than a
Biological Journal?
Philip E. Bourne PhD
pbourne@ucsd.edu
iDASH October 18, 2013
2. I am speaking to you today as
someone who...
• Maintains a major biological database – the
PDB – used by over 300,000 scientists per
month
• Is the Founding Editor in Chief of PLOS
Computational Biology
3. A Question First Posed in August 2005
PLOS Comp Biol 2005 1(3): e34
4. Here is one reason why the question is
important….
5. The Paper As Experiment
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
1. User clicks on thumbnail
2. Metadata and a webservices call provide a renderable image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers
PLoS Comp. Biol. 2005 1(3) e34
6. The answer 8 years ago, as it is now, is…
In principle there is no difference, but
the way in which each is perceived is
still very different…
Yet progress has been made and we
will focus on what we can do to further
accelerate change
7. Why Bother?
Better integration of data and the
knowledge derived from it can
accelerate discovery and improve the
comprehension and dissemination of
science
8. Let's take a step back ...
What got me thinking this way?
9. Data Are Becoming More Complex:
Witness The World Wide Protein Data Bank
http://www.wwpdb.org
• The single worldwide
repository for data on
the structure of
biological
macromolecules
• Vital for drug
discovery and the life
sciences
• 43 years old
• Free to all
10. The World Wide Protein Data Bank
Places High Value on Data
• Paper not published
unless data are
deposited – strong
data to literature
correspondence
• Highly structured data
conforming to an
extensive ontology
• DOIs assigned to
every structure
http://www.wwpdb.org
11. The PLoS Corpus
• Established in 2000
• Identified as high-quality
publications
• Currently 8 journals
with healthy growth
• Open Access – free to
all
• PLOS ONE a huge
success
12. Similar Processes Lead to Similar Resources
Journal: Author Submission via the Web → Syntax Checking → Review by Scientists & Editors → Corrections by Author → Publish – Web Accessible
Database: Depositor Submission via the Web → Syntax Checking → Review by Annotators → Corrections by Depositor → Release – Web Accessible
13. The scientific processes for handling data
and publications are not that
different, but the end products are
perceived very differently
15. This makes no sense when you ask
yourself the question:
What is more valuable: a dataset used
and cited by 100 scientists, or a paper
you wrote that only you cite?
Case in point…
16. What can you do today to change the
situation?
17. Think Globally Act Locally
• Support emergent community commons/portals
• Be involved in the support and development of
metadata standards
• Contribute to workflow development etc. to drive
an open research lifecycle
• Educate your mentors on the importance of
open science and scholarly communication
• Write software thinking of an App model
18. Pressure Your Institutions to Play a
Greater Role
• We need institutional data/knowledge sharing
plans
• We need digital universities
• We need data/information scientists to be
better recognized by institutions – it's not all
about papers – this implies new metrics
19. Committee on Academic
Promotions
• What Counts
– Money
– Grants
– Papers
– Teaching
– Service
• What Does Not
– Sharing data
– Sharing software
– Open access
– Collaboration
– Patents
– Startups
Ten Simple Rules for Getting Ahead as a Computational Biologist in Academia
2011 PLOS Comp Biol 7(1) e1002001
20. We Need to Bend the Traditional System
The Wikipedia Experiment – Topic Pages
Identify areas of Wikipedia that
relate to the journal and are
missing or stubs
Develop a Wikipedia page in the
sandbox
Have a Topic Page Editor Review
the page
Publish the copy of record with
associated rewards
Release the living version into
Wikipedia
21. We Need Innovative Contributions
to the Research Lifecycle
• Authoring Tools
• Data Capture
• Lab Notebooks
• Software Repositories
• Analysis Tools
• Scholarly Communication
• Visualization
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
• Commercial & Public Tools
• Discipline-Based Metadata Standards
• Community Portals
• Git-like Resources By Discipline
• Data Journals
• New Reward Systems
• Training
• Institutional Repositories
• Commercial Repositories
23. Example Interoperability: The Database View
www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
BMC Bioinformatics 2010 11:220
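The literature-view link above is keyed on a PDB structure ID, which is what lets a journal page point into the database. A minimal sketch of building such a link follows, assuming the URL pattern shown on this slide (it dates from 2013 and may no longer resolve). The `http://` scheme, the helper name `literature_view_url`, and the ID check are illustrative assumptions, not part of any RCSB API. No network call is made.

```python
from urllib.parse import urlencode

# Path copied from the slide; the "http://" scheme is an assumption, and the
# 2013-era endpoint may no longer resolve.
LITERATURE_VIEW = "http://www.rcsb.org/pdb/explore/literature.do"


def literature_view_url(structure_id: str) -> str:
    """Build a literature-view link for a PDB structure ID (hypothetical helper)."""
    pdb_id = structure_id.strip().upper()
    # Classic PDB IDs are four ASCII alphanumeric characters starting with a digit.
    looks_valid = (
        len(pdb_id) == 4
        and pdb_id.isascii()
        and pdb_id.isalnum()
        and pdb_id[0].isdigit()
    )
    if not looks_valid:
        raise ValueError(f"not a four-character PDB ID: {structure_id!r}")
    # Only the query string is assembled here; nothing is fetched.
    return f"{LITERATURE_VIEW}?{urlencode({'structureId': pdb_id})}"


print(literature_view_url("1tim"))
```

A journal article that cites structure 1TIM could embed the returned link next to the relevant figure, giving the reader a one-click path from the paper to the database view.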
24. This is asking a lot of us, but our job is
being made easier by what is going on
around us
25. Open Access to Data and the
Literature is no Longer a Curiosity, but
Mainstream
26. Conservative Bodies Are Recognizing
Change
• Anyone, anything, anytime
• publication access, data, models, source codes, resources, transparent
methods, standards, formats, identifiers, APIs, licenses, education, policies
• “accessible, intelligible,
assessable, reusable”
[Carole Goble]
http://royalsociety.org/policy/projects/science-public-enterprise/report/
27. Governments Are Recognizing Change
G8 Open Data Charter
http://opensource.com/government/13/7/open-data-charter-g8
29. Publishing is Changing
• Today:
• Approx 10,000 publishers
• Publishing approx 25,000 journals
• Which publish approx 1.5 million articles per year
(almost 1 million of which appear in PubMed)
30. Witness the ‘Open Access Mega
Journal’
1. Very very large
– Publishing thousands of articles per year
– and benefiting from economies of scale
2. Open Access
– Because no one will pay a subscription fee for a journal that
large (and growing that fast)
– and using an OA Business Model where each article pays for its
own costs
3. (Preferably) without any ‘artificial’ constraints on
its ability to grow
– For example, a desire to only publish ‘high impact’ papers
[Pete Binfield]
32. “Open Access Mega Journals”
– One Name, Two Flavours
• ‘Clones’ of PLoS ONE (not selective)
– SAGE Open
– BMJ Open
– Scientific Reports (Nature)
– AIP Advances (Am Inst Physics)
– G3 (Genetics Soc of America)
– Biology Open (Company of Biologists)
• ‘Pseudo-Clones’ of PLoS ONE (probably selective)
– Physical Review X (Am Physical Society)
– Open Biology (Royal Society)
– Cell Reports (Elsevier, Cell Press)
[Pete Binfield]
33. Attitudes are Changing
datasets
data collections
algorithms
configurations
tools and apps
codes
workflows
scripts
code libraries
services
system software
infrastructure
compilers
hardware
[Carole Goble]
“An article about computational
science in a scientific publication
is not the scholarship itself, it is
merely advertising of the
scholarship. The actual
scholarship is the complete
software development
environment, [the complete data]
and the complete set of
instructions which generated the
figures.”
David Donoho, “Wavelab and Reproducible
Research,” 1995
Morin et al., Shining Light into Black Boxes, Science 13 April 2012: 336(6078) 159-160
Ince et al., The case for open computer programs, Nature 482, 2012
34. Flaws Are Becoming More
Obvious
Out of 18 microarray papers, results
from 10 could not be reproduced
More retractions:
>15X increase in last decade
At the current rate of increase, by 2045 as many papers will be retracted as published
[Carole Goble]
1. Ioannidis et al., 2009. Repeatability of published microarray gene expression analyses. Nature Genetics 41: 14
2. Science publishing: The trouble with retractions http://www.nature.com/news/2011/111005/full/478026a.html
3. Bjorn Brembs: Open Access and the looming crisis in science https://theconversation.com/open-access-and-the-looming-crisis-in-science-14950
35. Science is Being Deinstitutionalized
Daniel Hulshizer/Associated Press
37. In Summary
• Question (2005): In the Future Will a Biological
Database Really be Different than a Biological
Journal?
• Answer:
– Less different than they were in 2005
– We still have a long way to go to improve science
– Change is accelerating
– What one does on a daily basis as a scholar is very
different from when I was in graduate school and it
will be very different again