Collaboratively creating a network of ideas, data and software
1. Anita de Waard
VP Research Data Collaborations
Elsevier, Jericho, VT
Some Thoughts on Collectively Creating
Networks of Ideas, Data and Software
2. How do we unify the needs of the
collective and the individual?
“Let us endeavor to build systems that allow
a kid in Mali who wants to learn about proteomics
to not be overwhelmed by the irrelevant and the untrue.”
- John Perry Barlow, iAnnotate 2014
Collectively create nimble and robust systems of
knowledge management that interconnect ideas, data
and software.
6. Linking Papers to Data, Phase 1
• Supplementary data at PANGAEA
• Bidirectional links between PANGAEA &
ScienceDirect
• Data visualized next to the article
http://www.elsevier.com/databaselinking
7. Linking Papers to Data, Phase 2
• ICSU/WDS/RDA Publishing Data Service Working Group
• Currently creating a linked-data model for exposing DOI-to-DOI links outside the publisher’s firewall
• Merged with a National Data Service pilot with the same goal
• Collaboration between CrossRef, DataCite, Europe PubMed Central, ANDS, Thomson Reuters, Elsevier
• About to deliver: http://dliservice.research-infrastructures.eu/#/api
Objective: move from a plethora of (mostly) bilateral arrangements between the different players to a one-for-all cross-referencing service for articles and data
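One way to picture the one-for-all objective is as a merge of each player's link feed into a single index keyed by DOI pair. The following is a minimal Python sketch under stated assumptions: the `Link` fields, the `merge_links` and `datasets_for` helpers, and the example DOIs are all hypothetical, and none of this is the schema or API of the service named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One article-to-dataset link (field names are illustrative assumptions)."""
    article_doi: str
    dataset_doi: str
    provider: str  # who asserted the link, e.g. a publisher or a repository


def normalize_doi(doi):
    """Lower-case a DOI and strip a resolver prefix so equal DOIs compare equal."""
    doi = doi.strip().lower()
    prefixes = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")
    for prefix in prefixes:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def merge_links(*feeds):
    """Combine link feeds from several providers into one shared index.

    Returns a dict mapping (article_doi, dataset_doi) to the sorted list
    of providers that asserted that link.
    """
    index = {}
    for feed in feeds:
        for link in feed:
            key = (normalize_doi(link.article_doi),
                   normalize_doi(link.dataset_doi))
            index.setdefault(key, set()).add(link.provider)
    return {key: sorted(providers) for key, providers in index.items()}


def datasets_for(index, article_doi):
    """List every dataset DOI linked to the given article DOI."""
    target = normalize_doi(article_doi)
    return sorted(dataset for (article, dataset) in index if article == target)
```

With a shared index like this, each party contributes its feed once, and any consumer can look up links by DOI without a bilateral agreement with every other party.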
10. [Diagram: Researchers, Funding Agency, Institution, Data Repository, Dataset, Journal and Paper, connected by the numbered steps below]
A Proposal To Address These Issues:
1. Researcher creates datasets and posts to repository
(under embargo)
2. Funder is automatically notified of dataset publication
3. Researcher writes paper & publishes in journal; embargo is lifted and data linked
- NB: this also allows the release of unused data, supporting negative results and reproducibility
4. Funder and institution get report on publication and embargo lifting
i. Less Work!
ii. More Data Stored!
iii. Better Linking!
iv. Better Tracking!
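The four proposal steps can be read as a small event-driven workflow. The sketch below is only an illustration under stated assumptions: the `Dataset` and `Workflow` classes, their method names, and the notification messages are hypothetical and do not describe any existing repository or funder system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dataset:
    dataset_id: str
    embargoed: bool = True              # step 1: data is posted under embargo
    linked_paper: Optional[str] = None  # set when the paper is published


class Workflow:
    """Toy model of the proposed deposit, notify, publish, report cycle."""

    def __init__(self):
        self.datasets = {}
        self.notifications = []  # (recipient, message) tuples, in order sent

    def deposit(self, dataset_id):
        """Steps 1 and 2: post the dataset under embargo and notify the funder."""
        self.datasets[dataset_id] = Dataset(dataset_id)
        self.notifications.append(
            ("funder", f"dataset {dataset_id} deposited under embargo"))

    def publish_paper(self, paper_id, dataset_ids):
        """Steps 3 and 4: lift the embargo, link the data, report to both parties."""
        dataset_ids = list(dataset_ids)
        for dataset_id in dataset_ids:
            dataset = self.datasets[dataset_id]
            dataset.embargoed = False
            dataset.linked_paper = paper_id
        message = (f"paper {paper_id} published; "
                   f"embargo lifted on {', '.join(dataset_ids)}")
        for recipient in ("funder", "institution"):
            self.notifications.append((recipient, message))
```

The point of the sketch is that the researcher acts only twice (deposit and publish); every notification and link follows automatically from those two events.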
11. One piece of the puzzle: Mendeley Data:
https://data.mendeley.com/datasets/xz6gv65m6d/6
• Linked to published papers – or not
• Linked to GitHub – or not
• Versioning and provenance
12. Another Piece of the Puzzle: DataSearch:
http://datasearchdemo.elsevier.com/indexed#/search/mercury
13. [Diagram labels]
• Sources: Federated, Poor API, Rich API, FTP & Index
• Data Enrichment: Manual, Automated
• (User) Intent
• Ranking
• Filtering (how to mix federated & indexed, rich & poor)
• Search
• Rendering: Search all data, Faceted query/Results refinement, Store & Use results
• General UI, Domain UI, Filtering
• Feeding user signals back into Search ranking
• Evaluation
How Do We Evaluate Discoverability?
Birds of a Feather on Data Search: https://rd-alliance.org/bof-data-search.html
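The slide leaves open how to mix federated and indexed results with rich and poor metadata, and how to feed user signals back into ranking. Purely as a hedged illustration of what such a mix could look like, the sketch below merges two hit lists, down-weights poor metadata, and boosts records that users clicked. The `Hit` class, the `METADATA_WEIGHT` values and the logarithmic click boost are all assumptions made for this example, not a description of DataSearch.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    record_id: str
    score: float   # relevance reported by the source, assumed to lie in [0, 1]
    metadata: str  # "rich" or "poor"


# Hypothetical weights: trust a score less when the metadata behind it is poor.
METADATA_WEIGHT = {"rich": 1.0, "poor": 0.7}


def rank(federated_hits, indexed_hits, clicks):
    """Merge federated and indexed hits into one list, best first.

    `clicks` maps record_id to the number of past user clicks; the boost
    grows logarithmically so heavily clicked records do not dominate.
    """
    def combined(hit):
        boost = 1.0 + math.log1p(clicks.get(hit.record_id, 0))
        return hit.score * METADATA_WEIGHT[hit.metadata] * boost

    return sorted(list(federated_hits) + list(indexed_hits),
                  key=combined, reverse=True)
```

Whether any such weighting improves discoverability is exactly the evaluation question raised above; the numbers here are placeholders to be tuned against real user behaviour.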
14. How do we pay for all this?
RDA Cost Recovery WG
• Co-chair with Ingrid Dillo (DANS), Simon Hodson (CODATA)
• Goal: write a report on potential new funding models for data repositories, and allow repositories to start sharing this knowledge
• Interviewed 24 repositories on their funding (current and future)
• Now summarising stories and trends – will present at RDAP7
[Chart: Terms of funding for main income stream (in %)]
16. Working with Networks of Partners
Force11:
– Multi-stakeholder, member-driven organisation
– Unites scholars, tool developers, librarians, publishers, funding agencies, etc.
– E.g. Software citation group, akin to Data Citation Group
– Will present at Force16 in Portland, OR April 17-19, 2016
National Data Service:
– Multi-stakeholder group, based around supercomputing centres
– Aims to be a ‘connective tissue’ between projects for data creation, curation, storage, etc.
– Inviting pilots: two or more partners who have not worked together and are interested in collaborating on a data-centric project to solve a real-world need; can include software sharing
– E.g. DataSearch, data linking systems
RDA:
– Co-lead of the Data publishing, linking group
– Co-lead of the Cost Recovery group
– Active in Chemistry, Earth Science groups
– Starting BoF Data Search
17. Anita de Waard
VP Research Data Collaborations, Elsevier
a.dewaard@elsevier.com
@anitadewaard
In summary:
Let’s collectively enable ‘an account of the present
undertakings, studies and labours of the ingenious in many
considerable parts of the world’,
by connecting ideas, data, and software
through interconnected partnerships!