Maryann E. Martone, Ph. D.
Executive Director
Professor of Neuroscience, University of California, San Diego
Future of Research Communications and E-Scholarship
Creating a data and tools ecosystem
What is FORCE11?
Future of Research Communications and E-Scholarship:
A grassroots effort to accelerate the pace and nature
of scholarly communications and e-scholarship through
technology, education and community
Why 11? We were born in 2011 in Dagstuhl,
Germany
Principles laid out in the FORCE11 Manifesto
FORCE11 launched in July 2012
Who is FORCE11?
Anyone who has a stake in moving scholarly communication into the 21st century
Publishers
Library and information scientists
Policy makers
Tool builders
Funders
Scholars: Science, Humanities, Social Sciences
FORCE11 Vision
• Modern technologies enable vastly improved knowledge transfer and far wider
impact; freed from the restrictions of paper, numerous advantages appear
• We see a future in which scientific information and scholarly communication more
generally become part of a global, universal and explicit network of knowledge
• To enable this vision, we need to create and use new forms of scholarly
publication that work with reusable scholarly artifacts
• To obtain the benefits that networked knowledge promises, we have to put in
place reward systems that encourage scholars and researchers to participate and
contribute
• To ensure that this exciting future can develop and be sustained, we have to
support the rich, variegated, integrated and disparate knowledge offerings
that new technologies enable
Beyond the PDF Visual Notes by De Jongens van de Tekeningen is licensed under a Creative Commons Attribution 3.0 Unported License.
Old Model: Single type of content;
single mode of distribution
Scholar
Library
Scholar
Publisher
The future is now...
Scholar
Consumer
Libraries
Data Repositories
Code Repositories
Community databases/platforms
OA
Curators
Social Networks
Peer Reviewers
Workflows
Data
Blogs/Wikis
Multimedia
Nanopublications
Narrative
Code
The duality of modern scholarship
Observation: Those who build information systems from the
machine side don’t understand the requirements of
humans very well.
Those who build information systems from the human side
don’t understand the requirements of machines very well.
Scholarship requires the ability to cite and track usage of
scholarly artifacts. In our current mode of working, there is no
way to easily track artifacts as they move through the
ecosystem; no way to incrementally add human expertise; no
way to alert everyone when things go wrong
Digital objects are a new beast
New modes of representation and verification
will be necessary
Trust: Not just
who produced it
but what
produced it
Impetus for change: Is our current
method serving science?
47/53 major published
preclinical cancer studies
could not be replicated
• “The scientific community
assumes that the claims in a
preclinical study can be taken at
face value - that although there
might be some errors in detail,
the main message of the paper
can be relied on and the data
will, for the most part, stand
the test of time. Unfortunately,
this is not always the case.”
Begley and Ellis, 29 MARCH 2012 | VOL 483 | NATURE | 531
The scientific corpus is fragmented
• ~25 million articles
total, each covering a
fragment of the
biomedical space
• Each publisher owns a
fragment of a particular
field
• The current process is
inefficient and slow
Wiley
Elsevier
Macmillan
Oxford
Spinal Muscular Atrophy
Machine-based access requires that we take a global view
of the body scholarly and allow mining across content
A new platform for scholarly
communications
Components
• Authoring tools
– Optimized for mark up and linked content
• Containers
– Expand the objects that are considered “publications”
– Optimize the container for the content
• Processes
– Scholarship is code
• Mark up
– Data, claims, content suitable for the web
– Suitable identifier systems
• Reward systems
– Incentives to change
– Reward for new objects
Scholarship must move away from a “single currency system”;
platforms must recognize diversity of output and representation
FORCE11.org
• Community platform
– Meetings
– Discussions
– Tools and resources
– Blogs
– Event calendar
– Community projects
• Promote interoperability
– Data Citation
– Resource Identification Initiative
500 members from diverse stakeholder groups
700
Beyond the PDF
• Conference/unconference
where all
stakeholders come
together as equals to
discuss issues
– Publishers
– Technologists
– Scholars
– Library scientists
• Incubator for change
• What would you do to
change scholarly
communication?
San Diego, Jan 2011 ...... Amsterdam, March 2013........?2015
http://www.force11.org/beyondthepdf2
YES!!!
FORCE
Promote community, cross-fertilization
and interoperability
• FORCE11 helps facilitate
communications across
disciplines and
communities
• Issues are not identical but
we can learn from each
other
– Enhanced publications
• Digital humanities +
– Dealing with data
• Science +
– Open Access
• Science +
“What is an ORCID id?”-computer scientist
ORCID
Data journals
Research Data Alliance
PeerJ, eLife
Workflows 4Ever
Dataverse
ImpactStory, Rubriq
Sadie
Scalar
Resource for scholarly communications:
People, organizations, publications, tools
FORCE11 Working Groups
• FORCE11 provides a neutral convening place
for individuals to come together around issues
in scholarly communication
– FORCE11 provides web working space and
facilitation where possible
– 1K Challenge: Beyond the PDF
– Short term working groups with clear focus
• Deliverable specified
• Time line determined
Data: Whose problem is it?
Scholar
Library
Scholar
Publisher
Domain-specific
Repository
Web
site/Personal
data
management
Computing
Scholars, Data Repositories, Institutional Repositories taking ownership of
data. Where should it go? Sometimes it can’t go anywhere.
Is data like a
bibliographic record?
• Not uniform in
size
• Not uniform in
type
• Curation requires
deep
understanding of
domain
• Data is dynamic
• Data is fluid
Geoff Bilder, CrossRef
Surveying the resource
landscape
Neuroscience Information Framework http://neuinfo.org
Deep metadata
http://neuinfo.org
With the thousands of databases and other information sources
available, simple descriptive metadata will not suffice
A place to come together: Data
citation principles
•FORCE11 provides a neutral
space for bringing groups
together
•35 individuals
representing > 20
organizations concerned
with data citation
•Conducted a review of
current data citation
recommendations from 4
different organizations
•Arrived at a set of
consensus principles
Data citation synthesis group:
http://www.force11.org/node/4381
Process
1. Synthesis (July-Sept 2013)
2. Community feedback (Nov-Dec 2013)
3. Revision (Jan 2014)
4. Dissemination (Now)
Data Citation Principles: Open for Endorsement
Joint Declaration of Data Citation
Principles
• Designed to be high
level and easy to
understand
• Supplemented with
a glossary,
references and
examples
http://www.force11.org/datacitation
1. Importance
2. Credit and attribution
3. Evidence
4. Unique Identification
5. Access
6. Persistence
7. Specificity and verifiability
8. Interoperability and
flexibility
Significance & Scope
• Sound, reproducible scholarship rests upon a
foundation of robust, accessible data.
• Data should be considered legitimate, citable products
of research.
• Data citation, like the citation of other evidence and
sources, is good research practice.
• The Joint Principles cover purpose, function and
attributes of citations.
• Specific practices vary across communities and
technologies – we recommend communities develop
practices for machine and human citations consistent
with these general principles.
Purpose
1. Importance. Data should be considered legitimate, citable
products of research. Data citations should be accorded the same
importance in the scholarly record as citations of other research
objects, such as publications [1].
2. Credit and attribution. Data citations should facilitate giving
scholarly credit and normative and legal attribution to all
contributors to the data, recognizing that a single style or
mechanism of attribution may not be applicable to all data [2].
3. Evidence. In scholarly literature, whenever and wherever a claim
relies upon data, the corresponding data should be cited [3].
Function
4. Unique Identification. A data citation should include a persistent
method for identification that is machine-actionable, globally
unique, and widely used by a community [4].
5. Access. Data citations should facilitate access to the data
themselves and to such associated metadata, documentation, code,
and other materials, as are necessary for both humans and
machines to make informed use of the referenced data [5].
Attributes
6. Persistence. Unique identifiers, and metadata describing the data
and its disposition, should persist -- even beyond the lifespan of
the data they describe [6].
7. Specificity and verifiability. Data citations should facilitate
identification of, access to, and verification of the specific data
that support a claim. Citations or citation metadata should include
information about provenance and fixity sufficient to facilitate
verifying that the specific timeslice, version and/or granular
portion of data retrieved subsequently is the same as was
originally cited [7].
8. Interoperability and flexibility. Data citation methods should be
sufficiently flexible to accommodate the variant practices among
communities, but should not differ so much that they compromise
interoperability of data citation practices across communities [8].
Generic Data Citation
(as it appears in printed reference list)
Note:
● Neither the format nor specific required elements are intended to be defined with this example. Formats, optional
elements, and required elements will vary across publishers and communities. [Principle 8: Interoperability and flexibility].
● As illustrated in the previous examples, intra-work citations may be accompanied by information including the specific
portion used. [Principles 7,8].
● As illustrated in the next example, printed citations should be accompanied by metadata that support credit, attribution,
specificity, and verification. [Principles 2, 5 and 7].
Author(s), Year, Dataset Title, Data Repository or Archive, Version, Global
Persistent Identifier
Principle 2: Credit and
Attribution (e.g. authors,
repositories or other
distributors and contributors)
Principle 4: Unique Identifier (e.g.
DOI, Handle).
Principles 5 and 6: Access, Persistence: A persistent
identifier that provides access and
metadata
Principle 7: Specificity and verification (e.g. the specific
version used).
Versioning or timeslice information should be supplied with
any updated or dynamic dataset.
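As a minimal sketch of the generic pattern above, the six elements can be held in a small record and rendered in the listed order. The class name, field names and all sample values below are hypothetical placeholders, not part of the Joint Declaration; as the note says, actual formats and required elements vary across publishers and communities.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Elements of the generic data citation (field names are illustrative)."""
    authors: List[str]
    year: int
    title: str
    repository: str
    version: str
    identifier: str  # global persistent identifier, e.g. a DOI or Handle

    def render(self) -> str:
        # Order follows the generic pattern: Author(s), Year, Dataset Title,
        # Data Repository or Archive, Version, Global Persistent Identifier.
        parts = [
            "; ".join(self.authors),
            str(self.year),
            self.title,
            self.repository,
            self.version,
            self.identifier,
        ]
        return ", ".join(parts) + "."


# Hypothetical placeholder values, for illustration only.
example = DataCitation(
    authors=["Doe, J.", "Roe, R."],
    year=2014,
    title="Example Survey Data",
    repository="Example Data Archive",
    version="V2",
    identifier="doi:10.9999/EXAMPLE",
)
print(example.render())
# Doe, J.; Roe, R., 2014, Example Survey Data, Example Data Archive, V2, doi:10.9999/EXAMPLE.
```

Keeping the version and the identifier as separate fields reflects Principles 4 and 7: the identifier locates the dataset, the version pins down which state of it was used.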
Placement of Citations
Intra-work:
● Should provide sufficient information to identify the cited data reference within the included
reference list.
● Citation to data should be in close proximity to claims relying on data. [Principle 3]
● May include additional information identifying the specific portion of data
supporting that claim. [Principle 7]
Example: The plots shown in Figure X show the distribution of selected measures from the main
data [Author(s), Year, portion or subset used].
Full Citation:
Citation may vary in style, but should be included in the full reference list along with citations to other
types of works.
Example:
References Section
Author(s), Year, Article Title, Journal, Publisher, DOI.
Author(s), Year, Dataset Title, Data Repository or Archive, Version, Global Persistent Identifier.
Author(s), Year, Book Title, Publisher, ISBN.
Citation Metadata
Author(s), Year, Dataset Title,
Data Repository or Archive,
Version, Global Persistent
Identifier.
Metadata
retrieval
<!-- CONTRIBUTOR METADATA -->
<contributor role="" ORCIDid="">Name</contributor>
<!-- FIXITY and PROVENANCE -->
<fixity type="MD5">XXXX</fixity>
<fixity type="UNF">UNF:XXXX</fixity>
<!-- MACHINE UNDERSTANDABILITY -->
<content-type>data</content-type>
<format>HDF5</format>
Note:
● Metadata location, formats, and elements will vary
across publishers and communities. [Principle 8]
● Citation metadata is needed in addition to the
information in the printed citation.
● Metadata describing the data and its disposition
should persist beyond the lifespan of the data.
[Principle 6]
● Citation metadata should support attribution and
credit [Principle 2]; machine use [Principle 5];
specificity and verification [Principle 7]
● For example, additional citation metadata may be
embedded in the citing document; attached to the
persistent identifier for the citation, through its
resolution service; stored in a separate community
indexing service (e.g. DataCite, CrossRef); or provided
in a machine-readable way through the surrogate
(“landing page”) presented by the repository to which
the identifier is resolved.
For more detail, see the References section.
http://www.force11.org/node/4772
EXAMPLE METADATA
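One way a fixity value like the MD5 entry in the example metadata could support verification (Principle 7) is sketched below: hash the retrieved bytes and compare the result with the value stored alongside the citation. The function names and the sample bytes are assumptions made for illustration. Only the MD5 case is shown; the UNF fingerprint in the example is a separate algorithm and is not implemented here.

```python
import hashlib


def md5_fixity(data: bytes) -> str:
    """Return the hex MD5 digest of the data, used here as a fixity value."""
    return hashlib.md5(data).hexdigest()


def matches_citation(data: bytes, cited_md5: str) -> bool:
    """True if the retrieved bytes hash to the fixity value from the citation metadata."""
    return md5_fixity(data) == cited_md5.lower()


# Hypothetical bytes standing in for the dataset as originally cited.
original = b"subject,score\n1,0.5\n2,0.7\n"
cited = md5_fixity(original)  # the value that would sit in <fixity type="MD5">

print(matches_citation(original, cited))               # True
print(matches_citation(original + b"3,0.9\n", cited))  # False
```

A mismatch does not say what changed, only that the retrieved bytes are not the ones cited, which is why the principles pair fixity with version or timeslice information.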
Growing Adoption
https://www.force11.org/datacitation/endorsements
Endorse the Principles!
• http://www.force11.org/datacitation/endorsements
148 individuals; 60 organizations
Unique IDs for all! Resource
Identification Initiative
• It is currently impossible
to query the biomedical
literature to find out
what research resources
have been used to
produce the results of a
study
• Impossible to find all
studies that used a
resource
• Critical for
reproducibility and data
mining
• Critical for troubleshooting
http://www.force11.org/resource_identification_initiative
“Faulty Antibodies Continue to Enter US and
European Markets, Warns Top Clinical
Chemistry Researcher” (GenomeWeb Daily,
October 11, 2013)
Resource Identification Initiative
• Have authors supply
appropriate identifiers for
key resources used within
a study such that they
are:
– Machine-processable (i.e.,
unique identifier that
resolves to a single
resource)
– Outside of the paywall
– Uniform across journals
and publishers
Launched February 2014: > 30 journals
participating
Pilot Project
• Have authors identify 3 different types
of research resources:
– Software tools and databases
– Antibodies
– Genetically modified animals
• Include RRID in methods section
• RRID=RRID:Accession number
– Just a string at this point
• Voluntary for authors
• Journals did not have to modify their
submission system
• Journals have flexibility in
implementation. Send request to
author at:
– Submission
– During review
– After acceptance
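Because the pilot treats an RRID as a plain string of the form RRID:accession number, a methods section can in principle be scanned for them with a simple pattern. The sketch below assumes accession numbers consist of letters, digits, underscores and hyphens; that character set, the function name and the sample sentence are illustrative assumptions, and the antibody accession shown is a made-up placeholder.

```python
import re

# Assumed shape: the literal prefix "RRID:" followed by an accession number
# made of letters, digits, underscores or hyphens.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z0-9_-]+)")


def extract_rrids(methods_text: str) -> list:
    """Return the distinct accession numbers cited as RRIDs, in order of first use."""
    seen = []
    for accession in RRID_PATTERN.findall(methods_text):
        if accession not in seen:
            seen.append(accession)
    return seen


# Hypothetical methods text; AB_0000000 is a placeholder accession.
methods = (
    "Data were retrieved from a registered resource (RRID:nif-0000-30467). "
    "Sections were stained with a primary antibody (RRID:AB_0000000), "
    "and searches were repeated using RRID:nif-0000-30467."
)
print(extract_rrids(methods))  # ['nif-0000-30467', 'AB_0000000']
```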
http://scicrunch.com/resources
Resource Identification Portal: Aggregates
accession numbers from >10 different
databases that are the authorities for
registering research resources
First results are in the literature
Google Scholar: Search RRID; select since 2014
What studies used X?
To date:
•30 articles have appeared
•2 articles have disappeared, i.e.,
the RRIDs were removed at
copyediting
•195 RRIDs were reported
•14 were in error (about 7%)
•> 200 antibodies were added
•> 75 software tools/databases
were added
•A resolver service has been
created
•3rd party tools are being created
to provide linkage between
resources and papers
RRID:nif-0000-30467
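The question "What studies used X?" amounts to an inverted index from resource identifier to papers. A minimal sketch follows, assuming the RRIDs of each paper have already been collected; the paper IDs, the placeholder antibody accession and the function name are hypothetical. The last lines work out the error rate from the counts reported above (14 of 195).

```python
from collections import defaultdict


def build_resource_index(papers: dict) -> dict:
    """Map each RRID accession to the set of paper IDs that cite it."""
    index = defaultdict(set)
    for paper_id, rrids in papers.items():
        for rrid in rrids:
            index[rrid].add(paper_id)
    return index


# Hypothetical papers and the RRIDs found in their methods sections.
papers = {
    "paper-A": ["nif-0000-30467", "AB_0000000"],
    "paper-B": ["nif-0000-30467"],
    "paper-C": ["AB_0000000"],
}
index = build_resource_index(papers)
print(sorted(index["nif-0000-30467"]))  # ['paper-A', 'paper-B']

# Error rate from the reported counts: 14 erroneous RRIDs out of 195.
reported, in_error = 195, 14
print(f"{in_error / reported:.1%}")  # 7.2%
```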
What have we learned?
Utopia plug-in: Steve Pettifer
•Authors are willing to
adopt new types of
citations
•RRID = usage of
research resource
•Ideal: resolved by
search engines without
requiring specialized
citation services
•Citation drives
registration
•Clear role for
repositories as
authorities
•Should RRIDs be DOIs?
Will system work
for data citation
and more
complicated
research objects?
Data Citation Implementation Group
FORCE11 Vision
• Modern technologies enable vastly improved knowledge transfer and far wider
impact; freed from the restrictions of paper, numerous advantages appear
• We see a future in which scientific information and scholarly communication more
generally become part of a global, universal and explicit network of knowledge
• To enable this vision, we need to create and use new forms of scholarly
publication that work with reusable scholarly artifacts
• To obtain the benefits that networked knowledge promises, we have to put in
place reward systems that encourage scholars and researchers to participate and
contribute
• To ensure that this exciting future can develop and be sustained, we have to
support the rich, variegated, integrated and disparate knowledge offerings
that new technologies enable
No single infrastructure serves everything; cooperation
in defining a global system of scholarly communication
Notes & References for Data Citation Principles
Notes
[1] CODATA 2013, Sec 3.2.1; Uhlir (ed.) 2012, Ch. 14; Altman & King 2007
[2] CODATA 2013, Sec 3.2, 7.2.3; Uhlir (ed.) 2012, Ch. 14
[3] CODATA 2013, Sec 3.1, 7.2.3; Uhlir (ed.) 2012, Ch. 14
[4] Altman & King 2007; CODATA 2013, Sec 3.2.3, Ch. 5; Ball & Duke 2012
[5] CODATA 2013, Sec 3.2.4, 3.2.5, 3.2.8
[6] Altman & King 2007; Ball & Duke 2012; CODATA 2013, Sec 3.2.2
[7] Altman & King 2007; CODATA 2013, Sec 3.2.7, 3.2.8
[8] CODATA 2013, Sec 3.2.10
References
• Altman, M., King, G. (2007). A Proposed Standard for the Scholarly Citation of
Quantitative Data. D-Lib Magazine.
• Ball, A., Duke, M. (2012). Data Citation and Linking. DCC Briefing Papers.
Edinburgh: Digital Curation Centre.
• CODATA-ICSTI Task Group on Data Citation (2013). Out of Cite, Out of Mind: The
Current State of Practice, Policy, and Technology for the Citation of Data. Data
Science Journal.
• Uhlir, P. (ed.) (2012). For Attribution -- Developing Data Attribution and Citation
Practices and Standards. National Academies of Sciences.

Real Time Object Detection Using Open CV
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

FORCE11: Creating a data and tools ecosystem

  • 1. Maryann E. Martone, Ph. D. Executive Director Professor of Neuroscience, University of California, San Diego Future of Research Communications and E-Scholarship Creating a data and tools ecosystem
  • 2. What is FORCE11? Future of Research Communications and E-Scholarship: a grassroots effort to accelerate the pace and nature of scholarly communications and e-scholarship through technology, education and community. Why 11? We were born in 2011 in Dagstuhl, Germany. Principles laid out in the FORCE11 Manifesto. FORCE11 launched in July 2012.
  • 3. Who is FORCE11? Anyone who has a stake in moving scholarly communication into the 21st century Publishers Library and Information scientists Policy makers Tool builders Funders Scholars Science Humanities Social Sciences
  • 4. FORCE11 Vision • Modern technologies enable vastly improved knowledge transfer and far wider impact; freed from the restrictions of paper, numerous advantages appear • We see a future in which scientific information and scholarly communication more generally become part of a global, universal and explicit network of knowledge • To enable this vision, we need to create and use new forms of scholarly publication that work with reusable scholarly artifacts • To obtain the benefits that networked knowledge promises, we have to put in place reward systems that encourage scholars and researchers to participate and contribute • To ensure that this exciting future can develop and be sustained, we have to support the rich, variegated, integrated and disparate knowledge offerings that new technologies enable. Beyond the PDF Visual Notes by De Jongens van de Tekeningen is licensed under a Creative Commons Attribution 3.0 Unported License.
  • 5. Old Model: Single type of content; single mode of distribution Scholar Library Scholar Publisher
  • 6. The future is now... Scholar Consumer Libraries Data Repositories Code Repositories Community databases/platforms OA Curators Social Networks Peer Reviewers Workflows Data Blogs/Wikis Multimedia Nanopublications Narrative Code
  • 7. The duality of modern scholarship. Observation: those who build information systems from the machine side don’t understand the requirements of the human very well; those who build information systems from the human side don’t understand the requirements of machines very well. Scholarship requires the ability to cite and track usage of scholarly artifacts. In our current mode of working, there is no way to easily track artifacts as they move through the ecosystem, no way to incrementally add human expertise, and no way to alert everyone when things go wrong.
  • 8. Digital objects are a new beast New modes of representation and verification will be necessary Trust: Not just who produced it but what produced it
  • 9. Impetus for change: Is our current method serving science? 47/50 major preclinical published cancer studies could not be replicated. “The scientific community assumes that the claims in a preclinical study can be taken at face value-that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.” Begley and Ellis, Nature 483, 531 (29 March 2012)
  • 10. The scientific corpus is fragmented • ~25 million articles total, each covering a fragment of the biomedical space • Each publisher owns a fragment of a particular field • The current process is inefficient and slow [Figure labels: Wiley, Elsevier, Macmillan, Oxford; Spinal Muscular Atrophy] Machine-based access requires that we take a global view of the body scholarly and allow mining across content
  • 11. A new platform for scholarly communications Components • Authoring tools – Optimized for mark up and linked content • Containers – Expand the objects that are considered “publications” – Optimize the container for the content • Processes – Scholarship is code • Mark up – Data, claims, content suitable for the web – Suitable identifier systems • Reward systems – Incentives to change – Reward for new objects Scholarship must move from a “single currency system”; platforms must recognize diversity of output and representation
  • 12. FORCE11.org • Community platform – Meetings – Discussions – Tools and resources – Blogs – Event calendar – Community projects • Promote interoperability – Data Citation – Resource identification initiative 500 members from diverse stakeholder groups 700
  • 13. Beyond the PDF • Conference/unconference where all stakeholders come together as equals to discuss issues – Publishers – Technologists – Scholars – Library scientists • Incubator for change • What would you do to change scholarly communication? San Diego, Jan 2011 ... Amsterdam, March 2013 ... ?2015 http://www.force11.org/beyondthepdf2
  • 14. Promote community, cross-fertilization and interoperability • FORCE11 helps facilitate communications across disciplines and communities • Issues are not identical but we can learn from each other – Enhanced publications • Digital humanities + – Dealing with data • Science + – Open Access • Science + “What is an ORCID id?” - computer scientist
  • 15. ORCID, Data journals, Research Data Alliance, PeerJ, eLife, Workflow4Ever, Dataverse, ImpactStory, Rubriq, Sadie, Scalar. Resource for scholarly communications: People, organizations, publications, tools
  • 16. FORCE11 Working Groups • FORCE11 provides a neutral convening place for individuals to come together around issues in scholarly communication – FORCE11 provides web working space and facilitation where possible – 1K Challenge: Beyond the PDF – Short term working groups with clear focus • Deliverable specified • Time line determined
  • 17. Data: Whose problem is it? Scholar Library Scholar Publisher Domain-specific Repository Web site/Personal data management Computing. Scholars, Data Repositories, Institutional Repositories taking ownership of data. Where should it go? Sometimes it can’t go anywhere.
  • 18. Is data like a bibliographic record? • Not uniform in size • Not uniform in type • Curation requires deep understanding of domain • Data is dynamic • Data is fluid Geoff Bilder, CrossRef
  • 19. Surveying the resource landscape Neuroscience Information Framework http://neuinfo.org
  • 20. Deep metadata http://neuinfo.org With the thousands of databases and other information sources available, simple descriptive metadata will not suffice
  • 21. A place to come together: Data citation principles • FORCE11 provides a neutral space for bringing groups together • 35 individuals representing > 20 organizations concerned with data citation • Conducted a review of current data citation recommendations from 4 different organizations • Arrived at a sense of consensus principles Data citation synthesis group: http://www.force11.org/node/4381
  • 22. Process Synthesis Community feedback Revision Dissemination July-Sept 2013 Nov-Dec 2013 Jan 2014 Now Data Citation Principles: Open for Endorsement
  • 23. Joint Declaration of Data Citation Principles • Designed to be high level and easy to understand • Supplemented with a glossary, references and examples http://www.force11.org/datacitation 1. Importance 2. Credit and attribution 3. Evidence 4. Unique Identification 5. Access 6. Persistence 7. Specificity and verifiability 8. Interoperability and flexibility
  • 24. Significance & Scope • Sound, reproducible scholarship rests upon a foundation of robust, accessible data. • Data should be considered legitimate, citable products of research. • Data citation, like the citation of other evidence and sources, is good research practice. • The Joint Principles cover purpose, function and attributes of citations. • Specific practices vary across communities and technologies – we recommend communities develop practices for machine and human citations consistent with these general principles.
  • 25. Purpose 1. Importance. Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications [1]. 2. Credit and attribution. Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data [2]. 3. Evidence. In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited [3].
  • 26. Function 4. Unique Identification. A data citation should include a persistent method for identification that is machine-actionable, globally unique, and widely used by a community [4]. 5. Access. Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials, as are necessary for both humans and machines to make informed use of the referenced data [5].
  • 27. Attributes 6. Persistence. Unique identifiers, and metadata describing the data and its disposition, should persist -- even beyond the lifespan of the data they describe [6]. 7. Specificity and verifiability. Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited [7]. 8. Interoperability and flexibility. Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities [8].
  • 28. Generic Data Citation (as it appears in printed reference list) Note: ● Neither the format nor specific required elements are intended to be defined with this example. Formats, optional elements, and required elements will vary across publishers and communities. [Principle 8: Interoperability and flexibility]. ● As illustrated in the previous examples, intra-work citations may be accompanied with information including the specific portion used. [Principles 7,8]. ● As illustrated in the next example, printed citations should be accompanied by metadata that support credit, attribution, specificity, and verification. [Principles 2, 5 and 7]. Author(s), Year, Dataset Title, Data Repository or Archive, Version, Global Persistent Identifier Principle 2: Credit and Attribution (e.g. authors, repositories or other distributors and contributors) Principle 4: Unique Identifier (e.g. DOI, Handle.). Principle 5, 6 Access, Persistence: A persistent identifier that provides access and metadata Principle 7: Specificity and verification (e.g. the specific version used). Versioning or timeslice information should be supplied with any updated or dynamic dataset.
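The generic citation pattern on this slide (Author(s), Year, Dataset Title, Data Repository or Archive, Version, Global Persistent Identifier) can be sketched in code. This is a minimal Python sketch, assuming a hypothetical `DataCitation` record. The field names, the separators, and every sample value (including the identifier) are illustrative assumptions; the Joint Declaration leaves formats and required elements to publishers and communities.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Hypothetical record holding the elements named on the slide."""
    authors: List[str]   # Principle 2: credit and attribution
    year: int
    title: str
    repository: str      # data repository or archive
    version: str         # Principle 7: the specific version used
    identifier: str      # Principles 4-6: global persistent identifier

    def format(self) -> str:
        # Join authors with "; " and the remaining elements with ", ".
        # The separators are an assumption, not a prescribed style.
        authors = "; ".join(self.authors)
        return (f"{authors}, {self.year}, {self.title}, "
                f"{self.repository}, {self.version}, {self.identifier}")


# All values below are made up for illustration.
citation = DataCitation(
    authors=["Doe, J.", "Roe, R."],
    year=2014,
    title="Example Survey Data",
    repository="Example Data Archive",
    version="V2",
    identifier="doi:10.0000/example",
)
print(citation.format())
```

Keeping the elements as separate fields, rather than as one string, is one way to let the same record feed both a printed reference list and machine-readable citation metadata.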
  • 29. Placement of Citations Intra-work: ● Should provide sufficient information to identify the cited data reference within the included reference list. ● Citation to data should be in close proximity to claims relying on data. [Principle 3] ● May include additional information identifying the specific portion of data supporting that claim. [Principle 7] Example: The plots shown in Figure X show the distribution of selected measures from the main data [Author(s), Year, portion or subset used]. Full Citation: Citation may vary in style, but should be included in the full reference list along with citations to other types of works. Example: References Section Author(s), Year, Article Title, Journal, Publisher, DOI. Author(s), Year, Dataset Title, Data Repository or Archive, Version, Global Persistent Identifier. Author(s), Year, Book Title, Publisher, ISBN.
  • 30. Citation Metadata Author(s), Year, Dataset Title, Data Repository or Archive, Version, Global Persistent Identifier. Metadata retrieval. Example metadata: <!-- CONTRIBUTOR METADATA --> <contributor role="" ORCIDid="">Name</contributor> <!-- FIXITY and PROVENANCE --> <fixity type="MD5">XXXX</fixity> <fixity type="UNF">UNF:XXXX</fixity> <!-- MACHINE UNDERSTANDABILITY --> <content type>data</content type> <format>HDF5</format> Note: ● Metadata location, formats, and elements will vary across publishers and communities. [Principle 8] ● Citation metadata is needed in addition to the information in the printed citation. ● Metadata describing the data and its disposition should persist beyond the lifespan of the data. [Principle 6] ● Citation metadata should support attribution and credit [Principle 2]; machine use [Principle 5]; specificity and verification [Principle 7] ● For example, additional citation metadata may be embedded in the citing document; attached to the persistent identifier for the citation, through its resolution service; stored in a separate community indexing service (e.g. DataCite, CrossRef); or provided in a machine-readable way through the surrogate (“landing page”) presented by the repository to which the identifier is resolved. For more detail, see the References section. http://www.force11.org/node/4772
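The fixity element in the example metadata exists so that a reader can verify that the data retrieved later is the same as the data originally cited (Principle 7). The following is a minimal Python sketch of that check, assuming the MD5 fixity type shown on the slide. The function names and the sample bytes are hypothetical, and the UNF fixity type is not implemented here.

```python
import hashlib


def md5_fixity(data: bytes) -> str:
    """Return the MD5 hex digest of the data, for use as a fixity value."""
    return hashlib.md5(data).hexdigest()


def verify_fixity(data: bytes, recorded: str) -> bool:
    """Check retrieved data against the fixity value stored with the citation."""
    return md5_fixity(data) == recorded.strip().lower()


# Hypothetical dataset contents; in practice these would be the bytes
# of the cited file as deposited in the repository.
cited = b"subject,score\n1,0.42\n2,0.57\n"
recorded = md5_fixity(cited)          # stored in the citation metadata

retrieved_same = b"subject,score\n1,0.42\n2,0.57\n"
retrieved_changed = b"subject,score\n1,0.42\n2,0.58\n"

print(verify_fixity(retrieved_same, recorded))     # True
print(verify_fixity(retrieved_changed, recorded))  # False
```

A byte-level digest such as MD5 only confirms that the file is unchanged; it says nothing about whether two differently serialized files hold the same data, which is presumably why the slide lists a second fixity type alongside it.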
  • 32. Endorse the Principles! • http://www.force11.org/datacitation/endorsements 148 individuals; 60 organizations
  • 33. Unique IDs for all! Resource Identification Initiative • It is currently impossible to query the biomedical literature to find out what research resources have been used to produce the results of a study • Impossible to find all studies that used a resource • Critical for reproducibility and data mining • Critical for troubleshooting http://www.force11.org/resource_identification_initiative “Faulty Antibodies Continue to Enter US and European Markets, Warns Top Clinical Chemistry Researcher”, GenomeWeb Daily, October 11, 2013
  • 34. Resource Identification Initiative • Have authors supply appropriate identifiers for key resources used within a study such that they are: – Machine processible (i.e., unique identifier that resolves to a single resource) – Outside of the paywall – Uniform across journals and publishers Launched February 2014: > 30 journals participating
  • 35. Pilot Project • Have authors identify 3 different types of research resources: – Software tools and databases – Antibodies – Genetically modified animals • Include RRID in methods section • RRID=RRID:Accession number – Just a string at this point • Voluntary for authors • Journals did not have to modify their submission system • Journals have flexibility in implementation. Send request to author at: – Submission – During review – After acceptance http://scicrunch.com/resources Resource Identification Portal: Aggregates accession numbers from >10 different databases that are the authorities for registering research resources
  • 36. First results are in the literature Google Scholar: Search RRID; select since 2014
  • 37. What studies used X? To date: • 30 articles have appeared • 2 articles have disappeared, i.e., the RRIDs were removed at copyediting • 195 RRIDs were reported • 14 were in error = ~7% • > 200 antibodies were added • > 75 software tools/databases were added • A resolver service has been created • 3rd party tools are being created to provide linkage between resources and papers RRID:nif-0000-30467
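Because an RRID is "just a string" of the form RRID:Accession number, answering "What studies used X?" can start with plain text matching. The following is a minimal Python sketch under that assumption. The regular expression, the function names, the paper identifiers, and the sample sentences are all hypothetical; only RRID:nif-0000-30467 comes from the slide, and RRID:AB_0000000 is a made-up accession.

```python
import re
from typing import Dict, List

# Assumed pattern: "RRID:" followed by letters, digits, "_" or "-".
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z0-9_-]+)")


def find_rrids(text: str) -> List[str]:
    """Return the accession numbers cited in the text, in order, without repeats."""
    return list(dict.fromkeys(RRID_PATTERN.findall(text)))


def index_by_resource(papers: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each accession number to the papers whose text cites it."""
    index: Dict[str, List[str]] = {}
    for paper_id, text in papers.items():
        for accession in find_rrids(text):
            index.setdefault(accession, []).append(paper_id)
    return index


# Hypothetical methods-section snippets keyed by made-up paper ids.
papers = {
    "paper-A": "Images were processed with a registered tool (RRID:nif-0000-30467).",
    "paper-B": "Staining used an antibody (RRID:AB_0000000) and the same "
               "tool (RRID:nif-0000-30467).",
}

index = index_by_resource(papers)
print(index["nif-0000-30467"])  # ['paper-A', 'paper-B']
```

A sketch like this only finds strings that survive into the published text, which is why RRIDs removed at copyediting make an article disappear from such a search.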
  • 38. What have we learned? Utopia plug-in: Steve Pettifer •Authors are willing to adopt new types of citations •RRID = usage of research resource •Ideal: resolved by search engines without requiring specialized citation services •Citation drives registration •Clear role for repositories as authorities •Should RRID’s be DOI’s? Will system work for data citation and more complicated research objects?
  • 40. FORCE11 Vision • Modern technologies enable vastly improved knowledge transfer and far wider impact; freed from the restrictions of paper, numerous advantages appear • We see a future in which scientific information and scholarly communication more generally become part of a global, universal and explicit network of knowledge • To enable this vision, we need to create and use new forms of scholarly publication that work with reusable scholarly artifacts • To obtain the benefits that networked knowledge promises, we have to put in place reward systems that encourage scholars and researchers to participate and contribute • To ensure that this exciting future can develop and be sustained, we have to support the rich, variegated, integrated and disparate knowledge offerings that new technologies enable. No single infrastructure serves everything; cooperation is needed in defining a global system of scholarly communication
  • 41. Notes & References for Data Citation Principles Notes [1] CODATA 2013, Sec 3.2.1; Uhlir (ed.) 2012, ch. 14; Altman & King 2007 [2] CODATA 2013, Sec 3.2, 7.2.3; Uhlir (ed.) 2012, ch. 14 [3] CODATA 2013, Sec 3.1, 7.2.3; Uhlir (ed.) 2012, ch. 14 [4] Altman & King 2007; CODATA 2013, Sec 3.2.3, Ch. 5; Ball & Duke 2012 [5] CODATA 2013, Sec 3.2.4, 3.2.5, 3.2.8 [6] Altman & King 2007; Ball & Duke 2012; CODATA 2013, Sec 3.2.2 [7] Altman & King 2007; CODATA 2013, Sec 3.2.7, 3.2.8 [8] CODATA 2013, Sec 3.2.10 References • Altman, M., King, G. (2007). ‘A Proposed Standard for the Scholarly Citation of Quantitative Data’. D-Lib Magazine. • Ball, A., Duke, M. (2012). ‘Data Citation and Linking’. DCC Briefing Papers. Edinburgh: Digital Curation Centre. • CODATA-ICSTI Task Group on Data Citation (2013). ‘Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data’. Data Science Journal. • Uhlir, P. (ed.) (2012). For Attribution -- Developing Data Attribution and Citation Practices and Standards. National Academies of Sciences.