EMBO SourceData
– Next Gen Open Access
Bernd Pulverer
Chief Editor | The EMBO Journal
Head | Scientific Publications
Data transparency
Scientific publishing
– Dominant channel for the
dissemination of peer-
reviewed data.
– Journals function as a proxy
for quality in research
assessment
– The rate of publishing keeps
increasing.
– Papers are human-readable
but poorly machine-readable.
Title
Abstract
Synopsis
Main paper
Supp Info
Datasets
The Research Paper
Title
Abstract
Synopsis
Main paper
Supp Info
Datasets
Expert View
The Research Paper
‘Expert View’
• All the data required to support the conclusions
included in the paper.
• ‘General reader’ vs. ‘expert’ view of the paper:
– Expandable/collapsible ‘inline’ sections,
– Copy edited.
• Restricted to select types of data and information:
– Replicates
– Controls, experimental optimization
– ‘Negative’ results
– Extended experimental protocols
– Computational algorithms
• Datasets presented as separate files.
• No further reaching data
Title
Abstract
Synopsis
Main paper
Expert View
Datasets
Source data
What is a figure?
A scientific result converted into a
collection of pixels
Discoverable, rich content
n seeing all the data –
nt lever that we have for transparency’
Michael F
SourceData
Tools to publish figures as structured digital objects that
link the human-readable illustrations with machine-
readable metadata and ‘source data’ in order to
• improve data transparency (ethics)
• make published data (re)useable
• enable data-oriented search
Metadata
•Focus on the biological
content
•Use standard identifiers
and existing controlled
vocabularies
Search
•Data-oriented semantic
search of the literature.
•Overcome some of the
limitations of keyword-
based search
SourceData
Data
•Figure source data files
hosted by the journals
•Link to data repositories
•Archive
•Transparency
•Revisualization
•Reuse
•Integration
•Search
•Discourage
manipulation
o voluntary
o ~40% papers
Data Transparency
Structured metadata:
‘perturbation-observation-assay’
1. ‘Object-oriented’ representation of experimental
variables: list biological components.
2. Retain the causality of the experimental design:
“Measurement of Y as a function of A, B, C,
using assay P in biological system S.”
3. Machine-readable representation with standard
identifiers.
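The 'perturbation-observation-assay' structure above can be sketched as a small data model. This is only an illustration, not the SourceData schema: the class names `Component` and `Experiment`, the field names and the `example:` identifiers are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Component:
    """A biological component tagged with an identifier (placeholder values here)."""
    name: str
    identifier: str  # hypothetical; a real record would use a standard identifier


@dataclass
class Experiment:
    """One experiment: what was measured, what was perturbed, how, and in what system."""
    measured: List[Component]
    perturbed: List[Component]
    assay: str
    system: str

    def describe(self) -> str:
        # Render the causal statement used on the slide.
        ys = ", ".join(c.name for c in self.measured)
        xs = ", ".join(c.name for c in self.perturbed)
        return (f"Measurement of {ys} as a function of {xs}, "
                f"using assay {self.assay} in biological system {self.system}.")


exp = Experiment(
    measured=[Component("Y", "example:Y")],
    perturbed=[Component(n, f"example:{n}") for n in ("A", "B", "C")],
    assay="P",
    system="S",
)
print(exp.describe())
```

Keeping measured and perturbed components in separate lists is what preserves the direction of the experimental design; a flat keyword list would lose it.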
[Figure: perturbed components, measured components and the assayed property, within an experimental system]
Data copy editors
Data-oriented search
Resulting hypothesis: test drug Z in disease D.
[Figure: results linked across papers. Paper 1: drug Z, kinase Y activity. Paper 2: kinase Y, protein X (P). Paper 3: gene x, tissue T, disease D.]
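One way to picture how a data-oriented search could arrive at such a hypothesis is to treat each annotated experiment as a directed link and look for a chain across papers. The sketch below rests on assumptions: the link list is a simplified reading of the figure, and the step from protein X to disease D assumes that gene x encodes protein X, which the slide does not state. The function name `find_chain` is hypothetical.

```python
from collections import defaultdict, deque

# (perturbed entity, measured entity, source paper); a simplified reading of the figure.
links = [
    ("drug Z", "kinase Y", "Paper 1"),
    ("kinase Y", "protein X", "Paper 2"),
    ("protein X", "disease D", "Paper 3"),  # assumes gene x encodes protein X
]


def find_chain(links, start, goal):
    """Breadth-first search for a chain of experiments connecting start to goal."""
    graph = defaultdict(list)
    for src, dst, paper in links:
        graph[src].append((dst, paper))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, paper in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, nxt, paper)]))
    return None  # no chain connects the two entities


chain = find_chain(links, "drug Z", "disease D")
for src, dst, paper in chain:
    print(f"{src} -> {dst} ({paper})")
```

No single paper in the example links drug Z to disease D; the connection only appears once the three results are searchable as structured data.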
Data-oriented search
Query and 'more-like-this' results
[Figure: panels labelled forskolin/CREB (three) and time/CREB (one)]
sdAnnotations:annotationID a sdCore:PerturbationMeasurmentExp ;
    :linkedToPanel sdPanels:panelID ;
    :hasVariable sdVariables:variable1 ;
    :hasVariable sdVariables:variable2 ;
    :usingBiologicalSystem sdBiolSystem:biolSystemNode ;
    :basedOnSourcedataset sdSourceDatasets:dsID .
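As a rough illustration of how such a statement might be generated from a figure panel record, the sketch below assembles the same Turtle-style text from plain values. The prefixes and property names are copied from the slide as-is; the function name `annotation_to_turtle` and its parameters are hypothetical, and no prefix declarations are emitted, so the output is a fragment rather than a complete RDF document.

```python
def annotation_to_turtle(annotation_id, panel_id, variables, system_node, dataset_id):
    """Build a Turtle-style statement mirroring the slide's example (fragment only)."""
    lines = [
        f"sdAnnotations:{annotation_id} a sdCore:PerturbationMeasurmentExp ;",
        f"    :linkedToPanel sdPanels:{panel_id} ;",
    ]
    # One :hasVariable predicate per experimental variable.
    lines += [f"    :hasVariable sdVariables:{v} ;" for v in variables]
    lines.append(f"    :usingBiologicalSystem sdBiolSystem:{system_node} ;")
    # The final predicate closes the statement with a full stop.
    lines.append(f"    :basedOnSourcedataset sdSourceDatasets:{dataset_id} .")
    return "\n".join(lines)


turtle = annotation_to_turtle(
    annotation_id="annotationID",
    panel_id="panelID",
    variables=["variable1", "variable2"],
    system_node="biolSystemNode",
    dataset_id="dsID",
)
print(turtle)
```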
‘Next Generation’ Open Access
Data, Search, Metadata
Raw, rare, well done...?
From raw to processed data
A data ‘ecosystem’
[Figure labels: Author, paper, data, Journals, Data repositories, SourceData, search, data access, Reader]
Distributed infrastructure
[Figure labels: Database, Journals, Users, Research data. Example entities: Smad3, Hey1, TGFbeta, VE-cdh, Rad51 foci, AR, Tsc2, nuclear complexes]
Literature search engines
PubMed: 72%
Europe PMC: <2%
Google: 17%
Data are published in papers
‘Publishing’
papers
‘Depositing’
datasets
Availability of published data and
software
• Datasets obtained by experimentation, computation or
data mining should be made freely available, without
restriction.
• Software should be described in sufficient detail to
allow reproduction. If a specific implementation is the
focus of the study, free access for non-commercial
users is strongly recommended.
• Deposition of data should preferably be in one of the
public databases prior to submission.
Data deposition
Large-scale datasets, sequences, atomic coordinates
and computational models should be deposited in
one of the relevant public databases prior to
submission (provided private access is available at
the database) and authors should include accession
codes in the Materials & Methods section.
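A journal-side check for accession codes could look something like the sketch below. It is only illustrative: the three patterns are simplified forms of GEO series, ArrayExpress and PRIDE (ProteomeXchange) accessions, real codes come in more variants, and the function name `find_accessions` and the example sentence are made up for this sketch.

```python
import re

# Simplified accession patterns (assumptions, not an exhaustive specification).
ACCESSION_PATTERNS = {
    "GEO": re.compile(r"\bGSE\d+\b"),
    "ArrayExpress": re.compile(r"\bE-[A-Z]{4}-\d+\b"),
    "PRIDE": re.compile(r"\bPXD\d{6}\b"),
}


def find_accessions(methods_text):
    """Return accession-like codes found in a Materials & Methods text, by database."""
    found = {}
    for database, pattern in ACCESSION_PATTERNS.items():
        hits = sorted(set(pattern.findall(methods_text)))
        if hits:
            found[database] = hits
    return found


# Made-up example sentence with placeholder accession codes.
example = ("Microarray data were deposited in GEO (GSE12345) and "
           "proteomics data in PRIDE (PXD000001).")
print(find_accessions(example))
```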
Public databases
Structural data: PDB, NDB, EMDataBank
Functional genomics: GEO, ArrayExpress
Proteomics: PRIDE, PeptideAtlas, PASSEL
PPI: IMEx consortium
Clinical genomics datasets: EGA, dbGaP
Metagenomics: GenBank
Computational models: BioModels, JWS
SourceData
Data
•Figure source data files
hosted by the journals
•Link to ‘unstructured
data’ repositories
Pulverer-embo-source data-nfdp13
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 

Último (20)

Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 

Pulverer-embo-source data-nfdp13

Editor's notes

  1. A transparent process provides a permissive environment for the publication of ethically robust papers by relieving some of the pressure in the race to publish in biology.
  2. I would like to present some initiatives and ideas we have with regard to published data. They represent an extension of the concept of transparency to the data we publish in our journals, but also an extension of the concept of open access. Several of these ideas are currently being developed in a project called SourceData, which I will briefly summarize.
  3. Data are the heart of a paper: free text is the author's interpretation; data are absolute. We think it is important to consider how data are presented in figures. But publishing faces many challenges, one of which is that the rate of publishing is increasing: 1 million papers are currently indexed every year, twice as many as 10 years ago. Some journals, like PLOS ONE, are in an exponential growth phase. It is thus becoming harder and harder to search the literature and find specific information. While fewer and fewer people manage to keep up with this mass of human-readable papers, we rely more and more on machines to access these documents, which are, however, poorly machine-readable.
  4. Deconstruct a paper: it is a stacked, layered structure that allows access to the content at increasing depth, from the title and abstract down to the data. The title and abstract provide quick access when browsing. Synopses and visual abstracts provide summaries of key facts. The core is the main paper, which is optimized for the human reader. At a deeper level there are supplementary information, structured datasets and computer code.
  5. What we would like to achieve in the near future is to eliminate the concept of supp info: the volume of these supplementary sections is continuously growing; they are not well reviewed, not copy edited and often not well presented; and they sometimes contain data only peripherally related to the main conclusions.
  6. Instead of supp info, we propose to have an expert view of the paper. Some data can be repetitive and make papers difficult to read. We propose to have two views of the paper: one for general readers, which corresponds to the main paper as we know it now, and an expert view in which the additional in-depth information and data are included within the paper as expandable/collapsible sections.
  7. Similarly, to encourage maximal use/reuse, we will make all the datasets and source data freely available under a CC0 license by default.
  8. Data are the core of a research paper, yet figures are published as images, that is, as collections of pixels, making it impossible to re-analyse, re-use or easily find the data. This affects all journals, whether they are open access or not. Most published scientific data remain locked inside the papers.
  9. To start addressing this challenge, we have started the SourceData project. With SourceData we want to be able to publish figures as structured digital objects by linking figures with source data and machine-readable metadata. The aim is to improve data transparency, to promote data sharing and to enable data-oriented search.
  10. Three components. The first step of the project is to enable authors to provide the raw data that are behind the figures. This can be done in several ways: either host the data at the journals, and I will show an example in a minute, or elect to host the data in one of the several ‘unstructured data repositories’ such as Dryad and provide links to these resources.
  11. This is not limited to numerical data. Here is an example from The EMBO Journal where the full gels are provided as uncropped images, allowing readers to examine the blots beyond the narrow slices usually displayed in published figures. The same can be done for micrographs.
  12. Datasets alone are, however, of limited utility: we need to associate these data with structured metadata that explain the biological content of the data. To be useful for data mining, this must be done in a machine-readable way, which involves the use of standard identifiers, controlled vocabularies and existing ontologies.
  13. The second level will encode a fundamental structure common to many biomedical experiments. This is most easily seen for data that are represented as a plot: these data result from an experiment in which a given biological component Y was measured or observed as a function of various experimental conditions or perturbations A, B, C, using an assay P in a defined experimental system S. This separation between components that are observed (the phosphorylation level of a protein) and components that are perturbed (for example, a kinase inhibited by a drug) can be applied across an extremely broad range of published data, whether western blots, histological preparations or microscopy. This is because the model is able to represent the causality underlying an experimental design. This representation of directional relationships between biological components provides a backbone model on which many details can be elaborated. It is thus scalable as a model, in the sense that it can be extended and refined, and specialized models can be derived from this backbone.
  14. So we are currently developing tools that will enable curation of accepted manuscripts by data editors and embed the curation process in the production process.
  15. Finally, the third component is to use the machine-readable metadata to enable data-oriented searches of the papers based on the data they contain. The semantic information provided by the metadata will help overcome some of the limitations of text-based searches. Source data will make figures more useable. The search will make them discoverable.
  16. SourceData will allow papers to be searched through their data. If, for example, we are interested in finding data about CDK1 substrates, we can formulate a more or less complex query in PubMed. In this case, we would find a series of papers. To check their relevance we would have to open them, read the abstract, and check the article and the figures. If the figures had been annotated with SourceData metadata, we could search directly for published experiments where measurements were conducted under conditions where CDK1 activity was perturbed. This would lead us to the relevant data inside the papers, from where we can link out to the associated papers.
  17. As a consequence, related experiments can be found across papers in the literature and joined in a directional way to help generate hypotheses. In this example: drug Z might be interesting to test for disease D. It would be extremely difficult to perform such tasks systematically with conventional search strategies. This is an application that goes beyond mere search and is a step towards the integration of multiple datasets. This will potentially be an extremely powerful feature to generate new hypotheses and potentially new findings.
  18. Another function is to find figures that are closely related to each other. With this function, from a starting figure, it would be possible to find related figures and the respective papers. This resembles the ‘related articles’ function in PubMed, but applied to individual figure panels.
  19. To conclude, the last ten years have seen profound changes in scientific publishing: the transition to online publishing and open access content has opened the door to the large-scale systematic mining of the literature. This transition, however, needs to be completed to go beyond access to the text and offer deeper access to the research data. The current format of the human-readable version of the papers will remain. But the paper of the future will need to be associated with a machine-readable version of the paper. With SourceData, we will make published data useable by linking them to explanatory machine-readable metadata. These metadata will in turn enable data-oriented search functions that will increase the discoverability of the papers. This will represent the next generation of open access, which will enable much deeper access to the literature and the systematic mining and integration of published data across the literature. Such a transformation will be needed to benefit from the potential of research data to generate new findings and accelerate scientific discovery.
  20. It is very early days to predict how the data ecosystem will stabilize at both the technical and the economic level. From the users' point of view, the basic tasks have to remain as simple and straightforward as possible: authors want to submit their papers and data, and readers need to access the data and search the literature. The role of SourceData in this ecosystem is to provide a series of tools and services that will create a win-win-win situation across the major stakeholders: authors will benefit from the increased discoverability of their research; journals and data repositories will increase the visibility of their content and add more value to it, which is currently a crucial issue in publishing; and readers will have greater and deeper access to data and to the literature.
  21. 50 panels on TGF beta signaling data, annotated primitively
  22. The data that are published in papers are mainly presented in figures. It is in the figures that the evidence supporting the conclusions is shown. Figures are absolutely essential to make a formal scientific proof.
  23. The fact that these large datasets do not fit the classical format of published papers, especially when print was still relevant, has created a situation where papers and data largely live parallel and separate lives: papers are published in journals, datasets are deposited in databases. This may have conditioned us to think of scientific publishing and data dissemination in separate terms.
  24. The importance of making research data available has been largely driven by the fields in biology that produce large-scale datasets—genomics and the other omics fields, but also structural biology.
  25. This has serious consequences for search, which has become essential to find specific information in this ocean of papers.
  26. The first step of the project is to enable authors to provide the raw data that are behind the figures; we call these source data, and this gave the name to the entire project. This can be done in several ways: either host the data on the publisher’s website, and I will show an example in a minute, or elect to host the data in one of the several ‘unstructured data repositories’ such as Dryad and provide links to these resources. Datasets alone are, however, of limited utility. The second component, which is central to the present proposal, is to associate these data with structured metadata that explain the biological content of the data. To be useful for global data mining, this must be done in a machine-readable way, which involves the use of standard identifiers, controlled vocabularies and existing ontologies. Finally, the third component is to use the machine-readable metadata to enable data-oriented searches of the papers based on the data they contain. The semantic information provided by the metadata will help overcome some of the limitations of text-based searches. The metadata will make the data more useable. The search will make them discoverable. So, let us briefly review these three components.
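
To make the metadata and search components more concrete, here is a minimal sketch in Python of how a figure panel could be described as "component Y measured as a function of perturbations A, B, C, using assay P in system S", and how such records could then be searched and joined across papers. This is only an illustration under stated assumptions, not the SourceData implementation: the class names, field names, identifiers, paper labels and example panels are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    """A biological component named by an identifier (values below are made up)."""
    identifier: str  # hypothetical placeholder, not a real database accession
    name: str


@dataclass(frozen=True)
class Panel:
    """One figure panel: `measured` observed as a function of `perturbed`."""
    paper: str                     # hypothetical paper label
    figure: str                    # panel label within that paper
    perturbed: Tuple[Entity, ...]  # perturbations A, B, C
    measured: Entity               # observed component Y
    assay: str                     # assay P
    system: str                    # experimental system S


def panels_perturbing(panels: List[Panel], identifier: str) -> List[Panel]:
    """Data-oriented search: panels in which the given component was perturbed."""
    return [p for p in panels
            if any(e.identifier == identifier for e in p.perturbed)]


def chain(panels: List[Panel]) -> List[Tuple[Panel, Panel]]:
    """Join panels across papers in a directional way.

    A pair (first, second) means: the component measured in `first`
    is one of the components perturbed in `second`, in another paper.
    """
    links = []
    for first in panels:
        for second in panels_perturbing(panels, first.measured.identifier):
            if second.paper != first.paper:
                links.append((first, second))
    return links


# Illustrative, invented annotations for two panels from two papers.
drug_z = Entity("chem:Z", "drug Z")
cdk1 = Entity("gene:CDK1", "CDK1 activity")
protein_x = Entity("protein:X", "phosphorylation of protein X")

panels = [
    Panel("paper-1", "1A", (drug_z,), cdk1, "kinase assay", "system S"),
    Panel("paper-2", "3C", (cdk1,), protein_x, "western blot", "system S"),
]

# Search directly for experiments in which CDK1 was perturbed.
hits = panels_perturbing(panels, "gene:CDK1")
print([(p.paper, p.figure) for p in hits])

# Join related experiments to suggest a hypothesis: drug Z -> CDK1 -> protein X.
for first, second in chain(panels):
    print(first.perturbed[0].name, "->", first.measured.name,
          "->", second.measured.name)
```

Keeping the perturbed and the measured components in separate fields is what lets the search ask a directional question ("where was CDK1 perturbed?") rather than a keyword question ("where is CDK1 mentioned?"), and it is also what allows two panels from different papers to be joined end to end.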