Data Consultant,
Honorary Academic Editor
Associate Director,
Principal Investigator
The rise of the data-centric research and publication enterprises
Susanna-Assunta Sansone, PhD
uk.linkedin.com/in/sasansone
@biosharing
@isatools
@scientificdata
'Managing Big Data - Setting the standards for analyzing and integrating big data’, Berlin, July 9-10, 2014
http://www.slideshare.net/SusannaSansone
Outline
•  About myself
o  activities and interests
•  Be FAIR to your data
o  concept
o  my related projects
•  The Scientific Data exemplar
o  rationale
o  Data Descriptors
My areas of activity:
•  Data capture and curation
•  Data (nano)publication
•  Data provenance
•  Open, community ontologies and standards
•  Semantic web
•  Software development
•  Training
Communities I work with/for:
As part of:
•  UK, European and international consortia
•  Pre-competitive informatics public-private partnerships
•  Standardization initiatives
with e.g.:
Notes in Lab Books (information for humans)
Spreadsheets and Tables (the compromise)
Facts as RDF statements (information for machines)
Notes and narrative | Spreadsheets and tables | Linked data and nanopublications
Enabling reproducible research and open science, driving science and discoveries
Increase the level of annotation at the source, tracking provenance and using community standards
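As a minimal sketch of the "information for machines" end of this spectrum, the snippet below turns one spreadsheet-style row into subject-predicate-object statements and prints them in an N-Triples-like form. The `http://example.org/` namespace, the property names and the sample values are all hypothetical; a real project would reuse terms from community ontologies rather than ad hoc column names.

```python
# Hypothetical namespace and values, for illustration only.
EX = "http://example.org/"


def row_to_triples(subject_id, row):
    """Turn one table row (column -> value) into (subject, predicate, object) triples."""
    subject = f"<{EX}sample/{subject_id}>"
    triples = []
    for column, value in row.items():
        predicate = f"<{EX}property/{column}>"
        # Values are emitted as plain quoted literals; a fuller version
        # would also escape special characters and add datatypes.
        triples.append((subject, predicate, f'"{value}"'))
    return triples


def to_ntriples(triples):
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


row = {"organism": "Homo sapiens", "tissue": "liver"}
triples = row_to_triples("S1", row)
print(to_ntriples(triples))
```

Each table cell becomes one explicit statement, which is what makes the annotation queryable by software instead of readable only by people.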
https://projects.ac/blog/five-top-reasons-to-protect-your-data-and-practise-safe-science/
Credit to:
A great start, but not enough!
image by Greg Emmerich
http://discovery.urlibraries.org/
http://www.theguardian.com/higher-education-network/blog/2014/jun/26
Findable, Accessible, Interoperable, Reusable
Worldwide movement for FAIR data
In all fairness, not much data is FAIR
But it is not just about technology…
The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta
Sansone www.ebi.ac.uk/net-project
…breadth and depth of the content
…is pivotal
sample characteristic(s)
experimental design
experimental variable(s)
technology(ies)
measurement(s)
protocol(s)
data file(s)
......
To make this dataset ‘FAIR’, one must have standards, tools and best practices to:
•  report sufficient details
•  capture all salient features of the experimental workflow
•  make annotation explicit and discoverable
•  structure the descriptions for consistency
•  ensure/regulate access
•  deposit and publish
•  etc.
A community mobilization to develop standards, e.g.:
Structural and operational differences
•  organization types (open, closed to members, society, WG etc.)
•  standards development (how to formulate, conduct and maintain)
•  adoption, uptake, outreach (link to journals, funders and commercial sector)
•  funds (sponsors, memberships, grants, volunteering)
[Figure: spectrum from de jure (standard organizations) to de facto (grass-roots groups); example logo: Nanotechnology Working Group]
Focus on reporting or content standards
•  Minimum information reporting requirements, or checklists, to report the same core, essential information
•  Controlled vocabularies, taxonomies, thesauri, ontologies etc. to use the same word and refer to the same ‘thing’
•  Conceptual model or conceptual schema from which an exchange format is derived, to allow data to flow from one system to another
Community-developed standards are pivotal to structuring and enriching the description of datasets and to sharing them, facilitating understanding and reuse
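To illustrate how a minimum-information checklist and a controlled vocabulary work together, here is a small sketch that checks one metadata record against both. The field names loosely follow the elements listed earlier in the deck (sample characteristics, experimental design, technology, protocols, data files); the checklist itself and the allowed terms are hypothetical and do not reproduce any published guideline.

```python
# Hypothetical checklist and vocabulary, for illustration only.
REQUIRED_FIELDS = [
    "sample_characteristics",
    "experimental_design",
    "technology",
    "protocols",
    "data_files",
]
CONTROLLED_TERMS = {
    "technology": {"DNA microarray", "mass spectrometry", "NMR spectroscopy"},
}


def check_record(record):
    """Return a list of problems found in one metadata record."""
    problems = []
    # Checklist: every core field must be present and non-empty
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing: {field}")
    # Vocabulary: constrained fields must use an agreed term
    for field, allowed in CONTROLLED_TERMS.items():
        value = record.get(field)
        if value and value not in allowed:
            problems.append(f"uncontrolled term in {field}: {value}")
    return problems


record = {
    "sample_characteristics": {"organism": "Mus musculus"},
    "experimental_design": "time course",
    "technology": "mass spec",      # free-text variant, not an agreed term
    "protocols": ["extraction"],
    "data_files": [],               # nothing deposited yet
}
problems = check_record(record)
for problem in problems:
    print(problem)
```

The checklist catches what is missing, while the vocabulary catches what is present but inconsistently named; both are needed before records from different groups can be compared.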
[Figure: technologically-delineated views of the world (arrays, scanning, columns, gels, MS, FTIR, NMR, grouped under transcriptomics, proteomics and metabolomics) versus biologically-delineated views of the world (plant biology, epidemiology, microbiology)]
Generic features (common core):
- description of source biomaterial
- experimental design components
Fragmentation, duplications and gaps
To compare and integrate data we need interoperable standards
[Figure: the same technology labels (arrays, scanning, columns, gels, MS, FTIR, NMR) shared across transcriptomics, proteomics and metabolomics]
Synergistic examples exist, but more are needed
Growing number of reporting standards
+130, +150, +303 (sources: BioPortal, BioSharing)
Databases, annotation and curation tools implementing standards
MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE…, REMARK, CONSORT
MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML…, GELML, ISA-Tab, CML, MITAB
AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO…, TEDDY, PRO, XAO, DO, VO
Navigating the sea of standards is not trivial
The relationship among popular standard formats for pathway information
BioPAX and PSI-MI are designed for data exchange to and from databases and for pathway and network data integration. SBML and CellML are designed to support mathematical simulations of biological systems, and SBGN represents pathway diagrams.
CREDIT: Demir, et al., The BioPAX community standard for pathway data sharing, 2010.
Which standards and databases can we use/recommend?
"I work in the field of cell migration research; which ones are applicable to me?"
"I use cell migration in translational research; are there specific clinical standards?"
Registering and cataloging is just step one; the next steps are:
•  Develop assessment criteria for usability and popularity of standards
•  Associate standards with data policies and databases
•  Assemble journal and funder policies regarding data storage
•  Make everything fully cross-searchable
•  Intended goal: help stakeholders make informed decisions
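The steps above can be pictured as a small registry in which each standard is linked to the databases that implement it and the policies that recommend it, with one search running across all of those links. This is only a sketch of the idea: the record structure, the repository names and the policy names are hypothetical, and it does not describe how the actual registry is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Standard:
    name: str
    kind: str  # e.g. "reporting guideline", "format", "terminology"
    databases: set = field(default_factory=set)  # hypothetical links
    policies: set = field(default_factory=set)   # hypothetical links


# Standard names come from the deck; the linked resources are made up.
registry = [
    Standard("MIAME", "reporting guideline", {"Repository A"}, {"Journal policy X"}),
    Standard("ISA-Tab", "format", {"Repository A", "Repository B"}),
    Standard("OBI", "terminology", {"Repository B"}, {"Funder policy Y"}),
]


def search(registry, term):
    """Case-insensitive search across names, kinds, databases and policies."""
    term = term.lower()
    hits = []
    for std in registry:
        haystack = [std.name, std.kind, *std.databases, *std.policies]
        if any(term in text.lower() for text in haystack):
            hits.append(std.name)
    return hits


print(search(registry, "repository b"))
print(search(registry, "journal"))
```

Because standards, databases and policies sit in one linked structure, a researcher can start from whichever of the three they already know and reach the other two.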
General-purpose, configurable format, designed to support:
•  description of the experimental metadata, making the annotation explicit and discoverable
•  provenance tracking
•  use of community standards, such as minimal reporting guidelines and terminologies
•  conversion to a growing number of other metadata formats, e.g. those used by EBI repositories
[Figure labels: analysis method, script, data file or record in a database]
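A rough sketch of what such a tabular, tab-delimited metadata file can look like, and how provenance can be read back from it, is given below. The column headers are written in the style of an ISA-Tab study table, but the table is heavily simplified and the rows are invented; the ISA-Tab specification remains the authority on required columns and files.

```python
import csv
import io

# Headers in the style of an ISA-Tab study table; the rows are made up.
HEADERS = ["Source Name", "Characteristics[organism]", "Protocol REF", "Sample Name"]
ROWS = [
    ["subject-1", "Homo sapiens", "sample collection", "sample-1"],
    ["subject-2", "Homo sapiens", "sample collection", "sample-2"],
]


def write_table(headers, rows):
    """Serialize the table as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(headers)
    writer.writerows(rows)
    return buffer.getvalue()


def read_table(text):
    """Parse tab-delimited text back into one dict per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


text = write_table(HEADERS, ROWS)
records = read_table(text)

# Provenance: which source each sample was derived from, and by which protocol
provenance = {r["Sample Name"]: (r["Source Name"], r["Protocol REF"]) for r in records}
print(provenance["sample-2"])
```

Reading a row left to right gives the chain from source material, through a named protocol, to the resulting sample, which is the provenance tracking the slide refers to.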
ISA powers data collection, curation resources and repositories, e.g.:
eTRIKS: European Translational Information and Knowledge Management Services
Consortium of academic (Imperial College, CNRS, University of Luxembourg) and pharma (Janssen, Merck, AZ, Lilly, Lundbeck, Pfizer, Roche, Sanofi, Bayer, GSK) partners building a sustainable, open translational research informatics platform
•  Nature Publishing Group's Scientific Data
•  BioMedCentral and BGI's GigaScience
•  F1000 Research
•  Oxford University Press
Susanna-Assunta Sansone, Philippe Rocca-Serra, Alejandra Gonzalez-Beltran, Eamonn Maguire, Milo Thurston;
Oxford alumni: Annapaola Santarsiero, Pavlos Georgiou + my previous team when at EBI (2001-2010)
Outline
•  About myself
o  activities and interests
•  Be FAIR to your data
o  concept
o  my related projects
•  The Scientific Data exemplar
o  rationale
o  Data Descriptors
FAIR data - roles and responsibilities
•  Data has to become an integral part of scholarly communication
•  Responsibilities lie across several stakeholder groups: researchers, data centers, librarians, funding agencies and publishers
•  But publishers occupy a “leverage point” in this process
Journal publishing - changing landscape
Human Genome 2001: 62 pages, 150 authors, 49 figures, 27 tables
ENCODE Project 2012: 30 papers, 3 journals
Launched on May 27th, 2014
A new online-only publication for descriptions of scientifically valuable datasets in the life, environmental and biomedical sciences, but not limited to these
•  Credit for sharing your data
•  Focused on reuse and reproducibility
•  Peer reviewed, curated
•  Promoting Community Data Repositories
•  Open Access
Supported by:
Data Descriptor: narrative and structure
•  Article or narrative component (PDF and HTML)
•  Experimental metadata or structured component (in-house curated, machine-readable formats)
Relation with traditional articles - content and time
Scientific hypotheses:
•  Synthesis
•  Analysis
•  Conclusions
Methods and technical analyses supporting the quality of the measurements:
•  What did I do to generate the data?
•  How was the data processed?
•  Where is the data?
•  Who did what, and when?
BEFORE: get your data to the community as soon as possible (see NPG pre-publication policy)
AT THE SAME TIME: publish your Data Descriptor(s) alongside research article(s)
AFTER: expand on your research articles, adding further information for reuse of the data
Making data discoverable
•  Export to various formats (ISA-Tab, RDF, etc.)
•  Linking between research papers, Data Descriptors, and data records
We currently recognize over 50 public data repositories
Peer review process focused on quality and reuse
Evaluation is not based on the perceived impact or novelty of the findings
•  Experimental Rigour and Technical Data Quality
o  Were the data produced in a rigorous and methodologically sound manner?
o  Was the technical quality of the data supported convincingly with technical validation experiments and statistical analyses of data quality or error, as needed?
o  Are the depth, coverage, size, and/or completeness of these data sufficient for the types of applications or research questions outlined by the authors?
•  Completeness of the Description
o  Are the methods and any data-processing steps described in sufficient detail to allow others to reproduce these steps?
o  Did the authors provide all the information needed for others to reuse this dataset or integrate it with other data?
o  Is this Data Descriptor, in combination with any repository metadata, consistent with relevant minimum information or reporting standards?
•  Integrity of the Data Files and Repository Record
o  Have you confirmed that the data files deposited by the authors are complete and match the descriptions in the Data Descriptor?
o  Have these data files been deposited in the most appropriate available data repository?
Current content is diverse - bimonthly releases
•  Neuroscience, ecology, epidemiology, environmental science, functional genomics, metabolomics, toxicology
•  New datasets and previously published datasets
o  a fuller, more in-depth look at the data processing steps, supported by additional data files and code from each step
o  additional tutorial-like information for scientists interested in reusing or integrating the data with their own
•  Datasets in figshare and domain-specific databases
•  Code deposited in figshare and GitHub
•  Individual datasets, curated aggregations and citizen science
•  First dataset part of a collection
•  Academic and industry authors
Interested in collaborating and/or enabling submissions?
•  Do you run a data resource we should recognize?
o  See the list of criteria databases should meet on our website
•  Are you interested in facilitating submission to us?
o  See our ISA-Tab specification on the website
-  you can implement and export in this format from your authoring/curation tool, or from your database
•  Do you want to submit Data Descriptor(s)?
o  Check suitability by sending a pre-submission enquiry; we accept:
-  Submissions in the life, environmental and biomedical sciences, but not limited to these
-  Experimental, observational and computational datasets
-  Individual datasets, curated aggregations, and collections
-  Unpublished data and follow-ups with additional information for wider reuse, e.g.:
✓  a fuller, more in-depth look at the data processing steps, supported by additional data files and code from each step
✓  additional tutorial-like information for scientists interested in reusing or integrating the data with their own
Helping you publish, discover and reuse research data
Visit
nature.com/scientificdata
Email
scientificdata@nature.com
Tweet
@ScientificData
Supported by:
Honorary Academic Editor
Susanna-Assunta Sansone, PhD
Managing Editor
Andrew L Hufton, PhD
Editorial Curator
Victoria Newman
Advisory Panel and Editorial Board
including senior researchers, funders,
librarians and curators

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Managing Big Data - Berlin, July 9-10, 2014

  • 1. Data Consultant, Honorary Academic Editor; Associate Director, Principal Investigator
       The rise of the data-centric research and publication enterprises
       Susanna-Assunta Sansone, PhD
       uk.linkedin.com/in/sasansone, @biosharing, @isatools, @scientificdata
       'Managing Big Data - Setting the standards for analyzing and integrating big data', Berlin, July 9-10, 2014
       http://www.slideshare.net/SusannaSansone
  • 2. Outline
       •  About myself
          o  activities and interests
       •  Be FAIR to your data
          o  concept
          o  my related projects
       •  The Scientific Data exemplar
          o  rationale
          o  Data Descriptors
  • 3. My areas of activity:
       •  Data capture and curation
       •  Data (nano)publication
       •  Data provenance
       •  Open, community ontologies and standards
       •  Semantic web
       •  Software development
       •  Training
       Communities I work with/for:
       As part of:
       •  UK, European and international consortia
       •  Pre-competitive informatics public-private partnerships
       •  Standardization initiatives
       with e.g.:
  • 4. Notes in Lab Books (information for humans): notes and narrative
       Spreadsheets and Tables (the compromise): spreadsheets and tables
       Facts as RDF statements (information for machines): linked data and nanopublications
       Enabling reproducible research and open science, driving science and discoveries
       Increase the level of annotation at the source, tracking provenance and using community standards
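The step from "spreadsheets and tables" to "facts as RDF statements" on this slide can be illustrated with a minimal sketch. It assumes a hypothetical one-row sample table and a made-up `http://example.org/` namespace; neither comes from the talk, and a real pipeline would take its predicates from community ontologies and escape literal values properly.

```python
# Hypothetical example: the column names, values and the example.org
# namespace are illustrative assumptions, not taken from the slides.
BASE = "http://example.org/"


def row_to_triples(row, id_column):
    """Turn one spreadsheet row (a dict) into subject-predicate-object triples."""
    subject = f"<{BASE}sample/{row[id_column]}>"
    triples = []
    for column, value in row.items():
        if column == id_column or value == "":
            continue  # skip the identifier itself and empty cells
        predicate = f"<{BASE}property/{column.replace(' ', '_')}>"
        # Simplification: values are written as plain literals without escaping.
        triples.append((subject, predicate, f'"{value}"'))
    return triples


def to_statements(triples):
    """Serialize triples one statement per line, in an N-Triples-like form."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


row = {"sample id": "S1", "organism": "Homo sapiens", "tissue": "liver", "notes": ""}
triples = row_to_triples(row, id_column="sample id")
print(to_statements(triples))
```

Each non-empty cell becomes one explicit statement about the sample, which is what makes the annotation machine-readable rather than implicit in a table layout.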
  • 6. A great start, but not enough! image by Greg Emmerich http://discovery.urlibraries.org/ http://www.theguardian.com/higher-education-network/blog/2014/jun/26
  • 7. Findable, Accessible, Interoperable, Reusable! Worldwide movement for FAIR data
  • 8. In all fairness, not much data is FAIR!
  • 9. But it is not just about technology…!
  • 10. The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project …breadth and depth of the content …is pivotal!
  • 11. sample characteristic(s), experimental design, experimental variable(s), technology(ies), measurement(s), protocol(s), data file(s), ...
  • 12. To make this dataset ‘FAIR’, one must have standards, tools and best practices to:
       •  report sufficient details
       •  capture all salient features of the experimental workflow
       •  make annotation explicit and discoverable
       •  structure the descriptions for consistency
       •  ensure/regulate access
       •  deposit and publish
       •  etc.
  • 13. A community mobilization to develop standards, e.g.:
       Figure labels: de jure, de facto, grass-roots groups, standard organizations, Nanotechnology Working Group
       §  Structural and operational differences
          •  organization types (open, closed to members, society, WG etc.)
          •  standards development (how to formulate, conduct and maintain)
          •  adoption, uptake, outreach (link to journals, funders and commercial sector)
          •  funds (sponsors, memberships, grants, volunteering)
  • 15. Focus on reporting or content standards
       •  Including minimum information reporting requirements, or checklists, to report the same core, essential information
       •  Including controlled vocabularies, taxonomies, thesauri, ontologies etc. to use the same word and refer to the same ‘thing’
       •  Including a conceptual model or conceptual schema from which an exchange format is derived, to allow data to flow from one system to another
       Community-developed standards are pivotal to structure and enrich the descriptions of datasets and to share them, facilitating understanding and reuse
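How a minimum information checklist and a controlled vocabulary work together can be sketched as a small validation routine. The required fields and the allowed organism terms below are hypothetical stand-ins; in practice they would come from a community guideline and an ontology, not from a hard-coded list.

```python
# Hypothetical checklist and vocabulary, for illustration only; real ones
# would come from a community reporting guideline and an ontology.
REQUIRED_FIELDS = ["organism", "tissue", "protocol", "data_file"]
CONTROLLED_TERMS = {"organism": {"Homo sapiens", "Mus musculus"}}


def check_record(record):
    """Return a list of human-readable problems found in a metadata record."""
    problems = []
    # Checklist: the same core, essential information must be reported.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing required field: {field}")
    # Vocabulary: the same word must be used to refer to the same 'thing'.
    for field, allowed in CONTROLLED_TERMS.items():
        value = record.get(field)
        if value and value not in allowed:
            problems.append(f"{field}: '{value}' is not a controlled term")
    return problems


good = {"organism": "Homo sapiens", "tissue": "liver",
        "protocol": "RNA extraction", "data_file": "s1.raw"}
bad = {"organism": "human", "tissue": "liver", "protocol": ""}

print(check_record(good))
print(check_record(bad))
```

The third kind of standard on the slide, an exchange format derived from a conceptual model, would govern how a record that passes these checks is serialized for transfer between systems.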
  • 16. Technologically-delineated views of the world; biologically-delineated views of the world
       Generic features (common core):
       -  description of source biomaterial
       -  experimental design components
       Figure labels: Arrays, Scanning, Columns, Gels, MS, FTIR, NMR; transcriptomics, proteomics, metabolomics; plant biology, epidemiology, microbiology
       Fragmentation, duplications and gaps
       To compare and integrate data we need interoperable standards
  • 17. Figure labels: Arrays, Scanning, Columns, Gels, MS, FTIR, NMR; transcriptomics, proteomics, metabolomics
       Synergistic examples exist, but more are needed!
  • 18. Growing number of reporting standards
       + 130, + 150, + 303 (Source: BioPortal; Source: BioSharing; Source: BioSharing)
       MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE, …, REMARK, CONSORT
       MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, …, GELML, ISA-Tab, CML, MITAB
       AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, …, TEDDY, PRO, XAO, DO, VO
       Databases, annotation, curation tools implementing standards
  • 19. Navigating the sea of standards is not trivial!
       The relationship among popular standard formats for pathway information: BioPAX and PSI-MI are designed for data exchange to and from databases and pathway and network data integration. SBML and CellML are designed to support mathematical simulations of biological systems and SBGN represents pathway diagrams.
       CREDIT: Demir, et al., The BioPAX community standard for pathway data sharing, 2010.
  • 20. Which standards and databases can we use/recommend? I work in the field of cell migration research; which ones are applicable to me? I use cell migration in translational research; are there specific clinical standards?
  • 21.
  • 23. Registering and cataloging is just step one; the next steps are:
       •  Develop assessment criteria for usability and popularity of standards
       •  Associate standards to data policies and databases
       •  Assemble journal and funder policies re data storage
       •  Make fully cross-searchable
       •  Intended goal: help stakeholders make informed decisions
  • 24.
  • 25. General-purpose, configurable format, designed to support:
       •  description of the experimental metadata, making the annotation explicit and discoverable
       •  provenance tracking
       •  use of community standards, such as minimal reporting guidelines and terminologies
       •  conversion to a growing number of other metadata formats, e.g. those used by EBI repositories
       Figure labels: analysis, method, script; data file or record in a database
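A tab-delimited metadata table that links a source, a protocol, a sample and a data file gives a rough feel for the kind of description this slide refers to. The sketch below borrows ISA-Tab-style column headers but is an assumption-laden simplification: it collapses study and assay information into one table, the rows, protocol names and file names are invented, and the output is not a complete or validated ISA-Tab file.

```python
import csv
import io

# Column headers follow the general style of ISA-Tab tables; the rows,
# protocol names and file names below are invented for illustration.
COLUMNS = ["Source Name", "Characteristics[organism]", "Protocol REF",
           "Sample Name", "Raw Data File"]

rows = [
    {"Source Name": "donor1", "Characteristics[organism]": "Homo sapiens",
     "Protocol REF": "sample collection", "Sample Name": "donor1.liver",
     "Raw Data File": "donor1_liver.raw"},
    {"Source Name": "donor2", "Characteristics[organism]": "Homo sapiens",
     "Protocol REF": "sample collection", "Sample Name": "donor2.liver",
     "Raw Data File": "donor2_liver.raw"},
]


def write_table(rows):
    """Serialize the metadata rows as a tab-delimited table with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def provenance(rows, data_file):
    """Trace a data file back to its source, protocol and sample."""
    for row in rows:
        if row["Raw Data File"] == data_file:
            return (row["Source Name"], row["Protocol REF"], row["Sample Name"])
    return None


table = write_table(rows)
print(table)
print(provenance(rows, "donor2_liver.raw"))
```

Because every data file sits on the same row as the material and protocol that produced it, provenance can be read directly from the table, and the explicit columns are what make conversion to other metadata formats feasible.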
  • 26.
  • 27. ISA powers data collection, curation resources and repositories, e.g.:
  • 28. eTRIKS – European Translational Information and Knowledge management Services: consortium of academic partners (Imperial College, CNRS, University of Luxembourg) and pharmaceutical companies (Janssen, Merck, AZ, Lilly, Lundbeck, Pfizer, Roche, Sanofi, Bayer, GSK) building a sustainable, open translational research informatics platform
       •  Nature Publishing Group's Scientific Data
       •  BioMedCentral and BGI's GigaScience
       •  F1000 Research
       •  Oxford University Press
       Susanna-Assunta Sansone, Philippe Rocca-Serra, Alejandra Gonzalez-Beltran, Eamonn Maguire, Milo Thurston; Oxford alumni: Annapaola Santarsiero, Pavlos Georgiou; plus my previous team when at EBI (2001–2010)
  • 29. Outline
       •  About myself
          o  activities and interests
       •  Be FAIR to your data
          o  concept
          o  my related projects
       •  The Scientific Data exemplar
          o  rationale
          o  Data Descriptors
  • 30. FAIR data - roles and responsibilities
       •  Data has to become an integral part of scholarly communication
       •  Responsibilities lie across several stakeholder groups: researchers, data centers, librarians, funding agencies and publishers
       •  But publishers occupy a “leverage point” in this process
  • 31. Journal publishing - changing landscape
       •  Human Genome, 2001: 62 pages, 150 authors, 49 figures, 27 tables
       •  ENCODE Project, 2012: 30 papers, 3 journals
  • 32. Launched on May 27th, 2014
       A new online-only publication for descriptions of scientifically valuable datasets in the life, environmental and biomedical sciences, but not limited to these
       •  Credit for sharing your data
       •  Focused on reuse and reproducibility
       •  Peer reviewed, curated
       •  Promoting Community Data Repositories
       •  Open Access
       Supported by:
  • 33. Data Descriptor: narrative and structure
       •  Article or narrative component (PDF and HTML)
       •  Experimental metadata or structured component (in-house curated, machine-readable formats)
  • 34. Relation with traditional articles - content and time
       Scientific hypotheses: synthesis, analysis, conclusions
       Methods and technical analyses supporting the quality of the measurements: What did I do to generate the data? How was the data processed? Where is the data? Who did what when?
       •  BEFORE: get your data to the community as soon as possible (see NPG pre-publication policy)
       •  AT THE SAME TIME: publish your Data Descriptor(s) alongside research article(s)
       •  AFTER: expand on your research articles, adding further information for reuse of the data
  • 35. Making data discoverable
       •  Export to various formats (ISA-Tab, RDF, etc.)
       •  Linking between research papers, Data Descriptors, and data records
       We currently recognize over 50 public data repositories
  • 36. Peer review process focused on quality and reuse
       Evaluation is not based on the perceived impact or novelty of the findings
       •  Experimental Rigour and Technical Data Quality
          o  Were the data produced in a rigorous and methodologically sound manner?
          o  Was the technical quality of the data supported convincingly with technical validation experiments and statistical analyses of data quality or error, as needed?
          o  Are the depth, coverage, size, and/or completeness of these data sufficient for the types of applications or research questions outlined by the authors?
       •  Completeness of the Description
          o  Are the methods and any data-processing steps described in sufficient detail to allow others to reproduce these steps?
          o  Did the authors provide all the information needed for others to reuse this dataset or integrate it with other data?
          o  Is this Data Descriptor, in combination with any repository metadata, consistent with relevant minimum information or reporting standards?
       •  Integrity of the Data Files and Repository Record
          o  Have you confirmed that the data files deposited by the authors are complete and match the descriptions in the Data Descriptor?
          o  Have these data files been deposited in the most appropriate available data repository?
  • 37. Current content is diverse - bimonthly releases
       •  Neuroscience, ecology, epidemiology, environmental science, functional genomics, metabolomics, toxicology
       •  New datasets and previously published datasets
          o  a fuller, more in-depth look at the data processing steps, supported by additional data files and code from each step
          o  additional tutorial-like information for scientists interested in reusing or integrating the data with their own
       •  Datasets in figshare and domain specific databases
       •  Code deposited in figshare and GitHub
       •  Individual datasets, curated aggregation and citizen science
       •  First dataset part of a collection
       •  Academic and industry authors
  • 38. Interested in collaborating and/or enabling submission?
       •  Do you run a data resource we should recognize?
          o  See on our website the list of criteria databases should meet
       •  Are you interested in facilitating submission to us?
          o  See our ISA-Tab specification on the website
             -  you can implement and export in this format from your authoring/curation tool, or from your database
       •  Do you want to submit Data Descriptor(s)?
          o  Check suitability by sending a pre-submission enquiry; we accept:
             -  Submissions in the life, environmental and biomedical sciences, but not limited to these
             -  Experimental, observational and computational datasets
             -  Individual datasets, curated aggregations, and collections
             -  Unpublished data and follow-up, with additional information for wider reuse, e.g.:
                •  a fuller, more in-depth look at the data processing steps, supported by additional data files and code from each step
                •  additional tutorial-like information for scientists interested in reusing or integrating the data with their own
  • 39. Helping you publish, discover and reuse research data
       Visit nature.com/scientificdata
       Email scientificdata@nature.com
       Tweet @ScientificData
       Honorary Academic Editor: Susanna-Assunta Sansone, PhD
       Managing Editor: Andrew L Hufton, PhD
       Editorial Curator: Victoria Newman
       Advisory Panel and Editorial Board including senior researchers, funders, librarians and curators
       Supported by: