2. - Where do you store your experimental data?
- What happens to the data when a PhD student leaves the group?
- Are all data complete for a publication?
- Do you make regular backups of your local machine?
- Do you send emails to share data with your colleagues?
- Do you always store email attachments in your local directory?
- Do you store all different versions of a data file together in the same place?
- Which protocol was used for the experiment?
...
Why do you need data management?
3. Vahan Simonyan, Center for Biologics
Evaluation and Research, Food and
Drug Administration, USA
How well is your experiment
documented?
4. • Track collection of raw and processed
(secondary) data, models & metadata
• Maintain experimental context
• Organise and link assets
• Choose what to keep and what to ditch
• Report consistently
• Reproducible publications
• Promote standardised metadata practices
• Exchange among colleagues
• How and when to share and publish
• Get and give credit
• Retain and find beyond project
• Integrate with legacy, home grown,
external systems
• Reuse tools and community archives
• Support automation and analytics
workflows. Support curation
CREATING DATA
PROCESSING DATA
ANALYSING DATA
PRESERVING DATA
ACCESS TO DATA
RE-USING DATA
Purpose of Project Data
Management
5. Purpose of Project Data
Management
Organisation
Communication
Dissemination
Partners
Funders
Public
6. The FAIR Guiding Principles for scientific data management and stewardship
https://www.nature.com/articles/sdata201618 (2016)
8. FAIR Checklists
Making Data Findable (documentation and metadata management)
• What documentation and metadata will accompany the data (assist its
discoverability)? (Details on methodology, definitions, procedures, SOPs,
vocabularies, units, dependencies, etc)
• What information is needed for the data to be read and interpreted in the
future?
• What naming conventions will be used?
• How will you approach versioning your data?
• How will you capture / create this documentation and metadata?
• How do you ensure the completeness of the captured data?
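One way to approach the naming, versioning and completeness questions above is an automated check before data are registered. The following is a minimal sketch in Python, assuming a hypothetical file-naming convention (`<project>_<assay>_<YYYYMMDD>_v<N>.<ext>`) and a hypothetical set of required metadata fields; neither is prescribed by the FAIR principles.

```python
import re

# Hypothetical convention: <project>_<assay>_<YYYYMMDD>_v<N>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[A-Za-z0-9]+)_(?P<assay>[A-Za-z0-9-]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d+)\.(?P<ext>[a-z0-9]+)$"
)

# Hypothetical minimum set of metadata fields for one data file.
REQUIRED_FIELDS = ("title", "creator", "protocol", "units", "licence")


def check_name(filename):
    """Return the parsed name parts, or None if the convention is not met."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


def missing_fields(metadata):
    """Return the required fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not str(metadata.get(f, "")).strip()]


# Invented example record: "units" is empty and "licence" is absent.
record = {
    "title": "Glucose uptake time course",
    "creator": "A. Researcher",
    "protocol": "SOP-12",
    "units": "",
}
print(check_name("demo_uptake_20190701_v2.xlsx"))
print(missing_fields(record))
```

Keeping the version number in the file name is only one option; a platform that versions files itself makes that part of the convention unnecessary.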
Making Data Accessible
Specify which data will be made openly available, taking into consideration:
• What ethics and legal compliance issues do you have if any? Do you need
consent for data preservation and sharing? Do you have to protect certain
data? Is any data sensitive?
• Do you think you might have Intellectual Property Rights issues? Have you
considered ownership of the data, licensing, restrictions on use?
• Do you think you will need to embargo any data?
• How will you make the data available? (consider the platforms you will use:
databases, repositories, etc)
• What methods or software tools are needed to access the data? Should you
include documentation detailing how to access and use the software that is
needed for accessing the data? Is it possible to include this software with the
data (e.g. source code, Docker, etc.)?
• If there are any restrictions on accessibility, how will you provide access?
Making Data Interoperable
• What standards (metadata vocabularies, formats,
checklists) or methodologies will you use?
• How do you address data and model quality? What
validation steps do you foresee?
• Will you use standardised vocabulary for all data types
to allow inter-disciplinary interoperability?
• Where you cannot use standardised vocabulary for all
types of data, can you map to more commonly used
ontologies?
Making Data Re-usable
• How will you licence your data to permit the widest
re-use possible?
• When will the data be made available for re-use? Does
this include an embargo period? (if so, why?)
• Which data will be available for re-use during/after the
project? If not, why?
• What are your data quality assurance processes?
• How long do you expect your data to remain re-usable?
9. FAIRDOM Initiative
- develop a community
- establish an internationally sustained Data and Model Management service
- joint action of ERA-Net ERASysAPP and European Research Infrastructure ISBE
10. A bit of history: 11-Year Anniversary
Timeline: 2008, 2010, 2012, 2014, 2016, 2018, 2020
Standards based asset
management (data,
models, workflows,
SOPs…) for multi-party
projects
Sensitive sharing
Self-deposit / curation
Mixed stewardship skills
Legacy local systems
Community resources
Started in Systems
Biology. Now widened.
11. SEEK Software
- Open source web platform for sharing scientific research assets, processes
and outcomes
- Associations between data along with information about the people and
organisations (yellow pages)
- ISA (Investigation, Study, Assay) structure for describing how individual
experiments are aggregated into studies and investigations
- Flexible and detailed sharing permissions
- DOI can be generated for individual items, or entire aggregates
- Semantic technology, allowing sophisticated queries over the content
- Collection of metadata
https://seek4science.org/
18. Data Files, SOPs, Documents
- no file format restrictions
- the content of some formats can be viewed in SEEK, e.g. Excel, Word, PDF, XML, PNG
19. Models
SBML Model simulation
Model comparison
Model versioning
Reproducing simulations
[Jacky Snoep, Dagmar Waltemath,
Martin Peters, Martin Scharm]
21. Tracking model versions smartly
Scharm, M., Wolkenhauer, O., & Waltemath, D. (2015). An algorithm to detect and communicate the differences in
computational models describing biological systems. Bioinformatics
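To illustrate what "differences between model versions" can mean in practice, here is a deliberately naive sketch that compares the species and reaction identifiers of two SBML-like documents. It is not the algorithm of Scharm et al., which works on the full structure of the model documents; the two tiny model strings are invented for the example and omit attributes that valid SBML requires.

```python
import xml.etree.ElementTree as ET

# Invented, simplified model versions (not valid, complete SBML).
MODEL_V1 = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core">
  <model id="demo">
    <listOfSpecies>
      <species id="glc"/><species id="g6p"/>
    </listOfSpecies>
    <listOfReactions><reaction id="hk"/></listOfReactions>
  </model>
</sbml>"""

MODEL_V2 = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core">
  <model id="demo">
    <listOfSpecies>
      <species id="glc"/><species id="g6p"/><species id="f6p"/>
    </listOfSpecies>
    <listOfReactions><reaction id="hxk"/><reaction id="pgi"/></listOfReactions>
  </model>
</sbml>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def element_ids(xml_text, kind):
    """Collect the id attributes of all elements with the given local name."""
    root = ET.fromstring(xml_text)
    return {el.get("id") for el in root.iter() if local_name(el.tag) == kind}


def diff_models(old, new):
    """Report identifiers added or removed between two model versions."""
    report = {}
    for kind in ("species", "reaction"):
        before, after = element_ids(old, kind), element_ids(new, kind)
        report[kind] = {
            "added": sorted(after - before),
            "removed": sorted(before - after),
        }
    return report


changes = diff_models(MODEL_V1, MODEL_V2)
print(changes)
```

An identifier-based comparison like this reports a renamed reaction as one removal plus one addition, which is one reason a structure-aware algorithm is needed for real models.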
29. More than simple supplementary materials
16 datafiles (kinetic, flux inhibition, runout)
19 models (kinetics, validation)
13 SOPs
3 studies (model analysis, construction,
validation)
24 assays/analyses (simulations, model
characterisations)
Penkler, G., du Toit, F., Adams, W., Rautenbach, M.,
Palm, D. C., van Niekerk, D. D. and Snoep, J. L. (2015),
Construction and validation of a detailed kinetic model
of glycolysis in Plasmodium falciparum. FEBS J, 282:
1481–1511. doi:10.1111/febs.13237
30. Scharm M, Wendland F, Peters M, Wolfien M, Theile T, Waltemath D
SEMS, University of Rostock
zip-like file with a manifest & metadata
- Bundling files - Keeping provenance
- Exchanging data - Shipping results
Bergmann, F. T., Adams, R., Moodie, S., Cooper, J., Glont, M., Golebiewski, M., ... & Olivier, B. G. (2014). COMBINE archive and OMEX format:
one file to share all information to reproduce a modeling project. BMC Bioinformatics, 15(1), 1.
Packaging: COMBINE archive
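The "zip-like file with a manifest" idea can be sketched with Python's standard library. The file names and file contents below are placeholders, and the manifest namespace and format identifiers are assumptions based on a reading of the OMEX format described by Bergmann et al.; check them against the specification before relying on them.

```python
import io
import zipfile
from xml.sax.saxutils import quoteattr

OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
# Format identifiers as understood from the OMEX format (assumption).
FORMATS = {
    "omex": "http://identifiers.org/combine.specifications/omex",
    "manifest": OMEX_NS,
    "sbml": "http://identifiers.org/combine.specifications/sbml",
    "csv": "http://purl.org/NET/mediatypes/text/csv",
}


def build_manifest(entries):
    """Build manifest.xml text listing every file in the archive."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<omexManifest xmlns="{OMEX_NS}">',
        f'  <content location="." format="{FORMATS["omex"]}"/>',
        f'  <content location="./manifest.xml" format="{FORMATS["manifest"]}"/>',
    ]
    for location, fmt in entries:
        lines.append(
            f"  <content location={quoteattr('./' + location)} "
            f"format={quoteattr(FORMATS[fmt])}/>"
        )
    lines.append("</omexManifest>")
    return "\n".join(lines)


def write_archive(target, files):
    """Write a zip archive; files maps archive path -> (format key, bytes)."""
    entries = [(path, fmt) for path, (fmt, _) in files.items()]
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.xml", build_manifest(entries))
        for path, (_, content) in files.items():
            archive.writestr(path, content)


# Placeholder content; a real archive would hold actual model and data files.
buffer = io.BytesIO()
write_archive(buffer, {
    "model/glycolysis.xml": ("sbml", b"<sbml/>"),
    "data/kinetics.csv": ("csv", b"time,value\n0,1.0\n"),
})
with zipfile.ZipFile(buffer) as archive:
    names = archive.namelist()
    manifest_text = archive.read("manifest.xml").decode("utf-8")
print(names)
```

The sketch covers only bundling; the metadata and provenance files that a full archive can carry are left out.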
31. Standards-based metadata framework for
bundling (scattered) resources with context and citation
Packaging: Research Objects
http://researchobject.org
32. SEEK as project-specific local
instances or as central FAIRDOMHub
Service hosted at HITS
(Institutional Guarantee at least until 2029)
33. FAIRDOMHub Statistics
1st July 2019
Programmes 60
Projects 144
Institutions 274
People 1291
Data files 2280
Models 487
SOPs 301
Sample types 63
Presentations 729
Publications 370
Events 178
34. FAIRDOM Platform
Free and Open Source
Front end
Project(s) Hub
Back end
Onsite storage & analytics
On site
Tracking, data analytic pipelines,
Extract, Transform and Load direct from the
instruments, large data management
LIMS, auto-archiving
Web-based portal
Project controlled spaces
Metadata catalogue & Yellow pages
Results repository, dissemination and collaboration
Tool gateway
35. Back end
Instrument Data Management, LIMS, ELN
• samples
• protocols
• instruments
• data management
• experimental description
Norway’s national e-Infrastructure
for Life Science
https://nels.bioinfo.no/
Electronic Laboratory Notebook and Laboratory
Information Management System (ELN-LIMS)
https://csb.ethz.ch/tools/software/openbis-lims-eln.html
36. [Adapted from Ursula Klingmüller, Martin Böhm]
Excemplify
Antibody
Database
FAIR collaboration
from the ERA-Net ERASysAPP
37.
Programme
Overarching research theme (The Digital Salmon)
Project
Research grant (DigiSal, GenoSysFat)
Investigation
A particular biological process, phenomenon or thing
(typically corresponds to [plans for] one or more closely related
papers)
Study
Experiment whose design reflects a specific biological research
question
Assay
Standardized measurement or diagnostic experiment using a
specific protocol
(applied to material from a study)
Jon Olav Vik,
Norwegian University of Life Sciences
Integration with Norway’s national
e-Infrastructure for Life Science (NeLS)
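The hierarchy on this slide can be pictured as nested records. The following is a minimal sketch using Python dataclasses; the class and field names are illustrative and are not SEEK's data model or API, and the investigation, study and assay titles are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    title: str
    protocol: str  # the specific protocol (SOP) used for the measurement


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


@dataclass
class Project:
    title: str
    investigations: List[Investigation] = field(default_factory=list)


@dataclass
class Programme:
    title: str
    projects: List[Project] = field(default_factory=list)


def outline(programme):
    """Return indented lines describing the hierarchy, one per record."""
    lines = [f"Programme: {programme.title}"]
    for project in programme.projects:
        lines.append(f"  Project: {project.title}")
        for investigation in project.investigations:
            lines.append(f"    Investigation: {investigation.title}")
            for study in investigation.studies:
                lines.append(f"      Study: {study.title}")
                for assay in study.assays:
                    lines.append(
                        f"        Assay: {assay.title} (protocol: {assay.protocol})"
                    )
    return lines


# Programme and project names come from the slide; the rest is invented.
programme = Programme("The Digital Salmon", [
    Project("DigiSal", [
        Investigation("Example investigation", [
            Study("Example study", [Assay("Example assay", "SOP-1")]),
        ]),
    ]),
])
print("\n".join(outline(programme)))
```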
38. • Project controlled protected spaces
– Working space, show space for results
– Supp. materials space for publications
– Yellow pages and collaboration
– Upload or link to data
• One place catalogue
– Regardless of physical store
– ISA with shared metadata
– Standards-compliant
• Linked with other systems
– Project on-site (secure) repositories
– Public deposition archives
– Integrated with JWSOnline modelling tools
Front End
Find, Access and Organise assets
“Using FAIRDOMHub my own
lab colleagues saw what I was
doing and called to
collaborate!”
https://fairdomhub.org/
39. Catalogue across repositories regardless
of location
In House Stores
External Databases
Publishing services
Secure Stores
Model Resources
Upload or
Reference
42. PALs - Project Area Liaisons
PALs
DM Team
Data management training
Requirements & Suggestions
• Training needs for users
• Suggestions to improve SEEK
• Requirements for new SEEK
features and DM services
43. PALs - Project Area Liaisons
- our user focus group
- post docs, postgrads and techs
- experimentalists, modellers and bioinformaticians
- advocate for data management and communicate our progress back to their projects
44. Data Stewards
function, profession, cultural shift
• 500,000 needed in Europe*
• Specialist skills
• Career pathways
• Recognition
Curation and management
• Supported, Resourced
• Recognised, Rewarded
Sharing policy and practice embedded
* Realising the European Open Science Cloud (2016)
47. LiSyM (Liver Systems Medicine)
German Research Network on
Systems Medicine for Liver Disease
Supported by
The German Federal Ministry of Education and Research 2016-2020
Multiple disciplines
• Medicine
• Biology, Biochemistry
• Pharmacology
• Physics
• Bioinformatics
• Data management
• Industry
38 independent research groups:
• Bayer AG
• Max Planck (Dresden and Berlin)
• MEVIS Fraunhofer (Bremen)
• Leibniz Institute IfADo (Dortmund)
• Charité (Berlin)
• DKFZ (Heidelberg)
• Hospitals: Dresden, Kiel, Aachen, Homburg,
Berlin, Heidelberg, Munich
• + 18 Universities
http://www.lisym.org/
49. Clinical data sharing concept
Goal:
• Spread a description of the data
throughout the consortium
Challenge:
• Some partners cannot share
Solution:
• Share table structure
• Create & share common code
• Create summaries in a distributed manner
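The solution on this slide can be sketched as follows: every partner runs the same shared function on its own table and returns only aggregate numbers, which are then pooled centrally. The column name `alt` and all values are made up for illustration, and this is only one possible reading of the concept; a real deployment would also need disclosure control for small groups.

```python
import math


def local_summary(rows, column):
    """Run at each site: reduce patient-level rows to three aggregates."""
    values = [row[column] for row in rows if row.get(column) is not None]
    return {
        "n": len(values),
        "sum": sum(values),
        "sum_sq": sum(v * v for v in values),
    }


def pooled_summary(summaries):
    """Run centrally: combine site aggregates into overall mean and SD.

    Requires at least two values in total across all sites.
    """
    n = sum(s["n"] for s in summaries)
    total = sum(s["sum"] for s in summaries)
    total_sq = sum(s["sum_sq"] for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)  # sample variance
    return {"n": n, "mean": mean, "sd": math.sqrt(max(variance, 0.0))}


# Invented tables that share one structure; the rows never leave the site.
site_a = [{"alt": 30.0}, {"alt": 50.0}]
site_b = [{"alt": 40.0}, {"alt": None}, {"alt": 60.0}]

shared = [local_summary(site_a, "alt"), local_summary(site_b, "alt")]
result = pooled_summary(shared)
print(result)
```

Only the three aggregates per site cross institutional boundaries, which is what makes an agreed table structure and common code sufficient for a joint summary.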
50. NMTrypI
Trypanosomatid parasites cause sleeping
sickness, leishmaniasis and Chagas
disease in Africa, South America and
India
EU-funded project 2014 – 2017
Goal: new candidate drugs against
Trypanosomatidic infections
Consortium: 12 partners (3 SMEs and 9
academics) in Europe and in disease-
endemic countries (Italy, Greece, Portugal,
Spain, Germany, UK, Sudan, Brazil)
https://fp7-nmtrypi.eu
51. NMTrypI specific challenges
• New visualizations of spreadsheet data
• Cross-references with external databases
• Chemical compound specific features
– show structure
– allow (sub)structure search
– create compound summary reports
55. de.NBI - The German Network for
Bioinformatics Infrastructure
de.NBI consortium
• 39 project partners
• 30 institutions
• 8 service centers
https://www.denbi.de/
Mission
• Provide, expand and improve specialized
bioinformatics tools
• Provide access to computing and storage
capacities
• Provide regular training events and workshops
• Maintain and develop specific high-quality data
resources
56. Research and service topics of de.NBI service centers
HD-HuB
Bioinformatics Infrastructures in Biomedical Research
• Human genetics and genomics
• Metagenomics
• Systematic phenotyping of human cells
• Epigenetics
BiGi
Microbial Research for Biotechnology and Medicine
• High performance computing services
• Repository of reusable workflows
• Comparative genomics and meta-omics
• Post-genomics data integration
BioData
Reference Databases, Services and Tools
• Ribosomal RNAs (SILVA)
• Environmental data (PANGAEA)
• Taxon-associated metadata (BacDive)
• Enzymes & Ligands (BRENDA/EnzymeStructures)
CIBI
Tools for omics data and imaging
• Open-source libraries (OpenMS, SeqAn, FIJI)
• Tools for NGS, mass spec, and imaging
• Workflow engine (KNIME) for automation
• (Multi-)omics data analysis workflows
RBC
RNA Bioinformatics
• Analysis of RNA-related data
• Life science data analysis with Galaxy
• Meta-transcriptomics
• Epigenetic research
de.NBI-SysBio
Standards-based Systems Biology
• Data and model management tools
• SABIO-RK reaction kinetics data
• Methods and tools for modeling in Systems Biology
• Standards & tools for model search and management
GCBN
Crops and BioGreenFormatics
• Plant genetic resources and traits
• Bridging genotypes to phenotypes
• Plant gene and genome annotation
• Enabling technologies to improve crops
BioInfra.Prot
Bioinformatics for Proteomics
• Comprehensive proteomics workflow
• Data publication, analysis & tool services
• Quality standards for targeted proteomics
• Lipidomics
57. Current Actions in de.NBI
• Goal: Make Data FAIRness part of all de.NBI centers
• Idea: Have service centers collect more metadata. No metadata, no
service.
• Approach: Build use cases that involve data management and service
centers
Two example use cases: Medical proteomics center
• Statistical advice service
– tracking of advice given
– making reports FAIR
• From data to PRIDE
– Catalogue links to PRIDE in SEEK/FAIRDOMHub
– Store and standardise intermediate files
58. Summary FAIRDOM
FAIRDOM Software Platform+Tools
A Central Public Hub
for Projects
Customised Project
Installations
Project Stewardship
Consultancy Services
Community
Activities
144 Projects 30+ Installations
59. Summary FAIRDOM
Find & Access Central catalogue
Link to original files and external resources
Search
Metadata tagging and standards
Yellow pages of projects and people
Access control to spaces
Embedded tools
Interoperate Rich metadata, standards compliance
Consistent reporting – ISA
Curation support
Integration with other resources, archives, tools
Export packages
Reuse Secure sharing space
Long term retention
Reproducible publication
60. - Where do you store your experimental data?
- What happens to the data when a PhD student leaves the group?
- Are all data complete for a publication?
- Do you make regular backups of your local machine?
- Do you send emails to share data with your colleagues?
- Do you always store email attachments in your local directory?
- Do you store all different versions of a data file together in the same place?
- Which protocol was used for the experiment?
...
Why do you need data management?
61. What can you do? Be FAIR!
1. make a Data Management Plan
2. use standard identifiers
3. use metadata standards
4. catalogue / register data with metadata
5. define and share your SOPs
6. use data (assets) management platforms and tools that work
together
7. deposit into public archives
8. have a sustainability / end-of-project plan
9. resource and support, and that means people too
10. embed data management into work practices and do some
training
11. give credit
12. check if you have sensitive data issues
13. educate your supervisors, institutions and peers