Research Data Management
for Computational Science
Gerard Gorman
g.gorman@imperial.ac.uk
Christian Jacobs
c.jacobs10@imperial.ac.uk
1 / 9
Data requirements
Data produced by scientific software should be reproducible and
recomputable.
This requires:
raw data (input and output files)
the software used to produce it (with information about the specific version used)
provenance data
We need a way of curating this data and software at the push of a
button...
...and a way of referencing it correctly in papers.
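The three ingredients listed above could be gathered into a single record, as in the minimal sketch below. This is an illustration only, not the authors' implementation: the function names, the dictionary keys and the choice of MD5 as the checksum are all assumptions made for this example.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_provenance(software_version, input_files, output_files):
    """Collect raw data, software version and provenance into one dict.

    All key names are hypothetical; they are not taken from Fluidity
    or from Figshare.
    """
    def describe(paths):
        return [{"file": Path(p).name, "md5": md5sum(p)} for p in paths]

    return {
        "software_version": software_version,  # e.g. a git commit hash
        "inputs": describe(input_files),
        "outputs": describe(output_files),
        "created": datetime.now(timezone.utc).isoformat(),
    }


def save_provenance(record, path):
    """Write the record as JSON so it can be archived next to the data."""
    Path(path).write_text(json.dumps(record, indent=2))
```

A record built this way is plain JSON, so it can be stored alongside the input and output files and read back without the software that wrote it.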
2 / 9
Figshare
In addition to papers and figures, Figshare (figshare.com) provides
hosting for datasets.
Each dataset is given its own Digital Object Identifier (DOI).
Programs developed by users can interface with Figshare via the
Figshare API.
3 / 9
Aims
Develop a program to automatically push software and data to
Figshare.
Incorporate this program into the workflow of Fluidity – an
open-source CFD code for fluid flow simulations
(http://amcg.ese.ic.ac.uk/fluidity).
Mint DOIs automatically and add them to the metadata of the
simulation output.
4 / 9
Fluidity with RDM support
Current progress
Implementation of a Python program which enables the publication of
both software and data to Figshare.
Addition of a ‘publish’ option to Fluidity simulation setup files.
New DOIs created when:
Software is pushed to Figshare (if the specific version of the software,
identified by the git commit hash, has not been published already).
Input data is pushed to Figshare.
Output data is pushed to Figshare.
DOIs are recorded in the simulation setup file – if the simulation is
run again, the same DOI is used to store the data.
In the future, we will use MD5sums.
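The rule that a software DOI is only created when the git commit hash has not been published before can be sketched as follows. This is a hedged illustration, not the program described on this slide: `DoiRegistry`, the JSON file it writes and the `mint` callback are hypothetical names, and the callback stands in for whatever call actually reserves a DOI on Figshare.

```python
import json
import subprocess
from pathlib import Path


def current_commit(repo_dir="."):
    """Return the git commit hash of the checked-out software version."""
    out = subprocess.check_output(["git", "rev-parse", "HEAD"], cwd=repo_dir)
    return out.decode().strip()


class DoiRegistry:
    """Hypothetical local record of which commits already have a DOI."""

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            self.entries = json.loads(self.path.read_text())
        else:
            self.entries = {}

    def doi_for(self, commit_hash, mint):
        """Return the stored DOI for this commit, minting only if it is new.

        `mint` is an assumed callback taking the commit hash and
        returning a DOI string; it is not a Figshare API function.
        """
        if commit_hash not in self.entries:
            self.entries[commit_hash] = mint(commit_hash)
            self.path.write_text(json.dumps(self.entries, indent=2))
        return self.entries[commit_hash]
```

Asking for the same commit twice returns the stored DOI and calls `mint` only once. The same lookup pattern would work with an MD5 checksum of a data file as the key instead of a commit hash.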
5 / 9
Fluidity with RDM support
Screenshot of the ‘publish’ option in the Fluidity simulation setup file.
6 / 9
Fluidity with RDM support
Example: simulation of the top hat cg supg test case
Screenshot of software, input data and output data automatically pushed to Figshare.
7 / 9
Links
The Software Sustainability Institute: www.software.ac.uk
Digital Curation Centre: www.dcc.ac.uk
Software Carpentry: software-carpentry.org (and Data
Carpentry: nescent.github.io/2014-05-08-datacarpentry)
Fidgit: www.github.com/arfon/fidgit
Reproducible Research course:
https://www.coursera.org/course/repdata
ROpenSci’s Reproducibility Guide:
http://ropensci.github.io/reproducibility-guide
8 / 9
‘It has always been my habit to hide none of my methods, either
from my friend Watson or from any one who might take an
intelligent interest in them.’
Sherlock Holmes
9 / 9
