Presentation on the use of the Eureka Research Workbench to store data and scientific workflow information. Presented online as part of the Dial-a-molecule 'Liberating Laboratory Data' event (http://www.dial-a-molecule.org/wp/events-listing/liberating-laboratory-data/)
1. Eureka Research Workbench: Semantic Capture of the Scientific Process
Stuart J. Chalk
Department of Chemistry
University of North Florida
Jacksonville, FL USA
schalk@unf.edu
Liberating Laboratory Data – Day 2
2. Capturing Science Data
Data is a fundamental output of science, but…
Data is not useful if it does not have context
Big data analytics needs detailed, well-structured metadata and relationships to assemble aggregated datasets for useful interpretation
Options
LabArchives http://www.labarchives.com
eCAT http://www.researchspace.com/electronic-lab-notebook/
LabTrove http://www.labtrove.org/
Dryad data publishing http://datadryad.org/
or …
3. Eureka Research Workbench
Started in 2006 as an offshoot of work on the Analytical Information Markup Language (AnIML) project
No way to store all research notes in a digital format
No way to capture the workflow of scientists
Realized that writing in a lab notebook is equivalent to “multitype” blogging in the digital world
How to capture information? Many datatypes -> ExptML
How to store files and make them available through a web interface? (Fedora-Commons)
How to link data together? RDF (in Fedora-Commons)
4. Experiment Markup Language (ExptML)
A specification (written in XML) that describes the different types of information recorded during the scientific process (http://exptml.sourceforge.net)
Many datatypes (will expand…)
Annotation
Api
Calculation
Chemical
Citation
Communication
Customer
Data
Dataset
Definition
Element
Equipment
Event
Experiment
Group
Project
Protocol
Quote
Report
Result
Sample
Solution
Space
Specimen
Substance
Task
Template
Timeline
User
Vendor
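To make the idea of an XML datatype concrete, the sketch below builds a minimal record for one of the datatypes listed above (Chemical) with Python's standard `xml.etree.ElementTree`, then reads a value back out, much as Eureka extracts data from its XML files. It is an illustration under assumptions only: the tag names `exptml`, `chemical`, `name`, `casrn` and `formula` are hypothetical placeholders, not taken from the ExptML specification at http://exptml.sourceforge.net.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# NOTE: the tag names below (exptml, chemical, name, casrn, formula) are
# hypothetical placeholders, not taken from the ExptML specification.


def build_chemical_record(pid: str, name: str, casrn: str, formula: str) -> str:
    """Serialise a minimal ExptML-style 'chemical' record to an XML string."""
    root = ET.Element("exptml")
    chem = ET.SubElement(root, "chemical", {"id": pid})
    ET.SubElement(chem, "name").text = name
    ET.SubElement(chem, "casrn").text = casrn
    ET.SubElement(chem, "formula").text = formula
    return ET.tostring(root, encoding="unicode")


def read_field(xml_text: str, field: str) -> Optional[str]:
    """Extract one field from the first chemical record, or None if absent."""
    node = ET.fromstring(xml_text).find(f"chemical/{field}")
    return node.text if node is not None else None


if __name__ == "__main__":
    record = build_chemical_record("exptml:chm1", "ethanol", "64-17-5", "C2H6O")
    print(record)
    print(read_field(record, "formula"))  # C2H6O
```

Keeping each datatype as a small, self-describing XML document is what lets the same file be stored, versioned and searched as a unit.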
8. Related Data - ExptML Ontology
In computer science and information science, an ontology “formally represents knowledge as a set of concepts within a domain, and the relationships between those concepts. It can be used to model a domain and support reasoning about concepts.”*
In essence, an ontology allows us to define relationships between, and assertions about, concepts
For substances represented in ExptML, we define:
isSubstance (assertion)
hasSubstance
isSubstanceOf
*https://en.wikipedia.org/wiki/Ontology_(information_science)
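One way to see what the paired terms above buy you is a tiny in-memory triple store in which `hasSubstance` and `isSubstanceOf` are treated as inverses, so asserting one direction lets the other be inferred. This is a sketch under assumptions: the slides do not say how ExptML declares the inverse (in OWL this would typically be done with `owl:inverseOf`), the identifiers `exptml:sol1` and `exptml:chm1` are hypothetical, and `isSubstance` is modelled here as a simple flag-style assertion.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Assumption: hasSubstance and isSubstanceOf are declared inverses of each other.
INVERSES = {"hasSubstance": "isSubstanceOf", "isSubstanceOf": "hasSubstance"}


def infer_inverses(triples: Set[Triple]) -> Set[Triple]:
    """Return the asserted triples plus any implied by inverse properties."""
    inferred = set(triples)
    for subj, pred, obj in triples:
        inverse = INVERSES.get(pred)
        if inverse is not None:
            inferred.add((obj, inverse, subj))
    return inferred


def objects(triples: Set[Triple], subj: str, pred: str) -> Set[str]:
    """All objects o such that (subj, pred, o) is in the set."""
    return {o for s, p, o in triples if s == subj and p == pred}


if __name__ == "__main__":
    asserted = {
        ("exptml:chm1", "isSubstance", "true"),          # flag-style assertion
        ("exptml:sol1", "hasSubstance", "exptml:chm1"),  # solution contains the chemical
    }
    graph = infer_inverses(asserted)
    print(objects(graph, "exptml:chm1", "isSubstanceOf"))  # {'exptml:sol1'}
```

Only one direction of each link has to be recorded; a reasoner (or, as here, a simple rule) supplies the reverse, so a chemical can be asked which solutions it belongs to without storing that fact twice.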
10. Fedora Commons
Digital repository software for creating and managing online digital libraries
Stores the ExptML files
Stores any other files (PDFs, images, Word documents, etc.)
Stores relationships as RDF
Version control
Checksumming
Built-in search of content and relationships
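Version control and checksumming work together: each stored version of a stream carries a digest, and re-hashing on retrieval reveals whether the bytes have changed since they were saved. The sketch below models that idea in memory; it is not Fedora's implementation. The class names, the datastream ID `EXPTML`, and the choice of SHA-256 are illustrative assumptions.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatastreamVersion:
    content: bytes
    checksum: str  # hex SHA-256 digest recorded when the version was stored


@dataclass
class VersionedDatastream:
    """Hypothetical in-memory model: every update appends a new version."""
    dsid: str
    versions: List[DatastreamVersion] = field(default_factory=list)

    def put(self, content: bytes) -> str:
        """Store a new version and return its digest; old versions are kept."""
        digest = hashlib.sha256(content).hexdigest()
        self.versions.append(DatastreamVersion(content, digest))
        return digest

    def latest(self) -> bytes:
        """Return the newest content after verifying its stored checksum."""
        version = self.versions[-1]
        # Re-hash on read; a mismatch means the stored bytes were altered.
        if hashlib.sha256(version.content).hexdigest() != version.checksum:
            raise ValueError(f"checksum mismatch in datastream {self.dsid}")
        return version.content


if __name__ == "__main__":
    ds = VersionedDatastream("EXPTML")
    ds.put(b"<exptml>v1</exptml>")
    ds.put(b"<exptml>v2</exptml>")
    print(len(ds.versions), ds.latest())  # 2 b'<exptml>v2</exptml>'
```

Because earlier versions are never overwritten, a notebook entry can be rolled back or audited, and the digest gives a cheap integrity check for long-term archiving.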
11. File Storage
Fedora-Commons treats each ExptML file as an object
In the definition of a Fedora object, the file is just one stream of many. By default, each object also has a “DC” (Dublin Core) stream of metadata and a “RELS-EXT” stream of relationships
Each Fedora object can have any number of additional streams for:
Paper PDFs, product/sample pictures, original file formats (if a conversion has been done)
Video, audio, anything
You can export individual streams or the whole Fedora object, with its streams binary-encoded (for sharing/archiving)
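The object layout described above (an ExptML file as one stream among many, default DC and RELS-EXT streams, arbitrary extra streams, and a whole-object export) can be sketched as a small in-memory class. The RELS-EXT body follows the RDF/XML shape used by Fedora-Commons 3.x (one `rdf:Description` whose `rdf:about` is the object's `info:fedora/` URI). Everything else is assumed for illustration: the ontology namespace `http://example.org/exptml#`, the stream IDs `EXPTML` and `PHOTO`, the empty DC placeholder, and the base64 dictionary returned by `export()`, which stands in for Fedora's own FOXML export.

```python
import base64
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EXPTML_NS = "http://example.org/exptml#"  # hypothetical ontology namespace
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("exptml", EXPTML_NS)


def rels_ext(pid: str, relations: List[Tuple[str, str, str]]) -> bytes:
    """Build an RDF/XML RELS-EXT body; each relation is (namespace, predicate, target_pid)."""
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": f"info:fedora/{pid}"})
    for ns, predicate, target in relations:
        ET.SubElement(desc, f"{{{ns}}}{predicate}",
                      {f"{{{RDF_NS}}}resource": f"info:fedora/{target}"})
    return ET.tostring(rdf, encoding="utf-8")


@dataclass
class FedoraObject:
    """Hypothetical in-memory stand-in for a Fedora object and its streams."""
    pid: str
    datastreams: Dict[str, bytes] = field(default_factory=dict)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Mirror Fedora's defaults: a DC metadata stream (empty placeholder
        # here) and a RELS-EXT relationship stream.
        self.datastreams.setdefault("DC", b"")
        self._refresh_rels_ext()

    def add_datastream(self, dsid: str, content: bytes) -> None:
        """Attach any extra stream: ExptML, a PDF, an image, an original file."""
        self.datastreams[dsid] = content

    def relate(self, ns: str, predicate: str, target_pid: str) -> None:
        """Record a relationship and regenerate the RELS-EXT stream."""
        self.relations.append((ns, predicate, target_pid))
        self._refresh_rels_ext()

    def export(self) -> Dict[str, str]:
        """Bundle every stream, base64-encoded, for sharing or archiving."""
        return {dsid: base64.b64encode(data).decode("ascii")
                for dsid, data in self.datastreams.items()}

    def _refresh_rels_ext(self) -> None:
        self.datastreams["RELS-EXT"] = rels_ext(self.pid, self.relations)


if __name__ == "__main__":
    obj = FedoraObject("exptml:sol1")
    obj.add_datastream("EXPTML", b"<exptml/>")
    obj.add_datastream("PHOTO", b"\x89PNG")
    obj.relate(EXPTML_NS, "hasSubstance", "exptml:chm1")
    print(obj.datastreams["RELS-EXT"].decode("utf-8"))
    print(sorted(obj.export()))  # ['DC', 'EXPTML', 'PHOTO', 'RELS-EXT']
```

Treating relationships as just another stream is what lets Fedora version, export and index them with the same machinery it applies to the files themselves.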
13. Eureka Interface
So, finally, to the Eureka Research Workbench!
Web interface written in PHP using the CakePHP Framework
Communicates with the Fedora-Commons API to create, retrieve, update and delete (CRUD) ExptML and other files
Representational State Transfer (REST) style URLs
E.g. http://web.server/chemicals/view/exptml:chm1
Allows for searching of all files in Fedora
Can also search based on relationships
Can extract data out of XML files
Can gather data from other websites (via an API controller) and add it to ExptML files
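The example URL above follows CakePHP's default routing convention of `/controller/action/parameters`, with a Fedora PID (`namespace:id`) as the parameter. The sketch below splits such a URL into its parts and maps the action onto one of the CRUD operations. The action names `add`, `view`, `edit` and `delete` are CakePHP's conventional ones; whether Eureka uses exactly these, and the `Route`/`parse_route` names themselves, are assumptions for illustration.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote, urlparse

# Assumption: Eureka's controller actions follow CakePHP's conventional names.
CRUD_FOR_ACTION = {
    "add": "create",
    "view": "retrieve",
    "edit": "update",
    "delete": "delete",
}


class Route(NamedTuple):
    controller: str
    action: str
    pid: Optional[str]
    operation: str


def parse_route(url: str) -> Route:
    """Split /controller/action[/pid] and look up the matching CRUD operation."""
    parts = [unquote(p) for p in urlparse(url).path.split("/") if p]
    if len(parts) not in (2, 3):
        raise ValueError(f"expected /controller/action[/pid], got {url!r}")
    controller, action = parts[0], parts[1]
    pid = parts[2] if len(parts) == 3 else None
    if pid is not None and ":" not in pid:
        # Fedora PIDs take the form namespace:id, e.g. exptml:chm1
        raise ValueError(f"not a Fedora-style PID: {pid!r}")
    if action not in CRUD_FOR_ACTION:
        raise ValueError(f"unknown action: {action!r}")
    return Route(controller, action, pid, CRUD_FOR_ACTION[action])


if __name__ == "__main__":
    print(parse_route("http://web.server/chemicals/view/exptml:chm1"))
    # Route(controller='chemicals', action='view', pid='exptml:chm1', operation='retrieve')
```

Because the PID appears verbatim in the URL, every record in the repository has a stable, human-readable address that can be bookmarked or cited.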
14. Typical things we record in our notebook
Eureka Website – Notebook
15. Conclusion
Eureka uses ExptML for representing science data
Reliable storage system for ExptML files (Fedora)
Method for storage of relationships (RDF in Fedora)
Web application to create ExptML files (Eureka)
TODO
Provide web functionality to process data
Provide mechanism for sharing of data (authenticated)
Integrate into the RDA model for sharing research data
Integrate with many other websites, e.g. ChemSpider
Support enlItemManifest and future RDA specifications