Liberating Laboratory Data - Eureka

Eureka Research Workbench:
Semantic Capture of the
Scientific Process
Stuart J. Chalk
Department of Chemistry
University of North Florida
Jacksonville, FL USA
schalk@unf.edu

Liberating Laboratory Data – Day 2

Capturing Science Data
Data is a fundamental output of science, but…
Data is not useful if it does not have context
Big data analytics needs detailed, well-structured metadata and
relationships to assemble aggregated datasets for useful
interpretation
Options
LabArchives http://www.labarchives.com
eCAT http://www.researchspace.com/electronic-lab-notebook/
LabTrove http://www.labtrove.org/
Dryad data publishing http://datadryad.org/
or …

Eureka Research Workbench
Started in 2006 as an offshoot of involvement in the
Analytical Information Markup Language (AnIML) project
No way to store all research notes in a digital format
No way to capture the workflow of scientists
Realized writing in a lab notebook is equivalent to “multitype” blogging in the digital world
How to capture information? Many datatypes -> ExptML
How to store files and make them available through a web interface? (Fedora-Commons)
How to link data together? RDF (in Fedora-Commons)

Experiment Markup Language (ExptML)
A specification (written in XML) that describes
different types of information recorded during the
scientific process (http://exptml.sourceforge.net)
Many datatypes (will expand…):
Annotation, Api, Calculation, Chemical, Citation, Communication, Customer, Data, Dataset, Definition, Element, Equipment, Event, Experiment, Group, Project, Protocol, Quote, Report, Result, Sample, Solution, Space, Specimen, Substance, Task, Template, Timeline, User, Vendor

Related Data - ExptML Ontology
In computer science and information science, an ontology
“formally represents knowledge as a set of concepts within
a domain, and the relationships between those concepts. It
can be used to model a domain and support reasoning about
concepts.”*

In essence, an ontology allows us to define the
relationships and assertions about concepts
For substances represented in ExptML we define
isSubstance (assertion)
hasSubstance
isSubstanceOf
*https://en.wikipedia.org/wiki/Ontology_(information_science)
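To make the idea concrete, here is a minimal Python sketch (an illustration, not Eureka's code) that stores relationships as subject-predicate-object triples and treats hasSubstance and isSubstanceOf as inverses, so a link asserted in one direction can be queried from the other. The identifiers exptml:expt1 and exptml:chm1 and the inverse rule itself are assumptions made for the example.

```python
# Minimal in-memory triple store illustrating inverse relationships.
# Predicate names follow the slide; the inverse pairing and the
# identifiers below are illustrative assumptions, not ExptML's schema.
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Assumed inverse pairs: asserting one implies the other.
INVERSES = {
    "hasSubstance": "isSubstanceOf",
    "isSubstanceOf": "hasSubstance",
}


def infer_inverses(triples: set) -> set:
    """Return the input triples plus any implied inverse triples."""
    inferred = set(triples)
    for t in triples:
        inverse = INVERSES.get(t.predicate)
        if inverse is not None:
            inferred.add(Triple(t.obj, inverse, t.subject))
    return inferred


def query(triples: set, subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None) -> list:
    """Return triples matching every non-None field (None is a wildcard)."""
    return sorted(
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    )


asserted = {Triple("exptml:expt1", "hasSubstance", "exptml:chm1")}
graph = infer_inverses(asserted)
print(query(graph, subject="exptml:chm1", predicate="isSubstanceOf"))
```

Only one direction is asserted; the inverse is derived, which is the kind of reasoning an ontology enables.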

Fedora Commons
Digital repository software for creating and managing
online digital libraries
Stores the ExptML files
Stores any other files (PDFs, Images, Word etc.)
Stores relationships as RDF

Version control
Checksumming
Built-in search of content and relationships
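Version control and checksumming work together: each save keeps the previous content and records a digest, so later corruption can be detected. The sketch below is a simplified model of that behaviour, not Fedora's API; the class names are hypothetical, and SHA-256 is an assumed choice of digest.

```python
# Sketch of versioned storage with checksums, loosely modelled on what a
# repository such as Fedora provides. Class and method names are hypothetical.
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Version:
    content: bytes
    checksum: str        # hex SHA-256 digest recorded at save time
    created: datetime


@dataclass
class VersionedStream:
    versions: list = field(default_factory=list)

    def save(self, content: bytes) -> Version:
        """Append a new version instead of overwriting the old one."""
        v = Version(content, hashlib.sha256(content).hexdigest(),
                    datetime.now(timezone.utc))
        self.versions.append(v)
        return v

    def latest(self) -> bytes:
        return self.versions[-1].content

    def verify(self) -> bool:
        """Recompute every digest; False means stored content changed."""
        return all(hashlib.sha256(v.content).hexdigest() == v.checksum
                   for v in self.versions)


stream = VersionedStream()
stream.save(b"<experiment>draft</experiment>")
stream.save(b"<experiment>final</experiment>")
print(len(stream.versions), stream.verify())  # 2 True
```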

File Storage
Fedora-Commons treats each ExptML file as an object
In the definition of a Fedora object, the file is just one
stream of many. By default, each object also has a “DC”
stream of metadata and a “RELS-EXT” stream of
relationships
Each Fedora object can have any number of additional
streams for
Paper PDFs, product/sample pictures, original file formats (if a
conversion has been done)
Video, audio, anything

You can export individual streams or the whole Fedora
object with streams binary encoded (for sharing/archiving)
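The object model described above (one identifier, many named streams, binary content encoded on export) can be sketched as a small data structure. This is an illustration under assumptions: the "DC" and "RELS-EXT" stream IDs follow the slide, while the "EXPTML" and "PHOTO" stream IDs and the JSON export format are hypothetical stand-ins. Fedora's own export format is XML-based.

```python
# Sketch of a Fedora-style digital object: a PID plus named streams.
# "DC" and "RELS-EXT" follow the slide; the JSON export is an illustrative
# stand-in, not Fedora's actual export format.
import base64
import json
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    pid: str
    # Every object starts with a Dublin Core metadata stream and a
    # relationships stream; content is held as raw bytes.
    streams: dict = field(default_factory=lambda: {"DC": b"", "RELS-EXT": b""})

    def add_stream(self, stream_id: str, content: bytes) -> None:
        self.streams[stream_id] = content

    def export(self, stream_ids=None) -> str:
        """Serialize some or all streams, binary content base64-encoded."""
        ids = stream_ids if stream_ids is not None else sorted(self.streams)
        payload = {sid: base64.b64encode(self.streams[sid]).decode("ascii")
                   for sid in ids}
        return json.dumps({"pid": self.pid, "streams": payload})


obj = DigitalObject("exptml:chm1")
obj.add_stream("EXPTML", b"<chemical/>")   # hypothetical stream ID
obj.add_stream("PHOTO", b"\x89PNG\r\n")    # any binary file works
archive = obj.export()                     # whole object
single = obj.export(["EXPTML"])            # one stream only
```

Base64 keeps arbitrary binary files (images, PDFs) safe inside a text-based archive, at the cost of roughly a third more space.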

Eureka Interface
So, finally to the Eureka Research Workbench!
Web interface written in PHP using the CakePHP Framework
Communicates with Fedora-Commons API to
create, retrieve, update and delete (CRUD) ExptML and
other files
Representational State Transfer (REST) format for URLs
E.g. http://web.server/chemicals/view/exptml:chm1
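In the example URL, the path reads as controller (chemicals), action (view) and Fedora object identifier (exptml:chm1). The Python sketch below splits such a URL and maps the action to a CRUD operation. The action-to-CRUD table assumes CakePHP's conventional action names (add, view, edit, delete) and is not taken from Eureka's actual routes.

```python
# Sketch: split a CakePHP-style REST URL into controller, action and the
# Fedora PID it operates on. The action-to-CRUD table is an assumption
# based on CakePHP conventions, not Eureka's actual routing.
from urllib.parse import urlparse

ACTION_TO_CRUD = {
    "add": "create",
    "view": "retrieve",
    "edit": "update",
    "delete": "delete",
}


def parse_eureka_url(url: str) -> dict:
    """Return controller, action, PID and CRUD operation for a URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) != 3:
        raise ValueError(f"expected /controller/action/pid, got {url!r}")
    controller, action, pid = parts
    return {
        "controller": controller,
        "action": action,
        "pid": pid,
        "crud": ACTION_TO_CRUD.get(action, "unknown"),
    }


print(parse_eureka_url("http://web.server/chemicals/view/exptml:chm1"))
# {'controller': 'chemicals', 'action': 'view', 'pid': 'exptml:chm1', 'crud': 'retrieve'}
```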

Allows for searching of all files in Fedora
Can also search based on relationships
Can extract data out of XML files
Can gather data from other websites (via an API controller) and
add it to ExptML files
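Extracting data from XML files can be done with a standard XML parser. The Python sketch below reads values and relationship links from an ExptML-like record. The element and attribute names are invented for illustration and may not match the real ExptML schema.

```python
# Sketch: pull values out of an ExptML-like record with the standard library.
# Element and attribute names are hypothetical; the real ExptML schema
# (exptml.sourceforge.net) may differ.
import xml.etree.ElementTree as ET

RECORD = """
<chemical id="exptml:chm1">
  <name>sodium chloride</name>
  <formula>NaCl</formula>
  <property name="molar mass" unit="g/mol">58.44</property>
  <relationship type="isSubstanceOf" target="exptml:expt1"/>
</chemical>
"""


def extract(xml_text: str) -> dict:
    """Return selected fields, numeric properties and relationship links."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "name": root.findtext("name"),
        "formula": root.findtext("formula"),
        "properties": {
            p.get("name"): (float(p.text), p.get("unit"))
            for p in root.iter("property")
        },
        "links": [(r.get("type"), r.get("target"))
                  for r in root.iter("relationship")],
    }


print(extract(RECORD)["properties"])  # {'molar mass': (58.44, 'g/mol')}
```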

Typical things we record in our notebook

Eureka Website – Notebook

Conclusion
Eureka uses ExptML for representing science data
Reliable storage system for ExptML files (Fedora)
Method for storage of relationships (RDF in Fedora)
Web application to create ExptML files (Eureka)
TODO
Provide web functionality to process data
Provide mechanism for sharing of data (authenticated)
Integration into the RDA model for sharing research data
Integrate with many other websites, e.g. ChemSpider
Support enlItemManifest and future RDA specifications

References
Eureka – http://sourceforge.net/projects/eureka
Fedora-Commons – http://fedora-commons.org
XML – http://www.w3.org/standards/xml
ExptML – http://exptml.sourceforge.net/
JSON – http://www.json.org/
UnitsML – http://unitsml.nist.gov/
RDF – http://www.w3.org/RDF/
CIR – http://cactus.nci.nih.gov/chemical/structure
RDA – http://rd-alliance.org
