Scientists are looking for ways to leverage Web 2.0 technologies in the research laboratory, and as a consequence a number of approaches to web-based electronic notebooks are being evaluated. In this presentation I discuss the Eureka Research Workbench, an electronic laboratory notebook built on semantic technology and XML. Using this approach, the context of the information recorded in the laboratory can be captured and searched along with the data itself. A discussion of the current system is presented along with the next planned development of the framework and long-term plans relative to linked open data. Presented at the 246th American Chemical Society Meeting in Indianapolis, IN, USA on September 12th, 2013.
Eureka Research Workbench: A Semantic Approach to an Open Source Electronic Laboratory Notebook
1. Eureka Research Workbench:
A Semantic Approach
to an Open Source
Electronic Laboratory Notebook
Stuart J. Chalk
Department of Chemistry
University of North Florida
schalk@unf.edu
2013 Fall ACS Meeting – CINF Paper 116
2. Big Data
Electronic Notebooks
The Eureka Research Workbench
Experiment Markup Language
ExptML Schema and Files
Semantic Data and Ontologies
File Storage
Eureka Interface
Web Interface
Conclusion
Outline
3. Current buzzword for “bring together lots of data and
build tools on top to extract knowledge”
This is great, except…
How do we do that for science?
Platform, data structures, and exchange protocols to
capture, identify, and disseminate scientific information
Research Data Alliance (https://rd-alliance.org/)
http://www.nytimes.com/2013/08/13/science/how-to-share-scientific-data.html
Big Data
4. Electronic laboratory notebooks (ELNs) are very common in industry
Not appropriate for academics doing science
Expensive
Overly complicated (regulations)
Data sharing not easy
We need an electronic notebook for faculty/students
LabArchives http://www.labarchives.com
eCAT http://www.researchspace.com/electronic-lab-notebook/
LabTrove http://www.labtrove.org/
Dryad data publishing http://datadryad.org/
Electronic Notebooks
5. Started in 2006 as an offshoot of getting involved in the
Analytical Information Markup Language (AnIML) project
through ASTM
No way to store all research notes in a digital format
No way to capture the workflow of scientists
Realized writing in a lab notebook is equivalent to “multi-
type” blogging in the digital world
How to capture information? Many datatypes -> ExptML
How to store files and make them available through web
interface? (Fedora-Commons)
How to link data together? RDF (in Fedora-Commons)
Eureka Research Workbench
6. A specification (written in XML) that describes
different types of information recorded during the
scientific process (http://exptml.sourceforge.net)
Many datatypes (will expand…)
Experiment Markup Language (ExptML)
Annotation
Api
Calculation
Chemical
Citation
Communication
Customer
Data
Dataset
Definition
Element
Equipment
Event
Experiment
Group
Project
Protocol
Quote
Report
Result
Sample
Solution
Space
Specimen
Substance
Task
Template
Timeline
User
Vendor
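To make the idea of one XML file per datatype concrete, here is a minimal sketch of writing and then reading back a "Solution" record with Python's standard library. The ExptML schema itself is not reproduced in these slides, so the element names (`solution`, `name`, `substance`), the `id` and `ref` attributes, the identifiers, and the sample name are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical element and attribute names; NOT taken from the ExptML schema.
solution = ET.Element("solution", {"id": "exptml:sol1"})
ET.SubElement(solution, "name").text = "Stock standard"
ET.SubElement(solution, "substance", {"ref": "exptml:chm1"})

# Serialize the record to XML text, as it might be stored in a file.
xml_text = ET.tostring(solution, encoding="unicode")

# Read the record back: parse the text and pull out individual values.
parsed = ET.fromstring(xml_text)
name = parsed.findtext("name")
substance_refs = [s.get("ref") for s in parsed.findall("substance")]
```

The point of the sketch is only that each record is a small, self-describing XML document whose fields can be extracted again by name.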
11. Files that represent the data need to be ‘linked’ together to
allow the user to see the context of the data
The ‘Semantic Web’ is a big push to contextualize data
Proposed storage of ‘relationships’ between data is the
Resource Description Framework (RDF - http://www.w3.org/RDF/)
Semantic Data
From http://www.w3.org/TR/2004/REC-rdf-primer-20040210/
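An RDF statement is a triple of subject, predicate, and object. The following is a minimal sketch using plain Python tuples rather than an RDF library. The identifiers (`exptml:sol1`, `exptml:exp1`, and so on) and the `hasSolution` predicate are assumptions made for illustration; `hasSubstance` is the relationship name listed on the ExptML Ontology slide.

```python
from collections import namedtuple

# One RDF statement: subject --predicate--> object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical identifiers and relationships, for illustration only.
triples = [
    Triple("exptml:sol1", "hasSubstance", "exptml:chm1"),
    Triple("exptml:sol1", "hasSubstance", "exptml:chm2"),
    Triple("exptml:exp1", "hasSolution", "exptml:sol1"),
]

def objects_of(subject, predicate, store):
    """Return every object linked to `subject` by `predicate`."""
    return [t.object for t in store
            if t.subject == subject and t.predicate == predicate]

substances = objects_of("exptml:sol1", "hasSubstance", triples)
```

Querying the list by subject and predicate is what lets a user move from one record to the records that give it context.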
12. In computer science, an ontology
“formally represents knowledge as a set of concepts within
a domain, and the relationships between those concepts. It
can be used to model a domain and support reasoning about
concepts.”*
In essence, an ontology allows us to define the
relationships and assertions about concepts
For substances represented in ExptML we define
isSubstance (assertion)
hasSubstance
isSubstanceOf
ExptML Ontology
*https://en.wikipedia.org/wiki/Ontology_(information_science)
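One use of such an ontology is deriving statements that were never written down explicitly. The sketch below assumes that `hasSubstance` and `isSubstanceOf` are inverses of each other; the slide lists the two names but does not state this, so treat it as an assumption. The identifiers are hypothetical.

```python
# Assumed inverse pairs; the slide lists these names but does not
# state that they are inverses of each other.
INVERSE = {
    "hasSubstance": "isSubstanceOf",
    "isSubstanceOf": "hasSubstance",
}

def with_inverses(triples):
    """Return the triples plus the inverse statement implied by each one."""
    result = set(triples)
    for subject, predicate, obj in triples:
        if predicate in INVERSE:
            result.add((obj, INVERSE[predicate], subject))
    return result

# Hypothetical identifiers, for illustration only.
stated = {("exptml:sol1", "hasSubstance", "exptml:chm1")}
inferred = with_inverses(stated)
```

With a rule like this, recording that a solution has a substance is enough to also find the solution when starting from the substance.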
14. Digital repository software for creating and managing
online digital libraries
Stores the ExptML files
Stores any other files (PDFs, Images, Word etc.)
Stores relationships as RDF
Version control
Checksumming
Built in search of content and relationships
Fedora Commons
15. Fedora-Commons treats each ExptML file as an object
In the definition of a Fedora object the file is just one
stream of many. By default each object also has a “DC”
stream of metadata and a “RELS-EXT” stream of
relationships
Each Fedora object can have any number of additional
streams for
Paper PDFs, product/sample pictures, original file formats (if a
conversion has been done)
Video, audio, anything
You can export individual streams or the whole Fedora
object with streams binary encoded (Sharing/archiving)
File Storage
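The object-with-streams idea can be sketched as a small in-memory model. This is not the Fedora-Commons API: the class names, the `EXPTML` and `PDF` stream identifiers, the MIME types, and the choice of SHA-256 for checksumming are all assumptions for illustration. Only the `DC` and `RELS-EXT` stream names come from the slide.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class Datastream:
    stream_id: str
    mime_type: str
    content: bytes

    def checksum(self) -> str:
        """Hex SHA-256 digest of the stream content (algorithm is an assumption)."""
        return hashlib.sha256(self.content).hexdigest()

@dataclass
class RepositoryObject:
    pid: str
    streams: dict = field(default_factory=dict)

    def __post_init__(self):
        # Every object starts with the two default streams named on the slide.
        self.add(Datastream("DC", "text/xml", b""))
        self.add(Datastream("RELS-EXT", "application/rdf+xml", b""))

    def add(self, stream: Datastream) -> None:
        self.streams[stream.stream_id] = stream

# Hypothetical object: one ExptML file plus the PDF of a related paper.
obj = RepositoryObject("exptml:chm1")
obj.add(Datastream("EXPTML", "text/xml", b"<chemical/>"))
obj.add(Datastream("PDF", "application/pdf", b"%PDF-1.4"))
stream_ids = sorted(obj.streams)
```

Keeping every related file as a stream on one object is what makes it possible to export either a single stream or the whole object as one unit.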
17. So, finally to the Eureka Research Workbench!
Web interface written in PHP using the CakePHP Framework
Communicates with Fedora-Commons API to create,
retrieve, update and delete (CRUD) ExptML and other files
Representational State Transfer (REST) format for URLs
E.g. http://web.server/chemicals/view/exptml:chm1
Allows for searching of all files in Fedora
Can also search based on relationships
Can extract data out of XML files
Can gather data from other websites (via API controller) and
add it to ExptML files
Eureka Interface
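The example URL above follows a controller/action/identifier pattern. As a rough sketch of how such a URL breaks down (written in Python for illustration, not the PHP routing code Eureka actually uses), the hypothetical helper `parse_route` below splits the path into its three parts:

```python
from urllib.parse import urlparse

def parse_route(url):
    """Split a /controller/action/id URL path into its three parts."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"expected /controller/action/id, got {url!r}")
    controller, action, identifier = parts
    return {"controller": controller, "action": action, "id": identifier}

route = parse_route("http://web.server/chemicals/view/exptml:chm1")
```

Here the controller names the datatype, the action names the operation, and the identifier is the one used for the object in the repository.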
18. Eureka Website - Group
Only datatypes related to the
research group show up on left
19. Eureka Website – Lab Bench
Types of information that are things you
would have on your lab bench are on left
Clicking on the “Add” menu on the right
allows you to add a comment to this solution
21. Eureka Website - Laboratory
Information about resources that
you use in your laboratory
The “Rel” menu shows you the information related to this instrument
22. Eureka Website - Library
Papers and protocols
related to your work
You can add the PDF
of the paper to the
citation.
The contents of the
PDF are searchable in
the system
24. Robust markup language for representing science data
(ExptML)
Reliable storage system for ExptML files (Fedora)
Method for storage of relationships (RDF in Fedora)
Web application to create ExptML files (Eureka)
TODO
Provide web functionality to process data
Provide mechanism for sharing of data (different levels)
Integration into the RDA model for sharing research data
Get the word out and test system with many users
Conclusion