In a single generation, technology and economic conditions have radically altered the pace and practice of research. The scale and complexity of scientific data in the life sciences, once manageable in a laboratory notebook, have exploded. The number of software packages and distributed computational resources available to scientists for data storage and analysis has undergone a similar expansion. Once solitary, research is now increasingly team-based, spanning cross-disciplinary and cross-institutional collaborations. Collaboration that requires specialized scientific computing resources magnifies the challenges of integrating raw data and maintaining analysis provenance. Consequently, the full potential of these resources can only be realized if the entire pipeline from data collection to analysis can readily capture the annotations and intuition of each distributed collaborator. Currently, few tools integrate data management, provenance tracking and collaborative infrastructure into a package palatable to all stakeholders in this growing, distributed team.
Ovation™ (http://ovation.io) is a distributed and eventually consistent data management and collaboration platform. Ovation’s data model, interface and API are closely matched to the mental model of researchers, facilitating adoption by experimental and computational research teams. Ovation integrates with researchers’ existing acquisition and analysis tools, including Matlab, Python, R and Mathematica. The Ovation platform helps individual scientists organize their data and track provenance, and empowers collaborative project teams through sharing of data, annotations and analyses. I will share our experience in deploying Ovation to research groups in the life sciences and discuss the potential of deeper integration with computational resources such as those at the UW eScience Institute.
UW eScience Institute, April 2013
1. Challenges in Life Sciences data management and
cloud-enabled collaboration
Barry Wark, Ph.D.
Founder and President, Physion
barry@physion.us
Twitter @barryjwark
3. The nature of scientific research has changed, challenging
the fundamentals of the scientific method
Life scientists need solutions that help them bridge local
needs with global resources
Think globally, act locally
4.
5. The nature of scientific research has changed
fundamentally
Biology is a context-dependent system. Studying
context dependence requires lots of data.
‣Data volume
• High-content screening: desktop confocal can image 25,000 samples per day
• Human genome $5000, and falling fast
• IonWorks Barracuda® can perform 6,000 whole-cell patch clamp experiments per hour
‣Data variety
• “Coherent” data sets (e.g. Sage, Personal Genome Project)
• Behavior, anatomy, physiology, genomics experiments on the same subject
‣Analytical tools
• Central computing resources, elastic provisioning
• Open source software democratizes contribution and distribution
‣Teams
• Experimental and analytical specialization
• Research cores and consortia
• Distributed across organizations and institutions
6. Pipelined data flow through computational resources
[Diagram: Researcher dataset → Analyst → Result/Report]
7. Data that is not easily pipelined doesn’t get
incorporated
[Diagram: three Researcher datasets alongside a single Analyst → Result/Report pipeline; label: “Not scalable”]
8. Analysis provenance that transits individual
researchers is hard to track
[Diagram: four Researcher datasets and two Analyst → Result/Report pipelines]
9. Comprehensive data management must span the
entire data lifecycle
[Chart: tools plotted by Complexity/Cost (vertical axis) against Data lifecycle stage, from Acquisition to Analysis (horizontal axis): Enterprise SDMS, Analytical tools, ELN, OSF, Paper notebook, Figshare]
10. Comprehensive data management must span the
entire data lifecycle
[Chart: tools plotted by Complexity/Cost (vertical axis) against Data lifecycle stage, from Acquisition to Analysis (horizontal axis): Enterprise SDMS, Analytical tools, Ovation, ELN, OSF, Paper notebook, Figshare]
11. Ovation’s data model describes science
Ovation is built to represent the language of science. Scientific data, regardless of
discipline, fits this model.
Analogous example shows that representing music in the appropriate language of the domain
provides an appropriate data model
Lab notebook representation: Music, in the language of the domain expert. May include margin notes, etc.
Ovation representation: Computer representation in the language of the domain expert (including “margin notes”
from composer, conductor, etc.). Any genre of music is representable.
12. Ubiquitous data model is the correct granularity for
knowledge transfer
Ovation’s data model is more granular than an ELN. Instead of losing information
during conversion to (and from) a report format such as a Word document or PDF,
Ovation allows data to be transferred in the natural language and granularity of
science.
Information lost in transfer: an analogous example shows that transferring data via a “report” (a sound recording) produces an information bottleneck
Data transferred directly: seamless collaboration and data transfer remove information bottlenecks
13. Common data model enables collaboration
Interoperability across institutional boundaries is easier with Ovation than other
solutions. Unlike ad-hoc or customized data management systems, every Ovation
customer uses the same data model.
[Diagram: Individual researcher, Collaborators and Global community, linked by data transfer via the Ovation data model]
14. The Ovation data model for subject definition
Protocol Project
Subject
Epoch Experiment
Procedure
Subject
{
species : Drosophila melanogaster,
father : 79326326-9CC0-4770-8DC6-3695113C7A64,
mother : A2D40CFF-3016-41AE-AC67-BB09A7D8D9E1
}
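The subject record above links an animal to its parents by UUID. As a minimal sketch of why such links are useful, the Python below walks those references to collect a subject's known ancestors. The `Subject` class, the `registry` dictionary and the `ancestors` function are hypothetical names invented for illustration; they are not Ovation's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from uuid import UUID, uuid4


@dataclass
class Subject:
    """Hypothetical stand-in for a subject record (not Ovation's class)."""
    species: str
    father: Optional[UUID] = None
    mother: Optional[UUID] = None
    uuid: UUID = field(default_factory=uuid4)


def ancestors(subject: Subject, registry: Dict[UUID, Subject]) -> List[Subject]:
    """Follow father/mother UUIDs and return every ancestor found in the registry."""
    found: List[Subject] = []
    seen = set()
    stack = [subject.father, subject.mother]
    while stack:
        parent_id = stack.pop()
        if parent_id is None or parent_id in seen or parent_id not in registry:
            continue  # unrecorded, already visited, or unknown parent
        seen.add(parent_id)
        parent = registry[parent_id]
        found.append(parent)
        stack.extend([parent.father, parent.mother])
    return found


# The two parent UUIDs are the ones shown on the slide.
father = Subject("Drosophila melanogaster",
                 uuid=UUID("79326326-9CC0-4770-8DC6-3695113C7A64"))
mother = Subject("Drosophila melanogaster",
                 uuid=UUID("A2D40CFF-3016-41AE-AC67-BB09A7D8D9E1"))
offspring = Subject("Drosophila melanogaster",
                    father=father.uuid, mother=mother.uuid)
registry = {s.uuid: s for s in (father, mother, offspring)}
parents = ancestors(offspring, registry)
```

Storing parents as references rather than free-text notes is what makes a pedigree query like this possible without re-reading a notebook.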
15. The Ovation data model for measurements
Project
Subject
Protocol Experiment EquipmentSetup
Epoch
Procedure
Measurement
DataElement
16. The Ovation data model for analysis provenance
AnalysisRecord
Optionally named
AnalysisRecord Named
DataElement
Measurement Optionally named
DataElement
Measurement
DataElement
Measurement
DataElement
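One way to picture the provenance chain in this slide is as a graph in which each analysis record has named inputs and named outputs, so any result can be traced back to the raw measurements behind it. The sketch below is a simplified assumption about that structure; the classes, the `add_output` method and the `raw_sources` function are illustrative names only and do not reproduce Ovation's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class DataElement:
    """A named piece of data; produced_by is None for a raw measurement."""
    name: str
    produced_by: Optional["AnalysisRecord"] = None


@dataclass(eq=False)
class AnalysisRecord:
    """Hypothetical analysis step with named inputs and named outputs."""
    name: str
    inputs: Dict[str, DataElement]
    outputs: Dict[str, DataElement] = field(default_factory=dict)

    def add_output(self, name: str) -> DataElement:
        element = DataElement(name, produced_by=self)
        self.outputs[name] = element
        return element


def raw_sources(element: DataElement) -> List[str]:
    """Walk back through analysis records to the raw measurements."""
    if element.produced_by is None:
        return [element.name]
    names: List[str] = []
    for upstream in element.produced_by.inputs.values():
        names.extend(raw_sources(upstream))
    return names


trial_1 = DataElement("trial 1 recording")
trial_2 = DataElement("trial 2 recording")
detection = AnalysisRecord("spike detection", {"a": trial_1, "b": trial_2})
spikes = detection.add_output("spike times")
figure = AnalysisRecord("Figure 1", {"spikes": spikes}).add_output("Figure 1a")
sources = raw_sources(figure)
```

Because every output remembers the record that produced it, the question "which recordings does Figure 1a depend on?" becomes a graph walk rather than a search through email and file names.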
18. Ovation uses eventual consistency
Ovation chooses availability and partition tolerance
over consistency
(so you can work from the coffee shop)
[Diagram: Client 1, Client 2 and Cloud each hold copies of objects X and Y; while disconnected, the clients hold divergent versions X2 and X2′]
19. Ovation uses eventual consistency
This means conflicting edits can be made by
disconnected clients
Append-only (mostly) and user-isolated
changes at the edge of the object graph
minimize these conflicts.
[Diagram: Client 1, Client 2 and Cloud, with the conflicting versions X2 and X2′ held side by side next to Y1]
20. Ovation uses eventual consistency
Ovation requires users to resolve conflicts that
they have authority to decide during sync.
[Diagram: Client 1, Client 2 and Cloud after conflict resolution, with X at version 3 next to Y1]
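The three slides above describe a common pattern in eventually consistent systems: edits made offline are accepted, divergent edits are kept side by side, and a user later collapses them into one version. The sketch below illustrates that general pattern only. It is not Ovation's sync implementation, and `Revision`, `SharedObject`, `push` and `resolve` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Revision:
    value: str
    number: int
    parent: Optional[int]  # revision number the edit was based on
    author: str


class SharedObject:
    """Hypothetical cloud-side record that may hold several conflicting heads."""

    def __init__(self, initial: Revision) -> None:
        self.heads: List[Revision] = [initial]

    @property
    def in_conflict(self) -> bool:
        return len(self.heads) > 1

    def push(self, edit: Revision) -> None:
        """Accept an offline edit; keep it beside the others if it diverges."""
        for i, head in enumerate(self.heads):
            if head.number == edit.parent:
                self.heads[i] = edit  # edit builds on a current head
                return
        self.heads.append(edit)  # based on a superseded revision: conflict

    def resolve(self, value: str, author: str) -> Revision:
        """Collapse all heads into one new revision chosen by a user."""
        number = max(h.number for h in self.heads) + 1
        # A merge has several parents, so none is recorded in this sketch.
        merged = Revision(value, number, None, author)
        self.heads = [merged]
        return merged


x = SharedObject(Revision("X1", 1, None, "client 1"))
x.push(Revision("X2", 2, 1, "client 1"))   # first client syncs
x.push(Revision("X2'", 2, 1, "client 2"))  # second client edited the same X1
conflict_after_sync = x.in_conflict
resolved = x.resolve("X3", "client 1")
```

Neither client is blocked while offline; the cost is that the store must tolerate more than one head until someone with authority picks the outcome.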
21. Ovation Scientific Data Management System®
• Comprehensive data management
  • Multi-modality
  • Multi-user annotation
  • Analysis provenance
• Seamless user experience
  • Double-click installation
  • Integration with existing tools: Matlab, Python, R, Java
  • Guide to success
• Effective collaboration
  • Distributed and co-located experts
  • Data ownership maintained
  • Cloud-based replication and archiving
22. Integrated analysis workflow
Analysis pipelines that begin with a search facilitate
automatic incorporation of new results
Acquire → Organize → Search → Analyze
%% Run a simple query
iterator = context.query('Epoch', ' ...criteria... ');
while (iterator.hasNext())
    currEpoch = iterator.next();
    % ...analyze currEpoch...
end
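The point of starting a pipeline with a search can be shown without any database at all: if the analysis always re-runs its query, newly acquired data that matches the criteria is included the next time the pipeline runs. The Python below is a toy sketch of that idea, using an in-memory list in place of a data store. The `query` and `mean_response` functions and the record fields are assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Epoch = Dict[str, object]


def query(store: Iterable[Epoch], criteria: Callable[[Epoch], bool]) -> Iterator[Epoch]:
    """Yield every epoch in the store that satisfies the search criteria."""
    return (epoch for epoch in store if criteria(epoch))


def mean_response(epochs: Iterable[Epoch]) -> float:
    """Toy analysis step: average the 'response' field of the selected epochs."""
    values = [float(epoch["response"]) for epoch in epochs]
    return sum(values) / len(values)


def is_contrast_step(epoch: Epoch) -> bool:
    return epoch["protocol"] == "contrast-step"


store: List[Epoch] = [
    {"protocol": "contrast-step", "response": 2.0},
    {"protocol": "luminance-step", "response": 9.0},
    {"protocol": "contrast-step", "response": 4.0},
]

first_run = mean_response(query(store, is_contrast_step))

# A new experiment is acquired and organized; the pipeline itself is unchanged.
store.append({"protocol": "contrast-step", "response": 6.0})
second_run = mean_response(query(store, is_contrast_step))
```

The analysis code never names individual files, so nothing has to be edited when a collaborator adds another matching experiment.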
23. Integrated analysis workflow
Acquire Organize
Search Analyze
Acquire Organize
Replication technology allows Ovation to replicate a subset of the database for data locality within a computational cluster.
Execute workflows on a local or cloud cluster
24.
% (password is assumed to be a variable defined earlier in the session)
context = NewDataStoreCoordinator('username', password).getContext();
epochs = context.query(context.getQuery('query-name'));

%% analysis parameters
params = struct();
params.MaxLag = 1000;           % time window for cross-correlation function
params.ResponseDelayPts = 0;    % exclude at end of modulated light
params.MinAnalysisEpochs = 3;
params.FrequencyCutoff = 500;
params.FlushData = 1;

%% ANALYZE AND COLLECT RESULTS
% ====> ORIGINAL ANALYSIS CODE HERE <====

%% save analysis record for this figure
% (project, svnRevision and svnURL are assumed to be defined earlier in the session)
ar = project.insertAnalysisRecord('Figure 1', epochs, ...
    'AnalysisFunction.m', params, svnRevision, svnURL);
ar.setUserDescription('Manuscript - Figure 1');
ar.addTag('manuscript');
ar.addOutput('Figure 1a', './Figure1a.pdf');
ar.addResource('Figure 1b', './Figure1b.pdf');
26. Share data in context
Project Source
Experiment Experiment
Device
Trial Group
DerivedResponse
Trial Trial Trial
name: spikes
parameters: {…}
code: spikes.m
Stimulus Response
Stimulus Response
27. Ovation enables researchers to extract more
knowledge from existing data
• Lab’s lifetime work was enough data to answer fundamental questions about signal
and noise in the early visual system
• Data was locked in individuals’ ad-hoc data management
• Ovation enabled meta-analysis of this existing data
• New graduate students start with the old data, not new experiments
“Ovation has changed the way we do science…” (Fred Rieke)
[Background image: journal page with running head “Doan et al. • Arrestin Competition”, (38):11867–11879]
30. ovation.io
• Store and archive all your data
  • Safe, secure, highly reliable cloud storage
  • “Offline” archiving
• Collaborate locally and globally
  • Share selected data with designated users or the public
• Make your data available wherever you need it
  • Replicate and synchronize data to multiple devices
• Benefit from our scalable cloud-based architecture
  • Pay for what you use
  • Simple monthly fee
32. Collaboration with ovation.io
[Slide images: a protein sequence record (sp|P63252, residues 1-427) and a page from a Neuron article with running head “Inference in Visual Adaptation”, showing Figure 1, “The Time Course of Adaptation following an Increase in Temporal Contrast Depends on the Period between Contrast Switches”]
33. Analysis provenance that transits individual
researchers is hard to track
[Diagram: four Researcher datasets and two Analyst → Result/Report pipelines]