Slides from my short talk at INCF 2013 (Neuroinformatics, the INCF annual meeting) in Stockholm. I talk about the realities of data sharing and a proposal to make it easier through the use and adoption of electronic lab notebooks. The project is a collaboration between Carnegie Mellon University and Elsevier Research Data Services.
Towards reusable experiments: making metadata while you measure
1. Towards Reusable Experiments: Making Metadata While You Measure
Shreejoy Tripathy
PhD student, Carnegie Mellon
Email: stripat3@gmail.com
Twitter: @neuronJoy
3. Barriers to data sharing
• Social
– “What’s in it for me? How will I get credit?”
– “It’s my data, not yours”
– “The benefit to me isn’t worth the time I put into it”
– “What if I get scooped?”
• Methodological
– “How do I share data? What do I share?”
– “Going back and annotating my files to share is super time-consuming”
– Specifying file formats, data standards
– Building FTP servers and nice user interfaces
4. Project idea
• How can we make a standard neuroscience wet lab more data-sharing savvy?
• Incorporate structured workflows into the daily practice of a typical electrophysiology lab (the Urban Lab at CMU)
– What does it take?
– Where are points of conflict?
5. Key insights/motivations
1. Effective data sharing includes raw data files + experimental metadata (typically stored in a lab notebook)
(Figure: example raw data file, SDB_MC_12_voltages.mat)
6. Key insights/motivations
1. Share raw data files + experimental metadata
2. You know the most about an experiment when you’re performing it
7. Key insights/motivations
1. Share raw data files + experimental metadata
2. You know the most about an experiment when you’re performing it
3. Improved data practices should make labs more productive
10. Metadata app
• Electronic lab notebook models the sequential slice-electrophysiology workflow
– Replaces the pen-and-paper lab notebook
11. Metadata data entry
• Electronic lab notebook allows structured data entry
(Screenshot: “Animal Strain” entry field)
12. Metadata data entry
• Electronic lab notebook allows structured data entry (e.g., dropdown menus)
– Allows incorporation of semantic ontologies
• Important to strike a balance between structure and flexibility
(Screenshot: strain dropdown showing ontology ID MGI:3719486)
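The structure-versus-flexibility trade-off above can be sketched as a dropdown field that stores an ontology identifier when the experimenter picks a listed option and falls back to free text otherwise. This is a minimal Python sketch, not the app's actual implementation: `DropdownField`, `record`, and the placeholder option label are hypothetical, and only the identifier MGI:3719486 comes from the slide (pairing it with a label here is purely illustrative).

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Mouse Genome Informatics accession IDs look like "MGI:3719486".
MGI_ID = re.compile(r"^MGI:\d+$")


@dataclass
class DropdownField:
    """Hypothetical notebook field: ontology-backed options plus optional free text."""
    name: str
    options: Dict[str, str]  # display label -> ontology identifier
    allow_free_text: bool = True

    def __post_init__(self) -> None:
        # Reject malformed identifiers when the field is defined, not at entry time.
        for label, term_id in self.options.items():
            if not MGI_ID.match(term_id):
                raise ValueError(f"{label!r} has malformed ontology ID {term_id!r}")

    def record(self, choice: str, note: str = "") -> Dict[str, Optional[str]]:
        """Turn the experimenter's choice into a notebook entry."""
        if choice in self.options:
            # Structured path: keep the readable label and the ontology ID.
            return {"field": self.name, "label": choice,
                    "ontology_id": self.options[choice], "note": note}
        if self.allow_free_text:
            # Flexible path: keep the text but mark it as unmapped (no ontology ID).
            return {"field": self.name, "label": choice,
                    "ontology_id": None, "note": note}
        raise ValueError(f"{choice!r} is not an option for {self.name!r}")


if __name__ == "__main__":
    strain = DropdownField(
        "animal_strain",
        {"example strain (placeholder label)": "MGI:3719486"},
    )
    print(strain.record("example strain (placeholder label)")["ontology_id"])  # MGI:3719486
    print(strain.record("unlisted cross", note="see breeding records")["ontology_id"])  # None
```

Free-text entries are kept rather than rejected, but their missing `ontology_id` makes them easy to find and map later.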
13. Metadata data entry
• Electronic lab notebook facilitates entry of new content, like registration of recorded neurons to a brain atlas
14. Data integration
• Metadata app and electrophysiology data acquisition are synced via a server
– Each trace of experimental data is annotated with metadata
• Currently IGOR Pro-specific; support for pClamp and other acquisition packages to be added as needed
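One way to picture trace-level annotation is a time-window join: each e-notebook entry is open for some interval, and each acquired trace inherits the metadata of the entry that was open when it was recorded. The slides do not say how the server performs the sync, so this Python sketch assumes timestamp matching; `MetadataEntry`, `Trace`, and `annotate` are hypothetical names, and the demo dates and file names are made up.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class MetadataEntry:
    """One e-notebook record (e.g. one recorded cell) and when it was active."""
    start: datetime
    end: Optional[datetime]  # None while the entry is still open
    fields: Dict[str, str]


@dataclass
class Trace:
    """One acquired sweep, as reported by the acquisition software."""
    filename: str
    acquired_at: datetime


def annotate(traces: List[Trace], entries: List[MetadataEntry]) -> List[dict]:
    """Attach to each trace the metadata entry whose time window contains it."""
    annotated = []
    for trace in traces:
        match = next(
            (e for e in entries
             if e.start <= trace.acquired_at
             and (e.end is None or trace.acquired_at < e.end)),
            None,
        )
        annotated.append({
            "file": trace.filename,
            "acquired_at": trace.acquired_at.isoformat(),
            # Copy the fields so later notebook edits don't silently alter this record.
            "metadata": dict(match.fields) if match else None,
        })
    return annotated


if __name__ == "__main__":
    cells = [
        MetadataEntry(datetime(2013, 1, 1, 10), datetime(2013, 1, 1, 11), {"cell": "1"}),
        MetadataEntry(datetime(2013, 1, 1, 11), None, {"cell": "2"}),
    ]
    sweeps = [Trace("sweep_a", datetime(2013, 1, 1, 10, 30)),
              Trace("sweep_b", datetime(2013, 1, 1, 11, 15))]
    for row in annotate(sweeps, cells):
        print(row["file"], row["metadata"])
```

Traces that fall outside every entry get `metadata: None` instead of being dropped, so gaps in annotation stay visible.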
16. Data dashboard (future-steps)
• Use collected metadata to sort experiments
– e.g., by mouse strain, neuron type, animal age
• Enable in-browser analyses
– Track provenance of analyzed data back to raw data
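The dashboard goals (sort experiments by metadata, run analyses, and trace results back to raw data) can be sketched with a small in-memory model. This is a Python sketch under stated assumptions, not the planned dashboard: the class and function names and the metadata keys `strain` and `age_days` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Any, Callable, Dict, List


@dataclass
class Experiment:
    exp_id: str
    metadata: Dict[str, Any]  # e.g. strain, neuron type, age from the e-notebook
    raw_files: List[str]      # raw traces this experiment produced


@dataclass
class DerivedResult:
    """An analysis output that records exactly which raw data it came from."""
    analysis: str
    value: Any
    source_experiments: List[str]
    raw_files: List[str]


def select(experiments: List[Experiment], **criteria: Any) -> List[Experiment]:
    """Keep experiments whose metadata matches every key=value criterion."""
    return [e for e in experiments
            if all(e.metadata.get(k) == v for k, v in criteria.items())]


def sort_by(experiments: List[Experiment], *keys: str) -> List[Experiment]:
    """Sort on metadata keys; experiments missing a key sort last for that key."""
    return sorted(experiments,
                  key=lambda e: tuple((e.metadata.get(k) is None, e.metadata.get(k))
                                      for k in keys))


def run_analysis(name: str, fn: Callable[[List[Experiment]], Any],
                 experiments: List[Experiment]) -> DerivedResult:
    """Run fn on the experiments and keep provenance back to the raw files."""
    return DerivedResult(
        analysis=name,
        value=fn(experiments),
        source_experiments=[e.exp_id for e in experiments],
        raw_files=[f for e in experiments for f in e.raw_files],
    )


if __name__ == "__main__":
    experiments = [
        Experiment("e1", {"strain": "A", "age_days": 30}, ["e1_trace1"]),
        Experiment("e2", {"strain": "B", "age_days": 20}, ["e2_trace1"]),
        Experiment("e3", {"strain": "A", "age_days": 25}, ["e3_trace1", "e3_trace2"]),
    ]
    strain_a = sort_by(select(experiments, strain="A"), "age_days")
    result = run_analysis("mean age (days)",
                          lambda xs: mean(x.metadata["age_days"] for x in xs),
                          strain_a)
    print(result.value, result.raw_files)  # 27.5 ['e3_trace1', 'e3_trace2', 'e1_trace1']
```

Because every `DerivedResult` carries its source experiment IDs and raw file names, a number shown in the dashboard can always be traced back to the traces that produced it.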
17. Next steps
• Use the tools we have built
– Populate data server with many experiments
• Is using the e-notebook too burdensome?
– If yes, continue to iterate
– What can we ask now that we couldn’t before?
• It is much easier to ask exploratory questions, such as:
– How is the cell type that Shawn records different from the one that Matt records?
• Exposing data to neuroscience databases
– NIF, INCF Dataspace, neuroelectro.org
• How adaptable are these solutions for use in other labs?
• Who is going to pay for this?
18. Acknowledgements
• Carnegie Mellon
– Shreejoy Tripathy
– Nathan Urban
– Shawn Burton
– Rick Gerkin
– Santosh Chandrasekaran
– Matthew Geramita
• Elsevier Research
Data Services
– Anita de Waard
– Mark Harviston
– Jez Alder
– Sarah Tyrchniewicz
– David Marques
– (funding!)
19. Next steps
• Roll out updated app to experimentalists
• Populate database with the contents of many experiments
• Flesh out data dashboard functionality
• Investigate the new things that we can achieve given these tools
20. Effective data sharing is…
• Not just the experimental data files
– But also the experimental metadata: What was done? What does this variable mean? This is usually stored in PHYSICAL lab notebooks, understandable only by the experimenter
• Effective data sharing: someone other than the person who collected the data can understand the experiment and the data
21. App user testing
• “I don’t like the way the app forces me through a specific workflow; I want to enter experimental data when I see fit”
• “I’m not opposed to the idea of dropdowns, but I want more flexibility, more text fields”
• “When I use a lab notebook, I only write down the absolute minimum. Can the app’s fields be prepopulated with the results of an old experiment?”
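The last request, prefilling a new entry from an old experiment, can be sketched as copying the previous entry while clearing fields that always change and flagging the copied values for confirmation. The slides do not describe how (or whether) the app implements this; the field names and the choice of which fields to reset are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical: fields that change with every experiment and must be re-entered.
PER_EXPERIMENT_FIELDS = frozenset({"date", "cell_id", "notes"})


def prepopulate(previous: Dict[str, str],
                reset: Iterable[str] = PER_EXPERIMENT_FIELDS) -> Dict[str, dict]:
    """Build a draft entry from an earlier experiment's entry."""
    reset = set(reset)
    draft = {}
    for name, value in previous.items():
        if name in reset:
            # Always-changing fields start blank.
            draft[name] = {"value": "", "carried_over": False}
        else:
            # Copied values are flagged so the app can ask the experimenter to confirm.
            draft[name] = {"value": value, "carried_over": True}
    return draft


def unconfirmed(draft: Dict[str, dict]) -> List[str]:
    """Names of fields still holding carried-over values, in sorted order."""
    return sorted(name for name, entry in draft.items() if entry["carried_over"])


if __name__ == "__main__":
    last_time = {"animal_strain": "MGI:3719486", "internal_solution": "solution-1",
                 "date": "day-1", "cell_id": "c7"}
    draft = prepopulate(last_time)
    print(unconfirmed(draft))  # ['animal_strain', 'internal_solution']
```

Flagging rather than silently copying keeps the "write down the absolute minimum" habit from turning into stale metadata.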
22. What is effective data sharing?
• Effective data sharing: someone other than the person who collected the data can understand the experiment and the data
– i.e., datasets should be more or less self-describing
– >90% of data sharing use cases are an experimentalist sharing data with a future version of herself or with a labmate
Tangible benefits of data sharing: more people can collaborate on the same project, which should lead to more productivity and better science (= “Nature paper”)
Walk through the pieces one by one; also mention that this is very much a work in progress