The possibility and probability of a global Neuroscience Information Framework
1. The possibility and probability
of establishing a global
neuroscience information
framework:
lessons learned from practical experiences
in data integration for neuroscience
Maryann Martone, Ph. D.
University of California, San Diego
2. “Neural Choreography”
“A grand challenge in neuroscience is to elucidate brain function in relation
to its multiple layers of organization that operate at different spatial and
temporal scales. Central to this effort is tackling “neural choreography” --
the integrated functioning of neurons into brain circuits--their spatial
organization, local and long-distance connections, their temporal
orchestration, and their dynamic features. Neural choreography cannot
be understood via a purely reductionist approach. Rather, it entails the
convergent use of analytical and synthetic tools to gather, analyze and
mine information from each level of analysis, and capture the emergence
of new layers of function (or dysfunction) as we move from studying
genes and proteins, to cells, circuits, thought, and behavior....
However, the neuroscience community is not yet fully engaged in exploiting the
rich array of data currently available, nor is it adequately poised to capitalize
on the forthcoming data explosion.”
Akil et al., Science, Feb 11, 2011
3. On the other hand...
In that same issue, Science asked its peer reviewers from the previous year
about the availability and use of data:
About half of those polled store their data only in their
laboratories—not an ideal long-term solution.
Many bemoaned the lack of common metadata and archives as a
main impediment to using and storing data, and most of the
respondents have no funding to support archiving
And even where accessible, much data in many fields is too poorly
organized to enable it to be efficiently used.
“...it is a growing challenge to ensure that data produced during the
course of reported research are appropriately
described, standardized, archived, and available to all.” Lead Science
editorial (Science 11 February 2011:Vol. 331 no. 6018 p. 649 )
4. We speak piously of taking
measurements and making small
studies that will add another brick
to the temple of science. Most
such bricks just lie around the
brickyard.
Platt, J.R. (1964) Strong Inference.
Science 146: 347-353.
“We now have unprecedented
ability to collect data about
nature…but there is now a crisis
developing in biology, in that
completely unstructured information
does not enhance understanding”
-Sidney Brenner
5. The
Encyclopedia
of Life
A…
Access to data has
changed over the
years
Tim Berners-Lee: Web of data
Wikipedia defines Linked Data as “a term used
to describe a recommended best practice for
exposing, sharing, and connecting pieces of
data, information, and knowledge on the
Semantic Web using URIs and RDF.”
http://linkeddata.org/
GenBank
PDB
6. Are we there yet?
We’d like to be able to find:
What is known****:
What is the average diameter of a Purkinje
neuron?
Is GRM1 expressed in cerebral cortex?
What are the projections of hippocampus?
What genes have been found to be
upregulated in chronic drug abuse in adults?
What studies used my monoclonal mouse
antibody against GAD in humans?
Find all instances of spines that
contain membrane-bound organelles
****by combining data from different
sources and different groups
What is not known:
Connections among data
Gaps in knowledge
We’d like it to be really simple to
implement and use:
– Query interface
– Search strategies
– Data sources
– Infrastructure
– Results display
– Trust
– Context
– Analysis tools
– Tools for translating existing
content into linkable form
– Tools for creating new data ready
to be linked
7. NIF is an initiative of the NIH Blueprint consortium of institutes
What types of resources (data, tools, materials, services) are
available to the neuroscience community?
How many are there?
What domains do they cover? What domains do they not cover?
Where are they?
Web sites
Databases
Literature
Supplementary material
Who uses them?
Who creates them?
How can we find them?
How can we make them better in the future? http://neuinfo.org
A look into the brickyard
• PDF files
• Desk drawers
8. How many resources are there?
•NIF Registry: A
catalog of
neuroscience-relevant
resources
•> 3500 currently
described
•> 1700 databases
•Another 3000
awaiting curation
•And we are finding
more every day
9. But we have Google!
Current web is designed
to share documents
Documents are
unstructured data
Much of the content of
digital resources is part of
the “hidden web”
Wikipedia: The Deep Web
(also called Deepnet, the
invisible Web, DarkNet, Undernet
or the hidden Web) refers
to World Wide Web content
that is not part of the
Surface Web, which is
indexed by standard
search engines.
10. A tip of the “resourceome”
Microarray: 9,535,440
Model organisms: 246,639
Connectivity: 26,443
Antibodies: 890,571
Pathways: 43,013
Brain Activation Foci: 56,591
65 databases
11. But we have PubMed!
Bulk of neuroscience data
is published as part of
papers
> 20,000,000
Structured vs
unstructured information
“...it is a growing challenge to ensure that
data produced during the course of reported
research are appropriately
described, standardized, archived, and
available to all.” Lead Science editorial
(Science 11 February 2011: Vol. 331 no. 6018 p.
649 )
Author, year, journal, keywords
Content
12. The Neuroscience Information Framework: Discovery and
utilization of web-based resources for neuroscience
A portal for finding and
using neuroscience
resources
A consistent framework for
describing resources
Provides simultaneous
search of multiple types of
information, organized by
category
Supported by an expansive
ontology for neuroscience
Utilizes advanced
technologies to search the
“hidden web”
http://neuinfo.org
UCSD, Yale, CalTech, George Mason, Washington Univ.
Supported by NIH Blueprint
Literature
Database
Federation
Registry
13. Neuroscience is unlikely to be
served by a few large databases,
as the genomics and proteomics
communities are
Whole brain data
(20 um
microscopic MRI)
Mosaic LM
images (1 GB+)
Conventional LM
images
Individual cell
morphologies
EM volumes &
reconstructions
Solved molecular
structures
No single technology serves these all
equally well.
Multiple data types; multiple
scales; multiple databases
A data federation problem
14. NIF Data Federation
Too many databases to visit
Registry not adequate for finding and using them
Capturing content in a few keywords is difficult if not impossible
Access to deep content; currently searches over 30 million records from > 65
different databases
Flexible tools for resource providers to make their content available as easily and
meaningfully as possible
Organized according to level of nervous system and data type, e.g., brain
activation foci
Link to host resource: these databases are independent!
Provides simplified and unified views to help users navigate very different
resources
Common vocabularies
Common data models for basic neuroscience data
Laying the foundations for data integration for neuroscience
16. Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”
Query expansion: Synonyms
and related concepts
Boolean queries
Data sources
categorized by
“data type” and
level of nervous
system
Simplified views of
complex data
sources
Tutorials for using
full resource when
getting there from
NIF
Link back to
record in
original
source
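The query expansion shown on this slide can be sketched roughly as follows. This is a minimal illustration, not NIF's implementation: the `SYNONYMS` table and the `expand_query` function are hypothetical names, and a real system would draw its synonyms from an ontology rather than a hand-written dictionary.

```python
# Hypothetical synonym table; a real system would look these up in an ontology.
SYNONYMS = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}

def expand_query(term: str) -> str:
    """Return a Boolean OR query covering the term and its known synonyms."""
    variants = [term] + SYNONYMS.get(term.lower(), [])
    # Quote multi-word variants so a search engine treats them as phrases.
    quoted = [f'"{v}"' if " " in v else v for v in variants]
    return " OR ".join(quoted)

print(expand_query("Hippocampus"))
```

A term with no entry in the table is returned unchanged, so the expansion never narrows a search.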
17. What are the connections of the
hippocampus?
Connects to
Synapsed with
Synapsed by
Input region
innervates
Axon innervates
Projects to
Cellular contact
Subcellular contact
Source site
Target site
Each resource implements a different, though related model;
systems are complex and difficult to learn, in many cases
18. NIF: Minimum requirements to use shared
data
You (and the machine) have to be able to find it
Accessible through the web
Structured or semi-structured
Annotations
You (and the machine) have to be able to use it
Data type specified and in a usable form
You (and the machine) have to know what the data
mean
Semantics
Context: Experimental metadata
Reporting neuroscience data within a consistent framework helps enormously
19. Is GRM1 in cerebral cortex?
NIF system allows easy search over multiple sources of information
But, we have difficulty finding data
Well known difficulties in search
Inconsistent and sparse annotation of scientific data
Many different names for the same thing
The same name means many things
“Hidden semantics”: 1 = male; 1 = present; 1 = mouse
Allen Brain Atlas
MGD
Gensat
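The "hidden semantics" problem can be made concrete with a small sketch. Everything here is hypothetical: the source names, column names, and codebooks are invented for illustration, on the assumption that each resource publishes what its integer codes mean.

```python
# Hypothetical per-source codebooks: the same integer means different things.
CODEBOOKS = {
    "source_a": {"column": "sex", "values": {1: "male", 2: "female"}},
    "source_b": {"column": "expression", "values": {0: "absent", 1: "present"}},
    "source_c": {"column": "species", "values": {1: "mouse", 2: "rat"}},
}

def decode(source: str, code: int) -> tuple:
    """Translate a bare code into (attribute, meaning) using the source's codebook."""
    book = CODEBOOKS[source]
    return book["column"], book["values"][code]

# The value 1 is only interpretable once the source is known.
for name in CODEBOOKS:
    print(name, decode(name, 1))
```

Without the codebook, a machine (or a human) combining these three sources has no way to tell that the three 1s are unrelated.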
21. What is an ontology?
Brain has a Cerebellum
Cerebellum has a Purkinje Cell Layer
Purkinje Cell Layer has a Purkinje cell
Purkinje cell is a neuron
Ontology: an explicit, formal
representation of concepts and the
relationships among them
within a particular domain that
expresses human knowledge in a
machine-readable form
Branch of philosophy: a theory
of what is
e.g., Gene ontologies
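The "has a" and "is a" relationships on this slide can be expressed in machine-readable form along these lines. This is a toy sketch using plain dictionaries, not OWL or any NIF data structure; the function names are illustrative.

```python
# "has a" (part) relationships from the slide, as a simple adjacency map.
HAS_PART = {
    "Brain": ["Cerebellum"],
    "Cerebellum": ["Purkinje cell layer"],
    "Purkinje cell layer": ["Purkinje cell"],
}
# "is a" (type) relationships.
IS_A = {"Purkinje cell": "neuron"}

def parts_of(structure):
    """Return all direct and indirect parts by following 'has a' transitively."""
    found = []
    stack = list(HAS_PART.get(structure, []))
    while stack:
        part = stack.pop()
        if part not in found:
            found.append(part)
            stack.extend(HAS_PART.get(part, []))
    return found

def contains_kind(structure, kind):
    """True if any part of the structure is declared to be of the given kind."""
    return any(IS_A.get(part) == kind for part in parts_of(structure))

print(parts_of("Brain"))
print(contains_kind("Brain", "neuron"))
```

Nothing states directly that the brain contains neurons; the answer follows from chaining the stated relationships.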
22. What can ontology do for us?
Express neuroscience concepts in a way that is machine readable
Synonyms, lexical variants
Definitions
Provide means of disambiguation of strings
Nucleus part of cell; nucleus part of brain; nucleus part of atom
Rules by which a class is defined, e.g., a GABAergic neuron is a neuron that releases
GABA as a neurotransmitter
Properties
Provide universals for navigating across different data sources
Semantic “index”
Perform reasoning
Link data through relationships not just one-to-one mappings
Provide the basis for concept-based queries to probe and mine data
As a branch of philosophy, make us think about the nature of the
things we are trying to describe, e.g., synapse is a site
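A rule-defined class such as "GABAergic neuron" can be sketched as a predicate over properties rather than a hand-maintained list. The records below are a hypothetical miniature dataset, and the field names `is_a` and `releases` are assumptions made for this example.

```python
# Hypothetical cell records described by their properties.
CELLS = [
    {"name": "Purkinje cell", "is_a": "neuron", "releases": "GABA"},
    {"name": "Cerebellar granule cell", "is_a": "neuron", "releases": "glutamate"},
    {"name": "Astrocyte", "is_a": "glial cell", "releases": None},
]

def gabaergic_neurons(cells):
    """Apply the rule: a GABAergic neuron is a neuron that releases GABA."""
    return [
        c["name"]
        for c in cells
        if c["is_a"] == "neuron" and c["releases"] == "GABA"
    ]

print(gabaergic_neurons(CELLS))
```

Because membership is computed from the rule, a newly described cell type joins the class as soon as its properties are recorded.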
23. Linking datatypes to semantics: What is
the average diameter of a Purkinje
neuron dendrite?
Branch structure not a
tree, not a set of blood
vessels, not a road map but a
DENDRITE
Because anyone who uses
Neurolucida uses the same
concepts: axon, dendrite, cell
body, dendritic
spine, information systems
can combine the data
together in meaningful ways
Neurolucida
doesn’t, however, tell you that
dendrite belongs to a neuron
of a particular type or whether
this dendrite is a neural
dendrite at all
( (Color Yellow) ; [10,1]
(Dendrite)
( 5.04 -44.40 -89.00 1.32) ; Root
( 3.39 -44.40 -89.00 1.32) ; R, 1
(
( 2.81 -45.10 -90.00 0.91) ; R-1, 1
( 2.81 -45.18 -90.00 0.91) ; R-1, 2
( 1.90 -46.01 -90.00 0.91) ; R-1, 3
( 1.82 -46.09 -90.00 0.91) ; R-1, 4
( 0.91 -46.59 -90.00 0.91) ; R-1, 5
( 0.41 -46.83 -92.50 0.91) ; R-1, 6
(
( -0.66 -46.92 -88.50 0.74) ; R-1-1, 1
( -0.74 -46.92 -88.50 0.74) ; R-1-1, 2
( -2.15 -47.25 -88.00 0.74) ; R-1-1, 3
( -2.15 -47.33 -88.00 0.74) ; R-1-1, 4
( -3.06 -47.00 -87.00 0.74) ; R-1-1, 5
( -4.05 -46.92 -86.00 0.74) ; R-1-1, 6
Output of Neurolucida neuron trace
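As a rough illustration of what a machine can and cannot do with this trace, the sketch below parses a few of the sample points shown above and averages their last column. It assumes each point is written as (x y z diameter), and it takes an unweighted mean over sample points rather than a length-weighted one; the regular expression and function names are illustrative only.

```python
import re

# A few lines copied from the trace above.
TRACE = """\
(Dendrite)
( 5.04 -44.40 -89.00 1.32) ; Root
( 3.39 -44.40 -89.00 1.32) ; R, 1
( 2.81 -45.10 -90.00 0.91) ; R-1, 1
( -0.66 -46.92 -88.50 0.74) ; R-1-1, 1
"""

NUMBER = r"(-?\d+\.?\d*)"
POINT = re.compile(r"\(\s*" + r"\s+".join([NUMBER] * 4) + r"\s*\)")

def parse_points(text):
    """Return (x, y, z, d) tuples for every line that holds a sample point."""
    points = []
    for line in text.splitlines():
        match = POINT.search(line)
        if match:
            points.append(tuple(float(g) for g in match.groups()))
    return points

def mean_diameter(points):
    """Unweighted mean of the fourth value, assumed here to be the diameter."""
    return sum(p[3] for p in points) / len(points)

points = parse_points(TRACE)
print(len(points), mean_diameter(points))
```

The file supplies the label "Dendrite" and the geometry, but the cell type has to come from annotation outside the file, which is the point the slide makes.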
24. “A rose by any other name...”:
Identity:
Entities are uniquely identifiable
Name is a meaningless numerical identifier (URI: Uniform resource identifier)
Any number of human readable labels can be assigned to it
Definition:
Genera: is a type of (cell, anatomical structure, cell part)
Differentia: “has a” A set of properties that distinguish among members of that
class
Can include necessary and sufficient conditions
Implementation: How is this definition expressed
Depending on the nature of the concept or entity and the needs of the
information system, we can say more or fewer things
Different languages; can express different things about the concept that can be
computed upon
OWL (W3C standard), RDF
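The separation of identity from name can be sketched as a lookup from any human-readable label to one opaque identifier. The two identifiers below appear elsewhere in this deck (birnlex_2027, birnlex_167); the extra labels "mAb" and "Mus musculus" and the `resolve` helper are additions made for illustration, not part of NIF.

```python
# Opaque identifier -> any number of human-readable labels.
ENTITIES = {
    "birnlex_2027": ["Monoclonal antibody", "mAb"],
    "birnlex_167": ["Mouse", "Mus musculus"],
}

# Reverse index so every label resolves to the same identifier.
LABEL_INDEX = {
    label.lower(): identifier
    for identifier, labels in ENTITIES.items()
    for label in labels
}

def resolve(label):
    """Return the identifier for a label, or None if the label is unknown."""
    return LABEL_INDEX.get(label.strip().lower())

print(resolve("mouse"), resolve("Mus musculus"))
```

Labels can be added, translated, or corrected without touching the identifier that data records point to.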
25. Comprehensive Ontology
NIF covers multiple structural scales and domains of relevance to neuroscience
Aggregate of community ontologies with some extensions for
neuroscience, e.g., Gene Ontology, ChEBI, Protein Ontology
Simple, basic “is a” hierarchies that can be used “as is” or to form the building blocks
for more complex representations
NIFSTD modules: Organism, NS Function, Molecule, Investigation,
Subcellular structure, Macromolecule, Gene, Molecule Descriptors,
Techniques, Reagent, Protocols, Cell, Resource, Instrument,
Dysfunction, Quality, Anatomical Structure
26. Query across resources: Snca
and striatum
NIF uses the NIFSTD ontologies to query across sources that use very
different terminologies, symbolic notations and levels of granularity
28. Concept-based search: search by meaning
Search Google: GABAergic neuron
Search NIF: GABAergic neuron
NIF automatically searches for types of
GABAergic neurons
Types of GABAergic
neurons
29. Data mining through
interrogation
What genes are upregulated by drugs of abuse in the adult
mouse?
Morphine
Increased
expression
Adult Mouse
30. Integration of knowledge based on
relationships
Looking for commonalities and distinctions among animal
models and human conditions based on phenotypes
Sarah Maynard, Chris Mungall, Suzie Lewis NINDS
Thalamus
Cellular inclusion
Midline nuclear
group
Lewy Body
Paracentral nucleus
Cellular inclusion
31. And now, the literature
The scientific article remains the currency of science
Vast majority of neuroscience data is published in
the literature
Computational biologists like to consume data
Neuroscientists like to produce it
Two NIF projects:
1) Resource identification from the literature
Identifying antibodies used in scientific studies from
text
2) Extracting data from tables and supplementary
material
32. Neuroscience is fundamentally reliant on antibodies
Neuroscientists spend a lot of time searching for antibodies
that will work in their system for the target of interest and
troubleshooting experiments that didn’t work
The scientific literature is a major source of information on
antibodies
Proposal
Use text mining strategies to identify antibodies, protocol
type and subject organism from materials and methods
section of J. Neuroscience
Problem: antibodies
33. Midfrontal cortex tissue samples from neurologically unimpaired subjects (n = 9)
and from subjects with AD (n = 11) were obtained from the Rapid Autopsy
Program
Immunoblot analysis and antibodies
The following antibodies were used for immunoblotting: -actin mAb (1:10,000
dilution, Sigma-Aldrich); -tubulin mAb (1:10,000, Abcam); T46 mAb (specific to tau 404–
441, 1:1000, Invitrogen); Tau-5 mAb (human tau 218–225, 1:1000, BD Biosciences) (Porzig et
al., 2007); AT8 mAb (phospho-tau Ser199, Ser202, and Thr205, 1:500, Innogenetics); PHF-1
mAb (phospho-tau Ser396 and Ser404, 1:250, gift from P. Davies); 12E8 mAb (phospho-tau
Ser262 and Ser356, 1:1000, gift from P. Seubert); NMDA receptors 2A, 2B and 2D goat pAbs (C
terminus, 1:1000, Santa Cruz Biotechnology)…
Semantic annotation: Entity mapping by human
Sato et al., J. Neurosci. 2008
Subject is Human
Antibody #7 "12E8" is a Monoclonal antibody (birnlex_2027)
Antibody reagent has target: human PHF tau (waiting for Neurolex ID) (protein product of)
Antibody reagent has provider: Peter Seubert
Antibody reagent has catalog #:
Antibody reagent has source organism: Mouse (birnlex_167; NCBI Taxonomic ID: 10090)
Antibody reagent has id: "12E8"
Provider has location: Elan Pharmaceuticals, South San Francisco, CA
Provider has url:
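The annotation above is essentially a set of subject, relation, value statements, and writing it that way lets a machine check what is missing. The sketch below is one possible encoding, with a hypothetical subject key `antibody_7` and a hypothetical list of required relations; values are copied from the slide, and the empty catalog number is kept empty.

```python
# (subject, relation, value) statements taken from the annotation on the slide.
TRIPLES = [
    ("antibody_7", "is a", "Monoclonal antibody"),
    ("antibody_7", "has id", "12E8"),
    ("antibody_7", "has target", "human PHF tau"),
    ("antibody_7", "has provider", "Peter Seubert"),
    ("antibody_7", "has source organism", "Mouse"),
    ("antibody_7", "has catalog #", None),  # left blank on the slide
]

# Hypothetical minimum needed to identify a reagent unambiguously.
REQUIRED = ["has id", "has provider", "has source organism", "has catalog #"]

def missing_fields(subject, triples=TRIPLES):
    """List required relations that are absent or have no value for the subject."""
    known = {rel: val for subj, rel, val in triples if subj == subject}
    return [rel for rel in REQUIRED if known.get(rel) is None]

print(missing_fields("antibody_7"))
```

A check of this kind makes gaps in reporting visible at curation time instead of leaving them for the next reader to discover.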
34. Try this Watson!
• 95 antibodies were identified in 8 articles
• 52 did not contain enough information to determine the
antibody used
• Some provided details in another paper
• And another paper, and another...
• Failed to give species, clonality, vendor, or catalog number
• But, many provided the location of the vendor because
the instructions to authors said to do so
• no antibodies had lot numbers associated
We never got to test the algorithms!
35. NIF along with several other large informatics
projects recommends that all authors provide
vendor and catalog # for all reagents used
But...vendors merge and sell each other’s
antibodies, making it difficult to track down exactly
which reagent was used in some cases
Catalog numbers get replaced; many variants on the
same product, e.g., HRP-conjugated, 200 µl vs 500 µl
Clone names are not unique
Universal antibody ID
Publishing for the 21st Century
36. NIF Antibody Registry
• We have created an antibody
registry database
• Assigns a persistent identifier
to each antibody, both
commercial and non-commercial
• ID will persist even if company
goes out of business or the
antibody is sold by multiple
vendors
• The data model is being formalized
into a rigorous ontology in
collaboration with others:
• We negotiated with antibody
aggregators to pull data for over
800,000 commercial antibodies,
200 vendors
• Can be used to register homegrown
antibodies as well
• http://antibodyregistry.org
37. “Find studies that used a rabbit polyclonal antibody
against GFAP that recognizes human in
immunocytochemistry”
Paz et al,
J Neurosci, 2010
(AB_310775)
38. Demo 2: Extracting data from
tables and supplementary
material
Challenge: Extract data on gene expression in brain from
studies relevant to drug abuse
Workflow:
Find articles
Extract results
from tables
Standardize
results
Load into NIF
Current DB: 140 tables from 54 articles
Andrea Arnaud-Stagg, Anita Bandrowski
39. Gene for tyrosine
hydroxylase has
increased
expression in locus
coeruleus of mouse
compared to control
when given chronic
morphine
Translations:
Upregulated, p < 0.05 =
increased expression
LC = locus coeruleus
Probe ID = gene name
Extract data and meaning of data
from tables
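The translations listed on this slide amount to a small normalization step applied to each table row. The sketch below is an assumption-laden illustration: the row layout, the probe identifier `probe_0001`, and the `standardize` function are invented for the example, and only the three translation rules come from the slide.

```python
ABBREVIATIONS = {"LC": "locus coeruleus"}
PROBE_TO_GENE = {"probe_0001": "tyrosine hydroxylase"}  # hypothetical probe ID

def standardize(row, alpha=0.05):
    """Translate one extracted table row into standardized terms."""
    p = row.get("p_value")
    if p is None:
        change = "not reported"  # kept distinct from "no significant change"
    elif p < alpha:
        change = {"up": "increased expression",
                  "down": "decreased expression"}[row["direction"]]
    else:
        change = "no significant change"
    return {
        "gene": PROBE_TO_GENE.get(row["probe_id"], row["probe_id"]),
        "region": ABBREVIATIONS.get(row["region"], row["region"]),
        "change": change,
    }

row = {"probe_id": "probe_0001", "region": "LC", "direction": "up", "p_value": 0.03}
print(standardize(row))
```

Keeping "not reported" separate from "no significant change" preserves a distinction that is lost when a table simply omits a value.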
40. Challenges working with tables and
supplemental data
Difficult data arrangements
PDF, JPG, TXT, CSV, XLS
Difficult styles: colors, symbols, data arrangements (results
combined into one column, multiple comparisons in one table,
legends defining values), unclearly described data (e.g., unclear
significance)
Not clear what tables/values represent
Nothing in paper about the supplementary data file, and table has no heading
Probe IDs are given but not gene identifiers
No link from supplemental material back to article; lose
provenance
Results are presented but values of significance unclear
Neither curator (nor machine) could distinguish between no difference
and not reported
41. What affects SMN1 expression?
Researchers often report results in a way that prevents curators from
extracting the full information from a study
42. Common theme
•We are not publishing data in a
form that is easy to integrate
•What we mean isn’t clear to a
search engine (or even to a
human)
•We use many different data
structures to say the same
thing
•We don’t provide crucial
information
•Searching and navigating across
individual resources takes an
inordinate amount of human effort
Tempus Pecunia Est, painting by Richard
Harpum
43. When I talk to neuroscientists (and journal editors)...
44. Collaboration, competition,
coordination, cooperation
The diversity and dynamism of neuroscience will always make data
integration challenging
Neural space is vast: No one group or individual can do
everything
We don’t have to solve everything to make it better
Global partnership with room for everyone:
Neuroscientists
Curators
Resource developers
Funders
Computational biologists
Text miners
Computer scientists
Watson
45. Hopeful signs...
•Means for sharing data on the web
becoming more routine
•With availability, growing recognition for a role
of standards and curation
•For neuroscience, we now have
organizations that can help
coordinate
•NIF, NITRC (http://nitrc.org)
•Neuroimaging Tools and Resource
Clearinghouse
•International Neuroinformatics
Coordinating Facility
•Educate neuroscientists on what is
necessary
•Bring together stakeholders to
define what is necessary for
interoperation
•Implement structures and
procedures for developing
neuroscience resources within a
framework
http://incf.org
46. We don’t know everything but we
do know some things
1. Register your resource
with NIF!!!!
2. Become involved with NIF
and INCF
3. Be mindful
Resource providers: Mindfulness that your
resource is contributing data to a global
federation
Link to shared ontology identifiers where
possible
Stable and unique identifiers for data
Explicit semantics
Database, model, atlas
Researchers: Mindfulness when publishing
data that it is to be consumed by machines
and not just your colleagues
Accession numbers for genes and species
Catalog numbers for reagents
Provide supplemental data in a form where it
is easy to re-use
48. Many thanks to...
Amarnath Gupta, UCSD, Co Investigator
Jeff Grethe, UCSD, Co Investigator
Anita Bandrowski, NIF Curator
Gordon Shepherd, Yale University
Perry Miller
Luis Marenco
David Van Essen, Washington University
Erin Reid
Paul Sternberg, CalTech
Arun Rangarajan
Hans Michael Muller
Giorgio Ascoli, George Mason University
Sridevi Polavarum
Fahim Imam, NIF Ontology Engineer
Karen Skinner, NIH, Program Officer
Mark Ellisman
Lee Hornbrook
Kara Lu
Vadim Astakhov
Xufei Qian
Chris Condit
Stephen Larson
Sarah Maynard
Bill Bug
50. How old is an adult squirrel?
Definitions can be
quantitative
Arbitrary but defensible
Qualitative categories
for quantitative
attributes
Best practice to
provide ages of
subjects, but for
query, need to
translate into
qualitative concepts
Jonathan Cachat, Anita Bandrowski
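Translating a reported age into a qualitative category for query can be sketched as a threshold lookup. The species key and every cut-off below are placeholders invented for illustration, not curated biological values; the slide's point is only that such cut-offs are arbitrary but defensible once written down.

```python
# Placeholder cut-offs in days (NOT curated values): (minimum age, category).
STAGES = {
    "example_rodent": [(0, "juvenile"), (100, "adult"), (1000, "aged")],
}

def age_category(species, age_days):
    """Return the last category whose minimum age the subject has reached."""
    label = None
    for min_days, name in STAGES[species]:
        if age_days >= min_days:
            label = name
    return label

for days in (50, 100, 1500):
    print(days, age_category("example_rodent", days))
```

Reporting the numeric age keeps the original data intact, while the mapping lets a query for "adult" find it.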
51. But there are no databases for
siRNA
NIF Registry is probably the most complete accounting we have of what is out
there