A Deep Survey of the Digital
Resource Landscape:
Perspectives from the Neuroscience
Information Framework
Maryann E. Martone, Ph. D.
University of California, San Diego
• NIF is an initiative of the NIH Blueprint consortium of institutes
– What types of resources (data, tools, materials, services) are available to the
neuroscience community?
– How many are there?
– What domains do they cover? What domains do they not cover?
– Where are they?
• Web sites
• Databases
• Literature
• Supplementary material
• PDF files
• Desk drawers
– Who uses them?
– Who creates them?
– How can we find them?
– How can we make them better in the future?
http://neuinfo.org
The Neuroscience Information
Framework
• NIF has developed a
production technology
platform for researchers to:
– Discover
– Share
– Analyze
– Integrate
neuroscience-relevant
information
• Since 2008, NIF has
assembled the largest
searchable catalog of
neuroscience data and
resources on the web
• Cost-effective and
innovative strategy for
managing data assets
“This unique data depository serves as a model
for other Web sites to provide research data.”
- Choice Reviews Online
NIF is poised to capitalize on the new tools
and emphasis on big data and open
science
http://neuinfo.org
June 10, 2013, dkCOIN Investigator's Retreat
The Neuroscience Information Framework: Discovery and
utilization of web-based resources for neuroscience
• A portal for finding and using
neuroscience resources
– A consistent framework for
describing resources
– Provides simultaneous
search of multiple types of
information, organized by
category
– Supported by an expansive
ontology for neuroscience
– Utilizes advanced
technologies to search the
“hidden web”
UCSD, Yale, Cal Tech, George Mason, Washington Univ
Literature
Database
Federation
Registry
Part 1: Surveying the
resource landscape
•NIF Registry: A catalog
of neuroscience-
relevant resources
•> 6000 currently
listed
•>2200 databases
•And we are finding
more every day
How do resources get added to the
NIF Registry?
•NIF curators
•Nomination by the
community
•Semi-automated text mining
pipelines
NIF Registry
Requires no special
skills
Site map available for
local hosting
•NIF Data Federation
•DISCO interop
•Requires some
programming skill
Bandrowski et al., 2012
NIF Registry
• Extended over time
– Parent resource
– Supporting agency
– Grant numbers
– Accessibility
– Related to
– Organism
– Disease or condition
– Last updated
First catalog: SFN Neuroscience Database Gateway (~2003) → NIF 0.5 (2006) → NIF 1.0+ (2008)
Simple metadata model:
name, description, type, URL, other
names, keywords, unique identifier
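A registry entry with the simple metadata model and the later extensions could be sketched as a small record type. This is only an illustration: the field names, the `RegistryRecord` class, and the example values are assumptions, not the actual NIF Registry schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RegistryRecord:
    """One catalog entry; field names are illustrative, not NIF's schema."""
    # Core fields of the simple metadata model
    identifier: str
    name: str
    description: str
    resource_type: str
    url: str
    other_names: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    # Fields added as the model was extended over time
    parent_resource: Optional[str] = None
    supporting_agency: Optional[str] = None
    grant_numbers: List[str] = field(default_factory=list)
    accessibility: Optional[str] = None
    related_to: List[str] = field(default_factory=list)
    organism: List[str] = field(default_factory=list)
    disease_or_condition: List[str] = field(default_factory=list)
    last_updated: Optional[str] = None

    def missing_core_fields(self) -> List[str]:
        """Return the names of core text fields that are empty."""
        core = ("identifier", "name", "description", "resource_type", "url")
        return [f for f in core if not getattr(self, f).strip()]


# Hypothetical entry used only to exercise the check
record = RegistryRecord(
    identifier="example-0001",
    name="Example Brain Atlas",
    description="",
    resource_type="Database",
    url="http://example.org/atlas",
    keywords=["atlas"],
)
print(record.missing_core_fields())
```

Keeping the extended fields optional mirrors the idea that a resource can be listed quickly with minimal metadata and refined later.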
Resource Curation
• NIF Registry is hosted
on the Semantic MediaWiki
platform
NeuroLex
– Community can
add, review, edit
without special
privileges
– Searchable by Google
– Integrated with NIF
ontologies
– Graph structure
http://neurolex.org
The resource graph
NIF is creating the linked data graph of resources
Keeping the Registry Current
– NIF employs an automated link checker
– Last analysis: 478/6100 invalid URLs (~8%)
– 199 could not be located at another university or location → out of service (~3%)
– Bigger issue: Many resources are no longer updated or maintained
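The bookkeeping behind a link check like the one above can be sketched as follows. This is not NIF's link checker: the `probe` callable, the status classification, and the example URLs are assumptions, and the probe is injected so the sketch runs without network access.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Optional


def classify(status: Optional[int]) -> str:
    """Map an HTTP status (or None for no response) to a coarse label."""
    if status is None:
        return "unreachable"
    if 200 <= status < 400:
        return "ok"
    return "invalid"


def check_links(urls: Iterable[str],
                probe: Callable[[str], Optional[int]]) -> Dict[str, int]:
    """Probe every URL and count how many fall under each label."""
    return dict(Counter(classify(probe(u)) for u in urls))


def percent(part: int, total: int) -> float:
    """Share of `part` in `total`, as a percentage with one decimal."""
    return round(100.0 * part / total, 1)


# Stand-in for real HTTP requests: hypothetical URLs and canned statuses
fake_status = {
    "http://a.example": 200,
    "http://b.example": 404,
    "http://c.example": None,
}
summary = check_links(fake_status, fake_status.get)
print(summary)

# The figures quoted on the slide
print(percent(478, 6100), percent(199, 6100))
```

In practice the probe would issue a real request with a timeout and retries; a single failed request is weak evidence that a resource is gone, which is why manual follow-up was still needed.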
[Figure: number of resources added and number last updated, by year, 1996-2014]
• Automated text mining is used to look for “web page last
updated” or copyright dates
– Identified for 570 resources; manual review suggested that the results
were accurate although we can’t guarantee that the date itself is
accurate
– 373 were not updated within the last 2 years (65%)
• Manual review of ~200 resources identified by 3DVC for their
catalog
– 38 not updated within the past 2 years (~20%)
– 8 migrated to new addresses or institutions
– 7 are no longer in service (~3%)
– 3 were deemed no longer appropriate
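The "last updated" text mining described above can be approximated with a pair of regular expressions. This is a rough sketch under stated assumptions: the patterns, the 40-character window, and the two-year threshold as coded here are illustrative choices, not the pipeline NIF used.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical patterns; real pages phrase these in many more ways.
KEYWORD = re.compile(
    r"(last\s+(?:updated|modified)|copyright|©|\(c\))([^\n]{0,40})",
    re.IGNORECASE,
)
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def latest_year(page_text: str) -> Optional[int]:
    """Latest four-digit year found shortly after an update/copyright cue."""
    years = [
        int(y)
        for match in KEYWORD.finditer(page_text)
        for y in YEAR.findall(match.group(2))
    ]
    return max(years) if years else None


def is_stale(page_text: str, today: date,
             max_age_years: int = 2) -> Optional[bool]:
    """True if the page's latest year is more than `max_age_years` old.

    Returns None when no date cue is found, since absence of a date
    says nothing about whether the resource is maintained.
    """
    year = latest_year(page_text)
    if year is None:
        return None
    return today.year - year > max_age_years


retreat_day = date(2013, 6, 10)
print(latest_year("Web page last updated: March 3, 2009"))
print(is_stale("Copyright 2004-2010 Example Lab", retreat_day))
```

As the slide notes, a date found this way may itself be inaccurate, so the result is best treated as a prompt for manual review rather than a verdict.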
Tracking the fate of digital resources
Yuling Li, Paul Sternberg, Cal Tech
Keeping content up
to date
Connectome
Tractography
Epigenetics
•New tags come into
existence
•New resource types come
into existence, e.g., Mobile
apps
•Resources add new types of
content
•Change name
•Change scope
•> 7000 updates to the
registry last year
It’s a challenge to keep the registry up to date;
sitemaps, curation, ontologies, community review
Ontology provides a human-centric
model for search and data integration
Last updated...
• Some neglected
resources are still
valuable
– Complete data sets
– Rare data
• Software may still be
usable
• Some databases, however,
may only be of historical
interest
– “all metalloproteins
found in PDB”
Are all databases and data sets equally valuable?
• The NIF Registry has created a linked data
graph of web-accessible resources
• Maintained on a community wiki
platform
• Provides data on the fluidity of the
resource landscape
– New resources continue to be created and
found
– Relatively few disappear altogether
– Many more grow stale, although their value
may still be significant
– Maintaining up to date curation requires
frequent updating
Summary
NIF Registry provides insight into the state of digital
resources on the web
Part 2: Surveying the data
landscape
•The NIF data federation performs deep search over
the content of over 200 databases
•New databases are added at a rate of 25-40 per year
•Latest update: Open Source Brain; ingest
completed in 2 hours
•Databases chosen on a variety of criteria:
•Early: testing different types of resources
•Thematic areas
•Volunteers
[Figure: Data Federation Growth, June 2008 to May 2013. Axes: number of federated databases; number of federated records (millions)]
NIF searches the largest collation of
neuroscience-relevant data on the web
DISCO
Data Ingestion Architecture
Current
Planned
DISCO Dashboard Functions
• Ingest Script Manager
• Public Script Repository
• Data & Event Tracker
• Versioning System
• Curator Tool
• Data Transformer Manager
Luis Marenco, Rixin Wang, Perry Miller, Gordon Shepherd
Yale University
DISCO Dashboard
• Management of registry resources
through a single administrative
dashboard
• Associated discovery pipeline
• Tools to manage data updates
• Change tracking
• Globally unique identifier creation
Luis Marenco, Rixin Wang, Perry Miller, Gordon Shepherd
Yale University
NIF data federation
NIF was designed to be populated rapidly
with progressive refinement
What are the connections of the
hippocampus?
Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”
Query expansion: Synonyms and related concepts
Boolean queries
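The synonym-based query expansion shown for the hippocampus example can be sketched in a few lines. The `SYNONYMS` table and function names are assumptions for illustration; in NIF the synonyms come from the ontology rather than a hard-coded dictionary.

```python
from typing import Dict, List

# Toy synonym table (assumption); NIF draws these from its ontologies.
SYNONYMS: Dict[str, List[str]] = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}


def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they match as phrases."""
    return f'"{term}"' if " " in term else term


def expand_query(term: str, synonyms: Dict[str, List[str]] = SYNONYMS) -> str:
    """Build a Boolean OR query from a term and its known synonyms."""
    alternatives = [term] + synonyms.get(term.lower(), [])
    return " OR ".join(quote(t) for t in alternatives)


print(expand_query("Hippocampus"))
```

A term with no entry in the table is returned unchanged, so the expansion only broadens a search and never blocks one.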
Data sources
categorized by
“data type” and
level of nervous
system
Common views
across multiple
sources
Tutorials for using
full resource when
getting there from
NIF
Link back to
record in
original source
Results are organized within a common
framework
Connects to
Synapsed with
Synapsed by
Input region
innervates
Axon innervates
Projects to
Cellular contact
Subcellular contact
Source site
Target site
Each resource implements a different, though related model;
systems are complex and difficult to learn, in many cases
NIF Semantic Framework: NIFSTD ontology
• NIF covers multiple structural scales and domains of relevance to neuroscience
• Aggregate of community ontologies with some extensions for neuroscience, e.g., Gene
Ontology, ChEBI, Protein Ontology
[Figure: NIFSTD modules: Organism, Anatomical Structure, Cell, Subcellular structure, Molecule, Macromolecule, Gene, Molecule Descriptors, NS Function, Dysfunction, Quality, Investigation, Techniques, Reagent, Protocols, Resource, Instrument]
Use of Ontologies
• Controlled vocabulary for describing type of resource and
content
– Database, Image, Diabetes
• Entity-mapping of database and data content
• Data integration across sources
• Search: Mixture of mapped content and string-based
search
– Different parts of the infrastructure use the vocabularies in
different ways
– Utilize synonyms, parents, children to refine search
– Increasing use of other relationships and logical inferencing
• Generation of semantic content (i.e. RDF, Linked Data)
NIF Concept Mapper
Aligns sources to the NIF semantic framework
Column level mapping:
Reducing false positives
The scourge of neuroanatomical nomenclature:
Importance of NIF semantic framework
•NIF Connectivity: 7 databases containing connectivity primary data or claims
from literature on connectivity between brain regions
•Brain Architecture Management System (rodent)
•Temporal lobe.com (rodent)
•Connectome Wiki (human)
•Brain Maps (various)
•CoCoMac (primate cortex)
•UCLA Multimodal database (Human fMRI)
•Avian Brain Connectivity Database (Bird)
•Total: 1800 unique brain terms (excluding Avian)
•Number of exact terms used in > 1 database: 42
•Number of synonym matches: 99
•Number of 1st order partonomy matches: 385
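The difference between exact-term matches and synonym matches across databases can be illustrated with a small sketch. The database names, terms, and `canonical` synonym map below are made up for the example; they are not the seven connectivity databases or the NIF ontology.

```python
from collections import defaultdict
from typing import Dict, Set


def normalize(term: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.lower().split())


def shared_exact_terms(db_terms: Dict[str, Set[str]]) -> Set[str]:
    """Normalized terms that appear in more than one database."""
    seen = defaultdict(set)
    for db, terms in db_terms.items():
        for term in terms:
            seen[normalize(term)].add(db)
    return {t for t, dbs in seen.items() if len(dbs) > 1}


def shared_via_synonyms(db_terms: Dict[str, Set[str]],
                        canonical: Dict[str, str]) -> Set[str]:
    """Canonical labels shared across databases only after synonym mapping."""
    seen = defaultdict(set)
    for db, terms in db_terms.items():
        for term in terms:
            key = normalize(term)
            seen[canonical.get(key, key)].add(db)
    merged = {t for t, dbs in seen.items() if len(dbs) > 1}
    already = {canonical.get(t, t) for t in shared_exact_terms(db_terms)}
    return merged - already


# Hypothetical databases and a one-entry synonym map
db_terms = {
    "db_a": {"Hippocampus", "CA1"},
    "db_b": {"hippocampus", "Ammon's horn"},
    "db_c": {"Cornu Ammonis"},
}
canonical = {"ammon's horn": "cornu ammonis"}

print(shared_exact_terms(db_terms))
print(shared_via_synonyms(db_terms, canonical))
```

Partonomy matches would need a further step that walks part-of relations in the ontology; that is left out here.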
Content Annotation – Google Refine
Resource Provider Services - Linkout
What have we learned: Grabbing the
long tail of small data
• NIF can be used to survey the
data landscape
• Analysis of NIF shows multiple
databases with similar scope
and content
• Many contain partially
overlapping data
• Data “flows” from one
resource to the next
– Data is reinterpreted, reanalyzed or
added to
• Is duplication good or bad?
What do you mean by data?
Databases come in many shapes and sizes
• Primary data:
– Data available for
reanalysis, e.g., microarray data sets
from GEO; brain images from XNAT;
microscopic images (CCDB/CIL)
• Secondary data
– Data features extracted through
data processing and sometimes
normalization, e.g., brain structure
volumes (IBVD), gene expression
levels (Allen Brain Atlas); brain
connectivity statements (BAMS)
• Tertiary data
– Claims and assertions about the
meaning of data
• E.g., gene
upregulation/downregulation,
brain activation as a function of
task
• Registries:
– Metadata
– Pointers to data sets or
materials stored elsewhere
• Data aggregators
– Aggregate data of the same
type from multiple
sources, e.g., Cell Image
Library, SUMSdb, Brede
• Single source
– Data acquired within a single
context, e.g., Allen Brain Atlas
Researchers are producing a variety of
information artifacts using a multitude of
technologies
NIF Analytics: The Neuroscience Landscape
NIF is in a unique position to answer questions about the neuroscience
landscape
Where are the data?
[Figure: data by brain region (Striatum, Hypothalamus, Olfactory bulb, Cerebral cortex, Brain) and data source]
Vadim Astakhov, Kepler Workflow Engine
Whither neuroscience information?
∞
What is easily machine
processable and accessible
What is potentially knowable
What is known:
Literature, images, human
knowledge
Unstructured;
Natural language
processing, entity
recognition, image
processing and
analysis;
communication
Open world meets closed world
We know a lot about some things and less about others; some
of NIF’s sources are comprehensive; others are highly biased
But...NIF has > 900,000
antibodies, 250,000 model
organisms, and 3 million microarray
records
Diseases of nervous system
What drives discovery?
The combination of ontologies, diverse data and analytics lets us look at
the current landscape in interesting ways
[Figure: neurodegenerative, seizure disorders, and neoplastic disease of nervous system: NIH Reporter and NIF data federated sources]
Embracing duplication: Data Mash ups
•NIF queries across 3 of approximately 10 fMRI databases
•Two resources, Brede and SUMSdb, curated activation foci from the literature
•~300 PMIDs were common between Brede and SUMSdb
•PMID serves as a unique identifier for an article
•Same information; value added
Data is additive
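Using the PMID as a join key, a mash-up of two literature-curated sources reduces to a set intersection. The sketch below uses invented PMIDs and coordinates and generic source names; it shows the idea, not how NIF combined Brede and SUMSdb.

```python
from typing import Dict, List, Tuple

Focus = Tuple[int, int, int]  # an activation focus as (x, y, z)


def merge_by_pmid(
    sources: Dict[str, Dict[str, List[Focus]]]
) -> Dict[str, Dict[str, List[Focus]]]:
    """Group records under each PMID that every source has curated."""
    common = set.intersection(*(set(records) for records in sources.values()))
    return {
        pmid: {name: records[pmid] for name, records in sources.items()}
        for pmid in sorted(common)
    }


# Hypothetical PMIDs and coordinates, for illustration only
sources = {
    "source_a": {
        "11111111": [(-42, 20, 8)],
        "22222222": [(10, -60, 30)],
    },
    "source_b": {
        "22222222": [(10, -60, 30), (12, -58, 28)],
        "33333333": [(0, 0, 0)],
    },
}

merged = merge_by_pmid(sources)
print(merged)
```

Where both sources curated the same article, the merged view keeps each source's records side by side, so the duplication adds value instead of hiding it.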
Same data: different analysis
• Gemma: Gene ID + Gene Symbol
• DRG: Gene name + Probe ID
• Gemma presented results relative to baseline chronic
morphine; DRG with respect to saline, so direction of change is
opposite in the 2 databases
Chronic vs acute morphine in striatum
• Analysis:
•1370 statements from Gemma regarding gene expression as a function of
chronic morphine
•617 were consistent with DRG; over half of the claims of the paper were not
confirmed in this analysis
•Results for 1 gene were opposite in DRG and Gemma
•45 did not have enough information provided in the paper to make a judgment
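Because the two databases report change against different baselines, claims have to be brought onto one baseline before they are compared. The sketch below assumes a two-condition comparison (so switching baseline simply flips the direction) and uses invented gene names and claims; it is not the analysis that produced the numbers above.

```python
from typing import Dict, Tuple

FLIP = {"up": "down", "down": "up"}
Claim = Tuple[str, str]  # (direction, baseline condition)


def to_baseline(direction: str, baseline: str, target: str) -> str:
    """Express a direction of change relative to the `target` baseline.

    Assumes exactly two conditions, so a different baseline means the
    opposite direction.
    """
    if direction not in FLIP:
        raise ValueError(f"unknown direction: {direction!r}")
    return direction if baseline == target else FLIP[direction]


def compare(claims_a: Dict[str, Claim], claims_b: Dict[str, Claim],
            target: str = "saline") -> Dict[str, str]:
    """Label each gene present in both sources as consistent or opposite."""
    result = {}
    for gene in sorted(claims_a.keys() & claims_b.keys()):
        a = to_baseline(*claims_a[gene], target)
        b = to_baseline(*claims_b[gene], target)
        result[gene] = "consistent" if a == b else "opposite"
    return result


# Hypothetical claims: one source relative to chronic morphine, one to saline
source_a = {
    "GeneA": ("down", "chronic morphine"),
    "GeneB": ("up", "chronic morphine"),
}
source_b = {
    "GeneA": ("up", "saline"),
    "GeneB": ("up", "saline"),
    "GeneC": ("down", "saline"),
}

print(compare(source_a, source_b))
```

Recording the baseline explicitly with every claim is the kind of simple standard that would make this comparison routine.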
Relatively simple standards would make life easier
Phases of NIF
• 2006-2008: A survey of what was out there
• 2008-2009: Strategy for resource discovery
– NIF Registry vs NIF data federation
– Ingestion of data contained within different technology platforms, e.g., XML vs relational
vs RDF
– Effective search across semantically diverse sources
• NIFSTD ontologies
• 2009-2011: Strategy for data integration
– Unified views across common sources
– Mapping of content to NIF vocabularies
• 2011-present: Data analytics
– Uniform external data references
• 2012-present: SciCrunch: unified biomedical resource
services
NIF provides a strategy and set of tools applicable to all
biomedical science
Where is the Neuroscience in NIF?
• Search semantics
• Ranking
• Resources supported by NIH Blueprint Institutes are
more thoroughly covered
• Data types, e.g., Brain activation foci
Building a Uniform Resource Layer
Discoverability
Accessibility
Web of Data
Data specified via simple semantics
Data in a usable form
Semantically-enabled search
Enhanced semantics
Standardized representation
Linked Open Data - RDF
Data resources simply described
Automated data harvesting technologies
Common resource registry
A production data (resource)
catalog and underlying technology
platform for researchers to
discover, share, access, analyze, and
integrate biomedical information
Community Built Uniform Resource
Layer
[Figure: SciCrunch and participating communities. Labels: NIF; Neuroscience; MONARCH; Animal Models; Community Services; dkCOIN; Shared Resources; Undiagnosed Disease Program; Phenotype RCN; 3D Virtual Cell; National Institute on Aging; One Mind for Research; BIRN; International Neuroinformatics Coordinating Facility; Model Organism Databases; Community Outreach; DELSA; Varied]
(not just a data catalog)
Each project shares resources and adds
unique value to the resource layer
•3dVC: Focus on models and simulation
•Gene Ontology: Focus on
bioinformatics tools
•National Institute on Aging: Aging-
related data sets
•Monarch: Phenotype-Genotype; deep
semantic data integration
•One Mind for Research: Biospecimen
repositories
•NeuroGateway: Computational
resources
•FORCE11: Tools for next-gen publishing
and e-scholarship
SciCrunch
SciCrunch is actively supporting multiple
communities; multiple communities are
enriching and improving SciCrunch
Customized portals and rankings
dkCOIN
Ontology
SciCrunch
Shared
Resources
Community
database:
beginning
Community
database:
End
Register your resource to NIF!
“How do I share my
data/tool?”
“There is no database
for my data”
[Figure: existing infrastructure: institutional repositories, tool repositories, cloud, INCF global infrastructure, government, education, industry]
NIF is designed to leverage existing investments in resources and infrastructure
Collaboration, competition, coordination, cooperation
• The diversity and dynamism of biomedical data will always make
data integration challenging
• The overall data space is vast: No one group or individual
can do everything
– Cooperation and coordination are essential
• Creating a core resource registry and data catalog allows
the entire community to track resources, work together to
keep it updated, promote cross-fertilization, and build
better resources
NIF team (past and present)
Jeff Grethe, UCSD, Co Investigator, Interim PI
Amarnath Gupta, UCSD, Co Investigator
Anita Bandrowski, NIF Project Leader
Gordon Shepherd, Yale University
Perry Miller
Luis Marenco
Rixin Wang
David Van Essen, Washington University
Erin Reid
Paul Sternberg, Cal Tech
Arun Rangarajan
Hans Michael Muller
Yuling Li
Giorgio Ascoli, George Mason University
Sridevi Polavarum
Fahim Imam
Larry Lui
Andrea Arnaud Stagg
Jonathan Cachat
Jennifer Lawrence
Svetlana Sulima
Davis Banks
Vadim Astakhov
Xufei Qian
Chris Condit
Mark Ellisman
Stephen Larson
Willie Wong
Tim Clark, Harvard University
Paolo Ciccarese
Karen Skinner, NIH, Program Officer
(retired)
Jonathan Pollock, NIH, Program Officer
And my colleagues in Monarch, dkNet, 3DVC, Force 11

Más contenido relacionado

La actualidad más candente

The real world of ontologies and phenotype representation: perspectives from...
The real world of ontologies and phenotype representation:  perspectives from...The real world of ontologies and phenotype representation:  perspectives from...
The real world of ontologies and phenotype representation: perspectives from...Neuroscience Information Framework
 
How do we know what we don’t know: Using the Neuroscience Information Framew...
How do we know what we don’t know:  Using the Neuroscience Information Framew...How do we know what we don’t know:  Using the Neuroscience Information Framew...
How do we know what we don’t know: Using the Neuroscience Information Framew...Maryann Martone
 
The Neuroscience Information Framework: Making Resources Discoverable for the...
The Neuroscience Information Framework: Making Resources Discoverable for the...The Neuroscience Information Framework: Making Resources Discoverable for the...
The Neuroscience Information Framework: Making Resources Discoverable for the...Neuroscience Information Framework
 
The Path to Enlightened Solutions for Biodiversity's Dark Data
The Path to Enlightened Solutions for Biodiversity's Dark DataThe Path to Enlightened Solutions for Biodiversity's Dark Data
The Path to Enlightened Solutions for Biodiversity's Dark Datavbrant
 
Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...
Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...
Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...Bryan Heidorn
 
The Neuroscience Information Framework: A Scalable Platform for Information E...
The Neuroscience Information Framework: A Scalable Platform for Information E...The Neuroscience Information Framework: A Scalable Platform for Information E...
The Neuroscience Information Framework: A Scalable Platform for Information E...Neuroscience Information Framework
 
How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...Maryann Martone
 
The Neuroscience Information Framework: Establishing a practical semantic fra...
The Neuroscience Information Framework: Establishing a practical semantic fra...The Neuroscience Information Framework: Establishing a practical semantic fra...
The Neuroscience Information Framework: Establishing a practical semantic fra...Neuroscience Information Framework
 
Knowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/InteroperabilityKnowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/InteroperabilityJames Hendler
 
An analysis and characterization of DMPs in NSF proposals from the University...
An analysis and characterization of DMPs in NSF proposals from the University...An analysis and characterization of DMPs in NSF proposals from the University...
An analysis and characterization of DMPs in NSF proposals from the University...Megan O'Donnell
 
Functional and Architectural Requirements for Metadata: Supporting Discovery...
Functional and Architectural Requirements for Metadata: Supporting Discovery...Functional and Architectural Requirements for Metadata: Supporting Discovery...
Functional and Architectural Requirements for Metadata: Supporting Discovery...Jian Qin
 
The possibility and probability of a global Neuroscience Information Framework
The possibility and probability of a global Neuroscience Information Framework The possibility and probability of a global Neuroscience Information Framework
The possibility and probability of a global Neuroscience Information Framework Neuroscience Information Framework
 
Machines are people too
Machines are people tooMachines are people too
Machines are people tooPaul Groth
 
Boundless Opportunity
Boundless OpportunityBoundless Opportunity
Boundless OpportunityRachel Frick
 
Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...
Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...
Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...ICPSR
 
New Metaphors: Data Papers and Data Citations
New Metaphors: Data Papers and Data CitationsNew Metaphors: Data Papers and Data Citations
New Metaphors: Data Papers and Data CitationsJohn Kunze
 

La actualidad más candente (18)

The real world of ontologies and phenotype representation: perspectives from...
The real world of ontologies and phenotype representation:  perspectives from...The real world of ontologies and phenotype representation:  perspectives from...
The real world of ontologies and phenotype representation: perspectives from...
 
How do we know what we don’t know: Using the Neuroscience Information Framew...
How do we know what we don’t know:  Using the Neuroscience Information Framew...How do we know what we don’t know:  Using the Neuroscience Information Framew...
How do we know what we don’t know: Using the Neuroscience Information Framew...
 
The Neuroscience Information Framework: Making Resources Discoverable for the...
The Neuroscience Information Framework: Making Resources Discoverable for the...The Neuroscience Information Framework: Making Resources Discoverable for the...
The Neuroscience Information Framework: Making Resources Discoverable for the...
 
The Path to Enlightened Solutions for Biodiversity's Dark Data
The Path to Enlightened Solutions for Biodiversity's Dark DataThe Path to Enlightened Solutions for Biodiversity's Dark Data
The Path to Enlightened Solutions for Biodiversity's Dark Data
 
Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...
Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...
Heidorn The Path to Enlightened Solutions for Biodiversity's Dark DataViBRANT...
 
The Neuroscience Information Framework: A Scalable Platform for Information E...
The Neuroscience Information Framework: A Scalable Platform for Information E...The Neuroscience Information Framework: A Scalable Platform for Information E...
The Neuroscience Information Framework: A Scalable Platform for Information E...
 
Martone grethe
Martone gretheMartone grethe
Martone grethe
 
How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...How do we know what we don't know?  Exploring the data and knowledge space th...
How do we know what we don't know?  Exploring the data and knowledge space th...
 
The Neuroscience Information Framework: Establishing a practical semantic fra...
The Neuroscience Information Framework: Establishing a practical semantic fra...The Neuroscience Information Framework: Establishing a practical semantic fra...
The Neuroscience Information Framework: Establishing a practical semantic fra...
 
Knowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/InteroperabilityKnowledge Graph Semantics/Interoperability
Knowledge Graph Semantics/Interoperability
 
An analysis and characterization of DMPs in NSF proposals from the University...
An analysis and characterization of DMPs in NSF proposals from the University...An analysis and characterization of DMPs in NSF proposals from the University...
An analysis and characterization of DMPs in NSF proposals from the University...
 
Functional and Architectural Requirements for Metadata: Supporting Discovery...
Functional and Architectural Requirements for Metadata: Supporting Discovery...Functional and Architectural Requirements for Metadata: Supporting Discovery...
Functional and Architectural Requirements for Metadata: Supporting Discovery...
 
The possibility and probability of a global Neuroscience Information Framework
The possibility and probability of a global Neuroscience Information Framework The possibility and probability of a global Neuroscience Information Framework
The possibility and probability of a global Neuroscience Information Framework
 
Open Science
Open Science Open Science
Open Science
 
Machines are people too
Machines are people tooMachines are people too
Machines are people too
 
Boundless Opportunity
Boundless OpportunityBoundless Opportunity
Boundless Opportunity
 
Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...
Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...
Data Sharing with ICPSR: Fueling the Cycle of Science through Discovery, Acce...
 
New Metaphors: Data Papers and Data Citations
New Metaphors: Data Papers and Data CitationsNew Metaphors: Data Papers and Data Citations
New Metaphors: Data Papers and Data Citations
 

Similar a A Deep Survey of the Digital Resource Landscape: Perspectives from the Neuroscience Information Framework

RDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
RDAP14: Maryann Martone, Keynote, The Neuroscience Information FrameworkRDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
RDAP14: Maryann Martone, Keynote, The Neuroscience Information FrameworkASIS&T
 
Neurosciences Information Framework (NIF): An example of community Cyberinfra...
Neurosciences Information Framework (NIF): An example of community Cyberinfra...Neurosciences Information Framework (NIF): An example of community Cyberinfra...
Neurosciences Information Framework (NIF): An example of community Cyberinfra...Neuroscience Information Framework
 
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...dkNET
 
Reproducibility in human cognitive neuroimaging: a community-­driven data sha...
Reproducibility in human cognitive neuroimaging: a community-­driven data sha...Reproducibility in human cognitive neuroimaging: a community-­driven data sha...
Reproducibility in human cognitive neuroimaging: a community-­driven data sha...Nolan Nichols
 
Sla2009 D Curation Heidorn
Sla2009 D Curation HeidornSla2009 D Curation Heidorn
Sla2009 D Curation HeidornBryan Heidorn
 
RDAP14: Learning to Curate Panel
RDAP14: Learning to Curate Panel RDAP14: Learning to Curate Panel
RDAP14: Learning to Curate Panel ASIS&T
 
The real world of ontologies and phenotype representation: perspectives from...
The real world of ontologies and phenotype representation:  perspectives from...The real world of ontologies and phenotype representation:  perspectives from...
The real world of ontologies and phenotype representation: perspectives from...Maryann Martone
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...ICPSR
 
Conference Linked Data: the ScholarlyData project
Conference Linked Data: the ScholarlyData projectConference Linked Data: the ScholarlyData project
Conference Linked Data: the ScholarlyData projectAndrea Nuzzolese
 
5-14-13 An Introduction to VIVO Presentation Slides
5-14-13 An Introduction to VIVO Presentation Slides5-14-13 An Introduction to VIVO Presentation Slides
5-14-13 An Introduction to VIVO Presentation SlidesDuraSpace
 
Research Data Alliance: Creating the culture and technology for an internatio...
Research Data Alliance: Creating the culture and technology for an internatio...Research Data Alliance: Creating the culture and technology for an internatio...
Research Data Alliance: Creating the culture and technology for an internatio...Research Data Alliance
 
A Knowledge Discovery Framework for Planetary Defense
A Knowledge Discovery Framework for Planetary DefenseA Knowledge Discovery Framework for Planetary Defense
A Knowledge Discovery Framework for Planetary DefenseYongyao Jiang
 
Beyond Management: Data Curation as Scholarship in Archaeology
Beyond Management: Data Curation as Scholarship in ArchaeologyBeyond Management: Data Curation as Scholarship in Archaeology
Beyond Management: Data Curation as Scholarship in ArchaeologySarah Whitcher Kansa
 
Jim Woolley - Name Registration: One Less Impediment to Taxonomy
Jim Woolley - Name Registration: One Less Impediment to TaxonomyJim Woolley - Name Registration: One Less Impediment to Taxonomy
Jim Woolley - Name Registration: One Less Impediment to TaxonomyICZN
 
Records professionals and Research Data - a new role?
Records professionals and Research Data - a new role?Records professionals and Research Data - a new role?
Records professionals and Research Data - a new role?Rebecca Grant
 
NSF Software @ ApacheConNA
NSF Software @ ApacheConNANSF Software @ ApacheConNA
NSF Software @ ApacheConNADaniel S. Katz
 

A Deep Survey of the Digital Resource Landscape: Perspectives from the Neuroscience Information Framework

  • 1. A Deep Survey of the Digital Resource Landscape: Perspectives from the Neuroscience Information Framework Maryann E. Martone, Ph. D. University of California, San Diego
  • 2. • NIF is an initiative of the NIH Blueprint consortium of institutes – What types of resources (data, tools, materials, services) are available to the neuroscience community? – How many are there? – What domains do they cover? What domains do they not cover? – Where are they? • Web sites • Databases • Literature • Supplementary material – Who uses them? – Who creates them? – How can we find them? – How can we make them better in the future? http://neuinfo.org • PDF files • Desk drawers
  • 3. The Neuroscience Information Framework • NIF has developed a production technology platform for researchers to: – Discover – Share – Analyze – Integrate neuroscience-relevant information • Since 2008, NIF has assembled the largest searchable catalog of neuroscience data and resources on the web • Cost-effective and innovative strategy for managing data assets “This unique data depository serves as a model for other Web sites to provide research data.” - Choice Reviews Online NIF is poised to capitalize on the new tools and emphasis on big data and open science
  • 4. http://neuinfo.org June 10, 2013 dkCOIN Investigator's Retreat The Neuroscience Information Framework: Discovery and utilization of web-based resources for neuroscience • A portal for finding and using neuroscience resources • A consistent framework for describing resources • Provides simultaneous search of multiple types of information, organized by category • Supported by an expansive ontology for neuroscience • Utilizes advanced technologies to search the “hidden web” UCSD, Yale, Cal Tech, George Mason, Washington Univ Literature Database Federation Registry
  • 5. Part 1: Surveying the resource landscape • NIF Registry: A catalog of neuroscience-relevant resources • > 6000 currently listed • > 2200 databases • And we are finding more every day
  • 6. How do resources get added to the NIF Registry? • NIF curators • Nomination by the community • Semi-automated text mining pipelines NIF Registry Requires no special skills Site map available for local hosting • NIF Data Federation • DISCO interop • Requires some programming skill Bandrowski et al., 2012
  • 7. NIF Registry • Extended over time – Parent resource – Supporting agency – Grant numbers – Accessibility – Related to – Organism – Disease or condition – Last updated First catalog: SFN Neuroscience Database Gateway → NIF 0.5 → NIF 1.0+ Simple metadata model Name, description, type, URL, other names, keywords, unique identifier ~2003 2006 2008
  • 8. Resource Curation • NIF Registry is hosted on the Semantic MediaWiki platform Neurolex – Community can add, review, edit without special privileges – Searchable by Google – Integrated with NIF ontologies – Graph structure http://neurolex.org
  • 9. The resource graph NIF is creating the linked data graph of resources
  • 10. Keeping the Registry Current – NIF employs an automated link checker – Last analysis: 478/6100 invalid URLs (~8%) – 199 could not be located at another university or location → out of service (~3%) – Bigger issue: Many resources are no longer updated or maintained [Figure: resources added and resources last updated, by year, 1996 to 2014]
  • 11. • Automated text mining is used to look for “web page last updated” or copyright dates – Identified for 570 resources; manual review suggested that the results were accurate although we can’t guarantee that the date itself is accurate – 373 were not updated within the last 2 years (65%) • Manual review of ~200 resources identified by 3DVC for their catalog – 38 not updated within the past 2 years (~20%) – 8 migrated to new addresses or institutions – 7 are no longer in service (~3%) – 3 were deemed no longer appropriate Tracking the fate of digital resources Yuling Li, Paul Sternberg, Cal Tech
  • 12. Keeping content up to date Connectome Tractography Epigenetics •New tags come into existence •New resource types come into existence, e.g., Mobile apps •Resources add new types of content •Change name •Change scope •> 7000 updates to the registry last year It’s a challenge to keep the registry up to date; sitemaps, curation, ontologies, community review
  • 13. Ontology provides a human-centric model for search and data integration
  • 14. Last updated... • Some neglected resources are still valuable – Complete data sets – Rare data • Software may still be usable • Some databases, however, may only be of historical interest – “all metalloproteins found in PDB” Are all databases and data sets equally valuable?
  • 15. • The NIF Registry has created a linked data graph of web-accessible resources • Maintained on a community wiki platform • Provides data on the fluidity of the resource landscape – New resources continue to be created and found – Relatively few disappear altogether – Many more grow stale, although their value may still be significant – Maintaining up to date curation requires frequent updating Summary NIF Registry provides insight into the state of digital resources on the web
  • 16. Part 2: Surveying the data landscape •The NIF data federation performs deep search over the content of over 200 databases •New databases are added at a rate of 25-40 per year •Latest update: Open Source Brain; ingest completed in 2 hours •Databases chosen on a variety of criteria: •Early: testing different types of resources •Thematic areas •Volunteers
  • 17. Data Federation Growth NIF searches the largest collation of neuroscience-relevant data on the web DISCO [Figure: number of federated databases and number of federated records (millions, log scale), June 2008 to May 2013]
  • 18. Data Ingestion Architecture Current Planned DISCO Dashboard Functions • Ingest Script Manager • Public Script Repository • Data & Event Tracker • Versioning System • Curator Tool • Data Transformer Manager Luis Marenco, Rixin Wang, Perry Miller, Gordon Shepherd Yale University
  • 19. DISCO Dashboard • Management of registry resources through a single administrative dashboard • Associated discovery pipeline • Tools to manage data updates • Change tracking • Globally unique identifier creation Luis Marenco, Rixin Wang, Perry Miller, Gordon Shepherd Yale University
  • 20. NIF data federation NIF was designed to be populated rapidly with progressive refinement
  • 21. What are the connections of the hippocampus? Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn” Query expansion: Synonyms and related concepts Boolean queries Data sources categorized by “data type” and level of nervous system Common views across multiple sources Tutorials for using full resource when getting there from NIF Link back to record in original source
  • 22. Results are organized within a common framework Connects to Synapsed with Synapsed by Input region innervates Axon innervates Projects toCellular contact Subcellular contact Source site Target site Each resource implements a different, though related model; systems are complex and difficult to learn, in many cases
  • 23. NIF Semantic Framework: NIFSTD ontology • NIF covers multiple structural scales and domains of relevance to neuroscience • Aggregate of community ontologies with some extensions for neuroscience, e.g., Gene Ontology, Chebi, Protein Ontology NIFSTD Organism NS FunctionMolecule Investigation Subcellular structure Macromolecule Gene Molecule Descriptors Techniques Reagent Protocols Cell Resource Instrument Dysfunction Quality Anatomical Structure
  • 24. Use of Ontologies • Controlled vocabulary for describing type of resource and content – Database, Image, Diabetes • Entity-mapping of database and data content • Data integration across sources • Search: Mixture of mapped content and string-based search – Different parts of the infrastructure use the vocabularies in different ways – Utilize synonyms, parents, children to refine search – Increasing use of other relationships and logical inferencing • Generation of semantic content (i.e. RDF, Linked Data)
  • 25. NIF Concept Mapper Aligns sources to the NIF semantic framework
  • 27. The scourge of neuroanatomical nomenclature: Importance of NIF semantic framework •NIF Connectivity: 7 databases containing connectivity primary data or claims from literature on connectivity between brain regions •Brain Architecture Management System (rodent) •Temporal lobe.com (rodent) •Connectome Wiki (human) •Brain Maps (various) •CoCoMac (primate cortex) •UCLA Multimodal database (Human fMRI) •Avian Brain Connectivity Database (Bird) •Total: 1800 unique brain terms (excluding Avian) •Number of exact terms used in > 1 database: 42 •Number of synonym matches: 99 •Number of 1st order partonomy matches: 385
  • 28. Content Annotation – Google Refine
  • 29. Resource Provider Services - Linkout
  • 30. What have we learned: Grabbing the long tail of small data • NIF can be used to survey the data landscape • Analysis of NIF shows multiple databases with similar scope and content • Many contain partially overlapping data • Data “flows” from one resource to the next – Data is reinterpreted, reanalyzed or added to • Is duplication good or bad?
  • 31. What do you mean by data? Databases come in many shapes and sizes • Primary data: – Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL) • Secondary data – Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS) • Tertiary data – Claims and assertions about the meaning of data • E.g., gene upregulation/downregulation, brain activation as a function of task • Registries: – Metadata – Pointers to data sets or materials stored elsewhere • Data aggregators – Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede • Single source – Data acquired within a single context, e.g., Allen Brain Atlas Researchers are producing a variety of information artifacts using a multitude of technologies
  • 32. NIF Analytics: The Neuroscience Landscape NIF is in a unique position to answer questions about the neuroscience landscape Where are the data? Striatum Hypothalamus Olfactory bulb Cerebral cortex Brain Brain region Data source Vadim Astakhov, Kepler Workflow Engine
  • 33. Whither neuroscience information? ∞ What is easily machine processable and accessible What is potentially knowable What is known: Literature, images, human knowledge Unstructured; Natural language processing, entity recognition, image processing and analysis; communication
  • 34. Open world meets closed world We know a lot about some things and less about others; some of NIF’s sources are comprehensive; others are highly biased But...NIF has > 900,000 antibodies, 250,000 model organisms, and 3 million microarray records
  • 35. Diseases of nervous system What drives discovery? The combination of ontologies, diverse data and analytics lets us look at the current landscape in interesting ways Neurodegenerative Seizure disorders Neoplastic disease of nervous system NIH Reporter NIF data federated sources
  • 36. Embracing duplication: Data Mash ups • NIF queries across 3 of approximately 10 fMRI databases • Two resources, Brede and SUMSdb, curated activation foci from the literature • ~300 PMIDs were common between Brede and SUMSdb • PMID serves as a unique identifier for an article • Same information; value added Data is additive
  • 37. Same data: different analysis • Gemma: Gene ID + Gene Symbol • DRG: Gene name + Probe ID • Gemma presented results relative to baseline chronic morphine; DRG with respect to saline, so direction of change is opposite in the 2 databases Chronic vs acute morphine in striatum • Analysis: • 1370 statements from Gemma regarding gene expression as a function of chronic morphine • 617 were consistent with DRG; over half of the claims of the paper were not confirmed in this analysis • Results for 1 gene were opposite in DRG and Gemma • 45 did not have enough information provided in the paper to make a judgment Relatively simple standards would make life easier
  • 38. Phases of NIF • 2006-2008: A survey of what was out there • 2008-2009: Strategy for resource discovery – NIF Registry vs NIF data federation – Ingestion of data contained within different technology platforms, e.g., XML vs relational vs RDF – Effective search across semantically diverse sources • NIFSTD ontologies • 2009-2011: Strategy for data integration – Unified views across common sources – Mapping of content to NIF vocabularies • 2011-present: Data analytics – Uniform external data references • 2012-present: SciCrunch: unified biomedical resource services NIF provides a strategy and set of tools applicable to all biomedical science
  • 39. Where is the Neuroscience in NIF? • Search semantics • Ranking • Resources supported by NIH Blueprint Institutes are more thoroughly covered • Data types, e.g., Brain activation foci
  • 40. Building a Uniform Resource Layer Discoverability Accessibility Web of Data Data specified via simple semantics Data in a usable form Semantically-enabled search Enhanced semantics Standardized representation Linked Open Data - RDF Data resources simply described Automated data harvesting technologies Common resource registry A production data (resource) catalog and underlying technology platform for researchers to discover, share, access, analyze, and integrate biomedical information
  • 41. Community Built Uniform Resource Layer SciCrunch NIF Neuroscience MONARCH Animal Models Community Services dkCOIN Shared Resources Undiagnosed Disease Program Phenotype RCN 3D Virtual Cell National Institute on Aging One Mind for Research BIRN International Neuroinformatics Coordinating Facility Model Organism Databases Community Outreach DELSA Varied (not just a data catalog)
  • 42. Each project shares resources and adds unique value to the resource layer • 3dVC: Focus on models and simulation • Gene Ontology: Focus on bioinformatics tools • National Institute on Aging: Aging-related data sets • Monarch: Phenotype-Genotype; deep semantic data integration • One Mind for Research: Biospecimen repositories • NeuroGateway: Computational resources • FORCE11: Tools for next-gen publishing and e-scholarship SciCrunch SciCrunch is actively supporting multiple communities; multiple communities are enriching and improving SciCrunch
  • 43. Customized portals and rankings SciCrunch NIF Neuroscience MONARCH Animal Models Community Services dkCOIN Shared Resources Undiagnosed Disease Program Phenotype RCN 3D Virtual Cell National Institute on Aging One Mind for Research BIRN International Neuroinformatics Coordinating Facility Model Organism Databases Community Outreach DELSA Varied dkCOIN Ontology SciCrunch Shared Resources
  • 44. Community database: beginning Community database: End Register your resource to NIF! “How do I share my data/tool?” “There is no database for my data” 1 2 3 4 Institutional repositories Cloud INCF: Global infrastructure Government Education Industry NIF is designed to leverage existing investments in resources and infrastructure Tool repositories
  • 45. Collaboration, competition, coordination, cooperation • The diversity and dynamism of biomedical data will always make data integration challenging • The overall data space is vast: No one group or individual can do everything – Cooperation and coordination are essential • Creating a core resource registry and data catalog allows the entire community to track resources, work together to keep it updated, promote cross-fertilization, and build better resources
  • 46. NIF team (past and present) Jeff Grethe, UCSD, Co Investigator, Interim PI Amarnath Gupta, UCSD, Co Investigator Anita Bandrowski, NIF Project Leader Gordon Shepherd, Yale University Perry Miller Luis Marenco Rixin Wang David Van Essen, Washington University Erin Reid Paul Sternberg, Cal Tech Arun Rangarajan Hans Michael Muller Yuling Li Giorgio Ascoli, George Mason University Sridevi Polavarum Fahim Imam Larry Lui Andrea Arnaud Stagg Jonathan Cachat Jennifer Lawrence Svetlana Sulima Davis Banks Vadim Astakhov Xufei Qian Chris Condit Mark Ellisman Stephen Larson Willie Wong Tim Clark, Harvard University Paolo Ciccarese Karen Skinner, NIH, Program Officer (retired) Jonathan Pollock, NIH, Program Officer And my colleagues in Monarch, dkNet, 3DVC, Force 11
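
Slide 21 describes query expansion: a search for "hippocampus" is widened with synonyms and joined into a Boolean OR query. A minimal sketch of that idea follows, assuming a plain dictionary of synonyms in place of the NIFSTD ontology. The function name `expand_query` and the `SYNONYMS` table are hypothetical and are not part of any NIF API.

```python
# Hypothetical synonym table; in NIF these would come from the ontology.
SYNONYMS = {"hippocampus": ["Cornu Ammonis", "Ammon's horn"]}


def expand_query(term, synonyms):
    """Return a Boolean OR query over a term and its known synonyms."""
    variants = [term]
    for synonym in synonyms.get(term.lower(), []):
        if synonym.lower() != term.lower():
            variants.append(synonym)
    # Quote multi-word variants so each is searched as a phrase.
    quoted = [f'"{v}"' if " " in v else v for v in variants]
    return " OR ".join(quoted)


print(expand_query("Hippocampus", SYNONYMS))
```

A term with no entry in the table is returned unchanged, so the expansion degrades to an ordinary string search.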
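
Slide 11 mentions automated text mining for "web page last updated" or copyright dates, followed by a check for resources not updated within the last 2 years. The sketch below shows one way such a check might look. The regular expressions, the function names, and the 2-year cutoff logic are assumptions for illustration, not the pipeline NIF actually used.

```python
import re

# A four-digit year between 1900 and 2099 (assumed range).
YEAR = r"\b(?:19|20)\d{2}\b"
# "last updated ..." or "last modified ..." followed closely by a year.
LAST_UPDATED = re.compile(
    rf"last\s+(?:updated|modified).{{0,30}}?({YEAR})", re.I | re.S
)
# A copyright notice and the text that follows it on the same line.
COPYRIGHT = re.compile(r"(?:copyright|©|\(c\))[^\n]{0,40}", re.I)


def last_updated_year(page_text):
    """Guess the year a page was last updated, or return None."""
    match = LAST_UPDATED.search(page_text)
    if match:
        return int(match.group(1))
    # Fall back to the most recent year in any copyright notice.
    years = [
        int(year)
        for notice in COPYRIGHT.findall(page_text)
        for year in re.findall(YEAR, notice)
    ]
    return max(years) if years else None


def is_stale(year, as_of_year, max_age=2):
    """True if the page was last updated more than max_age years ago."""
    return as_of_year - year > max_age
```

As the slide notes, a date found this way only reflects what the page claims, so results would still need manual review.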
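
Slide 36 points out that the PMID serves as a unique identifier for an article, which is what lets records from Brede and SUMSdb be lined up. A minimal sketch of that join follows, assuming each source is a mapping from PMID to a list of activation foci. The PMIDs and coordinates are invented placeholders and the function name `merge_by_pmid` is hypothetical.

```python
def merge_by_pmid(brede, sumsdb):
    """Combine records for articles present in both sources, keyed by PMID."""
    shared = sorted(brede.keys() & sumsdb.keys())
    return {
        pmid: {"brede": brede[pmid], "sumsdb": sumsdb[pmid]}
        for pmid in shared
    }


# Invented example records: PMID -> list of (x, y, z) activation foci.
brede = {"10000001": [(-42, 18, 4)], "10000002": [(30, -60, 50)]}
sumsdb = {"10000002": [(28, -58, 52)], "10000003": [(0, 20, 40)]}

merged = merge_by_pmid(brede, sumsdb)
print(merged)
```

Keeping both sources' records side by side, rather than picking one, follows the slide's point that the data is additive.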

Editor's notes

  1. Lists all NIF resources registered at levels 2+ in the DISCO server. Shows their DISCO services and the location of DISCO files. Provides controls to filter, sort and page all resources.