This document proposes collaborating with the BioDBCore initiative to standardize the registration and description of biological databases. It identifies challenges in uniquely identifying databases due to unstable URLs. The proposal suggests adopting the MIRIAM registry's persistent identifiers to decouple identification from location. Benefits include globally identifying life science databases, improved discovery of relevant resources, and potential for BioDBCore to evolve into a database publishing platform. Open questions remain regarding technical details and integrating existing database lists.
GEN2PHEN GAM8 meeting Leiden - Identifiers for LSDBs
1. G. A. Thorisson, A. J. Webb, R. Dalgleish ULEIC / J. Muilu FIMM
Identification of G2P databases -
challenges and proposal for a solution
Gudmundur A. Thorisson <gt50@leicester.ac.uk> ULEIC
Adam J. Webb <ajw51@leicester.ac.uk> ULEIC
Raymond Dalgleish <ray@leicester.ac.uk> ULEIC
Juha Muilu <juha.muilu@helsinki.fi> FIMM
-- Overview --
✴ Identification difficulties - the Knowledge Centre perspective
✴ Or, why we need persistent identifiers for database resources
✴ Proposal to collaborate with the BioDBCore initiative
✴ standardizing registration & description of bio-databases
This work is published under the Creative Commons Attribution license
(CC BY: http://creativecommons.org/licenses/by/3.0/) which means that
it can be freely copied, redistributed and adapted, as long as proper
attribution is given.
GEN2PHEN 8th General Assembly Meeting, Leiden, Jan 24-25 2012 1
Friday, 27 January 12
2. Linking resources
[Figure: diagram with the labels "External records / annotations", "Databases", "DB maintainer" and "Submitter"; example variant entries c.301C>T, c.465A>G, c.555G>T, c.103C>T, c.321G>T]
3. URLs are unstable
http://subdomain.example.com/path/to/resource
• Domain names / subdomains can change
– hgvbaseg2p.org -> gwascentral.org
– server1.example.com -> server2.example.com
• Paths can change
– e.g. /LOVD2/ changes to /LOVD3/
• LSDB genes can move
– e.g. gene ADAM19 moves from one LOVD install to another
• Databases can merge
– i.e. entries for gene ADAM19 on two different installs are reconciled into a single install
4. 1:1 IDENTIFIER to DATA RESOURCE
• Gene name not suitable
– > 1 database for a given gene
• gene.lovd.nl -> returns list of databases (or redirects if only 1 is known)
– 1 to many
• lovd.nl/gene -> redirects to *one* database
– 1 to one, but many resources do not receive identifiers
• These are locators, not identifiers
• Non-gene based resources
• Ideally the identifier should also operate as the locator (like DOIs via a DOI resolution service)
– http://dx.doi.org/10.19192 resolves DOI 10.19192
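The point that an identifier can double as a locator can be sketched in a few lines of Python. This is an illustration only, not part of the original slides: it assumes the resolver base URL http://dx.doi.org/ shown above, and the function name doi_to_url is hypothetical.

```python
from urllib.parse import quote

# Resolver base URL as shown on the slide; the DOI itself stays location-free.
DOI_RESOLVER = "http://dx.doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI (the identifier) into a resolvable URL (the locator)."""
    doi = doi.strip()
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep '/' unescaped: it separates the DOI prefix from the suffix.
    return DOI_RESOLVER + quote(doi, safe="/")


# The slide's example DOI
print(doi_to_url("10.19192"))
```

If the resolution service moved, only DOI_RESOLVER would need to change; the stored identifier would stay the same.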
5. Proposal to collaborate with BioDBCore
• BioDBCore aims
– annotation - organize the bio-database ‘resourceome’
– discovery - e.g. which protein sequence databases are available?
• Who’s behind it?
– International Society for Biocuration
– Resource catalogues: Bioinformatics Links, BioSiteMaps, NAR db-issue etc.
– Working group includes reps from NAR and DATABASE journals, MIBBI, model organism db’s, CASIMIR mouse informatics consortium, others
7. Persistent resource identifiers in BioDBCore
• They plan to use MIRIAM registry / ID resolution service
– unique, persistent and unambiguous identification of various kinds of concepts
• http://identifiers.org/ec-code/1.1.1.1
• http://identifiers.org/pubmed/16333295
• http://identifiers.org/doi/10.1038/nbt1156
• Decouples identification from location
• Many resources are already registered with MIRIAM
• Operated by EBI <-- long-term sustainability prospect
• Adoption by players in the LS Semantic Web community
– URIs for identifying entities in biological information represented in RDF
– http://lsrn.org, Shared Names, Bio2RDF, others
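The three example URIs above suggest the pattern http://identifiers.org/<namespace>/<local id>. A minimal Python sketch of splitting such a URI into its parts follows; it assumes that pattern holds, and the names CompactId and parse_identifiers_org are hypothetical, not part of any MIRIAM or identifiers.org API.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class CompactId(NamedTuple):
    namespace: str  # data collection, e.g. "pubmed"
    local_id: str   # record identifier within that collection


def parse_identifiers_org(uri: str) -> CompactId:
    """Split an identifiers.org URI into (namespace, local id)."""
    parsed = urlparse(uri)
    if parsed.netloc != "identifiers.org":
        raise ValueError(f"not an identifiers.org URI: {uri!r}")
    # Split on the first '/' only, because local ids such as DOIs contain '/'.
    namespace, _, local_id = parsed.path.lstrip("/").partition("/")
    if not namespace or not local_id:
        raise ValueError(f"missing namespace or local id: {uri!r}")
    return CompactId(namespace, local_id)


# The three examples from the slide
for uri in (
    "http://identifiers.org/ec-code/1.1.1.1",
    "http://identifiers.org/pubmed/16333295",
    "http://identifiers.org/doi/10.1038/nbt1156",
):
    print(parse_identifiers_org(uri))
```

Neither part says anything about where the record is hosted, which is what decoupling identification from location means here.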
8. How might this work?
• Using database URIs - plausible scenario
– Persistent canonical URI: http://identifiers.org/biodbcore/10235900
– Click URL, browser redirects to http://biodbcore.org/resource/10235900
– BioDBCore metadata record for the database (akin to a “landing page” on an online journal site)
• BioDBCore “landing page” presents database metadata
– Information *about* the “thing”
– Name: Ehlers-Danlos Syndrome Variant Database
Main resource URL: https://eds.gene.le.ac.uk <-- the “thing” itself
[scope, data standards, other metadata]
• Location of database = the “thing” itself
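The scenario above can be sketched as a small lookup, shown here in Python. Everything in it is an assumption for illustration: the in-memory REGISTRY, the DatabaseRecord fields and the function names are hypothetical and do not describe how BioDBCore or identifiers.org are implemented. Only the record ID, the two URI patterns and the database name and URL come from the slide.

```python
from dataclasses import dataclass


@dataclass
class DatabaseRecord:
    """Metadata *about* the database, as shown on a landing page."""
    name: str
    main_url: str  # current location of the "thing" itself; may change


# Hypothetical registry keyed by the persistent record ID from the slide.
REGISTRY = {
    "10235900": DatabaseRecord(
        name="Ehlers-Danlos Syndrome Variant Database",
        main_url="https://eds.gene.le.ac.uk",
    ),
}


def canonical_uri(record_id: str) -> str:
    """Persistent URI that is cited and never changes."""
    return f"http://identifiers.org/biodbcore/{record_id}"


def landing_page(record_id: str) -> str:
    """Metadata page the canonical URI redirects to."""
    return f"http://biodbcore.org/resource/{record_id}"


def locate(record_id: str) -> str:
    """Current location of the database itself."""
    return REGISTRY[record_id].main_url


def relocate(record_id: str, new_url: str) -> None:
    """Record a move; the canonical URI is unaffected."""
    REGISTRY[record_id].main_url = new_url


print(canonical_uri("10235900"), "->", landing_page("10235900"))
print(REGISTRY["10235900"].name, "is at", locate("10235900"))
```

If the database moved, calling relocate would update only the stored main resource URL, while citations of the canonical URI would keep working.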
9. Mutual benefits
• To GEN2PHEN / G2P community
– Identification - slot into a resource identifier scheme for bio-databases globally, build more detailed catalogues & annotation systems around this
– Discovery - finding relevant LSDB and other G2P resources via a range of search/query tools outside the KC or LSDB lists
– BioDBCore could possibly evolve into a sort of live “database publishing platform”, instead of the static “snapshot” of conventional papers.
• To BioDBCore initiative
– Acquire an entire category’s worth of metadata records & link to community
– Extra pairs of eyes on what they’re doing, alternative perspective
– Potential for further collaboration on contrib. tracking tools & ORCID integration
10. Open questions, known unknowns etc.
• BioDBCore quite new, many things remain in flux
– e.g. the MIRIAM / identifiers.org technical details are vague
• DOIs for BioDBCore records - register database DOIs for fuller
integration into publishing process?
• How will this work with existing LSDB lists?
11. Acknowledgements
GEN2PHEN Consortium
http://www.gen2phen.org/about-gen2phen/partners
Prof Anthony J. Brookes Bioinformatics Group, Leicester
This work has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.
Contact me!
<gt50@le.ac.uk> | <gthorisson@gmail.com>
http://www.linkedin.com/in/mummi
http://www.twitter.com/gthorisson
http://www.gthorisson.name
Published under the CC BY license (http://creativecommons.org/licenses/by/3.0/)