BRIF Digital identifiers subgroup

              Gudmundur A. Thorisson <gt50@leicester.ac.uk> GEN2PHEN / University of Leicester
                     Pierre-Antoine Gourraud <pierreantoine.gourraud@ucsf.edu> UCSF



                                                  -- Overview --
             ‣Brief backgrounder on identification & digital identifiers
             ‣Use cases for bio-resource identification in BRIF
                   ‣Digital resources: datasets, databases (Mummi)
                   ‣Non-digital resources: projects, studies, cohorts [...] (Pierre)

             ‣Conclusions and next steps




                                       This work is published under the Creative Commons Attribution license
                                       (CC BY: http://creativecommons.org/licenses/by/3.0/) which means that
                                       it can be freely copied, redistributed and adapted, as long as proper
                                       attribution is given.


Monday, 22 October 12
BRIF and bio-resource identification
        • The identification requirement: need to identify resources in
          order to
              – track use/reuse and impact
              – credit those who contribute to them



        • Example: biobanking projects frequently rely on...
              – Project/study/cohort names
                    • Example: the GAZEL study in France, running for >20 years http://www.gazel.inserm.fr
                    • Challenges: - ad hoc agreements with research groups who reuse samples or data
                                  - painstaking manual searching through literature for mentions of ‘GAZEL’
                                  - project names are often ambiguous in a global context


              – Citations to journal publications
                    • Which paper to cite? Tricky to keep track of which citations are relevant to impact
                    • Also troublesome if there is no paper to cite (e.g. for a new study)
 BRIF workshop, Toulouse Oct 22 2012
Digital identifiers - some background
        • Definition: a digital identifier is a character string used to uniquely
            identify i) a digital object in a computer system, or ii) a record in a
            computer system which describes a non-digital object
        • Persistence - once assigned, an identifier MUST NOT change
        • Uniqueness - global scope vs local scope
              – Most ID schemes require tacit knowledge of the identifier type in order to interpret an ID
                    • Example: EC grant identifiers in acknowledgement statements
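To illustrate the tacit-knowledge problem: a bare local identifier such as 200754 is only unambiguous once the scheme that issued it is stated. A minimal Python sketch, assuming a hypothetical `scheme:local_id` notation and hypothetical scheme names (`ec-fp7-grant`, `local-biobank`); this is not an established standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualifiedId:
    """A local identifier paired with the (hypothetical) scheme that issued it."""
    scheme: str
    local_id: str

    def __str__(self) -> str:
        # Spell out the scheme so the ID can be read without tacit knowledge.
        return f"{self.scheme}:{self.local_id}"

    @classmethod
    def parse(cls, text: str) -> "QualifiedId":
        scheme, sep, local_id = text.partition(":")
        if not sep or not scheme or not local_id:
            raise ValueError(f"not a scheme-qualified identifier: {text!r}")
        return cls(scheme, local_id)


# The same local string issued under two different (hypothetical) schemes.
grant = QualifiedId("ec-fp7-grant", "200754")
sample = QualifiedId("local-biobank", "200754")

print(grant)            # ec-fp7-grant:200754
print(grant == sample)  # False: same local ID, different schemes
print(QualifiedId.parse("ec-fp7-grant:200754") == grant)  # True
```

The bare string "200754" is rejected by `parse`, which is the point: without the scheme there is nothing to interpret it against.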




This work has received funding from the European Community's
          Seventh Framework Programme (FP7/2007-2013) under grant
          agreement number 200754 - the GEN2PHEN project.




This work has received funding [...] under grant
          agreement number 200754




Digital identifiers - some background

        • Some problem domains require globally unique IDs
              – Example: ISBNs to identify books, e.g. for copyright purposes

        • Some problem domains require resolvable IDs
              – Resolve = retrieve information about the thing being identified, including where
                to access it (for a digital object, its location on the Internet)
              – Digital Object Identifiers (DOIs) are the best known, but several other systems exist

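Two small Python sketches of the properties above. ISBN-13s end in a mod-10 check digit (weights alternating 1 and 3), which lets software catch mistyped identifiers; DOI names become resolvable by prefixing them with the public doi.org resolver address. The DOI `10.1234/example-dataset` is hypothetical.

```python
from urllib.parse import quote


def isbn13_is_valid(isbn: str) -> bool:
    """Check an ISBN-13 against its mod-10 check digit (weights 1, 3, 1, 3, ...)."""
    digits = [int(c) for c in isbn if c in "0123456789"]
    if len(digits) != 13:
        return False
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


def doi_resolver_url(doi: str) -> str:
    """Turn a DOI name ("10.<registrant>/<suffix>") into a doi.org URL."""
    prefix, sep, suffix = doi.partition("/")
    if not prefix.startswith("10.") or not sep or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    # Percent-encode everything except '/' so characters such as '#' or '?'
    # in the suffix cannot be misread as URL syntax.
    return "https://doi.org/" + quote(doi, safe="/")


print(isbn13_is_valid("978-0-306-40615-7"))       # True
print(isbn13_is_valid("978-0-306-40615-8"))       # False
print(doi_resolver_url("10.1234/example-dataset"))  # https://doi.org/10.1234/example-dataset
```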
Identifier use cases in BRIF
       • Three broad categories of “stuff” to identify


            i) Digital resources
           Resources that actually “live” in computers (born-digital or digitized content):
           datasets and databases

            ii) Physical resources
           Resources corresponding to actual physical things: samples, groups of samples,
           experimental instruments, etc.

            iii) Project-level and other “meta” resources
           Higher-level aggregates of things: projects, organizations, consortia etc.


           NB in many cases identifiers already exist for these things, but they are
           not exposed to the outside world in a usable form (i.e. made resolvable,
           citable, globally-unique).
Datasets
        • Definition: a data set (or dataset) is a collection of data, often presented in
            tabular form but in the bio-sciences also frequently in a multitude of
            domain-specific formats, such as FASTA for biological sequences
        • Data publication and data citation are hot topics, with lots of
          research and infrastructure-building activity in recent years
        • Emerging best practices for data citation & attribution
        • Identifiers for datasets: persistent data DOIs issued via DataCite


        • Little new for BRIF to add here, except to issue recommendations
              – KEY POINT: infrastructure for data preservation and access is a prerequisite for any
                sort of persistent bio-dataset identification scheme. Many projects don’t have this!
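Once a dataset has a DOI, a data citation can be assembled mechanically from its metadata. A minimal Python sketch that roughly follows the pattern DataCite has recommended (Creator (PublicationYear): Title. Publisher. Identifier); the `DatasetRecord` class and every field value are hypothetical, and a real DataCite record carries more metadata.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Minimal, hypothetical metadata for a dataset that has a DOI."""
    creators: List[str]
    publication_year: int
    title: str
    publisher: str  # e.g. the repository holding the data
    doi: str


def format_data_citation(rec: DatasetRecord) -> str:
    # Creator (PublicationYear): Title. Publisher. Identifier
    creators = "; ".join(rec.creators)
    return (f"{creators} ({rec.publication_year}): {rec.title}. "
            f"{rec.publisher}. https://doi.org/{rec.doi}")


record = DatasetRecord(
    creators=["Doe, Jane", "Roe, Richard"],
    publication_year=2012,
    title="Example cohort genotype dataset",
    publisher="Example Data Repository",
    doi="10.1234/example-dataset",
)
print(format_data_citation(record))
```

Because the identifier is part of the citation string, later mining of reference lists can match the DOI exactly instead of searching for the dataset's name.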




Data DOI scenario (simplified)
        1. Research group registers a dataset and metadata in a suitable domain
            repository (or their own repository)


        2. Repository archives the dataset and assigns a DOI name to it

        3. Unique DOI name is used by article authors (and others) to indicate resource
            reuse (ideally via formal data citation)


        4. Journal article reference listings & full-text and other sources are mined to
            identify references to dataset and/or downloads


        5. Dataset-level metrics calculated from collected data
            e.g. - total no. citations in scholarly articles
                 - no. secondary citations (citations to papers which cited the original dataset)
                  - no. downloads in the last 2 years
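Steps 4 and 5 can be sketched in Python: scan reference lists for DOI names, then count direct citations, secondary citations and recent downloads. This is a simplified illustration under stated assumptions: the DOI regular expression is adapted from a pattern Crossref has published, all DOIs and dates are hypothetical, and "last 2 years" is approximated as 730 days.

```python
import re
from datetime import date, timedelta
from typing import Dict, List, Set

# Adapted from a pattern Crossref has published for modern DOIs; it is
# deliberately loose and will miss or over-capture some unusual DOI names.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Z0-9]+", re.IGNORECASE)


def extract_dois(text: str) -> Set[str]:
    """Step 4: find DOI names in free text, lower-cased (DOIs are case-insensitive)."""
    return {m.rstrip(".").lower() for m in DOI_PATTERN.findall(text)}


def dataset_metrics(dataset_doi: str, references: Dict[str, str],
                    downloads: List[date], today: date) -> Dict[str, int]:
    """Step 5: simple dataset-level metrics.

    references maps each article's DOI to the text of its reference list;
    downloads holds one date per download of the dataset.
    """
    target = dataset_doi.lower()
    cites = {article.lower(): extract_dois(text) for article, text in references.items()}

    # Articles whose reference lists mention the dataset DOI.
    citing = {a for a, refs in cites.items() if target in refs}
    # Secondary citations: every citation to one of those citing articles.
    secondary = sum(len(refs & citing) for refs in cites.values())
    # "Last 2 years" approximated as the last 730 days.
    cutoff = today - timedelta(days=730)
    recent = sum(1 for d in downloads if d >= cutoff)
    return {"citations": len(citing),
            "secondary_citations": secondary,
            "downloads_last_2y": recent}


# Hypothetical reference lists, keyed by (hypothetical) article DOIs.
references = {
    "10.5555/paper-a": "Doe J (2012): Example dataset. doi:10.1234/example-dataset.",
    "10.5555/paper-b": "Data: https://doi.org/10.1234/EXAMPLE-DATASET and 10.5555/paper-a.",
    "10.5555/paper-c": "Builds on 10.5555/paper-a and 10.5555/paper-b.",
    "10.5555/paper-d": "Unrelated work, 10.9999/other.",
}
downloads = [date(2012, 10, 1), date(2011, 1, 1), date(2009, 5, 5)]
print(dataset_metrics("10.1234/example-dataset", references, downloads, date(2012, 10, 22)))
# {'citations': 2, 'secondary_citations': 3, 'downloads_last_2y': 2}
```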



ORCID and DataCite Interoperability Network

        • Persistent identifiers for connecting people and
          datasets
        • 2-year EC-funded project, 7 partners in Europe and the USA
        • Two main proof-of-concept pilots
              – Social Science data - use and citation of British Birth Cohort
                Studies
                    • historical data, decades old, steadily being curated by lots of
                      different people
                    • high rate of reuse, often cited in papers
              – High-energy physics - attribution challenges
                    • dealing with the large number of authors on HEP papers: ‘dilution’ of the
                      term authorship
                    • Linking HEP papers to supporting datasets


                                       http://odin-project.eu/
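ORCID iDs, which ODIN links to dataset DOIs, end in a check character computed with ISO 7064 MOD 11-2, so software can reject mistyped iDs before trying to resolve them. A minimal Python sketch; `0000-0002-1825-0097` is the example iD used in ORCID's own documentation.

```python
def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def orcid_is_valid(orcid: str) -> bool:
    """Validate a hyphenated ORCID iD such as 0000-0002-1825-0097."""
    chars = orcid.replace("-", "").upper()
    if len(chars) != 16 or not all(c in "0123456789" for c in chars[:15]):
        return False
    return orcid_check_char(chars[:15]) == chars[15]


print(orcid_is_valid("0000-0002-1825-0097"))  # True
print(orcid_is_valid("0000-0002-1825-0098"))  # False (last character mistyped)
print(orcid_check_char("000000021694233"))    # X (a remainder of 10 is written as X)
```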
Databases
        • Definition: an online database can be regarded as a collection of
            data, but made accessible in such a way that facilitates using the data
            to answer scientific questions, via structured querying and/or free-text
            searching of the data over the Internet
        • Broad range, from large-scale DNA and protein sequence
          repositories to small locus-specific databases
              – E.g. GenBank, UniProt, GWAS Central, Ehlers-Danlos Syndrome Variant Database



        • Challenges in assessing impact & attributing curators
              – Reliance on citations to the database paper, if there is one (sometimes there are many)
                    • Analyzing website traffic is another indicator - highly-accessed database =~ important
              – Database URLs sometimes change
              – Database name + URL often mentioned only in Materials & Methods, with no citation
              – Credit via authorship impossible if there is no database journal paper
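Changing database URLs are the problem that resolvable persistent identifiers address: citations point at a stable identifier, and a resolver keeps the mapping to the current location up to date. A toy in-memory Python sketch; the `PidRegistry` class, the `db:0001` identifier and the example.org URLs are hypothetical, and real systems such as the DOI/Handle infrastructure add persistent storage, governance and metadata.

```python
from typing import Dict


class PidRegistry:
    """Toy resolver: stable identifiers mapped to current, changeable URLs."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def register(self, pid: str, url: str) -> None:
        if pid in self._locations:
            raise ValueError(f"{pid} already assigned; identifiers must not be reused")
        self._locations[pid] = url

    def update_location(self, pid: str, new_url: str) -> None:
        # The database moved: only the target changes, the identifier does not.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def resolve(self, pid: str) -> str:
        return self._locations[pid]


registry = PidRegistry()
registry.register("db:0001", "http://old-host.example.org/variantdb/")
# Papers cite "db:0001", not the URL, so this move does not break them.
registry.update_location("db:0001", "https://new-host.example.org/variantdb/")
print(registry.resolve("db:0001"))  # https://new-host.example.org/variantdb/
```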
BioDBCore - global catalogue of bio-db’s
     • BioDBCore aims
           – annotation - organize the bio-database
             ‘resourceome’
           – discovery - e.g. which protein sequence
             databases are available?

     • Who’s behind it?
           – International Society for Biocuration
           – Resource catalogues: Bioinformatics Links,
             BioSiteMaps, NAR db-issue etc
           – Working group includes reps from NAR and
             DATABASE journals, MIBBI, Model
             organism db’s, others

     • Catalogue will have persistent
       identifiers for each db entry

             http://www.biosharing.org/biodbcore
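The two BioDBCore aims, annotation and discovery, can be pictured as a catalogue of records keyed by persistent identifiers and queried by data type. A toy Python sketch; the `CatalogueEntry` fields and the `cat:000N` identifiers are hypothetical and do not reflect the actual BioDBCore schema.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class CatalogueEntry:
    """Hypothetical, simplified catalogue record (not the BioDBCore schema)."""
    pid: str                    # persistent identifier for the catalogue entry
    name: str
    data_types: FrozenSet[str]  # annotation: what kind of data the db holds


def find_by_data_type(catalogue: List[CatalogueEntry], data_type: str) -> List[str]:
    # Discovery: which catalogued databases hold this kind of data?
    return sorted(e.name for e in catalogue if data_type in e.data_types)


catalogue = [
    CatalogueEntry("cat:0001", "UniProt", frozenset({"protein sequence"})),
    CatalogueEntry("cat:0002", "GenBank", frozenset({"nucleotide sequence"})),
    CatalogueEntry("cat:0003", "GWAS Central", frozenset({"genetic association"})),
]
print(find_by_data_type(catalogue, "protein sequence"))  # ['UniProt']
```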
•[slot in Pierre]




From Patients to BioBanks and back…
       • Persistent IDs for datasets & other digital resources
             – Absolute need
       • From BioresourceResearchIF to BioresourceXIF
             – More than an IP address?
       • Increasing need for identification of sources of information
         in general
             – Not only for research purposes…
             – “Big data”
             – Quantified self
       • Blurring the border between research data (non-CLIA),
         clinically approved data and consumer-centered data
Database Gateway & Computations
[Diagram labels: User data, Imaging, Reference, Individual data, Groups of patients, Front-end tablet application]

Copyright © 2012 The Regents of the University of California, USA - All rights reserved.
Conclusions / next steps
        • Complex landscape, lots of problems to tackle
        • Key challenge will be to get authors to use the right identifiers
              – education, awareness, best practices, journal guidelines etc.
              – build support into tools that researchers use



        • Potential outputs from BRIF subgroup, by end of GEN2PHEN
              – Continue work on whitepaper on identifiers (partially drafted earlier in the year)
              – Compile recommendations for authors & biobankers, for use cases where workable
                solutions exist or are emerging (data DOIs, BioDBCore)

        • Need help from biobanking experts in the ID subgroup!
              – Esp. to look in-depth into study catalogues with established identifier schemes
                    • International Clinical Trials Registry Platform
                    • ClinicalTrials.gov
                    • P3G study catalogue
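Established schemes help because their identifiers have a fixed, machine-recognisable syntax. ClinicalTrials.gov registry numbers, for example, are "NCT" followed by eight digits, so mentions can be found in methods sections automatically rather than by manual searching for study names. A minimal Python sketch; the trial numbers in the sample text are hypothetical.

```python
import re
from typing import List

# ClinicalTrials.gov registry numbers: "NCT" followed by eight digits.
NCT_PATTERN = re.compile(r"\bNCT\d{8}\b")


def find_trial_ids(text: str) -> List[str]:
    """Return distinct ClinicalTrials.gov IDs mentioned in free text, in order."""
    seen: List[str] = []
    for match in NCT_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


methods = ("Participants were recruited in NCT01234567 and a follow-up study "
           "(NCT07654321); see also NCT01234567 and NCT123.")
print(find_trial_ids(methods))  # ['NCT01234567', 'NCT07654321']
```

Unlike a study name such as ‘GAZEL’, a match here is unambiguous in a global context, though a study catalogue is still needed to resolve the number to the study record.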
Acknowledgements
   GEN2PHEN Consortium
         http://www.gen2phen.org/about-gen2phen/partners
   Prof Anthony J. Brookes Bioinformatics Group, Leicester

   This work has received funding from the European Community's Seventh
   Framework Programme (FP7/2007-2013) under grant agreement number 200754 -
   the GEN2PHEN project.


                              Contact me!

        <gt50@le.ac.uk> | <gthorisson@gmail.com>
              http://www.linkedin.com/in/mummi
              http://www.twitter.com/gthorisson
              http://www.gthorisson.name

   Published under the CC BY license (http://creativecommons.org/licenses/by/3.0/)




DataCite – Bridging the gap and helping to find, access and reuse data – Herb...
 

Más de Gudmundur Thorisson

Staða opins aðgangs á Íslandi
Staða opins aðgangs á ÍslandiStaða opins aðgangs á Íslandi
Staða opins aðgangs á ÍslandiGudmundur Thorisson
 
ODIN 1st year Conference Oct 2013 Interoperability: connecting identifiers
ODIN 1st year Conference Oct 2013 Interoperability: connecting identifiersODIN 1st year Conference Oct 2013 Interoperability: connecting identifiers
ODIN 1st year Conference Oct 2013 Interoperability: connecting identifiersGudmundur Thorisson
 
OA útskýrt: hvað er opinn aðgangur og af hverju?
OA útskýrt: hvað er opinn aðgangur og af hverju?OA útskýrt: hvað er opinn aðgangur og af hverju?
OA útskýrt: hvað er opinn aðgangur og af hverju?Gudmundur Thorisson
 
BRIF workshop Toulouse 2012 ORCID intro and status update
BRIF workshop Toulouse 2012 ORCID intro and status updateBRIF workshop Toulouse 2012 ORCID intro and status update
BRIF workshop Toulouse 2012 ORCID intro and status updateGudmundur Thorisson
 
GEN2PHEN GAM9 Toulouse - Launching the ORCID system, what do we do now?
GEN2PHEN GAM9 Toulouse - Launching the ORCID system, what do we do now?GEN2PHEN GAM9 Toulouse - Launching the ORCID system, what do we do now?
GEN2PHEN GAM9 Toulouse - Launching the ORCID system, what do we do now?Gudmundur Thorisson
 
ORCID Outreach Meeting dev breakout session
ORCID Outreach Meeting dev breakout sessionORCID Outreach Meeting dev breakout session
ORCID Outreach Meeting dev breakout sessionGudmundur Thorisson
 
RDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research DataRDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research DataGudmundur Thorisson
 
VIVO conference Aug 2011: The VIVO platform and ORCID in the scholarly identi...
VIVO conference Aug 2011: The VIVO platform and ORCID in the scholarly identi...VIVO conference Aug 2011: The VIVO platform and ORCID in the scholarly identi...
VIVO conference Aug 2011: The VIVO platform and ORCID in the scholarly identi...Gudmundur Thorisson
 
ORCID participant meeting May 2011: The digital scholar, identity on the Web ...
ORCID participant meeting May 2011: The digital scholar, identity on the Web ...ORCID participant meeting May 2011: The digital scholar, identity on the Web ...
ORCID participant meeting May 2011: The digital scholar, identity on the Web ...Gudmundur Thorisson
 
Data Citation Principles Harvard May 2011: ORCID and data publication - Ident...
Data Citation Principles Harvard May 2011: ORCID and data publication - Ident...Data Citation Principles Harvard May 2011: ORCID and data publication - Ident...
Data Citation Principles Harvard May 2011: ORCID and data publication - Ident...Gudmundur Thorisson
 
sameAs London May 2011: The digital scholar, identity on the Web and ORCID
sameAs London May 2011: The digital scholar, identity on the Web and ORCIDsameAs London May 2011: The digital scholar, identity on the Web and ORCID
sameAs London May 2011: The digital scholar, identity on the Web and ORCIDGudmundur Thorisson
 
DataCite workshop at BL April 2011
DataCite workshop at BL April 2011DataCite workshop at BL April 2011
DataCite workshop at BL April 2011Gudmundur Thorisson
 
NIH VIVO workshop Indiana March 2011
NIH VIVO workshop Indiana March 2011NIH VIVO workshop Indiana March 2011
NIH VIVO workshop Indiana March 2011Gudmundur Thorisson
 
Identity in research data publication - meeting with SageCite people march2011
Identity in research data publication - meeting with SageCite people march2011Identity in research data publication - meeting with SageCite people march2011
Identity in research data publication - meeting with SageCite people march2011Gudmundur Thorisson
 
Thorisson science online london sep2010
Thorisson science online london sep2010Thorisson science online london sep2010
Thorisson science online london sep2010Gudmundur Thorisson
 

Más de Gudmundur Thorisson (16)

Staða opins aðgangs á Íslandi
Staða opins aðgangs á ÍslandiStaða opins aðgangs á Íslandi
Staða opins aðgangs á Íslandi
 
ODIN 1st year Conference Oct 2013 Interoperability: connecting identifiers
ODIN 1st year Conference Oct 2013 Interoperability: connecting identifiersODIN 1st year Conference Oct 2013 Interoperability: connecting identifiers
ODIN 1st year Conference Oct 2013 Interoperability: connecting identifiers
 
Elsevier webinar New York
Elsevier webinar New YorkElsevier webinar New York
Elsevier webinar New York
 
OA útskýrt: hvað er opinn aðgangur og af hverju?
OA útskýrt: hvað er opinn aðgangur og af hverju?OA útskýrt: hvað er opinn aðgangur og af hverju?
OA útskýrt: hvað er opinn aðgangur og af hverju?
 
BRIF workshop Toulouse 2012 ORCID intro and status update
BRIF workshop Toulouse 2012 ORCID intro and status updateBRIF workshop Toulouse 2012 ORCID intro and status update
BRIF workshop Toulouse 2012 ORCID intro and status update
 
GEN2PHEN GAM9 Toulouse - Launching the ORCID system, what do we do now?
GEN2PHEN GAM9 Toulouse - Launching the ORCID system, what do we do now?GEN2PHEN GAM9 Toulouse - Launching the ORCID system, what do we do now?
GEN2PHEN GAM9 Toulouse - Launching the ORCID system, what do we do now?
 
ORCID Outreach Meeting dev breakout session
ORCID Outreach Meeting dev breakout sessionORCID Outreach Meeting dev breakout session
ORCID Outreach Meeting dev breakout session
 
RDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research DataRDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research Data
 
VIVO conference Aug 2011: The VIVO platform and ORCID in the scholarly identi...
VIVO conference Aug 2011: The VIVO platform and ORCID in the scholarly identi...VIVO conference Aug 2011: The VIVO platform and ORCID in the scholarly identi...
VIVO conference Aug 2011: The VIVO platform and ORCID in the scholarly identi...
 
ORCID participant meeting May 2011: The digital scholar, identity on the Web ...
ORCID participant meeting May 2011: The digital scholar, identity on the Web ...ORCID participant meeting May 2011: The digital scholar, identity on the Web ...
ORCID participant meeting May 2011: The digital scholar, identity on the Web ...
 
Data Citation Principles Harvard May 2011: ORCID and data publication - Ident...
Data Citation Principles Harvard May 2011: ORCID and data publication - Ident...Data Citation Principles Harvard May 2011: ORCID and data publication - Ident...
Data Citation Principles Harvard May 2011: ORCID and data publication - Ident...
 
sameAs London May 2011: The digital scholar, identity on the Web and ORCID
sameAs London May 2011: The digital scholar, identity on the Web and ORCIDsameAs London May 2011: The digital scholar, identity on the Web and ORCID
sameAs London May 2011: The digital scholar, identity on the Web and ORCID
 
DataCite workshop at BL April 2011
DataCite workshop at BL April 2011DataCite workshop at BL April 2011
DataCite workshop at BL April 2011
 
NIH VIVO workshop Indiana March 2011
NIH VIVO workshop Indiana March 2011NIH VIVO workshop Indiana March 2011
NIH VIVO workshop Indiana March 2011
 
Identity in research data publication - meeting with SageCite people march2011
Identity in research data publication - meeting with SageCite people march2011Identity in research data publication - meeting with SageCite people march2011
Identity in research data publication - meeting with SageCite people march2011
 
Thorisson science online london sep2010
Thorisson science online london sep2010Thorisson science online london sep2010
Thorisson science online london sep2010
 

BRIF workshop Toulouse 2012 Digital IDs subgroup

  • 1. BRIF Digital identifiers subgroup
      Gudmundur A. Thorisson <gt50@leicester.ac.uk> GEN2PHEN / University of Leicester
      Pierre-Antoine Gourraud <pierreantoine.gourraud@ucsf.edu> UCSF
      -- Overview --
      ‣ Brief backgrounder on identification & digital identifiers
      ‣ Use cases for bio-resource identification in BRIF
          ‣ Digital resources: datasets, databases (Mummi)
          ‣ Non-digital resources: projects, studies, cohorts [...] (Pierre)
      ‣ Conclusions and next steps
      This work is published under the Creative Commons Attribution license (CC BY: http://creativecommons.org/licenses/by/3.0/), which means that it can be freely copied, redistributed and adapted, as long as proper attribution is given.
      Monday, 22 October 12
  • 2. BRIF and bio-resource identification
      • The identification requirement: need to identify resources in order to
          – track use/reuse and impact
          – credit those who contribute to them
      • Biobanking projects have relied on:
          – Project/study/cohort names
              • Example: the GAZEL study in France, running for >20 years http://www.gazel.inserm.fr
              • Challenges:
                  - ad hoc agreements with research groups who reuse samples or data
                  - painstaking manual searching through the literature for mentions of ‘GAZEL’
                  - project names are often ambiguous in a global context
      BRIF workshop, Toulouse Oct 22 2012
  • 4. BRIF and bio-resource identification
      • The identification requirement: need to identify resources in order to
          – track use/reuse and impact
          – credit those who contribute to them
      • Example: biobanking projects frequently rely on...
          – Project/study/cohort names
              • Example: the GAZEL study in France, running for >20 years http://www.gazel.inserm.fr
              • Challenges:
                  - ad hoc agreements with research groups who reuse samples or data
                  - painstaking manual searching through the literature for mentions of ‘GAZEL’
                  - project names are often ambiguous in a global context
          – Citations to journal publications
              • Which paper to cite? Tricky to keep track of which citations are relevant to impact
              • Also troublesome if there is no paper to cite (e.g. for a new study)
  • 5. Digital identifiers - some background
      • Definition: a digital identifier is a character string used to uniquely identify
          i) a digital object in a computer system, or
          ii) a record in a computer system which describes a non-digital object
      • Persistence - once assigned, an identifier MUST NOT change
      • Uniqueness - global scope vs local scope
          – Most ID schemes require tacit knowledge of the type of identifier to interpret it
              • Example: EC grant identifiers in acknowledgement statements
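The point about tacit knowledge can be made concrete: a bare string such as `200754` only reads as an EC grant number if the reader already knows the context. One common remedy is to qualify a local ID with a namespace prefix so that the string carries its own type. The sketch below assumes a simple `namespace:local_id` convention; the prefixes `ec-fp7-grant` and `isbn` and the functions `qualify` and `parse` are hypothetical and are not taken from any identifier registry.

```python
# Hypothetical namespace prefixes, for illustration only; they are not
# taken from any official identifier registry.
KNOWN_NAMESPACES = {
    "ec-fp7-grant": "European Community FP7 grant agreement number",
    "isbn": "International Standard Book Number",
}


def qualify(namespace: str, local_id: str) -> str:
    """Turn a locally scoped ID into a self-describing 'namespace:local_id' string."""
    if namespace not in KNOWN_NAMESPACES:
        raise ValueError(f"unknown namespace: {namespace!r}")
    if not local_id:
        raise ValueError("local ID must not be empty")
    return f"{namespace}:{local_id}"


def parse(qualified_id: str) -> tuple[str, str]:
    """Split a qualified ID back into (namespace, local_id); reject bare local IDs."""
    namespace, sep, local_id = qualified_id.partition(":")
    if not sep or namespace not in KNOWN_NAMESPACES or not local_id:
        raise ValueError(f"not a namespace-qualified ID: {qualified_id!r}")
    return namespace, local_id


# The bare number is ambiguous; the qualified form states what it is.
grant_id = qualify("ec-fp7-grant", "200754")
print(grant_id)                              # ec-fp7-grant:200754
print(KNOWN_NAMESPACES[parse(grant_id)[0]])  # European Community FP7 grant agreement number
```

A prefix only removes ambiguity among the namespaces a reader recognises; agreeing on the prefixes themselves, and making the result resolvable, is the job of a shared identifier scheme.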
  • 6. This work has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.
  • 7. This work has received funding under grant agreement number 200754
  • 8. Digital identifiers - some background
      • Definition: a digital identifier is a character string used to uniquely identify
          i) a digital object in a computer system, or
          ii) a record in a computer system which describes a non-digital object
      • Persistence - once assigned, an identifier MUST NOT change
      • Uniqueness - global scope vs local scope
          – Most ID schemes require tacit knowledge of the type of identifier to interpret it
              • Example: EC grant identifiers
      • Some problem domains require globally unique IDs
          – Example: ISBNs to identify books, e.g. for copyright purposes
      • Some problem domains require resolvable IDs
          – Resolve = retrieve information about the thing being identified, including where to access it (for a digital object, its location on the Internet)
          – Digital Object Identifiers (DOIs) are the best known, but several other systems exist
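ISBNs, the slide's example of globally unique IDs, also show a small but useful design feature: the last digit of an ISBN-13 is a check digit, so that the digits weighted alternately 1 and 3 sum to a multiple of 10. This catches most transcription errors; global uniqueness itself comes from how ISBNs are allocated, not from the checksum. A minimal validation sketch (function names are illustrative):

```python
DIGITS = "0123456789"


def isbn13_check_digit(first12: str) -> int:
    """Compute the ISBN-13 check digit from the first 12 digits."""
    if len(first12) != 12 or not all(c in DIGITS for c in first12):
        raise ValueError("expected exactly 12 ASCII digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Check an ISBN-13, ignoring hyphens and spaces used for readability."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not all(c in DIGITS for c in digits):
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


print(is_valid_isbn13("978-0-306-40615-7"))  # True
print(is_valid_isbn13("978-0-306-40615-8"))  # False (last digit mistyped)
```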
  • 10. Identifier use cases in BRIF
      • Three broad categories of “stuff” to identify:
          i) Digital resources
             Resources that actually “live” in computers (born-digital or digitized content): datasets and databases
          ii) Physical resources
             Resources corresponding to actual physical things: samples, groups of samples, experimental instruments, etc.
          iii) Project-level and other “meta” resources
             Higher-level aggregates of things: projects, organizations, consortia etc.
      NB: in many cases identifiers already exist for these things, but they are not exposed to the outside world in a usable form (i.e. made resolvable, citable and globally unique).
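The three categories and the closing NB can be read as a small data model: every resource has some internal ID, but only some also have a public, citable one. The sketch below is a hypothetical illustration; the class names, the example resources, their internal IDs and the DOI `10.1234/example-dataset` are all invented for the purpose.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ResourceCategory(Enum):
    DIGITAL = "digital"    # e.g. datasets, databases
    PHYSICAL = "physical"  # e.g. samples, groups of samples, instruments
    META = "meta"          # e.g. projects, organizations, consortia


@dataclass
class BioResource:
    name: str
    category: ResourceCategory
    internal_id: str                 # ID used only inside the holding institution
    public_id: Optional[str] = None  # resolvable, citable, globally unique ID, if any


def unexposed(resources: list[BioResource]) -> list[BioResource]:
    """Resources that are identified internally but not exposed to the outside world."""
    return [r for r in resources if r.public_id is None]


# Hypothetical inventory: only the dataset has a public identifier.
inventory = [
    BioResource("Example dataset", ResourceCategory.DIGITAL, "DS-001", "10.1234/example-dataset"),
    BioResource("Serum samples, batch 7", ResourceCategory.PHYSICAL, "BOX-7"),
    BioResource("Example cohort", ResourceCategory.META, "COH-1"),
]
for r in unexposed(inventory):
    print(r.category.value, r.internal_id)
# physical BOX-7
# meta COH-1
```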
  • 11. Datasets
      • Definition: a data set (or dataset) is a collection of data, often presented in tabular form but in the biosciences also frequently in a multitude of domain-specific formats, such as FASTA for biological sequences
      • Data publication and data citation are hot topics - lots of research and infrastructure-building activity in recent years
      • Emerging best practices for data citation & attribution
      • Identifiers for datasets - persistent data DOIs issued via DataCite
      • Little new for BRIF to add here, except to issue recommendations
          – KEY POINT: infrastructure for data preservation and access is a prerequisite for any sort of persistent bio-dataset identification scheme. Many projects don’t have this!
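What makes a data DOI resolvable in practice is that any DOI name can be turned into a URL on the doi.org proxy, which redirects to wherever the dataset currently lives. Citations in the wild write DOIs in several forms (`doi:` labels, `dx.doi.org` URLs), so a tool first has to normalise them. A minimal sketch, assuming only the `10.<prefix>/<suffix>` shape of DOI names; the DOI `10.1234/example-dataset` and the function names are hypothetical.

```python
from urllib.parse import quote

DOI_PROXY = "https://doi.org/"
# Common ways a DOI is written; all are stripped down to the bare DOI name.
KNOWN_PREFIXES = ("https://doi.org/", "http://doi.org/",
                  "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(raw: str) -> str:
    """Reduce a DOI written as a URL or with a 'doi:' label to the bare DOI name."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # A DOI name is <prefix>/<suffix>, and the prefix starts with "10.".
    doi_prefix, sep, suffix = doi.partition("/")
    if not doi_prefix.startswith("10.") or not sep or not suffix:
        raise ValueError(f"not a DOI name: {raw!r}")
    return doi


def doi_to_url(raw: str) -> str:
    """Build a resolvable URL by sending the DOI name through the doi.org proxy."""
    return DOI_PROXY + quote(normalize_doi(raw), safe="/")


print(doi_to_url("doi:10.1234/example-dataset"))
# https://doi.org/10.1234/example-dataset
```

Percent-encoding the suffix keeps the URL valid when a DOI contains spaces or other characters that are not allowed in URLs.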
  • 12. Data DOI scenario (simplified)
      1. A research group registers a dataset and its metadata in a suitable domain repository (or their own repository)
      2. The repository archives the dataset and assigns a DOI name to it
      3. The unique DOI name is used by article authors (and others) to indicate reuse of the resource (ideally via formal data citation)
      4. Journal article reference listings, full text and other sources are mined to identify references to the dataset and/or downloads
      5. Dataset-level metrics are calculated from the collected data, e.g.
          - total no. of citations in scholarly articles
          - no. of secondary citations (citations to papers which cited the original dataset)
          - no. of downloads in the last 2 years
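Steps 4 and 5 of the scenario can be sketched as two small functions: one that mines DOI names out of reference text, and one that counts direct and secondary citations of a dataset from the resulting citation graph. This is a simplified illustration under stated assumptions: the regular expression is a loose heuristic that will miss edge cases, every DOI below is hypothetical, and download counts are not covered.

```python
import re

# Simplified DOI pattern (an assumption, loosely modelled on commonly used
# heuristics); real reference strings contain edge cases it will miss.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(text: str) -> list[str]:
    """Step 4: find DOI names in reference text, dropping trailing punctuation.

    DOI names are case-insensitive, so they are lower-cased for comparison.
    """
    return [m.rstrip(".,;").lower() for m in DOI_PATTERN.findall(text)]


def dataset_metrics(dataset_doi: str, cites: dict[str, list[str]]) -> dict[str, int]:
    """Step 5: direct and secondary citation counts for one dataset.

    `cites` maps each citing paper's DOI to the DOIs found in its reference list.
    """
    target = dataset_doi.lower()
    direct = {paper for paper, refs in cites.items() if target in refs}
    # Secondary citations: citations to papers which cited the dataset.
    secondary = sum(1 for refs in cites.values() for ref in refs if ref in direct)
    return {"citations": len(direct), "secondary_citations": secondary}


# Hypothetical reference lists, keyed by the (hypothetical) DOI of each paper.
reference_lists = {
    "10.1234/paper-a": "Example data. doi:10.1234/example-dataset.",
    "10.1234/paper-b": "Paper A, https://doi.org/10.1234/PAPER-A",
    "10.1234/paper-c": "See 10.1234/paper-a and data at doi:10.1234/example-dataset",
}
cites = {paper: extract_dois(text) for paper, text in reference_lists.items()}
print(dataset_metrics("10.1234/example-dataset", cites))
# {'citations': 2, 'secondary_citations': 2}
```

Here paper A and paper C cite the dataset directly, and paper B and paper C each cite paper A, giving two secondary citations.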
  • 13. ORCID and DataCite Interoperability Network (ODIN)
      • Persistent identifiers for connecting people and datasets
      • 2-year EC-funded project, 7 partners in Europe + USA
      • Two main proof-of-concept pilots
          – Social science data - use and citation of the British Birth Cohort Studies
              • historical data, decades old, steadily being curated by many different people
              • high rate of reuse, often cited in papers
          – High-energy physics (HEP) - attribution challenges
              • dealing with the large number of authors on HEP papers - ‘dilution’ of the term authorship
              • linking HEP papers to supporting datasets
      http://odin-project.eu/
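On the "people" side of ODIN, ORCID iDs follow the same design pattern as ISBNs: 16 characters written in four groups, where the last character is a check character computed over the first 15 digits with ISO 7064 MOD 11-2 (so it may be `X`). The sketch below validates only the syntax; it does not confirm that the iD has actually been issued. The example iD `0000-0002-1825-0097` is the sample used in ORCID's own documentation; the function names are illustrative.

```python
DIGITS = "0123456789"


def orcid_check_char(base15: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Validate the 16-character form, ignoring the hyphens between groups of four."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not all(c in DIGITS for c in compact[:15]):
        return False
    return orcid_check_char(compact[:15]) == compact[15].upper()


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```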
  • 14. Databases
      • Definition: an online database can be regarded as a collection of data, but made accessible in a way that facilitates using the data to answer scientific questions, via structured querying and/or free-text searching of the data over the Internet
      • Broad range, from large-scale DNA and protein sequence repositories to small locus-specific databases
          – E.g. GenBank, UniProt, GWAS Central, Ehlers-Danlos Syndrome Variant Database
      • Challenges in assessing impact & attributing curators
          – Reliance on citations to the database paper, if there is one (sometimes there are many)
              • Analyzing website traffic is another indicator - a highly accessed database ≈ an important one
          – Database URLs sometimes change
          – Database name + URL often mentioned only in Materials & Methods, with no citation
          – Credit via authorship is impossible if there is no database journal paper
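The "database URLs sometimes change" problem is the classic motivation for a level of indirection: citations point at a fixed identifier, and a resolver maps that identifier to whatever the current URL is. A minimal in-memory sketch of the idea; the class, the identifier `exampledb:0001` and both URLs are hypothetical, and a real resolver would also persist its table and answer HTTP requests with redirects.

```python
class PersistentResolver:
    """Minimal redirect table: the identifier stays fixed while its target URL may move."""

    def __init__(self) -> None:
        self._targets: dict[str, str] = {}

    def register(self, pid: str, url: str) -> None:
        # Persistence: an identifier, once assigned, is never reassigned.
        if pid in self._targets:
            raise ValueError(f"{pid!r} is already assigned")
        self._targets[pid] = url

    def update_location(self, pid: str, new_url: str) -> None:
        # Only the target moves; citations that use `pid` keep working.
        if pid not in self._targets:
            raise KeyError(pid)
        self._targets[pid] = new_url

    def resolve(self, pid: str) -> str:
        return self._targets[pid]


resolver = PersistentResolver()
resolver.register("exampledb:0001", "http://old-host.example.org/variantdb/")
resolver.update_location("exampledb:0001", "https://new-host.example.org/variants/")
print(resolver.resolve("exampledb:0001"))
# https://new-host.example.org/variants/
```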
  • 15. BioDBCore - global catalogue of bio-databases
      • BioDBCore aims
          – annotation - organize the bio-database ‘resourceome’
          – discovery - e.g. which protein sequence databases are available?
      • Who’s behind it?
          – International Society for Biocuration
          – Resource catalogues: Bioinformatics Links, BioSiteMaps, NAR database issue etc.
          – Working group includes representatives from the NAR and DATABASE journals, MIBBI, model organism databases, others
      • The catalogue will have persistent identifiers for each database entry
      http://www.biosharing.org/biodbcore
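The discovery aim ("which protein sequence databases are available?") amounts to a query over catalogue entries, each of which carries its own persistent identifier. The sketch below assumes a deliberately tiny record shape; the field names, the `catalogue:000N` identifiers and the data-type terms are hypothetical and do not reproduce the actual BioDBCore descriptors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    pid: str                    # persistent ID of the catalogue record (hypothetical format)
    name: str
    data_types: frozenset[str]  # hypothetical free-text terms describing the content


def find_by_data_type(catalogue: list[CatalogueEntry], data_type: str) -> list[str]:
    """Discovery query: names of catalogued databases holding a given data type."""
    return sorted(e.name for e in catalogue if data_type in e.data_types)


catalogue = [
    CatalogueEntry("catalogue:0001", "GenBank", frozenset({"nucleotide sequence"})),
    CatalogueEntry("catalogue:0002", "UniProt", frozenset({"protein sequence"})),
    CatalogueEntry("catalogue:0003", "GWAS Central", frozenset({"genetic association"})),
]
print(find_by_data_type(catalogue, "protein sequence"))  # ['UniProt']
```

A shared controlled vocabulary for `data_types` would matter more in practice than the query code itself, since free-text terms drift between curators.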
  • 17. [slot in Pierre]
  • 18. From Patients to BioBanks and back…
      • Persistent IDs for datasets & other digital resources
          – Absolute need
      • From BioresourceResearchIF to BioresourceXIF
          – More than an IP address?
      • Increasing need to identify sources of information in general
          – Not only for research purposes…
          – “Big data”
          – Quantified self
      • Blurring the border between research data (non-CLIA), clinically approved data and consumer-centered data
  • 19. [Figure: system architecture diagram. Labelled components: Database Gateway & Computations; User data; Imaging; Reference; Front-end; Individual data; Groups of patients; Tablet application]
      Copyright © 2012 The Regents of the University of California, USA - All rights reserved.
  • 20. Conclusions / next steps
      • Complex landscape, lots of problems to tackle
      • Key challenge will be to get authors to use the right identifiers
          – education, awareness, best practices, journal guidelines etc.
          – build support into tools that researchers use
      • Potential outputs from the BRIF subgroup, by the end of GEN2PHEN
          – Continue work on the whitepaper on identifiers (partially drafted earlier in the year)
          – Compile recommendations for authors & biobankers, for use cases where workable solutions exist or are emerging (data DOIs, BioDBCore)
      • Need some biobanker-expert help in the ID subgroup!
          – Esp. to look in depth into study catalogues with established identifier schemes
              • International Clinical Trials Registry Platform
              • ClinicalTrials.gov
              • P3G study catalogue
  • 21. Acknowledgements
      • GEN2PHEN Consortium http://www.gen2phen.org/about-gen2phen/partners
      • Prof Anthony J. Brookes, Bioinformatics Group, Leicester
      • This work has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.
      • Contact me! <gt50@le.ac.uk> | <gthorisson@gmail.com>
          http://www.linkedin.com/in/mummi
          http://www.twitter.com/gthorisson
          http://www.gthorisson.name
      • Published under the CC BY license (http://creativecommons.org/licenses/by/3.0/)