a centre of expertise in data curation and preservation




             Create, curate, re-use:
the expanding life course of digital research data


                 Chris Rusbridge
           EDUCAUSE Australasia May 2007
                                                                                                         Funded by:
     This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5
     UK: Scotland License, excluding content property of others. To view a copy of this license, (a) visit
     http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ ; or (b) send a letter to Creative
     Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.




                    Contents
     • Science and digital curation
     • Why are data important?
     • What kinds of data?
     • What to do with your data: frontiers of
       practice
     • Repository frontiers
     • Changing practice



EDUCAUSE Australasia 2007




  Digital Curation Centre Mission
       “The over-riding purpose of the DCC is to
       support and promote continuing improvement
       in the quality of data curation, and of
       associated digital preservation”








                Summarising…
     • Sustainability
     • Creation or selection
     • Growth, development
     • Making available
     • Access management
     • Re-usability
     • Linkage, context, metadata
     • Authenticity, integrity, provenance
     • Maintaining meaning over time
     • Preserving, including past states
     • De-selection…
     • Extended time
     • Budget and policy impacts
     • People issues!






           Science and curation
     • Creating and managing data suitable for re-use
     • Good curation supports good science (managing
       your data properly)
        • Poor curation allows sloppy science?
     • Data curation should save money
        • Murray-Rust/Frey on interesting but fruitless experiments!
     • Some science impossible without curation…
        • QCD strong coupling constant prediction (Bethke)
        • Viscosity of earth mantle from Shang Dynasty eclipse
          records (Pang et al)
        • Science depending on past baselines (eg environmental,
          social sciences)





            Records of science
     • Data increasingly important as evidence
        • Key part of the scholarly record (public good)
           • Unrepeatable observations & experiments
        • Experimental verifiability (the basis of science)
           • Would Chang retractions have been reduced if his first
             data were available?
        • Allows additional interpretations
        • Legal and compliance
           • See APSR/AERES report for good examples







           What kinds of data?
     • Observations
        • eg UARS (Upper Atmosphere) Level 0: telemetry
        • UARS Level 1: measured physical parameters (post
          calibration?)
     • Derived data
        • UARS Level 2: calculated geophysical? profiles
        • UARS Level 3: gridded, interpolated?
     • Combined data
     • Crafted data
        • Eg annotated gene/protein databases
     • Descriptive (meta)data






Retaining research data means…
     • Data secure against loss (within group)
     • Communal repository (secure bit dump)
     • Re-usable, sharable information
     • As above, plus active curation (eg bio-
       informatics)
     • Long term preservation of information

     • Be clear what you are trying to do!





     … or the data trajectory is…
     • Hard drive → lost (crash)
     • Hard drive →DVD →Cardboard box →Loft
       →Skip/dumpster → lost

     • Sometimes this is a very bad thing
     • Sometimes these are the right options!








        Long term bit storage…
     • A solved problem? Just requires well-
       understood good data management
       practices?
     • Wrong! For very large datasets over very long
       time, there are significant problems…




                 BAKER, M., SHAH, M., ROSENTHAL, D. S. H., ROUSSOPOULOS, M., MANIATIS, P., GIULI, T. J.
                 & BUNGALE, P. (2006) A Fresh Look at the Reliability of Long-term Digital Storage. EuroSys
                 '06. Leuven, Belgium, ACM.





   How Well Must We Preserve?
   Keep a petabyte for a century
   – With   50% chance of remaining completely undamaged

   Consider each bit decaying independently
   – Analogy   with radioactive decay

   That's a bit half-life of 10**18 years
   – One    hundred million times the age of the universe

   That's a very demanding requirement
   – Hard to measure
   – Even very unlikely faults will matter a lot

Slide from David Rosenthal, LOCKSS
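
The arithmetic behind the half-life figure can be checked with a short sketch. It assumes the model stated on the slide (each bit decays independently, like a radioactive atom), takes a petabyte as 10**15 bytes, and uses an assumed age of the universe of about 1.38e10 years. The function and constant names are illustrative only.

```python
import math

PETABYTE_BITS = 8 * 10**15        # assumption: 1 PB = 10**15 bytes of 8 bits
CENTURY_YEARS = 100
TARGET_SURVIVAL = 0.5             # chance that not a single bit is damaged
AGE_OF_UNIVERSE_YEARS = 1.38e10   # assumed round figure, not from the slide


def required_bit_half_life(n_bits, years, p_all_survive):
    """Half-life each bit needs so that all n_bits survive with probability p.

    One bit survives t years with probability 2 ** (-t / half_life).
    For independent bits, all survive with probability
    2 ** (-n_bits * t / half_life). Solving for half_life gives the line below.
    """
    return n_bits * years * math.log(2) / -math.log(p_all_survive)


half_life = required_bit_half_life(PETABYTE_BITS, CENTURY_YEARS, TARGET_SURVIVAL)
ratio = half_life / AGE_OF_UNIVERSE_YEARS

print(f"required bit half-life: {half_life:.1e} years")
print(f"multiple of the age of the universe: {ratio:.1e}")
```

Under these assumptions the result is 8 x 10**17 years, which the slide rounds to 10**18; the ratio to the age of the universe comes out within a factor of two of the quoted one hundred million.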




       What to do about curation
     • Build curation/reusability into your workflow
        • Curation begins before creation
        • What’s easy at first becomes (impossibly) hard
          later
        • Describe your data (metadata schemas,
          “representation info”, etc)
        • Keep experimental parameters (technical, who,
          what, when, where)
        • Keep ability to process
        • Keep data!
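
As one possible illustration of "describe your data" and "keep experimental parameters", the sketch below writes a small metadata record alongside a data file. The field names, file names and values are hypothetical and do not follow any particular metadata schema; a real project would use its discipline's agreed standard.

```python
import json


def build_sidecar(data_file, who, what, when, where, parameters, data_format):
    """Collect the who/what/when/where of an experiment into one record.

    All keys are illustrative, not taken from a published schema.
    """
    return {
        "data_file": data_file,
        "who": who,
        "what": what,
        "when": when,              # ISO 8601 timestamp supplied by the caller
        "where": where,
        "parameters": parameters,  # technical settings needed to reprocess
        "format": data_format,
    }


record = build_sidecar(
    data_file="run_0042.csv",      # hypothetical file name
    who="A. Researcher",
    what="viscosity measurement",
    when="2007-05-01T09:30:00Z",
    where="Lab 3",
    parameters={"temperature_K": 293.15, "instrument": "rheometer-A"},
    data_format="text/csv",
)

# Serialise next to the data so the description travels with it.
sidecar_text = json.dumps(record, indent=2, sort_keys=True)
print(sidecar_text)
```

Capturing this record at creation time reflects the slide's point that curation begins before creation: the same details are much harder to reconstruct later.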






    What to do about curation - 2
     • Use standard/agreed formats for data
     • Make ownership & restrictions clear, &
       explain how to cite your data
     • Offer for deposit in institutional or discipline
       repository
        • Appraisal and selection essential
        • Possible time-limited embargos
     • “Publish” data in support of articles






 Internet Archaeology: publication with
                 data








           Database as book…
     • Buneman (early pilot)
       work on IUPHAR
       database
     • MySQL to XML
       database
        • Historic to logical
          schema
     • XML via XSLT to LaTeX








                The StORe vision
     • Seamless transport from research data to
       research publications and vice versa
     • Bi-directional links proven in social science
       e-research but capable of export to other
       disciplines

     [Diagram labels: Source, Middleware, Output]




                 •http://jiscstore.jot.com/WikiHome/
Slide from Graham Pryor




 What are the reusability issues?
     • Data not neutral to hypothesis
     • Hard to know the risks & pitfalls of a particular
       dataset
     • Data not self-describing: hard to find
       appropriate data (but see Murray-Rust on
       Googling InChI etc)
     • Hard to “understand” data once found
        • Really need information, not data!
     • Hard to use data once understood





                     Context
     • Data meaningless without context
        • Metadata of many kinds
        • Representation information… from data to
          information
        • Linkage and connection between datasets
        • Use your workflow!
     • Provenance
        • Authenticity/integrity
        • Computational lineage
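
One common way to support integrity checking is to record a checksum for every file and compare it again later. The sketch below is a minimal example using SHA-256 from the Python standard library; the function names and the manifest layout are assumptions made for illustration, not a method described in the talk.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest):
    """List manifest entries whose file is now missing or has different content."""
    current = build_manifest(root)
    return sorted(name for name in manifest if current.get(name) != manifest[name])
```

A manifest built at deposit time and re-checked periodically detects silent change or loss. It does not by itself establish authenticity, which also depends on provenance records.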






[Figure: data lineage diagram. Recoverable labels: NASA; Csat 8-day composite and subscene; PAR subscene; SST; Pbopt calc; Ctot calc; Zeu calc; PPeu calc; university research groups 1, 2 and 3; local decision-making body.]

Slide from Rajendra Bose




            Access and re-use
     • Ethics and rights control access
        • Weak in expressing this long-term
     • Collaboration tools
        • Annotation, discussion, review (see DART…)
        • Re-use leading to change and development
     • “Publication”
        • Not just in “print”
        • Underlying data should be “published”, too






      Database citation issues…
     • Citation for human readers and machine use cases
     • Granularity: database, record, item
     • Citation of changing objects
        • Version change (eg W3C practice: no version = latest, vs bibliographic:
          no version = first)
        • An efficient way to reference and access “archived” past states of
          more rapidly changing dataset, eg Genomics… datasets that result
          from the combined work of curators, or contain opinions or facts likely
          to change (work in progress, Buneman et al)
     • Standards conflict and immature (NLM best?)

     • Citation ESSENTIAL for motivating quality academic work on data
       management and curation
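
The version-change point can be made concrete with a small sketch: the same unversioned citation resolves to different states of a database depending on which convention the reader assumes. The function, the convention labels and the version strings are hypothetical, chosen only to illustrate the two practices named above.

```python
def resolve_version(available, cited=None, convention="w3c"):
    """Pick the dataset version that a citation refers to.

    available:  version labels ordered oldest to newest.
    cited:      explicit version in the citation, or None if omitted.
    convention: "w3c" (no version means latest) or
                "bibliographic" (no version means first).
    """
    if not available:
        raise ValueError("no versions available")
    if cited is not None:
        if cited not in available:
            raise KeyError(cited)
        return cited
    if convention == "w3c":
        return available[-1]
    if convention == "bibliographic":
        return available[0]
    raise ValueError(f"unknown convention: {convention}")


versions = ["2005-01", "2006-01", "2007-01"]  # hypothetical release labels

print(resolve_version(versions, convention="w3c"))
print(resolve_version(versions, convention="bibliographic"))
print(resolve_version(versions, cited="2006-01"))
```

Only the explicitly versioned citation is unambiguous, which is why access to archived past states matters for rapidly changing datasets.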






            Who does curation?
     •   Individuals
     •   Departments or groups
     •   Institutions, maybe through libraries
     •   Communities
     •   Disciplines
     •   Publishers
     •   National services
     •   Other 3rd parties…





            Curation: Individual
     • “Small science 2-3 times more data than Big
       science”, but much more at risk
     • PhD student? RA? PI? Administrator? IT support?
     • Data potentially on local hard drives, or at best
       shared network drives
        • May be inadequately protected
        • Liable for policy-led deletion on resignation
     • Individual “knows” too much (tacit knowledge)
        • Documentation/metadata unlikely to be adequate
     • Future: gone!





            Curation: Individual




© Marita Bushell




         Department: eCrystals
                                       •   Partnership with Institutional
                                           Repository
                                       •   Specialist department
                                           archive (& national service)
                                       •   Workflow recording of lab
                                           parameters (R4L)
                                       •   Public & private elements
                                       •   Trying to build eCrystals
                                           federation (eBank 3)
                                       •   Future: likely to continue








  Data in institutional repositories








    Institution: Cambridge Chemistry
                                         •   175,000 small molecule
                                             structures in CML
                                         •   Alongside Archaeology,
                                             Manuscripts, Learning
                                             Materials, etc
                                         •   No library curation skills;
                                             dependent on research
                                             group enthusiast
                                         •   Collection isolated from
                                             other Chemistry
                                         •   (Only 5 UK institutional
                                             repositories claim to hold
                                             data)
                                         •   Future: assured…





         Community: LOCKSS?
                                         •   Self-selected group of
                                             collectors: closest to genuine
                                             open activity (despite
                                             Alliance)?
                                         •   Traditionally libraries
                                             collecting eJournals
                                         •   Model respects IPR
                                         •   No domain expertise; rely on
                                             origins
                                         •   Data limitations…
                                         •   Future: potentially very
                                             persistent (low cost, high
                                             reliability, attack resistance,
                                             distributed)





 Discipline: Atmospheric Science
                                         • Strong believer in need
                                           for domain scientists as
                                           curators
                                         • Significant participant in
                                           “community proxy”
                                           agenda-setting activities
                                         • Internationally
                                           fragmented resources
                                         • Future: mostly
                                           dependent on grant
                                           funding (but strong
                                           commitment)





       Discipline: Pharmacology
                                         • International Scientific
                                           Union
                                         • Attempting to build
                                           credit for data
                                           contributions
                                         • Future: extremely
                                           limited funding








  Bio-informatics: Nature article
            23 June 05
    • Databases in Peril
        • 51 out of 89 biological databases contacted reported they
          were struggling financially
        • 7 have closed
        • Several being updated in owner’s spare time
        • (Notes that not all deserve long term support)

    • [Nucleic Acids Research reports 968 databases in
      2007!]
    • Major issue: money





      Publisher: Crystallography
                                         •   Publisher and Scientific
                                             Union
                                         •   Created key domain
                                             crystallographic standard
                                             (CIF)
                                         •   Strong motivator for deposit
                                             of structure data
                                         •   Consistent quality checks
                                         •   DOIs used for structure data
                                         •   Future: publishing business
                                             model



Slide from IUCr




   National bodies: British Library
                                         • Serious and robust
                                           approach
                                         • Legal deposit powers &
                                           responsibilities as driver
                                         • Oriented primarily
                                           towards “cultural
                                           heritage” (broadly
                                           interpreted)
                                         • Little data, no science
                                           domain experience
                                         • Future: strong future
                                           commitment





    National bodies: TNA/NDAD
                                         •   Specialist archive for
                                             government datasets
                                         •   Understand government
                                             regulations, dynamics &
                                             requirements
                                         •   Subject generalists;
                                             disconnected from
                                             associated science
                                         •   Technology specialists
                                             (understand databases)
                                         •   Future: likely to pass
                                             eventually to The National
                                             Archives







            3rd parties: Portico
                                         • Specific area: eJournals
                                         • Depends on publisher
                                           agreements
                                         • No data or domain
                                           science expertise
                                         • Future: commitment
                                           from Mellon +
                                           publishers +
                                           subscriptions, good
                                           funding mix






     3rd Parties: Iron Mountain?
                                         • Records management
                                           IS a curation problem
                                         • Organisations like this
                                           very likely to branch out
                                         • No domain science
                                           expertise
                                         • Future: business case,
                                           viability, stock market…








      3rd parties: Web 2.0 style,
            Swivel.com??








       Institutions & the network
     • Institutions have fundamental sustainability
     • Disciplines have domain knowledge advantage
       but sustainability is an issue
     • Can we get the best of both?
     • Needs serious work to examine!

                       Inst'n 1   Inst'n 2   Inst'n 3
     Discipline 1         X                     X
     Discipline 2                    X          X
     Discipline 3         X          X
     etc






   Who are the curation players?








               Cultural change
     • If we build it, will they come? NO!!
     • Outreach important: communication with
       scientists and researchers is hard graft
     • Cultural change to new approach requires more:
        • Incentives, rewards and mandates
        • Successful exemplars (well publicised)
        • Discipline-oriented approach (one size does not fit all)








            Australian context?
     • In the emerging context of the Research
       Quality Framework, and the expected
       National Collaborative Research
       Infrastructure Strategy, curation can only
       increase in importance!








                            Thank you




                               •(Citations in paper in proceedings)
EDUCAUSE Australasia 2007

Más contenido relacionado

La actualidad más candente

Introduction to Research Data Management at UWA
Introduction to Research Data Management at UWAIntroduction to Research Data Management at UWA
Introduction to Research Data Management at UWAKatina Toufexis
 
Research Data Management Services at UWA (November 2015)
Research Data Management Services at UWA (November 2015)Research Data Management Services at UWA (November 2015)
Research Data Management Services at UWA (November 2015)Katina Toufexis
 
Research Data in the Arts and Humanities: A Few Difficulties
Research Data in the Arts and Humanities: A Few DifficultiesResearch Data in the Arts and Humanities: A Few Difficulties
Research Data in the Arts and Humanities: A Few DifficultiesMartin Donnelly
 
Disciplinary and institutional perspectives on digital curation
Disciplinary and institutional perspectives on digital curationDisciplinary and institutional perspectives on digital curation
Disciplinary and institutional perspectives on digital curationMichael Day
 
Neil Rambo "Understanding E-Science: A Symposium for Medical Librarians"
Neil Rambo "Understanding E-Science: A Symposium for Medical Librarians"Neil Rambo "Understanding E-Science: A Symposium for Medical Librarians"
Neil Rambo "Understanding E-Science: A Symposium for Medical Librarians"The TMC Library
 

La actualidad más candente (7)

Introduction to Research Data Management at UWA
Introduction to Research Data Management at UWAIntroduction to Research Data Management at UWA
Introduction to Research Data Management at UWA
 
Research Data Management Services at UWA (November 2015)
Research Data Management Services at UWA (November 2015)Research Data Management Services at UWA (November 2015)
Research Data Management Services at UWA (November 2015)
 
UWA Research Week 2016
UWA Research Week 2016UWA Research Week 2016
UWA Research Week 2016
 
Research Data in the Arts and Humanities: A Few Difficulties
Research Data in the Arts and Humanities: A Few DifficultiesResearch Data in the Arts and Humanities: A Few Difficulties
Research Data in the Arts and Humanities: A Few Difficulties
 
Disciplinary and institutional perspectives on digital curation
Disciplinary and institutional perspectives on digital curationDisciplinary and institutional perspectives on digital curation
Disciplinary and institutional perspectives on digital curation
 
Neil Rambo "Understanding E-Science: A Symposium for Medical Librarians"
Neil Rambo "Understanding E-Science: A Symposium for Medical Librarians"Neil Rambo "Understanding E-Science: A Symposium for Medical Librarians"
Neil Rambo "Understanding E-Science: A Symposium for Medical Librarians"
 
data curation issues
data curation issuesdata curation issues
data curation issues
 

Destacado (8)

Blue Ribbon Task Force on Sustainable Digital Preservation
Blue Ribbon Task Force on Sustainable Digital PreservationBlue Ribbon Task Force on Sustainable Digital Preservation
Blue Ribbon Task Force on Sustainable Digital Preservation
 
Disciplinary dimensions of digital curation: introduction and synthesis
Disciplinary dimensions of digital curation: introduction and synthesisDisciplinary dimensions of digital curation: introduction and synthesis
Disciplinary dimensions of digital curation: introduction and synthesis
 
Bollansee Jan
Bollansee JanBollansee Jan
Bollansee Jan
 
Pimp Je Bib
Pimp Je BibPimp Je Bib
Pimp Je Bib
 
LOCKSS UK, with a focus on reporting experience
LOCKSS UK, with a focus on reporting experienceLOCKSS UK, with a focus on reporting experience
LOCKSS UK, with a focus on reporting experience
 
Under 4 Event Presentation
Under 4 Event PresentationUnder 4 Event Presentation
Under 4 Event Presentation
 
JISC Digital Library initiatives
JISC Digital Library initiativesJISC Digital Library initiatives
JISC Digital Library initiatives
 
20100122_Ep_Ehost_2_0_Ehis
20100122_Ep_Ehost_2_0_Ehis20100122_Ep_Ehost_2_0_Ehis
20100122_Ep_Ehost_2_0_Ehis
 

Similar a Create, curate, re-use: the expanding life course of digital research data

Saving private data, sharing Open Data? Role of libraries and institutional r...
Saving private data, sharing Open Data? Role of libraries and institutional r...Saving private data, sharing Open Data? Role of libraries and institutional r...
Saving private data, sharing Open Data? Role of libraries and institutional r...Chris Rusbridge
 
ANDS and Data Management
ANDS and Data ManagementANDS and Data Management
ANDS and Data ManagementJulia Gross
 
Scally The Library's Role in Research Data Management. OCLC partnership meeti...
Scally The Library's Role in Research Data Management. OCLC partnership meeti...Scally The Library's Role in Research Data Management. OCLC partnership meeti...
Scally The Library's Role in Research Data Management. OCLC partnership meeti...John Scally
 
RDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management EcosystemRDAP13 John Kunze: The Data Management Ecosystem
RDAP13 John Kunze: The Data Management EcosystemASIS&T
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management EcosystemJohn Kunze
 
Ausplots Training - Session 1
Ausplots Training - Session 1Ausplots Training - Session 1
Ausplots Training - Session 1bensparrowau
 
Research Data Management Services at UWA (July 2015)
Research Data Management Services at UWA (July 2015)Research Data Management Services at UWA (July 2015)
Research Data Management Services at UWA (July 2015)Katina Toufexis
 
Data Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach DataData Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach Datacunera
 
ESI Supplemental 1 E-research Support Slides
ESI Supplemental 1   E-research Support SlidesESI Supplemental 1   E-research Support Slides
ESI Supplemental 1 E-research Support SlidesDuraSpace
 
Improving RDM through closer integration of electronic lab notebooks and data...
Improving RDM through closer integration of electronic lab notebooks and data...Improving RDM through closer integration of electronic lab notebooks and data...
Improving RDM through closer integration of electronic lab notebooks and data...rmacneil88
 
Presentation to the UM Library Emergent Research Series
Presentation to the UM Library Emergent Research SeriesPresentation to the UM Library Emergent Research Series
Presentation to the UM Library Emergent Research SeriesSEAD
 
Claudia Bauzer Medeiros Digital preservation – caring for our data to foster...
Claudia Bauzer Medeiros  Digital preservation – caring for our data to foster...Claudia Bauzer Medeiros  Digital preservation – caring for our data to foster...
Claudia Bauzer Medeiros Digital preservation – caring for our data to foster...Beniamino Murgante
 
Introducingthe anu datacommons
Introducingthe anu datacommonsIntroducingthe anu datacommons
Introducingthe anu datacommonsDoug Moncur
 
'Data Management Planning: the role of institutions and researchers' eResearc...
'Data Management Planning: the role of institutions and researchers' eResearc...'Data Management Planning: the role of institutions and researchers' eResearc...
'Data Management Planning: the role of institutions and researchers' eResearc...Marta Ribeiro
 
Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...
Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...
Stuart Phinn_Many kinds of infrastructure: resolving and advancing ecosystem ...TERN Australia
 
Guy avoiding-dat apocalypse
Guy avoiding-dat apocalypseGuy avoiding-dat apocalypse
Guy avoiding-dat apocalypseENUG
 

Create, curate, re-use: the expanding life course of digital research data

  • 1. a centre of expertise in data curation and preservation Create, curate, re-use: the expanding life course of digital research data Chris Rusbridge EDUCAUSE Australasia May 2007 Funded by: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 UK: Scotland License, excluding content property of others. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ ; or, (b) send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
  • 2. Contents
      • Science and digital curation
      • Why are data important?
      • What kinds of data?
      • What to do with your data: frontiers of practice
      • Repository frontiers
      • Changing practice
  • 3. Digital Curation Centre Mission
      “The over-riding purpose of the DCC is to support and promote continuing improvement in the quality of data curation, and of associated digital preservation”
  • 4. [Image-only slide; no text]
  • 5. Summarising…
      • Sustainability
      • Creation or selection
      • Growth, development
      • Making available
      • Access management
      • Re-usability
      • Linkage, context, metadata
      • Authenticity, integrity, provenance
      • Maintaining meaning over time
      • Preserving, including past states
      • De-selection…
      • Extended time
      • Budget and policy impacts
      • People issues!
  • 6. Science and curation
      • Creating and managing data suitable for re-use
      • Good curation supports good science (managing your data properly)
      • Poor curation allows sloppy science?
      • Data curation should save money
      • Murray-Rust/Frey on interesting but fruitless experiments!
      • Some science impossible without curation…
      • QCD strong coupling constant prediction (Bethke)
      • Viscosity of earth mantle from Shang Dynasty eclipse records (Pang et al)
      • Science depending on past baselines (eg environmental, social sciences)
  • 7. Records of science
      • Data increasingly important as evidence
      • Key part of the scholarly record (public good)
      • Unrepeatable observations & experiments
      • Experimental verifiability (the basis of science)
      • Would Chang retractions have been reduced if his first data were available?
      • Allows additional interpretations
      • Legal and compliance
      • See APSR/AERES report for good examples
  • 8. What kinds of data?
      • Observations
      • eg UARS (Upper Atmosphere) Level 0: telemetry
      • UARS Level 1: measured physical parameters (post calibration?)
      • Derived data
      • UARS Level 2: calculated geophysical? profiles
      • UARS Level 3: gridded, interpolated?
      • Combined data
      • Crafted data
      • Eg annotated gene/protein databases
      • Descriptive (meta)data
  • 9. Retaining research data means…
      • Data secure against loss (within group)
      • Communal repository (secure bit dump)
      • Re-usable, sharable information
      • As above, plus active curation (eg bio-informatics)
      • Long term preservation of information
      • Be clear what you are trying to do!
  • 10. … or the data trajectory is…
      • Hard drive → lost (crash)
      • Hard drive → DVD → Cardboard box → Loft → Skip/dumpster → lost
      • Sometimes this is a very bad thing
      • Sometimes these are the right options!
  • 11. Long term bit storage…
      • A solved problem? Just requires well-understood good data management practices?
      • Wrong! For very large datasets over very long time, there are significant problems…
      BAKER, M., SHAH, M., ROSENTHAL, D. S. H., ROUSSOPOULOS, M., MANIATIS, P., GIULI, T. J. & BUNGALE, P. (2006) A Fresh Look at the Reliability of Long-term Digital Storage. EuroSys '06. Leuven, Belgium, ACM.
  • 12. How Well Must We Preserve?
      • Keep a petabyte for a century
        – With 50% chance of remaining completely undamaged
      • Consider each bit decaying independently
        – Analogy with radioactive decay
      • That's a bit half-life of 10**18 years
        – One hundred million times the age of the universe
      • That's a very demanding requirement
        – Hard to measure
        – Even very unlikely faults will matter a lot
      Slide from David Rosenthal, LOCKSS
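      The arithmetic behind slide 12 can be checked with a short calculation. This is a minimal sketch, not taken from the slides: it assumes a decimal petabyte (10**15 bytes), independent exponential decay of each bit, and an approximate age of the universe of 1.38e10 years. The function name and constants are illustrative only.

```python
import math

# Assumptions (not from the slides): decimal petabyte, 8 bits per byte.
PETABYTE_BITS = 8 * 10**15
YEARS = 100
TARGET_SURVIVAL = 0.5  # 50% chance that every bit is still undamaged


def required_bit_half_life(n_bits, years, p_all_survive):
    """Half-life in years that each bit needs so that all n_bits
    survive for `years` with probability p_all_survive."""
    # P(one bit survives)  = 2 ** (-years / half_life)
    # P(all bits survive)  = 2 ** (-n_bits * years / half_life)
    # Solving for half_life gives n_bits * years / log2(1 / p).
    return n_bits * years / math.log2(1.0 / p_all_survive)


half_life = required_bit_half_life(PETABYTE_BITS, YEARS, TARGET_SURVIVAL)

AGE_OF_UNIVERSE_YEARS = 1.38e10  # approximate value, assumed here
ratio = half_life / AGE_OF_UNIVERSE_YEARS

print(f"required bit half-life: {half_life:.1e} years")
print(f"multiples of the age of the universe: {ratio:.1e}")
```

      Under these assumptions the result is 8 x 10**17 years, which the slide rounds to 10**18; the ratio to the age of the universe comes out at tens of millions, the same order as the slide's "one hundred million".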
  • 13. What to do about curation
      • Build curation/reusability into your workflow
      • Curation begins before creation
      • What’s easy at first becomes (impossibly) hard later
      • Describe your data (metadata schemas, “representation info”, etc)
      • Keep experimental parameters (technical, who, what, when, where)
      • Keep ability to process
      • Keep data!
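      One way to act on "keep experimental parameters (technical, who, what, when, where)" is to write a small descriptive record next to each data file at creation time. The sketch below is illustrative only: the field names, the example values and the JSON layout are assumptions, not a schema recommended in the talk.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical minimal description captured when the data are created."""
    who: str          # person or group that produced the data
    what: str         # short description of the observation or experiment
    when: str         # ISO 8601 date or timestamp
    where: str        # lab, instrument or site
    data_format: str  # a standard or agreed format name
    parameters: dict = field(default_factory=dict)  # technical settings


def to_sidecar_json(record):
    """Serialise the record as JSON text suitable for a sidecar file."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Example values are invented for illustration.
record = DatasetRecord(
    who="Example research group",
    what="Single-crystal diffraction run",
    when="2007-05-01",
    where="Example laboratory",
    data_format="CIF",
    parameters={"temperature_K": 120},
)
sidecar_text = to_sidecar_json(record)
print(sidecar_text)
```

      Writing the record at creation time reflects the slide's point that what is easy at first becomes hard later.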
  • 14. What to do about curation - 2
      • Use standard/agreed formats for data
      • Make ownership & restrictions clear, & explain how to cite your data
      • Offer for deposit in institutional or discipline repository
      • Appraisal and selection essential
      • Possible time-limited embargos
      • “Publish” data in support of articles
  • 15. Internet Archaeology: publication with data
  • 16. Database as book…
      • Buneman (early pilot) work on IUPHAR database
      • MySQL to XML database
      • Historic to logical schema
      • XML via XSLT to LaTeX
  • 17. The StORe vision
      • Seamless transport from research data to research publications and vice versa
      • Bi-directional links proven in social science e-research but capable of export to other disciplines
      • http://jiscstore.jot.com/WikiHome/
      [Figure: diagram with layers labelled Source, Middleware, Output]
      Slide from Graham Pryor
  • 18. What are the reusability issues?
      • Data not neutral to hypothesis
      • Hard to know the risks & pitfalls of a particular dataset
      • Data not self-describing: hard to find appropriate data (but see Murray-Rust on Googling InChi etc)
      • Hard to “understand” data once found
      • Really need information, not data!
      • Hard to use data once understood
  • 19. Context
      • Data meaningless without context
      • Metadata of many kinds
      • Representation information… from data to information
      • Linkage and connection between datasets
      • Use your workflow!
      • Provenance
      • Authenticity/integrity
      • Computational lineage
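      The integrity and computational lineage points on slide 19 can be illustrated with checksums. This is a minimal sketch under stated assumptions: SHA-256 is chosen here for illustration, and the function names and record layout are hypothetical rather than any method described in the talk.

```python
import hashlib
from typing import Dict


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest used here as a fixity value."""
    return hashlib.sha256(data).hexdigest()


def record_step(step_name: str, inputs: Dict[str, bytes], output: bytes) -> dict:
    """Describe one processing step by the digests of its inputs and output."""
    return {
        "step": step_name,
        "inputs": {name: digest(content) for name, content in inputs.items()},
        "output": digest(output),
    }


def is_intact(data: bytes, expected_digest: str) -> bool:
    """Check that data still matches the digest recorded earlier."""
    return digest(data) == expected_digest


# Invented example: a derived value computed from a raw observation file.
raw = b"1.0,2.0,3.0\n"
derived = b"2.0\n"
lineage = record_step("mean", {"raw.csv": raw}, derived)

print(lineage)
print(is_intact(derived, lineage["output"]))
```

      A chain of such records lets a later user see which inputs a derived dataset came from and confirm that neither has changed since.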
  • 20. [Figure: computational lineage diagram. Recoverable labels: NASA Csat 8-day composite; Csat 8-day composite subscene; PAR subscene; SST and Pbopt calc; Ctot calc; Zeu calc; PPeu calc; University research group1, group2, group3; local decision-making body]
      Slide from Rajendra Bose
  • 21. Access and re-use
      • Ethics and rights control access
      • Weak in expressing this long-term
      • Collaboration tools
      • Annotation, discussion, review (see DART…)
      • Re-use leading to change and development
      • “Publication”
      • Not just in “print”
      • Underlying data should be “published”, too
  • 22. Database citation issues…
      • Citation for human readers and machine use cases
      • Granularity: database, record, item
      • Citation of changing objects
      • Version change (eg W3C practice: no version = latest, vs bibliographic: no version = first)
      • An efficient way to reference and access “archived” past states of more rapidly changing dataset, eg Genomics… datasets that result from the combined work of curators, or contain opinions or facts likely to change (work in progress, Buneman et al)
      • Standards conflict and immature (NLM best?)
      • Citation ESSENTIAL for motivating quality academic work on data management and curation
  • 23. Who does curation?
      • Individuals
      • Departments or groups
      • Institutions, maybe through libraries
      • Communities
      • Disciplines
      • Publishers
      • National services
      • Other 3rd parties…
  • 24. Curation: Individual
      • “Small science 2-3 times more data than Big science”, but much more at risk
      • PhD student? RA? PI? Administrator? IT support?
      • Data potentially on local hard drives, or at best shared network drives
      • May be inadequately protected
      • Liable for policy-led deletion on resignation
      • Individual “knows” too much (tacit knowledge)
      • Documentation/metadata unlikely to be adequate
      • Future: gone!
  • 25. Curation: Individual
      [Image] © Marita Bushell
  • 26. Department: eCrystals
      • Partnership with Institutional Repository
      • Specialist department archive (& national service)
      • Workflow recording of lab parameters (R4L)
      • Public & private elements
      • Trying to build eCrystals federation (eBank 3)
      • Future: likely to continue
  • 27. Data in institutional repositories
  • 28. Institution: Cambridge Chemistry
      • 175,000 small molecule structures in CML
      • Alongside Archaeology, Manuscripts, Learning Materials, etc
      • No library curation skills; dependent on research group enthusiast
      • Collection isolated from other Chemistry
      • (Only 5 UK institutional repositories claim to hold data)
      • Future: assured…
  • 29. Community: LOCKSS?
      • Self-selected group of collectors: closest to genuine open activity (despite Alliance)?
      • Traditionally libraries collecting eJournals
      • Model respects IPR
      • No domain expertise; rely on origins
      • Data limitations…
      • Future: potentially very persistent (low cost, high reliability, attack resistance, distributed)
  • 30. Discipline: Atmospheric Science
      • Strong believer in need for domain scientists as curators
      • Significant participant in “community proxy” agenda-setting activities
      • Internationally fragmented resources
      • Future: mostly dependent on grant funding (but strong commitment)
  • 31. Discipline: Pharmacology
      • International Scientific Union
      • Attempting to build credit for data contributions
      • Future: extremely limited funding
  • 32. Bio-informatics: Nature article 23 June 05
      • Databases in Peril
      • 51 out of 89 biological databases contacted reported they were struggling financially
      • 7 have closed
      • Several being updated in owner’s spare time
      • (Notes that not all deserve long term support)
      • [Nucleic Acids Research reports 968 databases in 2007!]
      • Major issue: money
  • 33. a centre of expertise in data curation and preservation Publisher: Crystallography • Publisher and Scientific Union • Created key domain crystallographic standard (CIF) • Strong motivator for deposit of structure data • Consistent quality checks • DOIs used for structure data • Future: publishing business model EDUCAUSE Australasia 2007 •Slide from IUCr
  • 34. a centre of expertise in data curation and preservation National bodies: British Library • Serious and robust approach • Legal deposit powers & responsibilities as driver • Oriented primarily towards “cultural heritage” (broadly interpreted) • Little data, no science domain experience • Future: strong future commitment EDUCAUSE Australasia 2007
  • 35. a centre of expertise in data curation and preservation National bodies: TNA/NDAD • Specialist archive for government datasets • Understand government regulations, dynamics & requirements • Subject generalists; disconnected from associated science • Technology specialists (understand databases) • Future: likely to pass eventually to The National Archives EDUCAUSE Australasia 2007
  • 36. 3rd parties: Portico • Specific area: eJournals • Depends on publisher agreements • No data or domain science expertise • Future: commitment from Mellon + publishers + subscriptions, good funding mix
  • 37. 3rd parties: Iron Mountain? • Records management IS a curation problem • Organisations like this very likely to branch out • No domain science expertise • Future: business case, viability, stock market…
  • 38. 3rd parties: Web 2.0 style, Swivel.com??
  • 39. Institutions & the network • Institutions have fundamental sustainability • Disciplines have domain knowledge advantage but sustainability is an issue • Can we get the best of both? • Needs serious work to examine! • [Grid: Discipline 1, Discipline 2, Discipline 3, etc. (rows) against Inst’n 1, Inst’n 2, Inst’n 3 (columns), with two X marks in each discipline row]
  • 40. Who are the curation players?
  • 41. Cultural change • If we build it, will they come? NO!! • Outreach important: communication with scientists and researchers is hard graft • Cultural change to new approach requires more: • Incentives, rewards and mandates • Successful exemplars (well publicised) • Discipline-oriented approach (one size does not fit all)
  • 42. Australian context? • In the emerging context of the Research Quality Framework, and the expected National Collaborative Research Infrastructure Strategy, curation can only increase in importance!
  • 43. Thank you • (Citations in paper in proceedings)