Repository and preservation systems

Research data management shared service for the UK – John Kaye, Jisc
Hydra - Tom Cramer, Stanford University and Chris Awre, University of Hull
Addressing the preservation gap at the University of York - Jenny Mitcham, University of York
Emulation developments - David Rosenthal, Stanford University

Jisc and CNI conference, 6 July 2016

  1. Repository and preservation systems. Chair: Chris Keene, Jisc, 06/07/2016
  2. Introduction. Chair: Chris Keene, Jisc, 14/07/2016
  3. Research data management shared service for the UK. John Kaye, Jisc
  4. Jisc-CNI Conference: Jisc Research Data Shared Service, 06/07/2016
  5. Contents » Background and policy context » Sector requirements » Shared service » Timescales » Engagement (Jisc-CNI Research Data Shared Service, 14/07/2016)
  6. Research at Risk
  7. Research Funder Policies: RCUK Common Principles on Data Policy » Public good: Publicly funded research data are produced in the public interest and should be made openly available with as few restrictions as possible » Planning for preservation: Institutional and project-specific data management policies and plans are needed to ensure valued data remains usable » Discovery: Metadata should be available and discoverable; published results should indicate how to access supporting data » Confidentiality: Research organisation policies and practices should ensure that legal, ethical and commercial constraints are assessed; the research process should not be damaged by inappropriate release » First use: Provision for a period of exclusive use, to enable research teams to publish results » Recognition: Data users should acknowledge data sources and the terms and conditions of access » Public funding: Use of public funds for RDM infrastructure is appropriate and must be efficient and cost-effective
  8. EPSRC Policy » Retained EPSRC-funded research data is preserved for a minimum of ten years » Effective data curation is provided throughout the full data lifecycle » Knowledge of publicly funded research data holdings » Discoverability; recording of third-party access requests » Notice and justification of access restrictions, for example ‘commercially confidential’ » Awareness and use of relevant law, for example FOI » Awareness of and compliance with research data policies » Adequate RDM resource allocation, for example from quality-related research (QR) funding or research grants
  9. Strategic guidance from… » UCISA research IT systems group › Procure a shared national RDM service » UUK research policy network discussion › Concern over multiple solutions
  10. What would you like Jisc to provide? 2015 Research Systems Survey: » “Currently the UK is running a very inefficient model requiring individual institutions to establish their own repositories. Influencing future central/research council provision would be useful” » “A national data repository” » “Increasing use of CRISes to fulfill traditional repository functions does not seem to be prioritised as an issue by JISC…” » “If not able to provide e.g. data repositories, influence funder or sector/community provision to support the needs of that funder/community.” » “Data access and user tracking tools and statistics on shared archive services” » “Development of the national research data registry. This will have implications for institutional research data registry development.”
  11. A Key Requirement: Preservation
  12. A Key Requirement: Interoperability
  13. Vision » Researchers shouldn’t need to think (too much!) about Research Data Management » “Visible data, invisible infrastructure” › Provide researchers with intuitive, easy functionality to publish, archive and preserve their research outputs. › Provide interoperable systems to allow researchers and institutions to fulfil and go beyond policy requirements and adhere to best practice throughout the RDM lifecycle.
  14. Why a Shared Service? » There is no single, easily available “solution” that meets universities’ requirements for Research Data Management » More effective Research Data Management must happen to comply with funder mandates, ensure data is not lost, and realise a whole range of positive benefits » A shared service (provided by Jisc) seems to offer a number of benefits: › Cost savings and efficiencies › Common approaches and practice › Research system standardisation and interoperability › Others…
  15. Pilot Institutions » Pilot institutions were selected to create a balanced portfolio of types of institution, specialisms and research systems already in place. Institutions: Cardiff University; CREST, Consortium for Research Excellence, Support and Training (Buckinghamshire New University, Harper Adams, St Mary’s Twickenham, UCA & Winchester); Imperial College of Science, Technology and Medicine; Middlesex University; Plymouth University; Royal College of Music; St George’s Hospital Medical School; University of Cambridge; University of Lancaster; University of Lincoln; University of St Andrews; University of Surrey; University of York
  16. Pilots’ MVPs (minimum viable products) » “Easy to use and cost-effective archiving, ingest, preservation, repository, reporting and discovery support that can handle sensitive data” » “Robust data storage that has growth ability for active and archive data” » “Standard metadata profile, international for interoperability” » “Integration with all main CRIS systems” » “Meets REF and funder deposit requirements (supports deposit of REF data output types)” » …
  17. What we need
  18. Where are we now?
  19. Research at Risk Portfolio
  20. Project Support: consultancies » RDM Costing (Cambridge Econometrics): to investigate current costing practices, tools, models and potential future developments in the field of RDM costing; this work is being applied to developing the business model for the research data shared service pilot » Data Asset Framework (Research Consulting): to provide the consultation phase for stakeholders in the project, not focused on the final technology solution, for example an audit of datasets, legal and compliance framework, financial and strategic commitment » Technical Architect (Digirati): to provide expert technical advice to the project on the technical architecture of the service, assess institutional technical capability and assist in gathering detailed requirements from institutions and researchers » Metadata and Interoperability (CLAX): to examine metadata specifications and provide advice on identifier systems and interoperability » Project Management (LM): to provide project management support, coordinate contract negotiations, facilitate collaboration between suppliers and HEIs and monitor overall service development; this function will also gather evidence to feed into the business model for the next stage » Market Research (TBC): to gather information on the demand for a service and to test proposed models for the business case to proceed to a production service » Preservation Audit (TBC): to provide the requirements and priorities for RDM preservation tools development
  21. Project Support: milestones 2015-18. Periods: Apr 2015-Dec 2015 » Jan 2016-Jul 2016 » Aug 2016-Jun 2017 » Jul 2017-Sep 2017 » Oct 2017-Apr 2018. Activities: requirements; HEI pilots selected; procurement commences; institutional survey; HEI and supplier workshops; pilot HEI selection process; support consultancy work begins; supplier framework selected; detailed HEI requirements and technical architecture; contracting commences; development phase; alpha development; alpha service tested and reviewed; beta development; feedback on beta service; contact additional early-adopter HEIs and promote beta service; business planning and begin business case; market research and consultation; promote service to institutions; start on next phases (service enhancement/modular); business case decision; if go, begin transition to production service
  22. researchdata.network
  23. Thank you! Email: john.kaye@jisc.ac.uk • Twitter: @JohnPKaye • Blog: http://researchdata.jiscinvolve.org • Except where otherwise noted, this work is licensed under CC-BY-NC-ND
  24. Hydra. Tom Cramer, Stanford University; Chris Awre, University of Hull
  25. Get ahead on your repository. Tom Cramer, Stanford University (@tcramer); Chris Awre, University of Hull (@clawre)
  26. Get ahead on your repository: why?
  27. Why use a particular repository technology?
  28. Why use a particular repository technology? Wrong question. Can we implement sustainable repository infrastructure to serve our digital content management needs?
  29. Answers to questions • How do I manage my various collections of different digital content? • How can I deal with the different file types I’m having to archive? • How do I ensure I can cope with the increasing amount of digital content I need to manage? • How can I manage my digital content in a way that is meaningful? • How can I ensure that I can sustain the technology choice I make?
  30. Building the digital library
  31. Creating a sustainable open source project: technology, community
  33. One Body, Many Heads
  35. CRUD (create, read, update, delete) in Repositories
  37. A Word About… Fedora • Flexible, Extensible, Durable, Object Repository Architecture • Open source digital repository • Middleware for relating your objects and hooking them to services & storage • Particularly powerful for data & other “non-simple” content types • More than 300 adopters worldwide • 4 major software releases since 2000
  38. Used by: Large Universities, Small Universities, Colleges, Public Broadcasting, Government Ministry, National Libraries, National Lab, Small Research Labs, National Digital Repository, Statewide Digital Libraries, Chemical Heritage Foundation, Museum of Performing Arts, A Shakespeare Festival. Used for: Self-deposit System, Digital Collections System, Sheet Music, Architectural Resources, Electronic Theses & Dissertations, Digital Image System, Media Management, Media Preservation System, Research Data Management, Digitization Workflow System, Digital Preservation System, Digital Archives System, and more!
  39. Solutions and Solution Bundles: Sufia • RYO (roll your own) • Hydra-in-a-Box
  40. Trend 1: Move to Linked Data. PCDM (Portland Common Data Model), for data and code interoperability
  41. Trend 2: Architecting Layers & Gems for Code Reuse. Hydra app layers: Active Fedora; Hydra::PCDM; Hydra::Works; Curation Concerns; Sufia; local customization. Hydra gems (kinda like sprinkles): browse-everything, hydra-editor, hydra-derivatives, hydra-role-management, hydra-shibboleth, Geomash, iiif_manifest, orcid, questioning_authority, etc.
  42. Trend 3: Hydra-in-a-Box ● Directed project to produce a turnkey solution ○ ...and a hosted service ○ ...and a metadata enrichment engine ● 2.5 years (May 2015 - November 2017) ● $2M grant from IMLS ● Core partners: DPLA, DuraSpace & Stanford ○ Plus significant & growing community contributions
  43. What is Hydra? Community: Hydra Connect • Mailing lists, Slack, Skype/Hangouts • Meetings with a manager-level and a technical focus
  44. Hydra Partners & Adopters
  45. Hydra Partners & Known Users
  47. Communication Channels
  48. Hydra Interest & Working Groups
  49. Hydra: getting localised • Hydra New England (NE) regional group • Hydra West Coast regional group • Developer congresses • Stanford and Michigan so far this year • Fostering face-to-face exchange of ideas and putting them into practice
  50. Hydra UK. Institutions: Durham, Lancaster, York, Hull, Oxford, LSE. Uses: research data catalogue, research output management, digitised archives, marketing images, institutional repository, born-digital archive, digital library
  51. Hydra in (other parts of) Europe • Ireland: Digital Repository of Ireland (based at Trinity College Dublin), University College Dublin, Maynooth University • Denmark: Royal Library of Copenhagen, Danish Technical University • Theatre Museum of Barcelona • Hydra Europe Symposia: Dublin 2014, London 2015, next one to be decided
  52. Hydra Support
  53. Partnership. Hydra would not work without partnership. Hydra would not work if we tried to do the same by ourselves. Partnership has brought together many different types of institution that would not have worked together otherwise. Partnership has been stimulated by recognising a common need and finding a way to address it together. Partnership has helped us find answers to our questions.
  54. Answers to questions • How do I manage my various collections of different digital content? • How can I deal with the different file types I’m having to archive? • How do I ensure I can cope with the increasing amount of digital content I need to manage? • How can I manage my digital content in a way that is meaningful? • How can I ensure that I can sustain the technology choice I make?
  55. Thank you. Tom Cramer: tcramer@stanford.edu • Chris Awre: c.awre@hull.ac.uk
  56. Addressing the preservation gap at the University of York. Jenny Mitcham, University of York
  57. Addressing the preservation gap at the University of York. Jenny Mitcham, University of York. Jisc and CNI conference, 7 July 2016
  58. Why do we need digital preservation?
  59. Why is this relevant for research data? • Funder requirements around retention: – NERC: data should be retained for a minimum of 10 years, but for projects of major importance this may need to be 20 years or longer – STFC: expects data to be retained for a minimum of 10 years, and data that cannot be re-measured should be retained indefinitely – Wellcome Trust: expects data to be kept for a minimum of 10 years but suggests longer periods for certain types of data – EPSRC: expects research data to be securely preserved for a minimum of 10 years from the date of last access
  60. University of York RDM questionnaire 2013 • Which data management issues have you come across in your research over the last five years? – “Inability to read files in old software formats on old media or because of expired software licences” – 24% of 181 researchers who answered this question admitted this had been a problem for them. …and researchers already encounter barriers to reusing data
  61. Most universities have a place to store data. [Diagram: the researcher gives data to the repository; the repository ingests the data; access to the research data is via the repository interface, which is where data reuse will happen. But what about this bit? The Open Archival Information System.]
  62. Visible v. invisible
  63. Filling the digital preservation gap. Project aim: “…to investigate Archivematica and explore how it might be used to provide digital preservation functionality within a wider infrastructure for Research Data Management.”
  64. The team. University of Hull: • Chris Awre, Head of Information Services, Library and Learning Innovation • Richard Green, Independent Consultant • Simon Wilson, University Archivist. University of York: • Julie Allinson, Manager, Digital York • Jen Mitcham, Digital Archivist. Also: Artefactual Systems
  65. What have we been doing? • Phase 1, explore: test Archivematica, research, do some thinking (3 months) • Phase 2, develop: make Archivematica better for RDM, plan implementation (4 months) • Phase 3, implement: set up proofs of concept at York and Hull, investigate the file format problem (6 months)
  66. York
  67. Hull
  68. A quick look at file formats. Research data file formats are: • Numerous • Sometimes a bit obscure • Sometimes very big • Ever-changing • Often very new. This means they can be hard to preserve... because we can’t identify them. If we can’t identify them, how can we carry out preservation activities?
  69. Top research data applications at York
  70. The NDSA Levels of Digital Preservation: Level 2 requires you to know what you’ve got... and levels 3 and 4 build on this
  71. Can we identify our research data? We ran Droid over the research data deposited with us over the past year. Out of 3752 individual files: • a file format was identified for 1382 (37%) of the files – 668 (48%) by signature – 648 (47%) by extension – 65 (5%) by container • 34 different file formats were identified automatically
  72. Identified research data files • Files identified by Droid (listed by file type)
  73. Unidentified research data files • Files not identified by Droid (listed by file extension) • 107 different file extensions not identified
  74. Every little helps
  75. How do we improve this result? • More file signature research is required – institutions can submit sample files to TNA (The National Archives) – or they can create their own file format signatures – digital preservation tools (e.g. Archivematica) can help us with better reporting on unidentified files • We can improve the tools if we work together
  76. Where to find out more
  77. Do talk to me (or Chris) if you are interested in finding out more about our preservation work. Useful links: • Project website: http://www.york.ac.uk/borthwick/archivematica • Digital archiving blog: http://digital-archiving.blogspot.co.uk/ • Archivematica: https://www.archivematica.org/en/ • PRONOM: http://www.nationalarchives.gov.uk/PRONOM/ • Phase 1 report: http://dx.doi.org/10.6084/m9.figshare.1481170 • Phase 2 report: https://dx.doi.org/10.6084/m9.figshare.2073220
  78. Emulation developments. David Rosenthal, Stanford University
  79. Awaiting content
  80. Repository and preservation systems, 06/07/2016

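Slide 40 names PCDM (Portland Common Data Model) as the basis for Hydra's move to linked data. A minimal Python sketch of the idea, assuming the PCDM namespace `http://pcdm.org/models#` and only the core terms `Collection`, `Object`, `File`, `hasMember` and `hasFile`: one collection aggregates one object, which points at its binary files. The `pcdm_graph` and `to_ntriples` helpers and the `https://repo.example.org/` URIs are hypothetical; a real Hydra stack would persist such statements through Fedora rather than build them in a Python list.

```python
from typing import List, Tuple

PCDM = "http://pcdm.org/models#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


def pcdm_graph(collection: str, obj: str, files: List[str]) -> List[Triple]:
    """Build triples linking one collection, one object and its files (hypothetical helper)."""
    triples: List[Triple] = [
        (collection, RDF_TYPE, PCDM + "Collection"),
        (obj, RDF_TYPE, PCDM + "Object"),
        # The collection aggregates the object via pcdm:hasMember.
        (collection, PCDM + "hasMember", obj),
    ]
    for f in files:
        triples.append((f, RDF_TYPE, PCDM + "File"))
        # The object points at each of its binary files via pcdm:hasFile.
        triples.append((obj, PCDM + "hasFile", f))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialise as N-Triples; every term in this sketch is an IRI."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    base = "https://repo.example.org/"  # hypothetical repository base URI
    graph = pcdm_graph(
        base + "collections/theses",
        base + "objects/thesis-42",
        [base + "files/thesis-42.pdf", base + "files/thesis-42-data.csv"],
    )
    print(to_ntriples(graph))
```

Running it prints seven statements: type declarations for the collection, the object and each file, plus one `hasMember` link and two `hasFile` links. Because the model is plain RDF, any PCDM-aware application can read the same structure, which is the interoperability point the slide makes.
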
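Slide 59's EPSRC rule counts its ten-year minimum from the date of last access rather than from deposit, so every access pushes the earliest disposal date further out. A minimal sketch of that reading in Python; `epsrc_retain_until` and `add_years` are hypothetical helper names, the mapping of 29 February to 28 February is an assumption, and the other funders' rules (NERC's longer periods for major projects, STFC's indefinite retention for data that cannot be re-measured) are not modelled.

```python
from datetime import date
from typing import Iterable


def add_years(d: date, years: int) -> date:
    """Add whole years; 29 February maps to 28 February in non-leap years (assumption)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def epsrc_retain_until(access_dates: Iterable[date], minimum_years: int = 10) -> date:
    """Earliest date data may be reviewed for disposal under a
    'minimum of N years from the date of last access' rule.
    Raises ValueError if no access (or deposit) date is supplied."""
    last_access = max(access_dates)  # every access restarts the clock
    return add_years(last_access, minimum_years)


if __name__ == "__main__":
    deposited = date(2016, 7, 6)
    print(epsrc_retain_until([deposited]))                    # 2026-07-06
    print(epsrc_retain_until([deposited, date(2021, 3, 1)]))  # 2031-03-01
```

The second call shows why access logging matters for this policy: a single download in 2021 extends the minimum retention to 2031.
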
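The "invisible" part of the slide 61 diagram is the preservation side of OAIS: ingest turns a submission into an archival package that can be checked over time. A minimal sketch of one such function, fixity checking, assuming a hypothetical in-memory `ArchivalPackage` record rather than Archivematica's actual package format: a SHA-256 checksum is recorded for every file at ingest and re-verified later to detect silent corruption.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class ArchivalPackage:
    """Hypothetical stand-in for an OAIS Archival Information Package (AIP)."""
    identifier: str
    files: Dict[str, bytes]
    fixity: Dict[str, str] = field(default_factory=dict)


def ingest(identifier: str, submission: Dict[str, bytes]) -> ArchivalPackage:
    """Turn a submission (SIP) into an AIP, recording a checksum per file."""
    aip = ArchivalPackage(identifier, dict(submission))
    for name, content in aip.files.items():
        aip.fixity[name] = sha256(content)
    return aip


def verify(aip: ArchivalPackage) -> List[str]:
    """Return names of files whose content no longer matches the recorded checksum."""
    return [name for name, content in aip.files.items()
            if sha256(content) != aip.fixity.get(name)]


if __name__ == "__main__":
    aip = ingest("dataset-001", {"readings.csv": b"t,v\n0,1.5\n"})
    print(verify(aip))  # []
    aip.files["readings.csv"] = b"t,v\n0,9.9\n"  # simulate silent corruption
    print(verify(aip))  # ['readings.csv']
```

A repository that only stores and serves files skips this step; recording checksums at ingest is one concrete thing that closes the gap between "a place to store data" and preservation.
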
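Slide 68 hinges on identification. Tools such as Droid compare a file's internal byte signature against the PRONOM registry and fall back on the file extension when no signature matches. The toy sketch below shows that general idea of looking inside the file first and trusting the extension last. Its `SIGNATURES`, `CONTAINER_SIGNATURES` and `EXTENSIONS` tables are illustrative assumptions, far smaller than PRONOM, and real container identification (inspecting what is inside a ZIP, for example) is reduced here to recognising the ZIP header.

```python
from pathlib import PurePath
from typing import Optional, Tuple

# Illustrative magic numbers only; PRONOM holds far more signatures.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
]
CONTAINER_SIGNATURES = [
    (b"PK\x03\x04", "ZIP container"),
]
# Extension fallback, used only when no byte signature matches.
EXTENSIONS = {".csv": "Comma-separated values", ".txt": "Plain text"}


def identify(filename: str, header: bytes) -> Tuple[Optional[str], str]:
    """Return (format, method); method is 'signature', 'container',
    'extension' or 'unidentified'."""
    for magic, fmt in SIGNATURES:
        if header.startswith(magic):
            return fmt, "signature"
    for magic, fmt in CONTAINER_SIGNATURES:
        if header.startswith(magic):
            return fmt, "container"
    ext = PurePath(filename).suffix.lower()
    if ext in EXTENSIONS:
        return EXTENSIONS[ext], "extension"
    return None, "unidentified"


if __name__ == "__main__":
    print(identify("report.pdf", b"%PDF-1.7\n"))  # ('PDF', 'signature')
    print(identify("run01.csv", b"time,value\n"))  # ('Comma-separated values', 'extension')
    print(identify("scan.xyz", b"\x00\x01\x02"))  # (None, 'unidentified')
```

A file that is new, obscure or produced by niche instrument software falls through every table, which is exactly the situation the slide describes.
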
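The percentages on slide 71 follow from the counts, but they use two different denominators: the identification rate is taken over all 3752 files, while the method breakdown is taken over the 1382 identified files. A small Python check using only the figures on the slide; the `pct` helper is a hypothetical name.

```python
def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


total_files = 3752
identified = 1382
by_method = {"signature": 668, "extension": 648, "container": 65}

print(f"identified: {pct(identified, total_files)}%")  # identified: 37%
for method, count in by_method.items():
    # Breakdown is relative to identified files, not to all files.
    print(f"{method}: {pct(count, identified)}%")
```

The three method counts sum to 1381, one fewer than the 1382 identified files reported on the slide.
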
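Slide 75 calls for better reporting on unidentified files. A simple starting point is to group them by extension, so that the most common unknown formats are researched (or sampled for submission to TNA) first. A minimal sketch; the input is a hypothetical list of filenames that an identification tool failed to match, and the example extensions are made up.

```python
from collections import Counter
from pathlib import PurePath
from typing import Iterable, List, Tuple


def unidentified_by_extension(filenames: Iterable[str]) -> List[Tuple[str, int]]:
    """Count unidentified files per (lower-cased) extension, most frequent first."""
    counts = Counter(PurePath(name).suffix.lower() or "(none)" for name in filenames)
    return counts.most_common()


if __name__ == "__main__":
    unidentified = ["run1.dat", "run2.dat", "model.xyz", "README", "run3.DAT"]
    for ext, n in unidentified_by_extension(unidentified):
        print(ext, n)
```

The length of the returned list is the number of distinct unidentified extensions, the figure reported as 107 on slide 73, and the top entries suggest where signature research would pay off most.
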