Research Data Management in GLAM: Managing Data for Cultural Heritage


Presentation given at the 'Open Science Infrastructures for Big Cultural Data' - Advanced International Masterclass in Plovdiv, Bulgaria. Dec. 13-15, 2018

  1. Research Data Management in ‘GLAM’: Managing Data for Cultural Heritage. Sarah A. Stewart, The British Library (@Biostew). ‘Open Science Infrastructures for Big Cultural Data’ Masterclass, Dec. 13-15, 2018, Plovdiv, Bulgaria
  2. Outline. Thank you for coming today! • Introduction and Challenges: Data in Cultural Heritage • Research Data Management ‘in a Nutshell’: Key Concepts • Software as Data • PIDs for RDM: DataCite and DOIs • RDM at the British Library: Developing Infrastructure and Services around Data • Conclusion and Questions
  3. The Digital Transformation of Research
  4. Digital Transformations in ‘GLAM’: The ‘Inside-Out’ Museum • GLAM institutions are ‘everting’ their collections and research to the (open) web – ‘Collections without Walls’ • Dynamic, changing research landscape – development of new tools and techniques for digital research • New infrastructures to support digital collections, research and scholarship • Changing materiality of research – from ‘analog’ to digital • Greater role for data and metadata • Research Data Management will play a crucial role!
  5. ‘Inside-Out’ Museum: From Specimen to Data
  6. The (Inside-Out) British Library…
  7. “Challenges” (Opportunities?) • Research is digital; are we? • Are we still needed for discovery? • In an open world, do we still have a role in providing access to digital content? • Will print become invisible? • Global content grows so fast that our collections are shrinking (relatively) • Resources? Funding, Time, Labour
  8. Many Types of Data in Cultural Heritage (CH)!
  9. Big Data, Little Data, No Data… (Borgman, 2014) • The language of ‘Data’ is taken from the Sciences, but data can be defined and managed more broadly in all disciplines • Big Data requires computational methods for analysis and visualisation (Volume, Velocity, Variety) • Cultural Heritage data might be ‘messy’ or ‘dirty’: it may be incomplete, have gaps or require additional metadata (e.g. a ‘Box of 19th Century Theatre Posters’) • Sensitive data? It can still occur in CH! • A broad definition of ‘data’ in Cultural Heritage
  10. UKRI Concordat on Open Research Data (2016) • Data are ‘evidence that underpins the answer to the research question, and can be used to validate findings regardless of its form (e.g. print, digital, or physical).’ • The primary purpose of research data is to provide the information necessary to support or validate a research project's observations, findings or outputs
  11. Why Manage Research Data? • Make data Findable, Accessible, Interoperable and Reusable (FAIR Data) • Preserve data for long-term use and re-use • Make Research Transparent/Open (Validation of Research!) and Reproducible • Funder and Publisher mandates • Good Research Practice: GLAM Institutions are Research Institutions!
  12. Why is Research Data Management Important? Good Professional Practice: • Funder mandates and requirements • Supports institutional integrity • Supports collaboration through data sharing and re-use • Reduces redundancy in research. Value to you as a Researcher/Institution: • Reduced risk of data loss • Increased efficiency • Validated and replicable research • Increased sharing and re-use (more possibilities for collaboration) • Increased citations • Increased Research Impact!
  13. Missing Data Inhibits Research (data-at-a-rapid-rate-1.14416)
  14. What Does Managed Data Look Like? Well-managed data is: • intelligible and verifiable, because it is well documented • findable, because it is well organised and uses useful filenames • protected against loss, corruption and unauthorised access, because it is backed up and secured appropriately • easy to share, because mechanisms for protecting confidentiality and intellectual property have been considered • maintainable, because it is managed in a way that suits the research group that uses it • compliant with relevant laws and policies
  15. FAIR Data Principles (Force11, 2016) • Findable • Accessible • Interoperable • Reusable
  16. To be Findable: F1. (meta)data are assigned a globally unique and eternally persistent identifier. F2. data are described with rich metadata. F3. (meta)data are registered or indexed in a searchable resource. F4. metadata specify the data identifier. • To be Accessible: A1. (meta)data are retrievable by their identifier using a standardized communications protocol. A1.1. the protocol is open, free, and universally implementable. A1.2. the protocol allows for an authentication and authorization procedure, where necessary. A2. metadata are accessible, even when the data are no longer available.
  17. To be Interoperable: I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. I2. (meta)data use vocabularies that follow FAIR principles. I3. (meta)data include qualified references to other (meta)data. • To be Reusable: R1. (meta)data have a plurality of accurate and relevant attributes. R1.1. (meta)data are released with a clear and accessible data usage license. R1.2. (meta)data are associated with their provenance. R1.3. (meta)data meet domain-relevant community standards.
  18. Metadata: Metadata should be Open and as Rich as Possible!
  19. The Research Data Lifecycle
  20. Research Data Management (in a Nutshell) • Data Management Planning • Data Preservation (Both Short- and Long-Term) • Data Sharing (and Sensitive Data) • Data Discovery, Access and Re-Use
  21. Data Creation: Data Management Plans • A data management plan should be in place at the beginning of a project, as standard practice • Data management plans should outline the uses, responsibilities, ownership, access and sharing (licensing), storage, maintenance and archiving (even disposal) of research data and software • Online tools for data management plans include DMPonline
  22. Data Sharing: Why Share Data? • Funder and publisher mandates • Collaborations – Interdisciplinary and (often) International • Validation/Transparency, Support for Open Research • Citations = Research Impact! • Sensitive data – Not All Data Can be Shared!
  23. Data Re-Use? You Might Be Surprised!
  24. Software as Data • ‘Software is used to create, interpret, present, manipulate and manage data’ (Software Sustainability Institute) • Data: ‘recorded factual material commonly retained by and accepted…as necessary to validate research findings’ (EPSRC) • Software = Data!
  25. Obsolescence!
  26. Software should be preserved if: • it can’t be separated from the data or digital object • it is classified as a research output • it has intrinsic value. More resources are available at the Software Sustainability Institute.
  27. Digital Preservation (Software and Data) • Software is a digital object and is also often a vital prerequisite for the preservation of other digital objects • Storage, retrieval, reconstruction and replay all involve complexities relating to code libraries, dependencies and software engineering overall • Planning is essential for subsequent preservation • Software management should be part of a broader plan for research data management
  28. Some Strategies for Digital Preservation • Data integrity and file fixity checks (using checksums) for source code • Media and format migrations • Refreshing (reduces bit-rot) • Emulation (‘simulates’ the conditions of a legacy system) • Replication: ‘Lots of Copies Keeps Stuff Safe’ • Encapsulation: linking content with all the information required for its operation, a rich-metadata approach (e.g. a ‘README’ file and annotation) • Version Control: metadata to support versioning of software and data
  29. Open Data and Where to Find It (and Store/Archive It, too…) • – Directory of subject-specific repositories • – Open Data repository run by CERN • GitHub – Software and code repository
  30. Why Use Persistent Identifiers? • Use of persistent identifiers has increased as scholarly communication becomes increasingly digital. • ORCID iDs and DOIs support open science by enabling interoperability between research infrastructures. • For instance, DataCite and Crossref can use DOIs and ORCID iDs, in addition to other metadata, to map and link documents, data and researchers as Linked Open Data (LOD).
  31. DataCite and DataCite UK • A non-profit organisation that provides infrastructure for DOIs (Digital Object Identifiers) • DOIs make data discoverable and citable, and link datasets with other related research outputs • The British Library is the DataCite hub for DOI creation in the UK. • ‘To help the research community locate, identify, and cite research data with confidence.’
  32. DOIs (Digital Object Identifiers) • A persistent identifier used to uniquely identify objects (datasets, software, journal articles, theses), standardised by the International Organization for Standardization (ISO 26324) • Presented as an alphanumeric code consisting of a prefix and a suffix separated by a slash ‘/’. Every DOI prefix begins with ‘10.’, which places it within the DOI namespace, e.g. 10.1037/rmh0000008 (prefix 10.1037, suffix rmh0000008) • Built on the Handle System: a DOI is ‘resolvable’ because metadata (such as a URL) is bound to it, so resolving the DOI leads to the object it identifies • A DOI is only as persistent as its metadata: the publisher is responsible for keeping the metadata attached to the DOI (e.g. its target URL) up to date; otherwise, the DOI will resolve to a dead link.
  34. FREYA Ambassadors’ Programme • 3-year EU-funded project to advance infrastructure for persistent identifiers as a core component of Open Research • For more info, or to join, contact • Funded partners of FREYA include STFC, PANGAEA, DANS, DataCite, CERN and the British Library
  35. Build Bridges, Not Silos! • Use FAIR Data Principles, Open Metadata and Persistent Identifiers for Data!
  36. The British Library in Context • National Library of the United Kingdom • Second-Largest Library in the World: over 150 million items in most known languages • Over 16,000 visitors per day (on-site and online) • Legal Deposit
  37. The British Library’s Response to Challenges • Living Knowledge articulates the vision of the British Library in 2023 as the most open, creative and innovative institution of its kind in the world. • A new Service Strategy for research and a new Content Strategy. • A new approach to delivery that brings together the researcher-facing departments in a joined-up roadmap. • Everything Available is a strategic change management portfolio designed to deliver the transformation of the Library’s services to researchers and research organisations.
  38. Six Strategic Priorities (grouped under Find, Use and Share) • Unified discovery workflow • Unified access workflow • Registration and identity management • Workspaces and tools • Digital collection unification • Collection management as a service
  39. SHARE: Part of a Wider Ambition
  40. Digital Service Elements • Digitisation: on demand; for institutions • Metadata: enhance content; provide identifiers; build semantic links; licensing support • Preservation: born digital; digitised; print; preservation as a service • Discovery: BL & external content; feed external services (e.g. Google); discovery as a service; Single Digital Presence for public libraries • Analysis: text and data mining; machine interfaces; visualisation; machine learning; dedicated staff support • Access: shared platform; institutional portals; machine interfaces; feed external platforms
  41. The UK ‘Research Data’ Landscape… • UKRI: data underpinning research and policy must be archived for 10+ years • Data must be made as openly available as possible (with constraints for sensitive data) • Data must have appropriate metadata and be citable
  42. Vision: Data Collections and Services. Our vision for the British Library is that research data are as integrated into our collections, research and services as text is today. The British Library's users will be able to consume research data online through tools that enable it to be analysed, visualised and understood by non-specialists.
  43. British Library Data Strategy (2017) • “… All will be easy to discover and linked to related research outputs, be they text, data or multimedia.”
  44. Data Services at the British Library • Development of infrastructure to support research data management for data use and re-use at the British Library • DataCite UK • FREYA Project for Persistent Identifier (PID) Infrastructure • Data in the Research Repository • Discovery Services for Research Data • Software and Data Carpentry and Software as Data Initiatives (TBA)
  45. Four Themes • Data Archiving and Preservation • Data Discovery, Access and Reuse • Data Creation • Data Management
  46. Data Management • Data management training • Data Management Plan engagement • British Library Data Management Plans • Documented Data Management Processes. Jo, BL staff member: “I was working on a grant proposal for ESRC. They require a data management plan, so when I was given an outline plan that set out the Library’s processes for data management, I was able to reuse that and save myself days of extra work!”
  47. Data Creation • Engaging and linking with others • Clarify approach to data collection. Sonja, Epidemiologist: “I was able to use the British Library web archive as a dataset, correlating positive and negative messages about statin use with NHS prescription data. The subset of data I extracted is really useful to others, so I offered it to the Library who now make it available alongside their other datasets.”
  48. Data Archiving and Preservation • Digital shared storage • Data preservation services for third parties. Robin, Consultant: “We produce valuable reports and data on the political environment of emerging market economies. Now that the British Library is archiving that data, we can ensure others get to use it even if our consultancy closes down. We can also give them DOIs, and track the impact of the work we produce.”
  49. SHARE: Developing a Repository Platform • Single BL repository platform • Refresh national preservation system (>5m items, petabyte-scale) • Access layer with multiple repositories, shared service model • Repository pilot developed with: • Diagram layers: Preservation Layer; Services Layer; Access Layer (EThOS, BL Institutional Repository, Partner Repositories)
  51. Data Collections
  53. Data Discovery, Access, Reuse • New models of data access • Third-party data discovery • Discovery for Library data. Rosslyn, Social Historian: “My research on perceptions of gender involves looking at if and how gender-specific words evolve into derogatory terms. The British Library gave me great advice on which collections I could use, and how to connect tools to them. This allowed me to automate analysis and visualisation of the data, finding things I didn’t expect.”
  54. Data Discovery, Access, Reuse • Tools and skills for data exploration • Widening access • DataCite UK. Alice, Post-Graduate Researcher: “Being able to persistently identify my data with a DOI means I can make my research reproducible. It also means that I can track when my data is cited, which is really helpful when it comes to looking at my research impact.”
  55. ‘Take-Home’ Points • Data in cultural heritage may be very broadly defined. • Use FAIR Data Principles as best practice • Plan for data management following the research data lifecycle • Data Discovery: Build Bridges, not Silos • Consider software as ‘data’ in RDM • Use persistent identifiers to build robust, citable and discoverable metadata and to link outputs
  56. Join the FREYA Project!
  57. Thank you for your time! Questions? Email: … @Biostew
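Slides 16 and 17 list the FAIR sub-principles. As a rough illustration only, a few of them can be turned into automated checks on a metadata record. The sketch below assumes a hypothetical `DatasetRecord` type whose field names are invented for this example (they are not taken from any metadata standard), and it only checks whether values are present, which is much weaker than actually being FAIR.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata record; field names are illustrative, not a standard."""
    identifier: str = ""        # F1: globally unique, persistent identifier (e.g. a DOI)
    title: str = ""             # F2: descriptive metadata
    description: str = ""       # F2: descriptive metadata
    access_url: str = ""        # A1: where the data can be retrieved
    license: str = ""           # R1.1: clear data usage licence
    provenance: str = ""        # R1.2: where the data came from
    related_identifiers: list[str] = field(default_factory=list)  # I3: links to other (meta)data


def fair_gaps(record: DatasetRecord) -> list[str]:
    """List FAIR sub-principles the record visibly fails (presence checks only)."""
    gaps = []
    if not record.identifier:
        gaps.append("F1: no persistent identifier")
    if not (record.title and record.description):
        gaps.append("F2: descriptive metadata is thin")
    if not record.access_url.startswith("https://"):
        gaps.append("A1: no retrieval URL over an open, standard protocol")
    if not record.related_identifiers:
        gaps.append("I3: no qualified references to other (meta)data")
    if not record.license:
        gaps.append("R1.1: no usage licence")
    if not record.provenance:
        gaps.append("R1.2: no provenance")
    return gaps


# Hypothetical record and identifier, for illustration only.
posters = DatasetRecord(
    identifier="10.5072/example-posters",
    title="Box of 19th Century Theatre Posters",
    description="Digitised posters, partially catalogued",
    access_url="https://example.org/datasets/posters",
    license="CC-BY-4.0",
)
print(fair_gaps(posters))
# ['I3: no qualified references to other (meta)data', 'R1.2: no provenance']
```

A fuller check would also confirm that the identifier actually resolves and that the licence is machine-readable; a non-empty field is only the first step.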
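Slide 28 lists fixity checks with checksums as a preservation strategy. What follows is a minimal sketch, assuming Python's standard `hashlib` with SHA-256 (the slide does not name an algorithm). The idea is to record a checksum for every file when material is archived, then recompute later and compare. The function names `sha256_of`, `build_manifest` and `verify_manifest` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose checksum has changed."""
    problems = []
    for rel_path, expected in manifest.items():
        path = root / rel_path
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(rel_path)
    return problems


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "src" / "analysis.py").write_text("print('hello')\n")
    manifest = build_manifest(root)  # store this alongside the archived copy
    (root / "src" / "analysis.py").write_text("print('bit rot')\n")  # simulate corruption
    print(verify_manifest(root, manifest))  # ['src/analysis.py']
```

The same manifest can be compared across replicated copies, which links fixity checking to the ‘Lots of Copies Keeps Stuff Safe’ item on the same slide: a copy whose checksums disagree with the others is the one to repair.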
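Slide 30 mentions ORCID iDs. One practical detail: the last character of an ORCID iD is a check character computed with the ISO 7064 MOD 11-2 algorithm, so simple typing errors can be caught before an iD is written into metadata. The sketch below is a minimal illustration with hypothetical function names. Passing this check only means an iD is well formed, not that it is registered. The sample iD 0000-0002-1825-0097 appears in ORCID's own documentation.

```python
import re

# Four hyphen-separated groups; the final character may be the check value X.
ORCID_PATTERN = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", re.ASCII)


def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """True if the iD has the expected shape and its check character matches."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_char(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False: last character altered
```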
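Slide 31 notes that DOIs make data citable. The DataCite Metadata Schema makes a small set of properties mandatory (Identifier, Creator, Title, Publisher, PublicationYear and ResourceType), and these are enough to assemble a basic data citation. The sketch below uses plain dictionaries with hypothetical snake_case keys, which are not DataCite's actual JSON or XML field names. Its citation pattern is also a simplification; see the DataCite documentation for the recommended format.

```python
# Mandatory properties of the DataCite Metadata Schema, under hypothetical key names.
MANDATORY = ("identifier", "creators", "title", "publisher", "publication_year", "resource_type")


def missing_mandatory(record: dict) -> list[str]:
    """Return the mandatory keys that are absent or empty."""
    return [key for key in MANDATORY if not record.get(key)]


def format_citation(record: dict) -> str:
    """Build a simple 'Creator (Year). Title. Publisher. Type. DOI link' citation."""
    missing = missing_mandatory(record)
    if missing:
        raise ValueError(f"missing mandatory properties: {', '.join(missing)}")
    creators = "; ".join(record["creators"])
    return (
        f"{creators} ({record['publication_year']}). {record['title']}. "
        f"{record['publisher']}. {record['resource_type']}. "
        f"https://doi.org/{record['identifier']}"
    )


# Hypothetical record for illustration only.
record = {
    "identifier": "10.5072/example-posters",
    "creators": ["Example, Author"],
    "title": "Box of 19th Century Theatre Posters",
    "publisher": "Example Publisher",
    "publication_year": 2018,
    "resource_type": "Dataset",
}
print(format_citation(record))
# Example, Author (2018). Box of 19th Century Theatre Posters. Example Publisher. Dataset. https://doi.org/10.5072/example-posters
```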
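Slide 32 describes the structure of a DOI: a prefix beginning with ‘10.’, a slash, and a suffix. The minimal sketch below splits a DOI and builds a link through the public https://doi.org/ resolver. The function names are hypothetical, and the prefix check assumes the purely numeric registrant codes in common use. The code checks syntax only and never contacts the resolver.

```python
import re
from urllib.parse import quote

# "10." followed by a numeric registrant code, optionally with dot-separated subdivisions.
PREFIX_PATTERN = re.compile(r"10\.\d+(?:\.\d+)*", re.ASCII)


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash; the suffix may contain slashes."""
    prefix, slash, suffix = doi.strip().partition("/")
    if not slash or not suffix or not PREFIX_PATTERN.fullmatch(prefix):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build a doi.org link, percent-encoding characters that are unsafe in a URL path."""
    prefix, suffix = split_doi(doi)
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"


print(split_doi("10.1037/rmh0000008"))     # ('10.1037', 'rmh0000008')
print(resolver_url("10.1037/rmh0000008"))  # https://doi.org/10.1037/rmh0000008
```

DOI names are case-insensitive, so when comparing two DOIs it is safer to normalise both (for example with `.casefold()`) than to compare the raw strings.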
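Slide 41 notes that data underpinning research must be archived for 10+ years. As a small worked example, the earliest date a minimum retention period ends can be computed from an archive date. This sketch assumes the period runs from the archive date, although some funder policies count from the date of last access instead. The function names are hypothetical. Reaching the end date only means a disposal review may take place, matching the ‘even disposal’ element of a data management plan on slide 21.

```python
from datetime import date


def retention_end(archived_on: date, years: int = 10) -> date:
    """Earliest date on which a minimum retention period of `years` has elapsed."""
    try:
        return archived_on.replace(year=archived_on.year + years)
    except ValueError:
        # Archived on 29 February and the target year is not a leap year: roll to 1 March.
        return date(archived_on.year + years, 3, 1)


def may_review_for_disposal(archived_on: date, today: date, years: int = 10) -> bool:
    """True once the minimum period has passed; this does not mean the data must be deleted."""
    return today >= retention_end(archived_on, years)


print(retention_end(date(2018, 12, 13)))  # 2028-12-13
print(retention_end(date(2016, 2, 29)))   # 2026-03-01
```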

Speaker Notes

  • Many Types of Data: Data can come in many forms. What types of data are there? What types of data do you use or generate? Please give some examples here (make these appear on the slide): digital, spatial, physical (in the form of specimens) and even software can be considered to be data.

    What kind of data will you be generating?
  • Why share data? Data sharing may be mandated by your funder. Another researcher may want to use your data for their work and collaborate with you or cite your data. Data may be shared to validate your published results, and sharing increases your citations and impact. Not all data can be shared: sensitive data may carry ethical constraints, for example medical data, personal identifiers or commercially sensitive data. These types of data are typically restricted and cannot always be shared.
  • The best way to make content available to our users is to help other organisations to manage and share their content.
  • The Library needs to make data core to what it does. And this is the ultimate aim – being able to find, access and use research data at the British Library should eventually become business as usual.
    The strategy includes software, not just data, and is one part of Everything Available, one of the Library’s strategic change programmes about opening up content.
  • The strategy is built on four themes, each of which is split into more specific areas of work.

    I’m going to briefly introduce each theme to give a flavour of the activities they cover. Each theme also comes with a scenario that is the kind of activity we hope to be able to support if the vision is achieved. These are all in a nice shiny booklet we have about the strategy, come and see me if you want a copy!
  • The data management theme largely has an internal focus. Its aim is to meet our data management and data management planning obligations as a funding recipient: we’re an independent research organisation and receive funding from the AHRC as well as EU funding, and both require data management plans.
    If we have documented data management processes and plans in place, any BL staff participating in research will be able to take advantage of them; this will go hand in hand with training and engagement.
  • Even then, our aim is not to hold every bit of data in the UK, but we want to link any data that we have with data held by others. Data derived from our collections can help to provide important context for data held elsewhere, as shown in our case study. The breadth of our collections can provide important social, geographical and other contexts, both historical and contemporary.
  • The strategy does not explicitly define potential services such as these because the landscape is moving and we want to be able to predict and respond, rather than tie ourselves down to a service that may be relevant this week but not in 18 months’ time.

    However, some of the proposed work may relate directly to DataCite services, for instance helping to support the persistence of DataCite DOIs by bolting preservation on to the existing DataCite service, which already has persistence as a core requirement.
  • Finally, discovery, access and reuse of data is the largest theme in the strategy. As we create new datasets, we will need to make sure that users can not only find them but also access and use them in an appropriate way. We should also ensure that users are able to find data no matter who holds it and where.
  • Within this theme, there is also an opportunity to widen access to data. We want to look at how we can provide access mechanisms and environments for restricted data that not only meet the requirements of data stewards, but also allow access to the non-academic but still bona fide researchers that we see in the reading rooms.
    We will also continue to support data accessibility and sharing through our work on DataCite.