Open data pilot

  1. 1. The Horizon 2020 Open Data Pilot Sarah Jones, Digital Curation Centre, Glasgow. Twitter: @sjDCC. FOT-Net Data Stakeholder Meeting on Open Data and Data Re-use in Horizon 2020, 10th March 2015, ERTICO, Brussels
  2. 2. What is the Digital Curation Centre? “a centre of expertise in digital information curation with a focus on building capacity, capability and skills for research data management across the UK's higher education research community”
  3. 3. Benefits and drivers WHY SHARE DATA (OPENLY)? Image CC-BY-NC-SA by Wonderwebby
  4. 4. It’s part of good research practice
  5. 5. Science as an open enterprise “Much of the remarkable growth of scientific understanding in recent centuries is due to open practices; open communication and deliberation sit at the heart of scientific practice.” The Royal Society report calls for ‘intelligent openness’ whereby data are accessible, intelligible, assessable and usable.
  6. 6. Faster scientific breakthroughs “It was unbelievable. It’s not science the way most of us have practiced in our careers. But we all realised that we would never get biomarkers unless all of us parked our egos and intellectual property noses outside the door and agreed that all of our data would be public immediately.” Dr John Trojanowski, University of Pennsylvania
  7. 7. Increased use and economic benefit The case of NASA Landsat satellite imagery of the Earth’s surface: UP TO 2008: sold through the US Geological Survey for US$600 per scene; sales of 19,000 scenes per year; annual revenue of $11.4 million. SINCE 2009: freely available over the internet; Google Earth now uses the images; transmission of 2,100,000 scenes per year; estimated to have created value for the environmental management industry of $935 million, with direct benefit of more than $100 million per year to the US economy; has stimulated the development of applications from a large number of companies worldwide.
  8. 8. HORIZON 2020 OPEN DATA PILOT Image CC-BY-NC-SA by Tom Magllery
  9. 9. Why open access and open data? “The European Commission’s vision is that information already paid for by the public purse should not be paid for again each time it is accessed or used, and that it should benefit European companies and citizens to the full.” ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
  10. 10. H2020 open data pilot • Seven areas are participating in the pilot, which correspond to about €3 billion or 20% of the overall Horizon 2020 budget in 2014 and 2015. • Projects in other areas can opt in on a voluntary basis • Participants can opt out at proposal stage or during the lifetime of the project • Reasons for exemption to be explained in the DMP Reference: Guidelines on Data Management in Horizon 2020 lot/h2020-hi-oa-data-mgt_en.pdf
  11. 11. Which data does the pilot apply to? Data, including associated metadata, needed to validate the results in scientific publications Other curated and/or raw data, including associated metadata, as specified in the DMP Doesn’t apply to all data (researchers to define as appropriate) Don’t have to share data if inappropriate – exemptions apply
  12. 12. Key requirements of the open data pilot 1. Deposit in a research data repository 2. Make it possible for third parties to access, mine, exploit, reproduce and disseminate data – free of charge for any user 3. Provide information on the tools and instruments needed to validate the results (or better still provide the tools) Image CC-BY-NC-SA by adesigna
  13. 13. Data Management Plans Projects participating in the pilot will be required to develop a Data Management Plan (DMP), in which they will specify what data will be open. • What types of data will the project generate/collect? • What standards will be used? • How will this data be shared/made available? If not, why not? • How will this data be curated and preserved? Note that the Commission does NOT require applicants to submit a DMP at the proposal stage. DMPs are a deliverable for those participating in the pilot.
  14. 14. Good practice, tools, infrastructure & services SUPPORT FOR IMPLEMENTATION
  15. 15. Data sharing: degrees of openness Open: content that can be freely used, modified and shared by anyone for any purpose. Five star open data: (1) online under an open licence, (2) structured data, (3) non-proprietary formats, (4) use URIs to denote things, (5) link data to provide context. Restricted: limits on who can use the data, how or for what purpose (charges for use, data sharing agreements, restrictive licences, peer-to-peer exchange, …). Closed: unable to share; under embargo.
  16. 16. How to make data open? 1. Choose your dataset(s) What can you make open? You may need to revisit this step if you encounter problems later. 2. Apply an open licence Determine what IP exists. Apply a suitable licence e.g. CC-BY or CC0 3. Make the data available Provide the data in a suitable format. Use repositories. 4. Make it discoverable Post on the web, register in catalogues…
  17. 17. Data licensing This DCC how-to guide outlines pros and cons of each approach and gives practical advice on how to implement your licence. • Do you own the rights or have permission to redistribute? • Do you need to place restrictions on who can use the data or how?
  18. 18. EUDAT licensing wizard Search / browse through a list of possible licences Or answer questions to determine which is most suitable
  19. 19. Metadata standards • Good metadata is key for research data access and re-use • Many disciplines have formalised community metadata standards • Use relevant standards for interoperability
  20. 20. Data catalogues Institutional services e.g. DataFinder at the University of Oxford National services e.g. Research Data Australia and RDDS pilot in the UK Data centres and community initiatives e.g. FOT Data Catalogue, B2FIND etc
  21. 21. Joining up data catalogues
  22. 22. Data repositories Zenodo • Joint effort by OpenAIRE-CERN • Multidisciplinary repository • Multiple data types – Publications – Long tail of research data • Citable data (DOI) • Links funding, publications, data & software • Does your publisher or funder suggest a repository? • Are there data centres or community databases for your field? • Does your university offer support for long-term preservation?
  23. 23. EUDAT services EUDAT offers a pan-European solution, providing a generic set of services to ensure minimum level of interoperability Building common data services in close collaboration with 25+ communities
  24. 24. EUDAT B2 service suite Covering both access and deposit, from informal data sharing to long-term archiving, and addressing identification, discoverability and computability of both long-tail and big data, EUDAT’s services will address the full lifecycle of research data
  25. 25. Institutional RDM support services Diagram courtesy of Sally Rumsey, University of Oxford University of Edinburgh Research Data Management Roadmap departments/information-services/about/strategy-planning/rdm-roadmap Research Data Oxford
  26. 26. Support on Data Management Plans • Checklist on what to include • How-to guide on developing a plan • Guidance on assessing plans (forthcoming) • Webinars and training materials • DMPonline tool • Example DMPs
  27. 27. DMPonline • Presents requirements from funders • Guidance from funder, uni, discipline… • Example answers • Ability to share plans with collaborators • Export into a variety of formats • …
  28. 28. Thanks for listening DCC guidance, tools & case studies: Follow us on twitter: @digitalcuration and #ukdcc

Editor's notes

  • The Royal Society report ‘Science as an Open Enterprise’ emphasises that much of the growth of scientific understanding is due to open practices. Being open about your work and encouraging feedback from others is at the heart of scientific practice.

    The report calls for ‘intelligent openness’: data shouldn’t just be accessible, they also need to be intelligible to others so that they can be assessed and reused.
  • Certain research communities have also seen the benefit of sharing data as it speeds up the process of discovery. This article shows how researchers in the field of Alzheimer’s research have agreed as a community to share data immediately to make scientific breakthroughs.
  • There’s also an economic benefit, as seen in the case of the NASA Landsat satellite images. Until 2008 these were sold for $600 a scene; now they’re freely available and used by Google Earth. Sales were 19,000 scenes a year, whereas 2.1 million are now transmitted. Annual sales revenue was $11.4 million; the open release is estimated to have created $935 million of value for the environmental management industry, with a direct benefit of more than $100 million per year to the US economy. The release has also stimulated the development of applications from companies worldwide.

    This case study comes from the Royal Society Report on Science as an Open Enterprise.
  • The background to this is about making the most of the data that has been created through publicly funded research. The guidelines speak of:
    Improved quality of results
    Greater efficiency
    Faster to market = faster growth
    Improved transparency of the scientific process
  • For those that do take part in the pilot, the starting point is to make all data that underpin publications open. After that, it’s for researchers to define what else should be shared and can be made open. This should be outlined in the DMP.

    Sometimes sharing is not appropriate (e.g. due to ethical rules on personal data, intellectual property protection, commercial restrictions etc). It’s fine to apply restrictions in such cases. This could be an embargo period prior to publication or while a patent is sought, or controlling access and re-use to protect participants’ identities (e.g. via the use of secure data services / data enclaves or data sharing agreements). Restrictions should be outlined up-front in the DMP.
  • So the specific requirements on projects that participate in the pilot are to:
    - Deposit data in a repository
    - Enable reuse via open licensing
    - Provide any tools (or at least info on them) needed to validate the data

    The focus is on planning for data sharing and then facilitating that through deposit, licensing and enabling reproducibility.
  • The Open Knowledge Foundation suggests four simple steps to make data open.
    First off you need to decide what to share. Not all data can be made openly available due to commercial restrictions or sensitivities.
    Once you’ve decided what to share, determine what IP exists and apply a suitable licence.
    You should then make the data available in a suitable format so others can bulk download it. Remember that for it to be useful you want to share appropriate metadata and documentation too. Using repositories is useful to make sure your data are properly managed and preserved for the long-term.
    The final step is to make your data discoverable. Put it online, tell others about it and add details to various registries so it gets found.
  • Guidance from the DCC can also help researchers to understand data licensing. This guide outlines the pros and cons of each approach, e.g. the limitations of some CC options.
  • The RDRDS will work in partnership with a network of UK subject-specific data centres and university-based institutional data repositories to harvest dataset metadata records, and so promote the discoverability of research data held by all partner institutions. Partner institutions will remain responsible for the selection and stewardship of the datasets.

    There are already services doing similar things, but none have quite the same scope.

    In fact we have the potential to complement existing services (see Figure 1), by:
    collating records from both data centres and institutional repositories;
    normalising and deduplicating, to provide a unified search interface;
    ultimately making the records visible in other places researchers might look.
  • All share common challenges:
    – Reference models and architectures
    – Persistent data identifiers
    – Metadata management
    – Distributed data sources
    – Data interoperability
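
The catalogue notes above describe collating metadata records from data centres and institutional repositories, then normalising and deduplicating them for a unified search interface. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not how any of the named services work: the `DatasetRecord` fields, the `normalise`, `dedup_key` and `collate` helpers, and the rule "same DOI, or same title when no DOI exists, means same dataset" are all hypothetical, and the DOIs in the usage note are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical minimal metadata record; field names are illustrative."""
    title: str
    doi: str      # empty string if the source has not assigned a DOI
    source: str   # e.g. "data-centre" or "institutional-repository"


# Common ways of writing a DOI that should compare as equal.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise(record):
    """Collapse whitespace in the title and reduce the DOI to a bare, lower-case form."""
    title = " ".join(record.title.split())
    doi = record.doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return DatasetRecord(title=title, doi=doi, source=record.source)


def dedup_key(record):
    """Assumed matching rule: use the DOI if present, otherwise the case-folded title."""
    if record.doi:
        return ("doi", record.doi)
    return ("title", record.title.casefold())


def collate(*sources):
    """Merge records from several sources, keeping the first copy of each dataset."""
    merged = {}
    for records in sources:
        for record in records:
            clean = normalise(record)
            merged.setdefault(dedup_key(clean), clean)
    return list(merged.values())
```

For example, `collate(data_centre_records, repository_records)` would keep a single entry when one source lists `doi:10.1234/ABC` and the other lists `https://doi.org/10.1234/abc`. A real service would need fuzzier matching, since in this sketch a record with a DOI never merges with a DOI-less copy of the same dataset.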