Slides for the keynote talk at the third Big Data Europe workshop, held on 11.9.2017 in Amsterdam and co-located with the SEMANTiCS2017 conference, by Ron Dekker, Director of CESSDA: "European Open Science Agenda: where we are and where we are going"
3. Open Science - Definition
Michael Nielsen
"Open science is the idea that
scientific knowledge of all kinds
should be openly shared as early
as is practical in the discovery
process."
scientific knowledge of all kinds:
includes journal articles, data, code,
online software tools, questions,
ideas, and speculations;
anything which can be considered
knowledge.
as is practical: very often there are
other factors (legal, ethical, social, etc.)
that must be considered.
4. TRENDS
1. SCIENCE WILL OPEN UP
• Data-driven
• Reproducibility
• Better connections within science and with society
2. INFORMATION SOCIETY
3. PLATFORMS
5. TRENDS
1. SCIENCE WILL OPEN UP
2. INFORMATION SOCIETY
• Data is the new oil
• Re-usable
3. PLATFORMS
6. TRENDS
1. SCIENCE WILL OPEN UP
2. INFORMATION SOCIETY
3. PLATFORMS
• Value-creating interactions
between producers & users
• No ownership by provider
8. FAIR
Findable
Easy to find by both humans and computer systems
Based on a mandatory metadata description
Accessible
Stored for long term
Easy access/download, with well-defined license and access conditions
At the level of metadata, or at the level of the actual data content
Interoperable
Ready to be combined with other datasets
By humans as well as computer systems
Reusable
Ready to be used for future research
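The four FAIR headings above can be read as a checklist applied to a dataset's metadata record. The following is a minimal sketch under stated assumptions: the field names, the mapping from principle to fields, and the example record (including its placeholder identifier and URL) are all hypothetical illustrations, not an official FAIR assessment and not part of the talk.

```python
from dataclasses import dataclass

# Hypothetical mapping from each FAIR principle to the metadata fields
# this sketch treats as evidence for it. This is an assumption made for
# illustration, not an official FAIR checklist.
FAIR_FIELDS = {
    "Findable": ("identifier", "title", "description"),
    "Accessible": ("access_url", "license", "access_conditions"),
    "Interoperable": ("data_format", "vocabulary"),
    "Reusable": ("provenance",),
}


@dataclass
class DatasetRecord:
    """Illustrative metadata record; every field name is hypothetical."""
    identifier: str = ""
    title: str = ""
    description: str = ""
    access_url: str = ""
    license: str = ""
    access_conditions: str = ""
    data_format: str = ""
    vocabulary: str = ""
    provenance: str = ""


def fair_gaps(record: DatasetRecord) -> dict:
    """Return, per FAIR principle, the metadata fields that are still empty."""
    gaps = {}
    for principle, names in FAIR_FIELDS.items():
        missing = [name for name in names if not getattr(record, name).strip()]
        if missing:
            gaps[principle] = missing
    return gaps


# Placeholder values only; the identifier and URL do not refer to real data.
example = DatasetRecord(
    identifier="doi:10.0000/example",
    title="Example survey",
    description="Illustrative record",
    access_url="https://example.org/data/1",
    license="CC-BY-4.0",
    data_format="text/csv",
)

for principle, missing in fair_gaps(example).items():
    print(f"{principle}: missing {', '.join(missing)}")
```

A record with no reported gaps only shows that the descriptive fields are filled in; whether the data is actually ready for future research still depends on the quality of what those fields contain.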
9. FAIR
CATALOGUE
METADATA
F+A+I = R ?
10. Data Management Plans
A Data Management Plan provides information on:
• The data the research will generate
• How to ensure its
• curation,
• preservation and
• sustainability
• What parts of that data will be open (and how)
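The items a Data Management Plan covers can be pictured as a small structured record. The sketch below is hypothetical: the class, its field names, and the example values are assumptions chosen to mirror the bullets above, not the schema of DMPonline or of any funder template.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataManagementPlan:
    """Hypothetical DMP record mirroring the bullets on this slide."""
    data_generated: List[str] = field(default_factory=list)
    curation: str = ""
    preservation: str = ""
    sustainability: str = ""
    # Maps each part of the data that will be open to how it will be opened.
    open_parts: Dict[str, str] = field(default_factory=dict)

    def missing_sections(self) -> List[str]:
        """Names of the sections that have not been filled in yet."""
        return [name for name, value in vars(self).items() if not value]


# Example values are invented for illustration.
plan = DataManagementPlan(
    data_generated=["survey responses"],
    curation="Variables documented in a codebook",
    open_parts={"anonymised responses": "deposited in a public repository"},
)

print(plan.missing_sections())
```

Listing the empty sections is one simple way a tool could turn the plan from a one-off obligation into something that is checked as the project progresses.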
11. DMPs: mainly sticks …
Sticks
• Obligations imposed by many stakeholders
• Risk of fragmentation and red tape
Carrots
• Tools
• DMPonline
https://dmponline.dcc.ac.uk
• “Lab Journal” software/tools that
• ensure Reproducibility
• reserve a DOI for the data
• upload the DMPs to publishing platforms
• Change: Publish and Curate your data
• different mindset
LabFolder integrates with Mendeley
12. European Cloud Initiative
3 pillars (COM 2016/178 - 19 April 2016)
European Data Infrastructure (EDI)
Development and deployment of large-scale European
HPC, data and network infrastructure
Widening access
SMEs, Industry at large, Government
European Open Science Cloud (EOSC)
Researchers have seamless access to all relevant data
13. European Open Science Cloud
Connect with Open Science
• EOSC is part of Europe's ambition to support the transition to
Open Science and make the most of data-driven science.
Efficient
• Cost-effective
• Privacy- and IPR-conscious
• Combine existing infrastructure
• Federation of existing and emerging infrastructures
Added value
• Scale, data-driven science, inter-disciplinary,
• Data - to - knowledge - to - innovation
15. Clouds already exist
NIH Commons
NSF Open Science Cloud
Microsoft Azure
Amazon Web Services
16. How to proceed?
• Let 1000 flowers bloom, or top-down
• By Nation or Discipline
• Pipelines (silos) or Platforms
COMPUTING | NETWORKS | SOFTWARE | CONTENT
17. It’s a cultural challenge
How to …
• Create a safe & secure environment
• Realise authentication - of users, of producers
• Deal with sensitive data
• Ensure quality of the data
• Stimulate data sharing
• Bring trust
19. Open Science
A systemic change in the modus
operandi of science and research
Affecting the whole research cycle
and its stakeholders
Commissioner Carlos Moedas
Open Science Presidency Conference
Amsterdam, 4 April 2016
20. European Open Science Agenda
1. Reward systems
2. Altmetrics: measuring quality and impact
3. New models for publishing
4. FAIR open data
5. Open Science Cloud
6. Research integrity
7. Citizen Science
8. Open education and skills
25. CESSDA
Mission
• Provide a distributed and sustainable research
infrastructure that enables the research community to
conduct high-quality research in the social sciences
Vision
• Platform to provide seamless access to FAIR social
science research data in a safe & secure way
26. Stakeholders
Members (Funders)
• Governments, Research Funding Organisations
• Universities, other Research Performing Organisations
Service Providers
• Data Services
• IT Infrastructure (computing, network, software)
• Research Libraries
• Publishers
Data Producers
• Researchers & Research Performing Organisations
Data Re-Users
• Researchers, Professionals, Citizens
27. CESSDA Strategy
• Technology
• CESSDA Catalogue (Findable)
• Pathfinder Projects on FAIR, Secure/Safe/Seamless
• Trust
• Safe & Secure Data Infrastructure
• incl. Single Sign On, Different Access Modes
• CESSDA Providers as Trusted Repositories
• Training & Tools
• Train the Trainers & Train the Researchers
• Tools, e.g. for data management plans
31. Big Data Europe
BDI Components Used in this Pilot
• Apache Flume (data ingestion)
• Apache Kafka (messaging)
• Apache Spark (distributed analysis, transformation)
• Apache HDFS (raw storage)
• SWC PoolParty Semantic Suite
(data consolidation, curation)
• OpenLink Virtuoso (triple store)
• Apache HTTP Server (linked data serving)
• PoolParty Semantic Graph Search Server
(visualisation and data browsing)
32. Big Data Europe
A LOT OF WORK - VOLUME & DYNAMICS
COMPLEX
MAINTENANCE
BUSINESS MODELS - WHO PAYS FOR WHAT
33. Big Data or AI?
What if machines take over the tedious and dirty work?
• Machines, Platforms and Crowd
• Dr. Watson
• Homo Deus
Chan Zuckerberg Initiative
• Human Cell Project
In the news
• New AI can guess whether you're gay or straight from a photograph
• Elon Musk says AI could lead to third world war
• Report shows that AI is more important to IoT than big data insights