Real-World Data Challenges: Moving Towards Richer Data Ecosystems

  1. Real-World Data Challenges: Moving Towards Richer Data Ecosystems. Anita de Waard (ORCID 0000-0002-9034-4119), VP Research Data Collaborations, Elsevier RDM Services, a.dewaard@elsevier.com. Big Data PI Meeting, March 16, 2016.
  2. Trend #1: Repositories are becoming virtual labs.
     [Figure: Earth System Grid, planned system evolution. Phases ESG-I (1999-2001), ESG-II (2001-2006), ESG-CET (2006-2011), ESGF (2011-2020) and the planned ESGF-VL (2020-), moving from a centralized archive (petabytes, 10^15 bytes) via a distributed data ecosystem to a virtual laboratory (exabytes, 10^18 bytes). Capabilities are marked as prototype, usable, or future. Data archived: model intercomparison projects; remote sensing, in situ, climatology, diagnostics, ecosystem, hydrology, biology, etc. Source: Dean Williams, Lawrence Livermore/ESGF, March 1st 2017.]
  3. Trend #2: Scientists are moving ‘beyond downloads’.
  4. Trend #3: Computers are scientists, too!
     “intelligent systems for computer-aided discovery can complement and integrate into the insight generation loop in scalable ways…”
     “This work combines time series Principal Component Analysis with InSAR to constrain the space of possible model explanations on current empirical data sets and achieve a better identification of deformation patterns”
     Source: Computer-Aided Discovery: Toward Scientific Insight Generation with Machine Support, http://ieeexplore.ieee.org/abstract/document/7515118/
  5. Raising many technical, organisational and policy questions:
     • Is Long-Tail Data + Semantics = Big Data?
     • Is Data Science a field, or a skill? (A department, or a class?)
     • Are supercomputing centers research departments or bits of infrastructure? (And if infrastructure, are they part of IT? “Oh no, anything but that!”)
     • Are repositories places to store outputs, or places where science is conducted?
     • If the latter, how are repositories and HPC centers recognised and rewarded?
     • How can we keep track of the (micro)provenance of parts of data sets?
     • Should we explore blockchain technology for this? (“Oh no, anything but that!”)
     • Is a piece of software part of the university’s research outputs?
     • If so, how do we reward brilliant coders who blog but don’t write papers?
     • How do we reward (virtual) collaboration?
     • Why won’t those damn scientists share their data?
     • Who will own the Data Science Cloud: Amazon? Or the joint HPC centers (NDS?)? Is the NIH Data Commons the model? Or is this a free-for-all? What is the role of commercial parties?
     • Is data curation/stewardship part of science, or a glorified administrator’s job?
     • What is the role of libraries in all this?
     • And why the hell is a publisher talking about it?
  6. What Elsevier is interested in: Supporting RDM networks.
     [Figure: the research data lifecycle: find topic and identify gaps; plan and fund; discover data, people, methods and protocols; collect, analyze and visualize; store, preserve and share; publish; prepare, reproduce, re-use and benchmark. Systems involved: lab ELNs, LIMS, institutional data repositories, domain-specific repositories, data centers, faculty, data search and general search, data journals and journals (with links to articles). Goals: data management plans that plan each step from experiment to publication; metadata, methods and protocols ready for preservation and publishing; publishing data (under embargo); secure discoverability inside and outside the institution.]
  7. What Elsevier is interested in: Knowledge graphs in life science.
     [Figure: three sources feeding the graph. Biological pathways extracted via semantic text mining: A upregulates B, B upregulates C, C increases disease D, giving the chain A → B → C → D; this requires normalized vocabularies for proteins, diseases, drugs and chemicals. Bioactivities through text analysis, e.g. IC50 6.3 nM, kinase binding assay, 10 mM concentration. Chemical structures and properties. Identifiers and vocabularies shown: InChI, names, NCBI, UniProt, EMTREE, ReaxysTree, structures.]
  8. What Elsevier is interested in: Knowledge graphs in research.
  9. Thank you! Links to things we’re involved with:
     • https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
     • https://www.elsevier.com/about/open-science/research-data
     • https://www.hivebench.com
     • https://data.mendeley.com/
     • https://datasearch.elsevier.com/
     • https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
     • http://www.journals.elsevier.com/softwarex/
     • https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
     • https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
     • https://www.force11.org/
     • http://www.nationaldataservice.org/
     • https://rd-alliance.org/
     Anita de Waard, a.dewaard@elsevier.com
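Slide 7 makes the point that the pathway A → B → C → D only emerges from text-mined statements once vocabularies are normalized. The sketch below illustrates that idea under stated assumptions: extracted statements arrive as (subject, relation, object) triples, and a small hard-coded synonym table (hypothetical) stands in for real vocabularies such as NCBI, UniProt or EMTREE. The function names `normalize`, `build_graph` and `find_path` are illustrative only, not an Elsevier API.

```python
from collections import deque

# Hypothetical synonym table standing in for a normalized vocabulary.
# Real systems would resolve terms against NCBI, UniProt, EMTREE, etc.
SYNONYMS = {
    "protein a": "A",
    "prot-a": "A",
    "protein b": "B",
    "protein c": "C",
    "disease d": "D",
}


def normalize(term: str) -> str:
    """Map a surface form to a canonical ID; fall back to the stripped term."""
    return SYNONYMS.get(term.strip().lower(), term.strip())


def build_graph(triples):
    """Turn (subject, relation, object) triples into an adjacency list of canonical IDs."""
    graph = {}
    for subj, rel, obj in triples:
        graph.setdefault(normalize(subj), []).append((rel, normalize(obj)))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search for the shortest chain of relations from start to goal.

    Returns a list of (subject, relation, object) steps, or None if unreachable.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None


# Statements as they might come out of text mining, with inconsistent naming.
triples = [
    ("Prot-A", "upregulates", "protein B"),
    ("Protein B", "upregulates", "protein C"),
    ("protein c", "increases", "Disease D"),
]

graph = build_graph(triples)
for subj, rel, obj in find_path(graph, "A", "D"):
    print(subj, rel, obj)
```

Running it prints the three steps A upregulates B, B upregulates C, C increases D. Without the `normalize` step, "protein B" and "Protein B" (or "protein C" and "protein c") would become separate nodes and the chain from A to D would break, which is why the slide lists normalized vocabularies as a requirement.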

Speaker notes

  • Outline:
    Some Trends
    Some Questions
    What Elsevier is interested in, and doing
  • Example: your ELN being able to publish protocols directly, easing the researcher’s burden