Boosting Data Science in Geochemistry: We Need Global Geochemical Data Standards and Networking!

Presentation at AGU Fall Meeting 2018: Large-scale, global geochemical data syntheses like EarthChem and GEOROC have, for nearly two decades, inspired and made possible a vast range of scientific studies and new discoveries, facilitating the analysis and mining of geochemical data and creating new paradigms in geochemical data analysis such as statistical geochemistry. These syntheses provide easy access to fully integrated compilations of thousands of datasets (‘data fusion’) with millions of geochemical measurements that are accompanied by comprehensive and harmonized metadata for context and provenance to search, filter, sort, and evaluate the data.
The syntheses have been assembled and maintained through manual labor by data managers, who extract data and metadata from the text, tables, and supplements of publications for inclusion in the databases. This task is time-consuming because of the multitude of data formats, units, normalizations, vocabularies, etc., which reflects a lack of best practices for geochemical data reporting. In order to support and advance future science endeavors that rely on access to and analysis of large volumes of geochemical data, we need to develop and implement global standards that make geochemical data not only FAIR (Findable, Accessible, Interoperable, Re-usable) but also ready for data fusion. As more geochemical data systems emerge at national, programmatic, and subdomain levels in response to Open Access policies and science needs, standard protocols for exchanging geochemical data among these systems will need to be developed, implemented, and governed.
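To make the harmonization burden concrete, here is a minimal Python sketch of just one of the steps a data manager performs: bringing concentrations reported in different units onto a common scale. The record layout, the `to_ppm` helper, and the sample values are hypothetical placeholders for illustration, not part of any EarthChem or GEOROC tooling.

```python
# Conversion factors to a common mass-fraction unit (ppm, i.e. micrograms per gram).
TO_PPM = {
    "wt%": 10_000.0,  # 1 wt% = 10,000 ppm
    "ppm": 1.0,
    "ppb": 0.001,     # 1 ppb = 0.001 ppm
}


def to_ppm(value, unit):
    """Convert a mass-fraction value to ppm; reject units we do not recognize."""
    if unit not in TO_PPM:
        raise ValueError(f"unrecognized unit: {unit!r}")
    return value * TO_PPM[unit]


# Made-up records, as the same quantity might appear in three different publications.
reported = [
    {"sample": "A-1", "analyte": "Sr", "value": 0.035, "unit": "wt%"},
    {"sample": "B-7", "analyte": "Sr", "value": 350.0, "unit": "ppm"},
    {"sample": "C-3", "analyte": "Sr", "value": 350000.0, "unit": "ppb"},
]

# Rewrite every record in the common unit so the values become directly comparable.
harmonized = [
    {**r, "value": to_ppm(r["value"], r["unit"]), "unit": "ppm"}
    for r in reported
]
```

Raising on unknown units rather than guessing is deliberate: a silent fallback would hide exactly the kind of reporting inconsistency that agreed standards are meant to remove.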

Critical is the alignment with existing standards such as the Semantic Sensor Network (SSN) ontology, a recent joint W3C and OGC standard that standardizes description of sensors, observation, sampling, and actuation, with sufficient flexibility to allow details of these elements to be defined in different domains. New initiatives within the International Council for Science and CODATA are working towards coordinating the International Science Unions to identify and endorse the more authoritative standards (including vocabularies and ontologies). These initiatives present a timely opportunity for geochemical data to ensure that they are born ‘connected’ within and across disciplines.
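As an illustration of what alignment with SSN could look like, the following Python sketch encodes a single geochemical measurement as JSON-LD using terms from SOSA, the core module of the SSN ontology. The `sosa:` terms are taken from the W3C/OGC vocabulary; the `ex:` namespace, the identifiers, the numeric value, and the use of QUDT terms for the result are assumptions made for this example, not an endorsed geochemical profile.

```python
import json

# Namespace prefixes. "ex" is a hypothetical namespace used only in this sketch.
CONTEXT = {
    "sosa": "http://www.w3.org/ns/sosa/",
    "qudt": "http://qudt.org/schema/qudt/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": "https://example.org/geochem/",
}


def make_observation(obs_id, sample_id, analyte, procedure, value, unit, time):
    """Describe one analytical value as a sosa:Observation made on a sosa:Sample."""
    return {
        "@context": CONTEXT,
        "@id": f"ex:{obs_id}",
        "@type": "sosa:Observation",
        # The physical specimen that was analyzed
        "sosa:hasFeatureOfInterest": {"@id": f"ex:{sample_id}", "@type": "sosa:Sample"},
        # What was measured and how
        "sosa:observedProperty": {"@id": f"ex:{analyte}"},
        "sosa:usedProcedure": {"@id": f"ex:{procedure}"},
        # The result as a value with an explicit unit
        "sosa:hasResult": {
            "qudt:numericValue": value,
            "qudt:unit": {"@id": f"ex:{unit}"},
        },
        "sosa:resultTime": {"@value": time, "@type": "xsd:dateTime"},
    }


# All identifiers and values below are invented for illustration.
doc = make_observation(
    obs_id="obs-0001",
    sample_id="sample-0001",
    analyte="Sr_concentration",
    procedure="ICP-MS",
    value=350.0,
    unit="ppm",
    time="2018-12-11T00:00:00Z",
)
serialized = json.dumps(doc, indent=2)
```

The point of the pattern is that sample, property, procedure, and unit are each referenced by identifier, so two databases that agree on those identifiers can merge observations without manual reinterpretation.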


  1. BOOSTING DATA SCIENCE IN GEOCHEMISTRY: We Need Global Geochemical Data Standards and Networking!
     Kerstin Lehnert, Lamont-Doherty Earth Observatory, Columbia University, USA
     Lesley A Wyborn, Australian National University, Australia
     Simon J D Cox, CSIRO Land and Water, Australia
     Jens F Klump, CSIRO Earth Science Resource Engineering, Australia
     Brent McInnes, Curtin University, Australia
  2. Data Science is Happening in Geochemistry (and Mineralogy, Petrology, etc.)
     Goldschmidt 2018 Workshop “Data Science in Geochemistry”
     V21A-08: Boosting Data Science in Geochemistry: We Need Global Geochemical Data Standards and Networking!
  3. Just Reflecting on this Session ...
     ■ How much work went into assembling the data to do the data-driven research in each of the talks?
     ■ What standards were followed to compile the data? What information about uncertainties or analytical procedure was included, what terminology was used?
     ■ Can I integrate the data compiled for talk A with the one from talk B?
     ■ Can we use the tools presented in talk X with the data from talk Y?
  4. Obstacles for Data Science
     ■ Surveys in recent years show that data scientists still spend 75-80% of their time ‘data wrangling’.
       • RDA EU survey 2013 (75%)
       • Brodie 2015 (80%)
       • CrowdFlower 2017 (80%)
     Did you?
     Source: CrowdFlower
  5. Example: Data Synthesis for DECADE
     ■ 15 scientists working for 5 days
     ■ Major progress was only made with the compilation of melt inclusion geochemistry because data were discoverable in PetDB and GEOROC.
     ■ Another 2 months of effort of the EarthChem data manager required to format & integrate data from different databases and unpublished data.
  6. Urgency for a Geochemical Data Standard
     ■ We need to be able to share & integrate data globally from multiple databases, each with its own schema.
     ■ We need to integrate & link data across disciplines (transdisciplinary).
     ■ We need to ensure compliance with FAIR data principles.
     ■ We need it to
       – Be more comprehensive with respect to data documentation,
       – Be aligned with modern standards, e.g. RDF,
       – Use, where possible, internationally endorsed vocabularies.
     ■ Above all, we need to have a more formal approval and governance.
       – We need to think of a standard specifically, for both technical and 'social' reasons.
  7. A Never-Ending Story?
     IGC 2008
  8. Can We “Standardize” Geochemical Data?
     I believe that our failure to unite our voices as geochemists has a simple origin – it is the complexity of our subject.
  9. We Made Some Progress
     ■ Editors Roundtable recommendations with geochemical journals and databases
     ■ EarthChem XML schema
     ■ Rise of the IGSN
     Goldstein et al. 2014, published in the EarthChem Library, doi:10.1594/IEDA/100426
  10. EarthChem Data Templates
  11. EarthChemXML
     • Developed for the EarthChem Portal (ECP) in 2006
     • Locally developed XML schema for data exchange that partner data systems use to encode their database content for inclusion in the ECP database.
     • Not comprehensive with respect to metadata; uses EarthChem vocabularies (so does not align with broader community vocabularies)
     • XML format is voluminous, especially for databases with hundreds of thousands of records.
  12. EarthChem Portal
     • 22,074 publications
     • 1,054,738 samples
     • 30,059,995 analytical values
     Global Federation of Geochemical Databases:
     • PetDB
     • SedDB
     • GEOROC (Germany)
     • USGS
     • MetPetDB
     • GANSEKI (Japan)
     • Data exchange protocol: EarthChemXML
     • APIs & web services (WMS, WFS)
     • Interoperability with modeling tools
     More formal & community governed standards needed for FAIR
  13. Interoperable EarthChem Data
     DECADE Portal (beta) http://decade.iedadata.org
  18. Proliferation of Geochemical Databases
     ■ International
     ■ Science programs
     ■ Thematic
     sponsored by the State Key Lab of the Geological Processes and Mineral Resources in the China University of Geosciences
  19. ‘Long tail’ communities:
     • Analogue modelling
     • Rock physics/mechanics
     • Paleomagnetics
     • Geochemistry
     Slide contributed by Kirsten Elger, GFZ Potsdam
  20. 27 analytical labs, 4 countries
     [Pie chart] Italy: 12 (44%); Spain: 11 (41%); Portugal: 3 (11%); Netherlands: 1 (4%)
     Barcelona Workshop (Nov 2018):
     • Agreement to use EarthChem Library templates for data publications via GFZ Data Services
     • Interest in collaboration for the development of global standards for geochemical data
     Slide contributed by Kirsten Elger, GFZ Potsdam
  21. Data Standards in Geochemistry are no longer an option
     ■ Publishers & funders are demanding FAIR data.
     ■ In order to do Data Science, we need to have a global network of geochemical data that can be accessed in a standardized format.
     ■ No one can do it alone – no one organization, no one group, no one country has the required resources or expertise.
     We need to build a global geochemistry data platform together!
     “We must, indeed, all hang together or, most assuredly, we shall all hang separately.” Benjamin Franklin
  22. AGU Town Hall “Building a Global Network of Geochemical Data”
     Tuesday, Dec 11, 6:15-7:00pm, Marriott Marquis, Independence E
     Panel:
     • Roberta Rudnick (President, Geochemical Society)
     • Catherine Chauvel (Editor, Chemical Geology)
     • Maria Uhle (NSF Program Director for International Activities)
     • Lesley Wyborn, ANU
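Slide 11 remarks that an XML exchange format becomes voluminous for databases with hundreds of thousands of records. The rough Python sketch below shows one way to gauge that overhead by serializing the same synthetic records as element-per-field XML and as CSV. The element names and record layout are hypothetical and are not the EarthChemXML schema; the values are placeholders, not measurements.

```python
import csv
import io
import xml.etree.ElementTree as ET


def make_records(n):
    """Generate n synthetic placeholder records (not real analytical data)."""
    return [
        {"sample_id": f"S{i:06d}", "analyte": "SiO2", "value": "50.00", "unit": "wt%"}
        for i in range(n)
    ]


def as_xml(records):
    """Serialize records with one XML element per field (hypothetical layout)."""
    root = ET.Element("records")
    for r in records:
        rec = ET.SubElement(root, "record")
        for key, val in r.items():
            ET.SubElement(rec, key).text = val
    return ET.tostring(root, encoding="utf-8")


def as_csv(records):
    """Serialize the same records as CSV with a single header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue().encode("utf-8")


records = make_records(1000)
xml_bytes = len(as_xml(records))
csv_bytes = len(as_csv(records))
ratio = xml_bytes / csv_bytes  # how many times larger the XML encoding is
```

In this layout every field name is repeated twice per record in the XML, while the CSV states it once in the header, so the gap widens as the number of fields per record grows. Compression and attribute-based XML layouts would narrow it, so the ratio should be read as indicative only.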