Data Observation Network for Earth
(DataONE): Supporting Scientific Data
Preservation, Discovery, and Innovation

Bill Michener

Professor and DataONE Project Director
University of New Mexico

24 September 2012

National Information Standards Organization

Research and Data Life Cycle Integration

[Figure: the research cycle (Ideas, Proposal writing, Research, Publication) linked to the data life cycle (Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze)]

Three Key Challenges

[Figure: the data life cycle (Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze) with the label "Innovation" alongside]

1. Data Preservation and Planning




The Long Tail of Orphan Data

[Figure (B. Heidorn): Volume plotted against Rank frequency of datatype. Specialized repositories (e.g. GenBank, PDB) sit at the high-volume end; Orphan data make up the long tail.]

“Most of the bytes are at the high end, but most of the datasets are at the low end” – Jim Gray

Planning?

Metadata standard?
Data repository?

DataONE and the DMPTool Support Data Preservation

Three major components for a flexible, scalable, sustainable network

Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data
• retain copies of data

Coordinating Nodes
• retain complete metadata catalog
• indexing for search
• network-wide services
• ensure content availability (preservation)
• replication services

Investigator Toolkit

Dryad (>3,000 data products)
• Coordinated submission of articles and underlying data
• Handshaking with specialized repositories
• Promotion of reuse and incentives for deposit

Knowledge Network for Biocomplexity (20,000+ data packages)

Data Types
• Ecological
• Environmental
• Demographic
• Social/Legal/Economic

Contributors
• Individual investigators
• Field stations and networks
• Government agencies
• Non-profit partnerships
• Synthesis centers

[Figure: bar chart of Data Sizes, in % (axis 0 to 60) by size class in MB: <1, 1-10, 10-200, >200]

✔ Check for best practices
✔ Create metadata
✔ Connect to ONEShare

[Figure: Data & Metadata (EML)]

Data Management Planning Tool




2. Data Discovery




Data Silos




The DataONE Federation




Member Node Functional Tiers

Tier 1: Read only, public content
   ping(), getLogRecords(), getCapabilities(), get(), getSystemMetadata(), getChecksum(), listObjects(), synchronizationFailed()

Tier 2: Read only, with access control
   isAuthorized(), setAccessPolicy()

Tier 3: Read/Write using client tools
   create(), update(), delete()

Tier 4: Able to operate as a replication target
   replicate(), getReplica()


http://mule1.dataone.org/ArchitectureDocs-current/apis/MN_APIs.html
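
The tier list above can be read as a capability table. The following is a minimal sketch in Python, not DataONE client code: it only records the method names shown on the slide and assumes, as the tier descriptions suggest, that each tier also includes the methods of the tiers below it. The names TIER_METHODS, methods_for_tier, and minimum_tier are illustrative.

```python
# Method names copied from the slide; the cumulative-tier rule is an assumption.
TIER_METHODS = {
    1: ["ping", "getLogRecords", "getCapabilities", "get",
        "getSystemMetadata", "getChecksum", "listObjects",
        "synchronizationFailed"],
    2: ["isAuthorized", "setAccessPolicy"],
    3: ["create", "update", "delete"],
    4: ["replicate", "getReplica"],
}


def methods_for_tier(tier: int) -> list[str]:
    """Return every method a node at `tier` would expose (this tier and below)."""
    if tier not in TIER_METHODS:
        raise ValueError(f"unknown tier: {tier}")
    return [m for t in sorted(TIER_METHODS) if t <= tier for m in TIER_METHODS[t]]


def minimum_tier(method: str) -> int:
    """Return the lowest tier at which `method` appears."""
    for t in sorted(TIER_METHODS):
        if method in TIER_METHODS[t]:
            return t
    raise KeyError(method)


if __name__ == "__main__":
    print(minimum_tier("create"))     # tier that introduces create()
    print(methods_for_tier(2))        # everything a Tier 2 node would offer
```

Under that assumption, a client tool that needs create() would look for a node at Tier 3 or above.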


ORNL DAAC as a DataONE Member Node

[Figure labels: NASA collectors; DAAC Users (UWG); Investigator Toolkit; DataONE Users]

1. Ontology-based discovery search results

[Figure callouts]
• Concepts acquire context: biomass as Material or biomass as Energy
• Super-classes may have different properties
• Additional search terms

1. NCBO ontology repository instance
2. Populated with ontologies (e.g., the NASA-JPL Semantic Web for Earth and Environmental Terminology)
3. Queried ontologies and returned results using REST services

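
The idea that a concept acquires context from its super-classes can be sketched with a toy lookup. This is an illustration only and does not reproduce the NCBO repository or its REST services: the two senses of "biomass" come from the slide, while the property names and the function names contexts and additional_search_terms are invented placeholders.

```python
# Toy ontology for illustration only. The two senses of "biomass" come from the
# slide; the property names are invented placeholders, not real ontology content.
SUPER_CLASSES = {"biomass": ["Material", "Energy"]}
CLASS_PROPERTIES = {
    "Material": ["mass", "composition"],
    "Energy": ["energy content", "fuel source"],
}


def contexts(term: str) -> list[str]:
    """Super-classes under which a search term can be interpreted."""
    return SUPER_CLASSES.get(term.lower(), [])


def additional_search_terms(term: str, context: str) -> list[str]:
    """Properties of the chosen super-class, offered as extra search terms."""
    if context not in contexts(term):
        raise ValueError(f"{term!r} has no context {context!r}")
    return CLASS_PROPERTIES.get(context, [])


if __name__ == "__main__":
    for ctx in contexts("biomass"):
        print(ctx, additional_search_terms("biomass", ctx))
```

Because each super-class carries different properties, the extra terms offered to the searcher change with the context chosen.
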
Approach 2: Enrich MN Metadata

                               DAAC     DRYAD        KNB
Number of Documents             978     1,729     24,249
Total Number of Keywords      7,294     8,266    254,525
Average Keywords/Document      7.46      4.78      10.49

[Figure: bar chart of average keywords per document for DAAC, DRYAD, and KNB (axis 0 to 12)]
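
The bottom row of the table is the ratio of the two rows above it. A minimal check in Python, using only the figures on the slide (the variable and function names are illustrative):

```python
# Counts copied from the slide: (number of documents, total number of keywords).
COUNTS = {
    "DAAC": (978, 7_294),
    "DRYAD": (1_729, 8_266),
    "KNB": (24_249, 254_525),
}


def average_keywords(documents: int, keywords: int) -> float:
    """Average number of keywords per document."""
    return keywords / documents


AVERAGES = {name: average_keywords(d, k) for name, (d, k) in COUNTS.items()}

if __name__ == "__main__":
    for name, value in AVERAGES.items():
        print(f"{name}: {value:.4f}")
```

The DAAC and DRYAD ratios round to the slide's 7.46 and 4.78. The KNB ratio is about 10.496, so the slide's 10.49 looks truncated rather than rounded.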



Actual Keywords
1. canopy characteristics
2. field investigation
3. vegetation index
4. leaf characteristics
5. Satellite
6. land cover
7. leaf area meter
8. Reflectance
9. steel measuring tape
10. vegetative cover
11. plant characteristics
12. albedo

Suggested Keywords
[1] field investigation
[2] analysis
[3] land cover
[4] computational model
[5] reflectance
[6] vegetative cover
[7] biomass
[8] primary production
[9] steel measuring tape
[10] weigh balance
[11] precipitation amount
[12] canopy characteristics
[13] leaf characteristics
[14] water vapor
[15] quadrat sample frame
[16] rain gauge
[17] surface air temperature
[18] air temperature
[19] meteorological station
[20] human observer
[21] vegetation index
[22] soil core device
[23] plant characteristics
[24] surface wind
[25] albedo
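
One way to compare the two keyword lists is to count how many of the actual keywords reappear among the suggestions. The sketch below does this with a case-insensitive exact match. That matching rule is an assumption made for illustration; the slide does not say how the project evaluated its suggestions.

```python
# Keyword lists copied from the slide.
ACTUAL = [
    "canopy characteristics", "field investigation", "vegetation index",
    "leaf characteristics", "Satellite", "land cover", "leaf area meter",
    "Reflectance", "steel measuring tape", "vegetative cover",
    "plant characteristics", "albedo",
]
SUGGESTED = [
    "field investigation", "analysis", "land cover", "computational model",
    "reflectance", "vegetative cover", "biomass", "primary production",
    "steel measuring tape", "weigh balance", "precipitation amount",
    "canopy characteristics", "leaf characteristics", "water vapor",
    "quadrat sample frame", "rain gauge", "surface air temperature",
    "air temperature", "meteorological station", "human observer",
    "vegetation index", "soil core device", "plant characteristics",
    "surface wind", "albedo",
]


def normalize(keyword: str) -> str:
    """Lower-case and trim a keyword so that 'Reflectance' matches 'reflectance'."""
    return keyword.strip().lower()


def overlap(actual, suggested):
    """Return (matched, missed) actual keywords, compared case-insensitively."""
    suggested_set = {normalize(k) for k in suggested}
    matched = [k for k in actual if normalize(k) in suggested_set]
    missed = [k for k in actual if normalize(k) not in suggested_set]
    return matched, missed


MATCHED, MISSED = overlap(ACTUAL, SUGGESTED)

if __name__ == "__main__":
    print(f"{len(MATCHED)} of {len(ACTUAL)} actual keywords were suggested")
    print("not suggested:", MISSED)
```

For these lists, 10 of the 12 actual keywords appear among the 25 suggestions; "Satellite" and "leaf area meter" do not.
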
3. Innovation


The Fourth Paradigm:
1. Observational and experimental
2. Theoretical research
3. Computer simulations of natural phenomena
4. Data-intensive research
    • new tools, techniques, and ways of working

“Data Intensive Science” and the “80:20 Rule”

[Figure, adapted from CENR-OSTP. Axes: Increasing Process Knowledge; Decreasing Spatial Coverage. Levels shown: Intensive science sites and experiments; Extensive science sites; Volunteer & education networks; Remote sensing]

Public Participation in Scientific Research Conference: 4-5 August 2012 in
Portland, Oregon USA prior to Ecological Society of America meeting (6-10 Aug.):
http://www.birds.cornell.edu/citscitoolkit/conference/2012




Investigator Toolkit Support

[Figure: the data life cycle (Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze) annotated with supporting tools, including DMP-Tool (at Plan) and Kepler]

Exploration, Visualization, and Analysis

Diverse bird observations and environmental data from 300,000 locations in the US integrated and analyzed using High Performance Computing Resources

[Figure panels: Land Cover; Meteorology; MODIS – Remote sensing data]

Spatio-Temporal Exploratory Model identifies factors affecting patterns of migration

Model results: Occurrence of Indigo Bunting (2008)
[Figure: occurrence maps for Jan, Apr, Jun, Sep, Dec]
• Examine patterns of migration
• Infer how climate change may affect bird migration

Taverna, MyExperiment




Provenance Browser




DataONE: Supporting Scientific Data Preservation, Discovery, and Innovation

Current Member Nodes: [logos]
Coming Soon: [logos], Queensland University of Technology

Current Tools: [logos]
Tools Coming Soon: [logos]

Deployment Targets – Y5

 2009      2010      2011      2012      2013      2014
      Y1        Y2        Y3        Y4        Y5

Metadata Objects       100k (130k)    400k      1M
Datasets                90k (120k)    180k    360k
Uptime                  99.0 (100)    99.9    99.9
Metadata Schemas             8 (4)       8       8
Member Nodes                10 (8)      20      40
MN Countries                 3 (2)       5      10
Coordinating Nodes           3 (3)       4       5
CN Countries                 1 (1)       1       2
ITK Tools                    8 (4)      10      12

Community Engagement




User Assessments

[Figure: timeline from Year 1 to Year 5 showing a BL and a later FU assessment for each group, in order: Scientists; Library Policies; Librarians; Policy Makers; Educators]

Community Engagement




Best Practices and Software Tools




June 3-21, 2013
University of New Mexico

Internships
2009: 4 interns; 2010: 4 interns; 2011: 8 interns; 2012: 6 interns

https://notebooks.dataone.org/summer2012/

DataONE: Supporting Scientific Data
Preservation, Discovery, and Innovation




DataONE.org




DataONE Team and Sponsors
        • Amber Budden, Roger Dahl, Rebecca Koskela, Bill Michener, Robert Nahf, Skye Roseboom, Mark Servilla
        • Dave Vieglais
        • Suzie Allard, Nick Dexter, Kimberly Douglass, Carol Tenopir, Robert Waltz, Bruce Wilson
        • John Cobb, Bob Cook, Ranjeet Devarakonda, Giri Palanisamy, Line Pouchard
        • Patricia Cruse, John Kunze
        • Sky Bristol, Mike Frame, Richard Huffine, Viv Hutchison, Jeff Morisette, Jake Weltzin, Lisa Zolly
        • Stephanie Hampton, Chris Jones, Matt Jones, Ben Leinfelder, Andrew Pippin
        • Paul Allen, Rick Bonney, Steve Kelling
        • Ryan Scherle, Todd Vision
        • Randy Butler

        • Ewa Deelman
        • Deborah McGuinness
        • Jeff Horsburgh
        • Robert Sandusky
        • Bertram Ludaescher
        • Peter Honeyman
        • Cliff Duke
        • Carole Goble
        • Donald Hobern
        • David DeRoure

Leon Levy Foundation

Presented at: NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an era of data-intensive science

 
Kriegsman "Integrating Open and Equitable Research into Open Science"
Kriegsman "Integrating Open and Equitable Research into Open Science"Kriegsman "Integrating Open and Equitable Research into Open Science"
Kriegsman "Integrating Open and Equitable Research into Open Science"
 
Mattingly "Ethics and Cleaning Data"
Mattingly "Ethics and Cleaning Data"Mattingly "Ethics and Cleaning Data"
Mattingly "Ethics and Cleaning Data"
 
Mercado-Lara "Open & Equitable Program"
Mercado-Lara "Open & Equitable Program"Mercado-Lara "Open & Equitable Program"
Mercado-Lara "Open & Equitable Program"
 
Ratner "Enhancing Open Science: Assessing Tools & Charting Progress"
Ratner "Enhancing Open Science: Assessing Tools & Charting Progress"Ratner "Enhancing Open Science: Assessing Tools & Charting Progress"
Ratner "Enhancing Open Science: Assessing Tools & Charting Progress"
 
Pfeiffer "Enhancing Open Science: Assessing Tools & Charting Progress"
Pfeiffer "Enhancing Open Science: Assessing Tools & Charting Progress"Pfeiffer "Enhancing Open Science: Assessing Tools & Charting Progress"
Pfeiffer "Enhancing Open Science: Assessing Tools & Charting Progress"
 

Recently uploaded

Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptxmary850239
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxMichelleTuguinay1
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...DhatriParmar
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseCeline George
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvRicaMaeCastro1
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQuiz Club NITW
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxSayali Powar
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQuiz Club NITW
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...DhatriParmar
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptxDhatriParmar
 

Recently uploaded (20)

Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of EngineeringFaculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 

NISO Forum, Denver, Sept. 24, 2012: Scientific discovery and innovation in an era of data-intensive science

  • 1. Data Observation Network for Earth (DataONE): Supporting Scientific Data Preservation, Discovery, and Innovation Bill Michener Professor and DataONE Project Director University of New Mexico 24 September 2012 National Information Standards Organization
  • 3. Research and Data Life Cycle Integration [Diagram: the research cycle (Ideas, Proposal writing, Research, Publication) linked to the data life cycle (Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze)]
  • 4. Three Key Challenges [Diagram: the data life cycle (Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze), labeled "Innovation"]
  • 5. 1. Data Preservation and Planning
  • 6. The Long Tail of Orphan Data: “Most of the bytes are at the high end, but most of the datasets are at the low end” (Jim Gray). [Figure: volume plotted against rank frequency of datatype, with regions labeled "Specialized repositories (e.g. GenBank, PDB)" and "Orphan data" (B. Heidorn)]
  • 7. Planning? Metadata standard? Data repository?
  • 8. DataONE and the DMPTool Support Data Preservation. Three major components for a flexible, scalable, sustainable network. Member Nodes: diverse institutions; serve local community; provide resources for managing their data; retain copies of data. Coordinating Nodes: retain complete metadata catalog; indexing for search; network-wide services; ensure content availability (preservation); replication services. Investigator Toolkit.
  • 9. Dryad (>3,000 data products). Coordinated submission of articles and underlying data. Handshaking with specialized repositories. Promotion of reuse and incentives for deposit.
  • 10. Knowledge Network for Biocomplexity (20,000+ data packages). Data types: Ecological; Environmental; Demographic; Social/Legal/Economic. Contributors: Individual investigators; Field stations and networks; Government agencies; Non-profit partnerships; Synthesis centers. [Chart: data sizes, % of packages by size class (<1, 1-10, 10-200, >200 MB)]
  • 11. ✔ Check for best practices ✔ Create metadata ✔ Connect to ONEShare. Data & Metadata (EML)
  • 18. Member Node Functional Tiers. Tier 1: Read only, public content: ping(), getLogRecords(), getCapabilities(), get(), getSystemMetadata(), getChecksum(), listObjects(), synchronizationFailed(). Tier 2: Read only, with access control: isAuthorized(), setAccessPolicy(). Tier 3: Read/Write using client tools: create(), update(), delete(). Tier 4: Able to operate as a replication target: replicate(), getReplica(). http://mule1.dataone.org/ArchitectureDocs-current/apis/MN_APIs.html
  • 19. ORNL DAAC as a DataONE Member Node [Diagram: NASA collectors, DAAC Users (UWG), Investigator Toolkit, DataONE Users]
  • 25. 1. Ontology-based discovery. Search results: concepts acquire context (biomass as Material or biomass as Energy); additional search terms; super-classes may have different properties. Steps: 1. NCBO ontology repository instance. 2. Populated with ontologies (e.g., the NASA-JPL Semantic Web for Earth and Environmental Terminology). 3. Queried ontologies and returned results using REST services.
  • 26. Approach 2: Enrich MN Metadata
      Member Node   Number of Documents   Total Number of Keywords   Average Keywords/Document
      DAAC          978                   7,294                      7.46
      DRYAD         1,729                 8,266                      4.78
      KNB           24,249                254,525                    10.49
      [Chart: average keywords per document for DAAC, DRYAD, and KNB, axis 0 to 12]
      Actual Keywords: [1] field investigation; [2] analysis; [3] land cover; [4] computational model; [5] reflectance; [6] vegetative cover; [7] biomass; [8] primary production; [9] steel measuring tape; [10] weigh balance; [11] precipitation amount; [12] canopy characteristics; [13] leaf characteristics; [14] water vapor; [15] quadrat sample frame; [16] rain gauge; [17] surface air temperature; [18] air temperature; [19] meteorological station; [20] human observer; [21] vegetation index; [22] soil core device; [23] plant characteristics; [24] surface wind; [25] albedo
      Suggested Keywords: 1. canopy characteristics; 2. field investigation; 3. vegetation index; 4. leaf characteristics; 5. Satellite; 6. land cover; 7. leaf area meter; 8. Reflectance; 9. steel measuring tape; 10. vegetative cover; 11. plant characteristics; 12. albedo
  • 27. 3. Innovation. The Fourth Paradigm: 1. Observational and experimental 2. Theoretical research 3. Computer simulations of natural phenomena 4. Data-intensive research • new tools, techniques, and ways of working
  • 28. “Data Intensive Science” and the “80:20 Rule” [Figure: axes "Increasing Process Knowledge" and "Decreasing Spatial Coverage"; levels: Intensive science sites and experiments; Extensive science sites; Volunteer & education networks; Remote sensing. Adapted from CENR-OSTP]
  • 29. Public Participation in Scientific Research Conference: 4-5 August 2012 in Portland, Oregon USA prior to Ecological Society of America meeting (6-10 Aug.): http://www.birds.cornell.edu/citscitoolkit/conference/2012
  • 30. Investigator Toolkit Support [Diagram: the data life cycle (Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze), annotated with the tools DMP-Tool and Kepler]
  • 31. Exploration, Visualization, and Analysis. Diverse bird observations and environmental data from 300,00 locations in the US integrated and analyzed using High Performance Computing Resources. Inputs: Land Cover; Meteorology; MODIS – Remote sensing data. Spatio-Temporal Exploratory Model identifies factors affecting patterns of migration. Model results: Occurrence of Indigo Bunting (2008) [Maps: Jan, Apr, Jun, Sep, Dec]. • Examine patterns of migration • Infer how climate change may affect bird migration
  • 34. DataONE: Supporting Scientific Data Preservation, Discovery, and Innovation. Current Member Nodes: Coming Soon: Current Tools: Tools Coming Soon: Queensland University of Technology
  • 35. Deployment Targets – Y5. Timeline: 2009 to 2014 (Y1 to Y5). Values in the order listed on the slide, parenthesized figures as in the original. Metadata Objects: 100k (130k), 400k, 1M. Datasets: 90k (120k), 180k, 360k. Uptime: 99.0 (100), 99.9, 99.9. Metadata Schemas: 8 (4), 8, 8. Member Nodes: 10 (8), 20, 40. MN Countries: 3 (2), 5, 10. Coordinating Nodes: 3 (3), 4, 5. CN Countries: 1 (1), 1, 2. ITK Tools: 8 (4), 10, 12.
  • 37. User Assessments [Chart: timeline from Year 1 to Year 5 with BL and FU assessments for Scientists, Library Policies, Librarians, Policy Makers, and Educators]
  • 39. Best Practices and Software Tools
  • 40. June 3-21, 2013 University of New Mexico
  • 41. Internships: 2009, 4 interns; 2010, 4 interns; 2011, 8 interns; 2012, 6 interns. https://notebooks.dataone.org/summer2012/
  • 42. DataONE: Supporting Scientific Data Preservation, Discovery, and Innovation
  • 44. DataONE Team and Sponsors • Amber Budden, Roger Dahl, Rebecca Koskela, Bill Michener, Robert Nahf, Skye Roseboom, Mark Servilla • Dave Vieglais • Suzie Allard, Nick Dexter, Kimberly Douglass, Carol Tenopir, Robert Waltz, Bruce Wilson • John Cobb, Bob Cook, Ranjeet Devarakonda, Giri Palanismy, Line Pouchard • Patricia Cruse, John Kunze • Sky Bristol, Mike Frame, Richard Huffine, Viv Hutchison, Jeff Morisette, Jake Weltzin, Lisa Zolly • Stephanie Hampton, Chris Jones, Matt Jones, Ben Leinfelder, Andrew Pippin • Paul Allen, Rick Bonney, Steve Kelling • Ryan Scherle, Todd Vision • Randy Butler • Ewa Deelman • Deborah McGuinness • Jeff Horsburgh • Robert Sandusky • Bertram Ludaescher • Peter Honeyman • Cliff Duke • Carole Goble • Donald Hobern • David DeRoure • LEON LEVY FOUNDATION
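
The Member Node functional tiers listed on slide 18 can be read as a small capability table. The following is a minimal sketch of that reading, not DataONE client code: the method names are copied from the slide, while the `Tier` enum, the `TIER_METHODS` table, and the helpers `required_methods` and `highest_tier` are hypothetical names introduced here for illustration. It also assumes that each tier includes every lower tier, which the slide does not state explicitly.

```python
from enum import IntEnum
from typing import Dict, FrozenSet, Iterable, Optional


class Tier(IntEnum):
    """Member Node functional tiers, numbered as on slide 18."""
    READ_PUBLIC = 1          # Read only, public content
    READ_ACCESS_CONTROL = 2  # Read only, with access control
    READ_WRITE = 3           # Read/Write using client tools
    REPLICATION_TARGET = 4   # Able to operate as a replication target


# Method names copied from the slide; the grouping follows the slide.
TIER_METHODS: Dict[Tier, FrozenSet[str]] = {
    Tier.READ_PUBLIC: frozenset({
        "ping", "getLogRecords", "getCapabilities", "get",
        "getSystemMetadata", "getChecksum", "listObjects",
        "synchronizationFailed",
    }),
    Tier.READ_ACCESS_CONTROL: frozenset({"isAuthorized", "setAccessPolicy"}),
    Tier.READ_WRITE: frozenset({"create", "update", "delete"}),
    Tier.REPLICATION_TARGET: frozenset({"replicate", "getReplica"}),
}


def required_methods(tier: Tier) -> FrozenSet[str]:
    """Methods needed for `tier`.

    ASSUMPTION: tiers are cumulative, so a tier needs its own methods
    plus those of every lower tier.
    """
    needed = set()
    for t in Tier:
        if t <= tier:
            needed |= TIER_METHODS[t]
    return frozenset(needed)


def highest_tier(implemented: Iterable[str]) -> Optional[Tier]:
    """Return the highest tier fully covered by `implemented`, or None."""
    have = set(implemented)
    best = None
    for t in Tier:  # IntEnum members iterate in definition (ascending) order
        if required_methods(t) <= have:
            best = t
        else:
            break
    return best


if __name__ == "__main__":
    # A hypothetical node exposing the Tier 1 and Tier 2 methods only.
    node = required_methods(Tier.READ_ACCESS_CONTROL)
    print(highest_tier(node).name)  # READ_ACCESS_CONTROL
```

Under the cumulative assumption, a node that implements the write methods but lacks the Tier 1 read methods is not classified at any tier, which is why `highest_tier` stops at the first tier it cannot satisfy.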

Editor's Notes

  1. Networking, interconnectedness of information. Defining the relationships between components increases the value and utility of those items. The internet provides connectivity between systems, and a good deal of infrastructure has been built on this rapidly evolving, now pervasive fabric. The design of most internet-based infrastructure, though, is very ephemeral, and thus is not suitable for preservation of information or, more importantly, the relationships between elements. URLs are often used as identifiers, but they have a significant problem: their resolution, that is, finding the location where the content identified by the URL may be retrieved, is entirely dependent on the persistent availability of the service endpoint referenced by the URL. Change in any component in the resolution chain results in failure, and thus negates the utility of the URL. [Diagram of URL resolution process] The semantic web, the goal of interconnectedness between information, is entirely dependent on effective identifier resolution. Preservation of content. Access to content. Creating communities of agents able to access and manipulate information. Generating new content, relationships between content, discovering new associations. Being completely open about activity: the generation of new content, mining existing information, and access to processing resources may however be best done with some privacy. There are always some activities best not to perform in full public view. The DataONE project is building infrastructure that addresses these concerns.
  2. In fact, many researchers find the new requirement to be quite confusing. Here are just a few examples of the questions that they are asking.
  3. There is widely used infrastructure for certain well-defined “easy” biological datatypes like DNA sequences and protein structures. But these repositories are not adequate to capture the many datasets that require more context to be reusable. Our civilization is not wealthy enough to ever support the variety of specialized repositories that would be needed, and the curation that would be needed to standardize these data.
  4. DataONE is a federated data network built to improve access to Earth science data, and to support science by: engaging the relevant science, data, and policy communities; facilitating easy, secure, and persistent storage of data; and disseminating integrated and user-friendly tools for data discovery, analysis, visualization, and decision-making. There are three principal components: Member Nodes, which include a diverse array of data centers and repositories that are associated with national and international agencies and research networks, universities, libraries, etc. Coordinating Nodes, which support data replication across Member Nodes (i.e., data centers) as well as network-wide services like 24/7 access to metadata at the CNs, indexing and rapid search and discovery, etc. An Investigator Toolkit that includes tools that are widely used by scientists. The tools are coupled with the DataONE resources so that it is, for example, possible to seamlessly and transparently access data at Member Nodes through the tool of your choice.
  5. Content: Data supporting peer-reviewed articles in basic and applied bioscience; currently, 2.4 Gb data from ~400 articles and 50 journals. Platform: Customized DSpace repository. Metadata and data standards: Dublin Core Application Profile; data file format determined by depositor and journal policy; some curation and migration of file formats. Availability: Open Data (Creative Commons Zero), with time-limited embargoes. Identifier scheme: DataCite DOI. Usage: ~3000 annual downloads. Governance and sustainability: Jointly managed by a consortium of partner journals; project funding from NSF (since 2008) and JISC (starting 2010). Institutional home: National Evolutionary Synthesis Center, British Library (pending).
  6. As one example, DataONE is part of a consortium that is developing a Data Management Planning Online Tool. The tool “walks” scientists through the process of developing a concise, but comprehensive data management plan that could enable good stewardship of data and meet requirements of sponsors and home institutions.
  7. First, one logs in, selects the Research sponsor and solicitation number.
  8. The five steps are located on the left sidebar and include information about the data, metadata (or documentation about the data), policies for access and re-use, and plans for archiving and preserving the data. In this example, the Univ. of Virginia offers suggested text for archiving and preserving the data that can be pasted into the plan.
  9. There are many opportunities for collaboration with DataONE and there are many benefits to doing so; the next few slides highlight the benefits and opportunities for research scientists, Member Nodes, and funding agencies. This map highlights many of the international partners that have expressed interest in establishing Member Nodes, many of which are active members of the DataONE Users Group.
  10. NASA Collectors: Field investigators who collect data from NASA-funded projects and deposit those data in the ORNL DAAC. DAAC Users: Those who search and download data from the ORNL DAAC. Member Node Crescent: the software stack that enables the MN functionality for the ORNL DAAC. This crescent software is developed and installed by D1 staff, making use of the characteristics of the DAAC system and metadata. DAAC users can obtain data directly from the ORNL DAAC, as they did before. D1 users will access metadata from the CN and will acquire ORNL DAAC data from the DAAC indirectly via the Member Node. The data and documentation downloads are recorded by the DAAC; the D1 users see the DAAC’s citation to the downloaded data set.
  11. I
  12. Other development activities during years 2-5 will focus on expanding the suite of tools that are available through the Investigator Toolkit. New tool additions will be identified and prioritized by the DataONE Users Group.
  13. How else do we know what the community needs? The Scientific Exploration, Visualization and Analysis working group is another example that you heard about earlier. In summary, by running through a comprehensive case study, this working group was able to provide specific guidance on the challenges faced when conducting data-intensive science, challenges that were communicated to, and met by, the DataONE core CI team and developers. Another mechanism to understand community needs is to conduct extensive surveys of stakeholders….