RDAP Summary

           Topics that drive future digital libraries


                       Reagan Moore




4/4/2012                   ASIST RDAP 2012              1
Topics
• Data Management Plans and Policies
      – Scientific research data support
      – Planning for NSF Data Management Plans
• Data Citation Panel
      – Digital identifiers
      – Data representation (context)
• Curation Service Models
      – Institution-based repositories
• SIG-DL Sustainability Panel
      – Cost model
      – Business model
• Training Data Management Practitioners
      – Theory for information and knowledge, but not digital data
      – Teaching eScience librarians how to manage data for researchers
Data Management Plans
• Enforcement of regulations:
   – IRB, FERPA, HIPAA
• Enforcement of agency policies:
   – NSF Data management plans
• Enforcement of institutional policies:
   – Trustworthiness
• Compliance with community consensus on collection properties
   – Compliance with standards for discovery and access
• Enforcement of management policies:
   – Integrity, authenticity, retention, disposition, replication
• Automation of administrative tasks
   – Migration
• Validation of assessment criteria
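A minimal sketch of how an integrity policy over replicas might be automated and validated. The slides name no mechanism, so the SHA-256 checksum, the site names, and the `audit` and `repair` helpers are all assumptions made for illustration.

```python
import hashlib


def checksum(content: bytes) -> str:
    """Return a SHA-256 digest used here as the fixity value (an assumed choice)."""
    return hashlib.sha256(content).hexdigest()


def audit(expected: str, replicas: dict) -> dict:
    """Assessment step: report, per site, whether the replica matches the recorded checksum."""
    return {site: checksum(content) == expected for site, content in replicas.items()}


def repair(expected: str, replicas: dict) -> dict:
    """Enforcement step: overwrite every replica with a copy that passes the check.

    Raises StopIteration if no replica is intact.
    """
    good = next(c for c in replicas.values() if checksum(c) == expected)
    return {site: good for site in replicas}


# Hypothetical collection item with one damaged replica.
original = b"observation 42\n"
expected = checksum(original)
replicas = {"siteA": original, "siteB": b"observation 4X\n"}

report_before = audit(expected, replicas)
report_after = audit(expected, repair(expected, replicas))
```

Running the audit on a schedule and recording each report is one way the "validation of assessment criteria" bullet could be realized.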
Data Identifiers
• Generate identifiers that are location
  independent
      – Handle system, hash function
      – Data management system updates link from identifier
        to representation of location (replicas)
• Given an identifier, what does it represent?
      –    Landing page that provides context for the data
      –    Data model that approximates data in space and time
      –    Direct access to the data
      –    Access to procedure that generates the data
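The idea of a location-independent identifier can be sketched as follows. This is an illustration only: it derives the identifier from a hash of the content and keeps the identifier-to-replica mapping in a hypothetical in-memory `ReplicaRegistry`. It is not the Handle System API, and the site paths are made up.

```python
import hashlib


class ReplicaRegistry:
    """Hypothetical stand-in for the catalog a data management system maintains."""

    def __init__(self):
        self._locations = {}  # identifier -> set of replica locations

    def register(self, content: bytes, location: str) -> str:
        """Derive the identifier from the content, not from where it is stored."""
        identifier = "sha256:" + hashlib.sha256(content).hexdigest()
        self._locations.setdefault(identifier, set()).add(location)
        return identifier

    def move(self, identifier: str, old: str, new: str) -> None:
        """Update the link from identifier to location; the identifier is unchanged."""
        locations = self._locations[identifier]
        locations.discard(old)
        locations.add(new)

    def resolve(self, identifier: str) -> list:
        """Return the current replica locations for an identifier."""
        return sorted(self._locations[identifier])


registry = ReplicaRegistry()
data = b"temperature,12.5\n"
pid = registry.register(data, "siteA:/archive/run1.csv")
same = registry.register(data, "siteB:/mirror/run1.csv")
registry.move(pid, "siteA:/archive/run1.csv", "siteC:/tape/run1.csv")
current = registry.resolve(pid)
```

Both registrations return the same identifier because the content is identical, and moving a replica changes only the registry entry, so citations that use the identifier keep working.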

Data Identifiers
• For derived data
      –    NASA Level 0 – raw data
      –    NASA Level 1 – Calibrated
      –    NASA Level 2 – Transformed to physical quantities
      –    NASA Level 3 – Functional transformations, projections
• Can we identify the process that created the data?
      – Generalization of workflow provenance
      – Re-execute the workflow to re-create the data
• Create identifier for the workflow
      – Need workflow virtualization
• Reproducible science
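One way to picture a workflow identifier is shown below: a sketch that hashes a canonical description of the steps and their parameters, then re-executes the workflow to re-create the derived data. The step names, the parameters, and the `wf:` prefix are hypothetical, and the two steps only loosely mirror the calibration and physical-quantity levels listed above.

```python
import hashlib
import json


# Hypothetical processing steps.
def calibrate(counts, offset):
    return [c - offset for c in counts]


def to_physical(values, scale):
    return [v * scale for v in values]


STEPS = {"calibrate": calibrate, "to_physical": to_physical}


def workflow_id(workflow) -> str:
    """Hash a canonical JSON form so the same steps and parameters give the same identifier."""
    canonical = json.dumps(workflow, sort_keys=True, separators=(",", ":"))
    return "wf:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run(workflow, raw):
    """Apply each step in order to re-create the derived data from the raw input."""
    data = raw
    for step in workflow:
        data = STEPS[step["name"]](data, **step["params"])
    return data


workflow = [
    {"name": "calibrate", "params": {"offset": 2}},
    {"name": "to_physical", "params": {"scale": 0.5}},
]
raw_counts = [10, 12, 14]
first = run(workflow, raw_counts)
second = run(workflow, raw_counts)  # re-execution yields the same derived product

modified = [
    {"name": "calibrate", "params": {"offset": 3}},
    {"name": "to_physical", "params": {"scale": 0.5}},
]
```

Changing an input parameter produces a different identifier, so each version of a derived product can be traced to the exact procedure that generated it.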

Curation Service Models
• Driven by user requirements
    – Unique services for each science and engineering domain
    – Different data formats, data analyses, semantics
• Can generic software support each unique collection?
    – View curation as a continuum with varying policies and
      procedures for each stage of the data life cycle
    – Characterize domains by access methods, policies, and
      procedures
• Are there standard best practices for a data center?
    – Data colocation – minimize administrative costs
    – Evolution of center to broaden range of supported
      communities

Standard Services
• Data discovery
• Data access
• Data manipulation
      – Re-creation of derived data products
      – Transformation
      – Feature detection
      – Indexing
      – Representation – fit polynomial in space and time
           • Manipulate data based on polynomial

Sustainability
• Business models
      – Identification of a sustaining community
      – Quantification of benefit
• Cost model
      – Distribution of cost across entire community
      – Membership fee
      – Pro-rated per item cost
• Minimizing cost
      – Automate curation
      – Transfer curation tasks to submitter
      – FITS file (astronomy)
           • Metadata for project/observatory
           • Metadata for each image

Creating a Repository
• Identify a support community
      – Tie to requirements of researchers
      – Tie to new science and research initiatives
      – Tie to intellectual capital of the university
• Identify cost benefit
      – Co-location of services
      – Benefit of scale
• Demonstrate responsiveness
      – Support for users

Educating Next Generation
• Identify a motivating challenge
• Curriculum development
   – Coupling of research to education
   – Competency in scientific data management and technology
• Data intensive science
   – Interest driven by a domain
   – Multi-disciplinary problems
   – Treat as a skill
• Work with live data
   – Enable students to make a discovery

Data – Information – Knowledge (iRODS)
• Data – instantiation of an approximation to reality
      – Form of representation of reality
      – Requires description of the physical approximation (context)
• Information – application of label to data
      – Requires identification of the relationships that must be
        satisfied for the label to be applied
      – Reification of knowledge (extraction of features)
• Knowledge – relationships between labels
      – Requires procedures to parse data to see if relationships are
        present
• Data science – transformation of data into knowledge
      – Use case driven
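The three definitions above can be made concrete with a toy example. This is a sketch under assumed inputs, not iRODS code: the readings, the `freezing` and `thaw` labels, and the `follows` relationship are invented for illustration.

```python
# Data: an approximation to reality; the context is that each tuple is
# (date, air temperature in degrees Celsius). Values are hypothetical.
readings = [("2012-04-01", -3.0), ("2012-04-02", 1.5), ("2012-04-03", -0.5)]

# Information: a label is applied when the relationship defining it is satisfied.
LABEL_RULES = {
    "freezing": lambda reading: reading[1] < 0.0,
    "thaw": lambda reading: reading[1] >= 0.0,
}


def label(reading):
    """Return every label whose defining relationship holds for this reading."""
    return [name for name, rule in LABEL_RULES.items() if rule(reading)]


information = [(date, label((date, temp))) for date, temp in readings]


# Knowledge: a relationship between labels, found by a procedure that parses the data.
def follows(labelled, first, second):
    """Return date pairs where a `second` label directly follows a `first` label."""
    return [
        (a[0], b[0])
        for a, b in zip(labelled, labelled[1:])
        if first in a[1] and second in b[1]
    ]


thaw_after_freeze = follows(information, "freezing", "thaw")
```

Here the labels are the information, and the detected "thaw follows freezing" pair is a small piece of knowledge extracted by a use-case-specific procedure.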

Digital Library Evolution
• Witnessing rapid evolution of digital libraries
      – Item level indexing
      – Item level searching
      – Data manipulation services
• Driven by scale
      – Completeness of semantics
           • Represent every word in the English language (15 million)
           • Represent cultural knowledge (~ 1 Tbyte)
      – Types of reified relationships
           • Index based on more than 100 relationships present within
             documents (IBM-Watson)
           • Spatial, temporal, organizational, familial, …
      – Ability to couple indexing to data within storage

Vision
• Dynamic digital library
      – Continually extract features from data
      – Generate index based on features within the data
• Create knowledge base
      – Link local index to community index
• Support evolution of the library
      – Define new relationships
      – Analyze contents
      – Generate new index

Implications
• Characterize scientific data by the workflow that creates the
  published version
      – Transform from a library of data files into a library of workflows
• Support re-execution of workflows
      – Modify input parameters, generate new version
• Generate discovery semantics (features) through reification
  of relationships
      –    Must be able to parse each file
      –    Create algorithm that tests for the desired relationship
      –    Apply algorithms within storage systems
      –    Build terabyte index of reified relationships for each storage
           system
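The four steps above (parse each file, test for a relationship, apply the test where the data lives, build an index) can be sketched at toy scale. The regular expressions, relationship names, and file contents below are assumptions chosen for illustration; a real system would run such tests inside the storage system and over far larger collections.

```python
import re
from collections import defaultdict

# Hypothetical relationship tests; each pattern stands in for an algorithm
# that checks whether a given kind of relationship is present in a file.
RELATIONSHIP_TESTS = {
    "temporal": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "familial": re.compile(r"\b(mother|father|sister|brother) of\b", re.IGNORECASE),
}


def build_index(files: dict) -> dict:
    """Map each relationship type to the files in which it was detected."""
    index = defaultdict(list)
    for name in sorted(files):
        text = files[name]  # "parsing" is trivial here: the file is plain text
        for relationship, pattern in RELATIONSHIP_TESTS.items():
            if pattern.search(text):
                index[relationship].append(name)
    return dict(index)


files = {
    "letter.txt": "Anna is the sister of Karl.",
    "log.txt": "Sample collected on 2012-04-04.",
    "notes.txt": "Karl, brother of Anna, visited on 2011-12-01.",
}
index = build_index(files)
```

Defining a new relationship then amounts to adding an entry to the test table and regenerating the index.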


Virtualization
• Digital library represents data as searchable metadata
• Collection virtualization defines and manages the
  properties of the collection
      – Assertions about each file in the collection
      – Location independent naming and access
      – Management of state information
• Workflow virtualization defines the properties of
  procedures
      – Provenance information for each procedure
      – Location independent naming and execution
      – Management of state information
Digital Library in 2050
• Links contents to cultural knowledge
      – Terabyte indices
• Enables analysis of library contents
      – Feature detection services
• Provides workspace in which research is conducted
      – Coupling of processing to data storage
• Validates assertions about collection properties
      – Published policies
• Scalable infrastructure


More Related Content

What's hot

Best practices data collection
Best practices data collectionBest practices data collection
Best practices data collection
Sherry Lake
 
Data Sets, Ensemble Cloud Computing, and the University Library: Getting the ...
Data Sets, Ensemble Cloud Computing, and the University Library:Getting the ...Data Sets, Ensemble Cloud Computing, and the University Library:Getting the ...
Data Sets, Ensemble Cloud Computing, and the University Library: Getting the ...
SEAD
 
Data management for TA's
Data management for TA'sData management for TA's
Data management for TA's
aaroncollie
 

What's hot (20)

Best practices data collection
Best practices data collectionBest practices data collection
Best practices data collection
 
Data Sets, Ensemble Cloud Computing, and the University Library: Getting the ...
Data Sets, Ensemble Cloud Computing, and the University Library:Getting the ...Data Sets, Ensemble Cloud Computing, and the University Library:Getting the ...
Data Sets, Ensemble Cloud Computing, and the University Library: Getting the ...
 
Data 2012 -- Presentation by Margaret Hedstrom (Jan 2012
Data 2012 -- Presentation by Margaret Hedstrom (Jan 2012Data 2012 -- Presentation by Margaret Hedstrom (Jan 2012
Data 2012 -- Presentation by Margaret Hedstrom (Jan 2012
 
NISO Forum, Denver, Sept. 24, 2012: DataCite and Campus Data Services
NISO Forum, Denver, Sept. 24, 2012: DataCite and Campus Data ServicesNISO Forum, Denver, Sept. 24, 2012: DataCite and Campus Data Services
NISO Forum, Denver, Sept. 24, 2012: DataCite and Campus Data Services
 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
 
Data management for TA's
Data management for TA'sData management for TA's
Data management for TA's
 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
 
Preservation, Publishing, and People: A SEAD View
Preservation, Publishing, and  People: A SEAD ViewPreservation, Publishing, and  People: A SEAD View
Preservation, Publishing, and People: A SEAD View
 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
 
NISO Training Thursday Crafting a Scientific Data Management Plan
NISO Training Thursday Crafting a Scientific Data Management PlanNISO Training Thursday Crafting a Scientific Data Management Plan
NISO Training Thursday Crafting a Scientific Data Management Plan
 
Hansen Metadata for Institutional Repositories
Hansen Metadata for Institutional RepositoriesHansen Metadata for Institutional Repositories
Hansen Metadata for Institutional Repositories
 
RDAP14: Learning to Curate Panel
RDAP14: Learning to Curate Panel RDAP14: Learning to Curate Panel
RDAP14: Learning to Curate Panel
 
Activities of JaLC as a national service
Activities of JaLC as a national serviceActivities of JaLC as a national service
Activities of JaLC as a national service
 
Slides | Research data literacy and the library
Slides | Research data literacy and the librarySlides | Research data literacy and the library
Slides | Research data literacy and the library
 
Best practices data management
Best practices data managementBest practices data management
Best practices data management
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data Management
 
Data as a Library Aquisition
Data as a Library AquisitionData as a Library Aquisition
Data as a Library Aquisition
 
Working with Global Infrastructure at a National Level
Working with Global Infrastructure at a National LevelWorking with Global Infrastructure at a National Level
Working with Global Infrastructure at a National Level
 
RDAP 15 Local ICPSR Data Curation Workshop Pilot Project
RDAP 15 Local ICPSR Data Curation Workshop Pilot ProjectRDAP 15 Local ICPSR Data Curation Workshop Pilot Project
RDAP 15 Local ICPSR Data Curation Workshop Pilot Project
 
Putnam Data Quality and the IR
Putnam Data Quality and the IRPutnam Data Quality and the IR
Putnam Data Quality and the IR
 

Similar to Rdap12 wrap up reagan moore

IASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with TriplesIASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with Triples
Dr.-Ing. Thomas Hartmann
 
Addressing Institutional Research Data Management - University of Edinburgh R...
Addressing Institutional Research Data Management - University of Edinburgh R...Addressing Institutional Research Data Management - University of Edinburgh R...
Addressing Institutional Research Data Management - University of Edinburgh R...
EDINA, University of Edinburgh
 

Similar to Rdap12 wrap up reagan moore (20)

Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...Integrating an electronic lab notebook with a data repository; American Chemi...
Integrating an electronic lab notebook with a data repository; American Chemi...
 
Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014Elns and repositories, American Chemical Society, Dallas, March 2014
Elns and repositories, American Chemical Society, Dallas, March 2014
 
Steven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
Steven McEachern - ADA, DDI (metadata standard) and the Data LifecycleSteven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
Steven McEachern - ADA, DDI (metadata standard) and the Data Lifecycle
 
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
ADA, DDI and the data lifecycle - Steve McEachern - 7 April 2017
 
MetadataTheory: Learning Repositories Technologies (9th of 10)
MetadataTheory: Learning Repositories Technologies (9th of 10)MetadataTheory: Learning Repositories Technologies (9th of 10)
MetadataTheory: Learning Repositories Technologies (9th of 10)
 
How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?How Portable Are the Metadata Standards for Scientific Data?
How Portable Are the Metadata Standards for Scientific Data?
 
Hide the Stack: Toward Usable Linked Data
Hide the Stack:Toward Usable Linked DataHide the Stack:Toward Usable Linked Data
Hide the Stack: Toward Usable Linked Data
 
NISO access related projects (presented at the Charleston conference 2016)
NISO access related projects (presented at the Charleston conference 2016)NISO access related projects (presented at the Charleston conference 2016)
NISO access related projects (presented at the Charleston conference 2016)
 
Chapter 5 data resource management
Chapter 5  data resource managementChapter 5  data resource management
Chapter 5 data resource management
 
IASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with TriplesIASSIST 2012 - DDI-RDF - Trouble with Triples
IASSIST 2012 - DDI-RDF - Trouble with Triples
 
Presentation 16 may keynote karin bredenberg
Presentation 16 may keynote karin bredenbergPresentation 16 may keynote karin bredenberg
Presentation 16 may keynote karin bredenberg
 
ESI Supplemental 1 E-research Support Slides
ESI Supplemental 1   E-research Support SlidesESI Supplemental 1   E-research Support Slides
ESI Supplemental 1 E-research Support Slides
 
Research Data Management at Imperial College London
Research Data Management at Imperial College LondonResearch Data Management at Imperial College London
Research Data Management at Imperial College London
 
Emerging domain agnostic functionalities on the handle-centered networks
Emerging domain agnostic functionalities on the handle-centered networksEmerging domain agnostic functionalities on the handle-centered networks
Emerging domain agnostic functionalities on the handle-centered networks
 
Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...Semantic Similarity and Selection of Resources Published According to Linked ...
Semantic Similarity and Selection of Resources Published According to Linked ...
 
Addressing Institutional Research Data Management - University of Edinburgh R...
Addressing Institutional Research Data Management - University of Edinburgh R...Addressing Institutional Research Data Management - University of Edinburgh R...
Addressing Institutional Research Data Management - University of Edinburgh R...
 
The Web of Data: The W3C Semantic Web Initiative
The Web of Data: The W3C Semantic Web InitiativeThe Web of Data: The W3C Semantic Web Initiative
The Web of Data: The W3C Semantic Web Initiative
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
 
Linked Data Competency Index : Mapping the field for teachers and learners
 Linked Data Competency Index : Mapping the field for teachers and learners Linked Data Competency Index : Mapping the field for teachers and learners
Linked Data Competency Index : Mapping the field for teachers and learners
 
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...
10-15-13 “Metadata and Repository Services for Research Data Curation” Presen...
 

More from ASIS&T

More from ASIS&T (20)

RDAP 16: Sustaining Research Data Services (Panel 2: Sustainability)
RDAP 16: Sustaining Research Data Services (Panel 2: Sustainability)RDAP 16: Sustaining Research Data Services (Panel 2: Sustainability)
RDAP 16: Sustaining Research Data Services (Panel 2: Sustainability)
 
RDAP 16: Sustainability of data infrastructure: The history of science scienc...
RDAP 16: Sustainability of data infrastructure: The history of science scienc...RDAP 16: Sustainability of data infrastructure: The history of science scienc...
RDAP 16: Sustainability of data infrastructure: The history of science scienc...
 
RDAP 16: DMPs and Public Access: Agency and Data Service Experiences
RDAP 16: DMPs and Public Access: Agency and Data Service ExperiencesRDAP 16: DMPs and Public Access: Agency and Data Service Experiences
RDAP 16: DMPs and Public Access: Agency and Data Service Experiences
 
RDAP 16: Perspective on DMPs, Funders and Public Access (Panel 5: DMPs and Pu...
RDAP 16: Perspective on DMPs, Funders and Public Access (Panel 5: DMPs and Pu...RDAP 16: Perspective on DMPs, Funders and Public Access (Panel 5: DMPs and Pu...
RDAP 16: Perspective on DMPs, Funders and Public Access (Panel 5: DMPs and Pu...
 
RDAP 16: DMPs and Public Access: An NIH Perspective (Panel 5, DMPs and Public...
RDAP 16: DMPs and Public Access: An NIH Perspective (Panel 5, DMPs and Public...RDAP 16: DMPs and Public Access: An NIH Perspective (Panel 5, DMPs and Public...
RDAP 16: DMPs and Public Access: An NIH Perspective (Panel 5, DMPs and Public...
 
RDAP 16: If I could turn back time: Looking back on 2+ years of DMP consultin...
RDAP 16: If I could turn back time: Looking back on 2+ years of DMP consultin...RDAP 16: If I could turn back time: Looking back on 2+ years of DMP consultin...
RDAP 16: If I could turn back time: Looking back on 2+ years of DMP consultin...
 
RDAP 16: Data Management Plan Perspectives (Panel 5, DMPs and Public Access)
RDAP 16: Data Management Plan Perspectives (Panel 5, DMPs and Public Access)RDAP 16: Data Management Plan Perspectives (Panel 5, DMPs and Public Access)
RDAP 16: Data Management Plan Perspectives (Panel 5, DMPs and Public Access)
 
RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...
RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...
RDAP 16 Poster: Challenges and Opportunities in an Institutional Repository S...
 
RDAP 16 Poster: Interpreting Local Data Policies in Practice
RDAP 16 Poster: Interpreting Local Data Policies in PracticeRDAP 16 Poster: Interpreting Local Data Policies in Practice
RDAP 16 Poster: Interpreting Local Data Policies in Practice
 
RDAP 16 Poster: Measuring adoption of Electronic Lab Notebooks and their impa...
RDAP 16 Poster: Measuring adoption of Electronic Lab Notebooks and their impa...RDAP 16 Poster: Measuring adoption of Electronic Lab Notebooks and their impa...
RDAP 16 Poster: Measuring adoption of Electronic Lab Notebooks and their impa...
 
RDAP 16 Poster: Responding to Data Management and Sharing Requirements in the...
RDAP 16 Poster: Responding to Data Management and Sharing Requirements in the...RDAP 16 Poster: Responding to Data Management and Sharing Requirements in the...
RDAP 16 Poster: Responding to Data Management and Sharing Requirements in the...
 
RDAP 16 Lightning: Spreading the love: Bringing data management training to s...
RDAP 16 Lightning: Spreading the love: Bringing data management training to s...RDAP 16 Lightning: Spreading the love: Bringing data management training to s...
RDAP 16 Lightning: Spreading the love: Bringing data management training to s...
 
RDAP 16 Lightning: RDM Discussion Group: How'd that go?
RDAP 16 Lightning: RDM Discussion Group: How'd that go?RDAP 16 Lightning: RDM Discussion Group: How'd that go?
RDAP 16 Lightning: RDM Discussion Group: How'd that go?
 
RDAP 16 Lightning: Data Practices and Perspectives of Atmospheric and Enginee...
RDAP 16 Lightning: Data Practices and Perspectives of Atmospheric and Enginee...RDAP 16 Lightning: Data Practices and Perspectives of Atmospheric and Enginee...
RDAP 16 Lightning: Data Practices and Perspectives of Atmospheric and Enginee...
 
RDAP 16 Lightning: Working Across Cultures: Data Librarian as Knowledge Broker
RDAP 16 Lightning: Working Across Cultures: Data Librarian as Knowledge BrokerRDAP 16 Lightning: Working Across Cultures: Data Librarian as Knowledge Broker
RDAP 16 Lightning: Working Across Cultures: Data Librarian as Knowledge Broker
 
RDAP 16 Lightning: An Open Science Framework for Solving Institutional Challe...
RDAP 16 Lightning: An Open Science Framework for Solving Institutional Challe...RDAP 16 Lightning: An Open Science Framework for Solving Institutional Challe...
RDAP 16 Lightning: An Open Science Framework for Solving Institutional Challe...
 
RDAP 16 Lightning: Quantifying Needs for a University Research Repository Sys...
RDAP 16 Lightning: Quantifying Needs for a University Research Repository Sys...RDAP 16 Lightning: Quantifying Needs for a University Research Repository Sys...
RDAP 16 Lightning: Quantifying Needs for a University Research Repository Sys...
 
RDAP 16 Lightning: Personas as a Policy Development Tool for Research Data
RDAP 16 Lightning: Personas as a Policy Development Tool for Research DataRDAP 16 Lightning: Personas as a Policy Development Tool for Research Data
RDAP 16 Lightning: Personas as a Policy Development Tool for Research Data
 
RDAP 16 Lightning: Growing Data in Utah: A Model for Statewide Collaboration
RDAP 16 Lightning: Growing Data in Utah: A Model for Statewide CollaborationRDAP 16 Lightning: Growing Data in Utah: A Model for Statewide Collaboration
RDAP 16 Lightning: Growing Data in Utah: A Model for Statewide Collaboration
 
RDAP 16: Building Without a Plan: How do you assess structural strength? (Pan...
RDAP 16: Building Without a Plan: How do you assess structural strength? (Pan...RDAP 16: Building Without a Plan: How do you assess structural strength? (Pan...
RDAP 16: Building Without a Plan: How do you assess structural strength? (Pan...
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Recently uploaded (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Rdap12 wrap up reagan moore

  • 1. RDAP Summary Topics that drive future digital libraries Reagan Moore 4/4/2012 ASIST RDAP 2012 1
  • 2. Topics • Data Management Plans and Policies – Scientific research data support – Planning for NSF Data Management Plans • Data Citation Panel – Digital identifiers – Data representation (context) • Curation Service Models – Institution-based repositories • SIG-DL Sustainability Panel – Cost model – Business model • Training Data Management Practitioners – Theory for information and knowledge, but not digital data – Teaching eScience librarians how to manage data for researchers 4/4/2012 ASIST RDAP 2012 2
  • 3. Data Management Plans • Enforcement of regulations: – IRB, FERPA, HIPAA • Enforcement of agency policies: – NSF Data management plans • Enforcement of institutional policies: – Trustworthiness • Compliance with community consensus on collection properties – Compliance with standards for discovery and access • Enforcement of management policies: – Integrity, authenticity, retention, disposition, replication • Automation of administrative tasks – Migration • Validation of assessment criteria 4/4/2012 ASIST RDAP 2012 3
  • 4. Data Identifiers • Generate identifiers that are location independent – Handle system, hash function – Data management system updates link from identifier to representation of location (replicas) • Given an identifier, what does it represent – Landing page that provides context for the data – Data model that approximates data in space and time – Direct access to the data – Access to procedure that generates the data 4/4/2012 ASIST RDAP 2012 4
  • 5. Data Identifiers • For derived data – NASA Level 0 – raw data – NASA Level 1 – Calibrated – NASA Level 2 – Transformed to physical quantities – NASA Level 3 – Functional transformations, projections • Can we identify the process that created the data – Generalization of workflow provenance – Re-execute the workflow to re-create the data • Create identifier for the workflow – Need workflow virtualization • Reproducible science 4/4/2012 ASIST RDAP 2012 5
  • 6. Curation Service Models
      • Driven by user requirements
          – Unique services for each science and engineering domain
          – Different data formats, data analyses, semantics
      • Can generic software support each unique collection?
          – View curation as a continuum with varying policies and procedures for each stage of the data life cycle
          – Characterize domains by access methods, policies, and procedures
      • Are there standard best practices for a data center?
          – Data colocation: minimize administrative costs
          – Evolution of center to broaden range of supported communities
  • 7. Standard Services
      • Data discovery
      • Data access
      • Data manipulation
          – Re-creation of derived data products
          – Transformation
          – Feature detection
          – Indexing
          – Representation: fit polynomial in space and time
              • Manipulate data based on polynomial
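The "representation" service on this slide can be illustrated with a small least-squares polynomial fit. This sketch simplifies to a single variable (time only), uses exact rational arithmetic from the standard library, and solves the normal equations directly, which is adequate for a toy but not numerically robust for large or high-degree problems. The function names are hypothetical.

```python
from fractions import Fraction


def fit_polynomial(ts, ys, degree):
    """Least-squares coefficients c[0] + c[1]*t + ... via the normal equations."""
    n = degree + 1
    ts = [Fraction(t) for t in ts]
    ys = [Fraction(y) for y in ys]
    # Augmented matrix [A^T A | A^T y] for the Vandermonde matrix A.
    rows = [
        [sum(t ** (i + j) for t in ts) for j in range(n)]
        + [sum(y * t ** i for t, y in zip(ts, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination; assumes at least degree + 1 distinct sample times.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pivot_value = rows[col][col]
        rows[col] = [v / pivot_value for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


def evaluate(coeffs, t):
    # Horner's rule: work with the fitted model instead of the raw samples.
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * t + c
    return result


coeffs = fit_polynomial([0, 1, 2, 3], [1, 6, 17, 34], degree=2)
print(coeffs, evaluate(coeffs, Fraction(1, 2)))
```

Once the coefficients are stored, operations such as interpolation between samples can be carried out on the polynomial rather than on the original data.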
  • 8. Sustainability
      • Business models
          – Identification of a sustaining community
          – Quantification of benefit
      • Cost model
          – Distribution of cost across entire community
          – Membership fee
          – Pro-rated per item cost
      • Minimizing cost
          – Automate curation
          – Transfer curation tasks to submitter
          – FITS file (astronomy)
              • Metadata for project/observatory
              • Metadata for each image
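One way to read the cost model on this slide is as a flat membership fee combined with a pro-rated per-item charge. The sketch below works through that arithmetic under stated assumptions: the function name, the member names, and the default 50/50 split between the fee pool and the per-item pool are all hypothetical and are not figures from the talk.

```python
from fractions import Fraction


def allocate_costs(total_cost, items_per_member, membership_share=Fraction(1, 2)):
    """Split total_cost into an equal membership fee plus a per-item charge."""
    if not items_per_member:
        raise ValueError("at least one member is required")
    total = Fraction(total_cost)
    total_items = sum(items_per_member.values())
    # With no items stored, the whole cost falls back on the membership fee.
    fee_pool = total * Fraction(membership_share) if total_items else total
    fee = fee_pool / len(items_per_member)
    per_item = (total - fee_pool) / total_items if total_items else Fraction(0)
    return {
        member: fee + per_item * count
        for member, count in items_per_member.items()
    }


charges = allocate_costs(1000, {"astronomy": 30, "ecology": 10})
for member, charge in charges.items():
    print(member, float(charge))
```

The charges always sum to the total cost, so the repository's budget is covered by the community regardless of how the split is tuned.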
  • 9. Creating a Repository
      • Identify a support community
          – Tie to requirements of researchers
          – Tie to new science and research initiatives
          – Tie to intellectual capital of the university
      • Identify cost benefit
          – Co-location of services
          – Benefit of scale
      • Demonstrate responsiveness
          – Support for users
  • 10. Educating Next Generation
      • Identify a motivating challenge
      • Curriculum development
          – Coupling of research to education
          – Competency in scientific data management and technology
      • Data intensive science
          – Interest driven by a domain
          – Multi-disciplinary problems
          – Treat as a skill
      • Work with live data
          – Enable students to make a discovery
  • 11. Data – Information – Knowledge (iRODS)
      • Data: instantiation of an approximation to reality
          – Form of representation of reality
          – Requires description of the physical approximation (context)
      • Information: application of label to data
          – Requires identification of the relationships that must be satisfied for the label to be applied
          – Reification of knowledge (extraction of features)
      • Knowledge: relationships between labels
          – Requires procedures to parse data to see if relationships are present
      • Data science: transformation of data into knowledge
          – Use case driven
  • 12. Digital Library Evolution
      • Witnessing rapid evolution of digital libraries
          – Item level indexing
          – Item level searching
          – Data manipulation services
      • Driven by scale
          – Completeness of semantics
              • Represent every word in the English language (15 million)
              • Represent cultural knowledge (~1 Tbyte)
          – Types of reified relationships
              • Index based on more than 100 relationships present within documents (IBM-Watson)
              • Spatial, temporal, organizational, familial, …
          – Ability to couple indexing to data within storage
  • 13. Vision
      • Dynamic digital library
          – Continually extract features from data
          – Generate index based on features within the data
      • Create knowledge base
          – Link local index to community index
      • Support evolution of the library
          – Define new relationships
          – Analyze contents
          – Generate new index
  • 14. Implications
      • Characterize scientific data by the workflow that creates the published version
          – Transform from a library of data files into a library of workflows
      • Support re-execution of workflows
          – Modify input parameters, generate new version
      • Generate discovery semantics (features) through reification of relationships
          – Must be able to parse each file
          – Create algorithm that tests for the desired relationship
          – Apply algorithms within storage systems
          – Build terabyte index of reified relationships for each storage system
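The last bullet group on this slide (parse each file, test for a relationship, build an index) can be sketched at toy scale as follows. The two relationship tests, their regular expressions, and the sample file contents are hypothetical stand-ins chosen for illustration; a real system would run domain-specific parsers next to the storage.

```python
import re
from collections import defaultdict

# Hypothetical relationship tests: each returns True when the relationship
# it names is present in the parsed text of a file.
TESTS = {
    "temporal": lambda text: re.search(r"\b(?:18|19|20)\d{2}\b", text) is not None,
    "spatial": lambda text: re.search(
        r"\blat=-?\d+(?:\.\d+)?,\s*lon=-?\d+(?:\.\d+)?", text
    ) is not None,
}


def build_index(files: dict, tests: dict) -> dict:
    # Apply every test to every file and record which files carry each label.
    index = defaultdict(set)
    for name, text in files.items():
        for label, test in tests.items():
            if test(text):
                index[label].add(name)
    return {label: sorted(names) for label, names in index.items()}


files = {
    "a.txt": "Observed in 2011 at lat=35.9, lon=-79.0",
    "b.txt": "Survey from 1998",
    "c.txt": "No context recorded",
}
print(build_index(files, TESTS))
```

Adding a new entry to `TESTS` and rebuilding the index corresponds to defining a new relationship and generating a new index, without changing the stored files.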
  • 15. Virtualization
      • Digital library represents data as searchable metadata
      • Collection virtualization defines and manages the properties of the collection
          – Assertions about each file in the collection
          – Location independent naming and access
          – Management of state information
      • Workflow virtualization defines the properties of procedures
          – Provenance information for each procedure
          – Location independent naming and execution
          – Management of state information
  • 16. Digital Library in 2050
      • Links contents to cultural knowledge
          – Terabyte indices
      • Enables analysis of library contents
          – Feature detection services
      • Provides workspace in which research is conducted
          – Coupling of processing to data storage
      • Validates assertions about collection properties
          – Published policies
      • Scalable infrastructure