Prepared for NISO Forum:
Tracking it Back to the Source: Managing and Citing Research Data
September 2012

Needs for Data Management & Citation Throughout the Information Lifecycle

Micah Altman
Director of Research, MIT Libraries
Collaborators and Co-Conspirators
• Jonathan Crabtree, Merce Crosas, Gary King, Tom
  Lipkis, Nancy McGovern, John Willinsky

• Research Support
   –   Library of Congress (PA#NDP03-1),
   –   National Science Foundation (DMS-0835500, SES 0112072)
   –   Institute for Museum and Library Services (LG-05-09-0041-09)
   –   Sloan Foundation
   –   Amazon Web Services
   –   Massachusetts Institute of Technology




                        Needs for Data Management & Citation          2
Related Work
Reprints available from:
http://maltman.hmdc.harvard.edu

•   Altman, M. 2012. Data Citation in The Dataverse Network ®. In P. F. Uhlir (Ed.), Developing Data Attribution and Citation Practices and Standards: Report from an International Workshop. National Academies Press. Forthcoming.
•   Altman, M., & Crabtree, J. 2011. Using the SafeArchive System: TRAC-Based Auditing of LOCKSS. Archiving 2011 (pp. 165–170). Society for Imaging Science and Technology.
•   Altman, M., Adams, M., Crabtree, J., Donakowski, D., Maynard, M., Pienta, A., & Young, C. 2009. Digital preservation through archival collaboration: The Data Preservation Alliance for the Social Sciences. The American Archivist, 72(1): 169–182.
•   Altman, M. 2008. A Fingerprint Method for Verification of Scientific Data. In Advances in Systems, Computing Sciences and Software Engineering (Proceedings of the International Conference on Systems, Computing Sciences and Software Engineering 2007). Springer-Verlag.
•   Altman, M., & King, G. 2007. A Proposed Standard for the Scholarly Citation of Quantitative Data. D-Lib, 13(3/4) (March/April).




Preview
•   Principled approach to data management
•   Lifecycle data management planning
•   Lifecycle data management tracking
•   Lifecycle data management infrastructure
•    [Exemplar Projects]




(Some) Timely Challenges



“Data science is suddenly sexy – does that mean data is the new black?”




Valuable Data is Lost
• Researchers lack archiving capability
• Incentives for data sharing are weak

Examples
• Intentionally Discarded: “Destroyed, in accord with [nonexistent] APA 5-year post-publication rule.”
• Unintentional Hardware Problems: “Some data were collected, but the data file was lost in a technical malfunction.”
• Acts of Nature: “The data from the studies were on punched cards that were destroyed in a flood in the department in the early 80s.”
• Discarded or Lost in a Move: “As I retired …. Unfortunately, I simply didn’t have the room to store these data sets at my house.”
• Obsolescence: “Speech recordings stored on a LISP Machine…, an experimental computer which is long obsolete.”
• Simply Lost: “For all I know, they are on a [University] server, but it has been literally years and years since the research was done, and my files are long gone.”

Research by:
Unpublished Data Ends up in the “Desk Drawer”
• Null results are less likely to be published
• Outliers are routinely discarded




[Image: Daniel Shechtman’s lab notebook, providing initial evidence of quasicrystals]
Data Behind Publications Unavailable for Review, Reuse, Replication




Model Science


“Citations to unpublished data and personal communications cannot be used to support claims in a published paper”

“All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science.”
Compliance with Policies is Low
• Compliance is low even in best examples of journals
• Checking compliance manually is tedious, doesn’t scale




Special Challenges for Long-Term Access to New Forms of Data
• Some Examples
  – GIS and geospatial trails
  – Facebook & social networks
  – Text: blogs, tweets
  – Cell phone data
• Challenges
  – Proprietary – intellectual property
  – Size
  – Dynamic content
  – Fixity
  – Format

Source: [Calberese 2008]
A Lifecycle Framework



“The published article is not scientific output – it’s a summary of scientific output.”

-- corollary of Buckheit & Donoho 1995


Information Lifecycle

[Diagram: the information lifecycle as a cycle of stages]
• Creation/Collection
• Storage/Ingest
• Processing
• Internal Sharing
• Analysis
• External dissemination/publication
• Re-use
   – Scientific
   – Educational
   – Scientometric
   – Institutional
• Long-term access
Stakeholders
[Diagram: the lifecycle stages (Creation/Collection, Storage/Ingest, Processing, Internal Sharing, Analysis, External dissemination/publication, Re-use, Long-term access) surrounded by stakeholders]
• Data Sources/Subjects
• Researchers
• Research Organizations
• Research Sponsors
• Service/Infrastructure Providers
• Scholarly Publishers
• Data Archives/Publisher
• Consumers
Legal Requirements and Rights

[Diagram: overlapping regions of legal requirements and rights]
• Region labels: Contract; Intellectual Property; Access Rights; Confidentiality
• Contract, License, Click-Wrap TOU
• Journal Replication Requirements, Funder Open Access
• Trade Secret, Patent, Intellectual Attribution, Moral Rights
• Database Rights, Copyright, DMCA, Fair Use, Trademark
• Rights of Publicity, Privacy Torts (Invasion, Defamation)
• Common Rule (45 CFR 46), HIPAA, FERPA, CIPSEA, EU Privacy Directive, State Privacy Laws
• FOIA, State FOI Laws
• Classified, Sensitive but Unclassified, EAR, ITAR
• Potentially Harmful (Archeological Sites, Animal Testing, …)
Stakeholders, Rights and Requirements

[Diagram: the same labels as the preceding Legal Requirements and Rights slide, overlaid with the stakeholders they affect]
• Scholarly Publishers
• Consumers
   – Secondary research
   – Participative Science
   – Public policy uses
• Infrastructure/Service Providers
• Primary Researchers
• Research Organizations
• Data Archives
• Research Sponsors
• Sources/Subjects
Stakeholder Drivers per Stage of Information Lifecycle
Stage: Research Proposal, Design and Data Collection

Actor: Subjects
   Legal constraints: Consent/contract
   Concerns: Public benefit; Privacy; Future access to own information
Actor: Sources
   Legal constraints: Intellectual Property; Contract
   Concerns: Business confidentiality; IP; Profit from licenses
Actor: Funder
   Legal constraints: Open Access; Confidentiality
   Concerns: Public benefit; Policy Relevance; Reproducible Research; Future access
Actor: Primary Researcher
   Legal constraints: Confidentiality; Contract; IP
   Concerns: Publication potential; Compliance with institutional/funder requirements
Actor: Research Institution
   Legal constraints: Confidentiality; Contract; IP
   Concerns: Compliance with funder requirements; License, IP, confidentiality compliance
Stakeholder Drivers per Stage of Information Lifecycle
Stage: Data Storage, Analysis (Pre-publication)

Actor: Primary Researcher
   Legal constraints: Confidentiality; Contract; IP
   Concerns: Publication potential; Compliance with institutional/funder requirements
Actor: Research Institution
   Legal constraints: Confidentiality; Contract; IP
   Concerns: License, IP, confidentiality compliance; Records management
Actor: Service Providers
   Legal constraints: Contract; (Selected Cases) Confidentiality Requirements
   Concerns: Contract; Service business model; Service deployment
Stakeholder Drivers per Stage of Information Lifecycle
Stage: Publication

Actor: Primary Researcher
   Legal constraints: Compliance for: Source/subjects; Sponsor; Host institution; Publisher
   Concerns: Scholarly attribution/credit; Promote use of research; Track use/impact of research
Actor: Sponsor
   Concerns: Track research products; Track compliance; Track use/impact
Actor: Research Institution
   Legal constraints: Sponsor compliance
   Concerns: Track OA products; Records management; Intellectual property
Actor: Scholarly/Journal Publisher
   Legal constraints: IP; Contract
   Concerns: Impact/use; Profit/business model; Replicability
Actor: Data Publisher
   Legal constraints: IP
   Concerns: Profit/business model; Replicability; Connection to publication
Stakeholder Drivers per Stage of Information Lifecycle
Stage: (Re)use

Actor: Research Reader
   Legal constraints: Access Rights
   Concerns: Provenance
Actor: Secondary Researcher
   Legal constraints: Access rights; Confidentiality; Contract
   Concerns: Replicability; Data reintegration/reanalysis; Linking publications and data; Provenance
Actor: “Citizen/Community Scientist”
   Legal constraints: Access Rights
   Concerns: Data redissemination/reanalysis; Linking publications and data
Actor: Public Policy
   Legal constraints: Access Rights
   Concerns: Provenance; Replicability; Linking publications and data
Actor: Education/teaching
   Legal constraints: Access Rights
   Concerns: “Classroom” use; MOOC use
Lifecycle Management:
Data Management Planning


Some Formal “DMP” Requirements
• The Final NIH Statement on Sharing Research Data
   – Published in the NIH Guide on February 26, 2003:
     “Starting with the October 1, 2003 receipt date, investigators submitting an NIH application seeking $500,000 or more in direct costs in any single year are expected to include a plan for data sharing or state why data sharing is not possible.”
   – Timing of release: no later than when the main findings from the final data set are accepted for publication
• NSF: All proposals must (as of 1/1/2011) include a data management plan.
   – Specific requirements are vague, for the most part:
     “will be determined by the community of interest through the process of peer review and program management.”
• Wellcome Trust:
   – “will review data management and sharing plans, and any costs involved in delivering them, as an integral part of the funding decision”
DMP Goals
• Orchestrate data for current use
• Control disclosure
• Compliance with contracts, regulations, law, and policy
• Maximize value of information assets
• Ensure short term and long term dissemination
DMP Elements
•   Orchestrate data for current use
     –   Quality Assurance
     –   Storage, backup, replication, and versioning
     –   Data Formats
     –   Data Organization
     –   Budget
     –   Metadata and documentation
•   Control disclosure
     –   Access and Sharing
     –   Intellectual Property Rights
     –   Legal Requirements
     –   Security
•   Compliance with contracts, regulations, law, and policy
     –   Access and Sharing
     –   Adherence
     –   Responsibility
     –   Ethics and privacy
     –   Security
•   Value of information assets
     –   Data description
     –   Data value
     –   Relation to collection
     –   Relation to evidence base
     –   Budget
•   Ensure short term and long term dissemination
     –   Data description
     –   Institutional Archiving Commitments
     –   Audience
     –   Access and Sharing
     –   Data Formats
     –   Data Organization
     –   Metadata and documentation
     –   Budget
DMP Details
•   Sharing
     –   Plans for depositing in an existing public database
     –   Access procedures
     –   Embargo periods
     –   Access charges
     –   Timeframe for access
     –   Technical access methods
     –   Restrictions on access
•   Long term access (Preservation)
     –   Requirements for data destruction, if applicable
     –   Procedures for long term preservation
     –   Institution responsible for long-term costs of data preservation
     –   Succession plans for data should archiving entity go out of existence
•   Formats
     –   Generation and dissemination formats and procedural justification
     –   Storage format and archival justification
     –   Format documentation
•   Metadata and documentation
     –   Internal and External Identifiers and Citations
     –   Metadata to be provided
     –   Metadata standards used
     –   Planned documentation and supporting materials
     –   Quality assurance procedures for metadata and documentation
•   Data Organization
     –   File organization
     –   Naming conventions
•   Storage, backup, replication, and versioning
     –   Facilities
     –   Methods
     –   Procedures
     –   Frequency
     –   Replication
     –   Version management
     –   Recovery guarantees
•   Security
     –   Procedural controls
     –   Technical Controls
     –   Confidentiality concerns
     –   Access control rules
     –   Restrictions on use
•   Budget
     –   Cost of preparing data and documentation
     –   Cost of storage and backup
     –   Cost of permanent archiving and access
•   Intellectual Property Rights
     –   Entities who hold property rights
     –   Types of IP rights in data
     –   Protections provided
     –   Dispute resolution process
•   Legal Requirements
     –   Provider requirements and plans to meet them
     –   Institutional requirements and plans to meet them
•   Responsibility
     –   Individual or project team role responsible for data management
     –   Qualifications, certifications, and licenses of responsible parties
•   Ethics and privacy
     –   Informed consent
     –   Protection of privacy
     –   Data use agreements
     –   Other ethical issues
•   Adherence
     –   When will adherence to data management plan be checked or demonstrated
     –   Who is responsible for managing data in the project
     –   Who is responsible for checking adherence to data management plan
     –   Auditing procedures and framework
•   Value of information assets
     –   Project use value
     –   Institutional audience and uses
     –   Public audience and uses
     –   Relation to institutional collection
     –   Relation to disciplinary evidence base
     –   Cost of re-creating data
Approaching Requirement Overlap
• Sanity-check DMP details with lifecycle questions:
   – Who wants it?
   – What do they need it for?
   – When will it be used?
• Be conscious of elements that serve multiple goals or lifecycle stages
   –   Metadata/documentation
   –   Identifiers
   –   Budgets
   –   Formats
   –   IP Rights and confidentiality restrictions
   –   Responsibilities/Adherence
• Use tracking tools and methods throughout lifecycle
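
One way to make the overlap visible is to count, for each DMP element, how many goals it appears under. The sketch below is illustrative only: the goal-to-element mapping is transcribed from the DMP Elements slide, while the names `DMP_ELEMENTS`, `goals_per_element`, and `shared_elements` are hypothetical and not part of any tool mentioned in this talk.

```python
from collections import defaultdict

# Goal -> elements, transcribed from the "DMP Elements" slide.
DMP_ELEMENTS = {
    "Orchestrate data for current use": [
        "Quality Assurance",
        "Storage, backup, replication, and versioning",
        "Data Formats",
        "Data Organization",
        "Budget",
        "Metadata and documentation",
    ],
    "Control disclosure": [
        "Access and Sharing",
        "Intellectual Property Rights",
        "Legal Requirements",
        "Security",
    ],
    "Compliance with contracts, regulations, law, and policy": [
        "Access and Sharing",
        "Adherence",
        "Responsibility",
        "Ethics and privacy",
        "Security",
    ],
    "Value of information assets": [
        "Data description",
        "Data value",
        "Relation to collection",
        "Relation to evidence base",
        "Budget",
    ],
    "Ensure short term and long term dissemination": [
        "Data description",
        "Institutional Archiving Commitments",
        "Audience",
        "Access and Sharing",
        "Data Formats",
        "Data Organization",
        "Metadata and documentation",
        "Budget",
    ],
}


def goals_per_element(goal_map):
    """Invert goal -> elements into element -> list of goals."""
    goals_for = defaultdict(list)
    for goal, elements in goal_map.items():
        for element in elements:
            goals_for[element].append(goal)
    return dict(goals_for)


def shared_elements(goal_map):
    """Keep only the elements that appear under more than one goal."""
    inverted = goals_per_element(goal_map)
    return {element: goals for element, goals in inverted.items() if len(goals) > 1}


if __name__ == "__main__":
    # List shared elements, most widely shared first.
    shared = shared_elements(DMP_ELEMENTS)
    for element, goals in sorted(shared.items(), key=lambda item: -len(item[1])):
        print(f"{element}: {len(goals)} goals")
```

This is a purely mechanical count over one slide, so it will not match the hand-picked list above exactly; it is meant as a quick check on where one piece of planning work can satisfy several requirements at once.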
                                                       This Way…


Lifecycle Management:
        Tracking


What do we track?

What tools and methods provide technical leverage or incentives to management across lifecycle stages and among actors?

•   Identification – identifiers, references, citations
•   Provenance – relationship of delivered data to history of inputs and modifications and actors responsible for these; revision control; versioning
•   Authenticity: assertions about the provenance of the records
•   Respect des fonds: assertions about the original organization of the records
•   Chain of custody: assertions about the ownership of the records
•   Integrity: assertions about the management of the records; fixity of bits; fixity of semantics
•   Auditing: verification of properties & policy compliance



           Sources: Bulleted list of attributes adapted from Moore 2008

Tracking Across Information Lifecycle

[Figure: lifecycle stages arranged in a cycle: Creation/Collection, Storage/Ingest, Processing, Internal Sharing, Analysis, External dissemination/publication, Re-use, Long-term access. Tracking mechanisms shown at the center: identifiers; citation; metadata for integrity, provenance, custody.]
Data Citation: a Point of Leverage
           • Services
              – Identifiers to specific fixed versions of data are needed to
                establish unambiguous chains of provenance
              – Identifiers that can be globally resolved to machine-understandable
                metadata and to the identified object are needed to build
                generalized access and analysis services
              – Persistence of identifiers is needed to maintain long-term
                access
           • Incentives
              – Scholarly credit (intellectual attribution) is a large motivator for
                many researchers
                 – citation creates incentive for researchers to publish data
              – Scholars also comply with enforceable journal policies
                 – requiring data citation is a lightweight method to make data
                   access policies auditable
              – Impact/usage is a motivator for public research funders
                 – data citation provides a foundation for measures of usage and impact
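
A minimal sketch of what a citation to a fixed version of data might carry, assuming a hypothetical record layout. The class name, field names, and example values below are illustrative only; they are not a published citation standard and the identifier does not resolve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical citation record; fields are illustrative assumptions."""
    authors: str
    year: int
    title: str
    identifier: str   # persistent, globally resolvable identifier
    version: str      # names one fixed version of the data
    fingerprint: str  # fixity value recorded for that version

    def render(self) -> str:
        # One human-readable line that still carries the machine-usable parts.
        return (f'{self.authors}, {self.year}, "{self.title}", '
                f'{self.identifier}, {self.version}; {self.fingerprint}')


citation = DataCitation(
    authors="Doe, J.",
    year=2012,
    title="Example Survey Data",
    identifier="hdl:0000/EXAMPLE",  # placeholder value
    version="V2",
    fingerprint="sha256:abc123",    # placeholder value
)
print(citation.render())
```

Keeping the identifier, version, and fingerprint as separate fields is one way to let the same record serve attribution (authors, title) and verification (version, fingerprint).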

Emerging Practices for Data Citation
       • Publishers
           – OECD iLibrary
           – Thomson Reuters Data Citation Index
       • Data archives
           – Dataverse Network
           – Data-PASS
       • Harmonization efforts
           – DataCite
           – NAS BRDI
           – ICSU/CODATA
       • Discipline-specific
Identifier and Citation Use Cases
    Attribution
    • Provide scholarly attribution
    • Provide legal attribution
    • Identify contributors to data

    Verification
    • Associate work with version of evidence used
    • Verify fixity of bits
    • Verify fixity of information
    • Verify “authenticity” of work

    Discovery
    • Locate data via identifier
    • Locate data integral to article
    • Locate works related to data – articles, derivatives, sources

    Access
    • Access to surrogate
    • On-line access to object
    • Machine understandability
    • Long-term understandability

    Persistence
    • Does evidence persist as long as assertions based on it?
    • Is durability of evidence transparent?
Emerging Principles for Data Citation
           • Data citations should be first-class objects for publication: they should
             appear with citations to other works and should be as easy to cite as
             other works
           • Citations should persist and enable access to a fixed version of the data
             at least as long as the citing work exists.
           • Citations should support unambiguous attribution of credit to all contributors,
             possibly through the citation ecosystem.




Fixity

           • Are files, bitstreams corrupted?
           • Do semantics remain the same over time, across formats, software
             analysis systems?
           Some semantic approaches…
           • Universal Numeric Fingerprint – canonicalization
           • Perceptual signatures – characterization of significant properties
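
The two fixity questions above can be contrasted in a short sketch. The function names are hypothetical, and `semantic_fixity` is a simplified stand-in for the canonicalization idea; it is not the Universal Numeric Fingerprint algorithm.

```python
import hashlib


def bit_fixity(data: bytes) -> str:
    """SHA-256 digest of the raw bytes: changes if any bit changes."""
    return hashlib.sha256(data).hexdigest()


def semantic_fixity(values: list[float], digits: int = 7) -> str:
    """Digest of a canonical text form of the values.

    Simplified illustration only, NOT the Universal Numeric Fingerprint.
    Each value is written with `digits` significant digits in one fixed
    notation before hashing, so the file format no longer matters.
    """
    canonical = "\n".join(f"{v:.{digits - 1}e}" for v in values)
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()


# The same three numbers stored in two different file formats.
csv_bytes = b"1.5,2.25,3\n"
tsv_bytes = b"1.50\t2.250\t3.0\n"
from_csv = [float(x) for x in csv_bytes.decode().strip().split(",")]
from_tsv = [float(x) for x in tsv_bytes.decode().strip().split("\t")]

print(bit_fixity(csv_bytes) == bit_fixity(tsv_bytes))          # False
print(semantic_fixity(from_csv) == semantic_fixity(from_tsv))  # True
```

The bit-level digest answers whether a file was corrupted; the canonicalized digest answers whether the values survived a change of format or software.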




Audit [aw-dit]:

              An independent evaluation of records and activities to
              assess a system of controls

           Fixity mitigates risk only if used
                     for auditing.
Example:
           Functions of Storage Auditing
           • Detect corruption/deletion of content
           • Verify compliance with storage/replication policies
           • Prompt repair actions
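
The three functions above can be sketched in a few lines, assuming a hypothetical in-memory model: a manifest of expected SHA-256 digests and a dictionary of replica sites. The function name, data layout, and site names are illustrative assumptions, not the interface of any particular auditing system.

```python
import hashlib


def audit(manifest, replicas, min_copies):
    """Compare every replica with the manifest and report what was found.

    manifest: {object name: expected SHA-256 hex digest}
    replicas: {site name: {object name: stored bytes}}
    Returns (findings, repairs), both lists of tuples.
    """
    findings, repairs = [], []
    for name, expected in manifest.items():
        good, bad = [], []
        for site in sorted(replicas):
            content = replicas[site].get(name)
            if content is None:
                findings.append((site, name, "missing"))      # detect deletion
                bad.append(site)
            elif hashlib.sha256(content).hexdigest() != expected:
                findings.append((site, name, "corrupt"))      # detect corruption
                bad.append(site)
            else:
                good.append(site)
        if len(good) < min_copies:                            # verify policy
            findings.append(("policy", name, "too few valid copies"))
        if good:                                              # prompt repair
            repairs.extend((site, name, good[0]) for site in bad)
    return findings, repairs


data = b"survey wave 1"
manifest = {"wave1.tab": hashlib.sha256(data).hexdigest()}
replicas = {
    "site_a": {"wave1.tab": data},
    "site_b": {"wave1.tab": b"survey wave 1 (damaged)"},
    "site_c": {},
}
findings, repairs = audit(manifest, replicas, min_copies=2)
print(findings)
print(repairs)
```

Each repair tuple names the damaged site, the object, and a site holding a valid copy to restore from.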
Audit Design Choices
           • Audit regularity and coverage:
             on-demand (manual); on event; randomized sample;
             scheduled/comprehensive
           • Audit procedure, algorithms, certifying authority
           • Auditing scope:
             integrity of object; integrity of collection;
             integrity of network; policy compliance;
             public/transparent auditing
           • Trust model
           • Threat model
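
One way to reason about the coverage choice is to compare a comprehensive pass with a randomized sample. The sketch below uses hypothetical function names and assumes simple random sampling without replacement and damage that is independent of the sampling.

```python
import random
from math import comb


def select_for_audit(object_ids, mode, sample_size=0, seed=None):
    """Pick the objects that one audit pass will check.

    mode "comprehensive" returns every object; mode "sample" returns a
    random subset of at most `sample_size` objects.
    """
    ids = sorted(object_ids)
    if mode == "comprehensive":
        return ids
    if mode == "sample":
        rng = random.Random(seed)
        return sorted(rng.sample(ids, min(sample_size, len(ids))))
    raise ValueError(f"unknown audit mode: {mode}")


def detection_probability(total, damaged, sample_size):
    """Chance that a random sample contains at least one damaged object."""
    return 1 - comb(total - damaged, sample_size) / comb(total, sample_size)


picked = select_for_audit(range(100), "sample", sample_size=10, seed=1)
print(len(picked))                       # 10
print(detection_probability(10, 1, 5))   # 0.5
```

A sampled audit is cheaper per pass but only finds damage with some probability, which is the trade-off against a scheduled comprehensive audit.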
Lifecycle Management:
     Infrastructure


Many Tools, Few Solutions
                 “Poor carpenters blame their tools”
                 –Proverb

                 “If all you have is a hammer, everything looks like a nail”
                 – Another Proverb

                 “Ultimately, some people need holes – but no one needs a drill.”
                 – Yet Another Proverb

                 • Many scientific tools are embedded in needs,
                   perspectives, and practices of specific disciplines
                 • Identify common requirements
                 • Identify gaps across lifecycle stages and among actors


Core Requirements for Data Sharing Infrastructure
                     • Stakeholder incentives
                           – recognition; citation; payment; compliance; services
                     • Dissemination
                           – access to metadata; documentation; data
                     • Access control
                           – authentication; authorization; rights management
                     • Provenance
                           – chain of control; verification of metadata, bits, semantic content
                     • Persistence
                           – bits; semantic content; use
                     • Legal protection
                           – rights management; consent; record keeping; auditing
                     • Usability
                           – discovery; deposit; curation; administration; collaboration
                     • Business model

                 Sources: King 2007; ICSU 2004; NSB 2005

Mind the Gaps
[Table columns: Lifecycle (collection, storage, analysis, dissemination, reuse); Strengths; Other Gaps]

Scientific Workflow Software (e.g. Taverna)
    Strengths
    - Close integration across supported lifecycle
    - Perceived as useful service by researchers
    - High performance
    Other Gaps
    - Discipline-centric
    - Doesn’t address most storage requirements (replication, access control)

Storage Grid/VRE (e.g. iRODS)
    Strengths
    - Integration across supported lifecycle
    - Storage is perceived as useful service by researchers
    - High performance
    Other Gaps
    - Loose integration of analysis, insufficient for reproducibility

Institutional Repository (e.g. DSpace)
    Strengths
    - Low cost
    - Institutional commitment to long-term access
    Other Gaps
    - Access and discovery mechanisms usually tailored to publications, not data

Reproducible Publications Systems (e.g. StatWeave)
    Strengths
    - Close integration of analysis and scientific publication
    - Reduces risk of embarrassment when working with “co-authors”
    - Ensures one form of reproducibility (calibration, mechanical replicability)
    Other Gaps
    - Addresses replication but not reuse for secondary analysis, integration

“Data Archive”
    Strengths
    - Richer support for reuse
    - Often supports cross-discipline discovery; long-term access
    Other Gaps
    - Varied models – curated database; “virtual archive”; disciplinary repository
    - Often discipline-centric
Exemplar Efforts

(A.K.A., What have you done for me lately?)




• Audit Data Replication & Integrity Policies

   Automatic Auditing of Data Replication & Integrity Policies

   safearchive.org
The Distributed Content Replication Problem
            • We hold digital assets we wish to preserve
            • Many of these assets are not replicated
            • Even when replicated, vulnerable to single points of failure because
              replicas are managed by a single institution

A Partial Solution: LOCKSS
            • Self-contained OSS
            • Harvests resources via open interfaces
            • Replicated through secure P2P protocol
            • Self-repairing
            • Zero trust
            • Used by hundreds of institutions for collaborative preservation

What we needed…
  Auditing – how many replicas exist, where are they, and are they current?
  Policy – prove replication is consistent with policy, like TRAC?
  Collaboration – coordinate with partners to replicate content?
Resilience of peer-to-peer with the Accountability of centralized system

        Facilitating collaborative replication and preservation with cyberinfrastructure …
        • Collaborators declare explicit non-uniform resource commitments
        • Policy records and schematizes commitments, desired TRAC replication properties
        • Storage layer provides replication, integrity, freshness, versioning
        • SafeArchive software provides monitoring, auditing, transparency, and provisioning
        • Content is harvested through HTTP (LOCKSS) or OAI-PMH
        • Integration of LOCKSS, Institutional Repositories, TRAC
            ORCID is an international, interdisciplinary, open, and not-for-profit
            organization created for the benefit of all stakeholders, including research
            institutions, funding organizations, publishers, and researchers to enhance
            the scientific discovery process and improve collaboration and the efficiency
            of research funding.

            ORCID aims to solve the name ambiguity problem in scholarly
            communications by creating a registry of persistent unique identifiers for
            individual researchers and an open and transparent linking mechanism
            between ORCID, other ID schemes, and research objects such as publications,
            grants, and patents.

                                 http://orcid.org
ORCID Launch to Public in October
             The ORCID Launch Partners Program includes research institutions, publishers, research funders, data
             repositories, and third-party providers, such as:

                   The American Physical Society, Aries Systems, Avedas, Boston University, the California Institute of
                   Technology, CrossRef, Elsevier, Faculty of 1000, figshare, Hindawi Publishing Corporation, KNODE, Nature
                   Publishing Group, SafetyLit, Symplectic, Thomson Reuters, Total-Impact, and Wellcome Trust.
             At Launch, the ORCID Registry will:

             •   Allow researchers and scholars to register for an ORCID identifier, create ORCID records, and
                 manage their privacy settings
             •   Contain ORCID records created by universities on behalf of their researchers and scholars
             •   Allow researchers and scholars to link their ORCID record to external identifiers, including Scopus
                 and ResearcherID
             •   Facilitate synchronization of ORCID identifier record data with external systems, including
                 Scopus
             •   Link bi-directionally to a number of author profile and manuscript submission systems, including the
                 American Physical Society, Aries Systems, Hindawi Publishing Corporation, Nature Publishing
                 Group, and ScholarOne Manuscripts
             •   Allow researchers and scholars to search and upload publication metadata from CrossRef
             •   (Soon after launch) have the ability to link to grant application systems


Data Management Workflows
             for Open Access Journals
                http://bit.ly/DVNOJS
Embed Real Data Archives in Journals
            • Embed remotely managed data archive in OJS journal
            • Replaces “supplemental materials”
            • Adds
               – Online analysis
               – Independent storage
               – Persistent identifiers and citation
               – Data versioning
               – Enhanced discoverability and interoperability
               – Format normalization
               – Fixity and replication

Integrated Policies, Workflow, Access
            • OJS and DVN
                – Support workflows
                – Enforce policies
                – Disseminate content

            • Integrate policies for
                – Access and data license
                – Embargoes
                – Citation
            • Coordinate
                – Submission
                – Review
                – Publication
            • Link
                – Content
                – Subscriptions & notifications
                – Usage Metrics
Wrapping Up



How will we see the geography of science
when we reveal how research connects through data?




                                                      Research & Node Layout: Kevin Boyack and Dick
                                                      Klavans (mapofscience.com); Data: Thomson ISI;
                                                      Graphics & Typography: W. Bradford Paley
                                                      (didi.com/brad); Commissioned by Katy Börner
                                                      (scimaps.org)

                                                      Seed Magazine, Mar 7, 2007
                                                      http://seedmagazine.com/content/article/scientific_method_relationships_among_scientific_paradigms/
Summary
•   Principled approach to data management
     –   Follow information through information lifecycle
     –   Assess stakeholder requirements
     –   Track management, use, impact across lifecycle
•   Data management planning goals
     –   Orchestrate data for current use
     –   Protect against disclosure
     –   Comply with contracts, regulations, law, and policy
     –   Maximize value of information assets
     –   Ensure short term and long term dissemination
•   Lifecycle data management tracking
     –   Identification – identifiers, references, citations
     –   Provenance – relationship of delivered data to history of inputs and modifications and actors responsible for these
     –   Authenticity: assertions about the provenance of the records
     –   Chain of custody: assertions about the ownership of the records
     –   Integrity: assertions about the management of the records; fixity of bits; fixity of semantics
     –   Auditing: verification of properties & policy compliance
•   Data citation is a key leverage point
     –   Services: establish provenance; access; long-term preservation
     –   Incentives: scholarly credit; reproducible research policies; impact/usage analysis
     –   Data citations should be first class objects for publication -- appear with citations to other works;
         should be as easy to cite as other work



Additional References
• Buckheit J, Donoho DL. Wavelab and reproducible research. In:
  Antoniadis A, editor. Wavelets and Statistics. New York, NY:
  Springer; 1995. p. 55-81.
• International Council For Science (ICSU) 2004. ICSU Report of the
  CSPR Assessment Panel on Scientific Data and Information. Report.
• King, Gary. 2007. "An Introduction to the Dataverse Network as an
  Infrastructure for Data Sharing." Sociological Methods and Research 36
• Moore, R. 2008, Towards a Theory of Digital Preservation,
  International Journal of Digital Curation 3(1)
• National Science Board (NSB), 2005, Long-Lived Digital Data
  Collections: Enabling Research and Education in the 21st Century,
  NSF. (NSB-05-40).



Discussion
Contact information:


  Web: http://micahaltman.com

  E-mail: micah_altman@alumni.brown.edu

 Twitter: @drmaltman

Más contenido relacionado

La actualidad más candente

RDAP13 Mark Leggott: Stewarding research data using the Islandora framework
RDAP13 Mark Leggott: Stewarding research data using the Islandora frameworkRDAP13 Mark Leggott: Stewarding research data using the Islandora framework
RDAP13 Mark Leggott: Stewarding research data using the Islandora frameworkASIS&T
 
Research Data Curation _ Grad Humanities Class
Research Data Curation _ Grad Humanities ClassResearch Data Curation _ Grad Humanities Class
Research Data Curation _ Grad Humanities ClassAaron Collie
 
Research Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering StudentsResearch Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering StudentsAaron Collie
 
A basic course on Research data management: part 1 - part 4
A basic course on Research data management: part 1 - part 4A basic course on Research data management: part 1 - part 4
A basic course on Research data management: part 1 - part 4Leon Osinski
 
Creating a sustainable business model for a digital repository: the Dryad exp...
Creating a sustainable business model for a digital repository: the Dryad exp...Creating a sustainable business model for a digital repository: the Dryad exp...
Creating a sustainable business model for a digital repository: the Dryad exp...ASIS&T
 
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...Dimitrios Koureas
 
Saving private data, sharing Open Data? Role of libraries and institutional r...
Saving private data, sharing Open Data? Role of libraries and institutional r...Saving private data, sharing Open Data? Role of libraries and institutional r...
Saving private data, sharing Open Data? Role of libraries and institutional r...Chris Rusbridge
 
Martin Donnelly Sarah Jones DMP Online
Martin Donnelly Sarah Jones DMP OnlineMartin Donnelly Sarah Jones DMP Online
Martin Donnelly Sarah Jones DMP OnlineFuture Perfect 2012
 
From policy to practice with DMP Online
From policy to practice with DMP OnlineFrom policy to practice with DMP Online
From policy to practice with DMP OnlineSarah Jones
 
Data sharing and data management – what are they all about?
Data sharing and data management –  what are they all about?Data sharing and data management –  what are they all about?
Data sharing and data management – what are they all about?Belinda Weaver
 
Making Data Dynamic: Views from UC3, CDL
Making Data Dynamic: Views from UC3, CDLMaking Data Dynamic: Views from UC3, CDL
Making Data Dynamic: Views from UC3, CDLCarly Strasser
 
Supporting UC Research Data Management
Supporting UC Research Data ManagementSupporting UC Research Data Management
Supporting UC Research Data Managementslabrams
 
Data Management: The Current Landscape
Data Management: The Current LandscapeData Management: The Current Landscape
Data Management: The Current LandscapeCarly Strasser
 
Libraries and Research Data Curation: Barriers and Incentives for Preservatio...
Libraries and Research Data Curation: Barriers and Incentives for Preservatio...Libraries and Research Data Curation: Barriers and Incentives for Preservatio...
Libraries and Research Data Curation: Barriers and Incentives for Preservatio...University of California Curation Center
 
RDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research DataRDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research DataGudmundur Thorisson
 

La actualidad más candente (18)

RDAP13 Mark Leggott: Stewarding research data using the Islandora framework
RDAP13 Mark Leggott: Stewarding research data using the Islandora frameworkRDAP13 Mark Leggott: Stewarding research data using the Islandora framework
RDAP13 Mark Leggott: Stewarding research data using the Islandora framework
 
Research Data Curation _ Grad Humanities Class
Research Data Curation _ Grad Humanities ClassResearch Data Curation _ Grad Humanities Class
Research Data Curation _ Grad Humanities Class
 
Research data life cycle
Research data life cycleResearch data life cycle
Research data life cycle
 
Research Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering StudentsResearch Data Management Fundamentals for MSU Engineering Students
Research Data Management Fundamentals for MSU Engineering Students
 
A basic course on Research data management: part 1 - part 4
A basic course on Research data management: part 1 - part 4A basic course on Research data management: part 1 - part 4
A basic course on Research data management: part 1 - part 4
 
Creating a sustainable business model for a digital repository: the Dryad exp...
Creating a sustainable business model for a digital repository: the Dryad exp...Creating a sustainable business model for a digital repository: the Dryad exp...
Creating a sustainable business model for a digital repository: the Dryad exp...
 
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...
Publishing biodiversity: The interplay between Scratchpads and the new Biodiv...
 
Saving private data, sharing Open Data? Role of libraries and institutional r...
Saving private data, sharing Open Data? Role of libraries and institutional r...Saving private data, sharing Open Data? Role of libraries and institutional r...
Saving private data, sharing Open Data? Role of libraries and institutional r...
 
Martin Donnelly Sarah Jones DMP Online
Martin Donnelly Sarah Jones DMP OnlineMartin Donnelly Sarah Jones DMP Online
Martin Donnelly Sarah Jones DMP Online
 
From policy to practice with DMP Online
From policy to practice with DMP OnlineFrom policy to practice with DMP Online
From policy to practice with DMP Online
 
Data sharing and data management – what are they all about?
Data sharing and data management –  what are they all about?Data sharing and data management –  what are they all about?
Data sharing and data management – what are they all about?
 
Making Data Dynamic: Views from UC3, CDL
Making Data Dynamic: Views from UC3, CDLMaking Data Dynamic: Views from UC3, CDL
Making Data Dynamic: Views from UC3, CDL
 
Supporting UC Research Data Management
Supporting UC Research Data ManagementSupporting UC Research Data Management
Supporting UC Research Data Management
 
Data Management: The Current Landscape
Data Management: The Current LandscapeData Management: The Current Landscape
Data Management: The Current Landscape
 
Libraries and Research Data Curation: Barriers and Incentives for Preservatio...
Libraries and Research Data Curation: Barriers and Incentives for Preservatio...Libraries and Research Data Curation: Barriers and Incentives for Preservatio...
Libraries and Research Data Curation: Barriers and Incentives for Preservatio...
 
Metadata Workshop
Metadata WorkshopMetadata Workshop
Metadata Workshop
 
NISO DCMI Webinar bibframe-20130123
NISO DCMI Webinar bibframe-20130123NISO DCMI Webinar bibframe-20130123
NISO DCMI Webinar bibframe-20130123
 
RDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research DataRDFC2012 Open Access to Research Data
RDFC2012 Open Access to Research Data
 

Needs for Data Management & Citation Throughout the Information Lifecycle

  • 1. Prepared for NISO Forum: Tracking it Back to the Source: Managing and Citing Research Data, September 2012
      Needs for Data Management & Citation Throughout the Information Lifecycle
      Micah Altman, Director of Research, MIT Libraries
  • 2. Collaborators and Co-Conspirators
      • Jonathan Crabtree, Merce Crosas, Gary King, Tom Lipkis, Nancy McGovern, John Willinsky
      • Research Support
          – Library of Congress (PA#NDP03-1)
          – National Science Foundation (DMS-0835500, SES 0112072)
          – Institute for Museum and Library Services (LG-05-09-0041-09)
          – Sloan Foundation
          – Amazon Web Services
          – Massachusetts Institute of Technology
  • 3. Related Work
      Reprints available from: http://maltman.hmdc.harvard.edu
      • Altman, M. 2012. “Data Citation in The Dataverse Network ®.” In P. F. Uhlir (Ed.), Developing Data Attribution and Citation Practices and Standards: Report from an International Workshop. National Academies Press. Forthcoming.
      • Altman, M., & Crabtree, J. 2011. “Using the SafeArchive System: TRAC-Based Auditing of LOCKSS.” Archiving 2011 (pp. 165–170). Society for Imaging Science and Technology.
      • Altman, M., Adams, M., Crabtree, J., Donakowski, D., Maynard, M., Pienta, A., & Young, C. 2009. “Digital preservation through archival collaboration: The Data Preservation Alliance for the Social Sciences.” The American Archivist, 72(1): 169-182.
      • Altman, M. 2008. “A Fingerprint Method for Verification of Scientific Data.” In Advances in Systems, Computing Sciences and Software Engineering (Proceedings of the International Conference on Systems, Computing Sciences and Software Engineering 2007). Springer-Verlag.
      • Altman, M., & King, G. 2007. “A Proposed Standard for the Scholarly Citation of Quantitative Data.” D-Lib, 13, 3/4 (March/April).
  • 4. Preview
      • Principled approach to data management
      • Lifecycle data management planning
      • Lifecycle data management tracking
      • Lifecycle data management infrastructure
      • [Exemplar Projects]
  • 5. (Some) Timely Challenges
  • 6. “Data science is suddenly sexy – does that mean data is the new black?”
  • 7. Valuable Data is Lost
      • Researchers lack archiving capability
      • Incentives for data sharing are weak
      • Examples
          – Intentionally Discarded: “Destroyed, in accord with [nonexistent] APA 5-year post-publication rule.”
          – Unintentional Hardware Problems: “Some data were collected, but the data file was lost in a technical malfunction.”
          – Acts of Nature: “The data from the studies were on punched cards that were destroyed in a flood in the department in the early 80s.”
          – Discarded or Lost in a Move: “As I retired …. Unfortunately, I simply didn’t have the room to store these data sets at my house.”
          – Obsolescence: “Speech recordings stored on a LISP Machine…, an experimental computer which is long obsolete.”
          – Simply Lost: “For all I know, they are on a [University] server, but it has been literally years and years since the research was done, and my files are long gone.”
      Research by:
  • 8. Unpublished Data Ends up in the “Desk Drawer”
      • Null results are less likely to be published
      • Outliers are routinely discarded
      [Figure: Dan Shechtman’s lab notebook providing initial evidence of quasicrystals]
  • 9. Data Behind Publications Unavailable for Review, Reuse, Replication
  • 10. Model Science
      “Citations to unpublished data and personal communications cannot be used to support claims in a published paper”
      “All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science.”
  • 11. Compliance with Policies is Low
      • Compliance is low even in best examples of journals
      • Checking compliance manually is tedious, doesn’t scale
  • 12. Special Challenges for Long-Term Access to New Forms of Data
      • Some Examples
          – GIS and geospatial trails
          – Facebook & social networks
          – Text: blogs, tweets
          – Cell phone data
      • Challenges
          – Proprietary: intellectual property
          – Size
          – Dynamic content
          – Fixity
          – Format
      Source: [Calberese 2008]
  • 13. A Lifecycle Framework
  • 14. “The published article is not scientific output – it’s a summary of scientific output.” -- corollary of Buckheit & Donoho 1995
  • 15. Information Lifecycle
      [Figure: information lifecycle diagram. Labels:]
      • Creation/Collection
      • Storage/Ingest
      • Processing
      • Internal Sharing
      • Analysis
      • Modeling
      • External dissemination/publication
      • Re-use
          – Scientific
          – Educational
          – Scientometric
          – Institutional
      • Long-term access
  • 16. Stakeholders
      [Figure: stakeholders placed around the information lifecycle diagram (Creation/Collection, Storage/Ingest, Processing, Internal Sharing, Analysis, Modeling, External dissemination/publication, Re-use, Long-term access). Stakeholder labels:]
      • Data Sources/Subjects
      • Researchers
      • Research Organizations
      • Research Sponsors
      • Data Archives/Publisher
      • Scholarly Publishers
      • Service/Infrastructure Providers
      • Consumers
  • 17. Legal Requirements and Rights
      [Figure: map of legal requirements and rights. Labels:]
      Contract; Click-Wrap TOU; License; Intellectual Property; Trade Secret; Patent; Copyright; Trademark; DMCA; Fair Use; Database Rights; Moral Rights; Intellectual Attribution; Rights of Publicity; Journal Replication Requirements; Funder Open Access; Common Rule (45 CFR 46); HIPAA; FERPA; CIPSEA; FOIA; State FOI Laws; EU Privacy Directive; State Privacy Laws; Privacy Torts (Invasion, Defamation); Potentially Harmful (Archeological Sites, Animal Testing, …); Classified; Sensitive but Unclassified; EAR; ITAR; Access Rights; Confidentiality
  • 18. Stakeholders, Rights and Requirements
      [Figure: stakeholders overlaid on the map of legal requirements and rights.]
      • Stakeholder labels
          – Sources/Subjects
          – Primary Researchers
          – Research Organizations
          – Research Sponsors
          – Data Archives
          – Scholarly Publishers
          – Infrastructure/Service Providers
          – Consumers: secondary research, participative science, public policy uses
      • Rights and requirements labels
          Contract; Click-Wrap TOU; License; Intellectual Property; Trade Secret; Patent; Copyright; Trademark; DMCA; Fair Use; Database Rights; Moral Rights; Intellectual Attribution; Rights of Publicity; Journal Replication Requirements; Funder Open Access; Common Rule (45 CFR 46); HIPAA; FERPA; CIPSEA; FOIA; State FOI Laws; EU Privacy Directive; State Privacy Laws; Privacy Torts (Invasion, Defamation); Potentially Harmful (Archeological Sites, Animal Testing, …); Classified; Sensitive but Unclassified; Access Rights; Confidentiality
  • 19. Stakeholder Drivers per Stage of Information Lifecycle. Stage: Proposal, Design and Data Collection
      • Research Subjects – Legal constraints: consent/contract; privacy. Concerns: public benefit; future access to own information
      • Sources – Legal constraints: intellectual property; contract. Concerns: business confidentiality; IP; profit from licenses
      • Funder – Legal constraints: open access; confidentiality. Concerns: public benefit; policy relevance; reproducible research; future access
      • Primary Researcher – Legal constraints: confidentiality; contract; IP. Concerns: publication potential; compliance with institutional/funder requirements
      • Research Institution – Legal constraints: confidentiality; contract; IP. Concerns: compliance with funder requirements; license, IP, confidentiality compliance
  • 20. Stakeholder Drivers per Stage of Information Lifecycle. Stage: Data Storage, Analysis (Pre-publication)
      • Primary Researcher – Legal constraints: confidentiality; contract; IP. Concerns: publication potential; compliance with institutional/funder requirements
      • Research Institution – Legal constraints: confidentiality; contract; IP. Concerns: license, IP, confidentiality compliance; records management
      • Service Providers – Legal constraints: contract; (selected cases) confidentiality requirements. Concerns: contract; service business model; service deployment
  • 21. Stakeholder Drivers per Stage of Information Lifecycle. Stage: Publication
      • Primary Researcher – Legal constraints: compliance for source/subjects, sponsor, host institution, publisher. Concerns: scholarly attribution/credit; promote use of research; track use/impact of research
      • Sponsor – Concerns: track research products; track compliance; track use/impact
      • Research Institution – Legal constraints: sponsor compliance; records management; intellectual property. Concerns: track OA products
      • Scholarly/Journal Publisher – Legal constraints: IP; contract. Concerns: impact/use; profit/business model; replicability
      • Data Publisher – Legal constraints: IP. Concerns: profit/business model; replicability; connection to publication
  • 22. Stakeholder Drivers per Stage of Information Lifecycle. Stage: Re(use)
      • Research Reader – Legal constraints: access rights. Concerns: provenance
      • Secondary Researcher – Legal constraints: access rights; confidentiality; contract. Concerns: replicability; data reintegration/reanalysis; linking publications and data; provenance
      • “Citizen/Community Scientist” – Legal constraints: access rights. Concerns: data redissemination/reanalysis; linking publications and data
      • Public Policy – Legal constraints: access rights. Concerns: provenance; replicability; linking publications and data
      • Education/teaching – Legal constraints: access rights. Concerns: “classroom” use; MOOC use
  • 23. Lifecycle Management: Data Management Planning
  • 24. Some Formal “DMP” Requirements
      • NIH: the Final NIH Statement on Sharing Research Data was published in the NIH Guide on February 26, 2003.
          – “Starting with the October 1, 2003 receipt date, investigators submitting an NIH application seeking $500,000 or more in direct costs in any single year are expected to include a plan for data sharing or state why data sharing is not possible.”
          – Sharing is expected no later than acceptance for publication of the main findings from the final data set
      • NSF: all proposals must (as of 1/1/2011) include a data management plan.
          – Specific requirements are vague for the most part: they “will be determined by the community of interest through the process of peer review and program management.”
      • Wellcome Trust:
          – “will review data management and sharing plans, and any costs involved in delivering them, as an integral part of the funding decision”
  • 25. DMP Goals • Orchestrate data for current use • Control disclosure • Compliance with contracts, regulations, law, and policy • Maximize value of information assets • Ensure short term and long term dissemination
  • 26. DMP Elements
      • Orchestrate data for current use
          – Quality assurance
          – Storage, backup, replication, and versioning
          – Data formats
          – Data organization
          – Budget
          – Metadata and documentation
      • Control disclosure
          – Access and sharing
          – Intellectual property rights
          – Legal requirements
          – Security
      • Compliance with contracts, regulations, law, and policy
          – Access and sharing
          – Adherence
          – Responsibility
          – Ethics and privacy
          – Security
      • Value of information assets
          – Data description
          – Data value
          – Relation to collection
          – Relation to evidence base
          – Budget
      • Ensure short term and long term dissemination
          – Data description
          – Institutional archiving commitments
          – Audience
          – Access and sharing
          – Data formats
          – Data organization
          – Metadata and documentation
          – Budget
  • 27. DMP Details
      • Sharing
          – Restrictions on use
          – Plans for depositing in an existing public database
          – Access procedures
          – Embargo periods
          – Access charges
          – Timeframe for access
          – Technical access methods
          – Restrictions on access
      • Long term access (Preservation)
          – Requirements for data destruction, if applicable
          – Procedures for long term preservation
          – Institution responsible for long-term costs of data preservation
          – Succession plans for data should archiving entity go out of existence
      • Formats
          – Generation and dissemination formats and procedural justification
          – Storage format and archival justification
          – Format documentation
      • Metadata and documentation
          – Internal and external identifiers and citations
          – Metadata to be provided
          – Metadata standards used
          – Planned documentation and supporting materials
          – Quality assurance procedures for metadata and documentation
      • Data Organization
          – File organization
          – Naming conventions
      • Storage, backup, replication, and versioning
          – Facilities
          – Methods
          – Procedures
          – Frequency
          – Replication
          – Version management
          – Recovery guarantees
      • Security
          – Procedural controls
          – Technical controls
          – Confidentiality concerns
          – Access control rules
      • Budget
          – Cost of preparing data and documentation
          – Cost of storage and backup
          – Cost of permanent archiving and access
      • Intellectual Property Rights
          – Entities who hold property rights
          – Types of IP rights in data
          – Protections provided
          – Dispute resolution process
      • Legal Requirements
          – Provider requirements and plans to meet them
          – Institutional requirements and plans to meet them
      • Responsibility
          – Individual or project team role responsible for data management
          – Qualifications, certifications, and licenses of responsible parties
      • Ethics and privacy
          – Informed consent
          – Protection of privacy
          – Data use agreements
          – Other ethical issues
      • Adherence
          – When will adherence to data management plan be checked or demonstrated
          – Who is responsible for managing data in the project
          – Who is responsible for checking adherence to data management plan
          – Auditing procedures and framework
      • Value of information assets
          – Project use value
          – Institutional audience and uses
          – Public audience and uses
          – Relation to institutional collection
          – Relation to disciplinary evidence base
          – Cost of re-creating data
  • 28. Approaching Requirement Overlap
      • Sanity-check DMP details with lifecycle questions:
          – Who wants it?
          – What do they need it for?
          – When will it be used?
      • Be conscious of elements that serve multiple goals or lifecycle stages:
          – Metadata/documentation
          – Identifiers
          – Budgets
          – Formats
          – IP rights and confidentiality restrictions
          – Responsibilities/adherence
      • Use tracking tools and methods throughout the lifecycle
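The overlap check can be sketched in a few lines of Python. This is a minimal illustration, not a tool from the talk: the goal-to-element mapping is read from the flattened "DMP Elements" slide, whose two-column layout was lost, so the assignment of elements to goals is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict

# Goal -> elements, as read from the "DMP Elements" slide. The column layout
# was lost in extraction, so this assignment is an assumption.
DMP_ELEMENTS = {
    "Orchestrate data for current use": [
        "Quality assurance", "Storage, backup, replication, and versioning",
        "Data formats", "Data organization", "Budget",
        "Metadata and documentation",
    ],
    "Control disclosure": [
        "Access and sharing", "Intellectual property rights",
        "Legal requirements", "Security",
    ],
    "Compliance with contracts, regulations, law, and policy": [
        "Access and sharing", "Adherence", "Responsibility",
        "Ethics and privacy", "Security",
    ],
    "Value of information assets": [
        "Data description", "Data value", "Relation to collection",
        "Relation to evidence base", "Budget",
    ],
    "Ensure short term and long term dissemination": [
        "Data description", "Institutional archiving commitments", "Audience",
        "Access and sharing", "Data formats", "Data organization",
        "Metadata and documentation", "Budget",
    ],
}


def goals_per_element(goal_map):
    """Invert goal -> elements into element -> set of goals it serves."""
    goals = defaultdict(set)
    for goal, elements in goal_map.items():
        for element in elements:
            goals[element].add(goal)
    return dict(goals)


def shared_elements(goal_map, min_goals=2):
    """Return, sorted, the elements that serve at least min_goals goals."""
    inverted = goals_per_element(goal_map)
    return sorted(e for e, g in inverted.items() if len(g) >= min_goals)


if __name__ == "__main__":
    for element in shared_elements(DMP_ELEMENTS):
        print(element)
```

Elements that come back from `shared_elements` are the ones worth writing once and reusing across plan sections, since a change to any of them affects more than one goal.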
  • 29. Lifecycle Management: Tracking
  • 30. What do we track? What tools and methods provide technical leverage or incentives to management across lifecycle stages and among actors?
      • Identification: identifiers, references, citations
      • Provenance: relationship of delivered data to the history of inputs and modifications, and to the actors responsible for these; revision control; versioning
      • Authenticity: assertions about the provenance of the records
      • Respect des fonds: assertions about the original organization of the records
      • Chain of custody: assertions about the ownership of the records
      • Integrity: assertions about the management of the records; fixity of bits; fixity of semantics
      • Auditing: verification of properties & policy compliance
      Source: bulleted list of attributes adapted from Moore 2008
  • 31. Tracking Across Information Lifecycle [Diagram: the lifecycle cycle (Creation/Collection; Storage/Ingest; Processing; Internal Sharing; Analysis; External dissemination/publication; Re-use; Long-term access) annotated with what is tracked: identifiers; citation; metadata for integrity, provenance, custody]
  • 32. Data Citation: a Point of Leverage
      • Services
          – Identifiers to specific fixed versions of data are needed to establish unambiguous chains of provenance
          – Identifiers that can be globally resolved to machine-understandable metadata and to the identified object are needed to build generalized access and analysis services
          – Persistence of identifiers is needed to maintain long-term access
      • Incentives
          – Scholarly credit (intellectual attribution) is a large motivator for many researchers: citation creates an incentive for researchers to publish data
          – Scholars also comply with enforceable journal policies: requiring data citation is a light-weight method to make data access policies auditable
          – Impact/usage is a motivator for public research funders: data citation provides a foundation for measures of usage and impact
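To make the service requirements concrete, here is a minimal Python sketch of a citation record that carries a persistent identifier, a fixed version, and a fixity fingerprint. The class name, field names, rendering order, and the example values are all hypothetical; the sketch does not reproduce any particular citation standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical citation record; field names are illustrative only."""
    authors: tuple
    year: int
    title: str
    identifier: str   # persistent, globally resolvable identifier
    version: str      # names one fixed version of the data
    fingerprint: str  # fixity value for verifying the cited content

    def render(self) -> str:
        """Render a human-readable citation string."""
        names = "; ".join(self.authors).rstrip(".")
        return (f"{names}. {self.year}. {self.title}. "
                f"Version {self.version}. {self.identifier}. "
                f"Fingerprint: {self.fingerprint}")

    def same_evidence(self, other: "DataCitation") -> bool:
        """True when both citations point at the same fixed version."""
        return ((self.identifier, self.version, self.fingerprint)
                == (other.identifier, other.version, other.fingerprint))


if __name__ == "__main__":
    # All values below are made up for illustration.
    cite = DataCitation(("Doe, J.", "Roe, R."), 2012, "Example Survey Data",
                        "doi:10.0000/EXAMPLE", "2", "sha256:abc123")
    print(cite.render())
```

Keeping the version and fingerprint inside the citation is what lets a reader check that the data they retrieve is the evidence the citing work actually used.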
  • 33. Emerging Practices for Data Citation • Publishers – OECD iLibrary – Thomson Reuters Data Citation Index • Data archives – Dataverse Network – Data-PASS • Harmonization efforts – DataCite – NAS BRDI – ICSU/CODATA • Discipline specific
  • 34. Identifier and Citation Use Cases
      • Attribution
          – Provide scholarly attribution
          – Provide legal attribution
          – Identify contributors to data
      • Verification
          – Associate work with version of evidence used
          – Verify fixity of bits
          – Verify fixity of information
          – Verify “authenticity” of work
      • Discovery
          – Locate data via identifier
          – Locate data integral to article
          – Locate works related to data: articles, derivatives, sources
      • Access
          – Access to surrogate
          – On-line access to object
          – Machine understandability
          – Long-term understandability
      • Persistence
          – Does evidence persist as long as assertions based on it?
          – Is durability of evidence transparent?
  • 35. Emerging Principles for Data Citation • Data citations should be first class objects for publication: they should appear with citations to other works, and data should be as easy to cite as other works • Citations should persist and enable access to a fixed version of data at least as long as the citing work exists • Citations should support unambiguous attribution of credit to all contributors, possibly through the citation ecosystem
  • 36. Fixity • Are files, bitstreams corrupted? • Do semantics remain the same over time, across formats, software analysis systems? • Some semantic approaches: – Universal Numeric Fingerprint: canonicalization – Perceptual signatures: characterization of significant properties
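The difference between fixity of bits and fixity of semantics can be shown with a small Python sketch. It is a simplified illustration of canonicalize-then-hash and is not the Universal Numeric Fingerprint algorithm; the function names, the choice of SHA-256, and the seven-significant-digit rounding are assumptions made for the example.

```python
import hashlib


def bit_fingerprint(data: bytes) -> str:
    """Fixity of bits: any change to the byte stream changes the digest."""
    return hashlib.sha256(data).hexdigest()


def canonical_value(value: float, digits: int = 7) -> str:
    """Write a number in exponential notation with a fixed number of
    significant digits, so equal values get one spelling."""
    return format(value, f".{digits - 1}e")


def semantic_fingerprint(values, digits: int = 7) -> str:
    """Fixity of semantics (simplified): hash the canonical form of the
    values, so the digest survives a change of file format."""
    canonical = "\n".join(canonical_value(v, digits) for v in values)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def parse_csv_row(raw: bytes):
    """Parse one comma-separated row of numbers."""
    return [float(field) for field in raw.decode("utf-8").split(",")]


if __name__ == "__main__":
    # The same three numbers, serialized two different ways.
    file_a = b"1.50,2.0,3"
    file_b = b"1.5,2.00,3.0"
    print(bit_fingerprint(file_a) == bit_fingerprint(file_b))
    print(semantic_fingerprint(parse_csv_row(file_a))
          == semantic_fingerprint(parse_csv_row(file_b)))
```

The two files differ byte for byte, so their bit-level digests differ, while the canonicalized values agree. A production scheme would also have to settle details this sketch ignores, such as missing values, signed zero, and character data.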
  • 37. Audit [aw-dit]: An independent evaluation of records and activities to assess a system of controls. Fixity mitigates risk only if used for auditing.
  • 38. Example: Functions of Storage Auditing • Detect corruption/deletion of content • Verify compliance with storage/replication policies • Prompt repair actions
  • 39. Audit Design Choices • Audit regularity and coverage: on-demand (manually); on event; randomized sample; scheduled/comprehensive • Audit procedure, algorithms, certifying authority • Auditing scope: integrity of object; integrity of collection; integrity of network; policy compliance; public/transparent auditing • Trust model • Threat model
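As a rough illustration of two of these choices, the Python sketch below audits stored objects against a manifest of recorded digests, either comprehensively or on a randomized sample. It assumes a hypothetical in-memory store (a dict of name to bytes) and SHA-256 digests; a real audit would read from actual storage and would need a trust and threat model that this sketch leaves out.

```python
import hashlib
import random
from dataclasses import dataclass, field


@dataclass
class AuditReport:
    checked: list = field(default_factory=list)
    missing: list = field(default_factory=list)    # deleted content
    corrupted: list = field(default_factory=list)  # digest mismatch

    @property
    def ok(self) -> bool:
        return not self.missing and not self.corrupted


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def audit(manifest, store, sample_size=None, seed=None) -> AuditReport:
    """Compare stored objects with the digests recorded in the manifest.

    manifest: name -> expected digest. store: name -> bytes (hypothetical).
    sample_size=None audits everything; otherwise a random sample is drawn.
    """
    names = sorted(manifest)
    if sample_size is not None and sample_size < len(names):
        names = sorted(random.Random(seed).sample(names, sample_size))
    report = AuditReport()
    for name in names:
        report.checked.append(name)
        data = store.get(name)
        if data is None:
            report.missing.append(name)
        elif digest(data) != manifest[name]:
            report.corrupted.append(name)
    return report


if __name__ == "__main__":
    store = {"a.csv": b"1,2,3", "b.csv": b"4,5,6", "c.csv": b"7,8,9"}
    manifest = {name: digest(data) for name, data in store.items()}
    store["b.csv"] = b"4,5,0"   # simulate silent corruption
    del store["c.csv"]          # simulate deletion
    report = audit(manifest, store)
    print(report.ok, report.corrupted, report.missing)
```

The report only detects problems. Prompting a repair, for example by re-copying from a good replica, would be a separate step driven by the `missing` and `corrupted` lists.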
  • 40. Lifecycle Management: Infrastructure
  • 41. Many Tools, Few Solutions “Poor carpenters blame their tools” – Proverb. “If all you have is a hammer, everything looks like a nail” – Another Proverb. “Ultimately, some people need holes – but no one needs a drill.” – Yet Another Proverb • Many scientific tools are embedded in needs, perspectives, and practices of specific disciplines • Identify common requirements • Identify gaps across lifecycle stages and among actors
  • 42. Core Requirements for Data Sharing Infrastructure • Stakeholder incentives – recognition; citation; payment; compliance; services • Dissemination – access to metadata; documentation; data • Access control – authentication; authorization; rights management • Provenance – chain of control; verification of metadata, bits, semantic content • Persistence – bits; semantic content; use • Legal protection – rights management; consent; record keeping; auditing • Usability – discovery; deposit; curation; administration; collaboration • Business model. Sources: King 2007; ICSU 2004; NSB 2005
  • 43. Mind the Gaps [Table: one row per class of tool. Columns: lifecycle stages supported (collection, storage, analysis, dissemination, reuse; shown graphically), strengths, other gaps]
      • Scientific Workflow Software (e.g. Taverna) – Strengths: close integration across supported lifecycle; perceived as useful service by researchers. Other gaps: discipline-centric; doesn’t address most storage requirements (replication, access control)
      • High Performance Storage Grid/VRE (e.g. iRODS) – Strengths: integration across supported lifecycle; storage is perceived as useful service by researchers; high performance. Other gaps: loose integration of analysis, insufficient for reproducibility
      • Institutional Repository (e.g. DSpace) – Strengths: low cost; institutional commitment to long-term access. Other gaps: access and discovery mechanisms usually tailored to publications, not data
      • Reproducible Publications Systems (e.g. StatWeave) – Strengths: close integration of analysis and scientific publication; reduces risk of embarrassment when working with “co-authors”; ensures one form of reproducibility (calibration, mechanical replicability). Other gaps: addresses replication but not reuse for secondary analysis, integration
      • “Data Archive” – Strengths: richer support for reuse; often supports cross-discipline discovery; long-term access. Other gaps: varied models (curated database; “virtual archive”; disciplinary repository); often discipline-centric
  • 44. Exemplar Efforts (A.K.A., What have you done for me lately?)
  • 45. Automatic Auditing of Data Replication & Integrity Policies. safearchive.org
  • 46. The Distributed Content Replication Problem
      • The problem
          – We hold digital assets we wish to preserve
          – Many of these assets are not replicated
          – Even when replicated, they are vulnerable to single points of failure because replicas are managed by a single institution
      • A Partial Solution: LOCKSS
          – Self-contained OSS
          – Harvests resources via open interfaces
          – Replicated through secure P2P protocol
          – Self-repairing
          – Zero trust
          – Used by hundreds of institutions for collaborative preservation
      • What we needed…
          – Auditing: how many replicas exist, where, and are they current?
          – Policy: prove replication is consistent with policy, like TRAC?
          – Collaboration: coordinate with partners to replicate content?
  • 47. Resilience of peer-to-peer with the accountability of a centralized system. Facilitating collaborative replication and preservation with cyberinfrastructure … • Collaborators declare explicit non-uniform resource commitments • Policy records and schematizes commitments, desired TRAC replication properties • Storage layer provides replication, integrity, freshness, versioning • SafeArchive software provides monitoring, auditing, transparency, and provisioning • Content is harvested through HTTP (LOCKSS) or OAI-PMH • Integration of LOCKSS, Institutional Repositories, TRAC
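The idea of checking observed replicas against a declared policy can be sketched as follows. This is an illustration under stated assumptions and is not SafeArchive's schema or API: the `Replica` and `ReplicationPolicy` records, their fields, and the rule that the most common digest counts as the agreed content are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Replica:
    host: str
    institution: str
    digest: str              # digest of the copy held at this host
    last_verified: datetime  # when the copy was last checked


@dataclass(frozen=True)
class ReplicationPolicy:
    min_replicas: int        # how many current copies must exist
    min_institutions: int    # spread across how many institutions
    max_age: timedelta       # how recent the last verification must be


def audit_replication(replicas, policy, now):
    """Return a list of policy violations; an empty list means compliant."""
    if not replicas:
        return ["no replicas reported"]
    # Assumption: the most common digest is treated as the agreed content.
    agreed, _ = Counter(r.digest for r in replicas).most_common(1)[0]
    current = [r for r in replicas
               if r.digest == agreed and now - r.last_verified <= policy.max_age]
    violations = []
    if len(current) < policy.min_replicas:
        violations.append(
            f"{len(current)} current replicas, policy requires {policy.min_replicas}")
    institutions = {r.institution for r in current}
    if len(institutions) < policy.min_institutions:
        violations.append(
            f"{len(institutions)} institutions, policy requires {policy.min_institutions}")
    return violations


if __name__ == "__main__":
    now = datetime(2012, 9, 1)
    policy = ReplicationPolicy(3, 2, timedelta(days=30))
    replicas = [
        Replica("host-a", "Univ X", "d1", now - timedelta(days=1)),
        Replica("host-b", "Univ X", "d1", now - timedelta(days=90)),
        Replica("host-c", "Univ Y", "d2", now - timedelta(days=1)),
    ]
    for violation in audit_replication(replicas, policy, now):
        print(violation)
```

Expressing the policy as data, separate from the storage layer, is what makes the result reportable to partners: the same check can be rerun by anyone who can see the replica inventory.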
  • 48. ORCID is an international, interdisciplinary, open, and not-for-profit organization created for the benefit of all stakeholders, including research institutions, funding organizations, publishers, and researchers to enhance the scientific discovery process and improve collaboration and the efficiency of research funding. ORCID aims to solve the name ambiguity problem in scholarly communications by creating a registry of persistent unique identifiers for individual researchers and an open and transparent linking mechanism between ORCID, other ID schemes, and research objects such as publications, grants, and patents. http://orcid.org
  • 49. ORCID Launch to Public in October. ORCID Launch Partners Program participants include research institutions, publishers, research funders, data repositories, and third party providers, such as: The American Physical Society, Aries Systems, Avedas, Boston University, the California Institute of Technology, CrossRef, Elsevier, Faculty of 1000, figshare, Hindawi Publishing Corporation, KNODE, Nature Publishing Group, SafetyLit, Symplectic, Thomson Reuters, Total-Impact, and Wellcome Trust. At launch, the ORCID Registry will:
      • Allow researchers and scholars to register for an ORCID identifier, create ORCID records, and manage their privacy settings
      • Contain ORCID records created by universities on behalf of their researchers and scholars
      • Allow researchers and scholars to link their ORCID record to external identifiers, including Scopus and ResearcherID
      • Facilitate synchronization of ORCID identifier record data with external systems including Scopus
      • Bi-directionally link to a number of author profile and manuscript submission systems, including the American Physical Society, Aries Systems, Hindawi Publishing Corporation, Nature Publishing Group, and ScholarOne Manuscripts
      • Allow researchers and scholars to search and upload publication metadata from CrossRef
      • (Soon after launch) have the ability to link to grant application systems
  • 50. Data Management Workflows for Open Access Journals http://bit.ly/DVNOJS
  • 51. Embed Real Data Archives in Journals • Embed remotely managed data archive in OJS journal • Replaces “supplemental materials” • Adds – Online analysis – Independent storage – Persistent identifiers and citation – Data versioning – Enhanced discoverability and interoperability – Format normalization – Fixity and replication
  • 52. Integrated Policies, Workflow, Access • OJS and DVN – Support workflows – Enforce policies – Disseminate content • Integrate policies for – Access and data license – Embargoes – Citation • Coordinate – Submission – Review – Publication • Link – Content – Subscriptions & notifications – Usage metrics
  • 53. Wrapping Up
  • 54. How will we see the geography of science when we reveal how research connects through data? Research & Node Layout: Kevin Boyack and Dick Klavans (mapofscience.com); Data: Thomson ISI; Graphics & Typography: W. Bradford Paley (didi.com/brad); Commissioned by Katy Börner (scimaps.org). Seed Magazine, Mar 7, 2007 http://seedmagazine.com/content/article/scientific_method_relationships_among_scientific_paradigms/
  • 55. Summary
      • Principled approach to data management
          – Follow information through information lifecycle
          – Assess stakeholder requirements
          – Track management, use, impact across lifecycle
      • Data management planning goals
          – Orchestrate data for current use
          – Protect against disclosure
          – Compliance with contracts, regulations, law, and policy
          – Maximize value of information assets
          – Ensure short term and long term dissemination
      • Lifecycle data management tracking
          – Identification: identifiers, references, citations
          – Provenance: relationship of delivered data to history of inputs and modifications and actors responsible for these
          – Authenticity: assertions about the provenance of the records
          – Chain of custody: assertions about the ownership of the records
          – Integrity: assertions about the management of the records; fixity of bits; fixity of semantics
          – Auditing: verification of properties & policy compliance
      • Data citation is a key leverage point
          – Services: establish provenance; access; long-term preservation
          – Incentives: scholarly credit; reproducible research policies; impact/usage analysis
          – Data citations should be first class objects for publication: they should appear with citations to other works, and data should be as easy to cite as other works
  • 56. Additional References
      • Buckheit, J., & Donoho, D. L. 1995. Wavelab and reproducible research. In A. Antoniadis (Ed.), Wavelets and Statistics (pp. 55-81). New York, NY: Springer.
      • International Council for Science (ICSU). 2004. ICSU Report of the CSPR Assessment Panel on Scientific Data and Information. Report.
      • King, G. 2007. "An Introduction to the Dataverse Network as an Infrastructure for Data Sharing." Sociological Methods and Research 36.
      • Moore, R. 2008. Towards a Theory of Digital Preservation. International Journal of Digital Curation 3(1).
      • National Science Board (NSB). 2005. Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. NSF (NSB-05-40).
  • 57. Discussion Contact information: Web: http://micahaltman.com E-mail: micah_altman@alumni.brown.edu Twitter: @drmaltman

Editor's notes

  1. This work by Micah Altman (http://micahaltman.com) is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
  2. Most of the different stakeholders have stronger relationships/stakes with research at different stages. But researchers and research institutions are in the middle: they have a strong stake in most stages. Researchers are more directly concerned with collection, processing, analysis, dissemination. Organizations have a higher stake in internal sharing, re-use, long-term access.
  3. This section is a more detailed deep-dive into drivers at major stages of the information lifecycle. It is not intended to be part of the main presentation, but could be used to respond to questions or to focus on a particular stage.