Research Data Management
Spring 2014: Session 1
Practical strategies for better results
University Library
Center for Digital Scholarship
Acknowledgements
Department of Biostatistics – Data Management,
Indiana University School of Medicine
Colleagues at Johns Hopkins University, Purdue
University, Oregon State University, University of
Oregon, New York University, and others who shared
their expertise.
ROAD MAP FOR THIS LAB
Overview
• Four sessions, 2 hours each
• Some lecture, more discussion and activities
• Major products
– Practical, detailed data management plan [DRAFT]
– Map of data outcomes
– Storage & backup plan
– Documentation checklist
– Data quality standards
– Screening & cleaning checklist
Products & Resources
• Box folders
– Session 1, 2, 3, 4: Materials for each session
– Resources: Miscellaneous resources that span
sessions or are useful later
– Upload HERE: Folder for uploading products
• Will be used to assess my teaching – content & delivery
• Will NOT be used to assess you
• Please delete your name from the files before you
upload them
1. Research data management plans & planning
2. Documentation & metadata
3. Data quality
4. Ethical & Legal issues in data sharing & reuse
Session 1
1. Research data management plans & planning
a) Planning for good data management from the
start
b) Defining expected outcomes for your data
c) Getting a storage and backup plan
Activities & Discussions
• Introductions (<1 minute each)
–Name
–Department or Program
–What do you want to get out of these
workshops?
INTRODUCTION TO RESEARCH DATA
MANAGEMENT
MODULE 1
LEARNING OUTCOMES
• Describe key challenges associated with managing digital research data
• Identify the potential consequences of irresponsible or inattentive data management
Data Deluge
Data is collected from sensors, sensor
networks, remote sensing, observations,
and more - this calls for increased attention
to data management and stewardship
Photos courtesy of www.carboafrica.net, http://modis.gsfc.nasa.gov/, and http://www.futurlec.com
CC images by tajai and CIMMYT on Flickr
Image collected by Viv Hutchinson
The World of Data Around Us
[Chart: petabytes worldwide by year, 2005 to 2010, on an axis from 0 to 1,000,000. Series: Information and Available Storage; the gap between them is labeled "Transient information or unfilled demand for storage".]
Source: John Gantz, IDC Corporation: The Expanding Digital Universe
Why Data Management
• Natural disaster
• Facilities infrastructure failure
• Storage failure
• Server hardware/software failure
• Application software failure
• External dependencies (e.g. PKI
failure)
• Format obsolescence
• Legal encumbrance
• Human error
• Malicious attack by human or
automated agents
• Loss of staffing competencies
• Loss of institutional commitment
• Loss of financial stability
• Changes in user expectations and
requirements
CC images by Sharyn Morrow and momboleum on Flickr
Best Practices
Poor data practice results in loss of information (data entropy)
[Figure: information content plotted against time. Labeled points: time of publication, specific details, general details, accident, retirement or career change, death (Michener et al. 1997).]
Source: Best Practices for Preparing Ecological Data Sets, ESA, August 2010
Data Loss
Vines et al., 2014
“MEDICARE PAYMENT ERRORS NEAR $20B” (CNN) December 2004
Miscoding and billing errors from doctors and hospitals totaled $20,000,000,000 in FY
2003 (9.3% error rate). The error rate measured claims that were paid despite being
medically unnecessary, inadequately documented, or improperly coded. In some
instances, Medicare asked health care providers for medical records to back up their
claims and got no response. The survey did not document instances of alleged fraud.
This error rate was actually an improvement over the previous fiscal year (9.8% error rate).
“AUDIT: JUSTICE STATS ON ANTI-TERROR CASES FLAWED” (AP) February 2007
The Justice Department Inspector General found only two sets of data out of 26
concerning terrorism attacks were accurate. The Justice Department uses these
statistics to argue for their budget. The Inspector General said the data “appear to be
the result of decentralized and haphazard methods of collections … and do not appear
to be intentional.”
“OOPS! TECH ERROR WIPES OUT ALASKA INFO” (AP) March 2007
A technician managed to delete the data and backup for the $38 billion Alaska oil
revenue fund – money received by residents of the State. Correcting the errors cost the
State an additional $220,700 (which of course was taken off the receipts to Alaska
residents).
Slide courtesy of BLM
Professional Stakes
Benefits of GOOD Data Management
• Efficiency
• Safety
• Quality
• Reputation
• Compliance
Minute paper
Why should we care about how
research data is managed?
[Subtext: Why should researchers spend time
managing their data better?]
Don’t forget to upload your paper to Box.
References
1. DataONE Education Module: Data Management. DataONE. Retrieved
December 2013. From
http://www.dataone.org/sites/all/documents/L01_DataManagement.pptx
2. Cook, B. (2013). NACP All Investigator Meeting: Data Management
Practices for Early Career Scientists. Presented February 3, 2013. From
http://daac.ornl.gov/NACP_AIM_2013/NACP_AIM_Agenda.html
3. Vines et al. (2014). The availability of research data
declines rapidly with article age. Current Biology.
http://dx.doi.org/10.1016/j.cub.2013.11.014
DATA MANAGEMENT PLANS &
PLANNING
MODULE 1
LEARNING OUTCOMES
• Understand the life cycle approach to managing research data
• Summarize the basic components of US federal funding agency requirements for data management and sharing.
• Outline planned project and data documentation in a data management plan.
• Define expected outcomes for data.
The Life Cycle Approach
• Helps define and explain complex processes
(graphically). (Carlson, 2013)
• Helps identify important components, roles,
responsibilities, milestones, etc. (Carlson, 2013)
• Demonstrates connections and relationships
between parts and the whole. (Carlson, 2013)
• Emphasizes the role of data management as an
active process embedded throughout the
research and knowledge creation life cycles.
DataONE Data Life Cycle
Amanda Whitmire, 2013
Humphrey, Knowledge Creation Cycle
Progress Towards Openness
• 1985: National Research Council
• 1999: OMB Circular A-110 revisions
• 2003: NIH Data Sharing Policy
• 2008: NIH Public Access Policy
• 2011: NSF DMP Requirement
• 2012: NEH, Office of Digital Humanities DMP Requirement
• 2013: NSF Biosketch change
• 2013: OSTP Memo on Public Access to the Results of Federally-Funded Research
OSTP Memo - February 2013
• Data
– Maximize access by the general public and without charge…protecting
confidentiality and personal privacy
– …recognizing proprietary interests, business confidential information,
and intellectual property rights
– …preserving the balance between the relative value of long-term
preservation and access and administrative burden
– …ensure all researchers develop data management plans
– Ensure appropriate evaluation of the merits of submitted DMPs
– Promote the deposit of data in publicly accessible databases
– …support training, education, and workforce development related to
scientific data management, analysis, storage, preservation, and
stewardship
Policy Drivers
• Funding agencies
– Increased impact of funding dollars
– Reduce redundant data collection
– Further scientific research
• Research Communities
– Enhance use and value of existing data
– Address big challenges
Data Management Planning
Plan → Collect → Assure → Describe → Preserve → Discover → Integrate → Analyze
DMPs – What do they do?
• Outline what you will do with your data
during and after you complete your research
• Submitted to funders – formal document
• Functional DMP – working document
– Start developing during design
– Use to guide project start-up
– Review and update throughout the project
DMPs – Why?
• Doing it right saves you time and makes your
research more efficient
– Document crucial information for your thesis or
dissertation
• Makes it easier to preserve and share your data
• Increases visibility of research
Data management is an
investment in your research to
make it easier and more efficient.
A dose of DMP realism
My data management plan – a satire
DMP
Introduction to the DMP
• Workshop - emphasis on planning
• BUT it is a working document
Sections to draft
• Data description
• Existing data (if applicable)
• Format
Mapping Data Outcomes
• Clearly describe what you want your research
project to accomplish
• Define what the data need to be in order for
you to answer your research questions
• Review example
DMP
Data mapping exercise – map out
research questions through data
fields/points/variables
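One way to keep the data map as a working artifact is a small table that links each research question to the variables needed to answer it. The sketch below is only an illustration in Python: the research questions, the variable names, and both helper functions are hypothetical placeholders, not part of the workshop materials.

```python
# Hypothetical data map: research question -> variables needed to answer it.
data_map = {
    "RQ1: Does tutoring improve exam scores?": ["student_id", "tutoring_hours", "exam_score"],
    "RQ2: Does the effect differ by program?": ["student_id", "program", "exam_score"],
}

# Hypothetical list of variables the project plans to collect.
planned_variables = ["student_id", "tutoring_hours", "exam_score", "program", "shoe_size"]


def missing_variables(data_map, planned):
    """Return, per question, the needed variables that are not planned for collection."""
    planned_set = set(planned)
    gaps = {}
    for question, needed in data_map.items():
        absent = [v for v in needed if v not in planned_set]
        if absent:
            gaps[question] = absent
    return gaps


def unmapped_variables(data_map, planned):
    """Return planned variables that no research question uses."""
    used = {v for needed in data_map.values() for v in needed}
    return [v for v in planned if v not in used]
```

A gap reported by `missing_variables` suggests a question that cannot be answered with the planned data; anything returned by `unmapped_variables` may be a field that does not need to be collected.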
References
1. Carlson, J. (2013). ICPSR Curating and Managing Data for
Reuse: Life Cycle Models and Principles.
2. DataONE Education Module: Data Management Planning.
DataONE. From
http://www.dataone.org/sites/all/documents/L03_DataManagementPlanning.pptx
3. Humphrey, C. (2008). e-Science and the Life Cycle of
Research. From
http://datalib.library.ualberta.ca/~humphrey/lifecycle-science060308.doc
4. Whitmire, A. (2013). Research Life Cycle. From
http://guides.library.oregonstate.edu/content.php?pid=502068&sid=4136875
ETHICAL & LEGAL OBLIGATIONS
MODULE 1
LEARNING OUTCOMES
• Identify your legal obligations for sharing and long-term preservation.
• Identify your ethical obligations for ensuring data confidentiality, privacy, and security.
• Describe intellectual property issues for data that result in a patentable or commercial product.
Ethical vs. Legal
• Ethical (Professional Society, Licensure, Community of Practice)
– Sharing (consent, IRB approval, de-identification, etc.)
– Redistribution & Re-use
– Citation
• Legal (Federal, State, Local, Funding Agency, Institution)
– Intellectual Property (e.g., who owns it?)
– Copyright
– Patents
– Trade secrets
– Licensing
– Monetary exchange
– Open source vs. proprietary software
– Data retention
Privacy
• Privacy: having control over the extent, timing, and
circumstances of sharing oneself (physically, behaviorally, or
intellectually) with others.
• Federal laws: FERPA, HIPAA
• Most research involves asking subjects to provide or release
information voluntarily following an informed consent
process.
• Privacy issues arise in regard to information obtained for
research purposes without the consent of the subjects.
Confidentiality
• Confidentiality: treatment of information that an individual has
disclosed in a relationship of trust and with the expectation that it
will not be divulged to others in ways that are inconsistent with the
understanding of the original disclosure without permission.
• Questions to consider:
– Are identifiers really needed or could data be collected anonymously?
– If identifiers are needed, can coded IDs be created to use for data collection,
merging, and analysis, with identifiers kept entirely separate and secure?
– How will the data be protected from inadvertent disclosure or unauthorized
access during collection, storage, and analysis?
– Should data be manipulated in specific ways to reduce specificity, e.g., by
collapsing categories with small numbers of individuals, reducing age
or geographic specificity, etc.?
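The coded-ID and reduced-specificity questions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the field names, and the 10-year band width are hypothetical, and a real project should follow its IRB-approved protocol for de-identification.

```python
import secrets


def assign_coded_ids(records, identifier_fields):
    """Split records into a linking key (identifiers) and de-identified study data.

    records: list of dicts, one per participant.
    identifier_fields: names of the fields that identify a person.
    Returns (key_rows, data_rows); matching rows share the same random 'coded_id'.
    """
    key_rows, data_rows = [], []
    used = set()
    for record in records:
        # Random code, so the ID itself reveals nothing about the person.
        coded_id = secrets.token_hex(4)
        while coded_id in used:
            coded_id = secrets.token_hex(4)
        used.add(coded_id)

        key_row = {"coded_id": coded_id}
        data_row = {"coded_id": coded_id}
        for field, value in record.items():
            if field in identifier_fields:
                key_row[field] = value   # goes to the separately secured key file
            else:
                data_row[field] = value  # goes to the analysis data set
        key_rows.append(key_row)
        data_rows.append(data_row)
    return key_rows, data_rows


def age_band(age, width=10):
    """Collapse an exact age into a band such as '30-39' to reduce specificity."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"
```

The key rows would be stored apart from the study data, with stricter access controls, so that the analysis files never carry names or contact details.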
Intellectual Property Rights
• Patent
• Copyright
• Trademark
• Design
• Circuit Layout Right
• Plant Breeder’s Right
• Trade Secret
DMP
Sections to work on:
• Ethics and privacy
• Legal obligations
References
1. Australian Research Council. (nd). National Principles of
Intellectual Property Management for Publicly Funded
Research. From http://www.arc.gov.au/pdf/01_01.pdf
STORAGE & BACKUP
MODULE 1
LEARNING OUTCOMES
• Prepare a comprehensive storage and backup plan.
• Create protected copies of files at crucial points in your study.
Storage & Back-up Plan
• Storage
– Keep primary copies in a secure, accessible location
• Backup
– Additional copies to prevent data loss
– Rule of 3
– Diversify hardware, software, and physical location
• Other considerations
– Security, encryption, compression
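As a rough illustration of keeping several copies and confirming that each one is intact, the Python sketch below copies one file into multiple locations and compares SHA-256 checksums. The function names and the destination folders are hypothetical; in practice the destinations would be different devices or services, not folders on the same disk.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_to_locations(source, destinations):
    """Copy one file into each destination directory and verify every copy.

    Returns a dict mapping each copied path to True if its checksum
    matches the checksum of the source file.
    """
    source = Path(source)
    expected = sha256_of(source)
    results = {}
    for directory in destinations:
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        results[str(target)] = sha256_of(target) == expected
    return results
```

Any `False` value in the result signals a copy that should not be trusted as a backup.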
Storage @ IU
• Box @ IU
– http://kb.iu.edu/data/bdsv.html
• Research File System
– http://kb.iu.edu/data/aroz.html
• Scholarly Data Archive
– http://kb.iu.edu/data/aiyi.html
• REDCap
– http://www.indianactsi.org/rct
• Slashtmp (sharing)
– http://kb.iu.edu/data/angt.html
Backup Plan
• Rule of 3
– Local copy (ex: desktop or laptop)
– Semi-local copy (ex: IU cloud storage)
– Remote copy (ex: IU cloud storage)
• Backup frequency
– How much data can you risk losing?
• Backup procedure
– Manual or automatic?
– Full or incremental?
– Verification/testing?
– Documentation
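The "full or incremental" and "verification/testing" questions can be made concrete with a short Python sketch. It assumes a simple folder-to-folder backup where the backup folder sits outside the source folder; the function names are hypothetical and this is not a replacement for institutional backup tools.

```python
import filecmp
import shutil
from pathlib import Path


def incremental_backup(source_dir, backup_dir):
    """Copy only files that are new or whose content differs from the backup copy.

    Returns the relative paths (POSIX style) that were copied on this run.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    copied = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(source_dir)
        target = backup_dir / relative
        # shallow=False compares file contents, not just size and timestamps.
        if target.exists() and filecmp.cmp(source, target, shallow=False):
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        copied.append(relative.as_posix())
    return copied


def verify_backup(source_dir, backup_dir):
    """Return relative paths that are missing from, or differ in, the backup."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(source_dir)
        target = backup_dir / relative
        if not target.exists() or not filecmp.cmp(source, target, shallow=False):
            problems.append(relative.as_posix())
    return problems
```

Running `incremental_backup` on a schedule and logging what `verify_backup` returns covers the procedure, verification, and documentation points in one place. Note that this sketch never deletes anything from the backup, so files removed from the source remain in the backup folder.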
Security & Encryption
• Use IU systems
– Strong authentication protocols
• Encryption
– Useful for portable devices (e.g., laptops, external hard
drives, flash drives, smartphones, etc.)
– Use for highly sensitive data
– IU recommendations
• http://kb.iu.edu/data/ayzi.html
• http://kb.iu.edu/data/bcnh.html
Master Files
• Provide snapshots of key phases in the data life
cycle
– Raw
– Cleaned
– Phases of processing
• In combination with detailed documentation, these
files make write-up easier and support
reproducibility and reuse
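A protected master copy can be approximated with a timestamped, read-only file plus a recorded checksum. The Python sketch below is one possible approach, not a prescribed procedure: the function name, the `MANIFEST.txt` file, and the naming pattern are all assumptions made for illustration.

```python
import hashlib
import shutil
import stat
from datetime import datetime
from pathlib import Path


def snapshot_master_file(data_file, archive_dir, phase):
    """Save a read-only, labeled copy of a data file and record its SHA-256 checksum.

    phase: a short label such as 'raw' or 'cleaned'.
    Returns the path of the protected copy.
    """
    data_file, archive_dir = Path(data_file), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = archive_dir / f"{data_file.stem}_{phase}_{stamp}{data_file.suffix}"
    shutil.copy2(data_file, target)

    # Reads the whole file into memory; adequate for small to medium files.
    checksum = hashlib.sha256(target.read_bytes()).hexdigest()
    # Append the checksum so the copy can be checked for changes later.
    with open(archive_dir / "MANIFEST.txt", "a", encoding="utf-8") as manifest:
        manifest.write(f"{checksum}  {target.name}\n")

    # Remove write permission to guard against accidental edits.
    target.chmod(stat.S_IREAD | stat.S_IRGRP | stat.S_IROTH)
    return target
```

Calling this once for the raw file and again after cleaning leaves a dated trail of snapshots that the documentation can refer to by file name.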
EF-5 Horror Stories
• World’s Biggest Data Breaches:
http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/
• Excel error responsible for misinterpretation of data and
resulting policy decisions:
http://arstechnica.com/tech-policy/2013/04/microsoft-excel-the-ruiner-of-global-economies/
• Sandy’s floodwaters damage 1500 volumes of digital art:
http://www.theverge.com/2013/1/15/3876790/eyebeam-hurricane-sandy-digital-archive-rescue
EF-3 Horror Stories
• UNC Researcher Demoted over data breach:
– http://www.insidehighered.com/news/2011/01/27/unc_case_highlights_debate_about_data_security_and_accountability_for_hacks
– http://www.databreaches.net/cancer-researcher-fights-unc-demotion-over-data-breach/
• UK Tamiflu Clinical Trial data:
http://blogs.plos.org/speakingofmedicine/2014/01/03/follow-the-money-or-why-it-took-an-accounts-committee-to-decide-why-access-to-clinical-trial-data-matters/
• Data loss at Emory Healthcare exposes over 315,000 patients:
http://www.bizjournals.com/atlanta/news/2012/04/18/data-loss-at-emory-healthcare-exposes.html?s=print
EF-1 Horror Stories
• PLoS Retraction:
http://retractionwatch.com/2013/01/30/study-links-failure-to-share-data-with-poor-quality-research-and-leads-to-a-plos-one-retraction/
• Stolen laptops, flash drives, etc:
http://news.cnet.com/8301-17938_105-20028475-1.html
http://gawker.com/5625139/grad-students-thesis-dreams-on-stolen-laptop
http://www.techrepublic.com/forums/questions/help-i-am-afraid-ive-lost-my-dissertation/
• Data Management & Sharing Snafu in 3 acts:
https://www.youtube.com/watch?v=N2zK3sAtr-4
Minute Paper
Describe how your storage and backup
plan will address the key risks for your
data.
Don’t forget to upload your paper to Box.
DMP
Sections to work on:
• Data organization
–Storage & Backup Plan
Don’t forget to upload your DMP to Box.
Wrapping up
What’s next?
Discussion
• What worked?
• What didn’t?

Data Management Lab: Session 1 Slides

  • 1. Research Data Management Spring 2014: Session 1 Practical strategies for better results University Library Center for Digital Scholarship
  • 2. Acknowledgements Department of Biostatistics – Data Management, Indiana University School of Medicine Colleagues at Johns Hopkins University, Purdue University, Oregon State University, University of Oregon, New York University, and others who shared their expertise.
  • 3. ROAD MAP FOR THIS LAB
  • 4. Overview • Four sessions, 2 hours each • Some lecture, more discussion and activities • Major products – Practical, detailed data management plan [DRAFT] – Map of data outcomes – Storage & backup plan – Documentation checklist – Data quality standards – Screening & cleaning checklist
  • 5. Products & Resources • Box folders – Session 1, 2, 3, 4: Materials for each session – Resources: Miscellaneous resources that span sessions or are useful later – Upload HERE: Folder for uploading products • Will be used to assess my teaching – content & delivery • Will NOT be used to assess you • Please delete your name from the file before you upload them
  • 6. 1. Research data management plans & planning 2. Documentation & metadata 3. Data quality 4. Ethical & Legal issues in data sharing & reuse
  • 7. Session 1 1. Research data management plans & planning a) Planning for good data management from the start b) Defining expected outcomes for your data c) Getting a storage and backup plan
  • 8. Activities & Discussions • Introductions (<1 minute each) –Name –Department or Program –What do you want to get out of these workshops?
  • 9. INTRODUCTION TO RESEARCH DATA MANAGEMENT MODULE 1
  • 10. LEARNING OUTCOMES • Describe key challenges associated with managing digital research data • Identify the potential consequences for irresponsible or inattentive data management
  • 11. Data Deluge Data is collected from sensors, sensor networks, remote sensing, observations, and more; this calls for increased attention to data management and stewardship. Photo courtesy of www.carboafrica.net. Photo courtesy of http://modis.gsfc.nasa.gov/. Photo courtesy of http://www.futurlec.com. CC image by tajai on Flickr. CC image by CIMMYT on Flickr. Image collected by Viv Hutchinson.
  • 12. The World of Data Around Us [Chart: petabytes worldwide, 2005 to 2010, vertical axis 0 to 1,000,000; series: Information and Available Storage; annotation: "Transient information or unfilled demand for storage"] Source: John Gantz, IDC Corporation: The Expanding Digital Universe
  • 13. Why Data Management • Natural disaster • Facilities infrastructure failure • Storage failure • Server hardware/software failure • Application software failure • External dependencies (e.g. PKI failure) • Format obsolescence • Legal encumbrance • Human error • Malicious attack by human or automated agents • Loss of staffing competencies • Loss of institutional commitment • Loss of financial stability • Changes in user expectations and requirements CC image by Sharyn Morrow on Flickr. CC image by momboleum on Flickr.
  • 14. Best Practices Poor data practice results in loss of information (data entropy) [Figure: information content over time; labels: time of publication, specific details, general details, accident, retirement or career change, death (Michener et al. 1997)] Source: Best Practices for Preparing Ecological Data Sets, ESA, August 2010
  • 16. “MEDICARE PAYMENT ERRORS NEAR $20B” (CNN) December 2004 Miscoding and Billing Errors from Doctors and Hospitals totaled $20,000,000,000 in FY 2003 (9.3% error rate) . The error rate measured claims that were paid despite being medically unnecessary, inadequately documented or improperly coded. In some instances, Medicare asked health care providers for medical records to back up their claims and got no response. The survey did not document instances of alleged fraud. This error rate actually was an improvement over the previous fiscal year (9.8% error rate). “AUDIT: JUSTICE STATS ON ANTI-TERROR CASES FLAWED” (AP) February 2007 The Justice Department Inspector General found only two sets of data out of 26 concerning terrorism attacks were accurate. The Justice Department uses these statistics to argue for their budget. The Inspector General said the data “appear to be the result of decentralized and haphazard methods of collections … and do not appear to be intentional.” “OOPS! TECH ERROR WIPES OUT Alaska Info” (AP) March 2007 A technician managed to delete the data and backup for the $38 billion Alaska oil revenue fund – money received by residents of the State. Correcting the errors cost the State an additional $220,700 (which of course was taken off the receipts to Alaska residents.) Slide courtesy of BLM
  • 18. Benefits of GOOD Data Management • Efficiency • Safety • Quality • Reputation • Compliance
  • 19. Minute paper Why should we care about how research data is managed? [Subtext: Why should researchers spend time managing their data better?] Don’t forget to upload your paper to Box.
  • 20. References 1. DataONE Education Module: Data Management. DataONE. Retrieved December 2013. From http://www.dataone.org/sites/all/documents/L01_DataManagement.pptx 2. Cook, B. (2013). NACP All Investigator Meeting: Data Management Practices for Early Career Scientists. Presented February 3, 2013. From http://daac.ornl.gov/NACP_AIM_2013/NACP_AIM_Agenda.html 3. Vines et al. (2014). The availability of research data declines rapidly with article age. Current Biology. http://dx.doi.org/10.1016/j.cub.2013.11.014
  • 21. DATA MANAGEMENT PLANS & PLANNING MODULE 1
  • 22. LEARNING OUTCOMES • Understand the life cycle approach to managing research data • Summarize the basic components of US federal funding agency requirements for data management and sharing. • Outline planned project and data documentation in a data management plan. • Define expected outcomes for data.
  • 23. The Life Cycle Approach • Helps define and explain complex processes (graphically). (Carlson, 2013) • Helps identify important components, roles, responsibilities, milestones, etc. (Carlson, 2013) • Demonstrates connections and relationships between parts and the whole. (Carlson, 2013) • Emphasizes the role of data management as an active process embedded throughout the research and knowledge creation life cycles.
  • 27. Progress Towards Openness 1985: National Research Council 1999: OMB Circular A-110 revisions 2003: NIH Data Sharing Policy 2008: NIH Public Access Policy 2011: NSF DMP Requirement 2012: NEH, Office of Digital Humanities DMP Requirement 2013: NSF Biosketch change 2013: OSTP Memo on Public Access to the Results of Federally-Funded Research
  • 28. OSTP Memo - February 2013 • Data – Maximize access by the general public and without charge…protecting confidentiality and personal privacy – …recognizing proprietary interests, business confidential information, and intellectual property rights – …preserving the balance between the relative value of long-term preservation and access and administrative burden – …ensure all researchers develop data management plans – Ensure appropriate evaluation of the merits of submitted DMPs – Promote the deposit of data in publicly accessible databases – …support training, education, and workforce development related to scientific data management, analysis, storage, preservation, and stewardship
  • 29. Policy Drivers • Funding agencies – Increased impact of funding dollars – Reduce redundant data collection – Further scientific research • Research Communities – Enhance use and value of existing data – Address big challenges
  • 31. DMPs – What do they do? • Outline what you will do with your data during and after your research • Submitted to funders – formal document • Functional DMP – working document – Start developing during design – Use to guide project start-up – Review and update throughout the project
  • 32. DMPs – Why? • Doing it right saves you time and makes your research more efficient – Document crucial information for your thesis or dissertation • Makes it easier to preserve and share your data • Increases visibility of research Data management is an investment in your research to make it easier and more efficient.
  • 33. A dose of DMP realism My data management plan – a satire
  • 34. DMP Introduction to the DMP • Workshop - emphasis on planning • BUT it is a working document Sections to draft • Data description • Existing data (if applicable) • Format
  • 35. Mapping Data Outcomes • Clearly describe what you want your research project to accomplish • Define what the data need to be in order for you to answer your research questions • Review example
  • 36. DMP Data mapping exercise – map out research questions through data fields/points/variables
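One possible way to record the result of the data mapping exercise is sketched below in Python. It is only an illustration: the class names, the fields, and the two example research questions are hypothetical and not part of the workshop materials.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One data field/point/variable that will be collected."""
    name: str
    data_type: str
    units: str = ""


@dataclass
class ResearchQuestion:
    """A research question and the variables needed to answer it."""
    text: str
    variables: list = field(default_factory=list)


def unmapped_questions(questions):
    """Return the text of every question that has no variables mapped yet."""
    return [q.text for q in questions if not q.variables]


# Hypothetical example map; replace with your own questions and variables.
example_map = [
    ResearchQuestion(
        "Does sleep duration predict exam performance?",
        [Variable("sleep_hours", "float", "hours"),
         Variable("exam_score", "integer", "points")],
    ),
    ResearchQuestion("Does the effect differ by class year?"),
]
```

Calling `unmapped_questions(example_map)` lists the questions that still need data fields defined, which can serve as a to-do list while drafting the data description section of the DMP.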
  • 37. References 1. Carlson, J. (2013). ICPSR Curating and Managing Data for Reuse: Life Cycle Models and Principles. 2. DataONE Education Module: Data Management Planning. DataONE. From http://www.dataone.org/sites/all/documents/L03_DataManagementPlanning.pptx 3. Humphrey, C. (2008). e-Science and the Life Cycle of Research. From http://datalib.library.ualberta.ca/~humphrey/lifecycle-science060308.doc 4. Whitmire, A. (2013). Research Life Cycle. From http://guides.library.oregonstate.edu/content.php?pid=502068&sid=4136875
  • 38. ETHICAL & LEGAL OBLIGATIONS MODULE 1
  • 39. LEARNING OUTCOMES • Identify your legal obligations for sharing and long-term preservation. • Identify your ethical obligations for ensuring data confidentiality, privacy, and security. • Describe intellectual property issues for data that result in a patentable or commercial product.
  • 40. Ethical vs. Legal • Ethical (Professional Society, Licensure, Community of Practice) – Sharing (consent, IRB approval, de-identification, etc.) – Redistribution & Re-use – Citation • Legal (Federal, State, Local, Funding Agency, Institution) – Intellectual Property (e.g., who owns it?) – Copyright – Patents – Trade secrets – Licensing – Monetary exchange – Open source vs. proprietary software – Data retention
  • 41. Privacy • Privacy: having control over the extent, timing, and circumstances of sharing oneself (physically, behaviorally, or intellectually) with others. • Federal guidelines: FERPA, HIPAA • Most research involves asking subjects to provide or release information voluntarily following an informed consent process. • Privacy issues arise in regard to information obtained for research purposes without the consent of the subjects.
  • 42. Confidentiality • Confidentiality: treatment of information that an individual has disclosed in a relationship of trust, with the expectation that it will not be divulged to others without permission in ways that are inconsistent with the understanding of the original disclosure. • Questions to consider: – Are identifiers really needed or could data be collected anonymously? – If identifiers are needed, can coded IDs be created to use for data collection, merging, and analysis, with identifiers kept entirely separate and secure? – How will the data be protected from inadvertent disclosure or unauthorized access during collection, storage, and analysis? – Should data be manipulated in specific ways to reduce specificity, e.g., by collapsing categories that contain small numbers of individuals or by reducing age or geographic specificity?
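The coded-ID and reduced-specificity questions above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a vetted de-identification procedure: the `P0001`-style ID format, the 10-year age bands, and the function names are all hypothetical choices made for the example.

```python
import random


def split_identifiers(records, id_fields, seed=None):
    """Separate direct identifiers from study data using coded IDs.

    Returns (key_table, coded_records). The key table links each coded ID
    to the identifiers and should be stored separately and securely.
    """
    rng = random.Random(seed)
    order = list(range(len(records)))
    rng.shuffle(order)  # so coded IDs do not reveal the original record order

    key_table, coded_records = [], []
    for code_number, index in enumerate(order, start=1):
        record = records[index]
        coded_id = f"P{code_number:04d}"  # assumed ID format
        identifiers = {name: record[name] for name in id_fields}
        data = {k: v for k, v in record.items() if k not in id_fields}
        key_table.append({"coded_id": coded_id, **identifiers})
        coded_records.append({"coded_id": coded_id, **data})
    return key_table, coded_records


def age_band(age, width=10):
    """Reduce age specificity by reporting a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"
```

In this sketch only `coded_records` would be used for merging and analysis, while `key_table` would be written to a separate, access-controlled location. Whether banding ages is sufficient depends on the data and on IRB guidance.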
  • 43. Intellectual Property Rights • Patent • Copyright • Trademark • Design • Circuit Layout Right • Plant Breeder’s Right • Trade Secret
  • 44. DMP Sections to work on: • Ethics and privacy • Legal obligations
  • 45. References 1. Australian Research Council. (nd). National Principles of Intellectual Property Management for Publicly Funded Research. From http://www.arc.gov.au/pdf/01_01.pdf
  • 47. LEARNING OUTCOMES • Prepare a comprehensive storage and backup plan. • Create protected copies of files at crucial points in your study.
  • 48. Storage & Back-up Plan • Storage – Keep primary copies in a secure, accessible location • Backup – Additional copies to prevent data loss – Rule of 3 – Diversify hardware, software, and physical location • Other considerations – Security, encryption, compression
  • 49. Storage @ IU • Box @ IU – http://kb.iu.edu/data/bdsv.html • Research File System – http://kb.iu.edu/data/aroz.html • Scholarly Data Archive – http://kb.iu.edu/data/aiyi.html • REDCap – http://www.indianactsi.org/rct • Slashtmp (sharing) – http://kb.iu.edu/data/angt.html
  • 50. Backup Plan • Rule of 3 – Local copy (ex: desktop or laptop) – Semi-local copy (ex: IU cloud storage) – Remote copy (ex: IU cloud storage) • Backup frequency – How much data can you risk losing? • Backup procedure – Manual or automatic? – Full or incremental? – Verification/testing? – Documentation
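The backup procedure questions above (manual or automatic, verification/testing) can be illustrated with a short Python sketch that copies one file to two additional locations and verifies each copy by checksum. The function names and the destination folders are assumptions for the example; whether the copies are truly diversified across hardware and physical location depends on where those folders actually live.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy `source` into each destination folder and verify every copy.

    Returns a dict mapping each copy's path to True if its checksum
    matches the source and False otherwise.
    """
    source = Path(source)
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy_path = folder / source.name
        shutil.copy2(source, copy_path)  # copy2 also keeps timestamps
        results[str(copy_path)] = sha256_of(copy_path) == expected
    return results
```

With the local working copy as `source` and two destination folders, this yields the three copies of the Rule of 3. This is a full rather than incremental backup; a script like this could be run on a schedule, and the returned results logged as part of the backup documentation.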
  • 51. Security & Encryption • Use IU systems – Strong authentication protocols • Encryption – Useful for portable devices (e.g., laptops, external hard drives, flash drives, smartphones, etc.) – Use for highly sensitive data – IU recommendations • http://kb.iu.edu/data/ayzi.html • http://kb.iu.edu/data/bcnh.html
  • 52. Master Files • Provide snapshots of key phases in the data life cycle – Raw – Cleaned – Phases of processing • In combination with detailed documentation, these files make write-up easier and support reproducibility and reuse
  • 53. EF-5 Horror Stories • World’s Biggest Data Breaches: http://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/ • Excel error responsible for misinterpretation of data and resulting policy decisions: http://arstechnica.com/tech-policy/2013/04/microsoft-excel-the-ruiner-of-global-economies/ • Sandy’s floodwaters damage 1500 volumes of digital art: http://www.theverge.com/2013/1/15/3876790/eyebeam-hurricane-sandy-digital-archive-rescue
  • 54. EF-3 Horror Stories • UNC Researcher Demoted over data breach: – http://www.insidehighered.com/news/2011/01/27/unc_case_highlights_debate_about_data_security_and_accountability_for_hacks – http://www.databreaches.net/cancer-researcher-fights-unc-demotion-over-data-breach/ • UK Tamiflu Clinical Trial data: http://blogs.plos.org/speakingofmedicine/2014/01/03/follow-the-money-or-why-it-took-an-accounts-committee-to-decide-why-access-to-clinical-trial-data-matters/ • Data loss at Emory Healthcare exposes over 315,000 patients: http://www.bizjournals.com/atlanta/news/2012/04/18/data-loss-at-emory-healthcare-exposes.html?s=print
  • 55. EF-1 Horror Stories • PLoS Retraction: http://retractionwatch.com/2013/01/30/study-links-failure-to-share-data-with-poor-quality-research-and-leads-to-a-plos-one-retraction/ • Stolen laptops, flash drives, etc.: http://news.cnet.com/8301-17938_105-20028475-1.html http://gawker.com/5625139/grad-students-thesis-dreams-on-stolen-laptop http://www.techrepublic.com/forums/questions/help-i-am-afraid-ive-lost-my-dissertation/ • Data Management & Sharing Snafu in 3 acts: https://www.youtube.com/watch?v=N2zK3sAtr-4
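A master file snapshot can be sketched in Python as a dated, read-only copy plus a checksum entry in a manifest. This is one possible approach, not a prescribed procedure: the file naming pattern, the `MANIFEST.txt` name, and the `freeze_master` function are assumptions made for the example.

```python
import hashlib
import os
import shutil
import stat
from datetime import date
from pathlib import Path


def freeze_master(source, master_dir, phase, on=None):
    """Save a protected copy of `source` for a phase such as 'raw' or 'cleaned'.

    The copy is named <stem>_<phase>_<YYYY-MM-DD><suffix>, made read-only,
    and its SHA-256 checksum is appended to MANIFEST.txt in `master_dir`.
    """
    source = Path(source)
    master_dir = Path(master_dir)
    master_dir.mkdir(parents=True, exist_ok=True)

    stamp = (on or date.today()).isoformat()
    target = master_dir / f"{source.stem}_{phase}_{stamp}{source.suffix}"
    if target.exists():
        # Never overwrite an existing master file.
        raise FileExistsError(f"master file already exists: {target}")

    shutil.copy2(source, target)
    # Reads the whole file into memory; fine for a sketch, not for huge files.
    checksum = hashlib.sha256(target.read_bytes()).hexdigest()
    os.chmod(target, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)  # read-only

    with open(master_dir / "MANIFEST.txt", "a", encoding="utf-8") as manifest:
        manifest.write(f"{checksum}  {target.name}\n")
    return target
```

Calling `freeze_master` once on the raw file and again after cleaning and each processing phase leaves a trail of protected copies whose checksums can later be rechecked against the manifest.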
  • 56. Minute Paper Describe how your storage and backup plan will address the key risks for your data. Don’t forget to upload your paper to Box.
  • 57. DMP Sections to work on: • Data organization –Storage & Backup Plan Don’t forget to upload your DMP to Box.
  • 58. Wrapping up What’s next? Discussion • What worked? • What didn’t?