SlideShare una empresa de Scribd logo
1 de 30
Why Data Management
Lesson 1: Introduction to Data Management
Why Data Management?
CCimagebyUniversityofMarylandPressReleasesonFlickr
Why Data Management
• The data world around us
• Importance of data management
• The data lifecycle
• The case for data management
CCimagebyinterpunctonFlickr
Why Data Management
After completing this lesson, the participant will be able to:
• Give two general examples of why increasing amounts of
data is a concern
• Explain, using two examples, how lack of data management
makes an impact
• Define the research data lifecycle
• Give one example of how well-managed data can result in
new scientific conclusions
Why Data Management
Images collected by DataOne.org
Why Data Management
Photocourtesyofwww.carboafrica.net
Data are collected from sensors, sensor
networks, remote sensing, observations,
and more - this calls for increased attention
to data management and stewardship
Photocourtesyof
http://modis.gsfc.nasa.gov/
Photocourtesyof
http://www.futurlec.com
CCimagebytajaionFlickr
CCimagebyCIMMYTonFlickr
ImagecollectedbyVivHutchinson
Source: John Gantz, IDC Corporation: The Expanding Digital Universe
0
100,000
200,000
300,000
400,000
500,000
600,000
700,000
800,000
900,000
1,000,000
2005 2006 2007 2008 2009 2010
Transient
information
or unfilled
demand for
storage
Information
Available Storage
PetabytesWorldwide
Why Data Management
• Natural disaster
• Facilities infrastructure failure
• Storage failure
• Server hardware/software failure
• Application software failure
• External dependencies (e.g. PKI
failure)
• Format obsolescence
• Legal encumbrance
• Human error
• Malicious attack by human or
automated agents
• Loss of staffing competencies
• Loss of institutional commitment
• Loss of financial stability
• Changes in user expectations and
requirements
CCimagebySharynMorrowonFlickr
CCimagebymomboleumonFlickr
Why Data Management
“MEDICARE PAYMENT ERRORS NEAR $20B” (CNN) December 2004
Miscoding and billing errors from doctors and hospitals totaled $20 billion in FY 2003 (9.3%
error rate). The error rate measured claims that were paid despite being medically
unnecessary, inadequately documented, or improperly coded. This error rate actually was an
improvement over the previous fiscal year (9.8% error rate).
“AUDIT: JUSTICE STATS ON ANTI-TERROR CASES FLAWED” (AP) February 2007
The Justice Department Inspector General found only two sets of data out of 26 concerning
terrorism attacks were accurate. The Justice Department uses these statistics to argue for
their budget. The Inspector General said the data “appear to be the result of decentralized
and haphazard methods of collections … and do not appear to be intentional.”
“SOCIAL SECURITY DATA CAN TURN PEOPLE INTO THE LIVING DEAD” (NPR) August 2016
In 2011, an audit found that about 1,000 people a month in the U.S. were marked deceased
when they were very much alive. Rona Lawson, who works in the Office of the Inspector
General at the Social Security Administration, says that number has gone down. It's now
around 500 people a month. Lawson says 90 percent of the time, the cascade of
misinformation starts with an input error by Social Security staff — a regular mistake on a
regular office day that just happens to kill a person off, at least on paper.
Slide courtesy of BLM
A wildlife biologist for a small field office was the in-house GIS
expert and provided support for all the staff’s GIS needs. However,
the data were stored on her own workstation. When the biologist
relocated to another office, no one understood how the data were
stored or managed.
Solution: A state office GIS specialist retrieved the workstation and
sifted through files trying to salvage relevant data.
Cost: 1 work month ($4,000) plus the value of
data that were not recovered
Consider that the situation could have been worse, because the data
were not being backed up as they would have been if stored on a server.
Poor Science Data Management Example
Why Data Management
In preparation for a Resource Management Plan, an office
discovered 14 duplicate GPS inventories of roads. However,
because none of the inventories had enough metadata, it was
impossible to know which inventory was best or if any of the
inventories actually met their requirements.
Solution: Re-Inventory roads
Cost: Estimated 9 work months/inventory
@$4,000/work month
(14 inventories = $504,000)
CCimagebyruffin_readyonFlickr
Why Data Management
Why Data Management
“Please forgive my paranoia about protocols, standards, and data review. I'm
in the latter stages of a long career with USGS (30 years, and counting), and
have experienced much. Experience is the knowledge you get just after you
needed it.
Several times, I've seen colleagues called to court in order to testify about
conditions they have observed.
Without a strong tradition of constant review and approval of basic data, they
would've been in deep trouble under cross-examination. Instead, they were
able to produce field notes, data approval records, and the like, to back up
their testimony.
It's one thing to be questioned by a college student who is working on a
project for school. It's another entirely to be grilled by an attorney under oath
with the media present.”
- Nelson Williams, Scientist
US Geological Survey
Why Data Management
The climate scientists at the centre of a media storm
over leaked emails were yesterday cleared of
accusations that they fudged their results and silenced
critics, but a review found they had failed to be open
enough about their work.
Why Data Management
• Manage your data for yourself:
o Keep yourself organized – be able to find your files (data
inputs, analytic scripts, outputs at various stages of the
analytic process, etc.)
o Track your science processes for reproducibility – be able to
match up your outputs with exact inputs and
transformations that produced them
o Better control versions of data – easily identify versions
that can be periodically purged
o Quality control your data more efficiently
Why Data Management
• To avoid data loss (e.g. making backups)
• Format your data for re-use (by yourself or others)
• Be prepared: Document your data for your own
recollection, accountability, and re-use (by yourself or
others)
• Gain credibility and recognition for your science efforts
through data sharing!
CCimagebyUWWResNetonFlickr
Why Data Management
• Data is a valuable asset – it is expensive and time consuming to
collect
• Data should be managed to:
o maximize the effective use and value of data and information
assets
o continually improve the quality including: data accuracy, integrity,
integration, timeliness of data capture and presentation,
relevance, and usefulness
o ensure appropriate use of data and information
o facilitate data sharing
o ensure sustainability and accessibility in long term for re-use in
science
Why Data Management
Here are a few reasons (from the UK Data Archive):
• Increases the impact and visibility of research
• Promotes innovation and potential new data uses
• Leads to new collaborations between data users and creators
• Maximizes transparency and accountability
• Enables scrutiny of research findings
• Encourages improvement and validation of research methods
• Reduces cost of duplicating data collection
• Provides important resources for education and training
Why Data Management
Spatio-Temporal Exploratory
Models predict the
probability of occurrence of
bird species across the United
States at a 35 km x 35 km
grid.
Land Cover
Potential Uses-
• Examine patterns of migration
• Infer impacts of climate change
• Measure patterns of habitat usage
• Measure population trends
Model results
eBird
Meteorology
MODIS –
Remote
sensing data
Occurrence of Indigo Bunting (2008)
Jan Sep DecJunApr
Slide courtesy of DataOne
Why Data Management
A new image processing technique reveals something not before seen in this Hubble Space
Telescope image taken 11 years ago: A faint planet (arrows), the outermost of three
discovered with ground-based telescopes last year around the young star HR 8799.D.
Lafrenière et al., Astrophysical Journal Letters.
“The first thing it tells you is how valuable maintaining long-term archives can be. Here is a major
discovery that’s been lurking in the data for about 10 years!” comments Matt Mountain, director
of the Space Telescope Science Institute in Baltimore, which operates Hubble.
“The second thing its tells you is having a well calibrated archive is necessary but not sufficient to
make breakthroughs — it also takes a very innovative group of people to develop very smart
extraction routines that can get rid of all the artifacts to reveal the planet hidden under all that
telescope and detector structure.”
D.Lafrenièreetal.,ApJLetters
Why Data Management
Plan
Collect
Assure
Describe
Preserve
Discover
Integrate
Analyze
Why Data Management
• …there are best practices… and… tools to help!
• The following data management lessons will illustrate in
detail each stage of the data lifecycle
• Your well-managed and accessible data can contribute to
science in ways you may not even imagine today!
Why Data Management
• If data are:
o Well-organized
o Documented
o Preserved
o Accessible
o Verified as to accuracy and validity
• Result is:
o High quality data
o Easy to share and re-use in science
o Citation and credibility to the researcher
o Cost-savings to science
Why Data Management
• The data deluge has created a surge of information that
needs to be well-managed and made accessible.
• The cost of not doing data management can be very high.
• Be cognizant of best practices and tools associated with the
data lifecycle to manage your data well.
• Many benefits are associated with the act of managing
data, including the ability to find, access, understand,
integrate, and re-use data.
Why Data Management
1. Chatfield, T., Selbach, R. February 2011. Data Management Training
Workshop. Bureau of Land Management (BLM).
2. Strasser, Carly. February 2012. Data Management for Scientists.
http://www.slideshare.net/carlystrasser/oceansciences2012workshop
3. UK Data Archive. May 2011. Managing and Sharing Data: Best
Practices for Researchers. http://www.data-
archive.ac.uk/media/2894/managingsharing.pdf
4. DAMA International, The DAMA Guide to the Data Management Body
of Knowledge. https://www.dama.org/content/body-knowledge
Why Data Management
The full slide deck may be downloaded from:
http://www.dataone.org/education-modules
Suggested citation:
DataONE Education Module: Data Management. DataONE.
Retrieved Nov 16, 2016. From
http://www.dataone.org/sites/all/documents/L01_DataManage
ment.pptx
Copyright license information:
No rights reserved; you may enhance and reuse for
your own purposes. We do ask that you provide
appropriate citation and attribution to DataONE.

Más contenido relacionado

La actualidad más candente

EDI Training Module 12: An Introduction to Metadata and Data Repositories
EDI Training Module 12:  An Introduction to Metadata and Data RepositoriesEDI Training Module 12:  An Introduction to Metadata and Data Repositories
EDI Training Module 12: An Introduction to Metadata and Data RepositoriesEnvironmental Data Initiative
 
Introduction to Data Management
Introduction to Data ManagementIntroduction to Data Management
Introduction to Data ManagementAmanda Whitmire
 
Introduction to data management
Introduction to data managementIntroduction to data management
Introduction to data managementCunera Buys
 
Introduction to the Environmental Data Initiative (EDI)
Introduction to the Environmental Data Initiative (EDI)Introduction to the Environmental Data Initiative (EDI)
Introduction to the Environmental Data Initiative (EDI)Corinna Gries
 
Who will use the open data? Mark Humphries keynote
Who will use the open data? Mark Humphries keynoteWho will use the open data? Mark Humphries keynote
Who will use the open data? Mark Humphries keynoteJisc RDM
 
Basics of Research Data Management
Basics of Research Data ManagementBasics of Research Data Management
Basics of Research Data ManagementOpenAIRE
 
Data Management for Postgraduate students by Lynn Woolfrey
Data Management for Postgraduate students by Lynn WoolfreyData Management for Postgraduate students by Lynn Woolfrey
Data Management for Postgraduate students by Lynn Woolfreypvhead123
 
Data Management Lab: Session 1 Slides
Data Management Lab: Session 1 SlidesData Management Lab: Session 1 Slides
Data Management Lab: Session 1 SlidesIUPUI
 
Collaborative Data Management using OSF
Collaborative Data Management using OSFCollaborative Data Management using OSF
Collaborative Data Management using OSFC. Tobin Magle
 
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...Amanda Whitmire
 
DataONE Education Module 10: Legal and Policy Issues
DataONE Education Module 10: Legal and Policy IssuesDataONE Education Module 10: Legal and Policy Issues
DataONE Education Module 10: Legal and Policy IssuesDataONE
 
DataONE Education Module 09: Analysis and Workflows
DataONE Education Module 09: Analysis and WorkflowsDataONE Education Module 09: Analysis and Workflows
DataONE Education Module 09: Analysis and WorkflowsDataONE
 
Next generation data services at the Marriott Library
Next generation data services at the Marriott LibraryNext generation data services at the Marriott Library
Next generation data services at the Marriott LibraryRebekah Cummings
 
Data Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach DataData Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach Datacunera
 
EDI Training Module 10: EDI Data Repository Overview
EDI Training Module 10:  EDI Data Repository OverviewEDI Training Module 10:  EDI Data Repository Overview
EDI Training Module 10: EDI Data Repository OverviewEnvironmental Data Initiative
 
Who owns the data? Intellectual property considerations for academic research...
Who owns the data? Intellectual property considerations for academic research...Who owns the data? Intellectual property considerations for academic research...
Who owns the data? Intellectual property considerations for academic research...Rebekah Cummings
 
Data Management Lab: Data management plan instructions
Data Management Lab: Data management plan instructionsData Management Lab: Data management plan instructions
Data Management Lab: Data management plan instructionsIUPUI
 

La actualidad más candente (20)

NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
 
EDI Training Module 12: An Introduction to Metadata and Data Repositories
EDI Training Module 12:  An Introduction to Metadata and Data RepositoriesEDI Training Module 12:  An Introduction to Metadata and Data Repositories
EDI Training Module 12: An Introduction to Metadata and Data Repositories
 
Introduction to Data Management
Introduction to Data ManagementIntroduction to Data Management
Introduction to Data Management
 
Introduction to data management
Introduction to data managementIntroduction to data management
Introduction to data management
 
Introduction to the Environmental Data Initiative (EDI)
Introduction to the Environmental Data Initiative (EDI)Introduction to the Environmental Data Initiative (EDI)
Introduction to the Environmental Data Initiative (EDI)
 
Who will use the open data? Mark Humphries keynote
Who will use the open data? Mark Humphries keynoteWho will use the open data? Mark Humphries keynote
Who will use the open data? Mark Humphries keynote
 
Basics of Research Data Management
Basics of Research Data ManagementBasics of Research Data Management
Basics of Research Data Management
 
Data Management for Postgraduate students by Lynn Woolfrey
Data Management for Postgraduate students by Lynn WoolfreyData Management for Postgraduate students by Lynn Woolfrey
Data Management for Postgraduate students by Lynn Woolfrey
 
Data Management Lab: Session 1 Slides
Data Management Lab: Session 1 SlidesData Management Lab: Session 1 Slides
Data Management Lab: Session 1 Slides
 
Collaborative Data Management using OSF
Collaborative Data Management using OSFCollaborative Data Management using OSF
Collaborative Data Management using OSF
 
Valen Metadata and the [Data] Repository
Valen Metadata and the [Data] RepositoryValen Metadata and the [Data] Repository
Valen Metadata and the [Data] Repository
 
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
IDCC Workshop: Analysing DMPs to inform research data services: lessons from ...
 
DataONE Education Module 10: Legal and Policy Issues
DataONE Education Module 10: Legal and Policy IssuesDataONE Education Module 10: Legal and Policy Issues
DataONE Education Module 10: Legal and Policy Issues
 
DataONE Education Module 09: Analysis and Workflows
DataONE Education Module 09: Analysis and WorkflowsDataONE Education Module 09: Analysis and Workflows
DataONE Education Module 09: Analysis and Workflows
 
Next generation data services at the Marriott Library
Next generation data services at the Marriott LibraryNext generation data services at the Marriott Library
Next generation data services at the Marriott Library
 
Data Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach DataData Literacy: Creating and Managing Reserach Data
Data Literacy: Creating and Managing Reserach Data
 
EDI Training Module 10: EDI Data Repository Overview
EDI Training Module 10:  EDI Data Repository OverviewEDI Training Module 10:  EDI Data Repository Overview
EDI Training Module 10: EDI Data Repository Overview
 
Who owns the data? Intellectual property considerations for academic research...
Who owns the data? Intellectual property considerations for academic research...Who owns the data? Intellectual property considerations for academic research...
Who owns the data? Intellectual property considerations for academic research...
 
Data Management Lab: Data management plan instructions
Data Management Lab: Data management plan instructionsData Management Lab: Data management plan instructions
Data Management Lab: Data management plan instructions
 
Va sla nov 15 final
Va sla nov 15 finalVa sla nov 15 final
Va sla nov 15 final
 

Similar a DataONE Education Module 01: Why Data Management?

Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521Amanda Whitmire
 
20160414 23 Research Data Things
20160414 23 Research Data Things20160414 23 Research Data Things
20160414 23 Research Data ThingsKatina Toufexis
 
Managing data responsibly to enable research interity
Managing data responsibly to enable research interityManaging data responsibly to enable research interity
Managing data responsibly to enable research interityIUPUI
 
Natasha intro to rdm c3 dis may 2018.pptx
Natasha intro to rdm c3 dis may 2018.pptxNatasha intro to rdm c3 dis may 2018.pptx
Natasha intro to rdm c3 dis may 2018.pptxARDC
 
accelerating-data-driven
accelerating-data-drivenaccelerating-data-driven
accelerating-data-drivenJoshua Chudy
 
Causal discovery
Causal discoveryCausal discovery
Causal discoverydagunisa
 
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...University of California Curation Center
 
Research Data Curation _ Grad Humanities Class
Research Data Curation _ Grad Humanities ClassResearch Data Curation _ Grad Humanities Class
Research Data Curation _ Grad Humanities ClassAaron Collie
 
NCME Big Data in Education
NCME Big Data  in EducationNCME Big Data  in Education
NCME Big Data in EducationPhilip Piety
 
Meeting the NSF DMP Requirement June 13, 2012
Meeting the NSF DMP Requirement June 13, 2012Meeting the NSF DMP Requirement June 13, 2012
Meeting the NSF DMP Requirement June 13, 2012IUPUI
 
Data Management for Research (New Faculty Orientation)
Data Management for Research (New Faculty Orientation)Data Management for Research (New Faculty Orientation)
Data Management for Research (New Faculty Orientation)aaroncollie
 
Minimal viable data reuse
Minimal viable data reuseMinimal viable data reuse
Minimal viable data reusevoginip
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...ICPSR
 
Data Management Lab: Session 3 Slides
Data Management Lab: Session 3 SlidesData Management Lab: Session 3 Slides
Data Management Lab: Session 3 SlidesIUPUI
 

Similar a DataONE Education Module 01: Why Data Management? (20)

Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521Introduction to research data management; Lecture 01 for GRAD521
Introduction to research data management; Lecture 01 for GRAD521
 
METRO RDM Webinar
METRO RDM WebinarMETRO RDM Webinar
METRO RDM Webinar
 
20160414 23 Research Data Things
20160414 23 Research Data Things20160414 23 Research Data Things
20160414 23 Research Data Things
 
Managing data responsibly to enable research interity
Managing data responsibly to enable research interityManaging data responsibly to enable research interity
Managing data responsibly to enable research interity
 
Natasha intro to rdm c3 dis may 2018.pptx
Natasha intro to rdm c3 dis may 2018.pptxNatasha intro to rdm c3 dis may 2018.pptx
Natasha intro to rdm c3 dis may 2018.pptx
 
accelerating-data-driven
accelerating-data-drivenaccelerating-data-driven
accelerating-data-driven
 
Causal discovery
Causal discoveryCausal discovery
Causal discovery
 
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
 
Data science unit1
Data science unit1Data science unit1
Data science unit1
 
Research Data Curation _ Grad Humanities Class
Research Data Curation _ Grad Humanities ClassResearch Data Curation _ Grad Humanities Class
Research Data Curation _ Grad Humanities Class
 
NCME Big Data in Education
NCME Big Data  in EducationNCME Big Data  in Education
NCME Big Data in Education
 
Simon hodson
Simon hodsonSimon hodson
Simon hodson
 
Oracle openworld-presentation
Oracle openworld-presentationOracle openworld-presentation
Oracle openworld-presentation
 
ARLIS-NY Presentation
ARLIS-NY PresentationARLIS-NY Presentation
ARLIS-NY Presentation
 
Meeting the NSF DMP Requirement June 13, 2012
Meeting the NSF DMP Requirement June 13, 2012Meeting the NSF DMP Requirement June 13, 2012
Meeting the NSF DMP Requirement June 13, 2012
 
Data Management for Research (New Faculty Orientation)
Data Management for Research (New Faculty Orientation)Data Management for Research (New Faculty Orientation)
Data Management for Research (New Faculty Orientation)
 
Minimal viable data reuse
Minimal viable data reuseMinimal viable data reuse
Minimal viable data reuse
 
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...Meeting Federal Research Requirements for Data Management Plans, Public Acces...
Meeting Federal Research Requirements for Data Management Plans, Public Acces...
 
Johnston - How to Curate Research Data
Johnston - How to Curate Research DataJohnston - How to Curate Research Data
Johnston - How to Curate Research Data
 
Data Management Lab: Session 3 Slides
Data Management Lab: Session 3 SlidesData Management Lab: Session 3 Slides
Data Management Lab: Session 3 Slides
 

Último

4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationRosabel UA
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 

Último (20)

4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 

DataONE Education Module 01: Why Data Management?

  • 1. Why Data Management Lesson 1: Introduction to Data Management Why Data Management? CCimagebyUniversityofMarylandPressReleasesonFlickr
  • 2. Why Data Management • The data world around us • Importance of data management • The data lifecycle • The case for data management CCimagebyinterpunctonFlickr
  • 3. Why Data Management After completing this lesson, the participant will be able to: • Give two general examples of why increasing amounts of data is a concern • Explain, using two examples, how lack of data management makes an impact • Define the research data lifecycle • Give one example of how well-managed data can result in new scientific conclusions
  • 5. Images collected by DataOne.org
  • 6. Why Data Management Photocourtesyofwww.carboafrica.net Data are collected from sensors, sensor networks, remote sensing, observations, and more - this calls for increased attention to data management and stewardship Photocourtesyof http://modis.gsfc.nasa.gov/ Photocourtesyof http://www.futurlec.com CCimagebytajaionFlickr CCimagebyCIMMYTonFlickr ImagecollectedbyVivHutchinson
  • 7. Source: John Gantz, IDC Corporation: The Expanding Digital Universe 0 100,000 200,000 300,000 400,000 500,000 600,000 700,000 800,000 900,000 1,000,000 2005 2006 2007 2008 2009 2010 Transient information or unfilled demand for storage Information Available Storage PetabytesWorldwide
  • 8. Why Data Management • Natural disaster • Facilities infrastructure failure • Storage failure • Server hardware/software failure • Application software failure • External dependencies (e.g. PKI failure) • Format obsolescence • Legal encumbrance • Human error • Malicious attack by human or automated agents • Loss of staffing competencies • Loss of institutional commitment • Loss of financial stability • Changes in user expectations and requirements CCimagebySharynMorrowonFlickr CCimagebymomboleumonFlickr
  • 10. “MEDICARE PAYMENT ERRORS NEAR $20B” (CNN) December 2004 Miscoding and billing errors from doctors and hospitals totaled $20 billion in FY 2003 (9.3% error rate). The error rate measured claims that were paid despite being medically unnecessary, inadequately documented, or improperly coded. This error rate actually was an improvement over the previous fiscal year (9.8% error rate). “AUDIT: JUSTICE STATS ON ANTI-TERROR CASES FLAWED” (AP) February 2007 The Justice Department Inspector General found only two sets of data out of 26 concerning terrorism attacks were accurate. The Justice Department uses these statistics to argue for their budget. The Inspector General said the data “appear to be the result of decentralized and haphazard methods of collections … and do not appear to be intentional.” “SOCIAL SECURITY DATA CAN TURN PEOPLE INTO THE LIVING DEAD” (NPR) August 2016 In 2011, an audit found that about 1,000 people a month in the U.S. were marked deceased when they were very much alive. Rona Lawson, who works in the Office of the Inspector General at the Social Security Administration, says that number has gone down. It's now around 500 people a month. Lawson says 90 percent of the time, the cascade of misinformation starts with an input error by Social Security staff — a regular mistake on a regular office day that just happens to kill a person off, at least on paper. Slide courtesy of BLM
  • 11. A wildlife biologist for a small field office was the in-house GIS expert and provided support for all the staff’s GIS needs. However, the data were stored on her own workstation. When the biologist relocated to another office, no one understood how the data were stored or managed. Solution: A state office GIS specialist retrieved the workstation and sifted through files trying to salvage relevant data. Cost: 1 work month ($4,000) plus the value of data that were not recovered Consider that the situation could have been worse, because the data were not being backed up as they would have been if stored on a server. Poor Science Data Management Example
  • 12. Why Data Management In preparation for a Resource Management Plan, an office discovered 14 duplicate GPS inventories of roads. However, because none of the inventories had enough metadata, it was impossible to know which inventory was best or if any of the inventories actually met their requirements. Solution: Re-Inventory roads Cost: Estimated 9 work months/inventory @$4,000/work month (14 inventories = $504,000) CCimagebyruffin_readyonFlickr
  • 14. Why Data Management “Please forgive my paranoia about protocols, standards, and data review. I'm in the latter stages of a long career with USGS (30 years, and counting), and have experienced much. Experience is the knowledge you get just after you needed it. Several times, I've seen colleagues called to court in order to testify about conditions they have observed. Without a strong tradition of constant review and approval of basic data, they would've been in deep trouble under cross-examination. Instead, they were able to produce field notes, data approval records, and the like, to back up their testimony. It's one thing to be questioned by a college student who is working on a project for school. It's another entirely to be grilled by an attorney under oath with the media present.” - Nelson Williams, Scientist US Geological Survey
  • 15. Why Data Management The climate scientists at the centre of a media storm over leaked emails were yesterday cleared of accusations that they fudged their results and silenced critics, but a review found they had failed to be open enough about their work.
  • 16. Why Data Management • Manage your data for yourself: o Keep yourself organized – be able to find your files (data inputs, analytic scripts, outputs at various stages of the analytic process, etc.) o Track your science processes for reproducibility – be able to match up your outputs with exact inputs and transformations that produced them o Better control versions of data – easily identify versions that can be periodically purged o Quality control your data more efficiently
  • 17. Why Data Management • To avoid data loss (e.g. making backups) • Format your data for re-use (by yourself or others) • Be prepared: Document your data for your own recollection, accountability, and re-use (by yourself or others) • Gain credibility and recognition for your science efforts through data sharing! CCimagebyUWWResNetonFlickr
  • 18. Why Data Management • Data is a valuable asset – it is expensive and time consuming to collect • Data should be managed to: o maximize the effective use and value of data and information assets o continually improve the quality including: data accuracy, integrity, integration, timeliness of data capture and presentation, relevance, and usefulness o ensure appropriate use of data and information o facilitate data sharing o ensure sustainability and accessibility in long term for re-use in science
  • 19.
  • 20.
  • 21.
  • 22. Why Data Management Here are a few reasons (from the UK Data Archive): • Increases the impact and visibility of research • Promotes innovation and potential new data uses • Leads to new collaborations between data users and creators • Maximizes transparency and accountability • Enables scrutiny of research findings • Encourages improvement and validation of research methods • Reduces cost of duplicating data collection • Provides important resources for education and training
  • 23. Why Data Management Spatio-Temporal Exploratory Models predict the probability of occurrence of bird species across the United States at a 35 km x 35 km grid. Land Cover Potential Uses- • Examine patterns of migration • Infer impacts of climate change • Measure patterns of habitat usage • Measure population trends Model results eBird Meteorology MODIS – Remote sensing data Occurrence of Indigo Bunting (2008) Jan Sep DecJunApr Slide courtesy of DataOne
  • 24. Why Data Management A new image processing technique reveals something not before seen in this Hubble Space Telescope image taken 11 years ago: A faint planet (arrows), the outermost of three discovered with ground-based telescopes last year around the young star HR 8799.D. Lafrenière et al., Astrophysical Journal Letters. “The first thing it tells you is how valuable maintaining long-term archives can be. Here is a major discovery that’s been lurking in the data for about 10 years!” comments Matt Mountain, director of the Space Telescope Science Institute in Baltimore, which operates Hubble. “The second thing its tells you is having a well calibrated archive is necessary but not sufficient to make breakthroughs — it also takes a very innovative group of people to develop very smart extraction routines that can get rid of all the artifacts to reveal the planet hidden under all that telescope and detector structure.” D.Lafrenièreetal.,ApJLetters
  • 26. Why Data Management • …there are best practices… and… tools to help! • The following data management lessons will illustrate in detail each stage of the data lifecycle • Your well-managed and accessible data can contribute to science in ways you may not even imagine today!
  • 27. Why Data Management • If data are: o Well-organized o Documented o Preserved o Accessible o Verified as to accuracy and validity • Result is: o High quality data o Easy to share and re-use in science o Citation and credibility to the researcher o Cost-savings to science
  • 28. Why Data Management • The data deluge has created a surge of information that needs to be well-managed and made accessible. • The cost of not doing data management can be very high. • Be cognizant of best practices and tools associated with the data lifecycle to manage your data well. • Many benefits are associated with the act of managing data, including the ability to find, access, understand, integrate, and re-use data.
  • 29. Why Data Management 1. Chatfield, T., Selbach, R. February 2011. Data Management Training Workshop. Bureau of Land Management (BLM). 2. Strasser, Carly. February 2012. Data Management for Scientists. http://www.slideshare.net/carlystrasser/oceansciences2012workshop 3. UK Data Archive. May 2011. Managing and Sharing Data: Best Practices for Researchers. http://www.data- archive.ac.uk/media/2894/managingsharing.pdf 4. DAMA International, The DAMA Guide to the Data Management Body of Knowledge. https://www.dama.org/content/body-knowledge
  • 30. Why Data Management The full slide deck may be downloaded from: http://www.dataone.org/education-modules Suggested citation: DataONE Education Module: Data Management. DataONE. Retrieved Nov 16, 2016. From http://www.dataone.org/sites/all/documents/L01_DataManage ment.pptx Copyright license information: No rights reserved; you may enhance and reuse for your own purposes. We do ask that you provide appropriate citation and attribution to DataONE.

Notas del editor

  1. The topics covered in this module will answer the questions:
  2. Data is being generated in massive quantities daily. Improvements in technology enable higher precision and coverage in data acquisition and makes higher capacity systems store and migrate more data –increasing the importance of managing, integrating, and re-using data.
  3. Manage your data for yourself: Keep yourself organized – be able to find your files (data inputs, analytic scripts, outputs at various stages of the analytic process, etc) Track your science processes for reproducibility – be able to match up your outputs with exact inputs and transformations that produced themBetter control versions of data – identify easily versions that can be periodically purgedQuality control your data more efficiently
  4. Manage your data for yourself: Make backups to avoid data lossFormat your data for re-use (by yourself or others)Be prepared: Document your data for your own recollection and re-use (by yourself or others) Prepare it to share it – gain credibility and recognition for your science efforts
  5. Data is a valuable asset – it is expensive and time consuming to collect Data should be managed to:maximize the effective use and value of data and information assetscontinually improve the quality including: data accuracy, integrity, integration, timeliness of data capture and presentation, relevance and usefulnessensure appropriate use of data and informationfacilitate data sharingensure sustainability and accessibility in long term for re-use in science
  6. Data management and organization facilitate archiving, sharing and publishing data. These activities feed data re-use and reproducibility in science.
  7. By re-using data collected from a variety of sources – eBird database, land cover data, meteorology, and remotely sensed -- this project was able to compile and process the data using supercomputering to determine bird migration routes for particular species.
  8. There is an abundance of data and metadata (if it is done) end up in filing cabinets, on discarded hard drives, in hard-copy journals on the library shelves -- or on the web, but many are subscription only journals.
  9. Data should be properly managed and eventually be placed where they are accessible, understandable, and re-usable.
  10. A data lifecycle illustrates stages thru which well-managed data passes from the inception of a research project to its conclusion. In the reality of science research, the stages do not always follow a continuous circle.
  11. For each stage of the data lifecycle…there are best practices…..and….tools to help! The following data management lessons will illustrate in detail each stage of the data lifecycle Your well-managed and accessible data can contribute to science in ways you may not even imagine today!
  12. The data deluge has created a surge of information that needs to be well-managed and made accessible.The cost of not doing data management is very high. Be cognizant of best practices and tools associated with the data lifecycle to manage your data well. Many benefits are associated with the act of managing data, including the ability to find, access, understand, integrate and re-use data.
  13. For each stage of the data lifecycle…there are best practices…..and….tools to help! The following data management lessons will illustrate in detail each stage of the data lifecycle Your well-managed and accessible data can contribute to science in ways you may not even imagine today!