IUPUI University Library Center for Digital Scholarship
Data Management Lab: Spring 2014, Session 3
Data Entry Best Practices
Data Entry
1. Dataset creation and integrity
a. Separate the coding and data entry tasks as much as possible
b. Perform coding in a setting that minimizes distractions
c. Arrange for particularly complex tasks to be carried out by people specially trained for
the task
d. Use a data-entry program that is designed to catch typing errors (i.e., one that's
preprogrammed to detect out-of-range values)
e. Perform double entry of data
f. Carefully check the first 5-10 percent of the data records created, then choose random
records for quality-control checks throughout the process
g. Let the computer do complex coding and recoding, if possible
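The checks in items d, e, and f can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names, the ranges in VALID_RANGES, the sample records, and the helper names are all hypothetical and are not taken from any particular data-entry program.

```python
import random

# Hypothetical codebook: allowed (low, high) range per variable.
VALID_RANGES = {"age": (18, 99), "q1_satisfaction": (1, 5)}


def out_of_range(record):
    """Return the names of variables whose values fall outside the codebook range."""
    return [
        name
        for name, (low, high) in VALID_RANGES.items()
        if not (low <= record[name] <= high)
    ]


def double_entry_mismatches(first_pass, second_pass):
    """Compare two independent keyings of the same records, matched by record id."""
    mismatches = []
    for rec_id, first in first_pass.items():
        second = second_pass[rec_id]
        for name in first:
            if first[name] != second[name]:
                mismatches.append((rec_id, name, first[name], second[name]))
    return mismatches


def qc_sample(record_ids, head_fraction=0.10, n_random=2, seed=0):
    """Pick the first head_fraction of records plus a random sample of the rest."""
    ids = list(record_ids)
    head_count = max(1, round(len(ids) * head_fraction))
    head, rest = ids[:head_count], ids[head_count:]
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return head + rng.sample(rest, min(n_random, len(rest)))


# Two independent keyings of the same two paper forms (made-up values).
first_pass = {
    1: {"age": 34, "q1_satisfaction": 4},
    2: {"age": 210, "q1_satisfaction": 3},
}
second_pass = {
    1: {"age": 34, "q1_satisfaction": 4},
    2: {"age": 21, "q1_satisfaction": 3},
}

print(out_of_range(first_pass[2]))
print(double_entry_mismatches(first_pass, second_pass))
print(qc_sample(range(1, 21)))
```

In this sketch the mistyped age of 210 is caught twice, once by the range check and once by the comparison of the two entry passes, which is why the two practices complement each other.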
2. Things to check
a. Wild codes and out-of-range values
b. Consistency checks - comparisons across variables
c. Record matches and counts - relevant in longitudinal studies where subjects may have
more than one record and varying numbers of records
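The three checks above might look like the following when run over records that are already entered. This is a minimal sketch: the VALID_CODES table, the rule that years employed cannot exceed age, and the sample rows are assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical codebook: the only legal codes for each categorical variable.
VALID_CODES = {"sex": {1, 2, 9}}  # 9 is assumed to mean "missing"

# Made-up longitudinal records: one row per subject per wave.
records = [
    {"id": "A01", "wave": 1, "sex": 1, "age": 30, "years_employed": 8},
    {"id": "A01", "wave": 2, "sex": 1, "age": 31, "years_employed": 9},
    {"id": "B02", "wave": 1, "sex": 7, "age": 25, "years_employed": 40},
    {"id": "B02", "wave": 1, "sex": 2, "age": 25, "years_employed": 3},
]


def wild_codes(rows):
    """Flag values that are not among the legal codes for their variable."""
    return [
        (r["id"], r["wave"], name, r[name])
        for r in rows
        for name, allowed in VALID_CODES.items()
        if r[name] not in allowed
    ]


def inconsistent(rows):
    """Cross-variable check: years employed should not exceed age."""
    return [(r["id"], r["wave"]) for r in rows if r["years_employed"] > r["age"]]


def duplicate_keys(rows):
    """Record matching: each (id, wave) pair should appear exactly once."""
    counts = Counter((r["id"], r["wave"]) for r in rows)
    return [key for key, n in counts.items() if n > 1]


def records_per_subject(rows):
    """Record counts: how many rows each subject contributes."""
    return Counter(r["id"] for r in rows)


print(wild_codes(records))
print(inconsistent(records))
print(duplicate_keys(records))
print(records_per_subject(records))
```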
3. Variable names
a. A prefix, root, suffix system is a systematic approach (compared with one-up numbers,
question numbers, and mnemonic names)
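One possible way to apply a prefix, root, suffix system is sketched below. The specific convention is an assumption chosen for illustration (prefix for the wave, root for the topic, suffix for the respondent, joined by underscores); the handout does not prescribe what each part should encode.

```python
import re

# Hypothetical convention: <prefix>_<root>_<suffix>, all lowercase.
NAME_PATTERN = re.compile(
    r"^(?P<prefix>[a-z]+\d*)_(?P<root>[a-z]+)_(?P<suffix>[a-z0-9]+)$"
)


def build_name(prefix, root, suffix):
    """Assemble a variable name from its three parts."""
    return f"{prefix}_{root}_{suffix}".lower()


def parse_name(name):
    """Split a name into its parts, or return None if it breaks the convention."""
    match = NAME_PATTERN.match(name)
    return match.groupdict() if match else None


name = build_name("w1", "health", "mother")
print(name)
print(parse_name(name))
print(parse_name("V0012"))  # a one-up number carries no structure to parse
```

Because every name follows the same pattern, a script can verify the whole dataset against the convention, which is not possible with one-up numbers or ad hoc mnemonic names.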
4. Variable labels
a. Should provide three pieces of information
i. The item or question number in the original data collection instrument
ii. A clear indication of the variable's content
iii. An indication of whether the variable is constructed from other items
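As a rough sketch, the three pieces of information can be kept as separate fields and combined into the label text. The class name, field names, label layout, and example items here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableLabel:
    item_number: str              # item or question number in the instrument
    content: str                  # clear indication of the variable's content
    constructed_from: tuple = ()  # source items, if the variable is derived

    def text(self):
        """Render the label, noting the source items for constructed variables."""
        label = f"{self.item_number}: {self.content}"
        if self.constructed_from:
            label += " (constructed from " + ", ".join(self.constructed_from) + ")"
        return label


direct = VariableLabel("Q12", "Hours worked last week")
derived = VariableLabel("Q12-Q13", "Hourly wage", ("Q12", "Q13"))

print(direct.text())
print(derived.text())
```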
5. Variable groups
a. Groups are recommended if a dataset contains a large number of variables
b. Can effectively organize a dataset and enable secondary analysts to get an overview of a
dataset quickly
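A simple way to record variable groups is a mapping from variable name to group, which can then be turned into a quick overview. The variable names and group names below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical assignment of each variable to one group.
VARIABLE_GROUPS = {
    "age": "Demographics",
    "sex": "Demographics",
    "w1_health_mother": "Health",
    "hours_worked": "Employment",
    "hourly_wage": "Employment",
}


def overview(variable_groups):
    """Return {group: sorted variable names}, with groups in alphabetical order."""
    grouped = defaultdict(list)
    for name, group in variable_groups.items():
        grouped[group].append(name)
    return {group: sorted(names) for group, names in sorted(grouped.items())}


for group, names in overview(VARIABLE_GROUPS).items():
    print(f"{group} ({len(names)}): {', '.join(names)}")
```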
6. Over the long term, store data in a consistent format
References
1. ICPSR. (2012). Guide to Social Science Data Preparation and Archiving. University of Michigan,
Ann Arbor, MI. From http://www.icpsr.umich.edu/files/deposit/dataprep.pdf
2. Scott, T. (2012). Guidelines for data collection and entry.
From http://www.mc.vanderbilt.edu/gcrc/workshop_files/2012-09-07.pdf
3. DataONE. DataONE Education Module: Data Entry and Manipulation.
From http://www.dataone.org/sites/all/documents/L04_DataEntryManipulation.pptx
Heather Coates, 2013
