1. Data Cleaning using OpenRefine
C. Tobin Magle, PhD
Mar. 28, 2016
10:00-11:30 a.m.
Morgan Library Computer Classroom 175
*inspired by content from Data Carpentry
4. Survey data
• Rows: observations of individual animals
• Columns: variables that describe the animals
• Species, sex, date, location, etc.
• Messy Data
• Misspellings
• White space
• Combined variables
7. Exercise 1:
1. Using faceting, find out how many years are represented in the census.
2. Which years have the most and least observations?
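Conceptually, a text facet counts how many rows share each distinct value in a column. The sketch below illustrates that idea in Python; the year values are made up for illustration and are not the real census data.

```python
from collections import Counter

# Hypothetical sample of a year column; the real survey file is much larger.
years = ["1977", "1977", "1978", "1978", "1978", "1979"]

# A text facet lists each distinct value together with its row count.
facet = Counter(years)

# Number of distinct years represented.
distinct_years = len(facet)

# Years with the most and the fewest observations.
most_year, most_count = facet.most_common(1)[0]
least_year, least_count = min(facet.items(), key=lambda item: item[1])

print(distinct_years, most_year, least_year)
```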
8. Clustering
• “Finding groups of different values that might be alternative representations of the same thing”
• a.k.a. spellcheck
• Method: key collision
• Keying function: metaphone3
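Key collision clustering reduces each value to a simplified key and groups the values whose keys match. The sketch below uses a simplified fingerprint-style key (trim, lowercase, strip punctuation, sort unique tokens). It is an approximation for illustration only: it is not OpenRefine's own code and it is not the metaphone3 phonetic key. The function names and sample values are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Simplified fingerprint-style key (an approximation, not OpenRefine's code)."""
    text = value.strip().lower()
    # Fold accented characters to plain ASCII where possible.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Drop punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Sort the unique tokens so that word order does not matter.
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values whose keys collide; keep groups with two or more variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical messy entries for illustration.
names = [
    "Dipodomys merriami",
    "dipodomys merriami ",
    "Merriami, Dipodomys",
    "Chaetodipus baileyi",
]
clusters = cluster(names)
print(clusters)
```

Each cluster is a candidate set of spellings to merge into one value; a person still decides whether the merge is correct.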
9. Undo/Redo
• All your steps are saved!
• Click the Undo / Redo tab in the left frame
• Click on the step you want to revert to
• Result: the data returns to its state at that step
10. Split
• Edit column > Split into several columns
• Enter a space as the separator
• Result: new columns
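The split step can be pictured as breaking each cell on the separator and placing each piece in its own column. A minimal Python sketch of that idea follows; the sample values and variable names are assumptions for illustration.

```python
# Hypothetical combined values: two variables stored in one cell.
scientific_names = ["Dipodomys merriami", "Chaetodipus baileyi", "Neotoma albigula"]

# Split each value on the space separator, giving one list of pieces per row.
split_rows = [name.split(" ") for name in scientific_names]

# Each position in the split becomes its own new column.
first_column = [parts[0] for parts in split_rows]
second_column = [parts[1] for parts in split_rows]

print(first_column)
print(second_column)
```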
11. Exercise 2:
• Try to change the name of the second new column to “species”
• How can you correct the problem you encounter?
17. Exercise 4: sort by multiple columns
• Sort by year then month. What order are the entries in?
• Sort by year, month, and day
• What happens when you remove the sort on the month column?
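Sorting by several columns means later columns only break ties left by earlier ones. The Python sketch below mimics the three sorts in this exercise on a handful of made-up (year, month, day) records; it illustrates the idea and is not how OpenRefine implements sorting.

```python
# Hypothetical records as (year, month, day) tuples.
records = [(1978, 3, 12), (1977, 8, 19), (1978, 1, 8), (1977, 7, 16), (1977, 8, 2)]

# Sort by year, then month: days stay in their original relative order.
by_year_month = sorted(records, key=lambda r: (r[0], r[1]))

# Sort by year, month, and day: fully chronological.
by_year_month_day = sorted(records, key=lambda r: (r[0], r[1], r[2]))

# Remove the month sort: within a year, rows are ordered by day alone,
# so days from different months are interleaved.
by_year_day = sorted(records, key=lambda r: (r[0], r[2]))

print(by_year_month)
print(by_year_month_day)
print(by_year_day)
```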
21. Exercise 6
• In a numeric column, replace a number with text (such as abc) and one with a blank
• Create a numeric facet for this column
• How is this different than the numeric facet for “Year”?
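A numeric facet separates cells that hold numbers from cells that hold text or nothing at all. The sketch below shows one way to picture that grouping in Python; the `classify` helper and the sample column are hypothetical, and the category names are only meant to suggest what the facet reports.

```python
from collections import Counter


def classify(cell):
    """Label one cell as 'blank', 'numeric', or 'non-numeric'."""
    if cell is None or (isinstance(cell, str) and cell.strip() == ""):
        return "blank"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(cell, (int, float)) and not isinstance(cell, bool):
        return "numeric"
    return "non-numeric"


# Hypothetical column after the edits in this exercise:
# one number replaced with text, one replaced with a blank.
column = [1977, "abc", None, 1979, 1980]

counts = Counter(classify(cell) for cell in column)
print(counts)
```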
23. Exercise 7
• Click on the Scatterplot Matrix square for recordID and period
• Facet on Amphispiza bilineata (AB).
• Notice the change in the scatterplot. It might be easier to see if you click “export plot” to open it in a new browser tab.
24. Saving Scripts
• Export the steps for reuse
• In the Undo / Redo section, click Extract
• Select the steps you want to keep
• Save the code as a .txt file using a text editor
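The extracted steps are JSON text, so the saved file can be inspected with any JSON parser. The Python sketch below lists the steps in a saved script. The fragment is illustrative only: it was not extracted from a real project, the column names are hypothetical, and a real extract may contain more fields per step.

```python
import json

# Illustrative fragment of a saved script (not from a real project).
saved_steps = """
[
  {
    "op": "core/column-rename",
    "description": "Rename column scientificName 2 to species",
    "oldColumnName": "scientificName 2",
    "newColumnName": "species"
  }
]
"""

# Parse the saved text and summarize each step.
steps = json.loads(saved_steps)
summary = [(step["op"], step["description"]) for step in steps]

for op, description in summary:
    print(op, "-", description)
```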
25. Applying Scripts
• Run the same steps on a similar document
• In the Undo / Redo section, click Apply
• Paste in the saved code
26. Saving and exporting a project
• Autosave feature
• Click 'Export' button (top right)
• Select 'Export project'
• Result: a compressed file that contains
• Data
• Cleaning steps
27. Importing a project
• Found in the menu where you create/open projects
• Loads data and history
28. Exporting data
• Go to 'Export' in the top right.
• Click on the file type you want to export the data in.
• 'Tab-separated values'
• 'Comma-separated values'
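The two export formats hold the same table and differ only in the character that separates fields. A short Python sketch of reading each one with the standard `csv` module follows; the file contents are made up for illustration.

```python
import csv
import io

# Hypothetical exports of the same three-column table.
csv_text = "recordID,yr,species\n1,1977,NL\n2,1977,DM\n"
tsv_text = "recordID\tyr\tspecies\n1\t1977\tNL\n2\t1977\tDM\n"

# Comma-separated values: the default delimiter is a comma.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Tab-separated values: the delimiter must be set to a tab.
tsv_rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

print(csv_rows == tsv_rows)
```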
29. Need help?
• Email: tobin.magle@colostate.edu
• Data Management Services website: http://lib.colostate.edu/services/data-management
• Data Carpentry: http://www.datacarpentry.org/
• OpenRefine Lesson: http://www.datacarpentry.org/OpenRefine-ecology-lesson/
Editor's notes
Start OpenRefine
Create a project by selecting the .csv file
Look at the preview and change options if needed
Click Create Project