This document summarizes a presentation about the data cleaning tool Open Refine and how librarians are using it. The presentation has three parts: an introduction to Open Refine that describes it as a popular but little-known library tool; a comparison of Open Refine to Excel that explains why Open Refine handles more data types and larger datasets, and makes processes scriptable and reproducible; and examples of how librarians have used Open Refine for tasks such as migrating approximately 50,000 catalogue records between library management systems.
1. Open Refine for Librarians
How a power tool for Google is now
being used by librarians to clean up
data and connect it to the world
Mita Williams
Scholarly Communications Librarian
University of Windsor
October 24, 2018 : 2:45 - 3:15 pm
NISO: That Cutting Edge: Technology’s Impact on Scholarly
Research Processes in the Library
11. PART TWO:
WHY NOT KEEP USING EXCEL?
The most popular library tool you’ve never heard of…
12. Why use Open Refine?
• Ability to handle more types of data
TSV, CSV, *SV, Excel (.xls and .xlsx), JSON, XML, RDF as XML
• Ability to handle larger amounts of data
Excel’s max: 1,048,576 rows by 16,384 columns
• Better control of data
• Ability to script processes
• Ability to share and reproduce these scripts
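
To make the size limit above concrete, here is a minimal Python sketch (not part of the original slides) that checks whether a delimited file would fit in a single Excel worksheet. The function name `fits_in_excel` and the sample data are hypothetical and for illustration only; the two limits are the ones quoted on the slide.

```python
import csv
import io

# Worksheet limits quoted on the slide.
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLS = 16_384


def fits_in_excel(rows):
    """Return True if an iterable of rows fits in one Excel worksheet.

    Hypothetical helper: it streams the rows, so it never loads the
    whole file into memory, and stops at the first row that breaks a limit.
    """
    row_count = 0
    for row in rows:
        row_count += 1
        if row_count > EXCEL_MAX_ROWS or len(row) > EXCEL_MAX_COLS:
            return False
    return True


# Illustrative in-memory CSV; a real export would be opened with open(path, newline="").
sample = io.StringIO("title,author\nDune,Herbert\n")
print(fits_in_excel(csv.reader(sample)))
```

A file that fails this check is one case where a tool other than Excel is needed, which is the point the slide is making.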
20. PART THREE:
HOW ARE LIBRARIANS
USING OPENREFINE?
!!! OpenRefine is NOT Excel !!!
21. • Institution changing their library management system
and wished to migrate their catalogue data
• Approximately 50,000 bibliographic records
• MARC output from existing system would not load into
new system