3. History
► Library restructure in 1995
► Individual specialist roles dissolved and each
professional member of staff given many hats
► No metadata/cataloguing specialist from that
time until November 2016
► Small group of staff did their best to catalogue
in the interim
4. The state of the catalogue
► Authority control was lacking
► Many fields missing or incorrect
► Local subject index
► Local subject headings
► No LCSH in some records
► Hybrid e-book and print book records
► Split multi-volume works
5. MarcEdit
► First created in 1999 to
enable a data clean-up
project at Oregon State
University.
► Developed by Terry Reese
and updated by him
regularly
► Offered as a free download
► Has an enormous array of
functionality built into it
7. Authority control
► Authority control: established, unique,
consistent forms of terms for
disambiguation and collocation
► Project scope: to authorise the name and
subject headings in Sierra
► All records were in scope except PDA
records as these were not purchased
8. Data extraction and manipulation
1. Extract records in scope from Sierra and
save them locally
2. Use MarcBreaker
3. Validate name headings and embed URIs
10. Data extraction and manipulation
1. Extract records in scope from Sierra and
save them locally
2. Use MarcBreaker
3. Validate name headings and embed URIs
4. Extract 1XX, 7XX headings and URIs and
copy to Notepad++
12. Data extraction and manipulation
1. Extract records in scope from Sierra and
save them locally
2. Use MarcBreaker
3. Validate name headings and embed URIs
4. Extract 1XX, 7XX headings and URIs and
copy to Notepad++
5. Use regular expressions to extract just the
LCCN
6. Make the LCCNs searchable via z39.50
14. To sum up
► MarcBreaker
► Validate headings
► Normalise data for searching
► z39.50
15. Module codes
► Project scope:
Update all reading list items in Sierra with
the current course code
► Concerns these were out of date
► Codes in Sierra still important for current
workflows
23. Finishing
► Load dummy MARC records using custom
load table
► Matches on bibliographic number
► Only importing 980 field
► Use Sierra Global Update function to
update 900 module code field
24. Reclassification
► Some areas of the collection classified to an old
standard
► Split collections with shelf ready records
► Too many to individually reclassify
► MarcEdit function “Generate classification” based
on OCLC's Classify
► Project scope: 301-307, just over 7,000 titles
► Import the classification the same way as module
codes via local field 982
25. Pros …and cons
Pros:
► Tool is fast and easy to use
► Lots of extra functionality such as FAST headings
► Accurate, up-to-date classification (mostly)
Cons:
► It relies on ISBN and author/title matching
► Some errors
► Some things simply not found
26. Metadata enhancement
► Project scope:
Improve the metadata of Aston legacy records
starting with records lacking LCSH by fishing for
records
► Data preparation
► z39.50 search
► Data enhancement
► RDA
► Linked data?
27. Normalise data, z39.50 searching
► Export target records from Sierra
► Search for and extract data points such as
ISBN, title, main author, date of publication
► Normalise data, e.g. remove fluff from 020s,
use only title proper, fixed dates, use
surnames only
► Make these data searchable via z39.50
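The normalisation steps listed above can be sketched in Python. This is a minimal illustration, not the actual workflow (which used Notepad++ and regular expressions); the helper names and sample field values are my own assumptions.

```python
import re

def normalize_isbn(field_020: str) -> str:
    """Strip qualifiers and hyphens from an 020, keeping only the ISBN itself."""
    # Drop a trailing qualifier like "(pbk.)" then remove everything
    # that is not a digit or a check-digit X
    return re.sub(r"[^0-9Xx]", "", field_020.split("(")[0]).upper()

def title_proper(field_245: str) -> str:
    """Keep only the title proper: cut at the first ISBD delimiter (: / = ;)."""
    return re.split(r"\s*[:/=;]", field_245)[0].strip()

def surname_only(field_100: str) -> str:
    """Keep only the surname from an inverted 'Surname, Forename' heading."""
    return field_100.split(",")[0].strip()

def fixed_date(field_260c: str) -> str:
    """Reduce a publication date like 'c1998.' or '[2004]' to four digits."""
    m = re.search(r"\d{4}", field_260c)
    return m.group(0) if m else ""

print(normalize_isbn("978-0-19-852011-9 (pbk.)"))   # -> 9780198520119
print(title_proper("Introduction to cataloguing : a practical guide /"))
print(surname_only("Smith, John,"))                  # -> Smith
print(fixed_date("c1998."))                          # -> 1998
```

Search points normalised like this match far more reliably across catalogues, since transcription practice varies so much between records.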
31. Lessons learned and concluding remarks
► Data normalisation takes time and is important
► Take care to document everything and make your
file metadata clear
► Using Box really helped with reversing mistakes
► Search files need to be manageable (probably no
bigger than 1000 records)
► Trial and error and Google are your friends
► Questions?
Speaker notes
In 1995 Aston Library was restructured. The main outcome of this restructure was to dissolve the different specialist roles within the library. Each member of professional staff in the Information Resources division had multiple roles and in theory did a bit of everything. Cataloguing and metadata were relegated to one member of staff who “minded” the catalogue in this period. A small group of non-professional staff did their best to keep the catalogue together. Aston adopted essentially full shelf-ready supply in 2008 (and the quality of records improved from that time).
This is not a fully comprehensive list of the issues with the catalogue but it gives a good impression. As I discovered these issues, I began to work out ways that I might be able to fix them. Additionally, I was tasked with lightening the burden of cataloguing on the Information Assistants. I prioritised my efforts first on streamlining workflows as these produce tangible results.
For those unfamiliar with MarcEdit, it is free software created and curated by Terry Reese. He initially created it for a metadata project in 1999 and later added a GUI and a whole host of useful functionality.
I will now briefly describe some metadata projects I undertook to improve the legacy data and our workflows.
Authority control is the process of using a set of standards to create an established, unique and consistent form of a term for disambiguation and collocation.
At Aston, there was a scrappy local name index that had been obsolete for 20 years and did not correspond to the NACO forms of names in our vendor and shelf-ready records. My goal was to establish authority control with NACO headings. Every record in the system was in scope for the project, with the exception of our PDA records, as these were not purchased items.
I will describe this process in a little detail as each project follows a similar path.
To begin, I use Sierra's Create Lists function to isolate sets of records that I want to use and save these separately, either on my local machine or in Box (a cloud-based file-sharing site).
I use MarcBreaker, which creates a text-readable form of the MARC records that can be easily manipulated.
I use the Validate Headings function in the text editor and embed URIs, which are my target for this project.
I extract the 1XX and 7XX fields using the “search all” function and put these into Notepad++.
Here is a view of the extracted data. My target in the URI is the LCCN.
I use regular expressions to extract just the LCCN from the URI and then make this searchable via z39.50.
@attr 1=9 will search the 010 MARC field in authority records. I then use MARCEdit z39.50 client to search the NAF (Name Authority File) for these authorities.
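The LCCN extraction step can be sketched in Python. The sample URIs and the exact pattern here are illustrative assumptions; the real work was done with regular expressions in Notepad++.

```python
import re

# Hypothetical sample of embedded authority URIs (id.loc.gov style)
uris = [
    "http://id.loc.gov/authorities/names/n79021164",
    "http://id.loc.gov/authorities/names/no2015123456",
]

# The LCCN is the final path segment: a lowercase prefix plus digits
lccn_pattern = re.compile(r"/([a-z]+\d+)$")

lccns = [m.group(1) for uri in uris if (m := lccn_pattern.search(uri))]
print(lccns)  # -> ['n79021164', 'no2015123456']
```

Each extracted LCCN then goes into the z39.50 search file, queried with @attr 1=9 against the NAF.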
These steps summarise the pattern common to many of the projects: use MarcBreaker to get text records that are easy to manipulate; do something to the records in this form (in this case validating headings); normalise the data using Notepad++ and regular expressions (this step often happens more than once); then use z39.50 or load tables in Sierra to apply the changes.
This project was one to improve the internal housekeeping for our reading list records.
As part of the workflow for adding items to reading lists, module codes for these courses were added to Sierra in a local, indexed field. These were done manually by staff both adding and removing. There was concern that these were out of date and things might be missed. It was determined that this was still a valuable thing to maintain but we wanted to automate it.
Here is a view of the data extracted from Talis. The two pieces of data I'm interested in are the reading list name and code, and the LCN (local control number).
I take these fields and a few others to make an Excel file ready for translating the data from text to MARC. I include the extra fields to make the dummy MARC records friendlier for human readers.
MARCEdit’s delimited text translator can turn tab separated data into MARC records.
Each field (or column) needs an argument so MarcEdit will place each piece of data in the right MARC field.
Once this is done there will be several arguments listed, and these can be more complex than this example.
What I generate at the end is a dummy MARC record. The dummy record contains the reading list code and the record number. Using the record number I can input the reading list code into Sierra.
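A dummy record of this kind can be sketched in MarcBreaker text (.mrk) form. This is only an illustration: the choice of 907 for the Sierra record number and 980 for the module code payload, and the sample data, are my own assumptions, not necessarily the fields used at Aston.

```python
# Hypothetical tab-separated input: (Sierra bib number, module code, list name)
rows = [
    ("b12345678", "BIO101", "Introductory Biology"),
    ("b23456789", "CHE202", "Organic Chemistry"),
]

def dummy_record(bib_number: str, module_code: str, list_name: str) -> str:
    """Build one .mrk-style record holding just the match point and payload."""
    lines = [
        "=LDR  00000nam a2200000 a 4500",
        f"=907  \\\\$a.{bib_number}",                # match point: Sierra bib number
        f"=980  \\\\$a{module_code}$b{list_name}",   # payload: module code + list name
    ]
    return "\n".join(lines)

# Records are separated by a blank line in .mrk files
mrk = "\n\n".join(dummy_record(*row) for row in rows)
print(mrk)
```

A custom load table then matches each dummy record on the bib number and imports only the payload field, ready for Global Update.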
Reclassification is one of the simplest processes I run. In the age of shelf-ready supply it is important to keep collections up to date as new class ranges are formed or discontinued; otherwise this leads to split runs of books on the same topic. I extract the records from Sierra, run MarcEdit's “Generate classification”, and then load the new classmarks back into Sierra into a local field the same way I load the module codes.
The scope of the project is potentially any legacy record that is not as good as it could be for discovery and access. However, to begin with I selected all those records that lacked LCSH. These records had other issues I outlined at the start, such as inaccurate headings (or no headings), missing data and so on. I followed various steps, from normalising the search data to enhancing the records with automatic processes in MarcEdit. It even gave me an opportunity to add some linked data elements to our records.
This summarises the steps I take.
I export the target records from Sierra and put them through MarcBreaker. From here I extract different data points from the records to form a search strategy. This data needs significant normalisation, as metadata transcription practices vary enormously. Once I have the normalised data points I make them searchable via z39.50.
This example shows a sample search for the author (surname only), the ISBN, the title proper and the publication date.
I select sources I have access to where better records are likely to exist, in this case the Library of Congress, and run a custom search. I then analyse the results and see what was accepted and rejected. Sometimes records do exist but the data points don’t match, such as the date or even the title proper. All those that fail to match I save and try another source such as the British Library.
At this point I take the downloaded record and match it to the original Sierra record using the Merge Records function. I insert the Sierra local number into the improved external record and run the MarcEdit automatic transformations, adding FAST headings, making some programmatic RDA changes and adding linked data points.
For all of these projects I have learned the importance of data normalisation. It has made me more aware of consistency in transcription and showed me fairly clearly the limitations of MARC for machine processing.
Documentation is very important: recording what was done and what needs to be done really helps to keep the projects coherent. I place high importance on naming my files as descriptively as possible and, where I can, recording notes or comments.
Another important feature of the projects was using the online file-sharing site Box. It creates a new version every time I save, so any mistakes I make, including when editing MARC records, can be undone if I inadvertently save them.
When dealing with batches of records I find it is vital to break them down into smaller chunks. It adds a bit of time but it makes dealing with the records much more manageable.
And finally, trial and error really does work. I frequently had an idea, tried it out, found it didn't work and amended it until it did.