3. Why?
Metadata problems when collections are aggregated:
• Savage, C. R. (Charles Roscoe), 1832-1909
• Savage, C. R.(Charles Roscoe),1832-1909
• Savage, C.R. (Charles Roscoe)
• Savage, Charles R.
• C. R. Savage (Charles Roscoe Savage and George Ottinger), Pioneer Art Gallery, East Temple Street, Salt Lake City, Utah
• Charles R. Savage
• Savage, Charles Roscoe
• C. R. (Charles Roscoe) Savage, photographer
• Savage, C. R.
• Charles R.
Charles R. Savage
BYU Digital Collections
http://contentdm.lib.byu.edu/cdm/ref/collection/Savage2/id/1749
Charles R. Savage, perplexed by his many name variants.
5. IMLS Grant for regional authority control
4 phases:
1) Investigation of data models to express local/regional name authority data using linked data standards
2) Evaluation of tools used for creating, maintaining, and making this data available
3) Pilot implementation using the tools investigated in the second phase
4) Assessment of how this type of authority data can improve digital collection metadata on a local, regional, and national level
Thompson, Mickey; Salt Flats – Shot 24
https://collections.lib.utah.edu/details?id=694074
7. Libraries Working Together
Metadata Review
Received data from 8 institutions
• Brigham Young University
• Oregon Digital (Univ. of Oregon and Oregon State Univ.)
• University of Nevada, Reno
• University of Utah
• University of Denver
• Utah Department of Heritage and Arts
• Utah State Archives
• Utah State University
Data gathered
• Name (including associated dates)
• Institution
• Collection
• Field(s)
• Type (personal name or corporate body)
• Cross references
Mormon Meteor – Shot 6
https://collections.lib.utah.edu/details?id=608551
9. Phase 1: metadata wrangling (data metrics)
• Started with ~500,000 names
• Deduplicated to 76,360
• 7,357 used in 2+ collections (9.6%)
• 1,484 used in 2+ institutions (1.9%)
• 271 used in 2+ states (0.35%)
• 62,381 personal names
• 10,706 corporate bodies
• 3,273 unknown
• 1,091 are single words
• 2,400+ cross references
• 500+ personal names are First Last (rather than Last, First)
Total names submitted
• Brigham Young University - 30,535
• University of Utah - 7,533
• Utah State University - 2,067
• Utah State Historical Society - 12,138
• Utah State Archives - 3,657
• University of Nevada, Reno - 1,277
• Oregon Digital - 4,170
• University of Denver - 16,608
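The deduplication and overlap metrics on this slide can be illustrated with a small sketch. It is not the project's actual script: the sample rows, the column order, and the function names `dedupe` and `overlap_counts` are assumptions made for illustration, and the duplicate key (name, institution, collection, field) follows the description in the speaker notes.

```python
from collections import defaultdict

# Hypothetical sample rows: (name, institution, state, collection, field).
rows = [
    ("Savage, C. R.", "BYU", "UT", "Savage2", "creator"),
    ("Savage, C. R.", "BYU", "UT", "Savage2", "creator"),  # exact duplicate
    ("Savage, C. R.", "UU", "UT", "Photos", "subject"),
    ("United States", "DU", "CO", "Maps", "subject"),
    ("United States", "Oregon Digital", "OR", "Docs", "subject"),
    ("Savage, Charles R.", "USU", "UT", "Portraits", "creator"),
]


def dedupe(rows):
    """Drop rows that repeat the same name, institution, collection, and field."""
    seen = set()
    unique = []
    for name, inst, state, coll, field in rows:
        key = (name, inst, coll, field)
        if key not in seen:
            seen.add(key)
            unique.append((name, inst, state, coll, field))
    return unique


def overlap_counts(rows):
    """Count distinct name strings used in 2+ collections, institutions, and states."""
    colls = defaultdict(set)
    insts = defaultdict(set)
    states = defaultdict(set)
    for name, inst, state, coll, _field in rows:
        colls[name].add((inst, coll))  # a collection is identified within its institution
        insts[name].add(inst)
        states[name].add(state)

    def shared(index):
        return sum(1 for values in index.values() if len(values) >= 2)

    return {
        "names": len(colls),
        "collections": shared(colls),
        "institutions": shared(insts),
        "states": shared(states),
    }


unique_rows = dedupe(rows)
print(len(unique_rows), overlap_counts(unique_rows))
```

Because matching is on the exact string entered, "Savage, C. R." and "Savage, Charles R." count as different names here, which is why the overlap figures are expected to change as variants are reconciled.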
13. Data Models: Collaborative Investigation
Data to capture
• Preferred form of name
• Alternate forms of name
• Local authority source
• Institution holdings
• Relationship information
Data Models Explored
• SKOS
• OWL
• BIBFRAME Authorities/Agent/Role
• EAC-CPF – look to SNAC as a model
http://socialarchive.iath.virginia.edu/
Campbell, Donald; Salt Flats – Shot 2
https://collections.lib.utah.edu/details?id=691549
15. Stats script leveraging DPLA API
Search results in December 2016

                                            Creator         Subject
Name                                        MWDL    DPLA    MWDL    DPLA
Savage, C. R.                               2022    2226      44      50
Savage, C. R. (Charles Roscoe)              1830    2023      43      44
Savage, C. R. (Charles Roscoe), 1832-1909   1708    1901      43      43
Savage, C. R., 1832-1909                    1708    1901      43      43
Savage, Charles                             1936    2165      60     126
Savage, Charles R.                          1932    2128      58      65
Savage, Charles R., 1832-1909               1708    1902      43      43
Savage, Charles Roscoe                      1833    2034      44      53
Savage, Charles Roscoe, 1832-1909           1708    1904      43      43
Savage, Charles, 1832-1909                  1708    1905      43      43
Savage, Chas. R.                               1       1       0       1
Savage, Chas. R., 1832-1909                    0       0       0       0
SQLite database; the Python Requests library is used to query the DPLA API and gather stats on coverage for name variants in MWDL and DPLA.
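A minimal sketch of such a stats script follows. It is not the project's code. The endpoint `https://api.dp.la/v2/items`, the field names `sourceResource.creator` and `sourceResource.subject.name`, the `provider.name` filter, and the `count` key in the response are taken from the public DPLA API documentation as I understand it and should be checked against the current docs. The table name `name_stats`, the function names, and the exact `provider.name` value for MWDL are hypothetical.

```python
import sqlite3

DPLA_ITEMS_URL = "https://api.dp.la/v2/items"
MWDL = "Mountain West Digital Library"  # assumed provider.name value for the MWDL hub
FIELDS = {
    "creator": "sourceResource.creator",
    "subject": "sourceResource.subject.name",
}


def dpla_count(name, api_field, api_key, provider=None):
    """Return the number of DPLA items matching `name` in `api_field`."""
    import requests  # imported here so the storage code below also runs offline

    params = {api_field: name, "page_size": 1, "api_key": api_key}
    if provider:
        params["provider.name"] = provider
    resp = requests.get(DPLA_ITEMS_URL, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["count"]


def record_stats(conn, names, count_fn, run_date):
    """Store one hit count per name, field (creator/subject), and scope (MWDL/DPLA)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS name_stats "
        "(run_date TEXT, name TEXT, field TEXT, scope TEXT, hits INTEGER)"
    )
    for name in names:
        for label, api_field in FIELDS.items():
            for scope, provider in (("MWDL", MWDL), ("DPLA", None)):
                hits = count_fn(name, api_field, provider)
                conn.execute(
                    "INSERT INTO name_stats VALUES (?, ?, ?, ?, ?)",
                    (run_date, name, label, scope, hits),
                )
    conn.commit()
```

With a real API key, a baseline run could look like `record_stats(sqlite3.connect("stats.db"), names, lambda n, f, p: dpla_count(n, f, API_KEY, p), "2016-12")`. Passing the counting function in as an argument keeps the storage step testable without network access, and storing `run_date` lets a later run be compared against the baseline.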
16. Workflow considerations - 2nd year
• Managing vocabulary
• Reconciliation issues
• Many partners on CONTENTdm
• Need external system not tied to specific digital library software/infrastructure
• Explore impact of WNAF on metadata creators and users
Thompson, Mickey; Salt Flats – Shot 11
Utah Department of Heritage and Arts, Salt Lake Tribune Negative Collection
https://collections.lib.utah.edu/details?id=694077
17. Further reading and acknowledgements
Full Grant Narrative
https://www.imls.gov/grants/awarded/LG-72-16-0002-16
Project Webpage
https://sites.google.com/site/westernnameauthorityfile/home
This project was made possible in part by the Institute of Museum and Library Services LG-72-16-0002-16.
Editor's notes
Today I’m going to present on our pilot project to establish a regional vocabulary for names and corporate bodies.
Along the way, I’m also going to highlight a new collection of Salt Lake Tribune negatives from the Utah Department of Heritage and Arts, one of the University of Utah’s partners.
Phases of the project: reviewing data models, evaluating tools, piloting the tools and/or framework developed, and assessment
Partners we received data from. Note that the group is very Utah-centric, although Nevada, Colorado, and Oregon are also represented.
JM
Started with over 500,000 names
Deduplicated to 76,360 (deduplicated on full name as entered, institution, collection, and field)
Much more deduplication still needs to take place (some of it manual work that will wait for our graduate student assistant)
I expect the following stats to improve once more deduplication is done
7,357 used in more than one collection/field (9.6%)
  13 used in more than 20 collections/fields
  80 used in more than 10 collections/fields
  6,795 used in 2-5 collections/fields
1,484 used in more than one institution (1.9%)
  1,360 in two institutions
  110 in three institutions
  11 in four institutions
  3 in five institutions
271 used in more than one state (0.35%)
  267 in two states
  4 in three states
62,381 personal names
10,706 corporate bodies
3,273 unknown
1091 are single words (could be PN or CB or only family name)
1922 cross references (this will go up as the list is further refined and deduplicated)
~70 variations on C.R. Savage
424 variations on Shipler Commercial Photographers
500+ personal names are First Last (rather than Last, First)
many generic names shared between Colorado and Oregon
other common names include United States, US Army Corps of Engineers, Ronald Reagan, John Fremont, William Shakespeare, George Washington, Harry S. Truman, William Howard Taft (and other similar names that would be in LCNAF)
Total names submitted
BYU - 30,535
UU - 7533
USU - 2067
USHS - 12,138
UtSA - 3657
UNR - 1277
Oregon - 4170
DU - 16,608
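The remaining manual deduplication (the roughly 70 C. R. Savage variations, and First Last versus Last, First order) could be supported by grouping candidate variants for human review. The sketch below is one crude heuristic, not the project's method: the function names and the (surname, first initial) key are assumptions made for illustration.

```python
import re
from collections import defaultdict


def match_key(name):
    """Return a crude (surname, first initial) key for grouping likely variants."""
    s = re.sub(r"\([^)]*\)", " ", name)            # drop parenthetical fuller forms
    s = re.sub(r"\b\d{4}\s*-\s*(\d{4})?", " ", s)  # drop life dates such as 1832-1909
    if "," in s:
        surname, _, rest = s.partition(",")        # treat as "Last, First" order
    else:
        parts = s.split()                          # treat as "First Last" order
        if not parts:
            return ("", "")
        surname, rest = parts[-1], " ".join(parts[:-1])
    surname = re.sub(r"[^a-z]", "", surname.lower())
    initial = re.search(r"[a-z]", rest.lower())
    return (surname, initial.group(0) if initial else "")


def group_variants(names):
    """Group name strings that share the same match key."""
    groups = defaultdict(list)
    for name in names:
        groups[match_key(name)].append(name)
    return dict(groups)


variants = [
    "Savage, C. R. (Charles Roscoe), 1832-1909",
    "Savage, C. R.(Charles Roscoe),1832-1909",
    "Savage, Charles R.",
    "Charles R. Savage",
    "Savage, C. R.",
    "Charles R.",
]
for key, members in group_variants(variants).items():
    print(key, len(members))
```

The key is deliberately loose, so it would also merge different people who share a surname and first initial, and it misses strings such as "C. R. (Charles Roscoe) Savage, photographer" that combine direct order with a trailing role. Its output is a list of candidates for review, not a set of confirmed matches.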
AN - We are using "tools" broadly: we are evaluating solutions knowing that we will probably end up with something closer to a framework than a single product. For example, we might store names and related information in one software solution but then need to export and transform the data for public display.
AN - When we started, we cast a very wide net to open up all possibilities for testing. Some tools we initially considered have already been ruled out, with comments recorded in the evaluation matrix.
Reviewed lots of documentation about several data models
Due to the scope of the project and our goals (along with our partners' goals), decided on EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons, and Families)
Reviewing EAC fields we want to include now and fields to implement in the future
How to express this as RDF linked data
Creating cheat sheet for entering data we want to capture
Using a document prepared by Gina Strack at Utah State Archives as a starting point
Fields include:
Identity
Alternate Names
Multiple identities
Official Name
Authorized form
Parallel names
Relationships (in future)
Context (in future)
Resources
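To make the identity fields above concrete, here is a hedged sketch that builds an EAC-CPF `identity` fragment with Python's standard library. The element names (`identity`, `entityType`, `nameEntry`, `part`, `authorizedForm`, `alternativeForm`) and the namespace come from the EAC-CPF 2010 schema. The function name and the `"local"` rules value are hypothetical, and a complete record would also need a `control` section, which is omitted here.

```python
import xml.etree.ElementTree as ET

EAC_NS = "urn:isbn:1-931666-33-4"  # namespace of the EAC-CPF 2010 schema
ET.register_namespace("", EAC_NS)  # serialize without a namespace prefix


def identity_fragment(entity_type, authorized, alternates, rules="local"):
    """Build an <identity> element with one authorized and several alternate names.

    `rules` names the convention the forms follow; "local" is a placeholder.
    """
    def child(parent, tag, text=None):
        node = ET.SubElement(parent, f"{{{EAC_NS}}}{tag}")
        if text is not None:
            node.text = text
        return node

    identity = ET.Element(f"{{{EAC_NS}}}identity")
    child(identity, "entityType", entity_type)

    entry = child(identity, "nameEntry")
    child(entry, "part", authorized)
    child(entry, "authorizedForm", rules)

    for alt in alternates:
        entry = child(identity, "nameEntry")
        child(entry, "part", alt)
        child(entry, "alternativeForm", rules)
    return identity


fragment = identity_fragment(
    "person",
    "Savage, C. R. (Charles Roscoe), 1832-1909",
    ["Savage, Charles R.", "Charles R. Savage"],
)
print(ET.tostring(fragment, encoding="unicode"))
```

Keeping each variant in its own `nameEntry` preserves the strings partners actually used, which is what a later metadata remediation step would need to match against.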
Baseline statistics on current name representation; we will gather stats again at the end of the project
Will likely be building a prototype framework rather than finding one tool out there that will do everything we want it to, from managing names collaboratively, to expressing the data in EAC-CPF, to developing some workflows for metadata remediation for partners.