Automating Controlled Vocabulary Reconciliation
Anna Neatrour
Metadata Librarian
Jeremy Myntti
Interim Head, Digital Library Services
Summary
• Metadata inconsistency
• Overview of vendor authority process
• Further work with OpenRefine
• Next steps
http://www.utahindians.org
Inconsistency
Gosiute Indians
Goshute Indians
Navajo Indians
Navaho Indians
Salt Lake
Salt Lake City
Salt Lake City (Utah)
Bishop, Dail Stapley
Bishop, Dale Stapely
Bishop, Dale Stapley
Beckwith, Frank A. (1876-1951)
Beckwith, Frank Asahel (1876-1951)
Beckwith, Frank A.
Beckwith, Frank A. (1876-1951)
Beckwith, Frank Asahel (1876-1951)
Beckwith, Frank Asahel, 1876-1951
Woven basket or jug;
http://content.lib.utah.edu/cdm/ref/collection/UU_Photo_Archives/id/13887
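Variants like the ones on this slide can be surfaced automatically before any authority matching. The following is a minimal sketch of key-collision clustering, modeled on the fingerprint keying method documented for OpenRefine's clustering feature. The slides do not say the presenters used this technique, and the function names `fingerprint` and `cluster` are illustrative.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a key: ASCII-fold, lowercase, drop punctuation, sort unique tokens."""
    text = unicodedata.normalize("NFKD", value)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^\w\s]", " ", text.lower())
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint; return only groups with variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Sample headings taken from the slide.
names = [
    "Beckwith, Frank A. (1876-1951)",
    "Beckwith, Frank Asahel (1876-1951)",
    "Beckwith, Frank Asahel, 1876-1951",
    "Bishop, Dale Stapley",
    "Bishop, Dale Stapely",
]
clusters = cluster(names)
for group in clusters:
    print(group)
```

This kind of keying only merges headings that differ in punctuation, case, or word order. Misspellings (Stapley, Stapely) and abbreviated forms (Frank A., Frank Asahel) keep distinct keys, which is where authority matching or manual review is still needed.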
Project Timeline
June-Sept. 2012 – Define project
Oct. 2012-May 2013 – Testing
June 2013 – Contracted with Backstage Library Works
June 2013-Feb. 2014 – Continued testing
Feb.-May 2014 – 17 collections processed
June-Aug. 2014 – Manual review (intern)
April 2015-today – Explore OpenRefine
Methodology
<title>A group of St. George (Sibwit) Paiutes and Wickiups (cedar)</title>
<subjec>Paiute Indians; Ute Indians--History; Wickiups; Indians of North America--Dwellings;</subjec>
<covspa>Utah;</covspa>
<descri>A group of people sitting and standing in front of a brush shelter;</descri>
<publis>Digitized by: J. Willard Marriott Library, University of Utah;</publis>
<type>Image;StillImage;</type>
<format>image/jpeg;</format>
http://content.lib.utah.edu/cdm/ref/collection/uaida/id/14697
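To work with a record like the one on this slide, each field has to be split into its individual headings. The sketch below assumes only what the sample shows: simple tag pairs with semicolon-delimited values. The sample text is copied from the slide with closing tags made consistent, and `parse_record` is a hypothetical helper, not part of CONTENTdm or the Backstage process.

```python
import re

# Sample record copied from the slide, with closing tags made consistent.
RECORD = """\
<title>A group of St. George (Sibwit) Paiutes and Wickiups (cedar)</title>
<subjec>Paiute Indians; Ute Indians--History; Wickiups; Indians of North America--Dwellings;</subjec>
<covspa>Utah;</covspa>
<descri>A group of people sitting and standing in front of a brush shelter;</descri>
<type>Image;StillImage;</type>
<format>image/jpeg;</format>
"""

# Match <tag>value</tag> where the closing tag repeats the opening tag name.
FIELD_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_record(text):
    """Return {field nickname: list of values}, splitting values on ';'."""
    fields = {}
    for tag, raw in FIELD_RE.findall(text):
        fields[tag] = [part.strip() for part in raw.split(";") if part.strip()]
    return fields


record = parse_record(RECORD)
for heading in record["subjec"]:
    print(heading)
```

Splitting on the semicolon gives one heading per list entry, which is the unit an authority service or a reconciliation tool would match.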
Backstage: statistics and reports
Unmatched report
Change report
Backstage: standardization
Capitalization, Punctuation, and Updated Authorized Access Points
Forests and Forestry – Utah
forests and forestry -- Utah
Forest lands - Utah
Forests and forestry--Utah
A group of Navajos at Navajo Mountain government school;
http://content.lib.utah.edu/cdm/ref/collection/uaida/id/43551
Backstage: problems encountered
Missing MARC tags
• Names treated as topical headings and vice versa
• Provo => Provisional IRA
Data in wrong fields
• Date: Price Hiram, 1814-1901
Incorrect match
• Local names matching wrong records
• Johnson, Abe is not Johnson, F. T.
Walker War Map 1853-1854;
http://content.lib.utah.edu/cdm/ref/collection/uaida/id/15474
Intern review and clean-up
OpenRefine project
◦ Used UAIDA as a pilot, since it had the greatest number of unmatched names due to the size of the collection (over 8,000 items)
◦ 529 unmatched names after Backstage process
Navajo woman weaving,
http://content.lib.utah.edu/cdm/ref/collection/uaida/id/45379
OpenRefine: two approaches
• Reconciliation process developed by Jenn Wright and Matt Carruthers, University of Michigan Library, https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation
• Reconciliation process developed by Roderic Page, http://iphylo.blogspot.com/2013/04/reconciling-author-names-using-open.html
A group of Navajo children and teenagers,
http://content.lib.utah.edu/cdm/ref/collection/uaida/id/43285
OpenRefine: differences in results
• Both processes found name matches through searching VIAF.
  ◦ Wright and Carruthers’ process looked for a matching LC authority record in the VIAF cluster
    • 81 records were matched, 132 were false matches, and 312 had no match
  ◦ Page’s process matched names to authors in a more general fashion
    • 70 records were matched, 37 were false matches, and 449 had no match.
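One way to compare the two processes is the share of proposed matches that held up on review. The sketch below treats "matched" as confirmed matches and "false matches" as proposed matches that were rejected. That reading of the slide is an assumption, and `precision` is an illustrative helper.

```python
def precision(confirmed, rejected):
    """Share of proposed matches that were confirmed as correct."""
    proposed = confirmed + rejected
    return confirmed / proposed if proposed else 0.0


# Counts as reported on the slide.
results = {
    "Wright and Carruthers": {"matched": 81, "false": 132, "none": 312},
    "Page": {"matched": 70, "false": 37, "none": 449},
}

summary = {
    name: round(100 * precision(counts["matched"], counts["false"]), 1)
    for name, counts in results.items()
}

for name, pct in summary.items():
    print(f"{name}: {pct}% of proposed matches were correct")
```

Under that reading, the Wright and Carruthers process proposed more matches but a smaller share survived review, while the Page process proposed fewer matches with a larger share correct.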
OpenRefine: manual work
• Check matches against collection and discard false matches
OpenRefine: updating UAIDA
• We updated an additional 455 records with updated names.
• 405 matches were from both processes, 38 were unique to Wright and Carruthers, and 5 were matched by the Page process.
Eight Hopi Baskets,
http://content.lib.utah.edu/cdm/ref/collection/uaida/id/45009
OpenRefine: student work
• Fall 2015 – student ran additional unmatched items from other collections through OpenRefine with Wright & Carruthers process
• Metadata librarian currently reviewing student work and updating collections
Next Steps
Create local and regional controlled vocabularies
Next Steps: Reconcile across more collections
• CONTENTdm metadata exported in SOLR
• Easier to get list of personal names across all collections
• Explore other reconciliation methods
Next Steps
URIs in Digital Collections Metadata, MWDL (Primo), and DPLA
http://content.lib.utah.edu/cdm/ref/collection/uaida/id/43183
Questions?
Anna Neatrour | anna.neatrour@utah.edu
Metadata Librarian
Jeremy Myntti | jeremy.myntti@utah.edu
Interim Head, Digital Library Services
Forthcoming article: Use Existing Data First: Reconcile Metadata Before Creating New Controlled Vocabularies. Journal of Library Metadata.
http://dx.doi.org/10.1080/19386389.2015.1099989


Editor's notes

  1. I’m Anna Neatrour, Metadata librarian at the University of Utah Marriott Library. I’m also presenting on behalf of Jeremy Myntti who is Interim Head of Digital Library Services.
  2. I’ll provide an overview of the metadata problems we had, what the vendor supplied authority service did, go over further reconciliation work, and talk about our plans for the future. Throughout this presentation you’ll see examples of items from the Utah American Indian Digital Archive, which was one of the collections we processed.
  3. We’ve been creating digital library collections for a long time. Our existing records had a great deal of inconsistency both for personal names and subjects. Having six different ways of expressing the name of one person is really bad, and leads to problems for users in faceting and discovery.
  4. Project started in 2012, and we’re just moving on to a different stage of it today. 17 of our collections were processed by Backstage Library Works, and we did additional review and post processing.
  5. Processing digital collections XML files is similar to existing processes developed at Backstage for MARC records for names and subjects. We had previously done other projects at our library where we changed raw CONTENTdm metadata. Backstage ran their automated authority control on the CONTENTdm desc.all file, which is how metadata is stored internally in the system. This is different from the file you get if you export metadata as a collection manager in CONTENTdm. If you have hosted CONTENTdm you cannot do this. Skip: Extracting data from CONTENTdm: (1) stop updates to the collection and make it read-only; (2) make a copy of the desc.all metadata file for backup; (3) run the desc.all file through AC processing from Backstage; (4) replace the desc.all file on the CONTENTdm server; (5) run the full collection index; (6) remove read-only status from the collection.
  6. We got reports on matches, non-matches, and changes back from Backstage. The matched headings report also included URIs for id.loc.gov and VIAF in some cases, so we have URIs we can use if we want to express our information as linked data in the future. Skip: For UAIDA: Creator/Contributor names (7033): 10% changed (669), 48% matched (3342). Subjects (98931): 21% changed (21072), 76% matched (75471).
  7. In addition to matching authorities, Backstage helped with fixing punctuation problems. Here’s an example of three poorly punctuated subject heading variants that got fixed. Skip: Punctuation variants: space double dash space; word em dash; single dash. Convert all to double dash. Each collection may be different, so watch out for which ways to standardize (single dash may not need to be converted in some collections). Capitalization variants: capitalizing every word; capitalizing nothing. Convert to the correct capitalization according to LCSH. Older forms of names used (pre-RDA). Cross references used rather than the authorized access point. This happened a lot because of training issues: for a few years, many students didn’t realize that the correct form of an access point comes from the 1xx field in the authority record. “Access points” = headings.
  8. Backstage makes certain assumptions about the headings. Part of that is shunting data into a topical subject heading as opposed to a geographic, personal, or corporate name heading. See the sample of matching issues on this slide. Skip: So “Provo” goes in as a 650 field; our system performs the match as a subject heading, so we find an authority where the 110 field is “Provisional IRA” and the 410 field is “Provo”. With “Cars” we searched this as a generic name heading and lopped off the date of 2002, so we found the conference heading instead. We found that we need to be careful with single-word headings in CDM. In fact, our recommendation might be to not search those at all or, if we do, to report them rather than update them.
  9. Backstage made 85,000 changes in our digital collections. There were 200 problems in the data they changed, which were fixed after review. Some collections needed a lot of manual review. An intern reviewed collections for 3 months and fixed nearly 2000 mistakes (mostly from the metadata rather than the authority control process). The process pointed out the need for training to encourage more consistency (use correct access points; standardize punctuation, spacing, capitalization, date formats, etc.; field usage; NACO or local auths?).
  10. Used Utah American Indian Digital Archive first since it was a large collection. Wanted to do further reconciliation for the names Backstage wasn’t able to match and see if we could enhance the collections even more.
  11. We tested out two different approaches to matching personal names to see what would work best.
  12. Found better results with matching just against LC name authorities, as would be expected since the collection we were working on was a regional Utah collection.
  13. We have no Carlos Santana materials in this collection! Manually reviewed matches in OpenRefine.
  14. Updated the desc.all file locally and reindexed the collection to get these changes live. At the end of this we combined all the matched names, and did a further update of the Utah American Indian Digital Archive desc.all file. Skip: Wright and Carruthers’ process had 262 undo/redo actions in the OpenRefine project. Page’s reconciliation process resulted in 424 undo/redo actions.
  15. Updating manually if only a few changes, but we can also script against the desc.all as we did with UAIDA if any of the other collections have extensive updates.
  16. Create NACO records for notable people. Investigate local controlled vocabularies for more regional personal names.
  17. A few weeks ago, our metadata for our CONTENTdm collections was dumped into SOLR. Expect it will be much easier to work with. Building unified list of creators and contributors. Want to explore additional means of reconciliation.
  18. Put URIs in digital collections metadata, test how they appear in our repository, in MWDL, and DPLA. We have been putting geonames URIs in our metadata, but not personal name URIs yet. Could easily add the URIs at a future date when our repository can do more with them. Have several collections now where we have confidence in our metadata being cleaned up and matched to authorities, so this is a great base as we want to explore using Linked Data more.
  19. We have an article coming out in the Journal of Library Metadata that goes into greater detail about what I presented here today. Happy to take your questions.
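
The separator cleanup described in note 7 (converting spaced double dashes and typographic dashes to a plain double hyphen) can be sketched as follows. This is an illustrative assumption about how such a rule might be written, not Backstage's actual processing, and `normalize_separators` is a hypothetical name. Single spaced hyphens are converted only on request, since note 7 warns that some collections may use them legitimately.

```python
import re

# A double hyphen, en dash (U+2013), or em dash (U+2014), with optional surrounding spaces.
_DASH_VARIANTS = re.compile(r"\s*(?:--|\u2013|\u2014)\s*")
# A single hyphen with whitespace on both sides; hyphens inside words or dates are untouched.
_SPACED_HYPHEN = re.compile(r"\s+-\s+")


def normalize_separators(heading, convert_single_hyphen=False):
    """Return the heading with subdivision separators rewritten as '--'."""
    result = _DASH_VARIANTS.sub("--", heading.strip())
    if convert_single_hyphen:
        result = _SPACED_HYPHEN.sub("--", result)
    return result


if __name__ == "__main__":
    for raw in ("Forests and Forestry \u2013 Utah", "forests and forestry -- Utah"):
        print(normalize_separators(raw))
    print(normalize_separators("Forest lands - Utah", convert_single_hyphen=True))
```

The sketch fixes separators only. Correct LCSH capitalization and replacing a cross reference such as "Forest lands" with the authorized heading both require a lookup against the authority file.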