The magic of MarcEdit, or, how I learned to stop worrying and love metadata
By Will Peaden
w.peaden@aston.ac.uk
Introduction: Aston metadata
Card catalog
image by Megan
Amaral (CC BY 2.0)
History
► Library restructure in 1995
► Individual specialist roles dissolved and each
professional member of staff given many hats
► No metadata/cataloguing specialist from that
time until November 2016
► Small group of staff do their best to catalogue
in the interim
The state of the catalogue
► Authority control
was lacking
► Many fields missing
or incorrect
► Local subject index
► Local subject
headings
► No LCSH in some
records
► Hybrid e-book and
print book records
► Split multi-volume
works
MarcEdit
► First created in 1999 to
enable a data clean-up
project at Oregon State
University.
► Developed by Terry Reese
and updated by him
regularly
► Offered as a free download
► Has an enormous array of
functionality built into it
Metadata projects
Authority control
Module codes
Reclassification
Metadata enhancement
Authority control
► Authority control: established, unique,
consistent forms of terms for
disambiguation and collocation
► Project scope: to authorise the name and
subject headings in Sierra
► All records were in scope except PDA
(patron-driven acquisition) records, as these
had not been purchased
Data extraction and manipulation
1. Extract records in scope from Sierra and
save them locally
2. Use MarcBreaker
3. Validate name headings and embed URIs
4. Extract 1XX, 7XX headings and URIs and
copy to Notepad++
5. Use regular expressions to extract just the
LCCN
6. Make the LCCNs searchable via Z39.50
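Steps 4 and 5 above could be sketched roughly as follows. This is a minimal illustration, not the exact expressions used in the project: it assumes the records are in MarcBreaker mnemonic form, that the URIs were embedded in subfield $0, and that they point at id.loc.gov name authorities. The sample headings and identifiers are made up.

```python
import re

# Made-up sample in MarcBreaker (mnemonic) form; the $0 placement is an assumption.
SAMPLE_MRK = """\
=100  1\\$aExample, Author.$0http://id.loc.gov/authorities/names/n00000001
=245  10$aAn example title /$cAuthor Example.
=700  1\\$aExample, Editor.$0http://id.loc.gov/authorities/names/no00000002
=710  2\\$aUnmatched Corporate Body.
"""

# Only 1XX and 7XX fields are of interest here.
FIELD_RE = re.compile(r"^=(?:1|7)\d\d\s")
# Capture the identifier that ends an id.loc.gov name authority URI.
LCCN_RE = re.compile(r"\$0https?://id\.loc\.gov/authorities/names/([a-z]{1,2}\d+)")


def extract_lccns(mrk_text):
    """Return the distinct LCCNs found in 1XX/7XX fields, in order of appearance."""
    lccns = []
    for line in mrk_text.splitlines():
        if not FIELD_RE.match(line):
            continue  # skip anything that is not a 1XX or 7XX field
        for lccn in LCCN_RE.findall(line):
            if lccn not in lccns:
                lccns.append(lccn)
    return lccns


print(extract_lccns(SAMPLE_MRK))
```

Headings without a URI (the 710 in the sample) simply produce nothing, so they can be followed up separately. The resulting list could then be written out one LCCN per line as a search file.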
To sum up
► MarcBreaker
► Validate headings
► Normalise data for searching
► Z39.50
Module codes
► Project scope: update all reading list items in
Sierra with the current course code
► Concerns that these were out of date
► Codes in Sierra still important for current
workflows
Data from Talis
Excel to MARC records
Delimited Text Translator
Arguments
Dummy MARC records
Finishing
► Load dummy MARC records using custom
load table
► Matches on bibliographic number
► Only importing 980 field
► Use Sierra Global Update function to
update 900 module code field
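The talk used MarcEdit's Delimited Text Translator for the spreadsheet-to-MARC step. For readers who want to see what the resulting dummy records might look like, here is a small scripted sketch of the same idea. The column names, the sample values, and the use of 907 $a for the bibliographic number are assumptions for illustration only; 980 is the field named on the slide.

```python
import csv
import io

# Hypothetical spreadsheet export; column names and values are invented.
SAMPLE_CSV = """bib_number,module_code
b10000011,AB1234
b10000023,CD5678
"""


def rows_to_mrk(csv_text, id_tag="907", code_tag="980"):
    """Build minimal mnemonic (.mrk style) records: a match point plus a module code."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        bib = row["bib_number"].strip()
        code = row["module_code"].strip()
        if not bib or not code:
            continue  # skip incomplete rows rather than emit a half-empty record
        records.append(
            "=LDR  00000nam  2200000 a 4500\n"  # placeholder 24-character leader
            f"={id_tag}  \\\\$a{bib}\n"          # assumed match point field
            f"={code_tag}  \\\\$a{code}\n"       # module code to be loaded
        )
    # A blank line separates records in mnemonic files.
    return "\n".join(records)


print(rows_to_mrk(SAMPLE_CSV))
```

The output would still need compiling to binary MARC (for example with MarcEdit's MarcMaker) before loading.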
Reclassification
► Some areas of the collection classified to an old
standard
► Split collections with shelf-ready records
► Too many to individually reclassify
► MarcEdit function “Generate classification” based
on OCLC's Classify
► Project Scope: 301-307, just over 7000 titles
► Import the classification the same way as module
codes via local field 982
Pros
► Tool is fast and
easy to use
► Lots of extra
functionality such
as FAST headings
► Accurate, up-to-date
classification
(mostly)
…and cons
► It relies on ISBN
and author/title
matching
► Some errors
► Some things simply
not found
Metadata enhancement
► Project scope: improve the metadata of Aston legacy
records, starting with records lacking LCSH, by
fishing for records
► Data preparation
► Z39.50 search
► Data enhancement
► RDA
► Linked data?
Normalize data, Z39.50 searching
► Export target records from Sierra
► Search for and extract data points such as
ISBN, title, main author, date of publication
► Normalize data, e.g. remove fluff from 020s,
use only title proper, fixed dates, use
surnames only
► Make these data searchable via Z39.50
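As an illustration of the normalization step, the helpers below show one way the clean-up might be scripted. They are a sketch under assumptions, not the method used in the project: the function names are invented, "fluff" is taken to mean qualifiers and punctuation after the ISBN, and a date is reduced to its first four-digit year.

```python
import re


def normalize_isbn(field_020a):
    """Keep only the ISBN, dropping qualifiers such as '(pbk.)' and hyphens."""
    match = re.match(r"\s*([0-9][0-9\-]*[0-9Xx])", field_020a)
    if not match:
        return ""
    return match.group(1).replace("-", "").upper()


def title_proper(field_245a):
    """Strip trailing ISBD punctuation from the title proper (245 $a)."""
    return field_245a.strip().rstrip(" /:;,.=").strip()


def surname_only(field_100a):
    """Return the part of a personal name heading before the first comma."""
    return field_100a.split(",")[0].strip().rstrip(".")


def first_year(date_text):
    """Pull the first four-digit year out of a date statement, if there is one."""
    match = re.search(r"\d{4}", date_text)
    return match.group(0) if match else ""


print(normalize_isbn("0-19-852663-6 (pbk.) :"))
print(title_proper("An example title /"))
print(surname_only("Example, Author,"))
print(first_year("[1995?]"))
```

Stripping a trailing full stop will also clip titles that legitimately end in an abbreviation, so the results are worth spot-checking before building search files from them.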
Search for records, analyse results
Transformation and match points
Lessons learned and concluding remarks
► Data normalization takes time and is important
► Take care to document everything and make your
file metadata clear
► Using Box really helped with reversing mistakes
► Search files need to be manageable (probably no
bigger than 1000 records)
► Trial and error and Google are your friends
► Questions?

 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 

The magic of MarcEdit, or, how I learned to stop worrying and love metadata / Will Peaden

  • 1. The magic of MarcEdit, or, how I learned to stop worrying and love metadata By Will Peaden w.peaden@aston.ac.uk
  • 2. Introduction: Aston metadata Card catalog image by Megan Amaral (CC BY 2.0)
  • 3. History ► Library restructure in 1995 ► Individual specialist roles dissolved and each professional member of staff given many hats ► No metadata/cataloguing specialist from that time until November 2016 ► Small group of staff did their best to catalogue in the interim
  • 4. The state of the catalogue ► Authority control was lacking ► Many fields missing or incorrect ► Local subject index ► Local subject headings ► No LCSH in some records ► Hybrid e-book and print book records ► Split multi-volume works
  • 5. MARCEdit ► First created in 1999 to enable a data clean-up project at Oregon State University. ► Developed by Terry Reese and updated by him regularly ► Offered as a free download ► Has an enormous array of functionality built into it
  • 6. Metadata projects Authority control Module codes Reclassification Metadata enhancement
  • 7. Authority control ► Authority control: established, unique, consistent forms of terms for disambiguation and collocation ► Project scope: to authorise the name and subject headings in Sierra ► All records were in scope except PDA records as these were not purchased
  • 8. Data extraction and manipulation 1. Extract records in scope from Sierra and save them locally 2. Use MarcBreaker 3. Validate name headings and embed URIs
  • 9. Data extraction and manipulation 1. Extract records in scope from Sierra and save them locally 2. Use MarcBreaker 3. Validate name headings and embed URIs
  • 10. Data extraction and manipulation 1. Extract records in scope from Sierra and save them locally 2. Use MarcBreaker 3. Validate name headings and embed URIs 4. Extract 1XX, 7XX headings and URIs and copy to Notepad++
  • 11. Data extraction and manipulation 1. Extract records in scope from Sierra and save them locally 2. Use MarcBreaker 3. Validate name headings and embed URIs 4. Extract 1XX, 7XX headings and URIs and copy to Notepad++
  • 12. Data extraction and manipulation 1. Extract records in scope from Sierra and save them locally 2. Use MarcBreaker 3. Validate name headings and embed URIs 4. Extract 1XX, 7XX headings and URIs and copy to Notepad++ 5. Use regular expressions to extract just the LCCN 6. Make the LCCNs searchable via z39.50
  • 13. Data extraction and manipulation 1. Extract records in scope from Sierra and save them locally 2. Use MarcBreaker 3. Validate name headings and embed URIs 4. Extract 1XX, 7XX headings and URIs and copy to Notepad++ 5. Use regular expressions to extract just the LCCN 6. Make the LCCNs searchable via z39.50
  • 14. To sum up ► MarcBreaker ► Validate headings ► Normalise data for searching ► z39.50
  • 15. Module codes ► Project scope: Update all reading list items in Sierra with the current course code ► Concerns that these were out of date ► Codes in Sierra still important for current workflows
  • 17. Excel to MARC records
  • 23. Finishing ► Load dummy MARC records using custom load table ► Matches on bibliographic number ► Only importing 980 field ► Use Sierra Global Update function to update 900 module code field
  • 24. Reclassification ► Some areas of the collection classified to an old standard ► Split collections with shelf-ready records ► Too many to individually reclassify ► MarcEdit function “Generate classification” based on OCLC's Classify ► Project scope: 301-307, just over 7000 titles ► Import the classification the same way as module codes via local field 982
  • 25. Pros ► Tool is fast and easy to use ► Lots of extra functionality such as fast headings ► Accurate up to date classification (mostly) …and cons ► It relies on ISBN and author/title matching ► Some errors ► Some things simply not found
  • 26. Metadata enhancement ► Project scope:  Improve the metadata of Aston legacy records starting with records lacking LCSH by fishing for records ► Data preparation ► z39.50 search ► Data enhancement ► RDA ► Linked data?
  • 27. Normalize data, z39.50 searching ► Export target records from Sierra ► Search for and extract data points such as ISBN, title, main author, date of publication ► Normalize data e.g. remove fluff from 020s, use only title proper, fixed dates, use surnames only ► Make these data searchable via z39.50
  • 29. Search for records, analyse results
  • 31. Lessons learned and concluding remarks ► Data normalization takes time and is important ► Take care to document everything and make your file metadata clear ► Using Box really helped with reversing mistakes ► Search files need to be manageable (probably no bigger than 1000 records) ► Trial and error and Google are your friends ► Questions?
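
Steps 5 and 6 of the authority control workflow (slides 12 and 13) can be sketched in Python. This is a minimal illustration under stated assumptions, not the expressions actually used in the project: it assumes the embedded URIs have the form `http://id.loc.gov/authorities/names/<LCCN>`, the function names are hypothetical, and the sample headings and LCCNs are made up. The `@attr 1=9` prefix is the Bib-1 use attribute for the LC card number, which the notes mention for searching the 010 field; the exact layout MarcEdit expects for a batch search file is not shown in the source.

```python
import re

# Assumed URI shape: http://id.loc.gov/authorities/names/n00000001
LCCN_IN_URI = re.compile(r"id\.loc\.gov/authorities/names/([a-z]+\d+)")


def extract_lccns(lines):
    """Return the unique LCCNs found in the lines, in first-seen order."""
    seen = {}
    for line in lines:
        for lccn in LCCN_IN_URI.findall(line):
            seen.setdefault(lccn, None)  # dicts keep insertion order
    return list(seen)


def to_pqf(lccn):
    """Build a PQF-style query on Bib-1 use attribute 9 (LC card number)."""
    return f'@attr 1=9 "{lccn}"'


# Made-up heading lines in MarcBreaker-style text, with placeholder LCCNs.
sample = [
    r"=100  1\$aExample, Author,$d1900-1980.$0http://id.loc.gov/authorities/names/n00000001",
    r"=700  1\$aSample, Editor.$0http://id.loc.gov/authorities/names/no00000002",
    r"=100  1\$aExample, Author,$d1900-1980.$0http://id.loc.gov/authorities/names/n00000001",
]

lccns = extract_lccns(sample)
queries = [to_pqf(lccn) for lccn in lccns]
```

Removing duplicates before searching keeps the search file small, since the same heading usually appears on many records.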

Editor's notes

  1. In 1995 Aston Library was restructured. The main outcome of this restructure was to dissolve the different specialist roles within the library. Each member of professional staff in the Information Resources division had multiple roles and in theory did a bit of everything. Cataloguing and metadata were relegated to one member of staff who “minded” the catalogue in this period. A small group of non-professional staff did their best to keep the catalogue together through this period. Aston adopted basically full shelf-ready in 2008, and the quality of records improved from that time.
  2. This is not a fully comprehensive list of the issues with the catalogue but it gives a good impression. As I discovered these issues, I began to work out ways that I might be able to fix them. Additionally, I was tasked with lightening the burden of cataloguing on the Information Assistants. I prioritised my efforts first on streamlining workflows as these produce tangible results.
  3. For those unfamiliar with MARCEdit, it is free software created and curated by Terry Reese. He initially created it for a metadata project in 1999 and later added a GUI and a whole host of useful functionality.
  4. I will now briefly describe some metadata projects I undertook to improve the legacy data and our workflows.
  5. Authority control is the process of using a set of standards to create an established, unique and consistent form of a term for disambiguation and collocation. At Aston, there was a scrappy local name index that had been obsolete for 20 years and did not correspond to the NACO forms of names in our vendor and shelf-ready records. My goal was to establish authority control with NACO headings. Every record in the system was in scope for the project with the exception of our PDA records, as these were not purchased items.
  6. I will describe this process in a little detail as each project follows a similar path. To begin, I use Sierra's create list function to isolate the sets of records that I want to work on and save these separately, either on my local machine or in Box (a cloud-based file sharing site). I then use MarcBreaker, which creates a text-readable form of the MARC records that can be easily manipulated. Finally, I use the Validate headings function in the text editor to embed URIs, which are my target for this project.
  7. I extract the 1xx and 7xx fields using the “search all” function and put these into Notepad++.
  8. Here is a view of the extracted data. My target in the URI is the LCCN.
  9. I use regular expressions to extract just the LCCN from the URI and then make this searchable via z39.50.
  10. @attr 1=9 will search the 010 MARC field in authority records. I then use the MARCEdit z39.50 client to search the NAF (Name Authority File) for these authorities.
  11. These steps summarise the process in many of the projects: using MARCBreaker to get text records that are easy to manipulate; doing something to the records in this form (in this case validating headings); normalising data using Notepad++ and regular expressions (this step often happens more than once); and using z39.50 or load tables in Sierra to apply the changes.
  12. This project was one to improve the internal housekeeping for our reading list records. As part of the workflow for adding items to reading lists, module codes for these courses were added to Sierra in a local, indexed field. Staff both added and removed these manually. There was concern that the codes were out of date and that things might be missed. It was determined that this was still a valuable thing to maintain but we wanted to automate it.
  13. Here is a view of the data extracted from Talis. The two pieces of data I’m interested in are the reading list name and code and the LCN (local control number).
  14. I take these fields and a few others to make an Excel file ready for translating data from text to MARC. I include the extra fields to make the dummy MARC records easier for a human to read.
  15. MARCEdit’s delimited text translator can turn tab separated data into MARC records.
  16. Each field (or column) needs an argument so that MARCEdit will place each piece of data in the right field.
  17. Once this is done, several arguments will be listed; the set-up can be more complex than this.
  18. What I generate at the end is a dummy MARC record. The dummy record contains the reading list code and the record number. Using the record number I can input the reading list code into Sierra.
  19. Reclassification is one of the simplest processes I run. It is important in the age of shelf-ready to keep collections up to date as new class ranges are formed or discontinued; otherwise this leads to split runs of books on the same topic. I extract the records from Sierra, run MARCEdit's “Generate classification” function and then load the new classmarks back into Sierra into a local field, the same way I load the module codes.
  20. The scope of the project is potentially any legacy record that is not as good as it could be for discovery and access. However, to begin with I selected all those records that lacked LCSH. These records had other issues I outlined at the start, such as inaccurate headings (or no headings), missing data and so on. I followed various steps, from normalising the search data to enhancing the records with automatic processes in MARCEdit. It even gave me an opportunity to add some linked data elements to our records.
  21. This summarises the steps I take. I export the target records from Sierra and put them through MarcBreaker. From here I extract different data points from the records to form a search strategy. This data needs significant normalisation as metadata transcription practices vary enormously. Once I have the normalised data points I make them searchable via z39.50.
  22. This example shows a sample search for the author (surname only), the ISBN, the title proper and the publication date.
  23. I select sources I have access to where better records are likely to exist, in this case the Library of Congress, and run a custom search. I then analyse the results and see what was accepted and rejected. Sometimes records do exist but the data points don’t match, such as the date or even the title proper. All those that fail to match I save and try another source such as the British Library.
  24. At this point I take the downloaded record and match it to the original Sierra record using the Merge Records function. I insert the Sierra local number into the improved external record and run the MarcEdit automatic transformations adding Fast headings, making some programmatic RDA changes and adding linked data points.
  25. For all of these projects I have learned the importance of data normalisation. It has made me more aware of consistency in transcription and showed me fairly clearly the limitations of MARC for machine processing. Documentation is very important: recording what was done and what needs to be done really helps to keep the projects coherent. I put a high importance on naming my files as descriptively as possible and, where possible, recording notes or comments. Another important feature of the projects was using the online file sharing site Box. It creates a new version every time I save, so any mistakes I make, including in editing MARC records, can be undone if I inadvertently save them. When dealing with batches of records I find it is vital to break them down into smaller chunks. It adds a bit of time but it makes dealing with the records much more manageable. And finally, trial and error really does work. I frequently had an idea, tried it out, found it didn’t work and amended it until it did.
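
The normalisation, search and batching steps described in notes 21, 22 and 25 can be sketched in Python. This is an illustration under stated assumptions rather than the workflow actually used, which relied on MarcEdit, Notepad++ and regular expressions. All function names are hypothetical and the sample record is made up. The use attributes are the standard Bib-1 ones (1003 author, 7 ISBN, 4 title, 31 date of publication), but the exact query syntax on the slide is not reproduced in the transcript.

```python
import re


def normalise_isbn(field_020):
    """Keep only the ISBN, dropping hyphens and qualifiers such as '(pbk.)'."""
    match = re.search(r"\b(\d{13}|\d{9}[\dXx])\b", field_020.replace("-", ""))
    return match.group(1).upper() if match else None


def title_proper(field_245a):
    """Cut the title at the first ISBD separator (' :', ' /' or ' =')."""
    return re.split(r"\s[:/=]", field_245a, maxsplit=1)[0].strip(" .,;")


def surname(field_100a):
    """Use the surname only: everything before the first comma."""
    return field_100a.split(",", 1)[0].strip()


def pub_year(date_text):
    """Reduce dates such as 'c1999.' or '[2003?]' to four digits."""
    match = re.search(r"\d{4}", date_text)
    return match.group(0) if match else None


def build_query(author, isbn, title, year):
    """Combine the four data points into one PQF-style query string."""
    terms = [
        f'@attr 1=1003 "{author}"',
        f'@attr 1=7 "{isbn}"',
        f'@attr 1=4 "{title}"',
        f'@attr 1=31 "{year}"',
    ]
    # PQF is prefix notation: n terms need n - 1 leading @and operators.
    return " ".join(["@and"] * (len(terms) - 1) + terms)


def chunked(items, size=1000):
    """Yield batches of at most `size` items to keep search files manageable."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Made-up record values, as they might appear before normalisation.
query = build_query(
    surname("Example, Author,"),
    normalise_isbn("0-00-000000-0 (pbk.) :"),
    title_proper("An example title : with a subtitle /"),
    pub_year("c1999."),
)
batch_sizes = [len(batch) for batch in chunked(list(range(2500)))]
```

Requiring all four data points to match is strict, which is consistent with the observation in note 23 that some records exist at the source but fail on the date or title proper. Dropping one term from `terms` would be one way to loosen the search for a second pass.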