SlideShare una empresa de Scribd logo
1 de 16
Descargar para leer sin conexión
This is an interesting
metadata source. Can I
 import it into Koha?
 Marijana Glavica <mglavica@ffzg.hr>
Dobrica Pavlinušić <dpavlin@rot13.org>
Material
● 6000 scans of book front pages

● directories organised by person who did the
  scanning, and location of the books

● filenames - inventory number (having
  duplicates)
Task
● add metadata to scanned material
  ○ some books already catalogued somewhere else
  ○ not all sources have Z39.59


● upload images

● keep track of what is done
  ○ separate spreadsheet file?
Solution
● Create MARC records from file names -
  bibliographic and items
  ○ itemtype - not yet processed


● import MARC records and upload all images
  in Koha
What is wrong with metadata?
http://www.catholicresearch.net/blog/2012/05/oai/
The harvested Dublin Core metadata was
typical of OAI-PMH repositories: thin, a bit
ambiguous, and somewhat inconsistent across
repositories. -- Eric Lease Morgan

Europeana is nice example of this:
● sparse on meta-data
● multiple link hops to image of record (?!)
Importing covers and meta data
● DVD with scanned book front pages
  ○ various resolution (from stamp size to 300 dpi)
  ○ number_student/location/inventory_note.jpg
● Koha 3.8 has a tool to upload zip with cover
  images and idlink.txt
  ○ zip files big, and we don't have biblio records
● Create MARC21 records from file names
  (only metadata available to us)
● Write script which uses Koha API
  ○ create MARC21 using MARC::Record
  ○ AddItemFromMarc, PutImage
  ○ https://github.com/dpavlin/Koha/blob/koha_gimpoz/misc/gimpoz/import-images.pl
Scrape cataloging
● It's like copy cataloguing, but you don't have
  to use copy/paste in your browser to do it
● Instead, you use scraper to Z39.50 gateway:
  https://github.com/dpavlin/Biblio-Z3950
● Source formats:
   ○   Aleph - NSK, our national library
   ○   COBISS - they started serving images for records!
   ○   Google Books - another JSON API
   ○   vuFind - HathiTrust (MARC records export)
   ○   DPLA - JSON API (with broken UTF-8 encoding)
● Returns MARC21 records for Koha import
Scraping?!
● It's 2012, where is my semantic web?!
● Various reasons why scraping is easier
  ○ no public Z39.50 server
    ■ or there is one but has wrong encoding
  ○ data source isn't MARC21
    ■ older national MARC standards, UNIMARC or
       JSON for Google Books
● This is open source projects
  ○ all parts, but some assembly required
    ■ URLS to resources, mapping to MARC
  ○ modify existing scrapers to create new ones
● Let the data flow!
Biblio::Z3950
● based on Net::Z3950::SimpleServer
● convert Z39.50 RPN query to URL params
  ○ API support for and/or/not operators
  ○ enter just one field in Koha
● use WWW::Mechanize to issue search
  ○ advanced search syntax is best choice if available
  ○ scrape web page for results
    ■ web page with MARC-like structure
    ■ export formats
● use MARC::Record to create MARC21
  ○ web pages have utf-8 encoding
  ○ mapping to MARC specified in code
Mappings easy to define (in code :-)




                                Google Books JSON to
                                MARC mapping is more
                                complex but still only 80
                                    lines of code
Questions?
● Do you have nicely formatted web pages
  which need conversion to MARC21 for
  Koha?
● Is storing cover images in database the right
  way? (4.9Gb gziped SQL dump)

● This presentation: http://bit.ly/gimpoz
● Koha instance: http://gimpoz.koha.rot13.org
● Blog: http://blog.rot13.org
Abstract
We live in a world of data. However, data doesn’t always come in a format that is as easy to share as
we would expect.
We had approximately 6000 scans of book front pages coming from the Teachers' library stock of
the Gymnasium in Požega, which was proclaimed the movable monument of culture that carries
national significance. Our goal was to make the library stock visible to public and we needed to
add metadata to those images. Fortunately, some of that data was already available on the web: in
National Library’s Aleph system, several Croatian libraries using Koha, Hathi Trust digital library
(VuFind), Open Library, Google Books, Europeana etc.
Importing local images is now standard part of Koha, so we decided to import all those images and
to create the initial biblio records using the only kind of metadata that we had: structured directories
and filenames which represent some kind of identifier number. After that, we started cataloguing our
items. There is a convenient method for adding bibliographic data to a catalogue: using Z39.50
search. Unfortunately, not all of our metadata sources provided Z39.50 interface.
Our solution to the problem was to use scrape-cataloguing, which provided us with a way to avoid
infinite copy & paste cycles or manual data entry. Instead, the job was done by our script that provides
Z39.50 interface for Koha.

Más contenido relacionado

La actualidad más candente

Semantic Media Management with Apache Marmotta
Semantic Media Management with Apache MarmottaSemantic Media Management with Apache Marmotta
Semantic Media Management with Apache Marmotta
Thomas Kurz
 
Semantics, rdf and drupal
Semantics, rdf and drupalSemantics, rdf and drupal
Semantics, rdf and drupal
Gokul Nk
 

La actualidad más candente (20)

Drupal and RDF
Drupal and RDFDrupal and RDF
Drupal and RDF
 
swib15 ALIADA
swib15 ALIADAswib15 ALIADA
swib15 ALIADA
 
Geospatial Querying in Apache Marmotta - Apache Big Data North America 2016
Geospatial Querying in Apache Marmotta -  Apache Big Data North America 2016Geospatial Querying in Apache Marmotta -  Apache Big Data North America 2016
Geospatial Querying in Apache Marmotta - Apache Big Data North America 2016
 
Review of KohaCon18
Review of KohaCon18Review of KohaCon18
Review of KohaCon18
 
Querying GrAF data in linguistic analysis
Querying GrAF data in linguistic analysisQuerying GrAF data in linguistic analysis
Querying GrAF data in linguistic analysis
 
MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data
MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open DataMuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data
MuseoTorino, first italian project using a GraphDB, RDFa, Linked Open Data
 
Leaving Blackboxes Behind - ELAG 2016
Leaving Blackboxes Behind - ELAG 2016Leaving Blackboxes Behind - ELAG 2016
Leaving Blackboxes Behind - ELAG 2016
 
Enabling access to Linked Media with SPARQL-MM
Enabling access to Linked Media with SPARQL-MMEnabling access to Linked Media with SPARQL-MM
Enabling access to Linked Media with SPARQL-MM
 
Apache Marmotta (incubating)
Apache Marmotta (incubating)Apache Marmotta (incubating)
Apache Marmotta (incubating)
 
Semantic Media Management with Apache Marmotta
Semantic Media Management with Apache MarmottaSemantic Media Management with Apache Marmotta
Semantic Media Management with Apache Marmotta
 
RDF Seminar Presentation
RDF Seminar PresentationRDF Seminar Presentation
RDF Seminar Presentation
 
Hadoop World 2011: Radoop: a Graphical Analytics Tool for Big Data - Gabor Ma...
Hadoop World 2011: Radoop: a Graphical Analytics Tool for Big Data - Gabor Ma...Hadoop World 2011: Radoop: a Graphical Analytics Tool for Big Data - Gabor Ma...
Hadoop World 2011: Radoop: a Graphical Analytics Tool for Big Data - Gabor Ma...
 
DBpedia Japanese
DBpedia JapaneseDBpedia Japanese
DBpedia Japanese
 
Sasaki datathon-madrid-2015
Sasaki datathon-madrid-2015Sasaki datathon-madrid-2015
Sasaki datathon-madrid-2015
 
Linked Media Management with Apache Marmotta
Linked Media Management with Apache MarmottaLinked Media Management with Apache Marmotta
Linked Media Management with Apache Marmotta
 
Drupal 7 and RDF
Drupal 7 and RDFDrupal 7 and RDF
Drupal 7 and RDF
 
Semantics, rdf and drupal
Semantics, rdf and drupalSemantics, rdf and drupal
Semantics, rdf and drupal
 
Produce & Publish Authoring Environment V 2.0 (english version)
Produce & Publish Authoring Environment V 2.0 (english version)Produce & Publish Authoring Environment V 2.0 (english version)
Produce & Publish Authoring Environment V 2.0 (english version)
 
Apache Spark's Built-in File Sources in Depth
Apache Spark's Built-in File Sources in DepthApache Spark's Built-in File Sources in Depth
Apache Spark's Built-in File Sources in Depth
 
Selecting the right database type for your knowledge management needs.
Selecting the right database type for your knowledge management needs.Selecting the right database type for your knowledge management needs.
Selecting the right database type for your knowledge management needs.
 

Destacado

Operation Payback (...is a bitch): Hacktivism at the Dawn of Copyright Contro...
Operation Payback (...is a bitch): Hacktivism at the Dawn of Copyright Contro...Operation Payback (...is a bitch): Hacktivism at the Dawn of Copyright Contro...
Operation Payback (...is a bitch): Hacktivism at the Dawn of Copyright Contro...
PaleFire
 
Kako napraviti Google od zgrade sa računalima?
Kako napraviti Google od zgrade sa računalima?Kako napraviti Google od zgrade sa računalima?
Kako napraviti Google od zgrade sa računalima?
Dobrica Pavlinušić
 

Destacado (20)

Social Media & Web 2.0 Services for Choirs
Social Media & Web 2.0 Services for ChoirsSocial Media & Web 2.0 Services for Choirs
Social Media & Web 2.0 Services for Choirs
 
Playful Blended Digital Storytelling in 3D Immersive eLearning Environments f...
Playful Blended Digital Storytelling in 3D Immersive eLearning Environments f...Playful Blended Digital Storytelling in 3D Immersive eLearning Environments f...
Playful Blended Digital Storytelling in 3D Immersive eLearning Environments f...
 
Mojo Facets – so, you have data and browser?
Mojo Facets – so, you have data and browser?Mojo Facets – so, you have data and browser?
Mojo Facets – so, you have data and browser?
 
Intro to Haml
Intro to HamlIntro to Haml
Intro to Haml
 
Morocco
MoroccoMorocco
Morocco
 
Creating And Customizing Your Blackboard Class
Creating And Customizing Your Blackboard ClassCreating And Customizing Your Blackboard Class
Creating And Customizing Your Blackboard Class
 
Operation Payback (...is a bitch): Hacktivism at the Dawn of Copyright Contro...
Operation Payback (...is a bitch): Hacktivism at the Dawn of Copyright Contro...Operation Payback (...is a bitch): Hacktivism at the Dawn of Copyright Contro...
Operation Payback (...is a bitch): Hacktivism at the Dawn of Copyright Contro...
 
Oslobodimo Hardware
Oslobodimo HardwareOslobodimo Hardware
Oslobodimo Hardware
 
CTE Teaching and Learning Inst. 2008
CTE Teaching and Learning Inst. 2008CTE Teaching and Learning Inst. 2008
CTE Teaching and Learning Inst. 2008
 
REST ili kao sam se prestao brinuti o HTTP-u i zavolio ga (HTTP Server sa RFI...
REST ili kao sam se prestao brinuti o HTTP-u i zavolio ga (HTTP Server sa RFI...REST ili kao sam se prestao brinuti o HTTP-u i zavolio ga (HTTP Server sa RFI...
REST ili kao sam se prestao brinuti o HTTP-u i zavolio ga (HTTP Server sa RFI...
 
Χριστούγεννα χωρίς Χριστό
Χριστούγεννα χωρίς ΧριστόΧριστούγεννα χωρίς Χριστό
Χριστούγεννα χωρίς Χριστό
 
Free Libre Open Source Software at FFZG library
Free Libre Open Source Software at FFZG libraryFree Libre Open Source Software at FFZG library
Free Libre Open Source Software at FFZG library
 
Towards an Instructional Design Motivational Framework to Address the Retenti...
Towards an Instructional Design Motivational Framework to Address the Retenti...Towards an Instructional Design Motivational Framework to Address the Retenti...
Towards an Instructional Design Motivational Framework to Address the Retenti...
 
Wiki: Open Collaborative Learning Environment
Wiki: Open Collaborative Learning EnvironmentWiki: Open Collaborative Learning Environment
Wiki: Open Collaborative Learning Environment
 
Ppt Demo Slideshare
Ppt Demo SlidesharePpt Demo Slideshare
Ppt Demo Slideshare
 
Kako napraviti Google od zgrade sa računalima?
Kako napraviti Google od zgrade sa računalima?Kako napraviti Google od zgrade sa računalima?
Kako napraviti Google od zgrade sa računalima?
 
Pubic Diplomacy and Web 2.0
Pubic Diplomacy and Web 2.0Pubic Diplomacy and Web 2.0
Pubic Diplomacy and Web 2.0
 
Cisco Board 18
Cisco Board 18Cisco Board 18
Cisco Board 18
 
Virtual LDAP - kako natjerati strgane aplikacije da koriste LDAP
Virtual LDAP - kako natjerati strgane aplikacije da koriste LDAPVirtual LDAP - kako natjerati strgane aplikacije da koriste LDAP
Virtual LDAP - kako natjerati strgane aplikacije da koriste LDAP
 
The Constellation Query Language
The Constellation Query LanguageThe Constellation Query Language
The Constellation Query Language
 

Similar a This is an interesting metadata source. Can I import it into Koha?

Mechanical curator - Technical notes
Mechanical curator - Technical notesMechanical curator - Technical notes
Mechanical curator - Technical notes
benosteen
 

Similar a This is an interesting metadata source. Can I import it into Koha? (20)

EuropeanaTech x AI: Qurator.ai @ Berlin State Library
EuropeanaTech x AI: Qurator.ai @ Berlin State LibraryEuropeanaTech x AI: Qurator.ai @ Berlin State Library
EuropeanaTech x AI: Qurator.ai @ Berlin State Library
 
Data analysis with Pandas and Spark
Data analysis with Pandas and SparkData analysis with Pandas and Spark
Data analysis with Pandas and Spark
 
Linked Open Citation Database (LOC-DB)
Linked Open Citation Database (LOC-DB)Linked Open Citation Database (LOC-DB)
Linked Open Citation Database (LOC-DB)
 
Ontology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptxOntology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptx
 
Logs aggregation and analysis
Logs aggregation and analysisLogs aggregation and analysis
Logs aggregation and analysis
 
AddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based ProgrammingAddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based Programming
 
ArangoDB
ArangoDBArangoDB
ArangoDB
 
Apache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory DataApache Arrow: Cross-language Development Platform for In-memory Data
Apache Arrow: Cross-language Development Platform for In-memory Data
 
Mechanical curator - Technical notes
Mechanical curator - Technical notesMechanical curator - Technical notes
Mechanical curator - Technical notes
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
A rubyist's naive comparison of some database systems and toolkits
A rubyist's naive comparison of some database systems and toolkitsA rubyist's naive comparison of some database systems and toolkits
A rubyist's naive comparison of some database systems and toolkits
 
Interactive E-Books
Interactive E-BooksInteractive E-Books
Interactive E-Books
 
Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018Apache Arrow at DataEngConf Barcelona 2018
Apache Arrow at DataEngConf Barcelona 2018
 
Migrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMSMigrating structured data between Hadoop and RDBMS
Migrating structured data between Hadoop and RDBMS
 
Berlin Buzzwords 2019 - Taming the language border in data analytics and scie...
Berlin Buzzwords 2019 - Taming the language border in data analytics and scie...Berlin Buzzwords 2019 - Taming the language border in data analytics and scie...
Berlin Buzzwords 2019 - Taming the language border in data analytics and scie...
 
Modern SharePoint, the Good, the Bad, and the Ugly
Modern SharePoint, the Good, the Bad, and the UglyModern SharePoint, the Good, the Bad, and the Ugly
Modern SharePoint, the Good, the Bad, and the Ugly
 
Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020Apache Arrow: Present and Future @ ScaledML 2020
Apache Arrow: Present and Future @ ScaledML 2020
 
Introduction to Go
Introduction to GoIntroduction to Go
Introduction to Go
 
Introduction to knime
Introduction to knimeIntroduction to knime
Introduction to knime
 
On Again; Off Again - Benjamin Young - ebookcraft 2017
On Again; Off Again - Benjamin Young - ebookcraft 2017On Again; Off Again - Benjamin Young - ebookcraft 2017
On Again; Off Again - Benjamin Young - ebookcraft 2017
 

Más de Dobrica Pavlinušić

Virtualization which isn't: LXC (Linux Containers)
Virtualization which isn't: LXC (Linux Containers)Virtualization which isn't: LXC (Linux Containers)
Virtualization which isn't: LXC (Linux Containers)
Dobrica Pavlinušić
 
Slobodni softver za digitalne arhive: EPrints u Knjižnici Filozofskog fakulte...
Slobodni softver za digitalne arhive: EPrints u Knjižnici Filozofskog fakulte...Slobodni softver za digitalne arhive: EPrints u Knjižnici Filozofskog fakulte...
Slobodni softver za digitalne arhive: EPrints u Knjižnici Filozofskog fakulte...
Dobrica Pavlinušić
 
Post-relational databases: What's wrong with web development?
Post-relational databases: What's wrong with web development?Post-relational databases: What's wrong with web development?
Post-relational databases: What's wrong with web development?
Dobrica Pavlinušić
 

Más de Dobrica Pavlinušić (20)

Mainline kernel on ARM Tegra20 devices that are left behind on 2.6 kernels
Mainline kernel on ARM Tegra20 devices that are left behind on 2.6 kernelsMainline kernel on ARM Tegra20 devices that are left behind on 2.6 kernels
Mainline kernel on ARM Tegra20 devices that are left behind on 2.6 kernels
 
Linux+sensor+device-tree+shell=IoT !
Linux+sensor+device-tree+shell=IoT !Linux+sensor+device-tree+shell=IoT !
Linux+sensor+device-tree+shell=IoT !
 
bro - what is in my network?
bro - what is in my network?bro - what is in my network?
bro - what is in my network?
 
Let's hack cheap hardware 2016 edition
Let's hack cheap hardware 2016 editionLet's hack cheap hardware 2016 edition
Let's hack cheap hardware 2016 edition
 
Raspberry Pi - best friend for all your GPIO needs
Raspberry Pi - best friend for all your GPIO needsRaspberry Pi - best friend for all your GPIO needs
Raspberry Pi - best friend for all your GPIO needs
 
Cheap, good, hackable tools from China: AVR component tester
Cheap, good, hackable tools from China: AVR component testerCheap, good, hackable tools from China: AVR component tester
Cheap, good, hackable tools from China: AVR component tester
 
Ganeti - build your own cloud
Ganeti - build your own cloudGaneti - build your own cloud
Ganeti - build your own cloud
 
FSEC 2014 - I can haz your board with JTAG
FSEC 2014 - I can haz your board with JTAGFSEC 2014 - I can haz your board with JTAG
FSEC 2014 - I can haz your board with JTAG
 
Hardware hacking for software people
Hardware hacking for software peopleHardware hacking for software people
Hardware hacking for software people
 
Gnu linux on arm for $50 - $100
Gnu linux on arm for $50 - $100Gnu linux on arm for $50 - $100
Gnu linux on arm for $50 - $100
 
Security of Linux containers in the cloud
Security of Linux containers in the cloudSecurity of Linux containers in the cloud
Security of Linux containers in the cloud
 
Web scale monitoring
Web scale monitoringWeb scale monitoring
Web scale monitoring
 
SysAdmin cookbook
SysAdmin cookbookSysAdmin cookbook
SysAdmin cookbook
 
Printing on Linux, simple right?
Printing on Linux, simple right?Printing on Linux, simple right?
Printing on Linux, simple right?
 
KohaCon11: Integrating Koha with RFID system
KohaCon11: Integrating Koha with RFID systemKohaCon11: Integrating Koha with RFID system
KohaCon11: Integrating Koha with RFID system
 
Deploy your own P2P network
Deploy your own P2P networkDeploy your own P2P network
Deploy your own P2P network
 
Post-relational databases: What's wrong with web development? v3
Post-relational databases: What's wrong with web development? v3Post-relational databases: What's wrong with web development? v3
Post-relational databases: What's wrong with web development? v3
 
Virtualization which isn't: LXC (Linux Containers)
Virtualization which isn't: LXC (Linux Containers)Virtualization which isn't: LXC (Linux Containers)
Virtualization which isn't: LXC (Linux Containers)
 
Slobodni softver za digitalne arhive: EPrints u Knjižnici Filozofskog fakulte...
Slobodni softver za digitalne arhive: EPrints u Knjižnici Filozofskog fakulte...Slobodni softver za digitalne arhive: EPrints u Knjižnici Filozofskog fakulte...
Slobodni softver za digitalne arhive: EPrints u Knjižnici Filozofskog fakulte...
 
Post-relational databases: What's wrong with web development?
Post-relational databases: What's wrong with web development?Post-relational databases: What's wrong with web development?
Post-relational databases: What's wrong with web development?
 

Último

Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 

Último (20)

Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 

This is an interesting metadata source. Can I import it into Koha?

  • 1. This is an interesting metadata source. Can I import it into Koha? Marijana Glavica <mglavica@ffzg.hr> Dobrica Pavlinušić <dpavlin@rot13.org>
  • 2.
  • 3.
  • 4.
  • 5. Material ● 6000 scans of book front pages ● directories organised by person who did the scanning, and location of the books ● filenames - inventory number (having duplicates)
  • 6. Task ● add metadata to scanned material ○ some books already catalogued somewhere else ○ not all sources have Z39.59 ● upload images ● keep track of what is done ○ separate spreadsheet file?
  • 7. Solution ● Create MARC records from file names - bibliographic and items ○ itemtype - not yet processed ● import MARC records and upload all images in Koha
  • 8.
  • 9. What is wrong with metadata? http://www.catholicresearch.net/blog/2012/05/oai/ The harvested Dublin Core metadata was typical of OAI-PMH repositories: thin, a bit ambiguous, and somewhat inconsistent across repositories. -- Eric Lease Morgan Europeana is nice example of this: ● sparse on meta-data ● multiple link hops to image of record (?!)
  • 10. Importing covers and meta data ● DVD with scanned book front pages ○ various resolution (from stamp size to 300 dpi) ○ number_student/location/inventory_note.jpg ● Koha 3.8 has a tool to upload zip with cover images and idlink.txt ○ zip files big, and we don't have biblio records ● Create MARC21 records from file names (only metadata available to us) ● Write script which uses Koha API ○ create MARC21 using MARC::Record ○ AddItemFromMarc, PutImage ○ https://github.com/dpavlin/Koha/blob/koha_gimpoz/misc/gimpoz/import-images.pl
  • 11. Scrape cataloging ● It's like copy cataloguing, but you don't have to use copy/paste in your browser to do it ● Instead, you use scraper to Z39.50 gateway: https://github.com/dpavlin/Biblio-Z3950 ● Source formats: ○ Aleph - NSK, our national library ○ COBISS - they started serving images for records! ○ Google Books - another JSON API ○ vuFind - HathiTrust (MARC records export) ○ DPLA - JSON API (with broken UTF-8 encoding) ● Returns MARC21 records for Koha import
  • 12. Scraping?! ● It's 2012, where is my semantic web?! ● Various reasons why scraping is easier ○ no public Z39.50 server ■ or there is one but has wrong encoding ○ data source isn't MARC21 ■ older national MARC standards, UNIMARC or JSON for Google Books ● This is open source projects ○ all parts, but some assembly required ■ URLS to resources, mapping to MARC ○ modify existing scrapers to create new ones ● Let the data flow!
  • 13. Biblio::Z3950 ● based on Net::Z3950::SimpleServer ● convert Z39.50 RPN query to URL params ○ API support for and/or/not operators ○ enter just one field in Koha ● use WWW::Mechanize to issue search ○ advanced search syntax is best choice if available ○ scrape web page for results ■ web page with MARC-like structure ■ export formats ● use MARC::Record to create MARC21 ○ web pages have utf-8 encoding ○ mapping to MARC specified in code
  • 14. Mappings easy to define (in code :-) Google Books JSON to MARC mapping is more complex but still only 80 lines of code
  • 15. Questions? ● Do you have nicely formatted web pages which need conversion to MARC21 for Koha? ● Is storing cover images in database the right way? (4.9Gb gziped SQL dump) ● This presentation: http://bit.ly/gimpoz ● Koha instance: http://gimpoz.koha.rot13.org ● Blog: http://blog.rot13.org
  • 16. Abstract We live in a world of data. However, data doesn’t always come in a format that is as easy to share as we would expect. We had approximately 6000 scans of book front pages coming from the Teachers' library stock of the Gymnasium in Požega, which was proclaimed the movable monument of culture that carries national significance. Our goal was to make the library stock visible to public and we needed to add metadata to those images. Fortunately, some of that data was already available on the web: in National Library’s Aleph system, several Croatian libraries using Koha, Hathi Trust digital library (VuFind), Open Library, Google Books, Europeana etc. Importing local images is now standard part of Koha, so we decided to import all those images and to create the initial biblio records using the only kind of metadata that we had: structured directories and filenames which represent some kind of identifier number. After that, we started cataloguing our items. There is a convenient method for adding bibliographic data to a catalogue: using Z39.50 search. Unfortunately, not all of our metadata sources provided Z39.50 interface. Our solution to the problem was to use scrape-cataloguing, which provided us with a way to avoid infinite copy & paste cycles or manual data entry. Instead, the job was done by our script that provides Z39.50 interface for Koha.