Stefano Bargioni
Pontificia Università della Santa Croce

Catalogue enrichment: importing
Dewey Decimal Classification
from external sources

Oct 18, 2013

ADLUG 2013

The project
● Improving the Dewey search path
  – with minimal effort
  – while adding BNCF-compliant subject headings to our catalog
● Koha 3 <http://koha-community.org>, an open source ILS
● Can be applied to other ILSs

Version 1: The Batch Mode
● Add Dewey notations to the catalog
  – automatically
  – from selected sources
  – ensure quality and uniformity

An atomic copy cataloguing
● copy cataloguing is usually related to the full record
● we only need to copy field 082 (MARC21) or 676 (Unimarc)
● ISBN unique identifier
● the policy issue
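A minimal sketch of the "atomic" idea: read a full MARCXML record but keep only field 082. The sample record is hand-written for illustration, and the function name `extract_dewey_fields` is hypothetical; this is not the code used in the project.

```python
import xml.etree.ElementTree as ET

# Namespace of the MARCXML "slim" schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hand-written sample record, for illustration only.
SAMPLE_MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="020" ind1=" " ind2=" ">
    <subfield code="a">9780000000002</subfield>
  </datafield>
  <datafield tag="082" ind1="0" ind2="4">
    <subfield code="a">025.431</subfield>
    <subfield code="2">22</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
  </datafield>
</record>"""


def extract_dewey_fields(marcxml: str, tag: str = "082") -> list[dict]:
    """Return one dict per occurrence of `tag`, ignoring every other field."""
    root = ET.fromstring(marcxml)
    fields = []
    for datafield in root.iterfind(f".//marc:datafield[@tag='{tag}']", MARC_NS):
        # Map each subfield code to its text, e.g. {"a": "025.431", "2": "22"}.
        subfields = {
            sf.get("code"): (sf.text or "").strip()
            for sf in datafield.iterfind("marc:subfield", MARC_NS)
        }
        fields.append({
            "ind1": datafield.get("ind1"),
            "ind2": datafield.get("ind2"),
            "notation": subfields.get("a"),  # classification number
            "edition": subfields.get("2"),   # DDC edition (MARC21 082 $2)
        })
    return fields


dewey = extract_dewey_fields(SAMPLE_MARCXML)
print(dewey)
```

Passing `tag="676"` would select the Unimarc field instead, but the subfield mapping above follows MARC21 082 only.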

Records to be modified
● without Dewey notation
● with ISBN
● limit: 008 language
  – SELECT biblionumber, ISBN
    FROM biblio
    WHERE ISBN_present
    AND dewey_absent
    AND language_008='...'

In Koha, the WHERE clause is based on the MySQL function ExtractValue, which works on field biblio.marcxml through XPath expressions.
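The three selection criteria can also be sketched outside the database. The example below applies them to MARCXML records in memory; it assumes MARC21, where positions 35-37 of field 008 hold the language code. The helper names and the sample records are hypothetical, and this is not the SQL used in Koha.

```python
import xml.etree.ElementTree as ET

MARC_URI = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC_URI}


def first_subfield(record, tag, code):
    """Text of the first subfield `code` of field `tag`, or None."""
    node = record.find(
        f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']", NS)
    return node.text.strip() if node is not None and node.text else None


def language_008(record):
    """Language code from MARC21 008, character positions 35-37."""
    node = record.find("marc:controlfield[@tag='008']", NS)
    if node is None or node.text is None or len(node.text) < 38:
        return None
    return node.text[35:38]


def needs_dewey(record, language):
    """ISBN present AND Dewey absent AND 008 language matches."""
    return (first_subfield(record, "020", "a") is not None
            and first_subfield(record, "082", "a") is None
            and language_008(record) == language)


def sample_record(isbn, dewey, lang):
    """Build a tiny illustrative record (hypothetical data)."""
    # 18 leading characters + 17 blanks put `lang` at positions 35-37.
    field_008 = "130101s2013    it " + " " * 17 + lang + " d"
    parts = [f'<record xmlns="{MARC_URI}">',
             f'<controlfield tag="008">{field_008}</controlfield>']
    if isbn:
        parts.append('<datafield tag="020" ind1=" " ind2=" ">'
                     f'<subfield code="a">{isbn}</subfield></datafield>')
    if dewey:
        parts.append('<datafield tag="082" ind1="0" ind2="4">'
                     f'<subfield code="a">{dewey}</subfield></datafield>')
    parts.append("</record>")
    return ET.fromstring("".join(parts))


catalog = {
    1: sample_record("9780000000002", None, "ita"),   # candidate
    2: sample_record("9780000000002", "230", "ita"),  # already has Dewey
    3: sample_record(None, None, "ita"),              # no ISBN
    4: sample_record("9780000000002", None, "eng"),   # other language
}
selected = [number for number, record in catalog.items()
            if needs_dewey(record, "ita")]
print(selected)
```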
Dewey Sources (I)
● a choice based on copy cataloguing experience
● OCLC Classify
● some National Libraries
● API, Z39.50 or HTML access

Dewey Sources (II): OCLC Classify
● Classify is a FRBR-based prototype designed to support the assignment of classification numbers and subject headings for books, DVDs, CDs, and other types of materials.
● This project applies principles of the FRBR model to aggregate bibliographic information above the manifestation level. Bibliographic records are grouped using the OCLC FRBR Work-Set algorithm to form a work-level summary of the class numbers and subject headings assigned to a work. You can retrieve a summary by ISBN, ISSN, UPC, OCLC number, author/title, or subject heading.
● The Classify database is accessible through a user interface and as a machine-to-machine service. The database provides access to more than 36 million WorldCat records that contain Dewey Decimal Classification (DDC) numbers,[...].
● Retrieved information is in XML format.
● http://www.oclc.org/research/activities/classify.html?urlm=159746
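Since the retrieved information is XML, pulling a Dewey number out of a response is a short parsing step. The sketch below works on an embedded, hand-written sample rather than a live request. The element and attribute names (`recommendations`, `ddc`, `mostPopular`, `sfa`) and the namespace are assumptions that should be checked against an actual Classify response.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the response document.
NS = {"c": "http://classify.oclc.org"}

# Hand-written sample, for illustration only (assumed structure).
SAMPLE_RESPONSE = """<classify xmlns="http://classify.oclc.org">
  <response code="0"/>
  <recommendations>
    <ddc>
      <mostPopular holdings="120" nsfa="025.431" sfa="025.431"/>
    </ddc>
  </recommendations>
</classify>"""

# A response with no recommendation, e.g. when nothing was found.
EMPTY_RESPONSE = """<classify xmlns="http://classify.oclc.org">
  <response code="102"/>
</classify>"""


def most_popular_ddc(xml_text):
    """Return the most widely held DDC number, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find("c:recommendations/c:ddc/c:mostPopular", NS)
    return node.get("sfa") if node is not None else None


print(most_popular_ddc(SAMPLE_RESPONSE))
print(most_popular_ddc(EMPTY_RESPONSE))
```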

Dewey Sources (III): National Libraries
Source  Library                                   Language  Format
LC      Library of Congress                       (any)     MARC
BNF     Bibliothèque nationale de France          (fre)     MARC
DNB     Deutsche Nationalbibliothek               (ger)     HTML
BNCF    Biblioteca Nazionale Centrale di Firenze  (ita)     HTML
BNCR    Biblioteca Nazionale Centrale di Roma     (ita)     HTML
BNB     British National Bibliography             (eng)     MARC

The logic used in the programs
● open the connection to the bibliographical database
● obtain the ISBN from records without a Dewey number
● open the connection to the Dewey source, if Z39.50
● for each ISBN
    ● query the data source using the current ISBN
    ● if a Dewey number is available in the response
        ● if the Dewey number passes quality control
            ● update the bibliographical record
    ● wait to avoid overloading
● close the connection to the Dewey source, if Z39.50
● close the connection to the bibliographical database
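The loop above can be sketched as follows. The original programs were written in Perl; this Python version only illustrates the control flow. All function and parameter names are hypothetical, and opening and closing the connections is left to the caller.

```python
import time
from typing import Callable, Iterable, Optional


def harvest(isbns_by_record: Iterable[tuple[int, str]],
            query_source: Callable[[str], Optional[str]],
            passes_quality_control: Callable[[str], bool],
            update_record: Callable[[int, str], None],
            delay_seconds: float = 2.0,
            sleep: Callable[[float], None] = time.sleep) -> dict:
    """Run one pass over (biblionumber, ISBN) pairs against one source."""
    counts = {"scanned": 0, "modified": 0, "not_found": 0, "discarded": 0}
    for biblionumber, isbn in isbns_by_record:
        counts["scanned"] += 1
        dewey = query_source(isbn)           # None when nothing is returned
        if dewey is None:
            counts["not_found"] += 1
        elif passes_quality_control(dewey):
            update_record(biblionumber, dewey)
            counts["modified"] += 1
        else:
            counts["discarded"] += 1
        sleep(delay_seconds)                 # wait to avoid overloading
    return counts


# Usage with stand-in functions instead of a real source and database.
fake_source = {"111": "025.431", "222": "not-a-number"}
updates = {}
result = harvest(
    [(1, "111"), (2, "222"), (3, "333")],
    query_source=fake_source.get,
    passes_quality_control=lambda dewey: dewey[0].isdigit(),
    update_record=updates.__setitem__,
    sleep=lambda seconds: None,              # no real waiting in the demo
)
print(result, updates)
```

Passing the source, the quality check and the update step as functions keeps the loop identical for API, Z39.50 and HTML sources.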

Quality check
● Catalogs contain errors
● DDC has many editions
● Our old Dewey numbers start from edition 19
● Indicators
● Lots of Dewey numbers discarded...
● … but we moved from 40,000 to 60,000 records with a Dewey number (+50%)
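The slide does not list the actual checks, so the following is only a sketch of what a quality filter might look like. It assumes two rules: the notation must look like a DDC number once segmentation slashes are removed, and the edition in 082 $2 must be at least 19. Both rules and all names are assumptions.

```python
import re
from typing import Optional

# Three digits, optionally followed by a decimal part: 230, 025.431
DDC_NOTATION = re.compile(r"^[0-9]{3}(\.[0-9]+)?$")


def clean_notation(raw: str) -> str:
    """Drop segmentation slashes and surrounding whitespace."""
    return raw.replace("/", "").strip()


def edition_number(edition: Optional[str]) -> Optional[int]:
    """Leading digits of the edition statement, e.g. '22' or '22/ger'."""
    match = re.match(r"[0-9]+", (edition or "").strip())
    return int(match.group()) if match else None


def passes_quality_control(raw_notation: str,
                           edition: Optional[str],
                           min_edition: int = 19) -> bool:
    """Accept only well-formed numbers from a recent enough edition."""
    if not DDC_NOTATION.match(clean_notation(raw_notation)):
        return False
    number = edition_number(edition)
    return number is not None and number >= min_edition


print(passes_quality_control("025.4/31", "22"))   # well-formed, edition 22
print(passes_quality_control("025.431", "18"))    # edition too old
print(passes_quality_control("PS3545", "22"))     # not a Dewey number

# Growth reported on the slide: from 40,000 to 60,000 records.
growth = (60_000 - 40_000) / 40_000
```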
Delay while searching sources
● Continuous searching can overwhelm remote servers
  – robots.txt
  – policies for crawlers
● Continuous indexing can overload your server
● Wait a few seconds between searches or groups of searches
  – this will slow the harvesting process
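One way to respect a remote server's crawler policy is to read its robots.txt before harvesting. The sketch below uses Python's standard `urllib.robotparser` on an embedded sample, so it makes no network request. The sample file, the user agent `dewey-harvester`, the host name and the 2-second default are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hand-written sample policy, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def load_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def delay_for(parser: RobotFileParser, user_agent: str,
              default_delay: float = 2.0) -> float:
    """Seconds to wait between searches: declared delay or a default."""
    declared = parser.crawl_delay(user_agent)
    return float(declared) if declared is not None else default_delay


policy = load_policy(SAMPLE_ROBOTS_TXT)
agent = "dewey-harvester"
print(delay_for(policy, agent))
print(policy.can_fetch(agent, "http://catalog.example.org/search?isbn=1"))
print(policy.can_fetch(agent, "http://catalog.example.org/private/x"))
```

The returned delay could then be used as the waiting time between searches.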

Statistics
Source    Language  Records  Records   ISBN       Dewey #    Dewey #
                    scanned  modified  not found  not found  discarded
Classify  all         42387     10267       5321       6607      20059
LC        all         31999      1252      21195       8562       1011
BNF       all         30903      2253      21327       7268         55
DNB       ger          4193       163       3867        163          0
BNCF      ita         12017      4088       3643       3542        744
BNCR      ita          7549      1515       3003       2978         53
BNB       eng          6215       193       5449         55        518
Total                           19710

Several works with same ISBN: 8240
ISBN incorrect: 133
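The "ISBN incorrect" count suggests checking ISBNs before sending them to a source. The sketch below applies the standard ISBN-10 and ISBN-13 check-digit rules; the function names are hypothetical, and the slides do not say how the project detected incorrect ISBNs.

```python
import re


def normalize_isbn(raw: str) -> str:
    """Keep only digits and a possible final X: '0-306-40615-2' -> '0306406152'."""
    return re.sub(r"[^0-9X]", "", raw.upper())


def is_valid_isbn10(isbn: str) -> bool:
    """Weights 10..1; the weighted sum must be divisible by 11."""
    if not re.fullmatch(r"[0-9]{9}[0-9X]", isbn):
        return False
    total = sum((10 - position) * (10 if char == "X" else int(char))
                for position, char in enumerate(isbn))
    return total % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    """Weights alternate 1 and 3; the weighted sum must be divisible by 10."""
    if not re.fullmatch(r"[0-9]{13}", isbn):
        return False
    total = sum(int(char) * (1 if position % 2 == 0 else 3)
                for position, char in enumerate(isbn))
    return total % 10 == 0


def is_valid_isbn(raw: str) -> bool:
    isbn = normalize_isbn(raw)
    return is_valid_isbn10(isbn) or is_valid_isbn13(isbn)


for candidate in ["0-306-40615-2", "978-0-306-40615-7", "9780306406158"]:
    print(candidate, is_valid_isbn(candidate))
```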

Browsing Dewey Index
Besides author, uniform titles and subject headings, our OPAC offers a path of semantic search based on the Dewey classification number.

Software
● Query programs were written in Perl, using the Koha API and the following libraries available on CPAN:
  – LWP for HTTP connections
  – ZOOM for Z39.50 connections
  – DBI for connections to the MySQL database
  – XML::XPath for XML data processing
  – WWW::Scraper for HTML data processing
  – MARC::Record for MARC record processing

A scientific article
● published in JLIS.it at http://leo.cilea.it/index.php/jlis/article/view/8766
● JLIS.it, Italian Journal of Library and Information Science, is an academic journal of international scope, peer-reviewed and open access
● written with my cataloguers
● doesn't deal with the dynamic component

Version 2.0 - Single Record Mode
● New record:
  – enter the ISBN
  – retrieve Dewey from important catalogs
  – choose and import the best one into the new record
● Or upgrade an old record by adding or modifying its Dewey classification
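In single record mode the cataloguer makes the final choice. The sketch below only illustrates one way to present the retrieved numbers: it groups them by notation and lists first the one returned by the most sources. The grouping rule, the function name and the sample values are assumptions, not the behaviour of the actual tool.

```python
from collections import defaultdict
from typing import Optional


def group_candidates(
        candidates_by_source: dict[str, Optional[str]],
) -> list[tuple[str, list[str]]]:
    """Group sources by the Dewey number they returned.

    Numbers proposed by more sources come first; ties are ordered
    by notation so the output is stable.
    """
    sources_by_notation = defaultdict(list)
    for source, notation in candidates_by_source.items():
        if notation:  # skip sources that returned nothing
            sources_by_notation[notation].append(source)
    return sorted(sources_by_notation.items(),
                  key=lambda item: (-len(item[1]), item[0]))


# Hypothetical answers for one ISBN.
candidates = {
    "Classify": "025.431",
    "LC": "025.431",
    "BNCF": "025.43",
    "DNB": None,
}
ranked = group_candidates(candidates)
for notation, sources in ranked:
    print(notation, ", ".join(sources))
```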

Conclusions
● Increase of available bibliographic data on the net
● Unique identifiers
  – ISBN, ISSN, ...
  – VIAF Id, ISNI, ...
● Catalog enrichment
  – bibliographic records
  – authority records
● Expose rich linked data
  – with coded information like Dewey
  – with standard IDs like ISBN, ISNI, ...

Thank you
Gracias
Grazie

