Understanding, extracting and enhancing catalogue data (CE Book history workshop, 2023)
1. Understanding, extracting and enhancing catalogue data
Péter Király
(GWDG, Göttingen, Germany)
Central European Book History workshop: data & tools
11 January 2023
Österreichische Nationalbibliothek
https://bit.ly/book-history-onb-2023
6. place name normalization
place-synonyms.csv (8085 surface forms of 628 locations)
Milano=Milan|Milano, Italy|Milan, Italy|Milani|Cinisello Balsamo (Milano)|...
coords.csv (1800+ locations)
"Milano",3173435,"Milan","Italy","45.46427","9.18951"
surface forms:
  Milan
  Milano, Italy
  Milan, Italy
  Milani
  Cinisello Balsamo (Milano)
  …
normalized record:
  Geonames ID  normal form  country  latitude  longitude
  3173435      Milano       Italy    45.46427  9.18951
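The lookup this slide describes can be sketched roughly as follows. This is a minimal illustration, not the workshop's actual code. It assumes that each line of place-synonyms.csv has the shape `normal form=variant|variant|...` and that the columns of coords.csv are normal form, Geonames ID, an English name, country, latitude and longitude. The meaning of the third column is a guess, and the function names are hypothetical. The sample strings are copied from the slide, with the elided variants left out.

```python
import csv
import io

# Sample rows copied from the slide; the real files are much longer and
# the synonym list for Milano is truncated on the slide.
SYNONYMS = "Milano=Milan|Milano, Italy|Milan, Italy|Milani|Cinisello Balsamo (Milano)\n"
COORDS = '"Milano",3173435,"Milan","Italy","45.46427","9.18951"\n'


def load_synonyms(text):
    """Map every surface form, and the normal form itself, to the normal form."""
    lookup = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        normal, _, variants = line.partition("=")
        normal = normal.strip()
        lookup[normal] = normal
        for variant in variants.split("|"):
            variant = variant.strip()
            if variant:
                lookup[variant] = normal
    return lookup


def load_coords(text):
    """Index the coordinate rows by normal form (column order is assumed)."""
    coords = {}
    for row in csv.reader(io.StringIO(text)):
        normal, geonames_id, _english_name, country, lat, lon = row
        coords[normal] = {
            "geonames_id": int(geonames_id),
            "country": country,
            "latitude": float(lat),
            "longitude": float(lon),
        }
    return coords


def normalize(place, synonyms, coords):
    """Return the normalized record for a surface form, or None if unknown."""
    normal = synonyms.get(place.strip())
    if normal is None:
        return None
    return {"normal_form": normal, **coords.get(normal, {})}


synonyms = load_synonyms(SYNONYMS)
coords = load_coords(COORDS)
print(normalize("Milani", synonyms, coords))
```

An exact-match dictionary keeps the step transparent: every accepted surface form is listed explicitly in the synonym file, so an unrecognized place name returns None instead of a fuzzy guess.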
7. 18th century books in three catalogues
country  catalogue  books with recognized locations  place name recognition (%)  normalized Geonames (751, 752)
Austria  ÖNB        123 431                          95+                         7%
Hungary  OSzK       32 974                           95+                         0%
Poland   BPNL       26 843                           90+                         1%
17.
001 990029097480603338
751 $a Milano
    $e publication place
    $0 3173435
    $1 https://www.geonames.org/3173435
    $2 Geonames
    $4 pup
roundtripping
datasharing options
record enhancement: ID and publication place information
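As a rough sketch of the record enhancement shown on this slide, the snippet below adds a 751 field with the same subfields to a record. It uses plain Python structures rather than a MARC library. The record layout (a dict of tag to list of fields) and the names `build_751`, `enhance` and `render` are hypothetical, chosen only for illustration.

```python
def build_751(place, geonames_id):
    """Return (code, value) subfield pairs mirroring the 751 field on the slide."""
    return [
        ("a", place),
        ("e", "publication place"),
        ("0", str(geonames_id)),
        ("1", f"https://www.geonames.org/{geonames_id}"),
        ("2", "Geonames"),
        ("4", "pup"),
    ]


def enhance(record, place, geonames_id):
    """Return a copy of the record with a 751 field appended; the input is unchanged."""
    enhanced = {tag: list(fields) for tag, fields in record.items()}
    enhanced.setdefault("751", []).append(build_751(place, geonames_id))
    return enhanced


def render(tag, subfields):
    """Render one field on a single line, e.g. '751 $a Milano $e ...'."""
    return tag + " " + " ".join(f"${code} {value}" for code, value in subfields)


# Hypothetical in-memory record holding only the control number from the slide.
record = {"001": ["990029097480603338"]}
enhanced = enhance(record, "Milano", 3173435)
print(render("751", enhanced["751"][0]))
```

Because `enhance` returns a copy and carries the 001 identifier along unchanged, the added place information can be matched back to the source catalogue record, which is what roundtripping would require.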