Presentation given to the Cataloguing and Indexing Group Scotland seminar on Linked Open Data practices in archives and libraries, 18 November 2013. I explained the issues associated with discovering vocabulary URIs from literals, and tips and techniques that can help with discovering URIs.
8. What IS the URI for “Spud” anyway?
is the mapping of historical strings to their modern-day things or URIs … that we would have used if we were starting now.
9. the DOD and its vocs
Name Authority File (names)
go
Subject Authority File (keyword)
go
Thesaurus for Graphic Materials (keyword)
go
Thesaurus of Geographic Names (place)
go
Art & Architecture Thesaurus (keyword)
go
14. LoC Getty
Name Authority File
go
Subject Authority File
go
Thesaurus for Graphic Materials
go
Thesaurus of Geographic Names
go
The Art and Architecture Thesaurus
go
Well …. real soon now
19. literals
String matching …. Yeah baby!
What do we want? UNIQUE MATCHES!
When do we want them? NOW! NOW! NOW!
OH NO!!! hang on a minute ….
20. literals
String matching …. DAMN YOU!
No hits: no URI, needs humans
Multiple hits: no URI, needs humans
EXACT MATCH: WOO-HOO! means we’ve gotta URI, no need for humans … but can you really REALLY trust it?
Innumerate?
21. an aside ….
the innumerate Scots
The first bridge
The Forth bridge is the Forth, neither the Fourth nor the 4th
The 2nd bridge is the Forth Road bridge
22. an aside ….
the innumerate Scots
Fourth Forth bridge
Third Forth bridge
23. an aside ….
the innumerate Scots
There’s a third
The First Bridge on
Forth Bridge on
the Firth of Forth
the Firth
The FIFTH Forth bridge
is the Forth bridge
Did I tell
you about
And Finally,
The 2nd Bridge
there’s the
on the Firth of
Fifth Forth
Forth is the Forth
bridge on the
Road bridge yet?
Firth of Forth
Firths
And on the Firth of Forth there’s a Fourth bridge, but it’s not the Forth bridge
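The aside doubles as a warning about fuzzy string matching. The sketch below is illustrative only: it assumes Python's standard-library `difflib` as the matcher (the talk names Google Refine, not difflib) and scores "Forth" against the look-alike words from these slides, using difflib's usual default cutoff of 0.6.

```python
from difflib import SequenceMatcher

# Look-alike words from the "innumerate Scots" aside. They differ by one or
# two letters but name quite different things.
NAMES = ["Forth", "Fourth", "Firth", "Fifth"]


def similarity(a: str, b: str) -> float:
    """Case-insensitive difflib similarity ratio, between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_candidates(query: str, labels: list[str], cutoff: float = 0.6) -> list[tuple[str, float]]:
    """Every label scoring at least `cutoff` against `query`, best first.

    0.6 mirrors the default cutoff of difflib.get_close_matches.
    """
    scored = [(label, similarity(query, label)) for label in labels]
    hits = [pair for pair in scored if pair[1] >= cutoff]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for label, score in fuzzy_candidates("Forth", NAMES):
        print(f"{label:<7} {score:.2f}")
```

Run as a script, it lists Forth first, then Fourth (0.91) and Firth (0.80): the matcher is confident and wrong, which is why a fuzzy hit still needs a human.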
27. groups
keyword
Earth (soil)
close
Earthworks (engineering works)
sh85040505
exact
exact
AAT
74465029
1048
close
AAT
74549258
3723
AAT
74546044
4844
TGMI
sh85124396
Earls
Match
http://en.wikipedia.org/wiki/Soil
Match
Wikipedia
http://en.wikipedia.org/wiki/Earls
LCSH
keywordAuthority
74548320
1055
74546674
4352
http://en.wikipedia.org/wiki/Earthworks_(engineering)
exact
Eating & drinking
Editors
Keyword
http://id.loc.gov/authorities/subjects/sh85040976.html
close
LCSH
AAT
Match Wikipedia
http://en.wikipedia.org/wiki/Editing
Edwardian
Match
broad
DODid
keywordID
Originating voc
http://en.wikipedia.org/wiki/Edwardian
close
AAT
74549696
1071
Egg
sh85041248
close
http://en.wikipedia.org/wiki/Egg_(biology)
close
AAT
74549016
4351
Elderly
Electricity
Embankments
tgm007221
sh85042065
close
close
http://en.wikipedia.org/wiki/Elderly
http://en.wikipedia.org/wiki/Electricity
close
close
AAT
AAT
AAT
Eating & drinking
Editors
Emblems
Edwardian
Emergency medical services
Egg
Enemies
Engineers
Elderly
Engines (power producing equipment)
Electricity
Entertainers
Entertaining
Embankments
Entertainment
sh85040976
sh85042664
sh85042693
close http://en.wikipedia.org/wiki/Embankments
http://en.wikipedia.org/wiki/Editing
exact
AAT
http://en.wikipedia.org/wiki/Emblems
close
http://en.wikipedia.org/wiki/Edwardian
http://en.wikipedia.org/wiki/Emergency_medical_service
TGMI
exact
s
close
close http://en.wikipedia.org/wiki/Enemies
http://en.wikipedia.org/wiki/Egg_(biology)
AAT
close
broad
AAT
close
http://en.wikipedia.org/wiki/Engineers
close
close http://en.wikipedia.org/wiki/Elderly
AAT
close
close
close http://en.wikipedia.org/wiki/Engines
http://en.wikipedia.org/wiki/Electricity TGMI
exact
http://en.wikipedia.org/wiki/Entertainers
broad
AAT
close exact http://en.wikipedia.org/wiki/Entertaining
broad
http://en.wikipedia.org/wiki/Embankments
exact
exact
sh85042747
sh85041248
sh95005954
sh85043249
tgm007221
sh85043258
sh85042065
sh85044098
sh85044107
sh85042664
sh96009616
broad
http://en.wikipedia.org/wiki/Entertainers
exact
TGMI
broad
close
close
close
close
exact
TGMI 74549556
74547020
AAT 74548178
AAT 74546714
74545806
AAT 74549382
AAT 74549498
74549310
AAT
74548888
AAT 74546398
1079
3744
1087
1088
4807
1104
1106
4895
1115
1116
74548150
5191
Entrances | ? | http://en.wikipedia.org/wiki/Entrance | exact | AAT | 74549258 | 1117
Epaulets | - | http://en.wikipedia.org/wiki/Epaulette | exact | AAT | 74546740 | 1118
Equestrians | sh85062154 | http://en.wikipedia.org/wiki/Equestrianism | broad | AAT | 74549594 | 1123
AAT
74545814
1124
TGMI
74549442
3782
74548678
1139
Equipment
Equipment & supplies
sh85085299
sh85085299
close
?
broad
http://en.wikipedia.org/wiki/Military_equipment
close
close
http://en.wikipedia.org/wiki/Military_equipment
broad
Ethnic groups | sh85045172 | exact | http://en.wikipedia.org/wiki/Ethnic_group | exact | AAT
Events | sh96009616 | close | http://en.wikipedia.org/wiki/Competition | close | AAT | 74547864 | 1148
sh85046104
broad
http://en.wikipedia.org/wiki/Excavation_(archaeology)
related
AAT
74549618
4850
74548718
5038
Excavation (process)
Exhibiting | sh85046354 | close | http://en.wikipedia.org/wiki/Exhibition | broad | AAT
Exhibitions (events) | sh85046354 | close | http://en.wikipedia.org/wiki/Exhibition | exact | AAT | 74546188 | 1163
Explosions | sh85046465 | exact | http://en.wikipedia.org/wiki/Explosion | close | AAT | 74549252 | 3750
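The match columns on this slide (exact, close, broad, related) line up with the SKOS mapping properties `skos:exactMatch`, `skos:closeMatch`, `skos:broadMatch` and `skos:relatedMatch`. A minimal sketch of turning one row into N-Triples, assuming that reading of the columns, that "broad" means the external concept is the broader one, that the last number in a row is the keywordID, and a hypothetical local URI pattern under example.org. Placeholders such as "?" or "-" are skipped rather than guessed.

```python
# SKOS mapping properties for the match types used on the groups slide.
SKOS = "http://www.w3.org/2004/02/skos/core#"
MATCH_PROPERTY = {
    "exact": SKOS + "exactMatch",
    "close": SKOS + "closeMatch",
    "broad": SKOS + "broadMatch",  # assumes the external concept is the broader one
    "related": SKOS + "relatedMatch",
}
LCSH_BASE = "http://id.loc.gov/authorities/subjects/"
LOCAL_BASE = "http://example.org/dod/keyword/"  # hypothetical local URI pattern


def mapping_triples(keyword_id: str, matches: list[tuple[str, str]]) -> list[str]:
    """Return N-Triples lines linking one local keyword to external URIs.

    `matches` holds (target URI, match type) pairs. Pairs without a real URI
    (e.g. "?" or "-") or without a recognised match type are skipped, so they
    stay in the pile for a human to resolve.
    """
    subject = f"<{LOCAL_BASE}{keyword_id}>"
    lines = []
    for target, match in matches:
        prop = MATCH_PROPERTY.get(match)
        if prop is None or not target.startswith("http"):
            continue
        lines.append(f"{subject} <{prop}> <{target}> .")
    return lines


if __name__ == "__main__":
    # The "Explosions" row, reading 3750 as its keywordID (an assumption).
    explosions = [
        (LCSH_BASE + "sh85046465", "exact"),
        ("http://en.wikipedia.org/wiki/Explosion", "close"),
    ]
    print("\n".join(mapping_triples("3750", explosions)))
```

A row with two match columns yields two triples; a row like Entrances, whose LCSH cell is "?", yields only its Wikipedia link.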
28. crowds
Which person is Old Fox?
<name> Capt. Campbell </name>
<name> Maj. Duncanson </name>
Is Glencoe here?
<name> old Fox </name>
<place> Glencoe </place>
http://en.wikipedia.org/wiki/Massacre_of_Glencoe
<name> McDonalds </name>
Order to Capt. Campbell by Maj. Duncanson
You are hereby ordered to fall upon the rebells, the McDonalds of Glencoe, and put all to the sword under seventy. you are to have a speciall care that the old Fox and his sones doe upon no account escape your hands
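Tagging marks up the strings but does not say who "old Fox" is. As a minimal sketch, assuming the transcription tool stores tags inline as `<name>` and `<place>` elements like those on the slide (the `<doc>` wrapper and the tag placement in the sample are illustrative), the tagged strings can be pulled out with the standard library and turned into questions for the crowd.

```python
import xml.etree.ElementTree as ET

# Question wording per entity tag, echoing the slide.
QUESTION = {"name": "Which person is", "place": "Which place is"}

# The order from the slide, tagged inline (tag placement is illustrative).
SAMPLE = (
    "Order to <name>Capt. Campbell</name> by <name>Maj. Duncanson</name>. "
    "You are hereby ordered to fall upon the rebells, the <name>McDonalds</name> "
    "of <place>Glencoe</place>, and put all to the sword under seventy. "
    "you are to have a speciall care that the <name>old Fox</name> and his sones "
    "doe upon no account escape your hands"
)


def tagged_entities(marked_up: str) -> list[tuple[str, str]]:
    """Return (tag, text) pairs for entity elements, in document order."""
    root = ET.fromstring(f"<doc>{marked_up}</doc>")
    return [
        (el.tag, " ".join((el.text or "").split()))
        for el in root.iter()
        if el.tag in QUESTION
    ]


if __name__ == "__main__":
    for tag, text in tagged_entities(SAMPLE):
        print(f"{QUESTION[tag]} {text}?")
```

Each printed question is a small, self-contained task of the kind the speaker notes recommend for volunteers.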
29. crowds & geonames
We think this is Cambrai …
Do you think this Cambrai is here?
Or do you think it’s here?
31. crowds & LCSH
Would you describe this horse in any of these ways?
Show jumpers (horses)
Horses in motion pictures
Toy Horses
Horses
War horses
Travel with horses
none of these
32. Things to think about ….
• using a voc without URIs?
– should we change?
• are there good ways to string match?
– are they trustworthy?
• are crowds helpful?
• what vocs are mapped to what other vocs?
– can/should we help map vocs beyond our domain?
Speaker notes
Are your strings in a fankle or your things unmentionable? This session will cover practical issues associated with mapping linked open datasets from a local environment to the global semantic web. Issues will be illustrated by examples encountered at the National Library of Scotland. Topics covered will include: DODLOD@NLS; LoC(h)-Getty; two other lochs; Wikipedia wickedness and the Elusive loch of Shandon (a watery Brigadoon); innumerate Scots and their thirds, forths and firths; the Germans (a Basil Fawlty moment);
NLS @ Edinburgh, Digital Access Manager. Reorganisation & responsibilities: overseeing library systems & strategic development of the collecting database; resource discovery; website; strategic development on open data and the semantic web. There are 3 of us! Web editor/sys lib. Busy with ELD; open data and content policy – draft; Wikipedian in Residence – improve access to and knowledge of our collections in the open sphere. We have undertaken modest steps with linked open data. No books in the Library. Coz as Gildas says … books are boring! We only have pictures.
Represents 50,000 digitised maps, millions of pages of books and directories (Post Office directories, military lists), papers about the medical history of India, broadsides, photographs, posters and manuscripts
It’s inefficient: looking things up, typing them in and then correcting the text. It should be look up and link.
So the DOD is old. Old doesn’t mean it’s not functional, tho! It’s relevant. It pre-dates Linked Open Data. THE big issue that I want to focus on is how we get from the historical strings stored in our database to their modern-day URIs. Of course, if we started now we’d record the URIs.
We use 5 vocabularies in the DOD. Actually we use more … we have a few local terms, but we try very hard to avoid these.
Take what we have: the string, which is assigned a local URI. Then the string needs to be matched with the string in the voc we used, to discover its URI. That links our URI to the voc URI.
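The string-to-URI step described above can be sketched as a lookup table built from a vocabulary's labels, for example from a data dump. This is a minimal sketch assuming exact matching after light normalisation (Unicode forms, case and whitespace); the local URIs are hypothetical, and the two LCSH identifiers are the ones listed for "Ethnic groups" and "Explosions" on the groups slide.

```python
import unicodedata

LCSH = "http://id.loc.gov/authorities/subjects/"


def normalise(label: str) -> str:
    """Fold Unicode forms, case and runs of whitespace so trivial variants compare equal."""
    return " ".join(unicodedata.normalize("NFKC", label).casefold().split())


def build_index(vocab: dict[str, str]) -> dict[str, str]:
    """Map normalised vocabulary labels to URIs. If two labels normalise alike, the last one wins."""
    return {normalise(label): uri for label, uri in vocab.items()}


def link(local: dict[str, str], index: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Match each local string against the index.

    `local` maps a local URI to the string stored for it. Returns the
    local-URI -> vocabulary-URI links and the local URIs left unmatched.
    """
    links: dict[str, str] = {}
    unmatched: list[str] = []
    for local_uri, string in local.items():
        voc_uri = index.get(normalise(string))
        if voc_uri is None:
            unmatched.append(local_uri)
        else:
            links[local_uri] = voc_uri
    return links, unmatched


if __name__ == "__main__":
    index = build_index({
        "Ethnic groups": LCSH + "sh85045172",
        "Explosions": LCSH + "sh85046465",
    })
    local = {  # hypothetical local URIs
        "http://example.org/dod/keyword/a": "ethnic  groups",
        "http://example.org/dod/keyword/b": "EXPLOSIONS",
        "http://example.org/dod/keyword/c": "Spud",
    }
    links, unmatched = link(local, index)
    print(links)
    print("needs humans:", unmatched)
```

The "last one wins" rule quietly hides homonyms, which is one reason an exact match still deserves a second look.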
Getty doesn’t have linked data representations. If we started today we’d store the URIs and not the strings (we’d take a data dump of the strings and URIs to provide lookup for cataloguers).
So far we’ve been looking inwards … at the local. What if we want to extend to the global? How does NLS get from the local to the global? Do we just publish our data and let it be so, like DNB? Do we try to start making links, like BnF to Wikipedia? How do we know what vocs have been mapped to what other vocs? Where is LCNAF mapped? Where is LCSH mapped? Should we map the Scottish stuff? What about those vocs that don’t have URIs (TGN)? Where do I map? Or am I trapped in the local?
Le Chef tells me that the links just need to be made once and once only, for all time. I have a hope …. that the vocs I use are mapped to other vocs, so that my Library’s stuff will link to others. That’s how it’s meant to work, isn’t it? So the people manufacturing kilts and looking at kilts on Wikipedia find these soldiers!
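The hope that links made once will carry the Library's things onward relies on following other people's mappings. SKOS declares `skos:exactMatch` symmetric and transitive, while `skos:closeMatch` is not transitive, so a reasonable sketch follows only exact links. The URIs below are hypothetical placeholders standing in for a local record, an LCSH heading, a second vocabulary and a Wikipedia page.

```python
from collections import defaultdict, deque


def exact_match_closure(start: str, links: list[tuple[str, str, str]]) -> set[str]:
    """All URIs reachable from `start` through exact matches.

    `links` holds (subject URI, match type, object URI). Only "exact" links
    are followed, in both directions, since skos:exactMatch is symmetric and
    transitive; close, broad and related links are ignored.
    """
    graph: dict[str, set[str]] = defaultdict(set)
    for subject, match, obj in links:
        if match == "exact":
            graph[subject].add(obj)
            graph[obj].add(subject)

    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search over exact links
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen


if __name__ == "__main__":
    # Hypothetical URIs: our keyword, an LCSH heading, a mapping someone else
    # made from another vocabulary to LCSH, and a close (not exact) Wikipedia link.
    links = [
        ("http://example.org/nls/kilts", "exact", "http://example.org/lcsh/kilts"),
        ("http://example.org/other-voc/kilts", "exact", "http://example.org/lcsh/kilts"),
        ("http://example.org/other-voc/kilts", "close", "http://example.org/wikipedia/Kilt"),
    ]
    print(sorted(exact_match_closure("http://example.org/nls/kilts", links)))
```

The second link points from the other vocabulary to LCSH, yet it is still followed because exact matches are treated as symmetric; the close link to Wikipedia is not followed.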
So let’s look at some techniques to connect the local to the global graph and try to find those elusive URIs
Use tools like Google Refine and other string-matching tools and algorithms.
Can use tools like Google Refine and various string matching tools.
No hits means no URI – needs humans. Multiple hits means no URI – needs humans. Exact match means YEAH! But … – needs humans.
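The three outcomes in these notes can be made explicit so that every string lands in a queue for people. A minimal sketch, assuming exact comparison after case and whitespace folding and an index that keeps every URI sharing a label; the place and subject URIs are hypothetical, echoing the "which Cambrai?" slide.

```python
from collections import defaultdict
from enum import Enum


class Outcome(Enum):
    NO_HITS = "no hits: no URI, needs humans"
    MULTIPLE_HITS = "multiple hits: no URI, needs humans"
    EXACT_MATCH = "exact match: URI found, but still needs humans"


def fold(label: str) -> str:
    # Case-insensitive comparison with runs of whitespace collapsed.
    return " ".join(label.casefold().split())


def build_index(vocab: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Keep every URI for a label, so homonyms show up as multiple hits."""
    index: dict[str, list[str]] = defaultdict(list)
    for label, uri in vocab:
        index[fold(label)].append(uri)
    return index


def classify(string: str, index: dict[str, list[str]]) -> tuple[Outcome, list[str]]:
    """Return the outcome and the candidate URIs for one local string."""
    hits = index.get(fold(string), [])
    if not hits:
        return Outcome.NO_HITS, []
    if len(hits) > 1:
        return Outcome.MULTIPLE_HITS, list(hits)
    return Outcome.EXACT_MATCH, list(hits)


if __name__ == "__main__":
    index = build_index([  # hypothetical URIs
        ("Cambrai", "http://example.org/place/cambrai-1"),
        ("Cambrai", "http://example.org/place/cambrai-2"),
        ("Horses", "http://example.org/subject/horses"),
    ])
    for s in ["Cambrai", "horses", "Spud"]:
        outcome, uris = classify(s, index)
        print(f"{s}: {outcome.value} {uris}")
```

Every outcome message ends in "needs humans", as in the notes; what changes is the task: find a URI, choose between candidates, or confirm one.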
We could use humans. We could use machines to give those humans assistance. Examples?
Why use humans? Lots of them are looking for something interesting to do …. There are lots of people not being exploited in your organisation. They are bored. They want to contribute. Receptionists. Night watchmen. Simple, short tasks – if you don’t know, move on to the next.
Groups – we had library school placement students come, and we decided to get them to help discover URIs for the Haig collection. So I got everything set up, they came to see me, and then I had a Basil Fawlty moment: Don’t mention the war! But I had to. And then I had to explain why there were racist comments in the data.
I requested that we build a module that enables tagging of places and names.
Change the focus here to getting people to say that this IS that. Then say that we are planning to implement semantic tagging as part of other projects. We are building a tagging module for the transcription service.
These are the results at id.loc.gov if you search for “horses”. In the DOD this picture is only described as “horses”, but of course the crowd can enhance the description while tagging. Run a string against this.
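Crowd answers to the horse question need aggregating before a heading is trusted. A minimal sketch of a majority rule, assuming answers are limited to the headings shown on the slide plus "none of these", that skipped tasks (the "if you don't know, move to next" advice) are recorded as `None` and not counted, and that the thresholds of 3 answers and a 60% share are arbitrary illustrative values.

```python
from collections import Counter
from typing import Optional

# Options offered on the horse slide, plus the way out.
CHOICES = {
    "Show jumpers (horses)",
    "Horses in motion pictures",
    "Toy Horses",
    "Horses",
    "War horses",
    "Travel with horses",
    "none of these",
}


def consensus(votes: list[Optional[str]], min_votes: int = 3, min_share: float = 0.6) -> Optional[str]:
    """Return the agreed answer, or None while there is no clear agreement.

    None in `votes` marks a volunteer who skipped the task; skips are not counted.
    """
    answered = [v for v in votes if v is not None]
    unknown = set(answered) - CHOICES
    if unknown:
        raise ValueError(f"unexpected answers: {sorted(unknown)}")
    if len(answered) < min_votes:
        return None
    choice, count = Counter(answered).most_common(1)[0]
    return choice if count / len(answered) >= min_share else None


if __name__ == "__main__":
    print(consensus(["War horses", None, "War horses", "Horses", "War horses"]))
```

A winning "none of these" is itself useful: it says the string run against id.loc.gov did not find the right heading, and the record goes back to a person.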