Linked Library Data in the wild
Introductions... Phil John, Technical Lead for Prism
So, what’s Prism then?
Prism: a next-generation discovery interface
Built entirely on Linked Data (yes… even the configuration settings)
Discovery of library catalogue resources, but grander plans are afoot...
...some future sources...
• archives/records (e.g. DS Calm)
• thesis repositories
• rare items/special collections
• and more!
Performs data conversion: MARC 21 to RDF
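To give a flavour of the MARC 21 to RDF step, here is a minimal Python sketch that serialises a couple of already-extracted values as N-Triples. The `http://example.org/items/` base URI and the use of Dublin Core terms as predicates are assumptions for illustration only; the slides don't say which vocabularies Prism uses.

```python
# Minimal sketch: serialise a few values already extracted from a MARC 21
# record as N-Triples. The example.org base URI and the Dublin Core predicates
# are illustrative assumptions, not Prism's actual vocabulary.

DCTERMS = "http://purl.org/dc/terms/"
BASE = "http://example.org/items/"  # hypothetical base URI for items


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(control_number: str, fields: dict[str, list[str]]) -> str:
    """Emit one triple per value, using the 001 control number as the subject."""
    subject = f"<{BASE}{control_number}>"
    lines = []
    for term, values in fields.items():
        for value in values:
            lines.append(f'{subject} <{DCTERMS}{term}> "{escape_literal(value)}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_ntriples("750785", {"title": ["Goodfellas"], "identifier": ["7321900108089"]}))
```

N-Triples puts each statement on its own line, which keeps the converter's output easy to diff and to load in pieces.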
Initial “bulk” conversion, then periodic “delta” files: this ensures it keeps in sync with the LMS
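A sketch of how “bulk, then delta” might keep a store in sync, assuming each record carries its leader and 001 control number. It relies on MARC 21 leader position 05 (record status), where `d` marks a deleted record; the `Record` type and `apply_delta` function are hypothetical, since the slides don't describe the delta format.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    leader: str           # 24-character MARC 21 leader
    control_number: str   # value of the 001 field
    payload: dict = field(default_factory=dict)  # converted data (simplified)


def apply_delta(store: dict[str, Record], delta: list[Record]) -> dict[str, Record]:
    """Apply one delta file to a store built from the initial bulk conversion.

    Leader/05 'd' (deleted) removes the record; any other status, such as
    'n' (new) or 'c' (corrected), inserts or replaces it. Records are applied
    in file order, so a later entry for the same 001 wins.
    """
    for record in delta:
        if record.leader[5] == "d":
            store.pop(record.control_number, None)
        else:
            store[record.control_number] = record
    return store
```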
Borrower/availability data is pulled “live” from the LMS, provided by a suite of RESTful web services
Linked Data API: just add .rss to collections, or .rdf/.nt/.ttl/.json to items
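The “just add an extension” convention amounts to mapping a URL suffix onto a serialisation. This minimal sketch assumes the resource path is whatever precedes the suffix, that `.json` means plain `application/json`, and that anything without a recognised suffix falls back to HTML; the other media types are the ones conventionally used for RDF/XML, N-Triples, Turtle and RSS.

```python
import posixpath
from urllib.parse import urlparse

# Suffixes from the slide, mapped to conventional media types.
# The .json mapping is an assumption: the slide doesn't say which JSON flavour.
ITEM_FORMATS = {
    ".rdf": "application/rdf+xml",
    ".nt": "application/n-triples",
    ".ttl": "text/turtle",
    ".json": "application/json",
}
COLLECTION_FORMATS = {".rss": "application/rss+xml"}


def negotiate(url: str, is_collection: bool) -> tuple[str, str]:
    """Return (resource path, media type) for a request URL.

    A recognised suffix selects the serialisation and is stripped from the
    path; anything else is served as the HTML page.
    """
    path = urlparse(url).path
    root, ext = posixpath.splitext(path)
    formats = COLLECTION_FORMATS if is_collection else ITEM_FORMATS
    if ext in formats:
        return root, formats[ext]
    return path, "text/html"
```

Keeping the suffix tables separate for items and collections mirrors the slide: feeds for collections, RDF serialisations for items.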
The Challenges
The Challenges: extracting data from MARC 21
Extracting Data from MARC 21: some quotes... (cataloguers may want to look away now)
MARC 21 is not going away anytime soon... and even if it does, there are millions of existing records that we’ll want to convert
How are we approaching it?
By tackling it in small chunks!
We’ve created a solution that...
• compartmentalises code for different sections
• provides robustness
• is performant
• allows us to experiment
MARC 21 Parser: fires events when it encounters a MARC 21 data structure; very strict with syntax
Event Observer: listens for MARC 21 data structures and hands control over to one or more handlers
Bibliographic Handlers: know how to convert MARC 21 structures and fields into linked data
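The parser, observer and handler roles map onto a plain observer pattern. In this Python sketch, `EventObserver`, `parse` and the two handlers are hypothetical names, and fields are simplified to `(tag, value)` pairs instead of being read from binary MARC, so it only shows how control flows between the three pieces.

```python
from collections import defaultdict
from typing import Callable

Field = tuple[str, str]  # (tag, field data), simplified
Handler = Callable[[Field, dict], None]


class EventObserver:
    """Listens for field events and hands each to every handler registered for its tag."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def register(self, tag: str, handler: Handler) -> None:
        self._handlers[tag].append(handler)

    def notify(self, field: Field, output: dict) -> None:
        for handler in self._handlers.get(field[0], []):
            handler(field, output)


def parse(fields: list[Field], observer: EventObserver) -> dict:
    """Stand-in for the parser: validates each tag strictly, then fires an event."""
    output: dict = {}
    for tag, value in fields:
        if len(tag) != 3 or not tag.isalnum():
            raise ValueError(f"malformed tag: {tag!r}")
        observer.notify((tag, value), output)
    return output


# Two toy bibliographic handlers, each owning one small chunk of the record.
def title_handler(field: Field, output: dict) -> None:
    output["title"] = field[1]


def identifier_handler(field: Field, output: dict) -> None:
    output.setdefault("identifiers", []).append(field[1])
```

Because each handler only sees the fields it registered for, a new part of the record can be tackled by adding a handler without touching the parser.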
So, where are we up to?
Format (and duration): we tackled this one first, as it allows us to reason more fully about the record
Format: in theory, quite easy...
...in practice, not so much...
• DVD and LaserDisc share(d) a code
• LC is slow(ish) to support new formats in MARC 21
• limited use of control field (007) codings...
• ...so we need to parse text from 3xx and 5xx fields
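Here is one way the “007 first, then free text” approach could look for video. The 007/04 codes (`g` laser optical videodisc, `s` Blu-ray, `v` DVD) are my reading of the MARC 21 videorecording values, and treating `g` as the code DVDs and LaserDiscs shared is an assumption; check both against LC's current documentation. The keyword patterns are illustrative. Duration comes from 306 (hhmmss), falling back to the 300 extent.

```python
import re
from typing import Optional

# Subset of MARC 21 007 (videorecording) position 04 values, as I read the
# format documentation; verify against the current LC tables before use.
VIDEO_FORMAT_007 = {"g": "LaserDisc", "s": "Blu-ray", "v": "DVD"}

# Illustrative patterns for free text in 3xx/5xx fields (far from exhaustive).
TEXT_PATTERNS = [
    (re.compile(r"blu[\s-]?ray", re.IGNORECASE), "Blu-ray"),
    (re.compile(r"\bdvd\b", re.IGNORECASE), "DVD"),
    (re.compile(r"laser\s?disc", re.IGNORECASE), "LaserDisc"),
]


def video_format(field_007: Optional[str], notes: list[str]) -> Optional[str]:
    """Pick a format from 007/04, using 3xx/5xx text when the code is missing or ambiguous."""
    from_text = next(
        (label for text in notes for pattern, label in TEXT_PATTERNS if pattern.search(text)),
        None,
    )
    coded = None
    if field_007 and len(field_007) > 4 and field_007[0] == "v":
        coded = VIDEO_FORMAT_007.get(field_007[4])
    # Assumption: 'g' is the code DVDs and LaserDiscs shared, so let the
    # 3xx/5xx text override it whenever the text names a format.
    if coded == "LaserDisc" and from_text:
        return from_text
    return coded or from_text


def duration_minutes(field_306: Optional[str], extent_300: str) -> Optional[int]:
    """Playing time from 306 ‡a (hhmmss), else from '(NNN min.)' in 300 ‡a.

    Seconds are dropped in this sketch.
    """
    if field_306 and re.fullmatch(r"\d{6}", field_306):
        return int(field_306[:2]) * 60 + int(field_306[2:4])
    match = re.search(r"(\d+)\s*min", extent_300)
    return int(match.group(1)) if match else None
```

For the Goodfellas record shown later in the deck, `video_format("vd||s||||", ["1 Blu-Ray (139 min.) :", "Blu-Ray."])` gives `"Blu-ray"`, and its 306 value `021900` works out to 139 minutes, matching the “(139 min.)” in its 300 field.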
Which gives us...
Title: an important part of the record to model, or so I’ve been told
Title: quite tricky because...
• ‡c must be the last subfield in a 245...
• ...so sometimes data from ‡n / ‡p is in ‡c instead...
• ...which means we can’t just drop the ‡c
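A minimal sketch of keeping the ‡c instead of dropping it: build the title from ‡a/‡b/‡n/‡p, trim the trailing ISBD punctuation, and hold on to ‡c as the statement of responsibility so later stages can mine it. Representing subfields as `(code, value)` pairs is an assumption, and this sketch makes no attempt to move stray ‡n/‡p text back out of ‡c.

```python
# Separator placed before each title subfield when it follows earlier text.
SEPARATORS = {"a": " ", "b": " : ", "n": ". ", "p": ". "}


def strip_isbd(value: str) -> str:
    """Drop trailing ISBD punctuation (' /', ' :', ' ;', ' =', ',', '.').

    Crude: it also strips a full stop that belongs to an abbreviation.
    """
    return value.rstrip(" /:;=,.")


def split_245(subfields: list[tuple[str, str]]) -> dict[str, str]:
    """Build a title from 245 ‡a/‡b/‡n/‡p and keep ‡c rather than dropping it."""
    title = ""
    previous = ""
    responsibility = ""
    for code, value in subfields:
        if code in SEPARATORS:
            text = strip_isbd(value)
            if title and text:
                # A part name (‡p) straight after a part number (‡n) takes a comma.
                sep = ", " if (code == "p" and previous == "n") else SEPARATORS[code]
                title += sep + text
            else:
                title += text
            previous = code
        elif code == "c":
            # ‡c comes last and may hold ‡n/‡p text, so keep all of it for
            # later stages (relator parsing, part recovery).
            responsibility = strip_isbd(value)
        # ‡h (medium) and anything else is ignored here.
    return {"title": title, "responsibility": responsibility}
```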
Now with more title
Identifier: sounds easy... acronyms from EAN to UPC describing 13-digit codes... right?
...STOP! What are all those other things doing in the ‡a?
“For a hardbound resource, there is no attempt to use a consistent term other than to use one that conveys the condition intelligibly.” (Library of Congress Rule Interpretation 1.8)
So we need to parse them out (and then validate whatever’s left)
    LDR: 01425ngm a22005058  4504
    001: 750785
    003: xxxxxxx
    005: 20090824164118.0
    007: vd||s||||
    008: 080623s2007    enk||| e          v|eng d
    020:  ,   | $c Retail (S24.99)
    024: 3,   | $a 7321900108089
    028: 4, 0 | $a BDY10808 | $b Warner Home Video
    029:  ,   | $a 7321900108089
    082:  ,   | $a 812
    245: 0, 0 | $a Goodfellas | $h [videorecording] / | $c directed by Martin Scorsese ; music by Christopher Brooks
    260:  ,   | $b Warner Home Video, | $c 2007.
    300:  ,   | $a 1 Blu-Ray (139 min.) : | $b col.
    306:  ,   | $a 021900
    366:  ,   | $b 20070611
    511:  ,   | $a Starring Robert De Niro, Ray Liotta and Joe Pesci
    521: 8,   | $a BBFC code: 18.
    538:  ,   | $a Blu-Ray.
    700: 1,   | $a Scorsese, Martin
    700: 1,   | $a Brooks, Christopher
    852:  ,   | $b John Harvard | $c BLU-RAY DISC | $m 18 | $z , $z Blu Ray Disc. 18Cert
Phew, this one’s easy: no (pbk), (hbk) or even (pbk. , alk. paper) to contend with
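A sketch of “parse them out, then validate whatever’s left”: split the leading identifier from its qualifier, normalise hyphens, then check the check digit. It assumes the raw 020/024 ‡a string as input and only handles ISBN-10 and 13-digit EAN (which includes ISBN-13); 12-digit UPC-A codes and other 024 types aren’t covered. The function names are hypothetical.

```python
import re
from typing import Optional

# Leading identifier characters, then whatever qualifier text follows.
_ID_PATTERN = re.compile(r"\s*([0-9Xx-]+)\s*(.*)")


def split_identifier(raw: str) -> tuple[Optional[str], str]:
    """Split a 020/024 ‡a value into (normalised identifier, qualifier)."""
    match = _ID_PATTERN.match(raw)
    if not match:
        return None, raw.strip()
    identifier = match.group(1).replace("-", "").upper()
    return identifier, match.group(2).strip()


def valid_isbn10(code: str) -> bool:
    """ISBN-10: weights 10..1, sum divisible by 11, 'X' stands for 10."""
    if not re.fullmatch(r"\d{9}[\dX]", code):
        return False
    total = sum((10 - i) * int(d) for i, d in enumerate(code[:9]))
    total += 10 if code[9] == "X" else int(code[9])
    return total % 11 == 0


def valid_ean13(code: str) -> bool:
    """EAN-13 (and therefore ISBN-13): weights 1,3,1,3... then a mod-10 check digit."""
    if not re.fullmatch(r"\d{13}", code):
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(code[:12]))
    return (10 - total % 10) % 10 == int(code[12])


def valid_identifier(code: Optional[str]) -> bool:
    """Dispatch on length; anything else (e.g. 12-digit UPC-A) is rejected here."""
    if code is None:
        return False
    if len(code) == 10:
        return valid_isbn10(code)
    return valid_ean13(code)
```

For the record above, the 024 ‡a `7321900108089` passes the EAN-13 check, and a value such as `0306406152 (pbk. : alk. paper)` splits into `0306406152` plus its qualifier, which is kept rather than thrown away.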
Now we can start performing lookups against other sources!
Author: the hardest of the lot...
...why?
• Rowling, J.K. vs Rowling, Joanne K.
• few records have a relator term in 100/700 ‡e...
• ...so we have to parse it from the 245 ‡c...
• ...and we don’t just deal with English records
Library of Congress to the rescue! We’ve licensed the names/subjects authority files and created RDF from them
    LDR: 01425ngm a22005058  4504
    001: 750785
    003: xxxxxxx
    005: 20090824164118.0
    007: vd||s||||
    008: 080623s2007    enk||| e          v|eng d
    020:  ,   | $c Retail (S24.99)
    024: 3,   | $a 7321900108089
    028: 4, 0 | $a BDY10808 | $b Warner Home Video
    029:  ,   | $a 7321900108089
    082:  ,   | $a 812
    245: 0, 0 | $a Goodfellas | $h [videorecording] / | $c directed by Martin Scorsese ; music by Christopher Brooks
    260:  ,   | $b Warner Home Video, | $c 2007.
    300:  ,   | $a 1 Blu-Ray (139 min.) : | $b col.
    306:  ,   | $a 021900
    366:  ,   | $b 20070611
    511:  ,   | $a Starring Robert De Niro, Ray Liotta and Joe Pesci
    521: 8,   | $a BBFC code: 18.
    538:  ,   | $a Blu-Ray.
    700: 1,   | $a Scorsese, Martin
    700: 1,   | $a Brooks, Christopher | $e music
    852:  ,   | $b John Harvard | $c BLU-RAY DISC | $m 18 | $z , $z Blu Ray Disc. 18Cert
A contrived example (sorry!) with and without relator terms
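Parsing relators from the 245 ‡c, as in the record above, could start as simply as this. The phrase table is a hypothetical, English-only stand-in (not every record is in English), it maps “music by” to the term “composer” rather than the “music” used in the contrived record, and matching a 700 heading to a name in ‡c by inverting “Surname, Forename” only works for simple names.

```python
import re
from typing import Optional

# Hypothetical phrase table. A real one needs many more phrases, plus
# non-English ones, since not every record is in English.
PHRASES = {
    "directed by": "director",
    "music by": "composer",
    "edited by": "editor",
    "written by": "author",
}


def roles_from_245c(statement: str) -> dict[str, str]:
    """Map each person named in a 245 ‡c statement to a role via PHRASES."""
    roles: dict[str, str] = {}
    for clause in statement.split(";"):
        clause = clause.strip().rstrip(".")
        for phrase, role in PHRASES.items():
            if clause.lower().startswith(phrase):
                names = clause[len(phrase):]
                for person in re.split(r",\s*|\s+and\s+", names):
                    if person.strip():
                        roles[person.strip()] = role
    return roles


def invert_heading(heading: str) -> str:
    """'Scorsese, Martin' -> 'Martin Scorsese' (simple inverted names only)."""
    surname, _, forenames = heading.partition(",")
    return f"{forenames.strip()} {surname.strip()}".strip()


def relators_for_headings(headings: list[str], statement: str) -> dict[str, Optional[str]]:
    """Attach a role to each 100/700 heading, or None when ‡c doesn't name it."""
    roles = roles_from_245c(statement)
    return {heading: roles.get(invert_heading(heading)) for heading in headings}
```

With the Goodfellas ‡c, `Scorsese, Martin` gets `director` and `Brooks, Christopher` gets `composer`; any heading the ‡c doesn't mention gets `None`.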
Hope you can all read this at the back!
Author: a closer look at authority matching
Some requirements:
• ...(able to process 2M records in several hours)
• requires accuracy
• must handle pseudonyms and variant spellings
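One way to approach the speed and variant-spelling requirements is to index every authorised heading and its variant forms (for example the 4xx see-from references of an authority record) under a normalised key, so each lookup is a single dictionary access. The normalisation rules, the example URI and the headings below are illustrative assumptions rather than LC data, and accuracy in practice needs more than this, such as using dates and titles to tell namesakes apart.

```python
import re
import unicodedata
from typing import Optional


def normalise(heading: str) -> str:
    """Fold case, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", heading)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_punctuation = re.sub(r"[^\w\s]", " ", without_marks.casefold())
    return " ".join(no_punctuation.split())


class AuthorityIndex:
    """Maps normalised authorised and variant headings to an authority URI."""

    def __init__(self) -> None:
        self._by_key: dict[str, str] = {}

    def add(self, uri: str, authorised: str, variants: list[str]) -> None:
        for form in [authorised, *variants]:
            # First writer wins; a real matcher must flag keys shared by two
            # different authorities instead of silently picking one.
            self._by_key.setdefault(normalise(form), uri)

    def match(self, heading: str) -> Optional[str]:
        return self._by_key.get(normalise(heading))
```

After `index.add("http://example.org/authorities/rowling", "Rowling, J. K.", ["Rowling, Joanne K."])`, both `Rowling, J.K.` and `ROWLING, JOANNE K` resolve to that URI, because normalisation removes the punctuation and case differences.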
You can tell J.K. Rowling is successful, she’s been translated lots
Language / Alternate Graphic Representation
Language: a nice “high impact” feature

Both forms can be searched for
Language: passes 880s back into the Observer, tagged with an ISO 639-2 language code and masquerading as the field listed in ‡6
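Replaying an 880 as the field named in its ‡6 can be sketched like this. ‡6 holds the linking tag and occurrence number, optionally followed by a script identification code and a field orientation code (e.g. `245-01/(N` for Cyrillic). The `dispatch` callback stands in for the Event Observer, and the ISO 639-2 code is assumed to be supplied by the caller; both are hypothetical interfaces.

```python
import re
from typing import Callable, NamedTuple, Optional

Subfields = list[tuple[str, str]]


class Linkage(NamedTuple):
    tag: str               # field the 880 stands in for, e.g. "245"
    occurrence: str        # occurrence number pairing it with the regular field
    script: Optional[str]  # script identification code, e.g. "(N"


# ‡6 looks like "245-01/(N" or "100-02/(2/r"; a trailing orientation code is ignored.
_LINKAGE = re.compile(r"(\d{3})-(\d{2})(?:/([^/]+))?")


def parse_linkage(value: str) -> Linkage:
    match = _LINKAGE.match(value)
    if not match:
        raise ValueError(f"unrecognised ‡6 linkage: {value!r}")
    return Linkage(match.group(1), match.group(2), match.group(3))


def replay_880(subfields: Subfields, language: str,
               dispatch: Callable[[str, Subfields, str], None]) -> None:
    """Hand an 880 back to the observer as the field named in its ‡6.

    `language` is an ISO 639-2 code supplied by the caller; `dispatch` is a
    hypothetical observer entry point.
    """
    linkage = parse_linkage(dict(subfields)["6"])
    data = [(code, value) for code, value in subfields if code != "6"]
    dispatch(linkage.tag, data, language)
```

Because the replayed field goes through the same handlers as a regular 245, the original-script title ends up alongside the romanised one, which is what lets both forms be searched.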
The Challenges: using/linking to external datasets. It’s part of the reason we use Linked Data... but it has some challenges at the moment
...or worse, is taken offline permanently?
Can we trust this data?
...or, if that’s not practical, proxy requests using a caching proxy such as Squid
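A caching proxy such as Squid handles this at the HTTP layer; the same idea can be sketched in-process, with a fallback to the last good copy when a source is down. The `fetch` callable (for example a thin wrapper around `urllib.request.urlopen`) and the fixed TTL are assumptions, and a stale copy is served only when `fetch` raises `OSError`, which covers `urllib.error.URLError`.

```python
import time
from typing import Callable, Optional


class CachedSource:
    """Local copy of external data, with a stale-copy fallback when the source fails."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # e.g. a wrapper around urllib.request.urlopen
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict[str, tuple[float, str]] = {}

    def get(self, uri: str) -> Optional[str]:
        now = self._clock()
        cached = self._cache.get(uri)
        if cached and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: skip the network entirely
        try:
            body = self._fetch(uri)
        except OSError:
            # Source unreachable (temporarily or for good): serve the last copy if any.
            return cached[1] if cached else None
        self._cache[uri] = (now, body)
        return body
```

Injecting the clock and the fetch function keeps the policy testable without touching the network.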
If using Wikipedia and worried about vandalism...
The Future... or: what we’d like to see happen to Linked Library Data
More library data as LOD, especially on the peripheries: authority data, author information, links to other resources
LMS vendors adopting LOD. Seriously, this would make our lives so much simpler
LOD replacing MARC 21 as the standard representation of bibliographic records
Photo credits
• Slide 15: http://www.flickr.com/photos/gammaman/5241860326/
• Slide 21: http://www.flickr.com/photos/agizienski/3778965891/
• Slide 40: http://www.flickr.com/photos/54409200@N04/5070012761/
• Slide 42: http://www.flickr.com/photos/proimos/4199675334/
• Slide 48: http://www.flickr.com/photos/maveric2003/91198458/
• Slide 63: http://richard.cyganiak.de/2007/10/lod/
• Slide 67: http://www.flickr.com/photos/markchapmanphoto/5139429152/
• Slide 72: http://www.flickr.com/photos/-bast-/349497988/