NISO's IOTA OpenURL Quality Initiative @ ALA & SLA 2012

NISO'S IOTA Initiative:
Fixing OpenURL Links Using Data Analysis
Improving OpenURLs Through Analytics
Special Libraries Association, Annual Conference,
Chicago, IL, July 17, 2012
American Library Association, Annual Meeting,
Anaheim, CA, June 24, 2012

Rafal Kasprowski, Electronic Resources Librarian, Rice University
Susan Marcin, Licensed Electronic Resources Librarian, Columbia University
Oliver Pesch, Chief Strategist, EBSCO Information Services

HISTORY OF OPENURL / OVERVIEW OF IOTA

Rafal Kasprowski
Rice University
kasprowski@rice.edu

What is IOTA?

• Initiative that measures the relative importance of the
elements that make up OpenURL links to help vendors
improve their OpenURL strings so that the maximum
number of OpenURL requests resolve to a correct record.

Elements:
• journal title book title ISBN
• ISSN start page DOI
• volume author PMID
• issue date …

Presentation
• Part I (Rafal Kasprowski, Electronic Resources
Librarian, Rice University)
• History of OpenURL and IOTA
• IOTA’s Deliverables:
• OpenURL Reports (comparing vendors’ OpenURL links)
• OpenURL Quality Index (preliminary version)

• Part II (Susan Marcin, Licensed Electronic Resources
Librarian, Columbia University)
• Usefulness of IOTA’s OpenURL Reports in improving OpenURL
links for e-books
• Part III (Oliver Pesch, Chief Strategist, EBSCO
Information Services)
• Improvements to IOTA’s OpenURL Quality Index and its limitations

Part I - Agenda
• Full-text linking: from proprietary linking to OpenURL

• From Cornell to NISO: IOTA created in response to OpenURL
linking problems

• IOTA in the context of ERM best practices: IOTA and KBART

• IOTA’s analytical approach

• Reports comparing vendor OpenURLs

• Weighting OpenURL elements

• Concept of the OpenURL Quality Index and preliminary
version.

Before OpenURL: Proprietary Linking

• Certain A&I database providers (e.g., CSA, PubMed)
offered full-text linking options for a select number of
content providers.

• Libraries manually activated full-text linking with
providers they had subscriptions with.

• A&I --> Full Text

Proprietary Linking: Pros and Cons
• Linking had to be activated manually by libraries for
each full-text provider.

• A&I providers offering this option were few.

• Selection of full-text providers was limited.

But...
• Once set up, the static links to full texts were
accurate.

• Problem source pinpointed easily: A&I --> Full Text

Advent of OpenURL
• Objective: Deliver full texts unrestrained by proprietary silos.

• Open standard generating dynamic links at time of request.

• A-Z list (e.g., e-journals, e-books):
o Knowledge base (KB) with library's holdings.
o Replaces librarian as intermediary in linking.
o Indicates provider of "appropriate copy"

• A&I ("Source") --> A-Z list ("KB") --> Full Text ("Target")

OpenURL: syntax, resolver, linking nodes

Source Citation
A, Bernand, et al. "A versatile nanotechnology to connect individual nano-objects
for the fabrication of hybrid single-electron devices." Nanotechnology 21, no. 44
(November 5, 2010): 445201. Academic Search Complete, EBSCOhost (accessed
October 24, 2010).

Target Link (example using OpenURL syntax, similar to Source OpenURL)
http://www.anytarget.com/?issn=0957-4484&volume=21&issue=44&date=20101105
&spage=445201&title=Nanotechnology&atitle=A+versatile+nanotechnology+to+
connect+individual+nano-objects+for+the+ fabrication+of+hybrid+single-
electron+devices.&aulast=A++Bernand

Example of Resolver Menu Page

Matthew Reidsma, “jQuery for Customizing Hosted Library Services", http://matthew.reidsrow.com/articles/11 (accessed July 23, 2012)

Pros & Cons of OpenURL
Pros:
• KB/Resolver vendors took over most of the linking setup:
Less work for libraries and providers.
• Participation by A&I platforms and full-text providers
exceeded proprietary linking: OpenURL scales better.

Cons:
• Dynamic linking less predictable than static linking: more
difficult to pinpoint cause of link failures.
• OpenURL linking not improved significantly in last 10
years.
• No systematic method exists to benchmark OpenURLs.

Identifying source of problem…
“72% of respondents to the online survey either agreed or strongly
agreed that a significant problem for link resolvers is the generation
of incomplete or inaccurate OpenURLs by databases (for example,
A&I products).”
Culling, James. 2007. Link Resolvers and the Serials Supply Chain: Final Project Report for UKSG, p.33.
http://www.uksg.org/sites/uksg.org/files/uksg_link_resolvers_final_report.pdf.

Defining methodology for approaching problem
Recently, researchers have indicated the need for metadata quality
metrics, including:
• completeness;
• accuracy;
• conformance to expectations;
• logical consistency and coherence.
Bruce, Thomas R. and Hillmann, Diane I. 2004. The Continuum of Metadata Quality: Defining, Expressing, Exploiting. In Metadata in
Practice. Ed. Diane I. Hillmann and Elaine L. Westbrooks. Chicago: American Library Association, pp. 238-256.

Année philologique OpenURL Study
2008 Cornell study led by Adam Chandler*
• Problem: Too often links sent from Aph did not
successfully resolve to requested resource.
• Objective: Examine quality of OpenURLs offered to
users by Aph in order to improve the linking.

Aph Study investigated:
• Faulty citation metadata from source database.
• Method to evaluate the OpenURLs.

*Chandler, Adam. 2009. Results of L’Année philologique online OpenURL Quality Investigation: Mellon Planning Grant Final Report.
http://metadata.library.cornell.edu/oq/files/200902%20lannee-mellonreport-openurlquality-final.pdf.

Scoring System & Aph Study Outcomes
Concept of scoring in Aph study (based on B. Hughes study)*
• establish a baseline for comparison;
• results to be shared with data providers;
• develop a best practice.

Problem analysis in Aph study limited to:
• source link
• presence/absence of citation metadata elements

Results:
• OpenURL quality model: compares elements in Aph
OpenURLs to those of other providers.
• No scoring was achieved for Aph, but model is first step
towards scoring system.
*Hughes, Baden. 2004. Metadata Quality Evaluation: Experience from the Open Language Archives Community. In Digital Libraries:
International Collaboration and Cross-Fertilization. Ed. Zhaoneng Chen et al. Berlin: Springer-Verlag, 2004, pp. 320-329.

Creation of IOTA

IOTA is formed in
January 2010
• NISO accepts proposal
to take Aph Study to
wider community.
• URL: openurlquality.org

IOTA & KBART: complementary NISO working groups

IOTA
• Deals with issues specific to OpenURL linking;
• Seeks improvements in OpenURL elements used by:
• OpenURL providers.

KBART
• “Knowledge Bases And Related Tools”
• Deals with data issues at the KB level
• Seeks improvements in data exchange practices between:
• content providers (e.g. OpenURL providers);
• product vendors (e.g. link resolver vendors).
• subscription agents;

IOTA & KBART: related through OpenURL

• IOTA:
• analyzing data sent from OpenURL source to link resolver.

• KBART:
• creating best practices for data formats sent from content
providers to knowledge base (and link resolver) vendors.

IOTA’s Basic Assumptions

• Results are achieved through an analytical investigation
of how OpenURL links work.

• Practical: Not the OpenURL standard is addressed, but
links (OpenURLs) generated by standard.

• Selective changes to OpenURLs will lead to significant
improvement in linking success rate.
o Motto: "small changes. big improvements"

IOTA’s Desired Outcomes…
…a continuation of Aph Study

A. Produce qualitative reports that will help OpenURL
providers quickly compare their OpenURL quality to
that of their peers.

B. Develop community-recognized index for
measuring the quality of OpenURL links generated
by content providers:

 scalable across all OpenURLs and their providers

Usefulness of comparing OpenURLs
• Content providers that generate OpenURLs can:
• compare their OpenURLs with other providers;
• make improvements to their OpenURLs.

• Institutions can:
• compare OpenURL providers;
• make local adjustments to OpenURL setup.

• Resolver vendors can:
• compare OpenURL providers;
• Change their settings for OpenURL providers:
• Link resolvers;
• Web-scale discovery products.

Report types
• Source reports
• Viewing how a particular (1) vendor or (2) database
• A. uses OpenURL elements (element frequency)
• B. formats OpenURL elements (pattern frequency)

• Element / Pattern reports
• Viewing how a particular (1) element or format
• A. is used across vendors
• B. is used across databases

• Vendor Completeness Report
• Viewing vendors’ OpenURL quality score

OpenURL Quality Index:
Rating vendors by their OpenURLs

1. Core Elements:
• Any element contained in IOTA's OpenURL reporting system;
• 20M OpenURLs obtained from libraries & content providers.

2. Scoring System:
• Assumption: Correlation exists between
o # of core elements ("OpenURL completeness") &
o ability of OpenURLs to link to specific content.

3. Element Weighting:
• Assigned based on their relative importance:
o spage vs atitle
o issn vs jtitle
o doi/pmid vs date, etc.

OpenURL Quality Index
preliminary version

Further investigation was needed
• Element weighting needed to be adjusted in a more systematic
way:
o Importance of identifiers (doi, pmid) vs bibliographic data (issn, volume,
spage, etc.)
o Relative importance of bib. data (issn vs volume vs spage, etc.)

• IOTA focused on OpenURLs from citation sources only. How is
OpenURL linking impacted by other factors?
o knowledge base,
o resolver,
o full-text provider (target).

• High "completeness" score of OpenURLs not always indicative of
"success" in linking to full texts
o Combination of indexes, incl. “success index”, developed by IOTA and/or
other groups may lead to more precise metrics.

Presentation: Parts II and III

• Part II (Susan Marcin, Columbia University)
• Usefulness of IOTA’s OpenURL Reports in improving OpenURL
links for e-books

• Part III (Oliver Pesch, EBSCO Information Services)
• Improvements to IOTA’s OpenURL Quality Index and its limitations

E-Books & OpenURL Linking
A collaborative study by the
2CUL E-Books Task Force

Susan Marcin
Columbia University Libraries
smarcin@columbia.edu

What is 2CUL?

2CUL is a transformative partnership between two major
academic research libraries, the Columbia University
Libraries and the Cornell University Library, based on a
broad integration of resources, collections, services, and
expertise.

http://2cul.org/

2CUL E-Books Task Force

E-books represent a large, diverse, and rapidly growing group
of library materials whose acquisition, description,
management, and use touch many parts of our libraries’
organization. Moreover, models affecting virtually every aspect
of e-books are still evolving, leaving a host of issues in flux,
with many options and no perfect solutions.

• Survey e-book landscape in more detail
• Recommend steps that our libraries should take in the short
term to improve e-book access and management
• Make recommendations for action

E-Books Linking Group

• Examine, evaluate and compare the quality of
E-Book OpenURLs
• Focus on what works, what doesn’t and why

Group members:
• Adam Chandler, E-Resources & Database Management
Research Librarian, Cornell University
• Susan Marcin, Licensed Electronic Resources Librarian,
Columbia University

OpenURL :: NISO Standard Z39.88
URL strings generated “on the fly.”
The OpenURL path from citation to full text consists of data
being generated and passed through the following places:
1. An OpenURL is sent from the citation resource to the
OpenURL resolver
2. The data is matched against a knowledge base to generate
content on the OpenURL resolver page
3. A second proprietary URL is sent from the resolver page to
the full text

User Experience (UX): E-Book Citation

link

What the OpenURL string looks like (invisible to user)

http://rd8hp6du2b.search.serialssolutions.com/?ctx_ver=Z39.88
-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-
8&rfr_id=info:sid/summon.serialssolutions.com&rft_val_fmt=info
:ofi/fmt:kev:mtx:dc&
rft.title=Apple+Developer+Programs&
rft.date=2010-01-01&
rft.pub=Apress&
rft.isbn=1430229314&
rft.spage=179&
rft.epage=190&rft_id=info:doi/10.1007%2F978-1-4302-2932-
2_9&rft.externalDBID=n%2Fa&rft.externalDocID=978-1-4302-
2932-2_209124_Chap9
link

UX:
Link
Resolver
Page

link

Why does improved OpenURL linking
matter?
Users

• Better user experience: Users expect the provision of easy
connection between library resources
• Increased patron satisfaction with library e-resources

Library

• Enhanced discovery of and cross-linking into subscription
e-resources

How did the task force assess the quality of E-Book
OpenURLs?
• Looked at NISO IOTA data
(IOTA = Improving OpenURL through Analytics)
• IOTA tracks the OpenURLs sent from citations to link
resolvers

http://openurlquality.niso.org/

Question: How many patron requests for
e-books resulted in full text?
To find the answer --
Three weeks of Columbia U's openURL log data on the NISO
servers analyzed:
• one week from January 2011
• one week from February 2011
• one week from March 2011

78,540 total openURL requests in these 3 weeks

How many of these OpenURLs are for e-books?
• 1474 Requests that contained an ISBN
• 781 Requests that lacked an ISBN but contained
genre=book
We analyzed 2255 e-book OpenURLs.

ISBNs in OpenURL Strings – Do they
matter?

1474 OpenURL Requests contained an ISBN

Yes No
Full text link offered? 47% (698/1474) 47% (698/1474)

"Get Book" link Successful?
Yes (book link No (book link N/A (links to journal, rather
leads to full-text fails) than book or cannot
successfully) process information)
"Get Book" 94.9% (663/698) 1.1% (8/698) 1.1% (8/698)
link
Successful?

Including "Genre=Book" in OpenURL
Strings

Lacks ISBN, but found full text anyway?

Yes No
Found full text? 28% (208/781) 73% (573/781)

OpenURL linking failure :: "Bad" Data
• Correctly identified as "genre=book," but article title "atitle"
passed in openURL string.

http://rd8hp6du2b.search.serialssolutions.com/?SS_Page=refin
er&sid=sersol%3ARefinerQuery&rft.aulast=Mihailovic&url_ver=
Z39.88-
2004&l=RD8HP6DU2B&SS_ReferentFormat=BookFormat&rft.
genre=book&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%
3Abook&rft.atitle=corporeal+words&citationsubmit=Search&SS
_LibHash=RD8HP6DU2B&rfr_id=info%3Asid%2Fsersol%3ARe
finerQuery&rft.aufirst=Alexandar&SS_Errors=RequiredDataMis
sing
link

We do have this book in our catalog

link

A better OpenURL might look like this
http://rd8hp6du2b.search.serialssolutions.com/?rft.au=Mihailovi
c%2C+Alexandar&sid=sersol%3ARefinerQuery&SS_authors=
Mihailovic%2C+Alexandar&rft.aulast=Mihailovic&url_ver=Z39.8
8-
2004&l=RD8HP6DU2B&SS_ReferentFormat=BookFormat&rft.
genre=book&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%
3Abook&rft.title=Corporeal+Words&rft.isbn=0810114593&citati
onsubmit=Search&paramdict=en-
US&rfr_id=info%3Asid%2Fsersol%3ARefinerQuery&SS_LibHa
sh=RD8HP6DU2B&SS_isbnh=0-8101-1459-
3&rft.aufirst=Alexandar
link

IOTA Metric Report: ISBN
Vendor Percentage of Total number of
Drawing from our OpenURL OpenURLs
observations requests that analyzed
contain this
about the element
importance of summon.serialssolutions.com 98 17575
ISBN for full text
ebsco 82 27023
linking, we
pqil 82 1170
analyzed the
csa 79 12952
IOTA metric
hww 75 1428
"ISBN" to
refworks 68 1897
highlight the
firstsearch.oclc.org 65 10309
differences
unknown 63 3662
across a sample
sersol 30 4140
of OpenURL
oup 7 2071
providers.

Detailed Comparison of Summon vs. OUP
metric summon.serialso oup
The
lutions.com (percentage of
differences in (percentage of OpenURLs
the presence OpenURLs containing this
of the ISBN containing this element)
element are element)
very important aulast 0 90
in connecting date 100 96
the user to the
full text. doi 11 12
isbn 98 7
title 100 99

Criteria for improved OpenURL linking
Sending only a title in an OpenURL string is not always
sufficient to find a match. If there is an ISBN associated with a
book, then the OpenURL provider should ideally include the
ISBN in the OpenURL request for better results.

The inclusion of certain criteria do appear to promote OpenURL
linking success, such as:
• ISBN
• DOI
• Title

If one includes additional data, such as an author last name,
with the title, the precision of the results should improve.

Can we rank the OpenURL linking
success of e-book providers?
IOTA Vendor Rating OpenURLs
Completeness Analyzed
Report (Draft) summon.serialssolutions.co 0.64 17862
m
pqil 0.61 2237
metric weight
csa 0.59 27483
aulast 1 ebsco 0.57 50927
date 2 hww 0.57 2448
doi 8 refworks 0.55 3078
isbn 7 firstsearch.oclc.org 0.54 23444
unknown 0.43 5079
title 5
oup 0.4 3382
sersol 0.29 8227
gale 0.26 3019

Links & Contact
• 2CUL -- http://2cul.org/
• NISO IOTA -- http://openurlquality.niso.org/
• 2011 2CUL E-Books Task Force: Linking Subgroup Report
http://tinyurl.com/linkingreport

Susan Marcin
Columbia University Libraries
smarcin@columbia.edu

NISO'S IOTA INITIATIVE: COMPLETENESS INDEX AND
IMPROVING ELEMENT WEIGHTS

Oliver Pesch
EBSCO Information Services
opesch@ebsco.com

Overview
• Premise for IOTA completeness score and element
weights
• Proving the theory through real-life tests
• Using statistical approach to determine weights
• Test results
• Conclusions
• Next steps for IOTA

The premise behind IOTA
• Completeness Score is the measure of the
“completeness” of a single OpenURL
• Completeness Index is attributed to the content provider
as an overall measure of the completeness of their
OpenURLs

• The Completeness Score is calculated by “weighing” the
elements provided in the OpenURL based on their
importance in target links
• Some elements are more important than others and will
have a higher weight
• Completeness Score equals the sum of weights of
elements found divided by the maximum score possible

• Simple example assuming equal element weights

Element Description Weight This OpenURL

ATitle Article title 1
AuLast Author’s last name 1
Date Date of publication 1
ISSN ISSN 1
Issue Issue number 1
SPage Start page 1
Title Journal Title 1
Volume Volume number 1
TOTAL 8

SAMPLE OPEN URL DATA
The premise behind IOTA ?date=2/4/2008
&issn=1083-3013
&volume=13
• Simple example assuming equal element weights
&issue=20
&atitle=the+casualties+of+war
Completeness Score...
Element Description Weight This OpenURL
(Total for This OpenURL)
Total Weights
ATitle Article title 1 1
AuLast 5 / Author’s last name
8 1
Date Date of publication 1 1
ISSN
= .625
ISSN 1 1
Issue Issue number 1 1
SPage Start page 1
Volume Volume number 1 1
TOTAL 8 5

Determining the weights
• Initial approach
• Frequency of element occurrence in target link templates
• Combined with reasoning

Initial Weights
OpenURL data element Description Weight
eISSN Online ISSN 3
ISSN Print ISSN 3
Jtitle Journal Title 1
Pmid PubMed ID 8
SPage Start page 3
DOI Digital Object Identifier 8

Initial Weights
eISSN
Initial
weights were Online ISSN 3
somewhat subjective.
ISSN Print ISSN 3
Pmid PubMed ID 8
SPage Start page 3

Initial Weights
eISSN Online ISSN 3
ISSN Print ISSN 3
Jtitle
Most link resolver Journal Title 1
knowledge bases can
Pmid PubMed ID 8
handle look-ups by either
SPage Start page 3
Print ISSN or Online ISSN
(both are not needed)

Initial Weights
Most link resolvers will
AuLast
enhance identifiers like Author’s last name 1
PubMed ID and DOI;
therefore, having an
eISSN Online ISSN 3
identifier is like having all
ISSN Print ISSN 3
metadata elements.
Pmid PubMed ID 8
SPage Start page 3

Validating the Completeness Score
• Use real OpenURLs and a commercial link resolver.
(tested with LinkSource and 360-Link)
• Remove institutional holdings as a limit to resolution

• Process each OpenURL through the link resolver to
determine “Success”
• Score one point for finding at least one full text target

• Calculate the completeness score for each OpenURL
• Look for a statistical correlation between the
completeness score and the success score

Results: Original Weights

1.2000
1.0000
0.8000 Average of
0.6000 Completeness
0.4000 Score
0.2000
0.0000 Average of
Success Score

Correlation Coefficient .43
Tests conducted on sample of 15,000 OpenURLs randomly pulled from IOTA database

A Statistical Approach to Determining
Element Weights
• Select a set of “perfect” OpenURLs
• include all key data elements and resolve to full text

• Perform step-wise regression
• Test failure rates for each element by removing that element

• Use failure rates as basis for weights
• Use new weights to test for correlation between weights
and success for larger sample

Failure Rates from 1500 OpenURL test sample
Author’s last name is least
Element removed Description Failure Percentage
importantOpenURL
from the
ATitle Article title .74%
Date is AuLast
surprisingly low Author’s last name .07%
Date Date of publication .4%
ISSN ISSN (either online or 22.02%
print ISSN)
Issue Issue number 20.27%
SPage
Volume is most critical Start page 33.27%
Title Journal Title (either .61%
Title or Jtitle)
Volume Volume number 74.14%

Calculated Element Weights
Element Description Weight*

ATitle Article title 1.87
AuLast Author’s last name 0.83
Date Date of publication 1.61
ISSN ISSN (either online or 3.34
print ISSN)

Issue Issue number 3.31
SPage Start page 3.52
Title Journal Title (either Title 1.78
or Jtitle)

Volume Volume number 3.87

*Element weight calculation: log10 (failure-rate-per-10,000 OpenURLs)

Results: New Weights

1.2000
1.0000
Average of
0.8000 Completeness
0.6000 Score
0.4000
0.2000 Average of
Success Score
0.0000

Correlation Coefficient .80
Tests conducted on sample of 15,000 OpenURLs randomly pulled from IOTA database

Notes
Testing the same OpenURLs on 360-Link results in
different numbers but consistent trends. Differences may
be attributed to:
• Variations in metadata enhancement techniques
• Strictness in target link rules (e.g. required elements before link
shows – tied to level of forgiveness of target)
• Link syntax used for target

Notes
96.3 of OpenURLs in the test were able to populate a
full text target of credible ILL form…
• Perception of high failure rate of OpenURL may be attributed to
library holdings and user expectations
• Suggestion: set link text to control expectations
• Link to full text (for items in the online collection)
• Check library collection (for things in print collection)
• Request from library (for everything else)

Conclusions
• Step-wise regression approach to element weights works
• Completeness Index scores can be correlated to actual
OpenURL “success”
• KB and resolver technology influence results and prevent
a universal set of element weights

The Completeness Index is a mechanism
individual link resolver vendors can use to provide
metrics to help improve their service quality

Other takeaways
Several factors involved in perceived “link failure”:
1. Bad or missing metadata in the OpenURL link
2. Inaccurate holdings data within the resolver’s knowledge base
3. Flexibility of syntax to the target
- e.g., target supports at least two: OpenURL syntax, DOI link, proprietary link structure
4. Flexibility of resolution logic at the target
- i.e., target finds way to create link using available data when some data missing or
wrong
5. User expectations
- e. g., link resolver provided link to OPAC or ILL form, but user was expecting full text

- IOTA focused on (1)
- KBART working on (2)
- Education of content providers could address (4)
- Displaying OpenURL button only if full text available could address
(5)

What’s next for IOTA
• Continue offering public access to reports on element
frequency
• Publish technical report on work to date
• Publish recommended practice for calculation and use of
completeness scores for link quality assessment by link
resolver vendors
• Continue work as a NISO standing committee for at least
one more year

NISO's IOTA OpenURL Quality Initiative @ ALA & SLA 2012

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to NISO's IOTA OpenURL Quality Initiative @ ALA & SLA 2012

Similar to NISO's IOTA OpenURL Quality Initiative @ ALA & SLA 2012 (20)

Recently uploaded

Recently uploaded (20)

NISO's IOTA OpenURL Quality Initiative @ ALA & SLA 2012

Editor's Notes