CARLI Usage Stats Keynote 20130325
1.
2. What's the Use: A Symposium on Usage Statistics
John McDonald & Jason Price, PhD
CIO & AVP Interim Library Director
Claremont Colleges Library
March 25, 2013
CARLI Electronic Resources and Collections Working Groups
3. Overview: a Keynote in three parts
1. Broad perspective: Where are we now?
2. Detailed perspective: Addressing the
challenges of usage statistics
3. The latest: our present & future projects
7. How many people do you need in a room before it is
highly likely that two share a birthday?
a) Less than 30
b) 30 – 60
c) More than 60
d) 367
8. Which treatment for kidney stones is more successful?
                 Treatment A              Treatment B
Small Stones     Group 1: 93% (81/87)     Group 2: 87% (234/270)
Large Stones     Group 3: 73% (192/263)   Group 4: 69% (55/80)
Success Rates    78% (273/350)            83% (289/350)
10. Harvesting the Crop: Implementing a Usage Statistics Management System at Georgia State
Social Media, ROI and Cookie Day
How Do E-Resources Contribute to Teaching and Learning? Findings from the Lib-Value Project
Using Data Visualization Tools for Collection Analysis
To Keep, or Not to Keep: The Effect of Discovery Tools on Licensed Resources
Everything That's Wrong with E-book Statistics - A Comparison of E-book Packages
Discovery & Usage: The Foundation of a Powerful Collection
13. Progress
Commonly agreed upon measures
Routine methods of transmission
Regular formatting of files
Standard dates for delivery
Audits of reports
Certification of compliant vendors
Established process for refinement
14. Still evolving
Comprehensive coverage of publishers
Sophistication on Ebooks & Databases
Automation
Further granularity
Measures for non-text usage (article parts)
Article level metrics
17. Cost Projections - GVSU
                       # of Ebooks   Total $ of Ebooks   Additional   Total Savings over
                       Purchased     not Purchased       STL Costs    Existing Plan
Purchase on 4th Loan   89            $17,382.31          $3,327.20    $14,055.11
Purchase on 5th Loan   58            $24,512.55          $4,621.09    $19,891.46
Purchase on 6th Loan   34            $25,722.11          $5,041.64    $20,680.47
Purchase on 7th Loan   22            $26,899.83          $5,324.84    $21,579.99
Doug Way and Julie Garrison, “Financial Implications of Demand-Driven
Acquisition,” in David Swords (ed.) Patron-Driven Acquisitions: History
and Best Practices. (Berlin: De Gruyter Saur, 2011), p. 148.
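A quick way to sanity-check the GVSU projection is to recompute the savings column. The sketch below assumes (this is an inference from the figures, not something the slide states) that savings = cost of ebooks not purchased minus additional STL costs. Under that assumption the first three rows reproduce the printed figures and the 7th-loan row comes out $5.00 lower than the printed one, which may be a transcription slip or a sign that the assumed relationship is not exact.

```python
from decimal import Decimal

# Figures transcribed from the GVSU cost projection slide:
# (purchase trigger, $ of ebooks not purchased, additional STL costs, printed savings)
ROWS = [
    ("4th loan", "17382.31", "3327.20", "14055.11"),
    ("5th loan", "24512.55", "4621.09", "19891.46"),
    ("6th loan", "25722.11", "5041.64", "20680.47"),
    ("7th loan", "26899.83", "5324.84", "21579.99"),
]


def net_savings(not_purchased: str, stl_costs: str) -> Decimal:
    """Assumed relationship: savings = avoided purchases - extra STL fees."""
    return Decimal(not_purchased) - Decimal(stl_costs)


def check_rows(rows=ROWS):
    """Return (trigger, computed savings, printed savings) for every row."""
    return [(t, net_savings(a, b), Decimal(p)) for t, a, b, p in rows]


for trigger, computed, printed in check_rows():
    flag = "" if computed == printed else "  <- differs from slide"
    print(f"Purchase on {trigger}: computed ${computed:,} vs printed ${printed:,}{flag}")
```

Decimal is used instead of float so that cent-level comparisons are exact.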
19. DU Storage study
Levine-Clark, Michael, “Analyzing and Describing Collections Use: Strategies for
Managing a Library Move,” LYRASIS Ideas and Insights, Webinar, May 4, 2012.
http://www.slideshare.net/MichaelLevineClark/
21. Each “Title-Holding” has different characteristics
                          Dominguez Hills  Fullerton  Long Beach  Los Angeles  Northridge  Pomona
Total Circulations        0 circs          19 circs   16 circs    12 circs     13 circs    8 circs
Last Circulation Date     -none-           11/30/11   12/16/08    5/30/07      4/27/07     3/11/08
Date added to Collection  6/27/02          4/23/02    9/21/01     5/03/00      11/11/02    8/11/00
Sustainable Collections Services, Maine Shared Collections Strategy Planning
Meeting, http://www.slideshare.net/Maine_SharedCollections/mscs-scs-planning-meeting-rick-lugg-andy-breeding
22. Sample Pilot Group - Title-Holdings by Holdings Level
[Stacked bar chart. X-axis: # of Pilot Group Libraries Holding Title (1, 2, 3-6). Y-axis: title-holdings, 0 to 2,000,000. Series: 0 circs, 1-3 circs, 4+ circs. Data labels: 779,756; 305,438; 539,718; 257,739; 311,240; 220,071; 560,107; 362,050; 239,202.]
23. Resource Sharing: CAMINO Collections
[Stacked bar chart, one bar per library: CUC, LMU, Oxy, Pep, UOP, CST, Wstmt, CalArts, CBU, Dom, WJU, WUHS, AJU, HNU. X-axis: 0 to 1,200,000.]
Books held only by library
Books held by BOTH library and the rest of Camino
Books held only by the rest of Camino
35. Part 2: Addressing the
Challenges of Usage Stats
1. Comparability
• Package price per use
• Defining the appropriate range(s) of cost per use
• Practical applications
2. Reliability
• Impact of mobile, discovery & harvesters
3. Prediction
• Demand Driven Acquisition
• Number of books available <> Size of budget
4. Context – Data about our data
37. Challenge 1: Comparing Package Price Per View
Cross-package comparison
pkgID  Total Use  SubsCost    UnSubsCost  Overall PPV
1      140048     $1,652,000  $182,000    $13.10
2      20341      $333,000    $10,000     $16.86
3      13572      $282,000    $21,000     $22.33
So Pkg 1 is a better value than Pkg 3?
It might not be…
38. html to pdf Ratios vary widely for these packages
[Bar chart: # of views by Package (1, 2, 3), with html views and pdf downloads as separate series. Bar labels: 48047, 32688, 13004, 4066, 352, 568. Ratio labels: 1:1.3, 1:23, 1:12.]
How many pdfs in Pkg 1 are duplicates of html views?
(fmi: See Davis & Price, 2006 JASIST 57(9))
39. Getting a pdf from Package 1…
‘Get article’ links directly to the html version…
then the user downloads the pdf…
…2 uses are recorded for 1 pdf
41. Package value revisited
pkgID  Total Use     SubsCost    UnSubsCost  Overall PPV
1      140048        $1,652,000  $182,000    $13.10
2      20341         $333,000    $10,000     $16.86
3      13572         $282,000    $21,000     $22.33
vs.
pdf requests only tell a different story!
pkgID  Est. pdf Use  SubsCost    UnSubsCost  Overall PPP
1      83469         $1,652,000  $182,000    $21.97
2      18734         $333,000    $10,000     $18.31
3      13287         $282,000    $21,000     $22.80
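The two rankings on this slide can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the overall price per view (and per pdf) is simply (SubsCost + UnSubsCost) divided by the relevant use count; that formula is not stated on the slide, but it matches the printed figures to the cent. The dictionary keys are illustrative names, not vendor field names.

```python
# Package figures transcribed from the slide; key names are illustrative.
PACKAGES = {
    1: {"total_use": 140048, "pdf_use": 83469, "subs": 1_652_000, "unsubs": 182_000},
    2: {"total_use": 20341, "pdf_use": 18734, "subs": 333_000, "unsubs": 10_000},
    3: {"total_use": 13572, "pdf_use": 13287, "subs": 282_000, "unsubs": 21_000},
}


def price_per_use(pkg: dict, measure: str) -> float:
    """Assumed formula: total package spend divided by the chosen use count."""
    return (pkg["subs"] + pkg["unsubs"]) / pkg[measure]


def ranking(measure: str) -> list:
    """Package IDs ordered from lowest to highest price per use."""
    return sorted(PACKAGES, key=lambda pid: price_per_use(PACKAGES[pid], measure))


for measure in ("total_use", "pdf_use"):
    prices = {pid: round(price_per_use(p, measure), 2) for pid, p in PACKAGES.items()}
    print(measure, prices, "best to worst:", ranking(measure))
```

On total views Package 1 looks cheapest; on estimated pdf requests alone it drops behind Package 2, which is the point of comparing both measures.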
42. Addressing Challenge 1:
Comparing Package Price Per View
When comparing packages, both total views
and PDF downloads should be compared
Extension of principle: Journal report 1B
JR 1a JR 1b
ARCHIVE FRONTFILE
43. Challenge 2:
Defining acceptable range(s) of
cost per use
Among packages
Within packages
44. Reality Check
Should we expect cost per use to be
equivalent among packages?
Content Quality
Business Model
For Profit vs Cost Recovery
Exposure in Discovery tools
Title list accuracy
Backfile access
ASSUMPTIONS
46. Consortial Benchmarks
[Chart: SCELC Package 'W' Overall Price per Use. Y-axis: Price per full text article view, $0.00 to $50.00. X-axis: Consortium Member 1-24 (sorted by decreasing spend). Legend: Use data not available.]
50. Subscribed titles w/in a pkg- Apples to apples?
[Chart: Price Per Use (PPU), $0 to $1,200, by Subscribed Title # (ordered by PPU), 0 to 50. Annotated titles: $4300/year, 11 uses; $8800/year, 33 uses; $3400/year, 17 uses.]
51. Strict title level cost per view is misleading
Cost per view by access type
Access Type                            Cost per view
All Titles [n=537]                     $13.41
Subscribed Titles [n=192]              $30.51
Unsubscribed (Leased) Titles [n=345]   $0.81
52. Best practices for usage comparison tasks
1. Goal - Identify pricing inequity
a. Best accomplished by consortial benchmarking
b. Requires readily available package level cost per use
across consortial participants
c. Leverage COUNTER consortial reports and economy
of scale of consortial specialist
2. Goal - Identify lower value packages
a. Use both total views and pdf download comparisons
3. Goal - Identify lower value titles
a. Only after targeting specific lower value packages
b. Recognize that by title price per use comparison is only
valid within a package
53. Challenge 2: The convenience/reliability trade off
COUNTER R4: Search activity generated by federated search engines and other automated search agents should be included in separate “Searches_federated and automated” counts …and are NOT to be included in the “Regular Searches” counts.
Case: Direct from Google IP access
  Search inflation impact: Low? [Cost is granularity of usage stats]
  Full text inflation impact: Low?
Case: Mobile devices
  Search inflation impact: Unlikely to be significant
  Full text inflation impact: Low to None
Case: Federated search engines (built into some discovery tools) automate searches
  Search inflation impact: Significant, but COUNTER rules require separate reporting & number of searches has always had dubious meaning
  Full text inflation impact: None
Case: Harvesters (e.g. Quosa) automate article downloads from search results
  Search inflation impact: Same as federated search
  Full text inflation impact: Potentially very high
56. Challenge 3: Prediction – Coming soon!
Observations:
• Libraries prefer predictability over savings!
• Title level journal usage is remarkably
predictable year on year
• Usage driven purchasing is ripe for modelling
based on this predictability
57. Example: Demand driven ebook forecasting
Estimated List Size
-OR-
Estimated Annual Expenditure
= List Size ×
[(% visible list purchased × mean book price) +
(% visible list w STL × mean cost per STL × mean STL per title)]
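The forecasting relationship can be sketched as a pair of functions, one for each direction (expenditure from list size, or list size from a budget). This assumes List Size multiplies both the purchase term and the STL term, which is the only reading under which the units work out. Function and parameter names are hypothetical, and the input values in the usage lines are made up for illustration, not taken from any library's data.

```python
def cost_per_listed_title(pct_purchased, mean_book_price,
                          pct_with_stl, mean_cost_per_stl, mean_stl_per_title):
    """Expected annual spend per title on the visible list (purchases + STLs)."""
    purchase_part = pct_purchased * mean_book_price
    stl_part = pct_with_stl * mean_cost_per_stl * mean_stl_per_title
    return purchase_part + stl_part


def estimated_annual_expenditure(list_size, **rates):
    """Expenditure = list size x expected spend per listed title."""
    return list_size * cost_per_listed_title(**rates)


def estimated_list_size(annual_budget, **rates):
    """Invert the same relationship: how large a list a budget can support."""
    return annual_budget / cost_per_listed_title(**rates)


# Illustrative inputs only (hypothetical, not from the talk).
example_rates = dict(
    pct_purchased=0.02,       # 2% of visible titles trigger a purchase
    mean_book_price=100.0,    # dollars
    pct_with_stl=0.10,        # 10% of visible titles get at least one STL
    mean_cost_per_stl=10.0,   # dollars
    mean_stl_per_title=1.5,
)
print(estimated_annual_expenditure(10_000, **example_rates))
print(estimated_list_size(35_000, **example_rates))
```

Keeping the per-title cost as its own function makes it easy to see which inputs move with the purchase trigger and which move with the profile.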
58. Challenge 4 – Context = metadata!
• We do need good data about our data
• Data quality is more than just accuracy
• Retrospective studies require history!
• Circulation Statistics
• Dates of profile changes
• Cross library comparisons
• In an ideal world we’d share datasets with rich
metadata
• Library science is far from this ideal world
• An example of the power of good retrospective
data…
59. Total Books & Usage
Library  Model  User-Selected  Pre-Selected  Usage by Download  Usage Read Online
A        MIX    1131           552           6773               9888
B        MIX    5246           2612          42880              38329
C        USER   2198           102           0                  11801
D        USER   3010           48            697                15126
E        MIX    4159           909           17396              25604
F        PRE    0              1451          4905               3082
G        PRE    31             2154          7001               4459
H        USER   801            0             556                415
I        MIX    305            336           3334               2568
J        USER   2799           53            5                  13349
K        MIX    147            276           2436               2283
TOTAL           19,831         8,496         85,983             126,904
67. Data required
• Book purchase date
• Book purchase type
• Many years of use
• Different types of use
• Library purchasing profile
• Library list profile (what content was excluded)
• Individual user IDs (anonymized)
• Came from 4 files per library with a total of 69
data elements….
• We found one vendor that had invested in library-facing
reports with the level of data needed; there are
few others…
• Addressing the challenge: a consortial solution?
68. Part 3: Our present & future
1. Improving usage stats collection
a. (External) Consortial PaperStats
b. (Internal) Dublin Six AUDITOR
2. Improving usage stats visualization
a. Excel Conditional formatting
b. Splunk for Dashboard Creation…
3. Better database metrics
4. Improving on Journal number
comparisons
5. Usage Factor for Journal Evaluation
70. SCELC PaperStats
by the numbers
Total number of full text downloads tracked for
SCELC: 312,908,657
Total COUNTER reports downloaded: 2000+
Total number of logins: 387
Number of month records: 20.3M
Earliest year covered: 2003
Total number of reports being harvested: 15
Total number of institutions covered: 95
Total number of participants: 14
79. Beyond numbers of journals & total usage
• Knowledge base & Usage statistics comparisons
• Selected group of peers with same
knowledgebase & stats consolidation vendor
• Run comparisons in Access & Excel
80. Usage Factor Formula
Usage Factor =
Total usage over period ‘x’ of articles published during period ‘y’
÷
Total articles published during period ‘y’
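As a rough illustration of the formula, the sketch below computes a Usage Factor from per-article records. The record layout (publication year, usage recorded during period 'x') and the sample numbers are hypothetical; the talk does not specify a data format, and real figures would come from COUNTER-style article-level reports.

```python
def usage_factor(articles, publication_period):
    """Usage Factor = total usage (over period x) of articles published in
    period y, divided by the number of articles published in period y.

    articles: iterable of (publication_year, usage_in_period_x) tuples.
    publication_period: set of years making up period y.
    """
    in_period = [usage for year, usage in articles if year in publication_period]
    if not in_period:
        raise ValueError("no articles published during the chosen period")
    return sum(in_period) / len(in_period)


# Hypothetical sample: three articles, two of them published in 2011.
sample = [(2011, 10), (2011, 30), (2012, 5)]
print(usage_factor(sample, {2011}))
```

Only articles published in period 'y' enter either the numerator or the denominator, mirroring the Impact Factor construction mentioned in the notes.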
I could tell you about all the useful interesting things that either Jason or I have done or that we’ve worked on together. But here is the most important thing for you to know today about us!
Give the agenda for the talk
My thoughts on what usage statistics could have done and what they aren’t currently doing. Story about usage based pricing from AAAS, how it was never known what people were doing; bibliometric research was mostly based on WoK/ISI data, etc. How statistics can be wielded for misuse: drawing causation from correlation, or using raw count data that is not statistically significant. Story about a professor asking a speaker to put up his slides: “Which one?” “Any one of them, I have a critique on every one of them.” Investment of a lot of time and effort and it’s not paying off … yet.
In the example given earlier, a list of 23 people, comparing the birthday of the first person on the list to the others allows 22 chances for a matching birthday, the second person on the list to the others allows 21 chances for a matching birthday, the third person has 20 chances, and so on. Hence total chances are: 22+21+20+....+1 = 253, so comparing every person to all of the others allows 253 distinct chances (combinations): in a group of 23 people there are 23 × 22 / 2 = 253 pairs. Presuming all birthdays are equally probable, the probability of a given birthday for a person chosen from the entire population at random is 1/365 (ignoring Leap Day, February 29). Although the pairings in a group of 23 people are not statistically equivalent to 253 pairs chosen independently, the birthday paradox becomes less surprising if a group is thought of in terms of the number of possible pairs, rather than as the number of individuals.
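The pair count and the resulting probability are easy to check numerically. This is a minimal sketch under the same simplifying assumptions as the note (365 equally likely birthdays, no leap day); the function names are just illustrative.

```python
from math import comb


def distinct_pairs(n: int) -> int:
    """Number of ways to pick 2 people out of n."""
    return comb(n, 2)


def p_shared_birthday(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday,
    assuming all birthdays are equally likely."""
    if n > days:
        return 1.0  # pigeonhole: more people than days
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


print(distinct_pairs(23))                 # 253
print(round(p_shared_birthday(23), 3))    # 0.507
```

With 23 people the chance of a shared birthday already passes one half, so the quiz's "highly likely" threshold sits well below the 367 needed for certainty.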
The paradoxical conclusion is that treatment A is more effective when used on small stones, and also when used on large stones, yet treatment B is more effective when considering both sizes at the same time. In this example the "lurking" variable (or confounding variable) of the stone size was not previously known to be important until its effects were included.
Which treatment is considered better is determined by an inequality between two ratios (successes/total). The reversal of the inequality between the ratios, which creates Simpson's paradox, happens because two effects occur together:
1. The sizes of the groups, which are combined when the lurking variable is ignored, are very different. Doctors tend to give the severe cases (large stones) the better treatment (A), and the milder cases (small stones) the inferior treatment (B). Therefore, the totals are dominated by groups 3 and 2, and not by the two much smaller groups 1 and 4.
2. The lurking variable has a large effect on the ratios, i.e. the success rate is more strongly influenced by the severity of the case than by the choice of treatment. Therefore, the group of patients with large stones using treatment A (group 3) does worse than the group with small stones, even if the latter used the inferior treatment B (group 2).
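The reversal can be verified directly from the counts on the kidney stone slide. The sketch below only restates those figures; the data structure and function names are illustrative.

```python
# (successes, patients) per treatment and stone size, from the slide.
DATA = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes: int, total: int) -> float:
    return successes / total


def pooled(treatment: str) -> tuple:
    """Combine both stone sizes, ignoring the lurking variable."""
    successes = sum(s for s, _ in DATA[treatment].values())
    patients = sum(n for _, n in DATA[treatment].values())
    return successes, patients


for size in ("small", "large"):
    a, b = rate(*DATA["A"][size]), rate(*DATA["B"][size])
    print(f"{size} stones: A {a:.0%} vs B {b:.0%}")

a_all, b_all = rate(*pooled("A")), rate(*pooled("B"))
print(f"both sizes:  A {a_all:.0%} vs B {b_all:.0%}")
```

Treatment A wins within each stone size, yet B wins once the groups are pooled, because A's total is dominated by the harder large-stone cases and B's by the easier small-stone cases.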
So, who cares? Well, given this meeting and a variety of others over the years, obviously we’re still seeking that concept of ‘The Promise’ for usage statistics. And in fact, we’re making progress – at Charleston 2012, we saw the following sessions that were partially or primarily about usage statistics in some form or other.
See if I can find a slideshare of this article to show something more akin to statistics.
Add reference for this
Sustainable Collections Services, Maine Shared Collections Strategy Planning Meeting, http://www.slideshare.net/Maine_SharedCollections/mscs-scs-planning-meeting-rick-lugg-andy-breeding
Add reference
Find better example of this.
This is a graph of the number of articles covered by source as of last month. We only started tracking Twitter in June of this year and it’s expected that the graph will change as social media sites accrue more mentions of PLOS articles.
Even though there are good reasons not to expect CPU to be the same, cite Blecic talk (3rd Breakout Presentation) … Size of the discipline is always an issue in PPU: scale by relative size of user base.
Sort of subscribed titles by price per use, for use of package cancellation allowance. Acknowledge that these prices don’t tell the whole story, since they subsidize the “unsubscribed” titles. Comparison of price per use of subscribed titles from other packages would be apples to oranges.
Variables in green are influenced by the purchase trigger (# of loans before purchase). Variables in blue could be affected by subject profile or max price. % visible list with STL…
* In all subsequent slides, books from user-selected collections are in blue, and those from preselected collections are in green.
* The overall average number of uses per year is in general quite high, ≈ 6 per year.
* The average number of post-purchase uses per year is significantly greater for user-selected ebooks (2x as high).
* Even though the total number of books (n) in the user-selected set is greater, this has no effect on the result: these are PER BOOK averages, so each book in the user-selected collection is used an average of 8.6x per year, and each book in the preselected collection is used an average of 4.3x per year.
* This result rejects the hypothesis that user-selected ebooks will be used less than pre-selected ebooks.
* The pattern of greater use for user-selected books is consistent across all 5 libraries: 4 of 5 are significantly different based on non-overlapping 95% confidence intervals.
* The degree of difference varies from 1.75x to 4.5x.
* This figure shows the number of unique users per ebook per year for the overall user-selected and preselected collections.
* The average user-selected ebook was used by a significantly greater number of different users per year (about 2x as many).
* These data allow us to reject the hypothesis that users select books that are only of interest to themselves.
* Here we see that the pattern of wider use of user-selected ebooks is also consistent across the 5 libraries, with the same 4 libraries showing significantly wider use.
* The degree of this effect varies from 1.75x to 3.3x more unique users per book per year in user-selected collections.
Out of the research, an idea of what metrics could contribute to a Usage Factor measure began to emerge. Similar to Impact Factor, it was Total Usage over a Specified Time Period of the Articles Published during a Time Period, divided by the Total articles published during the Time Period
Adding in journals attending PPM that are not ISI ranked (Green bars = no IF rank)