Collection Intelligence: Using data-driven decision-making in collection management
1. Collection Intelligence: Using data-driven decision-making in collection management
Annette Day
Hilary Davis
North Carolina State University Libraries
Charleston Conference
November 6, 2010
2. Today’s Presentation
Using data to inform and articulate collections decisions
NCSU Libraries’ projects
Journal cancellation project
Collections Views tool
Return on investment for
journal backfiles
3. Maintaining a balance
Articulate and explain our decisions
Show our collection intelligence
Flickr: RayBanBro66
4. Data can help
Data-informed collection management
Types of Data
Cost
Use
Formats
Owned or Leased
Citation and publication patterns
Impact Factors
Regional holdings
Editorial activity
Ways to use Data
Show value/ROI
Use is high and increasing
Test assumptions about the collections
Fit and alignment with campus
Flickr: quinn.anya
5. NCSU Context
~31,000 students
~8,000 faculty
$10 million collection budget
4 million volumes
1 main library
4 branch libraries
Campus Strength Areas
Engineering, Architecture, Agriculture, Science, Technology,
Veterinary Medicine
Flickr: rshannonsmith, ncsunewsdept, Angela De Marco
7. Collections Review Project 2009/2010
15% cut in collections budget = $1.5 million
Significant journal cuts
1,112 journals proposed for cancellation
Cost for each title
Package bundle or piecemeal?
Usage statistics
Impact factor (where available)
Publication and citation data
Alternative access points
8. Gathering Campus Feedback
Low barrier to entry to encourage feedback
Created informational website
Authenticated Webform
Captured departmental affiliation and rank
Sortable
Saveable
Downloadable
Admin features
12. Feedback Received
1,365 users, of whom 700 submitted feedback
12,710 title rankings
Lots of data; how to make sense of it all?!
Weighted approach
Minimize impact of ranking journals outside discipline/research
Cost per use
Additional data metrics
13. Processing the Feedback
Weighted Ranking – college affiliation and journal subject
Favored rankings most closely aligned with a user’s
research/teaching (weight of 1.0)
Minimized tangential/unrelated rankings (weight of 0.1)
Priority to “Must keep” rank (10 points)
Multiplied ranking points by the association weight and the
total number of rankings, then summed
The higher the number, the more campus wants to keep the title
14. Example: Astronomy Letters
| Ranking | Patron's Department | Patron's College | % Match | Weighted Ranking | Weighted Ranking x % Match | (Weighted Ranking x % Match) x Total # Rankings | Sum ((Weighted Ranking x % Match) x Total # Rankings) |
|---|---|---|---|---|---|---|---|
| Can Cancel | Entomology | Agriculture and Life Sciences | 0.5 | 1 | 0.5 | 7.5 | 621 |
| Can Cancel | Agricultural & Life Sciences | Agriculture and Life Sciences | 0.5 | 1 | 0.5 | 7.5 | 621 |
| Can Cancel | Agricultural & Life Sciences | Agriculture and Life Sciences | 0.5 | 1 | 0.5 | 7.5 | 621 |
| Can Cancel | Poultry Science | Agriculture and Life Sciences | 0.5 | 1 | 0.5 | 7.5 | 621 |
| Can Cancel | Engineering | Engineering | 0.8 | 1 | 0.8 | 12 | 621 |
| Can Cancel | Humanities & Social Sciences | Humanities & Social Sciences | 0.1 | 1 | 0.1 | 1.5 | 621 |
| Can Cancel | Physical & Mathematical Sciences | Physical & Mathematical Sciences | 1 | 1 | 1 | 15 | 621 |
| Can Cancel | Chemistry | Physical and Mathematical Sciences | 1 | 1 | 1 | 15 | 621 |
| Can Cancel | Chemistry | Physical and Mathematical Sciences | 1 | 1 | 1 | 15 | 621 |
| Can Cancel | Mathematics | Physical and Mathematical Sciences | 1 | 1 | 1 | 15 | 621 |
| Keep if possible | Engineering | Engineering | 0.8 | 5 | 4 | 60 | 621 |
| Keep if possible | Veterinary Medicine | Veterinary Medicine | 0.5 | 5 | 2.5 | 37.5 | 621 |
| Must keep | Electrical Engineering | Engineering | 0.8 | 10 | 8 | 120 | 621 |
| Must keep | Physics | Physical & Mathematical Sciences | 1 | 10 | 10 | 150 | 621 |
| Must keep | Physics | Physical & Mathematical Sciences | 1 | 10 | 10 | 150 | 621 |
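The Astronomy Letters example can be reproduced with a few lines of Python. This is a minimal sketch, not the code used in the project: the point values (1, 5, 10) and association weights are read off the example, and the names `POINTS`, `rankings` and `weighted_ranking` are illustrative.

```python
# Points per ranking, as they appear in the Astronomy Letters example.
POINTS = {"Can Cancel": 1, "Keep if possible": 5, "Must keep": 10}

# (ranking, association weight) for each of the 15 rankings in the example.
rankings = (
    [("Can Cancel", 0.5)] * 4        # Agriculture and Life Sciences patrons
    + [("Can Cancel", 0.8)]          # Engineering
    + [("Can Cancel", 0.1)]          # Humanities & Social Sciences
    + [("Can Cancel", 1.0)] * 4      # Physical & Mathematical Sciences patrons
    + [("Keep if possible", 0.8)]    # Engineering
    + [("Keep if possible", 0.5)]    # Veterinary Medicine
    + [("Must keep", 0.8)]           # Electrical Engineering
    + [("Must keep", 1.0)] * 2       # Physics
)


def weighted_ranking(rankings):
    """Sum of (points x association weight x total number of rankings)."""
    total = len(rankings)
    return sum(POINTS[rank] * weight * total for rank, weight in rankings)


score = weighted_ranking(rankings)
print(round(score, 1))  # 621.0, matching the Sum column in the example
```

Multiplying every term by the total number of rankings means a title ranked by many patrons scores higher than one ranked by few, even when the individual rankings are similar.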
15. Processing the Feedback – Other metrics
Cost per use
Other data points
Use data
Impact factor
Publication and citation data
Resulting Formula
Sum of the following:
Average of 2 most recent years of use data
Number of cites
(2 x Number of publications) x (impact factor +1)
Gave more weight to data points we valued highly and that reflected a journal’s relevance
16.
| Journal Title | Price | 2007 Use | 2008 Use | Impact Factor | LJUR Pubs | LJUR Citations | Data Metric | Cost per Use | Weighted Ranking |
|---|---|---|---|---|---|---|---|---|---|
| Environmental Progress | $486.00 | 64 | 67 | 1 | 0 | 11 | 24.62 | $7 | 165.2 |
| Robotics and autonomous systems | $1,841.00 | 107 | 200 | 0.633 | 3 | 12 | 34.41 | $12 | 536 |
| Computational intelligence | $858.00 | 23 | 76 | 1.972 | 2 | 4 | 26.72 | $17 | 536 |
| Sensor Review | $2,972.00 | 156 | 84 | | | | 2.40 | $25 | 109.9 |
| Journal of environmental science and health - part A | $3,886.00 | 99 | 164 | 0.967 | 1 | 36 | 79.92 | $30 | 625.3 |
| Information Processing Letters | $2,238.00 | 42 | 83 | 0.66 | 2 | 10 | 25.32 | $36 | 378.9 |
| Materials Science and Technology | $2,180.00 | 57 | 55 | 0.713 | 0 | 0 | 1.92 | $39 | 1086.4 |
| Separation science and technology | $8,678.00 | 56 | 172 | 1.048 | 0 | 28 | 62.01 | $76 | 284.9 |
| Circuits, Systems, and Signal Processing | $1,407.00 | 12 | 18 | 0.456 | 0 | 2 | 3.35 | $94 | 369.9 |
| Distributed and Parallel Databases | $927.00 | 6 | 11 | 0.771 | 0 | 1 | 2.07 | $109 | 71.4 |
| Applied Artificial Intelligence | $1,485.00 | 15 | 12 | 0.753 | 1 | 8 | 18.00 | $110 | 347.4 |
| Plastics, rubber and composites | $1,489.00 | 11 | 10 | 0.431 | | | 0.30 | $142 | 80.4 |
| Acta Informatica | $1,219.00 | 4 | 7 | 0.8 | 1 | 7 | 16.40 | $222 | 1413.3 |
| Cybernetics and Systems Analysis | $3,368.00 | 8 | 16 | | | | 0.24 | $281 | 50.5 |
| International Journal of Satellite Communications and Networking | $412.00 | 0 | 2 | 0.284 | | | 0.03 | $412 | 254.8 |
| Chemical Engineering Research and Design | $1,692.00 | 0 | 2 | 0.837 | 2 | 22 | 47.80 | $1,692 | 151.2 |
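The Data Metric and Cost per Use columns can be recomputed from the other columns. The Python sketch below is one reading that matches the table values, not a formula stated on the slides. Two points are inferred from the numbers themselves: average use appears to be divided by 50 before it enters the sum, and (impact factor + 1) appears to multiply the whole sum. Missing impact factor, publication or citation values are treated as 0. The function names are illustrative.

```python
def data_metric(use_2007, use_2008, impact_factor=0.0, pubs=0, cites=0):
    """(average use / 50 + cites + 2 x pubs) x (impact factor + 1)."""
    avg_use = (use_2007 + use_2008) / 2
    # ASSUMPTION: the /50 scaling is inferred from the table, not from the slides.
    return (avg_use / 50 + cites + 2 * pubs) * (impact_factor + 1)


def cost_per_use(price, use_2007, use_2008):
    """Price divided by the average of the two years of use (inferred from the table)."""
    avg_use = (use_2007 + use_2008) / 2
    return price / avg_use if avg_use else None  # undefined when there is no recorded use


# Robotics and autonomous systems
robotics_metric = data_metric(107, 200, impact_factor=0.633, pubs=3, cites=12)
robotics_cpu = cost_per_use(1841.00, 107, 200)

# Sensor Review: no impact factor or LJUR data listed
sensor_metric = data_metric(156, 84)

print(round(robotics_metric, 2), round(robotics_cpu), round(sensor_metric, 2))
```

Under this reading, citations and publications by local authors dominate the metric, while raw use contributes comparatively little, which fits the stated aim of weighting the data points that reflect a journal's relevance.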
17. Issues/Challenges
What difficulties did we encounter?
List of what we subscribe to and costs
Not all data available for every title
Usage statistics
Impact factor and publication/citation data
Processing the data
“Tune out” irrelevant rankings
Imprecise weighting
Data is instructive but not the final decision point
Technical skills needed to create webform
19. Collection Views Database Project
We needed to answer the
following questions:
How do the NCSU Libraries’
expenditures on resources support
the research and teaching needs of
diverse colleges and departments
at NCSU?
What data exist that might help us
understand how our resource
expenditures look in terms of the
departments we serve?
Flickr: ncsunewsdept, egnowit
20. Data Types
Library data
Expenditure data
Monographs (Quantity & Cost)
Firm Order
Approval Plan
Serials (Cost)
Databases (Cost)
Subject Fund Codes
Examples:
• ENTO – Entomology
• GTEC – General Technology
• NATM – Atmospheric Sciences NRL*
• TDES – Textiles Design
Flickr: hemingway gyro
21. Data Types
Academic Department Data
NCSU Office of University Planning and Analysis
Faculty Headcount
Enrolled Student Headcount
Graduate Students
Undergraduate Students
NCSU's Sponsored Programs & Regulatory
Compliance Services
PhD Degrees Awarded
Research Grant Income
Flickr: ncsunewsdept
22. Connecting the Data
Map subject fund codes to departments
Connect library expenditures and department demographics (e.g., $x
supports the Physics Dept)
Present expenditure data and department data side-by-side
No “right way” to map codes to departments
A code could be applied to more than one department
Expenditures associated with a code apply to each mapped department in full
(no weighting/no splitting)
Broad and narrow mappings
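The mapping rules can be sketched in Python as follows. The fund codes come from the slides, but every dollar amount, every department name and the code-to-department assignments are hypothetical, chosen only to illustrate the mechanics.

```python
from collections import defaultdict

# HYPOTHETICAL expenditures per subject fund code (amounts are made up).
expenditures = {"ENTO": 40_000, "GTEC": 25_000, "NATM": 30_000, "TDES": 15_000}

# HYPOTHETICAL narrow mapping: only the specific fund codes.
narrow_map = {
    "ENTO": ["Entomology"],
    "NATM": ["Atmospheric Sciences"],
    "TDES": ["Textile Design"],
}

# HYPOTHETICAL broad mapping: also includes a general code shared by several departments.
broad_map = {
    **narrow_map,
    "GTEC": ["Entomology", "Atmospheric Sciences", "Textile Design"],
}


def spend_by_department(expenditures, mapping):
    """Apply each code's expenditure in full to every department it maps to."""
    totals = defaultdict(int)
    for code, departments in mapping.items():
        for dept in departments:
            totals[dept] += expenditures[code]  # no weighting, no splitting
    return dict(totals)


narrow = spend_by_department(expenditures, narrow_map)
broad = spend_by_department(expenditures, broad_map)
print(narrow)
print(broad)
```

Because a shared code is counted in full for each department, department totals under a broad mapping can add up to more than the actual amount spent. The figures show what supports each department; they are not a partition of the budget.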
24. Collection Views Database
An SQL database was created to store the data and the
mappings
Only have to add new data – not rebuild relationships and
other data
Flexible output options
Web
Custom queries
Canned queries
Data Portal
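The slides do not describe the schema or the database engine, so the following is only a sketch of how such a database might be laid out, using Python's built-in `sqlite3` module so that it is self-contained. All table names, column names and values are hypothetical. Storing the mappings in their own table is what allows a new year of data to be added without rebuilding relationships.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE expenditure (fund_code TEXT, fiscal_year INTEGER, amount REAL);
CREATE TABLE fund_department (fund_code TEXT, department TEXT);
CREATE TABLE department_stats (department TEXT, fiscal_year INTEGER, faculty INTEGER);
""")

# The mapping is stored once (HYPOTHETICAL assignment).
conn.execute("INSERT INTO fund_department VALUES (?, ?)", ("ENTO", "Entomology"))

# Each new year only adds rows (HYPOTHETICAL amounts and headcounts).
conn.executemany(
    "INSERT INTO expenditure VALUES (?, ?, ?)",
    [("ENTO", 2009, 40000.0), ("ENTO", 2010, 38000.0)],
)
conn.executemany(
    "INSERT INTO department_stats VALUES (?, ?, ?)",
    [("Entomology", 2009, 30), ("Entomology", 2010, 31)],
)

# A "canned query": expenditures and department data side by side.
rows = conn.execute("""
SELECT m.department, e.fiscal_year, SUM(e.amount) AS spend, s.faculty
FROM expenditure AS e
JOIN fund_department AS m ON m.fund_code = e.fund_code
JOIN department_stats AS s
  ON s.department = m.department AND s.fiscal_year = e.fiscal_year
GROUP BY m.department, e.fiscal_year, s.faculty
ORDER BY e.fiscal_year
""").fetchall()

for row in rows:
    print(row)
```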
31. Uses of Collection Views
Distribution of collections budget/expenditures across
subject areas
Is it what we expected?
Is it in line with our knowledge of how specific
departments/disciplines use library resources?
Cumulative impacts of collecting decisions over time
Facilitates discussion on budget allocation
Graphs and charts provide illustrations of impact
32. Issues/Challenges
All depends on the mapping
Considering adding weighted mappings
Timely gathering of data
Campus data not readily available
SQL database programming skills
Digital Library Initiatives
34. Journal Backfiles ROI Project
Investment in online journal backfiles over many years
Demonstrate value and impact of these purchases
Usage statistics
Fiscal effectiveness
Non-traditional ROI approach
Cumulative cost of backfiles compared to cumulative use
Lower cost/use over time
Flickr: cambodia4kidsorg
36. How we calculated the metrics
Data Sources
Full text article downloads
Cost data
Every backfile purchased since 2003
Initial purchase cost and annual fees
Calculations
Initial cost and annual fees carried over through years
Cost divided by cumulative usage
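The calculation can be sketched in Python with made-up numbers. The purchase price, annual fee and download counts below are hypothetical, and the sketch assumes the annual fee is first charged in the year after purchase.

```python
from itertools import accumulate

# HYPOTHETICAL backfile package: figures are illustrative only.
initial_cost = 50_000
annual_fee = 1_000
downloads_per_year = [2_000, 3_500, 4_000, 4_500]  # full-text article downloads

# ASSUMPTION: purchase price in year 1, annual fee in each later year.
cost_per_year = [initial_cost] + [annual_fee] * (len(downloads_per_year) - 1)

cumulative_cost = list(accumulate(cost_per_year))
cumulative_use = list(accumulate(downloads_per_year))

# Cumulative cost divided by cumulative usage, year by year.
cost_per_use = [cost / use for cost, use in zip(cumulative_cost, cumulative_use)]

for year, value in enumerate(cost_per_use, start=1):
    print(f"Year {year}: ${value:.2f} per use")
```

With a large one-time purchase and small annual fees, cumulative cost grows slowly while cumulative use keeps rising, so the cost per use falls each year.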
39. Issues/Challenges
Non-traditional ROI metric
May need clarification
Use data not always available from year of purchase
Backfile use data is not always separate from current
journals
40. Final Thoughts
Data is a powerful tool, but not the end-all, be-all!
Moving Forward…
Continued use of data
Build data skills competencies
Tools
Data manipulation and interpretation
Data dashboard
Expanded/Improved Tools
Visualization
For the NCSU demographic data, we worked with the NCSU Office of University Planning and Analysis to get data on number of faculty, students and staff in each department on campus. We also collected data on grant dollars acquired by each department from another database maintained by NCSU’s Sponsored Programs office.
To connect library expenditures and data about departments
Map subject fund codes to departments
Make connections between library expenditures and department demographics
View expenditure data and department data next to each other
Mapping was totally subjective – no right way
A Subject Identifier could be applied to more than one department.
The expenditure amount associated with the Subject Identifier applies to departments in full (no weighting).
Broad and narrow mappings control the scope of how codes are mapped to departments: include the general fund codes for a broader mapping, or limit the mapping to the more specific fund codes for a narrower one.
Example to make all this clear!
Investment in online journal backfiles over many years
Approximately 90 backfile packages
How to demonstrate value and impact of these purchases
Usage
Fiscal effectiveness
i.e., were these good investments for campus?
Non-traditional ROI approach
Cumulative cost of archives compared to cumulative use
Lower cost per use over the years
Investment in backfiles pays for itself over time