The document discusses extracting transactional data from library systems like the integrated library system (ILS), interlibrary loan (ILL) software, and other vendor services. It describes setting up an application server to store this extracted data in a database for reporting and analysis. The goal is to mine this data to determine which patron groups, like academic majors and departments, are accessing different library services and resources.
Crushing, Blending, and Stretching Transactional Data
1. Crushing, Blending, and
Stretching Transactional Data
Data Warehousing and Mining Data from
Voyager and Other Library and University
Systems for Assessment of Library
Operations
VALE Users’/NJLA CUS/NJ ACRL Conference
Busch Campus Center, Rutgers University,
Piscataway, New Jersey,
Friday, January 8, 2010
Ray Schwartz,
Systems Specialist Librarian
Cheng Library, William Paterson University,
Wayne, New Jersey, USA
schwartzr2 @ wpunj.edu
2. Outline
• Assessment and Why Now?
• What is Data Mining and Data
Warehousing and Why Do We Do It?
• Our Library and University
• Groups and Services
• Steps
• Reporting
3. Recent Extent of Assessment
• ILSs collect transactional data for circulation
and allocation of collection funds.
• ILL and Document Delivery services supply
general transactional data.
• Reports from other vendor services
– Bibliographic utilities
– Subscription agents
– Book jobbers
• Many other ways of collecting transactional
data.
– Gate counts
– Reference transaction counts
– Reshelving counts
7. What is different now?
• Most ILSs have search and web server logs
• Most (if not all) Full-Text Databases have usage
reports
• Link Resolver logs
• Proxy Server logs
• Google Scholar Library Links
8. What would we like to see?
• Breakdowns by department and majors.
• Combined usage by department/majors
of more than one library service.
9. What is Data Mining and Data
Warehousing
• Extracting data from legacy systems and other
resources;
• cleaning, scrubbing and preparing data for decision
support;
• maintaining data in appropriate data stores;
• accessing and analysing data using a variety of end
user tools;
• and mining data for significant relationships.
• Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing:
Strategy, Implementation and Practice (2nd ed.). Financial Times/ Prentice Hall.
10. • The primary purpose of these efforts is
to provide easy access to specifically
prepared data that can be used with
decision support applications such as
management reports, queries, decision
support systems, executive information
systems and data mining.
• Chaffey, D., Mayer, R., Johnston, K., & Ellis-Chadwick, F. (2002). Internet Marketing:
Strategy, Implementation and Practice (2nd ed.). Financial Times/ Prentice Hall.
11. Of course there are many
ways to measure
– Scott Nicholson’s Measurement Model
12. Measurement Matrix with methodologies
Perspective: Internal (Library System)
• Topic: Library System. Procedures and Standards
– Staff survey and interviews
– Audits of collections, systems, or staff
• Topic: Use. Recorded interactions with interface & materials
– Bibliomining
– Transaction/Web Log Analysis
– Observation of User Behavior
Perspective: External (User)
• Topic: Library System. Usability
– Effectiveness of the system for the staff and institution.
• Topic: Use. Knowledge states and User citations to materials
– How useful is the library system?
– Focus groups, User Citation tracking
Nicholson, Scott (2004). A Conceptual framework for the holistic measurement and cumulative evaluation
of library services. Journal of Documentation 60(2) p.164-181
14. Our Library
• 19 librarians and 26 library staff
• 350,000 volumes
• 18,000 audiovisual items
• 22,000 print and electronic periodicals
• 100 general and subject specific databases
15. Our Systems since 2005
• Voyager ILS
• Online Periodical Database (OPD)
• Clio ILL Software
• EZProxy Server
• Banner – University ERP
• University Networked Drive K:
• University Email Server
• University Web Server
16. Systems Chart – ca. 2005
[Diagram. Components shown:
• Integrated Library System: Voyager (scripting language, web server, DBMS; circulation, media scheduling, patrons, materials)
• Online Periodicals Database (scripting language, web server, DBMS)
• www.wpunj.edu (serials form, ILL form, ER micro page)
• Proxy Server (off campus database hits, searches & ILL form; EZProxy log)
• Banner, the university ERP system (SIS, HRS; patrons)
• University Networked Drive K: (ILL, Cliodata; patrons, materials)
• University Email Server
• Serials Solutions (A to Z)
• OCLC – Bibliographic Utility (WorldCat, ILL)
• Other vendors' database services & usage reports
Legend: current relationships; internal only WPUNJ server; externally accessible WPUNJ server; non-WPUNJ server.]
20. • Voyager Patron Database allows a maximum
of 10 statistical categories per patron record.
• Decide which statistical categories are needed
for each patron group defined.
• Work with your University Information Systems
Department to extract the relevant data from
the relevant sources.
21. Groups and Services
Groups
• Major
• Status
– Undergrad or Grad
– Faculty, Adjunct Faculty or Staff
• Department
• College
• Degree
• No. of Credits
• Year of Study
• Campus Location
Services
• Circulation
– Books
– Media
– Reserve
– By Fund Code
– Location
• ILL / Document Delivery
• Databases
• Library Web Pages
– Subject Area Resource Guides
– Reference Requests
• Catalog
• Other Vendor Services
– Serials Solutions
22. History Department - 12 months - Feb. 2008
PATRON STATUS | BOOK CIRC | MEDIA CIRC | EQUIP CIRC | TOTAL CIRC | MEMBERS | BORROWERS | % BORROWING | CIRC/MEMBER | CIRC/BORROWER
UNDERGRADUATE STUDENTS | 2,715 | 250 | 698 | 3,663 | 238 | 186 | 78% | 15.39 | 19.69
GRADUATE STUDENTS | 419 | 13 | 76 | 508 | 14 | 13 | 93% | 36.29 | 39.08
ADJUNCT FACULTY | 100 | 65 | 20 | 185 | 32 | 20 | 63% | 5.78 | 9.25
FULL-TIME FACULTY | 159 | 115 | 194 | 468 | 24 | 23 | 96% | 19.50 | 20.35
HISTORY TOTALS | 3,393 | 443 | 988 | 4,824 | 308 | 242 | 79% | 15.66 | 19.93
LIBRARY TOTALS | 23,370 | 8,713 | 20,703 | 52,756 | 7,418 | 4,981 | 67% | 7.11 | 10.59
DEFINITIONS:
BOOK CIRCULATION = books, book disks, maps, oversize, Curriculum materials, reserve books, NJ History, Leisure Lounge
MEDIA CIRCULATION = audio & video materials, including media reserves
EQUIPMENT CIRCULATION = camcorders, overhead & data projectors, laptops, easels, DVD players, etc.
MEMBER = declared major or department member
BORROWER = any member who borrowed materials
Library Total = declared undergrad & grad majors, adjuncts & full time faculty borrowers
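The three derived columns appear to follow directly from the counts in each row. The sketch below is a reading of the table rather than the author's own formula: it assumes % BORROWING is borrowers divided by members, and that the two CIRC columns divide TOTAL CIRC by members and by borrowers. The function name is hypothetical.

```python
def borrowing_metrics(total_circ, members, borrowers):
    """Return (% borrowing, circ per member, circ per borrower) for one row."""
    pct_borrowing = round(100 * borrowers / members)       # whole percent
    circ_per_member = round(total_circ / members, 2)       # two decimals, as in the table
    circ_per_borrower = round(total_circ / borrowers, 2)
    return pct_borrowing, circ_per_member, circ_per_borrower


# Undergraduate students row: 3,663 total circ, 238 members, 186 borrowers
print(borrowing_metrics(3663, 238, 186))  # (78, 15.39, 19.69)
```

Under these assumptions the undergraduate and graduate rows reproduce the published figures, which suggests the reading is at least consistent with the table.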
23. Problems with Configuration of
Services
• Little to no linkage of data
• Need to search multiple services to get
complete picture of serial holdings
• Multiple user IDs for authentication
24. Retirement of the OPD
• Serials holdings data was extracted
from the OPD and added to Voyager
catalog
• From Voyager catalog, serials
holdings data is extracted and added
to Serials Solutions A to Z list
31. Our Systems in 2008
• Voyager ILS
• Shared Application Server
• Clio ILL Software
• EZProxy Server
• Banner – University ERP
• University Networked Drive K:
• University Email Server
• University Web Server
32. Systems Chart - 2008
[Diagram. Components shown:
• Integrated Library System: Voyager (scripting language, web server, DBMS; circulation, media scheduling, patrons, materials)
• Application Server (scripting language, web server, DBMS; off campus database usage by patron groups; ILL patrons/materials requested; ILL patrons/materials received)
• www.wpunj.edu (serials form, ILL form, ER micro page)
• Proxy Server (off campus database hits, searches & ILL form; EZProxy log)
• Banner, the university ERP system (SIS, HRS; patrons)
• University Networked Drive K: (ILL, Cliodata; patrons, materials)
• University Email Server
• Serials Solutions (A to Z, MARC records, link resolver)
• OCLC – Bibliographic Utility (WorldCat, WCA, ILL)
• Other vendors' database services & usage reports
Legend: current relationships; internal only WPUNJ server; externally accessible WPUNJ server; non-WPUNJ server.]
33. What is an Application Server?
• A machine or its software that works in
conjunction with a web server to deliver
application services such as the dynamic
creation of a webpage from content stored in a
database. From http://www.webtools.ca.gov/help/Glossary.asp
• Web Server Software (Apache or IIS)
• Database Management System – DBMS (MySQL,
Oracle, MS SQL Server)
• Scripting Language (Perl, PHP, ColdFusion, ASP)
34. Why an Application Server?
• Relevant data in logfiles needs to be in
a database to be analyzed.
• Need your own DBMS to create new
tables and queries.
35. • Decide how you will use the Application
Server.
• Decide on the best and most plausible
configuration.
36. Daily and Weekly Email
Reports from the Application
Server
Circ Fines Audit Daily Report - Daily at 6:05 AM.
Dupe Patron Record Report - Daily at 5:56 AM.
Hobart Media Services Equipment Pickup Summary - Daily at 6:58 AM.
Media Service Scheduling Rooms Report - Daily at 6:02 AM.
Media Services Equipment Pickup Summary - Daily at 7:00 AM.
Received Title Alert - Daily at 6:59 AM.
Reserves Overdues - Daily at 5:59 AM.
Scheduled LIS Tasks - Daily at 6:00 AM.
ILL Borrowing Overdues Report - Weekly at 5:59 AM.
ILL Lending Reports - Weekly at 6:15 AM.
37. Monthly Email Reports from
the Application Server
Circ Fines Audit - Monthly at 6:10 AM.
Circulation by Location and Item Type - Monthly at 6:21 AM.
Circulation Lost and Paid - Monthly at 6:25 AM.
Circulation Online Renewal Count - Monthly at 6:30 AM.
Media Circulation - Monthly at 6:35 AM.
Reserve Circulation - Monthly at 6:40 AM.
40. Lending Services Reports
Lists of patrons with fines between $10 and $19.99
• Student and Alumni fines list - Sorted by either Name, Amount or Notice Date.
• PALS and Courtesy Patron fines list - Sorted by Name.
• All other Patron fines list - Sorted by Name.
Lists of patrons with fines over $19.99
• Student and Alumni fines list - Sorted by either Name, IID, Amount, Notice Date or
Notes.
• PALS and Courtesy Patron fines list - Sorted by Name.
• VALE Patron fines list - Sorted by Name.
• All other Patron fines list - Sorted by Name.
Lists of patrons with overdues older than 30 days
• Student and Alumni overdues list - Sorted by either Name, IID or Notes.
• PALS and Courtesy Patron overdues list - Sorted by Name.
• All other Patron overdues list except VALE - Sorted by Name.
41. Lending Services Reports, cont.
Lists of VALE patrons with overdues older than 6 months
• VALE patron overdues list - Sorted by Name.
Miscellaneous Reports
• Patrons with the word "Collection Agency" or "CA" in their notes.
• Patrons with the word "FINE" in one of their notes.
• Patrons with the word "SOILS" in their notes.
• Patrons with the word "FALL07 SOILS" in their notes.
• Patrons with the word "HOLD" in their notes.
• Combined list of HOLD, FINE, and CA.
Circulation Reports by Item Type from 2003 to the present
• All Staff.
• All Colleges
• Undergraduates by Major.
• Graduates by Major
• Patrons that have reached a total fine balance of $10 or more after 31-Dec-2009
and 30-Nov-2009
42. One of Our Projects
• Mining EZProxy logfiles and linking to patron
statistical categories from the Voyager Patron
Database
– What majors and departments are accessing
which database services?
– What majors and departments are accessing
the ILL services?
43. Systems Chart - 2008
[Diagram. Components shown:
• Integrated Library System: Voyager (scripting language, web server, DBMS; circulation, media scheduling, patrons, materials)
• Application Server (scripting language, web server, DBMS; off campus database usage by patron groups; ILL patrons/materials requested; ILL patrons/materials received)
• www.wpunj.edu (serials form, ILL form, ER micro page)
• Proxy Server (off campus database hits, searches & ILL form; EZProxy log)
• Banner, the university ERP system (SIS, HRS; patrons)
• University Networked Drive K: (ILL, Cliodata; patrons, materials)
• University Email Server
• Serials Solutions (A to Z, MARC records, link resolver)
• OCLC (WorldCat, WCA, ILL)
• Other vendors' database services & usage reports
Callouts: ILL Collection and Patron Group Analyses; Off Campus Database Hits by Patron Group.
Legend: current relationships; internal only WPUNJ server; externally accessible WPUNJ server; non-WPUNJ server.]
44. ILL request form authentications by major
Article requests | Book requests
Count | Major | Count | Major
62 | M- Psychology | 90 | M- History
60 | M- Sociology | 28 | M- Non-Degree
42 | M- Applied Clinical Psych | 25 | M- Pub Pol & Intl Affairs
35 | M- Education | 20 | M- Spanish
31 | M- History | 18 | M- English
30 | M- Spanish | 16 | M- Undecided
29 | M- Nursing | 14 | M- Art
19 | M- Communication Disorders | 14 | M- Education
19 | M- Communication | 11 | M- Sociology
14 | M- Biotechnology | 10 | M- Biology
14 | M- Counseling | 9 | M- Music
14 | M- English | 9 | M- Special Programs
12 | M- Non-Degree | 8 | M- Psychology
10 | M- Community/Sch Health | 7 | M- Biotechnology
7 | M- Biology | 7 | M- Political Science
7 | M- Political Science | 6 | M- Anthropology
6 | M- Undecided | 6 | M- Music - Jazz Studies
5 | M- Comm Media Studies | 4 | M- Business
5 | M- Reading | 4 | M- Communication
4 | M- Business | 4 | M- Nursing
53. • Active Circ transactions are stored in a
table with patron ID and statistical
categories.
• Completed Circ transactions are stored
in a table without the patron ID, but still
with the patron statistical categories.
• The Patron Table contains the total
counts of transactions for each patron,
but no link to which transactions they are.
54. • EZProxy transactions would be stored in
one table with patron statistical
categories, but without the user ID.
• User IDs would be stored in another
table with counts for each service divided
by academic year.
• Logs are collected monthly and loaded
and deleted monthly.
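The two-table arrangement described on this slide can be sketched as follows. This is an illustration only, not the schema used at Cheng Library: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the table names, column names, and the July 1 academic-year rollover are all assumptions.

```python
import sqlite3


def academic_year(iso_date):
    """Label a YYYY-MM-DD date with an academic year (assumed July 1 rollover)."""
    year, month = int(iso_date[:4]), int(iso_date[5:7])
    start = year if month >= 7 else year - 1
    return f"{start}-{start + 1}"


def make_store():
    """Create the two hypothetical tables in an in-memory database."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE proxy_hits (      -- statistical categories, no user ID
            hit_date TEXT, service TEXT, major TEXT, status TEXT);
        CREATE TABLE user_counts (     -- user ID, but only running totals
            user_id TEXT, service TEXT, acad_year TEXT, hits INTEGER,
            PRIMARY KEY (user_id, service, acad_year));
    """)
    return db


def record_hit(db, user_id, hit_date, service, categories):
    """Store one transaction anonymously and bump the user's yearly count."""
    db.execute("INSERT INTO proxy_hits VALUES (?, ?, ?, ?)",
               (hit_date, service, categories["major"], categories["status"]))
    year = academic_year(hit_date)
    cur = db.execute(
        "UPDATE user_counts SET hits = hits + 1 "
        "WHERE user_id = ? AND service = ? AND acad_year = ?",
        (user_id, service, year))
    if cur.rowcount == 0:  # first hit for this user, service, and year
        db.execute("INSERT INTO user_counts VALUES (?, ?, ?, 1)",
                   (user_id, service, year))


db = make_store()
cats = {"major": "History", "status": "Undergraduate"}
record_hit(db, "theuser", "2008-01-01", "databases", cats)
record_hit(db, "theuser", "2008-01-02", "databases", cats)
print(db.execute("SELECT * FROM user_counts").fetchall())
# [('theuser', 'databases', '2007-2008', 2)]
```

Splitting the data this way means no stored row ties a user ID to a specific transaction, which mirrors how the previous slide describes completed Voyager circulation transactions.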
55. Example of EZProxy log entry
• IP address: nj.dhcp.embarqhsd.net
• (Not used): -
• User ID: theuser
• Date/time: 1/1/2008 4:25:15 AM
• Method: GET
• Page retrieved: http://ezproxy.wpunj.edu:2048/connect?session=sGHMbeSss121YxZa&url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr
• Version: HTTP/1.1
• Response code: 302
• No. of bytes: 537
• Referring URL: http://ezproxy.wpunj.edu:2048/login?url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr
• User agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)
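To ask which database services a patron group is reaching, the destination has to be pulled out of the page field. A minimal sketch, assuming the destination is carried in a url= query parameter as in the entry above; the function name is hypothetical, and proxied URLs that do not use this form are not handled.

```python
from urllib.parse import urlsplit, parse_qs


def target_host(proxy_url):
    """Return the host named in the url= parameter, or None if there is none."""
    query = urlsplit(proxy_url).query
    targets = parse_qs(query).get("url")
    if not targets:
        return None
    return urlsplit(targets[0]).hostname


page = ("http://ezproxy.wpunj.edu:2048/connect?session=sGHMbeSss121YxZa"
        "&url=http://www.wpunj.edu/scripts/webscript.exe?fs.scr")
print(target_host(page))  # www.wpunj.edu
```

If the destination URL itself contains an unescaped "&", the parameter value is cut short there, but the host name survives, which is all that grouping by service needs.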
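56. Perl Script for loading ezproxy log into MySQL
use strict;
use warnings;

# Map month abbreviations in the log timestamp to two-digit month numbers.
my %month = (Jan=>'01', Feb=>'02', Mar=>'03', Apr=>'04', May=>'05', Jun=>'06',
             Jul=>'07', Aug=>'08', Sep=>'09', Oct=>'10', Nov=>'11', Dec=>'12');

# Four leading fields, [dd/Mon/yyyy:hh:mm:ss zone], "method target version",
# status, bytes, "referrer", "user agent".
my $pattern =
    '^(\S*) (\S*) (\S*) (\S*) ' .
    '\[(..)/(...)/(....):(..):(..):(..) .....\]' .
    ' "(\S*) (\S*) (\S*)" ' .
    '(\d*) (-|\d*) "([^"]*)" "([^"]*)"';

# Read log lines from the files named on the command line (or STDIN)
# and print one INSERT statement per line that matches.
while (<>) {
    if (m/$pattern/) {
        my ($tgt, $ref, $agt) = (esc($12), esc($16), esc($17));
        my $byt = $15 eq '-' ? 'NULL' : $15;    # "-" means no byte count was logged
        print "INSERT INTO ezproxylogs VALUES ('$1','$2','$3'," .
              " TIMESTAMP '$7/$month{$6}/$5 $8:$9:$10','$11','$tgt'," .
              "'$13',$14,$byt,'$ref','$agt');\n";
    } else {
        print "-- Skipped line $.\n";
    }
}

# Double any single quotes so the value is safe inside an SQL string literal.
sub esc {
    my ($p) = @_;
    $p =~ s/'/''/g;
    return $p;
}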
57. Created table to assist the linking
SELECT PATRON_ADDRESS.ADDRESS_TYPE,
  Left([ADDRESS_LINE1], InStr([ADDRESS_LINE1], "@") - 1) AS usr,
  PATRON_ADDRESS.PATRON_ID,
  PATRON_ADDRESS.ADDRESS_STATUS,
  PATRON_ADDRESS.EFFECT_DATE,
  PATRON_ADDRESS.EXPIRE_DATE,
  PATRON_ADDRESS.MODIFY_DATE,
  PATRON_ADDRESS.MODIFY_OPERATOR_ID
INTO emailprefix
FROM PATRON_ADDRESS
WHERE (((PATRON_ADDRESS.ADDRESS_TYPE)="3"));
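The query above takes the part of each patron email address before the "@" so it can be matched against the user ID in the EZProxy log. The same linking step can be sketched outside the database. This is an illustration under assumptions: that the EZProxy user ID equals the email prefix, that matching is case-insensitive, and that each patron has one statistical category of interest (major). All names and sample values are hypothetical.

```python
def email_prefix(address):
    """Text before the first '@', or None when the address has no '@'."""
    prefix, at, _ = address.partition("@")
    return prefix if at else None


def link_hits_to_majors(log_user_ids, patron_emails, patron_majors):
    """Count log entries per major by matching user ID to email prefix."""
    prefix_to_patron = {}
    for patron_id, address in patron_emails.items():
        prefix = email_prefix(address)
        if prefix:
            prefix_to_patron[prefix.lower()] = patron_id
    counts = {}
    for user_id in log_user_ids:
        patron_id = prefix_to_patron.get(user_id.lower())
        major = patron_majors.get(patron_id, "Unmatched")
        counts[major] = counts.get(major, 0) + 1
    return counts


# Made-up sample data, for illustration only
patron_emails = {101: "theuser@example.edu", 102: "otheruser@example.edu"}
patron_majors = {101: "History", 102: "Nursing"}
hits = ["theuser", "theuser", "otheruser", "guest"]
print(link_hits_to_majors(hits, patron_emails, patron_majors))
# {'History': 2, 'Nursing': 1, 'Unmatched': 1}
```

Keeping an explicit "Unmatched" bucket makes it visible how many log entries could not be tied to a patron record, for example addresses without an "@" or user IDs with no email on file.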
58. Immediate Tasks
• SIS/HRS extracts to import into MySQL
DBMS on the application server.
– To be able to store more statistical
categories.
• Export Patron SIF from MySQL into
Voyager Patron database.
59. Systems Chart - 2010
[Diagram. Components shown:
• Integrated Library System: Voyager (scripting language, web server; circulation, media scheduling, patrons)
• Application Server (scripting language, web server, DBMS; patrons; off campus database usage by patron groups; ILL patrons/materials requested; ILL patrons/materials received)
• www.wpunj.edu (serials form, ILL form, ER micro page)
• Proxy Server (off campus database hits, searches & ILL form; EZProxy log)
• Banner, the university ERP system (SIS, HRS; patrons)
• University Networked Drive K: (ILL, Cliodata; patrons, materials)
• University Email Server
• Serials Solutions (A to Z, MARC records, link resolver)
• OCLC (WorldCat, WCA, ILL)
• Other vendors' database services & usage reports
Callouts: ILL Collection and Patron Group Analyses; Off Campus Database Hits by Patron Group.
Legend: current relationships; internal only WPUNJ server; externally accessible WPUNJ server; non-WPUNJ server.]
60. Reporting and Standards
• Reporting
– Emailed periodically - e.g., daily dossiers,
and other event triggered reports.
– On demand, via email, web pages or a
printer.
• Standards
– Share data for comparative research.
– Groups of libraries and consortia
61. Questions?
Ray Schwartz,
Systems Specialist Librarian
Cheng Library, William Paterson University,
Wayne, New Jersey, USA
schwartzr2 @ wpunj.edu