2. Outline
Open University (OU) Context
Why use activity data?
Scope of the project
What we did
Evaluation and next steps
3. OU context
“The search engine on the library is not very user friendly. I had to find a specific article recommended in the text and it took several attempts to locate it.”
“The search facility is poor and doesn’t find stuff that is supposed to be there”
http://www.flickr.com/photos/james_lumb/3921968993/sizes/z/in/photostream
4. New search system
New generation
Discovery
System from
EBSCO (EDS)
http://www.flickr.com/photos/jiscimages/435135071/sizes/m/in/photostream/
5. Could we do more?
http://www.flickr.com/photos/davepattern/5808712333/sizes/z/in/photostream/
6. Recommendations Improve the
Search Experience?
“That recommender systems
can enhance the student
experience in new generation
e-resource discovery services”
7. Do recommendations improve
the search experience?
Can you use search data to
make recommendations?
Are recommendations useful
in Discovery systems?
http://www.flickr.com/photos/davepattern/3473326634/sizes/z/in/photostream/
8. JISC Activity Data Programme
JISC funded project
February – July 2011
One of eight projects [list at http://bit.ly/gwCmNS]
9. Why activity data?
"Every day I wake up and
ask, 'how can I flow data
better, manage data
better, analyse data better?"
Rollin Ford, the CIO of Wal-Mart
http://www.flickr.com/photos/zerimski/5215633183/sizes/z/in/photostream/
11. OU Library activity data
Loans
Holds
Computer bookings
Library access
e-resources
12. OU Library systems environment
Athens DA authentication built into local (SAMS) login system
EZProxy remote resource access
SFX knowledge base and OpenURL link resolver
Ebsco Discovery Solution
13. Scope of our project
Activity data
Algorithms & recommender code
Search interface
15. So what is in the EZProxy logs?
• Remote host
• Date/Time
• OUCU
• Request
• Status
• Size of response
• Referrer
• User agent
• Session
http://www.flickr.com/photos/vixon/116447718/sizes/m/in/photostream/
16. So what is in the EZProxy logs?
"0"|||"137.108.143.168"|||20110115235421|||"nn1234"|||"GET http://libezproxy.open.ac.uk:80/connect?Session=st3ShtizgtrS7tU5&url=http://search.ebscohost.com/login.aspx?direct=true&site=edslive&scope=site&type=0&cli0=FT&clv0=Y&cli1=FT1&clv1=Y&authtype=ip&group=VCStud&bquery=War%20Against%20the%20Panthers HTTP/1.1"|||302|||0|||http://library.open.ac.uk/|||"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13"|||"t3ShtizgtrS7tU5"
17. So what is in the EZProxy logs?
Date and time: 20110115235421
18. So what is in the EZProxy logs?
User name: "nn1234"
19. So what is in the EZProxy logs?
Request: "GET http://libezproxy.open.ac.uk:80/connect?Session=st3ShtizgtrS7tU5&url=http://search.ebscohost.com/login.aspx?direct=true&site=edslive&scope=site&type=0&cli0=FT&clv0=Y&cli1=FT1&clv1=Y&authtype=ip&group=VCStud&bquery=War%20Against%20the%20Panthers HTTP/1.1"
23. What can the data tell us?
People on course ‘A’ viewed resource ‘B’
People who looked at resource ‘C’ also looked at resource ‘D’
Which are the most popular resources
This resource is being used by people studying this course
24. But what isn’t there?
ISSNs
DOI
Article information
Subject terms
http://www.flickr.com/photos/kevharb/5466661946/sizes/z/in/photostream/
25. So how do you improve your data?
EZProxy: Remote host | Date/Time | OUCU | request | status | size of response | referrer | user agent | session
CIRCE: user type | course code(s)
EDS and Crossref: Bibliographic data matching
30. So how do you improve your data?
EZProxy: Remote host | Date/Time | OUCU | request | status | size of response | referrer | user agent | session
CIRCE: user type | course code(s)
EDS and Crossref: Bibliographic data matching
RISE: Searches in RISE
32. What can the data tell us?
People on course ‘A’ viewed resource ‘B’
People who looked at resource ‘C’ also looked at resource ‘D’
People who searched for subject ‘E’ looked at resource ‘F’
People are looking at resources on this subject
This resource is being used by people studying this course
34. Getting a recommendation
User A (Module A123) views Resource B: Views +1, RV=14 → RV=15
User C (Module A123) views recommended Resource B: Views +1, RV=15 → RV=16
User C (Module A123) rates Resource B Useful: +1, RV=17
User C (Module A123) rates Resource B Not Useful: -2, RV=14
35. Data Protection and privacy
Added a privacy policy to
RISE, EDS and SFX interfaces
Provided an opt-out feature
Privacy and opt-out URL
http://library.open.ac.uk/rise/?page=privacy
36. Evaluation
Online Survey
Face to Face interviews
Review of web analytics
37. Survey results
Related to records you have viewed
Very useful 45%
Quite useful 22%
Slightly useful 0%
Not useful 22%
Not sure 11%
39. Focus groups
Undergraduates
Like ratings and reviews from other students
‘other people’s experiences valuable’
Which module studied?
How high a mark?
Postgraduates
Citation as a recommendation
Wary of provenance
Feed to module website
Want synonyms
Trust repository
40. Face to face interviews
First impressions of recommendations (course-related)
Asked to enter a search term. Results and
recommendations explored.
Asked about relevance
Asked about preference for type of recommendation
41. Should we have a recommender
system?
“I think it would be a very good useful feature. It
would be definitely very very useful” postgraduate
Maths student
“I'm afraid my first reaction is to be a bit sceptical - it presumably doesn't tell
you if fellow students found the information/article useful or relevant to what
they were looking for. I would hate to waste time following unproductive
links laid down by others who might be failing students or think that any
"lazy" students might develop poor practice by relying on what others had
looked at. It sounds like a good idea but I think caution needs to be
exercised. ”
I have just had a go, it was good with
suggested papers that I had already found
(which shows potential in my view) through
Google.
49. My thanks go to Richard Nurse and Liz
Mallett of the Open University Library for
giving me the use of their slides on the
project for this presentation.
Any questions ???
Editor's notes
Back in 2009 it became obvious that library search simply didn’t work as well as users expected it to. We were regularly getting the sort of comments you see on screen, which showed that library users were struggling with the federated search system that we were using. So the library embarked on some work to improve search, by introducing a new discovery search system and making improvements to the web site.
We changed the search system to a new generation of library search system, EBSCO Discovery System (EDS). Instead of searching library resources individually and telling you how many results are in each database it now searches one index and shows the results in a single list. Throughout 2010 and 2011 we worked with the system to make sure it was integrated into our improvements to the library web site and included as many of our resources as possible. We are still in the process of making further enhancements to include subject based searching.
We then started thinking whether there was more that we could do to improve the user experience. For a while we’d been following with interest some JISC work looking at whether activity data could be used by libraries to improve services, in projects such as TILE and MOSAIC. So we started to think whether there was an opportunity to look at whether using our activity data could improve the user experience of library search.
So when we knew that JISC were going to be funding some more work on activity data, we thought about what we’d want to do, and came up with this hypothesis.
The project we came up with was RISE – Recommendations Improve the Search Experience. We set out to test two things: Can you use search data to make recommendations? Are recommendations useful for these new systems?
RISE was funded as part of the Activity Data strand of the JISC Infrastructure for Education and Research programme. It was a very short project, just six months, with a small team consisting of a developer and a project manager. There were seven other projects in the programme. Some were working with libraries, such as SALT and LIDP; others were looking at activity data in a range of other areas, from Virtual Learning Environments, through repositories and student systems, to video-conferencing data, and including the UCIAD project in the OU’s Knowledge Media Institute, which looked at a user-centred approach to web click stream data.
So why the focus on activity data? Since the early 1990s the business sector, particularly companies such as Tesco, Amazon and Wal-Mart, has been exploiting the data it holds about customer activities to support decision making. Industry analysts have noted how many big retailers now use complex algorithms to analyse the large amounts of customer data involved to create new revenue opportunities and increase customer retention. Equally, publishers are looking at ways to reach readers who are interested in particular topics online. In the context of Facebook, Twitter and other social networking sites, worldwide networks of friends and people with similar interests form a large online community of potential readers who are just waiting to discover new relevant content. Knowing what these people are looking at, buying, and recommending could be key to marketing products in the years to come. Some early research by JISC, in the TILE and MOSAIC projects, identified that the HE sector also had extensive user data and there was some potential to make use of it, but it was greatly underused. So this JISC programme has set out to explore this area in more detail. Across the sector we are being told to be more business-like and the use of customer data is one of the areas that businesses seem to be exploiting far more than we do.
For a traditional ‘bricks and mortar’ university these are some of the ways that you’d typically interact with your customers. Well, for the OU things are a bit different…
We don’t really loan many books to students or have many accessing the library. All our students are distance learners so they interact with us online and use our resources electronically. And with more than 450,000 unique users of our website and over 100,000 unique users of our e-resources each year then there’s a fair amount of activity data for us to use.
So, if we are concentrating on our e-resources, then the systems we use are: SAMS single sign-on; the EZProxy system from OCLC, which allows students to access our resources as if they were locally within the library; SFX from Ex Libris as our resources knowledge base and OpenURL link resolver; and finally the EBSCO Discovery System in place of an older federated search system.
The stages of the project were to build the database, fill it with activity data, write some software to create the recommendations, create a search interface to show the recommendations, and test it with some users and get feedback.
We push as much as possible through EZProxy, so we use it for access through our discovery system, for links from SFX, and for links placed in our VLE. So it seemed the obvious place to start looking at e-resource activity data. We didn’t have access to the EBSCO Discovery log files and we hadn’t been using that system for long, whereas we did have a few months of log files from EZProxy. So we started with the EZProxy log files as the core dataset.
So when we start to look in detail at what data is contained within the log files, you’ve got some useful data and other data that isn’t so useful for activity data purposes. We know the user name – that’s the OUCU, the Open University Computer User account name. We know the request, that is, the website that is being accessed. So when you look at the detail of the record what you get is…
Something that looks like this (we’ve anonymised the OUCU for obvious reasons). This is one record out of tens of thousands of rows, but with a bit of work you can break it down to…
So you’ve got the date and time – useful to be able to know when something happened
And the username of the user
And the request that has been made – in this case an EBSCOHost search
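The breakdown described in the last few notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code the project used: the field names follow the list on the slide, the name `flag` for the leading "0" value is a guess, and the helper names `parse_record` and `search_terms` are made up for this example. The sample line is the anonymised record from the slides, joined back into one line.

```python
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Field order follows the slide; "flag" (the leading "0") is an assumed name.
FIELDS = ["flag", "remote_host", "timestamp", "oucu", "request",
          "status", "size", "referrer", "user_agent", "session"]

SAMPLE = (
    '"0"|||"137.108.143.168"|||20110115235421|||"nn1234"|||'
    '"GET http://libezproxy.open.ac.uk:80/connect?Session=st3ShtizgtrS7tU5&url='
    'http://search.ebscohost.com/login.aspx?direct=true&site=edslive&scope=site'
    '&type=0&cli0=FT&clv0=Y&cli1=FT1&clv1=Y&authtype=ip&group=VCStud'
    '&bquery=War%20Against%20the%20Panthers HTTP/1.1"|||302|||0|||'
    'http://library.open.ac.uk/|||"Mozilla/5.0 (X11; U; Linux i686; en-US; '
    'rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13"|||'
    '"t3ShtizgtrS7tU5"'
)


def parse_record(line):
    """Split one '|||'-delimited log line into a dict of named fields."""
    values = [value.strip().strip('"') for value in line.split("|||")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    # The timestamp in the sample is YYYYMMDDhhmmss with no separators.
    record["timestamp"] = datetime.strptime(record["timestamp"], "%Y%m%d%H%M%S")
    record["status"] = int(record["status"])
    record["size"] = int(record["size"])
    return record


def search_terms(request):
    """Return the 'bquery' value from a request string, or None if absent."""
    parts = request.split(" ")  # method, URL, protocol
    if len(parts) < 2:
        return None
    # In the sample the target URL is not percent-encoded, so its parameters
    # sit in the same query string as Session and url.
    values = parse_qs(urlsplit(parts[1]).query).get("bquery")
    return values[0] if values else None


record = parse_record(SAMPLE)
print(record["timestamp"], record["oucu"], search_terms(record["request"]))
```

Not every request will carry a `bquery` parameter, which is why `search_terms` returns `None` rather than failing when it is missing.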
So our database starts to build up with details of the user and resources
We can then get data about the course(s) that students were studying from our internal student information system.
This tells us their course, subject area of interest and degree programme.
So, the data we have so far can tell us which courses people are on, so we can make recommendations based on that, i.e. these are the most popular resources that people on your course are looking at. We can also start to say that if you looked at resource C and then straightaway looked at resource D, there is a likelihood of some relationship between resource C and resource D. And we can also say which overall are the most popular articles or journals.
But there are limitations. From the logs you don’t always know what search terms were used or have much information about the item that is being accessed. And if you want to make a recommendation you don’t even have an article or journal title to show as the recommendation. So we looked at how we could improve the data. At the moment we use another EBSCO EDS API call to extract bibliographic details, which are then used to retrieve data from Crossref that we can store in the database.
That meant we could then retrieve some bibliographic data. Originally we’d hoped that we would be able to store basic metadata from EBSCO in the system, but after discussion with them we realised that the license terms wouldn’t let us do that. So we had to look for other metadata sources that we could use. So we set the system up to retrieve data keys from EBSCO and use them to search Crossref. The Crossref data license allows you to store that data locally.
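As a rough illustration of the two ideas in this note, the sketch below counts views per course and counts "viewed C, then straightaway viewed D" pairs. It is a minimal example under assumptions, not the project's algorithm: the `(user, course, resource)` tuple layout, the function names and the sample data are all hypothetical, and consecutive views are tracked per user here although they could equally be tracked per session.

```python
from collections import Counter, defaultdict

# Hypothetical view events in time order: (user, course, resource).
VIEWS = [
    ("u1", "A", "B"),
    ("u2", "A", "B"),
    ("u1", "A", "C"),
    ("u1", "A", "D"),
    ("u3", "X", "C"),
    ("u3", "X", "D"),
]


def popular_by_course(views):
    """Count how often each resource was viewed by people on each course."""
    counts = defaultdict(Counter)
    for _user, course, resource in views:
        counts[course][resource] += 1
    return counts


def also_viewed(views):
    """Count pairs where a user viewed one resource and then, as their
    very next view, a different resource."""
    pairs = defaultdict(Counter)
    last_seen = {}
    for user, _course, resource in views:
        previous = last_seen.get(user)
        if previous is not None and previous != resource:
            pairs[previous][resource] += 1
        last_seen[user] = resource
    return pairs


print(popular_by_course(VIEWS)["A"].most_common(3))
print(also_viewed(VIEWS)["C"].most_common(3))
```

Overall popularity falls out of the same data by counting resources without grouping by course.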
We created a test search interface called MyRecommendations, to test recommendations with users using the EBSCO EDS API. It gave a search screen, and included recommendations based on other resources the user had recently viewed.
Once a search had been carried out the search results page presented the user with further recommendations based on the articles viewed by people who had used similar search terms.
If you viewed one of the recommended resources it then opened the record in another window and you were given the chance to rate the usefulness of the recommendation.
We also built a second interface – this one is a Google Gadget version with pretty much the same functions as the main interface.
We then also started to capture search terms used in the MyRecommendations (RISE) interface.
Now we can add search terms that are being used
So we’ve ended up with a set of data that can give us a range of different types of recommendations, from ‘people on your course are looking at these articles’ through ‘people who looked at this article also looked at this article’ to ‘people using this search term looked at these resources’. And we are sure that you could put the data to other types of use.
When we were looking at recommendations we thought that the simplest approach was just to start with something very basic. What drives the recommendations is a set of relationship values. Values are assigned based on resource views and subsequent ratings by users. The relationships are ranked according to value so the top ones get shown as recommendations.
Each relationship starts at value 0: +1 each time the resource is viewed; +1 each time the recommendation is viewed; +1 each time the recommendation is rated as ‘Useful’; -2 each time the recommendation is rated as ‘Not Useful’. Recommendations are displayed in value order.
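The scoring rules above are simple enough to sketch directly. The example below is an illustration only, assuming a relationship can be keyed by a hypothetical `(context, resource)` pair such as a module code and a resource name; the class name, event labels and method names are invented for this sketch. The adjustment values themselves are the ones stated in the notes.

```python
from collections import defaultdict

# Adjustments as stated in the notes; the event labels are illustrative.
ADJUSTMENTS = {
    "resource_viewed": 1,
    "recommendation_viewed": 1,
    "rated_useful": 1,
    "rated_not_useful": -2,
}


class RelationshipValues:
    """Keeps a running value for each (context, resource) relationship."""

    def __init__(self):
        self.values = defaultdict(int)  # every relationship starts at 0

    def record(self, relationship, event):
        """Apply the adjustment for an event and return the new value."""
        self.values[relationship] += ADJUSTMENTS[event]
        return self.values[relationship]

    def top(self, context, n=5):
        """Return up to n resources for a context, highest value first."""
        scored = [(resource, value)
                  for (ctx, resource), value in self.values.items()
                  if ctx == context]
        scored.sort(key=lambda item: item[1], reverse=True)
        return [resource for resource, _value in scored[:n]]


rv = RelationshipValues()
key = ("A123", "Resource B")
rv.values[key] = 14                               # starting point on the slide
print(rv.record(key, "resource_viewed"))          # a user views the resource
print(rv.record(key, "recommendation_viewed"))    # a user follows the recommendation
print(rv.record(key, "rated_useful"))             # and rates it useful
```

Weighting ‘Not Useful’ at -2 means a single negative rating cancels both the view of the recommendation and one earlier view, so poorly rated items drop down the ranking quickly.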
Any system that deals with personal data has to be mindful of privacy and data protection requirements. After discussion within the JISC Activity Data programme and some helpful information, particularly from EDINA’s OpenURL project, we put together a specific privacy policy and discussed it with our data protection people at the University. The policy explicitly covered activity data and we have linked to it from the RISE interfaces, from our main EBSCO EDS page and from SFX. The policy gives people an opt-out to have their data removed from the recommendations, even though they aren’t identified personally in any of the recommendations. With the new EU ‘cookies’ legislation we are doing some more work to ensure that we are legally compliant. Ideally we would want any institutional ‘cookie’ policy and agreement to cover permission to use data for this type of activity.
With EZProxy data on its own there are limits to the recommendations you can make; they would mostly be about which are the most popular resources. Our main issue was to get access to bibliographic data about the articles being accessed and recommended. To create something meaningful you need to combine the activity (EZProxy) data with other data, such as student data. The more data you can get the better. The more data you add to the mix, the more types of recommendations you can then make. License restrictions on article-level metadata limit what you can store in your database.
The original plan with the project was to be able to release an open data set of search data. We spent quite a lot of time looking at methods of anonymising the data, by removing usernames, generalising courses to broad subjects and looking at whether there was a threshold of students that we needed on a course to be able to release any data from that course. We faced a major challenge because the activity data we had was fairly meaningless without some article metadata, and at the time we could only find data we could use ourselves and nothing we could make available in an open data set. So unfortunately it wasn’t possible to release the data. But others at EDINA, LIDP and SALT were able to do so.
The Google Gadget version of MyRecommendations will go into the list of tools for students. We are migrating the database so we can use it for more mainstream use. We plan to use it for the new JISC MACON (Mobilising Academic Content Online) mobile devices search and accessibility project. And we’re interested in how this data could be used by Learning Analytics, an OU project to gather all user and activity data together into one single data warehouse.
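To make the three anonymisation steps mentioned above concrete, here is a small sketch of how they might fit together. Everything specific in it is an assumption for illustration: the row layout, the `course_to_subject` mapping, the function name and the threshold of 5 are hypothetical, and passing a check like this does not by itself guarantee that a data set is safe to release.

```python
from collections import defaultdict


def anonymise(rows, course_to_subject, min_students=5):
    """Drop usernames, generalise course to subject, and suppress rows from
    courses with fewer than min_students distinct users.

    rows: iterable of (username, course, resource) tuples.
    Returns a list of (subject, resource) tuples.
    """
    rows = list(rows)

    # Count distinct students per course before anything is removed.
    students = defaultdict(set)
    for username, course, _resource in rows:
        students[course].add(username)

    released = []
    for _username, course, resource in rows:
        if len(students[course]) < min_students:
            continue  # too few students on the course: withhold its rows
        subject = course_to_subject.get(course)
        if subject is None:
            continue  # no broad subject known for the course: withhold
        released.append((subject, resource))
    return released


# Hypothetical example with a deliberately low threshold.
ROWS = [("u1", "A123", "r1"), ("u2", "A123", "r2"), ("u3", "B456", "r3")]
SUBJECTS = {"A123": "Arts", "B456": "Science"}
print(anonymise(ROWS, SUBJECTS, min_students=2))
```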
We are also looking at how we can use these approaches to provide personalised services to users through the library website, so have been looking at being able to show people what articles are being looked at and have been developing some beta services to demonstrate this.