The document summarizes several projects related to improving the UK national academic union catalogue Copac. It discusses redesigning Copac to better serve 21st century researchers, developing tools to analyze library collections using Copac data, and a project called Surfacing the Academic Long Tail that uses circulation data to recommend lesser-used materials to humanities researchers. It provides updates on the progress of these projects and discusses strategic issues and next steps to further develop the tools and assess their sustainability and value.
1. Copac: Reengineering the UK national academic union catalogue to serve the 21st Century researcher Redesign, collection analysis, recommendations Joy Palmer, Mimas University of Manchester
2. Key points Background & context of Copac Development in progress Strategic issues and directions R&D/Innovations Work Collections management project Surfacing the Academic Long Tail project
3. Aggregation of 50+ research & specialist libraries 40 million records Approx. 1 million search sessions per month Primary academic use case – locating long tail materials Primary workflow use case – cataloguing & ILL support Funded by JISC since 1996 Sponsored by RLUK (and based on RLUK data +) In re-engineering process Expanding consistently to include specialist libraries Copac…
4.
5.
6.
7. Others include…. Imperial War Museum Chetham’s library Windsor Castle National Maritime Museum British Museum French Institute University of Exeter Special Collections The Women’s Library Institute of Education Royal Academy of Music Kew Royal Botanic Gardens Tate Gallery Library Natural History Museum
8. Half our users Advanced researchers Humanities-based Been with us a while… Looking for specific items More later…..
9. And the rest are mostly librarians Cataloguing Support Collections Mgt ILL Support Researcher Support
12. c.1m Sessions p/m OpenURL router Z39.50 ESTC at the BL Search updates User interfaces ILL/Copy via users’ OpenURL server HTTP Social media COinS RSS OpenURL Z39.50 SRU/SRW Users: Open access Use by HE, FE, NHS, Libraries, Schools, General public M2M Last Updated 07/12/07
13. Development activity in progress New hardware (Oracle) Enhanced de-duplication Improved search (ranking, facets) ‘FRBR-ised’ record display Enhanced user interface Additional specialist libraries Graphic redesign
14.
15.
16. Strategic issues -- macro Changing technological landscape and user-expectations Death of the physical? ‘Good enough’ = just-in-time (not a specific item) eBook search and discovery challenges Integrated and cross-domain search
17. Strategic issues - micro Leadership and community role Identity and positioning Enabling infrastructure support library workflows or resource discovery service? Governance Collections policy (uniqueness vs. comprehensiveness?) Innovation vs. service delivery
19. Can I release this book? How does my collection compare in strength to that of other UK libraries?
20. Project background Builds on the work of the White Rose Consortium Partners are Leeds, York and Sheffield Universities Funded by JISC as part of Discovery initiative (making Copac data ‘work harder’). Sponsored & facilitated by RLUK 7 months and limited in budget
21. How it works Web-based Identifies which locations items/batches exist Search by ISBN, RLUK #, author, title, subject Batch search (comma delimited sets) Data visualisation of results Map views Graphs Record export in MODS, CSV
22.
23. Exploratory and iterative in approach Development/testing cycle with partners trialling and providing structured feedback
24. Six use cases developed Identifying last copies among titles considered for withdrawal Identifying collection strengths Deciding whether to conserve a book Reviewing a collection at the shelves Prioritising a collection or items for digitisation Subject strengths – collection development and marketing/differentiation
25. Findings Need for further development and refinement of the tools (esp. duplication & user interface issues) Significant potential for answering strategic questions about the status of collections
26. Particularly Overlap between the holdings of major UK research libraries in particular subject areas; Differences in that overlap between different subject disciplines and areas; The proportion of unique titles within those collections; The extent to which researchers will find that they no longer have access to such a wide range of research materials in future years, given current pressures on space and the widespread severe deterioration of printed materials through brittle paper and collapsing bindings.
27. Next proposed steps Expansion of test libraries and resilience testing of the tool. Address evidence of scalability. Building collaborations and alliances with interested organisations pursuing complementary activity Addressing the development of a business model for a service beyond a pilot More targeted communications and dissemination of the activity
30. And also… how sustainable would an API-based national shared service be? Can such a service support users and also library workflows such as collections management? RLUK, M25, Leeds University, Cambridge University, Sussex University.
31. John Rylands University Library: 1.3 million bib records; 600,000 search sessions per month; 23% of records unique (cross checked against WorldCat); 40,000 students; 10 years of circulation data
35. Y- market research reveals these users as… Centrifugal searchers Berrypickers from various trails Quite isolated and prone to pitfalls
36. And increasingly they just don’t ask librarians… They ask their tutors and each other where to look…
37. Researchers are suspicious about UGC, especially ratings & reviews, but… they could see the immediate benefit of ‘tacit’ recommender functions…
38.
39.
40.
41. What if? this represented a national aggregation of data gathered from the usage activity of these researchers, collected as they worked with a national aggregation of unique or rare research collections?
43. What can this mean? Surfacing and increasing usage of hidden collections (& demonstrating value) Providing new routes to discovery based on use and disciplinary contexts (not traditional classification). Powering ‘centrifugal searching’ and discovery through serendipity Enabling new, original research – academic excellence…
48. Focus groups and user testing 3 focus groups (18 people) MA/PhD humanities students (mixed ages) How relevant/useful are the recommendations at first glance? Do any other recommendations look useful? Were you previously aware of these texts? How likely would you be to borrow the recommended item?
49. What are users saying? Recommendations are already key to them: Supervisors/Peers Amazon Bib citations Don’t accept recommendations blindly Serendipity important (but not to all)
50. What are users saying? Very supportive, but in practice found results too generalist, irrelevant, and sometimes bizarre! Lower ranked recommendations much better Did you find something you’d borrow? (yes!) Find something new? (mixed) Would you use it? (yes!) Useful for searching more widely More university data needed to improve results
52. Can we make the data work harder to solve other shared problems?
53. Issues for sustainability Is there a clear-cut case for a national shared service here? Data model: data out = easy; data in = not so much Licensing & Attribution: collective ownership of a collective pot? Is proof of our hypothesis key to sustainability?
54. Key findings Lower thresholds will throw up ‘long tail’ items, but relevance and usefulness is not evident (but what is The Long Tail?) Users aren’t concerned about data privacy This can be successful without a significant backlog of data A shared service needs to aggregate activity data from more libraries (but not many more)
55. Proposed next steps Aggregate more data Assess impact over time Gather requirements and costs for a shared service Establish more data extraction recipes Investigate utility for collections mgt further Investigate usefulness for teachers & supervisors
Copac
Closest thing UK has to National Union Catalogue
Over 50 UK academic or specialist libraries
Growing
Currently being radically re-engineered from ground up
FRBR-ising
Completely new UI
Mobile apps
Personalisation?
Linked data – RDF planned for 2011
Primarily used by humanities & philosophical studies postgrads and academics
Heavily used -- 8 million sessions per month
Over sixty libraries….
Get Shirley’s latest data model slide
I need Shirley to send me stuff here…
By July 2011 the new Copac service will include:
Enhanced user interface and complete graphic redesign with ‘FRBRised’ record display, enhanced deduplication, improved search and ranking of results and faceted browsing support (released iteratively; first prototype available for testing by end of November 2011).
Improved coverage through the incorporation of more UK academic libraries (6 more by July 2011 – expansion rate and tactics to be determined post July 2011).
Integration of article level data (Zetoc) and access to full content (for authenticated academic users).
An ‘Open Copac’ API and support tools for developers wishing to ‘mash’ Copac MODS XML content into new applications (e.g. local library technical developers creating scholarly support resources). Licensing issues are still to be explored as part of RDTF, but there are some ‘quick win’ ways forward here.
More flexible personalisation features for end users so that they can export and repurpose content within citation management systems and social media contexts (blogs, virtual learning spaces, etc).
Also in development, with prototypes released by September 2011:
Collaborative Collections Management service prototype (supporting the decision‐making processes for librarians managing, developing & disposing of collections).
PENDING JISC FUNDING: Recommender functionality based on aggregations of UK university book circulation data (‘People who borrowed this also borrowed…’).
Key strategic issues

Leadership & community role
Copac’s ‘passivity’ and lack of overt leadership was raised several times. Copac needs to exploit its position as a community focal point, and take a more overt leadership role in engaging the UK HE/FE library community, understanding its needs, and how Copac can serve those needs. How Copac engages with OCLC also needs to be determined, with an emphasis on finding ways to collaborate rather than compete.

Identity & brand positioning
Copac lacks a strong brand identity. Work is required to position Copac as a brand which targets both end users and stakeholders (librarians, developers).

Enabling Infrastructure or One Service?
A key challenge in developing a coherent identity for Copac is the question of whether Copac is ‘simply’ a resource discovery service, or an infrastructure to support resource discovery but also other business requirements. The question of what drives Copac strategically also needs to be tackled and clarified (see Governance below).

Governance
What drives Copac and shapes its strategy and values? If Copac is to exploit its full potential as an enabling infrastructure, and to take a leadership role, then formal Governance structures need to be put into place.

Collections policy. Exposing unique content; providing comprehensive coverage?
Copac does not have a clear Collections Policy. There are strong drivers to incorporate more unique and specialist content into Copac, and there are equally strong drivers to provide more comprehensive coverage so that all HE/FE users can use Copac to locate content locally. Cross domain search and discovery also featured heavily. Copac needs to develop a sustainable collections policy that best serves the needs of end users and contributing institutions. Copac also needs to explore the technical feasibility of serving as a National Union Catalogue for all UK HE/FE libraries – are there risks around adding more libraries? (Will the value decrease as more data is added? Will standards need to drop?) Can coverage be achieved without aggregating?

Innovation
Copac does innovative work and has been successful in securing funds for specific projects. But the service needs to develop a more strategic and prioritised approach to innovation, and specifically for how it engages with social media, Open Data, Linked data, and the shared services agenda. Copac needs to develop a strategy that helps it innovate for new services or functionality that meets the needs of existing users, and also opens up the opportunity of reaching new users and markets. Copac also needs to identify how it wishes to engage the developer community through opening up data, or sharing source code.
“…to develop and test a service that will enable improved decision making regarding the retention, disposal, and redistribution of materials. The service will provide evidence of the wider availability of individual materials and/or collections when discussing the disposal of materials with academic staff within an institution.” And additionally will, “achieve the longer term aim of developing the technical framework required to support a more proactive and cohesive approach to collection management at a national level. The development of the CCM Pilot will enable the practical demonstration of applying our tools to collection management workflows and provide an assessment of benefit that will feed into sustainability and business planning.”
Builds on work of White Rose Consortium
Partnering with RLUK
7 months, minimal budget
Proposal in development to extend (Discovery initiative).
Uses Copac data
Funded through RDTF/Discovery – making data ‘work harder.’
“By building the service on top of Copac data, the project contributes significantly to furthering the work of the JISC & RLUK Resource Discovery Taskforce, which aims to explore how data can be opened up and made to ‘work harder.’” From the Interim Report at http://www.rluk.ac.uk/files/CopacCMInterimReportfinal.pdf
Cheap & low profile…
How it works
What we’ve built:
Web-based tool.
Uses a variety of means to identify in which locations a particular item or batches of items exist.
Data visualisation provides differing views of the results, for example, map views to assess quickly where items are held across the country, and graphs to indicate how many items searched for exist within specific libraries.
Access via IP address checking.
Facility to search for a set of records by entering a comma delimited set of local record identifiers or standard record numbers via a text box. The initial limit on the number of records in a set (~100).
Result set display including holding libraries.
Record export in MODS format.
Option to view a map of the results to see where the documents are held.
Option to see a graph showing the number of records held by each library.
1. The ability to set up an RSS feed that will tell you when the results are available.
2. A Search History button on the search screen that lets you look at all your batch search results.
3. No limit on the number of records for batch searching.
4. Revised search procedure behind-the-scenes.
5. More information in the brief record.
6. A basic full record html display.
7. Records are sorted by author/title; records with no author will file at the top in title order.
8. Export of visualisation data now saves as a csv file.
9. There is a MARC exchange export format.
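To make the batch search and graph view a little more concrete, here is a minimal sketch of the idea, not the CCM tool's actual code. It assumes a small in-memory lookup (the hypothetical `HOLDINGS` table) in place of Copac data, and the function names `parse_batch`, `holdings_per_library` and `to_csv` are invented for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data standing in for Copac: identifier -> holding libraries.
HOLDINGS = {
    "0198186835": ["Leeds", "York", "Sheffield"],
    "0521293731": ["Leeds"],
    "0140432051": ["York", "Sheffield"],
}


def parse_batch(text):
    """Split a comma-delimited batch of identifiers, dropping blanks and repeats."""
    identifiers = []
    for part in text.split(","):
        ident = part.strip()
        if ident and ident not in identifiers:
            identifiers.append(ident)
    return identifiers


def holdings_per_library(identifiers, holdings):
    """Count how many of the searched records each library holds (the graph view)."""
    counts = Counter()
    for ident in identifiers:
        # set() guards against a library being listed twice for one record.
        for library in set(holdings.get(ident, [])):
            counts[library] += 1
    return counts


def to_csv(counts):
    """Serialise the per-library counts as CSV text, one row per library."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["library", "records_held"])
    for library, held in sorted(counts.items()):
        writer.writerow([library, held])
    return buffer.getvalue()


if __name__ == "__main__":
    batch = parse_batch("0198186835, 0521293731,,0198186835")
    print(to_csv(holdings_per_library(batch, HOLDINGS)))
```

A real service would resolve identifiers against the de-duplicated Copac records rather than a dictionary; the sketch only shows the shape of the batch-in, counts-out workflow.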
Exploratory in Approach – iterative development; put tools in front of users (WRC) for feedback so we can refine them. Very much a pilot/exploratory project.
Managing expectations….
6.1 Interim Report Use Cases
The Interim Project Report described four Use Case scenarios that the project team had developed in light of the participating library’s collection management requirements and experience of the first CCM interface trial, alongside the discussion arising from that work. These four covered the following:
Use case 1: Identifying last copies among titles considered for withdrawal
Use case 2: Identifying collection strengths
Use case 3: Deciding whether to conserve a book
Use case 4: Reviewing a collection at the shelves
Resulting from continued consideration of the benefits the CCM Tool offers to libraries and how easily it can be integrated into existing workflows, two further Use Case scenarios have been developed during Phase 3 of the project. The purpose of these illustrative use cases will be to demonstrate how this tool can be used productively. The aim will be to look to applying and testing all the use cases in live conditions once the CCM Tool is more fully developed.

6.2 Detail of additional Use Case Scenarios

6.2.1 Use Case 5: Prioritising a collection or item(s) for digitisation
Context
Many libraries are faced with the monumental task of potentially digitising entire collections either through user request or preservation needs. Dependent upon resources and funding some material may be lost due to the lack of data required to understand what should be prioritised over other content. With all the competing priorities of a digitisation service (on-demand, research led, project funded, preservation) having a tool that identifies what is a strength, is unique or endangered

6.2.2 Use Case 6: Subject search - Collection development and marketing
Context
Universities are under pressure to attract more students and high quality researchers. This often involves creating new courses and/or research areas and expanding the topics covered by existing courses or research. It also requires strong marketing. An example of the former would be creating a new department. An example of the latter would be a department which historically focuses on their topic in the context of Europe and the US, but in order to offer a competitive course they expand this to include Asia and the Pacific, or even the whole world. The library’s collections will often be weak in the new area and will need building up. Libraries also need to market what they already have more effectively.
This case study, based on the current version of the Copac tools, demonstrates both the need for further development and refinement of those tools, but also their potential for answering strategic questions about the status of the collections in individual libraries and the opportunity to deepen our understanding of the parameters around collection development in terms of:
Overlap between the holdings of major UK research libraries in particular subject areas;
Differences in that overlap between different subject disciplines and areas;
The proportion of unique titles within those collections;
The extent to which researchers will find that they no longer have access to such a wide range of research materials in future years, given current pressures on space and the widespread severe deterioration of printed materials through brittle paper and collapsing bindings.
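As an illustration of the first and third of these questions, overlap and the proportion of unique titles can be expressed as simple set arithmetic. This is a sketch under stated assumptions rather than the project's method: the sample holdings are invented, the function names are hypothetical, and Jaccard similarity is just one possible overlap measure chosen here for simplicity.

```python
# Hypothetical holdings for one subject area: library -> set of title identifiers.
SUBJECT_HOLDINGS = {
    "Library A": {"t1", "t2", "t3", "t4"},
    "Library B": {"t3", "t4", "t5"},
    "Library C": {"t4", "t6"},
}


def overlap(holdings, first, second):
    """Jaccard overlap: shared titles divided by all titles held by either library."""
    union = holdings[first] | holdings[second]
    if not union:
        return 0.0
    return len(holdings[first] & holdings[second]) / len(union)


def unique_share(holdings, library):
    """Proportion of a library's titles that no other library in the data holds."""
    own = holdings[library]
    if not own:
        return 0.0
    held_elsewhere = set().union(
        *(titles for name, titles in holdings.items() if name != library)
    )
    return len(own - held_elsewhere) / len(own)


if __name__ == "__main__":
    print(overlap(SUBJECT_HOLDINGS, "Library A", "Library B"))
    print(unique_share(SUBJECT_HOLDINGS, "Library A"))
```

Running the same two functions over holdings split by subject would give the per-discipline comparison described in the second point. The figures are only as good as the de-duplication behind the title identifiers.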
Relatively plain sailing ‘til now – followed the guidelines of Dave Pattern to extract the data and create the appropriate algorithms for the recommender.
This is pulling from the API (which is not public yet, but will be).
Building on the work of MOSAIC, the SALT project will focus on a different use case where the barriers encountered by MOSAIC can be overcome. Academic researchers in the humanities make up a vast proportion of the institutional library OPAC usage. Much of the work of these researchers is monograph-based, and recent findings from Mimas and RIN indicate that a high level of postgraduate research centres on the use of unique or rare items held across the UK. These are the collections that make up the ‘Long Tail’ of UK research library collections. SALT will test the ‘long tail’ hypothesis in relation to advanced academic users of long tail collections held in UK research libraries which hold some of the richest heritage collections in the world. We will investigate how issues of relevance and frequency of borrowing might shift within the particular use case of humanities research, where low level of borrowing of rare or niche items may not necessarily equate with lower relevance to end users whose search behaviour is typically centrifugal and exploratory. We will look at the relationship between key critical texts and commonly read humanities secondary or primary research monographs that might occupy the ‘head’ of frequently borrowed items and follow activity trails to explore the relevancy of long tail, lesser borrowed, lesser known niche items.
A lower threshold may throw up ‘long tail’ items, but they are likely to not be deemed relevant or useful by users (although they might be seen as ‘interesting’ and something they might look into further). Set a threshold of ten or so, as the University of Huddersfield has, and the quality of recommendations is relatively sound.

Concerns over anonymisation and data privacy are not remotely shared by the users we spoke to. While we might question this response as potentially naive, this does indicate that users trust libraries to handle their data in a way that protects them and also benefits them.

You don’t necessarily need a significant backlog of data to make this work locally. Yes, we had ten years’ worth from JRUL, which turned out to be a vast amount of data to crunch. But interestingly in our testing phases when we worked with only 5 weeks of data, the recommendations were remarkably good. Of course, whether this is true elsewhere depends on the nature and size of the institution. But it’s certainly worth investigating.

If the API is to work on the shared service level, then we need more (but potentially not many more) representative libraries to aggregate data from in order to ensure that recommendations aren’t skewed to represent one institution’s holdings, course listings or niche research interests, and can support different use cases (i.e. learning and teaching).
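The threshold point is easier to see with a toy version of a ‘people who borrowed this also borrowed…’ recommender. The following is a minimal sketch, assuming anonymised (borrower, item) loan pairs and a simple co-borrowing count; it is not the SALT code or Dave Pattern's algorithm, and the function names and sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_borrow_counts(loans):
    """Count, for each pair of items, how many distinct borrowers took out both."""
    items_by_borrower = defaultdict(set)
    for borrower, item in loans:
        items_by_borrower[borrower].add(item)

    counts = defaultdict(lambda: defaultdict(int))
    for items in items_by_borrower.values():
        for first, second in combinations(sorted(items), 2):
            counts[first][second] += 1
            counts[second][first] += 1
    return counts


def recommend(counts, item, threshold=10):
    """Return items co-borrowed with `item` by at least `threshold` borrowers,
    most frequently co-borrowed first (ties broken alphabetically)."""
    related = counts.get(item, {})
    ranked = sorted(related.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, shared in ranked if shared >= threshold]


if __name__ == "__main__":
    # Hypothetical anonymised loans: (borrower id, item id).
    sample_loans = [
        ("u1", "A"), ("u1", "B"),
        ("u2", "A"), ("u2", "B"),
        ("u3", "A"), ("u3", "C"),
    ]
    counts = co_borrow_counts(sample_loans)
    print(recommend(counts, "A", threshold=2))  # only the well-supported item
    print(recommend(counts, "A", threshold=1))  # lower threshold adds the long tail
```

Lowering `threshold` is what lets rarely borrowed items through, which is the trade-off described above: more long tail, less certainty about relevance.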