This presentation was provided by Peter Vlahakis and Dan Paskett, both of ITHAKA/JSTOR, during the NISO webinar, Tracing Discovery and Subsequent Use, held on Wednesday, December 6, 2017.
Vlahakis From the Perspective of the Platform Provider
1. From the Perspective of
the Platform Provider
December 6, 2017
Peter Vlahakis, Dan Paskett
2. Dan Paskett
Director, JSTOR Forum Outreach Coordinator
New York, NY
dan.paskett@ithaka.org
Peter Vlahakis
Product Manager, JSTOR
Ann Arbor, MI
peter.vlahakis@ithaka.org
3. ITHAKA is a not-for-profit organization that helps the academic
community use digital technologies to preserve the scholarly record
and to advance research and teaching in sustainable ways.
JSTOR is a not-for-profit
digital library of academic
journals, books, and
primary sources.
Ithaka S+R is a not-for-profit
research and consulting
service that helps academic,
cultural, and publishing
communities thrive in the
digital environment.
Portico is a not-for-profit
preservation service for
digital publications, including
electronic journals, books,
and historical collections.
Artstor provides 2+ million
high-quality images and
digital asset management
software to enhance
scholarship and teaching.
4. Platform Overview
Databases/Collections
JSTOR
• Journals, Books, Research Reports, Primary
Sources
Artstor Digital Library
Primary Sources
• Global Plants
• Struggles for Freedom – South Africa
• World Heritage Sites – Africa
Content Types
• Books
• Images
• Journals
• Research Reports
• Primary Source items
• Images
• Pamphlets
• Plant specimens
5. User Paths to
JSTOR
1. Discovery Landscape
2. Motivations
3. Analysis & Use Case
4. Challenges
5. Improvements
6. Discovery Landscape
Many types of content and technology integration requirements to
support outside-in discovery.
• Indexed/library discovery services
• EDS, Summon, Primo, WCDS
• A&I databases
• Publisher requests
• Link resolvers / knowledge bases
• Search API (federated search)
• CrossRef
• Social Media
• Web search
• Google, Bing
• “Academic” search
• Google Scholar, Naver Academic
• Emerging products/technologies
• Browzine
• Library websites, course
management tools
• Homegrown library solutions
7. Activities supporting
the discovery and
linking ecosystem
Table-Stakes Integration Points
• Metadata and full-text feeds to central
indexes
• Crawl infrastructure with different
requirements for different services
(Scholar)
• KBART files for knowledge bases
• Library entitlement feeds
• CrossRef registration
• Inbound link resolution
• One-off requests for special file formats
and custom content collections
• Data platform investments (data
warehouse)
• Support, support, support
9. Why learn more
about user behavior
within the discovery
ecosystem?
Motivations
• Help libraries leverage the value of their
investments with ITHAKA/JSTOR and
library services
• Deliver key insights to our participants
on user paths to finding/accessing
JSTOR content
• Drive usage, improve user experience,
be proactive about solving problems
• Supporting these integrations is
expensive, both in development time
and human resource costs
• Identify trends and validate investments
made towards improving discovery
integrations
13. Tools we’ve used to
capture and analyze
signals
Integration & User Path Analysis
• Adobe SiteCatalyst
• Data warehouse canned reports
• Referrals data tied to usage through event
login
• Tableau
• Webmaster tools
• In-house tools for real-time analysis
• Custom reports (high cost)
• Discovery (user tests)
16. Signals we’ve used in traffic
analyses…
Monitor Integrations
• Referrers from web search
• Sign of crawl health, algorithm updates?
• Traffic from discovery services
• Are content feeds being sent, indexed quickly?
• Metadata evaluation
• Are JSTOR indexing guidelines presenting
problems for link resolving?
User Path Analysis
• Share referral data with libraries
• Improvements to authentication and
linking workflows (Google CASA,
browser pairing)
• Promote direct linking when other
linking protocols are ineffective
• Identify opportunities for formal
linking partnerships
• User paths by content type?
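One way to use referral traffic as a health signal for crawls and content feeds is to compare each day's referral count per source against a trailing average and flag sharp drops. The following is a minimal sketch under stated assumptions, not a description of JSTOR's tooling: the service names, the counts, the 7-day window, and the 50% threshold are all hypothetical.

```python
from statistics import mean

def flag_traffic_drops(daily_counts, window=7, threshold=0.5):
    """Flag days whose referral count falls below threshold * trailing mean.

    daily_counts maps a referral source name to a list of daily counts.
    The window and threshold defaults are illustrative assumptions.
    """
    alerts = []
    for service, counts in daily_counts.items():
        for i in range(window, len(counts)):
            baseline = mean(counts[i - window:i])
            if baseline > 0 and counts[i] < threshold * baseline:
                alerts.append((service, i, counts[i], baseline))
    return alerts

# Hypothetical daily referral counts for two sources.
daily_counts = {
    "discovery-service-a": [100, 110, 95, 105, 98, 102, 101, 99, 12],
    "web-search": [500, 480, 510, 495, 505, 490, 500, 498, 502],
}

alerts = flag_traffic_drops(daily_counts)
for service, day, count, baseline in alerts:
    print(f"{service}: day {day} had {count} referrals vs. trailing mean {baseline:.1f}")
```

A flagged drop is only a prompt to investigate (a stalled feed, a crawl problem, a search algorithm update); it does not identify the cause by itself.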
17. “If we have data, let’s look at data. If all
we have are opinions, let’s go with
mine.”
Jim Barksdale, former Netscape CEO
Source: https://www.tibco.com/blog/2013/06/28/19781/
18. Small Case Study
Experimenting with metadata investments
Hypothesis: adding our semantic index terms to our distributed
metadata will improve discovery and usage of JSTOR content in
these services.
JSTOR Thesaurus: semantic index that we’re building up to enrich the
connections and discoverability of JSTOR content both within and
outside of JSTOR
How can we leverage the value of this beyond our own platform?
Metadata enrichment! But the upfront cost of a fully dynamic
implementation is large. What will the investment return?
Let’s run a test…
19. Small Case Study
Experimenting with metadata investments
Experiment Setup
1. Created test and control groups of articles, then distributed the test
group's article metadata, enriched with JSTOR Thesaurus terms, to several
major services
2. Using log data and vendor websites to locate as many “origin”
identifiers as possible: referrers, domains, URL parameters
3. Watching impact on referral traffic from those services
4. The analysis is easier said than done…
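The step of locating "origin" identifiers can be sketched as a small classifier that checks a landing URL's parameters first and falls back to the referrer's domain. This is an illustrative sketch only: the parameter name `origin`, its value `example-discovery`, the `platform.example.org` host, and the contents of both lookup tables are assumptions, since real identifier values vary by service and are often undocumented.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical lookup tables; real values must be gathered from logs and vendor documentation.
ORIGIN_PARAM = "origin"  # assumed name of the link origin URL parameter
PARAM_TO_SERVICE = {"example-discovery": "Example Discovery Service"}
DOMAIN_TO_SERVICE = {
    "scholar.google.com": "Google Scholar",
    "www.google.com": "Google",
    "www.bing.com": "Bing",
}

def classify_hit(landing_url, referrer):
    """Return the referring service for one inbound request, or "unknown"."""
    # 1. Prefer an explicit origin identifier in the landing URL.
    params = parse_qs(urlparse(landing_url).query)
    for value in params.get(ORIGIN_PARAM, []):
        if value in PARAM_TO_SERVICE:
            return PARAM_TO_SERVICE[value]
    # 2. Fall back to the referrer's host name, if a referrer was sent.
    host = (urlparse(referrer).hostname or "").lower() if referrer else ""
    if host in DOMAIN_TO_SERVICE:
        return DOMAIN_TO_SERVICE[host]
    return "unknown"

print(classify_hit("https://platform.example.org/item/1?origin=example-discovery", ""))
print(classify_hit("https://platform.example.org/item/1", "https://scholar.google.com/scholar?q=x"))
print(classify_hit("https://platform.example.org/item/1", None))
```

Checking the URL parameter before the referrer is a design choice: link resolvers and proxy servers can replace or strip the referrer, while a parameter embedded in the link survives the redirect.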
20. Small Case Study
Experimenting with metadata investments
Results
1. TBD, but mixed results so far; complete analysis to be conducted at the
end of the trial period
2. Rolled up lists of referring domains, linked origin ids for each
participating discovery service
3. Data is not always available at granular enough levels to make
significant conclusions about specific scenarios
4. Working on removing confounding variables: have tracking indicators
changed? Has the market changed?
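Once referrals have been rolled up by discovery service, a first-pass comparison of the test and control groups might look like the sketch below. It computes referrals per article in each group and the relative difference between them. The service names, group sizes, and referral records are invented purely to show the calculation and are not results from the trial.

```python
from collections import Counter

def referrals_per_article(referrals, group_sizes):
    """Compare referral rates for test vs. control articles, per service.

    referrals is a list of (service, group) pairs, one per referred visit;
    group_sizes gives the number of articles in each group.
    """
    counts = Counter(referrals)
    services = sorted({service for service, _ in counts})
    summary = {}
    for service in services:
        test_rate = counts[(service, "test")] / group_sizes["test"]
        control_rate = counts[(service, "control")] / group_sizes["control"]
        # Relative lift is undefined when the control group had no referrals.
        lift = (test_rate - control_rate) / control_rate if control_rate else None
        summary[service] = {"test": test_rate, "control": control_rate, "lift": lift}
    return summary

# Hypothetical input: 4 articles per group, referrals from two services.
group_sizes = {"test": 4, "control": 4}
referrals = (
    [("service-a", "test")] * 6 + [("service-a", "control")] * 4
    + [("service-b", "test")] * 2 + [("service-b", "control")] * 2
)

summary = referrals_per_article(referrals, group_sizes)
for service, row in summary.items():
    print(service, row)
```

A raw lift figure like this does not account for the confounding variables noted above, such as changed tracking indicators or market shifts, so it is a starting point rather than a conclusion.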
22. Some problems
we’ve encountered…
Examples
• Missing and inconsistent link origin
identifiers in URLs
• Identifiers, referrers obfuscated by
link resolvers, proxy servers
• Lack of available information on
database to domain/identifier mapping
• Locally hosted instances or domains
that do not indicate service
• Silently failing integrations – lack of
logging and monitoring
• Insufficient understanding of discovery
to delivery workflows
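To gauge how much traffic is affected by missing identifiers and obscured referrers, inbound requests can be bucketed by the attribution signals they carry. This is a hedged sketch: the `origin` parameter name, the `platform.example.org` and `discovery.example.edu` hosts, and the sample requests are hypothetical.

```python
from urllib.parse import urlparse, parse_qs

def attribution_gaps(hits, origin_param="origin"):
    """Return the share of hits with an origin id, a referrer only, or no signal.

    hits is a list of (landing_url, referrer) pairs; origin_param is an
    assumed parameter name. An empty parameter value counts as missing.
    """
    totals = {"has_origin_id": 0, "referrer_only": 0, "no_signal": 0}
    for landing_url, referrer in hits:
        params = parse_qs(urlparse(landing_url).query)  # blank values are dropped
        if origin_param in params:
            totals["has_origin_id"] += 1
        elif referrer:
            totals["referrer_only"] += 1
        else:
            totals["no_signal"] += 1
    n = len(hits)
    return {key: value / n for key, value in totals.items()} if n else totals

# Hypothetical inbound requests.
hits = [
    ("https://platform.example.org/item/1?origin=example-discovery", ""),
    ("https://platform.example.org/item/2", "https://discovery.example.edu/search"),
    ("https://platform.example.org/item/3", ""),
    ("https://platform.example.org/item/4?origin=", None),
]

shares = attribution_gaps(hits)
print(shares)
```

Tracking the "no_signal" share over time gives a rough measure of how much discovery traffic cannot be attributed to any service.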
23. What would help improve these
insights?
Some ideas…
• Industry recommendations or standards (NISO working group!)
• Address technical problems with common discovery to delivery paths
• Transparency into link origin identifier values and database domains via public
documentation
• Socialization of user discovery to delivery paths, understanding how
implementation impacts usage and metrics
• More collaboration in the discovery and linking ecosystem