Slides from a webinar given on February 5th, 2013, organized by Comtech Services (http://comtech-serv.com/). Abstract as follows:
In the past, it was often difficult for information development teams to obtain quantitative data on how their content was used. In recent years, with the spread of online content delivery, it has become easier to obtain such data. Now, the challenge is how to interpret it in order to make content more effective.
In this webinar, Joe Pairman from HTC's User Education team will show how content usage data, ratings by users, and search query records can:
• Indicate appropriate vocabulary
• Contribute to taxonomy development
• Suggest areas of focus for content improvements
• Help to answer specific questions about designing effective content
How Just a Little Data Analysis Can Improve your Content
1. How just a little data analysis can improve your
content
Joe Pairman
Listening for movement in the mine. National Institute for Occupational Safety
and Health (NIOSH). www.flickr.com/photos/25069384@N03/2492849690/
2. Background
• DITA XML implementation at HTC: effective web content a primary driver
• From “How do we design for the web?” to “What can we learn from the
web?”
• Co-ordinated analytics and user feedback plan
• Main focus is improving content
• This presentation covers methods, tips, and lessons learned from that
• Exploration of ideas rather than a technical guide
Introduction How just a little data analysis can improve your content — Joe Pairman
3. Slide types
Ideas and overviews
Cautionary notes
How to
Tips and insights
4. Examples in this presentation
Online knowledge base of support articles for a fictitious e-reader device
http://commons.wikimedia.org/wiki/File%3AEbook_reader_icon.png By netalloy
(Open Clip Art Library image's page) [see page for license], via Wikimedia Commons
5. The predominant flavor of web analytics
“This is a fast-growing category that's generated tremendous interest in recent
years due to the advertising and marketing value derived from tracking and
understanding user behavior.”
Morville & Rosenfeld, Information Architecture for the World Wide Web, 3rd
Edition (emphasis added)
• Much web analytics aims to directly improve sales
• In contrast, content-based sites focus on delivering effective information
• Of course (for a commercial site), the goal is still sales, but indirectly
6. What can web data tell us about content?
• What people are searching for, and the language they use to search for it
• What they’re viewing and how long they’re staying there
• (With a ratings system) How much they like what they’re seeing
• (With a combination of metrics) What we can focus on for improvement
• What's the effect of particular qualities (graphics, word count, links, etc)
7. You need…
• Access to analytics data
• Significant body of homogeneous content, such as knowledge base,
established blog
• Significant views of that content
• Data such as searches, page views, ratings
8. What can’t web data tell us?
• How to design our content (it can suggest which things work better but in the
end we still need a coherent design)
• Why the patterns exist (interpretation is up to us)
• What the full context is
9. Ultimately
The data provides focus and pointers, not answers
11. What can search query data tell us?
• Top searches (so crucial content)
• The vocabulary that customers use
• The way that customers classify things
• And much more
Search terms How just a little data analysis can improve your content — Joe Pairman
12. External search vs. site search 1
Site search
• Pro: Users more likely to know what they’re looking for?
• Pro: A much wider range of data available
• Con: Increasingly, Google is where people search first
• Con: Poorer range and quality of results may drive people away from your site search
External search
• Pro: Potentially many more queries
• Pro: A much wider range of search terms
• Con: Still only those who made it to your site
• Con: Google encrypted search: now up to a third of queries may not have associated terms
13. External search vs. site search 2
• It’s possible that site search is used more for technical or specialized info
• But some argue against this
• Best way would be to actually compare external (referral) to site (local) terms
Rosenfeld, Louis. 2011. Search Analytics for Your Site. New York: Rosenfeld Media. www.rosenfeldmedia.com/books/searchanalytics/
• External is probably still the best way to get started
14. Processing search terms 1: Data collection
• Even if your content is only one section of the site, it’s best to get the whole
site’s search queries
• If there are a lot, try using a phrase to filter, such as "how to". Also filter out the obviously
irrelevant terms
• But if you do this, compare with other sources to make sure the results are not too skewed
15. Processing search terms 2: Common phrases
• Filter out small words: and, the, a
• Consider getting 2- and 3-word phrases too:
back up ≠ back + up
• Even at this stage the results may be very interesting
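A minimal Python sketch of this step, assuming the queries are already available as a list of strings. The sample queries and the stop words beyond "and", "the", and "a" are made up for illustration. Because stop words are dropped before phrases are formed, a phrase can span a removed word.

```python
from collections import Counter

# Small words to drop. The slide names "and", "the", "a"; the rest are assumptions.
STOP_WORDS = {"and", "the", "a", "to", "my", "on"}

def ngram_counts(queries, n):
    """Count n-word phrases across all queries, after dropping stop words."""
    counts = Counter()
    for query in queries:
        words = [w for w in query.lower().split() if w not in STOP_WORDS]
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts

# Made-up queries for a fictitious e-reader.
queries = [
    "how to back up my books",
    "back up notes",
    "turn up the brightness",
    "back up books to computer",
]

single_words = ngram_counts(queries, 1)
two_word_phrases = ngram_counts(queries, 2)
```

Here "up" appears in all four queries, but only the two-word count shows that three of them are about "back up", which is the point of the back up ≠ back + up note.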
16. Processing search terms 3: Categorizing
• Based on the frequent keywords, draft out categories. Not too granular; the
idea is to make big baskets to categorize quickly.
• Categorize the original search terms, based on these categories (automate
this!) Anything uncategorized goes in “Other”.
• Spot check your categorized terms so far.
• Look at “Other”, and think up new categories.
• Iterate a couple of times. Probably some manual categorization at the end.
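One way to automate the categorizing pass, sketched in Python. The category names, keywords, and queries are hypothetical, and plain substring matching is an assumption that will over-match in places (for example, "display" also matches "displayed"), which is one reason for the spot check.

```python
# Hypothetical "big basket" categories, each with keywords drawn from the frequent terms.
CATEGORIES = {
    "Backup and sync": ["back up", "backup", "sync"],
    "Display": ["screen", "display", "brightness"],
    "Storage": ["storage", "memory", "space"],
}

def categorize(query):
    """Return every category whose keywords appear in the query, or ["Other"]."""
    text = query.lower()
    matched = [name for name, keywords in CATEGORIES.items()
               if any(keyword in text for keyword in keywords)]
    return matched or ["Other"]

queries = ["back up my books", "screen too dark", "not enough space", "recommend a book"]
by_query = {q: categorize(q) for q in queries}

# The "Other" basket is the input for thinking up new categories on the next pass.
uncategorized = [q for q, cats in by_query.items() if cats == ["Other"]]
```

Each iteration then amounts to editing CATEGORIES and rerunning until "Other" is small enough to finish by hand.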
17. Using search data 1: Prioritization
• Do you have gaps? Are you putting energy into the right places?
18. Using search data 2: Language
• Based on your categories, look into the language that people actually search for
most:
display or screen?
storage, memory, or just space?
• Best place for frequent terms is page title; next is intro paragraph
• After that, try to get terms into body of the page.
• Last resort is index or other non-visible keywords (but that’s mostly for internal site
search, not external searches)
• Strike a balance between using a range of terms and “stuffing”
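To compare competing terms for the same concept, a small Python sketch along these lines may help. The synonym groups follow the examples on this slide; the queries are made up, and counting whole words only is an assumption.

```python
from collections import Counter

# Candidate terms per concept, following the slide's examples.
SYNONYM_GROUPS = {
    "display": ["display", "screen"],
    "storage": ["storage", "memory", "space"],
}

def term_usage(queries, terms):
    """Count how many queries contain each term as a whole word."""
    counts = Counter({term: 0 for term in terms})
    for query in queries:
        words = set(query.lower().split())
        for term in terms:
            if term in words:
                counts[term] += 1
    return counts

# Made-up queries for illustration.
queries = [
    "screen too dark", "clean the screen", "display settings",
    "free up space", "not enough space", "memory full",
]

display_terms = term_usage(queries, SYNONYM_GROUPS["display"])
preferred = display_terms.most_common(1)[0][0]  # most frequently searched term
```

The most frequent term is the candidate for the page title; the others can still go in the intro paragraph or body.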
19. Using search data 3: Classification
• How do your site users classify subject areas?
For example, a UI-driven category of “Sharing” might not match users’
distinct searches for recommend a book and sync notes
• If designing from scratch (or big revamp) this work should probably come first
• Search terms seem particularly amenable to a flat, “tagging” approach, but
can be informative no matter the approach
20. Other avenues for exploration
• Segmentation by screen size / geography / language
• Social media monitoring
• Further site search data such as audience and searches with no results
22. Food for thought
[Chart (simulated data): page views per page]
Page views and time on page How just a little data analysis can improve your content — Joe Pairman
23. High (unique) page views
• Some indication of what's popular
• Compare with search keyword categories, to identify gaps
• Doesn’t identify whether the pages are doing a good job, or even if they’re
actually the things users were looking for
24. Low (unique) page views
• Generally could indicate candidate for removal, but...
• Could be effective information on a “niche” topic
• Could be useful but not findable
25. Time on page
• Seems appealing at first — longer means better (up to a point)?
• But people can just leave a page open
• Some pages might be harder to read than others, so take longer?
• Some topics just deeper than others
• However, low time on page could be useful...
26. Time on page correlates with related keywords
• When people land on a page that wasn’t what they wanted, they don’t tend to
stay long:
• Pages with average time of less than a minute could be flagged.
• Though tip-style pages may have short time on page but still be popular.
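The flagging rule can be sketched in a few lines of Python. The page titles, times, and the idea of keeping a hand-maintained set of tip-style pages are assumptions for illustration.

```python
ONE_MINUTE = 60  # threshold in seconds, taken from the slide

def flag_short_visits(avg_seconds_by_page, tip_pages=frozenset()):
    """Return pages with average time under a minute, skipping known tip-style pages."""
    return [title for title, avg_seconds in avg_seconds_by_page.items()
            if avg_seconds < ONE_MINUTE and title not in tip_pages]

# Made-up average time on page, in seconds.
avg_time_on_page = {
    "Back up your books": 145,
    "Sync notes": 38,
    "Tip: turn pages faster": 25,
}

flagged = flag_short_visits(avg_time_on_page, tip_pages={"Tip: turn pages faster"})
```

A flagged page is only a prompt to check which keywords bring people to it, not a verdict on the page.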
28. What can ratings tell us?
• Do people like the page or not? (For whatever reason.)
• Can be a good metric, when combined with other data. A simple example:
High page views, high positive ratings ratio: Good
Low page views, high positive ratings ratio: Could be helpful info on a niche subject, or perhaps is hard to find
High page views, low positive ratings ratio: Needs improvement
Low page views, low positive ratings ratio: Possible candidate for removal?
Page ratings How just a little data analysis can improve your content — Joe Pairman
29. Cautions about ratings
• Avoid assumptions. “Not helpful” doesn’t always mean the page content is
unsuitable for its purpose.
• Don’t use in isolation.
• Combine with qualitative data if at all possible. Comments, usability studies,
social media monitoring, etc.
30. What you need…
• A rating per page
• Should have at least ability to rate positively and negatively (not just "like",
which is dubious - people don't even remember what they liked and why)
• Not really about lengthy surveys — they are a separate thing and require a lot
more preparation
31. Getting a better response rate
• Keep the ratings system as simple as possible
• If there’s the chance to provide a comment, make sure this shows up after a
rating is selected
Kohavi, R; Henne, R; and Sommerfield, D: Practical Guide to Controlled Experiments on the Web: Listen to Your Customers not to the HiPPO (Slides from talk on Controlled Experiments).
www.exp-platform.com/Documents/controlledExperimentsHippoEbay.pdf
32. How to prepare ratings data
• Make sure it's comparable — i.e. don't compare product section to support
section
• If it's binary — helpful or not — divide positive by negative:
756 helpful divided by 230 not helpful gives a helpfulness ratio of 3.29
• Even if you have multiple negative options, sum them and do the same, though
hang on to the source data, as it could be useful
• You end up with a list of pages, ranked by their helpfulness ratio
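The calculation can be sketched in Python as follows. The first page uses the figures from this slide; the other pages and their counts are made up, and returning None when a page has no negative ratings is an assumption about how to handle division by zero.

```python
def helpfulness_ratio(helpful, not_helpful):
    """Positive ratings divided by negative ratings; None if there are no negatives."""
    if not_helpful == 0:
        return None  # ratio undefined; handle these pages separately
    return helpful / not_helpful

# page -> (helpful count, list of counts for each negative option)
ratings = {
    "Back up your books": (756, [230]),      # figures from the slide
    "Adjust brightness": (120, [40, 25, 15]),  # made up, three negative options
    "Free up space": (90, [100, 50]),          # made up
}

# Sum multiple negative options before dividing.
ratios = {page: helpfulness_ratio(pos, sum(negs)) for page, (pos, negs) in ratings.items()}

# Pages ranked from most to least helpful, leaving out undefined ratios.
ranked = sorted((p for p in ratios if ratios[p] is not None), key=ratios.get, reverse=True)
```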
33. If few rate a page, do the ratings count?
• Response rate may correlate with helpfulness ratio (so don’t ignore pages
with low response rate)
[Chart (simulated data): response rate plotted against helpfulness]
• Response rate is a useful metric in itself
35. A combination of metrics could indicate …
• Which pages need to tackle their subject more effectively
• Which pages need to be more findable (similar to above but not the same)
• Which pages need to discourage wrong searches (different again)
• Which pages are candidates for removal
• Which pages work well (so are examples to follow)
Synthetic metrics dashboard How just a little data analysis can improve your content — Joe Pairman
37. Relative measures are fine
• What’s a good helpfulness ratio? How many page views do we need?
• Very hard to answer these kinds of questions (especially at first)
• Rather, focus on relative measures: which pages are comparatively weak or
strong
38. Calculating low, medium, & high rankings
• For each metric, create a column to show whether the page is in the bottom third,
middle third, or top third
• In Excel, use something like this:
=IF(RANK(AC2,AC:AC)>ROUND(COUNT(AC:AC)*2/3,0),"Low",IF(RANK(AC2,AC:AC)>ROUND(COUNT(AC:AC)*1/3,0),"Medium","High"))
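For anyone working outside Excel, a rough Python equivalent of the formula above might look like this. It assumes the metric values are in a plain list; the page view figures are made up. As with Excel's RANK, the highest value gets rank 1 and tied values share a rank.

```python
def tertile_labels(values):
    """Label each value "Low", "Medium" or "High" by its descending rank."""
    n = len(values)
    labels = []
    for v in values:
        # Rank 1 is the highest value; ties share the same rank.
        rank = 1 + sum(1 for other in values if other > v)
        if rank > round(n * 2 / 3):
            labels.append("Low")
        elif rank > round(n / 3):
            labels.append("Medium")
        else:
            labels.append("High")
    return labels

# Made-up page views for six pages.
page_views = [900, 850, 400, 390, 120, 80]
labels = tertile_labels(page_views)
```

The rank calculation compares every value with every other, which is fine for a few thousand pages; sorting once would be the usual choice for much larger sets.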
39. Synthesizing metrics
• Indicators for Improve searchability: High helpfulness ratio, low page views,
and response rate at least medium.
40. Ratings with other metrics
• Improve content? — Low helpfulness, and page views are at least medium
• Improve searchability? Low page views, high helpfulness, and response rate at
least medium
41. Ratings with other metrics
• Unrelated searches — may be indicated by low time on page > check
keywords for these (remember tip-type pages may have low time on page too)
• Consider getting rid of — low page views, and low or medium helpfulness
42. Ratings with other metrics
• Good topics — high helpfulness, and at least medium response rate and page
views
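The indicators on slides 39 to 42 can be collected into one function. This Python sketch assumes each metric has already been labeled "Low", "Medium", or "High"; the function name and the indicator wording are illustrative.

```python
def indicators(page_views, helpfulness, response_rate, time_on_page):
    """Return the indicators that apply, given "Low"/"Medium"/"High" labels per metric."""
    at_least_medium = {"Medium", "High"}
    found = []
    if helpfulness == "Low" and page_views in at_least_medium:
        found.append("Improve content?")
    if page_views == "Low" and helpfulness == "High" and response_rate in at_least_medium:
        found.append("Improve searchability?")
    if time_on_page == "Low":
        # Tip-type pages may also land here; check the keywords before acting.
        found.append("Unrelated searches?")
    if page_views == "Low" and helpfulness in {"Low", "Medium"}:
        found.append("Consider removing?")
    if (helpfulness == "High" and response_rate in at_least_medium
            and page_views in at_least_medium):
        found.append("Good topic")
    return found

# A helpful page that few people find.
example = indicators(page_views="Low", helpfulness="High",
                     response_rate="Medium", time_on_page="Medium")
```

A page can trigger more than one indicator, so the function returns a list rather than a single label.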
43. Further research into a (potential) problem page
• Does it really have a problem? For example the time on page may be low, but
ratings very good. Is it a short, tip-style page?
• How do people get there? Where do they go when they leave? (Search terms,
navpaths, exits.)
• Is there anything the good pages have in common that the problem ones
don’t? (See next section … )
45. Ratings ratios for answering specific questions
• Are pages with graphics more helpful?
• Is it better to have more subtopics on a page?
• Does the number of links on a page affect bounce rate?
Specific content attributes How just a little data analysis can improve your content — Joe Pairman
46. Looking at relationships
• Excel CORREL function (0.3 or above is respectable)
• Scatter chart, with optional trend line
• But remember that correlation is not causation!
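Excel's CORREL returns the Pearson correlation coefficient, which is straightforward to compute by hand. A minimal Python sketch, with made-up figures for graphics per page and helpfulness ratio:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up data: one entry per page.
graphics_per_page = [0, 1, 1, 2, 3, 4]
helpfulness_ratio = [1.1, 1.8, 1.2, 2.5, 2.1, 3.0]

r = pearson(graphics_per_page, helpfulness_ratio)
```

The function divides by zero if either list is constant, so a metric that is the same on every page needs to be excluded first.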
47. Correlating with XHTML / XML structure
• For example, pages with more graphics:
<img> or perhaps <fig>
• More subtopics on a page:
<h2> or perhaps use information from DITA maps
• Several ways to automate this: Python with the lxml library is powerful and not
too intimidating
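As a rough illustration of the counting step, here is a Python sketch. To stay dependency-free it uses the standard library's xml.etree.ElementTree rather than lxml (lxml's etree module offers a largely compatible API). The sample page is made up, and the approach assumes well-formed XHTML.

```python
import xml.etree.ElementTree as ET

def count_elements(xhtml, names):
    """Count elements by local tag name, ignoring any XML namespace."""
    root = ET.fromstring(xhtml)
    counts = {name: 0 for name in names}
    for element in root.iter():
        # Namespaced tags look like "{http://www.w3.org/1999/xhtml}img".
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name in counts:
            counts[local_name] += 1
    return counts

# A made-up support page.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h1>Back up your books</h1>
    <h2>Back up to a computer</h2>
    <p><img src="usb.png" alt="USB cable"/></p>
    <h2>Back up to the cloud</h2>
    <p><img src="cloud.png" alt="Cloud icon"/></p>
  </body>
</html>"""

counts = count_elements(page, ["img", "h2"])
```

Running this over every page gives a per-page count of graphics and subtopics that can be lined up against the helpfulness ratio.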
48. Bounce rate
• Why should we try to keep people on the site? Don't we want to give them
the answer and then have them leave satisfied?
• However, bounce rate can indicate things like whether links are being used —
(correlate links on page to bounce rate)
49. Combining ratings with non-web data
• Assign human-judged ratings and see how they match up. (Is a particular
word usage important? Friendly style?)
• (For support content) Matching to support call issues. What types of pages
are used more on the web vs. called about?
51. Web data in the whole organization
• Content teams should have access to the data
• Can not only improve content but provide valuable feedback for other groups
in the organization
• Resourcing may require persuasion
• Potential legal issues may need to be addressed
• Once we have the data, we need to treat it responsibly
Next steps How just a little data analysis can improve your content — Joe Pairman
52. Schedule
• Search terms — every six months
• Synthetic metrics dashboard — every month or two
• Specific questions — as necessary
53. General principles
• Always present data in terms of the question it’s aiming to answer (though it’s
good to explore the data first)
• Surprises are good. They indicate that you're not just confirming your
prejudices.
• Don't assume that your data answers the question. Be very suspicious. Use
all other sources possible. And use common sense.
• Watch your resources.
• Analytics is not going to write your content or guarantee its success. And it's
reactive — only measures what's there, not what could be there.
55. Useful resources
• Search Analytics for Your Site, by Louis Rosenfeld — a thorough and thought-
provoking investigation of applications for internal site search data (Also see
slide deck with some key points at the same link.)
www.rosenfeldmedia.com/books/searchanalytics/
• Best Practices for “Was this helpful?” — a discussion about the design of
page ratings systems:
www.ixda.org/node/24101
• For “Was this page helpful” data, should I take response rate into account? —
a question with some useful comments and answers:
stats.stackexchange.com/questions/46428/for-was-this-page-helpful-data-should-i-take-response-rate-into-account
Further information How just a little data analysis can improve your content — Joe Pairman
56. A simple synthetic metrics dashboard — steps
In Excel:
1. Get data from each source such as your analytics tool and your ratings database. Get the
data in any format that Excel can open.
2. Combine the data from different sources. Use VLOOKUP formula if the value you’re matching
on is to the left of other values; INDEX and MATCH if not. If matching on page title, remember to
allow for any underscores / percent encoded characters / garbled characters.
3. Calculate rankings for key metrics. See slide 38. An example formula:
=IF(RANK(AC2,AC:AC)>ROUND(COUNT(AC:AC)*2/3,0),"Low",IF(RANK(AC2,AC:AC)>ROUND(COUNT(AC:AC)*1/3,0),"Medium","High"))
4. Set synthetic metric indicators. See slide 39. An example formula:
=IF(AND(AC2="Low", OR(L2="High", L2="Medium"), OR(N2="High", N2="Medium")), "1","")
Or, get your data as CSV/TSV, do steps 2-4 with a Python script, write to a CSV file, and then
open the result in any spreadsheet package.
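For the Python route, the combining step (step 2) might be sketched as below. The two CSV strings stand in for exported files, and the column names are assumptions. Titles are normalized before matching to allow for underscores and percent-encoded characters.

```python
import csv
import io
from urllib.parse import unquote

def normalize_title(title):
    """Decode percent-encoding, turn underscores into spaces, collapse spaces, lowercase."""
    return " ".join(unquote(title).replace("_", " ").split()).lower()

# Stand-ins for two exported files; the column names are assumptions.
analytics_csv = "page,page_views\nBack_up_your_books,900\nFree%20up%20space,120\n"
ratings_csv = "title,helpful,not_helpful\nBack up your books,756,230\nFree up space,90,150\n"

views = {normalize_title(row["page"]): int(row["page_views"])
         for row in csv.DictReader(io.StringIO(analytics_csv))}

combined = []
for row in csv.DictReader(io.StringIO(ratings_csv)):
    combined.append({
        "title": row["title"],
        "page_views": views.get(normalize_title(row["title"])),  # None if no match
        "helpful": int(row["helpful"]),
        "not_helpful": int(row["not_helpful"]),
    })

# Write the merged rows as CSV text, ready to open in any spreadsheet package.
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=["title", "page_views", "helpful", "not_helpful"])
writer.writeheader()
writer.writerows(combined)
```

With real exports, replacing the io.StringIO objects with opened files is the only change needed; rows with page_views of None point to titles that still fail to match.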