Smart eBooks: ScentIndex and ScentHighlight research published at VAST2006
1. Ed H. Chi, Lichan Hong, Julie Heiser, Stu Card
Palo Alto Research Center
The user study portion of this research has been
funded in part by the ARDA NIMD program.
Ed H. Chi VAST 2006 1
2. Ed H. Chi, Lichan Hong, Julie Heiser, Stu Card,
Michelle Gumbrecht
Palo Alto Research Center
3. Reading is an essential human activity.
• Giant leaps forward are marked by new and better ways to
find, correlate, and comprehend information.
Ed H. Chi VAST 2006 Copyright 2004 PARC 3
4. Many books digitized in the digital library effort.
• Amazon, Google, and CMU’s Million Book Project.
Intelligence analysts spend an enormous amount of time reading! [Pirolli, Lee, Card]
[Figure: recall versus time, marking 75% (current) and 90% (goal).]
5. Subject Indexes as a new reading “device”
• Invented in the 15th Century [Dewar98]
• Method of design refined through the centuries
Peter Heylyn's 1652 Cosmographie in Four Bookes
6. Instead of generating new indexes using IR techniques, why not enhance existing ones?
Take advantage of centuries of experience in building subject indexes.
8. Readers need help in directing their attention to the passages
most relevant to their topic of interest.
Idea: conceptually highlight passages and keywords that
are related to user search keywords.
9. User first types search keywords: “anthrax symptoms”
Conceptually highlight any relevant passages and keywords
Draw user attention
10. Biohazard
• by Ken Alibek
• non-fiction retelling of his experiences working on biological
weapons in the former Soviet Union.
• 13 index pages in two columns, consisting of 829 entries
11. Associated Entries: underlined in red
Exact Matches: in red
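The two highlight styles can be read as a simple classification of index entries against the search keywords. The following is a minimal sketch only, assuming a precomputed relatedness score per (keyword, word) pair. The `association` dictionary, the `threshold` value, and the function name `classify_entry` are hypothetical and are not taken from the ScentIndex implementation.

```python
def classify_entry(entry, keywords, association, threshold=0.5):
    """Return 'exact', 'associated', or None for one index entry.

    `association` maps (keyword, word) pairs to a relatedness score.
    Both the mapping and the threshold are illustrative assumptions.
    """
    words = entry.lower().split()
    keys = [k.lower() for k in keywords]
    # Exact match: the entry contains one of the search keywords.
    if any(k in words for k in keys):
        return "exact"
    # Associated: some entry word is strongly related to a keyword.
    best = max(
        (association.get((k, w), 0.0) for k in keys for w in words),
        default=0.0,
    )
    return "associated" if best >= threshold else None


# Made-up scores, for illustration only.
association = {("anthrax", "spores"): 0.8, ("symptoms", "fever"): 0.6}
for entry in ["anthrax vaccine", "Q fever", "Moscow"]:
    label = classify_entry(entry, ["anthrax", "symptoms"], association)
    print(entry, "->", label)
```

In a renderer, "exact" entries would be drawn in red and "associated" entries underlined in red, matching the legend above.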
16. Goal: Compare how users find, compare, and comprehend
information using the ScentIndex and 3Book versus the
physical book.
It’s not clear to us that the ScentIndex would be better,
because:
• Unfamiliarity with 3Book Interface (page turning, clicking on page
numbers, use of search box)
• Inability to grasp the ScentIndex concept (reorganization might
be confusing, harder to read the index page on screen)
• Readability of the Screen (hard to read a large number of pieces
of text)
• Users might be very good at using the physical book index.
17. Study Design:
• Within-subject variables:
• Interface condition (ScentIndex vs. Physical Book Index), and
• Task Type (find, compare, comprehend),
• with the order of the interface used and expertise level as the
between-subject variables.
Subjects: 16 subjects (8 experts on the content, 8 novices)
Materials: subjects used a PC with two LCD monitors, and the
physical Alibek book.
18. Overall, the ScentIndex eBook performed better
than the physical book.
Faster Speed:
• Subjects using the ScentIndex were faster in
completing their tasks no matter whether they were
experts or novices, F(1,12)=12.96, p<.01.
More Accurate (marginally significant):
• Answers provided while using the ScentIndex
interface were more accurate, F(1,12)=3.991, p=.06.
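As a sanity check on how an F statistic with (1, 12) degrees of freedom relates to a p-value, the tail probability can be recomputed with the standard library alone. This is a sketch, not the authors' analysis code: it relies on the identity that F(1, n) is the square of a t variable with n degrees of freedom, and integrates the t density numerically with Simpson's rule. The function names and the step count are assumptions.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def f_test_p_value(f_stat, df_error, steps=2000):
    """Upper-tail probability for F(1, df_error), using F = t**2.

    Integrates the t density over [0, sqrt(F)] with Simpson's rule
    (`steps` must be even) and doubles it to get P(|T| < sqrt(F)).
    """
    upper = math.sqrt(f_stat)
    h = upper / steps
    total = t_pdf(0.0, df_error) + t_pdf(upper, df_error)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df_error)
    central_mass = 2 * (h / 3) * total
    return 1 - central_mass


print("speed effect:    p =", round(f_test_p_value(12.96, 12), 4))
print("accuracy effect: p =", round(f_test_p_value(3.991, 12), 4))
```

The speed effect falls below .01, while the accuracy effect lands between .05 and .10, which is why the latter is best read as marginal.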
20. “The difficulty seems to be, not so much that we publish unduly
in view of the extent and variety of present-day interests, but
rather that publication has been extended far beyond our present
ability to make real use of the record.” --- V. Bush
Indeed, our goal is to enhance current Browsing Interfaces
for more productive reading.
21. The user study portion of this research has been funded in part by
contract #MDA904-03-C-0404 to Stuart K. Card and Peter Pirolli from
the Advanced Research and Development Activity, Novel
Intelligence from Massive Data program.
We thank Jock Mackinlay for fruitful conversations about the
interaction of the eBook; Michelle Gumbrecht and Tan Lee for
running some of our experiments; Pam Desmond for help in the data
analysis; and Brian Tramontana for the video production.
Contact:
Ed H. Chi (chi@acm.org)
26. [Diagram: processing pipeline. Labels: Page Images, Page Textures, Renderer, Words + Locations, Text, Word Association Matrix, Sentence Structure, Scent Highlights. Operations: scan, sample, extract, OCR, compute, parse.]
28. Early proposal of an indexing system: Memex [Bush45]
Electronic Books: Rocket eBook, SoftBook Reader, DigiPaper
[Huttenlocher00], DjVu [DjVu Zone03], PDF [Adobe03], MS Reader
[MS03].
3D Electronic Books: SGI Demo Book [SGI93], WebBook [Card96],
British Library Turning the Pages [BL03], 3Book [Card03].
Computer help search systems such as those from Apple or Microsoft.
Google and AltaVista provide highlighting and searching.
Automatic Indexing in IR: use noun-phrases and parsers to create
indexes [Wacholder01, Nevill-Manning99]. Scatter/Gather [Cutting91].
Concept similar to: Word Co-occurrence [Schuetze99], InfoScent and
Spreading Activation [Chi01, Chi00].
29. HyperText Book Systems: SuperBook [Remde87] provides a dynamic
table of contents (TOC) with fisheye degree-of-interest (DOI).
30. Two issues:
• M (the word association matrix) is calculated using a 40-word window
• Caveat: Exact word matches do not always show up.
• Solution: Insert large values onto the diagonal
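The window and diagonal points can be illustrated with a small sketch. The slides state only that M uses a 40-word window and that large values are inserted onto the diagonal. The raw co-occurrence counting, the exact window boundaries, the function name `association_matrix`, and the factor of 10 used for the diagonal are all assumptions made for illustration.

```python
from collections import defaultdict


def association_matrix(tokens, window=40, diagonal=None):
    """Build a sparse, symmetric word co-occurrence matrix.

    Two words co-occur when they fall inside the same span of
    `window` consecutive tokens (an assumed reading of "40-word window").
    """
    m = defaultdict(float)
    for i, w in enumerate(tokens):
        # Pair w with the words that follow it inside the window.
        for v in tokens[i + 1 : i + window]:
            if v != w:
                m[(w, v)] += 1.0
                m[(v, w)] += 1.0
    # Insert a large value on the diagonal so that an exact word match
    # always outranks any merely associated word (the factor is assumed).
    if diagonal is None:
        diagonal = 10.0 * max(m.values(), default=1.0)
    for w in set(tokens):
        m[(w, w)] = diagonal
    return m


tokens = "anthrax spores cause fever and other symptoms".split()
m = association_matrix(tokens, window=3)
print(m[("anthrax", "spores")], m.get(("anthrax", "fever"), 0.0), m[("anthrax", "anthrax")])
```

Without the diagonal step, a word's association with itself would be zero under this counting scheme, which is one way exact matches could fail to show up.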
31. An experimenter without prior knowledge of how the ScentIndex
system works devised a total of 12 tasks.
The tasks were divided into two groups of six tasks each.
• Half of the subjects received one set of questions first; the other
half received the other set first.
• Tasks from one group were designed to be one-to-one equivalents of
the other group.
• Within each group of six tasks, two were Simple Fact Retrieval questions,
two were Dispersed Comparison questions, and two were Comprehension
questions.
32. Initial Survey (on computing and search experiences.)
4 expert and 4 novice subjects used the Book interface first, and the
other eight used the ScentIndex first.
Subjects were trained to use the ScentIndex immediately before they
needed to use it.
Each task was given on a separate sheet of paper. Subjects read and understood
each question completely before starting the task.
• Time limit for each task (simple retrieval=2min, comparison=4min,
comprehension=6min) with one-minute warnings.
• For each interface, subjects performed the simple fact retrievals first,
the dispersed comparisons second, and the comprehension questions
last.
Post Survey (comments, preferences).
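The procedure above can be laid out as a per-subject schedule. The sketch below is one possible counterbalancing, not the authors' actual assignment: it assumes interface order and task-set order are fully crossed within each expertise group, with two subjects per cell, and the set labels "A" and "B" are hypothetical. The time limits and the fixed task-type order come from the slide.

```python
from itertools import product

# Time limits in seconds, from the procedure (2, 4, and 6 minutes).
TIME_LIMIT_SEC = {"simple retrieval": 120, "comparison": 240, "comprehension": 360}
# Fixed task-type order within each interface.
TASK_ORDER = ["simple retrieval", "comparison", "comprehension"]


def build_schedule(per_cell=2):
    """Assign subjects to expertise x interface order x task-set order cells."""
    subjects = []
    cells = product(
        ["expert", "novice"],
        [("Book", "ScentIndex"), ("ScentIndex", "Book")],
        [("A", "B"), ("B", "A")],  # hypothetical labels for the two task sets
    )
    for expertise, interfaces, task_sets in cells:
        for _rep in range(per_cell):
            sessions = []
            for interface, task_set in zip(interfaces, task_sets):
                # Two tasks of each type, six tasks per interface.
                tasks = [(kind, TIME_LIMIT_SEC[kind]) for kind in TASK_ORDER for _n in range(2)]
                sessions.append({"interface": interface, "task_set": task_set, "tasks": tasks})
            subjects.append({"id": len(subjects) + 1, "expertise": expertise, "sessions": sessions})
    return subjects


schedule = build_schedule()
book_first_experts = sum(
    1 for s in schedule
    if s["expertise"] == "expert" and s["sessions"][0]["interface"] == "Book"
)
print(len(schedule), "subjects;", book_first_experts, "experts start with the Book")
```

Under these assumptions the schedule reproduces the stated totals: 16 subjects, with 4 experts and 4 novices using the Book interface first.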
33. Simple Fact Retrieval:
• The last naturally occurring case of WHICH virus occurred in Somalia in 1977?
• Who received a state award for developing a Q fever weapon?
Dispersed Comparison:
• What are the death rates of smallpox and tularemia? Which virus has a higher
death rate?
• What year did Russia open negotiations with Iraq for large fermentation vessels?
What year did Vladimir Kryuchkov become chairman of the KGB? Which
occurred first?
Comprehension:
• Pasechnik’s defection to the West had grave implications for the Soviet
biowarfare program. Match the person with the fact that describes how they’re
involved:
• Persons: Frolov, Chernyayev, Karpov, Vinogradov
• Facts: A. First told Alibek about Pasechnik’s defection. B. Deputy minister who
refused to sign formal diplomatic reply. C. Given demarche that said US have
“new information”, presumably given by Pasechnik. D. Told American visitors
that Pasechnik’s jetstream milling machine was for “salt”.
34. Time to Completion
Participants using the ScentIndex interface performed tasks faster than those
using the Book, F(1,12)=12.96, p<.01.
Many tasks not completed in the time allotted using the Book interface.
• 6/7 for simple retrieval,
• 7/8 for comparison,
• 3/5 for comprehension.
[Figure: ln(completion time) in seconds by task type (Simple, Dispersed, Comprehension) for Book and ScentIndex; y-axis from 3 to 6.]
35. Natural log transformation on the completion time
As predicted, experts performed tasks faster than novices overall
• (Expert Mean=4.85, S.D.=.212, Novice Mean=4.58, S.D.=.212, F(1,12)=17.7, p<.01.)
• There were no interactions.
Simple Retrieval < Dispersed Comparison < Comprehension, F(2,24)=204, p<.01.
[Figure: ln(completion time) in seconds by task type: Simple 3.722, Dispersed 4.987, Comprehension 5.435.]
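Because the analysis is on ln(completion time), the three values shown in the chart (3.722, 4.987, 5.435) are log-seconds. A short sketch of the back-transformation follows, assuming those values belong to Simple, Dispersed, and Comprehension tasks in that order, as the ordering result suggests. Exponentiating a mean of logs gives a geometric mean, not the arithmetic mean of the raw times. The time limits are the ones stated in the study procedure, and the variable and function names are illustrative.

```python
import math

# Mean ln(seconds) per task type, read off the chart (assumed mapping).
LN_MEANS = {
    "simple retrieval": 3.722,
    "dispersed comparison": 4.987,
    "comprehension": 5.435,
}
# Task time limits in seconds (2, 4, and 6 minutes).
TIME_LIMIT_SEC = {
    "simple retrieval": 120,
    "dispersed comparison": 240,
    "comprehension": 360,
}


def geometric_mean_seconds(ln_mean):
    """Back-transform a mean of natural-log times into seconds."""
    return math.exp(ln_mean)


for task, ln_mean in LN_MEANS.items():
    secs = geometric_mean_seconds(ln_mean)
    print(f"{task}: about {secs:.0f} s of a {TIME_LIMIT_SEC[task]} s limit")
```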
36. Converted the scores for each task to a percentage.
Scores (measured in points):
                   Simple Retrieval     Dispersed Comparison   Comprehension
ScentIndex eBook   M=1.88, S.D.=.342    M=1.88, S.D.=.269      M=1.77, S.D.=.284
Book               M=1.75, S.D.=.447    M=1.58, S.D.=.516      M=1.84, S.D.=.259
We found that users performed better using the ScentIndex, reaching
marginal significance F(1,12)=3.991, p=.06.
We found no difference between experts and novices.
37. Users overwhelmingly preferred the ScentIndex interface (15/16)
• “can search using keyword combinations”
• “clicking on page number to navigate”
• “highlighting enables faster scanning and skimming”
• “easier to compare index entries because it’s all on 1 page.”
Some users mentioned that they prefer paper for extensive reading.
Potential Future work:
• Compare with search engines (organize results by relevancy).
• Understand difference between this technique and keyword finding.
• Limit the page number list of each relevant index entry to only the pages
that are relevant to the keywords specified.