My Research Defense
1. Mental Workload in Multi-Device
Personal Information Management
Manas Tungare
Advisory Committee:
Dr. Manuel Pérez-Quiñones
Dr. Stephen H. Edwards
Dr. Edward A. Fox
Prof. Steve Harrison
Dr. Tonya Smith-Jackson
Thursday, February 12, 2009
2. Talk outline
• 0 to ~45 min: Presentation & questions
• ~45 to 90 min: Additional comments, suggestions
OK to record audio?
Your questions/comments are welcome at any time.
5. State of the art
• Difficult to maintain files on 2+ machines
• Workaround: USB drives, email-to-self
• Multiple paper calendars are difficult to read
• Workaround: Online calendars
• Hard to enter phone numbers on phone
• Workaround: Sticky notes
6. General Hypothesis
• PIM strategies may result in high workload
• leading to increased perception of task
difficulty
• Alternate strategies may lead to lower
workload
7. Mental workload issues
• What is the mental workload incurred by users when
they are trying to use multiple devices for personal
information management?
• For those tasks that users have indicated are frustrating
for them, do the alternate strategies result in lower
mental workload?
• Are multi-dimensional subjective workload assessment
techniques (such as NASA TLX) an accurate indicator
of operator performance in information ecosystems?
8. Mental workload
• [...] “That portion of an operator’s limited
capacity actually required to perform a
particular task.” [O’Donnell and Eggemeier, 1986]
• Low to moderate levels of workload are
associated with acceptable levels of
operator performance [Wilson and Eggemeier, 2006]
• Measured using subjective measures or
physiological measures
9. Research Question 1
• RQ: Do alternate strategies impose
different levels of mental workload?
• Hypothesis: Alternate strategies lead to
lower mental workload than the standard
strategies
• Experiment: Compare mental workload for
tasks identified as difficult, and for their
respective workarounds
10. Research Question 2
• RQ: Are subjective assessments of mental
workload an accurate indicator of operator
performance in this domain?
• Hypothesis: Mental workload measured by
NASA TLX can be used to predict
operator performance
• Experiment: (Attempt to) correlate
workload assessments with operator
performance
11. Research Question 3
• RQ: Are both subjective measures of
workload (TLX) and physiological measures
(pupil diameter) sensitive to PIM tasks?
• What can we learn from changes in pupil
diameter in relation to sub-task
boundaries?
13. Survey
• N = 220
• Responses to free-form questions in survey
• 5 tag types defined a priori:
• Devices, tasks, problems, solutions, results
• Tags based on emergent codes
• Device=laptop, desktop
• Problem=syncFailed, conflictingEdits
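The Type=value tag format above lends itself to a simple tally. A minimal sketch in Python, assuming each coded response is stored as a list of tag strings. The singular type names Task, Solution, and Result are assumed by analogy with Device and Problem, and the sample responses are hypothetical, not survey data.

```python
from collections import Counter, defaultdict

# The five tag types defined a priori. "Device" and "Problem" appear on the
# slide; the other three singular names are assumptions.
TAG_TYPES = {"Device", "Task", "Problem", "Solution", "Result"}

def tally_tags(coded_responses):
    """Count how often each tag value occurs, grouped by tag type."""
    counts = defaultdict(Counter)
    for tags in coded_responses:
        for tag in tags:
            tag_type, _, value = tag.partition("=")
            if tag_type not in TAG_TYPES or not value:
                raise ValueError(f"Malformed tag: {tag!r}")
            counts[tag_type][value] += 1
    return counts

# Hypothetical coded responses, for illustration only.
sample = [
    ["Device=laptop", "Device=desktop", "Problem=syncFailed"],
    ["Device=laptop", "Problem=conflictingEdits"],
]
counts = tally_tags(sample)
print(counts["Device"].most_common())
```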
14. Experiment design
• Within subjects (repeated measures), in
two sessions 2 weeks apart (to minimize
learning effects)
• Complete block design
• Two-factor (task, level of system support)
• 6 treatments: 3 tasks ⨉ 2 levels of system
support
• Counterbalanced to minimize order effects
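The 3 ⨉ 2 treatment structure and one possible counterbalancing scheme can be sketched as follows. The slides do not say how orders were assigned, so the rotation below (a cyclic Latin square over tasks, with the support level alternating between sessions) is an assumption made for illustration, not the study's actual procedure.

```python
from itertools import product

TASKS = ["Files", "Calendar", "Contacts"]
LEVELS = ["L0", "L1"]

# 3 tasks x 2 levels of system support = 6 treatments.
TREATMENTS = list(product(TASKS, LEVELS))

def latin_square(items):
    """Cyclic Latin square: each item appears once in every position."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]

def schedule(participant_index):
    """Hypothetical assignment: rotate the task order by participant and
    alternate which support level is used in the first session."""
    task_order = latin_square(TASKS)[participant_index % len(TASKS)]
    first, second = LEVELS if participant_index % 2 == 0 else LEVELS[::-1]
    return {
        "session_1": [(task, first) for task in task_order],
        "session_2": [(task, second) for task in task_order],
    }

for p in range(3):
    print(p, schedule(p))
```

Every participant still sees all six treatments, which is what makes the design a complete block.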
15. Overview
          Files                            Calendar                   Contacts
Level 0   No support for file migration    Multiple paper calendars   No support for synchronization
Level 1   System supports file migration   Online calendars           Devices support synchronization
[Image: sample paper calendar page, "PIM Study - Home", week of January 5 to January 11, 2009, with entries "Team Outing", "Dentist's appointment", and "Michael's Little League game (tentative; confirm with Alex)"]
16. Sample size estimation
• After first 8 participants
Task        Cohen’s d    Sample size estimate
Files       d = 0.671    n = 9.778
Calendar    d = 0.528    n = 15.098
Contacts    d = 0.536    n = 14.672
All tasks   d = 0.602    n = 11.861
• Effect sizes = Medium to High for Overall
Workload
• Goal for sample size is 20
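For readers unfamiliar with the calculation, here is a rough sketch of how an effect size feeds a sample size estimate. The slide does not state the method, α, or power behind its estimates, so the normal-approximation formula and the defaults below (two-sided α = 0.05, power = 0.80) are assumptions, and the figures it produces are not expected to match the table.

```python
from math import ceil
from statistics import NormalDist, mean, stdev

def cohens_d_paired(x, y):
    """Cohen's d for paired samples: mean difference / SD of the differences."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / stdev(diffs)

def required_n(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a two-sided paired comparison.

    alpha and power are assumed defaults, not values taken from the slides.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ((z_alpha + z_beta) / d) ** 2

# Larger effects need fewer participants under this approximation.
for d in (0.5, 0.8):
    print(d, ceil(required_n(d)))
```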
17. Participants
• Knowledge workers recruited via email,
flyers, personal contacts and promises of
pizza
• Experienced in laptop & phone use
• N=11 (6 male, 5 female)
[Chart: number of participants (0 to 4) by age group: 18-21, 22-25, 26-30, 31-35]
19. Task familiarization
• 6 videos were made, related to tasks
• each between 2–6 minutes long
• 10 familiarization tasks required to be
performed before experimental tasks
• Watch videos
20. Files task
• Start on desktop
• Set of instructions to edit specific files
• Then move to laptop, edit more files
• Move back to desktop
• L0: using USB drives, email-to-self
• L1: using a Network drive
22. Calendar task
• Set of instructions to create, replace,
update, delete calendar entries
• “Today is …”
• Questions on availability and schedule
• L0: Paper calendars, home and work
• L1: Online calendars, home and work
24. Contacts task
• Set of instructions to create, replace,
update, delete contact records
• “You may/may not use your phone/laptop”
• L0: phone + laptop, no sync support
• L1: phone + laptop, with sync support
26. Measures
• Time on task
• captured by app that displays instructions
• Task performance metrics (vary by task)
• NASA TLX
• Pupillometric data from eye tracker
27. Why NASA TLX
• Higher correlation with performance
(concurrent validity) as compared to SWAT
and WP [Rubio & Díaz, 2004]
• Validated in several environments since
1988 [several, 1988-present]
• Sensitive to some differences not
discriminated by SWAT [Battiste 1988]
• Highest sensitivity among 4 scales [Hill 1989]
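As background, NASA TLX combines six subscale ratings into one overall workload score. The sketch below shows both the unweighted ("raw") mean and the standard weighted form, in which each subscale's weight is the number of times it was chosen across 15 pairwise comparisons. The slides do not say which variant the study used, and the ratings and weights in the example are hypothetical.

```python
# Subscale abbreviations: Mental, Physical, and Temporal Demand,
# Own Performance, Effort, Frustration.
SUBSCALES = ("MD", "PD", "TD", "OP", "EF", "FR")

def tlx_overall(ratings, weights=None):
    """Overall workload from six 0-100 subscale ratings.

    Without weights, returns the unweighted mean (raw TLX).
    With weights, returns the weighted mean; the weights must sum to 15,
    one point per pairwise comparison between subscales.
    """
    if weights is None:
        return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)
    total_weight = sum(weights[s] for s in SUBSCALES)
    if total_weight != 15:
        raise ValueError("weights must sum to 15")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / total_weight

# Hypothetical ratings and weights for one participant, illustration only.
ratings = {"MD": 60, "PD": 20, "TD": 40, "OP": 30, "EF": 50, "FR": 40}
weights = {"MD": 5, "PD": 0, "TD": 3, "OP": 2, "EF": 4, "FR": 1}
print(tlx_overall(ratings), tlx_overall(ratings, weights))
```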
28. Pupillometric data
• Pupil diameter can be used as an estimate
of mental workload [Beatty 1982]
• Task-Evoked Pupillary Response (TEPR)
• Physiological measure (not subjective)
• Continuous measure (unlike TLX)
• Post-processing is required
30. RQ1: Workload at L0 & L1
• Effort is significantly lower at α=0.05 for L1
than for L0 (ANOVA) for N=8
                        Mean L0   Mean L1   p value
MD: Mental Demand       48.9      40        0.1878
PD: Physical Demand     35.3      33        0.7271
TD: Temporal Demand     40.3      30.5      0.1197
OP: Own Performance     27.6      17.8      0.0604
EF: Effort              51.1      35.5      0.0382 ✓
FR: Frustration         38.2      25.8      0.0564
OW: Overall Workload    41.4      31.4      0.0666
31. Time on task
            L0 Mean (SD)   L1 Mean (SD)   p value
Files       2663 (802)     2309 (601)     0.394
Calendar    2754 (1677)    1786 (1077)    0.226
Contacts    2558 (1368)    1832 (1478)    0.377
32. RQ2: Performance predictor
• TLX OW not found to correlate highly
with time on task
• Pearson’s r: Workload ~ Time on Task
• r = 0.188 for Files
• r = –0.014 for Calendars
• r = 0.031 for Contacts
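The correlation reported above is Pearson's r between each participant's overall workload score and their time on task. A minimal sketch of that computation, using only the Python standard library; the workload and time values in the example are hypothetical and are not the study's data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)

# Hypothetical per-participant values, for illustration only.
workload = [35.0, 42.5, 50.0, 28.0, 61.0]
time_on_task = [2400, 2100, 2900, 2600, 2500]
print(round(pearson_r(workload, time_on_task), 3))
```

Values of r near zero, as on the slide, mean that workload scores carry little linear information about time on task.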
33. RQ2: Performance predictor
Pearson’s r
                        Files    Calendar   Contacts
MD: Mental Demand       0.271    –0.171     0.087
PD: Physical Demand     0.140    0.190      –0.226
TD: Temporal Demand     0.095    0.074      –0.254
OP: Own Performance     0.288    0.036      –0.086
EF: Effort              0.196    0.016      0.227
FR: Frustration         0.393    0.135      0.083
OW: Overall Workload    0.188    0.014      –0.031
• Further analysis at step-level, not task-level
34. Time on task: Files
[Figure: time taken (s, 0 to 400) by step # (1 to 10) for L0 and L1. Annotated steps: move from desktop to laptop (p = 0.1624); move from laptop to desktop (p = 0.1577)]
35. Time on task: Calendars
[Figure: time taken (s, 0 to 100) by step # (1 to 15) for L0 and L1, with data lookup steps and data entry steps marked]
36. Time on task: Contacts
[Figure: time taken (s, 0 to 200) by step # (1 to 6) for L0 and L1]
37. RQ3: TLX & Pupillometric
• Analyzing pupillometric data
• 100,000 data points per session @ 30 Hz
• Need to filter blinks
• Establish baseline; compute relative
changes
• Signal smoothing techniques
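The post-processing steps listed above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the pipeline used in the study: blinks are assumed to show up as missing or non-positive samples, gaps are filled by linear interpolation, the baseline is the mean of an initial window, and smoothing is a centered moving average. All function names and the sample trace are hypothetical.

```python
def interpolate_blinks(samples):
    """Fill blink samples (None or <= 0) by linear interpolation.

    Gaps at either end are filled with the nearest valid sample.
    """
    valid = [i for i, v in enumerate(samples) if v is not None and v > 0]
    if not valid:
        raise ValueError("no valid samples")
    out = list(samples)
    for i in range(valid[0]):
        out[i] = samples[valid[0]]
    for i in range(valid[-1] + 1, len(samples)):
        out[i] = samples[valid[-1]]
    for left, right in zip(valid, valid[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            out[i] = samples[left] + frac * (samples[right] - samples[left])
    return [float(v) for v in out]

def relative_change(samples, baseline_n):
    """Change relative to the mean of the first baseline_n samples."""
    baseline = sum(samples[:baseline_n]) / baseline_n
    return [(v - baseline) / baseline for v in samples]

def moving_average(samples, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical pupil radius trace (pixels); 0 and None stand for a blink.
raw = [40, 41, 0, None, 44, 46, 45]
clean = interpolate_blinks(raw)
print(moving_average(relative_change(clean, baseline_n=2), window=3))
```

At 30 Hz, a baseline of two seconds would correspond to baseline_n=60; the tiny window here only keeps the example readable.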
38. Initial results from
pupillometric data
[Figure: pupil radius (eye image pixels, 20 to 120) against time elapsed (seconds, 0 to 600), with markers S0 to S10]
41. Alternate strategies
• Lower mental workload ✓
• Lower time on task ✓
• Synced phones have more data entered ✓
• (Slightly) fewer errors
42. Observations
• Online calendars provided a frame of
reference at all times (highlighted day)
• A few chose not to sync calendars (in L1)
• None prepared for the transition until
asked to switch machines
• The step after “move now” takes a lot of
time — participants don’t realize they’re
missing information until they need it
43. Identify critical sub-tasks
• Files: Time-on-task was a highly
discriminative measure for the sub-task of
moving from one machine to another
• Pupillometric measure appears sensitive to
changes in workload across sub-tasks
• Calendar task: paper was faster for data
entry, online was faster for lookup
★ Optimize selectively, remove bottlenecks
44. Cross-task measure
• TLX can be used to study PIM tasks
• E.g. which of browsing or searching leads to
higher workload?
• E.g. does Tool A lead to lower workload
than Tool B?
45. Studying multiple devices
• Study each one individually?
• What happens at the transition …
• ...
47. Questions & comments
Thank you!
Note to self: Turn off audio recording before committee deliberation.