2. Interactive Search in Video & Lifelog Repositories
• Part 1: Interactive Video Search
Ø Search in video content: motivation and challenges
Ø Automatic video retrieval vs. interactive video search
Ø Tools for interactive search
§ Browsing, Navigation, Visualization, Similarity & Sketch-based Search
Ø Evaluation of IVS Tools
§ TRECVID, Video Browser Showdown (VBS)
Short break
• Part 2: Lifelogging
Ø Quantified Self
Ø Lifelog repositories
Ø Lifelogging techniques
Ø Interactive visualization
Klaus Schoeffmann, IEEE International Conference on Multimedia & Expo (ICME) 2016
4. Video Everywhere
• Ubiquitous use of videos nowadays
Ø Entertainment and commercials
Ø Social gaming (screencasts)
Ø Personal videos (family, kids, …)
Ø Sports documentation and analysis (e.g., GoPro)
Ø Product usage instructions (e.g., furniture)
Ø Surveillance (buildings, places, street, …)
Ø Health care and medical science (endoscopic procedures)
Ø Lifelogging
• Enormous amount of data, challenging to search!
5. Video – The Ultimate Media?
[Mary Meeker and Liang Wu, Internet Trends, D11 Conference, May 2013]
As of 2014, 300 hours of video are uploaded to YouTube every minute!
6. Video Cameras
• Increasingly powerful
Ø These days you can record 4K content with your mobile!
Ø Video sensors use auto-focus, object tracking, color correction, and image stabilization
Ø Storage space is not a big problem
§ Current smartphones have 128 GB of memory
§ NAS devices are cheaply available
Ø Network bandwidth has also dramatically increased over the years
§ Video streaming on the go is simple and common
§ LTE connections provide 30 Mbit/s and often much more!
8. Challenge: Finding Content
• Even with retrieval tools it is still challenging to find content later
Ø Especially if the content is not publicly available (and popular and annotated)
Ø Many problems with querying, in particular for non-experts
• Ultimate goal: make search as effective as for text
Ø Quickly find relevant content
Ø Compare to the interactivity of a textbook
§ Index, ToC, list of figures/tables, etc.
§ Change, extend, copy, bookmark, highlight, etc.
12. How a Novice Would Solve This
Novice users typically employ a file browser and a simple video player!
VCRs in the 1970s provided similar functionality!
[Figure: file explorer and video player]
15. How a Retrieval Expert Would Solve This
• Video retrieval tool with content analysis and search
• Query by
Ø Text, Concept, Example
• Automatic search
Ø Content-based data such as:
§ Text (e.g., metadata, ASR, OCR, transcripts, …)
§ Global features (e.g., color, texture, motion)
§ Local features and concepts (e.g., VLAD, BoVW, …)
Ø Ranked result list
IBM TRECVID 2007 Video Retrieval System [1]
18. A More Recent Video Retrieval Tool
[A. Moumtzidou et al., “VERGE: A Multimodal Interactive Video Search Engine”, Proc. of the 21st International Conference on MultiMedia Modeling (MMM 2015), Sydney, 2015]
• kNN similarity search based on VLAD vectors
• Concept detection with SVMs over five local descriptors (SIFT, SURF, ORB, …) plus PCA, or with CNNs
• Hierarchical keyframe clustering
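As a generic illustration of the kNN similarity-search building block (a minimal sketch, not VERGE's actual code), assuming one precomputed VLAD vector per keyframe:

```python
import numpy as np

def knn_search(query_vlad, keyframe_vlads, k=10):
    """Return the indices of the k keyframes nearest to the query.

    query_vlad: (d,) VLAD vector of the query image.
    keyframe_vlads: (n, d) matrix of precomputed keyframe VLAD vectors.
    """
    dists = np.linalg.norm(keyframe_vlads - query_vlad, axis=1)
    return np.argsort(dists)[:k]
```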
24. Common Video Retrieval Approach
[Figure: query → content-based search → ranked results]
Works well if
Ø users can properly express their needs,
Ø content features can sufficiently describe visual content, and
Ø computer vision can accurately detect semantics.
Unfortunately, in practice these assumptions do not hold.
26. Mind the Gap! (Performance Gap)
Ø Database affinity of concept classifiers
Ø Low performance in broad domains
Average precision: $AP = \frac{1}{R} \sum_{k=1}^{n} P(k)\,rel(k)$, where $P(k)$ is the precision after the first $k$ results, $rel(k) = 1$ if the $k$-th retrieved document is relevant (0 otherwise), and $R$ is the number of relevant documents.
TRECVID 2015 Semantic Indexing (60 concepts): median "inferred average precision" (infAP) = 0.24
In other words: more than 75% of results are wrong!
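Since infAP is derived from average precision, a small Python sketch of the plain (non-inferred) AP computation makes the 0.24 figure concrete; the function name is illustrative and not part of the TRECVID tooling:

```python
def average_precision(rel, num_relevant):
    """Average precision of one ranked result list.

    rel: 0/1 flags, rel[k-1] = 1 if the k-th result is relevant.
    num_relevant: total number of relevant documents (R).
    """
    hits, ap = 0, 0.0
    for k, r in enumerate(rel, start=1):
        if r:
            hits += 1
            ap += hits / k  # P(k), accumulated only at relevant ranks
    return ap / num_relevant if num_relevant else 0.0

# Example: one hit at rank 2, two relevant documents in total
print(average_precision([0, 1, 0, 0], 2))  # 0.25
```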
27. Mind the Gap! (Usability Gap)
Ø Query-by-concept
§ Which concept to use? Choose from a long list of results…
Ø Query-by-example
§ Typically no perfect example available.
Ø Query-by-sketch
§ Users are no artists ☺ (see also next slide)
Ø Query-by-text
§ How to describe a desired image by text?
A picture tells a thousand words. [Photo by marfis75]
How to describe a desired video clip by text?
28. Usability Gap: Needs More Focus on the User (Interface)!
Ø In some situations users cannot formulate a query
§ → provide exploratory search features!
§ For example: browsing, filtering, similarity search
Ø Users expect good results (on the first page!)
§ → use relevance feedback / active learning instead of long lists!
Ø Videos are dynamic
§ Static thumbnails are not informative, especially for long shots and self-similar content
§ → skims and visual summaries ("smart playback")
§ → sophisticated navigation & content structure visualization
Ø Shots have a temporal context
Ø Grid interfaces are not always the best choice
30. Interactive Video Search
• HCI community (novices)
Ø Methods for interactive search
Ø Human computation
Ø No content understanding, but simple to use
• Multimedia community (experts)
Ø Mostly automatic search
Ø Retrieval engine
Ø Complicated to use
• Mismatch between the two!
→ Combine HCI with CV and MIR for better search tools
31. User-Centric Exploratory Search
• Strongly integrate the user into the search process
Ø Assume a smart user
Ø Give him/her more control over the search process
§ Inspects and interacts
§ Selects the most meaningful tool for current needs, e.g.
• Content browsing/navigation
• Content visualization and summarization
• Ad-hoc querying (e.g., by sketch, filtering, ad-hoc example)
• Aspect-based exploration, parallel search paths
Ø Iterative: Search – Inspect – Think – Repeat
§ Exploratory search ("will know it when I see it")
§ Instead of "query-and-browse-results"
32. Aspects of Interactive Video Search (IVS)
• Navigation & Browsing
Ø Coarse navigation, fine navigation, browsing sequences/scenes/shots
• Different Query Types
Ø Text or concept, example image, example clip (similarity search), sketch, filter (spatial & temporal)
• Content Visualization
Ø Underlying structure, overview (ToC), abstracts/summaries, skims, similarity-based arrangements (e.g., by color)
• Dynamics & Convenience
Ø Smart playback, bookmarks, history
39. Relative Flow Dragging
• Video browsing by direct manipulation / relative flow dragging
• Background stabilization
[Pierre Dragicevic, Gonzalo Ramos, Jacobo Bibliowitcz, Derek Nowrouzezahrai, Ravin Balakrishnan, and Karan Singh. "Video browsing by direct manipulation", in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '08), ACM, pp. 237-246, 2008]
40. Relative Flow Dragging
• Evaluation with a user study
Ø 16 participants (18-44 years old)
Ø Direct comparison to seeker-bar navigation
Ø Navigation tasks, 2 videos (ladybug, cars)
§ "Find the position where the ladybug passes over marker X"
§ "Find the moment when car X starts moving"
Ø Flow dragging significantly faster (RM-ANOVA) by at least 250% (also significantly fewer errors)
[Pierre Dragicevic, Gonzalo Ramos, Jacobo Bibliowitcz, Derek Nowrouzezahrai, Ravin Balakrishnan, and Karan Singh. "Video browsing by direct manipulation", in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '08), ACM, pp. 237-246, 2008]
42. Scrubbing Wheel
• Requirements
Ø Simple and effective navigation on touchscreens
Ø Efficient navigation that allows for content search in both short and long videos
• Idea
Ø Improve navigation by using a circular navigation area
Ø Inspired by the Apple iPod device
[Klaus Schoeffmann and Lukas Burgstaller, "Scrubbing Wheel: An Interaction Concept to Improve Video Content Navigation on Devices with Touchscreens", in Proceedings of the IEEE International Symposium on Multimedia 2015 (ISM 2015), Miami, FL, USA, 2015, pp. 351-356]
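The published interaction design has more detail, but the core mapping, circular finger motion to a relative timeline offset, can be sketched as follows (all parameter names and the frames-per-revolution value are assumptions, not taken from the paper):

```python
import math

def wheel_delta_frames(x0, y0, x1, y1, cx, cy, frames_per_revolution=500):
    """Map a touch move on a circular control to a relative frame offset.

    (x0, y0) -> (x1, y1): consecutive touch points; (cx, cy): wheel center.
    One full revolution advances `frames_per_revolution` frames (assumed value).
    """
    a0 = math.atan2(y0 - cy, x0 - cx)
    a1 = math.atan2(y1 - cy, x1 - cx)
    delta = a1 - a0
    # Unwrap across the +/-pi boundary so a small move stays small
    if delta > math.pi:
        delta -= 2 * math.pi
    elif delta < -math.pi:
        delta += 2 * math.pi
    return int(delta / (2 * math.pi) * frames_per_revolution)
```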
47. Video Browser for the Digital Native
• "Temporal Semantic Compression" based on a tempo function and shot popularity (insight)
[Adams, Brett, Stewart Greenhill, and Svetha Venkatesh. "Towards a video browser for the digital native." Multimedia and Expo Workshops (ICMEW), 2012 IEEE International Conference on. IEEE, 2012]
48. Video Browser for the Digital Native
• User study with 8 participants
Ø Test configuration elements with two tasks (after presentation + 5 minutes of training)
§ (i) Browse a familiar movie to find scenes you remember
§ (ii) Browse an unfamiliar movie to get a feel for its story or structure
Ø Questionnaire with Likert-scale ratings
[Adams, Brett, Stewart Greenhill, and Svetha Venkatesh. "Towards a video browser for the digital native." Multimedia and Expo Workshops (ICMEW), 2012 IEEE International Conference on. IEEE, 2012]
51. Motion Layout: Direction + Intensity
• Motion vector (µ) classification into a motion histogram with K=12 equidistant motion directions (bins)
• Mapping to the hue channel
[Schoeffmann, K., Lux, M., Taschwer, M., & Boeszoermenyi, L. (2009, June). Visualization of video motion in context of video browsing. In Multimedia and Expo, 2009. ICME 2009. IEEE Int. Conf. on (pp. 658-661). IEEE.]
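A minimal sketch of this layout computation, assuming per-frame motion vectors are already available and an OpenCV-style hue range of 0-179 (both assumptions):

```python
import numpy as np

K = 12  # equidistant direction bins, as on the slide

def motion_histogram(motion_vectors):
    """Histogram of motion directions over one frame.

    motion_vectors: (m, 2) array of (dx, dy) motion vectors.
    Returns a K-bin histogram normalized to sum to 1.
    """
    angles = np.arctan2(motion_vectors[:, 1], motion_vectors[:, 0])  # -pi..pi
    bins = ((angles + np.pi) / (2 * np.pi) * K).astype(int) % K
    hist = np.bincount(bins, minlength=K).astype(float)
    return hist / hist.sum() if hist.sum() else hist

def bin_to_hue(b):
    """Map direction bin b (0..K-1) to an OpenCV hue value (0..179)."""
    return int(b / K * 180)
```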
53. Similarity Search (SOI) with Motion Layout
• SOI search: motion-based search by example sequence
Ø Using the motion direction histogram (Db) of a user-selected sequence
• Find the most similar sequences
Ø Compute the distance to every possible sequence of the same length
Ø Match if below a specified threshold
[Figure: motion layout (Db) from frame 1 to frame n, with three matches highlighted]
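A sketch of the sliding-window matching step, assuming one K-bin motion direction histogram per frame (the specific distance measure and threshold value are assumptions):

```python
import numpy as np

def soi_search(hists, start, end, threshold=0.5):
    """Find sequences whose motion histograms resemble hists[start:end].

    hists: (n, K) numpy array of per-frame motion direction histograms.
    Returns start indices of all windows whose summed per-frame L2
    distance to the query sequence falls below the threshold.
    """
    query = hists[start:end]
    length, matches = end - start, []
    for i in range(len(hists) - length + 1):
        if i == start:
            continue  # skip the query sequence itself
        d = np.linalg.norm(hists[i:i + length] - query, axis=1).sum()
        if d < threshold:
            matches.append(i)
    return matches
```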
54. Similarity Search (ROI) with Color Layout
• User selects a spatial region of interest (ROI)
• On search
§ Compute the Euclidean distance of frame F to every other frame f (according to the selected region)
§ Based on the color layout descriptor
[Figure: user-selected region of frame F compared against frames 1…n, e.g., d(F,1)=350, d(F,k)=8, d(F,n)=400]
[Schoeffmann, K., Taschwer, M., & Boeszoermenyi, L. (2010, February). The video explorer: a tool for navigation and searching within a single video based on fast content analysis. In Proceedings of the first annual ACM SIGMM conference on Multimedia systems (pp. 247-258). ACM.]
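The matching idea can be sketched as follows; as a simplified stand-in for the tool's color layout descriptor, each frame is assumed to be reduced to an 8×8 grid of mean colors:

```python
import numpy as np

def roi_distances(layouts, query_idx, region):
    """Distance of a query frame to all frames within a selected region.

    layouts: (n, 8, 8, 3) per-frame color layout grids (mean color per cell).
    region: (top, bottom, left, right) cell indices of the user-selected ROI.
    Returns an (n,) array of Euclidean distances d(F, f).
    """
    t, b, l, r = region
    roi = layouts[:, t:b, l:r, :].reshape(len(layouts), -1)
    return np.linalg.norm(roi - roi[query_idx], axis=1)
```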
56. The ForkBrowser
• Thread: linked sequence of shots in a specified order
Ø Query results, visual similarity, semantic similarity, textual similarity, time, …
[De Rooij, Ork, Cees G. M. Snoek, and Marcel Worring. "Balancing thread based navigation for targeted video search." Proceedings of the 2008 international conference on Content-based image and video retrieval (CIVR). ACM, 2008]
61. Grid Interfaces Aren't Enough!
• Many video retrieval systems use a grid interface!?
• A ranked list of results does not convey the temporal content structure!
Ø To which video does a shot belong?
Ø What is the sequence of shots?
Ø How long is a shot / scene?
• Moreover, a grid interface does not allow for fast human visual search (see later)!
64. Hierarchical Video Browsing: Another Tree-Based Approach
• Goal: improve content overview
• No content analysis (just uniform sampling of frames)
[Figure: frontal view and top view; from Schoeffmann and Del Fabro, 2011]
66. 3D Ring Instead of Grid!
• Utilization of screen real estate
Ø Large set of images
Ø Minor occlusion, slight distortion
• Intuitive interaction
Ø Rotate and zoom
• Content-based sorting
• "Pop-out images" (in the back)
• Further advantages
Ø Immediately continue on a miss; good scaling
[Klaus Schoeffmann, David Ahlström, and Marco Andrea Hudelist, "3-D Interfaces to Improve the Performance of Visual Known-Item Search", in IEEE Transactions on Multimedia, Vol. 16, No. 7, November 2014, pp. 1942-1951]
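For illustration, thumbnail placement on such a horizontal 3-D ring could be computed like this (the geometry and parameter names are assumptions, not taken from the paper):

```python
import math

def ring_positions(n, radius=1.0, rotation=0.0):
    """3-D positions of n thumbnails on a horizontal ring around the viewer.

    Returns one (x, y, z) tuple per image; z decides rendering depth
    (scale/occlusion), and `rotation` (radians) is updated by the
    user's rotate gesture.
    """
    return [(radius * math.sin(rotation + 2 * math.pi * i / n),
             0.0,
             radius * math.cos(rotation + 2 * math.pi * i / n))
            for i in range(n)]
```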
67. 3D Ring Interface – Perspectives
• Preferred design according to user study: 25% vertical, 66% horizontal, 8.3% frontal
[Klaus Schoeffmann, David Ahlström, and Marco Andrea Hudelist, "3-D Interfaces to Improve the Performance of Visual Known-Item Search", in IEEE Transactions on Multimedia, Vol. 16, No. 7, November 2014, pp. 1942-1951]
68. User Study: Grid vs. Ring (both sorted)
• 150 images, 12 participants, 1440 trials
• 3D interface significantly faster than the grid, by 12.7%
[Klaus Schoeffmann, David Ahlström, and Marco Andrea Hudelist, "3-D Interfaces to Improve the Performance of Visual Known-Item Search", in IEEE Transactions on Multimedia, Vol. 16, No. 7, November 2014, pp. 1942-1951]
69. Extension: Multiple Rings with Vertical Scrolling
• Significantly faster search (by about 48%) than the common image browser on the iPad!
[Klaus Schoeffmann. 2014. The Stack-of-Rings Interface for Large-Scale Image Browsing on Mobile Touch Devices. In Proc. of the ACM Int. Conference on Multimedia (MM '14). ACM, New York, NY, USA, 1097-1100]
71. Feature Signatures
• Color sketches mapped to feature signatures
• Matched to those of keyframes
• Extraction:
1. Sampling keypoints
2. Description through location (x, y), CIE Lab color, contrast, and entropy of surrounding pixels
3. k-means clustering
[Kruliš, M., Lokoč, J. and Skopal, T. (2013). Efficient Extraction of Feature Signatures Using Multi-GPU Architecture. Springer Berlin Heidelberg, LNCS 7733, pp. 446-456]
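A compact CPU-side sketch of the extraction pipeline outlined above, using scikit-learn's k-means (the cited work uses a multi-GPU implementation; the cluster count is an assumption):

```python
import numpy as np
from sklearn.cluster import KMeans

def feature_signature(descriptors, n_clusters=32):
    """Compute a feature signature from per-keypoint descriptors.

    descriptors: (m, 7) rows of (x, y, L, a, b, contrast, entropy)
    sampled from the image, as described on the slide.
    Returns (centroid, weight) pairs: cluster centers plus their
    relative sizes, which together form the signature.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(descriptors)
    weights = np.bincount(km.labels_, minlength=n_clusters) / len(descriptors)
    return list(zip(km.cluster_centers_, weights))
```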
72. Feature Signature-Based Video Browser
• Winner of the Video Browser Showdown 2014 + 2015
• Download demo at: http://siret.ms.mff.cuni.cz/lokoc/vbs.zip
[Screenshot: first color sketch (signature), optional second color sketch, and player]
[Lokoč, J., Blažek, A., & Skopal, T. (2014, January). Signature-Based Video Browser. In MultiMedia Modeling (pp. 415-418). Springer International Publishing]
74. Compact Visualization to Save Space
[Courtesy of Jakub Lokoč et al.]
75. Another Example of a Sketch-Based Browser
[Kai Uwe Barthel, Nico Hezel, Radek Mackowiak. Navigating a graph of scenes for exploring large video collections, in Proc. of 22nd International Conference on MultiMedia Modeling (MMM 2016), Lecture Notes in Computer Science (LNCS), Vol. tbd, Springer International Publishing, 2016, pp. 1-7]
Winner of the Video Browser Showdown 2016
78. User Studies with Significance Tests!
• Many interfaces are proposed without proper evaluation
• Interface A better than interface B? → a comparative user study is needed!
Ø Perform search tasks in exactly the same setting (data, environment, etc.)
Ø Log interaction behavior and task solve time
Ø Questionnaire about subjective workload
Ø Statistical analysis with proper tests (e.g., t-test, ANOVA, Wilcoxon signed-rank, etc.; see the sketch after this list)
• User simulations?
• Evaluation competitions
Ø Same data set
Ø Comparative evaluation
Ø TRECVID, MediaEval, Video Browser Showdown
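As promised above, a minimal sketch of such an analysis for a within-subjects comparison of two interfaces with SciPy (the solve times are made-up illustrative numbers):

```python
from scipy import stats

# Task solve times (seconds) per participant, same participants on both UIs
times_a = [42.1, 55.3, 38.7, 61.0, 47.5, 52.2, 44.9, 58.4]
times_b = [36.4, 49.8, 35.1, 52.7, 41.0, 47.3, 40.2, 50.6]

t, p = stats.ttest_rel(times_a, times_b)    # paired t-test
w, p_w = stats.wilcoxon(times_a, times_b)   # non-parametric alternative
print(f"paired t-test: t={t:.2f}, p={p:.4f}")
print(f"Wilcoxon signed-rank: W={w:.1f}, p={p_w:.4f}")
```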
79. Video Browser Showdown (VBS)
• Annual performance evaluation competition
Ø Live evaluation of search performance
Ø Special session at the Int. Conference on MultiMedia Modeling (MMM)
Ø Demonstrates and evaluates state-of-the-art interactive video search tools
Ø Idea influenced by VideOlympics (Snoek et al., IEEE Multimedia 2008)
• Focus
Ø Known-item search (KIS) tasks
§ Target clips are presented on site
§ Teams search in a shared data set
Ø Highly interactive search
§ Should push research on interfaces and interaction/navigation
Ø Experts and novices
§ Easy-to-use tools and methods
Ø Ad-Hoc Video Search (TRECVID AVS) tasks starting from 2017
http://videobrowsershowdown.org/
80. Video Browser Showdown (VBS)
• Live evaluation/scoring through the VBS server
• Score s ∈ [0, 100] for task i and team k is based on
Ø Solve time (t)
Ø Penalty (p) based on the number of submissions (m)
Ø Maximum solve time (Tmax), typically 5 minutes
[Schoeffmann, K., Ahlström, D., Bailer, W., Cobârzan, C., Hopfgartner, F., McGuinness, K., ... & Weiss, W. (2013). The Video Browser Showdown: a live evaluation of interactive video search tools. International Journal of Multimedia Information Retrieval, 1-15]
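The official scoring function is implemented in the VBS server; purely as an illustration of how t, m, and Tmax could interact, here is an assumed linear-decay variant (not the official formula):

```python
def vbs_score(t, m, t_max=300.0, penalty=10.0):
    """Illustrative VBS-style score in [0, 100].

    Assumed form, not the official formula: the score decays linearly
    with solve time t (seconds) up to t_max, and each wrong submission
    (m submissions in total) subtracts a fixed penalty.
    """
    if t > t_max:
        return 0.0
    base = 100.0 * (1.0 - t / t_max)
    return max(0.0, base - penalty * (m - 1))
```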
82. Video Browser Showdown 2016
• Search in mid-sized video collections
Ø Originally only single-video search
• Two different kinds of KIS tasks:
Ø Visual: visual presentation of a 30s target clip
Ø Textual: textual description of a 30s target clip
• Shared video data from the BBC
Ø 2016: 441 video files, about 320,000 shots (250 hours)
[Schoeffmann, Klaus. "A user-centric media retrieval competition: The video browser showdown 2012-2014." MultiMedia, IEEE 21.4 (2014): 8-13]
83. Visual Task Example (2016)
84. Textual Task Example (2016)
"Steve cutting a drawing into his block of wood. You can see his hand and a cutter and flower symbols."
86. Winner 2014 and 2015
(2014: single video and collection search; 2015: collection only)
[Screenshot: first color sketch (signature), optional second color sketch, and player]
[Lokoč, J., Blažek, A., & Skopal, T. (2014, January). Signature-Based Video Browser. In MultiMedia Modeling (pp. 415-418). Springer International Publishing]
87. Video Browser Showdown 2015
Two other examples of the 9 tools (collection search only):
• VERGE
Ø Shot and scene detection
Ø HLF (concepts) with SIFT/SURF and VLAD
Ø Similarity search
[Moumtzidou, A., Avgerinakis, K., Apostolidis, E., Markatopoulou, F., Apostolidis, K., Mironidis, T., ... & Patras, I. (2015, January). VERGE: A Multimodal Interactive Video Search Engine. In MultiMedia Modeling (pp. 249-254). Springer International Publishing]
• Storyboard-based interface for mobile video browsing (3rd place)
Ø Uniformly sampled frames
Ø Human computation
[Hürst, W., van de Werken, R., & Hoet, M. (2015, January). A Storyboard-Based Interface for Mobile Video Browsing. In MultiMedia Modeling (pp. 261-265). Springer International Publishing]
88. Human vs. Machine
• Utrecht University @ VBS 2015
Ø Wolfgang Hürst et al., The Netherlands
Ø Strong experience in HCI
• Features
Ø Uniformly sampled thumbnails (1-second distance)
Ø Huge storyboard on a tablet (625 thumbnails on one screen)
Ø Vertical scrolling, paging
[Hürst, W., van de Werken, R., & Hoet, M. (2015, January). A Storyboard-Based Interface for Mobile Video Browsing. In MultiMedia Modeling (pp. 261-265). Springer International Publishing]
91. A few words about me
• Research on multimedia analysis, Quantified Self, lifelogging
• Lecturer (Assistant Professor) in Information Studies (University of Glasgow)
• PhD in Computing Science (University of Glasgow)
• Past: various positions in Berlin (TUB), Dublin (DCU), Berkeley (ICSI), and London (QMUL)
92. What is The Quantified Self?
The Quantified Self is about obtaining self-knowledge through self-tracking.
93. What is The Quantified Self?
Self-tracking is also referred to as lifelogging, self-analysis, or self-hacking.
95. MyLifeBits
• Gordon Bell (Microsoft) digitized his life:
Ø Books written
Ø Personal documents
Ø Photos
Ø Posters, paintings, photos of things
Ø Home movies and videos
Ø CD collection
Ø PC files
Ø …
[Gordon Bell and Jim Gemmell. Total Recall: How the E-Memory Revolution Will Change Everything, New York, Dutton, 2009]
http://research.microsoft.com/en-us/projects/mylifebits/
99. Creating Personal Lifelog Repositories
A lifelog repository consists of heterogeneous data recorded using many different sensors.
100. In this tutorial, we will…
• get an introduction to the creation of lifelog repositories
• understand the major challenges of creating lifelog repositories
• discuss how to evaluate lifelogging techniques.
101. So what are the challenges?
The challenges are how to sense the person, capture their actions and their life, and make the result accessible using appropriate graphical user interfaces, search/recommendation engines, and visual/aural feedback. A further challenge is exploiting the lifelog to identify context for adaptive information services.
106. Recording my media consumption
[Brusilovsky, P., Kobsa, A., and Nejdl, W. "The Adaptive Web: Methods and Strategies of Web Personalization." Lecture Notes in Computer Science, Springer Verlag, 2007]
116. (Automatically) recording who I meet
• Inferred, weighted friendship network vs. reported, discrete friendship network
[Eagle, Nathan, Pentland, Alex (Sandy), and Lazer, David. "Inferring friendship network structure by using mobile phone data." Proceedings of the National Academy of Sciences of the United States of America, 106(36):15274-15278, 2009]
117. Recording what I eat
Semantic Gap
[Aizawa, Kiyoharu, Maruyama, Yutu, Li, He, and Morikawa, Chamin. "Food Balance Estimation by Using Personal Dietary Tendencies in a Multimedia Food Log." IEEE Transactions on Multimedia, 15(8):2176-2185, 2013]
http://foodlog.jp/
http://mealsnap.com/
118. Recording what I eat
Source: http://edition.cnn.com/2014/01/29/world/asia/korea-eating-room/
119. Recording what I see
["LifeGlogging cameras 1998 2004 2006 2013 labeled" by Glogger – own work, licensed under CC BY-SA 3.0 via Wikimedia Commons: https://commons.wikimedia.org/wiki/File:LifeGlogging_cameras_1998_2004_2006_2013_labeled.jpg]
128. Event Segmentation & Annotation
• Segment 5,500 photos per day into a set of events
Ø Similar to shot boundary detection (SBD) in digital video processing
Ø We employ visual features and the output of on-device sensors
[Figure: event segmentation and summarization of one day, e.g., finishing work in the lab, at the bus stop, chatting in the Skylon Hotel lobby, moving to a room, tea time, on the way back home]
Slide: Cathal Gurrin
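A minimal sketch of the visual side of such event segmentation, thresholding the dissimilarity between consecutive photos (the features, metric, and threshold are assumptions; the actual approach also fuses sensor data):

```python
import numpy as np

def segment_events(features, threshold=0.35):
    """Split a day's photo stream into events.

    features: (n, d) visual feature vectors of the n photos, in time order.
    A new event starts whenever the cosine dissimilarity between
    consecutive photos exceeds the threshold.
    Returns a list of events, each a list of photo indices.
    """
    events, current = [], [0]
    for i in range(1, len(features)):
        a, b = features[i - 1], features[i]
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        if 1.0 - sim > threshold:
            events.append(current)
            current = []
        current.append(i)
    events.append(current)
    return events
```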
129. Context is key
• Context cues help us to remember (Naaman et al.)
• Context in lifelogging data:
Ø Location, Bluetooth, time, date, …
Ø Derived knowledge (e.g., activities)
• Approaches:
Ø Combine cues from different sources
Ø Perform content analysis to identify objects, people, events, …
Ø Annotate lifelogs in the form of narrative text
[Mor Naaman, Susumu Harada, QianYing Wang, Hector Garcia-Molina, Andreas Paepcke: Context data in geo-referenced digital photo collections. ACM Multimedia 2004: 196-203]
130. Visual Feature Extraction
Everyday concepts detected in visual lifelogs (per-concept detection performance in parentheses):
Ø Steering wheel (72%)
Ø Shopping (75%)
Ø Inside of vehicle when not driving (airplane, taxi, car, bus) (60%)
Ø Toilet/bathroom (58%)
Ø Giving presentation / teaching (29%)
Ø View of horizon (23%)
Ø Door (62%)
Ø Staircase (48%)
Ø Hands (68%)
Ø Holding a cup/glass (35%)
Ø Holding a mobile phone (39%)
Ø Eating food (41%)
Ø Screen (computer/laptop/TV) (78%)
Ø Reading paper/book (58%)
Ø Meeting (34%)
Ø Road (47%)
Ø Vegetation (64%)
Ø Office scene (72%)
Ø Faces (61%)
Ø People (45%)
Ø Grass (61%)
Ø Sky (79%)
Ø Tree (63%)
[Byrne, Daragh, Doherty, Aiden R., Snoek, Cees G. M., Jones, Gareth J. F., Smeaton, Alan F. "Everyday concept detection in visual lifelogs: validation, relationships and trends." Multimedia Tools and Applications, 49(1):119-144, 2010]
131. Non-supervised Event Segmentation
• Event segmentation based on the extraction of low-level features and the computation of semantic concepts requires knowledge about the dataset.
• Alternative: highlight "significant events" by performing time-series analysis
[Figure: detected significant events, e.g., 2. arriving in the office, 6. walking in the building, 12. leaving the office]
[Na Li et al. "Random Matrix Ensembles of Time Correlation Matrices to Analyze Visual Lifelogs." In Proc. Multimedia Modeling Conference, Dublin, Ireland, pp. 400-411, 2014]
133. People access memory for five reasons
• Recollecting: reliving past experiences for various reasons
• Reminiscing: story-telling or sharing life experiences with others
• Retrieving: finding specific information such as an address or a document
• Reflecting: gaining insights (Quantified Self)
• Remembering: planning future activities
[Sellen, Abigail and Whittaker, Steve. "Beyond Total Capture: A Constructive Critique of Lifelogging." Communications of the ACM, 53(5):70-77, 2010]
147. Browsing in the Living Room
• Control with a suite of gestures:
Ø Next/previous event
Ø Next/previous image
Ø Next/previous day, week, …
• Possibility of a pivot view across multiple axes, e.g., people, locations, …
[Gurrin, Cathal and Lee, Hyowon and Caprani, Niamh and Zheng, Zhenxing and O'Connor, Noel and Carthy, Denise. "Browsing Large Personal Multimedia Archives in a Lean-back Environment." In Proc. Multimedia Modeling Conference, pp. 98-109, 2010]
156. Open Research Questions
• Multimedia summarisation
• Handling heterogeneous data streams
• Visualisation of lifelogs
• Retrieval and Recommendation
• …
158. NTCIR-12 Tasks
• Second round:
§ Search-Intent Mining
§ Mobile Click
§ Temporal Information Access
§ Spoken Query & Spoken Document Retrieval
§ QA Lab for Entrance Exam
• First round:
§ Medical NLP for Clinical Documents
§ Personal Lifelog Access & Retrieval
§ Short Text Conversation
160. Multimodal dataset with information needs
• Test collection created by three individuals over 10+ days:
§ 18.18 GB
§ 88,124 images
§ Accompanying output of 1,000 concepts (825 MB)
§ Data processed pre-release (removal of personal content, face blurring, translation of concepts)
§ Detailed user queries and judgments generated by the lifelogging data gatherers
[C. Gurrin, H. Joho, F. Hopfgartner, L. Zhou, R. Albatal. NTCIR Lifelog: The First Test Collection for Lifelog Research. In Proc. SIGIR'16, to appear]
163. Example LSAT Topic
Title: Tower Bridge
Description: Find the moment(s) when I was looking at Tower Bridge in London.
Narrative: To be considered relevant, the full span of Tower Bridge must be visible. Moments of crossing the Tower Bridge or showing some subset of Tower Bridge are not considered relevant.
165. Example LIT Topic
Title: Who has a more healthy lifestyle?
Description: Compare the lifestyle of all three users within the dimension of personal health and wellness.
Narrative: There are many aspects to a healthy lifestyle, such as the amount of exercise, the food and drink consumed, environmental factors, the level of social interactions, and sleep time. This topic is seeking to understand which of the users would be considered to be the most healthy. Any dimension (or combination of dimensions) of healthy lifestyle is considered acceptable as a point of comparison.
167. Task 1: Lifelog Semantic Access
• Find the moment(s) where I use my coffee machine.
• Find the moment(s) where I am in the kitchen.
• Find the moment(s) where I am playing with my phone.
• Find the moment(s) where I am preparing breakfast.
http://ntcir-lifelog.computing.dcu.ie/
168. Task 2: Lifelog Insight Task
• Provide insights on the time I spend having breakfast.
• Provide insights on the time I spend driving to work.
• Provide insights on the time I spend reading a paper.
• Provide insights on the time I spend working on the computer.
http://ntcir-lifelog.computing.dcu.ie/
169. Evaluation (Task 1)
• Automatic runs assume that there was no user involvement in the search process beyond specifying the query. The search system generates a ranked list of up to 100 moments for each topic, with no time limit.
• Interactive runs assume that there is a user involved in the search process who generates a query and selects which moments are considered correct for each topic.
Ø 1. In interactive runs, the maximum time allowed for any topic is 5 minutes.
Ø 2. We used the time elapsed to calculate run performance at different time cut-offs. The cut-offs were selected as 10s, 30s, 60s, 120s, 300s.
• Evaluation metrics (see the sketch after this slide)
Ø Mean Average Precision (MAP)
Ø Normalised Discounted Cumulative Gain (NDCG)
http://ntcir-lifelog.computing.dcu.ie/
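For reference, NDCG for one run can be computed from the per-moment relevance grades as follows (a standard textbook formulation, not the task's official scorer):

```python
import math

def ndcg(gains, k=None):
    """Normalized discounted cumulative gain of one ranked run.

    gains: relevance grades in ranked order (e.g., [2, 0, 1, ...]).
    k: optional cut-off rank; defaults to the full list length.
    """
    k = k or len(gains)
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))
    ideal = sorted(gains, reverse=True)
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal[:k]))
    return dcg / idcg if idcg else 0.0

# Example: a relevant moment at ranks 1 and 3
print(ndcg([1, 0, 1, 0], k=3))
```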