2. Interactive Search in Video & Lifelog Repositories
• Part 1: Interactive Video Search
Ø Search in video content: motivation and challenges
Ø Automatic video retrieval vs. interactive video search
Ø Tools for interactive search
§ Browsing, Navigation, Visualization, Similarity & Sketch-based Search
Ø Evaluation of IVS Tools
§ TRECVID, Video Browser Showdown (VBS)
Short break
• Part 2: Lifelogging
Ø Quantified Self
Ø Lifelog repositories
Ø Lifelogging techniques
Ø Interactive visualization
Klaus Schoeffmann, IEEE International Conference on Multimedia & Expo (ICME) 2016
4. Video Everywhere
• Ubiquitous use of videos nowadays
Ø Entertainment and commercials
Ø Social gaming (screencasts)
Ø Personal videos (family, kids, …)
Ø Sports documentation and analysis (e.g., GoPro)
Ø Product usage instructions (e.g., furniture)
Ø Surveillance (buildings, places, street, …)
Ø Health care and medical science (endoscopic procedures)
Ø Lifelogging
• Enormous amount of data, challenging to search!
5. Video – The Ultimate Media?
[Mary Meeker and Liang Wu, Internet Trends, D11 Conference, May 2013]
As of 2014, 300 hours of video are uploaded to YouTube every minute!
6. Video Cameras
• Increasingly powerful
Ø These days you can record 4K content with your mobile!
Ø Video sensors use auto-focus, object tracking, color correction, and image stabilization
Ø Storage space is not a big problem
§ Current smartphones have 128 GB of memory
§ NAS devices are cheaply available
Ø Network bandwidth has also dramatically increased over the years
§ Video streaming on the go is simple and common
§ LTE connections provide 30 Mbit/s and often much more!
8. Challenge: Finding Content
• Even with retrieval tools it is still challenging to find content later
Ø Especially if the content is not publicly available (and popular and annotated)
Ø Many problems with querying, in particular for non-experts
• Ultimate goal: make search as effective as for text
Ø Quickly find relevant content
Ø Compare to the interactivity of a textbook
§ Index, ToC, list of figures/tables, etc.
§ Change, extend, copy, bookmark, highlight, etc.
12. How a Novice Would Solve This
Novice users typically employ a file browser and a simple video player!
VCRs in the 1970s provided similar functionality!
[Figure: file explorer and video player]
15. How a Retrieval Expert Would Solve This
• Video retrieval tool with content analysis and search
• Query by
Ø Text, Concept, Example
• Automatic search
Ø Content-based data such as:
§ Text (e.g., metadata, ASR, OCR, transcripts, …)
§ Global features (e.g., color, texture, motion)
§ Local features and concepts (e.g., VLAD, BoVW, …)
Ø Ranked result list
IBM TRECVID 2007 Video Retrieval System [1]
18. A More Recent Video Retrieval Tool
[A. Moumtzidou et al., “VERGE: A Multimodal Interactive Video Search Engine”, Proc. of the 21st International Conference on MultiMedia Modeling (MMM 2015), Sydney, 2015]
• kNN similarity search based on VLAD vectors
• Concept detection with SVMs over five local descriptors (SIFT, SURF, ORB, …) plus PCA, or with CNNs
• Hierarchical keyframe clustering
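As a generic illustration of the kNN similarity-search building block (a minimal sketch, not VERGE's actual code), assuming one precomputed VLAD vector per keyframe:

```python
import numpy as np

def knn_search(query_vlad, keyframe_vlads, k=10):
    """Return the indices of the k keyframes nearest to the query.

    query_vlad: (d,) VLAD vector of the query image.
    keyframe_vlads: (n, d) matrix of precomputed keyframe VLAD vectors.
    """
    dists = np.linalg.norm(keyframe_vlads - query_vlad, axis=1)
    return np.argsort(dists)[:k]
```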
24. Common Video Retrieval Approach
[Figure: query → content-based search → ranked results]
Works well if
Ø users can properly express their needs,
Ø content features can sufficiently describe visual content, and
Ø computer vision can accurately detect semantics.
Unfortunately, in practice these assumptions do not hold.
26. Mind the Gap! (Performance Gap)
Ø Database affinity of concept classifiers
Ø Low performance in broad domains
Average precision: $AP = \frac{1}{R} \sum_{k=1}^{n} P(k)\,rel(k)$, where $P(k)$ is the precision after the first $k$ results, $rel(k) = 1$ if the $k$-th retrieved document is relevant (0 otherwise), and $R$ is the number of relevant documents.
TRECVID 2015 Semantic Indexing (60 concepts): median "inferred average precision" (infAP) = 0.24
In other words: more than 75% of results are wrong!
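Since infAP is derived from average precision, a small Python sketch of the plain (non-inferred) AP computation makes the 0.24 figure concrete; the function name is illustrative and not part of the TRECVID tooling:

```python
def average_precision(rel, num_relevant):
    """Average precision of one ranked result list.

    rel: 0/1 flags, rel[k-1] = 1 if the k-th result is relevant.
    num_relevant: total number of relevant documents (R).
    """
    hits, ap = 0, 0.0
    for k, r in enumerate(rel, start=1):
        if r:
            hits += 1
            ap += hits / k  # P(k), accumulated only at relevant ranks
    return ap / num_relevant if num_relevant else 0.0

# Example: one hit at rank 2, two relevant documents in total
print(average_precision([0, 1, 0, 0], 2))  # 0.25
```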
27. Mind the Gap! (Usability Gap)
Ø Query-by-concept
§ Which concept to use? Choose from a long list of results…
Ø Query-by-example
§ Typically no perfect example available.
Ø Query-by-sketch
§ Users are no artists ☺ (see also next slide)
Ø Query-by-text
§ How to describe a desired image by text?
A picture tells a thousand words. [Photo by marfis75]
How to describe a desired video clip by text?
28. Usability Gap: Needs More Focus on the User (Interface)!
Ø In some situations users cannot formulate a query
§ → provide exploratory search features!
§ For example: browsing, filtering, similarity search
Ø Users expect good results (on the first page!)
§ → use relevance feedback / active learning instead of long lists!
Ø Videos are dynamic
§ Static thumbnails are not informative, especially for long shots and self-similar content
§ → skims and visual summaries ("smart playback")
§ → sophisticated navigation & content structure visualization
Ø Shots have a temporal context
Ø Grid interfaces are not always the best choice
30. Interactive Video Search
• HCI community (novices)
Ø Methods for interactive search
Ø Human computation
Ø No content understanding, but simple to use
• Multimedia community (experts)
Ø Mostly automatic search
Ø Retrieval engine
Ø Complicated to use
• Mismatch between the two!
→ Combine HCI with CV and MIR for better search tools
31. User-Centric Exploratory Search
• Strongly integrate the user into the search process
Ø Assume a smart user
Ø Give him/her more control over the search process
§ Inspects and interacts
§ Selects the most meaningful tool for current needs, e.g.
• Content browsing/navigation
• Content visualization and summarization
• Ad-hoc querying (e.g., by sketch, filtering, ad-hoc example)
• Aspect-based exploration, parallel search paths
Ø Iterative: Search – Inspect – Think – Repeat
§ Exploratory search ("will know it when I see it")
§ Instead of "query-and-browse-results"
32. Aspects of Interactive Video Search (IVS)
• Navigation & Browsing
Ø Coarse navigation, fine navigation, browsing sequences/scenes/shots
• Different Query Types
Ø Text or concept, example image, example clip (similarity search), sketch, filter (spatial & temporal)
• Content Visualization
Ø Underlying structure, overview (ToC), abstracts/summaries, skims, similarity-based arrangements (e.g., by color)
• Dynamics & Convenience
Ø Smart playback, bookmarks, history
39. Relative Flow Dragging
• Video browsing by direct manipulation / relative flow dragging
• Background stabilization
[Pierre Dragicevic, Gonzalo Ramos, Jacobo Bibliowitcz, Derek Nowrouzezahrai, Ravin Balakrishnan, and Karan Singh. "Video browsing by direct manipulation", in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '08), ACM, pp. 237-246, 2008]
40. Relative Flow Dragging
• Evaluation with a user study
Ø 16 participants (18-44 years old)
Ø Direct comparison to seeker-bar navigation
Ø Navigation tasks, 2 videos (ladybug, cars)
§ "Find the position where the ladybug passes over marker X"
§ "Find the moment when car X starts moving"
Ø Flow dragging significantly faster (RM-ANOVA) by at least 250% (also significantly fewer errors)
[Pierre Dragicevic, Gonzalo Ramos, Jacobo Bibliowitcz, Derek Nowrouzezahrai, Ravin Balakrishnan, and Karan Singh. "Video browsing by direct manipulation", in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '08), ACM, pp. 237-246, 2008]
42. Scrubbing Wheel
• Requirements
Ø Simple and effective navigation on touchscreens
Ø Efficient navigation that allows for content search in both short and long videos
• Idea
Ø Improve navigation by using a circular navigation area
Ø Inspired by the Apple iPod device
[Klaus Schoeffmann and Lukas Burgstaller, "Scrubbing Wheel: An Interaction Concept to Improve Video Content Navigation on Devices with Touchscreens", in Proceedings of the IEEE International Symposium on Multimedia 2015 (ISM 2015), Miami, FL, USA, 2015, pp. 351-356]
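The published interaction design has more detail, but the core mapping, circular finger motion to a relative timeline offset, can be sketched as follows (all parameter names and the frames-per-revolution value are assumptions, not taken from the paper):

```python
import math

def wheel_delta_frames(x0, y0, x1, y1, cx, cy, frames_per_revolution=500):
    """Map a touch move on a circular control to a relative frame offset.

    (x0, y0) -> (x1, y1): consecutive touch points; (cx, cy): wheel center.
    One full revolution advances `frames_per_revolution` frames (assumed value).
    """
    a0 = math.atan2(y0 - cy, x0 - cx)
    a1 = math.atan2(y1 - cy, x1 - cx)
    delta = a1 - a0
    # Unwrap across the +/-pi boundary so a small move stays small
    if delta > math.pi:
        delta -= 2 * math.pi
    elif delta < -math.pi:
        delta += 2 * math.pi
    return int(delta / (2 * math.pi) * frames_per_revolution)
```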
47. Video Browser for the Digital Native
• "Temporal Semantic Compression" based on a tempo function and shot popularity (insight)
[Adams, Brett, Stewart Greenhill, and Svetha Venkatesh. "Towards a video browser for the digital native." Multimedia and Expo Workshops (ICMEW), 2012 IEEE International Conference on. IEEE, 2012]
48. Video Browser for the Digital Native
• User study with 8 participants
Ø Test configuration elements with two tasks (after presentation + 5 minutes of training)
§ (i) Browse a familiar movie to find scenes you remember
§ (ii) Browse an unfamiliar movie to get a feel for its story or structure
Ø Questionnaire with Likert-scale ratings
[Adams, Brett, Stewart Greenhill, and Svetha Venkatesh. "Towards a video browser for the digital native." Multimedia and Expo Workshops (ICMEW), 2012 IEEE International Conference on. IEEE, 2012]
51. Motion Layout: Direction + Intensity
• Motion vector (µ) classification into a motion histogram with K=12 equidistant motion directions (bins)
• Mapping to the hue channel
[Schoeffmann, K., Lux, M., Taschwer, M., & Boeszoermenyi, L. (2009, June). Visualization of video motion in context of video browsing. In Multimedia and Expo, 2009. ICME 2009. IEEE Int. Conf. on (pp. 658-661). IEEE.]
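A minimal sketch of this layout computation, assuming per-frame motion vectors are already available and an OpenCV-style hue range of 0-179 (both assumptions):

```python
import numpy as np

K = 12  # equidistant direction bins, as on the slide

def motion_histogram(motion_vectors):
    """Histogram of motion directions over one frame.

    motion_vectors: (m, 2) array of (dx, dy) motion vectors.
    Returns a K-bin histogram normalized to sum to 1.
    """
    angles = np.arctan2(motion_vectors[:, 1], motion_vectors[:, 0])  # -pi..pi
    bins = ((angles + np.pi) / (2 * np.pi) * K).astype(int) % K
    hist = np.bincount(bins, minlength=K).astype(float)
    return hist / hist.sum() if hist.sum() else hist

def bin_to_hue(b):
    """Map direction bin b (0..K-1) to an OpenCV hue value (0..179)."""
    return int(b / K * 180)
```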
53. Similarity Search (SOI) with Motion Layout
• SOI search: motion-based search by example sequence
Ø Using the motion direction histogram (Db) of a user-selected sequence
• Find the most similar sequences
Ø Compute the distance to every possible sequence of the same length
Ø Match if below a specified threshold
[Figure: motion layout (Db) from frame 1 to frame n, with three matches highlighted]
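A sketch of the sliding-window matching step, assuming one K-bin motion direction histogram per frame (the specific distance measure and threshold value are assumptions):

```python
import numpy as np

def soi_search(hists, start, end, threshold=0.5):
    """Find sequences whose motion histograms resemble hists[start:end].

    hists: (n, K) numpy array of per-frame motion direction histograms.
    Returns start indices of all windows whose summed per-frame L2
    distance to the query sequence falls below the threshold.
    """
    query = hists[start:end]
    length, matches = end - start, []
    for i in range(len(hists) - length + 1):
        if i == start:
            continue  # skip the query sequence itself
        d = np.linalg.norm(hists[i:i + length] - query, axis=1).sum()
        if d < threshold:
            matches.append(i)
    return matches
```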
54. Similarity Search (ROI) with Color Layout
• User selects a spatial region of interest (ROI)
• On search
§ Compute the Euclidean distance of frame F to every other frame f (according to the selected region)
§ Based on the color layout descriptor
[Figure: user-selected region of frame F compared against frames 1…n, e.g., d(F,1)=350, d(F,k)=8, d(F,n)=400]
[Schoeffmann, K., Taschwer, M., & Boeszoermenyi, L. (2010, February). The video explorer: a tool for navigation and searching within a single video based on fast content analysis. In Proceedings of the first annual ACM SIGMM conference on Multimedia systems (pp. 247-258). ACM.]
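The matching idea can be sketched as follows; as a simplified stand-in for the tool's color layout descriptor, each frame is assumed to be reduced to an 8×8 grid of mean colors:

```python
import numpy as np

def roi_distances(layouts, query_idx, region):
    """Distance of a query frame to all frames within a selected region.

    layouts: (n, 8, 8, 3) per-frame color layout grids (mean color per cell).
    region: (top, bottom, left, right) cell indices of the user-selected ROI.
    Returns an (n,) array of Euclidean distances d(F, f).
    """
    t, b, l, r = region
    roi = layouts[:, t:b, l:r, :].reshape(len(layouts), -1)
    return np.linalg.norm(roi - roi[query_idx], axis=1)
```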
56. The ForkBrowser
• Thread: linked sequence of shots in a specified order
Ø Query results, visual similarity, semantic similarity, textual similarity, time, …
[De Rooij, Ork, Cees G. M. Snoek, and Marcel Worring. "Balancing thread based navigation for targeted video search." Proceedings of the 2008 international conference on Content-based image and video retrieval (CIVR). ACM, 2008]
61. Grid Interfaces Aren't Enough!
• Many video retrieval systems use a grid interface!?
• A ranked list of results does not convey the temporal content structure!
Ø To which video does a shot belong?
Ø What is the sequence of shots?
Ø How long is a shot / scene?
• Moreover, a grid interface does not allow for fast human visual search (see later)!
64. Hierarchical Video Browsing: Another Tree-Based Approach
• Goal: improve content overview
• No content analysis (just uniform sampling of frames)
[Figure: frontal view and top view; from Schoeffmann and Del Fabro, 2011]
66. 3D Ring Instead of Grid!
• Utilization of screen real estate
Ø Large set of images
Ø Minor occlusion, slight distortion
• Intuitive interaction
Ø Rotate and zoom
• Content-based sorting
• "Pop-out images" (in the back)
• Further advantages
Ø Immediately continue on a miss; good scaling
[Klaus Schoeffmann, David Ahlström, and Marco Andrea Hudelist, "3-D Interfaces to Improve the Performance of Visual Known-Item Search", in IEEE Transactions on Multimedia, Vol. 16, No. 7, November 2014, pp. 1942-1951]
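For illustration, thumbnail placement on such a horizontal 3-D ring could be computed like this (the geometry and parameter names are assumptions, not taken from the paper):

```python
import math

def ring_positions(n, radius=1.0, rotation=0.0):
    """3-D positions of n thumbnails on a horizontal ring around the viewer.

    Returns one (x, y, z) tuple per image; z decides rendering depth
    (scale/occlusion), and `rotation` (radians) is updated by the
    user's rotate gesture.
    """
    return [(radius * math.sin(rotation + 2 * math.pi * i / n),
             0.0,
             radius * math.cos(rotation + 2 * math.pi * i / n))
            for i in range(n)]
```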
67. 3D Ring Interface – Perspectives
• Preferred design according to user study: 25% vertical, 66% horizontal, 8.3% frontal
[Klaus Schoeffmann, David Ahlström, and Marco Andrea Hudelist, "3-D Interfaces to Improve the Performance of Visual Known-Item Search", in IEEE Transactions on Multimedia, Vol. 16, No. 7, November 2014, pp. 1942-1951]
68. User Study: Grid vs. Ring (both sorted)
• 150 images, 12 participants, 1440 trials
• 3D interface significantly faster than the grid, by 12.7%
[Klaus Schoeffmann, David Ahlström, and Marco Andrea Hudelist, "3-D Interfaces to Improve the Performance of Visual Known-Item Search", in IEEE Transactions on Multimedia, Vol. 16, No. 7, November 2014, pp. 1942-1951]
69. Extension: Multiple Rings with Vertical Scrolling
• Significantly faster search (by about 48%) than the common image browser on the iPad!
[Klaus Schoeffmann. 2014. The Stack-of-Rings Interface for Large-Scale Image Browsing on Mobile Touch Devices. In Proc. of the ACM Int. Conference on Multimedia (MM '14). ACM, New York, NY, USA, 1097-1100]
71. Feature Signatures
• Color sketches mapped to feature signatures
• Matched to those of keyframes
• Extraction:
1. Sampling keypoints
2. Description through location (x, y), CIE Lab color, contrast, and entropy of surrounding pixels
3. k-means clustering
[Kruliš, M., Lokoč, J. and Skopal, T. (2013). Efficient Extraction of Feature Signatures Using Multi-GPU Architecture. Springer Berlin Heidelberg, LNCS 7733, pp. 446-456]
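A compact CPU-side sketch of the extraction pipeline outlined above, using scikit-learn's k-means (the cited work uses a multi-GPU implementation; the cluster count is an assumption):

```python
import numpy as np
from sklearn.cluster import KMeans

def feature_signature(descriptors, n_clusters=32):
    """Compute a feature signature from per-keypoint descriptors.

    descriptors: (m, 7) rows of (x, y, L, a, b, contrast, entropy)
    sampled from the image, as described on the slide.
    Returns (centroid, weight) pairs: cluster centers plus their
    relative sizes, which together form the signature.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(descriptors)
    weights = np.bincount(km.labels_, minlength=n_clusters) / len(descriptors)
    return list(zip(km.cluster_centers_, weights))
```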
72. Feature Signature-Based Video Browser
• Winner of the Video Browser Showdown 2014 + 2015
• Download demo at: http://siret.ms.mff.cuni.cz/lokoc/vbs.zip
[Screenshot: first color sketch (signature), optional second color sketch, and player]
[Lokoč, J., Blažek, A., & Skopal, T. (2014, January). Signature-Based Video Browser. In MultiMedia Modeling (pp. 415-418). Springer International Publishing]
74. Compact Visualization to Save Space
[Courtesy of Jakub Lokoč et al.]
75. Another Example of a Sketch-Based Browser
[Kai Uwe Barthel, Nico Hezel, Radek Mackowiak. Navigating a graph of scenes for exploring large video collections, in Proc. of 22nd International Conference on MultiMedia Modeling (MMM 2016), Lecture Notes in Computer Science (LNCS), Vol. tbd, Springer International Publishing, 2016, pp. 1-7]
Winner of the Video Browser Showdown 2016
78. User Studies with Significance Tests!
• Many interfaces are proposed without proper evaluation
• Interface A better than interface B? → a comparative user study is needed!
Ø Perform search tasks in exactly the same setting (data, environment, etc.)
Ø Log interaction behavior and task solve time
Ø Questionnaire about subjective workload
Ø Statistical analysis with proper tests (e.g., t-test, ANOVA, Wilcoxon signed-rank, etc.; see the sketch after this list)
• User simulations?
• Evaluation competitions
Ø Same data set
Ø Comparative evaluation
Ø TRECVID, MediaEval, Video Browser Showdown
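As promised above, a minimal sketch of such an analysis for a within-subjects comparison of two interfaces with SciPy (the solve times are made-up illustrative numbers):

```python
from scipy import stats

# Task solve times (seconds) per participant, same participants on both UIs
times_a = [42.1, 55.3, 38.7, 61.0, 47.5, 52.2, 44.9, 58.4]
times_b = [36.4, 49.8, 35.1, 52.7, 41.0, 47.3, 40.2, 50.6]

t, p = stats.ttest_rel(times_a, times_b)    # paired t-test
w, p_w = stats.wilcoxon(times_a, times_b)   # non-parametric alternative
print(f"paired t-test: t={t:.2f}, p={p:.4f}")
print(f"Wilcoxon signed-rank: W={w:.1f}, p={p_w:.4f}")
```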
79. Video Browser Showdown (VBS)
• Annual performance evaluation competition
Ø Live evaluation of search performance
Ø Special session at the Int. Conference on MultiMedia Modeling (MMM)
Ø Demonstrates and evaluates state-of-the-art interactive video search tools
Ø Idea influenced by VideOlympics (Snoek et al., IEEE Multimedia 2008)
• Focus
Ø Known-item search (KIS) tasks
§ Target clips are presented on site
§ Teams search in a shared data set
Ø Highly interactive search
§ Should push research on interfaces and interaction/navigation
Ø Experts and novices
§ Easy-to-use tools and methods
Ø Ad-Hoc Video Search (TRECVID AVS) tasks starting from 2017
http://videobrowsershowdown.org/
80. Video Browser Showdown (VBS)
• Live evaluation/scoring through the VBS server
• Score s ∈ [0, 100] for task i and team k is based on
Ø Solve time (t)
Ø Penalty (p) based on the number of submissions (m)
Ø Maximum solve time (Tmax), typically 5 minutes
[Schoeffmann, K., Ahlström, D., Bailer, W., Cobârzan, C., Hopfgartner, F., McGuinness, K., ... & Weiss, W. (2013). The Video Browser Showdown: a live evaluation of interactive video search tools. International Journal of Multimedia Information Retrieval, 1-15]
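The official scoring function is implemented in the VBS server; purely as an illustration of how t, m, and Tmax could interact, here is an assumed linear-decay variant (not the official formula):

```python
def vbs_score(t, m, t_max=300.0, penalty=10.0):
    """Illustrative VBS-style score in [0, 100].

    Assumed form, not the official formula: the score decays linearly
    with solve time t (seconds) up to t_max, and each wrong submission
    (m submissions in total) subtracts a fixed penalty.
    """
    if t > t_max:
        return 0.0
    base = 100.0 * (1.0 - t / t_max)
    return max(0.0, base - penalty * (m - 1))
```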
82. Video Browser Showdown 2016
• Search in mid-sized video collections
Ø Originally only single-video search
• Two different kinds of KIS tasks:
Ø Visual: visual presentation of a 30s target clip
Ø Textual: textual description of a 30s target clip
• Shared video data from the BBC
Ø 2016: 441 video files, about 320,000 shots (250 hours)
[Schoeffmann, Klaus. "A user-centric media retrieval competition: The video browser showdown 2012-2014." MultiMedia, IEEE 21.4 (2014): 8-13]
83. Visual Task Example (2016)
84. Textual Task Example (2016)
"Steve cutting a drawing into his block of wood. You can see his hand and a cutter and flower symbols."
86. Winner 2014 and 2015
(2014: single video and collection search; 2015: collection only)
[Screenshot: first color sketch (signature), optional second color sketch, and player]
[Lokoč, J., Blažek, A., & Skopal, T. (2014, January). Signature-Based Video Browser. In MultiMedia Modeling (pp. 415-418). Springer International Publishing]
87. Video Browser Showdown 2015
Two other examples of the 9 tools (collection search only):
• VERGE
Ø Shot and scene detection
Ø HLF (concepts) with SIFT/SURF and VLAD
Ø Similarity search
[Moumtzidou, A., Avgerinakis, K., Apostolidis, E., Markatopoulou, F., Apostolidis, K., Mironidis, T., ... & Patras, I. (2015, January). VERGE: A Multimodal Interactive Video Search Engine. In MultiMedia Modeling (pp. 249-254). Springer International Publishing]
• Storyboard-based interface for mobile video browsing (3rd place)
Ø Uniformly sampled frames
Ø Human computation
[Hürst, W., van de Werken, R., & Hoet, M. (2015, January). A Storyboard-Based Interface for Mobile Video Browsing. In MultiMedia Modeling (pp. 261-265). Springer International Publishing]
88. Human vs. Machine
• Utrecht University @ VBS 2015
Ø Wolfgang Hürst et al., The Netherlands
Ø Strong experience in HCI
• Features
Ø Uniformly sampled thumbnails (1-second distance)
Ø Huge storyboard on a tablet (625 thumbnails on one screen)
Ø Vertical scrolling, paging
[Hürst, W., van de Werken, R., & Hoet, M. (2015, January). A Storyboard-Based Interface for Mobile Video Browsing. In MultiMedia Modeling (pp. 261-265). Springer International Publishing]
91. A few words about me
• Research on multimedia analysis, Quantified Self, lifelogging
• Lecturer (Assistant Professor) in Information Studies (University of Glasgow)
• PhD in Computing Science (University of Glasgow)
• Past: various positions in Berlin (TUB), Dublin (DCU), Berkeley (ICSI), and London (QMUL)
92. What is The Quantified Self?
The Quantified Self is about obtaining self-knowledge through self-tracking.
93. What is The Quantified Self?
Self-tracking is also referred to as lifelogging, self-analysis, or self-hacking.
95. MyLifeBits
• Gordon Bell (Microsoft) digitized his life:
Ø Books written
Ø Personal documents
Ø Photos
Ø Posters, paintings, photos of things
Ø Home movies and videos
Ø CD collection
Ø PC files
Ø …
[Gordon Bell and Jim Gemmell. Total Recall: How the E-Memory Revolution Will Change Everything, New York, Dutton, 2009]
http://research.microsoft.com/en-us/projects/mylifebits/
99. Creating Personal Lifelog Repositories
A lifelog repository consists of heterogeneous data recorded using many different sensors.
100. In this tutorial, we will…
• get an introduction to the creation of lifelog repositories
• understand the major challenges of creating lifelog repositories
• discuss how to evaluate lifelogging techniques.
101. So what are the challenges?
The challenges are how to sense the person, capture their actions and their life, and make the result accessible using appropriate graphical user interfaces, search/recommendation engines, and visual/aural feedback. A further challenge is exploiting the lifelog to identify context for adaptive information services.
106. Recording my media consumption
[Brusilovsky, P., Kobsa, A., and Nejdl, W. "The Adaptive Web: Methods and Strategies of Web Personalization." Lecture Notes in Computer Science, Springer Verlag, 2007]
116. (Automatically) recording who I meet
• Inferred, weighted friendship network vs. reported, discrete friendship network
[Eagle, Nathan, Pentland, Alex (Sandy), and Lazer, David. "Inferring friendship network structure by using mobile phone data." Proceedings of the National Academy of Sciences of the United States of America, 106(36):15274-15278, 2009]
117. Recording what I eat
Semantic Gap
[Aizawa, Kiyoharu, Maruyama, Yutu, Li, He, and Morikawa, Chamin. "Food Balance Estimation by Using Personal Dietary Tendencies in a Multimedia Food Log." IEEE Transactions on Multimedia, 15(8):2176-2185, 2013]
http://foodlog.jp/
http://mealsnap.com/
118. Recording what I eat
Source: http://edition.cnn.com/2014/01/29/world/asia/korea-eating-room/
119. Recording what I see
["LifeGlogging cameras 1998 2004 2006 2013 labeled" by Glogger – own work, licensed under CC BY-SA 3.0 via Wikimedia Commons: https://commons.wikimedia.org/wiki/File:LifeGlogging_cameras_1998_2004_2006_2013_labeled.jpg]
128. Event Segmentation & Annotation
• Segment 5,500 photos per day into a set of events
Ø Similar to shot boundary detection (SBD) in digital video processing
Ø We employ visual features and the output of on-device sensors
[Figure: event segmentation and summarization of one day, e.g., finishing work in the lab, at the bus stop, chatting in the Skylon Hotel lobby, moving to a room, tea time, on the way back home]
Slide: Cathal Gurrin
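A minimal sketch of the visual side of such event segmentation, thresholding the dissimilarity between consecutive photos (the features, metric, and threshold are assumptions; the actual approach also fuses sensor data):

```python
import numpy as np

def segment_events(features, threshold=0.35):
    """Split a day's photo stream into events.

    features: (n, d) visual feature vectors of the n photos, in time order.
    A new event starts whenever the cosine dissimilarity between
    consecutive photos exceeds the threshold.
    Returns a list of events, each a list of photo indices.
    """
    events, current = [], [0]
    for i in range(1, len(features)):
        a, b = features[i - 1], features[i]
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
        if 1.0 - sim > threshold:
            events.append(current)
            current = []
        current.append(i)
    events.append(current)
    return events
```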
129. Context is key
• Context cues help us to remember (Naaman et al.)
• Context in lifelogging data:
Ø Location, Bluetooth, time, date, …
Ø Derived knowledge (e.g., activities)
• Approaches:
Ø Combine cues from different sources
Ø Perform content analysis to identify objects, people, events, …
Ø Annotate lifelogs in the form of narrative text
[Mor Naaman, Susumu Harada, QianYing Wang, Hector Garcia-Molina, Andreas Paepcke: Context data in geo-referenced digital photo collections. ACM Multimedia 2004: 196-203]
130. Visual Feature Extraction
Everyday concepts detected in visual lifelogs (per-concept detection performance in parentheses):
Ø Steering wheel (72%)
Ø Shopping (75%)
Ø Inside of vehicle when not driving (airplane, taxi, car, bus) (60%)
Ø Toilet/bathroom (58%)
Ø Giving presentation / teaching (29%)
Ø View of horizon (23%)
Ø Door (62%)
Ø Staircase (48%)
Ø Hands (68%)
Ø Holding a cup/glass (35%)
Ø Holding a mobile phone (39%)
Ø Eating food (41%)
Ø Screen (computer/laptop/TV) (78%)
Ø Reading paper/book (58%)
Ø Meeting (34%)
Ø Road (47%)
Ø Vegetation (64%)
Ø Office scene (72%)
Ø Faces (61%)
Ø People (45%)
Ø Grass (61%)
Ø Sky (79%)
Ø Tree (63%)
[Byrne, Daragh, Doherty, Aiden R., Snoek, Cees G. M., Jones, Gareth J. F., Smeaton, Alan F. "Everyday concept detection in visual lifelogs: validation, relationships and trends." Multimedia Tools and Applications, 49(1):119-144, 2010]
131. Non-supervised Event Segmentation
• Event segmentation based on the extraction of low-level features and the computation of semantic concepts requires knowledge about the dataset.
• Alternative: highlight "significant events" by performing time-series analysis
[Figure: detected significant events, e.g., 2. arriving in the office, 6. walking in the building, 12. leaving the office]
[Na Li et al. "Random Matrix Ensembles of Time Correlation Matrices to Analyze Visual Lifelogs." In Proc. Multimedia Modeling Conference, Dublin, Ireland, pp. 400-411, 2014]
133. People access memory for five reasons
• Recollecting: reliving past experiences for various reasons
• Reminiscing: story-telling or sharing life experiences with others
• Retrieving: finding specific information such as an address or a document
• Reflecting: gaining insights (Quantified Self)
• Remembering: planning future activities
[Sellen, Abigail and Whittaker, Steve. "Beyond Total Capture: A Constructive Critique of Lifelogging." Communications of the ACM, 53(5):70-77, 2010]
147. Browsing in the Living Room
• Control with a suite of gestures:
Ø Next/previous event
Ø Next/previous image
Ø Next/previous day, week, …
• Possibility of a pivot view across multiple axes, e.g., people, locations, …
[Gurrin, Cathal and Lee, Hyowon and Caprani, Niamh and Zheng, Zhenxing and O'Connor, Noel and Carthy, Denise. "Browsing Large Personal Multimedia Archives in a Lean-back Environment." In Proc. Multimedia Modeling Conference, pp. 98-109, 2010]
156. Open Research Questions
• Multimedia summarisation
• Handling heterogeneous data streams
• Visualisation of lifelogs
• Retrieval and Recommendation
• …
158. NTCIR-12 Tasks
• Second round:
§ Search-Intent Mining
§ Mobile Click
§ Temporal Information Access
§ Spoken Query & Spoken Document Retrieval
§ QA Lab for Entrance Exam
• First round:
§ Medical NLP for Clinical Documents
§ Personal Lifelog Access & Retrieval
§ Short Text Conversation
160. Multimodal dataset with information needs
• Test collection created by three individuals over 10+ days:
§ 18.18 GB
§ 88,124 images
§ Accompanying output of 1,000 concepts (825 MB)
§ Data processed pre-release (removal of personal content, face blurring, translation of concepts)
§ Detailed user queries and judgments generated by the lifelogging data gatherers
[C. Gurrin, H. Joho, F. Hopfgartner, L. Zhou, R. Albatal. NTCIR Lifelog: The First Test Collection for Lifelog Research. In Proc. SIGIR'16, to appear]
163. Example LSAT Topic
Title: Tower Bridge
Description: Find the moment(s) when I was looking at Tower Bridge in London.
Narrative: To be considered relevant, the full span of Tower Bridge must be visible. Moments of crossing the Tower Bridge or showing some subset of Tower Bridge are not considered relevant.
165. Example LIT Topic
Title: Who has a more healthy lifestyle?
Description: Compare the lifestyle of all three users within the dimension of personal health and wellness.
Narrative: There are many aspects to a healthy lifestyle, such as the amount of exercise, the food and drink consumed, environmental factors, the level of social interactions, and sleep time. This topic is seeking to understand which of the users would be considered to be the most healthy. Any dimension (or combination of dimensions) of healthy lifestyle is considered acceptable as a point of comparison.
167. Task 1: Lifelog Semantic Access
• Find the moment(s) where I use my coffee machine.
• Find the moment(s) where I am in the kitchen.
• Find the moment(s) where I am playing with my phone.
• Find the moment(s) where I am preparing breakfast.
http://ntcir-lifelog.computing.dcu.ie/
168. Task 2: Lifelog Insight Task
• Provide insights on the time I spend having breakfast.
• Provide insights on the time I spend driving to work.
• Provide insights on the time I spend reading a paper.
• Provide insights on the time I spend working on the computer.
http://ntcir-lifelog.computing.dcu.ie/
169. Evaluation (Task 1)
• Automatic runs assume that there was no user involvement in the search process beyond specifying the query. The search system generates a ranked list of up to 100 moments for each topic, with no time limit.
• Interactive runs assume that there is a user involved in the search process who generates a query and selects which moments are considered correct for each topic.
Ø 1. In interactive runs, the maximum time allowed for any topic is 5 minutes.
Ø 2. We used the time elapsed to calculate run performance at different time cut-offs. The cut-offs were selected as 10s, 30s, 60s, 120s, 300s.
• Evaluation metrics (see the sketch after this slide)
Ø Mean Average Precision (MAP)
Ø Normalised Discounted Cumulative Gain (NDCG)
http://ntcir-lifelog.computing.dcu.ie/
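For reference, NDCG for one run can be computed from the per-moment relevance grades as follows (a standard textbook formulation, not the task's official scorer):

```python
import math

def ndcg(gains, k=None):
    """Normalized discounted cumulative gain of one ranked run.

    gains: relevance grades in ranked order (e.g., [2, 0, 1, ...]).
    k: optional cut-off rank; defaults to the full list length.
    """
    k = k or len(gains)
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))
    ideal = sorted(gains, reverse=True)
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal[:k]))
    return dcg / idcg if idcg else 0.0

# Example: a relevant moment at ranks 1 and 3
print(ndcg([1, 0, 1, 0], k=3))
```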