Can Deep Learning and Egocentric Vision for Visual Lifelogging help us eat better?
1. Can Deep Learning and Egocentric Vision for Visual Lifelogging help us eat better?
Petia Radeva
www.cvc.uab.es/~petia
Computer Vision at UB (CVUB), Universitat de Barcelona &
Medical Imaging Laboratory, Computer Vision Center
5. Rememory: Life-logging for MCI treatment
Project led by Dr. Maite Garolera of the Consorci Sanitari de Terrassa.
Goal: use episodic images to develop cognitive exercises and tools for memory reinforcement in people with MCI and Alzheimer's disease.
But episodic images serve for something more than reinforcing memory: they show the lifestyle of individuals!
8. Obesity in Catalunya
51% of the Catalan population aged 18 to 74 is overweight; 15% are obese.
Excess weight affects 62% of people without university studies vs. 36% of those with higher education.
9. The obesity pandemic
Overweight and obesity are risk factors for cancers and for cardiovascular and metabolic disorders, and among the leading causes of premature mortality worldwide.
In Europe, 4.2 million people die of chronic diseases (such as diabetes or cancer) linked to lack of physical activity and unhealthy diet.
Physical activity can increase lifespan by 1.5–3.7 years.
10. Which wearables do consumers plan to buy?
• 21M Fitbits were sold in 2015!
• The number of users is expected to double by 2018, to 81.7 million.
The Consumer Technology Association (CTA), formerly the Consumer Electronics Association (CEA), surveyed 1,001 US internet users. Source: eMarketer.
11. What are we missing in health applications?
Today, automatically measuring physical activity is not a problem. But what about food and nutrition?
12. But what about food and nutrition?
What are we missing in health applications?
State of the art: nutritional health apps are based on manual food diaries:
• SparkPeople
• Lose It!
• MyFitnessPal
• Cronometer
• FatSecret
20. White House wants the nation to get ready for AI
October, 2016
http://readwrite.com/2016/10/16/white-house-offers-artificial-intelligence-plan-cl1/
23. The learning process
Training data: {(x_i, y_i), i = 1, 2, …, n}.
Learning minimizes the total error between predictions f(x_i) and ground truth y_i:
f* = argmin_f Σ_i Error(f(x_i), y_i)
where Error is a measure of prediction quality (the loss); ideally this sum approximates an expectation over the data distribution.
A common loss function is the negative conditional log-likelihood, with the interpretation that f_i(x) estimates P(Y = i | X):
L(f(x), y) = −log f_y(x), where f_i(x) ≥ 0 and Σ_i f_i(x) = 1.
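This loss can be sketched in a few lines of Python; here the class probabilities f_i(x) come from a softmax over raw class scores (function names are illustrative):

```python
import math

def softmax(scores):
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def nll_loss(scores, y):
    # Negative conditional log-likelihood: L(f(x), y) = -log f_y(x),
    # where f(x) = softmax(scores) satisfies f_i(x) >= 0 and sum_i f_i(x) = 1.
    probs = softmax(scores)
    return -math.log(probs[y])

# The higher the score of the true class, the lower the loss.
loss_good = nll_loss([5.0, 0.1, -2.0], y=0)
loss_bad = nll_loss([5.0, 0.1, -2.0], y=2)
```

Note that the loss is always non-negative and reaches zero only when the model puts all probability mass on the true class.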
24. The problem of image classification
Dual representation of images as points/vectors: each image of M rows by N columns by C channels (C = 3 for color images) can be considered as a vector/point in R^(M×N×C), and vice versa.
For example, a 32×32×3 image is a point in R^(32×32×3) = R^3072.
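The dual representation can be checked in a few lines of NumPy (the 32×32×3 shape matches the slide's example):

```python
import numpy as np

# A 32x32 RGB image is a point in R^(32*32*3) = R^3072, and vice versa.
image = np.random.rand(32, 32, 3)
vector = image.reshape(-1)            # image -> vector in R^3072
restored = vector.reshape(32, 32, 3)  # vector -> image, losslessly
```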
25. Linear classification
Given two classes, how do we learn a hyperplane in R^(32×32×3) that separates them?
To find the hyperplane that separates dogs from cats, we need to define:
• the score function,
• the loss function,
• and the optimization process.
26. Linear classification
How to project data into the feature space:
f(x) = W x + b
If x is an image of 32×32×3, then x ∈ R^3072, the matrix W is 3×3072, and the bias vector b is 3-dimensional:
(3×1) = (3×3072)(3072×1) + (3×1).
27. Linear classification
How to project data into the feature space:
f(x) = W x + b
If we have 3 classes, f(x) gives 3 scores, one per class:
(3×1) = (3×3072)(3072×1) + (3×1).
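The shapes above can be verified directly in NumPy (a minimal sketch; the random values are placeholders for learned weights):

```python
import numpy as np

def linear_scores(x, W, b):
    # Score function f(x) = W x + b: one score per class.
    return W @ x + b

x = np.random.rand(3072)     # a flattened 32x32x3 image
W = np.random.rand(3, 3072)  # one row of weights per class (3 classes)
b = np.random.rand(3)        # one bias per class

scores = linear_scores(x, W, b)  # shape (3,): 3 classes -> 3 scores
```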
29. Loss function and optimisation
Question: if you were to assign a single number to how unhappy you are with these scores, what would you do?
Question: given the score and the loss function, how do we find the parameters W?
Pipeline: input x_i (with label y_i) → score function f(x_i, W) → loss function L(f(x_i), y_i). Learning searches for the W that minimizes the loss.
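Finding W is typically done by gradient descent on the loss. A minimal sketch with a softmax score-plus-loss on a single toy example (dimensions and learning rate are illustrative):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())  # shift for numerical stability
    return e / e.sum()

def loss_and_grad(W, x, y):
    # Score function f(x; W) = W x; loss L = -log softmax(W x)[y].
    p = softmax(W @ x)
    loss = -np.log(p[y])
    p[y] -= 1.0              # dL/dscores for softmax + negative log-likelihood
    grad = np.outer(p, x)    # dL/dW
    return loss, grad

rng = np.random.default_rng(0)
W = 0.01 * rng.normal(size=(3, 4))  # 3 classes, 4 input features
x, y = rng.normal(size=4), 1

# A few gradient-descent steps drive the loss down.
losses = []
for _ in range(100):
    loss, grad = loss_and_grad(W, x, y)
    losses.append(loss)
    W -= 0.1 * grad          # step against the gradient
```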
30. How is a CNN doing deep learning?
First layer: each unit computes a weighted sum of the input image pixels, y_j = Σ_i W_ji x_i (with weights W_j1, W_j2, …, W_jn per unit), i.e. y = W x.
Second layer: y = W₂(W₁x).
Stacking further fully connected layers composes these maps, y = W₃(W₂(W₁x)), up to the output layer.
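The layer composition can be sketched in NumPy (layer widths here are illustrative; the random weights stand in for learned ones):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=3072)          # flattened 32x32x3 input image
W1 = rng.normal(size=(100, 3072))  # first fully connected layer
W2 = rng.normal(size=(100, 100))   # second layer
W3 = rng.normal(size=(10, 100))    # output layer: 10 class scores

# Each unit computes y_j = sum_i W_ji x_i; stacking layers composes the maps.
y = W3 @ (W2 @ (W1 @ x))

# Without a nonlinearity between layers, the composition collapses to a
# single linear map W3 W2 W1 (real networks insert e.g. a ReLU between layers).
y_collapsed = (W3 @ W2 @ W1) @ x
```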
31. Why is a CNN a neural network?
Modern CNNs: ~10M neurons.
Humans: ~5B neurons.
From: Fei-Fei Li, Andrej Karpathy & Justin Johnson
36. Example architecture
The trick is to train the weights such that when the network sees a picture of a truck, the last layer says “truck”.
Slide credit: Fei-Fei Li
37. Training a CNN
Training a CNN means learning all of its parameters: the convolutional matrices (filters) and the weights of the fully connected layers.
- Several millions of parameters!!!
38. 1001 benefits of CNNs
• Transfer learning: fine-tuning for object recognition
  - Replace and retrain the classifier on top of the ConvNet
  - Fine-tune the weights of the pre-trained network by continuing the backpropagation
• Feature extraction by CNN (e.g., the 4096 features of the last fully connected layer)
• Object detection
• Object segmentation
• Image similarity and matching by CNN
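The "replace and retrain the classifier" idea can be sketched without any deep learning library. In this toy NumPy stand-in, a frozen random projection plays the role of the pre-trained ConvNet's feature extractor (scaled down from the real 4096 features; all sizes and names are illustrative, not the method from the talk):

```python
import numpy as np

rng = np.random.default_rng(2)

# Scaled-down stand-in for a pre-trained ConvNet: a FROZEN map to a
# feature space (playing the role of the CNN's 4096-d features).
D_in, D_feat, n_classes = 300, 512, 2
W_frozen = rng.normal(size=(D_feat, D_in)) / np.sqrt(D_in)

def extract_features(x):
    # "Feature extraction by CNN": forward pass through the frozen layers.
    return np.maximum(0.0, W_frozen @ x)

def train_new_head(xs, ys, lr=0.01, epochs=50):
    # Replace the classifier on top and retrain ONLY its weights;
    # W_frozen is never updated.
    feats = [extract_features(x) for x in xs]  # cache the fixed features once
    W_head = np.zeros((n_classes, D_feat))
    for _ in range(epochs):
        for f, y in zip(feats, ys):
            s = W_head @ f
            p = np.exp(s - s.max())
            p /= p.sum()
            p[y] -= 1.0                        # softmax + NLL gradient
            W_head -= lr * np.outer(p, f)      # update the head only
    return W_head

def predict(W_head, x):
    return int(np.argmax(W_head @ extract_features(x)))

# Toy "images": two classes drawn around different means.
xs = [rng.normal(loc=m, size=D_in) for m in (0.5, -0.5) for _ in range(5)]
ys = [0] * 5 + [1] * 5
W_head = train_new_head(xs, ys)
```

Caching the frozen features before the training loop mirrors what is done in practice: when only the head is retrained, the expensive ConvNet forward pass needs to run just once per image.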
39. Index
Healthy habits and food analysis
Deep learning
Automatic food analysis
Egocentric vision
40. Automatic food analysis
Can we automatically recognize food?
• Goal: to detect and classify every instance of a dish, in all of its variants, shapes and positions, in a large number of images.
The main problems that arise are:
• Complexity and variability of the data.
• Huge amounts of data to analyse.
43. Food recognition
Pipeline: image input → foodness map extraction → food detection CNN → food recognition CNN → food type recognition (e.g., apple, strawberry).
Results: TOP-1 74.7%, TOP-5 91.6%.
State of the art (Bossard, 2014): TOP-1 56.4%.
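The TOP-1/TOP-5 metrics reported above count a prediction as correct when the true class is among the model's k highest-scoring classes. A minimal sketch (the toy score matrix is illustrative):

```python
import numpy as np

def topk_accuracy(scores, labels, k):
    # A prediction is correct at top-k if the true class is among the
    # k highest-scoring classes for that image.
    topk = np.argsort(scores, axis=1)[:, -k:]
    hits = [y in row for row, y in zip(topk, labels)]
    return sum(hits) / len(labels)

scores = np.array([[0.1, 0.7, 0.2],   # 3 images x 3 classes
                   [0.5, 0.3, 0.2],
                   [0.2, 0.2, 0.6]])
labels = [2, 0, 2]                    # true class per image

top1 = topk_accuracy(scores, labels, k=1)  # image 0 is missed at top-1
top2 = topk_accuracy(scores, labels, k=2)  # but recovered at top-2
```

By construction top-k accuracy is non-decreasing in k, which is why TOP-5 (91.6%) is always at least TOP-1 (74.7%).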
44. Demo
Herruzo, P., Bolaños, M. and Radeva, P. (2016). “Can a CNN Recognize Catalan Diet?”. In Proceedings of the 8th Intl Conf. for
Promoting the Application of Mathematics in Technical and Natural Sciences (AMiTaNS).
45. Food environment classification
Food-related categories: Bakery, Banquet hall, Bar, Butcher shop, Cafeteria, Candy store, Coffee shop, Dinette, Dining room, Food court, Galley, Ice cream parlor, Kitchen, Kitchenette, Market, Pantry, Picnic area, Restaurant, Restaurant kitchen, Restaurant patio, Supermarket.
Classification results:
0.92 – Food-related vs. Non-food-related
0.68 – 22 classes of food-related categories
46. Towards automatic image description
Bolaños, M., Peris, Á., Casacuberta, F., & Radeva, P. “VIBIKNet: Visual Bidirectional Kernelized Network for the VQA
Challenge” VQA Challenge, CVPR '16.
47. Two main questions
What do we eat?
• Automatic food recognition vs. food diaries
And how do we eat?
• Automatic eating pattern extraction: when, where, how, how long, with whom, in which context?
48. Index
Healthy habits and food analysis
Deep learning
Automatic food analysis
Egocentric vision
49. Wearable cameras and the life-logging trend
Shipments of wearable computing devices worldwide by
category from 2013 to 2015 (in millions)
51. Wealth of life-logging data… or the hell of life-logging data
Complete dataset of a day captured with SenseCam (more than 4,100 images).
The choice of device depends on:
1) where it is worn: a camera hung on the neck has the advantage of being considered more unobtrusive for the user; or
2) its temporal resolution: a camera with a low frame rate will capture less motion information, but we will need to process less data.
We chose SenseCam and Narrative: cameras hung on the neck or clipped to clothing that capture 2–4 frames per minute.
We propose an energy-based approach for motion-based event segmentation of life-logging sequences of low temporal resolution:
- The segmentation is reached by integrating different kinds of image features and classifiers into a graph-cut framework to ensure consistent sequence treatment.
52. Visual life-logging data
Events to be extracted from life-logging images:
- Activities he/she has done
- Interactions he/she has participated in
- Events he/she has taken part in
- Duties he/she has performed
- Environments and places he/she has visited, etc.
Dimiccoli, M., Bolaños, M., Talavera, E., Aghaei, M., Nikolov, S., and Radeva, P. (2015). “SR-Clustering: Semantic Regularized Clustering for Egocentric Photo Streams Segmentation”. Computer Vision and Image Understanding (CVIU), in press. Preprint: http://arxiv.org/abs/1512.07143
53. Egocentric vision progress
Bolaños, M., Dimiccoli, M. & Radeva, P. (2015). “Towards Storytelling from Visual Lifelogging: An Overview”. IEEE Transactions on Human-Machine Systems (THMS), in press. Preprint: http://arxiv.org/abs/1507.06120
54. Towards healthy habits
Towards visualizing summarized lifestyle data to ease the management of the user’s healthy habits (sedentary lifestyle, nutritional activity, etc.).
M. Aghaei, M. Dimiccoli, P. Radeva. “Extended Bag-of-Tracklets for Multi-Face Tracking in Egocentric Photo Streams”. Computer Vision and Image Understanding, Vol. 149, pp. 146–156, 2016. Special Issue on Assistive Computer Vision and Robotics, Elsevier. doi: 10.1016/j.cviu.2016.02.013
55. Conclusions
Healthy habits: one of the main health concerns for people, society, and governments.
Deep learning: a technology that is here to stay, and a new technological trend that directly affects our environment.
Food analysis and recognition: a new challenge with huge potential for applications. We need food databases of millions of images and thousands of categories, and there is a wide set of problems for food analysis: recognition, segmentation, habit characterization, image and video description, etc.
Egocentric vision and life-logging: a recent trend in Computer Vision and a largely unexplored technology with big potential to help people monitor and describe their behaviour and thus improve their lifestyle.
51% of the Catalan population aged 18 to 74 suffers from significant excess weight (15% are obese); this situation affects 62% of those with no studies or only primary education, and 36% of families with a university education.
“Deep learning: In recent years, some of the most impressive advancements in machine learning have been in the subfield of deep learning, also known as deep network learning. Deep learning uses structures loosely inspired by the human brain, consisting of a set of units (or “neurons”). Each unit combines a set of input values to produce an output value, which in turn is passed on to other neurons downstream. …”
Exponential Linear Units (ELU): all the benefits of ReLU, does not die, outputs closer to zero mean, but computation requires exp().