Semantic Indexing of Wearable
Camera Images: Kids’Cam
Concepts
Alan F. Smeaton
(Dublin City University)
… and …
... Kevin McGuinness and Cathal Gurrin and Jiang Zhou
and Noel E. O’Connor
and Peng Wang
and Brian Davis and Lucas Azevedo
and Andre Freitas
and Louise Signal and Moira Smith and James Stanley
and Michelle Barr and Tim Chambers and Cliona Ní
Mhurchu
Overview
• Automatic assignment of one-per-class concept detectors
is now commonplace.
• We’re interested in the challenging case of processing
images from wearable cameras where improvement is
necessary.
• We try to exploit some limited manual annotations to
improve accuracy of automatic concept weights.
• This work is not complete; it's ongoing, but the story is
interesting.
Analysis of Visual Media
• More progress made within the last few years than in previous decade
• Incorporation of deep learning plus availability of huge searchable
image resources and training data
• Automatic image tagging is now hosted
and offered by websites like Aylien,
Imagga, Clarifai, and others, and is very
cost-effective.
Analysis of Visual Media
• These developments are welcome … but … restrictive tagging
vocabularies.
• How to map to users formulating queries
• Alternative approach is tagging at query time but it's expensive and not
scalable to huge collections.
• Almost all work on concept detection based on one concept at a time.
• TRECVid tried simultaneous detection of concept pairs like “computer-
screen with telephone”, and “airplane with clouds”.
• Limited success but “Government Leader with Flag” was OK !
• Detection of concepts independently needs a course-correction
because:
– Doesn’t avail of all available information sources
– Doesn’t map to a user’s search vocabulary
Long-term approach …
Images Concept Set
Mapping
User Search
vocabulary
How can a single image be mapped to two different vocabularies ?
Using NL for image search … tagging
• NL is fraught with complexities, ambiguities at all levels ..
– Lexical level polysemy
– Syntactic level structural ambiguity
– Semantic interpretations
– Discourse level pronoun resolution
• + vocabulary limitations when finding word or phrase to describe
something
• When using computers to help search for image data, language
challenges are exacerbated yet we assume a “simplistic” approach of
tagging by a set of concepts, notwithstanding what we’re seeing with
captioning here today
• Tagging is very useful for smaller, niche applications in restricted
domains with manual tagging, but we see scalability problems
– Addressed with progress in automatic tagging but we’re tolerant of
inaccuracies !
In this paper …
• We are interested in images from wearable cameras with lots of juicy
challenges.
• Notoriously difficult to process automatically because …
– Blurring caused by wearer moving at image capture
– Occlusions from wearer’s hands
– Lighting conditions
– Fisheye lens for wider perspective causing distortion
– First person viewpoint but not what wearer sees
– Content varies hugely across subjects
• Applications in memory support, behaviour recording and analysis,
security, other work-related, and QS.
• In this paper we work with wearable camera data from school children,
for analysis of their environments
Wearable Camera Images
The Kids’Cam Project
• Child obesity is a significant public health concern, worldwide.
• Unequivocal evidence that marketing of energy-dense and nutrient-
poor foods and beverages is a causal factor in child obesity.
• Evidence of children’s total exposure to advertising of poor foodstuffs
is not quantified.
• Kids’Cam study aimed to determine the frequency, nature and duration
of children’s exposure to such marketing.
• 169 randomly selected children aged 11 to 13 from 16 schools in
Wellington, NZ, each wore an Autographer and carried a GPS for 4
days. Images every 7 seconds, GPS every 5 seconds.
– 1.5M images, 2.5M GPS datapoints
• Manual annotation for food / beverage marketing using a 3-level, 53
concept ontology. Inter-annotator reliability of 90%.
[Figure: images + GPS + Date/Time + User, with Manual Annotation (85 concepts)]
Shop front > sign > sugary drinks/juices
Convenience store indoors > in-store marketing > convenience store
School > sign > fast food
Processing the Kids’Cam Data
• Following integration of different data sources and after the
manual annotation of images, we processed the image
collection in the following way …
[Figure: processing pipeline with numbered steps 1 to 14. Labels: + GPS + Date/Time + User; 85 concepts; X 1.5M images; GPU; Concept Models; 6,000 concepts; TFR]
7. Using a CNN to apply tags
to images. We used the
VGG-16 network, a deep
CNN, trained on 1,000
object classes using 1.2M
images from ImageNet
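
As a rough illustration of step 7, the sketch below turns a vector of raw class scores into tag probabilities with a softmax and keeps the highest-scoring tags. It does not run VGG-16: the labels and scores are made-up placeholders, and `softmax` and `top_k_tags` are hypothetical helper names rather than anything from the slides.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_tags(labels, scores, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Placeholder labels and scores, standing in for a 1,000-class network output.
labels = ["water bottle", "backpack", "desk", "street sign"]
scores = [2.1, 0.3, 1.2, -0.5]
tags = top_k_tags(labels, scores, k=2)
```

In a real pipeline the scores would come from the network's final layer, one per ImageNet class.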
8. Trained models were
used to predict probabilities
for each concept in each of
1.5M images. Processed in
batches of 64 on NVIDIA
GPU, taking 4 days to
complete
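
The batching in step 8 can be sketched as below, assuming the images are addressed by a list of identifiers. `batches` and `predict_batch` are hypothetical names, and `predict_batch` is a stand-in that returns dummy scores instead of calling a model; the count at the end treats "1.5M" as exactly 1,500,000, which is an approximation.

```python
import math

def batches(items, batch_size=64):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def predict_batch(batch):
    """Hypothetical stand-in for a model call; returns one score list per image."""
    return [[0.0] for _ in batch]

image_ids = list(range(200))  # toy collection; the real one has about 1.5M images
all_scores = []
for batch in batches(image_ids, batch_size=64):
    all_scores.extend(predict_batch(batch))

# Number of batches needed for 1,500,000 images at batch size 64.
n_batches_full = math.ceil(1_500_000 / 64)
```

The last batch is simply shorter than the others, so no padding is needed.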
Training-Free Refinement
• Current concept-at-a-time classifiers do not consider inter-
concept relationships or dependencies yet these do exist
• To improve one-per-class detectors, we post-process detection
scores
– We take advantage of concept co-occurrence and re-
occurrence which depend on the particular collection
– We take advantage of local (temporal) neighbourhood
information where concepts are likely to re-occur close in
time
– We use GPS location information where concepts identified
by a person at a location, may re-occur subsequently at that
same location
• TFR is based on non-negative matrix factorisation, described
elsewhere
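
The slides do not spell out TFR itself, so the following is only a generic sketch of non-negative matrix factorisation, using the standard multiplicative update rules on a small images-by-concepts score matrix. The matrix values, the rank, and the function name `nmf` are illustrative assumptions; it needs NumPy and is not the authors' TFR implementation.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorise non-negative V (images x concepts) as W @ H.

    Uses the standard multiplicative updates, which keep W and H
    non-negative while reducing the Frobenius reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy detection scores: 6 consecutive images x 4 concepts (made-up values).
V = np.array([
    [0.9, 0.8, 0.1, 0.0],
    [0.8, 0.9, 0.0, 0.1],
    [0.9, 0.1, 0.1, 0.0],  # second concept here resembles a missed detection
    [0.1, 0.0, 0.9, 0.8],
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.9, 0.7],
])
W, H = nmf(V, rank=2)
V_smoothed = W @ H  # low-rank reconstruction of the score matrix
```

In a low-rank reconstruction, concepts that tend to co-occur share factors, which is the general intuition behind using co-occurrence to adjust individual detection scores.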
9. We then applied Training-Free
Refinement, described previously,
to improve probability assignments
• We do not know accuracy of assignment of 1,000 concepts but we
know accuracy of assignment of 53 concepts …and we have 1.5M
images each mapped into 2 concept spaces
• Can we adjust values in (b), anchored and pivoting around (a) in
addition to having already used local, within-collection distributions ?
[Figure: concept spaces plotted on axes x1, x2 and y1, y2, with points a1, a2, b1, b2. Panels: (a) Manual, correct; (b) Automatic, unknown accuracy; (c) points a1’, a2’, b1’, b2’]
Cross-mapping concept spaces
• Distributional semantics – corpus-driven approach – based
on hypothesis that co-occurring words in similar contexts
have similar meaning
• Using word2vec in DINFRA, we can
map all words in a vocabulary to an
n-dimensional vector space, where
we can obtain relatedness scores
among the words
• Figure illustrates an example
• For each image in Kids’Cam we can
evaluate relatedness between human
annotation and automatic concepts
with highest-probability
School > availability > drink bottle
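
A relatedness score between two words in a vector space is often computed as the cosine of the angle between their vectors. The sketch below assumes that choice; the slides do not say which measure DINFRA uses, and the three-dimensional vectors are made up for illustration (real word2vec vectors have far more dimensions).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up word vectors; not taken from any trained model.
vectors = {
    "bottle": [0.9, 0.1, 0.2],
    "drink":  [0.8, 0.2, 0.1],
    "fence":  [0.1, 0.9, 0.3],
}
manual_tag = "drink"
relatedness = {word: cosine(vectors[manual_tag], vec)
               for word, vec in vectors.items() if word != manual_tag}
```

With these toy vectors, "bottle" scores as more related to "drink" than "fence" does.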
• We have top-ranked
concepts, their
confidences, their
relatedness to the
manual tags …
• First effort is to simply
multiply, as in the Table, but
it's hard to see the
impact of this
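
The multiplication described above can be sketched as follows, assuming each automatic concept has a detector confidence and a relatedness score to the image's manual tag. All values and the helper name `combine` are made up for illustration.

```python
def combine(confidences, relatedness):
    """Multiply each concept's detector confidence by its relatedness score."""
    return {c: confidences[c] * relatedness.get(c, 0.0) for c in confidences}

# Made-up values for one image whose manual tag involves a drink bottle.
confidences = {"water bottle": 0.40, "backpack": 0.45, "street sign": 0.15}
relatedness = {"water bottle": 0.90, "backpack": 0.30, "street sign": 0.10}

combined = combine(confidences, relatedness)
reranked = sorted(combined, key=combined.get, reverse=True)
```

Here the product moves "water bottle" ahead of "backpack" even though its raw confidence is lower, which is the kind of re-weighting a manual annotation could induce.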
And the result is …
• … and that’s where we currently are !
Conclusions and Future Work
• Since automatic concept-detection using pre-defined models has
made so much progress recently, we’re seeing vocabulary / concept
space mis-matching
• Using 1.5M Kids’Cam images from wearable cameras, we have used
within-collection distributions to “smooth” concept weights (outliers and
gaps) in TFR
• We are trying to pivot around some manual annotations in order to
improve concept accuracies
• But, we need …
– More concepts – a richer vocabulary of them
– More varied manual annotations, not just fast food adverts
– A more global or collection-wide way to combine concept
confidences and relatedness to known manual annotations
– Some validation of accuracy of automatic concepts to measure
accuracy of our post-processing
Finally, a plug …
• TRECVid Video captioning Pilot task 2016
• 2,000 x Vine Videos, manually annotated with
captions, twice
• 8 participating groups (CMU, CUHK, DCU, GMU,
NII, UvA, Sheffield)
• Two tasks …
– For each video, rank the 2,000 captions –
metric is MRR
– For each video, generate your own caption –
metrics are BLEU, METEOR, and UMBC STS
(Semantic Textual Similarity) Service
• Lots of lessons learned and will build upon for full
task in 2017, probably using Vine videos
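
For the ranking task, MRR (mean reciprocal rank) averages 1/rank of the correct caption over all videos. A minimal sketch, with made-up ranks and a hypothetical function name:

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank, where rank is the 1-based position of the correct item."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Made-up ranks of the correct caption for four videos.
ranks = [1, 2, 4, 10]
mrr = mean_reciprocal_rank(ranks)
```

A system that always ranks the correct caption first scores 1.0; lower ranks pull the score towards 0.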

"Semantic Indexing of Wearable Camera Images: Kids’Cam Concepts"

  • 1. Semantic Indexing of Wearable Camera Images: Kids’Cam Concepts Alan F. Smeaton (Dublin City University) … and …
  • 2. ... Kevin McGuinness and Cathal Gurrin and Jiang Zhou and Noel E. O’Connor and Peng Wang and Brian Davis and Lucas Azevedo and Andre Freitas and Louise Signal and Moira Smith and James Stanley and Michelle Barr and Tim Chambers and Cliona Ní Mhurchu
  • 3. Overview • Automatic assignment of one-per-class concept detectors is now commonplace. • We’re interested in the challenging case of processing images from wearable cameras where improvement is necessary. • We try to exploit some limited manual annotations to improve accuracy of automatic concept weights. • This work is not complete, its ongoing, but the story is interesting.
  • 4. Analysis of Visual Media • More progress made within the last few years than in previous decade • Incorporation of deep learning plus availability of huge searchable image resources and training data • Automatic image tagging is now hosted and offered by website like Aylien, Imagga, Clarifai, and others, and very cost-effective.
  • 5. Analysis of Visual Media • These developments are welcome … but … restrictive tagging vocabularies. • How to map to users formulating queries • Alternative approach is tagging at query time but its expensive and not scalable to huge collections. • Almost all work on concept detection based on one concept at a time. • TRECVid tried simultaneous detection of concept pairs like “computer- screen with telephone”, and “airplane with clouds”. • Limited success but “Government Leader with Flag” was OK ! • Detection of concepts independently needs a course-correction because: – Doesn’t avail of all available information sources – Doesn’t map to a user’s search vocabulary
  • 6. Long-term approach … Images Concept Set Mapping User Search vocabulary How can a single image be mapped to two different vocabularies ?
  • 7. Using NL for image search … tagging • NL is fraught with complexities, ambiguities at all levels .. – Lexical level polysemy – Syntactic level structural ambiguity – Semantic interpretations – Discourse level pronoun resolution • + vocabulary limitations when finding word or phrase to describe something • When using computers to help search for image data, language challenges are exacerbated yet we assume a “simplistic” approach of tagging by a set of concepts, notwithstanding what we’re seeing with captioning here today • Tagging is very useful for smaller, niche applications in restricted domains with manual tagging, but we see scalability problems – Addressed with progress in automatic tagging but we’re tolerant of inaccuracies !
  • 8. In this paper … • We are interested in images from wearable cameras with lots of juicy challenges. • Notoriously difficult to process automatically because … – Blurring caused by wearer moving at image capture – Occlusions from wearer’s hands – Lighting conditions – Fisheye lens for wider perspective causing distortion – First person viewpoint but not what wearer sees – Content varies hugely across subjects • Applications in memory support, behaviour recording and analysis, security, other work-related, and QS. • In this paper we work with wearable camera data from school children, for analysis of their environments
  • 10. The Kids’Cam Project • Child obesity is a significant public health concern, worldwide. • Unequivocal evidence that marketing of energy-dense and nutrient- poor foods and beverages is a causal factor in child obesity. • Evidence of children’s total exposure to advertising of poor foodstuffs is not quantified. • Kids’Cam study aimed to determine the frequency, nature and duration of children’s exposure to such marketing. • 169 randomly selected children 11 to 13 yo from 16 schools in Wellington, NZ, each wore an Autographer and carried a GPS for 4 days .. .mages every 7 seconds, GPS every 5 seconds. – 1.5M images, 2.5M GPS datapoints • Manual annotation for food / beverage marketing using a 3-level, 53 concept ontology .. Inter-annotator reliability of 90%.
  • 11. + GPS + Date/Time + User xxxx yyyy zzzz xxxx yyyy zzzz xxxx yyyy zzzz xxxx yyyy zzzz 85 concepts Manual Annotation
  • 12. Shop front > sign > sugary drinks/juices
  • 13. Convenience store indoors > in-store marketing > convenience store
  • 14. School > sign > fast food
  • 15. Processing the Kids’Cam Data • Following integration of different data sources and after the manual annotation of images, we processed the image collection in the following way …
  • 16. 14 + GPS + Date/Time + User xxxx yyyy zzzz xxxx yyyy zzzz xxxx yyyy zzzz xxxx yyyy zzzz 85 concepts : X 1.5M images aaaa bbbb cccc aaaa bbbb cccc aaaa bbbb cccc aaaa bbbb cccc aaaa bbbb cccc aaaa bbbb cccc TFR aaaa’ bbbb’ cccc’ aaaa’ bbbb’ cccc’ aaaa’ bbbb’ cccc’ aaaa’ bbbb’ cccc’ aaaa’ bbbb’ cccc’ aaaa’ bbbb’ cccc’ GPU Concept Models 6,000 concepts xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx aaaaa bbbb ccccc aaaa bbbb ccccc aaaa bbbb ccccc aaaa bbbb ccccc 8 7 6 5 4 3 2 1 9 13 12 11 10
  • 17. 14 + GPS + Date/Time + User xxxx yyyy zzzz xxxx yyyy zzzz xxxx yyyy zzzz xxxx yyyy zzzz 85 concepts : X 1.5M images aaaa bbbb cccc aaaa bbbb cccc aaaa bbbb cccc aaaa bbbb cccc aaaa bbbb cccc aaaa bbbb cccc TFR aaaa’ bbbb’ cccc’ aaaa’ bbbb’ cccc’ aaaa’ bbbb’ cccc’ aaaa’ bbbb’ cccc’ aaaa’ bbbb’ cccc’ aaaa’ bbbb’ cccc’ GPU Concept Models 6,000 concepts xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx aaaaa bbbb ccccc aaaa bbbb ccccc aaaa bbbb ccccc aaaa bbbb ccccc 8 7 6 5 4 3 2 1 9 13 12 11 10
  • 18. 14 + GPS + Date/Time + User xxxx yyyy zzzz xxxx yyyy zzzz xxxx yyyy zzzz xxxx yyyy zzzz 85 concepts : X 1.5M images aaaa bbbb cccc aaaa bbbb cccc aaaa bbbb cccc aaaa bbbb cccc aaaa bbbb cccc aaaa bbbb cccc TFR aaaa’ bbbb’ cccc’ aaaa’ bbbb’ cccc’ aaaa’ bbbb’ cccc’ aaaa’ bbbb’ cccc’ aaaa’ bbbb’ cccc’ aaaa’ bbbb’ cccc’ GPU Concept Models 6,000 concepts xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx xxxx xxxxx xxxxx aaaaa bbbb ccccc aaaa bbbb ccccc aaaa bbbb ccccc aaaa bbbb ccccc 8 7 6 5 4 3 2 1 9 13 12 11 10 7. Using a CNN to apply tags to images. We used the VGG-16 network, a deep CNN, trained on 1,000 object classes using 1.2M images from ImageNet
  • 19. [Figure: processing pipeline diagram] 8. Trained models were used to predict probabilities for each concept in each of the 1.5M images. Processed in batches of 64 on an NVIDIA GPU, taking 4 days to complete
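The batched prediction in step 8 can be sketched roughly as follows. This is only an illustration of splitting a large image list into batches of 64. The names `batches`, `predict_all` and `dummy_predict_batch` are hypothetical, and the dummy predictor stands in for the real GPU model, which is not shown on the slides.

```python
from typing import Callable, Iterator, List, Sequence


def batches(items: Sequence[str], batch_size: int = 64) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def predict_all(
    image_paths: Sequence[str],
    predict_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 64,
) -> List[List[float]]:
    """Run predict_batch over every batch and collect one score row per image."""
    scores: List[List[float]] = []
    for batch in batches(image_paths, batch_size):
        scores.extend(predict_batch(batch))
    return scores


def dummy_predict_batch(batch: Sequence[str]) -> List[List[float]]:
    """Hypothetical stand-in for the GPU model: two fixed concept scores per image."""
    return [[0.5, 0.5] for _ in batch]


if __name__ == "__main__":
    paths = [f"img_{i}.jpg" for i in range(130)]
    print(len(predict_all(paths, dummy_predict_batch)))
```

Taking the slide's round figures at face value, 1.5M images in batches of 64 is about 23,438 batches, and 4 days of processing works out to roughly 4.3 images per second.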
  • 20. Training-Free Refinement (TFR) • Current concept-at-a-time classifiers do not consider inter-concept relationships or dependencies, yet these do exist • To improve one-per-class detectors, we post-process detection scores – We take advantage of concept co-occurrence and re-occurrence, which depend on the particular collection – We take advantage of local (temporal) neighbourhood information, where concepts are likely to re-occur close in time – We use GPS location information, where concepts identified by a person at a location may re-occur subsequently at that same location • TFR is based on non-negative matrix factorisation, described elsewhere
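To give a feel for the temporal-neighbourhood idea only, here is a deliberately simplified sketch that averages each concept score with the scores of adjacent images in the sequence. It is not TFR itself, which the slide says is based on non-negative matrix factorisation and is described elsewhere. The function name `smooth_scores` and the `window` parameter are assumptions made for this illustration.

```python
from typing import List


def smooth_scores(scores: List[List[float]], window: int = 1) -> List[List[float]]:
    """Average each concept score over a temporal window of neighbouring images.

    scores[i][c] is the detection score of concept c in the i-th image,
    with images ordered by capture time.
    """
    n = len(scores)
    smoothed: List[List[float]] = []
    for i in range(n):
        # Clip the window at the start and end of the sequence.
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbourhood = scores[lo:hi]
        row = [
            sum(r[c] for r in neighbourhood) / len(neighbourhood)
            for c in range(len(scores[i]))
        ]
        smoothed.append(row)
    return smoothed


if __name__ == "__main__":
    # One concept, three consecutive images: the middle score looks like a gap.
    print(smooth_scores([[0.9], [0.1], [0.9]]))
```

In this toy run the low middle score is pulled up towards its neighbours, which is the kind of gap-filling that re-occurrence close in time is meant to support.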
  • 21. [Figure: processing pipeline diagram] 9. As previously described, we then applied Training-Free Refinement to improve probability assignments
  • 22. • We do not know the accuracy of assignment of the 1,000 concepts, but we know the accuracy of assignment of 53 concepts … and we have 1.5M images, each mapped into 2 concept spaces • Can we adjust values in (b), anchored and pivoting around (a), in addition to having already used local, within-collection distributions? [Figure labels: x1, x2, y1, y2, a1, a2, b1, b2; (a) Manual, correct; (b) Automatic, unknown accuracy]
  • 24. Cross-mapping concept spaces • Distributional semantics – corpus-driven approach – based on the hypothesis that words co-occurring in similar contexts have similar meanings • Using word2vec in DINFRA, we can map all words in a vocabulary to an n-dimensional vector space, where we can obtain relatedness scores among the words • Figure illustrates an example • For each image in Kids’Cam we can evaluate relatedness between the human annotation and the automatic concepts with the highest probability
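Relatedness between two words in a vector space is commonly computed as the cosine of the angle between their vectors. The sketch below shows that calculation under stated assumptions: the three-dimensional vectors in `TOY_VECTORS` are made up for illustration and do not come from word2vec or DINFRA, and the function names are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical toy vectors; real word2vec vectors have hundreds of dimensions.
TOY_VECTORS: Dict[str, List[float]] = {
    "school": [0.9, 0.1, 0.0],
    "classroom": [0.8, 0.2, 0.1],
    "bottle": [0.0, 0.2, 0.9],
}


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def relatedness(word_a: str, word_b: str, vectors: Dict[str, List[float]] = TOY_VECTORS) -> float:
    """Relatedness score between two words that both appear in the vector table."""
    return cosine(vectors[word_a], vectors[word_b])


if __name__ == "__main__":
    print(relatedness("school", "classroom"))
    print(relatedness("school", "bottle"))
```

With these toy vectors, "school" scores as more related to "classroom" than to "bottle", which is the kind of comparison made between a manual tag and each automatic concept.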
  • 25. School > availability > drink bottle
  • 26. • We have top-ranked concepts, their confidences, their relatedness to the manual tags … • First effort is to simply multiply confidence by relatedness, as in the Table, but it's hard to see the impact of this
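The "simply multiply" step can be sketched as below: each automatic concept's confidence is multiplied by its relatedness to the manual tag, and the concepts are re-ranked on the product. The function name `rescore`, the example concepts and all the numbers are hypothetical, and treating a concept with no relatedness score as 0.0 is an assumption of this sketch, not something stated on the slides.

```python
from typing import Dict, List, Tuple


def rescore(
    concepts: List[Tuple[str, float]],
    relatedness: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Multiply each concept's confidence by its relatedness and sort descending.

    concepts: (concept name, detector confidence) pairs for one image.
    relatedness: concept name -> relatedness to the image's manual tag.
    """
    rescored = [
        (name, confidence * relatedness.get(name, 0.0))  # missing score treated as 0.0
        for name, confidence in concepts
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    top_concepts = [("monitor", 0.7), ("water bottle", 0.6)]
    to_manual_tag = {"monitor": 0.2, "water bottle": 0.9}
    for name, score in rescore(top_concepts, to_manual_tag):
        print(name, round(score, 2))
```

In this made-up example the less confident but more related concept moves to the top of the ranking.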
  • 27. And the result is … • … and that’s where we currently are !
  • 28. Conclusions and Future Work • Since automatic concept detection using pre-defined models has made so much progress recently, we’re seeing vocabulary / concept space mismatches • Using 1.5M Kids’Cam images from wearable cameras, we have used within-collection distributions to “smooth” concept weights (outliers and gaps) in TFR • We are trying to pivot around some manual annotations in order to improve concept accuracies • But, we need … – More concepts – a richer vocabulary of them – More varied manual annotations, not just fast food adverts – A more global or collection-wide way to combine concept confidences and relatedness to known manual annotations – Some validation of accuracy of automatic concepts to measure accuracy of our post-processing
  • 29. Finally, a plug … • TRECVid Video Captioning Pilot Task 2016 • 2,000 x Vine videos, manually annotated with captions, twice • 8 participating groups (CMU, CUHK, DCU, GMU, NII, UvA, Sheffield) • Two tasks … – For each video, rank the 2,000 captions – metric is MRR – For each video, generate your own caption – metrics are BLEU, METEOR, and UMBC STS (Semantic Textual Similarity) Service • Lots of lessons learned, which we will build upon for the full task in 2017, probably using Vine videos
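For readers unfamiliar with the ranking metric, mean reciprocal rank (MRR) averages 1/rank of the correct caption over all videos. The sketch below is a minimal illustration, assuming one correct caption per video (the pilot task had two sets of captions, which this simplification ignores). The function name and the identifiers in the example are hypothetical.

```python
from typing import Dict, List


def mean_reciprocal_rank(rankings: Dict[str, List[str]], correct: Dict[str, str]) -> float:
    """Mean of 1/rank of the correct caption over all videos.

    rankings: video id -> caption ids ordered from best to worst.
    correct: video id -> id of the caption that belongs to that video.
    A video whose correct caption is absent from its ranking contributes 0.
    """
    total = 0.0
    for video_id, ranked in rankings.items():
        target = correct[video_id]
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)  # ranks start at 1
    return total / len(rankings)


if __name__ == "__main__":
    rankings = {"v1": ["c1", "c2"], "v2": ["c1", "c2", "c3"]}
    correct = {"v1": "c1", "v2": "c3"}
    print(mean_reciprocal_rank(rankings, correct))
```

Here the correct caption is ranked first for one video and third for the other, so the score is the mean of 1 and 1/3.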