Multimodal Analysis for Bridging Semantic Gap with Biologically Inspired Algorithm

Dr. Krishna Chandramouli
Media Engineering and Analytics Research Group
VIT University

Overview
 Who we are!!
 Media and Internet
 Information Access
 Subjective vs Objective Indexing
 The Semantic Gap
 Evolving Strategies
 Social Media Analysis
 MediaEval 2013 Participation
 Conclusion
 Q & A
04/07/2014, Uni. of Siegen

Who we are!!

Media and internet
 In March 2013, Flickr had a total of 87 million registered
members and more than 3.5 million new images uploaded daily.
 "There are currently almost 90 billion photos total on Facebook.
This means we are, by far, the largest photos site on the Internet."

Information Access
Textual search
Visual search
Search query formulation

Information Access
 Traditional ordering of images is achieved through categorization
of information into logical structures
 Creation of albums
 Categorizing through date/time
 Clustering through location
 Image based search engines are gaining popularity with the
increase in power of indexing schemes

Indexing subjective or objective

 How can images be named uniquely so that they are distinguishable?
 What names can be used to search images?
 How many names are needed to make the images unique?
 Will all humans use the same names to identify the images?

 Humans are culturally influenced
 Terms contain different meanings across boundaries and cultures
 Therefore, any tag or word assigned to an image is subjective
 Objective signatures for images are generated from the
characteristics of the images themselves
 This was the starting point of the MPEG-7 standardisation activities.

 Image characteristics exploited for objective annotation include
   Colour
       Colour Layout Descriptor
       Colour Structure Descriptor
       Dominant Colour Descriptor
       Scalable Colour Descriptor
   Texture
       Texture Browsing Descriptor
       Edge Histogram Descriptor
       Homogeneous Texture Descriptor
   Shape

The Semantic Gap
 The semantic gap characterizes the difference between two
descriptions of an object by different linguistic representations, for
instance languages or symbols.
 In computer science, the concept is relevant whenever ordinary
human activities, observations, and tasks are transferred into a
computational representation

Evolving strategies
Image Classification; Visual Classifier; Knowledge Assisted Analysis; Image Retrieval
and User Relevance Feedback; Multi-Concept Space Search and Retrieval

Evolving strategies
 The problem of image classification and clustering has been the
subject of active research for the last decade, mainly because of
the exponential growth of digital content.
 The efficiency of clustering and classification algorithms
depends on the efficiency of the underlying machine learning
approaches.
 To improve the performance of machine learning algorithms,
different optimisation techniques have been employed, such as
Genetic Algorithms.

Evolving strategies
 Recent developments in applied and heuristic optimisation
techniques have been strongly influenced and inspired by natural
and biological systems.
 Algorithms developed from such observations include
   Ant Colony Optimisation (ACO) - based on the ability of an ant colony to
    find the shortest path between the nest and the food source, compared to an
    individual ant.
   Artificial Immune System (AIS) - typically exploits the immune system's
    characteristics of learning and memory to solve a problem.
   Particle Swarm Optimisation (PSO) - inspired by the social behaviour of a
    flock of birds.

Evolving strategies
 In the study of the "Semantic Gap", machine learning algorithms are
the building blocks of the bottom-up approach.
 Some of the applications of efficient machine learning algorithms
are:
   Automatic Content Annotation
   Knowledge Extraction
   Content Retrieval
 In the top-down approach, an ontology provides a partial
understanding of human semantics.

Visual classifier
 In an effort to transform the social interaction of different species into a
computer simulation, Kennedy and Eberhart developed an optimisation
technique named Particle Swarm Optimisation.
 In theory, the universal behaviour of individuals is summarised in terms of
Evaluate, Compare and Imitate principles.

Visual classifier
 Evaluate: The tendency to evaluate stimuli – to rate them as
positive or negative, attractive or repulsive is perhaps the most
ubiquitous behavioural characteristic of living organisms.
 Compare: In almost every aspect of life, humans tend to compare
themselves with others.
 Imitate: Human imitation comprises taking the perspective of the
other person, not only imitating a behaviour but also realising its
purpose and executing the behaviour when it is appropriate.

Visual classifier
 Equations governing the motion of particles in PSO.
\[
v_{id}(t+1) = v_{id}(t) + c_1 \left( pbest_{i}(t) - x_{id}(t) \right) + c_2 \left( gbest_{d}(t) - x_{id}(t) \right)
\]
\[
x_{id}(t+1) = x_{id}(t) + v_{id}(t+1)
\]
where
$v_{id}(t)$ - represents the velocity of the particle
$pbest_{i}(t)$ - represents the personal best solution of particle $i$
$gbest_{d}(t)$ - represents the global best solution for the swarm
$x_{id}(t)$ - represents the position of the particle
$c_1, c_2$ - parameters governing the cognitive and social values

Visual classifier
 Pseudo code for the algorithm
 Step 1: Random Initialization of Particles
 Step 2: Function Evaluation
 Step 3: Computation of personal best and global best
 Step 4: Velocity update
 Step 5: Position update
 Step 6: Loop to Step 2 until the stopping criterion is met
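
The six steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments. The function name `pso`, the search bounds, the velocity clamp `v_max`, the fixed iteration budget used as stopping criterion and the random factors `r1` and `r2` (common in PSO formulations but not shown in the slide equations) are all assumptions made for this example.

```python
import random


def pso(objective, dim, n_particles=20, iters=100, c1=1.5, c2=1.5,
        lo=-5.0, hi=5.0, v_max=1.0, seed=0):
    """Minimise `objective` over `dim` dimensions with a basic PSO loop."""
    rng = random.Random(seed)

    # Step 1: random initialisation of particle positions and velocities
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[rng.uniform(-v_max, v_max) for _ in range(dim)]
         for _ in range(n_particles)]

    # Steps 2 and 3 for the initial swarm: evaluate, then record the bests
    pbest = [p[:] for p in x]
    pbest_val = [objective(p) for p in x]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    # Step 6: loop; the stopping criterion here is a fixed iteration budget
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                # Step 4: velocity update (r1, r2 are assumed random factors)
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                # Assumed safeguard: clamp the velocity to keep it bounded
                v[i][d] = max(-v_max, min(v_max, v[i][d]))
                # Step 5: position update
                x[i][d] += v[i][d]

            # Step 2: function evaluation
            val = objective(x[i])
            # Step 3: update personal best and global best
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = x[i][:], val

    return gbest, gbest_val


if __name__ == "__main__":
    # Toy objective: the sphere function, whose minimum is at the origin
    best, value = pso(lambda p: sum(c * c for c in p), dim=2)
    print(best, value)
```

The global best can only improve over the iterations, so the returned value is never worse than the best particle of the initial swarm.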

Visual classifier
 Self Organising Map
[Figure: Self Organising Map. [X] is the input feature vector. Nodes of Class 1 are shown in red, untrained nodes in black. The winner node is selected based on the L2 norm.]
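
Winner-node selection by L2 norm can be illustrated as follows. This is a simplified sketch: the helper names `winner_node` and `train_step` and the learning rate `lr` are hypothetical, the neighbourhood update of a full SOM is omitted, and the plain gradient-style update shown here stands in for the PSO-based training described in the slides.

```python
import math


def winner_node(weights, x):
    """Return the index of the node whose weight vector is closest to x (L2 norm)."""
    def l2(w):
        return math.sqrt(sum((wi - xi) ** 2 for wi, xi in zip(w, x)))
    return min(range(len(weights)), key=lambda j: l2(weights[j]))


def train_step(weights, x, lr=0.5):
    """Move the winner node towards the input vector x and return its index."""
    j = winner_node(weights, x)
    weights[j] = [wi + lr * (xi - wi) for wi, xi in zip(weights[j], x)]
    return j


if __name__ == "__main__":
    nodes = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    print(train_step(nodes, [0.9, 1.2]), nodes)
```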

Visual classifier
 The elementary principle of “Chaos” is introduced to model the behaviour of
particle motion.
 The theoretical discussion on Chaotic – PSO includes the notion of “wind
speed” and “wind direction” modelling the biological atmosphere for
position update of the particles.
 The wind speed and therefore the position update equation are presented
by:
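\[
v_{w}(t+1) = v_{w}(t) + v_{op} \cdot rand() + v_{su} \cdot rand()
\]
\[
x_{id}(t+1) = x_{id}(t) + v_{id}(t+1) + v_{w}(t+1)
\]
where
$v_{w}$ - wind speed
$v_{op} \cdot rand()$ - opposing effect of the atmosphere
$v_{su} \cdot rand()$ - supporting effect of the atmosphere
$v_{id}$ - velocity of the particle
$x_{id}$ - position of the particle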
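
A small sketch of the two update rules, under assumptions that the slides do not state: `rand()` is taken to be uniform on [0, 1), the opposing term `v_op` is taken to be a negative constant and the supporting term `v_su` a positive one. The function names are hypothetical.

```python
import random


def wind_update(v_w, v_op, v_su, rng):
    """Wind speed update: v_w(t+1) = v_w(t) + v_op*rand() + v_su*rand()."""
    return v_w + v_op * rng.random() + v_su * rng.random()


def position_update(x, v_next, v_w_next):
    """Position update: x(t+1) = x(t) + v(t+1) + v_w(t+1)."""
    return x + v_next + v_w_next


if __name__ == "__main__":
    rng = random.Random(1)
    # Assumed values: opposing effect negative, supporting effect positive
    v_w = wind_update(0.0, v_op=-0.5, v_su=0.5, rng=rng)
    print(position_update(1.0, 0.2, v_w))
```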

Knowledge Assisted Framework

 Experimental Dataset
 A set of 500 images, belonging to the general category of
vacation images, was assembled.
 The content was mainly obtained from the Flickr online photo
management and sharing application and includes images
that depict cityscape, seaside, mountain and landscape
locations.
 Every image was manually annotated, i.e. after the
segmentation algorithm was applied, a single concept was
associated with each resulting image segment.

 The results show that combining the PSO optimisation technique
with SOM gives better classification accuracy than using SOM alone.
 The PSO classifier also performs better than the SVM and GA
classifiers, since SVMs need large amounts of training data to
discriminate accurately between image classes.

Image Retrieval and User Relevance Feedback

User Relevance Feedback
 The database used in the experiment is drawn from the Corel
dataset and covers seven concepts, namely building, cloud,
car, elephant, grass, lion and tiger.
 The test set for the seven concepts contains a variety of
background elements and overlapping concepts, which makes
the test set complex.

Multi-concept search space

• High-level queries
“A tiger resting in the forest
and guarding his territory”
• Mid-level features (context
independent)
“Tiger”, “Grass”, “Rock”,
“Water”,……

• Mid-level features:
 In a constrained environment with a limited number of mid-level features,
the performance of the classification algorithm has been found to be satisfactory
• High-level queries:
 Open to subjective interpretation of the concepts and also may involve
more than one mid-level feature
 Main objective:
• In this multi-concept framework, users are encouraged to construct high
level queries based on their preferences

• SVM Classifier
• SVM Light toolbox was used to generate semantic labels
• CLD+EHD
• Multi-feature classifier (MF)
• Employs a mixture of 7 visual features.
• The visual features are merged using Multi-Objective Learning (MOL)

 Pre-processing stage: mid-level feature concept detection
 Query formulation: users construct a high-level semantic information
space

 Fisheye distortion technique
 Overview + focus

• Query space panel
• Concept map panel
• Concept chart panel

 A collection of 3,500 images
   From the Corel dataset
   Natural images with many elements
   Foreground and background
   Rich semantic context
   Fully annotated
 10 mid-level concepts
   lion, water, grass, building, car, cloud, rock, tiger, elephant, flower
 8 high-level concepts
   flower fields, modern city view, rural garden, mountain view, waterfalls, wild life, city street, boat

 Retrieval of high level queries using the proposed MCB framework

 Retrieval of high level queries using SVM classification

 Content-based retrieval with RF mechanism

High-level query   Mid-level concepts                         Value
User 1
  Landscape        water, grass                               0.58
  Modern city      building, cloud                            0.8
  Wild life        lion, tiger, elephant                      0.59
  Rural garden     flower, water, grass                       0.9
User 2
  Landscape        water                                      0.23
  Modern city      building                                   0.71
  Wild life        lion, rock, grass, tiger, elephant         0.87
  Rural garden     flower                                     0.28
User 3
  Landscape        water, grass, cloud, car, elephant         0.59
  Modern city      cloud, building, car                       0.91
  Wild life        lion, tiger, grass, elephant, rock         0.82
  Rural garden     flower, water, grass                       0.88

Social Media Analysis

 Social media is the interaction among people in which they create, share or
exchange information and ideas in virtual communities and networks.
 Andreas Kaplan and Michael Haenlein define social media as "a group of
Internet-based applications that build on the ideological and technological
foundations of Web 2.0".
 Social media allows for the creation and exchange of user-generated
content.
 Social media differ from traditional or industrial media in many ways,
including quality, reach, frequency, usability, immediacy, and permanence.

• Images are often accompanied by free-text annotations, which
can be used as complementary information for content-based
classification
• The challenge is to extract entities from the text and classify them into
an arbitrary set of classes
• Example annotations: "Plansarsko lake", "Shepherd in Bucegi National Park"

 Content-based analysis (KAA) is
restricted to the classes for which the
classifier has been trained
 For text-based analysis (SCM/THD),
the classes have to be exhaustive:
all entities are classified
 Mapping from SCM/THD to KAA
 Perform an intersection between the
individual classifier results
 Select the concept occupying the largest area
on the image
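
The fusion steps listed above can be sketched as follows. Everything in this example is hypothetical: the function name `fuse`, the representation of the content-based result as (concept, area fraction) pairs per segment, and the dictionary that maps text classes to visual classes. It only illustrates the intersect-then-largest-area logic.

```python
def fuse(segments, text_entities, text_to_visual):
    """Pick the visual concept confirmed by the text that covers the largest area.

    segments       -- list of (visual_concept, area_fraction) from content analysis
    text_entities  -- classes assigned to entities found in the annotation text
    text_to_visual -- assumed mapping from text classes to visual classes
    """
    # Map the text-based classes onto the visual classifier's vocabulary
    mapped = {text_to_visual[e] for e in text_entities if e in text_to_visual}

    # Intersection: keep only visual concepts that the text also supports
    area = {}
    for concept, fraction in segments:
        if concept in mapped:
            area[concept] = area.get(concept, 0.0) + fraction

    # Select the concept occupying the largest area, if any survived
    return max(area, key=area.get) if area else None


if __name__ == "__main__":
    segs = [("water", 0.40), ("sky", 0.35), ("grass", 0.25)]
    print(fuse(segs, ["lake", "park"], {"lake": "water", "park": "grass"}))
```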

MediaEval 2013 Participation

VIT @ MediaEval 2013
 Social Event Detection Task
17/07/14

 Geographical coordinates are an important
indicator of where an event has
happened.
 The event clusters are analysed through the weighted
occurrence of tags across the distribution of media
annotations.

 The system computes the similarity between the synset representing
each tag and each of the categories.
 We use the Lin similarity measure to evaluate the semantic similarity
between the synset and the category.
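
For reference, the Lin measure scores two concepts by the information content (IC) of their least common subsumer relative to their own IC: sim(a, b) = 2 * IC(lcs(a, b)) / (IC(a) + IC(b)). The sketch below computes it from IC values supplied by the caller. The function names and the example numbers are illustrative assumptions, and how the slides' system obtains IC values is not stated.

```python
import math


def information_content(count, total):
    """IC of a concept from its corpus frequency: -log(count / total)."""
    return -math.log(count / total)


def lin_similarity(ic_a, ic_b, ic_lcs):
    """Lin similarity: 2 * IC(lcs) / (IC(a) + IC(b)); 0.0 if both ICs are zero."""
    denom = ic_a + ic_b
    return 0.0 if denom == 0 else 2.0 * ic_lcs / denom


def best_category(tag_scores):
    """Return the category with the highest similarity to the tag."""
    return max(tag_scores, key=tag_scores.get)


if __name__ == "__main__":
    # Illustrative IC values only, not taken from any corpus
    scores = {
        "concert": lin_similarity(4.0, 2.0, 1.5),
        "sports": lin_similarity(4.0, 3.0, 0.7),
    }
    print(best_category(scores), scores)
```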

 Placing Task

 The globe is divided into grid cells with a maximum of 10,000 images
per cell. Starting from an initial cell that spans the entire
globe, cells are recursively subdivided into smaller ones once the
threshold is reached.
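
The recursive subdivision can be sketched as a quadtree over latitude and longitude. This is one possible reading of the description, not the system's actual code: splitting each cell into four equal quadrants, the `max_depth` guard against endless splitting of co-located images, and the function name `subdivide` are assumptions.

```python
def subdivide(points, bounds=(-90.0, 90.0, -180.0, 180.0),
              max_points=10_000, max_depth=20):
    """Split (lat, lon) points into cells holding at most `max_points` each.

    Returns a list of (bounds, points) leaves, where bounds is
    (lat_min, lat_max, lon_min, lon_max). Empty quadrants are omitted.
    """
    lat0, lat1, lon0, lon1 = bounds
    # Stop when the cell is small enough or the depth guard is exhausted
    if len(points) <= max_points or max_depth == 0:
        return [(bounds, points)]

    lat_m, lon_m = (lat0 + lat1) / 2.0, (lon0 + lon1) / 2.0

    # Assign each point to one of four quadrants: (is_north, is_east)
    quadrants = {}
    for lat, lon in points:
        quadrants.setdefault((lat >= lat_m, lon >= lon_m), []).append((lat, lon))

    cells = []
    for (north, east), pts in quadrants.items():
        child = (lat_m if north else lat0, lat1 if north else lat_m,
                 lon_m if east else lon0, lon1 if east else lon_m)
        cells.extend(subdivide(pts, child, max_points, max_depth - 1))
    return cells


if __name__ == "__main__":
    photos = [(10, 10), (20, 20), (30, 30), (-10, -10), (-20, 100)]
    for cell_bounds, cell_points in subdivide(photos, max_points=2):
        print(cell_bounds, cell_points)
```

Dense regions end up with many small cells and sparse regions with a few large ones, which is the usual motivation for an adaptive grid of this kind.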
[Chart: Series1 values of 0.74, 3.9, 15.24, 26.3 and 30.14 at 1, 10, 100, 500 and 1000 on the horizontal axis; vertical axis from 0 to 35.]

Conclusion and Future Work

Conclusion
 Automatic concept detection within images is a challenging and as of yet
unsolved research problem.
 Impressive improvements have been achieved, although most of the
proposed systems rely on training data that has been manually, and thus
reliably, labeled, an expensive and laborious endeavor that cannot easily
scale.
 Current research in domain adaptation focuses on a scenario where
   (a) the prior domain (source) consists of one or at most two databases,
   (b) the labels of the source and the target domain are the same, and
   (c) the amount of annotated training data for the target domain is limited.

Thank you for your attention
Q & A
