New Approaches to Interactive Multimedia Content Retrieval from different Sources
1. New Approaches to Interactive
Multimedia Content Retrieval from
different Sources
Julián Moreno Schneider
LaBDA Group, Computer Science Department
Universidad Carlos III de Madrid, Spain
jmschnei@inf.uc3m.es
2. Content
Motivation
Background
Objectives
Proposal
Sports-domain Scenario and Validation
Adaptation techniques and Validation
Health-domain Scenario and Evaluation
Future directions
Publications
3. Motivation (I)
Multimedia content is increasing at staggering rates
Devices and formats are very diverse and move
away from traditional modes.
5. Motivation (III)
Problem description
Current Limitation: multimedia elements retrieved by
textual metadata
Users need transparent, faster and easier access to
many independent sources containing information
in different formats (such as video, text, audio, images,
graphics, etc.).
6. Motivation (IV)
Clarifying the problem
Finding the album of a song given the audio file and the
artist’s name
+ ‘I want you back’ The Jackson 5
7. Content
Motivation
Background
Objectives
Proposal
Formal model
Sports-domain Scenario and Validation
Adaptation techniques and validation
Health-domain Scenario and Evaluation
Future directions
Publications
8. Background (I)
Organization by the components
Multimodal Information (Collections)
Query
Information Retrieval Approaches
Retrieval Selection
Fusion
Interactions
9. Background (II)
Multimodal Information
Image and Text
Image and Audio
Image and Video
Text and Video
Multimodal
Federated Web Search Track
Jou et al. [2013]
10. Background (III)
Query Modalities
Text (Monomodal)
Image (Monomodal)
Text and Image
Video and Image
Text and Audio
Multimodal
Yang et al. [2002]
de Vries [1998]
Marchand-Maillet et al. [2011]
11. Background (IV)
Retrieval Approaches
Text Retrieval
Low-level features
Combined Indexes
Low-level features
Text-based (metadata) retrieval
Full text retrieval
Salton et al. [1975]
Romberg et al. [2012]
Lana-Serrano et al. [2011]
12. Background (V)
Retrieval Engine Selection Strategy
Unknown Strategy
By Elements
By Query Terms
Probabilistic
Renaud and Azzopardi [2012]
Demner-Fushman et al. [2012]
Romberg et al. [2012].
Chernov et al. [2006]
Balog et al. [2012]
13. Background (VI)
Result Fusion or Aggregation
Pre-RE fusion:
Joint indexes (prior fusion)
Post-RE fusion
Randomness
Source or type
Scores (unification)
Aggregated search
Arampatzis et al. [2011]
Balog et al. [2012]
Romberg et al. [2012]
14. Background (VII)
Semantic Knowledge
Annotation-based Retrieval
Multimedia Ontology Retrieval
Combination of
multimedia ontologies
Worring et al. [2007]
Medina-Ramírez [2007]
Castells et al. [2007]
15. Background (VIII)
User Interactions
• Relevance Judgments
o Direct
o Indirect
Document browsing
Clicks logging and analysis
Query history
• Log Analysis
• Surveys
• Dwell time
• Eye tracking
• Gestures, lip motion, speech and facial expression
16. Discussion
Limitations
Handler strategy (specially adapted to the user
experience)
Multimodality in query and results
Multimodal semantically related collection
Spanish
Out of the scope of this thesis
Retrieval approaches
Fusion algorithms
Innovation in Interaction Logging
17. Content
Motivation
Background
Objectives
Proposal
Formal model
Sports-domain Scenario and Validation
Adaptation techniques and validation
Health-domain Scenario and Evaluation
Future directions
Publications
18. Objectives (I)
1
• Propose a formal model to define
multimodal information retrieval (MIR)
systems.
2
• Develop two multimodal prototypes based
on the proposed model and evaluate them.
3
• Design and define techniques to adapt
the MIR system based on user experience.
19. Objectives (II)
Methodology
[Diagram: steps numbered 1 to 6, with the labels Formal Model, Interactions, Sports Domain Scenario, Adaptation techniques, Evaluation (twice) and Health Domain Scenario]
20. Content
Motivation
Background
Objectives
Proposal
Formal Model
Sports-domain Scenario and Validation
Adaptation techniques and validation
Health-domain Scenario and Evaluation
Future directions
Publications
21. Formal Model (I)
The architecture is composed of the most common
components used in IR models.
26. Content
Motivation
Background
Objectives
Proposal
Formal Model
Sports-domain Scenario and Validation
Adaptation techniques and validation
Health-domain Scenario and Evaluation
Future directions
Publications
27. Proposal: Sports-Domain Prototype (XII)
Architecture
28. Proposal: Sports-Domain Prototype (VI)
Buscamedia Collection
Developed in the framework of
the Buscamedia Project
Sports Domain
Multimodal documents
10000 Texts
350 Images
15 Videos
Collected in October 2010
Semantically Related
29. Proposal: Sports-Domain Prototype (VII)
Multimodal Query
Text, Audio and Text + Image
Example: "Information about the accident in the photo" + [image]
30. Proposal: Sports-Domain Prototype (VIII)
Retrieval Engines
Question Answering (QA)
Full Text Search (FT)
Ontology-based Search (ONT)
Object Detection in Image (ODI)
OCR in Image (OCRI)
Audio Transcription (AT)
RE selection (Handler)
Simple Approach
Expert-defined rule-based approach
Question → {QA, FT}
Txt(short)+img → {ONT, FT, {ODI, OCRI}}
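The expert-defined rules above map a query class to the retrieval engines to be called. A minimal sketch of such a handler follows. The rule table holds only the two rules shown on the slide; the class names, the `select_engines` and `flatten` helpers, and the full-text fallback are illustrative assumptions, not the prototype's actual code.

```python
# Hypothetical rule table: query class -> retrieval engines (REs) to invoke.
# Only the two rules shown on the slide are included; a nested list
# mirrors the nested group {ODI, OCRI}.
RULES = {
    "question": ["QA", "FT"],
    "short_text+image": ["ONT", "FT", ["ODI", "OCRI"]],
}

# Assumption: unknown query classes fall back to full-text search.
DEFAULT_ENGINES = ["FT"]


def select_engines(query_class):
    """Return the engine list the rules assign to a query class."""
    return RULES.get(query_class, DEFAULT_ENGINES)


def flatten(engines):
    """Expand nested engine groups into a flat list, preserving order."""
    flat = []
    for engine in engines:
        if isinstance(engine, list):
            flat.extend(flatten(engine))
        else:
            flat.append(engine)
    return flat


if __name__ == "__main__":
    print(flatten(select_engines("short_text+image")))
```

Keeping the rules in a plain table makes it straightforward to replace them later with rules learned from user interactions.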
31. Proposal: Sports-Domain Prototype (X)
Fusion Strategy: Round-Robin Approach
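As an illustration of round-robin fusion, the sketch below interleaves the ranked lists returned by several retrieval engines, taking one result from each list in turn and skipping documents already emitted. The function name and the duplicate-skipping behaviour are assumptions; the slide does not say how the prototype handles duplicates or list order.

```python
from itertools import zip_longest

# Sentinel marking "this list is exhausted"; it cannot collide with a doc id.
_MISSING = object()


def round_robin_fusion(ranked_lists):
    """Interleave ranked lists: rank 1 of every list, then rank 2, and so on.

    Documents already emitted are skipped, so each one appears once,
    at the position of its first occurrence.
    """
    fused = []
    seen = set()
    for tier in zip_longest(*ranked_lists, fillvalue=_MISSING):
        for doc in tier:
            if doc is not _MISSING and doc not in seen:
                seen.add(doc)
                fused.append(doc)
    return fused


if __name__ == "__main__":
    full_text = ["a", "b", "c"]
    ontology = ["b", "d"]
    question_answering = ["e"]
    print(round_robin_fusion([full_text, ontology, question_answering]))
```

Round-robin needs no score normalisation across engines, which is convenient when the engines produce scores on incomparable scales.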
32. Proposal: Sports-Domain Prototype (XI)
User Interactions
Searches
Documents Browsing
Relevance Judgments
Visualizations
33. Validation and Results (I)
Objective
User Preferences
Requested sources?
Preferred modes?
Preferred visualizations?
Most used query modes?
Expert-defined Rules Validation
Comparison with Baseline (Full Text Search Engine)
Web Interface to test with users
2 months
235 users
34. Validation and Results (II)
What query types are used?
981 queries: 239 predefined and 742 user-generated.
Short, long and question queries are used more often than
concept queries.
Sources ‘usage’ by query type.
Visualizations
Answer List, Answer / Concept Cloud, Concept Groups, Individual Document
35. Validation and Results (III)
Baseline: logs from users
IR performance
Mean Average Precision (MAP)
Mean Reciprocal Rank (MRR)
R-Precision
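For reference, the three measures listed above can be computed from a ranked result list and a set of relevant documents. The sketch below uses the standard textbook definitions with binary relevance and assumes each ranking contains no duplicate documents. The function names are illustrative, and nothing here reproduces the evaluation scripts used in the thesis.

```python
def average_precision(ranking, relevant):
    """Mean of precision@k taken at each rank k holding a relevant document."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def r_precision(ranking, relevant):
    """Precision within the top R results, R being the number of relevant docs."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranking[:r] if doc in relevant) / r


def mean(values):
    """Average over queries; turns AP into MAP and RR into MRR."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    # Each run is (ranking, relevant set) for one query.
    runs = [(["d1", "d2", "d3", "d4"], {"d2", "d4"}), (["d5", "d6"], {"d5"})]
    print("MAP:", mean(average_precision(r, rel) for r, rel in runs))
    print("MRR:", mean(reciprocal_rank(r, rel) for r, rel in runs))
```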
36. Adapting IR Functionality (I)
Motivation
Background
Objectives
Proposal
Formal Model
Sports-domain Scenario and Validation
Adaptation techniques and validation
Health-domain Scenario and Evaluation
Future directions
Publications
37. Adapting IR Functionality (II)
Rule-Based MIR
(qmode=t, qtype=long) → ont, qa, ft
(qmode=t, qtype=question, qlength=14) → qa, ft, ont
(qmode=t, qtype=short, qlength=2, qentities=alonso) → ont, qa, ft
38. Adapting IR Functionality (III)
Adaptation architecture
39. Adapting IR Functionality (IV)
Classification Algorithms
Decision trees, multilayer perceptron and simple K-means
Query features
Mode, type, length, number of entities, entities, number of verbs, topic
Ranking Scores
Interaction-based
Lowest-position
Average-position
Iteration
Mathematical
40. Validating IR Functionality Adaptation (I)
Definition of SilverStandard
Example with 4 query features:
qmode=‘t’; qtype=‘short’; qlength=‘1’; qentities=‘Barcelona’ → ft, ont, qa
Query: Barcelona
41. Validating IR Functionality Adaptation (II)
The best combination is:
Query features: mtle (mode, type, length, entities)
Classification algorithms: J4.8
Ranking scores: Average Position Score
42. Monitoring health social media (I)
Motivation
Background
Objectives
Proposal
Formal Model
Sports-domain Scenario and Validation
Adaptation techniques and validation
Health-domain Scenario and Evaluation
Future directions
Publications
43. Monitoring health social media (II)
Online: http://trendminer.daedalus.es/views/dashboard.php
44. Monitoring health social media (III)
Annotation Pipeline
Documents
Index
Twitter Saluspot
Relations Manager
Disambiguation
Medical Events Filter
Topics Analyzer
Morpho-syntactic Parser
Language Identification
Resources
• DrugsGaz
• DrugsATC
• AdrsMedDRA
• DiseasesUMLS
• SpanishDrugEffectDB
45. Monitoring health social media (IV)
IMIR System
46. Monitoring health social media (V)
Results’ Combination
47. Health-domain Prototype Evaluation
No User Evaluation
NER & Relation Extraction Performance
NER
Drugs      R     P     F-m
Strict     0.68  0.75  0.76
Lenient    0.68  0.75  0.76
Effects    R     P     F-m
Strict     0.43  0.75  0.54
Lenient    0.47  0.83  0.60
Relation extraction
                SpanishDrugEffectDB    Co-occurrences
Window          R     P     F-m        R     P     F-m
30 Strict       0.08  0.57  0.14       0.63  0.44  0.52
30 Lenient      0.13  0.96  0.24       0.88  0.61  0.72
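The F-m column is presumably the balanced F-measure, i.e. the harmonic mean of precision (P) and recall (R). That reading is an assumption, since the slide does not define it. A minimal sketch of the computation, with `f_measure` as an illustrative name:

```python
def f_measure(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Lenient NER row for Effects: P = 0.83, R = 0.47
    print(round(f_measure(0.83, 0.47), 2))
```

With the lenient Effects values (P = 0.83, R = 0.47) this gives approximately 0.60, in line with the value reported for that row.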
48. Conclusions (I)
Formal model for IMIR systems
Two prototypes based on the formal model in two
different scenarios:
Sports domain
Health social media
Scenario 1: Adaptation of multimodal IR
Best result: NDCG=81.54% (2.81% gain)
Good RE performance
Small improvements
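For readers unfamiliar with the NDCG figure reported above, the sketch below computes normalised discounted cumulative gain for a single ranking. It uses one common formulation (linear gain, log2 discount) and normalises against the ideal ordering of the same list. The variant and cut-off used in the thesis are not stated on the slide, so treat the formulation and the function names as assumptions.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: gain at rank i divided by log2(i + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG of the ranking divided by the DCG of its ideal (sorted) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Graded relevance of the results, in the order the system returned them.
    print(ndcg([1, 3, 0, 2]))
```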
49. Future Lines (I)
Multimodal Query
50. Future Lines (II)
Second Screen
51. Publications: Journals
Bedmar, I. S., Martínez, P., Arenaz, R. R., and
Schneider, J. M. (2015). Exploring Spanish health
social media for detecting drug effects. BMC Medical
Informatics and Decision Making, 15.
Martínez, P., Fernández, J. L. M., Bedmar, I. S.,
Schneider, J. M., Luna, A., and Arenaz, R. R. (2015).
Turning user generated health-related content into
actionable knowledge through text analytics
services. Computers in Industry.
52. Publications: Conferences
SEPLN
Julián Moreno-Schneider, José Luis Martínez Fernández,
Paloma Martínez, and Thierry Declerck. Prueba de Concepto de
Expansión de Consultas basada en Ontologías de Dominio
Financiero.
AMR
Julián Moreno-Schneider, José Luis Martínez Fernández, and
Paloma Martínez. A Proof-of-Concept for Orthographic Named
Entity Correction in Spanish Voice Queries.
González, M., Moreno Schneider, J., Martínez, J. L., and
Martínez, P. (2013). An illustrated methodology for evaluating
ASR systems.
Schneider, J. M., Salazar, M. G., Martínez, P., and Fernández, J.
L. M. (2011). Some experiments in evaluating ASR systems
applied to multimedia retrieval.
53. Publications: Conferences
CLEF Conference
Vicente-Díez, M. T., Moreno-Schneider, J., and Martínez, P.
(2010a). Temporal information needs in ResPubliQA: an attempt
to improve accuracy. The UC3M participation at CLEF 2010.
Vicente-Díez, M. T., De Pablo-Sanchez, C., Martínez, P.,
Moreno-Schneider, J., and Salazar, M. G. (2009). Are passages
enough? The MIRACLE team participation in QA@CLEF 2009.
SemEval
Vicente-Díez, M. T., Moreno-Schneider, J., and Martínez, P.
(2010b). UC3M system: Determining the extent, type and value
of time expressions in TempEval-2.
54. Research and Development (R&D) projects
Trendminer (FP7-ICT 287863)
Buscamedia (CEN-20091026)
Bravo (Búsqueda de Respuestas Avanzada
Multimodal y Multilingüe) (TIN2007-67407-C03-01)
MAVIR (S-0505/TIC-0267) and MAVIR2 (S-
2009/TIC-1542)
55. New Approaches to Interactive Multimedia
Content Retrieval from different Sources
“New Approaches to Interactive Multimedia Content
Retrieval from different Sources”
Julián Moreno Schneider
jmschnei@inf.uc3m.es
Thank you for your attention
Editor's Notes
The final architecture of the prototype, with all its implemented elements, is shown in the figure. It is also worth noting that the web interface and the prototype are accessible at the address shown on screen, in case you would like to try them.
Building this architecture required a great deal of development work, not only to integrate the prototype into the interface, but also to develop the interface itself and all the visualization modes.