Mediaeval 2013 Spoken Web Search results slides

Spoken Web Search at MediaEval 2013
Xavier Anguera, Florian Metze, Andi Buzo, Igor Szoke and Luis Javier Rodriguez-Fuentes

Spoken Audio Search (or Query-by-Example Spoken-Term Detection)
Given a spoken query, we search for its instances at the lexical level within spoken documents
It is similar to Spoken Term Detection (NIST STD2006, OpenKWS 2013) but…
• Queries are spoken
• Different speakers
• Different acoustic conditions
• No prior knowledge of the language(s) might be available

SWS history in Mediaeval
• SWS 2011 had 5 finishing participants and
focused on 4 Indian languages
• SWS 2012 had 9 finishing participants and
focused on 4 African languages
• SWS 2013 has 13 finishing (18 registered)
participants and contains 9 languages
[Chart: #teams (axis 0 to 18) and database size (axis 0 to 1400) for 2011, 2012 and 2013]

SWS 2013 evaluation setup
• A single search corpus with ~20 hours of
data, collected from contributions in 9
languages
– No transcription or language information is given
to participants

• 500 queries for dev and 500 queries for eval
– For each query, participants need to return all
instances of that query in the search corpus

Mediaeval SWS 2013
• 9 languages in different acoustic contexts: 4 African
languages (isixhosa, isizulu, sepedi, setswana), Albanian,
Basque, Czech, non-native English, Romanian
Set | #utts | time | Avg. length/utt.
Search corpus | 10762 | 19:57:55h | 6.67s
Dev Queries | 505 | 0:11:26h | 1.35s
Extended dev* | 1046 | 0:08:42h | 0.49s
Eval Queries | 503 | 0:11:37h | 1.38s
Extended eval* | 1037 | 0:08:57h | 0.51s
Total | 13853 | 20:38:37h |
*Only Basque (3x) and Czech (10x) queries have extended versions

Database distribution per language
Language | Number of utterances / total duration | Number of queries (dev / eval) | Speech quality (original sampling rate) | Recording environment
African - isixhosa | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
African - isizulu | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
African - sepedi | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
African - setswana | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech
Albanian | 968 / 127 min. | 50 / 50 | PC microphone, 16 kHz | Lab environment, read speech
Basque | 1841 / 192 min. | 100 / 100 (recorded by mobile phone) | TV Broadcast news, 16 kHz | Studio, read speech
Czech | 3667 / 252 min. | 94 / 93 |  | Telephone calls into radio broadcasts, spontaneous speech
Non-native English | 434 / 141 min. | 61 / 60 | High quality mic, 44 kHz | Conference lectures, spontaneous speech
Romanian | 2272 / 244 min. | 100 / 100 | PC microphone, 16 kHz | Lab environment, read speech

SWS 2013 participants
Team name | Country

Organizers
Dto. Electricidad y electrónica, Universidad Pais Vasco | Spain
Speech@FIT, Brno University of Technology | Czech Republic
Telefonica Research | Spain
University Politehnica of Bucharest | Romania

Other finishing teams
School of Electrical and Computer Engineering, Georgia Institute of Technology | USA
L2F - INESC-ID | Portugal
Departament de sistemes informàtics i Computació, Universitat Politècnica de València | Spain
Audiolab, University of Zilina | Slovakia
LIA, University of Avignon | France
Technical University of Kosice | Slovakia
Universitat Pompeu Fabra | Spain
DSP-STL, Dept. of EE, The Chinese University of Hong Kong | Hong Kong
International Institute of Information Technology - Hyderabad | India

Non-finishing
IAIS, Fraunhofer Institute | Germany
TATA Consultancy Services Ltd. | India
Indian Statistical Institute | India
Northwestern Polytechnical University of Xi’an | China
Toyota Technological Institute at Chicago | USA

Possible approaches to QbE-STD
[Diagram along a "Language spoken" axis: Pattern based; Acoustic models +; Lattice based; Language models +; Word-based]

Followed approaches
[Table: one row per team, with columns "DTW-like" and "AKWS"]
Team name
Dto. Electricidad y electrónica, Universidad Pais Vasco
Speech@FIT, Brno University of Technology
Telefonica Research
University Politehnica of Bucharest
School of Electrical and Computer Engineering, Georgia Institute of Technology
L2F - INESC-ID
Dept. de sistemes informàtics i Computació, Universitat Politècnica de València
Audiolab, University of Zilina
LIA, University of Avignon
Technical University of Kosice
Universitat Pompeu Fabra
DSP-STL, Dept. of EE, The Chinese University of Hong Kong
International Institute of Information Technology - Hyderabad
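
For readers unfamiliar with the "DTW-like" family, the following is a minimal sketch of generic subsequence dynamic time warping: the query may start and end anywhere in the document, and the best-matching region is returned. It is not any participant's actual system. The function names, the Euclidean frame distance and the normalisation by query length are all assumptions made for illustration.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, document):
    """Return (cost, start, end) for the best match of `query` in `document`.

    cost is the accumulated frame distance divided by the query length;
    start and end are inclusive document frame indices of the matched region.
    """
    if not query or not document:
        raise ValueError("query and document must be non-empty")
    n, m = len(query), len(document)
    # cost[i][j]: smallest accumulated distance of a path aligning
    # query[0..i] with a document region that ends at frame j.
    cost = [[0.0] * m for _ in range(n)]
    # start[i][j]: document frame where that best path begins.
    start = [[0] * m for _ in range(n)]
    for j in range(m):
        # The first query frame may be matched to any document frame for free.
        cost[0][j] = frame_distance(query[0], document[j])
        start[0][j] = j
    for i in range(1, n):
        for j in range(m):
            # Allowed predecessors: vertical, and (if j > 0) diagonal and horizontal.
            preds = [(cost[i - 1][j], start[i - 1][j])]
            if j > 0:
                preds.append((cost[i - 1][j - 1], start[i - 1][j - 1]))
                preds.append((cost[i][j - 1], start[i][j - 1]))
            best_cost, best_start = min(preds)
            cost[i][j] = frame_distance(query[i], document[j]) + best_cost
            start[i][j] = best_start
    # The match may end at any document frame: pick the cheapest end point.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end] / n, start[n - 1][end], end


# Toy example with one-dimensional "features".
query = [[1.0], [2.0], [3.0]]
document = [[0.0], [0.0], [1.0], [2.0], [3.0], [0.0]]
result = subsequence_dtw(query, document)
```

In a real system the vectors would be acoustic features or posteriorgrams, and a detection score would be derived from the normalised cost; those choices differ between the participating systems.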

Scoring metrics
• PRIMARY: Actual Term Weighted Value (ATWV) /
Maximum Term Weighted Value (MTWV)
• Actual/minimum Cnxe
• Real-time factor
• Memory usage
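
As a rough illustration of the primary metric, the sketch below computes a term-weighted value in the style of NIST STD 2006: one minus the average over queries of P_miss + beta * P_FA. ATWV is this value at the threshold the system actually chose, and MTWV is the maximum over all thresholds. The data layout, the function names and the one-trial-per-second convention are assumptions. The value of beta is left as a parameter because these slides do not state the setting used in SWS 2013 (NIST STD 2006 used 999.9).

```python
def twv_at(threshold, terms, speech_seconds, beta):
    """Term-weighted value when every detection scoring >= threshold is accepted.

    terms: list of (n_true, detections) with n_true > 0 reference occurrences
    and detections given as (score, is_hit) pairs.
    """
    total = 0.0
    for n_true, detections in terms:
        accepted = [is_hit for score, is_hit in detections if score >= threshold]
        hits = sum(accepted)
        false_alarms = len(accepted) - hits
        p_miss = 1.0 - hits / n_true
        # Assumption: one non-target trial per second of speech (NIST convention).
        p_fa = false_alarms / (speech_seconds - n_true)
        total += p_miss + beta * p_fa
    return 1.0 - total / len(terms)


def max_twv(terms, speech_seconds, beta):
    """Return (MTWV, threshold) by trying every observed score as a threshold."""
    # Accepting nothing misses everything with no false alarms, so TWV = 0.
    best_value, best_threshold = 0.0, float("inf")
    for threshold in sorted({score for _, dets in terms for score, _ in dets}):
        value = twv_at(threshold, terms, speech_seconds, beta)
        if value > best_value:
            best_value, best_threshold = value, threshold
    return best_value, best_threshold


# Toy example: one query with 2 true occurrences and 3 scored detections.
terms = [(2, [(0.9, True), (0.8, False), (0.4, True)])]
atwv = twv_at(0.85, terms, speech_seconds=102.0, beta=10.0)  # system threshold 0.85
mtwv, best_thr = max_twv(terms, speech_seconds=102.0, beta=10.0)
```

The gap between ATWV and MTWV indicates how well a system's decision threshold was calibrated.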

Per language results
Average for the 10-best systems

Per-language results: African (eval)

Per-language results: Albanian (eval)

Per-language results: Basque (eval)

Per-language results: Czech (eval)

Per-language results: Non-native English (eval)

Per-language results: Romanian (eval)

DET dev
[DET plot: Primary systems (development); x axis: False Alarm probability (in %), y axis: Miss probability (in %)]
Legend:
Random Performance
GTTS (MTWV=0.417, Thr=5.204)
L2F (MTWV=0.390, Thr=3.428)
CUHK (MTWV=0.368, Thr=0.530)
BUT (MTWV=0.371, Thr=0.930)
CMTECHETAL (MTWV=0.264, Thr=16.535)
IIITH (MTWV=0.253, Thr=2.130)
ELIRF (MTWV=0.170, Thr=2.697)
TID (MTWV=0.116, Thr=4.085)
GTC (MTWV=0.116, Thr=3.248)
SPEED (MTWV=0.083, Thr=0.960)
LIA-Late (MTWV=0.005, Thr=13.065)
UNIZA-Late (MTWV=0.000, Thr=1.000)
TUKE-Late (MTWV=0.000, Thr=3.000)

DET eval
[DET plot: Primary systems (evaluation); x axis: False Alarm probability (in %), y axis: Miss probability (in %)]
Legend:
Random Performance
GTTS (MTWV=0.399, Thr=5.243)
L2F (MTWV=0.342, Thr=3.551)
CUHK (MTWV=0.306, Thr=0.618)
BUT (MTWV=0.297, Thr=0.914)
CMTECHETAL (MTWV=0.257, Thr=18.153)
IIITH (MTWV=0.224, Thr=2.721)
ELIRF (MTWV=0.159, Thr=2.759)
TID (MTWV=0.093, Thr=5.051)
GTC (MTWV=0.084, Thr=3.341)
SPEED (MTWV=0.059, Thr=0.923)
LIA-Late (MTWV=0.000, Thr=1079.003)
UNIZA-Late (MTWV=0.001, Thr=1.000)
TUKE-Late (MTWV=0.000, Thr=3.000)

Cnxe metric
[Chart: Cnxe for primary systems; y axis: Cnxe (0 to 3); series: Min Cnxe (development), Act Cnxe (development), Min Cnxe (evaluation), Act Cnxe (evaluation); systems: GTTS, L2F, CUHK, BUT, CMTECHETAL, IIITH, ELIRF, TID, GTC, SpeeD, LIA, UNIZA, TUKE]

Extended Queries
• 4 teams submitted 4 extended systems, making use of 3
repetitions of Basque queries and 10 repetitions of Czech
queries available
– TID: computes each query individually and then puts together all
results
– GTTS: DTW-aligns all queries above a minimum duration and searches
with the resulting query
– GeorgiaTech: builds a graphical keyword model using more than one
instance
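
One simple way to "put together" the results obtained from several repetitions of the same query is sketched below: pool all detections and collapse those that overlap in time within the same utterance, keeping the highest score. This is only an illustrative assumption. The slides do not specify the fusion rule TID used, and the `Detection` type and `merge_detections` function are hypothetical names.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    utterance: str  # identifier of the searched utterance
    start: float    # start time in seconds
    end: float      # end time in seconds
    score: float    # higher means more confident


def merge_detections(per_repetition):
    """Pool detections from all repetitions and merge overlapping ones.

    per_repetition: list of detection lists, one list per query repetition.
    Overlapping detections in the same utterance become a single detection
    spanning their union and carrying the maximum score.
    """
    pooled = sorted(
        (det for detections in per_repetition for det in detections),
        key=lambda det: (det.utterance, det.start),
    )
    merged = []
    for det in pooled:
        last = merged[-1] if merged else None
        if last is not None and last.utterance == det.utterance and det.start < last.end:
            # Overlap with the previous detection: extend it and keep the best score.
            merged[-1] = Detection(
                det.utterance, last.start, max(last.end, det.end), max(last.score, det.score)
            )
        else:
            merged.append(det)
    return merged


# Toy example: two repetitions of one query.
rep1 = [Detection("u1", 1.0, 2.0, 0.6)]
rep2 = [Detection("u1", 1.5, 2.4, 0.9), Detection("u2", 0.0, 1.0, 0.3)]
fused = merge_detections([rep1, rep2])
```

Other combination rules (for example averaging scores, or requiring agreement between repetitions) are equally plausible under the same description.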

Real-Time Factor versus Memory usage

Real-Time Factor versus Memory usage (partial)

Take home messages
• The task was more complicated than in 2012
– GTTS got MTWV-13 = 0.39 and MTWV-12 = 0.51 (on
2013 data)
– CUHK got MTWV-12 = 0.74 (on 2012 data)

• It is possible to do QbE-STD on unknown/low-resource
data

New things to watch out for in the poster session
• BUT:
– Fusion of 26 systems (13 AKWS + 13 DTW)
– M-norm normalization

• IIIT:
– Articulatory Bottleneck features

• CUHK:
– Tokenizer construction using Gaussian Component clustering
– Query expansion using PSOLA

• L2F:
– DTW candidate pre-selection

• GTTS:
– Distance matrix normalization in DTW

• GeorgiaTech:
– Low-resource speech modeling using EHMM Models

• LIA:
– Use of I-vectors in SWS

• ARF:
– DTW string matching algorithm with a novel scoring

System presentations
• 16:30-16:45 "GTTS Systems for the SWS Task at
MediaEval 2013", Luis Javier Rodriguez-Fuentes, DEE,
Universidad del País Vasco
• 16:45-17:00 "The L2F Spoken Web Search system for
Mediaeval 2013", Alberto Abad, L2F, INESC-ID
• 17:00-17:15 "BUT SWS 2013 - MASSIVE PARALLEL
APPROACH", Lucas Ondel, Speech@BUT, Brno
University of Technology
• 17:15-17:30 "The CMTECH Spoken Web Search System
for MediaEval 2013", Ciro Gracia, UPF
• 17:30-17:45 Discussion and SWS 2014 teaser, Xavier
Anguera
