Spoken Web Search at MediaEval 2013
Xavier Anguera, Florian Metze, Andi Buzo, Igor Szoke and Luis Javier Rodriguez-Fuentes
Spoken Audio Search (or Query-by-Example Spoken-Term Detection)
Given a spoken query, we search for instances of it at the lexical level within spoken documents.
It is similar to Spoken Term Detection (NIST STD 2006, OpenKWS 2013), but:
• Queries are spoken
• Different speakers
• Different acoustic conditions
• Prior knowledge of the language(s) might not be available
SWS history in Mediaeval
• SWS 2011 had 5 finishing participants and focused on 4 Indian languages
• SWS 2012 had 9 finishing participants and focused on 4 African languages
• SWS 2013 has 13 finishing (18 registered) participants and contains 9 languages
[Figure: #teams (axis 0 to 18) and database size (axis 0 to 1400) for 2011, 2012 and 2013]
SWS 2013 evaluation setup
• A single search corpus with ~20 hours of data, collected from contributions in 9 languages
– No transcription or language information is given to participants

• ~500 queries for dev (505) and ~500 queries for eval (503)
– For each query, participants need to return all instances of that query in the search corpus
MediaEval SWS 2013
• 9 languages in different acoustic contexts: 4 African languages (isiXhosa, isiZulu, Sepedi, Setswana), Albanian, Basque, Czech, non-native English, Romanian
| Set | #utts | Time | Avg. length/utt. |
|---|---|---|---|
| Search corpus | 10762 | 19:57:55h | 6.67s |
| Dev queries | 505 | 0:11:26h | 1.35s |
| Extended dev* | 1046 | 0:08:42h | 0.49s |
| Eval queries | 503 | 0:11:37h | 1.38s |
| Extended eval* | 1037 | 0:08:57h | 0.51s |
| Total | 13853 | 20:38:37h | |

*Only Basque (3x) and Czech (10x) queries have extended versions
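The totals and per-utterance averages in the table above can be cross-checked with a little arithmetic. The following is a minimal sketch; the helper names (`to_seconds`, `to_hms`, `avg_length`) are illustrative and not part of the slides. It suggests that the slide's averages were truncated to two decimals rather than rounded.

```python
# Rows copied from the table above: (name, #utts, duration as H:MM:SS).
ROWS = [
    ("Search corpus", 10762, "19:57:55"),
    ("Dev queries", 505, "0:11:26"),
    ("Extended dev", 1046, "0:08:42"),
    ("Eval queries", 503, "0:11:37"),
    ("Extended eval", 1037, "0:08:57"),
]


def to_seconds(hms: str) -> int:
    """Convert an H:MM:SS string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def to_hms(seconds: int) -> str:
    """Convert a number of seconds back to H:MM:SS."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


def avg_length(n_utts: int, hms: str) -> float:
    """Average seconds per utterance, truncated (not rounded) to 2 decimals."""
    return (to_seconds(hms) * 100 // n_utts) / 100


total_utts = sum(n for _, n, _ in ROWS)
total_time = to_hms(sum(to_seconds(t) for _, _, t in ROWS))
averages = {name: avg_length(n, t) for name, n, t in ROWS}

print(total_utts, total_time)
for name, value in averages.items():
    print(f"{name}: {value:.2f}s")
```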
Database distribution per language

| Language | Number of utterances / total duration | Number of queries (dev / eval) | Speech quality (original sampling rate) | Recording environment |
|---|---|---|---|---|
| African - isiXhosa | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech |
| African - isiZulu | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech |
| African - Sepedi | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech |
| African - Setswana | 395 / 60 min. | 25 / 25 | Telephone speech, 8 kHz | Field recordings, read speech |
| Albanian | 968 / 127 min. | 50 / 50 | PC microphone, 16 kHz | Lab environment, read speech |
| Basque | 1841 / 192 min. | 100 / 100 (recorded by mobile phone) | TV broadcast news, 16 kHz | Studio, read speech |
| Czech | 3667 / 252 min. | 94 / 93 | Telephone speech, 8 kHz | Telephone calls into radio broadcasts, spontaneous speech |
| Non-native English | 434 / 141 min. | 61 / 60 | High quality mic, 44 kHz | Conference lectures, spontaneous speech |
| Romanian | 2272 / 244 min. | 100 / 100 | PC microphone, 16 kHz | Lab environment, read speech |
SWS 2013 participants

| Team name | Country | Status |
|---|---|---|
| Dpto. Electricidad y Electrónica, Universidad del País Vasco | Spain | Finishing |
| Speech@FIT, Brno University of Technology | Czech Republic | Finishing |
| Telefonica Research | Spain | Finishing |
| University Politehnica of Bucharest | Romania | Finishing |
| School of Electrical and Computer Engineering, Georgia Institute of Technology | USA | Finishing |
| L2F - INESC-ID | Portugal | Finishing |
| Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València | Spain | Finishing |
| Audiolab, University of Zilina | Slovakia | Finishing |
| LIA, University of Avignon | France | Finishing |
| Technical University of Kosice | Slovakia | Finishing |
| Universitat Pompeu Fabra | Spain | Finishing |
| DSP-STL, Dept. of EE, The Chinese University of Hong Kong | Hong Kong | Finishing |
| International Institute of Information Technology - Hyderabad | India | Finishing |
| IAIS, Fraunhofer Institute | Germany | Non-finishing |
| TATA Consultancy Services Ltd. | India | Non-finishing |
| Indian Statistical Institute | India | Non-finishing |
| Northwestern Polytechnical University of Xi’an | China | Non-finishing |
| Toyota Technological Institute at Chicago | USA | Non-finishing |

The slide legend also marked the organizers' teams.
Possible approaches to QbE-STD
[Diagram; axis label: Language spoken]
• Pattern based
• Acoustic models + lattice based
• Language models + word-based
Followed approaches
Columns on the slide: Team name, DTW-like, AKWS (the per-team marks are in the slide graphic)
• Dpto. Electricidad y Electrónica, Universidad del País Vasco
• Speech@FIT, Brno University of Technology
• Telefonica Research
• University Politehnica of Bucharest
• School of Electrical and Computer Engineering, Georgia Institute of Technology
• L2F - INESC-ID
• Dept. de Sistemes Informàtics i Computació, Universitat Politècnica de València
• Audiolab, University of Zilina
• LIA, University of Avignon
• Technical University of Kosice
• Universitat Pompeu Fabra
• DSP-STL, Dept. of EE, The Chinese University of Hong Kong
• International Institute of Information Technology - Hyderabad
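To illustrate what "DTW-like" matching involves, here is a minimal subsequence-DTW sketch. It is not any participant's system. The Euclidean frame distance, the three-way step pattern, and the normalization by query length are all assumptions made for the example. The query may start and end anywhere in the document, and the function returns the best-matching segment.

```python
import math


def euclidean(a, b):
    """Frame-level distance between two feature vectors (an assumed choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query, doc, dist=euclidean):
    """Find the best match of `query` inside `doc`.

    Both arguments are sequences of feature vectors. Returns
    (cost, start, end): the accumulated distance divided by the query
    length, and the first and last matched frame indices in `doc`.
    """
    n, m = len(query), len(doc)
    inf = float("inf")
    # acc[i][j]: best accumulated cost of aligning query[0..i] to a
    # document segment that ends at frame j.
    acc = [[inf] * m for _ in range(n)]
    # start[i][j]: document frame where that best path began.
    start = [[0] * m for _ in range(n)]

    for j in range(m):
        # Free start: the first query frame may align to any document frame.
        acc[0][j] = dist(query[0], doc[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            candidates = [(acc[i - 1][j], start[i - 1][j])]  # vertical step
            if j > 0:
                candidates.append((acc[i][j - 1], start[i][j - 1]))  # horizontal
                candidates.append((acc[i - 1][j - 1], start[i - 1][j - 1]))  # diagonal
            best_cost, best_start = min(candidates)
            acc[i][j] = best_cost + dist(query[i], doc[j])
            start[i][j] = best_start

    # Free end: pick the document frame where the full query matches best.
    end = min(range(m), key=lambda j: acc[n - 1][j])
    return acc[n - 1][end] / n, start[n - 1][end], end


# Toy 1-D "features": the query pattern 1, 2, 3 occurs at frames 2..4.
toy_doc = [[0.0], [0.0], [1.0], [2.0], [3.0], [0.0], [0.0]]
toy_query = [[1.0], [2.0], [3.0]]
toy_cost, toy_start, toy_end = subsequence_dtw(toy_query, toy_doc)
print(toy_cost, toy_start, toy_end)
```

A real system would run this over every utterance in the search corpus and keep the segments whose normalized cost passes a threshold.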
Scoring metrics
• PRIMARY: Actual Term Weighted Value (ATWV) /
Maximum Term Weighted Value (MTWV)
• Actual/minimum Cnxe
• Real-time factor
• Memory usage
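The slides name ATWV and MTWV without spelling out the formula. The sketch below assumes the NIST STD 2006 style definition, TWV = 1 - average over terms of (P_miss + beta * P_FA), with terms that have no reference occurrences left out of the average. The `TermCounts` container, the function names, and the `beta` value used in the example are illustrative assumptions; the actual weight is fixed by the evaluation plan.

```python
from dataclasses import dataclass


@dataclass
class TermCounts:
    n_true: int       # reference occurrences of the term
    n_correct: int    # detections that hit a reference occurrence
    n_spurious: int   # false alarms
    n_nontarget: int  # number of non-target trials for the term


def term_weighted_value(counts, beta):
    """TWV at one decision threshold, averaged over terms with n_true > 0."""
    scored = [c for c in counts if c.n_true > 0]
    if not scored:
        raise ValueError("no term has reference occurrences")
    total = 0.0
    for c in scored:
        p_miss = 1.0 - c.n_correct / c.n_true
        p_fa = c.n_spurious / c.n_nontarget
        total += p_miss + beta * p_fa
    return 1.0 - total / len(scored)


def max_twv(counts_by_threshold, beta):
    """MTWV: the best TWV over candidate thresholds, as (value, threshold)."""
    return max(
        (term_weighted_value(counts, beta), thr)
        for thr, counts in counts_by_threshold.items()
    )


# Hypothetical counts for two query terms at two thresholds.
example = {
    0.5: [TermCounts(4, 3, 1, 1000), TermCounts(2, 2, 0, 1000)],
    0.9: [TermCounts(4, 1, 0, 1000), TermCounts(2, 1, 0, 1000)],
}
atwv = term_weighted_value(example[0.9], beta=10.0)  # system decided at 0.9
mtwv, best_thr = max_twv(example, beta=10.0)
print(atwv, mtwv, best_thr)
```

ATWV is the value at the threshold the system actually chose, so a large gap between ATWV and MTWV points to a poorly set threshold rather than poor detection scores.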
Primary metric (dev)
Primary metric (eval)
Per language results
Average for the 10-best systems
Per-language results: African (eval)
Per-language results: Albanian (eval)
Per-language results: Basque (eval)
Per-language results: Czech (eval)
Per-language results: Non-native English (eval)
Per-language results: Romanian (eval)
DET dev
[Figure: DET curves for primary systems (development); miss probability (in %) versus false alarm probability (in %), with a random-performance reference]
• GTTS (MTWV=0.417, Thr=5.204)
• L2F (MTWV=0.390, Thr=3.428)
• CUHK (MTWV=0.368, Thr=0.530)
• BUT (MTWV=0.371, Thr=0.930)
• CMTECHETAL (MTWV=0.264, Thr=16.535)
• IIITH (MTWV=0.253, Thr=2.130)
• ELIRF (MTWV=0.170, Thr=2.697)
• TID (MTWV=0.116, Thr=4.085)
• GTC (MTWV=0.116, Thr=3.248)
• SPEED (MTWV=0.083, Thr=0.960)
• LIA-Late (MTWV=0.005, Thr=13.065)
• UNIZA-Late (MTWV=0.000, Thr=1.000)
• TUKE-Late (MTWV=0.000, Thr=3.000)
DET eval
[Figure: DET curves for primary systems (evaluation); miss probability (in %) versus false alarm probability (in %), with a random-performance reference]
• GTTS (MTWV=0.399, Thr=5.243)
• L2F (MTWV=0.342, Thr=3.551)
• CUHK (MTWV=0.306, Thr=0.618)
• BUT (MTWV=0.297, Thr=0.914)
• CMTECHETAL (MTWV=0.257, Thr=18.153)
• IIITH (MTWV=0.224, Thr=2.721)
• ELIRF (MTWV=0.159, Thr=2.759)
• TID (MTWV=0.093, Thr=5.051)
• GTC (MTWV=0.084, Thr=3.341)
• SPEED (MTWV=0.059, Thr=0.923)
• LIA-Late (MTWV=0.000, Thr=1079.003)
• UNIZA-Late (MTWV=0.001, Thr=1.000)
• TUKE-Late (MTWV=0.000, Thr=3.000)
Cnxe metric
[Figure: Cnxe for primary systems. Chart (vertical axis Cnxe, 0 to 3) showing Min Cnxe and Act Cnxe on development and evaluation data for GTTS, L2F, CUHK, BUT, CMTECHETAL, IIITH, ELIRF, TID, GTC, SpeeD, LIA, UNIZA and TUKE]
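The slide plots Cnxe without defining it. The sketch below assumes the usual normalized cross-entropy definition: system scores are treated as log-likelihood ratios, converted to posteriors with a target prior, and the resulting cross entropy is divided by the entropy of the prior alone. The function name, its arguments, and the prior value in the example are assumptions for illustration, not the evaluation's official settings.

```python
import math


def normalized_cross_entropy(target_llrs, nontarget_llrs, p_target):
    """Cnxe under the assumed definition: cross entropy / prior entropy.

    target_llrs / nontarget_llrs are log-likelihood-ratio scores for
    trials that are true matches / non-matches; p_target is the prior.
    """
    prior_logit = math.log(p_target / (1.0 - p_target))

    def neg_log2_sigmoid(x):
        # -log2(sigmoid(x)) = log2(1 + exp(-x)), computed stably.
        if x >= 0:
            return math.log1p(math.exp(-x)) / math.log(2)
        return (-x + math.log1p(math.exp(x))) / math.log(2)

    # Cost of target trials: -log2 of the posterior of "target".
    c_tar = sum(neg_log2_sigmoid(s + prior_logit) for s in target_llrs)
    c_tar /= len(target_llrs)
    # Cost of non-target trials: -log2 of the posterior of "non-target".
    c_non = sum(neg_log2_sigmoid(-(s + prior_logit)) for s in nontarget_llrs)
    c_non /= len(nontarget_llrs)

    c_xe = p_target * c_tar + (1.0 - p_target) * c_non
    c_prior = -(
        p_target * math.log2(p_target)
        + (1.0 - p_target) * math.log2(1.0 - p_target)
    )
    return c_xe / c_prior


# Scores that carry no information (all LLRs zero) give Cnxe = 1.
uninformative = normalized_cross_entropy([0.0, 0.0], [0.0, 0.0, 0.0], 0.1)
# Confident, correct scores give a value close to 0.
informative = normalized_cross_entropy([5.0, 5.0], [-5.0, -5.0], 0.5)
print(uninformative, informative)
```

Under this reading, values below 1 mean the scores add information beyond the prior, and values above 1 mean miscalibrated scores do worse than the prior. "Act" would be the value for scores as submitted and "Min" the value after an optimal recalibration.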
Extended Queries
• 4 teams submitted 4 extended systems, making use of the 3 repetitions of Basque queries and 10 repetitions of Czech queries available
– TID: computes each query individually and then puts together all results
– GTTS: DTW-aligns all queries above a minimum duration and searches with the resulting query
– GeorgiaTech: builds a graphical keyword model using more than one instance
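One simple way to "put together" the results of several spoken examples of the same query, in the spirit of the TID bullet above, is late fusion of detection lists. The slide does not say how TID merged its results, so the rule used here (pool all detections, merge those that overlap in time within an utterance, keep the highest score) is purely an assumption, as are the `Detection` type and the function name.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    utt_id: str   # utterance in the search corpus
    start: float  # start time in seconds
    end: float    # end time in seconds
    score: float  # higher means more confident


def merge_detections(per_example_results):
    """Pool detections from several query examples and merge overlaps.

    `per_example_results` is a list with one detection list per spoken
    example of the query. Overlapping detections in the same utterance
    are fused into one spanning both, keeping the maximum score.
    """
    pooled = sorted(
        (det for result in per_example_results for det in result),
        key=lambda d: (d.utt_id, d.start, d.end),
    )
    merged = []
    for det in pooled:
        last = merged[-1] if merged else None
        if last and last.utt_id == det.utt_id and det.start < last.end:
            merged[-1] = Detection(
                last.utt_id,
                last.start,
                max(last.end, det.end),
                max(last.score, det.score),
            )
        else:
            merged.append(det)
    return merged


# Hypothetical results for two spoken examples of the same query.
example_a = [Detection("u1", 1.0, 2.0, 0.4)]
example_b = [Detection("u1", 1.5, 2.2, 0.9), Detection("u2", 0.0, 0.5, 0.3)]
fused = merge_detections([example_a, example_b])
print(fused)
```

The GTTS alternative described above works earlier in the pipeline: it combines the examples into a single query before searching, so only one search pass is needed.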
Extended systems
Real-Time Factor versus Memory usage
Real-Time Factor versus Memory usage (partial)
Take home messages
• The task was more complicated than in 2012
– GTTS got MTWV-13 = 0.39, MTWV-12 = 0.51 (on 2013 data)
– CUHK MTWV-12 = 0.74 (on 2012 data)

• It is possible to do QbE-STD on unknown/low-resource data
New things to watch out for in the posters session
• BUT:
– Fusion of 26 systems (13 AKWS + 13 DTW)
– M-norm normalization

• IIIT:
– Articulatory Bottleneck features

• CUHK:
– Tokenizer construction using Gaussian Component clustering
– Query expansion using PSOLA

• L2F:
– DTW candidate pre-selection

• GTTS:
– Distance matrix normalization in DTW

• GeorgiaTech:
– Low-resource speech modeling using EHMM Models

• LIA:
– Use of I-vectors in SWS

• ARF:
– DTW string matching algorithm with a novel scoring
System presentations
• 16:30-16:45 "GTTS Systems for the SWS Task at MediaEval 2013", Luis Javier Rodriguez-Fuentes, DEE, Universidad del País Vasco
• 16:45-17:00 "The L2F Spoken Web Search system for MediaEval 2013", Alberto Abad, L2F, INESC-ID
• 17:00-17:15 "BUT SWS 2013 - MASSIVE PARALLEL APPROACH", Lucas Ondel, Speech@BUT, Brno University of Technology
• 17:15-17:30 "The CMTECH Spoken Web Search System for MediaEval 2013", Ciro Gracia, UPF
• 17:30-17:45 Discussion and SWS 2014 teaser, Xavier Anguera

Editor's notes

  1. AKWS means they use some sort of Viterbi algorithm. DTW-like means they use DTW algorithms to match different sorts of features.
  2. UPF has very good regularization for finding the optimal score across all queries. TID and IIIT show a poor match between ATWV and MTWV. Only the positive scores were plotted.