Information Retrieval-based
Dynamic Time Warping
Xavier Anguera
Telefonica Research
Spain
Query-by-Example Spoken-Term Detection
Given a spoken query we search for instances at lexical
level within spoken documents
It is similar to Spoken Term Detection (NIST STD2006,
Babel 2013) but…
• Queries are spoken
• Different speakers
• Different acoustic conditions
• No prior knowledge of the language might be available
Information Retrieval-based
Dynamic Time Warping Algorithm
(IRDTW)
Information Retrieval-based DTW
• Inspired by the Subsequence Dynamic Time Warping (S-DTW)
algorithm by Müller [1]
• It performs a ‘sparse’ matching of two signals, like
Jansen [2]
• Uses ideas borrowed from information retrieval to
save memory (lots of it)
• It can take advantage of pre-indexing all reference
data and thus perform fast frame-level matching
(described in [3])
[1] Meinard Müller, “Information Retrieval for Music and Motion”, Springer-Verlag, ISBN 978-3-540-74047-6, pp. 147-150, 2010
[2] Aren Jansen, Benjamin Van Durme, “Indexing Raw Acoustic Features for Scalable Zero Resource Search”, Proc.
Interspeech 2012
[3] Gautam Mantena, Xavier Anguera, “Speed Improvements to Information Retrieval-based Dynamic Time Warping
Using Hierarchical K-means Clustering”, in Proc. ICASSP 2013
Subsequence-DTW algorithm (review)
[Figure: subsequence-DTW alignment; axes: Query term, Reference term]
‘Sparse’ frame matching
Only the closest (lowest distance) query-reference pairs are
considered. These can be found through…
• Exhaustive comparison
• Efficient retrieval using indexing techniques
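The exhaustive-comparison option can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `sparse_matches`, the Euclidean frame distance, and the choice of keeping the `k` closest reference frames per query frame are all assumptions made for the example.

```python
import heapq
import math

def sparse_matches(query, reference, k=2):
    """Return (q_idx, r_idx, dist) for the k closest reference frames of each query frame.

    Assumption: frames are numeric tuples compared with Euclidean distance.
    """
    matches = []
    for qi, q in enumerate(query):
        # Exhaustive comparison: score every reference frame against this query frame.
        scored = ((math.dist(q, r), rj) for rj, r in enumerate(reference))
        # Keep only the k lowest-distance pairs (the 'sparse' part).
        for dist, rj in heapq.nsmallest(k, scored):
            matches.append((qi, rj, dist))
    return matches

# Toy data (hypothetical): 2 query frames, 4 reference frames.
query = [(0.0, 0.0), (1.0, 1.0)]
reference = [(0.1, 0.0), (5.0, 5.0), (1.0, 0.9), (0.0, 0.2)]
matches = sparse_matches(query, reference, k=1)
pairs = [(q, r) for q, r, _ in matches]
```

An index-based retrieval would replace the inner exhaustive loop with a lookup, leaving the output format unchanged.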
[Figure: two sparse query-reference matching plots; axis ticks run 10-70 and 100-800]
‘Sparse’ dynamic programming
[Figure: two panels, S-DTW and IR-DTW; axes: Query, Reference]
IR-DTW warping constraints
[Figure: IR-DTW warping region along the Query axis, annotated WRange = maxQDist/2 and WRange = maxQDist]

Possible constraints:
• Amount of warping:
– basic warping
– 2X warping
• Length to the match
From 2D to 1D: Memory efficient matching
We borrow an alignment algorithm used for Information Retrieval.
It finds unconstrained start-end locations but does not
allow any time-warping.

With IRDTW we modified this algorithm to allow for
time-warped matching.
We use the ‘matching counts’ vector in the dynamic
programming instead of the similarity matrix.
The end position of each path defines its location
in the 1D vector.
The new matching point defines
a target location where one of
the paths will warp to.
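As a rough illustration of a non-warping 'matching counts' vector, the sketch below counts sparse matches per time offset between query and reference. The slides do not spell out the borrowed algorithm, so treating it as offset voting is an assumption, and `matching_counts` is a hypothetical name.

```python
from collections import Counter

def matching_counts(matches):
    """Count sparse matches per time offset dt = tq - tr.

    Assumption: without time-warping, all matches of one aligned segment
    share the same offset, so a peak in the counts marks a likely match.
    """
    counts = Counter()
    for tq, tr in matches:
        counts[tq - tr] += 1
    return counts

# Toy (query_time, reference_time) matching pairs, made up for the example.
matches = [(0, 10), (1, 11), (2, 12), (3, 40), (1, 25)]
counts = matching_counts(matches)
best_offset, best_count = counts.most_common(1)[0]
```

Three of the five toy matches share one offset, which is the kind of peak a strictly diagonal search would report. Allowing warping means a path may move between neighbouring offsets.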
What information is stored in this vector?
ΔT = tqi - trj

For each path we store:
• query(start, end)
• reference(start,end)
• Accumulated Distance
• #matching points

• Only paths with #matches > 1 are stored in the ΔT vector
• Size(ΔT) = size_query + size_ref (can be constrained using a circular buffer)
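One possible layout for the per-path record and the ΔT vector is sketched below. The field names follow the list above, but the `Path` class, the helper names, and the `+ size_ref` index shift are assumptions for illustration. The circular-buffer variant is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Path:
    q_start: int       # query(start, end)
    q_end: int
    r_start: int       # reference(start, end)
    r_end: int
    accum_dist: float  # accumulated distance
    n_matches: int     # number of matching points

def make_dt_vector(size_query: int, size_ref: int) -> List[Optional[Path]]:
    # Size(dT) = size_query + size_ref, one slot per possible offset.
    return [None] * (size_query + size_ref)

def dt_index(tq: int, tr: int, size_ref: int) -> int:
    # dT = tq - tr lies in [-(size_ref - 1), size_query - 1];
    # shifting by size_ref (an assumed convention) makes it a valid list index.
    return tq - tr + size_ref

dt_vector = make_dt_vector(size_query=70, size_ref=800)
lowest = dt_index(0, 799, size_ref=800)
highest = dt_index(69, 0, size_ref=800)
```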
Applying warping constraints in 1D
Constraints in the similarity matrix translate as:
1. Consider all paths within range
[Figure: range Wrange (or Wrange/2) around ΔT in the 1D vector]
2. Check for local constraints
• Basic warping:
Δq > 0
Δr > 0
• 2X warping:
Δq ≥ Δr/2
Δq ≤ 2*Δr
[Figure: Δq and Δr shown on the Query and Reference axes]
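The two checks can be written as small predicates. This is a sketch under assumptions: Δq and Δr are taken as the query and reference steps from a path's end to the new matching point, the range window is treated as symmetric around the path's ΔT, and the 2X check is assumed to include the basic one. None of the function names come from the original code.

```python
def within_range(dt_new: int, dt_path: int, w_range: int) -> bool:
    # Assumption: a path is a candidate when its offset is at most w_range away.
    return abs(dt_new - dt_path) <= w_range

def basic_warping_ok(dq: int, dr: int) -> bool:
    # Basic warping: the path must advance in both query and reference.
    return dq > 0 and dr > 0

def warping_2x_ok(dq: int, dr: int) -> bool:
    # 2X warping: dr/2 <= dq <= 2*dr, on top of the basic check (assumed).
    return basic_warping_ok(dq, dr) and dr / 2 <= dq <= 2 * dr

checks = [warping_2x_ok(2, 1), warping_2x_ok(3, 1), warping_2x_ok(1, 2), warping_2x_ok(1, 3)]
```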
Best matching path selection
We select the path with the highest number of matches. It is
then warped to end at the current matching point.
ΔT = tqi - trj

New path info:
• q_end = tqi
• r_end = trj
• Accum. Distance += d(qi, rj)
• #matches++

We can dynamically save memory by eliminating obsolete paths.
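The selection and update step might look like the sketch below. `Path` and `extend_best_path` are hypothetical names, and starting a fresh one-match path when there are no candidates is an assumption, not something the slides state.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Path:
    q_start: int
    q_end: int
    r_start: int
    r_end: int
    accum_dist: float
    n_matches: int

def extend_best_path(candidates: List[Path], tq: int, tr: int, dist: float) -> Path:
    """Warp the candidate with the most matches so that it ends at (tq, tr)."""
    if not candidates:
        # Assumption: with no valid candidate, the match starts a new path.
        return Path(tq, tq, tr, tr, dist, 1)
    best = max(candidates, key=lambda p: p.n_matches)
    return Path(
        q_start=best.q_start,
        q_end=tq,                             # q_end = tqi
        r_start=best.r_start,
        r_end=tr,                             # r_end = trj
        accum_dist=best.accum_dist + dist,    # Accum. Distance += d(qi, rj)
        n_matches=best.n_matches + 1,         # #matches++
    )

candidates = [Path(0, 3, 10, 13, 1.2, 4), Path(1, 3, 20, 22, 0.5, 2)]
new_path = extend_best_path(candidates, tq=5, tr=16, dist=0.3)
```

The returned path would then be stored at the slot for the new ΔT, and paths that can no longer be extended could be dropped to free memory.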
Query-by-Example
Spoken-Term detection
system
Acoustic features
• Posteriorgram features are used (Zhang-Glass
2010)
– MFCC-39 -> GMM-64 Posterior probability vectors

• Distance between features:
$$ d(x_m, y_n) = -\log\left( \sum_{i=0}^{N-1} \frac{x_m[i] \cdot y_n[i]}{\lVert x_m \rVert \, \lVert y_n \rVert} \right) $$
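A small sketch of this frame distance, read as the negative log of the normalized dot product between two posteriorgram vectors. Taking the denominator as the product of Euclidean norms is an assumption about the garbled slide formula, and `posteriorgram_distance` is a hypothetical name.

```python
import math

def posteriorgram_distance(x, y):
    """-log of the dot product of x and y, normalized by their Euclidean norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    # Posterior vectors are non-negative; orthogonal vectors give dot == 0,
    # for which math.log raises ValueError (infinite distance).
    return -math.log(dot / norm)

same = posteriorgram_distance([0.2, 0.8], [0.2, 0.8])
different = posteriorgram_distance([1.0, 0.0], [0.5, 0.5])
```

Identical vectors give a distance close to 0, and the distance grows as the posterior distributions diverge.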
Query-by-example Spoken Term
Detection system*
[Figure: system block diagram. Blocks: background model training, search corpus, feature extractor, background model, query, feature extractor, VAD models training, energy-based VAD, development dataset, VAD model, energy-based VAD, IR-DTW, local S-DTW refinement, overlap pruning. Modes: index mode, search mode.]
*X. Anguera, “Telefonica system for the Spoken Web Search Task at Mediaeval 2012”,
Mediaeval 2012 Workshop, Pisa, Italy
Performance evaluation
• Database: Mediaeval SWS 2012 data (4
African languages, subset of the Lwazi database*)
– ~4h development corpus + 100 queries
– ~4h evaluation corpus + 100 queries

• Metrics:
– Maximum Term Weighted Value (MTWV)
– Memory usage

*E. Barnard, M. Davel, C. V. Heerden, “ASR Corpus Design for Resource-Scarce Languages”, in
Proc. Interspeech 2009
Maximum Term Weighted Value

| System       | Dev. Set | Eval Set |
|--------------|----------|----------|
| Diagonal     | 0.258    | 0.276    |
| IR-DTW       | 0.394    | 0.394    |
| S-DTW        | 0.443    | 0.450    |
| Rails system | 0.381    | 0.384    |

Contrastive systems:
• Diagonal: Substitute IR-DTW by only allowing diagonal matches
• S-DTW: Implementation as in [1]
• Rails system: scores from [2] on the same database
[1] X. Anguera and M. Ferrarons, “Memory-Efficient Subsequence-DTW for Query-by-Example
Spoken Term Detection”, in Proc. ICME, 2013
[2] A. Jansen, B. V. Durme and P. Clark, “The JHU-HLTCOE Spoken Web Search System for
Mediaeval 2012”, in Proc. Mediaeval Workshop 2012, Pisa, Italy
Memory usage analysis

| System | Dev. Set (mean/std)  | Eval Set (mean/std)  |
|--------|----------------------|----------------------|
| S-DTW  | 506.2 MB / 342.8 MB  | 568.1 MB / 326.4 MB  |
| IR-DTW | 91.7 MB / 15 MB      | 112.3 MB / 21.8 MB   |
Conclusions and Future Work
• We have introduced the IR-DTW algorithm and
demonstrated its potential in the QbE-STD task.
– Its main advantage is its low memory usage
– Accuracy still falls short of an exhaustive/traditional
search
Not anymore!

• We are testing IR-DTW in other tasks
– Large volumes of data that preclude building similarity
matrices
– Applications not in speech that can benefit from
sparse matching
Thanks for your attention

Questions?
Xavier Anguera
xanguera@tid.es
Download the code from here:
http://www.xavieranguera.com/resources/resources.html#IRDTW

Information Retrieval Dynamic Time Warping - Interspeech 2013 presentation

  • 1. Information Retrieval-based Dynamic Time Warping. Xavier Anguera, Telefonica Research, Spain
  • 2. Query-by-Example Spoken-Term Detection. Given a spoken query, we search for instances at the lexical level within spoken documents. It is similar to Spoken Term Detection (NIST STD2006, Babel 2013) but… • Queries are spoken • Different speakers • Different acoustic conditions • No prior knowledge of the language might be available
  • 3. Information Retrieval-based Dynamic Time Warping Algorithm (IRDTW)
  • 4. Information Retrieval-based DTW • Inspired by the Subsequence-Dynamic Time Warping algorithm by Müller [1] • It performs a ‘sparse’ matching of two signals, like Jansen [2] • Uses ideas borrowed from information retrieval to save memory (lots of it) • It can take advantage of pre-indexing all reference data and thus perform a fast frame-level matching (described in [3]) [1] Meinard Müller, “Information Retrieval for Music and Motion”, Springer-Verlag, ISBN 978-3-540-74047-6, pp. 147-150, 2010 [2] Aren Jansen, Benjamin Van Durme, “Indexing Raw Acoustic Features for Scalable Zero Resource Search”, Proc. Interspeech 2012 [3] Gautam Mantena, Xavier Anguera, “Speed Improvements to Information Retrieval-based Dynamic Time Warping Using Hierarchical K-means Clustering”, in Proc. ICASSP 2013
  • 5. Subsequence-DTW algorithm (review). [Figure: alignment matrix with axes “Query term” and “Reference term”]
  • 8. ‘Sparse’ frame matching. Only the closest (lowest distance) query-reference pairs are considered. These can be found through… • Exhaustive comparison • Efficient retrieval using indexing techniques. [Figure: two query-versus-reference matching plots]
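The exhaustive-comparison option on slide 8 can be sketched as below. This is a minimal illustration, not the author's implementation: the function name `sparse_matches`, the fixed `max_dist` threshold used to decide which pairs count as "closest", and the Euclidean `math.dist` used in the usage note are all assumptions made for the example.

```python
import math


def sparse_matches(query, reference, max_dist, dist=math.dist):
    """Return (qi, rj, d) for every query/reference frame pair with d <= max_dist.

    Exhaustive comparison: every query frame is compared with every
    reference frame. The threshold rule is an assumption of this sketch.
    """
    matches = []
    for qi, q in enumerate(query):
        for rj, r in enumerate(reference):
            d = dist(q, r)
            if d <= max_dist:
                matches.append((qi, rj, d))
    return matches
```

For example, `sparse_matches([(0.0,), (1.0,)], [(0.0,), (5.0,), (1.1,)], 0.2)` keeps only the pairs (0, 0) and (1, 2). An index over the reference frames would replace the inner loop without changing the output format.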
  • 10. IR-DTW warping constraints. [Figure: query axis with IR-DTW search ranges labelled WRange = maxQDist/2 and WRange = maxQDist] Possible constraints: • Amount of warping: basic warping, 2X warping • Length to the match
  • 12. From 2D to 1D: Memory-efficient matching. We borrow an alignment algorithm used for Information Retrieval. It finds unconstrained start-end locations but does not allow any time-warping. With IRDTW we modified this algorithm to allow for time-warped matching.
  • 13. We use the ‘matching counts’ vector in the dynamic programming instead of the similarity matrix. The end position of each path defines its location in the 1D vector. The new matching point defines a target location where one of the paths will warp to.
  • 14. What information is stored in this vector? ΔT = t_qi - t_rj. For each path we store: • query (start, end) • reference (start, end) • Accumulated distance • #matching points. Only paths with #matches > 1 are stored in the ΔT vector. Size(ΔT) = size_query + size_ref (can be constrained using a circular buffer).
  • 15. Applying warping constraints in 1D. Constraints in the similarity matrix translate as: 1. Consider all paths within range of ΔT (figure labels: ΔT, Wrange, Wrange/2). 2. Check for local constraints: • Basic warping: Δq > 0, Δr > 0 • 2X warping: Δq ≥ Δr/2, Δq ≤ 2·Δr. [Figure: Δq along the query axis, Δr along the reference axis]
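The local constraints on slide 15 can be written as two small predicates. This is a sketch under stated assumptions: the function names are hypothetical, Δq and Δr are taken to be the query and reference steps from a path's end point to the new matching point, and the 2X check is assumed to also require forward motion in both signals.

```python
def basic_warping_ok(dq, dr):
    """Basic warping: the path must advance in both query and reference."""
    return dq > 0 and dr > 0


def two_x_warping_ok(dq, dr):
    """2X warping: dr/2 <= dq <= 2*dr, on top of forward motion (assumed)."""
    return basic_warping_ok(dq, dr) and dr / 2 <= dq <= 2 * dr
```

With these definitions a step of (Δq, Δr) = (2, 1) passes the 2X check, while (3, 1) does not because the query would advance more than twice as fast as the reference.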
  • 16. Best matching path selection. We select the path with the largest number of matches. It is then warped to end at the current matching point, ΔT = t_qi - t_rj. New path info: • q_end = t_qi • r_end = t_rj • Accum. distance += d(q_i, r_j) • #matches++. We can dynamically save memory by eliminating obsolete paths.
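The path record and selection step of slides 14 to 16 can be sketched as a single update function. This is an illustration under several assumptions, not the released IR-DTW code: the ΔT vector is modelled as a dict keyed by offset, the candidate window is taken as ΔT ± `wrange`, only the basic warping constraint is checked, the selected path is moved (not copied) to the new offset, and single-match paths are kept for simplicity even though slide 14 says only paths with more than one match are stored. The names `Path` and `irdtw_update` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    q_start: int
    q_end: int
    r_start: int
    r_end: int
    accum_dist: float
    matches: int


def irdtw_update(paths, qi, rj, d, wrange):
    """Extend the best compatible path with match (qi, rj) or start a new one.

    paths maps an offset dT = q_end - r_end to the Path ending at that offset.
    """
    dt = qi - rj
    best_key = None
    # Consider only paths whose offset lies in the assumed window around dt.
    for key in range(dt - wrange, dt + wrange + 1):
        p = paths.get(key)
        if p is None:
            continue
        # Basic warping constraint: move forward in query and reference.
        if qi - p.q_end <= 0 or rj - p.r_end <= 0:
            continue
        # Keep the candidate with the largest number of matches.
        if best_key is None or p.matches > paths[best_key].matches:
            best_key = key
    if best_key is None:
        paths[dt] = Path(qi, qi, rj, rj, d, 1)
    else:
        p = paths.pop(best_key)  # warp the chosen path to the new offset
        p.q_end, p.r_end = qi, rj
        p.accum_dist += d
        p.matches += 1
        paths[dt] = p
    return paths[dt]
```

Feeding the matches (0, 10), (1, 11) and (2, 13) in order with `wrange=2` grows one path that ends at offset -11 with three matches. Memory stays proportional to the number of live offsets, not to the size of the query-by-reference matrix.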
  • 18. Acoustic features • Posteriorgram features are used (Zhang-Glass 2010) – MFCC-39 -> GMM-64 posterior probability vectors • Distance between features: $d(x_m, y_n) = -\log\left(\sum_{i=0}^{N-1} \frac{x_m[i] \cdot y_n[i]}{\|x_m\| \, \|y_n\|}\right)$
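The frame distance on slide 18 can be sketched in plain Python. This reads the denominator as the product of the two vector norms, which is an assumption, so the distance is the negative log of the cosine similarity. The function name and the `eps` floor that avoids log(0) for orthogonal frames are also assumptions of this example.

```python
import math


def neg_log_cosine_distance(x, y, eps=1e-12):
    """-log of the normalized dot product between two posteriorgram frames."""
    if len(x) != len(y):
        raise ValueError("frames must have the same dimensionality")
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    # eps (assumed) keeps the ratio strictly positive before taking the log.
    return -math.log(max(dot / max(norm, eps), eps))
```

Identical frames give a distance close to 0, and frames with no overlapping posterior mass give a large positive value capped by `eps`.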
  • 19. Query-by-example Spoken Term Detection system*. [Block diagram] Index mode: Search corpus, Feature extractor, Background model training, Background model, Development dataset, Energy-based VAD, VAD models training, VAD model. Search mode: Query, Feature extractor, Energy-based VAD, IR-DTW, Local S-DTW refinement, Overlap pruning. *X. Anguera, “Telefonica system for the Spoken Web Search Task at Mediaeval 2012”, Mediaeval 2012 Workshop, Pisa, Italy
  • 20. Performance evaluation • Database: Mediaeval SWS 2012 data (4 African languages, subset of the Lwazi database*) – ~4h development corpus + 100 queries – ~4h evaluation corpus + 100 queries • Metrics: – Maximum Term Weighted Value (MTWV) – Memory usage *E. Barnard, M. Davel, C. V. Heerden, “ASR Corpus Design for Resource-Scarce Languages”, in Proc. Interspeech 2009
  • 21. Maximum Term Weighted Value (MTWV)
      System         Dev. Set   Eval Set
      Diagonal       0.258      0.276
      IR-DTW         0.394      0.394
      S-DTW          0.443      0.450
      Rails system   0.381      0.384
    Contrastive systems: • Diagonal: substitute IR-DTW by only allowing diagonal matches • S-DTW: implementation as in [1] • Rails system: scores from [2] on the same database
    [1] X. Anguera and M. Ferrarons, “Memory-Efficient Subsequence-DTW for Query-by-Example Spoken Term Detection”, in Proc. ICME, 2013
    [2] A. Jansen, B. V. Durme and P. Clark, “The JHU-HLTCOE Spoken Web Search System for Mediaeval 2012”, in Proc. Mediaeval Workshop 2012, Pisa, Italy
  • 22. Memory usage analysis
      System   Dev. Set (mean / std)   Eval Set (mean / std)
      S-DTW    506.2 MB / 342.8 MB     568.1 MB / 326.4 MB
      IR-DTW   91.7 MB / 15 MB         112.3 MB / 21.8 MB
  • 23. Conclusions and Future Work • We have introduced the IR-DTW algorithm and demonstrated its potential in the QbE-STD task. – Its main advantage is its low memory usage – Accuracy still falls short of an exhaustive/traditional search. Not anymore! • We are testing IR-DTW in other tasks – Large volumes of data that disallow building similarity matrices – Applications outside speech that can benefit from sparse matching
  • 24. Thanks for your attention. Questions? Xavier Anguera, xanguera@tid.es. Download the code from here: http://www.xavieranguera.com/resources/resources.html#IRDTW