Speech data mining
for security applications
Honza Černocký
BUT Speech@FIT, FIT, Brno University of Technology
Security Session, 11.4.2015
Security Session Honza Cernocky 11/4/2015 2/36
Agenda
• Introduction
• Gender ID example
• Speech recognition
• Language identification
• Speaker recognition
• Conclusions
Needle in a haystack
• Speech is the most important modality of human-human
communication (~80% of information) … criminals and
terrorists are also communicating by speech
• Speech is easy to acquire in the scenarios of interest.
• Finding what we are looking for is more difficult.
• Typically done by human experts, but always count on:
• Limited personnel
• Limited budget
• Not enough languages spoken
• Insufficient security clearances
Technologies of speech processing are not
almighty but can help to narrow the search
space.
Data mining from spontaneous unprepared speech
Input: audio (speech)
• Speaker/voice recognition: Who speaks? → John Doe
• Gender recognition: What gender? → Male or female
• Language recognition: What language? → English/German/??
• Speech recognition: What was said? → “Hello John!”, “John” spotted
• Time/relation analysis: Who asked whom? → John asked Paul
How do we work?
• According to recipes from pattern recognition textbooks!
• A priori knowledge of the problem feeds into the steps below
• Collect data → choose features → choose model → train model → evaluate the classifier
• Happy (or deadline passed)? → deployment
• Unhappy? → revisit the steps above
The result
• input → feature extraction → evaluation of probabilities or likelihoods (against the models) → “decoding” → decision
The simplest example … GID
Gender Identification
• Tag speech segments as male or
female.
So how is Gender-ID done?
• input → MFCC → evaluation of GMM likelihoods (Gaussian mixture models: boys, girls) → decision: male/female
Features – Mel Frequency Cepstral Coefficients
• The signal is not stationary
• And the hearing is not linear
Features – a vector every 10 ms
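The two observations above (the signal is not stationary, hearing is not linear) are the usual motivation for short-time framing and for a mel-warped frequency axis. The following is a minimal sketch, not the actual BUT front-end: the 25 ms window length, the 8 kHz sampling rate, and the `2595 * log10(1 + f/700)` mel approximation are assumptions, and only the 10 ms frame shift comes from the slides. The function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """A common mel-scale approximation (assumed here; several variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def frame_signal(signal, sample_rate, win_ms=25.0, shift_ms=10.0):
    """Cut a signal into overlapping frames, one frame every shift_ms."""
    win = int(round(sample_rate * win_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    # Only full windows are kept; a signal shorter than one window gives [].
    return [signal[s:s + win] for s in range(0, len(signal) - win + 1, shift)]

# One second of a synthetic 440 Hz tone at 8 kHz (a stand-in for telephone speech).
sr = 8000
x = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
frames = frame_signal(x, sr)
print(len(frames), "frames of", len(frames[0]), "samples")
```

Each frame is short enough to be treated as roughly stationary, and one feature vector would then be computed per frame.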
The evaluation of likelihoods: GMM
Decision – “decoding”
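The likelihood evaluation and the decision step can be sketched in a few lines. This is a toy illustration under stated assumptions: the two hand-set models `GMM_MALE` and `GMM_FEMALE`, the 2-dimensional "features", and all function names are hypothetical. A real system would train the GMMs on MFCC vectors from labeled speech.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(frames, gmm):
    """Total log-likelihood of all frames; gmm is a list of (weight, mean, var)."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        top = max(comps)  # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total

def gender_id(frames, gmm_male, gmm_female):
    """Decision: pick the model with the higher total log-likelihood."""
    if gmm_loglik(frames, gmm_male) > gmm_loglik(frames, gmm_female):
        return "male"
    return "female"

# Hand-set toy models (assumption, not trained parameters).
GMM_MALE = [(0.5, [-1.0, 0.0], [1.0, 1.0]), (0.5, [-2.0, 1.0], [1.0, 1.0])]
GMM_FEMALE = [(0.5, [1.0, 0.0], [1.0, 1.0]), (0.5, [2.0, -1.0], [1.0, 1.0])]
segment = [[-1.2, 0.1], [-1.8, 0.9], [-0.7, -0.2]]
print(gender_id(segment, GMM_MALE, GMM_FEMALE))
```

Summing log-likelihoods over frames treats the frames as independent, which is the usual simplification in this kind of classifier.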
Gender ID summary
Needed data:
• Several hours of speech (from the target channels) labeled as M or F.
Accuracy:
• The most accurate of our speech data mining tools: >96% accuracy on challenging channels
What do we get:
• Limiting the search space by 50%
Speech recognition
• Voice2text (V2T), Speech2text (S2T), transcription …
• Large vocabulary continuous speech recognition
(LVCSR)
• speech → feature extraction → evaluation of likelihoods (scores of hypotheses) using acoustic models → “decoding” → text
• The recognition network used for decoding is built from the language model and the pronunciation dictionary
LVCSR technically …
• Acoustic models
• … how do speech segments match basic speech units
(phonemes)
• trained on large (>100h) quantities of carefully transcribed
speech data
• Classically Gaussian Mixture models
• Language models
• … how do the words follow each other
President George Bush
President George push
• Need to be trained on large quantities (Gigabytes) of text from
the target domain
• Pronunciation dictionary
• Translate words into phonemes: dog → d oh g
• Basis needs to be created by hand, the rest generated using
trained grapheme to phoneme (g2p) converter
• A toolkit to do all this … HTK, KALDI, proprietary.
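The language model's role (preferring "President George Bush" over "President George push") can be illustrated with a tiny bigram model. This is only a sketch: the three-sentence corpus is made up for the example, add-alpha smoothing is an assumed choice, and the function names are hypothetical. Real language models are trained on gigabytes of in-domain text.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their histories, with sentence start/end markers."""
    histories, bigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        words = ["<s>"] + s.lower().split() + ["</s>"]
        vocab.update(words[1:])          # everything that can be predicted
        histories.update(words[:-1])     # everything that can precede a word
        bigrams.update(zip(words[:-1], words[1:]))
    return histories, bigrams, vocab

def log_prob(sentence, histories, bigrams, vocab, alpha=1.0):
    """Add-alpha smoothed bigram log-probability of a sentence."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(math.log((bigrams[(a, b)] + alpha) /
                        (histories[a] + alpha * len(vocab)))
               for a, b in zip(words[:-1], words[1:]))

# Made-up toy corpus (assumption).
corpus = ["president george bush spoke",
          "george bush met the president",
          "push the door"]
hist, bi, vocab = train_bigram(corpus)
for hyp in ("President George Bush", "President George push"):
    print(hyp, round(log_prob(hyp, hist, bi, vocab), 3))
```

Two hypotheses that sound almost the same get different scores because "george bush" was seen in the text and "george push" was not.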
Making LVCSR work well
• Neural networks
• Eating up other techniques (feature extraction, scoring,
LM) - DNNs
• Bottle-neck NNs.
• Speaker adaptation
• Asking the speaker to read a text in dictation systems …
• Unsupervised needed !
• MAP, MLLR, CMLLR, RDLT, SAT …
Challenges in LVCSR
• LVCSR relatively mature in well represented languages (US
English, Modern Standard Arabic, Czech)
• Fast development of recognizers for new languages with
limited resources – IARPA Babel project
• Limited language packs: 10h + some 70h of untranscribed
data
• 2013 languages: Cantonese, Turkish, Pashto, Tagalog;
surprise: Vietnamese
• 2014 languages: Bengali, Assamese, Zulu, Haitian Creole, Lao;
surprise: Tamil
• How to re-use resources
from other languages ?
• How to adapt to user’s
language/domain without
seeing his/her data ?
Some examples ….
and then they have one week to retrain their
keyword results ...
and ...
give you might ask why one we there a lot of
research or evaluation methods ...
the people are trying out what keywords or so it
is important to leave a ...
sufficient amount of time there as well ...
uhuh kade sengifowunelwe nguThami manje ithi
angazi e- ekhuluma nomunye ubhuti wakwamasipala
ukuthi ene usho ukuthi kunabantu ekufanele
baphelelwe ngumsebenzi ngoba uNomvula emecabanga
uzokhokha (()) ngoba yena uzoy ithela uzoyi
uzoyihlulisela ngoba phela kukhona aba- abaphethe
u-Adam angithi
LVCSR – what to expect
Accuracies (word accuracy)
• Dictation: >90%
• Reasonable languages: >70%
• Babel languages: ~70% WER (example on Tamil)
Is this OK??
• Usually not usable for direct reading, and it is questionable
whether a trained secretary would not be faster when
100% accurate output is needed.
• Yes, usable for search; for rare languages it is often the
only alternative.
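The figures above are word accuracies and word error rates (WER). As a reminder of what is being measured, here is a small sketch of the usual WER computation: word-level edit distance divided by the number of reference words. The function name is hypothetical, and real scoring tools also normalize the text first, which is omitted here. Word accuracy is commonly reported as 1 − WER.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)

print(word_error_rate("president george bush", "president george push"))
```

Because insertions are counted, WER can exceed 100% on very poor output.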
LVCSR – user data
• Speech (for acoustic models):
• Many hours of data as close as possible to the target use
(language, dialect, speaking style …)
• Needs to be transcribed better than in TV subtitles.
• Text (for language models)
• Newspapers and TV news work for dictation but not here.
• Need target text data (including very dirty language)
• Can be simulated by looking for dirty Internet data (Twitter,
discussion forums).
• Pronunciations: generally not a big deal, needs list of words.
Problematic for languages without expertise.
• Privacy issues:
• Speech and text are sensitive.
• Re-training of LVCSR by the users so far not successful.
• Work on modularization: collection of statistics by the user,
shipping to development teams…
• Opportunity to collect this data jointly, especially for
languages relevant for security across Europe
Language identification
• Which language in the recording ?
LID
Standard approaches
• Acoustics
• Phonotactics
LID: Current state-of-the-art system
• A large GMM (“Universal Background model - UBM”) –
performs collection of sufficient statistics – a vector of
several thousands of parameters per utterance (fixed size!)
• Projection to a “language print” – several hundreds of
values.
• These language prints are scored and score is calibrated.
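The first step above, collecting sufficient statistics against the UBM, is what turns an utterance of any length into a fixed-size vector. A minimal sketch follows, assuming a diagonal-covariance UBM given as (weight, mean, variance) triples. The toy `UBM` and the function names are hypothetical, and the later projection to a "language print" is not shown.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def sufficient_stats(frames, ubm):
    """Zero-order (occupation) and first-order statistics w.r.t. the UBM."""
    dim = len(ubm[0][1])
    zero = [0.0] * len(ubm)
    first = [[0.0] * dim for _ in ubm]
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in ubm]
        top = max(logs)
        post = [math.exp(l - top) for l in logs]
        norm = sum(post)
        for c, p in enumerate(post):
            gamma = p / norm              # responsibility of component c
            zero[c] += gamma
            for d in range(dim):
                first[c][d] += gamma * x[d]
    return zero, first

# Toy 2-component, 1-dimensional UBM (assumption).
UBM = [(0.5, [-1.0], [1.0]), (0.5, [1.0], [1.0])]
short_utt = [[-1.0], [1.2]]
long_utt = [[-0.9], [1.1], [0.8], [-1.3], [1.0]]
for utt in (short_utt, long_utt):
    zero, first = sufficient_stats(utt, UBM)
    print(len(utt), "frames ->", len(zero) + sum(len(f) for f in first), "values")
```

With a UBM of a few thousand components and a few dozen feature dimensions, the same procedure yields a much larger but still fixed-size vector per utterance.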
LID – what to expect
• Performance on nice data: NIST LRE 2009, 23 languages
[Charts: error-rate axis 0% to 10%; test durations 30s, 10s, 3s; legend entries Best 1 to Best 5 and Phase1 to Phase3]
• And on terrible data: RATS 2014, 5 languages (EER)
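The EER (equal error rate) quoted for such results is the operating point where the miss rate equals the false-alarm rate. A rough sketch of how it can be estimated from two lists of detection scores is shown below. The threshold sweep over observed scores is a simple approximation, and the function name and the toy scores are assumptions.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: sweep thresholds over the observed scores and return
    the mean of miss and false-alarm rates where the two are closest."""
    best_gap, eer = float("inf"), None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        miss = sum(s < thr for s in target_scores) / len(target_scores)
        fa = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        if abs(miss - fa) < best_gap:
            best_gap, eer = abs(miss - fa), (miss + fa) / 2.0
    return eer

# Made-up scores: higher means "more likely the target language".
targets = [0.9, 0.8, 0.3, 0.7]
nontargets = [0.1, 0.2, 0.75, 0.4]
print(equal_error_rate(targets, nontargets))
```

Evaluation tools usually interpolate between thresholds, so their EER can differ slightly from this estimate on small score sets.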
LID – user data
• Tens of hours of data per target language or dialect
• Need to have only the language label, no transcription necessary.
• This allows the user to:
• Improve the model of an existing language.
• Add a new language or dialect, or even a target group
• LID is a technology where the users can modify the system
themselves
• Language prints do not carry the information on the content –
potential for cooperation
• Backup solution:
• automatic acquisition of language-specific telephone data from public
sources (EOARD project)
Speaker recognition
Two hypotheses
• H0: the speaker in the test recording IS THE SAME ONE WE
SAW IN THE ENROLLMENT
• H1: the speaker in the test recording IS DIFFERENT
• Log likelihood ratio
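Written out, the log likelihood ratio for the two hypotheses takes the standard form below. The symbols are notation assumed for this note rather than taken from the slides: X stands for the features of the test recording and θ for a decision threshold.

```latex
\mathrm{LLR}(X) = \log p(X \mid H_0) - \log p(X \mid H_1),
\qquad \text{accept } H_0 \text{ if } \mathrm{LLR}(X) > \theta
```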
SRE classical scheme
• Feature extraction – Mel Frequency Cepstral
Coefficients
• Background model implemented as a Gaussian
Mixture model
• Adapted to the target speaker.
• At the time of the test, both models produce likelihoods
that are subtracted and thresholded.
Such a system
• Can be built by a reasonably skilled student equipped
with Matlab in half a day
• Will function reasonably when enrollment and test
take place under similar conditions.
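The classical scheme above (a background GMM, a copy adapted to the target speaker, and a thresholded difference of log-likelihoods) can be sketched as follows. This is a toy illustration, not the BUT system: mean-only MAP adaptation with a relevance factor is one common way to do the "adapted to the target speaker" step and is an assumption here, as are the 1-dimensional data, the toy `UBM`, and all function names.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def component_logs(x, gmm):
    """Per-component log(weight * density); gmm is a list of (weight, mean, var)."""
    return [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]

def avg_loglik(frames, gmm):
    """Average per-frame log-likelihood under the GMM."""
    total = 0.0
    for x in frames:
        logs = component_logs(x, gmm)
        top = max(logs)
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total / len(frames)

def map_adapt_means(ubm, frames, relevance=16.0):
    """Mean-only MAP adaptation of the background model to enrollment frames."""
    dim = len(ubm[0][1])
    occ = [0.0] * len(ubm)               # soft frame counts per component
    acc = [[0.0] * dim for _ in ubm]     # posterior-weighted feature sums
    for x in frames:
        logs = component_logs(x, ubm)
        top = max(logs)
        post = [math.exp(l - top) for l in logs]
        norm = sum(post)
        for c, p in enumerate(post):
            occ[c] += p / norm
            for d in range(dim):
                acc[c][d] += (p / norm) * x[d]
    # Interpolate between the data mean and the UBM mean, per component.
    return [(w, [(a + relevance * mu) / (n + relevance) for a, mu in zip(a_c, m)], v)
            for (w, m, v), n, a_c in zip(ubm, occ, acc)]

def llr_score(test_frames, target_gmm, ubm):
    """Log-likelihood ratio: target model versus background model."""
    return avg_loglik(test_frames, target_gmm) - avg_loglik(test_frames, ubm)

# Toy data (assumption): the target speaker's features lie around 3.0.
UBM = [(0.5, [-2.0], [1.0]), (0.5, [2.0], [1.0])]
enrollment = [[2.5 + 0.05 * i] for i in range(21)]
target = map_adapt_means(UBM, enrollment, relevance=4.0)
same_speaker = [[3.2], [2.8], [3.1]]
impostor = [[1.0], [1.5], [0.8]]
print(llr_score(same_speaker, target, UBM) > 0.0)
print(llr_score(impostor, target, UBM) > 0.0)
```

The threshold of 0 used in the two prints is arbitrary. In practice it is tuned on held-out trials, and this is where mismatched enrollment and test conditions start to hurt.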
IKR !
Inter-session variability
NOT HAVING THE SAME CONDITIONS !
Intrinsic variability
• Language
• Emotions, stress, Lombard effect
• Health condition
• Content of the message
Extrinsic variability
• Noise
• Transmission channel
• Codec (or series of codecs)
• Recording device …
Years of SRE R&D fighting the variability …
[Diagram: front-end processing → target model and background model (target model adapted from the background model) → Σ → LR score normalization → Λ]
Feature domain
• Noise removal
• Tone removal
• Cepstral mean subtraction
• RASTA filtering
• Mean & variance normalization
• Feature warping
• Feature Mapping
• Eigenchannel adaptation in feature domain
Model domain
• Speaker Model Synthesis
• Eigenchannel compensation
• Joint Factor Analysis
• Nuisance Attribute Projection
Score domain
• Z-norm
• T-norm
• ZT-norm
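Two of the simpler techniques in this overview can be shown in a few lines: mean and variance normalization in the feature domain and Z-norm in the score domain. This is a sketch under assumptions: normalization is done per utterance, the impostor scores for Z-norm are made up, and the function names are hypothetical.

```python
import math

def cmvn(frames):
    """Per-utterance cepstral mean and variance normalization."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # A zero standard deviation (constant dimension) is replaced by 1.0.
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n) or 1.0
            for d in range(dim)]
    return [[(f[d] - means[d]) / stds[d] for d in range(dim)] for f in frames]

def z_norm(raw_score, impostor_scores):
    """Z-norm: rescale a score by the mean and standard deviation of scores
    that impostor recordings obtain against the same target model.
    Assumes the impostor scores are not all identical."""
    mu = sum(impostor_scores) / len(impostor_scores)
    sd = math.sqrt(sum((s - mu) ** 2 for s in impostor_scores) / len(impostor_scores))
    return (raw_score - mu) / sd

# A constant offset per cepstral coefficient, as a fixed linear channel
# would add, disappears after normalization.
clean = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
shifted = [[a + 5.0, b - 3.0] for a, b in clean]
print(cmvn(clean)[0], cmvn(shifted)[0])
print(z_norm(2.0, [0.0, 1.0, 2.0]))
```

Mean subtraction works here because a fixed linear channel is approximately an additive constant in the cepstral domain.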
Current state-of-the-art
• Low-dimensional representation of whole recordings
• i-Vectors (for R&D), Voiceprints (for business)
• Allows for very fast scoring.
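Why scoring becomes so fast: once each recording is a short fixed-length vector, comparing two recordings is a handful of multiplications. The sketch below uses cosine similarity, which is a simple stand-in chosen for illustration and not necessarily the scoring used in the described system. The 4-dimensional vectors are made up, whereas real voiceprints have several hundred values.

```python
import math

def cosine_score(a, b):
    """Cosine similarity between two voiceprint vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up voiceprints (assumption).
enrolled = [0.9, -0.2, 0.4, 0.1]
test_same = [0.8, -0.1, 0.5, 0.0]
test_other = [-0.3, 0.9, -0.2, 0.4]
print(cosine_score(enrolled, test_same) > cosine_score(enrolled, test_other))
```

Comparing one test recording against a large database of enrolled voiceprints is then a loop over such dot products.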
What to expect I.
• Works very nicely for long telephone recordings (EER
~2%) – multiple successes in NIST evaluations.
• Examples …
What to expect II.
• Noise, varying communication channels, short
recordings (10s) still a problem – DARPA RATS
program
• Examples …
SRE – user data
• The performance of the SRE system crucially depends
on how close the training data is to the deployment conditions.
• UBM – needs lots (100s of hours) of unannotated data,
not very sensitive.
• VoicePrint extractor – ditto.
• Scoring done by PLDA
• Voice-prints with speaker labels (A,B,C, …) needed
• Even 50 speakers help to increase the accuracy by 30%.
• … but some users are not able to collect/label even this
amount.
• Work running on unsupervised adaptation on
unannotated data.
The charm of voice-prints
• Allowing for transfer of speaker identities
• without giving out the original WAV
• without the possibility to reconstruct what was said.
• Opening a range of opportunities for
• Cooperation between customers and law enforcement
• Cooperation with R&D teams.
Conclusions
• Speech data mining technologies are already serving
in security and defense (and you can test and
eventually buy them from several vendors)
• International crime calls for an international response:
standardization (even in the form of an informal
working draft) should take place ASAP to allow police
forces to exchange voice-prints regardless of vendor.
… we’re on it.
Thanks for the invitation to Security Session!
Questions?
BACKUP
SLIDES
Who am I
• MS in Radioelectronics from BUT, 1993.
• PhD in Signal processing jointly from Universite d’Orsay
(France) and BUT
• Started in speech coding in 1992 and stayed in speech processing
ever since
• Was with Oregon Graduate Institute (Portland, OR) in the group
of Prof. Hermansky in 2001
• Since 2002 at the Faculty of Information Technology of BUT,
habilitation to Associate Professor (Doc.) in 2003.
• Executive leader of BUT Speech@FIT research group
• Since 2008 Head of Department of Computer Graphics and
Multimedia
BUT Speech@FIT
• Founded in 1997 (1 person)
• ~20 people in 2013 (faculty, researchers, grad and pre-grad
students, support staff)
• Active in all technologies this presentation is about
• Supported by EU, local and US (DARPA and IARPA) grants
International cooperation and standardization
• NIST evaluation campaigns
• Allowing for objective comparison of technologies
• Often on too good data.
• US-funded projects
• Realistic testing on noisy channels (DARPA RATS) and new
languages (IARPA Babel)
• Restricted to participants
• EU projects examples
• Past: MOBIO EU FP7 (mobile biometry) helped and fast speaker
recognition based on low-dimensional voice-prints.
• SIIP – addressing topic SEC-2013.5.1-2 Audio and voice analysis,
speaker identification for security applications – Integration Project
- starting now.
Standardization – not much …
• UK Home Office Forensic Speech and Audio (FSA) Group - Bring
forensic speech and audio under the regulation of ISO 17025
• ANSI/NIST-ITL Standard 1-2013, Data Format for
Interchange, Record Type-11: Forensic and investigatory voice
record

Más contenido relacionado

La actualidad más candente

Computational linguistics
Computational linguisticsComputational linguistics
Computational linguisticsshrey bhate
 
ICANN 51: IDN Root Zone LGR (workshop)
ICANN 51: IDN Root Zone LGR (workshop)ICANN 51: IDN Root Zone LGR (workshop)
ICANN 51: IDN Root Zone LGR (workshop)ICANN
 
Language and Intelligence
Language and IntelligenceLanguage and Intelligence
Language and Intelligencebutest
 
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...Lifeng (Aaron) Han
 
American Standard Sign Language Representation Using Speech Recognition
American Standard Sign Language Representation Using Speech RecognitionAmerican Standard Sign Language Representation Using Speech Recognition
American Standard Sign Language Representation Using Speech Recognitionpaperpublications3
 
An HLT profile of the official South African languages
An HLT profile of the official South African languagesAn HLT profile of the official South African languages
An HLT profile of the official South African languagesGuy De Pauw
 
The Role of Natural Language Processing in Information Retrieval
The Role of Natural Language Processing in Information RetrievalThe Role of Natural Language Processing in Information Retrieval
The Role of Natural Language Processing in Information RetrievalTony Russell-Rose
 
Natural language processing
Natural language processingNatural language processing
Natural language processingAbash shah
 
Natural language processing
Natural language processingNatural language processing
Natural language processingprashantdahake
 

La actualidad más candente (11)

Computational linguistics
Computational linguisticsComputational linguistics
Computational linguistics
 
ICANN 51: IDN Root Zone LGR (workshop)
ICANN 51: IDN Root Zone LGR (workshop)ICANN 51: IDN Root Zone LGR (workshop)
ICANN 51: IDN Root Zone LGR (workshop)
 
Language and Intelligence
Language and IntelligenceLanguage and Intelligence
Language and Intelligence
 
Intro
IntroIntro
Intro
 
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
MT SUMMIT PPT: Language-independent Model for Machine Translation Evaluation ...
 
American Standard Sign Language Representation Using Speech Recognition
American Standard Sign Language Representation Using Speech RecognitionAmerican Standard Sign Language Representation Using Speech Recognition
American Standard Sign Language Representation Using Speech Recognition
 
An HLT profile of the official South African languages
An HLT profile of the official South African languagesAn HLT profile of the official South African languages
An HLT profile of the official South African languages
 
The Role of Natural Language Processing in Information Retrieval
The Role of Natural Language Processing in Information RetrievalThe Role of Natural Language Processing in Information Retrieval
The Role of Natural Language Processing in Information Retrieval
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Penetration Testing
Penetration TestingPenetration Testing
Penetration Testing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 

Similar a Dolování dat z řeči pro bezpečnostní aplikace - Jan Černocký

Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introductionananth
 
NLP: Challenges and Opportunities in Underserved Areas
NLP: Challenges and Opportunities in Underserved AreasNLP: Challenges and Opportunities in Underserved Areas
NLP: Challenges and Opportunities in Underserved AreasColleen Farrelly
 
Automatic Speech Recognition.ppt
Automatic Speech Recognition.pptAutomatic Speech Recognition.ppt
Automatic Speech Recognition.pptRudraSaraswat3
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognitionboddu syamprasad
 
GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012MediaEval2012
 
Conversational sensemaking Preece and Braines
Conversational sensemaking   Preece and BrainesConversational sensemaking   Preece and Braines
Conversational sensemaking Preece and Brainesdiannepatricia
 
Embracing Diversity: Searching over Multiple Languages - Suneel Marthi, Red H...
Embracing Diversity: Searching over Multiple Languages - Suneel Marthi, Red H...Embracing Diversity: Searching over Multiple Languages - Suneel Marthi, Red H...
Embracing Diversity: Searching over Multiple Languages - Suneel Marthi, Red H...Lucidworks
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...RajkiranVeluri
 
Introduction to NLP.pptx
Introduction to NLP.pptxIntroduction to NLP.pptx
Introduction to NLP.pptxbuivantan_uneti
 
AI, don't f$# up my name.pdf
AI, don't f$# up my name.pdfAI, don't f$# up my name.pdf
AI, don't f$# up my name.pdfMarcis Pinnis
 
Introduction to text to speech
Introduction to text to speechIntroduction to text to speech
Introduction to text to speechBilgin Aksoy
 
NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA DATASCIENCE
 
ELKL 5 Language documentation for linguistics and technology
ELKL 5 Language documentation for linguistics and technologyELKL 5 Language documentation for linguistics and technology
ELKL 5 Language documentation for linguistics and technologyDafydd Gibbon
 
Audiovisual collections, the spoken word and user needs of scholars in the Hu...
Audiovisual collections, the spoken word and user needs of scholars in the Hu...Audiovisual collections, the spoken word and user needs of scholars in the Hu...
Audiovisual collections, the spoken word and user needs of scholars in the Hu...roelandordelman.nl
 
Your Voice is My Passport
Your Voice is My PassportYour Voice is My Passport
Your Voice is My PassportPriyanka Aash
 
6_Big Data Sources part3-Day 3_A_text_mining.pptx
6_Big Data Sources part3-Day 3_A_text_mining.pptx6_Big Data Sources part3-Day 3_A_text_mining.pptx
6_Big Data Sources part3-Day 3_A_text_mining.pptxShowravDuttaAnkur
 
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...Europeana
 
Towards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken ContentTowards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken ContentNVIDIA Taiwan
 
Forum Tal 2014: Celi company presentation
Forum Tal 2014: Celi company presentationForum Tal 2014: Celi company presentation
Forum Tal 2014: Celi company presentationCELI
 

Similar a Dolování dat z řeči pro bezpečnostní aplikace - Jan Černocký (20)

Natural Language Processing: L01 introduction
Natural Language Processing: L01 introductionNatural Language Processing: L01 introduction
Natural Language Processing: L01 introduction
 
NLP: Challenges and Opportunities in Underserved Areas
NLP: Challenges and Opportunities in Underserved AreasNLP: Challenges and Opportunities in Underserved Areas
NLP: Challenges and Opportunities in Underserved Areas
 
Automatic Speech Recognition.ppt
Automatic Speech Recognition.pptAutomatic Speech Recognition.ppt
Automatic Speech Recognition.ppt
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012
 
Conversational sensemaking Preece and Braines
Conversational sensemaking   Preece and BrainesConversational sensemaking   Preece and Braines
Conversational sensemaking Preece and Braines
 
Embracing Diversity: Searching over Multiple Languages - Suneel Marthi, Red H...
Embracing Diversity: Searching over Multiple Languages - Suneel Marthi, Red H...Embracing Diversity: Searching over Multiple Languages - Suneel Marthi, Red H...
Embracing Diversity: Searching over Multiple Languages - Suneel Marthi, Red H...
 
Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...Natural Language Processing, Techniques, Current Trends and Applications in I...
Natural Language Processing, Techniques, Current Trends and Applications in I...
 
Introduction to NLP.pptx
Introduction to NLP.pptxIntroduction to NLP.pptx
Introduction to NLP.pptx
 
AI, don't f$# up my name.pdf
AI, don't f$# up my name.pdfAI, don't f$# up my name.pdf
AI, don't f$# up my name.pdf
 
Introduction to text to speech
Introduction to text to speechIntroduction to text to speech
Introduction to text to speech
 
NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2NOVA Data Science Meetup 1/19/2017 - Presentation 2
NOVA Data Science Meetup 1/19/2017 - Presentation 2
 
ELKL 5 Language documentation for linguistics and technology
ELKL 5 Language documentation for linguistics and technologyELKL 5 Language documentation for linguistics and technology
ELKL 5 Language documentation for linguistics and technology
 
Audiovisual collections, the spoken word and user needs of scholars in the Hu...
Audiovisual collections, the spoken word and user needs of scholars in the Hu...Audiovisual collections, the spoken word and user needs of scholars in the Hu...
Audiovisual collections, the spoken word and user needs of scholars in the Hu...
 
Your Voice is My Passport
Your Voice is My PassportYour Voice is My Passport
Your Voice is My Passport
 
6_Big Data Sources part3-Day 3_A_text_mining.pptx
6_Big Data Sources part3-Day 3_A_text_mining.pptx6_Big Data Sources part3-Day 3_A_text_mining.pptx
6_Big Data Sources part3-Day 3_A_text_mining.pptx
 
Searching for the Best Machine Translation Combination
Searching for the Best Machine Translation CombinationSearching for the Best Machine Translation Combination
Searching for the Best Machine Translation Combination
 
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
 
Towards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken ContentTowards Machine Comprehension of Spoken Content
Towards Machine Comprehension of Spoken Content
 
Forum Tal 2014: Celi company presentation
Forum Tal 2014: Celi company presentationForum Tal 2014: Celi company presentation
Forum Tal 2014: Celi company presentation
 

Más de Security Session

Getting your hands dirty: How to Analyze the Behavior of Malware Traffic / SE...
Getting your hands dirty: How to Analyze the Behavior of Malware Traffic / SE...Getting your hands dirty: How to Analyze the Behavior of Malware Traffic / SE...
Getting your hands dirty: How to Analyze the Behavior of Malware Traffic / SE...Security Session
 
Základy reverse engineeringu a assembleru / KAREL LEJSKA, MILAN BARTOŠ [DEFEN...
Základy reverse engineeringu a assembleru / KAREL LEJSKA, MILAN BARTOŠ [DEFEN...Základy reverse engineeringu a assembleru / KAREL LEJSKA, MILAN BARTOŠ [DEFEN...
Základy reverse engineeringu a assembleru / KAREL LEJSKA, MILAN BARTOŠ [DEFEN...Security Session
 
Zabezpečení nejen SSH na serveru pomocí Fail2Ban a jednoduchého honeypotu. / ...
Zabezpečení nejen SSH na serveru pomocí Fail2Ban a jednoduchého honeypotu. / ...Zabezpečení nejen SSH na serveru pomocí Fail2Ban a jednoduchého honeypotu. / ...
Zabezpečení nejen SSH na serveru pomocí Fail2Ban a jednoduchého honeypotu. / ...Security Session
 
Insights of a brute-forcing botnet / VERONICA VALEROS [CISCO]
Insights of a brute-forcing botnet / VERONICA VALEROS [CISCO]Insights of a brute-forcing botnet / VERONICA VALEROS [CISCO]
Insights of a brute-forcing botnet / VERONICA VALEROS [CISCO]Security Session
 
Softwarove protektory / KAREL LEJSKA, MILAN BARTOŠ [DEFENDIO]
Softwarove protektory / KAREL LEJSKA, MILAN BARTOŠ [DEFENDIO]Softwarove protektory / KAREL LEJSKA, MILAN BARTOŠ [DEFENDIO]
Softwarove protektory / KAREL LEJSKA, MILAN BARTOŠ [DEFENDIO]Security Session
 
Wintel Hell: průvodce devíti kruhy Dantova technologického pekla / MARTIN HRO...
Wintel Hell: průvodce devíti kruhy Dantova technologického pekla / MARTIN HRO...Wintel Hell: průvodce devíti kruhy Dantova technologického pekla / MARTIN HRO...
Wintel Hell: průvodce devíti kruhy Dantova technologického pekla / MARTIN HRO...Security Session
 
Robots against robots: How a Machine Learning IDS detected a novel Linux Botn...
Robots against robots: How a Machine Learning IDS detected a novel Linux Botn...Robots against robots: How a Machine Learning IDS detected a novel Linux Botn...
Robots against robots: How a Machine Learning IDS detected a novel Linux Botn...Security Session
 
#ochranadat pred sebou samotným / MATEJ ZACHAR [SAFETICA TECHNOLOGIES S.R.O.]
#ochranadat pred sebou samotným / MATEJ ZACHAR [SAFETICA TECHNOLOGIES S.R.O.]#ochranadat pred sebou samotným / MATEJ ZACHAR [SAFETICA TECHNOLOGIES S.R.O.]
#ochranadat pred sebou samotným / MATEJ ZACHAR [SAFETICA TECHNOLOGIES S.R.O.]Security Session
 
Co vše skrývá síťový provoz a jak detekovat kybernetické hrozby? / MARTIN ŠKO...
Co vše skrývá síťový provoz a jak detekovat kybernetické hrozby? / MARTIN ŠKO...Co vše skrývá síťový provoz a jak detekovat kybernetické hrozby? / MARTIN ŠKO...
Co vše skrývá síťový provoz a jak detekovat kybernetické hrozby? / MARTIN ŠKO...Security Session
 
Bezpečnější pošta díky protokolu DANE / ONDŘEJ CALETKA [CESNET]
Bezpečnější pošta díky protokolu DANE / ONDŘEJ CALETKA [CESNET]Bezpečnější pošta díky protokolu DANE / ONDŘEJ CALETKA [CESNET]
Bezpečnější pošta díky protokolu DANE / ONDŘEJ CALETKA [CESNET]Security Session
 
Exploitace – od minulosti po současnost - Jan Kopecký
Exploitace – od minulosti po současnost - Jan KopeckýExploitace – od minulosti po současnost - Jan Kopecký
Exploitace – od minulosti po současnost - Jan KopeckýSecurity Session
 
Kontrola uživatelských účtů ve Windows a jak ji obejít - Martin Dráb
Kontrola uživatelských účtů ve Windows a jak ji obejít - Martin DrábKontrola uživatelských účtů ve Windows a jak ji obejít - Martin Dráb
Kontrola uživatelských účtů ve Windows a jak ji obejít - Martin DrábSecurity Session
 
Research in Liveness Detection - Martin Drahanský
Research in Liveness Detection - Martin DrahanskýResearch in Liveness Detection - Martin Drahanský
Research in Liveness Detection - Martin DrahanskýSecurity Session
 
Co se skrývá v datovém provozu? - Pavel Minařík
Co se skrývá v datovém provozu? - Pavel MinaříkCo se skrývá v datovém provozu? - Pavel Minařík
Co se skrývá v datovém provozu? - Pavel MinaříkSecurity Session
 
Jak odesílat zprávy, když někdo vypne Internet - Pavel Táborský
Jak odesílat zprávy, když někdo vypne Internet - 	Pavel TáborskýJak odesílat zprávy, když někdo vypne Internet - 	Pavel Táborský
Jak odesílat zprávy, když někdo vypne Internet - Pavel TáborskýSecurity Session
 
Two Years with botnet Asprox - Michal Ambrož
Two Years with botnet Asprox - Michal AmbrožTwo Years with botnet Asprox - Michal Ambrož
Two Years with botnet Asprox - Michal AmbrožSecurity Session
 
Falsifikace biometricke charakteristiky a detekce zivosti
Falsifikace biometricke charakteristiky a detekce zivostiFalsifikace biometricke charakteristiky a detekce zivosti
Falsifikace biometricke charakteristiky a detekce zivostiSecurity Session
 

Speech data mining for security applications (Dolování dat z řeči pro bezpečnostní aplikace) - Jan Černocký

  • 1. Speech data mining for security applications (Dolování dat z řeči pro bezpečnostní aplikace). Honza Černocký, BUT Speech@FIT, FIT VUT v Brně. Security Session, 11.4.2015
  • 2. Agenda • Introduction • Gender ID example • Speech recognition • Language identification • Speaker recognition • Conclusions
  • 3. Needle in a haystack • Speech is the most important modality of human-human communication (~80% of information) … criminals and terrorists also communicate by speech • Speech is easy to acquire in the scenarios of interest. • Finding what we are looking for is more difficult. • The search is typically done by human experts, but always count on: • Limited personnel • Limited budget • Not enough languages spoken • Insufficient security clearances. Speech processing technologies are not almighty but can help to narrow the search space.
  • 4. Data mining from spontaneous unprepared speech. Input: audio (speech). • Speaker/Voice Recognition: Who speaks? → John Doe • Gender Recognition: What gender? → Male or Female • Language Recognition: What language? → English/German/?? • Speech Recognition: What was said? → “Hello John!”, “John” spotted • Time/relation analysis: Who asked whom? → John asked Paul
  • 5. How do we work? • According to recipes from pattern recognition textbooks! Collect data → Choose features → Choose model → Train model → Evaluate the classifier. Input to the process: a priori knowledge of the problem. Happy (or deadline passed)? → deployment. Unhappy? → back to the earlier steps.
  • 6. The result: Input → Feature extraction → Evaluation of probabilities or likelihoods (using Models) → “Decoding” → Decision
  • 7. The simplest example … GID (Gender Identification) • Tag speech segments as male or female.
  • 8. So how is Gender-ID done? Input → MFCC → Evaluation of GMM likelihoods (Gaussian Mixture Models: boys, girls) → Decision → Male/female
  • 9. Features: Mel Frequency Cepstral Coefficients • The signal is not stationary • And the hearing is not linear
  • 10. Features: a vector every 10 ms
  • 11. The evaluation of likelihoods: GMM
  • 12. Decision: “decoding”
  • 13. Gender ID summary. Needed data: • Several hours of speech (from the target channels) labeled as M or F. Accuracy: • The most accurate of our speech data mining tools: >96% accuracy on challenging channels. What do we get: • Limiting the search space by 50%
  • 14. Speech recognition • Voice2text (V2T), Speech2text (S2T), transcription … • Large vocabulary continuous speech recognition (LVCSR). Speech → Feature extraction → Evaluation of likelihoods (scores of hypotheses) → “Decoding” → Text. Knowledge sources: Acoustic models, Language model, Pronunciation dictionary, Recognition network.
  • 15. LVCSR technically … • Acoustic models • … how well do speech segments match basic speech units (phonemes) • Trained on large (>100h) quantities of carefully transcribed speech data • Classically Gaussian Mixture models • Language models • … how do the words follow each other: “President George Bush” vs. “President George push” • Need to be trained on large quantities (gigabytes) of text from the target domain • Pronunciation dictionary • Translates words into phonemes: dog → d oh g • The basis needs to be created by hand, the rest is generated using a trained grapheme-to-phoneme (g2p) converter • A toolkit to do all this … HTK, KALDI, proprietary.
  • 16. Making LVCSR work well • Neural networks • Eating up other techniques (feature extraction, scoring, LM): DNNs • Bottle-neck NNs. • Speaker adaptation • Asking the speaker to read a text works in dictation systems … • Unsupervised adaptation needed! • MAP, MLLR, CMLLR, RDLT, SAT …
  • 17. Challenges in LVCSR • LVCSR is relatively mature in well-represented languages (US English, Modern Standard Arabic, Czech) • Fast development of recognizers for new languages with limited resources: IARPA BABEL project • Limited language packs: 10h + some 70h of untranscribed data • 2013 languages: Cantonese, Turkish, Pashto, Tagalog, Surprise: Vietnamese • 2014 languages: Bengali, Assamese, Zulu, Haitian Creole, Lao, Surprise: Tamil • How to re-use resources from other languages? • How to adapt to the user’s language/domain without seeing his/her data?
  • 18. Some examples … “and then they have one week to retrain their keyword results ... and ... give you might ask why one we there a lot of research or evaluation methods ... the people are trying out what keywords or so it is important to leave a ... sufficient amount of time there as well ...” “uhuh kade sengifowunelwe nguThami manje ithi angazi e- ekhuluma nomunye ubhuti wakwamasipala ukuthi ene usho ukuthi kunabantu ekufanele baphelelwe ngumsebenzi ngoba uNomvula emecabanga uzokhokha (()) ngoba yena uzoy ithela uzoyi uzoyihlulisela ngoba phela kukhona aba- abaphethe u-Adam angithi”
  • 19. LVCSR: what to expect. Accuracies (word accuracy): • Dictation: >90% • Reasonable languages: >70% • Babel languages: ~70% WER (example on Tamil). Is this OK?? • Usually not usable for direct reading, and it is questionable whether a trained secretary would not be faster when 100% accurate output is needed. • Yes, usable for search; for rare languages often the only alternative.
  • 20. LVCSR: user data • Speech (for acoustic models): • Many hours of data as close as possible to the target use (language, dialect, speaking style …) • Needs to be transcribed better than TV subtitles. • Text (for language models): • Newspapers and TV news work for dictation but not here. • Need target text data (including very dirty language) • Can be simulated by looking for dirty Internet data (Twitter, discussion forums). • Pronunciations: generally not a big deal, needs a list of words. Problematic for languages without expertise. • Privacy issues: • Speech and text are sensitive. • Re-training of LVCSR by the users has so far not been successful. • Work on modularization: collection of statistics by the user, shipping to development teams … • Opportunity to collect this data jointly, especially for languages relevant for security across Europe
  • 21. Language identification (LID) • Which language is in the recording?
  • 22. Standard approaches • Acoustics • Phonotactics
  • 23. LID: current state-of-the-art system • A large GMM (“Universal Background Model”, UBM) collects sufficient statistics: a vector of several thousand parameters per utterance (fixed size!) • Projection to a “language print”: several hundred values. • These language prints are scored and the score is calibrated.
  • 24. LID: what to expect • Performance on nice data: NIST LRE 2009, 23 languages • And on terrible data: RATS 2014, 5 languages (EER) [chart residue: axis 0% to 10%; test durations 30s, 10s, 3s; series Best 1 to Best 5 and Phase1 to Phase3]
  • 25. LID: user data • Tens of hours of data per target language or dialect • Only the language label is needed, no transcription necessary. • This allows the user to: • Improve the model of an existing language. • Add a new language or dialect, or even a target group • LID is a technology where the user can modify the system him/herself • Language prints do not carry information on the content: potential for cooperation • Backup solution: • Automatic acquisition of language-specific telephone data from public sources (EOARD project)
  • 26. Speaker recognition. Two hypotheses • H0: the speaker in the test recording IS THE SAME as the one we saw in the enrolment • H1: the speaker in the test recording IS DIFFERENT • Log likelihood ratio
  • 27. SRE classical scheme • Feature extraction: Mel Frequency Cepstral Coefficients • Background model implemented as a Gaussian Mixture model • Adapted to the target speaker. • At test time, both models produce likelihoods that are subtracted and thresholded. Such a system • Can be built by a reasonably skilled student equipped with Matlab in half a day • Will function reasonably if enrollment and test take place under similar conditions. IKR!
  • 28. Inter-session variability: NOT HAVING THE SAME CONDITIONS! Intrinsic variability • Language • Emotions, stress, Lombard effect • Health condition • Content of the message. Extrinsic variability • Noise • Transmission channel • Codec (or series of codecs) • Recording device …
  • 29. Years of SRE R&D fighting the variability … Pipeline: Front-end processing → Target model and Background model (Adapt) → LR → score normalization. Feature domain: • Noise removal • Tone removal • Cepstral mean subtraction • RASTA filtering • Mean & variance normalization • Feature warping • Feature Mapping • Eigenchannel adaptation in feature domain. Model domain: • Speaker Model Synthesis • Eigenchannel compensation • Joint Factor Analysis • Nuisance Attribute Projection. Score domain: • Z-norm • T-norm • ZT-norm
  • 30. Current state-of-the-art • Low-dimensional representation of whole recordings • i-Vectors (for R&D), Voiceprints (for business) • Allows for very fast scoring.
  • 31. What to expect I. • Works very nicely for long telephone recordings (EER ~2%): multiple successes in NIST evaluations. • Examples …
  • 32. What to expect II. • Noise, varying communication channels and short recordings (10s) are still a problem: DARPA RATS program • Examples …
  • 33. SRE: user data • The performance of the SRE system crucially depends on how close the training data is to the deployment conditions. • UBM: needs lots (100s of hours) of unannotated data, not very sensitive. • Voice-print extractor: ditto. • Scoring is done by PLDA • Voice-prints with speaker labels (A, B, C, …) needed • Even 50 speakers help to increase the accuracy by 30%. • … but some users are not able to collect/label even this amount. • Work is running on unsupervised adaptation on unannotated data.
  • 34. The charm of voice-prints • Allowing for transfer of speaker identities • Without giving out the original WAV • Without the possibility to reconstruct what was said. [figure labels: content, no content] • Opening a range of opportunities for • Cooperation between customers and law enforcement • Cooperation with R&D teams.
  • 35. Conclusions • Speech data mining technologies are already serving in security and defense (and you can test and eventually buy them from several vendors) • International crime calls for an international reaction: standardization (even in the form of an informal working draft) should take place ASAP to allow police forces to exchange voice-prints regardless of vendors. … we’re on it.
  • 36. Thanks for the invitation to Security Session! Questions?
  • 37. BACKUP SLIDES
  • 38. Who am I • MS in Radioelectronics from BUT, 1993. • PhD in Signal processing jointly from Universite d’Orsay (France) and BUT • Started with speech coding in 1992 and have stayed in speech processing since • Was with Oregon Graduate Institute (Portland, OR) in the group of Prof. Hermansky in 2001 • Since 2002 at the Faculty of Information Technology of BUT, habilitation to Associate Professor (Doc.) in 2003. • Executive leader of the BUT Speech@FIT research group • Since 2008 Head of the Department of Computer Graphics and Multimedia
  • 39. BUT Speech@FIT • Founded in 1997 (1 person) • ~20 people in 2013 (faculty, researchers, grad and pre-grad students, support staff) • Active in all technologies this presentation is about • Supported by EU, local and US (DARPA and IARPA) grants
  • 40. International cooperation and standardization • NIST evaluation campaigns • Allowing for objective comparison of technologies • Often on data that is too good. • US-funded projects • Realistic testing on noisy channels (DARPA RATS) and new languages (IARPA Babel) • Restricted to participants • EU project examples • Past: MOBIO EU FP7 (mobile biometry) helped with fast speaker recognition based on low-dimensional voice-prints. • SIIP: addressing topic SEC-2013.5.1-2 Audio and voice analysis, speaker identification for security applications, Integration Project, starting now. Standardization: not much … • UK Home Office Forensic Speech and Audio (FSA) Group: bring forensic speech and audio under the regulation of ISO 17025 • ANSI/NIST-ITL Standard 1-2013, Data Format for Interchange, Record Type-11: Forensic and investigatory voice record
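
The Gender-ID chain from slides 8 to 12 (features, GMM likelihoods, decision) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: diagonal-covariance GMMs, toy 2-dimensional vectors standing in for real MFCC frames, and made-up model parameters. The names MALE_GMM, FEMALE_GMM and all function names are hypothetical and do not come from the talk. The same "subtract two log-likelihoods and threshold" form is what slide 27 describes for speaker recognition, with a target model and a background model in place of the two gender models.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def segment_loglik(frames, gmm):
    """Frames are treated as independent, so their log-likelihoods add up."""
    return sum(gmm_frame_loglik(x, gmm) for x in frames)

def gender_id(frames, male_gmm, female_gmm):
    """Return the decision and the log-likelihood ratio (male minus female)."""
    llr = segment_loglik(frames, male_gmm) - segment_loglik(frames, female_gmm)
    return ("male" if llr > 0.0 else "female"), llr

# Hypothetical toy models: two components each, 2-dimensional features.
MALE_GMM = [(0.5, [-1.0, 0.0], [1.0, 1.0]), (0.5, [-2.0, 1.0], [1.0, 1.0])]
FEMALE_GMM = [(0.5, [1.0, 0.0], [1.0, 1.0]), (0.5, [2.0, -1.0], [1.0, 1.0])]

if __name__ == "__main__":
    segment = [[-1.2, 0.1], [-1.8, 0.9]]  # one vector per 10 ms frame
    print(gender_id(segment, MALE_GMM, FEMALE_GMM))
```

In a real system the GMM parameters would be trained on the labeled speech mentioned on slide 13, and the zero threshold would be tuned on held-out data.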

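Slide 19 quotes word accuracy and WER without defining them. A commonly used definition, assumed here and not stated in the talk, is WER = (substitutions + deletions + insertions) / number of reference words, obtained from a word-level edit distance. The function name word_error_rate is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

if __name__ == "__main__":
    print(word_error_rate("president george bush", "president george push"))
```

With the language-model example from slide 15, the hypothesis "president george push" contains one substitution in three reference words, i.e. a WER of 1/3.
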
Editor's notes

  1. Put the spkID picture here and frame the column with gender!
  2. Can do this in more detail later …
  3. Q for the audience: where to get such data? Maybe a demo on Czech (swear words, etc.)
  4. Q for the audience: what is the biggest challenge here? … detecting where speech is at all: VAD (voice activity detection)!
  5. It might be problematic to collect even these 50 speakers (if possible on different communication channels…)
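
Note 4 names voice activity detection (VAD) as the biggest challenge. The talk does not say which VAD is used; the sketch below is a deliberately simple energy-threshold detector, shown only to make the idea concrete. The frame length of 160 samples (10 ms at an assumed 16 kHz sampling rate), the 30 dB margin and both function names are assumptions for illustration.

```python
import math

def frame_energies_db(samples, frame_len):
    """Mean power of consecutive non-overlapping frames, in dB."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(s * s for s in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # floor avoids log(0)
    return energies

def energy_vad(samples, frame_len=160, margin_db=30.0):
    """Mark a frame as speech if it is within margin_db of the loudest frame."""
    energies = frame_energies_db(samples, frame_len)
    if not energies:
        return []
    threshold = max(energies) - margin_db
    return [e > threshold for e in energies]

if __name__ == "__main__":
    silence = [0.0] * 160
    loud = [0.5, -0.5] * 80
    print(energy_vad(silence + loud + silence))
```

A detector this simple is likely to fail on the noisy channels discussed on slide 32, which is presumably why the note singles VAD out as a challenge.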