Se ha denunciado esta presentación.
Utilizamos tu perfil de LinkedIn y tus datos de actividad para personalizar los anuncios y mostrarte publicidad más relevante. Puedes cambiar tus preferencias de publicidad en cualquier momento.
Voice and Identity: applications and
limitations of the voice as a biometric
Colleen Kavanagh, Paul Foulkes, Peter French,...
Outline
1. voice comparison
2. automatic speaker recognition (ASR)
– principles
– theoretical limitations
– practical limi...
1. Voice comparison
• comparison of voice in criminal recording with voice
in recording of known suspect
• assist court wi...
2. Automatic speaker recognition (ASR)
• voice of questioned (criminal) speaker is put into
system
• voice of suspect also...
2.1 ASR: principles
• system mathematically reduces
speech samples to statistical
models reflecting vocal tract
geometry
•...
2.1 ASR: principles
• ‘distance’ between
criminal voice and
suspect model  score
for similarity
• ‘distance’ between
crim...
2.2 ASR limitations: theoretical and
practical
2.2.1 theoretical problems with ASR
• inherent limitations of underlying assumption of
ASR: the vocal tract as a biometric...
2.2.1 theoretical problems with ASR
interspeaker differences
• Relatively small differences
between vocal tracts of
indivi...
2.2.1 theoretical problems with ASR
indirectness
• we do not examine the
physical vocal tract directly:
contours, surfaces...
2.3 practical problems with ASR
• UK: Regulation of Investigatory Powers Act (2000)
prohibits use of intercepts (phone tap...
• example: French & Harrison (2010)
– 767 trials using real forensic case data with known outcomes
– EER = equal error rat...
2.4 summary
• ASR has great advantages
– speed, replicability, very good performance in experiments...
• but several inher...
2.4 summary
• voice is therefore different from most other
biometrics: it is much less fixed, much more subject
to within-...
3. Contributions to forensic speaker
comparison from phonetics and linguistics
3. Contributions from phonetics
• ASR as one component in a broader approach
– avoid dependence on a single type of metric...
3.1 componential phon-ling analysis
• application of standard, largely uncontroversial,
analytic techniques from phonetics...
3.1 componential phon-ling analysis
• syntax/grammar
– e.g. I did it ~ I done it
• morphology (word-structure)
– e.g. twen...
• numerous components for analysis
– French et al (2011), Foulkes & French (2012)
19
feature notes
Vowels English: 24 Vs; ...
• numerous components for analysis
– French et al (2011), Foulkes & French (2012)
20
feature notes
Articulation rate speed...
3.1 componential phon-ling analysis
• some general advantages
– many components robust to channel mismatch
– can derive ri...
3.2 ‘Cooper’ case example
• case referred to as ‘Cooper’ in Foulkes &
French (2012: 564-5)
• theft at care home
• 4 second...
3.2 ‘Cooper’ case example
example: QS (4 seconds)
I’ve come to see the lady at number two
av kʰʊm tsiː ʔ ɫɛɪd jəʔ nʊmbə ts...
3.2 ‘Cooper’ case example
(some) observable features:
• general Yorkshire accent
• PRICE reduced to monophthong [a] (in bo...
4. York research
4.1 Voice and Identity: source, filter, biometric
• overriding aims: ASR seen as useful addition to other
components withi...
4.1 Voice and Identity: source, filter, biometric
• Comparison of 3 methods of analysis of vocal tract
output using DyViS ...
LTFD: acoustic analysis of vowels
– vowels characterised by configuration of constituent
resonances – formants
28
i ɑ u
LTFD: acoustic analysis of vowels
– vowels characterised by configuration of constituent
resonances – formants
29
i ɑ u
F1...
• vowels extracted from running speech
• LTFD: long-term formant distribution
30
Voice quality: Vocal Profile Analysis (VPA)
• Modified VPA protocol
(Laver et al 1981; Stevens
& French 2012)
31
Denasal N...
4.1 Voice and Identity: source, filter, biometric
• no method perfect, but methods make different
mistakes
#067 #072
• str...
4.2 summary
• voice is not like other biometrics (e.g. fingerprints, DNA), but
its problems are not insurmountable
• voice...
thank you
colleen.kavanagh@york.ac.uk
paul.foulkes@york.ac.uk
Próxima SlideShare
Cargando en…5
×
Próxima SlideShare
What to Upload to SlideShare
Siguiente
Descargar para leer sin conexión y ver en pantalla completa.

1

Compartir

Descargar para leer sin conexión

Voice and identity: applications and limitations of the voice as a biometric

Descargar para leer sin conexión

Kavanagh, C., Foulkes, P., French, J. P., Hughes, V., Harrison, P. and San Segundo, E. (2017) Voice and identity: applications and limitations of the voice as a biometric. Centre for Research and Evidence on Security Threats (CREST) Workshop: Current state of the art, and future directions for linguistic analysis in a security context, London, UK. 2 October 2017.

Libros relacionados

Gratis con una prueba de 30 días de Scribd

Ver todo

Audiolibros relacionados

Gratis con una prueba de 30 días de Scribd

Ver todo

Voice and identity: applications and limitations of the voice as a biometric

  1. 1. Voice and Identity: applications and limitations of the voice as a biometric Colleen Kavanagh, Paul Foulkes, Peter French, Vincent Hughes, Philip Harrison, & Eugenia San Segundo University of York, J P French Associates, & Royal Canadian Mounted Police 2 October 2017
  2. 2. Outline 1. voice comparison 2. automatic speaker recognition (ASR) – principles – theoretical limitations – practical limitations 3. phonetics and linguistics 4. ongoing research at York 2
  3. 3. 1. Voice comparison • comparison of voice in criminal recording with voice in recording of known suspect • assist court with determining identity/non-identity of suspect and criminal 3
  4. 4. 2. Automatic speaker recognition (ASR) • voice of questioned (criminal) speaker is put into system • voice of suspect also put into system • database of voices: reference population is put into system 4
  5. 5. 2.1 ASR: principles • system mathematically reduces speech samples to statistical models reflecting vocal tract geometry • 1 model for suspect • models for reference population • voice of criminal is compared with model for suspect and for reference population models 5 http://parole.loria.fr
  6. 6. 2.1 ASR: principles • ‘distance’ between criminal voice and suspect model  score for similarity • ‘distance’ between criminal voice and reference population models  score for typicality • high similarity with suspect and low typicality  criminal voice more likely to be suspect • low similarity with suspect and high typicality  criminal voice more likely to be different person 6
  7. 7. 2.2 ASR limitations: theoretical and practical
  8. 8. 2.2.1 theoretical problems with ASR • inherent limitations of underlying assumption of ASR: the vocal tract as a biometric • vocal tract is indeed probably unique to each speaker • but potential of vocal tract as a biometric is limited by: – small differences between speakers – plasticity – indirectness – exogenous influences 8
  9. 9. 2.2.1 theoretical problems with ASR interspeaker differences • Relatively small differences between vocal tracts of individuals • abundant evidence from studies in anatomy & physiology plasticity of the VT • short-term plasticity – in the course of speaking • long-term plasticity – linguistic socialisation, e.g. voice quality associations by dialect/community of practice – ageing: • puberty: larynx descent in males • old age: calcification of cartilages, atrophy of muscles  whispery phonation, limitation to vocal movement 9
  10. 10. 2.2.1 theoretical problems with ASR indirectness • we do not examine the physical vocal tract directly: contours, surfaces, volume, cross-section... • instead, we examine its acoustic output to infer facts about the vocal tract • cf. inferring the shape and size of a gun barrel from the acoustics of a recorded shot exogenous influences • health • intoxicants • physical environment • time of day 10
  11. 11. 2.3 practical problems with ASR • UK: Regulation of Investigatory Powers Act (2000) prohibits use of intercepts (phone taps) as evidence • recordings are therefore usually degraded in quality – channel mismatch (e.g. phone versus direct) – recording media (mobile phone, CCTV, poor technology...) – acoustic environments (traffic, noise, distance from mic...) • these problems affect ASR performance markedly... 11
  12. 12. • example: French & Harrison (2010) – 767 trials using real forensic case data with known outcomes – EER = equal error rate (classifies SS as DS; DS as SS) • to achieve EER of 5%, 78% of cases must be rejected as unsuitable for analysis • is error rate of 5% even acceptable – particularly where the errors are false hits? 12 adequacy rating (given by Batvox system) # trials EER OK 171/767 (22%) 5.4 % OK + Warning 369/767 (48%) 15.1 % All 767/767 (100%) 24.2 %
  13. 13. 2.4 summary • ASR has great advantages – speed, replicability, very good performance in experiments... • but several inherent limitations of vocal tract acoustic output as a biometric – relative lack of variability across individuals – high variability within individuals – these factors yield greater overlap between speakers • does vocal tract output alone really have the potential to discriminate a population of e.g. 16m adult Caucasian males in England? 13
  14. 14. 2.4 summary • voice is therefore different from most other biometrics: it is much less fixed, much more subject to within-individual variation • but sources of variation in speech and language are relatively well understood by linguists, phoneticians, dialectologists, sociolinguists – patterns are usually principled not random • we can exploit knowledge of these patterns to assist in forensic speaker comparison 14
  15. 15. 3. Contributions to forensic speaker comparison from phonetics and linguistics
  16. 16. 3. Contributions from phonetics • ASR as one component in a broader approach – avoid dependence on a single type of metric – seek alternative features for analysis to circumvent inherent problems in ASR – stronger evidence where multiple lines of enquiry (independent features) yield consistent conclusions • incorporate componential phonetic-linguistic analysis 16
  17. 17. 3.1 componential phon-ling analysis • application of standard, largely uncontroversial, analytic techniques from phonetics & linguistics • views speech signal as complex & divisible, composed of (semi-)independent elements, vs. holistic approach of automatic systems 17
  18. 18. 3.1 componential phon-ling analysis • syntax/grammar – e.g. I did it ~ I done it • morphology (word-structure) – e.g. twenty-five pounds ~ twenty-five pound • lexical choices – e.g. twenty-five pounds ~ twenty-five quid ~ pony • phonology (sound system) – e.g. distinction of look/luck, which/witch • phonetics/acoustics (pronunciation) – e.g. /t/ variation: get off with [t – d - ʔ – r] 18
  19. 19. • numerous components for analysis – French et al (2011), Foulkes & French (2012) 19 feature notes Vowels English: 24 Vs; different patterns for specific phonological environments; acoustic features (formant centre frequencies, densities, bandwidths), sociolinguistic variables... Consonants English: 20 Cs; different patterns for specific phonological environments; energy loci of fricatives and stop bursts; segment durations inc. VOT; sociolinguistic variables... Vocal setting Laver VPA scheme: 38 separate elements Intonation contours constrained by phonology & discourse Pitch mean, range, s.d. ...
  20. 20. • numerous components for analysis – French et al (2011), Foulkes & French (2012) 20 feature notes Articulation rate speed of speech Rhythm Tone for languages with contrastive tone Connected speech processes assimilation, elision... Discourse/ Pragmatics discourse markers, turn-taking, telephone openings, code switching... Non-linguistic audible breathing, throat-clearing, tongue clicking, filled and silent hesitation phenomena...
  21. 21. 3.1 componential phon-ling analysis • some general advantages – many components robust to channel mismatch – can derive rich information from short samples – concrete reference: easily expressed in court – independence of features: increases depth of analysis, multiple evidence types in combination 21
  22. 22. 3.2 ‘Cooper’ case example • case referred to as ‘Cooper’ in Foulkes & French (2012: 564-5) • theft at care home • 4 seconds of speech recorded via intercom system • suspect read version of text 22
  23. 23. 3.2 ‘Cooper’ case example example: QS (4 seconds) I’ve come to see the lady at number two av kʰʊm tsiː ʔ ɫɛɪd jəʔ nʊmbə tsuwːː (I’m fro)m the Home Care I’ve come to collect her sheet mʔ oʊm kʰɛːɹ av kʰʊm tə kʰəlɛkt ə ʃiːːʔ 23
  24. 24. 3.2 ‘Cooper’ case example (some) observable features: • general Yorkshire accent • PRICE reduced to monophthong [a] (in both instances of I’ve) • STRUT = typical northern English /ʊ/ (come, number) • schwa fully elided (to, collect) • /t/ = glottal stop in word-final position (at, sheet) • /l/ is ‘dark’ in syllable-onset positions (lady, collect) • despite Yorks accent, FACE & GOAT = diphthongs (lady, Home) • GOOSE and FLEECE are not monophthongal (two, sheet) • final syllable in each speaking turn markedly elongated • definite article = local northern form [ʔ] • /h/ is deleted (Home, her) • the speaker is not rhotic but uses linking /r/ (Care I’ve) 24
  25. 25. 4. York research
  26. 26. 4.1 Voice and Identity: source, filter, biometric • overriding aims: ASR seen as useful addition to other components within componential approach – test & strengthen all components in broader approach 1. what are best speaker-discriminating components? 2. robustness of components to exogenous factors 3. to show for the first time what the relationship is between what the ASR (black box) measures and features linguistic phoneticians examine 26
  27. 27. 4.1 Voice and Identity: source, filter, biometric • Comparison of 3 methods of analysis of vocal tract output using DyViS data (Nolan et al., 2009): • Automatic: MFCCs (Batvox) • Semi-automatic: LTFDs • acoustic phonetic • Phonetic: supralaryngeal voice quality (VPA) • auditory phonetic 27
  28. 28. LTFD: acoustic analysis of vowels – vowels characterised by configuration of constituent resonances – formants 28 i ɑ u
  29. 29. LTFD: acoustic analysis of vowels – vowels characterised by configuration of constituent resonances – formants 29 i ɑ u F1 F2
  30. 30. • vowels extracted from running speech • LTFD: long-term formant distribution 30
  31. 31. Voice quality: Vocal Profile Analysis (VPA) • Modified VPA protocol (Laver et al 1981; Stevens & French 2012) 31 Denasal Nasal Fronted tongue Pharyngeal constriction
  32. 32. 4.1 Voice and Identity: source, filter, biometric • no method perfect, but methods make different mistakes #067 #072 • strong argument for combining methods 32
  33. 33. 4.2 summary • voice is not like other biometrics (e.g. fingerprints, DNA), but its problems are not insurmountable • voice has considerable potential as evidence • overarching aim to move towards unified theoretical position on speaker characterisation • best means forward is tried and tested phonetic/acoustic methods in combination with ASR 33
  34. 34. thank you colleen.kavanagh@york.ac.uk paul.foulkes@york.ac.uk
  • FannyCarlstrmPlaza

    Sep. 24, 2018

Kavanagh, C., Foulkes, P., French, J. P., Hughes, V., Harrison, P. and San Segundo, E. (2017) Voice and identity: applications and limitations of the voice as a biometric. Centre for Research and Evidence on Security Threats (CREST) Workshop: Current state of the art, and future directions for linguistic analysis in a security context, London, UK. 2 October 2017.

Vistas

Total de vistas

798

En Slideshare

0

De embebidos

0

Número de embebidos

0

Acciones

Descargas

26

Compartidos

0

Comentarios

0

Me gusta

1

×