SlideShare una empresa de Scribd logo
1 de 1
Descargar para leer sin conexión
David Grogan
dgrogan@stanford.edu
Phathaphol (Peter)
Karnchanapimonkul
phatk@stanford.edu
● We built a Speaker Identification network.
Input is a voice sample, output is who’s talking.
● Our motivation came from Alzheimer’s patients
who forget or can’t identify who is speaking in
group calls, e.g. family calls to Grandma from
one speaker phone.
● The solution extends to voice conference calls,
as seen on right.
● Many existing solutions perform Speaker
Identification only on clean audio. We use noisy
audio to better simulate real-world conditions.
● Our network is trained on 20 speakers, as that’s
roughly the max size of a family or work team.
Dataset
Introduction Algorithms and Models
● LibriVox and Internet Archive host
royalty-free audiobooks read by
volunteers.
● Mono-channel, 22050Hz mp3s; one
file per chapter.
● We split into 0.5 second training
examples.. 0.5 seconds seemed to be
the shortest time a human needs to
identify the speaker.
● Complication: Some audio books
have multiple speakers. Some
speakers read multiple audio books.
Augmenting with Noise
Results & Analysis
Experiments
● Audiobooks are clean recordings. We want background noise.
● 3 Background Noises: 1) Crowd Talking, 2) Laptop Keyboard, 3) Plastic Crumple
● Overlay ENTIRE audiobook with noise in 20-second chunks. Normalize volume to match.
Speaker Identification in a Noisy Environment
Raw Waveforms vs MFCC
INPUT
11025 waveform samples
Batch Normalization
CONV1D Filter 3, Stride 2, Depth 40
Leaky ReLU
Batch Normalization
CONV1D Filter 3, Stride 2, Depth 50
Leaky ReLU
Batch Normalization
CONV1D Filter 3, Stride 2, Depth 60
Leaky ReLU
Max Pool Size 2
Batch Normalization
CONV1D Filter 3, Stride 2, Depth 60
Leaky ReLU
Max Pool Size 2
Batch Normalization
CONV1D Filter 3, Stride 2, Depth 80
Leaky ReLU
Max Pool Size 2
Batch Normalization
CONV1D Filter 3, Stride 2, Depth 100
Leaky ReLU
Max Pool Size 2
Batch Normalization
Dense Size 40
Leaky ReLU
Dense Size 20
SoftMax
OUTPUT
20 classes
20 audiobooks from LibriVox
Final Architecture
Batch Normalization
CONV1D Filter 3, Stride 2, Depth 60
Leaky ReLU
Input comprises
11025 waveform
samples which
represents 0.5
seconds of data.
We used 7
convolutional
layers and 4
max-pooling
layers all of stride
2 to reduce the
input from size
11025 to size 5
before flattening
the channels to
pass to the dense
layers.
Each of the
output classes
corresponds to
one speaker.
● Model trained on clean data achieves 94% test accuracy on clean data.
● Clean model achieves a paltry 14% accuracy when making predictions on noisy data.
● Significant effort training models with MFCCs, which are generated from the raw
waveform via substantial signal preprocessing. Traditionally, most Speech Recognition and
Speaker Identification use MFCCs as input.
● The MFCC models achieved 88% accuracy on clean data, but only 63% on noisy data.
● TA advised that the preprocessing could be discarding valuable signal, so we abandoned
MFCCs and trained a bigger network using raw waveforms as input.
● Final model achieved 86% accuracy on the test set
● Accuracy was NOT uniform across speakers
References
Sainath, Tara N., et al. "Learning the speech front-end with raw waveform CLDNNs." Sixteenth Annual Conference of the International Speech
Communication Association. 2015.
Méndez, Alfredo, “Speaker Identification with Deep Neural Networks,” Stanford CS230 Past Projects – Winter 2019.
McLaren, Mitchell, and Yun Lei. "Improved speaker recognition using DCT coefficients as features." 2015 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP). IEEE, 2015.
Torfi, Amirsina, Jeremy Dawson, and Nasser M. Nasrabadi. "Text-independent speaker verification using 3d convolutional neural networks." 2018 IEEE
International Conference on Multimedia and Expo (ICME). IEEE, 2018.
“Crowd Talking 8.” MP3 file. http://www.soundjay.com/beep-sounds-1.html.
“Computer Keyboard 1.” WAV file. https://www.soundjay.com/communication-sounds.html.
“Plastic Crumble 1.” MP3 file. https://www.soundjay.com/misc-sounds-2.html.

Más contenido relacionado

La actualidad más candente

A Distributed System for Recognizing Home Automation Commands and Distress Ca...
A Distributed System for Recognizing Home Automation Commands and Distress Ca...A Distributed System for Recognizing Home Automation Commands and Distress Ca...
A Distributed System for Recognizing Home Automation Commands and Distress Ca...a3labdsp
 
A Survey on Speaker Recognition System
A Survey on Speaker Recognition SystemA Survey on Speaker Recognition System
A Survey on Speaker Recognition SystemVani011
 
Speaker recognition on matlab
Speaker recognition on matlabSpeaker recognition on matlab
Speaker recognition on matlabArcanjo Salazaku
 
Interactive voice conversion for augmented speech production
Interactive voice conversion for augmented speech productionInteractive voice conversion for augmented speech production
Interactive voice conversion for augmented speech productionNU_I_TODALAB
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognitionRichie
 
multirate signal processing for speech
multirate signal processing for speechmultirate signal processing for speech
multirate signal processing for speechRudra Prasad Maiti
 
LPC for Speech Recognition
LPC for Speech RecognitionLPC for Speech Recognition
LPC for Speech RecognitionDr. Uday Saikia
 
Digital speech within 125 hz bandwidth (DS-125)
Digital speech within 125 hz bandwidth (DS-125)Digital speech within 125 hz bandwidth (DS-125)
Digital speech within 125 hz bandwidth (DS-125)Michael Lebo
 
Multirate signal processing and decimation interpolation
Multirate signal processing and decimation interpolationMultirate signal processing and decimation interpolation
Multirate signal processing and decimation interpolationransherraj
 
Speech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderSpeech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderIJTET Journal
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By MatlabAnkit Gujrati
 
Digital modeling of speech signal
Digital modeling of speech signalDigital modeling of speech signal
Digital modeling of speech signalVinodhini
 
Homomorphic speech processing
Homomorphic speech processingHomomorphic speech processing
Homomorphic speech processingsivakumar m
 

La actualidad más candente (20)

Mini Project- Audio Enhancement
Mini Project-  Audio EnhancementMini Project-  Audio Enhancement
Mini Project- Audio Enhancement
 
A Distributed System for Recognizing Home Automation Commands and Distress Ca...
A Distributed System for Recognizing Home Automation Commands and Distress Ca...A Distributed System for Recognizing Home Automation Commands and Distress Ca...
A Distributed System for Recognizing Home Automation Commands and Distress Ca...
 
A Survey on Speaker Recognition System
A Survey on Speaker Recognition SystemA Survey on Speaker Recognition System
A Survey on Speaker Recognition System
 
Speaker recognition on matlab
Speaker recognition on matlabSpeaker recognition on matlab
Speaker recognition on matlab
 
Interactive voice conversion for augmented speech production
Interactive voice conversion for augmented speech productionInteractive voice conversion for augmented speech production
Interactive voice conversion for augmented speech production
 
Mini Project- Audio Enhancement
Mini Project- Audio EnhancementMini Project- Audio Enhancement
Mini Project- Audio Enhancement
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Linear Predictive Coding
Linear Predictive CodingLinear Predictive Coding
Linear Predictive Coding
 
multirate signal processing for speech
multirate signal processing for speechmultirate signal processing for speech
multirate signal processing for speech
 
SPEAKER VERIFICATION
SPEAKER VERIFICATIONSPEAKER VERIFICATION
SPEAKER VERIFICATION
 
LPC for Speech Recognition
LPC for Speech RecognitionLPC for Speech Recognition
LPC for Speech Recognition
 
Digital speech within 125 hz bandwidth (DS-125)
Digital speech within 125 hz bandwidth (DS-125)Digital speech within 125 hz bandwidth (DS-125)
Digital speech within 125 hz bandwidth (DS-125)
 
Multirate signal processing and decimation interpolation
Multirate signal processing and decimation interpolationMultirate signal processing and decimation interpolation
Multirate signal processing and decimation interpolation
 
Speech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderSpeech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using Vocoder
 
lpc and horn noise detection
lpc and horn noise detectionlpc and horn noise detection
lpc and horn noise detection
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By Matlab
 
Digital modeling of speech signal
Digital modeling of speech signalDigital modeling of speech signal
Digital modeling of speech signal
 
Sub band project
Sub band projectSub band project
Sub band project
 
Homomorphic speech processing
Homomorphic speech processingHomomorphic speech processing
Homomorphic speech processing
 
Speech encoding techniques
Speech encoding techniquesSpeech encoding techniques
Speech encoding techniques
 

Similar a CS 230 poster

Audio Inpainting with D2WGAN.pdf
Audio Inpainting with D2WGAN.pdfAudio Inpainting with D2WGAN.pdf
Audio Inpainting with D2WGAN.pdfssuser849b73
 
TomW_Radio-Romania-lecture
TomW_Radio-Romania-lectureTomW_Radio-Romania-lecture
TomW_Radio-Romania-lectureTom Williams
 
Konversa.docx - konversa.googlecode.com
Konversa.docx - konversa.googlecode.comKonversa.docx - konversa.googlecode.com
Konversa.docx - konversa.googlecode.combutest
 
Dante Audio Networking Fundamentals
Dante Audio Networking FundamentalsDante Audio Networking Fundamentals
Dante Audio Networking FundamentalsrAVe [PUBS]
 
ETE405-lec8.pptx
ETE405-lec8.pptxETE405-lec8.pptx
ETE405-lec8.pptxmashiur
 
Audio Compression_2023.pptx
Audio Compression_2023.pptxAudio Compression_2023.pptx
Audio Compression_2023.pptxzulhelmanz
 
Novel Approach of Implementing Psychoacoustic model for MPEG-1 Audio
Novel Approach of Implementing Psychoacoustic model for MPEG-1 AudioNovel Approach of Implementing Psychoacoustic model for MPEG-1 Audio
Novel Approach of Implementing Psychoacoustic model for MPEG-1 Audioinventy
 
ECE462 CommunicationIILecture1.pdf
ECE462 CommunicationIILecture1.pdfECE462 CommunicationIILecture1.pdf
ECE462 CommunicationIILecture1.pdfSayedHassan84
 
An Evolutionary Approach to Speech Quality Estimation
An Evolutionary Approach to Speech Quality EstimationAn Evolutionary Approach to Speech Quality Estimation
An Evolutionary Approach to Speech Quality Estimationadil raja
 
An Evolutionary Approach to Speech Quality Estimation Using Genetic Programming
An Evolutionary Approach to Speech Quality Estimation Using Genetic ProgrammingAn Evolutionary Approach to Speech Quality Estimation Using Genetic Programming
An Evolutionary Approach to Speech Quality Estimation Using Genetic Programmingadil raja
 
Digital Watermarking Of Audio Signals.pptx
Digital Watermarking Of Audio Signals.pptxDigital Watermarking Of Audio Signals.pptx
Digital Watermarking Of Audio Signals.pptxAyushJaiswal781174
 
The analog to digital conversion process
The analog to digital conversion processThe analog to digital conversion process
The analog to digital conversion processDJNila
 
GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
GENDER RECOGNITION SYSTEM USING SPEECH SIGNALGENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
GENDER RECOGNITION SYSTEM USING SPEECH SIGNALIJCSEIT Journal
 

Similar a CS 230 poster (20)

Digital audio
Digital audioDigital audio
Digital audio
 
Audio Inpainting with D2WGAN.pdf
Audio Inpainting with D2WGAN.pdfAudio Inpainting with D2WGAN.pdf
Audio Inpainting with D2WGAN.pdf
 
Mp3
Mp3Mp3
Mp3
 
TomW_Radio-Romania-lecture
TomW_Radio-Romania-lectureTomW_Radio-Romania-lecture
TomW_Radio-Romania-lecture
 
Week two a d conversion
Week two a d conversionWeek two a d conversion
Week two a d conversion
 
Konversa.docx - konversa.googlecode.com
Konversa.docx - konversa.googlecode.comKonversa.docx - konversa.googlecode.com
Konversa.docx - konversa.googlecode.com
 
SPEECH CODING
SPEECH CODINGSPEECH CODING
SPEECH CODING
 
Dante Audio Networking Fundamentals
Dante Audio Networking FundamentalsDante Audio Networking Fundamentals
Dante Audio Networking Fundamentals
 
ETE405-lec8.pptx
ETE405-lec8.pptxETE405-lec8.pptx
ETE405-lec8.pptx
 
Audio Compression_2023.pptx
Audio Compression_2023.pptxAudio Compression_2023.pptx
Audio Compression_2023.pptx
 
Novel Approach of Implementing Psychoacoustic model for MPEG-1 Audio
Novel Approach of Implementing Psychoacoustic model for MPEG-1 AudioNovel Approach of Implementing Psychoacoustic model for MPEG-1 Audio
Novel Approach of Implementing Psychoacoustic model for MPEG-1 Audio
 
Solo solution
Solo solutionSolo solution
Solo solution
 
ECE462 CommunicationIILecture1.pdf
ECE462 CommunicationIILecture1.pdfECE462 CommunicationIILecture1.pdf
ECE462 CommunicationIILecture1.pdf
 
An Evolutionary Approach to Speech Quality Estimation
An Evolutionary Approach to Speech Quality EstimationAn Evolutionary Approach to Speech Quality Estimation
An Evolutionary Approach to Speech Quality Estimation
 
An Evolutionary Approach to Speech Quality Estimation Using Genetic Programming
An Evolutionary Approach to Speech Quality Estimation Using Genetic ProgrammingAn Evolutionary Approach to Speech Quality Estimation Using Genetic Programming
An Evolutionary Approach to Speech Quality Estimation Using Genetic Programming
 
Digital Watermarking Of Audio Signals.pptx
Digital Watermarking Of Audio Signals.pptxDigital Watermarking Of Audio Signals.pptx
Digital Watermarking Of Audio Signals.pptx
 
The analog to digital conversion process
The analog to digital conversion processThe analog to digital conversion process
The analog to digital conversion process
 
3 Digital Audio
3  Digital  Audio3  Digital  Audio
3 Digital Audio
 
3 Digital Audio
3 Digital Audio3 Digital Audio
3 Digital Audio
 
GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
GENDER RECOGNITION SYSTEM USING SPEECH SIGNALGENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
GENDER RECOGNITION SYSTEM USING SPEECH SIGNAL
 

Último

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 

Último (20)

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 

CS 230 poster

  • 1. David Grogan dgrogan@stanford.edu Phathaphol (Peter) Karnchanapimonkul phatk@stanford.edu ● We built a Speaker Identification network. Input is a voice sample, output is who’s talking. ● Our motivation came from Alzheimer’s patients who forget or can’t identify who is speaking in group calls, e.g. family calls to Grandma from one speaker phone. ● The solution extends to voice conference calls, as seen on right. ● Many existing solutions perform Speaker Identification only on clean audio. We use noisy audio to better simulate real-world conditions. ● Our network is trained on 20 speakers, as that’s roughly the max size of a family or work team. Dataset Introduction Algorithms and Models ● LibriVox and Internet Archive host royalty-free audiobooks read by volunteers. ● Mono-channel, 22050Hz mp3s; one file per chapter. ● We split into 0.5 second training examples.. 0.5 seconds seemed to be the shortest time a human needs to identify the speaker. ● Complication: Some audio books have multiple speakers. Some speakers read multiple audio books. Augmenting with Noise Results & Analysis Experiments ● Audiobooks are clean recordings. We want background noise. ● 3 Background Noises: 1) Crowd Talking, 2) Laptop Keyboard, 3) Plastic Crumple ● Overlay ENTIRE audiobook with noise in 20-second chunks. Normalize volume to match. Speaker Identification in a Noisy Environment Raw Waveforms vs MFCC INPUT 11025 waveform samples Batch Normalization CONV1D Filter 3, Stride 2, Depth 40 Leaky ReLU Batch Normalization CONV1D Filter 3, Stride 2, Depth 50 Leaky ReLU Batch Normalization CONV1D Filter 3, Stride 2, Depth 60 Leaky ReLU Max Pool Size 2 Batch Normalization CONV1D Filter 3, Stride 2, Depth 60 Leaky ReLU Max Pool Size 2 Batch Normalization CONV1D Filter 3, Stride 2, Depth 80 Leaky ReLU Max Pool Size 2 Batch Normalization CONV1D Filter 3, Stride 2, Depth 100 Leaky ReLU Max Pool Size 2 Batch Normalization Dense Size 40 Leaky ReLU Dense Size 20 SoftMax OUTPUT 20 classes 20 audiobooks from LibriVox Final Architecture Batch Normalization CONV1D Filter 3, Stride 2, Depth 60 Leaky ReLU Input comprises 11025 waveform samples which represents 0.5 seconds of data. We used 7 convolutional layers and 4 max-pooling layers all of stride 2 to reduce the input from size 11025 to size 5 before flattening the channels to pass to the dense layers. Each of the output classes corresponds to one speaker. ● Model trained on clean data achieves 94% test accuracy on clean data. ● Clean model achieves a paltry 14% accuracy when making predictions on noisy data. ● Significant effort training models with MFCCs, which are generated from the raw waveform via substantial signal preprocessing. Traditionally, most Speech Recognition and Speaker Identification use MFCCs as input. ● The MFCC models achieved 88% accuracy on clean data, but only 63% on noisy data. ● TA advised that the preprocessing could be discarding valuable signal, so we abandoned MFCCs and trained a bigger network using raw waveforms as input. ● Final model achieved 86% accuracy on the test set ● Accuracy was NOT uniform across speakers References Sainath, Tara N., et al. "Learning the speech front-end with raw waveform CLDNNs." Sixteenth Annual Conference of the International Speech Communication Association. 2015. Méndez, Alfredo, “Speaker Identification with Deep Neural Networks,” Stanford CS230 Past Projects – Winter 2019. McLaren, Mitchell, and Yun Lei. "Improved speaker recognition using DCT coefficients as features." 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015. Torfi, Amirsina, Jeremy Dawson, and Nasser M. Nasrabadi. "Text-independent speaker verification using 3d convolutional neural networks." 2018 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2018. “Crowd Talking 8.” MP3 file. http://www.soundjay.com/beep-sounds-1.html. “Computer Keyboard 1.” WAV file. https://www.soundjay.com/communication-sounds.html. “Plastic Crumble 1.” MP3 file. https://www.soundjay.com/misc-sounds-2.html.