A Study on the Video Scene
Retrieving System
with a Speech Recognizer
2013. 5. 14
Yoshika OSAWA
Kohno Lab.
Outline
1. Introduction
2. Aim of Study
3. Composition of System
i. Voice Divide Section
ii. Speech Recognize Section
iii. Scene Retrieve Section
4. Evaluation Experiment
5. Conclusion
1. Introduction
• With advances in the Internet, a wide variety of
video data is being generated, stored, and accessed.
• An efficient technique is needed to find a specific
video scene in this data quickly.
1. Introduction
• Multimedia Annotations
o Nagao (2001)
1. Introduction
• A Subtitling System for Broadcast
Programs with a Speech Recognizer
o Ando et al. (2001)
1. Introduction
• Extract the voice track from the video.
• Advantages of using voice:
o Easy to convert into text.
o Simple to associate text with scenes.
Apply speech recognition to scene retrieval.
2. Aim of Study
Implement a scene retrieval system, then verify
its accuracy and check that it operates correctly.
Generate annotations automatically using speech
recognition.
3. Composition of System
Start
→ Select a video
→ Voice Divide Section
→ Speech Recognize Section
→ Input a keyword
→ Scene Retrieve Section
→ Output the result
→ End
i. Voice Divide Section
• Focus on the amplitude
o Use only the parts of the signal where the
amplitude exceeds a threshold value.
o Reject segments that are too short, because
they cannot be recognized.
o Thresholds were derived experimentally:
Axis        Threshold
Amplitude   10 [%]
Time        1000 [ms]
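A minimal sketch of amplitude-based voice division, under assumptions the slide does not spell out: the 10% threshold is taken relative to the peak amplitude of the recording, the amplitude is tracked as the peak of each 10 ms frame (thresholding individual samples would cut speech at every zero crossing), and 1000 ms is treated as the minimum length of a kept segment. `divide_voice` and its parameters are hypothetical names.

```python
import numpy as np


def divide_voice(signal, sample_rate, amp_ratio=0.10, min_len_ms=1000, frame_ms=10):
    """Split a recording into voice segments by amplitude thresholding.

    Assumptions (not from the study): the 10 % threshold is relative to the peak
    amplitude of the whole recording, amplitude is tracked per frame_ms frame, and
    segments shorter than min_len_ms are rejected as too short to recognize.
    Returns a list of (start_sample, end_sample) pairs, end exclusive.
    """
    x = np.abs(np.asarray(signal, dtype=float))
    peak = x.max() if x.size else 0.0
    if peak == 0.0:
        return []
    frame = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = -(-len(x) // frame)                          # ceiling division
    padded = np.zeros(n_frames * frame)
    padded[: len(x)] = x
    envelope = padded.reshape(n_frames, frame).max(axis=1)  # peak amplitude per frame
    active = envelope > amp_ratio * peak
    # Frame indices where the active/inactive state changes.
    mask = np.concatenate(([False], active, [False]))
    edges = np.flatnonzero(mask[1:] != mask[:-1])
    segments = []
    for first, stop in zip(edges[0::2], edges[1::2]):
        start, end = first * frame, min(stop * frame, len(x))
        # Keep the segment only if it lasts at least min_len_ms.
        if (end - start) * 1000 >= min_len_ms * sample_rate:
            segments.append((int(start), int(end)))
    return segments
```

With a 16 kHz recording, a 1.5 s utterance surrounded by silence is kept as one segment, while a 0.3 s burst is discarded as too short.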
ii. Speech Recognize Section
(1) Pre-Processing Unit
• Digitization
o Sampling frequency: 16 kHz
o Quantization: 16 bit
• Noise reduction
o Additive noise: subtract the noise estimated
from silent intervals
o Multiplicative noise: subtract in the log domain
Figure: microphone characteristics of the SM57
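The slide names the two noise-reduction steps only briefly. A minimal sketch, assuming that "additive" refers to spectral subtraction of a noise spectrum averaged over silent frames, and that "multiplicative" refers to cepstral mean subtraction: a fixed channel such as the microphone response multiplies the spectrum, so it becomes an additive offset in the log (cepstral) domain. The function names and the 1% spectral floor are illustrative assumptions, not settings from the study; `pcm16_to_float` shows how 16-bit quantized samples map to floating point.

```python
import numpy as np


def pcm16_to_float(samples):
    """Map 16-bit quantized samples to floats in [-1, 1)."""
    return np.asarray(samples, dtype=np.int16).astype(np.float64) / 32768.0


def spectral_subtraction(frames_mag, noise_mag, floor=0.01):
    """Remove additive noise from magnitude spectra.

    frames_mag: (T, n_bins) magnitude spectra of the frames to clean
    noise_mag:  (n_bins,) average magnitude spectrum of silent (non-speech) frames
    Results below floor * original magnitude are clamped to that value.
    """
    cleaned = frames_mag - noise_mag[np.newaxis, :]
    return np.maximum(cleaned, floor * frames_mag)


def cepstral_mean_subtraction(cepstra):
    """Remove a fixed channel (e.g. microphone) response.

    A multiplicative factor in the spectrum becomes an additive constant in the
    log (cepstral) domain, so subtracting the per-coefficient mean over time removes it.
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```

Because the channel term is constant over time, adding the same offset to every frame leaves the output of `cepstral_mean_subtraction` unchanged.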
(2) Feature Extraction Unit
Resonant frequencies are effective as feature values.
• Resolution of human hearing
o Higher sensitivity at lower frequencies
• Filters matched to human hearing:
the mel-frequency scale
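The mel scale can be made concrete with a short sketch. It uses the common form m = 2595 log10(1 + f/700); the study does not state which mel variant or how many filters it used, so the triangular-filterbank layout and the parameters in the usage note are assumptions. The equal mel spacing makes the filters narrow at low frequencies and wide at high frequencies, mirroring the higher resolution of hearing at low frequencies.

```python
import numpy as np


def hz_to_mel(f_hz):
    """Mel value for a frequency in Hz (2595 * log10(1 + f/700) variant)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel axis.

    Returns an array of shape (n_filters, n_fft // 2 + 1) to apply to a power spectrum.
    """
    n_bins = n_fft // 2 + 1
    bin_hz = np.linspace(0.0, sample_rate / 2.0, n_bins)
    edges_mel = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    edges_hz = mel_to_hz(edges_mel)
    fbank = np.zeros((n_filters, n_bins))
    for m in range(n_filters):
        left, center, right = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        rising = (bin_hz - left) / (center - left)     # left slope of the triangle
        falling = (right - bin_hz) / (right - center)  # right slope of the triangle
        fbank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fbank
```

For example, `mel_filterbank(24, 512, 16000)` builds 24 filters for a 512-point FFT at the 16 kHz sampling rate used here; the last filter covers far more FFT bins than the first.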
• Inverse Fourier transform of the log spectrum on the
mel-frequency axis
o New axis: cepstrum
o Separates the voice pitch from the resonance
frequencies
• MFCC (Mel-Frequency Cepstral Coefficients)
o Information on vowels
• ΔMFCC (time derivative of MFCC)
o Information on consonants
• Feature vector
o (Average power, MFCC, ΔMFCC)
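The feature vector above can be sketched with NumPy, assuming a matrix of log mel-filterbank energies is already available (one row per frame). In MFCC computation the inverse transform onto the cepstral axis is usually implemented as a DCT-II of the log mel energies, which is what `dct_ii` does. Keeping coefficients 1 to 12 (dropping c0 because average power is a separate element) and the ±2-frame regression window for ΔMFCC are common choices assumed here, not settings reported for this system; all function names are hypothetical.

```python
import numpy as np


def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs outputs."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))  # shape (n_coeffs, n)
    scale = np.full(n_coeffs, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def deltas(features, width=2):
    """Regression estimate of the time derivative over +/- width frames.

    Edge frames are handled by repeating the first/last frame.
    """
    T = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros(features.shape, dtype=float)
    for n in range(1, width + 1):
        out += n * (padded[width + n:width + n + T] - padded[width - n:width - n + T])
    return out / denom


def feature_vectors(log_mel, log_power, n_mfcc=12):
    """Stack (average power, MFCC, delta-MFCC) for each frame.

    log_mel:   (T, n_filters) log mel-filterbank energies
    log_power: (T,) log average power of each frame
    """
    mfcc = dct_ii(log_mel, n_mfcc + 1)[:, 1:]  # drop c0; power is a separate element
    return np.hstack([log_power[:, None], mfcc, deltas(mfcc)])
```

With 12 MFCCs, each frame yields a 25-dimensional vector: 1 power value, 12 MFCCs, and 12 ΔMFCCs.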
(3) Identification Unit
From Bayes' theorem
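The equation on this slide did not survive extraction. The standard statistical formulation of speech recognition, which the slide most likely refers to, chooses the word sequence $\hat{W}$ that is most probable given the observed feature-vector sequence $X$. Because $P(X)$ does not depend on $W$, it can be dropped from the maximization. Here $P(X \mid W)$ is the acoustic model (the HMMs) and $P(W)$ is the language model.

```latex
\hat{W} = \arg\max_{W} P(W \mid X)
        = \arg\max_{W} \frac{P(X \mid W)\,P(W)}{P(X)}
        = \arg\max_{W} P(X \mid W)\,P(W)
```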
Speech waveform: observable
Character information: not directly observable
Estimate the character information from the
waveform using HMMs (Hidden Markov Models).
Maximum-likelihood calculation: Viterbi algorithm
Training (machine learning): Baum-Welch algorithm
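A compact log-domain Viterbi decoder illustrates the maximum-likelihood search named above. It assumes the per-frame emission log-likelihoods are already computed; in a GMM-HMM recognizer they would come from Gaussian mixture densities over the feature vectors. The two-state model in the usage note is the textbook toy example, not a speech model, and Baum-Welch training is not shown.

```python
import numpy as np


def viterbi(log_pi, log_A, log_B):
    """Most likely HMM state sequence, computed in the log domain.

    log_pi: (N,) initial state log-probabilities
    log_A:  (N, N) transition log-probabilities, log_A[i, j] = log P(j | i)
    log_B:  (T, N) emission log-likelihood of each frame in each state
    Returns (best_path, log_probability_of_best_path).
    """
    T, N = log_B.shape
    delta = log_pi + log_B[0]
    backptr = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A          # scores[i, j]: best path ending in i, then i -> j
        backptr[t] = np.argmax(scores, axis=0)   # best predecessor of each state j
        delta = scores[backptr[t], np.arange(N)] + log_B[t]
    # Trace the best path backwards from the best final state.
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    path.reverse()
    return path, float(np.max(delta))
```

With start probabilities (0.6, 0.4), transitions ((0.7, 0.3), (0.4, 0.6)), emission rows (0.5, 0.4, 0.1) and (0.1, 0.3, 0.6), and observations 0, 1, 2, the decoder returns the path [0, 0, 1] with probability 0.01512. Working in log probabilities avoids numerical underflow over long utterances.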
iii. Scene Retrieve Section
• Match the keyword against the recognized text
1. Input a keyword.
2. Find the keyword in the text by string searching.
3. Extract the scene in which the keyword was spoken.
4. Output a thumbnail of the scene.
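A minimal sketch of the retrieval step, assuming the recognizer output is stored as time-stamped segments, one per divided voice segment. `Segment`, `retrieve_scenes`, and `seek_positions` are hypothetical names. The string search is Python's substring test, and the returned start time marks where a thumbnail would be taken and playback would begin.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One divided voice segment and the text recognized from it (hypothetical structure)."""
    start_sec: float
    end_sec: float
    text: str


def retrieve_scenes(segments, keyword):
    """Return the segments whose recognized text contains the keyword."""
    return [seg for seg in segments if keyword in seg.text]


def seek_positions(segments, keyword):
    """Start times (seconds) at which playback or a thumbnail would be taken."""
    return [seg.start_sec for seg in retrieve_scenes(segments, keyword)]
```

For example, with hypothetical segments whose texts are "東京の天気", "経済ニュース", and "東京株式市場", the keyword "東京" returns the start times of the first and third segments.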
4. Evaluation Experiment
1. Compare the recognition results with the words as actually heard.
2. Calculate the recognition rate.
3. Evaluate the rate for each word length (number of characters).
Sample data
Video:   NHK news
Time:    3 minutes
Number:  30 videos
Words:   457
Engine:  Julius
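The evaluation procedure can be sketched as follows. Each reference word (as heard) is paired with the recognized word, and hits are tallied per word length in characters. The pairing scheme and the exact-match criterion are assumptions, and the sample words in the usage note are hypothetical, not data from the study. A pooled rate over all words weights each length by how many words it has, so it need not equal the plain mean of the per-length rates.

```python
from collections import defaultdict


def recognition_rates(pairs):
    """Exact-match recognition rate, overall and per word length.

    pairs: iterable of (reference_word, recognized_word), where the reference
    is the word as heard by a listener.
    Returns (overall_rate, {n_characters: rate}).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for ref, hyp in pairs:
        n_chars = len(ref)
        totals[n_chars] += 1
        hits[n_chars] += int(ref == hyp)
    if not totals:
        return 0.0, {}
    overall = sum(hits.values()) / sum(totals.values())
    by_length = {n: hits[n] / totals[n] for n in sorted(totals)}
    return overall, by_length
```

For the hypothetical pairs ("雨", "雨"), ("雨", "飴"), ("東京", "東京"), ("天気予報", "天気予報"), ("天気予報", "天気"), the overall rate is 3/5 = 0.6 and the per-length rates are {1: 0.5, 2: 1.0, 4: 0.5}.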
4. Evaluation Experiment
The total average recognition rate is 68%.
Recognition rate by number of characters:
Characters   1     2     3     4     5     6
Rate         67%   73%   69%   46%   45%   40%
4. Evaluation Experiment
• Verify that each keyword corresponds to its
seek destination
o Select a thumbnail and play the video from that scene.
o Check whether the keyword is actually spoken there.
4. Evaluation Experiment
• The recognition rate decreases as the number
of characters increases.
• The retrieved scenes correspond to
the keywords.
• Recognition errors occur in weak consonant parts
o The Voice Divide Section needs improvement.
o Recognition accuracy must also be improved.
5. Conclusion
• A system for efficient video viewing
o Uses speech recognition
o Generates annotations automatically
• Future work
o Adopt the zero-crossing number in the Voice
Divide Section.
o Incorporate the latest speech recognition technology.
o Incorporate image recognition.
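The zero-crossing number counts how often the waveform changes sign within a frame. Weak, noise-like consonants tend to have low amplitude but many zero crossings, which is why it could help the Voice Divide Section. A minimal sketch, assuming a two-level rule: a frame counts as speech if its peak amplitude exceeds a high threshold, or if it exceeds a lower threshold and its zero-crossing count is high. The function names and all threshold values are hypothetical and would need to be tuned experimentally, as the amplitude threshold was.

```python
import numpy as np


def _frames(signal, frame_len):
    """Split a signal into non-overlapping frames (a trailing partial frame is dropped)."""
    x = np.asarray(signal, dtype=float)
    n_frames = len(x) // frame_len
    return x[: n_frames * frame_len].reshape(n_frames, frame_len)


def zero_crossings_per_frame(signal, frame_len):
    """Number of sign changes between adjacent samples in each frame."""
    signs = np.signbit(_frames(signal, frame_len))
    return np.count_nonzero(signs[:, 1:] != signs[:, :-1], axis=1)


def detect_speech_frames(signal, frame_len, amp_high, amp_low, zc_high):
    """Frame-wise speech decision combining amplitude and zero-crossing number.

    A frame is speech if its peak amplitude exceeds amp_high (voiced sounds),
    or if it exceeds amp_low and has more than zc_high zero crossings
    (weak, noise-like consonants). All thresholds are hypothetical.
    """
    frames = _frames(signal, frame_len)
    peak = np.abs(frames).max(axis=1)
    zc = zero_crossings_per_frame(signal, frame_len)
    return (peak > amp_high) | ((peak > amp_low) & (zc > zc_high))
```

Under this rule, a quiet high-frequency frame is kept as a possible consonant, while an equally quiet low-frequency frame and a silent frame are rejected.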
Thank you for your attention!


Speaker notes

  1. Good afternoon, everyone. I'm Yoshika OSAWA, and I am very happy to see all of you today. Let's begin. The theme of my presentation is "A Study on the Video Scene Retrieving System with a Speech Recognizer," which I studied last year at Gunma National College of Technology.