SlideShare una empresa de Scribd logo
1 de 6
Descargar para leer sin conexión
IOSR Journal of Electronics and Communication Engineering (IOSR-JECE)
e-ISSN: 2278-2834,p- ISSN: 2278-8735.Volume 8, Issue 1 (Sep. - Oct. 2013), PP 42-47
www.iosrjournals.org
www.iosrjournals.org 42 | Page
Pitch Detection of Speech Synthesis by Using Matlab
Abhishek Nandy
(Electronics and Communication Engineering, Abacus Institute of Engineering & Management, India)
Abstract: In speech synthesis, machine is developed which can accept text and convert into natural sounding
speech. Applications of speech synthesis include speech output from computers, reading machine for the
visually challenged people. The difference between text to speech synthesizer and any other talking machine
(e.g., cassette player) is, it could be trained for any speaker’s voice in a fully automatic way. Three main
approaches to speech synthesis: articulator synthesis, formant synthesis, and concatenate synthesis. I am
carrying out with concatenate synthesis approach. In addition, text-to-speech (TTS) conversion system based on
time-domain pitch-synchronous overlap-add (TD-PSOLA) method, has been employed to perform prosody
(includes pitch, duration of a speech) modification.7
To assure good quality of synthetic speech accurate
estimation of pitch-period and pitch-marks are necessary for pitch modification. Pitch marking is divided into
two tasks; pitch detection and location determination. LPF and some nonlinearity are being used for pitch-
detection; peak-valley decision method is used to determine the appropriate parts of speech for used in pitch-
mark estimation. In each pitch period, two possible peaks/valleys are searched and one dynamic programming
is run to obtain pitch-mark.
Keywords: auto-correlation, concatenate, pitch-detection, pitch-mark, TTS
I. Introduction
A text-to-speech (TTS) system converts normal language text into speech; other systems render
symbolic linguistic representations like phonetic transcriptions into speech. Stephen Hawking is one of the most
famous people using speech synthesis to communicate. Concatenative synthesis is based on the concatenation
(or stringing together) of segments of recorded speech. Generally, concatenative synthesis produces the most
natural-sounding synthesized speech. However, differences between natural variations in speech and the nature
of the automated techniques for segmenting the waveforms sometimes result in audible glitches in the output.
There are three main sub-types of concatenative synthesis. Concatenate speech synthesis produces speech by
concatenating small, pre-recorded units of speech. With longer units, the naturalness increases, less
concatenation points are needed. The most widely used units in concatenated synthesis are diphones. At
synthesis time, the diphone waveforms are processed through an analysis- synthesis system which is based on a
representation of the speech signal by its short-time Fourier transform (STFT) at a pitch synchronous sampling
rate. The synthesis part of the system works by overlap-adding the modified short-term signals and it ensures a
smooth concatenation of the diphone waveforms. Overlap and add (OLA) methods which have been proposed to
manipulate some of the speech parameter while maintain a high degree of naturalness.
In the PSOLA method every signal is correspond to one pitch period of the speech signal. This implies
a preliminary segmentation of the speech signal into individual pitch periods. In this method, input speech is
represented in speech digital waveform with successive “pitch-marks” distributed along the time scale.
Two analysis windows centered on two successive p-marks should overlap by sufficient amount. The
analysis short term signals are obtained by multiplying the signal by the analysis window centered on the
corresponding p-marks.
In addition, this method has been used to perform the prosody modification according to the target
prosodic information. Prosodic information of a speech includes its pitch (fundamental frequency, f0), intensity,
duration. In TD-PSOLA method it is required to obtain a pitch mark for each pitch period in order to assure a
good quality of synthesis speech. If the pitch mark is denoted at the signal with the largest amplitude, but largest
peak may not correspond to the largest one in the next period. This is shown in the following figure:
Fig1: It shows that largest peak does not correspond to the largest one in the next period
This will create an unpleasant speech after the method is used. So that in order to determine a pitch
mark two highest peaks in each period are searched.
To find out the pitch mark position, I have to first find the pitch period of a speech signal. I use a low
pass filter and autocorrelation method to find the pitch of a signal. After that I use an adaptable filter, peak-
Pitch Detection of Speech Synthesis by Using Matlab
www.iosrjournals.org 43 | Page
valley decision method, and a dynamic programming to determine the pitch mark of a speech signal. With the
purpose of spectrally flattening the signal I applied some sort of nonlinearities to the speech signal before the
autocorrelation method.
The autocorrelation of a sequence is correlation of a sequence with itself, the autocorrelation of a
sequence x(n) is defined by6






n
x mnxnxm ).()(lim)( 12
1

II. Method Used:
2.1Proposed Algorithm:
I define some symbols which are used in the algorithm:
N: frame size in sample.
sm[n]: the voiced speech of the mth
frame, where 0 ≤n< N.
SFm[k]: the frequency response of sm[n], where 0 ≤k< N.
YFm[k]: the pass band frequency response SFm[k], where 0 ≤k< N.
om[l]: the adaptive filter’s output signal of the mth
frame, where 0 ≤l< N.
The algorithm is as follows:
Step1: Use FFT to transform the signal sm[n] to obtain the frequency response SFm[k].
Step2: Find the position kp of the spectral peak for SFm[k] by searching the first forty points of | SFm[k] |.
Step3: Decide on the filter’s pass band. Let YFm[k] = SFm[k] if 3≤k≤kp+2, otherwise let YFm[k] = 0.
Step4: normalize YFm[k] by multiplying a scale of Maxk(|YFm[k]| / |YFm[kp]|).
Step5: Use IFFT to transform the normalized YFm[k] to the time domain.
Next step in the pitch mark determination is the peak-valley decision method and dynamic
programming.
In the peak-valley decision method I used to calculate the two costs by summing the amplitude of s[q],
where q represent the position of the two extreme point of o[.
] over each pitch period. In that case o[.
] represent
the filtered output and s[.] represent the input speech signal.
2.2 Design & Description:
For a synthesis scheme based on the TD-PSOLA method, it is required to obtain a pitch mark for each
pitch period in order to assure good quality of synthesis speech. The pitch mark is the reference point for the
overlap between the speech signals. There are two major tasks in determination of pitch mark: i) pitch detection
ii) location determination. To do that, I follow the design that based on an adaptable filter and a peak-valley
method. The block diagram of that method is shown below. In that block diagram I take voiced speech as the
input signal because only the periodic parts are of interest. In pitch detection part I use low pass filter and
nonlinearity blocks to find out the pitch period of a signal. After filtration of the signal, it goes pass through the
autocorrelation block and I get the required pitch period. The required block diagram is shown in section 2.2.1.
2.2.1 Block diagram of the proposed pitch marking method:
III. Simulation Results Obtained:
To implement the structure which is given to the block diagram, I required designing some element
namely low pass filter, two nonlinear blocks, and a correlator. Using the MATLAB programming I design the
blocks that are required to implement my job.
Pitch Detection of Speech Synthesis by Using Matlab
www.iosrjournals.org 44 | Page
The first one which I design is a low pass filter, to design a low pass filter; I use some of the following
filter specification:
i) FIR, Linear phase, Digital Filter
ii) Pass band of 0 to 900 Hz
iii) Stop band beginning at 1700 Hz
iv) Filter has an impulse response duration of 25 samples
v) The filter pass band was flat to within ± 0.03, and the stop band response was down at least 50 dB6
After using those specifications to design a low pass filter, I get the following filter’s response graph.
In the Y-axis of the graph I plot the filter’s absolute magnitude part and in the X-axis I plot the frequency which
is normalized in nature.
3.1 Nonlinear Correlation processing:
Fig2: Block diagram of non-linear correlation processing
In this block diagram speech signal[S(n)] is first low pass filtered, the output is then used as input to
two nonlinear processor, labeled as NL1 and NL2 in the block diagram. The nonlinearities used in each path
may or may not be same.6
It has also been argued that such a correlation will be most appropriate in a variety of
actual situations in pitch detection.2
3.2 Peak-Valley decision and dynamic programming:
I use the following two equations to calculate the costs:


peak
peak
N
n
peakpeak nPossC
1
1 ]][[ 

valley
valley
N
n
valleyvalley nPossC
1
1 ]][[
Where the symbols are defined as follows:
C peak= cost estimated at the peaks of o [.
]. C valley= cost estimated at the valley of o [.
].
N peak= total number of the peaks of the o [.
]. N valley= total number of the valleys of the o [.
].
Pos peak[n] = position of the nth
peak of o [.
]. Pos valley[n] = position of the nth
valley of o [.
].
The peak-valley decision method is made as follows:
If C peak > C valley then the positive part (peak) of s[.
] is adopted for the evaluation of the pitch mark.
Otherwise, the negative part (valley) of s[.
] is adopted for the purpose.
Once the peak or valley is determined, in my case say the peak is adopted for the pitch mark
determination. The positions of the pitch mark are set by picking the peaks of the speech signal. The PSOLA
(Pitch Synchronous Overlap and Add) method can be used to produce good quality of speech if the pitch is
marked at the highest amplitude of the signal. To do the pitch mark we have to calculate the following two
quantities.
For the ith
pitch period, Pi, suppose the highest and the second highest peaks are located at Li1 and Li2,
respectively. The distortion of the pitch period, disi (j, k) and its accumulation, Ai (j) are defined as follows:
disi (j, k) = ||Lij – L(i-1)k|- Pi|+ g(j, k), for i=2, 3...PN,
Ai (j) = min {disi (j, 1) + Ai-1(1), disi(j, 2) + Ai-1(2)}, for i=2, 3….PN,
Where PN is the total number of pitch period and j, k= 1, 2; g (j, k) is a penalty function, defined by g(j,
k) = 0, if j=1 or k=1 otherwise, 1/PN
Pitch Detection of Speech Synthesis by Using Matlab
www.iosrjournals.org 45 | Page
IV. Graphs of Simulation:
Output response of the low pass filter: Input speech and Low pass filtered signal:
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0
0.005
0.01
0.015
0.02
0.025
Normalized frequency (/) -- >>
Magnitude-->>
0 0.005 0.01 0.015 0.02 0.025 0.03 0.035
-0.05
-0.04
-0.03
-0.02
-0.01
0
0.01
0.02
Time(s) -- >>
Magnitude-->>
Input speech signal
Fig 3: Response graph of the low pass filter . Fig4: Input speech signal.
Non-linearly processed output from the correlation
processing:
0 0.005 0.01 0.015 0.02 0.025 0.03 0.035
-4
-3
-2
-1
0
1
2
x 10
-4
Time(s) -- >>
Magnitude-->>
Lowpass filtered speech signal
0 0.005 0.01 0.015 0.02 0.025 0.03
-1
-0.5
0
0.5
1
Time (s)
Amplitude
Waveform
-0.02 -0.015 -0.01 -0.005 0 0.005 0.01 0.015 0.02
-0.5
0
0.5
1
Lag (s)
Correlationcoef
Auto-correlation
Fig5: speech signal after passing through LPF Fig6: nonlinearly processed signal and correlator output
After the correlator block I get the pitch period (f0) of the speech signal. In my work, I get an f0=
155.3398 Hz for a speech signal which I am used in my on-going work.
First figure shows the response of the band pass filter that I am used as my adaptive filter to transform
the speech signal into a sin like wave. The speech signal that I am getting after the band pass filter is like smooth
sine curves with the high frequency component are cut out. Filtered speech is now passing through a peak-valley
decision block to find out which part of the speech is suitable for pitch mark estimation. In fig8 and fig9, the red
asterisks denote the positive peaks and the green asterisks denote the valleys of the speech signal. After
calculate the cost using my method I see that the positive peak is greater than the negative peak, so I use the
positive peak of the speech signal to determine the positions of the pitch marks.
Output response of band pass filter’s: Peak-Valley detection and pitch mark in a speech signal:
0 100 200 300 400 500 600
-3
-2.5
-2
-1.5
-1
-0.5
0
0.5
Filter Point ---->>
FilterCoefficient---->>
Band pass filter's response
0 100 200 300 400 500 600
-0.015
-0.01
-0.005
0
0.005
0.01
IFFT length of Filter Signal ---->>
IFFTCoefficientofFilterSignal---->>
Input Speech Signal
Valley
Peak
Fig7: band pass filter’s response Fig8: filtered signal with peak-valley detection
Pitch Detection of Speech Synthesis by Using Matlab
www.iosrjournals.org 46 | Page
0 100 200 300 400 500 600
-0.015
-0.01
-0.005
0
0.005
0.01
IFFT length of Filter Signal ---->>
IFFTCoefficientofFilterSignal---->>
Fig9: speech signal with its pitch mark
V. Conclusion
I have examined method for combining nonlinear processing of the speech signal with a standard
correlation analysis to give correlation functions which have sharp peaks at the pitch period. I have seen that the
non-linearity provide some degree of spectral flattening which enhance the periodicity peaks in the correlation
function, and reduce the correlation peaks due to the formant structure of the waveform. In the following figures
I show the difference between unprocessed and non-linearly processed signal:
0 0.005 0.01 0.015 0.02 0.025 0.03
-4
-2
0
2
x 10
-4
Time (s)
Amplitude
Waveform
-0.02 -0.015 -0.01 -0.005 0 0.005 0.01 0.015 0.02
-0.5
0
0.5
1
Lag (s)
Correlationcoef
Auto-correlation
0 0.005 0.01 0.015 0.02 0.025 0.03
-1
-0.5
0
0.5
1
Time (s)
Amplitude
Waveform
-0.02 -0.015 -0.01 -0.005 0 0.005 0.01 0.015 0.02
-0.5
0
0.5
1
Lag (s)
Correlationcoef
Auto-correlation
Fig 10: Unprocessed speech signal Fig 11: Non linearly processed speech signal
The output of the correlator section gives the pitch of the speech signal which is very important to set
the pitch mark of a speech signal. Pitch mark is a reference point for the overlap between speech signals. If I set
the pitch mark at the right position it will assure good quality of synthetic speech. To do that a peak valley
decision method has been adopted to select either the positive or negative parts of the signal are being used,
followed by a dynamic programming to set the pitch mark at the signal.
Acknowledgments:
Though what matters most in a journal are its contents, it is the parts of the whole introduction,
methods used, graphs of simulation, simulation results obtained, conclusion etc. that make it an attractive
proposition. I have pinned my hopes that the readers would appreciate this approach. I have been fortunate to
get help and co-operation from many friends involved in this journal writing effort.
Dr. Anish Deb, Reader & Professor in Calcutta University, Raja bazaar, and Anil Kumar Sharma, Asst.
Prof of Abacus Institute of Engineering. & Management, Mogra, took the responsibility of creating the
excellencies in formatting the journal paper somewhere. Many thanks to both of you.
And lastly many thanks to Miss Moumita Bachaspati who cheered me in good times, encouraged me in
bad times, understood me at all times and for listening to my dreams.
Pitch Detection of Speech Synthesis by Using Matlab
www.iosrjournals.org 47 | Page
References:
[1] M. M. Sondhi, “New methods of pitch extraction,” IEEE Trans. Audio Electro Acoust., vol. AU-16, pp. 262-266, June 1968.
[2] J. D. Markel, “The SIFT algorithm for fundamental frequency estimation,” IEEE Trans. Audio Electro Acoust., vol. AU-20, pp.
367-377, Dec. 1972.
[3] L. R. Rabiner, M. R. Sambur, and C. E. Schmidt, “Applications of a nonlinear smoothing algorithm to speech processing,” IEEE
Trans. Acoust., Speech, Signal Processing, vol. ASSP-23, PP. 552-557, Dec. 1975.
[4] C. A. McGonegal, L. R. Rabiner, and A. E. Rosenberg, “A semiautomatic pitch detector (SAPD),” IEEE Trans. Acoust., Speech,
and Signal Processing, vol. ASSP-23, pp. 570-574, Dec. 1975.
[5] J. J. Dubnowski, R. W. Schafer, and L. R. Rabiner, “Real-time digital hardware pitch detector,” IEEE Trans. Acoust., Speech, and
Signal Processing, vol. ASSP-24, pp. 2-8, Feb. 1976.
[6] L. R. Rabiner, M. J. Cheng, A. E. Rosenberg, and C. A. McGonegal, “A comparative performance study of several pitch detection
algorithms,” IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-24, pp. 399-418, oct. 1976.
[7] Lawrence R. Rabiner, Fellow, IEEE, “On the use of Autocorrelation Analysis for pitch Detection,” IEEE Transaction on Acoustics,
Speech, and Signal Processing, vol. ASSP-25, No.1, Feb. 1977.
[8] Jau-Hung Chen and Yung-An Kao, “Pitch Marking Based on an Adaptable Filter and a Peak-Valley Estimation Method,”
computational linguistics and Chinese language processing, vol. 6, No.2, pp. 31-42, Feb. 2001,
[9] Pedro M. Carvalho, Luis C. Oliveira, Isabel M. Trancoso, M. Ceu Viana*, “Concatenative Speech Synthesis for European
Portuguese,” INESC/IST, *CLUL INESC, Rua Alves Redol, 9, 1000 Lisboa, PORTUGAL {Pedro.Carvalho, Luis.Oliveira,
Isabel.Trancoso, mcv}@inesc.pt.
[10] Ramesh Babu, “Digital Signal Processing”.

Más contenido relacionado

La actualidad más candente

Speech Processing in Stressing Co-Channel Interference Using the Wigner Distr...
Speech Processing in Stressing Co-Channel Interference Using the Wigner Distr...Speech Processing in Stressing Co-Channel Interference Using the Wigner Distr...
Speech Processing in Stressing Co-Channel Interference Using the Wigner Distr...CSCJournals
 
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...sipij
 
Dc3210881096
Dc3210881096Dc3210881096
Dc3210881096IJMER
 
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...Missing Component Restoration for Masked Speech Signals based on Time-Domain ...
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...NU_I_TODALAB
 
Fft analysis
Fft analysisFft analysis
Fft analysisSatrious
 
DSP_2018_FOEHU - Lec 06 - FIR Filter Design
DSP_2018_FOEHU - Lec 06 - FIR Filter DesignDSP_2018_FOEHU - Lec 06 - FIR Filter Design
DSP_2018_FOEHU - Lec 06 - FIR Filter DesignAmr E. Mohamed
 
IR UWB TOA Estimation Techniques and Comparison
IR UWB TOA Estimation Techniques and ComparisonIR UWB TOA Estimation Techniques and Comparison
IR UWB TOA Estimation Techniques and Comparisoninventionjournals
 
Gst08 tutorial-fast fourier sampling
Gst08 tutorial-fast fourier samplingGst08 tutorial-fast fourier sampling
Gst08 tutorial-fast fourier samplingssuser8d9d45
 
DSP_2018_FOEHU - Lec 0 - Course Outlines
DSP_2018_FOEHU - Lec 0 - Course OutlinesDSP_2018_FOEHU - Lec 0 - Course Outlines
DSP_2018_FOEHU - Lec 0 - Course OutlinesAmr E. Mohamed
 
Realization and design of a pilot assist decision making system based on spee...
Realization and design of a pilot assist decision making system based on spee...Realization and design of a pilot assist decision making system based on spee...
Realization and design of a pilot assist decision making system based on spee...csandit
 
Audio Processing
Audio ProcessingAudio Processing
Audio Processinganeetaanu
 
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...CSCJournals
 
DSP_2018_FOEHU - Lec 1 - Introduction to Digital Signal Processing
DSP_2018_FOEHU - Lec 1 - Introduction to Digital Signal ProcessingDSP_2018_FOEHU - Lec 1 - Introduction to Digital Signal Processing
DSP_2018_FOEHU - Lec 1 - Introduction to Digital Signal ProcessingAmr E. Mohamed
 
Fast Fourier Transform
Fast Fourier TransformFast Fourier Transform
Fast Fourier Transformop205
 
Adaptive equalization
Adaptive equalizationAdaptive equalization
Adaptive equalizationKamal Bhatt
 
Simulating communication systems with MATLAB: An introduction
Simulating communication systems with MATLAB: An introductionSimulating communication systems with MATLAB: An introduction
Simulating communication systems with MATLAB: An introductionAniruddha Chandra
 
FPGA Implementation of Large Area Efficient and Low Power Geortzel Algorithm ...
FPGA Implementation of Large Area Efficient and Low Power Geortzel Algorithm ...FPGA Implementation of Large Area Efficient and Low Power Geortzel Algorithm ...
FPGA Implementation of Large Area Efficient and Low Power Geortzel Algorithm ...IDES Editor
 

La actualidad más candente (20)

Signal Processing
Signal ProcessingSignal Processing
Signal Processing
 
Speech Processing in Stressing Co-Channel Interference Using the Wigner Distr...
Speech Processing in Stressing Co-Channel Interference Using the Wigner Distr...Speech Processing in Stressing Co-Channel Interference Using the Wigner Distr...
Speech Processing in Stressing Co-Channel Interference Using the Wigner Distr...
 
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...
ROBUST FEATURE EXTRACTION USING AUTOCORRELATION DOMAIN FOR NOISY SPEECH RECOG...
 
Dc3210881096
Dc3210881096Dc3210881096
Dc3210881096
 
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...Missing Component Restoration for Masked Speech Signals based on Time-Domain ...
Missing Component Restoration for Masked Speech Signals based on Time-Domain ...
 
Fft analysis
Fft analysisFft analysis
Fft analysis
 
DSP_2018_FOEHU - Lec 06 - FIR Filter Design
DSP_2018_FOEHU - Lec 06 - FIR Filter DesignDSP_2018_FOEHU - Lec 06 - FIR Filter Design
DSP_2018_FOEHU - Lec 06 - FIR Filter Design
 
IR UWB TOA Estimation Techniques and Comparison
IR UWB TOA Estimation Techniques and ComparisonIR UWB TOA Estimation Techniques and Comparison
IR UWB TOA Estimation Techniques and Comparison
 
Gst08 tutorial-fast fourier sampling
Gst08 tutorial-fast fourier samplingGst08 tutorial-fast fourier sampling
Gst08 tutorial-fast fourier sampling
 
DSP_2018_FOEHU - Lec 0 - Course Outlines
DSP_2018_FOEHU - Lec 0 - Course OutlinesDSP_2018_FOEHU - Lec 0 - Course Outlines
DSP_2018_FOEHU - Lec 0 - Course Outlines
 
Realization and design of a pilot assist decision making system based on spee...
Realization and design of a pilot assist decision making system based on spee...Realization and design of a pilot assist decision making system based on spee...
Realization and design of a pilot assist decision making system based on spee...
 
Audio Processing
Audio ProcessingAudio Processing
Audio Processing
 
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
 
Fft analysis
Fft analysisFft analysis
Fft analysis
 
DSP_2018_FOEHU - Lec 1 - Introduction to Digital Signal Processing
DSP_2018_FOEHU - Lec 1 - Introduction to Digital Signal ProcessingDSP_2018_FOEHU - Lec 1 - Introduction to Digital Signal Processing
DSP_2018_FOEHU - Lec 1 - Introduction to Digital Signal Processing
 
Fast Fourier Transform
Fast Fourier TransformFast Fourier Transform
Fast Fourier Transform
 
Adaptive equalization
Adaptive equalizationAdaptive equalization
Adaptive equalization
 
Simulating communication systems with MATLAB: An introduction
Simulating communication systems with MATLAB: An introductionSimulating communication systems with MATLAB: An introduction
Simulating communication systems with MATLAB: An introduction
 
FPGA Implementation of Large Area Efficient and Low Power Geortzel Algorithm ...
FPGA Implementation of Large Area Efficient and Low Power Geortzel Algorithm ...FPGA Implementation of Large Area Efficient and Low Power Geortzel Algorithm ...
FPGA Implementation of Large Area Efficient and Low Power Geortzel Algorithm ...
 
Dcp project
Dcp projectDcp project
Dcp project
 

Destacado

Comparative Study between DCT and Wavelet Transform Based Image Compression A...
Comparative Study between DCT and Wavelet Transform Based Image Compression A...Comparative Study between DCT and Wavelet Transform Based Image Compression A...
Comparative Study between DCT and Wavelet Transform Based Image Compression A...IOSR Journals
 
Analysis and study of vibrations in view of combustion gas forces for diesel ...
Analysis and study of vibrations in view of combustion gas forces for diesel ...Analysis and study of vibrations in view of combustion gas forces for diesel ...
Analysis and study of vibrations in view of combustion gas forces for diesel ...IOSR Journals
 
Anomaly Detection using multidimensional reduction Principal Component Analysis
Anomaly Detection using multidimensional reduction Principal Component AnalysisAnomaly Detection using multidimensional reduction Principal Component Analysis
Anomaly Detection using multidimensional reduction Principal Component AnalysisIOSR Journals
 
Project and Change Management Success Factors from Malaysian Government Depar...
Project and Change Management Success Factors from Malaysian Government Depar...Project and Change Management Success Factors from Malaysian Government Depar...
Project and Change Management Success Factors from Malaysian Government Depar...IOSR Journals
 
Sequential Pattern Tree Mining
Sequential Pattern Tree MiningSequential Pattern Tree Mining
Sequential Pattern Tree MiningIOSR Journals
 
Audit Firm Rotation and Audit Report Lag in Nigeria
Audit Firm Rotation and Audit Report Lag in NigeriaAudit Firm Rotation and Audit Report Lag in Nigeria
Audit Firm Rotation and Audit Report Lag in NigeriaIOSR Journals
 
The Effects of Copper Addition on the compression behavior of Al-Ca Alloy
The Effects of Copper Addition on the compression behavior of Al-Ca AlloyThe Effects of Copper Addition on the compression behavior of Al-Ca Alloy
The Effects of Copper Addition on the compression behavior of Al-Ca AlloyIOSR Journals
 
Appraisal of E-learning structure in Nigerian Polytechnics: A Case study of F...
Appraisal of E-learning structure in Nigerian Polytechnics: A Case study of F...Appraisal of E-learning structure in Nigerian Polytechnics: A Case study of F...
Appraisal of E-learning structure in Nigerian Polytechnics: A Case study of F...IOSR Journals
 
Cryptanalysis of Efficient Unlinkable Secret Handshakes for Anonymous Communi...
Cryptanalysis of Efficient Unlinkable Secret Handshakes for Anonymous Communi...Cryptanalysis of Efficient Unlinkable Secret Handshakes for Anonymous Communi...
Cryptanalysis of Efficient Unlinkable Secret Handshakes for Anonymous Communi...IOSR Journals
 
Content Based Image Retrieval Using 2-D Discrete Wavelet Transform
Content Based Image Retrieval Using 2-D Discrete Wavelet TransformContent Based Image Retrieval Using 2-D Discrete Wavelet Transform
Content Based Image Retrieval Using 2-D Discrete Wavelet TransformIOSR Journals
 
The Development of Software for Assessment of Environmental Impact Indicators...
The Development of Software for Assessment of Environmental Impact Indicators...The Development of Software for Assessment of Environmental Impact Indicators...
The Development of Software for Assessment of Environmental Impact Indicators...IOSR Journals
 
Evolution of the coastline of Saidia - Cap Water (Northeastern Morocco)
Evolution of the coastline of Saidia - Cap Water (Northeastern Morocco)Evolution of the coastline of Saidia - Cap Water (Northeastern Morocco)
Evolution of the coastline of Saidia - Cap Water (Northeastern Morocco)IOSR Journals
 
A Power Flow Analysis of the Nigerian 330 KV Electric Power System
A Power Flow Analysis of the Nigerian 330 KV Electric Power SystemA Power Flow Analysis of the Nigerian 330 KV Electric Power System
A Power Flow Analysis of the Nigerian 330 KV Electric Power SystemIOSR Journals
 
Study of Different Parameters on the Chassis Space Frame For the Sports Car b...
Study of Different Parameters on the Chassis Space Frame For the Sports Car b...Study of Different Parameters on the Chassis Space Frame For the Sports Car b...
Study of Different Parameters on the Chassis Space Frame For the Sports Car b...IOSR Journals
 
An Analysis of Consumer Perceptions and Usage of Mobile Telecommunications Br...
An Analysis of Consumer Perceptions and Usage of Mobile Telecommunications Br...An Analysis of Consumer Perceptions and Usage of Mobile Telecommunications Br...
An Analysis of Consumer Perceptions and Usage of Mobile Telecommunications Br...IOSR Journals
 

Destacado (20)

Comparative Study between DCT and Wavelet Transform Based Image Compression A...
Comparative Study between DCT and Wavelet Transform Based Image Compression A...Comparative Study between DCT and Wavelet Transform Based Image Compression A...
Comparative Study between DCT and Wavelet Transform Based Image Compression A...
 
G1302044952
G1302044952G1302044952
G1302044952
 
O017128591
O017128591O017128591
O017128591
 
D012412931
D012412931D012412931
D012412931
 
Analysis and study of vibrations in view of combustion gas forces for diesel ...
Analysis and study of vibrations in view of combustion gas forces for diesel ...Analysis and study of vibrations in view of combustion gas forces for diesel ...
Analysis and study of vibrations in view of combustion gas forces for diesel ...
 
Anomaly Detection using multidimensional reduction Principal Component Analysis
Anomaly Detection using multidimensional reduction Principal Component AnalysisAnomaly Detection using multidimensional reduction Principal Component Analysis
Anomaly Detection using multidimensional reduction Principal Component Analysis
 
Project and Change Management Success Factors from Malaysian Government Depar...
Project and Change Management Success Factors from Malaysian Government Depar...Project and Change Management Success Factors from Malaysian Government Depar...
Project and Change Management Success Factors from Malaysian Government Depar...
 
B010230612
B010230612B010230612
B010230612
 
Sequential Pattern Tree Mining
Sequential Pattern Tree MiningSequential Pattern Tree Mining
Sequential Pattern Tree Mining
 
Audit Firm Rotation and Audit Report Lag in Nigeria
Audit Firm Rotation and Audit Report Lag in NigeriaAudit Firm Rotation and Audit Report Lag in Nigeria
Audit Firm Rotation and Audit Report Lag in Nigeria
 
The Effects of Copper Addition on the compression behavior of Al-Ca Alloy
The Effects of Copper Addition on the compression behavior of Al-Ca AlloyThe Effects of Copper Addition on the compression behavior of Al-Ca Alloy
The Effects of Copper Addition on the compression behavior of Al-Ca Alloy
 
Appraisal of E-learning structure in Nigerian Polytechnics: A Case study of F...
Appraisal of E-learning structure in Nigerian Polytechnics: A Case study of F...Appraisal of E-learning structure in Nigerian Polytechnics: A Case study of F...
Appraisal of E-learning structure in Nigerian Polytechnics: A Case study of F...
 
Cryptanalysis of Efficient Unlinkable Secret Handshakes for Anonymous Communi...
Cryptanalysis of Efficient Unlinkable Secret Handshakes for Anonymous Communi...Cryptanalysis of Efficient Unlinkable Secret Handshakes for Anonymous Communi...
Cryptanalysis of Efficient Unlinkable Secret Handshakes for Anonymous Communi...
 
Content Based Image Retrieval Using 2-D Discrete Wavelet Transform
Content Based Image Retrieval Using 2-D Discrete Wavelet TransformContent Based Image Retrieval Using 2-D Discrete Wavelet Transform
Content Based Image Retrieval Using 2-D Discrete Wavelet Transform
 
The Development of Software for Assessment of Environmental Impact Indicators...
The Development of Software for Assessment of Environmental Impact Indicators...The Development of Software for Assessment of Environmental Impact Indicators...
The Development of Software for Assessment of Environmental Impact Indicators...
 
Evolution of the coastline of Saidia - Cap Water (Northeastern Morocco)
Evolution of the coastline of Saidia - Cap Water (Northeastern Morocco)Evolution of the coastline of Saidia - Cap Water (Northeastern Morocco)
Evolution of the coastline of Saidia - Cap Water (Northeastern Morocco)
 
A Power Flow Analysis of the Nigerian 330 KV Electric Power System
A Power Flow Analysis of the Nigerian 330 KV Electric Power SystemA Power Flow Analysis of the Nigerian 330 KV Electric Power System
A Power Flow Analysis of the Nigerian 330 KV Electric Power System
 
N061102111
N061102111N061102111
N061102111
 
Study of Different Parameters on the Chassis Space Frame For the Sports Car b...
Study of Different Parameters on the Chassis Space Frame For the Sports Car b...Study of Different Parameters on the Chassis Space Frame For the Sports Car b...
Study of Different Parameters on the Chassis Space Frame For the Sports Car b...
 
An Analysis of Consumer Perceptions and Usage of Mobile Telecommunications Br...
An Analysis of Consumer Perceptions and Usage of Mobile Telecommunications Br...An Analysis of Consumer Perceptions and Usage of Mobile Telecommunications Br...
An Analysis of Consumer Perceptions and Usage of Mobile Telecommunications Br...
 

Similar a H0814247

Speech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderSpeech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderIJTET Journal
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
IRJET- Pitch Detection Algorithms in Time Domain
IRJET- Pitch Detection Algorithms in Time DomainIRJET- Pitch Detection Algorithms in Time Domain
IRJET- Pitch Detection Algorithms in Time DomainIRJET Journal
 
Speech Compression using LPC
Speech Compression using LPCSpeech Compression using LPC
Speech Compression using LPCDisha Modi
 
Voice biometric recognition
Voice biometric recognitionVoice biometric recognition
Voice biometric recognitionphyuhsan
 
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...IRJET Journal
 
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...IRJET Journal
 
A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-...
A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-...A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-...
A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-...CSCJournals
 
Design and implementation of different audio restoration techniques for audio...
Design and implementation of different audio restoration techniques for audio...Design and implementation of different audio restoration techniques for audio...
Design and implementation of different audio restoration techniques for audio...eSAT Journals
 
Enhanced modulation spectral subtraction for IOVT speech recognition application
Enhanced modulation spectral subtraction for IOVT speech recognition applicationEnhanced modulation spectral subtraction for IOVT speech recognition application
Enhanced modulation spectral subtraction for IOVT speech recognition applicationIRJET Journal
 
METHOD FOR REDUCING OF NOISE BY IMPROVING SIGNAL-TO-NOISE-RATIO IN WIRELESS LAN
METHOD FOR REDUCING OF NOISE BY IMPROVING SIGNAL-TO-NOISE-RATIO IN WIRELESS LANMETHOD FOR REDUCING OF NOISE BY IMPROVING SIGNAL-TO-NOISE-RATIO IN WIRELESS LAN
METHOD FOR REDUCING OF NOISE BY IMPROVING SIGNAL-TO-NOISE-RATIO IN WIRELESS LANIJNSA Journal
 
A GAUSSIAN MIXTURE MODEL BASED SPEECH RECOGNITION SYSTEM USING MATLAB
A GAUSSIAN MIXTURE MODEL BASED SPEECH RECOGNITION SYSTEM USING MATLABA GAUSSIAN MIXTURE MODEL BASED SPEECH RECOGNITION SYSTEM USING MATLAB
A GAUSSIAN MIXTURE MODEL BASED SPEECH RECOGNITION SYSTEM USING MATLABsipij
 
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...ijceronline
 
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...IJCSEA Journal
 
Automatic Speech Recognition Incorporating Modulation Domain Enhancement
Automatic Speech Recognition Incorporating Modulation Domain EnhancementAutomatic Speech Recognition Incorporating Modulation Domain Enhancement
Automatic Speech Recognition Incorporating Modulation Domain EnhancementIRJET Journal
 
PSoC BASED SPEECH RECOGNITION SYSTEM
PSoC BASED SPEECH RECOGNITION SYSTEMPSoC BASED SPEECH RECOGNITION SYSTEM
PSoC BASED SPEECH RECOGNITION SYSTEMirjes
 
PSoC BASED SPEECH RECOGNITION SYSTEM
PSoC BASED SPEECH RECOGNITION SYSTEMPSoC BASED SPEECH RECOGNITION SYSTEM
PSoC BASED SPEECH RECOGNITION SYSTEMIJRES Journal
 
Speaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization ApproachSpeaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization Approachijsrd.com
 

Similar a H0814247 (20)

Speech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderSpeech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using Vocoder
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
IRJET- Pitch Detection Algorithms in Time Domain
IRJET- Pitch Detection Algorithms in Time DomainIRJET- Pitch Detection Algorithms in Time Domain
IRJET- Pitch Detection Algorithms in Time Domain
 
Speech Compression using LPC
Speech Compression using LPCSpeech Compression using LPC
Speech Compression using LPC
 
Voice biometric recognition
Voice biometric recognitionVoice biometric recognition
Voice biometric recognition
 
G010424248
G010424248G010424248
G010424248
 
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...
Speech Enhancement Based on Spectral Subtraction Involving Magnitude and Phas...
 
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
 
A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-...
A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-...A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-...
A New Method for Pitch Tracking and Voicing Decision Based on Spectral Multi-...
 
Design and implementation of different audio restoration techniques for audio...
Design and implementation of different audio restoration techniques for audio...Design and implementation of different audio restoration techniques for audio...
Design and implementation of different audio restoration techniques for audio...
 
Enhanced modulation spectral subtraction for IOVT speech recognition application
Enhanced modulation spectral subtraction for IOVT speech recognition applicationEnhanced modulation spectral subtraction for IOVT speech recognition application
Enhanced modulation spectral subtraction for IOVT speech recognition application
 
METHOD FOR REDUCING OF NOISE BY IMPROVING SIGNAL-TO-NOISE-RATIO IN WIRELESS LAN
METHOD FOR REDUCING OF NOISE BY IMPROVING SIGNAL-TO-NOISE-RATIO IN WIRELESS LANMETHOD FOR REDUCING OF NOISE BY IMPROVING SIGNAL-TO-NOISE-RATIO IN WIRELESS LAN
METHOD FOR REDUCING OF NOISE BY IMPROVING SIGNAL-TO-NOISE-RATIO IN WIRELESS LAN
 
N017428692
N017428692N017428692
N017428692
 
A GAUSSIAN MIXTURE MODEL BASED SPEECH RECOGNITION SYSTEM USING MATLAB
A GAUSSIAN MIXTURE MODEL BASED SPEECH RECOGNITION SYSTEM USING MATLABA GAUSSIAN MIXTURE MODEL BASED SPEECH RECOGNITION SYSTEM USING MATLAB
A GAUSSIAN MIXTURE MODEL BASED SPEECH RECOGNITION SYSTEM USING MATLAB
 
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...
Emotion Recognition based on audio signal using GFCC Extraction and BPNN Clas...
 
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
AN ANALYSIS OF SPEECH RECOGNITION PERFORMANCE BASED UPON NETWORK LAYERS AND T...
 
Automatic Speech Recognition Incorporating Modulation Domain Enhancement
Automatic Speech Recognition Incorporating Modulation Domain EnhancementAutomatic Speech Recognition Incorporating Modulation Domain Enhancement
Automatic Speech Recognition Incorporating Modulation Domain Enhancement
 
PSoC BASED SPEECH RECOGNITION SYSTEM
PSoC BASED SPEECH RECOGNITION SYSTEMPSoC BASED SPEECH RECOGNITION SYSTEM
PSoC BASED SPEECH RECOGNITION SYSTEM
 
PSoC BASED SPEECH RECOGNITION SYSTEM
PSoC BASED SPEECH RECOGNITION SYSTEMPSoC BASED SPEECH RECOGNITION SYSTEM
PSoC BASED SPEECH RECOGNITION SYSTEM
 
Speaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization ApproachSpeaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization Approach
 

Más de IOSR Journals (20)

A011140104
A011140104A011140104
A011140104
 
M0111397100
M0111397100M0111397100
M0111397100
 
L011138596
L011138596L011138596
L011138596
 
K011138084
K011138084K011138084
K011138084
 
J011137479
J011137479J011137479
J011137479
 
I011136673
I011136673I011136673
I011136673
 
G011134454
G011134454G011134454
G011134454
 
H011135565
H011135565H011135565
H011135565
 
F011134043
F011134043F011134043
F011134043
 
E011133639
E011133639E011133639
E011133639
 
D011132635
D011132635D011132635
D011132635
 
C011131925
C011131925C011131925
C011131925
 
B011130918
B011130918B011130918
B011130918
 
A011130108
A011130108A011130108
A011130108
 
I011125160
I011125160I011125160
I011125160
 
H011124050
H011124050H011124050
H011124050
 
G011123539
G011123539G011123539
G011123539
 
F011123134
F011123134F011123134
F011123134
 
E011122530
E011122530E011122530
E011122530
 
D011121524
D011121524D011121524
D011121524
 

H0814247

  • 1. IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-ISSN: 2278-2834,p- ISSN: 2278-8735.Volume 8, Issue 1 (Sep. - Oct. 2013), PP 42-47 www.iosrjournals.org www.iosrjournals.org 42 | Page Pitch Detection of Speech Synthesis by Using Matlab Abhishek Nandy (Electronics and Communication Engineering, Abacus Institute of Engineering & Management, India) Abstract: In speech synthesis, machine is developed which can accept text and convert into natural sounding speech. Applications of speech synthesis include speech output from computers, reading machine for the visually challenged people. The difference between text to speech synthesizer and any other talking machine (e.g., cassette player) is, it could be trained for any speaker’s voice in a fully automatic way. Three main approaches to speech synthesis: articulator synthesis, formant synthesis, and concatenate synthesis. I am carrying out with concatenate synthesis approach. In addition, text-to-speech (TTS) conversion system based on time-domain pitch-synchronous overlap-add (TD-PSOLA) method, has been employed to perform prosody (includes pitch, duration of a speech) modification.7 To assure good quality of synthetic speech accurate estimation of pitch-period and pitch-marks are necessary for pitch modification. Pitch marking is divided into two tasks; pitch detection and location determination. LPF and some nonlinearity are being used for pitch- detection; peak-valley decision method is used to determine the appropriate parts of speech for used in pitch- mark estimation. In each pitch period, two possible peaks/valleys are searched and one dynamic programming is run to obtain pitch-mark. Keywords: auto-correlation, concatenate, pitch-detection, pitch-mark, TTS I. Introduction A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. Stephen Hawking is one of the most famous people using speech synthesis to communicate. Concatenative synthesis is based on the concatenation (or stringing together) of segments of recorded speech. Generally, concatenative synthesis produces the most natural-sounding synthesized speech. However, differences between natural variations in speech and the nature of the automated techniques for segmenting the waveforms sometimes result in audible glitches in the output. There are three main sub-types of concatenative synthesis. Concatenate speech synthesis produces speech by concatenating small, pre-recorded units of speech. With longer units, the naturalness increases, less concatenation points are needed. The most widely used units in concatenated synthesis are diphones. At synthesis time, the diphone waveforms are processed through an analysis- synthesis system which is based on a representation of the speech signal by its short-time Fourier transform (STFT) at a pitch synchronous sampling rate. The synthesis part of the system works by overlap-adding the modified short-term signals and it ensures a smooth concatenation of the diphone waveforms. Overlap and add (OLA) methods which have been proposed to manipulate some of the speech parameter while maintain a high degree of naturalness. In the PSOLA method every signal is correspond to one pitch period of the speech signal. This implies a preliminary segmentation of the speech signal into individual pitch periods. In this method, input speech is represented in speech digital waveform with successive “pitch-marks” distributed along the time scale. Two analysis windows centered on two successive p-marks should overlap by sufficient amount. The analysis short term signals are obtained by multiplying the signal by the analysis window centered on the corresponding p-marks. In addition, this method has been used to perform the prosody modification according to the target prosodic information. Prosodic information of a speech includes its pitch (fundamental frequency, f0), intensity, duration. In TD-PSOLA method it is required to obtain a pitch mark for each pitch period in order to assure a good quality of synthesis speech. If the pitch mark is denoted at the signal with the largest amplitude, but largest peak may not correspond to the largest one in the next period. This is shown in the following figure: Fig1: It shows that largest peak does not correspond to the largest one in the next period This will create an unpleasant speech after the method is used. So that in order to determine a pitch mark two highest peaks in each period are searched. To find out the pitch mark position, I have to first find the pitch period of a speech signal. I use a low pass filter and autocorrelation method to find the pitch of a signal. After that I use an adaptable filter, peak-
  • 2. Pitch Detection of Speech Synthesis by Using Matlab www.iosrjournals.org 43 | Page valley decision method, and a dynamic programming to determine the pitch mark of a speech signal. With the purpose of spectrally flattening the signal I applied some sort of nonlinearities to the speech signal before the autocorrelation method. The autocorrelation of a sequence is correlation of a sequence with itself, the autocorrelation of a sequence x(n) is defined by6       n x mnxnxm ).()(lim)( 12 1  II. Method Used: 2.1Proposed Algorithm: I define some symbols which are used in the algorithm: N: frame size in sample. sm[n]: the voiced speech of the mth frame, where 0 ≤n< N. SFm[k]: the frequency response of sm[n], where 0 ≤k< N. YFm[k]: the pass band frequency response SFm[k], where 0 ≤k< N. om[l]: the adaptive filter’s output signal of the mth frame, where 0 ≤l< N. The algorithm is as follows: Step1: Use FFT to transform the signal sm[n] to obtain the frequency response SFm[k]. Step2: Find the position kp of the spectral peak for SFm[k] by searching the first forty points of | SFm[k] |. Step3: Decide on the filter’s pass band. Let YFm[k] = SFm[k] if 3≤k≤kp+2, otherwise let YFm[k] = 0. Step4: normalize YFm[k] by multiplying a scale of Maxk(|YFm[k]| / |YFm[kp]|). Step5: Use IFFT to transform the normalized YFm[k] to the time domain. Next step in the pitch mark determination is the peak-valley decision method and dynamic programming. In the peak-valley decision method I used to calculate the two costs by summing the amplitude of s[q], where q represent the position of the two extreme point of o[. ] over each pitch period. In that case o[. ] represent the filtered output and s[.] represent the input speech signal. 2.2 Design & Description: For a synthesis scheme based on the TD-PSOLA method, it is required to obtain a pitch mark for each pitch period in order to assure good quality of synthesis speech. The pitch mark is the reference point for the overlap between the speech signals. There are two major tasks in determination of pitch mark: i) pitch detection ii) location determination. To do that, I follow the design that based on an adaptable filter and a peak-valley method. The block diagram of that method is shown below. In that block diagram I take voiced speech as the input signal because only the periodic parts are of interest. In pitch detection part I use low pass filter and nonlinearity blocks to find out the pitch period of a signal. After filtration of the signal, it goes pass through the autocorrelation block and I get the required pitch period. The required block diagram is shown in section 2.2.1. 2.2.1 Block diagram of the proposed pitch marking method: III. Simulation Results Obtained: To implement the structure which is given to the block diagram, I required designing some element namely low pass filter, two nonlinear blocks, and a correlator. Using the MATLAB programming I design the blocks that are required to implement my job.
  • 3. Pitch Detection of Speech Synthesis by Using Matlab www.iosrjournals.org 44 | Page The first one which I design is a low pass filter, to design a low pass filter; I use some of the following filter specification: i) FIR, Linear phase, Digital Filter ii) Pass band of 0 to 900 Hz iii) Stop band beginning at 1700 Hz iv) Filter has an impulse response duration of 25 samples v) The filter pass band was flat to within ± 0.03, and the stop band response was down at least 50 dB6 After using those specifications to design a low pass filter, I get the following filter’s response graph. In the Y-axis of the graph I plot the filter’s absolute magnitude part and in the X-axis I plot the frequency which is normalized in nature. 3.1 Nonlinear Correlation processing: Fig2: Block diagram of non-linear correlation processing In this block diagram speech signal[S(n)] is first low pass filtered, the output is then used as input to two nonlinear processor, labeled as NL1 and NL2 in the block diagram. The nonlinearities used in each path may or may not be same.6 It has also been argued that such a correlation will be most appropriate in a variety of actual situations in pitch detection.2 3.2 Peak-Valley decision and dynamic programming: I use the following two equations to calculate the costs:   peak peak N n peakpeak nPossC 1 1 ]][[   valley valley N n valleyvalley nPossC 1 1 ]][[ Where the symbols are defined as follows: C peak= cost estimated at the peaks of o [. ]. C valley= cost estimated at the valley of o [. ]. N peak= total number of the peaks of the o [. ]. N valley= total number of the valleys of the o [. ]. Pos peak[n] = position of the nth peak of o [. ]. Pos valley[n] = position of the nth valley of o [. ]. The peak-valley decision method is made as follows: If C peak > C valley then the positive part (peak) of s[. ] is adopted for the evaluation of the pitch mark. Otherwise, the negative part (valley) of s[. ] is adopted for the purpose. Once the peak or valley is determined, in my case say the peak is adopted for the pitch mark determination. The positions of the pitch mark are set by picking the peaks of the speech signal. The PSOLA (Pitch Synchronous Overlap and Add) method can be used to produce good quality of speech if the pitch is marked at the highest amplitude of the signal. To do the pitch mark we have to calculate the following two quantities. For the ith pitch period, Pi, suppose the highest and the second highest peaks are located at Li1 and Li2, respectively. The distortion of the pitch period, disi (j, k) and its accumulation, Ai (j) are defined as follows: disi (j, k) = ||Lij – L(i-1)k|- Pi|+ g(j, k), for i=2, 3...PN, Ai (j) = min {disi (j, 1) + Ai-1(1), disi(j, 2) + Ai-1(2)}, for i=2, 3….PN, Where PN is the total number of pitch period and j, k= 1, 2; g (j, k) is a penalty function, defined by g(j, k) = 0, if j=1 or k=1 otherwise, 1/PN
  • 4. Pitch Detection of Speech Synthesis by Using Matlab www.iosrjournals.org 45 | Page IV. Graphs of Simulation: Output response of the low pass filter: Input speech and Low pass filtered signal: 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.005 0.01 0.015 0.02 0.025 Normalized frequency (/) -- >> Magnitude-->> 0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 -0.05 -0.04 -0.03 -0.02 -0.01 0 0.01 0.02 Time(s) -- >> Magnitude-->> Input speech signal Fig 3: Response graph of the low pass filter . Fig4: Input speech signal. Non-linearly processed output from the correlation processing: 0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 -4 -3 -2 -1 0 1 2 x 10 -4 Time(s) -- >> Magnitude-->> Lowpass filtered speech signal 0 0.005 0.01 0.015 0.02 0.025 0.03 -1 -0.5 0 0.5 1 Time (s) Amplitude Waveform -0.02 -0.015 -0.01 -0.005 0 0.005 0.01 0.015 0.02 -0.5 0 0.5 1 Lag (s) Correlationcoef Auto-correlation Fig5: speech signal after passing through LPF Fig6: nonlinearly processed signal and correlator output After the correlator block I get the pitch period (f0) of the speech signal. In my work, I get an f0= 155.3398 Hz for a speech signal which I am used in my on-going work. First figure shows the response of the band pass filter that I am used as my adaptive filter to transform the speech signal into a sin like wave. The speech signal that I am getting after the band pass filter is like smooth sine curves with the high frequency component are cut out. Filtered speech is now passing through a peak-valley decision block to find out which part of the speech is suitable for pitch mark estimation. In fig8 and fig9, the red asterisks denote the positive peaks and the green asterisks denote the valleys of the speech signal. After calculate the cost using my method I see that the positive peak is greater than the negative peak, so I use the positive peak of the speech signal to determine the positions of the pitch marks. Output response of band pass filter’s: Peak-Valley detection and pitch mark in a speech signal: 0 100 200 300 400 500 600 -3 -2.5 -2 -1.5 -1 -0.5 0 0.5 Filter Point ---->> FilterCoefficient---->> Band pass filter's response 0 100 200 300 400 500 600 -0.015 -0.01 -0.005 0 0.005 0.01 IFFT length of Filter Signal ---->> IFFTCoefficientofFilterSignal---->> Input Speech Signal Valley Peak Fig7: band pass filter’s response Fig8: filtered signal with peak-valley detection
  • 5. Pitch Detection of Speech Synthesis by Using Matlab www.iosrjournals.org 46 | Page 0 100 200 300 400 500 600 -0.015 -0.01 -0.005 0 0.005 0.01 IFFT length of Filter Signal ---->> IFFTCoefficientofFilterSignal---->> Fig9: speech signal with its pitch mark V. Conclusion I have examined method for combining nonlinear processing of the speech signal with a standard correlation analysis to give correlation functions which have sharp peaks at the pitch period. I have seen that the non-linearity provide some degree of spectral flattening which enhance the periodicity peaks in the correlation function, and reduce the correlation peaks due to the formant structure of the waveform. In the following figures I show the difference between unprocessed and non-linearly processed signal: 0 0.005 0.01 0.015 0.02 0.025 0.03 -4 -2 0 2 x 10 -4 Time (s) Amplitude Waveform -0.02 -0.015 -0.01 -0.005 0 0.005 0.01 0.015 0.02 -0.5 0 0.5 1 Lag (s) Correlationcoef Auto-correlation 0 0.005 0.01 0.015 0.02 0.025 0.03 -1 -0.5 0 0.5 1 Time (s) Amplitude Waveform -0.02 -0.015 -0.01 -0.005 0 0.005 0.01 0.015 0.02 -0.5 0 0.5 1 Lag (s) Correlationcoef Auto-correlation Fig 10: Unprocessed speech signal Fig 11: Non linearly processed speech signal The output of the correlator section gives the pitch of the speech signal which is very important to set the pitch mark of a speech signal. Pitch mark is a reference point for the overlap between speech signals. If I set the pitch mark at the right position it will assure good quality of synthetic speech. To do that a peak valley decision method has been adopted to select either the positive or negative parts of the signal are being used, followed by a dynamic programming to set the pitch mark at the signal. Acknowledgments: Though what matters most in a journal are its contents, it is the parts of the whole introduction, methods used, graphs of simulation, simulation results obtained, conclusion etc. that make it an attractive proposition. I have pinned my hopes that the readers would appreciate this approach. I have been fortunate to get help and co-operation from many friends involved in this journal writing effort. Dr. Anish Deb, Reader & Professor in Calcutta University, Raja bazaar, and Anil Kumar Sharma, Asst. Prof of Abacus Institute of Engineering. & Management, Mogra, took the responsibility of creating the excellencies in formatting the journal paper somewhere. Many thanks to both of you. And lastly many thanks to Miss Moumita Bachaspati who cheered me in good times, encouraged me in bad times, understood me at all times and for listening to my dreams.
  • 6. Pitch Detection of Speech Synthesis by Using Matlab www.iosrjournals.org 47 | Page References: [1] M. M. Sondhi, “New methods of pitch extraction,” IEEE Trans. Audio Electro Acoust., vol. AU-16, pp. 262-266, June 1968. [2] J. D. Markel, “The SIFT algorithm for fundamental frequency estimation,” IEEE Trans. Audio Electro Acoust., vol. AU-20, pp. 367-377, Dec. 1972. [3] L. R. Rabiner, M. R. Sambur, and C. E. Schmidt, “Applications of a nonlinear smoothing algorithm to speech processing,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-23, PP. 552-557, Dec. 1975. [4] C. A. McGonegal, L. R. Rabiner, and A. E. Rosenberg, “A semiautomatic pitch detector (SAPD),” IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-23, pp. 570-574, Dec. 1975. [5] J. J. Dubnowski, R. W. Schafer, and L. R. Rabiner, “Real-time digital hardware pitch detector,” IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-24, pp. 2-8, Feb. 1976. [6] L. R. Rabiner, M. J. Cheng, A. E. Rosenberg, and C. A. McGonegal, “A comparative performance study of several pitch detection algorithms,” IEEE Trans. Acoust., Speech, and Signal Processing, vol. ASSP-24, pp. 399-418, oct. 1976. [7] Lawrence R. Rabiner, Fellow, IEEE, “On the use of Autocorrelation Analysis for pitch Detection,” IEEE Transaction on Acoustics, Speech, and Signal Processing, vol. ASSP-25, No.1, Feb. 1977. [8] Jau-Hung Chen and Yung-An Kao, “Pitch Marking Based on an Adaptable Filter and a Peak-Valley Estimation Method,” computational linguistics and Chinese language processing, vol. 6, No.2, pp. 31-42, Feb. 2001, [9] Pedro M. Carvalho, Luis C. Oliveira, Isabel M. Trancoso, M. Ceu Viana*, “Concatenative Speech Synthesis for European Portuguese,” INESC/IST, *CLUL INESC, Rua Alves Redol, 9, 1000 Lisboa, PORTUGAL {Pedro.Carvalho, Luis.Oliveira, Isabel.Trancoso, mcv}@inesc.pt. [10] Ramesh Babu, “Digital Signal Processing”.