Computational Approaches to Melodic Analysis of Indian Art Music
Indian Institute of Science, Bengaluru, India, 2016
Sankalp Gulati
Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
Melodic description: Tonic, Melody, Intonation, Raga, Motifs, Similarity
Tonic Identification
[Figure: spectrogram (time in s vs. frequency in Hz) and multi-pitch histogram (frequency in bins, 1 bin = 10 cents, ref. 55 Hz, vs. normalized salience) with peaks f2 to f6 and the tonic marked.]
Signal processing / Learning
- Tanpura / drone background sound
- Extent of gamakas on Sa and Pa svara
- Vadi, sam-vadi svara of the rāga
Gulati, S., Bellur, A., Salamon, J., Ranjani, H., Ishwar, V., Murthy, H. A., & Serra, X. (2014). Automatic tonic identification in Indian art music: approaches and evaluation. Journal of New Music Research, 43(1), 55–73.
Salamon, J., Gulati, S., & Serra, X. (2012). A multipitch approach to tonic identification in Indian classical music. In Proc. of Int. Conf. on Music Information Retrieval (ISMIR) (pp. 499–504), Porto, Portugal.
Bellur, A., Ishwar, V., Serra, X., & Murthy, H. (2012). A knowledge based signal processing approach to tonic identification in Indian classical music. In 2nd CompMusic Workshop (pp. 113–118), Istanbul, Turkey.
Ranjani, H. G., Arthi, S., & Sreenivas, T. V. (2011). Carnatic music analysis: Shadja, swara identification and raga verification in Alapana using stochastic models. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (pp. 29–32), New Paltz, NY.
Accuracy: ~90%
Tonic Identification: Multipitch Approach
- Audio example: vocals, drone
- Utilizing drone sound
- Multi-pitch analysis
Salamon, J., Gómez, E., & Bonada, J. (2011, September). Sinusoid extraction and salience function design for predominant melody estimation. In Proc. 14th Int. Conf. on Digital Audio Effects (DAFX-11) (pp. 73–80), Paris, France.
Tonic Identification: Block Diagram
Audio
  -> Sinusoid extraction: STFT -> Spectral peak picking -> Frequency/amplitude correction
  -> Sinusoids
  -> Salience function computation: Bin salience mapping -> Harmonic summation
  -> Time-frequency salience
  -> Tonic candidate generation: Salience peak picking -> Multi-pitch histogram -> Histogram peak picking
  -> Tonic candidates
Tonic Identification: Signal Processing
- STFT
  - Hop size: 11 ms
  - Window length: 46 ms
  - Window type: Hamming
  - FFT size: 8192 points
- Spectral peak picking
  - Absolute threshold: -60 dB
- Frequency/amplitude correction
  - Parabolic interpolation
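The three sinusoid-extraction steps above (windowed FFT of a frame, spectral peak picking, parabolic correction of frequency and amplitude) can be sketched for a single frame as follows. This is only an illustration under stated assumptions: the 44.1 kHz sampling rate, the 2048-sample window (about 46 ms at that rate) and the helper name `frame_peaks` are not from the slides, and the -60 dB threshold is applied relative to the strongest bin of the frame.

```python
import numpy as np

FS = 44100          # assumed sampling rate (not stated on the slides)
WIN = 2048          # about 46 ms at 44.1 kHz
NFFT = 8192         # zero-padded FFT size from the slides
THRESH_DB = -60.0   # peak threshold, taken here relative to the frame maximum


def frame_peaks(frame):
    """Return (frequencies in Hz, magnitudes in dB) of the spectral peaks of one frame."""
    windowed = frame * np.hamming(len(frame))
    mag = np.abs(np.fft.rfft(windowed, NFFT))
    mag_db = 20.0 * np.log10(mag + 1e-12)
    mag_db -= mag_db.max()

    # Local maxima above the threshold (first and last bins excluded).
    k = np.arange(1, len(mag_db) - 1)
    is_peak = (mag_db[k] > mag_db[k - 1]) & (mag_db[k] >= mag_db[k + 1]) & (mag_db[k] > THRESH_DB)
    k = k[is_peak]

    # Parabolic interpolation through the peak bin and its two neighbours.
    a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
    offset = 0.5 * (a - c) / (a - 2.0 * b + c)
    freqs = (k + offset) * FS / NFFT
    amps = b - 0.25 * (a - c) * offset
    return freqs, amps


# One frame of a 220 Hz sinusoid: the strongest corrected peak should sit near 220 Hz.
t = np.arange(WIN) / FS
freqs, amps = frame_peaks(np.sin(2 * np.pi * 220.0 * t))
strongest = freqs[np.argmax(amps)]
```

Weaker peaks returned alongside the main one are window side lobes that still exceed the threshold; a real system would pass all peaks on to the salience computation.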
- Harmonic summation
  - Spectrum considered: 55-7200 Hz
  - Frequency range: 55-1760 Hz
  - Base frequency: 55 Hz
  - Bin resolution: 10 cents per bin (120 per octave)
  - N octaves: 5
  - Maximum harmonics: 20
  - Square cosine window across 50 cents
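A minimal sketch of the bin mapping and harmonic summation with the parameters listed above. Several details are assumptions rather than values from the slides: the harmonic weighting `ALPHA`, the reading of "across 50 cents" as a squared-cosine window reaching 50 cents to each side, and the use of linear peak magnitudes as weights. The 55-7200 Hz spectrum limit is not modelled.

```python
import numpy as np

REF_HZ = 55.0               # base frequency from the slides
BINS_PER_OCT = 120          # 10 cents per bin
N_BINS = 5 * BINS_PER_OCT   # 5 octaves: 55 Hz to 1760 Hz
N_HARM = 20                 # maximum number of harmonics
ALPHA = 0.8                 # hypothetical harmonic weighting, not given on the slides
HALF_WIDTH = 5              # bins; cos^2 window assumed to reach +/- 50 cents


def hz_to_bin(f):
    """Map a frequency in Hz to a (fractional) 10-cent bin index above REF_HZ."""
    return BINS_PER_OCT * np.log2(np.asarray(f, dtype=float) / REF_HZ)


def salience(peak_freqs, peak_mags):
    """Harmonic-summation salience of one frame from its spectral peaks."""
    sal = np.zeros(N_BINS)
    bins = np.arange(N_BINS)
    for f, m in zip(peak_freqs, peak_mags):
        for h in range(1, N_HARM + 1):
            centre = hz_to_bin(f / h)   # the peak is treated as the h-th harmonic of f/h
            if centre < 0 or centre >= N_BINS:
                continue
            dist = np.abs(bins - centre)
            window = np.where(dist <= HALF_WIDTH,
                              np.cos(dist * np.pi / (2 * HALF_WIDTH)) ** 2, 0.0)
            sal += window * (ALPHA ** (h - 1)) * m
    return sal


# Five partials of a 220 Hz tone with decaying amplitudes.
partials = 220.0 * np.arange(1, 6)
s = salience(partials, 1.0 / np.arange(1, 6))
best_bin = int(np.argmax(s))   # bin of the most salient pitch candidate
```

With these settings 220 Hz lies two octaves above the 55 Hz reference, so its bin index is 240; sub-octave candidates such as 110 Hz also collect salience, which is why the later stages have to resolve the octave.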
- Tonic candidate generation
  - Number of salience peaks per frame: 5
  - Frequency range: 110-550 Hz
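The candidate-generation step can be pictured as a histogram over pitch bins that collects, for every frame, the strongest salience peaks within 110-550 Hz. The sketch below assumes a salience matrix of shape (frames, bins) on the 10-cent grid referenced to 55 Hz; counting each peak once (rather than weighting it by its salience) and the function name `multipitch_histogram` are assumptions made for the example.

```python
import numpy as np

REF_HZ = 55.0
BINS_PER_OCT = 120
LO_BIN = BINS_PER_OCT                                        # 110 Hz
HI_BIN = int(round(BINS_PER_OCT * np.log2(550.0 / REF_HZ)))  # 550 Hz
PEAKS_PER_FRAME = 5


def multipitch_histogram(salience_frames):
    """Count, per bin, how often it is among the top salience peaks of a frame."""
    n_bins = salience_frames.shape[1]
    hist = np.zeros(n_bins)
    k = np.arange(max(LO_BIN, 1), min(HI_BIN, n_bins - 2) + 1)
    for sal in salience_frames:
        is_peak = (sal[k] > sal[k - 1]) & (sal[k] >= sal[k + 1])
        peaks = k[is_peak]
        # Keep the most salient peaks of this frame.
        top = peaks[np.argsort(sal[peaks])[::-1][:PEAKS_PER_FRAME]]
        hist[top] += 1.0   # counting only; weighting by salience would be another option
    return hist / hist.max() if hist.max() > 0 else hist


# Toy input: a peak at bin 240 in every frame and one 700 cents higher in six frames.
frames = np.zeros((10, 600))
frames[:, 240] = 1.0
frames[:6, 310] = 0.6
hist = multipitch_histogram(frames)
candidates = np.argsort(hist)[::-1][:2]   # bins of the two highest histogram peaks
```

The peaks of this histogram are the tonic candidates passed on to the classification stage.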
Tonic Identification: Feature Extraction
- Identifying the tonic in the correct octave using the multi-pitch histogram
- Classification-based template learning
- The class of an instance is the rank of the tonic
[Figure: Multipitch Histogram. x-axis: frequency bins (1 bin = 10 cents, ref. 55 Hz); y-axis: normalized salience; peaks labelled f2, f3, f4, f5.]
Tonic Identification: Classification
- Decision tree:
[Figure: decision tree with internal nodes f2, f3, f2, f3, f5; split thresholds 5, -7, -11, 5, -6; leaves giving the rank of the tonic (1st to 5th).]
[Figure: two salience vs. frequency sketches with peaks labelled Sa, Sa, Pa.]
Tonic Identification: Results
S. Gulati, A. Bellur, J. Salamon, H. Ranjani, V. Ishwar, H.A. Murthy, and X. Serra. Automatic tonic
identification in Indian art music: approaches and evaluation. Journal of New Music Research, 43(01):
55–73, 2014.
Predominant Pitch Estimation
Pitch Estimation Algorithms
- Time-domain approaches
  - ACF-based (Rabiner, 1977)
  - AMDF-based (YIN) (de Cheveigné and Kawahara, 2002)
- Frequency-domain approaches
  - Two-way mismatch (Maher and Beauchamp, 1994)
  - Subharmonic summation (Hermes, 1988)
- Multi-pitch approaches
  - Source separation-based (Klapuri, 2003)
  - Harmonic summation (Melodia) (Salamon and Gómez, 2012)
Rabiner, L. (1977, February). On the use of autocorrelation analysis for pitch detection. IEEE Transactions on Acoustics, Speech, and Signal Processing, 25(1), 24–33.
De Cheveigné, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4), 1917–1930.
Medan, Y., & Yair, E. (1991). Super resolution pitch determination of speech signals. IEEE Transactions on Signal Processing, 39(1), 40–48.
Maher, R., & Beauchamp, J. W. (1994). Fundamental frequency estimation of musical signals using a two-way mismatch procedure. The Journal of the Acoustical Society of America, 95(April), 2254–2263.
Hermes, D. (1988). Measurement of pitch by subharmonic summation. The Journal of the Acoustical Society of America, 83, 257–264.
Klapuri, A. (2003, November). Multiple fundamental frequency estimation based on harmonicity and spectral smoothness. IEEE Transactions on Speech and Audio Processing, 11(6), 804–816.
Salamon, J., & Gómez, E. (2012, August). Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
Pitch Estimation Algorithms
q  Time-domain approaches
§  ACF-based (Rabiner 1977)
§  AMDF-based (YIN) Cheveigné et al.
q  Frequency-domain approaches
§  Two-way mismatch (Maher and
Beauchamp 1994)
§  Subharmonic summation (Hermes 1988)
Rabiner, L. (1977, February). On the use of autocorrelation analysis for pitch detection. IEEE Transactions on Acoustics, Speech, and Signal
Processing, 25(1), 24–33
De Cheveigné, A., and Kawahara, H., "YIN, a fundamental frequency estimator for speech and music." The Journal of the Acoustical
Society of America 111, no. 4 (2002): 1917-1930.
§  Multi-pitch approaches
§  Source separation-based (Klapuri, 2003)
§  Harmonic summation (Melodia) (Salamon
and Gómez, 2012)
Medan, Y., & Yair, E. (1991). Super resolution pitch determination of speech signals. IEEE transactions on signal processing, 39(1), 40–48.
Maher, R., & Beauchamp, J. W. (1994). Fundamental frequency estimation of musical signals using a two-way mismatch procedure. The
Journal of the Acoustical Society of , 95 (April), 2254–2263.
Hermes, D. (1988, 1988). Measurement of pitch by subharmonic summation. Journal of the Acoustical Society of America, 83, 257 - 264.
Klapuri, A. (2003b, November). Multiple fundamental frequency estimation based on harmonicity and spectral smoothness. IEEE
Transactions on Speech and Audio Processing, 11(6), 804–816.
Salamon, J., & Gómez, E. (2012, August). Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics.
IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
Predominant Pitch Estimation: YIN
[Figure panels: signal, auto-correlation, difference function, cumulative difference function; x-axis: lag (samples).]

The present article introduces a method for F0 estimation that produces fewer errors than other well-known methods. The name YIN (from "yin" and "yang" of oriental philosophy) alludes to the interplay between autocorrelation and cancellation that it involves. This article is the first of a [...]

$$r_t(\tau) = \sum_{j=t+1}^{t+W} x_j x_{j+\tau}, \qquad (1)$$

where $r_t(\tau)$ is the autocorrelation function of lag $\tau$ calculated at time index $t$, and $W$ is the integration window size. This function is illustrated in Fig. 1(b) for the signal plotted in Fig. 1(a). It is common in signal processing to use a slightly different definition:

$$r'_t(\tau) = \sum_{j=t+1}^{t+W-\tau} x_j x_{j+\tau}. \qquad (2)$$

Here the integration window size shrinks with increasing values of $\tau$, with the result that the envelope of the function decreases as a function of lag as illustrated in Fig. 1(c). [...]

[...] The parameter $\tau_{\max}$ allows the algorithm to be biased to favor one form of error at the expense of the other, with a minimum of total error for intermediate values. Using Eq. (2) rather than Eq. (1) introduces a natural bias that can be tuned by adjusting $W$. However, changing the window size has other effects, and one can argue that a bias of this sort, if useful, should be applied explicitly rather than implicitly. This is one reason to prefer the definition of Eq. (1).

The autocorrelation method compares the signal to its shifted self. In that sense it is related to the AMDF method (average magnitude difference function, Ross et al., 1974; Ney, 1982) that performs its comparison using differences rather than products, and more generally to time-domain methods that measure intervals between events in time (Hess, 1983). The ACF is the Fourier transform of the power spectrum, and can be seen as measuring the regular spacing of harmonics within that spectrum. The cepstrum method (Noll, 1967) replaces the power spectrum by the log magnitude spectrum and thus puts less weight on high-amplitude parts of the spectrum (particularly near the first formant that often dominates the ACF). Similar "spectral whitening" effects can be obtained by linear predictive inverse filtering or center-clipping (Rabiner and Schafer, 1978), or by splitting the signal over a bank of filters, calculating ACFs within each channel, and adding the results after amplitude normalization (de Cheveigné, 1991). Auditory models based on autocorrelation are currently one of the more popular ways to [...]

The same is true after taking the square and averaging over a window:

$$\sum_{j=t+1}^{t+W} (x_j - x_{j+T})^2 = 0. \qquad (5)$$

Conversely, an unknown period may be found by forming the difference function:

$$d_t(\tau) = \sum_{j=1}^{W} (x_j - x_{j+\tau})^2, \qquad (6)$$

and searching for the values of $\tau$ for which the function is zero. There is an infinite set of such values, all multiples of the period. The difference function calculated from the signal in Fig. 1(a) is illustrated in Fig. 3(a). The squared sum may [...]

FIG. 1. (a) Example of a speech waveform. (b) Autocorrelation function (ACF) calculated from the waveform in (a) according to Eq. (1). (c) Same, calculated according to Eq. (2). The envelope of this function is tapered to zero because of the smaller number of terms in the summation at larger $\tau$. The horizontal arrows symbolize the search range for the period.

FIG. 2. F0 estimation error rates as a function of the slope of the envelope of the ACF, quantified by its intercept with the abscissa. The dotted line represents errors for which the F0 estimate was too high, the dashed line those for which it was too low, and the full line their sum. Triangles at the right represent error rates for ACF calculated as in Eq. (1) ($\tau_{\max} = \infty$). These rates were measured over a subset of the database used in Sec. III.

FIG. 3. (a) Difference function calculated for the speech signal of Fig. 1(a). (b) Cumulative mean normalized difference function. Note that the function starts at 1 rather than 0 and remains high until the dip at the period.

TABLE I. Gross error rates for the simple unbiased autocorrelation method (step 1), and for the cumulated steps described in the text. These rates were measured over a subset of the database used in Sec. III. Integration window size was 25 ms, window shift was one sample, search range was 40 to 800 Hz, and threshold (step 4) was 0.1.

Version   Gross error (%)
Step 1    10.0
Step 2    1.95
Step 3    1.69
Step 4    0.78
Step 5    0.77
Step 6    0.50
De Cheveigné, A., and Kawahara, H., "YIN, a fundamental frequency estimator for speech and music." The Journal of the
Acoustical Society of America 111, no. 4 (2002): 1917-1930.
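The difference function and the cumulative mean normalized difference function quoted above can be sketched in a few lines of NumPy. The window length (25 ms), search range (40 to 800 Hz) and threshold (0.1) are the values given in the Table I caption; the function names are illustrative, and the sketch stops at the thresholding step, leaving out the parabolic interpolation and best-local-estimate refinements of the full method.

```python
import numpy as np


def difference_function(x, w, tau_max):
    """d(tau) = sum_{j=1..W} (x_j - x_{j+tau})^2 for tau = 0 .. tau_max - 1."""
    d = np.zeros(tau_max)
    for tau in range(1, tau_max):
        diff = x[:w] - x[tau:tau + w]
        d[tau] = np.dot(diff, diff)
    return d


def cmnd(d):
    """Cumulative mean normalized difference: 1 at tau = 0, d(tau) / mean(d(1..tau)) otherwise."""
    out = np.ones_like(d)
    running = np.cumsum(d[1:])
    taus = np.arange(1, len(d))
    out[1:] = d[1:] * taus / np.maximum(running, 1e-12)
    return out


def yin_f0(x, fs, f_min=40.0, f_max=800.0, w_sec=0.025, threshold=0.1):
    """Estimate F0 of one frame from the first dip of the normalized function below the threshold."""
    w = int(w_sec * fs)
    tau_lo, tau_hi = int(fs / f_max), int(fs / f_min) + 1
    dn = cmnd(difference_function(x, w, tau_hi + 1))
    below = np.where(dn[tau_lo:tau_hi] < threshold)[0]
    if len(below) == 0:
        tau = tau_lo + int(np.argmin(dn[tau_lo:tau_hi]))    # fall back to the global minimum
    else:
        tau = tau_lo + int(below[0])
        while tau + 1 < tau_hi and dn[tau + 1] < dn[tau]:    # walk down to the bottom of the dip
            tau += 1
    return fs / tau


# A 200 Hz sinusoid sampled at 16 kHz has a period of exactly 80 samples.
fs = 16000
x = np.sin(2 * np.pi * 200.0 * np.arange(1024) / fs)
f0 = yin_f0(x, fs)
```

The frame must hold at least the window length plus the longest lag searched (here 400 + 401 samples), which is why the example uses 1024 samples.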
Predominant Pitch Estimation: Melodia
Processing stages: audio -> spectrogram -> spectral peaks -> time-frequency salience -> salience peaks -> contours -> predominant melody contours
Salamon, J., & Gómez, E. (2012, August). Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
Essentia implementation of Melodia
Intermediate outputs shown: audio -> spectrogram -> spectral peaks -> time-frequency salience -> salience peaks -> all contours -> predominant melody contours
What about loudness and timbre?
Loudness features in Essentia
Loudness of predominant voice
[Figure: time-frequency plot (axes: time, frequency) with the F0 track marked.]
Loudness of predominant voice: example
Spectral centroid of predominant voice
CompMusic: Dunya
[Diagram labels: API, Internet]
CompMusic: Dunya Web
CompMusic: Dunya API
https://github.com/MTG/pycompmusic
Dunya API Examples
- Metadata
- Features

 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...PsychoTech Services
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingTeacherCyreneCayanan
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 

Último (20)

INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 

[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music

  • 1. Computational Approaches to Melodic Analysis of Indian Art Music Indian Institute of Sciences, Bengaluru, India 2016 Sankalp Gulati Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
  • 4. Tonic Identification. [Figure: spectrogram, time (s) vs. frequency (Hz); multipitch histogram, normalized salience vs. frequency (bins, 1 bin = 10 cents, ref = 55 Hz), with peaks f2 to f6 and the tonic marked.] Approaches: signal processing; learning. Cues: tanpura / drone background sound; extent of gamakas on Sa and Pa svara; vadi and samvadi svara of the rāga. Accuracy: ~90%. References: S. Gulati, A. Bellur, J. Salamon, H. Ranjani, V. Ishwar, H.A. Murthy, and X. Serra. Automatic tonic identification in Indian art music: approaches and evaluation. Journal of New Music Research, 43(01):55–73, 2014. Salamon, J., Gulati, S., & Serra, X. (2012). A multipitch approach to tonic identification in Indian classical music. In Proc. of Int. Conf. on Music Information Retrieval (ISMIR) (pp. 499–504), Porto, Portugal. Bellur, A., Ishwar, V., Serra, X., & Murthy, H. (2012). A knowledge based signal processing approach to tonic identification in Indian classical music. In 2nd CompMusic Workshop (pp. 113–118), Istanbul, Turkey. Ranjani, H. G., Arthi, S., & Sreenivas, T. V. (2011). Carnatic music analysis: Shadja, swara identification and raga verification in Alapana using stochastic models. Applications of Signal Processing to Audio and Acoustics (WASPAA), IEEE Workshop, 29–32, New Paltz, NY.
  • 5. Tonic Identification: Multipitch Approach. Audio example. Utilizing drone sound. Multi-pitch analysis. [Figure labels: vocals, drone.] Reference: J. Salamon, E. Gómez, and J. Bonada. Sinusoid extraction and salience function design for predominant melody estimation. In Proc. 14th Int. Conf. on Digital Audio Effects (DAFX-11), pages 73–80, Paris, France, Sep. 2011.
  • 6. Tonic Identification: Block Diagram. Audio → Sinusoid extraction (STFT → spectral peak picking → frequency/amplitude correction) → sinusoids → Salience function computation (bin salience mapping → harmonic summation) → time-frequency salience → Tonic candidate generation (salience peak picking → multi-pitch histogram → histogram peak picking) → tonic candidates
  • 7. Tonic Identification: Signal Processing. STFT: hop size 11 ms; window length 46 ms; window type Hamming; FFT size 8192 points.
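The STFT settings above are given in milliseconds, so the sample counts depend on the sample rate, which the slide does not state. The following is a minimal sketch that assumes 44.1 kHz audio; the function names `stft_frame_params` and `hamming` are illustrative and not taken from the tutorial.

```python
import math

def stft_frame_params(sample_rate, hop_ms=11.0, window_ms=46.0, fft_size=8192):
    """Convert the slide's STFT settings from milliseconds to samples.

    sample_rate is an assumption supplied by the caller; the slide gives
    only the durations and the FFT size.
    """
    hop = round(sample_rate * hop_ms / 1000.0)
    window = round(sample_rate * window_ms / 1000.0)
    return {
        "hop_samples": hop,
        "window_samples": window,
        # Spacing between adjacent FFT bins in Hz.
        "bin_spacing_hz": sample_rate / fft_size,
        # How much longer the FFT is than the analysis window (zero padding).
        "zero_padding_factor": fft_size / window,
    }

def hamming(n):
    """Symmetric Hamming window of length n (standard 0.54 / 0.46 form)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]
```

Under the 44.1 kHz assumption, `stft_frame_params(44100)` gives a window of roughly 2000 samples zero-padded to 8192 points, so the FFT bins are spaced much more finely than the window length alone would give.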
  • 8. Tonic Identification: Signal Processing. Spectral peak picking: absolute threshold -60 dB.
  • 9. Tonic Identification: Signal Processing. Frequency/amplitude correction: parabolic interpolation.
  • 10. Tonic Identification: Signal Processing. Harmonic summation: spectrum considered 55-7200 Hz; frequency range 55-1760 Hz; base frequency 55 Hz; bin resolution 10 cents per bin (120 bins per octave); number of octaves 5; maximum harmonics 20; cosine-squared window across 50 cents. [Block diagram stages: bin salience mapping, harmonic summation.]
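The harmonic summation settings above can be sketched as follows. Each spectral peak adds energy to the bins of its possible fundamentals (the peak frequency divided by h, for h = 1 to 20), spread with a cosine-squared window. This is a sketch under stated assumptions, not the tutorial's implementation: the harmonic weight `ALPHA`, the reading of "across 50 cents" as a half-width of 50 cents, and the placement of bin b at b × 10 cents are all assumptions.

```python
import math

REF_HZ = 55.0            # base frequency (from the slide)
CENTS_PER_BIN = 10       # bin resolution (from the slide)
N_BINS = 600             # 5 octaves x 120 bins per octave (55-1760 Hz)
MAX_HARMONICS = 20       # from the slide
HALF_WIDTH_CENTS = 50.0  # assumption: window half-width
ALPHA = 0.8              # assumption: weight decay per harmonic number

def hz_to_cents(freq_hz):
    """Distance in cents above the 55 Hz reference."""
    return 1200.0 * math.log2(freq_hz / REF_HZ)

def salience(peaks):
    """Harmonic-summation salience for one frame.

    peaks: iterable of (frequency_hz, linear_magnitude) spectral peaks.
    Returns a list of N_BINS salience values.
    """
    sal = [0.0] * N_BINS
    half_bins = int(HALF_WIDTH_CENTS // CENTS_PER_BIN)
    for freq, mag in peaks:
        for h in range(1, MAX_HARMONICS + 1):
            cents = hz_to_cents(freq / h)  # candidate fundamental
            if cents < 0:
                break  # below 55 Hz; higher h only goes lower
            centre = int(cents // CENTS_PER_BIN)
            for b in range(centre - half_bins, centre + half_bins + 1):
                if not 0 <= b < N_BINS:
                    continue
                dist = abs(b * CENTS_PER_BIN - cents)
                if dist < HALF_WIDTH_CENTS:
                    weight = math.cos(dist / HALF_WIDTH_CENTS * math.pi / 2) ** 2
                    sal[b] += weight * ALPHA ** (h - 1) * mag
    return sal
```

For a harmonic tone at 220 Hz, the largest salience value falls in the bin 2400 cents above 55 Hz, because every partial maps back to that fundamental while the sub-octave candidates receive only the down-weighted higher harmonic numbers.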
  • 11. Tonic Identification: Signal Processing. Tonic candidate generation: number of salience peaks per frame 5; frequency range 110-550 Hz. [Block diagram stage: multi-pitch histogram.]
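Tonic candidate generation, as summarized above, keeps the five strongest salience peaks per frame within 110-550 Hz and accumulates them over time into a multi-pitch histogram. A minimal sketch follows. It assumes each frame is a list of salience values on the 10-cent grid referenced to 55 Hz, and that the histogram counts peak occurrences rather than summing salience; neither detail is stated on the slide, and the function names are illustrative.

```python
import math

REF_HZ = 55.0
CENTS_PER_BIN = 10

def hz_to_bin(freq_hz):
    """Index of the 10-cent bin that contains freq_hz (reference 55 Hz)."""
    return int(1200.0 * math.log2(freq_hz / REF_HZ) // CENTS_PER_BIN)

def frame_peaks(salience, lo_bin, hi_bin, max_peaks=5):
    """Bins of the strongest local maxima of one frame within [lo_bin, hi_bin]."""
    first = max(lo_bin, 1)
    last = min(hi_bin, len(salience) - 2)
    peaks = [b for b in range(first, last + 1)
             if salience[b] > salience[b - 1] and salience[b] >= salience[b + 1]]
    peaks.sort(key=lambda b: salience[b], reverse=True)
    return peaks[:max_peaks]

def multipitch_histogram(frames, f_min=110.0, f_max=550.0, n_bins=600):
    """Count how often each bin is among the top salience peaks of a frame."""
    lo, hi = hz_to_bin(f_min), hz_to_bin(f_max)
    hist = [0] * n_bins
    for salience in frames:
        for b in frame_peaks(salience, lo, hi):
            hist[b] += 1  # assumption: count occurrences, ignore peak height
    return hist
```

The peaks of the resulting histogram are the tonic candidates that the classification stage then ranks.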
  • 12. Tonic Identification: Feature Extraction. Identifying the tonic in the correct octave using the multi-pitch histogram; classification-based template learning; the class of an instance is the rank of the tonic. [Figure: multipitch histogram, normalized salience vs. frequency bins (1 bin = 10 cents, ref: 55 Hz), with peaks f2 to f5.]
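  • 13. Tonic Identification: Classification. Decision tree. [Figure: decision tree splitting on the histogram peak features f2, f3 and f5, with branch conditions >5 / <=5, >-7 / <=-7, >-11 / <=-11, >5 / <=5 and >-6 / <=-6, and leaves labelled 1st to 5th; two inset sketches of salience vs. frequency with Sa, Sa and Pa peaks.]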
  • 14. Tonic Identification: Results S. Gulati, A. Bellur, J. Salamon, H. Ranjani, V. Ishwar, H.A. Murthy, and X. Serra. Automatic tonic identification in Indian art music: approaches and evaluation. Journal of New Music Research, 43(01): 55–73, 2014.
  • 16. Pitch Estimation Algorithms. Time-domain approaches: ACF-based (Rabiner, 1977); AMDF-based (YIN) (de Cheveigné and Kawahara, 2002). Frequency-domain approaches: two-way mismatch (Maher and Beauchamp, 1994); subharmonic summation (Hermes, 1988). Multi-pitch approaches: source separation-based (Klapuri, 2003); harmonic summation (Melodia) (Salamon and Gómez, 2012). References: Rabiner, L. (1977, February). On the use of autocorrelation analysis for pitch detection. IEEE Transactions on Acoustics, Speech, and Signal Processing, 25(1), 24–33. De Cheveigné, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4), 1917–1930. Medan, Y., & Yair, E. (1991). Super resolution pitch determination of speech signals. IEEE Transactions on Signal Processing, 39(1), 40–48. Maher, R., & Beauchamp, J. W. (1994). Fundamental frequency estimation of musical signals using a two-way mismatch procedure. The Journal of the Acoustical Society of America, 95 (April), 2254–2263. Hermes, D. (1988). Measurement of pitch by subharmonic summation. Journal of the Acoustical Society of America, 83, 257–264. Klapuri, A. (2003b, November). Multiple fundamental frequency estimation based on harmonicity and spectral smoothness. IEEE Transactions on Speech and Audio Processing, 11(6), 804–816. Salamon, J., & Gómez, E. (2012, August). Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
  • 18. Predominant Pitch Estimation: YIN. [Figure panels: signal; auto-correlation; difference function; cumulative difference function; horizontal axis: lag (samples).] Excerpts shown from De Cheveigné and Kawahara (2002):
      The present article introduces a method for F0 estimation that produces fewer errors than other well-known methods. The name YIN (from "yin" and "yang" of oriental philosophy) alludes to the interplay between autocorrelation and cancellation that it involves.
      Autocorrelation: $r_t(\tau) = \sum_{j=t+1}^{t+W} x_j x_{j+\tau}$ (1), where $r_t(\tau)$ is the autocorrelation function of lag $\tau$ calculated at time index $t$, and $W$ is the integration window size. It is common in signal processing to use a slightly different definition: $r'_t(\tau) = \sum_{j=t+1}^{t+W-\tau} x_j x_{j+\tau}$ (2). Here the integration window size shrinks with increasing values of $\tau$, with the result that the envelope of the function decreases as a function of lag.
      The parameter $\tau_{max}$ allows the algorithm to be biased to favor one form of error at the expense of the other, with a minimum of total error for intermediate values. Using Eq. (2) rather than Eq. (1) introduces a natural bias that can be tuned by adjusting $W$. However, changing the window size has other effects, and one can argue that a bias of this sort, if useful, should be applied explicitly rather than implicitly. This is one reason to prefer the definition of Eq. (1).
      The autocorrelation method compares the signal to its shifted self. In that sense it is related to the AMDF method (average magnitude difference function, Ross et al., 1974; Ney, 1982) that performs its comparison using differences rather than products, and more generally to time-domain methods that measure intervals between events in time (Hess, 1983). The ACF is the Fourier transform of the power spectrum, and can be seen as measuring the regular spacing of harmonics within that spectrum. The cepstrum method (Noll, 1967) replaces the power spectrum by the log magnitude spectrum and thus puts less weight on high-amplitude parts of the spectrum (particularly near the first formant that often dominates the ACF). Similar "spectral whitening" effects can be obtained by linear predictive inverse filtering or center-clipping (Rabiner and Schafer, 1978), or by splitting the signal over a bank of filters, calculating ACFs within each channel, and adding the results after amplitude normalization (de Cheveigné, 1991).
      Difference function: for a signal with period $T$, the squared difference summed over a window is zero: $\sum_{j=t+1}^{t+W} (x_j - x_{j+T})^2 = 0$ (5). Conversely, an unknown period may be found by forming the difference function $d_t(\tau) = \sum_{j=1}^{W} (x_j - x_{j+\tau})^2$ (6) and searching for the values of $\tau$ for which the function is zero. There is an infinite set of such values, all multiples of the period.
      FIG. 1. (a) Example of a speech waveform. (b) Autocorrelation function (ACF) calculated from the waveform in (a) according to Eq. (1). (c) Same, calculated according to Eq. (2). The envelope of this function is tapered to zero because of the smaller number of terms in the summation at larger $\tau$. The horizontal arrows symbolize the search range for the period.
      FIG. 2. F0 estimation error rates as a function of the slope of the envelope of the ACF, quantified by its intercept with the abscissa. The dotted line represents errors for which the F0 estimate was too high, the dashed line those for which it was too low, and the full line their sum. Triangles at the right represent error rates for ACF calculated as in Eq. (1) ($\tau_{max} = \infty$). These rates were measured over a subset of the database used in Sec. III.
      FIG. 3. (a) Difference function calculated for the speech signal of Fig. 1(a). (b) Cumulative mean normalized difference function. Note that the function starts at 1 rather than 0 and remains high until the dip at the period.
      TABLE I. Gross error rates for the simple unbiased autocorrelation method (step 1), and for the cumulated steps described in the text. These rates were measured over a subset of the database used in Sec. III. Integration window size was 25 ms, window shift was one sample, search range was 40 to 800 Hz, and threshold (step 4) was 0.1.
          Version    Gross error (%)
          Step 1     10.0
          Step 2     1.95
          Step 3     1.69
          Step 4     0.78
          Step 5     0.77
          Step 6     0.50
      Reference: De Cheveigné, A., and Kawahara, H., "YIN, a fundamental frequency estimator for speech and music." The Journal of the Acoustical Society of America 111, no. 4 (2002): 1917-1930.
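The difference function and its cumulative mean normalized form, both named on this slide, can be sketched in a few lines. The normalization used here, d(τ) divided by the running mean of d over lags 1 to τ with the value at lag 0 set to 1, follows the YIN paper; the function names and the simple threshold search are illustrative, and the later refinement steps of the paper (for example parabolic interpolation) are left out.

```python
def difference_function(x, window, max_lag):
    """d(tau) = sum over j of (x[j] - x[j + tau])^2 for tau = 0..max_lag.

    Requires len(x) >= window + max_lag.
    """
    return [sum((x[j] - x[j + tau]) ** 2 for j in range(window))
            for tau in range(max_lag + 1)]

def cumulative_mean_normalized(d):
    """Divide d(tau) by the mean of d(1..tau); the value at lag 0 is 1."""
    out = [1.0]
    running = 0.0
    for tau in range(1, len(d)):
        running += d[tau]
        out.append(d[tau] * tau / running if running > 0 else 1.0)
    return out

def yin_period(x, window, min_lag, max_lag, threshold=0.1):
    """Period estimate in samples: first dip below the threshold."""
    dn = cumulative_mean_normalized(difference_function(x, window, max_lag))
    for tau in range(min_lag, max_lag + 1):
        if dn[tau] < threshold:
            # Walk down to the bottom of this dip.
            while tau + 1 <= max_lag and dn[tau + 1] < dn[tau]:
                tau += 1
            return tau
    # No dip below the threshold: fall back to the global minimum.
    return min(range(min_lag, max_lag + 1), key=lambda t: dn[t])
```

With a sample rate fs, a search range of 40 to 800 Hz corresponds to `min_lag = fs // 800` and `max_lag = fs // 40`, and the estimated F0 is fs divided by the returned lag.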
  • 20. Predominant Pitch Estimation: Melodia. Reference: Salamon, J., & Gómez, E. (2012, August). Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
  • 22. Predominant Pitch Estimation: Melodia. Audio → spectrogram → spectral peaks
  • 23. Predominant Pitch Estimation: Melodia. Spectral peaks → time-frequency salience
  • 24. Predominant Pitch Estimation: Melodia. Time-frequency salience → salience peaks → contours
  • 25. Predominant Pitch Estimation: Melodia. Contours → predominant melody contours
  • 34. Essentia implementation of Melodia: audio → spectrogram
  • 36. Essentia implementation of Melodia: spectrogram → spectral peaks
  • 38. Essentia implementation of Melodia: spectral peaks → time-frequency salience
  • 40. Essentia implementation of Melodia: time-frequency salience → salience peaks
  • 42. Essentia implementation of Melodia: salience peaks → all contours
  • 44. Essentia implementation of Melodia: all contours → predominant melody contours
  • 50. What about loudness and timbre?
  • 54. Loudness of predominant voice. [Figure: frequency vs. time.]
  • 56. Loudness of predominant voice. [Figure: frequency vs. time, with F0 marked.]
  • 60. Loudness of predominant voice: example
  • 61. Spectral centroid of predominant voice
  • 66. Dunya API Examples: metadata; features