Computational Approaches to Melodic Analysis of Indian Art Music
Indian Institute of Science, Bengaluru, India, 2016
Sankalp Gulati
Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
[Figure: melodic description, comprising tonic, melody, intonation, raga, motifs, and similarity]
Tonic Identification
[Figure: time-frequency plot of an audio excerpt (time in s, frequency in Hz) and the corresponding multi-pitch histogram (normalized salience vs. frequency bins, 1 bin = 10 cents, ref = 55 Hz) with peaks f2 to f6 and the tonic marked]
Signal processing Learning
- Tanpura / drone background sound
- Extent of gamakas on the Sa and Pa svaras
- Vadi and samvadi svaras of the rāga
S. Gulati, A. Bellur, J. Salamon, H. Ranjani, V. Ishwar, H.A. Murthy, and X. Serra. Automatic tonic identification in Indian art music: approaches
and evaluation. Journal of New Music Research, 43(01):55–73, 2014.
Salamon, J., Gulati, S., & Serra, X. (2012). A multipitch approach to tonic identification in Indian classical music. In Proc. of Int. Conf. on Music
Information Retrieval (ISMIR) (pp. 499–504), Porto, Portugal.
Bellur, A., Ishwar, V., Serra, X., & Murthy, H. (2012). A knowledge based signal processing approach to tonic identification in Indian classical music. In 2nd
CompMusic Workshop (pp. 113–118) Istanbul, Turkey.
Ranjani, H. G., Arthi, S., & Sreenivas, T. V. (2011). Carnatic music analysis: Shadja, swara identification and raga verification in Alapana using stochastic
models. Applications of Signal Processing to Audio and Acoustics (WASPAA), IEEE Workshop , 29–32, New Paltz, NY.
Accuracy: ~90%
Tonic Identification: Multipitch Approach
- Audio example
- Utilizing drone sound
- Multi-pitch analysis
[Figure labels: Vocals, Drone]
J. Salamon, E. Gómez, and J. Bonada. Sinusoid extraction and salience function design for predominant melody
estimation. In Proc. 14th Int. Conf. on Digital Audio Effects (DAFX-11), pages 73–80, Paris, France, Sep. 2011.
Tonic Identification: Block Diagram
[Block diagram]
Audio
→ Sinusoid extraction: STFT → spectral peak picking → frequency/amplitude correction
→ Sinusoids
→ Salience function computation: bin salience mapping → harmonic summation
→ Time-frequency salience
→ Tonic candidate generation: salience peak picking → multi-pitch histogram → histogram peak picking
→ Tonic candidates
Tonic Identification: Signal Processing
- STFT
  - Hop size: 11 ms
  - Window length: 46 ms
  - Window type: Hamming
  - FFT size: 8192 points
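The STFT settings above are given in milliseconds, so the frame sizes in samples depend on the sample rate, which the slide does not state. The sketch below assumes 44.1 kHz (an assumption) and uses hypothetical helper names (`ms_to_samples`, `hamming`, `frame_starts`); it only derives the framing parameters and the analysis window, not the transform itself.

```python
import math

SAMPLE_RATE = 44100  # assumption: the slide does not state the sample rate


def ms_to_samples(ms, sample_rate=SAMPLE_RATE):
    """Convert a duration in milliseconds to the nearest whole number of samples."""
    return round(ms * sample_rate / 1000)


def hamming(n):
    """Symmetric Hamming window of length n (n > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


HOP = ms_to_samples(11)           # hop size: 11 ms
WIN = ms_to_samples(46)           # window length: 46 ms
FFT_SIZE = 8192                   # each frame is zero-padded up to this length
BIN_HZ = SAMPLE_RATE / FFT_SIZE   # spacing between FFT bins in Hz
window = hamming(WIN)


def frame_starts(n_samples):
    """Start index of every full analysis frame in a signal of n_samples."""
    return list(range(0, n_samples - WIN + 1, HOP))
```

Under this assumption the FFT is roughly four times longer than the window, so the zero-padding gives a finer frequency grid before peak picking.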
- Spectral peak picking
  - Absolute threshold: -60 dB
- Frequency/amplitude correction
  - Parabolic interpolation
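Parabolic interpolation refines a spectral peak by fitting a parabola through the peak bin and its two neighbours. The sketch below uses the standard three-point formula; the function names are hypothetical, and it assumes the magnitudes are in dB, which the slide does not specify.

```python
def parabolic_peak(mag_prev, mag_peak, mag_next):
    """Fit a parabola through three equally spaced magnitudes.

    Returns (offset, height): the peak position relative to the middle
    sample, in bins, and the interpolated peak magnitude.
    """
    denom = mag_prev - 2.0 * mag_peak + mag_next
    if denom == 0.0:
        # The three points are collinear, so there is nothing to refine.
        return 0.0, mag_peak
    offset = 0.5 * (mag_prev - mag_next) / denom
    height = mag_peak - 0.25 * (mag_prev - mag_next) * offset
    return offset, height


def refine_peak(spectrum_db, k, sample_rate, fft_size):
    """Corrected (frequency in Hz, magnitude) for a local maximum at bin k."""
    offset, height = parabolic_peak(
        spectrum_db[k - 1], spectrum_db[k], spectrum_db[k + 1]
    )
    return (k + offset) * sample_rate / fft_size, height
```

For a true local maximum the offset lies within half a bin of the peak bin, so the correction is a sub-bin adjustment of the frequency.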
- Harmonic summation
  - Spectrum considered: 55-7200 Hz
  - Frequency range: 55-1760 Hz
  - Base frequency: 55 Hz
  - Bin resolution: 10 cents per bin (120 per octave)
  - Number of octaves: 5
  - Maximum harmonics: 20
  - Squared cosine window across 50 cents
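The parameters above can be read as a mapping from frequency to 10-cent bins followed by a weighted sum over sub-harmonics. The sketch below is one possible reading, not the authors' implementation: the harmonic decay `ALPHA` is a hypothetical value that is not on the slide, bins are 0-based, the cosine window is taken to extend 50 cents either side of the target bin, and peak magnitudes are assumed to be linear.

```python
import math

REF_HZ = 55.0             # base frequency
CENTS_PER_BIN = 10        # 120 bins per octave
N_BINS = 5 * 120          # 5 octaves, 55-1760 Hz
MIN_HZ, MAX_HZ = 55.0, 7200.0   # spectrum considered
MAX_HARMONICS = 20
ALPHA = 0.8               # hypothetical per-harmonic decay, not from the slide
WINDOW_CENTS = 50         # assumed half-width of the squared cosine window


def hz_to_bin(freq_hz):
    """0-based index of the 10-cent bin containing freq_hz (ref = 55 Hz)."""
    return int(math.floor(1200.0 * math.log2(freq_hz / REF_HZ) / CENTS_PER_BIN))


def salience(peaks):
    """Harmonic summation for one frame.

    peaks: list of (frequency_hz, linear_magnitude) spectral peaks.
    Each peak at frequency f votes for the candidate fundamentals f/h.
    """
    sal = [0.0] * N_BINS
    half = WINDOW_CENTS // CENTS_PER_BIN
    for freq, mag in peaks:
        if not MIN_HZ <= freq <= MAX_HZ:
            continue
        for h in range(1, MAX_HARMONICS + 1):
            f0 = freq / h
            if f0 < REF_HZ:
                break
            center = hz_to_bin(f0)
            for b in range(max(0, center - half), min(N_BINS, center + half + 1)):
                delta = (b - center) / half          # in [-1, 1]
                weight = math.cos(delta * math.pi / 2) ** 2
                sal[b] += weight * (ALPHA ** (h - 1)) * mag
    return sal
```

With peaks at 220, 440, 660 and 880 Hz of equal magnitude, the largest salience falls in the bin for 220 Hz, because every partial contributes there with the least decay.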
- Tonic candidate generation
  - Number of salience peaks per frame: 5
  - Frequency range: 110-550 Hz
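Tonic candidate generation can be sketched as follows: in every frame, keep the five strongest salience peaks between 110 and 550 Hz and accumulate them into a multi-pitch histogram. The function names are hypothetical, and counting each peak once and normalizing by the maximum are assumptions; the slide does not say how peaks are weighted.

```python
import math

REF_HZ = 55.0  # bin 0 corresponds to 55 Hz, 10 cents per bin


def hz_to_bin(freq_hz):
    """0-based index of the 10-cent bin containing freq_hz."""
    return int(math.floor(120.0 * math.log2(freq_hz / REF_HZ)))


def frame_peaks(sal, lo_bin, hi_bin, n_peaks=5):
    """Bins of the n_peaks strongest local maxima of sal within [lo_bin, hi_bin]."""
    first = max(lo_bin, 1)
    last = min(hi_bin, len(sal) - 2)
    peaks = [b for b in range(first, last + 1)
             if sal[b] > sal[b - 1] and sal[b] >= sal[b + 1]]
    return sorted(peaks, key=lambda b: sal[b], reverse=True)[:n_peaks]


def multipitch_histogram(frames, lo_hz=110.0, hi_hz=550.0, n_peaks=5):
    """Accumulate per-frame salience peaks into a histogram scaled to a maximum of 1."""
    lo_bin, hi_bin = hz_to_bin(lo_hz), hz_to_bin(hi_hz)
    hist = [0.0] * len(frames[0])
    for sal in frames:
        for b in frame_peaks(sal, lo_bin, hi_bin, n_peaks):
            hist[b] += 1.0   # assumption: each peak counts once per frame
    top = max(hist)
    return [v / top for v in hist] if top > 0 else hist
```

The peaks of the resulting histogram are the tonic candidates; because the drone sounds throughout a recording, its pitches tend to accumulate the highest counts.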
Tonic Identification: Feature Extraction
- Identifying the tonic in the correct octave using the multi-pitch histogram
- Classification-based template learning
- The class of an instance is the rank of the tonic
[Figure: multipitch histogram (normalized salience vs. frequency bins, 1 bin = 10 cents, ref = 55 Hz) with peaks f2 to f5 marked]
Tonic Identification: Classification
- Decision tree:
[Figure: decision tree over the features f2, f3 and f5, with split thresholds 5, -7, -11, 5 and -6; each leaf is the rank of the tonic (1st to 5th)]
[Figure: two salience vs. frequency sketches, each with Sa, Sa and Pa peaks]
Tonic Identification: Results
S. Gulati, A. Bellur, J. Salamon, H. Ranjani, V. Ishwar, H.A. Murthy, and X. Serra. Automatic tonic
identification in Indian art music: approaches and evaluation. Journal of New Music Research, 43(01):
55–73, 2014.
Predominant Pitch Estimation
Pitch Estimation Algorithms
- Time-domain approaches
  - ACF-based (Rabiner 1977)
  - AMDF-based (YIN) (de Cheveigné and Kawahara 2002)
- Frequency-domain approaches
  - Two-way mismatch (Maher and Beauchamp 1994)
  - Subharmonic summation (Hermes 1988)
- Multi-pitch approaches
  - Source separation-based (Klapuri 2003)
  - Harmonic summation (Melodia) (Salamon and Gómez 2012)
Rabiner, L. (1977, February). On the use of autocorrelation analysis for pitch detection. IEEE Transactions on Acoustics, Speech, and Signal Processing, 25(1), 24–33.
De Cheveigné, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4), 1917–1930.
Medan, Y., & Yair, E. (1991). Super resolution pitch determination of speech signals. IEEE Transactions on Signal Processing, 39(1), 40–48.
Maher, R., & Beauchamp, J. W. (1994). Fundamental frequency estimation of musical signals using a two-way mismatch procedure. The Journal of the Acoustical Society of America, 95(4), 2254–2263.
Hermes, D. (1988). Measurement of pitch by subharmonic summation. The Journal of the Acoustical Society of America, 83, 257–264.
Klapuri, A. (2003, November). Multiple fundamental frequency estimation based on harmonicity and spectral smoothness. IEEE Transactions on Speech and Audio Processing, 11(6), 804–816.
Salamon, J., & Gómez, E. (2012, August). Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
Pitch Estimation Algorithms
q  Time-domain approaches
§  ACF-based (Rabiner 1977)
§  AMDF-based (YIN) Cheveigné et al.
q  Frequency-domain approaches
§  Two-way mismatch (Maher and
Beauchamp 1994)
§  Subharmonic summation (Hermes 1988)
Rabiner, L. (1977, February). On the use of autocorrelation analysis for pitch detection. IEEE Transactions on Acoustics, Speech, and Signal
Processing, 25(1), 24–33
De Cheveigné, A., and Kawahara, H., "YIN, a fundamental frequency estimator for speech and music." The Journal of the Acoustical
Society of America 111, no. 4 (2002): 1917-1930.
§  Multi-pitch approaches
§  Source separation-based (Klapuri, 2003)
§  Harmonic summation (Melodia) (Salamon
and Gómez, 2012)
Medan, Y., & Yair, E. (1991). Super resolution pitch determination of speech signals. IEEE transactions on signal processing, 39(1), 40–48.
Maher, R., & Beauchamp, J. W. (1994). Fundamental frequency estimation of musical signals using a two-way mismatch procedure. The
Journal of the Acoustical Society of , 95 (April), 2254–2263.
Hermes, D. (1988, 1988). Measurement of pitch by subharmonic summation. Journal of the Acoustical Society of America, 83, 257 - 264.
Klapuri, A. (2003b, November). Multiple fundamental frequency estimation based on harmonicity and spectral smoothness. IEEE
Transactions on Speech and Audio Processing, 11(6), 804–816.
Salamon, J., & Gómez, E. (2012, August). Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics.
IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
Predominant Pitch Estimation: YIN
[Figure panels: signal, auto-correlation, difference function, cumulative difference function; x-axis: lag (samples)]

Excerpts from de Cheveigné and Kawahara (2002):

The present article introduces a method for F0 estimation that produces fewer errors than other well-known methods. The name YIN (from "yin" and "yang" of oriental philosophy) alludes to the interplay between autocorrelation and cancellation that it involves. [...]

$$ r_t(\tau) = \sum_{j=t+1}^{t+W} x_j x_{j+\tau} \qquad (1) $$

where r_t(τ) is the autocorrelation function of lag τ calculated at time index t, and W is the integration window size. This function is illustrated in Fig. 1(b) for the signal plotted in Fig. 1(a). It is common in signal processing to use a slightly different definition:

$$ r'_t(\tau) = \sum_{j=t+1}^{t+W-\tau} x_j x_{j+\tau} \qquad (2) $$

Here the integration window size shrinks with increasing values of τ, with the result that the envelope of the function decreases as a function of lag as illustrated in Fig. 1(c). [...]

[...] The parameter τmax allows the algorithm to be biased to favor one form of error at the expense of the other, with a minimum of total error for intermediate values. Using Eq. (2) rather than Eq. (1) introduces a natural bias that can be tuned by adjusting W. However, changing the window size has other effects, and one can argue that a bias of this sort, if useful, should be applied explicitly rather than implicitly. This is one reason to prefer the definition of Eq. (1).

The autocorrelation method compares the signal to its shifted self. In that sense it is related to the AMDF method (average magnitude difference function, Ross et al., 1974; Ney, 1982) that performs its comparison using differences rather than products, and more generally to time-domain methods that measure intervals between events in time (Hess, 1983). The ACF is the Fourier transform of the power spectrum, and can be seen as measuring the regular spacing of harmonics within that spectrum. The cepstrum method (Noll, 1967) replaces the power spectrum by the log magnitude spectrum and thus puts less weight on high-amplitude parts of the spectrum (particularly near the first formant that often dominates the ACF). Similar "spectral whitening" effects can be obtained by linear predictive inverse filtering or center-clipping (Rabiner and Schafer, 1978), or by splitting the signal over a bank of filters, calculating ACFs within each channel, and adding the results after amplitude normalization (de Cheveigné, 1991). Auditory models based on autocorrelation are currently one of the more popular ways to [...]

[...] The same is true after taking the square and averaging over a window:

$$ \sum_{j=t+1}^{t+W} (x_j - x_{j+T})^2 = 0 \qquad (5) $$

Conversely, an unknown period may be found by forming the difference function:

$$ d_t(\tau) = \sum_{j=1}^{W} (x_j - x_{j+\tau})^2 \qquad (6) $$

and searching for the values of τ for which the function is zero. There is an infinite set of such values, all multiples of the period. The difference function calculated from the signal in Fig. 1(a) is illustrated in Fig. 3(a). [...]

FIG. 1. (a) Example of a speech waveform. (b) Autocorrelation function (ACF) calculated from the waveform in (a) according to Eq. (1). (c) Same, calculated according to Eq. (2). The envelope of this function is tapered to zero because of the smaller number of terms in the summation at larger τ. The horizontal arrows symbolize the search range for the period.

FIG. 2. F0 estimation error rates as a function of the slope of the envelope of the ACF, quantified by its intercept with the abscissa. The dotted line represents errors for which the F0 estimate was too high, the dashed line those for which it was too low, and the full line their sum. Triangles at the right represent error rates for ACF calculated as in Eq. (1) (τmax = ∞). These rates were measured over a subset of the database used in Sec. III.

FIG. 3. (a) Difference function calculated for the speech signal of Fig. 1(a). (b) Cumulative mean normalized difference function. Note that the function starts at 1 rather than 0 and remains high until the dip at the period.

TABLE I. Gross error rates for the simple unbiased autocorrelation method (step 1), and for the cumulated steps described in the text. These rates were measured over a subset of the database used in Sec. III. Integration window size was 25 ms, window shift was one sample, search range was 40 to 800 Hz, and threshold (step 4) was 0.1.

Version   Gross error (%)
Step 1    10.0
Step 2    1.95
Step 3    1.69
Step 4    0.78
Step 5    0.77
Step 6    0.50

De Cheveigné, A., and Kawahara, H., "YIN, a fundamental frequency estimator for speech and music." The Journal of the Acoustical Society of America 111, no. 4 (2002): 1917-1930.
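The difference function of Eq. (6), the cumulative mean normalized difference function of Fig. 3(b) and the absolute threshold of 0.1 (step 4 in Table I) can be sketched in a few lines. This is a minimal illustration with hypothetical function names, using 0-based indexing instead of the 1-based sums in the paper; it omits the parabolic interpolation and local-estimate steps (steps 5 and 6), and the normalization follows the standard YIN definition (each value divided by the running mean of the difference function up to that lag), which is not reproduced on the slide.

```python
import math


def difference_function(x, window, max_lag):
    """d(tau) = sum over the window of (x[j] - x[j + tau])^2, for tau = 0..max_lag."""
    return [sum((x[j] - x[j + tau]) ** 2 for j in range(window))
            for tau in range(max_lag + 1)]


def cumulative_mean_normalized(d):
    """Divide d(tau) by its running mean over lags 1..tau; the value at lag 0 is 1."""
    out = [1.0]
    running = 0.0
    for tau in range(1, len(d)):
        running += d[tau]
        out.append(d[tau] * tau / running if running > 0 else 1.0)
    return out


def estimate_period(x, window, max_lag, threshold=0.1):
    """Smallest lag whose normalized difference dips below the threshold.

    After crossing the threshold the lag is walked down to the local minimum.
    If no lag crosses the threshold, the global minimum is returned.
    """
    dn = cumulative_mean_normalized(difference_function(x, window, max_lag))
    for tau in range(1, max_lag):
        if dn[tau] < threshold:
            while tau + 1 <= max_lag and dn[tau + 1] < dn[tau]:
                tau += 1
            return tau
    return min(range(1, max_lag + 1), key=dn.__getitem__)


# Usage: a 200 Hz sine sampled at 8 kHz has a period of 40 samples.
SR = 8000
signal = [math.sin(2 * math.pi * 200.0 * n / SR) for n in range(400)]
period = estimate_period(signal, window=200, max_lag=100)
f0 = SR / period
```

Because the normalized function starts at 1 and stays high until the dip at the period, the threshold test avoids picking the trivial minimum at lag zero, which is the issue the normalization addresses.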
Predominant Pitch Estimation: Melodia
[Figures: successive stages of the Melodia pipeline]
Audio → spectrogram → spectral peaks → time-frequency salience → salience peaks → contours → predominant melody contours
Salamon, J., & Gómez, E. (2012, August). Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
Essentia implementation of Melodia
[Figures: intermediate outputs of the Essentia implementation: audio, spectrogram, spectral peaks, time-frequency salience, salience peaks, all contours, predominant melody contours]
What about loudness and timbre?
Loudness features in Essentia
Loudness of predominant voice
[Figures: time-frequency plots (axes: time, frequency) with the F0 contour of the predominant voice marked]
Loudness of predominant voice: example
Spectral centroid of predominant voice
CompMusic: Dunya
[Figure labels: API, Internet]
CompMusic: Dunya Web
CompMusic: Dunya API
https://github.com/MTG/pycompmusic
Dunya API Examples
- Metadata
- Features

Más contenido relacionado

Similar a [Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music

An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...csandit
 
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...CSCJournals
 
Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-...
Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-...Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-...
Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-...Cemal Ardil
 
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...IRJET Journal
 
129966863723746268[1]
129966863723746268[1]129966863723746268[1]
129966863723746268[1]威華 王
 
Voice morphing document
Voice morphing documentVoice morphing document
Voice morphing documenthimadrigupta
 
Graphical visualization of musical emotions
Graphical visualization of musical emotionsGraphical visualization of musical emotions
Graphical visualization of musical emotionsPranay Prasoon
 
CORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNAL
CORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNALCORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNAL
CORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNALijcseit
 
CORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNAL
CORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNALCORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNAL
CORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNALijcseit
 
Robot navigation in unknown environment with obstacle recognition using laser...
Robot navigation in unknown environment with obstacle recognition using laser...Robot navigation in unknown environment with obstacle recognition using laser...
Robot navigation in unknown environment with obstacle recognition using laser...IJECEIAES
 
Handling Ihnarmonic Series with Median-Adjustive Trajectories
Handling Ihnarmonic Series with Median-Adjustive TrajectoriesHandling Ihnarmonic Series with Median-Adjustive Trajectories
Handling Ihnarmonic Series with Median-Adjustive TrajectoriesMatthieu Hodgkinson
 
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...IJERA Editor
 
Analysis the results_of_acoustic_echo_cancellation_for_speech_processing_usin...
Analysis the results_of_acoustic_echo_cancellation_for_speech_processing_usin...Analysis the results_of_acoustic_echo_cancellation_for_speech_processing_usin...
Analysis the results_of_acoustic_echo_cancellation_for_speech_processing_usin...Venkata Sudhir Vedurla
 
CORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNAL
CORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNALCORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNAL
CORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNALijcseit
 
Modified synthesis strategy for vowels and semi vowels klatt synthesize
Modified synthesis strategy for vowels and semi vowels klatt synthesizeModified synthesis strategy for vowels and semi vowels klatt synthesize
Modified synthesis strategy for vowels and semi vowels klatt synthesizeIAEME Publication
 
Broad Phoneme Classification Using Signal Based Features
Broad Phoneme Classification Using Signal Based Features  Broad Phoneme Classification Using Signal Based Features
Broad Phoneme Classification Using Signal Based Features ijsc
 
Broad phoneme classification using signal based features
Broad phoneme classification using signal based featuresBroad phoneme classification using signal based features
Broad phoneme classification using signal based featuresijsc
 

Similar a [Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music (20)

An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...An efficient peak valley detection based vad algorithm for robust detection o...
An efficient peak valley detection based vad algorithm for robust detection o...
 
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
 
Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-...
Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-...Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-...
Investigation of-combined-use-of-mfcc-and-lpc-features-in-speech-recognition-...
 
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
 
129966863723746268[1]
129966863723746268[1]129966863723746268[1]
129966863723746268[1]
 
Voice morphing document
Voice morphing documentVoice morphing document
Voice morphing document
 
cr1503
cr1503cr1503
cr1503
 
Graphical visualization of musical emotions
Graphical visualization of musical emotionsGraphical visualization of musical emotions
Graphical visualization of musical emotions
 
CORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNAL
CORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNALCORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNAL
CORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNAL
 
CORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNAL
CORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNALCORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNAL
CORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNAL
 
Robot navigation in unknown environment with obstacle recognition using laser...
Robot navigation in unknown environment with obstacle recognition using laser...Robot navigation in unknown environment with obstacle recognition using laser...
Robot navigation in unknown environment with obstacle recognition using laser...
 
Handling Ihnarmonic Series with Median-Adjustive Trajectories
Handling Ihnarmonic Series with Median-Adjustive TrajectoriesHandling Ihnarmonic Series with Median-Adjustive Trajectories
Handling Ihnarmonic Series with Median-Adjustive Trajectories
 
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
Identification of Sex of the Speaker With Reference To Bodo Vowels: A Compara...
 
Analysis the results_of_acoustic_echo_cancellation_for_speech_processing_usin...
Analysis the results_of_acoustic_echo_cancellation_for_speech_processing_usin...Analysis the results_of_acoustic_echo_cancellation_for_speech_processing_usin...
Analysis the results_of_acoustic_echo_cancellation_for_speech_processing_usin...
 
F010334548
F010334548F010334548
F010334548
 
Ijeer journal
Ijeer journalIjeer journal
Ijeer journal
 
CORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNAL
CORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNALCORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNAL
CORRELATION BASED FUNDAMENTAL FREQUENCY EXTRACTION METHOD IN NOISY SPEECH SIGNAL
 
Modified synthesis strategy for vowels and semi vowels klatt synthesize
Modified synthesis strategy for vowels and semi vowels klatt synthesizeModified synthesis strategy for vowels and semi vowels klatt synthesize
Modified synthesis strategy for vowels and semi vowels klatt synthesize
 
Broad Phoneme Classification Using Signal Based Features
Broad Phoneme Classification Using Signal Based Features  Broad Phoneme Classification Using Signal Based Features
Broad Phoneme Classification Using Signal Based Features
 
Broad phoneme classification using signal based features
Broad phoneme classification using signal based featuresBroad phoneme classification using signal based features
Broad phoneme classification using signal based features
 

Más de Sankalp Gulati

Mining Melodic Patterns in Large Audio Collections of Indian Art Music
	Mining Melodic Patterns in Large Audio Collections of Indian Art Music	Mining Melodic Patterns in Large Audio Collections of Indian Art Music
Mining Melodic Patterns in Large Audio Collections of Indian Art MusicSankalp Gulati
 
Landmark Detection in Hindustani Music Melodies
Landmark Detection in Hindustani Music MelodiesLandmark Detection in Hindustani Music Melodies
Landmark Detection in Hindustani Music MelodiesSankalp Gulati
 
Phrase-based Rāga Recognition Using Vector Space Modeling
Phrase-based Rāga Recognition Using Vector Space ModelingPhrase-based Rāga Recognition Using Vector Space Modeling
Phrase-based Rāga Recognition Using Vector Space ModelingSankalp Gulati
 
Computational Approaches to Melodic Analysis of Indian Art Music
Computational Approaches to Melodic Analysis of Indian Art MusicComputational Approaches to Melodic Analysis of Indian Art Music
Computational Approaches to Melodic Analysis of Indian Art MusicSankalp Gulati
 
Computational Melodic Analysis of Indian Art Music
Computational Melodic Analysis of Indian Art MusicComputational Melodic Analysis of Indian Art Music
Computational Melodic Analysis of Indian Art MusicSankalp Gulati
 
Computational Approaches for Melodic Description in Indian Art Music Corpora
Computational Approaches for Melodic Description in Indian Art Music CorporaComputational Approaches for Melodic Description in Indian Art Music Corpora
Computational Approaches for Melodic Description in Indian Art Music CorporaSankalp Gulati
 
Discovery and Characterization of Melodic Motives in Large Audio Music Collec...
Discovery and Characterization of Melodic Motives in Large Audio Music Collec...Discovery and Characterization of Melodic Motives in Large Audio Music Collec...
Discovery and Characterization of Melodic Motives in Large Audio Music Collec...Sankalp Gulati
 
Tonic Identification System for Hindustani and Carnatic Music
Tonic Identification System for Hindustani and Carnatic MusicTonic Identification System for Hindustani and Carnatic Music
Tonic Identification System for Hindustani and Carnatic MusicSankalp Gulati
 
Tonic Identification System for Indian Art Music
Tonic Identification System for Indian Art MusicTonic Identification System for Indian Art Music
Tonic Identification System for Indian Art MusicSankalp Gulati
 

Más de Sankalp Gulati (10)

Mining Melodic Patterns in Large Audio Collections of Indian Art Music
	Mining Melodic Patterns in Large Audio Collections of Indian Art Music	Mining Melodic Patterns in Large Audio Collections of Indian Art Music
Mining Melodic Patterns in Large Audio Collections of Indian Art Music
 
Landmark Detection in Hindustani Music Melodies
Landmark Detection in Hindustani Music MelodiesLandmark Detection in Hindustani Music Melodies
Landmark Detection in Hindustani Music Melodies
 
Phrase-based Rāga Recognition Using Vector Space Modeling
Phrase-based Rāga Recognition Using Vector Space ModelingPhrase-based Rāga Recognition Using Vector Space Modeling
Phrase-based Rāga Recognition Using Vector Space Modeling
 
Computational Approaches to Melodic Analysis of Indian Art Music
Computational Approaches to Melodic Analysis of Indian Art MusicComputational Approaches to Melodic Analysis of Indian Art Music
Computational Approaches to Melodic Analysis of Indian Art Music
 
Computational Melodic Analysis of Indian Art Music
Computational Melodic Analysis of Indian Art MusicComputational Melodic Analysis of Indian Art Music
Computational Melodic Analysis of Indian Art Music
 
Computational Approaches for Melodic Description in Indian Art Music Corpora
Computational Approaches for Melodic Description in Indian Art Music CorporaComputational Approaches for Melodic Description in Indian Art Music Corpora
Computational Approaches for Melodic Description in Indian Art Music Corpora
 
Hindify
HindifyHindify
Hindify
 
Discovery and Characterization of Melodic Motives in Large Audio Music Collec...
Discovery and Characterization of Melodic Motives in Large Audio Music Collec...Discovery and Characterization of Melodic Motives in Large Audio Music Collec...
Discovery and Characterization of Melodic Motives in Large Audio Music Collec...
 
Tonic Identification System for Hindustani and Carnatic Music
Tonic Identification System for Hindustani and Carnatic MusicTonic Identification System for Hindustani and Carnatic Music
Tonic Identification System for Hindustani and Carnatic Music
 
Tonic Identification System for Indian Art Music
Tonic Identification System for Indian Art MusicTonic Identification System for Indian Art Music
Tonic Identification System for Indian Art Music
 


[Tutorial] Computational Approaches to Melodic Analysis of Indian Art Music

  • 1. Computational Approaches to Melodic Analysis of Indian Art Music. Indian Institute of Science, Bengaluru, India, 2016. Sankalp Gulati, Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain.
  • 4. Tonic Identification
    [Figure: spectrogram (time in s vs. frequency in Hz) and multi-pitch histogram (frequency in bins, 1 bin = 10 cents, ref = 55 Hz, vs. normalized salience) with peaks f2 to f6 and the tonic marked; labels: Signal processing, Learning]
    - Tanpura / drone background sound
    - Extent of gamakas on Sa and Pa svara
    - Vadi, sam-vadi svara of the rāga
    Accuracy: ~90%
    References:
    - Gulati, S., Bellur, A., Salamon, J., Ranjani, H., Ishwar, V., Murthy, H. A., & Serra, X. (2014). Automatic tonic identification in Indian art music: approaches and evaluation. Journal of New Music Research, 43(01), 55–73.
    - Salamon, J., Gulati, S., & Serra, X. (2012). A multipitch approach to tonic identification in Indian classical music. In Proc. of Int. Conf. on Music Information Retrieval (ISMIR) (pp. 499–504), Porto, Portugal.
    - Bellur, A., Ishwar, V., Serra, X., & Murthy, H. (2012). A knowledge based signal processing approach to tonic identification in Indian classical music. In 2nd CompMusic Workshop (pp. 113–118), Istanbul, Turkey.
    - Ranjani, H. G., Arthi, S., & Sreenivas, T. V. (2011). Carnatic music analysis: Shadja, swara identification and raga verification in Alapana using stochastic models. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) (pp. 29–32), New Paltz, NY.
  • 5. Tonic Identification: Multipitch Approach
    - Audio example [figure labels: Vocals, Drone]
    - Utilizing drone sound
    - Multi-pitch analysis
    Reference: Salamon, J., Gómez, E., & Bonada, J. (2011). Sinusoid extraction and salience function design for predominant melody estimation. In Proc. 14th Int. Conf. on Digital Audio Effects (DAFx-11) (pp. 73–80), Paris, France.
  • 6. Tonic Identification: Block Diagram
    Audio
    → Sinusoid extraction (STFT → Spectral peak picking → Frequency/amplitude correction) → Sinusoids
    → Salience function computation (Bin salience mapping → Harmonic summation) → Time-frequency salience
    → Tonic candidate generation (Salience peak picking → Multi-pitch histogram → Histogram peak picking) → Tonic candidates
  • 7. Tonic Identification: Signal Processing
    - STFT
      - Hop size: 11 ms
      - Window length: 46 ms
      - Window type: Hamming
      - FFT: 8192 points
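The STFT settings above can be turned into sample counts and a single analysis frame. The following is a minimal sketch using only the Python standard library; the 44.1 kHz sample rate is an assumption (the slide does not state it), and the helper names are illustrative, not taken from the talk.

```python
import cmath
import math

SAMPLE_RATE = 44100  # assumption: the slide does not give the sample rate
N_FFT = 8192         # FFT size from the slide


def ms_to_samples(ms, sample_rate=SAMPLE_RATE):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * 1e-3 * sample_rate))


HOP = ms_to_samples(11)  # hop size from the slide
WIN = ms_to_samples(46)  # window length from the slide


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def frame_spectrum(signal, start):
    """Magnitude spectrum of one Hamming-windowed, zero-padded frame."""
    frame = list(signal[start:start + WIN])
    frame += [0.0] * (WIN - len(frame))
    windowed = [s * w for s, w in zip(frame, hamming(WIN))]
    spectrum = fft(windowed + [0.0] * (N_FFT - WIN))
    return [abs(c) for c in spectrum[:N_FFT // 2 + 1]]
```

With these assumptions the FFT bins are spaced SAMPLE_RATE / N_FFT (about 5.4 Hz) apart, which is why the next stage refines peak positions by interpolation.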
  • 8. Tonic Identification: Signal Processing
    - Spectral peak picking
      - Absolute threshold: -60 dB
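Spectral peak picking with an absolute threshold can be sketched as follows. This is an illustration only: it assumes linear magnitudes normalised so that 1.0 corresponds to 0 dB, and the function name is hypothetical.

```python
import math


def pick_spectral_peaks(magnitudes, threshold_db=-60.0):
    """Return (bin, level_dB) for local maxima above an absolute dB threshold.

    Assumes linear magnitudes with 1.0 mapping to 0 dB.
    """
    eps = 1e-12  # avoids log10(0)
    db = [20.0 * math.log10(max(m, eps)) for m in magnitudes]
    peaks = []
    for k in range(1, len(db) - 1):
        is_local_max = db[k] > db[k - 1] and db[k] >= db[k + 1]
        if is_local_max and db[k] > threshold_db:
            peaks.append((k, db[k]))
    return peaks
```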
  • 9. Tonic Identification: Signal Processing
    - Frequency/amplitude correction
      - Parabolic interpolation
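Parabolic interpolation fits a parabola through a peak bin and its two neighbours and reads off the vertex. The sketch below uses the standard three-point formula; the function names and the default sample rate and FFT size are assumptions for illustration.

```python
def parabolic_interpolation(left, center, right):
    """Refine a spectral peak from three neighbouring values (usually in dB).

    Returns (offset, value): the vertex position relative to the centre bin,
    in bins, and the interpolated peak height.
    """
    denom = left - 2.0 * center + right
    if denom == 0.0:  # flat neighbourhood, nothing to refine
        return 0.0, center
    offset = 0.5 * (left - right) / denom
    value = center - 0.25 * (left - right) * offset
    return offset, value


def bin_to_hz(bin_index, offset, sample_rate=44100, n_fft=8192):
    """Convert a refined bin position to Hz (sample rate is an assumption)."""
    return (bin_index + offset) * sample_rate / n_fft
```

For example, samples of the parabola y = 1 - (x - 0.25)^2 taken at x = -1, 0, 1 yield an offset of 0.25 bins and a peak value of 1.0.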
  • 10. Tonic Identification: Signal Processing
    - Harmonic summation (bin salience mapping)
      - Spectrum considered: 55–7200 Hz
      - Frequency range: 55–1760 Hz
      - Base frequency: 55 Hz
      - Bin resolution: 10 cents per bin (120 per octave)
      - N octaves: 5
      - Maximum harmonics: 20
      - Squared cosine window across 50 cents
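A rough sketch of harmonic summation with the parameters listed above. Two things are assumptions rather than slide content: the harmonic weight decay ALPHA = 0.8, and the reading of "across 50 cents" as 50 cents on each side of the candidate bin. Bins are 0-based here.

```python
import math

REF_HZ = 55.0        # base frequency
CENTS_PER_BIN = 10   # bin resolution
N_BINS = 600         # 5 octaves * 120 bins, i.e. 55-1760 Hz
MAX_HARMONICS = 20
HALF_WIDTH_BINS = 5  # assumption: 50 cents on each side of the centre
ALPHA = 0.8          # assumption: harmonic weight decay, not on the slide


def hz_to_bin(freq_hz):
    """Map a frequency to a 0-based salience bin (10 cents per bin, ref 55 Hz)."""
    return int(math.floor(1200.0 * math.log2(freq_hz / REF_HZ) / CENTS_PER_BIN))


def salience(peaks):
    """Harmonic summation over spectral peaks given as (freq_hz, magnitude)."""
    sal = [0.0] * N_BINS
    for freq, mag in peaks:
        if not 55.0 <= freq <= 7200.0:  # spectrum range considered
            continue
        for h in range(1, MAX_HARMONICS + 1):
            f0 = freq / h  # candidate fundamental for which freq is harmonic h
            if f0 < REF_HZ:
                break
            center = 1200.0 * math.log2(f0 / REF_HZ) / CENTS_PER_BIN
            lo = max(0, math.ceil(center - HALF_WIDTH_BINS))
            hi = min(N_BINS - 1, math.floor(center + HALF_WIDTH_BINS))
            for b in range(lo, hi + 1):
                dist = abs(b - center) / HALF_WIDTH_BINS
                weight = math.cos(dist * math.pi / 2.0) ** 2 * ALPHA ** (h - 1)
                sal[b] += weight * mag
    return sal
```

Each spectral peak votes for every fundamental of which it could be a harmonic, so a tone with several partials accumulates the most salience at its true F0.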
  • 11. Tonic Identification: Signal Processing
    - Tonic candidate generation (multi-pitch histogram)
      - Number of salience peaks per frame: 5
      - Frequency range: 110–550 Hz
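The candidate generation step can be sketched as a histogram over the strongest salience peaks of each frame. This is a minimal illustration: counting each peak once (rather than weighting it by its salience) and normalising by the maximum are assumptions, and the helper names are hypothetical.

```python
import math

REF_HZ = 55.0
CENTS_PER_BIN = 10


def hz_to_bin(freq_hz):
    """Map a frequency to a 0-based bin (10 cents per bin, ref 55 Hz)."""
    return int(math.floor(1200.0 * math.log2(freq_hz / REF_HZ) / CENTS_PER_BIN))


def top_salience_peaks(frame, n_peaks=5, min_hz=110.0, max_hz=550.0):
    """Bins of the n_peaks highest local maxima within the frequency range."""
    lo = max(hz_to_bin(min_hz), 1)
    hi = min(hz_to_bin(max_hz), len(frame) - 2)
    peaks = [(frame[b], b) for b in range(lo, hi + 1)
             if frame[b] > frame[b - 1] and frame[b] >= frame[b + 1]]
    peaks.sort(reverse=True)
    return [b for _, b in peaks[:n_peaks]]


def multipitch_histogram(frames, n_bins=600):
    """Count per-frame salience peaks per bin and normalise to a maximum of 1."""
    hist = [0.0] * n_bins
    for frame in frames:
        for b in top_salience_peaks(frame):
            hist[b] += 1.0  # assumption: unweighted count per peak
    peak = max(hist) or 1.0
    return [h / peak for h in hist]
```

Because the drone sounds throughout a recording, its pitches recur in almost every frame and end up as the tallest histogram peaks.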
  • 12. Tonic Identification: Feature Extraction
    - Identifying the tonic in the correct octave using the multi-pitch histogram
    - Classification-based template learning
    - The class of an instance is the rank of the tonic
    [Figure: multipitch histogram, frequency bins (1 bin = 10 cents, ref = 55 Hz) vs. normalized salience, with peaks f2 to f5 marked]
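One way to read the feature extraction step: take the highest histogram peaks, describe each lower peak by its pitch interval to the highest one, and label the instance with the rank of the peak that is the tonic. The sketch below follows that reading. Expressing f2 to f5 as intervals in semitones relative to the highest peak is an assumption, as are the tolerance and all function names.

```python
CENTS_PER_BIN = 10


def histogram_peaks(hist, n_peaks=5):
    """Bins of the n_peaks highest local maxima, ordered by decreasing height."""
    peaks = [(hist[b], b) for b in range(1, len(hist) - 1)
             if hist[b] > hist[b - 1] and hist[b] >= hist[b + 1]]
    peaks.sort(reverse=True)
    return [b for _, b in peaks[:n_peaks]]


def interval_features(peak_bins):
    """Intervals in semitones from the highest peak to each lower-ranked peak."""
    ref = peak_bins[0]
    return [(b - ref) * CENTS_PER_BIN / 100.0 for b in peak_bins[1:]]


def tonic_rank(peak_bins, tonic_bin, tolerance_bins=5):
    """1-based rank of the peak matching the annotated tonic, or None."""
    for rank, b in enumerate(peak_bins, start=1):
        if abs(b - tonic_bin) <= tolerance_bins:
            return rank
    return None
```

The resulting feature vectors and rank labels could then be passed to any off-the-shelf classifier for training.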
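  • 13. Tonic Identification: Classification
    - Decision tree over the features f2, f3 and f5, with split thresholds (>5 / <=5, >-7 / <=-7, >-11 / <=-11, >-6 / <=-6); the leaves give the rank of the tonic peak (1st, 2nd, 3rd, 4th, 5th)
    [Figure: two salience vs. frequency sketches with Sa and Pa peaks marked]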
  • 14. Tonic Identification: Results S. Gulati, A. Bellur, J. Salamon, H. Ranjani, V. Ishwar, H.A. Murthy, and X. Serra. Automatic tonic identification in Indian art music: approaches and evaluation. Journal of New Music Research, 43(01): 55–73, 2014.
  • 16. Pitch Estimation Algorithms
    - Time-domain approaches
      - ACF-based (Rabiner, 1977)
      - AMDF-based (YIN) (de Cheveigné and Kawahara, 2002)
    - Frequency-domain approaches
      - Two-way mismatch (Maher and Beauchamp, 1994)
      - Subharmonic summation (Hermes, 1988)
    - Multi-pitch approaches
      - Source separation-based (Klapuri, 2003)
      - Harmonic summation (Melodia) (Salamon and Gómez, 2012)
    References:
    - Rabiner, L. (1977). On the use of autocorrelation analysis for pitch detection. IEEE Transactions on Acoustics, Speech, and Signal Processing, 25(1), 24–33.
    - De Cheveigné, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4), 1917–1930.
    - Medan, Y., & Yair, E. (1991). Super resolution pitch determination of speech signals. IEEE Transactions on Signal Processing, 39(1), 40–48.
    - Maher, R., & Beauchamp, J. W. (1994). Fundamental frequency estimation of musical signals using a two-way mismatch procedure. The Journal of the Acoustical Society of America, 95, 2254–2263.
    - Hermes, D. (1988). Measurement of pitch by subharmonic summation. Journal of the Acoustical Society of America, 83, 257–264.
    - Klapuri, A. (2003). Multiple fundamental frequency estimation based on harmonicity and spectral smoothness. IEEE Transactions on Speech and Audio Processing, 11(6), 804–816.
    - Salamon, J., & Gómez, E. (2012). Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
  • 18. Predominant Pitch Estimation: YIN
    [Figure panels: Signal; Auto-correlation; Difference function; Cumulative difference function. X-axis: Lag (samples)]
    Excerpts from de Cheveigné and Kawahara (2002) shown on the slide:

    The present article introduces a method for F0 estimation that produces fewer errors than other well-known methods. The name YIN (from "yin" and "yang" of oriental philosophy) alludes to the interplay between autocorrelation and cancellation that it involves. [...]

    $$ r_t(\tau) = \sum_{j=t+1}^{t+W} x_j \, x_{j+\tau} \qquad (1) $$

    where $r_t(\tau)$ is the autocorrelation function of lag $\tau$ calculated at time index $t$, and $W$ is the integration window size. This function is illustrated in Fig. 1(b) for the signal plotted in Fig. 1(a). It is common in signal processing to use a slightly different definition:

    $$ r'_t(\tau) = \sum_{j=t+1}^{t+W-\tau} x_j \, x_{j+\tau} \qquad (2) $$

    Here the integration window size shrinks with increasing values of $\tau$, with the result that the envelope of the function decreases as a function of lag as illustrated in Fig. 1(c). [...]

    FIG. 1. (a) Example of a speech waveform. (b) Autocorrelation function (ACF) calculated from the waveform in (a) according to Eq. (1). (c) Same, calculated according to Eq. (2). The envelope of this function is tapered to zero because of the smaller number of terms in the summation at larger $\tau$. The horizontal arrows symbolize the search range for the period.

    FIG. 2. F0 estimation error rates as a function of the slope of the envelope of the ACF, quantified by its intercept with the abscissa. The dotted line represents errors for which the F0 estimate was too high, the dashed line those for which it was too low, and the full line their sum. Triangles at the right represent error rates for ACF calculated as in Eq. (1) ($\tau_{max} = \infty$). These rates were measured over a subset of the database used in Sec. III.

    [...] The parameter $\tau_{max}$ allows the algorithm to be biased to favor one form of error at the expense of the other, with a minimum of total error for intermediate values. Using Eq. (2) rather than Eq. (1) introduces a natural bias that can be tuned by adjusting $W$. However, changing the window size has other effects, and one can argue that a bias of this sort, if useful, should be applied explicitly rather than implicitly. This is one reason to prefer the definition of Eq. (1).

    The autocorrelation method compares the signal to its shifted self. In that sense it is related to the AMDF method (average magnitude difference function, Ross et al., 1974; Ney, 1982) that performs its comparison using differences rather than products, and more generally to time-domain methods that measure intervals between events in time (Hess, 1983). The ACF is the Fourier transform of the power spectrum, and can be seen as measuring the regular spacing of harmonics within that spectrum. The cepstrum method (Noll, 1967) replaces the power spectrum by the log magnitude spectrum and thus puts less weight on high-amplitude parts of the spectrum (particularly near the first formant that often dominates the ACF). Similar "spectral whitening" effects can be obtained by linear predictive inverse filtering or center-clipping (Rabiner and Schafer, 1978), or by splitting the signal over a bank of filters, calculating ACFs within each channel, and adding the results after amplitude normalization (de Cheveigné, 1991). [...]

    The same is true after taking the square and averaging over a window:

    $$ \sum_{j=t+1}^{t+W} (x_j - x_{j+T})^2 = 0 \qquad (5) $$

    Conversely, an unknown period may be found by forming the difference function:

    $$ d_t(\tau) = \sum_{j=1}^{W} (x_j - x_{j+\tau})^2 \qquad (6) $$

    and searching for the values of $\tau$ for which the function is zero. There is an infinite set of such values, all multiples of the period. The difference function calculated from the signal in Fig. 1(a) is illustrated in Fig. 3(a). [...]

    FIG. 3. (a) Difference function calculated for the speech signal of Fig. 1(a). (b) Cumulative mean normalized difference function. Note that the function starts at 1 rather than 0 and remains high until the dip at the period.

    TABLE I. Gross error rates for the simple unbiased autocorrelation method (step 1), and for the cumulated steps described in the text. These rates were measured over a subset of the database used in Sec. III. Integration window size was 25 ms, window shift was one sample, search range was 40 to 800 Hz, and threshold (step 4) was 0.1.

      Version   Gross error (%)
      Step 1    10.0
      Step 2    1.95
      Step 3    1.69
      Step 4    0.78
      Step 5    0.77
      Step 6    0.50

    Reference: De Cheveigné, A., & Kawahara, H. (2002). YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4), 1917–1930.
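The difference function, its cumulative mean normalised form and the absolute threshold from the excerpt can be sketched in a few lines. This is a simplified illustration, not the reference implementation: it uses 0-based indexing, covers only the steps up to the threshold, omits the parabolic interpolation and best-local-estimate steps, and the function names are made up for the example.

```python
import math


def difference_function(x, max_lag, window):
    """d(tau) = sum over the window of (x[j] - x[j + tau])^2, for tau = 0..max_lag."""
    return [sum((x[j] - x[j + tau]) ** 2 for j in range(window))
            for tau in range(max_lag + 1)]


def cumulative_mean_normalized(d):
    """Divide d(tau) by its running mean over lags 1..tau; the value at lag 0 is 1."""
    out = [1.0]
    running = 0.0
    for tau in range(1, len(d)):
        running += d[tau]
        out.append(d[tau] * tau / running if running > 0 else 1.0)
    return out


def yin_period(x, sample_rate, window, f_min=40.0, f_max=800.0, threshold=0.1):
    """Estimate the period in samples; len(x) must be >= window + sample_rate / f_min."""
    max_lag = int(sample_rate / f_min)
    min_lag = max(1, int(sample_rate / f_max))
    dn = cumulative_mean_normalized(difference_function(x, max_lag, window))
    for tau in range(min_lag, max_lag + 1):
        if dn[tau] < threshold:
            # walk down to the bottom of the first dip below the threshold
            while tau + 1 <= max_lag and dn[tau + 1] < dn[tau]:
                tau += 1
            return tau
    # no dip below the threshold: fall back to the global minimum
    return min(range(min_lag, max_lag + 1), key=dn.__getitem__)
```

The F0 estimate is then sample_rate / period; the search range and threshold defaults are the values quoted in the caption of Table I.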
  • 20. Predominant Pitch Estimation: Melodia. Reference: Salamon, J., & Gómez, E. (2012). Melody Extraction From Polyphonic Music Signals Using Pitch Contour Characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
  • 22. Predominant Pitch Estimation: Melodia. Audio → Spectrogram → Spectral peaks
  • 23. Predominant Pitch Estimation: Melodia. Spectral peaks → Time-frequency salience
  • 24. Predominant Pitch Estimation: Melodia. Time-frequency salience → Salience peaks → Contours
  • 25. Predominant Pitch Estimation: Melodia. Contours → Predominant melody contours
  • 34. Essentia implementation of Melodia: Audio → Spectrogram
  • 36. Essentia implementation of Melodia: Spectrogram → Spectral peaks
  • 38. Essentia implementation of Melodia: Spectral peaks → Time-frequency salience
  • 40. Essentia implementation of Melodia: Time-frequency salience → Salience peaks
  • 42. Essentia implementation of Melodia: Salience peaks → All contours
  • 44. Essentia implementation of Melodia: All contours → Predominant melody contours
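For readers who want the whole chain in one call, Essentia's Python bindings expose a PredominantPitchMelodia algorithm. The sketch below is an assumption about how the slides' pipeline might be run end to end, not the code shown in the talk: the wrapper function, the audio path argument, and the frame and hop sizes are illustrative choices. Essentia is imported inside the function so the snippet can be loaded without it installed.

```python
def frame_times(n_frames, hop_size, sample_rate):
    """Time stamp in seconds of each analysis frame."""
    return [i * hop_size / sample_rate for i in range(n_frames)]


def extract_predominant_melody(audio_path, sample_rate=44100,
                               frame_size=2048, hop_size=128):
    """Return (times, pitch_hz, confidence) for the predominant melody.

    Requires Essentia; audio_path is a placeholder for a local audio file.
    """
    import essentia.standard as es  # lazy import, see the note above

    audio = es.MonoLoader(filename=audio_path, sampleRate=sample_rate)()
    audio = es.EqualLoudness()(audio)  # perceptual pre-filtering
    melodia = es.PredominantPitchMelodia(frameSize=frame_size,
                                         hopSize=hop_size,
                                         sampleRate=sample_rate)
    pitch_hz, confidence = melodia(audio)
    return frame_times(len(pitch_hz), hop_size, sample_rate), pitch_hz, confidence
```

Essentia also provides the intermediate stages as separate algorithms (for instance SpectralPeaks, PitchSalienceFunction, PitchSalienceFunctionPeaks, PitchContours and PitchContoursMelody), which is presumably what the stage-by-stage slides walk through.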
  • 50. What about loudness and timbre?
  • 54. Loudness of predominant voice [Figure axes: Frequency vs. Time]
  • 56. Loudness of predominant voice [Figure axes: Frequency vs. Time, with F0 marked]
  • 60. Loudness of predominant voice: example
  • 61. Spectral centroid of predominant voice
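The slides do not spell out how loudness and spectral centroid of the predominant voice are computed. One plausible reading, sketched here purely as an assumption, is to sample the magnitude spectrum at the harmonics of the estimated F0 and derive both descriptors from those amplitudes. All names and the number of harmonics are illustrative.

```python
import math


def harmonic_amplitudes(magnitudes, f0_hz, sample_rate, n_fft, n_harmonics=10):
    """Sample a magnitude spectrum at the bins nearest to the harmonics of f0."""
    bin_hz = sample_rate / n_fft
    amps = []
    for h in range(1, n_harmonics + 1):
        k = int(round(h * f0_hz / bin_hz))
        if k >= len(magnitudes):
            break
        amps.append((h * f0_hz, magnitudes[k]))
    return amps


def harmonic_loudness_db(amps):
    """Energy of the harmonic amplitudes in dB (a simple loudness proxy)."""
    energy = sum(a * a for _, a in amps)
    return 10.0 * math.log10(energy) if energy > 0 else float("-inf")


def harmonic_centroid_hz(amps):
    """Amplitude-weighted mean frequency of the harmonics."""
    total = sum(a for _, a in amps)
    return sum(f * a for f, a in amps) / total if total > 0 else 0.0
```

Restricting the computation to the harmonics of F0 is what ties the descriptors to the predominant voice rather than to the full mixture.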
  • 66. Dunya API Examples: Metadata; Features