Presentation for my PhD defense, Music Technology Group, Barcelona, Spain.
Resources: http://compmusic.upf.edu/node/304
Short abstract:
Automatically describing contents of recorded music is crucial for interacting with large volumes of audio recordings, and for developing novel tools to facilitate music pedagogy. Melody is a fundamental facet in most music traditions and, therefore, is an indispensable component in such description. In this thesis, we develop computational approaches for analyzing high-level melodic aspects of music performances in Indian art music (IAM), with which we can describe and interlink large amounts of audio recordings. With its complex melodic framework and well-grounded theory, the description of IAM melody beyond pitch contours offers a very interesting and challenging research topic. We analyze melodies within their tonal context, identify melodic patterns, compare them both within and across music pieces, and finally, characterize the specific melodic context of IAM, the rāgas. All these analyses are done using data-driven methodologies on sizable curated music corpora. Our work paves the way for addressing several interesting research problems in the field of music information research, as well as developing novel applications in the context of music discovery and music pedagogy.
Computational Approaches for Melodic Description in Indian Art Music Corpora
1. Computational Approaches for Melodic Description in Indian Art Music Corpora
Doctoral Thesis Presentation
Music Technology Group
Department of Information and Communication Technologies
Universitat Pompeu Fabra, Barcelona, Spain
Thesis director and thesis committee members
2. Outline
Introduction
Music Corpora and Datasets
Melody Descriptors and Representations
Melodic Pattern Processing
Rāga Recognition
Applications
Summary and Future Perspectives
Chapter 1
Introduction
Information technology (IT) is constantly shaping the world that we live in. Its ad-
vancements have changed human behavior and influenced the way we connect with
our environment. Music, being a universal socio-cultural phenomenon, has been deeply
influenced by these advancements. The way music is created, stored, disseminated,
listened to, and even learned has changed drastically over the last few decades. A
massive amount of audio music content is now available on demand. Thus, it becomes
necessary to develop computational techniques that can process and automatically de-
scribe large volumes of digital music content to facilitate novel ways of interaction
with it. There are different information sources such as editorial metadata, social data,
and audio recordings that can be exploited to generate a description of music. Melody,
along with harmony and rhythm, is a fundamental facet of music, and therefore, an
essential component in its description. In this thesis, we focus on describing melodic
aspects of music through an automated analysis of audio content.
1.1 Motivation
“It is melody that enables us to distinguish one work from another. It
is melody that human beings are innately able to reproduce by singing,
humming, and whistling. It is melody that makes music memorable: we
are likely to recall a tune long after we have forgotten its text.”
(Selfridge-Field, 1998)
The importance of melody in our musical experiences makes its analysis and descrip-
tion a crucial component in music content processing. It becomes even more important
for melody-dominant music traditions such as Indian art music (IAM), where the
concept of harmony (functional harmony as understood in common practice) does not
exist, and the complex melodic structure takes the central role in music aesthetics.
Chapter 3
Music Corpora and Datasets
3.1 Introduction
A research corpus is a collection of data compiled to study a research problem. A well
designed research corpus is representative of the domain under study. It is practically
infeasible to work with the entire universe of data. Therefore, to ensure scalability of
information retrieval technologies to real-world scenarios, it is important to develop
and test computational approaches using a representative data corpus. Moreover, an
easily accessible data corpus provides a common ground for researchers to evaluate
their methods, and thus, accelerates knowledge sharing and advancement.
Not every computational task requires the entire research corpus for development
and evaluation of approaches. Typically a subset of the corpus is used in a specific
research task. We call this subset a test corpus or test dataset. The models built over a
test dataset can later be extended to the entire research corpus. A test dataset is a static
collection of data specific to an experiment, as opposed to a research corpus, which
can evolve over time. Therefore, different versions of the test dataset used in a specific
experiment should be retained for ensuring reproducibility of the research results.
Note that a test dataset should not be confused with the training and testing split of
a dataset, which are the terms used in the context of a cross validation experimental
setup.
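The distinction between a frozen test dataset and the train/test splits of a cross-validation setup can be sketched as follows; this is an illustrative sketch, and the recording identifiers are hypothetical placeholders, not items from the actual corpora:

```python
import random

# A versioned test dataset: a frozen subset of the (evolving) research corpus.
# The recording IDs below are hypothetical placeholders.
TEST_DATASET_V1 = sorted("rec_%03d" % i for i in range(12))

def kfold_splits(items, k, seed=0):
    """Yield (train, test) splits *within* a frozen test dataset.
    These splits belong to the experimental setup; the test dataset
    itself stays fixed so that results remain reproducible."""
    items = list(items)
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, held_out

for train, held_out in kfold_splits(TEST_DATASET_V1, k=3):
    # Every split covers the whole test dataset exactly once.
    assert sorted(train + held_out) == TEST_DATASET_V1
```

Retaining `TEST_DATASET_V1` (and any later `_V2`, `_V3`, ...) alongside the split seed is what makes an experiment repeatable even while the research corpus keeps growing.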
In MIR, a considerable number of the computational approaches follow a data-driven
methodology, and hence, a well curated research corpus becomes a key factor in de-
termining the success of these approaches. Due to the importance of a good data
corpus in research, building a corpus in itself is a fundamental research task (MacMul-
len, 2003). MIR can be regarded as a relatively new research area within information
retrieval, which has primarily gained popularity in the last two decades. Even today, a
significant number of the studies in MIR use ad-hoc procedures to build a collection
of data to be used in the experiments. Quite often the audio recordings used in the
experiments are taken from a researcher’s personal music collection. Availability of a
good representative research corpus has been a challenge in MIR (Serra, 2014). This
Chapter 4
Melody Descriptors and
Representations
4.1 Introduction
In this chapter, we describe methods for extracting relevant melodic descriptors and
low-level melody representation from raw audio signals. These melodic features are
used as inputs to perform higher level melodic analyses by the methods described in
the subsequent chapters. Since these features are used in a number of computational
tasks, we consolidate and present these common processing blocks in the current
chapter. This chapter is largely based on our published work presented in Gulati et al.
(2014a,b,c).
There are four sections in this chapter, and in each section, we describe the extraction
of a different melodic descriptor or a representation.
In Section 4.2, we focus on the task of automatically identifying the tonic pitch
in an audio recording. Our main objective is to perform a comparative evalu-
ation of different existing methods and select the best method to be used in our
work.
In Section 4.3, we present the method used to extract the predominant pitch
from audio signals, and describe the post-processing steps to reduce frequently
occurring errors.
In Section 4.4, we describe the process of segmenting the solo percussion re-
gions (Tani sections) in the audio recordings of Carnatic music.
In Section 4.5, we describe our nyās-based approach for segmenting melodies
in Hindustani music.
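As a rough illustration of what tonic pitch-class identification involves (this is not one of the methods evaluated in Section 4.2, which use richer features and learned classifiers), one can fold a frame-level pitch track into a one-octave histogram and pick its most salient bin; the pitch values below are synthetic:

```python
from collections import Counter
import math

def tonic_pitch_class(pitches_hz, ref_hz=55.0, bins_per_octave=120):
    """Fold a frame-level pitch track into a one-octave histogram
    (cents relative to ref_hz) and return the most salient bin as a
    tonic pitch-class candidate. Highest-peak selection only; real
    systems must also resolve which peak is the tonic."""
    bin_width = 1200.0 / bins_per_octave
    hist = Counter()
    for f in pitches_hz:
        if f <= 0:  # skip unvoiced frames
            continue
        cents = 1200.0 * math.log2(f / ref_hz)
        hist[round(cents % 1200.0 / bin_width) % bins_per_octave] += 1
    peak_bin = max(hist, key=hist.get)
    return peak_bin * bin_width  # cents above ref_hz, modulo one octave

# Synthetic track hovering around 146.8 Hz with an octave jump:
# both octaves fall into the same folded bin.
track = [146.8] * 50 + [293.6] * 30 + [220.0] * 10
print(tonic_pitch_class(track))  # 500.0 cents above 55 Hz, modulo octave
```

Folding to one octave is why such a histogram can identify the tonic pitch class even when the method confuses octaves.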
Chapter 5
Melodic Pattern Processing:
Similarity, Discovery and
Characterization
5.1 Introduction
In this chapter, we present our methodology for discovering musically relevant melodic
patterns in sizable audio collections of IAM. We address three main computational
tasks involved in this process: melodic similarity, pattern discovery and characteriz-
ation of the discovered melodic patterns. We refer to these different tasks jointly as
melodic pattern processing.
“Only by repetition can a series of tones be characterized as something
definite. Only repetition can demarcate a series of tones and its purpose.
Repetition thus is the basis of music as an art.”
(Schenker et al., 1980)
Repeating patterns are at the core of music. Consequently, analysis of patterns is
fundamental in music analysis. In IAM, recurring melodic patterns are the building
blocks of melodic structures. They provide a base for improvisation and composition,
and thus, are crucial to the analysis and description of rāgas, compositions, and artists
in this music tradition. A detailed account of the importance of melodic patterns in
IAM is provided in Section 2.3.2.
To recapitulate, from the literature review presented in Section 2.4.2 and Section 2.5.2
we see that the approaches for pattern processing in music can be broadly put into
two categories (Figure 5.1). One type of approach performs pattern detection
(or matching) and follows a supervised methodology. In these approaches, the system
knows a priori the pattern to be extracted. Typically, such a system is fed with exemplar
patterns or queries and is expected to extract all of their occurrences in a piece of music.
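The melodic similarity computation at the heart of such pattern detection is often a variant of dynamic time warping (DTW). Below is a minimal sketch over toy pitch sequences in cents; it is illustrative only, not the exact similarity measure developed in this chapter:

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two pitch sequences
    (values in cents). Classic O(n*m) cumulative-cost recursion."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j]: best cumulative cost aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# A query phrase and a time-stretched occurrence of the same phrase
query = [0, 200, 300, 200, 0]
occurrence = [0, 0, 200, 300, 300, 200, 0]
print(dtw_distance(query, occurrence))  # 0.0: identical up to local time warping
```

Because DTW absorbs local tempo variations, a pattern and its slower rendition score as highly similar, which is essential in an improvisatory tradition.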
Chapter 6
Automatic Rāga Recognition
6.1 Introduction
In this chapter, we address the task of automatically recognizing rāgas in audio
recordings of IAM. We describe two novel approaches for rāga recognition that jointly
capture the tonal and the temporal aspects of melody. The contents of this chapter are
largely based on our published work in Gulati et al. (2016a,b).
Rāga is a core musical concept used in compositions, performances, music organization,
and pedagogy of IAM. Even beyond art music, numerous compositions in
Indian folk and film music are also based on rāgas (Ganti, 2013). Rāga is therefore
one of the most desired melodic descriptions of a recorded performance of IAM, and
an important criterion used by listeners to browse audio music collections. Despite
its significance, there exists a large volume of audio content whose rāga is incorrectly
labeled or simply unlabeled. A computational approach to rāga recognition will allow
us to automatically annotate large collections of audio music recordings. It will
enable rāga-based music retrieval in large audio archives, semantically meaningful
music discovery, and musicologically informed navigation. Furthermore, a deeper
understanding of the rāga framework from a computational perspective will pave the
way for building applications for music pedagogy in IAM.
Rāga recognition is the most studied research topic in MIR of IAM. There exist a
considerable number of approaches utilizing different characteristic aspects of rāgas
such as the svara set, svara salience and ārōhana-avrōhana. A critical in-depth review of
the existing approaches for rāga recognition is presented in Section 2.4.3, wherein we
identify several shortcomings in these approaches and possible avenues for scientific
contribution to take this task to the next level. Here we provide a short summary of
this analysis:
Nearly half of the existing approaches for rāga recognition do
not utilize the temporal aspects of melody at all (Table 2.3), which are crucial
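Why the temporal aspects matter can be illustrated with a toy example: two svara sequences with identical svara distributions but opposite progressions are indistinguishable to a purely distributional (tonal) representation, while first-order transitions separate them. This is only a schematic sketch, not the feature set of the proposed approaches, which work on continuous pitch contours:

```python
from collections import Counter

def tonal_and_temporal_features(svara_seq):
    """Toy feature extractor: unigram svara counts (tonal view) plus
    bigram transitions (temporal view). Unigrams discard melodic
    order; bigrams retain it."""
    unigrams = Counter(svara_seq)
    bigrams = Counter(zip(svara_seq, svara_seq[1:]))
    return unigrams, bigrams

# Two toy sequences over the same svara set, opposite progressions
asc = ["S", "R", "G", "M", "P", "D", "N", "S"]
desc = list(reversed(asc))
u1, b1 = tonal_and_temporal_features(asc)
u2, b2 = tonal_and_temporal_features(desc)
assert u1 == u2  # identical svara distributions: tonal view cannot separate them
assert b1 != b2  # opposite transitions: temporal view can
```

A method that looks only at svara salience would assign these two sequences the same description, even though their ārōhana-avrōhana behavior is opposite.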
Chapter 7
Applications
7.1 Introduction
The research work presented in this thesis has several applications. A number of these
applications have already been described in Section 1.1. While some applications
such as rāga-based music retrieval and music discovery are relevant mainly in the
context of large audio collections, there are several applications that can be developed
on the scale of the music collection compiled in the CompMusic project. In this
chapter, we present some concrete examples of such applications that have already
incorporated parts of our work. We provide a brief overview of Dunya, a collection of
the music corpora and software tools developed during the CompMusic project, and of
Sarāga and Riyāz, the mobile applications developed within the technology transfer
project, Culture Aware MUsic Technologies (CAMUT). In addition, we present three
web demos that showcase parts of the outcomes of our computational methods. We
also briefly present one of our recent studies that performs musicologically motivated
exploration of melodic structures in IAM. It serves as an example of how our methods
can facilitate investigations in computational musicology.
7.2 Dunya
Dunya55 comprises the music corpora and the software tools that have been developed
as part of the CompMusic project. It includes data for five music traditions - Hindus-
tani, Carnatic, Turkish Makam, Jingju and Andalusian music. By providing access
to the gathered data (such as audio recordings and metadata), and the generated data
(such as audio descriptors) in the project, Dunya aims to facilitate study and explor-
ation of relevant musical aspects of different music repertoires. The CompMusic
corpora mainly comprise audio recordings and complementary information that de-
scribes those recordings. This complementary information can be in the form of the
55 http://dunya.compmusic.upf.edu/
Chapter 8
Summary and Future Perspectives
8.1 Introduction
In this thesis, we have presented a number of computational approaches for analyzing
melodic elements at different levels of melodic organization in IAM. In tandem, these
approaches generate a high-level melodic description of the audio collections in this
music tradition. They build upon each other to finally allow us to achieve the goals
that we set at the beginning of this thesis, which are:
To curate and structure representative music corpora of IAM that comprise audio
recordings and the associated metadata, and use them to compile sizable and
well-annotated test datasets for melodic analyses.
To develop data-driven computational approaches for the discovery and
characterization of musically relevant melodic patterns in sizable audio collections
of IAM.
To devise computational approaches for automatically recognizing rāgas in
recorded performances of IAM.
Based on the results presented in this thesis, we can now say that our goals are suc-
cessfully met. We started this thesis by presenting our motivation behind the analysis
and description of melodies in IAM, highlighting the opportunities and challenges
that this music tradition offers within the context of music information research and
the CompMusic project (Chapter 1). We provided a brief introduction to IAM and its
melodic organization, and critically reviewed the existing literature on related topics
within the context of MIR and IAM (Chapter 2).
In Chapter 3, we provided a comprehensive overview of the music corpora and data-
sets that were compiled and structured in our work as a part of the CompMusic
project. To the best of our knowledge, these corpora comprise the largest audio
collections of IAM along with curated metadata and automatically extracted music
Chapter 2
Background
2.1 Introduction
In this chapter, we present our review of the existing literature related to the work
presented in this thesis. In addition, we also present a brief overview of the relevant
music and mathematical background. We start with a brief discussion on the termin-
ology used in this thesis (Section 2.2). Subsequently, we provide an overview of the
selected music concepts to better understand the computational tasks addressed in
our work (Section 2.3). We then present a review of the relevant literature, which we
divide into two parts. We first present a review of the work done in computational ana-
lysis of IAM (Section 2.4). This includes approaches for tonic identification, melodic
pattern processing and automatic rāga recognition. In the second part, we present
relevant work done in MIR in general, including topics related to pattern processing
(detection and discovery), and key estimation (Section 2.5). Finally, we provide a
brief overview of selected scientific concepts (Section 2.6).
2.2 Terminology
In this section, we provide our working definition of selected terms that we have
used throughout the thesis. To begin with, it is important to understand the mean-
ing of melody in the context of this thesis. As we see from the literature, defining
melody in itself has been a challenge, with no consensus on a single definition of
melody (Gómez et al., 2003; Salamon, 2013). We do not aim here to formally define
melody in the universal sense, but present its working definition within the scope of
this thesis. Before we proceed, it is important to understand the setup of a concert
of IAM. Every performance of IAM has a lead artist, who plays the central role (also
literally positioned in the center of the stage), and all other instruments are considered
as accompaniments. There is a main melody line by the lead performer, and typic-
ally, also a melodic accompaniment that follows the main melody. A more detailed
4. Introduction
8. CompMusic Project
• Serra, X. (2011). A multicultural approach to music information research. In Proc. of ISMIR, pp. 151–156.
§ Description
§ Data-driven
§ Culture aware
§ Open access
§ Reproducibility
10. Terminology
§ Tonic: Base pitch, varies across artists
§ Melody: Pitch of the predominant melodic source
§ Rāga: Melodic framework
§ Svara: Analogous to a note
§ Ārōhana-avrōhana: Ascending and descending progression
§ Chalan: Abstraction of the melodic progression
§ Motifs: Characteristic melodic patterns
§ Gamakas: Gliding melodic gestures in Carnatic music
12. Opportunities and Challenges
§ Opportunities
§ Well-studied music tradition
§ Heterophonic acoustical characteristics
§ Melodic patterns → building blocks
§ Rāga framework:
“The rāga is more fixed than a mode, and less fixed than the melody; beyond the mode and short of melody; and richer both than a given mode or a given melody.”
(Martinez, 2001)
Mode < Rāga < Melody
• http://www.music-ir.org/mirex/wiki/MIREX_HOME
• Salamon, J. & Gómez, E. (2012). Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
• Martinez, J. L. (2001). Semiosis in Hindustani Music. Motilal Banarsidass Publishers.
18. Opportunities and Challenges
§ Challenges
§ Improvisatory music tradition
§ Melodic transcription: challenging and ill-defined
§ No standard reference frequency → tonic pitch
§ Long audio recordings
[Figure: pitch contour, frequency (cents) vs. time (s), annotated with svaras S, m, g and a characteristic movement]
• http://www.freesound.org/people/sankalp/sounds/360428/
22. Broad Objectives
§ Compilation and curation of corpora of IAM
§ Discovery and characterization of melodic patterns
§ Automatic rāga recognition
24. Computational Tasks
Melodic Description
§ Tonic Identification (Evaluation)
§ Nyās segmentation
§ Melodic similarity
§ Pattern Discovery
§ Pattern Characterization
§ Automatic Rāga Recognition

30. Music Corpora and Datasets
33. CompMusic Corpora
§ Team effort
§ Procedure and design criteria
• Serra, X. (2014). Creating research corpora for the computational study of music: the case of the CompMusic project. In Proc. of the 53rd AES Int. Conf. on Semantic Audio. London.
• Srinivasamurthy, A., Koduri, G. K., Gulati, S., Ishwar, V., & Serra, X. (2014). Corpora for music information research in Indian art music. In Int. Computer Music Conf./Sound and Music Computing Conf., pp. 1029–1036. Athens, Greece.
38. Open-access Music Corpora
§ Sections
§ Melodic phrases
39. Test Datasets
§ Tonic identification
§ Nyās segmentation
§ Melodic similarity
§ Rāga recognition
40. Melody Descriptors and Representations
41. Melody Descriptors and Representations
§ Tonic Identification (Evaluation)
§ Nyās segmentation
§ Predominant Melody Estimation and post-processing
§ Tani segmentation
43. Tonic Identification: Approaches

Method | Features | Feature Distribution | Tonic Selection
MRS (Sengupta et al., 2005) | Pitch (Datta, 1996) | NA | Error minimization
MRH1/MRH2 (Ranjani et al., 2011) | Pitch (Boersma & Weenink, 2001) | Parzen-window-based PDE | GMM fitting
MJS (Salamon et al., 2012) | Multi-pitch salience (Salamon et al., 2011) | Multi-pitch histogram | Decision tree
MSG (Gulati et al., 2012) | Multi-pitch salience (Salamon et al., 2011); predominant melody (Salamon & Gómez, 2012) | Multi-pitch histogram; pitch histogram | Decision tree
MAB1 (Bellur et al., 2012) | Pitch (De Cheveigné & Kawahara, 2002) | GD histogram | Highest peak
MAB2 (Bellur et al., 2012) | Pitch (De Cheveigné & Kawahara, 2002) | GD histogram | Template matching
MAB3 (Bellur et al., 2012) | Pitch (De Cheveigné & Kawahara, 2002) | GD histogram | Highest peak
MCS (Chordia & Şentürk, 2013) | – | – | –
ABBREVIATIONS: NA = Not applicable; GD = Group Delay; PDE = Probability Density Estimate
Table 2.1: Summary of the existing tonic identification approaches.
• Salamon, J., Gulati, S., & Serra, X. (2012). A multipitch approach to tonic identification in Indian classical music. In Proc. of Int. Soc. for Music Information Retrieval Conf. (ISMIR), pp. 499–504. Porto, Portugal.
• Gulati, S., Salamon, J., & Serra, X. (2012). A two-stage approach for tonic identification in Indian art music. In Proc. of the 2nd CompMusic Workshop, pp. 119–127. Istanbul, Turkey: Universitat Pompeu Fabra.
Multipitch + classification vs. predominant pitch + template/peak picking
44. Comparative Evaluation
§ Seven methods
§ Six diverse datasets: TIDCM1, TIDCM2, TIDCM3, TIDIITM1, TIDIITM2, TIDIISc
[Table: number of excerpts and mean duration (mins) per dataset]
45. Results
Methods
TIDCM1 TIDCM2 TIDCM3 TIDIISc TIDIITM1 TIDIITM2
TP TPC TP TPC TP TPC TP TPC TP TPC TP TPC
MRH1 - 81.4 69.6 84.9 73.2 90.8 81.8 83.6 92.1 97.4 80.2 86.9
MRH2 - 63.2 65.7 78.2 68.5 83.5 83.6 83.6 94.7 97.4 83.8 88.8
MAB1 - - - - - - - - 89.5 89.5 - -
MAB2 - 88.9 74.5 82.9 78.5 83.4 72.7 76.4 92.1 92.1 86.6 89.1
MAB3 - 86 61.1 80.5 67.8 79.9 72.7 72.7 94.7 94.7 85 86.6
MJS - 88.9 87.4 90.1 88.4 91 75.6 77.5 89.5 97.4 90.8 94.1
MSG - 92.2 87.8 90.9 87.7 90.5 79.8 85.3 97.4 97.4 93.6 93.6
Table 4.1: Accuracies (%) for tonic pitch (TP) and tonic pitch-class (TPC) identification by seven methods on six different datasets using only
audio data. The best accuracy obtained for each dataset is highlighted using bold text. The dashed horizontal line divides the methods based
on supervised learning (MJS and MSG) and those based on expert knowledge (MRH1, MRH2, MAB1, MAB2 and MAB3). TP column for TIDCM1 is
marked as ‘-’, because it consists of only instrumental excerpts for which we do not evaluate tonic pitch accuracy. MAB1 is only evaluated on TIDIITM1
since it works on the whole concert recording.
Methods TIDCM1 TIDCM2 TIDCM3 TIDIISc TIDIITM1 TIDIITM2
MRH1 87.7 83.5 88.9 87.3 97.4 91.7
MRH2 79.55 76.3 82 85.5 97.4 91.5
MAB1 - - - - 97.4 -
MAB2 92.3 91.5 94.2 81.8 97.4 91.1
MAB3 87.5 86.7 90.9 81.8 94.7 89.9
MJS 88.9 93.6 92.4 80.9 97.4 92.3
MSG 92.2 90.9 90.5 85.3 97.4 93.6
Top table: audio only. Bottom table: audio + metadata.
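The difference between the TP and TPC scores in these tables is that pitch-class accuracy forgives octave errors. A minimal sketch of the two criteria, assuming a 50-cent tolerance (the exact tolerance used in the evaluation may differ):

```python
import math

def tp_correct(est_hz, ref_hz, tol_cents=50.0):
    """Tonic pitch (TP): the estimate must match the reference in octave too."""
    return abs(1200.0 * math.log2(est_hz / ref_hz)) <= tol_cents

def tpc_correct(est_hz, ref_hz, tol_cents=50.0):
    """Tonic pitch-class (TPC): deviation is measured modulo one octave."""
    dev = (1200.0 * math.log2(est_hz / ref_hz)) % 1200.0
    return min(dev, 1200.0 - dev) <= tol_cents

# An octave error counts against TP but not against TPC
assert not tp_correct(293.6, 146.8)  # one octave high: wrong tonic pitch
assert tpc_correct(293.6, 146.8)     # ... but correct tonic pitch-class
```

This is why TPC figures in the tables are always at least as high as the corresponding TP figures.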
46. Results
[Figures: tonic pitch and tonic pitch-class identification accuracy (%) for methods JS, SG, RH1, RH2, AB2 and AB3, grouped by tradition (Hindustani, Carnatic) and by male/female lead voice; and per-dataset error analysis (%) on CM1, CM2, CM3, IISCB1, IITM1 and IITM2, split into Pa errors, Ma errors and other errors]
JS
SGR
H
1R
H
2AB2AB3
JS
SGR
H
1R
H
2AB2AB3
JS
SGR
H
1R
H
2AB2AB3
JS
SGR
H
1R
H
2AB2AB3
JS
SGR
H
1R
H
2AB2AB3
JS
SGR
H
1R
H
2AB2AB3
Pa Errors
Ma Errors
Other Errors
CM1 CM2 CM3 IISCB1 IITM1 IITM2
0
5
10
15
20
25
30
Errors(%)
JS
SGR
H
1R
H
2AB2AB3
JS
SGR
H
1R
H
2AB2AB3
JS
SGR
H
1R
H
2AB2AB3
JS
SGR
H
1R
H
2AB2AB3
JS
SGR
H
1R
H
2AB2AB3
JS
SGR
H
1R
H
2AB2AB3
Pa Errors
Ma Errors
Other Errors
CM1 CM2 CM3 IISCB1 IITM1 IITM2
0
5
10
15
20
25
30
Errors(%)
JS
SGR
H
1R
H
2AB2AB3
JS
SGR
H
1R
H
2AB2AB3
JS
SGR
H
1R
H
2AB2AB3
JS
SGR
H
1R
H
2AB2AB3
JS
SGR
H
1R
H
2AB2AB3
JS
SGR
H
1R
H
2AB2AB3
Pa Errors
Ma Errors
Other Errors
CM1 CM2 CM3 IISCB1 IITM1 IITM2
Errors(%)
Datasets
48
47. Results
[Figure: the same tonic identification accuracy and error-breakdown plots as the previous slide, with category-wise detail.]
Duration of excerpts
Category-wise analysis
Detailed error analysis
Category-wise error analysis
49
48. Results: Summary
§ Multi-pitch classification
§ Audio
§ Invariant to duration
§ Consistent across tradition, gender and instrument
§ Pitch template peak picking
§ Audio + metadata
§ Sensitive to duration
§ Inconsistent across tradition, gender and instrument
• Gulati, S., Bellur, A., Salamon, J., Ranjani, H. G., Ishwar, V., Murthy, H. A., & Serra, X. (2014).
Automatic tonic identification in Indian art music: approaches and evaluation. JNMR, 43(1), 53–71. 50
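The histogram-based family of tonic identification methods summarized above can be illustrated with a toy sketch: fold a pitch track onto one octave and pick the highest histogram peak as the tonic pitch-class. This is a minimal illustration, not any of the evaluated methods (MRH/MAB/MJS/MSG); the function name, reference pitch and bin resolution are my assumptions.

```python
import numpy as np

def tonic_pitch_class(pitches_hz, ref_hz=55.0, bins_per_octave=120):
    """Estimate a tonic pitch-class (in cents, 0-1200) from a pitch track
    by folding all voiced pitches onto one octave and picking the highest
    peak of the resulting histogram."""
    pitches_hz = np.asarray(pitches_hz, dtype=float)
    pitches_hz = pitches_hz[pitches_hz > 0]          # drop unvoiced frames
    cents = 1200.0 * np.log2(pitches_hz / ref_hz)    # Hz -> cents
    folded = np.mod(cents, 1200.0)                   # fold to one octave
    hist, edges = np.histogram(folded, bins=bins_per_octave,
                               range=(0.0, 1200.0))
    peak_bin = int(np.argmax(hist))
    return 0.5 * (edges[peak_bin] + edges[peak_bin + 1])  # bin centre
```

In the actual systems this peak picking is replaced by trained classifiers over multi-pitch histograms, which is what makes them robust across traditions.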
49. Predominant Pitch Estimation
Melodia
• Salamon, J. & Gómez, E. (2012). Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE
Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
• Bogdanov, D., Wack, N., Gómez, E., Gulati, S., Herrera, P., Mayor, O., Roma, G., Salamon, J., Zapata, J., & Serra, X. (2013). Essentia:
an audio analysis library for music information retrieval. In ISMIR, pp. 493–498. 51
50. Nyās Svara Segmentation
§ Melodic segmentation
§ Delimit melodic phrases
[Figure: predominant pitch contour (F0, cents) vs. time (178–190 s), with nyās segments N1–N5 marked.]
52
51. Nyās Svara Segmentation
Predominant pitch estimation Tonic identification
Audio
Nyās segments
Pred. pitch estimation
and representation
Histogram
computation
Segmentation (svara identification)
Feature extraction: local, contextual, local + contextual
Segment classification Segment fusion
Segment classification and
fusion
53
52. Nyās Svara Segmentation
Predominant pitch estimation Tonic identification
Audio
Nyās segments
Pred. pitch estimation
and representation
Histogram
computation
Segmentation (svara identification)
Feature extraction: local, contextual, local + contextual
Segment classification Segment fusion
Segment classification and
fusion
54
53. Nyās Svara Segmentation
Predominant pitch estimation Tonic identification
Audio
Nyās segments
Histogram
computation
Segmentation (svara identification)
Feature extraction: local, contextual, local + contextual
Segment classification Segment fusion
Segment classification and
fusion
Pred. pitch estimation
and representation
55
54. Nyās Svara Segmentation
Predominant pitch estimation Tonic identification
Audio
Nyās segments
Histogram
computation
Segmentation (svara identification)
Feature extraction: local, contextual, local + contextual
Segment classification Segment fusion
Segment classification and
fusion
Pred. pitch estimation
and representation
0.12, 0.34, 0.59, 0.23, 0.54
0.21, 0.24, 0.54, 0.54, 0.42
0.32, 0.23, 0.34, 0.41, 0.63
0.66, 0.98, 0.74, 0.33, 0.12
0.90, 0.42, 0.14, 0.83, 0.76
56
55. Nyās Svara Segmentation
Predominant pitch estimation Tonic identification
Audio
Nyās segments
Histogram
computation
Segmentation (svara identification)
Feature extraction: local, contextual, local + contextual
Segment classification Segment fusion
Segment classification
and fusion
Pred. pitch estimation
and representation
0.12, 0.34, 0.59, 0.23, 0.54
0.21, 0.24, 0.54, 0.54, 0.42
0.32, 0.23, 0.34, 0.41, 0.63
0.66, 0.98, 0.74, 0.33, 0.12
0.90, 0.42, 0.14, 0.83, 0.76
Nyās
Non-nyās
Nyās
Nyās
Non-nyās
57
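The segment classification step sketched in the slides above (local + contextual features per segment, then a classifier deciding nyās vs. non-nyās) can be illustrated minimally. The nearest-centroid classifier and the toy features below are my stand-ins for the actual features and the Tree/k-NN/NB/LR/SVM classifiers evaluated in the thesis.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Store one mean feature vector per class; a crude stand-in for the
    classifiers compared in the thesis (Tree, k-NN, NB, LR, SVM)."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(model, X):
    """Assign each segment to the class with the closest centroid."""
    classes, centroids = model
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]
```

Here a feature vector could be, e.g., [segment duration (s), pitch standard deviation (cents)]: long stable segments behave like nyās, short unstable ones do not (an illustrative assumption only).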
56. Dataset
§ Performances: Hindustani
§ Annotations: musician, >15 years training
§ Annotated nyās segments
657
Recordings | Duration | Artists | Rāgas
58
57. Results
118 MELODY DESCRIPTORS AND REPRESENTATIONS
Segmentation Feat. DTW Tree k-NN NB LR SVM
PLS
FL 0.356 0.407 0.447 0.248 0.449 0.453
FC 0.284 0.394 0.387 0.383 0.389 0.406
FL+FC 0.289 0.414 0.426 0.409 0.432 0.437
Proposed
FL 0.524 0.672 0.719 0.491 0.736 0.749
FC 0.436 0.629 0.615 0.641 0.621 0.673
FL+FC 0.446 0.682 0.708 0.591 0.725 0.735
Table 4.3: F-scores for nyās boundary detection using the PLS method and the proposed
segmentation method. Results are shown for different classifiers (Tree, k-NN, NB, LR, SVM)
and local (FL), contextual (FC) and local together with contextual (FL+FC) features. DTW
is the baseline method used for comparison. The F-score for the random baseline obtained using
BR2 is 0.184.
Segmentation Feat. DTW Tree k-NN NB LR SVM
PLS
FL 0.553 0.685 0.723 0.621 0.727 0.722
FC 0.251 0.639 0.631 0.690 0.688 0.674
FL+FC 0.389 0.694 0.693 0.708 0.722 0.706
Proposed
FL 0.546 0.708 0.754 0.714 0.749 0.758
FC 0.281 0.671 0.611 0.697 0.689 0.697
FL+FC 0.332 0.672 0.710 0.730 0.743 0.731
Boundary | Label
59
58. Results: Summary
§ Proposed segmentation
§ Feature extraction
§ Local features
§ Piecewise linear segmentation
§ Pattern matching (DTW)
§ Local + contextual features
• Gulati, S., Serrà, J., Ganguli, K. K., & Serra, X. (2014). Landmark detection in Hindustani music
melodies. In ICMC-SMC, pp. 1062-1068. 60
59. Melodic Pattern Processing
61
Chapter 5
Melodic Pattern Processing:
Similarity, Discovery and
Characterization
5.1 Introduction
In this chapter, we present our methodology for discovering musically relevant melodic
patterns in sizable audio collections of IAM. We address three main computational
tasks involved in this process: melodic similarity, pattern discovery and characterization
of the discovered melodic patterns. We refer to these different tasks jointly as
melodic pattern processing.
“Only by repetition can a series of tones be characterized as something
definite. Only repetition can demarcate a series of tones and its purpose.
Repetition thus is the basis of music as an art”
(Schenker et al., 1980)
Repeating patterns are at the core of music. Consequently, analysis of patterns is
fundamental in music analysis. In IAM, recurring melodic patterns are the building
blocks of melodic structures. They provide a base for improvisation and composition,
and thus, are crucial to the analysis and description of rāgas, compositions, and artists
in this music tradition. A detailed account of the importance of melodic patterns in
IAM is provided in Section 2.3.2.
To recapitulate, from the literature review presented in Section 2.4.2 and Section 2.5.2
we see that the approaches for pattern processing in music can be broadly put into
two categories (Figure 5.1). One type of approach performs pattern detection
(or matching) and follows a supervised methodology. In these approaches the system
knows a priori the pattern to be extracted. Typically such a system is fed with
exemplar patterns or queries and is expected to extract all of their occurrences in a piece of
121
63. Existing Approaches for IAM
Method Task
Melody
Representation
Segmentation
Similarity
Measure
Speed-up #R¯agas #Rec #Patt #Occ
Ishwar et al. (2012) Distinction Continuous
GT
annotations
HMM - 5c NA 6 431
Ross & Rao (2012) Detection Continuous Pa ny¯as DTW Segmentation 1h 2 2 107
Ross et al. (2012) Detection
Continuous,
SAX-12,1200
Sama location Euc., DTW Segmentation 3h 4 3 107
Ishwar et al. (2013) Detection
Stationary point,
Continuous
Brute-force RLCS
Two stage
method
2c 47 4 173
Rao et al. (2013) Distinction Continuous
GT
annotations
DTW - 1h 8 3 268
Dutta & Murthy (2014a) Discovery
Stationary point,
Continuous
Brute-force RLCS - 5c 59 - -
Dutta & Murthy (2014b) Detection
Stationary point,
Continuous
NA
Modified-
RLCS
Two stage
method
1c 16 NA 59
Rao et al. (2014)
Distinction Continuous
GT
annotations
DTW - 1h 8 3 268
Distinction Continuous
GT
annotations
HMM - 5c NA 10 652
Ganguli et al. (2015) Detection
BSS,
Transcription
-
Smith-
Waterman
Discretization 34h 50 NA 1075
h Hindustani music collection c Carnatic music collection
ABBREVIATIONS: #Rec=Number of recordings; #Patt=Number of unique patterns; #Occ=Total number of annotated occurrences of the
patterns; GT=Ground truth; NA=Not available; “-”= Not applicable; Euc.=Euclidean distance.
Table 2.2: Summary of the methods proposed in the literature for melodic pattern processing in IAM. Note that all of these studies were published
during the course of our work.
Detection (5) | Distinction (4) | Discovery (1)
, +, , +0
65
64. Melodic Pattern Processing
Supervised Approach
Unsupervised Approach
Supervised Approach
66
65. Melodic Pattern Processing
Supervised Approach
Unsupervised Approach
Supervised Approach
67
66. Melodic Pattern Processing
§ Dataset size
§ Knowledge bias
§ Human limitations
Unsupervised Approach
Supervised Approach
68
67. Melodic Pattern Processing
Melodic
Similarity
Pattern
Discovery
Pattern
Characterization
Supervised | Unsupervised | Unsupervised
69
68. Melodic Pattern Processing
Melodic
Similarity
Pattern
Discovery
Pattern
Characterization
Supervised | Unsupervised | Unsupervised
70
72. Melodic Similarity
Predominant Pitch Estimation
Pitch Normalization
Uniform Time-scaling
Distance Computation
Melodic Similarity
Audio1 Audio2
74
73. Melodic Similarity
Sampling rate?
Yes / No, what kind?
Yes / No, how much?
Euclidean or
dynamic time warping (DTW)?
Parameters?
Predominant Pitch Estimation
Pitch Normalization
Uniform Time-scaling
Distance Computation
Melodic Similarity
Audio1 Audio2
75
74. Melodic Similarity
Sampling rates: {100, 67, 50, 40, 33}
[Clipped symbol glossary: standard deviation; Heaviside step function; flatness measure;
flatness threshold; complexity weighting; sampling rate of the melody representation;
maximum error parameter; complexity estimate]
z Complexity estimate
Predominant Pitch Estimation
Pitch Normalization
Uniform Time-scaling
Distance Computation
Melodic Similarity
Audio1 Audio2
76
76. Melodic Similarity
§ TScale off
§ TScale on: {0.9, 0.95, 1.0, 1.05, 1.1}
Predominant Pitch Estimation
Pitch Normalization
Uniform Time-scaling
Distance Computation
Melodic Similarity
Audio1 Audio2
78
77. Melodic Similarity
§ Euclidean
§ Dynamic time warping
§ Global constraint
§ Local constraint
Predominant Pitch Estimation
Pitch Normalization
Uniform Time-scaling
Distance Computation
Melodic Similarity
Audio1 Audio2
79
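The DTW variants compared above can be sketched minimally. The implementation below is illustrative only: it assumes a squared-difference local cost and an optional Sakoe–Chiba global band (the thesis additionally explores step-size/local constraints and many parameter variants).

```python
import numpy as np

def dtw_distance(x, y, band=None):
    """Dynamic time warping distance between two 1-D sequences, with a
    squared-difference local cost and an optional Sakoe-Chiba global band
    (band = maximum allowed |i - j| between aligned indices)."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        lo = 1 if band is None else max(1, i - band)
        hi = m if band is None else min(m, i + band)
        for j in range(lo, hi + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # standard step pattern: insertion, deletion, match
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

A tighter band trades alignment flexibility for speed, which matters once millions of pattern pairs must be compared.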
78. Melodic Similarity
Predominant Pitch Estimation
Pitch Normalization
Uniform Time-scaling
Distance Computation
Melodic Similarity
Audio1 Audio2
× × ×
80
79. Evaluation: Setup
N patterns
10 N random segments
Top-10 nearest neighbours
Mean Average Precision
81
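The evaluation setup above ranks candidate segments per query pattern and scores the ranked lists with mean average precision; a minimal sketch of the measure (function names are mine, not the thesis code):

```python
import numpy as np

def average_precision(ranked_relevance):
    """Average precision of one query, given binary relevance of its
    ranked results (1 = same pattern category as the query)."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    # precision at every rank, averaged over the relevant positions
    precision_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1)
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(all_queries):
    """MAP: mean of the per-query average precision values."""
    return float(np.mean([average_precision(r) for r in all_queries]))
```

MAP condenses each parameter variant's behaviour into a single number, which is what makes the Wilcoxon-based comparison of 560 variants described below tractable.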
81. C _a _
MELODIC SIMILARITY: APPROACHES AND EVALUATION 131
Dataset MAP Srate Norm TScale Dist
MSDcmd
iitm
0.413 w67 Zmean Woff DDTW_L1_G90
0.412 w67 Zmean Won DDTW_L1_G10
0.411 w100 Zmean Woff DDTW_L1_G90
MSDhmd
iitb
0.552 w100 Ztonic Woff DDTW_L0_G90
0.551 w67 Ztonic Woff DDTW_L0_G90
0.547 w50 Ztonic Woff DDTW_L0_G90
MAP score and details of the parameter settings for the three best performing
variants for the MSDcmd_iitm and MSDhmd_iitb datasets. Srate: sampling rate of the melody representation,
Norm: normalization technique, TScale: uniform time-scaling and Dist: distance measure.
We use mean average precision (MAP), a typical evaluation measure in information
retrieval (Manning et al., 2008). Mean average precision (MAP) is computed by
taking the mean of the average precision values of each query in the dataset. This way,
we obtain a single number to evaluate and compare the performance of a variant. In
order to assess if the difference in the performance of any two variants is statistically
significant, we use the Wilcoxon signed rank-test (Wilcoxon, 1945) with p < 0.01. To
compensate for multiple comparisons, we apply the Holm-Bonferroni method (Holm,
1979). Thus, considering that we compare 560 different variants, we effectively use a
CMD HMD
134 MELODIC PATTERN PR
Dataset MAP Srate Norm TScale Dist
MSDcmd
iitm
0.279 w67 Zmean Won DDTW_L1_G10
0.277 w67 ZtonicQ12 Won DDTW_L1_G10
0.275 w100 ZtonicQ12 Won DDTW_L1_G10
MSDhmd
iitb
0.259 w40 ZtonicQ12 Won DDTW_L1_G90
0.259 w100 ZtonicQ12 Won DDTW_L1_G90
0.259 w67 ZtonicQ12 Won DDTW_L1_G90
Table 5.2: MAP score and details of the parameter settings for the three best performing variants.
83
82. Results
[Tables repeated from the previous slide, annotated:]
Parameters
Pattern categories
Pattern segmentation
84
83. Results: Summary
Parameter          Carnatic (≈0.4 MAP)   Hindustani (≈0.55 MAP)
Sampling rate      High                  Invariant
Normalization      Mean                  Tonic
Time-scaling       No consensus          No consensus
Distance           DTW                   DTW
Local constraint   With                  Without
Global constraint  Invariant             Without
85
84. Results: Summary
Known | Unknown
Phrase segmentation
Carnatic
Hindustani
• Gulati, S., Serrà, J., & Serra, X. (2015). An evaluation of methodologies for melodic similarity in
audio recordings of Indian art music. In ICASSP, pp. 678–682. 86
85. Improving Melodic Similarity
Predominant Pitch Estimation
Pitch Normalization
Uniform Time-scaling
Distance Computation
Melodic Similarity
Audio1 Audio2
Culture specificities
87
87. Improving Melodic Similarity
• G. E. Batista, X. Wang, and E. J Keogh. A complexity-invariant distance measure for time series, in SDM, volume 11, pp. 699–710, 2011
Complexity
[Figure: three pitch contours (frequency in cents vs. time) of the same phrase in Carnatic music, differing in melodic complexity.]
89
142 MELODIC PATTERN PROCESSING
based distance (DDTW) in order to compute the final similarity score. We compute γ as:

γ = max(z_i, z_j) / min(z_i, z_j)

z_i = sqrt( Σ_{i=1}^{N−1} (p̂_i − p̂_{i+1})² )

where z_i is the complexity estimate of a melodic pattern of length N, and p̂_i
is the pitch value of the ith sample. We explore two variants of
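The complexity estimate z and the weighting factor γ defined above (following Batista et al., 2011) are straightforward to implement; a minimal sketch, with function names of my choosing:

```python
import numpy as np

def complexity(p):
    """Complexity estimate z of a pitch sequence p: the root of the summed
    squared differences between consecutive pitch samples."""
    p = np.asarray(p, dtype=float)
    return float(np.sqrt(np.sum(np.diff(p) ** 2)))

def complexity_weight(p1, p2):
    """Weighting factor gamma = max(z1, z2) / min(z1, z2), used to scale
    the DTW distance between two melodic patterns."""
    z1, z2 = complexity(p1), complexity(p2)
    return max(z1, z2) / min(z1, z2)
```

Since γ ≥ 1, multiplying the DTW distance by it can only penalize pairs whose complexities differ, leaving equally complex pairs untouched.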
88. Improving Melodic Similarity
Predominant Pitch Est. + Post Processing
Pitch Normalization
Svara Duration Truncation
Distance Computation + Complexity Weight
Melodic Similarity
Audio1 Audio2
Partial Transcription | Nyās segmentation
{0.1, 0.3, 0.5, 0.75, 1.0, 1.5, 2.0}
90
90. Results
MELODIC PATTERN PROCESSING
MSDhmd_CM
Norm MB MDT MCW1 MCW2
Ztonic 0.45 (0.25) 0.52 (0.24) - -
Zmean 0.25 (0.20) 0.31 (0.23) - -
Ztetra 0.40 (0.23) 0.47 (0.23) - -
MSDcmd_CM
Norm MB MDT MCW1 MCW2
Ztonic 0.39 (0.29) 0.42 (0.29) 0.41 (0.28) 0.41 (0.29)
Zmean 0.39 (0.26) 0.45 (0.28) 0.43 (0.27) 0.45 (0.27)
Ztetra 0.45 (0.26) 0.50 (0.27) 0.49 (0.28) 0.51 (0.27)
Table 5.3: MAP scores for the two datasets MSDhmd_CM and MSDcmd_CM for the four method variants
MB, MDT, MCW1 and MCW2 and for different normalization techniques. Standard deviation
of average precision is reported within round brackets.
We first analyse the results for the MSDhmd_CM dataset. From Table 5.3 (upper half),
we see that the proposed method variant that applies a duration truncation performs
better than the baseline method for all the normalization techniques. Moreover, this
difference is found to be statistically significant in each case. The results for the
MSDhmd_CM in this table correspond to F = 500 ms, for which we obtain the highest
accuracy compared to the other F values as shown in Figure 5.10. Furthermore, we see
that Ztonic results in the best accuracy for the MSDhmd_CM for all the method variants and
the difference is found to be statistically significant in each case. From Table 5.3, we
observe a high standard deviation of the average precision values. This is because some
[Figure: MAP of MB vs. MDT (and MCW2) per recording (H1–H5 Hindustani, C1–C5 Carnatic); MAP vs. duration-truncation threshold F ∈ {0.1, 0.3, 0.5, 0.75, 1.0, 1.5, 2.0} s for HMD and CMD.]
92
91. Results
MELODIC PATTERN PROCESSING
[Tables and plots repeated from the previous slide, annotated:]
Duration truncation
Optimal duration?
Complexity weighting
93
92. Results: Summary
Duration truncation
Complexity weighting
Carnatic
Hindustani
Carnatic
• Gulati, S., Serrà, J., & Serra, X. (2015). Improving melodic similarity in Indian art music using
culture-specific melodic characteristics. In ISMIR, pp. 680–686. 94
93. Melodic Pattern Processing
Melodic
Similarity
Pattern
Discovery
Pattern
Characterization
Supervised | Unsupervised | Unsupervised
95
94. Pattern Discovery
Intra-recording
Pattern Discovery
Inter-recording
Pattern Search
Rank Refinement
R1
R2
R3
96
95. Intra-recording Pattern Discovery
§ Dynamic time warping (DTW)
Global constraint
(10% length)
No local constraint
Step: {(0,1), (1,1), (1,0)}
97
96. Rank Refinement
Local constraint
Step: {(2,1), (1,1), (1,2)}
Local cost function
(V1, V2, V3, V4)
98
97. Computational Complexity
O(n²) → samples
O(n²) → candidates
≈ 31 million
Distance Lower Bounds
99
99. Lower Bounds for DTW
§ Cascaded lower bounds [Rakthanmanon et al. (2013)]
§ First–last bound [Kim et al. (2001)]
§ LB_Keogh [Keogh et al. (2001)]
§ Early abandoning
• Kim, S. W., Park, S., & Chu, W. W. (2001). An index-based approach for similarity search supporting time warping in large sequence
databases. In 17th International Conference on Data Engineering, pp. 607–614.
• Keogh, E. & Ratanamahatana, C. A. (2004). Exact indexing of dynamic time warping. Knowledge and Information Systems, 7(3), 358–386.
• Rakthanmanon, T., Campana, B., Mueen, A., Batista, G., Westover, B., Zhu, Q., Zakaria, J., & Keogh, E. (2013). Addressing big data time
series: mining trillions of time series subsequences under dynamic time warping. ACM Transactions on Knowledge Discovery from Data
(TKDD), 7(3), 10:1–10:31 101
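Of the bounds listed above, LB_Keogh is the easiest to sketch: build an upper/lower envelope of one sequence within a Sakoe–Chiba band and sum how far the other sequence falls outside it. This is an illustrative implementation assuming a squared local cost matching a band-constrained DTW, not the thesis code.

```python
import numpy as np

def lb_keogh(query, candidate, band):
    """LB_Keogh lower bound on the band-constrained DTW distance
    (squared local cost). A candidate can be discarded without running
    DTW whenever this bound already exceeds the best distance so far."""
    q = np.asarray(query, dtype=float)
    c = np.asarray(candidate, dtype=float)
    lb = 0.0
    for i, qi in enumerate(q):
        lo = max(0, i - band)
        hi = min(len(c), i + band + 1)
        u, l = c[lo:hi].max(), c[lo:hi].min()   # envelope at position i
        if qi > u:
            lb += (qi - u) ** 2
        elif qi < l:
            lb += (qi - l) ** 2
    return lb
```

Cascading cheap bounds like this before full DTW, plus early abandoning, is what makes trillions of comparisons feasible in practice.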
100. Dataset
§ Seed patterns: 78,000
§ Search patterns: ~15,000,000
49
Recordings | Duration
Carnatic music
102
102. Results: Computations
§ Intra-recording: 1.41 T
§ Inter-recording: 1 .4 T
[Table: percentage of distance computations pruned by each lower bound (First–Last, LB_Keogh_EQ, LB_Keogh_EC), intra- vs. inter-recording.]
104
103. Results
160 MELODIC PATTERN PROCESSING
Seed Category V1 V2 V3 V4
S1 0.92 0.92 0.91 0.89
S2 0.68 0.73 0.73 0.66
S3 0.35 0.34 0.35 0.35
Table 5.5: MAP scores for four variants of the rank refinement method (Vi) for each seed category
(S1, S2 and S3).
S1 S2 S3
S1 S2 S3
105
104. Results
[Table repeated from the previous slide, annotated:]
82 patterns
Local cost function
Distance separability
Intra- vs. inter-recording
106
105. Results
Intra-recording
Inter-recording
[ROC summary: true-positive rate at fixed false-positive rate, intra- vs. inter-recording.]
• Gulati, S., Serrà, J., Ishwar, V., & Serra, X. (2014). Mining melodic patterns in large audio
collections of Indian art music. In SITIS-MIRA, pp. 264–271.
Local cost function → city block
107
107. Melodic Pattern Processing
Melodic
Similarity
Pattern
Discovery
Pattern
Characterization
Supervised | Unsupervised | Unsupervised
109
108. Melodic Pattern Characterization
§ Gamakas – Alaṅkārs
§ Composition-specific phrases
§ Rāga motifs
110
109. Melodic Pattern Characterization
Inter-recording Pattern
Detection
Intra-recording Pattern
Discovery
Data Processing
Pattern Network
Generation
Network
Filtering
Community Detection and
Characterization
Rāga motifs
Carnatic music collection
Melodic Pattern Discovery
Pattern Characterization
111
110. Melodic Pattern Characterization
Inter-recording Pattern
Detection
Intra-recording Pattern
Discovery
Data Processing
Pattern Network
Generation
Network
Filtering
Community Detection and
Characterization
Rāga motifs
Carnatic music collection
Melodic Pattern Discovery
Pattern Characterization
112
111. Melodic Pattern Characterization
• M. EJ Newman, “The structure and function of complex networks,” Society for Industrial and Applied Mathematics (SIAM) review,
vol. 45, no. 2, pp. 167–256, 2003.
Undirected
Pattern Network
Generation
Network
Filtering
Community Detection and
Characterization
113
112. Melodic Pattern Characterization
• S. Maslov and K. Sneppen, “Specificity and stability in topology of protein networks,” Science, vol. 296, no. 5569, pp. 910– 913, 2002.
Pattern Network
Generation
Network
Filtering
Community Detection and
Characterization
114
113. Melodic Pattern Characterization
• V. D. Blondel, J. L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical
Mechanics: Theory and Experiment, vol. 2008, no. 10, pp. P10008, 2008.
• S Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3, pp. 75–174, 2010.
Pattern Network
Generation
Network
Filtering
Community Detection and
Characterization
115
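The pipeline in these slides (pattern network generation, filtering, community detection) can be approximated in a few lines. The sketch below uses a distance threshold for filtering and plain connected components as a crude stand-in for the Louvain community detection cited above; it is not the thesis implementation, and all names are illustrative.

```python
def pattern_network(dist, threshold):
    """Build an undirected pattern network by connecting pattern pairs
    whose distance is below a threshold (network filtering), then return
    its connected components as toy 'communities'."""
    n = len(dist)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < threshold:       # keep only similar pairs
                adj[i].add(j)
                adj[j].add(i)
    seen, comps = set(), []
    for s in range(n):                       # connected components
        if s in seen:
            continue
        stack, comp = [s], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(sorted(comp))
    return comps
```

A modularity-based method such as Louvain would further split dense components into communities; the filtering threshold plays the same role in both cases.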
114. Melodic Pattern Characterization
Pattern Network
Generation
Network
Filtering
Community Detection and
Characterization
Community Type | Nodes | Recordings | Rāgas
Gamakas | High | High | High
Composition phrases | Medium | Less | Less
Rāga phrases | Medium | Medium | Less
116
115. Melodic Pattern Characterization
Pattern Network
Generation
Network
Filtering
Community Detection and
Characterization
Rāga distribution
Figure 5.24: Graphical representation of the melodic pattern network after filtering with
threshold D̃. The detected communities in the network are indicated by different colors, and
examples of these communities are shown on the right.
To characterize the communities we empirically devise a goodness measure G, which denotes the
likelihood that a community C_q represents a rāga motif. We propose to use
G = N L⁴ / c,
where L is an estimate of the likelihood of rāga a′_q in C_q,
L = A_{q,1} / N.
Count
Rāga likelihood
Example: N = 5 nodes with rāga counts {3, 1, 1} give L = 3/5.
117
116. Melodic Pattern Characterization
Pattern Network
Generation
Network
Filtering
Community Detection and
Characterization
Recording distribution
N = 5
Count
Centroid: (1×2 + 2×2 + 3×1) / 5
To characterize the communities we empirically devise a goodness measure G, which denotes the
likelihood that a community C_q represents a rāga motif. We propose to use
G = N L⁴ / c,
where L is an estimate of the likelihood of rāga a′_q in C_q,
L = A_{q,1} / N,
and c denotes how uniformly the nodes of the community are distributed over the recordings:
c = ( Σ_{l=1}^{L_b} l · B_{q,l} ) / N = 1.8.
118
117. Melodic Pattern Characterization
Pattern Network
Generation
Network
Filtering
Community Detection and
Characterization
Figure 5.24: Graphical representation of the melodic pattern network after filtering with
threshold D̃. The detected communities in the network are indicated by different colors, and
examples of these communities are shown on the right.
To characterize the communities we empirically devise a goodness measure G, which denotes the
likelihood that a community C_q represents a rāga motif. We propose to use
G = N L⁴ / c,
where L is an estimate of the likelihood of rāga a′_q in C_q.
Goodness measure
119
Nodes
Centroid
Rāga likelihood
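With L, c and G defined as above, the goodness of a community is a one-liner. In the sketch below, how B_{q,l} is encoded (a list giving, for each multiplicity l, the number of recordings that contribute exactly l nodes) is my assumption; the function name is also mine.

```python
def goodness(raga_counts, recording_multiplicities):
    """Goodness G = N * L**4 / c of a pattern community.
    raga_counts: number of community nodes per raga (dominant raga first);
    recording_multiplicities: entry at index l-1 is the number of
    recordings contributing exactly l nodes to the community."""
    N = sum(raga_counts)
    L = max(raga_counts) / N                 # dominant-raga likelihood
    c = sum((l + 1) * b
            for l, b in enumerate(recording_multiplicities)) / N
    return N * L ** 4 / c
```

High G favours large communities dominated by one rāga (large N, L close to 1) whose nodes are spread thinly across many recordings (small c), which is exactly the profile expected of a rāga motif.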
118. Evaluation
49
Recordings | Duration | Rāgas | Compositions
1 rāga × 1 community = 10 phrases
10 musicians
120
119. Results
5.5 CHARACTERIZATION OF MELODIC PATTERNS 173
Rāga µr σr | Rāga µr σr
Hamsadhvāni 0.84 0.23 | Bēgaḍa 0.88 0.11
Kāmavardani 0.78 0.17 | Kāpi 0.75 0.10
Darbār 0.81 0.23 | Bhairavi 0.91 0.15
Kalyāṇi 0.90 0.10 | Behāg 0.84 0.16
Kāmbhōji 0.87 0.12 | Tōḍī 0.92 0.07
Table 5.7: Mean (µr) and standard deviation (σr) of µ for each rāga. Rāgas with µr ≥ 0.85
[Figure: histogram of µ values (count vs. µ, 0–1) over all rāgas.]
121
120. Results
[Same table and histogram as the previous slide, annotated:]
Per-phrase performance
Per-rāga performance
Musicians' agreement
122
121. Results: Summary
Mean per-rāga and mean per-phrase ratings.
(Bar chart: rating counts.)
• Gulati, S., Serrà, J., Ishwar, V., & Serra, X. (2016). Discovering rāga motifs by characterizing communities in networks of melodic patterns. In ICASSP, pp. 286–290.
123
122. Automatic Rāga Recognition
124
Chapter 6
Automatic R¯aga Recognition
6.1 Introduction
In this chapter, we address the task of automatically recognizing r¯agas in audio re-
cordings of IAM. We describe two novel approaches for r¯aga recognition that jointly
capture the tonal and the temporal aspects of melody. The contents of this chapter are
largely based on our published work in Gulati et al. (2016a,b).
Rāga is a core musical concept used in compositions, performances, music organization, and pedagogy of IAM. Even beyond art music, numerous compositions in Indian folk and film music are also based on rāgas (Ganti, 2013). Rāga is therefore
one of the most desired melodic descriptions of a recorded performance of IAM, and
an important criterion used by listeners to browse its audio music collections. Despite
its significance, there exists a large volume of audio content whose r¯aga is incorrectly
labeled or, simply, unlabeled. A computational approach to r¯aga recognition will al-
low us to automatically annotate large collections of audio music recordings. It will
enable r¯aga-based music retrieval in large audio archives, semantically-meaningful
music discovery and musicologically-informed navigation. Furthermore, a deeper
understanding of the r¯aga framework from a computational perspective will pave the
way for building applications for music pedagogy in IAM.
R¯aga recognition is the most studied research topic in MIR of IAM. There exist a
considerable number of approaches utilizing different characteristic aspects of r¯agas
such as svara set, svara salience and ¯ar¯ohana-avr¯ohana. A critical in-depth review of
the existing approaches for r¯aga recognition is presented in Section 2.4.3, wherein we
identify several shortcomings in these approaches and possible avenues for scientific
contribution to take this task to the next level. Here we provide a short summary of
this analysis:
Nearly half of the existing approaches for rāga recognition do
not utilize the temporal aspects of melody at all (Table 2.3), which are crucial
123. Automatic Rāga Recognition
38 BACKGROUND
Methods Svara set
Svara
salience
Svara
intonation
¯ar¯ohana-
avr¯ohana
Melodic
phrases
Svara
discretization
Temporal
aspects
Pandey et al. (2003) • • Yes •
Chordia & Rae (2007) • • • Yes •
Belle et al. (2009) • • • No
Shetty & Achary (2009) • • Yes •
Sridhar & Geetha (2009) • • Yes •
Koduri et al. (2011) • • Yes
Ranjani et al. (2011) • No
Chakraborty & De (2012) • Yes
Koduri et al. (2012) • • • Both
Chordia & Şentürk (2013) • • • No
Dighe et al. (2013a) • • • Yes •
Dighe et al. (2013b) • • Yes
Koduri et al. (2014) • • • No
Kumar et al. (2014) • • • • Yes •
Dutta et al. (2015)a
• No •
a This method performs r¯aga verification and not recognition
Table 2.3: R¯aga recognition methods proposed in the literature along with the melodic characteristics they utilize to perform the task. We also
indicate if a method uses a discrete svara representation of melody. The methods are arranged in chronological order.
125
124.–131. Automatic Rāga Recognition
(Table 2.3 repeated across incremental slide builds, with highlighted annotations: rāga A, rāga B, rāga C; rāga verification.)
133
132. Automatic Rāga Recognition
2.4 RELATED WORK IN INDIAN ART MUSIC 39
Method Tonal Feature Tonic
Identification
Feature Recognition Method #R¯agas Dataset
(Dur./Num.)a
Audio
Type
Pandey et al. (2003) Pitch (Boersma & Weenink, 2001) NA Svara sequence HMM and n-Gram 2 - / 31 MP
Chordia & Rae (2007) Pitch (Sun, 2000) Manual PCD, PCDD SVM classifier 31 20 / 127 MP
Belle et al. (2009) Pitch (Rao & Rao, 2009) Manual PCD (parameterized) k-NN classifier 4 0.6 / 10 PP
Shetty & Achary (2009) Pitch (Sridhar & Geetha, 2006) NA #Svaras, Vakra svaras Neural Network
classifier
20 - / 90 MP
Sridhar & Geetha (2009) Pitch (Lee, 2006) Singer
identification
Svara set, its
sequence
String matching 3 - / 30 PP
Koduri et al. (2011) Pitch (Rao & Rao, 2010) Brute force PCD k-NN classifier 10 2.82 / 170 PP
Ranjani et al. (2011) Pitch (Boersma & Weenink, 2001) GMM fitting PDE SC-GMM and Set
matching
7 - / 48 PP
Chakraborty & De (2012) Pitch (Sengupta, 1990) Error
minimization
Svara set Set matching - - / - -
Koduri et al. (2012) Predominant pitch (Salamon &
Gómez, 2012)
Multipitch-based PCD variants k-NN classifier 43 - / 215 PP
Chordia & Şentürk (2013) Pitch (Camacho, 2007) Brute force PCD variants k-NN and statistical
classifiers
31 20 / 127 MP
Dighe et al. (2013a) Chroma (Lartillot et al., 2008) Brute force
(v¯adi-based)
Chroma, Timbre
features
HMM 4 9.33 / 56 PP
Dighe et al. (2013b) Chroma (Lartillot et al., 2008) Brute force
(v¯adi-based)
PCD variant RF classifier 8 16.8 / 117 PP
Koduri et al. (2014) Predominant pitch (Salamon &
Gómez, 2012)
Multipitch-based PCD (parameterized) Different classifiers 45? 93/424 PP
Kumar et al. (2014) Predominant pitch (Salamon &
Gómez, 2012)
Brute force PCD + n-Gram
distribution
SVM classifier 10 2.82 / 170 PP
Dutta et al. (2015)* Predominant pitch (Salamon &
Gómez, 2012)
Cepstrum-based Pitch contours LCS with k-NN 30† 3 / 254 PP
a In the case of multiple datasets we list the larger one * This method performs r¯aga verification and not recognition
? Authors do not use all 45 r¯agas at once in a single experiment, but consider groups of 3 r¯agas per experiment † Authors finally use only 17 r¯agas in their experiment
ABBREVIATIONS: Dur.: Duration of the dataset, Num.: Number of recordings, NA: Not applicable, ‘-’: Not available, SC-GMM: semi-continuous GMM, MP: Monophonic, PP:
Polyphonic
Table 2.4: Summary of the rāga recognition methods proposed in the literature. The methods are arranged in chronological order.
134
133. Automatic Rāga Recognition
(Table 2.4 repeated from the previous slide.)
135
141. Phrase-based Rāga Recognition
Topic Modeling ~ Text Classification
Phrases ↔ Words
Rāga ↔ Topic
Recordings ↔ Documents
143
142. Phrase-based Rāga Recognition
Topic Modeling ~ Text Classification
Term Frequency – Inverse Document Frequency (TF-IDF)
144
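The analogy on this slide can be sketched end to end: recordings become sparse "documents" over a phrase vocabulary, and an unseen recording is labelled with the rāga of its most similar labelled recording. This is my own minimal illustration (cosine similarity with 1-NN), not the classifiers actually evaluated in the thesis; the phrase names are made up.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors ({term: weight} dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_raga(query, labelled):
    """Assign the raga of the most similar labelled recording (1-NN)."""
    return max(labelled, key=lambda pair: cosine(query, pair[0]))[1]

# Recordings represented as phrase-frequency vectors (hypothetical data):
labelled = [
    ({"phrase_a": 2.0, "phrase_b": 1.0}, "Kalyani"),
    ({"phrase_c": 3.0}, "Todi"),
]
print(predict_raga({"phrase_a": 1.0, "phrase_b": 1.0}, labelled))
```

The sparse-dict representation matters here: a recording shares mass only with recordings that contain the same melodic phrases, exactly as documents overlap only on shared words.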
143. TF-IDF Feature Extraction
Vocabulary
Extraction
Term-frequency
Feature Extraction
Feature
Normalization
145
144. TF-IDF Feature Extraction
(Pipeline repeated: Vocabulary Extraction → Term-frequency Feature Extraction → Feature Normalization.)
146
145. TF-IDF Feature Extraction
(Pipeline repeated, with a term-count example.)
147
146. TF-IDF Feature Extraction
Vocabulary
Extraction
Term-frequency
Feature Extraction
Feature
Normalization
…communities except the ones that comprise patterns extracted from only a single audio recording. Such communities are analogous to the words that only occur within a document and, hence, are irrelevant for modeling a topic.
We experiment with three different sets of features $f_1$, $f_2$ and $f_3$, which are similar to the TF-IDF features typically used in text information retrieval. We denote our corpus by $R$, comprising $N_R = |R|$ recordings. A melodic phrase and a recording are denoted by $\psi$ and $r$, respectively:
$$f_1(\psi, r) = \begin{cases} 1, & \text{if } f(\psi, r) > 0 \\ 0, & \text{otherwise} \end{cases} \quad (6.1)$$
where $f(\psi, r)$ denotes the raw frequency of occurrence of pattern $\psi$ in recording $r$. $f_1$ only considers the presence or absence of a pattern in a recording. In order to investigate if the frequency of occurrence of melodic patterns is relevant for characterizing rāgas, we take $f_2(\psi, r) = f(\psi, r)$. As mentioned, the melodic patterns that occur across different rāgas and in several recordings do not aid rāga recognition. Therefore, to reduce their effect in the feature vector we employ a weighting scheme similar to the inverse document frequency (idf) weighting in text retrieval:
$$f_3(\psi, r) = f(\psi, r)\, I(\psi, R) \quad (6.2)$$
$$I(\psi, R) = \log \left( \frac{N_R}{|\{r \in R : \psi \in r\}|} \right) \quad (6.3)$$
148
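Equations (6.1)–(6.3) translate almost one-to-one into code. A sketch under the assumption that each recording is represented as a bag of melodic-pattern counts, like word counts in a document; the dict representation and function name are mine, not the thesis code.

```python
import math

def tfidf_features(rec, corpus):
    """Compute the three feature sets f1, f2, f3 for one recording.

    rec: {pattern: raw count f(psi, r)} for the recording r.
    corpus: list of such dicts, one per recording in R (rec included).
    """
    NR = len(corpus)
    f1 = {p: 1 if n > 0 else 0 for p, n in rec.items()}  # presence/absence (6.1)
    f2 = dict(rec)                                        # raw frequency
    f3 = {}
    for p, n in rec.items():
        # document frequency |{r in R : psi in r}|
        df = sum(1 for other in corpus if other.get(p, 0) > 0)
        f3[p] = n * math.log(NR / df) if df else 0.0      # tf-idf (6.2)-(6.3)
    return f1, f2, f3
```

A pattern occurring in every recording gets idf $= \log(1) = 0$, so $f_3$ discounts it entirely, mirroring the motivation above for suppressing patterns that occur across many rāgas.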
147. Results
6.2 PATTERN-BASED RĀGA RECOGNITION 185
Method Feature SVML SGD NBM NBG RF LR 1-NN
MVSM f1 51.04 55 37.5 54.37 25.41 55.83 -
MVSM f2 45.83 50.41 35.62 47.5 26.87 51.87 -
MVSM f3 45.83 51.66 67.29 44.79 23.75 51.87 -
MPC PCDfull - - - - - - 73.12
MGK PDparam 30.41 22.29 27.29 28.12 42.91 30.83 25.62
MGK PDcontext 54.16 43.75 5.2 33.12 49.37 54.79 26.25
Table 6.1: Accuracy (%) of rāga recognition on the RRDCMD dataset by MVSM and other methods using different features and classifiers. Bold text signifies the best accuracy by a method among all its variants.
…Hindustani music recordings. We therefore do not consider this method for comparing results on Hindustani music.
The authors of MPC courteously ran the experiments on our dataset using the original implementation of the method. For MGK, the authors kindly extracted the features (PDparam and PDcontext) using the original implementation of their method, and the experiments using different classification strategies were done by us.
6.2.3 Results and Discussion
Before we proceed to present our results, we notify readers that the accuracies reported in this section for different methods vary slightly from the ones reported in Gulati…
6 AUTOMATIC RĀGA RECOGNITION
Method Feature SVML SGD NBM NBG RF LR 1-NN
MVSM f1 71 72.33 69.33 79.33 38.66 74.33 -
MVSM f2 65.33 64.33 67.66 72.66 40.33 68 -
MVSM f3 65.33 62.66 82.66 72 41.33 67.66 -
MPC PCDfull - - - - - - 91.66
Table 6.2: Accuracy (%) of rāga recognition on the RRDHMD dataset by MVSM and other methods using different features and classifiers. Bold text signifies the best accuracy by a method among all its variants.
(Two plots: clustering coefficients $C(G)$ and $C(G_r)$, and accuracy of MVSM, vs. bin index $\tilde{D}$.)
149
148. Results
(Tables 6.1 and 6.2 and plots repeated from the previous slide, annotated: compare features; state of the art.)
150
149. Results: Summary
Chordia & Şentürk (2013): PCDfull
Proposed: TF-IDF ($f_3$)
Koduri et al. (2014): PDparam
(Bar chart: accuracies (%) on the Carnatic and Hindustani datasets.)
• Gulati, S., Serrà, J., Ishwar, V., Şentürk, S., & Serra, X. (2016). Phrase-based rāga recognition using vector space modeling. In ICASSP, pp. 66–70.
151