Computational Approaches for Melodic Description in Indian Art Music Corpora
Doctoral Thesis Presentation
Music Technology Group
Department of Information and Communication Technologies
Universitat Pompeu Fabra, Barcelona, Spain
Thesis director; thesis committee members
November 2016
Outline
Introduction
Music Corpora and Datasets
Melody Descriptors and Representations
Melodic Pattern Processing
Rāga Recognition
Applications
Summary and Future Perspectives
Chapter 1
Introduction
Information technology (IT) is constantly shaping the world that we live in. Its advancements have changed human behavior and influenced the way we connect with our environment. Music, being a universal socio-cultural phenomenon, has been deeply influenced by these advancements. The way music is created, stored, disseminated, listened to, and even learned has changed drastically over the last few decades. A massive amount of audio music content is now available on demand. Thus, it becomes necessary to develop computational techniques that can process and automatically describe large volumes of digital music content to facilitate novel ways of interaction with it. There are different information sources, such as editorial metadata, social data, and audio recordings, that can be exploited to generate a description of music. Melody, along with harmony and rhythm, is a fundamental facet of music and, therefore, an essential component in its description. In this thesis, we focus on describing melodic aspects of music through an automated analysis of audio content.
1.1 Motivation
“It is melody that enables us to distinguish one work from another. It
is melody that human beings are innately able to reproduce by singing,
humming, and whistling. It is melody that makes music memorable: we
are likely to recall a tune long after we have forgotten its text.”
(Selfridge-Field, 1998)
The importance of melody in our musical experiences makes its analysis and description a crucial component in music content processing. It becomes even more important for melody-dominant music traditions such as Indian art music (IAM), where the concept of harmony (functional harmony as understood in common practice) does not exist, and the complex melodic structure takes the central role in music aesthetics.
Chapter 3
Music Corpora and Datasets
3.1 Introduction
A research corpus is a collection of data compiled to study a research problem. A well-designed research corpus is representative of the domain under study. It is practically infeasible to work with the entire universe of data. Therefore, to ensure scalability of information retrieval technologies to real-world scenarios, it is important to develop and test computational approaches using a representative data corpus. Moreover, an easily accessible data corpus provides a common ground for researchers to evaluate their methods, and thus accelerates knowledge sharing and advancement.
Not every computational task requires the entire research corpus for the development and evaluation of approaches. Typically, a subset of the corpus is used in a specific research task. We call this subset a test corpus or test dataset. The models built over a test dataset can later be extended to the entire research corpus. A test dataset is a static collection of data specific to an experiment, as opposed to a research corpus, which can evolve over time. Therefore, the different versions of the test dataset used in a specific experiment should be retained to ensure reproducibility of the research results. Note that a test dataset should not be confused with the training and testing splits of a dataset, which are the terms used in the context of a cross-validation experimental setup.
In MIR, a considerable number of the computational approaches follow a data-driven methodology, and hence a well-curated research corpus becomes a key factor in determining the success of these approaches. Due to the importance of a good data corpus in research, building a corpus is in itself a fundamental research task (MacMullen, 2003). MIR can be regarded as a relatively new research area within information retrieval, one that has primarily gained popularity in the last two decades. Even today, a significant number of the studies in MIR use ad-hoc procedures to build a collection of data to be used in the experiments. Quite often the audio recordings used in the experiments are taken from a researcher's personal music collection. The availability of a good representative research corpus has been a challenge in MIR (Serra, 2014).
Chapter 4
Melody Descriptors and
Representations
4.1 Introduction
In this chapter, we describe methods for extracting relevant melodic descriptors and
low-level melody representation from raw audio signals. These melodic features are
used as inputs to perform higher level melodic analyses by the methods described in
the subsequent chapters. Since these features are used in a number of computational
tasks, we consolidate and present these common processing blocks in the current
chapter. This chapter is largely based on our published work presented in Gulati et al.
(2014a,b,c).
There are four sections in this chapter; in each section, we describe the extraction of a different melodic descriptor or representation.
In Section 4.2, we focus on the task of automatically identifying the tonic pitch in an audio recording. Our main objective is to perform a comparative evaluation of different existing methods and select the best method to be used in our work.
In Section 4.3, we present the method used to extract the predominant pitch from audio signals, and describe the post-processing steps to reduce frequently occurring errors.
In Section 4.4, we describe the process of segmenting the solo percussion regions (Tani sections) in the audio recordings of Carnatic music.
In Section 4.5, we describe our nyās-based approach for segmenting melodies in Hindustani music.
Chapter 5
Melodic Pattern Processing:
Similarity, Discovery and
Characterization
5.1 Introduction
In this chapter, we present our methodology for discovering musically relevant melodic patterns in sizable audio collections of IAM. We address three main computational tasks involved in this process: melodic similarity, pattern discovery, and characterization of the discovered melodic patterns. We refer to these different tasks jointly as melodic pattern processing.
“Only by repetition can a series of tones be characterized as something
definite. Only repetition can demarcate a series of tones and its purpose.
Repetition thus is the basis of music as an art”
(Schenker et al., 1980)
Repeating patterns are at the core of music. Consequently, the analysis of patterns is fundamental in music analysis. In IAM, recurring melodic patterns are the building blocks of melodic structures. They provide a base for improvisation and composition, and thus are crucial to the analysis and description of rāgas, compositions, and artists in this music tradition. A detailed account of the importance of melodic patterns in IAM is provided in Section 2.3.2.
To recapitulate, from the literature review presented in Section 2.4.2 and Section 2.5.2, we see that the approaches for pattern processing in music can be broadly put into two categories (Figure 5.1). One type of approach performs pattern detection (or matching) and follows a supervised methodology. In these approaches the system knows a priori the pattern to be extracted. Typically, such a system is fed with exemplar patterns or queries and is expected to extract all of their occurrences in a piece of music.
Chapter 6
Automatic Rāga Recognition
6.1 Introduction
In this chapter, we address the task of automatically recognizing rāgas in audio recordings of IAM. We describe two novel approaches for rāga recognition that jointly capture the tonal and the temporal aspects of melody. The contents of this chapter are largely based on our published work in Gulati et al. (2016a,b).
Rāga is a core musical concept used in compositions, performances, music organization, and pedagogy of IAM. Even beyond art music, numerous compositions in Indian folk and film music are also based on rāgas (Ganti, 2013). Rāga is therefore one of the most desired melodic descriptions of a recorded performance of IAM, and an important criterion used by listeners to browse audio music collections. Despite its significance, there exists a large volume of audio content whose rāga is incorrectly labeled or simply unlabeled. A computational approach to rāga recognition will allow us to automatically annotate large collections of audio music recordings. It will enable rāga-based music retrieval in large audio archives, semantically meaningful music discovery, and musicologically informed navigation. Furthermore, a deeper understanding of the rāga framework from a computational perspective will pave the way for building applications for music pedagogy in IAM.
Rāga recognition is the most studied research topic in MIR of IAM. There exist a considerable number of approaches utilizing different characteristic aspects of rāgas, such as the svara set, svara salience, and ārōhana-avrōhana. A critical in-depth review of the existing approaches for rāga recognition is presented in Section 2.4.3, wherein we identify several shortcomings in these approaches and possible avenues for scientific contribution to take this task to the next level. Here we provide a short summary of this analysis:
Nearly half of the existing approaches for rāga recognition do not utilize the temporal aspects of melody at all (Table 2.3), which are crucial
Chapter 7
Applications
7.1 Introduction
The research work presented in this thesis has several applications. A number of these applications have already been described in Section 1.1. While some applications, such as rāga-based music retrieval and music discovery, are relevant mainly in the context of large audio collections, there are several applications that can be developed on the scale of the music collection compiled in the CompMusic project. In this chapter, we present some concrete examples of such applications that have already incorporated parts of our work. We provide a brief overview of Dunya, a collection of the music corpora and software tools developed during the CompMusic project, and of Sarāga and Riyāz, the mobile applications developed within the technology transfer project Culture Aware MUsic Technologies (CAMUT). In addition, we present three web demos that showcase parts of the outcomes of our computational methods. We also briefly present one of our recent studies that performs a musicologically motivated exploration of melodic structures in IAM. It serves as an example of how our methods can facilitate investigations in computational musicology.
7.2 Dunya
Dunya⁵⁵ comprises the music corpora and the software tools that have been developed as part of the CompMusic project. It includes data for five music traditions: Hindustani, Carnatic, Turkish Makam, Jingju, and Andalusian music. By providing access to the gathered data (such as audio recordings and metadata) and the generated data (such as audio descriptors) in the project, Dunya aims to facilitate the study and exploration of relevant musical aspects of different music repertoires. The CompMusic corpora mainly comprise audio recordings and complementary information that describes those recordings. This complementary information can be in the form of the
55: http://dunya.compmusic.upf.edu/
Chapter 8
Summary and Future Perspectives
8.1 Introduction
In this thesis, we have presented a number of computational approaches for analyzing
melodic elements at different levels of melodic organization in IAM. In tandem, these
approaches generate a high-level melodic description of the audio collections in this
music tradition. They build upon each other to finally allow us to achieve the goals
that we set at the beginning of this thesis, which are:
To curate and structure representative music corpora of IAM that comprise audio recordings and the associated metadata, and use them to compile sizable and well-annotated test datasets for melodic analyses.
To develop data-driven computational approaches for the discovery and characterization of musically relevant melodic patterns in sizable audio collections of IAM.
To devise computational approaches for automatically recognizing rāgas in recorded performances of IAM.
Based on the results presented in this thesis, we can now say that our goals are successfully met. We started this thesis by presenting our motivation behind the analysis and description of melodies in IAM, highlighting the opportunities and challenges that this music tradition offers within the context of music information research and the CompMusic project (Chapter 1). We provided a brief introduction to IAM and its melodic organization, and critically reviewed the existing literature on related topics within the context of MIR and IAM (Chapter 2).
In Chapter 3, we provided a comprehensive overview of the music corpora and datasets that were compiled and structured in our work as a part of the CompMusic project. To the best of our knowledge, these corpora comprise the largest audio collections of IAM along with curated metadata and automatically extracted music
Chapter 2
Background
2.1 Introduction
In this chapter, we present our review of the existing literature related to the work presented in this thesis. In addition, we also present a brief overview of the relevant musical and mathematical background. We start with a brief discussion of the terminology used in this thesis (Section 2.2). Subsequently, we provide an overview of selected music concepts needed to better understand the computational tasks addressed in our work (Section 2.3). We then present a review of the relevant literature, which we divide into two parts. We first present a review of the work done in the computational analysis of IAM (Section 2.4). This includes approaches for tonic identification, melodic pattern processing, and automatic rāga recognition. In the second part, we present relevant work done in MIR in general, including topics related to pattern processing (detection and discovery) and key estimation (Section 2.5). Finally, we provide a brief overview of selected scientific concepts (Section 2.6).
2.2 Terminology
In this section, we provide our working definitions of selected terms that we have used throughout the thesis. To begin with, it is important to understand the meaning of melody in the context of this thesis. As we see from the literature, defining melody has in itself been a challenge, with no consensus on a single definition of melody (Gómez et al., 2003; Salamon, 2013). We do not aim here to formally define melody in the universal sense, but present its working definition within the scope of this thesis. Before we proceed, it is important to understand the setup of a concert of IAM. Every performance of IAM has a lead artist, who plays the central role (and is also literally positioned in the center of the stage), and all other instruments are considered accompaniments. There is a main melody line by the lead performer and, typically, also a melodic accompaniment that follows the main melody. A more detailed
Introduction
Motivation: Automatic music description
Applications: Similarity, Discovery, Search, Radio, Recommendation, Pedagogy, Education, Entertainment, Enhanced listening, Computational musicology
Motivation: Melodic description — Representation, Segmentation, Similarity, Motif Discovery, Pattern Characterization
CompMusic Project
• Serra, X. (2011). A multicultural approach to music information research. In Proc. of ISMIR, pp. 151–156.
Description: Data-driven, Culture-aware, Open access, Reproducibility
Indian art music
Hindustani (Kaustuv K. Ganguli)
Carnatic (Vignesh Ishwar)
Terminology
§ Tonic: Base pitch, varies across artists
§ Melody: Pitch of the predominant melodic source
§ Rāga: Melodic framework
§ Svara: Analogous to a note
§ Ārōhana-avrōhana: Ascending-descending progression
§ Chalan: Abstraction of the melodic progression
§ Motifs: Characteristic melodic patterns
§ Gamakas: Gliding melodic gestures in Carnatic music
Opportunities and Challenges
§ Opportunities
§ Well-studied music tradition
§ Heterophonic acoustic characteristics
• http://www.music-ir.org/mirex/wiki/MIREX_HOME
• Salamon, J. & Gómez, E. (2012). Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
§ Melodic patterns → building blocks
§ Rāga framework:
• Martinez, J. L. (2001). Semiosis in Hindustani Music. Motilal Banarsidass Publishers.
"The rāga is more fixed than a mode, and less fixed than the melody, beyond the mode and short of melody, and richer both than a given mode or a given melody." [Martinez]
Mode < Rāga < Melody
Opportunities and Challenges
§ Challenges
§ Improvisatory music tradition
[Figure: pitch contour of an improvised melody; axes Time (s) and frequency.]
§ Melodic transcription: challenging and ill-defined
[Figure: pitch contour with svaras S, m, g and a characteristic movement; axes Time (s) and Frequency (cents).]
• http://www.freesound.org/people/sankalp/sounds/360428/
§ No standard reference frequency → tonic pitch
[Figure: example tonic pitches of different artists, e.g. A, B, G.]
§ Long audio recordings (~1 hr)
Broad Objectives
§ Compilation and curation of corpora of IAM
§ Discovery and characterization of melodic patterns
§ Automatic rāga recognition
Computational Tasks
Melodic Description:
- Tonic Identification (Evaluation)
- Nyās Segmentation
- Melodic Similarity
- Pattern Discovery
- Pattern Characterization
- Automatic Rāga Recognition
Music Corpora and Datasets
CompMusic Corpora
§ Team effort
§ Procedure and design criteria
• Serra, X. (2014). Creating research corpora for the computational study of music: the case of the CompMusic project. In Proc. of the 53rd AES Int. Conf. on Semantic Audio. London.
• Srinivasamurthy, A., Koduri, G. K., Gulati, S., Ishwar, V., & Serra, X. (2014). Corpora for music information research in Indian art music. In Int. Computer Music Conf./Sound and Music Computing Conf., pp. 1029–1036. Athens, Greece.
CompMusic Corpora
Audio + Editorial metadata (MusicBrainz)
Carnatic and Hindustani Music Corpora
[Table residue: corpus statistics.]
Open-access Music Corpora
Annotations: sections, melodic phrases, …
Test Datasets
§ Tonic identification
§ Nyās segmentation
§ Melodic similarity
§ Rāga recognition
Melody Descriptors and Representations
Melody Descriptors and Representations
- Tonic Identification (Evaluation)
- Predominant Melody Estimation and post-processing
- Nyās Segmentation
- Tani Segmentation
Tonic Identification: Approaches
Method | Features | Feature Distribution | Tonic Selection
MRS (Sengupta et al., 2005) | Pitch (Datta, 1996) | NA | Error minimization
MRH1/MRH2 (Ranjani et al., 2011) | Pitch (Boersma & Weenink, 2001) | Parzen-window-based PDE | GMM fitting
MJS (Salamon et al., 2012) | Multi-pitch salience (Salamon et al., 2011) | Multi-pitch histogram | Decision tree
MSG (Gulati et al., 2012) | Multi-pitch salience (Salamon et al., 2011); Predominant melody (Salamon & Gómez, 2012) | Multi-pitch histogram; Pitch histogram | Decision tree
MAB1 (Bellur et al., 2012) | Pitch (De Cheveigné & Kawahara, 2002) | GD histogram | Highest peak
MAB2 (Bellur et al., 2012) | Pitch (De Cheveigné & Kawahara, 2002) | GD histogram | Template matching
MAB3 (Bellur et al., 2012) | Pitch (De Cheveigné & Kawahara, 2002) | GD histogram | Highest peak
MCS (Chordia & Şentürk, 2013) | — | — | —
ABBREVIATIONS: NA=Not applicable; GD=Group Delay; PDE=Probability Density Estimate
Table 2.1: Summary of the existing tonic identification approaches.
• Salamon, J., Gulati, S., & Serra, X. (2012). A multipitch approach to tonic identification in Indian classical music. In Proc. of Int. Soc. for Music Information Retrieval Conf. (ISMIR), pp. 499–504. Porto, Portugal.
• Gulati, S., Salamon, J., & Serra, X. (2012). A two-stage approach for tonic identification in Indian art music. In Proc. of the 2nd CompMusic Workshop, pp. 119–127. Istanbul, Turkey: Universitat Pompeu Fabra.
Two broad families: multi-pitch features + classification, and predominant pitch + template/peak picking.
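All of the tabulated methods share a histogram-as-evidence design: a frame-level pitch (or multi-pitch salience) estimate is aggregated into a histogram, and the tonic is then selected from the histogram peaks, either by a trained classifier or by template/peak-picking rules. The following is a minimal sketch of that shared skeleton up to the candidate-selection step, assuming numpy; the bin count and the 100–260 Hz candidate range are illustrative choices, not values taken from these papers.

```python
import numpy as np

def tonic_candidates(pitch_hz, fmin=100.0, fmax=260.0, n_bins=120, n_peaks=5):
    """Aggregate a frame-level pitch track (Hz) into a histogram over a
    plausible tonic range and return its most salient peaks as candidates."""
    pitch_hz = np.asarray(pitch_hz, dtype=float)
    voiced = pitch_hz[(pitch_hz >= fmin) & (pitch_hz <= fmax)]
    hist, edges = np.histogram(voiced, bins=n_bins, range=(fmin, fmax))
    centers = 0.5 * (edges[:-1] + edges[1:])
    # local maxima of the histogram, strongest first
    peaks = [i for i in range(1, n_bins - 1)
             if hist[i] > hist[i - 1] and hist[i] >= hist[i + 1]]
    peaks.sort(key=lambda i: hist[i], reverse=True)
    return [(float(centers[i]), int(hist[i])) for i in peaks[:n_peaks]]
```

A supervised method (MJS, MSG) would feed features of these peaks to a decision tree, while a knowledge-based method (MRH, MAB) would apply template matching or pick the highest peak directly.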
Comparative Evaluation
§ Seven methods
§ Six diverse datasets: TIDCM1, TIDCM2, TIDCM3, TIDIITM1, TIDIITM2, TIDIISc
[Table residue: number of excerpts and mean duration (mins) per dataset.]
Results (audio only)
Methods | TIDCM1 TP/TPC | TIDCM2 TP/TPC | TIDCM3 TP/TPC | TIDIISc TP/TPC | TIDIITM1 TP/TPC | TIDIITM2 TP/TPC
MRH1 | -/81.4 | 69.6/84.9 | 73.2/90.8 | 81.8/83.6 | 92.1/97.4 | 80.2/86.9
MRH2 | -/63.2 | 65.7/78.2 | 68.5/83.5 | 83.6/83.6 | 94.7/97.4 | 83.8/88.8
MAB1 | -/- | -/- | -/- | -/- | 89.5/89.5 | -/-
MAB2 | -/88.9 | 74.5/82.9 | 78.5/83.4 | 72.7/76.4 | 92.1/92.1 | 86.6/89.1
MAB3 | -/86 | 61.1/80.5 | 67.8/79.9 | 72.7/72.7 | 94.7/94.7 | 85/86.6
MJS | -/88.9 | 87.4/90.1 | 88.4/91 | 75.6/77.5 | 89.5/97.4 | 90.8/94.1
MSG | -/92.2 | 87.8/90.9 | 87.7/90.5 | 79.8/85.3 | 97.4/97.4 | 93.6/93.6
Table 4.1: Accuracies (%) for tonic pitch (TP) and tonic pitch-class (TPC) identification by seven methods on six different datasets using only audio data. The best accuracy obtained for each dataset is highlighted using bold text. The dashed horizontal line divides the methods based on supervised learning (MJS and MSG) from those based on expert knowledge (MRH1, MRH2, MAB1, MAB2 and MAB3). The TP column for TIDCM1 is marked as '-' because it consists of only instrumental excerpts, for which we do not evaluate tonic pitch accuracy. MAB1 is only evaluated on TIDIITM1 since it works on the whole concert recording.
Results (audio + metadata)
Methods | TIDCM1 | TIDCM2 | TIDCM3 | TIDIISc | TIDIITM1 | TIDIITM2
MRH1 | 87.7 | 83.5 | 88.9 | 87.3 | 97.4 | 91.7
MRH2 | 79.55 | 76.3 | 82 | 85.5 | 97.4 | 91.5
MAB1 | - | - | - | - | 97.4 | -
MAB2 | 92.3 | 91.5 | 94.2 | 81.8 | 97.4 | 91.1
MAB3 | 87.5 | 86.7 | 90.9 | 81.8 | 94.7 | 89.9
MJS | 88.9 | 93.6 | 92.4 | 80.9 | 97.4 | 92.3
MSG | 92.2 | 90.9 | 90.5 | 85.3 | 97.4 | 93.6
Results
[Figure residue: bar charts of tonic pitch and tonic pitch-class identification accuracy (%), and of Pa/Ma/Other error percentages, for the methods JS, SG, RH1, RH2, AB2 and AB3, per dataset (CM1, CM2, CM3, IISCB1, IITM1, IITM2) and per category (Hindustani, Carnatic, Male, Female).]
Duration of excerpt
Category-wise analysis
Detailed error analysis
Category-wise error analysis
Results: Summary
§ Multi-pitch + classification (audio):
§ Invariant to excerpt duration
§ Consistent across tradition, gender, instrument
§ Pitch template / peak picking (audio + metadata):
§ Sensitive to excerpt duration
§ Inconsistent across tradition, gender, instrument
• Gulati, S., Bellur, A., Salamon, J., Ranjani, H. G., Ishwar, V., Murthy, H. A., & Serra, X. (2014). Automatic tonic identification in Indian art music: approaches and evaluation. JNMR, 43(1), 53–71.
Predominant Pitch Estimation
Melodia
• Salamon, J. & Gómez, E. (2012). Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770.
• Bogdanov, D., Wack, N., Gómez, E., Gulati, S., Herrera, P., Mayor, O., Roma, G., Salamon, J., Zapata, J., & Serra, X. (2013). Essentia: an audio analysis library for music information retrieval. In ISMIR, pp. 493–498.
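As a usage illustration, predominant pitch can be extracted with the Melodia algorithm as implemented in Essentia. This is a minimal sketch assuming Essentia's Python bindings; the frame and hop sizes shown are common Melodia defaults, not necessarily the exact settings used in this work.

```python
import essentia.standard as es

def extract_predominant_pitch(audio_path, sample_rate=44100):
    """Predominant melody of a polyphonic recording via Melodia."""
    audio = es.MonoLoader(filename=audio_path, sampleRate=sample_rate)()
    audio = es.EqualLoudness()(audio)  # recommended pre-filter for Melodia
    melodia = es.PredominantPitchMelodia(frameSize=2048, hopSize=128,
                                         sampleRate=sample_rate)
    pitch_hz, confidence = melodia(audio)  # 0 Hz marks unvoiced frames
    return pitch_hz, confidence
```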
Nyās Svara Segmentation
§ Melodic segmentation
§ Delimit melodic phrases
[Figure: pitch contour with annotated nyās segments N1–N5; axes Time (s) and F0 frequency (cents).]
Nyās Svara Segmentation (method overview)
Audio → Predominant pitch estimation and representation; Tonic identification → Histogram computation → Svara identification
Segmentation → Feature extraction (Local, Contextual, Local + Contextual) → Segment classification and fusion → Nyās segments
Each candidate segment is described by a feature vector (e.g. 0.12, 0.34, 0.59, 0.23, 0.54) and classified as nyās or non-nyās.
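The classification step at the end of this pipeline can be sketched as follows, assuming scikit-learn and a toy local feature set (mean, spread, slope and length of each candidate segment); the actual features and classifiers used in the thesis differ.

```python
import numpy as np
from sklearn.svm import SVC

def local_features(segment_cents):
    """Toy local descriptors of one pitch segment (in cents)."""
    seg = np.asarray(segment_cents, dtype=float)
    slope = np.polyfit(np.arange(len(seg)), seg, 1)[0] if len(seg) > 1 else 0.0
    return [seg.mean(), seg.std(), slope, float(len(seg))]

def classify_segments(train_segments, train_labels, test_segments):
    """Train a binary nyas / non-nyas classifier and label new segments."""
    X = np.array([local_features(s) for s in train_segments])
    clf = SVC(kernel="rbf").fit(X, train_labels)
    return clf.predict(np.array([local_features(s) for s in test_segments]))
```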
Dataset
§ Ālāp performances of Hindustani music
§ Annotations by a trained musician
§ Annotated nyās segments
[Table residue: Recordings, Duration, Artists, Rāgas.]
Results
Boundary detection (F-scores):
Segmentation | Feat. | DTW | Tree | k-NN | NB | LR | SVM
PLS | FL | 0.356 | 0.407 | 0.447 | 0.248 | 0.449 | 0.453
PLS | FC | 0.284 | 0.394 | 0.387 | 0.383 | 0.389 | 0.406
PLS | FL+FC | 0.289 | 0.414 | 0.426 | 0.409 | 0.432 | 0.437
Proposed | FL | 0.524 | 0.672 | 0.719 | 0.491 | 0.736 | 0.749
Proposed | FC | 0.436 | 0.629 | 0.615 | 0.641 | 0.621 | 0.673
Proposed | FL+FC | 0.446 | 0.682 | 0.708 | 0.591 | 0.725 | 0.735
Table 4.3: F-scores for nyās boundary detection using the PLS method and the proposed segmentation method. Results are shown for different classifiers (Tree, k-NN, NB, LR, SVM) and local (FL), contextual (FC) and local together with contextual (FL+FC) features. DTW is the baseline method used for comparison. The F-score for the random baseline obtained using BR2 is 0.184.
Label (F-scores):
Segmentation | Feat. | DTW | Tree | k-NN | NB | LR | SVM
PLS | FL | 0.553 | 0.685 | 0.723 | 0.621 | 0.727 | 0.722
PLS | FC | 0.251 | 0.639 | 0.631 | 0.690 | 0.688 | 0.674
PLS | FL+FC | 0.389 | 0.694 | 0.693 | 0.708 | 0.722 | 0.706
Proposed | FL | 0.546 | 0.708 | 0.754 | 0.714 | 0.749 | 0.758
Proposed | FC | 0.281 | 0.671 | 0.611 | 0.697 | 0.689 | 0.697
Proposed | FL+FC | 0.332 | 0.672 | 0.710 | 0.730 | 0.743 | 0.731
59
C _a _1 DaYYM e
§ B [ [_ P _ SY M U[
§ M a d MO U[
§ >[OM R M a _
§ BU O cU_ U M _ SY M U[
§ BM YM OTU S 6EH
§ >[OM 5[ d aM R M a _
• Gulati, S., Serrà, J., Ganguli, K. K., & Serra, X. (2014). Landmark detection in Hindustani music
melodies. In ICMC-SMC, pp. 1062-1068. 60
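The segmentation stage looks for held-svara (nyās-candidate) regions in the pitch contour. A minimal sketch of the idea, using a simple slope threshold on the cents contour; the threshold, hop and minimum-duration values here are illustrative assumptions, not the thesis settings.

```python
import numpy as np

def flat_segments(cents, hop_s=0.0029, max_slope=15.0, min_dur_s=0.15):
    """Return (start, end) frame indices of regions where the pitch
    contour is locally flat, i.e. candidate held svaras."""
    slope = np.abs(np.gradient(np.asarray(cents, dtype=float))) / hop_s
    flat = slope < max_slope          # cents-per-second threshold
    segments, start = [], None
    for i, is_flat in enumerate(flat):
        if is_flat and start is None:
            start = i
        elif not is_flat and start is not None:
            if (i - start) * hop_s >= min_dur_s:
                segments.append((start, i))
            start = None
    if start is not None and (len(flat) - start) * hop_s >= min_dur_s:
        segments.append((start, len(flat)))
    return segments
```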
Melodic Pattern Processing
Existing Approaches for IAM
Method | Task | Melody Representation | Segmentation | Similarity Measure | Speed-up | #Rāgas | #Rec | #Patt | #Occ
Ishwar et al. (2012) | Distinction | Continuous | GT annotations | HMM | - | 5c | NA | 6 | 431
Ross & Rao (2012) | Detection | Continuous | Pa nyās | DTW | Segmentation | 1h | 2 | 2 | 107
Ross et al. (2012) | Detection | Continuous, SAX-12,1200 | Sama location | Euc., DTW | Segmentation | 3h | 4 | 3 | 107
Ishwar et al. (2013) | Detection | Stationary point, Continuous | Brute-force | RLCS | Two-stage method | 2c | 47 | 4 | 173
Rao et al. (2013) | Distinction | Continuous | GT annotations | DTW | - | 1h | 8 | 3 | 268
Dutta & Murthy (2014a) | Discovery | Stationary point, Continuous | Brute-force | RLCS | - | 5c | 59 | - | -
Dutta & Murthy (2014b) | Detection | Stationary point, Continuous | NA | Modified-RLCS | Two-stage method | 1c | 16 | NA | 59
Rao et al. (2014) | Distinction | Continuous | GT annotations | DTW | - | 1h | 8 | 3 | 268
Rao et al. (2014) | Distinction | Continuous | GT annotations | HMM | - | 5c | NA | 10 | 652
Ganguli et al. (2015) | Detection | BSS, Transcription | - | Smith-Waterman | Discretization | 34h | 50 | NA | 1075
h = Hindustani music collection; c = Carnatic music collection
ABBREVIATIONS: #Rec=Number of recordings; #Patt=Number of unique patterns; #Occ=Total number of annotated occurrences of the patterns; GT=Ground truth; NA=Not available; "-"=Not applicable; Euc.=Euclidean distance.
Table 2.2: Summary of the methods proposed in the literature for melodic pattern processing in IAM. Note that all of these studies were published during the course of our work.
Tasks: Detection (5), Distinction (4), Discovery (1).
Melodic Pattern Processing
Supervised approach (pattern detection) vs. unsupervised approach (pattern discovery). Why unsupervised?
§ Dataset size
§ Knowledge bias
§ Human limitations
Melodic Pattern Processing
Melodic Similarity (supervised) → Pattern Discovery (unsupervised) → Pattern Characterization (unsupervised)
Melodic Similarity
[Figure: pitch contours of melodic phrases; axis Time (s). Challenge: occurrences of the same phrase differ considerably.]
Melodic Similarity: method and parameter space
Pipeline: Audio1, Audio2 → Predominant Pitch Estimation → Pitch Normalization → Uniform Time-scaling → Distance Computation → Melodic Similarity
§ Sampling rate? (e.g. w100, w67, w50, w40, …)
§ Pitch normalization: yes/no, what kind? (none, tonic, mean, median, Z-norm, median absolute deviation; quantized variants, e.g. Q12, Q24)
§ Uniform time-scaling: on/off; scaling factors {0.9, 0.95, 1.0, 1.05, 1.1}
§ Distance: Euclidean or dynamic time warping (DTW)? Parameters: global and local constraints
[Notation residue: σ standard deviation; Θ Heaviside step function; u flatness measure; ũ flatness threshold; γ complexity weighting; w sampling rate of the melody representation; V maximum error parameter; z complexity estimate.]
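A runnable sketch of one point in this parameter space — tonic normalization to cents, uniform time-scaling to a fixed number of samples, and DTW with an L1 local cost and a Sakoe–Chiba global band — assuming numpy; the 10% band width is shown for illustration only.

```python
import numpy as np

def dtw_distance(x, y, band=0.1):
    """DTW with an L1 local cost and a Sakoe-Chiba global constraint."""
    n, m = len(x), len(y)
    w = max(1, int(band * max(n, m)))
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def phrase_distance(pitch1_hz, pitch2_hz, tonic1_hz, tonic2_hz, n_samples=100):
    """Tonic-normalize two phrases to cents, time-scale, then DTW."""
    def prepare(pitch, tonic):
        cents = 1200.0 * np.log2(np.asarray(pitch, dtype=float) / tonic)
        grid = np.linspace(0, len(cents) - 1, n_samples)
        return np.interp(grid, np.arange(len(cents)), cents)
    return dtw_distance(prepare(pitch1_hz, tonic1_hz),
                        prepare(pitch2_hz, tonic2_hz))
```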
Evaluation: Setup
§ N annotated patterns + random segments per query
§ Top-K nearest neighbours
§ Mean Average Precision (MAP)
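Mean average precision over ranked retrieval lists can be computed as below — a minimal sketch, assuming numpy, where each query is represented by its ranked list of binary relevance judgements.

```python
import numpy as np

def average_precision(ranked_relevance):
    """AP of one query: mean of precision@k over the relevant ranks."""
    rel = np.asarray(ranked_relevance, dtype=float)
    if rel.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(rel) / (np.arange(len(rel)) + 1.0)
    return float((precision_at_k * rel).sum() / rel.sum())

def mean_average_precision(queries):
    """MAP: mean of the per-query average precision values."""
    return float(np.mean([average_precision(q) for q in queries]))

# e.g. mean_average_precision([[1, 0, 1, 0], [0, 1, 1]]) -> ~0.71
```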
Datasets
Carnatic music dataset and Hindustani music dataset
[Table residue: Recordings, Duration, Rāgas, Pattern occurrences.]
Results
Dataset | MAP | Srate | Norm | TScale | Dist
MSDcmd_iitm | 0.413 | w67 | Zmean | Woff | DDTW_L1_G90
MSDcmd_iitm | 0.412 | w67 | Zmean | Won | DDTW_L1_G10
MSDcmd_iitm | 0.411 | w100 | Zmean | Woff | DDTW_L1_G90
MSDhmd_iitb | 0.552 | w100 | Ztonic | Woff | DDTW_L0_G90
MSDhmd_iitb | 0.551 | w67 | Ztonic | Woff | DDTW_L0_G90
MSDhmd_iitb | 0.547 | w50 | Ztonic | Woff | DDTW_L0_G90
Table 5.1: MAP score and details of the parameter settings for the three best performing variants on the MSDcmd_iitm and MSDhmd_iitb datasets. Srate: sampling rate of the melody representation, Norm: normalization technique, TScale: uniform time-scaling, Dist: distance measure.
We use mean average precision (MAP), a typical evaluation measure in information retrieval (Manning et al., 2008). MAP is computed by taking the mean of the average precision values of each query in the dataset. This way, we obtain a single number to evaluate and compare the performance of a variant. To assess whether the difference in the performance of any two variants is statistically significant, we use the Wilcoxon signed-rank test (Wilcoxon, 1945) with p < 0.01. To compensate for multiple comparisons, given that we compare 560 different variants, we apply the Holm-Bonferroni method (Holm, 1979).
Dataset | MAP | Srate | Norm | TScale | Dist
MSDcmd_iitm | 0.279 | w67 | Zmean | Won | DDTW_L1_G10
MSDcmd_iitm | 0.277 | w67 | ZtonicQ12 | Won | DDTW_L1_G10
MSDcmd_iitm | 0.275 | w100 | ZtonicQ12 | Won | DDTW_L1_G10
MSDhmd_iitb | 0.259 | w40 | ZtonicQ12 | Won | DDTW_L1_G90
MSDhmd_iitb | 0.259 | w100 | ZtonicQ12 | Won | DDTW_L1_G90
MSDhmd_iitb | 0.259 | w67 | ZtonicQ12 | Won | DDTW_L1_G90
Table 5.2: MAP score and details of the parameter settings for the three best performing variants (second experiment).
Parameters; Pattern categories; Pattern segmentation
Results: Summary
Parameter | Carnatic (≈0.41 MAP) | Hindustani (≈0.55 MAP)
Sampling rate | High | Invariant
Normalization | Mean | Tonic
Time scaling | No consensus | No consensus
Distance | DTW | DTW
Local constraint | With | Without
Global constraint | Invariant | Without
Results: Summary
Phrase segmentation: known vs. unknown boundaries (Carnatic, Hindustani)
• Gulati, S., Serrà, J., & Serra, X. (2015). An evaluation of methodologies for melodic similarity in audio recordings of Indian art music. In ICASSP, pp. 678–682.
Improving Melodic Similarity
Pipeline: Predominant Pitch Estimation → Pitch Normalization → Uniform Time-scaling → Distance Computation → Melodic Similarity
Culture-specific characteristics:
§ Hindustani music: long held svaras
§ Carnatic music: melodic complexity
[Figure: three occurrences of the same phrase in Carnatic music; axes Time (s) and Frequency (cents).]
• Batista, G. E., Wang, X., & Keogh, E. J. (2011). A complexity-invariant distance measure for time series. In SDM, volume 11, pp. 699–710.
We weight the DTW-based distance (D_DTW) by a complexity factor γ in order to compute the final similarity. We compute γ as:

γ = max(z_i, z_j) / min(z_i, z_j),  with  z_i = √( Σ_{n=1}^{N−1} (p̂_n − p̂_{n+1})² )

where z_i is the complexity estimate of a melodic pattern of length N, and p̂_n is the pitch value of the n-th sample. We explore two variants of this weighting.
Improved pipeline: Audio1, Audio2 → Predominant Pitch Estimation + Post-processing → Pitch Normalization → Svara Duration Truncation (partial transcription / nyās segmentation; truncation thresholds {0.1, 0.3, 0.5, 0.75, 1.0, 1.5, 2.0} s) → Distance Computation + Complexity Weighting → Melodic Similarity
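A direct transcription of these equations, assuming numpy; `d_dtw` is a DTW distance computed elsewhere, for instance by the similarity sketch shown earlier.

```python
import numpy as np

def complexity(cents):
    """Complexity estimate z of a pitch contour (cents), as defined above."""
    c = np.asarray(cents, dtype=float)
    return float(np.sqrt(np.sum(np.diff(c) ** 2)))

def complexity_weighted_distance(d_dtw, cents_i, cents_j):
    """Scale a DTW distance by gamma = max(z_i, z_j) / min(z_i, z_j)."""
    z_i, z_j = complexity(cents_i), complexity(cents_j)
    gamma = max(z_i, z_j) / max(min(z_i, z_j), 1e-12)  # avoid division by zero
    return gamma * d_dtw
```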
Results
MSDhmd_CM: Norm | MB | MDT | MCW1 | MCW2
Ztonic | 0.45 (0.25) | 0.52 (0.24) | - | -
Zmean | 0.25 (0.20) | 0.31 (0.23) | - | -
Ztetra | 0.40 (0.23) | 0.47 (0.23) | - | -
MSDcmd_CM: Norm | MB | MDT | MCW1 | MCW2
Ztonic | 0.39 (0.29) | 0.42 (0.29) | 0.41 (0.28) | 0.41 (0.29)
Zmean | 0.39 (0.26) | 0.45 (0.28) | 0.43 (0.27) | 0.45 (0.27)
Ztetra | 0.45 (0.26) | 0.50 (0.27) | 0.49 (0.28) | 0.51 (0.27)
Table 5.3: MAP scores for the two datasets MSDhmd_CM and MSDcmd_CM for the four method variants MB, MDT, MCW1 and MCW2 and for different normalization techniques. The standard deviation of average precision is reported within round brackets.
We first analyse the results for the MSDhmd_CM dataset. From Table 5.3 (upper half), we see that the proposed method variant that applies a duration truncation performs better than the baseline method for all the normalization techniques. Moreover, this difference is found to be statistically significant in each case. The results for the MSDhmd_CM in this table correspond to a truncation threshold of 500 ms, for which we obtain the highest accuracy compared to the other threshold values, as shown in Figure 5.10. Furthermore, we see that Ztonic results in the best accuracy for the MSDhmd_CM for all the method variants, and this difference is found to be statistically significant in each case. From Table 5.3, we also notice a high standard deviation of the average precision values.
[Figure 5.10 residue: MAP vs. truncation threshold (0.1–2.0 s) for HMD and CMD; per-rāga comparison of MB, MDT and MCW2 (H1–H5, C1–C5).]
Duration truncation; Optimal duration?; Complexity weighting
Results: Summary
§ Duration truncation: improves results for Carnatic and Hindustani
§ Complexity weighting: improves results for Carnatic
• Gulati, S., Serrà, J., & Serra, X. (2015). Improving melodic similarity in Indian art music using culture-specific melodic characteristics. In ISMIR, pp. 680–686.
Melodic Pattern Processing
Melodic Similarity — Supervised
Pattern Discovery — Unsupervised
Pattern Characterization — Unsupervised
95
Pattern Discovery
Intra-recording Pattern Discovery
Inter-recording Pattern Search
Rank Refinement
R1  R2  R3
96
Intra-recording Pattern Discovery
§ Dynamic time warping (DTW)
Global constraint (length)
No local constraint
Step: {(0,1), (1,1), (1,0)}
97
Rank Refinement
Local constraint
Step: {(2,1), (1,1), (1,2)}
Local cost function
(a1, a2, a3, a4)
98
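The two step patterns above can be plugged into a single DTW routine. A minimal sketch, assuming 1-D pitch subsequences and a cityblock local cost; the dtw name and the band parameter (a Sakoe–Chiba-style global constraint) are illustrative, not the thesis implementation.

import numpy as np

def dtw(x, y, steps=((1, 0), (1, 1), (0, 1)), band=None):
    """Minimal DTW between 1-D sequences x and y.

    steps: allowed (di, dj) moves; ((1,0),(1,1),(0,1)) is the pattern
    without a local constraint used for discovery, ((2,1),(1,1),(1,2))
    the local constraint used in rank refinement (it forbids long
    horizontal or vertical runs in the warping path).
    band: optional band half-width implementing a global constraint.
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        lo = 1 if band is None else max(1, i - band)
        hi = m if band is None else min(m, i + band)
        for j in range(lo, hi + 1):
            cost = abs(x[i - 1] - y[j - 1])      # cityblock local cost
            best = min(D[i - di, j - dj] for di, dj in steps
                       if i - di >= 0 and j - dj >= 0)
            D[i, j] = cost + best
    return D[n, m]   # np.inf if the end point is unreachable under the steps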
Computational Complexity
O(n²) → samples
O(n²) → candidates
≈ 31 million distance computations
Distance Lower Bounds
99
Lower Bounds for DTW
D_True ≥ D_LB
If D_LB ≥ D_Last (the worst distance in the Top-N queue) → skip the exact DTW
Top-N queue
100
Lower Bounds for DTW
§ Cascaded lower bounds [Rakthanmanon et al. (2013)]
§ First–Last bound [Kim et al. (2001)]
§ LB_Keogh [Keogh & Ratanamahatana (2004)]
§ Early abandoning
• Kim, S. W., Park, S., & Chu, W. W. (2001). An index-based approach for similarity search supporting time warping in large sequence
databases. In 17th International Conference on Data Engineering, pp. 607–614.
• Keogh, E. & Ratanamahatana, C. A. (2004). Exact indexing of dynamic time warping. Knowledge and Information Systems, 7(3), 358–386.
• Rakthanmanon, T., Campana, B., Mueen, A., Batista, G., Westover, B., Zhu, Q., Zakaria, J., & Keogh, E. (2013). Addressing big data time
series: mining trillions of time series subsequences under dynamic time warping. ACM Transactions on Knowledge Discovery from Data
(TKDD), 7(3), 10:1–10:31.
101
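A sketch of how such a lower bound prunes candidates, assuming equal-length subsequences and the cityblock cost used above; lb_keogh and abandon_at are illustrative names, and the envelope construction in the cited works differs in details.

import numpy as np

def lb_keogh(query, candidate, r, abandon_at=np.inf):
    """LB_Keogh lower bound on the DTW distance (cityblock local cost).

    An envelope of the query within a warping band of half-width r is
    built; every candidate sample that falls outside the envelope adds
    its distance to it. Any warping path must stay inside the band, so
    DTW(query, candidate) >= this bound: candidates whose bound already
    exceeds the worst (last) distance in the top-N queue are discarded
    without computing the exact DTW.
    """
    q = np.asarray(query, dtype=float)
    c = np.asarray(candidate, dtype=float)
    lb = 0.0
    for i in range(len(c)):
        window = q[max(0, i - r): i + r + 1]
        hi, lo = window.max(), window.min()   # upper / lower envelope at i
        if c[i] > hi:
            lb += c[i] - hi
        elif c[i] < lo:
            lb += lo - c[i]
        if lb > abandon_at:                   # early abandoning
            return lb
    return lb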
Dataset
§ Seed patterns: 56,000
§ Search patterns: ~13,000,000
Carnatic music: Recordings · Duration
102
Results
103
Results: Computations
§ Intra-recording and inter-recording DTW computations
Pruning by cascaded lower bounds (%):
                  First–Last   LB_Keogh_EQ   LB_Keogh_EC
Intra-recording       52            23             1
Inter-recording       45            51             3
104
Results

Seed Category   V1     V2     V3     V4
S1              0.92   0.92   0.91   0.89
S2              0.68   0.73   0.73   0.66
S3              0.35   0.34   0.35   0.35

Table 5.5: MAP scores for four variants of rank refinement method (Vi) for each seed category (S1, S2 and S3).
105
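The MAP figures in Tables 5.3 and 5.5 follow the usual definition: the average precision of each query's ranked list, averaged over all queries. A short sketch, assuming binary relevance judgements per rank (function names are illustrative):

import numpy as np

def average_precision(relevance):
    """Average precision for one ranked list; relevance is 1/0 per rank."""
    relevance = np.asarray(relevance, dtype=float)
    hits = np.cumsum(relevance)
    precisions = hits / np.arange(1, len(relevance) + 1)
    if hits[-1] == 0:
        return 0.0
    return float((precisions * relevance).sum() / hits[-1])

def mean_average_precision(ranked_lists):
    aps = [average_precision(r) for r in ranked_lists]
    # Tables 5.3 and 5.5 report the standard deviation in round brackets.
    return np.mean(aps), np.std(aps)

# e.g. a query whose relevant items land at ranks 1 and 3:
# average_precision([1, 0, 1, 0]) == (1/1 + 2/3) / 2 ≈ 0.83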
Results
82 patterns
Local cost function
Distance separability
Intra- vs inter-recording
106
Results
Intra-recording: 1 FPR → 8 TPR
Inter-recording: 1 FPR → 5 TPR
Local cost function → cityblock
• Gulati, S., Serrà, J., Ishwar, V., & Serra, X. (2014). Mining melodic patterns in large audio collections of Indian art music. In SITIS-MIRA, pp. 264–271.
107
Melodic Pattern Processing
Dissimilarity measures and variants
108
Melodic Pattern Processing
Melodic Similarity — Supervised
Pattern Discovery — Unsupervised
Pattern Characterization — Unsupervised
109
Melodic Pattern Characterization
Gamakas & Alankāras
Composition-specific patterns
Rāga motifs
110
[PUO BM 5TM MO UfM U[
Inter-recording Pattern
Detection
Intra-recording Pattern
Discovery
Data Processing
Pattern Network
Generation
Network
Filtering
Community Detection and
Characterization
Rāga motifs
Carnatic music collection
Melodic Pattern Discovery
Pattern Characterization
111
112
Melodic Pattern Characterization
• M. E. J. Newman, "The structure and function of complex networks," SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.
Undirected
Pattern Network Generation
Network Filtering
Community Detection and Characterization
113
Melodic Pattern Characterization
• S. Maslov and K. Sneppen, "Specificity and stability in topology of protein networks," Science, vol. 296, no. 5569, pp. 910–913, 2002.
Pattern Network Generation
Network Filtering
Community Detection and Characterization
114
Melodic Pattern Characterization
• V. D. Blondel, J. L. Guillaume, R. Lambiotte, and E. Lefebvre, "Fast unfolding of communities in large networks," Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, p. P10008, 2008.
• S. Fortunato, "Community detection in graphs," Physics Reports, vol. 486, no. 3, pp. 75–174, 2010.
Pattern Network Generation
Network Filtering
Community Detection and Characterization
115
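Under stated assumptions, these three network stages map directly onto standard graph tooling. A sketch using NetworkX; the pairs input, the threshold, and the function name are hypothetical, and the thesis's own filtering criterion (comparing clustering coefficients against a rewired null model) is only indicated in the comments.

import networkx as nx

def pattern_communities(pairs, threshold, seed=0):
    """`pairs` is an iterable of (pattern_i, pattern_j, distance)
    produced by the inter-recording pattern search."""
    # Undirected pattern network (Newman, 2003): edges join pattern pairs
    # whose melodic distance falls below the similarity threshold.
    G = nx.Graph()
    G.add_edges_from((i, j) for i, j, d in pairs if d <= threshold)

    # Null model for network filtering (Maslov & Sneppen, 2002):
    # degree-preserving rewiring keeps every node's degree but destroys
    # community structure, giving a baseline clustering coefficient.
    G_null = G.copy()
    m = G_null.number_of_edges()
    nx.double_edge_swap(G_null, nswap=m, max_tries=20 * m)

    # Community detection (Blondel et al., 2008); available as
    # nx.community.louvain_communities in NetworkX >= 2.8.
    communities = nx.community.louvain_communities(G, seed=seed)
    return G, G_null, communities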
Melodic Pattern Characterization
Pattern Network Generation
Network Filtering
Community Detection and Characterization

Community Type         Nodes    Recordings   Rāgas
Gamaka                 High     High         High
Composition phrases    Medium   Less         Less
Rāga phrases           Medium   Medium       Less
116
Melodic Pattern Characterization
Pattern Network Generation
Network Filtering
Community Detection and Characterization

Rāga distribution → rāga likelihood

To rank the communities we empirically devise a goodness measure G, which denotes the likelihood that a community Cq represents a rāga motif. We propose to use G = N L^4 / c̄, where L is an estimate of the likelihood of rāga a'q in Cq:

    L = Aq,1 / N

Example: a community of N = 5 nodes with rāga counts (3, 1, 1) gives L = 3/5.
117
Melodic Pattern Characterization
Pattern Network Generation
Network Filtering
Community Detection and Characterization

Recording distribution → centroid

c̄ denotes how uniformly the nodes of the community are distributed over the recordings:

    c̄ = ( Σ_{l=1}^{Lb} l · Bq,l ) / N

Example: N = 5 nodes spread over three recordings as (2, 2, 1) give c̄ = (1×2 + 2×2 + 3×1) / 5 = 9/5.
118
Melodic Pattern Characterization
Pattern Network Generation
Network Filtering
Community Detection and Characterization

Figure 5.24: Graphical representation of the melodic pattern network after filtering with threshold D̃. The detected communities in the network are indicated by different colors; examples of these communities are shown on the right.

Goodness measure: G = N L^4 / c̄, combining the community size N, the rāga likelihood L = Aq,1 / N, and the recording-distribution centroid c̄.

Nodes · Centroid · Rāga likelihood
119
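Combining the three quantities gives the ranking score. A worked example in LaTeX, under the assumption that the example counts on these slides (five nodes, rāga counts (3, 1, 1), recording counts (2, 2, 1)) describe one and the same community:

% Worked example; the pairing of the slide's counts is an assumption.
\[
  L = \frac{A_{q,1}}{N} = \frac{3}{5}, \qquad
  \bar{c} = \frac{1}{N}\sum_{l=1}^{L_b} l \cdot B_{q,l}
          = \frac{1 \cdot 2 + 2 \cdot 2 + 3 \cdot 1}{5} = \frac{9}{5},
\]
\[
  G = \frac{N\,L^{4}}{\bar{c}} = \frac{5 \cdot (3/5)^{4}}{9/5} = 0.36.
\]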
Evaluation
Recordings · Duration · Rāgas · Compositions
10 rāgas × 1 community each → rated phrases
Musicians
120
Results

Rāga          µr    σr     Rāga      µr    σr
Hamsadhvāni   0.84  0.23   Bēgaḍa    0.88  0.11
Kāmavardani   0.78  0.17   Kāpi      0.75  0.10
Darbār        0.81  0.23   Bhairavi  0.91  0.15
Kalyāṇi       0.90  0.10   Behāg     0.84  0.16
Kāmbhōji      0.87  0.12   Tōḍī      0.92  0.07

Table 5.7: Mean (µr) and standard deviation (σr) of µψ for each rāga. Rāgas with µr ≥ 0.85 …

[Figure: histogram of µ values over all rāgas (x: µ from 0 to 1; y: count up to 35)]
121
Results
Per-phrase performance
Per-rāga performance
Musicians' agreement
122
Results: Summary
Mean rating per rāga: 0.75–0.92 (overall mean ≈ 0.85)
• Gulati, S., Serrà, J., Ishwar, V., & Serra, X. (2016). Discovering rāga motifs by characterizing communities in networks of melodic patterns. In ICASSP, pp. 286–290.
123
[Figure: histogram of musicians' ratings (x: rating 1–10; y: count)]
Automatic Rāga Recognition
124
Chapter 6
Automatic Rāga Recognition

6.1 Introduction

In this chapter, we address the task of automatically recognizing rāgas in audio recordings of IAM. We describe two novel approaches for rāga recognition that jointly capture the tonal and the temporal aspects of melody. The contents of this chapter are largely based on our published work in Gulati et al. (2016a,b).

Rāga is a core musical concept used in compositions, performances, music organization, and pedagogy of IAM. Even beyond the art music, numerous compositions in Indian folk and film music are also based on rāgas (Ganti, 2013). Rāga is therefore one of the most desired melodic descriptions of a recorded performance of IAM, and an important criterion used by listeners to browse its audio music collections. Despite its significance, there exists a large volume of audio content whose rāga is incorrectly labeled or, simply, unlabeled. A computational approach to rāga recognition will allow us to automatically annotate large collections of audio music recordings. It will enable rāga-based music retrieval in large audio archives, semantically-meaningful music discovery and musicologically-informed navigation. Furthermore, a deeper understanding of the rāga framework from a computational perspective will pave the way for building applications for music pedagogy in IAM.

Rāga recognition is the most studied research topic in MIR of IAM. There exist a considerable number of approaches utilizing different characteristic aspects of rāgas such as svara set, svara salience and ārōhana-avrōhana. A critical in-depth review of the existing approaches for rāga recognition is presented in Section 2.4.3, wherein we identify several shortcomings in these approaches and possible avenues for scientific contribution to take this task to the next level. Here we provide a short summary of this analysis:

Nearly half of the existing approaches for rāga recognition do not utilize the temporal aspects of melody at all (Table 2.3), which are crucial …
177
Automatic Rāga Recognition

Methods | Svara set | Svara salience | Svara intonation | Ārōhana-avrōhana | Melodic phrases | Svara discretization | Temporal aspects
Pandey et al. (2003) • • Yes •
Chordia & Rae (2007) • • • Yes •
Belle et al. (2009) • • • No
Shetty & Achary (2009) • • Yes •
Sridhar & Geetha (2009) • • Yes •
Koduri et al. (2011) • • Yes
Ranjani et al. (2011) • No
Chakraborty & De (2012) • Yes
Koduri et al. (2012) • • • Both
Chordia & Şentürk (2013) • • • No
Dighe et al. (2013a) • • • Yes •
Dighe et al. (2013b) • • Yes
Koduri et al. (2014) • • • No
Kumar et al. (2014) • • • • Yes •
Dutta et al. (2015)ᵃ • No •
ᵃ This method performs rāga verification and not recognition.
Table 2.3: Rāga recognition methods proposed in the literature along with the melodic characteristics they utilize to perform the task. We also indicate if a method uses a discrete svara representation of melody. The methods are arranged in chronological order.
125
125
3a [YM UO ClSM C O[S U U[
38BACKGROUND
Methods Svara set
Svara
salience
Svara
intonation
¯ar¯ohana-
avr¯ohana
Melodic
phrases
Svara
discretization
Temporal
aspects
Pandey et al. (2003) • • Yes •
Chordia & Rae (2007) • • • Yes •
Belle et al. (2009) • • • No
Shetty & Achary (2009) • • Yes •
Sridhar & Geetha (2009) • • Yes •
Koduri et al. (2011) • • Yes
Ranjani et al. (2011) • No
Chakraborty & De (2012) • Yes
Koduri et al. (2012) • • • Both
Chordia & ¸Sentürk (2013) • • • No
Dighe et al. (2013a) • • • Yes •
Dighe et al. (2013b) • • Yes
Koduri et al. (2014) • • • No
Kumar et al. (2014) • • • • Yes •
Dutta et al. (2015)a
• No •
a This method performs r¯aga verification and not recognition
Table 2.3: R¯aga recognition methods proposed in the literature along with the melodic characteristics they utilize to perform the task. We also
indicate if a method uses a discrete svara representation of melody. The methods are arranged in chronological order.
e I [ ? D G X < a E
126
3a [YM UO ClSM C O[S U U[
38BACKGROUND
Methods Svara set
Svara
salience
Svara
intonation
¯ar¯ohana-
avr¯ohana
Melodic
phrases
Svara
discretization
Temporal
aspects
Pandey et al. (2003) • • Yes •
Chordia & Rae (2007) • • • Yes •
Belle et al. (2009) • • • No
Shetty & Achary (2009) • • Yes •
Sridhar & Geetha (2009) • • Yes •
Koduri et al. (2011) • • Yes
Ranjani et al. (2011) • No
Chakraborty & De (2012) • Yes
Koduri et al. (2012) • • • Both
Chordia & ¸Sentürk (2013) • • • No
Dighe et al. (2013a) • • • Yes •
Dighe et al. (2013b) • • Yes
Koduri et al. (2014) • • • No
Kumar et al. (2014) • • • • Yes •
Dutta et al. (2015)a
• No •
a This method performs r¯aga verification and not recognition
Table 2.3: R¯aga recognition methods proposed in the literature along with the melodic characteristics they utilize to perform the task. We also
indicate if a method uses a discrete svara representation of melody. The methods are arranged in chronological order.
e I [ ? D G X < a E
127
3a [YM UO ClSM C O[S U U[
38BACKGROUND
Methods Svara set
Svara
salience
Svara
intonation
¯ar¯ohana-
avr¯ohana
Melodic
phrases
Svara
discretization
Temporal
aspects
Pandey et al. (2003) • • Yes •
Chordia & Rae (2007) • • • Yes •
Belle et al. (2009) • • • No
Shetty & Achary (2009) • • Yes •
Sridhar & Geetha (2009) • • Yes •
Koduri et al. (2011) • • Yes
Ranjani et al. (2011) • No
Chakraborty & De (2012) • Yes
Koduri et al. (2012) • • • Both
Chordia & ¸Sentürk (2013) • • • No
Dighe et al. (2013a) • • • Yes •
Dighe et al. (2013b) • • Yes
Koduri et al. (2014) • • • No
Kumar et al. (2014) • • • • Yes •
Dutta et al. (2015)a
• No •
a This method performs r¯aga verification and not recognition
Table 2.3: R¯aga recognition methods proposed in the literature along with the melodic characteristics they utilize to perform the task. We also
indicate if a method uses a discrete svara representation of melody. The methods are arranged in chronological order.
e I [ ? D G X < a E
128
3a [YM UO ClSM C O[S U U[
38BACKGROUND
Methods Svara set
Svara
salience
Svara
intonation
¯ar¯ohana-
avr¯ohana
Melodic
phrases
Svara
discretization
Temporal
aspects
Pandey et al. (2003) • • Yes •
Chordia & Rae (2007) • • • Yes •
Belle et al. (2009) • • • No
Shetty & Achary (2009) • • Yes •
Sridhar & Geetha (2009) • • Yes •
Koduri et al. (2011) • • Yes
Ranjani et al. (2011) • No
Chakraborty & De (2012) • Yes
Koduri et al. (2012) • • • Both
Chordia & ¸Sentürk (2013) • • • No
Dighe et al. (2013a) • • • Yes •
Dighe et al. (2013b) • • Yes
Koduri et al. (2014) • • • No
Kumar et al. (2014) • • • • Yes •
Dutta et al. (2015)a
• No •
a This method performs r¯aga verification and not recognition
Table 2.3: R¯aga recognition methods proposed in the literature along with the melodic characteristics they utilize to perform the task. We also
indicate if a method uses a discrete svara representation of melody. The methods are arranged in chronological order.
129
3a [YM UO ClSM C O[S U U[
38BACKGROUND
Methods Svara set
Svara
salience
Svara
intonation
¯ar¯ohana-
avr¯ohana
Melodic
phrases
Svara
discretization
Temporal
aspects
Pandey et al. (2003) • • Yes •
Chordia & Rae (2007) • • • Yes •
Belle et al. (2009) • • • No
Shetty & Achary (2009) • • Yes •
Sridhar & Geetha (2009) • • Yes •
Koduri et al. (2011) • • Yes
Ranjani et al. (2011) • No
Chakraborty & De (2012) • Yes
Koduri et al. (2012) • • • Both
Chordia & ¸Sentürk (2013) • • • No
Dighe et al. (2013a) • • • Yes •
Dighe et al. (2013b) • • Yes
Koduri et al. (2014) • • • No
Kumar et al. (2014) • • • • Yes •
Dutta et al. (2015)a
• No •
a This method performs r¯aga verification and not recognition
Table 2.3: R¯aga recognition methods proposed in the literature along with the melodic characteristics they utilize to perform the task. We also
indicate if a method uses a discrete svara representation of melody. The methods are arranged in chronological order.
Ix[U 9 Ix[U : Ix[U
130
3a [YM UO ClSM C O[S U U[
38BACKGROUND
Methods Svara set
Svara
salience
Svara
intonation
¯ar¯ohana-
avr¯ohana
Melodic
phrases
Svara
discretization
Temporal
aspects
Pandey et al. (2003) • • Yes •
Chordia & Rae (2007) • • • Yes •
Belle et al. (2009) • • • No
Shetty & Achary (2009) • • Yes •
Sridhar & Geetha (2009) • • Yes •
Koduri et al. (2011) • • Yes
Ranjani et al. (2011) • No
Chakraborty & De (2012) • Yes
Koduri et al. (2012) • • • Both
Chordia & ¸Sentürk (2013) • • • No
Dighe et al. (2013a) • • • Yes •
Dighe et al. (2013b) • • Yes
Koduri et al. (2014) • • • No
Kumar et al. (2014) • • • • Yes •
Dutta et al. (2015)a
• No •
a This method performs r¯aga verification and not recognition
Table 2.3: R¯aga recognition methods proposed in the literature along with the melodic characteristics they utilize to perform the task. We also
indicate if a method uses a discrete svara representation of melody. The methods are arranged in chronological order.
131
3a [YM UO ClSM C O[S U U[
38BACKGROUND
Methods Svara set
Svara
salience
Svara
intonation
¯ar¯ohana-
avr¯ohana
Melodic
phrases
Svara
discretization
Temporal
aspects
Pandey et al. (2003) • • Yes •
Chordia & Rae (2007) • • • Yes •
Belle et al. (2009) • • • No
Shetty & Achary (2009) • • Yes •
Sridhar & Geetha (2009) • • Yes •
Koduri et al. (2011) • • Yes
Ranjani et al. (2011) • No
Chakraborty & De (2012) • Yes
Koduri et al. (2012) • • • Both
Chordia & ¸Sentürk (2013) • • • No
Dighe et al. (2013a) • • • Yes •
Dighe et al. (2013b) • • Yes
Koduri et al. (2014) • • • No
Kumar et al. (2014) • • • • Yes •
Dutta et al. (2015)a
• No •
a This method performs r¯aga verification and not recognition
Table 2.3: R¯aga recognition methods proposed in the literature along with the melodic characteristics they utilize to perform the task. We also
indicate if a method uses a discrete svara representation of melody. The methods are arranged in chronological order.
132
3a [YM UO ClSM C O[S U U[
38BACKGROUND
Methods Svara set
Svara
salience
Svara
intonation
¯ar¯ohana-
avr¯ohana
Melodic
phrases
Svara
discretization
Temporal
aspects
Pandey et al. (2003) • • Yes •
Chordia & Rae (2007) • • • Yes •
Belle et al. (2009) • • • No
Shetty & Achary (2009) • • Yes •
Sridhar & Geetha (2009) • • Yes •
Koduri et al. (2011) • • Yes
Ranjani et al. (2011) • No
Chakraborty & De (2012) • Yes
Koduri et al. (2012) • • • Both
Chordia & ¸Sentürk (2013) • • • No
Dighe et al. (2013a) • • • Yes •
Dighe et al. (2013b) • • Yes
Koduri et al. (2014) • • • No
Kumar et al. (2014) • • • • Yes •
Dutta et al. (2015)a
• No •
a This method performs r¯aga verification and not recognition
Table 2.3: R¯aga recognition methods proposed in the literature along with the melodic characteristics they utilize to perform the task. We also
indicate if a method uses a discrete svara representation of melody. The methods are arranged in chronological order.
Ix[U MYe]Z]WUg]ba
133
Automatic Rāga Recognition
Method | Tonal Feature | Tonic Identification | Feature | Recognition Method | #Rāgas | Dataset (Dur./Num.)ᵃ | Audio Type
Pandey et al. (2003) Pitch (Boersma & Weenink, 2001) NA Svara sequence HMM and n-Gram 2 - / 31 MP
Chordia & Rae (2007) Pitch (Sun, 2000) Manual PCD, PCDD SVM classifier 31 20 / 127 MP
Belle et al. (2009) Pitch (Rao & Rao, 2009) Manual PCD (parameterized) k-NN classifier 4 0.6 / 10 PP
Shetty & Achary (2009) Pitch (Sridhar & Geetha, 2006) NA #Svaras, Vakra svaras Neural Network
classifier
20 - / 90 MP
Sridhar & Geetha (2009) Pitch (Lee, 2006) Singer
identification
Svara set, its
sequence
String matching 3 - / 30 PP
Koduri et al. (2011) Pitch (Rao & Rao, 2010) Brute force PCD k-NN classifier 10 2.82 / 170 PP
Ranjani et al. (2011) Pitch (Boersma & Weenink, 2001) GMM fitting PDE SC-GMM and Set
matching
7 - / 48 PP
Chakraborty & De (2012) Pitch (Sengupta, 1990) Error
minimization
Svara set Set matching - - / - -
Koduri et al. (2012) Predominant pitch (Salamon &
Gómez, 2012)
Multipitch-based PCD variants k-NN classifier 43 - / 215 PP
Chordia & Şentürk (2013) Pitch (Camacho, 2007) Brute force PCD variants k-NN and statistical
classifiers
31 20 / 127 MP
Dighe et al. (2013a) Chroma (Lartillot et al., 2008) Brute force
(vādi-based)
Chroma, Timbre
features
HMM 4 9.33 / 56 PP
Dighe et al. (2013b) Chroma (Lartillot et al., 2008) Brute force
(vādi-based)
PCD variant RF classifier 8 16.8 / 117 PP
Koduri et al. (2014) Predominant pitch (Salamon &
Gómez, 2012)
Multipitch-based PCD (parameterized) Different classifiers 45? 93/424 PP
Kumar et al. (2014) Predominant pitch (Salamon &
Gómez, 2012)
Brute force PCD + n-Gram
distribution
SVM classifier 10 2.82 / 170 PP
Dutta et al. (2015)* Predominant pitch (Salamon &
Gómez, 2012)
Cepstrum-based Pitch contours LCS with k-NN 30† 3 / 254 PP
ᵃ In the case of multiple datasets we list the larger one. * This method performs rāga verification and not recognition.
? Authors do not use all 45 rāgas at once in a single experiment, but consider groups of 3 rāgas per experiment. † Authors finally use only 17 rāgas in their experiment.
ABBREVIATIONS: Dur.: Duration of the dataset, Num.: Number of recordings, NA: Not applicable, ‘-’: Not available, SC-GMM: semi-continuous GMM, MP: Monophonic, PP:
Polyphonic
Table 2.4: Summary of the rāga recognition methods proposed in the literature. The methods are arranged in chronological order.
134
Automatic Rāga Recognition
135
Datasets
Carnatic music dataset: Recordings · Duration · Lead Artists · Concerts · Rāgas
Hindustani music dataset: Recordings · Duration · Lead Artists · Releases · Rāgas
136
Proposed Approaches
Phrase-based Rāga Recognition
TDMS-based Rāga Recognition
Tonal + Temporal · Discretization
[Figure: two self-similarity matrices over time, panels (a) and (b)]
137
Proposed Approaches
Phrase-based Rāga Recognition
TDMS-based Rāga Recognition
Tonal + Temporal · Discretization
138
Phrase-based Rāga Recognition
Inter-recording
Pattern Detection
Intra-recording
Pattern Discovery
Data
Processing
Music collection
Melodic Pattern Discovery
Vocabulary
Extraction
Term-frequency
Feature Extraction
Feature
Normalization
TF-IDF Feature Extraction
Feature Matrix
Pattern Network
Generation
Similarity
Thresholding
Community
Detection
Melodic Pattern Clustering
139
Phrase-based Rāga Recognition
140
Phrase-based Rāga Recognition
141
Phrase-based Rāga Recognition
142
Phrase-based Rāga Recognition
Topic Modeling ↔ Text Classification
Phrases ↔ Words
Rāga ↔ Topic
Recordings ↔ Documents
143
Phrase-based Rāga Recognition
Topic Modeling ↔ Text Classification
Term Frequency – Inverse Document Frequency (TF-IDF)
144
TF-IDF Feature Extraction
Vocabulary Extraction
Term-frequency Feature Extraction
Feature Normalization
145
TF-IDF Feature Extraction
Vocabulary Extraction
Term-frequency Feature Extraction
Feature Normalization
146
TF-IDF Feature Extraction
Vocabulary Extraction
Term-frequency Feature Extraction
Feature Normalization
Example term-frequency counts: 1, 4 · 2, 2 · 1, 1 · 3, 3, 3
147
TF-IDF Feature Extraction
Vocabulary Extraction
Term-frequency Feature Extraction
Feature Normalization
We consider all the communities except the ones that comprise patterns extracted from only a single audio recording. Such communities are analogous to the words that only occur within a document and, hence, are irrelevant for modeling a topic.

We experiment with three different sets of features f1, f2 and f3, which are similar to the TF-IDF features typically used in text information retrieval. We denote our corpus by R, comprising N_R = |R| recordings. A melodic phrase and a recording are denoted by ψ and r, respectively:

    f1(ψ, r) = 1 if f(ψ, r) > 0, and 0 otherwise    (6.1)

where f(ψ, r) denotes the raw frequency of occurrence of pattern ψ in recording r. f1 only considers the presence or absence of a pattern in a recording. In order to investigate if the frequency of occurrence of melodic patterns is relevant for characterizing rāgas, we take f2(ψ, r) = f(ψ, r). As mentioned, the melodic patterns that occur across different rāgas and in several recordings do not aid rāga recognition. Therefore, to reduce their effect in the feature vector we employ a weighting scheme similar to the inverse document frequency (idf) weighting in text retrieval:

    f3(ψ, r) = f(ψ, r) · I(ψ, R)    (6.2)

    I(ψ, R) = log( N_R / |{r ∈ R : ψ ∈ r}| )    (6.3)

148
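A compact sketch of Eqs. (6.1)–(6.3), assuming a counts matrix with one row per recording and one column per pattern community; the names are illustrative, not the thesis code.

import numpy as np

def tf_idf_features(counts):
    """counts[r, p] = f(psi, r): occurrences of pattern p in recording r."""
    counts = np.asarray(counts, dtype=float)
    n_rec = counts.shape[0]                      # N_R = |R|
    f1 = (counts > 0).astype(float)              # presence/absence, Eq. (6.1)
    f2 = counts                                  # raw term frequency
    doc_freq = np.maximum(f1.sum(axis=0), 1.0)   # |{r in R : psi in r}|
    idf = np.log(n_rec / doc_freq)               # Eq. (6.3)
    f3 = counts * idf                            # Eq. (6.2)
    return f1, f2, f3

Each recording's row of f1, f2 or f3 then serves as its feature vector for the classifiers compared below.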
Results

Method  Feature    SVML   SGD    NBM    NBG    RF     LR     1-NN
MVSM    f1         51.04  55     37.5   54.37  25.41  55.83  -
MVSM    f2         45.83  50.41  35.62  47.5   26.87  51.87  -
MVSM    f3         45.83  51.66  67.29  44.79  23.75  51.87  -
MPC     PCDfull    -      -      -      -      -      -      73.12
MGK     PDparam    30.41  22.29  27.29  28.12  42.91  30.83  25.62
MGK     PDcontext  54.16  43.75  5.2    33.12  49.37  54.79  26.25

Table 6.1: Accuracy (%) of rāga recognition on the RRDCMD dataset by MVSM and other methods using different features and classifiers. Bold text signifies the best accuracy by a method among all its variants.

… Hindustani music recordings. We therefore do not consider this method for comparing results on Hindustani music. The authors of MPC courteously ran the experiments on our dataset using the original implementations of the method. For MGK, the authors kindly extracted the features (PDparam and PDcontext) using the original implementation of their method, and the experiments using different classification strategies were done by us.

6.2.3 Results and Discussion

Before we proceed to present our results, we notify readers that the accuracies reported in this section for different methods vary slightly from the ones reported in Gulati …

Method  Feature  SVML   SGD    NBM    NBG    RF     LR     1-NN
MVSM    f1       71     72.33  69.33  79.33  38.66  74.33  -
MVSM    f2       65.33  64.33  67.66  72.66  40.33  68     -
MVSM    f3       65.33  62.66  82.66  72     41.33  67.66  -
MPC     PCDfull  -      -      -      -      -      -      91.66

Table 6.2: Accuracy (%) of rāga recognition on the RRDHMD dataset by MVSM and other methods using different features and classifiers. Bold text signifies the best accuracy by a method among all its variants.

[Figure: clustering coefficient C(G) vs C(Gr) and accuracy of MVSM as a function of the bin index, for both datasets]
149
Results
Compare features
State of the art
150
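The classifier abbreviations in Tables 6.1 and 6.2 correspond to standard models. A sketch of the comparison grid using scikit-learn, assuming these mappings and an illustrative cross-validation setup; the thesis's exact folds and hyperparameters may differ.

from sklearn.svm import LinearSVC
from sklearn.linear_model import SGDClassifier, LogisticRegression
from sklearn.naive_bayes import MultinomialNB, GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Assumed mapping of the table abbreviations to estimators.
classifiers = {
    "SVML": LinearSVC(),                       # linear-kernel SVM
    "SGD": SGDClassifier(),
    "NBM": MultinomialNB(),                    # needs non-negative features
    "NBG": GaussianNB(),
    "RF": RandomForestClassifier(),
    "LR": LogisticRegression(max_iter=1000),
}

def compare(X, y, cv=12):
    # X: one feature row per recording (e.g. the f3 TF-IDF features);
    # y: raga labels; cv folds are illustrative.
    return {name: cross_val_score(clf, X, y, cv=cv).mean()
            for name, clf in classifiers.items()}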
Results: Summary
                                      Carnatic (%)   Hindustani (%)
Chordia & Şentürk (2013), PCDfull          73              92
Proposed, TF-IDF (f3)                      67              83
Koduri et al. (2014)                       55              -
• Gulati, S., Serrà, J., Ishwar, V., Şentürk, S., & Serra, X. (2016). Phrase-based rāga recognition using vector space modeling. In ICASSP, pp. 66–70.
151
Error Analysis
[Confusion matrix over the 40 rāgas of the dataset, 12 recordings per rāga; diagonal entries dominate. Rāgas:]
R1-Ṣanmukhapriya, R2-Kāpi, R3-Bhairavi, R4-Madhyamāvati, R5-Bilahari, R6-Mōhanaṁ, R7-Sencuruṭṭi, R8-Śrīranjani, R9-Rītigauḷa, R10-Hussēnī, R11-Dhanyāsi, R12-Aṭāna, R13-Behāg, R14-Suraṭi, R15-Kāmavardani, R16-Mukhāri, R17-Sindhubhairavi, R18-Sahānā, R19-Kānaḍa, R20-Māyāmāḷavagauḷa, R21-Nāṭa, R22-Śankarābharaṇaṁ, R23-Sāvēri, R24-Kamās, R25-Tōḍi, R26-Bēgaḍa, R27-Harikāmbhōji, R28-Śrī, R29-Kalyāṇi, R30-Sāma, R31-Nāṭakurinji, R32-Pūrvīkaḷyāṇi, R33-Yadukulakāṁbōji, R34-Dēvagāndhāri, R35-Kēdāragauḷa, R36-Ānandabhairavi, R37-Gauḷa, R38-Varāḷi, R39-Kāṁbhōji, R40-Karaharapriya
152
Error Analysis
Phrase-based rāgas
153
Error Analysis
Allied rāgas
154
Computational Approaches for Melodic Description in Indian Art Music Corpora
  • 1. Computational Approaches for Melodic Description in Indian Art Music Corpora. Doctoral Thesis Presentation, Music Technology Group, Department of Information and Communication Technologies, Universitat Pompeu Fabra, Barcelona, Spain. November 2016.
  • 2. Outline: Introduction · Music Corpora and Datasets · Melody Descriptors and Representations · Melodic Pattern Processing · Applications · Summary and Future Perspectives · Rāga Recognition
  • 3. … Therefore, different versions of the test dataset used in a specific experiment should be retained to ensure reproducibility of the research results. Note that a test dataset should not be confused with the training and testing splits of a dataset, which are the terms used in the context of a cross-validation experimental setup.
In MIR, a considerable number of the computational approaches follow a data-driven methodology, and hence, a well curated research corpus becomes a key factor in determining the success of these approaches. Due to the importance of a good data corpus in research, building a corpus is in itself a fundamental research task (MacMullen, 2003). MIR can be regarded as a relatively new research area within information retrieval, which has primarily gained popularity in the last two decades. Even today, a significant number of the studies in MIR use ad-hoc procedures to build a collection of data to be used in the experiments. Quite often the audio recordings used in the experiments are taken from a researcher's personal music collection. Availability of a good representative research corpus has been a challenge in MIR (Serra, 2014). …

Chapter 4: Melody Descriptors and Representations. 4.1 Introduction. In this chapter, we describe methods for extracting relevant melodic descriptors and low-level melody representations from raw audio signals. These melodic features are used as inputs to perform higher-level melodic analyses by the methods described in the subsequent chapters. Since these features are used in a number of computational tasks, we consolidate and present these common processing blocks in the current chapter. This chapter is largely based on our published work presented in Gulati et al. (2014a,b,c). There are four sections in this chapter, and in each section, we describe the extraction of a different melodic descriptor or representation. In Section 4.2, we focus on the task of automatically identifying the tonic pitch in an audio recording. Our main objective is to perform a comparative evaluation of different existing methods and select the best method to be used in our work. In Section 4.3, we present the method used to extract the predominant pitch from audio signals, and describe the post-processing steps to reduce frequently occurring errors. In Section 4.4, we describe the process of segmenting the solo percussion regions (Tani sections) in the audio recordings of Carnatic music. In Section 4.5, we describe our nyās-based approach for segmenting melodies in Hindustani music.

Chapter 5: Melodic Pattern Processing: Similarity, Discovery and Characterization. 5.1 Introduction. In this chapter, we present our methodology for discovering musically relevant melodic patterns in sizable audio collections of IAM. We address three main computational tasks involved in this process: melodic similarity, pattern discovery and characterization of the discovered melodic patterns. We refer to these different tasks jointly as melodic pattern processing. "Only by repetition can a series of tones be characterized as something definite. Only repetition can demarcate a series of tones and its purpose. Repetition thus is the basis of music as an art" (Schenker et al., 1980). Repeating patterns are at the core of music. Consequently, analysis of patterns is fundamental in music analysis. In IAM, recurring melodic patterns are the building blocks of melodic structures. They provide a base for improvisation and composition, and thus, are crucial to the analysis and description of rāgas, compositions, and artists in this music tradition. A detailed account of the importance of melodic patterns in IAM is provided in Section 2.3.2.
To recapitulate, from the literature review presented in Section 2.4.2 and Section 2.5.2 we see that the approaches for pattern processing in music can be broadly put into two categories (Figure 5.1). One of the types of approaches perform pattern detection (or matching) and follow a supervised methodology. In these approaches the system knows a priori the pattern to be extracted. Typically such a system is fed with exem- plar patterns or queries and is expected to extract all of their occurrences in a piece of 121 Chapter 6 Automatic R¯aga Recognition 6.1 Introduction In this chapter, we address the task of automatically recognizing r¯agas in audio re- cordings of IAM. We describe two novel approaches for r¯aga recognition that jointly capture the tonal and the temporal aspects of melody. The contents of this chapter are largely based on our published work in Gulati et al. (2016a,b). R¯aga is a core musical concept used in compositions, performances, music organiz- ation, and pedagogy of IAM. Even beyond the art music, numerous compositions in Indian folk and film music are also based on r¯agas (Ganti, 2013). R¯aga is therefore one of the most desired melodic descriptions of a recorded performance of IAM, and an important criterion used by listeners to browse its audio music collections. Despite its significance, there exists a large volume of audio content whose r¯aga is incorrectly labeled or, simply, unlabeled. A computational approach to r¯aga recognition will al- low us to automatically annotate large collections of audio music recordings. It will enable r¯aga-based music retrieval in large audio archives, semantically-meaningful music discovery and musicologically-informed navigation. Furthermore, a deeper understanding of the r¯aga framework from a computational perspective will pave the way for building applications for music pedagogy in IAM. R¯aga recognition is the most studied research topic in MIR of IAM. There exist a considerable number of approaches utilizing different characteristic aspects of r¯agas such as svara set, svara salience and ¯ar¯ohana-avr¯ohana. A critical in-depth review of the existing approaches for r¯aga recognition is presented in Section 2.4.3, wherein we identify several shortcomings in these approaches and possible avenues for scientific contribution to take this task to the next level. Here we provide a short summary of this analysis: Nearly half of the number of the existing approaches for r¯aga recognition do not utilize the temporal aspects of melody at all (Table 2.3), which are crucial 177 Chapter 7 Applications 7.1 Introduction The research work presented in this thesis has several applications. A number of these applications have already been described in Section 1.1. While some applications such as r¯aga-based music retrieval and music discovery are relevant mainly in the context of large audio collections, there are several applications that can be developed on the scale of the music collection compiled in the CompMusic project. In this chapter, we present some concrete examples of such applications that have already incorporated parts of our work. We provide a brief overview of Dunya, a collection of the music corpora and software tools developed during the CompMusic project and, Sar¯aga and Riy¯az, the mobile applications developed within the technology transfer project, Culture Aware MUsic Technologies (CAMUT). In addition, we present three web demos that showcase parts of the outcomes of our computational methods. 
We also briefly present one of our recent studies that performs a musicologically motivated exploration of melodic structures in IAM. It serves as an example of how our methods can facilitate investigations in computational musicology. 7.2 Dunya. Dunya (http://dunya.compmusic.upf.edu/) comprises the music corpora and the software tools that have been developed as part of the CompMusic project. It includes data for five music traditions: Hindustani, Carnatic, Turkish makam, jingju and Andalusian music. By providing access to the gathered data (such as audio recordings and metadata) and the data generated in the project (such as audio descriptors), Dunya aims to facilitate the study and exploration of relevant musical aspects of different music repertoires. The CompMusic corpora mainly comprise audio recordings and complementary information that describes those recordings. This complementary information can be in the form of the …
Chapter 8: Summary and Future Perspectives. 8.1 Introduction. In this thesis, we have presented a number of computational approaches for analyzing melodic elements at different levels of melodic organization in IAM. In tandem, these approaches generate a high-level melodic description of the audio collections in this music tradition. They build upon each other to finally allow us to achieve the goals that we set at the beginning of this thesis, which are: to curate and structure representative music corpora of IAM that comprise audio recordings and the associated metadata, and use them to compile sizable and well annotated test datasets for melodic analyses; to develop data-driven computational approaches for the discovery and characterization of musically relevant melodic patterns in sizable audio collections of IAM; and to devise computational approaches for automatically recognizing rāgas in recorded performances of IAM. Based on the results presented in this thesis, we can now say that our goals are successfully met. We started this thesis by presenting our motivation behind the analysis and description of melodies in IAM, highlighting the opportunities and challenges that this music tradition offers within the context of music information research and the CompMusic project (Chapter 1). We provided a brief introduction to IAM and its melodic organization, and critically reviewed the existing literature on related topics within the context of MIR and IAM (Chapter 2). In Chapter 3, we provided a comprehensive overview of the music corpora and datasets that were compiled and structured in our work as a part of the CompMusic project. To the best of our knowledge, these corpora comprise the largest audio collections of IAM along with curated metadata and automatically extracted music …
Chapter 2: Background. 2.1 Introduction. In this chapter, we present our review of the existing literature related to the work presented in this thesis. In addition, we also present a brief overview of the relevant musical and mathematical background. We start with a brief discussion on the terminology used in this thesis (Section 2.2). Subsequently, we provide an overview of selected music concepts to better understand the computational tasks addressed in our work (Section 2.3). We then present a review of the relevant literature, which we divide into two parts. We first present a review of the work done in computational analysis of IAM (Section 2.4). This includes approaches for tonic identification, melodic pattern processing and automatic rāga recognition.
In the second part, we present relevant work done in MIR in general, including topics related to pattern processing (detection and discovery) and key estimation (Section 2.5). Finally, we provide a brief overview of selected scientific concepts (Section 2.6). 2.2 Terminology. In this section, we provide our working definitions of selected terms that we have used throughout the thesis. To begin with, it is important to understand the meaning of melody in the context of this thesis. As we see from the literature, defining melody has in itself been a challenge, with no consensus on a single definition of melody (Gómez et al., 2003; Salamon, 2013). We do not aim here to formally define melody in the universal sense, but present its working definition within the scope of this thesis. Before we proceed, it is important to understand the setup of a concert of IAM. Every performance of IAM has a lead artist, who plays the central role (also literally positioned in the center of the stage), and all other instruments are considered as accompaniments. There is a main melody line by the lead performer, and typically, also a melodic accompaniment that follows the main melody. A more detailed …
  • 4. Introduction — Chapter 1: Introduction. Information technology (IT) is constantly shaping the world that we live in. Its advancements have changed human behavior and influenced the way we connect with our environment. Music, being a universal socio-cultural phenomenon, has been deeply influenced by these advancements. The way music is created, stored, disseminated, listened to, and even learned has changed drastically over the last few decades. A massive amount of audio music content is now available on demand. Thus, it becomes necessary to develop computational techniques that can process and automatically describe large volumes of digital music content to facilitate novel ways of interaction with it. There are different information sources such as editorial metadata, social data, and audio recordings that can be exploited to generate a description of music. Melody, along with harmony and rhythm, is a fundamental facet of music, and therefore, an essential component in its description. In this thesis, we focus on describing melodic aspects of music through an automated analysis of audio content. 1.1 Motivation. “It is melody that enables us to distinguish one work from another. It is melody that human beings are innately able to reproduce by singing, humming, and whistling. It is melody that makes music memorable: we are likely to recall a tune long after we have forgotten its text.” (Selfridge-Field, 1998) The importance of melody in our musical experiences makes its analysis and description a crucial component in music content processing. It becomes even more important for melody-dominant music traditions such as Indian art music (IAM), where the concept of harmony (functional harmony as understood in common practice) does not exist, and the complex melodic structure takes the central role in music aesthetics.
  • 5. Motivation: automatic music description
  • 6. Motivation: similarity · discovery · search · radio · recommendation · pedagogy · education · entertainment · enhanced listening · computational musicology
  • 7. Motivation: representation · segmentation · similarity · motifs · discovery · pattern characterization → melodic description
  • 8. CompMusic Project: description · data-driven · culture-aware · open access · reproducibility • Serra, X. (2011). A multicultural approach to music information research. In Proc. of ISMIR, pp. 151–156.
  • 9. Indian art music: Hindustani (Kaustuv K. Ganguli) · Carnatic (Vignesh Ishwar)
  • 10. Terminology (Carnatic music): § Tonic: base pitch, varies across artists § Melody: pitch of the predominant melodic source § Rāga: melodic framework § Svara: analogous to a note § Ārōhana-avrōhana: ascending-descending progression § Chalan: abstraction of the melodic progression § Motifs: characteristic melodic patterns § Gamakas: gliding melodic gestures
  • 11. Opportunities and Challenges: § opportunities
  • 12. Opportunities and Challenges: § opportunities § well-studied music tradition
  • 13. Opportunities and Challenges: § opportunities § well-studied music tradition § heterophonic acoustic characteristics
  • 14. Opportunities and Challenges: § opportunities § well-studied music tradition § heterophonic acoustic characteristics • http://www.music-ir.org/mirex/wiki/MIREX_HOME • Salamon, J. & Gómez, E. (2012). Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770. [Figure: predominant melody extraction accuracy, MIREX vs. Indian music]
  • 15. Opportunities and Challenges: § opportunities § well-studied music tradition § heterophonic acoustic characteristics § melodic patterns → building blocks
  • 16. Opportunities and Challenges: § opportunities § well-studied music tradition § heterophonic acoustic characteristics § melodic patterns → building blocks § rāga framework: “The rāga is more fixed than a mode, and less fixed than the melody, beyond the mode and short of melody, and richer both than a given mode or a given melody.” [Martinez, p. 96] Mode < Rāga < Melody • Martinez, J. L. (2001). Semiosis in Hindustani Music. Motilal Banarsidass Publishers.
  • 17. Opportunities and Challenges: § challenges
  • 18. Opportunities and Challenges: § challenges § improvisatory music tradition [Figure: pitch contour over time]
  • 19. Opportunities and Challenges: § challenges § improvisatory music tradition § melodic transcription: changing and ill-defined [Figure: pitch contour (cents) with a characteristic movement marked] • http://www.freesound.org/people/sankalp/sounds/360428/
  • 20. Opportunities and Challenges: § challenges § improvisatory music tradition § melodic transcription: changing and ill-defined § no standard reference frequency — tonic pitch (examples: ≈98, 110, 116, 123 Hz)
  • 21. Opportunities and Challenges: § challenges § improvisatory music tradition § melodic transcription: changing and ill-defined § no standard reference frequency — tonic pitch § long audio recordings (~1 hour)
  • 22. Broad Objectives: § compilation and curation of corpora of IAM § discovery and characterization of melodic patterns § automatic rāga recognition
  • 23. Computational Tasks: melodic description
  • 24. Computational Tasks: tonic identification (+ evaluation) → melodic description
  • 25. Computational Tasks: tonic identification (+ evaluation) · nyās segmentation → melodic description
  • 26. Computational Tasks: tonic identification (+ evaluation) · nyās segmentation · melodic similarity → melodic description
  • 27. Computational Tasks: tonic identification (+ evaluation) · nyās segmentation · melodic similarity · pattern discovery → melodic description
  • 28. Computational Tasks: tonic identification (+ evaluation) · nyās segmentation · melodic similarity · pattern discovery · pattern characterization → melodic description
  • 29. Computational Tasks: tonic identification (+ evaluation) · nyās segmentation · melodic similarity · pattern discovery · pattern characterization · automatic rāga recognition → melodic description
  • 30. Music Corpora and Datasets — Chapter 3: Music Corpora and Datasets. 3.1 Introduction. A research corpus is a collection of data compiled to study a research problem. A well designed research corpus is representative of the domain under study. It is practically infeasible to work with the entire universe of data. Therefore, to ensure scalability of information retrieval technologies to real-world scenarios, it is important to develop and test computational approaches using a representative data corpus. Moreover, an easily accessible data corpus provides a common ground for researchers to evaluate their methods, and thus, accelerates knowledge sharing and advancement. Not every computational task requires the entire research corpus for the development and evaluation of approaches. Typically, a subset of the corpus is used in a specific research task. We call this subset a test corpus or test dataset. The models built over a test dataset can later be extended to the entire research corpus. A test dataset is a static collection of data specific to an experiment, as opposed to a research corpus, which can evolve over time. Therefore, different versions of the test dataset used in a specific experiment should be retained to ensure reproducibility of the research results. Note that a test dataset should not be confused with the training and testing splits of a dataset, which are the terms used in the context of a cross-validation experimental setup. In MIR, a considerable number of the computational approaches follow a data-driven methodology, and hence, a well curated research corpus becomes a key factor in determining the success of these approaches. Due to the importance of a good data corpus in research, building a corpus is in itself a fundamental research task (MacMullen, 2003). MIR can be regarded as a relatively new research area within information retrieval, which has primarily gained popularity in the last two decades. Even today, a significant number of the studies in MIR use ad-hoc procedures to build a collection of data to be used in the experiments. Quite often the audio recordings used in the experiments are taken from a researcher’s personal music collection. Availability of a good representative research corpus has been a challenge in MIR (Serra, 2014). This …
  • 31. CompMusic Corpora
  • 32. CompMusic Corpora: § team effort
  • 33. CompMusic Corpora: § team effort § procedure and design criteria • Serra, X. (2014). Creating research corpora for the computational study of music: the case of the CompMusic project. In Proc. of the 53rd AES Int. Conf. on Semantic Audio. London. • Srinivasamurthy, A., Koduri, G. K., Gulati, S., Ishwar, V., & Serra, X. (2014). Corpora for music information research in Indian art music. In Int. Computer Music Conf./Sound and Music Computing Conf., pp. 1029–1036. Athens, Greece.
  • 34. CompMusic Corpora
  • 35. CompMusic Corpora: audio · editorial metadata · MusicBrainz IDs
  • 36. Carnatic and Hindustani Music Corpora [table: corpus statistics]
  • 37. Open-access Music Corpora [table: corpus statistics]
  • 38. Open-access Music Corpora — annotations: sections, melodic phrases
  • 39. Test Datasets: § tonic identification § nyās segmentation § melodic similarity § rāga recognition
  • 40. Melody Descriptors and Representations — Chapter 4: Melody Descriptors and Representations. 4.1 Introduction. In this chapter, we describe methods for extracting relevant melodic descriptors and a low-level melody representation from raw audio signals. These melodic features are used as inputs to perform higher-level melodic analyses by the methods described in the subsequent chapters. Since these features are used in a number of computational tasks, we consolidate and present these common processing blocks in the current chapter. This chapter is largely based on our published work presented in Gulati et al. (2014a,b,c). There are four sections in this chapter, and in each section, we describe the extraction of a different melodic descriptor or representation. In Section 4.2, we focus on the task of automatically identifying the tonic pitch in an audio recording. Our main objective is to perform a comparative evaluation of different existing methods and select the best method to be used in our work. In Section 4.3, we present the method used to extract the predominant pitch from audio signals, and describe the post-processing steps to reduce frequently occurring errors. In Section 4.4, we describe the process of segmenting the solo percussion regions (tani sections) in the audio recordings of Carnatic music. In Section 4.5, we describe our nyās-based approach for segmenting melodies in Hindustani music.
  • 41. Melody Descriptors and Representations: tonic identification (+ evaluation) · nyās segmentation · predominant melody estimation and post-processing · tani segmentation
  • 42. Melody Descriptors and Representations: tonic identification (+ evaluation) · nyās segmentation · predominant melody estimation and post-processing · tani segmentation
  • 43. Tonic Identification: Approaches — summary of the existing tonic identification approaches (Table 2.1):

Method | Features | Feature distribution | Tonic selection
MRS (Sengupta et al., 2005) | Pitch (Datta, 1996) | NA | Error minimization
MRH1/MRH2 (Ranjani et al., 2011) | Pitch (Boersma & Weenink, 2001) | Parzen-window-based PDE | GMM fitting
MJS (Salamon et al., 2012) | Multi-pitch salience (Salamon et al., 2011) | Multi-pitch histogram | Decision tree
MSG (Gulati et al., 2012) | Multi-pitch salience (Salamon et al., 2011) + predominant melody (Salamon & Gómez, 2012) | Multi-pitch histogram + pitch histogram | Decision tree
MAB1 (Bellur et al., 2012) | Pitch (De Cheveigné & Kawahara, 2002) | GD histogram | Highest peak
MAB2 (Bellur et al., 2012) | Pitch (De Cheveigné & Kawahara, 2002) | GD histogram | Template matching
MAB3 (Bellur et al., 2012) | Pitch (De Cheveigné & Kawahara, 2002) | GD histogram | Highest peak
MCS (Chordia & Şentürk, 2013) | … | … | …
Abbreviations: NA = not applicable; GD = group delay; PDE = probability density estimate.

Two broad families: multi-pitch features + classification vs. predominant pitch + template/peak picking.
  • Salamon, J., Gulati, S., & Serra, X. (2012). A multipitch approach to tonic identification in Indian classical music. In Proc. of Int. Soc. for Music Information Retrieval Conf. (ISMIR), pp. 499–504. Porto, Portugal.
  • Gulati, S., Salamon, J., & Serra, X. (2012). A two-stage approach for tonic identification in Indian art music. In Proc. of the 2nd CompMusic Workshop, pp. 119–127. Istanbul, Turkey: Universitat Pompeu Fabra.
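The histogram-plus-highest-peak family in the table can be illustrated with a minimal sketch; the bin resolution, tonic search range and 55 Hz reference below are illustrative assumptions, not the published settings of any method above.

```python
# Hedged sketch: tonic as the most salient pitch-histogram peak within a
# plausible tonic range. All parameters are illustrative.
import numpy as np

def tonic_from_pitch_histogram(pitch_hz, fmin=110.0, fmax=260.0, bins_per_semitone=3):
    pitch = pitch_hz[pitch_hz > 0]                       # drop unvoiced frames
    cents = 1200.0 * np.log2(pitch / 55.0)               # 55 Hz reference (arbitrary)
    lo, hi = (1200.0 * np.log2(f / 55.0) for f in (fmin, fmax))
    nbins = int((hi - lo) * bins_per_semitone / 100.0)
    hist, edges = np.histogram(cents[(cents >= lo) & (cents <= hi)], bins=nbins)
    peak = int(np.argmax(hist))                          # highest peak = tonic candidate
    peak_cents = 0.5 * (edges[peak] + edges[peak + 1])
    return 55.0 * 2.0 ** (peak_cents / 1200.0)           # back to Hz
```

The multi-pitch methods (MJS, MSG) replace the single-pitch histogram with a multi-pitch salience histogram and learn the peak-selection rule with a decision tree instead of always taking the highest peak.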
  • 44. Comparative Evaluation: § seven methods § six diverse datasets

Dataset | TIDCM1 | TIDCM2 | TIDCM3 | TIDIITM1 | TIDIITM2 | TIDIISc
Excerpts | 271 | 935 | 428 | 38 | 472 | 55
Mean duration (mins) | 3 | 3 | 15 | 144 | 12 | 7
  • 45. Results (Table 4.1): accuracies (%) for tonic pitch (TP) and tonic pitch-class (TPC) identification by seven methods on six datasets using only audio data. The methods divide into those based on supervised learning (MJS, MSG) and those based on expert knowledge (MRH1, MRH2, MAB1, MAB2, MAB3). The TP column for TIDCM1 is marked “-” because it consists of only instrumental excerpts, for which we do not evaluate tonic pitch accuracy. MAB1 is only evaluated on TIDIITM1 since it works on whole concert recordings.

Method | TIDCM1 TP/TPC | TIDCM2 TP/TPC | TIDCM3 TP/TPC | TIDIISc TP/TPC | TIDIITM1 TP/TPC | TIDIITM2 TP/TPC
MRH1 | -/81.4 | 69.6/84.9 | 73.2/90.8 | 81.8/83.6 | 92.1/97.4 | 80.2/86.9
MRH2 | -/63.2 | 65.7/78.2 | 68.5/83.5 | 83.6/83.6 | 94.7/97.4 | 83.8/88.8
MAB1 | - | - | - | - | 89.5/89.5 | -
MAB2 | -/88.9 | 74.5/82.9 | 78.5/83.4 | 72.7/76.4 | 92.1/92.1 | 86.6/89.1
MAB3 | -/86.0 | 61.1/80.5 | 67.8/79.9 | 72.7/72.7 | 94.7/94.7 | 85.0/86.6
MJS | -/88.9 | 87.4/90.1 | 88.4/91.0 | 75.6/77.5 | 89.5/97.4 | 90.8/94.1
MSG | -/92.2 | 87.8/90.9 | 87.7/90.5 | 79.8/85.3 | 97.4/97.4 | 93.6/93.6

Tonic pitch-class accuracies (%) using audio plus metadata:

Method | TIDCM1 | TIDCM2 | TIDCM3 | TIDIISc | TIDIITM1 | TIDIITM2
MRH1 | 87.7 | 83.5 | 88.9 | 87.3 | 97.4 | 91.7
MRH2 | 79.55 | 76.3 | 82.0 | 85.5 | 97.4 | 91.5
MAB1 | - | - | - | - | 97.4 | -
MAB2 | 92.3 | 91.5 | 94.2 | 81.8 | 97.4 | 91.1
MAB3 | 87.5 | 86.7 | 90.9 | 81.8 | 94.7 | 89.9
MJS | 88.9 | 93.6 | 92.4 | 80.9 | 97.4 | 92.3
MSG | 92.2 | 90.9 | 90.5 | 85.3 | 97.4 | 93.6
  • 46. Results: [Figures: tonic pitch and tonic pitch-class accuracy (%) per method (JS, SG, RH1, RH2, AB2, AB3) across datasets CM1, CM2, CM3, IISCB1, IITM1, IITM2, with an error breakdown (Pa errors, Ma errors, other errors) split by Hindustani/Carnatic and male/female]
  • 47. Results: [same figures, annotated] — duration of excerpt · category-wise analysis · detailed error analysis · category-wise error analysis
  • 48. Results: Summary — § multi-pitch + classification (audio): § invariant to duration § consistent across tradition, gender and instrument § pitch + template/peak picking (audio + metadata): § sensitive to duration § inconsistent across tradition, gender and instrument • Gulati, S., Bellur, A., Salamon, J., Ranjani, H. G., Ishwar, V., Murthy, H. A., & Serra, X. (2014). Automatic tonic identification in Indian art music: approaches and evaluation. JNMR, 43(1), 53–71.
  • 49. Predominant Pitch Estimation: Melodia • Salamon, J. & Gómez, E. (2012). Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE Transactions on Audio, Speech, and Language Processing, 20(6), 1759–1770. • Bogdanov, D., Wack, N., Gómez, E., Gulati, S., Herrera, P., Mayor, O., Roma, G., Salamon, J., Zapata, J., & Serra, X. (2013). Essentia: an audio analysis library for music information retrieval. In ISMIR, pp. 493–498.
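As a pointer to how the two references fit together in practice, the Melodia algorithm is available in Essentia; a minimal sketch, assuming the Essentia Python bindings are installed (the file name is a placeholder):

```python
# Hedged sketch: predominant pitch extraction with Essentia's Melodia.
from essentia.standard import MonoLoader, EqualLoudness, PredominantPitchMelodia

audio = MonoLoader(filename='example_recording.mp3', sampleRate=44100)()
audio = EqualLoudness()(audio)                  # pre-filter commonly used with Melodia
extractor = PredominantPitchMelodia(frameSize=2048, hopSize=128)
pitch_hz, confidence = extractor(audio)         # one pitch value per hop (~2.9 ms)
```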
  • 50. Nyās Svara Segmentation: § melodic segmentation § delimiting melodic phrases [Figure: F0 contour (cents) over time with nyās segments N1–N5 marked]
  • 51. Nyās Svara Segmentation — pipeline: Audio → predominant pitch estimation and representation (with tonic identification) → segmentation (histogram computation, svara identification) → feature extraction (local, contextual, local + contextual) → segment classification and fusion → nyās segments
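The segmentation stage above marks candidate held-svara regions in the pitch contour before features are extracted. A minimal sketch of one such flatness-based candidate detector is given below; the window length, deviation threshold and minimum duration are illustrative assumptions, not the thesis settings:

```python
# Hedged sketch: sliding-window flatness test for held-svara candidates.
import numpy as np

def flat_segments(pitch_cents, hop_s=0.0029, win_s=0.15,
                  max_dev_cents=25.0, min_len_s=0.15):
    win = max(1, int(win_s / hop_s))
    flat = np.zeros(len(pitch_cents), dtype=bool)
    for i in range(len(pitch_cents) - win):
        w = pitch_cents[i:i + win]
        flat[i] = (w.max() - w.min()) < max_dev_cents    # window is "flat enough"
    segments, start = [], None                           # merge consecutive flat frames
    for i, f in enumerate(flat):
        if f and start is None:
            start = i
        elif not f and start is not None:
            if (i - start) * hop_s >= min_len_s:
                segments.append((start * hop_s, i * hop_s))
            start = None
    return segments
```

In the actual pipeline, such candidate segments are then described with local and contextual features and classified as nyās or non-nyās.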
  • 52. Nyās Svara Segmentation — pipeline (repeated)
  • 53. Nyās Svara Segmentation — pipeline (repeated)
  • 54. Nyās Svara Segmentation — pipeline with example feature vectors (0.12, 0.34, 0.59, 0.23, 0.54; …)
  • 55. Nyās Svara Segmentation — pipeline with segment labels: nyās / non-nyās
  • 56. Dataset: § performances of Hindustani music § annotations by a trained musician § annotated nyās segments § [table: recordings, duration, artists, rāgas]
  • 57. Results (Table 4.3): F-scores for nyās boundary detection and labeling using the PLS method and the proposed segmentation method, for different classifiers (Tree, k-NN, NB, LR, SVM) and local (FL), contextual (FC) and combined (FL+FC) features. DTW is the baseline method used for comparison. The F-score for the random baseline (BR2) is 0.184.

Boundary detection:
Segmentation | Feat. | DTW | Tree | k-NN | NB | LR | SVM
PLS | FL | 0.356 | 0.407 | 0.447 | 0.248 | 0.449 | 0.453
PLS | FC | 0.284 | 0.394 | 0.387 | 0.383 | 0.389 | 0.406
PLS | FL+FC | 0.289 | 0.414 | 0.426 | 0.409 | 0.432 | 0.437
Proposed | FL | 0.524 | 0.672 | 0.719 | 0.491 | 0.736 | 0.749
Proposed | FC | 0.436 | 0.629 | 0.615 | 0.641 | 0.621 | 0.673
Proposed | FL+FC | 0.446 | 0.682 | 0.708 | 0.591 | 0.725 | 0.735

Label assignment:
Segmentation | Feat. | DTW | Tree | k-NN | NB | LR | SVM
PLS | FL | 0.553 | 0.685 | 0.723 | 0.621 | 0.727 | 0.722
PLS | FC | 0.251 | 0.639 | 0.631 | 0.690 | 0.688 | 0.674
PLS | FL+FC | 0.389 | 0.694 | 0.693 | 0.708 | 0.722 | 0.706
Proposed | FL | 0.546 | 0.708 | 0.754 | 0.714 | 0.749 | 0.758
Proposed | FC | 0.281 | 0.671 | 0.611 | 0.697 | 0.689 | 0.697
Proposed | FL+FC | 0.332 | 0.672 | 0.710 | 0.730 | 0.743 | 0.731
  • 58. Results: Summary — § proposed segmentation + feature extraction § local features § vs. piecewise linear segmentation (PLS) § vs. pattern matching (DTW) § local + contextual features • Gulati, S., Serrà, J., Ganguli, K. K., & Serra, X. (2014). Landmark detection in Hindustani music melodies. In ICMC-SMC, pp. 1062–1068.
  • 59. Melodic Pattern Processing — Chapter 5: Melodic Pattern Processing: Similarity, Discovery and Characterization. 5.1 Introduction. In this chapter, we present our methodology for discovering musically relevant melodic patterns in sizable audio collections of IAM. We address three main computational tasks involved in this process: melodic similarity, pattern discovery and characterization of the discovered melodic patterns. We refer to these different tasks jointly as melodic pattern processing. “Only by repetition can a series of tones be characterized as something definite. Only repetition can demarcate a series of tones and its purpose. Repetition thus is the basis of music as an art.” (Schenker et al., 1980) Repeating patterns are at the core of music. Consequently, analysis of patterns is fundamental in music analysis. In IAM, recurring melodic patterns are the building blocks of melodic structures. They provide a base for improvisation and composition, and thus, are crucial to the analysis and description of rāgas, compositions, and artists in this music tradition. A detailed account of the importance of melodic patterns in IAM is provided in Section 2.3.2. To recapitulate, from the literature review presented in Section 2.4.2 and Section 2.5.2 we see that the approaches for pattern processing in music can be broadly put into two categories (Figure 5.1). One type of approach performs pattern detection (or matching) and follows a supervised methodology. In these approaches the system knows a priori the pattern to be extracted. Typically such a system is fed with exemplar patterns or queries and is expected to extract all of their occurrences in a piece of …
  • 60. Melodic Pattern Processing
  • 61. Melodic Pattern Processing
  • 62. Melodic Pattern Processing
  • 63. Existing Approaches for IAM (Table 2.2): summary of the methods proposed in the literature for melodic pattern processing in IAM; all of these studies were published during the course of our work. Tasks: detection (5) · distinction (4) · discovery (1).

Method | Task | Melody representation | Segmentation | Similarity measure | Speed-up | #Rāgas | #Rec | #Patt | #Occ
Ishwar et al. (2012) | Distinction | Continuous | GT annotations | HMM | - | 5c | NA | 6 | 431
Ross & Rao (2012) | Detection | Continuous | Pa nyās | DTW | Segmentation | 1h | 2 | 2 | 107
Ross et al. (2012) | Detection | Continuous, SAX-12,1200 | Sama location | Euc., DTW | Segmentation | 3h | 4 | 3 | 107
Ishwar et al. (2013) | Detection | Stationary point, continuous | Brute-force | RLCS | Two-stage method | 2c | 47 | 4 | 173
Rao et al. (2013) | Distinction | Continuous | GT annotations | DTW | - | 1h | 8 | 3 | 268
Dutta & Murthy (2014a) | Discovery | Stationary point, continuous | Brute-force | RLCS | - | 5c | 59 | - | -
Dutta & Murthy (2014b) | Detection | Stationary point, continuous | NA | Modified RLCS | Two-stage method | 1c | 16 | NA | 59
Rao et al. (2014) | Distinction | Continuous | GT annotations | DTW | - | 1h | 8 | 3 | 268
… | Distinction | Continuous | GT annotations | HMM | - | 5c | NA | 10 | 652
Ganguli et al. (2015) | Detection | BSS, transcription | - | Smith-Waterman | Discretization | 34h | 50 | NA | 1075

h: Hindustani music collection; c: Carnatic music collection. Abbreviations: #Rec = number of recordings; #Patt = number of unique patterns; #Occ = total number of annotated occurrences of the patterns; GT = ground truth; NA = not available; “-” = not applicable; Euc. = Euclidean distance.
  • 64. Melodic Pattern Processing: supervised approach · unsupervised approach
  • 65. Melodic Pattern Processing: supervised approach · unsupervised approach
  • 66. Melodic Pattern Processing — unsupervised over supervised: § dataset size § knowledge bias § human errors and limitations
  • 67. Melodic Pattern Processing: melodic similarity (supervised) · pattern discovery (unsupervised) · pattern characterization (unsupervised)
  • 68. Melodic Pattern Processing: melodic similarity (supervised) · pattern discovery (unsupervised) · pattern characterization (unsupervised)
  • 69. Melodic Similarity
  • 70. Melodic Similarity — challenge [Figure: pitch contour over time]
  • 71. Melodic Similarity — challenge [Figure: four occurrences (1–4) of a melodic pattern]
  • 72. Melodic Similarity — pipeline: Audio1, Audio2 → predominant pitch estimation → pitch normalization → uniform time-scaling → distance computation → melodic similarity
  • 73. Melodic Similarity — design choices per block: sampling rate?; normalization: yes/no, what kind?; uniform time-scaling: yes/no, how much?; distance: Euclidean or dynamic time warping (DTW), parameters?
  • 74. Melodic Similarity — sampling rate of the melody representation: {100, 67, 50, 40, 33}. Notation (from the thesis glossary): σ_g — standard deviation of …; Θ — Heaviside step function; u — flatness measure; ũ — flatness threshold; γ — complexity weighting; w — sampling rate of the melody representation; V — maximum error parameter of the … method; z — complexity estimate.
  • 75. Melodic Similarity — pitch normalization options: § no normalization § tonic § Z_inf § Z24 § Z12 § mean § median § Z-norm § median absolute deviation
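A minimal sketch of a few of these options (tonic, mean, median and a 12-bins-per-octave quantized variant); the 55 Hz reference used when no normalization is applied is an illustrative assumption:

```python
# Hedged sketch: pitch-normalization variants on a Hz contour.
import numpy as np

def normalize_pitch(pitch_hz, tonic_hz, mode='tonic'):
    if mode == 'none':
        return 1200.0 * np.log2(pitch_hz / 55.0)    # arbitrary fixed reference
    cents = 1200.0 * np.log2(pitch_hz / tonic_hz)   # cents relative to the tonic
    if mode == 'tonic':
        return cents
    if mode == 'mean':
        return cents - cents.mean()
    if mode == 'median':
        return cents - np.median(cents)
    if mode == 'z12':                               # quantize to 12 bins per octave
        return 100.0 * np.round(cents / 100.0)
    raise ValueError('unknown mode: %s' % mode)
```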
  • 76. Melodic Similarity — uniform time-scaling: § off § on, scaling factors {0.9, 0.95, 1.0, 1.05, 1.1}
  • 77. Melodic Similarity — distance computation: § Euclidean § dynamic time warping § global constraint (…) § local constraint (…)
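A minimal DTW sketch with a Sakoe-Chiba-style global band, as one concrete instance of the constrained options listed above; the band fraction and the L1 local cost are illustrative assumptions:

```python
# Hedged sketch: band-constrained DTW between two pitch sequences.
import numpy as np

def dtw_distance(x, y, band=0.1):
    n, m = len(x), len(y)
    r = max(1, int(band * max(n, m)))              # global constraint width
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            cost = abs(x[i - 1] - y[j - 1])        # L1 local cost
            D[i, j] = cost + min(D[i - 1, j - 1],  # step (1,1)
                                 D[i - 1, j],      # step (1,0)
                                 D[i, j - 1])      # step (0,1)
    return D[n, m]
```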
  • 78. Melodic Similarity — pipeline: Audio1, Audio2 → predominant pitch estimation → pitch normalization × uniform time-scaling × distance computation (grid of variant combinations) → melodic similarity
  • 79. Evaluation: setup — annotated patterns + random segments · top-N nearest neighbours per query · mean average precision (MAP)
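A minimal sketch of the MAP computation over ranked nearest-neighbour lists; the relevance flags (whether a retrieved segment is an occurrence of the query pattern) are assumed to come from the annotations:

```python
# Hedged sketch: mean average precision over per-query ranked relevance flags.
import numpy as np

def average_precision(relevant_flags):
    hits, precisions = 0, []
    for rank, rel in enumerate(relevant_flags, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return float(np.mean(precisions)) if precisions else 0.0

def mean_average_precision(per_query_flags):
    return float(np.mean([average_precision(q) for q in per_query_flags]))

# e.g. two queries with their top-4 nearest-neighbour relevance flags:
print(mean_average_precision([[1, 0, 1, 0], [0, 1, 1, 0]]))
```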
  • 80. Dataset: Carnatic music dataset · Hindustani music dataset [tables: recordings, duration, rāgas, pattern occurrences]
  • 81. Results (Table 5.1): MAP score and parameter settings for the three best performing variants on the MSDcmd_iitm (CMD) and MSDhmd_iitb (HMD) datasets. Srate: sampling rate of the melody representation; Norm: normalization technique; TScale: uniform time-scaling; Dist: distance measure.

Dataset | MAP | Srate | Norm | TScale | Dist
MSDcmd_iitm | 0.413 | w67 | Zmean | Woff | DDTW_L1_G90
MSDcmd_iitm | 0.412 | w67 | Zmean | Won | DDTW_L1_G10
MSDcmd_iitm | 0.411 | w100 | Zmean | Woff | DDTW_L1_G90
MSDhmd_iitb | 0.552 | w100 | Ztonic | Woff | DDTW_L0_G90
MSDhmd_iitb | 0.551 | w67 | Ztonic | Woff | DDTW_L0_G90
MSDhmd_iitb | 0.547 | w50 | Ztonic | Woff | DDTW_L0_G90

Mean average precision (MAP), a typical evaluation measure in information retrieval (Manning et al., 2008), is computed as the mean of the average precision values of each query in the dataset. This way, a single number evaluates and compares the performance of a variant. To assess if the difference in the performance of any two variants is statistically significant, we use the Wilcoxon signed-rank test (Wilcoxon, 1945) with p < 0.01. To compensate for multiple comparisons, we apply the Holm-Bonferroni method (Holm, …); thus, considering that we compare 560 different variants, we effectively use a …
  • 82. Results (Table 5.2): MAP score and parameter settings for the three best performing variants — parameters · pattern categories · pattern segmentation.

Dataset | MAP | Srate | Norm | TScale | Dist
MSDcmd_iitm | 0.279 | w67 | Zmean | Won | DDTW_L1_G10
MSDcmd_iitm | 0.277 | w67 | ZtonicQ12 | Won | DDTW_L1_G10
MSDcmd_iitm | 0.275 | w100 | ZtonicQ12 | Won | DDTW_L1_G10
MSDhmd_iitb | 0.259 | w40 | ZtonicQ12 | Won | DDTW_L1_G90
MSDhmd_iitb | 0.259 | w100 | ZtonicQ12 | Won | DDTW_L1_G90
MSDhmd_iitb | 0.259 | w67 | ZtonicQ12 | Won | DDTW_L1_G90
  • 83. Results: Summary

Parameter | Carnatic (≈0.41 MAP) | Hindustani (≈0.55 MAP)
Sampling rate | High | Invariant
Normalization | Mean | Tonic
Time-scaling | No consensus | No consensus
Distance | DTW | DTW
Local constraint | With | Without
Global constraint | Invariant | Without
  • 84. Results: Summary — known vs. unknown phrase segmentation · Carnatic · Hindustani • Gulati, S., Serrà, J., & Serra, X. (2015). An evaluation of methodologies for melodic similarity in audio recordings of Indian art music. In ICASSP, pp. 678–682.
  • 85. Improving Melodic Similarity — culture specificities. Pipeline: Audio1, Audio2 → predominant pitch estimation → pitch normalization → uniform time-scaling → distance computation → melodic similarity
  • 86. Improving Melodic Similarity — Hindustani music
  • 87. Improving Melodic Similarity — complexity. [Figure: three occurrences (1–3) of the same phrase in Carnatic music with different degrees of melodic movement.] The DTW-based distance (D_DTW) is weighted by γ to compute the final similarity:

γ = max(z_i, z_j) / min(z_i, z_j), with z_i = sqrt( Σ_{n=1}^{N−1} (p̂_n − p̂_{n+1})² )

where z_i is the complexity estimate of a melodic pattern of length N and p̂_n is the pitch value of the nth sample. We explore two variants of …
  • G. E. Batista, X. Wang, and E. J. Keogh. A complexity-invariant distance measure for time series. In SDM, vol. 11, pp. 699–710, 2011.
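Following the formula above, the weighting can be applied on top of a precomputed base distance; a minimal sketch (the small guard against zero-complexity patterns is an added assumption):

```python
# Hedged sketch: complexity weighting of a precomputed DTW distance.
import numpy as np

def complexity(p):
    return np.sqrt(np.sum(np.diff(p) ** 2))          # z_i from the slide

def complexity_weighted(dtw_dist, x, y):
    zx, zy = complexity(x), complexity(y)
    gamma = max(zx, zy) / max(min(zx, zy), 1e-12)    # guard: avoid division by zero
    return dtw_dist * gamma                          # patterns of unequal "movement"
                                                     # are pushed further apart
```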
  • 88. Improving Melodic Similarity — pipeline: Audio1, Audio2 → predominant pitch estimation + post-processing → pitch normalization → svara duration truncation (via partial transcription / nyās segmentation; truncation threshold ∈ {0.1, 0.3, 0.5, 0.75, 1.0, 1.5, 2.0} s) → distance computation + complexity weighting → melodic similarity
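The duration-truncation block can be sketched as follows: within each held-svara region, samples beyond a cap are dropped before matching. The segment boundaries are assumed to come from the partial transcription / nyās segmentation step; the code is illustrative, not the thesis implementation:

```python
# Hedged sketch: svara duration truncation on a sampled pitch contour.
import numpy as np

def truncate_held_svaras(pitch, segments_s, hop_s, tau_s=0.5):
    keep = np.ones(len(pitch), dtype=bool)
    cap = int(tau_s / hop_s)                         # max samples kept per held svara
    for start_s, end_s in segments_s:                # segment boundaries in seconds
        a, b = int(start_s / hop_s), int(end_s / hop_s)
        if b - a > cap:
            keep[a + cap:b] = False                  # drop the excess samples
    return pitch[keep]
```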
  • 89. Dataset: Carnatic music dataset · Hindustani music dataset [tables: recordings, duration, rāgas, pattern occurrences]
  • 90. Results (Table 5.3): MAP scores for the two datasets MSDhmd_CM and MSDcmd_CM for the four method variants MB, MDT, MCW1 and MCW2 and for different normalization techniques; the standard deviation of average precision is reported in round brackets.

MSDhmd_CM — Norm | MB | MDT | MCW1 | MCW2
Ztonic | 0.45 (0.25) | 0.52 (0.24) | - | -
Zmean | 0.25 (0.20) | 0.31 (0.23) | - | -
Ztetra | 0.40 (0.23) | 0.47 (0.23) | - | -

MSDcmd_CM — Norm | MB | MDT | MCW1 | MCW2
Ztonic | 0.39 (0.29) | 0.42 (0.29) | 0.41 (0.28) | 0.41 (0.29)
Zmean | 0.39 (0.26) | 0.45 (0.28) | 0.43 (0.27) | 0.45 (0.27)
Ztetra | 0.45 (0.26) | 0.50 (0.27) | 0.49 (0.28) | 0.51 (0.27)

For MSDhmd_CM (upper half), the proposed duration-truncation variant (MDT) performs better than the baseline (MB) for all normalization techniques, and this difference is statistically significant in each case. The MSDhmd_CM results correspond to a truncation threshold of 500 ms, which yields the highest accuracy among the tested values (Figure 5.10). Furthermore, Ztonic results in the best accuracy for MSDhmd_CM for all method variants, with a statistically significant difference in each case. The high standard deviation of the average precision values arises because some … [Figure 5.10: MAP vs. truncation threshold ∈ {0.1, 0.3, 0.5, 0.75, 1.0, 1.5, 2.0} s for HMD and CMD]
  • 91. Results (Table 5.3, repeated) — duration truncation · optimal duration · complexity weighting
  • 92. Results: Summary — duration truncation (Carnatic, Hindustani) · complexity weighting (Carnatic) • Gulati, S., Serrà, J., & Serra, X. (2015). Improving melodic similarity in Indian art music using culture-specific melodic characteristics. In ISMIR, pp. 680–686.
  • 93. Melodic Pattern Processing: melodic similarity (supervised) · pattern discovery (unsupervised) · pattern characterization (unsupervised)
  • 94. Pattern Discovery: intra-recording pattern discovery → inter-recording pattern search → rank refinement (R1, R2, R3)
  • 95. Intra-recording Pattern Discovery: § dynamic time warping (DTW) § global constraint (fraction of the pattern length) § no local constraint § step sizes {(0,1), (1,1), (1,0)}
  • 96. Rank Refinement: § local constraint, step sizes {(2,1), (1,1), (1,2)} § local cost functions (V1, V2, V3, V4)
  • 97. Computational Complexity: O(n²) samples → O(n²) candidates · ≈31 million distance computations → distance lower bounds
  • 98. Lower Bounds for DTW: D_true ≥ D_LB. A top-N queue keeps the best matches found so far; if D_LB ≥ D_last (the largest distance in the queue), the candidate can be discarded without computing the full DTW distance.
  • 99. Lower Bounds for DTW: § cascaded lower bounds [Rakthanmanon et al. (2013)] § FirstLast bound [Kim et al. (2001)] § LB_Keogh [Keogh et al. (2001)] § early abandoning • Kim, S. W., Park, S., & Chu, W. W. (2001). An index-based approach for similarity search supporting time warping in large sequence databases. In 17th International Conference on Data Engineering, pp. 607–614. • Keogh, E. & Ratanamahatana, C. A. (2004). Exact indexing of dynamic time warping. Knowledge and Information Systems, 7(3), 358–386. • Rakthanmanon, T., Campana, B., Mueen, A., Batista, G., Westover, B., Zhu, Q., Zakaria, J., & Keogh, E. (2013). Addressing big data time series: mining trillions of time series subsequences under dynamic time warping. ACM Transactions on Knowledge Discovery from Data (TKDD), 7(3), 10:1–10:31.
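A minimal sketch of the LB_Keogh bound used in such a cascade, assuming equal-length sequences and a band-constrained DTW as described earlier; candidates whose bound already exceeds the worst entry of the top-N queue are discarded without a full DTW computation:

```python
# Hedged sketch: LB_Keogh lower bound plus queue-based pruning.
import numpy as np

def lb_keogh(query, candidate, r):
    total = 0.0
    for i, q in enumerate(query):
        window = candidate[max(0, i - r):i + r + 1]  # band-limited envelope
        lo, hi = window.min(), window.max()
        if q > hi:
            total += (q - hi) ** 2
        elif q < lo:
            total += (lo - q) ** 2
    return np.sqrt(total)

def prune(query, candidates, r, d_last):
    """Keep only candidates whose lower bound beats d_last (N-th best so far)."""
    return [c for c in candidates if lb_keogh(query, c, r) < d_last]
```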
  • 100. Dataset: § seed patterns: ≈78,000 § search patterns: ≈15,000,000 § [table: recordings, duration — Carnatic music]
  • 102. Results: Computations — § intra-recording: 1.41 T § inter-recording: 10.4 T (1 T = 10^12) § [table: percentage of computations pruned by each lower bound (First, Last, LB_Keogh_EQ, LB_Keogh_EC) for the intra- and inter-recording stages]
  • 103. Results (Table 5.5): MAP scores for four variants of the rank refinement method (V1–V4) for each seed category (S1, S2, S3).

Seed category | V1 | V2 | V3 | V4
S1 | 0.92 | 0.92 | 0.91 | 0.89
S2 | 0.68 | 0.73 | 0.73 | 0.66
S3 | 0.35 | 0.34 | 0.35 | 0.35
  • 104. Results (Table 5.5, repeated) — 82 patterns · local cost function · distance separability · intra- vs. inter-recording
  • 105. Results: intra-recording: 1 FPR → 8 TPR · inter-recording: 1 FPR → 5 TPR · local cost function → city-block distance • Gulati, S., Serrà, J., Ishwar, V., & Serra, X. (2014). Mining melodic patterns in large audio collections of Indian art music. In SITIS-MIRA, pp. 264–271.
  • 106. Melodic Pattern Processing — dissimilar pattern variants
  • 107. Melodic Pattern Processing: melodic similarity (supervised) · pattern discovery (unsupervised) · pattern characterization (unsupervised)
  • 108. Melodic Pattern Characterization: gamakas · alankārs · composition-specific patterns · rāga motifs
  • 109. Melodic Pattern Characterization — pipeline: Carnatic music collection → melodic pattern discovery (intra-recording pattern discovery → data processing → inter-recording pattern detection) → pattern characterization (pattern network generation → network filtering → community detection and characterization) → rāga motifs
  • 110. Melodic Pattern Characterization — pipeline (repeated)
  • 111. Melodic Pattern Characterization — pattern network generation: undirected network • M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.
  • 112. Melodic Pattern Characterization — network filtering • S. Maslov and K. Sneppen, “Specificity and stability in topology of protein networks,” Science, vol. 296, no. 5569, pp. 910–913, 2002.
  • 113. Melodic Pattern Characterization — community detection and characterization • V. D. Blondel, J. L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, p. P10008, 2008. • S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3, pp. 75–174, 2010.
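As an illustration of this stage, the network construction, thresholding and community detection can be sketched with networkx; the library choice and the simple distance threshold are assumptions for the sketch, and the null-model filtering step (Maslov & Sneppen) is omitted:

```python
# Hedged sketch: pattern network + community detection with networkx.
import networkx as nx

def pattern_communities(pairs, threshold):
    """pairs: iterable of (pattern_id_a, pattern_id_b, distance)."""
    g = nx.Graph()                                   # undirected, as on the slide
    g.add_edges_from((a, b) for a, b, d in pairs if d <= threshold)
    # Louvain modularity optimization (available in networkx >= 2.8)
    return nx.community.louvain_communities(g)
```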
  • 114. Melodic Pattern Characterization — community types:

Community type | Nodes | Recordings | Rāgas
Gamaka | High | High | High
Composition phrases | Medium | Less | Less
Rāga phrases | Medium | Medium | Less
  • 115. Melodic Pattern Characterization — rāga distribution. [Figure 5.24: graphical representation of the melodic pattern network after filtering with threshold D̃; the detected communities are indicated by different colors, and examples of these communities are shown on the right.] To rank the communities we empirically devise a goodness measure G, which denotes the likelihood that a community C_q represents a rāga motif. We propose to use G = N·L⁴/c, where N is the number of nodes in the community and L = A_{q,1}/N is an estimate of the likelihood of rāga a′_q in C_q (the rāga likelihood).
  • 116. Melodic Pattern Characterization — recording distribution. c measures how uniformly the nodes of the community are distributed over the recordings: c = (Σ_{l=1}^{L_b} l·B_{q,l}) / N, the centroid of the recording distribution. Worked example: N = 5 recordings contributing 1, 1, 2, 2, 3 nodes give c = (1×2 + 2×2 + 3×1)/5.
  • 117. Melodic Pattern Characterization — goodness measure: G = N·L⁴/c, combining the number of nodes (N), the rāga likelihood (L) and the centroid of the recording distribution (c). [Figure 5.24: pattern network after filtering; detected communities in different colors.]
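A minimal sketch of this goodness measure, following the worked example on the previous slide; the exact thesis definition may differ in detail:

```python
# Hedged sketch: G = N * L**4 / c for one community.
def goodness(majority_raga_nodes, recording_counts):
    """recording_counts: {l: number of recordings contributing l nodes}."""
    N = sum(l * b for l, b in recording_counts.items())   # nodes in the community
    L = majority_raga_nodes / N                           # raga likelihood
    c = N / sum(recording_counts.values())                # centroid of the
                                                          # recording distribution
    return N * L ** 4 / c

# slide example: recordings contribute 1, 1, 2, 2, 3 nodes -> c = (1*2+2*2+3*1)/5
print(goodness(9, {1: 2, 2: 2, 3: 1}))                    # all 9 nodes in one raga
```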
  • 118. Evaluation: [table: recordings, duration, rāgas, compositions] § 1 community per rāga § phrases rated by 10 musicians
  • 119. Results (Table 5.7): mean (µr) and standard deviation (σr) of ratings for each rāga; rāgas with µr ≥ 0.85 … [Histogram: distribution of mean phrase ratings µ]

Rāga | µr | σr
Hamsadhvāni | 0.84 | 0.23
Kāmavardani | 0.78 | 0.17
Darbār | 0.81 | 0.23
Kalyāni | 0.90 | 0.10
Kāmbhōji | 0.87 | 0.12
Bēgada | 0.88 | 0.11
Kāpi | 0.75 | 0.10
Bhairavi | 0.91 | 0.15
Behāg | 0.84 | 0.16
Tōdī | 0.92 | 0.07
  • 120. Results. The same Table 5.7 and histogram, annotated: per-phrase performance (μψ), per-rāga performance (μr), and musicians' agreement.
  • 121. Results: Summary. Mean per-rāga rating, mean per-phrase rating, and the distribution of musicians' ratings. [Histogram of rating counts.] Reference: Gulati, S., Serrà, J., Ishwar, V., & Serra, X. (2016). Discovering rāga motifs by characterizing communities in networks of melodic patterns. In ICASSP, pp. 286–290.
  • 122. Automatic Rāga Recognition

Chapter 6: Automatic Rāga Recognition

6.1 Introduction

In this chapter, we address the task of automatically recognizing rāgas in audio recordings of IAM. We describe two novel approaches for rāga recognition that jointly capture the tonal and the temporal aspects of melody. The contents of this chapter are largely based on our published work in Gulati et al. (2016a,b).

Rāga is a core musical concept used in compositions, performances, music organization, and pedagogy of IAM. Even beyond art music, numerous compositions in Indian folk and film music are also based on rāgas (Ganti, 2013). Rāga is therefore one of the most desired melodic descriptions of a recorded performance of IAM, and an important criterion used by listeners to browse its audio music collections. Despite its significance, there exists a large volume of audio content whose rāga is incorrectly labeled or simply unlabeled. A computational approach to rāga recognition will allow us to automatically annotate large collections of audio music recordings. It will enable rāga-based music retrieval in large audio archives, semantically meaningful music discovery, and musicologically informed navigation. Furthermore, a deeper understanding of the rāga framework from a computational perspective will pave the way for building applications for music pedagogy in IAM.

Rāga recognition is the most studied research topic in MIR of IAM. There exist a considerable number of approaches utilizing different characteristic aspects of rāgas such as the svara set, svara salience, and ārōhana-avrōhana. A critical in-depth review of the existing approaches for rāga recognition is presented in Section 2.4.3, wherein we identify several shortcomings in these approaches and possible avenues for scientific contribution to take this task to the next level. Here we provide a short summary of this analysis: nearly half of the existing approaches for rāga recognition do not utilize the temporal aspects of melody at all (Table 2.3), which are crucial…
  • 123. Automatic Rāga Recognition. Table 2.3: Rāga recognition methods proposed in the literature along with the melodic characteristics they utilize to perform the task (among: svara set, svara salience, svara intonation, ārōhana-avrōhana, melodic phrases; each marked •). We also indicate if a method uses a discrete svara representation of melody (Yes/No/Both) and whether it uses temporal aspects (final •). The methods are arranged in chronological order.
Pandey et al. (2003) | • • | Yes | •
Chordia & Rae (2007) | • • • | Yes | •
Belle et al. (2009) | • • • | No |
Shetty & Achary (2009) | • • | Yes | •
Sridhar & Geetha (2009) | • • | Yes | •
Koduri et al. (2011) | • • | Yes |
Ranjani et al. (2011) | • | No |
Chakraborty & De (2012) | • | Yes |
Koduri et al. (2012) | • • • | Both |
Chordia & Şentürk (2013) | • • • | No |
Dighe et al. (2013a) | • • • | Yes | •
Dighe et al. (2013b) | • • | Yes |
Koduri et al. (2014) | • • • | No |
Kumar et al. (2014) | • • • • | Yes | •
Dutta et al. (2015)ᵃ | • | No | •
ᵃ This method performs rāga verification and not recognition.
  • 124–131. Automatic Rāga Recognition. Table 2.3 repeated across eight build slides, successively highlighting: the melodic characteristics used by each method, the svara discretization column, example rāgas (Rāga A, Rāga B), and the fact that Dutta et al. (2015) perform rāga verification rather than recognition.
  • 132. Automatic Rāga Recognition. Table 2.4: Summary of the rāga recognition methods proposed in the literature, in chronological order. Columns: method | tonal feature | tonic identification | feature | recognition method | #rāgas | dataset (dur. in hours / num. recordings)ᵃ | audio type.
Pandey et al. (2003) | Pitch (Boersma & Weenink, 2001) | NA | Svara sequence | HMM and n-gram | 2 | – / 31 | MP
Chordia & Rae (2007) | Pitch (Sun, 2000) | Manual | PCD, PCDD | SVM classifier | 31 | 20 / 127 | MP
Belle et al. (2009) | Pitch (Rao & Rao, 2009) | Manual | PCD (parameterized) | k-NN classifier | 4 | 0.6 / 10 | PP
Shetty & Achary (2009) | Pitch (Sridhar & Geetha, 2006) | NA | #Svaras, vakra svaras | Neural network classifier | 20 | – / 90 | MP
Sridhar & Geetha (2009) | Pitch (Lee, 2006) | Singer identification | Svara set, its sequence | String matching | 3 | – / 30 | PP
Koduri et al. (2011) | Pitch (Rao & Rao, 2010) | Brute force | PCD | k-NN classifier | 10 | 2.82 / 170 | PP
Ranjani et al. (2011) | Pitch (Boersma & Weenink, 2001) | GMM fitting | PDE | SC-GMM and set matching | 7 | – / 48 | PP
Chakraborty & De (2012) | Pitch (Sengupta, 1990) | Error minimization | Svara set | Set matching | – | – / – | –
Koduri et al. (2012) | Predominant pitch (Salamon & Gómez, 2012) | Multipitch-based | PCD variants | k-NN classifier | 43 | – / 215 | PP
Chordia & Şentürk (2013) | Pitch (Camacho, 2007) | Brute force | PCD variants | k-NN and statistical classifiers | 31 | 20 / 127 | MP
Dighe et al. (2013a) | Chroma (Lartillot et al., 2008) | Brute force (vādi-based) | Chroma, timbre features | HMM | 4 | 9.33 / 56 | PP
Dighe et al. (2013b) | Chroma (Lartillot et al., 2008) | Brute force (vādi-based) | PCD variant | RF classifier | 8 | 16.8 / 117 | PP
Koduri et al. (2014) | Predominant pitch (Salamon & Gómez, 2012) | Multipitch-based | PCD (parameterized) | Different classifiers | 45? | 93 / 424 | PP
Kumar et al. (2014) | Predominant pitch (Salamon & Gómez, 2012) | Brute force | PCD + n-gram distribution | SVM classifier | 10 | 2.82 / 170 | PP
Dutta et al. (2015)* | Predominant pitch (Salamon & Gómez, 2012) | Cepstrum-based | Pitch contours | LCS with k-NN | 30† | 3 / 254 | PP
ᵃ In the case of multiple datasets we list the larger one. * This method performs rāga verification and not recognition. ? The authors do not use all 45 rāgas at once in a single experiment, but consider groups of 3 rāgas per experiment. † The authors finally use only 17 rāgas in their experiment. Abbreviations: Dur.: duration of the dataset; Num.: number of recordings; NA: not applicable; '–': not available; SC-GMM: semi-continuous GMM; MP: monophonic; PP: polyphonic.
  • 133. Automatic Rāga Recognition. Table 2.4 repeated.
  • 134. Datasets. Carnatic music dataset: recordings, duration, lead artists, concerts, rāgas; Hindustani music dataset: recordings, duration, lead artists, releases, rāgas. [Two tables of corpus statistics.]
  • 135–136. Proposed Approaches. Two methods: phrase-based rāga recognition and TDM-based rāga recognition (time-delayed melody surfaces), both capturing the tonal and temporal aspects of melody; the second slide additionally flags discretization. [Pitch-contour plots over time and index-vs-index surface plots.]
  • 137. Phrase-based Rāga Recognition. Pipeline: Music collection → Data Processing → Melodic Pattern Discovery (Intra-recording Pattern Discovery → Inter-recording Pattern Detection) → Melodic Pattern Clustering (Pattern Network Generation → Similarity Thresholding → Community Detection) → Vocabulary Extraction → Term-frequency Feature Extraction → Feature Normalization (TF-IDF Feature Extraction) → Feature Matrix
  • 138. Phrase-based Rāga Recognition. Pipeline repeated.
  • 139–140. Phrase-based Rāga Recognition (title slides).
  • 141. Phrase-based Rāga Recognition. Topic modeling ↔ text classification analogy: melodic phrases ↔ words, rāga ↔ topic, recordings ↔ documents.
  • 142. Phrase-based Rāga Recognition. Topic modeling ↔ text classification: term frequency, inverse document frequency (TF-IDF).
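To make the analogy concrete, a small sketch that treats each recording as a "document" whose "words" are phrase-community IDs; the IDs and recordings are made up, and scikit-learn's smoothed idf differs slightly from the idf defined on the following slides.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# each recording -> a "document" of phrase-community IDs (toy example)
recordings = {
    "rec1": ["c12", "c12", "c7", "c3"],
    "rec2": ["c7", "c7", "c9"],
    "rec3": ["c3", "c12", "c9", "c9"],
}
docs = [" ".join(ids) for ids in recordings.values()]

vec = TfidfVectorizer(token_pattern=r"\S+")  # keep IDs like "c12" intact
X = vec.fit_transform(docs)                  # recordings x phrase vocabulary
print(vec.get_feature_names_out())
print(X.toarray().round(2))
```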
  • 143–144. TF-IDF Feature Extraction. Stages: Vocabulary Extraction → Term-frequency Feature Extraction → Feature Normalization.
  • 145. TF-IDF Feature Extraction. Worked example of per-recording term (phrase) frequencies, e.g. counts 1, 4, 2, 2, 1, 1, 3, 3, 3.
  • 146. TF-IDF Feature Extraction. Vocabulary Extraction → Term-frequency Feature Extraction → Feature Normalization

We consider all communities except the ones that comprise patterns extracted from only a single audio recording. Such communities are analogous to the words that only occur within a single document and, hence, are irrelevant for modeling a topic. We experiment with three different sets of features f1, f2 and f3, which are similar to the TF-IDF features typically used in text information retrieval. We denote our corpus by R, comprising N_R = |R| recordings. A melodic phrase and a recording are denoted by ψ and r, respectively.

f1(ψ, r) = 1 if f(ψ, r) > 0, and 0 otherwise,   (6.1)

where f(ψ, r) denotes the raw frequency of occurrence of pattern ψ in recording r. f1 only considers the presence or absence of a pattern in a recording. In order to investigate if the frequency of occurrence of melodic patterns is relevant for characterizing rāgas, we take f2(ψ, r) = f(ψ, r). As mentioned, the melodic patterns that occur across different rāgas and in several recordings do not aid rāga recognition. Therefore, to reduce their effect in the feature vector we employ a weighting scheme similar to the inverse document frequency (idf) weighting in text retrieval:

f3(ψ, r) = f(ψ, r) · I(ψ, R),   (6.2)
I(ψ, R) = log( N_R / |{r ∈ R : ψ ∈ r}| ).   (6.3)
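A direct NumPy sketch of the three feature sets, assuming a toy count matrix f(ψ, r) with recordings as rows and phrase communities as columns, and assuming every phrase in the vocabulary occurs in at least one recording so that the idf in Eq. 6.3 is well defined.

```python
import numpy as np

# raw counts f(psi, r): rows = recordings, cols = phrase communities (toy)
tf = np.array([[2, 0, 1],
               [0, 3, 1],
               [1, 0, 4]], dtype=float)
NR = tf.shape[0]  # N_R = |R|, number of recordings in the corpus

f1 = (tf > 0).astype(float)              # Eq. 6.1: presence / absence
f2 = tf.copy()                           # f2: raw frequency, as in the text
idf = np.log(NR / (tf > 0).sum(axis=0))  # Eq. 6.3: inverse document frequency
f3 = tf * idf                            # Eq. 6.2: frequency weighted by idf
print(f3.round(2))
```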
  • 147. Results

Table 6.1: Accuracy (%) of rāga recognition on the RRDCMD (Carnatic) dataset by MVSM and other methods using different features and classifiers; bold in the original marks the best accuracy per method among all its variants.
Method | Feature | SVML | SGD | NBM | NBG | RF | LR | 1-NN
MVSM | f1 | 51.04 | 55 | 37.5 | 54.37 | 25.41 | 55.83 | –
MVSM | f2 | 45.83 | 50.41 | 35.62 | 47.5 | 26.87 | 51.87 | –
MVSM | f3 | 45.83 | 51.66 | 67.29 | 44.79 | 23.75 | 51.87 | –
MPC | PCDfull | – | – | – | – | – | – | 73.12
MGK | PDparam | 30.41 | 22.29 | 27.29 | 28.12 | 42.91 | 30.83 | 25.62
MGK | PDcontext | 54.16 | 43.75 | 5.2 | 33.12 | 49.37 | 54.79 | 26.25

MGK [is not applicable to] Hindustani music recordings; we therefore do not consider this method for comparing results on Hindustani music. The authors of MPC courteously ran the experiments on our dataset using the original implementations of the method. For MGK, the authors kindly extracted the features (PDparam and PDcontext) using the original implementation of their method, and the experiments using different classification strategies were done by us.

6.2.3 Results and Discussion

Before we proceed to present our results, we notify readers that the accuracies reported in this section for different methods vary slightly from the ones reported in Gulati et al. (2016a,b).

Table 6.2: Accuracy (%) of rāga recognition on the RRDHMD (Hindustani) dataset by MVSM and other methods:
Method | Feature | SVML | SGD | NBM | NBG | RF | LR | 1-NN
MVSM | f1 | 71 | 72.33 | 69.33 | 79.33 | 38.66 | 74.33 | –
MVSM | f2 | 65.33 | 64.33 | 67.66 | 72.66 | 40.33 | 68 | –
MVSM | f3 | 65.33 | 62.66 | 82.66 | 72 | 41.33 | 67.66 | –
MPC | PCDfull | – | – | – | – | – | – | 91.66

[Plots: clustering coefficient C(G) of the pattern network vs. its rewired counterpart C(Gr) across the threshold bin index D̃, with the MVSM accuracy overlaid, for both datasets.]
  • 148. Results. The same tables, annotated: compare features; compare with the state of the art.
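The grid in Tables 6.1–6.2 amounts to scoring one feature matrix with several off-the-shelf classifiers. A sketch of that loop with scikit-learn follows; the random features, labels, classifier settings, and fold count are illustrative placeholders, not the thesis setup.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression, SGDClassifier

rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(24, 10)).astype(float)  # toy tf-idf-like features
y = np.repeat(["ragaA", "ragaB", "ragaC"], 8)      # toy raga labels

classifiers = {
    "SVML": LinearSVC(),                      # linear-kernel SVM
    "SGD": SGDClassifier(),
    "NBM": MultinomialNB(),                   # multinomial naive Bayes
    "LR": LogisticRegression(max_iter=1000),
}
for name, clf in classifiers.items():
    acc = cross_val_score(clf, X, y, cv=4).mean()  # stratified 4-fold
    print(f"{name}: {100 * acc:.1f}%")
```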
  • 149. Results: Summary. [Bar chart comparing accuracies on the Carnatic and Hindustani datasets: Chordia & Şentürk (2013) with PCDfull, the proposed method with TF-IDF (f3), and Koduri et al. (2014) with PDparam.] Reference: Gulati, S., Serrà, J., Ishwar, V., Şentürk, S., & Serra, X. (2016). Phrase-based rāga recognition using vector space modeling. In ICASSP, pp. 66–70.
  • 150. Error Analysis. [Confusion matrix over the 40 rāgas of the dataset; color scale 0–12.] Rāgas: R1-Ṣanmukhapriya, R2-Kāpi, R3-Bhairavi, R4-Madhyamāvati, R5-Bilahari, R6-Mōhanaṁ, R7-Sencuruṭṭi, R8-Śrīranjani, R9-Rītigauḷa, R10-Hussēnī, R11-Dhanyāsi, R12-Aṭāna, R13-Behāg, R14-Suraṭi, R15-Kāmavardani, R16-Mukhāri, R17-Sindhubhairavi, R18-Sahānā, R19-Kānaḍa, R20-Māyāmāḷavagauḷa, R21-Nāṭa, R22-Śankarābharaṇaṁ, R23-Sāvēri, R24-Kamās, R25-Tōḍi, R26-Bēgaḍa, R27-Harikāmbhōji, R28-Śrī, R29-Kalyāṇi, R30-Sāma, R31-Nāṭakurinji, R32-Pūrvīkaḷyāṇi, R33-Yadukulakāṁbōji, R34-Dēvagāndhāri, R35-Kēdāragauḷa, R36-Ānandabhairavi, R37-Gauḷa, R38-Varāḷi, R39-Kāṁbhōji, R40-Karaharapriya.
  • 151–152. Error Analysis. The same confusion matrix with highlights on confusions among phrase-based rāgas and among allied rāgas.
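For completeness, a small sketch of how such a confusion matrix is produced and how the most-confused (e.g. allied) rāga pairs can be listed from it; the labels and predictions below are toy values, not the thesis results.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# toy ground-truth and predicted raga labels for six recordings
y_true = ["Kāpi", "Kāpi", "Bhairavi", "Behāg", "Behāg", "Bhairavi"]
y_pred = ["Kāpi", "Behāg", "Bhairavi", "Behāg", "Kāpi", "Bhairavi"]

labels = sorted(set(y_true))
cm = confusion_matrix(y_true, y_pred, labels=labels)

# list off-diagonal cells, i.e. which raga was mistaken for which
for i, j in zip(*np.nonzero(cm)):
    if i != j:
        print(f"{labels[i]} -> {labels[j]}: {cm[i, j]}")
```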