Presentation at the DARIAH EU Annual Event, Warsaw, May 2019.
The Listening Experience Database (LED) project is an initiative aimed at collecting accounts of people's private experiences of listening to music. Since 2012, the LED community has explored a wide variety of sources, collecting over 10,000 unique experiences. Dr Enrico Daga participated in the DARIAH Annual Event, presenting the work done to support users in finding evidence of listening experiences in books. The presentation covered the approach that guided the development of a system that traverses the content of digitised texts in search of passages containing a description of a musical event: an account of an experience of listening to music. Several approaches spanning statistical NLP, machine learning techniques, and Semantic Web technologies have been investigated and compared. The best-performing method has been used to develop a novel tool to support curators in discovering new listening experiences. FindLEr systematically analyses the content of books for relevant paragraphs, significantly reducing the effort required to find candidate listening experiences.
Feedback: @enridaga - enrico.daga@open.ac.uk - http://led.kmi.open.ac.uk/
The LED Project
• An open and freely searchable database that brings together a mass of data
about people’s experiences of listening to music of all kinds, in any historical
period and any culture [1].
• Sophisticated data model, natively in RDF / SPARQL
• Linked Open Data: http://data.open.ac.uk/context/led [2]
• Since 2012, the LED project has collected over 10,000 unique listening
experiences from a variety of textual sources
https://led.kmi.open.ac.uk/
Listening experiences
• What is a listening experience?
• An account of an event involving music and one or more participants
• "Introduced to the Anacreontic Society, consisting of amateurs who perform admirably the
best orchestral works. The usual supper followed. After propitiating me with a trio from
'Cosi Fan Tutte', they drew me to the piano."
• "The best choir-singing, (Roman Catholic) without accompaniment, we have heard, was at
Munich."
• "Holland is the country of bells; and the merry chimes are to be heard hourly, from almost
every church-tower or steeple."
• All three constitute a report of an experience of a core subject: music.
The LE Database includes text excerpts that can be analysed as positive examples.
Project Gutenberg, >50k english books in the public domain
Reuters-21578 (Reu) is a standard corpus adopted extensively for training and
evaluating systems for information retrieval, document classification, machine
learning, and similar corpus-based research [5]. It includes 21,578 news articles
of various categories; it does not include music.
The UK Reading Experience Database (UK RED) investigates the evidence of
reading in Britain [6].
DBpedia is a large knowledge graph published as Linked Data. It includes a SPARQL
endpoint and an NER tool: DBpedia Spotlight [7].
Background Knowledge (BK)
Competing methods?
Forest. A typical machine learning workflow: a Random Forest
classifier [8] trained on LE, Reuters, and RED.
Statistical (TF-IDF). Project Gutenberg has a Music shelf. We computed an
average TF-IDF to obtain a dictionary used to estimate the
relatedness of a text to the music domain.
Statistical (Embeddings). Word2Vec [9] is used to generate a dictionary of
terms related to music; the threshold is trained on LE, Reuters, and RED.
Entities. Find DBpedia entities related to the category Music using DBpedia
Spotlight plus a SPARQL query.
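The TF-IDF method above can be sketched as follows. This is a minimal, illustrative stand-in for the actual pipeline: the toy corpora, function names, and whitespace tokenisation are assumptions, not the project's implementation.

```python
import math
from collections import Counter

def build_music_dictionary(music_docs, background_docs, top_k=10):
    """Average TF-IDF of terms over the music 'shelf', with document
    frequencies computed on the combined corpus."""
    corpus = music_docs + background_docs
    df = Counter()
    for doc in corpus:
        df.update(set(doc.split()))
    scores = Counter()
    for doc in music_docs:
        tf = Counter(doc.split())
        total = sum(tf.values())
        for term, count in tf.items():
            idf = math.log(len(corpus) / df[term])
            scores[term] += (count / total) * idf
    # Average over the music documents and keep the top-k terms.
    return {t: s / len(music_docs) for t, s in scores.most_common(top_k)}

def relatedness(text, dictionary):
    """Mean dictionary weight of the text's tokens: a rough estimate
    of how related the text is to the music domain."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(dictionary.get(t, 0.0) for t in tokens) / len(tokens)
```

A text sharing vocabulary with the music shelf scores higher than one that does not, which is the signal used to rank candidate paragraphs.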
Forest, a typical machine learning classifier
[Workflow diagram: the LED Database provides positives (LEs in the benchmark, LEs not in the benchmark) and negatives (Reuters, REs); each set is split into training and test portions; NLP features are extracted, the RF classifier is trained on the training split, and its accuracy is measured on the held-out test texts.]
Top NLP features (lemma[POS], frequency counts):
play[V], 3149, 5362
hear[V], 2620, 3598
music[N], 2541, 3650
time[N], 2019, 2644
first[J], 2017, 2738
come[V], 1867, 2389
sing[V], 1783, 2725
make[V], 1759, 2157
great[J], 1727, 2219
concert[N], 1705, 2467
give[V], 1647, 2038
take[V], 1403, 1716
performance[N], 1353, 1703
good[J], 1323, 1652
well[R], 1305, 1591
know[V], 1178, 1489
never[R], 1142, 1388
year[N], 1129, 1372
[…]
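The classifier consumes lemma[POS] frequency features like those in the table. A minimal sketch of such feature vectors, assuming a fixed vocabulary and pre-lemmatised, pre-tagged tokens (the vocabulary and function below are illustrative, not the project's code):

```python
from collections import Counter

# Fixed feature vocabulary: lemma[POS] pairs like those in the table.
VOCAB = ["play[V]", "hear[V]", "music[N]", "concert[N]", "sing[V]"]

def feature_vector(tagged_tokens, vocab=VOCAB):
    """Count lemma[POS] occurrences in a paragraph; returns a dense
    vector aligned with `vocab`, suitable as classifier input."""
    counts = Counter(f"{lemma}[{pos}]" for lemma, pos in tagged_tokens)
    return [counts[f] for f in vocab]

# Toy paragraph, already lemmatised and POS-tagged.
tokens = [("play", "V"), ("music", "N"), ("concert", "N")]
vec = feature_vector(tokens)  # → [1, 0, 1, 1, 0]
```

In the actual workflow these vectors, extracted from the training split, are what the Random Forest is fitted on.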
Entities
• DBpedia Spotlight to identify %entities%
• SPARQL query to DBpedia to filter the ones related to category:Music
SELECT distinct ?sub WHERE {
VALUES ?sub { %entities% }
?sub dct:subject ?subject .
?subject skos:broader{0:%d%} cat:Music
}
• Where %entities% are the resources identified by the NER engine, and %d% is a
path-length parameter, set to 5 (values above 5 introduce too much noise).
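Filling the query template can be sketched as below. This is an assumption-laden illustration: the function name is hypothetical, and the prefixes (dct, skos, cat) and the non-standard bounded path syntax `{0:n}` are taken verbatim from the slide's query, assumed to be declared and supported by the endpoint.

```python
def music_filter_query(entities, d=5):
    """Fill the slide's SPARQL template: keep only entities whose
    dct:subject lies within d skos:broader hops of cat:Music."""
    values = " ".join(f"<{e}>" for e in entities)
    return (
        "SELECT DISTINCT ?sub WHERE {\n"
        f"  VALUES ?sub {{ {values} }}\n"
        "  ?sub dct:subject ?subject .\n"
        f"  ?subject skos:broader{{0:{d}}} cat:Music\n"
        "}"
    )
```

The resulting query string would be sent to the DBpedia SPARQL endpoint; entities returned in ?sub are the music-related ones.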
Lessons learnt
Some results are very good: ~85% F-measure and accuracy, comparable to human annotators.
However, different methods capture different aspects:
TRUE: Introduced to the Anacreontic Society, consisting of amateurs who perform admirably
the best orchestral works. The usual supper followed. After propitiating me with a trio from
'Cosi Fan Tutte', they drew me to the piano.
TRUE: In the evening we went to Rev. Baptist Noel's chapel, where one is always sure of
edification from the sermon if not from the psalms.
FALSE: Flags and pendants were suspended from the windows, [...] the colours of the German
States were waving harmoniously together, and the banners of the Fine Arts, with appropriate
inscriptions, particularly those of music, poetry and painting, were especially honored, and
floated triumphant amidst the standards of electorates, dukedoms, and kingdoms.
Future work
• Statistical approaches (incl. embeddings) inherit biases specific to
the core concept (music): inspirating[J], heartful[J], …
• Mentions of named entities related to music do not guarantee a
record of an experience of listening.
• Hybrid method: (a) integrate statistical analysis with an entity-based
“boost”, and (b) correct linguistic-use bias (e.g. filter specific POS).
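The proposed hybrid could combine the two signals roughly as sketched below; the weighting scheme, the boost and cap values, and the function name are all hypothetical, chosen only to illustrate point (a):

```python
def hybrid_score(stat_score, music_entities, boost=0.2, cap=1.0):
    """Combine the statistical relatedness score with an entity-based
    'boost': each music-related entity found in the paragraph adds
    `boost`, and the total is capped at `cap`.
    (Illustrative weighting only; boost/cap are hypothetical.)"""
    return min(cap, stat_score + boost * len(music_entities))
```

A paragraph with a middling statistical score but several confirmed music entities would thus be promoted, while entity mentions alone cannot push an unrelated paragraph past the cap's neighbourhood without statistical support.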
References
• [1] Barlow, H. and Rowland, D. (Eds.) (2017). Listening to Music: People, Practices and Experiences. The Open University.
• [2] Adamou, A., d'Aquin, M., Barlow, H., and Brown, S. (2014). LED: curated and crowd-sourced linked data on music listening experiences. Proceedings of the ISWC 2014 Posters & Demonstrations Track, 93–96.
• [3] Finkelstein, L., et al. (2002). Placing search in context: the concept revisited. ACM Transactions on Information Systems 20(1), 116–131.
• [4] Giunchiglia, F., Kharkevich, U., and Zaihrayeu, I. (2009). Concept search. European Semantic Web Conference. Springer, Berlin, Heidelberg.
• [5] Lewis, D. D. (1997). Reuters-21578 text categorization collection.
• [6] Halsey, K. (2008). Reading the evidence of reading: an introduction to the Reading Experience Database, 1450–1945. Popular Narrative Media 1(2).
• [7] Mendes, P. N., Jakob, M., García-Silva, A., and Bizer, C. (2011). DBpedia Spotlight: shedding light on the web of documents. Proceedings of the 7th International Conference on Semantic Systems, 1–8. ACM.
• [8] Ho, T. K. (1995). Random decision forests. Proceedings of the Third International Conference on Document Analysis and Recognition, Vol. 1. IEEE.
• [9] Mikolov, T., et al. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems.