Presentation at the DARIAH EU Annual Event, Warsaw, May 2019.
The Listening Experience Database (LED) project is an initiative aimed at collecting accounts of people's private experiences of listening to music. Since 2012, the LED community has explored a wide variety of sources, collecting over 10,000 unique experiences. Dr Enrico Daga participated in the DARIAH Annual Event, presenting the work done to support users in finding evidence of listening experiences in books. The presentation covered the approach that guided the development of a system that traverses the content of digitised texts in search of passages containing a description of a musical event: an account of an experience of listening to music. Several approaches spanning statistical NLP, machine learning techniques, and Semantic Web technologies have been investigated and compared. The best-performing method has been used to develop a novel tool to support curators in discovering new listening experiences. FindLEr systematically analyses the content of books for relevant paragraphs, significantly reducing the effort required to find candidate listening experiences.
Feedback: @enridaga - enrico.daga@open.ac.uk - http://led.kmi.open.ac.uk/
The LED Project
• An open and freely searchable database that brings together a mass of data
about people’s experiences of listening to music of all kinds, in any historical
period and any culture [1].
• Sophisticated data model, natively in RDF / SPARQL
• Linked Open Data: http://data.open.ac.uk/context/led [2]
• Since 2012, the LED project has collected over 10,000 unique listening
experiences from a variety of textual sources
https://led.kmi.open.ac.uk/
Listening experiences
• What is a listening experience?
• An account of an event involving music and one or more participants
• "Introduced to the Anacreontic Society, consisting of amateurs who perform admirably the
best orchestral works. The usual supper followed. After propitiating me with a trio from
'Cosi Fan Tutte', they drew me to the piano."
• "The best choir-singing, (Roman Catholic) without accompaniment, we have heard, was at
Munich."
• "Holland is the country of bells; and the merry chimes are to be heard hourly, from almost
every church-tower or steeple."
• All three constitute a report of an experience of a core subject: music.
The LE Database includes text excerpts that can be analysed as positive examples.
Project Gutenberg, >50k english books in the public domain
Reuters-21578 (Reu) is a standard corpus adopted extensively for training and
evaluating systems for information retrieval, document classification, machine
learning, and similar corpus-based research [5]. It includes 21,578 news articles
of various categories; it does not include music.
The UK Reading Experience Database (UK RED) investigates the evidence of
reading in Britain [6].
DBpedia is a large knowledge graph published as Linked Data. It includes a SPARQL
endpoint and an NER tool: DBpedia Spotlight [7].
Background Knowledge (BK)
Competing methods?
Forest. A typical machine learning workflow: a Random Forest
classifier [8] trained on LE, Reuters, and RED.
Statistical (TF-IDF). Project Gutenberg has a Music shelf. We computed an
average TF-IDF to obtain a dictionary used to estimate the
relatedness of a text to the music domain.
Statistical (Embeddings). Word2Vec [9] is used to generate a dictionary of
terms related to music; the threshold is trained on LE, Reuters, and RED.
Entities. Find DBpedia entities related to the category Music using DBpedia
Spotlight plus a SPARQL query.
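The TF-IDF method above can be sketched as follows. This is a minimal, illustrative stand-in for the actual pipeline: the toy corpora, function names, and whitespace tokenisation are assumptions, not the project's implementation.

```python
import math
from collections import Counter

def build_music_dictionary(music_docs, background_docs, top_k=10):
    """Average TF-IDF of terms over the music 'shelf', with document
    frequencies computed on the combined corpus."""
    corpus = music_docs + background_docs
    df = Counter()
    for doc in corpus:
        df.update(set(doc.split()))
    scores = Counter()
    for doc in music_docs:
        tf = Counter(doc.split())
        total = sum(tf.values())
        for term, count in tf.items():
            idf = math.log(len(corpus) / df[term])
            scores[term] += (count / total) * idf
    # Average over the music documents and keep the top-k terms.
    return {t: s / len(music_docs) for t, s in scores.most_common(top_k)}

def relatedness(text, dictionary):
    """Mean dictionary weight of the text's tokens: a rough estimate
    of how related the text is to the music domain."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(dictionary.get(t, 0.0) for t in tokens) / len(tokens)
```

A text sharing vocabulary with the music shelf scores higher than one that does not, which is the signal used to rank candidate paragraphs.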
Forest, a typical machine learning classifier
[Workflow diagram: the LED Database provides positives (LEs in the benchmark, LEs not in the benchmark) and negatives (Reuters, REs); each set is split into training and test portions; NLP features are extracted, the RF classifier is trained on the training split, and its accuracy is measured on the held-out test texts.]
Top NLP features (lemma[POS], frequency counts):
play[V], 3149, 5362
hear[V], 2620, 3598
music[N], 2541, 3650
time[N], 2019, 2644
first[J], 2017, 2738
come[V], 1867, 2389
sing[V], 1783, 2725
make[V], 1759, 2157
great[J], 1727, 2219
concert[N], 1705, 2467
give[V], 1647, 2038
take[V], 1403, 1716
performance[N], 1353, 1703
good[J], 1323, 1652
well[R], 1305, 1591
know[V], 1178, 1489
never[R], 1142, 1388
year[N], 1129, 1372
[…]
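The classifier consumes lemma[POS] frequency features like those in the table. A minimal sketch of such feature vectors, assuming a fixed vocabulary and pre-lemmatised, pre-tagged tokens (the vocabulary and function below are illustrative, not the project's code):

```python
from collections import Counter

# Fixed feature vocabulary: lemma[POS] pairs like those in the table.
VOCAB = ["play[V]", "hear[V]", "music[N]", "concert[N]", "sing[V]"]

def feature_vector(tagged_tokens, vocab=VOCAB):
    """Count lemma[POS] occurrences in a paragraph; returns a dense
    vector aligned with `vocab`, suitable as classifier input."""
    counts = Counter(f"{lemma}[{pos}]" for lemma, pos in tagged_tokens)
    return [counts[f] for f in vocab]

# Toy paragraph, already lemmatised and POS-tagged.
tokens = [("play", "V"), ("music", "N"), ("concert", "N")]
vec = feature_vector(tokens)  # → [1, 0, 1, 1, 0]
```

In the actual workflow these vectors, extracted from the training split, are what the Random Forest is fitted on.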
Entities
• DBpedia Spotlight to identify %entities%
• SPARQL query to DBpedia to filter the ones related to category:Music
SELECT distinct ?sub WHERE {
VALUES ?sub { %entities% }
?sub dct:subject ?subject .
?subject skos:broader{0:%d%} cat:Music
}
• Where %entities% are the resources identified by the NER engine, and %d% is a
path-length parameter, set to 5 (values above 5 introduce too much noise).
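Filling the query template can be sketched as below. This is an assumption-laden illustration: the function name is hypothetical, and the prefixes (dct, skos, cat) and the non-standard bounded path syntax `{0:n}` are taken verbatim from the slide's query, assumed to be declared and supported by the endpoint.

```python
def music_filter_query(entities, d=5):
    """Fill the slide's SPARQL template: keep only entities whose
    dct:subject lies within d skos:broader hops of cat:Music."""
    values = " ".join(f"<{e}>" for e in entities)
    return (
        "SELECT DISTINCT ?sub WHERE {\n"
        f"  VALUES ?sub {{ {values} }}\n"
        "  ?sub dct:subject ?subject .\n"
        f"  ?subject skos:broader{{0:{d}}} cat:Music\n"
        "}"
    )
```

The resulting query string would be sent to the DBpedia SPARQL endpoint; entities returned in ?sub are the music-related ones.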
Lessons learnt
Some results are very good: ~85% F-measure and accuracy, comparable to human annotators.
However, different methods capture different aspects:
TRUE: Introduced to the Anacreontic Society, consisting of amateurs who perform admirably
the best orchestral works. The usual supper followed. After propitiating me with a trio from
'Cosi Fan Tutte', they drew me to the piano.
TRUE: In the evening we went to Rev. Baptist Noel's chapel, where one is always sure of
edification from the sermon if not from the psalms.
FALSE: Flags and pendants were suspended from the windows, [...] the colours of the German
States were waving harmoniously together, and the banners of the Fine Arts, with appropriate
inscriptions, particularly those of music, poetry and painting, were especially honored, and
floated triumphant amidst the standards of electorates, dukedoms, and kingdoms.
Future work
• Statistical approaches (incl. embeddings) inherit biases specific to
the core concept (music): inspirating[J], heartful[J], …
• Mentions of named entities related to music do not guarantee a
record of an experience of listening.
• Hybrid method: (a) integrate statistical analysis with an entity-based
“boost”, and (b) correct linguistic-use bias (e.g. filter specific POS).
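The proposed hybrid could combine the two signals roughly as sketched below; the weighting scheme, the boost and cap values, and the function name are all hypothetical, chosen only to illustrate point (a):

```python
def hybrid_score(stat_score, music_entities, boost=0.2, cap=1.0):
    """Combine the statistical relatedness score with an entity-based
    'boost': each music-related entity found in the paragraph adds
    `boost`, and the total is capped at `cap`.
    (Illustrative weighting only; boost/cap are hypothetical.)"""
    return min(cap, stat_score + boost * len(music_entities))
```

A paragraph with a middling statistical score but several confirmed music entities would thus be promoted, while entity mentions alone cannot push an unrelated paragraph past the cap's neighbourhood without statistical support.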
References
• [1] Barlow, H. and Rowland, D. (Eds.) (2017). Listening to Music: People, Practices and Experiences. The Open University.
• [2] Adamou, A., d'Aquin, M., Barlow, H., and Brown, S. (2014). LED: curated and crowd-sourced linked data on music listening experiences. Proceedings of the ISWC 2014 Posters & Demonstrations Track, 93–96.
• [3] Finkelstein, L., et al. (2002). Placing search in context: the concept revisited. ACM Transactions on Information Systems 20(1), 116–131.
• [4] Giunchiglia, F., Kharkevich, U., and Zaihrayeu, I. (2009). Concept search. European Semantic Web Conference. Springer, Berlin, Heidelberg.
• [5] Lewis, D. D. (1997). Reuters-21578 text categorization collection.
• [6] Halsey, K. (2008). Reading the evidence of reading: an introduction to the Reading Experience Database, 1450–1945. Popular Narrative Media 1(2).
• [7] Mendes, P. N., Jakob, M., García-Silva, A., and Bizer, C. (2011). DBpedia Spotlight: shedding light on the web of documents. Proceedings of the 7th International Conference on Semantic Systems, 1–8. ACM.
• [8] Ho, T. K. (1995). Random decision forests. Proceedings of the Third International Conference on Document Analysis and Recognition, Vol. 1. IEEE.
• [9] Mikolov, T., et al. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems.