SlideShare una empresa de Scribd logo
1 de 36
http://musicnet.mspace.fm MusicNet: Aligning Musicology’s Metadata  David Bretherton (Music), Daniel Alexander Smith, Joe Lambert and mc schraefel (Electronics and Computer Science) Music Linked Data Workshop 12 May 2011 • JISC, London
David Bretherton 2
musicSpace, the precursor to MusicNet 3
Problem 4
Digitised data is often ‘siloed’.  	Geographical dispersal has been replaced by virtual dispersal on the web. Data is now segregated into countless online repositories by:  Media type (text, image, audio, video) Date of creation/publication Subject 5
Digitised data is often ‘siloed’.  	Geographical dispersal has been replaced by virtual dispersal on the web. Data is now segregated into countless online repositories by:  Language Copyright holder Ad hoc/insecure nature of project funding 6
Digitised data is often ‘siloed’.  Interoperability has generally not been given a high enough priority.  And, because the datasets are ‘mature’ the data isn’t Linked Data. 7
Solution  8
9 ‘musicSpace’ is a faceted browser
10 Demonstration  	‘What recording of works by Cage exist, which performers have recorded a particular work by Cage, and what else by Cage have they recorded? Screencast 1: http://www.youtube.com/watch?v=keTN12OWies&hd=1
How musicSpace provided the motivation for MusicNet 11
Problem: you can align metadata fields, but this doesn’t align the data in those fields 12 	Schubert‏Schubert, Franz‏Schubert, Franz Peter‏Shu-po-tʻe,‏ ‎‡d  1797-1828‏Schubert‏ ‎‡d  1797-1828‏F. P. Schubert‏Schubert, ...‏ ‎‡d  1797-1828‏Schubert, F.‏Schubert, F.‏ ‎‡d  1797-1828‏Schubert, Fr.‏Schubert, Fr.‏ ‎‡d  1797-1828‏Schubert, Franciszek.‏Schubert, Franç.‏ ‎‡d  1797-1828‏Schubert, François‏ ‎‡d  1797-1828‏Schubert, Franz P.‏ ‎‡d  1797-1828‏ 	Schubert, Franz Peter‏Schubert, Franz Peter,‏ ‎‡d  1797-1828‏Schubert, Franz Peter‏ ‎‡d  1797-1828‏Schubert, François,‏ ‎‡d  1797-1828‏Schubert.‏Schubert‏ ‎‡d  1797-1828‏Shu-po-tʿe‏ ‎‡d  1797-1828‏Shubert, F. (Frant︠s︡)‏ ‎‡d  1797-1828‏Shubert, F.‏ ‎‡q  (Frant︠s︡),‏ ‎‡d  1797-1828‏Shubert, Frant︠s︡,‏ ‎‡d  1797-1828‏Shubert, Frant︠s︡‏ ‎‡d  1797-1828‏Shūberuto, F.‏Shūberuto, Furantsu‏ ‎‡d  1797-1828‏Šubert, Franc‏ ‎‡d  1797-1828‏Šubertas, F. (Francas),‏ ‎‡d  1797-1828‏ 	Šubertas, Francas Peteris,‏ ‎‡d  1797-1828‏ Šubert, F.‏ Šubertas, F.‏ ‎‡d  1797-1828‏ שוברט, פרנץ‏ シューベルト, F., 1797-1828‏ シューベルト, フランツ‏ ‎‡d  1797-1828‏ 舒柏特, 弗朗茨‏ Schubert, François‏ ‎‡d  1797-1828‏ Schubert, Franz Peter‏ ‎‡d  1797-1828‏
Causes of ‘dirty’ data (for names) Different naming conventions; e.g. ‘Bach, Johann Sebastian’ or ‘J. S. Bach’ Inclusion of non-name data in name field;  e.g. ‘Schubert, Franz, 1797-1828. Songs’, or ‘Allen, Betty (Teresa)’ Different languages (and alphabets); User input errors. e.g. ‘Bach, Johhan Sebastien’ 13
Dirty data degrades the user experience 14 	Searching for compositions by the composer Franz Schubert (1797–1828)... Screencast 2: http://www.youtube.com/watch?v=pFsYfz1vlAg&hd=1
MusicNet’s alignment tool  15
Prototype 1 (musicSpace era) 16
Used Alignment API & Google Docs We used Alignment API to compare the names as strings, using WordNet to enable word stemming, synonym support, etc.  Alignment API produces a similarity measure for each possible match.  We planned to set a threshold for automatic approval.  Matches below that threshold would be sent to a Google Docs spreadsheet for expert review. 17
Shortcoming: no threshold False matches with high similarity measures: True matches with low similarity measures: 18
Prototype 2 (building a custom tool for MusicNet) 19
Design considerations  From Prototype 1: A completely automated solution is out of the question (for the moment...).  We needed a custom tool with a human-friendly UI (we also wanted keyboard shortcuts for speed). Access to additional metadata (i.e. context), so matches can be researched by the reviewer. From experience with faceted browsers:  Alphabetically sorted columns enable one to spot synonymous names at a glance. Normally sources give names surname first; duplication arises from the different representation of given names. 20
Alignment process Data* Algorithm compares hash of alpha-only l.c. version of name Suggested groups No groups suggested User verified* or rejected* Manual grouping (research*) Synonym groups URIs Alternative names   Back links* 21
UI of Prototype 2 22
Prototype 2 demo 23 Screencast 3: http://www.youtube.com/watch?v=5f8iaryZMk0&hd=1
Daniel Alexander Smith 24
Linked Data 25 URI for everything e.g. Beethoven is: http://musicnet.mspace.fm/person/367b107e07a7f9db8aed7c72d2ebeab2#id http://dbpedia.org/resource/Ludwig_van_Beethoven http://www.bbc.co.uk/music/artists/1f9df192-a621-4f54-8850-2c5373b7eac9#artist
Contribution 26 MusicNet provides links between composers in multiple scholarly repositories We also link to MusicBrainz and BBC /music This can be fed back into projects like musicSpace where disambiguation is a problem
27
MusicNet Published Data 28 Links between multiple URIs Representations from each source Machine-readable, standardised to build applications over this data Human searchable and usable too http://musicspace.mspace.fm
29
30
Provenance 31 Retains source of information e.g. that Grove say “Schubert, Franz (Peter)”and British Library say “Schubert, Franz” and “Schubert”
Provenance 32 When they don’t exist already, musicnet provides individual URIs for a composer from each source, e.g.: http://musicnet.mspace.fm/person/7ca5e11353f11c7d625d9aabb27a6174#blcollection Then links back to search URLs, e.g.: http://catalogue.bl.uk/F/?func=find-b&request=Schubert%2C+Franz&find_code=WNA
33
34
Links from BBC /music 35 Harvested links from BBC to: DBPedia New York Times IMDB PBS etc.
36 Thank you for listening!

Más contenido relacionado

Similar a D. Bretherton, D. A. Smith, J. Lambert, mc schraefel. MusicNet: Aligning Musicology’s Metadata

How the Web can change social science research (including yours)
How the Web can change social science research (including yours)How the Web can change social science research (including yours)
How the Web can change social science research (including yours)Frank van Harmelen
 
Digital Composition
Digital CompositionDigital Composition
Digital CompositionJosh White
 
I want to know more about compuerized text analysis
I want to know more about   compuerized text analysisI want to know more about   compuerized text analysis
I want to know more about compuerized text analysisLuke Czarnecki
 
This American Library: An Experiment in Academic Library Podcasting
This American Library: An Experiment in Academic Library PodcastingThis American Library: An Experiment in Academic Library Podcasting
This American Library: An Experiment in Academic Library Podcastingcbkretz
 
Metric Learning for Music Discovery with Source and Target Playlists
Metric Learning for Music Discovery with Source and Target PlaylistsMetric Learning for Music Discovery with Source and Target Playlists
Metric Learning for Music Discovery with Source and Target PlaylistsYing-Shu Kuo
 
Linked Open Europeana: Semantics for the Citizen
Linked Open Europeana: Semantics for the CitizenLinked Open Europeana: Semantics for the Citizen
Linked Open Europeana: Semantics for the CitizenStefan Gradmann
 
Wimmics Research Team Overview 2017
Wimmics Research Team Overview 2017Wimmics Research Team Overview 2017
Wimmics Research Team Overview 2017Fabien Gandon
 
2_ResearchDataOverview_SarahCallaghan
2_ResearchDataOverview_SarahCallaghan2_ResearchDataOverview_SarahCallaghan
2_ResearchDataOverview_SarahCallaghanOpenAIRE
 
Sarah Callaghan Research Data Overview
Sarah Callaghan Research Data OverviewSarah Callaghan Research Data Overview
Sarah Callaghan Research Data OverviewOpenAIRE
 
Towards a musical Semantic Web
Towards a musical Semantic WebTowards a musical Semantic Web
Towards a musical Semantic WebYves Raimond
 
FRBRized Discovery and Cataloging
FRBRized Discovery and CatalogingFRBRized Discovery and Cataloging
FRBRized Discovery and CatalogingJenn Riley
 
Small Worlds Social Graphs Social Media
Small Worlds Social Graphs Social MediaSmall Worlds Social Graphs Social Media
Small Worlds Social Graphs Social Mediasuresh sood
 
Audio Signal Identification and Search Approach for Minimizing the Search Tim...
Audio Signal Identification and Search Approach for Minimizing the Search Tim...Audio Signal Identification and Search Approach for Minimizing the Search Tim...
Audio Signal Identification and Search Approach for Minimizing the Search Tim...aciijournal
 
AUDIO SIGNAL IDENTIFICATION AND SEARCH APPROACH FOR MINIMIZING THE SEARCH TIM...
AUDIO SIGNAL IDENTIFICATION AND SEARCH APPROACH FOR MINIMIZING THE SEARCH TIM...AUDIO SIGNAL IDENTIFICATION AND SEARCH APPROACH FOR MINIMIZING THE SEARCH TIM...
AUDIO SIGNAL IDENTIFICATION AND SEARCH APPROACH FOR MINIMIZING THE SEARCH TIM...aciijournal
 
Audio Signal Identification and Search Approach for Minimizing the Search Tim...
Audio Signal Identification and Search Approach for Minimizing the Search Tim...Audio Signal Identification and Search Approach for Minimizing the Search Tim...
Audio Signal Identification and Search Approach for Minimizing the Search Tim...aciijournal
 
Wish you were here before!
Wish you were here before!Wish you were here before!
Wish you were here before!David King
 
#OSSPARIS17 - Logiciel libre pour une science reproductible, par ROBERTO DI C...
#OSSPARIS17 - Logiciel libre pour une science reproductible, par ROBERTO DI C...#OSSPARIS17 - Logiciel libre pour une science reproductible, par ROBERTO DI C...
#OSSPARIS17 - Logiciel libre pour une science reproductible, par ROBERTO DI C...Paris Open Source Summit
 

Similar a D. Bretherton, D. A. Smith, J. Lambert, mc schraefel. MusicNet: Aligning Musicology’s Metadata (20)

How the Web can change social science research (including yours)
How the Web can change social science research (including yours)How the Web can change social science research (including yours)
How the Web can change social science research (including yours)
 
Digital Composition
Digital CompositionDigital Composition
Digital Composition
 
I want to know more about compuerized text analysis
I want to know more about   compuerized text analysisI want to know more about   compuerized text analysis
I want to know more about compuerized text analysis
 
Music in the Archives
Music in the ArchivesMusic in the Archives
Music in the Archives
 
This American Library: An Experiment in Academic Library Podcasting
This American Library: An Experiment in Academic Library PodcastingThis American Library: An Experiment in Academic Library Podcasting
This American Library: An Experiment in Academic Library Podcasting
 
Metric Learning for Music Discovery with Source and Target Playlists
Metric Learning for Music Discovery with Source and Target PlaylistsMetric Learning for Music Discovery with Source and Target Playlists
Metric Learning for Music Discovery with Source and Target Playlists
 
Linked Open Europeana: Semantics for the Citizen
Linked Open Europeana: Semantics for the CitizenLinked Open Europeana: Semantics for the Citizen
Linked Open Europeana: Semantics for the Citizen
 
About the Social Semantic Web
About the Social Semantic WebAbout the Social Semantic Web
About the Social Semantic Web
 
Wimmics Research Team Overview 2017
Wimmics Research Team Overview 2017Wimmics Research Team Overview 2017
Wimmics Research Team Overview 2017
 
2_ResearchDataOverview_SarahCallaghan
2_ResearchDataOverview_SarahCallaghan2_ResearchDataOverview_SarahCallaghan
2_ResearchDataOverview_SarahCallaghan
 
Sarah Callaghan Research Data Overview
Sarah Callaghan Research Data OverviewSarah Callaghan Research Data Overview
Sarah Callaghan Research Data Overview
 
Towards a musical Semantic Web
Towards a musical Semantic WebTowards a musical Semantic Web
Towards a musical Semantic Web
 
FRBRized Discovery and Cataloging
FRBRized Discovery and CatalogingFRBRized Discovery and Cataloging
FRBRized Discovery and Cataloging
 
Small Worlds Social Graphs Social Media
Small Worlds Social Graphs Social MediaSmall Worlds Social Graphs Social Media
Small Worlds Social Graphs Social Media
 
Audio Signal Identification and Search Approach for Minimizing the Search Tim...
Audio Signal Identification and Search Approach for Minimizing the Search Tim...Audio Signal Identification and Search Approach for Minimizing the Search Tim...
Audio Signal Identification and Search Approach for Minimizing the Search Tim...
 
Sattose talk
Sattose talkSattose talk
Sattose talk
 
AUDIO SIGNAL IDENTIFICATION AND SEARCH APPROACH FOR MINIMIZING THE SEARCH TIM...
AUDIO SIGNAL IDENTIFICATION AND SEARCH APPROACH FOR MINIMIZING THE SEARCH TIM...AUDIO SIGNAL IDENTIFICATION AND SEARCH APPROACH FOR MINIMIZING THE SEARCH TIM...
AUDIO SIGNAL IDENTIFICATION AND SEARCH APPROACH FOR MINIMIZING THE SEARCH TIM...
 
Audio Signal Identification and Search Approach for Minimizing the Search Tim...
Audio Signal Identification and Search Approach for Minimizing the Search Tim...Audio Signal Identification and Search Approach for Minimizing the Search Tim...
Audio Signal Identification and Search Approach for Minimizing the Search Tim...
 
Wish you were here before!
Wish you were here before!Wish you were here before!
Wish you were here before!
 
#OSSPARIS17 - Logiciel libre pour une science reproductible, par ROBERTO DI C...
#OSSPARIS17 - Logiciel libre pour une science reproductible, par ROBERTO DI C...#OSSPARIS17 - Logiciel libre pour une science reproductible, par ROBERTO DI C...
#OSSPARIS17 - Logiciel libre pour une science reproductible, par ROBERTO DI C...
 

Último

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 

Último (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 

D. Bretherton, D. A. Smith, J. Lambert, mc schraefel. MusicNet: Aligning Musicology’s Metadata

  • 1. http://musicnet.mspace.fm MusicNet: Aligning Musicology’s Metadata David Bretherton (Music), Daniel Alexander Smith, Joe Lambert and mc schraefel (Electronics and Computer Science) Music Linked Data Workshop 12 May 2011 • JISC, London
  • 5. Digitised data is often ‘siloed’. Geographical dispersal has been replaced by virtual dispersal on the web. Data is now segregated into countless online repositories by: Media type (text, image, audio, video) Date of creation/publication Subject 5
  • 6. Digitised data is often ‘siloed’. Geographical dispersal has been replaced by virtual dispersal on the web. Data is now segregated into countless online repositories by: Language Copyright holder Ad hoc/insecure nature of project funding 6
  • 7. Digitised data is often ‘siloed’. Interoperability has generally not been given a high enough priority. And, because the datasets are ‘mature’ the data isn’t Linked Data. 7
  • 9. 9 ‘musicSpace’ is a faceted browser
  • 10. 10 Demonstration ‘What recording of works by Cage exist, which performers have recorded a particular work by Cage, and what else by Cage have they recorded? Screencast 1: http://www.youtube.com/watch?v=keTN12OWies&hd=1
  • 11. How musicSpace provided the motivation for MusicNet 11
  • 12. Problem: you can align metadata fields, but this doesn’t align the data in those fields 12 Schubert‏Schubert, Franz‏Schubert, Franz Peter‏Shu-po-tʻe,‏ ‎‡d  1797-1828‏Schubert‏ ‎‡d  1797-1828‏F. P. Schubert‏Schubert, ...‏ ‎‡d  1797-1828‏Schubert, F.‏Schubert, F.‏ ‎‡d  1797-1828‏Schubert, Fr.‏Schubert, Fr.‏ ‎‡d  1797-1828‏Schubert, Franciszek.‏Schubert, Franç.‏ ‎‡d  1797-1828‏Schubert, François‏ ‎‡d  1797-1828‏Schubert, Franz P.‏ ‎‡d  1797-1828‏ Schubert, Franz Peter‏Schubert, Franz Peter,‏ ‎‡d  1797-1828‏Schubert, Franz Peter‏ ‎‡d  1797-1828‏Schubert, François,‏ ‎‡d  1797-1828‏Schubert.‏Schubert‏ ‎‡d  1797-1828‏Shu-po-tʿe‏ ‎‡d  1797-1828‏Shubert, F. (Frant︠s︡)‏ ‎‡d  1797-1828‏Shubert, F.‏ ‎‡q  (Frant︠s︡),‏ ‎‡d  1797-1828‏Shubert, Frant︠s︡,‏ ‎‡d  1797-1828‏Shubert, Frant︠s︡‏ ‎‡d  1797-1828‏Shūberuto, F.‏Shūberuto, Furantsu‏ ‎‡d  1797-1828‏Šubert, Franc‏ ‎‡d  1797-1828‏Šubertas, F. (Francas),‏ ‎‡d  1797-1828‏ Šubertas, Francas Peteris,‏ ‎‡d  1797-1828‏ Šubert, F.‏ Šubertas, F.‏ ‎‡d  1797-1828‏ שוברט, פרנץ‏ シューベルト, F., 1797-1828‏ シューベルト, フランツ‏ ‎‡d  1797-1828‏ 舒柏特, 弗朗茨‏ Schubert, François‏ ‎‡d  1797-1828‏ Schubert, Franz Peter‏ ‎‡d  1797-1828‏
  • 13. Causes of ‘dirty’ data (for names) Different naming conventions; e.g. ‘Bach, Johann Sebastian’ or ‘J. S. Bach’ Inclusion of non-name data in name field; e.g. ‘Schubert, Franz, 1797-1828. Songs’, or ‘Allen, Betty (Teresa)’ Different languages (and alphabets); User input errors. e.g. ‘Bach, Johhan Sebastien’ 13
  • 14. Dirty data degrades the user experience 14 Searching for compositions by the composer Franz Schubert (1797–1828)... Screencast 2: http://www.youtube.com/watch?v=pFsYfz1vlAg&hd=1
  • 17. Used Alignment API & Google Docs We used Alignment API to compare the names as strings, using WordNet to enable word stemming, synonym support, etc. Alignment API produces a similarity measure for each possible match. We planned to set a threshold for automatic approval. Matches below that threshold would be sent to a Google Docs spreadsheet for expert review. 17
  • 18. Shortcoming: no threshold False matches with high similarity measures: True matches with low similarity measures: 18
  • 19. Prototype 2 (building a custom tool for MusicNet) 19
  • 20. Design considerations From Prototype 1: A completely automated solution is out of the question (for the moment...). We needed a custom tool with a human-friendly UI (we also wanted keyboard shortcuts for speed). Access to additional metadata (i.e. context), so matches can be researched by the reviewer. From experience with faceted browsers: Alphabetically sorted columns enable one to spot synonymous names at a glance. Normally sources give names surname first; duplication arises from the different representation of given names. 20
  • 21. Alignment process Data* Algorithm compares hash of alpha-only l.c. version of name Suggested groups No groups suggested User verified* or rejected* Manual grouping (research*) Synonym groups URIs Alternative names  Back links* 21
  • 23. Prototype 2 demo 23 Screencast 3: http://www.youtube.com/watch?v=5f8iaryZMk0&hd=1
  • 25. Linked Data 25 URI for everything e.g. Beethoven is: http://musicnet.mspace.fm/person/367b107e07a7f9db8aed7c72d2ebeab2#id http://dbpedia.org/resource/Ludwig_van_Beethoven http://www.bbc.co.uk/music/artists/1f9df192-a621-4f54-8850-2c5373b7eac9#artist
  • 26. Contribution 26 MusicNet provides links between composers in multiple scholarly repositories We also link to MusicBrainz and BBC /music This can be fed back into projects like musicSpace where disambiguation is a problem
  • 27. 27
  • 28. MusicNet Published Data 28 Links between multiple URIs Representations from each source Machine-readable, standardised to build applications over this data Human searchable and usable too http://musicspace.mspace.fm
  • 29. 29
  • 30. 30
  • 31. Provenance 31 Retains source of information e.g. that Grove say “Schubert, Franz (Peter)”and British Library say “Schubert, Franz” and “Schubert”
  • 32. Provenance 32 When they don’t exist already, musicnet provides individual URIs for a composer from each source, e.g.: http://musicnet.mspace.fm/person/7ca5e11353f11c7d625d9aabb27a6174#blcollection Then links back to search URLs, e.g.: http://catalogue.bl.uk/F/?func=find-b&request=Schubert%2C+Franz&find_code=WNA
  • 33. 33
  • 34. 34
  • 35. Links from BBC /music 35 Harvested links from BBC to: DBPedia New York Times IMDB PBS etc.
  • 36. 36 Thank you for listening!

Notas del editor

  1. Research data has become ‘siloed’.Geographical dispersal has been replaced by virtual dispersal on the web. Data is now segregated into countless online repositories according to:Media type (whether the data relates to text, image, audio or video objects)Date of creation and/or publicationThe Subject area of the material [change slide]
  2. Language of the material Copyright holderAnd even the ad hoc, often insecure nature of project funding (lots of small digitisation grants = lots of small datasets).
  3. ‘musicSpace’ uses the ‘mSpace’ faceted browser.Facets are represented as columns list attributes of the data, allowing the user to make selections in the columns in order to filter down results. The interface is reactive, so that subsequent choices are limited to those that would yield results.The faceted and reactive nature of the interface enables complex queries to be addressed, as the following screencast illustrates … [change slide]
  4. [The cage.mp4 screencast shows musicSpace being used to answer the Cage multipart question that was discussed earlier in ‘1. Motivation’. It also demonstrates various aspects of the musicSpace interface’. The outline below gives time indexed comments on what is being shown in the screencast, and could be used as a the basis for a ‘script’]00:00 – musicSpace offers standard text functionality, so we can search a collection for references to ‘cage’. 00:16 – We can click on ‘Open slice’ to open a particular record in the columns at the top of the page. 00:19 – And of course we can search the database by selecting items in the columns and rearranging the columns. 00:23 – So to address the first part of this multipart query, to search for records by cage, I move the ‘Composer’ column before the ‘Musical Work’ column.00:34 – And by making sections in the ‘Musical Work’, I can a list of results of recording of a particular work. In this case, I’ve looked for recordings of Cage’s composition ‘Dream’.00:45 – I can add other columns to continue searching. Here I’m adding the ‘Performer’ column to address the second part of the query, which performers have recorded a particular work. 00:52 – You’ll see that as the column is added, it automatically populates with the names of performers that meet the previously selected criteria (Composer = Cage, Musical Work = Dream).00:56 – Furthermore, we can add additional columns, such as ‘Place’, to discover where the recordings were made. 01:05 – By clicking ‘Export columns’ we can bring up the selections and lists contained in the facets, so that this information can exported for later use by the researcher. 01:14 – Columns that are no longer needed can be closed. 01:18 – To perform the last part of the query, I can select a performer in the last column (‘Performer’), and then drag this column to the left so that it precedes the ‘Musical Work’ column.01:25 – The ‘Musical Works’ column now updates to list all the works by Cage that the selected performer (here John Tilbury) has recorded.01:30 – Again, this information can be exported for later use.01:34 – Each record returned by musicSpace contains a hyperlink to the source record, so that it can be viewed in its original context.