dbrec
Music recommendations using DBpedia
Alexandre Passant - DERI, NUI Galway
In-Use Track @ ISWC2010
11th November 2010, Shanghai, China
Good news: it no longer
fits on a single slide!

Many producers, but only a
few consumers (besides
search engines): BBC,
Drupal, ...
Agenda

• Semantic Distance over Linked Data
• dbrec - architecture, dataset and UI
• Evaluation
• Lessons learnt
• Next steps and conclusion
Semantic Distance
Semantic Distance over
    Linked Data
• Relying only on links
• Relying only on instance data
• Using dereferenceable URIs
 • And using resources following the LD
    principles
Linked Data
Linked Data
[Figure: example Linked Data graph with resources e:r1, e:r2, e:r3, e:r4 connected by links e:l1, e:l2, e:l3]
G = (R, L, I)

• R = {r_1, r_2, ..., r_n}
• L = {l_1, l_2, ..., l_n}
• I = {i_1, i_2, ..., i_n}

[Figure: the example graph with resources e:r1 to e:r4 and links e:l1 to e:l3]
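The definition of G can be mirrored in a few lines of Python. This is a minimal sketch, assuming R is the set of resources, L the set of typed links, and I the set of link instances written as (link, source, target) triples. The edges below are invented for illustration and are not read off the slide's figure; `LinkInstance` and `is_well_formed` are hypothetical names.

```python
from typing import NamedTuple


class LinkInstance(NamedTuple):
    """One element of I: a typed link from a source to a target resource."""
    link: str
    source: str
    target: str


# Hypothetical toy graph; the edges are illustrative assumptions.
R = {"e:r1", "e:r2", "e:r3", "e:r4"}
L = {"e:l1", "e:l2", "e:l3"}
I = {
    LinkInstance("e:l1", "e:r1", "e:r2"),
    LinkInstance("e:l2", "e:r1", "e:r3"),
    LinkInstance("e:l3", "e:r2", "e:r4"),
}


def is_well_formed(resources, links, instances):
    """Check that every link instance only uses known links and resources."""
    return all(
        i.link in links and i.source in resources and i.target in resources
        for i in instances
    )
```

Keeping I as a set of triples means the graph relies only on links between instances, which matches the "relying only on links" and "relying only on instance data" constraints listed earlier.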
[Figures: the same example graph (resources e:r1 to e:r4, links e:l1 to e:l3), shown alone and side by side over several slides]
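The slides do not spell out the distance formula, so the following is only a sketch of one simple link-based distance, assuming the distance between two resources shrinks as the number of direct links between them (in either direction) grows. The function names and the toy link instances are hypothetical, and indirect links through a shared third resource are left out.

```python
def count_direct(instances, a, b):
    """Number of link instances going from resource a to resource b."""
    return sum(1 for (_link, src, dst) in instances if src == a and dst == b)


def direct_distance(instances, a, b):
    """Distance in (0, 1]: 1.0 when unlinked, smaller with more direct links."""
    links_both_ways = count_direct(instances, a, b) + count_direct(instances, b, a)
    return 1.0 / (1.0 + links_both_ways)


# Hypothetical link instances as (link, source, target) triples.
toy = [
    ("e:l1", "e:r1", "e:r2"),
    ("e:l2", "e:r1", "e:r2"),
    ("e:l3", "e:r2", "e:r1"),
    ("e:l1", "e:r3", "e:r4"),
]
```

In this toy data e:r1 and e:r2 share three direct links, giving 1 / (1 + 3) = 0.25, while two unlinked resources stay at the maximum distance of 1.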
LDSD
The LDSD ontology




                Our own ontology, but it
                could be mapped to MuSim
                in the future
dbrec
At a glance
• A system providing recommendations for all
  DBpedia bands and artists (±40K) using LDSD
    • And explaining its recommendations
    • Both using Linked Data and Semantic
      Web standards (RDF, SPARQL)
• Integrating related Web data for an improved
  user experience
Architecture
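(1) Dataset identification
(2) Dataset reduction
(3) LDSD computation
(4) User interface

RDF data is passed from the dataset steps to the LDSD computation, and from the LDSD computation to the user interface.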
Dataset
•   Retrieving all artists and bands in DBpedia (±40K)
    •   Including incoming / outgoing links
    •   Approximately 3M triples
•   Removing datatype properties
    •   2.2M (75%)
•   Merging /ontology and /property
    •   1.7M (55%)
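The two reduction steps can be sketched as follows. This is a rough illustration, assuming triples are held as (subject, predicate, object, object_is_literal) tuples, that removing datatype properties amounts to dropping triples with literal objects, and that a /property predicate and an /ontology predicate with the same local name can be treated as one. The function names and sample triples are hypothetical.

```python
PROPERTY_NS = "http://dbpedia.org/property/"
ONTOLOGY_NS = "http://dbpedia.org/ontology/"


def drop_datatype_properties(triples):
    """Keep only triples whose object is a resource, not a literal."""
    return [t for t in triples if not t[3]]


def merge_namespaces(triples):
    """Rewrite /property/ predicates to /ontology/ and drop resulting duplicates."""
    merged = set()
    for s, p, o, is_literal in triples:
        if p.startswith(PROPERTY_NS):
            p = ONTOLOGY_NS + p[len(PROPERTY_NS):]
        merged.add((s, p, o, is_literal))
    return merged


# Hypothetical sample: (subject, predicate, object, object_is_literal).
sample = [
    ("dbr:Ramones", PROPERTY_NS + "genre", "dbr:Punk_rock", False),
    ("dbr:Ramones", ONTOLOGY_NS + "genre", "dbr:Punk_rock", False),
    ("dbr:Ramones", PROPERTY_NS + "yearsActive", "1974-1996", True),
]
reduced = merge_namespaces(drop_datatype_properties(sample))
```

On the sample, the literal-valued triple is dropped first and the two genre triples then collapse into one, which is the same kind of shrinkage the slide reports at a much larger scale.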
Distribution




               20K+ artists (50%) are
               not linked to any other
               artist
Curation
• 118 properties linking artists together
  • 18 misused, 35 wrongly defined (e.g.
    dbprop:klfsgProperty)
• 578 properties linking artist to resources
  • 183 used only once, 36 wrongly defined
• 767 properties linking resources to artists
  • 336 used only once, 115 wrongly defined
• Dataset reduced to 1M triples
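Part of this curation can be automated. The sketch below assumes that properties used only once and properties flagged as wrongly defined during manual review are simply dropped; the slides do not state the exact rule, so the threshold, the function names and the sample triples are all hypothetical.

```python
from collections import Counter


def drop_rare_properties(triples, min_uses=2):
    """Remove triples whose predicate is used fewer than min_uses times."""
    usage = Counter(p for (_s, p, _o) in triples)
    return [t for t in triples if usage[t[1]] >= min_uses]


def drop_blacklisted(triples, blacklist):
    """Remove triples whose predicate was flagged during manual review."""
    return [t for t in triples if t[1] not in blacklist]


# Hypothetical (subject, predicate, object) triples.
sample = [
    ("dbr:A", "dbprop:associatedActs", "dbr:B"),
    ("dbr:C", "dbprop:associatedActs", "dbr:D"),
    ("dbr:A", "dbprop:klfsgProperty", "dbr:E"),
    ("dbr:B", "dbprop:klfsgProperty", "dbr:G"),
    ("dbr:A", "dbprop:usedOnlyOnce", "dbr:F"),
]
curated = drop_blacklisted(drop_rare_properties(sample), {"dbprop:klfsgProperty"})
```

The frequency filter covers the automated side, while the blacklist stands in for the manual review of misused or wrongly defined properties.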
Computing distance
• 9,797 minutes
  • Done for all artists in DBpedia
  • 2 x AMD Opteron 250, 4GB, Ubuntu 8.10
• 50M triples
  • Modelled using the LDSD ontology

Artist            Time (sec.)
Ramones           25.20
Johnny Cash       61.16
U2                50.06
The Clash         43.34
Bad Religion      34.98
The Aggrolites     7.35
Janis Joplin      23.12
Artist                Distance
Elvis Presley         0.0978
June Carter Cash      0.1056
Willie Nelson         0.1322
Kris Kristofferson    0.1407
Bob Dylan             0.1466
Marty Robbins         0.1673
Rosanne Cash          0.1782
Charlie McCoy         0.1836
Gene Autry            0.1910
Carl Smith            0.1980
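Turning such a distance table into a recommendation list is a matter of sorting by ascending distance. The sketch below uses the first five rows of the table; the `recommend` helper is a hypothetical name, not part of dbrec.

```python
# First five rows of the distance table from the slide.
distances = {
    "Elvis Presley": 0.0978,
    "June Carter Cash": 0.1056,
    "Willie Nelson": 0.1322,
    "Kris Kristofferson": 0.1407,
    "Bob Dylan": 0.1466,
}


def recommend(distances, k):
    """Return the k artists with the smallest distance, nearest first."""
    return sorted(distances, key=distances.get)[:k]
```

Calling `recommend(distances, 3)` returns the three nearest artists in the order they appear in the table.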
User interface
Sorry, SlideShare viewers:
this is a video, so you
won’t be able to see it here!
Evaluation
Evaluation settings
• Off-line and on-line user evaluation
 • Using common RecSys metrics
• 10 subjects
 • 2 women, 8 men
 • 24 to 34 years old
 • 35 to 55 minutes per interview, face to face
Metrics
•   Off-line evaluation - comparison with last.fm
    •   5 artists / bands
    •   2 blind lists, 10 ranked recommendations per list
    •   Marks from 1 to 5
•   On-line recommendation - dbrec only
    •   5 artists / bands
    •   Browsing recommendations using dbrec
    •   Marks from 1 to 5, plus observations and interviews
dbrec vs last.fm

• Average mark of recommendations
 • 3.37 (±1.19) for dbrec (off-line)
 • 3.44 (±1.25) for dbrec w/ on-line
 • 3.69 (±1.01) for last.fm
Precision

Results for the precision (t=X means items are relevant if rated X or more).
Cannot compute recall (it would imply users know all bands in the system).

Precision (%)   dbrec (off-line)   dbrec (off+on-line)   last.fm
t=2             92.05              90.59                 98.32
t=3             76.63              77.72                 87.91
t=4             49.06              51.23                 58.05
t=5             20.09              25                    25.165
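The thresholded precision used here can be written down directly: a recommendation counts as relevant when its mark reaches the threshold t. The sketch below assumes marks are integers from 1 to 5, as in the evaluation settings; the function name and the sample marks are hypothetical and are not the study's data.

```python
def precision_at_threshold(marks, t):
    """Share of rated recommendations whose mark is at least t, in percent."""
    if not marks:
        raise ValueError("no marks to evaluate")
    relevant = sum(1 for m in marks if m >= t)
    return 100.0 * relevant / len(marks)


# Hypothetical marks (1 to 5) given by one subject to ten recommendations.
marks = [5, 4, 4, 3, 3, 3, 2, 2, 1, 5]
```

Raising t can only keep or lower the value, which is why every column of the table decreases from t=2 to t=5.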
Novel recommendations
• Lots of unknown recommendations
 • 62% for dbrec (59.6% w/ on-line)
 • 40.4% for last.fm
 • But that’s good news!
• Evaluated 274 of them on dbrec
 • 3.05(± 1.09)
Observations
• Explanations for unknown bands
 • Checked for 198 / 310
• But also for known ones
 • 24 / 190
• Helped to understand the recommendation
 • Even if they already knew the band
Interviews
              User-interface   Explanations
Enjoyable     9                7
Useful        9                9
Enriching     8                10
Easy to use   10               9
Confusing     0                2
Complicated   0                2
Too geeky     1                6
Lessons learnt
Data quality
• Issues with DBpedia properties
  • Misused: dbprop:notableInstruments
  • Wrongly defined: dbprop:klfsgProperty
  • Duplicates: /ontology versus /property
• Requires data curation!
  • Automated and manual
Use, but replicate
• More and more public SPARQL endpoints
 • Often limited to X max results
 • 5,000 on DBpedia
• Difficult to use in production
 • Requires local replica
 • But implies synchronisation!

But that’s fair enough: hosting a SPARQL endpoint is costly, and opening it up fully to anyone would require lots of maintenance, etc.
Use, but replicate
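PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Assumption: dbpedia: is bound to the DBpedia ontology namespace.
PREFIX dbpedia: <http://dbpedia.org/ontology/>

# Labels of every resource typed as a musical artist or as a band.
SELECT ?label
WHERE {
    ?x rdfs:label ?label .
    { ?x a dbpedia:MusicalArtist }
    UNION
    { ?x a dbpedia:Band }
}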
Use, but replicate

• Names of all DBpedia artists
 • Get number of results w/ COUNT
 • Run n/5000 queries (LIMIT + OFFSET)
 • Recompose results
• Network errors, etc.

The query had more than 40K results, since most artists have their names in several languages. So many more than 8 queries were needed.
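The paging step can be sketched as below. It only builds the query strings and does not contact any endpoint. The 5,000-row page size comes from the slides; the stand-in query, the `paged_queries` helper and the 40,000 total are assumptions for illustration. The stand-in query adds ORDER BY because, without it, an endpoint is not required to return rows in a stable order across pages.

```python
import math

PAGE_SIZE = 5000  # result cap reported for the DBpedia endpoint in the slides


def paged_queries(base_query, total_results, page_size=PAGE_SIZE):
    """Build one LIMIT/OFFSET query per page needed to cover total_results."""
    pages = math.ceil(total_results / page_size)
    return [
        f"{base_query}\nLIMIT {page_size} OFFSET {page * page_size}"
        for page in range(pages)
    ]


# Hypothetical stand-in query; prefixes omitted for brevity.
BASE_QUERY = "SELECT ?label WHERE { ?x rdfs:label ?label } ORDER BY ?label"

queries = paged_queries(BASE_QUERY, 40000)
```

With exactly 40,000 rows this yields 8 queries; any additional rows, such as labels in several languages, push the count higher, and each query is also a separate chance for a network error.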
SPARQL, Be quick or be neat
   • “List all artists / bands sharing common
     property-values with the current one”
     • Fits in a single SPARQL query
     • But does not scale
   • “Optimisation” has to be done manually by
     splitting the query and recomposing results
     using an external script
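The split-and-recompose idea can be illustrated without an RDF store. The sketch below swaps the SPARQL queries for lookups over an in-memory list of triples, so it shows the shape of the external script rather than dbrec's actual code; `match`, `similar_by_slices` and the sample data are hypothetical.

```python
from collections import defaultdict


def match(triples, s=None, p=None, o=None):
    """Tiny stand-in for a SPARQL triple pattern; None acts as a variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def similar_by_slices(triples, artist):
    """Find artists sharing a property-value with `artist`, one lookup per pair."""
    shared = defaultdict(set)
    for _s, prop, value in match(triples, s=artist):
        # One narrow "query" per property-object pair, then recompose the results.
        for other, _p, _o in match(triples, p=prop, o=value):
            if other != artist:
                shared[other].add((prop, value))
    return dict(shared)


# Hypothetical data, not taken from DBpedia.
store = [
    ("dbr:Ramones", "dbo:genre", "dbr:Punk_rock"),
    ("dbr:The_Clash", "dbo:genre", "dbr:Punk_rock"),
    ("dbr:Ramones", "dbo:hometown", "dbr:New_York_City"),
    ("dbr:U2", "dbo:genre", "dbr:Rock_music"),
]
```

Each inner lookup corresponds to one small query with the property and object fixed, and the dictionary plays the role of the recomposed result set.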
SPARQL, Be quick or be neat
Tests done in the local RDF store:
1: full-query
2: split by property
3: split by property-object

Up to 75% faster

                 Direct SPARQL      Property-slicing    Complete-slicing
                 Queries   Time     Queries   Time      Queries   Time
Ramones          1         139.97   20        109.51    66        37.84
Johnny Cash      1         257.81   30        152.60    135       75.35
U2               1         155.53   22        122.91    70        44.03
The Clash        1         146.43   20        110.84    79        42.61
Bad Religion     1         104.08   23         86.49    97        47.35
The Aggrolites   1         145.92   13        114.52    28        28.33
Janis Joplin     1         230.88   27        151.00    98        62.81
Next steps
Next steps
•   Other data sources
    •   FreeBase, MusicBrainz, etc.
•   Distance improvement
    •   Propagation, feature selection, etc.
•   User Interface
    •   User-friendly explanations
•   LOD-compliance
    •   Mapping with other ontologies, SPARQL endpoint
Conclusion
• Defined and applied a Semantic Distance
  measure to Linked Data
• Used it to build an end-user music
  recommender system, with ±40K artists
• Evaluated it using RecSys metrics
• Learnt several domain-independent lessons
  regarding LOD consumption
Questions?
Contact:
alexandre.passant@deri.org - http://apassant.net - @terraces

Acknowledgements:
Science Foundation Ireland - SFI/08/CE/I1380 (Lion 2)

References:
AAAI Spring Symposium 2010 - LinkedAI Symposium
ESWC2010 - Demo Track
ISWC2010 - In-Use Track
Picture credits
•   http://flickr.com/photos/yumlog2/20896759/ by yuki*
•   http://richard.cyganiak.de/2007/10/lod/ by Richard Cyganiak and Anja Jentzsch
•   http://flickr.com/photos/loungerie/2196866243/ by loungerie
•   http://flickr.com/photos/iskanderstruck/248786430/ by iskanderbenamor
•   http://flickr.com/photos/homer4k/461407380/ by homer4k
•   http://flickr.com/photos/jpellgen/2390204986/ by jpellgen
•   http://flickr.com/photos/onegoodbumblebee/839927986/ by One Good Bumblebee
•   http://flickr.com/photos/28509009@N03/2668650475/ by marcreis
•   http://flickr.com/photos/8049973@N03/2656140464/ by wolf.tone
