DSNotify - Detecting and Fixing Broken Links in Linked Data Sets


        WebS ’09 @ DEXA 2009
        Linz, 02/09/2009
        Bernhard Haslhofer and Niko Popitsch
Bernhard Haslhofer, Niko Popitsch
Summary




<mo:MusicGroup rdf:about="/music/artists/084308bd-1654-436f-ba03-df6697104e19#artist">
 <foaf:name>Green Day</foaf:name>
 <owl:sameAs rdf:resource="http://dbpedia.org/resource/Green_Day" />
 <mo:image rdf:resource="/music/images/artists/7col_in/084308bd-1654-436f-ba03-df6697104e19.jpg" />
 <foaf:page rdf:resource="/music/artists/084308bd-1654-436f-ba03-df6697104e19.html" />
 <mo:musicbrainz rdf:resource="http://musicbrainz.org/artist/084308bd-1654-436f-ba03-df6697104e19.html" />
 <mo:homepage rdf:resource="http://www.greenday.com/" />
 <mo:fanpage rdf:resource="http://www.greendayvideos.com/" />
 <mo:fanpage rdf:resource="http://www.greenday.net" />
 <mo:imdb rdf:resource="http://www.imdb.com/name/nm1554564/" />
 <mo:myspace rdf:resource="http://www.myspace.com/greenday" />
  ...
...
<rdf:Description rdf:about="http://dbpedia.org/resource/Green_Day">
      <dbpprop:abstract xmlns:dbpprop="http://dbpedia.org/property/" xml:lang="en">Green Day
      is an American rock trio formed in 1987. The band has consisted of Billie Joe Armstrong
      (vocals, guitar), Mike Dirnt, and Tré Cool for the majority of its existence...
      </dbpprop:abstract>
</rdf:Description>
...
<rdf:Description rdf:about="http://dbpedia.org/resource/Green_Day">
      <dbpprop:abstract xmlns:dbpprop="http://dbpedia.org/property/" xml:lang="de">Green Day
      [gɹiːn deɪ] ist eine US-amerikanische Punk-Rock-Band, mit der Anfang der 1990er das
      Punk-Revival begann. Die Band wurde 1987 von Billie Joe Armstrong und Mike Dirnt zusammen
      mit dem Schlagzeuger John Kiffmeyer alias Al Sobrante als The Sweet Children....
      </dbpprop:abstract>
</rdf:Description>
...
...but...




Some numbers...

        •     Events between DBpedia 3.2 (10/2008) and 3.3
              (05/2009)
             •     # resources created: 29449

             •     # resources removed: 4789

             •     # resources moved: 729




Link Integrity...
        •     is a qualitative property that is given when all links
              within and between a set of data sources are valid and
              deliver the result intended by the link creator.

        •     cf. referential integrity in RDBMS

        •     demands a solution that
             •     detects broken links between resources

             •     provides support for fixing broken links


Types of broken links
        •     Removed link targets
             •     e.g., resource deleted, server not available anymore, etc.

        •     Moved link targets
             •     available at another Web location

             •     e.g., reorganization of Web resources

        •     Modified link targets


The DSNotify Approach
        •     periodically monitor items (resources) in a specific
              Linked Data source

        •     extract a descriptive feature vector for each item

        •     store item + feature vector in index

        •     use feature vectors to detect if items have been
              removed or moved to another location

        •     if moved, add relationship between “old” and “new”
              item
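
The monitoring steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the DSNotify implementation: the `Item` class, the `extract_features` and `monitor_once` functions, and the choice of feature keys (`label`, `type`, `abstract`) are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Item:
    uri: str
    features: dict  # hypothetical feature vector: property name -> value


def extract_features(description: dict) -> dict:
    """Keep a few descriptive properties (the chosen keys are an assumption)."""
    keys = ("label", "type", "abstract")
    return {k: description[k] for k in keys if k in description}


def monitor_once(index: dict, snapshot: dict):
    """Run one monitoring cycle against a data source.

    index:    uri -> Item, as stored by the previous cycle
    snapshot: uri -> resource description (dict) as observed now
    Returns (disappeared, appeared) lists of Items and updates index in place.
    """
    disappeared = [index[u] for u in index if u not in snapshot]
    appeared = [
        Item(u, extract_features(d)) for u, d in snapshot.items() if u not in index
    ]
    for item in disappeared:
        del index[item.uri]
    for item in appeared:
        index[item.uri] = item
    return disappeared, appeared
```

The two returned lists are the candidates a later step would compare, by feature vector, to decide whether a disappeared item was removed or moved.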

Architecture
[Figure: DSNotify architecture. Components shown: LOD "consuming" application, LOD source, LOD sources (interlinked via owl:sameAs), Monitor (feature extraction), Indices (II, RII, AII), Event Log, Move Detector (heuristic), Decider, LOD source updater, user. Edge labels: monitor, update, notifications, querying, decision making.]


Index Interaction
(all URIs below are relative to http://dbpedia.org/resource/)

time   Item Index (II)              Archived Item Index (AII)   Removed Item Index (RII)
t1     Green_Day (band)
t2                                                              Green_Day (band)
t3     band/Green_Day               Green_Day (band)
t4     band/Alternative/Green_Day   band/Green_Day
                                    Green_Day (band)
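
The index transitions in this example can be replayed with a small sketch. It assumes, purely for illustration, that the Archived Item Index stores a link from each old URI to its successor; the `Indices` class and its method names are hypothetical and not taken from DSNotify.

```python
class Indices:
    """Toy model of the three indices (II, AII, RII)."""

    def __init__(self):
        self.ii = set()   # Item Index: items currently present
        self.rii = set()  # Removed Item Index: items no longer found
        self.aii = {}     # Archived Item Index: old URI -> successor URI (assumed layout)

    def created(self, uri):
        self.ii.add(uri)

    def disappeared(self, uri):
        self.ii.discard(uri)
        self.rii.add(uri)

    def moved(self, old_uri, new_uri):
        # The old item may sit in RII (it vanished earlier) or still in II.
        self.rii.discard(old_uri)
        self.ii.discard(old_uri)
        self.aii[old_uri] = new_uri
        self.ii.add(new_uri)

    def resolve(self, uri):
        """Follow archived moves to the current URI; None if removed or unknown."""
        seen = set()
        while uri in self.aii and uri not in seen:
            seen.add(uri)
            uri = self.aii[uri]
        return uri if uri in self.ii else None


idx = Indices()
idx.created("Green_Day (band)")                              # t1
idx.disappeared("Green_Day (band)")                          # t2
idx.moved("Green_Day (band)", "band/Green_Day")              # t3
idx.moved("band/Green_Day", "band/Alternative/Green_Day")    # t4
```

After t4, resolving the original URI follows both archived entries to the item currently in the Item Index.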




Move Detection

        •     is a semi-automatic process

        •     calculate similarity between items based on their
              feature vectors using domain-specific heuristics

        •     probability > given threshold: automatic decision

        •     probability < given threshold: ask expert user
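
A minimal sketch of this threshold rule, assuming Jaccard similarity over sets of feature strings as a stand-in for the domain-specific heuristics (the slides do not say which measure is used). The function names, the default threshold of 0.8, and the handling of a score exactly equal to the threshold are all assumptions.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two feature sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def decide_move(removed_features: set, candidates: dict, threshold: float = 0.8):
    """Pick the most similar newly appeared item for a disappeared one.

    candidates: uri -> feature set of newly appeared items.
    Returns (decision, uri, score), where decision is "move" when the best
    score exceeds the threshold and "ask_user" otherwise (including ties
    with the threshold and the case of no candidates).
    """
    if not candidates:
        return ("ask_user", None, 0.0)
    best_uri, best_score = max(
        ((uri, jaccard(removed_features, feats)) for uri, feats in candidates.items()),
        key=lambda pair: pair[1],
    )
    if best_score > threshold:
        return ("move", best_uri, best_score)
    return ("ask_user", best_uri, best_score)
```

Raising the threshold sends more cases to the expert user; lowering it automates more decisions at the risk of wrong move detections.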



DSNotify HTTP Interface

        •     GET http://<server>:<port>/<dsnotify>/item/<uri>
             •      find out what happened with an item

        •     GET http://<server>:<port>/<dsnotify>/eventChoice
             •      retrieve pending event choices (move / remove)

        •     ...
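
As a hedged illustration, a client could build the two request URLs shown above like this. The helper names, the example host, port, and context path are hypothetical, and percent-encoding the item URI is an assumption: the slides do not specify how `<uri>` is encoded, nor the response format.

```python
from urllib.parse import quote


def item_url(server: str, port: int, context: str, item_uri: str) -> str:
    """URL for asking what happened with an item (URI encoding is assumed)."""
    return f"http://{server}:{port}/{context}/item/{quote(item_uri, safe='')}"


def event_choice_url(server: str, port: int, context: str) -> str:
    """URL for retrieving pending event choices (move / remove)."""
    return f"http://{server}:{port}/{context}/eventChoice"


# Hypothetical deployment values, for illustration only.
url = item_url("localhost", 8080, "dsnotify", "http://dbpedia.org/resource/Green_Day")
```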



Evaluation Plan
[Figure: timeline of DBpedia snapshots from t-n to t0 (DBpedia 2.0, DBpedia 3.0, DBpedia 3.1, DBpedia 3.2). A diff is computed between each pair of consecutive snapshots, and the resulting changes are manually classified as moves (mv) or removals (rm).]
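
The diff-then-classify step could look roughly like the sketch below. It assumes each snapshot is reduced to a set of resource URIs and that the manual classification is supplied as a mapping from old to new URI; `classify_changes` and its parameters are hypothetical names.

```python
def classify_changes(old_uris, new_uris, manual_moves):
    """Diff two snapshots and split the changes into moved / removed / created.

    manual_moves: dict old_uri -> new_uri provided by a human annotator.
    Only mappings consistent with the diff are accepted as moves.
    """
    old_uris, new_uris = set(old_uris), set(new_uris)
    vanished = old_uris - new_uris
    appeared = new_uris - old_uris
    moved = {o: n for o, n in manual_moves.items() if o in vanished and n in appeared}
    removed = vanished - set(moved)
    created = appeared - set(moved.values())
    return moved, removed, created
```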

Status / Future Work

        •     1st prototype (infrastructure) ready

        •     annotated test-data set based on DBpedia available

        •     Currently working on:
             •     system for simulating past modifications in DBpedia

             •     the DSNotify evaluation



Fixing Your Web since 2009
Backup




Evaluation Plan

        •     Monitor simulated DBpedia evolution (t-n - t0)

        •     Precision / recall of automatic move detection
             •     with different similarity thresholds

             •     with different heuristics and feature vectors
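
A sketch of how precision and recall of automatic move detection might be computed for several similarity thresholds. The function names are hypothetical, and treating precision as 1.0 when nothing is predicted is a convention chosen here, not something stated in the slides.

```python
def precision_recall(predicted, actual):
    """Precision and recall of predicted (old, new) move pairs."""
    predicted, actual = set(predicted), set(actual)
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 1.0  # convention, see lead-in
    recall = tp / len(actual) if actual else 1.0
    return precision, recall


def sweep(scored_pairs, actual_moves, thresholds):
    """Evaluate automatic move detection at each threshold.

    scored_pairs: list of ((old_uri, new_uri), similarity) candidates.
    A pair counts as an automatically detected move when its similarity
    exceeds the threshold.
    """
    results = {}
    for t in thresholds:
        predicted = {pair for pair, score in scored_pairs if score > t}
        results[t] = precision_recall(predicted, actual_moves)
    return results
```

Running the sweep over an annotated test set would show the usual trade-off: higher thresholds tend to raise precision and lower recall.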




Linked Data / Web of Data

        •     Data management paradigm on the basis of Web
              technologies

        •     HTTP, URI, and RDF/S are the key technologies

        •     Applications (not Web browsers) are data consumers

        •     Links between resources play a major role



