Digital Enterprise Research Institute                                                                         www.deri.ie




                 Leveraging Matching Dependencies for Guided
                   User Feedback in Linked Data Applications
                                                 Umair ul Hassan, Sean O’Riain, Edward Curry
                                                                     Digital Enterprise Research Institute
                                                                     National University of Ireland, Galway




 Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
Outline




             Motivation & Problem Space
                    Identity Resolution on the Linked Open Data (LOD) Web
             Proposed Approach
                    LOD Application Architecture
                    Relation to existing work
             Evaluation
             Conclusion & Future Work
Overview




             Identity Resolution in the Linked Open Data Web
                    Real-world entities have multiple identifiers in LOD
                    Identity resolution links have associated uncertainty
                    LOD Applications require user verification of links
             Problem
                    Obtaining feedback on every link is infeasible for large datasets
                    The utility of a link is specific to the domain of each LOD application
             Proposed Approach
                    Leverages matching dependencies to define domain-specific
                     requirements of identity resolution
                    Ranks identity resolution links according to their value of perfect information
Linked Open Data (LOD)




             Expose and interlink datasets on the Web
             Using URIs to identify “things” in your data
             Using a graph representation (RDF) to describe URIs
             Vision: The Web as a huge graph database




                                              Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Linked Data Example




        [Figure: an entity with multiple identifiers across datasets, connected by identity resolution links]
Identity Resolution in LOD




             Identity resolution is required to consolidate data in
              applications that consume LOD

             Three sources of identity resolution links
                    Provided by data publishers (e.g. dbpedia.org)
                    Generated by data consumers using tools (e.g. Silk, SEMIRI, RiMOM)
                    Maintained by third party web services (e.g. sameas.org)


             Uncertainty associated with links
                    Due to multiple interpretations of identity equivalence
                    Due to the characteristics of similarity-based link generation algorithms
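Consolidation can be illustrated with a minimal sketch, assuming each identity resolution link is a pair of URIs with a confidence score and that links below a cut-off are simply ignored. The union-find structure, the `consolidate` function, the 0.8 threshold and the example.org URIs are illustrative assumptions, not the method used in this work. The sketch shows why uncertain links matter: each one either merges two identifiers into one entity or leaves them apart, which is the decision user feedback is meant to settle.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical link format: (uri_a, uri_b, confidence in [0, 1])
Link = Tuple[str, str, float]


def consolidate(links: Iterable[Link], threshold: float = 0.8) -> List[Set[str]]:
    """Group URIs into entities using union-find over links at or above `threshold`."""
    parent: Dict[str, str] = {}

    def find(uri: str) -> str:
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps trees shallow
            uri = parent[uri]
        return uri

    for a, b, confidence in links:
        root_a, root_b = find(a), find(b)  # registers both URIs even if the link is dropped
        if confidence >= threshold and root_a != root_b:
            parent[root_a] = root_b  # merge the two identifiers into one entity

    clusters: Dict[str, Set[str]] = {}
    for uri in list(parent):
        clusters.setdefault(find(uri), set()).add(uri)
    return list(clusters.values())


links = [
    ("http://example.org/a/Galway", "http://example.org/b/galway", 0.95),
    ("http://example.org/b/galway", "http://example.org/c/Galway_City", 0.85),
    ("http://example.org/a/Galway", "http://example.org/b/Galway_Bay", 0.40),  # uncertain
]
clusters = consolidate(links)
print(clusters)
```

Lowering the threshold would merge the uncertain link as well, which is exactly the kind of error that verifying links with users is meant to prevent.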
Identity Resolution Problem




             User feedback for uncertain links
                    Verify uncertain identity resolution links with users/experts
                    Improve the quality of entity consolidation


             Challenges
                    Domain-specific semantic requirements
                       – How to define domain-specific quality requirements for Linked
                         Data applications?


                    Limited user attention
                       – How to rank candidate links by their benefit so as to maximize
                         the utility of user feedback?
Identity Resolution Problem




             User feedback for uncertain links
                    Verify uncertain identity resolution links with users/experts
                    Improve the quality of entity consolidation


             Proposed Approach
                    Domain-specific semantic requirements
                       – Leverage matching dependencies


                    Limited user attention
                       – Apply value of perfect information (VPI) theory
LOD Application Architecture




             [Figure: LOD application architecture with a Utility Module, a Feedback Module and a Consolidation Module; labelled elements: candidate links, matching dependencies (rules), ranked feedback tasks, questions, feedback, utility improvement]



Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition), 1-136. Morgan & Claypool.
Related Work




             Jeffery et al., “Pay-as-you-go user feedback for dataspace
              systems,” in Proceedings of the 2008 ACM SIGMOD
              Conference, 2008, pp. 847-860.

             Utility:
                    Defined in terms of the cardinality of query results over the dataspace
                    A general metric, not suited to application-specific data quality
             Assumption:
                    Availability of global query statistics
                       – Problematic for Linked Open Data
Proposed Approach




             Domain-Specific Utility
                    Define utility in terms of user-specified rules, i.e. matching dependencies
                    Rank candidate links for user feedback according to their value of perfect
                     information


             Assumptions
                    Matching dependencies are either provided by the user or generated
                     by existing tools
                    Utility is based on the satisfaction ratio of the dependencies in the dataspace
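The slides do not spell out the utility function, so the following is only a sketch under stated assumptions. A matching rule is modelled as a conjunction of attribute predicates (string similarity or equality). A rule's satisfaction ratio is taken to be the share of entity pairs meeting its premise that are also linked in the dataspace, and dataspace utility is the mean ratio over all rules. The helper names (`similar`, `equal`, `satisfaction_ratio`, `utility`), the 0.8 similarity cut-off and the sample records are hypothetical; `difflib.SequenceMatcher` stands in for whatever similarity measure an application would actually use.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Set

Entity = Dict[str, str]                       # attribute -> value
Predicate = Callable[[Entity, Entity], bool]  # one condition of a rule premise
Rule = List[Predicate]                        # conjunction of conditions


def similar(attr: str, min_ratio: float = 0.8) -> Predicate:
    """Premise condition: values of `attr` are approximately equal."""
    def check(e1: Entity, e2: Entity) -> bool:
        ratio = SequenceMatcher(None, e1[attr].lower(), e2[attr].lower()).ratio()
        return ratio >= min_ratio
    return check


def equal(attr: str) -> Predicate:
    """Premise condition: values of `attr` are identical."""
    return lambda e1, e2: e1[attr] == e2[attr]


def satisfaction_ratio(rule: Rule, entities: Dict[str, Entity],
                       links: Set[FrozenSet[str]]) -> float:
    """Share of pairs matching the rule premise whose identifiers are linked."""
    premise_pairs = [
        frozenset((a, b))
        for a, b in combinations(sorted(entities), 2)
        if all(cond(entities[a], entities[b]) for cond in rule)
    ]
    if not premise_pairs:
        return 1.0  # a rule with no applicable pairs is vacuously satisfied
    return sum(pair in links for pair in premise_pairs) / len(premise_pairs)


def utility(rules: List[Rule], entities: Dict[str, Entity],
            links: Set[FrozenSet[str]]) -> float:
    """Dataspace utility: mean satisfaction ratio over all rules."""
    return sum(satisfaction_ratio(r, entities, links) for r in rules) / len(rules)


entities = {
    "ex:p1": {"name": "John Smith", "city": "Galway"},
    "ex:p2": {"name": "Jon Smith", "city": "Galway"},
    "ex:p3": {"name": "Mary Jones", "city": "Galway"},
    "ex:p4": {"name": "Mary Jones", "city": "Galway"},
}
rules = [[similar("name"), equal("city")]]
links = {frozenset({"ex:p1", "ex:p2"})}
before = utility(rules, entities, links)                                   # 0.5
after = utility(rules, entities, links | {frozenset({"ex:p3", "ex:p4"})})  # 1.0
```

Under this definition, a confirmed link raises utility only when it satisfies a rule that was previously violated, which is what makes the utility domain-specific.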
Proposed Approach




             Matching Dependencies

                    Matching Rule


                    Example


                    Utility of rule


             Value of Perfect Information
                    $$g(m_k) = U(D_{m_k}, M \setminus \{m_k\})\,p_k + U(D_{\bar{m}_k}, M \setminus \{m_k\})\,(1 - p_k) - U(D, M)$$
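The value-of-perfect-information score g(m_k) on this slide weighs the utility after confirming candidate link m_k (probability p_k) against the utility after rejecting it, minus the current utility U(D, M). Below is a minimal sketch of ranking by that score. It assumes that the dataspace D is represented by the set of links users have already confirmed, that confirming m_k adds it to that set while rejecting it leaves the set unchanged, and that p_k is a probability such as a link generator's confidence score. The function names and the toy rule-based utility (share of rule-implied pairs that are confirmed) are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Link = Tuple[str, str]  # hypothetical (uri_a, uri_b) identity resolution link
# Hypothetical utility signature: U(confirmed links D, remaining candidates M) -> float
Utility = Callable[[FrozenSet[Link], FrozenSet[Link]], float]


def vpi(link: Link, p: float, confirmed: FrozenSet[Link],
        candidates: FrozenSet[Link], utility: Utility) -> float:
    """g(m_k) = U(D_mk, M - {m_k}) * p_k + U(D_not_mk, M - {m_k}) * (1 - p_k) - U(D, M)."""
    rest = candidates - {link}
    u_if_correct = utility(confirmed | {link}, rest)  # user confirms the link
    u_if_wrong = utility(confirmed, rest)             # user rejects the link
    return u_if_correct * p + u_if_wrong * (1 - p) - utility(confirmed, candidates)


def rank_by_vpi(probabilities: Dict[Link, float], confirmed: FrozenSet[Link],
                utility: Utility) -> List[Tuple[Link, float]]:
    """Order candidate links by decreasing value of perfect information."""
    candidates = frozenset(probabilities)
    scored = [(m, vpi(m, p, confirmed, candidates, utility))
              for m, p in probabilities.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy utility: share of pairs implied by a matching rule that are confirmed links.
REQUIRED: FrozenSet[Link] = frozenset({("a1", "b1"), ("a2", "b2")})


def rule_utility(confirmed: FrozenSet[Link], candidates: FrozenSet[Link]) -> float:
    return len(confirmed & REQUIRED) / len(REQUIRED)  # ignores unverified candidates


probabilities = {("a1", "b1"): 0.6, ("a2", "b2"): 0.9, ("a3", "b3"): 0.95}
ranking = rank_by_vpi(probabilities, frozenset(), rule_utility)
for link, score in ranking:
    print(link, round(score, 3))
```

With this utility, the highest-confidence link ("a3", "b3") scores zero because confirming it does not affect any rule, whereas a pure confidence ranking would ask about it first.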
Evaluation




             Measure the change in dataspace utility, as defined by the matching
              rules, after a given number of feedback iterations
             Candidate links generated by the Silk framework
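The evaluation loop can be sketched as a simulation in which a ground-truth oracle plays the user. At each iteration the top-ranked candidate is answered, confirmed links are added to the dataspace, and utility improvement is normalized by the best utility reachable from the candidate set. The normalization, the toy rule-based utility and the link scores are assumptions for illustration. The strategy names RANDOM, CONFIDENCE and VPI_RULES follow the evaluation plots, but these implementations are not the authors' code. The VPI score uses the simplification p_k (U(D + m_k) - U(D)), which coincides with the VPI formula when utility ignores unverified candidates and a rejection leaves D unchanged.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Link = Tuple[str, str]
Ranker = Callable[[Dict[Link, float], FrozenSet[Link]], List[Link]]

# Toy utility: share of rule-implied pairs that are confirmed links (assumption).
REQUIRED: FrozenSet[Link] = frozenset({("a1", "b1"), ("a2", "b2")})


def utility(confirmed: FrozenSet[Link]) -> float:
    return len(confirmed & REQUIRED) / len(REQUIRED)


def rank_random(pending: Dict[Link, float], confirmed: FrozenSet[Link]) -> List[Link]:
    order = list(pending)
    random.Random(42).shuffle(order)  # fixed seed keeps the run reproducible
    return order


def rank_confidence(pending: Dict[Link, float], confirmed: FrozenSet[Link]) -> List[Link]:
    return sorted(pending, key=pending.get, reverse=True)


def rank_vpi_rules(pending: Dict[Link, float], confirmed: FrozenSet[Link]) -> List[Link]:
    def score(m: Link) -> float:
        return pending[m] * (utility(confirmed | {m}) - utility(confirmed))
    return sorted(pending, key=score, reverse=True)


def simulate(candidates: Dict[Link, float], truth: Set[Link], rank: Ranker) -> List[float]:
    """Normalized utility improvement after each feedback iteration."""
    confirmed: FrozenSet[Link] = frozenset()
    pending = dict(candidates)
    u0 = utility(confirmed)
    u_best = utility(frozenset(truth & set(candidates)))
    curve: List[float] = []
    while pending:
        link = rank(pending, confirmed)[0]  # ask the user about the top-ranked link
        del pending[link]
        if link in truth:                   # the oracle answers from ground truth
            confirmed = confirmed | {link}
        gain = utility(confirmed) - u0
        curve.append(gain / (u_best - u0) if u_best > u0 else 1.0)
    return curve


candidates = {("a1", "b1"): 0.6, ("a2", "b2"): 0.7, ("a3", "b3"): 0.95, ("a4", "b4"): 0.9}
truth = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}
curves = {name: simulate(candidates, truth, ranker)
          for name, ranker in [("VPI_RULES", rank_vpi_rules),
                               ("CONFIDENCE", rank_confidence),
                               ("RANDOM", rank_random)]}
for name, curve in curves.items():
    print(name, curve)
```

In this toy run, VPI_RULES reaches full improvement after half of the feedback iterations, while CONFIDENCE spends its first questions on links that no rule depends on.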
Evaluation




             Datasets

                                 IIMB 2009 Dataset                  UCI-Adult Dataset                     Drug Dataset
              Data Source        Instance Matching Benchmark 2009   UCI Machine Learning Repository       Instance Matching Benchmark 2010
              Data Collection    IIMB 2009                          US Census Dataset                     DrugBank and Sider Datasets
                                 - Reference ontology               - Manually created duplicates         - Interlinking between two datasets
                                 - Ontology #16 with errors in        and data value errors                 of the same domain
                                   attributes
              Entity Types       imdb:Movie                         foaf:Person                           drugbank:drugs, sider:drugs
              Total Triples      291                                64000                                 14348
              Total Entity IDs   44                                 4000                                  5696
              Total Attributes   9                                  16                                    3
              Total Values       130                                10878                                 8473
              Candidate Links    81                                 72                                    94
              Correct Links      22                                 72                                    66
Evaluation




             [Figure: Dataspace utility improvement (0-100%) versus feedback iteration (0-100%) for the IIMB 2009 and UCI-Adult datasets, comparing the VPI_RULES, CONFIDENCE and RANDOM ranking strategies]
Conclusion




             Matching dependencies provide an effective mechanism to:
                    Represent entity matching rules
                    Specify domain specific semantic requirements
                    Measure utility of dataspaces


             Value of perfect information enables an effective ranking strategy
              for user feedback

             In all three datasets, 100% utility improvement was reached with
              less than 40% of the user feedback
Future Work




             Extend the approach to other data quality problems

             Support other types of dependencies, such as comparable
              dependencies and order dependencies

             Allow multi-user feedback for collaborative data cleaning

Multi-Source Provenance-Aware User Interest Profiling on the Social Semantic WebMulti-Source Provenance-Aware User Interest Profiling on the Social Semantic Web
Multi-Source Provenance-Aware User Interest Profiling on the Social Semantic Web
 
Cloud and E2.0: Connecting the Dots - OSCON Cloud Summit - 2010
Cloud and E2.0: Connecting the Dots - OSCON Cloud Summit - 2010Cloud and E2.0: Connecting the Dots - OSCON Cloud Summit - 2010
Cloud and E2.0: Connecting the Dots - OSCON Cloud Summit - 2010
 
Identity access and privacy in the new hybrid enterprise slides
Identity access and privacy in the new hybrid enterprise slidesIdentity access and privacy in the new hybrid enterprise slides
Identity access and privacy in the new hybrid enterprise slides
 
What is SDMX-RDF?
What is SDMX-RDF?What is SDMX-RDF?
What is SDMX-RDF?
 
Intranet 2.0 - Integrating Enterprise 2.0 into your corporate intranet
Intranet 2.0 - Integrating Enterprise 2.0 into your corporate intranetIntranet 2.0 - Integrating Enterprise 2.0 into your corporate intranet
Intranet 2.0 - Integrating Enterprise 2.0 into your corporate intranet
 
An imperative focus on semantic
An imperative focus on semanticAn imperative focus on semantic
An imperative focus on semantic
 

Último

Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 

Último (20)

Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 

Leveraging Matching Dependencies for Guided User Feedback in Linked Data Applications

  • 1. Digital Enterprise Research Institute www.deri.ie
    Leveraging Matching Dependencies for Guided User Feedback in Linked Data Applications
    Umair ul Hassan, Sean O’Riain, Edward Curry
    Digital Enterprise Research Institute, National University of Ireland, Galway
    Copyright 2011 Digital Enterprise Research Institute. All rights reserved.
  • 2. Outline
    - Motivation & Problem Space
      - Identity Resolution on the Linked Open Data (LOD) Web
    - Proposed Approach
      - LOD Application Architecture
      - How it relates to existing work
    - Evaluation
    - Conclusion & Future Work
  • 3. Overview
    - Identity Resolution in the Linked Open Data Web
      - Real-world entities have multiple identifiers in LOD
      - Identity resolution links have associated uncertainty
      - LOD applications require user verification of links
    - Problem
      - Feedback for all links is infeasible for large datasets
      - LOD applications have domain-specific utility of links
    - Proposed Approach
      - Leverages matching dependencies to define domain-specific requirements of identity resolution
      - Ranks identity resolution links according to the value of perfect information
  • 4. Linked Open Data (LOD)
    - Expose and interlink datasets on the Web
      - Using URIs to identify “things” in your data
      - Using a graph representation (RDF) to describe URIs
    - Vision: the Web as a huge graph database
    Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
  • 5. Linked Data Example
    [Figure: example Linked Data graph; labels: "Multiple Identifiers", "Identity resolution links"]
  • 6. Identity Resolution in LOD
    - Identity resolution is required to consolidate data in applications consuming LOD
    - Three sources of identity resolution links
      - Provided by data publishers (e.g. dbpedia.org)
      - Generated by consumers through tools (e.g. SILK, SEMIRI, RiMOM)
      - Maintained by third-party web services (e.g. sameas.org)
    - Uncertainty associated with links
      - Due to multiple interpretations of identity equivalence
      - Due to characteristics of link generation algorithms (similarity-based)
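Consolidating data from these link sources amounts to grouping URIs that identity links declare equivalent, which is why a single wrong link can merge two distinct entities. A minimal Python sketch, assuming each link is a hypothetical `(uri_a, uri_b, confidence)` tuple and that links below an illustrative acceptance `threshold` are ignored; the `example.org` URIs are placeholders. A union-find (disjoint-set) structure computes the transitive closure of `owl:sameAs`-style links:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical link representation: (uri_a, uri_b, confidence in [0, 1])
Link = Tuple[str, str, float]


class UnionFind:
    """Disjoint-set forest grouping URIs that identity links declare equivalent."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, uri: str) -> str:
        self.parent.setdefault(uri, uri)
        # Path halving: point each visited node at its grandparent.
        while self.parent[uri] != uri:
            self.parent[uri] = self.parent[self.parent[uri]]
            uri = self.parent[uri]
        return uri

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def consolidate(links: Iterable[Link], threshold: float = 0.8) -> List[List[str]]:
    """Return groups of URIs treated as one entity, using only links whose
    confidence reaches the (hypothetical) acceptance threshold."""
    uf = UnionFind()
    for a, b, confidence in links:
        uf.find(a)  # register both URIs even if the link is not accepted
        uf.find(b)
        if confidence >= threshold:
            uf.union(a, b)
    groups: Dict[str, List[str]] = defaultdict(list)
    for uri in list(uf.parent):
        groups[uf.find(uri)].append(uri)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    links = [
        ("http://dbpedia.org/resource/Galway", "http://example.org/a/Galway", 0.95),
        ("http://example.org/a/Galway", "http://example.org/b/galway", 0.85),
        ("http://dbpedia.org/resource/Galway", "http://example.org/c/GalwayCounty", 0.40),
    ]
    for entity in consolidate(links):
        print(entity)
```

Lowering the threshold to 0.3 merges the county URI into the city entity, which illustrates why uncertain links are worth routing to a user for verification.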
  • 7. Identity Resolution Problem
    - User feedback for uncertain links
      - Verify uncertain identity resolution links with users/experts
      - Improve quality of entity consolidation
    - Challenges
      - Domain-specific semantic requirements: how to define domain-specific quality requirements for Linked Data applications?
      - Limited user attention: how to rank candidate links by their benefit to maximize the utility of user feedback?
  • 8. Identity Resolution Problem
    - User feedback for uncertain links
      - Verify uncertain identity resolution links with users/experts
      - Improve quality of entity consolidation
    - Proposed Approach
      - Domain-specific semantic requirements: leverage matching dependencies
      - Limited user attention: employ value of perfect information theory
  • 9. LOD Application Architecture
    [Figure: LOD application architecture with a Utility Module, a Feedback Module and a Consolidation Module; labeled flows: Candidate Links, Questions, Rules, Feedback, Matching Dependencies, Utility Improvement, Ranked Feedback Tasks]
    Tom Heath and Christian Bizer (2011) Linked Data: Evolving the Web into a Global Data Space (1st edition), 1-136. Morgan & Claypool.
  • 10. Related Work
    - Jeffery et al., “Pay-as-you-go user feedback for dataspace systems,” in Proceedings of the 2008 ACM SIGMOD Conference, 2008, pp. 847-860.
      - Utility:
        - Defined in terms of the cardinality of query results on the dataspace
        - A general metric, not suitable for application-specific data quality
      - Assumption:
        - Availability of global query statistics: problematic for Linked Open Data
  • 11. Proposed Approach
    - Domain-specific utility
      - Define utility in terms of user-specified rules, i.e. matching dependencies
      - Rank candidate links for user feedback according to the value of perfect information
    - Assumptions
      - Matching dependencies are either provided by the user or generated through existing tools
      - Utility is based on the satisfaction ratio of dependencies in the dataspace
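The slides do not spell out how the satisfaction ratio is computed. A minimal Python sketch, assuming a simplified matching rule whose left-hand side is a set of per-attribute comparisons and whose right-hand side says "these two identifiers denote the same entity"; the satisfaction ratio is taken here to be the share of entity pairs that trigger the rule and are also linked, and dataspace utility the mean ratio over all rules. The `similar` operator (difflib ratio at 0.8), the `ex:` identifiers and the attribute names are hypothetical, and only direct links are checked (no transitive closure):

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Set

Entity = Dict[str, str]                  # attribute name -> literal value
Comparator = Callable[[str, str], bool]  # similarity operator on one attribute


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical string-similarity operator (case-insensitive difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def equal(a: str, b: str) -> bool:
    return a == b


@dataclass
class MatchingRule:
    """Simplified matching dependency: if every LHS comparison holds for two
    entities, their identifiers should be linked as the same entity."""
    lhs: Dict[str, Comparator]

    def fires(self, e1: Entity, e2: Entity) -> bool:
        return all(
            attr in e1 and attr in e2 and cmp(e1[attr], e2[attr])
            for attr, cmp in self.lhs.items()
        )


def satisfaction_ratio(rule: MatchingRule, entities: Dict[str, Entity],
                       same_as: Set[FrozenSet[str]]) -> float:
    """Share of entity pairs matched by the rule's LHS that are also linked."""
    fired = [(a, b) for a, b in combinations(sorted(entities), 2)
             if rule.fires(entities[a], entities[b])]
    if not fired:
        return 1.0  # no pair triggers the rule, so it is vacuously satisfied
    linked = sum(1 for a, b in fired if frozenset((a, b)) in same_as)
    return linked / len(fired)


def utility(rules: List[MatchingRule], entities: Dict[str, Entity],
            same_as: Set[FrozenSet[str]]) -> float:
    """Dataspace utility as the mean satisfaction ratio over all rules (assumed)."""
    return sum(satisfaction_ratio(r, entities, same_as) for r in rules) / len(rules)


if __name__ == "__main__":
    people = {
        "ex:p1": {"name": "John Smith", "birthDate": "1970-01-01"},
        "ex:p2": {"name": "Jon Smith", "birthDate": "1970-01-01"},
        "ex:p3": {"name": "Jane Doe", "birthDate": "1980-05-05"},
        "ex:p4": {"name": "Jane Doe", "birthDate": "1980-05-05"},
    }
    rule = MatchingRule({"name": similar, "birthDate": equal})
    print(utility([rule], people, {frozenset(("ex:p1", "ex:p2"))}))  # 0.5
```

The rule fires on two pairs but only one is linked, so utility is 0.5; confirming the `ex:p3`/`ex:p4` link would raise it to 1.0.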
  • 12. Proposed Approach
    - Matching Dependencies
      - Matching Rule
      - Example
      - Utility of rule
    - Value of Perfect Information
      $$g(m_k) = U(D_{m_k}, M \setminus \{m_k\})\,p_k + U(D_{\bar{m}_k}, M \setminus \{m_k\})\,(1 - p_k) - U(D, M)$$
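The value-of-perfect-information score can be read as the expected utility after asking the user about link m_k, minus the current utility. A minimal Python sketch, assuming that p_k is the probability that m_k is correct (for example the link generator's confidence), that D is the set of confirmed links, that m_k leaves the candidate set M whether it is confirmed or rejected, and that `utility` is any callable U(D, M). `toy_utility`, `REQUIRED` and the `ex:` links are hypothetical, standing in for a single matching rule that fires on two entity pairs:

```python
from typing import Callable, Dict, FrozenSet, Hashable, List

Link = Hashable
# U(D, M): utility of a dataspace given confirmed links D and open candidates M
Utility = Callable[[FrozenSet[Link], FrozenSet[Link]], float]


def vpi(link: Link, p: float, confirmed: FrozenSet[Link],
        candidates: FrozenSet[Link], utility: Utility) -> float:
    """Expected utility after feedback on `link`, minus the current utility."""
    remaining = candidates - {link}                       # m_k leaves M either way
    u_confirmed = utility(confirmed | {link}, remaining)  # user confirms the link
    u_rejected = utility(confirmed, remaining)            # user rejects the link
    return u_confirmed * p + u_rejected * (1 - p) - utility(confirmed, candidates)


def rank_by_vpi(probabilities: Dict[Link, float], confirmed: FrozenSet[Link],
                utility: Utility) -> List[Link]:
    """Order candidate links so the highest expected gain is asked first."""
    candidates = frozenset(probabilities)
    scores = {m: vpi(m, p, confirmed, candidates, utility)
              for m, p in probabilities.items()}
    return sorted(scores, key=lambda m: (-scores[m], str(m)))


# Hypothetical toy setting: one matching rule fires on these two entity pairs.
REQUIRED = frozenset({("ex:a", "ex:b"), ("ex:c", "ex:d")})


def toy_utility(confirmed: FrozenSet[Link], _candidates: FrozenSet[Link]) -> float:
    return len(confirmed & REQUIRED) / len(REQUIRED)


probs = {("ex:a", "ex:b"): 0.9, ("ex:c", "ex:d"): 0.5, ("ex:a", "ex:c"): 0.95}
print(rank_by_vpi(probs, frozenset(), toy_utility))
# [('ex:a', 'ex:b'), ('ex:c', 'ex:d'), ('ex:a', 'ex:c')]
```

A confidence-only ordering would ask about `("ex:a", "ex:c")` first, although confirming it does not help satisfy the rule; the VPI score instead favours links that the matching rules actually depend on.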
  • 13. Evaluation
    - Measure the change in dataspace utility, according to the matching rules, after a given number of feedback iterations
    - Candidate links generated by the Silk framework
  • 14. Evaluation
    Datasets         | IIMB 2009 Dataset                | UCI-Adult Dataset               | Drug Dataset
    Data Source      | Instance Matching Benchmark 2009 | UCI Machine Learning Repository | Instance Matching Benchmark 2010
    Data Collection  | IIMB 2009                        | US Census Dataset               | DrugBank and Sider Datasets
    Description      | Reference ontology; ontology #16 | Manually created duplicates and | Interlinking between two datasets
                     | with errors in data values       | errors of attributes            | of the same domain
    Entity Types     | imdb:Movie                       | foaf:Person                     | drugbank:drugs, sider:drugs
    Total Triples    | 291                              | 64000                           | 14348
    Total Entity IDs | 44                               | 4000                            | 5696
    Total Attributes | 9                                | 16                              | 3
    Total Values     | 130                              | 10878                           | 8473
    Candidate Links  | 81                               | 72                              | 94
    Correct Links    | 22                               | 72                              | 66
  • 15. Evaluation
    [Figure: Dataspace utility improvement (0-100%) vs. feedback iteration (0-100%) for the IIMB 2009 and UCI-Adult datasets, comparing the VPI_RULES, CONFIDENCE and RANDOM ranking strategies]
  • 16. Conclusion
    - Matching dependencies provide an effective mechanism to:
      - Represent entity matching rules
      - Specify domain-specific semantic requirements
      - Measure the utility of dataspaces
    - Value of perfect information enables an effective ranking strategy for user feedback
    - In the three datasets, 100% utility improvement was reached with less than 40% of user feedback
  • 17. Future Work
    - Expand to other data quality problems
    - Expand the types of dependencies, such as comparable dependencies and order dependencies
    - Allow multi-user feedback for collaborative data cleaning

Speaker notes

  1. Personal background
  2. Executive summary vs. overview
  3. Executive summary vs. overview
  4. The complete stack of semantic web technologies is based on open standards and protocols. The semantic web technologies focus on the application layer of the internet stack.
  5. Go back to the research question slides. Go back to the workflow and highlight what's needed.
  6. Emphasize blended. Reference SIGMOD.