Crowdsourcing tasks in Linked Data management
 Elena Simperl (1), Barry Norton (2), Denny Vrandecic (1)
 (1) Institute AIFB, Karlsruhe Institute of Technology, Germany
 (2) Ontotext AD, Bulgaria
 Institute of Applied Informatics and Formal Description Methods (AIFB)




 KIT – University of the State of Baden-Wuerttemberg and
 National Research Center of the Helmholtz Association                    www.kit.edu
Motivation

        Various aspects of Linked Data management
       naturally rely on human intelligence to yield
       optimal results
        But reaching a critical mass of useful contributions
       from all relevant stakeholders is still more an art
       than an engineering exercise




2   23.10.2011   Seminar - Die Rolle von Ontologien in Linked Data – Kickoff   Institut für Angewandte Informatik und Formale
                                                                                                Beschreibungsverfahren (AIFB)
Microtask platforms



        Define task  ->  Break task into smaller units  ->  Evaluate the results




Approach
        Formal, declarative description of the data and tasks
        using SPARQL patterns as a basis for the automatic
        design of HITs (Human Intelligence Tasks)

        Integral part of Linked Data tools and applications
                 At design time, the application developer specifies which portions
                 of the data workers can process and via which types of HITs
                 At run time
                      The system materializes the data
                      Workers process it
                      Data and application are updated to reflect crowdsourcing results
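
The design-time / run-time split above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' system: the triple store is a plain list, `match` is a hypothetical stand-in for a SPARQL engine that handles basic triple patterns only, and the task specification and HIT dictionaries use an invented layout. All `ex:` identifiers and labels are made up.

```python
# Hypothetical in-memory data; only metar:Station is taken from the slides.
TRIPLES = [
    ("ex:s1", "rdf:type", "metar:Station"),
    ("ex:s1", "rdfs:label", "Station Alpha"),
    ("ex:s2", "rdf:type", "metar:Station"),
    ("ex:s2", "rdfs:label", "Station Beta"),
    ("ex:a1", "rdf:type", "dbp-owl:Airport"),
]


def match(patterns, triples, binding=None):
    """Return every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            results.extend(match(rest, triples, candidate))
    return results


# Design time: the developer declares which data workers see and the HIT type.
TASK_SPEC = {
    "hit_type": "classification",
    "input": [("?station", "rdf:type", "metar:Station"),
              ("?station", "rdfs:label", "?label")],
    "question": "Which kind of station is '{label}'?",
}


def materialize_hits(spec, triples):
    """Run time, step 1: turn each matching binding into one HIT."""
    hits = []
    for b in match(spec["input"], triples):
        fields = {name.lstrip("?"): value for name, value in b.items()}
        hits.append({"type": spec["hit_type"], "binding": b,
                     "question": spec["question"].format(**fields)})
    return hits


def apply_result(hit, answer, triples):
    """Run time, step 3: write a worker's answer back into the data."""
    triples.append((hit["binding"]["?station"], "rdf:type", answer))


hits = materialize_hits(TASK_SPEC, TRIPLES)
apply_result(hits[0], "ex:AirportStation", TRIPLES)  # simulated worker answer
```

Keeping the specification as data (rather than code) is what would allow a tool to generate the HITs automatically from the pattern.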


Examples of Linked Data tasks amenable to crowdsourcing

         Identity resolution
         Metadata completion and checking/correction
         Classification
         Ordering
           Quantitative
           Qualitative
         Translation




Running Example




Identity resolution

    Identity Resolution “involves the creation of sameAs
    links, either by comparison of metadata or by
    investigation of links on the human Web.”
    Input: {?station a metar:Station;
                      rdfs:label ?slabel;
                      wgs84:lat ?slat;
                      wgs84:long ?slong .
             ?airport a dbp-owl:Airport;
                      rdfs:label ?alabel;
                      wgs84:lat ?alat;
                      wgs84:long ?along}
    Output: {OPTIONAL
             {?airport owl:sameAs ?station}}
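
The input pattern above pairs every station with every airport, which could produce far more HITs than workers can reasonably answer. One possible way to narrow this down, sketched below, is to show workers only pairs that lie close together. The distance filter, the 5 km threshold, and all identifiers and coordinates are assumptions made for illustration; the slides do not prescribe how candidate pairs are selected.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


# Invented sample bindings for ?station / ?airport with their lat/long values.
stations = [{"id": "ex:station1", "lat": 48.780, "long": 8.080}]
airports = [
    {"id": "ex:airportA", "lat": 48.779, "long": 8.081},  # a few hundred metres away
    {"id": "ex:airportB", "lat": 42.700, "long": 23.410},  # far away
]


def candidate_pairs(stations, airports, max_km=5.0):
    """Keep only (station, airport) pairs close enough to be worth a HIT."""
    pairs = []
    for s in stations:
        for a in airports:
            if haversine_km(s["lat"], s["long"], a["lat"], a["long"]) <= max_km:
                pairs.append((s["id"], a["id"]))
    return pairs


def same_as_triples(pairs, answers):
    """The output is OPTIONAL: only confirmed pairs yield an owl:sameAs triple."""
    return [(a, "owl:sameAs", s) for (s, a) in pairs if answers.get((s, a))]


pairs = candidate_pairs(stations, airports)
links = same_as_triples(pairs, {("ex:station1", "ex:airportA"): True})
```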



Metadata completion & correction

    “Certain properties, necessary for a given query,
    may not be uniformly populated. Manually conducted
    research might be necessary to transfer this
    information from the human-readable Web”
     Input: {?station a metar:Station;
                      rdfs:label ?label;
                      wgs84:lat ?lat;
                      wgs84:long ?long;
                      dbp:icao ?badicao}


     Output: {?station dbp:icao ?goodicao}
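
A rough sketch of how such a correction task might be wrapped: an automated check flags values of `dbp:icao` that do not look like an ICAO code, and several workers' answers are combined into one `?goodicao`. Both steps are assumptions for illustration. The check assumes the usual four-letter form of ICAO location indicators, and majority voting is just one common aggregation choice, not something the slides specify.

```python
import re
from collections import Counter

# Assumption: a plausible ICAO code is exactly four upper-case letters.
ICAO_RE = re.compile(r"[A-Z]{4}")


def needs_review(code):
    """Flag values that cannot be a well-formed ICAO code."""
    return ICAO_RE.fullmatch(code) is None


def majority(answers, min_share=0.5):
    """Return the most frequent answer if it has more than min_share of the votes."""
    if not answers:
        return None
    value, count = Counter(answers).most_common(1)[0]
    return value if count / len(answers) > min_share else None


# Simulated worker answers for one station whose stored code was flagged.
worker_answers = ["LBSF", "LBSF", "LBSG"]
good_icao = majority(worker_answers)
```

When no answer reaches the required share, `majority` returns `None`, which a caller could treat as a signal to collect more answers.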




Classification

    “Linked Data emphasis[es…] relationships between
    resources [over classification]. [D]ue to the promoted
    use of generic vocabularies, it is not always possible
    to infer classification from […] properties”
    Input: {?station a metar:Station;
                     rdfs:label ?label;
                     wgs84:lat ?lat;
                     wgs84:long ?long}



    Output: {?station a ?type.
             ?type rdfs:subClassOf metar:Station}
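
The output pattern doubles as a constraint on acceptable answers: the chosen `?type` has to be a subclass of `metar:Station`. A minimal sketch of using it to build the answer options of a multiple-choice HIT and to reject invalid answers follows. The class hierarchy is invented, each class is assumed to have a single parent, and the subclass chain is followed transitively, as RDFS entailment would allow.

```python
# Hypothetical hierarchy (child -> parent); only metar:Station is from the slides.
SUBCLASS_OF = {
    "ex:AirportStation": "metar:Station",
    "ex:MilitaryAirportStation": "ex:AirportStation",
    "ex:Buoy": "ex:Sensor",
}


def is_subclass(cls, ancestor, subclass_of):
    """Follow child -> parent links upwards and report whether ancestor is reached."""
    seen = set()
    while cls in subclass_of and cls not in seen:
        seen.add(cls)  # guards against cycles in the hierarchy
        cls = subclass_of[cls]
        if cls == ancestor:
            return True
    return False


def answer_options(root, subclass_of):
    """All classes a worker may choose from for this HIT."""
    return sorted(c for c in subclass_of if is_subclass(c, root, subclass_of))


def accept(answer, root, subclass_of):
    """Reject answers that would violate the output pattern."""
    return answer in answer_options(root, subclass_of)


options = answer_options("metar:Station", SUBCLASS_OF)
```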


Ordering

     “Having means to rank Linked Data content along
     specific dimensions is typically deemed useful for
     querying and browsing […both] “specific” ordering
     [(e.g. timestamps) … and] orderings […] via
     “less straightforward” built-ins [(e.g. pref/alt labels)]”

     Ordering types: quantitative, qualitative

 Input: {?station foaf:depiction ?x, ?y}




 Output: {{(?x ?y) a rdf:List}
          UNION {(?y ?x) a rdf:List}}
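
Each HIT of this kind yields one pairwise judgment: the worker returns either the list (?x ?y) or the list (?y ?x). To rank more than two depictions, many such judgments have to be combined. The sketch below does this by counting how often each item was preferred. This counting heuristic is an assumption chosen for brevity, not a method named in the slides, and the image identifiers are invented.

```python
from collections import Counter


def rank_from_pairs(judgments):
    """Rank items by number of pairwise wins.

    judgments: list of (preferred, other) tuples, one per worker answer.
    Ties are broken alphabetically so the result is deterministic.
    """
    wins = Counter()
    items = set()
    for better, worse in judgments:
        wins[better] += 1
        items.update((better, worse))
    return sorted(items, key=lambda item: (-wins[item], item))


# Simulated answers from four HITs over three depictions of one station.
judgments = [
    ("ex:img1", "ex:img2"),
    ("ex:img1", "ex:img3"),
    ("ex:img2", "ex:img3"),
    ("ex:img1", "ex:img2"),
]
ranking = rank_from_pairs(judgments)
```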



Translation

     “[An important] aspect of the labeling of resources for
     humans is multi-linguality […] actual provision of labels
     in non-English languages is currently rather low”


     Input: {?station rdfs:label ?enlabel.
             FILTER (LANGMATCHES(LANG(?enlabel), "en"))}




     Output: {?station rdfs:label ?bglabel.
              FILTER (LANGMATCHES(LANG(?bglabel), "bg"))}
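
Translation HITs are only needed where the target-language label is missing. The following sketch finds resources that have an English label but no Bulgarian one. The labels are stored as plain (subject, text, language tag) tuples, and all sample data is invented for illustration.

```python
# Invented sample labels: (subject, text, language tag).
labels = [
    ("ex:s1", "Sofia Airport", "en"),
    ("ex:s1", "Летище София", "bg"),
    ("ex:s2", "Varna Airport", "en"),
]


def missing_translations(labels, source="en", target="bg"):
    """Return (subject, source text) pairs that still need a target-language label.

    Language tags are compared case-insensitively.
    """
    have_target = {s for s, _, lang in labels if lang.lower() == target}
    return [(s, text) for s, text, lang in labels
            if lang.lower() == source and s not in have_target]


todo = missing_translations(labels)
```

Here only `ex:s2` would be turned into a translation HIT, since `ex:s1` already carries a Bulgarian label.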




Open query answering

          Query a FOAF-file using the vCard vocabulary

     hp:Harry foaf:mbox <mailto:scarface@hogwarts.ac.uk> ;
        foaf:nick "Harry" ; foaf:familyName "Potter" .


     SELECT ?name ?email WHERE
     { ?p vcard:email ?email ; vcard:fn ?name }



          In order to answer the query as intended
                  Vocabulary mapping and entity resolution (foaf to vcard)
                  Metadata completion (full name is Harry Potter)
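
To make the two steps concrete, here is a small sketch over the Harry Potter data from this slide. The mapping `foaf:mbox` to `vcard:email` and the completed full name are treated as results of crowdsourced HITs and are simply hard-coded. The `answer` function is a hypothetical stand-in for evaluating the query and assumes at most one e-mail and one name per subject.

```python
DATA = [
    ("hp:Harry", "foaf:mbox", "mailto:scarface@hogwarts.ac.uk"),
    ("hp:Harry", "foaf:nick", "Harry"),
    ("hp:Harry", "foaf:familyName", "Potter"),
]

# Assumed outcome of a vocabulary-mapping HIT.
MAPPING = {"foaf:mbox": "vcard:email"}


def apply_mapping(triples, mapping):
    """Rewrite predicates according to the vocabulary mapping."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


def answer(triples):
    """Evaluate { ?p vcard:email ?email ; vcard:fn ?name } over a triple list."""
    emails = {s: o for s, p, o in triples if p == "vcard:email"}
    names = {s: o for s, p, o in triples if p == "vcard:fn"}
    return [(names[s], emails[s]) for s in emails if s in names]


mapped = apply_mapping(DATA, MAPPING)
before = answer(mapped)  # still empty: no vcard:fn value exists yet

# Assumed outcome of a metadata-completion HIT.
completed = mapped + [("hp:Harry", "vcard:fn", "Harry Potter")]
after = answer(completed)
```

The query returns nothing on the original data, nothing after the mapping alone, and one row once the full name has been completed, which is why both task types are needed.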
Limitations of microtask crowdsourcing

          Decomposability
          Verifiability
          Expertise

         Compositions to deal with tasks with
        underspecified workflow and/or multiple correct
        answers




Challenges
          Decomposition of user-visible queries:
                  SPARQL
                       Easy: Low quality (meta)data can be subject to automated
                       checking (even if not fixing)
                       Medium: Missing data (and translation) can be automatically
                       identified (but knowing to which dataset it should belong is not
                       necessarily clear)
                       Difficult:
                            Interlinking (at least sameAs) is somewhat implicit (using
                            entailment), and knowing where the user expects it is hard
                            Query optimisation obscures which data is used, and it should
                            take the cost of human tasks into account
                  Pig might be somewhat easier in the latter regard
          Caching
                  Naively we can materialise HIT results into datasets
                  How to deal with partial coverage and dynamic datasets?
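
One naive way to approach the caching question is sketched below: HIT results are stored under a key derived from the task and the exact input binding, so a result is reused only while the underlying data is unchanged. This is an illustrative assumption rather than a proposal from the slides. The class name, the key scheme, and the sample data are all invented, and the sketch does not address partial coverage.

```python
import hashlib
import json


class HitCache:
    """Cache of HIT results keyed by task name and input binding."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(task, binding):
        # Sorting the items makes the key independent of dictionary order.
        payload = json.dumps([task, sorted(binding.items())], ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, task, binding):
        """Return the cached result, or None if this exact input was never processed."""
        return self._store.get(self.key(task, binding))

    def put(self, task, binding, result):
        self._store[self.key(task, binding)] = result


cache = HitCache()
binding = {"?station": "ex:s1", "?enlabel": "Sofia Airport"}
cache.put("translate-bg", binding, "Летище София")

hit = cache.get("translate-bg", binding)
# If the dataset changes (here: a new English label), the old result is not reused.
miss = cache.get("translate-bg", {"?station": "ex:s1", "?enlabel": "Sofia Intl. Airport"})
```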
Further Challenges

         Appropriate level of granularity for HIT design for
        specific SPARQL constructs and typical
        functionality of Linked Data management
        components
         Optimal user interfaces for graph-like content
                  (Contextual) Rendering of LOD entities and tasks
          Pricing and workers’ assignment
                  Can we connect the end-users of an application and
                  their wish for specific data to be consumed with the
                  payment of workers and prioritization of HITs?
                  Dealing with spam / gaming
QUESTIONS




Crowdsourcing tasks in Linked Data management

  • 1. Crowdsourcing tasks in Linked Data management. Elena Simperl (1), Barry Norton (2), Denny Vrandecic (1). (1) Institute AIFB, Karlsruhe Institute of Technology, Germany; (2) Ontotext AD, Bulgaria. Institute of Applied Informatics and Formal Description Methods (AIFB). KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association. www.kit.edu
  • 2. Motivation: Various aspects of Linked Data management naturally rely on human intelligence to yield optimal results. But reaching a critical mass of useful contributions from all relevant stakeholders is still more an art than an engineering exercise. (Slide footer: 23.10.2011, Seminar "The Role of Ontologies in Linked Data", Kickoff, Institute of Applied Informatics and Formal Description Methods (AIFB).)
  • 3. Microtask platforms: Define task → Break task into smaller units → Evaluate the results.
  • 4. Approach: Formal, declarative description of the data and tasks using SPARQL patterns as a basis for the automatic design of HITs. Integral part of Linked Data tools and applications. At design time, the application developer specifies which data portions workers can process and via which types of HITs. At run time: (a) the system materializes the data; (b) workers process it; (c) data and application are updated to reflect crowdsourcing results.
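The design-time/run-time split on this slide can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the `HITSpec` structure, the `materialize` and `apply_answer` helpers, the question wording, and the example station and codes are all hypothetical, and the list of bindings stands in for the solutions a SPARQL endpoint would return for the input pattern.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class HITSpec:
    """Design-time description of one HIT type (hypothetical structure)."""
    name: str
    input_pattern: str    # SPARQL graph pattern selecting the data shown to workers
    output_template: str  # triple template filled in from a worker's answer
    question: str         # text shown to the worker, with $variable placeholders


def materialize(spec, bindings):
    """Run time (a): turn each solution (a dict of variable bindings) into one HIT."""
    return [
        {"spec": spec.name, "binding": b, "text": Template(spec.question).substitute(b)}
        for b in bindings
    ]


def apply_answer(spec, hit, answer):
    """Run time (c): turn a worker's answer into a triple for the dataset."""
    values = dict(hit["binding"], answer=answer)
    return Template(spec.output_template).substitute(values)


# Design time: the developer declares which data workers see and what they produce.
spec = HITSpec(
    name="icao-correction",
    input_pattern="{?station a metar:Station; rdfs:label ?label; dbp:icao ?badicao}",
    output_template='<$station> dbp:icao "$answer" .',
    question="Station '$label' is listed with ICAO code '$badicao'. What is the correct code?",
)

# Made-up solution standing in for a query result.
bindings = [{"station": "http://example.org/station/1", "label": "Example Station", "badicao": "ABCD"}]

hits = materialize(spec, bindings)
triple = apply_answer(spec, hits[0], "ABCE")  # run time (b) is the worker typing "ABCE"
print(hits[0]["text"])
print(triple)
```

Keeping the input pattern and output template together in one declarative record is what would let HITs be generated automatically from the data, rather than hand-written per task.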
  • 5. Examples of Linked Data tasks amenable to crowdsourcing: identity resolution; metadata completion and checking/correction; classification; ordering (quantitative, qualitative); translation.
  • 6. Running Example
  • 7. Identity resolution: Identity Resolution “involves the creation of sameAs links, either by comparison of metadata or by investigation of links on the human Web.” Input: {?station a metar:Station; rdfs:label ?slabel; wgs84:lat ?slat; wgs84:long ?slong . ?airport a dbp-owl:Airport; rdfs:label ?alabel; wgs84:lat ?alat; wgs84:long ?along} Output: {OPTIONAL {?airport owl:sameAs ?station}}
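Taken literally, the input pattern above pairs every station with every airport, which would yield far too many HITs. One possible way to narrow this down, offered here purely as an assumption and not something the slides describe, is to pre-filter candidate pairs by geographic distance computed from the `wgs84:lat`/`wgs84:long` values, and only ask workers about pairs that are close together. The function names, the 5 km threshold, and the sample coordinates are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def candidate_pairs(stations, airports, max_km=5.0):
    """Keep only (station, airport) pairs close enough to be worth a sameAs HIT.

    Each item is a dict with 'uri', 'label', 'lat', 'long' keys (assumed shape).
    """
    return [
        (s["uri"], a["uri"])
        for s in stations
        for a in airports
        if haversine_km(s["lat"], s["long"], a["lat"], a["long"]) <= max_km
    ]


# Made-up sample data for illustration.
stations = [{"uri": "ex:station1", "label": "Station One", "lat": 10.0, "long": 20.0}]
airports = [
    {"uri": "ex:airportA", "label": "Airport A", "lat": 10.01, "long": 20.01},
    {"uri": "ex:airportB", "label": "Airport B", "lat": 40.0, "long": -3.0},
]
pairs = candidate_pairs(stations, airports)
print(pairs)
```

Each surviving pair would then become one HIT asking whether the two labels denote the same place; the worker's answer decides whether the optional `owl:sameAs` triple is emitted.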
  • 8. Metadata completion & correction: “Certain properties, necessary for a given query, may not be uniformly populated. Manually conducted research might be necessary to transfer this information from the human-readable Web” Input: {?station a metar:Station; rdfs:label ?label; wgs84:lat ?lat; wgs84:long ?long; dbp:icao ?badicao} Output: {?station dbp:icao ?goodicao}
  • 9. Classification: “Linked Data emphasis[es…] relationships between resources [over classification]. [D]ue to the promoted use of generic vocabularies, it is not always possible to infer classification from […] properties” Input: {?station a metar:Station; rdfs:label ?label; wgs84:lat ?lat; wgs84:long ?long} Output: {?station a ?type. ?type rdfs:subClassOf metar:Station}
  • 10. Ordering: “Having means to rank Linked Data content along specific dimensions is typically deemed useful for quantitative querying and browsing […both] “specific” ordering [(e.g. timestamps) … and] orderings […] via qualitative “less straightforward” built-ins [(e.g. pref/alt labels)]” Input: {?station foaf:depiction ?x, ?y} Output: {{(?x ?y) a rdf:List} UNION {(?y ?x) a rdf:List}}
  • 11. Translation: “[An important] aspect of the labeling of resources for humans is multi-linguality […] actual provision of labels in non-English languages is currently rather low” Input: {?station rdfs:label ?enlabel. FILTER (LANGMATCHES(LANG(?enlabel), "en"))} Output: {?station rdfs:label ?bglabel. FILTER (LANGMATCHES(LANG(?bglabel), "bg"))}
  • 12. Open query answering: Query a FOAF file using the vCard vocabulary. Data: hp:Harry foaf:mbox <mailto:scarface@hogwarts.ac.uk> ; foaf:nick "Harry" ; foaf:familyName "Potter" . Query: SELECT ?name ?email WHERE { ?p vcard:email ?email ; vcard:fn ?name } In order to answer the query as intended: vocabulary mapping and entity resolution (foaf to vcard); metadata completion (full name is Harry Potter).
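A small in-memory sketch can show why the query on this slide returns nothing on the raw FOAF data, and what the two crowdsourced steps would add. The triples are the ones on the slide; the mapping `foaf:mbox` → `vcard:email` and the completed `vcard:fn` value are assumed crowd outputs chosen for illustration, and `select_name_email` and `enrich` are hypothetical helpers, not a real SPARQL engine.

```python
# Triples from the slide's FOAF description, as (subject, predicate, object).
data = {
    ("hp:Harry", "foaf:mbox", "mailto:scarface@hogwarts.ac.uk"),
    ("hp:Harry", "foaf:nick", "Harry"),
    ("hp:Harry", "foaf:familyName", "Potter"),
}


def select_name_email(triples):
    """Evaluate { ?p vcard:email ?email ; vcard:fn ?name } over a set of triples."""
    emails = {(s, o) for s, p, o in triples if p == "vcard:email"}
    names = {(s, o) for s, p, o in triples if p == "vcard:fn"}
    return sorted((name, email) for s, email in emails for t, name in names if s == t)


def enrich(triples, mapping, completed):
    """Add triples implied by a predicate mapping, plus crowd-completed triples."""
    mapped = {(s, mapping[p], o) for s, p, o in triples if p in mapping}
    return triples | mapped | completed


# Assumed crowd output 1: a vocabulary mapping from FOAF to vCard.
mapping = {"foaf:mbox": "vcard:email"}
# Assumed crowd output 2: metadata completion of the full name.
completed = {("hp:Harry", "vcard:fn", "Harry Potter")}

before = select_name_email(data)
after = select_name_email(enrich(data, mapping, completed))
print(before)  # []
print(after)   # [('Harry Potter', 'mailto:scarface@hogwarts.ac.uk')]
```

Neither step alone is enough: the mapping supplies the email binding and the completion supplies the name, and the query needs both for the same subject.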
  • 13. Limitations of microtask crowdsourcing: decomposability; verifiability; expertise. Compositions to deal with tasks with underspecified workflow and/or multiple correct answers.
  • 14. Challenges. Decomposition of user-visible queries (SPARQL). Easy: low-quality (meta)data can be subject to automated checking (even if not fixing). Medium: missing data (and translation) can be automatically identified (but knowing to which dataset it should belong is not necessarily clear). Difficult: interlinking (at least sameAs) is somewhat implicit (using entailment) and knowing where user expects. Query optimisation obfuscates what is used and should involve costs for human tasks; Pig might be somewhat easier in the latter regard. Caching: naively, we can materialise HIT results into datasets. How to deal with partial coverage and dynamic datasets?
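The caching question on this slide can be made concrete with a minimal sketch. It assumes, as one possible design and not the authors', that answers are keyed by HIT type plus the full input binding: bindings with a cached answer are reused, uncovered ones (partial coverage) go back to the crowd, and a binding whose data changed (dynamic dataset) no longer matches its old key and is treated as pending. The class and method names and the sample bindings are hypothetical.

```python
class HITCache:
    """Stores crowd answers keyed by (HIT type, input binding)."""

    def __init__(self):
        self._answers = {}

    @staticmethod
    def _key(hit_type, binding):
        # Sort items so the key does not depend on dict insertion order.
        return (hit_type, tuple(sorted(binding.items())))

    def store(self, hit_type, binding, answer):
        self._answers[self._key(hit_type, binding)] = answer

    def split(self, hit_type, bindings):
        """Return (answered, pending): cached answers and bindings still needing a HIT."""
        answered, pending = [], []
        for b in bindings:
            key = self._key(hit_type, b)
            if key in self._answers:
                answered.append((b, self._answers[key]))
            else:
                pending.append(b)
        return answered, pending


cache = HITCache()
cache.store("translation", {"station": "ex:s1", "enlabel": "Airport"}, "BG label for s1")

current = [
    {"station": "ex:s1", "enlabel": "Airport"},            # already answered
    {"station": "ex:s2", "enlabel": "Harbour"},            # never sent to workers
    {"station": "ex:s1", "enlabel": "Airport (renamed)"},  # source data changed
]
answered, pending = cache.split("translation", current)
print(len(answered), len(pending))  # 1 2
```

Keying on the whole binding is conservative: any change to the input invalidates the cached answer, which costs extra HITs but avoids serving answers given for stale data.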
  • 15. Further Challenges. Appropriate level of granularity for HIT design for specific SPARQL constructs and typical functionality of Linked Data management components. Optimal user interfaces for graph-like content: (contextual) rendering of LOD entities and tasks. Pricing and workers’ assignment: can we connect the end-users of an application and their wish for specific data to be consumed with the payment of workers and prioritization of HITs? Dealing with spam / gaming.
  • 16. QUESTIONS