Query Expansion Methods and
                    Performance Evaluation
                              for
                Reusing Linking Open Data of the
              European Public Procurement Notices

                               José María Álvarez Rodríguez
                                   WESO-Universidad de Oviedo
                                   http://purl.org/weso/moldeas/
                      Tecnologías de Linked Data y sus aplicaciones en España (TLDE)
                                       CAEPIA 2011-Tenerife (Spain)
                                          8th of November, 2011

Code: TSI-020100-2010-919
Overview
 Use case & Context

SPARQL & Performance

     Next Steps
Objective




Creation of a pan-European
 e-procurement platform
E-procurement
   Long Tail
 TED
 BOE (official bulletin of the Spanish Government)
 BOPA (official bulletin of the Asturian Government)
To be able to answer…



     Which public procurement notices are
relevant to Dutch companies (only SMEs) that
  want to tender for contracts announced by
 local authorities with a total value lower than
 170K € to procure “Road bridge construction
  work” and a two year duration in the Dutch-
    speaking region of Flanders (Belgium)?
Structuring public procurement notices
 Providing new semantic-based services
 LOD enrichment
 Easing the access to the published data using the LOD approach
 Transforming government classifications
Preliminary Results
[Diagram: labels not recoverable from the extracted text]
Semantic-based
         Services


      Problem of
«Query Expansion»
depending on the kind of
  information variable
Methods of «Query Expansion»
[Diagram: labels not recoverable from the extracted text]
Remembering…



     Which public procurement notices are
relevant to Dutch companies (only SMEs) that
  want to tender for contracts announced by
 local authorities with a total value lower than
 170K € to procure “Road bridge construction
  work” and a two year duration in the Dutch-
    speaking region of Flanders (Belgium)?
Query…
 cpv:45221111-3
 NL
 ppn:nutsCode
 ppn:hasDuration
 cpv:CodeIn2008
 ppn:hasAmount
 org:classification
Applying Query Expansion…
 cpv:45221111-3
 NL
 ppn:nutsCode
 ppn:hasDuration
 cpv:CodeIn2008
 ppn:hasAmount
 org:classification
Example of SPARQL query
# Prefix declarations omitted. The range operators (>=, <=, &&) are
# reconstructed; they were lost in the extracted slide text.
SELECT DISTINCT * WHERE {
   ?ppn       rdf:type          <http://purl.org/weso/ppn/def#ppn> .
   ?ppn       ppn:nutsCode      ?nutsCode .
   ?ppn       cpv:codeIn2008    ?cpvCode .
   ?ppn       ppn:hasDuration   ?duration .
   ?ppn       dc:identifier     ?id .
   ?ppn       dc:date           ?date .
   ?ppn       ppn:hasAmount     ?amount .
   FILTER(?cpvCode = cpv:45221111-3 ... )
   FILTER (
       (xsd:double(?amount) >= xsd:long(170000)) &&
       (xsd:double(?amount) <= xsd:long(200000)) )
   FILTER(?nutsCode = nuts:B3 ... )
   FILTER (
       (xsd:long(?duration) >= xsd:long(2)) &&
       (xsd:long(?duration) <= xsd:long(3)) )
}
Context

Performance of SPARQL
       Queries

     ~30 sec.
Hardware and Software
 DELL PC, 2 GB RAM and 30 GB hard disk
 VirtualBox (version 4.0.6)
 Linux 2.6.35-22-server #33-Ubuntu 2 SMP x86_64 GNU/Linux
 Ubuntu 10.10
 OpenLink Virtuoso Opensource-6-20110218
Question?
How can query execution time
 be reduced without modifying
the hardware or using any
 vendor-specific feature?
Triple store
    25 graphs
20 M RDF triples
       But…
     8 graphs
11 M RDF triples
Focus on..
The generation of SPARQL
         queries
Let’s start…


9 SPARQL Queries

  3 executions
[Table of test configurations (15 rows); contents not recoverable from the extracted text]
Simple SPARQL query

SELECT DISTINCT * WHERE {
   ?ppn    rdf:type         <http://purl.org/weso/ppn/def#ppn> .
   ?ppn    ppn:nutsCode     ?nutsCode .
   ?ppn    cpv:codeIn2008   ?cpvCode .
   ?ppn    ppn:hasDuration  ?duration .
   ?ppn    dc:identifier    ?id .
   ?ppn    dc:date          ?date .
   ?ppn    ppn:hasAmount    ?amount .
   FILTER(?cpvCode = cpv:15331137)
   FILTER(?nutsCode = nuts:UK)
}
Simple Query

    1 CPV Code
   1 NUTS Code


Time: ~3.29 sec.
T1

Rewrite SPARQL queries:
Match triples from specific to
           general
  Filter as soon as possible
T2

Use the LIMIT clause

 Value set to 10,000
Rewritten SPARQL query

SELECT DISTINCT * WHERE {
   ?ppn    rdf:type         <http://purl.org/weso/ppn/def#ppn> .
   ?ppn    cpv:codeIn2008   ?cpvCode .
   FILTER(?cpvCode = cpv:15331137)
   ?ppn    ppn:nutsCode     ?nutsCode .
   FILTER(?nutsCode = nuts:UK)
   ?ppn    ppn:hasDuration  ?duration .
   ?ppn    dc:identifier    ?id .
   ?ppn    dc:date          ?date .
   ?ppn    ppn:hasAmount    ?amount .
}
LIMIT 10000
Results T2

    1 CPV Code
   1 NUTS Code



Time: ~3.26 sec.
Evaluation

  There are no significant
changes in execution time
       or gain…
           and
    we are interested in
    “enhanced queries”
T3

Execution of enhanced
       queries
Enhanced SPARQL
            query
SELECT DISTINCT * WHERE {
   ?ppn    rdf:type         <http://purl.org/weso/ppn/def#ppn> .
   ?ppn    ppn:nutsCode     ?nutsCode .
   ?ppn    cpv:codeIn2008   ?cpvCode .
   ?ppn    ppn:hasDuration  ?duration .
   ?ppn    dc:identifier    ?id .
   ?ppn    dc:date          ?date .
   ?ppn    ppn:hasAmount    ?amount .
   FILTER(?cpvCode = cpv:15331137 || ?cpvCode = cpv:48611000 ||
          ?cpvCode = cpv:48611000 || ?cpvCode = cpv:50531510 ||
          ?cpvCode = cpv:15871210)
   FILTER(?nutsCode = nuts:B3 || ?nutsCode = nuts:PL || ?nutsCode = nuts:RO)
}
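The enhanced query tests ?cpvCode and ?nutsCode against several expanded codes at once. A minimal sketch of how such a query could be generated from the lists produced by query expansion is given here; the function names are hypothetical and not taken from the slides, and prefix declarations are left out as they are on the slides.

```python
def or_filter(variable, terms):
    """Return FILTER(?var = t1 || ?var = t2 ...) for a list of prefixed names."""
    if not terms:
        raise ValueError("at least one term is required")
    tests = " || ".join(f"?{variable} = {term}" for term in terms)
    return f"FILTER({tests})"


def enhanced_query(cpv_codes, nuts_codes):
    """Assemble the enhanced query text (prefix declarations omitted)."""
    lines = [
        "SELECT DISTINCT * WHERE {",
        "  ?ppn rdf:type <http://purl.org/weso/ppn/def#ppn> .",
        "  ?ppn ppn:nutsCode ?nutsCode .",
        "  ?ppn cpv:codeIn2008 ?cpvCode .",
        "  ?ppn ppn:hasDuration ?duration .",
        "  ?ppn dc:identifier ?id .",
        "  ?ppn dc:date ?date .",
        "  ?ppn ppn:hasAmount ?amount .",
        "  " + or_filter("cpvCode", cpv_codes),
        "  " + or_filter("nutsCode", nuts_codes),
        "}",
    ]
    return "\n".join(lines)


# Codes copied from the enhanced-query slide (one CPV code is listed twice there).
CPV_CODES = ["cpv:15331137", "cpv:48611000", "cpv:48611000",
             "cpv:50531510", "cpv:15871210"]
NUTS_CODES = ["nuts:B3", "nuts:PL", "nuts:RO"]

query = enhanced_query(CPV_CODES, NUTS_CODES)
print(query)
```

A chain of `||` tests is used rather than the SPARQL 1.1 `IN` operator, since the deck lists SPARQL 1.1 features only as a further step.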
Results T3

    5 CPV Codes
   3 NUTS Codes
       1 query


Time: ~20.65 sec.
T4

Rewrite SPARQL queries
           +
 Use the LIMIT clause
Results T4 wrt T3

    5 CPV Codes
   3 NUTS Codes
       1 query


Time: ~20.55 sec.
Info

     8 graphs

11 M of RDF Triples
T5

Rewrite SPARQL queries
            +
  Use the LIMIT clause
            +
 Named Graphs (FROM)
Results T5 wrt T3

    5 CPV Codes
   3 NUTS Codes
       1 query


Time: ~20.65 sec.
T6
Rewrite SPARQL queries
             +
  Use the LIMIT clause
             +
 Named Graphs (FROM)
             +
Split into simple queries
Results T6 wrt T3
    5 CPV Codes
   3 NUTS Codes
      4 Graphs
  4 simple queries

Time: ~20.60 sec.
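T6 reports 4 graphs and 4 simple queries, which suggests one query per named graph. The sketch below shows one way to produce such queries with a FROM clause and the LIMIT value from T2. The graph IRIs are hypothetical placeholders (the slides do not name the graphs), and the WHERE body is shortened for illustration.

```python
# Hypothetical graph IRIs: the slides do not name the four graphs.
GRAPHS = [f"http://example.org/graph/{n}" for n in range(1, 5)]

# Shortened graph pattern, for illustration only.
WHERE_BODY = """\
  ?ppn rdf:type <http://purl.org/weso/ppn/def#ppn> .
  ?ppn cpv:codeIn2008 ?cpvCode .
  ?ppn ppn:nutsCode ?nutsCode ."""


def query_for_graph(graph_iri, where_body, limit=10000):
    """Restrict one query to a single graph with FROM and cap it with LIMIT."""
    return (
        "SELECT DISTINCT *\n"
        f"FROM <{graph_iri}>\n"
        "WHERE {\n"
        f"{where_body}\n"
        "}\n"
        f"LIMIT {limit}"
    )


per_graph_queries = [query_for_graph(g, WHERE_BODY) for g in GRAPHS]
print(per_graph_queries[0])
```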
T6-1
        Rewrite SPARQL queries
                    +
          Use the LIMIT clause
                    +
         Named Graphs (FROM)
                    +
Split enhanced query into simple queries
                    +
   Parallelization of query execution
          (ad-hoc map/reduce)
Results T6-1 wrt T3
    5 CPV Codes
   3 NUTS Codes
      4 Graphs
  4 simple queries

Time: ~11.93 sec.
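The slides describe the parallel step only as an "ad-hoc map/reduce". The following is a sketch of that general idea, not the author's implementation: each simple query is run in its own worker thread (map) and the partial results are merged with duplicates removed (reduce). The `fake_execute` function and its data are hypothetical stand-ins for a call to the SPARQL endpoint.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(queries, execute, max_workers=4):
    """Map: run each simple query in a worker thread.
    Reduce: concatenate the partial results and drop duplicate rows."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_results = list(pool.map(execute, queries))  # keeps input order

    merged, seen = [], set()
    for rows in partial_results:
        for row in rows:
            key = tuple(sorted(row.items()))
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


# Stand-in for a real endpoint call (hypothetical data, for illustration only).
FAKE_RESULTS = {
    "query-1": [{"id": "notice-a"}, {"id": "notice-b"}],
    "query-2": [{"id": "notice-b"}, {"id": "notice-c"}],
}


def fake_execute(query):
    return FAKE_RESULTS[query]


rows = run_in_parallel(["query-1", "query-2"], fake_execute)
print(rows)
```

Threads are a reasonable fit here under the assumption that the work is dominated by waiting on the triple store rather than by local computation.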
T7
        Rewrite SPARQL queries
                   +
         Use the LIMIT clause
                   +
Split enhanced query into simple queries
Results T7 wrt T3
   1 CPV Code (5)
    3 NUTS Codes
  5 simple queries


Time: ~15.81 sec.
T7-1
        Rewrite SPARQL queries
                    +
          Use the LIMIT clause
                    +
Split enhanced query into simple queries
                    +
   Parallelization of query execution
          (ad-hoc map/reduce)
Results T7-1 wrt T3
   1 CPV Code (5)
   3 NUTS Codes
  5 simple queries


Time: ~10.55 sec.
T8
Rewrite SPARQL queries
             +
  Use the LIMIT clause
             +
 Named Graphs (FROM)
             +
Split into simple queries
Results T8 wrt T3
   1 CPV Code (5)
   3 NUTS Codes
      4 Graphs
  20 simple queries

Time: ~32.34 sec.
T8-1
        Rewrite SPARQL queries
                    +
          Use the LIMIT clause
                    +
         Named Graphs (FROM)
                    +
Split enhanced query into simple queries
                    +
   Parallelization of query execution
          (ad-hoc map/reduce)
Results T8-1 wrt T3
   1 CPV Code (5)
   3 NUTS Codes
      4 Graphs
  20 simple queries

Time: ~18.45 sec.
T9
        Rewrite SPARQL queries
                    +
          Use the LIMIT clause
                    +
Split enhanced query into simple queries
       (1 CPV code+1 NUTS code)
Results T9 wrt T3
    1 CPV Code (5)
   1 NUTS Code (3)
  15 simple queries


Time: ~22.462 sec.
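In T9 every simple query carries exactly one CPV code and one NUTS code, so the number of queries is the product of the two list sizes: 5 × 3 = 15. With 4 named graphs, as in T10, the same split gives 15 × 4 = 60. The sketch below illustrates the pairing; the function names are hypothetical and the graph pattern is shortened for illustration.

```python
from itertools import product

# Codes copied from the enhanced-query slide (one CPV code is listed twice there).
CPV_CODES = ["cpv:15331137", "cpv:48611000", "cpv:48611000",
             "cpv:50531510", "cpv:15871210"]
NUTS_CODES = ["nuts:B3", "nuts:PL", "nuts:RO"]


def simple_query(cpv_code, nuts_code, limit=10000):
    """One simple query for a single (CPV code, NUTS code) pair."""
    return "\n".join([
        "SELECT DISTINCT * WHERE {",
        "  ?ppn rdf:type <http://purl.org/weso/ppn/def#ppn> .",
        "  ?ppn cpv:codeIn2008 ?cpvCode .",
        f"  FILTER(?cpvCode = {cpv_code})",
        "  ?ppn ppn:nutsCode ?nutsCode .",
        f"  FILTER(?nutsCode = {nuts_code})",
        "}",
        f"LIMIT {limit}",
    ])


def split_by_pair(cpv_codes, nuts_codes):
    """Cartesian product of the two code lists, one query per pair."""
    return [simple_query(c, n) for c, n in product(cpv_codes, nuts_codes)]


queries = split_by_pair(CPV_CODES, NUTS_CODES)
print(len(queries), "simple queries")
```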
T9-1
        Rewrite SPARQL queries
                    +
          Use the LIMIT clause
                    +
Split enhanced query into simple queries
       (1 CPV code+1 NUTS code)
                    +
   Parallelization of query execution
           (ad-hoc map/reduce)
Results T9-1 wrt T3
    1 CPV Code (5)
   1 NUTS Code (3)
  15 simple queries


Time: ~12.77 sec.
T10
 Rewrite SPARQL queries
               +
   Use the LIMIT clause
               +
  Named Graphs (FROM)
               +
  Split into simple queries
(1 CPV code+1 NUTS code)
Results T10 wrt T3
    1 CPV Code (5)
   1 NUTS Code (3)
       4 Graphs
  60 simple queries

Time: ~71.17 sec.
T10-1
        Rewrite SPARQL queries
                    +
          Use the LIMIT clause
                    +
         Named Graphs (FROM)
                    +
Split enhanced query into simple queries
       (1 CPV code+1 NUTS code)
                    +
   Parallelization of query execution
           (ad-hoc map/reduce)
Results T10-1 wrt T3
    1 CPV Code (5)
   1 NUTS Code (3)
       4 Graphs
  60 simple queries

Time: ~35.13 sec.
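Table of Results
(times as reported on the result slides; the other columns of the original table are not recoverable)
    Test            Queries    Time (sec.)
    Simple query    1          ~3.29
    T2              1          ~3.26
    T3              1          ~20.65
    T4              1          ~20.55
    T5              1          ~20.65
    T6              4          ~20.60
    T6-1            4          ~11.93
    T7              5          ~15.81
    T7-1            5          ~10.55
    T8              20         ~32.34
    T8-1            20         ~18.45
    T9              15         ~22.462
    T9-1            15         ~12.77
    T10             60         ~71.17
    T10-1           60         ~35.13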
Discussion
•       The number of queries is a key factor
•       More CPV codes imply a longer execution time
•       Parallelization improves execution time
•       T7-1 is the best configuration in terms of execution
        time (~10.55 sec. against ~20.65 sec. for T3)
    •     Rewrite SPARQL queries
    •     Use the LIMIT clause
    •     Split the enhanced query into simple queries
    •     Parallelize query execution
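The comparison behind these points can be checked with a short calculation. The sketch below takes the times reported on the result slides for the enhanced-query tests and computes each test's speedup relative to T3; the `speedup` helper is a hypothetical name introduced for illustration.

```python
# Times in seconds, copied from the result slides for the enhanced-query tests.
TIMES = {
    "T3": 20.65, "T4": 20.55, "T5": 20.65,
    "T6": 20.60, "T6-1": 11.93,
    "T7": 15.81, "T7-1": 10.55,
    "T8": 32.34, "T8-1": 18.45,
    "T9": 22.462, "T9-1": 12.77,
    "T10": 71.17, "T10-1": 35.13,
}


def speedup(test, baseline="T3"):
    """Baseline time divided by test time; above 1 means faster than the baseline."""
    return TIMES[baseline] / TIMES[test]


fastest = min(TIMES, key=TIMES.get)
for test in TIMES:
    print(f"{test:6s} {TIMES[test]:7.3f} s  x{speedup(test):.2f}")
print("fastest:", fastest)
```

Under these figures T7-1 is about 1.96 times faster than T3, while T8, T9 and T10 without parallelization are slower than T3, which matches the remark that the number of queries is a key factor.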
Further Steps

• Distribute graphs across different nodes
  (HW improvement)
• Use other triple stores
  (SW comparison)
• Add new SPARQL 1.1 features
  (expressiveness improvement)
• Cache queries (SW improvement)
Query Expansion Methods and
                    Performance Evaluation
                              for
                Reusing Linking Open Data of the
              European Public Procurement Notices

                               José María Álvarez Rodríguez
                                   WESO-Universidad de Oviedo
                                   http://purl.org/weso/moldeas/
                      Tecnologías de Linked Data y sus aplicaciones en España (TLDE)
                                       CAEPIA 2011-Tenerife (Spain)
                                          8th of November, 2011

Code: TSI-020100-2010-919
WESO CAEPIA-20111108

More Related Content

Viewers also liked

Status för rättsinformationsprojektet
Status för rättsinformationsprojektetStatus för rättsinformationsprojektet
Status för rättsinformationsprojektet
stafmal
 
Intro til de 5 tjenester
Intro til de 5 tjenesterIntro til de 5 tjenester
Intro til de 5 tjenester
UTH2010
 
Does your website speak Chinese?
Does your website speak Chinese?Does your website speak Chinese?
Does your website speak Chinese?
KenticoCMS
 
A Hinkhouse Design
A Hinkhouse DesignA Hinkhouse Design
A Hinkhouse Design
borracho13
 
Questionnaire results
Questionnaire results Questionnaire results
Questionnaire results
nicolaalalaa
 

Viewers also liked (20)

The rise of group buying sites
The rise of group buying sitesThe rise of group buying sites
The rise of group buying sites
 
Do Something Impossible (made with love by @stratlab)
Do Something Impossible (made with love by @stratlab)Do Something Impossible (made with love by @stratlab)
Do Something Impossible (made with love by @stratlab)
 
Esmoda
EsmodaEsmoda
Esmoda
 
Bringing Agile to universities
Bringing Agile to universitiesBringing Agile to universities
Bringing Agile to universities
 
La llegenda d
La llegenda dLa llegenda d
La llegenda d
 
Status för rättsinformationsprojektet
Status för rättsinformationsprojektetStatus för rättsinformationsprojektet
Status för rättsinformationsprojektet
 
Activism social
Activism socialActivism social
Activism social
 
Geodesic dome structures
Geodesic dome structuresGeodesic dome structures
Geodesic dome structures
 
Health
HealthHealth
Health
 
How to brand and sell online
How to brand and sell onlineHow to brand and sell online
How to brand and sell online
 
「共通点を見つける練習」宮下芳明(明治大学)
「共通点を見つける練習」宮下芳明(明治大学)「共通点を見つける練習」宮下芳明(明治大学)
「共通点を見つける練習」宮下芳明(明治大学)
 
Interaction keynote
Interaction keynoteInteraction keynote
Interaction keynote
 
Computers
ComputersComputers
Computers
 
Intro til de 5 tjenester
Intro til de 5 tjenesterIntro til de 5 tjenester
Intro til de 5 tjenester
 
Lipi
LipiLipi
Lipi
 
Does your website speak Chinese?
Does your website speak Chinese?Does your website speak Chinese?
Does your website speak Chinese?
 
[사회적기업가포럼]사단법인 씨즈 김동훈 청년국장
[사회적기업가포럼]사단법인 씨즈 김동훈 청년국장[사회적기업가포럼]사단법인 씨즈 김동훈 청년국장
[사회적기업가포럼]사단법인 씨즈 김동훈 청년국장
 
A Hinkhouse Design
A Hinkhouse DesignA Hinkhouse Design
A Hinkhouse Design
 
Questionnaire results
Questionnaire results Questionnaire results
Questionnaire results
 
Demokväll med rättsinformationssystemet
Demokväll med rättsinformationssystemetDemokväll med rättsinformationssystemet
Demokväll med rättsinformationssystemet
 

Similar to WESO CAEPIA-20111108

Building DSLs On CLR and DLR (Microsoft.NET)
Building DSLs On CLR and DLR (Microsoft.NET)Building DSLs On CLR and DLR (Microsoft.NET)
Building DSLs On CLR and DLR (Microsoft.NET)
Vitaly Baum
 
Maximilian Michels – Google Cloud Dataflow on Top of Apache Flink
Maximilian Michels – Google Cloud Dataflow on Top of Apache FlinkMaximilian Michels – Google Cloud Dataflow on Top of Apache Flink
Maximilian Michels – Google Cloud Dataflow on Top of Apache Flink
Flink Forward
 

Similar to WESO CAEPIA-20111108 (20)

WESO MeTTeG 2011
WESO MeTTeG 2011WESO MeTTeG 2011
WESO MeTTeG 2011
 
WESO MeTTeG 2011
WESO MeTTeG 2011WESO MeTTeG 2011
WESO MeTTeG 2011
 
XConf 2022 - Code As Data: How data insights on legacy codebases can fill the...
XConf 2022 - Code As Data: How data insights on legacy codebases can fill the...XConf 2022 - Code As Data: How data insights on legacy codebases can fill the...
XConf 2022 - Code As Data: How data insights on legacy codebases can fill the...
 
Distributed Queries in IDS: New features.
Distributed Queries in IDS: New features.Distributed Queries in IDS: New features.
Distributed Queries in IDS: New features.
 
Query Rewriting Optimisation Techniques for Ontology-Based Data Access
Query Rewriting Optimisation Techniques for Ontology-Based Data AccessQuery Rewriting Optimisation Techniques for Ontology-Based Data Access
Query Rewriting Optimisation Techniques for Ontology-Based Data Access
 
Making Use of the Linked Data Cloud: The Role of Index Structures
Making Use of the Linked Data Cloud: The Role of Index StructuresMaking Use of the Linked Data Cloud: The Role of Index Structures
Making Use of the Linked Data Cloud: The Role of Index Structures
 
Object Detection with Transformers
Object Detection with TransformersObject Detection with Transformers
Object Detection with Transformers
 
CMPT470-usask-guest-lecture
CMPT470-usask-guest-lectureCMPT470-usask-guest-lecture
CMPT470-usask-guest-lecture
 
Jfall 2019 - Driving the energy transition with java
Jfall 2019 - Driving the energy transition with javaJfall 2019 - Driving the energy transition with java
Jfall 2019 - Driving the energy transition with java
 
Visualize open data with Plone - eea.daviz PLOG 2013
Visualize open data with Plone - eea.daviz PLOG 2013Visualize open data with Plone - eea.daviz PLOG 2013
Visualize open data with Plone - eea.daviz PLOG 2013
 
Domain Driven Design Tactical Patterns
Domain Driven Design Tactical PatternsDomain Driven Design Tactical Patterns
Domain Driven Design Tactical Patterns
 
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
 
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
 
How to valuate and determine standard essential patents
How to valuate and determine standard essential patentsHow to valuate and determine standard essential patents
How to valuate and determine standard essential patents
 
Building DSLs On CLR and DLR (Microsoft.NET)
Building DSLs On CLR and DLR (Microsoft.NET)Building DSLs On CLR and DLR (Microsoft.NET)
Building DSLs On CLR and DLR (Microsoft.NET)
 
Validating statistical Index Data represented in RDF using SPARQL Queries: Co...
Validating statistical Index Data represented in RDF using SPARQL Queries: Co...Validating statistical Index Data represented in RDF using SPARQL Queries: Co...
Validating statistical Index Data represented in RDF using SPARQL Queries: Co...
 
Maximilian Michels – Google Cloud Dataflow on Top of Apache Flink
Maximilian Michels – Google Cloud Dataflow on Top of Apache FlinkMaximilian Michels – Google Cloud Dataflow on Top of Apache Flink
Maximilian Michels – Google Cloud Dataflow on Top of Apache Flink
 
Reactive Stream Processing Using DDS and Rx
Reactive Stream Processing Using DDS and RxReactive Stream Processing Using DDS and Rx
Reactive Stream Processing Using DDS and Rx
 
Multidimensional Interfaces for Selecting Data with Order
Multidimensional Interfaces for Selecting Data with OrderMultidimensional Interfaces for Selecting Data with Order
Multidimensional Interfaces for Selecting Data with Order
 
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
Using Graph Databases in Real-Time to Solve Resource Authorization at Telenor...
 

More from WESO (Oviedo Semantic Web) (8)

CAEPIA 2011 Linked Data Methodology
CAEPIA 2011 Linked Data MethodologyCAEPIA 2011 Linked Data Methodology
CAEPIA 2011 Linked Data Methodology
 
Curso Integración Web Semántica-Conclusiones
Curso Integración Web Semántica-ConclusionesCurso Integración Web Semántica-Conclusiones
Curso Integración Web Semántica-Conclusiones
 
Curso Integración Web Semántica-OWL
Curso Integración Web Semántica-OWLCurso Integración Web Semántica-OWL
Curso Integración Web Semántica-OWL
 
Curso Integración Web Semántica Estadísticas
Curso Integración Web Semántica EstadísticasCurso Integración Web Semántica Estadísticas
Curso Integración Web Semántica Estadísticas
 
Curso integración Web Semántica
Curso integración Web Semántica Curso integración Web Semántica
Curso integración Web Semántica
 
WESO SATBI 2011
WESO SATBI 2011WESO SATBI 2011
WESO SATBI 2011
 
WESO 10ders
WESO 10dersWESO 10ders
WESO 10ders
 
WESO-10ders
WESO-10dersWESO-10ders
WESO-10ders
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

WESO CAEPIA-20111108

  • 1. Query Expansion Methods and Performance Evaluation for Reusing Linking Open Data of the European Public Procurement Notices José María Álvarez Rodríguez WESO-Universidad de Oviedo http://purl.org/weso/moldeas/ Tecnologías de Linked Data y sus aplicaciones en España (TLDE) CAEPIA 2011-Tenerife (Spain) 8th of November, 2011 Code: TSI-020100-2010-919
  • 2. Overview Use case & Context SPARQL & Performance Next Steps
  • 3. Objective Creation of a pan-european e-procurement platform
  • 4. E-procurement Long Tail TED BOE (official bulletin of the Spanish Governement) BOPA (official bulletin of the Asturian Governement)
  • 5. To Be Able to answer to… Which public procurement notices are relevant to Dutch companies (only SMEs) that want to tender for contracts announced by local authorities with a total value lower than 170K € to procure “Road bridge construction work” and a two year duration in the Dutch- speaking region of Flanders (Belgium)?
  • 6. Structuring public procurement notices d Providing new semantic- based services yD Z ^ ^ ^ D D K W KW LOD enrichment K W ^ Ws Z Easing the access to the published data using the Ehd^ LOD approach Transforming government classifications
  • 7. Preliminary Results / d d W s Z K Ehd^ W
  • 8. Semantic-based Services Problem of «Query Expansion» depending on the kind of information variable
  • 9. Methods of«Query Expansion» / ' d Z E ' ^ ^ h , ^ Z
  • 10. Remembering… Which public procurement notices are relevant to Dutch companies (only SMEs) that want to tender for contracts announced by local authorities with a total value lower than 170K € to procure “Road bridge construction work” and a two year duration in the Dutch- speaking region of Flanders (Belgium)?
  • 11. cpv:45221111-3 NL Query… Ehd^ Z' t KEE ppn:nutsCode ppn:hasDuration cpv:CodeIn2008 ppn:hasAmount org:classification ^D
  • 12. cpv:45221111-3 NL Applying Query Expansion… Ehd^ Ehd^ E Ehd^ Ehd^ ppn:nutsCode ppn:hasDuration cpv:CodeIn2008 ppn:hasAmount org:classification ^D
  • 13. Example of SPARQL query SELECT DISTINCT * WHERE { ?ppn rdf:type http://purl.org/weso/ppn/def#ppn. ?ppn ppn:nutsCode ?nutsCode. ?ppn cpv:codeIn2008 ?cpvCode. ?ppn ppn:hasDuration ?duration ?ppn dc:identifier ?id. ?ppn dc:date ?date . ? ppn ppn:hasAmount ?amount. FILTER(? cpvCode = cpv:45221111-3 ... ) . FILTER ( (xsd:double(?amount) = xsd:long(170,000)) (xsd:double(?amount) = xsd:long(200,000)) ). . FILTER(?nutsCode = nuts:B3 ... ) . FILTER ( (xsd:long(?duration) = xsd:long(2)) (xsd:long(?duration) = xsd:long(3)) ). }
  • 14. Context Performance of SPARQL Queries ~30 sec.
  • 15. Hardware Software DELL PC 2GB RAM and 30GB HardDisk Virtual Box (version 4.0.6) Linux 2.6.35-22-server #33-Ubuntu 2 SMP x86_64 GNU/Linux Ubuntu 10.10 OpenLink Virtuoso Opensource-6- 20110218
  • 16. Question? How to decrease the time of query execution without modify the hardware and not use any vendor feature?
  • 17. TripleStore 25 graphs 20 M of RDF Triples But… 8 graphs 11 M of RDF Triples
  • 18. Focus on.. The generation of SPARQL queries
  • 19. Let’s start… 9 SPARQL Queries 3 executions
  • 20. d ^ /D/d /dZ 'Z W,^ ^ W d d d d d d d d d d d d d d d d
  • 21. Simple SPARQL query SELECT DISTINCT * WHERE { ?ppn rdf:type http://purl.org/weso/ppn/def#ppn. ?ppn ppn:nutsCode ?nutsCode. ?ppn cpv:codeIn2008 ?cpvCode. ?ppn ppn:hasDuration ?duration ?ppn dc:identifier ?id. ?ppn dc:date ?date . ? ppn ppn:hasAmount ?amount. FILTER(? cpvCode = cpv:15331137 ) . . FILTER(?nutsCode = nuts:UK) . }
  • 22. Simple Query 1 CPV Code 1 NUTS Code Time: ~3,29 sec.
  • 23. T1 Rewrite SPARQL queries: Match triples from specific to general Filter as soon as possible
  • 24. T2 Use the LIMIT clause Value set to 10,000
  • 25. Rewrite SPARQL query SELECT DISTINCT * WHERE { ?ppn rdf:type http://purl.org/weso/ppn/def#ppn. ?ppn cpv:codeIn2008 ?cpvCode. FILTER(? cpvCode = cpv:15331137 ) . ?ppn ppn:nutsCode ?nutsCode. FILTER(?nutsCode = nuts:UK) . ?ppn ppn:hasDuration ?duration ?ppn dc:identifier ?id. ?ppn dc:date ?date . ? ppn ppn:hasAmount ?amount. . } LIMIT 10000
  • 26. Results T2 1 CPV Code 1 NUTS Code Time: ~3,26 sec.
  • 27. Evaluation There is no significant changes in execution time and gain… and We are interested in “enhanced queries”
  • 29. Enhanced SPARQL query SELECT DISTINCT * WHERE { ?ppn rdf:type http://purl.org/weso/ppn/def#ppn. ?ppn ppn:nutsCode ?nutsCode. ?ppn cpv:codeIn2008 ?cpvCode. ?ppn ppn:hasDuration ?duration ?ppn dc:identifier ?id. ?ppn dc:date ?date . ? ppn ppn:hasAmount ?amount. FILTER(? cpvCode = {cpv:15331137 , cpv:48611000, cpv: 48611000, cpv:50531510, cpv: 15871210}) . . FILTER(?nutsCode = {nuts:B3, nuts:PL, nuts:RO ) . }
  • 30. Results T3: 5 CPV codes, 3 NUTS codes, 1 query. Time: ~20.65 sec.
  • 31. T4: Rewrite SPARQL queries + use the LIMIT clause
  • 32. Results T4 wrt T3: 5 CPV codes, 3 NUTS codes, 1 query. Time: ~20.55 sec.
  • 33. Info: 8 graphs, 11 M RDF triples
  • 34. T5: Rewrite SPARQL queries + use the LIMIT clause + named graphs (FROM)
  • 35. Results T5 wrt T3: 5 CPV codes, 3 NUTS codes, 1 query. Time: ~20.65 sec.
  • 36. T6: Rewrite SPARQL queries + use the LIMIT clause + named graphs (FROM) + split into simple queries
  • 37. Results T6 wrt T3: 5 CPV codes, 3 NUTS codes, 4 graphs, 4 simple queries. Time: ~20.60 sec.
  • 38. T6-1: Rewrite SPARQL queries + use the LIMIT clause + named graphs (FROM) + split the enhanced query into simple queries + parallelize query execution (ad hoc map/reduce)
  • 39. Results T6-1 wrt T3: 5 CPV codes, 3 NUTS codes, 4 graphs, 4 simple queries. Time: ~11.93 sec.
  • 40. T7: Rewrite SPARQL queries + use the LIMIT clause + split the enhanced query into simple queries
  • 41. Results T7 wrt T3: 1 CPV code per query (5 in total), 3 NUTS codes, 5 simple queries. Time: ~15.81 sec.
  • 42. T7-1: Rewrite SPARQL queries + use the LIMIT clause + split the enhanced query into simple queries + parallelize query execution (ad hoc map/reduce)
  • 43. Results T7-1 wrt T3: 1 CPV code per query (5 in total), 3 NUTS codes, 5 simple queries. Time: ~10.55 sec.
  • 44. T8: Rewrite SPARQL queries + use the LIMIT clause + named graphs (FROM) + split into simple queries
  • 45. Results T8 wrt T3: 1 CPV code per query (5 in total), 3 NUTS codes, 4 graphs, 20 simple queries. Time: ~32.34 sec.
  • 46. T8-1: Rewrite SPARQL queries + use the LIMIT clause + named graphs (FROM) + split the enhanced query into simple queries + parallelize query execution (ad hoc map/reduce)
  • 47. Results T8-1 wrt T3: 1 CPV code per query (5 in total), 3 NUTS codes, 4 graphs, 20 simple queries. Time: ~18.45 sec.
  • 48. T9: Rewrite SPARQL queries + use the LIMIT clause + split the enhanced query into simple queries (1 CPV code + 1 NUTS code)
  • 49. Results T9 wrt T3: 1 CPV code (5 in total) and 1 NUTS code (3 in total) per query, 15 simple queries. Time: ~22.462 sec.
  • 50. T9-1: Rewrite SPARQL queries + use the LIMIT clause + split the enhanced query into simple queries (1 CPV code + 1 NUTS code) + parallelize query execution (ad hoc map/reduce)
  • 51. Results T9-1 wrt T3: 1 CPV code (5 in total) and 1 NUTS code (3 in total) per query, 15 simple queries. Time: ~12.77 sec.
  • 52. T10: Rewrite SPARQL queries + use the LIMIT clause + named graphs (FROM) + split into simple queries (1 CPV code + 1 NUTS code)
  • 53. Results T10 wrt T3: 1 CPV code (5 in total) and 1 NUTS code (3 in total) per query, 4 graphs, 60 simple queries. Time: ~71.17 sec.
  • 54. T10-1: Rewrite SPARQL queries + use the LIMIT clause + named graphs (FROM) + split the enhanced query into simple queries (1 CPV code + 1 NUTS code) + parallelize query execution (ad hoc map/reduce)
  • 55. Results T10-1 wrt T3: 1 CPV code (5 in total) and 1 NUTS code (3 in total) per query, 4 graphs, 60 simple queries. Time: ~35.13 sec.
  • 56. Table of Results
  • 57. Discussion: The number of queries is a key factor. More CPV codes imply more execution time. Parallelization improves execution time. T7-1 is the best execution in terms of time: rewrite SPARQL queries, use the LIMIT clause, split the enhanced query into simple queries, and parallelize query execution.
  • 58. Further Steps: Distribute graphs across different nodes (HW improvement). Use other triple stores (SW comparison). Add new SPARQL 1.1 features (expressiveness improvement). Cache queries (SW improvement).
  • 59. Some References… • http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/index.html#comparison • http://www.slideshare.net/olafhartig/an-overview-on-linked-data-management-and-sparql-querying-isslod2011 • http://squin.sourceforge.net/ • http://www2.informatik.hu-berlin.de/~hartig/files/Slides_Hartig_ISSLOD2011.pdf • http://www2008.org/papers/pdf/p595-stocker1.pdf • http://www.informatik.uni-freiburg.de/~mschmidt/docs/diss_final01122010.pdf • http://mayor2.dia.fi.upm.es/oeg-upm/files/sparql-dqp/eswc11-bac-ext.pdf • http://www.slideshare.net/olafhartig/the-sparql-query-graph-model-for-query-optimization-1259536 • http://www.w3.org/TR/sparql-features/
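The T7-1 configuration from the slides (rewrite the query, add LIMIT, split the enhanced query into one simple query per CPV code, run the simple queries in parallel) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used in the experiments: the function names, the `execute` callable and the thread pool are hypothetical, and the `rdf:`, `ppn:`, `cpv:`, `nuts:` and `dc:` prefixes are assumed to be declared on the endpoint or prepended by the caller, as in the slides.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence, Tuple

Row = Tuple[str, ...]  # one result row, kept hashable so rows can be de-duplicated

PPN_TYPE = "<http://purl.org/weso/ppn/def#ppn>"


def build_simple_query(cpv_code: str, nuts_codes: Sequence[str], limit: int = 10000) -> str:
    """Build one rewritten query for a single CPV code (T1 + T2).

    The selective patterns and their FILTERs come first, the descriptive
    patterns afterwards, and a LIMIT clause closes the query.
    """
    nuts_filter = " || ".join(f"?nutsCode = nuts:{code}" for code in nuts_codes)
    return (
        "SELECT DISTINCT * WHERE { "
        f"?ppn rdf:type {PPN_TYPE} . "
        "?ppn cpv:codeIn2008 ?cpvCode . "
        f"FILTER(?cpvCode = cpv:{cpv_code}) . "
        "?ppn ppn:nutsCode ?nutsCode . "
        f"FILTER({nuts_filter}) . "
        "?ppn ppn:hasDuration ?duration . "
        "?ppn dc:identifier ?id . "
        "?ppn dc:date ?date . "
        "?ppn ppn:hasAmount ?amount . "
        f"}} LIMIT {limit}"
    )


def split_enhanced_query(cpv_codes: Iterable[str], nuts_codes: Sequence[str]) -> List[str]:
    """Split an enhanced query into one simple query per CPV code (T7)."""
    return [build_simple_query(code, nuts_codes) for code in cpv_codes]


def run_parallel(
    queries: Sequence[str],
    execute: Callable[[str], List[Row]],
    max_workers: int = 4,
) -> List[Row]:
    """Run the simple queries concurrently and merge their rows (T7-1).

    Map step: each query is sent through `execute` in its own worker.
    Reduce step: partial results are concatenated, dropping duplicate rows.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_results = list(pool.map(execute, queries))

    merged: List[Row] = []
    seen = set()
    for rows in partial_results:
        for row in rows:
            if row not in seen:
                seen.add(row)
                merged.append(row)
    return merged


if __name__ == "__main__":
    # Stub executor standing in for a real SPARQL endpoint client (assumption).
    def fake_execute(query: str) -> List[Row]:
        return [(query[-11:],)]

    simple_queries = split_enhanced_query(["15331137", "48611000", "50531510"], ["B3", "PL", "RO"])
    print(len(simple_queries), "simple queries,", len(run_parallel(simple_queries, fake_execute)), "rows")
```

In practice `execute` would wrap whatever client sends a query string to the triple store and returns its rows. Threads are a reasonable fit here because the workers mostly wait on the endpoint, but the gain still depends on how well the store handles concurrent queries.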