Adaptive Semantic Data Management Techniques for
         Federations of Endpoints -Tutorial Description
                          Maria-Esther Vidal (1), Edna Ruckhaus (1),
                  Maribel Acosta (1,2), Cosmin Basca (3), Gabriela Montoya (1)
                          (1) Universidad Simón Bolívar, Venezuela
                      {mvidal, ruckhaus, macosta, gmontoya}@ldc.usb.ve
                (2) Institute AIFB, Karlsruhe Institute of Technology, Germany
                              Maribel.Acosta@aifb.uni-karlsruhe.de
               (3) Department of Informatics, University of Zurich, Switzerland
                                        basca@ifi.uzh.ch

                                            January 20, 2012


                                                  Abstract
          Emerging technologies that support networks of sensors or mobile smartphones are making
      available an extremely large volume of data or Big Data; additionally, in the context of the
      Cloud of Linked Data, a large number of huge RDF linked datasets have become available, and
      this number keeps growing. Simultaneously, scalable and efficient RDF engines that
      follow the traditional optimize-then-execute paradigm have been developed to access
      RDF data locally, while SPARQL endpoints have been implemented for remote query processing. Given
      the size of existing datasets, lack of statistics to describe available sources, and unpredictable
      conditions of remote queries, existing solutions are still insufficient. First, the most efficient
      RDF engines base their query processing algorithms on physical access and storage structures
      that are stored locally; however, because of the size of existing linked datasets, loading the
      data and their links is not always feasible. Second, remote linked data query processing can
      be extremely costly because of the lack of query planning; also, current techniques do not
      adapt to unpredictable data transfers or data availability, so executions can fail.
      To overcome these limitations, physical query operators and execution engines need to be
      able to access remote data and adapt query execution schedulers to data availability. In this
      tutorial we present the basis of adaptive query processing frameworks defined in the database
      area, and their applicability in the Linked and Big Data context where data can be accessed
      through SPARQL endpoints. This tutorial targets any conference attendee who wants to learn about
      the limitations of existing RDF engines, adaptive query processing techniques, and how traditional
      RDF data management approaches can be adapted to runtime conditions and extended to
      access a large volume of data distributed in federations of SPARQL endpoints. The first edition
      of this tutorial was presented at ESWC 2011.


1   Tutorial Description
1.1 Aims and Target Audience
The tutorial describes the traditional optimize-then-execute paradigm implemented in existing RDF
engines and its main drawbacks when a large volume of data needs to be remotely accessed. As a
solution to overcome limitations of current query processing approaches, we will present existing
adaptive query processing techniques defined in the context of database management systems, and
their applicability to the Semantic Web. Also, we will describe current solutions that have been
proposed in the context of the Semantic Web to access remote data. The target audience includes
researchers and practitioners who develop or use query engines to consume Linked and Big Data
through SPARQL endpoints. The participants will learn the limitations of existing RDF query engines
and how current techniques can be extended to access remote data from Linked datasets and hide
delays caused by unpredictable data transfers and dataset availability. A hands-on session will
allow attendees to evaluate the performance and robustness of existing approaches.

1.2 Presentation Method and Technical Requirements
We propose a full-day tutorial. Theoretical issues will be presented first; then, a hands-on session
will allow attendees to evaluate existing query processing approaches and determine the pros and cons
of each one. The morning session will comprise a short introduction, three lectures, and one
fifteen-minute coffee break. In the introduction, the core concepts of a data management engine will
be presented. Next, in the first and second lectures, the query execution and optimization techniques
of the classical optimize-then-execute paradigm will be described, and the limitations of existing
SPARQL endpoints and existing approaches to query Linked and Big Data will be illustrated.
Then, adaptive query processing techniques proposed in the context of Databases and the Semantic
Web will be presented in the third lecture. In the afternoon session, the applicability of existing
approaches to consume Linked Data will be described and an evaluation of state-of-the-art engines
will be conducted. We expect participants to have only a basic understanding of RDF and SPARQL.


2    Justification for the tutorial in ESWC 2012
In the context of the Cloud of Linked Data, a large number of diverse datasets have become available,
and the published data and links have grown exponentially in recent years.
Billions of triples from life science research groups, government agencies, Wikipedia, and entertainment
organizations currently comprise the Cloud.
     Following the guidelines to publish and link data on the Cloud, a great number of
SPARQL endpoints that support remote query processing over linked data have become available,
and this number keeps growing. Additionally, to scale up to the size of existing datasets, RDF
engines have implemented storage and access structures and query processing techniques for local
query processing. However, although the semantic data management community actively works
on more suitable linked data query processing techniques, access to the Cloud of Linked datasets
is still limited and insufficient because data have to be stored locally or because some SPARQL endpoints
only support very lightweight use. To successfully execute real-world queries, in addition to accessing
remote data, existing query solutions have to be able to adapt query execution schedulers to data
availability. This tutorial aims to illustrate the limitations of existing approaches and how they can be
extended to be well suited to remote query processing and runtime conditions. We consider that
this tutorial is ideally co-located with ESWC 2012 because the research institutions that traditionally
attend ESWC contribute actively to the domain of RDF data management. In particular,
one of the conference research tracks is on semantic data management, and query processing of
semantic data is one of its topics of interest. Thus, many of the conference attendees could see the
tutorial as a place to discuss possible solutions to current semantic data management limitations.




3    Outline of the Tutorial
The goal of the tutorial is to highlight the limitations of existing RDF query engines and to introduce the basic
concepts of existing adaptive query processing techniques and how they can be used to access
SPARQL endpoints effectively and efficiently.

3.1 Content
The tutorial will cover traditional data management solutions that implement the optimize-then-execute
paradigm, and their pros and cons for Linked Data query processing; novel storage and
access data structures, and query optimization and execution techniques implemented by state-of-the-art
RDF engines will be described. Then, adaptive frameworks defined in the database area to
manage remote query processing will be analyzed; adaptive operators such as symmetric hash joins
(binary and n-ary), routing operators, and adaptive engines will be studied. Finally, the applicability of
adaptive techniques will be illustrated with existing query processing engines for federations of
SPARQL endpoints. Attendees will evaluate the performance and robustness of state-of-the-art
approaches during a hands-on session; observed results will be discussed with the attendees.
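
As an illustration of the adaptive operators mentioned above, the following is a minimal sketch of a binary symmetric hash join. It is not taken from any of the engines discussed in the tutorial; the function name, the key-extractor parameters, and the sample tuples are assumptions made for the example, and the interleaving of the two inputs only imitates tuples arriving from two remote sources. The point it shows is that each arriving tuple is inserted into its own hash table and immediately probed against the other one, so results are produced without waiting for either input to finish.

```python
from collections import defaultdict
from itertools import zip_longest


def symmetric_hash_join(left, right, left_key, right_key):
    """Yield (left_tuple, right_tuple) pairs as soon as both sides have arrived."""
    left_table = defaultdict(list)   # join key -> left tuples seen so far
    right_table = defaultdict(list)  # join key -> right tuples seen so far
    missing = object()               # marks the exhausted side of the interleaving

    # Assumption: alternate between the inputs to imitate interleaved arrival.
    for l, r in zip_longest(left, right, fillvalue=missing):
        if l is not missing:
            key = left_key(l)
            left_table[key].append(l)              # insert into own table
            for match in right_table.get(key, ()):  # probe the other table
                yield (l, match)
        if r is not missing:
            key = right_key(r)
            right_table[key].append(r)
            for match in left_table.get(key, ()):
                yield (match, r)


if __name__ == "__main__":
    # Hypothetical bindings: (drug, gene) from one source, (gene, disease) from another.
    drugs = [("d1", "g1"), ("d2", "g2"), ("d3", "g1")]
    diseases = [("g2", "x"), ("g1", "y")]
    for pair in symmetric_hash_join(drugs, diseases,
                                    lambda t: t[1], lambda t: t[0]):
        print(pair)
```

Because a tuple is always inserted before the opposite side probes for it, every matching pair is emitted exactly once, whichever side arrives first. A real operator would also need to handle memory limits and blocked sources, which this sketch leaves out.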

3.2 Schedule
Morning Session

      Introduction (20 minutes):
              • Traditional data management system architecture and its main components.
              • Basic terminology.
      Lecture 1-The Optimize-then-Execute Paradigm (50 minutes):
              •   Cost-based optimization techniques.
              •   Traditional iterator model architecture.
              •   Centralized data management physical operators.
              •   Centralized data management query engines.
      Lecture 2-Existing RDF Engines (50 minutes):
              • Query optimization and execution techniques in existing RDF engines like RDF-
                3X [3].
              • SPARQL endpoints and their execution model.
              • The SPARQL 1.1 Federation extension [6].
              • RDF engines for query processing against federations of SPARQL endpoints;
                approaches such as FedX [5] and ARQ [7] will be studied.
      Coffee-Break (15 minutes)
      Lecture 3-Adaptive Query Processing Techniques (100 minutes):
              • Intra-operator solutions; adaptive physical operators: symmetric hash joins, n-ary
                joins.
              • Inter-operator solutions; Eddy operators, query processing schedulers, and routing
                policies.
              • Adaptive query engines.

Lunch (120 minutes)

Afternoon Session

      Lecture 4: Adaptive Approaches for Federations of SPARQL endpoints (50 minutes):
             • Requirements for query processing in Federations of SPARQL endpoints.
             • Existing benchmarks for evaluating query processing engines for Federations of
               SPARQL endpoints, e.g., FedBench [4].
             • Adaptive query processing engines for Federations of endpoints; approaches such as
               ANAPSID [1] and Avalanche [2] will be studied.
      Coffee-Break (15 minutes)
      Hands-on Session: RDF Storage Systems Evaluation (100 minutes): existing benchmarks
          will be used to evaluate performance and robustness of state-of-the-art solutions; ARQ,
          FedX, ANAPSID and Avalanche will be analyzed.
      Analysis and Discussion of the Evaluation Results (30 minutes): results of the evaluation
          will be analyzed and discussed with the attendees.
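
The routing idea behind the Eddy operators and routing policies listed under Lecture 3 can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the design of any engine covered in the tutorial: the `Filter` class, the `eddy` function, and the greedy policy (send each tuple first to the pending operator with the lowest observed pass rate) are hypothetical choices for the example, and routing policies described in the literature are richer than this one.

```python
class Filter:
    """A selection operator that records how many tuples it has seen and passed."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate
        self.seen = 0
        self.passed = 0

    def selectivity(self):
        # Optimistic default of 1.0 until the operator has processed a tuple.
        return self.passed / self.seen if self.seen else 1.0

    def apply(self, item):
        ok = bool(self.predicate(item))
        self.seen += 1
        self.passed += ok
        return ok


def eddy(tuples, operators):
    """Route every tuple through all operators, choosing the order per tuple."""
    for item in tuples:
        pending = list(operators)
        alive = True
        while pending and alive:
            # Assumed routing policy: try the operator most likely to drop the tuple.
            op = min(pending, key=Filter.selectivity)
            pending.remove(op)
            alive = op.apply(item)
        if alive:
            yield item


if __name__ == "__main__":
    ops = [Filter("even", lambda x: x % 2 == 0),
           Filter("small", lambda x: x < 10)]
    print(list(eddy(range(100), ops)))
    print({op.name: op.seen for op in ops})
```

The output set does not depend on the routing order, since a tuple is returned only if it passes every operator; what changes with the observed statistics is how much work each operator performs, which is the quantity an adaptive scheduler tries to reduce.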


4   Tutorial Former Editions
The first edition of the tutorial, named Adaptive Semantic Data Management Techniques for Linked
Data, was held at ESWC 2011 (http://www.eswc2011.org/content/tutorials); it was a half-day
tutorial that included neither a hands-on session nor the evaluation of state-of-the-art approaches such as
Avalanche, ARQ, ANAPSID and FedX.


5   Information of Presenters
Edna Ruckhaus is a Full Professor of the Computer Science department at the Universidad Simón
     Bolívar, Venezuela since 1998, where she has taught several Database courses at the undergraduate
     level. Visiting scholar of the research group Mindswap (Maryland Information and Network
     Dynamic Lab Semantic Web Agents Project), 2004-2005. Over 20 publications in
     international and national conferences and journals. She has been a reviewer and has participated
     in the Program Committee of several International Conferences. Member of the Organizing
     Committee of the Workshop on Applications of Logic Programming to the Semantic Web
     and Semantic Web Services (ALPSWS2007) co-located with the International Conference on
     Logic Programming. Co-Chair of the Organizing Committee of the ESWC 2011 and 2012
     Workshops on Resource Discovery; she co-organized and co-lectured the tutorial on Adaptive
     Semantic Data Management Techniques for Linked Data at ESWC 2011.

Maria-Esther Vidal is a Full Professor of the Computer Science department at the
     Universidad Simón Bolívar, Venezuela, where she has taught several Database and Semantic
     Web courses at undergraduate and graduate level. Prof. Vidal has also been a Research Associate
     and Visiting Researcher at the Institute of Advanced Computer Studies of the University
     of Maryland, and Visiting Professor at Universidad Politécnica de Catalunya, University of
     Laguna, Spain, and Leipzig, Germany. She has participated in several international projects
     supported by NSF (USA), AECI (Spain) and CNRS (France), and advised six PhD students
     and more than 55 master and undergraduate students. Professor Vidal has published more
     than 60 papers in International Conferences and Journals of the Database and The Semantic
     Web areas. She has been a reviewer and has participated in the Program Committee of several
     International Journals and Conferences. Co-chair of the Workshop on Resource Discovery
     (RED2010) and accompanying professor of On the Move Academy (OTMa). Co-Chair of
      the Organizing Committee of the ESWC 2011 and 2012 Workshops on Resource Discov-
      ery; she co-organized and co-lectured the tutorial on Adaptive Semantic Data Management
      Techniques for Linked Data at ESWC 2011.

Maribel Acosta is a PhD student at Institute AIFB, Karlsruhe Institute of Technology, Germany.
     She holds a Master's degree in Computer Science from the Universidad Simón Bolívar, where she was a
     Teaching Assistant and taught Logic, Discrete Math, and Databases labs at the undergraduate
     level. She has authored seven publications in international conferences and workshops.
     Her research interests are Adaptive Query Execution techniques for Linked and Big Data.

Cosmin Basca is a PhD student at the University of Zurich, Department of Informatics, Switzerland.
    He holds a Master's degree in Computer Science from “Lucian Blaga” University of Sibiu,
    Romania, where he did research in image processing and computer vision. Later, while
    part of the Digital Enterprise Research Institute in Galway, Ireland, he focused his research on the
    Semantic Web, specifically Semantic Data Management. His research interests include, among
    others, large-scale distributed graph data management systems and algorithms, and Linked
    Data.

Gabriela Montoya is a Lecturer of the Computer Science Department at the Universidad Simón
     Bolívar, where she has taught Logic, Algorithms and Programming Languages courses and
     labs at undergraduate level. She holds a Master's degree in Computer Science from the Universidad
     Simón Bolívar and is currently a doctoral student at the same university; her research
     interests are Data Integration and Query Processing techniques in Emerging Infrastructures.


References
[1] M. Acosta, M.-E. Vidal, T. Lampo, J. Castillo, and E. Ruckhaus. ANAPSID: AN Adaptive
    query ProcesSing engIne for sparql enDpoints. In Proceedings of the International Semantic
    Web Conference (ISWC), 2011.

[2] C. Basca and A. Bernstein. Avalanche: Putting the Spirit of the Web back into Semantic Web
    Querying. In SSWS2010 Workshop, Shanghai, China, 2010.

[3] T. Neumann and G. Weikum. RDF-3X: a RISC-style engine for RDF. Proc. VLDB, 1(1), 2008.

[4] M. Schmidt, O. Görlitz, P. Haase, A. Schwarte, G. Ladwig, and T. Tran. FedBench: A benchmark
    suite for federated semantic data query processing. In International Semantic Web Conference,
    2011.

[5] A. Schwarte, P. Haase, K. Hose, R. Schenkel, and M. Schmidt. FedX: Optimization techniques
    for federated query processing on linked data. In International Semantic Web Conference (1),
    pages 601–616, 2011.

[6] E. P. Steve Harris, Andy Seaborne. SPARQL 1.1 Query Language, June 2010.

[7] M. Stocker, A. Seaborne, A. Bernstein, C. Kiefer, and D. Reynolds. SPARQL basic graph
    pattern optimization using selectivity estimation. In International Semantic Web Conference
    (ISWC), Beijing, China, 2008. ACM.




                                                5

Más contenido relacionado

La actualidad más candente

Transient and persistent RDF views over relational databases in the context o...
Transient and persistent RDF views over relational databases in the context o...Transient and persistent RDF views over relational databases in the context o...
Transient and persistent RDF views over relational databases in the context o...Nikolaos Konstantinou
 
Indexing techniques for advanced database systems
Indexing techniques for advanced database systemsIndexing techniques for advanced database systems
Indexing techniques for advanced database systemsMohammed Muqeet
 
Preparing eScience librarians -- RDAP 2012
Preparing eScience librarians -- RDAP 2012 Preparing eScience librarians -- RDAP 2012
Preparing eScience librarians -- RDAP 2012 Jian Qin
 
DataONE Education Module 09: Analysis and Workflows
DataONE Education Module 09: Analysis and WorkflowsDataONE Education Module 09: Analysis and Workflows
DataONE Education Module 09: Analysis and WorkflowsDataONE
 
IEEE Parallel and distributed system 2016 Title and Abstract
IEEE Parallel and distributed system 2016 Title and AbstractIEEE Parallel and distributed system 2016 Title and Abstract
IEEE Parallel and distributed system 2016 Title and Abstracttsysglobalsolutions
 
Urm concept for sharing information inside of communities
Urm concept for sharing information inside of communitiesUrm concept for sharing information inside of communities
Urm concept for sharing information inside of communitiesKarel Charvat
 
Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.KGMGROUP
 
Beyond Transparency: Success & Lessons From tambisBoston2003
Beyond Transparency: Success & Lessons From tambisBoston2003Beyond Transparency: Success & Lessons From tambisBoston2003
Beyond Transparency: Success & Lessons From tambisBoston2003robertstevens65
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET Journal
 

La actualidad más candente (10)

Transient and persistent RDF views over relational databases in the context o...
Transient and persistent RDF views over relational databases in the context o...Transient and persistent RDF views over relational databases in the context o...
Transient and persistent RDF views over relational databases in the context o...
 
Az31349353
Az31349353Az31349353
Az31349353
 
Indexing techniques for advanced database systems
Indexing techniques for advanced database systemsIndexing techniques for advanced database systems
Indexing techniques for advanced database systems
 
Preparing eScience librarians -- RDAP 2012
Preparing eScience librarians -- RDAP 2012 Preparing eScience librarians -- RDAP 2012
Preparing eScience librarians -- RDAP 2012
 
DataONE Education Module 09: Analysis and Workflows
DataONE Education Module 09: Analysis and WorkflowsDataONE Education Module 09: Analysis and Workflows
DataONE Education Module 09: Analysis and Workflows
 
IEEE Parallel and distributed system 2016 Title and Abstract
IEEE Parallel and distributed system 2016 Title and AbstractIEEE Parallel and distributed system 2016 Title and Abstract
IEEE Parallel and distributed system 2016 Title and Abstract
 
Urm concept for sharing information inside of communities
Urm concept for sharing information inside of communitiesUrm concept for sharing information inside of communities
Urm concept for sharing information inside of communities
 
Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.
 
Beyond Transparency: Success & Lessons From tambisBoston2003
Beyond Transparency: Success & Lessons From tambisBoston2003Beyond Transparency: Success & Lessons From tambisBoston2003
Beyond Transparency: Success & Lessons From tambisBoston2003
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
 

Destacado

Как да овладеем ескалацията на конфликт? 5 стъпки. Технология на Златното пра...
Как да овладеем ескалацията на конфликт? 5 стъпки. Технология на Златното пра...Как да овладеем ескалацията на конфликт? 5 стъпки. Технология на Златното пра...
Как да овладеем ескалацията на конфликт? 5 стъпки. Технология на Златното пра...Anastasia Panayotova
 
Mis content marketing seminarium dag1:2
Mis content marketing seminarium dag1:2Mis content marketing seminarium dag1:2
Mis content marketing seminarium dag1:2persod.com
 
Abx2 presentation updated 2
Abx2 presentation updated 2Abx2 presentation updated 2
Abx2 presentation updated 2csromania
 
Mis content marketing seminarium dag 2:2
Mis content marketing seminarium dag 2:2Mis content marketing seminarium dag 2:2
Mis content marketing seminarium dag 2:2persod.com
 
Franklin presentation by iOn the Ball
Franklin presentation by iOn the BallFranklin presentation by iOn the Ball
Franklin presentation by iOn the BallFranklin Matters
 
Hrproj Policyquickref 907
Hrproj Policyquickref 907Hrproj Policyquickref 907
Hrproj Policyquickref 907swati18
 
Towards Parallel Nonmonotonic Reasoning with Billions of Facts
Towards Parallel Nonmonotonic Reasoning with Billions of FactsTowards Parallel Nonmonotonic Reasoning with Billions of Facts
Towards Parallel Nonmonotonic Reasoning with Billions of FactsPlanetData Network of Excellence
 
How to Create a Culture That Fosters Employee Engagement | Webinar
How to Create a Culture That Fosters Employee Engagement | Webinar How to Create a Culture That Fosters Employee Engagement | Webinar
How to Create a Culture That Fosters Employee Engagement | Webinar BizLibrary
 
Performance Appraisal
Performance AppraisalPerformance Appraisal
Performance Appraisalmsexysmurf
 
Chameleons, Catfish, and Cybersleuths: The Art of Sourcing and Attracting Mul...
Chameleons, Catfish, and Cybersleuths: The Art of Sourcing and Attracting Mul...Chameleons, Catfish, and Cybersleuths: The Art of Sourcing and Attracting Mul...
Chameleons, Catfish, and Cybersleuths: The Art of Sourcing and Attracting Mul...Engage
 

Destacado (18)

Как да овладеем ескалацията на конфликт? 5 стъпки. Технология на Златното пра...
Как да овладеем ескалацията на конфликт? 5 стъпки. Технология на Златното пра...Как да овладеем ескалацията на конфликт? 5 стъпки. Технология на Златното пра...
Как да овладеем ескалацията на конфликт? 5 стъпки. Технология на Златното пра...
 
Mis content marketing seminarium dag1:2
Mis content marketing seminarium dag1:2Mis content marketing seminarium dag1:2
Mis content marketing seminarium dag1:2
 
MLAMarchPrintAd
MLAMarchPrintAdMLAMarchPrintAd
MLAMarchPrintAd
 
Abx2 presentation updated 2
Abx2 presentation updated 2Abx2 presentation updated 2
Abx2 presentation updated 2
 
Mis content marketing seminarium dag 2:2
Mis content marketing seminarium dag 2:2Mis content marketing seminarium dag 2:2
Mis content marketing seminarium dag 2:2
 
Proof Back Page Ad
Proof Back Page AdProof Back Page Ad
Proof Back Page Ad
 
Declarative Repairing Policies for Curated KBs
Declarative Repairing Policies for Curated KBsDeclarative Repairing Policies for Curated KBs
Declarative Repairing Policies for Curated KBs
 
Franklin presentation by iOn the Ball
Franklin presentation by iOn the BallFranklin presentation by iOn the Ball
Franklin presentation by iOn the Ball
 
711 alqurashi ppt
711 alqurashi ppt711 alqurashi ppt
711 alqurashi ppt
 
Hrproj Policyquickref 907
Hrproj Policyquickref 907Hrproj Policyquickref 907
Hrproj Policyquickref 907
 
1
11
1
 
Citi completion report
Citi completion reportCiti completion report
Citi completion report
 
Towards Parallel Nonmonotonic Reasoning with Billions of Facts
Towards Parallel Nonmonotonic Reasoning with Billions of FactsTowards Parallel Nonmonotonic Reasoning with Billions of Facts
Towards Parallel Nonmonotonic Reasoning with Billions of Facts
 
Motivationppt
MotivationpptMotivationppt
Motivationppt
 
708 in leadership
708 in leadership708 in leadership
708 in leadership
 
How to Create a Culture That Fosters Employee Engagement | Webinar
How to Create a Culture That Fosters Employee Engagement | Webinar How to Create a Culture That Fosters Employee Engagement | Webinar
How to Create a Culture That Fosters Employee Engagement | Webinar
 
Performance Appraisal
Performance AppraisalPerformance Appraisal
Performance Appraisal
 
Chameleons, Catfish, and Cybersleuths: The Art of Sourcing and Attracting Mul...
Chameleons, Catfish, and Cybersleuths: The Art of Sourcing and Attracting Mul...Chameleons, Catfish, and Cybersleuths: The Art of Sourcing and Attracting Mul...
Chameleons, Catfish, and Cybersleuths: The Art of Sourcing and Attracting Mul...
 

Similar a Adaptive Semantic Data Techniques

Bridging the gap between the semantic web and big data: answering SPARQL que...
Bridging the gap between the semantic web and big data:  answering SPARQL que...Bridging the gap between the semantic web and big data:  answering SPARQL que...
Bridging the gap between the semantic web and big data: answering SPARQL que...IJECEIAES
 
A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...
A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...
A Workflow-Driven Discovery and Training Ecosystem for Distributed Analysis o...Ilkay Altintas, Ph.D.
 
Adaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of EndpointsAdaptive Semantic Data Management Techniques for Federations of Endpoints
Adaptive Semantic Data Management Techniques for Federations of EndpointsMaribel Acosta Deibe
 
Ontology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval SystemOntology Based Approach for Semantic Information Retrieval System
Ontology Based Approach for Semantic Information Retrieval SystemIJTET Journal
 
Towards efficient processing of RDF data streams
Towards efficient processing of RDF data streamsTowards efficient processing of RDF data streams
Towards efficient processing of RDF data streamsAlejandro Llaves
 
Towards efficient processing of RDF data streams
Towards efficient processing of RDF data streamsTowards efficient processing of RDF data streams
Towards efficient processing of RDF data streamsAlejandro Llaves
 
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...
Anusua Trivedi, Data Scientist at Texas Advanced Computing Center (TACC), UT ...MLconf
 
Michael Lang Sr. Presentation
Michael Lang Sr. PresentationMichael Lang Sr. Presentation
Michael Lang Sr. PresentationMediabistro
 
IEEE 2014 JAVA DATA MINING PROJECTS Scalable keyword search on large rdf data
IEEE 2014 JAVA DATA MINING PROJECTS Scalable keyword search on large rdf dataIEEE 2014 JAVA DATA MINING PROJECTS Scalable keyword search on large rdf data
IEEE 2014 JAVA DATA MINING PROJECTS Scalable keyword search on large rdf dataIEEEFINALYEARSTUDENTPROJECTS
 
Overview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developmentsOverview of the SPARQL-Generate language and latest developments
Overview of the SPARQL-Generate language and latest developmentsMaxime Lefrançois
 
Robust Module based data management system
Robust Module based data management systemRobust Module based data management system
Robust Module based data management systemRahul Roi
 
Dotnet datamining ieee projects 2012 @ Seabirds ( Chennai, Pondicherry, Vello...
Dotnet datamining ieee projects 2012 @ Seabirds ( Chennai, Pondicherry, Vello...Dotnet datamining ieee projects 2012 @ Seabirds ( Chennai, Pondicherry, Vello...
Dotnet datamining ieee projects 2012 @ Seabirds ( Chennai, Pondicherry, Vello...SBGC
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
The MADlib Analytics Library
The MADlib Analytics Library The MADlib Analytics Library
The MADlib Analytics Library EMC
 
A Model Of Non Functional Properties For Grid Resources
A Model Of Non Functional Properties For Grid ResourcesA Model Of Non Functional Properties For Grid Resources
A Model Of Non Functional Properties For Grid ResourcesAmy Cernava
 
Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...Engaging Information Professionals in the Process of Authoritative Interlinki...
Engaging Information Professionals in the Process of Authoritative Interlinki...Lucy McKenna
 
Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote
 Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference Keynote
Virtuoso, The Prometheus of RDF -- Sematics 2014 Conference KeynoteKingsley Uyi Idehen
 

Adaptive Semantic Data Techniques

Adaptive Semantic Data Management Techniques for Federations of Endpoints - Tutorial Description

Maria-Esther Vidal1, Edna Ruckhaus1, Maribel Acosta1,2, Cosmin Basca3, Gabriela Montoya1
1 Universidad Simón Bolívar, Venezuela
{mvidal, ruckhaus, macosta, gmontoya}@ldc.usb.ve
2 Institute AIFB, Karlsruhe Institute of Technology, Germany
Maribel.Acosta@aifb.uni-karlsruhe.de
3 Department of Informatics, University of Zurich, Switzerland
basca@ifi.uzh.ch

January 20, 2012

Abstract

Emerging technologies that support networks of sensors or mobile smartphones are making available an extremely large volume of data, or Big Data; additionally, in the context of the Cloud of Linked Data, a large number of huge RDF linked datasets have become available, and this number keeps growing. Simultaneously, although scalable and efficient RDF engines that follow the traditional optimize-then-execute paradigm have been developed to locally access RDF data, SPARQL endpoints have been implemented for remote query processing. Given the size of existing datasets, the lack of statistics to describe available sources, and the unpredictable conditions of remote queries, existing solutions are still insufficient. First, the most efficient RDF engines base their query processing algorithms on physical access and storage structures that are locally stored; however, because of the size of existing linked datasets, loading the data and their links is not always feasible. Second, remote linked data query processing can be extremely costly because of the lack of query planning; also, current techniques are not adaptable to unpredictable data transfers or data availability, and thus executions can be unsuccessful. To overcome these limitations, query physical operators and execution engines need to be able to access remote data and adapt query execution schedulers to data availability.

In this tutorial we present the basis of adaptive query processing frameworks defined in the database area, and their applicability in the Linked and Big Data context where data can be accessed through SPARQL endpoints. This tutorial targets any conference attendee who wants to know the limitations of existing RDF engines, adaptive query processing techniques, and how traditional RDF data management approaches can be made well suited to runtime conditions and extended to access a large volume of data distributed in federations of SPARQL endpoints. The first edition of this tutorial was presented at ESWC 2011.

1 Tutorial Description

1.1 Aims and Target Audience

The tutorial describes the traditional optimize-then-execute paradigm implemented in existing RDF engines and its main drawbacks when a large volume of data needs to be remotely accessed. As a solution to overcome the limitations of current query processing approaches, we will present existing adaptive query processing techniques defined in the context of database management systems, and their applicability to the Semantic Web. Also, we will describe current solutions that have been proposed in the context of the Semantic Web to access remote data.

The target audience includes researchers and practitioners who develop or use query engines to consume Linked and Big Data through SPARQL endpoints. The participants will learn the limitations of existing RDF query engines and how current techniques can be extended to access remote data from Linked datasets and to hide delays caused by unpredictable data transfers and dataset availability. A hands-on session will allow attendees to evaluate the performance and robustness of existing approaches.

1.2 Presentation Method and Technical Requirements

We propose a full-day tutorial; first, theoretical issues will be presented; then, a hands-on session will allow attendees to evaluate existing query processing approaches and determine the pros and cons of each one. The morning session will comprise a short introduction, three lectures and one coffee break of fifteen minutes. In the introduction, the core concepts of a data management engine will be presented. Next, in the first and second lectures, the query execution and optimization techniques of the classical optimize-then-execute paradigm will be described; limitations of existing SPARQL endpoints and existing approaches to query Linked and Big Data will be illustrated. Then, adaptive query processing techniques proposed in the context of Databases and the Semantic Web will be presented in the third lecture. In the afternoon session, the applicability of existing approaches to consume Linked Data will be described and an evaluation of state-of-the-art engines will be conducted. We expect participants to have just a basic understanding of RDF and SPARQL.
2 Justification for the tutorial in ESWC 2012

In the context of the Cloud of Linked Data, a large number of diverse datasets have become available, and the published data and links have grown exponentially in recent years. Billions of triples from life science research groups, government agencies, Wikipedia or entertainment organizations currently comprise the Cloud. Following the guidelines to publish and link data on the Cloud, a great number of SPARQL endpoints that support remote query processing over linked data have become available, and this number keeps growing. Additionally, to scale up to the size of existing datasets, RDF engines have implemented storage and access structures and query processing techniques for local query processing. However, although the semantic data management community actively works on more suitable linked data query processing techniques, access to the Cloud of Linked datasets is still limited and insufficient because data have to be locally stored or some SPARQL endpoints only support very light-weight use. To successfully execute real-world queries, in addition to accessing remote data, existing query solutions have to be able to adapt query execution schedulers to data availability. This tutorial aims to illustrate the limitations of existing approaches and how they can be extended to be well suited for remote query processing and runtime conditions.

We consider that this tutorial is ideally co-located with ESWC 2012, because research institutions that traditionally attend ESWC contribute actively to the domain of RDF data management. In particular, one of the conference research tracks is on semantic data management, with query processing of semantic data among its topics of interest. Thus, many of the conference attendees could see the tutorial as a place to discuss possible solutions to current semantic data management limitations.
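
To make the idea of adapting execution to data availability more concrete, the following minimal Python sketch shows a binary symmetric hash join, one of the adaptive operators named in the tutorial outline. It is an illustration under stated assumptions, not the implementation of any engine discussed in the tutorial: the function name `symmetric_hash_join`, the `("L"/"R", tuple)` arrival format, and the sample bindings are all hypothetical. The operator keeps one hash table per input and, whenever a tuple arrives from either source, inserts it into its own table and probes the other, so answers are produced as soon as matching data is available rather than after one input has been fully read.

```python
from collections import defaultdict


def symmetric_hash_join(arrivals, key):
    """Yield (left, right) pairs incrementally as tuples arrive.

    arrivals: iterable of (side, tup), side is "L" or "R", in arrival order
              (hypothetical format standing in for two remote sources).
    key:      function that extracts the join key from a tuple.
    """
    # One hash table per input; neither input is privileged as "build" side.
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, tup in arrivals:
        k = key(tup)
        # Step 1: insert the new tuple into the table of its own side.
        tables[side][k].append(tup)
        # Step 2: probe the table of the opposite side for matches.
        other = "R" if side == "L" else "L"
        for match in tables[other].get(k, []):
            yield (tup, match) if side == "L" else (match, tup)


if __name__ == "__main__":
    # Interleaved arrivals, as if two endpoints answered at different speeds.
    arrivals = [
        ("L", {"drug": "d1", "name": "aspirin"}),
        ("R", {"drug": "d2", "target": "t9"}),
        ("R", {"drug": "d1", "target": "t1"}),
        ("L", {"drug": "d2", "name": "ibuprofen"}),
    ]
    for left, right in symmetric_hash_join(arrivals, key=lambda t: t["drug"]):
        print(left["name"], right["target"])
```

Because both inputs are hashed, a delay on one source does not block the operator from consuming the other; the cost of this design choice is that both hash tables must be kept in memory.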
3 Outline of the Tutorial

The goal of the tutorial is to highlight the limitations of existing RDF query engines, and to introduce the basic concepts of existing adaptive query processing techniques and how they can be used to effectively and efficiently access SPARQL endpoints.

3.1 Content

The tutorial will cover traditional data management solutions that implement the optimize-then-execute paradigm, and their pros and cons for Linked Data query processing; novel storage and access data structures, and query optimization and execution techniques implemented by state-of-the-art RDF engines will be described. Then, adaptive frameworks defined in the database area to manage remote query processing will be analyzed; adaptive operators such as symmetric hash joins (binary and n-ary), routing operators, and adaptive engines will be studied. Finally, the applicability of adaptive techniques will be illustrated with existing query processing engines for federations of SPARQL endpoints. Attendees will evaluate the performance and robustness of state-of-the-art approaches during a hands-on session; observed results will be discussed with the attendees.

3.2 Schedule

Morning Session

Introduction (20 minutes):
• Traditional data management system architecture and its main components.
• Basic terminology.

Lecture 1: The Optimize-then-Execute Paradigm (50 minutes):
• Cost-based optimization techniques.
• Traditional iterator model architecture.
• Centralized data management physical operators.
• Centralized data management query engines.

Lecture 2: Existing RDF Engines (50 minutes):
• Query optimization and execution techniques in existing RDF engines like RDF-3X [3].
• SPARQL endpoints and their execution model.
• The SPARQL 1.1 Federation extension [6].
• RDF engines for query processing against federations of SPARQL endpoints; approaches such as FedX [5] and ARQ [7] will be studied.

Coffee-Break (15 minutes)

Lecture 3: Adaptive Query Processing Techniques (100 minutes):
• Intra-operator solutions; adaptive physical operators: symmetric hash joins, n-ary joins.
• Inter-operator solutions; Eddy operators, query processing schedulers, and routing policies.
• Adaptive query engines.

Lunch (120 minutes)
  • 4. Afternoon Session Lecture 4: Adaptive Approaches for Federations of SPARQL endpoints(50 minutes): • Requirements for query processing in Federations of SPARQL endpoints. • Existing benchmarks for evaluating query processing engines for Federations of SPARQL endpoints, e.g., FedBench [4]. • Adaptive query processing engines for Federations of endpoints; approaches as ANAPSID [1] and Avalanche [2] will be studied. Coffee-Break (15 minutes) Hands-on Session: RDF Storage Systems Evaluation (100 minutes): existing benchmarks will be used to evaluate performance and robustness of state-of-the-art solutions; ARQ, FedX, ANAPSID and Avalanche will be analyzed. Analysis and Discussion of the Evaluation Results (30 minutes): results of the evaluation will be analyzed and discussed with the attendees. 4 Tutorial Former Editions The first edition of the tutorial named Adaptive Semantic Data Management Techniques for Linked Data, was held at ESWC 2011(http://www.eswc2011.org/content/tutorials); it was a half day tu- torial that did not include a hands-on session and the evaluation of state-of-the-art approaches as Avalanche, ARQ, ANAPSID and FedX. 5 Information of Presenters Edna Ruckhaus is a Full Professor of the Computer Science department at the Universidad Sim´ n o Bol´var, Venezuela since 1998, where she has taught several Database courses at undergrad- ı uate level. Visiting scholar of the research group Mindswap (Maryland Information and Net- work Dynamic Lab Semantic Web Agents Project), 2004-2005. Over 20 publications in in- ternational and national conferences and journals. She has been reviewer and has participated in the Program Committee of several International Conferences. Member of the Organizing Committee of the Workshop on Applications of Logic Programming to the Semantic Web and Semantic Web Services (ALPSWS2007) co-located with the International Conference on Logic Programming. 
Co-Chair of the Organizing Committee of the ESWC 2011 and 2012 Workshops on Resource Discovery; she co-organized and co-lectured the tutorial on Adaptive Semantic Data Management Techniques for Linked Data at ESWC 2011. Maria-Esther Vidal is a Full Professor of the Computer Science department at the Universidad Universidad Sim´ n Bol´var, Venezuela, where she has taught several Database and Semantic o ı Web courses at undergraduate and graduate level. Prof. Vidal has been also a Research Asso- ciate and Visiting Researcher at the Institute of Advanced Computer Studies of the University of Maryland, and Visiting Professor at Universidad Polit´ cnica de Catalunya, University of e Laguna Spain, and Leipzig, Germany. She has participated in several international projects supported by NFS (USA), AECI (Spain) and CNRS (France), and advised six PhD students and more than 55 master and undergraduate students. Professor Vidal has published more than 60 papers in International Conferences and Journals of the Database and The Semantic Web areas. She has been reviewer and has participated in the Program Committee of sev- eral International Journals and Conferences. Co-chair of Workshop on Resource Discovery 4
(RED2010) and an accompanying professor of the On the Move Academy (OTMa). She is a Co-Chair of the Organizing Committee of the ESWC 2011 and 2012 Workshops on Resource Discovery; she co-organized and co-lectured the tutorial on Adaptive Semantic Data Management Techniques for Linked Data at ESWC 2011.

Maribel Acosta is a PhD student at the Institute AIFB, Karlsruhe Institute of Technology, Germany. She holds a Master's degree in Computer Science from the Universidad Simón Bolívar, where she was a Teaching Assistant and taught Logic, Discrete Math, and Databases labs at the undergraduate level. She has seven publications in international conferences and workshops. Her topics of interest are Adaptive Query Execution techniques for Linked and Big Data.

Cosmin Basca is a PhD student at the Department of Informatics, University of Zurich, Switzerland. He holds a Master's degree in Computer Science from “Lucian Blaga” University of Sibiu, Romania, where he did research in image processing and computer vision. Later, as part of the Digital Enterprise Research Institute in Galway, Ireland, he focused his research on the Semantic Web, specifically Semantic Data Management. His research interests include, among others, large-scale distributed graph data management systems and algorithms, and Linked Data.

Gabriela Montoya is a Lecturer in the Computer Science Department at the Universidad Simón Bolívar, where she has taught Logic, Algorithms, and Programming Languages courses and labs at the undergraduate level. She holds a Master's degree in Computer Science from the Universidad Simón Bolívar and is currently a doctoral student at the same university; her topics of interest are Data Integration and Query Processing techniques in Emerging Infrastructures.

References

[1] M. Acosta, M.-E. Vidal, T. Lampo, J. Castillo, and E. Ruckhaus. ANAPSID: AN Adaptive query ProcesSing engIne for sparql enDpoints. In Proceedings of the International Semantic Web Conference (ISWC), 2011.
[2] C. Basca and A. Bernstein. Avalanche: Putting the Spirit of the Web back into Semantic Web Querying. In SSWS2010 Workshop, Shanghai, China, 2010.
[3] T. Neumann and G. Weikum. RDF-3X: a RISC-style engine for RDF. Proc. VLDB, 1(1), 2008.
[4] M. Schmidt, O. Görlitz, P. Haase, A. Schwarte, G. Ladwig, and T. Tran. FedBench: A benchmark suite for federated semantic data query processing. In International Semantic Web Conference, 2011.
[5] A. Schwarte, P. Haase, K. Hose, R. Schenkel, and M. Schmidt. FedX: Optimization techniques for federated query processing on linked data. In International Semantic Web Conference (1), pages 601–616, 2011.
[6] E. P. Steve Harris, Andy Seaborne. SPARQL 1.1 Query Language, June 2010.
[7] M. Stocker, A. Seaborne, A. Bernstein, C. Kiefer, and D. Reynolds. SPARQL basic graph pattern optimization using selectivity estimation. In International Semantic Web Conference (ISWC), Beijing, China, 2008. ACM.