Triple-Triple RDF Store with Greedy Graph based Grouping

Transcription

  1. Title: Triple-Triple RDF store with Greedy Graph Based Grouping
Name: Vinoth Chandar (vinothc@cs.utexas.edu)
Partner's Name: -
RDBMS Platform: MySql 5.0
  2. Triple-Triple RDF Store with Greedy Graph Based Grouping

Abstract
Triple stores, based on relational databases, have received wide attention amongst database researchers and Semantic web enthusiasts. Improving query performance on very large RDF datasets is a challenging problem, which needs to be addressed for efficient implementation of such triple stores. In this paper, we explore promising approaches to address the problem. We explore the possibility of storing the RDF triples in different orders in three tables and develop a query rewriting scheme for the same. We also look at optimization of the physical schema by graph clustering techniques that aim to bring related triples closer to each other on disk. We also present experimental results from the implementation of the scheme over a million triples. Our results show that our scheme can yield significant performance benefits on complex queries.

1. Introduction
In recent years, RDF [1] stores or Triple stores, which can store information about (subject, property, object) triples of ontologies, have received significant attention from database researchers. Many efforts have been made to implement RDF stores using relational databases, and to devise efficient schemes for accessing information from such stores. These efforts are focused towards the larger vision of the 'Semantic Web'. To realize this vision, RDBMS based Triple stores should be able to store and query enormous amounts of triples that describe web pages on the Internet. The Billion Triple challenge [2] serves as a common platform, on which the state of the art is evaluated and its progress towards the vision is assessed. Efficiently querying such Triple stores presents many challenges. Since RDF is essentially a graph based data format, queries involve multiple joins and become really slow when scaling to a billion triples. If the 'Semantic Web' vision is to be realized, then we require very fast query retrieval techniques, since the long response times of these systems would be unacceptable to a normal Internet user.

In this paper, we explore promising new ideas for Triple store implementation. In particular, we take the Triple-Triple idea (explained later) to its logical conclusion, and develop SPARQL to SQL query rewriting mechanisms for the same. We further enhance the Triple-Triple idea by introducing a computationally feasible clustering scheme that attempts to reduce the number of disk pages accessed, by moving related subjects/objects/properties closer to each other on disk. In fact, this clustering scheme can be applied to any general indexing scheme for Triple stores.

Section 2 details related work in this area. Section 3 presents various approaches that were considered for improving query performance. Section 4 presents a query rewriting technique corresponding to the Triple-Triple idea. Section 5 identifies and analyzes the benefits of grouping related triples in the same data block, to reduce the number of disk IO operations. Section 6 presents the experimental results and Section 7 concludes.

2. Related Work
[3] establishes the validity of using relational databases to store and query ontologies. The paper extends SQL with a set of ontology related operators that can help obtain more
  3. pertinent results for ontology driven applications. At the same time, the applications can also benefit from the efficient storage and retrieval mechanisms in a RDBMS.

Simplistically, the baseline physical model for storing ontologies in RDF format using a RDBMS consists of two tables – a Symbol table and a Triples table (refer figure 1). An ontology describing the elements of the Web contains URLs and URIs, which are long strings (the lex field in SymbolTable). To avoid redundancy and wastage of disk space, these elements are assigned unique integer identifiers (the hash field in SymbolTable). This mapping from element to identifier is stored in the Symbol table. The Triples table has three columns – s (subject), p (property), o (object) – as per RDF conventions, and each tuple in the table represents an RDF triple. The table has a compound primary key on all three columns. Such a naive representation of the triples enables us to analyze clearly where the benefits come from, when evaluating much more sophisticated physical schema.

Figure 1: Baseline Physical model

Many research efforts have attempted to propose alternate physical schema and improved SPARQL to SQL query rewriting techniques, to improve query performance over the baseline model. This is based on the realization that the baseline model can be used as a simple logical data model alone. The Kowari metastore [4] proposes an RDF store based on AVL trees with each triple stored in three different orders – spo, osp, pos – to help lookups based on each of the three elements in the triple. However, it works with its own query language rather than using the general purpose SPARQL and RDQL. Abadi et al [5] pursue an interesting path, arguing towards having a table for each property, with the associated subject and object that are related by that property. Such a vertical partitioning approach tends to reduce the query response time by incorporating fast linear merge joins, when each table is sorted by the subject or object. However, such an approach inherently assumes that the queries are property bound. A non-property bound query would require us to query across all the tables. Hexastore [6] furthers the multiple indexing approach taken by Kowari, by storing the three elements of a triple in six different orders. For example, the spo ordering is stored as a sorted list of subjects, with each subject pointing to another sorted list of properties defined for that subject. Each property in such a sorted properties list points to a sorted list of objects defined for that subject, property pair. Thus, all joins can be converted into fast linear merge joins. Hexastore occupies five times more space than a single triples table. However, this is acceptable with ever falling storage costs.

3. Promising Directions
We will now explore some promising directions in which we can make further improvements over the baseline physical model.
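The baseline model just described (a Symbol table mapping long lexical values to integer ids, plus a single Triples table with a compound primary key on all three columns) can be sketched in a few lines. This is our illustrative reconstruction, using sqlite3 in place of MySql and invented sample data, not code from the paper:

```python
import sqlite3

# Baseline physical model: Nodes is the symbol table (lex -> hash),
# Triples holds integer (s, p, o) rows keyed on all three columns.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Nodes (hash INTEGER PRIMARY KEY, lex TEXT UNIQUE);
CREATE TABLE Triples (s INTEGER, p INTEGER, o INTEGER,
                      PRIMARY KEY (s, p, o));
""")

def node_id(lex):
    # Assign (or look up) the integer identifier for a lexical value.
    db.execute("INSERT OR IGNORE INTO Nodes(lex) VALUES (?)", (lex,))
    return db.execute("SELECT hash FROM Nodes WHERE lex = ?",
                      (lex,)).fetchone()[0]

def add_triple(s, p, o):
    db.execute("INSERT OR IGNORE INTO Triples VALUES (?,?,?)",
               (node_id(s), node_id(p), node_id(o)))

# Invented example data, for illustration only.
add_triple("ex:Product444", "rdfs:label", '"Widget"')
add_triple("ex:Product444", "ex:producer", "ex:Producer9")
count = db.execute("SELECT COUNT(*) FROM Triples").fetchone()[0]
print(count)  # 2
```

Storing only integers in Triples keeps the rows narrow; the long URIs appear once each in Nodes.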
  4. We will detail each idea and also present some motivation for pursuing [or abandoning] the idea. We also present concrete and motivating examples for our arguments, using MySql.

3.1 Applicability of Spatial Indexes
Potentially faster query execution times can be achieved if the joins between the triples are materialized in euclidean space, in terms of fast minimum bounding rectangle (MBR) operations. For example, simply storing each triple as a (s,p), (p,o) line segment will materialize subject-subject joins as an MBRcontains operation between the line segment (s, minhash)-(s, maxhash) and the start point of each triple line segment. Minhash and maxhash are the minimum and maximum integer ids from the symbol table. However, support for R-Tree operations remains limited in commercial DBMS. MySql does not support spatial joins. Postgresql does not support R-Trees. Only Oracle Enterprise supports spatial joins. We were unable to pursue this direction further, due to the non-availability (or rather infeasibility) of an Oracle installation.

3.2 Triple-Triple
One promising idea is to create three redundant copies of the triples table with the compound primary keys – spo, pos, osp – such that each table has tuples sorted on subject, property and object respectively, in the order of occurrence in the primary keys of the tables. Figure 2 presents the Triple-Triple physical model.

Figure 2: Triple-Triple Physical model

From here on, we will refer to a table by its primary key, i.e. the spo table will denote the triples table with (s,p,o) as the compound primary key. For the spo table, the triples will be clustered based on subjects, then on properties and then on objects. Such a clustering ensures that the triples are stored in sorted order on disk and hence fast linear merge joins can be applied. Note that this scheme requires only 3 times the space of the single triples table described in the baseline model. Hence, this approach is definitely promising. The primary compound index is also useful for any query that involves a prefix of the compound key. For example, the spo table can answer select queries based
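A minimal sketch of the Triple-Triple layout: three copies of the triples table whose compound primary keys impose the spo, pos and osp sort orders. sqlite3 stands in for MySql here and the data is invented; the point is that a select on any prefix of a key (s, sp, p, po, ...) is served by a clustered primary index in one of the tables:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Three redundant copies of the triples table; each compound
# primary key clusters the rows in a different sort order.
for name, key in [("Triples_s", "(s, p, o)"),   # spo table
                  ("Triples_p", "(p, o, s)"),   # pos table
                  ("Triples_o", "(o, s, p)")]:  # osp table
    db.execute(f"CREATE TABLE {name} (s INT, p INT, o INT, PRIMARY KEY {key})")

# Invented integer-id triples; every triple goes into all three tables.
triples = [(1, 10, 100), (1, 11, 101), (2, 10, 100)]
for t in triples:
    for name in ("Triples_s", "Triples_p", "Triples_o"):
        db.execute(f"INSERT INTO {name} VALUES (?,?,?)", t)

# A property-bound select is answered by the pos table, where rows
# sharing a property are contiguous in primary-key order.
rows = db.execute("SELECT s, o FROM Triples_p WHERE p = 10 "
                  "ORDER BY o, s").fetchall()
print(rows)  # [(1, 100), (2, 100)]
```

The 3x storage cost is the trade for being able to pick a table whose key prefix matches each lookup.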
  5. on s, sp, spo columns, using the primary index.

Though MySql does not support merge joins [7], the idea still would yield faster execution times, since the sorted nature of the data ensures efficient use of the index. Remember that, even with a secondary index, if the relevant tuples are spread across the table in different blocks, the end performance could be worse than a table scan. For the Semantic web vision, MySql plays a pivotal role since many web sites are powered by MySql. This gives enough motivation to continue exploring the idea using MySql. For example, a simple three table join on 25K triples, using Triple-Triple, yields very encouraging benefits over the baseline model. In Figure 3, Triples_s denotes the spo table, Triples_p denotes the pos table and Triples_o denotes the osp table. The Triples table denotes the baseline triples table. The Nodes table will denote the symbol table. All following examples in the paper will use the same conventions to denote the corresponding tables.

mysql> select * from Triples_s t_s, Triples_o t_o, Triples_p t_p
       where t_s.s = t_o.o and t_s.s = t_p.p;
Empty set (0.28 sec)

mysql> select * from Triples t_1, Triples t_2, Triples t_3
       where t_1.s = t_2.o and t_1.s = t_3.p;
Empty set (2 min 10.83 sec)

Figure 3: Benefits of Triple-Triple over baseline

In comparison to the Hexastore, the Triple-Triple stores only three possible orderings of the elements in a triple. We explore whether these orderings are sufficient for answering the same range of queries answered by the Hexastore. The only possible joins are subject-subject joins, object-object joins, property-property joins and subject-object joins, with all joins being equijoins. Hence, mechanisms for using the Triple-Triple should judiciously choose which table to use for each join operation – spo or pos or osp. Also, we must be able to support selects on any combination of the three triple elements. These decisions are listed in Table 1.

Table 1: Answering queries using Triple-Triple
Operation                                     Method
subject-subject join                          spo JOIN spo
property-property join                        pos JOIN pos
object-object join                            osp JOIN osp
subject-object join                           spo JOIN osp
subject select                                spo
property select                               pos
object select                                 osp
subject-property / property-subject selects   spo [no need for a separate pso]
subject-object / object-subject selects       osp [no need for a separate sop]
object-property / property-object selects     pos [no need for a separate ops]
subject-property-object select                any table

Hence, to our understanding, the three possible orderings – spo, pos, osp – are sufficient for handling the same set of queries that the Hexastore handles. The only missing piece in building a complete Triple-Triple store is to define mechanisms to convert sparql to sql queries, using the appropriate tables for each triple. We explore this problem in the next section.

3.3 Applicability of Secondary Indexes
It is tempting to create secondary indexes on other columns,
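The decisions in Table 1 are mechanical enough to capture in a small lookup. The sketch below is our own (the names are invented); it simply returns the table whose primary key begins with the bound columns:

```python
# Table choice rules from Table 1: pick the triples table whose
# compound primary key has the bound / joined elements as a prefix.
JOIN_TABLES = {
    ("s", "s"): ("spo", "spo"),  # subject-subject join
    ("p", "p"): ("pos", "pos"),  # property-property join
    ("o", "o"): ("osp", "osp"),  # object-object join
    ("s", "o"): ("spo", "osp"),  # subject-object join
}

SELECT_TABLES = {
    frozenset("s"): "spo",
    frozenset("p"): "pos",
    frozenset("o"): "osp",
    frozenset("sp"): "spo",   # no need for a separate pso
    frozenset("so"): "osp",   # no need for a separate sop
    frozenset("po"): "pos",   # no need for a separate ops
    frozenset("spo"): "spo",  # fully bound: any table works
}

def table_for_select(bound_cols):
    """Return the table whose primary key starts with the bound columns."""
    return SELECT_TABLES[frozenset(bound_cols)]

print(table_for_select("po"))  # pos
print(table_for_select("s"))   # spo
```

Because the rules depend only on which elements are bound, a sparql-to-sql compiler can apply them deterministically, with no cost-based search.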
  6. that do not form a prefix of the primary key of the triples table (both in the baseline and the Triple-Triple idea). For example, the spo table can have secondary indexes on po and o, to speed up selects on those columns. However, initial experiments showed no improvement in the query response times, since the bulk of the benefit was from the primary index, or the values were split across multiple disk pages so that the secondary index did not make sense. For example, there are relatively very few properties in the data when compared to subjects or objects. Hence, a secondary index would not be beneficial here. Along similar lines, a secondary index on the lex field of the symbol table did not yield significant benefits. Hence, we stick with the Triple-Triple baseline model.

4. Add-Join
Converting SPARQL to a SQL query on the baseline is straightforward. When deciding which tables to use for each triple in the Triple-Triple store, we can be faced with interesting tradeoffs. For example, consider the SPARQL query in figure 4.

SELECT ?label ?producer ?comment
WHERE {
t1   dataFromProducer9:Product444  rdfs:label ?label .
t2   dataFromProducer9:Product444  rdfs:comment ?comment .
t3   dataFromProducer9:Product444  bsbm:producer ?p .
t4   ?p  rdfs:label ?producer }
Figure 4: Tradeoff in SQL conversion

The triples involved in the sparql query are marked t1, t2, t3 and t4. It is easy to observe that t1 joins t2, t3, t4 on the subject, and t3 joins t4 using a subject-object join. Our task is to select one of the three triple tables – spo, pos, osp – judiciously, so that all joins make use of the primary index and hence the fastest access path to the relevant data is adopted. Since all joins of t1 are subject joins, we safely choose the spo table for t1. Since all the joins involving t4 are subject joins, we choose the spo table for t4. However, t3 presents a difficulty. For the t1,t3 join to be efficient, we need to use the spo table for t3. For the t3,t4 join to be efficient, we need to pick the osp table for t3. Clearly, only one of these options is possible. In general, a sparql to sql compiler for the Triple-Triple has to make these hard decisions dynamically at runtime. From our experiments, we noticed that the MySql optimizer (or any other DBMS) does not do a good job of choosing the right join order for the tables, and substantial performance gains can be achieved by simply rewriting the query with an explicitly specified join order. These are hard search problems and thus, even in the context of the Triple-Triple, the sparql compiler cannot be expected to do a very good job in optimizing the choice of tables.

Hence, we adopt a method we term Add-Join, which tries to achieve the best of both worlds by using multiple triples tables for a single triple in the SPARQL query. In effect, we add extra joins to the resulting sql query. But, as we show, the cost of such additional joins is in no way prohibitive and can be traded off in return for a deterministic, simple sparql-sql compilation technique. For example, in the above example, we use two tables for the t3 triple – t3_o [which is an osp table] and t3_s [which is a spo table]. We join t1 with t3_s and t3_o with t4, and finally join t3_s and t3_o on all three columns. The final join is very fast since it involves all three columns, so that the primary index can be used. In effect, all of the joins in the query can use the clustered primary index. Figure 5 shows that the additional joins are not prohibitively expensive, using the same example. Also, when rewriting the query for the Triple-Triple, we must ensure that we start with as few rows as possible. Since MySql uses a single-sweep multi join
  7. algorithm [8], this would ensure that we try to match as few tuples as possible in each stage of the multi join.

Add-Join:
select * from Triples_s as t1
STRAIGHT_JOIN Triples_s as t3_s
STRAIGHT_JOIN Triples_o as t3_o
STRAIGHT_JOIN Triples_s as t4
where t1.s = -1280067686087321383 and t1.s = t3_s.s
and t3_o.o = t4.s and t3_s.s = t3_o.s
and t3_s.p = t3_o.p and t3_s.o = t3_o.o;
0.2 sec

Use spo for t3:
select * from Triples_s as t1
STRAIGHT_JOIN Triples_s as t3
STRAIGHT_JOIN Triples_s as t4
where t1.s = -1280067686087321383 and t1.s = t3.s
and t3.o = t4.s;
0.17 sec

Figure 5: Cost of additional joins

Hence, in addition to joins, we can also involve multiple triple tables for a single triple when there is an initial select operation on the triple. For example, though a triple joins on s, it might involve a select on p as a bound input value. In those cases, selecting p using the spo table may not be efficient. Hence, we introduce an additional pos table for the triple and perform the select upon it, later joining the pos and spo tables. We now present the algorithm to convert SPARQL to SQL, based on the Add-Join method.

Query Rewriting method:
Step 0. Convert the SPARQL query to sql, on the baseline.
Step 1. Identify the triples that have bound values for their elements, i.e. the input for the sql query.
Step 2. In the explicit join order that we intend to provide, start with the triple with bound input values and follow it with triples that join with that triple.
Step 3. For selects on the non-join attributes, insert entries for 'Nodes' as necessary in the explicit order.
Step 4. Once the explicit ordering is done, introduce additional triple tables for each triple, as per the Add-Join method described above.

5. Graph Based Grouping
We observe that further benefits can be achieved only through physical optimizations. One interesting observation we make is that selects on the triples table can be improved if the related tuples are brought closer to each other on disk. For example, consider the spo table with selects using po, p, o on it. This is a common scenario when triples with different properties are joined on the subject. Remember that we introduce additional joins only for the joins between triples and for selects using bound input values. By bringing the subjects with common properties and objects closer to each other, we, in effect, reduce the number of disk pages across which the result of the select operation is distributed. Thus, such a scheme would result in direct reductions in the I/O cost for the operation. The same scenario can be applied to the other two tables as well, bringing together related properties and objects respectively.

The integer identifiers assigned to the elements of the triples determine the order in which the triples appear on disk. Right now, these ids are assigned randomly. Hence, we cannot use general purpose clustering schemes based on euclidean distances to group the related triples together. Also, without the Triple-Triple, it would be impossible to give equal priority to each of subject, property and object. This is because the first element of the compound primary key determines the order on disk, and hence a clustering scheme has to choose between the three elements. Another approach is to abandon the primary keys altogether and define a physical representation that brings related triples close to each other based on all three columns. However, such a scheme would compromise on having data in sorted order. The Triple-Triple idea lends flexibility by allowing us to optimize with respect to subject, property and
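Step 2 of the rewriting method above can be sketched as a simple traversal that starts from the bound triple patterns and pulls in joining patterns, producing the explicit join order that is then handed to MySql via STRAIGHT_JOIN. The pattern representation and function name are our own, and SPARQL parsing is elided:

```python
from collections import deque

def explicit_join_order(bound, joins):
    """Sketch of Step 2: start with the triple patterns that have
    bound input values and append patterns as they join in, yielding
    the explicit join order for the rewritten SQL query."""
    order, seen = [], set()
    queue = deque(sorted(bound))
    while queue:
        t = queue.popleft()
        if t in seen:
            continue
        seen.add(t)
        order.append(t)
        # Follow every join edge touching t to reach new patterns.
        for a, b in joins:
            if t == a and b not in seen:
                queue.append(b)
            elif t == b and a not in seen:
                queue.append(a)
    return order

# The Figure 4 query: t1 carries the bound subject; t1 joins t2 and
# t3 on the subject, and t3 joins t4 with a subject-object join.
joins = [("t1", "t2"), ("t1", "t3"), ("t3", "t4")]
print(explicit_join_order({"t1"}, joins))  # ['t1', 't2', 't3', 't4']
```

Starting from the bound pattern keeps the row count small at each stage of MySql's single-sweep multi join, which is the property Step 2 is after.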
  8. object using the spo, pos and osp tables respectively. Hence, the integer ids need to be assigned intelligently in order to leverage these benefits. We now define metrics which quantify the strength of the relationship between two elements. S-score, P-score and O-score denote the metrics for interrelating subjects, properties and objects respectively.

S-score(s1, s2) = number of triple pairs t1, t2 with (t1.s = s1 and t2.s = s2) and (t1.p = t2.p or t1.o = t2.o)

defines the S-score for two subjects s1 and s2. P-score and O-score are defined similarly. Once we have these metrics computed, we build three graphs – S-graph, P-graph, O-graph – that depict the relationships between subjects, properties and objects, using S-score, P-score and O-score as edge weights respectively. The S-graph has a vertex for each subject and the S-score between two subjects as the edge weight. Note that no subject or object occurs as a property. Hence, the problem of assigning ids to properties can be solved independently of the other two elements. However, some subjects also occur as objects, and only one id can be assigned to such an element. We therefore prune the O-graph by removing all vertices and edges corresponding to such overlapping subjects.

Let S denote a cluster and Si denote a subject belonging to S.
Objective:
  Max: Σ S-score(Si, Sj) over all i, j with i != j
Constraints:
  Σ numTriples(Si) <= B
where numTriples(Si) denotes the number of triples with subject Si, and B denotes the number of triples per block.
Figure 6: Optimal clustering of subjects

Each disk page can be viewed as a cluster or group, and the problem of finding the most related subjects can be formulated as an optimization problem, as described in figure 6. The formulation can be suitably generalized to P-graphs and O-graphs also. The problem is an instance of the Knapsack constrained Maximum Weighted cluster problem [9]. The formulation aims to extract the cluster from the graph such that the sum of all the edge weights in the cluster is maximal, subject to the constraint that there can be at most B triples on a block. For MySql, which uses 16Kb data blocks, B = 16Kb/12 ≈ 1300. Once such a maximal cluster is extracted, we assign consecutive ids to all the elements of the cluster. We then repeat the algorithm, pruning the graph by removing the assigned edges. However, in practice, this problem is NP hard and grows computationally unrealistic with large data sets which involve thousands of subjects. The S-graph is also very dense, and this complicates the solution further. For example, a 25K triples database contains 2367 subjects and 200K edges. Hence, when we scale to a billion triples, the graph construction itself may become a very long process. There are other graph clustering tools, such as MCL [10] and Graclus [11], for unsupervised learning from graphs. Though these tools do not solve the exact problem described above, they attempt to provide clusters from the graph based on connected components. Attempts at hierarchical clustering using the MCL algorithm yield only 3 clusters, reflecting the dense nature of these graphs and the non-applicability of standard graph clustering techniques. Hence, we attempt to develop computationally feasible greedy algorithms for the problem. One such greedy algorithm is described in Figure 7. The algorithm greedily constructs parts of the relationship graphs and assigns identifiers based on such partial graphs. The algorithm closely approximates the optimal solution for certain parts of the graph. Nonetheless, it is suboptimal since we ignore the
  9. strength of the relationships between the discovered subjects Si.

While there is a subject S [that has not been assigned an id]:
- assign the next available id to S
- compute all the subjects Si related to S, i.e. that have a non-zero S-score with S
- compute S-score(S, Si) for all such discovered subjects
- assign ids to the Si in increasing order of S-score(S, Si) till Σ numTriples(Si) <= B
Figure 7: Greedy Grouping

In section 6, we validate the effectiveness of this algorithm. Once again, the id assignment for properties can be done in a symmetric fashion. For the objects, we need to additionally ignore objects which are also subjects.

6. Empirical results
We present empirical results that demonstrate the validity of the approaches proposed in this paper. Specifically, we study the query performance compared to the baseline, and the real benefits of the grouping technique described in the previous section. Our experimental setup is detailed in Table 2.

Table 2: Platform details
RDBMS            MySql 5.0
OS               Ubuntu 8.04
Processor        AMD Turion TL-58, 1.9 GHz (32-bit mode)
L1 cache (KB)    128
L2 cache (KB)    512
FSB speed (MHz)  800
RAM (GB)         2
Disk             160 GB, 5400 RPM
Buffering (MB)   8

6.1 Query Performance
For each scheme that we evaluate, we define a metric called speed up to compare it against the baseline. Speed up is simply the query response time for the baseline divided by the query response time for the scheme, for the same query. The higher the speed up, the better the scheme; it indicates how large a performance improvement the scheme achieves. Figures 8 and 9 present the speed ups for three schemes – Rewrite (simply rewriting the query by explicitly specifying the join order), Add-Join, and Add-Join with grouping. The results are presented for queries 3, 4, 5 and 6 [12]. Queries 4 and 5 are typical examples of the complex queries that cause scalability problems for applications.

The improvements on query 3 and query 6 are not significant. The baseline timings for those queries are not very large in the first place, and they involve fewer joins. For example, query 6 involves only one triple; Add-Join uses two tables for the query and offers the same performance as the baseline. There are very significant benefits on queries 4 and 5. The benefits due to the Add-Join method account for the bulk of the improvement over the baseline, amounting to approximately 50x. The direct benefits due to the grouping technique amount to a further 10x-20x over the baseline, when compared to the Add-Join method without grouping. It remains to be seen if better grouping techniques can yield significantly higher benefits. However, these results
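Figure 7's greedy grouping might be sketched as follows, assuming the S-scores and per-subject triple counts have been precomputed (the toy scores below are invented). Note that the figure assigns ids in increasing order of S-score, and the sketch follows that literally:

```python
def greedy_grouping(subjects, s_score, num_triples, B):
    """Sketch of Figure 7: while some subject S has no id, give it the
    next id, find the subjects with a non-zero S-score to S, and assign
    them ids in increasing order of S-score(S, Si) until the block
    budget of B triples is exhausted."""
    assigned, next_id = {}, 0
    for S in subjects:
        if S in assigned:
            continue
        assigned[S] = next_id
        next_id += 1
        budget = B - num_triples[S]
        related = [t for t in subjects if t not in assigned
                   and s_score.get(frozenset((S, t)), 0) > 0]
        related.sort(key=lambda t: s_score[frozenset((S, t))])
        for t in related:
            if num_triples[t] > budget:
                break  # block is full; start a new partial graph
            assigned[t] = next_id
            next_id += 1
            budget -= num_triples[t]
    return assigned

# Invented toy edge weights between subjects a..d, one triple each.
scores = {frozenset(("a", "b")): 3,
          frozenset(("a", "c")): 1,
          frozenset(("c", "d")): 2}
ids = greedy_grouping(["a", "b", "c", "d"], scores,
                      {s: 1 for s in "abcd"}, B=2)
print(ids)  # {'a': 0, 'c': 1, 'b': 2, 'd': 3}
```

Because only the partial graph around each seed subject is ever built, the quadratic blow-up of constructing the full S-graph is avoided, which is what makes the scheme computationally feasible.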
  10. demonstrate the validity of the techniques described earlier.

Figure 8: Speedup from baseline (250K Triples)
Figure 9: Speedup from baseline (1M Triples)
  11. 6.2 Validation of Grouping scheme
It is also important to validate the grouping scheme presented in section 5. We expect the grouping scheme to reduce the number of disk I/Os for selects on the triples tables. Table 3 presents the number of rows the query optimizer expects to examine for answering selects on the triple tables. It can be seen that the grouping has resulted in a decrease in the number of rows examined for the query.

Table 3: Expected number of rows accessed for selects
Table       No grouping   With grouping
spo table   1000619       973866
pos table   1000619       805152
osp table   1000619       828004

We also measure the amount of interrelationship between the triples in each block, with and without grouping, to observe the effectiveness of our grouping algorithm. For each disk block [i.e. a set of 1300 tuples] we construct S-graphs, P-graphs and O-graphs representing those clusters. We then compute the sum of all the edge weights of each such cluster and average it across all data blocks. This allows us to quantify the effectiveness of our grouping scheme. Ideally, we should also be comparing our grouping scheme against the optimal solution. However, the optimal solution is very hard to compute, as mentioned earlier, and also cannot be predicted accurately since it depends on the nature of the triples. We divide the average edge weight for a table with grouping by the average edge weight for the corresponding table without grouping to obtain a metric called the R-score, or relationship score, for those two tables. Figure 10 presents the R-scores for all three triples tables for the 250K and 1M triple tables.

Figure 10: R-score

The results indicate that the grouping algorithm has been quite effective, increasing the interrelationship by 2x-10x over random id assignment. The R-score for the pos table is lower since there are fewer properties, and hence, in a number of cases, a single data block is full of triples from the same property. In fact, the average edge weight for the pos table is much lower than those of the spo and osp tables. The values are higher for the osp table since there are far more objects than subjects, in which case there are more edges in the constructed cluster graph.

7. Conclusions and Future work
Thus, we have explored some promising approaches to improving query performance in relational triple stores. An interesting query rewriting mechanism which introduces additional joins to speed up query execution has been discussed. Optimization of the physical schema, by leveraging the interrelationship between the elements of a triple, has been proposed. A greedy grouping algorithm which is simple and computationally feasible has been
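The R-score computation described above reduces to a ratio of two per-block averages. A sketch with invented per-block edge-weight sums:

```python
def r_score(block_weights_grouped, block_weights_random):
    """R-score: average per-block sum of cluster-graph edge weights
    with grouping, divided by the same average without grouping.
    Values above 1 mean grouping packed more related triples per block."""
    avg = lambda ws: sum(ws) / len(ws)
    return avg(block_weights_grouped) / avg(block_weights_random)

# Invented numbers: per-block edge-weight sums for one triples table.
grouped = [120.0, 80.0, 100.0]   # ids assigned by greedy grouping
random_ids = [20.0, 30.0, 25.0]  # random id assignment
print(r_score(grouped, random_ids))  # 4.0
```

The per-block edge-weight sums themselves would come from building the S-, P- and O-graphs over each 1300-tuple block, as described in the text.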
  12. proposed and validated. The results show that our approach is promising and can potentially be combined with other techniques in the literature to yield faster RDF stores. As part of future work, we intend to compare the performance of the system with the Hexastore, and potentially enhance the Hexastore with our grouping algorithm. As mentioned earlier, we would like to develop better grouping algorithms by leveraging parallel computing techniques to overcome the computational issues. We believe that better grouping algorithms can yield significantly higher performance. Another key observation we make is that no physical schema will perform best for all types of queries. Hence, with the cheap availability of storage, multiple physical schema can coexist within the same RDF store, and the SPARQL compiler can judiciously employ them based on the type of the query.

8. References
[1] Resource Description Framework. http://www.w3.org/RDF/
[2] The Billion Triple Challenge. http://iswc2008.semanticweb.org/calls/call-for-semantic-web-challenge-and-billion-triples-tracks/
[3] Das, S., Chong, E. I., Eadon, G., and Srinivasan, J. 2004. Supporting ontology-based semantic matching in RDBMS. In Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30 (Toronto, Canada, August 31 - September 03, 2004). M. A. Nascimento, M. T. Özsu, D. Kossmann, R. J. Miller, J. A. Blakeley, and K. B. Schiefer, Eds. VLDB Endowment, 1054-1065.
[4] David Wood, "Kowari: A Platform for Semantic Web Storage and Analysis". In XTech 2005 Conference.
[5] Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K. 2007. Scalable semantic web data management using vertical partitioning. In Proceedings of the 33rd International Conference on Very Large Data Bases (Vienna, Austria, September 23-27, 2007). VLDB Endowment, 411-422.
[6] Weiss, C., Karras, P., and Bernstein, A. 2008. Hexastore: sextuple indexing for semantic web data management. Proc. VLDB Endow. 1, 1 (Aug. 2008), 1008-1019. DOI= http://doi.acm.org/10.1145/1453856.145396
[7] Nested-Loop Join Algorithms. http://dev.mysql.com/doc/refman/5.0/en/nested-loop-joins.html
[8] Using Explain Syntax. http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
[9] Anuj Mehrotra and Michael A. Trick, "Cliques and Clustering: A Combinatorial Approach".
[10] MCL: An algorithm for clustering graphs. http://www.micans.org/mcl/
[11] Graclus. http://www.cs.utexas.edu/users/dml/Software/graclus.html
[12] Class project website. http://www.cs.utexas.edu/~jsequeda/cs386d/project.html

Transcripción

  1. 1. Title: Triple-Triple RDF store with Greedy Graph Based Grouping Name: Vinoth Chandar ( vinothc@cs.utexas.edu ) Partner's Name: - RDBMS Platform: MySql 5.0
Triple-Triple RDF Store with Greedy Graph Based Grouping

Abstract
Triple stores, based on relational databases, have received wide attention amongst database researchers and Semantic Web enthusiasts. Improving query performance on very large RDF datasets is a challenging problem, which needs to be addressed for an efficient implementation of such triple stores. In this paper, we explore promising approaches to this problem. We explore the possibility of storing the RDF triples in different orders in three tables, and develop a query rewriting scheme for the same. We also look at optimization of the physical schema by graph clustering techniques that aim to bring related triples closer to each other on disk. We also present experimental results from an implementation of the scheme over a million triples. Our results show that our scheme can yield significant performance benefits on complex queries.

1. Introduction
In recent years, RDF [1] stores, or Triple stores, which can store information about (subject, property, object) triples of ontologies, have received significant attention from database researchers. Many efforts have been made to implement RDF stores using relational databases, and to devise efficient schemes for accessing information from such stores. These efforts are focussed towards the larger vision of the 'Semantic Web'. To realize this vision, RDBMS-based Triple stores should be able to store and query enormous amounts of triples that describe web pages on the Internet. The Billion Triple challenge [2] serves as a common platform, on which the state of the art is evaluated and its progress towards the vision is assessed.

Efficiently querying such Triple stores presents many challenges. Since RDF is essentially a graph-based data format, queries involve multiple joins and become really slow when scaling to a billion triples. If the 'Semantic Web' vision is to be realized, then we require very fast query retrieval techniques, since the long response times of these systems would be unacceptable to a normal Internet user.

In this paper, we explore promising new ideas for Triple store implementation. In particular, we take the Triple-Triple idea (explained later) to its logical conclusion, and develop SPARQL to SQL query rewriting mechanisms for the same. We further enhance the Triple-Triple idea by introducing a computationally feasible clustering scheme that attempts to reduce the number of disk pages accessed, by moving related subjects/objects/properties closer to each other on disk. In fact, this clustering scheme can be applied to any general indexing scheme for Triple stores. Section 2 details related work in this area. Section 3 presents various approaches that were considered for improving query performance. Section 4 presents a query rewriting technique corresponding to the Triple-Triple idea. Section 5 identifies and analyzes the benefits of grouping related triples in the same data block, to reduce the number of disk IO operations. Section 6 presents the experimental results and Section 7 concludes.

2. Related Work
[3] establishes the validity of using relational databases to store and query ontologies. The paper extends SQL with a set of ontology-related operators that can help obtain more pertinent results for ontology-driven applications.
At the same time, the applications can also benefit from the efficient storage and retrieval mechanisms in an RDBMS. Simplistically, the baseline physical model for storing ontologies in RDF format using an RDBMS consists of two tables – a Symbol table and a Triples table (refer to Figure 1). An ontology describing the elements of the Web contains URLs and URIs, which are long strings (the lex field in SymbolTable). To avoid redundancy and wastage of disk space, these elements are assigned unique integer identifiers (the hash field in SymbolTable). This mapping from the elements to the identifiers is stored in the Symbol table. The Triples table has three columns – s (subject), p (property), o (object) – as per RDF conventions, and each tuple in the table represents an RDF triple. The table has a compound primary key on all three columns. Such a naive representation of the triples enables us to analyze clearly where the benefits come from, when evaluating much more sophisticated physical schemas.

Figure 1: Baseline Physical model

Many research efforts have attempted to propose alternate physical schemas and improved SPARQL to SQL query rewriting techniques, to improve query performance over the baseline model. This is based on the realization that the baseline model can be used as a simple logical data model alone. The Kowari metastore [4] proposes an RDF store based on AVL trees, with each triple stored in three different orders – spo, osp, pos – to help lookups based on each of the three elements in the triple. However, it works with its own query language rather than using the general-purpose SPARQL and RDQL. Abadi et al. [5] pursue an interesting path, arguing towards having a table for each property, with the associated subjects and objects that are related by that property. Such a vertical partitioning approach tends to reduce query response time by incorporating fast linear merge joins, when each table is sorted by the subject or object. However, such an approach inherently assumes that the queries are property bound. A non-property-bound query would require us to query across all the tables. Hexastore [6] furthers the multiple-indexing approach taken by Kowari, by storing the three elements of a triple in six different orders. For example, the spo ordering is stored as a sorted list of subjects, with each subject pointing to another sorted list of properties defined for that subject. Each property in such a sorted properties list points to a sorted list of objects defined for that subject, property pair. Thus, all joins can be converted into fast linear merge joins. Hexastore occupies five times more space than a single triples table. However, this is acceptable with the ever-falling storage costs.

3. Promising Directions
We will now explore some promising directions in which we can gain further improvements from the baseline physical model.
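The baseline physical model described above can be made concrete with a short sketch. The following illustration uses Python's built-in sqlite3 standing in for MySql; the table and column names follow the paper, while the sample URIs and ids are ours:

```python
import sqlite3

# SQLite stands in for MySql here. The schema follows the baseline model:
# a SymbolTable mapping lexical forms to integer ids, and a Triples table
# with a compound primary key on (s, p, o).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE SymbolTable (hash INTEGER PRIMARY KEY, lex TEXT);
CREATE TABLE Triples (s INTEGER, p INTEGER, o INTEGER,
                      PRIMARY KEY (s, p, o));
""")

# Toy data: long URIs are interned as integer ids to avoid redundancy.
symbols = {"ex:Alice": 1, "ex:knows": 2, "ex:Bob": 3}
con.executemany("INSERT INTO SymbolTable VALUES (?, ?)",
                [(h, lex) for lex, h in symbols.items()])
con.execute("INSERT INTO Triples VALUES (?, ?, ?)",
            (symbols["ex:Alice"], symbols["ex:knows"], symbols["ex:Bob"]))

# Reading a triple back requires joining the symbol table three times.
row = con.execute("""
    SELECT ss.lex, sp.lex, so.lex
    FROM Triples t
    JOIN SymbolTable ss ON t.s = ss.hash
    JOIN SymbolTable sp ON t.p = sp.hash
    JOIN SymbolTable so ON t.o = so.hash
""").fetchone()
print(row)  # ('ex:Alice', 'ex:knows', 'ex:Bob')
```

The three-way self-join on the symbol table illustrates why the integer ids must be cheap to look up: every query that returns lexical results pays this cost on top of the joins between triples.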
We will detail each idea and also present some motivation for pursuing [or abandoning] the idea. We also present concrete and motivating examples for our arguments, using MySql.

3.1 Applicability of Spatial Indexes
Potentially faster query execution times can be achieved if the joins between the triples are materialized in Euclidean space, in terms of fast minimum bounding rectangle (MBR) operations. For example, simply storing each triple as a (s,p), (p,o) line segment will materialize subject-subject joins as an MBRcontains operation between the line segment (s, minhash)-(s, maxhash) and the start point of each triple line segment. Minhash and maxhash are the minimum and maximum integer ids from the symbol table. However, support for R-Tree operations remains limited in commercial DBMSs. MySql does not support spatial joins. Postgresql does not support R-Trees. Only Oracle Enterprise supports spatial joins. We were unable to pursue this direction further, due to the non-availability (or rather infeasibility) of an Oracle installation.

3.2 Triple-Triple
One promising idea is to create three redundant copies of the triples table with the compound primary keys – spo, pos, osp – such that each table has tuples sorted on subject, property and object, in the order of occurrence in the primary keys of the tables. Figure 2 presents the Triple-Triple physical model.

Figure 2: Triple-Triple Physical model

From here on, we will refer to a table by its primary key, i.e. the spo table will denote the triples table with (s,p,o) as the compound primary key. For the spo table, the triples will be clustered based on subjects, then on properties, and then on objects. Such a clustering ensures that the triples are stored in sorted order on disk, and hence fast linear merge joins can be applied. Note that this scheme requires only 3 times the space of the single triples table described in the baseline model. Hence, this approach is definitely promising. The primary compound index is also useful for any query that involves a prefix of the compound key. For example, the spo table can answer select queries based on the s, sp, spo columns, using the primary index.
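The three-table layout can be sketched as follows, again with sqlite3 standing in for MySql. This is an illustration of the idea rather than the paper's implementation; the sample integer ids are ours:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Three redundant copies of the triples, clustered by their compound
# primary keys: Triples_s -> spo, Triples_p -> pos, Triples_o -> osp.
con.executescript("""
CREATE TABLE Triples_s (s INTEGER, p INTEGER, o INTEGER, PRIMARY KEY (s, p, o));
CREATE TABLE Triples_p (s INTEGER, p INTEGER, o INTEGER, PRIMARY KEY (p, o, s));
CREATE TABLE Triples_o (s INTEGER, p INTEGER, o INTEGER, PRIMARY KEY (o, s, p));
""")

triples = [(1, 10, 2), (1, 11, 3), (2, 10, 3)]
for table in ("Triples_s", "Triples_p", "Triples_o"):
    con.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", triples)

# A property-bound select goes to the pos table, whose primary key
# starts with p, so the lookup is a prefix scan of the primary index.
rows = con.execute(
    "SELECT s, o FROM Triples_p WHERE p = 10 ORDER BY o, s").fetchall()
print(rows)  # [(1, 2), (2, 3)]
```

The same data is loaded three times; only the primary-key ordering, and hence the on-disk clustering, differs between the copies.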
Though MySql does not support merge joins [7], the idea still would yield faster execution times, since the sorted nature of the data ensures efficient use of the index. Remember that, even with a secondary index, if the relevant tuples are spread across the table in different blocks, the end performance could be worse than a table scan. For the Semantic Web vision, MySql plays a pivotal role, since many web sites are powered by MySql. This gives enough motivation to continue exploring the idea using MySql. For example, a simple three-table join on 25K triples using Triple-Triple yields very encouraging benefits over the baseline model. In Figure 3, Triples_s denotes the spo table, Triples_p denotes the pos table and Triples_o denotes the osp table. Triples denotes the baseline triples table. The Nodes table will denote the symbol table. All following examples in the paper will use the same conventions to denote the corresponding tables.

mysql> select * from Triples_s t_s, Triples_o t_o, Triples_p t_p
       where t_s.s = t_o.o and t_s.s = t_p.p;
Empty set (0.28 sec)

mysql> select * from Triples t_1, Triples t_2, Triples t_3
       where t_1.s = t_2.o and t_1.s = t_3.p;
Empty set (2 min 10.83 sec)

Figure 3: Benefits of Triple-Triple over baseline

In comparison to the Hexastore, the Triple-Triple stores only three possible orderings of the elements in a triple. We explore whether these orderings are sufficient for answering the same range of queries answered by the Hexastore. The only possible joins are subject-subject joins, object-object joins, property-property joins and subject-object joins, with all joins being equijoins. Hence, mechanisms for using the Triple-Triple should judiciously choose which table to use for each join operation – spo or pos or osp. Also, we must be able to support selects on any combination of the three triple elements. These decisions are listed in Table 1.

Operation                                      Method
subject-subject join                           spo JOIN spo
property-property join                         pos JOIN pos
object-object join                             osp JOIN osp
subject-object join                            spo JOIN osp
subject select                                 spo
property select                                pos
object select                                  osp
subject-property / property-subject selects    spo [no need for a separate pso]
subject-object / object-subject selects        osp [no need for a separate sop]
object-property / property-object selects      pos [no need for a separate ops]
subject-property-object select                 any table

Table 1: Answering queries using Triple-Triple

Hence, to our understanding, the three possible orderings – spo, pos, osp – are sufficient for handling the same set of queries that the Hexastore handles. The only missing piece in building a complete Triple-Triple store is to define mechanisms to convert SPARQL to SQL queries, using the appropriate tables for each triple. We explore this problem in the next section.
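The table-selection decisions of Table 1 can be summarized in a small lookup function. This is a sketch of the decision rule only; the function name is ours, not the paper's:

```python
# Given the set of triple positions that are bound or joined on
# ('s', 'p', 'o'), return the Triple-Triple table whose compound
# primary key begins with those positions (per Table 1).
def choose_table(bound):
    bound = frozenset(bound)
    if bound == {"s", "p", "o"}:
        return "spo"            # fully bound: any table works
    if "s" in bound and "o" not in bound:
        return "spo"            # s, sp/ps selects and subject joins
    if "o" in bound and "p" not in bound:
        return "osp"            # o, os/so selects and object joins
    return "pos"                # p, po/op selects and property joins

assert choose_table({"s"}) == "spo"
assert choose_table({"p"}) == "pos"
assert choose_table({"s", "o"}) == "osp"
assert choose_table({"p", "o"}) == "pos"
```

A subject-object join then simply pairs the outputs for each side, e.g. the spo table on the subject side against the osp table on the object side, as in Table 1.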
3.3 Applicability of Secondary Indexes
It is tempting to create secondary indexes on other columns that do not form a prefix of the primary key of the triples table (both in the baseline and in the Triple-Triple idea). For example, the spo table can have secondary indexes on po and o, to speed up the selects on those columns. However, initial experiments showed no improvement in query response times, since the bulk of the benefit was from the primary index, or the values were split across multiple disk pages so that the secondary index did not make sense. For example, there are relatively very few properties in the data, when compared to subjects or objects; hence, a secondary index would not be beneficial here. Along similar lines, a secondary index on the lex field of the symbol table did not yield significant benefits. Hence, we stick with the Triple-Triple baseline model.

4. Add-Join
Converting a SPARQL query to a SQL query on the baseline is straightforward. When deciding which tables to use for each triple in the Triple-Triple store, we can be faced with interesting tradeoffs. For example, consider the SPARQL query in Figure 4.

SELECT ?label ?producer ?comment
WHERE {
t1   dataFromProducer9:Product444 rdfs:label ?label .
t2   dataFromProducer9:Product444 rdfs:comment ?comment .
t3   dataFromProducer9:Product444 bsbm:producer ?p .
t4   ?p rdfs:label ?producer }

Figure 4: Tradeoff in SQL conversion

The triples involved in the SPARQL query are marked t1, t2, t3 and t4. It is easy to observe that t1 joins t2, t3, t4 on the subject, and t3 joins t4 using a subject-object join. Our task is to select one of the three triple tables – spo, pos, osp – judiciously, so that all joins make use of the primary index, and hence the fastest access path to the relevant data is adopted. Since all joins of t1 are subject joins, we safely choose the spo table for t1. Since all the joins involving t4 are subject joins, we choose the spo table for t4. However, t3 presents a difficulty. For the t1-t3 join to be efficient, we need to use the spo table for t3. For the t3-t4 join to be efficient, we need to pick the osp table for t3. Clearly, only one of these options is possible. In general, a SPARQL to SQL compiler for the Triple-Triple has to make these hard decisions dynamically at runtime. From our experiments, we noticed that the MySql optimizer (or that of any other DBMS) does not do a good job of choosing the right join order for the tables, and substantial performance gains can be achieved by simply rewriting the query with an explicitly specified join order. These are hard search problems, and thus, even in the context of the Triple-Triple, the SPARQL compiler cannot be expected to do a very good job of optimizing the choice of tables.

Hence, we adopt a method we term Add-Join, which tries to achieve the best of both worlds by using multiple triples tables for a single triple in the SPARQL query. In effect, we add extra joins to the resulting SQL query. But, as we show, the cost of such additional joins is in no way prohibitive, and can be traded off in return for a deterministic, simple SPARQL-SQL compilation technique. For example, in the above example, we use two tables for the t3 triple – t3_o [which is an osp table] and t3_s [which is a spo table]. We join t1 with t3_s, and t3_o with t4, and finally join t3_s and t3_o on all three columns. The final join is very fast, since it involves all three columns, so that the primary index can be used. In effect, all of the joins in the query can use the clustered primary index. Figure 5 shows that the additional joins are not prohibitively expensive, using the same example.

Also, when rewriting the query for the Triple-Triple, we must ensure that we have as few rows to start with as possible. Since MySql uses a single-sweep multi-join algorithm [8], this would ensure that we try to match as few tuples as possible in each stage of the multi-join.
Add-Join:
select * from Triples_s as t1
  STRAIGHT_JOIN Triples_s as t3_s
  STRAIGHT_JOIN Triples_o as t3_o
  STRAIGHT_JOIN Triples_s as t4
  where t1.s = -1280067686087321383 and t1.s = t3_s.s
    and t3_o.o = t4.s and t3_s.s = t3_o.s
    and t3_s.p = t3_o.p and t3_s.o = t3_o.o;
0.2 sec

Use spo for t3:
select * from Triples_s as t1
  STRAIGHT_JOIN Triples_s as t3
  STRAIGHT_JOIN Triples_s as t4
  where t1.s = -1280067686087321383 and t1.s = t3.s and t3.o = t4.s;
0.17 sec

Figure 5: Cost of additional joins

Hence, in addition to joins, we can also involve multiple triple tables for a single triple, when there is an initial select operation on the triple. For example, though a triple joins on s, it might involve a select on p as a bound input value. In those cases, selecting p using the spo table may not be efficient. Hence, we introduce an additional pos table for the triple, perform the select upon it, and later join the pos and spo tables. We now present the algorithm to convert SPARQL to SQL, based on the Add-Join method.

Query Rewriting method:
Step 0. Convert the SPARQL query to SQL, on the baseline.
Step 1. Identify the triples that have bound values for their elements, i.e. the input for the SQL query.
Step 2. In the explicit join order that we intend to provide, start with the triple with bound input values, and follow it with triples that join with that triple.
Step 3. For selects on the non-join attributes, insert entries for 'Nodes' as necessary in the explicit order.
Step 4. Once the explicit ordering is done, introduce additional triple tables for each triple, as per the Add-Join method described above.

5. Graph Based Grouping
We observe that further benefits can be achieved only through physical optimizations. One interesting observation we make is that selects on the triples table can be improved if related tuples are brought closer to each other on disk. For example, consider the spo table with selects using po, p, o on it. This is a common scenario when triples with different properties are joined on the subject. Remember that we introduce additional joins only for the joins between triples and selects using bound input values. By bringing the subjects with common properties and objects closer to each other, we, in effect, reduce the number of disk pages across which the result of the select operation is distributed. Thus, such a scheme would result in direct reductions in the I/O cost of the operation. The same scenario can be applied to the other two tables as well, bringing together related properties and objects respectively.

The integer identifiers assigned to the elements of the triples determine the order in which the triples appear on disk. Right now, these ids are assigned randomly. Hence, we cannot use general-purpose clustering schemes based on Euclidean distances to group the related triples together. Also, without the Triple-Triple, it would be impossible to give equal priority to each of subject, property and object. This is because the first element of the compound primary key determines the order on disk, and hence a clustering scheme has to choose between the three elements. Another approach is to abandon the primary keys altogether, and define a physical representation that brings related triples close to each other based on all three columns. However, such a scheme would compromise on having the data in sorted order. The Triple-Triple idea lends flexibility by allowing us to optimize with respect to subject, property and object, using the spo, pos and osp tables respectively.
Hence, the integer ids need to be assigned intelligently in order to leverage these benefits. We now define metrics which quantify the strength of the relationship between two elements. S-score, P-score and O-score denote the metrics for interrelating subjects, properties and objects respectively.

S-score(s1, s2) = number of triple pairs (t1, t2) with (t1.s = s1 and t2.s = s2) and (t1.p = t2.p or t1.o = t2.o)

defines the S-score for two subjects s1 and s2. P-score and O-score are defined similarly. Once we have these metrics computed, we build three graphs – the S-graph, P-graph and O-graph – that depict the relationships between subjects, properties and objects, using S-score, P-score and O-score as edge weights respectively. The S-graph has a vertex for each subject, and the S-score between two subjects as the edge weight. Note that no subject or object occurs as a property. Hence, the problem of assigning ids to properties can be solved independently of the other two elements. However, some subjects also occur as objects, and there can be only one id assigned to such an element. We therefore prune the O-graph by removing all vertices and edges corresponding to such overlapping subjects.

Each disk page can be viewed as a cluster or group, and the problem of finding the most related subjects can be formulated as an optimization problem, as described in Figure 6. The formulation can be suitably generalized to P-graphs and O-graphs also. The problem is an instance of the Knapsack-constrained Maximum Weighted Cluster problem [9].

Let S denote a cluster and Si denote a subject belonging to S.
Objective:
    Max: Σ S-score(Si, Sj) over all i, j with i != j
Constraints:
    Σ numTriples(Si) <= B
where numTriples(Si) denotes the number of triples with subject Si, and B denotes the number of triples per block.

Figure 6: Optimal clustering of subjects

The formulation aims to extract the cluster from the graph such that the sum of all the edge weights in the cluster is maximal, subject to the constraint that there can be at most B triples in a block. For MySql, which uses 16KB data blocks, B = 16KB/12 = 1300. Once such a maximal cluster is extracted, we assign consecutive ids to all the elements of the cluster. We then repeat the algorithm, pruning the graph by removing the assigned edges. However, in practice, this problem is NP-hard and grows computationally unrealistic with large data sets, which involve thousands of subjects. The S-graph is also very dense, and this complicates the solution further. For example, a 25K-triples database contains 2367 subjects and 200K edges. Hence, when we scale to a billion triples, the graph construction itself may become a very long process. There are other graph clustering tools, such as MCL [10] and Graclus [11], for unsupervised learning from graphs. Though these tools do not solve the exact problem described above, they attempt to provide clusters from the graph based on connected components. Attempts at hierarchical clustering using the MCL algorithm yield only 3 clusters, reflecting the dense nature of these graphs and the non-applicability of standard graph clustering techniques.

Hence, we attempt to develop computationally feasible greedy algorithms for the problem. One such greedy algorithm is described in Figure 7. The algorithm greedily constructs parts of the relationship graphs and assigns identifiers based on such partial graphs. The algorithm closely approximates the optimal solution for certain parts of the graph. Nonetheless, it is suboptimal, since we ignore the strength of the relationships between the discovered subjects Si.
While there is a subject S [that has not been assigned an id]:
    - Assign the next available id to S
    - Compute all the subjects Si related to S, i.e. those that have a non-zero S-score with S
    - Compute S-score(S, Si) for all such discovered subjects
    - Assign ids to the Si in order of S-score(S, Si), till Σ numTriples(Si) <= B

Figure 7: Greedy Grouping

In section 6, we validate the effectiveness of this algorithm. Once again, the id assignment for properties can be done in a symmetric fashion. For the objects, we need to additionally ignore objects which are also subjects.

6. Empirical results
We present empirical results that demonstrate the validity of the approaches proposed in this paper. Specifically, we study the query performance compared to the baseline, and the real benefits of the grouping technique described in the previous section. Our experimental setup is detailed in Table 2.

RDBMS                 MySql 5.0
OS                    Ubuntu 8.04
Processor             AMD Turion TL 58
32/64 bit             32
Processor speed       1.9 GHz
L1 cache (KB)         128
L2 cache (KB)         512
FSB speed (MHz)       800
RAM (GB)              2
Disk capacity (GB)    160
Disk rotation (RPM)   5400
Buffering (MB)        8

Table 2: Platform details

6.1 Query Performance
For each scheme that we evaluate, we define a metric called speed up, to compare it against the baseline. Speed up is simply the query response time for the baseline divided by the query response time for the scheme, for the same query. The higher the speed up, the better the scheme; i.e., it indicates how many magnitudes of performance improvement are achieved by the scheme. Figures 8 and 9 present the speed ups for three schemes – Rewrite (simply rewriting the query by explicitly specifying the join order), Add-Join, and Add-Join with grouping. The results are presented for queries 3, 4, 5 and 6 [12]. Queries 4 and 5 are typical examples of the complex queries that cause scalability problems for applications.

The improvements on query 3 and query 6 are not significant. The baseline timings for those queries are not very large in the first place, and they involve a smaller number of joins. For example, query 6 involves only one triple; Add-Join uses two tables for the query and offers the same amount of performance as the baseline. There are very significant benefits on queries 4 and 5. The benefits due to the Add-Join method account for the bulk of the benefits over the baseline, amounting to approximately a 50x improvement over the baseline. The direct benefits due to the grouping technique amount to 10x-20x over the baseline, when compared to the Add-Join method without grouping. It remains to be seen if better grouping techniques can yield significantly higher benefits. However, these results demonstrate the validity of the techniques described earlier.
Figure 8: Speedup from baseline (250K Triples)

Figure 9: Speedup from baseline (1M Triples)
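The S-score metric and the greedy grouping of Figure 7 (section 5) can be sketched in Python. This is a simplified sketch of the ideas, not the implementation evaluated above: it processes the strongest relations first (one natural reading of Figure 7's ordering), and the tiny block size B and sample triples are ours, chosen only for illustration:

```python
# S-score(s1, s2): number of triple pairs (t1, t2) with t1.s = s1,
# t2.s = s2, and a shared property or object (section 5).
def s_score(triples, s1, s2):
    t1s = [t for t in triples if t[0] == s1]
    t2s = [t for t in triples if t[0] == s2]
    return sum(1 for a in t1s for b in t2s if a[1] == b[1] or a[2] == b[2])

# Greedy grouping in the spirit of Figure 7: for each unassigned subject,
# assign the next id, then pull in its related subjects (strongest
# S-score first) until the block of at most B triples is full.
def greedy_group(triples, B):
    subjects = list(dict.fromkeys(t[0] for t in triples))  # first-seen order
    count = {s: sum(1 for t in triples if t[0] == s) for s in subjects}
    ids, next_id = {}, 0
    for s in subjects:
        if s in ids:
            continue
        ids[s] = next_id; next_id += 1
        load = count[s]
        related = sorted(((s_score(triples, s, t), t) for t in subjects
                          if t not in ids and s_score(triples, s, t) > 0),
                         reverse=True)
        for score, t in related:
            if load + count[t] > B:
                break
            ids[t] = next_id; next_id += 1
            load += count[t]
    return ids

triples = [("A", "type", "Car"), ("B", "type", "Car"),
           ("C", "madeIn", "US"), ("B", "color", "red")]
# A and B share a (property, object) pair, so they receive adjacent ids.
print(greedy_group(triples, B=3))  # {'A': 0, 'B': 1, 'C': 2}
```

Computing S-score pairwise over all subjects is quadratic; this is exactly the cost that makes the exact formulation of Figure 6 impractical at scale, and why the paper only builds partial graphs.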
6.2 Validation of Grouping scheme
It is also important to validate the grouping scheme presented in section 5. We expect the grouping scheme to reduce the number of disk I/Os for selects on the triples tables. Table 3 presents the number of rows the query optimizer expects to examine for answering selects on the triple tables. It can be seen that the grouping has resulted in a decrease in the number of rows examined for the query.

Table        No grouping    With grouping
spo table    1000619        973866
pos table    1000619        805152
osp table    1000619        828004

Table 3: Expected number of rows accessed for selects

We also measure the amount of interrelationship between the triples in each block, with and without grouping, to observe the effectiveness of our grouping algorithm. For each disk block [i.e. a set of 1300 tuples], we construct S-graphs, P-graphs and O-graphs representing those clusters. We then compute the sum of all the edge weights of each such cluster, and average it across all data blocks. This allows us to quantify the effectiveness of our grouping scheme. Ideally, we should also compare our grouping scheme against the optimal solution. However, the optimal solution is very hard to compute, as mentioned earlier, and also cannot be predicted accurately, since it depends on the nature of the triples. We divide the average edge weight for a table with grouping by the average edge weight for the corresponding table without grouping, to obtain a metric called the R-score, or the relationship score, for those two tables. Figure 10 presents the R-scores for all three triples tables, for the 250K and 1M triple tables.

Figure 10: R-score

The results indicate that the grouping algorithm has been quite effective, increasing the interrelationship by 2x-10x over the random id assignment. The R-score for the pos table is lower, since there are fewer properties, and hence, in a number of cases, a single data block is full of triples from the same property. In fact, the average edge weight for the pos table is much lower than those of the spo and osp tables. There are higher values for the osp table, since there are far more objects than subjects, in which case there are more edges in the constructed cluster graph.

7. Conclusions and Future work
Thus, we have explored some promising approaches to improving query performance in relational triple stores. An interesting query rewriting mechanism, which introduces additional joins to speed up query execution, has been discussed. Optimization of the physical schema by leveraging the interrelationship between the elements of a triple has been proposed. A greedy grouping algorithm, which is simple and computationally feasible, has been proposed and validated.
The results show that our approach is promising, and can potentially be combined with other techniques in the literature to yield faster RDF stores. As part of future work, we intend to compare the performance of the system with the Hexastore, and potentially enhance the Hexastore with our grouping algorithm. As mentioned earlier, we would like to develop better grouping algorithms by leveraging parallel computing techniques to overcome the computational issues. We believe that better grouping algorithms can yield significantly higher performance. Another key observation we make is that no physical schema will perform best for all types of queries. Hence, with the cheap availability of storage, multiple physical schemas can coexist within the same RDF store, and the SPARQL compiler can judiciously employ them based on the type of the query.

8. References
[1] Resource Description Framework. http://www.w3.org/RDF/
[2] The Billion Triples track, ISWC 2008. http://iswc2008.semanticweb.org/calls/call-for-semantic-web-challenge-and-billion-triples-tracks/
[3] Das, S., Chong, E. I., Eadon, G., and Srinivasan, J. 2004. Supporting ontology-based semantic matching in RDBMS. In Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30 (Toronto, Canada, August 31 - September 3, 2004). M. A. Nascimento, M. T. Özsu, D. Kossmann, R. J. Miller, J. A. Blakeley, and K. B. Schiefer, Eds. VLDB Endowment, 1054-1065.
[4] David Wood. "Kowari: A Platform for Semantic Web Storage and Analysis". In XTech 2005 Conference.
[5] Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K. 2007. Scalable semantic web data management using vertical partitioning. In Proceedings of the 33rd International Conference on Very Large Data Bases (Vienna, Austria, September 23-27, 2007). VLDB Endowment, 411-422.
[6] Weiss, C., Karras, P., and Bernstein, A. 2008. Hexastore: sextuple indexing for semantic web data management. Proc. VLDB Endow. 1, 1 (Aug. 2008), 1008-1019. DOI= http://doi.acm.org/10.1145/1453856.145396
[7] Nested-Loop Join Algorithms. http://dev.mysql.com/doc/refman/5.0/en/nested-loop-joins.html
[8] Using EXPLAIN Syntax. http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
[9] Anuj Mehrotra and Michael A. Trick. "Cliques and Clustering: A Combinatorial Approach".
[10] MCL: an algorithm for clustering graphs. http://www.micans.org/mcl/
[11] Graclus. http://www.cs.utexas.edu/users/dml/Software/graclus.html
[12] Class project website. http://www.cs.utexas.edu/~jsequeda/cs386d/project.html