This document discusses benchmarking Virtuoso, an open source triplestore, using the Berlin SPARQL Benchmark (BSBM). It summarizes the results of loading and querying datasets of various sizes (10M, 100M, 200M triples) on different systems. Virtuoso showed short loading times and high query throughput. The document also provides information on connecting to and working with Virtuoso using RESTful services, the Jena API, and the Sesame framework.
4. How to...
BSBM (Berlin SPARQL Benchmark): compares the performance of RDF and Named Graph stores, as well as RDF-mapped relational databases
Lehigh University Benchmark (LUBM): facilitates the evaluation of Semantic Web repositories in a standard way
Thursday 19 December 13
5. Benchmark (1/5)
This analysis is based on an April 2013 BSBM experiment in which the Berlin SPARQL Benchmark version 3.1 was used to measure the performance of:
BigData (rev. 6528)
BigOwlim (v. 5.2)
TDB (v. 0.9.4)
Virtuoso6 (ver. 6.04)
Virtuoso7 (ver. 7.0)
Load times for SUTs (hh:mm:ss)
SUT         10M        100M       200M       1B
BigData     00:02:39   00:25:35   00:59:25   -
BigOwlim    00:02:31   00:22:47   00:47:19   04:09:39
TDB         00:09:41   01:37:55   03:34:59
Virtuoso6   00:07:06   00:19:26   00:31:30   01:10:30
Virtuoso7   -          00:03:39   -          00:27:11
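To compare load times across dataset sizes, it can help to convert the hh:mm:ss values into seconds and a rough load rate. The sketch below does this for the BigOwlim 100M figure (00:22:47). The class and method names are illustrative, and treating "100M" as exactly 100,000,000 triples is an assumption, since BSBM dataset sizes are nominal.

```java
// Illustrative helper (not part of BSBM): converts a reported load time
// into seconds and an approximate triples-per-second rate.
class LoadRate {

    // Parses "hh:mm:ss"; fields need not be zero-padded (e.g. "4:9:39").
    static long toSeconds(String hhmmss) {
        String[] parts = hhmmss.trim().split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected hh:mm:ss, got: " + hhmmss);
        }
        long hours = Long.parseLong(parts[0]);
        long minutes = Long.parseLong(parts[1]);
        long seconds = Long.parseLong(parts[2]);
        return hours * 3600 + minutes * 60 + seconds;
    }

    // Approximate load rate; the triple count is nominal (assumption).
    static double triplesPerSecond(long triples, String hhmmss) {
        return (double) triples / toSeconds(hhmmss);
    }

    public static void main(String[] args) {
        // BigOwlim, 100M dataset: 00:22:47 as reported in the load-time table.
        System.out.printf("%d s, about %.0f triples/s%n",
                toSeconds("00:22:47"),
                triplesPerSecond(100_000_000L, "00:22:47"));
    }
}
```

The same helper can be applied to any other cell of the table to put the systems on a common triples-per-second scale.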
6. Benchmark (2/5)
The tables below summarize the query throughput for each type of query over all 500 runs, in queries per second (QpS)
Benchmark Query results: QpS (Queries per Second)
BigData
BigOwlim
TDB
100M
200M
100M
Query 1 49.955
Query 2 42.769
Query 3 37.280
49.520
Query 1 93.773 65.385
Query 2 115.960 65.158
Query 3 170.242 61.155
43.713
38.355
200M
Query 1 232.234 217.865
Query 2 109.445 110.019
Query 3 180.245 174.216
100M
200M
Query 1 119.048 94.877
Query 2 158.755 151.883
Query 3 84.660 70.492
Virtuoso7
Virtuoso6
100M
200M
100M
1B
Query 1 125.786 75.324
Query 2 68.929 68.820
Query 3 117.426 62.243
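A QpS figure can also be read as an average time per query. As a rough sketch, assuming queries run one after another on a single client (the slides do not state the client count), the average execution time is the reciprocal of the throughput. The class and method names are illustrative.

```java
// Illustrative conversion between throughput and average query time.
// Assumes sequential execution by a single client (an assumption that
// is not stated in the slides).
class Throughput {

    // Queries per second from a number of runs and the total elapsed time.
    static double qps(int runs, double totalSeconds) {
        return runs / totalSeconds;
    }

    // Average milliseconds per query for a given QpS value.
    static double millisPerQuery(double qps) {
        return 1000.0 / qps;
    }

    public static void main(String[] args) {
        // 125.786 QpS is one of the Query 1 values reported in the tables.
        System.out.printf("%.2f ms per query%n", millisPerQuery(125.786));
    }
}
```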
7. Benchmark (3/5)
Query 1
Find products for a given set of generic features
Query 2
Retrieve basic information about a specific product for display purposes
Query 3
Find products having some specific features and not having one feature
9. Benchmark (5/5)
Hardware Configuration
Processors: 2 x Intel(R) Xeon(R) CPU E5-2650, 2.00GHz (8 cores and hyperthreading), Sandy Bridge architecture
Memory: 256GB
Hard Disks: 3 x 1.8TB (7,200 rpm) SATA in RAID 0 (180MB/s sequential throughput)
Software Configuration
Operating System: Linux version 3.3.4-3.fc16.x86_64
Filesystem: ext4
Java Version and JVM: Version 1.6.0_31, 64-Bit Server VM (build 20.6-b01)
BSBM generator and test driver version: bibm-0.7.8
10. Conclusions about 1st step
Following the benchmarking phase, Virtuoso was chosen as the platform for its:
short loading times
high query throughput
13. Starting point
Linux CentOS 6.4
Run "yum update" to refresh the package metadata and update the installed packages
Run "sudo yum install gcc gmake autoconf automake libtool flex bison gperf gawk m4 make openssl-devel readline-devel wget" to install the build dependencies
It may be wise to open port 8890/tcp in the firewall configuration to allow external access to Virtuoso's web-based interfaces, such as the Conductor
14. Installing Virtuoso
Download Virtuoso from SourceForge and unpack it with:
tar xvpfz virtuoso-opensource-6.1.7.tar.gz
A simple configuration is:
[user@centos virtuoso-opensource-6.1.7]$ ./configure --prefix=/usr/local/ --with-readline
The prefix /usr/local in the above command forms the base directory for Virtuoso. It will contain the following structure:
/usr/local/lib/: various libraries for Jena, Sesame, JDBC and hosting;
/usr/local/bin/: where the main executables (virtuoso-t, isql) live;
/usr/local/share/virtuoso/vad/: used to store VAD archives;
/usr/local/share/virtuoso/doc/: local offline documentation;
/usr/local/var/lib/virtuoso/db/: default location for a Virtuoso instance;
/usr/local/var/lib/virtuoso/vsp/: various VSP scripts.
Build and install with:
[user@centos virtuoso-opensource-6.1.7]$ nice make
[user@centos virtuoso-opensource-6.1.7]$ sudo make install
15. Getting Started (1/2)
Back up the virtuoso.ini file before editing it, in case you make erroneous changes
Run "cd /usr/local/var/lib/virtuoso/db/" and then "virtuoso-t -df" to start the server
You can access the Conductor at "http://localhost:8890/conductor/"
Two system users are available:
dba - the relational data and administrative account
dav - the WebDAV administrative account
16. Getting Started (2/2)
Conductor
Helps you manage users, automate backups, install VAD packages, execute SQL commands in a web-based iSQL tool, configure the RDF Sponger, and more
SQL/ODBC Listener
Virtuoso provides a listener on port 1111/tcp. You can connect directly to it and execute SQL statements with the isql tool
Resource Usage
The defaults with Virtuoso Open-Source give:
• 160 MB process size in memory
• about 29 MB database
• total 237 MB footprint on disk
There are 20 threads for database and/or web-server use
19. Rest (1/4)
HTTP PUT
Download an example dataset (e.g. geo_coordinates_en_uris_it.ttl from DBpedia)
Load the sample data to a named graph identified by <urn:graph:update:test:put>
> curl --digest --user dba:dba --verbose --url "http://localhost:8890/sparql-graph-crud-auth?graph-uri=urn:graph:update:test:put" -T /root/Desktop/Dataset/geo_coordinates_en_uris_it.ttl
Query the graph data:
SELECT *
FROM <urn:graph:update:test:put>
WHERE {?s ?p ?o}
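The same upload can be done from Java. The following is a minimal sketch, assuming the endpoint path and the dba:dba credentials from the curl command on this slide; the class name, the text/turtle content type, and the default file name are illustrative assumptions. It relies on HttpURLConnection handling the digest challenge through a java.net.Authenticator, and it buffers the whole file in memory, so it is only suitable for small datasets.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.PasswordAuthentication;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative class name; requires Java 10+ for URLEncoder.encode(String, Charset).
class GraphCrudPut {

    // Builds the authenticated graph CRUD URL with an encoded graph URI.
    static String crudUrl(String base, String graphUri) {
        return base + "/sparql-graph-crud-auth?graph-uri="
                + URLEncoder.encode(graphUri, StandardCharsets.UTF_8);
    }

    // Sends the file with HTTP PUT and returns the HTTP status code.
    static int put(String url, Path turtleFile, String user, String password)
            throws IOException {
        // Note: this sets a JVM-wide authenticator (fine for a small sketch).
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication(user, password.toCharArray());
            }
        });
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        // Assumption: the .ttl file is Turtle, so declare it as such.
        conn.setRequestProperty("Content-Type", "text/turtle");
        try (OutputStream out = conn.getOutputStream()) {
            Files.copy(turtleFile, out); // body is buffered until the response is read
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) {
        String url = crudUrl("http://localhost:8890", "urn:graph:update:test:put");
        Path file = Paths.get(args.length > 0 ? args[0] : "geo_coordinates_en_uris_it.ttl");
        try {
            System.out.println("HTTP " + put(url, file, "dba", "dba"));
        } catch (IOException e) {
            System.err.println("Upload failed: " + e);
        }
    }
}
```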
20. Rest (2/4)
HTTP GET
Load the sample data to a named graph identified by <urn:graph:update:test:get>
> curl --digest --user dba:dba --verbose --url "http://localhost:8890/sparql-graph-crud-auth?graph-uri=urn:graph:update:test:get" -T /root/Desktop/Dataset/geo_coordinates_en_uris_it.ttl
Query the graph data:
> curl --verbose --url "http://localhost:8890/sparql-graph-crud?graph-uri=urn:graph:update:test:get"
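Reading a graph back can also be sketched in Java with the java.net.http client (Java 11 or later). The endpoint path and graph URI come from the curl command on this slide; the class name and the Accept header asking for Turtle are assumptions added for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Illustrative class name; requires Java 11+ for java.net.http.
class GraphCrudGet {

    // Builds a GET request for the unauthenticated graph CRUD endpoint.
    static HttpRequest buildRequest(String base, String graphUri) {
        String url = base + "/sparql-graph-crud?graph-uri="
                + URLEncoder.encode(graphUri, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "text/turtle") // assumption: ask for Turtle
                .GET()
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpRequest request =
                buildRequest("http://localhost:8890", "urn:graph:update:test:get");
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + response.statusCode());
            System.out.println(response.body());
        } catch (IOException e) {
            System.err.println("Request failed: " + e.getMessage());
        }
    }
}
```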
21. Rest (3/4)
HTTP DELETE
Load the sample data to a named graph identified by <urn:graph:update:test:delete>
> curl --digest --user dba:dba --verbose --url "http://localhost:8890/sparql-graph-crud-auth?graph-uri=urn:graph:update:test:delete" -T /root/Desktop/Dataset/geo_coordinates_en_uris_it.ttl
Delete the graph data:
> curl --digest --user dba:dba --verbose --url "http://localhost:8890/sparql-graph-crud-auth?graph-uri=urn:graph:update:test:delete" -X DELETE
There are two ways to check that no triples remain after the deletion:
curl: > curl --verbose --url "http://localhost:8890/sparql-graph-crud?graph-uri=urn:graph:update:test:delete"
SPARQL:
SELECT *
FROM <urn:graph:update:test:delete>
WHERE {?s ?p ?o}
22. Rest (4/4)
HTTP POST
Load the sample data to a named graph identified by <urn:graph:update:test:post>
> curl --digest --user dba:dba --verbose --url "http://localhost:8890/sparql-graph-crud-auth?graph-uri=urn:graph:update:test:post" -X POST -T /root/Desktop/Dataset/geo_coordinates_en_uris_it.ttl
There are two ways to query the graph data:
curl: > curl --verbose --url "http://localhost:8890/sparql-graph-crud?graph-uri=urn:graph:update:test:post"
SPARQL:
SELECT *
FROM <urn:graph:update:test:post>
WHERE {?s ?p ?o}
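The SPARQL variant can be sent over HTTP as well. The sketch below assumes Virtuoso's SPARQL endpoint is reachable at /sparql on the same host and port; that path, the class name, and the JSON results format requested in the Accept header are assumptions not shown on the slides. The query text is the SELECT from this slide.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Illustrative class name; requires Java 11+ for java.net.http.
class SparqlSelect {

    // Builds the "select everything from one graph" query used on the slides.
    static String selectAll(String graphUri) {
        return "SELECT * FROM <" + graphUri + "> WHERE {?s ?p ?o}";
    }

    // Encodes the query into the "query" parameter of the endpoint URL.
    static URI endpointUri(String base, String query) {
        return URI.create(base + "/sparql?query="
                + URLEncoder.encode(query, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws InterruptedException {
        URI uri = endpointUri("http://localhost:8890",
                selectAll("urn:graph:update:test:post"));
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/sparql-results+json") // assumption
                .GET()
                .build();
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + response.statusCode());
            System.out.println(response.body());
        } catch (IOException e) {
            System.err.println("Query failed: " + e.getMessage());
        }
    }
}
```

An empty result set after a DELETE indicates that the graph no longer holds any triples.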
23. What is Jena
Jena is an open source Semantic Web framework for Java
Provides an API to extract data from and write to RDF graphs
The graphs are represented as an abstract "model"
A model can be sourced with data from files, databases, URIs or a combination of these
A model can also be queried through SPARQL and updated through SPARUL
24. Virtuoso Jena provider
The Virtuoso Jena Provider is a Native Graph Model Storage Provider for the Jena Framework
It enables applications written with the Jena RDF framework to query the Virtuoso RDF Quad Store
Providers are available for Jena 2.6.x and 2.10.x
25. Setup
Download the latest Virtuoso Jena Provider, the Virtuoso JDBC driver, the associated classes and the sample programs from www.openlinksw.com
Edit the sample programs VirtuosoSPARQLExampleX.java, where X = 1 to 9
Set the JDBC connection strings to a valid Virtuoso Server instance, using the form:
"jdbc:virtuoso://localhost:1111/charset=UTF-8/log_enable=2", "dba", "dba"
From Eclipse, start a new project and add the following jars to the CLASSPATH:
axis.jar
commons-logging.jar
icu4j.jar
xercesImpl.jar
jena-arq.jar
jena-core.jar
jena-iri.jar
slf4j-api.jar
slf4j-simple.jar
virt_jena.jar
virtjdbc.jar
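Before editing the sample programs, the connection settings can be checked with plain JDBC. This is a minimal sketch, assuming virtjdbc.jar is on the classpath and that the driver registers itself with DriverManager; older driver builds may need an explicit Class.forName call for the driver class shipped in the jar. The class and method names are illustrative, and the URL form and credentials are those given on this slide.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Illustrative class name; only the JDBC URL form comes from the slides.
class VirtuosoJdbcCheck {

    // Builds "jdbc:virtuoso://host:port" followed by "/option" segments.
    static String jdbcUrl(String host, int port, String... options) {
        StringBuilder url = new StringBuilder("jdbc:virtuoso://")
                .append(host).append(':').append(port);
        for (String option : options) {
            url.append('/').append(option);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        String url = jdbcUrl("localhost", 1111, "charset=UTF-8", "log_enable=2");
        // Fails with an SQLException if the driver jar is missing or the
        // server is not listening on port 1111.
        try (Connection conn = DriverManager.getConnection(url, "dba", "dba")) {
            System.out.println("Connected to "
                    + conn.getMetaData().getDatabaseProductName());
        } catch (SQLException e) {
            System.err.println("Connection failed: " + e.getMessage());
        }
    }
}
```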
26. Testing
Once the Provider classes and sample programs have been successfully compiled, the Provider can be tested using the included sample programs.
Example 1: returns the contents of the RDF Quad store of the targeted Virtuoso instance
Example 2: reads in the contents of FOAF URIs
Example 3: performs simple addition and deletion operations on the content of the triple store
27. What is Sesame
Sesame is an open source Java framework for storing,
querying and reasoning with RDF and RDF Schema
It can be used as a database for RDF and RDF Schema, or as a
Java library for applications that need to work with RDF
internally
28. Virtuoso Sesame provider
The Virtuoso Sesame Provider is a Native Graph Model Storage Provider for the Sesame Framework
It allows applications to modify, query and reason with the Virtuoso quad store
The Sesame Repository API offers a central access point for connecting to the Virtuoso quad store; it provides a Java-friendly access point to Virtuoso, abstracting the details of the underlying machinery
The Provider has been tested against the latest versions, Sesame 2.7.x
29. Setup
Download the latest Virtuoso Sesame 2 Provider for the version of Sesame being used, the Virtuoso JDBC driver, the Sesame Framework, the associated classes and the sample programs from www.openlinksw.com
From Eclipse, start a new project and add the following jars to the CLASSPATH:
virtjdbc.jar
virt_sesame.jar
slf4j-api.jar
slf4j-simple.jar
openrdf-sesame.jar
commons-io.jar
30. Testing
Once the Provider classes and sample program have been successfully compiled,
the Provider can be tested using the included sample programs
The following tests cover the essentials for connecting to and manipulating
data stored in a Virtuoso repository using the Sesame API
VirtuosoTest
Loading data from URL: http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com/foaf.rdf
Clearing triple store
Loading data from file: virtuoso_driver/data.nt
Loading UNICODE single triple
Loading single triple
Casted value type
Selecting property
Statement does not exists
Statement exists (by resultset size)
Statement exists (by hasStatement())
Retrieving namespaces
Retrieving statement (http://myopenlink.net/dataspace/person/kidehen http://myopenlink.net/foaf/name null)
Writing the statements to file: (/Users/src/virtuoso-opensource/binsrc/sesame2/results.n3.txt)
Retrieving graph ids
Retrieving triple store size
Sending ask query
Sending construct query
Sending describe query
31. Conclusions
At this stage of the analysis, the Jena and Sesame providers are interchangeable, because both fully support triple manipulation
Operations         SESAME   JENA
Reading RDF        V        V
Writing RDF        V        V
Reasoning          V        V
SPARQL Support     V        V
Internal Storage   V        V
External Storage   V        V
33. Why ?
Problem
The explosive growth of the Web has brought forward the need for semantically rich information: a vision at the heart of the Semantic Web
Software applications that use RDF triples often need to work with data stored in a semantic repository
In such cases, using the repositories' APIs directly can be difficult
Solution
An object-RDF mapper is useful in applications developed with an object-oriented approach, because it extends the features of the OO paradigm to the RDF world
34. How?
The Bean!
A class that contains attributes equivalent to the semantic properties of the class, with get and set methods for each
JavaBean classes are written in the Java programming language according to a particular convention
They are used to encapsulate multiple objects into a single object (the bean)
These objects can then be passed around as a single bean object instead of as multiple individual objects
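As an illustration of the convention, here is a minimal sketch of a bean with two properties. The Product class and its label and price properties are hypothetical and chosen only for this example. The helper uses the standard java.beans.Introspector to show that the properties are discovered purely from the get/set naming pattern.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class BeanDemo {

    // Hypothetical bean: private fields, a public no-argument constructor,
    // and a getter/setter pair per property.
    public static class Product implements Serializable {
        private String label;
        private double price;

        public Product() { }

        public String getLabel() { return label; }
        public void setLabel(String label) { this.label = label; }

        public double getPrice() { return price; }
        public void setPrice(double price) { this.price = price; }
    }

    // Lists the property names that the JavaBeans introspector derives
    // from the getter/setter names (Object's own properties are excluded).
    static List<String> propertyNames(Class<?> beanClass) {
        try {
            BeanInfo info = Introspector.getBeanInfo(beanClass, Object.class);
            List<String> names = new ArrayList<>();
            for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
                names.add(pd.getName());
            }
            return names;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Product product = new Product();
        product.setLabel("example product");
        product.setPrice(9.5);
        System.out.println(product.getLabel() + " costs " + product.getPrice());
        System.out.println("Bean properties: " + propertyNames(Product.class));
    }
}
```

An object-RDF mapper can use the same introspection to decide which properties of an object to read from or write to the triple store.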
35. Pro and con
Advantages
The advantage is familiarity: beans are the common currency of Java frameworks
Disadvantages
The disadvantage is that it is harder to use RDF in a natural way. Pulling in and merging disparate data sources, and the schemaless aspect of RDF stores, do not work well when forced into beans
37. Elmo
Features
BSD license, Java 5.0, store: Sesame (generic API)
Additional functionality on top of the triple store: predictive caching (preloading properties and saving query results for future queries), query expansion (for handling owl:sameAs), dealing with metadata (reification)
JavaBeans concepts for a number of well-known web ontologies, including Dublin Core, RSS and FOAF
Dynamic runtime JavaBean creation based on RDFS/OWL
A set of tools related to the supported ontologies:
RDF crawler
a generic smusher framework
a generic validator framework with various smushers and a validator specific to FOAF
Code generation using Groovy script templates
Use of annotated Java interfaces, implemented by dynamic classes at runtime using Javassist
38. Jenabean
Jenabean uses Jena's flexible RDF/OWL API to persist Java beans. It takes an unconventional approach to binding that is driven by the Java object model rather than an OWL or RDF schema.
Features
It works against the Jena Model API
It should interact with one of the two Jena backends (SDB, TDB)
A wrapper is needed to interact with another RDF store (SAIL, AllegroGraph)
39. Sommer
Sommer treats Java fields as named relations and makes those relations explicit with the @RDF annotation
Features
works at runtime via bytecode rewriting
no code generation
uses Java annotations
store: Sesame
vocabulary: any URIs
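The idea of treating fields as named relations can be sketched with plain Java annotations and reflection. The @RdfProperty annotation and the Person class below are hypothetical and defined only for this example. This is not Sommer's API, and it uses reflection at runtime instead of the bytecode rewriting that Sommer relies on. Literal escaping is deliberately left out.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class AnnotatedMapping {

    // Hypothetical annotation (not Sommer's): names the RDF predicate
    // that a Java field stands for.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface RdfProperty {
        String value();
    }

    // Hypothetical class whose fields map to FOAF predicates.
    static class Person {
        @RdfProperty("http://xmlns.com/foaf/0.1/name")
        String name;

        @RdfProperty("http://xmlns.com/foaf/0.1/nick")
        String nick;

        Person(String name, String nick) {
            this.name = name;
            this.nick = nick;
        }
    }

    // Turns every annotated, non-null field into one N-Triples-style line.
    // Values are written as plain literals without escaping (sketch only).
    static List<String> toTriples(String subjectUri, Object bean) {
        Field[] fields = bean.getClass().getDeclaredFields();
        // Reflection does not guarantee field order, so sort by name.
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        List<String> triples = new ArrayList<>();
        for (Field field : fields) {
            RdfProperty property = field.getAnnotation(RdfProperty.class);
            if (property == null) {
                continue; // field is not mapped to RDF
            }
            try {
                field.setAccessible(true);
                Object value = field.get(bean);
                if (value != null) {
                    triples.add("<" + subjectUri + "> <" + property.value()
                            + "> \"" + value + "\" .");
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        Person alice = new Person("Alice", "al");
        toTriples("http://example.org/alice", alice).forEach(System.out::println);
    }
}
```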
40. RDFBeans
Features
Does not depend on a specific triplestore implementation: any store supported by the RDF2Go API can be used
Cascade databinding to reduce development time and ensure referential integrity of complex object data structures
Modular RDFBeans annotations: can be inherited from superclasses and interfaces
No predefined ontologies and RDF schemas are required for RDF data
Transaction support (triplestore-specific)
Extensible mechanism for mapping Java data types to RDF literals
Support for basic Java Collections, optionally represented with RDF containers
Support for indexed JavaBean properties
Support for RDF namespaces
41. RDF2JAVA
Features
good command line
generates code from RDFS
Java classes for RDFS classes: no multiple inheritance supported and no multiple superclasses
very tiny, lightweight project
not maintained anymore (software frozen but working)
42. RDFReactor
Features
Generates code from RDFS
experimental, partial generation from OWL
cardinality constraints
store: Jena, Sesame and YARS are supported via RDF2Go
uses Velocity for template generation
43. Conclusion
Features                             Elmo   Jenabean   Sommer   RDFBeans
Java annotations                     V      V          V        V
Storage via Sesame                   V      X          V        V
Storage via Jena                     X      V          X        X
JavaBean generation based on RDFS    V      X          X        -
JavaBean generation based on OWL     V      V          X        -
Documentation                        V      X          X        V
Following the analysis of the mapping tools, Elmo was chosen
Elmo is equipped with all the functionality needed to handle triples within Virtuoso
The Virtuoso provider chosen was Sesame
Sesame can easily interface with both Virtuoso and Elmo