Virtuoso RDF Triple Store Analysis Benchmark & mapping tools RDF / OO

RDF Analysis with Virtuoso

Thursday, 19 December 2013

Overview
Triple Store Benchmarking
Virtuoso
Virtuoso Connection
RDF/OO Mapper


Triple Store Benchmarking

How to...
BSBM (Berlin SPARQL Benchmark)
compares the performance of RDF and Named Graph stores, as well as RDF-mapped relational databases

Lehigh University Benchmark (LUBM)
facilitates the evaluation of Semantic Web repositories in a standard way


Benchmark (1/5)
This analysis draws on an April 2013 BSBM experiment in which the
Berlin SPARQL Benchmark version 3.1 was used to measure the
performance of:
BigData (rev. 6528)
BigOwlim (v. 5.2)
TDB (v. 0.9.4)
Virtuoso6 (ver. 6.04)
Virtuoso7 (ver. 7.0)

Load times for SUTs (hh:mm:ss)


SUT          10M        100M       200M       1B
BigData      00:02:39   00:25:35   00:59:25   -
BigOwlim     00:02:31   00:22:47   00:47:19   04:09:39
TDB          00:09:41   01:37:55   03:34:59   -
Virtuoso6    00:07:06   00:19:26   00:31:30   01:10:30
Virtuoso7    -          00:03:39   -          00:27:11
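Load times are easier to compare once they are converted to an average load rate. The following is a minimal Java sketch of that arithmetic, not part of the benchmark itself. The class and method names are made up for illustration, and it assumes that a nominal size such as 100M means exactly 100,000,000 triples, which the benchmark data only approximates.

```java
import java.time.Duration;

/** Sketch: turn a reported load time (h:mm:ss) into an average load rate. */
final class LoadThroughput {

    /** Parses "h:mm:ss"; fields may lack zero padding, e.g. "00:3:39". */
    static Duration parseLoadTime(String text) {
        String[] parts = text.trim().split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected h:mm:ss but got: " + text);
        }
        long hours = Long.parseLong(parts[0]);
        long minutes = Long.parseLong(parts[1]);
        long seconds = Long.parseLong(parts[2]);
        return Duration.ofHours(hours).plusMinutes(minutes).plusSeconds(seconds);
    }

    /** Average number of triples loaded per second over the whole load. */
    static double triplesPerSecond(long triples, Duration loadTime) {
        return (double) triples / loadTime.getSeconds();
    }

    public static void main(String[] args) {
        // One of the 100M load times from the table: 3 minutes 39 seconds.
        Duration loadTime = parseLoadTime("00:3:39");
        double rate = triplesPerSecond(100_000_000L, loadTime); // assumed exact size
        System.out.printf("%d s, about %.0f triples/s%n", loadTime.getSeconds(), rate);
    }
}
```

The rate is an average over the whole load, so it hides any slowdown as indexes grow.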

Benchmark (2/5)
The tables below summarize the query throughput
for various types of queries over all 500 runs (in QpS)
Benchmark query results: QpS (Queries per Second)

BigData      100M       200M
Query 1      49.955     49.520
Query 2      42.769     43.713
Query 3      37.280     38.355

BigOwlim     100M       200M
Query 1      93.773     65.385
Query 2      115.960    65.158
Query 3      170.242    61.155

TDB          100M       200M
Query 1      232.234    217.865
Query 2      109.445    110.019
Query 3      180.245    174.216

Virtuoso6    100M       200M
Query 1      119.048    94.877
Query 2      158.755    151.883
Query 3      84.660     70.492

Virtuoso7    100M       1B
Query 1      125.786    75.324
Query 2      68.929     68.820
Query 3      117.426    62.243

Benchmark (3/5)
Query 1
Find products for a given set of generic features

Query 2
Retrieve basic information about a specific product for
display purposes

Query 3
Find products having some specific features and not
having one feature

Benchmark (4/5)
Query 1

PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT ?product ?label
WHERE {
    ?product rdfs:label ?label .
    ?product a %ProductType% .
    ?product bsbm:productFeature %ProductFeature1% .
    ?product bsbm:productFeature %ProductFeature2% .
    ?product bsbm:productPropertyNumeric1 ?value1 .
    FILTER (?value1 > %x%)
}
ORDER BY ?label LIMIT 10

Query 2

PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?label ?comment ?producer ?productFeature ?propertyTextual1 ?propertyTextual2 ?propertyTextual3
       ?propertyNumeric1 ?propertyNumeric2 ?propertyTextual4 ?propertyTextual5 ?propertyNumeric4
WHERE {
    %ProductXYZ% rdfs:label ?label .
    %ProductXYZ% rdfs:comment ?comment .
    %ProductXYZ% bsbm:producer ?p .
    ?p rdfs:label ?producer .
    %ProductXYZ% dc:publisher ?p .
    %ProductXYZ% bsbm:productFeature ?f .
    ?f rdfs:label ?productFeature .
    %ProductXYZ% bsbm:productPropertyTextual1 ?propertyTextual1 .
    %ProductXYZ% bsbm:productPropertyTextual2 ?propertyTextual2 .
    %ProductXYZ% bsbm:productPropertyTextual3 ?propertyTextual3 .
    %ProductXYZ% bsbm:productPropertyNumeric1 ?propertyNumeric1 .
    %ProductXYZ% bsbm:productPropertyNumeric2 ?propertyNumeric2 .
    OPTIONAL { %ProductXYZ% bsbm:productPropertyTextual4 ?propertyTextual4 }
    OPTIONAL { %ProductXYZ% bsbm:productPropertyTextual5 ?propertyTextual5 }
    OPTIONAL { %ProductXYZ% bsbm:productPropertyNumeric4 ?propertyNumeric4 }
}

Query 3

PREFIX bsbm-inst: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/instances/>
PREFIX bsbm: <http://www4.wiwiss.fu-berlin.de/bizer/bsbm/v01/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?product ?label
WHERE {
    ?product rdfs:label ?label .
    ?product a %ProductType% .
    ?product bsbm:productFeature %ProductFeature1% .
    ?product bsbm:productPropertyNumeric1 ?p1 .
    FILTER ( ?p1 > %x% )
    ?product bsbm:productPropertyNumeric3 ?p3 .
    FILTER (?p3 < %y% )
    OPTIONAL {
        ?product bsbm:productFeature %ProductFeature2% .
        ?product rdfs:label ?testVar }
    FILTER (!bound(?testVar))
}
ORDER BY ?label LIMIT 10
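The %...% tokens in these queries are placeholders that are replaced with concrete values before each run. The following is a minimal Java sketch of such a substitution step. It is an illustration only, not the BSBM test driver's code: the QueryTemplate class, its fill method and the sample values (bsbm-inst:ProductType1 and so on) are assumptions.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: replace %Name% placeholders in a query template with given values. */
final class QueryTemplate {

    // A placeholder is a word wrapped in percent signs, e.g. %ProductType%.
    private static final Pattern PLACEHOLDER = Pattern.compile("%(\\w+)%");

    static String fill(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String name = matcher.group(1);
            String value = values.get(name);
            if (value == null) {
                // Fail loudly instead of sending a half-filled query to the store.
                throw new IllegalArgumentException("no value for %" + name + "%");
            }
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "?product a %ProductType% . "
                + "?product bsbm:productFeature %ProductFeature1% . "
                + "FILTER (?value1 > %x%)";
        // Hypothetical sample values, chosen only for this illustration.
        Map<String, String> values = Map.of(
                "ProductType", "bsbm-inst:ProductType1",
                "ProductFeature1", "bsbm-inst:ProductFeature1",
                "x", "100");
        System.out.println(fill(template, values));
    }
}
```

Rejecting a missing value up front keeps an unfilled placeholder from reaching the store as a syntax error.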

Benchmark (5/5)
Hardware Configuration
Processors: 2 x Intel(R) Xeon(R) CPU E5-2650, 2.00GHz (8 cores and
hyperthreading), Sandy Bridge architecture
Memory: 256GB
Hard Disks: 3 x 1.8TB (7,200 rpm) SATA in RAID 0 (180MB/s sequential
throughput)
Software Configuration
Operating System: Linux version 3.3.4-3.fc16.x86_64
Filesystem: ext4

Java Version and JVM: Version 1.6.0_31, 64-Bit Server VM (build 20.6-b01)
BSBM generator and test driver version: bibm-0.7.8


Conclusions about the 1st step
After the benchmarking phase, the platform chosen was Virtuoso, because of its:
short loading times
high query throughput


Virtuoso

Starting with Virtuoso
Starting point
Installing Virtuoso
Getting Started


Starting point
Linux CentOS 6.4
run “sudo yum install gcc gmake autoconf automake libtool flex
bison gperf gawk m4 make openssl-devel readline-devel wget”
to install the build dependencies
It may be wise to open port 8890/tcp in the firewall
configuration to allow external access to Virtuoso's web-based interfaces such as the Conductor
run “yum update” to update the indexes of available
packages


Installing Virtuoso
Download Virtuoso from SourceForge and unpack it with:
“tar xvpfz virtuoso-opensource-6.1.7.tar.gz”
A simple configuration is:
[user@centos virtuoso-opensource-6.1.7]$ ./configure --prefix=/usr/local/ --with-readline
The prefix /usr/local in the above command forms the base directory for
Virtuoso. The installation creates the following structure:
/usr/local/lib/: various libraries for Jena, Sesame, JDBC and hosting;
/usr/local/bin/: where the main executables (virtuoso-t, isql) live;
/usr/local/share/virtuoso/vad/: used to store VAD archives;
/usr/local/share/virtuoso/doc/: local offline documentation;
/usr/local/var/lib/virtuoso/db/: default location for a Virtuoso instance;
/usr/local/var/lib/virtuoso/vsp/: various VSP scripts.

Build and install with:
[user@centos virtuoso-opensource-6.1.7]$ nice make
[user@centos virtuoso-opensource-6.1.7]$ sudo make install

Getting Started (1/2)
Back up the virtuoso.ini file before editing it, in case a
change turns out to be wrong
Run “cd /usr/local/var/lib/virtuoso/db/” and “virtuoso-t -df”
to start the server
You can access the Conductor menu at
“http://localhost:8890/conductor/”
Two system users are available:
dba - the relational data and administrative account
dav - the WebDAV administrative account


Getting Started (2/2)
Conductor

Helps you to manage users and automate backups, to install
VAD packages, to execute SQL commands in a web-based
iSQL tool, to configure the RDF Sponger, and more

SQL/ODBC Listener

Virtuoso provides a listener on port 1111/tcp. You can
connect directly to this and execute SQL statements with
the isql tool

Resource Usage

The defaults with Virtuoso Open-Source give:
• 160 MB process size in memory
• about 29 MB database
• total 237 MB footprint on disk
There are 20 threads for db and/or web-server use


Virtuoso Connection

Connections used
RESTful services
JENA Provider
SESAME Provider


Rest (1/4)
HTTP PUT

Download an example dataset (e.g. geo_coordinates_en_uris_it.ttl from
DBpedia)
Load the sample data into a named graph identified by
<urn:graph:update:test:put>:
> curl --digest --user dba:dba --verbose --url "http://localhost:8890/sparql-graph-crud-auth?graph-uri=urn:graph:update:test:put" -T /root/Desktop/Dataset/geo_coordinates_en_uris_it.ttl

Query the graph data:
SELECT *
FROM <urn:graph:update:test:put>
WHERE {?s ?p ?o}


Rest (2/4)
HTTP GET

Load the sample data into a named graph identified by
<urn:graph:update:test:get>:
> curl --digest --user dba:dba --verbose --url "http://localhost:8890/sparql-graph-crud-auth?graph-uri=urn:graph:update:test:get" -T /root/Desktop/Dataset/geo_coordinates_en_uris_it.ttl

Query the graph data:
> curl --verbose --url "http://localhost:8890/sparql-graph-crud?graph-uri=urn:graph:update:test:get"


Rest (3/4)
HTTP DELETE

Load the sample data into a named graph identified by
<urn:graph:update:test:delete>:
> curl --digest --user dba:dba --verbose --url "http://localhost:8890/sparql-graph-crud-auth?graph-uri=urn:graph:update:test:delete" -T /root/Desktop/Dataset/geo_coordinates_en_uris_it.ttl

Delete the graph data:
> curl --digest --user dba:dba --verbose --url "http://localhost:8890/sparql-graph-crud-auth?graph-uri=urn:graph:update:test:delete" -X DELETE

There are two ways to confirm that no triples remain after the deletion:
curl: > curl --verbose --url "http://localhost:8890/sparql-graph-crud?graph-uri=urn:graph:update:test:delete"
SPARQL:

SELECT *
FROM <urn:graph:update:test:delete>
WHERE {?s ?p ?o}


Rest (4/4)
HTTP POST

Load the sample data into a named graph identified by
<urn:graph:update:test:post>:
> curl --digest --user dba:dba --verbose --url "http://localhost:8890/sparql-graph-crud-auth?graph-uri=urn:graph:update:test:post" -X POST -T /root/Desktop/Dataset/geo_coordinates_en_uris_it.ttl

There are two ways to query the graph data:
curl: > curl --verbose --url "http://localhost:8890/sparql-graph-crud?graph-uri=urn:graph:update:test:post"
SPARQL:

SELECT *
FROM <urn:graph:update:test:post>
WHERE {?s ?p ?o}
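The four curl calls above differ only in the HTTP method, the body and the graph-uri parameter. The following is a minimal Java sketch of the same requests built with java.net.http (Java 11 or later). The GraphCrudRequests class and its method names are made up for illustration, and the text/turtle media type is an assumption based on the .ttl sample file. The sketch only builds the requests; sending them, and the digest authentication that the curl commands use for /sparql-graph-crud-auth, are left out.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

/** Sketch: build (but do not send) graph CRUD requests for a named graph. */
final class GraphCrudRequests {

    /** Appends the URL-encoded graph name as the graph-uri query parameter. */
    static URI graphUri(String endpoint, String graph) {
        String encoded = URLEncoder.encode(graph, StandardCharsets.UTF_8);
        return URI.create(endpoint + "?graph-uri=" + encoded);
    }

    static HttpRequest get(String endpoint, String graph) {
        return HttpRequest.newBuilder(graphUri(endpoint, graph))
                .header("Accept", "text/turtle") // assumed media type
                .GET()
                .build();
    }

    /** PUT replaces the content of the graph with the given Turtle document. */
    static HttpRequest put(String endpoint, String graph, String turtle) {
        return HttpRequest.newBuilder(graphUri(endpoint, graph))
                .header("Content-Type", "text/turtle")
                .PUT(HttpRequest.BodyPublishers.ofString(turtle))
                .build();
    }

    /** POST adds the given Turtle document to the graph. */
    static HttpRequest post(String endpoint, String graph, String turtle) {
        return HttpRequest.newBuilder(graphUri(endpoint, graph))
                .header("Content-Type", "text/turtle")
                .POST(HttpRequest.BodyPublishers.ofString(turtle))
                .build();
    }

    static HttpRequest delete(String endpoint, String graph) {
        return HttpRequest.newBuilder(graphUri(endpoint, graph)).DELETE().build();
    }

    public static void main(String[] args) {
        String readEndpoint = "http://localhost:8890/sparql-graph-crud";
        HttpRequest request = get(readEndpoint, "urn:graph:update:test:get");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The graph name is percent-encoded here, whereas the curl commands pass it verbatim in the query string.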


What is Jena
Jena is an open source Semantic Web framework for Java
Provides an API to extract data from and write to RDF graphs
The graphs are represented as an abstract "model"
A model can be sourced with data from files, databases, URIs or a
combination of these
A model can also be queried through SPARQL and updated through
SPARUL


Virtuoso Jena provider
Virtuoso Jena Provider is a
Native Graph Model Storage
Provider for the Jena
Framework
It lets applications written with
the Jena RDF Framework query the
Virtuoso RDF Quad Store directly
Providers are available for the
latest Jena 2.6.x and 2.10.x versions


Setup
Download the latest Virtuoso Jena Provider, Virtuoso JDBC driver, associated classes and
sample programs from www.openlinksw.com
Edit the sample programs VirtuosoSPARQLExampleX.java, where X = 1 to 9
Set the JDBC connection strings to a valid Virtuoso Server instance, using the form:
"jdbc:virtuoso://localhost:1111/charset=UTF-8/log_enable=2", "dba", "dba"
From Eclipse, start a new project and add the following jars to the CLASSPATH:
axis.jar
commons-logging.jar
icu4j.jar
xercesImpl.jar
jena-arq.jar
jena-core.jar
jena-iri.jar
slf4j-api.jar
slf4j-simple.jar
virt_jena.jar
virtjdbc.jar


Testing
Once the Provider classes and sample program have been
successfully compiled, the Provider can be tested using the included
sample programs.
Example 1
returns the contents of the RDF Quad store of
the targeted Virtuoso instance

Example 2
reads in the contents of FOAF URIs

Example 3
performs simple addition and deletion operations
on the content of the triple store

What is Sesame

Sesame is an open source Java framework for storing,
querying and reasoning with RDF and RDF Schema
It can be used as a database for RDF and RDF Schema, or as a
Java library for applications that need to work with RDF
internally


Virtuoso Sesame provider
Virtuoso Sesame Provider is a
Native Graph Model Storage
Provider for the Sesame
Framework
It allows users to modify, query and
reason with the Virtuoso quad
store
The Sesame Repository API
offers a central access point for
connecting to the Virtuoso quad
store; it provides a Java-friendly
access point to Virtuoso,
abstracting the details of the
underlying machinery
The Provider has been tested
against the latest versions,
Sesame 2.7.x.

Setup
Download the latest Virtuoso Sesame 2 Provider for the version
of Sesame being used, Virtuoso JDBC driver, Sesame
Framework, associated classes and sample programs from
www.openlinksw.com
From Eclipse, start a new project and add the following jars to
the CLASSPATH:
virtjdbc.jar
virt_sesame.jar
slf4j-api.jar
slf4j-simple.jar
openrdf-sesame.jar
commons-io.jar


Testing
Once the Provider classes and sample program have been successfully compiled,
the Provider can be tested using the included sample programs
The following tests cover the essentials for connecting to and manipulating
data stored in a Virtuoso repository using the Sesame API
VirtuosoTest
Loading data from URL: http://www.openlinksw.com/dataspace/person/kidehen@openlinksw.com/foaf.rdf
Clearing triple store
Loading data from file: virtuoso_driver/data.nt
Loading UNICODE single triple
Loading single triple
Casted value type
Selecting property
Statement does not exists
Statement exists (by resultset size)
Statement exists (by hasStatement())
Retrieving namespaces
Retrieving statement (http://myopenlink.net/dataspace/person/kidehen http://myopenlink.net/foaf/name null)
Writing the statements to file: (/Users/src/virtuoso-opensource/binsrc/sesame2/results.n3.txt)
Retrieving graph ids
Retrieving triple store size
Sending ask query
Sending construct query
Sending describe query

Conclusions
In this phase of my analysis, either the Jena or the Sesame
provider can be used, because both fully support
triple manipulation

Operations          SESAME   JENA
Reading RDF         V        V
Writing RDF         V        V
Reasoning           V        V
SPARQL Support      V        V
Internal Storage    V        V
External Storage    V        V

RDF/OO Mapping

Why ?
Problem

The explosive development of the Web has brought
forward the need for semantically rich information: a
vision at the heart of the Semantic Web
In software applications that use RDF triples,
we often need to work with data stored in a semantic
repository
In such cases, using the APIs of these repositories directly
can be difficult

Solution

An object-RDF mapper is useful in applications
developed with an object-oriented approach, to extend the features
of the OO paradigm to the RDF world

How?
The Bean!

A class that contains attributes equivalent to
the semantic properties of the class and includes
get and set methods

JavaBean classes are written in the Java programming
language according to a particular convention
They are used to encapsulate multiple objects into a single object (the
bean)
These objects can be passed as a single bean object instead of as multiple
individual objects
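The JavaBean convention described above can be sketched in a few lines of Java. The ProductBean class and its property names are hypothetical and simply echo the product properties used in the benchmark queries. None of the mapping tools discussed here prescribes this exact class.

```java
import java.io.Serializable;

/** Sketch of a JavaBean: private fields, a no-argument constructor, getters and setters. */
class ProductBean implements Serializable {
    private static final long serialVersionUID = 1L;

    // Each attribute stands for one semantic property of the RDF class.
    private String label;
    private String producer;
    private double propertyNumeric1;

    /** The JavaBean convention requires a public no-argument constructor. */
    public ProductBean() {
    }

    public String getLabel() {
        return label;
    }

    public void setLabel(String label) {
        this.label = label;
    }

    public String getProducer() {
        return producer;
    }

    public void setProducer(String producer) {
        this.producer = producer;
    }

    public double getPropertyNumeric1() {
        return propertyNumeric1;
    }

    public void setPropertyNumeric1(double propertyNumeric1) {
        this.propertyNumeric1 = propertyNumeric1;
    }

    public static void main(String[] args) {
        ProductBean product = new ProductBean();
        product.setLabel("sample product");
        product.setProducer("sample producer");
        product.setPropertyNumeric1(42.0);
        // The three values now travel together as one object.
        System.out.println(product.getLabel() + " by " + product.getProducer());
    }
}
```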

Pro and con
Advantages

The main advantage is familiarity: beans are the
common currency of Java frameworks

Disadvantages

The disadvantage is that it is harder to use RDF in a natural
way. Pulling in and merging disparate data sources, and the
schemaless aspect of RDF stores, do not work well when
forced into beans


RDF-Mapping tools
Elmo (ex Alibaba)
Jenabean
Sommer
RDFBeans
RDF2JAVA
RDFReactor

Elmo
Features
BSD, Java 5.0, store Sesame (generic API)
Additional functionality on top of the triple store: predictive caching (preloading
properties and saving query results for future queries), query expansion (for
handling owl:sameAs), dealing with metadata (reification)
JavaBeans concepts for a number of well known web ontologies including Dublin
Core, RSS and FOAF
Dynamic Runtime JavaBean creation based on RDFS/OWL
A set of tools related to the supported ontologies:
RDF crawler
a generic smusher framework
a generic validator framework with various smushers and a validator specific to FOAF

Code generation using Groovy script template
Use of annotated Java interfaces, implemented by dynamic classes at runtime
using Javassist

Jenabean
Jenabean uses Jena's flexible RDF/OWL api to persist java beans. It
takes an unconventional approach to binding that is driven by the
java object model rather than an OWL or RDF schema.
Features

It works against the Jena Model API
It should interact with either of the two Jena backends (SDB, TDB)
A wrapper is needed to interact with another RDF store
(SAIL, AllegroGraph)


Sommer
Sommer just thinks of java fields as named relations. It
makes those relations explicit with the @RDF annotation
Features

runtime via byte code rewriting
no generation of code
uses Java annotations
store: Sesame
vocabulary: any URIs

RDFBeans
Features
Does not depend on specific triplestore implementation: any supported by
RDF2Go API can be used
Cascade databinding to reduce development time and ensure referential
integrity of complex object data structures
Modular RDFBeans annotations: can be inherited from superclasses and
interfaces
No predefined ontologies and RDF-schemas are required for RDF data
Transactions support (triplestore-specific)
Extensible mechanism of mapping Java data types to RDF literals
Support of basic Java Collections, optionally represented with RDF containers
Support of indexed JavaBean properties
Support of RDF namespaces

RDF2JAVA
Features

good command line
generates code from RDFS
Java classes for RDFS classes: no multiple inheritance
supported and no multiple super classes
very tiny, light weight project
not maintained anymore (software frozen but working)


RDFReactor
Features

Generates code from RDFS
experimental, partial generation from OWL
cardinality constraints
store: via RDF2Go Jena, Sesame and YARS are supported
uses Velocity for template generation


Conclusion
Features                            Elmo   Jenabean   Sommer   RDFBeans
Java Annotations                    V      V          V        V
Storage via Sesame                  V      X          V        V
Storage via Jena                    X      V          X        X
JavaBean Generation based on RDFS   V      X          X        -
JavaBean Generation based on OWL    V      V          X        -
Documentation                       V      X          X        V

Following the analysis of the mapping tools, Elmo was chosen
Elmo provides all the functionality needed to handle triples
within Virtuoso

The Virtuoso provider chosen was SESAME
SESAME can easily interface with both Virtuoso and Elmo

The end

