Finding the Shortest Path in Transit Networks
Victor Chircu
Faculty of Computer Science, “Alexandru Ioan Cuza” University, Iasi, Romania
victor.chircu@info.uaic.ro
Abstract. More and more governments provide SPARQL
endpoints for querying public data, which sometimes
includes route configuration for transit agencies.
Knowing the route configuration and the geolocation of
each route stop in a transit network, one can run an
algorithm to find the (k-)shortest path(s) from point
A to point B. Since the Romanian Government does not
provide this kind of information, this article
describes a way to obtain this data, store it in
a triple store, and expose it through a SPARQL endpoint.
Keywords: semantic web, rdf, sparql, triple store, virtuoso universal server,
dotNetRdf, transit network, public transportation network, k-shortest path, A*
1 Introduction
The focus of this paper is to provide a way to acquire, store and expose transit data when
this data is not openly available on the Web. In recent years, public transportation
agencies have been providing transit data freely over the internet, data that developers
can use to build useful applications. In some cases¹, the agencies even encourage
developers to do so by organizing application contests. This trend started in 2005, when
TriMet² and Google developed the Google Transit Feed Specification (GTFS)³, a format used
for finding routes with the Google Transit Trip Planner⁴. Over the years, more transit
agencies incorporated trip planners in their websites and provided data in GTFS format to
be used with the Google Transit Trip Planner.
Unfortunately, this is not the case for Romania, where public transportation (linked) data
is still closed. For example, the website of RATB⁵ (the main bus transit agency in
Bucharest) does not provide a trip planner, but does provide an HTML view of all the routes
and of the stops on each route. On the other hand, Metrorex⁶ (the subway transit agency in
Bucharest) does provide a trip planner, but publishes only a visual route map, which cannot
easily be parsed by a program. Metrorex's route planner also needs improvement, since it
provides routes only from station to station (the average user would prefer to enter the
origin and destination by clicking on a map) and does not connect with the RATB routes.
Thus, there is a need for a city-level transit route planner that uses routes from all
agencies in a city. This is the main purpose of my dissertation, but this paper focuses
only on getting the data, organizing it in a manner that fits my purpose and exposing it
through a SPARQL endpoint, so that it can be used by other developers.
¹ http://mtaappquest.com/
² http://trimet.org/
³ https://developers.google.com/transit/gtfs/reference
⁴ http://www.google.com/intl/en/landing/transit/#mdy
⁵ http://www.ratb.ro/
⁶ http://www.metrorex.ro
2 Getting the data
I have managed to get part of the data I need as an SQL database. This database contains
most of the routes (name, geometry) and stops (name, location) in the Bucharest transit
network. The routes are managed by the two biggest transit agencies in the city, RATB and
Metrorex. This data is incomplete, because I also need to know which stops belong to each
route and in what sequence. To get this information, I can use a technique called web
scraping, because the RATB website does provide a way for human users to see what stops are
on each route. As defined in [1], web scraping is “a computer software technique of
extracting information from websites”.
Part of the page for one of the RATB routes on the RATB website is shown in Figure 1.
Fig. 1. Route page on the RATB website.
As you can see, you can select the route from one of the drop-down lists (each list
represents a type of route: rail, trolley, bus). The selected route name must be sent as
a parameter to the server. To find out how, we can use a network monitoring tool, like
Firebug⁷ for Mozilla's Firefox browser. For example, let's say we have chosen 106 from the
bus route drop-down list (the third from the left). This selection automatically triggers
a postback. The POST parameters are shown in Figure 2.
Fig. 2. Using Firebug to detect POST parameters.
As you can see, the route name is indeed sent as a POST parameter, with the name tlin3.
Knowing this, I can build a small (PowerShell) script that makes a GET request to
http://www.ratb.ro/v_trasee.php, gets all the options from the HTML select element that I
am interested in (e.g. the bus route list), and, for each option value, makes an HTTP POST
request with the required parameters (e.g. if we choose the third drop-down list, then the
tlin1, tlin2, tlin4 and tlin5 parameters are set to 0, and tlin3 is set to the option
value). The POST returns the page in HTML format. Next, I have to parse the page, extract
the information from the table and write it to disk. The final step is to write the route
stops to the existing SQL database. This can be done manually, or automatically, by
matching the stop name from the file with the stop name in the database. Following these
steps I have found the complete route itinerary for three routes.
⁷ http://getfirebug.com/
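The scraping steps described in this section can be sketched as follows. This is a minimal Python sketch, not the PowerShell script mentioned above. It assumes that each select element carries the same name as its POST parameter (tlin1 to tlin5) and that placeholder options have the value 0; neither detail is confirmed by the text. The names SelectOptionParser, route_options and post_body are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

ROUTE_PAGE = "http://www.ratb.ro/v_trasee.php"  # URL given in the text
SELECT_NAMES = ["tlin1", "tlin2", "tlin3", "tlin4", "tlin5"]


class SelectOptionParser(HTMLParser):
    """Collects the option values of one named <select> element."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.inside = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            # Assumption: the select's name equals the POST parameter name.
            self.inside = attrs.get("name") == self.select_name
        elif tag == "option" and self.inside:
            value = attrs.get("value")
            if value and value != "0":  # assumption: 0 marks a placeholder
                self.values.append(value)

    def handle_endtag(self, tag):
        if tag == "select":
            self.inside = False


def route_options(html, select_name):
    """Return the route names offered by the given drop-down list."""
    parser = SelectOptionParser(select_name)
    parser.feed(html)
    return parser.values


def post_body(select_name, route):
    """Form body: the chosen list carries the route, the others are set to 0."""
    fields = {name: "0" for name in SELECT_NAMES}
    fields[select_name] = route
    return urlencode(fields)
```

The body could then be sent with `urllib.request.urlopen(ROUTE_PAGE, data=post_body("tlin3", "106").encode())`, and the returned HTML parsed for the stop table in a similar way.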
3 Modeling the data
3.1 General Information
Taking into account the classes of entities and the relationships between these classes,
the domain I want to model is shown in Figure 3.
Fig. 3. Transit Domain Model.
Because I want the dataset that I am exposing to be extensible and interoperable with
other datasets, I have chosen to model the data using RDF triples, store it in a triple
store and expose it through a SPARQL endpoint.
In the Semantic Web, a knowledge base contains a set of RDF (Resource Description
Framework) [2] triples of the form <subject, predicate, object>. Resources are identified
by a unique URI. Subjects and predicates are always resources, while objects can be either
resources or literals (having an associated data type). Using this model, any kind of
domain can be represented.
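The triple model described above can be illustrated with a small data structure. This is only a sketch in plain Python, not the API of any RDF library; the class names URI, Literal and Triple and the example.org stop URI are hypothetical, while the rdf:type, xsd:string and transit:Stop URIs are the standard ones.

```python
from dataclasses import dataclass
from typing import Union

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str
    datatype: URI  # every literal carries an associated data type


@dataclass(frozen=True)
class Triple:
    subject: URI
    predicate: URI
    object: Union[URI, Literal]

    def __post_init__(self):
        # Subjects and predicates are always resources.
        if not isinstance(self.subject, URI) or not isinstance(self.predicate, URI):
            raise TypeError("subject and predicate must be resources (URIs)")
        # Objects are either resources or literals.
        if not isinstance(self.object, (URI, Literal)):
            raise TypeError("object must be a resource or a literal")


# Hypothetical example: a stop resource typed with the Transit Stop class.
stop_is_a_stop = Triple(
    URI("http://example.org/stop/1"),
    URI(RDF_TYPE),
    URI("http://vocab.org/transit/terms/Stop"),
)
```

Making the classes frozen keeps triples hashable, so a graph can simply be a Python set of Triple values.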
Because in the Semantic Web it is strongly advised to reuse existing RDF vocabularies where
possible rather than create a new one, I have searched for one that would suit my needs. I
have found the Transit⁸ vocabulary, which models exactly the domain I am trying to
describe. Transit is “a vocabulary for describing transit systems and routes” and is based
on the General Transit Feed Specification published by Google. The Transit vocabulary core
classes are shown in Figure 4.
The Transit vocabulary is already used to expose MTA New York City Transit data⁹ and bus
route data in Southampton, UK¹⁰, so it is arguably the best-known RDF vocabulary for
transit data.
⁸ http://vocab.org/transit/terms/
Fig. 4. Transit Vocabulary Core Classes.
As you can see, the classes in this vocabulary map onto the classes in my domain model.
There is one class that I do not use in my domain model, the Schedule class. I leave it
out because in most cities in Romania public transportation does not run according to a
fixed schedule. By eliminating this class, I have greatly reduced the amount of data I
have to manipulate.
⁹ http://kasabi.com/dataset/mta-new-york-city-transit
¹⁰ http://data.southampton.ac.uk/bus-routes.html
3.2 Transit Core Classes and Properties
In this section, I will go over the main Transit classes and properties that I am using
and define them according to the Transit vocabulary document¹¹. You may notice that I use
only a subset of the classes and properties defined in this document, because I use a
slightly less general model that leaves out the Schedule and Service classes.
Note: Each resource's URI is given in the footnotes.
Classes
Agency¹² - an organization that oversees public transportation for a city or region (e.g. RATB, Metrorex).
Route¹³ - a public transportation route; some of its subclasses are:
─ Bus Route¹⁴
─ Rail Route¹⁵
─ Subway Route¹⁶
Stop¹⁷ - a location where passengers board or disembark from a transit vehicle.
Route Stop¹⁸ - a location where passengers board or disembark from a transit vehicle for a specific route.
Properties
agency¹⁹ - the agency that operates this public transportation route.
route²⁰ - a route associated with the given resource.
routeStop²¹ - links a route to a particular stop and to the sequence of that stop in the route.
stop²² - the physical stop associated with this route stop. (Note: I do not use this property as the Transit vocabulary intends, because there having this property implies being a ServiceStop. In my domain, having this property implies being a RouteStop, because I do not need the ServiceStop class.)
¹¹ http://vocab.org/transit/terms
¹² http://vocab.org/transit/terms/Agency
¹³ http://vocab.org/transit/terms/Route
¹⁴ http://vocab.org/transit/terms/BusRoute
¹⁵ http://vocab.org/transit/terms/RailRoute
¹⁶ http://vocab.org/transit/terms/SubwayRoute
¹⁷ http://vocab.org/transit/terms/Stop
¹⁸ http://vocab.org/transit/terms/RouteStop
¹⁹ http://vocab.org/transit/terms/agency
²⁰ http://vocab.org/transit/terms/route
²¹ http://vocab.org/transit/terms/routeStop
²² http://vocab.org/transit/terms/stop
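To make the relationships between these classes and properties concrete, the following sketch builds the triples for one bus route and its ordered stops. It is an illustration only: the example.org base URI, the identifiers passed in, and the sequence property (placed in the example namespace because its exact name in the Transit vocabulary is not given above) are assumptions, and route_triples is a hypothetical helper.

```python
TRANSIT = "http://vocab.org/transit/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/bucharest/"  # hypothetical base URI for the dataset


def route_triples(agency, route, stops):
    """Return (subject, predicate, object) triples for one bus route.

    `stops` lists stop identifiers in travel order; each one yields a
    RouteStop resource that links the route to the physical stop.
    """
    route_uri = f"{EX}route/{route}"
    triples = [
        (route_uri, RDF_TYPE, TRANSIT + "BusRoute"),
        (route_uri, TRANSIT + "agency", f"{EX}agency/{agency}"),
    ]
    for position, stop in enumerate(stops, start=1):
        route_stop = f"{route_uri}/stop/{position}"
        triples += [
            (route_uri, TRANSIT + "routeStop", route_stop),
            (route_stop, RDF_TYPE, TRANSIT + "RouteStop"),
            (route_stop, TRANSIT + "stop", f"{EX}stop/{stop}"),
            # Assumption: hypothetical property holding the stop's position.
            (route_stop, EX + "sequence", position),
        ]
    return triples
```

Because the stop property hangs off the RouteStop resource, the same physical stop can appear at different positions on different routes without duplicating its name or coordinates.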
3.3 Other Vocabularies
Some of the other RDF vocabularies I am using are:
FOAF²³, for the name and page properties
RDF²⁴, for the type property
RDF Schema²⁵, for the label property
Geometry Ontology²⁶, for describing the geometry of a route in Well-Known Text²⁷ (WKT) format
WGS84 Geo Positioning²⁸, for representing latitude and longitude information in the WGS84 geodetic reference datum
4 Technical Information
4.1 General Information
Because of my hands-on experience with the Microsoft development stack, I have chosen to
develop the application using .NET/C#. This might be a challenge, because most Semantic
Web tools target other platforms, as seen in the pie chart in Figure 5, taken from [3].
The same source [3] shows that this trend is holding, since no new semantic tools were
developed for the .NET platform in 2011.
I considered switching to the Java platform, which provides a set of powerful tools for
working with RDF and OWL, most notably the Jena Framework²⁹, developed by Apache, but
after doing some research I found a tool written for the .NET platform which suits my
needs. This tool is presented in detail in section 4.4 of this paper.
²³ http://xmlns.com/foaf/0.1/
²⁴ http://www.w3.org/1999/02/22-rdf-syntax-ns#
²⁵ http://www.w3.org/2000/01/rdf-schema#
²⁶ http://data.ordnancesurvey.co.uk/ontology/geometry/
²⁷ http://edndoc.esri.com/arcsde/9.0/general_topics/wkt_representation.htm
²⁸ http://www.w3.org/2003/01/geo/wgs84_pos#
²⁹ http://incubator.apache.org/jena/
Fig. 5. Semantic Web Tools by Programming Language.
4.2 Knowledge base
One of the disadvantages of using third-party triple stores is that I could not find an
open-source product that fit my needs. But because of the nature of my problem, I could
not rely on an in-memory triple store; I needed an efficient one, with a powerful query
engine. After researching different options, I have decided to use OpenLink's Virtuoso
Universal Server³⁰ as a triple store. My choice was based on Virtuoso's maturity and on
its RDF Graph Model features³¹:
Backward Chaining OWL Reasoner covering: rdfs:subClassOf, rdfs:subPropertyOf, owl:sameAs, owl:equivalentClass, owl:equivalentProperty, owl:InverseFunctionalProperty, owl:inverseOf, owl:SymmetricProperty, and owl:TransitiveProperty
SPARQL 1.1 Query Language, Protocol, and Results Serialization support
SPARQL Create, Update, and Delete (SPARUL)
³⁰ http://virtuoso.openlinksw.com/
³¹ http://virtuoso.openlinksw.com/rdf-quad-store/
Supports a broad range of RDF data representation formats: HTML+RDFa, RDF-JSON, N3, Turtle, TriG, TriX, and RDF/XML
REST interfaces for Create, Read, Update, and Delete operations
RDF data is also accessible via ODBC, JDBC, ADO.NET (Entity Framework compatible), OLE DB, and XMLA data providers/drivers
Because the application that I'm building is non-commercial, I am not interested in
acquiring a Virtuoso commercial license at the moment. OpenLink does provide two 15-day
trials of the software. I have used the first one while configuring and testing the
Virtuoso server, and will use the second one for demos later on. While working on the
application, I will use in-memory triple stores, loaded from and saved to the file system.
4.3 .Net API for working with RDF
While looking for a .NET API for working with RDF I have found three possible candidates:
Linq2RDF³², a LINQ query provider that converts queries into the SPARQL query language. Unfortunately, the API is not mature enough, and the project was last updated in August 2008, so it appears to be no longer under development.
Jena .NET³³, a flexible .NET port of the Jena semantic web toolkit. Unfortunately, this project has been abandoned too, while still at a beta 0.3 release.
dotNetRDF³⁴, an open-source semantic web/RDF library for C#/.NET. Even though it is only at a beta 0.5 release, it is still under active development, which is a big advantage over the other two options. This API is described in the next section.
4.4 dotNetRDF
General Information
Some of the points of interest regarding the API are:
currently a beta release (version 0.5.1)
works on .NET 3.5 (but according to the project's Issue Tracker³⁵, moving the library to .NET 4.0 is a top priority)
"simple but powerful API for working with RDF"
operates primarily with Triples, Graphs and Triple Stores
has limited support for inference
no support for OWL
³² http://code.google.com/p/linqtordf/
³³ http://semanticweb.org/wiki/Jena_.NET
³⁴ http://www.dotnetrdf.org/
³⁵ http://www.dotnetrdf.org/tracker/Issues/IssueDetail.aspx?id=22
Known formats
The library can read RDF fragments (including Graphs and Triple Stores) from strings,
files and even URIs. It can also write RDF fragments to files and strings. Reading and
writing can be done in all of the most widely used RDF formats: RDF/XML, RDF/JSON,
NTriples, Turtle, Notation 3, and XHTML + RDFa.
Graphs
The API has support for getting Nodes and Triples from a Graph by given criteria (any
combination of subject, predicate and object), merging graphs, and computing graph
difference and equality.
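What these graph operations amount to can be shown conceptually with Python sets of (subject, predicate, object) tuples. This sketch does not use or imitate the dotNetRDF API; the function names and the prefixed identifiers are hypothetical, and blank nodes, for which real graph equality requires an isomorphism check, are ignored.

```python
def match(graph, s=None, p=None, o=None):
    """Return the triples that agree with every criterion that is not None."""
    return {
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


def merge(left, right):
    """Union of two graphs; duplicate triples collapse automatically."""
    return left | right


def difference(left, right):
    """Triples present in `left` but missing from `right`."""
    return left - right


# Hypothetical sample graphs using prefixed names for brevity.
graph_a = {
    ("ex:r106", "transit:routeStop", "ex:rs1"),
    ("ex:rs1", "transit:stop", "ex:s1"),
}
graph_b = {
    ("ex:rs1", "transit:stop", "ex:s1"),
    ("ex:r106", "rdf:type", "transit:BusRoute"),
}
```

With blank nodes out of the picture, graph equality is plain set equality, for example `graph_a == graph_b`.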
Triple Stores
The library can work with both in-memory Triple Stores and native Triple Stores.
It provides support for working with:
in-memory triple stores, loaded from and saved to disk in two ways:
─ a folder, where each file represents a single Graph, and there is an additional index file
─ a single file, using one of the following formats: TriG, TriX and NQuads
simple SQL-based stores with MySQL and Microsoft SQL Server databases
native third-party Triple Stores: AllegroGraph³⁶, Dydra³⁷, 4store³⁸, Fuseki³⁹, Joseki⁴⁰, Sesame⁴¹ (any Sesame-based store, e.g. BigOWLIM⁴²), stores compliant with the SPARQL Graph Store HTTP Protocol, Stardog⁴³, the Talis Platform⁴⁴ and Virtuoso.
By providing an easy way to work with Virtuoso-based Triple Stores, dotNetRDF proves to
be the right choice.
Querying
Using dotNetRDF one can easily query:
in-memory Graphs, using the library's SPARQL implementation
remote SPARQL endpoints
third-party Triple Stores, using native queries (this is very important, since we can rely on the more powerful Virtuoso query engine and not on the weaker dotNetRDF implementation)
³⁶ http://www.franz.com/agraph/allegrograph/
³⁷ http://dydra.com/
³⁸ http://4store.org/
³⁹ http://incubator.apache.org/jena/documentation/serving_data/index.html
⁴⁰ http://www.joseki.org/
⁴¹ http://www.openrdf.org/
⁴² http://www.ontotext.com/owlim
⁴³ http://stardog.com/
⁴⁴ http://www.talis.com/platform/
The query mechanism is compatible with the current draft⁴⁵ of the SPARQL 1.1 standard.
Inference and Reasoning
The current version provides three types of reasoners:
RDFS Reasoner, which does not apply the full range of possible RDFS-based inferencing but does do the following:
─ asserts additional type triples for anything which has a type which is a sub-class
of another type
─ asserts additional triples where the property (predicate) is a sub-property of
another property
─ asserts additional type triples based on the domain and range of properties
SKOS Reasoner is a simple concept hierarchy reasoner which can infer additional
triples where the subject has an object which is a skos:Concept in the taxonomy by
following skos:narrower and skos:broader links as appropriate.
Simple N3 Rules Reasoner is a reasoner that is able to apply simple N3 rules.
Unfortunately, there is no API support for using inference with 3rd party Triple
Stores. Because of this, the reasoner that comes with the Virtuoso Universal Server
cannot be used.
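The three RDFS rules listed above for the RDFS Reasoner can be sketched as a small forward-chaining loop. This is an illustration of the rules themselves, not of how dotNetRDF implements them; the function name infer and the prefixed identifiers are hypothetical, and the range rule assumes that the objects it touches are resources rather than literals.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def infer(schema, data):
    """Add inferred triples to `data` until no rule produces anything new."""
    graph = set(data)
    while True:
        new = set()
        for s, p, o in graph:
            for a, relation, b in schema:
                if relation == SUBCLASS and p == RDF_TYPE and o == a:
                    new.add((s, RDF_TYPE, b))   # type is a sub-class of b
                elif relation == SUBPROP and p == a:
                    new.add((s, b, o))          # predicate is a sub-property of b
                elif relation == DOMAIN and p == a:
                    new.add((s, RDF_TYPE, b))   # subject falls in the domain
                elif relation == RANGE and p == a:
                    new.add((o, RDF_TYPE, b))   # object falls in the range
        if new <= graph:                        # fixed point reached
            return graph
        graph |= new
```

For instance, given the schema triple stating that transit:BusRoute is a sub-class of transit:Route, a resource typed as a BusRoute also receives the type Route.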
Configuration
The library comes with a very useful Configuration API that can be used to dynamically
load commonly used objects (such as Graphs, connections to Triple Stores etc.), and a
couple of tools for deploying RDF-enabled ASP.NET Web Applications. Because of these last
two features, exposing a SPARQL endpoint is a trivial task.
5 Implementation Details
5.1 Populating the Virtuoso RDF Triple Store
As mentioned in section 2, I have the data in an MS SQL Server database, and, as
mentioned in section 3, I have the RDF vocabulary. What remains is to migrate the data
from the SQL database into the Virtuoso Triple Store. I have done this in two steps:
Write the data from the SQL database into a set of files. To do this in the simplest way, I have used the RAD features of the Visual Studio 2010 IDE together with Entity Framework⁴⁶, which generated a set of classes based on the database's tables. In code, I read the data from the database using these classes and wrote it to a set of files.
Load the data from the files into a Graph. This could not be done directly from the Entity Framework classes, because EF 4.1 is not compatible with .NET 3.5, and the dotNetRDF library is built on .NET 3.5; hence the intermediate files. After this step, I have written the data from the Graph to the Virtuoso Triple Store (dotNetRDF makes this task easier).
⁴⁵ http://www.w3.org/TR/sparql11-query/
⁴⁶ http://msdn.microsoft.com/en-us/data/ee712906
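The first migration step, turning database rows into RDF written to a file, can be sketched as follows. This is a Python illustration under stated assumptions, not the Entity Framework code described above: sqlite3 stands in for MS SQL Server, the Stops table and its columns are hypothetical, the example.org base URI is invented, and N-Triples is chosen as the file format only because it is simple to emit line by line.

```python
import sqlite3

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
BASE = "http://example.org/bucharest/stop/"  # hypothetical base URI


def literal(value):
    """Quote a value as a plain N-Triples literal (escapes \\ and ")."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def stops_to_ntriples(connection):
    """Yield three N-Triples lines per stop: label, latitude, longitude."""
    # Assumption: a table Stops(id, name, lat, lon) holds the stop data.
    rows = connection.execute("SELECT id, name, lat, lon FROM Stops ORDER BY id")
    for stop_id, name, lat, lon in rows:
        subject = f"<{BASE}{stop_id}>"
        yield f"{subject} <{RDFS_LABEL}> {literal(name)} ."
        yield f"{subject} <{GEO}lat> {literal(lat)} ."
        yield f"{subject} <{GEO}long> {literal(lon)} ."
```

Writing the lines to disk is then a matter of `open(path, "w").writelines(line + "\n" for line in stops_to_ntriples(connection))`, after which any RDF library can load the file into a graph.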
5.2 Exposing the data through a SPARQL Endpoint
To expose the data in the Virtuoso Triple Store through a SPARQL endpoint, I created a new
ASP.NET Web Application and added a configuration file with the following content to its
App_Data folder:
@prefix dnr: <http://www.dotnetrdf.org/configuration#> .

# Firstly note that our Handler must have a subject which is a special
# dotNetRDF URI as discussed in Configuration API - HTTP Handlers
<dotnetrdf:/sparql> a dnr:HttpHandler ;
  dnr:type "VDS.RDF.Web.QueryHandler" ; # States that we're using the QueryHandler
  dnr:queryProcessor _:proc .

_:proc a dnr:SparqlQueryProcessor ;
  dnr:type "VDS.RDF.Query.SimpleQueryProcessor" ;
  dnr:usingStore _:store .

_:store a dnr:TripleStore ;
  dnr:type "VDS.RDF.NativeTripleStore" ;
  dnr:genericManager _:manager .

# Register the Virtuoso factory
_:virtuosoFactory a dnr:ObjectFactory ;
  dnr:type "VDS.RDF.Configuration.VirtuosoObjectFactory, dotNetRDF.Data.Virtuoso" .

# Now we define the initial dataset
_:manager a dnr:GenericIOManager ;
  dnr:type "VDS.RDF.Storage.VirtuosoManager, dotNetRDF.Data.Virtuoso" ;
  dnr:server "myIp" ;
  dnr:port "1111" ;
  dnr:database "DB" ;
  dnr:user "user" ;
  dnr:password <appSettings:VirtuosoPassword> .
As you can see, we register an HTTP Handler⁴⁷ of type QueryHandler. To this handler we
associate a QueryProcessor. The QueryProcessor uses a Native Triple Store, which is
defined by a manager that points to the Virtuoso Universal Server instance.
Now all I have to do is register the handlers in the web application's configuration file
(web.config), which is done automatically by the rdfDeploy tool that comes with
dotNetRDF.
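Once the endpoint is deployed, other developers can query it over HTTP. The sketch below shows how a client might build a request using the `query` parameter defined by the SPARQL Protocol. The endpoint URL is a hypothetical deployment address, and the query assumes the data was modeled with the Transit classes and properties from section 3.2.

```python
from urllib.parse import urlencode

ENDPOINT = "http://localhost/sparql"  # hypothetical deployment URL

# Lists every bus route together with its route stops and physical stops.
QUERY = """
PREFIX transit: <http://vocab.org/transit/terms/>
SELECT ?route ?routeStop ?stop WHERE {
  ?route a transit:BusRoute ;
         transit:routeStop ?routeStop .
  ?routeStop transit:stop ?stop .
}
"""


def query_url(endpoint, query):
    """Build a SPARQL Protocol GET URL with the query URL-encoded."""
    return endpoint + "?" + urlencode({"query": query})
```

The URL returned by `query_url(ENDPOINT, QUERY)` can be fetched with `urllib.request.urlopen`, optionally after setting an `Accept: application/sparql-results+json` header to request JSON results.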
6 Future Development
As described in [4], [5], any algorithm that aims to find the shortest path in a transit
network needs some sort of pre-processing. This is needed since web usage studies have
shown that the path computation time should be less than 7 seconds [6], [7]. The
pre-processed data needs to be stored in the knowledge base, and since it is
algorithm-dependent, I might have to extend the Transit vocabulary with a couple of new
classes and properties in order to store the information.
⁴⁷ http://msdn.microsoft.com/en-us/library/aa479332.aspx
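As a rough idea of what such pre-processing and path search could look like, the sketch below turns ordered stop lists into an adjacency list and runs Dijkstra's algorithm over it. It is not the method of [4] or [5]: it assumes every route can be travelled in both directions and gives every hop the same cost, ignoring distances and transfer penalties. The function names and the sample stop names are hypothetical.

```python
import heapq
from collections import defaultdict


def build_graph(routes):
    """Pre-process {route: [stops in order]} into an adjacency list."""
    graph = defaultdict(set)
    for stops in routes.values():
        for here, there in zip(stops, stops[1:]):
            graph[here].add(there)   # assumption: routes run both ways
            graph[there].add(here)
    return graph


def shortest_path(graph, origin, destination):
    """Dijkstra with unit edge cost; returns the list of stops or None."""
    queue = [(0, origin, [origin])]
    visited = set()
    while queue:
        cost, stop, path = heapq.heappop(queue)
        if stop == destination:
            return path
        if stop in visited:
            continue
        visited.add(stop)
        for neighbour in sorted(graph[stop]):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + 1, neighbour, path + [neighbour]))
    return None
```

For example, with a bus route A, B, C and a subway route C, D, calling `shortest_path(build_graph(routes), "A", "D")` walks the bus route to C and changes there to reach D. A k-shortest-path or A* variant would replace the unit cost with real travel estimates.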
7 Bibliography
1. ***, Web Scraping, http://en.wikipedia.org/wiki/Web_scraping
2. Allemang, D., Hendler, J., Semantic Web for the Working Ontologist, Morgan Kaufmann, 2008
3. ***, The State of Tooling for Semantic Technologies, http://www.mkbergman.com/991/the-state-of-tooling-for-semantic-technologies/
4. Jariyasunant, J., Work, D., Kerkez, B., Sengupta, R., Glaser, S., Bayen, A., Mobile Transit Trip Planning with Real-Time Data, presented at the Transportation Research Board, 2010
5. Wu, Q., Hartley, J., Using K-Shortest Paths Algorithms to Accommodate User Preferences in the Optimization of Public Transport Travel, Applications of Advanced Technologies in Transportation Engineering, Proceedings of the Eighth International Conference, pp. 181-186, 2004
6. Jain, R., Raleigh, T., Graff, C., Bereschinsky, M., Mobile Internet Access and QoS Guarantees Using Mobile IP and RSVP with Location Registers, IEEE Int. Conf. Commun., vol. 3, pp. 1690-1695, 1998
7. Erl, T. (ed.), Service-Oriented Architecture (SOA): Concepts, Technology, and Design, Prentice Hall, 2005