4. Digital Enterprise Research Institute www.deri.ie
Enabling networked knowledge
Appetite Whetting (1/3)
Google accurately detects flu trends ahead of the U.S.
Centers for Disease Control (CDC).
http://www.google.org/flutrends/about/how.html
Appetite Whetting (2/3)
Twitter predicts stock prices:
http://www.dailymail.co.uk/sciencetech/article-2120416/Twitter-predicts-stock-prices-accurately-investment-tactic-say-scientists.html
Appetite Whetting (3/3)
Flavor pyramids for North American and East Asian cuisines
http://www.nature.com/srep/2011/111215/srep00196/full/srep00196.html
Data Science and RDF
Ø Can we do “data science” using RDF data?
§ Do we have the data?
§ Do we have the tools?
Ø Why should we use RDF?
RDF Characteristics
§ Graph data model
§ Clearly defined semantics
§ Supports Web-scale distributed publication
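To make the graph data model concrete: an RDF dataset is a set of (subject, predicate, object) triples, and each triple can be read as a directed edge labelled with its predicate. The Python sketch below is only an illustration; the `ex:` names, the toy data and the `as_graph` helper are hypothetical, and real RDF terms (IRIs, literals, blank nodes) are reduced to plain strings.

```python
from collections import defaultdict

# Hypothetical toy data: each RDF triple is a (subject, predicate, object) tuple.
TRIPLES = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:alice", "ex:name", '"Alice"'),
}


def as_graph(triples):
    """Return an adjacency map: subject -> sorted list of (predicate, object) edges."""
    graph = defaultdict(list)
    for s, p, o in sorted(triples):  # sort for a deterministic edge order
        graph[s].append((p, o))
    return dict(graph)


# Alice has two outgoing labelled edges: one to Bob, one to a literal.
alice_edges = as_graph(TRIPLES)["ex:alice"]
```

Nodes that only ever appear as objects (here `ex:carol`) have no outgoing edges, so they do not get a key in the adjacency map.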
Available RDF Data
§ Freebase (Google) has 1.2 billion triples
§ The LOD Cloud has more than 31 billion triples
§ Embedded RDF data: schema.org, Drupal…
http://lod-cloud.net/
Available RDF Tools
In this presentation we focus on SPARQL, the standard query language for RDF:
q W3C Recommendation
q Supports querying, transforming and updating RDF data
q Large number of available implementations
q Defines a communication protocol (the SPARQL Protocol)
q 427 public SPARQL endpoints registered on the DataHub*
* http://sw.deri.org/~aidanh/docs/epmonitorISWC.pdf
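For a sense of what the protocol looks like on the wire, here is a minimal Python sketch that builds (without sending) a SPARQL query request over HTTP GET, passing the query text in the `query` URL parameter and asking for JSON results through the Accept header. The endpoint URL and the helper name `build_sparql_request` are placeholders, not a real service.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sparql_request(endpoint, query):
    """Build, but do not send, a SPARQL 1.1 Protocol query via HTTP GET."""
    url = endpoint + "?" + urlencode({"query": query})  # query text is URL-encoded
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Placeholder endpoint; urllib.request.urlopen(req) would actually send it.
req = build_sparql_request("http://example.org/sparql",
                           "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1")
```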
SPARQL… Simple queries
SELECT ?name
WHERE {
  ?p :name ?name .
}
ORDER BY ?name
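To spell out what this query computes, here is a Python sketch over a toy in-memory triple list. The data is hypothetical, the `:name` predicate is just a string here, and `select_names` is an illustrative helper, not part of any library. It keeps one solution per matching triple and sorts them, as ORDER BY does.

```python
def select_names(triples, name_pred=":name"):
    """Analogue of: SELECT ?name WHERE { ?p :name ?name . } ORDER BY ?name

    Each matching triple is one solution, so two people who share a name
    produce that name twice (the query has no DISTINCT)."""
    return sorted(o for (s, p, o) in triples if p == name_pred)


# Hypothetical toy data.
people = [
    (":bob", ":name", "Bob"),
    (":alice", ":name", "Alice"),
    (":alice", ":knows", ":bob"),
    (":carol", ":name", "Carol"),
]
names = select_names(people)
```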
SPARQL… BI queries
SELECT ?gender (COUNT(*) AS ?count)
WHERE {
  ?p :gender ?gender
}
GROUP BY ?gender
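The grouping and counting can be sketched in Python the same way, again assuming a toy list of triples and a hypothetical `count_by_gender` helper: every `:gender` triple contributes one row, and rows are counted per gender value.

```python
from collections import Counter


def count_by_gender(triples, gender_pred=":gender"):
    """Analogue of:
    SELECT ?gender (COUNT(*) AS ?count) WHERE { ?p :gender ?gender } GROUP BY ?gender
    """
    return dict(Counter(o for (s, p, o) in triples if p == gender_pred))


# Hypothetical toy data.
data = [
    (":alice", ":gender", "female"),
    (":bob", ":gender", "male"),
    (":carol", ":gender", "female"),
    (":alice", ":knows", ":bob"),
]
counts = count_by_gender(data)
```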
SPARQL… BI queries
SELECT ?name (COUNT(?n) AS ?neighbours)
WHERE {
  ?p :knows ?n .
  ?p :name ?name .
}
GROUP BY ?p ?name
ORDER BY DESC(?neighbours)
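A Python sketch of the same degree computation, under the simplifying assumption (not made by the query) that each person has at most one name: count the outgoing `:knows` edges of every named person and sort by that count, largest first. As with the join in the query, people with no `:knows` edges or no name do not appear. The helper `neighbour_counts` and the data are hypothetical.

```python
from collections import Counter


def neighbour_counts(triples, knows=":knows", name=":name"):
    """Analogue of the :knows / COUNT(?n) query, sorted by degree (descending).

    Assumes each person has at most one :name (a simplification)."""
    names = {s: o for (s, p, o) in triples if p == name}
    # Count outgoing :knows edges, keeping only people that have a name
    # (the query's two triple patterns are joined on ?p).
    degree = Counter(s for (s, p, o) in triples if p == knows and s in names)
    # Grouping is by person (?p); the name is only carried along for display.
    rows = [(names[person], n) for person, n in degree.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical toy data; :dave has :knows edges but no :name, so he is dropped.
data = [
    (":alice", ":name", "Alice"), (":bob", ":name", "Bob"), (":carol", ":name", "Carol"),
    (":alice", ":knows", ":bob"), (":alice", ":knows", ":carol"), (":alice", ":knows", ":dave"),
    (":bob", ":knows", ":alice"),
    (":carol", ":knows", ":alice"), (":carol", ":knows", ":bob"),
    (":dave", ":knows", ":alice"),
]
ranking = neighbour_counts(data)
```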
SPARQL… BI queries
Ø How influential a person is within a social network
Ø How important a road is within an urban network
Ø How central an employee is within an enterprise
SPARQL… Graph measure
Can we use SPARQL to compute shortest paths in the graph?
Short answer: NO!
Long answer: Let’s try!
SPARQL… graph measure
SELECT ?v1 ?v2 (MIN(?l) AS ?shortestPath)
WHERE {
  { ?v1 :knows ?v2 . BIND (1 AS ?l) }
  UNION
  { ?v1 :knows/:knows ?v2 . BIND (2 AS ?l) }
  UNION
  { ?v1 :knows/:knows/:knows ?v2 . BIND (3 AS ?l) }
  FILTER (?v1 != ?v2)
}
GROUP BY ?v1 ?v2
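The query above can only find paths of length 1 to 3, because each length needs its own UNION branch. For comparison, the usual way to compute unweighted shortest paths is breadth-first search, which has no fixed depth limit. The Python sketch below (hypothetical helper `shortest_path_lengths`, edges given as directed (from, to) pairs) computes hop counts from one source node; it is a swapped-in illustration of the technique, not something the query itself does.

```python
from collections import deque


def shortest_path_lengths(edges, source):
    """Breadth-first search: number of hops from `source` to every reachable node.

    Edges are directed (from, to) pairs, like ?v1 :knows ?v2."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, []):
            if v not in dist:  # first visit in BFS order is the shortest
                dist[v] = dist[u] + 1
                queue.append(v)
    del dist[source]  # drop the trivial source->source entry, like FILTER (?v1 != ?v2)
    return dist


# A chain a -> b -> c -> d -> e: "e" is 4 hops away, beyond the query's 3-hop limit.
chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
lengths = shortest_path_lengths(chain, "a")
```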
SPARQL… graph measure
Ø finding directions between physical locations
Ø finding the most direct way to contact a person
Ø finding the min-delay communication path
SPARQL… clustering
Can we do clustering using SPARQL? YES!
A peer-pressure clustering algorithm has been implemented using (almost) only SPARQL*
* http://yarcdata.com/blog/?p=318
SPARQL… clustering
DROP GRAPH <urn:ga/g/xjz1> ;
CREATE GRAPH <urn:ga/g/xjz1> ;
INSERT { GRAPH <urn:ga/g/xjz1> { ?s :cluster ?clus3 } }
WHERE {
  SELECT ?s (SAMPLE(?clus) AS ?clus3)
  {
    {
      SELECT ?s (MAX(?clusCt) AS ?maxClusCt)
      {
        SELECT ?s ?clus (COUNT(?clus) AS ?clusCt)
        WHERE {
          ?s :knows ?o .
          GRAPH <urn:ga/g/xjz0> { ?o :cluster ?clus }
        }
        GROUP BY ?s ?clus
      }
      GROUP BY ?s
    }
    {
      SELECT ?s ?clus (COUNT(?clus) AS ?clusCt)
      WHERE {
        ?s :knows ?o .
        GRAPH <urn:ga/g/xjz0> { ?o :cluster ?clus }
      }
      GROUP BY ?s ?clus
    }
    FILTER (?clusCt = ?maxClusCt)
  }
  GROUP BY ?s
}
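The update performs one iteration: it reads the previous assignment from graph `<urn:ga/g/xjz0>` and writes the new one to `<urn:ga/g/xjz1>`, with each node adopting the cluster that is most frequent among the nodes it `:knows`. The loop that repeats this is not part of the query. The Python sketch below mimics both parts under stated assumptions: every node starts in its own cluster, ties (which SAMPLE resolves arbitrarily) go to the smallest label, and iteration stops when assignments no longer change or after a hypothetical `max_iters` limit. All names and data are illustrative.

```python
from collections import Counter


def peer_pressure_step(knows, clusters):
    """One iteration, mirroring the INSERT ... WHERE update.

    knows    -- iterable of (s, o) pairs, i.e. ?s :knows ?o
    clusters -- previous assignment, node -> cluster (graph xjz0)
    Returns the new assignment (graph xjz1). Nodes with no :knows
    neighbour that has a cluster get no entry, as in the query."""
    votes = {}
    for s, o in knows:
        if o in clusters:
            votes.setdefault(s, Counter())[clusters[o]] += 1
    new = {}
    for s, counts in votes.items():
        best = max(counts.values())  # MAX(?clusCt)
        # SAMPLE(?clus) picks any tied cluster; this sketch takes the smallest label.
        new[s] = min(c for c, n in counts.items() if n == best)
    return new


def peer_pressure(knows, max_iters=20):
    """Start with every node in its own cluster; iterate until nothing changes."""
    nodes = {n for edge in knows for n in edge}
    clusters = {n: n for n in nodes}
    for _ in range(max_iters):
        new = peer_pressure_step(knows, clusters)
        if new == clusters:
            break
        clusters = new
    return clusters


# Hypothetical data: two separate triangles with :knows in both directions.
edges = [(x, y) for group in ("abc", "def") for x in group for y in group if x != y]
result = peer_pressure(edges)
```

The `max_iters` guard matters because label-propagation schemes like this can oscillate on some graphs instead of converging.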
SPARQL Expressivity
Ø BI-like operations (rollup and drilldown)
Ø Graph Measures
Ø Iterative algorithms (Clustering)
SPARQL Scalability…
One approach is to use a scale-out architecture such as MapReduce (e.g. Hadoop):
q Translate SPARQL into MapReduce
q Process RDF data directly in MapReduce
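As a rough illustration of how a GROUP BY/COUNT query maps onto this model, here is the gender count from the BI query written as map, shuffle and reduce functions in plain Python. This is a single-process simulation of the MapReduce model, not Hadoop's API; the function names and data are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(triple):
    """Mapper: emit (gender, 1) for each :gender triple, nothing otherwise."""
    _, p, o = triple
    if p == ":gender":
        yield (o, 1)


def shuffle(pairs):
    """Shuffle: group emitted values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the 1s, i.e. COUNT(*) for one ?gender group."""
    return key, sum(values)


def run_job(triples):
    mapped = chain.from_iterable(map_phase(t) for t in triples)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


# Hypothetical toy data.
data = [(":alice", ":gender", "female"), (":bob", ":gender", "male"),
        (":carol", ":gender", "female"), (":alice", ":knows", ":bob")]
counts = run_job(data)
```

In a real cluster the mappers run in parallel over partitions of the triples, and only the shuffle step moves data between machines.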
Conclusion
Ø Can we do “data science” using RDF data?
§ Do we have the data? Yes
§ Do we have the tools? Almost
v Is SPARQL expressive enough? Almost
v Does it scale? Yes in principle, no in practice
v Is it usable/easy? Not really
All examples used in this presentation, and Pig Latin equivalents of some of them, are available at:
https://github.com/fadmaa/rdf-analytics