3. Semantic Web and RDF
Semantic Web
An effort to enable computers to understand the
meaning of resources on the Web as humans do,
by attaching metadata to them.
Resources: Documents, Images, …
RDF(Resource Description Framework)
A basic model for describing information
Triple: Subject, Predicate, Object
Describes an attribute or property of the “subject” with the “predicate”, and
gives its value as the “object”
Triples together form a graph structure
Example: http://www.tsukuba.ac.jp/ --has_title--> “Univ. of Tsukuba”
In the figure, Oval: Resource, Rectangle: Value (string)
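The triple model above can be sketched in a few lines of Python. This is only an illustration: the `Triple` class, its field names and the `describe` helper are assumptions made for the example, not part of RDF or of any library.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """One RDF-style statement (hypothetical in-memory representation)."""
    subject: str          # resource being described (a URI)
    predicate: str        # attribute or property name
    obj: str              # value: another resource or a literal string
    obj_is_literal: bool  # True for a string value, False for a resource

# The example from the slide: the university site has a title.
triples = [
    Triple("http://www.tsukuba.ac.jp/", "has_title", "Univ. of Tsukuba", True),
]

def describe(subject, triples):
    """Return every predicate/value pair stated about one subject."""
    return {t.predicate: t.obj for t in triples if t.subject == subject}

description = describe("http://www.tsukuba.ac.jp/", triples)
```

Adding more triples with the same subject simply adds more edges leaving the same node of the graph.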
4. Linked Data
Objective: Enable people and organizations to
share structured data on the Web.
Linked Data uses RDF for describing attributes
and properties of data
Linked Data encourages people to use data more effectively
Features
Uses URIs and HTTP, so data can be referenced on the Web.
Uses standardized technologies (RDF, URI, SPARQL)
Linked Open Data
Various data sets contain numerical and statistical values
Human, Company, Biological, Medical, Music, Weather, …
295 data sets, 31 billion triples, about 500 million links [1]
1: http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
5. Analytical operations need to be applied to the numerical and
statistical data that are published as Linked Data.
6. OLAP (On-Line Analytical Processing)
An analytical method for large volumes of accumulated data
It can answer complex statistical queries
Multi-dimensional model
Data cube
Numerical values (facts)
Axes for analysis (dimensions)
Hierarchical structure
[Figure: data cube whose cells hold numerical values, with three dimensions.
Category: OS (Mac, Win), PC (Laptop, Desktop).
Place: East (Aomori, Sendai, Tokyo), West (Osaka, Hiroshima, Fukuoka).
Time: first half (Q1, Q2), last half (Q3, Q4).]
Figure source: http://www.atmarkit.co.jp/fwin2ktutor/sql02/sql02_03.html
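A roll-up along the hierarchies of such a cube can be sketched as follows. The fact values and the names `CITY_TO_REGION`, `QUARTER_TO_HALF` and `roll_up` are made up for illustration; only the dimension members come from the figure.

```python
from collections import defaultdict

# Hierarchies of two dimensions (members taken from the figure).
CITY_TO_REGION = {
    "Aomori": "East", "Sendai": "East", "Tokyo": "East",
    "Osaka": "West", "Hiroshima": "West", "Fukuoka": "West",
}
QUARTER_TO_HALF = {"Q1": "first half", "Q2": "first half",
                   "Q3": "last half", "Q4": "last half"}

# Facts: (category, city, quarter, value). The values are hypothetical.
facts = [
    ("Laptop", "Tokyo", "Q1", 100),
    ("Laptop", "Sendai", "Q2", 64),
    ("Desktop", "Osaka", "Q3", 32),
    ("Mac", "Fukuoka", "Q4", 8),
]

def roll_up(facts):
    """Aggregate city/quarter facts up to the region/half level."""
    totals = defaultdict(int)
    for _category, city, quarter, value in facts:
        totals[(CITY_TO_REGION[city], QUARTER_TO_HALF[quarter])] += value
    return dict(totals)

summary = roll_up(facts)
```

The same pattern, with a different key, gives a slice or a drill-down; the point is that every analysis step needs a dimension and its hierarchy to exist first.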
7. OLAP with Relational Database
Uses data stored in a relational database
Star Schema
Numerical values (fact table)
Axes for analysis (dimension tables)
[Figure: star schema shown next to the data cube (Category, Place, Time).
Fact table "Result": ID, category_id, time_id, store_id, sales volume.
Dimension table "Category": ID, large category, small category.
Dimension table "Time": ID, half, quarter.
Dimension table "Place": ID, NS, city name.]
Figure source: http://www.atmarkit.co.jp/fwin2ktutor/sql02/sql02_03.html
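As a minimal sketch, a star schema like the one in the figure could be built and queried with SQLite as below. The table and column names (`fact_result`, `dim_place`, `region` and so on) and all row values are assumptions for the example, not the slide's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_category (id INTEGER PRIMARY KEY, large_category TEXT, small_category TEXT);
CREATE TABLE dim_time     (id INTEGER PRIMARY KEY, half TEXT, quarter TEXT);
CREATE TABLE dim_place    (id INTEGER PRIMARY KEY, region TEXT, city_name TEXT);
-- The fact table holds the measure plus one foreign key per dimension.
CREATE TABLE fact_result (
    id INTEGER PRIMARY KEY,
    category_id INTEGER REFERENCES dim_category(id),
    time_id     INTEGER REFERENCES dim_time(id),
    place_id    INTEGER REFERENCES dim_place(id),
    sales_volume INTEGER
);
INSERT INTO dim_category VALUES (1, 'PC', 'Laptop'), (2, 'OS', 'Win');
INSERT INTO dim_time     VALUES (1, 'f-half', 'Q1'), (2, 'l-half', 'Q3');
INSERT INTO dim_place    VALUES (1, 'East', 'Tokyo'), (2, 'West', 'Osaka');
INSERT INTO fact_result  VALUES (1, 1, 1, 1, 100), (2, 1, 2, 2, 64), (3, 2, 1, 1, 32);
""")

# Roll sales up to (region, half) by joining the fact table to two dimensions.
rows = conn.execute("""
    SELECT p.region, t.half, SUM(f.sales_volume)
    FROM fact_result f
    JOIN dim_place p ON f.place_id = p.id
    JOIN dim_time  t ON f.time_id  = t.id
    GROUP BY p.region, t.half
    ORDER BY p.region, t.half
""").fetchall()
```

Each level of a hierarchy is just a column of its dimension table, so changing the `GROUP BY` columns changes the granularity of the analysis.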
8. Objective and problems
Objective
Propose a method for applying OLAP to typical numerical or
statistical data that is published as Linked Data
Problems
It is difficult to analyze RDF with OLAP directly
Graph data must be converted for analysis
on an existing OLAP system
Axes and hierarchies must be prepared for analysis
OLAP needs axes and hierarchies for analysis
9. Related work
Kämpgen and Harth [1]
Proposed a method for converting Linked Data for OLAP
Uses the RDF Data Cube (QB) vocabulary
An RDF vocabulary for describing a data cube in RDF
Converts Linked Data that follows the QB vocabulary to an RDB
Hierarchical structures are also written in RDF
However, only a few Linked Data sets
follow the QB vocabulary
1. B. Kämpgen and A. Harth. Transforming Statistical Linked Data for Use in OLAP Systems,
I-SEMANTICS 2011, 7th Int. Conf. on Semantic Systems, 2011.
10. In this research, we propose a method
to apply OLAP to TYPICAL Linked Data
11. Approaches
Mapping RDF (graph structure) to relational schema
Create hierarchical structure from data and
data links, semi-automatically
(using features of RDF, Linked Data)
12. From retrieval of data to schema creation
1. Retrieval of RDF Data
2. Store RDF Data to RDB
3. Selection of analysis target
4. Create dimension table
5. Create schema for OLAP
13. 1. Retrieval of RDF Data
Retrieve RDF Data for analysis
Use an RDF dump
Dereference URIs
Retrieve resources on the same host recursively
Also retrieve outside resources that are used as objects
Resources for analysis(www.example.com) Outside resources(www.w3.or
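The URI-based retrieval rule can be sketched as a small crawler: follow links recursively while on the starting host, and fetch outside resources that appear as objects without expanding them further. Everything here is an assumption for illustration, including the `fetch_triples` callback, the toy `WEB` dictionary that stands in for HTTP lookups, and the host `outside.example.org`.

```python
from urllib.parse import urlparse

def crawl(start_uri, fetch_triples):
    """Collect triples starting from start_uri.

    fetch_triples(uri) is a caller-supplied (hypothetical) function returning
    a list of (subject, predicate, object, object_is_resource) tuples.
    """
    host = urlparse(start_uri).netloc
    seen, queue, collected = set(), [start_uri], []
    while queue:
        uri = queue.pop()
        if uri in seen:
            continue
        seen.add(uri)
        same_host = urlparse(uri).netloc == host
        for s, p, o, o_is_resource in fetch_triples(uri):
            collected.append((s, p, o))
            # Follow links only from same-host resources, so outside
            # resources are fetched once but not expanded further.
            if same_host and o_is_resource and o not in seen:
                queue.append(o)
    return collected

# Toy in-memory "Web" used instead of real HTTP requests.
WEB = {
    "http://www.example.com/a": [
        ("http://www.example.com/a", "link", "http://www.example.com/b", True),
        ("http://www.example.com/a", "seeAlso", "http://outside.example.org/x", True),
    ],
    "http://www.example.com/b": [
        ("http://www.example.com/b", "name", "B", False),
    ],
    "http://outside.example.org/x": [
        ("http://outside.example.org/x", "link", "http://outside.example.org/y", True),
    ],
    "http://outside.example.org/y": [
        ("http://outside.example.org/y", "name", "Y", False),
    ],
}

result = crawl("http://www.example.com/a", lambda uri: WEB.get(uri, []))
subjects = {s for s, _, _ in result}
```

Stopping at the first outside hop keeps the crawl bounded; a real crawler would also need rate limiting and error handling, which are omitted here.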
14. 2. Store RDF Data to RDB (1/2)
Store triples in the RDB, one table per “rdf:type”
The table schema cannot be decided in advance
RDF data need not follow a fixed schema
Store as a vertical representation [1]
1. D. J. Abadi et al., 2007.
[Figure: RDF graph. result_1 has time time_1, category computer, volume “190,000” and store Tsukuba; result_2 has time time_2, volume “4,000” and store Mito. rdf:type classes shown: Result, Store.]
“Result” table (vertical)
subject   predicate  object     type
result_1  time       time_1     resource
result_1  category   computer   resource
result_1  volume     “190,000”  literal
result_1  store      Tsukuba    resource
result_2  time       time_2     resource
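Storing triples vertically, one table per rdf:type, might look like the sketch below. The function name `store_vertical`, the `_vertical` table suffix and the decision to skip untyped subjects are assumptions for the example; the rows are the ones from the slide.

```python
import sqlite3

RDF_TYPE = "rdf:type"

triples = [
    ("result_1", RDF_TYPE, "Result", "resource"),
    ("result_1", "time", "time_1", "resource"),
    ("result_1", "category", "computer", "resource"),
    ("result_1", "volume", "190,000", "literal"),
    ("result_1", "store", "Tsukuba", "resource"),
    ("result_2", RDF_TYPE, "Result", "resource"),
    ("result_2", "time", "time_2", "resource"),
    ("result_2", "volume", "4,000", "literal"),
    ("result_2", "store", "Mito", "resource"),
]

def store_vertical(conn, triples):
    """Write each subject's triples into a 4-column table named after its rdf:type."""
    type_of = {s: o for s, p, o, _ in triples if p == RDF_TYPE}
    created = set()
    for s, p, o, kind in triples:
        if p == RDF_TYPE or s not in type_of:
            continue  # untyped subjects are skipped in this sketch
        table = f"{type_of[s]}_vertical"
        if not table.isidentifier():
            raise ValueError(f"unsafe table name: {table!r}")
        if table not in created:
            conn.execute(
                f'CREATE TABLE "{table}" '
                "(subject TEXT, predicate TEXT, object TEXT, type TEXT)"
            )
            created.add(table)
        conn.execute(f'INSERT INTO "{table}" VALUES (?, ?, ?, ?)', (s, p, o, kind))
    return sorted(created)

conn = sqlite3.connect(":memory:")
tables = store_vertical(conn, triples)
row_count = conn.execute('SELECT COUNT(*) FROM "Result_vertical"').fetchone()[0]
```

The vertical layout needs no knowledge of which predicates exist, which is why it works before any schema is known.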
15. 2. Store RDF Data to RDB (2/2)
Convert the vertical table to a horizontal one
This reveals which attributes or properties exist, so the schema is obtained
“Sales result” table (vertical)
subject   predicate  object     type
result_1  time       time_1     resource
result_1  category   computer   resource
result_1  volume     “190,000”  literal
result_1  store      Tsukuba    resource
result_2  time       time_2     resource
result_2  volume     “4,000”    literal
result_2  store      Mito       resource
Schema obtained: Sales result (subject[PK], time[FK], category[FK], volume, store[FK])
PK: primary key, FK: foreign key
“Sales volume” table (horizontal)
subject[PK]  time[FK]  category[FK]  volume     store[FK]
result_1     time_1    computer      “190,000”  Tsukuba
result_2     time_2    null          “4,000”    Mito
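The vertical-to-horizontal conversion is essentially a pivot on the predicate column. A minimal sketch follows; the function name `pivot` is hypothetical, and a subject with several objects for one predicate is not handled (the last value wins).

```python
vertical_rows = [
    ("result_1", "time", "time_1", "resource"),
    ("result_1", "category", "computer", "resource"),
    ("result_1", "volume", "190,000", "literal"),
    ("result_1", "store", "Tsukuba", "resource"),
    ("result_2", "time", "time_2", "resource"),
    ("result_2", "volume", "4,000", "literal"),
    ("result_2", "store", "Mito", "resource"),
]

def pivot(vertical_rows):
    """Turn (subject, predicate, object, type) rows into one row per subject."""
    columns = []      # predicates in order of first appearance
    by_subject = {}   # subject -> {predicate: object}
    for subject, predicate, obj, _kind in vertical_rows:
        if predicate not in columns:
            columns.append(predicate)
        by_subject.setdefault(subject, {})[predicate] = obj
    header = ["subject"] + columns
    # Missing predicates become None, the counterpart of SQL NULL.
    rows = [[s] + [attrs.get(c) for c in columns] for s, attrs in by_subject.items()]
    return header, rows

header, rows = pivot(vertical_rows)
```

The header is the discovered schema; columns whose objects are resources are the candidates for foreign keys.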
16. 3. Selection of analysis target
The user chooses which value will be
targeted in the analytical operation
[Figure: tables obtained from the data: Stock, Category, Sales volume, Time, Store, gn:location and Visitor. Each has a subject key; other columns shown include product_name, name, category_ID, time_ID, store_ID, location_ID, date, time, volume, stock numbers and visitor counter.]
17. 4. Create dimension table
Create dimension tables for OLAP
We propose 3 creation methods
1. Use a literal that is written directly in the data
2. Use a layer structure that is written directly in the data
3. Use a layer structure from outside data
[Figure: fact table "Sales volume" (subject, category_ID, time_ID, store_ID, volume) linked to Time (subject, date, time), category (subject, name), store (subject, location_ID) and gn:location (subject, name). Blue: outside resources]
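For method 1, a hierarchy can be derived from a literal alone. The sketch below does this for a date-time literal; the function name `time_dimension_row` and the choice of levels are assumptions for illustration, not necessarily the levels used in the proposed method.

```python
from datetime import datetime

def time_dimension_row(literal):
    """Split an ISO 8601 date-time literal into hierarchy levels."""
    dt = datetime.fromisoformat(literal)
    return {
        "year": dt.year,
        "quarter": (dt.month - 1) // 3 + 1,  # months 1-3 -> 1, 4-6 -> 2, ...
        "month": dt.month,
        "day": dt.day,
        "hour": dt.hour,
    }

row = time_dimension_row("2011-04-14T00:00:00")
```

Each key becomes a column of the Time dimension table, ordered from coarse to fine.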
18. Method 3: Use layer structure from outside data
Use other datasets that provide a layer structure
[Figure: resources in the target dataset (Tsukuba, Mito) link to outside datasets (resources) that hold the layer structure (Ibaraki, Japan)]
e.g.) GeoNames
Can use a geographically layered structure
Tsukuba / Ibaraki / Japan / Asia
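Once parent links have been obtained from an outside dataset, building the layer path is a walk up the links. In this sketch the `PARENT` dictionary is a hand-written stand-in for data that would come from a source such as GeoNames, and `hierarchy_path` is a hypothetical helper.

```python
# Hypothetical parent links, as they might be extracted from outside data.
PARENT = {
    "Tsukuba": "Ibaraki",
    "Mito": "Ibaraki",
    "Ibaraki": "Japan",
    "Japan": "Asia",
}

def hierarchy_path(place, parent):
    """Follow parent links upward and return the path from place to the top."""
    path = [place]
    while path[-1] in parent:
        nxt = parent[path[-1]]
        if nxt in path:
            raise ValueError(f"cycle in parent links at {nxt!r}")
        path.append(nxt)
    return path

tsukuba_path = hierarchy_path("Tsukuba", PARENT)
mito_path = hierarchy_path("Mito", PARENT)
```

Each position in the path corresponds to one level column of the dimension table, so places whose paths have different lengths need a padding or alignment rule that this sketch does not provide.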
19. 5. Create schema for OLAP
e.g.) The case where “sales volume” was selected as
the measure
fact table) Results
dim. tables) Time, Category, Store-gn:place
[Figure: star schema]
Fact "Results": subject, category_subject, time_subject, store_subject, volume
Dim. "Time": subject, hour, day, month, quarter, year
Dim. "Category": subject, name
Dim. "Store-gn:place": subject, L1 (district), L2 (prefecture), L3 (country), L4 (continent)
20. Experiments
Objective
Convert data and create a schema for OLAP from
numerical/statistical data that is published as Linked
Data
Exp. 1) Radiation observation data
National Radioactivity Stat as Linked Data [1]
An RDF version of the Environmental Radioactivity Level Survey [2]
published by MEXT (Japan's Ministry of Education, Culture, Sports, Science and Technology)
Exp. 2) Weather observation data
Linked Sensor/Observation Data [3]
Observatory metadata from more than 200,000 places
Hurricane observation data from these observatories
1 http://www.kanzaki.com/works/2011/stat/ra/
2 http://radioactivity.mext.go.jp/ja/monitoring_by_prefecture/
3 http://wiki.knoesis.org/index.php/LinkedSensorData
21. Exp. 1) Results (1/2)
Obtained radiation observation data by crawling
Num. of triples: 1,003,410 (March-Dec. 2011)
Geo. info.: used dumped data from GeoNames
Candidates of measure
Observation instance, Time, Location
[Figure: example observation as a graph]
ra:20110315/p02/t20 (Obs. instance) rdf:value “0.040” (Value)
ra:20110315/p02/t20 ev:place gn:2111833 (Location)
ra:20110315/p02/t20 ev:time time:20110315T22PT1H (Time)
time:20110315T22PT1H tl:at “2011-04-14T00:00:00”^^xsd:dateTime
Prefixes:
“ra” is “http://www.kanzaki.com/works/2011/stat/ra/”, “time” is “http://www.kanzaki.com/works/2011/stat/dim/d/”
“gn” is “http://sws.geonames.org/”, “ev” is “http://purl.org/NET/c4dm/event.owl#”
“tl” is “http://purl.org/NET/c4dm/timeline.owl#”, “rdf” is “http://www.w3.org/1999/02/22-rdf-syntax-ns#”
22. Exp. 1) Results (2/2)
Schema created when “value” was used as the measure
fact table) integrated table that contains the obs. value
dim. tables)
1. Time (hierarchy structure created from obs. time)
2. Location (hierarchy structure obtained from GeoNames)
[Figure: star schema]
Fact "observation_instance": subject, place_subject, time_subject, value
Dim. "Time": subject, sec, min, hour, day, month, year
Dim. "Location": subject, layer_1 (district), layer_2 (prefecture), layer_3 (country), layer_4 (continent)
The observed value was stored as a string,
so the user had to specify its data type
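Because the measure arrives as a string, it has to be cast before it can be aggregated. One possible conversion is sketched below; the helper `to_number` is hypothetical, and treating commas as thousands separators is an assumption that does not hold for every locale.

```python
def to_number(literal):
    """Convert a string literal such as "0.040" or "190,000" to a float.

    Returns None when the text is not numeric, so the caller can decide
    how to treat such observations.
    """
    text = literal.strip().strip('"“”').replace(",", "")
    try:
        return float(text)
    except ValueError:
        return None

values = [to_number(v) for v in ["0.040", "“190,000”", "n/a"]]
```

In the experiment this decision was left to the user; a typed literal (for example one tagged with an XSD numeric type) would remove the guesswork.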
23. Exp. 2) Results (1/2)
Used dumped data
Hurricane Bill (17-22 Sept. 2009)
Num. of triples: 231,021,108 / num. of observations: 21,272,790
Geo. info.: used dumped data from GeoNames
Candidates of measures
time, value, coordinates (lat., long.)
Schema created when “value” was used as the
measure
fact table: integrated table (obs. instance and value)
dimension tables:
1. observed time (hierarchy structure obtained from time)
2. observatory (hierarchy structure obtained from GeoNames)
26. Conclusion
Proposed a method to apply OLAP to typical
Linked Data (numerical and statistical data)
Processing uses features of RDF and Linked Data
Obtains layered structures from inside and outside the data
and creates hierarchies (dimension tables) for OLAP
Applied the method to two observation data sets
Converts the data and prepares axes for analysis once
a target is chosen
Creates a star schema for OLAP
27. Future work
Revise the method
Case where a subject has multiple objects for the same
predicate
Case where a subject has multiple “rdf:type”s
Case where a subject has no “rdf:type”
Analyze more datasets
Handle many ontologies
Provide a mechanism for using outside resources
Definitions of layered structure differ from dataset to dataset
Apply and verify the method on many kinds of data from other fields